Friday, March 4, 2011

Principles of Distributed Databases - Third edition is out, finally!...

The third edition is finally out... It has been ten years since the release of the second edition -- it took a while, but we are very happy with the results. We actually started the revision back in 2005 hoping to finish it by 2006, but, as usual, the plans met the reality of many other commitments on both of our parts.

The book is almost a complete rewrite. We kept the fundamental principles that have been there since the first edition, but updated their treatment. While we maintained and updated the core chapters, we have also added new ones. The major changes are the following:
  1. Database integration and querying is now treated in much more detail, reflecting the attention these topics have received in the community in the past decade. There is one chapter that focuses on the integration process, while another chapter discusses querying over multidatabase systems.
  2. The previous editions had only a brief discussion of data replication protocols. This topic is now covered in a separate chapter where we provide an in-depth discussion of the protocols and how they can be integrated with transaction management.
  3. Peer-to-peer data management is discussed in depth. These systems have become an important and interesting architectural alternative to classical distributed database systems. Although the early distributed database system architectures followed the peer-to-peer paradigm, the modern incarnation of these systems has fundamentally different characteristics, so they deserve in-depth discussion in a chapter of their own.
  4. Web data management is covered in one chapter of its own. This is a difficult topic to cover since there is no unifying framework. We discuss various aspects of the topic ranging from web models to search engines to distributed XML processing.
  5. Earlier editions contained a chapter where we discussed "recent issues" at the time. In this edition, we again have a similar chapter where we cover stream data management and cloud computing. These topics are still in flux and are subjects of considerable ongoing research. We highlight the issues and the potential research directions.
The resulting manuscript strikes a balance between our two objectives, namely to address new and emerging issues, and to maintain the main characteristics of the book in addressing the principles of distributed data management.

The third edition is coming out at a time when there is renewed interest in distributed data management. The last ten years have seen an accelerated investigation of distributed data management technologies spurred by the advent of high-speed networks, fast commodity hardware, very heavy parallelization of hardware, and, of course, the increasing pervasiveness of the web. Patrick and I are holding a panel session at the upcoming ICDE 2011 conference on this topic. The objective is to discuss what is likely to happen in the next decade; or to put it differently, if there were to be a fourth edition of our book in 2020, what would it look like? What would be new? We'll see what emerges as the important trends. I'll report.

The book is available from Springer, Barnes & Noble, Chapters-Indigo (in Canada), and, of course, Amazon. The Springer site will (eventually) have presentation slides and solutions to selected exercises -- we are working on them right now.

Sunday, February 6, 2011

J.C.R. Licklider and the early days of computing

I just finished reading The Dream Machine by M. Mitchell Waldrop (not the 1991 movie...). It is a biography of J. C. R. Licklider, but it is much more than that - it is the story of the very early days of computing in the US starting in the 1950s. J. C. R. Licklider, or Lick as he apparently preferred to be called, started his career at Harvard in the Psycho-Acoustic Laboratory in 1943 after receiving his PhD at the University of Rochester on that very topic. During his time at Harvard, he started attending the famous "supper seminars" organized by Norbert Wiener (who was a distinguished mathematician and is the father of the cybernetics movement). One of the problems debated at these seminars was the relationship between digital computers and the human brain. Thus started Lick's interest in computing, which shaped the rest of his life. In 1950 he moved to MIT with the promise of setting up a cognitive psychology research program and a department of psychology. He did set up a top-notch and influential program, but he could not realize the objective of setting up a department due to institutional obstruction. He moved to BBN in mid-1957 as Vice-President in charge of all psycho-acoustics research. He moved to ARPA in 1962 to head the Information Processing Techniques Office (IPTO), where he stayed until 1964. He then moved to IBM for a short while and returned to MIT in 1968, from where he retired in 1985. He passed away in 1990.

His academic career, as it relates to computing, is very interesting, and it is eye-opening to read some of his papers. After I finished the book, I read his 1960 paper "Man-Computer Symbiosis" and his 1968 paper "The Computer as a Communication Device", co-authored with Bob Taylor (who himself became the head of IPTO later on, and is one of the fathers of the ARPANET), both of which were included in a 1990 DEC Technical Report in memory of Lick shortly after he passed away. His vision of where computing should go is very enlightening when considered in historical perspective, in particular his emphasis on moving from a computing paradigm based on well-defined specification (and coding) of a solution supported by batch processing to one where the system "works" with the users and "learns" along the way, supported by timesharing (and later interactive) computing.

Lick's ARPA days were perhaps far more influential on the growth of computing in the US. He was influential in initiating and funding projects at a few key institutions on timesharing (Project MAC at MIT, and Ed Feigenbaum at UC Berkeley), AI (again Project MAC and Marvin Minsky at MIT; Allen Newell, Herbert Simon, and Alan Perlis's work at CMU; John McCarthy's work at Stanford), and human-computer interaction (Doug Engelbart's group at SRI), and he started the work on ARPANET. He had explained his ideas of an "intergalactic computer network" in a series of memos in 1961 while he was at BBN. These ideas are also summarized in the 1968 essay "The Computer as a Communication Device".

The book is very well researched and very nicely written. It is Lick's life that forms the backbone of the book, but that is not constraining at all given Lick's impact on so many areas. The projects that he funded are very well described. The projects and efforts that grew out of these early projects (such as Xerox PARC) are also included to complete the narrative.

When I completed the book, I kept thinking that the new generation of students should be exposed to the history of computing in some way. There is significant value in being able to see the thread of ideas from their early germination to their later realization (sometimes decades later). I believe it would be better to weave the discussion of history into the discussion of fundamental techniques and algorithms. This requires a rethinking of how we introduce computer science -- especially in the early courses -- but that is a topic for another blog.