Friday, March 4, 2011

Principles of Distributed Databases - Third edition is out, finally!...

The third edition is finally out... It has been ten years since the release of the second edition -- it took a while, but we are very happy with the results. We actually started the revision back in 2005 hoping to finish it by 2006, but, as usual, the plans met the reality of many other commitments on both of our parts.

The book is almost a complete rewrite. We kept the fundamental principles that have been there since the first edition, but updated them. The end result is a heavily revised book -- while we maintained and updated the core chapters, we have also added new ones. The major changes are the following:
  1. Database integration and querying are now treated in much more detail, reflecting the attention these topics have received in the community in the past decade. There is one chapter that focuses on the integration process, while another chapter discusses querying over multidatabase systems.
  2. The previous editions had only a brief discussion of data replication protocols. This topic is now covered in a separate chapter where we provide an in-depth discussion of the protocols and how they can be integrated with transaction management.
  3. Peer-to-peer data management is discussed in depth. These systems have become an important and interesting architectural alternative to classical distributed database systems. Although the early distributed database system architectures followed the peer-to-peer paradigm, the modern incarnation of these systems has fundamentally different characteristics, so they deserve in-depth discussion in a chapter of their own.
  4. Web data management is covered in one chapter of its own. This is a difficult topic to cover since there is no unifying framework. We discuss various aspects of the topic ranging from web models to search engines to distributed XML processing.
  5. Earlier editions contained a chapter where we discussed "recent issues" at the time. In this edition, we again have a similar chapter where we cover stream data management and cloud computing. These topics are still in flux and are subjects of considerable ongoing research. We highlight the issues and the potential research directions.
The resulting manuscript strikes a balance between our two objectives, namely to address new and emerging issues, and to maintain the main characteristics of the book in addressing the principles of distributed data management.

The third edition is coming out at a time when there is renewed interest in distributed data management. The last ten years have seen an accelerated investigation of distributed data management technologies, spurred by the advent of high-speed networks, fast commodity hardware, very heavy parallelization of hardware, and, of course, the increasing pervasiveness of the web. Patrick and I are holding a panel session at the upcoming ICDE 2011 conference on this topic. The objective is to discuss what is likely to happen in the next decade; or to put it differently, if there were to be a fourth edition of our book in 2020, what would it be? What would be new? We'll see what emerges as the important trends. I'll report.

The book is available from Springer, Barnes & Noble, Chapters-Indigo (in Canada), and, of course, Amazon. The Springer site will (eventually) have presentation slides and solutions to selected exercises -- we are working on them right now.