Menu

Books


Replication Techniques in Distributed Systems

Abdelsalam A. Helal
Abdelsalam A. Heddaya
Bharat B. Bhargava

Foreward by Jim Gray


Kluwer Academic Publishers
August 1996

ISBN 0-7923-9800-9

From the Foreword

"The book is very readable and covers fundamental work that allows the reader to understand the roots of many ideas. It is a real contribution to the field."

"One cannot look at this book and not be impressed with the progress we have made. The advances in our understanding and skill over the last twenty years are astonishing. I recall the excitement when some of these results were first discovered. Judging from current activity, there are even better ways to do what we are doing today. Data and process replication is an exciting field. I applaud Helal, Heddaya, and Bhargava for surveying and annotating the current literature for students and practitioners." ---Jim Gray, Microsoft, March 1996


From the Preface

In this monograph, we organize and survey the spectrum of replication protocols and systems that achieve high availability by replicating entities in failure-prone distributed computing environments. The entities we consider vary from passive untyped data objects, to which we devote about half the book, to typed and complex objects, to processes and messages. Within the limits imposed by scope, size and available published literature, we strive to present enough detail and comparison to guide students, practitioners, and beginning researchers through the thickets of the field. The book, therefore, serves as an efficient introduction and roadmap.

However, replication of data, processes, or messages, risks a number of ill consequences that can arise even in the absence of failures. Unless carefully managed, replication jeopardizes consistency, reduces total system throughput, renders deadlocks more likely, and weakens security. Site and communication failures exacerbate these risks, especially given that---in an asynchronous environment---failures are generally indistinguishable from mere slowness. It is these undesirable effects that have slowed down the transfer of the voluminous research on replication, into large scale applications. Our overriding concern in this book is to present how the research literature has attempted to understand and solve each of these difficulties.


Table of Contents

  1. Introduction
    1. How systems fail
    2. Reliability x Availability = Dependability
    3. Replication for failure management
    4. Replication for performance
    5. Costs and limitations of replication
  2. Replication of Data
    1. Model of Distributed Database System
      1. Concurrency Control
      2. Atomicity Control
      3. Mutual Consistency in Replicated Databases
    2. Read One Write All (ROWA)
      1. Simple ROWA Protocol
      2. Read One Write All Available (ROWA-A)
      3. Primary Copy ROWA
      4. True Copy Token ROWA
    3. Quorum Consensus (QC) or Voting
      1. Uniform Majority QC
      2. Weighted Majority QC
      3. Weighted Majority QC for Directories
      4. General QC for Abstract Data Types
      5. Hybrid ROWA/QC
    4. Quorum Consensus on Structured Networks
      1. Sqrt(n) Algorithm
      2. The Grid Protocol
      3. Asymptotically High Availability
      4. Tree Quorums
      5. Hierarchical Weighted Majority QC
      6. Multidimensional Weighted Majority QC
    5. Reconfiguration after Site Failures
      1. Primary Copy ROWA
      2. Directory-based ROWA-Available
      3. Regenerative ROWA
      4. Regenerative ROWA-Available
      5. Regenerative Quorum Consensus
      6. QC with Witnesses
      7. QC with Ghosts
    6. Reconfiguration after Network Partitions
      1. Dynamic Uniform Majority Voting
      2. Virtual Partitions
      3. Dynamic Weighted Majority Voting
      4. Dynamic Quorum Adjustment
      5. Class Conflict Analysis
      6. Read-only Transactions
    7. Optimism and Conflict Resolution
    8. Coding-theoretic Redundancy
  3. Replication of Processes
    1. Replication based on Modular Redundancy
    2. Consistency of Processes
    3. Replicated Distributed Programs and the Circus Approach\/
    4. Replicated Transactions and the Clouds Approach
    5. Replication in Isis
    6. Primary/Standby Schemes
    7. Process Replication for Performance
  4. Replication of Objects
    1. Replication of Composite Objects
    2. Replicated Objects in Guide
  5. Replication of Messages
    1. Reliable Broadcast Protocols
    2. Quorum Multicast Protocols
  6. Replication in Heterogeneous, Mobile, and Large-Scale Systems
    1. Replication in Heterogeneous Databases
      1. Identity Connection
      2. Update through Current Copy
      3. Interdependent Data Specification and Polytransactions
      4. Weighted Voting in Heterogeneous Databases
      5. Primary Copy in Heterogeneous Databases
    2. Replication in Mobile Environments
    3. Replication in Large-Scale Systems
  7. The Future of Replication
  • Appendix A: Systems
    1. Amoeba
    2. Alphorn
    3. Andrew (AFS)
    4. Arjuna
    5. Avalon
    6. Birlix
    7. Camelot
    8. Coda
    9. Deceit
    10. Echo
    11. Eden
    12. Ficus-Locus
    13. Galaxy
    14. Guide
    15. Harp
    16. Isis
    17. Mariposa
    18. Oracle 7
    19. Purdue Raid
    20. Rainbow
    21. SDD--1
    22. Sybase 10
    23. Yackos
  • Appendix B: Further Readings
    1. Data Replication (contributed by Yelena Yesha, NASA Goddard Space Flight Center and Konstantinos Kalpak, University of Maryland Baltimore County)
    2. Process, Object, and Message Replication (contributed by Jehan-Francois Paris, University of Houston)
    3. Replication in Heterogeneous, Mobile, and Large-Scale Systems (contributed by Mike W. Bright, University of Pittsburgh)
    4. Availability and Performance (contributed by Waleed A. Muhanna, The Ohio State University)
    5. Implementations (contributed by John Riedl, University of Minnesota)
  • Appendix C: Serializability Theory
  • References
  • Index