Asif Masum


Non-Cooperative Byzantine Failures - A New Framework for the Design of Efficient Fault Tolerance Protocols




What is fault-tolerant distributed computing? In what ways can distributed systems fail? Is it possible for faulty system components to act maliciously? Is it even possible for faulty components to collaborate by chance and thus cause a system failure? How can the research area of fault tolerance help to build more reliable systems? This book answers these questions and approaches the field of failure modeling in distributed systems and the design of reliable group communication protocols from an academic viewpoint. Its main focus is the relationship between failure modeling and protocol efficiency.

The book, which primarily addresses researchers and students in the field of fault-tolerant distributed computing, is divided into three major parts. The first chapters lay the groundwork needed to understand the work presented in this book. This groundwork is especially important for readers who are new to the research field of fault tolerance. Building on these basics, the book presents a new framework that allows failure mode assumptions to be defined precisely (i.e., precise definitions of how system components might fail). The chapter on this general framework is followed by the novel idea and definition of non-cooperative Byzantine failures. The identification of these failures and the lessons that can be learned from them are a main contribution of this work. Understanding them leads, in the final chapter, to the definition of a protocol suite that realizes the reliable broadcast paradigm. To remain independent of any specification or programming language, the protocols are presented as easily understandable state/transition diagrams. The book closes with experimental results based on an implementation of the introduced protocols.