Abstract:
|
Replicated systems are a kind of distributed system whose main goal
is to ensure that computer systems are highly available, fault tolerant and
provide high performance. One of the latest trends in replication techniques,
which are managed by replication protocols, is to make use of Group
Communication Systems, and more specifically of the atomic broadcast
communication primitive, to develop more efficient replication protocols.
An important aspect of these systems is how they manage
the disconnection of nodes (which degrades their service) and the
connection/reconnection of nodes to restore their original support. In
replicated systems this task is delegated to recovery protocols. How they
work depends especially on the failure model adopted. A model commonly used
for systems managing large state is crash-recovery with partial amnesia,
because it implies short recovery periods. However, assuming it raises
several problems. Most of them have already been solved in the literature:
view management, abort of local transactions started in crashed nodes
(when referring to transactional environments) or, for example, the
reinclusion of new nodes in the replicated system. Nevertheless, there is one
problem related to the assumption of this second failure model that has not
been completely considered: the amnesia phenomenon, which can
lead to inconsistencies if it is not correctly managed.
This work presents the inconsistency problem caused by amnesia and
formalizes it, defining the properties that must be fulfilled to avoid it
and defining possible solutions. It also presents and formalizes an
inconsistency problem (also due to amnesia) that appears under a specific
sequence of events allowed by the majority partition progress condition and
that forces the system to stop, proposing the properties for overcoming it
and proposing different solutions. As a consequence, it proposes a new majority
partition progress condition. In the sequel there is de
|