Distributed computing “Coordination and agreement” Mathieu Delalandre University of Tours, Tours city, France
[email protected]
Coordination and agreement
1. Introduction
2. Distributed mutual exclusion
 2.1. Introduction
 2.2. The baseline methods
 2.3. The Ricart-Agrawala algorithm
3. Group communication
 3.1. Introduction
 3.2. Basic multicast
 3.3. Reliable multicast over B-multicast
 3.4. Reliable multicast over IP-multicast
Introduction. Coordination and agreement concerns distributed algorithms whose goal is to coordinate distributed processes on the actions to perform, but also to reach agreement on values. Typical problems are:
Distributed mutual exclusion: in a group, processes must agree on access requirements to non-shareable resources.
Group communication: specific algorithms need to be designed to enable efficient group communication, wherein processes can join, leave or even fail.
(Figure: 1. a process P0 sends to n processes sharing a resource R, one of which fails; 2. P0 checks if n receipts arrived; the failed process sends no ACK.)
Related problems include leader election, consensus agreement, deadlock detection and resolution, global predicate detection, termination detection, failure detection, garbage collection, etc.
Distributed mutual exclusion “Introduction” (1). Mutual exclusion: two events are mutually exclusive if they cannot occur at the same time. Mutual exclusion algorithms are used to avoid the simultaneous use of a resource; the piece of code that accesses the resource is called the critical section.
Requirements for mutual exclusion include the following properties:
•Safety: at most one process may execute in the critical section at a time.
•Liveness: requests to enter and exit the critical section eventually succeed.
•Ordering: if one request to enter the critical section happened-before another, then entry to the critical section is granted in that order.
Distributed mutual exclusion “Introduction” (2)
Performance metrics are generally measured as follows: Message complexity f(N) is the number of messages required per critical section execution, where N is the number of members of the system.
Synchronization delay (SD) is the time between the moment the last process exits the critical section and the moment the next process enters it. Section execution time (E) is the time a process spends in the critical section.

System throughput (ST) is the rate at which the system executes requests for the critical section, in requests/second:

ST = 1 / (SD + E)

where SD = (1/n) Σ_{i∈[1,n]} SD_i is the average synchronization delay in seconds, E = (1/n) Σ_{i∈[1,n]} E_i is the average section execution time in seconds, and n is the synchronization magnitude.
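As a quick numeric check of the formula, a short sketch computing ST from hypothetical per-request measurements (the sample values are assumptions, not data from the course):

```python
# Hypothetical measurements: per-request synchronization delays SD_i
# and section execution times E_i, in seconds.
sd_samples = [0.02, 0.03, 0.025]   # assumed SD_i values
e_samples = [0.10, 0.12, 0.11]     # assumed E_i values

sd_avg = sum(sd_samples) / len(sd_samples)   # SD = (1/n) * sum(SD_i)
e_avg = sum(e_samples) / len(e_samples)      # E  = (1/n) * sum(E_i)

st = 1.0 / (sd_avg + e_avg)                  # ST = 1 / (SD + E)
print(f"SD={sd_avg:.3f}s E={e_avg:.3f}s ST={st:.2f} req/s")
# SD=0.025s E=0.110s ST=7.41 req/s
```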
Distributed mutual exclusion “Introduction” (3)
Performance metrics are generally measured as follows:
Response time (RT) is the time interval between the moment a request message is sent out and the moment the requesting process's execution of the critical section is over; hence RT > E.

(Figure: timeline of a non-concurrent access case, showing the request being sent, the process entering and leaving the critical section, the section execution time E, the synchronization delay SD and the response time RT.)
Distributed mutual exclusion “Introduction” (4)
The exclusion methods: the baseline ways to achieve mutual exclusion are message passing, using a central server, or arranging processes in a logical ring. These approaches suffer from performance and failure issues and do not support ordering. Several advanced algorithms have been proposed to solve these problems, including the Ricart-Agrawala algorithm.
Method                    | Process number | Safety | Liveness | Ordering | Constraints and performance
Message passing           | = 2            | yes    | yes      | no       | it works with 2 processes only
Central server algorithm  | ≥ 2            | yes    | yes      | no       | the coordinator can crash
Ring-based algorithm      | ≥ 2            | yes    | yes      | no       | SD is large
Ricart-Agrawala algorithm | ≥ 2            | yes    | yes      | yes      | the message complexity is f(N) = 2(N-1)
Distributed mutual exclusion “The baseline methods” (1)
Distributed mutual exclusion “The baseline methods” (2). Message passing provides synchronization and communication functions that can be used in distributed systems as well as in shared-memory systems. A number of design issues must be considered: synchronization primitives (blocking, non-blocking), addressing, message format and queuing discipline (FIFO, priority, etc.). The next algorithm achieves mutual exclusion with message passing; we assume here the use of a blocking receive primitive and a non-blocking send primitive. The two processes exchange a token that ensures mutually exclusive access.
(Figure: two computers exchange a token message; each process blocks on receive, then sends.)
Process A loop: (1) receive message from B, (2) pop item from message, (3) send message to B.
Process B loop: (1) receive message from A, (2) push a new item in message, (3) send message to A.
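A minimal sketch of this token exchange, using Python threads and queues to stand in for the two computers: a blocking get models the blocking receive, a put the non-blocking send. The round count and the initial token message are assumptions for the simulation.

```python
# Token exchange between two processes: whichever process holds the
# message is in its critical section; the other blocks on receive.
import queue
import threading

a_to_b, b_to_a = queue.Queue(), queue.Queue()

def process_a(rounds):
    for _ in range(rounds):
        msg = b_to_a.get()      # (1) blocking receive from B
        msg.pop()               # (2) critical section: pop an item
        a_to_b.put(msg)         # (3) non-blocking send to B

def process_b(rounds):
    for _ in range(rounds):
        msg = a_to_b.get()      # (1) blocking receive from A
        msg.append(len(msg))    # (2) critical section: push a new item
        b_to_a.put(msg)         # (3) non-blocking send to A

a = threading.Thread(target=process_a, args=(3,))
b = threading.Thread(target=process_b, args=(3,))
a.start(); b.start()
b_to_a.put([0])                 # inject the initial token message (assumption)
a.join(); b.join()
```

Because each side must receive the token before acting, the two critical sections strictly alternate; this is why the baseline works only for two processes.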
Distributed mutual exclusion “The baseline methods” (3). Central server algorithm: the simplest way to achieve mutual exclusion with multiple processes is to employ a server that grants permission to enter the critical section. This simulates how mutual exclusion is done in a one-processor system. The two access cases are non-concurrent and concurrent. The non-concurrent access:
(Figure: non-concurrent access. Step 1: (1) P1 sends a request to C; (2-3) C registers P1 in the critical section and sends ok; P1 holds the section. Step 2: (4) P1 sends a release to C; (5) C frees the section. Legend: (1), (2), (3)… are event orders; Pi are distributed processes; C is the process coordinator, which manages the queue of waiting requests; arrows are messages between two processes.)
Distributed mutual exclusion “The baseline methods” (4). Central server algorithm: the simplest way to achieve mutual exclusion with multiple processes is to employ a server that grants permission to enter the critical section. This simulates how mutual exclusion is done in a one-processor system. The two access cases are non-concurrent and concurrent. The concurrent access:
(Figure: concurrent access. Step 1: (1) P2 sends a request to C while P1 holds the section; (2) C doesn't answer and pushes the request in the queue if P2 is not included yet; P2 waits. Step 2: (3) P1 sends a release to C; (4-5) C frees the section, pops P2 from the queue, gives it the section and sends ok to P2. Legend: (1), (2), (3)… are event orders; Pi are distributed processes; C is the process coordinator, which manages the queue of waiting requests; arrows are messages between two processes.)
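The two cases above can be condensed into a small coordinator sketch; the class and method names are assumptions, and the network exchanges (request, ok, release messages) are reduced to method calls.

```python
# Central-server coordinator: grants the section to the first requester,
# queues later ones, and hands the section over on each release.
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None          # process currently in the section
        self.waiting = deque()      # queue of pending requests

    def request(self, pid):
        """Returns True (ok sent) if granted now, False if queued."""
        if self.holder is None:
            self.holder = pid       # (2-3) register and send ok
            return True
        if pid not in self.waiting:
            self.waiting.append(pid)  # (2) push the request in the queue
        return False

    def release(self, pid):
        """Frees the section; returns the next holder (ok sent), if any."""
        assert self.holder == pid
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder

c = Coordinator()
print(c.request("P1"))   # True: P1 holds the section
print(c.request("P2"))   # False: P2 is queued
print(c.release("P1"))   # P2: the section is handed to P2
```

The coordinator is a single point of failure: if it crashes, both the holder and the queue are lost, which is the constraint noted in the comparison table.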
Distributed mutual exclusion “The baseline methods” (5) Ring-based algorithm: requires that each process Pi has a communication channel to the next process in the ring P(i+1)mod N. The idea is that exclusion is conferred by obtaining a token in the form of a message passed from process to process in a single direction. e.g. eight processes P0 to P7 are scheduled on different computers across a network and synchronize through a ring using a token with message passing.
(Figure: eight processes P0 to P7 arranged in a ring; the token message circulates from Pi to P(i+1) mod N in a single direction.)
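A sketch of the resulting schedule, assuming the token starts at P0 and a process enters the critical section only while holding the token; the simulation loop is an assumption, not the slide's code.

```python
# Ring-based mutual exclusion: the token visits processes in order
# 0, 1, ..., N-1, 0, ...; only the holder may enter the critical section.
N = 8

def ring_schedule(requests, rounds):
    """Simulate token passing; return the order processes enter the section."""
    order, token = [], 0            # the token starts at P0 (assumption)
    for _ in range(rounds):
        if token in requests:       # the holder wants the section: enter it
            order.append(token)
            requests.discard(token)
        token = (token + 1) % N     # forward the token to the next process
    return order

print(ring_schedule({2, 5, 7}, 2 * N))  # [2, 5, 7]
```

The sketch makes the SD cost visible: even with a single requester, the token may need up to N-1 hops to reach it, which is why SD is large for this method.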
Distributed mutual exclusion “The Ricart-Agrawala algorithm” (1)
Distributed mutual exclusion “The Ricart-Agrawala algorithm” (2). Ricart-Agrawala algorithm: implements mutual exclusion between N peer processes. A process that requires entry to a critical section multicasts a request message, and can enter it only when all other processes have replied to this message. This algorithm requires a total ordering of all events in the system; logical clocks can be used to provide the timestamps.
Distributed mutual exclusion “The Ricart-Agrawala algorithm” (3). Let us try to understand how the algorithm works, considering (1) the access case.
(Figure: the access case. Step 1: (1) P0 sends its timestamp to everybody. Step 2: (2) all processes not in the critical section send ok. Step 3: (3) P0 accesses the critical section. Legend: (1), (2), (3)… are event orders; Pi are distributed processes; arrows are request messages with timestamps and reply messages; the critical section is highlighted when a process accesses it.)
Distributed mutual exclusion “The Ricart-Agrawala algorithm” (4). Let us try to understand how the algorithm works, considering (2) the blocking case.
(Figure: the blocking case. Step 1: (1) P0 sends its timestamp to everybody while P2 is in the critical section. Step 2: (2) P1, not in the critical section, sends ok; P2 does not, so the access to the critical section is denied. Step 3: (3) P2 frees the critical section and sends ok to P0; the access to the critical section is granted.)
Distributed mutual exclusion “The Ricart-Agrawala algorithm” (5). Let us try to understand how the algorithm works, considering (3) the concurrent case.
(Figure: the concurrent case. Step 1: (1) P0 and P2, both interested in accessing the critical section, send their timestamps (8 and 12) to everybody. Step 2: (2-3) P1, not in the critical section, sends ok to both; P0's timestamp is the lowest, thus it wins: P2 sends ok to P0 and P0 accesses the critical section. Step 3: (4-5) P0 frees the critical section and sends ok to P2, which is granted the section.)
Distributed mutual exclusion “The Ricart-Agrawala algorithm” (6). The Ricart-Agrawala algorithm can be defined as follows:
•The algorithm uses request and reply messages: a request message is sent to all other processes to request their permission to enter the critical section; a reply message is sent to a process to grant permission.
•Processes use a logical clock to assign timestamps to critical section requests. These timestamps are used to decide the priority of requests in the case of conflict. Considering ⟨Ti, pi⟩ and ⟨Tj, pj⟩, where T are timestamps and p process identifiers, with i the receiver and j the sender:
•If Ti < Tj, then pi defers the reply to pj while pi is waiting for the critical section.
•Otherwise pi sends a reply message to pj; thus the highest-priority request succeeds in collecting all the reply messages.
Distributed mutual exclusion “The Ricart-Agrawala algorithm” (7). The Ricart-Agrawala algorithm can be defined as follows:
•Each process maintains a request-deferred array RDi whose size is the number of processes in the system; initially RDi[j] = 0 for all i, j.
•Whenever pi defers the request sent by pj, it sets RDi[j] = 1.
•After pi has sent a reply message to pj, it sets RDi[j] = 0.
•When a process takes up a request for a critical section, it updates its logical clock. Also, when a process receives a timestamp, it updates its clock using this timestamp.
Distributed mutual exclusion “The Ricart-Agrawala algorithm” (8)
Initialization with pi:
(0) state = RELEASED

Requesting the critical section with pi:
(1) state = WANTED, update Ti
(2) broadcast the request

Receiving a request from pj with pi:
if (state == HELD) or ((state == WANTED) and (Ti < Tj)):
(3) RDi[j] = 1
otherwise:
(4) update Ti, send a reply

Receiving a reply message with pi:
(5) Msgi = Msgi + 1
if Msgi == N-1: (6) state = HELD

Releasing the critical section with pi:
(7) state = RELEASED
(8) for all j, if RDi[j] == 1: send a reply message to pj and set RDi[j] = 0
Msgi = 0
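The rules above can be condensed into a per-process sketch. The class structure and the (timestamp, pid) tie-break for equal timestamps are assumptions, and transport is abstracted behind broadcast/reply callbacks.

```python
# One Ricart-Agrawala process; the numbered comments map to the rules.
RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"

class RAProcess:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.state = RELEASED                 # (0)
        self.clock = 0                        # logical clock
        self.T = None                         # timestamp of own request
        self.RD = [0] * n                     # request-deferred array
        self.msgs = 0                         # replies collected

    def request(self, broadcast):
        self.state, self.clock = WANTED, self.clock + 1
        self.T = self.clock
        broadcast(self.T, self.pid)           # (1)(2)

    def on_request(self, Tj, j, reply):
        self.clock = max(self.clock, Tj) + 1  # update clock on receive
        if self.state == HELD or (self.state == WANTED and
                                  (self.T, self.pid) < (Tj, j)):
            self.RD[j] = 1                    # (3) defer the reply
        else:
            reply(j)                          # (4)

    def on_reply(self):
        self.msgs += 1                        # (5)
        if self.msgs == self.n - 1:
            self.state = HELD                 # (6)

    def release(self, reply):
        self.state = RELEASED                 # (7)
        for j, deferred in enumerate(self.RD):
            if deferred:                      # (8) answer deferred requests
                reply(j)
                self.RD[j] = 0
        self.msgs = 0
```

Each entry costs one request to N-1 processes and N-1 replies, giving the 2(N-1) message complexity of the comparison table.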
Ricart-Agrawala algorithm (9)
e.g. considering three processes P0, P1 and P2 and a resource R: P1 gets the section first, then P2 is blocked while accessing R, P0 and P2 enter a concurrent access case, and the section is released to P2 then P0.

(Figure: timeline of the events t0 to t16 across P0, P1 and P2.)

Events and descriptions:
(1) access events, t0 to t4: P1 asks for the critical section, granted by P0 and P2.
(2) blocking access events, t5 to t8: P2 asks for the critical section while P1 holds it.
(3) concurrent access events, t9, t10, t13: P0 asks for the critical section while P1 holds it and P2 is still waiting for it; as CL2 < CL0 at t13, P2 will not reply.
(4) release events, t11, t12, t14: P1 frees the section; as P2 was granted by P0 at t8, it gets it. t15 and t16: when releasing the section at t15, P2 grants P0.
Ricart-Agrawala algorithm (10)
(Figure: execution trace from t0 to t11 on the same example, giving at each step the rules applied and, for each process, its state (R = RELEASED, W = WANTED, H = HELD), its logical clock CLi, its request-deferred array RDi and its reply counter Msgi.)
Ricart-Agrawala algorithm (11)
(Figure: continuation of the execution trace, from t12 to t16, with the same state variables per process.)
Ricart-Agrawala algorithm (12)
(Figure: the same timeline annotated with the performance metrics: the section execution times E1, E2, the response times RT1, RT2 and the synchronization delays SD.)
Group communication “Introduction” (1). Group communication: specific algorithms need to be designed to enable efficient group communication, wherein processes can join and leave groups dynamically, or even fail.

A group is a collection of processes that share a common context and collaborate on a common task within an application domain. (Figure: a peer group of workers, and a hierarchical group with a coordinator and its workers.)

Multicast: the communication within a group can be ensured with a multicast operation, driven at the application or network layer. Application layer multicast is supported with one-to-one send operations; the implementation can use threads to perform the send operations concurrently. Network layer multicast (or IP multicast) sends a message to the group in a single transmission: copies are automatically created in the network elements that contain members of the group, such as routers and switches. It can be implemented using IP multicast, part of the IP protocol. (Figure: application layer multicast versus network layer multicast within a group.)
Group communication “Introduction” (2). Reliable multicast is one that satisfies the integrity, validity and agreement properties.
•Integrity: the message received is the same as the one sent, and no message is delivered twice.
•Validity: if a process multicasts a message m, then it will deliver m. The validity property guarantees liveness for the sender.
•Agreement: if a process delivers a message m, then all other processes in the group will deliver m. The message must arrive at all the members of the group.
(Figure: 1. send to n without failure; 2. check if n receipts.)
Group communication “Introduction” (3) Ordered multicast: different recipients may receive the messages in different orders, possibly violating the semantics of the distributed program. Formal specifications of ordered delivery need to be formulated and implemented.
(Figure: four processes P1 to P4 on a time diagram receive two messages m1 and m2 in different orders: m1 before m2 for some recipients, m2 before m1 for others.)
Group communication “Introduction” (4). Multicast primitives are implementations of multicast communication that satisfy the reliability and ordering properties with good performance.
(Table: comparison of the multicast primitives B-multicast, R-multicast over B-multicast and R-multicast over IP multicast, in terms of multicast level (application or network layer), requirements (integrity, validity, agreement, ordering) and constraints/performance (ACK-implosion, number of messages, e.g. 2|g| messages).)

If R > Rg^k ∀k, then there are one or more messages that q has not received yet; q maintains ⟨p, S⟩ in the hold-back queue HQ and sends a NACK in the form ⟨q, Rg^k⟩.

Message processing: incoming messages are kept in the hold-back queue HQ and moved to the delivery queue DQ for delivery when the delivery guarantees are met.
Group communication “Reliable multicast over IP multicast” (6)

Initialization for all p:
(0) Sg^p = 0; Rg^k = 0 ∀k; HQ = BQ = DQ = ∅

For p to R-multicast(g, m):
(1) Sg^p = Sg^p + 1
(2) piggyback the messages received since the last sending and send m

For p to reply to a negative ACK ⟨q, Rg^p⟩:
(3) recover the messages ⟨p, Sg^p⟩ with Sg^p > Rg^p from the backup queue BQ and send them

For q to R-deliver(m) from p, with ⟨S, R⟩ = get(m):
(4) if S = Rg^p + 1: R-deliver(m), Rg^p = S (regular delivery case)
(5) if S ≤ Rg^p: discard m (m has been delivered before)
(6) if S > Rg^p + 1: maintain ⟨p, S⟩ in HQ and send a negative ACK ⟨q, Rg^p⟩ to p (there are one or more messages that q has not received yet from p)
(7) if R > Rg^k ∀k: maintain ⟨p, S⟩ in HQ and send a negative ACK ⟨q, Rg^k⟩ (there are one or more messages that q has not received yet from another process)
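Rules (4)-(6) for a single sender can be sketched as a receiver keeping per-sender sequence numbers and a hold-back queue; the class and variable names are assumptions, and piggybacking (rule 7) is left out.

```python
# Receiver side: S is the incoming sequence number, R[p] the last
# sequence delivered from sender p, HQ the hold-back queue for gaps.
class Receiver:
    def __init__(self):
        self.R = {}          # R[p]: last sequence number delivered from p
        self.HQ = {}         # hold-back queue: HQ[p] = {S: message}
        self.delivered = []  # messages delivered, in order

    def r_deliver(self, p, S, m):
        """Returns the sequence number to NACK with, or None."""
        last = self.R.get(p, 0)
        if S == last + 1:                       # (4) regular delivery case
            self.delivered.append(m)
            self.R[p] = S
            hq = self.HQ.setdefault(p, {})
            while self.R[p] + 1 in hq:          # flush now-deliverable holds
                self.R[p] += 1
                self.delivered.append(hq.pop(self.R[p]))
            return None
        if S <= last:                           # (5) duplicate: discard
            return None
        self.HQ.setdefault(p, {})[S] = m        # (6) hold and NACK
        return last

r = Receiver()
r.r_deliver("p", 1, "m1")         # delivered immediately
nack = r.r_deliver("p", 3, "m3")  # gap detected: held, NACK carries R=1
r.r_deliver("p", 2, "m2")         # fills the gap; m3 is flushed too
print(r.delivered)                # ['m1', 'm2', 'm3']
```

The NACK value tells the sender which backup-queue messages to retransmit (rule 3); duplicates arriving after the retransmission are silently discarded by rule (5).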
Group communication “Reliable multicast over IP multicast” (7)
e.g. considering three processes P0, P1 and P2 cooperating in a distributed way with reliable multicast over IP multicast.

(Figure: message exchanges between P0, P1 and P2 from t0 to t7; each message m carries its piggybacked values ⟨S, R⟩; dashed arrows are negative ACKs.)

Time, rules and descriptions:
t0 (1)(2), t1 (4): regular sending/receiving case; as P2 has not received a message yet, there is no piggybacked message.
t2 (1)(2), t3 (4): P1 receives a P2 message and piggybacks it.
t4 (7): P0 has not received the P2 message yet; it pushes the P1 message in its hold-back queue and sends a negative ACK to P2.
t5 (4)(4): P0 processes the P2 message; this triggers the delivery condition for the P1 message recovered from the hold-back queue, and P0 processes that message.
t6 (3): P2 resends a message, recovered from its backup queue, to reply to the NACK.
t7 (5): P0 had already received the P2 message while P2 replied to the negative ACK; the message is discarded.
Group communication “Reliable multicast over IP multicast” (8)
(Figure: the same exchanges from t0 to t7, with a trace giving at each step, for each process, its send counter Sg^i, its receive counters (Rg^0, Rg^1, Rg^2) and its hold-back queue HQ.)
Group communication “Reliable multicast over IP multicast” (9)
e.g. considering three processes P0, P1 and P2 cooperating in a distributed way with reliable multicast over IP multicast.

(Figure: message exchanges between P0, P1 and P2 from t8 to t16, with piggybacked values and negative ACKs.)

Time, rules and descriptions:
t8 (1)(2), t9 (4): regular sending/receiving case; as P1 didn't receive any message since t2, it sends no piggybacked message; the same holds at t10.
t10 (1)(2), t11 (6): P0 receives a message from P1 but didn't receive the previous one; it pushes this message in its hold-back queue and sends a negative ACK to P1.
t12 (4)(4): P0 processes the first P1 message; this triggers the delivery condition for the previous P1 message recovered from the hold-back queue, and P0 processes that message.
t13 (3): P1 resends messages to P0, recovered from its backup queue, to reply to the NACK.
t14 (4), t15 (5)(5): P0 has already delivered the P1 messages while P1 replied to the negative ACK; the messages are discarded.
Group communication “Reliable multicast over IP multicast” (10)
(Figure: the same exchanges from t8 to t16, with the trace of each process's send counter, receive counters and hold-back queue.)