Distributed computing: “Coordination and agreement”
Mathieu Delalandre, University of Tours, Tours, France
[email protected]


Coordination and agreement
1. Introduction
2. Distributed mutual exclusion
   2.1. Introduction
   2.2. The baseline methods
   2.3. The Ricart-Agrawala algorithm
3. Group communication
   3.1. Introduction
   3.2. Basic multicast
   3.3. Reliable multicast over B-multicast
   3.4. Reliable multicast over IP-multicast


Introduction

Coordination and agreement concerns distributed algorithms whose goal is to coordinate distributed processes on actions to perform, but also to reach agreement on values. Typical problems are:

Distributed mutual exclusion: in a group, processes must agree on the access requirements to non-shareable resources.

Group communication: specific algorithms need to be designed to enable efficient group communication wherein processes can join, leave or even fail.

[Figure: a process P0 sends a message to the n members of a group R; after a failure, it checks whether n receipts were obtained]

Further problems in this area include leader election, consensus, deadlock detection and resolution, global predicate detection, termination detection, failure detection, garbage collection, etc.


Distributed mutual exclusion “Introduction” (1)

Mutual exclusion: two events are mutually exclusive if they cannot occur at the same time. Mutual exclusion algorithms are used to avoid the simultaneous use of a resource by the piece of code called the critical section.

Requirements for mutual exclusion include the following properties:

Safety: at most one process may execute in the critical section.
Liveness: requests to enter and exit the critical section eventually succeed.
Ordering: if one request to enter the critical section happened-before another, then entry to the critical section is granted in that order.


Distributed mutual exclusion “Introduction” (2)

Performance metrics are generally measured as follows:

Message complexity f(N) is the number of messages that are required per critical section execution, with N the number of members of the system.

Synchronization delay (SD) is the time required, after the last process exits the critical section, before the next process enters it.

Section execution time (E) is the execution time that a process requires to be in the critical section.

System throughput (ST) is the rate at which the system executes requests for the critical section, in requests per second:

ST = 1 / (SD + E)

where SD = (1/n) Σi∈[1,n] SDi is the average synchronization delay in seconds, E = (1/n) Σi∈[1,n] Ei is the average section execution time in seconds, and n is the synchronization magnitude.
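As a quick numerical check of the throughput formula, a short Python sketch (the sample delays and execution times below are illustrative, not taken from the lecture):

```python
# Compute the average synchronization delay, average section execution
# time and system throughput ST = 1 / (SD + E). Sample values are
# illustrative measurements in seconds.

sd_samples = [0.02, 0.03, 0.025]   # synchronization delays SDi
e_samples = [0.10, 0.12, 0.11]     # section execution times Ei

avg_sd = sum(sd_samples) / len(sd_samples)   # average SD
avg_e = sum(e_samples) / len(e_samples)      # average E

st = 1.0 / (avg_sd + avg_e)                  # throughput in requests/second
print(f"SD = {avg_sd:.3f} s, E = {avg_e:.3f} s, ST = {st:.2f} req/s")
```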


Distributed mutual exclusion “Introduction” (3)

Response time (RT) is the time interval that a request waits for its critical section execution to be over, after its request message has been sent out; we have RT > E.

[Figure: in a non-concurrent access case, RT runs from the request message being sent out, through entering the critical section, to leaving it, covering the request processing and the section execution time E; in a concurrent access case RT also includes the synchronization delay SD]


Distributed mutual exclusion “Introduction” (4)

The exclusion methods: the baseline ways to achieve mutual exclusion are message passing, using a central server, or arranging processes in a logical ring. These approaches suffer from performance and failure problems and do not support ordering. Different advanced algorithms have been proposed to solve these problems, including the Ricart-Agrawala algorithm.

Algorithm                    Process number   Safety   Liveness   Ordering   Constraints and performances
Message passing              = 2              yes      yes        no         it works with 2 processes
Central server algorithm     ≥ 2              yes      yes        no         the coordinator can crash
Ring-based algorithm         ≥ 2              yes      yes        no         SD is large
Ricart-Agrawala algorithm    ≥ 2              yes      yes        yes        the message complexity is f(N) = 2(N-1)


Distributed mutual exclusion “The baseline methods” (1)

The comparison table of the introduction is recalled: the baseline methods (message passing, central server, ring-based) satisfy safety and liveness but not ordering, each with its own constraints.

Distributed mutual exclusion “The baseline methods” (2)

Message passing provides synchronization and communication functions that can be used in distributed systems as well as in shared-memory systems. A number of design issues must be considered: synchronization primitives (blocking, non-blocking), addressing, message format and queuing discipline (FIFO, priority, etc.). The next algorithm achieves mutual exclusion with message passing; we assume here the use of a blocking receive primitive and a non-blocking send primitive. The two processes exchange a token that ensures mutually exclusive access.

Process A (computer 1), loop:
(1) receive message from B
(2) pop item from message
(3) send message to B

Process B (computer 2), loop:
(1) receive message from A
(2) push a new item in message
(3) send message to A
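The token-exchange loops above can be sketched in Python, with two local threads standing in for the two computers and queue.Queue modelling the channels (get() is the blocking receive, put() the non-blocking send):

```python
# Two processes exchange a message acting as a token: each one enters its
# critical section only between its blocking receive and its send, so the
# accesses strictly alternate.
import queue
import threading

a_to_b = queue.Queue()
b_to_a = queue.Queue()
log = []

def process_a(rounds):
    for _ in range(rounds):
        msg = b_to_a.get()      # (1) blocking receive from B
        msg.pop()               # (2) pop item: A holds the token
        log.append("A in critical section")
        a_to_b.put(msg)         # (3) non-blocking send to B

def process_b(rounds):
    for _ in range(rounds):
        msg = a_to_b.get()      # (1) blocking receive from A
        msg.append("item")      # (2) push a new item: B holds the token
        log.append("B in critical section")
        b_to_a.put(msg)         # (3) non-blocking send to A

b_to_a.put(["item"])            # bootstrap: give the token to A first
ta = threading.Thread(target=process_a, args=(3,))
tb = threading.Thread(target=process_b, args=(3,))
ta.start(); tb.start()
ta.join(); tb.join()
print(log)                      # A and B strictly alternate
```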


Distributed mutual exclusion “The baseline methods” (3)

Central server algorithm: the simplest way to achieve mutual exclusion with multiple processes is to employ a server that grants permission to enter the critical section. This simulates how mutual exclusion is done in a one-processor system. The two access cases are non-concurrent and concurrent. The non-concurrent access:

Step 1: (1) P1 sends a request to the coordinator C. (2-3) C registers P1 in the critical section and sends ok; P1 holds the section.
Step 2: (4) P1 sends a release to C. (5) C frees the section.

[Figure: the two steps; (1), (2), (3)… are event orders, Pi are distributed processes, C is the process coordinator managing the queue of pending requests, arrows are messages between two processes]

Distributed mutual exclusion “The baseline methods” (4)

Central server algorithm, the concurrent access:

Step 1: (1) P2 sends a request to C while P1 holds the section. (2) C does not answer and pushes the request in the queue if P2 is not included yet; P2 waits.
Step 2: (3) P1 sends a release to C. (4-5) C frees the section, pops P2 up from the queue and gives it the section, then sends ok to P2.

[Figure: the two steps, with the queue managed by the coordinator C]
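The two access cases can be sketched as a coordinator object in Python. This is a minimal in-process sketch with illustrative names; a real deployment would carry the request/ok/release messages over a network:

```python
# Central server algorithm: the coordinator grants the section when it is
# free, queues concurrent requests (FIFO), and hands the section over on
# each release.
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None      # process currently in the critical section
        self.waiting = deque()  # queued requests, FIFO

    def request(self, pid):
        """Return True if ok is sent at once, False if the request queues."""
        if self.holder is None:
            self.holder = pid          # (2-3) register and send ok
            return True
        self.waiting.append(pid)       # (2) push the request, no answer
        return False

    def release(self, pid):
        """Free the section; return the pid granted next, if any."""
        assert self.holder == pid
        self.holder = None             # (4) free the section
        if self.waiting:
            self.holder = self.waiting.popleft()  # (5) pop and send ok
            return self.holder
        return None

c = Coordinator()
assert c.request("P1") is True    # non-concurrent case: P1 enters at once
assert c.request("P2") is False   # concurrent case: P2 is queued
assert c.release("P1") == "P2"    # the release grants the section to P2
assert c.release("P2") is None    # queue empty, the section stays free
```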

Distributed mutual exclusion “The baseline methods” (5)

Ring-based algorithm: requires that each process Pi has a communication channel to the next process in the ring, P(i+1) mod N. The idea is that exclusion is conferred by obtaining a token, in the form of a message passed from process to process in a single direction. E.g. eight processes P0 to P7 are scheduled on different computers across a network and synchronize through a ring using a token with message passing.

[Figure: processes P0 to P7 arranged in a ring; the token message circulates in a single direction and confers the access]
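A minimal simulation of the ring, where a single loop stands in for the asynchronous forwarding of the token across the network:

```python
# Ring-based algorithm: the token visits the processes in a fixed single
# direction; a process may enter its critical section only while holding it.

def ring_rounds(n_processes, laps):
    """Circulate the token `laps` times around a ring of processes and
    return the order in which processes held it."""
    order = []
    token_at = 0
    for _ in range(n_processes * laps):
        order.append(token_at)                    # Pi holds the token
        token_at = (token_at + 1) % n_processes   # pass to P(i+1) mod N
    return order

# e.g. eight processes P0..P7, one lap: each enters once, in ring order
print(ring_rounds(8, 1))  # [0, 1, 2, 3, 4, 5, 6, 7]
```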


Distributed mutual exclusion “The Ricart-Agrawala algorithm” (1)

The comparison table of the introduction is recalled: unlike the baseline methods, the Ricart-Agrawala algorithm supports the ordering requirement, at a message complexity of f(N) = 2(N-1).

Distributed mutual exclusion “The Ricart-Agrawala algorithm” (2)

Ricart-Agrawala algorithm: implements mutual exclusion between N peer processes. A process that requires entry to a critical section multicasts a request message, and can enter it only when all the other processes have replied to this message. This algorithm requires a total ordering of all events in the system; a logical clock can be used to provide the timestamps.


Distributed mutual exclusion “The Ricart-Agrawala algorithm” (3)

Let us try to understand how the algorithm works, considering (1) the access case:

Step 1: (1) P0 sends its timestamp to everybody.
Step 2: (2) all processes not in the critical section send ok.
Step 3: (3) P0 accesses the critical section.

[Figure: the three steps; (1), (2), (3)… are event orders, Pi are distributed processes, arrows are the request messages with timestamps and the reply messages, the box is the critical section]

Distributed mutual exclusion “The Ricart-Agrawala algorithm” (4)

Let us try to understand how the algorithm works, considering (2) the blocking case:

Step 1: (1) P0 sends its timestamp to everybody, while P2 is in the critical section.
Step 2: (2) P1, not in the critical section, sends ok but P2 does not: the access to the critical section is denied.
Step 3: (3) P2 frees the critical section and sends ok to P0: the access to the critical section is granted.

Distributed mutual exclusion “The Ricart-Agrawala algorithm” (5)

Let us try to understand how the algorithm works, considering (3) the concurrent case:

Step 1: (1) P0 and P2, both interested in accessing the critical section, send their timestamps (8 for P0, 12 for P2) to everybody.
Step 2: (2-3) P1, not in the critical section, sends ok to both; P0’s timestamp is the lowest, thus it wins: P2 sends ok to P0 and P0 accesses the critical section.
Step 3: (4-5) P0 frees the critical section and sends ok to P2, which is then granted the section.

Distributed mutual exclusion “The Ricart-Agrawala algorithm” (6)

The Ricart-Agrawala algorithm can be defined as follows:
•The algorithm uses request and reply messages: a request message is sent to all other processes to request their permission to enter the critical section, a reply message is sent to a process to give a permission.
•Processes use a logical clock to assign timestamps to critical section requests. These timestamps are used to decide the priority of requests in the case of conflict. Considering ⟨Ti, pi⟩ and ⟨Tj, pj⟩, where T are timestamps and p process identifiers, i and j being the receiver and the sender respectively:
•If Ti < Tj then pi defers the reply to pj while pi is waiting for the critical section.
•Otherwise pi sends a reply message to pj; thus the highest priority request succeeds in collecting all the reply messages.
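The conflict rule relies on the total order over ⟨T, p⟩ pairs (timestamps compared first, process identifiers breaking ties). A minimal Python sketch of the receiving decision; the function and argument names are illustrative:

```python
# Decide whether p_i (the receiver) defers the reply to p_j (the sender):
# it defers while holding the section, or while waiting with the
# higher-priority (lexicographically smaller) request.

def defers(receiver_ts, receiver_pid, sender_ts, sender_pid, state):
    if state == "HELD":
        return True
    if state == "WANTED":
        # tuple comparison gives the total order over <T, p> pairs
        return (receiver_ts, receiver_pid) < (sender_ts, sender_pid)
    return False  # RELEASED: reply at once

assert defers(8, 0, 12, 2, "WANTED") is True    # Ti < Tj: defer
assert defers(12, 2, 8, 0, "WANTED") is False   # the sender has priority
assert defers(5, 1, 8, 0, "HELD") is True       # in the section: defer
```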


Distributed mutual exclusion “The Ricart-Agrawala algorithm” (7)

The definition continues as follows:
•Each process maintains a request-deferred array RDi of size “number of processes in the system”; initially ∀i ∀j RDi[j] = 0.
•Whenever pi defers the request sent by pj, it sets RDi[j] = 1.
•After pi has sent a reply message to pj, it sets RDi[j] = 0.
•When a process takes up a request for a critical section, it updates its logical clock. Also, when a process receives a timestamp, it updates its clock using this timestamp.


Distributed mutual exclusion “The Ricart-Agrawala algorithm” (8)

Initialization with pi:
(0) state = RELEASED

Requesting the critical section with pi:
(1) state = WANTED, update Ti
(2) broadcast the request

Receiving a request with pi (from pj):
if (state == HELD) or ((state == WANTED) and (Ti < Tj))
(3) RDi[j] = 1
otherwise
(4) update Ti, send a reply

Receiving a reply message with pi:
(5) Msgi = Msgi + 1
if Msgi == N-1
(6) state = HELD

Releasing the critical section with pi:
(7) state = RELEASED
(8) for ∀j, if RDi[j] = 1, send a reply message to pj and set RDi[j] = 0; Msgi = 0
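The numbered rules can be sketched as a Python class; the direct method calls and the global procs list stand in for the request/reply messages and the network, an illustrative simplification rather than part of the algorithm:

```python
# Ricart-Agrawala sketch: each process keeps a state, a logical clock, a
# request-deferred array RD and a reply counter Msg, following rules (0)-(8).
N = 3
procs = []

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = "RELEASED"   # (0) initialization
        self.clock = 0            # logical clock, source of Ti
        self.ts = None            # timestamp of the pending request
        self.rd = [0] * N         # request-deferred array RDi
        self.msg = 0              # reply counter Msgi

    def request_cs(self):
        self.state = "WANTED"     # (1) update Ti
        self.clock += 1
        self.ts = self.clock
        for p in procs:           # (2) broadcast the request
            if p is not self:
                p.on_request(self.ts, self.pid)

    def on_request(self, tj, j):
        self.clock = max(self.clock, tj) + 1   # update the clock on receipt
        if self.state == "HELD" or (
            self.state == "WANTED" and (self.ts, self.pid) < (tj, j)
        ):
            self.rd[j] = 1        # (3) defer the reply
        else:
            procs[j].on_reply()   # (4) send a reply

    def on_reply(self):
        self.msg += 1             # (5) count the replies
        if self.msg == N - 1:
            self.state = "HELD"   # (6) all permissions collected

    def release_cs(self):
        self.state = "RELEASED"   # (7)
        self.msg = 0
        for j in range(N):        # (8) answer the deferred requests
            if self.rd[j]:
                self.rd[j] = 0
                procs[j].on_reply()

procs.extend(Process(i) for i in range(N))
procs[1].request_cs()
assert procs[1].state == "HELD"    # P1 collects both replies at once
procs[0].request_cs()              # P0 asks while P1 holds the section
assert procs[0].state == "WANTED"  # P1 defers its reply to P0
procs[1].release_cs()              # the deferred reply is sent on release
assert procs[0].state == "HELD"
```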


Ricart-Agrawala algorithm (9)

E.g. considering three processes P0, P1 and P2 and a resource R: P1 gets the section at first, then P2 is blocked while accessing R, P0 and P2 enter a concurrent access case, and the section is released to P2 then P0.

[Figure: the timeline of events t0 to t16 across P0, P1 and P2]

Events                         Times           Description
(1) access events              t0 to t4        P1 asks for the critical section, granted by P0 and P2.
(2) blocking access events     t5 to t8        P2 asks for the critical section while P1 holds it.
(3) concurrent access events   t9, t10, t13    P0 asks for the critical section while P1 holds it and P2 is still waiting for it; as CL2 < CL0 at t13, P2 will not reply.
(2) blocking access events     t11, t12, t14   P1 frees the section; as P2 was granted by P0 at t8, it gets it.
                               t15 and t16     When releasing the section at t15, P2 grants P0.

Ricart-Agrawala algorithm (10)

The example above is traced step by step over events t0 to t11, applying the rules (1)(2) at t0, (4) at t1, (4) at t2, (5) at t3, (5)(6) at t4, (1)(2) at t5, (4) at t6, (3) at t7, (5) at t8, (1)(2) at t9, (3) at t10 and (7)(8) at t11.

[Table: for each event, the state of each process (R = RELEASED, W = WANTED, H = HELD), its logical clock CLi, its request-deferred array RDi and its reply counter Msgi]

Ricart-Agrawala algorithm (11)

The trace continues over events t12 to t16, applying the rules (5) at t12, (3) at t13, (5)(6) at t14, (7)(8) at t15 and (5)(6) at t16.

[Table: for each event, the state of each process, its logical clock CLi, its request-deferred array RDi and its reply counter Msgi]

t15

Ricart-Agrawala algorithm (12) Initialization with pi (0) state = RELEASED Requesting the critical section with pi (1) state = WANTED, update Ti (2) broadcast Receiving rule with pi if (state == HELD) or (state == WANTED) and (Ti < Tj) (3) RDi [j] = 1 otherwise (4) update Ti, send a reply Receiving a reply message with pi (5) Msgi = Msgi +1 if Msgi == N-1 (6) state = HELD Releasing the critical section with pi (7) state = RELEASED (8) for ∀j, if RDi [j] = 1 send a reply message to pj , set RDi [j] = 0 Msgi = 0

e.g. considering three process P0, P1 and P2 and a resource R: P1 gets the section at first then P2 is blocked while accessing R, P0 and P2 enter in a concurrent access case, the section is released to P2 then P0.

t2

P0

P1

t6

t9

t4

t0

t1

t16

t7

t3

P2

t12

t10 t11

t5

t8

SD

t13 t14

SD

E1 RT1 RT2

t15

E2 SD


Group communication “Introduction” (1)

Group communication: specific algorithms need to be designed to enable efficient group communication wherein processes can join and leave groups dynamically, or even fail. A group is a collection of processes that share a common context and collaborate on a common task within an application domain; a group may be organized as a peer group, or as a hierarchical group with workers under a coordinator.

Multicast: the communication within a group can be ensured with a multicast operation, driven at the application or network layer. Application layer multicast is supported with one-to-one send operations; the implementation can use threads to perform the send operations concurrently. Network layer multicast sends the message to the group in a single transmission: copies are automatically created in the network elements that contain members of the group, such as routers, switches, etc. It can be implemented using IP multicast, part of the IP protocol.
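Application layer multicast, as described above, can be sketched with one-to-one sends issued concurrently by threads; queue.Queue stands in for each member's incoming channel (the group and member names are illustrative):

```python
# Application layer multicast: the group send is built from one-to-one
# send operations, performed concurrently by one thread per member.
import queue
import threading

group = {name: queue.Queue() for name in ("P0", "P1", "P2")}

def multicast(sender, members, message):
    """One-to-one sends to every member, issued concurrently."""
    threads = [
        threading.Thread(target=members[m].put, args=((sender, message),))
        for m in members
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

multicast("P0", group, "hello")
received = {m: group[m].get() for m in group}
print(received)   # every member, including the sender, got ("P0", "hello")
```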

Group communication “Introduction” (2)

Reliable multicast is one that satisfies the integrity, validity and agreement properties.

Integrity: the message received is the same as the one sent, and no message is delivered twice.
Validity: if a process multicasts a message m, then it will deliver m. The validity property guarantees liveness for the sender.
Agreement: if a process delivers a message m, then all other processes in the group will deliver m. The message must arrive to all the members of the group.

Group communication “Introduction” (3)

Ordered multicast: different recipients may receive the messages in different orders, possibly violating the semantics of the distributed program. Formal specifications of ordered delivery need to be formulated and implemented.

[Figure: processes P1 to P4 multicast two messages m1 and m2; depending on the interleaving, some recipients deliver m1 before m2 and others m2 before m1]

Group communication “Introduction” (4)

Multicast primitives are implementations of the multicast communication that satisfy the reliability and ordering properties with performances:

Primitive                        Multicast level     Integrity   Validity   Agreement   Ordering
B-multicast                      application layer   yes         yes        no          no
R-multicast over B-multicast     application layer   yes         yes        yes         no
R-multicast over IP multicast    network layer       yes         yes        yes         no

[Table column “Constraints and performances” only partly recoverable: ACK-implosion, 2|g| messages]

Group communication “Reliable multicast over IP multicast” (5)

When a receiver q gets a message whose piggybacked values R exceed its own records R_g^k, there is one or more message that q has not received yet: q maintains ⟨p, S⟩ in the hold-back queue HQ and sends a NACK in the form ⟨q, R_g^k⟩. Message processing at q: incoming messages are kept in the hold-back queue HQ and moved to the delivery queue DQ, to be delivered when the delivery guarantees are met.


Group communication “Reliable multicast over IP multicast” (6)

Initialization for all p:
(0) S_g^p = 0, R_g^k = 0 ∀k, HQ = BQ = DQ = ∅

For p to R-multicast(g, m):
(1) S_g^p = S_g^p + 1
(2) piggyback the messages received since the last sending and send m

For p to reply to a negative ACK ⟨q, R_g^p⟩:
(3) recover the messages ⟨p, S_g^p⟩ from BQ such that S_g^p > R_g^p and send them

For q to R-deliver(m) from p, with ⟨S, R⟩ = get(m):
(4) if S = R_g^p + 1: R-deliver(m), R_g^p = S (the regular delivering case)
(5) if S ≤ R_g^p: discard m (m has been delivered before)
(6) if S > R_g^p + 1: maintain ⟨p, S⟩ in HQ and send a negative ACK ⟨q, R_g^p⟩ to p (there is one or more message that q has not received yet from p)
(7) if R > R_g^k for some k: maintain ⟨p, S⟩ in HQ and send a negative ACK ⟨q, R_g^k⟩ (there is one or more message that q has not received yet from another process)
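Rules (4) to (6) can be sketched for a single sender p as follows; rule (7), which inspects the piggybacked values R, is omitted here, and the class layout (names, NACK callback) is an illustrative assumption:

```python
# Receiver side for one sender p: messages carry (S, payload); the receiver
# keeps R_g^p, a hold-back queue HQ, a delivery queue, and sends a NACK
# carrying its last delivered sequence number when it detects a gap.

class Receiver:
    def __init__(self, send_nack):
        self.r = 0                 # R_g^p: last sequence delivered from p
        self.hq = {}               # hold-back queue, keyed by sequence
        self.delivered = []        # messages delivered to the application
        self.send_nack = send_nack

    def r_deliver(self, s, payload):
        if s == self.r + 1:                  # (4) the regular case
            self.delivered.append(payload)
            self.r = s
            while self.r + 1 in self.hq:     # held messages may now follow
                self.r += 1
                self.delivered.append(self.hq.pop(self.r))
        elif s <= self.r:                    # (5) delivered before: discard
            pass
        else:                                # (6) a gap: hold back and NACK
            self.hq[s] = payload
            self.send_nack(self.r)

nacks = []
q = Receiver(send_nack=nacks.append)
q.r_deliver(1, "m1")        # delivered at once
q.r_deliver(3, "m3")        # m2 is missing: held back, NACK carries R = 1
q.r_deliver(2, "m2")        # fills the gap, m3 follows from the queue
q.r_deliver(2, "m2")        # duplicate: discarded
print(q.delivered, nacks)   # ['m1', 'm2', 'm3'] [1]
```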

Group communication “Reliable multicast over IP multicast” (7)

E.g. considering three processes P0, P1 and P2 cooperating in a distributed way with reliable multicast over IP multicast.

[Figure: the timeline of events t0 to t15 across P0, P1 and P2, with the messages and their piggybacked values ⟨S, R⟩, and the negative ACKs]

Time   Rules    Description
t0     (1)(2)   Regular sending case: as P2 has not received a message yet, there is no piggybacked message.
t1     (4)      Regular receiving case.
t2     (1)(2)   P1 receives a P2 message and piggybacks it.
t3     (4)      Regular receiving case.
t4     (7)      P0 has not received the P2 message yet: it pushes the P1 message in its hold-back queue and sends a negative ACK to P2.
t5     (4)(4)   P0 processes the P2 message; this triggers a delivery condition for the P1 message recovered from the hold-back queue, and P0 processes that message too.
t6     (3)      P2 resends the message, recovered from its backup queue, to reply to the NACK.
t7     (5)      P0 already received the P2 message while P2 replied to the negative ACK: the message is discarded.

Group communication “Reliable multicast over IP multicast” (8)

The same events t0 to t7 are traced with the state of each process: its sequence number S_g, its records (R_g^0, R_g^1, R_g^2) and its hold-back queue HQ. For instance, P0’s records evolve from (×, 0, 0) to (×, 0, 1) and then (×, 1, 1), its hold-back queue holding the ⟨1,1⟩ message at t4 before returning to ∅ at t5.

[Table: the per-process state values at each event]

Group communication “Reliable multicast over IP multicast” (9)

Time   Rules    Description
t8     (1)(2)   Regular sending case: as P1 did not receive a message since t2, it sends no piggybacked message; the same holds at t10.
t9     (4)
t10    (1)(2)
t11    (6)      P0 receives a message from P1 but did not receive the previous one: it pushes this message in its hold-back queue and sends a negative ACK to P1.
t12    (4)(4)   P0 processes the first P1 message; this triggers a delivery condition for the previous P1 message, recovered from the hold-back queue, and P0 processes that message too.
t13    (3)      P1 resends a message to P0, recovered from its backup queue, to reply to the NACK.
t14    (4)
t15    (5)(5)   P0 has already delivered the P1 messages while P1 replied to the negative ACK: the messages are discarded.

Group communication “Reliable multicast over IP multicast” (10)

The events t8 to t15 are traced with the state of each process: P0’s records evolve from (×, 1, 1) to (×, 2, 1) and then (×, 3, 1), its hold-back queue holding the ⟨1,3⟩ message before returning to ∅, while P1’s sequence number grows from 1 to 3.

[Table: the per-process state values at each event]