In many distributed algorithms, one node acts as a coordinator or leader. It often doesn’t matter which node performs this function.
• Objective : processes agree on a new coordinator when the old one failed or was cut out of the network. • In the following algorithms, each processor (node) has a unique ID. The highest or lowest surviving processor ends up being the leader.
After a network partition, the leader-less partition must elect a leader. 6-1
Election Algorithms - Bully(1)
Two key properties of a useful algorithm – Safety : a process is either in the middle of an election or has chosen as leader the non-crashed process with the highest ID (at most one leader is elected) – Liveness : all processes participate and eventually either set their “elected” variable or crash (some process will become the leader)
Efficiency is measured by the number of messages sent. 6-3
• Idea : the process with highest ID bullies his way into leadership. • Assumptions: – – – –
Completely connected network Communication links are fault-free Processes only fail by stopping. Failures can be detected by timeout [synchronous] 6-4
Election Algorithms - Bully(1)
The Bully Algorithm(2)
• Three types of messages : – Election : used to announce an election (Can I be the new leader ?) – Answer : sent in reply to election messages (No, you cannot be the leader) – Leader : sent when the election is complete to announce the election result - sent only by the new leader
• Each process knows the ID of all the other processes. • If a process, p, notices that the current leader is not responding (timeout) it can initiate an election. • If p has the highest remaining ID, it sends a leader message (no election required).
The Bully Algorithm(3) • •
P sends an election message with his ID to all higher numbered processes. If P gets a reply message from a higher numbered process, then he gives up and waits for a leader message from a higher numbered process. If no higher numbered process responded, P sends messages to all lower numbered processes declaring himself the new leader. If no leader message arrives (after a reply message has been received) then P assumes that the winner of the election has crashed and P reinitiates the election. 6-7
The Bully Algorithm (4)
The bully election algorithm a) Process 4 holds an election b) Process 5 and 6 respond, telling 4 to stop c) Now 5 and 6 each hold an election 6-8
The Bully Algorithm (5)
The Bully Algorithm(6) • In a network of N processes, if the leader fails and #N-1 starts an election, he will send one message to the old leader (to give him one last chance to respond) and N-2 messages declaring himself leader. So the cost is about N messages. • If the lowest numbered process is the first to notice that the leader failed, he will initiate an election (N-1 messages) with the result that everyone responds (N-1 messages) and they all begin elections and get responses (! I=1 to N of I). The high numbered process now sends everyone a message declaring himself leader. So the number of messages could be as high as about N2.( If the leader fails then the election must be repeated which can happen N-1 times.)
Process 6 tells 5 to stop Process 6 wins and tells everyone 6-9
A Ring Election Algorithm
A Ring Election Algorithm • Processors are physically or logically organized in a ring (not necessarily a complete network). • Processes might not know the number of processors in the ring nor what all of the IDs are; they only know their own ID and how to find their clockwise and counterclockwise neighbors. • The ring can be unidirectional, in which case, they only need to be able to send messages to their clockwise neighbor. • Communication can be asynchronous, but communication failures are not tolerated. 6-11
• Process states are: Normal, Election, Leader. • Any process that notices that the leader is not functioning, changes his state to Election, starts an election message containing his ID and sends it to his clockwise neighbor.
3 election : 5 5 6-12
A Ring Election Algorithm
A Ring Election Algorithm
When a process receives an election message: • If his ID is higher than the ID in the message, he replaces the ID with his own, changes his state to Election and sends it to his clockwise neighbor. • If his ID is lower than the message ID, he relays it to his clockwise neighbor (changing his state to Election). • If his ID is the same as the ID in the message, he changes his state to Leader and sends out a “I am leader” message to his clockwise neighbor. 0
A Ring Algorithm
• When a process receives an “I am leader message”, he records the ID of the new leader and changes state to Normal, and forwards the message (unless he is the leader). • Variation has all ID’s in election token.
The Ring Algorithm: Complexity • In the best case, only the process with the highest ID starts an election message, so the number of messages is 2N. • In the worst case, if messages go clockwise and his counterclockwise neighbor has the highest ID, the number of messages is 3N-1 • In the worst case, N nodes start an election message resulting in O(N2).
Election algorithm using a ring.
The Ring Algorithm: Limitations
• LCR Ring Election • Assumptions: the ring size can be unknown. The communications are unidirectional. • Each node sends a message with its ID around the ring. When a process receives an incoming message, it compares the ID with its own. If the incoming ID is greater than its own, it passes it to the next node; if it is less than its own, it discards it; if it is equal to its own, it declares itself leader.
• We must assume: – The ring is reformed after the failure of the leader (which brings about the election) – There are no failures during the election process, or – If there is a failure during the election • • • •
It is quickly detected The election is called off The ring is reformed and The election is restarted from the beginning
Elect 0 6-17
LCR Ring Election • What is the message complexity? Suppose processes are ordered. • If messages are passed clockwise, only one survives after the first round : O(n) • If messages are passed counter-clockwise, message i is passed i times : O(N2).
HS Ring Election (1) 2 Elect 2
Elect 1 1 3 Elect 0
The Ring Algorithm: Modifications
Elect 3 0 6-19
• Hirschberg Sinclair Algorithm (ring modification) • O(N2) is a lot of messages. Here is a modification that is O(N log N). • Assumptions: the ring size can be unknown. The communications must be bidirectional. All nodes start more or less at the same time. Each node operates in phases and sends out tokens. The tokens carry hopcounts and direction flags in addition to the ID of the sender. ID=3,2 hops clockwise
ID=3,2 hops counterclckws 6-20
HS Ring Election (2)
HS Ring Election (3)
• Phases are numbered 0, 1, 2, 3, … In each phase, k, node j sends out tokens uj containing its ID in both directions. • The tokens travel 2k hops then return to their origin j. If both tokens make it back, process j continues with the next phase (increments k). If both tokens do not make it back, process j simply waits to be told the result of the election.
• If a process m receives a token uj going in the outbound direction, it compares the token’s ID with its own. – If it has a larger ID, it simply discards the token. – If it has a smaller ID, it relays the token as requested. – If it is equal to the token ID, it has received its own token in the outbound direction, so the token has gone all the way around the ring and the process declares itself leader.
• All processes always relay inbound tokens.
ID=3,2 hops clockwise
HS Ring Election (4)
• Message Complexity: In the first phase, every process sends out 2 tokens and they go one hop and return. This is a total of 4N messages for the tokens to go out and return. • In phase k, where k>0, a process sends out tokens if it was not overruled in the previous phase, that is by a process within a distance of 2k-1 in either direction. This implies that within group of 2k-1+1consecutive processors, at most one goes on to send out tokens in phase k. • This limits the message complexity to O(N log N). 6-23
• Bully – – – –
Must know ID’s of all other processes Message transit and response time bounded Any node initiates election Best: 2N Worst: N2
• Ring – basic – – – –
Logical ring, nodes only know neighbors If failure, ring must reform then restart election Election msgs travel around ring to find highest ID. NlogN 6-24
• Suppose that we allow the communication infrastructure to fail as well as the processes. • When the network is disconnected we may want to elect a leader for each group. • Designed for asynchronous systems.
• Each node in a group periodically checks whether the leader is alive or not, by sending a message to the leader, and waits for reply. • If the node does not reply within a timeout period, the node invokes a Recovery procedure. • The recovery procedure puts node i into a singleton group with node i as the leader. • Periodically, each leader i calls a check procedure, which sends messages to every other node asking whether that node is a leader. • If many nodes respond that it is a leader, node i pauses for some time and then invites other nodes to merge.
• When a leader receives an invitation to merge it forwards it to all members of its own group. Any process receiving an invitation accepts by sending an accept message to the potential leader who acknowledges with an answer message. • The leader who initiated the merger becomes the leader for the enlarged group, and confirms the new group by sending a ready message to each member of the new group. These processes respond with an answer message.
Invitation Algorithm • Each group has an identifier which is unique and any change in the structure of the group results in a new identifier being assigned. Thus coordination is always with respect to the current members of the group only. • It may take a long time for groups to merge. • Works correctly in the presence of timing failures. (No assumption about response time.) • Correctness is defined to depend on a group of processes agreeing to group membership and then agreeing on a value. This is a weaker condition than in the bully algorithm where the requirement is that all functioning processes agree to a value. 6-29