Fast generation of random connected graphs with prescribed degrees

Fabien Viger∗,†, Matthieu Latapy†

∗ LIP6, CNRS and University Pierre and Marie Curie, Paris, France
† LIAFA, CNRS and University Denis Diderot, Paris, France

August 17th, 2005

The Molloy and Reed model - Introduction

- Random element in the set of all multigraphs with these degrees

[Figure: 16 points numbered 1 to 16, paired at random: 1-6, 2-15, 3-16, 4-10, 5-8, 7-11, 9-12, 13-14]

- Rigorous randomness and linear complexity...
- ...but the graph isn't always simple and/or connected
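The pairing idea behind this slide can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes the usual reading of the Molloy and Reed model, in which a vertex of degree d receives d half-edges that are then matched uniformly at random. The function names `random_pairing` and `is_simple` are hypothetical.

```python
import random

def random_pairing(degrees, rng=None):
    """Pair half-edges uniformly at random; returns a list of edges (u, v).

    The result is a multigraph: self-loops and repeated edges may occur.
    """
    rng = rng or random.Random()
    if sum(degrees) % 2 != 0:
        raise ValueError("the sum of the degrees must be even")
    # One half-edge per unit of degree, labelled by its vertex.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive half-edges of the shuffled list form the edges.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

def is_simple(edges):
    """True if the edge list has no self-loop and no repeated edge."""
    seen = set()
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if u == v or key in seen:
            return False
        seen.add(key)
    return True
```

Shuffling and pairing take time linear in the number of half-edges, but nothing in the procedure forces the outcome to pass `is_simple` or to be connected, which is the limitation the slide points out.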

Plan

- State of the art
- Towards optimal heuristics
- Prevent the disconnection

Part I

State of the art

Generation of random simple connected graphs with prescribed degrees

The global algorithm - Generation of simple connected graphs

- Simple. Realize the degree sequence: linear (Havel-Hakimi 1955)
- Connected. Connection: linear number of edge swaps
  - At this point, the graph is highly biased
- Random. Shuffle: perform a certain number of random edge swaps that keep the graph simple and connected
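The first step of the global algorithm can be illustrated with a compact version of the Havel-Hakimi construction. This is a sketch under simplifying assumptions: it re-sorts the vertices at every round, so it does not reach the linear complexity quoted on the slide, and the function name `havel_hakimi` is chosen here for illustration.

```python
def havel_hakimi(degrees):
    """Return the edges of a simple graph with the given degrees, or None.

    Repeatedly connects the vertex of largest remaining degree d to the d
    other vertices of largest remaining degree.
    """
    remaining = list(degrees)
    edges = []
    while True:
        order = sorted(range(len(remaining)),
                       key=lambda v: remaining[v], reverse=True)
        v = order[0]
        d = remaining[v]
        if d == 0:
            return edges            # every degree has been satisfied
        if d > len(order) - 1:
            return None             # not enough other vertices
        remaining[v] = 0
        for u in order[1:d + 1]:
            if remaining[u] == 0:
                return None         # the sequence is not graphical
            remaining[u] -= 1
            edges.append((v, u))
```

The graph produced this way is simple but deterministic and not necessarily connected, which is why the connection and shuffle steps follow.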

The Shuffle - Generation of simple connected graphs

[Animation: a chain of graphs G produced by successive edge swaps. Each swap that yields a valid graph is accepted (ok). A swap that does not (NO) is cancelled, and the walk resumes from the last accepted graph.]

The shuffle seen as a Markov chain - Generation of simple connected graphs

- State space: all simple connected graphs with the right degrees
- Initial state: graph obtained after the first two steps
- Transitions: valid edge swaps

[Figure: an edge swap between four vertices A, B, C and D]

- Theorem (Taylor 1982): This Markov chain is ergodic and symmetric. It converges towards the uniform distribution over all states.
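One transition of this chain can be sketched as follows. The code assumes a graph stored as a dictionary mapping each vertex to the set of its neighbours; the names `is_connected`, `swap` and `try_transition` are illustrative and do not come from the slides. A rejected swap leaves the graph unchanged, so the chain stays in its current state for that step.

```python
import random
from collections import deque

def is_connected(adj):
    """Breadth-first search from an arbitrary vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

def swap(adj, a, b, c, d):
    """Replace the edges (a, b) and (c, d) by (a, c) and (b, d), in place."""
    adj[a].remove(b); adj[b].remove(a)
    adj[c].remove(d); adj[d].remove(c)
    adj[a].add(c); adj[c].add(a)
    adj[b].add(d); adj[d].add(b)

def try_transition(adj, rng):
    """Attempt one random edge swap; return True if it was kept."""
    # Every edge appears in both orientations, so both rewirings are reachable.
    edges = [(u, v) for u in adj for v in adj[u]]
    (a, b), (c, d) = rng.choice(edges), rng.choice(edges)
    # Simplicity: four distinct endpoints and no edge created twice.
    if len({a, b, c, d}) < 4 or c in adj[a] or d in adj[b]:
        return False
    swap(adj, a, b, c, d)
    if is_connected(adj):
        return True
    swap(adj, a, c, b, d)   # undo: restores (a, b) and (c, d)
    return False
```

Rebuilding the edge list at every step keeps the sketch short; an implementation concerned with speed would maintain it incrementally.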

Convergence speed - The shuffle process

- Empirical result (Milo 2001, Gkantsidis 2003): after O(|G|) transitions, the graphs obtained cannot be distinguished from those obtained with further iterations.
- But each transition takes O(|G|) time (connectivity test)
- Quadratic complexity

Speed-up (Gkantsidis et al. 2003) - Generation of simple connected graphs

- Naive: one connectivity test for each transition

[Animation: a chain of graphs G in which every swap is checked individually (ok, ok, ok, ok)]

- Speed-up: one connectivity test every T edge swaps

[Animation: a chain of graphs G in which the intermediate swaps are left unchecked (?, ?). If the test at the end of the window fails (NO), the walk restarts from the last checked graph. If it succeeds (ok), the walk continues.]
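The windowed scheme can be sketched as below, with a fixed window T. The sketch assumes the same dictionary-of-sets representation as a plain adjacency structure and restores a saved copy of the graph when the test fails; the slides do not say how the cancellation is implemented, so the use of `copy.deepcopy` is an assumption, as are the function names.

```python
import copy
import random
from collections import deque

def is_connected(adj):
    """Breadth-first search from an arbitrary vertex."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

def random_simple_swap(adj, rng):
    """Attempt one random swap that keeps the graph simple (no connectivity test)."""
    edges = [(u, v) for u in adj for v in adj[u]]
    (a, b), (c, d) = rng.choice(edges), rng.choice(edges)
    if len({a, b, c, d}) < 4 or c in adj[a] or d in adj[b]:
        return
    adj[a].remove(b); adj[b].remove(a)
    adj[c].remove(d); adj[d].remove(c)
    adj[a].add(c); adj[c].add(a)
    adj[b].add(d); adj[d].add(b)

def windowed_shuffle(adj, n_windows, T, rng):
    """Run n_windows windows of T swaps with one connectivity test per window.

    Returns the number of connectivity tests performed.
    """
    tests = 0
    for _ in range(n_windows):
        backup = copy.deepcopy(adj)      # state to restore on failure
        for _ in range(T):
            random_simple_swap(adj, rng)
        tests += 1
        if not is_connected(adj):
            adj.clear()
            adj.update(backup)           # cancel the whole window
    return tests
```

With this layout the number of connectivity tests drops from one per swap to one per window, at the price of losing a whole window whenever the graph ends up disconnected.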

Choice of the speed-up window T: heuristics - Speed-up the shuffle process

- Gkantsidis et al. (2003): auto-adjust
  - Perform T swaps
  - Connected? Yes: T = T+1
  - Connected? No: cancel the last swaps, T = T/2
- Efficiency?
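The auto-adjust rule amounts to a two-line update of the window. The sketch below is an illustration only: rounding T/2 down and keeping the window at least 1 are assumptions made here, and the names `gkantsidis_update` and `window_history` are hypothetical.

```python
def gkantsidis_update(T, still_connected):
    """Next window: T + 1 after a success, T / 2 (rounded down, at least 1) after a failure."""
    return T + 1 if still_connected else max(1, T // 2)

def window_history(T0, outcomes):
    """Successive windows obtained from a list of test outcomes (True = connected)."""
    history = [T0]
    for ok in outcomes:
        history.append(gkantsidis_update(history[-1], ok))
    return history
```

For example, starting from T = 10 with two successes, one failure and one success, the window goes through 10, 11, 12, 6 and 7.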

Benchmark - Speed-up the shuffle process

Size    Naive        Gkan.
1000    2.9 s        7.2
10^4    6 min        13.3
10^5    ≈10 hours    5
10^6    ≈40 days     2.6

Part II

Towards optimal heuristics

- Formal analysis
- Proposal of new heuristics

Formal analysis: Definitions - Towards optimal heuristics

- Disconnection probability p: a swap applied to a connected graph (ok) leaves it connected with probability 1 − p and disconnects it (NO) with probability p. A disconnected graph stays disconnected with probability 1 and becomes connected again with probability 0.
- Success ratio r = (1 − p)^T: after T swaps, the graph is still connected with probability (1 − p)^T and disconnected with probability 1 − (1 − p)^T.
- Speed-up factor θ = r · T = T · (1 − p)^T

Optimality condition - Formal analysis

- Speed-up factor θ = T · (1 − p)^T

[Figure, two panels: speed-up factor θ as a function of the window T, with its maximum at T = 1/p; speed-up factor θ as a function of the success ratio r, with its maximum at r = 1/e]

θ is maximal when T = 1/p, i.e. r = 1/e, and θmax = 1/(p·e)
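The optimality condition can be checked with a short derivation. The sketch below treats T as a continuous variable and assumes that p is small, two simplifications that the slides do not state explicitly.

```latex
\begin{align*}
\theta(T) &= T\,(1-p)^{T}, \\
\frac{d\theta}{dT} &= (1-p)^{T}\,\bigl(1 + T\ln(1-p)\bigr) = 0
  \;\Longrightarrow\; T^{*} = \frac{-1}{\ln(1-p)} \approx \frac{1}{p}, \\
r^{*} &= (1-p)^{T^{*}} = e^{T^{*}\ln(1-p)} = e^{-1}, \\
\theta_{\max} &= r^{*}\,T^{*} = \frac{T^{*}}{e} \approx \frac{1}{p\,e}.
\end{align*}
```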

Analysis of the Gkantsidis heuristics - Formal analysis

- Auto-stabilisation of the window T towards a steady state: r · (T + 1) + (1 − r) · T/2 = T
- The steady-state success rate is very close to 1
- Speed-up ratio obtained: θ ∼ √θmax
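The steady state of this heuristic can be computed numerically. With r = (1 − p)^T, the steady-state equation r · (T + 1) + (1 − r) · T/2 = T rearranges to (1 − p)^T = T/(T + 2), which the sketch below solves by bisection. The function names are illustrative, and the bracket [0, 10/p] is a choice made here.

```python
import math

def steady_state_window(p, tol=1e-9):
    """Solve (1 - p)**T == T / (T + 2) for T by bisection."""
    def f(T):
        return (1 - p) ** T - T / (T + 2)
    lo, hi = 0.0, 10.0 / p          # f(lo) = 1 > 0 and f(hi) < 0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if f(mid) > 0:              # f is decreasing, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def speed_up(T, p):
    """Speed-up factor theta = T * (1 - p)**T."""
    return T * (1 - p) ** T
```

For small p the solution is close to T ≈ √(2/p), so the success ratio T/(T + 2) is close to 1 and the speed-up is about √(2/p) = √(2e · θmax), a square-root behaviour with respect to θmax.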

The new heuristics - Towards optimal heuristics

- Success ⇒ T = T ∗ (1 + q+) instead of T = T + 1
- Failure ⇒ T = T ∗ (1 − q−) instead of T = T/2
- Steady-state window only depends on the ratio q+/q−
- Optimality condition: Tsteady = 1/p is satisfied ⇐⇒ q+/q− = e − 1
- Speed-up factor close to θmax
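The multiplicative rule and its steady state can be sketched as follows. The steady state is taken in the same sense as for the Gkantsidis heuristics, i.e. the expected window after one test equals the current window, which gives r · q+ = (1 − r) · q−. The default value q− = 0.1 and the function names are assumptions made for illustration.

```python
import math

def new_update(T, still_connected, q_minus=0.1):
    """Multiplicative window update with q+ / q- = e - 1."""
    q_plus = (math.e - 1) * q_minus
    return T * (1 + q_plus) if still_connected else T * (1 - q_minus)

def expected_drift(T, p, q_minus=0.1):
    """Expected change of T after one window, with success ratio r = (1 - p)**T."""
    r = (1 - p) ** T
    q_plus = (math.e - 1) * q_minus
    return T * (r * q_plus - (1 - r) * q_minus)
```

The drift vanishes when r = 1/e, that is at T = −1/ln(1 − p) ≈ 1/p; it is positive below that window and negative above it, so the window is pushed back towards the optimal value.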

Benchmark - The new heuristics

- Definition of the optimal heuristics
- Comparison of the speed-up factors

n       z      θGk     θ        θopt
10^4    2.1    0.79    0.88     0.90
10^4    3      3.00    5.00     5.19
10^4    6      20.9    112      117
10^4    20     341     35800    37000

- The new heuristics reach at least 90% of the optimal speed-up

Benchmark II - The new heuristics

Size    Naive        Gkan.    Opt. Heur.
1000    2.9 s        7.2      11.4
10^4    6 min        13.3     50
10^5    ≈10 hours    5        11.8
10^6    ≈40 days     2.6      5

Part III

Prevent the disconnection

Decrease the disconnection probability p

Isolated pairs - Prevent the disconnection

- Idea: decrease p to raise the speed-up factor θ
- How? Avoid the formation of isolated pairs
- In practice, p is multiplied by a factor ranging from 1/2 to 1/20

Going further: K-isolation tests - Prevent the disconnection

- Detect and avoid the formation of small isolated components
  - For every edge swap, perform two K-limited breadth- or depth-first searches from the vertices that might have been disconnected
  - If a small component is detected, cancel the swap right away
  - If not, validate the swap
- Time complexity O(K) per edge swap, instead of O(1)
- But the lower probability p raises the speed-up factor θ
  - How much will p decrease?
  - Intuition: a set of K vertices is exponentially (in K) unlikely to be isolated
  - Consequence: p would decrease exponentially with K?

Effect on the disconnection probability - K-Isolation tests

[Figure: disconnection probability p (logarithmic axis, from 10^0 down to 10^−6) as a function of the isolation test width K (from 0 to 100)]

Adjusting the isolation test width K - K-Isolation tests

- Empirically: p ∼ e^(−λK), and θmax = 1/(p·e) ⇒ θmax ∼ e^(λK)
- Exponential decrease of Ctests (connectivity tests complexity)
- Linear increase of Cswaps (complexity of edge swaps)
- The tradeoff consists in balancing both Cswaps and Ctests: Cswaps = O(K · |G|) and Ctests = O(|G|^2 / e^(λK)) ⇒ K = O(log |G|)
- Final complexity is O(|G| log |G|) instead of O(|G|^2)
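The balance between the two costs can be made explicit with a short calculation. This is a sketch that treats λ as a constant and takes the empirical law p ∼ e^(−λK) for granted.

```latex
\begin{align*}
C_{\mathrm{swaps}} &= O(K\,|G|), \qquad
C_{\mathrm{tests}} = O\!\left(|G|^{2}\,e^{-\lambda K}\right), \\
K &= \frac{\ln|G|}{\lambda}
  \;\Longrightarrow\; e^{-\lambda K} = \frac{1}{|G|}
  \;\Longrightarrow\; C_{\mathrm{tests}} = O(|G|), \\
C_{\mathrm{swaps}} + C_{\mathrm{tests}}
  &= O\!\left(\tfrac{1}{\lambda}\,|G|\ln|G|\right) + O(|G|)
   = O(|G|\log|G|).
\end{align*}
```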

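Adjusting the isolation test width K - K-Isolation tests

[Flowchart: starting from initial values T0 and K0, save the graph, perform T swaps validated by K-isolation tests, then test whether the graph is still connected. If NO, restore the saved graph and adjust K. If YES, compare Cswaps and Ctests and adjust T (T = T * 2) or K accordingly.]

Maybe not optimal, but works fine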

Benchmark

Size    Naive        Gkan.    Opt. Heur.    Final
1000    2.9 s        7.2      11.4          22.3
10^4    6 min        13.3     50            510
10^5    ≈10 hours    5        11.8          2180
10^6    ≈40 days     2.6      5             7780

Part IV

Conclusion

Contributions

- Analysis of Gkantsidis et al. heuristics
- New heuristics, designed to reach the optimal
- Validation, benchmarks
- New idea to prevent the disconnection during the shuffle
- Log-linear algorithm. Implementation, benchmarks

Future work

- More formal proofs
- Extension to directed graphs
- Application to some dynamic connectivity algorithms

The End

Thank you
