Describing and simulating internet routes - Jeremie Leguay

Computer Networks 51 (2007) 2067–2085 www.elsevier.com/locate/comnet

Describing and simulating internet routes

Jérémie Leguay a, Matthieu Latapy b,*, Timur Friedman a, Kavé Salamatian a

a LIP6 – CNRS and Université Paris 6 – 4, Place Jussieu, 75005 Paris, France
b LIAFA – CNRS and Université Denis Diderot – 2, Place Jussieu, 75005 Paris, France

Received 19 August 2005; received in revised form 11 October 2006; accepted 17 October 2006 Available online 16 November 2006 Responsible Editor: I. Nikolaidis

Abstract

This contribution deals with actual routes followed by packets in the Internet at the IP level. We first propose a set of statistical properties to analyse such routes. We then use the results to suggest and evaluate methods for generating artificial routes suitable for simulation purposes. The proposed approach also leads to insight on various network models. The present work is based on large data sets provided mainly by CAIDA’s skitter infrastructure.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Internet; Routing; Routes; Complex networks; Graphs; Measurement; Modeling

A reduced conference version of this paper has been published in the proceedings of the international conference Networking 2005. This version is much more detailed, contains significantly more results, and corrects a few mistakes.
* Corresponding author. Tel.: +33 144 275 617; fax: +33 144 276 849. E-mail addresses: [email protected] (J. Leguay), latapy@liafa.jussieu.fr (M. Latapy), [email protected] (T. Friedman), [email protected] (K. Salamatian).
1389-1286/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2006.10.008

1. Introduction

Realistic modeling of routes in the Internet is a challenge for network simulation. Until now, one had to choose one of the three following approaches: (1) use the shortest path model, (2) explicitly model the Internet hierarchy, and separately simulate inter- and intra-domain routing, or (3) replay routes that have been recorded with a tool like traceroute [1]. All of these methods have serious drawbacks.

The first one does not reflect reality: routes do not in general have the same properties as shortest paths, as already pointed out for instance by Paxson [2,3], probably because of routing policies [4,5] mainly at the autonomous system (AS) level. As described in detail recently by Spring et al. [4], and earlier by Tangmunarunkit et al. [6,5], this often induces path inflation.

The second method is limited by our ability to explicitly simulate the Internet hierarchy. Much work has been done to model the Internet topology (see for instance [7,8]), and much progress has been made, but today’s topology generators are still inaccurate in capturing some parameters while they strive to adhere to others. See for instance the findings in Li et al.’s Sigcomm 2004 paper [9], and the BRITE case which we study in Section 5.6. Then, even if one is satisfied with the quality of the topology model, there is the question of simulating dynamic inter- and intra-domain routing. A non-negligible programming effort is required if the choice is made not to use a simulator, such as ns [10], that has these algorithms built in. In order to avoid these challenging modeling issues, one may instead use real-world measurements and then try to interpret the obtained data as a collection of interconnected domains. This issue also is challenging, however, and it is out of the scope of this paper.

Finally, the third method is not suitable if routes from a large number of sources are to be simulated. Today’s route tracing systems employ at most a few hundred sources. CAIDA’s skitter [11,12] infrastructure, for instance, produces an extensive graph suitable for simulations, but it is based on routes from just around 30 sources.

Despite its well-known drawbacks, and because of the lack of more accurate models, the shortest path model is generally used. Examples from recent years include Lakhina et al.’s Infocom 2003 paper [13], Barford et al.’s Sigcomm 2002 paper [8], Riley et al.’s MASCOTS 2000 paper [14], Guillaume et al.’s Infocom 2005 paper [15], and Clauset et al.’s STOC 2005 paper [16]. The ns network simulator documentation itself proposes the simulation of routes by shortest paths as an alternative to simulating routing algorithms [10, Chapters 26, 29].

This paper’s principal contribution is a new approach to modeling routes in the Internet, one that does not share the drawbacks just described. We suggest using an actual measured graph of the Internet topology, such as the graph generated by skitter. From that topology, we suggest choosing sources and destinations as one wishes from the nodes of the graph. Between these sources and destinations, we then suggest generating artificial routes with a model that has been chosen to reflect the statistical properties of actual routes.
Central to this contribution are two specific models for artificial route generation: the random deviation model (RDM) and the node degree model (NDM). As we will see, these models generate routes with relatively inexpensive calculations, and the routes that they generate better reflect the statistical properties of actual routes than does the shortest path model. This paper’s other contribution is to update measurements of some familiar statistical properties of real routes, notably path length and the hop direction, and to introduce and measure a new statistical property: the evolution of node degree along a route.

These properties serve as the standard for evaluating whether simulated routes resemble real routes. By introducing this standard, this paper lays the groundwork for going beyond the work described here through the introduction of yet better models.

The remainder of this paper is organized as follows. Section 2 describes the data set that we have used and the context in which our work lies. Section 3 proposes the set of statistical properties we use to describe routes in the Internet. Section 4 proposes the models we use to simulate routes based on these properties. Section 5 evaluates those models and the assumptions we made, and Section 6 concludes the paper.

2. The framework

The ideal perspective from which to characterize routes in the Internet would be from a snapshot of the routing tables of routers throughout the network. Unfortunately, such a snapshot is impossible to obtain on the scale of the entire network. In this section, we describe the alternative that we opted for, and the hypotheses we made.

2.1. The Internet as a graph

Efforts to map the Internet graph take place at three main levels. One is the autonomous system (AS) connectivity graph, which can be constructed from BGP announcements (captured for instance by The Oregon Route Views Project [17] from peering arrangements with roughly 60 network service providers). The others are the routing graph, where the nodes are the routers and the links are the physical connections between them, and the IP graph, where the nodes are the IP addresses of network interfaces and the links between them correspond to logical links (hops in the routing). The IP graph can be obtained using traceroute and similar tools from a number of different points in the network. To our knowledge, skitter, which runs traceroute on a daily basis from on the order of 30 servers to on the order of a million destinations, is among the most extensive ongoing efforts at the IP level.
Note that this separation into three levels is not exhaustive. One may consider the logical links between routers or the physical ones, for instance. One may also consider the physical links between interfaces. It would also be possible to consider link-layer devices, such as hubs and bridges. The
three levels view however is sufficient for our purpose. Let us insist on the fact that, because of the fully distributed nature of the Internet, these graphs are not directly observable. In order to study them, one has to collect large amounts of data from various sources, and then recompose a (partial and possibly biased) view of the real graph. Neither level is ideally suited to the task of modeling the behavior of routes in the Internet. While the AS graph is directly based upon routing information, it is too coarse-grained to capture the details of path inflation. Moreover, a shortest path at the AS level does not necessarily correspond to a shortest path at the routing or IP levels. Simulators that do not explicitly model the AS hierarchy have been found by Tangmunarunkit et al. [7] to do better at generating graphs with desirable properties. Since our goal is to help in network simulations, we will therefore focus on the IP and routing levels. Similar work should however be done at the AS level, and the comparison of the two would certainly be very interesting. The main problem when using traceroute is that what one actually sees is the IP graph, while the routing graph would be more relevant. One single node in the routing graph appears as several separate nodes, one or more for each of its interfaces, in the IP graph. Moreover, traceroute captures logical links, which may miss the presence of tunneling, in ATM or MPLS subnetworks for instance. Ideally, then, one would construct the routing graph using methods to ‘‘disambiguate’’ IP addresses, such as the alias resolution techniques described by Pansiot et al. [18], and by Govindan et al. [19] for Mercator. There are also techniques, such as those used by Spring et al. [20,21], in Rocketfuel, and by Teixeira et al. [22], that take advantage of router and interface naming conventions to infer routing topology from the IP one. 
To our knowledge, no study deals with the tunneling problem and other sophisticated biases. Most of these disambiguation techniques, as applied for example in the iffinder tool from CAIDA [23], do not work by simple inspection of the IP graph; they require active probing, preferably simultaneous with graph discovery. This constraint makes extensive disambiguated routing graphs much harder to obtain than IP graphs. At best, some core network topologies are available in this form thanks to Rocketfuel. But Rocketfuel is untested in stub networks. Finally, it is very difficult to judge
the extent to which disambiguation is successful, and incomplete or incorrect disambiguation could introduce unknown biases.

To avoid these difficulties, we have restricted ourselves to the IP graph as obtained from skitter, and routes in this graph as obtained directly from traceroute. The resulting caveat is that the graph may not be properly representative of the routing level graph. This caveat is however mitigated by the fact that the IP graph resembles the routing graph in one important respect: unless we encounter tunneling, route lengths are preserved. That is to say that a route that has a given length (in hops) in the routing graph has the same length in the corresponding IP graph. Furthermore, as Broido et al. note [24], “interfaces are individual devices, with their own individual processors, memory, buses, and failure modes. It is reasonable to view them as nodes with their own connections.”

Finally, we consider this work as a first step towards the accurate modeling of routes, and therefore prefer to make choices as simple as possible. We will see in Section 5 that these assumptions have little impact, if any, on our results.

2.2. The data set

This study uses skitter data from 2 July, 2003. The data was collected from 23 servers targeting 594,262 destinations, leading to 7,075,189 routes (not all sources probed all destinations) on that day. We obtained an IP graph by merging all these routes.
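As a rough illustration of this merging step, together with the address filtering described in the next paragraph, the following Python sketch builds an undirected adjacency map from a list of routes. The representation of a route as a list of dotted-quad strings and the names `INVALID_BLOCKS`, `is_valid` and `merge_routes` are assumptions made for the example; they are not taken from the skitter tools. Dropping every link that touches an invalid address is used here as a stand-in for removing the invalid nodes from the merged graph.

```python
import ipaddress

# Address blocks treated as invalid in the text (listed in RFC 3330).
INVALID_BLOCKS = [ipaddress.ip_network(b) for b in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.0.2.0/24", "192.88.99.0/24", "192.168.0.0/16",
    "198.18.0.0/15", "224.0.0.0/4", "240.0.0.0/4",
)]


def is_valid(addr):
    """Return True if addr (a dotted-quad string) lies outside all invalid blocks."""
    ip = ipaddress.ip_address(addr)
    return not any(ip in block for block in INVALID_BLOCKS)


def merge_routes(routes):
    """Merge routes (hypothetical format: lists of IP strings) into an
    undirected adjacency map, skipping links that touch an invalid address."""
    adj = {}
    for route in routes:
        # Each pair of consecutive addresses on a route is one IP-level link.
        for a, b in zip(route, route[1:]):
            if a != b and is_valid(a) and is_valid(b):
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)
    return adj
```

An adjacency map of sets keeps the merge idempotent: a link seen on many routes is stored once, which is what one wants when counting nodes and links of the resulting graph.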
We then removed the following IP addresses, considered as invalid (see RFC 3330 [25]): addresses in the private blocks 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, link-local addresses in 169.254.0.0/16, TEST-NET addresses in 192.0.2.0/24, addresses in the “this network” block 0.0.0.0/8, loopback addresses in 127.0.0.0/8, 6to4 relay anycast addresses in 192.88.99.0/24, benchmark testing addresses in 198.18.0.0/15, multicast addresses in 224.0.0.0/4, and reserved addresses formerly known as the Class E addresses in 240.0.0.0/4, which includes the LAN broadcast address, 255.255.255.255. This removed 3.95% of the edges and 3.25% of the nodes. The resulting graph contains 885,438 nodes and 1,266,671 links. This graph captures well the small-world, clusterized, and scale-free nature of the Internet already pointed out in numerous publications, by Jin and Bestavros [26] and others [27–31]. In particular, the average distance is approximately 11.4 hops,
and the degree distribution is well fitted by a power law of exponent 1.97 (the fraction of nodes of degree $k$ is distributed as $k^{-1.97}$). It captures the fact that, though most nodes have a low degree, there is a non-negligible number of nodes with very high degree. The average clustering coefficient of this graph (i.e. the probability that two randomly chosen neighbors of any node are linked together, [32]) is equal to 0.035, which is huge compared to that of random graphs with the same number of nodes and links, equal to $1.30 \times 10^{-6}$. The fact that this graph shares properties common to most complex networks encountered in practice, as described for instance by Albert and Barabási [32] and Newman [33], will be useful for our characterisation of Internet routes.

Notice that this graph is necessarily incomplete and biased due in particular to probing from a limited number of sources, to route dynamics, to tunneling and to erroneous or absent responses to traceroute probes. Biases of graphs induced by acquisition through a small number of traceroute monitors have been studied for instance by Lakhina et al. [13] and by Clauset et al. [16]. Despite these biases, recent studies by Dall’Asta et al. [34] and Guillaume et al. [15] show that one may be quite confident of the accuracy, using this kind of exploration, of distances and degrees, which are the main properties that we use here. Moreover, skitter data represents the current state of the art in its extent and accuracy. We therefore consider this graph as a good approximation of the IP graph in this study, and will call it the skitter graph.

3. Statistical properties of routes

This section presents a set of properties for the statistical description of Internet routes. These properties motivate the models of Section 4. Several properties have already been studied in previous works, and the work here serves to evaluate, update and complete them.
Before entering the core of this section, let us stress that our aim here is to define statistics that are as simple as possible yet succeed in capturing essential properties of Internet routes. We will see in Section 5 that the statistics presented below fit these requirements. It must be clear however that they are very aggregated, and that much more precise statistics may be used to obtain more insight on the route properties. Such statistics would however lead to intricate, hard to evaluate, route
models, which is in contradiction with our purpose. This is why we will restrict ourselves to these coarse statistics, which will prove to be sufficient here.

3.1. Route lengths

It is well known that routes are not in general shortest paths. Fig. 5(left) shows the distributions of route lengths in our data set, and of the corresponding shortest paths. It also shows the distribution of the difference $\delta$ between the length of a route and the corresponding shortest path. These distributions are compiled as follows. For each route $i$ obtained by traceroute, we compute its length $\ell_i$ and the length $s_i$ of a shortest path between the source of the route and its destination. We also compute the difference, $\delta_i = \ell_i - s_i$.

The mean length of 15.57 hops for routes in this data set fits closely Paxson’s observations [3,2] on a data set that is older by 9 years. The shortest paths have a mean length of 11.4 hops. The distributions are well centered on their mean value: no route has a length more than twice the average. However, route lengths vary more around their mean, with a standard deviation $\sigma = 3.99$, than do shortest paths, with $\sigma = 2.62$.

The delta distribution confirms Tangmunarunkit et al.’s observation [6,5], mentioned at the beginning of this paper, that roughly 80% of routes are not shortest paths. In this particular data set, 19.34% of routes are shortest paths. Notice that, since the data are incomplete, there are undiscovered links, which implies that 19.34% is an overestimate: at least 80.66% of the considered routes are indeed longer than shortest paths in the true IP graph.

Route lengths and shortest path lengths are both well fitted by gamma distributions. Shortest paths have an estimated shape parameter of $k = 21.18$ and an estimated scale parameter of $\theta = 0.53$. Routes have $k = 14.56$ and $\theta = 1.07$.

Tangmunarunkit et al. also observed that 20% of routes were at least 50% longer than shortest paths. We find a somewhat larger portion: 33.4%.
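The per-route computation of route length, shortest path length and their difference described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used on the skitter data: the graph is assumed to be an adjacency map (node to set of neighbors), a route is assumed to be a list of nodes, and the names `bfs_distances` and `route_deltas` are hypothetical. One breadth-first search is run per distinct source, which stays cheap when there are only a few sources.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def route_deltas(adj, routes):
    """For each route i, return delta_i = l_i - s_i, where l_i is the route
    length in hops and s_i the shortest path length between its end points."""
    cache = {}  # one BFS per distinct source
    deltas = []
    for route in routes:
        src, dst = route[0], route[-1]
        if src not in cache:
            cache[src] = bfs_distances(adj, src)
        length = len(route) - 1      # l_i
        shortest = cache[src][dst]   # s_i
        deltas.append(length - shortest)
    return deltas
```

On a toy graph with links a-b, b-c, c-d and a-c, the route a, b, c, d has three hops while the shortest path a, c, d has two, so its difference is 1.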
Again, this is a lower bound, and therefore the larger value that we observe may be due to the use of a more complete graph. One might wonder if the value of $\delta$ is correlated to the length of the shortest path, which would seem natural. For instance, routes between sources and destinations that are further apart may have a larger $\delta$. We examine more closely the shortest path lengths between 9 and 16, which represent more
than 85% of the cases. In this range, the mean value of $\delta$ is best fitted by the line $y = 0.13x + 1.46$ with an asymptotic standard error for both parameters under ±13%, see Fig. 1 (middle). Given this low slope and this standard error, it may be seen as almost flat, which contradicts the intuition: the value of $\delta$ does not depend significantly on the actual distance between the considered sources and destinations. Notice however that the mean hides considerable variations, which can be observed in the quantile plots in Fig. 1.

3.2. Hop direction

When a packet travels from one router to another, it may move closer to its destination, but it may also move further away, or even stay at the same distance from the destination. Likewise, the distance from the source may increase, decrease, or remain constant. We will call these behaviors the hop direction, considered with respect to either the destination or the source. If the hops always increase the distance from the source and decrease the distance to the destination, the route is a shortest path. Notice that hop directions in the IP graph correspond to the ones in the routing graph, since distances are preserved between the two graphs (see Section 2.1).

Hop directions with respect to a given source may be computed for all the routes starting at this source using breadth-first search. This has a cost linear in the size of the graph. Likewise, it is possible to study hop directions with respect to the destinations
using a breadth-first search rooted at each destination. In our case, we have many destinations but only a few sources. Therefore, only hop directions with respect to sources can be observed while maintaining a reasonable complexity. Hop directions with respect to the destinations may be studied using only a part of all the destinations but, since the number of sources is small, the approximation would be poor in this case. We will therefore restrict ourselves to the source point of view in the following.

Examining the route traces, we found that 87.3% of hops go forward, 4.6% go backward, and 8.1% remain at the same distance from the source (we call these stable hops). As an example, Fig. 5(right) shows the portion of forward, backward, and stable hops as a function of the hop distance for routes of 15 hops. We chose this length because it corresponds to the most numerous routes, roughly 140,000. The obtained plot is typical of what we obtained for any length. The same holds wherever we choose to focus on routes of a given length in the following.

As one would expect, the first and last few hops are generally forward because there are few alternatives, if any: these parts of the network have a treelike structure, induced by the underlying access networks. On the contrary, in the core of the network a significant proportion of the hops (more than one third) do not go further from the source. This type of behavior has already been described in the literature as a consequence of policy-based routing in the core of the Internet. As Tangmunarunkit et al. [6,5] note, such behaviors may be induced by load balancing, commercial considerations, etc.

[Figure 1: delta (vertical axis, 0 to 25) versus shortest path length (horizontal axis, 9 to 16).]

Fig. 1. Quantile and average plots for $\delta$ for various shortest path lengths. From top to bottom: the plain line represents the maximal value, the dotted line the 90th percentile (i.e. 90% of the values are below this line); the plain line (crossing the vertical bars) corresponds to the median, the dotted line corresponds to the 10th percentile, and the last plain line, which collapses on the x-axis, is the minimal value. The vertical bars display the quartiles: 50% of the values are within these bars (the topmost corresponding to the 75th percentile, the other to the 25th). The dots show the average.

3.3. Degree evolution along a route

Recent work has shown that many real-world complex networks tend to have very heterogeneous degrees, well fitted by power laws. This is in particular true for the Internet, as observed by Faloutsos et al. [27] and others. Moreover, most of the short paths between pairs of nodes in these networks tend to pass through the highest degree nodes. Actually, almost all paths (not only short ones) tend to pass through these nodes, which makes them essential for network connectivity [35–40]. In the case of the Internet, this may be due to the fact that users access the Internet through access nodes that multiplex huge numbers of subscribers. These observations lead us to ask how the node degree evolves along a route. If routes tend to pass
through high degree nodes, where do they do so, and what degree nodes do they encounter? Furthermore, does this tendency to pass through high degree nodes imply that, when a choice exists between next hops, the next hop that leads to the highest degree nodes is generally chosen?

Fig. 2 shows how node degrees evolve along routes of length 15 (notice the logarithmic vertical scale for the quantile plot). There is a significant increase in the degrees at the very beginning of the plot, as well as a significant decrease at the end. In between, the plot is quite flat. This leads us to the following interpretation: the hosts have low degree, they are connected at their first hop router to relatively high degree nodes which play the role of access points, and then packets are routed in a core network where the degree (10 on average) does not depend much on the distance from the source or from the destination. Notice that the flatness in the middle of the plot does not mean that all the nodes in the core have a similar degree (the degrees in the core follow a power law). But, once a packet has entered this core, there is no correlation any longer between the degree of the node and the distance from the source or from the destination.

[Figure 2: left panel, degree (logarithmic scale, 1 to 10000) versus distance (0 to 14); right panel, degree (0 to 60) versus distance (1 to 15).]

Fig. 2. Degree evolution along routes of length 15. Left: quantile plots (the lines indicate, from top to bottom, the maximal value, the 90th percentile, the median, the 10th percentile and the minimal value; the vertical bars span the region between the 75th and the 25th percentiles, thus corresponding to 50% of the values). Right: the average value.

One may wonder if there is a simple local rule that can be observed for the degree evolution along a route. In particular, when there is a choice of next hop along a route, is there a correlation between the degree rank of the neighbors and their probability of being chosen? For instance, are highest degree nodes chosen preferentially over lower degree ones? Note that such a rule could be perfectly compatible with the observed flat degree evolution in the middle of routes.

Fig. 3(left) plots the probability that a packet goes to a node’s ith ranked neighbor, where the neighbors are ranked from highest degree to lowest. We show the plots obtained for degrees 4–10, which are the cases where both the degrees and the number of nodes are non-trivial. There is no apparent overall correlation in this plot, which seems to invalidate our hypothesis. However, if one considers only the neighbors of a node to which it is liable to send a packet (in other words, we consider the skitter graph directed according to the ways the collected routes are traveled), then one obtains the plot on the right of Fig. 3. One may then observe a bias towards highest degree nodes, though this bias is rather small.

[Figure 3: two panels plotting proportion versus rank (1 to 10), one curve per degree.
Left panel curves: degree=4 (candidates: 37800), degree=5 (candidates: 21419), degree=6 (candidates: 13633), degree=7 (candidates: 9526), degree=8 (candidates: 7289), degree=9 (candidates: 5736), degree=10 (candidates: 4616).
Right panel curves: out-degree=4 (candidates: 13831), out-degree=5 (candidates: 9630), out-degree=6 (candidates: 7267), out-degree=7 (candidates: 5569), out-degree=8 (candidates: 4417), out-degree=9 (candidates: 3499), out-degree=10 (candidates: 2900).]

Fig. 3. Choice of next hop node as a function of this node’s degree ranking. Left: on the (undirected) skitter graph. Right: on the directed version.

4. Route models

The previous section provides a set of statistical tools to capture some non-trivial properties of routes in the Internet. The statistics on route lengths and hop directions are sufficient to demonstrate that the shortest path model is inadequate for generating realistic simulated routes on a graph of the Internet. The statistics on node degree evolution provide another measure of what realistic routes should look like. We now propose three simple models (only two of which we eventually retain) designed to capture these features.

Our approach is as follows: we design a model as simple as possible which focuses on one of the properties of interest, and then we use the other statistics to evaluate the model (in the next section). This ensures that the models stay very simple, and this makes it possible to study the relations between the observed properties: are they independent or on the contrary can some of them be seen as consequences of others?

4.1. Path length model

The path length model is the simplest and the most obvious one conceptually, but it proves to be unusable in practice. The model aims at producing routes of the same length as real ones. As discussed in Section 3.1, a real route length typically exceeds that of the shortest known path by some small integer value $\delta \geq 0$ (see Fig. 5(left)). In order to construct a route from a source s to a destination d, the path length model first computes the length $\ell$ of a shortest path from s to d. Then it samples a deviation $\delta$ from a distribution such as the one shown in Fig. 5(left), and a route is generated by choosing a path at random from s to d among the ones which are loop-free and have length $\ell + \delta$. This ensures that the difference between
shortest path lengths and actual route lengths will be captured by the model. To choose such a path at random implies however that one must construct all the loop-free paths of length $\ell + \delta$ from s to d. In practice, the computation required to generate this number of paths may be prohibitive, since even in simple cases it is exponential in $\ell + \delta$. For example, in trying to generate all paths of length 21 between a pair of nodes in the skitter graph, we enumerated 1,206,525 possible paths. Therefore, despite its conceptual simplicity, we will not consider this model further. As we will see, this does not mean that we do not try to fit this property; we will fit it by using models based on other parameters.

4.2. Random deviation model (RDM)

The RDM is based upon the idea that a route usually follows a shortest path, but might occasionally deviate from it. We modeled this using one single parameter, p, the probability at any point of deviating from the current shortest path to the destination, if such a deviation is possible. A random deviation route from source s to destination d is therefore based upon a shortest path u from s to d. At each hop, with probability $1 - p$, the route continues along u. But with probability p it will, if possible, deviate off u to another path. A deviation from current node x to one of its neighbors y is deemed possible only if there is a shortest path w from y to d that does not pass through x. Should there be a deviation, the route continues along w to d (unless another deviation should occur).

Fig. 4(left) shows an example of how a route can be generated using the RDM. In this graph, there is a five hop shortest path from source s to destination d. The route follows this path for three hops and then deviates at v. This deviation is possible because the shortest path from $v'$ to d does not contain v. The resulting route is seven hops long.

We can use Fig. 4(left) to illustrate some details of the RDM. It shows instances in which no deviation is possible. For example, there can be no deviation at the first-hop node of the shortest path from s to d, since it has no neighbor that is not already on the shortest path being followed. Also, there can be no deviation at the second-hop node, even though there is a neighbor that is not on the shortest path. The reason for this is that the only shortest path from this neighbor to d passes through the node we come from. The figure also shows an instance where two deviations are possible: at node v, deviations to $v'$ and $v''$ are both possible. The choice of which to take (if any) is random.

Finally, notice that large numbers of routes to a given destination d can be efficiently generated with the RDM once a shortest path tree rooted at d has been computed.

Fig. 4. Examples for the models. Left: random deviation model (RDM). Right: node degree model (NDM).
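The RDM rules above can be sketched in a few lines of Python. This is one possible reading of the model, not the authors' implementation, and it rests on several assumptions: the graph is a connected adjacency map with comparable node labels, the shortest path being followed is chosen deterministically (smallest label among the neighbors one hop closer to d), and the node the route just came from is never offered as a deviation target because it already lies on the route being followed. The names `bfs_distances` and `rdm_route` are hypothetical. The deviation test relies on distances to d: a neighbor y that is no further from d than x always has a shortest path avoiding x, and a neighbor one hop further qualifies only if it has another neighbor, distinct from x, at the same distance from d as x. With p close to 1 the loop may run for a long time on some graphs, so moderate values of p are assumed.

```python
import random
from collections import deque


def bfs_distances(adj, root):
    """Hop distance from root to every reachable node (breadth-first search)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def rdm_route(adj, s, d, p, rng=random):
    """Sketch of a random deviation route from s to d with deviation probability p."""
    dist = bfs_distances(adj, d)  # reusable for every route towards d
    route, prev, x = [s], None, s
    while x != d:
        # Next hop on a shortest path from x to d that does not go back to prev.
        nxt = min(y for y in adj[x] if dist[y] == dist[x] - 1 and y != prev)
        # Possible deviations: other neighbors with a shortest path to d avoiding x.
        candidates = [
            y for y in sorted(adj[x])
            if y != nxt and y != prev
            and (dist[y] <= dist[x]
                 or any(z != x and dist[z] == dist[y] - 1 for z in adj[y]))
        ]
        if candidates and rng.random() < p:
            nxt = rng.choice(candidates)
        route.append(nxt)
        prev, x = x, nxt
    return route
```

Passing a seeded `random.Random` instance as `rng` makes the generated routes reproducible, and computing `dist` once per destination mirrors the remark above about reusing a shortest path tree rooted at d.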

4.3. Node degree model (NDM)

Several previous authors [41,36] have tried to use the heterogeneity of node degrees to compute short paths in complex networks. The basic idea is that a path which goes preferentially towards high degree nodes tends to “see” most nodes very rapidly (a node is considered to be seen when the path passes through one of its neighbors). The NDM is based upon a similar approach, as follows.

For each node, we define its preferred neighbor as its highest degree neighbor; we pick one at random if it has several such neighbors. Then, two paths are computed, one starting from the source and the other from the destination. The next node on each of these paths is always the preferred neighbor of the current node. The computation ends when we reach a situation where a node is the preferred neighbor of its own preferred neighbor. One can show that only this kind of loop can occur. Then, one of two cases applies: either the two paths have met at a node, or they have not. In the first case, the route produced by the model is the discovered path (both paths are truncated at the meeting node, and are merged). In the second case (which in practice is very rare), we compute a shortest path between the two loops, and then obtain the route by merging the two paths and this shortest path, removing any loops.

Fig. 4(right) shows an example. There are three tree-like structures (the shaded areas). The source s belongs to the leftmost one, which is rooted at rs, and the destination d to the rightmost one, with root at rd. Each directed link goes from one node to its preferred neighbor (the dotted lines are links which do not satisfy this). When one wants to build a route from s to d according to the NDM, one first finds the path from s to rs, and the one from d to rd. One then has to compute a shortest path from rs to rd, which has length 5 in this example. The final route is obtained by merging these paths, and then removing the loops (which leads to the removal of two links, in our example). It has length 8, while the shortest path has length 7.

One may empirically observe that this method leads to paths very close to shortest ones, which we will confirm in Section 5. Moreover, the computation of the tree-like structure where each node points to its preferred neighbor is very simple and only has to be processed once. Likewise, the shortest paths between a small number of looping points are computed only once.

5. Evaluation

This section is devoted to the evaluation of the models we have just proposed, and to the discussion of their possible use. Our basic methodology will be to compare the properties of the obtained artificial routes to the ones of the original routes. One therefore has to choose a graph on which the routes will be constructed, and then choose sources and destinations. We will first generate routes on the skitter graph using the same sources and destinations as in the original data, and then using random sources and destinations. After this, we will use other maps of the Internet with random sources and destinations, and finally we will run our models on the most widely used graph models of the Internet. All of these experiments give some information on the behavior of our models, as well as on the relevance of the underlying graph. In each case, we will compute a large number of artificial routes and study the same properties as the ones we studied on real routes. Therefore, the evaluation of each set of results is done by comparing the obtained plots to the ones discussed in Section 3, and given in Fig. 5. Finally, the evaluation of the RDM model depends on a parameter, namely the deviation probability p.

Fig. 5. Statistics for the original skitter routes. Left: length distributions of routes and shortest paths, and distribution of the difference delta between the length of each route and the corresponding shortest path length. Middle: degree evolution along routes of length 15 (same plot as in Fig. 2(left)). Right: hop directions along 15-hop routes (F: forward, S: stable, B: backward).
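The three statistics of Fig. 5 can be computed per route with a single breadth-first search from the source. The sketch below is illustrative only: `bfs_distances` and `route_statistics` are hypothetical helper names, the graph is assumed to be a connected adjacency dictionary, and a hop is classified as forward, stable, or backward according to whether the shortest-path distance from the source increases, stays equal, or decreases, which is an assumed reading of the F/S/B labels.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def route_statistics(adj, route):
    """Length, delta, degree evolution and hop directions of one route."""
    dist = bfs_distances(adj, route[0])
    length = len(route) - 1                      # number of hops
    delta = length - dist[route[-1]]             # excess over the shortest path
    degrees = [len(adj[v]) for v in route]       # degree of each node in order
    directions = []
    for u, v in zip(route, route[1:]):
        step = dist[v] - dist[u]                 # assumed F/S/B criterion
        directions.append("F" if step > 0 else "S" if step == 0 else "B")
    return {"length": length, "delta": delta,
            "degrees": degrees, "directions": directions}
```

Aggregating these per-route values over a large set of routes gives the kind of distributions and per-hop proportions that the plots summarize.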

The choice of a value for the deviation probability p might be expressed as a function of the expected average path length. However, studying this is out of the scope of this paper; instead, we took the same value for all the experiments, p = 0.2, which was chosen empirically to give the best average fit when the RDM is compared to the original skitter routes on the skitter graph with the same sources and destinations. Tuning its value for the best fit in the other cases would also be relevant, but we observed that the results do not vary significantly as long as the value is not too different. We therefore kept the same value in order to make the presentation and the interpretation easier.

5.1. skitter Graph


Figs. 6 and 7 show the results obtained with both models on the skitter graph, when one takes the very same sources and destinations as in the original data, and when one chooses sources and destinations at random, respectively. Before entering into the details, let us notice that the results seem very good: the global shapes of all the plots fit the original ones quite well for both models, even when sources and destinations are taken at random.

The average route lengths are 13.6 with the RDM and 14.7 with the NDM, when the sources and destinations are the original ones. They are 15.1 and 14.9 when sources and destinations are random. This is to be compared to the average shortest path length in this graph, 11.4, and to the average length of real routes, 15.6. We may conclude that the average route length is quite well captured, though not exactly. In all the cases the route length distributions are symmetric, have an average somewhat higher than that of the shortest path distribution, and have tails similar to the actual route length distribution shown in Fig. 5. Lengths of paths generated with the NDM tail off somewhat quicker than in reality (approaching zero closer to length 20 than length 25), but the degree of fidelity is nonetheless remarkable given that the length distributions are not explicitly part of the model. This indicates that this model probably captures some relevant properties. The RDM generates more routes that are shortest paths than in reality (roughly 30% compared to roughly 20%), whereas the NDM generates somewhat fewer (roughly 12%).

The NDM performs better than the RDM in capturing the evolution of the degree along routes, especially close to the source. This is particularly true when using the same sources and destinations as in the original measurement. The difference is less significant with random ones. This indicates that there are more possible choices for routing close to the source, which is probably a bias due to the measurement itself (the map is more precise close to the sources than close to the destinations). The fact that the RDM performs well on average (random sources and destinations) indicates that the shortest path to the destination generally goes to a highest degree neighbor. If one takes the same sources as in the original data, however, this is not true anymore and the NDM performs better.

Now focusing on the hop directions in 15-hop routes, it appears clearly that the RDM behaves much better than the NDM. Both capture qualitatively the properties of real routes, but the behavior of the RDM is very similar to the original one. Overall proportions of forward, stable, and backward hops closely match reality in both cases: 88%, 8% and 4% for the RDM and 84%, 9% and 7% for the NDM when we take the original sources and destinations, and 89%, 7% and 4% for the RDM and 82%, 11% and 7% for the NDM when we take random sources and destinations. The proportions for the original routes were 87% forward, 8% stable, and 5% backward.

Fig. 6. Models on the skitter graph with the same sources and destinations as in the original measurement. From left to right: length distributions, degree evolution along routes, and hop directions. (Panel rows: RDM and NDM.)

Fig. 7. Models on the skitter graph with random sources and destinations. From left to right: length distributions, degree evolution along routes, and hop directions. (Panel rows: RDM and NDM.)

5.2. Mercator graph
We ran our models on two Internet maps provided by other researchers. For one of these maps, both the routing and the IP levels were provided. We took this as an opportunity to test the robustness of our models to a change from the IP level to the routing one. Moreover, still for this dataset, the routes were also provided. Therefore, we computed the statistics on them. The results are presented and discussed in this section and the next one.

The first case we will consider is the mercator graph studied by Govindan et al. [19], which is freely available on the web [42]. This graph was obtained in 1999 using traceroute massively from one source only but with source routing. Some antialiasing has been done in order to bring it closer to the routing graph. These data correspond to the very beginning of the research on large scale Internet topology; they may contain significant bias and errors, but this is still one of the very few maps publicly available, and it is widely used.

We ran our models on it and obtained the results in Fig. 8. Since the real routes used to construct this map are not available, we could not compare the artificial routes to them. The obtained results are in accordance with the properties of skitter routes concerning path lengths and hop directions. However, the degree evolution along routes is significantly different. We believe that this is due to the fact that, using only one source (and despite the use of source routing), the graph has a tree-like structure with high degree nodes close to the root (i.e., the source of all collected routes). The routes therefore go up this tree, encountering nodes with higher and higher degree, and then go further to the destination. The non-trivial behaviors of route lengths and hop directions would then be a consequence of the links which prevent the map from being exactly a tree.

Fig. 8. Models on the mercator graph with random sources and destinations. From left to right: length distributions, degree evolution along routes, and hop directions. (Panel rows: RDM and NDM.)

5.3. Nec graph
Although it was also obtained through massive use of traceroute, the map described by Magoni [43], which is freely available on the web [44], relies on a quite different measurement method. It is based on the use of so-called looking-glasses, which makes it possible to use several hundreds of sources. However, to avoid an overload of these sources, the number of destinations also has been reduced to a few hundreds. Moreover, many destinations are routers, whereas in the other maps they generally are hosts. As we will see, this has important consequences for route properties.

This dataset however has the important advantage of being available both at the routing level and at the IP one [44]. Moreover, Magoni provided us with the actual routes he used to construct it. This gives us the opportunity to study the statistical properties of these routes, just like we did with the skitter ones. It also makes it possible to compare the properties of interest at the IP and routing levels. Fig. 9 plots the properties of these real routes at the IP level, and Fig. 10 plots them at the routing level.

One may be surprised by the fact that the properties of these real routes differ significantly from the ones of skitter: the lengths are smaller, the degree does not grow rapidly at the beginning of the route and does not decrease rapidly at the end, and, even more strikingly, many (and even most) of the hops are not forward at the end of the route. This can however be explained simply by two complementary facts. First, the destinations of these routes often are routers (not hosts), which is equivalent to saying that these routes are only the beginning of host-to-host routes (unlike the skitter ones). Moreover, the neighborhood of the destinations is much better explored than in the skitter graph because of the large number of sources. Therefore, it is more dense, and this makes the number of forward hops decrease.

Fig. 9. Original nec routes at IP level. From left to right: length distributions, degree evolution along routes, and hop directions.

Fig. 10. Original nec routes at routing level. From left to right: length distributions, degree evolution along routes, and hop directions.

The fact that this topology exists at the routing level allows us to check an important assumption we have made at the beginning of the paper: that the plots at the IP and the routing levels are very similar. This tends to confirm that our choice to stay at the IP level is relevant.

In order to push the evaluation of our models further, let us now study the properties of artificial routes generated using them, from random sources and destinations (the use of the same sources and destinations as in the original data gives very similar results, so we do not present them here). Figs. 11 and 12 show these properties for the nec graphs at the IP and the routing levels respectively. Again, these plots confirm that, as long as one is concerned with the simple statistics and models we propose here, there is no significant difference between the IP and the routing levels. Moreover, one can see that the models tend to simulate routes that resemble host-to-host routes, and therefore produce routes which are much more similar to the skitter routes than the original nec routes are. This may be considered as a good point for our models, which may be applied on other graphs than the skitter one and which are able to use the properties of the underlying graph to produce realistic routes.

Fig. 11. Models on the nec graph at IP level with random sources and destinations. From left to right: length distributions, degree evolution along routes, and hop directions. (Panel rows: RDM and NDM.)

Fig. 12. Models on the nec graph at routing level with random sources and destinations. From left to right: length distributions, degree evolution along routes, and hop directions. (Panel rows: RDM and NDM.)

5.4. Random graphs
We begin the evaluation of our route models on graph models with the simplest one, the classical random graphs of Erdős and Rényi [45,46]. Such a graph is constructed from n disconnected nodes by adding links between m randomly chosen pairs of nodes. Here, we took for n and m the same values as in the original skitter graph, in order to have a random graph comparable to this original one. It is well known that the Internet is significantly different from a random graph, in particular concerning its degree distribution (see for instance [27]). We nevertheless consider this model an interesting case because it is the simplest and it is often used as a building block of more intricate models.

Fig. 13 shows the results obtained with our models on such a graph (they are representative of all the experiments we ran on such graphs). The sources and destinations are chosen at random. Both the degrees and the shortest path lengths in a random graph are very homogeneous [46,47]: all the nodes have a degree close to the average value, and all the pairs of nodes are at a distance close to the average distance. This is confirmed by the plot of the shortest path length distribution. Moreover, with each model, the degree along a route is very stable due to the low variability of degrees in the graph: the first and last nodes have the average degree since they are chosen at random, and all the nodes in between are chosen with a probability proportional to their degree, which explains why their degree is larger than the average degree but quite stable.

The RDM produces routes with very rare deviations, since most of the time no deviation at all is possible because of the low average degree of nodes (no deviation at all is possible if the degree of a node is lower than 3, which is often the case as one can check on the plot of the degree along the routes). Therefore the routes produced by this model are mostly shortest paths, which explains the statistics.

The NDM produces routes with properties closer to the ones of real routes: the length distribution is different from shortest paths, and not all the hops are forward. One can have quite a precise idea of the structure of the produced routes by noticing that, since all the degrees are close to the mean value, the route rapidly reaches the place where it becomes a shortest path. Therefore, a route produced by this model is nothing but very few hops towards higher degree nodes, then a shortest path, and again a few hops to degrees in decreasing order to the destination. This explains the fact that the length distribution of these routes is close to the one of shortest paths, it describes precisely the degree evolution along routes, and finally it explains the observed hop directions.

In both cases, the produced routes are quite different from real ones. Since the underlying graph has properties qualitatively different from the ones of the IP graph, this cannot be seen as surprising.

Fig. 13. Models on a purely random graph with random sources and destinations. From left to right: length distributions, degree evolution along routes, and hop directions. (Panel rows: RDM and NDM.)

5.5. Scale-free graphs
We now examine how the models behave on scale-free graphs, i.e. graphs with a power law degree distribution as obtained using the Albert and Barabási model [48,32]. Such a graph is constructed by adding nodes one by one until we have the wanted number of nodes, each new node being linked at random to k pre-existing nodes with a probability proportional to their degree. The value of k is chosen in order to induce the wanted number of links at the end of the construction (it is half the average degree). Tangmunarunkit et al. found [7] that power law based generators create topologies that better match the Internet's topology than do other common sorts of graphs, such as those produced by explicitly hierarchical topology generators. Despite the simplicity of this model, it captures important features of the Internet topology, and the models we proposed may be relevant on it. Moreover, it is very often used to model the Internet [34,37] and as a building block for more accurate models (see below). Again, we chose the parameters to fit the number of nodes and links of the original skitter graph (k = 1.4), in order to obtain a comparable graph. We chose sources and destinations at random, and obtained the results plotted in Fig. 14. They are representative of all the experiments we ran on such graphs.

First notice that scale-free graphs have a very low average shortest path length in general [49–51], here 7.7, as can be seen in the length distributions. Our models produce longer routes, but they remain quite short. This leads us to consider statistics on routes of length 8 or 10 depending on the model, which are the most numerous. Both models clearly fail in capturing the degree evolution along routes in such a graph. The highest degree nodes are always reachable, as can be seen on the plot of the degree evolution along routes from the NDM. This induces a regular increase in the degrees along such routes until a very high degree node, and then a decrease until it reaches the destination. Notice also that a random deviation tends to go towards high degree nodes (they have more links and thus a randomly chosen link has a high probability to be connected to such a node), which explains the degree evolution along routes from the RDM.

Finally, let us observe that the RDM captures surprisingly well (compared to the other statistics and to the other model) the hop directions. It might be seen as a consequence of several facts. First, during the construction of the parts of the route close to its extremities (the source and the destination), one has very few choices for the next hop due to the low degree of nodes in this part of the graph. On the contrary, when constructing the parts of the route far from its extremities, one has many choices. The fact that the NDM performs poorly indicates that choosing the highest degree neighbor at this point is inconsistent with the observed properties; random choices perform better.

Fig. 14. Models on a scale-free graph with random sources and destinations. From left to right: length distributions, degree evolution along routes, and hop directions. (Panel rows: RDM and NDM.)

5.6. Brite graphs
BRITE [52,53] is one of the most widely used models in network simulation, in particular in Internet simulation. We therefore used it to generate a variety of graphs supposed to be good approximations of the skitter graph (in terms of size and degree distribution at least), and ran our models on them. Two cases should be considered:

• a flat topology, which is simply a scale-free graph as described above. However, since BRITE needs an integer value for the number k of links added at each step (the original definition of the model did not specify what to do when k is non-integer), we had the choice between k = 1 and k = 2. In the first case, one obtains a tree, in which our models produce nothing but shortest paths. We therefore obtain trivial statistics (length distributions are the same, degree evolutions grow to a maximum and then decrease, and there are only forward hops). We therefore took k = 2 and then obtained results very similar to the ones described above for k = 1.4. Therefore, we do not detail experiments on flat topologies here.
• a hierarchical topology with nodes distributed in autonomous systems. BRITE first generates the AS topology with the scale-free model already described, and then the topology inside each AS is generated using this model again. The obtained degree distribution follows a truncated power law, meaning that the degrees are heterogeneous but there is no node with very high degree. We generated such a topology with n = 900,000 nodes distributed in 9000 autonomous systems (100 routers per AS). At the AS level we chose k = 10 and inside each AS k = 1. This leads to an average degree of approximately 2.2. One may also use the purely random model at one level or the other, or both. We present here the parameters which gave the best results, plotted in Fig. 15.

Fig. 15. Models on a hierarchical BRITE graph with random sources and destinations. From left to right: length distributions, degree evolution along routes, and hop directions. (Panel rows: RDM and NDM.)

The performances obtained in these experiments are very poor, and there is little hope that other parameters would give significantly better results. Indeed, the fact that k = 1 inside each AS causes these graphs to be trees. Therefore, most routes actually are shortest paths, which explains the statistics. Larger values of k should be considered, but BRITE forces them to be integers, and k = 2 gives an average degree significantly too large for Internet modeling. Moreover, one can clearly see on the plot of the degree evolution that the two-level structure induces quasi-periodic variations in the node degrees, which does not fit the properties met in practice.
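The two basic graph models used in Sections 5.4 and 5.5 are easy to reproduce. The following Python sketch is a hedged illustration rather than the generators actually used in the experiments: `random_graph` and `scale_free_graph` are hypothetical names, the random graph is assumed to use m distinct pairs of distinct nodes, and the scale-free construction is assumed to start from a small clique on k + 1 nodes and to accept integer k only (the non-integer case, such as k = 1.4, is not covered here).

```python
import random


def random_graph(n, m, rng=random):
    """Random graph with n nodes and m links between random pairs of nodes."""
    if m > n * (n - 1) // 2:
        raise ValueError("too many links requested")
    links = set()
    while len(links) < m:
        u, v = rng.sample(range(n), 2)          # two distinct nodes
        links.add((min(u, v), max(u, v)))       # store each link once
    return links


def scale_free_graph(n, k, rng=random):
    """Preferential attachment: each new node links to k existing nodes."""
    # Assumed seed graph: a clique on the first k + 1 nodes.
    links = {(u, v) for u in range(k + 1) for v in range(u + 1, k + 1)}
    # Each node appears in `ends` once per incident link, so a uniform draw
    # from `ends` picks a node with probability proportional to its degree.
    ends = [x for link in sorted(links) for x in link]
    for new in range(k + 1, n):
        targets = set()
        while len(targets) < k:                 # k distinct existing nodes
            targets.add(rng.choice(ends))
        for t in targets:
            links.add((t, new))
            ends.extend((t, new))
    return links
```

With integer k, the scale-free construction yields roughly k links per node, i.e. an average degree close to 2k, consistent with k being half the average degree.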

6. Conclusion and discussion
The first contribution of this paper is to provide a framework for describing routes in the Internet, and to use it to describe routes in one of the largest and most complete data sets currently available. The characteristics we have used to describe routes are: their lengths, and the differences between those lengths and the lengths of corresponding shortest paths; the direction of hops along a route; and the evolution of the degree of nodes along a route. We have chosen these characteristics based upon graph theoretic knowledge of the typical properties of real-world complex network graphs, of which the Internet is an example. Let us notice that these characteristics are very general and may be used (and extended) with benefit in other complex network studies: until now, no statistical tool had been proposed to describe large sets of paths in such networks.

Other graph theoretic characteristics may also be studied in the manner we have done here. The evolution of the node clustering coefficient along a route would be a natural candidate, for instance. One may also study the link clustering coefficient:

\[ cc(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} \]

where (u, v) is a link in the graph.

Other interesting perspectives are to consider the routes as directed (from sources to destinations), the links as weighted (by the measured delay), and to take into account the dynamics of the Internet and its routes. Paxson [2,3] and, more recently, Amini et al. [54] have characterized the asymmetry of routes in the Internet. Likewise, we have focused on the topological characteristics of Internet routes. Could we tie this in to the considerable body of knowledge concerning the delay characteristics of routes? Savage et al. [55] and Spring et al. [4], for instance, have characterized round-trip time (RTT) inflation. These works need to be continued, and describing these important characteristics in a way similar to what we have done here for static unweighted undirected routes would certainly make sense.

The other main contribution of this paper is to propose simple models which make it possible (and easy) to generate large amounts of artificial routes similar to real ones (in the sense of the statistical properties we have observed). These models may be used in particular for simulation. We have shown both that these models capture non-trivial features (the obtained routes are not shortest paths) and that they fit real-world data well. This last point however depends on the underlying graph and its properties. If we consider the original graph, then the results are in very good accordance with the real-world data. If we take other Internet maps, then the results remain very good. If we turn to graph models, however, the results are very poor. This indicates that the degree of fidelity of our models relies on some properties which are not captured by these graph models, thus confirming that there is still much to be done for the accurate modeling of Internet topology.

It would also make sense to model the fact that routes are directed, dynamic and weighted. The NDM is static and undirected by nature: it always produces the same route from a given node to another (except if there is a choice between several shortest paths in the middle). The RDM, on the contrary, already contains dynamics and a notion of direction. The route obtained may vary from one instance to the next. However, much remains to be done to model these characteristics.

We have also shown that the properties of the graph used to model the Internet have a crucial impact on the performance of our models. We explained most of the influence of the graphs on the models, which leads us to conclude that any model would perform poorly because graph models are still not accurate enough to actually contain routes with the properties we captured.
This is an important observation, which supports the following points:
• first, it would make sense to conduct experiments on more intricate models, such as those using Li et al.'s first-principles approach [9], in order to determine if they are, finally, accurate enough, or rather confirm our conclusion that current models still miss some important statistical properties;
• second, the most relevant models of the Internet topology seem to be the real-world maps obtained by actual measurement. Simulation should therefore be run on such graphs, but also on models which have the advantage of being well understood, which in turn makes it possible to interpret the observed phenomena.
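The link clustering coefficient suggested earlier in this section as a further route characteristic is straightforward to evaluate. The sketch below assumes that N(u) denotes the set of neighbors of u and that the graph is given as an adjacency dictionary; the function names `link_clustering` and `clustering_along_route` are hypothetical and serve only to illustrate the formula.

```python
def link_clustering(adj, u, v):
    """cc(u, v) = |N(u) & N(v)| / |N(u) | N(v)| for a link (u, v)."""
    nu, nv = set(adj[u]), set(adj[v])
    # The union is never empty for a link: it contains at least u and v.
    return len(nu & nv) / len(nu | nv)


def clustering_along_route(adj, route):
    """Link clustering coefficient of each successive hop of a route."""
    return [link_clustering(adj, u, v) for u, v in zip(route, route[1:])]
```

Averaging the per-hop values over all routes of a given length would give a clustering evolution along routes, analogous to the degree evolution studied in this paper.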

Finally, this study has restricted itself to the IP graph (though we have made a comparison with the routing graph in the case of the nec graph). As we mention in the Introduction, measurements of the AS graph are also available, and it is well known that much of path inflation can be explained by decisions taken at the inter-autonomous-system level. Undertaking the same kinds of analysis and modeling as we have done, but at the AS level, would certainly be interesting. Moreover, relating the results at one level to those at the other would significantly improve our understanding of Internet routes, and of the Internet in general.

Acknowledgements

We thank K.C. Claffy and the staff at CAIDA for making the skitter data available to us. We thank Ramesh Govindan and Damien Magoni for providing useful data. No such study would be possible without the real-world data these colleagues collect and make available. We also thank Mark Crovella and Miriam Sofronia for their helpful comments. This work is supported in part by the RNRT’s Metropolis and the ACI Sécurité et Informatique’s MetroSec projects.

References

[1] V. Jacobson, Traceroute, 1989. ftp://ftp.ee.lbl.gov/traceroute.tar.gz. Also see NANOG traceroute, ftp://ftp.login.com/pub/software/traceroute/.
[2] V. Paxson, End-to-end routing behavior in the Internet, in: Proceedings of ACM SIGCOMM, 1996.
[3] V. Paxson, End-to-end routing behavior in the Internet, IEEE/ACM Trans. Network. 5 (5) (1997) 601–615 (see also Proceedings of ACM SIGCOMM 1996).
[4] N. Spring, R. Mahajan, T. Anderson, Quantifying the causes of path inflation, in: Proceedings of ACM SIGCOMM, 2003.
[5] H. Tangmunarunkit, R. Govindan, S. Shenker, Internet path inflation due to policy routing, in: Proceedings of SPIE ITCom, 2001.
[6] H. Tangmunarunkit, R. Govindan, S. Shenker, D. Estrin, The impact of routing policy on internet paths, in: Proceedings of IEEE Infocom, 2001.
[7] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, W. Willinger, Network topology generators: degree-based vs. structural, in: Proceedings of ACM SIGCOMM, 2002.
[8] P. Barford, A. Bestavros, J. Byers, M. Crovella, On the marginal utility of network topology measurements, in: Proceedings of Internet Measurement Workshop (IMW), 2001.
[9] L. Li, D. Alderson, W. Willinger, J. Doyle, A first-principles approach to understanding the internet’s router-level topology, in: Proceedings of ACM SIGCOMM, 2004.
[10] K. Fall, K. Varadhan, The ns Manual, 12, 2003.

[11] B. Huffaker, D. Plummer, D. Moore, K. Claffy, Topology discovery by active probing, in: Proceedings of the Symposium on Applications and the Internet (SAINT), January, 2002.
[12] CAIDA, skitter. http://www.caida.org/tools/measurement/skitter/.
[13] A. Lakhina, J. Byers, M. Crovella, P. Xie, Sampling biases in IP topology measurements, in: Proceedings of IEEE Infocom, 2003.
[14] G.F. Riley, M.H. Ammar, R. Fujimoto, Stateless routing in network simulations, in: MASCOTS, 2000, pp. 524–531.
[15] J.-L. Guillaume, M. Latapy, Relevance of massively distributed explorations of the Internet: simulation results, in: Proceedings of IEEE Infocom, 2005.
[16] A. Clauset, C. Moore, Traceroute sampling makes random graphs appear to have power law degree distributions, in: Proceedings of the International Symposium on Theory of Computing (STOC), 2005.
[17] D. Meyer, University of Oregon Route Views Project. http://www.antc.uoregon.edu/route-views/.
[18] J.J. Pansiot, D. Grad, On routes and multicast trees in the Internet, ACM SIGCOMM Computer Commun. Rev. 28 (1) (1998) 41–50.
[19] R. Govindan, H. Tangmunarunkit, Heuristics for internet map discovery, in: Proceedings of IEEE Infocom, 2000.
[20] N. Spring, R. Mahajan, D. Wetherall, Measuring ISP topologies with Rocketfuel, in: Proceedings of ACM SIGCOMM, 2002.
[21] N. Spring, M. Dontcheva, M. Rodrig, D. Wetherall, How to resolve IP aliases, Technical Report 04-05-04, Washington University of Computer Science, 5, 2004.
[22] R. Teixeira, K. Marzullo, S. Savage, G. Voelker, In search of path diversity in ISP networks, in: Proceedings of Internet Measurement Conference (IMC), 2003.
[23] K. Keys, iffinder, a tool for mapping interfaces to routers. http://www.caida.org/tools/measurement/iffinder/ (restricted access).
[24] A. Broido, K.C. Claffy, Internet topology: connectivity of IP graphs, in: Proceedings of SPIE International Symposium on Convergence of IT and Communication, 2001.
[25] IANA, Special-use IPv4 addresses, RFC 3330, Internet Engineering Task Force, September, 2002.
[26] S. Jin, A. Bestavros, Small-world internet topologies, Technical Report BUCS-TR-2002-004, Boston University of Computer Science, 2002.
[27] M. Faloutsos, P. Faloutsos, C. Faloutsos, On power-law relationships of the internet topology, in: Proceedings of ACM SIGCOMM, 1999.
[28] A. Vazquez, R. Pastor-Satorras, A. Vespignani, Internet topology at the router and autonomous system level, [cond-mat/0206084].
[29] A. Vazquez, R. Pastor-Satorras, A. Vespignani, Large-scale topological and dynamical properties of the Internet, Phys. Rev. E 65 (2002) 066130.
[30] L.A. Adamic, The small world web, in: S. Abiteboul, A.-M. Vercoustre (Eds.), Proceedings of the Third European Conference Research and Advanced Technology for Digital Libraries ECDL, vol. 1696, Springer-Verlag, 1999.
[31] A.Z. Broder, S.R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, J.-L. Wiener, Graph structure in the web, Computer Networks 33 (1–6) (2000) 309–320.

[32] R. Albert, A.-L. Barabási, Statistical mechanics of complex networks, Rev. Modern Phys. 74 (2002) 47.
[33] M.E.J. Newman, The structure and function of complex networks, SIAM Rev. 45 (2) (2003) 167–256.
[34] L. Dall’Asta, I. Alvarez-Hamelin, A. Barrat, A. Vazquez, A. Vespignani, A statistical approach to the traceroute-like exploration of networks: theory and simulations, in: Proceedings of the First International Conference on Combinatorial and Algorithmic Aspects of Networks (CAAN), 2004, full version to appear in Theoretical Computer Science (TCS).
[35] R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance in complex networks, Nature 406 (2000) 378–382.
[36] B.J. Kim, C.N. Yoon, S.K. Han, H. Jeong, Path finding strategies in scale-free networks, Phys. Rev. E 65 (2002) 027103.
[37] R. Cohen, K. Erez, D. ben Avraham, S. Havlin, Breakdown of the internet under intentional attack, Phys. Rev. Lett. 86 (2001) 3682–3685.
[38] R. Cohen, K. Erez, D. ben Avraham, S. Havlin, Resilience of the internet to random breakdown, Phys. Rev. Lett. 85 (2000) 4626.
[39] D.S. Callaway, M.E.J. Newman, S.H. Strogatz, D.J. Watts, Network robustness and fragility: percolation on random graphs, Phys. Rev. Lett. 85 (2000) 5468–5471.
[40] J.-L. Guillaume, M. Latapy, C. Magnien, Comparison of failures and attacks on random and scale-free networks, in: Proceedings of the 8th International Conference on Principles of Distributed Systems (OPODIS), 2004. http://www.liafa.jussieu.fr/~latapy/Publis/.
[41] T. Walsh, Search in a small world, in: Proceedings of IJCAI, 1999.
[42] R. Govindan, H. Tangmunarunkit, Mercator. http://www.isi.edu/scan/mercator/mercator.html.
[43] D. Magoni, M. Hoerdt, Internet core topology mapping and analysis, Computer Commun. 28 (2005) 494–506.
[44] D. Magoni, nec (network cartographer). https://dptinfo.u-strasbg.fr/magoni/nec/.
[45] P. Erdos, A. Renyi, On random graphs I, Publ. Math. Debrecen 6 (1959) 290–297.
[46] B. Bollobas, Random Graphs, Academic Press, 1985.
[47] S.N. Dorogovtsev, J.F.F. Mendes, Evolution of networks, Adv. Phys. 51 (2002) 1079–1187.
[48] A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509–512.
[49] L. Lu, The diameter of random massive graphs, in: ACM-SIAM (Ed.), 12th Annual Symposium on Discrete Algorithms (SODA), 2001, pp. 912–921.
[50] M.E.J. Newman, D.J. Watts, S.H. Strogatz, Random graphs with arbitrary degree distributions and their applications, Phys. Rev. E (2001).
[51] R. Cohen, S. Havlin, Scale free networks are ultrasmall, Phys. Rev. Lett. (2003) 90.
[52] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE, Boston University Representative Internet Topology Generator. http://www.cs.bu.edu/brite/.
[53] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE: an approach to universal topology generation, in: Proceedings of MASCOTS, 2001.

[54] L.D. Amini, A. Shaikh, H.G. Schulzrinne, Issues with inferring Internet topological attributes, in: Proceedings of SPIE ITCom, 2002.
[55] S. Savage, A. Collins, E. Hoffman, J. Snell, T. Anderson, The end-to-end effects of Internet path selection, in: Proceedings of ACM SIGCOMM, 1999.

Jérémie Leguay received a Master of Science in Computer Science at Linköping University in Sweden. He worked on the description and simulation of Internet routes during his master’s thesis. Since 2004, he has been a Ph.D. candidate at the Computer Science laboratory (LIP6) of Pierre & Marie Curie University and at Thales Communications, where he conducts research in ad hoc and delay tolerant networking.

Timur Friedman received the Ph.D. degree in computer science from the University of Massachusetts Amherst in 2001. He is currently a Maître de Conférences (assistant professor) of computer science at the Pierre et Marie Curie University in Paris, and a researcher at the Laboratoire d’Informatique de Paris 6 (LIP6). His research interests include large scale network measurement systems and disruption tolerant networking.

Matthieu Latapy is a permanent researcher at LIAFA (CNRS and University Paris 7). He completed his Ph.D. in computer science at University Paris 7 in 2001, and obtained his current position in 2002. His research focuses on the (very) large graphs met in practice. These include in particular graphs originating from computer science, such as the Internet topology, the web graph, peer-to-peer overlays and data exchanges, as well as many other cases, such as social, biological and linguistic networks. He is involved in both theoretical and practical studies of these objects: he works on the measurement of such complex networks, their analysis, their modeling, and the related algorithmics. He is the head of the national initiative aimed at coordinating French studies on large complex networks. He leads a national project about social networks on the Internet, involving social and computer scientists. He also contributes to several other projects, among which MetroSec (Metrology of the Internet for security and quality of services, http://www.laas.fr/METROSEC/) and the European COST 295 Dynamo (Dynamic Communication Networks, http://cost295.net/).

Kavé Salamatian is an associate professor at Pierre & Marie Curie University and a researcher at LIP6. His main areas of research cover networking information theory and Internet measurement and modeling. He graduated in 1998 from Paris Sud – Orsay University with a Ph.D. in computer science.