Chapter 1

State of the art and trends in information networks modeling

1.1  Self-similarity and fractals in traffic modeling

1.1.1  Building of a framework

One of the key issues in measurement-based network control is to predict traffic in the next control time interval based on the online measurements of traffic characteristics. The goal is to forecast future traffic variations as precisely as possible, based on the measured traffic history. Traffic prediction requires accurate traffic models that can capture the statistical characteristics of the actual traffic. Inaccurate models may overestimate or underestimate the network traffic.

In a landmark paper, Leland et al. (1993) reported the discovery of self-similarity (SS) in the local-area network (LAN) traffic, more precisely the Ethernet traffic. Specifically, they studied two separate processes: the number of bytes arriving per time interval (a.k.a. byte count process), and the number of IP packets (a.k.a. packet count process).

Since packets can vary widely in size (40–1500 bytes in the case of Ethernet), it is in principle conceivable that the two processes would be quite different. However, the paper shows that they are both self-similar and also display fractal behavior according to the changing network conditions. Since then, self-similarity, fractals, and long-range dependence (LRD) have emerged as powerful tools for modeling traffic behavior and have brought a significant change in the understanding of network traffic.

It has been found in numerous studies (Crovella and Bestavros 1997, Willinger et al. 1997, Grossglauser and Bolot 1999, Ostring and Sirisena 2001 are among the most representative) that data traffic in high-speed networks exhibits self-similarity that cannot be captured by classical models, necessitating the development of self-similar models.

The problem with self-similar models is that they are computationally complex. Their fitting procedure is very time-consuming, and their parameters cannot be estimated from on-line measurements. The main objective of this book is to discuss the problem of traffic prediction in the presence of self-similarity and, in particular, to describe a number of short-memory and long-memory stochastic traffic models and to associate them with non-model-based predictors, based on the following criteria:

1.  Accuracy: The most important criterion for choosing a predictor is the quality of its predictions, since the goal of the predictor is to closely model the future.

2.  Simplicity: In order to achieve a real-time predictor, a certain level of simplicity is necessary. Simplicity has an intrinsic value because of the ease of use and implementation.

3.  On-line: Most traffic modeling has been done for off-line data. In reality, for network control, we want to use on-line measurements to forecast the future. We do not know anything about the underlying traffic structure; instead, we must estimate the predictor parameters from on-line measurements.

4.  Adaptability: A good predictor should adapt to changing traffic. As time progresses, more samples become available, and therefore more information is known about the traffic characteristics. An adaptive traffic predictor should use this new information to improve and update its parameters.

The statistical characteristics of network traffic have been studied especially for understanding the factors that influence the performance and the scalability of large systems. Hence, there exist a number of studies related to the evidence of self-similarity in modern communication networks (Willinger et al. 1996, Cao et al. 2000, Floyd and Paxson 2001, Sikdar and Vastola 2001), covering ON/OFF models for heavy-tailed distributions, user behavior, back-off algorithms, router buffering, TCP algorithms for congestion avoidance, etc.

1.1.2  Mathematical background of self-similar processes

1.1.2.1  Stable distributions and semistable processes

Let us begin with the central limit theorem (due to DeMoivre and Laplace):

Let {X_j} be a sequence of independent and identically distributed (i.i.d.) random variables (RVs) having finite mean μ and finite variance σ². Denote S_n = X_1 + ⋯ + X_n, and let N(0,1) be the normal distribution with mean 0 and variance 1. Then, the RV Z_n = (S_n − nμ)/(σ√n) converges to N(0,1) in distribution as n → ∞.

Note that the shifting and scaling parameters depend on n. A compact and general form of the theorem can be written as

Z_n = a_n S_n + b_n.

(1.1)

This is not very interesting because no matter what the initial distribution, the limit is the same: N(0,1). However, if we drop the assumption of finiteness of σ, Z_n can have as its limit any member of an entire family of distributions, the so-called stable distributions. The explanation of this occurrence is the following: if the common distribution X of the i.i.d. RVs X_j is a stable one, there exists at least one choice of sequences {a_n}, {b_n} such that Z_n = a_n S_n + b_n has the same distribution as X for every n. In other words, X is stable under the "sum, shift, scale" algorithm defined above. The theory of stable distributions attained maturity in the 1950s.
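As a numerical illustration of the heavy-tailed case, the following sketch (in Python, written for this discussion and not drawn from any cited study) sums i.i.d. Pareto variables with infinite variance. Rescaling the centered sums by n^{1/α} instead of σ√n keeps the bulk of their distribution essentially unchanged as n grows, which is the stable-limit behavior described above.

import numpy as np

rng = np.random.default_rng(0)
alpha = 1.5                    # Pareto tail index: finite mean, infinite variance

def rescaled_sums(n, reps=10_000):
    # S_n for `reps` independent sums of n centered Pareto RVs, scaled by n^(1/alpha)
    x = rng.pareto(alpha, size=(reps, n)) + 1.0     # classical Pareto samples, x >= 1
    x -= alpha / (alpha - 1.0)                      # subtract the (finite) mean
    return x.sum(axis=1) / n ** (1.0 / alpha)

for n in (10, 100, 1000):
    z = rescaled_sums(n)
    # The central quantiles change little with n; under sqrt(n) scaling they would not.
    print(n, np.round(np.quantile(z, [0.25, 0.5, 0.75, 0.9]), 2))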

The next step in generalization was to drop the independence assumption, and here stochastic processes enter the picture. A stochastic process (SP) can be thought of as a set of RVs {X_i}, where the index i is referred to as the "time" parameter. In this context, the index will be either real (we say that the process has continuous time, and use the widespread notation X_t), real bidimensional (X_{s,t}), or an integer (discrete time, denoted X_n). The novelty consists in the fact that the individual RVs need not be independent; in fact, the dependency itself is what makes a SP interesting. In order to completely characterize a SP, in general, we need to specify all the finite-dimensional distributions of its RVs: unidimensional (individual distributions of X_1, X_2, etc.), bidimensional (joint distributions of X_1 and X_2, X_1 and X_3, X_2 and X_3, etc.), …, n-dimensional, and so on. Two SPs {X_t} and {Y_t} are said to be equal if all their finite-dimensional distributions are equal.

The connection with the stable laws is achieved through the so-called increment process. Let {X_t} be a SP in continuous time. For any pair s ≤ t, the difference X_t − X_s = X_{s,t} is a RV indexed by s and t. The set {X_{s,t}} is therefore a bidimensional SP in continuous time, called the increment process associated with {X_t}. Obviously, a given unidimensional SP uniquely defines a (bidimensional) increment process. The converse requires a little care, because more than one SP can have the same increment process. The simplest example is {X_t} and {X_t + C}, where C is a deterministic constant. To remove this impediment, one usually "fixes the origin" by assuming X_0 = 0, and this permits the exact recovery of {X_t} when {X_{s,t}} is known: X_t = X_{0,t}. The SP analogous to a stable distribution is a SP with independent increments, all of which have the same stable distribution.

The analogy goes even further: every time interval [0,t] can be divided into an arbitrary number n of equal subintervals, and Xt becomes a sum of n increments, just like Sn from the central limit theorem (the difference is that the increments are not independent any more). This type of argument led Lamperti in a much-quoted paper (Lamperti 1962) to define semistable processes as those satisfying:

{X_{rt}} = {a_r X_t + b_r}

(1.2)

where the notation emphasizes that a and b depend on r. Lamperti himself explains the term semistable: “The name is intended to suggest the analogy with the theory of stable laws, and is rendered more appropriate by the fact […] that a semistable process, if it is assumed to have independent increments, must actually be a stable one.” He then goes on to prove the fundamental property of this class of SPs: for any semistable process, the scaling parameter must be of the form:

a_r = r^α,   α ≥ 0

(1.3)

This power law has had a long, illustrious career.

1.1.2.2  First empirical processes with power-law statistics

In 1951, the British physicist-turned-hydrologist H. E. Hurst described the first instance of a natural process exhibiting power-law behavior (Hurst 1951). The analysis was based on several records of levels of water in the river Nile, the longest of which extends over no less than 663 years (622–1284 AD).

By the very nature of the measurement process, such empirical data series are discrete and finite. As an analytical tool, Hurst used the rescaled adjusted range (R/S) statistic. A brief description of R/S will immediately reveal its relevance for hydrology: Let a data series be represented by {x_0, x_1, x_2, …, x_N}. The lowercase letters are meant to emphasize that this is not (yet!) a stochastic process (SP), but rather an ordered set of deterministic, fixed values. For each index n ≤ N, the following steps are performed:

•  Compute the sample mean μ_n and sample standard deviation S(n) of the truncated series {x_0, x_1, x_2, …, x_n}. We expect these values to converge as n → ∞. If we imagine the data series to be a sample path of an underlying SP, we say that the SP is stationary.

•  For each k ≤ n, compute the rescaled value of the data series up to k:

A_k = x_0 + ⋯ + x_k − kμ_n. This step "extracts" the average from the data, leaving only the variations around the average. If x_k has the physical interpretation of the net annual amount of water that flows into a reservoir (positive or negative), the sum x_0 + ⋯ + x_k is the total quantity of water in the reservoir after k years, and A_k itself is the random variation of this quantity around its mean. In real life, the outflow of water from the reservoir is controllable, so the mean will ideally be kept equal to zero.

•  Compute MAXn = max[0, A1, …, An] and MINn = min[0, A1, …, An]. These are the extreme quantities of water that have to be stored in a period of n years.

•  Define R(n) = MAXn − MINn. This is the adjusted range, which is the capacity needed in the reservoir in order for it to neither underflow nor overflow due to the random variations in water input. Unlike S(n), R(n) does not in general converge. The intuitive explanation is that, as time (n) goes by, more and more "unlikely" excursions of A_k will occur, thus ever-increasing the capacity R(n) needed to cover them.

•  Define (R/S)_n = R(n)/S(n).

From what was said above, it is clear that (R/S)_n is not bounded as a function of n. Referring again to the underlying SP, (R/S)_n is itself a RV, so it makes sense to study its expectation E[(R/S)_n]. If E[(R/S)_n] is not bounded, the next question is: what function closely describes its dependence on n? Hurst's discovery was that E[(R/S)_n] behaves like a power law in n, that is, there are constants C and H such that the ratio E[(R/S)_n]/(Cn^H) → 1 as n → ∞. We write this as follows:

E[(R/S)_n] ~ n^H

(1.4)

H is now called the Hurst parameter of the respective SP. When trying to fit the process with any of the “classical” models (e.g., Poisson processes, ARMA time-series, etc.), it was relatively easy to prove that the H resulting from such models is exactly 0.5—at odds with the experimental value. When represented in logarithmic coordinates, Equation 1.4 is a curve with an asymptote of slope H. This can be used to estimate H.
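The R/S procedure described above is straightforward to implement. The sketch below (a simplified illustration in Python, not Hurst's original computation) estimates H as the slope of log E[(R/S)_n] versus log n over disjoint windows of increasing size; applied to i.i.d. noise it should return a value near 0.5, up to the well-known small-sample bias of this estimator.

import numpy as np

def rescaled_range(x):
    # (R/S) statistic of one window: adjusted range divided by the standard deviation
    x = np.asarray(x, dtype=float)
    a = np.concatenate(([0.0], np.cumsum(x - x.mean())))    # 0, A_1, ..., A_n
    r = a.max() - a.min()                                   # R(n) = MAX_n - MIN_n
    return r / x.std(ddof=1)

def hurst_rs(x, window_sizes):
    # Slope of log(mean R/S) versus log(window size), fitted by least squares
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        windows = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean([rescaled_range(w) for w in windows])))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

rng = np.random.default_rng(1)
print(hurst_rs(rng.normal(size=2**14), [16, 32, 64, 128, 256, 512]))   # about 0.5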

The second example was given by the economist Adelman (1965), who used Fourier transforms to analyze series of U.S. economic indicators collected since 1889. He found that all power spectra have a singularity (pole) at the origin, that is, the power at small frequencies is very high. (In the time domain, this corresponds to long cycles, the longest of which have periods close to the sample size N.) Again there was a discrepancy, since "classical" models would only yield power spectra that are flat at the origin.

1.1.2.3  From semistability to self-similarity

In 1965, Benoit Mandelbrot introduced self-similar SPs (the original French name was processus homothétiques à soi, while another widely used term is fractal processes). This is a model which finally captures the power-law behavior described earlier. (His most cited paper is the one published 3 years later in the SIAM Review [Mandelbrot and Van Ness 1968].) Referring directly to the increment process Xs,t = XtXs, he defines stochastic self-similarity as

X_{t_0, t_0+rt} = r^H X_{t_0, t_0+t},   ∀ t_0, t, r > 0.

(1.5)

Comparing with Equation 1.2, it is clear that Mandelbrot’s self-similar processes are semistable in the sense of Lamperti. Taking advantage of the similar initials, we can abbreviate both as SS. The similarity between the two concepts goes even further, since Mandelbrot constructs his SS process (fractional Brownian motion, fBm for short) starting with the usual Brownian motion (Bm), which is used as an example by Lamperti in his already cited paper.

Assuming that the reader is familiar with the properties of Bm, we only mention two properties, which are important for our discussion:

•  Bm has independent increments.

•  Bm is self-similar, with Hurst parameter H = 0.5.

Denoting Bm as B(t) and fBm as BH(t), here is a simplified version of Mandelbrot’s definition of the fBm: BH(0) = 0, H ∈ [0,1] and

B_H(t) − B_H(0) = (1/Γ(H + 1/2)) { ∫_{−∞}^{0} [ (t − s)^{H−1/2} − (−s)^{H−1/2} ] dB(s) + ∫_{0}^{t} (t − s)^{H−1/2} dB(s) }

(1.6)

The reader need not worry now about the technicalities of this definition (gamma function, stochastic integration), but rather should concentrate on the big picture. The integral from −∞ causes the current value of the process B_H(t) to depend on the entire history of the process, and this is the reason why, unlike the Bm, fBm does not have independent increments. For H > 0.5, the integrand goes to zero as s → −∞, so the "influence" of the distant past on the present is less than the "influence" of the more recent past (in this respect, we have something like a weighted average). The case H < 0.5 is much less understood, but, fortunately, without practical relevance as far as we know (Figure 1.1).

Returning to the definition of the fBm, it can be shown that the process in Equation 1.6 is indeed an SS process in the sense of Equation 1.5, and that its R/S statistic obeys the power law of Equation 1.4. For a deeper mathematical treatment of the fBm, the reader is directed to Embrechts and Maejima (2002).
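For experimentation, sample paths of fBm can be generated directly from its covariance function, Cov[B_H(s), B_H(t)] = (1/2)(s^{2H} + t^{2H} − |t − s|^{2H}), without evaluating the stochastic integral in Equation 1.6. The following Python sketch (an illustration under that standard covariance, not an implementation taken from the cited references) uses a Cholesky factorization; its O(n³) cost limits it to short traces.

import numpy as np

def fbm_sample(n, hurst, t_max=1.0, rng=None):
    # Sample B_H on a regular grid of n points via Cholesky of the fBm covariance.
    rng = rng or np.random.default_rng()
    t = np.linspace(t_max / n, t_max, n)                    # grid without t = 0
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2*hurst) + u**(2*hurst) - np.abs(s - u)**(2*hurst))
    chol = np.linalg.cholesky(cov + 1e-12 * np.eye(n))      # small jitter for stability
    path = chol @ rng.standard_normal(n)
    return np.concatenate(([0.0], path))                    # impose B_H(0) = 0

path = fbm_sample(1024, hurst=0.8)
fgn = np.diff(path)          # the increments form fractional Gaussian noise

For long traces, spectral (circulant-embedding) methods are usually preferred, but the Cholesky construction makes the role of the covariance function explicit.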

1.1.2.4  Self-similarity versus long-range dependence

The integral in Equation 1.6 points to another important property of SS processes: long-range dependence (LRD). Intuitively, LRD can be viewed as Adelman’s “long cycles”—correlations which, instead of “dying out” quickly, extend over long periods of time.

Mathematically, it is not as easy to agree on what LRD means, because several definitions (not all equivalent) exist. One of them uses the correlation function of the discrete SP: ρ(k) = Cov[X_t, X_{t+k}]/σ² is the autocorrelation at lag k. The SP is called LRD if there are constants α ∈ (0,1) and C > 0 such that


Figure 1.1 Self-similar aspects for unidimensional stochastic processes.

ρ(k) ≈ C·k^{−α},  k ≫ 1,  which we write simply as ρ(k) ~ k^{−α}

(1.7)

For comparison, note that the AR(1) process X_i = aX_{i−1} + ε_i, with |a| < 1, has the geometrically decaying autocorrelation ρ(k) ≈ a^k. It is known that such an exponential decay is faster than any power law of the type (1.7). More dramatically, the autocorrelation series Σ_{k=1}^{N} ρ(k) converges as N → ∞ for the geometric form, but diverges for the power law. It is proved that a fBm with parameter H is LRD in the above sense, and that H = 1 − α/2 ∈ (1/2, 1), or, equivalently, α = 2 − 2H ∈ (0,1) (Mandelbrot and Van Ness 1968). When represented in logarithmic coordinates, Equation 1.7 is called the correlogram of the process and has an asymptote of slope −α. This property can be used to estimate α (and therefore H), either graphically or through some analytic fitting procedure (e.g., least squares).

There are other definitions (see Beran 1994 for a thorough introduction), but we only mention another one, which uses the variance of the sample mean:

Var[(1/k) Σ_{i=1}^{k} X_i] ≈ C·k^{2H−2},  k ≫ 1,

(1.8)

or, denoting the sample mean by X̄_k, Var[X̄_k] ~ k^{2H−2}.

The graphical representation of Equation 1.8 in logarithmic coordinates is called the variance–time plot and can also be used to estimate H. At this point we have three known methods for estimating H: R/S plot (1.4), correlogram (1.7), and variance–time plot (1.8).
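The correlogram and variance–time estimators can be sketched in a few lines each. The functions below (illustrative Python, with ad hoc choices of lag and block ranges) recover H from the slopes in Equations 1.7 and 1.8 via least-squares fits in logarithmic coordinates; applied to the fractional Gaussian noise increments generated in the previous sketch, both should return values close to the chosen H.

import numpy as np

def hurst_correlogram(x, lags=range(10, 200)):
    # Fit log rho(k) ~ -alpha * log k and return H = 1 - alpha/2 (Equation 1.7)
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x) / len(x)
    rho = np.array([np.dot(x[:-k], x[k:]) / (len(x) * var) for k in lags])
    alpha = -np.polyfit(np.log(list(lags)), np.log(np.abs(rho)), 1)[0]
    return 1.0 - alpha / 2.0

def hurst_variance_time(x, blocks=(2, 4, 8, 16, 32, 64, 128)):
    # Fit log Var[block mean of size m] ~ (2H - 2) * log m (Equation 1.8)
    x = np.asarray(x, dtype=float)
    log_m, log_v = [], []
    for m in blocks:
        means = x[: len(x) // m * m].reshape(-1, m).mean(axis=1)
        log_m.append(np.log(m))
        log_v.append(np.log(means.var(ddof=1)))
    return 1.0 + np.polyfit(log_m, log_v, 1)[0] / 2.0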

A word of caution: SS and LRD are distinct properties, and neither implies the other. There are SS processes which are not LRD (the simple Bm is an example), and, conversely, there are LRD processes that are not SS. However, the fBm with H > 0.5 is both SS and LRD (according to Equations 1.4, 1.7, and 1.8).

All the examples of power laws given in Equations 1.3 through 1.5, 1.7, and 1.8 can be presented in two ways:

•  Exact laws: Valid for all values of time (the equation has the "=" sign).

•  Asymptotic laws: Valid only in the limit as t or k → ∞ (the equation has the “~” sign).

The fBm is exactly SS (equality in Equation 1.5) and also exactly LRD (equality in Equations 1.4, 1.7, and 1.8).

1.1.2.5  Self-similarity discovered in network traffic

In the already mentioned paper, Leland et al. (1993) reported the discovery of self-similarity in local-area network (LAN) traffic, more precisely Ethernet traffic. Specifically, they studied two separate processes: the number of bytes arriving per time interval (a.k.a. the byte count process), and the number of IP packets (a.k.a. the packet count process). Since packets can vary widely in size (40–1500 bytes in the case of Ethernet), it is, in principle, conceivable that the two processes would be quite different. However, the paper shows that they are both SS, with H estimated in the range 0.6–0.9 according to the changing network conditions.

As a general critique of this approach, we observe that all the methods used in Beran (1994) (and in numerous papers that followed) detect and estimate LRD rather than SS. Indeed, the only "proof" offered for SS per se is the visual inspection of the time series at different timescales. In the same paper, fBm is proposed as a model for the byte/packet count processes. Our opinion is that this is just an approximation, which tends to be off the mark at low utilizations of the LAN under test: the periodogram, correlogram, and R/S plot all depart considerably from linearity for small t. In contrast, a fBm process should in theory generate perfectly straight lines in all these tests.

“Self-similarity” (actually LRD) has since been reported in various types of data traffic: LAN, WAN (Paxson and Floyd 1994), Variable-Bit-Rate video, SS7 control, HTTP, etc. (see Park and Willinger 2000 for an extensive bibliography). An overview of the above studies shows link speeds ranging from 10 Mbps (Ethernet) to 622 Mbps (OC-12), with the link type being “access” (typically connecting a university campus or research lab to an Internet service provider), average bandwidths between 1.4 and 42 Mbps, minimum timescale of 1 ms (in only one instance, usually above 10–100 ms), and at most six orders of magnitude for timescales. Lack of access to high speed, high-aggregation links, and lack of devices capable of measuring such links have prevented similar studies from being performed on Internet backbone links until recently. In principle, backbone traffic could be qualitatively different from the types enumerated earlier, due to factors such as much higher level of aggregation, traffic conditioning (policing and shaping) performed at the edge, and much larger round-trip-time (RTT) for TCP sessions.

Among the first proofs of self-similarity in the traffic on the Internet backbone (OC-12 ATM and OC-48 POS links) are the correlograms shown in Figure 1.2 (Yao et al. 2003). They demonstrated that this traffic is indeed SS asymptotically and also reported a new autocorrelation structure for short lags. The autocorrelation function for short lags has the same power form as that of the long lags, that is, ρ(k) ~ k^{−α}, but the parameter α turns out to assume values that are significantly larger: α ∈ [0.55, 0.71] for k ∈ [50 μs, 10 ms], compared to α ∈ [0.1, 0.18] for k ∈ [100 ms, 500 s].

The first plot in Figure 1.2 shows the correlogram for the shortest time unit used in the analysis. Although the plot is clearly linear, its dependence is too chaotic to be of much use. For the second plot, the byte arrivals are aggregated over 0.4 ms time intervals, and the two slopes corresponding to the two values of α are easily observed. The third is a variance–time plot, which is just another way of looking at LRD. The straight line corresponds to a Hurst parameter H = 0.5, and the asymptote of the plotted function clearly has a larger slope (between 0.84 and 0.96, to be precise). As the arrival process occurs at an average speed of about 700 Mbps, the hypothesis that at high speeds the traffic becomes Poissonian (H → 0.5) is rejected. The conclusion is that the small timescales of Internet backbone traffic are not modeled accurately by fBm.


Figure 1.2 Correlograms for different types of Internet traffic.

So, although the traffic processes in high-speed Internet links exhibit asymptotic self-similarity, their correlation structure at short timescales makes modeling them as exactly self-similar processes (like the fractional Brownian motion) inaccurate. Based on statistical and queuing analysis of data traces from high-speed Internet links, we conclude that Internet traffic retains its self-similar properties even under high aggregation.

1.2  Models of complex networks

It is easy to accept that the World Wide Web is a huge, complex network. It remains more difficult to define complexity or, more exactly, the main characteristics of a complex system. We expect such a definition to be richer than that of algorithmic complexity, and to express the level of interconnectedness and interdependencies of a system. In a complex system, the utility of a structure or a process is often expressed at the next higher level of organization relative to the process itself. Unlike entropy and the related concept of information, complexity is neither extensive nor entirely intensive. What is clear, though, is that complexity involves a specific description, which is, of course, dependent on the technology and subjective capabilities of the observer. In any case, we can consider a complex system to be a system with a large number of elements, building blocks, or agents, capable of interacting with each other and with their environment. The interaction between elements may occur only with immediate neighbors or with distant ones; the agents can be all identical or different; they may move in space or occupy fixed positions, and can be in one of two states or in one of many states. The common characteristic of all complex systems is that they display organization without the application of any external organizing principle. Today, we encounter many examples of complex systems: physical, mechanical, biological, and social. The stock market, cities, metabolic pathways, ecosystems, the Internet, and the human brain are all complex. The question naturally arises as to what all these systems have in common. The answer that has emerged in the last few years is that they all share similar network architectures. Network theory has become one of the most visible pieces of the body of knowledge that can be applied to the description, analysis, and understanding of complex systems and is now an essential ingredient of their study.

A network is a system of nodes with connecting links. Once this viewpoint is accepted, networks appear everywhere. Consider some examples from two main fields: (a) biological networks: the autonomous nervous systems of complex organisms (networks of neurons connected by synapses), gene regulation networks (networks of genes connected by cross-regulation interactions), or metabolic networks (networks of metabolites connected by chemical reactions); and (b) social networks, like e-mail services, the Internet, and the World Wide Web. The structure of such social networks was formalized initially by using random graphs, in which the existence of a link between any pair of nodes has probability p. Erdos, in collaboration with Renyi, undertook the theoretical analysis of the properties of random graphs and obtained a number of important results, including the identification of the percolation threshold, that is, the average number of links per node necessary in order for a random graph to be fully connected, and the typical number of intermediate links in the shortest path between any two nodes in the graph.

Paul Erdos and Alfred Renyi initiated the study of random graphs in the 1960s (Erdos and Renyi 1960). Random graph theory is, in fact, not the study of individual graphs, but the study of an ensemble of graphs (or, as mathematicians prefer to call it, a probability space of graphs). The ensemble is a group consisting of many different graphs, where each graph has a probability attached to it. They proposed a graph with N nodes in which each of the N(N − 1)/2 possible edges is present with a probability p. A property Q is said to exist with a probability P if the total probability of all the graphs in the ensemble having that property is P. Two well-studied graph ensembles are G_{N,M}, the ensemble of all graphs having N vertices and M edges, and G_{N,p}, which consists of graphs with N vertices where each possible edge is realized with a probability p. These two families are known to behave similarly if

M = N(N − 1)p/2,

so long as p is not too close to 0 or 1; both are referred to as ER graphs.

An important attribute of a graph is the average degree, that is, the average number of edges connected to each node. We shall denote the degree of the i-th node by k_i and the average degree by ⟨k⟩. N-vertex graphs with ⟨k⟩ = O(N^0) are called sparse graphs. In what follows, we focus exclusively on sparse graphs.

An interesting characteristic of the ensemble G_{N,p} is that many of its properties have a related threshold function, p_t(N), such that, in the "thermodynamic limit" N → ∞, the property exists with probability 0 if p < p_t and with probability 1 if p > p_t. This phenomenon is similar to the physical notion of a phase transition. For a property Q, we have

lim_{N→∞} P_{N,p}(Q) = { 0, if p(N)/p_t(N) → 0;   1, if p(N)/p_t(N) → ∞ }

(1.9)

An example of such a property is the existence of a giant component, that is, a set of connected nodes, in the sense that a path exists between any two of them, whose size is proportional to N. Erdos and Renyi showed that for ER graphs such a component exists if 〈k〉 > 1. If 〈k〉 < 1 only small components exist, and the size of the largest component is proportional to ln N. Exactly at the threshold, 〈k〉 = 1, a component of size proportional to N2/3 emerges. This phenomenon was described by Erdos as the “double jump.” Another property is the average path length between any two sites, which in almost every graph of the ensemble is of order ln N.
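The giant-component transition at ⟨k⟩ = 1 is easy to observe numerically. The sketch below (using the networkx library; the graph size and seed are arbitrary illustrative choices) generates G(N, p) graphs with p = ⟨k⟩/(N − 1) and reports the size of the largest connected component on either side of the threshold.

import networkx as nx

N = 10_000
for mean_degree in (0.5, 1.0, 2.0):
    g = nx.gnp_random_graph(N, mean_degree / (N - 1), seed=42)
    giant = max(nx.connected_components(g), key=len)
    # Below <k> = 1 the largest component is tiny; above it, a finite fraction of N.
    print(f"<k> = {mean_degree}: largest component has {len(giant)} of {N} nodes")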

Recently, several studies of real-world networks have indicated that the ER model fails to reproduce many of their observed properties. One of the simplest properties of a network that can be measured directly is the degree distribution, that is, the fraction P(k) of nodes having k connections (degree k). A well-known result for ER networks is that the degree distribution is Poissonian, P(k) = e^{−z} z^k / k!, where z = ⟨k⟩ is the average degree. Direct measurements of the degree distribution for networks such as the Internet (Faloutsos et al. 1999), the WWW (Broder et al. 2000), metabolic networks (Jeong et al. 2000), network traffic control (Dobrescu et al. 2004b), and many more show that the Poisson law does not apply. Most often these networks exhibit a scale-free degree distribution: P(k) = c·k^{−λ}, k = m, …, K, where c ≈ (λ − 1)m^{λ−1} is a normalization factor, and m and K are the lower and upper cutoffs for the connectivity of a node, respectively.
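The contrast between the two distributions can be checked empirically. The following sketch (again with networkx; the sizes and parameters are illustrative only) compares the degree histogram of an ER graph, concentrated around z, with that of a Barabasi–Albert graph, whose power-law tail produces a non-negligible fraction of highly connected nodes.

import collections
import networkx as nx

N, z = 20_000, 6
er = nx.gnp_random_graph(N, z / (N - 1), seed=1)        # Poissonian degrees
ba = nx.barabasi_albert_graph(N, z // 2, seed=1)        # scale-free degrees, <k> close to z

def degree_fractions(g):
    counts = collections.Counter(d for _, d in g.degree())
    return {k: counts[k] / g.number_of_nodes() for k in counts}

p_er, p_ba = degree_fractions(er), degree_fractions(ba)
print("ER: P(k >= 20) =", sum(v for k, v in p_er.items() if k >= 20))   # essentially 0
print("BA: P(k >= 20) =", sum(v for k, v in p_ba.items() if k >= 20))   # clearly positive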

One of the first network models to go beyond the ER picture is the small-world network model, whose main characteristic is the so-called small-world phenomenon. This phenomenon is defined by the coexistence of two apparently incompatible conditions: (i) the number of intermediaries between any pair of nodes in the network is quite small, typically referred to as the six-degrees-of-separation phenomenon, and (ii) the large local redundancy of the network, that is, the large overlap of the circles of neighbors of two network neighbors. The latter property is typical of ordered lattices, while the former is typical of random graphs. Watts and Strogatz (1998) proposed a minimal model for the emergence of the small-world phenomenon in simple networks. In their model, small-world networks emerge as the result of randomly rewiring a fraction p of the links in a d-dimensional lattice (Figure 1.3). The parameter p enables one to continuously interpolate between the two limiting cases of a regular lattice (p = 0) and a random graph (p = 1).


Figure 1.3 Small-world networks model generation. (First published in Watts, D.J. and S.H. Strogatz. Nature, 393, 440–442, 1998.)

Watts and Strogatz probed the structure of their small-world network model using two quantities: (i) the mean shortest distance L between all pairs of nodes in the network and (ii) the mean clustering coefficient C of the nodes in the network. For a d-dimensional lattice, L ~ N^{1/d} and C = O(1), where N is the number of nodes in the network; for a random graph, L ~ ln N and C ~ 1/N.
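The original experiment is easy to reproduce in a few lines. The sketch below (with networkx; N, k, and the rewiring probabilities are arbitrary illustrative values) shows how L collapses toward its random-graph value already for small p while C stays close to the lattice value, which is the signature of the small-world regime.

import networkx as nx

N, k = 1000, 10
for p in (0.0, 0.01, 0.1, 1.0):
    g = nx.connected_watts_strogatz_graph(N, k, p, seed=7)
    L = nx.average_shortest_path_length(g)     # mean shortest distance
    C = nx.average_clustering(g)               # mean clustering coefficient
    print(f"p = {p:<5} L = {L:7.2f}  C = {C:.3f}")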

An important characteristic of a graph that was not taken into consideration in the small-world model of Watts and Strogatz is the degree distribution, that is, the distribution of the number of connections of the nodes in the network. The Erdos–Renyi class of random graphs has a Poisson degree distribution, while lattice-like networks have even more strongly peaked distributions; a perfectly ordered lattice has a Dirac-delta degree distribution. Similarly, the small-world networks generated by the Watts and Strogatz model also have peaked, single-scale degree distributions, that is, one can clearly identify a typical degree of the nodes comprising the network. Against this theoretical background, Barabasi and coworkers found that a number of real-world networks have a scale-free degree distribution with tails that decay as a power law (Barabási and Albert 1999). These networks were called scale-free networks (SFNs).

1.3  Scale-free networks

1.3.1  Basic properties of SFNs

Scale-free networks are complex networks in which some nodes are very well connected while most nodes have a very small number of connections. An important characteristic of scale-free networks is that they are size-independent, that is, they preserve the same characteristics regardless of the network size N. Scale-free networks have a degree distribution that follows a power law, P(k) ~ k^{−λ}. Many real networks have a scale-free degree distribution, including the Internet.

The Internet is the primary example of a self-organizing complex system, having grown mostly in the absence of centralized control or direction. In this network, information is transferred in the form of packets from the sender to the receiver via routers, which are computers specialized in transferring packets to another router "closer" to the receiver. A router decides the route of a packet using only local information obtained from its interaction with neighboring routers, not by following instructions from a centralized server. A router stores packets in its finite queue and processes them sequentially. However, if the queue overflows due to excess demand, the router will discard incoming packets, leading to congestion. A number of studies have probed the topology of the Internet and its implications for traffic dynamics. In Section 1.1, it has already been mentioned that Internet traffic fluctuations are statistically self-similar and that the traffic displays two separate phases: congested and noncongested. It was also shown that time series of the number of connections are nonstationary and are characterized by different mean values depending on the observation period, and that the Internet displays a number of properties that distinguish it from random graphs: wiring redundancy and clustering, nontrivial eigenvalue spectra of the connectivity matrix, and a scale-free degree distribution.

The degree distribution alone does not characterize the graph or ensemble in full. There are other quantities, such as the degree–degree correlation (between connected sites), the spatial correlations, etc. Several models have been presented for the evolution of scale-free networks, each of which may lead to a different ensemble. The first suggestion was the preferential attachment model proposed by Barabasi and Albert, which came to be known as the Barabasi–Albert (BA) model. Several variants of this model have been suggested. One of them, known as the Molloy–Reed construction (Molloy and Reed 1998), which ignores the evolution and assumes only the degree distribution and no correlations between the nodes, will be considered in the following. In this construction, reaching a site by following a link is independent of the origin of the link.

The most representative property of a SFN is the power-law distribution P(k) of the degree k, that is, of the number of links connected to a randomly chosen node. Considering the degree of a node as the number of edges that start from that node, one can observe that the node degree distribution is well represented by P(k) ~ k^{−λ} e^{−k/K}, where the exponent λ varies approximately from 1 to 3 for most real networks, and K ≫ 10 marks an exponential cutoff.

Another important result is that, in spite of the huge size of a network like the World Wide Web, which has average degree ⟨k⟩ = Σ_k k·P(k) = O(1), the average distance ℓ between two nodes is very small. For this reason, SFNs can in particular be considered small-world networks.

The third main property of SFNs is clustering, which is characterized by the grouping coefficient C_i. The grouping coefficient of a node i of degree k_i is C_i = 2E_i/(k_i(k_i − 1)), where E_i is the number of edges between the neighbors of node i, and k_i(k_i − 1)/2 is the maximum number of such edges. The grouping coefficient of the whole graph is the average of the grouping coefficients of all nodes.
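The grouping coefficient can be computed directly from its definition and checked against a library routine. The sketch below (Python with networkx; the BA test graph is an arbitrary choice) implements C_i = 2E_i/(k_i(k_i − 1)) node by node and averages over the graph.

import networkx as nx

def grouping_coefficient(g, i):
    # C_i = 2 E_i / (k_i (k_i - 1)); nodes of degree < 2 are assigned 0 by convention
    neigh = list(g.neighbors(i))
    k = len(neigh)
    if k < 2:
        return 0.0
    e = sum(1 for a in neigh for b in neigh if a < b and g.has_edge(a, b))
    return 2.0 * e / (k * (k - 1))

g = nx.barabasi_albert_graph(500, 3, seed=3)
manual = sum(grouping_coefficient(g, i) for i in g) / g.number_of_nodes()
print(manual, nx.average_clustering(g))        # the two values coincide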

1.3.2  Distances and bounds in SFN

In most random network models the structure is locally tree-like (since most loops occur only for n(l) ~ N), and, since the number of sites grows as n(l) ~ (k − 1)^l, they are also infinite-dimensional. As a consequence, the diameter of such graphs (i.e., the minimal path between the most distant nodes) scales like D ~ ln N. This small diameter is to be contrasted with that of finite-dimensional lattices, where D ~ N^{1/d}. Watts and Strogatz (1998) have suggested a model that retains the local high clustering of lattices while reducing the diameter to D ~ ln N. This so-called small-world network is achieved by replacing a fraction z of the links in a regular lattice with random links, to random distant neighbors.

We now aim to show that scale-free networks with degree exponent 2 < λ < 3 have a diameter D ~ ln ln N, which is smaller than that of ER and small-world networks. If the network is fragmented, we will only be interested in the diameter of the largest cluster (assuming there is one). We consider the diameter of a Molloy–Reed scale-free network, defined as the average distance between any two sites on the graph. Actually, it is easier still to focus on the radius of the graph, L ≡ ⟨l⟩, defined as the average distance of all sites from the site of highest degree in the network (if there is more than one such site, we pick one arbitrarily). The diameter of the graph, D, is restricted to L ≤ D ≤ 2L and thus scales like L.

Cohen et al. show that the radius of any scale-free graph with λ > 2 has a rigorous lower bound that scales as ln ln N (Cohen et al. 2000). It is easy to understand that the smallest diameter of a graph with a given degree distribution is achieved by the following construction: start with the highest degree site and then connect to each successive layer the existing sites of the highest degree, until the layer is full. By construction, loops will occur only in the last layer.

To bound the radius L of the graph, we will assume that the low-degree sites are connected randomly to the giant cluster. We pick a degree threshold k* ~ (ln ln N)^{1/(λ−1)}. If l_1 ~ ln ln N/|ln(λ − 2)|, then K_{l_1} < k*. Therefore, all sites of degree k ≥ k* lie, with probability 1, within the first l_1 layers from the site we chose. On the other hand, if we start uncovering the graph from any site (provided it belongs to the giant component), then within a distance l_2 from this site there are at least l_2 bonds. Since l = l_1 + l_2, all sites are at a distance of order ln ln N from the highest degree site, and L = ln ln N is a rigorous lower bound for the diameter of scale-free networks with λ > 2.

In a similar way, one can demonstrate that the scaling D ~ ln ln N is actually realized in the general case of random scale-free graphs with 2 < λ < 3. For λ > 3 and N ≫ 1, ⟨k²⟩ is independent of N, and the radius of the network is L ~ ln N (Newman et al. 2001). For λ = 3, the lower bound is obtained from the highest degree site, with K = m√N. Then, assuming ln ln N ≫ 1, the upper bound becomes L ~ ln N/ln ln N. This result has been obtained rigorously for the maximum distance in the BA model, where λ = 3, for m ≥ 2. For m = 1, the graphs in the BA model turn into trees, and the behavior D ~ ln N is obtained. It should be noted that for m = 1 the giant component in the random model contains only a fraction of the sites (while for m ≥ 2 it contains all sites, at least in the leading order). This might explain why exact trees and BA trees are different from Molloy–Reed random graphs.
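The very slow growth of distances can be observed numerically. The sketch below (networkx again; sizes are kept small so that the all-pairs computation stays cheap) measures the average shortest path length of BA graphs (λ = 3, m = 3) as N doubles; the measured values grow far more slowly than N, consistent with the logarithmic-type bounds discussed above.

import networkx as nx

for N in (500, 1000, 2000):
    g = nx.barabasi_albert_graph(N, 3, seed=11)
    L = nx.average_shortest_path_length(g)
    print(f"N = {N:5d}  L = {L:.2f}")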

Figure 1.4 shows a scale-free network that grows increment by increment from 2 to 11 nodes. When a new node (gray) makes a new connection, it attaches preferentially to the most connected existing node (black). These two mechanisms, incremental step-by-step growth and preferential attachment, finally lead to a system dominated by hubs, that is, nodes having a huge number of connections.
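A minimal version of this growth process can be written directly, without any library. The sketch below (illustrative Python; it uses the standard probabilistic BA rule with one link per new node, of which the figure's "attach to the most connected node" is the deterministic extreme) grows a network to 11 nodes and prints the resulting degree sequence, in which a few hubs dominate.

import random

def grow_preferential(n_final, seed=0):
    # Each new node adds one edge to an existing node chosen with probability
    # proportional to its current degree (implemented via a repeated-node pool).
    random.seed(seed)
    edges = [(0, 1)]
    pool = [0, 1]                     # every node appears once per incident edge
    for new in range(2, n_final):
        target = random.choice(pool)  # degree-proportional choice
        edges.append((new, target))
        pool.extend([new, target])
    return edges

degree = {}
for a, b in grow_preferential(11):    # the 11-node example of Figure 1.4
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print(sorted(degree.values(), reverse=True))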

After researchers established that P(k) has a scale-free nature, they tried to generalize the concept of a random graph to prescribed degree distributions. This leads to the theory of random graphs with arbitrary degree distributions, in which one averages over all graphs with N nodes, each graph being weighted with Π_{i=1}^{N} P(k_i), where {k_1, …, k_N} are the node degrees. One of the first results in this field was given by Molloy and Reed, who demonstrated that if Σ_k k(k − 2)P(k) > 0, then there exists a phase transition in the distribution space {P(k)} (Molloy and Reed 1998). After that, Dorogovtsev et al. (2000a, b) used a "master equation" to obtain an exact solution for a class of growing network models, and Krapivsky et al. (2000) examined the effect of a nonlinear preferential attachment on network dynamics and topology. Models that incorporate aging and cost and capacity constraints were studied by Amaral et al. (2000) to explain deviations from power-law behavior in several real-life networks. Bianconi and Barabasi (2001) introduced a model addressing the competitive aspect of many real networks such as the World Wide Web. Additionally, in real systems microscopic events affect the network evolution, including the addition or rewiring of edges or the removal of vertices or edges. Albert and Barabasi (2000) discussed a model that incorporates new edges between existing vertices and the rewiring of old edges. It is now well established that preferential attachment can explain the power-law characteristic of networks, but other alternative mechanisms affecting the evolution of growing networks can also lead to the observed scale-free topologies. Kleinberg et al. (1999) and Kumar et al. (2000) suggested certain copying mechanisms in an attempt to explain the power-law degree distribution of the World Wide Web.


Figure 1.4 Birth of a scale-free network.

1.4  Current trends in traffic flow and complex networks modeling

Researchers' interest in studying traffic in large-scale information networks has grown steadily since 1990. The capture and analysis of traffic behavior in these networks were facilitated by the increased modeling capacity offered by digital processing techniques (Crovella and Bestavros 1997, Floyd and Paxson 2001, Jiang et al. 2001, Willinger et al. 2001, Dobrescu et al. 2004b, Field et al. 2004).

After a large number of experiments, two main results toward characterizing and quantifying the network traffic processes have been achieved:

•  First, self-similarity is an inherent characteristic of network traffic, and many factors are involved in creating it. A new view of this self-similar traffic structure is provided. This view is an improvement over the theory used in most of the current literature, which assumes that traffic self-similarity is based solely on the heavy-tailed file-size distribution.

•  Second, the scaling region for traffic self-similarity is divided into two timescale regimes: short-range dependence (SRD) and long-range dependence (LRD). Experimental results show that the network transmission delay separates the two scaling regions. This gives us a physical source for the periodicity observed in the traffic. Also, bandwidth, TCP window size, and packet size have an impact on SRD, while the statistical heavy-tailedness (Pareto shape parameter) affects the structure of LRD. In addition, a formula to quantify traffic burstiness can be derived from the self-similarity property.

Starting in 1999, scale-free models came to be considered the most suitable for characterizing huge networks like the Internet and the World Wide Web. Just as the phenomenon of self-similarity was accepted as typical of traffic flow, network topologies initially considered to be random revealed a totally different structure once the topological scaling and repeatability of scale-free graphs were taken into account, notions close to the concept of fractality, which was also introduced and discussed at the end of the twentieth century. Furthermore, studies of fractal traffic with multifractal analysis have given more interesting and applicable results:

•  At large timescales, increasing bandwidth does not improve throughput (or network performance). The two factors affecting traffic throughput are network delay and TCP window size. On the other hand, more simultaneous connections smooth traffic, which could result in an improvement of network efficiency.

•  At small timescales, traffic burstiness varies. In order to improve network efficiency, we need to control bandwidth, TCP window size, and network delay to reduce traffic burstiness. These factors trade off against one another, and their effects are nonlinear.

•  In general, the statistics of network traffic processes differ from Poisson processes. To apply this prior knowledge from traffic analysis and to improve network efficiency, the notion of efficient bandwidth can be derived to represent the fractal concentration set. Above that bandwidth, traffic appears bursty and cannot be reduced by multiplexing; below it, traffic congestion occurs. An important finding is that the relationship between the bandwidth and the transfer delay is nonlinear.

The past few decades have seen an exponential growth in the amount of data being carried across packet-switched networks, particularly the Internet. In recent analyses of traffic measurements, evidence of non-Markovian effects, such as burstiness across multiple timescales, long-range dependence, and self-similarity, has been observed in a wide variety of traffic sources. Given the evidence of long-range dependence and self-similarity in such traffic sources, it is clear that any general model for data traffic must account for these properties. This has led to the development of a number of new models, which allow the characterization of phenomena such as betweenness centrality or traffic burstiness.

It is generally accepted that in many networks nodes having a larger degree also have a larger betweenness centrality. Indeed, the larger the degree of a node, the larger the chance that many of the shortest paths will pass through this node; the chance of many shortest paths passing through a low-degree node is presumably small.

In fractal networks, however, the betweenness centrality of many small-degree nodes can be comparable to that of the largest hubs of the network. For nonfractal networks, on the other hand, the degree and betweenness centrality of nodes are strongly correlated. To demonstrate the difference in the relation between degree and betweenness centrality in real networks, we can compare original networks with their random (uncorrelated) counterparts. The random counterpart network can be constructed by rewiring the edges of the original network while preserving the degrees of the nodes and enforcing its connectivity. As a result, we obtain a random network with the same degree distribution, which is always nonfractal regardless of the original network. One can observe that the degree–betweenness correlation in a random network obtained by rewiring the edges of the World Wide Web network is much stronger than that of the original network (Kitsak et al. 2007).

The ranges of betweenness centrality values for a given degree decrease significantly as we randomly rewire the edges of a fractal SF network. Thus, the betweenness centrality of nodes in fractal networks is significantly less correlated with degree than in nonfractal networks. This can be understood as a result of the repulsion between hubs found in fractals: large-degree nodes prefer to connect to nodes of small degree and not to each other. Therefore, the shortest path between two nodes must necessarily pass through small-degree nodes, which are found at all scales of the network. Thus, in fractal networks, small-degree nodes have a broad range of betweenness centrality values, while in nonfractal networks nodes of small degree generally have small betweenness centrality.
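The comparison between a network and its degree-preserving randomization can be sketched as follows (Python with networkx). A BA graph is used here purely to illustrate the procedure: it is nonfractal, so the two correlation values should be similar, whereas for a fractal network, such as the WWW data discussed above, the rewired version is expected to show a markedly stronger degree–betweenness correlation. Note that this simple sketch does not enforce connectivity during rewiring.

import networkx as nx
import numpy as np

def degree_betweenness_corr(g):
    # Pearson correlation between node degree and betweenness centrality
    bc = nx.betweenness_centrality(g)
    deg = dict(g.degree())
    nodes = list(g)
    return np.corrcoef([deg[n] for n in nodes], [bc[n] for n in nodes])[0, 1]

g = nx.barabasi_albert_graph(1000, 3, seed=5)
rewired = g.copy()
nx.double_edge_swap(rewired, nswap=5 * g.number_of_edges(), max_tries=10**6)

print("original:", round(degree_betweenness_corr(g), 3))
print("rewired :", round(degree_betweenness_corr(rewired), 3))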

The problem of traffic burstiness raised a lot of controversies in the literature. Traffic that is bursty on many or all timescales can be characterized statistically using the concept of self-similarity. Self-similarity is often associated with objects in fractal geometry, that is, objects that appear to look alike regardless of the scale at which they are viewed. In the case of stochastic processes, like time series, self-similarity refers to the process distribution; when viewed at varying timescales, the process distribution remains the same. A self-similar time series has noticeable bursts—long periods with extremely high values on all timescales. Characteristics of network traffic, such as interarrival times or length of frames, can be considered as stochastic time series. Therefore, measuring traffic burstiness is the same as characterizing the self-similarity of the corresponding time series.

Various papers discuss the impact of burstiness on network congestion. Their conclusions can be formulated as follows:

•  Congested periods can be quite long with losses that are heavily concentrated.

•  Linear increases in buffer size do not result in large decreases in packet drop rates.

•  A slight increase in the number of active connections can result in a large increase in the packet loss rate.

Many previous works also analyzed the burstiness and the correlation structure of Internet traffic in various timescales in terms of the protocol mechanisms of the TCP, such as timeouts, congestion avoidance, self-clocking, etc. It was shown that large transfers over high-capacity links produced non-Gaussian traffic, while the low-volume transmissions produced Gaussian and long-range-dependent traffic. Long sequences of back-to-back packets can cause significant correlations in short timescales.

A first generation of papers, approximately from 1994 to 2004, argued that the traditionally used Poisson models oversimplified the characteristics of network traffic and were not appropriate for modeling bursty, local-area, and wide-area network traffic. Since 2004, a second generation of papers has challenged the suitability of these results in networks of the new century and has claimed that the traditional Poisson-based and other models are still more appropriate for characterizing today’s Internet traffic burstiness. A possible explanation was that as the speed and amount of Internet traffic grow spectacularly, any irregularity of the network traffic, such as self-similarity, might cancel out as a consequence of high-speed optical connections, new communications protocols, and the vast number of multiplexed flows. These papers analyzed the traffic traces of Internet backbone. Terdik and Gyires (2009) applied the theory of smoothly truncated Levy flights and the linear fractal model in examining the variability of Internet traffic from self-similar to Poisson and demonstrated that the series of interarrival times was still close to a self-similar process, but the burstiness of the packet lengths observed from the Internet backbone traces captured in 2008 significantly decreased compared to earlier traces.

This book offers a rigorous analysis of the achievements in the field of traffic control in large networks, oriented toward two main aspects: the self-similarity of traffic behavior and the scale-free characteristic of a complex network. Additionally, the authors propose a new insight into the inner nature of things, and into cause and effect, based on the identification of relationships and behaviors within a model and on the study of the influence of the topological characteristics of a network upon traffic behavior. The effects of this influence are then discussed in order to find new solutions for traffic monitoring and diagnosis and for the prediction of traffic anomalies.