In this chapter, we introduce a propagation source estimator under snapshot observations: a sample path based source estimator (Jordan Center). According to Chap. 5, a snapshot provides partial knowledge of network status at a given time t. Many approaches have been proposed to identify propagation sources under snapshot observations, including Jordan Center, Dynamic Message Passing, effective distance based method, etc. Within these methods, Jordan center is a representative one and many variations and improvements have been made based on this method. Here, we present the details of the Jordan Center estimator. For the techniques involved in other methods under snapshot observations, readers could refer to Chap. 9 for details.
7.1 Introduction
Malicious attack propagation in networks refer to the spread process of attack throughout the networks, and have been widely used to model many real-world phenomena such as the spreading of rumors over online social networks, and the spreading of computer virus over the Internet. The problem of identifying propagation source has attracted many attentions. For example, Shah and Zaman proposed Rumor Center to identify rumor propagation sources [160–162] was analyzed in [160, 161]. They formulated the problem as a maximum likelihood estimation (MLE) problem, and developed novel algorithms to detect the source. However, these methods are considered under the complete observation of networks, which is not applicable for many real-world scenarios. Zhu and Ying [200] considered the following problem: given a snapshot of the diffusion process at time t, can we tell which node is the source of the propagation?
Zhu and Ying [200] adopted the Susceptible-Infected-Recovered (SIR) model, a standard model of epidemics [12, 47]. The network is assumed to be an undirected graph and each node in the network has three possible states: susceptible (S), infected (I), and recovered (R). Nodes in state S can be infected and change to state I, and nodes in state I can recover and change to state R. Recovered nodes cannot be infected again. Initially, all nodes are assumed to be in the susceptible state except one infected node. The infected node is the propagation source of the malicious attack. The source then infects its neighbors, and the attack starts to spread in the network. Now given a snapshot of the network, in which some nodes are infected nodes, some are healthy (susceptible and recovered) nodes. The susceptible nodes and recovered nodes assumed to be indistinguishable. Zhu and Ying [200] proposed a low complexity algorithm, called reverse infection algorithm, through finding the sample path based estimator in the underlying network. In the algorithm, each infected node broadcasts its identity in the network, the node who first collect all identities of infected nodes declares itself as the information source. They proved that the estimated source node is the node with the minimum infection eccentricity. Since a node with the minimum eccentricity in a graph is called the Jordan center, they call the nodes with the minimum infection eccentricity the Jordan infection centers.
7.2 The SIR Model for Information Propagation
Consider an undirected graph G(V, E), where V is the set of nodes and E is the set of undirected edges. Each node v ∈ V has three possible states: susceptible (S), infected (I), and recovered (R). Zhu and Ying [200] assumed a time slotted system. Nodes change their states at the beginning of each time slot, and the state of node v in time t is denoted by X v(t). Initially, all nodes are in state S except source node v ∗ which is in state I. At the beginning of each time slot, each infected node infects its susceptible neighbors with probability q. Each infected node recovers with probability p. Once a node get recovered, it cannot be infected again. Then, the infection process can be modeled as a discrete time Markov chain X(t), where X(t) = {X v(t), v ∈ V } is the states of all the nodes at time t. The initial state of this Markov chain is X v(0) = S for v ≠ v ∗ and .
7.3 Maximum Likelihood Detection
Note that the difficulty of solving the problem in (7.3) is the curse of dimensionality. For each v such that Y v = 0, its infection time and recovered time are required, i.e., O(t 2) possible choices; for each v such that Y v = 1, the infection time needs to be considered, i.e., O(t) possible choices. Therefore, even for a fixed t, the number of possible sample paths is at least at the order of t N, where N is the number of nodes in the network. This curse of dimensionality makes it computational expensive. Zhu and Ying [200] introduced a sample path based approach to overcome the difficulty.
7.4 Sample Path Based Detection
7.5 The Sample Path Based Estimator
Lemma 7.1
where d(v r, u) is the length of the shortest path between and u and also called the distance between v r and u, and I is the set of infected nodes.
This lemma states that the optimal time is equal to the infection eccentricity. The next lemma states that the optimal sample path rooted a node with a smaller infection eccentricity is more likely to occur.
Lemma 7.2
where is the optimal sample path starting from node u.
Proof
Denote by T v the tree rooted in v and the tree rooted at u but without the branch from v. See and in Fig. 7.2. Furthermore, denote by the set of children of v. The sample path X[0, t] restricted to is defined to be .
The first step is to show . Note that , otherwise, all infected node are on . As T is a tree, v can only reach nodes in through edge (u, v), , which contradicts .
The following lemma gives a useful property of the Jordan infection centers.
Lemma 7.3
On a tree network with at least one infected node, there exist at most two Jordan infection centers. When the network has two Jordan infection centers, the two must be neighbors.
The following theorem states that the sample path based estimator is one of the Jordan infection centers.
Theorem 7.1
Proof
Assume the network has two Jordan infection centers: w and u, and assume . Based on Lemma 7.3, w and u must be adjacent. The following steps show that, for any a ∈ V ∖{w, u}, there exists a path from a to u (or w) along which the infection eccentricity strictly decreases.
Repeatedly applying Lemma 7.2 along the path from node a to u, then the optimal sample path rooted at node u is more likely to occur than the optimal sample path rooted at node a. Therefore, the root node associated with the optimal sample path X ∗[0, t ∗] must be a Jordan infection center. The theorem holds.
7.6 Reverse Infection Algorithm
Zhu and Ying [200] further proposed a reverse infection algorithm to identify the Jordan infection centers. The key idea of the algorithm is to let every infected node broadcast a message containing its identity (ID) to its neighbors. Each node, after receiving messages from its neighbors, checks whether the ID in the message has been received. If not, the node records the ID (say v), the time at which the message is received (say t v), and then broadcasts the ID to its neighbors. When a node receives the IDs of all infected nodes, it claims itself as the propagation source and the algorithm terminates. If there are multiple nodes receiving all IDs at the same time, the tie is broken by selecting the node with the smallest . The details of the algorithm is presented in Algorithm 7.1.
Simulations on g-regular trees were conducted. The infection probability q was chosen uniformly from (0, 1) and the recovery probability p was chosen uniformly from (0, q). The infection process propagates t was uniformly chosen from [3, 20]. The experimental results show that the detection rates of both the reverse infection and closeness centrality algorithms increase as the degree increases and is higher than 60% when g > 6.
Algorithm 7.1: Reverse infection algorithm