© Springer Nature Switzerland AG 2021
J. Hawkins, Ergodic Dynamics, Graduate Texts in Mathematics 289, https://doi.org/10.1007/978-3-030-59242-4_7

7. Perron–Frobenius Theorem and Some Applications

Jane Hawkins
Department of Mathematics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

The Perron–Frobenius theory of nonnegative matrices has many useful dynamical consequences, in the field of Markov shifts in particular. The math in turn gives us insight into areas as diverse as Google page rank and virus dynamics, applications which will be discussed in this chapter.

While proofs of the important Perron–Frobenius theorem are sometimes relegated to linear algebra texts, we include one here due to its significance in the field of ergodic theory and dynamical systems. Different proofs by authors in dynamics also appear in [173] and [106] for example. The spectrum of an n × n matrix A, written here as Spec(A), is the topic of the theorem; that is, we are interested in statements about the eigenvalues, eigenspaces, and generalized eigenspaces of nonnegative matrices. A complex number λ is an eigenvalue for A if there exists 
$$v \in \mathbb C^n, v \neq 0$$
such that Av = λv; equivalently, (A − λI)v = 0. Such a vector v is called an eigenvector for λ (and A). Each eigenvalue for A is a root of the characteristic polynomial 
$$p_A(t)= \det (A - t I)$$
. The subspace of 
$$\mathbb C^n$$
spanned by all the eigenvectors for λ is the eigenspace for λ. A nonzero vector 
$$v \in \mathbb C^n$$
is a generalized eigenvector for 
$$\lambda \in \mathbb C$$
if there exists some k ≥ 1 such that

$$\displaystyle \begin{aligned} (A - \lambda I)^k v = 0. \end{aligned} $$
(7.1)
The generalized eigenspace for λ is the subspace of all vectors satisfying (7.1) for some 
$$k \in \mathbb N$$
. A generalized eigenspace necessarily contains the corresponding eigenspace, since if (7.1) holds for some k ≥ 1 and some nonzero v, then λ is an eigenvalue (see Exercise 2). The spectral radius of a square matrix A is the maximum of the moduli of its eigenvalues, and we denote it by ρ(A). There are various versions of what is referred to as the Perron–Frobenius Theorem; what they have in common is that each gives properties of the spectral radius of matrices with nonnegative entries.
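To make these definitions concrete, here is a minimal numerical sketch (added for illustration; the 2 × 2 matrix and the use of Python with numpy are assumptions, not part of the text) that computes Spec(A), the spectral radius ρ(A), and checks a generalized eigenvector.

```python
# Illustrative sketch only: eigenvalues, spectral radius, and a generalized
# eigenvector for a small (hypothetical) nonnegative matrix.
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])

eigenvalues, _ = np.linalg.eig(A)
rho = max(abs(lam) for lam in eigenvalues)
print("Spec(A):", eigenvalues)      # 2 is a double root of p_A(t) = (2 - t)^2
print("rho(A): ", rho)

# v = (0, 1) is a generalized eigenvector for lambda = 2:
# (A - 2I)v is nonzero, but (A - 2I)^2 v = 0, so v satisfies (7.1) with k = 2.
v = np.array([0.0, 1.0])
N = A - 2.0 * np.eye(2)
print(N @ v, (N @ N) @ v)
```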

7.1 Preliminary Background

Throughout this chapter, each matrix A = (a ij), i, j = 1, …n is an n × n matrix. If each a ij ≥ 0, we write A ≥ 0, and call A a nonnegative matrix. If a ij > 0 for all i, j, we write A > 0, and say A is positive. For a vector 
$$v \in \mathbb R^n$$
, v ≥ 0 and v > 0 mean v j ≥ 0 and v j > 0 for all j, respectively. We use I to denote the n × n identity matrix. On finite dimensional spaces all norms are equivalent; we use the following norms on matrices and vectors, convenient for our setting. For 
$$v \in \mathbb R^n$$
, $$\|v\|{ }_{\infty }= \max _{1\leq j\leq n}|v_j|$$, and for A an n × n matrix, 
$$\|A\|{ }_{\infty }= \sup \{ \|Av\|{ }_\infty \,: \|v\|{ }_\infty =1\}. $$
We write ∥v∥ and ∥A∥ for these norms throughout this chapter; the reader is referred to Appendix B, Section B.1.2 for more about norms. We note that sometimes 
$$v \in \mathbb R^n$$
is naturally a row vector, and other times a column vector (to make multiplication work). We assume it will be clear from the context, but when we need to change left multiplication of A by v to right multiplication of A t, we use (vA)t = A t v t and vice versa, where A t = (a ji) is the transpose of A = (a ij). For two vectors 
$$v,w \in \mathbb R^n$$
, by (v, w) we denote the usual inner product (dot product).

Of particular interest in dynamics are nonnegative matrices and vectors whose rows all sum to 1, stochastic matrices, introduced in Remark 6.18. The (i, j)th entry in a stochastic matrix represents the probability of going from state i to state j in one time step. A stochastic vector gives the probabilities of initial states in the process being modelled. We draw on the next key result for the proof of the Perron–Frobenius Theorem.

Proposition 7.1

Suppose A > 0 is a stochastic n × n matrix. If a vector 
$$v =(v_1,\ldots ,v_n) \in \mathbb R^n$$
is not of the form v = (t, t, …, t) for any 
$$t \in \mathbb R$$
, then ∥Av∥ < ∥v∥.

Proof

By hypothesis, given v ≠ (0, …, 0), let ∥v∥ = α > 0; then 
$$\alpha = |v_{j_o}|$$
for some j o = 1, …, n. By using − v instead of v if needed, assume 
$$\alpha = v_{j_o}$$
.

Note that 
$$|(Av)_k | = |\sum _{i=1}^n a_{ki}v_i|$$
; also 
$$\alpha = \sum _{i=1}^n a_{ki}v_{j_o},$$
 since each row of A sums to 1.
By hypothesis, there is some 
$$v_m \neq v_{j_o}$$
. If v m ≤ 0, then

$$\displaystyle \begin{aligned}|(Av)_k| = \left| \sum_{i=1}^n a_{ki}v_i \right | \leq \left | \sum_{i=1, i \neq m}^n a_{ki}v_i \right | \leq \sum_{i=1, i \neq m}^n a_{ki}|v_i| < \sum_{i=1}^n a_{ki}v_{j_o}.\end{aligned}$$
The last inequality holds since |v i|≤ α for every i and a km > 0; so |(Av)k| < α. Otherwise all v m > 0, and

$$\displaystyle \begin{aligned}|(Av)_k| = \sum_{i=1}^n a_{ki}v_i \leq \sum_{i=1, i \neq m}^n a_{ki}v_i + a_{km}v_m < \sum_{i=1}^n a_{ki}v_{j_o},\end{aligned}$$
and again |(Av)k| < α since 
$$v_m<v_{j_o}$$
. Therefore ∥Av∥ =maxk|(Av)k| < α as claimed. □
Corollary 7.2

Every n × n stochastic matrix A > 0 has 1 as an eigenvalue with a 1-dimensional eigenspace spanned by η = (1, 1, …, 1).

Proof


$$(A\eta )_k = \sum _{i=1}^n a_{ki}\eta _i = 1$$
for each k. By Proposition 7.1 there is no other vector 
$$v \in \mathbb R^n$$
such that Av = v apart from scalar multiples of η. □

For a square matrix A, the ℓ-fold product of A with itself is well-defined; we write it as 
$$A^\ell = (a_{ij}^{(\ell )}), i,j=1,\ldots n$$
. A nonnegative matrix A is irreducible if for each i, j, there exists some ℓ = ℓ(i, j) such that 
$$a_{ij}^{(\ell )}>0$$
. (This is consistent with Definition 6.​15 and Remark 6.​17). A stronger condition is that A is primitive, which is defined by the property that there is some 
$$\ell \in \mathbb N$$
such that 
$$a_{ij}^{(\ell )}>0$$
for all i, j = 1, …, n, i.e., A ℓ is a positive matrix.
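Both conditions can be tested numerically by examining powers of A. The following Python sketch (an illustration under the stated definitions; the helper functions and the use of numpy are our own assumptions) checks irreducibility by accumulating A + A² + ⋯ + Aⁿ and primitivity by searching for a single positive power, using Wielandt's classical bound (n − 1)² + 1 on how far one needs to look.

```python
# Sketch: testing irreducibility and primitivity of a nonnegative matrix
# by inspecting its powers (illustrative helper functions, not from the text).
import numpy as np

def is_irreducible(A):
    n = A.shape[0]
    total = np.zeros_like(A, dtype=float)
    P = np.eye(n)
    for _ in range(n):
        P = P @ A
        total += P                  # accumulate A + A^2 + ... + A^n
    return bool(np.all(total > 0))  # every (i, j) reachable by some power

def is_primitive(A):
    n = A.shape[0]
    P = np.eye(n)
    for _ in range((n - 1) ** 2 + 1):   # Wielandt's bound for primitive matrices
        P = P @ A
        if np.all(P > 0):               # one power with all entries positive
            return True
    return False

A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
print(is_irreducible(A1), is_primitive(A1))   # True False: irreducible, not primitive
```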

After several intermediate results, we give a version of the Perron–Frobenius Theorem whose proof, and the proof of a stronger version, appears in Section 7.2. If r is an eigenvalue for A, then it is an eigenvalue for A t as well, and we say v is a left eigenvector for r if vA = rv, or equivalently if A t v t = rv t.

Theorem 7.3 (Perron–Frobenius Theorem)
Assume that A ≥ 0 is an irreducible n × n matrix.
  1. 1.

    A has an eigenvalue r > 0, which is a simple root of the characteristic polynomial of A. There exists an eigenvector v > 0 such that Av = rv.

     
  2. 2.

    Every other eigenvalue λ of A satisfies |λ|≤ r.

     
  3. 3.

    If A is primitive, then every eigenvalue λ ≠ r satisfies |λ| < r.

     
The matrices

$$\displaystyle \begin{aligned} A_1 = \left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right) \quad  \mbox{and} \quad  A_2 = \left( \begin{array}{ccc} 0 & 1 & 0 \\ \frac 1 2 & 0 & \frac 1 2 \\ 0 & 1 & 0 \end{array} \right) \end{aligned}$$
provide the simplest examples of stochastic matrices that are irreducible but not primitive, and for which the last statement of Theorem 7.3 does not hold since each has both − 1 and 1 as eigenvalues.
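A quick numerical check of this claim (an illustration added here, assuming numpy) confirms that both A 1 and A 2 have 1 and − 1 in their spectrum.

```python
# Illustration: A1 and A2 are stochastic and irreducible but not primitive,
# and each has both -1 and 1 as eigenvalues, so Theorem 7.3(3) fails for them.
import numpy as np

A1 = np.array([[0.0, 1.0],
               [1.0, 0.0]])
A2 = np.array([[0.0, 1.0, 0.0],
               [0.5, 0.0, 0.5],
               [0.0, 1.0, 0.0]])

for name, A in (("A1", A1), ("A2", A2)):
    print(name, sorted(np.round(np.linalg.eigvals(A).real, 6)))
# A1: [-1.0, 1.0]    A2: [-1.0, 0.0, 1.0]
```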
Remark 7.4
If A is a square matrix, we define the exponential of A by

$$\displaystyle \begin{aligned} e^A = \sum_{k=0}^\infty \frac{A^k}{k!}, \end{aligned} $$
(7.2)
where the convergence is in the matrix norm topology.
We note that if Av = λv, then e A v = e λ v. In addition, each eigenspace and generalized eigenspace for A and λ is also an eigenspace and generalized eigenspace for e A and e λ. Suppose A ≥ 0; an important observation is that e A − I ≥ 0, and if in addition A is irreducible, then e A − I > 0, since for each i, j = 1, …, n, we have 
$$a_{ij}^{(k)}>0$$
for some 
$$k \in \mathbb N$$
. Then (7.2) gives

$$\displaystyle \begin{aligned}e^A = I + A + \frac{ A^2}{2!} + \ldots \ \mbox{implies} \ e^A-I = A + \frac{A^2}{2!}+ \cdots + \frac{A^j}{j!} + \cdots.\end{aligned}$$
Therefore each entry will eventually be strictly positive in the matrix sum on the right.
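This observation is easy to see numerically; the following sketch (an added illustration, using A 1 from above as the sample matrix and truncating the exponential series by hand) sums enough terms of e A − I to exhibit strictly positive entries even though A itself has zeros.

```python
# Sketch: for an irreducible nonnegative A, the entries of e^A - I are all
# strictly positive (truncating the exponential series is an approximation).
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # irreducible, but with zero entries

B = np.zeros_like(A)
term = np.eye(2)
for k in range(1, 20):              # 20 terms is plenty for this 2x2 example
    term = term @ A / k             # term now equals A^k / k!
    B += term                       # B approximates e^A - I

print(B)                            # every entry strictly positive
print(bool(np.all(B > 0)))          # True
```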

We have the following lemma about the action of A and A t on nonnegative vectors when A is irreducible.

Lemma 7.5

Given an irreducible n × n matrix A ≥ 0, let 
$$Q = \{ v \in \mathbb R^n\,:\, v_i \geq 0,\, i=1,\ldots ,n\}$$
denote the positive cone of 
$$\mathbb R^n$$
. If v ∈ Q, then both Av ∈ Q and A t v ∈ Q, and if v ≠ 0, then both Av and A t v are nonzero.

Moreover if v ∈ Q ∖{0}, and Av = rv, r > 0, then v > 0. Similarly, if u is an eigenvector for A t in Q, for r > 0, then u > 0 (all coordinates of u and v are strictly positive).

Proof
We consider the matrix 
$$B=e^A-I = \sum _{k=1}^{\infty }\frac {A^k}{k!}$$
; A irreducible gives that B > 0 by Remark 7.4. Now we have

$$\displaystyle \begin{aligned} B>0 \, \, \mathrm{and} \,\, v \in Q\setminus \{0\} \quad  \Rightarrow \quad  Bv > 0.\end{aligned} $$
(7.3)
However, if Av = 0, then by definition of B, Bv = 0 as well. Therefore, v ∈ Q ∖{0} implies Av ≠ 0 by (7.3), and Av ≥ 0, so Av ∈ Q. This proves the first statement for A.

If v ∈ Q ∖{0} satisfies Av = rv for some r > 0, then Bv = (e A − I)v = (e r − 1)v, and Bv > 0 by (7.3). Therefore v > 0.

Since A t is irreducible if and only if A is, the remaining statements follow by the same argument replacing A with A t, and using that 
$$B^t =(e^A-I)^t = e^{A^t}-I >0$$
. □

For A ≥ 0 irreducible, we consider the simplex S ⊂ Q defined by S = {v ∈ Q : (v, η) = 1}, with η = (1, …, 1). Using the norm on 
$$\mathbb R^n$$
given by 
$$\|v\|{ }_1=\sum _{j=1}^n|v_j|$$
, we have S = {v ∈ Q : ∥v1 = 1}, and from Lemma 7.5, the map 
$$\hat {A}:S \rightarrow S$$
defined by: 
$$\hat {A}(v) = A v / \|Av\|{ }_1 $$
is well-defined and continuous on S.

Lemma 7.6

The map 
$$\hat {A}:S \rightarrow S$$
defined by 
$$\hat {A}(v) = \dfrac {A v}{\|Av\|{ }_1}$$
has a fixed point.

Proof
Since S is a compact convex subset of 
$$\mathbb R^n$$
of dimension n − 1, S is homeomorphic to a ball in 
$$\mathbb R^{n-1}$$
. Since 
$$\hat {A}$$
is continuous, the result follows from the Brouwer Fixed Point Theorem [142]. To see that S is convex, if t ∈ (0, 1), and v, w ∈ S, then

$$\displaystyle \begin{aligned} \|tv+(1-t)w\|{}_1&=(tv+(1-t)w,\eta) \\& = t(v,\eta)+(1-t)(w,\eta) \\& = t+ (1-t)=1.\end{aligned}$$
□

7.2 Spectrum and the Perron–Frobenius Theorem

We have now laid the groundwork to prove the first part of the Perron–Frobenius Theorem.

Proposition 7.7

An irreducible n × n matrix A has an eigenvalue r > 0, with an eigenvector v* > 0. Moreover, r is the only eigenvalue for A with an eigenvector in Q. Additionally the matrix A t has an eigenvector w* > 0 for r, and r is the only eigenvalue for A t with an eigenvector in Q. Equivalently, there is a left eigenvector x > 0 such that xA = rx.

Proof

By Lemmas 7.5 and 7.6, A maps Q into Q, and there is a vector v* ∈ S ⊂ Q such that v* = Av*∕∥Av*∥1; therefore r = ∥Av*∥1 > 0 is an eigenvalue for A with eigenvector v* > 0.

An eigenvalue for A is also an eigenvalue for A t, so r > 0 is an eigenvalue for A t. Lemma 7.5 implies that A t : Q → Q, and an application of Lemma 7.6 to the map 
$$\widehat {A^t}$$
on S ⊂ Q yields an eigenvalue λ > 0 and eigenvector w ∈ Q for A t. To show that λ = r, assume Av = rv and A t w = λw for some nonzero vectors v, w ∈ Q and λ, r > 0. By Lemma 7.5, w > 0 and v > 0, so (v, w) > 0 and therefore

$$\displaystyle \begin{aligned} \lambda (v,w) =(v,A^tw)=(Av,w)=r (v,w). \end{aligned}$$
It follows that λ = r and A t w = rw for some w ∈ Q, call it w*.
Now suppose there exists a vector u ∈ Q ∖{0}, and some 
$$\lambda \in \mathbb C $$
with Au = λu. Since A maps Q into Q, λ ≥ 0. Then using w* > 0 from above,

$$\displaystyle \begin{aligned} \lambda (u,w^*) =(A u ,w^*)=(u, A^t w^*)=r(u,w^*). \end{aligned}$$
Again, since (u, w*) ≠ 0, λ = r. This shows that r is the only eigenvalue for A with an eigenvector in Q. Also x = (w*)t satisfies xA = rx as claimed. □

We are interested primarily in irreducible matrices whose positive eigenvalue is 1, so we replace A by (1∕r)A if needed so that r = 1. Without loss of generality, for now we assume the eigenvalue from Proposition 7.7 is r = 1, and we continue to write the matrix as A. A classical result from linear algebra allows us to replace A by a similar matrix, i.e., one of the form C = D −1 AD, and it follows that Spec(C) = Spec(A). However, we make a strategic choice of similarity (the matrix D) that gives a more useful norm, since every matrix norm acts as an upper bound for Spec(A) (see Exercise 3). We consider the eigenvector for r = 1, v* = (v*1, ⋯ , v*n) > 0, and we define D = (d ij) to be the invertible diagonal matrix with d jj = v*j > 0. It is not hard to see that the resulting matrix C = D −1 AD is stochastic (see the Exercises below). In the next proposition we show that these adjustments help simplify the proof of Theorem 7.3.
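A small numerical sketch of this similarity trick (the 2 × 2 matrix here is a hypothetical example chosen to have eigenvalue 1, and the use of numpy is an assumption) shows that D −1 AD is stochastic and has the same spectrum as A.

```python
# Sketch: if Av* = v* with v* > 0 and D = diag(v*), then C = D^{-1} A D is
# stochastic and Spec(C) = Spec(A). (Illustrative example matrix.)
import numpy as np

A = np.array([[0.50, 1.00],
              [0.25, 0.50]])
v_star = np.array([2.0, 1.0])
assert np.allclose(A @ v_star, v_star)       # eigenvalue 1, eigenvector v* > 0

D = np.diag(v_star)
C = np.linalg.inv(D) @ A @ D
print(C.sum(axis=1))                         # each row of C sums to 1
print(np.round(np.linalg.eigvals(A), 6))     # same spectrum ...
print(np.round(np.linalg.eigvals(C), 6))     # ... for the similar matrix C
```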

Recall that if A is an n × n matrix and λ is an eigenvalue, then λ is geometrically simple if the dimension of the eigenspace in 
$$\mathbb R^n$$
for λ is one, and algebraically simple if λ is a root of multiplicity one of the characteristic polynomial p A(t).

Proposition 7.8
Suppose C ≥ 0 is an irreducible n × n stochastic matrix. Then the following hold.
  1. 1.

    r = 1 is an eigenvalue for C with η = (1, 1, ⋯ , 1) as an eigenvector.

     
  2. 2.

    ∥C∥ = 1.

     
  3. 3.

    If λ is an eigenvalue for C, then |λ|≤ 1.

     
  4. 4.

    The eigenvalue 1 is a simple root of the characteristic equation of C, so 1 is algebraically and geometrically simple.

     
Proof
  1. (1)

    For each j = 1, …, n, 
$$(C\eta )_j = \sum _{k=1}^n c_{jk}1 =1$$
since each row of C sums to 1.

     
  2. (2)
    Let w = (w 1, ⋯ , w n) satisfy ∥w∥ = 1. Then since 0 ≤ c ij ≤ 1 for all i, j, it is clear that
    
$$\displaystyle \begin{aligned} |(Cw)_i| = \left | \sum_{k=1}^n c_{ik}w_k \right | \leq \max_k |w_k| \sum_{k=1}^n c_{ik} = 1.\end{aligned} $$
    (7.4)

    Therefore ∥C∥≤ 1, and since Cη = η with ∥η∥ = 1 by (1), in fact ∥C∥ = 1.

     
  3. (3)

    Since Spec(C) ≤∥C∥, 1 ≥|λ| for every λ ∈Spec(C), and (3) follows immediately.

     
  4. (4)

    It suffices to prove (4) for the matrix B = 1∕(e − 1)(e C − I); by Remark 7.4, B > 0 and proving (4) for B implies (4) holds for C as well (see Exercise 5). We first show that B is stochastic by showing all rows sum to 1. Because Cη = η and 
$$e^C\eta = (\sum _{k=0}^\infty C^k /k!) \, \eta = e \eta $$
, it follows that Bη = 1∕(e − 1)((e − 1)η) = η = (1, 1, …, 1). This shows that B is stochastic, since (Bη)j = 1 is the sum of the entries of the j th row of B.

     

To show (4) for B, we consider two B-invariant subspaces. First, 
$$V_1 =\{x \in \mathbb R^n : x=(a,a,\ldots ,a),\, a \in \mathbb R\} $$
is the one-dimensional eigenspace for 1 (by Corollary 7.2). We next consider an eigenvector u  > 0 for B t corresponding to the eigenvalue 1. We then define the second subspace 
$$W = \{ w \in \mathbb R^n : (w, u^*) = 0 \}$$
. If w ∈ W, then (Bw, u ) = (w, B t u ) = (w, u ) = 0, so Bw ∈ W. W is n − 1 dimensional, since it is the orthogonal complement to the eigenspace spanned by u , and W ∩ V 1 = {0}. Since yV 1 implies By − yV 1 unless y is an eigenvector and By − y = 0, there are no generalized eigenvectors for the eigenvalue 1 except for vectors in V 1. This shows that the eigenvalue 1 is both geometrically and algebraically simple. □

Proof of the Perron–Frobenius Theorem 7.3

Assume the matrix A ≥ 0 is irreducible; Proposition 7.7 implies that A has an eigenvalue r > 0 with an eigenvector ζ* > 0. Set A r = (1∕r)A so that A r ≥ 0 is irreducible and has 1 as an eigenvalue, with eigenvector ζ* > 0 by Lemma 7.5. Define D to be the diagonal matrix with 
$$d_{jj} = \zeta _j^* >0$$
so that the resulting matrix C r = D −1 A r D satisfies the hypotheses of Proposition 7.8. It then follows that 1 is an algebraically simple eigenvalue of C r and therefore r is an algebraically simple eigenvalue of A, yielding (1) and (2) of the theorem.

To prove (3), assume that in addition A is primitive, so A m > 0 for some 
$$m \in \mathbb N$$
, and set M = r −m A m > 0. Now M has 1 as an eigenvalue with the eigenvector ζ*, and we claim that for (3) to hold for the original matrix A, it is enough to show that λ ∈Spec(M), λ ≠ 1 implies |λ| < 1. The claim is true because if β ∈Spec(A) and |β| = r, then β m ∈Spec(A m) and (β∕r)m ∈Spec(M); but |β∕r| = 1, so β = r if (3) holds for M.

Finally, to prove that (3) holds for M, denote by D the diagonal matrix defined above with 
$$d_{jj}= \zeta ^*_j$$
. Then C = D −1 MD is positive and stochastic, which yields a spectral radius of 1 with a geometrically and algebraically simple eigenvalue 1 for C with eigenspace V 1. Therefore 
$$\mathbb R^n$$
is the direct sum 
$$\mathbb R^n = V_1 \oplus W$$
, each a C invariant subspace, as in the proof of Proposition 7.8. In particular, there is a corresponding eigenvector u  > 0 with C t u  = u , and 
$$W=\{ w \in \mathbb R^n\, : \, (w,u^*)=0 \}$$
.

If λ ≠ 1 is an eigenvalue for C, then λ ∈Spec(C|W). However, by Proposition 7.1, if w ∈ W and Cw = λw, then ∥Cw∥ < ∥w∥, which implies |λ| < 1. This proves (3) for C, and therefore for the similar matrix M, and hence for A. □

The following version of the Perron–Frobenius Theorem, often used in applications, is a consequence of Proposition 7.8.

Theorem 7.9

Suppose A is primitive and stochastic with left eigenvector q > 0 for the eigenvalue 1, normalized so that ∥q∥1 = 1. Let P denote the matrix where each row of P is identically q. Then for every 
$$x \in \mathbb R^n$$
, A k x → Px exponentially fast. In particular, ∥A k − P∥→ 0 as k →∞.

Proof
By hypothesis, there is an 
$$m \in \mathbb N$$
such that A m > 0 and is stochastic. We recall the subspace 
$$V_1 =\{x \in \mathbb R^n : x=(a,a,\ldots ,a),\, a \in \mathbb R\} $$
, and as in the proof of Proposition 7.8, we set 
$$W = \{ w \in \mathbb R^n : (w, q) = 0 \}$$
. W is the orthogonal complement to the span of q, and 
$$\mathbb R^n = V_1 \oplus W$$
, AW ⊆ W, and AV 1 ⊆ V 1. By Theorem 7.3

$$\displaystyle \begin{aligned} \|A^m |{}_W \| = \beta<1 \,\, \mathrm{and} \, \|A\| = 1, \end{aligned} $$
(7.5)
where ∥A m|W∥ is the norm of A restricted to W. If P is the square matrix with each row the left eigenvector q (i.e., P ij = q j), then for all w ∈ W, Pw = 0 since (Pw)j = (w, q) = 0. Clearly Pv ∈ V 1 if v ∈ V 1; therefore Px is the projection of 
$$x \in \mathbb R^n$$
onto V 1 such that P = 0 on W. Hence AP = P, and I − P is the projection onto W with I − P = 0 on V 1.
It remains to show limk∥A k − P∥ = 0. For every 
$$v \in \mathbb R^n$$
, we can write v = w + u, with u ∈ V 1, w ∈ W. By Equation (7.5), for each 
$$j \in \mathbb N$$
,

$$\displaystyle \begin{aligned}\|A^{mj} v - Pv \| = \|A^{mj} (w + u) - Pv \| = \|(A^m)^jw + u - u \| \leq \beta^j \|w\| \rightarrow 0.\end{aligned}$$
Then if 
$$k \in \mathbb N$$
, we write k = mj + l, so

$$\displaystyle \begin{aligned} A^k = A^l {(A^m)^j}(P + (I-P)) = P + A^l {(A^m)^j}(I-P).\end{aligned} $$
(7.6)
Then using estimates from (7.5) and (7.6 ),

$$\displaystyle \begin{aligned} \begin{array}{rcl} \|A^k-P\|& = &\displaystyle \|A^l {(A^m)^j}( I-P)\| \\ & \leq &\displaystyle \|A^l\| \|(A^m|{}_W)^j \| \|I - P\| \\ & \leq &\displaystyle \beta^j \|I - P\| \rightarrow 0  \end{array} \end{aligned} $$
(7.7)
as k →∞, so j →∞, and the theorem is proved. □
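Theorem 7.9 is easy to observe numerically. In the sketch below (an illustration; the 3 × 3 matrix is made up and numpy is assumed), q is obtained from the leading eigenvector of A t, P is the matrix whose rows are all q, and ∥A k − P∥ is seen to decay geometrically.

```python
# Sketch: for a primitive stochastic A, the powers A^k converge to the
# rank-one matrix P whose rows all equal the left eigenvector q.
import numpy as np

A = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.60, 0.20],
              [0.3, 0.30, 0.40]])

vals, vecs = np.linalg.eig(A.T)              # left eigenvectors of A
q = np.real(vecs[:, np.argmin(abs(vals - 1))])
q = q / q.sum()                              # normalize so that ||q||_1 = 1
P = np.tile(q, (3, 1))                       # each row of P is q

Ak = np.eye(3)
for k in range(1, 11):
    Ak = Ak @ A
    print(k, np.max(abs(Ak - P)))            # decays like beta^k with beta < 1
```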
Remark 7.10

The matrix P defined in Theorem 7.9 satisfies the following: (1) P is stochastic, (2) PA = AP = P, (3) every eigenvector of A is an eigenvector of P, and (4) P 2 = P.

In many applications, for example Section 7.4 below, the following consequence is used.

Corollary 7.11
Let A be a primitive stochastic matrix. Then for every row vector x ≥ 0, with 
$$\sum _{i=1}^nx_i=1$$
,

$$\displaystyle \begin{aligned}\lim_{k \rightarrow \infty} x A^k = q\end{aligned}$$

where q is as in Theorem 7.9.

Proof

We set v = x t to be a column vector, and then limk(xA k)t =limk(A k)t v = P t(v) = q. (See Exercise 8.) □

7.2.1 Application to Markov Shift Dynamics

Using the notation from Chapter 6.​2, we let M denote an incidence matrix, and A a stochastic matrix with zero entries precisely where M has zeros. The matrix M = (m ij) satisfies m ij = 1 if and only if in one time step state i can go to state j. For each pair of states i and j, a ij = 0 if and only if m ij = 0; if a ij > 0, its value represents the probability of going from state i to state j. We apply the above results to the dynamical system σ on 
$$(\Sigma _M,\mathcal B)$$
to obtain proofs of the following results.

Proposition 7.12

If A is an irreducible stochastic matrix, then there exists an invariant probability measure ρ A satisfying all the conditions of Definition 6.12, making 
$$(\Sigma _M,\mathcal B, \rho _A,\sigma )$$
into a finite measure-preserving Markov shift.

Proof

By Proposition 7.7, we have a left stochastic eigenvector q > 0 for the eigenvalue 1; this gives the existence of an invariant measure ρ A satisfying Definition 6.​12. □

In order to show that ρ A is ergodic, or equivalently that the (q, A) Markov shift is ergodic, we need one more lemma.

Lemma 7.13

Let A be an n × n irreducible stochastic matrix and q > 0 a stochastic vector in 
$$\mathbb R^n$$
such that qA = q. Then 
$$ \displaystyle  P = \lim _{N \rightarrow \infty } \frac 1 N \sum _{k=0}^{N-1} A^k$$
, where every row of P is q.

Proof
Consider the measure-preserving dynamical system 
$$(\Sigma _M,\mathcal B,\rho _A, \sigma )$$
, and the cylinder set 
$$C_0^j$$
. Let 
$$\chi _j = \chi _{C_0^j}$$
and apply Birkhoff’s Ergodic theorem. Then

$$\displaystyle \begin{aligned}\lim_{N \rightarrow \infty} \frac 1 N \sum_{k=0}^{N-1} \chi_j (\sigma^k x) = \chi_j^* (x) \quad  \rho_A\mathrm{-}\mathrm{a.e.}\end{aligned}$$
Multiplying by χ i(x) and integrating using ρ A, the Dominated Convergence Theorem gives that

$$\displaystyle \begin{aligned}\lim_{N \rightarrow \infty} \frac 1 N \sum_{k=0}^{N-1} q_i a_{i j}^{(k)} = \int \chi_j^* (x)\chi_i(x) d \rho_A.\end{aligned}$$
Therefore setting B = (b ij), with

$$\displaystyle \begin{aligned}b_{ij} = \frac{1}{q_i} \int \chi_j^* (x)\chi_i(x) d \rho_A\end{aligned}$$
gives the desired matrix. Since A is irreducible, 1 is a simple eigenvalue. By the construction, BA = B, so each row of B must be the same, and be q. Therefore B = P. □
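Lemma 7.13 can be seen numerically even in the irreducible, non-primitive case; in the sketch below (an added illustration using the matrix A 1 from earlier in the chapter) the powers A k oscillate, but their Cesàro averages converge to the matrix P with both rows equal to q = (1∕2, 1∕2).

```python
# Sketch: for A irreducible but not primitive, A^k need not converge, but the
# averages (1/N) * (I + A + ... + A^{N-1}) converge to P, every row equal to q.
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # stochastic, irreducible, not primitive
N = 1000

avg = np.zeros((2, 2))
Ak = np.eye(2)
for _ in range(N):
    avg += Ak
    Ak = Ak @ A
avg /= N
print(avg)                          # approximately [[0.5, 0.5], [0.5, 0.5]]
```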
Theorem 7.14

If A is a primitive stochastic matrix, then 
$$(\Sigma _M,\mathcal B, \rho _A,\sigma )$$
is mixing (and hence weak mixing and ergodic).

Proof
Since ρ = ρ A is preserved, by Theorem 5.​10 (1), it is enough to show that for cylinder sets of the form 
$$X=C_{k}^w$$
and 
$$Y= C_\ell ^u$$
, for u, w finite allowable words,

$$\displaystyle \begin{aligned}\lim_{n \rightarrow \infty} \rho(\sigma^{-n}X \cap Y) = \rho(X)\rho(Y).\end{aligned}$$
If w = (w 1, w 2, ⋯ , w r), and u = (u 1, u 2, ⋯ , u s), for n > ℓ + s − k

$$\displaystyle \begin{aligned} \rho(\sigma^{-n} X \cap Y) = q_{u_1}a_{u_1u_2} \cdots a_{u_{s-1}u_{s}}a_{u_sw_1}^{(n+k-(\ell+s))}a_{w_1 w_2} \cdots a_{w_{r-1}w_r}, \end{aligned} $$
(7.8)
and since by Theorem 7.9, 
$$q_j = \lim _{n \rightarrow \infty } a_{ij}^{(n)}$$
, (7.8) gives

$$\displaystyle \begin{aligned} \lim_{n \rightarrow \infty} \rho(\sigma^{-n}X \cap Y) &= q_{u_1}a_{u_1u_2} \cdots a_{u_{s-1}u_{s}}q_{w_1}a_{w_1 w_2} \cdots a_{w_{r-1}w_r} \\& = \rho(Y)\rho(X).\end{aligned}$$
□

Theorem 7.15

If A is irreducible, then the (q, A) Markov shift is ergodic with respect to ρ.

Proof

The technique is the same as that for Theorem 7.14, using Lemma 7.13 instead of Theorem 7.9. □

7.3 An Application to Google’s PageRank

A search engine such as Google uses multiple ways to address an internet search query from a user, but PageRank is one of the best and oldest techniques. It was developed by Brin and Page in 1998 [22] at Stanford, and uses the Perron–Frobenius Theorem at its core. A PageRank is a score associated to a web page that assesses its importance for a particular search; the ranking is usually computed only over web pages that contain data matching the original query. Indeed, the first step in producing the list that appears in a Google search is to find the pages that contain matches to the query words. Many details about PageRank can be found in sources such as [121] and [122]. We give a short account of the original method of Brin and Page to determine PageRank.

From a list of potentially relevant pages containing the queried word or phrase, a page ranking is obtained by first constructing a nonnegative matrix, which is then adjusted to be stochastic, as follows. Suppose there are N pages V 1, ⋯ , V N to rank, each being relevant to the search. One way to determine importance, or high ranking of a page V t, is to compute how many pages point to V t, weighted by their own importance. That is, we want the PageRank of V t, which we write as PR(V t), to satisfy

$$\displaystyle \begin{aligned} \mathrm{{PR}}(V_t) = \sum_{V_j \, {\mathrm{links} \, \mathrm{to}} \, V_t} \frac {\mathrm{{PR}}(V_j)}{{\# \mathrm{links} \, \mathrm{out}\, \mathrm{of} \,}V_j}.\end{aligned} $$
(7.9)
We note that this takes into consideration that if the page V j has a huge number of links going out from it, then the impact of it linking to the page of interest, V t, is lessened. This can be rewritten in matrix form, and the links among pages are illustrated by a directed graph such as the one shown in Figure 7.1. In the graph shown there an arrow from V i to V j denotes a link from page V i to page V j.
Fig. 7.1

The links among 4 pages relevant to an internet search, where a directed arrow denotes a page link

Let A = (a ij) denote the (enormous) N × N matrix where a ij = 0 if page V i has no link to V j, and 1∕(#links out of V i) if it does. We seek a left eigenvector for the eigenvalue 1 for A if it exists, since if vA = v, then writing v = (v 1, …, v t, …, v N), v t = PR(V t) as defined by (7.9). Even though every nonzero row of A sums to 1, we do not necessarily have a stochastic matrix, since rows of zeros also occur. Brin and Page made a few adjustments to A to yield an irreducible (positive) stochastic matrix to which the Perron–Frobenius Theorem applies, which we describe here.

The N × N matrix A contains many zeros, in particular, a dangling node is a page with no outgoing links and corresponds to a row of zeros in A. To take care of this issue, each row of zeros is replaced by a row of entries of 1∕N. We call the new matrix B, and B is a stochastic matrix. There is no obvious reason for it to be irreducible, but since there is always a very small probability that a browser might arrive at page V j by chance, we make one more adjustment which results in a primitive matrix C; in fact C > 0.

We assign a weight β < 1, but close to 1, to the matrix B, and the weight (1 − β) to the matrix with each entry 1∕N; denote this by F N. We define C = βB + (1 − β)F N. The PageRank vector v P is then defined to be the left eigenvector of C, for the eigenvalue 1, normalized so that ∥v P1 = 1. The entry of v P with the highest value determines the most important page, or, in a simple system, the page that is returned in a search.
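The construction just described can be summarized in a few lines of code. The sketch below (added for illustration; the function names, the damping value β = 0.85, and the use of numpy are our own choices) builds B from A by replacing the rows of dangling nodes, forms C = βB + (1 − β)F N, and approximates v P by iterating v ↦ vC, which converges by Corollary 7.11 since C > 0.

```python
# Sketch of the Brin-Page adjustment (illustrative helper functions).
import numpy as np

def google_matrix(A, beta=0.85):
    N = A.shape[0]
    B = A.astype(float).copy()
    dangling = B.sum(axis=1) == 0        # pages with no outgoing links
    B[dangling, :] = 1.0 / N             # replace each zero row by (1/N, ..., 1/N)
    F = np.full((N, N), 1.0 / N)
    return beta * B + (1.0 - beta) * F   # C > 0 and stochastic, hence primitive

def pagerank(A, beta=0.85, iterations=100):
    C = google_matrix(A, beta)
    v = np.full(A.shape[0], 1.0 / A.shape[0])   # any stochastic start vector
    for _ in range(iterations):
        v = v @ C                        # Corollary 7.11: v C^k converges to v_P
    return v
```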

Example 7.16

As an example, we consider a tiny internet consisting of only 4 pages. They are interconnected via the graph shown in Figure 7.1.

The matrix A that results from applying the algorithm above to the graph in Figure 7.1 is

$$\displaystyle \begin{aligned} A = \left(\begin{array}{cccc} 0 & 0 & 1 & 0\\[.1 cm] \frac 12 & 0 & 0 & \frac 12\\[.1 cm] \frac 1 2 & 0 & 0& \frac 1 2\\[.1 cm] \frac 13& \frac 13&\frac 13&0\\ \end{array}\right). \end{aligned}$$
Since A is already stochastic we have no need of matrix B, and using β = .85, we get the matrix

$$\displaystyle \begin{aligned} C = \left(\begin{array}{cccc} .0375 & .0375 & .8875&.0375\\ .4625 & .0375 & .0375 & .4625\\ .4625 & .0375 & .0375& .4625\\ .3208&.3208&.3208&.0375\\ \end{array}\right). \end{aligned}$$
We now have C > 0, and from Corollary 7.11 above, the powers of C converge to the matrix whose rows all equal the left eigenvector for the eigenvalue 1. Direct computation also shows that

$$\displaystyle \begin{aligned}v_P \approx (.3012, .1040, .3600, .2347)\end{aligned}$$
gives the PageRank vector we seek. Therefore Page 3 would appear first in the search, followed by Page 1. The implementation of this algorithm for the internet is another interesting problem not covered in this book.
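For the tiny internet above, the computation can be checked directly (an illustrative verification, assuming numpy); iterating v ↦ vC from the uniform starting vector reproduces the approximate PageRank vector quoted in the example.

```python
# Verifying Example 7.16 numerically.
import numpy as np

A = np.array([[0,     0,     1,     0    ],
              [1 / 2, 0,     0,     1 / 2],
              [1 / 2, 0,     0,     1 / 2],
              [1 / 3, 1 / 3, 1 / 3, 0    ]])
beta = 0.85
C = beta * A + (1 - beta) * np.full((4, 4), 1 / 4)

v = np.full(4, 1 / 4)                 # uniform stochastic starting vector
for _ in range(200):
    v = v @ C
print(np.round(v, 4))                 # approximately [0.3012 0.104 0.36 0.2347]
```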

7.4 An Application to Virus Dynamics

Eradicating the human immunodeficiency virus (HIV) in an infected individual has remained out of reach of current medical practice; however antiretroviral therapy (ART) has changed HIV to a chronic disease. Studies going back to the 1990s reveal that even when HIV is undetectable in the blood, the virus is not completely gone. The cells infected by the virus are white blood cells that are called CD4 cells (these are a type of T cell) and form a critically important component of the immune system. A latent HIV reservoir is a group of CD4 cells in the body that are infected with HIV but are not actively producing new virus particles.

In a patient infected with HIV, although latently infected CD4 cells are relatively sparse, it is believed by many that they prevent HIV from being eradicated from an infected individual and therefore knowledge about their location (and source) is critical to the study of their elimination [36]. Following the discussion given in [87], and the numerous references therein, we construct a Markov model of the hidden dynamics of a latently infected CD4 cell in an HIV positive patient. We use the Perron–Frobenius Theorem to also show that a limiting distribution of the latently infected cells exists, which indicates in which anatomical compartments the HIV typically can be found, and for each, the relative likelihood of being in that compartment.

Although the cardiovascular system is the main part of the circulatory system, the lymphatic system is the second part. This system is connected to, but quite different from, the cardiovascular system insofar as the fluid, consisting of lymph and the immune cells called lymphocytes, circulates much more slowly. Lymph is primarily moved by surrounding muscles; it is of critical importance in the study of HIV. There are associated lymphatic tissue and organs that filter pathogens from the lymph before returning it to the bloodstream. With that in mind, we review the major repositories of latently infected cells, and number the corresponding compartment to give the states represented in our matrix model.

7.4.1 States of the Markov Process

  1. 0.

    The peripheral blood system is the pool of circulating blood in the cardiovascular system; it contains red blood cells, white blood cells (including latently infected CD4 cells), and platelets suspended in plasma. The blood is rapidly circulated through a closed circulatory system pumped by the heart. Red blood cells do not escape the cardiovascular system in a healthy individual, though white blood cells, slightly smaller, do.

     
  2. 1.

    Gut-associated lymphatic tissue (GALT) is the site of many CD4 cells and is the largest immune organ in the body. Clinical tests have shown the presence of significant numbers of latently infected cells in GALT (around 36 copies per million cells).

     
  3. 2.

    Lymph nodes are (some of the) filtering compartments in the lymphatic system that have an elaborate structure through which lymph flows in order to attack pathogens. The lymph nodes in a human (there are likely to be 500–600 of them per person) also house a large percentage of CD4 cells, including a large percentage of the latently infected ones.

     
  4. 3.

    Cerebrospinal fluid and the central nervous system (CSF) house reservoirs of HIV as well. Infected cells in CSF appear from the early stages of infection, and the infected cells harboring the viruses are believed to be CD4 cells, though the concentration is lower than in other parts of the body.

     
  5. 4.

    Bone marrow hosts HIV reservoirs, and clinical studies show that the HIV DNA is latently present in CD4 cells found in bone marrow.

     

We model the viral spread within an individual human patient using a Markov process on a finite state space 
$$\mathcal {A}=\{0,1,\ldots ,k-1 \}$$
, where k is the number of compartments of the body in which CD4 cells with latent virus have been found, and each state represents a compartment. We are interested in dynamics of latently infected resting CD4 cells in a sample taken from the peripheral blood; since this fluid circulates quickly, we choose a time increment for taking measurements (assigning the time increment to be one day is a reasonable choice). We study the statistical properties of a generic latently infected cell located in one of the k compartments.

We define a sequence of 
$$\mathcal {A}$$
-valued random variables {X n  :  n ≥ 0} by first setting X 0 to be the compartment where the cell resides in the body at the beginning of our observations. Then X 1 is the compartment where the cell resides at the first (time 1) observation, and X n is the compartment at the nth observation. We assume that X n depends on X n−1 and not on earlier X j’s; this is a reasonable assumption given that some of the compartments allow flow from one to another, but not all. We look at a sample case, using the states 0, 1, 2, 3, 4 described above.

We then define a k × k incidence matrix, denoted by B = (b ij), with b ij = 1 if in one time step a CD4 cell can move from compartment i to j, and 0 otherwise. Using this notation, if B n is the product of n copies of the matrix, writing 
$$B^n=(b^{(n)}_{ij})$$
, it follows that 
$$b^{(n)}_{ij}>0$$
if and only if a CD4 cell can move from compartment i to j in n steps. In the current model we use the matrix given by (7.10), obtained from the graph given in Figure 7.2.
Fig. 7.2

Labelled graph showing the fluid flow connections in one day among the compartments listed in 7.4.1

Using the matrix B and clinical data we construct a stochastic matrix P = (p ij) with the property that p ij = P(X n = j|X n−1 = i), where P denotes probability. Clearly p ij > 0 if and only if b ij > 0 for the corresponding incidence matrix B. We label the states as Lymph nodes (Node), Peripheral blood (Blood), Bone marrow (Bone), Cerebrospinal fluid and central nervous system (CSF), and Gut-associated lymphatic tissue (GALT), assigning numbers 0–4, respectively.

The incidence matrix with a time step of one day corresponding to the graph in Figure 7.2 is given by the matrix B day below, where b ij = 1 (i, j = 0, 1, 2, 3, 4) if and only if a latently infected CD4 cell in compartment i can enter j.

$$\displaystyle \begin{aligned} B_{\mathrm{{day}}} =\left(\begin{array}{ccccc}1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0\\ 1 & 0 & 0 & 1 &0 \\ 1 & 1 & 0 & 0 & 1 \end{array} \right) \end{aligned} $$
(7.10)
If we had perfect information, then we would be able to associate a stochastic matrix to the incidence matrix by following where the CD4 cells move over time. We rely on multiple studies done on the migration of T cells, the observation that there are non-migratory T cells, and the location of HIV reservoirs. If we synchronize the time step to match the blood test time step of one month (four weeks), then since 
$$B_{\mathrm {{day}}}^2$$
has only positive entries, in one month, a CD4 cell could in theory travel from any one compartment to any other.
Even though the precise mechanism by which the latently infected cells move through the circulatory and lymphatic systems is incredibly complex, we can form the incidence matrix and the stochastic matrix P with some confidence. While we have incomplete information about the stochastic matrix P of the Markov shift given by (7.11), we know from clinical studies that each 
$$p^{(2)}_{ij} >0$$
, so all entries in P month are positive; i.e., P is primitive.

$$\displaystyle \begin{aligned} P_{\mathrm{{month}}}=\left(\begin{array}{ccccc}p_{00} & p_{01} & p_{02} & p_{03} & p_{04} \\ p_{10} & p_{11} & p_{12} & p_{13} & p_{14} \\ p_{20} & p_{21} & p_{22} & p_{23} & p_{24}\\ p_{30} & p_{31} & p_{32} & p_{33} &p_{34} \\ p_{40} & p_{41} & p_{42} & p_{43} & p_{44} \end{array} \right) \end{aligned} $$
(7.11)
Each choice of stochastic row entries will yield a stochastic left eigenvector q for the eigenvalue 1; it is also the case that several choices might yield the same vector q.
However, for purposes of illustration we consider an example, using the following associated stochastic matrix built from data given in the references in [87], Section 2.2:

$$\displaystyle \begin{aligned} P_{\mathrm{{day}}}=\left(\begin{array}{ccccc}.5 & .19 & .005 & .005 & .3 \\ .35 & .4 & .05 & 0 & .2 \\ .3 & .2 & .5 & 0 & 0\\ .2 & 0 & 0 & .8 &0 \\ .4 & .1 & 0 & 0 & .5 \end{array} \right) \end{aligned} $$
(7.12)
Since each row sum in P day is 1, every power of the matrix is also stochastic, so we can calculate 
$$P_{\mathrm {{month}}} = P_{\mathrm {{day}}}^{28}$$
easily by computer. Using the entries of P day, the left eigenvector for the eigenvalue 1 is approximately:

$$\displaystyle \begin{aligned} q=(.428, .200, .024, .011, .337). \end{aligned} $$
(7.13)
Since P day is primitive, Corollary 7.11 can be applied, and we have for an initial stochastic row vector 
$$x \in \mathbb R^n$$
,

$$\displaystyle \begin{aligned}\lim_{k \rightarrow \infty} x P_{\mathrm{{month}}}^k = \lim_{k \rightarrow \infty} x P_{\mathrm{{day}}}^{28 k}=q;\end{aligned}$$
by computing, we see that rounded to the nearest .001,

$$\displaystyle \begin{aligned} P_{\mathrm{{month}}}=\left(\begin{array}{ccccc}.428 & .200 & .024 & .011 & .337 \\ .428 & .200 & .024 & .011 & .337 \\ .428 & .200 & .024 & .011 & .337\\ .428 & .200 & .024 & .012 & .336 \\ .428 & .200 & .024 & .011 & .337\end{array} \right) \end{aligned} $$
(7.14)

We can interpret the jth vector entry in q as the probability that a latently infected cell can be found in compartment j. With this assessment of the compartments in which latent CD4 cells are found in patients, the model shows that a generic latently infected CD4 cell is almost three times as likely to be found in the lymphatic system (GALT, 34%, or lymph nodes, 43%, giving 77%) as in the blood (20% likelihood) and the outcome is independent of where the latent infection started. The model also shows what is supported by clinical data, that there is a positive probability that there will be a steady state of latently infected cells found in the bone marrow and cerebral spinal fluid, despite the infrequency of its discovery.
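The computation behind (7.13) and (7.14) is easily reproduced (a sketch added here for illustration, assuming numpy): raising P day to the 28th power gives P month, whose rows are already very close to the limiting distribution q.

```python
# Sketch: recomputing P_month = P_day^28 and reading off q from any row.
import numpy as np

P_day = np.array([[0.50, 0.19, 0.005, 0.005, 0.30],
                  [0.35, 0.40, 0.050, 0.000, 0.20],
                  [0.30, 0.20, 0.500, 0.000, 0.00],
                  [0.20, 0.00, 0.000, 0.800, 0.00],
                  [0.40, 0.10, 0.000, 0.000, 0.50]])

P_month = np.linalg.matrix_power(P_day, 28)
print(np.round(P_month, 3))           # rows agree with (7.14)
q = P_month[0]
print(np.round(q, 3))                 # approximately [0.428 0.2 0.024 0.011 0.337]
```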

The power of the Perron–Frobenius Theorem is that even though the assigned probabilities might be somewhat inaccurate, the existence of a left eigenvector q with positive entries is guaranteed, and the entries of q depend continuously on the entries of P; any improvement in the initial data therefore quickly improves the accuracy of the model. Due to Corollary 7.11, about four days after the original measurement in this example, we are quite close to the limiting distribution. The error is easily estimated [174], so at one month we can be assured we are (typically) extremely close to the limiting distribution, which could be used to make testing more efficient and informative.

Exercises
  1. 1.

    Prove that if A is an n × n matrix and B = D −1 AD for some invertible square matrix D, then λ is an eigenvalue for A if and only if λ is an eigenvalue for B.

     
  2. 2.

    Show that if A is an n × n matrix and there exists a generalized eigenvector for A and λ, then λ is an eigenvalue for A.

     
  3. 3.

    Prove that if A is an n × n matrix such that ∥A∥ = r for some norm for A, then Spec(A) ≤ r.

     
  4. 4.

    Assume that A has an eigenvalue λ = 1, with eigenvector v  = (v 1, ⋯ , v n) > 0. Define D = d ij to be the diagonal matrix with d jj = v j > 0. Show that C = D −1 AD is stochastic.

     
  5. 5.

    If C ≥ 0 is an n × n matrix, show that if 1 is an algebraically simple eigenvalue for B = 1∕(e − 1)(e C − I), then 1 is an algebraically simple eigenvalue for C. Hint: Set f(z) = (e z − 1)∕(e − 1), and consider B = f(C). Use f(1) = 1 to write B − I = (C − I)g(C), where g(z) is entire on 
$$\mathbb C$$
.

     
  6. 6.

    Assume that A is irreducible and has an eigenvalue λ = 1, with eigenvector v  = (v 1, ⋯ , v n) > 0. Define D = d ij to be the diagonal matrix with d jj = v j > 0. Show that C = D −1 AD is irreducible.

     
  7. 7.

    Prove that if M is a primitive incidence matrix, then the associated topological Markov shift σ on ΣM is topologically transitive.

     
  8. 8.

    Assume A is a primitive stochastic matrix with left eigenvector q for the eigenvalue 1. Let P be the matrix with each row q. Show that for every vector 
$$v \in \mathbb R^n$$
with ∥v1 = 1, P t v = q.

     
  9. 9.
    For k ≥ 2 an integer, find the Markov measure ρ A for the matrix
    
$$\displaystyle \begin{aligned} A = \left( \begin{array}{cc} \frac 1 k & \frac{(k-1)}{k} \\[.1 cm] \frac {(k-1)}{k} & \frac {1}{k} \end{array} \right) \end{aligned}$$
     
  10. 10.
    Find an invariant probability measure for the Markov shift determined by the stochastic matrix:
    
$$\displaystyle \begin{aligned} A = \begin{pmatrix} \, \frac 12 \, \, &\, \, 0 \, \, & \frac 12 \, \\[.2 cm] \frac 13 & \frac 13 & \frac 13 \\[.2cm] 0 & \frac 12 & \frac 12 \end{pmatrix}\end{aligned}$$