© Springer Nature Switzerland AG 2021
J. Hawkins, Ergodic Dynamics, Graduate Texts in Mathematics 289, https://doi.org/10.1007/978-3-030-59242-4_5

5. Mixing Properties of Dynamical Systems

Jane Hawkins
Department of Mathematics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

In this chapter, we develop the concept of mixing to show that ergodic maps are not the most chaotic type of transformation. For example, not every ergodic dynamical system can make a heterogeneous environment appear to be more homogeneous after repeated applications of the transformation, but mixing systems can. Toral translations (and in fact all isometries on Riemannian manifolds) take every set A to a congruent set, so the set cannot mix into other sets. We define properties of mixing in this chapter and we also discuss how noninvertibility of a map is connected to mixing properties.

We begin with the following characterization of ergodicity and develop the mixing definitions from there.

Proposition 5.1
If f is a finite measure-preserving transformation of 
$$(X,\mathcal B,\mu )$$
, then f is ergodic if and only if for all sets 
$$A,B \in \mathcal B$$
,

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \mu(f^{-k}A \cap B) = \mu(A)\mu(B). \end{aligned} $$
(5.1)
Proof
(⇒): If f is ergodic, given 
$$A,B \in \mathcal B$$
, let ϕ(x) = χ A(x); then by the Birkhoff Ergodic Theorem,

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \chi_A(f^{k}x) = \int_X\chi_A\,d\mu = \mu(A) \ \mbox{a.e.}\end{aligned} $$
(5.2)
Multiplying (5.2) by χ B gives

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \chi_A(f^{k}x)\chi_B(x) = \mu(A) \chi_B(x) \ \mbox{a.e.}\end{aligned} $$
(5.3)
The Dominated Convergence Theorem implies that the integrals converge as well, so integrating both sides of (5.3) gives

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \mu(f^{-k}A \cap B) = \mu(A)\mu(B), \end{aligned} $$
(5.4)
as claimed.
(⇐): We assume that (5.1) holds, and suppose that we have a set 
$$A \in \mathcal B$$
such that f^{-1}A = A (μ mod 0). If we set B = A, then (5.1) becomes

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \mu(f^{-k}A \cap A) = \mu(A)\mu(A),\end{aligned} $$
(5.5)
and since f^{-k}A = A (μ mod 0) for every k, the left-hand side equals μ(A) for each N, and therefore μ(A) = μ(A)^2. This implies that μ(A) = 0 or 1. □
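As a numerical sanity check of (5.1), the following sketch computes the Cesàro averages for a circle rotation R_α(x) = x + α (mod 1) with Lebesgue measure m, taking A = [0, a) and B = [0, b). The choice of map, the intervals, and the function names are illustrative assumptions and are not part of the proof; for irrational α the averages should approach m(A)m(B), while for α = 1∕2 (not ergodic) they converge to a different value.

```python
import math

def arc_overlap(s, a, b):
    """Lebesgue measure of the arc [s, s+a) mod 1 intersected with [0, b).

    Assumes 0 <= s <= 1 and 0 <= a, b <= 1.
    """
    # Unwrap the arc to [s, s+a) inside [0, 2) and intersect it with the
    # two lifts [0, b) and [1, 1+b) of the interval B.
    first = max(0.0, min(s + a, b) - s)
    second = max(0.0, min(s + a, 1.0 + b) - max(s, 1.0))
    return first + second

def cesaro_average(alpha, a, b, N):
    """(1/N) * sum_{k<N} m(R_alpha^{-k} A ∩ B) with A = [0, a), B = [0, b)."""
    total = 0.0
    for k in range(N):
        s = (-k * alpha) % 1.0  # R_alpha^{-k} A is the arc starting at -k*alpha (mod 1)
        total += arc_overlap(s, a, b)
    return total / N

if __name__ == "__main__":
    alpha = math.sqrt(2) - 1  # an irrational rotation number
    print(cesaro_average(alpha, 0.3, 0.5, 20000), 0.3 * 0.5)
    # alpha = 1/2: the averages converge, but not to m(A) m(B)
    print(cesaro_average(0.5, 0.25, 0.25, 1000), 0.25 * 0.25)
```

The overlaps are computed exactly (up to floating point) from interval endpoints rather than by sampling, so the only approximation is the finite value of N.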

The next result does not require that f preserve a probability measure if it is invertible.

Theorem 5.2

Assume 
$$(X,\mathcal B,\mu ,f)$$
is nonsingular, invertible, and μ(X) = 1. Then f is ergodic if and only if for every 
$$A,B \in \mathcal B_+$$
, there exists 
$$n \in \mathbb Z$$
such that μ(f^n A ∩ B) > 0.

Proof

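(⇐): If the set condition holds, then suppose f^{-1}A = A (μ mod 0) for some 
$$A \in \mathcal B_+$$
. The invertibility of f implies that for every 
$$A \in \mathcal B$$
, f^{-1}A = A if and only if f^{-1}(X ∖ A) = X ∖ A, and then f^n A = A (μ mod 0) for every n ∈ ℤ. If μ(X ∖ A) > 0, then B = X ∖ A would satisfy μ(f^n A ∩ B) = 0 for all n, contradicting the set condition. Therefore μ(X ∖ A) = 0 and f is ergodic.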

(⇒): Given sets 
$$A, B \in \mathcal B_+$$
, the smallest invariant set containing A is 
$$ \cup _{n= - \infty }^{+\infty } f^{-n}(A)$$
; if f is ergodic, then this set has measure 1, so 
$$ B \subset \cup _{n \in \mathbb Z} f^{-n}(A)$$
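(μ mod 0). Since μ(B) > 0 and B is covered by countably many sets f^{-n}(A), at least one n satisfies μ(f^{-n}A ∩ B) > 0. Hence the theorem is proved. □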

We prove a basic result about sequences of real numbers that is useful in our comparison of mixing types.

Lemma 5.3
Let {a n} be a sequence of real numbers such that 
$$\lim  \limits _{n \rightarrow \infty } a_n =0$$
. Then

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} |a_k| = 0 \end{aligned} $$
(5.6)
and

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} a_k = 0 \end{aligned} $$
(5.7)
Proof
Since a_n converges, there exists some M ≥ 0 such that |a_n| ≤ M for all n. And since a_n → 0 as n → ∞, given ε > 0 there exists an N_0 such that for all n ≥ N_0, |a_n| < ε∕2. Then for all n ≥ N_0

$$\displaystyle \begin{aligned} \frac 1n \sum_{k=0}^{n-1} |a_k| \leq \frac{N_0 M + (n-N_0) \frac{\varepsilon}{2}}{n } < \frac{N_0 M}{n} + \frac{\varepsilon}{2}. \end{aligned}$$
We choose N_1 ≥ N_0 so that (N_0 M)∕N_1 ≤ ε∕2; then for all n ≥ N_1,

$$\displaystyle \begin{aligned} \frac 1n \sum_{k=0}^{n-1} |a_k| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \end{aligned}$$
so Equation (5.6) holds. Since

$$\displaystyle \begin{aligned}\left | \frac 1n \sum_{k=0}^{n-1} a_k \right | \leq \frac 1n \sum_{k=0}^{n-1} |a_k|,\end{aligned}$$
Equation (5.7) follows immediately. □
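A small numerical illustration of Lemma 5.3 may help: the sketch below uses the sequence a_k = (−1)^k∕√(k+1), chosen here only as an example of a sequence tending to 0, and compares the Cesàro means in (5.6) and (5.7). The helper names are hypothetical.

```python
def a(k):
    """Example sequence with a_k -> 0 (an illustrative choice)."""
    return (-1) ** k / (k + 1) ** 0.5

def cesaro_mean(seq, N):
    """(1/N) * sum_{k=0}^{N-1} seq(k)."""
    return sum(seq(k) for k in range(N)) / N

if __name__ == "__main__":
    for N in (10, 1000, 100000):
        mean_abs = cesaro_mean(lambda k: abs(a(k)), N)  # quantity in (5.6)
        mean = cesaro_mean(a, N)                        # quantity in (5.7)
        print(N, mean_abs, mean)
```

Both columns shrink as N grows, and the signed mean is never larger in absolute value than the mean of absolute values, which is the inequality used in the last step of the proof.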

5.1 Weak Mixing and Mixing

We apply Lemma 5.3 to the definitions of weak mixing and mixing for finite measure-preserving transformations to obtain useful characterizations of these properties. We begin with the definitions.

Definition 5.4

If f is a finite measure-preserving transformation of 
$$(X,\mathcal B,\mu )$$
, then f is

  1.
    weak mixing if and only if for all sets 
$$A,B \in \mathcal B$$
,
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} |\mu(f^{-k}A \cap B)- \mu(A)\mu(B)|=0, \end{aligned} $$
    (5.8)

    and

     
  2.
    (strong) mixing if and only if for all sets 
$$A,B \in \mathcal B$$
,
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \mu(f^{-N}A \cap B) = \mu(A)\mu(B). \end{aligned} $$
    (5.9)
     

There is the following mixing hierarchy.

Proposition 5.5

If f is a finite measure-preserving transformation of 
$$(X,\mathcal B,\mu )$$
, then f is mixing implies f is weak mixing, and f is weak mixing implies f is ergodic.

Proof
Given sets 
$$A,B \in \mathcal B$$
, we define the sequence

$$\displaystyle \begin{aligned}a_k = \mu(f^{-k}A \cap B) - \mu(A)\mu(B).\end{aligned}$$
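Then f is mixing if and only if lim_{N→∞} a_N = 0 for every such pair A, B; in that case, by Lemma 5.3, (5.6) gives that f is weak mixing. If f is weak mixing, then the inequality |(1∕N) Σ_{k=0}^{N−1} a_k| ≤ (1∕N) Σ_{k=0}^{N−1} |a_k| used at the end of the proof of Lemma 5.3 shows that (5.1) holds, so f is ergodic by Proposition 5.1. □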

Because of the hierarchy coming from Proposition 5.5, mixing is sometimes called strong mixing for emphasis. However, despite its appearance of being “in between” ergodicity and mixing, the property of being weak mixing is extremely interesting and important in ergodic theory. First we show the classical example of an ergodic transformation that is not weak mixing, namely irrational rotation on 
$$X = \mathbb R/\mathbb Z$$
. We give a statistical argument from [155].

Example 5.6
We consider the map R α(x) = x + α ( mod 1) on 
$$X=\mathbb R/\mathbb Z$$
, with m = Lebesgue measure and α irrational. If we choose A = B = [0, 1∕2], then m(A) = m(B) = 1∕2 so m(A)m(B) = 1∕4. By the unique ergodicity of R_α (Theorem 4.24), if we consider an interval I ⊂ [0, 1] of length ℓ, then the proportion of times that the orbit point R_α^k(0) lands in I is ℓ; i.e., as N → ∞,

$$\displaystyle \begin{aligned} \frac 1N \sum_{k=0}^{N-1} \chi_I(R_{\alpha}^k(0)) \rightarrow \ell = m(I). \end{aligned} $$
(5.10)
Equation (5.10) holds at the point x = 0, and therefore for every x ∈ X. In particular, if I = [0, 1∕100], then there is a subsequence 
$$\{n_i\}_{i \in \mathbb N}$$
with n_1 < n_2 < ⋯ < n_i < ⋯, for which n_i α (mod 1) ∈ I, and the n_i occur 1∕100 of the time in the following sense:

$$\displaystyle \begin{aligned}\lim_{N \rightarrow \infty} \frac { |\{ m \in \{n_i\} \,:\, m \leq N \}|} {N} = \frac 1 {100}.\end{aligned}$$
At these values n i we have, for A = B as above:

$$\displaystyle \begin{aligned} m(R_{\alpha}^{-n_i}A \cap B) - m(A)m(B) \geq \frac {49}{100} - \frac 14= \frac{6}{25}. \end{aligned}$$
This means that

$$\displaystyle \begin{aligned} \liminf_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} | m(R_{\alpha}^{-k}A \cap B) - m(A)m(B)| \geq \frac{1}{100} \cdot \frac{6}{25} >0. \end{aligned}$$

Therefore the limit cannot be 0, so by (5.8) R_α is not weak mixing.
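The conclusion of Example 5.6 can be checked numerically. For A = B = [0, 1∕2), two half-circle arcs offset by t = kα (mod 1) overlap in a set of measure |1∕2 − t|, so the averages in (5.8) and (5.1) can be evaluated directly. The sketch below is only an illustration: the value α = √2 − 1 and the function names are assumptions. If the orbit is equidistributed, the absolute averages should approach ∫_0^1 ||1∕2 − t| − 1∕4| dt = 1∕8, which is consistent with the lower bound (1∕100)(6∕25) from the example, while the signed averages should approach 0.

```python
import math

def overlap(k, alpha):
    """m(R_alpha^{-k} A ∩ A) for A = [0, 1/2)."""
    t = (k * alpha) % 1.0
    return abs(0.5 - t)

def weak_mixing_average(alpha, N):
    """(1/N) * sum_{k<N} |m(R_alpha^{-k} A ∩ A) - m(A)^2|, as in (5.8)."""
    return sum(abs(overlap(k, alpha) - 0.25) for k in range(N)) / N

def ergodic_average(alpha, N):
    """(1/N) * sum_{k<N} (m(R_alpha^{-k} A ∩ A) - m(A)^2), as in (5.1)."""
    return sum(overlap(k, alpha) - 0.25 for k in range(N)) / N

if __name__ == "__main__":
    alpha = math.sqrt(2) - 1
    for N in (100, 10000):
        print(N, weak_mixing_average(alpha, N), ergodic_average(alpha, N))
```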

Before proving the next list of properties of weak mixing transformations, we establish a few more identities of real-valued sequences. We define the density of a set of positive integers as follows; if 
$$J \subset \mathbb N$$
, then |J| is the cardinality of J, and the density of J is

$$\displaystyle \begin{aligned} \Delta(J) = \lim_{n \rightarrow \infty} \frac {|J \cap \{1,2,\ldots,n\}|}{n}.\end{aligned}$$
Lemma 5.7
If {a n} is a bounded sequence of real numbers, then the following are equivalent:
  1.

    
$$\displaystyle  \lim _{N \rightarrow \infty } \frac 1N \sum _{k=1}^{N} |a_k| = 0.$$

     
  2.

    There exists a subset 
$$J \subset \mathbb N$$
with Δ(J) = 0, such that a_n → 0 as n → ∞ with n ∉ J.

     
  3.

    
$$ \displaystyle  \lim _{N \rightarrow \infty } \frac 1N \sum _{k=1}^{N} |a_k|{ }^2 = 0. $$

     
Proof

(1) ⇒ (2): For 
$$J \subset \mathbb N$$
, let δ J(n) = |J ∩{1, …, n}|. For a fixed 
$$\kappa \in \mathbb N$$
, define J κ = {n : |a n|≥ 1∕κ}.

By construction, J 1 ⊂ J 2 ⊂ J 3 ⊂⋯. We first show that each J κ has density 0. Consider the average

$$\displaystyle \begin{aligned} \frac 1N \sum_{k=1}^{N} |a_k| \geq \frac 1N \cdot \frac 1\kappa \cdot \delta_{J_\kappa}(N).\end{aligned}$$
Taking the limit as N → ∞, by (1) we have that 
$$\frac 1N \sum _{k=1}^{N} |a_k| \rightarrow 0$$
. Therefore, since 1∕κ is fixed, 
$$\delta _{J_\kappa }(N) /N \rightarrow 0$$
as well, so Δ(J κ) = 0.
It follows that for each 
$$\kappa \in \mathbb N$$
, there exist 0 = M 0 < M 1 < ⋯ such that for all n ≥ M κ,

$$\displaystyle \begin{aligned}\dfrac {\delta_{J_{\kappa+1}}(n)}{n} < \frac{1}{(\kappa+1)}.\end{aligned}$$
We now define

$$\displaystyle \begin{aligned} J = \bigcup_{\kappa=0}^{\infty} [J_{\kappa+1} \cap [M_\kappa,M_{\kappa+1})],\end{aligned} $$
(5.11)
and we claim (5.11) gives that J has density 0.

Assuming the claim holds, if n > M_κ and n ∉ J, then n ∉ J_{κ+1} and therefore |a_n| < 1∕(κ + 1). Therefore |a_n| → 0 as n → ∞ with n ∉ J.

To prove the claim, we see that J 1 ⊂ J 2⋯ implies that if M κ ≤ n ≤ M κ+1, then

$$\displaystyle \begin{aligned} J \cap [1,n] = \left[ J \cap [1,M_\kappa)]\cup[J\cap [M_\kappa,n]\right] \subset \left[J_\kappa \cap [1,M_\kappa)]\cup[J_{\kappa+1} \cap [1,n]\right], \end{aligned}$$
and therefore

$$\displaystyle \begin{aligned} \frac {\delta_J(n)}{n} \leq \frac 1n \left( \delta_{J_\kappa}(M_\kappa)+ \delta_{J_{\kappa+1}}(n)\right) \leq \frac 1n \left(\delta_{J_\kappa}(n)+ \delta_{J_{\kappa+1}}(n)\right ) < \frac{1}{\kappa} + \frac{1}{\kappa+1}.\end{aligned}$$
Since n → ∞ implies κ → ∞, Δ(J) = 0. This proves (2).
(2) ⇒ (1): Since the sequence is bounded, assume |a k|≤ M for all 
$$k \in \mathbb N$$
. Given ε > 0, find N such that for all n ≥ N, n ∉ J implies |a_n| < ε. Choosing N larger if necessary, we also want that n ≥ N implies that δ_J(n)∕n < ε. Therefore n ≥ N gives

$$\displaystyle \begin{aligned} \frac 1n \sum_{i=1}^n|a_i| = \frac 1n \left [ \sum_{i \in J, i\leq n} |a_i| + \sum_{i \notin J, i\leq n} |a_i| \right ] < \frac M n \delta_J(n) + \varepsilon < (M+1) \varepsilon. \end{aligned}$$
(1) ⇒ (3): Assume that |a k|≤ M for all 
$$k \in \mathbb N$$
and 
$$ \lim _{N \rightarrow \infty } \frac 1N \sum _{k=1}^{N} |a_k| = 0.$$
Then

$$\displaystyle \begin{aligned}\frac 1N \sum_{k=1}^{N} |a_k|{}^2 \leq \frac M N \sum_{k=1}^{N} |a_k|,\end{aligned}$$
so (3) follows after letting N → ∞.
(3) ⇒ (1): For each 
$$N \in \mathbb N$$
, applying the Cauchy–Schwarz inequality to the vectors N^{−1∕2}(|a_1|, …, |a_N|) and N^{−1∕2}(1, …, 1) gives

$$\displaystyle \begin{aligned} \frac 1N \sum_{i=1}^N|a_i| &\leq \left( \frac 1N \sum_{k=1}^{N} |a_k|{}^2 \right )^{1/2} \left( \frac 1N \sum_{k=1}^{N} 1 \right )^{1/2} \\ & = \left( \frac 1N \sum_{k=1}^{N} |a_k|{}^2 \right )^{1/2}. \end{aligned}$$
Then letting N → ∞ gives (1). □
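To see how Lemma 5.7 differs from ordinary convergence, consider the indicator sequence of the perfect squares (an example chosen here for illustration, not taken from the text). The sequence does not converge, since a_n = 1 infinitely often, yet its Cesàro averages tend to 0, and the exceptional set J of perfect squares has density 0. The function names in this sketch are hypothetical.

```python
import math

def a(n):
    """a_n = 1 if n is a perfect square, and 0 otherwise."""
    r = math.isqrt(n)
    return 1 if r * r == n else 0

def cesaro_abs(N):
    """(1/N) * sum_{k=1}^{N} |a_k|, the average in statement (1)."""
    return sum(abs(a(k)) for k in range(1, N + 1)) / N

def density_ratio(N):
    """|J ∩ {1, ..., N}| / N for J = {perfect squares}."""
    return math.isqrt(N) / N

if __name__ == "__main__":
    for N in (100, 10000, 1000000):
        print(N, cesaro_abs(N), density_ratio(N))
```

Off the set J every term is exactly 0, so statement (2) holds trivially for this sequence, and the two printed columns agree because the average simply counts the squares up to N.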

We next give a list of equivalent characterizations of weak mixing for dynamical systems 
$$(X,\mathcal B,\mu ,f)$$
preserving μ, with μ(X) = 1. We need a spectral definition first.

Definition 5.8

Under the hypotheses above, we say f has continuous spectrum if 1 is the only eigenvalue of the Koopman operator U = U f and the constants are the only eigenfunctions.

To explain the terminology, if f has continuous spectrum we split

$$\displaystyle \begin{aligned} L^2(X,\mathcal B,\mu) = L_0 \oplus \mathbb C, \end{aligned} $$
(5.12)
where L_0 = {ϕ ∈ L^2 : ∫_X ϕ dμ = 0} and 
$$\mathbb C$$
is shorthand for the constant functions. The restriction of the isometry U to the (invariant) subspace L 0 yields continuous (nonatomic) spectral measures on S 1. That is, for each ϕ ∈ L 0, there exists a positive measure ν ϕ on S 1 such that

$$\displaystyle \begin{aligned}(U^n\phi,\phi)= \int_{S^1} z^n d \nu_{\phi}, \quad  n \in \mathbb N \cup \{0\},\end{aligned}$$
(see Appendix B for the details).
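For contrast with Definition 5.8, the rotation R_α(x) = x + α (mod 1) of Example 5.6 has the nonconstant eigenfunction ϕ(x) = e^{2πix}, with Uϕ = e^{2πiα}ϕ, so it does not have continuous spectrum. The following sketch checks this numerically and evaluates (U^n ϕ, ϕ), which equals z^n for z = e^{2πiα}; this is what a point mass at z would give in the integral formula above. The Riemann-sum inner product and all names are illustrative assumptions.

```python
import cmath
import math

def phi(x):
    """Candidate eigenfunction phi(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)

def koopman(g, alpha):
    """Koopman operator of the rotation: (U g)(x) = g(x + alpha mod 1)."""
    return lambda x: g((x + alpha) % 1.0)

def koopman_power(g, alpha, n):
    """Apply the Koopman operator n times."""
    for _ in range(n):
        g = koopman(g, alpha)
    return g

def inner(g, h, M=1000):
    """Riemann-sum approximation of (g, h) = integral of g * conj(h) over [0, 1)."""
    return sum(g(j / M) * h(j / M).conjugate() for j in range(M)) / M

if __name__ == "__main__":
    alpha = math.sqrt(2) - 1
    z = cmath.exp(2j * math.pi * alpha)
    print(koopman(phi, alpha)(0.37), z * phi(0.37))   # U phi = z * phi
    print(inner(koopman_power(phi, alpha, 3), phi), z ** 3)
```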

Although mixing seems to be the property that is physically observable, we give a list of conditions equivalent to weak mixing to show the property is quite natural mathematically. It is also very prevalent among measure-preserving transformations, more so than mixing [81].

The next result consists of seven statements equivalent to weak mixing. To avoid confusion, in Figure 5.1 we diagram the implications that are proved below.
Fig. 5.1 A diagram showing the implications in the proof of Theorem 5.9.

Theorem 5.9
If f is a finite measure-preserving transformation of 
$$(X,\mathcal B,\mu )$$
, then the following are equivalent:
  1.

    f is weak mixing.

     
  2.
    For every pair of sets 
$$A,B \in \mathcal B$$
, there exists a set 
$$J \subset \mathbb N$$
of density 0 such that
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty, N \notin J} \mu(f^{-N}A \cap B) = \mu(A)\mu(B). \end{aligned}$$
     
  3.
    For all 
$$\phi ,\psi \in L^2(X,\mathcal B,\mu ),$$
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \left | (U^k\phi,\psi) - (\phi,1)(1,\psi)\right| = 0. \end{aligned}$$
     
  4.
    For every 
$$\phi \in L^2(X,\mathcal B,\mu ),$$
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \left | (U^k\phi,\phi) - (\phi,1)(1,\phi)\right| = 0. \end{aligned}$$
     
  5.
    For every 
$$\phi \in L^2(X,\mathcal B,\mu )$$
,
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \left | (U^k\phi,\phi) - (\phi,1)(1,\phi)\right|{}^2 = 0. \end{aligned}$$
     
  6.

    The transformation on 
$$(X \times X, \mathcal B \times \mathcal B, \mu \times \mu )$$
defined by f × f is ergodic.

     
  7.

    The transformation on 
$$(X \times X, \mathcal B \times \mathcal B, \mu \times \mu )$$
defined by f × f is weak mixing.

     
  8.

    f has continuous spectrum on the orthogonal complement to the constants.

     
Proof

The equivalence of (1) and (2) follows from Lemma 5.7 by setting a_k = μ(f^{-k}A ∩ B) − μ(A)μ(B), for sets 
$$A,B \in \mathcal B$$
.

Proving the equivalence of (1) and (3) is straightforward; statement (3) is the definition of weak mixing when ϕ = χ A and ψ = χ B, so (3) implies (1), and (1) gives (3) for characteristic functions. Passing to simple functions and using a standard approximation argument, first fixing B, shows that statement (3) holds for all L 2 pairs of functions.

(1) ⇒ (4) proceeds like (1) ⇒ (3).

(3) ⇒ (4) by choosing ϕ = ψ.

To prove (4) implies (3), assume (4) holds. The first observation is that for each fixed ϕ the set of functions ψ ∈ L 2 satisfying (3) forms a closed subspace of L 2; denote this space by V ϕ. V ϕ contains the constants and ϕ by (4), and U(V ϕ) = V ϕ since f preserves μ. The claim is that 
$$V_{\phi } = L^2(X,\mathcal B,\mu )$$
, which then proves (3). If not, consider some 
$$\psi \in V_{\phi }^{\perp }$$
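. Then for all k, (U^k ϕ, ψ) = 0 and (1, ψ) = 0, since V_ϕ is U-invariant and contains ϕ and the constants. In this case (3) holds trivially for ψ, so ψ ∈ V_ϕ ∩ V_ϕ^⊥ = {0}; hence no nonzero such ψ exists.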

An application of Lemma 5.7 using 
$$a_k = \left |(U^k\phi ,\phi ) - (\phi ,1)(1,\phi )\right |$$
shows that (5) is equivalent to (4).

(2) ⇒ (7): Fix sets 
$$A,B,C,D \in \mathcal B$$
; by (2) there exist sets 
$$J_1,J_2 \subset \mathbb N$$
each of density 0 such that

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty, N \notin J_1} \mu(f^{-N}A \cap B) = \mu(A)\mu(B)\end{aligned}$$
and

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty, N \notin J_2} \mu(f^{-N}C \cap D) = \mu(C)\mu(D).\end{aligned}$$
Then consider

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty, N \notin J_1 \cup J_2}(\mu \times \mu)\left[(f \times f)^{-N}(A \times C) \cap (B \times D)\right]; \end{aligned} $$
(5.13)
using the definition of product measure, (5.13) is equivalent to

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty, N \notin J_1 \cup J_2} \mu (f^{-N}A \cap B) \mu (f^{-N}C \cap D). \end{aligned} $$
(5.14)
Since f satisfies (2), (5.14) is equal to

$$\displaystyle \begin{aligned} \mu(A)\mu(B)\mu(C)\mu(D) = (\mu \times \mu)(A \times C) \cdot (\mu \times \mu) (B \times D).\end{aligned}$$
This establishes (2) for the map f × f for rectangular sets in X × X. Obtaining the result for every measurable set in X × X is a standard approximation argument, since rectangular sets generate the product σ-algebra 
$$\mathcal B \times \mathcal B$$
. Therefore f × f is weak mixing since (2) ⇒ (1).

(7) ⇒ (6) follows from Proposition 5.5.

(6) ⇒ (1): Suppose that 
$$A,B \in \mathcal B$$
. By Lemma 5.7, it is enough to show that

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \{\mu(f^{-k}A \cap B)- \mu(A)\mu(B)\}^2=0. \end{aligned} $$
(5.15)
To do that, we apply the characterization of ergodicity of f × f given in Proposition 5.1 to the following two pairs of sets: A × X and B × X, and A × A and B × B.
Using the first pair, it follows that as N → ∞,

$$\displaystyle \begin{aligned} \frac 1N \sum_{k=0}^{N-1} \mu(f^{-k}A \cap B) = \frac 1N \sum_{k=0}^{N-1}(\mu \times \mu) \{(f\times f)^{-k}(A \times X) \cap (B \times X) \} \end{aligned}$$

$$\displaystyle \begin{aligned} \rightarrow (\mu \times \mu) (A \times X) (\mu \times \mu)(B \times X) =\mu(A)\mu(B). \end{aligned}$$
The second pair of sets gives

$$\displaystyle \begin{aligned} \frac 1N \sum_{k=0}^{N-1} (\mu(f^{-k}A \cap B))^2 = \frac 1N \sum_{k=0}^{N-1}(\mu \times \mu) \{(f\times f)^{-k}(A \times A) \cap (B \times B) \} \end{aligned}$$

$$\displaystyle \begin{aligned} \rightarrow (\mu \times \mu) (A \times A) (\mu \times \mu)(B \times B) =\mu(A)^2\mu(B)^2, \end{aligned}$$
as N → ∞.
Using these identities, it follows that

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \{\mu(f^{-k}A \cap B)- \mu(A)\mu(B)\}^2\end{aligned}$$

$$\displaystyle \begin{aligned} = \lim_{N \rightarrow \infty} \frac 1N \sum_{k=0}^{N-1} \{\mu(f^{-k}A \cap B)^2- 2 \mu(f^{-k}A \cap B)\mu(A)\mu(B)+ \mu(A)^2\mu(B)^2\}\end{aligned}$$

$$\displaystyle \begin{aligned} = 2 \mu(A)^2\mu(B)^2 - 2 \mu(A)^2\mu(B)^2 =0.\end{aligned} $$
From this it follows that f is weak mixing using Lemma 5.7.

It remains to connect (8) to the rest of the equivalent statements.

(6) ⇒ (8): if ϕ(fx) = λϕ(x) for some nonconstant ϕ ∈ L^2, then setting 
$$\Phi (x,y) = \phi (x) \overline {\phi (y)}$$
, it follows that Φ(fx, fy) = |λ|^2 Φ(x, y) = Φ(x, y), since |λ| = 1 for an eigenvalue of the isometry U. Then Φ is a nonconstant f × f invariant function, contradicting (6).

(8) ⇒ (5): Assume that f has continuous spectrum as in Definition 5.8 and the remarks following it; consider ϕ ∈ L^2, and assume that ∫_X ϕ dμ = 0 (otherwise replace ϕ by ϕ − ∫_X ϕ dμ). To show (5) it suffices to show

$$\displaystyle \begin{aligned} \frac 1 N \sum_{k=0}^{N-1}\left |(U^k\phi,\phi)\right |{}^2 \rightarrow 0 \ \mathrm{as} \ N \rightarrow \infty.\end{aligned} $$
(5.16)
Using Theorem B.21, with Q = U and h = ϕ, and the hypothesis in (8), it is enough to show that for the associated nonatomic measure μ ϕ on S 1,

$$\displaystyle \begin{aligned} \frac 1 N \sum_{k=0}^{N-1} \left |\int_{S^1} z^k d \mu_\phi(z) \right |{}^2 \rightarrow 0 \ \mathrm{as} \ N \rightarrow \infty. \end{aligned} $$
(5.17)
For z ∈ S 1, the identity 
$$\overline {z^n}=z^{-n}$$
allows (5.17) to be rewritten as follows:

$$\displaystyle \begin{aligned} \frac 1 N \sum_{k=0}^{N-1}\left |\int_{S^1} z^k d \mu_\phi(z) \right |{}^2 &=\frac 1 N \sum_{k=0}^{N-1}\left (\int_{S^1} z^k d \mu_\phi(z) \cdot \int_{S^1} w^{-k} d \mu_{\phi}(w) \right) \\ &= \frac 1 N \sum_{k=0}^{N-1}\left ( \int_{S^1\times S^1} (z/w)^k d (\mu_{\phi} \times \mu_{\phi})(z,w) \right) \\ & = \int_{S^1\times S^1} \left ( \frac 1 N \sum_{k=0}^{N-1}(z/w)^k \right ) d (\mu_{\phi} \times \mu_{\phi})(z,w)\\ &= \int_{S^1\times S^1} \frac 1 N \left (\frac{(z/ w)^{N}-1}{z/w - 1} \right) d (\mu_{\phi} \times \mu_{\phi})(z,w), \end{aligned}$$
using Fubini’s Theorem and the fact that z∕w ≠ 1 almost everywhere, since μ_ϕ × μ_ϕ gives measure 0 to the diagonal (μ_ϕ is nonatomic). As N → ∞, the integrand goes to 0 in the last expression whenever z ≠ w, and it is bounded by 1 in absolute value because it is an average of unimodular numbers, so by the Lebesgue Dominated Convergence Theorem,

$$\displaystyle \begin{aligned}\lim_{N \rightarrow \infty} \int_{S^1\times S^1} \frac 1 N \left ( \frac{(z/w)^N-1}{z/w - 1} \right ) d (\mu_{\phi} \times \mu_{\phi})(z,w) = 0,\end{aligned}$$
which completes the proof of the theorem. □

The question of whether mixing is actually stronger than weak mixing was solved in 1969 by Chacon [35]. Examples can also be found in [148], and are attributed to Kakutani and von Neumann as early as the 1940s. What most of the examples have in common is that they start with an ergodic transformation f with discrete spectrum on a space X, send a proper set A ⊂ X off into an identical but disjoint copy of A, and then back into X using f, to create a new map 
$$\tilde {f}$$
on a larger space. By doing this carefully, eigenfunctions can be destroyed, but the resulting transformation 
$$\tilde {f}$$
still does not mix the space up too much.

We have some equivalent characterizations for mixing; the proof of the next result is similar to that given for weak mixing and appears as an exercise.

Theorem 5.10
If f is a finite measure-preserving transformation of 
$$(X,\mathcal B,\mu ),$$
then the following are equivalent:
  1.
    f is mixing; i.e., for all 
$$A,B \in \mathcal B$$
,
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \mu(f^{-N}A \cap B) = \mu(A)\mu(B).\end{aligned}$$
     
  2.
    For all 
$$\phi ,\psi \in L^2(X,\mathcal B,\mu ),$$
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} (U^N\phi,\psi) = (\phi,1)(1,\psi). \end{aligned}$$
     
  3.
    For every 
$$\phi \in L^2(X,\mathcal B,\mu ),$$
    
$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} (U^N\phi,\phi) = (\phi,1)(1,\phi). \end{aligned}$$
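Statement (1) can be explored with exact arithmetic for the doubling map f(x) = 2x (mod 1) on [0, 1) with Lebesgue measure. This map is not discussed in this section; it is used here as an assumed standard example of a measure-preserving map for which f^{-N}[a_1, a_2) is the union of the 2^N intervals [(j + a_1)∕2^N, (j + a_2)∕2^N). The function names are hypothetical, and the sketch only evaluates μ(f^{-N}A ∩ B) for intervals; it does not prove mixing.

```python
from fractions import Fraction

def interval_overlap(lo1, hi1, lo2, hi2):
    """Length of [lo1, hi1) ∩ [lo2, hi2)."""
    return max(Fraction(0), min(hi1, hi2) - max(lo1, lo2))

def doubling_correlation(A, B, N):
    """Lebesgue measure of f^{-N}A ∩ B for f(x) = 2x mod 1.

    A = (a1, a2) and B = (b1, b2) are half-open intervals in [0, 1]
    given by Fraction endpoints.
    """
    a1, a2 = A
    b1, b2 = B
    scale = 2 ** N
    total = Fraction(0)
    for j in range(scale):
        # j-th piece of the preimage: [(j + a1)/2^N, (j + a2)/2^N)
        total += interval_overlap((j + a1) / scale, (j + a2) / scale, b1, b2)
    return total

if __name__ == "__main__":
    A = (Fraction(0), Fraction(1, 3))
    B = (Fraction(0), Fraction(1, 3))
    for N in range(0, 11, 2):
        print(N, float(doubling_correlation(A, B, N)), float(Fraction(1, 9)))
```

Using `Fraction` avoids rounding, so any remaining gap between the two printed columns reflects the finite value of N rather than numerical error.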
     

The notion of multiple mixing plays a key role in the subject of ergodic theory, potentially providing an aid in classifying dynamical systems.

Definition 5.11
A finite measure-preserving dynamical system 
$$(X,\mathcal B,\mu ,f)$$
is r-fold mixing if for all sets 
$$B_0,B_1,\ldots , B_{r} \in \mathcal B$$
,

$$\displaystyle \begin{aligned}\lim \mu(B_0 \cap f^{-n_1}B_1 \cap \cdots \cap f^{- n_{r}}B_{r}) = \mu(B_0) \mu(B_1) \cdots \mu(B_{r})\end{aligned}$$
as n_1, (n_2 − n_1), …, (n_r − n_{r−1}) → ∞.

The proof given in Theorem 6.3, Chapter 6 that Bernoulli shifts are mixing also shows that they are r-fold mixing. There is an open problem due to Rohlin in 1949, outlined by Halmos in 1956 ([81], the last page of the book), as to whether mixing implies r-fold mixing for every probability measure-preserving dynamical system 
$$(X,\mathcal B,\mu ,f)$$
. While some partial results and reductions have been obtained in the intervening years, the problem remains largely open. All results so far give hypotheses under which the answer is yes (see [103], and Remark 5.31 below). We give an example for a higher dimensional (
$$\mathbb Z^2$$
) action in Chapter 6 (Example 6.24), which is mixing but not 3-fold mixing. For an overview of the state of the problem in 2006, a good source is [50].

5.2 Noninvertibility

Noninvertible maps mix up spaces in a natural way. We begin to make this precise here, and a clearer picture emerges in subsequent chapters, e.g., when entropy is introduced in Chapter 11. In calculus, the test for the invertibility of a function 
$$f:\mathbb R \rightarrow \mathbb R$$
is the “horizontal line test.” If every line drawn parallel to the x-axis cuts the graph in exactly one point or does not intersect the graph at all, then f is invertible on 
$$f(\mathbb R)$$
. However in ergodic theory, invertibility or the lack thereof is a measure theoretic property. Moreover a map need only satisfy a property on a set of full measure, so exceptional points can occur with respect to invertibility.

We describe the basic structure and define invertibility and noninvertibility of a dynamical system. In Figure 5.2 we see examples of a noninvertible and an invertible map of the interval, respectively. The graph on the left side of Figure 3.8, the Feigenbaum map, is noninvertible with respect to Lebesgue measure, but invertible with respect to a measure supported on the Cantor attractor. In addition, even noninvertible maps have ambiguity about the number of preimages as discussed below and in Example 6.27.
Fig. 5.2 Graphs of noninvertible and invertible ergodic interval maps.

5.2.1 Partitions

A key tool for understanding noninvertibility is the notion of a partition of a standard measure space 
$$(X, \mathcal B,\mu )$$
. Partitions consisting of finitely many sets provide the basic building blocks for most of symbolic dynamics and coding as well.

Definition 5.12
A partition P of 
$$(X,\mathcal B,\mu )$$
is a disjoint collection of sets in 
$$\mathcal B$$
whose union is X. An element 
$$A_{\iota } \in P=\{A_{\iota }\}_{\iota \in \mathcal {I}}$$
is called an atom of P. A finite partition has finitely many elements and is written P = {A 1, A 2, …, A n}, and a countable partition has at most countably many elements. A partition P is defined only up to sets of μ measure 0.
  • If 
$$(X,\mathcal B,\mu ,f)$$
is a dynamical system and P is a finite partition of X, then we can define for each 
$$k \in \mathbb N$$
,
    
$$\displaystyle \begin{aligned} f^{-k}P = \{f^{-k}A_1, f^{-k}A_2, \ldots, f^{-k}A_n\}.\end{aligned}$$
  • Given two finite partitions P and Q, their join is the partition given by
    
$$\displaystyle \begin{aligned} P \vee Q = \{ A_i \cap B_j \,: \, A_i \in P, B_j \in Q\}.\end{aligned}$$
  • For 
$$i,j \in \mathbb N$$
, i < j, 
$$P_i^j = \bigvee _{k=i}^j f^{-k}P$$
.

  • If P is a partition, a set A is a P-set if A is a union of atoms of P.
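The operations in Definition 5.12 are easy to experiment with on a finite set. The sketch below is a toy model under stated assumptions: X = {0, …, 7} with counting measure, the noninvertible map f(x) = 2x (mod 8), and partitions stored as Python sets of frozensets. None of these choices or function names come from the text.

```python
def preimage(f, X, A):
    """f^{-1}A = {x in X : f(x) in A}."""
    return frozenset(x for x in X if f(x) in A)

def pull_back(f, X, P):
    """f^{-1}P: the preimages of the atoms of P (empty preimages are dropped)."""
    return {B for B in (preimage(f, X, A) for A in P) if B}

def join(P, Q):
    """P ∨ Q: all nonempty intersections of an atom of P with an atom of Q."""
    return {A & B for A in P for B in Q if A & B}

def is_P_set(A, P):
    """True if A is a union of atoms of P."""
    A = frozenset(A)
    covered = frozenset().union(*P)
    return A <= covered and all(atom <= A or atom.isdisjoint(A) for atom in P)

if __name__ == "__main__":
    X = range(8)
    f = lambda x: (2 * x) % 8
    P = {frozenset({0, 1, 2, 3}), frozenset({4, 5, 6, 7})}
    print(pull_back(f, X, P))           # f^{-1}P
    print(join(P, pull_back(f, X, P)))  # P ∨ f^{-1}P, i.e. P_0^1
```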

Example 5.13
The point partition of X is the partition whose atoms are the points of X. We write

$$\displaystyle \begin{aligned} \varepsilon = \{ \{x\}: x \in X \}. \end{aligned} $$
(5.18)
There is also the trivial partition of X, for which we write

$$\displaystyle \begin{aligned}\zeta = \{\emptyset, X\}.\end{aligned} $$
(5.19)

By definition every partition has measurable atoms, but a “measurable partition” has a different definition and not every partition satisfies Definition 5.14 below. It is typical to study properties of measurable partitions on Lebesgue probability spaces, obtained by extending 
$$\mathcal B$$
to include the completion of the probability measure μ under consideration; when in that setting we write 
$$\mathcal B_{\mu }$$
. We note that the sets we add to 
$$\mathcal B$$
to obtain 
$$\mathcal B_\mu $$
are all subsets of sets 
$$E \in \mathcal B$$
with μ(E) = 0. Equivalently, we assume that 
$$(X, \mathcal B_\mu , \mu )$$
is isomorphic to 
$$([0,1], {\mathcal L}, m)$$
, the unit interval with Lebesgue measure structure on it, and we call X a (nonatomic) Lebesgue space (see Appendix A, Section A.2 for details). This provides a technical tool needed later in this section, making it easier to define a quotient space in this setting. More precisely, if we start with a Lebesgue space X and a measurable partition P of X and consider the quotient space X∕P whose points are the atoms of P, then X∕P is again a Lebesgue space. We do not develop all the technical aspects here, but refer to [134], [157], or [160] for more details.

Definition 5.14

P is a measurable partition if there is a countable family of P-sets, 
$$\{A_i\}_{i \in \mathbb N}$$
such that if B ≠ C are atoms in P, then there exists a set A j such that B ⊂ A j and C ⊂ X ∖ A j or vice versa.

Example 5.15
For the space 
$$([0,1],{\mathcal L},m)$$
with the usual Lebesgue structure, the point partition ε is a measurable partition. To show this we can take the countable collection of ε-sets of the form:

$$\displaystyle \begin{aligned} A_j^n=\left [ \frac{j}{2^n},\frac{j+1}{2^n} \right ),\quad  j=0, \ldots, 2^n-1, \, n \in \mathbb N \end{aligned} $$
(5.20)
Given 0 ≤ x < y ≤ 1, by looking at the dyadic expansion of x and y to see where they differ, we can choose n and j such that 
$$x \in A_j^n$$
and 
$$y \notin A^n_j$$
.
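The separation step in Example 5.15 can be made explicit. The sketch below picks n with 2^{-n} ≤ y − x and then the dyadic interval of level n containing x; since that interval has length at most y − x and starts at or before x, it cannot contain y. This is one possible choice (not necessarily the smallest n), and the function name is hypothetical.

```python
import math
from fractions import Fraction

def separating_dyadic_interval(x, y):
    """Return (n, j) with x in [j/2^n, (j+1)/2^n) and y outside it.

    Requires 0 <= x < y <= 1; inputs are converted to exact Fractions.
    """
    x, y = Fraction(x), Fraction(y)
    if not (0 <= x < y <= 1):
        raise ValueError("need 0 <= x < y <= 1")
    n = 1
    while Fraction(1, 2 ** n) > y - x:  # shrink cells until they are no longer than y - x
        n += 1
    j = math.floor(x * 2 ** n)          # index of the level-n cell containing x
    return n, j

if __name__ == "__main__":
    n, j = separating_dyadic_interval(Fraction(1, 3), Fraction(1, 2))
    print(n, j, Fraction(j, 2 ** n), Fraction(j + 1, 2 ** n))
```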

There exists a partial order on the set of measurable partitions on X.

Definition 5.16

Given two partitions P and Q of 
$$(X, \mathcal B,\mu )$$
we say that P is refined by Q if every atom of P is a Q-set. We write P ≼ Q in this case; informally, P is the coarser partition: it has fewer atoms or, when both partitions are infinite, its atoms are unions of atoms of Q. We also write Q ≽ P for P ≼ Q.
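On a finite set the relation in Definition 5.16 can be tested directly, which also shows that it is only a partial order: two partitions need not be comparable. The representation of partitions as sets of frozensets and the function names below are assumptions made for this sketch.

```python
def is_union_of_atoms(A, Q):
    """True if A is a Q-set, i.e. a union of atoms of Q."""
    covered = frozenset().union(*Q)
    return A <= covered and all(atom <= A or atom.isdisjoint(A) for atom in Q)

def is_refined_by(P, Q):
    """True if P ≼ Q, i.e. every atom of P is a Q-set."""
    return all(is_union_of_atoms(A, Q) for A in P)

if __name__ == "__main__":
    P = {frozenset({0, 1, 2, 3}), frozenset({4, 5, 6, 7})}
    Q = {frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5}), frozenset({6, 7})}
    R = {frozenset({0, 2, 4, 6}), frozenset({1, 3, 5, 7})}
    print(is_refined_by(P, Q), is_refined_by(Q, P))  # P ≼ Q holds, Q ≼ P fails
    print(is_refined_by(P, R), is_refined_by(R, P))  # P and R are not comparable
```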

Given two finite partitions P and Q on 
$$(X,\mathcal B,\mu )$$
, we can assume by adding sets of measure 0 that they have the same number of atoms. We can then define a distance between them as follows:
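$$\displaystyle \begin{aligned} \delta(P,Q) = \min_{\pi} \sum_{i=1}^{n} \mu(A_i \,\triangle\, B_{\pi(i)}), \qquad P = \{A_1,\ldots,A_n\}, \ Q = \{B_1,\ldots,B_n\}, \end{aligned}$$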
where the minimum is over all permutations π of {1, 2, …, n}. Using the fact that partitions are only defined up to sets of measure 0, δ is a metric on the space of all partitions with n atoms. Once we identify partitions whose atoms agree up to sets of measure 0, refinement defines a very useful partial order.
Definition 5.17
If {P n} is a countable collection of measurable partitions, then we define the partition 
$$Q = \bigvee _{n=1}^{\infty } P_n$$
by
  • 
$$P_n \preceq Q \ \mbox{for all} \ n \in \mathbb N$$
, and

  • if P n ≼ Q′ for all n, and Q′ is measurable, then Q′≽ Q.

We call Q the infinite join or infinite product of the P n’s.

Every finite or countable partition P is measurable, and we assume from now on that every partition we refer to is a measurable partition. A finite partition P generates a sub-σ-algebra 
$$\mathcal F \subset \mathcal B$$
in a natural way; namely, a set 
$$E\in \mathcal B$$
is in 
$$\mathcal F$$
if and only if it is a P-set. Given a partition P, we denote the σ-algebra of P-sets by 
$$\mathcal F = \mathcal F(P).$$
Conversely, given a finite sub-σ-algebra 
$$\mathcal F \subset \mathcal B$$
, we can extract a partition P from 
$$\mathcal F$$
, denoted 
$$P(\mathcal F)$$
, by taking intersections of sets F j or X ∖ F k until we have disjoint atoms whose union is X. We define a subalgebra of measurable sets analogously to Definition 5.17.
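For a finite partition the correspondence between P and the σ-algebra of P-sets can be checked by brute force: a partition with n atoms generates 2^n sets, and the atoms are recovered as the minimal nonempty members. The sketch below works on a small finite set and uses hypothetical function names; recovering atoms as minimal sets is an equivalent shortcut for the intersection procedure described above.

```python
from itertools import combinations

def generated_algebra(P):
    """All unions of atoms of the finite partition P (including the empty union)."""
    atoms = list(P)
    sets = set()
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            sets.add(frozenset().union(*combo))
    return sets

def atoms_of(F):
    """Minimal nonempty sets of a finite algebra F; these form the partition P(F)."""
    nonempty = [E for E in F if E]
    return {E for E in nonempty if not any(G < E for G in nonempty)}

if __name__ == "__main__":
    P = {frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})}
    F = generated_algebra(P)
    print(len(F))            # number of P-sets
    print(atoms_of(F) == P)  # the partition is recovered from the algebra
```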

Definition 5.18
If {P n} is a countable collection of measurable partitions of 
$$(X,\mathcal B,\mu )$$
, we define

$$\displaystyle \begin{aligned} \bigvee_{n=1}^{\infty} \mathcal{F}(P_n) = \mathcal{F}_{\infty} \end{aligned} $$
such that the following hold:
  1.

    
$$\mathcal {F}(P_n) \subset \mathcal {F}_\infty $$
for all n, and

     
  2.

    if 
$$\mathcal {F}(P_n) \subset \mathcal {F}'$$
for all n, then 
$$ \mathcal {F}_\infty \subset \mathcal {F}' $$
.

     
One can show that for every family of measurable partitions 
$$\{P_n\}_{n \in \mathbb N}$$
,

$$\displaystyle \begin{aligned} \mathcal{F} \left ( \bigvee_{n=1}^{\infty}P_n \right ) = \bigvee_{n=1}^{\infty}\mathcal{F}(P_n),\end{aligned} $$
(5.21)
(see Exercise 10). We characterize noninvertible maps by the existence of certain types of partitions, often finite.

5.2.2 Rohlin Partitions and Factors

We now combine partitions with dynamical systems. Assume that 
$$(X,\mathcal B,\mu ,f)$$
is a nonsingular dynamical system; recall the standing assumption that μ-a.e. x ∈ X has at most countably many preimages under f.

Definition 5.19
We apply a result of Rohlin [157] to obtain a partition ζ ≡ ζ(f) = {A 1, A 2, A 3, …} of X into at most countably many atoms satisfying
  1. 1.

    μ(A i) > 0 for each i;

     
  2. 2.

    the restriction of f to each A i, which we write as f i, is one-to-one;

     
  3. 3.

    each A i is of maximal measure in X ∖⋃j<i A j with respect to property (2);

     
  4. 4.
    f 1 is one-to-one and onto X, by numbering the atoms so that
    
$$\displaystyle \begin{aligned}\mu (fA_i)\geq \mu (fA_{i+1})\end{aligned}$$

    for 
$$i \in \mathbb N.$$

     

We call every partition defined as above a Rohlin partition for f , and we denote it by ζ.

We make the elementary observation that if f is invertible, then every Rohlin partition consists of one atom, namely A 1 = X (μ mod 0). In general, ζ is not uniquely defined; there are many choices possible. When we say that an endomorphism f is n-to-one, we mean that every Rohlin partition ζ = {A 1, A 2, A 3, …} satisfying (1)–(4) contains precisely n atoms and that f i is one-to-one and onto X for each i = 1, …, n. Equivalently, for μ-a.e. x ∈ X, the set f −1 x contains exactly n points. If ζ has n > 1 atoms but f does not necessarily map each A j onto X, then we say that f is bounded-to-one. A tent map, for example, is typically bounded-to-one but not 2-to-one (see Figure 5.3).
Fig. 5.3 The graph of a bounded-to-one tent map.

Because the definition of noninvertibility depends on the measure μ, we specify the measure if there is some ambiguity. Recall that 
$$f^{-1}\mathcal B=\{C=f^{-1}A : A \in \mathcal B\}$$
. Rohlin partitions characterize noninvertibility of f in the following way.

Proposition 5.20
For a nonsingular system 
$$(X,\mathcal B,\mu ,f)$$
, the following are equivalent:
  1. 1.

    f is noninvertible with respect to μ.

     
  2. 2.

    There exists a Rohlin partition ζ containing at least two atoms.

     
  3. 3.

    Every Rohlin partition ζ contains at least two atoms.

     
  4. 4.

    
$$f^{-1}\mathcal B \subsetneqq \mathcal B$$
.

     
Proof
By using an equivalent probability measure if needed, assume μ(X) = 1. All statements made about sets are (μ mod  0).
  • (1) ⇒ (3): Suppose, for contradiction, that there exists a Rohlin partition ζ containing exactly one atom. Then there is a set A ∈ ℬ with μ(A) = 1 on which f is injective and surjective. By considering the set X′ = ⋂n≥0 f −n(f n A) ⊃ A, we see that μ(X′) = 1 and f is an automorphism on X′. This contradicts the noninvertibility of f, so every Rohlin partition has at least two atoms.

  • (2) ⇒ (1): Suppose (2) holds. Then there exist sets 
$$A_1,A_2 \in \mathcal B$$
, 0 < μ(A 2) ≤ μ(A 1) < 1, with A 1 ∩ A 2 = ∅ and f(A 1) = X. Consider the set 
$$C=f(A_1) \cap f(A_2) \in \mathcal B$$
; it follows that C = f(A 2). Then C is a set of positive measure such that every point has at least two preimages, namely a point in A 1 and a point in A 2. Therefore f is not invertible.

  • (3) ⇒ (2): This implication follows immediately since a Rohlin partition always exists.

  • (4) ⇒ (1): We prove the contrapositive. If f is an automorphism, then every 
$$B \in \mathcal B$$
satisfies f −1 ∘ f(B) = B. Therefore every 
$$B \in \mathcal B$$
is also in 
$$f^{-1}\mathcal B = \{ f^{-1}A : A \in \mathcal B\}$$
, using A = f(B), so (4) fails.

  • (2) ⇒ (4): If f has a Rohlin partition with at least two atoms A 1, A 2, then there exists a set C ⊂ A 1 with 0 < μ(C) ≤ μ(A 1) which is 
$$\mathcal B$$
-measurable, but is not in 
$$f^{-1}\mathcal B$$
. To see this choose C = f −1(fA 2) ∩ A 1. Since A 2 ⊂ f −1(fA 2), and C ∩ A 2 = ∅, C is not of the form f −1 B for any 
$$B \in \mathcal B$$
.

Definition 5.21
Let 
$$(X,\mathcal B,\mu ,f)$$
be a noninvertible dynamical system on a complete measure space. (We write 
$$\mathcal B$$
for 
$$\mathcal B_\mu $$
.)
  1. 1.

    A Rohlin partition ζ is a generating partition if 
$${\mathcal F}(\bigvee _{n=1}^{\infty } f^{-n}\zeta ) = \mathcal B\ (\mu \bmod 0)$$
.

     
  2. 2.
    More generally, a Rohlin partition ζ defines a sub-σ-algebra
    
$$\displaystyle \begin{aligned}\mathcal{F}= {\mathcal F} \left( \bigvee_{n=1}^{\infty} f^{-n}\zeta \right ) \subseteq \mathcal B,\end{aligned}$$

    which we call the subalgebra generated by ζ. (In this space, we only see sets from 
$$\mathcal {F}$$
).

     
  3. 3.

    Since 
$$f^{-1} \mathcal F \subseteq \mathcal F$$
, we have that each Rohlin partition determines a factor map onto 
$$(X,\mathcal F, \mu |{ }_{\mathcal F})$$
and f is well-defined on this space. We call this factor a Rohlin factor [157].

     

In [27, 30], the map x↦2x ( mod 1) provides an example showing that there exists a Rohlin partition that generates, and one that does not; therefore Rohlin factors are not unique (see Exercise 4 below).
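A numerical sketch, under our own assumptions, of how a coding map can distinguish the two situations for x↦2x (mod 1). The helper names `doubling` and `itinerary` are hypothetical, and we use the partitions with A 1 = [t, t + 1∕2) for t = 0 and t = 1∕4. The computation illustrates, but does not prove, that the second partition cannot separate x from 1 − x.

```python
from fractions import Fraction


def doubling(x):
    """The map x -> 2x (mod 1) on [0, 1), in exact rational arithmetic."""
    return (2 * x) % 1


def itinerary(x, t, length):
    """Symbols of x for the partition A_1 = [t, t + 1/2), A_2 = complement.

    Symbol 0 means the orbit point lies in A_1, symbol 1 that it lies in A_2.
    """
    symbols = []
    for _ in range(length):
        symbols.append(0 if t <= x < t + Fraction(1, 2) else 1)
        x = doubling(x)
    return symbols


x, y = Fraction(1, 7), Fraction(6, 7)  # y = 1 - x
for t in (Fraction(0), Fraction(1, 4)):
    print(t, itinerary(x, t, 6), itinerary(y, t, 6))
```

With t = 0 the two points receive different codes (the symbols are their binary digits), while with t = 1∕4 the codes coincide. This is consistent with the symmetry f(1 − x) = 1 − f(x) (mod 1), which maps [1∕4, 3∕4) to itself up to endpoints.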

5.3 The Parry Jacobian and Radon–Nikodym Derivatives

Assume 
$$(X,\mathcal B,\mu ,f)$$
is a bounded-to-one nonsingular system with a Rohlin partition ζ = {A 1, A 2, …, A k}, for some k ≥ 1. For each i, the map f i : A i → f(A i) is an isomorphism with respect to the measure μ. Therefore we can define a measure on A i, denoted μf i, as follows: for every 
$$B \in A_i \cap \mathcal B,$$

$$\displaystyle \begin{aligned} \mu f_i(B)& = (f_i)^{-1}_*(\mu|{}_{A_i}) \\ &= \int_X \chi_B\circ f_i^{-1}(x) \,d \mu(x) \\ &= \mu(f_i(B)). \end{aligned} $$
(5.22)
Since μf i ≪ μ, by the Radon–Nikodym Theorem we can define for μ-a.e. x ∈ A i,

$$\displaystyle \begin{aligned}\mathrm{{Jac}}_{\mu {f_i}}(x)=\frac{d\mu f_i}{d\mu}(x).\end{aligned}$$
Then for x ∈ X, set

$$\displaystyle \begin{aligned}\mathrm{{Jac}}_{\mu f}(x)=\sum \limits_{i=1}^k \mathrm{{Jac}}_{\mu {f_i}}(x)\chi_{A_i}(x). \end{aligned}$$
We call Jacμf the Jacobian function for f, defined by Parry [12]. It is independent of the choice of Rohlin partition ζ and the nonsingularity of f implies that Jacμf > 0 μ-a.e. Thinking of the Jacobian function as a local Radon–Nikodym derivative for f, we can define a global derivative as well. The following identities hold for μ-a.e. x ∈ X ([57], see also [86, 168]):

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \theta _{\mu f}(x) \equiv & \dfrac{d f_* \mu }{d \mu}(x)= &\displaystyle \sum_{y \in f^{-1}x}\frac{1}{\mathrm{{Jac}}_{\mu f}(y)}, \end{array} \end{aligned} $$
(5.23)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \omega _{\mu f}(x) \equiv & \dfrac {d \mu}{d f_*\mu }(fx)=&\displaystyle \frac{1}{\theta _{\mu f}(fx)} = \left(\sum_{y \in f^{-1}(f x)}\frac{1}{\mathrm{{Jac}}_{\mu f}(y)} \right )^{-1} . \end{array} \end{aligned} $$
(5.24)
The function ω μf is frequently referred to as the Radon–Nikodym derivative of f. When f is invertible with respect to μ, we have A 1 = X and f 1 = f, so from the chain rule we see that ω μf = dμf∕dμ, as in this case μf, also written 
$$f^{-1}_*\mu $$
, defines a measure on 
$$(X,\mathcal B)$$
.
In the general nonsingular case, ω μf is the global Radon–Nikodym derivative of the endomorphism f with respect to μ. We can also characterize ω μf as, (μ mod  0), the unique 
$$f^{-1}\mathcal B$$
-measurable function satisfying

$$\displaystyle \begin{aligned}\int_{X} \phi \circ f \cdot \omega_{\mu f}\, d \mu = \int_X \phi \, d \mu \,\mbox{ for all } \, \phi \in L^1(X,\mathcal B,\mu) \end{aligned}$$
[57] ; from this it follows that ω μf = 1 a.e. if and only if f preserves μ.
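As a worked illustration that is not taken from the text, consider f(x) = 4x(1 − x) on [0, 1] with Lebesgue measure m and the Rohlin partition {[0, 1∕2), [1∕2, 1]}. We assume, as in Example 5.22, that the Jacobian is |f′(y)| = 4|1 − 2y|. Each x ∈ (0, 1) has the two preimages y ± = (1 ± √(1 − x))∕2, and |f′(y ±)| = 4√(1 − x), so (5.23) and (5.24) give

$$\displaystyle \begin{aligned} \theta_{m f}(x) &= \frac{2}{4\sqrt{1-x}} = \frac{1}{2\sqrt{1-x}}, \\ \omega_{m f}(x) &= \frac{1}{\theta_{m f}(fx)} = 2\sqrt{1-4x(1-x)} = 2\,|1-2x|. \end{aligned} $$
Both functions integrate to 1 over [0, 1], as they must (take ϕ = 1 in the identity above), and ω mf is symmetric about 1∕2, hence a function of f(x). Since ω mf is not identically 1, this f does not preserve m.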
If we have an equivalent measure ν ∼ μ, then by the Radon–Nikodym Theorem we write 
$$\frac {d\mu }{d \nu } = g$$
with g > 0 a.e., and it follows that

$$\displaystyle \begin{aligned} \mathrm{{Jac}}_{\mu f}(x) = \frac{g \circ f}{g}(x) \cdot \mathrm{{Jac}}_{\nu f}(x)\quad  \mbox{a.e.} \end{aligned} $$
(5.25)
More generally we use the Jacobian to define the transfer operator 
$${\mathcal L} _{\mu f}$$
acting on the space of measurable functions 
$$h: X \rightarrow \mathbb R$$
by

$$\displaystyle \begin{aligned}{\mathcal L}_{\mu f}h (x) = \sum_{y \in f^{-1}x} \frac {h(y)}{\mathrm{{Jac}}_{\mu f}(y)}. \end{aligned} $$
(5.26)
Example 5.22
Suppose 
$$T=(T_1,\ldots ,T_k): \mathbb R^k \rightarrow \mathbb R^k$$
is a continuously differentiable map on an open set 
$$\mathcal O$$
. If for every 
$$x \in \mathcal O$$
the classical Jacobian, 
$$\det \left ( \partial T_i /\partial x_j \right )$$
, is nonzero, then T is a local diffeomorphism from 
$$\mathcal O$$
onto 
$$T(\mathcal O)$$
(and a diffeomorphism if T is also injective), with 
$$\mathrm {{Jac}}_{m_k T}(x)= |\det (\partial T_i / \partial x_j)(x)|.$$
For example, if 
$$T:\mathbb R \rightarrow \mathbb R$$
is C 1 with nonvanishing derivative, then (5.26) becomes

$$\displaystyle \begin{aligned} {\mathcal L}_{m T} h (x) = \sum_{y \in T^{-1}x} \frac {h(y)}{|T^{\prime} (y)|}.\end{aligned} $$
(5.27)
Furthermore if 
$$T:\mathbb C \rightarrow \mathbb C$$
is holomorphic, the Jacobian with respect to (two-dimensional) Lebesgue measure m on 
$$\mathbb C$$
is Jac mT(z) = |T′(z)|².
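To make (5.27) concrete, here is a minimal sketch for T(x) = 2x (mod 1) on [0, 1) with Lebesgue measure. The function name `transfer_doubling` and the test functions are our own choices for illustration; the only facts used are that each x has the preimages x∕2 and (x + 1)∕2 and that |T′| = 2.

```python
def transfer_doubling(h):
    """Return L h for T(x) = 2x (mod 1) with Lebesgue measure, as in (5.27).

    Each x in [0, 1) has the two preimages x/2 and (x + 1)/2, and |T'| = 2.
    """
    def Lh(x):
        return (h(x / 2) + h((x + 1) / 2)) / 2
    return Lh


def one(x):
    return 1.0


def identity(x):
    return x


L_one = transfer_doubling(one)      # constant function 1
L_id = transfer_doubling(identity)  # x -> x/2 + 1/4

print(L_one(0.3), L_id(0.3))
```

Here the operator fixes the constant function 1, and it sends h(x) = x to x∕2 + 1∕4, which has the same integral 1∕2 over [0, 1).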

For a nonsingular system 
$$(X,\mathcal B,\mu ,f)$$
the following identities hold with proofs following from (5.23)–(5.26) (see also [168] or [86]):

Lemma 5.23
If f is bounded-to-one, then the following hold:
  1. 1.

    
$$\theta _{\mu f}=\dfrac {d \mu f^{-1}}{d \mu }(x)= {\mathcal L}_{\mu f}1$$
;

     
  2. 2.

    f preserves μ if and only if 
$${\mathcal L}_{\mu f}1 = 1$$
; in this case 
$$U_f^*(\phi ) = {\mathcal L}_{\mu f}(\phi ) \, \mathit{\mbox{for every}} \, \phi \in L^2(X,\mathcal B,\mu)$$
.

     
  3. 3.

    f preserves a measure ν ≪ μ if and only if 
$${\mathcal L}_{\mu f}g = g$$
and dν = gdμ;
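Items (2) and (3) can be checked numerically in a standard example that we bring in as an assumption, since it is not discussed in the text: f(x) = 4x(1 − x) on (0, 1) with μ = m Lebesgue measure and the arcsine density g(x) = 1∕(π√(x(1 − x))). The function names below are hypothetical, and floating-point agreement at a few points is evidence, not a proof.

```python
import math


def transfer_logistic(h):
    """Return L h for f(x) = 4x(1 - x) on (0, 1) with Lebesgue measure.

    The preimages of x are (1 - r)/2 and (1 + r)/2 with r = sqrt(1 - x),
    and the Jacobian is |f'(y)| = |4 - 8y|.
    """
    def Lh(x):
        r = math.sqrt(1.0 - x)
        preimages = ((1.0 - r) / 2.0, (1.0 + r) / 2.0)
        return sum(h(y) / abs(4.0 - 8.0 * y) for y in preimages)
    return Lh


def arcsine_density(x):
    """Candidate invariant density g(x) = 1 / (pi * sqrt(x (1 - x)))."""
    return 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))


L_g = transfer_logistic(arcsine_density)
L_one = transfer_logistic(lambda x: 1.0)

for x in (0.1, 0.37, 0.8):
    print(x, L_g(x), arcsine_density(x), L_one(x))
```

At the sample points the operator fixes g, consistent with item (3) for dν = g dm, while it does not fix the constant function 1, consistent with item (2) and the fact that this f does not preserve m.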

     

The next result was proved in [48].

Lemma 5.24
Let f on 
$$(X, \mathcal B_\mu , \mu )$$
be an n-to-one measure-preserving endomorphism, with μ a complete measure, and ζ = {A 1, …, A n} a Rohlin partition. As in Definition 5.21, denote by 
$$\mathcal F$$
the associated subalgebra generated by ζ.
  1. 1.
    Then the induced Rohlin factor map of f on 
$$\mathcal F$$
, is a measure preserving shift on 
$$\Sigma _n^+$$
, so the diagram
    
$$\displaystyle \begin{aligned} \begin{array}{ccc} (X,\mathcal B_\mu,\mu) & \stackrel{f}{\longrightarrow} & (X,\mathcal B_\mu,\mu) \\[0.2cm] \downarrow \pi & & \downarrow \pi \\[0.2cm] (\Sigma_n^+, \mathcal C, \nu) & \stackrel{\sigma}{\longrightarrow} & (\Sigma_n^+, \mathcal C, \nu) \end{array} \end{aligned}$$

    commutes, where ν(C) = μ(π −1(C)), 
$$C \in \mathcal {C}$$
, is the factor measure induced by μ.

     
  2. 2.

    If in addition there exists a Rohlin partition for f such that the Jacobian 
$$\mathrm {{Jac}}_{\mu f}(x) = \frac {1}{p_i}$$
for all x ∈ A i, then the induced Rohlin factor 
$$(\Sigma _n^+, \mathcal C, \nu , \sigma )$$
is the p = (p 0, p 1, …, p n−1) one-sided Bernoulli shift.

     
Remark 5.25
The Jacobian function can be defined more generally as a Radon–Nikodym derivative of a map between different spaces. If 
$$\phi :(X_1, \mathcal B_1,\mu _1) \rightarrow (X_2,\mathcal B_2,\mu _2)$$
is a measure-preserving isomorphism, then Jac μ₁ϕ = 1 μ₁-a.e. If 
$$f:(X, \mathcal B,\mu ) \rightarrow (X,\mathcal B,\mu )$$
is a measure-preserving automorphism, then

$$\displaystyle \begin{aligned} \mathrm{{Jac}}_{\mu f}=\omega_{\mu f}=\theta_{\mu f}=1 \quad  \mu\mbox{-a.e.} \end{aligned} $$
(5.28)

5.4 Examples of Noninvertible Maps

We give some basic examples of noninvertible dynamical systems.

  • One-sided Bernoulli shifts: let 
$$X= \Sigma _n^+ = \prod _{i=0}^{\infty }\{0,1,\ldots ,n-1\}$$
. The shift map σ(x)i = x i+1 is n-to-one with respect to Bernoulli measures in the sense that given a point x = (x 0, x 1, …, x k, …), the set
    
$$\displaystyle \begin{aligned}\sigma^{-1}(x)=\{(0,x_0,x_1,\ldots),(1,x_0,x_1,\ldots), \ldots,(n-1,x_0,x_1,\ldots)\},\end{aligned}$$
    consists of n distinct points. A Rohlin partition ζ consists of sets of the form A j = {x  :  x 0 = j}, j = 0, 1, …, n − 1.
  • Complex dynamics: we consider an analytic map of the Riemann sphere 
$$\widehat {\mathbb {C}}$$
such as
    
$$\displaystyle \begin{aligned} R(z)= -\frac 14 \left ( z+\frac 1z+2 \right ),\end{aligned}$$
    (see Chapter 12); R(z) is 2-to-1 with respect to a smooth measure m, giving surface area. An example of a Rohlin partition is ζ = {A 1, A 2} with A 1 = {z = x + iy  :  y > 0 or y = 0, x ≥ 0} and A 2 = {z = x + iy  :  y < 0 or y = 0, x ≤ 0}.
  • Interval maps: there is a well-studied subject of piecewise monotone maps that are bounded-to-one.

  • Many mathematical models of physical systems that are irreversible processes naturally involve noninvertible maps when one of the variables represents time.
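For the one-sided shift in the first item above, the preimage set and the Rohlin partition can be sketched as follows. This is an illustration under a stated assumption: finite tuples stand in for prefixes of infinite sequences, and the function names are hypothetical.

```python
def shift(x):
    """One-sided shift on a finite word: drop the first symbol."""
    return x[1:]


def shift_preimages(x, n):
    """The n words (j, x_0, x_1, ...) with j = 0, ..., n-1; each shifts to x."""
    return [(j,) + tuple(x) for j in range(n)]


def rohlin_atom(x):
    """Index j of the atom A_j = {x : x_0 = j} containing x."""
    return x[0]


n = 3
x = (2, 0, 1, 1)
pre = shift_preimages(x, n)
for y in pre:
    print(y, "atom", rohlin_atom(y), "shifts to", shift(y))
```

Each preimage lies in a different atom A j, which reflects that the shift restricted to each atom is one-to-one and onto.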

5.5 Exact Endomorphisms

Recall the definition of exactness from Section 2.3, which is that a nonsingular dynamical system 
$$(X, \mathcal B, \mu ,f)$$
, μ(X) = 1, is exact if

$$\displaystyle \begin{aligned}\displaystyle \bigcap_{n \geq 0} f^{-n}\mathcal B = \{\emptyset, X\} \ (\mu \bmod 0).\end{aligned}$$
Equivalently, if 
$$A \in \mathcal B$$
is a tail set, so A = f −n(f n A) for all n ≥ 1, then μ(A) = 0 or 1. Here we give an operator characterization of exactness in the finite measure-preserving case, a version of which appears in [125].
When 
$$(X,\mathcal B,\mu ,f)$$
is a probability measure-preserving dynamical system, then the following operators on 
$$L^2(X, \mathcal B,\mu )$$
are all the same:

$$\displaystyle \begin{aligned}(U_f^*)^n = U_{f^n}^* = (U_f^n)^*.\end{aligned}$$
When f is understood, we write this operator as (U ∗)n. Moreover, by Lemma 5.23 (2), we have the following result, yielding an explicit formula for (U ∗)n.
Lemma 5.26
Assume 
$$f:(X, \mathcal B, \mu ) \rightarrow (X, \mathcal B, \mu )$$
is a bounded-to-one map, and that f preserves the probability measure μ. Then for each n ≥ 1, for each 
$$\phi \in L^2(X,\mathcal B,\mu )$$
,

$$\displaystyle \begin{aligned} (U^*)^n \phi(x) = {\mathcal L}_{\mu f^n} \phi(x) = \sum_{y \in f^{-n}x} \frac {\phi(y)}{\mathrm{{Jac}}_{\mu f^n}(y)} \quad  \mu\mathit{\mbox{-a.e.}} \end{aligned} $$
(5.29)
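Formula (5.29) can be evaluated directly for f(x) = 2x (mod 1) with Lebesgue measure, where the preimages of x under f n are (x + k)∕2ⁿ for k = 0, …, 2ⁿ − 1 and the Jacobian of f n is the constant 2ⁿ. The sketch below is our own illustration; the name `adjoint_power`, the test function ϕ(x) = x², and the sample point 0.3 are assumptions chosen for convenience.

```python
def adjoint_power(phi, n):
    """Return (U*)^n phi for f(x) = 2x (mod 1) with Lebesgue measure, via (5.29).

    The preimages of x under f^n are (x + k) / 2^n for k = 0, ..., 2^n - 1,
    and the Jacobian of f^n is the constant 2^n.
    """
    N = 2 ** n

    def result(x):
        return sum(phi((x + k) / N) for k in range(N)) / N

    return result


def phi(x):
    return x * x  # integral over [0, 1) is 1/3


values = [adjoint_power(phi, n)(0.3) for n in (0, 2, 4, 8)]
print(values)
```

The values approach 1∕3, the integral of ϕ, as n grows; compare (5.30). Evaluating at one point is only suggestive, since the convergence in Proposition 5.28 is in the L 2 norm.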
Remark 5.27
  1. 1.
    For each k ≥ 1, 
$$L^2(X,f^{-k}\mathcal B,\mu ) \subset L^2(X,\mathcal B, \mu )$$
defines a closed subspace, and we can define 
$$\mathcal {P}_k$$
to be the orthogonal projection onto 
$$L^2(X,f^{-k}\mathcal B,\mu )$$
in L 2. We note that
    
$$\displaystyle \begin{aligned}L^2(X,f^{-k}\mathcal B,\mu) \subset \cdots \subset L^2(X,f^{-1}\mathcal B, \mu) \subset L^2(X,\mathcal B, \mu).\end{aligned}$$
     
  2. 2.

    We set 
$$\mathcal {P}_\infty $$
to be projection onto 
$$L^2(X, \cap _{k=1}^\infty f^{-k}\mathcal B,\mu )$$
. Then exactness of f means that for all 
$$\phi \in L^2(X,\mathcal B,\mu )$$
, 
$$\mathcal {P}_\infty (\phi )$$
is the constant function ∫X ϕ dμ, or equivalently, that 
$$L^2(X, \cap _{k=1}^\infty f^{-k}\mathcal B,\mu )$$
consists only of constant functions.

     

We give a proof of a characterization of exactness, and refer to ([125], Thm 4.4) for a different proof. A decreasing martingale proof can also be given using ([86], Definition 4.2 and [176], Proposition 17.4).

Proposition 5.28
If 
$$(X, \mathcal B, \mu ,f)$$
is a bounded-to-one probability measure-preserving dynamical system, then f is exact if and only if for all 
$$\phi \in L^2(X,\mathcal B, \mu )$$
,

$$\displaystyle \begin{aligned} (U^*)^n \phi \rightarrow \int_{X}\phi\, d\mu\end{aligned} $$
(5.30)

as n → ∞, with convergence in the L 2 norm and where ∫X ϕ dμ denotes the constant function with that value.

Proof

(⇒): Fix 
$$n \in \mathbb N$$
, and apply Lemma 5.26. Then on a set of full measure in X, U n(U ∗)n ϕ(x) = U n(U ∗)n ϕ(w) for w ∈ X such that f n x = f n w, since U n(U ∗)n ϕ(x) = (U ∗)n ϕ(f n x). Properties of the Koopman operator and f imply that ∥(U ∗)n∥ = 1 and that U n maps 
$$L^2(X,\mathcal B,\mu )$$
isometrically onto 
$$L^2(X,f^{-n} \mathcal B,\mu )$$
, so U n(U ∗)n gives the orthogonal projection of 
$$L^2(X,\mathcal B,\mu )$$
onto 
$$L^2(X,f^{-n} \mathcal B,\mu ).$$

Set ϕ = χ A for some 
$$A \in \mathcal B_+$$
. From the discussion above, (U ∗)n ϕ(x) = ψ(w), where w ∈ {f −n x} is any point in the set, or equivalently (U ∗)n ϕ(x) = ψ({f −n x}), for some 
$$\psi \in L^2(X,f^{-n} \mathcal B,\mu ).$$
Since for each 
$$n \in \mathbb N$$
, 
$$(U{{ }^*})^n \phi \in L^2(X, \cap _{k = 0}^n f^{-k} \mathcal B, \mu )$$
, we have ∥(U ∗)n χ A − α∥2 → 0 as n → ∞ for some constant function α, by exactness. It follows that α = μ(A), since ∫X (U ∗)n χ A dμ = μ(A) for every n. Extending to simple functions and then to L 2 functions, using linearity and density, gives the result.

(⇐): Assume (5.30) holds. Consider 
$$A \in \mathcal B_+$$
such that 
$$A \in \cap _{i \in \mathbb N} f^{-i}\mathcal B$$
, so for every 
$$n \in \mathbb N$$
, A = f −n(A n) (μ mod 0), for some 
$$A_n \in \mathcal B$$
. Setting ϕ = χ A, applying the hypothesis and Lemma 5.26,

$$\displaystyle \begin{aligned} \left \| \sum_{y \in f^{-n}x}\frac{\chi_A(y)}{\mathrm{{Jac}}_{\mu f^n}(y)} - \mu(A) \right \|{}_2 \rightarrow 0 \end{aligned} $$
(5.31)
as n → ∞. However, for every n and μ-a.e. x,

$$\displaystyle \begin{aligned}\sum_{y \in f^{-n}x}\frac{\chi_A(y)}{\mathrm{{Jac}}_{\mu f^n}(y)} = 0\end{aligned}$$
if one of the terms in the sum is 0. This holds since A is a tail set, which means y ∈ A if and only if w ∈ A for all other w such that f n w = f n y. Since the limit in (5.31) exists, we must have μ(A) = 0 if the sum is 0 for all n large enough. Otherwise we claim the sum is 1. Since μ is preserved, 
$$\sum _{y \in f^{-n}x}1/\mathrm {{Jac}}_{\mu f^n}(y) = 1$$
by Lemma 5.23 (2). Therefore if y ∈ {f −n(f n x)} ⊂ A, then since A is a tail set, w ∈ A for all other w such that f n w = f n y = x. Then μ(A) must be 0 or 1, so by assumption, 1. Therefore f is exact since every tail set of positive measure has measure 1. □
Corollary 5.29

If 
$$f:(X, \mathcal B, \mu ) \rightarrow (X, \mathcal B, \mu )$$
preserves the probability measure μ and is exact, then f is mixing and ergodic.

Proof
We consider the sets 
$$A,B \in \mathcal B$$
. Then

$$\displaystyle \begin{aligned}\mu(f^{-n}A \cap B)=( U^n \chi_A,\chi_B) =(\chi_A, (U^*)^n\chi_B),\end{aligned}$$
so as n → ∞, applying Proposition 5.28,

$$\displaystyle \begin{aligned}\mu(f^{-n}A \cap B) \rightarrow (\chi_A, 1) \mu(B) = \mu(A) \mu(B).\end{aligned}$$
This proves mixing, and mixing implies ergodicity by Proposition 5.5. □
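The convergence μ(f −n A ∩ B) → μ(A)μ(B) can be observed exactly for f(x) = 2x (mod 1) with Lebesgue measure when A and B are intervals. The sketch below is a hypothetical illustration of ours: it uses the fact that f −n[a, b) is the disjoint union of the intervals [(a + k)∕2ⁿ, (b + k)∕2ⁿ), and it computes with exact rationals.

```python
from fractions import Fraction


def preimage_intersection_measure(a, b, c, d, n):
    """Lebesgue measure of f^{-n}[a, b) intersected with [c, d), f(x) = 2x (mod 1).

    f^{-n}[a, b) is the disjoint union of the intervals
    [(a + k) / 2^n, (b + k) / 2^n) for k = 0, ..., 2^n - 1.
    """
    N = 2 ** n
    total = Fraction(0)
    for k in range(N):
        lo, hi = max((a + k) / N, c), min((b + k) / N, d)
        if hi > lo:
            total += hi - lo
    return total


A = (Fraction(0), Fraction(1, 3))
B = (Fraction(0), Fraction(1, 3))
measures = [preimage_intersection_measure(*A, *B, n) for n in range(1, 11)]
for n, value in enumerate(measures, start=1):
    print(n, value, float(value))
```

For A = B = [0, 1∕3) the computed measures approach μ(A)μ(B) = 1∕9, with an error of at most 1∕(3·2ⁿ) coming from the intervals near the endpoint 1∕3.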

We remark that an invertible map can carry an exact map as a measurable factor. When the factor comes from a generating partition in the sense described in Definition 5.30, f is eponymously called a K-automorphism after Kolmogorov.

Definition 5.30
A probability measure-preserving dynamical system 
$$(X,\mathcal B,\mu ,f)$$
is a K-automorphism or a Kolmogorov automorphism if it is invertible and if there exists a subalgebra 
$$\mathcal K \subset \mathcal B$$
, such that 
$$\mathcal {K} \subset f (\mathcal {K})$$
, and 
$$\mathcal {K}$$
generates 
$$\mathcal B$$
in the following sense:

$$\displaystyle \begin{aligned} \bigvee_{n=0}^{\infty} f^n(\mathcal{K}) = \mathcal B \ (\mu \bmod 0). \end{aligned} $$
(5.32)
Additionally, the tail of 
$$\mathcal K$$
is trivial; i.e., 
$$\bigcap _{n \geq 0}f^{-n}\mathcal {K} =\{\emptyset , X\} \ (\mu \bmod 0)$$
.
Remark 5.31

Finite measure-preserving exact endomorphisms and K-automorphisms are r-fold mixing for all r ≥ 1 [44, 158].

Exercises
  1. 1.

    Prove Theorem 5.10. Hint: Use the techniques from the relevant parts of the proof of Theorem 5.9.

     
  2. 2.

    Prove that the map f(x) = 2x ( mod 1) on 
$$([0,1] ,\mathcal B,m)$$
is weak mixing.

     
  3. 3.

    Prove that the map f(x) = bx (mod 1) on 
$$([0,1),\mathcal B,m)$$
is mixing for every integer b ≥ 2.

     
  4. 4.
    1. a.

      Show that for the map f(x) = 2x ( mod 1) on [0, 1] with Lebesgue measure, for every 
$$t \in [0, \frac 12)$$
, the partition ζ t = {A 1, A 2} with 
$$A_1 = [t , t + \frac 12)$$
, 
$$A_2 = [t + \frac 12 , t)$$
is a Rohlin partition.

       
    2. b.

      Prove that ζ 1∕4 is not a generating partition, but ζ 0 is. Hint: Use the partition to make a coding map to a symbol space.

       
     
  5. 5.

    Show f = R α on 
$$(\mathbb R/\mathbb Z,\mathcal B,m)$$
with α irrational, is not weak mixing by showing that f × f is not ergodic.

     
  6. 6.

    For 
$$(X,\mathcal B,\mu ,f)$$
a probability preserving dynamical system, prove that f is mixing if and only if for every 
$$A \in \mathcal B$$
, lim n→∞ μ(f −n A ∩ A) = μ(A)².

     
  7. 7.

    For 
$$(X,\mathcal B,\mu ,f)$$
a probability preserving dynamical system, prove that f is mixing if and only if f × f is mixing.

     
  8. 8.

    Prove that the map f(x) = bx ( mod 1) on 
$$([0,1),\mathcal B,m)$$
is exact for every integer b ≥ 2.

     
  9. 9.

    Prove that if 
$$(X,\mathcal B,\mu ,f)$$
, with μ(X) < ∞, has an attractor, then f cannot be mixing.

     
  10. 10.
    Prove that if 
$$\{P_n\}_{n \in \mathbb N}$$
, is a family of finite partitions of 
$$(X,\mathcal B,\mu )$$
, then
    
$$\displaystyle \begin{aligned} \mathcal{F} \left ( \bigvee_{n=1}^{\infty}P_n \right ) = \bigvee_{n=1}^{\infty}\mathcal{F}(P_n). \end{aligned} $$
    (5.33)