In this chapter, we develop the concept of mixing to show that ergodic maps are not the most chaotic type of transformation. For example, not every ergodic dynamical system can make a heterogeneous environment appear to be more homogeneous after repeated applications of the transformation, but mixing systems can. Toral translations (and in fact all isometries on Riemannian manifolds) take every set A to a congruent set, so the set cannot mix into other sets. We define properties of mixing in this chapter and we also discuss how noninvertibility of a map is connected to mixing properties.
We begin with the following characterization of ergodicity and develop the mixing definitions from there.
The next result does not require that f preserve a probability measure if it is invertible.
Assume is nonsingular, invertible, and μ(X) = 1. Then f is ergodic if and only if for every , there exists such that μ(f −n A ∩ B) > 0.
(⇐= ): If the set condition holds, then suppose f −1 A = A (μ mod 0) for some . The invertibility of f implies that for every , f −1 A = A if and only if f −1(X ∖ A) = (X ∖ A). Therefore μ(X ∖ A) = 0 and f is ergodic.
(⇒): Given sets , the smallest invariant set containing A is ; if f is ergodic, then this set has measure 1, so (μ mod 0). Hence the theorem is proved. □
We prove a basic result about sequences of real numbers that is useful in our comparison of mixing types.
5.1 Weak Mixing and Mixing
We apply Lemma 5.3 to the definitions of weak mixing and mixing for finite measure-preserving transformations to obtain useful characterizations of these properties. We begin with the definitions.
If f is a finite measure-preserving transformation of , then f is
- 1.weak mixing if and only if for all sets ,(5.8)
and
- 2.(strong) mixing if and only if for all sets ,(5.9)
There is the following mixing hierarchy.
If f is a finite measure-preserving transformation of , then f is mixing implies f is weak mixing, and f is weak mixing implies f is ergodic.
Because of the hierarchy coming from Proposition 5.5, mixing is sometimes called strong mixing for emphasis. However, despite its appearance of between “in between” ergodicity and mixing, the property of being weak mixing is extremely interesting and important in ergodic theory. First we show the classical example of an ergodic transformation that is not weak mixing, namely irrational rotation on . We give a statistical argument from [155].
Therefore the limit cannot be 0, so R α is not weak mixing from (5.8).
- 1.
- 2.
There exists a subset with Δ(J) = 0, such that limn→∞ a n = 0 provided n∉J.
- 3.
(1) ⇒ (2): For , let δ J(n) = |J ∩{1, …, n}|. For a fixed , define J κ = {n : |a n|≥ 1∕κ}.
Assuming the claim holds, if n > M κ and n∉J, then n∉J κ+1 and therefore |a n| < 1∕(κ + 1). Therefore limn∉J,n→∞|a n| = 0.
We next give a list of equivalent characterizations of weak mixing for dynamical systems preserving μ, with μ(X) = 1. We need a spectral definition first.
Under the hypotheses above, we say f has continuous spectrum if 1 is the only eigenvalue of the Koopman operator U = U f and the constants are the only eigenfunctions.
Although mixing seems to be the property that is physically observable, we give a list of conditions equivalent to weak mixing to show the property is quite natural mathematically. It is also very prevalent among measure-preserving transformations, more so than mixing [81].
- 1.
f is weak mixing.
- 2.For every pair of sets , there exists a set of density 0 such that
- 3.For all
- 4.For every
- 5.For every ,
- 6.
The transformation on defined by f × f is ergodic.
- 7.
The transformation on defined by f × f is weak mixing.
- 8.
f has continuous spectrum on the orthogonal complement to the constants.
The equivalence of (1) and (2) follows from Lemma 5.7 by setting a k = μ(f −k A ∩ B) − μ(A)μ(B), for sets .
Proving the equivalence of (1) and (3) is straightforward; statement (3) is the definition of weak mixing when ϕ = χ A and ψ = χ B, so (3) implies (1), and (1) gives (3) for characteristic functions. Passing to simple functions and using a standard approximation argument, first fixing B, shows that statement (3) holds for all L 2 pairs of functions.
(1) ⇒ (4) proceeds like (1) ⇒ (3).
(3) ⇒ (4) by choosing ϕ = ψ.
To prove (4) implies (3), assume (4) holds. The first observation is that for each fixed ϕ the set of functions ψ ∈ L 2 satisfying (3) forms a closed subspace of L 2; denote this space by V ϕ. V ϕ contains the constants and ϕ by (4), and U(V ϕ) = V ϕ since f preserves μ. The claim is that , which then proves (3). If not, consider some . Then for all k, (U k ϕ, ψ) = 0 and (1, ψ) = 0. In this case (3) holds for ψ, so no such ψ exists.
An application of Lemma 5.7 using shows that (5) is equivalent to (4).
(7) ⇒ (6) follows from Proposition 5.5.
It remains to connect (8) to the rest of the equivalent statements.
(6) ⇒ (8): if ϕ(fx) = λϕ(x) for some nonconstant ϕ ∈ L 2, ϕ, then setting , it follows that Φ(fx, fy) = |λ|2 Φ(x, y). Then there is a nonconstant f × f invariant function, contradicting (6).
The question of whether mixing is actually stronger than weak mixing was solved in 1969 by Chacon [35]. Examples can also be found in [148], and are attributed to Kakutani and von Neumann as early as the 1940s. What most of the examples have in common is that they start with an ergodic transformation f with discrete spectrum on a space X, send a proper set A ⊂ X off into an identical but disjoint copy of A, and then back into X using f, to create a new map on a larger space. By doing this carefully, eigenfunctions can be destroyed, but the resulting transformation still does not mix the space up too much.
We have some equivalent characterizations for mixing; the proof of the next result is similar to that given for weak mixing and appears as an exercise.
- 1.f is mixing; i.e., for all ,
- 2.For all
- 3.For every
The notion of multiple mixing plays a key role in the subject of ergodic theory, potentially providing an aid in classifying dynamical systems.
The proof given in Theorem 6.3, Chapter 6 that Bernoulli shifts are mixing also shows that they are r-fold mixing. There is an open problem due to Rohlin in 1949, outlined by Halmos in 1956 ([81], the last page of the book), as to whether mixing implies r-fold mixing for every probability measure-preserving dynamical system . While some partial results and reductions have been obtained in the intervening years, the problem remains largely open. All results so far give hypotheses under which the answer is yes (see [103], and Remark 5.31 below). We give an example for a higher dimensional () action in Chapter 6 (Example 6.24), which is mixing but not 3-fold mixing. For an overview of the state of the problem in 2006, a good source is [50].
5.2 Noninvertibility
Noninvertible maps mix up spaces in a natural way. We begin to make this precise here, and a clearer picture emerges in subsequent chapters, e.g., when entropy is introduced in Chapter 11. In calculus, the test for the invertibility of a function is the “horizontal line test.” If every line drawn parallel to the x-axis cuts the graph in exactly one point or does not intersect the graph at all, then f is invertible on . However in ergodic theory, invertibility or the lack thereof is a measure theoretic property. Moreover a map need only satisfy a property on a set of full measure, so exceptional points can occur with respect to invertibility.
5.2.1 Partitions
A key tool for understanding noninvertibility is the notion of a partition of a standard measure space . Partitions consisting of finitely many sets provide the basic building blocks for most of symbolic dynamics and coding as well.
- If is a dynamical system and P is a finite partition of X, then we can define for each ,
- Given two finite partitions P and Q, their join is the partition given by
For , i < j, .
If P is a partition, a set A is a P-set if A is a union of atoms of P.
By definition every partition has measurable atoms, but a “measurable partition” has a different definition and not every partition satisfies Definition 5.14 below. It is typical to study properties of measurable partitions on Lebesgue probability spaces, obtained by extending to include the completion of the probability measure μ under consideration; when in that setting we write . We note that the sets we add to to obtain are all subsets of sets with μ(E) = 0. Equivalently, we assume that is isomorphic to , the unit interval with Lebesgue measure structure on it, and we call X a (nonatomic) Lebesgue space (see Appendix A, Section A.2 for details). This provides a technical tool needed later in this section, making it easier to define a quotient space in this setting. More precisely, if we start with a Lebesgue space X and a measurable partition P of X and consider the quotient space X∕P whose points are the atoms of P, then X∕P is again a Lebesgue space. We do not develop all the technical aspects here, but refer to [134], [157], or [160] for more details.
P is a measurable partition if there is a countable family of P-sets, such that if B ≠ C are atoms in P, then there exists a set A j such that B ⊂ A j and C ⊂ X ∖ A j or vice versa.
There exists a partial order on the set of measurable partitions on X.
Given two partitions P and Q of we say that P is refined by Q if every atom of P is a Q-set. We write P ≼ Q to show that P has fewer atoms, or (in case they are both infinite partitions) atoms of P are Q-sets so are coarser. We also write Q ≽ P for P ≼ Q.
, and
if P n ≼ Q′ for all n, and Q′ is measurable, then Q′≽ Q.
Every finite or countable partition P is measurable, and we assume from now on that every partition we refer to is a measurable partition. A finite partition P generates a sub-σ-algebra in a natural way; namely, a set is in if and only if it is a P-set. Given a partition P, we denote the σ-algebra of P-sets by Conversely, given a finite sub-σ-algebra , we can extract a partition P from , denoted , by taking intersections of sets F j or X ∖ F k until we have disjoint atoms whose union is X. We define a subalgebra of measurable sets analogously to Definition 5.17.
- 1.
for all n, and
- 2.
if for all n, then .
5.2.2 Rohlin Partitions and Factors
We now combine partitions with dynamical systems. Assume that is a nonsingular dynamical system; recall the standing assumption that μ-a.e. x ∈ X has at most countably many preimages under f.
- 1.
μ(A i) > 0 for each i;
- 2.
the restriction of f to each A i, which we write as f i, is one-to-one;
- 3.
each A i is of maximal measure in X ∖⋃j<i A j with respect to property (2);
- 4.f 1 is one-to-one and onto X, by numbering the atoms so that
for
We call every partition defined as above a Rohlin partition for f , and we denote it by ζ.
Because the definition of noninvertibility depends on the measure μ, we specify the measure if there is some ambiguity. Recall that . Rohlin partitions characterize noninvertibility of f in the following way.
- 1.
f is noninvertible with respect to μ.
- 2.
There exists a Rohlin partition ζ containing at least two atoms.
- 3.
Every Rohlin partition ζ contains at least two atoms.
- 4.
.
(1) ⇒ (3): Suppose that there exists a Rohlin partition ζ containing exactly one atom. Then there is a set A ∈ B, μ(A) = 1 on which f is injective and surjective. By considering the set X′ =⋂n≥0 f −n(f n A) ⊃ A, we see that μ(X′) = 1 and f is an automorphism on X′. This contradicts the noninvertibility of f, so every Rohlin partition has at least two atoms.
(2) ⇒ (1): Suppose (2) holds, then there exist sets , 0 < μ(A 2) ≤ μ(A 1) < 1, with A 1 ∩ A 2 = ∅ and f(A 1) = X. Consider the set ; it follows that C = f(A 2). Then C is a set of positive measure such that every point has at least two preimages, namely a point in A 1 and a point in A 2. Therefore f is not invertible.
(3) ⇒ (2): This implication follows immediately since a Rohlin partition always exists.
(4) ⇒ (1): If f is an automorphism, then every satisfies f −1 ∘ f(B) = B. Therefore every is also in , using A = f(B).
(2) ⇒ (4) If f has a Rohlin partition with at least two atoms in it, A 1, A 2, then there exists a set C ⊂ A 1, 0 < μ(C) ≤ μ(A 1) which is -measurable, but is not in . To see this choose C = f −1(fA 2) ∩ A 1. Since A 2 ⊂ f −1(fA 2), and C ∩ A 2 = ∅, C is not of the form f −1 B for any .
□
- 1.
A Rohlin partition ζ is a generating partition if .
- 2.More generally, a Rohlin partition ζ defines a sub-σ-algebra
which we call the subalgebra generated by ζ. (In this space, we only see sets from ).
- 3.
Since , we have that each Rohlin partition determines a factor map onto and f is well-defined on this space. We call this factor a Rohlin factor [157].
In [27, 30], the map x↦2x ( mod 1) provides an example showing that there exists a Rohlin partition that generates, and one that does not; therefore Rohlin factors are not unique (see Exercise 4 below).
5.3 The Parry Jacobian and Radon–Nikodym Derivatives
For a nonsingular system the following identities hold with proofs following from (5.23)–(5.26) (see also [168] or [86]):
- 1.
;
- 2.
f preserves μ if and only if ; in this case .
- 3.
f preserves a measure ν ∼ μ if and only if and dν = gdμ;
The next result was proved in [48].
- 1.Then the induced Rohlin factor map of f on , is a measure preserving shift on , so the diagram
commutes, where ν(C) = μ(π −1(C)), , is the factor measure induced by μ.
- 2.
If in addition there exists a Rohlin partition for f such that the Jacobian for all x ∈ A i , then the induced Rohlin factor is the p = (p 0, p 1, …, p n−1) one-sided Bernoulli shift.
5.4 Examples of Noninvertible Maps
We give some basic examples of noninvertible dynamical systems.
- One-sided Bernoulli shifts: let . The shift map σ(x)i = x i+1 is n-to-one with respect to Bernoulli measures in the sense that given a point x = (x 0, x 1, …, x k, …), the setconsists of n distinct points. A Rohlin partition ζ consists of sets of the form A j = {x : x 0 = j}, j = 0, 1, …, n − 1.
- Complex dynamics: we consider an analytic map of the Riemann sphere such as(see Chapter 12); R(z) is 2-to-1 with respect to a smooth measure m, giving surface area. An example of a Rohlin partition is ζ = {A 1, A 2} with A 1 = {z = x + iy : y > 0 or y = 0, x ≥ 0} and A 2 = {z = x + iy : y < 0 or y = 0, x ≤ 0}.
Interval maps: there is a well-studied subject of piecewise monotone maps that are bounded-to-one.
Many mathematical models of physical systems that are irreversible processes naturally involve noninvertible maps when one of the variables represents time.
5.5 Exact Endomorphisms
- 1.For each k ≥ 1, defines a closed subspace, and we can define to be the orthogonal projection onto in L 2. We note that
- 2.
We set to be projection onto . Then exactness of f means that for all , is the constant function ∫X ϕ dμ, or equivalently, that consists only of constant functions.
We give a proof of a characterization of exactness, and refer to ([125], Thm 4.4) for a different proof. A decreasing martingale proof can also be given using ([86], Definition 4.2 and [176], Proposition 17.4).
as n →∞, with convergence in the L 2 norm and where ∫X ϕ dμ denotes the constant function with that value.
(⇒): Fix , and apply Lemma 5.26. Then on a set of full measure in X, (U ∗)n U n ϕ(x) = (U ∗)n U n ϕ(w) for w ∈ X such that f n x = f n w. Properties of the Koopman operator and f imply that (U ∗)n satisfies ∥(U ∗)n∥ = 1 and maps onto so (U ∗)n gives an orthogonal projection of onto
Set ϕ = χ A for some . From the discussion above, (U ∗)n ϕ(x) = ψ(w), where w ∈{f −n x} is any value in the set, or equivalently (U ∗)n ϕ(x) = ψ({f −n x}), for some Since for each , as n →∞, ∥(U ∗)n χ A − α∥2 → 0 for some constant function α by exactness. It follows that α = μ(A). Extending to simple functions and then L 2 functions using linearity and denseness, gives the result.
If preserves the probability measure μ and is exact, then f is mixing and ergodic.
We remark that an invertible map can carry an exact map as a measurable factor. When the factor comes from a generating partition in the sense described in Definition 5.30, f is eponymously called a K-automorphism after Kolmogorov.
Finite measure-preserving exact endomorphisms and K-automorphisms are r-fold mixing for all r ≥ 1 [44, 158].
- 1.
- 2.
Prove that the map f(x) = 2x ( mod 1) on is weak mixing.
- 3.
Prove that the map f(x) = bx ( mod 1) is mixing for every integer b ≥ 2.
- 4.
- a.
Show that for the map f(x) = 2x ( mod 1) on [0, 1] with Lebesgue measure, for every , the partition ζ t = {A 1, A 2} with , is a Rohlin partition.
- b.
Prove that ζ 1∕4 is not a generating partition, but ζ 0 is. Hint: Use the partition to make a coding map to a symbol space.
- 5.
Show f = R α on with α irrational, is not weak mixing by showing that f × f is not ergodic.
- 6.
For a probability preserving dynamical system, prove that f is mixing if and only if for every , limn→∞ μ(f −n A ∩ A) = μ(A)2.
- 7.
For a probability preserving dynamical system, prove that f is mixing if and only if f × f is mixing.
- 8.
Prove that the map f(x) = bx ( mod 1) on is exact for every integer b ≥ 2.
- 9.
Prove that if , with μ(X) < ∞, has an attractor, then f cannot be mixing.
- 10.Prove that if , is a family of finite partitions of , then(5.33)