The Birkhoff & Maximal Ergodic Theorems
Joel Shapiro
(Revised 11/16/2018)
1. Review

(X,ℱ,m) is a finite measure space
i.e., X is a set, ℱ is a sigma-algebra of subsets of X, and m is a finite measure on ℱ.

Notation: for $f\in L^1(m)$: $\int_X f\,dm=\int f\,dm=\int f$.

T:X→X is a measure-preserving transformation (mpt),
i.e., for each $E\in\mathcal{F}$: $T^{-1}(E)\in\mathcal{F}$ and $m(T^{-1}(E))=m(E)$.

The ($T$-)orbit of a point $x\in X$ is the sequence $(T^n x : n=0,1,2,\ldots)$.

The Poincaré Recurrence Theorem says that for $T$ an mpt and $E\in\mathcal{F}$ with $m(E)>0$:
$T^n x\in E$ infinitely often for almost every $x\in E$.
In other words, "The $T$-orbit of a.e. $x\in E$ revisits $E$ infinitely often."

Question: How often does the orbit of x∈E revisit E?
For a more quantitative version of our question, consider, for $f\in L^1(m)$ and $n$ a positive integer, the time average $A_nf$, defined by:
$$A_nf(x):=\frac{1}{n}\sum_{k=0}^{n-1}f(T^kx)\qquad(x\in X).$$
If, e.g., $f=\chi_E$, the characteristic function of $E\in\mathcal{F}$ (value $=1$ at $x$ if $x\in E$, and $=0$ otherwise), then $A_n\chi_E(x)$ is the average number of visits in time $n$ that the orbit of $x$ makes to $E$, i.e., the fraction of the times $k=0,1,\ldots,n-1$ at which $T^kx\in E$.

The Birkhoff Ergodic Theorem (next section) concerns the long-term behavior of these averages.
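The time averages defined above can be explored numerically. The following is a minimal sketch, not part of the original notes: it assumes the concrete system X = [0,1) with Lebesgue measure and the rotation x ↦ x + α (mod 1), which is measure-preserving; the angle `ALPHA`, the set E = [0, 1/4), and the starting point 0.1 are illustrative choices. For an irrational rotation the averages of χ_E are known to approach m(E) = 1/4 (a fact not proved in these notes).

```python
import math

def time_average(f, T, x, n):
    """Return A_n f(x) = (1/n) * sum_{k=0}^{n-1} f(T^k x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)   # contributes f(T^k x)
        x = T(x)        # advance the orbit by one step
    return total / n

ALPHA = math.sqrt(2) - 1  # assumed (hypothetical) irrational rotation angle

def rotate(x):
    """Rotation x -> x + ALPHA (mod 1); preserves Lebesgue measure on [0, 1)."""
    return (x + ALPHA) % 1.0

def chi_E(x):
    """Characteristic function of the illustrative set E = [0, 1/4)."""
    return 1.0 if 0.0 <= x < 0.25 else 0.0

# Time averages of chi_E along the orbit of x = 0.1 for increasing n.
averages = {n: time_average(chi_E, rotate, 0.1, n) for n in (10, 1_000, 100_000)}
for n, value in averages.items():
    print(n, value)
```

The printed values are the fractions of the first n orbit points that land in E; they can be compared with m(E) = 1/4.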
2. The Birkhoff (pointwise) Ergodic Theorem

Theorem ([Bir], 1931). For each $f\in L^1(m)$ there exists an $\mathcal{F}$-measurable function $f^*$, finite-valued a.e. on $X$, such that for a.e. $x\in X$:
$$\lim_{n\to\infty}A_nf(x)=f^*(x).$$

Special properties of $f^*$.
(a) $f^*\in L^1(m)$ (by Fatou's Lemma),
(b) $f^*\circ T=f^*$ a.e. (i.e., $f^*$ is "$T$-invariant"), and
(c) $\int f^*\,dm=\int f\,dm$.

For the special case $f=\chi_E$, where $E\in\mathcal{F}$ and $m(X)=1$, Birkhoff's Theorem says that:
For a.e. $x\in X$ the "long-term average" number of visits, $\chi_E^*(x)$, that the orbit of $x$ makes to $E$ exists, and its "space average" $\int\chi_E^*\,dm$ is just $m(E)$ (i.e., the "probability that a random point of $X$ lies in $E$").

Proof of "special properties of f∗".

(a) That $f^*\in L^1(m)$ follows easily from Birkhoff's Theorem and Fatou's Lemma: $\int|f^*|\,dm\le\liminf_n\int|A_nf|\,dm\le\|f\|_1$.

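(b) We have
$$A_nf(Tx)=\frac{n+1}{n}A_{n+1}f(x)-\frac{f(x)}{n}$$
for each $x\in X$. Thus for a.e. $x\in X$ (for which the left-hand side below is $f^*(Tx)$, since $T$ is measure-preserving):
$$\lim_nA_nf(Tx)=\lim_n\frac{n+1}{n}A_{n+1}f(x)-\lim_n\frac{f(x)}{n}=f^*(x)-0=f^*(x).\qquad\square$$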

(c) We'll see, in the course of proving Birkhoff's Theorem, that $A_nf\to f^*$ in $L^2(m)$ for every $f\in L^2(m)$. Since $m(X)<\infty$ we know that $L^2(m)$ is a dense subspace of $L^1(m)$, and that its norm is stronger. Thus $A_nf\to f^*$ in the norm of $L^1(m)$ for each $f$ in the dense subspace $L^2(m)$. It's easy to see that each $A_n$, now viewed as a linear operator on $L^1(m)$, has norm $\le1$. This allows the $L^1$-convergence of $A_nf$ to $f^*$ to be extended from the dense subspace $L^2(m)$ to all of $L^1(m)$. A consequence of this and the fact that $\int A_nf=\int f$ for each index $n$ is:
$$\int f^*\,dm=\lim_n\int A_nf\,dm=\int f\,dm.\qquad\square$$
3. Maximal Averages
Instead of trying to prove directly that the limit of the averages $A_nf(x)$ exists, we focus on the supremum
$$A^*f(x)=\sup_nA_nf(x)\qquad(f\in L^1(m),\ x\in X)$$
of these averages, which always exists (possibly $=\infty$). We call $A^*f$ the maximal function of $f$. The key to proving Birkhoff's Theorem lies with

The Maximal Ergodic Theorem. $\int_{\{A^*f\ge0\}}f\,dm\ge0$ for each $f\in L^1(m)$.
We can think of this theorem as saying that $f\in L^1(m)$ can't be too often negative on the set where some time average of the values of $f\circ T^k$ is nonnegative. Here's an alternative formulation:

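The Maximal Ergodic Consequence. For each $f\in L^1(m)$ and $\lambda>0$:
$$m(\{A^*f>\lambda\})\le\frac{\|f\|_1}{\lambda}.\qquad(*)$$
Proof. For $f\in L^1(m)$ and $\lambda>0$, replace $f$ by $f-\lambda$ in the conclusion of the Maximal Ergodic Theorem, and note that $A^*(f-\lambda)=(A^*f)-\lambda$. There results:
$$\int_{\{A^*f\ge\lambda\}}(f-\lambda)\,dm\ge0,\quad\text{i.e.,}\quad\int_{\{A^*f\ge\lambda\}}f\,dm\ge\lambda\,m(\{A^*f\ge\lambda\}).$$
Thus:
$$m(\{A^*f>\lambda\})\le m(\{A^*f\ge\lambda\})\le\frac{1}{\lambda}\int_{\{A^*f\ge\lambda\}}f\,dm\le\frac{1}{\lambda}\int|f|\,dm=\frac{\|f\|_1}{\lambda}.\qquad\square$$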
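The inequality (*) can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not part of the original notes: X is the cyclic group {0, ..., N-1} with normalized counting measure, T(x) = x + 1 (mod N) is the measure-preserving shift, and f is a randomly generated function. The helper name `truncated_maximal` is hypothetical; it computes the maximum of A_n f over 1 ≤ n ≤ `max_n`, which is at most A*f, so the measured sets are contained in those appearing in (*).

```python
import random

def truncated_maximal(values, max_n):
    """Return max_{1<=n<=max_n} A_n f(x) for each x, with T(x) = x + 1 mod N."""
    n_points = len(values)
    result = []
    for x in range(n_points):
        running_sum = 0.0
        best = float("-inf")
        for n in range(1, max_n + 1):
            running_sum += values[(x + n - 1) % n_points]  # adds f(T^{n-1} x)
            best = max(best, running_sum / n)               # compares with A_n f(x)
        result.append(best)
    return result

random.seed(0)                                    # fixed seed for reproducibility
N = 200
f = [random.uniform(-1.0, 1.0) for _ in range(N)]
l1_norm = sum(abs(v) for v in f) / N              # ||f||_1 for normalized counting measure
maximal = truncated_maximal(f, max_n=N)

# For each lambda, record (lambda, m({max A_n f > lambda}), ||f||_1 / lambda).
checks = []
for lam in (0.25, 0.5, 0.75):
    measure = sum(1 for value in maximal if value > lam) / N
    checks.append((lam, measure, l1_norm / lam))

for lam, measure, bound in checks:
    print(lam, measure, bound)
```

Each printed row lists λ, the measure of the set where the truncated maximal function exceeds λ, and the bound ‖f‖₁/λ from (*).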
4. A Consequence of the "Maximal Ergodic Consequence"
Let $\mathcal{G}$ denote the set of functions $f\in L^1(m)$ for which $\lim_nA_nf$ exists a.e.

Proposition. 𝒢 is closed in L1(m).
Proof. We wish to show that every limit point of $\mathcal{G}$ belongs to $\mathcal{G}$. For $f\in L^1(m)$ let
$$\Omega f(x)=\limsup_nA_nf(x)-\liminf_nA_nf(x)\qquad(x\in X).$$
Then $\Omega f\ge0$ a.e. and $\lim_nA_nf(x)$ exists iff $\Omega f(x)=0$.
Thus: $\mathcal{G}=\{f\in L^1(m):\Omega f=0\ \text{a.e.}\}$.
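The proof as reproduced here stops after identifying $\mathcal{G}$ with the zero set of $\Omega$. A sketch of how such an argument is usually completed from the Maximal Ergodic Consequence (*) follows. It takes $\Omega f$ to mean $\limsup_nA_nf-\liminf_nA_nf$, which is an assumption about the intended definition, and it is not necessarily the author's own argument.

```latex
% Sketch only: \Omega f := \limsup_n A_n f - \liminf_n A_n f (assumed reading).

% Step 1: \Omega is subadditive and dominated by the maximal function of |f|.
\Omega(f+g) \le \Omega f + \Omega g,
\qquad
\Omega f \le 2\sup_n |A_n f| \le 2\,A^*|f| .

% Step 2: for g \in \mathcal{G} we have \Omega g = 0 a.e., hence \Omega f \le \Omega(f-g) a.e.,
% and (*) applied to |f-g| with \lambda/2 in place of \lambda gives
m(\{\Omega f > \lambda\})
  \le m(\{\Omega(f-g) > \lambda\})
  \le m(\{A^*|f-g| > \lambda/2\})
  \le \frac{2\,\|f-g\|_1}{\lambda}
  \qquad (\lambda > 0).

% Step 3: if f is a limit point of \mathcal{G}, pick g_k \in \mathcal{G} with \|f-g_k\|_1 \to 0:
m(\{\Omega f > \lambda\}) = 0 \ \text{for every } \lambda > 0
\;\Longrightarrow\; \Omega f = 0 \ \text{a.e.}
\;\Longrightarrow\; f \in \mathcal{G}.
```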

To prove Birkhoff's Ergodic Theorem it remains to show that the set $\mathcal{G}$, for which it does hold (by definition), and which we've just seen is closed in $L^1(m)$, is dense therein. We'll see this in the course of proving another famous ergodic theorem, to which we now turn.
5. The Mean Ergodic Theorem
The setting now shifts to a Hilbert space ℋ on which acts a contraction U, i.e., a linear transformation for which ∥Uf∥≤∥f∥ for each f∈ℋ.
The Mean Ergodic Theorem. Suppose U is a contraction of a Hilbert space ℋ. Denote by 𝒦 the null space of I−U, and by P the orthogonal projection of ℋ onto 𝒦. Then for each f∈ℋ:
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}U^kf=Pf,\qquad(♡)$$
the convergence taking place in the norm topology of ℋ.
For isometries U this result was published in 1932 by von Neumann [vN].
Comparison with Birkhoff's Theorem. In the setting of Birkhoff's Ergodic Theorem, $(X,\mathcal{F},m)$ is a measure space with $m(X)<\infty$, and a measure-preserving transformation $T:X\to X$ is used to induce an isometry $U$ on $L^1(m)$ by setting $Uf=f\circ T$ for each $f\in L^1(m)$. Now $L^2(m)$ is contained in $L^1(m)$ since $m(X)<\infty$, and the fact that $T$ is measure-preserving, which guaranteed that $U$ is an isometry of $L^1(m)$, also guarantees that it's an isometry of $L^2(m)$. Since $L^2(m)$ is a Hilbert space, the Mean Ergodic Theorem shows that (♡) holds in the $L^2(m)$-norm for every $f\in L^2(m)$.
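The convergence (♡) can be seen concretely in a finite-dimensional case. The following is a minimal sketch under assumptions not taken from the notes: the Hilbert space is R^3 with the Euclidean norm, and U is the cyclic coordinate shift, an isometry whose fixed vectors are the constant vectors, so that Pf is the vector whose entries all equal the mean of the entries of f. The names `shift` and `ergodic_average` are hypothetical.

```python
def shift(v):
    """Cyclic shift U(v0, v1, v2) = (v2, v0, v1); an isometry of R^3."""
    return v[-1:] + v[:-1]

def ergodic_average(U, f, n):
    """Return (1/n) * sum_{k=0}^{n-1} U^k f as a list of coordinates."""
    total = [0.0] * len(f)
    current = list(f)
    for _ in range(n):
        total = [t + c for t, c in zip(total, current)]  # add U^k f
        current = U(current)                              # move on to U^{k+1} f
    return [t / n for t in total]

f = [3.0, -1.0, 4.0]
mean = sum(f) / len(f)
projection = [mean] * len(f)   # Pf: orthogonal projection onto the constant vectors

average = ergodic_average(shift, f, 1000)
error = max(abs(a - p) for a, p in zip(average, projection))
print(average, error)
```

Here the averages differ from Pf by at most a quantity of order 1/n, in line with the telescoping estimate used for vectors in the range of I − U.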
6. Some Hilbert-space preliminaries
Notation. We'll denote the null space of a linear transformation $L$ by "$\ker L$".
A Contraction Theorem. If $U$ is a (linear) contraction on a Hilbert space $\mathcal{H}$, then $\ker(I-U)=\ker(I-U^*)$, i.e.,
$$Uf=f\iff U^*f=f\qquad(\forall f\in\mathcal{H}).$$
Proof. Since the norm of a (bounded) Hilbert-space operator equals the norm of its adjoint, $U^*$ is also a contraction.
Suppose $f\in\ker(I-U)$, i.e., that $Uf=f$. Then (assuming complex scalars for our Hilbert space)
$$\begin{aligned}
\|(I-U^*)f\|^2&=\|f-U^*f\|^2\\
&=\|f\|^2-2\,\mathrm{Re}\,\langle f,U^*f\rangle+\|U^*f\|^2\\
&\le2\|f\|^2-2\,\mathrm{Re}\,\langle Uf,f\rangle\\
&=2\|f\|^2-2\,\mathrm{Re}\,\langle f,f\rangle=0,
\end{aligned}$$
where in the third line we've used the fact that $\|U^*f\|\le\|f\|$ (since $U^*$ is also a contraction), and in the fourth one the assumption that $Uf=f$. Thus $f\in\ker(I-U^*)$.
The argument so far shows that $\ker(I-U)\subset\ker(I-U^*)$. The reverse inclusion follows upon substituting $U^*$ for $U$ and using the fact that $U^{**}=U$. ◻
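A Contraction Corollary. If $U$ is a contraction then
$$\mathcal{H}=\ker(I-U)\oplus\overline{\operatorname{ran}(I-U)}\qquad(♣)$$
where the overline denotes "norm closure in $\mathcal{H}$."
Proof. This follows from the general fact that if $L$ is a bounded operator on $\mathcal{H}$ then $\ker L^*=(\operatorname{ran}L)^\perp$. In our case, $L=I-U$, so by the above Theorem, $\ker L=\ker L^*=(\operatorname{ran}L)^\perp$, from which follows (♣), since $\mathcal{H}=(\operatorname{ran}L)^\perp\oplus\overline{\operatorname{ran}L}$.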
7. Proof of the Mean Ergodic Theorem.
We're given a contraction $U$ of Hilbert space $\mathcal{H}$. For the operator $I-U$, let $\mathcal{K}$ denote its null space and $\mathcal{R}$ its range, i.e., $\mathcal{K}=\{f\in\mathcal{H}:Uf=f\}$ and $\mathcal{R}=(I-U)\mathcal{H}$.
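The body of this proof is not reproduced in this section. The following is a sketch of a standard argument, chosen to be consistent with how the next section refers to it (convergence first on $\mathcal{K}\oplus\mathcal{R}$, with a telescoping formula on $\mathcal{R}$); it should be read as an assumption about the omitted steps, not as the author's own text.

```latex
% Sketch only; A_n := \frac{1}{n}\sum_{k=0}^{n-1} U^k, so \|A_n\| \le 1 because \|U\| \le 1.

% Case 1: f \in \mathcal{K}. Then U^k f = f for every k, so
A_n f = f = Pf \qquad (n = 1, 2, \ldots).

% Case 2: f = g - Ug \in \mathcal{R}. The sum telescopes:
A_n f = \frac{1}{n}\sum_{k=0}^{n-1}\bigl(U^k g - U^{k+1} g\bigr)
      = \frac{g}{n} - \frac{U^n g}{n},
\qquad
\|A_n f\| \le \frac{2\|g\|}{n} \to 0 = Pf ,
% where Pf = 0 because \mathcal{R} \perp \mathcal{K} by (\clubsuit).

% Case 3: general f. By (\clubsuit), \mathcal{D} = \mathcal{K} \oplus \mathcal{R} is dense in \mathcal{H}.
% Given \varepsilon > 0 choose h \in \mathcal{D} with \|f - h\| < \varepsilon. Then
\|A_n f - Pf\|
  \le \|A_n (f - h)\| + \|A_n h - Ph\| + \|P(h - f)\|
  \le 2\varepsilon + \|A_n h - Ph\| ,
\qquad\text{so}\qquad
\limsup_{n\to\infty} \|A_n f - Pf\| \le 2\varepsilon .
```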
8. Proof of Birkhoff's Ergodic Theorem
We're back to the setting of a measure space $(X,\mathcal{F},m)$ with $m(X)<\infty$, and a measure-preserving transformation $T$ on $X$, with its induced isometry $U$ on $L^1(m)$ defined by $Uf=f\circ T$.

To show: for each $f\in L^1(m)$ there exists $f^*\in L^1(m)$ such that $A_nf(x)\to f^*(x)$ for a.e. $x\in X$.

A "Proto"-Birkhoff Theorem. Since $m(X)<\infty$ we have $L^2(m)$ contained (densely) in $L^1(m)$. Our proof of von Neumann's Ergodic Theorem involved proving the result first for the subspace $\mathcal{D}$ formed by taking the orthogonal direct sum of the closed subspace $\mathcal{K}=\ker(I-U)$ and the not-necessarily-closed subspace $\mathcal{R}=\operatorname{ran}(I-U)$.

For our current purposes we can interpret the fact that $A_nf=f$ for each index $n$ and each $f\in\mathcal{K}$ as implying that, for each such $f$, the averages $A_nf$ converge pointwise a.e. to $f$.

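On the other hand, if $f$ is in $\mathcal{R}$, so has the form $f=g-Ug$ for some $g\in L^2(m)$, we've seen that for each index $n$:
$$A_nf=\frac{g}{n}-\frac{U^ng}{n}.\qquad(**)$$
The first term on the right-hand side of (**) converges to 0 pointwise a.e. on $X$. As for the second one, note that
$$\int_X\sum_{n=1}^\infty\frac{|U^ng|^2}{n^2}\,dm=\sum_{n=1}^\infty\frac{\|U^ng\|_2^2}{n^2}=\sum_{n=1}^\infty\frac{\|g\|_2^2}{n^2}<\infty.$$
Thus the integrand on the left-hand side above is a series that converges a.e. on $X$, so its sequence of terms $\to0$ a.e. on $X$. That is, on the right-hand side of (**), $n^{-1}(U^ng)\to0$ a.e., hence on the left-hand side, $A_nf\to0$ a.e. for every $f\in\mathcal{R}$. Together with the previous paragraph, this shows that $A_nf$ converges a.e. for every $f\in\mathcal{D}$.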

So far: For each $f\in\mathcal{D}$ the sequence of averages $(A_nf)$ converges pointwise a.e. on $X$. Now $\mathcal{D}$ is dense in $L^2(m)$, and $L^2(m)$ is dense in $L^1(m)$. Since convergence in $L^2(m)$ implies convergence in $L^1(m)$ (thanks again to the fact that $m(X)<\infty$), we see that $\mathcal{D}$ is dense in $L^1(m)$, hence the conclusion of the Birkhoff Ergodic Theorem holds for every $f$ in a dense subset of $L^1(m)$.

Returning to the work of Section 4 on Birkhoff's Theorem, where we used the Maximal Ergodic Theorem to establish closedness for the subset $\mathcal{G}$ of $f\in L^1(m)$ for which the averages $A_nf$ converge a.e.: we now know that $\mathcal{G}$ is dense in $L^1(m)$, hence it's all of $L^1(m)$. ◻
This completes the proof (modulo proving the Maximal Ergodic Theorem) of the Birkhoff Ergodic Theorem. For a readily available proof of the Maximal Ergodic Theorem, see Peter Oberly's lecture notes [Ob], Theorem 5, page 6.
9. The Lebesgue Differentiation Theorem.

The setting. For this one we work in $L^1(\mathbb{R}^d)$. For $x\in\mathbb{R}^d$ and $r>0$ let $B_r(x)$ denote the open ball in $\mathbb{R}^d$ of radius $r$ centered at $x$. For $f\in L^1(\mathbb{R}^d)$, $x\in\mathbb{R}^d$, and $r>0$ let
$$A_rf(x)=\frac{1}{m(B_r(x))}\int_{B_r(x)}f\,dm,$$
where $m$ denotes Lebesgue measure on (the Lebesgue-measurable subsets of) $\mathbb{R}^d$.

The Theorem. Suppose $f\in L^1(\mathbb{R}^d)$. Then $\lim_{r\to0+}A_rf(x)=f(x)$ for a.e. $x\in\mathbb{R}^d$.

The Proof. We know the result for a dense subset of $L^1(\mathbb{R}^d)$, namely the continuous functions with compact support (if $d=1$ this is essentially the Fundamental Theorem of Integral Calculus). The heavy lifting is now supplied by the:

Hardy-Littlewood Maximal Theorem. For $f\in L^1(\mathbb{R}^d)$ let
$$A^*f(x)=\sup_{r>0}A_rf(x)\qquad(x\in\mathbb{R}^d).$$
Then there is a positive constant $C_d$ such that for each $\lambda>0$ and $f\in L^1(\mathbb{R}^d)$: $m(\{A^*f\ge\lambda\})\le C_d\,\|f\|_1/\lambda$.
For a proof see, e.g., [Sh], §4, pp. 5-6.
This Maximal Theorem, along with our argument of Section 4 above, shows that the set of $f\in L^1(\mathbb{R}^d)$ for which the averages in the Lebesgue Differentiation Theorem converge a.e. is closed in $L^1(\mathbb{R}^d)$. But we already know the result holds for a dense subset, so it must hold for every $f\in L^1(\mathbb{R}^d)$.
To show that these averages converge a.e. to $f(x)$ takes just a little more work. For the details, see, e.g., [Sh], §3, pp. 4-5. ◻
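The role of "a.e." in the Lebesgue Differentiation Theorem can be illustrated with a one-dimensional example. The sketch below is an illustration only, under assumptions not in the notes: d = 1 and f is the characteristic function of [0, 1], for which the ball averages have a closed form. The function name `ball_average_indicator` is hypothetical.

```python
def ball_average_indicator(x, r):
    """A_r f(x) for f = characteristic function of [0, 1] on the real line (d = 1)."""
    # Length of the overlap of the ball (x - r, x + r) with [0, 1].
    overlap = max(0.0, min(x + r, 1.0) - max(x - r, 0.0))
    return overlap / (2.0 * r)   # divide by m(B_r(x)) = 2r

radii = [10.0 ** (-k) for k in range(1, 7)]

# At the interior point x = 0.3 the averages equal f(0.3) = 1 once r is small.
interior = [ball_average_indicator(0.3, r) for r in radii]

# At the boundary point x = 0 the averages stay at 1/2, although f(0) = 1.
boundary = [ball_average_indicator(0.0, r) for r in radii]

print(interior)
print(boundary)
```

The two endpoints 0 and 1 are the only points where the averages fail to converge to f(x); they form a set of Lebesgue measure zero, which is consistent with the theorem.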
10. Banach's Principle
The method behind the work just done generalizes considerably. Suppose that $B$ is a Banach space and $(X,\mathcal{F},m)$ a measure space with $m(X)<\infty$. Let $L^0(m)$ denote the space of ($m$-equivalence classes of) $\mathcal{F}$-measurable, real-valued functions that take finite values a.e.

Continuity in measure. To say a linear transformation $L:B\to L^0(m)$ is "continuous in measure" means that if $(f_n)$ is a sequence in $B$ that converges in the norm of $B$ to a vector $f\in B$, then $Lf_n\to Lf$ in measure, i.e., for every $\lambda>0$:
$$\lim_{n\to\infty}m(\{|Lf_n-Lf|>\lambda\})=0.$$

The maximal function. Suppose $B$ is a normed linear space and $(U_n)$ a sequence of linear transformations $B\to L^0(m)$, each of which is continuous in measure. Define the maximal function $U^*$ of this sequence, $U^*:B\to L^0(m)$, by
$$(U^*v)(x)=\sup_nU_nv(x)\qquad(v\in B,\ x\in X).$$

In particular, if, for each $v\in B$, the sequence $(U_nv)$ converges a.e. to an element of $L^0(m)$ (i.e., if the limit is finite a.e.), then $U^*v$ is finite a.e. for each $v\in B$. A surprising theorem of Banach asserts that more is true.

Banach's Principle ([Ban], 1926; [Gar], pp. 1-2). Suppose $B$ is a Banach space and $(U_n)$ a sequence of linear transformations $B\to L^0(m)$, each of which is continuous in measure. If $(U_nv)$ converges a.e. for each $v\in B$, then
$$\sup\{m(\{U^*v>\lambda\}):\|v\|\le1\}\searrow0\quad\text{as}\quad\lambda\nearrow\infty.\qquad(\#)$$
In particular: $U^*$ is continuous in measure.

A Banach "Converse Principle". We've seen that maximal inequalities of the form (#) can give rise to a.e.-convergence theorems; e.g., the Maximal Ergodic Theorem, in the form (*) of §3, and the Hardy-Littlewood Maximal Theorem of §9. In fact, this is always true; the argument of §4 shows:

Theorem. If $B$ is a normed linear space and $(U_n)$ a sequence of continuous linear transformations $B\to L^0(m)$ for which (#) holds, then the set of $v\in B$ for which $(U_nv)$ converges a.e. is closed in $B$.
References
[Ban] Stefan Banach, Sur la convergence presque partout de fonctionnelles linéaires, Bull. Sci. Math. (2) 50 (1926) 27-32 & 36-43.
[Bir] George D. Birkhoff, Proof of the Ergodic theorem, Proc. Nat. Acad. Sci. 17 (1931) 656-660.
[Gar] Adriano Garsia, Topics in Almost Everywhere Convergence, Lectures in Advanced Mathematics #4, Markham Publishing Co., Chicago, 1970.
[Ob] Peter Oberly, The Pointwise Ergodic Theorem and its applications, Lecture Notes, Portland State University Analysis Seminar, November 2018.
[Sh] Joel H. Shapiro, Almost-everywhere convergence ... done right! Lecture Notes, Portland State University Analysis Seminar, October 2017.
[vN] John von Neumann, Proof of the Quasi-Ergodic Hypothesis, Proc. Nat. Acad. Sci. 18 (1932) 70-82.