## The Birkhoff & Maximal Ergodic Theorems

### Joel Shapiro

(Revised 11/16/2018)

### 1. Review

• $(X,\mathscr{F},m)$ is a finite measure space,
i.e., $X$ is a set, $\mathscr{F}$ is a sigma-algebra of subsets of $X$, and $m$ is a finite measure on $X$.

• Notation: for $f\in L^1(m)$: $\int_X f\,dm = \int f\,dm = \int f$.

• $T\colon X\to X$ is a measure-preserving transformation (mpt),
i.e., for each $E\in\mathscr{F}$: $T^{-1}(E)\in \mathscr{F}$ and $m(T^{-1}(E)) = m(E)$.

• The ($T$-) orbit of a point $x\in X$ is the sequence $(T^nx\colon n=0, 1, 2, \, \ldots)$.

• The Poincaré Recurrence Theorem says that for $T$ a mpt and $E\in \mathscr{F}$ with $m(E)>0$:

$T^nx\in E$ infinitely often for almost every $x\in E$.

In other words, "The $T$-orbit of a.e. $x\in E$ revisits $E$ infinitely often."

• Question: How often does the orbit of $x\in E$ revisit $E$?

For a more quantitative version of our question, consider, for $f\in L^1(m)$ and $n$ a positive integer, the time average $A_nf$, defined by:

$$A_nf(x) := \frac{1}{n}\sum_{k=0}^{n-1} f(T^kx) \qquad (x\in X).$$

If, e.g., $f=\chi_E$, the characteristic function of $E\in\mathscr{F}$ (value $=1$ at $x$ if $x\in E$, and $=0$ otherwise), then $A_n\chi_E(x)$ is the average number of visits in time $n$ that the orbit of $x$ makes to $E$, i.e., the fraction of the points $x, Tx, \ldots, T^{n-1}x$ that lie in $E$.

• The Birkhoff Ergodic Theorem (next section) concerns the long-term behavior of these averages.
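To get a feel for the time averages $A_nf$, here is a minimal numerical sketch. The system is an assumption chosen for illustration, not taken from the notes: $X=[0,1)$ with Lebesgue measure, $T$ the rotation $x\mapsto x+\alpha \pmod 1$ with $\alpha$ the golden-ratio conjugate, and $E=[0,0.25)$. The names `time_average`, `rotate` and `indicator_E` are hypothetical helpers.

```python
import math

def time_average(f, T, x, n):
    """Return A_n f(x) = (1/n) * sum_{k=0}^{n-1} f(T^k x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

# Assumed example system: rotation of [0, 1) by an irrational angle,
# which preserves Lebesgue measure.
ALPHA = (math.sqrt(5) - 1) / 2

def rotate(x):
    return (x + ALPHA) % 1.0

def indicator_E(x):
    # chi_E for the assumed set E = [0, 0.25)
    return 1.0 if 0.0 <= x < 0.25 else 0.0

for n in (10, 100, 1000, 10000):
    print(n, time_average(indicator_E, rotate, 0.1, n))
```

For this rotation the printed averages should settle near $m(E)=0.25$ as $n$ grows, which is the kind of behavior the next section makes precise.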

### 2. The Birkhoff (pointwise) Ergodic Theorem

• Theorem ([Bir], 1931). For each $f\in L^1(m)$ there exists an $\mathscr{F}$-measurable function $f^*$, finite-valued a.e. on $X$, such that for a.e. $x\in X$:

$$\lim_{n\to\infty} A_nf(x) = f^*(x).$$

• Special properties of ${\bf f^*}$.

• $f^*\in L^1(m)$ (by Fatou's Lemma),
• $f^*\circ T = f^*$ (i.e., $f^*$ is "$T$-invariant"), and
• $\int f^*\,dm =\int f\,dm$.
• For the special case $f=\chi_E$, where $E\in\mathscr{F}$ and $m(X)=1$, Birkhoff's Theorem says that:

For a.e. $x\in X$ the "long-term average" number of visits, $\chi_E^*(x)$, that the orbit of $x$ makes to $E$ exists, and its "space average" $\int \chi_E^*\,dm$ is just $m(E)$ (i.e., the "probability that a random point of $X$ lies in $E$").

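• Proof of "special properties of ${\bf f^*}$".

• (a) That $f^*\in L^1(m)$ follows easily from Birkhoff's Theorem and Fatou's Lemma: $|A_nf|\to|f^*|$ a.e. and $\int|A_nf|\,dm\le\int|f|\,dm$ for each $n$, so $\int|f^*|\,dm\le\liminf_n\int|A_nf|\,dm\le\int|f|\,dm<\infty$.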

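• (b) We have

$$A_nf(Tx) = \frac{n+1}{n}\,A_{n+1}f(x) - \frac{f(x)}{n}$$
for each $x\in X$. Thus for a.e. $x\in X$:
$$f^*(Tx) = \lim_{n\to\infty}A_nf(Tx) = \lim_{n\to\infty}\frac{n+1}{n}\,A_{n+1}f(x) - \lim_{n\to\infty}\frac{f(x)}{n} = f^*(x) - 0 = f^*(x).$$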
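The identity behind (b) can be checked term by term; the following short computation uses nothing beyond the definition of $A_n$:

$$\begin{aligned}
n\,A_nf(Tx) &= \sum_{k=0}^{n-1} f(T^{k+1}x) = \sum_{k=1}^{n} f(T^kx) \\
&= \sum_{k=0}^{n} f(T^kx) - f(x) = (n+1)\,A_{n+1}f(x) - f(x).
\end{aligned}$$

Dividing by $n$ gives $A_nf(Tx) = \frac{n+1}{n}A_{n+1}f(x) - \frac{f(x)}{n}$, and since $\frac{n+1}{n}\to 1$ the factor does not affect the limit.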

• (c) We'll see, in the course of proving Birkhoff's Theorem, that $A_nf\to f^*$ in $L^2(m)$ for every $f\in L^2(m)$. Since $m(X)<\infty$ we know that $L^2(m)$ is a dense subspace of $L^1(m)$, and that its norm is stronger. Thus $A_nf\to f^*$ in the norm of $L^1(m)$ for each $f$ in the dense subspace $L^2(m)$. It's easy to see that each $A_n$, now viewed as a linear operator on $L^1(m)$, has norm $\le 1$. This allows the $L^1$-convergence of $A_nf$ to $f^*$ to be extended from the dense subspace $L^2(m)$ to all of $L^1(m)$. A consequence of this and the fact that $\int A_nf = \int f$ for each index $n$ is:

$$\int f^*\,dm = \lim_{n\to\infty}\int A_nf\,dm = \int f\,dm.$$

### 3. Maximal Averages

Instead of trying to prove directly that the limit of the averages $A_nf(x)$ exists, we focus on the supremum

$$A^*f(x) = \sup_n A_nf(x) \qquad (f\in L^1(m),\ x\in X)$$
of these averages, which always exists (possibly $=\infty$). We call $A^*f$ the maximal function of $f$. The key to proving Birkhoff's Theorem lies with

• The Maximal Ergodic Theorem. $\int_{\{A^*f\ge 0\}} f\, dm \ge 0$ for each $f\in L^1(m)$.

We can think of this theorem as saying that $f\in L^1(m)$ can't be too often negative on the set where some time average of the values of $f\circ T^k$ is non-negative. Here's an alternative formulation:

• The Maximal Ergodic Consequence. For each $f\in L^1(m)$ and $\lambda >0$:

$$m(A^*|f|>\lambda) \le \frac{\|f\|_1}{\lambda}. \qquad (*)$$

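Proof. For $f\in L^1(m)$ and $\lambda>0$, replace $f$ by $|f|-\lambda$ in the conclusion of the Maximal Ergodic Theorem, and note that $A^*(|f|-\lambda) = (A^*|f|)-\lambda$. There results:

$$\int_{\{A^*|f|\ge\lambda\}}(|f|-\lambda)\,dm \ge 0, \quad\text{i.e.,}\quad \int_{\{A^*|f|\ge\lambda\}}|f|\,dm \ge \lambda\, m(\{A^*|f|\ge\lambda\}).$$
Thus:
$$m(\{A^*|f|\ge\lambda\}) \le \frac{1}{\lambda}\int_{\{A^*|f|\ge\lambda\}}|f|\,dm \le \frac{1}{\lambda}\int|f|\,dm = \frac{\|f\|_1}{\lambda}.$$
Since $\{A^*|f|>\lambda\}\subset\{A^*|f|\ge\lambda\}$, this gives (*).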
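The inequality (*) can be checked exactly on a toy system. The sketch below is an illustration under assumptions that are not in the notes: $X=\{0,\ldots,N-1\}$ with normalized counting measure, and $T$ a permutation of $X$ (which preserves that measure). The helper names `maximal_function` and `check_maximal_inequality` are hypothetical.

```python
import random

def maximal_function(values, perm):
    """A*f(x) = max over 1 <= n <= N of A_n f(x), for T(i) = perm[i] on {0, ..., N-1}.

    For a permutation every orbit is periodic with period at most N, so the
    supremum over all n is already attained for some n <= N.
    """
    N = len(values)
    result = []
    for x in range(N):
        running_sum, point, best = 0.0, x, float("-inf")
        for n in range(1, N + 1):
            running_sum += values[point]
            best = max(best, running_sum / n)
            point = perm[point]
        result.append(best)
    return result

def check_maximal_inequality(values, perm, lam):
    """Return (m(A*|f| > lam), ||f||_1 / lam) for normalized counting measure."""
    N = len(values)
    abs_values = [abs(v) for v in values]
    a_star = maximal_function(abs_values, perm)
    measure = sum(1 for a in a_star if a > lam) / N
    bound = (sum(abs_values) / N) / lam
    return measure, bound

rng = random.Random(0)          # fixed seed, purely for reproducibility
N = 200
perm = list(range(N))
rng.shuffle(perm)               # a permutation preserves counting measure
f = [rng.gauss(0.0, 1.0) for _ in range(N)]

for lam in (0.5, 1.0, 2.0):
    measure, bound = check_maximal_inequality(f, perm, lam)
    print(lam, measure, bound)
```

In each printed row the first number (the measure of the set where the maximal function exceeds $\lambda$) should not exceed the second (the bound $\|f\|_1/\lambda$).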

### 4. A Consequence of the "Maximal Ergodic Consequence"

Let $\mathscr{G}$ denote the set of functions $f\in L^1(m)$ for which $\lim_n A_nf$ exists a.e.

• Proposition. $\mathscr{G}$ is closed in $L^1(m)$.

Proof. We wish to show that every limit point of $\mathscr{G}$ belongs to $\mathscr{G}$. For $f\in L^1(m)$ let

$$\Omega f(x) = \limsup_n A_nf(x) - \liminf_n A_nf(x) \qquad (x\in X).$$
Then $\Omega f\ge 0$ a.e. and $\lim_nA_nf(x)$ exists iff $\Omega f(x)=0$.

Thus: $\mathscr{G}=\{f\in L^1(m) \colon\Omega f = 0 \,\,\text{a.e.}\}.$

• Suppose $f$ is a limit point of $\mathscr{G}$. We wish to show that $f\in\mathscr{G}$.
• To this end, fix $\epsilon>0$ and choose $g\in\mathscr{G}$ such that $\|f-g\|_1\le \epsilon$. Since $\lim_nA_ng$ exists a.e., $\Omega(f-g)=\Omega(f)$ a.e. Moreover $\Omega h\le 2A^*|h|$ for each $h\in L^1(m)$. Thus for $\lambda>0$ the "Maximal Ergodic Consequence" implies:
$$m(\Omega f>\lambda) = m(\Omega(f-g)>\lambda) \le m(A^*|f-g|>\lambda/2) \le \frac{2\|f-g\|_1}{\lambda} \le \frac{2\epsilon}{\lambda}.$$
• Consequently $m(\Omega f>\lambda)=0$ (because $\epsilon>0$ was arbitrary). But $\lambda>0$ is also arbitrary, and the sets $(\Omega f>\lambda)$ increase as $\lambda\searrow 0$, their union for $\lambda>0$ being $(\Omega f>0)$.
• Thus by upward-continuity of measures, $m(\Omega f>0)=0$. $\hskip1in\Box$
• To prove Birkhoff's Ergodic Theorem it remains to show that the set $\mathscr{G}$, on which the theorem holds (by definition), and which we've just seen is closed in $L^1(m)$, is dense therein. We'll see this in the course of proving another famous ergodic theorem, to which we now turn.

### 5. The Mean Ergodic Theorem

The setting now shifts to a Hilbert space $\mathscr{H}$ on which acts a contraction $U$, i.e., a linear transformation for which $\|Uf\|\le \|f\|$ for each $f\in\mathscr{H}$.

The Mean Ergodic Theorem. Suppose $U$ is a contraction of a Hilbert space $\mathscr{H}$. Denote by $\mathscr{K}$ the null space of $I-U$, and by $P$ the orthogonal projection of $\mathscr{H}$ onto $\mathscr{K}$. Then for each $f\in \mathscr{H}$:

$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}U^kf = Pf, \qquad (\heartsuit)$$
the convergence taking place in the norm topology of $\mathscr{H}$.

For isometries $U$ this result was published in 1932 by von Neumann [vN].

Comparison with Birkhoff's Theorem. In the setting of Birkhoff's Ergodic Theorem: $(X,\mathscr{F},m)$ is a measure space with $m(X)<\infty$, and a measure-preserving transformation $T\colon X\to X$ is used to induce an isometry $U$ on $L^1(m)$ by setting $Uf=f\circ T$ for each $f\in L^1(m)$. Now $L^2(m)$ is contained in $L^1(m)$ since $m(X)<\infty$, and the measure-preserving-ness of $T$, which guaranteed that $U$ is an isometry of $L^1(m)$, also guarantees that it's an isometry of $L^2(m)$. Since $L^2(m)$ is a Hilbert space, the Mean Ergodic Theorem shows that ($\heartsuit$) holds in the $L^2(m)$-norm for every $f\in L^2(m)$. Here $U^kf=f\circ T^k$, so the averages in ($\heartsuit$) are exactly the time averages $A_nf$.
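A finite-dimensional sketch may help make the Mean Ergodic Theorem concrete. The example is an assumption chosen for illustration: $\mathscr{H}=\mathbb{R}^3$ and $U$ the isometry that rotates the $(x,y)$-plane by one radian and fixes the $z$-axis, so that $\ker(I-U)$ is the $z$-axis. The function names are hypothetical.

```python
import math

THETA = 1.0  # rotation angle in radians (an arbitrary choice for this sketch)

def apply_U(v):
    """Rotate the (x, y) coordinates by THETA and fix the z coordinate."""
    x, y, z = v
    c, s = math.cos(THETA), math.sin(THETA)
    return (c * x - s * y, s * x + c * y, z)

def project_onto_fixed_space(v):
    """Orthogonal projection P onto ker(I - U), which here is the z-axis."""
    return (0.0, 0.0, v[2])

def ergodic_average(v, n):
    """Return A_n v = (1/n) * sum_{k=0}^{n-1} U^k v."""
    total = [0.0, 0.0, 0.0]
    current = v
    for _ in range(n):
        total = [t + c for t, c in zip(total, current)]
        current = apply_U(current)
    return tuple(t / n for t in total)

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

f = (1.0, 2.0, 3.0)
for n in (1, 10, 100, 1000):
    print(n, distance(ergodic_average(f, n), project_onto_fixed_space(f)))
```

The printed distances $\|A_nf-Pf\|$ should shrink roughly like $1/n$, matching the estimate that appears in the proof in Section 7.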

### 6. Some Hilbert-space preliminaries

Notation. We'll denote the null space of a linear transformation $L$ by "$\ker L$".

A Contraction Theorem. If $U$ is a (linear) contraction on a Hilbert space $\mathscr{H}$, then $\ker\,(I-U)=\ker\,(I-U^*)$, i.e.,

$$Uf=f \iff U^*f=f \qquad (f\in\mathscr{H}).$$

Proof. Since the norm of a (bounded) Hilbert-space operator equals the norm of its adjoint, $U^*$ is also a contraction.

Suppose $f\in\ker(I-U)$, i.e., that $Uf=f$. Then (assuming complex scalars for our Hilbert space)

$$\begin{aligned}
\|(I-U^*)f\|^2 &= \|f-U^*f\|^2 \\
&= \|f\|^2 - 2\,\text{re}\,\langle f,U^*f\rangle + \|U^*f\|^2 \\
&\le 2\|f\|^2 - 2\,\text{re}\,\langle Uf,f\rangle \\
&= 2\|f\|^2 - 2\,\text{re}\,\langle f,f\rangle = 0,
\end{aligned}$$
where in the third line we've used the fact that $\|U^*f\|\le \|f\|$ (since $U^*$ is also a contraction) together with $\langle f,U^*f\rangle=\langle Uf,f\rangle$, and in the fourth one the assumption that $Uf=f$. Thus $f\in \ker(I-U^*)$.

The argument so far shows that $\ker(I-U)\subset \ker(I-U^*)$. The reverse inclusion follows upon substituting $U^*$ for $U$ and using the fact that $U^{**}=U. \hskip 1in \Box$

A Contraction Corollary. If $U$ is a contraction on $\mathscr{H}$ then

$$\mathscr{H}=\ker(I-U)\oplus\overline{\text{ran}\,(I-U)} \qquad (\clubsuit)$$
where the overline denotes "norm-closure in $\mathscr{H}$."

Proof. This follows from the general fact that if $L$ is a bounded operator on $\mathscr{H}$ then $\ker L^* = (\text{ran}\,L)^\perp$. In our case, $L=I-U$, so by the above Theorem, $\ker L = \ker L^* = (\text{ran}\, L)^\perp$, from which follows ($\clubsuit$).
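One way to fill in the final step, using only the standard Hilbert-space fact that $\mathscr{H}=M^\perp\oplus\overline{M}$ for every linear subspace $M$ (the symbol $M$ is introduced here just for this remark): take $M=\text{ran}\,(I-U)$. Then

$$\mathscr{H} = \big(\text{ran}\,(I-U)\big)^\perp \oplus \overline{\text{ran}\,(I-U)} = \ker(I-U^*)\oplus\overline{\text{ran}\,(I-U)} = \ker(I-U)\oplus\overline{\text{ran}\,(I-U)},$$

where the second equality is the general fact applied to $L=I-U$ and the third is the Contraction Theorem.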

### 7. Proof of the Mean Ergodic Theorem.

We're given a contraction $U$ of Hilbert space $\mathscr{H}$. For the operator $I-U$, let $\mathscr{K}$ denote its null space and $\mathscr{R}$ its range, i.e., $\mathscr{K}=\{f\in\mathscr{H}\colon Uf=f\}$ and $\mathscr{R} = (I-U)\mathscr{H}$.

• The "Contraction Corollary" of the previous section allows us to split $\mathscr{H}$ into the orthogonal direct sum $\mathscr{K}\oplus\overline{\mathscr{R}}$.
• To show: The sequence of averages $A_n=\frac{1}{n}\sum_{k=0}^{n-1}U^k$ of the iterates of $U$ converges pointwise on $\mathscr{H}$ to the orthogonal projection $P$ taking $\mathscr{H}$ onto $\mathscr{K}$.
• For $f\in \mathscr{K}$ we have $Uf=f$, so $U^kf=f$ for each non-negative integer $k$, hence $A_nf=f=Pf$ for each $n$. Thus we've got the desired result for the restriction of $U$ to $\mathscr{K}$.
• For $f\in \mathscr{R}$ we have $f=(I-U)g$ for some $g\in\mathscr{H}$, whereupon
$$A_nf = \frac{1}{n}\sum_{k=0}^{n-1}U^k(g-Ug) = \frac{1}{n}(g-U^ng),$$
and since $U^n$ is also a contraction we have $\|U^ng\|\le \|g\|$, so
$$\|A_nf\| \le \frac{2\|g\|}{n} \to 0 \quad\text{as}\quad n\to\infty.$$
Thus $A_nf\to Pf$ if $f$ belongs to either $\mathscr{K}$ or $\mathscr{R}$ (note that $Pf=0$ for $f\in\mathscr{R}$, since $\mathscr{R}\perp\mathscr{K}$).
• Let $\mathscr{D}: =\mathscr{K}+\mathscr{R}$, a dense subspace of $\mathscr{H}$. We've just seen (using linearity) that $A_nf\to Pf$ for each $f$ in $\mathscr{D}$. Since $\mathscr{D}$ is dense in $\mathscr{H}$, and the operators $A_n$ all have norm $\le 1$, a standard "$\epsilon/2$-argument" shows that $A_nf\to Pf$ for every $f\in\mathscr{H}.\hskip .5in\Box$
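The "$\epsilon/2$-argument" can be spelled out as follows. This is one standard way to write it; the auxiliary vector $h$ is introduced here only for illustration. Given $f\in\mathscr{H}$ and $\epsilon>0$, choose $h\in\mathscr{D}$ with $\|f-h\|\le\epsilon/4$. Since $\|A_n\|\le 1$ and $\|P\|\le 1$:

$$\|A_nf-Pf\| \le \|A_n(f-h)\| + \|A_nh-Ph\| + \|P(h-f)\| \le 2\|f-h\| + \|A_nh-Ph\| \le \frac{\epsilon}{2} + \|A_nh-Ph\|.$$

Because $h\in\mathscr{D}$, the last term is at most $\epsilon/2$ for all sufficiently large $n$, so $\|A_nf-Pf\|\le\epsilon$ eventually.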

### 8. Proof of Birkhoff's Ergodic Theorem

We're back to the setting of a measure space $(X,\mathscr{F},m)$ with $m(X)<\infty$, and measure-preserving transformation $T$ on $X$, with its induced isometry $U$ on $L^1(m)$ defined by $Uf=f\circ T$.

• To show: For each $f\in L^1(m)$, $\exists\,f^*\in L^1(m)$ such that $A_nf(x)\to f^*(x)$ for a.e. $x\in X$.

• A "Proto"-Birkhoff Theorem. Since $m(X)<\infty$ we have $L^2(m)$ contained (densely) in $L^1(m)$. Our proof of von Neumann's Ergodic Theorem, applied now to $\mathscr{H}=L^2(m)$, involved proving the result first for the subspace $\mathscr{D}$ formed by taking the orthogonal direct sum of the closed subspace $\mathscr{K}=\ker(I-U)$ and the not-necessarily-closed subspace $\mathscr{R}=\text{ran}\,(I-U)$.

• For our current purposes we can interpret the fact $A_nf=f$ for each index $n$ and each $f\in\mathscr{K}$ as implying that, for each such $f$, the averages $A_nf$ converge pointwise to $f$.

• On the other hand, if $f$ is in $\mathscr{R}$, so has the form $f=g-Ug$ for some $g\in L^2(m)$, we've seen that for each index $n$:

$$A_nf = \frac{g}{n} - \frac{U^ng}{n}. \qquad (**)$$
The first term on the right-hand side of (**) converges to 0 pointwise on $X$. As for the second one, note that
$$\int_X\sum_{n=1}^\infty\frac{|U^ng|^2}{n^2}\,dm = \sum_{n=1}^\infty\frac{\|U^ng\|_2^2}{n^2} = \sum_{n=1}^\infty\frac{\|g\|_2^2}{n^2} < \infty.$$
Thus the integrand on the left-hand side above is a series that converges a.e. on $X$, so its sequence of terms $\to 0$ a.e. on $X$. That is: on the right-hand side of (**), $n^{-1}(U^ng)\to 0$ a.e., hence on the left-hand side, $A_nf\to 0$ a.e. for every $f\in \mathscr{R}$. By linearity, $A_nf$ converges a.e. for every $f\in\mathscr{D}=\mathscr{K}+\mathscr{R}$.

• So far: For each $f\in\mathscr{D}$ the sequence of averages $(A_nf)$ converges pointwise a.e. on $X$. Now $\mathscr{D}$ is dense in $L^2(m)$, and $L^2(m)$ is dense in $L^1(m)$. Since convergence in $L^2(m)$ implies convergence in $L^1(m)$ (thanks again to the fact that $m(X)<\infty$), we see that $\mathscr{D}$ is dense in $L^1(m)$, hence the conclusion of the Birkhoff Ergodic Theorem holds for every $f$ in a dense subset of $L^1(m)$.

• We now return to the work of Section 4, where we used the Maximal Ergodic Theorem to establish closed-ness for the subset $\mathscr{G}$ of $f\in L^1(m)$ for which the averages $A_nf$ converge a.e. We now know that $\mathscr{G}$ is dense in $L^1(m)$, hence it's all of $L^1(m). \hskip 1in \Box$

This completes the proof (modulo proving the Maximal Ergodic Theorem) of the Birkhoff Ergodic Theorem. For a readily available proof of the Maximal Ergodic Theorem, see Peter Oberly's lecture notes [Ob], Theorem 5, page 6.

### 9. The Lebesgue Differentiation Theorem.

• The setting. For this one we work in $L^1(\mathbb{R}^d)$. For $x\in \mathbb{R}^d$ and $r>0$ let $B_r(x)$ denote the open ball in $\mathbb{R}^d$ of radius $r$ centered at $x$. For $f\in L^1(\mathbb{R}^d)$, $x\in \mathbb{R}^d$, and $r>0$ let

$$A_rf(x) = \frac{1}{m(B_r(x))}\int_{B_r(x)} f\,dm,$$
where $m$ denotes Lebesgue measure on (the Lebesgue-measurable subsets of) $\mathbb{R}^d$.

• The Theorem. Suppose $f\in L^1(\mathbb{R}^d)$. Then $\lim_{r\to 0+}A_rf(x) = f(x)$ for a.e. $x\in \mathbb{R}^d$.

• The Proof. We know the result for a dense subset of $L^1(\mathbb{R}^d)$, namely the continuous functions with compact support (if $d=1$ this is essentially the Fundamental Theorem of Integral Calculus). The heavy lifting is now supplied by the:

• Hardy-Littlewood Maximal Theorem. For $f\in L^1(\mathbb{R}^d)$ let

$$A^*f(x) = \sup_{r>0}A_r|f|(x) \qquad (x\in\mathbb{R}^d).$$
Then there is a positive constant $C_d$ such that for each $\lambda>0$ and $f\in L^1(\mathbb{R}^d)$:
$$m(A^*f\ge\lambda) \le C_d\,\frac{\|f\|_1}{\lambda}.$$
For a proof see, e.g., [Sh], $\S$4, pp. 5-6.

This Maximal Theorem, along with our argument of Section 4 above, shows that the set of $f\in L^1(\mathbb{R}^d)$ for which the averages in the Lebesgue Differentiation Theorem converge a.e. is closed in $L^1(\mathbb{R}^d)$. But we already know the result holds for a dense subset, so it must hold for every $f\in L^1(\mathbb{R}^d)$.

To show that these averages converge a.e. to $f(x)$ takes just a little more work. For the details, see, e.g., [Sh], $\S$3, pp. 4-5. $\hskip 1in \Box$
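A one-dimensional sketch shows both the convergence and why the conclusion is only "almost everywhere". The example is an assumption made for illustration: $d=1$ and $f$ the indicator of $[0,1]$, for which $A_rf(x)$ has a closed form. The function name `ball_average_of_indicator` is hypothetical.

```python
def ball_average_of_indicator(x, r, a=0.0, b=1.0):
    """A_r f(x) for f = indicator of [a, b] on the real line (d = 1).

    The ball B_r(x) is the interval (x - r, x + r), so the average is the
    length of its overlap with [a, b] divided by 2r.
    """
    overlap = max(0.0, min(x + r, b) - max(x - r, a))
    return overlap / (2.0 * r)

for r in (1.0, 0.1, 0.01, 0.001):
    print(r,
          ball_average_of_indicator(0.5, r),   # interior point, f = 1
          ball_average_of_indicator(2.0, r),   # exterior point, f = 0
          ball_average_of_indicator(0.0, r))   # boundary point
```

At the interior and exterior points the averages reach $f(x)$ once $r$ is small, while at the boundary point $x=0$ they stay at $1/2\neq f(0)$. The two boundary points form a set of measure zero, consistent with the theorem.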

### 10. Banach's Principle

The method behind the work just done generalizes considerably. Suppose that $B$ is a Banach space and $(X,\mathscr{F},m)$ a measure space with $m(X)<\infty$. Let $L^0(m)$ denote the space of ($m$-equivalence classes of) $\mathscr{F}$-measurable, real-valued functions that take finite values a.e.

• Continuity in measure. To say a linear transformation $L\colon B\to L^0(m)$ is "continuous in measure" means that if $(f_n)$ is a sequence in $B$ that converges in the norm of $B$ to a vector $f\in B$, then $L\,f_n\to L\,f$ in measure, i.e., that for every $\lambda>0$:

$$\lim_{n\to\infty} m(|Lf_n-Lf|>\lambda) = 0.$$

• The maximal function. Suppose $B$ is a normed linear space and $(U_n)$ a sequence of linear transformations $B\to L^0(m)$, each of which is continuous in measure. Define the maximal function $U^*$ of this sequence by

$$(U^*v)(x) = \sup_n|U_nv(x)| \qquad (v\in B,\ x\in X).$$

• In particular, if for each $v\in B$ the sequence $(U_nv)$ converges a.e. to an element of $L^0(m)$ (i.e., if the limit is finite a.e.), then $U^*v$ is finite a.e. for each $v\in B$, so that $U^*$ maps $B$ into $L^0(m)$. A surprising theorem of Banach asserts that more is true.

• Banach's Principle ([Ban], 1926; [Gar], pp. 1-2). Suppose $B$ is a Banach space and $(U_n)$ a sequence of linear transformations $B\to L^0(m)$, each of which is continuous in measure. If $(U_nv)$ converges a.e. for each $v\in B$, then

$$\sup\{m(\{U^*v>\lambda\})\colon \|v\|\le 1\}\to 0 \quad\text{as}\quad \lambda\to\infty. \qquad (\#)$$

In particular: $U^*$ is continuous in measure.

• A Banach "Converse Principle". We've seen that maximal inequalities of the form (#) can give rise to a.e.-convergence theorems; e.g., the Maximal Ergodic Theorem, in the form (*) of $\S$3, and the Hardy-Littlewood Maximal Theorem of $\S$9. In fact, this is always true; the argument of $\S$4 shows:

• Theorem. If $B$ is a normed linear space and $(U_n)$ a sequence of continuous linear transformations $B\to L^0(m)$ for which (#) holds, then the set of $v\in B$ for which $(U_nv)$ converges a.e. is closed in $B$.

### References

[Ban] Stefan Banach, Sur la convergence presque partout de fonctionelles linéaires, Bull. Sci. Math., (2) 50 (1926) 27-32 & 36-43.

[Bir] George D. Birkhoff, Proof of the Ergodic theorem, Proc. Nat. Acad. Sci. 17 (1931) 656-660.

[Gar] Adriano Garsia, Topics in Almost Everywhere Convergence, Lectures in Advanced Mathematics #4, Markham Publishing Co., Chicago, 1970.

[Ob] Peter Oberly, The Pointwise Ergodic Theorem and its applications, Lecture Notes, Portland State University Analysis Seminar, November 2018.

[Sh] Joel H. Shapiro, Almost-everywhere convergence ... done right!  Lecture Notes, Portland State University Analysis Seminar, October 2017.

[vN] John von Neumann, Proof of the Quasi-Ergodic Hypothesis, Proc. Nat. Acad. Sci. 18 (1932) 70-82.