The Birkhoff & Maximal Ergodic Theorems

Joel Shapiro

(Revised 11/16/2018)

1. Review

$(X,\mathscr{F},m)$ is a finite measure space
i.e., $X$ is a set, $\mathscr{F}$ is a sigma-algebra of subsets, and $m$ is a finite measure on $X$ .
Notation: for $f\in L^1(m)$ : $\int_X f\,dm = \int f\,dm = \int f$ .
$T\colon X\to X$ is a measure-preserving transformation (mpt),
i.e., for each $E\in\mathscr{F}$ : $T^{-1}(E)\in \mathscr{F}$ and $m(T^{-1}(E)) = m(E)$ .
The ( $T$ -) orbit of a point $x\in X$ is the sequence $(T^nx\colon n=0, 1, 2, \, \ldots)$ .
The Poincaré Recurrence Theorem says that for $T$ an mpt and $E\in \mathscr{F}$ with $m(E)>0$ :

$T^nx\in E$ infinitely often for almost every $x\in E$ .

In other words, "The $T$ -orbit of a.e. $x\in E$ revisits $E$ infinitely often."
Question: How often does the orbit of $x\in E$ revisit $E$ ?

For a more quantitative version of our question, consider, for $f\in L^1(m)$ and $n$ a positive integer, the time average $A_nf$ , defined by:

$A_nf(x) := \frac{1}{n}\sum_{k=0}^{n-1}f(T^kx) \qquad (x\in X).$

If, e.g., $f=\chi_E$ , the characteristic function of $E\in\mathscr{F}$ (value $\,=1$ at $x$ if $x\in E$ , and $\,=0$ otherwise), then $A_n\chi_E(x)$ is the fraction of the times $k=0,1,\ldots,n-1$ at which the orbit of $x$ lies in $E$ , i.e., the average number of visits per unit time that the orbit makes to $E$ up to time $n$ .
The Birkhoff Ergodic Theorem (next section) concerns the long-term behavior of these averages.
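
The time averages $A_n\chi_E$ can be explored numerically. The following is a minimal sketch under assumptions that are not part of these notes: $X=[0,1)$ with Lebesgue measure, $T$ the rotation $x\mapsto x+\alpha \pmod 1$ with $\alpha$ the golden-ratio conjugate (a standard example of an mpt), and $E=[0,1/4)$. The names `time_average`, `rotate` and `chi_E` are illustrative only.

```python
import math

def time_average(f, T, x, n):
    """Return A_n f(x) = (1/n) * sum_{k=0}^{n-1} f(T^k x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

# Assumed example system: rotation of [0, 1) by the golden-ratio conjugate.
ALPHA = (math.sqrt(5) - 1) / 2

def rotate(x):
    """T(x) = x + ALPHA (mod 1); preserves Lebesgue measure on [0, 1)."""
    return (x + ALPHA) % 1.0

def chi_E(x):
    """Characteristic function of E = [0, 1/4)."""
    return 1.0 if x < 0.25 else 0.0

# Time averages along the orbit of x = 0.1 for increasing n.
averages = {n: time_average(chi_E, rotate, 0.1, n) for n in (10, 100, 1000, 10000)}
for n, a in averages.items():
    print(n, a)
```

For this particular rotation the printed averages should settle near $m(E)=1/4$ , which is the kind of long-term behavior the next section makes precise.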

2. The Birkhoff (pointwise) Ergodic Theorem

Theorem ([Bir], 1931). For each $f\in L^1(m)$ there exists an $\mathscr{F}$ -measurable function $f^*$ , finite-valued a.e. on $X$ , such that for a.e. $x\in X$ :

$\lim_{n\to \infty}A_nf(x) = f^*(x).$
Special properties of ${\bf f^*}$ .
- $f^*\in L^1(m)$ (by Fatou's Lemma)
- $f^*\circ T = f^*$ a.e. (i.e., $f^*$ is " $T$ -invariant"), and
- $\int f^*\,dm =\int f\,dm$ .
For the special case $\,f=\chi_E$ , where $E\in\mathscr{F}$ and $m(X)=1$ , Birkhoff's Theorem says that:

For a.e. $x\in X$ the "long-term average" number of visits, $\chi_E^*(x)$ , that the orbit of $x$ makes to $E$ exists, and its "space average" $\int \chi_E^*\,dm$ is just $m(E)$ (i.e., the "probability that a random point of $X$ lies in $E$ ").
Proof of "special properties of ${\bf f^*}$ ".
- (a) That $f^*\in L^1(m)$ follows easily from Birkhoff's Theorem and Fatou's Lemma.
- (b) We have
  $A_nf(Tx) = \frac{1}{n}\sum_{k=1}^{n}f(T^kx) = \frac{n+1}{n}A_{n+1}f(x)-\frac{f(x)}{n}$ for each $x\in X$ . Thus for a.e. $x\in X$ :
  $f^*(Tx) = \lim_n A_nf(Tx) = \lim_n \frac{n+1}{n}A_{n+1}f(x)-\lim_n\frac{f(x)}{n}=f^*(x)-0 = f^*(x).\qquad\Box$
- (c) We'll see, in the course of proving Birkhoff's Theorem, that $A_nf\to f^*$ in $L^2(m)$ for every $f\in L^2(m)$ . Since $m(X)<\infty$ we know that $L^2(m)$ is a dense subspace of $L^1(m)$ , and that its norm is stronger. Thus $A_nf\to f^*$ in the norm of $L^1(m)$ for each $f$ in the dense subspace $L^2(m)$ . It's easy to see that each $A_n$ , now viewed as a linear operator on $L^1(m)$ , has norm $\le 1$ . This allows the $L^1$ -convergence of $A_nf$ to $f^*$ to be extended from the dense subspace $L^2(m)$ to all of $L^1(m)$ . A consequence of this and the fact that $\int A_nf = \int f$ for each index $n$ is:
  
  $\int f^*\,dm = \lim_n\int A_n f\,dm = \int f\,dm \qquad\qquad\Box$
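
The special properties can be checked by hand on a finite example. The sketch below assumes a toy system that does not appear in these notes: $X=\{0,\ldots,5\}$ with counting measure and $T$ a permutation with cycles $(0\,1\,2)$ , $(3\,4)$ and $(5)$ . Exact rational arithmetic is used, and the names `T`, `f`, `A` and `f_star` are illustrative.

```python
from fractions import Fraction

# Hypothetical toy system: X = {0,...,5} with counting measure, and T a
# permutation with cycles (0 1 2), (3 4), (5). Permutations preserve counting measure.
T = [1, 2, 0, 4, 3, 5]
f = [Fraction(v) for v in (6, 0, 3, 1, 4, 7)]

def A(n, h, x):
    """Time average A_n h(x) = (1/n) * sum_{k=0}^{n-1} h(T^k x)."""
    total = Fraction(0)
    for _ in range(n):
        total += h[x]
        x = T[x]
    return total / n

# Every cycle length divides 6, so A_n f equals the cycle average of f
# whenever n is a multiple of 6; that common value is the limit f*.
f_star = [A(6, f, x) for x in range(6)]

# (b) T-invariance of f*.
invariant = all(f_star[T[x]] == f_star[x] for x in range(6))

# (c) Equality of the integrals (sums, for counting measure) of f* and f.
same_integral = sum(f_star) == sum(f)

# Identity used in the proof of (b): A_n f(Tx) = ((n+1)/n) A_{n+1} f(x) - f(x)/n.
identity_holds = all(
    A(n, f, T[x]) == Fraction(n + 1, n) * A(n + 1, f, x) - f[x] / n
    for n in range(1, 8)
    for x in range(6)
)
print(f_star, invariant, same_integral, identity_holds)
```

Here $f^*$ is constant on each cycle but takes different values on different cycles, so $f^*$ need not be constant when $T$ fails to be ergodic.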

3. Maximal Averages

Instead of trying to prove directly that the limit of the averages $A_nf(x)$ exists, we focus on the supremum

$A^*f(x) = \sup_n A_n f(x) \qquad (f \in L^1(m), \, x\in X)$ of these averages, which always exists (possibly $=\infty$ ). We call $A^*f$ the maximal function of $f$ . The key to proving Birkhoff's Theorem lies with

The Maximal Ergodic Theorem. $\int_{\{A^*f\ge 0\}} f\, dm \ge 0$ for each $f\in L^1(m)$ .

We can think of this theorem as saying that $f\in L^1(m)$ can't be too often negative on the set where some time average of the values of $f\circ T^k$ is non-negative. Here's an alternative formulation:
The Maximal Ergodic Consequence. For each $f\in L^1(m)$ and $\lambda >0$ :

$m(A^*|f|>\lambda)\le \frac{\|f\|_1}{\lambda}\,. \tag{*}$

Proof. For $f\in L^1(m)$ and $\lambda>0$ , replace $f$ by $|f|-\lambda$ in the conclusion of the Maximal Ergodic Theorem, and note that $A^*(|f|-\lambda) = (A^*|f|)-\lambda$ . There results:

$\int_{\{A^*|f|\ge \lambda\}} (|f|-\lambda)\,dm \ge 0,\quad\text{i.e.,}\quad\int_{\{A^*|f|\ge \lambda\}}|f|\,dm \ge \lambda\, m(\{A^*|f|\ge \lambda\}).$ Thus:
$m(\{A^*|f|\ge \lambda\})\le \frac{1}{\lambda}\int_{\{A^*|f|\ge \lambda\}}|f|\,dm \le \frac{1}{\lambda}\int|f|\,dm = \frac{\|f\|_1}{\lambda}. \qquad\Box$
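
The inequality (*) can be sanity-checked on a finite example. The sketch below assumes a toy system not taken from these notes: a single 8-cycle on $X=\{0,\ldots,7\}$ with counting measure. On an $N$ -cycle every average $A_nh$ with $n>N$ is a weighted mean of $A_Nh$ and some $A_rh$ with $r<N$ , so the supremum defining $A^*h$ is attained for some $n\le N$ ; the code relies on this. All names are illustrative.

```python
from fractions import Fraction

# Hypothetical toy system: one 8-cycle on X = {0,...,7} with counting measure m.
N = 8
T = [(x + 1) % N for x in range(N)]
f = [Fraction(v) for v in (9, -1, 0, 2, -4, 0, 1, 0)]
abs_f = [abs(v) for v in f]

def maximal_function(h):
    """A*h(x) = sup_n A_n h(x). On an N-cycle the sup is attained for some n <= N."""
    out = []
    for x in range(N):
        best, total, y = None, Fraction(0), x
        for n in range(1, N + 1):
            total += h[y]
            y = T[y]
            avg = total / n
            best = avg if best is None or avg > best else best
        out.append(best)
    return out

A_star = maximal_function(abs_f)
norm_1 = sum(abs_f)  # ||f||_1 with respect to counting measure

def level_set_measure(lam):
    """m({A*|f| > lam}) for counting measure."""
    return sum(1 for v in A_star if v > lam)

# For each lambda: (measure of the level set, the bound ||f||_1 / lambda).
checks = {lam: (level_set_measure(lam), norm_1 / lam) for lam in (1, 2, 3, 5, 9)}
for lam, (measure, bound) in checks.items():
    print(lam, measure, bound)
```

In each printed row the level-set measure should not exceed the bound, as (*) requires.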

4. A Consequence of the "Maximal Ergodic Consequence"

Let $\mathscr{G}$ denote the set of functions $f\in L^1(m)$ for which $\lim_n A_nf$ exists a.e.

Proposition. $\mathscr{G}$ is closed in $L^1(m)$ .

Proof. We wish to show that every limit point of $\mathscr{G}$ belongs to $\mathscr{G}.$ For $f\in L^1(m)$ let

$\Omega f(x) = \limsup_n A_nf(x) - \liminf_n A_nf(x) \qquad (x\in X).$
Then $\Omega f\ge 0$ a.e., and $\lim_n A_nf(x)$ exists iff $\Omega f(x)=0$ .

Thus $\mathscr{G}=\{f\in L^1(m) \colon\Omega f = 0 \,\,\text{a.e.}\}.$
- Suppose $f$ is a limit point of $\mathscr{G}$ . We wish to show that $f\in\mathscr{G}$ .
- To this end, fix $\epsilon>0$ and choose $g\in\mathscr{G}$ such that $\|f-g\|_1\le \epsilon$ . Since $\lim_n A_ng$ exists a.e., we have $\Omega(f-g)=\Omega(f)$ a.e. Moreover $\Omega h\le 2A^*|h|$ for each $h\in L^1(m)$ , because $|A_nh|\le A_n|h|\le A^*|h|$ for every $n$ . Thus for $\lambda>0$ the "Maximal Ergodic Consequence" implies:
  $m(\Omega f>\lambda) = m(\Omega (f-g)>\lambda) \le m(A^*|f-g|>\lambda/2) \le \frac{2\|f-g\|_1}{\lambda} \le \frac{2\epsilon}{\lambda}\,.$
- Consequently $m(\Omega f>\lambda)=0$ (because $\epsilon>0$ was arbitrary). But $\lambda>0$ is also arbitrary, and the sets $(\Omega f>\lambda)$ increase as $\lambda\searrow 0$ , their union for $\lambda>0$ being $(\Omega f>0)$ .
- Thus by upward-continuity of measures, $m(\Omega f>0)=0$ . $\hskip1in\Box$
To prove Birkhoff's Ergodic Theorem it remains to show that the set $\mathscr{G}$ , for which it holds by definition, and which we've just seen is closed in $L^1(m)$ , is dense therein. We'll see this in the course of proving another famous ergodic theorem, to which we now turn.

5. The Mean Ergodic Theorem

The setting now shifts to a Hilbert space $\mathscr{H}$ on which acts a contraction $U$ , i.e., a linear transformation for which $\|Uf\|\le \|f\|$ for each $f\in\mathscr{H}$ .

The Mean Ergodic Theorem. Suppose $U$ is a contraction of a Hilbert space $\mathscr{H}$ . Denote by $\mathscr{K}$ the null space of $I-U$ , and by $P$ the orthogonal projection of $\mathscr{H}$ onto $\mathscr{K}$ . Then for each $f\in \mathscr{H}$ :

$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}U^kf = Pf,\tag{$\heartsuit$}$ the convergence taking place in the norm topology of $\mathscr{H}$ .

For isometries $U$ this result was published in 1932 by von Neumann [vN].

Comparison with Birkhoff's Theorem. In the setting of Birkhoff's Ergodic Theorem: $(X,\mathscr{F},m)$ is a measure space with $m(X)<\infty$ , and a measure-preserving transformation $T\colon X\to X$ is used to induce an isometry $U$ on $L^1(m)$ by setting $Uf=f\circ T$ for each $f\in L^1(m)$ . Now $L^2(m)$ is contained in $L^1(m)$ since $m(X)<\infty$ , and the measure-preserving-ness of $T$ , which guaranteed that $U$ is an isometry of $L^1(m)$ , also guarantees that it's an isometry of $L^2(m)$ . Since $L^2(m)$ is a Hilbert space, the Mean Ergodic Theorem shows that ( $\heartsuit$ ) holds in the $L^2(m)$ -norm for every $f\in L^2(m)$ .

6. Some Hilbert-space preliminaries

Notation. We'll denote the null space of a linear transformation $L$ by " $\,\ker L\,$ ".

A Contraction Theorem. If $U$ is a (linear) contraction on a Hilbert space $\mathscr{H}$ , then $\ker\,(I-U)=\ker\,(I-U^*)$ , i.e.,

$Uf=f \iff U^*f=f \quad (\forall\, f\in \mathscr{H})$

Proof. Since the norm of a (bounded) Hilbert-space operator equals the norm of its adjoint, $U^*$ is also a contraction.

Suppose $f\in\ker(I-U)$ , i.e., that $Uf=f$ . Then (assuming complex scalars for our Hilbert space)

$\begin{align*} \|(I-U^*)f\|^2 &= \|f-U^*f\|^2\\ &= \|f\|^2 - 2 \,\text{re}\,\langle f,U^*f\rangle + \|U^*f\|^2 \\ &\le 2\|f\|^2- 2 \,\text{re}\,\langle Uf,f\rangle \\ &= 2\|f\|^2- 2 \,\text{re}\,\langle f,f\rangle\\ &=0\,, \end{align*}$ where in the third line we've used the fact that $\|U^*f\|\le \|f\|$ (since $U^*$ is also a contraction) together with $\langle f,U^*f\rangle = \langle Uf,f\rangle$ , and in the fourth one the assumption that $Uf=f$ . Thus $f\in \ker(I-U^*)$ .

The argument so far shows that $\ker(I-U)\subset \ker(I-U^*)$ . The reverse inclusion follows upon substituting $U^*$ for $U$ and using the fact that $U^{**}=U. \hskip 1in \Box$

A Contraction Corollary. If $U$ is a contraction then

$\quad \mathscr{H}= \ker\,(I-U)\oplus\overline{\text{ran}\,(I-U)} \tag{$\clubsuit$}$ where the overline denotes "norm-closure in $\mathscr{H}$ ."

Proof. This follows from the general fact that if $L$ is a bounded operator on $\mathscr{H}$ then $\ker L^* = (\text{ran}\,L)^\perp$ . In our case, $L=I-U$ , so by the above Theorem, $\ker L = \ker L^* = (\text{ran}\, L)^\perp$ . Since $\mathscr{H} = (\text{ran}\,L)^\perp\oplus\overline{\text{ran}\,L}$ , ( $\clubsuit$ ) follows. $\hskip 1in \Box$

7. Proof of the Mean Ergodic Theorem.

We're given a contraction $U$ of a Hilbert space $\mathscr{H}$ . For the operator $I-U$ , let $\mathscr{K}$ denote its null space and $\mathscr{R}$ its range, i.e., $\mathscr{K}=\{f\in\mathscr{H}\colon Uf=f\}$ and $\mathscr{R} = (I-U)\mathscr{H}$ .

The "Contraction Corollary" of the previous section allows us to split $\mathscr{H}$ into the orthogonal direct sum $\mathscr{K}\oplus\overline{\mathscr{R}}$ .
To show: The sequence of averages $A_n := \frac{1}{n}\sum_{k=0}^{n-1}U^k$ of the iterates of $U$ converges pointwise on $\mathscr{H}$ (i.e., $A_nf\to Pf$ in norm for each $f\in\mathscr{H}$ ) to the orthogonal projection $P$ taking $\mathscr{H}$ onto $\mathscr{K}$ .
For $f\in \mathscr{K}$ we have $Uf=f$ , so $U^kf=f$ for each non-negative integer $k$ , hence $A_nf=f$ for each $n$ . Thus we've got the desired result for the restriction of $U$ to $\mathscr{K}$ .
For $f\in \mathscr{R}$ we have $\,f=(I-U)g\,$ for some $g\in\mathscr{H}$ , whereupon
$A_nf = \frac{1}{n}\sum_{k=0}^{n-1}U^k(g-Ug) = \frac{1}{n}(g-U^ng),$ the sum being telescoping. Since $U^n$ is also a contraction we have $\|U^ng\|\le \|g\|$ , so $\|A_nf\|\le \frac{2\|g\|}{n}\to 0 \quad\text{as}\quad n\to \infty.$ Since $\mathscr{R}$ is orthogonal to $\mathscr{K}$ , we have $Pf=0$ for $f\in\mathscr{R}$ . Thus $A_nf\to Pf$ if $f$ belongs to either $\mathscr{K}$ or $\mathscr{R}$ .
Let $\mathscr{D}: =\mathscr{K}+\mathscr{R}\,$ , a dense subspace of $\mathscr{H}$ . We've just seen that $A_nf\to Pf$ for each $f$ in $\mathscr{D}$ . Since $\mathscr{D}$ is dense in $\mathscr{H}$ , and the operators $A_n$ all have norm $\le 1$ , a standard " $\epsilon/2$ -argument" shows that $A_nf\to Pf$ for every $f\in\mathscr{H}.\hskip .5in\Box$
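
A finite-dimensional instance makes the two pieces of the proof visible. The sketch below assumes an example that is not in these notes: $\mathscr{H}=\mathbb{R}^3$ and $U$ the isometry that rotates the $(x,y)$ -plane through an angle $\theta$ and fixes the $z$ -axis, so that $\mathscr{K}=\ker(I-U)$ is the $z$ -axis and $\mathscr{R}$ is the $(x,y)$ -plane. The names `U`, `average` and `norm` are illustrative.

```python
import math

THETA = 1.0  # assumed rotation angle (radians); any angle not a multiple of 2*pi works

def U(v):
    """Isometry of R^3: rotate the (x, y)-plane by THETA and fix the z-axis."""
    x, y, z = v
    c, s = math.cos(THETA), math.sin(THETA)
    return (c * x - s * y, s * x + c * y, z)

def average(v, n):
    """A_n v = (1/n) * sum_{k=0}^{n-1} U^k v."""
    acc = [0.0, 0.0, 0.0]
    for _ in range(n):
        acc = [a + b for a, b in zip(acc, v)]
        v = U(v)
    return tuple(a / n for a in acc)

def norm(v):
    """Euclidean norm on R^3."""
    return math.sqrt(sum(a * a for a in v))

f = (3.0, -2.0, 5.0)
Pf = (0.0, 0.0, 5.0)  # projection of f onto ker(I - U), which here is the z-axis

# ||A_n f - Pf|| for increasing n.
errors = {n: norm(tuple(a - b for a, b in zip(average(f, n), Pf))) for n in (1, 10, 100, 1000)}
for n, err in errors.items():
    print(n, err)
```

The printed errors should decay roughly like $1/n$ , matching the bound $\|A_nf\|\le 2\|g\|/n$ on $\mathscr{R}$ , while the $z$ -component (the part of $f$ in $\mathscr{K}$ ) is left unchanged by every $A_n$ .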

8. Proof of Birkhoff's Ergodic Theorem

We're back to the setting of a measure space $(X,\mathscr{F},m)$ with $m(X)<\infty$ , and measure-preserving transformation $T$ on $X$ , with its induced isometry $U$ on $L^1(m)$ defined by $Uf=f\circ T$ .

To show: $\exists\,f^*\in L^1(m)$ such that $A_nf(x)\to f^*(x)$ for a.e. $x\in X$ .
A "Proto"-Birkhoff Theorem. Since $m(X)<\infty$ we have $L^2(m)$ contained (densely) in $L^1(m)$ . Our proof of von Neumann's (Mean) Ergodic Theorem, applied now with $\mathscr{H}=L^2(m)$ , involved proving the result first for the subspace $\mathscr{D}$ formed by taking the orthogonal direct sum of the closed subspace $\mathscr{K}=\ker(I-U)$ and the not-necessarily-closed subspace $\mathscr{R}=\text{ran}\,(I-U)$ .
- For our current purposes we can interpret the fact $A_nf=f$ for each index $n$ and each $f\in\mathscr{K}$ as implying that, for each such $\,f$ , the averages $A_nf$ converge pointwise to $f$ .
- On the other hand, if $f$ is in $\mathscr{R}$ , so has the form $f=g-Ug$ for some $g\in L^2(m)$ , we've seen that for each index $n$ :
  
  $A_nf = \frac{g}{n} - \frac{U^ng}{n}\,.\tag{**}$ The first term on the right-hand side of (**) converges to 0 pointwise a.e. on $X$ (wherever $g$ is finite). As for the second one, note that
  $\int_X\,\sum_{n=1}^\infty\frac{|U^ng|^2}{n^2}dm = \sum_{n=1}^\infty\frac{\|U^ng\|_2^2}{n^2}=\sum_{n=1}^\infty\frac{\|g\|_2^2}{n^2}<\infty,$ where the interchange of sum and integral is justified by the Monotone Convergence Theorem, and $\|U^ng\|_2=\|g\|_2$ because $U$ is an isometry of $L^2(m)$ . Thus the integrand on the left-hand side above is a series that converges a.e. on $X$ , so its sequence of terms $\to 0$ a.e. on $X$ . That is: on the right-hand side of (**): $n^{-1}(U^ng)\to 0\,$ a.e., hence on the left-hand side: $A_nf\to 0$ a.e. for every $f\in \mathscr{R}$ .
- So far: For each $f\in\mathscr{D}$ the sequence of averages $(A_nf)$ converges pointwise a.e. on $X$ . Now $\mathscr{D}$ is dense in $L^2(m)$ , and $L^2(m)$ is dense in $L^1(m)$ . Since convergence in $L^2(m)$ implies convergence in $L^1(m)$ (thanks again to the fact that $m(X)<\infty$ ), we see that $\mathscr{D}$ is dense in $L^1(m)$ , hence the conclusion of the Birkhoff Ergodic Theorem holds for every $f$ in a dense subset of $L^1(m)$ .
- Returning to the work of Section 4 on Birkhoff's Theorem, where we used the Maximal Ergodic Theorem to establish closed-ness for the subset $\mathscr{G}$ of $f\in L^1(m)$ for which the averages $A_nf$ converge a.e.: we now know that $\mathscr{G}$ is dense in $L^1(m)$ , hence it's all of $L^1(m). \hskip 1in \Box$
This completes the proof (modulo proving the Maximal Ergodic Theorem) of the Birkhoff Ergodic Theorem. For a readily available proof of the Maximal Ergodic Theorem, see Peter Oberly's lecture notes [Ob], Theorem 5, page 6.

9. The Lebesgue Differentiation Theorem.

The setting. For this one we work in $L^1(\mathbb{R}^d)$ . For $x\in \mathbb{R}^d$ and $r>0$ let $B_r(x)$ denote the open ball in $\mathbb{R}^d$ of radius $r$ centered at $x$ . For $f\in L^1(\mathbb{R}^d)$ , $x\in \mathbb{R}^d$ , and $r>0$ let

$A_rf(x) = \frac{1}{m(B_r(x))}\int_{B_r(x)}f\,dm\,,$ where $m$ denotes Lebesgue measure on (the Lebesgue-measurable subsets of) $\mathbb{R}^d$ .
The Theorem. Suppose $f\in L^1(\mathbb{R}^d)$ . Then $\lim_{r\to 0+}A_rf(x) = f(x)$ for a.e. $x\in \mathbb{R}^d$ .
The Proof. We know the result for a dense subset of $L^1(\mathbb{R}^d)$ , namely the continuous functions with compact support (if $d=1$ this is essentially the Fundamental Theorem of Integral Calculus). The heavy lifting is now supplied by the:
Hardy-Littlewood Maximal Theorem. For $f\in L^1(\mathbb{R}^d)$ let
$A^*f(x) = \sup_{r>0} A_r|f|(x) \qquad (x \in \mathbb{R}^d).$ Then there is a positive constant $C_d$ such that for each $\lambda>0$ and $f\in L^1(\mathbb{R}^d)$ : $m(A^*f\ge \lambda)\le C_d\frac{\|f\|_1}{\lambda}.$ For a proof see, e.g., [Sh], $\S$ 4, pp. 5-6.

This Maximal Theorem, along with our argument of Section 4 above, shows that the set of $f\in L^1(\mathbb{R}^d)$ for which the averages in the Lebesgue Differentiation Theorem converge a.e. is closed in $L^1(\mathbb{R}^d)$ . But we already know the result holds for a dense subset, so it must hold for every $f\in L^1(\mathbb{R}^d)$ .

To show that these averages converge a.e. to $f(x)$ takes just a little more work. For the details, see, e.g., [Sh], $\S$ 3. pp. 4-5. $\hskip 1in \Box$
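
The role of the exceptional null set can be seen in a one-dimensional example. The sketch below assumes $d=1$ and $f=\chi_{[0,1]}$ , a choice made here purely for illustration; for this $f$ the ball average $A_rf(x)$ is the length of $(x-r,x+r)\cap[0,1]$ divided by $2r$ , which the hypothetical helper `ball_average` computes in exact rational arithmetic.

```python
from fractions import Fraction

def ball_average(x, r):
    """A_r f(x) for f = indicator of [0, 1] on the real line (d = 1).

    B_r(x) = (x - r, x + r) has Lebesgue measure 2r, and the integral of f
    over it is the length of the overlap with [0, 1].
    """
    overlap = max(Fraction(0), min(x + r, Fraction(1)) - max(x - r, Fraction(0)))
    return overlap / (2 * r)

radii = [Fraction(1, 10**k) for k in range(1, 5)]  # r = 1/10, 1/100, ...

interior = [ball_average(Fraction(3, 10), r) for r in radii]  # f(3/10) = 1
exterior = [ball_average(Fraction(2), r) for r in radii]      # f(2) = 0
endpoint = [ball_average(Fraction(0), r) for r in radii]      # f(0) = 1

print(interior, exterior, endpoint)
```

At the interior and exterior points the averages agree with $f(x)$ for all small $r$ , while at the endpoint $x=0$ they stay at $1/2\ne f(0)$ . The endpoints form a set of measure zero, consistent with the "a.e." in the theorem.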

10. Banach's Principle

The method behind the work just done generalizes considerably. Suppose that $B$ is a Banach space and $(X,\mathscr{F},m)$ a measure space with $m(X)<\infty$ . Let $L^0(m)$ denote the space of ( $m$ -equivalence classes of) $\mathscr{F}$ -measurable, real-valued functions that take finite values a.e.

Continuity in measure. To say a linear transformation $L\colon B\to L^0(m)$ is "continuous in measure" means that if $(f_n)$ is a sequence in $B$ that converges in the norm of $B$ to a vector $f\in B$ , then $L\,f_n\to L\,f$ in measure, i.e., that for every $\lambda>0$ :

$\lim_{n\to\infty} m(|L\,f_n-L\,f|>\lambda)=0.$
The maximal function. Suppose $B$ is a normed linear space and $(U_n)$ a sequence of linear transformations $B\to L^0(m)$ , each of which is continuous in measure. Define the maximal function $U^*$ of this sequence $\,U^*\colon B\to L^0(m)\,$ by
$(U^*v)(x) = \sup_n|U_nv(x)|\qquad (v\in B,\,x\in X).$
In particular, if, for each $v\in B$ the sequence $(U_nv)$ converges a.e. to an element of $L^0(m)$ (i.e., if the limit is finite a.e.), then $U^*v$ is finite a.e. for each $v\in B$ . A surprising theorem of Banach asserts that more is true.
Banach's Principle ([Ban], 1926; [Gar], pp. 1-2). Suppose $B$ is a Banach space and $(U_n)$ a sequence of linear transformations $B\to L^0(m)$ , each of which is continuous in measure. If $(U_nv)$ converges a.e. for each $v\in B$ , then
$\sup\big\{m(\{U^*v>\lambda\})\colon \|v\|\le 1\big\}\searrow 0\quad\text{as}\quad \lambda\nearrow\infty.\tag{#}$

In particular: $U^*$ is continuous in measure.
A Banach "Converse Principle". We've seen that maximal inequalities of the form (#) can give rise to a.e.-convergence theorems; e.g., the Ergodic Maximal Theorem, in the form (*) of $\S$ 3, and the Hardy-Littlewood Maximal Theorem of $\S$ 9. In fact, this is always true; the argument of $\S$ 4 shows:
Theorem. If $B$ is a normed linear space and $(U_n)$ a sequence of continuous linear transformations $B\to L^0(m)$ for which (#) holds, then the set of $v\in B$ for which $(U_nv)$ converges a.e. is closed in $B$ .

References

[Ban] Stefan Banach, Sur la convergence presque partout de fonctionelles linéaires, Bull. Sci. Math., (2) 50 (1926) 27-32 & 36-43.

[Bir] George D. Birkhoff, Proof of the Ergodic theorem, Proc. Nat. Acad. Sci. 17 (1931) 656-660.

[Gar] Adriano Garsia, Topics in Almost Everywhere Convergence, Lectures in Advanced Mathematics #4, Markham Publishing Co., Chicago, 1970.

[Ob] Peter Oberly, The Pointwise Ergodic Theorem and its applications, Lecture Notes, Portland State University Analysis Seminar, November 2018.

[Sh] Joel H. Shapiro, Almost-everywhere convergence ... done right! Lecture Notes, Portland State University Analysis Seminar, October 2017.

[vN] John von Neumann, Proof of the Quasi-Ergodic Hypothesis, Proc. Nat. Acad. Sci. 18 (1932) 70-82.