Prove the equation for the average reward of a policy $\pi$:
$$\lim_{n\to\infty}\frac{1}{n}\,\mathbb{E}\left[\sum_{t=0}^{n-1}R_{t+1}\right]=\sum_{s\in\mathcal{S}}d_\pi(s)\,r_\pi(s)=\bar{r}_\pi$$
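Before the formal proof, the identity can be checked numerically on a small ergodic chain: compute $\sum_s d_\pi(s)\,r_\pi(s)$ from the stationary distribution of the chain induced by $\pi$, and compare it with the long-run time average of rewards along one trajectory. This is a minimal sketch; the two-state transition matrix `P_pi` and rewards `r_pi` below are hypothetical values chosen for illustration, not taken from the text.

```python
import numpy as np

# Hypothetical two-state chain induced by a fixed policy pi.
P_pi = np.array([[0.9, 0.1],     # row s: P(S_{t+1} = s' | S_t = s) under pi
                 [0.2, 0.8]])
r_pi = np.array([1.0, -0.5])     # r_pi(s): expected one-step reward in state s

# Stationary distribution d_pi, obtained here by power iteration (d = d P).
d_pi = np.array([0.5, 0.5])
for _ in range(1000):
    d_pi = d_pi @ P_pi
r_bar = d_pi @ r_pi              # sum_s d_pi(s) r_pi(s)

# Empirical long-run average of expected rewards along one sampled trajectory.
rng = np.random.default_rng(0)
s, total, n = 0, 0.0, 200_000
for _ in range(n):
    total += r_pi[s]                     # E[R_{t+1} | S_t = s]
    s = rng.choice(2, p=P_pi[s])         # sample the next state
print(r_bar, total / n)                  # the two numbers should nearly agree
```

For an ergodic chain the printed values coincide up to sampling noise, which is exactly what the proof below establishes.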
Proof
Step 1: The equation is valid for any starting state $s_0$
First, we prove that the equation (introduced in the PG metric: average reward section) is valid for any starting state $s_0$:
$$\bar{r}_\pi=\lim_{n\to\infty}\frac{1}{n}\,\mathbb{E}\left[\sum_{t=0}^{n-1}R_{t+1}\,\Big|\,S_0=s_0\right]$$
Proof:
$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\,\mathbb{E}\left[\sum_{t=0}^{n-1}R_{t+1}\,\Big|\,S_0=s_0\right]
&=\lim_{n\to\infty}\frac{1}{n}\sum_{t=0}^{n-1}\mathbb{E}\left[R_{t+1}\mid S_0=s_0\right] &&\text{(Linearity of expectation)}\\
&=\lim_{t\to\infty}\mathbb{E}\left[R_{t+1}\mid S_0=s_0\right] &&\text{(Cesàro mean)}\\
&=\lim_{t\to\infty}\sum_{s\in\mathcal{S}}\mathbb{E}\left[R_{t+1}\mid S_t=s,S_0=s_0\right]p^{(t)}(s\mid s_0) &&\text{(Law of total expectation)}\\
&=\lim_{t\to\infty}\sum_{s\in\mathcal{S}}\mathbb{E}\left[R_{t+1}\mid S_t=s\right]p^{(t)}(s\mid s_0) &&\text{(Markov memoryless property)}\\
&=\lim_{t\to\infty}\sum_{s\in\mathcal{S}}r_\pi(s)\,p^{(t)}(s\mid s_0) &&\text{(Definition of }r_\pi(s)\text{)}\\
&=\sum_{s\in\mathcal{S}}r_\pi(s)\,d_\pi(s) &&\text{(Stationary distribution)}\\
&=\bar{r}_\pi
\end{aligned}$$
where $p^{(t)}(s\mid s_0)$ denotes the probability of transitioning from $s_0$ to $s$ in exactly $t$ steps, and note that
$$\lim_{t\to\infty}p^{(t)}(s\mid s_0)=d_\pi(s)$$
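This limit is the standard convergence of an ergodic Markov chain to its stationary distribution: $p^{(t)}(s\mid s_0)$ is the $(s_0,s)$ entry of the $t$-step transition matrix $P_\pi^t$, and every row of $P_\pi^t$ tends to $d_\pi$ regardless of $s_0$. A quick sketch, reusing the hypothetical two-state chain from the earlier snippet (the matrix is illustrative, and ergodicity is assumed):

```python
import numpy as np

# Hypothetical two-state chain induced by pi (same matrix as above).
P_pi = np.array([[0.9, 0.1],
                 [0.2, 0.8]])

# p^(t)(s | s_0) is entry (s_0, s) of P_pi^t; for an ergodic chain both rows
# converge to the same stationary distribution d_pi, independent of s_0.
for t in (1, 5, 20, 100):
    Pt = np.linalg.matrix_power(P_pi, t)
    print(t, Pt[0], Pt[1])
```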
Step 2: The equation is valid for any state distribution $d$
Next, consider an arbitrary state distribution $d$. By the law of total expectation, we have