Proof of the average reward equation

Question

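Prove the following equation for the average reward of a policy $\pi$:

$$\bar{r}_\pi = \sum_{s} d_\pi(s)\, r_\pi(s) = \lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}\left[\sum_{t=0}^{n-1} R_{t+1}\right],$$

where $d_\pi$ is the stationary state distribution under $\pi$ and $r_\pi(s)$ is the expected immediate reward in state $s$.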

Proof

Step 1: Equation is valid for any starting state

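First, prove that the equation (mentioned in PG metric: average reward) is valid for any starting state $s_0$:

$$\bar{r}_\pi = \lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}\left[\sum_{t=0}^{n-1} R_{t+1} \,\Big|\, S_0 = s_0\right].$$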

Proof:

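By linearity of expectation and the Cesaro mean (stated at the end of this note),

$$\lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}\left[\sum_{t=0}^{n-1} R_{t+1} \,\Big|\, S_0 = s_0\right] = \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} \mathbb{E}\left[R_{t+1} \mid S_0 = s_0\right] = \lim_{t\to\infty} \mathbb{E}\left[R_{t+1} \mid S_0 = s_0\right].$$

By the law of total expectation and the Markov property,

$$\mathbb{E}\left[R_{t+1} \mid S_0 = s_0\right] = \sum_{s} \mathbb{E}\left[R_{t+1} \mid S_t = s, S_0 = s_0\right] p^{(t)}(s \mid s_0) = \sum_{s} r_\pi(s)\, p^{(t)}(s \mid s_0),$$

where $p^{(t)}(s \mid s_0)$ denotes the probability of transitioning from $s_0$ to $s$ using exactly $t$ steps, and $r_\pi(s)$ is the expected immediate reward in state $s$. Note that, since $d_\pi$ is the stationary (limiting) distribution of the chain under $\pi$,

$$\lim_{t\to\infty} p^{(t)}(s \mid s_0) = d_\pi(s).$$

The starting state therefore does not matter, and

$$\lim_{t\to\infty} \mathbb{E}\left[R_{t+1} \mid S_0 = s_0\right] = \sum_{s} r_\pi(s)\, d_\pi(s) = \bar{r}_\pi.$$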

Step 2: Equation is valid for any state distribution

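Next, consider an arbitrary state distribution $d$ for the starting state $S_0$. By the law of total expectation, we have

$$\lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}\left[\sum_{t=0}^{n-1} R_{t+1}\right] = \lim_{n\to\infty} \frac{1}{n} \sum_{s} d(s)\, \mathbb{E}\left[\sum_{t=0}^{n-1} R_{t+1} \,\Big|\, S_0 = s\right] = \sum_{s} d(s) \lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}\left[\sum_{t=0}^{n-1} R_{t+1} \,\Big|\, S_0 = s\right] = \sum_{s} d(s)\, \bar{r}_\pi = \bar{r}_\pi,$$

where the second-to-last equality uses Step 1: the limit equals $\bar{r}_\pi$ for every starting state.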

The proof is complete.
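
As a sanity check on the two steps, the result can be verified numerically on a small example. The sketch below is only an illustration, not part of the proof: the two-state transition matrix `P` and reward vector `r` are made-up numbers standing in for the chain induced by some fixed policy, and the helper names `step` and `finite_horizon_average` are hypothetical. It computes the finite-horizon average of expected rewards for two deterministic starting states and one mixed starting distribution, and compares each with the stationary-distribution formula.

```python
# Hypothetical 2-state Markov chain induced by a fixed policy (made-up numbers).
P = [[0.9, 0.1],
     [0.5, 0.5]]   # P[i][j] = probability of moving from state i to state j
r = [1.0, -2.0]    # r[i] = expected immediate reward r_pi(i) in state i


def step(dist, P):
    """Push a state distribution one step forward: dist' = dist P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def finite_horizon_average(d0, P, r, n_steps):
    """(1/n) * E[sum_{t=0}^{n-1} R_{t+1}] when S_0 is drawn from d0."""
    dist = list(d0)
    total = 0.0
    for _ in range(n_steps):
        total += sum(p * ri for p, ri in zip(dist, r))  # E[R_{t+1}]
        dist = step(dist, P)                            # distribution of S_{t+1}
    return total / n_steps


# Stationary distribution of a 2-state chain, solved by hand from d = d P.
a, b = P[0][1], P[1][0]
d_pi = [b / (a + b), a / (a + b)]
r_bar = sum(d * ri for d, ri in zip(d_pi, r))

# Two deterministic starting states (Step 1) and one mixed distribution (Step 2).
for d0 in ([1.0, 0.0], [0.0, 1.0], [0.3, 0.7]):
    print(d0, finite_horizon_average(d0, P, r, 10_000), r_bar)
```

For a long enough horizon, all three averages should sit close to `r_bar`, which is what the proof predicts: the starting state, or starting distribution, only affects a transient that is washed out by the factor $1/n$.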

Cesaro mean

  • Also called the Cesaro summation.
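  • If $\{a_k\}_{k=1}^{\infty}$ is a convergent sequence such that $\lim_{k\to\infty} a_k$ exists,
  • then $\left\{\frac{1}{n}\sum_{k=1}^{n} a_k\right\}_{n=1}^{\infty}$ is also a convergent sequence such that $\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} a_k = \lim_{k\to\infty} a_k$.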
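
A quick numerical illustration of the Cesaro mean may help. This is a minimal sketch, assuming the example sequence $a_k = 1 + 1/k$ (chosen here only for illustration, with limit 1); the helper `cesaro_means` is a hypothetical name, not something from the original note.

```python
def cesaro_means(seq):
    """Return the running averages (1/n) * sum_{k=1}^{n} a_k for n = 1, 2, ..."""
    means = []
    total = 0.0
    for n, a_k in enumerate(seq, start=1):
        total += a_k
        means.append(total / n)
    return means


# Example sequence a_k = 1 + 1/k, which converges to 1 as k grows.
a = [1.0 + 1.0 / k for k in range(1, 100_001)]
means = cesaro_means(a)

# Last term of the sequence and last running average; both should be near 1.
print(a[-1], means[-1])
```

The running averages approach the same limit as the sequence itself, only more slowly, because the early terms keep contributing to the average.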