Example of Temporal-Difference

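Revisit the mean estimation problem: estimate

$$w = \mathbb{E}[X]$$

based on some i.i.d. samples $\{x\}$ of $X$.

  1. Reformulate as a root-finding problem: with $g(w) = w - \mathbb{E}[X]$, $g(w) = 0$ is the equation to solve;
  2. Sample $x$ from $X$ and obtain the noisy observation $\tilde g(w, \eta) = w - x = g(w) + \eta$, where $\eta = \mathbb{E}[X] - x$;
  3. Update by the Robbins-Monro (RM) algorithm: $w_{k+1} = w_k - \alpha_k \tilde g(w_k, \eta_k) = w_k - \alpha_k (w_k - x_k)$.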
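The three steps above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the step size $\alpha_k = 1/k$, the Gaussian sample distribution, and the function name `rm_mean` are all illustrative choices, not part of the original notes. With this particular step size, the RM iterate coincides with the running sample mean.

```python
import random


def rm_mean(samples, w0=0.0):
    """RM iteration w_{k+1} = w_k - alpha_k * (w_k - x_k), with alpha_k = 1/k (assumed)."""
    w = w0
    for k, x in enumerate(samples, start=1):
        alpha = 1.0 / k          # assumed step size; sum diverges, sum of squares converges
        w = w - alpha * (w - x)  # (w - x) is the noisy observation of g(w) = w - E[X]
    return w


random.seed(0)
# Hypothetical data: i.i.d. Gaussian samples with true mean 5.0
xs = [random.gauss(5.0, 2.0) for _ in range(10_000)]

estimate = rm_mean(xs)
sample_mean = sum(xs) / len(xs)
print(estimate, sample_mean)
```

Other step-size sequences could be used as well; $1/k$ is chosen here only because it makes the link to the ordinary average easy to check.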

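To estimate the mean of a function $v(X)$,

$$w = \mathbb{E}[v(X)],$$

based on some i.i.d. random samples $\{x\}$ of $X$:

  1. Reformulate: $g(w) = w - \mathbb{E}[v(X)]$, with noisy observation $\tilde g(w, \eta) = w - v(x)$;
  2. Update by RM: $w_{k+1} = w_k - \alpha_k \tilde g(w_k, \eta_k) = w_k - \alpha_k [w_k - v(x_k)]$.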
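To see why RM applies in this case, it may help to split the noisy observation into the true function value plus an error term. The symbols $g$, $\tilde g$, and $\eta$ follow the usual RM notation and are assumed here, with $v$ the function whose mean is being estimated and $x$ a sample of $X$:

```latex
\begin{aligned}
\tilde g(w, \eta) &= w - v(x) \\
&= \underbrace{\big(w - \mathbb{E}[v(X)]\big)}_{g(w)}
 + \underbrace{\big(\mathbb{E}[v(X)] - v(x)\big)}_{\eta},
\qquad \mathbb{E}[\eta] = 0 .
\end{aligned}
```

Because the error term has zero mean, the observation is an unbiased measurement of $g(w)$, which is what the RM update relies on.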

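To estimate the mean of a function

$$w = \mathbb{E}[R + \gamma v(X)],$$

where $R, X$ are random variables, $\gamma$ is a constant, and $v(\cdot)$ is a function.

  1. Suppose we can obtain samples $\{x\}$ and $\{r\}$ of $X$ and $R$, respectively: then $g(w) = w - \mathbb{E}[R + \gamma v(X)]$, with noisy observation $\tilde g(w, \eta) = w - [r + \gamma v(x)]$;
  2. Update by RM: $w_{k+1} = w_k - \alpha_k \tilde g(w_k, \eta_k) = w_k - \alpha_k [w_k - (r_k + \gamma v(x_k))]$.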
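A minimal Python sketch of this last case follows. Everything concrete in it is an assumption made for illustration: the three-valued $X$, the lookup table `v`, the Gaussian $R$, the value $\gamma = 0.9$, the step size $1/k$, and the function name `rm_expected_target`. The exact expectation is computed by hand so the estimate can be compared against it.

```python
import random


def rm_expected_target(pairs, v, gamma, w0=0.0):
    """RM iteration w_{k+1} = w_k - alpha_k * (w_k - (r_k + gamma * v(x_k))), alpha_k = 1/k."""
    w = w0
    for k, (r, x) in enumerate(pairs, start=1):
        target = r + gamma * v[x]         # one sample of R + gamma * v(X)
        w = w - (1.0 / k) * (w - target)  # assumed step size alpha_k = 1/k
    return w


random.seed(1)
gamma = 0.9                      # assumed constant
v = {0: 1.0, 1: 2.0, 2: 4.0}     # hypothetical function values v(x)

# Hypothetical samples: X uniform on {0, 1, 2}, R Gaussian with mean 1.0
pairs = [(random.gauss(1.0, 0.5), random.choice([0, 1, 2])) for _ in range(20_000)]

w_hat = rm_expected_target(pairs, v, gamma)
exact = 1.0 + gamma * (1.0 + 2.0 + 4.0) / 3.0  # E[R] + gamma * E[v(X)]
print(w_hat, exact)
```

The update has the same shape as in the two earlier cases; only the sampled quantity inside the bracket changes.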

TLDR

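All three examples above estimate an expectation, and in each case RM turns the problem into an iteration of the same form. Next, we formally introduce TD learning of state values.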