Example of Temporal-Difference
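Revisit the mean estimation problem: estimate $w = \mathbb{E}[X]$
based on some i.i.d. samples $\{x\}$ of $X$.
- reformulate as a root-finding problem: with $g(w) = w - \mathbb{E}[X]$, $g(w) = 0$ is the equation to solve;
- sample $x$ from $X$ and obtain the noisy observation $\tilde{g}(w, \eta) = w - x = (w - \mathbb{E}[X]) + (\mathbb{E}[X] - x) = g(w) + \eta$;
- update by the Robbins-Monro (RM) algorithm: $w_{k+1} = w_k - \alpha_k \tilde{g}(w_k, \eta_k) = w_k - \alpha_k (w_k - x_k)$.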
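
The RM iteration for mean estimation can be sketched in a few lines of Python. This is only an illustration under assumptions the notes do not state: the step size is taken as $\alpha_k = 1/k$, the samples are drawn from a Gaussian with mean 5, and the name `rm_mean` is hypothetical. With this step size the iterate coincides with the running sample average, which the last lines check numerically.

```python
import random

def rm_mean(samples):
    """Robbins-Monro iteration w_{k+1} = w_k - alpha_k * (w_k - x_k).

    Assumption: alpha_k = 1/k, one common choice of step size.
    """
    w = 0.0  # initial guess; it is overwritten at k = 1 because alpha_1 = 1
    for k, x in enumerate(samples, start=1):
        alpha = 1.0 / k
        w = w - alpha * (w - x)  # noisy observation g~ = w - x
    return w

random.seed(0)
# Hypothetical data: i.i.d. samples of X ~ N(5, 2^2), so E[X] = 5.
xs = [random.gauss(5.0, 2.0) for _ in range(10_000)]

w_rm = rm_mean(xs)
w_avg = sum(xs) / len(xs)
print(w_rm, w_avg)  # the two estimates agree up to floating-point error
```

Other step-size sequences can be used as well; $1/k$ is chosen here only because it makes the link to the ordinary sample mean explicit.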
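To estimate the mean of a function $v(X)$, i.e. $w = \mathbb{E}[v(X)]$,
based on some i.i.d. random samples $\{x\}$ of $X$:
- reformulate: $g(w) = w - \mathbb{E}[v(X)]$, with noisy observation $\tilde{g}(w, \eta) = w - v(x) = (w - \mathbb{E}[v(X)]) + (\mathbb{E}[v(X)] - v(x)) = g(w) + \eta$;
- update by RM: $w_{k+1} = w_k - \alpha_k \tilde{g}(w_k, \eta_k) = w_k - \alpha_k [w_k - v(x_k)]$.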
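To estimate the mean of a function $R + \gamma v(X)$, i.e. $w = \mathbb{E}[R + \gamma v(X)]$,
where $R, X$ are random variables, $\gamma$ is a constant, and $v(\cdot)$ is a function:
- suppose we can obtain samples $\{r\}$ and $\{x\}$ of $R$ and $X$, respectively: then $g(w) = w - \mathbb{E}[R + \gamma v(X)]$, with noisy observation $\tilde{g}(w, \eta) = w - [r + \gamma v(x)] = g(w) + \eta$;
- update by RM: $w_{k+1} = w_k - \alpha_k \tilde{g}(w_k, \eta_k) = w_k - \alpha_k [w_k - (r_k + \gamma v(x_k))]$.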
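
The third update can be sketched the same way. Everything concrete below is an assumption made for illustration: $v(x) = x^2$, $X$ uniform on $[0, 1]$, $R$ Gaussian with mean 1, $\gamma = 0.9$, step size $\alpha_k = 1/k$, and the hypothetical name `rm_discounted`. For these choices the true value is $\mathbb{E}[R] + \gamma \mathbb{E}[X^2] = 1 + 0.9/3 = 1.3$, so the estimate should land close to it.

```python
import random

def rm_discounted(rs, xs, v, gamma):
    """RM iteration w_{k+1} = w_k - alpha_k * (w_k - (r_k + gamma * v(x_k))).

    Assumption: alpha_k = 1/k.
    """
    w = 0.0
    for k, (r, x) in enumerate(zip(rs, xs), start=1):
        alpha = 1.0 / k
        target = r + gamma * v(x)      # sampled value of R + gamma * v(X)
        w = w - alpha * (w - target)   # move w toward the sampled target
    return w

random.seed(1)
gamma = 0.9

def v(x):
    # Hypothetical function; any known v would do.
    return x * x

n = 20_000
xs = [random.uniform(0.0, 1.0) for _ in range(n)]  # samples of X
rs = [random.gauss(1.0, 0.5) for _ in range(n)]    # samples of R

w_est = rm_discounted(rs, xs, v, gamma)
true_w = 1.0 + gamma / 3.0  # E[R] + gamma * E[X^2] for the choices above
print(w_est, true_w)
```

The quantity `target` has the form "sample of $R$ plus $\gamma$ times $v$ at a sample of $X$", and each step moves $w$ a fraction $\alpha_k$ of the way toward it.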
TLDR
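The three examples above all estimate an expectation, and each is written with the RM algorithm as an iteration of the same form. Next, we formally introduce TD learning of state values.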