Example of Stochastic Approximation
Review
Remember Law of large numbers, the mean estimation problem:
- consider a random variable
- suppose that we collected a sequence of i.i.d. samples from
- aim is to estimate
- The expectation of can be approximated by
- the basic idea of Monte Carlo estimation
重申:为什么这么关注mean estimation?
之前已经在这里提过一次,当时是从环境模型的两个概率分布的角度出发(因为我们希望用一些别的什么东西替代环境模型的概率形式,这个东西其实就是)。其实除了这个角度,我们大部分在强化学习问题中的值比如action values和gradients都是由expectation的形式所定义的。
New question: how to calculate the mean ?
- Non-incremental: trivial(平常的), wait until all samples are collected, then calculate the average
- drawback: cannot get the estimation during the collecting process
- Incremental: update the average after each sample
Incremental mean estimation
Suppose is the average of the first samples
hence
Then we can express in terms of and
Verification:
Remarks:
- mean estimate can be obtained immediately once a sample is received
A more general form(also known as exponential moving average, EMA, used to estimate the mean of a time series, e.g., smoothing stock prices):
- where is replaced by
- if satisify some mild conditions, then
- this algorithm is a special SA algorithm, and also a special SGD algorithm
- in the next lecture, we will see TD have simliar (but more complex) form
EMA和SGD的相似性
确实非常相似,如果将视作梯度的话,如果是batch的形式,和平时我们torch框架中的反向传播就更像了。