Example of Monte-Carlo
Flip a coin:
The result is denoted as a random variable $X$
- heads: $X = 1$
- tails: $X = -1$
The aim is to compute the expectation $\mathbb{E}[X]$
With model (model-based)
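Suppose the probabilistic model is known, for example a fair coin with
$$p(X = 1) = 0.5, \quad p(X = -1) = 0.5,$$
then by definition
$$\mathbb{E}[X] = \sum_{x} x \, p(x) = 1 \times 0.5 + (-1) \times 0.5 = 0.$$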
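The model-based computation can be sketched in a few lines. This is only an illustration: the fair-coin probabilities in `pmf` and the helper name `expectation` are assumptions made for the example, not part of the original notes.

```python
# Assumed model: a fair coin. The 0.5 probabilities are an illustrative assumption.
pmf = {1: 0.5, -1: 0.5}  # heads -> X = 1, tails -> X = -1


def expectation(pmf):
    """Return E[X] = sum over x of x * p(x) for a discrete distribution."""
    return sum(x * p for x, p in pmf.items())


mean_x = expectation(pmf)
print(mean_x)
```

The point is that the whole computation depends on knowing `pmf` exactly, which is what "model-based" means here.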
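How can we know the distribution?
We cannot know the exact, true distribution of a real-world coin. In practice we can only flip the coin (or simulate flips), tally the outcomes, and estimate the distribution from those counts.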
Without model (model-free)
idea: sample and average
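- flip the coin $N$ times, and then calculate the average of the results
- suppose we get a sample sequence $\{x_1, x_2, \dots, x_N\}$; then the estimate of $\mathbb{E}[X]$ is
$$\mathbb{E}[X] \approx \bar{x} = \frac{1}{N} \sum_{j=1}^{N} x_j$$
This is the idea of Monte Carlo estimation.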
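A minimal sketch of "sample and average", assuming the coin is simulated with Python's standard `random` module. The function names `flip_coin` and `mc_mean`, the fair-coin probability, and the seed are all choices made for this example.

```python
import random


def flip_coin(rng):
    """One simulated flip: +1 for heads, -1 for tails (fair coin assumed)."""
    return 1 if rng.random() < 0.5 else -1


def mc_mean(num_samples, seed=0):
    """Estimate E[X] by averaging num_samples simulated flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    samples = [flip_coin(rng) for _ in range(num_samples)]
    return sum(samples) / num_samples


estimate = mc_mean(10_000)
print(estimate)
```

The estimator itself never looks at the probabilities. It only needs samples, so with a real coin `flip_coin` would simply be replaced by recorded outcomes.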
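Is Monte Carlo estimation accurate?
Because the samples are random, the estimate is generally not exact for a finite number of samples, but as the number of samples $N$ increases it gets closer and closer to the true mean $\mathbb{E}[X]$.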
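One way to see this behaviour is to track the running average of a single long sequence of flips. The sketch below assumes a simulated fair coin, so the true mean is 0. The checkpoints and the seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
checkpoints = {10, 100, 1_000, 10_000, 100_000}

running_sum = 0
abs_errors = {}  # N -> |running average - true mean|; the true mean is 0 here
for n in range(1, 100_001):
    running_sum += 1 if rng.random() < 0.5 else -1  # one fair-coin flip
    if n in checkpoints:
        abs_errors[n] = abs(running_sum / n)

for n in sorted(abs_errors):
    print(f"N={n:>6}  |average - E[X]| = {abs_errors[n]:.4f}")
```

The error need not decrease at every checkpoint, since each run is random, but it tends to shrink as $N$ grows.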
Law of large numbers
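For a random variable $X$, suppose $\{x_j\}_{j=1}^{N}$ are i.i.d. samples of $X$. Let $\bar{x} = \frac{1}{N} \sum_{j=1}^{N} x_j$ be the average of the samples. Then
$$\mathbb{E}[\bar{x}] = \mathbb{E}[X], \qquad \operatorname{Var}[\bar{x}] = \frac{1}{N} \operatorname{Var}[X].$$
As a result, $\bar{x}$ is an unbiased estimate of $\mathbb{E}[X]$, and its variance decreases to zero as $N$ goes to infinity.
- For details, see Proof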
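The mean and variance of the sample average can be derived in a few lines. The following is a sketch, assuming $X$ has finite mean and variance and writing $\bar{x} = \frac{1}{N}\sum_{j=1}^{N} x_j$ for the average of the i.i.d. samples:

```latex
\begin{align*}
\mathbb{E}[\bar{x}]
  &= \mathbb{E}\Big[\frac{1}{N}\sum_{j=1}^{N} x_j\Big]
   = \frac{1}{N}\sum_{j=1}^{N}\mathbb{E}[x_j]
   = \mathbb{E}[X], \\
\operatorname{Var}[\bar{x}]
  &= \frac{1}{N^2}\operatorname{Var}\Big[\sum_{j=1}^{N} x_j\Big]
   = \frac{1}{N^2}\sum_{j=1}^{N}\operatorname{Var}[x_j]
   = \frac{1}{N}\operatorname{Var}[X].
\end{align*}
```

The first line uses only linearity of expectation and the fact that every sample has the same distribution as $X$. The second line needs independence: the variance of a sum equals the sum of the variances only when the covariance terms vanish.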
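Why use i.i.d. samples?
i.i.d. stands for independent and identically distributed. Identical distribution ensures that every sample has the same mean $\mathbb{E}[X]$; independence ensures that the samples carry no shared information. If the samples are correlated, the variance of the average no longer shrinks as $\operatorname{Var}[X]/N$ in general, so the estimate can be much less accurate for the same number of samples.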
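The effect of correlation can be illustrated with a small experiment. The "sticky" coin below is a hypothetical construction invented for this sketch: each flip repeats the previous one with probability `stay_prob`, so the flips are still heads or tails with probability 0.5 each, but they are no longer independent. All names and parameter values are assumptions for illustration.

```python
import random
import statistics


def iid_flips(n, rng):
    """n independent fair flips, each +1 or -1."""
    return [1 if rng.random() < 0.5 else -1 for _ in range(n)]


def sticky_flips(n, rng, stay_prob=0.9):
    """n correlated flips: each flip repeats the previous one with
    probability stay_prob. Marginally each flip is still +1 or -1 with
    probability 0.5, so only independence is broken."""
    flips = [1 if rng.random() < 0.5 else -1]
    for _ in range(n - 1):
        prev = flips[-1]
        flips.append(prev if rng.random() < stay_prob else -prev)
    return flips


def variance_of_average(sampler, n, trials, rng):
    """Empirical variance of the sample average over repeated experiments."""
    averages = [sum(sampler(n, rng)) / n for _ in range(trials)]
    return statistics.pvariance(averages)


rng = random.Random(0)  # fixed seed so the run is repeatable
n, trials = 200, 2000
var_iid = variance_of_average(iid_flips, n, trials, rng)
var_sticky = variance_of_average(sticky_flips, n, trials, rng)
print(f"i.i.d. flips: {var_iid:.4f}  (theory Var[X]/N = {1 / n:.4f})")
print(f"sticky flips: {var_sticky:.4f}")
```

Both averages are centred on the true mean 0, but the correlated one is spread much more widely, which is why the i.i.d. assumption matters for the $\operatorname{Var}[X]/N$ rate.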
Summary
- MC estimation refers to a broad class of techniques that use repeated random sampling to solve approximation problems
- MC estimation is a model-free method: it does not require the probabilistic model
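Why do we care so much about mean estimation?
Because the state values and action values that appear in the Bellman equation and in the value iteration (VI) and policy iteration (PI) algorithms derived earlier are all defined as expectations. Previously we had to compute them from the two model distributions, $p(r \mid s, a)$ and $p(s' \mid s, a)$. After this chapter, we can estimate them with MC instead (just sample and average), which is why mean estimation is so important.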