05 Monte Carlo Methods
- Gap: how to do model-free learning
- Mean estimation with sampling data
- First model-free RL algorithms
- Algorithms
- MC Basic (policy iteration with the model-based policy evaluation step replaced by a data-based estimate)
- MC Exploring Starts
- MC ε-Greedy
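A minimal sketch of MC Basic, assuming a hypothetical deterministic 1-D grid (states 0 to 4, target at state 4, wall penalty -1, γ = 0.9, episodes truncated at 30 steps). The environment, reward values and function names are illustrative assumptions, not taken from these notes. Each iteration estimates q(s, a) by averaging the returns of episodes that start from (s, a) and then follow the current policy, and then improves the policy greedily, which is policy iteration with the model replaced by data.

```python
# Hypothetical toy environment: a deterministic 1-D grid, states 0..4, target = 4.
N_STATES = 5
ACTIONS = (-1, 0, 1)  # left, stay, right
GAMMA = 0.9

def step(s, a):
    """Return (next_state, reward): hitting a wall costs -1, being at the target pays +1."""
    s2 = s + a
    if s2 < 0 or s2 >= N_STATES:
        return s, -1.0
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def episode_return(s, a, policy, length):
    """Discounted return of one episode that starts with (s, a) and then follows policy."""
    g, discount = 0.0, 1.0
    for _ in range(length):
        s, r = step(s, a)
        g += discount * r
        discount *= GAMMA
        a = policy[s]
    return g

def mc_basic(iterations=10, episodes_per_pair=1, length=30):
    policy = {s: 0 for s in range(N_STATES)}  # initial policy: always stay
    q = {}
    for _ in range(iterations):
        q = {}
        for s in range(N_STATES):
            for a in ACTIONS:
                # Data-based policy evaluation: average the returns of sampled episodes.
                returns = [episode_return(s, a, policy, length)
                           for _ in range(episodes_per_pair)]
                q[(s, a)] = sum(returns) / len(returns)
        # Policy improvement: act greedily with respect to the estimated q(s, a).
        policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
    return policy, q

policy, q = mc_basic()
```

Because this toy environment is deterministic, one episode per pair already gives the exact truncated return; in a stochastic environment `episodes_per_pair` would need to be larger for the averages to be reliable. Ties are broken by `max`, which keeps the first action in `ACTIONS` order.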
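Data and Models
Either we have data but no model, or we have a model but no data. Here "model" still means the environment model. Throughout, we always need data obtained from the environment: even without an environment model, we must sample data from the environment and approximate expected values by averaging those samples. This is the idea behind Monte Carlo methods.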
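A minimal sketch of Monte Carlo mean estimation: draw i.i.d. samples and use their average as the estimate of the expectation. The helper name `mc_mean` and the ±1 coin used as the random variable are hypothetical, chosen only because the true mean (0) is known.

```python
import random

def mc_mean(sample, n):
    """Estimate E[X] by averaging n i.i.d. samples drawn by calling sample()."""
    total = 0.0
    for _ in range(n):
        total += sample()
    return total / n

# Hypothetical example: X is +1 or -1 with equal probability, so E[X] = 0.
rng = random.Random(0)

def coin():
    return 1.0 if rng.random() < 0.5 else -1.0

estimate = mc_mean(coin, 10_000)
```

By the law of large numbers the estimate approaches E[X] as n grows; here its standard deviation is 1/√n = 0.01, so it should land close to 0.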
Outline
- Motivating example
- The simplest MC-based RL algorithm
- Use data more efficiently
- MC without exploring starts
Summary
- Mean estimation by Monte Carlo methods
- 3 MC algorithms: Basic ⇒ Exploring Starts ⇒ ε-Greedy
- Optimality vs. exploration trade-off of ε-Greedy
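A minimal sketch of ε-greedy action probabilities for a single state, assuming the standard form in which the greedy action gets 1 - ε(|A| - 1)/|A| and every other action gets ε/|A|. The function name is hypothetical.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Return epsilon-greedy action probabilities for one state.

    The greedy action gets 1 - epsilon * (n - 1) / n; every other action gets epsilon / n.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] = 1.0 - epsilon * (n - 1) / n
    return probs

p = epsilon_greedy_probs([0.0, 2.0, 1.0], 0.3)
```

The two extremes show the trade-off: ε = 0 gives the purely greedy policy (best exploitation, no exploration), while ε = 1 gives the uniform policy (maximal exploration, ignoring the value estimates).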
Later
- preliminaries of TD ⇒ 06 Stochastic Approximation
- non-incremental to incremental ⇒ 07 Temporal-Difference Methods
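A minimal sketch of the non-incremental to incremental idea applied to mean estimation: instead of storing all samples and averaging at the end, update the estimate after each sample with w_{k+1} = w_k - (1/k)(w_k - x_k). The function name is hypothetical.

```python
def incremental_mean(xs):
    """Running average via w_{k+1} = w_k - (1/k)(w_k - x_k); no samples are stored."""
    w = 0.0
    for k, x in enumerate(xs, start=1):
        w = w - (w - x) / k
    return w

result = incremental_mean([2.0, 4.0, 9.0])
```

After k samples, w equals the ordinary average of those k samples, so the estimate is usable at every step rather than only after all data has been collected.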