05 Monte Carlo Methods

  • Gap: how to do model-free learning
  • Mean estimation with sampled data
  • First model-free RL algorithms
  • Algorithms
    • MC Basic (policy iteration with the model-based step replaced by a data-based one)
    • MC Exploring Starts
    • MC ε-greedy

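Data and models

We have either data without a model, or a model without data. "Model" here still means the environment model. Either way, we always need to obtain data from the environment: even without an environment model, we sample data from the environment and use the sample mean to estimate the expected value. This is the idea behind Monte Carlo.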
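
As a rough illustration of this idea, the sketch below estimates an expected value purely from samples, without using the distribution itself. The names mc_mean and coin_flip are hypothetical, chosen only for this example; the coin with outcomes +1 and -1 is an assumed toy case whose true mean is 0.

```python
import random

def mc_mean(sample, n, seed=0):
    """Estimate E[X] by averaging n independent samples of X."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n):
        total += sample(rng)
    return total / n

def coin_flip(rng):
    """Assumed toy example: a fair coin paying +1 or -1 (true mean is 0)."""
    return 1.0 if rng.random() < 0.5 else -1.0

# The estimate approaches the true mean as n grows (law of large numbers).
estimate = mc_mean(coin_flip, 100_000)
print(estimate)
```

The estimator never looks at the probabilities 0.5 and 0.5; it only sees samples. That is what makes the approach model-free.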

Outline

  1. Motivating example
  2. The simplest MC-based RL algorithm
  3. Use data more efficiently
  4. MC without exploring starts

Summary

  • Mean estimation by Monte Carlo methods
  • 3 MC algorithms: Basic, Exploring Starts, ε-Greedy
  • Optimality vs exploration of ε-Greedy
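
The trade-off in the last point can be made concrete with a small sketch of an ε-greedy policy for a single state. It assumes the standard definition: the greedy action gets probability 1 - ε + ε/|A| and every other action gets ε/|A|. The function name epsilon_greedy_probs and the action values are hypothetical, used only for illustration.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Return the action probabilities of an epsilon-greedy policy for one state."""
    n = len(q_values)
    # Index of the greedy action (ties broken by the lowest index).
    greedy = max(range(n), key=lambda a: q_values[a])
    # Every action gets an equal share of the exploration mass epsilon.
    probs = [epsilon / n] * n
    # The greedy action additionally gets the remaining 1 - epsilon.
    probs[greedy] += 1.0 - epsilon
    return probs

q = [1.0, 3.0, 2.0, 0.0]  # assumed action values for one state
probs = epsilon_greedy_probs(q, 0.2)
print(probs)
```

With ε = 0 the policy is purely greedy (no exploration). As ε grows toward 1, the policy approaches uniform: exploration increases, but the policy moves further from the greedy one.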

Later