Reinforcement Learning Notes

05 Monte Carlo Methods

Sep 30, 2025 · 1 min read

Gap: how can we learn without a model (model-free learning)?
Mean estimation with sampled data:

$$\mathbb{E}[X] \approx \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}$$
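As a minimal sketch of the sample-mean approximation above, the following estimates the expected value of a fair six-sided die, whose true mean is 3.5. The names `mc_mean` and `roll_die` and the fixed seed are illustrative assumptions, not part of the original notes.

```python
import random

def mc_mean(sample_fn, n, seed=0):
    """Estimate E[X] by averaging n i.i.d. samples drawn via sample_fn(rng)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return sum(sample_fn(rng) for _ in range(n)) / n

def roll_die(rng):
    """One sample x_i: the outcome of a fair six-sided die."""
    return rng.randint(1, 6)

estimate = mc_mean(roll_die, n=100_000)
print(f"sample mean = {estimate:.3f} (true mean is 3.5)")
```

By the law of large numbers, the estimate approaches the true mean as `n` grows. No knowledge of the die's distribution is used, only samples.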

First model-free RL algorithms
Algorithms
- MC Basic (policy iteration with the model-based step replaced by a data-based one)
- MC Exploring Starts
- MC $ε$-greedy
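To make the MC Basic idea concrete, here is a hedged sketch on a hypothetical four-state chain (none of it comes from the original notes). The environment, the `step` simulator, the reward of 1 for reaching the target, the discount 0.9, and the truncated horizon are all assumptions chosen for illustration. The agent never reads transition probabilities. It only samples episodes, averages their returns to estimate $q(s, a)$, and then improves the policy greedily.

```python
N_STATES, TARGET, GAMMA = 4, 3, 0.9   # hypothetical chain: states 0..3, target is 3
ACTIONS = (-1, +1)                    # move left or right

def step(state, action):
    """Hypothetical simulator. The agent can only sample from it."""
    if state == TARGET:
        return state, 0.0             # target is absorbing, no further reward
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == TARGET else 0.0)

def sample_return(state, action, policy, horizon=20):
    """Discounted return of one episode that starts with (state, action)
    and follows `policy` afterwards, truncated after `horizon` steps."""
    g, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = step(state, action)
        g += discount * reward
        discount *= GAMMA
        action = policy[state]
    return g

def mc_basic(iterations=5, episodes=1):
    """Policy iteration where q(s, a) is estimated from sampled returns."""
    policy = {s: -1 for s in range(N_STATES)}   # start from a poor policy
    q = {}
    for _ in range(iterations):
        # Policy evaluation: average sampled returns for every (s, a).
        q = {(s, a): sum(sample_return(s, a, policy) for _ in range(episodes)) / episodes
             for s in range(N_STATES) for a in ACTIONS}
        # Policy improvement: act greedily with respect to the estimates.
        policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
    return policy, q

policy, q = mc_basic()
print(policy)
```

Because this toy simulator is deterministic, one episode per state-action pair is enough. A stochastic environment would need many episodes per pair for the average to be a useful estimate.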

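Data and models

We have either data without a model or a model without data. "Model" here still means the environment model. Either way, we always need to obtain data from the environment: even when no environment model is available, we can still sample data from the environment and use the sample average to estimate the expected value. This is the core idea of Monte Carlo.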

Outline

Motivating example
The simplest MC-based RL algorithm
Use data more efficiently
MC without exploring starts

Summary

Mean estimation by Monte Carlo methods
3 MC algorithms: Basic ⇒ Exploring Starts ⇒ $ε$-Greedy
Optimality vs. exploration in $ε$-Greedy
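The optimality versus exploration trade-off can be seen directly in the action probabilities. The sketch below assumes the common definition of an $ε$-greedy policy: the greedy action gets probability $1 - ε(|\mathcal{A}| - 1)/|\mathcal{A}|$ and every other action gets $ε/|\mathcal{A}|$. The function names are illustrative, not from the original notes.

```python
import random

def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state.

    Every action receives epsilon / n; the greedy action receives an
    additional 1 - epsilon, i.e. 1 - epsilon * (n - 1) / n in total.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs

def sample_action(q_values, epsilon, rng=random):
    """Draw one action index according to the epsilon-greedy probabilities."""
    weights = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=weights)[0]

q_values = [0.1, 0.5, 0.2]                 # hypothetical action values for one state
print(epsilon_greedy_probs(q_values, 0.0))  # purely greedy, no exploration
print(epsilon_greedy_probs(q_values, 0.3))  # mostly greedy, some exploration
print(epsilon_greedy_probs(q_values, 1.0))  # uniform, maximal exploration
```

With $ε = 0$ the policy is greedy and does not explore. With $ε = 1$ it is uniform and explores the most but ignores the value estimates. Intermediate values trade one for the other.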

Later

Preliminaries for TD ⇒ 06 Stochastic Approximation
From non-incremental to incremental ⇒ 07 Temporal-Difference Methods
