08 Value Function Methods
- Gap: tabular representation to function representation
- Algorithms (semi-gradient update forms are sketched right after this list)
- state value estimation with Value Function Approximation (VFA)
- Sarsa with VFA
- Q-learning with VFA
- Deep Q-learning (DQN)
- Neural networks come to RL
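For reference, a standard way to write the semi-gradient updates behind the Sarsa-with-VFA and Q-learning-with-VFA entries above, where $\hat q(s, a, w)$ denotes the action value parameterized by the weight vector $w$ (the notation $\alpha$, $\gamma$ is assumed here, not fixed by these notes):

$$
\text{Sarsa:}\quad w \leftarrow w + \alpha\big[r + \gamma\,\hat q(s', a', w) - \hat q(s, a, w)\big]\,\nabla_w \hat q(s, a, w)
$$

$$
\text{Q-learning:}\quad w \leftarrow w + \alpha\big[r + \gamma \max_{a'} \hat q(s', a', w) - \hat q(s, a, w)\big]\,\nabla_w \hat q(s, a, w)
$$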
Why the representation moves from tabular to function
In the previous chapters the states were discrete, so it was easy to use a table to record or update the value of each state (under a particular policy, of course). However, once there are many states (in particular, once the state space becomes continuous), the tabular approach is no longer suitable. We therefore introduce a parameter (weight) vector w and map each state to a value through a function of the state; this is the function representation, typically a parameterized function v̂(s, w).
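A common concrete choice is a linear approximator over a feature vector $\phi(s)$, with the TD update moving the weights along the gradient of the approximate value (a standard semi-gradient TD form; the symbols $\phi$, $\alpha$, $\gamma$ are assumed here for illustration):

$$
\hat v(s, w) = \phi(s)^{\top} w,
\qquad
w \leftarrow w + \alpha\big[r + \gamma\,\hat v(s', w) - \hat v(s, w)\big]\,\nabla_w \hat v(s, w)
$$

In the linear case $\nabla_w \hat v(s, w) = \phi(s)$, so the update is a table-like correction spread across all states that share features with $s$.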
Outline
- Motivating example
- Algorithm for state value estimation
- Sarsa with function approximation
- Q-learning with function approximation
- Deep Q-learning
- Summary
Summary
- Value Function Approximation (VFA)
- basic idea
- discrete (tabular) ⇒ continuous (function)
- limited table storage ⇒ a fixed parameter vector approximates values over arbitrarily many states (just a different way of mapping states to values)
- generalization ⇒ updating w for one state also improves estimates for unseen states
- algorithms (a minimal code sketch follows this list)
- state value estimation
- Sarsa
- Q-learning
- DQN
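As a minimal illustration of the algorithms listed above, below is a sketch of Q-learning with a linear approximator. The Gym-style `env` interface, the `featurize` feature map, and all hyperparameters are assumptions made for illustration, not part of these notes. Sarsa with VFA differs only in the TD target (it uses the next action actually selected), and DQN replaces the linear approximator with a neural network plus a target network and experience replay.

```python
import numpy as np

def featurize(state, action, dim=16):
    """Hypothetical feature map phi(s, a); replace with task-specific features."""
    seed = hash((tuple(np.ravel(state)), action)) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)  # placeholder features, for illustration only

def q_learning_vfa(env, num_actions, dim=16, alpha=0.01, gamma=0.99,
                   epsilon=0.1, episodes=500):
    """Q-learning with a linear approximator q(s, a; w) = phi(s, a)^T w."""
    w = np.zeros(dim)

    def q(state, action):
        return featurize(state, action, dim) @ w

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior policy
            if np.random.rand() < epsilon:
                action = np.random.randint(num_actions)
            else:
                action = int(np.argmax([q(state, a) for a in range(num_actions)]))

            # hypothetical Gym-style step: (next_state, reward, done, info)
            next_state, reward, done, _ = env.step(action)

            # TD target uses the max over next actions
            # (Sarsa would instead use the next action actually chosen)
            target = reward if done else reward + gamma * max(
                q(next_state, a) for a in range(num_actions))

            # semi-gradient update: for a linear model the gradient is phi(s, a)
            phi = featurize(state, action, dim)
            w += alpha * (target - phi @ w) * phi

            state = next_state
    return w
```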
Later
value-based to policy-based ⇒ 09 Policy Gradient Methods