08 Value Function Methods
- Gap: tabular representation to function representation
- Algorithms (semi-gradient update forms are sketched right after this list)
- state value estimation with Value Function Approximation (VFA)
- Sarsa with VFA
- Q-learning with VFA
- Deep Q-learning (DQN)
- Neural networks come to RL
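For reference, a standard way to write the semi-gradient updates behind the Sarsa-with-VFA and Q-learning-with-VFA entries above, where $\hat q(s, a, w)$ denotes the action value parameterized by the weight vector $w$ (the notation $\alpha$, $\gamma$ is assumed here, not fixed by these notes):

$$
\text{Sarsa:}\quad w \leftarrow w + \alpha\big[r + \gamma\,\hat q(s', a', w) - \hat q(s, a, w)\big]\,\nabla_w \hat q(s, a, w)
$$

$$
\text{Q-learning:}\quad w \leftarrow w + \alpha\big[r + \gamma \max_{a'} \hat q(s', a', w) - \hat q(s, a, w)\big]\,\nabla_w \hat q(s, a, w)
$$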
Why the representation moves from tabular to function
In the previous chapters the states were discrete, so it was easy to use a table to record or update the value of each state (under a particular policy, of course). However, once there are many states (in particular, once the state space becomes continuous), the tabular approach is no longer suitable. We therefore introduce a parameter (weight) vector w and map each state to a value through a function of the state; this is the function representation, typically a parameterized function v̂(s, w).
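A common concrete choice is a linear approximator over a feature vector $\phi(s)$, with the TD update moving the weights along the gradient of the approximate value (a standard semi-gradient TD form; the symbols $\phi$, $\alpha$, $\gamma$ are assumed here for illustration):

$$
\hat v(s, w) = \phi(s)^{\top} w,
\qquad
w \leftarrow w + \alpha\big[r + \gamma\,\hat v(s', w) - \hat v(s, w)\big]\,\nabla_w \hat v(s, w)
$$

In the linear case $\nabla_w \hat v(s, w) = \phi(s)$, so the update is a table-like correction spread across all states that share features with $s$.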
Outline
- Motivating example
- Algorithm for state value estimation
- Sarsa with function approximation
- Q-learning with function approximation
- Deep Q-learning
- Summary
Summary
- Value Function Approximation (VFA)
- basic idea
- discrete (tabular) ⇒ continuous (function)
- limited table storage ⇒ a fixed parameter vector approximates values over arbitrarily many states (just a different way of mapping states to values)
- generalization ⇒ updating w for one state also improves estimates for unseen states
- algorithms (a minimal code sketch follows this list)
- state value estimation
- Sarsa
- Q-learning
- DQN
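As a minimal illustration of the algorithms listed above, below is a sketch of Q-learning with a linear approximator. The Gym-style `env` interface, the `featurize` feature map, and all hyperparameters are assumptions made for illustration, not part of these notes. Sarsa with VFA differs only in the TD target (it uses the next action actually selected), and DQN replaces the linear approximator with a neural network plus a target network and experience replay.

```python
import numpy as np

def featurize(state, action, dim=16):
    """Hypothetical feature map phi(s, a); replace with task-specific features."""
    seed = hash((tuple(np.ravel(state)), action)) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)  # placeholder features, for illustration only

def q_learning_vfa(env, num_actions, dim=16, alpha=0.01, gamma=0.99,
                   epsilon=0.1, episodes=500):
    """Q-learning with a linear approximator q(s, a; w) = phi(s, a)^T w."""
    w = np.zeros(dim)

    def q(state, action):
        return featurize(state, action, dim) @ w

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior policy
            if np.random.rand() < epsilon:
                action = np.random.randint(num_actions)
            else:
                action = int(np.argmax([q(state, a) for a in range(num_actions)]))

            # hypothetical Gym-style step: (next_state, reward, done, info)
            next_state, reward, done, _ = env.step(action)

            # TD target uses the max over next actions
            # (Sarsa would instead use the next action actually chosen)
            target = reward if done else reward + gamma * max(
                q(next_state, a) for a in range(num_actions))

            # semi-gradient update: for a linear model the gradient is phi(s, a)
            phi = featurize(state, action, dim)
            w += alpha * (target - phi @ w) * phi

            state = next_state
    return w
```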
Later
value-based to policy-based ⇒ 09 Policy Gradient Methods