08 Value Function Methods

  • Gap: from tabular representation to function representation
  • Algorithms
    • state value estimation with Value Function Approximation (VFA)
    • Sarsa with VFA
    • Q-learning with VFA
    • Deep Q-learning (DQN)
  • Neural networks come to RL

Why move from tabular to function representation

In the previous chapters the states were discrete values, so we could easily use a table to record and update the value of each state (under a given policy, of course). However, once there are too many states, in particular when the state space becomes continuous, the tabular method is no longer suitable. We therefore introduce a weight parameter w and seek a function that maps each state to a value. This is the function representation, usually written as v̂(s, w) ≈ v_π(s).
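As a concrete illustration, here is a minimal sketch of state value estimation with linear VFA, i.e. semi-gradient TD(0). The environment interface (`env.reset()`, `env.step()`), the feature map `phi`, and all hyperparameters are assumptions for illustration, not part of the original notes.

```python
import numpy as np

NUM_FEATURES = 8   # assumed feature dimension
GAMMA = 0.9        # discount factor (illustrative)
ALPHA = 0.01       # learning rate (illustrative)

def phi(state):
    """Hypothetical feature map: encode a scalar state as polynomial features."""
    return np.array([state ** i for i in range(NUM_FEATURES)], dtype=float)

def td0_linear_vfa(env, policy, episodes=1000):
    """Estimate v_pi with v_hat(s, w) = phi(s)^T w via semi-gradient TD(0)."""
    w = np.zeros(NUM_FEATURES)
    for _ in range(episodes):
        s = env.reset()                  # assumed Gym-like interface
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            # Bootstrapped TD target under the current weights
            target = r if done else r + GAMMA * phi(s_next) @ w
            td_error = target - phi(s) @ w
            # Semi-gradient step: grad_w v_hat(s, w) = phi(s)
            w += ALPHA * td_error * phi(s)
            s = s_next
    return w
```

Compared with the tabular update v(s) ← v(s) + α(target − v(s)), the write to a single table cell becomes a step on the shared weight vector, which is what gives VFA its generalization to unseen states.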

Outline

  1. Motivating example
  2. Algorithm for state value estimation
  3. Sarsa with function approximation
  4. Q-learning with function approximation
  5. Deep Q-learning
  6. Summary

Summary

  • Value Function Approximation (VFA)
  • basic idea
    • discrete states (tabular) → continuous states (function)
    • limited storage → unlimited states via approximation (just a different way of mapping states to values)
    • generalization to unseen states
  • algorithms
    • state value estimation
    • Sarsa
    • Q-learning
    • DQN (see the sketch below)
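To make the last item concrete, below is a minimal DQN sketch in PyTorch. The network architecture, the names (`QNet`, `dqn_update`), and the hyperparameters are all illustrative assumptions; the two ingredients that distinguish DQN from plain Q-learning with VFA are experience replay and a periodically synced target network.

```python
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a state vector to one Q-value per action."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, x):
        return self.net(x)

# Experience replay: storing transitions and sampling mini-batches
# breaks the temporal correlation between consecutive samples.
replay_buffer = deque(maxlen=10_000)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a batch of (s, a, r, s_next, done) tensors."""
    s, a, r, s_next, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(s, a; w)
    with torch.no_grad():
        # The target network's weights w^- stay fixed between periodic
        # syncs, which stabilizes the bootstrapped TD target.
        max_q_next = target_net(s_next).max(1).values
        target = r + gamma * (1 - done) * max_q_next
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically (e.g. every few hundred steps):
#   target_net.load_state_dict(q_net.state_dict())
```

Dropping the target network and the replay buffer recovers ordinary Q-learning with a neural-network VFA, which tends to diverge because the TD target then chases the very weights being updated.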

Later

From value-based to policy-based methods: 09 Policy Gradient Methods