09 Policy Gradient Methods

  • Gap: from value-based to policy-based
  • Contents:
    1. Metrics to define optimal policies
    2. Policy gradient
    3. Gradient-ascent algorithm (REINFORCE)
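For orientation, the standard forms of the two metrics, the policy gradient, and the gradient-ascent update are written out below. The notation is an assumption that follows common textbook usage: d is a state distribution (for the average reward, the stationary distribution d_pi of pi), r(s,a) = E[R | s, a], J(theta) stands for either metric, and eta is a state distribution that depends on which metric is used. Depending on the metric and on whether discounting is used, the gradient identity holds exactly or only up to a constant factor or an approximation. REINFORCE replaces the unknown q_pi with a Monte Carlo return q_t.

```latex
\begin{aligned}
\bar{v}_\pi &= \sum_{s \in \mathcal{S}} d(s)\, v_\pi(s) = \mathbb{E}_{S \sim d}\big[v_\pi(S)\big] \\
\bar{r}_\pi &= \sum_{s \in \mathcal{S}} d_\pi(s)\, r_\pi(s), \qquad
  r_\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s, \theta)\, r(s, a) \\
\nabla_\theta J(\theta) &= \mathbb{E}_{S \sim \eta,\; A \sim \pi(\cdot \mid S, \theta)}
  \big[\nabla_\theta \ln \pi(A \mid S, \theta)\, q_\pi(S, A)\big] \\
\theta_{t+1} &= \theta_t + \alpha\, \nabla_\theta \ln \pi(a_t \mid s_t, \theta_t)\, q_t(s_t, a_t)
\end{aligned}
```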

Reflection

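This is essentially the same move as the improvement made in the previous chapter, 08 Value Function Methods. There we focused on obtaining the value function, so we gave it a parameter w and optimized w so that the approximation approaches the true value function. Now we likewise add a parameter, this time to the policy itself, and optimize it to obtain an optimal policy directly.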
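To make the parallel concrete, here is a minimal sketch of one common way to attach a parameter to a policy: a tabular softmax over preference scores, assuming finite state and action spaces. The names `theta`, `softmax_policy`, and `grad_log_pi` are hypothetical and chosen for illustration; the notes do not fix a particular parameterization.

```python
import math


def softmax_policy(theta, s):
    """pi(.|s, theta) for a tabular softmax policy; theta[s][a] is a preference score."""
    prefs = theta[s]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, s, a):
    """Gradient of ln pi(a|s, theta) with respect to theta[s][:].

    For a tabular softmax this is one_hot(a) - pi(.|s); entries of theta
    belonging to other states have zero gradient.
    """
    probs = softmax_policy(theta, s)
    return [(1.0 if b == a else 0.0) - probs[b] for b in range(len(probs))]


# Hypothetical example: 2 states, 3 actions.
theta = [[0.0, 1.0, -1.0], [0.5, 0.5, 0.5]]
print(softmax_policy(theta, 0))  # action 1 is most likely in state 0
print(softmax_policy(theta, 1))  # equal preferences give a uniform distribution
```

The role of theta here mirrors the role of w in value function approximation: instead of a table of action probabilities, the policy is a differentiable function of a parameter vector, which is what makes gradient ascent on a performance metric possible.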

Outline

  1. Basic idea of policy gradient
  2. Metrics to define optimal policies
  3. Gradients of the metrics
  4. Gradient ascent algorithm
  5. Summary

Summary

  • Metrics for optimality
  • Gradients of the metrics
  • Gradient ascent algorithm
    • REINFORCE
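As a minimal sketch of REINFORCE, assuming a tabular softmax policy and a hypothetical 4-state corridor task invented here for illustration (it is not from the notes): each episode is sampled with the current policy, the discounted return from each step serves as the Monte Carlo estimate q_t(s_t, a_t), and theta moves along grad ln pi(a_t | s_t, theta) * q_t. The environment, the helpers `step`, `policy`, `sample_action`, `reinforce`, and all hyperparameters are assumptions. This version omits the gamma^t factor that some textbook statements of REINFORCE include.

```python
import math
import random

# Hypothetical toy task (not from the notes): states 0-1-2-3 in a corridor.
# Action 0 = left (stays put at state 0), action 1 = right.
# Entering state 3 gives reward 1 and ends the episode; all other rewards are 0.
N_STATES, N_ACTIONS, GOAL, GAMMA = 4, 2, 3, 0.9


def step(s, a):
    s_next = max(s - 1, 0) if a == 0 else s + 1
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL


def policy(theta, s):
    """Tabular softmax policy pi(.|s, theta)."""
    m = max(theta[s])  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(theta, s, rng):
    return rng.choices(range(N_ACTIONS), weights=policy(theta, s))[0]


def reinforce(episodes=2000, alpha=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        # 1. Generate one episode (s_0, a_0, r_1, ..., s_{T-1}, a_{T-1}, r_T) under pi(theta).
        s, episode = 0, []
        for _ in range(max_steps):
            a = sample_action(theta, s, rng)
            s_next, r, done = step(s, a)
            episode.append((s, a, r))
            s = s_next
            if done:
                break
        # 2. Monte Carlo estimate q_t = r_{t+1} + gamma * r_{t+2} + ... (computed backwards).
        returns, g = [], 0.0
        for _, _, r in reversed(episode):
            g = r + GAMMA * g
            returns.append(g)
        returns.reverse()
        # 3. Gradient ascent, one step per time step t:
        #    theta <- theta + alpha * grad ln pi(a_t|s_t, theta) * q_t, where for a
        #    tabular softmax the gradient wrt theta[s_t][b] is 1{b == a_t} - pi(b|s_t).
        for (s_t, a_t, _), q_t in zip(episode, returns):
            probs = policy(theta, s_t)
            for b in range(N_ACTIONS):
                theta[s_t][b] += alpha * q_t * ((1.0 if b == a_t else 0.0) - probs[b])
    return theta


if __name__ == "__main__":
    theta = reinforce()
    for s in range(GOAL):
        # Probability of moving right in each non-terminal state.
        print(s, round(policy(theta, s)[1], 3))
```

No baseline is subtracted from q_t, so the gradient estimate is unbiased but can have high variance; learning a value estimate alongside the policy, as in Actor-Critic methods, is one way to reduce it.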

Later

Policy-based plus value-based: 10 Actor-Critic Methods