09 Policy Gradient Methods
- Gap: from value-based to policy-based
- Contents:
- Metrics to define optimal policies:
- Policy gradient:
- Gradient-ascent algorithm (REINFORCE)
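A quick thought
The improvement here is in the same spirit as the one in the previous chapter, 08 Value Function Methods. There, the focus was on obtaining the value function: we gave it a parameter w and optimized w to approximate the true value function. Here we again add a parameter, this time to the policy, and hope to reach an optimal policy by optimizing it.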
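The parallel can be sketched in symbols. Only w appears in these notes; the symbols θ (policy parameter), J (the optimality metric) and α (step size) are assumed notation for illustration.

```latex
% Chapter 08: parameterize the value function and fit w
\hat{v}(s, w) \approx v_\pi(s)

% Chapter 09: parameterize the policy and improve \theta by gradient ascent on a metric J
\pi(a \mid s, \theta), \qquad \theta_{t+1} = \theta_t + \alpha \, \nabla_\theta J(\theta_t)
```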
Outline
- Basic idea of policy gradient
- Metrics to define optimal policies
- Gradients of the metrics
- Gradient ascent algorithm
- Summary
Summary
- Metrics for optimality
- Gradients of the metrics
- Gradient ascent algorithm
- REINFORCE
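The three summary steps (a metric, its gradient, gradient ascent) can be put together in a minimal REINFORCE sketch. Everything concrete here is an assumption for illustration and not from these notes: the toy corridor environment, the tabular softmax policy, the hyperparameters, and the function names `run_episode` and `reinforce`. The update uses the Monte Carlo return as the estimate of the action value and omits a baseline.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over action preferences.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def run_episode(theta, rng, n_states=4, max_steps=50):
    """Hypothetical toy corridor: start in state 0; action 1 moves right,
    action 0 moves left (staying put at the left wall). Reaching the last
    state ends the episode with reward 1; every other step costs 0.05."""
    s, traj = 0, []
    for _ in range(max_steps):
        a = rng.choice(2, p=softmax(theta[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s_next == n_states - 1
        r = 1.0 if done else -0.05
        traj.append((s, a, r))
        s = s_next
        if done:
            break
    return traj

def reinforce(n_episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n_states = 4
    theta = np.zeros((n_states, 2))  # policy parameters: one preference per (state, action)
    for _ in range(n_episodes):
        traj = run_episode(theta, rng, n_states)
        grad = np.zeros_like(theta)
        g = 0.0
        # Walk the episode backwards so the discounted return g is built incrementally.
        for s, a, r in reversed(traj):
            g = r + gamma * g
            # Gradient of log softmax w.r.t. theta[s]: one_hot(a) - pi(.|s).
            score = -softmax(theta[s])
            score[a] += 1.0
            grad[s] += g * score
        # Gradient ascent: one update per episode from the sampled gradient estimate.
        theta += alpha * grad
    return theta

if __name__ == "__main__":
    theta = reinforce()
    for s in range(3):
        print(s, softmax(theta[s]))
```

Accumulating the gradient over the episode and applying it once keeps every score term evaluated under the policy that generated the data; updating after each step is a common alternative.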
Later
policy-based plus value-based ⇒ 10 Actor-Critic Methods