10 Actor-Critic Methods
- Bridging the gap between policy-based and value-based methods
- Algorithms
  - The simplest actor-critic (QAC)
- Advantage actor-critic (A2C)
- Off-policy actor-critic
- Importance sampling
- Deterministic actor-critic (DPG)
Why is it called "actor-critic"?
The actor is the policy-based part and the critic is the value-based part. The algorithm as a whole is still 09 Policy Gradient Methods; what is highlighted is the role played by the value estimate. In essence, the method is 08 Value Function Methods + 09 Policy Gradient Methods combined.
Introduction
Actor-critic methods are still policy gradient methods.
- They emphasize the structure that combines policy gradient and value-based methods.
What are "actor" and "critic"?
- The "actor" refers to the policy update.
- The "critic" refers to policy evaluation (or value estimation).
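The actor-critic loop above can be sketched in code. The following is a minimal tabular sketch (my own illustration, not from the notes): the critic runs TD(0) policy evaluation, and the actor performs a policy-gradient step using the TD error as an advantage estimate, as in A2C. The two-state chain MDP, step sizes, and episode count are all assumed for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters
v = np.zeros(n_states)                   # critic: state-value estimates
gamma, alpha_actor, alpha_critic = 0.9, 0.1, 0.2

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    # assumed toy dynamics: in state 0, action 1 moves to state 1 with reward 1;
    # everything else gives reward 0 and resets to state 0
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

s = 0
for _ in range(5000):
    pi = softmax(theta[s])
    a = rng.choice(n_actions, p=pi)
    s_next, r = step(s, a)
    # critic: TD(0) evaluation of the current policy
    td_error = r + gamma * v[s_next] - v[s]
    v[s] += alpha_critic * td_error
    # actor: policy-gradient update; for softmax, grad log pi(a|s) = onehot(a) - pi
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_log_pi
    s = s_next

# the learned policy in state 0 should prefer the rewarding action 1
print(softmax(theta[0])[1])
```

Note how the critic's TD error feeds directly into the actor's update: this is the structural coupling that gives the method its name.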
Outline
What comes next ⇒ your unique journey in the RL field