10 Actor-Critic Methods

  • Gap: policy-based + value-based
  • Algorithms
    • The simplest actor-critic (QAC)
    • Advantage actor-critic (A2C)
    • Off-policy actor-critic
      • Importance sampling
    • Deterministic actor-critic (DPG)

Why is it called actor-critic?

The actor is the policy-based component and the critic is the value-based component; the overall algorithm is essentially that of 09 Policy Gradient Methods. What the name emphasizes is the role of the value estimate: the method as a whole combines 08 Value Function Methods with 09 Policy Gradient Methods.

Introduction

Actor-critic methods are still policy gradient methods.

  • They emphasize the structure that incorporates policy gradient and value-based methods.

What are “actor” and “critic”?

  • “actor” refers to the policy update
  • “critic” refers to policy evaluation (or value estimation)
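The actor/critic split described above can be sketched on a toy problem. The following is an illustrative QAC-style loop on a two-armed bandit, not the lecture's exact algorithm: the critic maintains action-value estimates q(a) updated toward observed rewards, and the actor takes a policy-gradient step on softmax preferences using the critic's estimate. All names and hyperparameters here are made up for the sketch.

```python
import math
import random

random.seed(1)

true_reward = {0: 1.0, 1: 0.0}  # arm 0 is the better action
theta = [0.0, 0.0]              # actor: softmax preferences
q = [0.0, 0.0]                  # critic: action-value estimates
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(prefs):
    m = max(prefs)
    e = [math.exp(p - m) for p in prefs]
    s = sum(e)
    return [x / s for x in e]

for _ in range(2000):
    pi = softmax(theta)
    a = 0 if random.random() < pi[0] else 1
    r = true_reward[a] + random.gauss(0, 0.1)   # noisy reward

    # Critic: move q(a) toward the observed reward (policy evaluation).
    q[a] += alpha_critic * (r - q[a])

    # Actor: policy-gradient step scaled by the critic's estimate q(a);
    # for a softmax policy, grad log pi(a) w.r.t. theta[b] is 1[b==a] - pi(b).
    for b in range(2):
        grad = (1.0 if b == a else 0.0) - pi[b]
        theta[b] += alpha_actor * q[a] * grad

print(softmax(theta)[0])  # probability of the better arm; should be high
```

Replacing q(a) with the advantage q(a) - v(s) in the actor step gives the A2C variant listed in the outline, which reduces the variance of the gradient estimate.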

Outline

  1. The simplest actor-critic
  2. Advantage actor-critic
  3. Off-policy actor-critic
  4. Deterministic actor-critic

Later

Next: your own journey in the RL field.