02 Bellman Equation

  • One concept: state value
  • One tool: Bellman equation

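The Bellman equation in one sentence

It describes the relationship among the state values of all states. Solving the Bellman equation to obtain the state values of a given policy is called policy evaluation.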

Outline

  1. Motivating examples
  2. State value
  3. Derivation
  4. Matrix-vector form
  5. Solve the state values
  6. Action value
  7. Summary

Motivating examples

Calculating the return is important for evaluating a policy. The return is the (discounted) sum of the rewards obtained along a trajectory.
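
As a minimal sketch of this definition, assuming a finite trajectory and a discount rate `gamma` (the function name `discounted_return` and the reward values are illustrative, not taken from the lecture):

```python
def discounted_return(rewards, gamma):
    """Discounted sum of rewards: r_0 + gamma * r_1 + gamma**2 * r_2 + ..."""
    return sum(gamma**k * r for k, r in enumerate(rewards))


# Hypothetical rewards collected along one trajectory.
print(discounted_return([1, 2, 4], gamma=0.5))  # 1 + 0.5*2 + 0.25*4 = 3.0
```

Comparing this number across policies that start from the same state is one way to judge which policy is better.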


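Let $v_i$ denote the return obtained starting from state $s_i$.

Note that each return can also be written recursively: the immediate reward plus the discount rate $\gamma$ times the return obtained from the next state.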

  • The returns rely on each other. Bootstrapping!
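
To make the bootstrapping idea concrete, here is a worked sketch under an assumed setup (not necessarily the lecture's figure): four states visited in a deterministic cycle $s_1 \to s_2 \to s_3 \to s_4 \to s_1$, with reward $r_i$ received when leaving $s_i$, and $v_i$ the return starting from $s_i$.

```latex
\begin{aligned}
v_1 &= r_1 + \gamma r_2 + \gamma^2 r_3 + \dots
     = r_1 + \gamma \left( r_2 + \gamma r_3 + \dots \right)
     = r_1 + \gamma v_2, \\
v_2 &= r_2 + \gamma v_3, \\
v_3 &= r_3 + \gamma v_4, \\
v_4 &= r_4 + \gamma v_1.
\end{aligned}
```

Each unknown appears on the right-hand side of another equation, so the four returns have to be solved for together rather than one at a time.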

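Matrix-vector form: $v = r + \gamma P v$, where $v$ is the vector of returns, $r$ is the vector of immediate rewards, and $P$ is the state transition matrix.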
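
A small numerical sketch of solving such a linear system, assuming a hypothetical four-state deterministic cycle with made-up rewards (the values of `gamma`, `r`, and `P` are illustrative assumptions). Writing the returns as a vector `v`, the recursion `v = r + gamma * P @ v` rearranges to `(I - gamma * P) v = r`:

```python
import numpy as np

gamma = 0.9
# Hypothetical immediate rewards received when leaving s_1..s_4.
r = np.array([0.0, 1.0, 1.0, 1.0])
# P[i, j] = 1 if state i is followed by state j (cycle s1 -> s2 -> s3 -> s4 -> s1).
P = np.array(
    [
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
    ],
    dtype=float,
)

# Solve (I - gamma * P) v = r instead of forming the inverse explicitly.
v = np.linalg.solve(np.eye(4) - gamma * P, r)
print(v)

# Sanity check: the solution satisfies the recursion v = r + gamma * P v.
assert np.allclose(v, r + gamma * P @ v)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a design choice for numerical stability; the result is the same in exact arithmetic.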

Summary

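  • State value: $v_\pi(s) = \mathbb{E}[G_t \mid S_t = s]$
  • Action value: $q_\pi(s, a) = \mathbb{E}[G_t \mid S_t = s, A_t = a]$
  • The Bellman equation
    • elementwise form: $v_\pi(s) = \sum_a \pi(a \mid s) \left[ \sum_r p(r \mid s, a)\, r + \gamma \sum_{s'} p(s' \mid s, a)\, v_\pi(s') \right]$
    • matrix form: $v_\pi = r_\pi + \gamma P_\pi v_\pi$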
    • See All in one
  • Two ways to solve the Bellman equation: the closed-form solution and the iterative solution.
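
The points above can be tied together in a short policy evaluation sketch. Everything here is an assumption for illustration: a made-up MDP with 2 states and 2 actions, with `P[s, a, s2]` standing for the transition probability, `R[s, a]` for the expected immediate reward, and `pi[s, a]` for the policy. It compares the closed-form solution with the iterative one and then recovers action values from state values.

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP (all numbers are made up).
# P[s, a, s2] = p(s2 | s, a)
P = np.array(
    [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.0, 1.0]],
    ]
)
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array(
    [
        [0.0, 1.0],
        [-1.0, 2.0],
    ]
)
# pi[s, a] = pi(a | s), a stochastic policy; each row sums to 1.
pi = np.array(
    [
        [0.5, 0.5],
        [0.2, 0.8],
    ]
)

# Policy-averaged reward vector and transition matrix.
r_pi = (pi * R).sum(axis=1)            # r_pi[s] = sum_a pi(a|s) R[s, a]
P_pi = np.einsum("sa,sat->st", pi, P)  # P_pi[s, t] = sum_a pi(a|s) P[s, a, t]

# Closed-form solution: solve (I - gamma * P_pi) v = r_pi.
v_closed = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Iterative solution: repeat v <- r_pi + gamma * P_pi v from an arbitrary start.
v_iter = np.zeros(2)
for _ in range(1000):
    v_iter = r_pi + gamma * P_pi @ v_iter

# Action values from state values: q[s, a] = R[s, a] + gamma * sum_t P[s, a, t] v[t].
q = R + gamma * P @ v_closed

print("closed form:", v_closed)
print("iterative:  ", v_iter)
print("q values:\n", q)
```

In this sketch the two solutions agree up to floating-point tolerance, and averaging `q` over the policy gives back the state values, which is the link between action value and state value.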

Later

Policy evaluation, widely used later

Next: 03 Bellman Optimality Equation