02 Bellman Equation
- One concept: state value
- One tool: Bellman equation
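The Bellman equation in one sentence:
It describes the relationship among the state values of all states. Solving the Bellman equation to obtain the state values that correspond to a given policy is called policy evaluation.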
Outline
- Motivating examples
- State value
- Derivation
- Matrix-vector form
- Solve the state values
- Action value
- Summary
Motivating examples
Calculating returns is important for evaluating a policy. The return is the (discounted) sum of the rewards obtained along a trajectory.
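As a minimal sketch of this definition, the snippet below computes the discounted return of a finite reward sequence. The function name `discounted_return`, the reward values, and the discount rate 0.9 are illustrative assumptions, not taken from the notes.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite reward sequence."""
    g = 0.0
    # Accumulate backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical trajectory: three zero rewards, then a reward of 1.
rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_return(rewards, gamma=0.9))
```

Accumulating from the last reward backwards avoids computing powers of the discount rate explicitly, and it already hints at the recursive structure that the Bellman equation formalizes.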
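Let $v_i$ denote the return obtained starting from state $s_i$.
Note that each return can be written in terms of the return from the next state: $v_i = r_i + \gamma v_j$, where $r_i$ is the reward received when leaving $s_i$, $s_j$ is the state reached next, and $\gamma$ is the discount rate.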
- The returns rely on each other. Bootstrapping!
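Matrix-vector form: $v = r + \gamma P v$, where $v$ stacks the returns, $r$ stacks the immediate rewards, and $P$ is the state transition matrix.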
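To illustrate bootstrapping, here is a small sketch under assumed data: four hypothetical states visited in a fixed cycle, with made-up rewards and a discount rate of 0.9. It computes the returns in two ways, by repeatedly applying the recursive relation (each return equals the immediate reward plus the discounted return of the next state) and by summing discounted rewards along a long rollout. The two results should agree up to numerical error.

```python
def returns_by_bootstrapping(rewards, next_state, gamma, sweeps=1000):
    """Iterate v[i] <- rewards[i] + gamma * v[next_state[i]], starting from v = 0."""
    v = [0.0] * len(rewards)
    for _ in range(sweeps):
        v = [rewards[i] + gamma * v[next_state[i]] for i in range(len(rewards))]
    return v


def return_by_rollout(start, rewards, next_state, gamma, steps=1000):
    """Sum discounted rewards along the trajectory that starts at `start`."""
    g, discount, s = 0.0, 1.0, start
    for _ in range(steps):
        g += discount * rewards[s]
        discount *= gamma
        s = next_state[s]
    return g


# Hypothetical example: four states visited in the cycle 0 -> 1 -> 2 -> 3 -> 0.
# rewards[i] is the reward received when leaving state i (assumed values).
rewards = [0.0, 1.0, 1.0, 1.0]
next_state = [1, 2, 3, 0]
gamma = 0.9

v = returns_by_bootstrapping(rewards, next_state, gamma)
rollout = [return_by_rollout(s, rewards, next_state, gamma) for s in range(4)]
for s in range(4):
    print(f"s{s}: bootstrapped={v[s]:.4f}  rollout={rollout[s]:.4f}")
```

The rollout is truncated after a fixed number of steps, so it only approximates the infinite sum; with a discount rate below 1 the neglected tail is negligible here.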
Summary
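- State value: $v_\pi(s) = \mathbb{E}[G_t \mid S_t = s]$
- Action value: $q_\pi(s, a) = \mathbb{E}[G_t \mid S_t = s, A_t = a]$
- The Bellman equation
- elementwise form: $v_\pi(s) = \sum_a \pi(a \mid s) \left[ \sum_r p(r \mid s, a)\, r + \gamma \sum_{s'} p(s' \mid s, a)\, v_\pi(s') \right]$
- matrix form: $v_\pi = r_\pi + \gamma P_\pi v_\pi$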
- See All in one
- elementwise form:
- Closed-form solution and iterative solution.
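The closed-form and iterative solutions can be sketched as follows for the standard matrix form $v_\pi = r_\pi + \gamma P_\pi v_\pi$: the closed form solves $(I - \gamma P_\pi) v_\pi = r_\pi$, and the iterative form repeats $v_{k+1} = r_\pi + \gamma P_\pi v_k$. The two-state, two-action MDP, the policy, and all function names below are hypothetical, chosen only for illustration; the code uses plain Python lists rather than a numerical library.

```python
def policy_model(pi, P, R):
    """Build r_pi[s] = sum_a pi[s][a] * R[s][a] and
    P_pi[s][t] = sum_a pi[s][a] * P[s][a][t]."""
    n = len(pi)
    r_pi = [sum(pi[s][a] * R[s][a] for a in range(len(pi[s]))) for s in range(n)]
    P_pi = [[sum(pi[s][a] * P[s][a][t] for a in range(len(pi[s]))) for t in range(n)]
            for s in range(n)]
    return r_pi, P_pi


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def evaluate_closed_form(r_pi, P_pi, gamma):
    """Solve (I - gamma * P_pi) v = r_pi."""
    n = len(r_pi)
    A = [[(1.0 if i == j else 0.0) - gamma * P_pi[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(A, r_pi)


def evaluate_iteratively(r_pi, P_pi, gamma, tol=1e-12, max_iters=100000):
    """Iterate v_{k+1} = r_pi + gamma * P_pi v_k until the update is below tol."""
    n = len(r_pi)
    v = [0.0] * n
    for _ in range(max_iters):
        new_v = [r_pi[s] + gamma * sum(P_pi[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def action_values(P, R, v, gamma):
    """q[s][a] = R[s][a] + gamma * sum_t P[s][a][t] * v[t]."""
    return [[R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(len(v)))
             for a in range(len(R[s]))] for s in range(len(R))]


# Hypothetical MDP: P[s][a][t] is the probability of moving from s to t under a,
# R[s][a] is the expected immediate reward, pi[s][a] is the policy.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],
     [1.0, 0.0]]
pi = [[0.5, 0.5],
      [1.0, 0.0]]
gamma = 0.9

r_pi, P_pi = policy_model(pi, P, R)
v_closed = evaluate_closed_form(r_pi, P_pi, gamma)
v_iter = evaluate_iteratively(r_pi, P_pi, gamma)
q = action_values(P, R, v_closed, gamma)
print("closed form:", v_closed)
print("iterative:  ", v_iter)
print("action values:", q)
```

The closed form needs a linear solve, which becomes expensive for large state spaces, whereas the iterative form only needs repeated matrix-vector products. The last function also shows how action values follow from state values, and averaging them under the policy recovers the state value of each state.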
Later
Policy evaluation is used widely later on.