Action value
- State value: the expected return the agent can get starting from a state and following the policy
- Action value: the expected return the agent can get starting from a state, taking a given action, and then following the policy
Definition:
- q_π(s, a) = E[G_t | S_t = s, A_t = a]
- it is a function of the state-action pair (s, a)
- the function depends on the policy π
Recall that the state value can be decomposed over actions:
v_π(s) = E[G_t | S_t = s] = Σ_a π(a|s) E[G_t | S_t = s, A_t = a]
Hence,
v_π(s) = Σ_a π(a|s) q_π(s, a)
We can also expand the action value by one step, mirroring the derivation of the state value:
q_π(s, a) = Σ_r p(r|s, a) r + γ Σ_s' p(s'|s, a) v_π(s')
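The two relations above can be checked numerically. The sketch below uses a made-up two-state, two-action MDP (all probabilities and rewards are illustrative assumptions, not from the notes): it solves for v_π, computes q_π by the one-step expansion, and verifies that averaging q_π under the policy recovers v_π.

```python
import numpy as np

gamma = 0.9
n_s, n_a = 2, 2
# Hypothetical model: P[s, a, s'] = transition probabilities, R[s, a] = expected reward
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [2.0, -1.0]])
# A fixed stochastic policy pi[s, a]
pi = np.array([[0.5, 0.5],
               [0.3, 0.7]])

# Policy-induced transition matrix and reward vector
P_pi = np.einsum('sa,sat->st', pi, P)
r_pi = (pi * R).sum(axis=1)

# Solve the Bellman equation v = r_pi + gamma * P_pi v for v_pi
v = np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)

# One-step expansion: q(s,a) = R[s,a] + gamma * sum_s' P[s,a,s'] v(s')
q = R + gamma * (P @ v)

# Policy-weighted average of q recovers v: v(s) = sum_a pi(a|s) q(s,a)
print(np.allclose((pi * q).sum(axis=1), v))  # True
```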
Matrix form
Less is more: writing the relations above for all states at once, we obtain six matrix formulas.
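The payoff of the matrix form is that v_π = r_π + γ P_π v_π can be solved directly as v_π = (I − γ P_π)⁻¹ r_π, or by fixed-point iteration. A minimal sketch with a made-up three-state chain (numbers are illustrative assumptions) shows the two methods agree:

```python
import numpy as np

gamma = 0.9
# Hypothetical policy-induced transition matrix and reward vector
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.0, 0.0, 1.0]])
r_pi = np.array([0.0, 1.0, 0.0])

# Closed form: v = (I - gamma * P_pi)^{-1} r_pi
v_closed = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)

# Fixed-point iteration: v <- r_pi + gamma * P_pi v (a contraction for gamma < 1)
v = np.zeros(3)
for _ in range(500):
    v = r_pi + gamma * P_pi @ v

print(np.allclose(v, v_closed))  # True
```

The iterative version is exactly the value-iteration-style update written in matrix form; the contraction factor γ guarantees convergence to the unique solution.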
All in one: substituting v_π(s) = Σ_a π(a|s) q_π(s, a) into the one-step expansion gives a Bellman equation purely in terms of action values.
Two sides of the coin: state value and action value determine each other. v_π is obtained from q_π by averaging over the policy; q_π is obtained from v_π by a one-step look-ahead.
Three basic formulas we obtained during the derivation of the state value:
- v_π(s) = Σ_a π(a|s) q_π(s, a)
- q_π(s, a) = Σ_r p(r|s, a) r + γ Σ_s' p(s'|s, a) v_π(s')
- v_π(s) = Σ_a π(a|s) [Σ_r p(r|s, a) r + γ Σ_s' p(s'|s, a) v_π(s')]  (the Bellman equation)
Some interesting observations
One interesting point emerges at the end: the action value and the policy have the same shape. In other words, what we are really doing is aggregating the value of each action, weighted by the policy probabilities, to obtain the value of the current state. This is very intuitive, and it elegantly pairs each action probability (the policy) one-to-one with its action value. It also resolves a confusion I had when first learning reinforcement learning about the relationship between the Q-table and the policy. Of course, we can also directly pick the action with the largest action value, which yields the best policy for the current state; multiplying that policy with Q then naturally attains the maximum state value, the best of both worlds.
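The greedy-selection idea above can be sketched in a few lines. For a single hypothetical state with made-up action values, putting all probability on the argmax action attains the maximum state value, and no other stochastic policy can do better:

```python
import numpy as np

# Made-up action values for one state with three actions
q = np.array([1.0, 3.0, 2.0])

# Greedy policy: all probability mass on the best action
greedy = np.zeros_like(q)
greedy[np.argmax(q)] = 1.0

# Any policy's state value is a policy-weighted average of q,
# so the greedy policy attains the maximum possible state value.
some_pi = np.array([0.2, 0.5, 0.3])
v_some = some_pi @ q      # 2.3
v_greedy = greedy @ q     # 3.0

print(v_greedy == q.max() and v_greedy >= v_some)  # True
```

Since a weighted average can never exceed the largest weight's value, v_π(s) = Σ_a π(a|s) q_π(s, a) ≤ max_a q_π(s, a), with equality exactly for the greedy policy.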