Policy

$\pi(a \mid s)$: At state $s$, the probability of choosing action $a$ is $\pi(a \mid s)$.

A policy tells the agent which action to take in each state.

Intuitive representation: the arrow drawn in each state shows the action the policy takes there.


Math representation: using conditional probability

Deterministic policy at state $s$: exactly one action has probability 1, and every other action has probability 0.


Stochastic policy at state $s$: the probability is spread over more than one action.
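
As a small illustration of the difference (the five actions $a_1, \dots, a_5$ and the specific probabilities are assumed here for the example, not taken from the original figures), a deterministic policy at a state $s$ might be

$$\pi(a_2 \mid s) = 1, \qquad \pi(a_i \mid s) = 0 \ \text{for } i \neq 2,$$

whereas a stochastic policy at the same state might be

$$\pi(a_2 \mid s) = 0.5, \quad \pi(a_3 \mid s) = 0.5, \quad \pi(a_i \mid s) = 0 \ \text{for } i \notin \{2, 3\}.$$

In both cases the probabilities satisfy $\sum_{a} \pi(a \mid s) = 1$.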

How to represent a policy in code

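A policy is usually represented as a vector or a matrix. The most intuitive form is the tabular representation: a 9x5 table with one row per state and one column per action. It can therefore be stored directly as a matrix with policy.shape=(9,5), where each row is a probability distribution over actions, so policy.sum(dim=1)=[1,1,1,...,1].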
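
The tabular representation can be sketched as follows. This is a minimal example, assuming NumPy rather than the tensor library implied by `dim=1` (NumPy spells that argument `axis=1`). The helper `sample_action`, the uniform initialization, and the choice of which state to make deterministic are illustrative assumptions, not part of the original notes.

```python
import numpy as np

n_states, n_actions = 9, 5  # sizes taken from the 9x5 table in the text

# Stochastic policy: start from a uniform distribution over actions in every state.
policy = np.full((n_states, n_actions), 1.0 / n_actions)

# Make state 0 deterministic: all probability on action index 1 (an arbitrary choice).
policy[0] = 0.0
policy[0, 1] = 1.0

# Every row must be a probability distribution over actions.
row_sums = policy.sum(axis=1)  # axis=1 in NumPy plays the role of dim=1
assert policy.shape == (9, 5)
assert np.allclose(row_sums, 1.0)


def sample_action(policy, state, rng):
    """Hypothetical helper: draw action index a with probability policy[state, a]."""
    return int(rng.choice(policy.shape[1], p=policy[state]))


rng = np.random.default_rng(0)
a_det = sample_action(policy, 0, rng)  # always 1, because state 0 is deterministic
a_sto = sample_action(policy, 4, rng)  # one of 0..4, each with probability 0.2
```

Here `policy[s, a]` stores $\pi(a \mid s)$, so looking up a state's row gives the full action distribution for that state, and a deterministic policy is simply the special case where each row is one-hot.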