Policy
$\pi(a|s)$: At state $s$, the probability of choosing action $a$ is $\pi(a|s)$.
A policy tells the agent which action to take in each state.
Intuitive representation: the arrows demonstrate a policy
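Math representation: using conditional probability.
Deterministic policy at state $s$: $\pi(a|s) = 1$ for exactly one action $a$ and $\pi(a|s) = 0$ for every other action.
Stochastic policy at state $s$: the probability may be spread over several actions, with $\pi(a|s) \ge 0$ and $\sum_a \pi(a|s) = 1$.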
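To make the difference between a deterministic and a stochastic policy concrete, here is a minimal sketch for a single state. It assumes 5 available actions; the action indices, the probability values, and the helper `sample_action` are illustrative choices, not taken from the notes.

```python
import numpy as np

# Assumption: 5 actions are available at this state.
NUM_ACTIONS = 5

# Deterministic policy: all probability mass on one action
# (index 1 is an arbitrary, illustrative choice).
deterministic = np.zeros(NUM_ACTIONS)
deterministic[1] = 1.0

# Stochastic policy: mass split between two actions (illustrative values).
stochastic = np.array([0.0, 0.5, 0.5, 0.0, 0.0])


def sample_action(probs, rng):
    """Draw an action index a with probability probs[a], i.e. pi(a|s)."""
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
det_samples = {sample_action(deterministic, rng) for _ in range(100)}
sto_samples = {sample_action(stochastic, rng) for _ in range(100)}

print(det_samples)  # only the single action with probability 1 appears
print(sto_samples)  # only actions with nonzero probability appear
```

Sampling from the deterministic vector always returns the same action, while the stochastic vector can return any action whose probability is nonzero.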
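How to represent a policy in code
A policy is usually represented as a vector or a matrix. The most intuitive starting point is the tabular representation: a 9x5 table (one row per state, one column per action), which can be stored directly as a matrix in which every row sums to 1:
policy.shape = (9, 5), policy.sum(dim=1) = [1, 1, 1, ..., 1]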
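The tabular representation can be sketched as follows. This is a minimal example that assumes NumPy, where `axis=1` plays the role of `dim=1` in the notation above; the uniform initialization, the helpers `is_valid_policy` and `act`, and the chosen state and action indices are all illustrative assumptions.

```python
import numpy as np

NUM_STATES, NUM_ACTIONS = 9, 5  # sizes taken from the 9x5 table in the text

# Start from a uniform stochastic policy: every action equally likely in every state.
policy = np.full((NUM_STATES, NUM_ACTIONS), 1.0 / NUM_ACTIONS)

# Make state 0 deterministic (action index 2 is an arbitrary choice).
policy[0] = np.eye(NUM_ACTIONS)[2]


def is_valid_policy(p):
    """Each row must be a probability distribution over actions."""
    return bool(np.all(p >= 0) and np.allclose(p.sum(axis=1), 1.0))


def act(p, state, rng):
    """Sample an action for `state` according to row p[state]."""
    return int(rng.choice(p.shape[1], p=p[state]))


rng = np.random.default_rng(0)
print(policy.shape)             # (9, 5)
print(is_valid_policy(policy))  # True
print(act(policy, 0, rng))      # 2, because state 0 is deterministic
```

Row `policy[s]` is the distribution $\pi(\cdot|s)$, so looking up the agent's behavior in a state is a single row index followed by a sample.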