Q-learning with Value Function Approximation
Introduction
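The Q-learning algorithm with value function approximation is

$$w_{t+1} = w_t + \alpha_t \left[ r_{t+1} + \gamma \max_{a \in \mathcal{A}(s_{t+1})} \hat{q}(s_{t+1}, a, w_t) - \hat{q}(s_t, a_t, w_t) \right] \nabla_w \hat{q}(s_t, a_t, w_t)$$

where $\hat{q}(s, a, w)$ is the approximate action value with parameter vector $w$, $\alpha_t$ is the learning rate, and $\gamma$ is the discount rate.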
Core Idea
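Replace $\hat{q}(s_{t+1}, a_{t+1}, w_t)$ with $\max_{a \in \mathcal{A}(s_{t+1})} \hat{q}(s_{t+1}, a, w_t)$ in the TD target of VF Sarsa (Sarsa with value function approximation). Everything else in the update stays the same.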
Implementation
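The original pseudocode is not reproduced here, so the following is only a minimal sketch of the idea, not the version from the slides. It assumes a linear approximator $\hat{q}(s, a, w) = \phi(s, a)^T w$ with one-hot features, so the gradient is simply $\phi(s, a)$. The chain environment, the function names, and all hyperparameter values are hypothetical and chosen only for illustration. Data is generated with an $\epsilon$-greedy policy, and the TD target uses the max over next actions.

```python
import random

N_STATES, N_ACTIONS = 5, 2  # hypothetical chain: states 0..4, actions 0=left, 1=right
GOAL = N_STATES - 1


def features(s, a):
    """Assumed feature map: one-hot over (state, action) pairs."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def q_hat(w, s, a):
    """Linear approximation q_hat(s, a, w) = phi(s, a)^T w."""
    return sum(p * wi for p, wi in zip(features(s, a), w))


def greedy_action(w, s):
    """Action with the largest approximate value (ties go to the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: q_hat(w, s, a))


def epsilon_greedy_action(w, s, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    return greedy_action(w, s)


def step(s, a):
    """Toy deterministic dynamics: reward 1 on reaching the goal, else 0."""
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    done = s_next == GOAL
    return s_next, (1.0 if done else 0.0), done


def q_learning_update(w, s, a, r, s_next, done, alpha, gamma):
    """One step of w <- w + alpha * TD error * grad_w q_hat(s, a, w)."""
    # The max over next actions is what distinguishes this from VF Sarsa.
    max_next = 0.0 if done else max(q_hat(w, s_next, b) for b in range(N_ACTIONS))
    td_error = r + gamma * max_next - q_hat(w, s, a)
    # For a linear q_hat the gradient with respect to w is phi(s, a).
    return [wi + alpha * td_error * p for wi, p in zip(w, features(s, a))]


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.5, max_steps=100, seed=0):
    """Run Q-learning on the toy chain and return the learned weights."""
    rng = random.Random(seed)
    w = [0.0] * (N_STATES * N_ACTIONS)
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = epsilon_greedy_action(w, s, epsilon, rng)
            s_next, r, done = step(s, a)
            w = q_learning_update(w, s, a, r, s_next, done, alpha, gamma)
            s = s_next
            if done:
                break
    return w
```

Calling `train()` and then `greedy_action(w, s)` for each non-goal state should give the "move right" action on this toy chain. With a neural network instead of one-hot features, only `q_hat` and the gradient term would change.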
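Some Thoughts
Professor Zhao's slides describe the algorithm above as on-policy, but I don't think that is right. I believe it is off-policy, for the following reason: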
- behavior policy: $\epsilon$-greedy
- target policy: greedy with respect to $\hat{q}$ (implied by the max over actions in the TD target)
The two policies are different, so the algorithm is off-policy.
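To make this comparison concrete, here is a small sketch that computes the action probabilities of both policies in a single state. It assumes the target policy is the greedy policy implied by the max in the TD target. The function names and the example action values are hypothetical.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Behavior policy: every action gets epsilon/|A|, the greedy one gets the rest."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def greedy_probs(q_values):
    """Assumed target policy: all probability on the greedy action."""
    return epsilon_greedy_probs(q_values, 0.0)


q_values = [1.0, 3.0, 2.0]  # made-up approximate action values for one state
behavior = epsilon_greedy_probs(q_values, epsilon=0.3)
target = greedy_probs(q_values)
print("behavior:", behavior)
print("target:  ", target)
```

For any $\epsilon > 0$ the two distributions differ, which is the sense in which the note above treats the algorithm as off-policy. The distributions coincide only when $\epsilon = 0$.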