Sarsa with Value Function Approximation

Introduction

So far, we have only considered state value estimation. That is, we approximated the state value $v_\pi(s)$ with a parameterized function $\hat{v}(s, w)$.

To search for optimal policies, we need to estimate action values.

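The Sarsa algorithm with value function approximation is

$$w_{t+1} = w_t + \alpha_t \left[ r_{t+1} + \gamma \hat{q}(s_{t+1}, a_{t+1}, w_t) - \hat{q}(s_t, a_t, w_t) \right] \nabla_w \hat{q}(s_t, a_t, w_t),$$

where $w_t$ is the parameter vector, $\alpha_t$ is the step size, and $\gamma$ is the discount rate.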

This is the same as the algorithm we introduced previously in this lecture, except that $\hat{v}$ is replaced by $\hat{q}$.
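
To make the update concrete, here is a minimal sketch of a single Sarsa step with function approximation. It assumes a linear approximator $\hat{q}(s, a, w) = \phi(s, a)^T w$, which the notes do not specify; the function name `sarsa_fa_update`, the feature vectors, and the default step size and discount rate are all hypothetical choices for illustration.

```python
import numpy as np

def sarsa_fa_update(w, phi_sa, reward, phi_next, alpha=0.1, gamma=0.9):
    """One Sarsa update, assuming a linear q_hat(s, a, w) = phi(s, a) @ w."""
    # TD target built from the next state-action pair actually taken.
    td_target = reward + gamma * (phi_next @ w)
    td_error = td_target - phi_sa @ w
    # For a linear q_hat, the gradient with respect to w is the feature vector.
    return w + alpha * td_error * phi_sa

# Hypothetical feature vectors for (s_t, a_t) and (s_{t+1}, a_{t+1}).
w = np.zeros(3)
phi_sa = np.array([1.0, 0.0, 0.0])
phi_next = np.array([0.0, 1.0, 0.0])
w_new = sarsa_fa_update(w, phi_sa, reward=1.0, phi_next=phi_next)
```

With zero initial weights the TD error equals the reward, so only the component of `w` matching the active feature moves, by `alpha * reward`.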

Core Idea

Replace $\hat{v}$ with $\hat{q}$ in the value function approximation (VF) algorithm for state values.

Implementation

Compare this with the implementation in the earlier Implementation section to see what is improved here.
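
The original implementation is not reproduced in these notes, so the following is only a sketch of what a Sarsa loop with function approximation might look like. Everything in it is an assumption made for illustration: the five-state chain environment, the one-hot features (which make the linear approximator equivalent to a table), the epsilon-greedy policy, and all hyperparameters. Relative to a state-value version, the visible change is that `q_hat(s, a, w)` takes the place of a state-only estimate.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2  # hypothetical chain: action 0 = left, 1 = right
GOAL = N_STATES - 1

def features(s, a):
    """One-hot feature vector for the pair (s, a); a deliberately simple choice."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

def q_hat(s, a, w):
    """Linear action value estimate phi(s, a) @ w."""
    return features(s, a) @ w

def step(s, a):
    """Toy dynamics: move left or right; reward 1 on reaching the goal."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, GOAL)
    return s_next, float(s_next == GOAL), s_next == GOAL

def epsilon_greedy(s, w, eps, rng):
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    q = np.array([q_hat(s, a, w) for a in range(N_ACTIONS)])
    # Break ties at random so the untrained agent does not always go left.
    return int(rng.choice(np.flatnonzero(q == q.max())))

def train(episodes=300, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(s, w, eps, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            a_next = epsilon_greedy(s_next, w, eps, rng)
            # The value of a terminal state is taken to be zero.
            target = r if done else r + gamma * q_hat(s_next, a_next, w)
            # Gradient of the linear q_hat is the feature vector of (s, a).
            w += alpha * (target - q_hat(s, a, w)) * features(s, a)
            s, a = s_next, a_next
    return w

w = train()
```

In this toy setup, moving right from the state next to the goal always yields reward 1 and ends the episode, so `q_hat(GOAL - 1, 1, w)` approaches 1 as training proceeds. Richer feature maps can be swapped into `features` without touching the update itself.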