Algorithm: SARSA with Function Approximation and ε-Greedy

Initialization: initial parameter vector $w_0$, initial policy $\pi_0(a \mid s)$, step size $\alpha \in (0, 1]$, discount factor $\gamma$.

For each episode, do:
    Initialize state $s_0$ and select action $a_0$ according to $\pi_0(a \mid s_0)$.
    While $s_t$ is not the target state, do:
        Execute $a_t$; observe the reward $r_{t+1}$ and the next state $s_{t+1}$.
        Select $a_{t+1}$ according to $\pi_t(a \mid s_{t+1})$.
        Update the parameters (q-value update):
        $$w_{t+1} \leftarrow w_t + \alpha \big[ r_{t+1} + \gamma \hat{q}(s_{t+1}, a_{t+1}, w_t) - \hat{q}(s_t, a_t, w_t) \big] \nabla_w \hat{q}(s_t, a_t, w_t)$$
        Update the policy at $s_t$ (ε-greedy):
        $$\pi_{t+1}(a \mid s_t) \leftarrow \begin{cases} 1 - \varepsilon + \dfrac{\varepsilon}{|\mathcal{A}(s_t)|} & \text{if } a = \arg\max_{a'} \hat{q}(s_t, a', w_{t+1}) \\[4pt] \dfrac{\varepsilon}{|\mathcal{A}(s_t)|} & \text{otherwise} \end{cases}$$
        $s_t \leftarrow s_{t+1}$, $a_t \leftarrow a_{t+1}$.
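To make the updates concrete, here is a minimal runnable sketch in Python. It assumes a linear approximator $\hat{q}(s, a, w) = w^\top \phi(s, a)$ with a one-hot feature map, and a toy chain environment (`LineWorld`) invented purely for illustration; none of these names come from the source. With one-hot features, $\nabla_w \hat{q}(s, a, w) = \phi(s, a)$, so the parameter update reduces to the familiar TD step.

```python
import numpy as np

N_STATES, N_ACTIONS = 10, 2        # toy sizes, chosen for illustration only
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1

def phi(s, a):
    """One-hot feature vector for the (s, a) pair (an assumed feature map)."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def q_hat(s, a, w):
    """Linear approximator: q_hat(s, a, w) = w . phi(s, a)."""
    return w @ phi(s, a)

def select_action(s, w, rng):
    """epsilon-greedy over q_hat: the greedy action gets 1 - eps + eps/|A|,
    every other action gets eps/|A| (the policy-update rule above)."""
    probs = np.full(N_ACTIONS, EPSILON / N_ACTIONS)
    greedy = int(np.argmax([q_hat(s, a, w) for a in range(N_ACTIONS)]))
    probs[greedy] += 1.0 - EPSILON
    return int(rng.choice(N_ACTIONS, p=probs))

class LineWorld:
    """Hypothetical 1-D chain: action 1 moves right, action 0 moves left;
    reaching the rightmost (target) state gives reward 1 and ends the episode."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(N_STATES - 1, self.s + 1) if a == 1 else max(0, self.s - 1)
        done = self.s == N_STATES - 1
        return self.s, float(done), done

def sarsa(env, n_episodes=500, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(N_STATES * N_ACTIONS)       # initial parameters w_0
    for _ in range(n_episodes):
        s = env.reset()                      # initialize s_0
        a = select_action(s, w, rng)         # select a_0 from pi_0(. | s_0)
        done = False
        while not done:                      # until the target state is reached
            s_next, r, done = env.step(a)            # execute a_t, observe r_{t+1}, s_{t+1}
            a_next = select_action(s_next, w, rng)   # select a_{t+1} from pi_t(. | s_{t+1})
            # TD target; the bootstrap term is taken as zero at the target state.
            target = r + (0.0 if done else GAMMA * q_hat(s_next, a_next, w))
            w += ALPHA * (target - q_hat(s, a, w)) * phi(s, a)  # grad of linear q_hat is phi
            s, a = s_next, a_next            # s_t <- s_{t+1}, a_t <- a_{t+1}
    return w

if __name__ == "__main__":
    w = sarsa(LineWorld())
    print([round(float(q_hat(s, 1, w)), 3) for s in range(N_STATES)])
```

One design choice to note: the pseudocode bootstraps on $\hat{q}(s_{t+1}, a_{t+1}, w_t)$ unconditionally, while the sketch zeroes the bootstrap term once the target state is reached, the usual convention for episodic tasks.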