Episode

When interacting with the environment by following a policy, the agent may stop at some terminal state. The resulting trajectory is called an episode (or a trial).

Would rendering "trial" in Chinese as 一次尝试 ("an attempt") be a bit better?

An episode is usually assumed to be a finite trajectory, for example:

$$s_1 \xrightarrow[r=0]{a_2} s_2 \xrightarrow[r=0]{a_3} s_5 \xrightarrow[r=0]{a_3} s_8 \xrightarrow[r=1]{a_2} s_9$$
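The example episode can be replayed with a minimal Python sketch. It assumes a 3x3 grid world that these notes do not define: states s1 to s9 numbered row by row, s9 as the terminal target, actions a1 to a5 meaning up/right/down/left/stay, a reward of 1 for entering the target and 0 otherwise (including bumping into a wall), and a hypothetical deterministic policy. Under these assumptions the rollout stops at the terminal state and reproduces the trajectory s1, s2, s5, s8, s9.

```python
# Minimal sketch: roll out one episode in a HYPOTHETICAL 3x3 grid world.
# Assumptions (not from the notes): states 1..9 numbered row by row,
# state 9 is the terminal target, actions 1..5 = up/right/down/left/stay.

MOVES = {1: (-1, 0), 2: (0, 1), 3: (1, 0), 4: (0, -1), 5: (0, 0)}
N = 3        # grid side length
TARGET = 9   # terminal state (assumed)

def step(state, action):
    """Deterministic transition; returns (next_state, reward)."""
    row, col = divmod(state - 1, N)
    dr, dc = MOVES[action]
    nr, nc = row + dr, col + dc
    if not (0 <= nr < N and 0 <= nc < N):
        nr, nc = row, col  # hitting the boundary leaves the agent in place (assumed r = 0)
    next_state = nr * N + nc + 1
    reward = 1 if next_state == TARGET else 0
    return next_state, reward

def rollout(policy, start, max_steps=100):
    """Follow `policy` from `start` until the terminal state is reached."""
    episode = []
    state = start
    for _ in range(max_steps):
        if state == TARGET:
            break  # terminal state: the episode ends here
        action = policy[state]
        next_state, reward = step(state, action)
        episode.append((state, action, reward))
        state = next_state
    return episode, state

# Hypothetical deterministic policy: right, down, down, right.
policy = {1: 2, 2: 3, 5: 3, 8: 2}
episode, last = rollout(policy, start=1)
print(episode, last)
# [(1, 2, 0), (2, 3, 0), (5, 3, 0), (8, 2, 1)] 9
```

The `max_steps` cap is only a safety net for policies that never reach the target; with it, a rollout is always a finite trajectory.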

Tasks with episodes are called episodic tasks.

Episodic tasks vs. continuing tasks

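Some tasks have no terminal states, or their trajectories are so long that interaction with the environment effectively never ends. These are called continuing tasks. I think "continuing" is best rendered in Chinese as 持续. Continual learning for LLMs is also a very attractive direction right now; in my view, only continual learning that closes the loop would be the AGI I have in mind.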

In fact, episodic tasks and continuing tasks can be treated in a unified mathematical way by converting episodic tasks into continuing tasks:

option1: treat the target state as a special absorbing (吸收) state: once the agent reaches it, it never leaves, and all subsequent rewards are r = 0.
option2: treat the target state as a normal state: the agent can still leave it, and it still gains r = +1 whenever it enters the target state.

option2 is the one used later on, so the target state does not need to be distinguished from the other states.
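The difference between the two options can be seen in a small Python sketch. It assumes a hypothetical 1-D corridor with states 0 to 3, target state 3, and actions -1 (left), +1 (right) and 0 (stay); the action sequence is arbitrary rather than produced by a policy. Under option1 the agent is stuck at the target and collects nothing more, while under option2 it can leave and re-collect +1 every time it lands on the target again (including by staying there).

```python
# Sketch contrasting the two ways to turn an episodic task into a continuing one.
# HYPOTHETICAL 1-D corridor: states 0..3, target 3, actions -1 / +1 / 0 (stay).

TARGET = 3
N_STATES = 4

def step(state, action, absorbing):
    """Return (next_state, reward) under option1 (absorbing=True) or option2."""
    if absorbing and state == TARGET:
        # option1: absorbing target, never leave, every later reward is 0
        return TARGET, 0
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls clamp the move
    # option2 (and any non-target state): +1 whenever a transition lands on the target
    reward = 1 if next_state == TARGET else 0
    return next_state, reward

def run(actions, absorbing, start=0):
    """Apply a fixed action sequence; the interaction never 'ends' by itself."""
    state, states, rewards = start, [], []
    for a in actions:
        state, r = step(state, a, absorbing)
        states.append(state)
        rewards.append(r)
    return states, rewards

actions = [+1, +1, +1, 0, 0, -1, +1]
opt1 = run(actions, absorbing=True)
opt2 = run(actions, absorbing=False)
print(opt1)  # ([1, 2, 3, 3, 3, 3, 3], [0, 0, 1, 0, 0, 0, 0])
print(opt2)  # ([1, 2, 3, 3, 3, 2, 3], [0, 0, 1, 1, 1, 0, 1])
```

In both cases the interaction has no built-in end, which is what makes the episodic task look like a continuing one; the options differ only in what happens after the target is reached.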
