Episode

When interacting with the environment following a policy, the agent may stop at some terminal state. The resulting trajectory is called an episode (or a trial).

Translation note: "一次尝试" ("one attempt") might be a better Chinese rendering of "episode".

An episode is usually assumed to be a finite trajectory.

Tasks with episodes are called episodic tasks.
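
As a minimal sketch of what an episode looks like in code, consider a hypothetical 1-D chain with states 0..3 where state 3 is terminal. The environment, the `step` and `policy` functions, and the reward of +1 on reaching the terminal state are all assumptions made for illustration; they are not part of the notes above.

```python
# Hypothetical 1-D chain: states 0..3, state 3 is the terminal (target) state.
TERMINAL = 3

def step(state, action):
    """Move by `action` (-1 or +1), clipped to [0, TERMINAL].

    Assumed reward: +1 when the next state is the terminal state, else 0.
    """
    next_state = min(max(state + action, 0), TERMINAL)
    reward = 1 if next_state == TERMINAL else 0
    return next_state, reward

def policy(state):
    """A fixed illustrative policy: always move right."""
    return +1

def run_episode(start_state):
    """Follow the policy until a terminal state is reached.

    Returns the trajectory as a list of (state, action, reward) tuples.
    """
    trajectory = []
    state = start_state
    while state != TERMINAL:
        action = policy(state)
        next_state, reward = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory

episode = run_episode(0)
print(episode)
```

Because the policy always moves toward the terminal state, the loop is guaranteed to stop here, which is what makes the trajectory finite.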

Episodic tasks && Continuing tasks

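Some tasks have no terminal states, or have trajectories so long that the interaction with the environment effectively never ends. Such tasks are called continuing tasks.

Translation note: "持续" seems like the better Chinese rendering of "continuing". Continual learning for LLMs is also a very appealing direction right now; to my mind, continual learning that closes the loop is what AGI should look like.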

In fact, we can treat episodic tasks and continuing tasks in a unified mathematical way by converting episodic tasks to continuing tasks. There are two options:

  • option 1: treat the target state as a special absorbing state (once the agent reaches it, it never leaves); all subsequent rewards are r=0
  • option 2: treat the target state as a normal state; the agent can still leave it, and it gains r=+1 each time it enters the target state
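
The two options can be compared with a small sketch. The chain environment (states 0..3 with target 3), the always-move-right policy, and the function names below are hypothetical; the sketch also assumes that an action clipped at the target counts as re-entering it. It only illustrates how the reward sequence differs once the target has been reached.

```python
TARGET = 3  # hypothetical chain with states 0..3; state 3 is the target

def step_absorbing(state, action):
    """Option 1: the target is absorbing. Once reached, the agent stays with r=0."""
    if state == TARGET:
        return TARGET, 0
    next_state = min(max(state + action, 0), TARGET)
    return next_state, (1 if next_state == TARGET else 0)

def step_normal(state, action):
    """Option 2: the target is an ordinary state. Every entry into it gives r=+1.

    Assumption: moving right at the target is clipped, so the agent re-enters it.
    """
    next_state = min(max(state + action, 0), TARGET)
    return next_state, (1 if next_state == TARGET else 0)

def rewards(step_fn, start_state, horizon):
    """Collect the rewards over `horizon` steps under an always-move-right policy."""
    state, collected = start_state, []
    for _ in range(horizon):
        state, reward = step_fn(state, +1)
        collected.append(reward)
    return collected

print(rewards(step_absorbing, 0, 5))  # rewards stop after the target is reached
print(rewards(step_normal, 0, 5))     # rewards keep arriving at the target
```

In both cases the interaction can run for any number of steps, so the episodic task has been turned into a continuing one; the options differ only in what happens after the target is reached.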

From here on we use option 2, so that the target state does not need to be distinguished from the other states.