Episode
When interacting with the environment by following a policy, the agent may stop at some terminal state. The resulting trajectory is called an episode (or a trial).
(Note: perhaps "one attempt" (一次尝试) would be a better Chinese translation of this term.)
An episode is usually a finite trajectory.
Tasks with episodes are called episodic tasks.
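As a minimal sketch of generating an episode, assume a hypothetical 1-D corridor with states 0..4, where state 4 is the terminal (target) state and entering it gives reward +1. The names `step`, `policy` and `generate_episode` are illustrative only. The agent follows a fixed policy until it reaches the terminal state, and the finite list of (state, action, reward) tuples it records is one episode.

```python
from typing import List, Tuple

# Hypothetical toy environment (not from the notes): a 1-D corridor of
# states 0..4, where state 4 is the terminal (target) state.
N_STATES = 5
TARGET = 4
ACTIONS = {"left": -1, "right": +1}


def step(state: int, action: str) -> Tuple[int, float]:
    """Move one cell; bumping into a wall leaves the state unchanged.
    Entering the target yields reward +1, every other move yields 0."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == TARGET else 0.0
    return next_state, reward


def policy(state: int) -> str:
    """A fixed deterministic policy (assumption): always move right."""
    return "right"


def generate_episode(start: int) -> List[Tuple[int, str, float]]:
    """Follow the policy until the terminal state is reached.
    The resulting finite trajectory is one episode."""
    episode = []
    state = start
    while state != TARGET:
        action = policy(state)
        next_state, reward = step(state, action)
        episode.append((state, action, reward))
        state = next_state
    return episode


if __name__ == "__main__":
    for s, a, r in generate_episode(start=1):
        print(s, a, r)
```

Starting from state 1, the episode is (1, right, 0.0), (2, right, 0.0), (3, right, 1.0). The loop ends only because this policy eventually reaches the terminal state; a policy that never does would never produce a finite episode.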
Episodic tasks && Continuing tasks
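Some tasks have no terminal states (or have trajectories so long that they can be treated as never ending), so the interaction with the environment never ends. We call these continuing tasks.
(I feel this term is better translated into Chinese as 持续, "continuing". Continual learning for LLMs is also a very attractive direction right now; to me, continual learning that closes the loop is what AGI should be.)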
In fact, episodic and continuing tasks can be treated in a unified mathematical way by converting episodic tasks into continuing tasks:
- Option 1: treat the target state as a special absorbing state, meaning that once the agent reaches it, it never leaves, and all subsequent rewards are r = 0.
- Option 2: treat the target state as a normal state: the agent can still leave it, and it still gains r = +1 whenever it enters the target state.
We adopt option 2 from here on, so that the target state does not need to be distinguished from the other states.
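To make the two options concrete, here is a minimal Python sketch on a hypothetical 1-D corridor (states 0..4, target 4). The names `step_absorbing`, `step_normal`, `rollout_rewards` and the policy `stay_at_target` are illustrative, not from any library. Neither step function ever signals termination, so the episodic task has become a continuing one; the only difference is what happens at the target. Counting a "stay" at the target as entering it again is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical toy setup (not from the notes): a 1-D corridor with
# states 0..4; state 4 is the target.
N_STATES = 5
TARGET = 4
MOVES = {"left": -1, "stay": 0, "right": +1}

StepFn = Callable[[int, str], Tuple[int, float]]
Policy = Callable[[int], str]


def move(state: int, action: str) -> int:
    """Deterministic move; walls clamp the state to [0, N_STATES - 1]."""
    return min(max(state + MOVES[action], 0), N_STATES - 1)


def step_absorbing(state: int, action: str) -> Tuple[int, float]:
    """Option 1: the target is absorbing. Once there, the agent stays
    whatever action it takes, and every subsequent reward is 0."""
    if state == TARGET:
        return TARGET, 0.0
    next_state = move(state, action)
    return next_state, (1.0 if next_state == TARGET else 0.0)


def step_normal(state: int, action: str) -> Tuple[int, float]:
    """Option 2: the target is an ordinary state. The agent may leave it,
    and it gets +1 whenever the next state is the target (assumption:
    'stay' at the target counts as entering it again)."""
    next_state = move(state, action)
    return next_state, (1.0 if next_state == TARGET else 0.0)


def rollout_rewards(step_fn: StepFn, start: int, policy: Policy,
                    horizon: int) -> List[float]:
    """A continuing task never terminates, so we can only simulate a
    finite prefix of the trajectory: the first `horizon` rewards."""
    rewards = []
    state = start
    for _ in range(horizon):
        state, reward = step_fn(state, policy(state))
        rewards.append(reward)
    return rewards


def stay_at_target(state: int) -> str:
    """Hypothetical policy: walk right, then stay once at the target."""
    return "stay" if state == TARGET else "right"


if __name__ == "__main__":
    print(rollout_rewards(step_absorbing, 2, stay_at_target, 5))
    print(rollout_rewards(step_normal, 2, stay_at_target, 5))
```

Starting from state 2, the absorbing version yields the rewards [0.0, 1.0, 0.0, 0.0, 0.0], while the normal version yields [0.0, 1.0, 1.0, 1.0, 1.0]. Note that `step_normal` contains no special case for the target at all, which is why option 2 lets the target be treated like any other state.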