Reinforcement Learning Notes

Concept Markov Decision Process

Apr 01, 2025

Markov Decision Process (MDP)

Key elements of an MDP

Sets (three sets):
- State: the set of states $S$
- Action: the set of actions $A(s)$ associated with each state $s \in S$
- Reward: the set of rewards $R(s, a)$
Probability distributions (two distributions):
- State transition probability: $p(s' \mid s, a)$
- Reward probability: $p(r \mid s, a)$
Policy: $\pi(a \mid s)$, the probability of taking action $a$ in state $s$
Markov property: the memoryless property

$$
\begin{aligned}
p(s_{t+1} \mid a_{t+1}, s_t, \dots, a_1, s_0) &= p(s_{t+1} \mid a_{t+1}, s_t), \\
p(r_{t+1} \mid a_{t+1}, s_t, \dots, a_1, s_0) &= p(r_{t+1} \mid a_{t+1}, s_t).
\end{aligned}
$$
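The elements listed above can be written down directly as data. The following is a minimal sketch, assuming a made-up two-state MDP: the state names, action names, probabilities, and helper functions (`sample`, `step`, `rollout`) are all illustrative and not part of the notes. It shows that one step of the process needs only the current state, which is the Markov property in practice.

```python
import random

# Hypothetical two-state MDP; every name and number here is illustrative.
STATES = ["s1", "s2"]

# A(s): the available actions depend on the state.
ACTIONS = {"s1": ["stay", "go"], "s2": ["stay"]}

# p(s' | s, a): maps (s, a) to a distribution over next states.
TRANSITION = {
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s1": 0.2, "s2": 0.8},
    ("s2", "stay"): {"s2": 1.0},
}

# p(r | s, a): maps (s, a) to a distribution over rewards.
REWARD = {
    ("s1", "stay"): {0.0: 1.0},
    ("s1", "go"): {1.0: 0.8, -1.0: 0.2},
    ("s2", "stay"): {0.0: 1.0},
}

# pi(a | s): maps s to a distribution over the actions in A(s).
POLICY = {
    "s1": {"stay": 0.5, "go": 0.5},
    "s2": {"stay": 1.0},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def step(state, rng):
    """Sample one step. Only the current state is used, never the history."""
    action = sample(POLICY[state], rng)
    next_state = sample(TRANSITION[(state, action)], rng)
    # The reward is drawn from its own distribution p(r | s, a).
    reward = sample(REWARD[(state, action)], rng)
    return action, reward, next_state


def rollout(start, n_steps, seed=0):
    """Generate a trajectory of (state, action, reward, next_state) tuples."""
    rng = random.Random(seed)
    state = start
    trajectory = []
    for _ in range(n_steps):
        action, reward, next_state = step(state, rng)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory
```

Calling `rollout("s1", 5)` returns five transitions. Because `step` takes only `state` as input, the sampled distributions cannot depend on earlier states or actions.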

Why write it as $A(s)$?

Because different states can have different action spaces.
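As a small illustration of a state-dependent action set, consider a hypothetical one-dimensional corridor in which moves that would leave the corridor are excluded. The corridor, the action names, and the function `available_actions` are assumptions made up for this sketch.

```python
def available_actions(cell, n_cells):
    """Return A(s) for cell index `cell` in a corridor of `n_cells` cells.

    Hypothetical rule: an agent may always stay, and may move left or
    right only if that move keeps it inside the corridor.
    """
    actions = ["stay"]
    if cell > 0:
        actions.append("left")
    if cell < n_cells - 1:
        actions.append("right")
    return actions
```

In a three-cell corridor, the two end cells each have two actions while the middle cell has three, so a single global action set $A$ would not describe this environment exactly.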

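What if someone asks what an MDP is?

Answering "an MDP is a Markov decision process" only restates the name. A better answer unpacks the name word by word:

Markov: the Markov property. After an action is taken in the current state, the probability distributions of the next state and of the reward do not depend on the history.

Decision: the policy, which gives the probability of taking each action in the current state.

Process: the three sets (state, action, reward) together with the state transition probability and the reward probability.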

A Markov decision process becomes a Markov process once the policy is given.
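One way to see this: with a fixed policy, the actions can be averaged out, leaving a state-to-state transition probability $p_\pi(s' \mid s) = \sum_{a} \pi(a \mid s)\, p(s' \mid s, a)$. The sketch below computes it for a made-up two-state example; the function name `induced_chain`, the symbol $p_\pi$, and all numbers are illustrative assumptions.

```python
def induced_chain(policy, transition):
    """Compute p_pi(s' | s) = sum_a pi(a | s) * p(s' | s, a).

    policy:     {state: {action: probability}}
    transition: {(state, action): {next_state: probability}}
    Returns     {state: {next_state: probability}}
    """
    chain = {}
    for state, action_probs in policy.items():
        row = {}
        for action, pi_a in action_probs.items():
            for next_state, p in transition[(state, action)].items():
                row[next_state] = row.get(next_state, 0.0) + pi_a * p
        chain[state] = row
    return chain


# Hypothetical example: two states, with two actions available in "s1".
example_policy = {
    "s1": {"stay": 0.5, "go": 0.5},
    "s2": {"stay": 1.0},
}
example_transition = {
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s1": 0.2, "s2": 0.8},
    ("s2", "stay"): {"s2": 1.0},
}
example_chain = induced_chain(example_policy, example_transition)
```

Each row of `example_chain` is a probability distribution over next states with no action left in it, which is exactly the transition structure of a Markov process.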
