Robbins-Monro algorithm (RM)

Introduction

Stochastic approximation (SA):

  • refers to a broad class of stochastic iterative algorithms for solving root-finding or optimization problems
  • compared with other root-finding methods such as gradient-based ones, SA does not require knowing the expression of the objective function or its derivative

Robbins-Monro (RM) algorithm:

  • pioneering work in the field of stochastic approximation
  • SGD is a special form of the RM algorithm

Problem statement

Suppose we want to solve the following equation:

$$g(w) = 0$$

where $w \in \mathbb{R}$ is the variable to be solved and $g: \mathbb{R} \to \mathbb{R}$ is a continuous function.

  • many problems can be formulated as root-finding problems
  • e.g. solving for the fixed point of a function ($f(w) = w$, i.e. $g(w) \doteq f(w) - w = 0$), finding the solution of a system of equations, or minimizing an objective function $J(w)$ by solving $g(w) \doteq \nabla_w J(w) = 0$
  • method:
    • model-based: the expression of $g(w)$ is known; use numerical methods to solve it
    • model-free: the expression of $g(w)$ is unknown, e.g. the function is a black box (such as a neural network)
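
To make the model-based case concrete, here is a minimal Newton-iteration sketch; the toy function $g(w) = w^3 - 5$ and its derivative are made-up assumptions, usable only because their expressions are fully known:

```python
# Model-based root finding: the expression of g (and its derivative)
# is known, so a classical numerical method applies.
def g(w):
    return w**3 - 5.0           # toy function (assumption); root at 5**(1/3)

def g_prime(w):
    return 3.0 * w**2           # derivative, available only in the model-based case

w = 2.0                         # initial guess
for _ in range(20):
    w -= g(w) / g_prime(w)      # Newton update: w <- w - g(w) / g'(w)

print(w)                        # ~1.70998, i.e. 5**(1/3)
```

The RM algorithm below addresses the opposite, model-free case, where no such expressions are available.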

Robbins-Monro algorithm

The RM algorithm can solve this problem:

$$w_{k+1} = w_k - a_k \tilde{g}(w_k, \eta_k), \qquad k = 1, 2, 3, \ldots$$

where:

  • $w_k$ is the $k$-th estimate of the root
  • $\tilde{g}(w_k, \eta_k) = g(w_k) + \eta_k$ is the $k$-th noisy observation
    • $\eta_k$ is a random noise
    • why noise? because we don’t know the exact value of $g(w_k)$
    • e.g. $\tilde{g}(w_k, \eta_k)$ may be computed from a random sample instead of the true function value (see the mean-estimation example below)
  • $a_k$ is a positive coefficient

  • the model here refers to the expression of the function $g(w)$
  • $g(w)$ is a black box: only the input $w_k$ and the noisy output $\tilde{g}(w_k, \eta_k)$ are accessible

Philosophy: without a model, we need data
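
A minimal sketch of the RM iteration in this model-free spirit: the hidden function $g(w) = 2w - 6$ (root $w^* = 3$) and the Gaussian noise are assumptions for illustration only, and the update touches nothing but the noisy observations:

```python
import random

def noisy_observation(w):
    g = 2.0 * w - 6.0                  # hidden truth (assumption); root w* = 3
    eta = random.gauss(0.0, 1.0)       # zero-mean noise with bounded variance
    return g + eta                     # the algorithm only ever sees g(w) + eta

w = 0.0                                # arbitrary initial estimate w_1
for k in range(1, 100_001):
    a_k = 1.0 / k                      # step sizes satisfying the RM conditions
    w -= a_k * noisy_observation(w)    # w_{k+1} = w_k - a_k * g~(w_k, eta_k)

print(w)                               # ~3.0: the estimate converges to the root
```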

Robbins-Monro Theorem

In the Robbins-Monro algorithm, if the following conditions are satisfied:

  1. $0 < c_1 \le \nabla_w g(w) \le c_2$ for all $w$;
  2. $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$;
  3. $\mathbb{E}[\eta_k \mid \mathcal{H}_k] = 0$ and $\mathbb{E}[\eta_k^2 \mid \mathcal{H}_k] < \infty$;

where $\mathcal{H}_k = \{w_k, w_{k-1}, \ldots\}$ is the history of the algorithm up to time $k$.

Then $w_k$ converges to the root $w^*$ satisfying $g(w^*) = 0$ with probability 1 (w.p.1).

Explanation of the three conditions above

  • $0 < c_1 \le \nabla_w g(w) \le c_2$ for all $w$;
    • ensures that $g$ is monotonically increasing (a monotonically decreasing $g$ can be made increasing by negation)
    • guarantees that the root exists and is unique
    • also shows that the gradient of $g$ is bounded
    • strictly speaking, though, this condition is not necessary: a function that is not monotonically increasing can still have a unique root
    • one may instead state the requirement via convexity: in the optimization setting where $g(w) = \nabla_w J(w)$, this condition is exactly the requirement that $J(w)$ be a convex function
  • $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$;
    1. $\sum_{k=1}^{\infty} a_k = \infty$: guarantees that the step sizes never die out, so the iteration never stops making progress
      • suppose instead that $\sum_{k=1}^{\infty} a_k < \infty$; summing all updates gives $w_1 - w_{\infty} = \sum_{k=1}^{\infty} a_k \tilde{g}(w_k, \eta_k)$, which then stays bounded (for bounded observations), so if $w_1$ is chosen arbitrarily and lies far from the solution $w^*$, the iteration may never converge to $w^*$
    2. $\sum_{k=1}^{\infty} a_k^2 < \infty$: guarantees that the step sizes keep decreasing and eventually converge to 0
      • it is easy to show that this forces $a_k \to 0$ as $k \to \infty$, since the terms of a convergent series must vanish
      • this condition likewise ensures that $w_k$ can converge to $w^*$:
        • if $a_k \to 0$, then $a_k \tilde{g}(w_k, \eta_k) \to 0$ and hence $w_{k+1} - w_k \to 0$
        • then the estimate eventually settles down instead of jumping around, which is necessary for $w_k$ to converge to $w^*$
    • “Though the road is long, walking it will get you there.”
  • $\mathbb{E}[\eta_k \mid \mathcal{H}_k] = 0$ and $\mathbb{E}[\eta_k^2 \mid \mathcal{H}_k] < \infty$;
    • this is a constraint on the noise; the common case is i.i.d. noise
    • it guarantees that the noise has zero (conditional) mean, i.e. it introduces no bias
    • it also guarantees that the noise has bounded variance, i.e. the variance cannot blow up to infinity
  • What choice of $a_k$ satisfies the two conditions $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$? A typical one is $a_k = \frac{1}{k}$: the harmonic series diverges while $\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6} < \infty$ (see the quick check below)
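
A quick numerical check (a sanity sketch, not a proof) that $a_k = \frac{1}{k}$ behaves as claimed: the partial sums of $\frac{1}{k}$ keep growing like $\ln k$, while the partial sums of $\frac{1}{k^2}$ settle near $\frac{\pi^2}{6} \approx 1.645$:

```python
# Partial sums of a_k = 1/k and a_k^2 = 1/k^2 up to k = 10^6.
sum_a, sum_a_sq = 0.0, 0.0
for k in range(1, 1_000_001):
    sum_a += 1.0 / k
    sum_a_sq += 1.0 / k**2

print(sum_a)     # ~14.39 and still climbing like ln(k): this series diverges
print(sum_a_sq)  # ~1.6449, essentially pi^2 / 6: this series converges
```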

Must these three conditions hold in practical RL?

  • generally speaking, the three conditions exist to guarantee convergence of the RM algorithm, and they do not all have to hold in practice
  • if they are violated, however, the algorithm may stop working; e.g. if condition 1 is not satisfied and no restriction is placed on the initial value, the algorithm may fail to converge
  • in practical RL, $a_k$ is usually set to a very small constant; even though this violates condition 2 (a constant step size gives $\sum_{k=1}^{\infty} a_k^2 = \infty$), the algorithm often still works, with the estimate fluctuating in a small neighborhood of $w^*$ (see the sketch below)
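
A sketch of that constant-step-size behavior, reusing the toy oracle from the earlier RM example (function and noise are assumptions): with $a_k = 0.01$ the estimate reaches a small neighborhood of the root quickly but never stops fluctuating:

```python
import random

def noisy_observation(w):
    return (2.0 * w - 6.0) + random.gauss(0.0, 1.0)   # toy oracle; root w* = 3

w = 0.0
for k in range(1, 100_001):
    w -= 0.01 * noisy_observation(w)   # constant step size: violates condition 2

print(w)   # hovers near 3.0 but keeps fluctuating (roughly +/- 0.05 here)
```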

Application to mean estimation

Using the RM algorithm to solve the mean-estimation problem:

  1. consider the function $g(w) \doteq w - \mathbb{E}[X]$, where $X$ is a random variable and $\mathbb{E}[X]$ is its expectation
    • our goal is to solve $g(w) = 0$, which yields $w^* = \mathbb{E}[X]$
    • note that the expression of $g$ is unknown here (because $\mathbb{E}[X]$ is unknown); we can only obtain the values of $g$ through samples of $X$
  2. each step we get an observation $\tilde{g}(w, x) = w - x = (w - \mathbb{E}[X]) + (\mathbb{E}[X] - x) \doteq g(w) + \eta$, where $x$ is a sample drawn from $X$, and $\eta = \mathbb{E}[X] - x$ is exactly a noise term
  3. this gives the RM update $w_{k+1} = w_k - a_k \tilde{g}(w_k, x_k) = w_k - a_k (w_k - x_k)$; recall the formula derived at the end of Incremental mean estimation: $w_{k+1} = w_k - \frac{1}{k} (w_k - x_k)$
    • that formula is exactly a special case of the RM algorithm with $a_k = \frac{1}{k}$ (see the sketch below)
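
A minimal sketch checking this equivalence numerically; the distribution of $X$ (Gaussian with $\mathbb{E}[X] = 5$) is an assumption for illustration:

```python
import random

random.seed(0)
samples = [random.gauss(5.0, 2.0) for _ in range(10_000)]   # X with E[X] = 5

w = 0.0                             # initial estimate (a_1 = 1 overwrites it)
for k, x_k in enumerate(samples, start=1):
    w -= (1.0 / k) * (w - x_k)      # RM update: g~(w, x) = w - x, a_k = 1/k

print(w)                            # matches the sample mean (up to float error)
print(sum(samples) / len(samples))  # both estimate E[X] = 5
```

Because $a_1 = 1$ wipes out the initial guess, the RM iterate here equals the running sample mean $\frac{1}{k} \sum_{i=1}^{k} x_i$ at every step, which is exactly the incremental mean-estimation formula.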