Robbins-Monro (RM) algorithm
Introduction
Stochastic approximation (SA):
- refers to a broad class of stochastic iterative algorithms for solving root-finding or optimization problems
- compared with other root-finding methods (e.g. gradient-based ones), SA does not require knowing the expression of the objective function or its derivative
Robbins-Monro (RM) algorithm:
- pioneering work in the field of stochastic approximation
- SGD is a special form of RM algorithm
Problem statement
Suppose we want to solve the following equation:
$g(w) = 0$
where $w$ is the variable to be solved and $g$ is a continuous function.
- many problems can be formulated as root-finding problems
- e.g. solving the fixed point of a function, finding the solution of a system of equations, minimizing an objective function $J(w)$ by solving $g(w) = \nabla_w J(w) = 0$
- methods:
  - model-based: the expression of $g(w)$ is known; use numerical methods to solve it
  - model-free: the expression of $g(w)$ is unknown, e.g. the function is a black box (a neural network)
Robbins-Monro algorithm
The RM algorithm can solve this problem with the iterative update:
$w_{k+1} = w_k - a_k \tilde{g}(w_k, \eta_k), \quad k = 1, 2, 3, \dots$
where:
- $w_k$ is the $k$-th estimate of the root $w^*$
- $\tilde{g}(w_k, \eta_k) = g(w_k) + \eta_k$ is the $k$-th noisy observation
- $\eta_k$ is a random noise
  - why noise? because we don't know the exact value of $g(w_k)$
  - e.g. consider a random sampling of $g(w_k)$
- $a_k$ is a positive coefficient (the step size)
- the model here refers to the expression of the function $g$
- $g$ is a black box
Philosophy: without model, we need data
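As a minimal sketch of this philosophy (the solver below and the example $g(w) = w - 5$ are illustrative choices, not from the notes), the RM update can be driven purely by noisy black-box evaluations of $g$:

```python
import random

def rm_solve(noisy_g, w0, num_iters=10000):
    """Robbins-Monro iteration: w_{k+1} = w_k - a_k * g_tilde(w_k, eta_k).

    noisy_g(w) returns g(w) + noise; g itself is treated as a black box.
    The step size a_k = 1/k satisfies sum(a_k) = inf and sum(a_k^2) < inf.
    """
    w = w0
    for k in range(1, num_iters + 1):
        a_k = 1.0 / k
        w = w - a_k * noisy_g(w)
    return w

# Illustrative example: g(w) = w - 5 observed through zero-mean Gaussian
# noise; the true root is w* = 5, and the iterate approaches it.
random.seed(0)
noisy_g = lambda w: (w - 5.0) + random.gauss(0.0, 1.0)
w_star = rm_solve(noisy_g, w0=0.0)
```

Note that `rm_solve` never sees the expression of $g$, only noisy evaluations, which is exactly the model-free setting described above.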
Robbins-Monro Theorem
In the Robbins-Monro algorithm, if the following conditions are satisfied:
- $0 < c_1 \le \nabla_w g(w) \le c_2$ for all $w$;
- $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$;
- $\mathbb{E}[\eta_k \mid \mathcal{H}_k] = 0$ and $\mathbb{E}[\eta_k^2 \mid \mathcal{H}_k] < \infty$;
where $\mathcal{H}_k = \{w_k, w_{k-1}, \dots\}$ is the history of the algorithm up to time $k$.
Then $w_k$ converges to the root $w^*$ satisfying $g(w^*) = 0$ with probability 1 (w.p.1).
Explanation of the three conditions above:
- $0 < c_1 \le \nabla_w g(w) \le c_2$ for all $w$;
  - $c_1 > 0$ ensures $g$ is monotonically increasing (a monotonically decreasing $g$ can be made increasing by negating it)
  - this guarantees that the root exists and is unique
  - $c_2$ shows that the gradient is bounded
  - but this characterization alone is not rigorous enough: a function that is not monotonically increasing can still have a unique root
  - so one more condition should arguably be added: that the objective is a convex function
- $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$;
  - $\sum_{k=1}^{\infty} a_k = \infty$: ensures the step sizes never die out, so the iteration does not stop prematurely
    - suppose instead that $\sum_{k=1}^{\infty} a_k < \infty$; then the accumulated step $\sum_k a_k$ converges to some bounded value. As a result, if the initial value $w_1$ is chosen arbitrarily and happens to be far from the true solution $w^*$, the iteration may never converge to $w^*$
  - $\sum_{k=1}^{\infty} a_k^2 < \infty$: ensures the step sizes keep decreasing and eventually converge to 0
    - it is easy to show that $w_{k+1} - w_k = -a_k \tilde{g}(w_k, \eta_k)$, where $a_k \to 0$
    - this condition likewise guarantees that $w_{k+1} - w_k$ converges to 0:
      - if $a_k \to 0$, then $a_k \tilde{g}(w_k, \eta_k) \to 0$
      - then $w_{k+1} - w_k \to 0$
  - "the road may be long, but walking will get you there"
- $\mathbb{E}[\eta_k \mid \mathcal{H}_k] = 0$ and $\mathbb{E}[\eta_k^2 \mid \mathcal{H}_k] < \infty$;
  - this condition constrains the noise; the most common case is i.i.d. noise
  - the first part ensures the noise has zero expectation, i.e. it introduces no bias
  - the second part ensures the noise has bounded variance, i.e. the variance cannot be infinite
- What $a_k$ satisfies the two conditions $\sum_k a_k = \infty$ and $\sum_k a_k^2 < \infty$? A classic choice is $a_k = 1/k$.
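A quick numerical check (illustrative; the truncation point $n = 10^6$ is an arbitrary choice) of why $a_k = 1/k$ works: its partial sums grow without bound (like $\ln n$), while the partial sums of $a_k^2$ stay bounded, approaching $\pi^2/6$:

```python
import math

# Partial sums for a_k = 1/k:
#   sum of a_k grows without bound (the harmonic series, ~ ln n),
#   sum of a_k^2 stays bounded (it approaches pi^2 / 6 ~ 1.6449).
n = 10**6
s1 = sum(1.0 / k for k in range(1, n + 1))      # diverges as n grows
s2 = sum(1.0 / k**2 for k in range(1, n + 1))   # bounded: -> pi^2 / 6
```

Here `s1` is already around 14 at $n = 10^6$ and keeps growing, while `s2` is within $10^{-6}$ of $\pi^2/6$, so both conditions of the theorem are met.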
Must these three conditions be satisfied in practical RL?
- generally speaking, the three conditions are there to guarantee the convergence of the RM algorithm, but they do not necessarily have to hold in practical applications
- if they do not hold, however, the algorithm may fail to work; e.g. if condition 1 is violated and no restriction is placed on the initial value, the algorithm may not converge
- in practical RL, however, $a_k$ is often set to a very small constant; even though this violates condition 2 (a constant step size gives $\sum_k a_k^2 = \infty$), the algorithm usually still works
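As a small illustration of this point (the problem $g(w) = w - 5$ and the value $\alpha = 0.01$ are arbitrary choices, not from the notes), a constant step size violates condition 2 yet still brings the iterate into a small neighborhood of the root:

```python
import random

random.seed(0)

# RM iteration on g(w) = w - 5 observed with zero-mean noise, but with a
# constant step size alpha instead of a_k = 1/k. Condition 2 is violated
# (sum of a_k^2 diverges), so w never converges exactly; it fluctuates in
# a small neighborhood of w* = 5, which is often good enough in practice.
alpha = 0.01
w = 0.0
for _ in range(5000):
    x = 5.0 + random.gauss(0.0, 1.0)  # noisy observation: g_tilde = w - x
    w = w - alpha * (w - x)
```

The smaller $\alpha$ is, the smaller the residual fluctuation around $w^*$, at the cost of slower initial progress.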
Application to mean estimation
Using the RM algorithm to solve the mean estimation problem:
- consider the following function: $g(w) \triangleq w - \mathbb{E}[X]$,
  where $X$ is a random variable and $\mathbb{E}[X]$ is its expectation
- our goal is to solve $g(w) = 0$, which yields $w^* = \mathbb{E}[X]$
- note that the expression of $g$ is unknown here; we can only obtain values of $\tilde{g}$ through sampling
- at each step we get the observation $\tilde{g}(w, x) = w - x$, where $x$ is a sample drawn from $X$; since $\tilde{g}(w, x) = w - \mathbb{E}[X] + (\mathbb{E}[X] - x) = g(w) + \eta$, the term $\eta = \mathbb{E}[X] - x$ is exactly a noise
- this yields the RM update: $w_{k+1} = w_k - a_k \tilde{g}(w_k, \eta_k) = w_k - a_k (w_k - x_k)$
Recall the formula finally derived in incremental mean estimation: $w_{k+1} = w_k - \frac{1}{k}(w_k - x_k)$
- this formula is exactly a special case of the RM algorithm, with $a_k = 1/k$
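To see this equivalence concretely, here is a short sketch (the Gaussian sampling setup is an illustrative assumption): running the RM update with $a_k = 1/k$ reproduces the sample mean exactly:

```python
import random

random.seed(1)
samples = [random.gauss(10.0, 2.0) for _ in range(1000)]  # samples of X

# RM update for g(w) = w - E[X] with a_k = 1/k:
#   w_{k+1} = w_k - (1/k) * (w_k - x_k)
w = 0.0  # w_1; since a_1 = 1, the initial value is overwritten by x_1
for k, x in enumerate(samples, start=1):
    w = w - (1.0 / k) * (w - x)

# The iterate coincides with the ordinary sample mean (up to float error).
mean = sum(samples) / len(samples)
```

Unrolling the update shows $w_2 = x_1$, $w_3 = (x_1 + x_2)/2$, and in general $w_{k+1}$ is the average of the first $k$ samples, which is why mean estimation is a special case of RM.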