Robbins-Monro algorithm (RM)

Introduction

Stochastic approximation (SA):

  • refers to a broad class of stochastic iterative algorithms for solving root-finding or optimization problems
  • compared with other root-finding methods such as gradient-based ones, SA does not require knowing the expression of the objective function or its derivative

Robbins-Monro (RM) algorithm:

  • pioneering work in the field of stochastic approximation
  • SGD is a special form of the RM algorithm

Problem statement

Suppose we want to solve the following equation:

$$g(w) = 0$$

where $w \in \mathbb{R}$ is the variable to be solved and $g: \mathbb{R} \to \mathbb{R}$ is a continuous function.

  • many problems can be formulated as root-finding problems
  • e.g. solving for the fixed point of a function ($f(w) = w$, i.e. $g(w) \doteq f(w) - w = 0$), finding the solution of a system of equations, or minimizing an objective function $J(w)$ by solving $g(w) \doteq \nabla_w J(w) = 0$
  • method:
    • model-based: the expression of $g(w)$ is known; use numerical methods to solve it
    • model-free: the expression of $g(w)$ is unknown, e.g. the function is a black box (such as a neural network)
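
To make the model-based case concrete, here is a minimal Newton-iteration sketch; the toy function $g(w) = w^3 - 5$ and its derivative are made-up assumptions, usable only because their expressions are fully known:

```python
# Model-based root finding: the expression of g (and its derivative)
# is known, so a classical numerical method applies.
def g(w):
    return w**3 - 5.0           # toy function (assumption); root at 5**(1/3)

def g_prime(w):
    return 3.0 * w**2           # derivative, available only in the model-based case

w = 2.0                         # initial guess
for _ in range(20):
    w -= g(w) / g_prime(w)      # Newton update: w <- w - g(w) / g'(w)

print(w)                        # ~1.70998, i.e. 5**(1/3)
```

The RM algorithm below addresses the opposite, model-free case, where no such expressions are available.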

Robbins-Monro algorithm

The RM algorithm can solve this problem:

$$w_{k+1} = w_k - a_k \tilde{g}(w_k, \eta_k), \qquad k = 1, 2, 3, \ldots$$

where:

  • $w_k$ is the $k$-th estimate of the root
  • $\tilde{g}(w_k, \eta_k) = g(w_k) + \eta_k$ is the $k$-th noisy observation
    • $\eta_k$ is a random noise
    • why noise? because we don’t know the exact value of $g(w_k)$
    • e.g. $\tilde{g}(w_k, \eta_k)$ may be computed from a random sample instead of the true function value (see the mean-estimation example below)
  • $a_k$ is a positive coefficient

  • the model here refers to the expression of the function $g(w)$
  • $g(w)$ is a black box: only the input $w_k$ and the noisy output $\tilde{g}(w_k, \eta_k)$ are accessible

Philosophy: without a model, we need data
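
A minimal sketch of the RM iteration in this model-free spirit: the hidden function $g(w) = 2w - 6$ (root $w^* = 3$) and the Gaussian noise are assumptions for illustration only, and the update touches nothing but the noisy observations:

```python
import random

def noisy_observation(w):
    g = 2.0 * w - 6.0                  # hidden truth (assumption); root w* = 3
    eta = random.gauss(0.0, 1.0)       # zero-mean noise with bounded variance
    return g + eta                     # the algorithm only ever sees g(w) + eta

w = 0.0                                # arbitrary initial estimate w_1
for k in range(1, 100_001):
    a_k = 1.0 / k                      # step sizes satisfying the RM conditions
    w -= a_k * noisy_observation(w)    # w_{k+1} = w_k - a_k * g~(w_k, eta_k)

print(w)                               # ~3.0: the estimate converges to the root
```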

Robbins-Monro Theorem

In the Robbins-Monro algorithm, if the following conditions are satisfied:

  1. $0 < c_1 \le \nabla_w g(w) \le c_2$ for all $w$;
  2. $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$;
  3. $\mathbb{E}[\eta_k \mid \mathcal{H}_k] = 0$ and $\mathbb{E}[\eta_k^2 \mid \mathcal{H}_k] < \infty$;

where $\mathcal{H}_k = \{w_k, w_{k-1}, \ldots\}$ is the history of the algorithm up to time $k$.

Then $w_k$ converges to the root $w^*$ satisfying $g(w^*) = 0$ with probability 1 (w.p.1).

Explanation of the three conditions above

  • $0 < c_1 \le \nabla_w g(w) \le c_2$ for all $w$;
    • ensures that $g$ is monotonically increasing (a monotonically decreasing $g$ can be made increasing by negation)
    • guarantees that the root exists and is unique
    • also shows that the gradient of $g$ is bounded
    • strictly speaking, though, this condition is not necessary: a function that is not monotonically increasing can still have a unique root
    • one may instead state the requirement via convexity: in the optimization setting where $g(w) = \nabla_w J(w)$, this condition is exactly the requirement that $J(w)$ be a convex function
  • $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$;
    1. $\sum_{k=1}^{\infty} a_k = \infty$: guarantees that the step sizes never die out, so the iteration never stops making progress
      • suppose instead that $\sum_{k=1}^{\infty} a_k < \infty$; summing all updates gives $w_1 - w_{\infty} = \sum_{k=1}^{\infty} a_k \tilde{g}(w_k, \eta_k)$, which then stays bounded (for bounded observations), so if $w_1$ is chosen arbitrarily and lies far from the solution $w^*$, the iteration may never converge to $w^*$
    2. $\sum_{k=1}^{\infty} a_k^2 < \infty$: guarantees that the step sizes keep decreasing and eventually converge to 0
      • it is easy to show that this forces $a_k \to 0$ as $k \to \infty$, since the terms of a convergent series must vanish
      • this condition likewise ensures that $w_k$ can converge to $w^*$:
        • if $a_k \to 0$, then $a_k \tilde{g}(w_k, \eta_k) \to 0$ and hence $w_{k+1} - w_k \to 0$
        • then the estimate eventually settles down instead of jumping around, which is necessary for $w_k$ to converge to $w^*$
    • “Though the road is long, walking it will get you there.”
  • $\mathbb{E}[\eta_k \mid \mathcal{H}_k] = 0$ and $\mathbb{E}[\eta_k^2 \mid \mathcal{H}_k] < \infty$;
    • this is a constraint on the noise; the common case is i.i.d. noise
    • it guarantees that the noise has zero (conditional) mean, i.e. it introduces no bias
    • it also guarantees that the noise has bounded variance, i.e. the variance cannot blow up to infinity
  • What choice of $a_k$ satisfies the two conditions $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$? A typical one is $a_k = \frac{1}{k}$: the harmonic series diverges while $\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6} < \infty$ (see the quick check below)
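
A quick numerical check (a sanity sketch, not a proof) that $a_k = \frac{1}{k}$ behaves as claimed: the partial sums of $\frac{1}{k}$ keep growing like $\ln k$, while the partial sums of $\frac{1}{k^2}$ settle near $\frac{\pi^2}{6} \approx 1.645$:

```python
# Partial sums of a_k = 1/k and a_k^2 = 1/k^2 up to k = 10^6.
sum_a, sum_a_sq = 0.0, 0.0
for k in range(1, 1_000_001):
    sum_a += 1.0 / k
    sum_a_sq += 1.0 / k**2

print(sum_a)     # ~14.39 and still climbing like ln(k): this series diverges
print(sum_a_sq)  # ~1.6449, essentially pi^2 / 6: this series converges
```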

Must these three conditions hold in practical RL?

  • generally speaking, the three conditions exist to guarantee convergence of the RM algorithm, and they do not all have to hold in practice
  • if they are violated, however, the algorithm may stop working; e.g. if condition 1 is not satisfied and no restriction is placed on the initial value, the algorithm may fail to converge
  • in practical RL, $a_k$ is usually set to a very small constant; even though this violates condition 2 (a constant step size gives $\sum_{k=1}^{\infty} a_k^2 = \infty$), the algorithm often still works, with the estimate fluctuating in a small neighborhood of $w^*$ (see the sketch below)
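
A sketch of that constant-step-size behavior, reusing the toy oracle from the earlier RM example (function and noise are assumptions): with $a_k = 0.01$ the estimate reaches a small neighborhood of the root quickly but never stops fluctuating:

```python
import random

def noisy_observation(w):
    return (2.0 * w - 6.0) + random.gauss(0.0, 1.0)   # toy oracle; root w* = 3

w = 0.0
for k in range(1, 100_001):
    w -= 0.01 * noisy_observation(w)   # constant step size: violates condition 2

print(w)   # hovers near 3.0 but keeps fluctuating (roughly +/- 0.05 here)
```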

Application to mean estimation

Using the RM algorithm to solve the mean-estimation problem:

  1. consider the function $g(w) \doteq w - \mathbb{E}[X]$, where $X$ is a random variable and $\mathbb{E}[X]$ is its expectation
    • our goal is to solve $g(w) = 0$, which yields $w^* = \mathbb{E}[X]$
    • note that the expression of $g$ is unknown here (because $\mathbb{E}[X]$ is unknown); we can only obtain the values of $g$ through samples of $X$
  2. each step we get an observation $\tilde{g}(w, x) = w - x = (w - \mathbb{E}[X]) + (\mathbb{E}[X] - x) \doteq g(w) + \eta$, where $x$ is a sample drawn from $X$, and $\eta = \mathbb{E}[X] - x$ is exactly a noise term
  3. this gives the RM update $w_{k+1} = w_k - a_k \tilde{g}(w_k, x_k) = w_k - a_k (w_k - x_k)$; recall the formula derived at the end of Incremental mean estimation: $w_{k+1} = w_k - \frac{1}{k} (w_k - x_k)$
    • that formula is exactly a special case of the RM algorithm with $a_k = \frac{1}{k}$ (see the sketch below)
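
A minimal sketch checking this equivalence numerically; the distribution of $X$ (Gaussian with $\mathbb{E}[X] = 5$) is an assumption for illustration:

```python
import random

random.seed(0)
samples = [random.gauss(5.0, 2.0) for _ in range(10_000)]   # X with E[X] = 5

w = 0.0                             # initial estimate (a_1 = 1 overwrites it)
for k, x_k in enumerate(samples, start=1):
    w -= (1.0 / k) * (w - x_k)      # RM update: g~(w, x) = w - x, a_k = 1/k

print(w)                            # matches the sample mean (up to float error)
print(sum(samples) / len(samples))  # both estimate E[X] = 5
```

Because $a_1 = 1$ wipes out the initial guess, the RM iterate here equals the running sample mean $\frac{1}{k} \sum_{i=1}^{k} x_i$ at every step, which is exactly the incremental mean-estimation formula.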