Differentiable Unconstrained Minimization
其中可微
Convergence Rate
计算方法:
假设给定objective function能够转换成一个序列, 计算极限
convergence rate就是.
Convergence Type
- : superlinearly
- : linearly
- : sublinearly
- : non-convergence
Convergence Rate of Quadratic Minimization
假设是的最大的eigenvalue, 是最小eigenvalue, 那么可以认为:
, 那么
Convergence Analysis:
Convergence Rate:
- sublinear:
- linear:
- quadratic(super linear):
Iteration Function: 4. sublinear: 5. linear: 6. quadratic:
Iterative descend algorithm
从开始, 构造序列满足
下降方向(descend direction) 满足:
每一次迭代中, 有, 其中是在的时候的descend direction, 是步长.
在机器学习中, 通常是loss函数, 通常是loss函数中的参数, 是学习率
Note
Steepest Descend 最陡下降法
- 最快优化objective function的方向:
mathop{\arg\min}_{\mathbf d:|\mathbf d|2\leq1}f’(\mathbf x;\mathbf d)=\mathop{\arg\min}{\mathbf d:|\mathbf d|_2\leq1}\nabla f(\mathbf x)^\top\mathbf d=-|\nabla f(\mathbf x)|_2$$
Quadratic Minimization
该方程的梯度为
参数更新:
step size() rule:
Exact Line Search
假设有, 那么
Kantorovich’s inequality:
Smooth problem
-strong和-smooth定义:
或者:
Theorem
对于-strong和-smooth的问题, 有:
- step size: (v.s. )
- contraction rate: (v.s. )
iteration complexity:
Backtracking Line Search
Armijo Condition:
algorithm:
- initialize , ,
- while , do:
上界:
Theorem
收敛性:
Regularity Condition
-strong + -smooth
Polyak-Lojasiewicz Condition
Over-parameterized linear regression
over-parameterize: model dimension > sample size
定义
有
认为如果, 其rank为, 且满足step size , 有:
Convex and Smooth Problem
L-smooth
majorization-minimization
Theorem
如果是-smooth的, 且有:
Non-convex Problem
Theorem
- for general: $$
- for convex: $$
Gradient methods for Constrained Problems
Frank-Wolfe algorithm
over a convex set: . 步长类似Exact Line Search:
对于non-convex:
有:
Convergence
Theorem
对于compact约束集合, 效率可以达到 -accuracy, 在个迭代中
Example
假设有集合是-convex的, 假设, , 定义, 那么有:
Theorem
假设是convex and L-smooth的, 假设是-strongly convex的, 那么
Projected Gradient Method
将一个在集合外的点映射到集合中
Definition
Euclidean projection(quadratic minimization):
循环:
Theorem
假设有集合是close且convex的, 那么有:
从上图可知, 有, 即和最速下降的方向是正相关的
Strongly Convex
Theorem
假设, 假设是-strongly convex且L-smooth的. 令, 有
一些其他情况参见Smooth problem