Preliminaries

Linear Algebra

Theorem

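For a square matrix $A$, the following statements are equivalent (any one implies all the others):

$A$ is invertible.

$A^{⊤}$ is invertible.

$\det A \neq 0$.

The rows of $A$ are linearly independent.

The columns of $A$ are linearly independent.

For every vector $b$, the linear system $Ax = b$ has a unique solution.

There exists a vector $b$ for which the linear system $Ax = b$ has a unique solution.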

Introduction

Variants of the linear programming problems

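A linear programming problem in general form, for example:

\begin{aligned} \text{minimize}\quad & 2x_{1} - x_{2} + 4x_{3} \\ \text{subject to}\quad & x_{1} + x_{2} + x_{4} \leq 2 \\ & 3x_{2} - x_{3} = 5 \\ & x_{3} + x_{4} \geq 3 \\ & x_{1} \geq 0 \\ & x_{3} \geq 0 \end{aligned}

Here $x_{i}, i \in \{1, 2, 3, 4\}$ are the variables. They are chosen to minimize $2 x_{1} - x_{2} + 4 x_{3}$ while satisfying the constraints listed after "subject to", which are a collection of linear equations or linear inequalities.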

Each linear equation or inequality can be written as an inner product. For example, with $a = (1, 1, 0, 1)$ and $x = (x_{1}, x_{2}, x_{3}, x_{4})$, the first constraint becomes $a^{⊤} x \leq 2$.

The vector $c$ of objective coefficients is called the cost vector. We want to minimize the cost function $c^{⊤} x = \sum_{i} c_{i} x_{i}$.

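We can therefore write a linear program in the following form:

\begin{aligned} \text{minimize}\quad & c^{⊤} x \\ \text{subject to}\quad & a_{1} x \leq b_{1} \\ & a_{2} x \geq b_{2} \\ & a_{3} x = b_{3} \\ & x_{j} \geq 0, \quad j \in N_{1} \\ & x_{j} \leq 0, \quad j \in N_{2} \end{aligned}

Here $a_{1}, a_{2}, a_{3}$ are coefficient matrices and $b_{1}, b_{2}, b_{3}$ are vectors. $N_{1}$ and $N_{2}$ are index sets that specify which variables $x_{j}$ carry a sign constraint.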

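We call the $x_{i}$ decision variables. A vector $x$ that satisfies all the constraints is called a feasible solution. The set of all feasible solutions is called the feasible set, or feasible region.

The function to be minimized is called the objective function, or cost function. A feasible solution that minimizes the objective function is called an optimal feasible solution, and the corresponding value of the objective is the optimal cost.

Suppose that for every real number $K$ there is a feasible solution $x$ with $c^{⊤} x \leq K$. Then the optimal cost is $-\infty$, and we say the problem is unbounded (below).

A maximization problem is handled by minimizing $-c^{⊤} x$ instead.

A constraint $a_{i}^{⊤} x \leq b_{i}$ is handled in a similar way, by rewriting it as $(-a_{i})^{⊤} x \geq -b_{i}$. The sign constraints $x_{j} \geq 0$ and $x_{j} \leq 0$ are special cases of $a^{⊤} x \geq 0$. Take $a = e_{j}$ or $a = -e_{j}$, where $e_{j}$ is the $j$-th unit vector. An equality $a^{⊤} x = b$ is replaced by the pair of constraints $a^{⊤} x \geq b$ and $a^{⊤} x \leq b$.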

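After these transformations, every constraint has the form $a_{i}^{⊤} x \geq b_{i}$. Stacking the row vectors $a_{i}^{⊤}$ gives the matrix

A = \begin{bmatrix} a_{1}^{⊤} \\ \vdots \\ a_{m}^{⊤} \end{bmatrix}

so the linear program takes the form

\begin{aligned} \text{minimize}\quad & c^{⊤} x \\ \text{subject to}\quad & Ax \geq b \end{aligned}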

Standard form

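\begin{aligned} \text{minimize}\quad & c^{⊤} x \\ \text{subject to}\quad & Ax = b \\ & x \geq 0 \end{aligned}

The equality constraints can be read as a linear combination of the columns $A_{i}$ of $A$: $Ax = \sum_{i} A_{i} x_{i} = b$.

To reduce the general form to standard form:

Eliminate free variables. A free variable is one without a sign constraint such as $x_{i} \leq 0$ or $x_{i} \geq 0$. Replace it by two new variables, $x_{i} = x_{i}^{+} - x_{i}^{-}$, with $x_{i}^{+} \geq 0$ and $x_{i}^{-} \geq 0$.
Eliminate inequality constraints. Turn each inequality into an equality by introducing a new variable:
- $a^{⊤} x \leq b$ becomes $a^{⊤} x + s = b$, with a slack variable $s \geq 0$.
- $a^{⊤} x \geq b$ becomes $a^{⊤} x - s = b$, with a surplus variable $s \geq 0$.

In this way, any general-form linear program can be converted to standard form, which is the form used when solving it. The general form is mostly used to develop the mathematical theory of linear programming.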
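Below is a minimal Python sketch of the two reduction steps above. It is an illustration under stated assumptions, not code from these notes:

- `split_free_variables` and `to_standard_form` are hypothetical helper names.
- Constraints are plain lists of rows.
- The strings `'<='`, `'>='` and `'='` used for `senses` are a convention of this sketch.

```python
def split_free_variables(A, c, free):
    """Replace each free x_j by x_j^+ - x_j^- (both >= 0).

    Column j is kept for x_j^+; a new column -A_j with cost -c_j
    is appended for x_j^-.
    """
    A_new = [list(row) + [-row[j] for j in free] for row in A]
    c_new = list(c) + [-c[j] for j in free]
    return A_new, c_new


def to_standard_form(A, b, senses):
    """Add one slack ('<=') or surplus ('>=') column per inequality row.

    Assumes every variable is already sign-constrained (x >= 0).
    Returns (A_std, b_std) describing A_std x = b_std, x >= 0.
    """
    ineq_rows = [i for i, s in enumerate(senses) if s != '=']
    A_std = []
    for i, row in enumerate(A):
        extra = []
        for k in ineq_rows:
            if k != i:
                extra.append(0)
            elif senses[i] == '<=':
                extra.append(1)    # a^T x + s = b, s >= 0
            else:
                extra.append(-1)   # a^T x - s = b, s >= 0
        A_std.append(list(row) + extra)
    return A_std, list(b)


# Constraints of the simplex example later in these notes
A_std, b_std = to_standard_form(
    [[1, 1, 1, 0], [4, -1, 1, 2], [-1, 1, 2, 3]], [4, 6, 12], ['<=', '<=', '<='])
```

For the three `<=` constraints of the simplex example later in these notes, the slack columns form an identity block. For instance, `A_std[0]` is `[1, 1, 1, 0, 1, 0, 0]`.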

Piecewise linear convex objective functions

Definition

A function $f : R^{n} \mapsto R$ is called convex if $f (λ x + (1 - λ) y) \leq λ f (x) + (1 - λ) f (y)$ for all $x, y \in R^{n}$ and all $λ \in [0, 1]$.

A function $f : R^{n} \mapsto R$ is called concave if $f (λ x + (1 - λ) y) \geq λ f (x) + (1 - λ) f (y)$ for all $x, y \in R^{n}$ and all $λ \in [0, 1]$.

Note that $λ x + (1 - λ) y$ is a point on the line segment joining $x$ and $y$.

In linear programming, all functions are linear (affine). For such functions the defining inequality holds with equality, $f (λ x + (1 - λ) y) = λ f (x) + (1 - λ) f (y)$. They are therefore both convex and concave.

If $f$ is convex, then $-f$ is concave.

An affine function $f (x) = a_{0} + \sum_{i = 1}^{n} a_{i} x_{i}$ is both convex and concave.

A point $x$ is a local minimum of $f$ if $f (x) \leq f (y)$ for all $y$ in some neighborhood of $x$.

A point $x$ is a global minimum of $f$ if $f (x) \leq f (y)$ for all $y$.

For a convex function, every local minimum is also a global minimum.

Theorem

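Let $f_{1}, \dots, f_{m} : R^{n} \mapsto R$ be convex functions. Then $f (x) = \max_{i = 1, \dots, m} f_{i} (x)$ is also convex.

When the $f_{i}$ are affine, $f (x) = \max_{i = 1, \dots, m} f_{i} (x)$ is called a piecewise linear convex function. Such functions are sometimes used to approximate a general convex function.

A linear program whose objective is a piecewise linear convex function can be reduced to an equivalent linear program:

\begin{aligned} \text{minimize}\quad & \max_{i} (c_{i}^{⊤} x + d_{i}) \\ \text{subject to}\quad & Ax \geq b \end{aligned} \quad\Longrightarrow\quad \begin{aligned} \text{minimize}\quad & z \\ \text{subject to}\quad & z \geq c_{i}^{⊤} x + d_{i}, \quad \forall i \\ & Ax \geq b \end{aligned}

The decision variables are $z$ and $x$.

Piecewise linear functions make it easy to approximate a real convex function. However, they introduce non-differentiable points where two pieces meet. The derivative is discontinuous at these points, so the function is not smooth.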

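An absolute value $|x|$ can be reduced in a similar way:

Replace $|x_{i}|$ by $\max \{x_{i}, -x_{i}\}$.
Introduce a new variable $z_{i}$ in place of the $\max$ and use $z_{i}$ in the objective. Add the constraints $x_{i} \leq z_{i}$ and $-x_{i} \leq z_{i}$ for all $i$. This is valid when $|x_{i}|$ appears with a nonnegative coefficient in an objective that is being minimized.

Alternatively:

Introduce two new variables. Replace $|x_{i}|$ by $x_{i}^{+} + x_{i}^{-}$ with $x_{i}^{+}, x_{i}^{-} \geq 0$, and replace the original variable by $x_{i} = x_{i}^{+} - x_{i}^{-}$.
This substitution requires at least one of $x_{i}^{+}$ and $x_{i}^{-}$ to be 0. If $|x_{i}|$ has a positive cost coefficient, this holds automatically at an optimal solution: otherwise both variables could be decreased by the same amount, lowering the cost. So the condition does not need to be added as a constraint.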

Geometry of linear programming

Polyhedra and convex set

Definition

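A polyhedron is a set of the form $\{x \in R^{n} \mid Ax \geq b\}$, where $A \in R^{m \times n}$ and $b \in R^{m}$.

A set $S \subset R^{n}$ is bounded if there is a constant $K$ such that every component of every element of $S$ has absolute value at most $K$.

Let $a \in R^{n}$ be a nonzero vector and $b$ a scalar. Then:

$\{x \in R^{n} \mid a^{⊤} x = b\}$ is called a hyperplane.

$\{x \in R^{n} \mid a^{⊤} x \geq b\}$ is called a halfspace.

All constraints of a feasible set can be reduced to the form $Ax \geq b$, so a feasible set can be viewed as a polyhedron. Likewise, $\{x \in R^{n} \mid Ax = b, x \geq 0\}$ is called a polyhedron in standard form.

A hyperplane is the boundary of the corresponding halfspace.

The vector $a$ is a normal vector of the hyperplane. Let $x_{0}$ be any point on the hyperplane, so that $a^{⊤} x_{0} = b$. A point $x$ lies on the hyperplane exactly when $x - x_{0}$ is orthogonal to $a$: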

a^{⊤} (x - x_{0}) = 0 \Rightarrow a^{⊤} x - a^{⊤} x_{0} = 0 \Rightarrow a^{⊤} x = a^{⊤} x_{0} \Rightarrow a^{⊤} x = b

Figure (not reproduced here): the polyhedron $\{x \in R^{2} \mid a_{i}^{⊤} x \geq b_{i}, i = 1, \dots, 5\}$.

Convex Set

Definition

A set $S$ is called convex if $λ x + (1 - λ) y \in S$ for all $x, y \in S$ and all $λ \in [0, 1]$.

Let $x_{1}, \dots, x_{n}$ be vectors, and let $λ_{1}, \dots, λ_{n}$ be nonnegative scalars with $\sum_{i} λ_{i} = 1$.

Then $\sum_{i = 1}^{n} λ_{i} x_{i}$ is called a convex combination of the $x_{i}$.

The **convex hull** of the vectors $x_{i}$ is the set of all their convex combinations.

Note that $λ x + (1 - λ) y$ is a weighted average of $x$ and $y$, that is, a point on the segment between them. A set is convex exactly when this whole segment stays inside the set.

Theorem

The intersection of convex sets is convex.

Every polyhedron is a convex set.

A convex combination of finitely many elements of a convex set also belongs to that set.

The convex hull of finitely many vectors is a convex set.

Extreme points, vertices, and basic feasible solution

Definition

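Let $P$ be a polyhedron.

A vector $x \in P$ is an extreme point of $P$ if we cannot find two vectors $y, z \in P$, both different from $x$, and a scalar $λ \in [0, 1]$ such that $x = λ y + (1 - λ) z$.

A vector $x \in P$ is a vertex of $P$ if there exists some $c$ such that $c^{⊤} x < c^{⊤} y$ for all $y \in P$ with $y \neq x$.

In other words, an extreme point belongs to $P$ but cannot be written as a convex combination of two other points of $P$.

Definition

If a vector $x^{*}$ satisfies $a_{i}^{⊤} x^{*} = b_{i}$ for some $i \in M_{1} \cup M_{2} \cup M_{3}$, the corresponding constraint is said to be active at $x^{*}$.

Consider a polyhedron $P$ defined by linear equality and inequality constraints, and a vector $x^{*} \in R^{n}$.

Suppose all equality constraints are active at $x^{*}$, and $n$ of the constraints active at $x^{*}$ are linearly independent. Then $x^{*}$ is called a basic solution.

If $x^{*}$ is a basic solution that satisfies all the constraints, it is called a basic feasible solution.

Suppose exactly $n$ constraints are active, giving $n$ linear equations in $n$ unknowns. If these equations are linearly independent, the system has a unique solution.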

Theorem

Let $x^{*} \in R^{n}$, and let $I = \{i \mid a_{i}^{⊤} x^{*} = b_{i}\}$ be the set of indices of the constraints that are active at $x^{*}$. The following are equivalent:

There exist $n$ vectors in $\{a_{i} \mid i \in I\}$ that are linearly independent.

The vectors $a_{i}, i \in I$ span $R^{n}$, meaning every vector in $R^{n}$ is a linear combination of the $a_{i}$.

The system $a_{i}^{⊤} x = b_{i}, i \in I$ has a unique solution.

When these hold, we say that the active constraints are linearly independent.

This leads to a definition of a corner point: a feasible solution at which $n$ linearly independent constraints are active. Choosing $n$ linearly independent active constraints determines a unique solution $x^{*}$. That solution need not be feasible, however, because it may violate some of the inactive constraints.

Definition

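Let the polyhedron $P$ be defined by linear equality and inequality constraints, and let $x^{*} \in R^{n}$.

$x^{*}$ is a basic solution if both of the following hold:
1. All equality constraints are active at $x^{*}$.
2. Among the constraints active at $x^{*}$, $n$ are linearly independent.

If $x^{*}$ is a basic solution and satisfies all the constraints, it is called a basic feasible solution.

If $P$ is defined by only $m$ constraints with $m < n$, then fewer than $n$ constraints can be active at any point. So $P$ has no basic solutions, and hence no basic feasible solutions.

Theorem

Let $P$ be a nonempty polyhedron and $x^{*} \in P$. The following are equivalent: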

$x^{*}$ 是vertex

$x^{*}$ 是extreme point

$x^{*}$ 是basic feasible solution

Corollary: given a finite number of linear inequality constraints, there can only be finitely many basic solutions, and hence finitely many basic feasible solutions.

Adjacent basic solution

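Two distinct basic solutions in $R^{n}$ are called adjacent if there are $n - 1$ linearly independent constraints that are active at both of them.

If two adjacent basic solutions are both feasible, the line segment joining them is called an edge of the feasible set.

Polyhedra in standard form

Let $P = \{x \in R^{n} \mid Ax = b, x \geq 0\}$ with $A \in R^{m \times n}$. Here $m$, the number of rows, is the number of equality constraints. We assume the $m$ rows of $A$ are linearly independent. Since each row is $n$-dimensional, this forces $m \leq n$. The assumption loses nothing: when $P$ is nonempty, linearly dependent rows of $A$ correspond to redundant constraints that can be discarded.

A basic solution must have $n$ linearly independent active constraints. Every $x$ with $Ax = b$ already has $m$ active constraints, and these are linearly independent by assumption. We therefore need $n - m$ further active constraints that are linearly independent of those $m$.

These come from $x \geq 0$: we choose $n - m$ variables and set $x_{i} = 0$. For the resulting $n$ constraints to be linearly independent, these variables cannot be chosen arbitrarily. The definition below makes this precise.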

Definition

Consider the constraints $Ax = b$ and $x \geq 0$, and assume $A$ has linearly independent rows. A vector $x \in R^{n}$ is a basic solution if and only if $Ax = b$ and there exist indices $B (1), \dots, B (m)$ such that:

the columns $A_{B (1)}, \dots, A_{B (m)}$ are linearly independent;

if $i \notin \{B (1), \dots, B (m)\}$, then $x_{i} = 0$.

This gives a procedure for constructing basic solutions of a polyhedron in standard form:

Choose $m$ linearly independent columns $A_{B (1)}, \dots, A_{B (m)}$.
Set $x_{i} = 0$ for all $i \notin \{B (1), \dots, B (m)\}$.
Solve the system $Ax = b$ for the unknowns $x_{B (1)}, \dots, x_{B (m)}$.

If the basic solution constructed this way is nonnegative, it is a basic feasible solution.
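The three-step procedure can be turned into a brute-force enumeration over all choices of $m$ columns. This is a minimal sketch under the following assumptions:

- Arithmetic is exact, using `fractions.Fraction`.
- `solve` is a small hand-written Gauss-Jordan routine, a hypothetical helper and not a library call.
- The $2 \times 4$ system is a made-up example.

Trying all $\binom{n}{m}$ column sets is only practical for tiny problems. This is why the simplex method instead moves between adjacent bases.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, rhs):
    """Solve the square system M y = rhs exactly (Gauss-Jordan elimination).

    Returns None when M is singular, i.e. its columns are linearly dependent.
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def basic_solutions(A, b):
    """Yield (basis, x, is_feasible) for every basis of {x | Ax = b, x >= 0}."""
    m, n = len(A), len(A[0])
    for basis in combinations(range(n), m):           # choose m columns
        B = [[A[i][j] for j in basis] for i in range(m)]
        x_B = solve(B, b)                               # x_B = B^{-1} b
        if x_B is None:                                 # columns not independent
            continue
        x = [Fraction(0)] * n                           # nonbasic variables = 0
        for j, value in zip(basis, x_B):
            x[j] = value
        yield basis, x, all(v >= 0 for v in x)


# Hypothetical example: x1 + x2 + x3 = 4, x1 + x4 = 3 (0-based column indices)
results = list(basic_solutions([[1, 1, 1, 0], [1, 0, 0, 1]], [4, 3]))
feasible_bases = [basis for basis, x, ok in results if ok]
```

For $x_1 + x_2 + x_3 = 4$, $x_1 + x_4 = 3$ this finds five bases. Columns 2 and 3 are identical, so that pair is skipped. Four of the bases give basic feasible solutions. The basis $\{x_1, x_4\}$ gives $x = (4, 0, 0, -1)$, which is basic but not feasible.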

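The variables $x_{B (i)}$ are called basic variables, and the remaining ones are nonbasic variables. The columns $A_{B (i)}$ are called basic columns. Since they are linearly independent, they form a basis of $R^{m}$. Two bases are considered different if they have different sets of indices. Merely reordering the same indices does not give a different basis.

Putting the $m$ basic columns side by side gives the basis matrix $B \in R^{m \times m}$. It is invertible because its columns are linearly independent, so it has full rank.

B = \begin{bmatrix} | & & | \\ A_{B (1)} & \cdots & A_{B (m)} \\ | & & | \end{bmatrix}, \qquad x_{B} = \begin{bmatrix} x_{B (1)} \\ \vdots \\ x_{B (m)} \end{bmatrix}

The basic variables are found by solving $B x_{B} = b$, which gives $x_{B} = B^{-1} b$.

Suppose two columns of $A$ coincide, say $A_{p} = A_{q}$. Then two bases that differ only in containing $p$ rather than $q$ have identical basis matrices. They are still different bases, because their index sets differ.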

Correspondence of bases and basic solutions

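A basis uniquely determines a basic solution, but different bases may determine the same basic solution.

Adjacent basic solutions and adjacent bases

Similarly, two bases are called adjacent if they share all but one basic column.

Adjacent basic solutions can always be obtained from adjacent bases. Conversely, if two adjacent bases lead to distinct basic solutions, those basic solutions are adjacent.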

The full row rank assumption on $A$

Theorem

Let $P = \{x \mid Ax = b, x \geq 0\}$ be a nonempty polyhedron, where $A \in R^{m \times n}$ has rows $a_{1}^{⊤}, \dots, a_{m}^{⊤}$. Suppose $\operatorname{rank} A = k < m$ and the rows $a_{i_{1}}^{⊤}, \dots, a_{i_{k}}^{⊤}$ are linearly independent. Consider the polyhedron $Q = \{x \mid a_{i_{1}}^{⊤} x = b_{i_{1}}, \dots, a_{i_{k}}^{⊤} x = b_{i_{k}}, x \geq 0\}$. Then $P = Q$.

Note that $Q$ is itself in standard form: $Q = \{x \mid Dx = f, x \geq 0\}$. Here $D \in R^{k \times n}$ is a submatrix of $A$, and $f \in R^{k}$ is the corresponding subvector of $b$.

Hence, whenever the feasible set is nonempty, a standard-form linear program can be reduced to an equivalent standard-form problem with the same feasible set. In the reduced problem, all equality constraints are linearly independent.

Degeneracy

More than $n$ constraints may be active at a basic solution, even though at most $n$ of them can be linearly independent. Such a basic solution is called **degenerate**.

Definition

A basic solution $x \in R^{n}$ is **degenerate** if more than $n$ constraints are active at $x$.

In two dimensions, a degenerate basic solution is a point where three or more of the boundary lines meet.

Degeneracy in standard form polyhedra

Definition

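Consider the standard-form polyhedron $P = \{x \mid Ax = b, x \geq 0\}$ with $A \in R^{m \times n}$, and let $x$ be a basic solution. If more than $n - m$ components of $x$ are zero, then $x$ is a degenerate basic solution. Equivalently, some basic variable equals zero.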

Degeneracy is not a purely geometric property

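A basic feasible solution may be degenerate under one particular representation of a polyhedron and nondegenerate under another, non-standard-form representation. However, it can be shown that if a basic feasible solution is degenerate under one standard-form representation, it is degenerate under every standard-form representation of the same polyhedron.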

Existence of extreme points

Definition

A polyhedron $P \subset R^{n}$ contains a line if there exist a vector $x \in P$ and a nonzero vector $d \in R^{n}$ such that $x + λ d \in P$ for all scalars $λ$.

Theorem

Let $P = \{x \in R^{n} \mid a_{i}^{⊤} x \geq b_{i}, i = 1, \dots, m\}$ be nonempty. The following are equivalent:

$P$ has at least one extreme point.

$P$ does not contain a line.

There exist $n$ vectors among $a_{1}, \dots, a_{m}$ that are linearly independent.

Corollary

Every nonempty bounded polyhedron has at least one basic feasible solution.

Optimality of extreme points

Theorem

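Consider the linear program of minimizing $c^{⊤} x$ over a polyhedron $P$. Suppose $P$ has at least one extreme point and the problem has at least one optimal solution. Then there exists an optimal solution that is an extreme point of $P$.

More generally, if $P$ is nonempty and has at least one extreme point, then either the optimal cost is $-\infty$ or some extreme point is optimal.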

Corollary

For a linear program that minimizes $c^{⊤} x$ over a nonempty polyhedron, either the optimal cost is $-\infty$ or an optimal solution exists.

Representation of bounded polyhedra

Theorem

A nonempty bounded polyhedron $P$ is the convex hull of its extreme points.

The Simplex Method

Optimality conditions

Definition

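Let $x$ be an element of a polyhedron $P$. A vector $d \in R^{n}$ is called a feasible direction at $x$ if there exists a positive scalar $θ$ for which $x + θ d \in P$.

Let $x$ be a basic feasible solution of a standard-form linear program. Let $B (1), \dots, B (m)$ be the indices of the basic variables, and let $B = \begin{bmatrix} A_{B (1)} & \cdots & A_{B (m)} \end{bmatrix}$ be the corresponding basis matrix. In particular, $x_{i} = 0$ for every nonbasic variable, and the vector of basic variables $x_{B} = (x_{B (1)}, \dots, x_{B (m)})$ is given by $x_{B} = B^{-1} b$.

Consider moving from $x$ to $x + θ d$ by increasing one nonbasic variable $x_{j}$ (currently $x_{j} = 0$) to a positive value $θ$, while keeping all other nonbasic variables at $0$. Algebraically, $d_{j} = 1$ and $d_{i} = 0$ for every other nonbasic index $i$. The basic components $d_{B} = (d_{B (1)}, \dots, d_{B (m)})$ remain to be determined.

Since we only care about feasible solutions, we require $A (x + θ d) = b$. Because $Ax = b$, this means $θ Ad = 0$, that is, $Ad = 0$, where

Ad = \sum_{i = 1}^{n} A_{i} d_{i} = A_{j} + \sum_{i = 1}^{m} A_{B (i)} d_{B (i)} = B d_{B} + A_{j} = 0

Since $B$ is invertible, $d_{B} = -B^{-1} A_{j}$.

This $d$ is called the $j$-th basic direction. Moving along it keeps the equality constraints $Ax = b$ satisfied. As for $x \geq 0$: among the nonbasic variables, only $x_{j}$ changes, and it increases from $0$. So only the basic variables can become negative.

For the basic variables there are two cases:

If the basic feasible solution is nondegenerate, then $x_{B} > 0$. For sufficiently small $θ > 0$ the point $x + θ d$ stays feasible, so $d$ is a feasible direction.
If the basic feasible solution is degenerate, some basic variable $x_{B (i)}$ equals $0$. If the corresponding $d_{B (i)} < 0$, every step $θ > 0$ leads to an infeasible point, so $d$ is not a feasible direction.

Now consider how moving along $d$ affects the cost. If $d$ is the $j$-th basic direction, the rate of change of the cost along $d$ is $c^{⊤} d = c_{B}^{⊤} d_{B} + c_{j}$, where $c_{B} = (c_{B (1)}, \dots, c_{B (m)})$.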

Definition

Let $x$ be a basic solution with basis matrix $B$, and let $c_{B}$ be the vector of costs of the basic variables. For each $j$, the reduced cost of $x_{j}$ is defined as $\bar{c}_{j} = c_{j} - c_{B}^{⊤} B^{-1} A_{j}$.

A basis matrix $B$ is called optimal if both of the following hold:
1. $B^{-1} b \geq 0$.
2. $\bar{c}^{⊤} = c^{⊤} - c_{B}^{⊤} B^{-1} A \geq 0^{⊤}$.
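Below is a small sketch of these formulas, computed with exact fractions:

- the basic direction $d_B = -B^{-1} A_j$;
- the reduced costs $\bar{c}_j = c_j + c_B^{⊤} d_B = c_j - c_B^{⊤} B^{-1} A_j$.

The helper names are hypothetical, and indices are 0-based, so `basis=(1, 5, 3)` means $x_2, x_6, x_4$. The data is the standard form of the simplex example later in these notes.

```python
from fractions import Fraction


def solve(M, rhs):
    """Exact Gauss-Jordan solve of M y = rhs (M assumed nonsingular here)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def basic_direction(A, basis, j):
    """Basic components of the j-th basic direction: d_B = -B^{-1} A_j."""
    B = [[row[k] for k in basis] for row in A]
    u = solve(B, [row[j] for row in A])      # u = B^{-1} A_j
    return [-v for v in u]


def reduced_costs(A, c, basis):
    """bar c_j = c_j + c_B^T d_B = c_j - c_B^T B^{-1} A_j for every column j."""
    c_B = [c[k] for k in basis]
    costs = []
    for j in range(len(c)):
        d_B = basic_direction(A, basis, j)
        costs.append(c[j] + sum(ck * dk for ck, dk in zip(c_B, d_B)))
    return costs


# Standard form of the simplex example (0-based: x1 -> 0, ..., x7 -> 6)
A = [[1, 1, 1, 0, 1, 0, 0],
     [4, -1, 1, 2, 0, 1, 0],
     [-1, 1, 2, 3, 0, 0, 1]]
c = [3, -5, -2, -1, 0, 0, 0]
r_final = reduced_costs(A, c, basis=(1, 5, 3))   # basis x2, x6, x4
```

With the basis $x_2, x_6, x_4$ this gives $(22/3, 0, 10/3, 0, 14/3, 0, 1/3)$, which matches the last $r^{⊤}$ row of that example. Every entry is nonnegative and $B^{-1} b \geq 0$, so the basis is optimal in the sense just defined.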

Theorem

Let $x$ be a basic feasible solution with basis matrix $B$, and let $\bar{c}$ be the corresponding vector of reduced costs. Then:

If $\bar{c} \geq 0$, then $x$ is optimal.

If $x$ is optimal and nondegenerate, then $\bar{c} \geq 0$.

Implementations of the simplex method

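Lexicographic order

Vectors are compared component by component, in order. The first component in which they differ decides the comparison.

e.g. $(0, 4, 5, 6) < (1, 0, 0, 0)$, because the first components satisfy $0 < 1$.

Bland Rule

Entering variable: among the nonbasic variables whose entry into the basis would decrease the objective (those with negative reduced cost), choose the one with the smallest index.

Leaving variable: among the basic variables that can leave while keeping the solution feasible (those tied for the minimum ratio), choose the one with the smallest index.

Implementation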

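A maximization problem must first be converted into a minimization problem before it is solved.

Procedure:

The slack variables are usually taken as the initial basis, because their columns form an identity matrix (see the initial tableau).

Convert the general-form LP to standard form.
Write down the initial simplex tableau.
Choose a nonbasic variable to enter the basis. Among the negative entries of $r^{⊤}$, pick the most negative one: $q = \arg\min_{j} \{r_{j} \mid r_{j} < 0, j = 1, \dots, n\}$. This is Dantzig's rule; Bland's rule would instead take the smallest index $j$ with $r_{j} < 0$.
Choose a basic variable to leave the basis:
- If column $q$ has no positive entry (all entries are negative or zero), stop: the optimal cost is unbounded ($-\infty$).
- Otherwise take the row with the smallest ratio: $p = \arg\min_{j} \{\bar{b}_{j} / u_{jq} \mid u_{jq} > 0, j = 1, \dots, m\}$.
- If several ratios tie, use the lexicographic rule to pick the smallest.
Pivot (eliminate) on the entry $u_{pq}$.
Repeat until $r^{⊤}$ has no negative entries.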
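The procedure can be sketched as a full-tableau implementation in Python. This is a minimal sketch under stated assumptions, not a reference implementation:

- It uses exact `Fraction` arithmetic.
- It expects a feasible starting basis to be supplied (here the slack columns).
- It uses the most-negative entering rule from step 3.
- It breaks ratio ties by the smallest basic-variable index instead of the full lexicographic rule.

Column indices are 0-based. The last column holds $B^{-1} b$, and the bottom-right entry holds the negative of the current cost.

```python
from fractions import Fraction


def simplex(tableau, basis):
    """Full-tableau simplex method for a minimization problem.

    Rows 0..m-1 are [u_1 .. u_n | bbar]; the last row is [r_1 .. r_n | -cost].
    basis[i] is the column index of the basic variable in row i.
    Returns 'unbounded' or the final (tableau, basis).
    """
    T = [[Fraction(v) for v in row] for row in tableau]
    basis = list(basis)
    m, n = len(T) - 1, len(T[0]) - 1
    while True:
        r = T[m]
        q = min(range(n), key=lambda j: r[j])        # most negative r_j
        if r[q] >= 0:
            return T, basis                          # r >= 0: optimal
        rows = [i for i in range(m) if T[i][q] > 0]
        if not rows:
            return 'unbounded'                       # no positive entry in column q
        # minimum ratio test; ties -> smallest basic-variable index
        p = min(rows, key=lambda i: (T[i][n] / T[i][q], basis[i]))
        piv = T[p][q]
        T[p] = [v / piv for v in T[p]]
        for i in range(m + 1):                       # eliminate column q elsewhere
            if i != p and T[i][q] != 0:
                f = T[i][q]
                T[i] = [a - f * b for a, b in zip(T[i], T[p])]
        basis[p] = q


tableau = [
    [1, 1, 1, 0, 1, 0, 0, 4],
    [4, -1, 1, 2, 0, 1, 0, 6],
    [-1, 1, 2, 3, 0, 0, 1, 12],
    [3, -5, -2, -1, 0, 0, 0, 0],    # r^T row and -cost
]
T, basis = simplex(tableau, basis=[4, 5, 6])    # slacks x5, x6, x7
```

Run on the initial tableau of the example that follows, it performs the same two pivots as the hand computation. It ends with basis $x_2, x_6, x_4$ and bottom-right entry $68/3$.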

e.g.

\begin{aligned} \max\quad & -3x_{1} + 5x_{2} + 2x_{3} + x_{4} \\ \text{s.t.}\quad & x_{1} + x_{2} + x_{3} \leq 4 \\ & 4x_{1} - x_{2} + x_{3} + 2x_{4} \leq 6 \\ & -x_{1} + x_{2} + 2x_{3} + 3x_{4} \leq 12 \\ & x_{j} \geq 0, \quad j = 1, 2, 3, 4. \end{aligned}

Solution:

Turn into standard form:

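\begin{aligned} \min\quad & 3x_{1} - 5x_{2} - 2x_{3} - x_{4} \\ \text{s.t.}\quad & x_{1} + x_{2} + x_{3} + x_{5} = 4 \\ & 4x_{1} - x_{2} + x_{3} + 2x_{4} + x_{6} = 6 \\ & -x_{1} + x_{2} + 2x_{3} + 3x_{4} + x_{7} = 12 \\ & x_{i} \geq 0, \quad i = 1, \dots, 7. \end{aligned}

Then we can write the initial simplex tableau:

\begin{array}{c|ccccccc|c} & x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & x_{6} & x_{7} & B^{-1} b \\ \hline x_{5} & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 4 \\ x_{6} & 4 & -1 & 1 & 2 & 0 & 1 & 0 & 6 \\ x_{7} & -1 & 1 & 2 & 3 & 0 & 0 & 1 & 12 \\ \hline r^{⊤} & 3 & -5 & -2 & -1 & 0 & 0 & 0 & 0 \end{array}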

Tip

We choose this pivot because:

$q = \arg\min_{i} \{r_{i} \mid r_{i} < 0\}$: among $\{-5, -2, -1\}$ the most negative is $r_{2} = -5$.

$p = \arg\min_{i} \{\bar{b}_{i} / u_{i2} \mid u_{i2} > 0\}$: among $\{\frac{4}{1} = 4, \frac{12}{1} = 12\}$ the smallest is $\bar{b}_{1} / u_{12} = \frac{4}{1} = 4$.

Pivot by elimination.

Note that the basis on the left is updated from $x_{5}, x_{6}, x_{7}$ to $x_{2}, x_{6}, x_{7}$.

after update:

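\begin{array}{c|ccccccc|c} & x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & x_{6} & x_{7} & B^{-1} b \\ \hline x_{2} & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 4 \\ x_{6} & 5 & 0 & 2 & 2 & 1 & 1 & 0 & 10 \\ x_{7} & -2 & 0 & 1 & 3 & -1 & 0 & 1 & 8 \\ \hline r^{⊤} & 8 & 0 & 3 & -1 & 5 & 0 & 0 & 20 \end{array}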

Tip

Similarly, $r_{4} = -1$ is the only negative entry, so $q = 4$.

Then $p = \arg\min_{i} \{\bar{b}_{i} / u_{i4} \mid u_{i4} > 0\}$: among $\{\frac{10}{2} = 5, \frac{8}{3}\}$ the smallest is $\bar{b}_{3} / u_{34} = \frac{8}{3}$.

Pivot, and update the basis on the left from $x_{2}, x_{6}, x_{7}$ to $x_{2}, x_{6}, x_{4}$.

after update:

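\begin{array}{c|ccccccc|c} & x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & x_{6} & x_{7} & B^{-1} b \\ \hline x_{2} & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 4 \\ x_{6} & \frac{19}{3} & 0 & \frac{4}{3} & 0 & \frac{5}{3} & 1 & -\frac{2}{3} & \frac{14}{3} \\ x_{4} & -\frac{2}{3} & 0 & \frac{1}{3} & 1 & -\frac{1}{3} & 0 & \frac{1}{3} & \frac{8}{3} \\ \hline r^{⊤} & \frac{22}{3} & 0 & \frac{10}{3} & 0 & \frac{14}{3} & 0 & \frac{1}{3} & \frac{68}{3} \end{array}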

Tip

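When $r^{⊤}$ has no negative entries, the algorithm terminates.

Reading the right-hand column against the basis on the left gives the solution: $x_{2} = 4$, $x_{6} = \frac{14}{3}$, $x_{4} = \frac{8}{3}$, and all other variables are $0$. The bottom-right entry $\frac{68}{3}$ is the negative of the minimum cost, so the original maximum is $\frac{68}{3}$.

The zero entries of $r^{⊤}$ should coincide with the basic columns. A nonbasic column with zero reduced cost signals that the optimum may not be unique. A basic variable equal to $0$ in the $B^{-1} b$ column indicates a degenerate solution.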

Convergence and Degeneracy

Duality Theorem

Motivation

This can be viewed as an extension of the method of Lagrange multipliers.

Tip

Lagrange multiplier method:
Minimize $x^{2} + y^{2}$ subject to $x + y = 1$.
Introduce a Lagrange multiplier $p$ and form the Lagrangian $L (x, y, p) = x^{2} + y^{2} + p (1 - x - y)$.

With $p$ held fixed, minimize $L$ by solving $\frac{\partial L}{\partial x} = 0$ and $\frac{\partial L}{\partial y} = 0$:
$x = \frac{p}{2}, y = \frac{p}{2}$
Then substitute $x$ and $y$ into the original constraint $x + y = 1$ to obtain $p = 1$.
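Written out, the stationarity conditions and the constraint give the optimum of this small example:

```latex
\frac{\partial L}{\partial x} = 2x - p = 0 \;\Rightarrow\; x = \frac{p}{2}, \qquad
\frac{\partial L}{\partial y} = 2y - p = 0 \;\Rightarrow\; y = \frac{p}{2}

x + y = p = 1 \;\Rightarrow\; x = y = \frac{1}{2}, \qquad x^{2} + y^{2} = \frac{1}{2}
```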

The constraint may be violated, but every violation has a price $p$. When minimizing $L$, we must take the cost of violating the constraint into account.

Dual Problem

考虑一个Standard form的Linear programming:

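\begin{aligned} \min\quad & c^{⊤} x \\ \text{subject to}\quad & Ax = b \\ & x \geq 0 \end{aligned}

We call this the primal problem.

Introduce a relaxed problem in which the equality constraint is replaced by a penalty term:

\begin{aligned} \min\quad & c^{⊤} x + p^{⊤} (b - Ax) \\ \text{subject to}\quad & x \geq 0 \end{aligned}

Here $p$ has the same dimension as $b$.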

Let $g (p)$ be the optimal cost of the relaxed problem. Then

g (p) = \min_{x \geq 0} \left[ c^{⊤} x + p^{⊤} (b - Ax) \right] \leq c^{⊤} x^{*} + p^{⊤} (b - A x^{*}) = c^{⊤} x^{*}

where $x^{*}$ is an optimal solution of the primal problem; the last equality uses $A x^{*} = b$. Hence $g (p)$ is a lower bound on the optimal primal cost. The tightest such bound is obtained by maximizing $g (p)$:

\begin{aligned} \max\quad & g (p) \\ \text{subject to}\quad & \text{no constraints} \end{aligned}

Note that $g (p) = \min_{x \geq 0} [c^{⊤} x + p^{⊤} (b - Ax)] = p^{⊤} b + \min_{x \geq 0} (c^{⊤} - p^{⊤} A) x$, where

\min_{x \geq 0} (c^{⊤} - p^{⊤} A) x = \begin{cases} 0, & \text{if } c^{⊤} - p^{⊤} A \geq 0^{⊤}, \\ -\infty, & \text{otherwise.} \end{cases}

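When maximizing $g (p)$, we only need to consider those $p$ for which $g (p) \neq -\infty$. The dual problem is therefore the same as the linear program

\begin{aligned} \max\quad & p^{⊤} b \\ \text{subject to}\quad & p^{⊤} A \leq c^{⊤} \end{aligned}

In general form, a primal problem and its dual are related as follows:

\begin{aligned} \text{minimize}\quad & c^{⊤} x \\ \text{subject to}\quad & a_{i}^{⊤} x \geq b_{i}, \quad i \in M_{1} \\ & a_{i}^{⊤} x \leq b_{i}, \quad i \in M_{2} \\ & a_{i}^{⊤} x = b_{i}, \quad i \in M_{3} \\ & x_{j} \geq 0, \quad j \in N_{1} \\ & x_{j} \leq 0, \quad j \in N_{2} \\ & x_{j} \text{ free}, \quad j \in N_{3} \end{aligned} \qquad \begin{aligned} \text{maximize}\quad & p^{⊤} b \\ \text{subject to}\quad & p_{i} \geq 0, \quad i \in M_{1} \\ & p_{i} \leq 0, \quad i \in M_{2} \\ & p_{i} \text{ free}, \quad i \in M_{3} \\ & p^{⊤} A_{j} \leq c_{j}, \quad j \in N_{1} \\ & p^{⊤} A_{j} \geq c_{j}, \quad j \in N_{2} \\ & p^{⊤} A_{j} = c_{j}, \quad j \in N_{3} \end{aligned}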

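PRIMAL (minimize)	DUAL (maximize)
constraint $a_{i}^{⊤} x \geq b_{i}$	variable $p_{i} \geq 0$
constraint $a_{i}^{⊤} x \leq b_{i}$	variable $p_{i} \leq 0$
constraint $a_{i}^{⊤} x = b_{i}$	variable $p_{i}$ free
variable $x_{j} \geq 0$	constraint $p^{⊤} A_{j} \leq c_{j}$
variable $x_{j} \leq 0$	constraint $p^{⊤} A_{j} \geq c_{j}$
variable $x_{j}$ free	constraint $p^{⊤} A_{j} = c_{j}$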
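The table can be applied mechanically. In this minimal sketch, the helper `dual_of` and the string labels for senses and signs are conventions of this illustration. It transposes $A$, swaps the roles of $b$ and $c$, and maps each constraint sense and variable sign through the table.

```python
def dual_of(c, A, b, row_senses, var_signs):
    """Form the dual of: minimize c^T x  s.t.  a_i^T x (row_senses[i]) b_i,
    where each x_j is restricted by var_signs[j] in {'>=0', '<=0', 'free'}.

    Returns (objective, matrix, rhs, dual_var_signs, dual_row_senses) for:
    maximize objective^T p  s.t.  (A^T p)_j (dual_row_senses[j]) rhs_j.
    """
    sign_of_dual_var = {'>=': '>=0', '<=': '<=0', '=': 'free'}    # rows -> p_i
    sense_of_dual_row = {'>=0': '<=', '<=0': '>=', 'free': '='}   # x_j -> rows
    A_T = [list(col) for col in zip(*A)]                          # columns A_j
    return (list(b), A_T, list(c),
            [sign_of_dual_var[s] for s in row_senses],
            [sense_of_dual_row[s] for s in var_signs])


# First primal problem in the example below
obj, M, rhs, p_signs, senses = dual_of(
    c=[1, 2, 3],
    A=[[-1, 3, 0], [2, -1, 3], [0, 0, 1]],
    b=[5, 6, 4],
    row_senses=['=', '>=', '<='],
    var_signs=['>=0', '<=0', 'free'],
)
```

Applied to the first example below, it yields exactly the dual listed there:

- objective $5p_1 + 6p_2 + 4p_3$;
- constraints $-p_1 + 2p_2 \leq 1$, $3p_1 - p_2 \geq 2$, $3p_2 + p_3 = 3$;
- sign conditions: $p_1$ free, $p_2 \geq 0$, $p_3 \leq 0$.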

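For special forms, the primal-dual pair can be written in matrix notation (for example, for the standard form):

\begin{aligned} \min\quad & c^{⊤} x \\ \text{s.t.}\quad & Ax = b \\ & x \geq 0 \end{aligned} \qquad \begin{aligned} \max\quad & p^{⊤} b \\ \text{s.t.}\quad & p^{⊤} A \leq c^{⊤} \end{aligned}

\begin{aligned} \min\quad & c^{⊤} x \\ \text{s.t.}\quad & Ax \geq b \end{aligned} \qquad \begin{aligned} \max\quad & p^{⊤} b \\ \text{s.t.}\quad & p^{⊤} A = c^{⊤} \\ & p \geq 0 \end{aligned}

e.g.

\begin{aligned} \min\quad & x_{1} + 2x_{2} + 3x_{3} \\ \text{s.t.}\quad & -x_{1} + 3x_{2} = 5 \\ & 2x_{1} - x_{2} + 3x_{3} \geq 6 \\ & x_{3} \leq 4 \\ & x_{1} \geq 0 \\ & x_{2} \leq 0 \\ & x_{3} \text{ free} \end{aligned} \qquad \begin{aligned} \max\quad & 5p_{1} + 6p_{2} + 4p_{3} \\ \text{s.t.}\quad & p_{1} \text{ free} \\ & p_{2} \geq 0 \\ & p_{3} \leq 0 \\ & -p_{1} + 2p_{2} \leq 1 \\ & 3p_{1} - p_{2} \geq 2 \\ & 3p_{2} + p_{3} = 3 \end{aligned}

The second pair rewrites the dual above as an equivalent minimization problem, renaming its variables $x$, and then takes its dual. The result is equivalent to the original primal problem.

\begin{aligned} \min\quad & -5x_{1} - 6x_{2} - 4x_{3} \\ \text{s.t.}\quad & x_{1} \text{ free} \\ & x_{2} \geq 0 \\ & x_{3} \leq 0 \\ & x_{1} - 2x_{2} \geq -1 \\ & -3x_{1} + x_{2} \leq -2 \\ & -3x_{2} - x_{3} = -3 \end{aligned} \qquad \begin{aligned} \max\quad & -p_{1} - 2p_{2} - 3p_{3} \\ \text{s.t.}\quad & p_{1} - 3p_{2} = -5 \\ & -2p_{1} + p_{2} - 3p_{3} \leq -6 \\ & -p_{3} \geq -4 \\ & p_{1} \geq 0 \\ & p_{2} \leq 0 \\ & p_{3} \text{ free} \end{aligned}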

Theorem

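Take the dual of a problem, convert that dual into an equivalent minimization problem, and take its dual again. The result is a maximization problem equivalent to the original problem.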

The Duality Theorem

Theorem

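Weak duality: if $x$ is a feasible solution of the primal (minimization) problem and $p$ is a feasible solution of the dual problem, then $p^{⊤} b \leq c^{⊤} x$.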

Corollary

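If $x$ and $p$ are feasible and $c^{⊤} x = p^{⊤} b$, then $x$ is optimal for the primal problem and $p$ is optimal for the dual problem.

If the optimal cost of the primal problem is $-\infty$, then the dual problem is infeasible.

If the optimal cost of the dual problem is $+\infty$, then the primal problem is infeasible.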

Theorem

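Strong duality:
1. If either the primal problem or the dual problem has an optimal solution, so does the other, and the two optimal costs are equal.
2. Suppose the simplex method terminates with an optimal basis $B$ for the primal, giving an optimal solution $x^{*}$. Then
$p^{*} = (c_{B}^{⊤} B^{-1})^{⊤}$ is an optimal solution of the dual.

Workflow: primal problem ⇒ dual problem ⇒ introduce slack variables to obtain standard form ⇒ solve with the simplex method.

In the final simplex tableau of the primal, the entries of $r^{⊤}$ under the slack variables give the optimal dual solution. This holds up to a sign that depends on how the problem was converted.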
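Here is a quick exact check of this claim on the earlier maximization example, using `fractions.Fraction`. The primal solution $x = (0, 4, 0, 8/3)$ and the slack reduced costs $(14/3, 0, 1/3)$ are read off the final tableau.

For this max problem with $\leq$ constraints, the dual is $\min b^{⊤} y$ subject to $A^{⊤} y \geq c$, $y \geq 0$. The slack entries of $r^{⊤}$ give $y$ directly. For the converted min problem, they are $-p$ instead.

```python
from fractions import Fraction as F

# Original problem: maximize c^T x  s.t.  A x <= b,  x >= 0
A = [[1, 1, 1, 0], [4, -1, 1, 2], [-1, 1, 2, 3]]
b = [4, 6, 12]
c = [-3, 5, 2, 1]

x = [F(0), F(4), F(0), F(8, 3)]      # basic: x2 = 4, x4 = 8/3 (final tableau)
y = [F(14, 3), F(0), F(1, 3)]        # r^T entries under slacks x5, x6, x7


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


primal_feasible = (all(dot(row, x) <= bi for row, bi in zip(A, b))
                   and all(v >= 0 for v in x))
# Dual of the max problem: minimize b^T y  s.t.  A^T y >= c,  y >= 0
dual_feasible = (all(dot(col, y) >= cj for col, cj in zip(zip(*A), c))
                 and all(v >= 0 for v in y))
primal_value, dual_value = dot(c, x), dot(b, y)
```

Both values equal $68/3$. By the weak duality corollary above, both solutions are therefore optimal.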

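Dual simplex method:

When to use it: both of the following conditions hold, so the ordinary simplex method cannot start directly.
1. After multiplying the $\geq$ constraints by $-1$ to turn them into $\leq$ constraints, the $b$ column contains negative values.
2. At the same time, every entry of the $r^{⊤}$ row is nonnegative.
Procedure: similar to the ordinary simplex implementation, but the leaving row is chosen first and the entering column second.

The slack variables are usually taken as the initial basis, because their columns form an identity matrix (see the initial tableau).
1. Convert the problem to standard form. The method works on the primal tableau while keeping $r^{⊤} \geq 0$.
2. Multiply the primal $\geq$ constraints by $-1$, add slack variables, and write down the initial simplex tableau.
3. Choose a basic variable to leave the basis: pick a row $p$ with $\bar{b}_{p} < 0$ (e.g. the most negative). Its basic variable leaves.
4. Choose a nonbasic variable to enter the basis:
  - If row $p$ has no negative entry (all entries are positive or zero), stop: the problem is infeasible.
  - Otherwise take the column with the smallest ratio: $q = \arg\min_{j} \{ r_{j} / (-u_{pj}) \mid u_{pj} < 0, j = 1, \dots, n \}$.
  - If several ratios tie, use the lexicographic rule to pick the smallest.
5. Pivot on $u_{pq}$.
6. Repeat until $B^{-1} b$ has no negative entries.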
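The dual simplex steps can be sketched in the same way as the ordinary simplex method. This minimal sketch uses exact `Fraction` arithmetic and 0-based column indices. It makes three assumptions:

- The starting tableau already has $r^{⊤} \geq 0$.
- The most negative $\bar{b}_p$ chooses the leaving row.
- Ratio ties are broken by the smallest column index rather than the lexicographic rule.

```python
from fractions import Fraction


def dual_simplex(tableau, basis):
    """Dual simplex method on a full tableau (minimization).

    Rows 0..m-1 are [u_1 .. u_n | bbar]; the last row is [r_1 .. r_n | -cost].
    Requires r >= 0 (dual feasible) at the start.
    Returns 'infeasible' or the final (tableau, basis).
    """
    T = [[Fraction(v) for v in row] for row in tableau]
    basis = list(basis)
    m, n = len(T) - 1, len(T[0]) - 1
    while True:
        p = min(range(m), key=lambda i: T[i][n])        # most negative bbar_p
        if T[p][n] >= 0:
            return T, basis                             # B^{-1} b >= 0: optimal
        cols = [j for j in range(n) if T[p][j] < 0]
        if not cols:
            return 'infeasible'                         # no negative entry in row p
        q = min(cols, key=lambda j: T[m][j] / -T[p][j])  # ratio r_j / (-u_pj)
        piv = T[p][q]
        T[p] = [v / piv for v in T[p]]
        for i in range(m + 1):                          # eliminate column q elsewhere
            if i != p and T[i][q] != 0:
                f = T[i][q]
                T[i] = [a - f * b for a, b in zip(T[i], T[p])]
        basis[p] = q


# Example below (0-based columns x1..x5 -> 0..4), slacks x4, x5 as initial basis
tableau = [
    [-2, -4, 0, 1, 0, -2],
    [-2, 0, -5, 0, 1, -3],
    [12, 16, 15, 0, 0, 0],
]
T, basis = dual_simplex(tableau, basis=[3, 4])
```

On the example that follows, it performs the same two pivots as the hand computation. It stops with $x_1 = 1$, $x_3 = 1/5$ and bottom-right entry $-15$, i.e. optimal cost $15$.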

e.g.

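\begin{aligned} \min\quad & 12x_{1} + 16x_{2} + 15x_{3} \\ \text{s.t.}\quad & 2x_{1} + 4x_{2} \geq 2 \\ & 2x_{1} + 5x_{3} \geq 3 \\ & x_{i} \geq 0, \quad i = 1, 2, 3. \end{aligned}

Convert to standard form, first writing the constraints as $\leq$ constraints:

\begin{aligned} \min\quad & 12x_{1} + 16x_{2} + 15x_{3} \\ \text{s.t.}\quad & -2x_{1} - 4x_{2} + x_{4} = -2 \\ & -2x_{1} - 5x_{3} + x_{5} = -3 \\ & x_{i} \geq 0, \quad i = 1, \dots, 5. \end{aligned}

Write down the initial simplex tableau:

\begin{array}{c|ccccc|c} & x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & B^{-1} b \\ \hline x_{4} & -2 & -4 & 0 & 1 & 0 & -2 \\ x_{5} & -2 & 0 & -5 & 0 & 1 & -3 \\ \hline r^{⊤} & 12 & 16 & 15 & 0 & 0 & 0 \end{array}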

Tip

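We choose this pivot because:

The most negative entry of $B^{-1} b$ is $-3$, so its basic variable $x_{5}$ leaves the basis.

In the $x_{5}$ row, the ratios $r_{j} / (-u_{pj})$ over the negative entries are $12/2 = 6$ (for $x_{1}$) and $15/5 = 3$ (for $x_{3}$). The minimum is $3$, so $x_{3}$ enters the basis while $x_{5}$ leaves; the pivot element is $-5$.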

The second simplex table:

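\begin{array}{c|ccccc|c} & x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & B^{-1} b \\ \hline x_{4} & -2 & -4 & 0 & 1 & 0 & -2 \\ x_{3} & \frac{2}{5} & 0 & 1 & 0 & -\frac{1}{5} & \frac{3}{5} \\ \hline r^{⊤} & 6 & 16 & 0 & 0 & 3 & -9 \end{array}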

Tip

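We choose this pivot because:

Only one entry of $B^{-1} b$ is negative: $-2$, in the $x_{4}$ row.

The ratios are $6/2 = 3$ (for $x_{1}$) and $16/4 = 4$ (for $x_{2}$). The minimum is $3$, so $x_{4}$ leaves the basis and $x_{1}$ enters; the pivot element is $-2$.

The third simplex table:

\begin{array}{c|ccccc|c} & x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & B^{-1} b \\ \hline x_{1} & 1 & 2 & 0 & -\frac{1}{2} & 0 & 1 \\ x_{3} & 0 & -\frac{4}{5} & 1 & \frac{1}{5} & -\frac{1}{5} & \frac{1}{5} \\ \hline r^{⊤} & 0 & 4 & 0 & 3 & 3 & -15 \end{array}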

Therefore, $(x_{1}, x_{2}, x_{3}, x_{4}, x_{5})^{⊤} = (1, 0, \frac{1}{5}, 0, 0)^{⊤}$ , the optimal cost is $15$ .

Convex Set

Affine Set

Definition

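A set $C \subset R^{n}$ is affine if $θ x_{1} + (1 - θ) x_{2} \in C$ for any two distinct points $x_{1}, x_{2} \in C$ and any $θ \in R$.

That is, a set is affine if, for any two of its points, the entire line through them lies in the set.

This definition extends to more points. If $x_{1}, \dots, x_{k} \in C$ and $\sum_{i = 1}^{k} θ_{i} = 1$, the affine combination $θ_{1} x_{1} + \dots + θ_{k} x_{k}$ also belongs to $C$.

Definition

Affine dimension: the affine dimension of a set $C$ is the dimension of its affine hull $\operatorname{aff} C$.

Relative interior: the interior of $C$ relative to its affine hull, $\operatorname{relint} C = \{x \in C \mid B (x, r) \cap \operatorname{aff} C \subseteq C \text{ for some } r > 0\}$.

Here $B (x, r) = \{y \mid \|y - x\| \leq r\}$ is the ball of radius $r$ centered at $x$. The relative boundary of $C$ is $\operatorname{cl} C \setminus \operatorname{relint} C$.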

Convex Set

Definition

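A set $C \subset R^{n}$ is convex if $θ x_{1} + (1 - θ) x_{2} \in C$ for any two distinct points $x_{1}, x_{2} \in C$ and any $θ \in [0, 1]$.

That is, a set is convex if, for any two of its points, the line segment between them lies in the set.

Contrast this with affine sets. An affine set must contain the entire line through any two of its points, whereas a convex set only needs to contain the segment between them.

Definition

Convex hull: $\operatorname{conv} C = \{θ_{1} x_{1} + \dots + θ_{k} x_{k} \mid x_{i} \in C, θ_{i} \geq 0, i = 1, \dots, k, \sum_{i} θ_{i} = 1\}$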

Cones

Definition

A set $C$ is a cone (or nonnegative homogeneous) if $θ x \in C$ for all $x \in C$ and $θ \geq 0$.

$C$ is a convex cone if $θ_{1} x_{1} + θ_{2} x_{2} \in C$ for all $x_{1}, x_{2} \in C$ and $θ_{1}, θ_{2} \geq 0$.

The set $\{θ_{1} x_{1} + \dots + θ_{k} x_{k} \mid θ_{i} \geq 0, x_{i} \in C, i = 1, \dots, k\}$ is called the conic hull of $C$. It is the set of all conic combinations of points in $C$.

Operation that Preserve Convexity

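Intersection: if $S_{1}, \dots, S_{k}$ are convex, then $S_{1} \cap \dots \cap S_{k}$ is convex.
Affine functions: let $f : R^{n} \mapsto R^{m}$ be affine, i.e. $f (x) = Ax + b$ with $A \in R^{m \times n}, b \in R^{m}$. Then the image $f (S)$ of a convex set $S$ is convex, and so is the inverse image $f^{-1} (S)$.
Perspective and linear-fractional functions (images and inverse images of convex sets under them are convex):
- perspective function: $P (x, t) = \frac{x}{t}$, $\operatorname{dom} P = \{(x, t) \mid t > 0\}$
- linear-fractional function: $f (x) = \frac{Ax + b}{c^{⊤} x + d}$, $\operatorname{dom} f = \{x \mid c^{⊤} x + d > 0\}$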

Convex Function

Definition

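convex function: $f$ is convex if $\operatorname{dom} f$ is convex and, for all $x, y \in \operatorname{dom} f$ and $θ \in [0, 1]$,
$f (θ x + (1 - θ) y) \leq θ f (x) + (1 - θ) f (y)$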

examples:

in $R$ :

$a x + b$
$∣ x ∣^{p}$ , $p \geq 1$
$x^{p}$ , $p \geq 1$ or $p \leq 0$
$e^{a x}$
$x \log x$ (on $R_{++}$)
Concave: $\log x$, $a x + b$, $x^{p}$ for $0 \leq p \leq 1$ (on $R_{++}$)

in $R^{n}$ :

$a^{⊤} x + b$
$∥ x ∥_{k}$
quadratic function: $f (x) = x^{⊤} Px + 2 q^{⊤} x + r$ is concave if and only if $P ⪯ 0$
geometric mean: $f (x) = (\prod_{i = 1}^{n} x_{i})^{\frac{1}{n}}$ is concave on $R_{++}^{n}$
$\log \sum_{i} e^{x_{i}}$ is convex on $R^{n}$
$f (x, y) = \frac{x^{⊤} x}{y}$ is convex on $R^{n} \times R_{++}$

in $R^{n \times n}$ :

affine function: $f (X) = tr (AX) + b$ is concave and convex on $R^{n \times n}$
Logarithmic Determinant Function: $f (X) = \log \det X$ is concave on $S_{++}^{n} = \{X \in R^{n \times n} \mid X = X^{⊤}, X \succ 0\}$
maximum eigenvalue function: $f (X) = λ_{\max} (X) = \sup_{y \neq 0} \frac{y^{⊤} X y}{y^{⊤} y}$ is convex on $S^{n}$, the set of symmetric $n \times n$ matrices
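Convexity claims like these can be sanity-checked numerically by testing the midpoint inequality on random pairs. This is only a heuristic sketch: a violation disproves convexity, but passing proves nothing. The sampling ranges and the tolerance `1e-9` are arbitrary choices of this illustration.

```python
import math
import random


def log_sum_exp(z):
    m = max(z)                                   # shift for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in z))


def quad_over_lin(z):
    x, y = z[:-1], z[-1]                         # f(x, y) = x^T x / y with y > 0
    return sum(v * v for v in x) / y


def neg_geo_mean(z):
    return -math.prod(z) ** (1 / len(z))         # -(geometric mean) on z > 0


def midpoint_violations(f, sample, trials=2000, seed=0):
    """Count random pairs with f((u+v)/2) > (f(u)+f(v))/2 + 1e-9.

    A nonzero count disproves convexity; a zero count is only evidence.
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        u, v = sample(rng), sample(rng)
        mid = [(a + b) / 2 for a, b in zip(u, v)]
        if f(mid) > (f(u) + f(v)) / 2 + 1e-9:
            bad += 1
    return bad


def any_vec(rng):
    return [rng.uniform(-5, 5) for _ in range(3)]


def pos_vec(rng):
    return [rng.uniform(0.1, 5) for _ in range(3)]


def xy_vec(rng):
    return [rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(0.1, 5)]
```

Log-sum-exp, quadratic-over-linear and the negated geometric mean should report no violations. A nonconvex function such as $\sin$ reports many.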

Definition

epigraph: $epi f = {(x, t) \in R^{n + 1} ∣ x \in dom f, f (x) \leq t}$

Restriction of a Convex Function to a Line

Theorem

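Let $f : R^{n} \mapsto R$. For $x \in \operatorname{dom} f$ and $v \in R^{n}$, define $g : R \mapsto R$ by $g (t) = f (x + t v)$ with $\operatorname{dom} g = \{t \mid x + t v \in \operatorname{dom} f\}$.

$f$ is convex if and only if $g$ is convex for all $x \in \operatorname{dom} f$ and all $v \in R^{n}$.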

First and Second Order Condition

Gradient:

\nabla f (x) = [\frac{\partial f ( x )}{\partial x _{1}} \dots \frac{\partial f ( x )}{\partial x _{n}}]^{⊤} \in R^{n}

Hessian:

\nabla^{2} f (x) = (\frac{\partial ^{2} f ( x )}{\partial x _{i} \partial x _{j}})_{ij} \in R^{n \times n}

Definition

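First-order condition: a differentiable function $f$ with convex domain is convex if and only if $f (y) \geq f (x) + \nabla f (x)^{⊤} (y - x)$ for all $x, y \in \operatorname{dom} f$.

Second-order condition: a twice differentiable function $f$ with convex domain is convex if and only if $\nabla^{2} f (x) \succeq 0$ for all $x \in \operatorname{dom} f$.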

Some Other Convexity

Quasi-Convexity

A function $f : R^{n} \mapsto R$ is quasi-convex if and only if $\operatorname{dom} f$ is convex and the sublevel set $S_{α} = \{x \in \operatorname{dom} f \mid f (x) \leq α\}$ is convex for every $α$.

Log-Convexity

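A positive function $f$ is log-convex if $\log f$ is convex, i.e. for all $x, y \in \operatorname{dom} f$ and $θ \in [0, 1]$:

f (θ x + (1 - θ) y) \leq f (x)^{θ} f (y)^{1 - θ}

With the inequality reversed ($\geq$), $f$ is log-concave.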

Convexity w.r.t. Generalized Inequalities

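Let $K \subseteq R^{m}$ be a proper cone. A function $f : R^{n} \mapsto R^{m}$ is called K-convex if $\operatorname{dom} f$ is convex and

\forall x, y \in \operatorname{dom} f, \ 0 \leq θ \leq 1: \quad f (θ x + (1 - θ) y) \preceq_{K} θ f (x) + (1 - θ) f (y)

Here $\preceq_{K}$ is the generalized inequality induced by $K$.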

Convex Problem

Standard form:

\begin{aligned} \text{minimize}\quad & f_{0} (x) \\ \text{subject to}\quad & f_{i} (x) \leq 0, \quad i = 1, \dots, m \\ & h_{i} (x) = 0, \quad i = 1, \dots, p \end{aligned}

feasibility:

A point $x$ in the domain of the problem is feasible if it satisfies all the constraints; otherwise it is infeasible.
A problem is feasible if it has at least one feasible point; otherwise it is infeasible.

optimal:

p^{*} = \inf \{f_{0} (x) \mid f_{i} (x) \leq 0, i = 1, \dots, m, \ h_{i} (x) = 0, i = 1, \dots, p\}

If the problem is infeasible, then $p^{*} = \infty$.
If the problem is unbounded below, then $p^{*} = -\infty$.

Stationary Point

Definition

If $\nabla f (x) = 0$, then $x$ is called a stationary point.

Theorem

If a stationary point $x$ has a neighborhood $B \subset R^{n}$ such that $f (y) \geq f (x)$ for all $y \in B$, then $x$ is a local minimum.

If a stationary point $x$ satisfies $f (y) \geq f (x)$ for all $y \in \operatorname{dom} f$, then $x$ is a global minimum.

Suppose that every neighborhood $B$ of a stationary point $x$ contains points $y, z$ with $f (z) < f (x) < f (y)$. Then $x$ is called a saddle point. In this case $λ_{\min} (\nabla^{2} f (x)) \leq 0$.

Convex Optimization Problem

standard form:

\begin{aligned} \text{minimize}\quad & f_{0} (x) \\ \text{subject to}\quad & f_{i} (x) \leq 0, \quad i = 1, \dots, m \\ & Ax = b \end{aligned}

Here $f_{0}, f_{1}, \dots, f_{m}$ are convex, and the equality constraints are affine.

For a convex problem, every locally optimal point is globally optimal.

Theorem

For differentiable $f_{0}$, a feasible point $x$ is optimal if and only if
$\nabla f_{0} (x)^{⊤} (y - x) \geq 0$ for all feasible $y$.

Quasi-Convex Optimization Problem

standard form:

\begin{aligned} \text{minimize}\quad & f_{0} (x) \\ \text{subject to}\quad & f_{i} (x) \leq 0, \quad i = 1, \dots, m \\ & Ax = b \end{aligned}

Here $f_{0}$ is quasi-convex, $f_{1}, \dots, f_{m}$ are convex, and the equality constraints are affine.

Represent the sublevel sets of $f_{0}$ by a family of convex functions $ϕ_{t}$:

f_{0} (x) \leq t \Leftrightarrow ϕ_{t} (x) \leq 0

e.g.: assume $p (x)$ is convex and $q (x)$ is concave, with $p (x) \geq 0$ and $q (x) > 0$.

f (x) = \frac{p (x)}{q (x)}, \qquad ϕ_{t} (x) = p (x) - t q (x)

For $t \geq 0$, $ϕ_{t} (x)$ is convex in $x$, and $\frac{p (x)}{q (x)} \leq t$ if and only if $ϕ_{t} (x) \leq 0$.

Some Other Solver

LP(Linear Programming)

\begin{aligned} \text{minimize}\quad & c^{⊤} x + d \\ \text{subject to}\quad & Gx \leq h \\ & Ax = b \end{aligned}

Convex problem: affine objective function and affine constraints.

QP(Quadratic Programming)

\begin{aligned} \text{minimize}\quad & \frac{1}{2} x^{⊤} P x + q^{⊤} x + r \\ \text{subject to}\quad & Gx \leq h \\ & Ax = b \end{aligned}

Convex problem: assuming $P \in S_{+}^{n}$ (i.e. $P \succeq 0$), a convex quadratic objective function with affine constraints.

QCQP(Quadratically Constrained QP)

\begin{aligned} \text{minimize}\quad & \frac{1}{2} x^{⊤} P_{0} x + q_{0}^{⊤} x + r_{0} \\ \text{subject to}\quad & \frac{1}{2} x^{⊤} P_{i} x + q_{i}^{⊤} x + r_{i} \leq 0, \quad i = 1, \dots, m \\ & Ax = b \end{aligned}

Convex problem: assuming $P_{i} \in S_{+}^{n}$ for $i = 0, \dots, m$, a convex quadratic objective function with convex quadratic constraints.

SOCP(Second-Order Cone Programming)

\begin{aligned} \text{minimize}\quad & f^{⊤} x \\ \text{subject to}\quad & \|A_{i} x + b_{i}\|_{2} \leq c_{i}^{⊤} x + d_{i}, \quad i = 1, \dots, m \\ & Fx = g \end{aligned}

convex problem: linear objective and second-order cone constraints

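If each $A_{i}$ is a single row vector, the problem reduces to an LP (Linear Programming), because $|a_{i}^{⊤} x + b_{i}| \leq c_{i}^{⊤} x + d_{i}$ is equivalent to two linear inequalities.
If every $c_{i} = 0$, the problem reduces to a QCQP (Quadratically Constrained QP), by squaring both sides of $\|A_{i} x + b_{i}\|_{2} \leq d_{i}$.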

Generalized Inequality Constraints

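\begin{aligned} \text{minimize}\quad & f_{0} (x) \\ \text{subject to}\quad & f_{i} (x) \preceq_{K_{i}} 0, \quad i = 1, \dots, m \\ & Ax = b \end{aligned}

Here $f_{0}$ is a convex objective function and each $f_{i}$ is $K_{i}$-convex.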

Conic Form Problem

\begin{aligned} \text{minimize}\quad & c^{⊤} x \\ \text{subject to}\quad & Fx + g \preceq_{K} 0 \\ & Ax = b \end{aligned}

SDP (Semidefinite Programming)

\begin{aligned} \text{minimize}\quad & c^{⊤} x \\ \text{subject to}\quad & x_{1} F_{1} + \dots + x_{n} F_{n} \preceq G \\ & Ax = b \end{aligned}

convex problem: linear objective function and linear matrix inequality(LMI) constraints

Several LMIs can be combined into a single block-diagonal LMI, so:

LP and an equivalent SDP: minimizing $c^{⊤} x$ subject to $Ax \preceq b$ (componentwise) is equivalent to minimizing $c^{⊤} x$ subject to $\operatorname{diag} (Ax - b) \preceq 0$.
SOCP and an equivalent SDP: consider minimizing $f^{⊤} x$ subject to $\|A_{i} x + b_{i}\|_{2} \leq c_{i}^{⊤} x + d_{i}$, $i = 1, \dots, m$. This is equivalent to minimizing $f^{⊤} x$ subject to $\begin{bmatrix} (c_{i}^{⊤} x + d_{i}) I & A_{i} x + b_{i} \\ (A_{i} x + b_{i})^{⊤} & c_{i}^{⊤} x + d_{i} \end{bmatrix} \succeq 0$, $i = 1, \dots, m$.

Eigenvalue minimization:

\begin{aligned} \text{minimize}\quad & λ_{\max} (A (x)) \end{aligned}

Here $λ_{\max}$ is the largest eigenvalue of $A (x) = A_{0} + x_{1} A_{1} + \dots + x_{n} A_{n}$, with symmetric $A_{i}$. Therefore $λ_{\max} (A (x)) \leq t \Leftrightarrow A (x) \preceq t I$.

It can therefore be converted into an equivalent SDP:

\begin{aligned} \text{minimize}\quad & t \\ \text{subject to}\quad & A (x) \preceq t I \end{aligned}

Lagrangian

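\begin{aligned} \text{minimize}\quad & f_{0} (x) \\ \text{subject to}\quad & f_{i} (x) \leq 0, \quad i = 1, \dots, m \\ & h_{i} (x) = 0, \quad i = 1, \dots, p \end{aligned}

Define the Lagrangian as

L (x, λ, ν) = f_{0} (x) + \sum_{i = 1}^{m} λ_{i} f_{i} (x) + \sum_{i = 1}^{p} ν_{i} h_{i} (x)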

Lagrangian Dual Function

Theorem

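Lower bound: for any feasible $x$ and any $λ \succeq 0$,
$f_{0} (x) \geq L (x, λ, ν) \geq \inf_{x \in D} L (x, λ, ν) = g (λ, ν)$

Lagrange dual problem:

\begin{aligned} \text{maximize}\quad & g (λ, ν) \\ \text{subject to}\quad & λ \succeq 0 \end{aligned}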

KKT Condition

primal feasibility

The primal constraints are satisfied:

f_{i} (x^{*}) \leq 0, \quad i = 1, \dots, m, \qquad h_{i} (x^{*}) = 0, \quad i = 1, \dots, p

dual feasibility

The Lagrange multipliers of the inequality constraints are nonnegative:

λ ⪰ 0

complementary slackness

λ_{i}^{*} f_{i} (x^{*}) = 0, \quad i = 1, \dots, m

zero gradient for Lagrangian with respect to x

\nabla f_{0} (x^{*}) + \sum_{i = 1}^{m} λ_{i}^{*} \nabla f_{i} (x^{*}) + \sum_{i = 1}^{p} ν_{i}^{*} \nabla h_{i} (x^{*}) = 0
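The four KKT conditions can be checked numerically at a candidate point. This minimal sketch uses a hypothetical one-dimensional problem: minimize $(x - 2)^2$ subject to $x - 1 \leq 0$, with no equality constraints. `kkt_residuals` is an illustrative helper, not a library function.

At $x^* = 1$ with $λ^* = 2$, all residuals vanish. The unconstrained minimizer $x = 2$ violates primal feasibility.

```python
def kkt_residuals(x, lam, grad_f0, fs, grad_fs):
    """KKT residuals for: minimize f0(x) s.t. f_i(x) <= 0 (scalar x, no equalities).

    Returns (primal, dual, complementary, stationarity); all are zero at a KKT point.
    """
    primal = max(0.0, max(f(x) for f in fs))                 # f_i(x) <= 0
    dual = max(0.0, max(-l for l in lam))                     # lambda_i >= 0
    comp = max(abs(l * f(x)) for l, f in zip(lam, fs))        # lambda_i f_i(x) = 0
    stat = abs(grad_f0(x) + sum(l * g(x) for l, g in zip(lam, grad_fs)))
    return primal, dual, comp, stat


# Hypothetical example: minimize (x - 2)^2  subject to  x - 1 <= 0
def grad_f0(x):
    return 2 * (x - 2)


def f1(x):
    return x - 1


def grad_f1(x):
    return 1.0


at_optimum = kkt_residuals(1.0, [2.0], grad_f0, [f1], [grad_f1])        # x* = 1, lambda* = 2
at_unconstrained = kkt_residuals(2.0, [0.0], grad_f0, [f1], [grad_f1])  # violates x <= 1
```

Because the problem is convex, satisfying the KKT conditions certifies that $x^* = 1$ is optimal.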

Differentiable Unconstrained Minimization

\begin{aligned} \min\quad & f (x) \\ \text{subject to}\quad & x \in R^{n} \end{aligned}

where $f$ is differentiable.

Convergence Rate

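Computation:

Take the sequence of errors $r_{k}$ produced by the method, for example $r_{k} = \|x^{k} - x^{*}\|$ or $r_{k} = f (x^{k}) - f (x^{*})$, and compute the limit

q = \lim_{k \to \infty} \frac{r_{k + 1}}{r_{k}}

The convergence rate is $q$.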

Convergence Type

$q = 0$ : superlinearly
$0 < q < 1$ : linearly
$q = 1$ : sublinearly
$q > 1$ : non-convergence
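The limit can be estimated from a finite error sequence by the ratio of its last two terms. This is a heuristic sketch, since a finite sequence can only approximate the limit. `estimate_rate` and `classify` are hypothetical helpers, and the tolerance `1e-2` used to separate the cases is an arbitrary choice.

```python
def estimate_rate(errors):
    """Approximate q = lim r_{k+1} / r_k by the ratio of the last two positive errors."""
    r = [e for e in errors if e > 0]
    return r[-1] / r[-2]


def classify(q, tol=1e-2):
    """Map an estimated q to the categories above (heuristic thresholds)."""
    if q < tol:
        return 'superlinear'
    if q < 1 - tol:
        return 'linear'
    if q <= 1 + tol:
        return 'sublinear'
    return 'non-convergent'


linear = [0.5 ** k for k in range(30)]             # r_k = 0.5^k
sublinear = [1 / k for k in range(1, 1001)]        # r_k = 1/k
quadratic = [2.0 ** -(2 ** k) for k in range(6)]   # r_{k+1} = r_k^2
```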

Convergence Rate of Quadratic Minimization

Let $λ_{1} (Q)$ be the largest eigenvalue of $Q$ and $λ_{n} (Q)$ the smallest. Then:
condition number: $κ = \frac{λ_{1} (Q)}{λ_{n} (Q)} = \frac{\max_{x} λ_{1} (\nabla^{2} f (x))}{\min_{x} λ_{n} (\nabla^{2} f (x))}$
with constant step size $η_{t} \equiv η = \frac{2}{λ_{1} (Q) + λ_{n} (Q)}$:
$\|x^{t} - x^{*}\|_{2} \leq \left(\frac{λ_{1} (Q) - λ_{n} (Q)}{λ_{1} (Q) + λ_{n} (Q)}\right)^{t} \|x^{0} - x^{*}\|_{2}$, which drops below a target accuracy $ε$ once $t$ is large enough
Convergence Analysis:
the accuracy target can be stated as 1. $\|x^{t} - x^{*}\|_{2} \leq ε$, 2. $|f (x^{t}) - f (x^{*})| \leq ε$, or 3. $\|\nabla f (x^{t})\|_{2} \leq ε$
Convergence Rate:

sublinear: $T = O (1/ε^{k})$ iterations

linear: $T = O (\log \frac{1}{ε})$ iterations

quadratic (superlinear): $T = O (\log \log \frac{1}{ε})$ iterations

Corresponding iteration bounds:
sublinear: $\|x^{t} - x^{*}\|_{2} \leq \frac{1}{t^{1/k}} \|x^{0} - x^{*}\|_{2}$
linear: $\|x^{t} - x^{*}\|_{2} \leq q \|x^{t - 1} - x^{*}\|_{2} \Rightarrow \|x^{t} - x^{*}\|_{2} \leq q^{t} \|x^{0} - x^{*}\|_{2}$
quadratic: $\|x^{t} - x^{*}\|_{2} \leq \|x^{t - 1} - x^{*}\|_{2}^{2}$ (up to a constant factor)

Iterative descent algorithm

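Starting from $x^0$, construct a sequence $\{x^t\}$ such that $f(x^{t+1}) < f(x^t)$, $t = 0, 1, \dots$

A descent direction $d$ satisfies:

$$f'(x; d) := \lim_{τ \downarrow 0} \frac{f(x + τ d) - f(x)}{τ} = \nabla f(x)^\top d < 0$$

At each iteration, $x^{t+1} = x^t + η d^t$, where $d^t$ is a descent direction at $x^t$ and $η$ is the step size. Gradient descent takes $d^t = -\nabla f(x^t)$, i.e., $x^{t+1} = x^t - η \nabla f(x^t)$.

In machine learning, $f$ is usually the loss function, $x$ the parameters of the loss, and $η$ the learning rate.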
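A minimal sketch of the iterative descent loop, assuming gradient descent ($d^t = -\nabla f(x^t)$) on a hypothetical two-variable loss; the function names and the loss are illustrative only.

```python
def gradient_descent(grad, x0, eta, num_iters):
    """Iterative descent x^{t+1} = x^t + eta * d^t with d^t = -grad f(x^t)."""
    x = list(x0)
    for _ in range(num_iters):
        d = [-gi for gi in grad(x)]  # a descent direction whenever grad f(x) != 0
        x = [xi + eta * di for xi, di in zip(x, d)]
    return x


# Hypothetical loss f(x) = (x1 - 1)^2 + 2 (x2 + 3)^2 with minimizer (1, -3).
def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]


x = gradient_descent(grad, [0.0, 0.0], eta=0.1, num_iters=200)
print(x, f(x))
```

With $η = 0.1$, the error in the two coordinates shrinks by the constant factors $0.8$ and $0.6$ per step, a linear rate.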

Note

Steepest Descent

The direction that decreases the objective function fastest (to first order):

$$\min_{d: ∥d∥_2 \leq 1} f'(x; d) = \min_{d: ∥d∥_2 \leq 1} \nabla f(x)^\top d = -∥\nabla f(x)∥_2, \qquad \text{attained at } d = -\frac{\nabla f(x)}{∥\nabla f(x)∥_2}$$
since, by the Cauchy-Schwarz inequality, $-∥\nabla f(x)∥_2 \cdot ∥d∥_2 \leq ⟨\nabla f(x), d⟩ \leq ∥\nabla f(x)∥_2 \cdot ∥d∥_2$
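The sketch below checks the steepest descent claim numerically for a hypothetical gradient $g = (3, -4)$: the normalized direction $d = -g/∥g∥_2$ gives $⟨g, d⟩ = -∥g∥_2 = -5$, and no random unit vector does better. The helper names are illustrative.

```python
import math
import random


def steepest_direction(g):
    """Minimizer of <g, d> over ||d||_2 <= 1, namely d = -g / ||g||_2 (requires g != 0)."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [-gi / norm for gi in g]


def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


g = [3.0, -4.0]                  # hypothetical gradient with ||g||_2 = 5
d_star = steepest_direction(g)
print(inner(g, d_star))          # -||g||_2 = -5, up to rounding

# Cauchy-Schwarz: no unit vector achieves a smaller directional derivative.
random.seed(0)
for _ in range(1000):
    theta = random.uniform(0.0, 2.0 * math.pi)
    d = [math.cos(theta), math.sin(theta)]
    assert inner(g, d) >= inner(g, d_star) - 1e-12
```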

Quadratic Minimization

$$\text{minimize} \quad f(x) := \frac{1}{2} (x - x^*)^\top Q (x - x^*), \qquad Q \succ 0$$

Its gradient is $\nabla f(x) = Q(x - x^*)$.

Parameter update (gradient descent):

$$x^{t+1} = x^t - η_t \nabla f(x^t) = (I - η_t Q) x^t + η_t Q x^*$$

Step size ($η$) rule:

$$\begin{aligned} & x^{t+1} - x^* = (I - η_t Q)(x^t - x^*) \\ \Rightarrow \ & ∥x^{t+1} - x^*∥_2 \leq ∥I - η_t Q∥_2 \cdot ∥x^t - x^*∥_2 \\ \Rightarrow \ & ∥I - η Q∥_2 = \max\{|1 - ηλ_1(Q)|, |1 - ηλ_n(Q)|\} \text{ is minimized by } η \equiv η_t = \frac{2}{λ_1(Q) + λ_n(Q)} \\ \Rightarrow \ & ∥I - η Q∥_2 = \frac{λ_1(Q) - λ_n(Q)}{λ_1(Q) + λ_n(Q)} \end{aligned}$$
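A minimal sketch, assuming a hypothetical $2 \times 2$ positive definite $Q$ and minimizer $x^*$: it runs gradient descent with $η = \frac{2}{λ_1(Q) + λ_n(Q)}$ and records $∥x^t - x^*∥_2$, so each per-step contraction can be compared with $\frac{λ_1 - λ_n}{λ_1 + λ_n}$. The eigenvalues use the closed form for symmetric $2 \times 2$ matrices; all names are illustrative.

```python
import math


def eig_sym_2x2(Q):
    """Largest and smallest eigenvalues of a symmetric 2x2 matrix (closed form)."""
    a, b, d = Q[0][0], Q[0][1], Q[1][1]
    mid, rad = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    return mid + rad, mid - rad


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dist(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


Q = [[4.0, 1.0], [1.0, 2.0]]     # hypothetical positive definite Q
x_star = [1.0, -2.0]             # hypothetical minimizer x*
lam1, lamn = eig_sym_2x2(Q)
eta = 2.0 / (lam1 + lamn)
rate = (lam1 - lamn) / (lam1 + lamn)

x = [5.0, 5.0]
errors = [dist(x, x_star)]
for _ in range(50):
    g = matvec(Q, [xi - si for xi, si in zip(x, x_star)])  # grad f(x) = Q (x - x*)
    x = [xi - eta * gi for xi, gi in zip(x, g)]
    errors.append(dist(x, x_star))

print(rate, errors[-1])
```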

Exact Line Search

$η_t = \arg\min_{η \geq 0} f(x^t - η \nabla f(x^t))$

For the quadratic above, let $g^t = \nabla f(x^t) = Q(x^t - x^*)$; setting the derivative in $η$ to zero gives $η_t = \frac{g^{t\top} g^t}{g^{t\top} Q g^t}$
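A sketch of steepest descent with this closed-form exact step, on a hypothetical quadratic (plain Python lists; `exact_line_search_gd` is an illustrative name). It returns the optimality gaps $f(x^t) - f(x^*)$ so their decrease can be inspected.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def exact_line_search_gd(Q, x_star, x0, num_iters):
    """Steepest descent on f(x) = 1/2 (x - x*)^T Q (x - x*) using the closed-form exact step."""
    x = list(x0)
    gaps = []                                   # f(x^t) - f(x*) for each visited iterate
    for _ in range(num_iters):
        e = [xi - si for xi, si in zip(x, x_star)]
        g = matvec(Q, e)                        # g^t = Q (x^t - x*)
        gaps.append(0.5 * dot(e, g))
        gQg = dot(g, matvec(Q, g))
        if gQg == 0.0:                          # g^t = 0: x^t is already optimal
            break
        eta = dot(g, g) / gQg                   # eta_t = g^T g / g^T Q g
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x, gaps


Q = [[4.0, 1.0], [1.0, 2.0]]     # hypothetical positive definite Q
x_star = [1.0, -2.0]             # hypothetical minimizer x*
x, gaps = exact_line_search_gd(Q, x_star, [5.0, 5.0], 30)
print(x, gaps[:5])
```

For this $Q$, each gap shrinks at least by the factor $\left(\frac{λ_1 - λ_n}{λ_1 + λ_n}\right)^2 = \frac{2}{9}$, the standard exact-line-search rate for quadratics.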

Kantorovich’s inequality (for $Q \succ 0$ and any $y \neq 0$):

$$\frac{∥y∥_2^4}{(y^\top Q y)(y^\top Q^{-1} y)} \geq \frac{4 λ_1(Q) λ_n(Q)}{(λ_1(Q) + λ_n(Q))^2}$$
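A quick numerical check of Kantorovich's inequality, assuming the hypothetical matrix $Q = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$ and random Gaussian test vectors $y$. The left-hand ratio should never fall below the bound, and never exceed 1 (by Cauchy-Schwarz).

```python
import math
import random

Q = [[4.0, 1.0], [1.0, 2.0]]                      # hypothetical symmetric positive definite Q
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
Q_inv = [[Q[1][1] / det, -Q[0][1] / det],
         [-Q[1][0] / det, Q[0][0] / det]]         # closed-form 2x2 inverse
mid = (Q[0][0] + Q[1][1]) / 2.0
rad = math.hypot((Q[0][0] - Q[1][1]) / 2.0, Q[0][1])
lam1, lamn = mid + rad, mid - rad                 # largest / smallest eigenvalue


def quad_form(M, y):
    """y^T M y for a 2x2 matrix M."""
    return sum(y[i] * M[i][j] * y[j] for i in range(2) for j in range(2))


bound = 4.0 * lam1 * lamn / (lam1 + lamn) ** 2
random.seed(1)
ratios = []
for _ in range(10000):
    y = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    ratios.append(sum(v * v for v in y) ** 2 / (quad_form(Q, y) * quad_form(Q_inv, y)))

print(min(ratios), bound)
```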

Smooth problem

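Definition of $μ$-strong convexity and $L$-smoothness (for twice-differentiable $f$):

$$0 \prec μ I \preceq \nabla^2 f(x) \preceq L I, \quad \forall x$$

or, for the quadratic case:

$$λ_n(Q) I \preceq Q \preceq λ_1(Q) I$$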

Theorem

For a $μ$-strongly convex and $L$-smooth problem, gradient descent satisfies:

step size: $η = \frac{2}{μ + L}$ (vs. $\frac{2}{λ_1 + λ_n}$ for the quadratic case)

contraction rate: $\frac{κ - 1}{κ + 1}$, $κ = \frac{L}{μ}$ (vs. $\frac{λ_1 - λ_n}{λ_1 + λ_n}$)

iteration complexity: $O\left(\frac{\log \frac{1}{ε}}{\log \frac{κ + 1}{κ - 1}}\right)$

$$f(x^t) - f(x^*) \leq \frac{L}{2} \left(\frac{κ - 1}{κ + 1}\right)^{2t} ∥x^0 - x^*∥_2^2$$

Backtracking Line Search

Armijo Condition:

$$f(x^t - η \nabla f(x^t)) \leq f(x^t) - α η ∥\nabla f(x^t)∥_2^2, \quad 0 < α < 1$$

Algorithm:

initialize $η = 1$, $0 < α < \frac{1}{2}$, $0 < β < 1$
while $f(x^t - η \nabla f(x^t)) > f(x^t) - α η ∥\nabla f(x^t)∥_2^2$, do:
- $η \leftarrow β η$
then set $x^{t+1} = x^t - η \nabla f(x^t)$

Upper bound (from $L$-smoothness): $f(x^t - η \nabla f(x^t)) \leq f(x^t) - η ∥\nabla f(x^t)∥_2^2 + \frac{L η^2}{2} ∥\nabla f(x^t)∥_2^2$, so the Armijo condition holds as soon as $η \leq \frac{2(1 - α)}{L}$ and the loop terminates.
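A minimal sketch of backtracking line search, assuming the hypothetical objective $f(x) = \frac{1}{2}(10 x_1^2 + x_2^2)$ (so $μ = 1$, $L = 10$) and the illustrative parameters $α = 0.3$, $β = 0.5$. The `while` loop shrinks $η$ exactly as long as the Armijo condition fails; `backtracking_step` is an illustrative name.

```python
def backtracking_step(f, grad, x, alpha=0.3, beta=0.5):
    """One gradient step with the step size chosen by backtracking (Armijo) line search."""
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)

    def trial(eta):
        return [xi - eta * gi for xi, gi in zip(x, g)]

    eta = 1.0
    # Shrink eta while the sufficient-decrease (Armijo) condition fails.
    while f(trial(eta)) > f(x) - alpha * eta * g_sq:
        eta *= beta
    return trial(eta), eta


# Hypothetical objective f(x) = 1/2 (10 x1^2 + x2^2): mu = 1, L = 10.
def f(x):
    return 0.5 * (10.0 * x[0] ** 2 + x[1] ** 2)


def grad(x):
    return [10.0 * x[0], x[1]]


x = [1.0, 1.0]
values, steps = [f(x)], []
for _ in range(100):
    x, eta = backtracking_step(f, grad, x)
    values.append(f(x))
    steps.append(eta)
print(x, values[-1], min(steps))
```

Because the Armijo condition holds for every $η \leq \frac{2(1-α)}{L} = 0.14$, the loop here never goes below $η = 0.125$, and $f$ never increases.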

Theorem

$f(x^t) - f(x^*) \leq \left(1 - \min\left\{2μα, \frac{2βαμ}{L}\right\}\right)^t (f(x^0) - f(x^*))$

Convergence:

$$∥x^t - x^*∥_2 \leq \frac{κ - 1}{κ + 1} ∥x^{t-1} - x^*∥_2$$

Regularity Condition

$μ$-strongly convex + $L$-smooth (which implies the regularity condition), with

$$η \equiv η_t = \frac{1}{L}$$

$$∥x^t - x^*∥_2^2 \leq \left(1 - \frac{μ}{L}\right)^t ∥x^0 - x^*∥_2^2$$

Polyak-Lojasiewicz Condition

$$∥\nabla f(x)∥_2^2 \geq 2μ (f(x) - f(x^*)), \quad \forall x$$

If $f$ is also $L$-smooth and $η_t \equiv \frac{1}{L}$, then

$$f(x^t) - f(x^*) \leq \left(1 - \frac{μ}{L}\right)^t (f(x^0) - f(x^*))$$

Over-parameterized linear regression

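Over-parameterized: model dimension $n$ > sample size $m$.

Define

$$f(x) = \frac{1}{2} \sum_{i=1}^{m} (a_i^\top x - y_i)^2 = \frac{1}{2} ∥Ax - y∥_2^2$$

Then

$$\nabla f(x) = A^\top (Ax - y) = 0 \Leftrightarrow A^\top A x = A^\top y$$

$$\nabla^2 f(x) = \sum_{i=1}^{m} a_i a_i^\top = A^\top A$$

Because $n > m$, $A^\top A \in \mathbb{R}^{n \times n}$ has rank at most $m < n$, so it is singular, $(A^\top A)^{-1}$ does not exist, and $f$ is not strongly convex. When $A$ has full row rank, $A^\top$ is injective, so $\nabla f(x) = 0$ reduces to $Ax = y$, which has infinitely many solutions, all with $f(x^*) = 0$.

Suppose $A = [a_1, \dots, a_m]^\top \in \mathbb{R}^{m \times n}$ has rank $m$ and the step size is $η_t \equiv η = \frac{1}{λ_{\max}(AA^\top)}$. Then

$$f(x^t) - f(x^*) \leq \left(1 - \frac{λ_{\min}(AA^\top)}{λ_{\max}(AA^\top)}\right)^t (f(x^0) - f(x^*)), \quad \forall t$$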
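A sketch of gradient descent on a hypothetical over-parameterized problem with $m = 2$ samples and $n = 3$ parameters. For this $A$, $AA^\top = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ has eigenvalues 3 and 1 (worked out by hand), so $η = \frac{1}{3}$ and the predicted contraction factor is $1 - \frac{1}{3} = \frac{2}{3}$. The data and names are illustrative.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]         # hypothetical data: m = 2 samples, n = 3 parameters, rank 2
y = [1.0, 2.0]
At = [list(col) for col in zip(*A)]
lam_max, lam_min = 3.0, 1.0   # eigenvalues of A A^T = [[2, 1], [1, 2]], worked out by hand
eta = 1.0 / lam_max


def f(x):
    r = [ri - yi for ri, yi in zip(matvec(A, x), y)]
    return 0.5 * sum(ri * ri for ri in r)


x = [0.0, 0.0, 0.0]
history = [f(x)]
for _ in range(60):
    residual = [ri - yi for ri, yi in zip(matvec(A, x), y)]
    g = matvec(At, residual)  # grad f(x) = A^T (A x - y)
    x = [xi - eta * gi for xi, gi in zip(x, g)]
    history.append(f(x))

print(x, history[-1])         # f -> 0: the limit interpolates the data, A x = y
```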

Convex and Smooth Problem

L-smooth

majorization-minimization

Theorem

If $f$ is convex and $L$-smooth and $η_t \equiv \frac{1}{L}$, then:

$f(x^{t+1}) \leq f(x^t) - \frac{1}{2L} ∥\nabla f(x^t)∥_2^2$

$∥x^t - x^*∥_2^2 \leq ∥x^{t-1} - x^*∥_2^2 - \frac{1}{L^2} ∥\nabla f(x^{t-1})∥_2^2$

$f(x^t) - f(x^*) \leq \frac{2L ∥x^0 - x^*∥_2^2}{t}$

Non-convex Problem

Theorem

For general (possibly non-convex) $L$-smooth $f$ with $η_t \equiv \frac{1}{L}$: $\min_{0 \leq k < t} ∥\nabla f(x^k)∥_2 \leq \sqrt{\frac{2L (f(x^0) - f(x^*))}{t}}$

For convex $f$: $\min_{t/2 \leq k < t} ∥\nabla f(x^k)∥_2 \leq \frac{4L ∥x^0 - x^*∥_2}{t}$

Gradient methods for Constrained Problems

Frank-Wolfe algorithm

$y^t := \arg\min_{x \in C} ⟨\nabla f(x^t), x⟩$
$x^{t+1} = (1 - η_t) x^t + η_t y^t$

Over a convex set $C$, $y^t$ minimizes the first-order approximation $f(x^t) + ⟨\nabla f(x^t), x - x^t⟩$ over $C$. The step size can be chosen by line search, or set to $η_t = \frac{2}{t+2}$.
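A minimal Frank-Wolfe sketch over the probability simplex, an assumed example set chosen because its linear minimization step simply picks the vertex $e_i$ with the smallest gradient entry. It minimizes the hypothetical $f(x) = \frac{1}{2}∥x - b∥_2^2$ with $η_t = \frac{2}{t+2}$; all names are illustrative.

```python
def frank_wolfe_simplex(grad, x0, num_iters):
    """Frank-Wolfe over the probability simplex with eta_t = 2 / (t + 2)."""
    x = list(x0)
    iterates = [list(x)]
    for t in range(num_iters):
        g = grad(x)
        # Linear minimization step: <g, y> over the simplex is minimized at the vertex e_i with smallest g_i.
        i = min(range(len(g)), key=lambda k: g[k])
        y = [1.0 if k == i else 0.0 for k in range(len(g))]
        eta = 2.0 / (t + 2)
        x = [(1.0 - eta) * xk + eta * yk for xk, yk in zip(x, y)]
        iterates.append(list(x))
    return iterates


b = [0.2, 0.5, 0.3]           # hypothetical target inside the simplex


def f(x):
    return 0.5 * sum((xk - bk) ** 2 for xk, bk in zip(x, b))  # L = 1, f* = 0 at x* = b


def grad(x):
    return [xk - bk for xk, bk in zip(x, b)]


iterates = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], 200)
print(iterates[-1], f(iterates[-1]))
```

Here $L = 1$ and the simplex diameter is $d_C = \sqrt{2}$, so the Frank-Wolfe bound $\frac{2L d_C^2}{t+2}$ becomes $\frac{4}{t+2}$ for $t \geq 1$; every iterate stays feasible because it is a convex combination of feasible points.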

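For a non-convex problem:

$$\begin{aligned} \text{minimize} \quad & -x^\top Q x \\ \text{subject to} \quad & ∥x∥_2 \leq 1 \end{aligned}$$

with symmetric $Q$, we have $\nabla f(x^t) = -2Qx^t$, so

$$y^t = \arg\min_{x: ∥x∥_2 \leq 1} ⟨\nabla f(x^t), x⟩ = -\frac{\nabla f(x^t)}{∥\nabla f(x^t)∥_2} = \frac{Qx^t}{∥Qx^t∥_2}, \qquad x^{t+1} = (1 - η_t) x^t + η_t \frac{Qx^t}{∥Qx^t∥_2}$$

With $η_t = 1$, this reduces to the power method $x^{t+1} = \frac{Qx^t}{∥Qx^t∥_2}$.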
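A sketch of this iteration with the assumed choice $η_t = 1$ (so it coincides with the power method), for a hypothetical symmetric positive semidefinite $Q$. The Rayleigh quotient $x^\top Q x$ of the final iterate should approach $λ_1(Q) = 3 + \sqrt{2}$; the function name is illustrative.

```python
import math


def fw_unit_ball_quadratic(Q, x0, num_iters, eta=1.0):
    """Frank-Wolfe for min -x^T Q x over ||x||_2 <= 1 (Q symmetric).

    The linear minimization step gives y^t = Q x^t / ||Q x^t||_2; with eta = 1 this is the power method.
    """
    x = list(x0)
    for _ in range(num_iters):
        qx = [sum(Q[i][j] * x[j] for j in range(len(x))) for i in range(len(Q))]
        norm = math.sqrt(sum(v * v for v in qx))
        y = [v / norm for v in qx]
        x = [(1.0 - eta) * xi + eta * yi for xi, yi in zip(x, y)]
    return x


Q = [[4.0, 1.0], [1.0, 2.0]]   # hypothetical symmetric PSD matrix
x = fw_unit_ball_quadratic(Q, [0.0, 1.0], 100)
rayleigh = sum(x[i] * Q[i][j] * x[j] for i in range(2) for j in range(2))
print(x, rayleigh)
```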

Convergence

Theorem

Suppose $f$ is convex and $L$-smooth, and $η_t = \frac{2}{t+2}$. Then
$f(x^t) - f(x^*) \leq \frac{2L d_C^2}{t+2}$
where $d_C = \sup_{x, y \in C} ∥x - y∥_2$ is the diameter of $C$.

Hence, for a compact constraint set, Frank-Wolfe reaches $ε$-accuracy within $O(\frac{1}{ε})$ iterations.

Example

A set $C$ is $μ$-strongly convex if, with $B(a, r) := \{y \mid ∥y - a∥_2 \leq r\}$, for all $λ \in [0, 1]$ and all $x, z \in C$:
$B\left(λx + (1 - λ)z, \frac{μ}{2} λ(1 - λ) ∥x - z∥_2^2\right) \subseteq C$

Theorem

Suppose $f$ is convex and $L$-smooth, $C$ is $μ$-strongly convex, and the gradient is bounded away from zero on $C$: there is a constant $c > 0$ with $c \leq ∥\nabla f(x)∥_2$ for all $x \in C$.

Projected Gradient Method

Idea: after each gradient step, map (project) a point that falls outside the feasible set back onto the set.

Definition

Euclidean projection (a quadratic minimization):
$P_C(x) := \arg\min_{z \in C} ∥x - z∥_2^2$

Iteration:

$$x^{t+1} = P_C(x^t - η_t \nabla f(x^t))$$
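A minimal projected gradient sketch, assuming the box $C = [0, 1]^3$ (whose Euclidean projection is coordinate-wise clipping) and the hypothetical separable objective $f(x) = \frac{1}{2}\sum_i w_i (x_i - b_i)^2$ with step size $\frac{1}{L} = \frac{1}{4}$. The helper names are illustrative.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n: clip every coordinate."""
    return [min(max(xi, lo), hi) for xi in x]


def projected_gradient(grad, project, x0, eta, num_iters):
    """Projected gradient method: x^{t+1} = P_C(x^t - eta * grad f(x^t))."""
    x = project(list(x0))
    for _ in range(num_iters):
        g = grad(x)
        x = project([xi - eta * gi for xi, gi in zip(x, g)])
    return x


w = [1.0, 4.0, 2.0]     # hypothetical weights: f(x) = 1/2 sum_i w_i (x_i - b_i)^2, so L = max(w) = 4
b = [2.0, -1.0, 0.5]    # unconstrained minimizer; two coordinates lie outside [0, 1]


def grad(x):
    return [wi * (xi - bi) for wi, xi, bi in zip(w, x, b)]


x = projected_gradient(grad, lambda v: project_box(v, 0.0, 1.0),
                       [0.5, 0.5, 0.5], eta=0.25, num_iters=200)
print(x)                # [1.0, 0.0, 0.5]
```

Since $f$ is separable and the box is a product of intervals, the constrained minimizer is simply $b$ clipped to $[0, 1]$, which makes the result easy to verify.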

Theorem

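If $C$ is closed and convex, then
$(x - P_C(x))^{⊤} (z - P_C(x)) \leq 0, \forall z \in C$

Applying this with $x = x^t - η_t \nabla f(x^t)$ (so $P_C(x) = x^{t+1}$) and $z = x^t \in C$ gives $η_t \nabla f(x^t)^\top (x^t - x^{t+1}) \geq ∥x^t - x^{t+1}∥_2^2 \geq 0$, i.e., $-\nabla f(x^t)^\top (x^{t+1} - x^t) \geq 0$: the step $x^{t+1} - x^t$ is positively correlated with the steepest descent direction $-\nabla f(x^t)$.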

Strongly Convex

Theorem

Suppose $x^* \in \operatorname{int}(C)$ and $f$ is $μ$-strongly convex and $L$-smooth. With $η_t = \frac{2}{μ + L}$ and $κ = \frac{L}{μ}$,
$∥x^t - x^*∥_2 \leq \left(\frac{κ - 1}{κ + 1}\right)^t ∥x^0 - x^*∥_2$

For other cases, see Smooth problem.

Non-differentiable Problems

(Projected) Sub-gradient Method

Definition

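A vector $g$ is a sub-gradient of $f$ at $x$ if and only if
$f(z) \geq f(x) + g^\top (z - x), \forall z$
The set of all sub-gradients of $f$ at $x$ is the subdifferential $\partial f(x)$.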

$$x^{t+1} = P_C(x^t - η_t g^t)$$

where $g^t$ is any sub-gradient of $f$ at $x^t$.

Theorem

scaling: $\partial (a f)(x) = a\, \partial f(x), \ \forall a > 0$
summation: $\partial (f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x)$
affine transformation: if $h(x) = f(Ax + b)$, then $\partial h(x) = A^\top \partial f(Ax + b)$
chain rule: $\partial (g \circ f)(x) = g'(f(x))\, \partial f(x)$

composition: for $f(x) = h(f_1(x), \dots, f_m(x))$, let $q = \nabla h(y) \mid_{y = [f_1(x), \dots, f_m(x)]}$ and $g_i \in \partial f_i(x)$; then $q_1 g_1 + \dots + q_m g_m \in \partial f(x)$

pointwise maximum: if $f(x) = \max_{1 \leq i \leq k} f_i(x)$, then $\partial f(x) = \operatorname{conv}\left\{\bigcup \{\partial f_i(x) \mid f_i(x) = f(x)\}\right\}$, where $\operatorname{conv}$ denotes the convex hull of the subdifferentials of all active functions

pointwise supremum: for $f(x) = \sup_{α \in F} f_α(x)$, $\partial f(x) = \operatorname{closure}\left(\operatorname{conv}\left\{\bigcup \{\partial f_α(x) \mid f_α(x) = f(x)\}\right\}\right)$; for the definition of closure, see the Affine Set section

Example

For a norm $f(x) = ∥x∥$, every $g$ with $∥g∥_* \leq 1$ satisfies
$g \in \partial f(0)$
where $∥\cdot∥_*$ is the dual norm of $∥\cdot∥$: $∥x∥_* = \sup_{z: ∥z∥ \leq 1} ⟨z, x⟩$

Example

l1-norm:
$f(x) = ∥x∥_1 = \sum_{i=1}^{n} |x_i|$
Since, for $f_i(x) = |x_i|$,
$\partial f_i(x) = \begin{cases} \operatorname{sgn}(x_i)\, e_i & \text{if } x_i \neq 0 \\ [-1, 1] \cdot e_i & \text{if } x_i = 0 \end{cases}$
we have
$\sum_{i: x_i \neq 0} \operatorname{sgn}(x_i)\, e_i \in \partial f(x)$
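A sketch of this sub-gradient choice (using 0 at zero coordinates, one valid element of $[-1, 1]$), with a randomized check of the sub-gradient inequality $f(z) \geq f(x) + g^\top (z - x)$; the point $x$, the sampling range, and the helper names are arbitrary illustrations.

```python
import random


def l1_subgradient(x):
    """A sub-gradient of ||x||_1: sgn(x_i) where x_i != 0, and 0 (an element of [-1, 1]) where x_i == 0."""
    return [(1.0 if xi > 0 else -1.0) if xi != 0 else 0.0 for xi in x]


def l1(x):
    return sum(abs(xi) for xi in x)


x = [1.5, 0.0, -2.0]
g = l1_subgradient(x)            # [1.0, 0.0, -1.0]
random.seed(0)
for _ in range(1000):
    z = [random.uniform(-5.0, 5.0) for _ in x]
    # Sub-gradient inequality: f(z) >= f(x) + g^T (z - x)
    assert l1(z) >= l1(x) + sum(gi * (zi - xi) for gi, zi, xi in zip(g, z, x)) - 1e-12
```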

Fundamental inequality for projected sub-gradient methods

majorization-minimization

Find a function that majorizes $∥x^{t+1} - x^*∥_2^2$, then minimize that function.

Lemma

The projected sub-gradient update satisfies:
$$∥x^{t+1} - x^*∥_2^2 \leq \underbrace{\underbrace{∥x^t - x^*∥_2^2}_{\text{fixed}} - 2η_t (f(x^t) - f^{\text{opt}}) + η_t^2 ∥g^t∥_2^2}_{\text{majorizing function}}$$

Polyak’s step size rule

Minimizing the majorizing function over $η_t$ gives the recommended step size $η_t = \frac{f(x^t) - f^{\text{opt}}}{∥g^t∥_2^2}$, with error reduction:

$$∥x^{t+1} - x^*∥_2^2 \leq ∥x^t - x^*∥_2^2 - \frac{(f(x^t) - f^{\text{opt}})^2}{∥g^t∥_2^2}$$

This rule is useful when $f^{\text{opt}}$ is known.
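A minimal sketch of the sub-gradient method with Polyak's step size, assuming the hypothetical objective $f(x) = ∥x - b∥_1$ on $C = \mathbb{R}^n$, so $P_C$ is the identity and $f^{\text{opt}} = 0$ is known. It records $∥x^t - x^*∥_2^2$, which the error-reduction inequality says can never increase; the function names are illustrative.

```python
def l1_subgradient(v):
    """A sub-gradient of ||v||_1: sgn(v_i), choosing 0 where v_i == 0."""
    return [(1.0 if vi > 0 else -1.0) if vi != 0 else 0.0 for vi in v]


def polyak_subgradient(b, x0, f_opt=0.0, max_iter=100):
    """Sub-gradient method with Polyak's step for f(x) = ||x - b||_1 on C = R^n (P_C is the identity)."""
    def f(x):
        return sum(abs(xi - bi) for xi, bi in zip(x, b))

    x = list(x0)
    dists = [sum((xi - bi) ** 2 for xi, bi in zip(x, b))]  # ||x^t - x*||_2^2 with x* = b
    f_best = f(x)
    for _ in range(max_iter):
        gap = f(x) - f_opt
        g = l1_subgradient([xi - bi for xi, bi in zip(x, b)])  # affine rule: sgn(x - b)
        g_sq = sum(gi * gi for gi in g)
        if gap <= 0.0 or g_sq == 0.0:        # optimal: stop before dividing by zero
            break
        eta = gap / g_sq                      # Polyak's step size
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        dists.append(sum((xi - bi) ** 2 for xi, bi in zip(x, b)))
        f_best = min(f_best, f(x))
    return x, f_best, dists


x, f_best, dists = polyak_subgradient(b=[1.0, -2.0, 3.0, 0.5], x0=[0.0, 0.0, 0.0, 0.0])
print(x, f_best)
```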

The convergence rate is sublinear:

Theorem

Suppose $f$ is convex and $L_f$-Lipschitz continuous. Then the projected sub-gradient method with Polyak's step size satisfies
$f^{\text{best}, t} - f^{\text{opt}} \leq \frac{L_f ∥x^0 - x^*∥_2}{\sqrt{t + 1}}$, where $f^{\text{best}, t} = \min_{0 \leq i \leq t} f(x^i)$

Theorem

Suppose $f$ is convex and $L_f$-Lipschitz continuous. Then the projected sub-gradient method with a general step size (not Polyak's) satisfies
$f^{\text{best}, t} - f^{\text{opt}} \leq \frac{∥x^0 - x^*∥_2^2 + L_f^2 \sum_{i=0}^{t} η_i^2}{2 \sum_{i=0}^{t} η_i}$

summary:

Convex-concave saddle point problems
