This week, I read the paper by Delage and Ye (2010). The following are my reading notes.
1 Background
$$ \mathop{\text{minimize}}\limits_{\pmb{x} \in \mathscr{X}} \ h(\pmb{x}, \pmb{\xi}) \tag{1} $$ in which $\mathscr{X}$ is a convex set of feasible solutions and $h(\pmb{x}, \pmb{\xi})$ is a cost function, convex in $\pmb{x}$, that depends on some vector of parameters $\pmb{\xi}$. In practice, it is often the case that, at the time of optimization, the parameters have not yet been fully resolved.
If one chooses to represent the uncertainty of $\pmb{\xi}$ through a distribution $F$, one can instead resort to minimizing the expected cost. This leads to solving a stochastic program, $$ (\text{SP}) \quad \mathop{\text{minimize}} \limits_{\pmb{x} \in \mathscr{X}} \ \mathbb{E}[h(\pmb{x}, \pmb{\xi})] \tag{2} $$
in which the expectation is taken with respect to the random parameters $\pmb{\xi} \in \mathbb{R}^m$, given that they follow the probability distribution $F$. Unfortunately, although the SP is a convex optimization problem, solving it often requires Monte Carlo approximation, which can be computationally challenging. An even more fundamental difficulty is the need to commit to a distribution $F$ given only limited information about the stochastic parameters.
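To make the Monte Carlo route concrete, below is a minimal sample-average-approximation sketch in Python. The cost function, dimensions, distribution, and feasible set are all toy choices of mine, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical convex cost: h(x, xi) = max(xi @ x, 0.5 * xi @ x),
# a pointwise max of two linear functions, hence convex in x.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean=np.ones(3), cov=np.eye(3), size=10_000)

def saa_objective(x):
    # Empirical mean over the Monte Carlo samples approximates E[h(x, xi)]
    s = samples @ x
    return np.mean(np.maximum(s, 0.5 * s))

# Toy feasible set X: the probability simplex
cons = {"type": "eq", "fun": lambda x: np.sum(x) - 1.0}
res = minimize(saa_objective, x0=np.full(3, 1 / 3),
               bounds=[(0, None)] * 3, constraints=cons)
print(res.x, res.fun)
```

Note that the quality of the SAA solution hinges entirely on having committed to a sampling distribution $F$, which is exactly the difficulty the robust formulation below avoids.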
To address these issues, a robust formulation for the SP was proposed by Scarf (1958). In this model, the true distribution $F$ is only assumed to belong to a set of probability distributions $\mathscr{D}$, and the objective function is reformulated with respect to the worst-case expected cost over the choice of a distribution in this set. This leads to solving the distributionally robust stochastic program, $$ (\text{DRSP}) \quad \mathop{\text{minimize}} \limits_{\pmb{x} \in \mathscr{X}} \left(\max_{F \in \mathscr{D}} \mathbb{E}_{F} [h(\pmb{x}, \pmb{\xi})] \right) \tag{3} $$
in which $\mathbb{E}_F[\cdot]$ is the expectation taken with respect to the random vector $\pmb{\xi}$, given that it follows the probability distribution $F$.
2 Motivation
Since the information about the distribution $F$ is limited, it might instead be safer to rely on estimates of the mean $\pmb{\mu}_0$ and covariance matrix $\pmb{\Sigma}_0$ of the random vector.
However, one is rarely entirely confident in these estimates. Hence, the authors proposed two hyper-parameters, $\gamma_1$ and $\gamma_2$, to quantify one's confidence in $\pmb{\mu}_0$ and $\pmb{\Sigma}_0$, respectively.
- For the mean of $\pmb{\xi}$, they assumed that it lies in an ellipsoid of size $\gamma_1$ centered at the estimate $\pmb{\mu}_0$, $$ (\mathbb{E}[\pmb{\xi}] - \pmb{\mu}_0)^\top \pmb{\Sigma}_0^{-1} (\mathbb{E}[\pmb{\xi}] - \pmb{\mu}_0) \leq \gamma_1 \tag{4} $$
- For the covariance of $\pmb{\xi}$, they bounded the centered second-moment matrix from above, $$ \mathbb{E}[(\pmb{\xi} - \pmb{\mu}_0) (\pmb{\xi} - \pmb{\mu}_0)^\top] \preceq \gamma_2 \pmb{\Sigma}_0 \tag{5} $$
Altogether, the distributional set is formulated as, $$ \mathscr{D}_1 (\mathscr{S}, \pmb{\mu}_0, \pmb{\Sigma}_0, \gamma_1, \gamma_2) = \left\lbrace F \in \mathscr{M} \ \left| \ \begin{aligned} & \mathbb{P}(\pmb{\xi} \in \mathscr{S}) = 1 \newline & (\mathbb{E}[\pmb{\xi}] - \pmb{\mu}_0)^\top \pmb{\Sigma}_0^{-1} (\mathbb{E}[\pmb{\xi}] - \pmb{\mu}_0) \leq \gamma_1 \newline & \mathbb{E}[(\pmb{\xi} - \pmb{\mu}_0) (\pmb{\xi} - \pmb{\mu}_0)^\top] \preceq \gamma_2 \pmb{\Sigma}_0 \end{aligned} \right. \right\rbrace \tag{6} $$ in which $\mathscr{M}$ is the set of probability measures and $\mathscr{S}$ is any closed convex set known to contain the support of $F$.
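As a quick sanity check on the two moment conditions in (6), the sketch below empirically tests constraints (4) and (5) on samples drawn from a candidate distribution (the support condition $\mathbb{P}(\pmb{\xi} \in \mathscr{S}) = 1$ is not checked). All data and tolerances here are hypothetical.

```python
import numpy as np

def moments_in_D1(samples, mu0, Sigma0, gamma1, gamma2):
    """Empirically check constraints (4) and (5) for samples from a candidate F."""
    mu_hat = samples.mean(axis=0)
    # Constraint (4): the mean lies in an ellipsoid of size gamma1 around mu0
    d = mu_hat - mu0
    mean_ok = d @ np.linalg.solve(Sigma0, d) <= gamma1
    # Constraint (5): gamma2 * Sigma0 - E[(xi - mu0)(xi - mu0)^T] must be PSD,
    # tested via its smallest eigenvalue (with a small tolerance)
    centered = samples - mu0
    second_moment = centered.T @ centered / len(samples)
    cov_ok = np.linalg.eigvalsh(gamma2 * Sigma0 - second_moment).min() >= -1e-9
    return mean_ok and cov_ok

rng = np.random.default_rng(1)
mu0, Sigma0 = np.zeros(2), np.eye(2)
candidate = rng.multivariate_normal(mu0, 0.8 * Sigma0, size=50_000)
print(moments_in_D1(candidate, mu0, Sigma0, gamma1=0.1, gamma2=1.2))  # True
```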
3 Distributionally Robust Stochastic Optimization
Consider the following DRSP model under the distributional set $\mathscr{D}_1$,
$$ (\text{DRSP}) \quad \mathop{\text{minimize}} \limits_{\pmb{x} \in \mathscr{X}} \left( \max_{F \in \mathscr{D}_1} \mathbb{E}_{F} [h(\pmb{x}, \pmb{\xi})] \right) \tag{7} $$ The authors analyzed the computational complexity of the inner moment problem and presented a tractable solution method for the whole problem.
3.1 Complexity of the Inner Moment Problem
Let $\Psi(\pmb{x}, \gamma_1, \gamma_2)$ be the optimal value of the moment problem, $$ \mathop{\text{maximize}} \limits_{F \in \mathscr{D}_1} \quad \mathbb{E}_F[h(\pmb{x}, \pmb{\xi})] \tag{8} $$ It also can be described as the conic linear problem, $$ \begin{aligned} \mathop{\text{maximize}}\limits_{F} \quad & \int_{\mathscr{S}} h(\pmb{x}, \pmb{\xi}) \, dF(\pmb{\xi}) \newline \text{subject to} \quad & \left\lbrace \begin{aligned} & \int_{\mathscr{S}} dF(\pmb{\xi}) = 1, \newline & \int_{\mathscr{S}} (\pmb{\xi} - \pmb{\mu}_0) (\pmb{\xi} - \pmb{\mu}_0)^\top dF(\pmb{\xi}) \preceq \gamma_2 \pmb{\Sigma}_0, \newline & \int_{\mathscr{S}} \begin{bmatrix} \pmb{\Sigma}_0 & (\pmb{\xi} - \pmb{\mu}_0) \newline (\pmb{\xi} - \pmb{\mu}_0)^\top & \gamma_1 \end{bmatrix} dF(\pmb{\xi}) \succeq 0, \newline & F \in \mathscr{M} \end{aligned} \right. \end{aligned} \tag{9} $$
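Note that the last linear matrix inequality in (9) is just constraint (4) in disguise: by linearity of the integral it reduces to $$ \begin{bmatrix} \pmb{\Sigma}_0 & \mathbb{E}[\pmb{\xi}] - \pmb{\mu}_0 \newline (\mathbb{E}[\pmb{\xi}] - \pmb{\mu}_0)^\top & \gamma_1 \end{bmatrix} \succeq 0, $$ which, by the Schur complement (using $\pmb{\Sigma}_0 \succ 0$), is equivalent to $(\mathbb{E}[\pmb{\xi}] - \pmb{\mu}_0)^\top \pmb{\Sigma}_0^{-1} (\mathbb{E}[\pmb{\xi}] - \pmb{\mu}_0) \leq \gamma_1$.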
Theorem 1. For a fixed $\pmb{x} \in \mathbb{R}^n$, suppose that $\gamma_1 \geq 0, \gamma_2 \geq 1, \pmb{\Sigma}_0 \succ 0$, and that $h(\pmb{x}, \pmb{\xi})$ is F-integrable for all $F \in \mathscr{D}_1$. Then $\Psi(\pmb{x}, \gamma_1, \gamma_2)$ must be equal to the optimal value of the problem,
$$ \begin{aligned} \mathop{\text{minimize}}\limits_{\mathbf{Q}, \mathbf{q}, r, t} \quad & r + t \newline \text{subject to} \quad & \left\lbrace \begin{aligned} & r \geq h(\pmb{x}, \pmb{\xi}) - \pmb{\xi}^\top \mathbf{Q} \pmb{\xi} - \pmb{\xi}^\top \mathbf{q}, \qquad \forall \pmb{\xi} \in \mathscr{S} \newline & t \geq (\gamma_2 \pmb{\Sigma}_0 + \pmb{\mu}_0 \pmb{\mu}_0^\top ) \bullet \mathbf{Q} + \pmb{\mu}_0^\top \mathbf{q} + \sqrt{\gamma_1} \lVert \pmb{\Sigma}_0^{1 / 2} (\mathbf{q} + 2 \mathbf{Q}\pmb{\mu}_0) \rVert, \newline & \mathbf{Q} \succeq 0 \end{aligned} \right. \end{aligned} \tag{10} $$
in which $(\pmb{A} \bullet \pmb{B}) $ refers to the Frobenius inner product between matrices, $\mathbf{Q} \in \mathbb{R}^{m \times m}$ is a symmetric matrix, the vector $\mathbf{q} \in \mathbb{R}^m$, and $r, t \in \mathbb{R}$.
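For a concrete feel of problem (10), here is a small cvxpy sketch under two simplifying assumptions of my own: the support is taken as $\mathscr{S} = \mathbb{R}^m$ (sacrificing the compactness of Assumption 1 for simplicity), and, for the fixed $\pmb{x}$, $h(\pmb{x}, \pmb{\xi}) = \max_k (\pmb{a}_k^\top \pmb{\xi} + b_k)$ is piecewise linear in $\pmb{\xi}$. Each semi-infinite constraint then asks that the quadratic $\pmb{\xi}^\top \mathbf{Q} \pmb{\xi} + (\mathbf{q} - \pmb{a}_k)^\top \pmb{\xi} + (r - b_k)$ be nonnegative for all $\pmb{\xi}$, which is equivalent to a linear matrix inequality.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

m, K = 3, 4
rng = np.random.default_rng(2)
A = rng.normal(size=(K, m))      # rows a_k, so h(x, xi) = max_k A[k] @ xi + b[k]
b = rng.normal(size=K)           # (x is fixed and absorbed into A and b)
mu0, Sigma0 = np.zeros(m), np.eye(m)
gamma1, gamma2 = 0.5, 1.5

Q = cp.Variable((m, m), PSD=True)
q = cp.Variable(m)
r, t = cp.Variable(), cp.Variable()

constraints = []
for k in range(K):
    # LMI form of: r >= a_k @ xi + b_k - xi' Q xi - xi' q  for all xi in R^m
    c = cp.reshape((q - A[k]) / 2, (m, 1))
    constraints.append(cp.bmat([[Q, c],
                                [c.T, cp.reshape(r - b[k], (1, 1))]]) >> 0)
S_half = np.real(sqrtm(Sigma0))  # Sigma0^{1/2}
constraints.append(              # the t-constraint, copied from (10)
    t >= cp.trace((gamma2 * Sigma0 + np.outer(mu0, mu0)) @ Q) + mu0 @ q
         + np.sqrt(gamma1) * cp.norm(S_half @ (q + 2 * Q @ mu0))
)

prob = cp.Problem(cp.Minimize(r + t), constraints)
prob.solve(solver=cp.SCS)
print("Psi(x, gamma1, gamma2) ~", prob.value)
```

The worst-case expectation comes out of an off-the-shelf SDP solver here only because my simplifications collapse the semi-infinite constraint; for a general convex compact $\mathscr{S}$ the paper instead relies on the oracle-based machinery of Lemma 1 below.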
Lemma 1. (Grötschel et al. (1981)) Consider a convex optimization problem of the form, $$ \mathop{\text{minimize}} \limits_{\pmb{z} \in \mathscr{Z}} \ \pmb{c}^\top \pmb{z} $$ with linear objective and convex feasible set $\mathscr{Z}$. Given that the set of optimal solutions is nonempty, the problem can be solved to any precision $\epsilon$ in time polynomial in $\log(1 / \epsilon)$ and in the size of the problem by using the ellipsoid method if and only if $\mathscr{Z}$ satisfies the following two conditions (a minimal sketch of the method follows the list):
- for any $\bar{\pmb{z}}$, it can be verified whether $\bar{\pmb{z}} \in \mathscr{Z}$ or not in time polynomial in the dimension of $\pmb{z}$;
- for any infeasible $\bar{\pmb{z}}$, a hyperplane that separates $\bar{\pmb{z}}$ from the feasible set $\mathscr{Z}$ can be generated in time polynomial in the dimension of $\pmb{z}$.
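The two conditions are exactly what the ellipsoid method consumes at each step. Below is a bare-bones central-cut ellipsoid iteration in Python (the textbook update, not anything specific to this paper); the feasibility oracle, a unit ball, and all data are toy choices of mine.

```python
import numpy as np

def ellipsoid_min(c, oracle, z0, P0, iters=500):
    """Minimize c @ z over a convex set given only a separation oracle.

    oracle(z) returns None if z is feasible, otherwise a vector g with
    g @ (z_feasible - z) <= 0 for every feasible point (condition 2 above).
    """
    n = len(c)
    z, P = z0.copy(), P0.copy()
    best_z, best_val = None, np.inf
    for _ in range(iters):
        g = oracle(z)                       # condition 1: membership test
        if g is None:                       # feasible center: cut on the objective
            if c @ z < best_val:
                best_z, best_val = z.copy(), c @ z
            g = c
        # Textbook central-cut update for the half-space g @ (z' - z) <= 0
        gt = g / np.sqrt(g @ P @ g)
        z = z - (P @ gt) / (n + 1)
        P = n**2 / (n**2 - 1) * (P - 2.0 / (n + 1) * np.outer(P @ gt, P @ gt))
    return best_z, best_val

# Toy run: minimize c @ z over the unit ball; optimum is -c / ||c||
def ball_oracle(z):
    return z if z @ z > 1 else None         # z itself separates when infeasible

c = np.array([1.0, -2.0])
z_opt, val = ellipsoid_min(c, ball_oracle, z0=np.zeros(2), P0=100 * np.eye(2))
print(z_opt, val)                           # expect value close to -sqrt(5)
```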
Assumption 1. The support set $\mathscr{S} \subset \mathbb{R}^m$ is convex and compact, and it is equipped with an oracle that can, for any $\pmb{\xi} \in \mathbb{R}^m$, either confirm that $\pmb{\xi} \in \mathscr{S}$ or provide a hyperplane that separates $\pmb{\xi}$ from $\mathscr{S}$ in time polynomial in $m$.
Assumption 2. The function $h(\pmb{x}, \pmb{\xi})$ has the form $h(\pmb{x}, \pmb{\xi}) = \max_{k \in \left\lbrace 1, …, K\right\rbrace} h_k(\pmb{x}, \pmb{\xi})$ such that for each $k$, $h_k(\pmb{x}, \pmb{\xi})$ is concave in $\pmb{\xi}$. In addition, given a pair $(\pmb{x}, \pmb{\xi})$, it is assumed that one can in polynomial time:
- evaluate the value of $h_k(\pmb{x}, \pmb{\xi})$;
- find a supergradient of $h_k(\pmb{x}, \pmb{\xi})$ in $\pmb{\xi}$.
Furthermore, for any $\pmb{x} \in \mathbb{R}^n, \pmb{q} \in \mathbb{R}^m$, and any positive semidefinite $\mathbf{Q} \in \mathbb{R}^{m \times m}$, the set $\left\lbrace y \in \mathbb{R} \ | \ \exists \pmb{\xi} \in \mathscr{S}, y \leq h(\pmb{x}, \pmb{\xi}) - \pmb{q}^\top \pmb{\xi} - \pmb{\xi}^\top \mathbf{Q} \pmb{\xi} \right\rbrace$ is closed.
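As a toy instance of Assumption 2, take each piece linear in $\pmb{\xi}$ (a linear function is trivially concave): evaluating a piece and producing a supergradient is then immediate, as the hypothetical sketch below shows. All data are made up.

```python
import numpy as np

# Toy instance of Assumption 2: h(x, xi) = max_k h_k(x, xi) with
# h_k(x, xi) = a_k @ xi + b_k (linear, hence concave, in xi).
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # rows a_k (x already fixed)
b = np.array([0.0, 0.5, 1.0])                         # offsets b_k

def piece_value_and_supergradient(k, xi):
    # For a linear piece, the supergradient in xi is simply a_k
    return A[k] @ xi + b[k], A[k]

def h(xi):
    # h itself is the pointwise maximum over the K pieces
    return (A @ xi + b).max()

xi = np.array([0.3, -0.2])
print(h(xi), piece_value_and_supergradient(1, xi))
```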
Theorem 2. Given that $\mathscr{S}$ satisfies Assumption 1 and that $h(\pmb{x}, \pmb{\xi})$ satisfies Assumption 2 (so that the conditions of Lemma 1 hold), problem (10) is a convex optimization problem whose optimal value is finite and equal to $\Psi(\pmb{x}, \gamma_1, \gamma_2)$. Moreover, problem (10) can be solved to any precision $\epsilon$ in time polynomial in $\log (1 / \epsilon)$ and the size of the problem.
3.2 Complexity of the DRSP
$$ (\text{DRSP}) \quad \mathop{\text{minimize}} \limits_{\pmb{x} \in \mathscr{X}} \left(\max_{F \in \mathscr{D}_1} \mathbb{E}_{F} [h(\pmb{x}, \pmb{\xi})] \right) \tag{11} $$
Assumption 3. The set $\mathscr{X} \subset \mathbb{R}^n$ is convex and compact, and it is equipped with an oracle that can, for any $\pmb{x} \in \mathbb{R}^n$, either confirm that $\pmb{x} \in \mathscr{X}$ or provide a hyperplane that separates $\pmb{x}$ from $\mathscr{X}$ in time polynomial in $n$.
Assumption 4. The function $h(\pmb{x}, \pmb{\xi})$ is convex in $\pmb{x}$. In addition, it is assumed that one can find in polynomial time a subgradient of $h(\pmb{x}, \pmb{\xi})$ in $\pmb{x}$.
Theorem 3. Given that Assumptions 1, 2, 3, and 4 hold, the DRSP model presented in problem (11) can be solved to any precision $\epsilon$ in time polynomial in $\log(1 / \epsilon)$ and the sizes of $\pmb{x}$ and $\pmb{\xi}$.
The proof idea is to transform the DRSP into the following convex optimization problem, $$ \begin{aligned} \mathop{\text{minimize}}\limits_{\pmb{x}, \mathbf{Q}, \mathbf{q}, r, t} \quad & r + t \newline \text{subject to} \quad & \left\lbrace \begin{aligned} & r \geq h_k(\pmb{x}, \pmb{\xi}) - \pmb{\xi}^\top \mathbf{Q} \pmb{\xi} - \pmb{\xi}^\top \mathbf{q},\quad \forall \pmb{\xi} \in \mathscr{S}, k \in \left\lbrace 1, …, K\right\rbrace, \newline & t \geq (\gamma_2 \pmb{\Sigma}_0 + \pmb{\mu}_0 \pmb{\mu}_0^\top ) \bullet \mathbf{Q} + \pmb{\mu}_0^\top \mathbf{q} + \sqrt{\gamma_1} \lVert \pmb{\Sigma}_0^{1 / 2} (\mathbf{q} + 2 \mathbf{Q}\pmb{\mu}_0) \rVert, \newline & \mathbf{Q} \succeq 0, \newline & \pmb{x} \in \mathscr{X} \end{aligned} \right. \end{aligned} $$ which can be solved to any precision $\epsilon$ in time polynomial in $\log (1 / \epsilon)$ with the ellipsoid method under these assumptions.
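Continuing the earlier cvxpy sketch, the program below also optimizes over $\pmb{x}$: if each piece depends affinely on both arguments, say $h_k(\pmb{x}, \pmb{\xi}) = (\pmb{A}_k \pmb{x})^\top \pmb{\xi} + \pmb{b}_k^\top \pmb{x}$ (a toy form of mine, convex in $\pmb{x}$ and concave in $\pmb{\xi}$), every constraint stays a linear matrix inequality in $(\pmb{x}, \mathbf{Q}, \mathbf{q}, r, t)$; $\mathscr{S} = \mathbb{R}^m$ is again assumed.

```python
import numpy as np
import cvxpy as cp

n, m, K = 2, 3, 3
rng = np.random.default_rng(3)
A = rng.normal(size=(K, m, n))   # h_k(x, xi) = (A[k] @ x) @ xi + b[k] @ x
b = rng.normal(size=(K, n))
mu0, Sigma0 = np.zeros(m), np.eye(m)
gamma1, gamma2 = 0.5, 1.5

x = cp.Variable(n)
Q = cp.Variable((m, m), PSD=True)
q = cp.Variable(m)
r, t = cp.Variable(), cp.Variable()

constraints = [cp.sum(x) == 1, x >= 0]     # toy feasible set X: a simplex
for k in range(K):
    # LMI form of: r >= h_k(x, xi) - xi' Q xi - xi' q  for all xi in R^m
    c = cp.reshape((q - A[k] @ x) / 2, (m, 1))
    constraints.append(cp.bmat([[Q, c],
                                [c.T, cp.reshape(r - b[k] @ x, (1, 1))]]) >> 0)
constraints.append(                        # the t-constraint, with Sigma0 = I
    t >= cp.trace((gamma2 * Sigma0 + np.outer(mu0, mu0)) @ Q) + mu0 @ q
         + np.sqrt(gamma1) * cp.norm(q + 2 * Q @ mu0)
)

prob = cp.Problem(cp.Minimize(r + t), constraints)
prob.solve(solver=cp.SCS)
print("worst-case expected cost:", prob.value, "at x =", x.value)
```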
4 Conclusion
- This paper proposed the moment-based distributional set. To the best of my knowledge, it is the first time such a robust stochastic optimization approach was named distributionally robust optimization (DRO); to some extent, it opened the era of DRO.
- One important thing I took from this paper is that not all convex problems are easy to solve; some admit no polynomial-time algorithm.
- The problem class discussed in this paper is rather too general for my purposes. Next week, I am going to read a more practical paper.