© Springer Nature Switzerland AG 2020
S. Dempe, A. Zemkoho (eds.), Bilevel Optimization, Springer Optimization and Its Applications 161, https://doi.org/10.1007/978-3-030-52119-6_7

7. Bilevel Optimization and Variational Analysis

Boris S. Mordukhovich1  
(1)
Department of Mathematics, Wayne State University, Detroit, MI, USA
 
 

Abstract

This chapter presents a self-contained approach of variational analysis and generalized differentiation to deriving necessary optimality conditions in bilevel optimization with Lipschitzian data. We mainly concentrate on optimistic models, although the developed machinery also applies to pessimistic versions. Some open problems are posed and discussed.

Keywords
Bilevel optimization · Variational analysis · Nondifferentiable programming · Generalized differentiation · Lipschitzian functions and mappings

7.1 Introduction

Bilevel optimization has been well recognized as a theoretically very challenging and practically important area of applied mathematics. We refer the reader to the monographs [2, 6, 14, 30, 41], the extensive bibliographies and commentaries therein, as well as to the advanced material included in this book for various approaches, theoretical and numerical results, and a variety of practical applications of bilevel optimization and related topics.

One of the characteristic features of bilevel optimization problems is their intrinsic nonsmoothness, even if their initial data are described by linear functions. This makes it natural to develop an approach of modern variational analysis and generalized differentiation to the study and applications of major models in bilevel optimization. It has been done in numerous publications, which are presented and analyzed in the author’s recent book [30].

The main goal we pursue here is to overview this approach together with the corresponding machinery of variational analysis and to apply it to deriving necessary optimality conditions in optimistic bilevel models with Lipschitzian data, while also commenting on other versions of bilevel optimization and posing open questions. To make this chapter largely self-contained and more accessible for the reader, we present here the basic background from variational analysis and generalized differentiation, which is needed for applications to bilevel optimization. For brevity and simplicity we confine ourselves to problems in finite-dimensional spaces.

The rest of this work is organized as follows. In Sect. 7.2 we recall those constructions of generalized differentiation in variational analysis, which are broadly used in the subsequent text. Section 7.3 presents the fundamental extremal principle that is behind generalized differential calculus and applications to optimization in the geometric approach to variational analysis developed in [29, 30]. Section 7.4 is devoted to deriving—via the extremal principle—the two basic calculus rules, which are particularly useful for applications to optimality conditions. In Sect. 7.5 we establish subdifferential evaluations and efficient conditions that ensure the local Lipschitz continuity of optimal value functions in general problems of parametric optimization. These results are crucial for variational applications to bilevel programming.

To proceed with such applications, we first consider in Sect. 7.6 problems of nondifferentiable programming with Lipschitzian data. Subdifferential necessary optimality conditions for Lipschitzian programs are derived there by using the extremal principle and calculus rules. Section 7.7 contains the formulation of the bilevel optimization problems under consideration and the description of the variational approach to their study. Based on this approach and the subdifferentiation of optimal value functions for lower-level problems, we establish in Sect. 7.8 necessary optimality conditions for Lipschitzian bilevel programs. Further developments in this direction for bilevel optimization problems with Lipschitzian data are presented in Sect. 7.9 by using the subdifferential difference rule based on a certain variational technique. The concluding Sect. 7.10 discusses further perspectives of employing concepts and techniques of variational analysis in bilevel optimization, with formulations of some open questions.

Throughout this chapter we use the standard notation and terminology of variational analysis and generalized differentiation; see, e.g., [29, 30, 40].

7.2 Basic Constructions of Generalized Differentiation

Here we present the basic definitions of generalized normals to sets, coderivatives of set-valued mappings, and subgradients of extended-real-valued functions initiated by the author [26] that are predominantly used in what follows. The reader is referred to the books [29, 30, 40] for more details. Developing a geometric approach to generalized differentiation, we start with normals to sets, then continue with coderivatives of (set-valued and single-valued) mappings, and finally pass to subgradients of extended-real-valued functions.

Given a nonempty set 
$$\Omega \subset \mathbb {R}^n$$
, we always suppose without loss of generality that it is locally closed around the reference point 
$$\bar {x}\in \Omega $$
. For each 
$$x\in \mathbb {R}^n$$
close to 
$$\bar {x}$$
consider its (nonempty) Euclidean projector to Ω defined by

$$\displaystyle \begin{aligned} \Pi(x;\Omega ):=\big\{w\in\Omega \big|\;\|x-w\|=\displaystyle\min_{u\in\Omega }\|x-u\|\big\}. \end{aligned}$$
Then the (basic, limiting, Mordukhovich) normal cone to Ω at 
$$\bar {x}$$
is

$$\displaystyle \begin{aligned} \begin{array}{ll} N(\bar{x};\Omega ):=\Big\{v\in\mathbb{R}^n\Big|&\exists\,x_k\to\bar{x},\;\exists\,w_k\in\Pi(x_k;\Omega ),\;\exists\,\alpha_k\ge 0\\ &\mbox{such that }\;\alpha_k(x_k-w_k)\to v\;\mbox{ as }\;k\to\infty\Big\}. \end{array} \end{aligned} $$
(7.2.1)
The normal cone (7.2.1) is always closed but may be nonconvex even in standard situations, e.g., when Ω is the graph of the simplest nonsmooth convex function |x| at 
$$\bar {x}=(0,0)\in \mathbb {R}^2$$
. Nevertheless, this normal cone and the associated coderivatives of mappings and subdifferentials of functions enjoy comprehensive calculus rules due to variational/extremal principles of variational analysis. Note that 
$$N(\bar {x};\Omega )\neq \{0\}$$
if and only if 
$$\bar {x}$$
is a boundary point of Ω.
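For illustration of this nonconvexity, consider $\Omega:=\mbox{gph}\,|\cdot|=\{(x,|x|)|\;x\in\mathbb{R}\}$ at $\bar{x}=(0,0)$ mentioned above. A direct computation based on (7.2.1), with projections taken at the corner point and along the two smooth branches of the graph, gives us

$$\displaystyle \begin{aligned} N\big((0,0);\Omega \big)=\big\{(v,w)\in\mathbb{R}^2\big|\;w\le-|v|\big\}\cup\big\{(v,w)\in\mathbb{R}^2\big|\;w=|v|\big\}, \end{aligned}$$
where the first set consists of the normals generated at the corner point itself, while the second one collects the limits of normals to the graph at nearby points x ≠ 0; the union is clearly nonconvex.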
There is a useful representation of the normal cone (7.2.1) in terms of convex collections of (pre)normal vectors to Ω at points nearby 
$$\bar {x}$$
. Given x ∈ Ω close to 
$$\bar {x}$$
, the prenormal cone to Ω at x (known also as the regular or Fréchet normal cone) is defined by
$$\displaystyle \begin{aligned} \hat{N}(x;\Omega ):=\Big\{v\in\mathbb{R}^n\Big|\;\displaystyle\limsup_{u\overset{\Omega}{\to}x}\frac{\langle v,u-x\rangle}{\|u-x\|}\le 0\Big\}, \end{aligned} $$
(7.2.2)
where the symbol $u\overset{\Omega}{\to}x$ means that u → x with u ∈ Ω. Then the prenormal cone (7.2.2) is always closed and convex but may collapse to {0} at boundary points of closed sets, which in fact contradicts the very meaning of generalized normals. If Ω is convex, then both normal and prenormal cones reduce to the normal cone of convex analysis. In general we have
$$\displaystyle \begin{aligned} N(\bar{x};\Omega )=\Big\{v\in\mathbb{R}^n\Big|\;\exists\,x_k\overset{\Omega}{\to}\bar{x},\;v_k\to v\;\mbox{ with }\;v_k\in\hat{N}(x_k;\Omega )\Big\}. \end{aligned} $$
(7.2.3)
Note that the limiting representation (7.2.3) keeps holding if the prenormal cone (7.2.2) therein is expanded to its $\varepsilon_k$-enlargements $\hat{N}_{\varepsilon_k}$ as $\varepsilon_k\downarrow 0$, where the latter expansions are defined by replacing 0 with $\varepsilon_k$ on the right-hand side of (7.2.2).
Let 
$$F\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
be a set-valued mapping/multifunction with values 
$$F(x)\subset \mathbb {R}^m$$
and with its domain and graph defined, respectively, by

$$\displaystyle \begin{aligned} \mbox{dom}\, F:=\big\{x\in\mathbb{R}^n\big|\;F(x)\neq \emptyset\big\}\;\mbox{ and }\;\mbox{gph}\, F:=\big\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^m\big|\;y\in F(x)\big\}. \end{aligned}$$
When F is single-valued, we use the standard notation 
$$F\colon \mathbb {R}^n\to \mathbb {R}^m$$
. Assuming that the graph of F is locally closed around 
$$(\bar {x},\bar {y})\in \mbox{gph}\, F$$
, we define the coderivative of F at this point via the normal cone (7.2.1) to the graph of F by

$$\displaystyle \begin{aligned} D^*F(\bar{x},\bar{y})(w):=\big\{v\in\mathbb{R}^n\big|\;(v,-w)\in N\big((\bar{x},\bar{y});\mbox{gph}\, F\big)\big\},\quad  w\in\mathbb{R}^m. \end{aligned} $$
(7.2.4)
Thus 
$$D^*F(\bar {x},\bar {y})\colon \mathbb {R}^m\rightrightarrows \mathbb {R}^n$$
is a set-valued positively homogeneous mapping, which reduces to the adjoint/transposed Jacobian for all single-valued mappings 
$$F\colon \mathbb {R}^n\to \mathbb {R}^m$$
that are smooth around 
$$\bar {x}$$
, where 
$$\bar {y}=F(\bar {x})$$
is dropped in this case in the coderivative notation:

$$\displaystyle \begin{aligned} D^*F(\bar{x})(w)=\big\{\nabla F(\bar{x})^*w\big\}\;\mbox{ for all }\;w\in\mathbb{R}^m. \end{aligned}$$
Besides a full calculus available for the coderivative (7.2.4), this construction plays an important role in variational analysis and its applications since it provides complete characterizations of fundamental well-posedness properties of multifunctions concerning Lipschitzian stability, metric regularity, and linear openness/covering. In this work we deal with the Lipschitz-like (Aubin, pseudo-Lipschitz) property of 
$$F\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
around 
$$(\bar {x},\bar {y})\in \mbox{gph}\, F$$
defined as follows: there exist neighborhoods U of 
$$\bar {x}$$
and V  of 
$$\bar {y}$$
and a constant $\ell\ge 0$ such that

$$\displaystyle \begin{aligned} F(x)\cap V\subset F(u)+\ell\|x-u\|\mathbb{B}\;\mbox{ for all }\;x,u\in U, \end{aligned} $$
(7.2.5)
where 
$$\mathbb {B}$$
stands for the closed unit ball of the space in question. If 
$$V=\mathbb {R}^m$$
in (7.2.5), then it reduces to the classical local Lipschitzian property of F around 
$$\bar {x}$$
. The coderivative characterization of (7.2.5), which is called in [40] the Mordukhovich criterion, tells us that F is Lipschitz-like around 
$$(\bar {x},\bar {y})$$
if and only if we have

$$\displaystyle \begin{aligned} D^*F(\bar{x},\bar{y})(0)=\{0\}. \end{aligned} $$
(7.2.6)
Furthermore, the exact bound (infimum) of all the Lipschitz constants $\ell$ in (7.2.5) is calculated as the norm 
$$\|D^*F(\bar {x},\bar {y})\|$$
of the positively homogeneous coderivative mapping 
$$w\mapsto D^*F(\bar {x},\bar {y})(w)$$
. The reader can find in [28–30, 40] different proofs of this result with numerous applications.
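As a simple illustration of both the coderivative construction (7.2.4) and the criterion (7.2.6), consider the mapping $F\colon\mathbb{R}\rightrightarrows\mathbb{R}$ given by $F(x):=\{y\in\mathbb{R}|\;y\ge|x|\}$, whose graph is the epigraph of |⋅|. Since $N((0,0);\mbox{gph}\, F)=\{(v,w)|\;w\le-|v|\}$, the coderivative definition yields

$$\displaystyle \begin{aligned} D^*F(0,0)(w)=\big\{v\in\mathbb{R}\big|\;|v|\le w\big\}\;\mbox{ if }\;w\ge 0\;\mbox{ and }\;D^*F(0,0)(w)=\emptyset\;\mbox{ if }\;w<0. \end{aligned}$$
In particular, D*F(0, 0)(0) = {0}, and so F is Lipschitz-like around (0, 0) by (7.2.6) with the exact Lipschitz bound ‖D*F(0, 0)‖ = 1.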
Consider finally an extended-real-valued function 
$$\varphi \colon \mathbb {R}^n\to \overline {\mathbb {R}}:=(-\infty ,\infty ]$$
finite at 
$$\bar {x}$$
and lower semicontinuous (l.s.c.) around this point. Denote by

$$\displaystyle \begin{aligned} \mbox{dom}\,\varphi:=\big\{x\in\mathbb{R}^n\big|\;\varphi(x)<\infty\big\}\;\mbox{ and }\;\mbox{epi}\,\varphi:=\big\{(x,\mu)\in\mathbb{R}^n\times\mathbb{R}\big|\;\mu\ge\varphi(x)\big\} \end{aligned}$$
the domain and epigraph of φ, respectively. Given 
$$\bar {x}\in \mbox{dom}\,\varphi $$
and using the normal cone (7.2.1) to the epigraph of φ at 
$$(\bar {x},\varphi (\bar {x}))$$
, we define the two types of the subdifferentials of φ at 
$$\bar {x}$$
: the basic subdifferential and the singular subdifferential by, respectively,

$$\displaystyle \begin{aligned} \partial\varphi(\bar{x}):=\big\{v\in\mathbb{R}^n\big|\;(v,-1)\in N\big((\bar{x},\varphi(\bar{x}));\mbox{epi}\,\varphi\big)\big\}, \end{aligned} $$
(7.2.7)

$$\displaystyle \begin{aligned} \partial^\infty\varphi(\bar{x}):=\big\{v\in\mathbb{R}^n\big|\;(v,0)\in N\big((\bar{x},\varphi(\bar{x}));\mbox{epi}\,\varphi\big)\big\}. \end{aligned} $$
(7.2.8)
The basic subdifferential (7.2.7) reduces to the gradient 
$$\{\nabla \varphi (\bar {x})\}$$
for smooth functions and to the subdifferential of convex analysis if φ is convex. Observe that 
$$\partial \varphi (\bar {x})=D^*E_\varphi (\bar {x},\varphi (\bar {x}))(1)$$
and 
$$\partial ^\infty \varphi (\bar {x})=D^*E_\varphi (\bar {x},\varphi (\bar {x}))(0)$$
via the coderivative (7.2.4) of the epigraphical multifunction 
$$E_\varphi \colon \mathbb {R}^n\rightrightarrows \mathbb {R}$$
defined by 
$$E_\varphi (x):=\{\mu \in \mathbb {R}|\;\mu \ge \varphi (x)\}$$
. Thus the coderivative characterization (7.2.6) of the Lipschitz-like property of multifunctions implies that a lower semicontinuous function φ is locally Lipschitzian around 
$$\bar {x}$$
if and only if

$$\displaystyle \begin{aligned} \partial^\infty\varphi(\bar{x})=\{0\}. \end{aligned} $$
(7.2.9)
Note also that, given any (closed) set 
$$\Omega \subset \mathbb {R}^n$$
with its indicator function $\delta(x;\Omega)=\delta_\Omega(x)$ equal to 0 for x ∈ Ω and to ∞ otherwise, we have that

$$\displaystyle \begin{aligned} \partial\delta (\bar{x};\Omega )=\partial^\infty\delta (\bar{x};\Omega )=N(\bar{x};\Omega )\;\mbox{ whenever }\;\bar{x}\in\Omega . \end{aligned} $$
(7.2.10)
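The following simple computations illustrate the constructions above: for the functions |⋅| and −|⋅| on $\mathbb{R}$ we have

$$\displaystyle \begin{aligned} \partial|\cdot|(0)=[-1,1],\quad \partial(-|\cdot|)(0)=\{-1,1\},\quad \partial^\infty|\cdot|(0)=\partial^\infty(-|\cdot|)(0)=\{0\} \end{aligned}$$
in accordance with the Lipschitz characterization (7.2.9), where the nonconvexity of $\partial(-|\cdot|)(0)$ reflects that of the normal cone (7.2.1). In contrast, for the non-Lipschitzian indicator function of $\Omega=\mathbb{R}_+\subset\mathbb{R}$, formula (7.2.10) gives $\partial\delta(0;\mathbb{R}_+)=\partial^\infty\delta(0;\mathbb{R}_+)=(-\infty,0]\neq\{0\}$.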
Both subdifferentials (7.2.7) and (7.2.8) admit limiting representations in terms of the presubdifferential, or regular subdifferential

$$\displaystyle \begin{aligned} \hat\partial\varphi(x):=\Big\{v\in\mathbb{R}^n\Big|\;\displaystyle\liminf_{u\to x}\frac{\varphi(u)-\varphi(x)-\langle v,u-x\rangle}{\|u-x\|}\ge 0\Big\} \end{aligned} $$
(7.2.11)
of φ at points x close to 
$$\bar {x}$$
. Namely, we have
$$\displaystyle \begin{aligned} \partial\varphi(\bar{x})=\Big\{v\in\mathbb{R}^n\Big|\;\exists\,x_k\overset{\varphi}{\to}\bar{x},\;v_k\to v\;\mbox{ with }\;v_k\in\hat\partial\varphi(x_k)\Big\}, \end{aligned} $$
(7.2.12)
$$\displaystyle \begin{aligned} \partial^\infty\varphi(\bar{x})=\Big\{v\in\mathbb{R}^n\Big|\;\exists\,x_k\overset{\varphi}{\to}\bar{x},\;\lambda_k\downarrow 0,\;v_k\in\hat\partial\varphi(x_k)\;\mbox{ with }\;\lambda_k v_k\to v\Big\}, \end{aligned} $$
(7.2.13)
where the symbol $x\overset{\varphi}{\to}\bar{x}$ indicates that 
$$x\to \bar {x}$$
with 
$$\varphi (x)\to \varphi (\bar {x})$$
. Note that the presubdifferential (7.2.11) is related to the prenormal cone (7.2.2) as in (7.2.7) and is also used in variational analysis under the names of the Fréchet subdifferential and the viscosity subdifferential. Similarly to the case of basic normals in (7.2.3), it is not hard to observe that the subdifferential representations in (7.2.12) and (7.2.13) still hold if the presubdifferential (7.2.11) therein is expanded to its $\varepsilon_k$-enlargements 
$$\hat \partial _{\varepsilon _k}\varphi $$
defined by replacing 0 on the right-hand side of (7.2.11) with $-\varepsilon_k$.

7.3 Extremal Principle in Variational Analysis

In this section we recall, following [22], the notion of locally extremal points for systems of finitely many sets and then derive the fundamental extremal principle, which gives us necessary conditions for extremality of closed set systems in 
$$\mathbb {R}^n$$
.

Definition 7.3.1
Let $\Omega_1,\ldots,\Omega_s$ with s ≥ 2 be nonempty subsets of 
$$\mathbb {R}^n$$
, which are assumed to be locally closed around their common point 
$$\bar {x}$$
. We say that 
$$\bar {x}$$
is a locally extremal point of the set system $\{\Omega_1,\ldots,\Omega_s\}$ if there exist a neighborhood U of 
$$\bar {x}$$
and sequences of vectors 
$$a_{ik}\in \mathbb {R}^n,\ i=1,\ldots ,s$$
, such that $a_{ik}\to 0$ as $k\to\infty$ for all i ∈ {1, …, s} and

$$\displaystyle \begin{aligned} \bigcap_{i=1}^s\big(\Omega _i-a_{ik}\big)\cap U=\emptyset\;\mbox{ whenever }\;k=1,2,\ldots. \end{aligned} $$
(7.3.1)

Geometrically, Definition 7.3.1 says that 
$$\bar {x}$$
is a locally extremal point of the set system { Ω1, …, Ωs} if these sets can be pushed apart from each other locally around 
$$\bar {x}$$
. Observe also that for the case of two sets Ω1, Ω2 containing 
$$\bar {x}$$
the above definition can be equivalently reformulated as follows: there is a neighborhood U of 
$$\bar {x}$$
such that for any ε > 0 there exists a vector 
$$a\in \mathbb {R}^n$$
with ∥a∥≤ ε and ( Ω1 − a) ∩ Ω2 ∩ U = ∅.

It is easy to see that for a closed set Ω and any of its boundary points 
$$\bar {x}$$
, we have that 
$$\bar {x}$$
is a locally extremal point of the set system 
$$\{\Omega ,\{\bar {x}\}\}$$
. Furthermore, the introduced notion of set extremality covers various notions of optimality and equilibria in problems of scalar and vector optimization. In particular, a local minimizer 
$$\bar {x}$$
of the general constrained optimization problem

$$\displaystyle \begin{aligned} \mbox{minimize }\;\varphi(x)\;\mbox{ subject to }\;x\in\Omega \subset\mathbb{R}^n,\end{aligned} $$
where φ is l.s.c. and Ω is closed around 
$$\bar {x}$$
, corresponds to the locally extremal point 
$$(\bar {x},\varphi (\bar {x}))$$
of the sets Ω1 := epi φ and 
$$\Omega _2:=\Omega \times \{\varphi (\bar {x})\}$$
. As we see below, extremal systems naturally arise in deriving calculus rules of generalized differentiation.

Now we are ready to formulate and prove the basic extremal principle of variational analysis for systems of finitely many closed sets in 
$$\mathbb {R}^n$$
by using the normal cone construction (7.2.1).

Theorem 7.3.2
Let 
$$\bar {x}$$
be a locally extremal point of the system { Ω 1, …, Ω s} of nonempty subsets of 
$$\mathbb {R}^n$$
, which are locally closed around 
$$\bar {x}$$
. Then there exist generalized normals 
$$v_i\in N(\bar {x};\Omega _i)$$
for i = 1, …, s, not equal to zero simultaneously, such that we have the generalized Euler equation

$$\displaystyle \begin{aligned} v_1+\ldots+v_s=0. \end{aligned} $$
(7.3.2)

Proof
Using Definition 7.3.1, suppose without loss of generality that 
$$U=\mathbb {R}^n$$
. Taking the sequences {a ik} therein, for each k = 1, 2, … consider the unconstrained optimization problem:

$$\displaystyle \begin{aligned} \mbox{minimize }\;\varphi_k(x):=\Big[\sum_{i=1}^s d^2(x+a_{ik};\Omega _i)\Big]^{1/2}+\|x-\bar{x}\|{}^2,\quad  x\in\mathbb{R}^n, \end{aligned} $$
(7.3.3)
where d(x; Ω) indicates the Euclidean distance between x and Ω. Since $\varphi_k$ is continuous with bounded level sets, we deduce from the classical Weierstrass theorem that there exists an optimal solution $x_k$ to each problem (7.3.3), k = 1, 2, …. It follows from the crucial extremality requirement (7.3.1) in Definition 7.3.1 that

$$\displaystyle \begin{aligned} \gamma_k:=\Big[\sum_{i=1}^sd^2(x_k+a_{ik};\Omega _i)\Big]^{1/2}>0. \end{aligned} $$
(7.3.4)
The optimality of x k in (7.3.3) tells us that

$$\displaystyle \begin{aligned} 0<\gamma_k+\|x_k-\bar{x}\|{}^2=\varphi_k(x_k)\le\varphi_k(\bar{x})=\Big[\sum_{i=1}^s\|a_{ik}\|{}^2\Big]^{1/2}\downarrow 0, \end{aligned}$$
and so $\gamma_k\downarrow 0$ and 
$$x_k\to \bar {x}$$
 as $k\to\infty$. By the closedness of the sets $\Omega_i$, i = 1, …, s, around 
$$\bar {x}$$
, we pick w ik ∈ Π(x k + a ik; Ωi) and for each k form another unconstrained optimization problem:

$$\displaystyle \begin{aligned} \mbox{minimize }\;\psi_k(x):=\Big[\sum_{i=1}^s\|x+a_{ik}-w_{ik}\|{}^2\Big]^{1/2}+\|x-\bar{x}\|{}^2,\quad  x\in\mathbb{R}^n \end{aligned} $$
(7.3.5)
which obviously has the same optimal solution x k. In contrast to φ k in (7.3.3), the function ψ k in (7.3.5) is differentiable at x k due to (7.3.4). Thus applying the Fermat rule in (7.3.5) tells us that

$$\displaystyle \begin{aligned} \nabla\psi_k(x_k)=\sum_{i=1}^s v_{ik}+2(x_k-\bar{x})=0 \end{aligned} $$
(7.3.6)
with v ik := (x k + a ik − w ik)∕γ k, i = 1, …, s, satisfying

$$\displaystyle \begin{aligned} \|v_{1k}\|{}^2+\ldots+\|v_{sk}\|{}^2=1\;\mbox{ for all }\;k=1,2,\ldots. \end{aligned} $$
(7.3.7)
Remembering the compactness of the unit sphere in 
$$\mathbb {R}^n$$
, we get by passing to the limit as $k\to\infty$ in (7.3.6) and (7.3.7) that there exist $v_1,\ldots,v_s$, not equal to zero simultaneously, for which (7.3.2) holds. Finally, it follows directly from the above constructions and the normal cone definition (7.2.1) that 
$$v_i\in N(\bar {x};\Omega _i)$$
for all i = 1, …, s. This completes the proof of the theorem. □
Since for convex sets Ω the normal cone (7.2.1) reduces to the normal cone of convex analysis

$$\displaystyle \begin{aligned} N(\bar{x};\Omega ):=\big\{v\in\mathbb{R}^n\big|\;\langle v,x-\bar{x}\rangle\le 0\;\mbox{ whenever }\;x\in\Omega \big\}, \end{aligned}$$
the extremal principle of Theorem 7.3.2 can be treated as a variational extension of the classical separation theorem to the case of finitely many nonconvex sets in 
$$\mathbb {R}^n$$
.
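To see what the extremal principle provides beyond convex separation, consider the two closed cones

$$\displaystyle \begin{aligned} \Omega_1:=\big\{(x,y)\in\mathbb{R}^2\big|\;y\ge-|x|\big\}\;\mbox{ and }\;\Omega_2:=\big\{(x,y)\in\mathbb{R}^2\big|\;y\le-|x|\big\} \end{aligned}$$
with the common point $\bar{x}=(0,0)$. Shifting $\Omega_1$ upward by (0, ε) makes the intersection $\Omega_1\cap\Omega_2$ empty, so $\bar{x}$ is a locally extremal point of $\{\Omega_1,\Omega_2\}$, while these sets cannot be separated by a hyperplane since no nonzero linear function is nonpositive on the nonconvex cone $\Omega_1$. At the same time, the generalized Euler equation (7.3.2) holds with, e.g., $v_1=(1,-1)\in N(\bar{x};\Omega_1)$ and $v_2=(-1,1)\in N(\bar{x};\Omega_2)$.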

7.4 Fundamental Calculus Rules

Employing the extremal principle, we derive here two fundamental rules of generalized differential calculus, which are broadly used in this chapter and from which many other calculus rules follow; see [29, 30]. The first result is the intersection rule for basic normals (7.2.1).

Theorem 7.4.1
Let $\Omega_1,\ldots,\Omega_s$ be nonempty subsets of 
$$\mathbb {R}^n$$
, which are locally closed around their common point 
$$\bar {x}$$
. Assume the validity of the following qualification condition:

$$\displaystyle \begin{aligned} \big[x_i\in N(\bar{x};\Omega _i),\;x_1+\ldots+x_s=0\big]\Longrightarrow x_i=0\;\mathit{\mbox{ for all }}\;i=1,\ldots,s. \end{aligned} $$
(7.4.1)
Then we have the normal cone intersection rule

$$\displaystyle \begin{aligned} \displaystyle N\Big(\bar{x};\bigcap_{i=1}^s\Omega _i\Big)\subset N(\bar{x};\Omega _1)+\ldots+N(\bar{x};\Omega _s). \end{aligned} $$
(7.4.2)

Proof
Arguing by induction, we first verify the result for s = 2. Pick any 
$$v\in N(\bar {x};\Omega _1\cap \Omega _2)$$
and use the normal cone representation (7.2.3). It gives us sequences 
$$x_k\to \bar {x}$$
 with $x_k\in\Omega_1\cap\Omega_2$ and $v_k\to v$ with $v_k\in\hat N(x_k;\Omega_1\cap\Omega_2)$ as $k\to\infty$. Take any sequence $\varepsilon_k\downarrow 0$ and construct the sets

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \Theta_1:=\Omega _1\times\mathbb{R}_+,\\ &\displaystyle &\displaystyle \Theta_{2k}:=\big\{(x,\alpha)\big|\;x\in\Omega _2,\,\langle v_k,x-x_k\rangle-\varepsilon_k\|x-x_k\|\ge\alpha\big\}\;\mbox{ for any }\;k=1,2,\ldots. \end{array} \end{aligned} $$
These sets are obviously closed around (x k, 0) ∈ Θ1 ∩ Θ2k for all k sufficiently large. Furthermore, it follows from the prenormal cone definition (7.2.2) that there exists a neighborhood U of x k with

$$\displaystyle \begin{aligned} \Theta_1\cap\big(\Theta_{2k}-(0,\gamma)\big)\cap(U\times\mathbb{R})=\emptyset \end{aligned}$$
for small numbers γ > 0. Thus the pair (x k, 0) is a locally extremal point of the set system { Θ1, Θ2k} for such k. Applying to this system the extremal principle from Theorem 7.3.2 gives us pairs (u k, λ k) from the unit sphere in 
$$\mathbb {R}^{n+1}$$
for which

$$\displaystyle \begin{aligned} (u_k,\lambda_k)\in N\big((x_k,0);\Theta_1\big)\;\mbox{ and }\;(-u_k,-\lambda_k)\in N\big((x_k,0);\Theta_{2k}\big). \end{aligned} $$
(7.4.3)
Considering a subsequence of the unit vectors $(u_k,\lambda_k)$ if needed, we get $(u_k,\lambda_k)\to(u,\lambda)$ as $k\to\infty$ for some 
$$(u,\lambda )\in \mathbb {R}^{n+1}$$
with ∥(u, λ)∥ = 1. Passing to the limit in the first inclusion of (7.4.3) gives us 
$$(u,\lambda )\in N((\bar {x},0);\Theta _1)$$
, which immediately implies—by the easily verifiable product rule equality for normals to Cartesian products (see, e.g., [29, Proposition 1.2])—that 
$$u\in N(\bar {x};\Omega _1)$$
and λ ≤ 0. On the other hand, the limiting procedure in the second inclusion of (7.4.3) leads us by the structure of Θ2k to

$$\displaystyle \begin{aligned} (-\lambda v-u,\lambda)\in N\big((\bar{x},0);\Omega _2\times\mathbb{R}_+\big). \end{aligned} $$
(7.4.4)
To verify (7.4.4), for each 
$$k\in \mathbb {N}$$
rewrite the set Θ2k as

$$\displaystyle \begin{aligned} \Theta_{2k}=\big\{\big(x,\langle v_k,x-x_k\rangle-\varepsilon_k\|x-x_k\|-\alpha\big)\big|\;x\in\Omega _2,\;\alpha\in\mathbb{R}_+\big\}. \end{aligned}$$
Using representation (7.2.3) of basic normals via limits of regular ones, for each fixed 
$$k\in \mathbb {N}$$
we find sequences $x_{km}\to x_k$, $\alpha_{km}\downarrow 0$, $u_{km}\to u_k$, and $\lambda_{km}\to\lambda_k$ as $m\to\infty$ such that

$$\displaystyle \begin{aligned} (-u_{km},-\lambda_{km})\in\hat N\big((x_{km},\langle v_k,x_{km}-x_k\rangle-\varepsilon_k\|x_{km}-x_k\|-\alpha_{km});\Theta_{2k}\big)\end{aligned}$$
whenever 
$$m\in \mathbb {N}.$$
The latter means by definition (7.2.2) of regular normals that
$$\displaystyle \begin{aligned} \limsup_{(x,\alpha)\overset{\Theta_{2k}}{\to}(x_{km},\alpha^{\prime}_{km})}\frac{\langle -u_{km},x-x_{km}\rangle-\lambda_{km}(\alpha-\alpha^{\prime}_{km})}{\|(x,\alpha)-(x_{km},\alpha^{\prime}_{km})\|}\le 0\;\mbox{ with }\;\alpha^{\prime}_{km}:=\langle v_k,x_{km}-x_k\rangle-\varepsilon_k\|x_{km}-x_k\|-\alpha_{km}. \end{aligned}$$
Thus for all 
$$k\in \mathbb {N}$$
and 
$$m\in \mathbb {N}$$
we arrive at the limiting condition
$$\displaystyle \begin{aligned} \limsup_{(x,\alpha)\overset{\Omega_2\times\mathbb{R}_+}{\longrightarrow}(x_{km},\alpha_{km})}\frac{\langle -u_{km}-\lambda_{km}v_k,x-x_{km}\rangle+\lambda_{km}(\alpha-\alpha_{km})+\lambda_{km}\varepsilon_k\big(\|x-x_k\|-\|x_{km}-x_k\|\big)}{\|(x,\alpha)-(x_{km},\alpha_{km})\|}\le 0, \end{aligned}$$
which can be rewritten in terms of ε-enlargements of the prenormal cone as follows:

$$\displaystyle \begin{aligned} (-u_{km}-\lambda_{km}v_k,\lambda_{km})\in\hat N_{\varepsilon_{km}}\big((x_{km},\alpha_{km});\Omega _2\times\mathbb{R}_+\big)\;\mbox{ with }\;\varepsilon_{km}:=\frac{|\lambda_{km}|\varepsilon_k}{1+\|v_k\|+\varepsilon_k}. \end{aligned}$$
Employing the standard diagonal process and remembering the representation of basic normals via limits of ε-enlargements of regular ones brings us to the claimed inclusion (7.4.4).

Observe that having λ = 0 in (7.4.4) contradicts the qualification condition (7.4.1) for s = 2. Thus λ < 0, which readily implies that 
$$v\in N(\bar {x};\Omega _1)+N(\bar {x};\Omega _2)$$
.

To proceed finally by induction for s > 2, we observe that the induction assumption for (7.4.2) in the previous step yields the validity of the qualification condition (7.4.1) needed for the current step of induction. This completes the proof of the theorem. □

Next we derive the subdifferential sum rules concerning both basic subdifferential (7.2.7) and singular subdifferential (7.2.8). For our subsequent applications to bilevel optimization, it is sufficient to consider the case where all but one of the functions involved in summation are locally Lipschitzian around the reference point. This case allows us to obtain the subdifferential sum rules without any qualification conditions.

Theorem 7.4.2
Let 
$$\varphi _1\colon \mathbb {R}^n\to \overline {\mathbb {R}}$$
be l.s.c. around 
$$\bar {x}\in \mathit{\mbox{dom}}\,\varphi _1$$
, and let 
$$\varphi _i\colon \mathbb {R}^n\to \overline {\mathbb {R}}$$
for i = 2, …, s and s ≥ 2 be locally Lipschitzian around 
$$\bar {x}$$
. Then we have the sum rules

$$\displaystyle \begin{aligned} \partial\Big(\displaystyle\sum_{i=1}^s\varphi_i\Big)(\bar{x})\subset\sum_{i=1}^s\partial\varphi_i(\bar{x}), \end{aligned} $$
(7.4.5)

$$\displaystyle \begin{aligned} \partial^\infty\Big(\displaystyle\sum_{i=1}^s\varphi_i\Big)(\bar{x})=\partial^\infty\varphi_1(\bar{x}). \end{aligned} $$
(7.4.6)

Proof

We consider the case where only two functions are under summation since the general case of finitely many functions obviously follows by induction. Let us start with the basic subdifferential sum rule (7.4.5) for s = 2 therein.

Pick any 
$$v\in \partial (\varphi _1+\varphi _2)(\bar {x})$$
and get by definition (7.2.7) that

$$\displaystyle \begin{aligned} \begin{array}{rcl} (v,-1)\in N\big((\bar{x},(\varphi_1+\varphi_2)(\bar{x})); \mbox{epi}\,(\varphi_1+\varphi_2)\big). \end{array} \end{aligned} $$
Then construct the sets

$$\displaystyle \begin{aligned} \Omega _i:=\big\{(x,\mu_1,\mu_2)\in\mathbb{R}^n\times\mathbb{R}\times\mathbb{R}\big|\;\mu_i\ge\varphi_i(x)\big\}\;\mbox{ for }\;i=1,2. \end{aligned}$$
Denoting 
$$\bar \mu _i:=\varphi _i(\bar {x}),\ i=1,2$$
, we obviously have that the sets Ω1 and Ω2 are locally closed around the triple 
$$(\bar {x},\bar \mu _1,\bar \mu _2)\in \Omega _1\cap \Omega _2$$
. It is easy to check that 
$$(v,-1,-1)\in N((\bar {x},\bar \mu _1,\bar \mu _2);\Omega _1\cap \Omega _2)$$
. Applying now to this set intersection the normal cone intersection rule from Theorem 7.4.1, we observe that the qualification condition (7.4.1) is automatically satisfied in this case due to the singular subdifferential characterization (7.2.9) of the local Lipschitz continuity. Hence we get pairs 
$$(v_i,-\lambda _i)\in N((\bar {x},\bar \mu _i);\mbox{epi}\,\varphi _i)$$
for i = 1, 2 satisfying the condition

$$\displaystyle \begin{aligned} (v,-1,-1)=(v_1,-\lambda_1,0)+(v_2,0,-\lambda_2),\end{aligned} $$
which implies that $v=v_1+v_2$ and $\lambda_1=\lambda_2=1$. This shows that $v_i\in\partial\varphi_i(\bar{x})$ for i = 1, 2, and thus the sum rule (7.4.5) is verified.
Next we proceed with the proof of (7.4.6) for s = 2 starting with verifying the inclusion “⊂” therein. Pick 
$$v\in \partial ^\infty (\varphi _1+\varphi _2)(\bar {x})$$
 and find by definition sequences $(x_k,\mu_k)\in\mbox{epi}\,(\varphi_1+\varphi_2)$ with $(x_k,\mu_k)\to(\bar{x},(\varphi_1+\varphi_2)(\bar{x}))$, $(v_k,\nu_k)\to(v,0)$, $\gamma_k\downarrow 0$, and $\eta_k\downarrow 0$ such that

$$\displaystyle \begin{aligned}\langle v_k,x-x_k\rangle+\nu_k(\mu-\mu_k)\le\gamma_k(\|x-x_k\|+|\mu-\mu_k|)\end{aligned} $$
whenever (x, μ) ∈epi (φ 1 + φ 2) with 
$$x\in x_k+\eta _k\mathbb {B}$$
 and $|\mu-\mu_k|\le\eta_k$ for k = 1, 2, …. Taking a Lipschitz constant $\ell>0$ of $\varphi_2$ around 
$$\bar {x}$$
, denote 
$$\tilde {\eta }_k:=\eta _k/2(\ell +1)$$
and 
$$\widetilde {\mu }_k:=\mu _k-\varphi _2(x_k)$$
. Then $(x_k,\tilde{\mu}_k)\in\mbox{epi}\,\varphi_1$ with $(x_k,\tilde{\mu}_k)\to(\bar{x},\varphi_1(\bar{x}))$ and

$$\displaystyle \begin{aligned}(x,\mu+\varphi_2(x))\in\mbox{epi}\,(\varphi_1+\varphi_2),\quad |(\mu+\varphi_2(x))-\mu_k|\le\eta_k \end{aligned}$$
for all 
$$(x,\mu )\in \mbox{epi}\,\varphi _1,\ x\in x_k+\widetilde {\eta }_k\mathbb {B}$$
, and 
$$|\mu -\tilde {\mu }_k|\le \widetilde {\eta }_k$$
. Therefore

$$\displaystyle \begin{aligned}\langle v_k,x-x_k\rangle+\nu_k(\mu-\tilde{\mu}_k)\le\varepsilon_k(\|x-x_k\|+|\mu-\tilde{\mu}_k|) \mbox{ with }\;\varepsilon_k:=\gamma_k(1+\ell)+|\nu_k|\ell\end{aligned} $$
if (x, μ) ∈epi φ 1 with 
$$x\in x_k+\tilde {\eta }_k\mathbb {B}$$
and 
$$|\mu -\tilde {\mu }_k|\le \tilde {\eta }_k$$
. It yields 
$$(v_k,\nu _k)\in \hat {N}_{\varepsilon _k}((x_k,\tilde {\mu }_k);\mbox{epi}\,\varphi _1)$$
for all k = 1, 2, …, and so 
$$(v,0)\in N\big((\bar{x},\varphi_1(\bar{x}));\mbox{epi}\,\varphi_1\big)$$
 since $\varepsilon_k\downarrow 0$ as $k\to\infty$. This verifies the inclusion “⊂” in (7.4.6). Applying it to the sum $\varphi_1=(\varphi_1+\varphi_2)+(-\varphi_2)$ yields 
$$\partial ^\infty \varphi _1(\bar {x})\subset \partial ^\infty (\varphi _1+\varphi _2)(\bar {x})$$
, which justifies the equality in (7.4.6) and thus completes the proof. □
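The following one-dimensional example illustrates both assertions of Theorem 7.4.2: take $\varphi_1:=\delta(\cdot;\mathbb{R}_+)$ and $\varphi_2(x):=-|x|$ with $\bar{x}=0$. Then $(\varphi_1+\varphi_2)(x)=-x+\delta(x;\mathbb{R}_+)$, and direct computations give us

$$\displaystyle \begin{aligned} \partial(\varphi_1+\varphi_2)(0)=(-\infty,-1]\subset(-\infty,0]+\{-1,1\}=\partial\varphi_1(0)+\partial\varphi_2(0), \end{aligned}$$
so the inclusion (7.4.5) may be strict, while $\partial^\infty(\varphi_1+\varphi_2)(0)=(-\infty,0]=\partial^\infty\varphi_1(0)$ in accordance with the equality (7.4.6).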

7.5 Subdifferentials and Lipschitz Continuity of Value Functions

In this section we consider the class of extended-real-valued functions 
$$\vartheta \colon \mathbb {R}^n\to \overline {\mathbb {R}}$$
defined by

$$\displaystyle \begin{aligned} \vartheta(x):=\inf\big\{\varphi(x,y)\big|\;y\in F(x)\big\},\quad  x\in\mathbb{R}^n, \end{aligned} $$
(7.5.1)
where 
$$\varphi \colon \mathbb {R}^n\times \mathbb {R}^m\to \overline {\mathbb {R}}$$
is an l.s.c. function, and where 
$$F\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
is a set-valued mapping of closed graph. We can view (7.5.1) as the optimal value function in the problem of parametric optimization described as follows:

$$\displaystyle \begin{aligned}\mbox{minimize }\;\varphi(x,y)\;\mbox{ subject to }\;y\in F(x)\end{aligned} $$
with the cost function φ and the constraint mapping F, where y and x are the decision and parameter variables, respectively. Functions of this type are also known in variational analysis under the name of “marginal functions.” A characteristic feature of such functions is their nonsmoothness regardless of the smoothness of the cost function φ and the simplicity of the constraint mapping F, which may behave nicely with respect to the parameter x.
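A standard example to this effect: for the smooth (in fact bilinear) cost φ(x, y) := xy and the constant constraint mapping F(x) :≡ [−1, 1] we get

$$\displaystyle \begin{aligned} \vartheta(x)=\inf\big\{xy\big|\;y\in[-1,1]\big\}=-|x|, \end{aligned}$$
which is nonsmooth and nonconvex at $\bar{x}=0$.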

As seen below, functions of type (7.5.1) play a crucial role in applications to bilevel optimization while revealing intrinsic nonsmoothness of the latter class of optimization problems. This section presents evaluations of both basic and singular subdifferentials of (7.5.1), which are equally important for the aforementioned applications. Singular subdifferential evaluations are used for establishing the Lipschitz continuity of 𝜗(x) with respect to the parameter x that allows us to reduce the bilevel model under consideration to a single-level problem of Lipschitzian programming. On the other hand, basic subdifferential evaluations open the gate to derive in this way necessary optimality conditions for Lipschitzian bilevel programs.

To proceed, let us consider the argminimum mapping 
$$M\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
associated with (7.5.1) by

$$\displaystyle \begin{aligned} M(x):=\big\{y\in F(x)\big|\;\varphi(x,y)=\vartheta(x)\big\},\quad  x\in\mathbb{R}^n,\end{aligned} $$
(7.5.2)
and recall that this mapping is inner semicontinuous at 
$$(\bar {x},\bar {y})\in \mbox{gph}\, M$$
 if for every sequence $x_k\to\bar{x}$ with $x_k\in\mbox{dom}\, M$ there exists a sequence $y_k\in M(x_k)$ that converges to 
$$\bar {y}$$
 as $k\to\infty$. Observe that the inner semicontinuity of M at 
$$(\bar {x},\bar {y})$$
is implied by its Lipschitz-like property at this point.

The following theorem gives us efficient upper estimates of both basic and singular subdifferentials of the optimal value function 𝜗 needed for subsequent applications. We confine ourselves to the case of local Lipschitz continuity of the cost function φ in (7.5.1) that is sufficient to derive necessary optimality conditions for bilevel programs in Sects. 7.8 and 7.9.

Theorem 7.5.1
Let the argminimum mapping (7.5.2) be inner semicontinuous at the point 
$$(\bar {x},\bar {y})\in \mathit{\mbox{gph}}\, M$$
, and let the cost function φ be locally Lipschitzian around this point. Then we have

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \partial\vartheta(\bar{x})\subset\bigcup_{(v,w)\in\partial\varphi(\bar{x},\bar{y})}\Big[v+D^*F(\bar{x},\bar{y})(w)\Big], \end{array} \end{aligned} $$
(7.5.3)

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \partial^\infty\vartheta(\bar{x})\subset D^*F(\bar{x},\bar{y})(0). \end{array} \end{aligned} $$
(7.5.4)

Proof
To start with the verification of (7.5.3), consider the extended-real-valued function 
$$\psi \colon \mathbb {R}^n\times \mathbb {R}^m\to \overline {\mathbb {R}}$$
defined via the indicator function of the set gph F by

$$\displaystyle \begin{aligned} \psi(x,y):=\varphi(x,y)+\delta\big((x,y);\mbox{gph}\, F\big)\;\mbox{ for all }\;(x,y)\in\mathbb{R}^n\times\mathbb{R}^m \end{aligned} $$
(7.5.5)
and prove first the fulfillment of the estimate

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \partial\vartheta(\bar{x})\subset\big\{v\in\mathbb{R}^n\big|\;(v,0)\in\partial\psi(\bar{x},\bar{y})\big\}. \end{array} \end{aligned} $$
(7.5.6)
Indeed, pick any subgradient 
$$v\in \partial \vartheta (\bar {x})$$
 and get from its representation in (7.2.12) sequences $x_k\to\bar{x}$ with $\vartheta(x_k)\to\vartheta(\bar{x})$ and $v_k\to v$ with 
$$v_k\in \hat {\partial }\vartheta (x_k)$$
 as $k\to\infty$. Based on definition (7.2.11), for any sequence $\varepsilon_k\downarrow 0$ there exist $\eta_k\downarrow 0$ as $k\to\infty$ such that

$$\displaystyle \begin{aligned}\langle v_k,x-x_k\rangle\le\vartheta(x)-\vartheta(x_k)+\varepsilon_k\|x-x_k\|\;\mbox{ whenever }\;x\in x_k+\eta_k\mathbb{B},\quad  k=1,2,\ldots. \end{aligned}$$
This ensures by using the constructions above that

$$\displaystyle \begin{aligned}\langle(v_k,0),(x,y)-(x_k,y_k)\rangle\le\psi(x,y)-\psi(x_k,y_k)+\varepsilon_k\big(\|x-x_k\|+\|y-y_k\|\big) \end{aligned}$$
for all y k ∈ M(x k) and 
$$(x,y)\in (x_k,y_k)+\eta _k\mathbb {B}$$
. This tells us that 
$$(v_k,0)\in \hat {\partial }_{\varepsilon _k}\psi (x_k,y_k)$$
for all k = 1, 2, …. Employing further the inner semicontinuity of the argminimum mapping M at 
$$(\bar {x},\bar {y})$$
, we find a sequence of y k ∈ M(x k) converging to 
$$\bar {y}$$
 as $k\to\infty$. It follows from the imposed convergence 
$$\vartheta (x_k)\to \vartheta (\bar {x})$$
that 
$$\psi (x_k,y_k)\to \psi (\bar {x},\bar {y})$$
. Hence we arrive at 
$$(v,0)\in \partial \psi (\bar {x},\bar {y})$$
 by passing to the limit as $k\to\infty$, which therefore verifies the validity of the upper estimate (7.5.6). To derive from (7.5.6) the estimate (7.5.3) claimed in the theorem, it remains to apply in (7.5.6) the basic subdifferential sum rule (7.4.5) from Theorem 7.4.2, combined with the subdifferentiation of the indicator function in (7.2.10) and the coderivative definition in (7.2.4).
Next we verify the singular subdifferential estimate

$$\displaystyle \begin{aligned} \partial^\infty\vartheta(\bar{x})\subset\big\{v\in\mathbb{R}^n\big|\;(v,0)\in\partial^\infty\psi(\bar{x},\bar{y})\big\} \end{aligned} $$
(7.5.7)
for optimal value function (7.5.1) in terms of the auxiliary function (7.5.5) under the assumptions made. Picking 
$$v\in \partial ^\infty \vartheta (\bar {x})$$
 and taking any sequence $\varepsilon_k\downarrow 0$, find by (7.2.13) sequences $(x_k,\mu_k)\in\mbox{epi}\,\vartheta$ with $(x_k,\mu_k)\to(\bar{x},\vartheta(\bar{x}))$, $(v_k,\nu_k)\to(v,0)$, and $\eta_k\downarrow 0$ as $k\to\infty$ satisfying

$$\displaystyle \begin{aligned}\langle v_k,x-x_k\rangle+\nu_k(\mu-\mu_k)\le\varepsilon_k\big(\|x-x_k\|+|\mu-\mu_k|\big) \end{aligned}$$
for all 
$$(x,\mu )\in \mbox{epi}\,\vartheta ,\ x\in x_k+\eta _k\mathbb {B}$$
, and $|\mu-\mu_k|\le\eta_k$. The assumed inner semicontinuity of (7.5.2) ensures the existence of sequences $y_k\in M(x_k)$ with $y_k\to\bar{y}$ and $\mu_k\downarrow\psi(\bar{x},\bar{y})$
such that

$$\displaystyle \begin{aligned}(v_k,0,\nu_k)\in\hat{N}_{\varepsilon_k}\big((x_k,y_k,\mu_k);\mbox{epi}\,\psi\big)\;\mbox{ for all }\;k=1,2,\ldots, \end{aligned}$$
via the ε k-enlargements 
$$\hat N_{\varepsilon _k}$$
 of the prenormal cone to the epigraph of ψ. This gives us (7.5.7) by passing to the limit as $k\to\infty$. Applying finally to ψ in (7.5.7) the singular subdifferential relation (7.4.6) from Theorem 7.4.2, and taking into account the singular subdifferential calculation in (7.2.10) together with the coderivative definition (7.2.4), we arrive at the claimed upper estimate (7.5.4) and thus complete the proof of the theorem. □
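To illustrate Theorem 7.5.1, take φ(x, y) := y and $F(x):=\{y\in\mathbb{R}|\;y\ge|x|\}$. Then 𝜗(x) = |x| and M(x) = {|x|} is single-valued and continuous, and hence inner semicontinuous at (0, 0). Since $\mbox{gph}\, F=\mbox{epi}\,|\cdot|$ gives $D^*F(0,0)(w)=\{v|\;|v|\le w\}$ for w ≥ 0, and since ∂φ(0, 0) = {(0, 1)}, we compute

$$\displaystyle \begin{aligned} \partial\vartheta(0)=[-1,1]=0+D^*F(0,0)(1)\;\mbox{ and }\;\partial^\infty\vartheta(0)=\{0\}=D^*F(0,0)(0), \end{aligned}$$
so both upper estimates (7.5.3) and (7.5.4) hold in this case as equalities.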

As mentioned above, in our applications to bilevel programming we need to have verifiable conditions that ensure the local Lipschitz continuity of the optimal value function (7.5.1). This is provided by the following corollary, which is a direct consequence of Theorem 7.5.1 and the coderivative criterion (7.2.6) for the Lipschitz-like property.

Corollary 7.5.2

In addition to the assumptions of Theorem 7.5.1, suppose that the constraint mapping F is Lipschitz-like around 
$$(\bar {x},\bar {y})\in \mathit{\mbox{gph}}\, M$$
in (7.5.2). Then the optimal value function (7.5.1) is locally Lipschitzian around 
$$\bar {x}$$
. △

Proof

We know from the coderivative criterion (7.2.6) that F is Lipschitz-like around 
$$(\bar {x},\bar {y})$$
if and only if 
$$D^*F(\bar {x},\bar {y})(0)=\{0\}$$
. Applying it to (7.5.4) tells us that the assumed Lipschitz-like property of the constraint mapping F in (7.5.1) ensures that 
$$\partial ^\infty \vartheta (\bar {x})=\{0\}$$
. Furthermore, it easily follows from the assumptions made that the optimal value function is l.s.c. around 
$$\bar {x}$$
. Thus 𝜗 is locally Lipschitzian around 
$$\bar {x}$$
by the characterization of this property given in (7.2.9). □

7.6 Problems of Lipschitzian Programming

Before deriving necessary optimality conditions in Lipschitzian problems of bilevel optimization in the subsequent sections, we devote this section to problems of single-level Lipschitzian programming. The results obtained here are based on the extremal principle and the subdifferential characterization of locally Lipschitzian functions, and they are instrumental for the applications to bilevel programs given in Sect. 7.8.

The mathematical program under consideration here is as follows:

$$\displaystyle \begin{aligned} \begin{array}{ll} \mbox{minimize }\;\varphi_0(x)\;\mbox{ subject to }\\ \varphi_i(x)\le 0\;\mbox{ for all }\;i=1,\ldots,m, \end{array} \end{aligned} $$
(7.6.1)
where the functions 
$$\varphi _i\colon \mathbb {R}^n\to \mathbb {R},\ i=0,\ldots ,m$$
, are locally Lipschitzian around the reference point 
$$\bar {x}$$
. The next theorem provides necessary optimality conditions in problem (7.6.1) of Lipschitzian programming that are expressed in terms of the basic subdifferential (7.2.7).
Theorem 7.6.1
Let 
$$\bar {x}$$
be a feasible solution to problem (7.6.1) that gives a local minimum to the cost function $\varphi_0$ therein. Then there exist multipliers $\lambda_0,\ldots,\lambda_m$ satisfying the sign conditions

$$\displaystyle \begin{aligned} \lambda_i\ge 0\;\mathit{\mbox{ for all }}\;i=0,\ldots,m, \end{aligned} $$
(7.6.2)
the nontriviality conditions

$$\displaystyle \begin{aligned} \lambda_0+\ldots+\lambda_m\neq 0, \end{aligned} $$
(7.6.3)
the complementary slackness conditions

$$\displaystyle \begin{aligned} \lambda_i\varphi_i(\bar{x})=0\;\mathit{\mbox{ whenever }}\;i=1,\ldots,m, \end{aligned} $$
(7.6.4)
and the subdifferential Lagrangian inclusion

$$\displaystyle \begin{aligned} 0\in\displaystyle\sum_{i=0}^{m}\lambda_i\partial\varphi_i(\bar{x}). \end{aligned} $$
(7.6.5)
Assume in addition that

$$\displaystyle \begin{aligned} \Big[\displaystyle\sum_{i\in I(\bar{x})}\lambda_i v_i=0,\;\lambda_i\ge 0\Big]\Longrightarrow\Big[\lambda_i=0\;\mathit{\mbox{ for all }}\;i\in I(\bar{x})\Big] \end{aligned} $$
(7.6.6)

whenever 
$$v_i\in \partial \varphi _i(\bar {x})$$
with 
$$I(\bar {x}):=\big \{i\in \{1,\ldots ,m\}\big |\;\varphi _i(\bar {x})=0\big \}$$
. Then the necessary optimality conditions formulated above hold with λ 0 = 1. △

Proof
Supposing without loss of generality that 
$$\varphi _0(\bar {x})=0$$
, consider the point 
$$(\bar {x},0)\in \mathbb {R}^n\times \mathbb {R}^{m+1}$$
and form the following system of m + 2 sets in the space 
$$\mathbb {R}^n\times \mathbb {R}^{m+1}$$
:

$$\displaystyle \begin{aligned} \Omega_i:=\big\{(x,\mu_0,\ldots,\mu_m)\in\mathbb{R}^n\times\mathbb{R}^{m+1}\big|\;(x,\mu_i)\in\mbox{epi}\,\varphi_i\big\},\quad i=0,\ldots,m, \end{aligned} $$
(7.6.7a)

$$\displaystyle \begin{aligned} \Omega _{m+1}:=\mathbb{R}^n\times\{0\}. \end{aligned} $$
(7.6.7b)
It is obvious that 
$$(\bar {x},0)\in \Omega _0\cap \ldots \cap \Omega _{m+1}$$
and that all the sets Ωi, i = 0, …, m + 1, are locally closed around 
$$(\bar {x},0)$$
. Furthermore, there exists a neighborhood U of the local minimizer 
$$\bar {x}$$
such that for any ε > 0 we find ν ∈ (0, ε) ensuring that

$$\displaystyle \begin{aligned} \big(\Omega _0+a\big)\displaystyle\bigcap_{i=1}^{m+1}\Omega _i\cap\big(U\times\mathbb{R}^{m+1}\big)=\emptyset, \end{aligned} $$
(7.6.8)
where 
$$a:=(0,\nu ,0,\ldots ,0)\in \mathbb {R}^n\times \mathbb {R}^{m+1}$$
with 
$$\nu \in \mathbb {R}$$
standing at the first position after 
$$0\in \mathbb {R}^n$$
. Indeed, the negation of (7.6.8) contradicts the local minimality of 
$$\bar {x}$$
in (7.6.1). Having (7.6.8) gives us (7.3.1) for the set system in (7.6.7a), (7.6.7b) and thus verifies that 
$$(\bar {x},0)$$
is a locally extremal point of these sets. Applying now the extremal principle from Theorem 7.3.2 to { Ω0, …, Ωm+1} at 
$$(\bar {x},0)$$
with taking into account the structures of Ωi, we get 
$$v_0,\ldots ,v_m\in \mathbb {R}^n,\ \lambda _0,\ldots ,\lambda _m\in \mathbb {R}$$
, and 
$$\xi =(\xi _1,\ldots ,\xi _{m+1})\in \mathbb {R}^{m+1}$$
satisfying the inclusions

$$\displaystyle \begin{aligned} (v_i,-\lambda_i)\in N\big((\bar{x},0);\mbox{epi}\,\varphi_i\big)\;\mbox{ for all }\;i=0,\ldots,m \end{aligned} $$
(7.6.9)
together with the relationships

$$\displaystyle \begin{aligned} \displaystyle\sum_{i=0}^m\|v_i\|+\sum_{i=0}^m|\lambda_i|+\|\xi\|\neq 0, \end{aligned} $$
(7.6.10)

$$\displaystyle \begin{aligned} \displaystyle\sum_{i=0}^m v_i=0,\;\mbox{ and }\;\lambda_i=\xi_{i+1}\;\mbox{ as }\;i=0,\ldots,m. \end{aligned} $$
(7.6.11)
It easily follows from (7.6.9) and the structure of the epigraphical sets in (7.6.7a) that the sign conditions (7.6.2) are satisfied. If we suppose, arguing by contradiction to (7.6.3), that $\lambda_0=\ldots=\lambda_m=0$, then ξ = 0 by (7.6.11). Furthermore, the singular subdifferential criterion (7.2.9) for the local Lipschitz continuity of the functions $\varphi_i$ as i = 0, …, m, combined with the sign conditions (7.6.2) and the definitions (7.2.7) and (7.2.8) of the basic and singular subdifferentials, respectively, tells us that the inclusions in (7.6.9) are equivalent to

$$\displaystyle \begin{aligned} v_i\in\lambda_i\partial\varphi_i(\bar{x})\;\mbox{ for all }\;i=0,\ldots,m, \end{aligned} $$
(7.6.12)
and hence we get v 0 = … = v m = 0. This clearly violates (7.6.10) and thus verifies (7.6.3). The generalized Euler equation in (7.6.11) clearly reduces to the Lagrangian inclusion (7.6.5).

To check further the complementary slackness conditions in (7.6.4), fix i ∈{1, …, m} and suppose that 
$$\varphi _i(\bar {x})<0$$
. Then the continuity of φ i at 
$$\bar {x}$$
ensures that the pair 
$$(\bar {x},0)$$
is an interior point of the epigraphical set epi φ i. It readily implies that 
$$N((\bar {x},0);\mbox{epi}\,\varphi _i)=\{(0,0)\}$$
, and hence λ i = 0. This yields 
$$\lambda _i\varphi _i(\bar {x})=0$$
, which justifies (7.6.4).

To complete the proof of the theorem, it remains to check that the validity of (7.6.6) ensures that λ 0 = 1 in (7.6.5). We easily arrive at this assertion while arguing by contradiction. □

If the constraint functions φ i, i = 1, …, m, are smooth around the reference point 
$$\bar {x}$$
, condition (7.6.6) clearly reduces to the classical Mangasarian-Fromovitz constraint qualification. This suggests labeling condition (7.6.6) the generalized Mangasarian-Fromovitz constraint qualification, or the generalized MFCQ.
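A simple example illustrating Theorem 7.6.1: minimize $\varphi_0(x):=|x|$ on $\mathbb{R}$ subject to $\varphi_1(x):=-x-1\le 0$, with the optimal solution $\bar{x}=0$. Here the constraint is inactive, so $I(\bar{x})=\emptyset$ and the generalized MFCQ (7.6.6) holds trivially; the complementary slackness conditions (7.6.4) force $\lambda_1=0$, and the Lagrangian inclusion (7.6.5) with $\lambda_0=1$ reduces to

$$\displaystyle \begin{aligned} 0\in\partial|\cdot|(0)=[-1,1], \end{aligned}$$
which is satisfied although $\varphi_0$ is nondifferentiable at the solution.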

7.7 Variational Approach to Bilevel Optimization

This section is devoted to describing some models of bilevel programming and a variational approach to them that involves nondifferentiable optimal value functions in single-level problems of parametric optimization.

Let us first consider the following problem of parametric optimization with respect to the decision variable 
$$y\in \mathbb {R}^m$$
under each fixed parameter 
$$x\in \mathbb {R}^n$$
:

$$\displaystyle \begin{aligned} \mbox{minimize }\varphi(x,y)\;\mbox{ subject to }\;y\in F(x)\;\mbox{ with fixed }\;x\in\mathbb{R}^n, \end{aligned} $$
(7.7.1)
where 
$$\varphi \colon \mathbb {R}^n\times \mathbb {R}^m\to \mathbb {R}$$
is the cost function and 
$$F\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
is the constraint mapping in (7.7.1), which is called the lower-level problem of parametric optimization. Denoting by

$$\displaystyle \begin{aligned} S(x):=\mbox{argmin}\big\{\varphi(x,y)\big|\;y\in F(x)\big\} \end{aligned} $$
(7.7.2)
the parameterized solution set for (7.7.1) for each 
$$x\in \mathbb {R}^n$$
and given yet another cost function 
$$\psi \colon \mathbb {R}^n\times \mathbb {R}^m\to \mathbb {R}$$
, we consider the upper-level parametric optimization problem of minimizing ψ(x, y) over the lower-level solution map 
$$S\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
from (7.7.2) written as:

$$\displaystyle \begin{aligned} \mbox{minimize }\;\psi(x,y)\;\mbox{ subject to }\;y\in S(x)\;\mbox{ for each }\;x\in\Omega. \end{aligned} $$
(7.7.3)
The optimistic bilevel programming model is defined by

$$\displaystyle \begin{aligned} \mbox{minimize }\;\mu(x)\;\mbox{ subject to }\;x\in\Omega ,\;\mbox{ where }\;\mu(x):=\inf\big\{\psi(x,y)\big|\;y\in S(x)\big\}, \end{aligned} $$
(7.7.4)
and where 
$$\Omega \subset \mathbb {R}^n$$
 is a given constraint set. On the other hand, the pessimistic bilevel programming model is defined as follows:

$$\displaystyle \begin{aligned} \mbox{minimize }\;\eta(x)\;\mbox{ subject to }\;x\in\Omega ,\;\mbox{ where }\;\eta(x):=\sup\big\{\psi(x,y)\big|\;y\in S(x)\big\}. \end{aligned} $$
(7.7.5)
We refer the reader to [5–8, 11–16, 19, 30, 45, 47, 48] and the bibliographies therein for more details on both optimistic and pessimistic versions in bilevel programming, their local and global solutions as well as reformulations, modifications and relationships with other classes of optimization problems, theoretical and numerical developments, and various applications in finite-dimensional spaces. Investigations and applications of bilevel optimization problems in infinite dimensions can be found, e.g., in [1, 3, 8, 15, 17, 18, 20, 21, 23, 24, 34, 36, 37, 42, 43].

The main attention in this and subsequent sections is paid to the application of the machinery and results of variational analysis and generalized differentiation to problems of bilevel optimization by implementing the value function approach. This approach is based on reducing bilevel programs to single-level problems of mathematical programming by using the nonsmooth optimal value function 𝜗(x) of the lower-level problem defined in (7.5.1). Such a device was initiated by Outrata [35] for a particular class of bilevel optimization problems and was used by him for developing a numerical algorithm to solve bilevel programs. Then this approach was strongly developed by Ye and Zhu [44], who employed it to derive necessary optimality conditions for optimistic bilevel programs by using Clarke’s generalized gradients of optimal value functions. More advanced necessary optimality conditions for optimistic bilevel programs in terms of the author’s generalized differentiation reviewed in Sect. 7.2 were developed in [11, 12, 30, 34, 46, 47]. Optimality and stability conditions for pessimistic bilevel models were derived in [13, 16].

We restrict ourselves in what follows to implementing variational analysis and the aforementioned machinery of generalized differentiation within the value function approach to optimistic bilevel models with Lipschitzian data in finite-dimensional spaces. This allows us to most clearly communicate the basic variational ideas behind this approach, without additional technical complications. The variational results presented in the previous sections make our presentation self-contained and complete.

For simplicity we consider the optimistic bilevel model (7.7.4) with only inequality constraints on the lower and upper levels described by

$$\displaystyle \begin{aligned} F(x):=\big\{y\in\mathbb{R}^m\big|\;f_i(x,y)\le 0\;\mbox{ for }\;i=1,\ldots,r\big\}, \end{aligned} $$
(7.7.6)

$$\displaystyle \begin{aligned} \Omega :=\big\{x\in\mathbb{R}^n\big|\;g_j(x)\le 0\;\mbox{ for }\;j=1,\ldots,s\big\}.\end{aligned} $$
(7.7.7)
The reduction of the bilevel program (7.7.4) with the constraints (7.7.6) and (7.7.7) to a single-level problem of nondifferentiable programming and deriving in this way necessary optimality conditions for it are given in the next section.

7.8 Optimality Conditions for Lipschitzian Bilevel Programs

The optimal value function (7.5.1) for the lower-level program (7.7.1) with the inequality constraints specified in (7.7.6) reads as

$$\displaystyle \begin{aligned} \vartheta(x)=\inf\big\{\varphi(x,y)\big|\;f_i(x,y)\le 0,\;i=1,\ldots,r\big\},\quad  x\in\mathbb{R}^n. \end{aligned} $$
(7.8.1)
With the upper-level cost function ψ given in (7.7.3) and the upper-level constraints taken from (7.7.7), consider the following single-level mathematical program with inequality constraints:

$$\displaystyle \begin{aligned} \begin{array}{ll} &amp;\mbox{minimize }\;\psi(x,y)\;\mbox{ subject to }\;g_j(x)\le 0,\;j=1,\ldots,s,\\ &amp;f_i(x,y)\le 0,\;i=1,\ldots,r,\;\mbox{ and }\;\varphi(x,y)\le\vartheta(x). \end{array} \end{aligned} $$
(7.8.2)
We can easily observe that global optimal solutions to (7.8.2) agree with those to problem (7.7.4), (7.7.6), and (7.7.7). Although this is not always the case for local minimizers, it holds under the inner semicontinuity assumption on the solution map associated with the optimistic bilevel program (7.7.4); see [12, Proposition 6.9] for the precise statement.
Looking at (7.8.2), observe that this problem is of type (7.6.1), for which necessary optimality conditions are given in Theorem 7.6.1, provided that all the functions involved are locally Lipschitzian. However, the direct application of Theorem 7.6.1 to problem (7.8.2) is not efficient due to the structure of the last constraint therein defined via the lower-level optimal value function (7.8.1). Indeed, it has been realized in bilevel programming that this constraint prevents the fulfillment of conventional constraint qualifications, in particular, of the generalized MFCQ (7.6.6); observe to this end that 𝜗(x) ≤ φ(x, y) whenever y ∈ F(x), and so the constraint φ(x, y) ≤ 𝜗(x) holds as an equality at every feasible point of (7.8.2). To avoid this obstacle, Ye and Zhu [44] introduced the following property postulating an appropriate behavior of the cost function in (7.8.2) with respect to linear perturbations of the constraint φ(x, y) ≤ 𝜗(x). Consider the problem:

$$\displaystyle \begin{aligned} \begin{array}{ll} &amp;\mbox{minimize }\;\psi(x,y)\;\mbox{ subject to }\;g_j(x)\le 0,\;j=1,\ldots,s,\\ &amp;f_i(x,y)\le 0,\;i=1,\ldots,r,\;\mbox{ and }\;\varphi(x,y)-\vartheta(x)+\nu=0\;\mbox{ as }\;\nu\in\mathbb{R}. \end{array} \end{aligned} $$
(7.8.3)
Definition 7.8.1
Problem (7.8.2) is partially calm at its local optimal solution 
$$(\bar {x},\bar {y})$$
if there exist a constant κ > 0 and a neighborhood U of 
$$(\bar {x},\bar {y},0)\in \mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}$$
such that

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \psi(x,y)-\psi(\bar{x},\bar{y})+\kappa|\nu|\ge 0 \end{array} \end{aligned} $$
(7.8.4)
for all the triples (x, y, ν) ∈ U feasible to (7.8.3). △

There are various efficient conditions that ensure the fulfillment of the partial calmness property for (7.8.2). They include the uniform sharp minimum condition [44], the linearity of the lower-level problem with respect to the decision variable [9], the kernel condition [30], etc. On the other hand, partial calmness may fail in rather common situations; see [19, 30] for more results and discussions on partial calmness and related properties.

The main impact of partial calmness on deriving necessary optimality conditions for (7.8.2) is its equivalence to the possibility of transferring the troublesome constraint φ(x, y) ≤ 𝜗(x) into the penalized cost function, as in the following proposition.

Proposition 7.8.2
Let problem (7.8.2) be partially calm at its local optimal solution 
$$(\bar {x},\bar {y})$$
with ψ being continuous at this point. Then 
$$(\bar {x},\bar {y})$$
is a local optimal solution to the penalized problem

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \begin{array}{ll} \mathit{\mbox{minimize }}\;\psi(x,y)+\kappa\big(\varphi(x,y)-\vartheta(x)\big)\;\mathit{\mbox{ subject to }}\\ g_j(x)\le 0,\;j=1,\ldots,s,\;\mathit{\mbox{ and }}\;f_i(x,y)\le 0,\;i=1,\ldots,r, \end{array} \end{array} \end{aligned} $$
(7.8.5)

where the number κ > 0 is taken from (7.8.4). △

Proof
Taking κ and U from Definition 7.8.1 and using the continuity of ψ at 
$$(\bar {x},\bar {y})$$
, we find γ > 0 and η > 0 with 
$$\tilde U:=[(\bar {x},\bar {y})+\eta \mathbb {B}]\times (-\gamma ,\gamma )\subset U$$
and

$$\displaystyle \begin{aligned} \begin{array}{rcl} |\psi(x,y)-\psi(\bar{x},\bar{y})|\le\kappa\gamma\;\mbox{ for all }\;(x,y)-(\bar{x},\bar{y})\in\eta\mathbb{B}. \end{array} \end{aligned} $$
Let us employ it to verify that

$$\displaystyle \begin{aligned} \psi(x,y)-\psi(\bar{x},\bar{y})+\kappa\big(\varphi(x,y)-\vartheta(x)\big)\ge 0\;\mbox{ if }\;g_j(x)\le 0,\;j=1,\ldots,s, \end{aligned} $$
(7.8.6)
and 
$$(x,y)\in [(\bar {x},\bar {y})+\eta \mathbb {B}]\cap \mbox{gph}\, F$$
with F taken from (7.7.6). If 
$$(x,y,\vartheta (x)-\varphi (x,y))\in \tilde U$$
, then (7.8.6) follows from (7.8.4). In the remaining case where 
$$(x,y,\vartheta (x)-\varphi (x,y))\notin \tilde U$$
, we get that

$$\displaystyle \begin{aligned} \varphi(x,y)-\vartheta(x)\ge\gamma,\;\mbox{ and hence }\;\kappa\big(\varphi(x,y)-\vartheta(x)\big)\ge\kappa\gamma, \end{aligned}$$
which also yields (7.8.6) by 
$$\psi (x,y)-\psi (\bar {x},\bar {y})\ge -\kappa \gamma $$
. The feasibility of 
$$(\bar {x},\bar {y})$$
to (7.8.2) tells us that 
$$\varphi (\bar {x},\bar {y})-\vartheta (\bar {x})=0$$
, which thus verifies the claimed statement. □
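As an illustration of this reduction, consider the lower-level data φ(x, y) := xy and F(x) :≡ [−1, 1] described by $f_1(x,y):=y-1\le 0$ and $f_2(x,y):=-y-1\le 0$, with no upper-level constraints and an arbitrary locally Lipschitzian upper-level cost ψ. Then 𝜗(x) = −|x|, and the partial calmness of (7.8.2) at its local optimal solutions follows from the linearity of the lower-level problem in the decision variable mentioned above; see [9]. The penalized problem (7.8.5) becomes in this case the single-level Lipschitzian program

$$\displaystyle \begin{aligned} \mbox{minimize }\;\psi(x,y)+\kappa\big(xy+|x|\big)\;\mbox{ subject to }\;y-1\le 0\;\mbox{ and }\;-y-1\le 0, \end{aligned}$$
which is of type (7.6.1).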

Thus the imposed partial calmness allows us to reduce the original problem of optimistic bilevel optimization to the single-level mathematical program (7.8.5) with conventional inequality constraints, where the troublesome term φ(x, y) − 𝜗(x) enters the penalized cost function. To derive necessary optimality conditions for (7.8.5), let us reformulate the generalized MFCQ (7.6.6) in conventional bilevel terms as in the case of bilevel programs with smooth data [6].

We say that 
$$(\bar {x},\bar {y})\in \mathbb {R}^n\times \mathbb {R}^m$$
is lower-level regular if it satisfies the generalized MFCQ in the lower-level problem (7.7.1). This means due to the structure of (7.7.1) that

$$\displaystyle \begin{aligned} \Big[\displaystyle\sum_{i\in I(\bar{x},\bar{y})}\lambda_i v_i=0,\;\lambda_i\ge 0\Big]\Longrightarrow\Big[\lambda_i=0\;\mbox{ for all }\;i\in I(\bar{x},\bar{y})\Big] \end{aligned} $$
(7.8.7)
whenever 
$$(u_i,v_i)\in \partial f_i(\bar {x},\bar {y})$$
with some 
$$u_i\in \mathbb {R}^n$$
and

$$\displaystyle \begin{aligned}I(\bar{x},\bar{y}):=\big\{i\in\{1,\ldots,r\}\big|\;f_i(\bar{x},\bar{y})=0\big\}.\end{aligned}$$
Similarly, a point 
$$\bar {x}\in \mathbb {R}^n$$
satisfying the upper-level constraints in (7.7.7) is upper-level regular if

$$\displaystyle \begin{aligned} \Big[\displaystyle 0\in\sum_{j\in J(\bar{x})}\mu_j\partial g_j(\bar{x}),\;\mu_j\ge 0\Big]\Longrightarrow\Big[\mu_j=0\;\mbox{ whenever }\;j\in J(\bar{x})\Big] \end{aligned} $$
(7.8.8)
with the active constraint indexes 
$$J(\bar {x}):=\big \{j\in \{1,\ldots ,s\}\big |\;g_j(\bar {x})=0\big \}$$
.
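
For a quick check of these notions on the hypothetical example given after Proposition 7.8.2: there f_1(x, y) = −x − y is smooth with ∂f_1(0, 0) = {(−1, −1)}, so I(0, 0) = {1}, v_1 = −1, and the implication in (7.8.7) reads

$$\displaystyle \begin{aligned} \big[\lambda_1\cdot(-1)=0,\;\lambda_1\ge 0\big]\Longrightarrow\lambda_1=0, \end{aligned}$$
which holds trivially; hence (0, 0) is lower-level regular. The upper-level regularity (7.8.8) is vacuous there since no upper-level constraints are present.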

Now we are ready to apply Theorem 7.6.1 to the optimistic bilevel program in the equivalent form (7.8.5). To proceed, we first have to verify that the optimal value function 𝜗(x) in the cost function of (7.8.5) is locally Lipschitzian around the reference point, and then to upper estimate the basic subdifferential (7.2.7) of the function − 𝜗(⋅). Since the basic subdifferential 
$$\partial \vartheta (\bar {x})$$
does not possess the plus-minus symmetry while its convex hull 
$$\mbox{co}\,\partial \vartheta (\bar {x})$$
does, in the proof we need to convexify the set on the right-hand side of the upper estimate in (7.5.3). In this way we arrive at the following major result.

Theorem 7.8.3
Let 
$$(\bar {x},\bar {y})$$
be a local optimal solution to the optimistic bilevel program in the equivalent form (7.8.2) with the lower-level optimal value function 𝜗(x) defined in (7.8.1). Assume that all the functions φ, ψ, f_i, g_j are locally Lipschitzian around the reference point, that the lower-level solution map S from (7.7.2) is inner semicontinuous at
$$(\bar {x},\bar {y})$$
, that the lower-level regularity (7.8.7) and upper-level regularity (7.8.8) conditions hold, and that problem (7.8.2) is partially calm at 
$$(\bar {x},\bar {y})$$
with constant κ > 0. Then there exist multipliers λ_1, …, λ_r, μ_1, …, μ_s, and ν_1, …, ν_r satisfying the sign and complementary slackness conditions

$$\displaystyle \begin{aligned} \lambda_i\ge 0,\;\;\lambda_i f_i(\bar{x},\bar{y})=0\;\mathit{\mbox{ for }}\;i=1,\ldots,r, \end{aligned} $$
(7.8.9)

$$\displaystyle \begin{aligned} \mu_j\ge 0,\;\;\mu_j g_j(\bar{x})=0\;\mathit{\mbox{ for }}\;j=1,\ldots,s, \end{aligned} $$
(7.8.10)

$$\displaystyle \begin{aligned} \nu_i\ge 0,\;\;\nu_i f_i(\bar{x},\bar{y})=0\;\mathit{\mbox{ for }}\;i=1,\ldots,r, \end{aligned} $$
(7.8.11)
together with the following relationships, which involve some vector 
$$u\in \mathit{\mbox{co}}\,\partial \vartheta (\bar {x}):$$

$$\displaystyle \begin{aligned} (u,0)\in\mathit{\mbox{co}}\,\partial\varphi(\bar{x},\bar{y})+\displaystyle\sum_{i=1}^r\nu_i\mathit{\mbox{co}}\,\partial f_i(\bar{x},\bar{y}), \end{aligned} $$
(7.8.12)

$$\displaystyle \begin{aligned} (u,0)\in\partial\varphi(\bar{x},\bar{y})+\kappa^{-1}\partial\psi(\bar{x},\bar{y})+\displaystyle\sum_{i=1}^r\lambda_i\partial f_i(\bar{x},\bar{y})+\sum_{j=1}^s\mu_j\Big(\partial g_j(\bar{x}),0\Big). \end{aligned} $$
(7.8.13)

Proof
It follows from Proposition 7.8.2 that 
$$(\bar {x},\bar {y})$$
is a local minimizer of the mathematical program (7.8.5) with inequality constraints. To show that it belongs to problems of Lipschitzian programming considered in Sect. 7.6, we need to check that the optimal value function (7.8.1) is locally Lipschitzian around 
$$\bar {x}$$
under the assumptions made. This function is clearly l.s.c. around 
$$\bar {x}$$
, and thus its Lipschitz continuity around this point is equivalent to the condition 
$$\partial ^\infty \vartheta (\bar {x})=\{0\}$$
. It follows from the upper estimate (7.5.4) of Theorem 7.5.1 and the assumed inner semicontinuity of the solution map (7.7.2) at 
$$(\bar {x},\bar {y})$$
that

$$\displaystyle \begin{aligned} \partial^\infty\vartheta(\bar{x})\subset D^*F(\bar{x},\bar{y})(0)\;\mbox{ with }\;F(x)=\big\{y\in\mathbb{R}^m\big|\;f_i(x,y)\le 0,\;i=1,\ldots,r\big\}. \end{aligned}$$
Corollary 7.5.2 tells us that the local Lipschitz continuity of 𝜗 around 
$$\bar {x}$$
follows from the Lipschitz-like property of the mapping 
$$F\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
. Since

$$\displaystyle \begin{aligned} \mbox{gph}\, F=\big\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^m\big|\;f_i(x,y)\le 0,\;i=1,\ldots,r\big\}, \end{aligned}$$
we deduce that 
$$D^*F(\bar {x},\bar {y})(0)=\{0\}$$
from the assumed lower-level regularity due to the coderivative definition and the normal cone intersection rule in Theorem 7.4.1. Thus F is Lipschitz-like around
$$(\bar {x},\bar {y})$$
by the coderivative criterion (7.2.6), and the Lipschitz continuity of 𝜗(⋅) is verified.
Now we can apply to problem (7.8.5) the necessary optimality conditions for Lipschitzian programs obtained in Theorem 7.6.1. The assumed lower-level regularity and upper-level regularity clearly imply that the generalized MFCQ condition (7.6.6) holds. Thus there are multipliers λ_1, …, λ_r and μ_1, …, μ_s satisfying the sign and complementary slackness conditions in (7.8.9) and (7.8.10) for which we have

$$\displaystyle \begin{aligned} \begin{array}{ll} 0&\in\partial\psi(\bar{x},\bar{y})+\kappa\partial\varphi(\bar{x},\bar{y})+\Big(\kappa\partial(-\vartheta)(\bar{x}),0\Big)\\ &+\displaystyle\sum_{i=1}^r\lambda_i\partial f_i(\bar{x},\bar{y})+\sum_{j=1}^s\mu_j\Big(\partial g_j(\bar{x}),0\Big). \end{array} \end{aligned} $$
(7.8.14)
To estimate 
$$\partial (-\vartheta )(\bar {x})$$
in (7.8.14), recall that

$$\displaystyle \begin{aligned} \partial(-\vartheta)(\bar{x})\subset\overline\partial(-\vartheta)(\bar{x})=-\overline\partial\vartheta(\bar{x})=-\mbox{co}\,\partial\vartheta(\bar{x}), \end{aligned}$$
where 
$$\overline \partial $$
stands for Clarke’s generalized gradient of locally Lipschitzian functions, which possesses the plus-minus symmetry property [4]. Using this in (7.8.14), we get
$$u\in \mbox{co}\,\partial \vartheta (\bar {x})$$
such that

$$\displaystyle \begin{aligned} \kappa(u,0)\in\partial\psi(\bar{x},\bar{y})+\partial\varphi(\bar{x},\bar{y})+\displaystyle\sum_{i=1}^r\lambda_i\partial f_i(\bar{x},\bar{y})+\sum_{j=1}^s\Big(\mu_j\partial g_j(\bar{x}),0\Big). \end{aligned} $$
(7.8.15)
Applying the convexified subdifferential estimate (7.5.3) from Theorem 7.5.1 to the optimal value function (7.8.1) allows us to find multipliers ν_1, …, ν_r satisfying the sign and complementary slackness conditions in (7.8.11) that ensure the validity of (7.8.12). To verify finally (7.8.13), we divide (7.8.15) by κ > 0 while keeping the same notation for the scaled multipliers λ_i and μ_j. □
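
To see Theorem 7.8.3 at work, return to the hypothetical example given after Proposition 7.8.2, for which all the assumptions of the theorem were verified above. There 𝜗(x) = −x, so ∂𝜗(0) = co ∂𝜗(0) = {−1} and u = −1, while ∇φ(0, 0) = (0, 1), ∇ψ(0, 0) = (0, 0), and ∇f_1(0, 0) = (−1, −1). Both (7.8.12) and (7.8.13) then hold with ν_1 = λ_1 = 1 due to

$$\displaystyle \begin{aligned} (u,0)=(-1,0)=(0,1)+1\cdot(-1,-1), \end{aligned}$$
where ∂ψ(0, 0) = {(0, 0)}, so the ψ-term does not contribute for any κ > 0, and the conditions (7.8.9) and (7.8.11) are satisfied since f_1(0, 0) = 0.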

The next section presents an independent set of necessary optimality conditions for optimistic bilevel programs with Lipschitzian data, which are obtained without any convexification by employing instead yet another variational device and a subdifferential calculus rule.

7.9 Bilevel Optimization via Subdifferential Difference Rule

Considering the single-level problem (7.8.5), which we eventually deal with while deriving necessary optimality conditions for optimistic bilevel programs, note that its objective contains the difference of two nonsmooth functions. The basic subdifferential (7.2.7) does not possess any special rule for differences of nonsmooth functions, but the regular subdifferential (7.2.11) does, as was first observed in [33] by using a smooth variational description of regular subgradients. Here we employ this approach to establish necessary optimality conditions for Lipschitzian bilevel programs that are different from those in Theorem 7.8.3.

The derivation of these necessary optimality conditions is based on the following two results, which are certainly of independent interest. The first one provides a smooth variational description of regular subgradients of arbitrary functions 
$$\varphi \colon \mathbb {R}^n\to \overline {\mathbb {R}}$$
.

Lemma 7.9.1

Let 
$$\varphi \colon \mathbb {R}^n\to \overline {\mathbb {R}}$$
be finite at 
$$\bar {x}$$
, and let 
$$v\in \hat \partial \varphi (\bar {x})$$
. Then there exists a neighborhood U of 
$$\bar {x}$$
and a function 
$$\psi \colon U\to \mathbb {R}$$
such that 
$$\psi (\bar {x})=\varphi (\bar {x})$$
, that ψ is Fréchet differentiable at 
$$\bar {x}$$
with 
$$\nabla \psi (\bar {x})=v$$
, and that the difference ψ − φ achieves at
$$\bar {x}$$
its local maximum on U.

Proof
We proceed geometrically due to the relationship

$$\displaystyle \begin{aligned} \hat\partial\varphi(\bar{x})=\big\{v\in\mathbb{R}^n\big|\;(v,-1)\in\hat N\big((\bar{x},\varphi(\bar{x}));\mbox{epi}\,\varphi\big)\big\}, \end{aligned}$$
which reduces the claimed assertion to the following: 
$$w\in \hat N(\bar {z};\Omega )$$
if and only if there exists a neighborhood U of 
$$\bar {z}$$
and a function 
$$\psi \colon \mathbb {R}^n\to \mathbb {R}$$
such that ψ is Fréchet differentiable at 
$$\bar {z}$$
with 
$$\nabla \psi (\bar {z})=w$$
while achieving at 
$$\bar {z}$$
its local maximum relative to Ω.
To verify the latter, observe that for any 
$$\psi \colon U\to \mathbb {R}$$
satisfying the listed properties we get

$$\displaystyle \begin{aligned} \psi(z)=\psi(\bar{z})+\langle w,z-\bar{z}\rangle+o(\|z-\bar{z}\|)\le\psi(\bar{z})\;\mbox{ whenever }\;z\in\Omega\cap U. \end{aligned}$$
It shows that 
$$\langle w,z-\bar {z}\rangle +o(\|z-\bar {z}\|)\le 0$$
, and thus 
$$w\in \hat N(\bar {z};\Omega )$$
by definition (7.2.2). Conversely, pick 
$$w\in \hat N(\bar {z};\Omega )$$
and define the function

$$\displaystyle \begin{aligned} \begin{array}{rcl} \psi(z):=\left\{\begin{array}{ll} \min\big\{0,\langle w,z-\bar{z}\rangle\big\}&\displaystyle \mbox{ if }\;z\in\Omega ,\\ \langle w,z-\bar{z}\rangle&\displaystyle \mbox{ otherwise}. \end{array}\right. \end{array} \end{aligned} $$
It is easy to check that this function enjoys all the properties listed above. □
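
As a simple one-dimensional illustration of Lemma 7.9.1 (a hypothetical example added for the reader's convenience), take φ(x) := |x| with x̄ = 0 and pick any 
$$v\in \hat \partial \varphi (0)=[-1,1]$$
. Then the function ψ(x) := vx satisfies ψ(0) = φ(0) = 0 and ∇ψ(0) = v, while

$$\displaystyle \begin{aligned} \psi(x)-\varphi(x)=vx-|x|\le 0=\psi(0)-\varphi(0)\;\mbox{ for all }\;x\in\mathbb{R}, \end{aligned}$$
i.e., ψ − φ attains its (global) maximum at x̄ = 0, exactly as the lemma asserts.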

The second lemma gives us the aforementioned difference rule for regular subgradients.

Lemma 7.9.2
Consider two arbitrary functions 
$$\varphi _1,\varphi _2\colon \mathbb {R}^n\to \overline {\mathbb {R}}$$
that are finite at 
$$\bar {x}$$
and assume that 
$$\hat \partial \varphi _2(\bar {x})\neq \emptyset $$
. Then we have the inclusions

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat\partial(\varphi_1-\varphi_2)(\bar{x})\subset\bigcap_{v\in\hat\partial\varphi_2(\bar{x})}\Big[\hat\partial\varphi_1(\bar{x})-v\Big]\subset \hat\partial\varphi_1(\bar{x})-\hat\partial\varphi_2(\bar{x}). \end{array} \end{aligned} $$
(7.9.1)
It implies, in particular, that any local minimizer 
$$\bar {x}$$
of the difference function φ_1 − φ_2 satisfies the necessary optimality condition

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat\partial\varphi_2(\bar{x})\subset\hat\partial\varphi_1(\bar{x}). \end{array} \end{aligned} $$
(7.9.2)

Proof
Starting with (7.9.1), pick 
$$v\in \hat \partial \varphi _2(\bar {x})$$
. Then the smooth variational description of v from Lemma 7.9.1 gives us 
$$\psi \colon U\to \mathbb {R}$$
on a neighborhood U of 
$$\bar {x}$$
that is differentiable at 
$$\bar {x}$$
with

$$\displaystyle \begin{aligned} \begin{array}{rcl} \psi(\bar{x})=\varphi_2(\bar{x}),\quad \nabla\psi(\bar{x})=v,\;\mbox{ and }\;\psi(x)\le\varphi_2(x)\;\mbox{ whenever }\;x\in U. \end{array} \end{aligned} $$
Fix further an arbitrary vector 
$$w\in \hat \partial (\varphi _1-\varphi _2)(\bar {x})$$
and for any ε > 0 find γ > 0 such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \begin{array}{ll} \langle w,x-\bar{x}\rangle&\displaystyle \le\varphi_1(x)-\varphi_2(x)-\big(\varphi_1(\bar{x})-\varphi_2(\bar{x})\big)+\varepsilon\|x-\bar{x}\|\\ &\displaystyle \le\varphi_1(x)-\psi(x)-\big(\varphi_1(\bar{x})-\psi(\bar{x})\big)+\varepsilon\|x-\bar{x}\| \end{array} \end{array} \end{aligned} $$
if 
$$\|x-\bar {x}\|\le \gamma $$
. Due to the differentiability of ψ at 
$$\bar {x}$$
we use the elementary sum rule for the regular subgradients (see, e.g., [29, Proposition 1.107(i)]) and immediately get the relationships

$$\displaystyle \begin{aligned} w\in\hat\partial(\varphi_1-\psi)(\bar{x})=\hat\partial\varphi_1(\bar{x})-\nabla\psi(\bar{x})=\hat\partial\varphi_1(\bar{x})-v, \end{aligned}$$
which clearly verify both inclusions in (7.9.1). Picking further any 
$$v\in \hat \partial \varphi _2(\bar {x})$$
, we deduce from (7.9.1) and the obvious Fermat stationary rule via regular subgradients that

$$\displaystyle \begin{aligned} 0\in\hat\partial(\varphi_1-\varphi_2)(\bar{x})\subset\hat\partial\varphi_1(\bar{x})-v. \end{aligned}$$
It shows that 
$$v\in \hat \partial \varphi _1(\bar {x})$$
and thus verifies the fulfillment of (7.9.2). □
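
Both inclusions in (7.9.1) and the condition (7.9.2) can be checked by hand on simple data; the following one-dimensional example is hypothetical and added only for illustration. Let φ_1(x) := |x| and φ_2(x) := x with x̄ = 0, so that φ_1 − φ_2 attains its (global) minimum at x̄. Direct computation gives

$$\displaystyle \begin{aligned} \hat\partial(\varphi_1-\varphi_2)(0)=[-2,0],\quad\hat\partial\varphi_1(0)=[-1,1],\quad\hat\partial\varphi_2(0)=\{1\}, \end{aligned}$$
so both inclusions in (7.9.1) hold as equalities ([−2, 0] = [−1, 1] − 1), and (7.9.2) is confirmed by {1} ⊂ [−1, 1].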

Now we are in a position to derive refined necessary optimality conditions for optimistic bilevel programs with Lipschitzian data.

Theorem 7.9.3
Let 
$$(\bar {x},\bar {y})$$
be a local optimal solution to the optimistic bilevel program in the equivalent form (7.8.2) without upper-level constraints. Suppose that all the functions φ, ψ, f_i are locally Lipschitzian around the reference point, that the lower-level solution map S in (7.7.2) is inner semicontinuous at 
$$(\bar {x},\bar {y})$$
, that the lower-level regularity (7.8.7) condition holds, and that problem (7.8.2) is partially calm at 
$$(\bar {x},\bar {y})$$
with constant κ > 0. Assume in addition that 
$$\hat \partial \vartheta (\bar {x})\neq \emptyset $$
for the optimal value function (7.8.1). Then there exists a vector 
$$u\in \hat \partial \vartheta (\bar {x})$$
together with multipliers λ_1, …, λ_r and ν_1, …, ν_r satisfying the sign and complementary slackness conditions in (7.8.9) and (7.8.11), respectively, such that

$$\displaystyle \begin{aligned} (u,0)\in\partial\varphi(\bar{x},\bar{y})+\displaystyle\sum_{i=1}^r\nu_i\partial f_i(\bar{x},\bar{y}),\end{aligned} $$
(7.9.3)

$$\displaystyle \begin{aligned} (u,0)\in\partial\varphi(\bar{x},\bar{y})+\kappa^{-1}\partial\psi(\bar{x},\bar{y})+\displaystyle\sum_{i=1}^r\lambda_i\partial f_i(\bar{x},\bar{y}). \end{aligned} $$
(7.9.4)

Proof
We get from the partial calmness penalization in Proposition 7.8.2 employed together with the infinite penalization of the lower-level constraints that 
$$(\bar {x},\bar {y})$$
is a local minimizer of the unconstrained optimization problem

$$\displaystyle \begin{aligned} \mbox{minimize }\;\psi(x,y)+\kappa\big(\varphi(x,y)-\vartheta(x)\big)+\delta\big((x,y);\mbox{gph}\, F\big) \end{aligned} $$
(7.9.5)
with the mapping 
$$F\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
defined in (7.7.6). Then applying to (7.9.5) the difference rule (7.9.2) from Lemma 7.9.2 gives us the inclusion

$$\displaystyle \begin{aligned} \big(\kappa\hat\partial\vartheta(\bar{x}),0\big)\subset\hat\partial\big(\psi(\cdot)+\kappa\varphi(\cdot)+\delta(\cdot;\mbox{gph}\, F)\big)(\bar{x},\bar{y}). \end{aligned} $$
(7.9.6)
At the same time it follows from the proof of Theorem 7.5.1 that

$$\displaystyle \begin{aligned} \big(\hat\partial\vartheta(\bar{x}),0\big)\subset\hat\partial\big(\varphi(\cdot)+\delta\big(\cdot;\mbox{gph}\, F\big)\big)(\bar{x},\bar{y}). \end{aligned} $$
(7.9.7)
Replacing 
$$\hat \partial $$
by the larger basic subdifferential ∂ on the right-hand sides of (7.9.6) and (7.9.7) and then using the basic subdifferential sum rule from Theorem 7.4.2 yield the inclusions

$$\displaystyle \begin{aligned} \Big(\kappa\hat\partial\vartheta(\bar{x}),0\Big)&\subset \partial\psi(\bar{x},\bar{y})+\kappa\partial\varphi(\bar{x},\bar{y})+N\big((\bar{x},\bar{y});\mbox{gph}\, F\big),\\ \Big(\hat\partial\vartheta(\bar{x}),0\Big)&\subset\partial\varphi(\bar{x},\bar{y})+N\big((\bar{x},\bar{y});\mbox{gph}\, F\big). \end{aligned} $$
(7.9.8)
Employing Theorem 7.4.1 in these inclusions under the lower-level regularity of 
$$(\bar {x},\bar {y})$$
together with the singular subdifferential characterization of Lipschitzian functions in (7.2.9), we get

$$\displaystyle \begin{aligned} N\big((\bar{x},\bar{y});\mbox{gph}\, F\big)\subset\bigcup\Big\{\displaystyle\sum_{i=1}^r\lambda_i\partial f_i(\bar{x},\bar{y})\Big|\;\lambda_i\ge 0,\;\lambda_i f_i(\bar{x},\bar{y})=0\;\mbox{ as }\; i=1,\ldots,r\Big\}. \end{aligned}$$
It allows us to deduce from (7.9.8) the existence of 
$$u\in \hat \partial \vartheta (\bar {x})$$
ensuring the validity of (7.9.3) and

$$\displaystyle \begin{aligned} \kappa(u,0)\in\partial\psi(\bar{x},\bar{y})+\kappa\partial\varphi(\bar{x},\bar{y})+\displaystyle\sum_{i=1}^r\lambda_i\partial f_i(\bar{x},\bar{y}). \end{aligned}$$
Dividing the latter by κ > 0, we arrive at (7.9.4) and thus complete the proof of the theorem. □
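
On the hypothetical example given after Proposition 7.8.2, Theorem 7.9.3 applies directly: the optimal value function 𝜗(x) = −x is convex (indeed, linear), so 
$$\hat \partial \vartheta (0)=\{-1\}\neq \emptyset $$
, and conditions (7.9.3) and (7.9.4) hold with u = −1 and ν_1 = λ_1 = 1 by the same computation as after Theorem 7.8.3, now without any convexification.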

Observe that we always have 
$$\hat \partial \vartheta (\bar {x})\neq \emptyset $$
if the optimal value function (7.8.1) is convex, which is surely the case when all the functions φ and f_i therein are convex. If in addition the upper-level data are also convex, more general results were derived in [17] for problems of semi-infinite programming with an arbitrary number of inequality constraints in locally convex topological vector spaces by reducing them to problems of DC programming, whose objectives are represented as differences of convex functions. Note further that the necessary optimality conditions obtained in Theorems 7.8.3 and 7.9.3 are independent of each other even in the case of bilevel programs with smooth data. In particular, we refer the reader to [30, Example 6.24] for an illustration of this statement and for the use of the obtained results to solve smooth bilevel programs. Finally, we mention the possibility of replacing the inner semicontinuity assumption on the solution map S(x) imposed in both Theorems 7.8.3 and 7.9.3 by the uniform boundedness of this map in finite dimensions, or by its inner semicompactness counterpart in infinite-dimensional spaces; cf. [11, 16, 30, 34] for similar transitions in various bilevel settings.

7.10 Concluding Remarks and Open Questions

In this self-contained chapter of the book we described a variational approach to bilevel optimization together with its implementation in deriving advanced necessary optimality conditions for optimistic bilevel programs in finite-dimensional spaces. The entire machinery of variational analysis and generalized differentiation (including the fundamental extremal principle, major calculus rules, and subdifferentiation of optimal value functions) needed for these developments is presented here with complete proofs. The variational approach described here definitely has strong prospects for further developments. Let us briefly discuss some open questions in this direction.

  • The major difference between the optimistic model (7.7.4) and the pessimistic model (7.7.5) in bilevel programming is that the latter invokes a marginal/optimal value function of supremum type instead of the infimum type in (7.7.4). Subdifferentiation of supremum marginal functions is more involved than that of the infimum type. Some results in this vein for problems with Lipschitzian data can be distilled from the recent papers [31, 32, 38], while their implementation in the framework of pessimistic bilevel programs remains a challenging issue.

  • The given proof of the necessary optimality conditions for bilevel programs in Theorem 7.8.3 requires an upper estimate of 
$$\partial (-\vartheta )(\bar {x})$$
, which cannot be directly derived from that for 
$$\partial \vartheta (\bar {x})$$
since 
$$\partial (-\vartheta )(\bar {x})\neq -\partial \vartheta (\bar {x})$$
. To obtain such an estimate, we used the subdifferential convexification and the fact that the convexified/Clarke subdifferential of Lipschitz continuous functions possesses the plus-minus symmetry. However, there is a nonconvex subgradient set that is much smaller than Clarke’s one while having this symmetry. It is the symmetric subdifferential 
$$\partial ^0\vartheta (\bar {x}):=\partial \vartheta (\bar {x})\cup (-\partial (-\vartheta )(\bar {x}))$$
, which enjoys full calculus induced by the basic one. Efficient evaluations of 
$$\partial ^0\vartheta (\bar {x})$$
for infimum and supremum marginal functions would lead us to refined optimality conditions for both optimistic and pessimistic models in bilevel optimization.

  • The partial calmness property used in both Theorems 7.8.3 and 7.9.3 seems to be rather restrictive when the lower-level problem is nonlinear with respect to the decision variable. It is a challenging research topic to relax this assumption and to further investigate the uniform weak sharp minimum property and its modifications, which yield partial calmness.

  • One of the possible ways to avoid partial calmness in bilevel programming is as follows. Having the solution map 
$$S\colon \mathbb {R}^n\rightrightarrows \mathbb {R}^m$$
to the lower-level problem, consider the constrained upper-level problem given by


$$\displaystyle \begin{aligned} \mbox{minimize }\;\Psi(x):=\psi\big(x,S(x)\big)\;\mbox{ subject to }\;x\in\Omega , \end{aligned} $$
(7.10.1)
where 
$$\psi \colon \mathbb {R}^n\times \mathbb {R}^m\to \overline {\mathbb {R}}$$
is the cost function on the upper level with the upper-level constraint set 
$$\Omega \subset \mathbb {R}^n$$
, and where the minimization of 
$$\Psi \colon \mathbb {R}^n\rightrightarrows \mathbb {R}$$
is understood with respect to the standard order on 
$$\mathbb {R}$$
. Then (7.10.1) is a problem of set-valued optimization for which various necessary optimality conditions of the coderivative and subdifferential types can be found in [30] and the references therein. Evaluating the coderivatives and subdifferentials of the composition ψ(x, S(x)) in terms of the given lower-level and upper-level data of bilevel programs would lead us to necessary optimality conditions in both optimistic and pessimistic models. There are many open questions arising in efficient realizations of this approach for particular classes of problems in bilevel optimization even with smooth initial data. We refer the reader to the paper by Zemkoho [47] for some recent results and implementations in this direction.

Note that a somewhat related approach to bilevel optimization was developed in [1], where the lower-level problem was replaced by the corresponding KKT system described by a certain generalized equation of the Robinson type [39]. Applying to the latter the necessary optimality conditions for upper-level problems with such constraints made it possible to establish verifiable results for the original nonsmooth problem of bilevel programming.

  • Henrion and Surowiec suggested in [19] a novel approach to derive necessary optimality conditions for optimistic bilevel programs with 
$$\mathcal {C}^2$$
-smooth data and convex lower-level problems. Their approach used a reduction to mathematical programs with equilibrium constraints (MPECs) and allowed them to significantly relax the partial calmness assumption. Furthermore, in this way they obtained new necessary optimality conditions for the bilevel programs under consideration, which are described via the Hessian matrices of the program data.

    A challenging direction of the future research is to develop the approach and results from [19] to nonconvex bilevel programs with nonsmooth data. It would be natural to replace in this way the classical Hessian in necessary optimality conditions by the generalized one (known as the second-order subdifferential) introduced by the author in [27] and then broadly employed in variational analysis and its applications; see, e.g., [30] and the references therein.

  • As has long been realized, problems of bilevel optimization are generally ill-posed, which creates serious computational difficulties for their numerical solution; see, e.g., [5, 6, 47] for more details and discussions. Furthermore, various regularization methods and approximation procedures devised in order to avoid ill-posedness have serious drawbacks of their own and often end up with approximate solutions that may be rather far from optimal ones. Thus it seems appealing to deal with ill-posed bilevel programs as they are and to develop numerical algorithms based on the obtained necessary optimality conditions. Some results in this vein are presented in [47], involving necessary optimality conditions of the type discussed here, while much more work remains to be done in this very important direction with practical applications.

Acknowledgements

This research was partly supported by the USA National Science Foundation under grants DMS-1512846 and DMS-1808978, by the USA Air Force Office of Scientific Research under grant 15RT04, and by Australian Research Council Discovery Project DP-190100555.

The author is indebted to an anonymous referee and the editors for their very careful reading of the original presentation and making many useful suggestions and remarks, which resulted in essential improvements of the paper.