9.2 Optimality and Links with MPCC
Optimality conditions lie at the heart of any analysis of optimization problems. In standard bilevel programming, where there are two decision vectors, namely the leader's and the follower's, writing down optimality conditions has been a challenging task, since standard constraint qualifications like the Mangasarian-Fromovitz constraint qualification (MFCQ) fail at every feasible point of the usual bilevel programming problem; see for example Dempe [3], Dempe et al. [5] and Dutta [10]. The question that arises here is whether the simple bilevel programming problem (SBP) faces similar hurdles. Let us now look into this issue. When we are concerned with optimality conditions, one of the key steps is to reformulate the problem as an equivalent single-level one. Assume that the lower-level problem in (SBP) has a finite lower bound and set $\alpha = \inf_{x \in C} g(x)$. Then (SBP) can be equivalently reformulated as
$$ \min f(x) \quad \text{subject to} \quad g(x) \leq \alpha, \; x \in C. $$
Those who are well versed in standard bilevel programming will immediately recognize that the above reformulation is the analogue of the value-function reformulation in the standard case. It is simple to see that Slater's condition fails here: if there were $\hat{x} \in C$ with $g(\hat{x}) < \alpha$, this would contradict the fact that $\alpha = \inf_{x \in C} g(x)$. Since Slater's condition fails, one may doubt whether the KKT conditions hold; one might also hope that they hold under weaker conditions. The KKT conditions for (SBP) are defined to be the KKT conditions for the reformulated problem, which we shall term (r-SBP). The KKT conditions at a point $x$ feasible for (r-SBP) postulate the existence of $\lambda \geq 0$ such that
- 1. $0 \in \partial f(x) + \lambda \partial g(x) + N_C(x)$,
- 2. $\lambda \big( g(x) - \alpha \big) = 0$.

If $x^*$ is a solution of (SBP), then $g(x^*) = \alpha$ and hence (2) holds automatically. Thus if $x^*$ solves (SBP), we have
$$ 0 \in \partial f(x^*) + \lambda \partial g(x^*) + N_C(x^*) $$
as the required KKT condition. For this to be of real use it has to be both necessary and sufficient. Proving sufficiency is not very difficult, while proving necessity needs quite involved qualification conditions; see for example Dempe et al. [6], Dhara and Dutta [9] and Dempe et al. [7].
Since (SBP) is a convex programming problem, we would expect it to have a necessary and sufficient optimality condition whichever way we approach it. However, it was shown in Dempe et al. [6] that this is not the case. Consider $g$ to be differentiable and convex; in that case one can equivalently rewrite (SBP) with the lower-level problem replaced by its own first-order optimality condition. It has been shown in [6] that the necessary optimality condition obtained in this way is not sufficient.
Let us now focus our attention on the case where the lower-level problem is a linear programming problem, i.e. consider the (SBP) given as
$$ \min f(x) \quad \text{subject to} \quad x \in \operatorname*{argmin}_{y} \{ \langle c, y \rangle : Ay = b \}, $$
where, as before, $c \in \mathbb{R}^n$, $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$. For simplicity let us assume that $f$ is differentiable. Assume that
$$ \alpha = \inf \{ \langle c, x \rangle : Ax = b \} $$
is finite. Then we can write down the associated (r-SBP) as
$$ \min f(x) \quad \text{subject to} \quad \langle c, x \rangle \leq \alpha, \; Ax = b. $$
Since (r-SBP) has linear constraints, it is well known that no constraint qualification is required for the KKT conditions to hold; they hold automatically (see Güler [12]).
Let $x^*$ be a solution of (SBP) with the lower-level problem the linear program given above. Then there exist $\lambda \geq 0$ and $\mu \in \mathbb{R}^m$ such that the KKT conditions of (r-SBP) hold at $x^*$, i.e.
$$ \nabla f(x^*) + \lambda c + A^{\top} \mu = 0. \qquad (9.2.2) $$
Let $x^*$ be a feasible point of (SBP) and assume that there exist $\mu \in \mathbb{R}^m$ and $\lambda \geq 0$ such that (9.2.2) holds. Then is $x^*$ a solution of (SBP)? To answer this question it is important to note that $x^*$ is also feasible for (r-SBP), as $\langle c, x^* \rangle = \alpha$. Now we have the following simple steps, which we provide for completeness. Any $x$ feasible for (SBP) solves the lower-level linear program, so $\langle c, x \rangle = \alpha$ and $Ax = b$. These two facts, together with the convexity of $f$, imply that
$$ f(x) - f(x^*) \geq \langle \nabla f(x^*), x - x^* \rangle. $$
Now, as $\langle c, x \rangle = \langle c, x^* \rangle = \alpha$, using (9.2.2) we have
$$ \langle \nabla f(x^*), x - x^* \rangle = -\lambda \langle c, x - x^* \rangle - \langle \mu, A(x - x^*) \rangle = -\langle \mu, A(x - x^*) \rangle. $$
Again, as $Ax = Ax^* = b$, the remaining term vanishes, and we get $f(x) - f(x^*) \geq 0$. Therefore $f(x^*) \leq f(x)$ for every feasible $x$, which implies that $x^*$ is a solution of the (SBP) problem.
In particular, let $f(x) = \frac{1}{2} \|x\|^2$ and $\alpha = \inf \{ \langle c, x \rangle : Ax = b \}$. Then $x^*$ is the minimum-norm solution of the linear programming problem
$$ \min \ \langle c, x \rangle \quad \text{subject to} \quad Ax = b $$
if and only if there exist $\mu \in \mathbb{R}^m$ and $\lambda \geq 0$ such that
$$ x^* + \lambda c + A^{\top} \mu = 0, \qquad Ax^* = b, \qquad \langle c, x^* \rangle = \alpha. $$
So the key idea is to first solve the linear programming problem to find $\alpha$, and then solve the above system of equations to find the minimum-norm solution.
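To make this two-stage idea concrete, here is a small numerical sketch (our own illustration, not taken from the references; the function and variable names are ours). It assumes the lower-level LP has a finite optimal value $\alpha$, computes it with `scipy.optimize.linprog`, and then solves the KKT system of the minimum-norm problem; a least-squares solve is used since the stacked constraint matrix typically has dependent rows.

```python
import numpy as np
from scipy.optimize import linprog

def min_norm_lp_solution(A, b, c):
    """Two-stage sketch: first alpha = min <c,x> s.t. Ax = b, then the
    minimum-norm optimal point via the KKT system
    x + lambda*c + A^T mu = 0,  Ax = b,  <c,x> = alpha."""
    m, n = A.shape
    # Stage 1: the lower-level LP; the variables are free, so say so.
    res = linprog(c, A_eq=A, b_eq=b, bounds=[(None, None)] * n)
    alpha = res.fun
    # Stage 2: stack Ax = b and <c,x> = alpha as Ex = d; the KKT system
    # of min 0.5*||x||^2 s.t. Ex = d reads [[I, E^T], [E, 0]](x; nu) = (0; d).
    E = np.vstack([A, c.reshape(1, -1)])
    d = np.concatenate([b, [alpha]])
    K = np.block([[np.eye(n), E.T], [E, np.zeros((m + 1, m + 1))]])
    rhs = np.concatenate([np.zeros(n), d])
    sol = np.linalg.lstsq(K, rhs, rcond=None)[0]  # lstsq: K can be singular
    return sol[:n]
```

For instance, with $A = (1 \ \ 1)$, $b = (1)$ and $c = (1, 1)^{\top}$ one gets $\alpha = 1$, and the routine returns $(0.5, 0.5)$, the minimum-norm point of the solution set.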
9.2.1 Simple Bilevel Programming Problem and MPCC
This subsection focuses on the relation between the simple bilevel programming problem (9.2.3) and its corresponding MPCC. Here we present some results showing that the (SBP) problem behaves exactly like the standard optimistic bilevel programming problem in this context (see [4] for details). We will show that if the lower-level problem of (SBP) satisfies the Slater condition, then (SBP) is equivalent to its corresponding MPCC; when the Slater condition fails, there are counterexamples in which (SBP) and the related MPCC are not connected in the sense of solutions. The results of this subsection were first presented in the thesis of Pandit [18].
Let us consider the simple bilevel programming (SBP) problem in the form
$$ \min f(x) \quad \text{subject to} \quad x \in \operatorname*{argmin}_{y} \{ h(y) : g_i(y) \leq 0, \ i = 1, \ldots, m \}, \qquad (9.2.3) $$
where $f, h : \mathbb{R}^n \rightarrow \mathbb{R}$ are convex functions and, for each $i \in \{1, \cdots, m\}$, the functions $g_i : \mathbb{R}^n \rightarrow \mathbb{R}$ are also convex. The corresponding MPCC, posed in the variables $(x, \lambda)$, is given by
$$ \min_{x, \lambda} f(x) \quad \text{subject to} \quad 0 \in \partial h(x) + \sum_{i=1}^m \lambda_i \partial g_i(x), \quad \lambda_i \geq 0, \ g_i(x) \leq 0, \ \lambda_i g_i(x) = 0, \ i = 1, \ldots, m. \qquad (9.2.4) $$
The (SBP) problem not only parallels the optimistic bilevel programming problem; in fact, we do not get any better results for the (SBP) problem in spite of the extra convexity properties, which the general bilevel programming problem lacks. Here we present some results showing that the Slater condition is sufficient to establish the equivalence between (SBP) and the simple MPCC problem.
For any $x \in \mathbb{R}^n$ such that $g(x) \leq 0$, let us define the multiplier set
$$ \Lambda(x) = \Big\{ \lambda \in \mathbb{R}^m_+ : 0 \in \partial h(x) + \sum_{i=1}^m \lambda_i \partial g_i(x), \ \lambda_i g_i(x) = 0, \ i = 1, \ldots, m \Big\}. $$
Then $(x, \lambda)$, for any $\lambda \in \Lambda(x)$, is a feasible point of the problem (9.2.4).
We omit the simple proof, which is an application of Theorem 9.2.2 followed by Theorem 9.2.1. Corollary 9.2.3 tells us that the MPCC problem (9.2.4) is an example of a non-convex problem whose local minimizers are also global minimizers, provided the Slater condition holds for the lower-level problem of the (SBP) problem (9.2.3).
In the above results one must have observed the key role played by the Slater condition. However, if the Slater condition fails, a solution of one of the problems (9.2.3) or (9.2.4) need not yield a solution of the other.
9.3 Algorithms for Smooth Data
To the best of our knowledge the first algorithmic analysis was carried out by Solodov [22] in 2007, though the simpler problem of finding the least-norm solution was tackled by Beck and Sabach [1] in 2014, and a strongly convex upper-level objective was then considered by Sabach and Shtern [20] in 2017. In writing an exposition it is always difficult to decide how to present the subject: should one concentrate on the simpler problem first, or proceed historically? After some deliberation we chose to maintain the chronological order of development, so that the significance of the later simplifications becomes clearer.
Let us recall the problem (SBP), given as
$$ \min f(x) \quad \text{subject to} \quad x \in \operatorname*{argmin}_{y \in C} g(y), $$
where $f : \mathbb{R}^n \rightarrow \mathbb{R}$ and $g : \mathbb{R}^n \rightarrow \mathbb{R}$ are smooth convex functions and $C$ is a closed convex set.
We shall present here the key ideas from Solodov [22] in a manner that highlights the difficulties in developing a convergent scheme for (SBP). The key idea in Solodov's approach is penalization. He constructs the function
$$ \varphi_{\varepsilon}(x) = \varepsilon f(x) + g(x), $$
where $\varepsilon$ varies along the iterations. Hence we have a sequence of positive numbers $\{\varepsilon_k\}$ satisfying
$$ \varepsilon_k \downarrow 0 \quad \text{and} \quad \sum_{k=0}^{\infty} \varepsilon_k = +\infty. \qquad (9.3.1) $$
In effect, at each step we minimize a function of the form
$$ \varphi_{\varepsilon_k}(x) = \varepsilon_k f(x) + g(x), $$
with $\varepsilon_k$ satisfying (9.3.1), over the set $C$.
Solodov then applies the projected gradient method and develops a scheme for (SBP). Let us try to understand why such a penalization scheme is interesting for (SBP). We believe the key lies in the reformulated version of (SBP), i.e. (r-SBP), which we write as
$$ \min f(x) \quad \text{subject to} \quad g(x) \leq \alpha, \; x \in C. $$
The usual penalization schemes may not work, as $\alpha$ is involved, but a direct penalization can be carried out and one can write down the following penalty problem:
$$ \min_{x \in C} \ f(x) + \frac{1}{\varepsilon} \big( g(x) - \alpha \big). $$
Now for each value of $\varepsilon$, $\frac{\alpha}{\varepsilon}$ is a fixed quantity; thus it is enough to minimize $f(x) + \frac{1}{\varepsilon} g(x)$. The minimizer of this function will also minimize $\varepsilon f(x) + g(x)$, and thus we can use this format to design algorithms for (SBP). Let us note that this approach remains fundamental to the design of algorithms for (SBP).
The important issue is how to stop the algorithm. Of course, if for some $k$ the iteration stalls, i.e.
$$ x^k = \mbox{proj}_C \big( x^k - \beta_k \nabla \varphi_{\varepsilon_k}(x^k) \big), $$
where $\beta_k = \eta^{m_k} \bar{\beta}$ with $m_k$ as given in the scheme, then it is clear that $x^k$ minimizes $\varphi_{\varepsilon_k}$ over $C$. However, this is not the usual scenario, and hence one needs a stopping criterion for the above algorithm when implementing it in practice. A good criterion may be to set a threshold $\varepsilon_0 > 0$ and stop when, for a given $k$, the progress $\|x^{k+1} - x^k\|$ falls below this threshold. One needs to be cautioned that this algorithm is useful for those cases of (SBP) where the projection onto the set $C$ can be computed easily.
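To fix ideas, here is a minimal sketch of such a penalization scheme in the spirit of [22] (our own rendering; the exact parameter rules in [22] differ in detail). One Armijo-type projected-gradient step is taken on $\varphi_{\varepsilon_k} = \varepsilon_k f + g$ per iteration, $\varepsilon_k$ decays at a non-summable rate, and the method stops when the progress $\|x^{k+1} - x^k\|$ falls below a threshold.

```python
import numpy as np

def algo_sbp_1(f, g, grad_f, grad_g, proj_C, x0,
               eps0=1.0, beta_bar=1.0, eta=0.5, sigma=1e-4,
               tol=1e-8, max_iter=10000):
    """Sketch of a Solodov-style scheme: projected-gradient steps on
    phi_k = eps_k*f + g with eps_k -> 0 and sum(eps_k) = +infinity."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        eps = eps0 / (k + 1.0)              # harmonic decay: non-summable
        phi = lambda z, e=eps: e * f(z) + g(z)
        d = eps * grad_f(x) + grad_g(x)     # gradient of phi_k at x
        beta = beta_bar
        while True:                         # beta_k = eta^{m_k} * beta_bar
            y = proj_C(x - beta * d)
            if phi(y) <= phi(x) - sigma * np.dot(d, x - y) or beta < 1e-14:
                break
            beta *= eta
        if np.linalg.norm(y - x) <= tol:    # practical stopping rule
            return y
        x = y
    return x
```

As cautioned above, this sketch is only practical when `proj_C` is cheap, for example when $C$ is a box, a ball or an affine set.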
Let us now discuss the convergence analysis. One of the key assumptions is that the gradients of $f$ and $g$ are Lipschitz continuous over bounded sets. However, for simplicity of exposition, we shall assume that $\nabla f$ and $\nabla g$ are Lipschitz continuous on $\mathbb{R}^n$ with Lipschitz constant $L > 0$. So our assumption (A) will be as follows. For examples of functions whose gradient is Lipschitz continuous over $\mathbb{R}^n$, see Beck and Sabach [1]. Further, for convex functions, the fact that $\nabla f$ and $\nabla g$ are Lipschitz with Lipschitz constant $L > 0$ is equivalent to the descent-type estimate
$$ f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \| y - x \|^2 \quad \text{(and similarly for } g\text{)} $$
for any $x, y \in \mathbb{R}^n$.
For the convergence analysis Solodov considers the following assumptions, which we mark here as Assumption B. A key feature to observe is that we generate the same iterates if, instead of $\varphi_{\varepsilon_k}$, $k \in \mathbb{N}$, we successively minimize
$$ \tilde{\varphi}_{\varepsilon_k}(x) = \varepsilon_k \big( f(x) - \bar{f} \big) + \big( g(x) - \bar{g} \big), \qquad \bar{f} = \inf_{x \in C} f(x), \quad \bar{g} = \inf_{x \in C} g(x), $$
over the set $C$, since the two functions differ only by a constant.
Of course we will begin by assuming that the (SBP) is solvable and hence the second assumption in Assumption B automatically holds.
Solodov [22] begins his analysis by proving a key lemma which says that the step-size search based on the Armijo-type rule terminates after finitely many trials, thus showing that the algorithm described above is well defined. The following lemma is presented in Solodov [22].
It is important to have a closer look at the last expression in the proof: it covers a limiting case which may not be very likely in an actual implementation.
We shall now look into the issue of convergence. We present the key convergence result and then outline the main steps of the proof, highlighting the points which are hallmarks of the (SBP) problem.
Proof (Outline of the Main Ideas)
The key idea of the proof lies in replacing $\varphi_{\varepsilon_k}$ with $\tilde{\varphi}_{\varepsilon_k}$ for each $k$ and rewriting (Algo-SBP-1) in terms of $\tilde{\varphi}_{\varepsilon_k}$. In such a case the Armijo-type criterion provides a sufficient-decrease estimate at every iteration. Now, by the definition of $\bar{f}$, we have $\bar{f} \leq f(x^k)$ for all $k \in \mathbb{N}$, as $x^k \in C$; also note that $\varepsilon_{k+1} \leq \varepsilon_k$. Therefore, from (9.3.5) one obtains, for any $k \in \mathbb{N}$, an inequality which can be summed from $k = 0$ to $k = \hat{k}$; we urge the reader to verify the resulting inequality themselves. Letting $\hat{k} \rightarrow \infty$, we conclude that the series $\sum_{k=0}^{\infty} \langle \nabla \tilde{\varphi}_{\varepsilon_k}(x^k), x^k - x^{k+1} \rangle$ is convergent, and hence its terms tend to zero:
$$ \langle \nabla \tilde{\varphi}_{\varepsilon_k}(x^k), x^k - x^{k+1} \rangle \rightarrow 0 \quad \text{as } k \rightarrow \infty. $$
From the point of view of convergence analysis it is important that the sequence generated by the algorithm be bounded; in that case an effort should be made to show that any limit point of that sequence lies in the solution set of the problem under consideration. In fact, in the setting of bilevel programming it is slightly tricky to show that $\{x^k\}$ is bounded. It is much easier to show that if $\{x^k\}$ is bounded, then every limit point is a solution of the lower-level problem. This is the approach taken by Solodov [22], and we outline it here; we shall then show that $\{x^k\}$ is bounded and that the accumulation points indeed belong to sol(SBP). Our first step is to show that, for any bounded $\{x^k\}$ generated by (Algo-SBP-1), every limit point is a solution of the lower-level problem. Now from the algorithm we know that $0 < \varepsilon_{k+1} < \varepsilon_k < \varepsilon_0$. Using (9.3.4) and (9.3.6) we conclude that $\|x^k - x^{k+1}\| \rightarrow 0$.
Hence, if $\tilde{x}$ is an accumulation point of $\{x^k\}$, then passing to the limit in the projection step of the algorithm (as $k \rightarrow \infty$) gives
$$ \tilde{x} = \mbox{proj}_C \big( \tilde{x} - \tilde{\beta} \nabla \tilde{\varphi}_{\varepsilon_k}(\tilde{x}) \big), $$
where $\beta_k \rightarrow \tilde{\beta} > 0$ (noting that it is simple to show that $\beta_k$ is bounded), showing that $\tilde{x}$ is a solution of the lower-level problem. But before we establish that $\tilde{x}$ also lies in sol(SBP), we shall show that $\{x^k\}$ is bounded. Using the fact that sol(SBP) is non-empty, closed and bounded, Solodov starts this part of the analysis by considering $\bar{x} \in \mbox{sol(SBP)}$. Convexity of $\varphi_{\varepsilon_k}$, a classical algebraic identity, and the well-known characterization of the projection (see e.g. Hiriart-Urruty and Lemaréchal [14]) applied to the projection step of the algorithm, together with $\beta_k \leq \bar{\beta}$ and (9.3.8), lead from (9.3.9) to the key estimate (9.3.11). We need to focus on the quantity $f(\bar{x}) - f(x^k)$; information about the second term on the right-hand side of (9.3.11) is given by (9.3.6). Now one needs to look into two separate cases formally.
We first deal with Case I. It is shown in Solodov [22] that, using Case I, (9.3.11) gives an estimate of the form $\|x^{k+1} - \bar{x}\|^2 \leq \|x^k - \bar{x}\|^2 + b_k$, where the $b_k \geq 0$ are summable error terms. This shows that $\{\|x^k - \bar{x}\|^2\}$ converges, since if $\{a_k\}$ and $\{b_k\}$ are sequences of non-negative real numbers such that $a_{k+1} \leq a_k + b_k$ and $\sum_{k=1}^{\infty} b_k < +\infty$, then $\{a_k\}$ converges. Hence $\{x^k\}$ is bounded. Solodov [22] then proves that a convergent subsequence $\{x^{k_j}\}$ of $\{x^k\}$ can be found along which $f(x^{k_j}) \rightarrow f(\bar{x})$. Let $x^{k_j} \rightarrow \tilde{x}$ as $j \rightarrow \infty$; hence $f(\tilde{x}) = f(\bar{x})$. We have seen before that $\tilde{x}$ solves the lower-level problem, and thus $\tilde{x} \in \mbox{sol(SBP)}$.
Solodov [22] then shows that if we set $\tilde{x} = \bar{x}$, then we can demonstrate that $x^k \rightarrow \tilde{x}$ as $k \rightarrow \infty$. Indeed, looking at the proof above we can show that $\{\|x^k - \tilde{x}\|^2\}$ converges. Since $x^{k_j} \rightarrow \tilde{x}$, there is a subsequence of $\{\|x^k - \tilde{x}\|^2\}$ that converges to $0$; hence the whole sequence $\{\|x^k - \tilde{x}\|^2\}$ converges to $0$, which implies that $x^k \rightarrow \tilde{x} \in \mbox{sol(SBP)}$. This argument, as one observes, is a crucial one; for details see Solodov [22]. Handling the second case is more complicated, since we have to deal essentially with subsequences. One of the main tricks of this approach is to define, for each $k \in \mathbb{N}$, the quantity
$$ i_k = \max \{ i \leq k : f(\bar{x}) > f(x^i) \}. $$
An important thing to observe is that for a given $k$ one may have $\{ i \leq k : f(\bar{x}) > f(x^i) \} = \emptyset$, in which case $i_k$ cannot be defined. Thus a better way is to let $\hat{k} \in \mathbb{N}$ be the first integer for which the set
$$ \{ i \leq \hat{k} : f(\bar{x}) > f(x^i) \} $$
is non-empty; then for all $k \geq \hat{k}$, $k \in \mathbb{N}$, we can define $i_k$ as above. Thus we shall only consider $k \geq \hat{k}$ from now on. Also, from the description of Case II, we can conclude that $i_k \rightarrow \infty$ as $k \rightarrow \infty$.
The key step now is to show that $\{x^{i_k}\}_{k = \hat{k}}^{\infty}$ is bounded. The function $\psi(x) = \max \{ f(x) - f(\bar{x}), g(x) - \bar{g} \}$ is a convex function, and its level set $\{ x \in C : \psi(x) \leq 0 \} = L_{\psi}(0)$ is precisely sol(SBP), which is assumed to be non-empty and bounded. Further, it is well known that in such a case, for any $q \in \mathbb{R}$, the level set
$$ L_{\psi}(q) = \{ x \in C : \psi(x) \leq q \} $$
is also bounded. Solodov [22] then shows that
$$ \tilde{\varphi}_{\varepsilon_{k+1}}(x^{k+1}) \leq \tilde{\varphi}_{\varepsilon_{k}}(x^{k}) \quad \text{for all } k. $$
This is a key step, since it shows that the sequence $\{\tilde{\varphi}_{\varepsilon_k}(x^k)\}$ is non-increasing and bounded below, which establishes that $\{\tilde{\varphi}_{\varepsilon_k}(x^k)\}$ is a convergent sequence and hence bounded. This in turn shows that the sequence $\{g(x^k) - \bar{g}\}$ is bounded. In the very next step Solodov [22] uses this boundedness of $\{g(x^k) - \bar{g}\}$ in a profitable way. Let $q \geq 0$ be such that $g(x^k) - \bar{g} \leq q$ for all $k \in \mathbb{N}$.
Now from the definition of $i_k$ we have $f(x^{i_k}) < f(\bar{x})$, and thus, combining this with the bound $g(x^{i_k}) - \bar{g} \leq q$, it follows that $x^{i_k} \in L_{\tilde{\varphi}_{\varepsilon_{i_k}}}(q)$ for all $k \geq \hat{k}$. Thus $\{x^{i_k}\}$ is bounded. Now for any $k \geq \hat{k}$, the definition of $i_k$ tells us that
$$ f(\bar{x}) \leq f(x^i) \quad \text{for } i = i_k + 1, \cdots, k, $$
whenever $k > i_k$. Thus from (9.3.11) we get, for $i = i_k + 1, \cdots, k$, a chain of inequalities which, added up, yields
$$ \| x^k - \bar{x} \|^2 \leq \| x^{i_k} - \bar{x} \|^2 + \sum_{i = i_k}^{k-1} b_i \qquad (9.3.14) $$
for all $k$ for which $i_k < k$.
Let us now consider the case where $i_k = k$. In that case we proceed in the following way: for any $k \in \mathbb{N}$, the projection step of the algorithm and the characterization of the projection show, after a short computation, that the distance to $\bar{x}$ does not increase beyond the summable error terms. Observe, however, that for $k$ with $i_k = k$ we trivially have $\| x^k - \bar{x} \|^2 = \| x^{i_k} - \bar{x} \|^2$. Hence from (9.3.14) we conclude that for all $k$ we have
$$ \| x^k - \bar{x} \|^2 \leq \| x^{i_k} - \bar{x} \|^2 + \sum_{i = i_k}^{k-1} b_i. \qquad (9.3.15) $$
Now, since $i_k \rightarrow \infty$ as $k \rightarrow \infty$, the tail sums in (9.3.15) vanish in the limit. Thus, using the boundedness of $\{x^{i_k}\}$, we conclude that $\{x^k\}$ is also bounded. We can immediately conclude that all accumulation points of $\{x^k\}$ lie in the solution set of the lower-level problem. Further, for any accumulation point $x^*$ of $\{x^{i_k}\}$ we have $f(\bar{x}) \geq f(x^*)$.
But this shows that $x^* \in \mbox{sol(SBP)}$. In fact, all accumulation points of $\{x^{i_k}\}$ must be in sol(SBP): if $x^*$ is an accumulation point of $\{x^{i_k}\}$, then there exists a convergent subsequence $\{x^{i_{k_j}}\}$ of $\{x^{i_k}\}$ such that $x^{i_{k_j}} \rightarrow x^*$ as $j \rightarrow \infty$. Further, from the definition of $i_k$ we have $f(x^{i_{k_j}}) < f(\bar{x})$ for every $j$; thus, as $j \rightarrow \infty$, we have
$$ f(x^*) \leq f(\bar{x}). \qquad (9.3.16) $$
Since (9.3.16) holds for every convergent subsequence of $\{x^{i_k}\}$, this shows that every accumulation point of $\{x^{i_k}\}$ lies in sol(SBP).
Now for each $k$, Solodov [22] defines $\bar{x}^k$ as the point of sol(SBP) nearest to $x^k$, i.e. $\bar{x}^k = \mbox{proj}_{\mbox{sol(SBP)}}(x^k)$. Using (9.3.15) with $\bar{x} = \bar{x}^k$, we obtain, as $k \rightarrow \infty$, that $d^2(x^k, \mbox{sol(SBP)}) \rightarrow 0$, and hence $d(x^k, \mbox{sol(SBP)}) \rightarrow 0$. □
We shall now discuss a work by Beck and Sabach [1], which focuses on what we believe is the most important class of simple bilevel problems; in this class the upper-level objective is always strongly convex. We shall denote such problems by (SBP-1). Thus an (SBP-1) problem is given as
$$ \min \phi(x) \quad \text{subject to} \quad x \in \operatorname*{argmin}_{y \in C} g(y), $$
where $\phi : \mathbb{R}^n \rightarrow \mathbb{R}$ is strongly convex with modulus $\rho > 0$ and differentiable, while $g : \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex and differentiable function with $\nabla g$ Lipschitz on $\mathbb{R}^n$ with Lipschitz rank (or Lipschitz constant) $L > 0$. When $\phi(x) = \frac{1}{2} \|x\|^2$, we are talking about the minimum-norm problem. Thus (SBP-1) actually encompasses a large class of important problems.
Beck and Sabach [1] develop the minimal norm gradient algorithm, which they present in two different versions. In the first version the Lipschitz constant of $\nabla g$ is known, while in the second version it is assumed unknown, and thus an Armijo-type criterion is used. In Solodov [22], by comparison, an Armijo-type criterion is used even though the Lipschitz constant was known. Also note that in Beck and Sabach [1] the Lipschitz-gradient assumption is only on $g$ and not on $\phi$. In fact, Beck and Sabach [1] mention that a problem like (SBP-1) can be reformulated as
$$ \min \phi(x) \quad \text{subject to} \quad g(x) \leq g^*, \; x \in C, $$
where $g^* = \inf_{x \in C} g$. We are already familiar with this reformulation; they further mention that, as the Slater condition fails and $g^*$ may not be known, the above problem may not be suitable for developing numerical schemes to solve (SBP-1). Even if one could compute $g^*$ exactly, the failure of Slater's condition would lead to numerical problems. As a result they take a completely different path, basing their approach on the idea of cutting planes. However, we would like to mention that very recently, in the thesis of Pandit [18], the above reformulation has been slightly modified to generate a sequence of problems whose solutions converge to that of (SBP) or (SBP-1), depending on the choice of the problem. The algorithm presented in the thesis of Pandit [18] has been a collaborative effort with Dutta and Rao and has also been presented in the preprint [19]. The approach in Pandit et al. [19] is to develop an algorithm in which no assumption is made on the upper-level objective other than convexity (in particular, it need not be strongly convex), while the lower-level objective is differentiable but need not have a Lipschitz gradient. The algorithm in Pandit [18], or [19], is very simple and relies only on an Armijo-type decrease criterion. Convergence analysis has been carried out, and numerical experiments have given encouraging results.
Let us outline the work of Beck and Sabach [1], beginning with some key tools from [1]. Let
$$ x^* = \operatorname*{argmin}_{x \in \mathbb{R}^n} \phi(x), $$
and call $x^*$ the prox-center of $\phi$. Without loss of generality we may assume that $\phi(x^*) = 0$. Then strong convexity of $\phi$ tells us that
$$ \phi(x) \geq \frac{\rho}{2} \| x - x^* \|^2 \quad \text{for all } x \in \mathbb{R}^n. $$
A key tool in the study of Beck and Sabach [1] is the Bregman distance. This is a non-Euclidean distance which makes the analysis a bit more involved. Let $h : \mathbb{R}^n \rightarrow \mathbb{R}$ be a differentiable, strictly convex function. Then the Bregman distance generated by $h$ is given as
$$ D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle. $$
It is clear that $D_h(x, y) \geq 0$ for all $x, y \in \mathbb{R}^n$, and $D_h(x, y) = 0$ if and only if $x = y$. Further, if $h$ is strongly convex with modulus $\sigma > 0$, then
$$ D_h(x, y) \geq \frac{\sigma}{2} \| x - y \|^2. $$
A key fact associated with a Bregman distance is the well-known three-point relation, one common form of which reads: for any $x, y, z \in \mathbb{R}^n$,
$$ D_h(x, z) = D_h(x, y) + D_h(y, z) + \langle \nabla h(y) - \nabla h(z), x - y \rangle. $$
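As a quick numerical illustration (our own, with a convenient choice of $h$), the following snippet computes $D_h$ for the strongly convex function $h(x) = \frac{1}{2}\|x\|^2 + \frac{1}{4}\|x\|^4$ and checks the strong-convexity bound and the three-point relation stated above.

```python
import numpy as np

h = lambda x: 0.5 * x @ x + 0.25 * (x @ x) ** 2   # strongly convex, modulus 1
grad_h = lambda x: x + (x @ x) * x

def bregman(x, y):
    """D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - grad_h(y) @ (x - y)

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 4))
assert bregman(x, y) >= 0.5 * np.linalg.norm(x - y) ** 2 - 1e-12
lhs = bregman(x, z)
rhs = bregman(x, y) + bregman(y, z) + (grad_h(y) - grad_h(z)) @ (x - y)
assert abs(lhs - rhs) < 1e-10                     # three-point relation
```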
The following two vector-valued mappings play a central role in [1]. First comes the projected gradient mapping, or proj-grad mapping: for $M > 0$,
$$ T_M(x) = \mbox{proj}_C \Big( x - \frac{1}{M} \nabla g(x) \Big). $$
The gradient map is then defined as
$$ G_M(x) = M \big( x - T_M(x) \big). $$
If $C = \mathbb{R}^n$, then $G_M(x) = \nabla g(x)$, which is thus the source of its name. Further, $G_M(x^*) = 0$ implies that $x^*$ solves the lower-level problem in (SBP-1). It was also shown in [1] that the function $M \mapsto \| G_M(x) \|$ is monotonically non-decreasing over $(0, \infty)$.
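A small sketch of these mappings (our own notation, mirroring the description above):

```python
import numpy as np

def grad_map(x, M, grad_g, proj_C):
    """G_M(x) = M*(x - proj_C(x - grad_g(x)/M)); with C = R^n this reduces
    to grad_g(x), and G_M(x*) = 0 exactly when x* solves min_C g."""
    return M * (x - proj_C(x - grad_g(x) / M))

# Example: g(x) = 0.5*||x||^2 over the box C = [1, 2]^n.
grad_g = lambda x: x
proj_C = lambda x: np.clip(x, 1.0, 2.0)
x_star = np.ones(3)                            # argmin of g over C
print(grad_map(x_star, 5.0, grad_g, proj_C))   # ~ zero vector
```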
The idea of their algorithm is based on the idea of cutting planes in convex optimization. If $x^* \in \mathbb{R}^n$ is a minimizer of a convex differentiable function $\phi : \mathbb{R}^n \rightarrow \mathbb{R}$, then $\nabla \phi(x^*) = 0$, and for any $x \in \mathbb{R}^n$, by monotonicity of the gradient, we have
$$ \langle \nabla \phi(x), x - x^* \rangle = \langle \nabla \phi(x) - \nabla \phi(x^*), x - x^* \rangle \geq 0. $$
Thus, if $\mbox{sol}(\phi, \mathbb{R}^n)$ denotes the solution set of the problem of minimizing $\phi$ over $\mathbb{R}^n$, then
$$ \mbox{sol}(\phi, \mathbb{R}^n) \subseteq \{ z \in \mathbb{R}^n : \langle \nabla \phi(x), z - x \rangle \leq 0 \}. $$
Therefore, given $x \in \mathbb{R}^n$, we can immediately eliminate the strict half-space $\{ z \in \mathbb{R}^n : \langle \nabla \phi(x), z - x \rangle > 0 \}$; the cut at $x$ is given by the hyperplane $\{ z \in \mathbb{R}^n : \langle \nabla \phi(x), z - x \rangle = 0 \}$. This is a very important approach in convex optimization; see for example the fundamental paper by Kelley [15]. Following the idea of the cutting plane method, Beck and Sabach introduce certain sets $Q_{L, \beta, x}$ built from such cuts with a deeper offset, whose motivation will become clear as we go along. These are what they refer to as deep cuts, and they play a fundamental role in their algorithm, which we describe below. The first version of the algorithm assumes that the Lipschitz constant of $\nabla g$ is known; as before, let us call this Lipschitz constant $L > 0$. Here is the algorithm.
The reason for setting $\beta = \frac{4}{3}$ when $C \neq \mathbb{R}^n$, i.e. $C \subset \mathbb{R}^n$, is that $\mbox{sol}(g, C) \subset Q_{L, \frac{4}{3}, x}$, while if $C = \mathbb{R}^n$ one takes $\beta = 1$, as $\mbox{sol}(g, \mathbb{R}^n) \subset Q_{L, 1, x}$. These facts are proved in Lemma 2.3 and Lemma 2.4 in Beck and Sabach [1]. The argument runs as follows: Lemma 2.3 in [1] provides an inequality involving the gradient map $G_L$; taking $x^* \in \mbox{sol}(g, C)$, so that $G_L(x^*) = 0$, this inequality yields $x^* \in Q_{L, \frac{4}{3}, x}$. Moreover, since $\nabla g$ is Lipschitz continuous with constant $L$, the descent-type estimate from [1] applies, and putting $y = x^*$ with $x^* \in \mbox{sol}(g, C)$ one arrives at the same conclusion. This shows that $\mbox{sol}(g, C) \subset Q_{L, \frac{4}{3}, x}$.
If we do not have any knowledge of the Lipschitz constant of $\nabla g$, then an Armijo-type criterion is used in the minimal norm gradient algorithm, which we state here. We shall now show that $\mbox{sol}(g, C) \subseteq Q_{\bar{L}, 2, x}$. It has been demonstrated in Lemma 2.5 in Beck and Sabach [1] that if the constant $\bar{L}$ produced by the Armijo-type search satisfies the corresponding descent inequality, then for any $x^* \in \mbox{sol}(g, C)$ the defining inequality of $Q_{\bar{L}, 2, x}$ holds, and thus $\mbox{sol}(g, C) \subseteq Q_{\bar{L}, 2, x}$. Beck and Sabach [1] also show that this key inclusion holds whichever version of the minimal norm gradient algorithm is used. We will now state the result of Beck and Sabach [1] describing the convergence of the minimal norm gradient algorithm.
Keeping in view our chronological approach to the development of algorithms for simple bilevel programming, we shall end this section by discussing the work of Sabach and Shtern [20], which appeared in 2017 and is, in some sense, a generalization of Beck and Sabach [1]. The problem they consider is the following:
$$ \min \phi(x) \quad \text{subject to} \quad x \in \operatorname*{argmin}_{y \in \mathbb{R}^n} \big( f(y) + g(y) \big), $$
where $\phi : \mathbb{R}^n \rightarrow \mathbb{R}$ is a strongly convex function, $f$ is continuously differentiable and $g$ is an extended-valued, proper, lower semicontinuous convex function. We shall call this problem (SBP-2). Strictly speaking, we should consider this problem under the category of non-smooth simple bilevel problems, since $g$ can be non-smooth; however, we discuss it in this section since we view (SBP-2) as a generalization of (SBP-1). Observe that if we choose $g = \delta_C$, where $\delta_C$ is the indicator function of a closed convex set $C$, then (SBP-2) reduces to (SBP-1). Those readers who are aware of the algorithms of convex optimization might guess that a proximal-gradient-type approach needs to be used for solving (SBP-2); indeed, Sabach and Shtern [20] develop an algorithm based on the prox-mapping. It is important to realize that, since $\phi$ is strongly convex, (SBP-2) has a unique minimizer, which we denote by $x^*$. In fact the method developed in [20], called BiG-SAM, is simpler and cheaper than the minimal norm gradient (MNG) method of Beck and Sabach [1]: in BiG-SAM one needs to compute $\nabla \phi$ only, while in the MNG method one needs to compute $\nabla \phi$ and also to minimize $\phi$ to compute the prox-center.
In Sabach and Shtern [20] the following assumptions are made:
- (1) $f$ has a Lipschitz gradient, i.e. $\nabla f$ is Lipschitz on $\mathbb{R}^n$ with Lipschitz constant $L_f$;
- (2) the solution set of the lower-level problem, which we denote by $S$, is non-empty.
One of the key approaches to solving the lower-level problem in (SBP-2) is the proximal gradient approach, which depends on the prox-mapping. Given a proper, lower semicontinuous convex function $h : \mathbb{R}^n \rightarrow \mathbb{R} \cup \{+\infty\}$, the prox-mapping is given as
$$ \mbox{prox}_h(x) = \operatorname*{argmin}_{y \in \mathbb{R}^n} \Big\{ h(y) + \frac{1}{2} \| y - x \|^2 \Big\}. $$
The proximal gradient algorithm generates the iterates for the lower-level problem using the scheme
$$ x^{k+1} = \mbox{prox}_{t g} \big( x^k - t \nabla f(x^k) \big), $$
where $t > 0$ is the step-size. If we set
$$ P_t(x) = \mbox{prox}_{t g} \big( x - t \nabla f(x) \big), $$
then $x$ is a solution of the lower-level problem if and only if $x = P_t(x)$. Further, $P_t$ is non-expansive for suitable step-sizes, e.g. $t \in (0, 2/L_f]$ (see for example Lemma 1 in [20]). In [20] it is also assumed that the strongly convex upper-level objective $\phi$ has modulus $\rho > 0$ and a Lipschitz gradient on $\mathbb{R}^n$, with Lipschitz constant $L_{\phi}$. We are now in a position to state the BiG-SAM method.
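Before stating the formal convergence result, here is a compact sketch of the BiG-SAM iteration as we understand it from the description above (our rendering; the step sizes and the averaging schedule below are standard admissible choices, not necessarily those used in [20]).

```python
import numpy as np

def big_sam(grad_f, prox_g, grad_phi, x0, L_f, L_phi, rho, n_iter=1000):
    """BiG-SAM sketch: a prox-grad step on the lower-level objective f + g,
    a gradient step on the upper-level objective phi, then a sequential
    average with weights alpha_k -> 0, sum(alpha_k) = +infinity.
    prox_g(v, t) must return prox_{t*g}(v)."""
    x = np.asarray(x0, dtype=float)
    t = 1.0 / L_f                 # lower-level step size
    s = 2.0 / (L_phi + rho)       # upper-level step size
    for k in range(1, n_iter + 1):
        y = prox_g(x - t * grad_f(x), t)   # step towards argmin(f + g)
        z = x - s * grad_phi(x)            # step towards argmin(phi)
        alpha = 1.0 / (k + 1)              # sequential-averaging weight
        x = alpha * z + (1 - alpha) * y
    return x
```

With $g = \delta_C$, `prox_g` is simply the projection onto $C$, and the sketch specializes to a method for (SBP-1).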
The key result regarding the BiG-SAM method is given in Proposition 5 in [20], which says that the iterates $\{x^k\}$ generated by this method converge to the point $\tilde{x} \in S$ satisfying the variational inequality
$$ \langle \nabla \phi(\tilde{x}), x - \tilde{x} \rangle \geq 0 \quad \text{for all } x \in S, $$
showing that $\tilde{x}$ solves (SBP-2) and hence $\tilde{x} = x^*$. Here BiG-SAM stands for Bilevel Gradient Sequential Averaging Method, where the sequential averaging is reflected in (iii) of Step 1 in the above algorithm.
Very recently a modification of the BiG-SAM method was presented in [21].
9.4 Algorithms for Non-smooth Data
There are not many algorithmic analyses of the (SBP) problem in the literature, and for the non-smooth (SBP) problem the situation is of course no better. The bundle method for the non-smooth (SBP) problem presented by Solodov [23] is one of the initial works in this direction. He considered an (SBP) problem in which both the upper- and lower-level objective functions are non-smooth and the lower-level problem is unconstrained. The (SBP) problem considered by Solodov in [23] is given as
$$ \min f(x) \quad \text{subject to} \quad x \in \operatorname*{argmin}_{y \in \mathbb{R}^n} g(y), \qquad (9.4.1) $$
where $f, g : \mathbb{R}^n \rightarrow \mathbb{R}$ are non-smooth convex functions. The main idea behind the algorithm presented in [23] is to minimize the penalized function
$$ \varphi_{\varepsilon}(x) = \varepsilon f(x) + g(x) $$
over $\mathbb{R}^n$.
As $\varepsilon \downarrow 0$, the solutions of the problem
$$ \min_{x \in \mathbb{R}^n} \varphi_{\varepsilon}(x) \qquad (9.4.2) $$
tend to the solution of the (SBP) problem (9.4.1). But it is almost impossible to calculate the solution of (9.4.2) exactly. To avoid this, Solodov's idea was to make one step in a descent direction of the function $\varphi_{\varepsilon_k}$ from the $k$-th iterate $x^k$, then update $\varepsilon_k$ to $\varepsilon_{k+1}$ and repeat the step. To realize the descent step, Solodov considered the bundle method for unconstrained non-smooth optimization (see [16]). We will also need the $\varepsilon$-subdifferential to discuss the algorithm, which we present here.
The $\varepsilon$-subdifferential of $f$, $\partial_{\varepsilon} f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$, is defined as
$$ \partial_{\varepsilon} f(x) = \{ \xi \in \mathbb{R}^n : f(y) \geq f(x) + \langle \xi, y - x \rangle - \varepsilon \ \ \text{for all } y \in \mathbb{R}^n \}. $$
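For a quick sanity check of this definition, take $f(x) = \frac{1}{2} x^2$ on $\mathbb{R}$. A direct computation shows that $\xi \in \partial_{\varepsilon} f(x)$ if and only if $\frac{1}{2} (\xi - x)^2 \leq \varepsilon$, so that $\partial_{\varepsilon} f(x) = [x - \sqrt{2\varepsilon},\, x + \sqrt{2\varepsilon}]$; the $\varepsilon$-subdifferential thus inflates the singleton $\{\nabla f(x)\}$ to an interval, and it is precisely this slack that makes inexact bundle-type information usable.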
The idea of a bundle method is to store information from the past in a bundle. At the $k$-th step, let $x^k$ be the current iterate and let $y^i$ ($i = 1, 2, \cdots, l-1$) be the points generated by the algorithm so far. Note that not all the $y^i$'s need be accepted as iterates, which means there may be some $y^i$'s that are not included in the sequence $\{x^k\}$. The information about the points $y^i$ ($i = 1, 2, \cdots, l-1$), such as subgradients and some quantities required in this algorithm, is stored in the bundle. Note that $\{x^n\}_{n=1}^{k}$ is actually a subsequence of the finite sequence $\{y^i\}_{i=1}^{l-1}$. Suppose there have been $l$ iterations before we choose to modify $x^k$ and $\varepsilon_k$; then $k(l)$ is the index of the last iteration and $k = k(l)$. Let $\xi^i \in \partial f(y^i)$ and $\zeta^i \in \partial g(y^i)$. Then $\varepsilon_k \xi^i + \zeta^i \in \partial \varphi_{\varepsilon_k}(y^i)$. In the bundle method a linearisation of the function $\varphi_{\varepsilon_k}$ is used (see for example [16]): $\varphi_{\varepsilon_k}(x)$ can be approximated by the function $x \mapsto \varphi_{\varepsilon_k}(x^k) + \max_{s \in \partial \varphi_{\varepsilon_k}(x^k)} \langle s, x - x^k \rangle$.
But this requires knowledge of the entire subdifferential of $\varphi_{\varepsilon_k}$ at $x^k$, which is again a tough job. So Solodov uses the following cutting-planes approximation $\psi_l$ of the function $\varphi_{\varepsilon_k}$:
$$ \psi_l(x) = \max_{1 \leq i \leq l-1} \big\{ \varphi_{\varepsilon_k}(y^i) + \langle \varepsilon_k \xi^i + \zeta^i, x - y^i \rangle \big\}. \qquad (9.4.3) $$
This is a linearisation centred at the points $y^i$. To reduce the storage requirements, the following equivalent expression, centred at $x^k$, is constructed:
$$ \psi_l(x) = \max_{1 \leq i \leq l-1} \big\{ \varphi_{\varepsilon_k}(x^k) - \big( \varepsilon_k e^{k,i}_f + e^{k,i}_g \big) + \langle \varepsilon_k \xi^i + \zeta^i, x - x^k \rangle \big\}, $$
where
$$ e^{k,i}_f = f(x^k) - f(y^i) - \langle \xi^i, x^k - y^i \rangle \qquad (9.4.4) $$
and
$$ e^{k,i}_g = g(x^k) - g(y^i) - \langle \zeta^i, x^k - y^i \rangle. \qquad (9.4.5) $$
Clearly, by convexity of $f$ and $g$, $e^{k,i}_f, e^{k,i}_g \geq 0$. Now, as $\xi^i \in \partial f(y^i)$, for any $x \in \mathbb{R}^n$ we have
$$ f(x) \geq f(y^i) + \langle \xi^i, x - y^i \rangle = f(x^k) - e^{k,i}_f + \langle \xi^i, x - x^k \rangle. $$
This implies that $\xi^i \in \partial_{e^{k,i}_f} f(x^k)$. Similarly, $\zeta^i \in \partial_{e^{k,i}_g} g(x^k)$. Then
$$ \varepsilon_k \xi^i + \zeta^i \in \partial_{\varepsilon_k e^{k,i}_f + e^{k,i}_g} \varphi_{\varepsilon_k}(x^k). $$
Choose $\mu_l > 0$; then the minimizer of the convex optimization problem
$$ \min_{x \in \mathbb{R}^n} \ \psi_l(x) + \frac{\mu_l}{2} \| x - x^k \|^2 \qquad (9.4.6) $$
can be taken as the next candidate point $y^l$. Note that (9.4.6) has a unique solution, since the objective function is strongly convex. If $\varphi_{\varepsilon_k}(y^l)$ is sufficiently small compared with $\varphi_{\varepsilon_k}(x^k)$, then $y^l$ is considered a serious (or acceptable) step and $x^{k+1} := y^l$. More precisely, for a parameter $\gamma \in (0, 1)$, $y^l$ is declared a serious step if the decrease criterion
$$ \varphi_{\varepsilon_k}(y^l) \leq \varphi_{\varepsilon_k}(x^k) - \gamma \, \delta_l \qquad (9.4.7) $$
holds, where $\delta_l \geq 0$ is the decrease predicted by the model, built from the aggregate error $\hat{\varepsilon}^{k,l}$ and the proximal term $\mu_l \| y^l - x^k \|^2$.
We can see that $\delta_l$ captures the difference in the values of $\varphi_{\varepsilon_k}$ at $x^k$ and $y^l$, and also takes the distance between $x^k$ and $y^l$ into account. Note that $\delta_l \geq 0$, as $\hat{\varepsilon}^{k,l} \geq 0$, which is shown in Lemma 2.1 of [23]. If $y^l$ does not satisfy condition (9.4.7), then $y^l$ is considered a null step and is not included in the sequence of iterates. But this null step then contributes to the construction of a better approximation $\psi_{l+1}$ of the function $\varphi_{\varepsilon_k}$ via (9.4.3). Then $y^{l+1}$ is generated using (9.4.6) with $\mu_{l+1} > 0$, and it can again be a serious step or a null step. The process is continued until $x^{k+1}$ is generated.
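For intuition, the proximal subproblem (9.4.6) is just a small quadratic program over the cutting-plane model. Here is a sketch using `cvxpy` (our own illustration, with the bundle stored as triples $(y^i, \varphi_{\varepsilon_k}(y^i), s^i)$, $s^i \in \partial \varphi_{\varepsilon_k}(y^i)$; this is not Solodov's implementation).

```python
import numpy as np
import cvxpy as cp

def candidate_point(xk, bundle, mu):
    """Solve (9.4.6): min_x max_i [phi_i + <s_i, x - y_i>] + (mu/2)||x - xk||^2,
    where bundle = [(y_i, phi_i, s_i), ...] collects past linearisations."""
    x = cp.Variable(xk.size)
    r = cp.Variable()                  # epigraph variable for the max
    cons = [r >= phi_i + s_i @ (x - y_i) for (y_i, phi_i, s_i) in bundle]
    cp.Problem(cp.Minimize(r + 0.5 * mu * cp.sum_squares(x - xk)), cons).solve()
    return np.asarray(x.value)
```

A serious step then accepts $y^l$ when the actual decrease of $\varphi_{\varepsilon_k}$ is at least the fraction $\gamma$ of the predicted decrease, as in (9.4.7); otherwise the new triple is simply appended to the bundle and the subproblem is solved again.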
Let us now discuss a little about bundles, where the information about the candidate points $y^l$ (whether serious or null steps) is stored. In [23], Solodov describes an efficient way to store all the information needed for the bundle method algorithm while ensuring that the number of elements in the bundles stays below a pre-decided bound; managing the bundle this way keeps the algorithm computationally more suitable and efficient. Solodov [23] divides the information at any iteration $l$ into two parts: the oracle bundle, denoted $B^{oracle}_l$, and the aggregate bundle, $B^{agg}_l$. Both bundles contain approximate subgradients of the objective functions $f$ and $g$ at $x^k$: $B^{oracle}_l$ stores tuples $(e^{k,i}_f, e^{k,i}_g, \xi^i, \zeta^i)$, while $B^{agg}_l$ stores aggregate tuples $(\hat{\varepsilon}^{k,j}_f, \hat{\varepsilon}^{k,j}_g, \hat{\xi}^j, \hat{\zeta}^j)$, where $e^{k,i}_f, e^{k,i}_g$ are given by (9.4.4) and (9.4.5) and the aggregate quantities are convex combinations of the stored ones, e.g.
$$ \hat{\xi}^l = \sum_{i \in B^{oracle}_l} \lambda^l_i \, \xi^i + \sum_{i \in B^{agg}_l} \hat{\lambda}^l_i \, \hat{\xi}^i, \qquad \hat{\zeta}^l = \sum_{i \in B^{oracle}_l} \lambda^l_i \, \zeta^i + \sum_{i \in B^{agg}_l} \hat{\lambda}^l_i \, \hat{\zeta}^i, $$
with $\lambda^l_i, \hat{\lambda}^l_i \geq 0$ and $\sum_{i \in B^{oracle}_l} \lambda^l_i + \sum_{i \in B^{agg}_l} \hat{\lambda}^l_i = 1$ (and analogously for $\hat{\varepsilon}^{k,l}_f$ and $\hat{\varepsilon}^{k,l}_g$).
Note that $\xi^i \in \partial f(y^i)$ and $\zeta^i \in \partial g(y^i)$, so we can see that the bundle $B^{oracle}_l$ stores subgradient information of the objective functions at some candidate points $y^i$. By contrast, there need not exist any candidate point $y^j$, $j < l$, such that $\hat{\xi}^j \in \partial f(y^j)$ and $\hat{\zeta}^j \in \partial g(y^j)$. Also note that the construction of $\hat{\varepsilon}^{k,l}_f$ and $\hat{\varepsilon}^{k,l}_g$ justifies the term aggregate bundle. Even though the bundles $B^{oracle}_l$ and $B^{agg}_l$ do not store all the subgradients at all the candidate points, a cutting-plane approximation of $\varphi_{\varepsilon_k}$ can still be constructed from this bundle information; it is given by (9.4.10). As we can see, the two approximations (9.4.3) and (9.4.10) of the penalized function $\varphi_{\varepsilon_k}$ are different. The connection between these two approximations is well established, in terms of the solution of the problem given in (9.4.6) and the function (9.4.10), in the following lemma (Lemma 2.1, [23]).
Here note that $\lambda^l$ and $\hat{\lambda}^l$ in Lemma 9.4.1 are obtained as a by-product of the subgradient calculation of $\psi_l$ while solving the strongly convex optimization problem (9.4.6). We have already mentioned that Solodov [23] developed two different bundles to keep the bundle size under a fixed limit; here we discuss how that is done. Let the maximum size that the bundles may attain together be $|B|_{max}$. At the $l$-th step, if the combined bundle has reached this bound, then at least two elements are removed from $B^{oracle}_{l+1} \cup B^{agg}_{l+1}$, the aggregate information $(\hat{\varepsilon}^{k,l}_{f}, \hat{\varepsilon}^{k,l}_{g}, \hat{\xi}^l, \hat{\zeta}^l)$ is included in $B^{agg}_{l}$, which is then renamed $B^{agg}_{l+1}$, and $(e^{k,l}_{f}, e^{k,l}_{g}, \xi^l, \zeta^l)$ is included in $B^{oracle}_{l}$, which is then called $B^{oracle}_{l+1}$. Using the bundle information, Solodov then constructs relations which satisfy all the statements of Lemma 9.4.1.
Now we are ready to present the algorithm for the (SBP) problem. Note that a tolerance test on the predicted decrease works as a stopping criterion for the minimization of the function $\psi_k$. The convergence result for this algorithm is presented in the following theorem from [23].
Another algorithm was proposed by Dempe et al. [8] for the non-smooth simple bilevel programming problem. In [8] we present an algorithm for the (SBP) problem in a more general scenario than those discussed so far. Note that in [23] the (SBP) problem considered has an unconstrained lower-level problem, whereas the (SBP) problem we consider, given as
$$ \min f(x) \quad \text{subject to} \quad x \in \operatorname*{argmin}_{y \in C} g(y), $$
has a constrained lower-level problem. Here $f, g : \mathbb{R}^n \rightarrow \mathbb{R}$ are continuous convex functions and $C \subseteq \mathbb{R}^n$ is a closed convex set. Our algorithm is inspired by the penalization approach of Solodov [22], but it differs from Solodov's approach in two respects. First, we do not make any differentiability assumption on the upper- and lower-level objective functions, which naturally frees us from the Lipschitz-continuity assumption on the gradients of the objective functions made by Solodov [22]. Second, our algorithm is based on the proximal point method, whereas Solodov's algorithm is based on the projected gradient method. Our algorithm is also inspired by Cabot [2] and Facchinei et al. [11], but is again different in the sense that it deals with a more general form of (SBP). The only assumption we make is that the solution set of the (SBP) problem is non-empty and bounded. This is done to ensure the boundedness of the iterates, which finally leads to the existence of an accumulation point and hence a solution of the (SBP) problem.
In the proximal point algorithm that we present, we consider an approximate solution rather than the exact solution. For this purpose we need the $\varepsilon$-subdifferential and the $\varepsilon$-normal set for $\varepsilon > 0$. We have already presented the $\varepsilon$-subdifferential while discussing the last algorithm; here we present the $\varepsilon$-normal set. The $\varepsilon$-normal set of $C \subseteq \mathbb{R}^n$ at $x \in C$ is defined as
$$ N^{\varepsilon}_C(x) = \{ v \in \mathbb{R}^n : \langle v, y - x \rangle \leq \varepsilon \ \ \text{for all } y \in C \}. $$
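As a one-dimensional illustration, for $C = [0, 1]$ and $x = 0$ the definition gives $N^{\varepsilon}_C(0) = \{ v : v y \leq \varepsilon \ \forall y \in [0, 1] \} = (-\infty, \varepsilon]$, which enlarges the usual normal cone $N_C(0) = (-\infty, 0]$ by the tolerance $\varepsilon$.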
Consider the following sequence of penalized functions, constructed as in Solodov's approach, $\varphi_{\varepsilon_k}(x) = \varepsilon_k f(x) + g(x)$. We aim to minimize this function approximately over the constraint set $C$ to obtain the iterate at the $(k+1)$-st step. Let $\eta_k > 0$ and $\lambda_k > 0$; then the $(k+1)$-st iterate $x^{k+1}$ is obtained as an approximate minimizer, up to the tolerance $\eta_k$, of the proximal regularization of $\varphi_{\varepsilon_k}$ with parameter $\lambda_k$ centred at $x^k$. Writing down the optimality condition for this approximate minimization problem gives the key step in the design of the algorithm (for details see [8]); it is expressed in terms of approximate subdifferentials and the approximate normal set of $C$ with tolerances $\eta^1_k$ and $\eta^2_k$, where $\eta_k^1 + \eta_k^2 \leq \eta_k$. Now we present the algorithm formally.
For the convergence analysis of this algorithm we make suitable assumptions on the sequences $\{\varepsilon_k\}$, $\{\lambda_k\}$ and $\{\eta_k\}$; the convergence result of this algorithm is then presented in the next theorem.
The reference [13] in the list of references was first brought to our notice by M. V. Solodov. In [13], the ε-subgradient is used to develop an inexact algorithmic scheme for the (SBP) problem. However, the scheme and convergence analysis developed in [13] are very different from those in [8]. It is important to note that, in contrast to the convergence analysis in [8], the scheme in [13] needs more stringent conditions. Also, an additional assumption is made on the subgradient of the lower-level objective (see Proposition 2, [13]). The authors of [13] also consider the case where the upper-level objective is smooth and has a Lipschitz gradient.