There is a multifaceted interaction between optimization and Monte Carlo simulation, which is illustrated by the rich connections between this and other chapters of this book. Monte Carlo methods may be used when dealing with stochastic optimization models, i.e., when some of the parameters characterizing a decision problem are uncertain. This is clearly relevant in finance, and there is an array of approaches depending on the problem structure and on our purpose:
A further use of Monte Carlo methods is to check the out-of-sample robustness of a solution. Within an optimization model, we may only afford a limited representation of uncertainty; the in-sample value of the objective function may be a misleading indicator of quality. Thus, it may be useful to verify the performance of the selected solution in a much wider set of scenarios.
Stochastic optimization in all of its forms does not tell the whole story about the interaction between optimization and Monte Carlo methods. Stochastic search strategies are also used to solve deterministic nonconvex optimization problems. There is an array of useful strategies, including some fancy ones mimicking biological systems, such as genetic algorithms or particle swarm optimization. Such methods are typically derivative-free, i.e., unlike classical nonlinear programming methods, they do not rely on information provided by the gradient or the Hessian matrix of the objective function. This makes it possible to use them when the objective is not guaranteed to be differentiable or even continuous. Furthermore, by blending exploitation with exploration behavior, they may be able to overcome the local optimality issues that make nonconvex optimization difficult. The optimization of nonconvex functions may arise, e.g., in model calibration or parameter estimation by maximum likelihood (see Chapter 4), when the underlying model is complicated enough.
As the reader can imagine, there is an enormous range of models and methods involved, which cannot be covered in a single chapter. Nevertheless, we offer here a set of basic examples in order to give a taste of this wide and quite interesting application field. In Section 10.1 we outline a classification of optimization models, for the sake of the uninitiated. Then, we provide a few concrete examples in Section 10.2, with due emphasis on economic and financial applications. Given our limited aim and scope, we will not go through traditional concepts like Lagrange multipliers in nonlinear programming or the simplex method for linear programming.1 When necessary, we will just use R functions from suitable packages implementing these methods. Emphasis is partly on modeling and partly on alternative methods, which are illustrated in Section 10.3 for global (nonconvex) optimization. These methods can also be applied to simulation-based optimization, which is the subject of Section 10.4. In Section 10.5 we illustrate model building in stochastic programming. We cannot cover specialized solution methods for this class of models, but we do emphasize scenario generation in Section 10.5.3, illustrating important concepts such as in-sample and out-of-sample stability. The last three sections are devoted to stochastic dynamic programming. The cornerstone principle of dynamic programming, the recursive functional equation due to Bellman, is introduced in Section 10.6, where we also compare this approach against stochastic programming with recourse. In Section 10.7 we describe “standard” numerical methods for dynamic programming, which are well-suited to problems of rather moderate size, whereas in Section 10.8 we describe ADP strategies that are aimed at larger scale problems and feature interesting links with machine learning.
A fairly general statement of an optimization problem is

min f(x)
s.t. x ∈ S ⊆ ℝⁿ,     (10.1)
which highlights the following three building blocks:
This rather dry and abstract statement encompasses an incredible variety of decision problems. Note that there is no loss of generality in considering minimization problems, as a maximization problem can be easily converted into a minimization one:

maxx∈S f(x) = − minx∈S [−f(x)].
Indeed, many software tools assume that the problem is either in minimization or in maximization form and, if necessary, leave the burden of changing the sign of the objective function to the user. In other cases, a flag is used to characterize the kind of problem.
In this book we only deal with finite-dimensional optimization problems, involving a finite number of decision variables. In continuous-time optimal control problems, where system dynamics is described by differential equations, decisions may be represented by functions u(t), where t ∈ [t0, T]; hence, such problems are infinite-dimensional. We do consider dynamic optimization problems in this chapter, but we assume that time has been discretized; this results in a finite-dimensional problem when the time horizon is finite. When the time horizon is infinite, in principle we do have an infinite-dimensional problem (though countably so); however, if the system is stationary, we may look for a stationary policy by dynamic programming. Dynamic programming relies on functional equations whose unknowns are, in fact, infinite-dimensional objects if the underlying state space is continuous. As we shall see, we always boil down the overall problem to a sequence of finite-dimensional subproblems by suitable discretizations. Hence, what we really deal with are mathematical programming problems, but we will use this term and the more general “optimization problem” interchangeably.
When S ≡ ℝⁿ we speak of an unconstrained optimization problem; otherwise we have a constrained optimization problem. The feasible region S, in concrete terms, is typically described by a set of equality and inequality constraints, resulting in the mathematical program:

min f(x)
s.t. hi(x) = 0,   i ∈ E,
     gi(x) ≤ 0,   i ∈ I,     (10.2)
     x ∈ X.
The condition x ∈ X may include additional restrictions, such as the integrality of some decision variables. As we shall see, the most common form of this restriction concerns binary decision variables, i.e., xj ∈ {0, 1}; this trick of the trade is useful to model logical decisions like “we do it” vs. “we do not.” Simple lower and upper bounds, describing box constraints like lj ≤ xj ≤ uj, are usually considered apart from general inequalities for the sake of computational efficiency. So, we might also have X = {x ∈ ℝⁿ | l ≤ x ≤ u}, where vectors l and u collect the lower and upper bounds on decision variables, respectively.
An optimization problem can be very simple or on the verge of intractability, depending on some essential features of its building blocks. Such features are often more important than the sheer size of the problem. We will consider concrete modeling examples later, shedding light on such features and showing the astonishing variety of problems that we may address, but an essential classification framework involves the following dimensions:
By far, the most important property of an optimization problem is convexity. Convexity is a property of sets, in particular of the feasible set S, which can be extended to a property of functions, in our case, the objective function.
DEFINITION 10.1 (Convex sets) A set S ⊆ ℝⁿ is convex if, for any x, y ∈ S and for any λ ∈ [0, 1], we have λx + (1 − λ)y ∈ S.
Convexity can be grasped intuitively by observing that points of the form λx + (1 − λ)y, where 0 ≤ λ ≤ 1, are simply the points on the straight-line segment joining x and y. So, a set S is convex if the line segment joining any pair of points x, y ∈ S is contained in S. This is illustrated in Fig. 10.1: S1 is convex, but S2 is not. S3 is a discrete set and it is not convex; this fact has important consequences for discrete optimization problems. It is easy to see that the intersection of convex sets is a convex set. This does not necessarily apply to set union.
FIGURE 10.1 An illustration of set convexity.
Polyhedral sets are essential in describing the feasible set of a linear programming problem. A hyperplane aiᵀx = bi, where bi ∈ ℝ and ai, x ∈ ℝⁿ, divides ℝⁿ into two half-spaces expressed by the linear inequalities aiᵀx ≤ bi and aiᵀx ≥ bi. Note that we always assume column vectors, so we use transposition to specify a row vector aiᵀ. A polyhedron P ⊆ ℝⁿ is a set of points satisfying a finite collection of linear inequalities, i.e.,

P = {x ∈ ℝⁿ | Ax ≥ b},

where matrix A ∈ ℝ^{m×n} collects the row vectors aiᵀ, i = 1, …, m, and column vector b ∈ ℝᵐ collects the right-hand sides bi. Note that a linear equality can be rewritten as two inequalities with opposite senses. A polyhedron is therefore the intersection of a finite collection of half-spaces and is convex. As we will discuss later (see Eqs. (10.4) and (10.5)), the feasible set of a linear programming problem can be described, e.g., as S = {x ∈ ℝⁿ | Ax = b, x ≥ 0} or S = {x ∈ ℝⁿ | Ax ≥ b}, and it is a polyhedron.
The closed intervals [2, 5] and [6, 10] are clearly convex sets. However, their union [2, 5] ∪ [6, 10] is not. As a more practically motivated example, consider an activity whose level is measured by the decision variable xj; concretely, xj may represent the weight of an asset in a financial portfolio. A possible restriction is

xj ∈ {0} ∪ [lj, uj],

meaning that we may not invest in that asset, i.e., xj = 0, but if we do, there are a lower bound lj and an upper bound uj on its weight. This restriction specifies a nonconvex set, and it should not be confused with the box constraint lj ≤ xj ≤ uj, which corresponds to a convex set. Later, we shall see how we can represent such a nonconvexity by the introduction of binary decision variables.
Set convexity can be extended to function convexity as follows.
DEFINITION 10.2 (Convex functions) A function f : ℝⁿ → ℝ, defined over a convex set S ⊆ ℝⁿ, is a convex function on S if, for any y and z in S, and for any λ ∈ [0, 1], we have

f(λy + (1 − λ)z) ≤ λf(y) + (1 − λ)f(z).     (10.3)
The definition can be interpreted by looking at Fig. 10.2. If we join any two points on the function graph with a line segment, all of the segment lies above the function graph. In other words, a function is convex if its epigraph, i.e., the region above the function graph, is a convex set. If the condition (10.3) is satisfied with strict inequality for all y ≠ z, the function is strictly convex. We illustrate examples of convex and nonconvex functions in Fig. 10.3. The first function is convex, whereas the second is not. The third function is a polyhedral convex function, and it has kinks. This example shows that a convex function need not be differentiable everywhere. Note that in the second case we have a local minimum that is not a global one. Indeed, convexity is so relevant in minimization problems because it rules out local minima.
FIGURE 10.2 An illustration of function convexity.
FIGURE 10.3 Convex and nonconvex functions.
Consider the following unconstrained problem:
The objective function is a polynomial, which can be dealt with in R by the polynom package, as illustrated by the script in Fig. 10.4. The script produces the following output:
FIGURE 10.4 Local minima in polynomial optimization.
> out1 <- optim(par=4, fn=f, method="BFGS")
> out1$par; out1$value
[1] 3.643656
[1] -0.6934635
> out2 <- optim(par=1, fn=f, method="BFGS")
> out2$par; out2$value
[1] 1.488273
[1] -1.875737
showing that we end up with different solutions, depending on the initial point. The script also produces the plot in Fig. 10.5: We observe that the polynomial features two local minimizers, one of which is also the global one, as well as one local maximizer. The function goes to infinity when x is large in absolute value, so there is no global maximizer. With a high-degree polynomial, there are potentially many local maxima and minima; in fact, multivariable polynomial optimization is, in general, a difficult nonconvex problem.
FIGURE 10.5 Global and local optima of a polynomial.
When solving a minimization problem, it is nice to have a convex objective function. When the objective has to be maximized, it is nice to deal with a concave function.
DEFINITION 10.3 (Concave function) A function f is concave if (−f) is convex.
Thus, a concave function is just a convex function turned upside down. We have observed that a convex function features a convex epigraph; hence, function convexity relies on set convexity. A further link between convex sets and convex functions is that the set S = {x ∈ ℝⁿ | g(x) ≤ 0} is convex if g is a convex function.
Consider the function g(x) = x1² + x2² − 1. The constraint g(x) ≤ 0 describes the disc of radius 1 centered at the origin, which is clearly a convex set. Indeed, g(x) is a convex function. However, the constraint g(x) ≥ 0 describes the exterior of the disc, which is clearly nonconvex. Indeed, the latter constraint is equivalent to −g(x) ≤ 0, where the function describing the set is concave. The equality constraint g(x) = 0 describes the boundary of the disc, i.e., the circumference, which again is nonconvex.
In general, equality constraints like
do not describe convex sets. We may understand why by rewriting the equality as two inequalities:
There is no way in which h(·) and −h(·) can be both convex, unless the involved function is affine:
It is pretty intuitive that convexity of the objective function makes an unconstrained problem relatively easy, since local minima are ruled out. But what about convexity of the feasible set of a constrained problem? Actually, convexity must be ensured in both the objective function and the feasible set, as shown by the next example.
Even if the objective function is convex, nonconvexities in the feasible set may make the problem difficult to solve. In Fig. 10.6 we observe two level curves of a convex objective function like f(x1, x2) = (x1 − α)² + (x2 − β)². The feasible set is a sort of bean-shaped region and is nonconvex. We may notice that B is the global minimizer, as there the feasible set is tangent to the lowest level curve. However, we also have a local optimum at point A: In its neighborhood there is no better feasible solution.
FIGURE 10.6 Local optima due to a nonconvex feasible set.
DEFINITION 10.4 (Convex optimization problem) The minimization problem minx∈S f(x) is convex if both S and f(·) are convex. The maximization problem maxx∈S f(x) is convex if S is convex and f is concave.
Note that we also speak of a convex problem when we are maximizing a concave objective. We speak of a concave problem when we minimize a concave objective on a convex set. Concave problems are not as nice as convex problems, as the following example illustrates.
Consider the following one-dimensional problem:
This is a concave problem, since the leading term in the quadratic polynomial defining the objective is negative, so that the second-order derivative is negative everywhere. In Fig. 10.7 we show the objective function and the feasible set. The stationary point x = 2 is of no use to us, since it is a maximizer. We see that local minimizers are located at the boundary of the feasible set. A local minimizer lies at the left boundary, x = 1, and the global minimizer is located at the right boundary, x = 4.
FIGURE 10.7 A concave problem may have local optima, but they lie on the boundary of the convex feasible set.
It turns out that the property illustrated in the example, i.e., there is an optimal solution on the boundary of the (convex) feasible set, does apply to concave problems in general. This kind of structure may be exploited in global optimization procedures.
An optimization model in the concrete form (10.2) can be linear or nonlinear. We have a linear programming (LP) problem if all of the involved functions are linear (affine), as in the following model:

min cᵀx
s.t. Ax = b,     (10.4)
     x ≥ 0,

where c ∈ ℝⁿ, A ∈ ℝ^{m×n}, and b ∈ ℝᵐ. Clearly, such a problem makes sense if m < n. The above form is an LP in standard form, involving only equality constraints and non-negative decision variables. Another way to state the problem is the canonical form

min cᵀx
s.t. Ax ≥ b.     (10.5)
Using tricks of the trade, it is possible to convert any LP into one of the two forms above. The important point is that LP is both a convex and a concave programming problem. This implies that any local optimum is also a global one, and that we need only consider points on the boundary of the feasible region. More precisely, there is an optimal solution corresponding to a vertex of the feasible region, which is a polyhedron. This is exploited in the celebrated simplex algorithm, which is available in plenty of commercial packages. There are also alternative strategies based on interior-point methods.
Say that we want to solve the following LP:

max 45x1 + 60x2
s.t. 15x1 + 35x2 ≤ 2400,
     25x1 + 15x2 ≤ 2400,
     0 ≤ x1 ≤ 100,   0 ≤ x2 ≤ 50.
We may resort to the linprog package, which provides us with the function solveLP to solve LPs by the simplex algorithm. The following script solves the problem:
library(linprog)
Amat <- rbind(c(15,35), c(25,15), c(1,0), c(0,1))
bvet <- c(2400, 2400, 100, 50)
cvet <- c(45, 60)
result <- solveLP(cvet, bvet, Amat, maximum=TRUE)
cat('optimal obj', result$opt, '\n')
cat('optimal x', result$solution, '\n')
Note that we have to include the upper bounds in the matrix of generic linear constraints; better solvers allow us to keep bounds apart. The resulting output is
optimal obj 5538.462 optimal x 73.84615 36.92308
If even one function involved in problem (10.2) is nonlinear, be it the objective or a function defining an equality or inequality constraint, we step into the domain of nonlinear programming. Contrary to what one may imagine, the real difficulty boundary is not marked by the linear/nonlinear dichotomy. The defining property is convexity.
For instance, we lose convexity in LP if we restrict a subset of variables to take only integer values: We get a mixed-integer LP model, MILP for short, which is nonconvex and requires much more complicated solution procedures. In principle, since there is no easy optimality condition to check for MILPs, enumeration of solutions is the only viable approach; partial enumeration strategies based on branch-and-bound methods are widely available, even though they may be quite expensive in terms of computational effort. On the contrary, there are nonlinear programming problems that are rather easy to solve. One such example is convex quadratic programming (QP):
In QPs, we have an objective involving a quadratic form, whereas all of the constraints are linear. The problem is easy if the quadratic form is convex, which is the case if the matrix Q is positive semidefinite. It is worth noting that a quite relevant example of positive semidefinite matrix in financial applications is a covariance matrix; see Section 10.2.1.
Say that we want to solve the following QP:

min x1² + x2²
s.t. x1 + x2 = 4,
with obvious solution x* = [2, 2]. We may resort to the quadprog package. The following script solves the problem:
library(quadprog)
Qmat <- rbind(c(2,0), c(0,2))
bvet <- c(4)
Amat <- matrix(c(1,1), nrow=2)
result <- solve.QP(Qmat, c(0,0), Amat, bvet, meq=1)
cat('optimal obj', result$value, '\n')
cat('optimal x', result$solution, '\n')
The resulting output is
optimal obj 8 optimal x 2 2
We note a few peculiarities of solve.QP:
As we show later, QPs have a long tradition in finance. Another class of relevant problems is quadratically constrained QP (QCQP), which involves convex quadratic forms in the constraints as well. There are other classes of nonlinear, yet convex problems for which quite efficient solution procedures are available. The more general nonlinear programming problems are typically harder but not intractable, provided that convexity applies. On the contrary, nonconvex problems require global optimization methods and are quite challenging. This is where Monte Carlo methods come into play.
In most interesting and realistic problems we have to cope with some form of uncertainty in the data. This is certainly the case in financial problems, where return on investments is uncertain. We may even have problems with an uncertain time horizon, such as pension planning. A rather abstract form of a stochastic optimization problem is

minx∈S E[f(x, ξ)],     (10.7)

which is an obvious generalization of the abstract problem (10.1). Here the objective function depends on decision variables x and random variables ξ. Taking the expectation eliminates ξ, yielding a function of x. Assuming that the random variables are continuous with joint density gξ(·), we have

E[f(x, ξ)] = ∫ f(x, ξ) gξ(ξ) dξ.
The random variables represent uncertainty that is realized after we make the decision x, as illustrated in the following simple example.
The newsvendor problem is a prototypical problem in supply chain management and concerns production or purchasing decisions for items subject to perishability or obsolescence risk, like fashion goods. There is a limited time window for sales, and we must decide the order quantity q before knowing the realization of a random demand D during the time window. Items are bought (or produced) at unit cost c and sold at the full retail price p > c within the sales time window; any unsold item after the time window must be marked down and sold at price pu < c. If the case q < D occurs, we have an opportunity cost corresponding to the lost profit margin m ≡ p − c for each unit of unsatisfied demand. On the other hand, when q > D we incur a cost cu ≡ c − pu for each unsold item. The newsvendor problem seems quite unrelated to financial problems. Nevertheless, to see a useful connection, imagine a retail bank that has to decide on the amount of cash required to meet withdrawal requests; ideally, we would like to set apart exactly the cash needed, and any surplus or shortfall is associated with a cost. Furthermore, the newsvendor problem is a simple decision problem with an analytical solution, which we use later to illustrate issues in scenario generation.
Let us use the notation X+ ≡ max{0, X} and express profit as:

Profit = mD − m(D − q)+ − cu(q − D)+.     (10.8)
Our problem is the maximization of expected profit,

max q≥0  E[mD − m(D − q)+ − cu(q − D)+],     (10.9)

which can also be recast in terms of cost minimization:

min q≥0  E[cu(q − D)+ + m(D − q)+].     (10.10)
To see the equivalence of models (10.9) and (10.10), observe that the expectation of Eq. (10.8) involves the expected demand, which is a given constant and does not play any role. In the second form (10.10), we associate a cost cu with each unit of surplus and a cost m with each unit of shortfall with respect to demand. In the case of a continuous demand distribution, it is easy to show that the optimal solution solves the equation

FD(q*) = m / (m + cu),     (10.11)
where FD(x) ≡ P{D ≤ x} is the cumulative distribution function (CDF) of demand. In other words, the profit margin m and the cost of unsold items cu define a quantile of the demand distribution. If demand is normal, D ~ N(μ, σ2), then q* = μ + zσ, where z is the standard normal quantile corresponding to a probability level given by the ratio in Eq. (10.11). Note that, in general, the optimal solution is not to order the expected value of demand. If uncertainty is significant, we should tilt our decision toward larger values if the critical ratio is larger than 0.5, and toward smaller values when it is smaller than 0.5. The important takeaway is that, when faced with an optimization problem involving uncertain data, we should not solve a deterministic problem based on their expected values.
The formulation (10.7) actually encompasses cases in which we are not really able to formulate the problem as a mathematical program explicitly. It may be the case that the form of the objective function is not known, but we can only estimate its value by a simulation model. At first, this may sound weird, but there are indeed optimization problems in which we are not quite able to evaluate the objective function exactly. In fact, the objective function could be related to the performance of a complex dynamical system, such that there is no closed-form expression of the objective function. Under uncertainty, the decisions themselves may not be explicit, but given in the form of a decision rule to be applied on the basis of the current state and the previous realization of uncertain parameters. In such a setting, we have to resort to simulation-based optimization approaches, where Monte Carlo methods play a prominent role.
While the formulation of the model (10.7) is general, it obscures two different sides of the coin:
Infeasibility is not an issue in the newsvendor model above, but it is in general. We will be more specific on how stochastic optimization can be addressed in Section 10.5.
The ability to solve an optimization problem is of no use without the ability to build a sensible optimization model. Model building is a difficult art to learn, possibly a bit beyond the scope of this book. Nevertheless, it is useful to illustrate a few relevant examples to get a glimpse of the resulting model classes:
Mean–variance optimization is the prototypical static portfolio optimization model. The decision concerns setting the weights wi, i = 1, …, n, of a set of n assets, in such a way as to strike a satisfactory trade-off between expected return and risk. Let
If we assume that risk can be measured by the standard deviation of portfolio return, one sensible model for the asset allocation decision is:

min Σi Σj σij wi wj
s.t. Σi μi wi ≥ μT,     (10.12)
     Σi wi = 1,
     wi ≥ 0.
Here we minimize variance, rather than standard deviation, since this yields an equivalent, but highly tractable convex quadratic programming problem. The constraint (10.12) enforces the achievement of the target expected return; then we also require that portfolio weights add up to 1, and the non-negativity requirements forbid short-selling.
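A tiny instance of this model can be solved with solve.QP from the quadprog package used earlier; the covariance matrix, expected returns, and target return below are made-up numbers:

library(quadprog)
Sigma <- rbind(c(0.04, 0.01, 0.00),
               c(0.01, 0.09, 0.02),
               c(0.00, 0.02, 0.16))    # illustrative covariance matrix
mu <- c(0.05, 0.08, 0.12)              # illustrative expected returns
target <- 0.07                         # target expected portfolio return
# columns of Amat: weights sum to 1 (equality), target return, non-negativity
Amat <- cbind(rep(1,3), mu, diag(3))
bvec <- c(1, target, rep(0,3))
out <- solve.QP(2*Sigma, rep(0,3), Amat, bvec, meq=1)
out$solution                           # optimal portfolio weights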
From a computational point of view, this is a trivial problem that can be solved quite efficiently. However, the model itself has a few weak points:
These variations on the theme lead to nonconvex and/or stochastic optimization models, which may be quite challenging to formulate and solve; Monte Carlo methods may play a pivotal role in tackling them, in one way or another.
One of the features that may make an optimization problem nonconvex and much harder to solve is the presence of decision variables restricted to integer values. By far, the most common case involves logical (or binary) decision variables, taking values in the set {0, 1}. We first outline a few standard modeling tricks involving such variables, and then we describe a practically relevant portfolio optimization model requiring their introduction.
Consider an economic activity whose level is measured by a continuous decision variable x ≥ 0. The total cost TC(x) related to the activity may consist of two terms:
In principle, we could introduce a step function such as
and express total cost as
Unfortunately, the step function is nonlinear and discontinuous at the origin, as it jumps from 0 to 1. To avoid the resulting technical difficulty, we may resort to an alternative representation by introducing the following binary decision variable: δ ∈ {0, 1}, with δ = 1 if the activity is undertaken (x > 0) and δ = 0 otherwise. Then, we link x and δ by the so-called big-M constraint:

x ≤ Mδ,     (10.13)
where M is a suitably large constant. To see how Eq. (10.13) works, imagine that δ = 0; then, the constraint reads x ≤ 0, whatever the value of M is. This constraint, together with the standard non-negativity restriction x ≥ 0, enforces x = 0. If δ = 1, the constraint is x ≤ M, which is nonbinding if M is large enough; in practice, M should be a sensible upper bound on the level of activity x.
A possible requirement on the level of an activity is that, if it is undertaken, its level should be in the interval [mi, Mi]. Note that this is not equivalent to requiring that mi ≤ xi ≤ Mi. Rather, we want something like

xi = 0   or   mi ≤ xi ≤ Mi,

which involves a nonconvex set (recall that the union of convex sets need not be convex). For instance, we may use a constraint like this to avoid the inclusion of assets with negligible weights in a portfolio. Using the big-M trick as above, we may write

mi δi ≤ xi ≤ Mi δi,   δi ∈ {0, 1}.
These constraints define a semicontinuous decision variable.
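As a sketch of how such logical constraints are passed to a solver, the following toy fixed-charge model uses the Rglpk package; the numbers, and the choice of package, are our own assumptions:

library(Rglpk)
# variables: x (continuous activity level) and delta (binary)
# cost: 2 per unit plus a fixed charge of 50 if the activity is undertaken;
# at least 7 units are required; M = 100 bounds the activity level
obj   <- c(2, 50)
mat   <- rbind(c(1, -100),    # big-M link: x - 100*delta <= 0
               c(1,    0))    # requirement: x >= 7
dir   <- c("<=", ">=")
rhs   <- c(0, 7)
types <- c("C", "B")
sol <- Rglpk_solve_LP(obj, mat, dir, rhs, types=types, max=FALSE)
sol$solution; sol$optimum     # x = 7, delta = 1, total cost 64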
Active portfolio managers aim at achieving superior return by taking advantage of skill or analytical capabilities. On the contrary, passive portfolio managers aim at replicating an index or tracking a benchmark portfolio as closely as possible. There is some theoretical support for the passive view, as the efficient market hypothesis, in its various forms, essentially states that there is no way to beat the market. The capital asset pricing model, too, supports the view that the market portfolio is the optimal risky component in an overall portfolio. The matter is highly controversial, but one thing can be taken for granted: A passive portfolio must have a very low management cost. One way to achieve this is to restrict the number of asset names included in the portfolio, i.e., to enforce a cardinality constraint.
To formulate the problem, let us consider a universe of n assets and a benchmark portfolio with given weights wib, i = 1, …, n. We want to find the weights wi ≥ 0 of a tracking portfolio, in such a way as to minimize a measure of distance between the target and the tracking portfolio. An additional restriction is that at most Cmax assets can be included, i.e., at most Cmax weights can be strictly positive (we assume that short-selling is forbidden). In order to express the cardinality constraint, we introduce a set of binary variables δi, one for each asset, modeling the inclusion of asset i in the tracking portfolio: δi = 1 if asset i is included, δi = 0 otherwise. The binary variables δi and the continuous variables wi are linked by the big-M constraint

wi ≤ M δi,   i = 1, …, n,     (10.14)

and the cardinality constraint reads as

Σi δi ≤ Cmax.

Since a weight cannot be larger than 1 when we rule out short-selling, we can set M = 1 in Eq. (10.14); the big-M constraints can be tightened by setting M = w̄i, where w̄i is some suitable upper bound on the weight of asset i.3
Now the problem is how to measure the portfolio tracking error. A trivial distance metric can be defined by an L1 norm:

Σi |wi − wib|.
By proper model formulation, this metric yields an MILP model. However, this distance metric does not take any return model into account. To see the point, let us consider the following weights for two assets:
According to the L1 norm, the distance between the target and the tracking portfolio, considering only these two assets, is
However, what is the distance if the two assets are perfectly correlated? Of course, the actual distance would be zero, in such a case. More generally, the actual distance depends on the link between the two returns. An alternative distance metric is TEV (tracking error variance), defined as

TEV ≡ Var(Rp − Rb),

where Rp and Rb are the return of the tracking portfolio and the benchmark, respectively. Then, TEV can be expressed as

TEV = Σi Σj (wi − wib)(wj − wjb) σij,

where σij is the covariance between the returns of assets i and j.
where σij is the covariance between the returns of assets i and j. Straightforward minimization of TEV, subject to a cardinality constraint, yields the following model:
min Σi Σj (wi − wib)(wj − wjb) σij
s.t. Σi wi = 1,
     wi ≤ δi,   i = 1, …, n,     (10.15)
     Σi δi ≤ Cmax,
     wi ≥ 0,   δi ∈ {0, 1}.
This model is a mixed-integer quadratic programming problem with a convex relaxation. Commercial solvers are able to tackle this model structure. Nevertheless, some stochastic search methods, such as the sampling-based algorithms described in Section 10.3, have been proposed as alternatives for large-scale problem instances.
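The objective evaluation needed by such search methods is straightforward; a minimal sketch, assuming a covariance matrix Sigma and benchmark weights wb are available, is:

tev <- function(w, wb, Sigma) {
  d <- w - wb                        # active weights
  as.numeric(t(d) %*% Sigma %*% d)   # tracking error variance
}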
We have introduced the newsvendor problem in Example 10.9. The basic version of the problem can be solved analytically, and the same applies to many interesting variants. Nevertheless, it is instructive, as well as useful for what follows, to consider a scenario-based model formulation. The idea is to approximate the probability distribution of demand by a set of S scenarios, characterized by a probability πs and a demand realization Ds, s = 1, …, S. The main decision variable is the ordered amount q ≥ 0 and, in order to express the objective in cost minimization form, we introduce two non-negative deviations with respect to realized demand:

y+s − y−s = q − Ds,   y+s, y−s ≥ 0,   s = 1, …, S,     (10.16)

where y+s is a surplus and y−s is a shortfall. Then, the model can be formulated as an LP model:

min Σs πs (cu y+s + m y−s)
s.t. y+s − y−s = q − Ds,   s = 1, …, S,
     q ≥ 0,   y+s, y−s ≥ 0.
As we shall discover later, this can be considered as a very simple example of two-stage stochastic LP model. From a practical point of view, it is a basis for more interesting models allowing for interactions among multiple products, e.g., by demand substitution.
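A small scenario-based instance can be solved with the linprog package used earlier; the margin, unsold-item cost, and demand scenarios below are made-up, and surplus/shortfall are modeled by inequality constraints that are tight at the optimum:

library(linprog)
m <- 4; cu <- 3                        # illustrative margin and unsold-item cost
D <- c(80, 95, 100, 110, 130)          # illustrative demand scenarios
prob <- rep(0.2, 5)                    # scenario probabilities
S <- length(D)
# variables: q, then y+ (surplus) per scenario, then y- (shortfall) per scenario
cvec <- c(0, cu*prob, m*prob)
A1 <- cbind(rep(1,S), -diag(S), matrix(0,S,S))   #  q - y+s <= Ds
A2 <- cbind(rep(-1,S), matrix(0,S,S), -diag(S))  # -q - y-s <= -Ds
Amat <- rbind(A1, A2)
bvec <- c(D, -D)
res <- solveLP(cvec, bvec, Amat, maximum=FALSE)
res$solution[1]   # optimal order; here it sits at the scenario matching the critical ratio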
In Section 10.2.1 we have considered a static, single-period asset allocation model. Later, in Section 10.5.2, we will consider a full-fledged multiperiod model, allowing for dynamic portfolio adjustments. An intermediate strategy can be devised by specifying a fixed-mix, i.e., by keeping portfolio weights wi constant over multiple periods. Wealth will change along the way, and to keep the relative weights constant, we will sell high-price assets and buy low-price assets. Thus, the strategy is intrinsically a contrarian one, rather than a trend-following one. Let us also assume that we only care about mean and variance, as we did before, and that our mean–risk objective is the expected value of terminal wealth penalized by its variance. Unlike the static case, expressing variance of the terminal wealth is not easily accomplished; hence, we resort to a set of S return scenarios. Each scenario is a history of total4 asset returns Rits, over a planning horizon consisting of time periods t = 1, …, T. Each scenario is associated with a probability πs, s = 1, …, S. The model we describe here is due to [32], to which we refer the reader for further information and computational experiments, and is basically an extension of the mean–variance framework.
Let W0 be the initial wealth. Then, wealth at the end of time period 1 in scenario s will be

W1s = W0 Σi wi Ri1s.

Note that wealth is scenario-dependent, but the asset allocation is not. In general, when we consider two consecutive time periods, we have

Wts = Wt−1,s Σi wi Rits.

Unfolding the recursion, we see that wealth at the end of the planning horizon is

WTs = W0 Πt=1,…,T (Σi wi Rits).

Within a mean–variance framework, we may build a quadratic utility function depending on the terminal wealth. Given a parameter λ related to risk aversion, the objective function is

E[WT] − λ Var(WT).
To express the objective function, we recall that Var(X) = E[X²] − E²[X] and write the model as
This looks like a very complex problem; however, while the objective function is a bit messy, the constraints are quite simple. The real difficulty is that this is a nonconvex problem. To see why, just note that the objective turns out to be a polynomial in the decision variables; since polynomials may have many minima and maxima (see Example 10.3), we face a nonlinear nonconvex problem. Once again, stochastic search algorithms, based on Monte Carlo sampling, can be used to cope with these problems.
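To make this concrete, a sketch of the scenario-based objective function is given below; it assumes a three-dimensional array R of gross returns indexed by asset, time period, and scenario, and could be handed to any of the stochastic search methods of Section 10.3 (the function and argument names are ours):

fixmix_obj <- function(w, R, W0=1, lambda=0.5, prob=NULL) {
  nS <- dim(R)[3]                       # number of scenarios
  if (is.null(prob)) prob <- rep(1/nS, nS)
  WT <- numeric(nS)
  for (s in 1:nS) {
    W <- W0
    for (t in 1:dim(R)[2]) W <- W * sum(w * R[, t, s])   # wealth recursion
    WT[s] <- W
  }
  EW <- sum(prob * WT); EW2 <- sum(prob * WT^2)
  EW - lambda * (EW2 - EW^2)            # E[W_T] - lambda*Var(W_T), to be maximized
}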
Asset pricing is a central problem in financial engineering, as well as in financial economics and corporate finance. We know from Section 3.9.2 that the price of a European-style option can be expressed as the discounted expected payoff, under the risk-neutral measure. Assuming a constant risk-free rate r, with continuous compounding, we have

f(S0) = e^{−rT} EQ[Φ(ST)],     (10.18)
where S0 is the underlying asset price now and Φ(ST) is the option payoff, depending on the underlying asset price ST at maturity. In principle, pricing a European-style option by Monte Carlo methods is easily accomplished, provided that we can generate sample paths efficiently; if we do not consider hedging issues, we have just to sit down, cross our fingers, and wait until maturity. However, there are assets whose value depends on our choices along the way. Generally speaking, the value of a resource depends on how the resource itself is used. American-style options, unlike their European-style counterparts, can be exercised at any date prior to expiration. One easy conclusion is that an American-style option cannot cost less than the corresponding European one, as it offers more opportunities. However, early exercise opportunities make pricing much harder, as we should consider an optimal exercise strategy, which in turn calls for the solution of a stochastic dynamic optimization problem. At each exercise opportunity, if the option is in-the-money, we have to compare the intrinsic value of the option, i.e., the immediate payoff that we may earn from exercising the option early, and the continuation value, i.e., the value of postponing exercise and waiting for better opportunities.
Formally, the price of an American option can be written as
f(S0) = sup τ EQ[e^{−rτ} Φ(Sτ)].     (10.19)
The difference with Eq. (10.18) is that now we have to solve an optimization problem with respect to a stopping time τ. The term “stopping time” has a very precise meaning in the theory of stochastic processes, but here we may simply interpret it as the time at which we exercise the option, according to some well-defined strategy.
A put option is in-the-money at time t if S(t) < K. Immediate exercise in this case yields a payoff K − S(t). Clearly, early exercise will not occur if the option is not in-the-money, but even if S(t) < K it may be better to keep the option alive and wait. This means that early exercise will occur only if the option is “enough” in-the-money. By how much depends, in general, on the time to expiration. When we are close to option expiration, there are fewer opportunities, and we are more inclined to exercise; but when we are far from expiration, we are more inclined to wait. Formally, we have to find a boundary separating the early exercise region, where the intrinsic value is larger than the continuation value, from the continuation region. Qualitatively, for an American put option we would expect an early exercise boundary like the one depicted in Fig. 10.8. This boundary specifies a stock price S*(t) such that if S(t) < S*(t), i.e., if the option is deeply enough in-the-money, then we exercise. If we are above the boundary, we are in the continuation region, and we keep the option alive. Finding the boundary is what makes the problem difficult; for a detailed treatment of the exercise boundary for American options see, e.g., [31, Chapter 4].
FIGURE 10.8 Qualitative sketch of the early exercise boundary for a vanilla American put. The option is exercised within the shaded area.
From the last example, we understand that the time at which the option will be exercised is indeed a random variable, in this case the time at which we cross the exercise boundary, if we cross it. More complicated strategies are conceivable and must be applied to exotic and multidimensional options, and this is what the stopping time captures. The stopping time is related to any conceivable strategy, provided that it is nonanticipative; we may only use information gathered from time 0 up to time t. From a practical perspective, in order to price an option with early exercise features, we have to solve a possibly nontrivial optimization problem. For low-dimensional options, numerical methods based on suitable discretizations of the state space, like binomial or trinomial lattices, or finite-difference methods to solve partial differential equations work very well. For high-dimensional or path-dependent options, we have to resort to Monte Carlo methods. In the past, this was considered impossible, but by framing the problem within a stochastic dynamic programming framework, we may tackle it, as we shall see in Section 11.3.
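For a low-dimensional taste of the comparison between intrinsic and continuation values, the following sketch prices a vanilla American put on a CRR binomial lattice (parameter values are illustrative):

amer_put_crr <- function(S0, K, r, sigma, Tmat, N=500) {
  dt <- Tmat/N
  u <- exp(sigma*sqrt(dt)); d <- 1/u
  p <- (exp(r*dt) - d)/(u - d)            # risk-neutral up probability
  disc <- exp(-r*dt)
  S <- S0 * u^(N:0) * d^(0:N)             # asset prices at maturity
  V <- pmax(K - S, 0)                     # payoff at maturity
  for (k in N:1) {
    S <- S0 * u^((k-1):0) * d^(0:(k-1))   # asset prices one step earlier
    cont <- disc * (p*V[1:k] + (1-p)*V[2:(k+1)])   # continuation value
    V <- pmax(cont, K - S)                # exercise early if intrinsic value is larger
  }
  V[1]
}
amer_put_crr(S0=36, K=40, r=0.06, sigma=0.2, Tmat=1)   # about 4.48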
We have introduced maximum-likelihood estimation in Section 4.2.3. Parameter values are chosen in such a way as to maximize a likelihood function related to a selected probability distribution. In very simple cases, the solution is easily found analytically. In nontrivial cases, numerical optimization methods are needed for the solution of the resulting problems, possibly because of the complexity of the likelihood function or the addition of constraints on parameter values, on the basis of some additional information we might have. Furthermore, the resulting problem may be nonconvex and call for global optimization strategies.
A similar kind of problem is encountered when we have to calibrate a pricing model. For instance:
At time t we observe a set of bond prices Pk(t), k = 1, …, n. Let us assume for the moment that the cash flows associated with these bonds, i.e., the payment of coupons along the way and the face value at bond maturity, occur at a set of common time instants Ti, i = 1, …, m. Let r(t, T) denote the continuously compounded interest rate applying over the time interval [t, T]. Based on observed bond prices, we want to estimate r(t, T) for a range of maturities. According to straightforward bond pricing formulas, the bond prices should be given by

P̂k(t) = Σi ck,i e^{−r(t,Ti)(Ti − t)},   k = 1, …, n,

where ck,i is the cash flow from bond k at time Ti. These predicted prices are compared against the observed prices Pko(t). If m = n, the number of unknown parameters and prices are the same, and we may solve a system of nonlinear equations to find the interest rates for maturities Ti, where we impose P̂k(t) = Pko(t). A textbook approach, known as bootstrapping, assumes that exactly one bond will mature at each time instant Ti; then, the system of equations has a “triangular” structure and we can solve one equation at a time, finding one rate per equation. In practice, it may be difficult to find a suitable set of bonds lending itself to such a simplified solution strategy. Liquidity and other issues may affect bond prices, introducing noise and some irregularity in the resulting yield curve.
To smooth those effects, it is preferable to have more bonds than unknown rates, n > m, and minimize an error distance between observed and predicted prices, with respect to rates:

min Σk [P̂k(t) − Pko(t)]².
This is a nonlinear least-squares problem that can be solved by numerical optimization methods. However, it is typically desirable not to consider a restricted set of cash flow times Ti, in order to widen the set of bonds we may use for calibration. Furthermore, we would like to fit a term structure with some reasonable degree of smoothness, in such a way that we can estimate rates for maturities different from the aforementioned cash flow times Ti. One way to accomplish this is by spline interpolation, but due to possible pricing anomalies exact interpolation may yield unreasonable results, and least-squares approximation may be preferred. More generally, we may consider a set of basis functions φj(T), j = 1, …, J, whose linear combination

r(t, T) = Σj αj φj(T)

gives the yield curve as a function of maturity T. Here, the decision variables of the optimization problem are the coefficients αj, and we might also consider estimating the discount curve directly, which is then recast as the yield curve. The use of basis functions to boil down an infinite-dimensional problem to a finite-dimensional one is very common, and we will take advantage of the idea later, when dealing with numerical dynamic programming.
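As an illustration, the following sketch fits a yield curve by least squares with a simple polynomial basis; the basis choice, the data layout (a matrix cashflows with one row per bond and one column per cash flow time), and all names are our own assumptions:

yc_error <- function(alpha, Ti, cashflows, Pobs) {
  r <- as.vector(outer(Ti, seq_along(alpha)-1, "^") %*% alpha)  # r(t,Ti) from the basis
  disc <- exp(-r * Ti)                       # discount factors
  Phat <- as.vector(cashflows %*% disc)      # predicted bond prices
  sum((Phat - Pobs)^2)                       # squared pricing errors
}
# fit <- optim(par=c(0.03, 0, 0), fn=yc_error, Ti=Ti,
#              cashflows=cashflows, Pobs=Pobs, method="BFGS")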
The calibration of the yield curve on the basis of quoted prices is an example of an inverse problem. Model calibration is also a cornerstone of option pricing in incomplete markets. As we have seen in Section 3.9.3, market incompleteness arises whenever we step out of the safe BSM domain based on a GBM process with very simple drift and volatility terms. If volatility is stochastic or the underlying variable is nontradable, we may need a more sophisticated approach. In Eq. (3.126) we have pointed out the existence of an unknown function m*(·, ·) serving as a drift in an Itô differential equation, under a pricing measure. We report the equation here for the sake of convenience, recasting it in terms of a generic state variable Xt:

dXt = m*(Xt, t) dt + s(Xt, t) dWt.
Once again, note that for pricing purposes the drift term is related to a risk-neutral measure, and it cannot be estimated on the basis of a time series of Xt. By a similar token, we may not want to fit the volatility function on the basis of historical values of volatility; rather, we want to extract the current market view, which is implicit in asset prices. Hence, we are faced with an inverse problem. Depending on the kind of dependence on time, the functions m*(·, ·) and s(·, ·) may depend on a finite or infinite set of parameters. The second case occurs when these terms explicitly depend on time, in order to improve their matching performance against quoted prices. Whatever the case, by a suitable discretization mechanism, possibly based on basis functions, interpolation, and whatnot, we end up with a pricing model depending on a vector of parameters α. The pricing model predicts a price P̂k(α) for a set of securities with quoted (observed) price Pko, and it is calibrated by solving the optimization problem

min α  Σk [P̂k(α) − Pko]².
One might wonder why we should calibrate a model by fitting a set of unobservable parameters against quoted prices of exchange-traded instruments. Clearly, the aim is not to recover quoted prices per se, but to fit a model in order to price over-the-counter assets and to assess option price sensitivity with respect to a set of risk factors.
Depending on the complexity of the pricing model, we may end up with relatively simple optimization models, amenable to classical nonlinear programming procedures (most notably, the Levenberg–Marquardt algorithm), or an extremely difficult global optimization problem. In the latter case, the Monte Carlo based global optimization methods that we describe in the next section might be the only viable solution method.
When dealing with a convex optimization problem featuring a well-defined objective function, we may take advantage of useful information provided by its gradient and Hessian matrix. Unfortunately, it may be the case that the objective function is badly behaved, i.e., nonconvex, possibly discontinuous, or even not known in analytical form. Then, we may resort to algorithms that rely only on function evaluations, possibly obtained by stochastic simulation, and a search mechanism to explore the solution space randomly. Monte Carlo methods may be used at both levels:
As the reader can imagine, there is no clear-cut line separating the two ideas, as random sampling may be employed at both levels.
For any specific decision problem it is possible to devise an ad hoc heuristic method providing us with a good, possibly near-optimal, solution. Here we only consider fairly general strategies, which may be customized to the specific problem at hand. Indeed, a wide class of such methods is known under the name of metaheuristics, a term that emphasizes their generality and flexibility: They are able to cope with continuous nonconvex problems as well as discrete optimization problems, like optimal portfolio tracking, where the nonconvexity arises from the feasible set, which includes binary decision variables. We first introduce local search and other metaheuristic principles in Section 10.3.1. Then, in Section 10.3.2 we describe simulated annealing, a simple stochastic search method inspired by an interesting physical analogy. Other methods, such as genetic algorithms (Section 10.3.3) and particle swarm optimization (Section 10.3.4), rely on analogies with biological systems and, unlike simulated annealing, work on a population of solutions, rather than a single one.
Local search is a wide class of simple exploration strategies based on the idea of generating a sequence of solutions exploring the neighborhood of the current one. Consider a generic optimization problem

minx∈S f(x),

defined over a feasible set S. The basic idea is to improve a given feasible solution by the application of a set of local perturbations. We associate a feasible solution x with a neighborhood N(x), which is defined by applying a set of simple perturbations to x. Different perturbations yield different neighborhood structures. The simplest local search algorithm is local improvement: Given a current (incumbent) solution x0, an alternative (candidate) solution x′ is searched for in the neighborhood of the incumbent:

x′ ∈ N(x0).

If the candidate solution improves the objective function, i.e., f(x′) < f(x0), then the incumbent solution is replaced by the candidate and the process is repeated. The search is terminated when we cannot find any improving solution, and the incumbent solution is returned. Clearly, there is no guarantee that an optimal solution is found, as the process may get trapped in a bad locally optimal solution. Furthermore, the solution is locally optimal with respect to the selected neighborhood structure, which must be simple enough to allow for an efficient search, but not so simplistic that the exploration process is ineffective. If the candidate solution is the optimal solution in the neighborhood, we speak of a best-improving local search; if we stop exploring the neighborhood as soon as an improving solution is found, we speak of a first-improving local search.
Neighborhood structures may be devised for both discrete and continuous optimization problems. If all the components of x ∈ ℝⁿ are restricted to binary values, a simple neighborhood structure is obtained by complementing each variable in turn, i.e., for each i = 1, …, n we set

x̃i = 1 − xi,   x̃j = xj for j ≠ i.

The neighborhood consists of n points and it is easy to explore, but possibly very poor. We may also consider complementing pairs of variables, as well as swapping the values of variables having different values.
The case of continuous variables adds a degree of difficulty, as it is not clear by how much we should perturb each component. We may define steps δi for each component and generate 2n alternative solutions by applying the perturbation in both directions:

x′ = x0 ± δi ei,   i = 1, …, n,

where ei is a unit vector with 1 in position i and 0 in the other positions. This is the simplest pattern for searching the neighborhood, but some mechanism should be devised to dynamically adjust the step lengths.
Alternatively, we may generate one random candidate at a time, by sampling randomly in the neighborhood of the incumbent solution using a suitable probability distribution. For instance, we may adopt a multivariate normal distribution with expected value x0 and a diagonal covariance matrix Σ. This approach lends itself to a first-improving strategy.
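A minimal sketch of such a first-improving local search with a Gaussian neighborhood (names and defaults are ours) is:

local_search <- function(fn, x0, sigma=1, maxit=10000) {
  x <- x0; fx <- fn(x)
  for (k in 1:maxit) {
    cand <- x + rnorm(length(x), sd=sigma)   # sample a random neighbor of the incumbent
    fc <- fn(cand)
    if (fc < fx) { x <- cand; fx <- fc }     # accept only improving candidates
  }
  list(par=x, value=fx)
}
# local_search(function(x) sum(x^2), x0=c(5, -3))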
Actually, devising a clever and effective neighborhood is not trivial at all. A further complication is that due attention must be paid to constraints. In some cases we might just discard solutions violating them; however, this is not easily accomplished when dealing with equality constraints, as staying inside the feasible region might be difficult. An alternative strategy is to relax constraints and augment the objective function by a corresponding penalty function. For instance, the constrained problem
may be transformed into the unconstrained problem
for suitably large penalty coefficients ωi, possibly expressing the relative importance of constraints. The penalty coefficients might also be dynamically adjusted, allowing the search process to occasionally leave the feasible region and get back into it at another point. This kind of approach, called strategic oscillation, may be used, along with other tricks of the trade, to balance exploitation (the need to stick to a good solution and explore its neighborhood) and exploration (the need to explore other subregions of the feasible set).
Local improvement is not suitable for global optimization. However, one may relax the requirement that the new solution in the search path is an improving one. By accepting nonimproving candidates, we may hope to escape from local optima. Two such strategies are:
Both simulated annealing and tabu search work with a single incumbent solution. In order to find a better trade-off between exploration and exploitation, one may work with a population of solutions. Examples of population-based approaches are genetic algorithms and particle swarm optimization.
Simulated annealing owes its name to an analogy between cost minimization in optimization and energy minimization in physical systems. The local improvement strategy behaves much like physical systems do according to classical mechanics. It is impossible for a system to have a certain energy level at a certain time instant and then to increase it without external input: If you place a ball in a hole, it will stay there. This is not true in statistical mechanics; according to these physical models, at a temperature above absolute zero, thermal noise makes an increase in the energy of a system possible. This applies to changes at a microscopic level, and an increase in energy is more likely to occur at high temperatures. The probability P of this upward jump depends on the increase in energy ΔE and the temperature T, according to the Boltzmann distribution

P = exp(−ΔE / (KT)),
where K is the Boltzmann constant. Annealing is a metallurgical process by which a melted material is slowly cooled in order to obtain a good (low-energy) solid-state configuration. If the temperature is decreased too fast, the system gets trapped in a local energy minimum, and a glass is produced. But if the process is slow enough, random kinetic fluctuations due to thermal noise allow the system to escape from local minima, reaching a point very close to the global optimum.
In simulated annealing we consider the change in the objective function when moving from the incumbent solution x0 to a candidate solution x′:

Δf = f(x′) − f(x0).

The candidate solution is accepted with probability

P(accept) = min{1, exp(−Δf / T)}.

An improving candidate yields Δf < 0 and is always accepted. A nonimproving candidate is randomly accepted, according to a probability depending on the amount of cost increase Δf and the control parameter T, as shown in Fig. 10.9. At very high temperatures, most candidate solutions are accepted, whereas at intermediate temperatures candidate solutions that increase cost are accepted with lower probability if the deterioration is significant. For T → 0 the acceptance probability collapses into a step function, and the method behaves like local improvement. In Section 14.4 we will interpret simulated annealing within the framework of Markov chain Monte Carlo methods, as a way to sample from a density that is concentrated on the set of globally optimal solutions. However, to reach that set without getting trapped in local optima, the temperature should be decreased according to a cooling schedule. The simplest cooling schedule is

Tk+1 = βTk,   0 < β < 1,

i.e., the temperature is multiplied at each step by a constant factor β slightly less than 1.
FIGURE 10.9 Acceptance probabilities as a function of cost increase for different temperatures in simulated annealing.
In practice, it is advisable to keep the temperature constant for a certain number of steps, in order to reach a thermodynamic equilibrium before changing the control parameter. More sophisticated adaptive cooling strategies have been proposed. Furthermore, we have to establish a termination criterion. The simplest idea is to stop the algorithm after a given number of iterations; alternatively, we may stop after a maximum number of iterations without accepting a candidate.
We show an extremely naive implementation of simulated annealing in Fig. 10.10, for illustrative purposes only. We may test it on the Rastrigin function5

f(x) = 10n + Σi [xi² − 10 cos(2πxi)],
FIGURE 10.10 A naive implementation of simulated annealing.
which has its global optimum at x = 0:
> Rastrigin <- function(x) sum(x^2-10*cos(2*pi*x)) + 10*length(x)
> dim <- 20
> lb <- rep(-100,dim)
> ub <- rep(100,dim)
> set.seed(55555)
> x0 <- runif(dim,min=lb,max=ub)
> out <- optim(par=x0,fn=Rastrigin,lower=lb,upper=ub,
+              method="L-BFGS-B")
> out[c("value","counts")]
$value
[1] 1361.003
$counts
function gradient 
      43       43 
> set.seed(55555)
> out <- NaiveSA(fn=Rastrigin,par=x0,temp0=100,
+                beta=0.9999,maxit=200000,sigma=1)
> out[c("value","counts")]
$value
[1] 125.437
$counts
[1] 2e+05
We use n = 20 and start the search from a randomly selected point within a box. For compatibility with other R functions that we want to use for comparison purposes, we constrain all decision variables to a box with lower bound −100 and upper bound 100. A classical nonlinear programming method, available as an option in the standard R function optim, quickly gets stuck in a bad solution, whereas the naive annealing does better. Actually, we may take advantage of a better implementation of simulated annealing offered by optim:
> set.seed(55555)
> out <- optim(par=x0,fn=Rastrigin,method="SANN",
+              control=list(temp=100,tmax=10,maxit=200000))
> out[c("value","counts")]
$value
[1] 81.33197
$counts
function gradient 
  200000       NA 
Here, tmax is the number of iterations (function evaluations) carried out at each temperature before it is decreased. Still better results are obtained by function GenSA offered in the package having the same name:
> library(GenSA)
> set.seed(55555)
> out <- GenSA(fn=Rastrigin, lower=lb, upper=ub, par=x0)
> out[c("value","par","counts")]
$value
[1] 0
$par
 [1]  1.149916e-11  8.491049e-12 -1.670109e-11 -3.363457e-11 -2.144013e-11
 [6] -4.907911e-13 -3.319979e-11 -3.287746e-11  6.624013e-12 -2.850014e-11
[11]  3.173777e-11 -1.580739e-11 -2.003915e-13 -2.192858e-09  2.288534e-11
[16]  3.180884e-11  1.039823e-11 -7.188424e-12  3.151424e-11  1.477490e-11
$counts
[1] 257528
This function implements a generalized simulated annealing algorithm; see [44] for more information.
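The NaiveSA routine of Fig. 10.10 is not reproduced here; a minimal sketch consistent with the call used above (Gaussian neighborhood with standard deviation sigma, geometric cooling with factor beta) could read as follows, although it will not reproduce the printed figures exactly:

NaiveSA <- function(fn, par, temp0, beta, maxit, sigma) {
  x <- par; fx <- fn(x)
  xbest <- x; fbest <- fx
  temp <- temp0
  for (k in 1:maxit) {
    cand <- x + rnorm(length(x), sd=sigma)              # random neighbor
    fc <- fn(cand)
    if (fc < fx || runif(1) < exp(-(fc - fx)/temp)) {   # Metropolis acceptance rule
      x <- cand; fx <- fc
      if (fx < fbest) { xbest <- x; fbest <- fx }
    }
    temp <- beta*temp                                   # geometric cooling
  }
  list(par=xbest, value=fbest, counts=maxit)
}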
Unlike simulated annealing, genetic algorithms work with a set of solutions rather than a single point. The idea is based on the survival-of-the-fittest mechanism of biological evolution, whereby a population of individuals evolves by a combination of the following elements:
There is considerable freedom in the way the above idea is translated into a concrete algorithm. A critical issue is how to balance exploration and exploitation. On the one hand, biasing the selection of the new population toward high-quality individuals may lead to premature convergence; on the other hand, unless a certain degree of elitism is used to keep good individuals in the population, the algorithm might wander around and miss the opportunity to explore promising subregions of the solution space.
A key ingredient is the encoding of a feasible solution as a string of features, corresponding to the genes of a chromosome. For continuous optimization in ℝ^n, the choice of representing each solution by the corresponding vector x is fairly natural.6 Let us consider the simplest form of crossover: Given two individuals x and y in the current pool, a "breakpoint" position k ∈ {1, 2, …, n} is randomly selected and two offspring are generated as follows:
x′ = (x1, …, xk, yk+1, …, yn),   y′ = (y1, …, yk, xk+1, …, xn).
Variations on the theme are possible; for instance, a double crossover may be exploited, in which two breakpoints are selected for the crossover. The two new solutions are clearly feasible for an unconstrained optimization problem in ℝ^n; when constraints are added, this guarantee is lost in general. As we have already mentioned, one can enforce hard constraints by eliminating noncomplying individuals. A usually better alternative is a relaxation by a suitable penalty function.
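As a concrete illustration of the single-point crossover just described (a sketch of the mechanism, not the code of any specific package), we might write:

crossover <- function(x, y) {
  # single-point crossover of two real-coded parents x and y:
  # pick a random breakpoint k and swap the tails
  n <- length(x)
  k <- sample(1:(n-1), 1)
  list(c(x[1:k], y[(k+1):n]), c(y[1:k], x[(k+1):n]))
}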
There are a few packages implementing genetic algorithms in R. Figure 10.11 shows how to use one of them, GA, to minimize the Rastrigin function. Note that we have to set the "real-valued" option to specify that we are dealing with a continuous optimization problem, and that there are other settings aimed at discrete optimization. Furthermore, genetic algorithms typically maximize a fitness function; hence, we have to change the sign of the Rastrigin function. We obtain the following (edited) output:
FIGURE 10.11 Using the GA package to minimize the Rastrigin function.
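A call along the lines of Fig. 10.11 might look as follows; the population size, the number of generations, and the seed are illustrative choices, default crossover and mutation settings are used, and, depending on the package version, the bounds may have to be passed as min/max rather than lower/upper:

library(GA)
GArun <- ga(type="real-valued",
            fitness=function(x) -Rastrigin(x),   # ga maximizes, so we flip the sign
            lower=lb, upper=ub,
            popSize=50, maxiter=1000, seed=55555)
summary(GArun)
-GArun@fitnessValue    # value of the Rastrigin function at the best solution found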
We notice that some options related to crossover and mutation probabilities, as well as the degree of elitism, have been set to default values. We may also provide the ga function with customized functions to carry out mutation, crossover, etc. Genetic algorithms offer plenty of customization options, which is a mixed blessing, as it may be awkward to find a robust and satisfactory setting. It is also possible to collect the best solutions found along the way and to refine them by a classical nonlinear programming method, finding a local minimizer nearby.
Particle swarm optimization (PSO) methods are an alternative class of stochastic search algorithms, based on a population of m particles exploring the space of solutions, which we assume is a subset of ℝ^n. The position of each particle j in the swarm is a vector
which changes in discrete time (t = 1, 2, 3, …) according to three factors:
The idea is to mimic the interactions of members of a swarm looking for food, and a classical statement of the algorithm is the following:
Equation (10.20) governs the evolution of each component i = 1,…, n of the velocity of each particle j = 1, …, m. The new velocity depends on:
Equation (10.21) simply changes each component of the current position according to the new velocity. At each iteration, the personal and global bests are updated when necessary, and other adjustments are used in order to keep velocity within a given range, as well as the positions, if bounds on the variables are specified.
PSO can be regarded as another Monte Carlo search approach, and several variants have been proposed (see the end of chapter references):
Some of these algorithms have a physical motivation, whereas some have a biological motivation; they differ in the number of coefficients to be set and the corresponding complexity of fine tuning, though some self-adaptive methods are available. As the reader can imagine, it is easy to get lost because of the sheer number of proposals and combinations. The key is to experiment with some of these variants and find the best combination for the specific problem at hand. A welcome tool is the pso R package, which implements some variants of the approach. The code in Fig. 10.12 shows how to use the psoptim function to minimize the Rastrigin function. The function has a similar interface to the standard optim function, plus several control parameters governing the maximum number of iterations (maxit), the tracing of execution (trace and REPORT), as well as the possibility of carrying out a local optimization for each particle using a standard nonlinear optimizer (if hybrid is set). The code produces the following output:
FIGURE 10.12 Using the pso package to minimize the Rastrigin function.
> out <- psoptim(par=x0,fn=Rastrigin,lower=lb,upper=ub,
+     control=list(maxit=10000,trace=1,REPORT=1000,
+     hybrid=F))
S=18, K=3, p=0.1576, w0=0.7213, w1=0.7213, c.p=1.193, c.g=1.193, v.max=NA, d=894.4, vectorize=FALSE, hybrid=off
It 1000: fitness=27.86
It 2000: fitness=27.86
It 3000: fitness=27.86
It 4000: fitness=27.86
It 5000: fitness=27.86
It 6000: fitness=27.86
It 7000: fitness=27.86
It 8000: fitness=27.86
It 9000: fitness=27.86
It 10000: fitness=27.86
Maximal number of iterations reached
> out <- psoptim(par=x0,fn=Rastrigin,lower=lb,upper=ub,
+     control=list(maxit=1000,trace=1,REPORT=100,
+     hybrid=T))
S=18, K=3, p=0.1576, w0=0.7213, w1=0.7213, c.p=1.193, c.g=1.193, v.max=NA, d=894.4, vectorize=FALSE, hybrid=on
It 100: fitness=35.82
It 200: fitness=28.85
It 300: fitness=24.87
It 400: fitness=21.89
It 500: fitness=18.9
It 600: fitness=14.92
It 700: fitness=14.92
It 800: fitness=14.92
It 900: fitness=8.955
It 1000: fitness=6.965
Maximal number of iterations reached
Indeed, we notice that the hybrid algorithm is more effective, at the cost of a possibly larger computational effort. Some experimentation with control parameters may be needed to fine tune the algorithm performance.
In the previous section we have considered Monte Carlo-based stochastic search approaches to optimize a deterministic objective function. The difficulty was in the nonconvexity of the objective function, but not in its evaluation. When dealing with optimization under uncertainty, as in the case
the function evaluation itself may be difficult. Thus, we may be forced to adopt Monte Carlo sampling to obtain noisy estimates of the objective function. In well-structured problems, we may build a formal optimization model, which can be tackled by stochastic programming with recourse or stochastic dynamic programming, to be discussed later. However, sometimes the system is so complicated that we may only estimate the objective function by some stochastic simulator acting as a black box: We feed the black box with x and we obtain an estimate of the expected performance. As a consequence, nice properties such as continuity and differentiability may not always be guaranteed. Population-based approaches, like genetic algorithms and particle swarm optimization, may help, since they do not place any restriction on the objective function; furthermore, since they work on a population, they tend to be less vulnerable to estimation errors. We may also keep a pool of the most promising solutions found, which can then be simulated with greater accuracy in order to select the final solution.
Here we discuss two approaches which may be used for simulation-based optimization:
We should note that these methods are not aimed at global optimization, unlike those discussed in the previous section. Nevertheless they may be more effective and efficient in specific settings.
The simplex search algorithm, also known as the Nelder–Mead method, is a deterministic search strategy relying on function evaluations only. Rather than working with a single point, it uses a simplex of n+1 points in ℝ^n. A simplex in ℝ^n is the convex hull of a set of n+1 affinely independent points x1, …, xn+1.7 The convex hull of a set of points is just the set of points that may be obtained by taking convex combinations (linear combinations whose weights are non-negative and add up to 1) of the elements of the set. For instance, in two dimensions a simplex is simply a triangle, whereas in three dimensions it is a tetrahedron.
The rationale behind the method is illustrated in Fig. 10.13 for a minimization problem in ℝ². In this case, the simplex search generates a sequence of sets consisting of three points, in which the worst one is discarded and replaced by a new one. For instance, let us assume that x3 is associated with the worst objective value; then, it seems reasonable to move away from x3 by reflecting it through the center of the face formed by the other two points, as shown in the figure. A new simplex is obtained and the process is repeated. The generation of the new point is easily accomplished algebraically. If xn+1 is the worst point, we compute the centroid of the other n points as
x̄ = (1/n) Σ_{i=1}^{n} xi,
FIGURE 10.13 Reflection of the worst value point in the Nelder–Mead simplex search procedure.
and we try a new point of the form
xr = x̄ + α (x̄ − xn+1).
Clearly, the key issue is finding the right reflection coefficient α > 0. If xr turns out to be even worse than xn+1, we may argue that the step was too long, and the simplex should be contracted. If xr turns out to be the new best point, we have found a good direction and the simplex should be expanded. Different strategies have been devised in order to improve the convergence of the method.8 Although the simplex search procedure has its merits, it does not overcome the possible difficulties due to the nonconvexity of the objective function or the discrete character of some decision variables. In fact, it was originally developed for experimental response surface optimization.
The Nelder–Mead algorithm is offered as an option in the basic optim function. Furthermore, there are some R packages implementing variants of this and other derivative-free optimization methods, such as neldermead and dfoptim. The package optimsimplex offers a framework to implement specific variants of simplex search.
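For instance, a direct attempt on the Rastrigin function defined above might look like the following; being a local search method, Nelder–Mead should not be expected to locate the global optimum here, and results will depend on the starting point:

out <- optim(par=x0, fn=Rastrigin, method="Nelder-Mead",
             control=list(maxit=5000))   # no bounds: they require L-BFGS-B
out[c("value","counts")]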
We have discussed several derivative-free algorithms. Avoiding reliance on information provided by the gradient and the Hessian matrix of the objective function is certainly a good idea when we are dealing with a badly behaved function. However, in other cases, we may miss an opportunity to better drive the search process. Metamodeling aims to build an approximation of the objective function
on the basis of simulation experiments. Hence, we use a simulation model as a black box, whose response surface is modeled by a suitably chosen approximating function. This kind of higher-level model explains the name of the overall approach.
If we want to find the gradient ∇h(xk) at some point xk, we may build a local linear approximator like
Such a model can be easily found by linear regression. Then, we just set
At first sight, the gradient does not depend on xk. However, the linear metamodel does depend on that point, as the simulation experiments that we use in the linear regression have to be carried out in its neighborhood. They should not be placed too close, otherwise the noise in the evaluation of f(·, ·) will be overwhelming. However, they should not be placed too far away either, as the linear approximation is likely to be valid only in a neighborhood of xk. Statistical methods for design of experiments can be put to good use to limit the number of simulation runs needed to feed the linear regression model. Variance reduction strategies, like common random numbers, may also help.
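A minimal sketch of gradient estimation by a local linear metamodel is the following, where sim is a hypothetical noisy simulator returning an estimate of the objective at a given point, and the sampling radius delta and the number of runs are illustrative choices:

estGradient <- function(sim, xk, delta=0.5, nRuns=50) {
  # sample design points uniformly in a box around xk, run the simulator,
  # fit a linear metamodel by least squares, and return the slopes
  n <- length(xk)
  X <- sweep(matrix(runif(nRuns*n, -delta, delta), nrow=nRuns), 2, xk, "+")
  y <- apply(X, 1, sim)
  fit <- lm(y ~ X)
  coef(fit)[-1]    # estimated gradient (the intercept is dropped)
}

The resulting estimate may then be plugged into a steepest descent step of the form xk − αk·estGradient(sim, xk), along the lines discussed next; proper design of experiments and common random numbers would improve on this brute-force sampling.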
The estimated gradient may be used within standard nonlinear programming algorithms to generate a sequence of points like
for a step length αk > 0. This is the steepest descent (or gradient) method, which suffers from poor convergence. One way to improve performance is to build a quadratic metamodel
If the square matrix Γ is positive semidefinite, the above quadratic form is convex and it is easily minimized by solving a system of linear equations, yielding a new approximation xk+1 of the optimal solution. A new quadratic approximation is built, centered on the new point, and the procedure is repeated until some convergence condition is met. This kind of approach is related to sequential quadratic programming methods for traditional nonlinear programming. It may be advisable to use a linear metamodel at the early stages of the process, since this may result in a faster movement toward the minimizer; then, when convergence of the gradient method becomes problematic, the procedure may switch to the quadratic metamodel.
A noteworthy feature of metamodeling is that it can cope with constraints on the decision variables, since we solve a sequence of nonlinear programming models, whereas alternative approaches have to resort to penalty functions. An obvious disadvantage is that it is likely to be quite expensive in computational terms, if many simulation experiments are needed to filter random noise and find a good metamodel. Alternative methods, such as perturbation analysis, have been proposed to estimate sensitivities with a single simulation run. We will consider a few simple application examples to financial derivatives in Chapter 12.
Earlier in the chapter we have introduced decision models under uncertainty. In this section we describe one possible modeling framework, stochastic programming with recourse. For the sake of simplicity, we will just deal with linear programming (LP) models under uncertainty, but the approach can be extended to nonlinear optimization models. We begin with two-stage models in Section 10.5.1. Then, we generalize the approach to multistage problems in Section 10.5.2. Finally, we stress important issues related to scenario generation in Section 10.5.3. An alternative approach to dynamic decision making under uncertainty, stochastic dynamic programming, will be the subject of the last sections of the chapter.
To get a clear picture on how uncertainty can be introduced into an optimization model, let us consider an uncertain LP problem:
where all problem data depend on random variables ξ with support Ξ. Stated as such, the model is not even posed in a sensible way, since the objective function is a random variable, i.e., a function of the underlying events, and the minimization is not well defined. When we select a decision, we only select the probability distribution of cost or profit, and in order to assign a proper meaning to the problem, we must specify how probability distributions are ranked. The most natural approach is to take the expected value of the objective, if we ignore risk aversion.9 However, even if we settle for some sensible optimality criterion, there are serious issues with feasibility. We should wonder whether we can require feasibility for any possible contingency, i.e.,
This is clearly not possible in general, as we get an inconsistent set of equations. The case of inequality constraints is a bit easier. Nevertheless, requiring
yields a fat solution that may be overly expensive, assuming that such a feasible solution exists. More often than not, we must accept a controlled violation of constraints. One possibility is to settle for a solution with a probabilistic guarantee:
for a suitably small α. This idea leads to chance-constrained models, which have a sensible interpretation in terms of solution reliability. Chance-constrained models are indeed useful in some cases, but they suffer from definite limitations:
An alternative modeling framework, to which one may resort to overcome these issues, is stochastic programming with recourse. Let us consider the stochastic constraints
As we have pointed out, it is impossible to satisfy these constraints with an assignment of x that is valid for any realization of the random variable ξ. However, we may think of “adjusting” the solution, after observing uncertainty, by some action represented by decision variables y(ξ):
These variables are called recourse variables, and they are random in the sense that they are adapted to the random variable ξ, as they are made after uncertainty is resolved. They are, therefore, second-stage decisions corresponding to contingency plans, whereas the x are first-stage, here-and-now, decisions. The matrix W is called the recourse matrix and represents the "technology" that we use to adjust first-stage decisions. Its structure has a remarkable impact on the difficulty of the problem, and it is worth noting that we are assuming deterministic recourse, as the matrix W is not random. Problems with stochastic recourse are generally more difficult and are subject to some pathologies.
In Section 10.2.3 we have formulated the newsvendor model within a scenario-based framework. The first-stage decision q, the amount we produce, should ideally match demand Ds in every scenario, which is clearly impossible. In the model of Eq. (10.17) we just penalize deviations from the target:
Thus, we immediately see that in this model the structure of the recourse matrix is
where I is the identity matrix of order S. This case is referred to as simple recourse, as the second-stage variables are just used for bookkeeping and cost evaluation, and are not really linked to decisions. However, we could apply more clever decisions, such as asking a third-party to produce or sell us additional items, possibly using express shipping, or trying to persuade the customer to accept a substitute item. When we include these recourse actions in the model, and we account for multiple items, we really introduce second-stage decisions.
Clearly, these adjustments are not for free and incur some cost. We should minimize the total expected cost of both our initial and recourse decisions, which yields the following optimization model:
(10.25)
This is a two-stage stochastic LP model with recourse and can be interpreted as follows:
We may also introduce the recourse function Q(x) and rewrite the model as the deterministic equivalent
where
and
This formulation shows that a stochastic linear program is, in general, a nonlinear programming problem. Nonlinear problems are usually less pleasing to solve than linear ones, but this is not the really bad news. In fact, the recourse function Q(x) looks like a “hopeless” function to deal with:
Luckily, in many cases of practical interest, we can prove interesting properties of the recourse function, most notably convexity, which implies continuity (on an open domain). In some cases, Q(x) is differentiable; in other cases it is polyhedral. This does not imply that it is easy to evaluate the recourse function, but we may resort to statistical sampling (scenario generation) and take advantage of both convexity and problem structure.
We may represent uncertainty by a discrete probability distribution, where S is the set of scenarios and πs is the probability of scenario s ∈ S. We obtain the scenario tree (fan) in Fig. 10.14, where ξs is the realization of random variables corresponding to scenario s. One obvious way to sample a scenario tree is by Monte Carlo, in which case the scenarios have the same probability πs = 1/|S|, but other scenario generation strategies are available. This yields the following LP:
FIGURE 10.14 A scenario fan.
Thus we are left with the task of solving a plain LP, even though a possibly large-scale one. A crucial point is the ability to generate a good and robust solution on the basis of a limited number of representative scenarios.
Multistage stochastic programming formulations arise naturally as a generalization of two-stage models. At each stage, we gather new information and we make decisions accordingly, taking into account immediate costs and expected future recourse cost. The resulting decision process may be summarized as follows:12
From the perspective of time period t = 0, the decisions x1, …, xH are random variables, as they will be adapted to the realization of the underlying stochastic process. However, the only information that we may use in making each decision consists of the observed history so far. The resulting structure of the dynamic decision process can be appreciated by the following recursive formulation of the multistage problem:
In this formulation, we observe that a decision xt depends directly only on the previous decision xt−1. In general, decisions may depend on all of the past history, leading to a slightly more complicated model. However, we may often introduce additional state variables, in such a way that the above formulation applies. What the above formulation hides is the dependence structure of the random variables ξt, t = 1, …, H, which are the underlying risk factors of the random matrices and vectors A, b, and c:
From the viewpoint of stochastic programming, all of the above cases may be tackled. In contrast, the stochastic dynamic programming approach that we consider later relies on a Markovian structure and is more limited. We shall also see that dynamic programming takes advantage of a recursive formulation, which is implicit in the above multistage model. On the other hand, a definite advantage of the dynamic programming framework is that decisions are given in feedback form, as a function of the current system state. Indeed, it should be noted that, in practice, the relevant output of a stochastic programming model is the set of immediate decisions x0. The remaining decision variables could be regarded as contingency plans, which should be implemented as time goes by; however, it is more likely that the model will be solved again and again according to a rolling-horizon logic.
While this formulation points out the dynamic optimization nature of multistage problems, we usually resort to deterministic equivalents based on discretized scenario trees, like the one depicted in Fig. 10.15. Each node nk in the tree corresponds to an event, where we should make a decision. We have an initial node n0 corresponding to time t = 0. Then, for each event node, we branch descendant nodes. In the figure, each node has two successors, but the branching factors are arbitrary. The terminal nodes (the leaves) have no successor and their number gives the number of scenarios in the tree. A scenario is a path from the root node to a leaf node. For instance, scenario 2 consists of the node sequence (n0, n1, n3, n8). Each branch is associated with a conditional probability of occurrence P(nk | ni), where ni = a(nk) is the immediate predecessor of node nk. If scenarios are sampled by plain Monte Carlo, this probability is related to the branching factor. For instance, we may assume that all conditional probabilities in Fig. 10.15 are 0.5. The probability of a scenario is the product of the conditional probabilities of the branches we meet on the path. Hence, in this example we have 8 scenarios with probability 1/8.
FIGURE 10.15 Scenario tree for a simple asset–liability management problem.
Decision variables are associated with each node of the tree, as we show in the next section. It is important to realize that the tree structure enforces nonanticipativity of decisions. For instance, at node n1 we do not know if we are on scenario 1, 2, 3, or 4. The decisions x(n1) at node n1 will be the same for each of those 4 scenarios, which cannot be distinguished at node n1. Alternatively, we could associate decisions x(s, t) with scenarios and time steps. Therefore, rather than second-stage decisions x(n1) and x(n2), we would include decisions x(s, 1), for s = 1, 2, …, 8. However, we should include nonanticipativity constraints explicitly:13
In the next section we illustrate a simple financial model where decisions are associated with nodes, rather than scenarios, resulting in a more compact model formulation. The choice between the two approaches, however, depends on the solution algorithm that we choose for solving a possibly large-scale optimization model. In fact, we realize that if the branching structure is not kept under control, the number of nodes will grow exponentially.
To illustrate how multistage stochastic programming can be used to tackle financial problems, let us consider a simple asset–liability management problem.
As we have already pointed out, a scenario in the tree is a path from the root node n0 to a terminal node s ∈ S; let πs be the corresponding probability. We introduce the following decision variables:
Variables zin, yin, and xin are only defined at nonterminal nodes, as we assume that no rebalancing occurs at terminal nodes, where the portfolio is liquidated, we pay the liability Ls, and we measure the terminal wealth Ws. Let u(w) be the utility of wealth w; this function is used to express the utility of terminal wealth.
On the basis of this notation, we may write the following model:
(10.32)
The objective (10.27) is the expected utility of terminal wealth. Equation (10.28) expresses the initial asset balance, taking the current holdings into account; the asset balance at intermediate trading dates, i.e., at all nodes with exception of root and leaves, is taken into account by Eq. (10.29). Equation (10.30) ensures that enough cash is generated by selling assets in order to meet the liabilities; we may also reinvest the proceeds of what we sell in new asset holdings. Note how proportional transaction costs are expressed for selling and purchasing. Equation (10.31) is used to evaluate terminal wealth at leaf nodes, after portfolio liquidation and payment of the last liability. In practice, we would repeatedly solve the model on a rolling-horizon basis, so the exact expression of the objective function is a bit debatable. The role of terminal utility is just to ensure that we are left in a good position at the end of the planning horizon, in order to avoid nasty end-of-horizon effects.
The key ingredient in any stochastic programming model is a scenario tree, as the quality of the solution depends critically on how well uncertainty is captured. In principle, given a model of the underlying stochastic process, straightforward Monte Carlo simulation could be used to generate a scenario tree, on the basis of the sampling and path generation methods of Chapters 5 and 6. Unfortunately, a suitably rich tree may be needed to capture uncertainty and, given the exponential growth of nodes, this is not practical in the multistage case. An additional difficulty in financial problems is that the tree should be arbitrage-free, which places additional requirements on the branching structure.
One possible way out is a radical change in the solution paradigm. Rather than generate the tree and then solve a large-scale optimization problem, we may interleave optimization and sampling and stop when a convergence criterion is met; this kind of approach relies on sophisticated decomposition strategies that are beyond the scope of this book,15 and it requires the development of specific software. If we want to stick to commercial solvers, we may resort to the following ideas for clever scenario generation:
In the next sections we outline these ideas. Whatever scenario generation strategy we select, it is important to check its effect on solution stability, as we discuss in Section 10.5.3.4. We will use the simple newsvendor model to illustrate these concepts in Section 10.5.3.5.
We also remark a couple of possibly useful guidelines:
We have introduced Gaussian quadrature in Section 2.2 as a tool for numerical integration. When the integral represents the expected value of a function g(x, ξ) depending on continuous random variables ξ with joint density fξ(·), a quadrature formula represents a discretization of the underlying continuous distribution:
∫ g(x, ξ) fξ(ξ) dξ ≈ Σ_{s=1}^{S} πs g(x, ξs),   (10.33)
where the weights πs correspond to scenario probabilities, and the integration nodes ξs are the realizations of the random variables in the discrete approximation. Note that, unlike Monte Carlo sampling, different probabilities are associated with different scenarios. Since Gaussian quadrature formulas ensure exact integration of polynomials up to a given order, this ensures moment matching. As we have remarked, this approach can be applied up to a limited dimensionality, however. Nevertheless, it may work very well for two-stage models.
Low-discrepancy sequences are another tool for numerical integration that can be used to generate scenarios; see [29]. There is also some room for variance reduction strategies:
See also [40, Chapter 4] to appreciate the role of the aforementioned techniques for statistical inference in stochastic programming.
Arguably, the most sophisticated approaches to scenario generation are those based on the solution of an optimization problem. We have already appreciated the role of moment matching in Section 2.5. The idea can be applied to scenario generation as well, since we aim at finding a good discretization of a continuous distribution.
To illustrate the approach, consider a variable distributed as a multivariate normal, X ~ N(μ, Σ). The expected values and covariances of the discrete approximation should match those of the continuous distribution. Furthermore, since we are dealing with a normal distribution, we know that skewness and kurtosis for each component should be 0 and 3, respectively.
Let us denote by xis the realization of Xi, i = 1, …, n, in scenario s, s = 1, …, S. Natural requirements are:
Approximate moment matching is obtained by minimizing the following squared error:
(10.34)
The objective function includes four weights wk which may be used to fine tune performance. It should be mentioned that the resulting scenario optimization problem need not be convex. However, if we manage to find any solution with a low value of the “error” objective function, this is arguably a satisfactory solution, even though it is not necessarily the globally optimal one. The idea can be generalized to “property matching,” since one can match other features that are not necessarily related to moments; see, e.g., [20].
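As a minimal univariate illustration of the idea, the following sketch looks for S equiprobable scenario values approximately matching the first four moments of a standard normal; the weights, the starting points, and the choice of optimizer are arbitrary:

momMatch <- function(S, w=c(1,1,1,1)) {
  # weighted squared error of the first four sample moments with respect
  # to the targets (0, 1, 0, 3) of a standard normal
  obj <- function(z) {
    m1 <- mean(z); m2 <- mean((z-m1)^2)
    m3 <- mean((z-m1)^3)/m2^1.5; m4 <- mean((z-m1)^4)/m2^2
    w[1]*m1^2 + w[2]*(m2-1)^2 + w[3]*m3^2 + w[4]*(m4-3)^2
  }
  out <- optim(par=rnorm(S), fn=obj, method="BFGS",
               control=list(maxit=1000))
  out$par   # scenario values, each with probability 1/S
}

Scenarios for N(μ, σ²) are then obtained by the transformation μ + σz; in the multivariate case, covariance terms would also be included in the squared error of Eq. (10.34).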
Moment (or property) matching has been criticized, since it is possible to build counterexamples, i.e., pairs of quite different distributions that share the first few moments. An alternative idea is to rely on metrics fully capturing the distance between probability distributions. Thus, given a scenario tree topology, we should find the assignment of values and probabilities minimizing some distance with respect to the "true" distribution. Alternatively, given a large scenario tree, we should try to reduce it to a more manageable size by optimal scenario reduction. From a formal point of view, there are different ways to define a distance between two probability measures P1 and P2. One possibility has its roots in the old Monge transportation problem, which consists of finding the optimal way of transporting mass (e.g., soil, when we are building a road). The problem has a natural probabilistic interpretation, which was pointed out by Kantorovich, when we interpret mass in a probabilistic sense (see [38] for more details). Thus, we may define a transportation functional:
Here c(·, ·) is a suitably chosen cost function; the problem calls for finding the minimum of the integral over all joint measures η defined on the Cartesian product Ξ × Ξ, where Ξ is the support of the random variable ξ, and whose marginals coincide with P1 and P2, respectively (π1 and π2 denote the corresponding projection operators).
As the reader can imagine, we are threading on difficult ground, requiring some knowledge of functional analysis. Nevertheless, the idea is not too difficult to grasp if we deal with discrete distributions, where mass is concentrated on points (atoms). The Kantorovich distance, in the discrete case, boils down to
where
Therefore, the evaluation of distance requires the solution of a transportation problem, which is a well-known instance of a linear programming problem. Given this distance, it is possible to build heuristics in which scenarios are successively eliminated, redistributing probability mass in an optimal way, after having prescribed the scenario cardinality or the quality of the approximation. See, e.g., [17] for more details.
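As an illustration in the simplest case of two discrete distributions on the real line, with cost c(ξ, ζ) = |ξ − ζ|, the distance may be computed by solving the underlying transportation problem; the following sketch uses lpSolve (one possible choice of solver, not necessarily the tool used in the cited references):

library(lpSolve)
kantorovich <- function(xi, p, zeta, q) {
  # move probability mass p, sitting at the points xi, onto the points zeta
  # with target mass q, at minimum total transportation cost
  costs <- abs(outer(xi, zeta, "-"))
  sol <- lp.transport(costs, direction="min",
                      row.signs=rep("=", length(p)), row.rhs=p,
                      col.signs=rep("=", length(q)), col.rhs=q,
                      integers=NULL)   # continuous, not integer, mass flows
  sol$objval
}
kantorovich(c(0,1,2), c(0.3,0.4,0.3), c(0.5,1.5), c(0.5,0.5))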
We should mention that a counterargument to proponents of the optimal approximation approach is that we are not really interested in the distance between the ideal distribution and the scenario tree. What really matters is the quality of the solution. For instance, if we are considering a mean–variance portfolio optimization problem along the lines of Section 10.2.1, it can be argued that matching the first two moments is all we need. See, e.g., [26, Chapter 4] for a discussion of these issues.
Whatever scenario reduction strategy we employ, it is important to check the stability of the resulting solution. There are quite sophisticated studies on the stability of stochastic optimization, relying on formal distance concepts. Here we just consider a down-to-earth approach along the lines of [26]. The exact problem is
which involves the expectation of the objective under the true measure . The actual problem we solve is based on an approximate scenario tree τ, possibly resulting from random sampling:
where the hat notation points out that we are just estimating the true expected value of the objective, given the generated scenario tree. Each scenario tree induces an optimal solution. Thus, if we sample trees τi and τj, we obtain solutions xi* and xj*, respectively. If the tree generation mechanism is reliable, we should see some stability in what we get. We may consider stability in the solution itself, but it is easier, and perhaps more relevant, to check stability in the value of the objective function. After all, if we want to minimize cost or maximize profit, this is what matters most.
There are two concepts of stability. In-sample stability means that if we sample two scenario trees, the values of the solutions that we obtain are not too different:
Êτi[ f(xi*, ξ) ] ≈ Êτj[ f(xj*, ξ) ].   (10.35)
This definition does not apply directly if the scenario tree is generated deterministically; in that case, we might compare trees with slightly different branching structures to see if the tree structure that we are using is rich enough to ensure solution robustness. This concept of stability is called in-sample, as we evaluate a solution using the same tree we have used to find it. But since the tree is only a limited representation of uncertainty, we should wonder what happens when we apply the solution in the real world, where a different scenario may unfold. This leads to out-of-sample stability, where we compare the objective value from the optimization model against the actual expected performance of the solution. If the trees are reliable, we should not notice a significant difference in performance:
E[ f(xi*, ξ) ] ≈ Êτi[ f(xi*, ξ) ],   (10.36)
where the expectation on the left-hand side is taken under the true probability measure.
This kind of check is not too hard in two-stage models, as we should just plug the first-stage solution into a large number of second-stage problems, one per sampled scenario. Typically, solving a large number of second-stage problems is not too demanding, especially if they are just linear programs; as a rule, it takes more time to solve one large stochastic LP than a large number of deterministic LPs. Unfortunately, this is not so easy in a multistage setting. To see why, consider the second-stage decisions. They are associated with a realization of random variables, which will be different from the out-of-sample realization, and there is no obvious way to adapt the solution.16 Some ideas can be devised, of course, but a realistic evaluation should rely on a rolling-horizon simulation whereby we solve a sequence of multistage stochastic programming problems and apply only the first-stage solution. This is technically feasible but quite expensive computationally. A possible (cheaper) alternative is to check solutions on different trees to see if
In the next section we apply these ideas on a very simple problem, for the purpose of illustration.
We have introduced the newsvendor model in Example 10.9, as a basic optimization model under uncertainty. Given its structure, we are able to find the optimal solution in explicit form. If we assume that demand is normally distributed, D ~ N(μ, σ²), then we know that the optimal decision is to make/buy a quantity
q* = μ + zβ σ
of items, where zβ is the quantile of the standard normal for the probability level
β = m/(m + cu),
defined by the profit margin m (underage cost) and by the cost of unsold items cu (overage cost). We have also seen that maximization of expected profit is equivalent to minimization of an expected deviation cost
In Section 10.2.3 we have formulated the problem as a scenario-based LP, and it is interesting to check the accuracy of scenario generation mechanisms on this prototypical example, as well as to investigate stability. To this end, we should also express the objective function analytically, which is not too difficult to accomplish for normally distributed demand.
Let us denote the normal PDF of demand by fD(x) and express the expected deviation cost in the case of normal demand:
Since we have assumed a normal distribution, the integration domain starts from −∞, which looks unreasonable for a non-negative demand. The assumption is plausible only if the probability of a negative demand is practically negligible. The last integral represents expected lost sales as a function of q and is related to the loss function. We digress a little to illustrate the essential properties of the loss function in the normal case.17
Technical note: The loss function. We define the standard loss function for a standard normal variable Z as
L(z) = E[max(Z − z, 0)] = ∫_z^{+∞} (x − z) φ(x) dx,
where φ(x) is the standard normal density. For a standard normal, the loss function can be expressed in terms of the PDF φ(x) and the corresponding CDF Φ(x), by splitting the above integral as
∫_z^{+∞} x φ(x) dx − z ∫_z^{+∞} φ(x) dx = ∫_z^{+∞} x φ(x) dx − z [1 − Φ(z)].
The first integral can be evaluated by substitution of variable:
Changing the integration limits accordingly, we find
∫_z^{+∞} x φ(x) dx = φ(z).
Putting all of it together, we end up with
L(z) = φ(z) − z [1 − Φ(z)].   (10.37)
Given the standard loss function, we may express lost sales in the case of a generic normal random variable:
E[max(D − q, 0)] = σ L((q − μ)/σ),
where we have used the substitution of variable t = (x − μ)/σ.
Using the loss function of a generic normal, we may express the deviation cost for a normal demand explicitly:
C(q) = cu (q − μ) + (m + cu) σ L((q − μ)/σ).
The R code of Fig. 10.16 takes advantage of this knowledge and yields the optimal decision and the related cost, given purchase cost, selling price, markdown price, as well as expected value and standard deviation of demand:
FIGURE 10.16 Solving the newsvendor problem optimally in the normal case.
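A sketch consistent with the formulas derived above, and with the interface used in the transcript below, is the following (the actual code of the figure may differ in its details):

newsOptimal <- function(pur, sell, mdown, mu, sigma) {
  m <- sell - pur          # profit margin (underage cost)
  cu <- pur - mdown        # cost of unsold items (overage cost)
  beta <- m/(m + cu)       # critical ratio
  q <- mu + sigma*qnorm(beta)
  z <- (q - mu)/sigma
  loss <- dnorm(z) - z*(1 - pnorm(z))         # standard normal loss function
  cost <- cu*(q - mu) + (m + cu)*sigma*loss   # expected deviation cost
  cat("optimal q", q, "\n")
  cat("optimal cost", cost, "\n")
  invisible(list(q=q, cost=cost))
}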
> pur <- 10
> sell <- 12
> mdown <- 5
> mu <- 100
> sigma <- 30
> newsOptimal(pur,sell,mdown,mu,sigma)
optimal q 83.02154 
optimal cost 71.38016 
In a practical setting, we would probably be forced to round the solution when demand is restricted to integer values, in which case the normal distribution would be an approximation. By the way, since μ > 3σ we may neglect the embarrassing possibility of negative demand.
Given this analytical benchmark, we may use the scenario-based LP model of Section 10.2.3 to check the accuracy of the solution and the estimated expected cost. If we use straightforward Monte Carlo for scenario generation, then the scenarios are equiprobable: πs = 1/S, s = 1, …, S. We may use the function solveLP of package linprog to solve the model, but this requires building the technology matrix explicitly, which is in this case quite simple, as the recourse matrix is just W = [I, I]. The resulting code is shown in Fig. 10.17. Demand scenarios and probabilities are an input to the procedure, so that we may check the performance of Monte Carlo scenario generation against other strategies. The figure also includes code to check in- and out-of-sample stability. Note that in scenario generation we clip demand to non-negative values, as we may not rule out the generation of a negative demand, especially with a large sample. In order to check in-sample stability, let us repeat the generation of 50 random scenarios five times:
FIGURE 10.17 Solving the newsvendor problem by stochastic LP and checking in- and out-of-sample stability.
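The functions newsLP and outScost used below are defined in the figure; a rough sketch of equivalent ones, here using lpSolve::lp rather than the solveLP call mentioned above (the variable ordering and other details are illustrative, and m and cu denote the profit margin and the cost of unsold items defined earlier), is:

library(lpSolve)
newsLP <- function(m, cu, demand, probs) {
  # scenario-based newsvendor LP; decision variables are (q, o_1..o_S, u_1..u_S),
  # where o_s is the unsold quantity and u_s the lost sales in scenario s
  S <- length(demand)
  obj <- c(0, cu*probs, m*probs)                   # expected overage + underage cost
  A <- rbind(cbind(1, -diag(S), matrix(0,S,S)),    # q - o_s <= d_s
             cbind(1, matrix(0,S,S), diag(S)))     # q + u_s >= d_s
  sol <- lp("min", obj, A, c(rep("<=",S), rep(">=",S)), c(demand, demand))
  list(q=sol$solution[1], cost=sol$objval)
}
outScost <- function(q, m, cu, demand) {
  # out-of-sample estimate of the expected deviation cost of an order quantity q
  mean(m*pmax(demand - q, 0) + cu*pmax(q - demand, 0))
}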
obj 66.801  q 82.7
obj 58.2748  q 79.6
obj 77.0129  q 82.3
obj 68.2175  q 80.9
obj 59.6849  q 85.3
We observe some variability, especially in the objective function. As this casts some doubts on the quality of the solution, we may repeat the procedure with 1,000 scenarios, with a corresponding increase in computational effort, which yields
obj 71.4185  q 81.3
obj 70.1544  q 82.4
obj 70.801  q 82.5
obj 71.3892  q 84.7
obj 72.8487  q 84.3
Now we see that the solution is more stable, and fairly close to the true optimum. As a further check, we may assess out-of-sample stability, by generating a large number of scenarios (one million) and estimating the cost of each solution generated by the stochastic LP model. A surprising fact is that the solution obtained by using 50 scenarios proves to be a rather good one:
true cost 71.4412
true cost 71.8959
true cost 71.4542
true cost 71.6127
true cost 71.6429
The same happens with the solutions obtained with 1,000 scenarios:
true cost 71.5528
true cost 71.4514
true cost 71.4477
true cost 71.5522
true cost 71.5014
This finding can be explained by observing that in some cases the true objective function is rather flat around the optimal solution, which makes the model rather robust. However, this need not be the case in other settings. For instance, in financial applications the model might take full advantage of the faulty information included in badly generated scenarios, yielding a very unstable solution. Furthermore, here we may afford a large number of scenarios because this is a very simple model, but a parsimonious approach must be pursued in general. Thus, it is also interesting to see what happens if we generate scenarios using Gaussian quadrature:18
> library(statmod)
> quad <- gauss.quad.prob(10,dist="normal",mu=mu,sigma=sigma)
> out <- newsLP(m,cu,pmax(quad$nodes,0),quad$weights)
> cat('obj ',round(out$cost,4),' q ',round(out$q,1),'\n')
obj 65.5008 q 85.5
> trueCost <- outScost(out$q,m,cu,outsDemand)
> cat('true cost ',round(trueCost,4),'\n')
true cost 71.6757
With only ten scenarios, we get a solution that proves to be good out-of-sample, even though the in-sample evaluation of the objective function does not look too satisfactory. A less pleasing feature of Gaussian quadrature for this example is that, in order to capture the tails of the distribution, it generates negative as well as rather extreme demand values:
> quad$nodes
 [1] -45.783885  -7.454705  25.470225  56.020327  85.451929
 [6] 114.548071 143.979673 174.529775 207.454705 245.783885
> quad$weights
 [1] 4.310653e-06 7.580709e-04 1.911158e-02 1.354837e-01 3.446423e-01
 [6] 3.446423e-01 1.354837e-01 1.911158e-02 7.580709e-04 4.310653e-06
Even though negative values have small probability, this shows a potential issue, which is actually related to the fact that we are using a normal distribution to model an intrinsically non-negative variable. A similar issue could arise when modeling rates of return of stock shares, which cannot be less than −100%.
In Gaussian quadrature we determine both nodes and weights together. We may also pursue a simple approach in which probabilities are fixed to a uniform value, and we look for nodes. One such approach can be devised by integrating random variate generation by the inverse transform method with a sort of "deterministic stratification." The generation of a standard normal random variable by the inverse transform method, as we have seen in Chapter 5, requires the generation of a uniform random variable U ~ U(0, 1) and the inversion of the CDF of the standard normal, Z = Φ−1(U). We may partition the unit interval uniformly, define a representative point of each subinterval, and generate the corresponding standard normal. For instance, if we use five intervals, their end points are
0, 0.2, 0.4, 0.6, 0.8, 1.
Each subinterval has probability 0.2, and we might consider their midpoints as their representative points. If so, the nodes would be
Φ−1(0.1), Φ−1(0.3), Φ−1(0.5), Φ−1(0.7), Φ−1(0.9),
i.e., approximately −1.2816, −0.5244, 0, 0.5244, and 1.2816.
We note that, in the stated form, this is at best a simple heuristic; in order to apply stratification we should find the conditional expectation of the standard normal on each interval. This may actually be done, but it is left as an exercise for the reader, and we just check the performance of this simple approach:
> numVals <- 50
> aux <- (0:numVals)/numVals
> U <- (aux[-1]+aux[-(numVals+1)])/2
> demand <- pmax(0, mu+sigma*qnorm(U))
> probs <- rep(1/numVals,numVals)
> out <- newsLP(m,cu,demand,probs)
> cat('obj',round(out$cost,4),' q ',round(out$q,1),'\n')
obj 70.8987 q 83.4
> trueCost <- outScost(out$q,m,cu,outsDemand)
> cat('true cost ',round(trueCost,4),'\n')
true cost 71.4444
Needless to say, the performance is arguably satisfactory because of the simple and well-behaved problem we are tackling. Still, the idea might be worth considering, and it can be applied to small-dimensional cases (where we build hyper-rectangular cells, rather than subintervals), possibly with some preliminary Monte Carlo runs to find the representative point of each cell. It is interesting to note that in this case we do not generate “extreme” node values:
> range(demand)
[1]  30.20956 169.79044
Whether this is a strength or a weakness depends on the application. As a final example, let us resort to low-discrepancy sequences and try scrambled Sobol sequences:19
> library(randtoolbox)
> numVals <- 50
> eps <- sobol(numVals,dim=1,normal=TRUE,scrambling=2)
> demand <- pmax(0, mu+sigma*eps)
> probs <- rep(1/numVals, numVals)
> out <- newsLP(m,cu,demand,probs)
> cat('obj',round(out$cost,4),' q', round(out$q,1),'\n')
obj 73.4044 q 82
> trueCost <- outScost(out$q,m,cu,outsDemand)
> cat('true cost ',round(trueCost,4),'\n')
true cost 71.4798
Once again, the result does not look bad, but the reader is warned against drawing any conclusion on the basis of this extremely limited experiment. Anyway, we observe that both Monte Carlo and quasi–Monte Carlo methods can be applied to relatively high-dimensional problems, whereas both Gaussian quadrature and stratification are more limited. Unfortunately, the limitations of R as an optimization tool do not allow us to cope with more interesting examples. However, we should mention that state-of-the-art commercial solvers, such as IBM CPLEX and Gurobi, do offer an interface to R.
Dynamic programming (DP) is, arguably, the single most powerful optimization principle that we have at our disposal. In fact, it can be used to tackle a wide variety of problems:
Actually, for some problem families, dynamic programming is the only viable approach. Since there is no free lunch, power and generality must come at some price. On the one hand, dynamic programming is a principle, rather than a well-defined optimization algorithm. Thus, unlike the simplex method for linear programming, there is no off-the-shelf software package, and a considerable degree of customization and implementation effort may be needed to make it work. On the other hand, dynamic programming is associated with the so-called curse of dimensionality. As we shall see, the curse actually takes different forms, but the essence is that, as a rule, conventional dynamic programming can only be applied to rather small-scale problems. These limitations can be overcome, at least in part, by resorting to numerical approximations where Monte Carlo methods play a fundamental role. Here we are interested in discrete-time decision models under uncertainty; hence, there is a considerable connection with stochastic programming with recourse.
To introduce the dynamic programming concept in the simplest way, we show how it can be used to solve a deterministic shortest path problem in Section 10.6.1. Then, in Section 10.6.2, we formalize the principle in the form of the recursive functional equation originally introduced by Bellman; this form allows us to cope with stochastic optimization problems. We outline the adaptation of the approach to infinite-horizon problems in Section 10.6.3. Finally, in Section 10.6.4, we briefly compare stochastic programming with recourse and stochastic dynamic programming; as expected, the two approaches have somewhat complementary strengths and weaknesses, as well as many intersections.
The easiest way to introduce dynamic programming is by considering one of its most natural applications, i.e., finding the shortest path in a network. Graph and network optimization are not customary topics in finance or economics, but a quick look at Fig. 10.18 is enough to understand what we are talking about. A network consists of a set of nodes (numbered from 0 to 7 in our toy example) and a set of arcs joining pairs of nodes. Arcs are labeled by a number which can be interpreted as the arc length (or cost). Our purpose is finding a path in the network, starting from node 0 and leading to node 7, such that the path has total minimal length. For instance, by summing the arc lengths that are followed on the path (0, 1, 4, 7), we see that its total length is 18, whereas path (0, 1, 3, 5, 7) has length 16. At each node, we must choose the next node to visit. We may immediately appreciate that this problem bears some resemblance to dynamic decision making; given some state we are in, we should decide what to do in order to optimize an outcome that depends on the whole path. A greedy decision need not be the optimal one; for instance, the closest node to the starting point 0 in our network is node 2, but there is no guarantee that this arc is on an optimal path.
FIGURE 10.18 A shortest path problem.
Of course, we could simply enumerate all of the possible paths to spot the optimal one; here we have just a finite set of alternatives and there is no uncertainty involved, so the idea is conceptually feasible. However, this approach quickly becomes infeasible in practice, as the network size increases. Furthermore, we need a better idea in view of an extension to a stochastic case: If arc costs are random, running a Monte Carlo simulation for each possible path to estimate its expected length is out of the question. So we must come up with some clever way to avoid exhaustive enumeration. Dynamic programming is one possible approach to accomplish this aim. It is worth noting that more efficient algorithms are available for the shortest path problem, but the idea we illustrate here can be extended to problems featuring infinite state spaces (countable or not) and uncertain data. Let 𝒩 = {0, 1, 2, …, N} be the node set and 𝒜 the arc set; let the start and final nodes be 0 and N, respectively. For simplicity, we assume that the network is acyclic and that the arc lengths cij, (i, j) ∈ 𝒜, are non-negative: If we had the possibility of getting trapped in a loop of negative length arcs, the optimal cost would be −∞, and we do not want to consider such pathological cases.
The starting point is to find a characterization of the optimal solution, which can be translated into a constructive algorithm. Let Vi be the length of the shortest path from node i to node N (denoted by i → N). Assume that, for a specific i ∈ 𝒩, node j lies on the optimal path i → N. Then the following property holds: the subpath j → N must itself be the shortest path from node j to node N.
In other words, the optimal solution for a problem is obtained by assembling optimal solutions for smaller subproblems, which suggests a useful decomposition strategy. To understand why, consider the decomposition of i → N into the subpaths i → j and j → N. The length of i → N is the sum of the lengths of the two subpaths:
(10.38)
Note that the second subpath is not affected by how we go from i to j. This is strongly related to the concept of state in Markovian dynamic systems: How we get to state j has no influence on the future. Now, assume that the subpath j → N is not the optimal path from j to N. Then we could improve the second term of Eq. (10.38) by assembling a path consisting of i → j followed by the shortest path from j to N. The length of this new path would be
which is a contradiction, as we have assumed that Vi is the optimal path length.
This observation leads to the following recursive equation for the shortest path from a generic node i to the terminal node N:
Vi = min { cij + Vj : (i, j) ∈ 𝒜 }.   (10.39)
In other words, to find the optimal path from node i to node N, we should consider the immediate cost cij of going from i to all of its immediate successors j, plus the optimal cost of going from j to the terminal node. Note that we do not only consider the immediate cost, as in a greedy decision rule. We also add the future cost of the optimal sequence of decisions starting from each state j that we can visit next; this is what makes the approach nonmyopic. The function Vi is called the cost-to-go or value function and is defined recursively by Eq. (10.39). The value function, for each point in the state space, tells us what the future optimal cost would be, if we reach that state and proceed with an optimal policy. This kind of recursive equation, whose exact form depends on the problem at hand, is the heart of dynamic programming and is an example of a functional equation. In the shortest path problem, we have a finite set of states, and the value function is a vector; in a continuous-state model, the value function is an infinite-dimensional object.
Solving the problem requires finding the value function V0 for the initial node, and to do that we should go backward from the terminal node.20 We can associate a terminal condition VN = 0 with our functional equation, as the cost for going from N to N is zero. Then we unfold the recursion by considering the immediate predecessors i of the terminal node N; for each of them, finding the optimal path length is trivial, as this is just ciN. Then we proceed backward, labeling each node with the corresponding value function. In this unstructured network, we may label a node only when all of its successors have been labeled; we can always find a correct node ordering in acyclic networks.
Let us find the shortest path for the network depicted in Fig. 10.18. We have the terminal condition V7 = 0 for the terminal node, and we look for its immediate predecessors 4 and 6 (we cannot label node 5 yet, because node 6 is one of its successors). We have
Now we may label node 5:
Then we consider node 3 and its immediate successors 4, 5, and 6:
By the same token we have
Apart from getting the optimal length, which is 16, we may find the optimal path by looking for the nodes optimizing each single decision, starting from node 0:
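A generic sketch of this backward recursion in R is the following; nodes are numbered 1, …, N (R vectors are 1-based, so node 0 of the text corresponds to node 1 here), and the small cost matrix at the end is a hypothetical example, not the network of Fig. 10.18:

shortestPathDP <- function(cmat) {
  # cmat[i,j] = arc cost from node i to node j (NA if there is no arc);
  # nodes are assumed topologically ordered (arcs go from lower to higher
  # indices), so a single backward sweep labels every node
  N <- nrow(cmat)
  V <- rep(Inf, N); V[N] <- 0         # terminal condition V_N = 0
  succ <- rep(NA, N)
  for (i in (N-1):1)
    for (j in which(!is.na(cmat[i,])))
      if (cmat[i,j] + V[j] < V[i]) { V[i] <- cmat[i,j] + V[j]; succ[i] <- j }
  path <- 1
  while (path[length(path)] != N) path <- c(path, succ[path[length(path)]])
  list(value=V[1], path=path)
}
cmat <- matrix(NA, 4, 4)     # hypothetical 4-node acyclic network
cmat[1,2] <- 2; cmat[1,3] <- 5; cmat[2,3] <- 1; cmat[2,4] <- 6; cmat[3,4] <- 2
shortestPathDP(cmat)         # optimal value 5, path 1 -> 2 -> 3 -> 4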
The shortest path problem is one of the prototypical examples of dynamic programming, yet it lacks the structure of most problems of interest in finance and economics, which are dynamic problems where uncertainty unfolds over time. We have introduced the three essential families of dynamic models in Chapter 3. Dynamic programming can be applied to continuous-time and discrete-event models, too, but here we only consider discrete-time models, which are relatively simple but do hide some potential traps. In fact, when discretizing time, it is necessary to be very clear about the meaning of time instants and time intervals:21
These definitions are illustrated in Fig. 10.19. Note that, with this timing convention, we emphasize the fact that noise is realized during time interval t, after making the decision at time instant t − 1.
FIGURE 10.19 Illustrating time conventions.
Let us introduce a bit of notation:
Then, the system dynamics is represented by a state equation like
st+1 = St+1(st, xt, ξt+1),   (10.40)
where St+1 is the state transition function over time period t + 1; the subscript may be omitted if the dynamics do not change over time, i.e., if the system is stationary. The latter is the common assumption when dealing with infinite-horizon problems.
By making the decision xt in state st we incur an immediate cost Ct(st, xt). We would like to minimize the expected cost incurred over a given planning horizon,
min E[ Σ_{t=0}^{T} γ^t Ct(st, xt) ],   (10.41)
where γ < 1 is a suitable discount factor. Clearly, when dealing with profits or utility functions, we rewrite the problem in terms of a maximization. However, in the statement above we are not stating precisely what kind of decision process we are dealing with. In a deterministic setting, we would minimize with respect to decision variables xt, and we would come up with a plan specifying all of the future decisions. However, we have already seen in multistage stochastic programming with recourse that decisions should be adapted to the occurred contingencies. As a consequence, apart from the first-stage ones, decisions are in fact random variables. In stochastic dynamic programming we make this even clearer, as decisions are obtained by the application of a feedback policy.22 We define a policy, denoted by π, whereby an action is selected on the basis of the current state:
xt = πt(st).   (10.42)
Hence, a more precise statement of the problem is
min_{π ∈ Π} E[ Σ_{t=0}^{T} γ^t Ct(st, xt) ],   (10.43)
where Π is the set of feasible policies. By feasible we mean that each decision xt is constrained to be in Xt(st), the set of feasible decisions at time t if we are in state st.
Stated as such, the problem does not look quite tractable. However, the shortest path example suggests a suitable decomposition strategy. When in state st at time t, we should select the decision xt ∈ Xt(st) that minimizes the sum of the immediate cost and the expected cost-to-go that we will incur from time period t + 1 onward:
Vt(st) = min_{xt ∈ Xt(st)} { Ct(st, xt) + γ E[ Vt+1(st+1) | st ] }.   (10.44)
More generally, we talk of a value function, as the term is better suited to both minimization and maximization problems. The conditioning with respect to st stresses the fact that we are taking an expectation conditional on the current state; it might even be the case that uncertainty depends on our decisions, and if so we should also condition with respect to xt. We will not analyze the conditions under which Eq. (10.44) can be used to solve problem (10.43). The intuition gained from the shortest path problem suggests that the stochastic process ξt must be Markovian. With more general path dependency, the decomposition argument behind dynamic programming fails. Sometimes, a model reformulation based on additional state variables can be used to transform a non-Markovian model into a Markovian one, at the price of an increase in complexity. The careful reader will immediately notice a similarity between this modeling framework and stochastic programming with recourse, where the recourse cost plays the role of the value function. The recursive structure of a multistage problem, which is pretty evident in Eq. (10.26), is also visible here. However, multistage stochastic programming with recourse does not require the Markov property. On the other hand, we should note that standard stochastic programming models with recourse assume exogenous uncertainty.
Equation (10.44) is the functional equation of dynamic programming, also called the optimality equation, and it requires finding the value function at each time instant. Typically, the functional equation is solved backward in time, taking advantage of the fact that the problem in the last time period is a myopic one. At the last decision time instant, $t = T$, we should solve the problem
$$ V_T(s_T) = \min_{x_T \in X_T(s_T)} \Bigl\{ C_T(s_T, x_T) + \gamma\, \mathrm{E}\bigl[ V_{T+1}(s_{T+1}) \,\big|\, s_T \bigr] \Bigr\} \tag{10.45} $$
for every possible state $s_T$. The value function $V_{T+1}(s_{T+1})$ is given by a boundary condition. For problem (10.41), the boundary condition is
$$ V_{T+1}(s_{T+1}) \equiv 0, $$
since the objective function includes only a trajectory cost. Sometimes, a cost (or reward) ΦT+1(sT+1) is associated with the terminal state, and the problem is
$$ \min \; \mathrm{E}\Biggl[ \sum_{t=0}^{T} \gamma^t C_t(s_t, x_t) + \gamma^{T+1} \Phi_{T+1}(s_{T+1}) \Biggr]. \tag{10.46} $$
For instance, in life-cycle consumption-saving models for pension economics, the objective function could include both a utility function depending on consumption and some utility from bequest. In this case, we impose the boundary condition
$$ V_{T+1}(s_{T+1}) = \Phi_{T+1}(s_{T+1}). $$
Whatever boundary condition we select, problem (10.45) is a static optimization model. By solving it for every state sT, we build the value function VT(sT) at time t = T. Then, unfolding the recursion backward in time, we find the value functions VT−1(sT−1), VT−2(sT−2), and so on, down to V1(s1). Finally, given the initial state s0, we find the next optimal decision by solving the static problem
$$ \min_{x_0 \in X_0(s_0)} \Bigl\{ C_0(s_0, x_0) + \gamma\, \mathrm{E}\bigl[ V_1(s_1) \,\big|\, s_0 \bigr] \Bigr\}. \tag{10.47} $$
Unlike stochastic programming with recourse, we also have a clear way to make the subsequent decisions, as the full set of value functions allows us to solve the corresponding static problem at each time instant, finding the decision $x_t$ as a function of the current state $s_t$. Thus, we find the optimal policy in an implicit feedback form, and the dynamic problem is decomposed into a sequence of static problems.
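To illustrate the backward recursion in concrete terms, the following R sketch solves, by brute force, a small finite-horizon inventory problem with a discrete state space (inventory on hand), a state-dependent set of feasible decisions (order quantities), and demand uncertainty with finite support; the problem data and cost parameters are hypothetical, and the conditional expectation in Eq. (10.44) reduces to a weighted sum over demand outcomes.

# Brute-force backward recursion for a hypothetical finite-horizon
# inventory problem. State s: units on hand (0..Smax); decision x: units
# ordered, feasible if s + x <= Smax; noise: random demand with finite support.
# Since holding/shortage costs depend on demand, the immediate cost is
# folded into the expectation together with the value of the next state.
Smax  <- 10                          # capacity, so states are 0..Smax
Tmax  <- 5                           # decisions are made at t = 0..Tmax
gamma <- 0.95                        # discount factor
cOrd <- 2; cHold <- 1; cShort <- 6   # unit ordering/holding/shortage costs
dem  <- 0:4                                   # demand support
pDem <- dnorm(dem, mean = 2, sd = 1.2)        # demand probabilities
pDem <- pDem / sum(pDem)

V    <- matrix(0, nrow = Smax + 1, ncol = Tmax + 2)   # V[s+1, t+1]; V_{T+1} = 0
Xopt <- matrix(NA, nrow = Smax + 1, ncol = Tmax + 1)  # optimal decisions

for (t in Tmax:0) {
  for (s in 0:Smax) {
    best <- Inf; bestX <- NA
    for (x in 0:(Smax - s)) {                 # feasible set X_t(s_t)
      sold  <- pmin(s + x, dem)               # units sold, per demand outcome
      sNext <- s + x - sold                   # next state, per demand outcome
      short <- dem - sold                     # unmet demand
      expCost <- cOrd * x +
        sum(pDem * (cHold * sNext + cShort * short))
      expVal  <- sum(pDem * V[sNext + 1, t + 2])  # E[V_{t+1}(s_{t+1}) | s, x]
      total   <- expCost + gamma * expVal
      if (total < best) { best <- total; bestX <- x }
    }
    V[s + 1, t + 1]    <- best
    Xopt[s + 1, t + 1] <- bestX
  }
}
V[, 1]      # value function at t = 0 for each initial inventory level
Xopt[, 1]   # corresponding optimal order quantities

The matrices V and Xopt store the value function and the minimizing decision for every state and time instant, i.e., the optimal policy in (tabular) feedback form.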
The skeptical reader will wonder what is the price to pay for this extremely powerful approach. The price is actually pretty evident in Eq. (10.44). We should solve an optimization problem for each possible value of the state variable $s_t$. If we are dealing with a continuous state space, this is impossible, unless we can take advantage of some specific problem structure (we shall see an example in Section 10.7.1). But even if the state space is discrete, when the dimensionality is large, i.e., when $s_t$ is a vector with several components, the sheer computational burden is prohibitive. This is called the curse of dimensionality. Actually, a close look at Eq. (10.44) shows that we face a three-fold curse of dimensionality:
1. The state space: the value function must be computed for every possible state $s_t$.
2. The outcome space: the conditional expectation must be computed, or approximated, for every candidate decision, which may be difficult when the noise is multidimensional.
3. The decision space: the minimization over $x_t \in X_t(s_t)$ may itself be a hard optimization problem.
The shortest path problem is so easy because it is deterministic, it features a discrete state space, and the set of possible decisions is discrete as well. In general, the application of DP is not trivial and calls for the integration of several numerical tricks of the trade, as we show in Section 10.7. Even so, DP in its standard form can be tackled only for moderately sized problems. However, recent trends in approximate dynamic programming, which we outline in Section 10.8, integrate statistical learning with optimization, paving the way for a much wider array of applications. Typically, these approaches rely on Monte Carlo simulation.
The recursive form of Eq. (10.44) needs some adjustment when coping with an infinite-horizon problem like
$$ \min \; \mathrm{E}\Biggl[ \sum_{t=0}^{\infty} \gamma^t C(s_t, x_t) \Biggr], \tag{10.48} $$
where we assume that immediate costs are bounded and $\gamma < 1$, so that the series converges to a finite value.23 In this case, we typically drop the subscript $t$ from the immediate cost, as well as from the state-transition equation
$$ s_{t+1} = S(s_t, x_t, \xi_{t+1}), $$
i.e., we assume a stationary model. An infinite-horizon model may be of interest per se, or it can be a trick to avoid the need to specify a terminal state value function. In this case, the functional equation boils down to
$$ V(s) = \min_{x \in X(s)} \Bigl\{ C(s, x) + \gamma\, \mathrm{E}\bigl[ V\bigl(S(s, x, \xi)\bigr) \bigr] \Bigr\}, \tag{10.49} $$
where X(s) is the set of feasible decisions when we are in state s. The good news is that we need only one value function, rather than a value function for each time instant. The bad news is that now we have a value function defined as the fixed point of a possibly complicated operator, as we have V(s) on both sides of the equation. In general, an iterative method is needed to solve Eq. (10.49).
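One such iterative method is value iteration, i.e., the repeated application of the right-hand side of Eq. (10.49) starting from an arbitrary guess, which converges since the corresponding operator is a contraction when γ < 1. The following R sketch applies it to a hypothetical problem with a small discrete state and decision space; the cost matrix and transition probabilities are generated at random, purely for illustration.

# Value iteration for a hypothetical infinite-horizon problem with nS
# discrete states and nX decisions; costs and transition probabilities
# are randomly generated placeholders.
set.seed(55555)
nS <- 6; nX <- 3; gamma <- 0.9
cost <- matrix(runif(nS * nX, 1, 10), nrow = nS)          # cost[s, x] = C(s, x)
P <- array(runif(nS * nS * nX), dim = c(nS, nS, nX))      # P[s, s', x]
for (x in 1:nX) P[, , x] <- P[, , x] / rowSums(P[, , x])  # rows sum to one

V <- rep(0, nS)                    # initial guess of the value function
repeat {
  # Q[s, x] = C(s, x) + gamma * E[ V(s') | s, x ]
  Q <- sapply(1:nX, function(x) cost[, x] + gamma * P[, , x] %*% V)
  Vnew <- apply(Q, 1, min)         # minimize over decisions for each state
  if (max(abs(Vnew - V)) < 1e-8) break
  V <- Vnew
}
policy <- apply(Q, 1, which.min)   # stationary policy in feedback form
V
policy

The resulting vector policy gives, for each state, the decision prescribed by the (approximately) optimal stationary policy in feedback form.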
Before we delve into numerical solution methods for stochastic dynamic programming, it is useful to pause and compare this approach with stochastic programming with recourse, since there is a clear connection between the two.
As a general consideration, stochastic programming models are mostly favored within the operations research community, whereas dynamic programming is favored by economists.24 One reason is that stochastic programming is intrinsically operational: What we care about is the first-stage decision, which is what we actually implement. The next stages are included to find a nonmyopic and robust solution, but they are not really contingency plans to be implemented. When uncertainty is progressively resolved, we typically recalibrate and re-solve the model within a rolling-horizon framework. On the contrary, the use of a model in economics is generally quite different. Consider a life-cycle, strategic asset allocation model.25 Usually, a very limited number of assets is included in the model, possibly only two, a risk-free and a risky one. Indeed, the aim there is not really to manage the portfolio of a pension fund. The aim is to investigate the behavior of individual workers in order to see what drives their decisions (precautionary savings, habit formation, income path over time, etc.). To simulate consumption–saving and asset allocation decisions over several years, dynamic programming is definitely more convenient: once we have found the sequence of value functions, we may simulate the sequence of optimal decisions by solving a sequence of static optimization models, without the need to solve a dynamic model repeatedly. The result is a pattern of consumption–saving decisions, as well as an asset allocation between the risky and risk-free assets. The pattern may be checked against empirical evidence in order to assess the validity of the model and the assumptions behind it.
There is also a clear connection between recourse functions in stochastic programming and value functions in dynamic programming. However, a recourse function is instrumental in finding a first-stage decision. Decisions at later stages are made after observing the new state, which need not correspond to any node of the scenario tree, and there is no obvious way to adjust a recourse decision when an unplanned scenario is realized. Value functions, on the contrary, yield decisions in feedback form; hence, they may be used for any scenario. This is a definite advantage of dynamic programming, which comes at a price. First, we have to rely on a Markovian structure, whereas stochastic programming may cope with more general path dependence. Second, stochastic programming may cope with more complicated, less stylized decision models.
Another feature of dynamic programs is that, as a rule, they deal with multistage models where the same kind of decision must be made over and over. For instance, in a consumption–saving problem, at each time step we have to decide how much to consume and how much to save. However, there are decision problems in which the decisions at different stages are completely different. For instance, consider the design of a logistic network. At one stage, we cope with the fixed costs of installing warehouses and other facilities; hence, the decision variables define the network structure, which must be designed under demand uncertainty. At later stages, we have to cope with variable transportation costs; hence, the second-stage decisions are related to the optimal routing of items, in order to satisfy a known demand, with constraints imposed by the selected network structure. Clearly, this latter kind of model is less common in finance or economics, but it helps explain why stochastic programming is more common in operational decision problems.
Another noteworthy point is that, so far, we have assumed that the important output of the optimization model is a set of decisions, possibly in feedback form. However, there are problems in which it is the value of the objective function that is relevant. One important class of such problems is the pricing of American-style options. As we have seen in Section 10.2.5, pricing options with early exercise features requires the solution of a stochastic optimization problem. Since our ability to represent uncertainty by scenario trees is limited, the validity of a price found by stochastic programming is questionable. Armed with a sequence of value functions, possibly approximated ones, we may simulate the application of an early exercise policy with a large number of scenarios. Since a policy based on approximate value functions is suboptimal, we will get a lower bound on the option price, but at least we may check its reliability, as we shall see in Chapter 11.
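As a rough sketch of this idea, the following R code estimates a lower bound on the price of a Bermudan put under geometric Brownian motion by simulating a given exercise policy: at each exercise date we exercise whenever the immediate payoff beats an approximate continuation value, which here, purely for illustration, is taken as the Black–Scholes value of the corresponding European put over the remaining life. All parameter values are assumptions of ours; any such (suboptimal) policy yields a valid lower bound, and better continuation-value approximations are discussed in Chapter 11.

# Lower bound on a Bermudan put price by simulating an exercise policy
# driven by a crude continuation-value approximation.
set.seed(55555)
S0 <- 100; K <- 100; r <- 0.05; sigma <- 0.3; Tmat <- 1
m <- 50                                 # number of exercise opportunities
dt <- Tmat / m
nrep <- 20000                           # number of simulated price paths

# Black-Scholes value of a European put, used as the continuation proxy
bsPut <- function(S, K, r, sigma, tau) {
  d1 <- (log(S / K) + (r + sigma^2 / 2) * tau) / (sigma * sqrt(tau))
  d2 <- d1 - sigma * sqrt(tau)
  K * exp(-r * tau) * pnorm(-d2) - S * pnorm(-d1)
}

# Simulate GBM paths at the exercise dates (one row per path)
Z <- matrix(rnorm(nrep * m), nrow = nrep)
logInc <- (r - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z
S <- S0 * exp(t(apply(logInc, 1, cumsum)))

payoff <- numeric(nrep)                 # discounted payoff under the policy
alive  <- rep(TRUE, nrep)               # paths not yet exercised
for (t in 1:m) {
  St <- S[, t]
  exercise <- pmax(K - St, 0)
  contVal  <- if (t < m) bsPut(St, K, r, sigma, (m - t) * dt) else 0
  stopNow  <- alive & exercise > 0 & exercise >= contVal
  payoff[stopNow] <- exp(-r * t * dt) * exercise[stopNow]
  alive <- alive & !stopNow
}
mean(payoff)                            # estimated lower bound on the price
sd(payoff) / sqrt(nrep)                 # standard error of the estimate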
Finally, we observe that DP is well-suited to cope with infinite-horizon problems, whereas stochastic programming with recourse is not. However, the two approaches may be integrated, since the value function for an infinite-horizon program may be used to avoid end-of-horizon effects in a finite-horizon stochastic program. Indeed, there are important connections between the approaches, especially when it comes to devising sophisticated decomposition algorithms to cope with large-scale problems. Thus, both approaches should be included in our bag of tools to tackle a wide class of relevant problems.