Chapter Thirteen

Risk Measurement and Management

In this chapter we deal with a fundamental issue in finance, i.e., risk. Actually, risk is a multidimensional concept, and there is a long list of risk factors that are relevant in finance. Risk factors directly related to financial markets are the first that come to mind, such as volatility in stock prices, interest rates, and foreign exchange rates, but they do not exhaust the list at all. For instance, after the subprime mortgage crisis, everyone is well aware of the role of credit risk, and the increasing interplay between financial markets and energy/commodity markets further complicates the overall picture. In fact, the impact of derivative markets on oil and commodity (most notably, food) prices is controversial. The growth of derivative markets casts some doubt on seemingly obvious causal links: If one thinks about interest rate derivatives, it may seem natural to assume that the value of those derivatives depends on prevailing conditions on money and capital markets. However, if one realizes the sheer volume of derivatives traded on regulated exchanges and over the counter, some doubts may arise about tails wagging dogs.

The above risk factors are often addressed by Monte Carlo methods, but we should mention that there are other risk factors that are less amenable to statistical approaches. The increasing role of information/communication technologies and globalization has introduced other forms of risk, such as operational risk, political risk, and regulatory risk. Ensuring business continuity in the face of operational disruptions is quite relevant to any financial firm, but it involves a different kind of risk analysis, and it is certainly difficult to attach probability distributions to political risk factors. Despite these difficulties, it is fundamental not to miss the interplay between financial and nonfinancial stakeholders. In times of general concern about the nature of financial speculation and the abuse of derivatives, it is refreshing to keep in mind that risk affects not only financial institutions, but also nonfinancial stakeholders, like firms using derivatives for hedging purposes. A global player is typically exposed, among other things, to foreign exchange risk. Choosing the right hedging instrument (e.g., forward/futures contracts, vanilla or exotic options) requires an assessment of the amount of exposure that should be covered. In such a context, things are made difficult by volume risk, which is related to how good or bad the business is going to be. Given this level of complexity, it is clear that we cannot adequately address the full set of risk dimensions in a single chapter. We will only deal with a limited subset of the above risk factors, through a few selected examples aimed at illustrating the potential of Monte Carlo applications for risk measurement and management.

Last but not least, we should remark that risk management is sometimes confused with risk measurement. It is certainly true that a sensible way to measure risk is a necessary condition to manage risk properly. But coming up with a number gauging risk, however useful, is not the same as establishing a policy, e.g., to hedge financial or commodity risk. In the first part of the chapter we deal with risk measures. Emphasis will be placed on quantile-based risk measures, such as value-at-risk (V@R) and conditional value-at-risk (CV@R). In the second part of the chapter, we move on and deal with risk management; we also illustrate a few links with certain stochastic optimization models dealt with in Chapter 10.

We start in Section 13.1 by discussing general issues concerning risk measurement, most notably the coherence of risk measures. Then, Section 13.2 is essentially devoted to V@R, a widely used and widely debated quantile-based risk measure. First we define V@R, then we illustrate a few of its shortcomings. Despite all of the criticism concerning its use, V@R is still a relevant risk measure, and it provides us with a good opportunity to appreciate Monte Carlo simulation in action. It is not too difficult to build a crude simulation model to estimate V@R, but there are several issues that we must take into account. As we have pointed out in Section 7.3, estimating quantiles raises some nontrivial statistical issues. Therefore, in Section 13.3 we estimate shortfall probabilities, from which other risk measures can be estimated. Furthermore, when a portfolio includes derivatives, it may be rather expensive to reprice them several times. Thus, we explore the possibility of using option greeks to speed up the computation. Another thorny issue is that in risk measurement we are concerned with the extreme bad tail of the distribution, where rare events are relevant. This may require the adoption of suitable variance reduction strategies, as we illustrate in Section 13.4. In the second half of the chapter we move from risk measurement to risk management. In Section 13.5 we take a brief detour to discuss stochastic programming models involving risk measures; in particular, we illustrate the difficulty of dealing with V@R, whereas the problem of CV@R optimization has a surprisingly simple solution, at least in principle. In Section 13.6 we illustrate how Monte Carlo methods can be used to assess the merits of a risk management policy. As a simple but instructive example, we consider delta hedging of a vanilla call option and compare it against a simpler stop-loss strategy. As expected, continuous-time delta hedging yields the Black–Scholes–Merton price for the option, assuming very idealized, not quite realistic, market conditions. Nevertheless, a simulation model is flexible and allows us to analyze the impact of additional features, like transaction costs, stochastic volatility, and modeling errors. Finally, in Section 13.7 we discuss the interplay between financial and nonfinancial risk factors in a foreign exchange risk management problem.

13.1 What is a risk measure?

When setting up a portfolio of financial assets, stocks or bonds, or when writing an option, risk is the first and foremost concern. This is evident in basic portfolio management models in the Markowitz style, where expected return is traded off against variance. We have introduced this model in Section 3.1.1, and here we just recall that, if we consider a universe of n assets with random rates of return $R_i$, i = 1, …, n, and if $w \in \mathbb{R}^n$ is the vector collecting portfolio weights $w_i$, then the variance of the portfolio return is

$\sigma_p^2 = w^T \Sigma\, w$

where the matrix $\Sigma \in \mathbb{R}^{n \times n}$ collects the covariances σij = Cov(Ri, Rj) of the asset returns. Typically, the actual risk measure considered is the standard deviation σp, which has the same unit of measurement as expected return; the reason for minimizing variance is purely computational, as this is equivalent to minimizing standard deviation and results in an easy quadratic programming problem. Clearly, standard deviation can be considered as a risk measure: the smaller, the better. Standard deviation does capture the dispersion of a probability distribution, but is it really a good risk measure?

Before answering the question, we should actually clarify what a risk measure is. Standard deviation maps a random variable into a real number. Hence, we may define a risk measure as a function ρ(X) mapping a random variable X into $\mathbb{R}$. According to our convenience, the random variable X may represent the value of a portfolio, or profit, or loss. Furthermore, when we consider loss, we may take the current wealth as a reference point, or its expected value in the future. Whatever the modeling choice is, a larger value of ρ(X) is associated with a riskier portfolio. By the way, the standard deviation of portfolio return Rp is just the same as the standard deviation of portfolio loss Lp = − Rp. Indeed, standard deviation is symmetric, as it penalizes both the upside and the downside potential of a portfolio. Hence, it may be a suitable measure if we assume a symmetric distribution for asset returns; it is less so if we include assets with skewed, i.e., asymmetric, returns. Even if we accept the hypothesis that stock returns are, e.g., normally distributed, and this is not quite the case, it is certainly not true that the same property holds for derivatives written on those assets, because of the nonlinearity of the payoff functions, which induces a nonlinearity in option values before maturity as well.

Given the limitations of standard deviation, we should look somewhere else to find alternative risk measures, but which ones make sense? What are, in principle, the desirable features of a risk measure? A plausible list of desiderata derives from intuitive properties that a risk measure should enjoy. Such a list has been compiled and leads to the definition of coherent risk measure. Note that in the following statement we assume that X is related to portfolio value or profit (the larger, the better); if we interpret X as loss, the properties can be easily adjusted by flipping some inequalities and changing a few signs:

Normalization. If the random variable is X ≡ 0, it is reasonable to set ρ(0) = 0; if you do not hold any portfolio, you are not exposed to any risk.
Monotonicity. If $X_1 \leq X_2$ with probability 1, then ρ(X1) ≥ ρ(X2). In plain English, if the value of portfolio 1 is never larger than the value of portfolio 2, then portfolio 1 is at least as risky as portfolio 2.1
Translation invariance. If we add a fixed amount to the portfolio, this will reduce risk: ρ(X + α) = ρ(X) − α.
Positive homogeneity. Intuitively, if we double the amount invested in a portfolio, we double risk. Formally: ρ(αX) = αρ(X), for α ≥ 0.
Subadditivity. Diversification is expected to decrease risk; at the very least, diversification cannot increase risk. Hence, it makes sense to assume that the risk of the sum of two random variables should not exceed the sum of the respective risks: ρ(X + Y) ≤ ρ(X) + ρ(Y).

The last two conditions may be combined into a convexity condition:

$\rho\bigl(\lambda X + (1-\lambda) Y\bigr) \leq \lambda\, \rho(X) + (1-\lambda)\, \rho(Y), \qquad \lambda \in [0, 1]$

Convexity is a quite important feature when tackling optimization problems involving risk measures, as convex optimization problems are relatively easy to solve. We are dealing here only with a single-period problem; tackling multiperiod problems may complicate the matter further, introducing issues related to time consistency, which we do not consider here.2

These are the theoretical requirements of a risk measure, but what about the practical ones? Clearly, a risk measure should not be overly difficult to compute. Unfortunately, computational effort may be an issue, if we deal with financial derivatives whose pricing itself requires intensive computation. Another requirement is that it should be easily communicated to top management. A statistically motivated measure, characterizing a feature of a probability distribution, may be fine for the initiated, but a risk measure expressed in hard monetary terms can be easier to grasp. This is what led to the development of value-at-risk, which we describe in the next section. A further advantage of such a measure is that it sets all different kinds of assets on a common ground. For instance, duration gives a measure of interest rate risk of bonds, and option greeks, like delta and gamma, tell something about the risk of derivatives, but it is important to find a measure summarizing all risk contributions, irrespective of the nature of the different positions.

Before closing the section, we should at least mention that in financial economics there is another approach to deal with risk aversion and decision under risk, based on expected utility theory. Utility functions, despite all of their limitations and the criticisms surrounding them, could be used in principle, but there are at least two related difficulties:

It is difficult to elicit the utility function of a decision maker.
If we are dealing with a portfolio of a mutual fund, whose risk aversion should we measure? The fund manager? The client? And how can we aggregate the risk aversion of several clients?

The last observation clearly points out that we need a hopefully objective risk measure, related to the risk in the portfolio, rather than the subjective attitude of a decision maker toward risk. This is necessary if risk measures are to be used as a tool by regulators and managers.

13.2 Quantile-based risk measures: Value-at-risk

The trouble with standard deviation as a risk measure is that it does not focus on the bad tail of the profit/loss distribution. Hence, in order to account for asymmetry, it is natural to look for risk measures related to quantiles. The best-known such measure is value-at-risk. It is common to denote value-at-risk as VaR, where the last capital letter should avoid confusion with variance.3 We follow here the less ambiguous notational style of V@R. Informally, V@R aims at measuring the maximum portfolio loss one could suffer, over a given time horizon, within a given confidence level. Technically speaking, it is a quantile of the probability distribution of loss. Let LT be a random variable representing the loss of a portfolio over a holding period of length T; note that a negative value of loss corresponds to a profit. Then, V@R at confidence level 1 − α can be defined as the smallest number V@R1−α such that

(13.1) $P\{L_T \leq \text{V@R}_{1-\alpha}\} \geq 1 - \alpha$

This definition is also valid for discrete probability distributions. If LT is a continuous random variable and its CDF is invertible, we may rewrite Eq. (13.1) as

(13.2) $\text{V@R}_{1-\alpha} = F_{L_T}^{-1}(1 - \alpha)$

In the following discussion we will mostly assume for simplicity that loss is a continuous random variable, unless a discrete one is explicitly involved. For instance, if we set α = 0.05, we obtain V@R at 95%. The probability that the loss exceeds V@R is α.4

Actually, there are two possible definitions of V@R, depending on the reference wealth that we use in defining loss. Let W0 be the initial portfolio wealth. If RT is the random (rate of) return over the holding period, the future wealth is

$W_T = W_0\, (1 + R_T)$

Its expected value is

$\mathrm{E}[W_T] = W_0\, (1 + \mu)$

where μ is the expected return. The absolute loss over the holding period is related to the initial wealth:

$L_T^{a} = W_0 - W_T = -W_0 R_T$

The quantile of absolute loss at level 1 − α is the absolute V@R at that confidence level. We find the relative V@R if we take the expected future wealth as the reference in defining loss:

$L_T^{r} = \mathrm{E}[W_T] - W_T = W_0\, (\mu - R_T)$

If we work with a short time period, say, one day, drift is dominated by volatility;5 hence, the expected return is essentially zero and the two definitions boil down to the same thing. Since certain bank regulations require the use of a risk measure in order to set aside enough cash to be able to cover short-term losses, in this section we consider absolute V@R and drop the superscript from the definition of loss. Nevertheless, we should keep in mind that relative V@R may be more relevant to longer term risk, as the one faced by a pension fund. Furthermore, even if the underlying risk factors do not change at all, the sheer passage of time does have an effect on bonds and derivatives.

The loss LT is a random variable depending on a set of risk factors, possibly through complicated transformations linked to option pricing. To illustrate V@R using a simple example, let us consider a portfolio of stock shares and assume that the risk factor is portfolio return itself. If the holding period return has a continuous distribution, Eq. (13.2) implies that loss will exceed V@R with a low probability,

$P\{L_T \geq \text{V@R}_{1-\alpha}\} = \alpha$

where we may indifferently write “≥” or “>,” since the distribution involved is continuous. This can be rewritten as follows:

(13.3) $P\{R_T \leq r_\alpha\} = \alpha$

where we have defined

$r_\alpha \equiv -\frac{\text{V@R}_{1-\alpha}}{W_0}$

The return rα will be negative in most practical cases and is just the quantile at level α of the distribution of portfolio return, i.e., the worst-case return with confidence level α. Thus, absolute V@R is

(13.4) $\text{V@R}^{a}_{1-\alpha} = -W_0\, r_\alpha$

whereas relative V@R is

(13.5) $\text{V@R}^{r}_{1-\alpha} = W_0\, (\mu - r_\alpha)$

Computing V@R is very easy if we assume that the rate of return is normally distributed with

$R_T \sim \mathsf{N}(0, \sigma^2)$

in which case there is no difference between absolute and relative V@R, and loss $L_T = -W_0 R_T$ is normal as well, $L_T \sim \mathsf{N}(0, W_0^2 \sigma^2)$. We may take advantage of the symmetry of the normal distribution, as the critical return $r_\alpha$ is, in absolute value, equal to the quantile $r_{1-\alpha}$. Then, to compute $\text{V@R}_{1-\alpha}$, we may use the familiar standardization/destandardization drill for normal variables:

$\alpha = P\{R_T \leq r_\alpha\} = P\left\{ Z \leq \frac{r_\alpha}{\sigma} \right\} \quad\Longrightarrow\quad \frac{r_\alpha}{\sigma} = z_\alpha = -z_{1-\alpha}$

where Z is a standard normal variable. We just have to find the standard quantile $z_{1-\alpha}$ and set

$\text{V@R}_{1-\alpha} = -W_0\, r_\alpha = W_0\, \sigma\, z_{1-\alpha}$

Example 13.1 Elementary V@R calculation

We have invested $100,000 in Quacko Corporation stock shares, whose daily volatility is 2%. Then, V@R at 95% level is
$\text{V@R}_{0.95} = W_0\, \sigma\, z_{0.95} = 100{,}000 \times 0.02 \times 1.6449 \approx \$3289.71$
We are “95% sure” that we will not lose more than $3289.71 in one day. V@R at 99% level is
$\text{V@R}_{0.99} = W_0\, \sigma\, z_{0.99} = 100{,}000 \times 0.02 \times 2.3263 \approx \$4652.70$
Clearly, increasing the confidence level by 4% has a significant effect, since we are working on the tail of the distribution.
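These figures are easy to check in R, since the required standard normal quantiles are given by qnorm:

> qnorm(0.95) * 0.02 * 100000
[1] 3289.707
> qnorm(0.99) * 0.02 * 100000
[1] 4652.696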

The assumption of normality of returns can be dangerous, as the normal distribution has a relatively low kurtosis; alternative distributions have been proposed, featuring fatter tails, in order to better account for tail risk, which is what we are concerned about in risk management. Nevertheless, the calculation based on the normal distribution is so simple and appealing that it is tempting to use it even when we should rely on more realistic models. In practice, we are not interested in V@R for a single asset, but in V@R for a whole portfolio. Again, the normality assumption streamlines our task considerably.

Example 13.2 V@R in multiple dimensions

Suppose that we hold a portfolio of two assets. The portfolio weights are w1 = 2/3 and w2 = 1/3, respectively. We also assume that the returns of the two assets have a jointly normal distribution; the two daily volatilities are σ1 = 2% and σ2 = 1%, respectively, and the correlation is ρ = 0.7. Let the time horizon be T = 10 days; despite this, we assume again that the expected holding period return is zero. To obtain portfolio risk, we first compute the variance of the holding period return:
$\sigma_p^2 = T \left( w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 \rho\, w_1 w_2\, \sigma_1 \sigma_2 \right) = 10 \times \left[ \left(\tfrac{2}{3}\right)^2 \times 0.02^2 + \left(\tfrac{1}{3}\right)^2 \times 0.01^2 + 2 \times 0.7 \times \tfrac{2}{3} \times \tfrac{1}{3} \times 0.02 \times 0.01 \right] \approx 0.002511$
Hence, σp = 0.05011. If the overall portfolio value is $10 million, and the required confidence level is 99%, we obtain
$\text{V@R}_{0.99} = z_{0.99}\, \sigma_p\, W_0 = 2.3263 \times 0.05011 \times 10{,}000{,}000 \approx \$1.166 \text{ million}$
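A short R sketch of the same calculation, using the data of Example 13.2:

w <- c(2/3, 1/3)                    # portfolio weights
sigma <- c(0.02, 0.01)              # daily volatilities
rho <- 0.7
Sigma <- diag(sigma) %*% matrix(c(1, rho, rho, 1), 2, 2) %*% diag(sigma)
sigmaP <- sqrt(10 * drop(t(w) %*% Sigma %*% w))   # 10-day portfolio volatility, about 0.05011
qnorm(0.99) * sigmaP * 1e7                        # V@R at 99% on $10 million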
Once again, we stress that the calculations in Example 13.2 are quite simple (probably too simple) since they rely on a few rather critical assumptions. To begin with, we have scaled volatility over time using the square-root law, which assumes independence of returns over time. If you are not willing to believe in the efficient market hypothesis, this may be an uncomfortable assumption. Then, we have taken advantage of the analytical tractability of the normal distribution, and the fact that the return of the stock shares was the only risk factor involved. Other risk factors may be involved, such as inflation and interest rates, and the portfolio can include derivatives, whose value is a complicated function of underlying asset prices. Even if we assume that the underlying risk factors are normally distributed, the portfolio value may be a nonlinear function of them, and the analytical tractability of the normal distribution is lost. Needless to say, Monte Carlo methods play a relevant role in estimating V@R in such cases, as we will illustrate later. However, we should also mention that a completely different route may be taken, based on historical V@R. So far, we have relied on a parametric approach, based on a theoretical, not necessarily normal, probability distribution. One advantage of the normal distribution is that it simplifies the task of characterizing the joint distribution of returns, since we need only a correlation matrix. However, we know that correlations may not fully capture dependence between random variables, and this is especially true under the stress conditions that are relevant in risk management. The copula framework of Section 3.4 may be adopted to improve accuracy. Alternatively, rather than assuming a specific joint distribution, we may rely on a nonparametric approach based on historical data. The advantage of historical data is that they should naturally capture dependence. Hence, we may combine them, according to bootstrapping procedures, to generate future scenarios and estimate V@R by historical simulation.

Whatever approach we use for its computation, V@R is not free from some fundamental flaws, which depend on its definition as a quantile. We should be well aware of them, especially when using sophisticated computational tools that may lure us into a false sense of security. For instance, a quantile cannot distinguish between different tail shapes. Consider the two loss densities in Fig. 13.1. In Fig. 13.1(a) we observe a normal loss distribution and its 95% V@R, which is just its quantile at probability level 95%; the area of the right tail is 5%. In Fig. 13.1(b) we observe a sort of truncated distribution, obtained by appending a uniform tail to a normal density, which accounts for 5% of the total probability. By construction, V@R is the same in both cases, since the areas of the right tails are identical. However, we should not associate the same measure of risk with the two distributions. In the case of the normal distribution there is no upper bound to loss; in the second case, there is a clearly defined worst-case loss. Whether the risk for density (a) is larger than that for density (b) depends on how we measure risk exactly; the point is that V@R does not indicate any difference between them. In order to discriminate between the two cases, we may consider the expected value of loss conditional on being on the right (bad) tail of the loss distribution. This conditional expectation yields the midpoint of the uniform tail in the truncated density; the conditional expected value may be larger in the normal case, because of its unbounded support. This observation has led to the definition of alternative risk measures, such as conditional value-at-risk (CV@R), which is the expected value of loss, conditional on being to the right of V@R.

FIGURE 13.1 Value-at-risk can be the same in quite different situations.

Risk measures like V@R or CV@R could also be used in portfolio optimization, by solving mathematical programs with the same structure as problem (10.12), where variance is replaced by such measures. The resulting optimization problem can be rather difficult. In particular, it may lack the convexity properties that are so important in optimization. It turns out that minimizing V@R, when uncertainty is modeled by a finite set of scenarios (which may be useful to capture complex distributions and dependencies among asset prices), is a nasty nonconvex problem, whereas minimizing CV@R is, from a computational viewpoint, easier as it yields a convex optimization problem.6

There is one last issue with V@R that deserves mention. Intuitively, risk is reduced by diversification. This should be reflected by any risk measure ρ(·) we consider and, as we have seen before, the subadditivity condition is needed to express such a requirement. The following counterexample is often used to show that V@R lacks this property.

Example 13.3 V@R is not subadditive

Let us consider two corporate bonds, A and B, whose issuers may default with probability 4%. Say that, in the case of default, we lose the full face value, $100 (in practice, we might partially recover the face value of the bond). Let us compute the V@R of each bond with confidence level 95%. Since loss has a discrete distribution in this example, we should use the more general definition of V@R provided by Eq. (13.1). The probability of default is 4%, and 1 −0.04 = 0.96 > 0.95; therefore, we find
$\text{V@R}_{0.95}(A) = \text{V@R}_{0.95}(B) = 0$
Now what happens if we hold both bonds and assume independent defaults? We will suffer:

A loss of $0, with probability 0.96² = 0.9216
A loss of $100, with probability 2 × 0.96 × 0.04 = 0.0768
A loss of $200, with probability 0.04² = 0.0016

Now the probability of losing $0 is smaller than 95%, and

$P\{L_{A+B} \leq 0\} = 0.9216 < 0.95, \qquad P\{L_{A+B} \leq 100\} = 0.9216 + 0.0768 = 0.9984 \geq 0.95$

Hence, with that confidence level, V@R(A+B) = 100 > V@R(A) + V@R(B) = 0, which means that risk, as measured by V@R, may be increased by diversification.
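The discrete-distribution quantile is easy to check with a few lines of R, restating the numbers above:

pDef <- 0.04
loss <- c(0, 100, 200)
prob <- c((1 - pDef)^2, 2 * pDef * (1 - pDef), pDef^2)
cdf <- cumsum(prob)                 # 0.9216 0.9984 1.0000
loss[min(which(cdf >= 0.95))]       # V@R of the joint position: 100
# for a single bond, the smallest loss l with P{L <= l} >= 0.95 is 0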

The counterexample shows that V@R lacks one of the fundamental properties of a coherent risk measure. We should mention that if we restrict our attention to specific classes of distributions, such as the normal, V@R is subadditive. Nevertheless, the above considerations suggest the opportunity of introducing other risk measures, like CV@R. It can be shown that CV@R is a coherent risk measure.

13.3 Issues in Monte Carlo estimation of V@R

Using Monte Carlo methods to estimate V@R is, in principle, quite straightforward. Let V(S(t), t) denote the value of a portfolio at time t, depending on a vector of risk factors $S(t) \in \mathbb{R}^n$, with components Si(t), i = 1, …, n. The risk factors could be the underlying asset prices in a portfolio including derivatives, but we might also consider interest rates or exchange rates. After a time step of length δt, the risk factors are changed by a shock

$\delta S_i = S_i(t + \delta t) - S_i(t), \qquad i = 1, \ldots, n$

To streamline notation, we will use $S_i \equiv S_i(t)$, i = 1, …, n, to denote the current value of the risk factors, and $\delta_i = \delta S_i$ to denote their shocks; these variables are collected into vectors S and δ, respectively. If we assume a joint distribution for δ, Monte Carlo estimation of V@R is straightforward, in principle:

1. We sample independent observations of the shocks δ.
2. We reprice each asset included in the portfolio for the new values of the risk factors, and we assess the corresponding loss

$L = V(S, t) - V(S + \delta, t + \delta t)$

3. Given a set of loss observations, we estimate the required quantile.

A quick and dirty way to estimate the quantile would be to sort, say, 1000 observations of loss in decreasing order, and to report the one in position 50 if, for instance, we want V@R at 95% confidence level. For a slightly more careful discussion of quantile estimation, see Section 7.3, where we also underline the related difficulties. An alternative approach, proposed in [6, 7], is to fix a suitable loss threshold x and to estimate the probability of losing more than this value, i.e., the shortfall probability:

(13.6) $P\{L > x\}$

Given M replications, indexed by k = 1, …, M, we can estimate the above probability as

(13.7) $\hat{P}\{L > x\} = \frac{1}{M} \sum_{k=1}^{M} 1\{L_k > x\}$

where 1{·} is the indicator function for the relevant event. Then, we may build confidence intervals for this probability based on parameter estimation for the binomial distribution (using, e.g., the R function bino.fit). Furthermore, on the basis of the probability estimates corresponding to a few selected values of the threshold x, we may build an empirical CDF, which enables us to come up with an estimate of V@R. Given the role of shortfall probabilities, in the following we will only be concerned with them.
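As a minimal sketch of the estimator in Eq. (13.7), the following lines use a plain normal sample as a placeholder for the repriced portfolio losses, and binom.test (one standard option in base R) for the confidence interval:

set.seed(1234)
M <- 10000
loss <- rnorm(M, mean = 0, sd = 100)   # placeholder losses; in practice, from repricing
x <- 200                               # loss threshold
exceed <- sum(loss > x)
exceed / M                             # point estimate of the shortfall probability
binom.test(exceed, M)$conf.int         # confidence interval based on the binomial distribution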

Unfortunately, the first two steps in the above procedure are not quite trivial, either. Specifying a sensible distribution for multiple factors, especially under stress conditions, is far from trivial. Even if we pretend, as we do in the rest of this section, that a multivariate normal is a sensible model for the underlying risk factors, repricing each derivative contract in a real life portfolio may be a daunting task, possibly itself requiring expensive Monte Carlo runs. Using option greeks, it is possible to develop an approximation which may help in estimating V@R. If we assume that we own a portfolio of derivatives and that the only risk factors are the underlying asset prices, we might rely on a delta-gamma approximation to develop linear or quadratic models.7 Let us collect the first-order sensitivities into the vector

(13.8) $\Delta = \left[ \frac{\partial V}{\partial S_1}, \ldots, \frac{\partial V}{\partial S_n} \right]^T$

We should note that Δi should not be interpreted as the delta of a single option in the portfolio; rather, it is the sensitivity of the overall portfolio to a single risk factor. The portfolio may consist of long or short positions in any number of vanilla call or put options, or even exotic derivatives. Thus, Δi will be, in general, a linear combination of option deltas. Then, loss can be approximated by the linear model

(13.9) $L \approx -\Delta^T \delta = -\sum_{i=1}^{n} \Delta_i\, \delta S_i$

A similar line of reasoning may be pursued for bonds, using their durations. Clearly, L boils down to a normal random variable, if the underlying shocks are jointly normal. This makes computations quite simple, but possibly oversimplified. In fact, we are missing one point which may be quite relevant for both options and bonds, the passage of time: In general, long option positions lose value, which is measured by the option theta, which is usually negative. Let Θ be the portfolio theta, which may be positive or negative, depending on the sign of the positions. As we pointed out in the previous section, there are cases in which δt is very small and the effect of time can be ignored; for the sake of generality, here we do not neglect this issue. Furthermore, a linear approximation may be inaccurate, and we may include second-order sensitivities, related to option gammas, as a remedy. If we collect the second-order sensitivities into the symmetric matrix

$\Gamma = \left[ \Gamma_{ij} \right], \qquad \Gamma_{ij} = \frac{\partial^2 V}{\partial S_i\, \partial S_j}$

the resulting second-order approximation is

(13.10) $L \approx -\Theta\, \delta t - \Delta^T \delta - \frac{1}{2}\, \delta^T \Gamma\, \delta$

The sensitivity with respect to time plays the role of a constant, but the quadratic term makes the overall approximation non-normal, even when δ is jointly normal.
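As a minimal sketch, Eq. (13.10) may be coded as a small R function; the sensitivities are taken as given inputs, and the names below are illustrative, not those used in the book's figures:

dgLossApprox <- function(dS, Delta, Gamma, Theta, dt) {
  # dS: vector of shocks; Delta: vector of first-order sensitivities;
  # Gamma: matrix of second-order sensitivities; Theta: portfolio theta
  -Theta * dt - sum(Delta * dS) - 0.5 * drop(t(dS) %*% Gamma %*% dS)
}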

To check the quality of delta–gamma approximations, let us compare the shortfall probability estimates obtained by:

1. A straightforward Monte Carlo procedure based on full repricing of a portfolio of derivatives. The R code is displayed in Fig. 13.2. Note that shocks in the value of underlying assets are generated according to a normal distribution, rather than a lognormal one, to ease the comparison with the approach that we describe in the next section. The probability measure we use, of course, is the real one, as risk-neutral measures are relevant for pricing purposes only.

FIGURE 13.2 Estimating a shortfall probability by straightforward Monte Carlo for a portfolio of call and put options.

2. The corresponding estimate by a delta-gamma approximation, shown in Fig. 13.3. This function relies on R code to evaluate greeks; see Fig. 3.51.

FIGURE 13.3 Estimating a shortfall probability by a delta–gamma approximation.

Here and in the following section we aim at replicating an experiment carried out in [7]. In that paper, the shortfall probability P{L > x} is estimated, among other things, for a portfolio consisting of a short position in ten calls and five puts, all written on ten stock shares with similar characteristics; note that the number of calls and puts is negative. The input data are shown in Fig. 13.4, where we give a script to compare the two estimates. In the output lists out1 and out2 we also include vectors of losses corresponding to each scenario. Note that, by resetting the state of random generators, we are comparing estimates based on the same 3000 scenarios. It is interesting to produce a scatterplot of the true loss against the delta–gamma estimates and to check their correlation:

FIGURE 13.4 Comparing two estimates of shortfall probability.

> plot(out1$loss,out2$loss,pch=20)
> cor(out1$loss, out2$loss)
[1] 0.9979958

The correlation between losses is quite high and indeed the plot in Fig. 13.5 shows that the approximation looks fairly good. The two point estimates also appear to be in fairly good agreement:

FIGURE 13.5 A scatterplot of true loss against the delta–gamma approximation.

> out1$estimate
[1] 0.3206667
> out2$estimate
[1] 0.33

A more careful check shows that there are some problems, though:

> cor(out1$loss>50,out2$loss>50)
[1] 0.9470662
> cor(out1$loss>150,out2$loss>150)
[1] 0.9334365
> cor(out1$loss>250,out2$loss>250)
[1] 0.9035132

Here we are comparing indicator variables detecting when the loss threshold is exceeded. These correlations get smaller and smaller when the threshold is increased. This seems to suggest that the quality of the delta–gamma approximation tends to deteriorate under extreme scenarios. Hence, we may be forced to resort to more sophisticated approximations, when available, or to full option repricing. Nevertheless, this kind of approximation may help in improving the efficiency of Monte Carlo methods, as we illustrate in the next section.

13.4 Variance reduction methods for V@R

In this section we consider variance reduction methods, like importance sampling, to measure the risk of a portfolio of derivatives. The approach we describe was originally proposed in [7], and it is an excellent illustration of clever variance reduction strategies, taking advantage of the delta–gamma approximation we have discussed in the previous section. The treatment follows [6, Chapter 9], where more details can be found, as well as an extension to heavy-tailed distributions. As in the previous section, the objective is to estimate the probability P{L > x} that loss L exceeds a given threshold x. It would be convenient to approximate loss by the quadratic function of Eq. (13.10), but, as we have seen, this may be an inaccurate approximation for market scenarios under stress. Furthermore, risk measurement has to cope with tail risk, which implies that many replications might be actually useless, when exceeding the loss threshold is a rare event. As we have seen in Chapter 8, this is a typical case for importance sampling and, possibly, other variance reduction strategies. The main issue with importance sampling is how to find a suitable change of probability measure. To this end, we may take advantage of the quadratic delta-gamma approximation.

Let us assume, for the sake of simplicity, that $\delta \sim \mathsf{N}(0, \Sigma)$, where δ is the vector collecting the changes δSi in the underlying risk factors. As we have seen in Section 5.5.2, sampling such a multivariate normal may be accomplished by choosing a matrix C such that $CC^T = \Sigma$, generating a vector of independent standard normals $Z \sim \mathsf{N}(0, I)$, and then computing

(13.11) $\delta = C Z$

Given the selected matrix C, the delta–gamma quadratic approximation of Eq. (13.10) can be rewritten as

(13.12) $Q = a + b^T Z - \frac{1}{2}\, Z^T C^T \Gamma C\, Z$

where $a = -\Theta\,\delta t$ and $b = -C^T \Delta$. Standard choices for the matrix C are the lower triangular Cholesky factor or the (symmetric) square root of the covariance matrix. However, a different choice can be advantageous. In fact, the quadratic form is a function of standard normals, but it is not normal itself, as it involves squares $Z_j^2$ and products $Z_i Z_j$ of standard normals. If we could get rid of these products, the quadratic form would be a much simpler function of standard normals, paving the way for useful analysis. This would be the case if the Hessian matrix $C^T \Gamma C$ in Eq. (13.12) were diagonal. In order to find a matrix C such that $CC^T = \Sigma$ and the resulting Hessian is diagonal, we must find a way to transform the more familiar factorizations of the covariance matrix $\Sigma$. Let the square matrix G be one such factorization, i.e., a matrix such that $GG^T = \Sigma$, like the lower triangular Cholesky factor of $\Sigma$. Since the matrix $-\frac{1}{2}\, G^T \Gamma G$ is symmetric, it can be diagonalized:

(13.13) $-\frac{1}{2}\, G^T \Gamma G = U \Lambda U^T$

where U is an orthogonal matrix (i.e., $U^T U = U U^T = I$, where I is the identity matrix) collecting the unit eigenvectors of $-\frac{1}{2}\, G^T \Gamma G$. The diagonal matrix Λ consists of the corresponding eigenvalues:

$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$

Note that we are not diagonalizing a covariance matrix and that these eigenvalues might also be negative. Now, let us consider the matrix

(13.14) $C = G U$

This matrix meets our two requirements:

1. It factors the covariance matrix:

$C C^T = G U U^T G^T = G G^T = \Sigma$

2. The resulting Hessian matrix is diagonal:

$-\frac{1}{2}\, C^T \Gamma C = -\frac{1}{2}\, U^T G^T \Gamma G\, U = U^T U \Lambda U^T U = \Lambda$
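A compact R sketch of this construction, given the covariance matrix Sigma of the shocks and the matrix Gamma of second-order sensitivities (the function name is illustrative):

buildC <- function(Sigma, Gamma) {
  G <- t(chol(Sigma))                      # lower triangular factor, G %*% t(G) = Sigma
  eig <- eigen(-0.5 * t(G) %*% Gamma %*% G, symmetric = TRUE)
  U <- eig$vectors                         # orthogonal matrix of unit eigenvectors
  list(C = G %*% U, lambda = eig$values)   # C %*% t(C) = Sigma; -0.5 * t(C) %*% Gamma %*% C is diagonal
}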

Therefore, if we sample the factor shocks using the matrix C of Eq. (13.14), we may rewrite the quadratic approximation Q in Eq. (13.12) as

(13.15) $Q = a + \sum_{j=1}^{n} \left( b_j Z_j + \lambda_j Z_j^2 \right)$

This form of Q includes squared standard normals, which is a bit annoying, but at least it is the sum of independent terms. We may analyze its distribution by evaluating its characteristic function, or we may just complete the square,

$Q = a + \sum_{j=1}^{n} \left[ \lambda_j \left( Z_j + \frac{b_j}{2\lambda_j} \right)^2 - \frac{b_j^2}{4\lambda_j} \right]$

which hints at a noncentral chi-square random variable. In fact, we recall from Section 3.2.3.2 that the chi-square distribution is obtained as a sum of squares of independent standard normals. In the expression above the standard normals Zi are shifted and there are additional transformations, but the resulting distribution can be analyzed by using moment and cumulant generating functions (see Section 8.6.2). We refrain from doing so, as this requires a deeper background, but we give the result of the analysis. The cumulant generating function for Q is

(13.16) $\psi(\theta) = a\theta + \sum_{j=1}^{n} \left[ \frac{(\theta b_j)^2}{2\,(1 - 2\theta\lambda_j)} - \frac{1}{2} \ln\bigl(1 - 2\theta\lambda_j\bigr) \right]$

On the basis of this knowledge, we might even compute the exact probability P{Q > x}, which immediately suggests a control variate estimator to reduce the variance of the crude estimator of Eq. (13.7):

(13.17) $\hat{P}_{\mathrm{CV}}\{L > x\} = \frac{1}{M} \sum_{k=1}^{M} 1\{L_k > x\} + \beta \left[ P\{Q > x\} - \frac{1}{M} \sum_{k=1}^{M} 1\{Q_k > x\} \right]$

A potential issue with this approach is that the correlation between L and Q may be weak just where it matters most, when loss is high, as we have seen in the previous section; hence, we will not pursue this approach here.8 Intuition suggests that importance sampling should work better in this specific case, and that we should sample in such a way that large losses, exceeding the threshold x, are more likely. This means that we should change the distribution of the normals Zi. If we look again at Eq. (13.15), we observe immediately that we may increase loss as follows:

When bi > 0, we should have a positive Zi as well, which suggests increasing the expected value of Zi from 0 to a positive number.
When bi < 0, we should have a negative Zi as well, which suggests decreasing the expected value of Zi from 0 to a negative number.
When λi > 0, since it multiplies Zi², we could increase loss by increasing the variance of Zi.

This intuition is made more precise in [7] by resorting to exponential tilting. In Section 8.6.3 we have seen that:

1. Exponential tilting transforms a normal variable X into another normal variable.
2. The likelihood ratio is given by $\exp\{-\theta X + \psi(\theta)\}$, where θ > 0 is the tilting parameter and $\psi(\theta)$ is the cumulant generating function (see Eq. 8.24).

Now the problem is how to choose a suitable random variable playing the role of X in the likelihood ratio. Such a variable is the loss Q in the delta–gamma approximation of Eq. (13.15). If we used Q, the application of exponential tilting to the estimation of the shortfall probability would involve the importance sampling estimator $e^{-\theta Q + \psi(\theta)}\, 1\{Q > x\}$, which yields

(13.18) $P\{Q > x\} = \mathrm{E}_\theta\left[ e^{-\theta Q + \psi(\theta)}\, 1\{Q > x\} \right]$

where Eθ[·] denotes expectation under the importance sampling measure parameterized by the tilting coefficient θ. Note that, for θ > 0, the tilted measure gives more weight to large losses, and the likelihood ratio in Eq. (13.18) compensates for that. To better see why we may hope to reduce variance, let us consider the second-order moment of the estimator:

(13.19) $\mathrm{E}_\theta\left[ \left( e^{-\theta Q + \psi(\theta)}\, 1\{Q > x\} \right)^2 \right] = \mathrm{E}_\theta\left[ e^{-2\theta Q + 2\psi(\theta)}\, 1\{Q > x\} \right]$

(13.20) $= \mathrm{E}\left[ e^{-\theta Q + \psi(\theta)}\, 1\{Q > x\} \right]$

(13.21) $\leq e^{-\theta x + \psi(\theta)}\, P\{Q > x\}$

(13.22) $\leq e^{-\theta x + \psi(\theta)}$

where:

The equality in Eq. (13.19) follows from familiar properties of the exponential function and the fact that $1\{\cdot\}^2 = 1\{\cdot\}$.
We move back to the original probability measure in Eq. (13.20), which requires multiplying the estimator by the inverse of the likelihood ratio.
The inequality in Eq. (13.21) follows from the fact that if Q > x, then $e^{-\theta Q} < e^{-\theta x}$ for θ > 0.
The inequality in Eq. (13.22) follows from the fact that 1{Q>x} ≤ 1.

Clearly, this bound is reduced by increasing θ. However, since Q may be a poor replacement of the true loss L, we should apply the idea as follows:

(13.23) $P\{L > x\} = \mathrm{E}_\theta\left[ e^{-\theta Q + \psi(\theta)}\, 1\{L > x\} \right]$

Again, if we consider the second-order moment of this estimator (be sure to notice the change of measure in the expectation),

(13.24) $\mathrm{E}_\theta\left[ e^{-2\theta Q + 2\psi(\theta)}\, 1\{L > x\} \right]$

we see that it is small if θ > 0 and if Q is large when the event {L > x} occurs. The correlation between Q and L may not be as large as we wish, but it should be enough for the idea to work. However, now we face the problem of sampling Q and L under the tilted measure, which may not be trivial in general. If we assume that the shocks δ are normally distributed, what we actually do is sample a vector of independent standard normals Z and set δ = CZ, where the matrix C is given by the diagonalization process that we have described before. These shocks may be used to reprice the portfolio exactly, rather than by the quadratic approximation. Then, the estimator is actually a function of the normal variables Z, and a key issue is what their distribution is under the tilted measure. We have seen in Example 8.10 a simple case in which by tilting a normal, we end up with another normal. This nice result, as shown in the original paper [7], holds here as well. Under the tilted measure with parameter θ, the distribution of the driving normals Z is still normal, Z ~ N(μ(θ), ∑(θ)), where the covariance matrix is diagonal. The components of μ(θ) and the diagonal entries of ∑(θ) are related to the n eigenvalues λi of the Hessian matrix of the quadratic approximation:

(13.25) $\mu_i(\theta) = \frac{\theta b_i}{1 - 2\theta\lambda_i}, \qquad \sigma_i^2(\theta) = \frac{1}{1 - 2\theta\lambda_i}, \qquad i = 1, \ldots, n$

By the way, these results agree with our intuitive discussion about how the probability of a large loss can be increased under the importance sampling measure: The sign of the expected value of each normal variable depends on the sign of the corresponding coefficient bi, and its variance is increased if the corresponding eigenvalue λi is positive. A quick look at these expressions shows that we have constraints on the tilting parameter:

(13.26) $1 - 2\theta\lambda_i > 0, \qquad i = 1, \ldots, n$

These constraints are actually required to ensure the existence of the cumulant generating function in Eq. (13.16). Furthermore, we need some clue about a good setting of θ. Equation (13.22) provides us with an upper bound on the second-order moment of the estimator. This upper bound is minimized by minimizing the argument of the exponential in its expression,

$-\theta x + \psi(\theta)$

Taking advantage of the convexity property of the cumulant generating function, we enforce the first-order optimality condition and find the minimum by solving the equation

(13.27) $\psi'(\theta) = x$

The reader may notice that this is related to the result of equation (8.26).

What we have achieved may be summarized by the following procedure:

Step 1. Calculate the sensitivities (greeks) of the option portfolio.
Step 2. Using diagonalization of the Hessian matrix, find the parameters a, bi, and λi in the quadratic approximation Q of Eq. (13.15).
Step 3. Find the optimal tilting parameter θ by solving Eq. (13.27).
Step 4. For each replication:
Sample Z from a multivariate normal with parameters given by Eq. (13.25).
Evaluate the delta–gamma approximation Q on the basis of Eq. (13.15), as well as the likelihood ratio $e^{-\theta Q + \psi(\theta)}$.
Evaluate the shock δ = CZ on the risk factors S, as in Eq. (13.11), reprice the portfolio to find V(S + δ, t + δt), and evaluate the loss L.
Calculate the importance sampling estimator

$e^{-\theta Q + \psi(\theta)}\, 1\{L > x\}$

Step 5. Return the average of the estimator.
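To make Step 4 concrete, the following sketch shows one replication; repriceLoss stands for a hypothetical function performing full repricing of the portfolio, psiTheta is the precomputed value ψ(θ), and the other names are illustrative (the actual implementation is given in Fig. 13.7):

oneReplication <- function(theta, psiTheta, a, b, lambda, C, S, x, repriceLoss) {
  sdTilt <- 1 / sqrt(1 - 2 * theta * lambda)   # tilted standard deviations, Eq. (13.25)
  muTilt <- theta * b * sdTilt^2               # tilted means, Eq. (13.25)
  Z <- rnorm(length(b), mean = muTilt, sd = sdTilt)
  Q <- a + sum(b * Z + lambda * Z^2)           # delta-gamma approximation, Eq. (13.15)
  L <- repriceLoss(S + drop(C %*% Z))          # loss from full repricing under the shock C Z
  exp(-theta * Q + psiTheta) * (L > x)         # importance sampling estimator
}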

To accomplish all of the above in the simple example involving vanilla call and put options (see Fig. 13.4), we use again the functions to evaluate option greeks, but we also need some auxiliary functions depicted in Fig. 13.6.

FIGURE 13.6 Some auxiliary functions to implement importance sampling for estimation of a shortfall probability.

The function psi evaluates the cumulant generating function for a given value of the tilting parameter tilt.
The function psiPrime evaluates the derivative ψ′(θ).
The function findTilt finds the optimal value of θ by solving Eq. (13.27). To this end, we use the function uniroot to find a positive root bounded by Eq. (13.26).
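As an indication of what these auxiliary functions might look like (the actual code is in Fig. 13.6), here is a minimal sketch, where a, b, and lambda are the coefficients of the quadratic approximation in Eq. (13.15) and the derivative is obtained by differentiating Eq. (13.16):

psi <- function(theta, a, b, lambda)
  theta * a + sum(0.5 * (theta * b)^2 / (1 - 2 * theta * lambda) -
                  0.5 * log(1 - 2 * theta * lambda))
psiPrime <- function(theta, a, b, lambda)
  a + sum(theta * b^2 * (1 - theta * lambda) / (1 - 2 * theta * lambda)^2 +
          lambda / (1 - 2 * theta * lambda))
findTilt <- function(x, a, b, lambda) {
  # root of psiPrime(theta) = x on (0, 1/(2*max(lambda))), assuming x > psiPrime(0)
  upper <- if (max(lambda) > 0) 0.999 / (2 * max(lambda)) else 1e3
  uniroot(function(t) psiPrime(t, a, b, lambda) - x, c(0, upper))$root
}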

Armed with these functions, we may implement the procedure as shown in Fig. 13.7. Note that we are assuming normal shocks on the underlying asset prices, which is not quite consistent with lognormality of prices in the GBM model. We should generate normals Zi and compute their exponential, but using a simple Taylor expansion we see

FIGURE 13.7 Estimating a shortfall probability by importance sampling.

$S_i\, e^{Z_i} \approx S_i (1 + Z_i)$

Thus, the approximation is not too bad for small shocks. Unfortunately, it is large shocks that we are interested in, and we make them larger using exponential tilting. Indeed, we should change our definition of the greeks, in order to express the sensitivity of option prices to the underlying normal shocks, rather than asset prices. We avoid doing so for the sake of simplicity. Anyway, we compare results for functions based on the same approximation; thus, the comparison should be fair. In order to do so for the toy example, we use the script in Fig. 13.8, which produces the following output:

FIGURE 13.8 Comparing crude Monte Carlo and importance sampling.

estimate MC = 0.03497667
estimate IS = 0.03534474
percentage error crude MC = 3.194342 %
percentage error imp. samp. = 1.079004 %

In order to avoid issues with the non-normality of the estimator, rather than producing a confidence interval using one sequence of replications, we run each function 100 times, producing 100 averages of replications. Then, we build a confidence interval based on these averages and evaluate a percentage error. In fact, we do see a significant reduction in variance when we use importance sampling. Clearly, this is just meant as an illustration and not as conclusive evidence. We also mention that the variance reduction becomes really impressive when importance sampling is integrated with stratification; see [6].

13.5 Mean–risk models in stochastic programming

Estimating V@R, or any other risk measure, does not necessarily mean that we are actually managing risk. To this end, we need to set up a strategy, or a decision model. For instance, the classical mean-variance portfolio optimization model is a decision support tool for managers interested in trading off expected return against variance/standard deviation. Since we have other risk measures at our disposal, it is natural to generalize the idea by replacing variance with, say, V@R or CV@R. This leads to mean-risk decision models. In this section we are not concerned with the appropriateness of each risk measure, but with the computational viability of each class of models. Minimizing variance subject to linear constraints, including a lower bound on expected return, is a straightforward convex quadratic programming model, because we may express variance as an explicit function of portfolio weights. In some specific cases, minimizing V@R and CV@R may also be quite simple. However, in general, we have to use the tools of stochastic programming to generate scenarios and solve the corresponding optimization problem. Needless to say, scenario generation methods, including Monte Carlo, play a key role here.9

Unfortunately, the minimization of V@R is not a convex optimization problem in general. One reason is its lack of subadditivity. Another reason is the interplay with Monte Carlo sampling. To see why, let us consider two assets, with sampled returns $R_i^s$, where i = 1, 2 refers to the assets and s is the scenario index. For a given scenario, the portfolio return is a linear function of the portfolio weights w and 1 − w:

$R_p^s = w R_1^s + (1 - w) R_2^s$

If we see V@R as a function of w, what we obtain is the envelope of a family of affine functions. To see why, consider the worst-case return as a function of w:

$\min_{s} \left\{ w R_1^s + (1 - w) R_2^s \right\}$

Thus, we find the lower envelope of a family of affine functions, which is clearly a continuous piecewise linear function, but nondifferentiable and not necessarily convex. Actually, we should select the worst case at confidence level α, but this does not change the nature of the resulting function. To visualize the function and further investigate the issue, let us try the code in Fig. 13.9. Here we consider two assets with normally distributed returns. Of course, by using the concepts of Section 13.2, we may evaluate V@R as a function of w explicitly, but let us consider a sample of randomly generated returns. In order to estimate V@R, we use the naive quantile estimation approach of Eq. (7.9). If we generate 100 scenarios and plot the V@R at 95% level, by repeating the random sampling four times, we obtain the plots in Fig. 13.10. There, we notice:

FIGURE 13.9 Plotting V@R as a function of portfolio weights: The case of two assets and discretized scenarios.

FIGURE 13.10 Plotting V@R in a two-asset portfolio, for normally distributed returns: the case of 100 random scenarios.

There is quite some sampling variability; this is not quite surprising, as there is a significant volatility in the distributions of return, and we are collecting a statistic affected by tail behavior.
The resulting function is nonconvex and nondifferentiable: an optimization nightmare.

If we increase the sample size to 10,000, the plot looks more reasonable, as we see in Fig. 13.11. As a check, if we set the portfolio weight to w = 0 or w = 1, V@R should be as follows:

FIGURE 13.11 Plotting V@R in a two-asset portfolio, for normally distributed returns: the case of 10000 random scenarios.

> qnorm(0.95)*0.4-0.12
[1] 0.5379415
> qnorm(0.95)*0.3-0.10
[1] 0.3934561

This is in good agreement with the picture, and we see that, by increasing the number of scenarios, we get closer to a smooth and convex function. Indeed, V@R is subadditive for a multivariate normal distribution, but, in a general case, anything can happen.

Since CV@R looks like a complication of V@R, it seems reasonable to expect that it is an even more difficult beast to tame. On the contrary, CV@R is much better behaved, which may not be quite surprising after all, since CV@R is a coherent risk measure and V@R is not. More surprisingly, minimizing CV@R may even lead to a (stochastic) linear programming model formulation.10 Let f(x, Y) be a loss or cost function, depending on a vector of decision variables x and a vector of random variables Y with joint density $g_Y(y)$, and consider the function $F_{1-\alpha}(x, \zeta)$ defined as

$F_{1-\alpha}(x, \zeta) = \zeta + \frac{1}{\alpha}\, \mathrm{E}\left[ f(x, Y) - \zeta \right]^+ = \zeta + \frac{1}{\alpha} \int \left[ f(x, y) - \zeta \right]^+ g_Y(y)\, dy$

where $[z]^+ = \max\{z, 0\}$ and $\zeta \in \mathbb{R}$ is an auxiliary variable. It can be shown that minimization of CV@R, at confidence level 1 − α, is accomplished by the minimization of $F_{1-\alpha}(x, \zeta)$ with respect to its arguments. Furthermore, the resulting value of the auxiliary variable ζ turns out to be the corresponding V@R.

If we discretize the distribution of Y by a set S of scenarios, characterized by realizations $y_s$ and probabilities $\pi_s$, $s \in S$, we may recast the problem as a stochastic programming problem:

$\min_{x,\, \zeta}\; \zeta + \frac{1}{\alpha} \sum_{s \in S} \pi_s \left[ f(x, y_s) - \zeta \right]^+$

Furthermore, if the loss f(x, ys) in scenario s is a linear function, and the same applies to the additional constraints depending on the specific model, the minimization of CV@R boils down to the solution of a linear programming model. This important result should be tempered by the difficulty in getting a quantile-based estimate right using a limited number of scenarios. See, e.g., [5] for some critical remarks on the coherence of risk measures and their estimates.
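As a small illustration, the discretized objective above is easy to evaluate in R for a given portfolio; here the loss in each scenario is taken to be the negative portfolio return, and the names are illustrative:

cvarObjective <- function(w, zeta, R, prob, alpha) {
  # R: matrix of scenario returns (one row per scenario); prob: scenario probabilities
  loss <- -drop(R %*% w)
  zeta + sum(prob * pmax(loss - zeta, 0)) / alpha
}

Minimizing this function jointly over the portfolio weights and zeta yields the minimum-CV@R portfolio, with the optimal zeta giving the corresponding V@R; when the constraints are linear, the positive parts can be linearized by auxiliary variables, leading to the linear programming formulation mentioned above.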

13.6 Simulating delta hedging strategies

In this section we illustrate how Monte Carlo simulation may be used to check the effectiveness of hedging strategies. As an illustrative example, we consider delta hedging in its simplest form. We know from Chapter 3 that the price of a call option written on a non-dividend-paying stock is essentially the cost of a delta hedging strategy and that the continuous-time hedging strategy requires holding an amount Δ of the underlying asset. To summarize the essence of the reasoning, we have a portfolio whose value is a function Π(S) of the underlying asset price S. To hedge against changes in S, we may hold an amount h of the underlying asset. Then, the overall value of the portfolio is

$\Pi_h(S) = \Pi(S) + h\, S$

If we look for first-order immunization, we set the first-order derivative to zero:

$\frac{d\,\Pi_h}{d S} = \frac{\partial \Pi}{\partial S} + h = 0 \quad\Longrightarrow\quad h = -\frac{\partial \Pi}{\partial S}$

Delta hedging is a commonly mentioned risk management strategy, but it is open to criticism:

We have a perfect hedge for infinitesimal perturbations, rather than a good hedge for larger shocks.
We should rebalance the hedge frequently, incurring transaction costs.
Delta hedging as a pricing argument in the BSM model neglects not only transaction costs, but also stochastic volatility, jumps, market impact, herding behavior, etc.

In a word, naive delta hedging is subject to modeling errors. Hence, we should probably at least check its effectiveness and robustness by simulation experiments in a realistic setting. Needless to say, this is a job for Monte Carlo methods. For illustration purposes we consider hedging a vanilla call option in the BSM world. Clearly, in real life the net factor exposure due to a whole portfolio of options should be considered.

It may also be instructive to compare delta hedging against another strategy, the stop-loss strategy.11 Stop loss is simpler than delta hedging and model-free. The idea is that we should have a covered position (hold one share) when the option is in the money, and a naked position (hold no share) when it is out of the money. So, we should buy a share when the asset price goes above the strike price K, and we should sell it when it goes below. Ideally, if we buy at K and sell at K, the cash flows cancel each other, assuming that we disregard the time value of money. This strategy makes intuitive sense, but it is not that trivial to analyze in continuous time.12 Nevertheless, it is easy to evaluate its performance in discrete time by Monte Carlo simulation. The problem with executing the strategy in discrete time is that we cannot really buy or sell at the strike price: We buy at a price larger than K, when we detect that the price went above that critical value, and we sell at a price which is slightly lower. Hence, we cannot claim that cash flows due to buying and selling at K cancel each other. So, even without considering transaction costs, which would affect delta hedging as well, we see a potential trouble with the stop-loss strategy.

An R function to estimate the average cost of a stop-loss strategy is given in Fig. 13.12. The function receives the matrix paths of sample paths, possibly generated as we have seen in Chapter 6, by the function simvGBM. Note that in this case, unlike option pricing, the real drift mu must be used in the simulation, as we check hedging in the real world, and not under the risk-neutral measure. Note that the true number of steps (time intervals) is one less than the number of columns in matrix paths, which includes the initial price. If we need to buy shares of the underlying stock, we may need to borrow money, which should be taken into account. However, since we assume a deterministic and constant interest rate, we do not explicitly account for borrowed money: We simply record cash flows from trading and discount them back to time t = 0, using the discount factors precomputed in the vector discountFactors. We use a state variable, covered, to detect when we cross the strike price going up or down. Since cash flow is negative when we buy, and positive when we sell, the option “price” is evaluated as the average total discounted cash flow, with a change in sign. We should also pay attention to what happens at maturity: If the option is in the money, the option holder will exercise her right and we will also earn the strike price, which should be included in the cash flow stream.

FIGURE 13.12 Evaluating the cost of a stop-loss hedging strategy.
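The following is a compact sketch of the logic just described; the actual function, stopLoss, with its full input and output handling, is the one shown in Fig. 13.12:

stopLossSketch <- function(paths, K, r, T) {
  numSteps <- ncol(paths) - 1
  dt <- T / numSteps
  disc <- exp(-r * (0:numSteps) * dt)        # discount factors for each time instant
  costs <- apply(paths, 1, function(path) {
    cash <- numeric(numSteps + 1)
    covered <- FALSE
    for (t in 1:(numSteps + 1)) {
      if (!covered && path[t] > K) {
        cash[t] <- -path[t]; covered <- TRUE     # price crossed K going up: buy one share
      } else if (covered && path[t] < K) {
        cash[t] <- path[t]; covered <- FALSE     # price crossed K going down: sell it
      }
    }
    if (covered)                                 # option exercised: we deliver and earn K
      cash[numSteps + 1] <- cash[numSteps + 1] + K
    -sum(cash * disc)                            # discounted cost of the strategy
  })
  list(mean = mean(costs), sd = sd(costs))
}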

Since vectorizing R code is often beneficial, we also show a vectorized version of this code in Fig. 13.13. The main trick here is using a vector oldPrice, which is essentially a shifted copy of paths, to spot where the price crosses the critical level, going up or down. The time instants at which we go up are recorded in vector upTimes, where we have a negative cash flow; a similar consideration applies to downTimes.

FIGURE 13.13 Vectorized code for the stop-loss hedging strategy.

Now let us check if the two functions are actually consistent, i.e., if they yield the same results, and whether there is any advantage in vectorization:

> S0 <- 50
> K <- 52
> mu <- 0.1;
> sigma <- 0.4;
> r <- 0.05;
> T <- 5/12;
> numRepl <- 100000
> numSteps <- 10
> set.seed(55555)
> paths <- simvGBM(S0,mu,sigma,T,numSteps, numRepl)
> system.time(cost <- stopLoss(paths,K,r,T))
user system elapsed
4.93 0.00 4.93
> system.time(costV <- stopLossV(paths,K,r,T))
user system elapsed
0.20 0.00 0.21
> cost$mean
[1] 4.833226
> costV$mean
[1] 4.833226

Indeed, using the function system.time we observe that there is a significant difference in CPU time. Now we should compare the cost of the stop-loss strategy against the cost of delta hedging, as well as the theoretical option price. A code to estimate the average cost of delta hedging is displayed in Fig. 13.14. The code is similar to the stop-loss strategy, but it is not vectorized. The only vectorization we have done is in calling BSMdelta once to get the option Δ for each point on the sample path. Note that Δ must be computed using the current asset price and the current time to maturity. The current position in the stock is updated given the new Δ, generating positive or negative cash flows that are discounted back to time t = 0.

FIGURE 13.14 Evaluating the performance of delta hedging.
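As before, Fig. 13.14 is only a caption here. A sketch of a delta hedging cost estimator along the lines just described is given below; it assumes a vectorized function BSMdelta(S, K, r, T, sigma) returning the BSM call delta (the argument order is an assumption), and the function name deltaHedging is ours. At maturity, the stock position is liquidated and the option payoff is settled, which is equivalent to earning the strike price when the option is exercised.

deltaHedging <- function(paths, K, r, sigma, T) {
  numRepl  <- nrow(paths)
  numSteps <- ncol(paths) - 1
  dt <- T / numSteps
  discountFactors <- exp(-r * (0:numSteps) * dt)
  timeToMat <- T - (0:(numSteps - 1)) * dt    # time to maturity at each rebalancing date
  cost <- numeric(numRepl)
  for (k in 1:numRepl) {
    path <- paths[k, ]
    # one vectorized call to get the deltas at all rebalancing dates
    deltas <- BSMdelta(path[1:numSteps], K, r, timeToMat, sigma)
    cashFlows <- numeric(numSteps + 1)
    position <- 0
    for (i in 1:numSteps) {
      cashFlows[i] <- -(deltas[i] - position) * path[i]   # rebalance to the new delta
      position <- deltas[i]
    }
    # liquidate the stock position and settle the option payoff at maturity
    ST <- path[numSteps + 1]
    cashFlows[numSteps + 1] <- position * ST - max(ST - K, 0)
    cost[k] <- -sum(cashFlows * discountFactors)
  }
  list(mean = mean(cost), sd = sd(cost))
}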

Figure 13.15 displays a script to compare the performance of the two hedging strategies, in terms of both the expected value and the standard deviation of the present value of the hedging cost. In the first pair of runs we use only 10 hedging steps, which has an impact on hedging errors; in the second pair the number of steps is increased to 100. We also compute the exact option price using the BSM formula, for comparison purposes. By running the script, we obtain the results reported below.

FIGURE 13.15 A script to compare hedging strategies.
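Since the script of Fig. 13.15 is likewise not reproduced, here is a rough sketch of what it might look like, reusing the function sketches above; the output reported afterwards refers to the original script, so a sketch like this one need not reproduce it to the last digit.

S0 <- 50; K <- 52; mu <- 0.1; sigma <- 0.4; r <- 0.05; T <- 5/12
numRepl <- 100000
# exact BSM call price, for comparison
d1 <- (log(S0 / K) + (r + sigma^2 / 2) * T) / (sigma * sqrt(T))
d2 <- d1 - sigma * sqrt(T)
truePrice <- S0 * pnorm(d1) - K * exp(-r * T) * pnorm(d2)
cat("true price =", truePrice, "\n")
for (numSteps in c(10, 100)) {
  set.seed(55555)
  paths <- simvGBM(S0, mu, sigma, T, numSteps, numRepl)
  sl <- stopLossV(paths, K, r, T)
  dh <- deltaHedging(paths, K, r, sigma, T)
  cat("StopLoss", numSteps, "steps: mean =", sl$mean, "sd =", sl$sd, "\n")
  cat("DeltaHedging", numSteps, "steps: mean =", dh$mean, "sd =", dh$sd, "\n")
}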

true price = 4.732837
StopLoss 10 steps: mean = 4.833226 sd = 4.542682
DeltaHedging 10 steps: mean = 4.729912 sd = 1.380229
StopLoss 100 steps: mean = 4.807347 sd = 4.074755
DeltaHedging 100 steps: mean = 4.734969 sd = 0.4529393

We may observe the following:

The average cost of the stop-loss strategy is larger than the average cost of delta hedging.
What is more significant, however, is the standard deviation of the hedging cost: With delta hedging it shrinks toward zero as the number of hedging steps increases, while the expected cost converges to the BSM option price.

Indeed, the price of an option is related to its hedging cost. In the idealized BSM world, the market is complete and we may hedge the option risk perfectly; in other words, we replicate the option exactly. Of course, the introduction of stochastic volatility, transaction costs, modeling errors, etc., makes hedging less than perfect. As we mentioned, delta hedging has often been criticized as an oversimplified, possibly misleading, approach. The important message is that, by Monte Carlo methods, we may assess all of the above effects, for this and other hedging policies.

13.7 The interplay of financial and nonfinancial risks

Let us consider a firm that is subject to currency exchange risk, but also to volume risk.13 To make the problem as simple as possible, let us assume that the current exchange rate is 1.0, expressed in whatever currency pair you like. There is considerable uncertainty about the exchange rate at some future time T; let us assume that it is uniformly distributed between 0.7 and 1.3. The firm has to buy a significant amount of the foreign currency for its activity, say, 50,000 units. If the rate increases, the firm will have to pay much more for that amount of currency; if, on the contrary, the rate decreases, that will be good news. Let us say that the base case is when the rate stays at the current level. Considering the two extreme scenarios, if the rate goes up to 1.3, the firm will incur a loss given by

50,000 × (1.3 − 1.0) = 15,000.

Conversely, it will enjoy a corresponding gain if the rate turns out to be 0.7.

Let us assume that the firm can hedge using two types of derivative:

A long position in a forward contract, where we assume that the forward price is just 1.0.14
A long position in at-the-money call options. Unlike forward contracts, this requires an upfront payment. Let us assume that the price of each option is 0.1; in other words, we have to pay 10% of the nominal amount we hedge, in the domestic currency.

The decision that the firm has to make is twofold:

They have to decide how much to hedge.
They have to decide the hedge mix, i.e., how much to hedge with forward contracts vs. options.

Hence, if we only consider the scenarios (0.7, 1.0, 1.3), the corresponding outcomes would be

(15,000, 0, −15,000) with 0% hedging.
(0, 0, 0) with 100% hedging using forward contracts.
(10,000, −5000, −5000) with 100% hedging using options; note that in this case we pay 5000 for the options, which expire worthless in the first two scenarios, whereas in the third scenario the option payoff of 15,000 offsets the increased cost of the currency, so that the net outcome is just the loss of the premium.

With no hedging there is considerable uncertainty, which is completely eliminated by forward contracts. With options, we limit the downside to the loss of the premium, while retaining some upside. The choice is a matter of risk aversion, as well as of probabilities, assuming that we are willing to associate probabilities with each event.
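Since it is easy to get the signs wrong, the following small snippet, which is not part of the original figures, reproduces the three outcomes above:

rate <- c(0.7, 1.0, 1.3)
V <- 50000                                        # nominal volume
V * (1 - rate)                                    # 0% hedging:      15000      0  -15000
V * (1 - rate) + V * (rate - 1)                   # 100% forwards:       0      0       0
V * (1 - rate) + V * (pmax(rate - 1, 0) - 0.1)    # 100% options:    10000  -5000   -5000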

In many practical settings, however, there is yet another complication: volume risk. We may not know precisely the amount we have to hedge. Say that 50,000 is the volume in the nominal scenario, i.e., the base case, but the amount we actually need is uniformly distributed between 10,000 and 90,000. For the sake of simplicity, we assume that the two sources of risk are independent; clearly, Monte Carlo methods may be applied in more complicated and realistic settings. To see why volume risk is relevant, let us check the outcome of 100% hedging with forward contracts in the following scenario: The exchange rate is 0.7 and the required volume is 10,000. Now we have to buy (and presumably sell back immediately) 40,000 useless units of foreign currency, with a loss of

40,000 × (1.0 − 0.7) = 12,000.

Apart from the loss, we are probably in a bad business setting, since the reduced requirement is likely to stem from lost business. With 100% coverage using call options, which give the right but not the obligation to buy, the loss would be just the foregone option premium, 5000.

Deciding the best course of action is by no means a trivial task, even in this idealized setting. To support the decision makers, it would be useful to picture the profit/loss scenarios as a function of the decision variables; we might also want to plot a given risk measure. All of these tasks can be accomplished by Monte Carlo sampling.
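As an illustration, the following sketch (ours; all names, such as simulateHedge, hedgeFraction, and fwdWeight, are hypothetical) samples the two independent risk factors and returns the profit/loss distribution for a given hedging decision, from which expected value, standard deviation, and quantile-based risk measures such as V@R and CV@R can be estimated:

simulateHedge <- function(numRepl, hedgeFraction, fwdWeight,
                          nominal = 50000, optPremium = 0.1) {
  rate   <- runif(numRepl, 0.7, 1.3)       # exchange rate at time T
  volume <- runif(numRepl, 10000, 90000)   # required volume (volume risk), independent
  fwdUnits <- hedgeFraction * fwdWeight * nominal         # amount hedged with forwards
  optUnits <- hedgeFraction * (1 - fwdWeight) * nominal   # amount hedged with ATM calls
  # profit/loss relative to the base case of buying the required volume at rate 1.0
  volume * (1 - rate) +                          # unhedged currency purchase
    fwdUnits * (rate - 1) +                      # forward payoff
    optUnits * (pmax(rate - 1, 0) - optPremium)  # option payoff net of premium
}

set.seed(55555)
pnl <- simulateHedge(100000, hedgeFraction = 1, fwdWeight = 0.5)
q05 <- quantile(pnl, 0.05, names = FALSE)
c(mean = mean(pnl), sd = sd(pnl), VaR95 = -q05, CVaR95 = -mean(pnl[pnl <= q05]))

Wrapping this computation in a loop over a grid of values for hedgeFraction and fwdWeight would produce the picture of the profit/loss scenarios, and of the selected risk measure, as a function of the decision variables.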

For further reading

An elementary introduction to risk aversion and related issues can be found in [3].
The original treatment of coherent risk measures can be found in [1].
For Monte Carlo applications to credit risk, see [6, Chapter 9].
The reformulation of CV@R in a way suitable for stochastic optimization was originally proposed in [9, 10].

References

1 P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9:203–228, 1999.

2 P. Artzner, F. Delbaen, J.-M. Eber, D. Heath, and H. Ku. Coherent multiperiod risk adjusted values and Bellman’s principle. Annals of Operations Research, 152:5–22, 2007.

3 P. Brandimarte. Quantitative Methods: An Introduction for Business Management. Wiley, Hoboken, NJ, 2011.

4 P.P. Carr and R.A. Jarrow. The stop-loss start-gain paradox and option valuation: a new decomposition into intrinsic and time value. Review of Financial Studies, 3:469–492, 1990.

5 F.J. Fabozzi and R. Tunaru. On risk management problems related to a coherence property. Quantitative Finance, 6:75–81, 2006.

6 P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer, New York, 2004.

7 P. Glasserman, P. Heidelberger, and P. Shahabuddin. Variance reduction techniques for estimating value-at-risk. Management Science, 46:1349–1364, 2000.

8 J.C. Hull. Options, Futures, and Other Derivatives (8th ed.). Prentice Hall, Upper Saddle River, NJ, 2011.

9 R.T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. The Journal of Risk, 2:21–41, 2000.

10 R.T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions. Journal of Banking and Finance, 26:1443–1471, 2002.

1Since we are comparing random variables, the inequality should be qualified as holding almost surely, i.e., for all of the possible outcomes, with the exception of a set of measure zero. The unfamiliar reader may consider this as a technicality.

2The essence of time consistency of a multiperiod risk measure is that if a portfolio is riskier than another portfolio at time horizon τ, then it is riskier at time horizons t < τ as well. See, e.g., [2].

3Arguably, the lowercase letter in the middle also helps to avoid confusion with VAR, which usually refers to vector autoregressive models.

4We stick to the statistical convention that α is the small area associated with the tail, but sometimes the opposite notation is adopted.

5Intuitively, drift is related to expected return, and volatility is related to standard deviation. On a short time interval of length δt, since drift scales linearly with δt, whereas volatility is proportional to √δt, drift goes to zero more rapidly than does volatility. See the square-root rule, which we considered in Section 3.7.1.

6We illustrate this later in Section 13.5.

7To be precise, if we assume that the risk factors are just the underlying asset prices, we may rely on the standard formulas of Section 3.9.2 for option greeks. Otherwise, option sensitivities must be determined, and estimated, by a careful analysis.

8See [6, Section 9.2].

9See Section 10.5.3.

10In this section, we rely on results from Rockafellar and Uryasev [9, 10], which we take for granted, thereby cutting a few corners.

11See [8, pp. 300–302].

12See [4].

13For a real-life example of such a situation, please refer to the following business case, which has been an inspiration for this section: M.A. Desai, A. Sjoman, and V. Dessain. Hedging Currency Risks at AIFS. Case no. 9-205-026, Harvard Business School Publishing.

14Therefore, we are assuming that there is no difference between the interest rates in the two currencies.