The previous chapter explained the importance of Value at Risk (VaR); namely, that it is the best single measure for assessing market risk. It is a good measure of risk because it combines information on the sensitivity of the portfolio’s value to changes in market-risk factors with information on the probable amount of change in those factors. VaR tries to answer the question, “How much could we lose today, given our current position and the possible changes in the market?” VaR formalizes that question as the level of loss so severe that there is only a 1-in-100 chance of suffering a loss worse than the calculated VaR. VaR estimates this level from the current value of the portfolio and the probability distribution of changes in that value over the next trading day. From that probability distribution we can read off the loss at the 99th percentile.
To estimate the value’s probability distribution, we use two sets of information: the current position, or holdings, in the bank’s trading portfolio, and an estimate of the probability distribution of the price changes over the next day. The estimate of the probability distribution of the price changes is based on the distribution of price changes over the last few weeks or months.
The goal of this chapter is to explain how to calculate VaR using the three methods that are in common use: Parametric VaR, Historical Simulation, and Monte Carlo Simulation.
It is important to note that while the three calculation methods differ, they do share common attributes and limitations. For example, each approach uses market-risk factors. Risk factors are fundamental market rates that can be derived from the prices of securities being traded. Typically, the main risk factors used are interest rates, foreign exchange rates, equity indices, commodity prices, forward prices, and implied volatilities. By observing this small number of risk factors, we are able to calculate the price of all the thousands of different securities held by the bank. For example, it is possible to price all government bonds by knowing the risk-free interest rates for just a dozen points on the yield curve. This risk-factor approach uses less data than would be required if we tried to collect historical price information for every security.
Each approach uses the distribution of historical price changes to estimate the probability distributions. This requires a choice of historical horizon for the market data; e.g., how far back should we go in using historical data to calculate standard deviations? This is a trade-off between having large amounts of information or fresh information.
Because VaR attempts to predict the future probability distribution, it should use the latest market data with the latest market structure and sentiment. However, with a limited amount of data, the estimates become less accurate, and there is less chance of having data that contains those extreme, rare market movements which are the ones that cause the greatest losses.
Each approach has the disadvantage of assuming that past relationships between the risk factors will be repeated; e.g., it assumes that factors that have tended to move together in the past will move together in the future.
Each approach uses binning (also known as mapping) to put cash flows into a finite number of buckets. To understand the need for binning, consider a bond portfolio which will have coupons and principal payments due almost daily for several years. It would be possible to calculate the duration for every cash flow, then calculate the statistics of the rate movements for each day, but this requires a very large amount of data. As an alternative, we can bin (or map) all the cash flows onto a limited number of time points, and just deal with the statistics of those time points. Typically, approximately 10 time points are used, including 3 months, 6 months, 12 months, 18 months, 2 years, 5 years, and 10 years. To understand the process of mapping, consider a cash flow of $100 falling due in 6 years. This could be mapped onto the 5- and 10-year points as $75 at 5 years and $25 at 10 years. The mapping will try to preserve some combination of cash flow, present value, duration, or stand-alone VaR amounts. The process of binning is discussed in Appendix A to this chapter.
Each approach has strengths and weaknesses when compared to the others, as summarized in Figure 6-1. The degree to which the circles are shaded corresponds to the strength of the approach. The factors evaluated in the figure are the speed of computation, the ability to capture nonlinearity, the ability to capture non-Normality, and the independence from historical data. Nonlinearity refers to the price change not being a linear function of the change in the risk factors. This is especially important for options. Non-Normality refers to the ability to calculate the potential changes in risk factors without assuming that they have a Normal distribution. Note, for example, that Parametric VaR is fast, but does not capture non-Normality and nonlinearity. Monte Carlo captures nonlinearity, but does not capture non-Normality and can be slow. Historical simulation captures non-Normality and nonlinearity, but the results are heavily influenced by the exact form of historical market movements; e.g., if there was a significant crisis in the past, historical simulation will keep reliving that crisis in exactly the same form.
FIGURE 6-1 Summary of VaR Techniques
The relative strengths of the VaR calculation methods are shown by the extent of the shading.
Parametric VaR is also known as Linear VaR, Variance-Covariance VaR, Greek-Normal VaR, Delta Normal VaR, or Delta-Gamma Normal VaR. The approach is parametric in that it assumes that the probability distribution is Normal and then requires calculation of the variance and covariance parameters. The approach is linear in that changes in instrument values are assumed to be linear with respect to changes in risk factors. For example, for bonds the sensitivity is described by duration, and for options it is described by the Greeks.
The overall Parametric VaR approach is as follows:
• Define the set of risk factors that will be sufficient to calculate the value of the bank’s portfolio.
• Find the sensitivity of each instrument in the portfolio to each risk factor.
• Get historical data on the risk factors to calculate the standard deviation of the changes and the correlations between them.
• Estimate the standard deviation of the value of the portfolio by multiplying the sensitivities by the standard deviations, taking into account all correlations.
• Finally, assume that the loss distribution is Normally distributed, and therefore, approximate the 99% VaR as 2.32 times the standard deviation of the value of the portfolio.
Parametric VaR has two advantages:
• It is typically 100 to 1000 times faster to calculate Parametric VaR compared with Monte Carlo or Historical Simulation.
• Parametric VaR allows the calculation of VaR contribution, as explained in the next chapter.
It also has significant limitations:
• It gives a poor description of nonlinear risks.
• It gives a poor description of extreme tail events, such as crises, because it assumes that the risk factors have a Normal distribution. In reality, as we found in the statistics chapter, the risk-factor distributions have a high kurtosis with more extreme events than would be predicted by the Normal distribution.
• Parametric VaR uses a covariance matrix, and this implicitly assumes that the correlations between risk factors are stable and constant over time.
To give an intuitive understanding of Parametric VaR, we have provided three worked-out examples. The examples are fundamentally quite simple, but they introduce the method of calculating Parametric VaR. There are a lot of equations, but the underlying math is mostly algebra rather than complex statistics or calculus.
The main statistical relationship that will be used is taken from Chapter 3: the variance of the sum of two numbers is a function of the variance of the individual numbers and the correlation between them. If we have a portfolio of two instruments, the loss on the portfolio (LP) will be the sum of the losses on each instrument:
LP = L1 + L2
The standard deviation of the loss on the portfolio (σP) will be as follows:
σP = √(σ1² + σ2² + 2 × ρ1,2 × σ1 × σ2)
Here, σ1 is the standard deviation of losses from instrument 1, and ρ1,2 is the correlation between losses from 1 and 2.
Three different notations are used in this chapter: algebraic, summation, and matrix. Algebraic notation is used for most of the equations because it is easiest to understand if there are just a few variables; however, it becomes cumbersome with many variables. It then becomes easier to use summation or matrix notation. As an example, consider the following equation in algebraic notation:
This can be written in summation notation as follows:
This means first sum over j from 1 to 2 then sum over i from 1 to 2:
The matrix notation for the same equation is as follows:
If we carry out the usual matrix multiplications we get back to the same original equation:
Notice that all three notations give the same result. The choice of the notation to use is not terribly important and is generally dictated by convention and convenience.
The examples worked out below use absolute changes in risk factors. In practice, relative changes are often used because the distribution of relative changes is closer to Normal. However, working out the equations with relative changes is more complicated, and adds little intuitive understanding. Therefore, the relative changes are relegated to Appendix B.
The first example calculates the stand-alone VaR for a bank holding a long position in an equity. The stand-alone VaR is the VaR for the position on its own without considering correlation and diversification effects from other positions. The present value of the position is simply the number of shares (N) times the value per share, VS.
PV$ = N × VS
The change in the value of the position is simply the number of shares multiplied by the change in the value of each share:
ΔPV$ = N × ΔVS
The standard deviation of the value is the number of shares multiplied by the standard deviation of the value of each share. (This step is explained more thoroughly in the statistics chapter.)
σv = N × σS
As we have assumed that the value changes are Normally distributed, there will be a 1% chance that the loss is more than 2.32 standard deviations; therefore, we can calculate the 99% VaR as follows:
VaR = 2.32 × N × σS
In this very simple example, notice that there are two elements: N, which describes the sensitivity of the position to changes in the risk factor, and σS, which describes the volatility of the risk factor.
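As an illustration, here is a minimal Python sketch of this calculation; the number of shares and the daily standard deviation of the share price are hypothetical values chosen only to show the arithmetic:

# Stand-alone Parametric VaR for a long equity position (hypothetical inputs).
N = 10_000        # number of shares held (assumed)
sigma_S = 0.50    # daily standard deviation of the share price, in dollars (assumed)
VaR_99 = 2.32 * N * sigma_S   # 99% VaR = 2.32 x sensitivity x volatility
print(f"99% stand-alone VaR: ${VaR_99:,.0f}")   # about $11,600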
As a slightly more complex example, consider a government bond held by a U.K. bank denominated in British pounds with a single payment. The present value in pounds (PVp) is simply the value of the cash flow in pounds (Cp) at time t discounted according to sterling interest rates for that maturity, rp:
The sensitivity of the value to changes in interest rates is the derivative with respect to rp:
Notice that this is the same as duration but without the minus sign. For simplicity in this example, let us represent the derivative by dr:
The change in the value is the sensitivity multiplied by the change in interest rates:
ΔPVp = dr × Δrp
The standard deviation of PVp is then the standard deviation of the rate times dr, and the 99% VaR is 2.32 times the result:
VaR = 2.32 × dr × σrp
To make this example more concrete, consider a bond paying 100 pounds (Cp) in 5 years’ time (t), with the 5-year discount rate at 6% (rp), and a standard deviation in the rate of 0.5% (σrp). The present value is then 74 pounds, the sensitivity (dr) is −352 pounds per 100% increase in rates, and the VaR is 4.1 pounds:
VaR = |2.32 × dr × σrp| = |2.32 × (−352) × 0.005| = 4.1
The bars on each side of the equation indicate that we take the absolute value; i.e., we drop the minus sign. We need to use the absolute value because we have taken a shortcut in this calculation. As we will discover in the next example, Parametric VaR is actually 2.32 times the standard deviation of value, i.e., the square root of the variance, and is therefore always positive. When dealing with one risk factor, as above, we can skip the step of squaring then taking the square root, but if we skip this step, we need to make sure that the result is not negative.
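The same numbers can be reproduced with a short Python sketch, with all inputs taken from the example above:

# Single-payment sterling bond: present value, rate sensitivity, and 99% VaR.
C_p, t = 100.0, 5                 # cash flow in pounds and time to payment in years
r_p, sigma_r = 0.06, 0.005        # 5-year rate and standard deviation of rate changes
PV_p = C_p / (1 + r_p) ** t                  # about 74.7 pounds
d_r = -t * C_p / (1 + r_p) ** (t + 1)        # dPV/dr, about -352 pounds per 100% rate rise
VaR_99 = abs(2.32 * d_r * sigma_r)           # about 4.1 pounds
print(round(PV_p, 1), round(d_r, 1), round(VaR_99, 1))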
The two examples above were simple because they had only one risk factor. Now let us consider a multidimensional case: the same simple bond as before, but now held by a U.S. bank. The U.S. bank is exposed to two risks: changes due to sterling interest rates and changes due to the pound-dollar exchange rate. The value of the bond in dollars is the value in pounds multiplied by the FX rate:
The change in value due to changes in interest rates is as before, but translated into dollars:
The linear change in value due to a change in FX rates is simply given by the derivative with respect to FX:
Therefore, the change in value due to a change in FX is given by the following:
The change in value due to both a change in rates and a change in FX is given by the sum of individual changes:
For simplicity, let us define the derivative with respect to FX to be dFX and the derivative with respect to sterling interest rates to be drp:
(Notice that drp is different than the one used in the previous example because it is now for a U.S. bank, and therefore is in dollars.) We can now rewrite the equation for change in value in a simpler form:
ΔPV$ = dFXΔFX + drpΔrp
Now we want to get an expression for the variance of PV. The main complication is that changes in FX are correlated with changes in interest rates. To get from deltas to variances for correlated variables, we need to use the relationship we discussed earlier for the sum of losses:
We can use this equation to find the standard deviation of the bond value by making the following substitutions:
L1 = dFXΔFX
L2 = drpΔrp
LP = ΔPV$
Here, dFX and drp are fixed multipliers, but ΔFX and Δrp are random values. The variances of ΔFX and Δrp are estimated from the historical data. The variance for FX rates is as follows (where FXt is the FX historical rate on day t):
(Assuming the mean is zero.) The variance for interest rates is calculated similarly:
The correlation is estimated from the cross-multiplication of the changes:
These values can be substituted for σ1, σ2, and ρ1,2:
σ1 = dFXσFX
σ2 = drpσrp
ρ1,2 = ρFX,rp
Substituting this back in the equation for σP gives us the variance for the bond’s value:
Standard deviation is the square root of variance, and VaR is 2.32 times the standard deviation; therefore, VaR is as follows:
To put numerical values to this, assume that the bond is the same as before: paying 100 pounds in 5 years’ time, with the 5-year discount rate at 6%, and a standard deviation in the rate of 0.5%. Also assume that the exchange rate is 1.6 dollars per pound, the volatility of the rate is 0.02, and the correlation between FX rates and interest rates is −0.6. We will use this example several times in the following chapter, and the parameter values are therefore shown in Table 6-1.
With these values we obtain the current value, the sensitivities, and the VaR result of $9.05:
TABLE 6-1 Parameter Values for the Example of a Sterling Bond
PV$ = $119
dFX = 74.7
dr = −564.0
VaR = $9.05
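These figures can be checked with a sketch that applies the two-factor formula above to the parameter values in Table 6-1; small differences from $9.05 are due to rounding:

from math import sqrt
# Sterling bond held by a U.S. bank: two risk factors, sterling rates and the FX rate.
C_p, t, r_p = 100.0, 5, 0.06
FX = 1.6                                     # dollars per pound
sigma_r, sigma_FX, rho = 0.005, 0.02, -0.6   # Table 6-1 values
PV_p = C_p / (1 + r_p) ** t                  # value in pounds, about 74.7
d_FX = PV_p                                  # sensitivity to the FX rate
d_r = FX * (-t * C_p / (1 + r_p) ** (t + 1)) # rate sensitivity in dollars, about -564
variance = (d_FX * sigma_FX) ** 2 + (d_r * sigma_r) ** 2 \
    + 2 * rho * (d_FX * sigma_FX) * (d_r * sigma_r)
VaR_99 = 2.32 * sqrt(variance)
print(round(FX * PV_p, 1), round(d_FX, 1), round(d_r), round(VaR_99, 2))  # about 119.6, 74.7, -564, 9.06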
In the example above we had only two risk factors. We can generalize the VaR equation relationship to give VaR for a position with many risk factors:
Here, N is the total number of risk factors being used, and dN is the derivative of the portfolio’s value with respect to the Nth risk factor:
The equation above can become cumbersome and can be written more compactly in summation notation:
Alternatively, we can use matrix notation. Using matrix notation, we can put the derivatives into a vector and the statistics into a covariance matrix:
The covariance matrix has the variances of the risk factors along the main diagonal and has the covariances off the diagonal. Notice that the covariance matrix is symmetric. Appendix C describes the covariance matrix in more detail. We obtain the variance of the portfolio, σP², by multiplying the derivative vector by the covariance matrix:
σP² = D C DT
where DT is the transpose of D. (In a transpose, the rows and columns are switched.) VaR is then given by a simple expression:
VaR = 2.32 × √(D C DT)
In our numerical example D, C, DT and VaR are as follows:
VaR is obtained by matrix multiplication:
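A NumPy sketch of this matrix multiplication, using the sensitivity vector and covariance matrix from the example, reproduces the roughly $9.05 VaR:

import numpy as np
D = np.array([74.7, -564.0])        # derivative vector [dFX, dr]
sigma = np.array([0.02, 0.005])     # risk-factor standard deviations
rho = np.array([[1.0, -0.6],
                [-0.6, 1.0]])       # correlation matrix
C = np.outer(sigma, sigma) * rho    # covariance matrix
VaR_99 = 2.32 * np.sqrt(D @ C @ D)  # 2.32 x sqrt(D C DT)
print(round(VaR_99, 2))             # about 9.06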
In general, if we have many risk factors, we can extend the vector of sensitivities and the covariance matrix:
In the example above, we had one security that was sensitive to two different risk factors. If the portfolio is made up of several securities, each of which is affected by the same risk factor, then the sensitivity of the portfolio to the risk factor is simply the sum of the sensitivities for the individual positions. For example, consider a portfolio holding our example 100-pound five-year bond and 100 pounds of cash. The value of the bond in dollars was given as:
The value of the cash position is simply 100 pounds translated into dollars:
ValueCash = FX × 100
The present value of the portfolio is simply the sum of the value of the two securities:
The sensitivity of the value to changes in the FX rate is given by the derivative with respect to FX rates:
The sensitivity to interest rates is unchanged from the previous example. In matrix notation, we have one vector for the bond, one for the cash, and then the sum for the portfolio. The VaR in this case is $13.11.
DBond = [74.7 −563]
DCash = [100 0]
DPosition = DBond + DCash
= [174.7 −563]
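A sketch of this aggregation, reusing the covariance matrix from the sketch above, reproduces the $13.11 portfolio VaR:

import numpy as np
sigma = np.array([0.02, 0.005])
rho = np.array([[1.0, -0.6], [-0.6, 1.0]])
C = np.outer(sigma, sigma) * rho
D_bond = np.array([74.7, -563.0])   # bond sensitivities [dFX, dr]
D_cash = np.array([100.0, 0.0])     # 100 pounds of cash: FX exposure only
D_position = D_bond + D_cash        # [174.7, -563.0]
VaR_99 = 2.32 * np.sqrt(D_position @ C @ D_position)
print(round(VaR_99, 2))             # about 13.11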
In general, if we have multiple risk factors, 1 to N, and have multiple securities, A through Z, the vector for the portfolio will be the sum of the vectors for the individual securities:
DA = [dA,1, dA,2, dA,3, . . . dA,N]
DB = [dB,1, dB,2, dB,3, . . . dB,N]
DZ = [dZ,1, dZ,2, dZ,3, . . . dZ,N]
DPortfolio = DA + DB + . . . + DZ
DPortfolio = [(dA,1 + dB,1 + . . . + dZ,1), ..., (dA,N + dB,N + . . . + dZ,N)]
The VaR for the portfolio is calculated as before, but using the sensitivity vector for the chosen portfolio:
The section above discussed the methodologies for calculating parametric VaR. Many vendors have incorporated these methodologies into “industrial-strength” calculators that can be run daily to calculate VaR for all the thousands of securities in a bank’s trading operation. Figure 6-2 gives a typical layout for a Parametric VaR calculator.
These calculators work in the following manner:
• The calculator is fed market data and position data. The market data comes from data vendors such as Telerate, Bloomberg, and Reuters. The position data comes from the trader’s deal capture or position-keeping system, i.e., the systems that the traders use to record their purchases and sales.
• The position and market data must then be cleaned to remove gross errors, such as rates that are entered in decimals instead of percentages. There must also be algorithms to fill in any missing data, including data for markets that had local holidays.
• The market data is used to calculate the covariance matrix and is fed into the calculation of the derivative vectors.
• The derivative vectors for each type of security are calculated using analytic formulas or by perturbing pricing calculators by small amounts to get the delta.
• VaR is then calculated by multiplying the derivative vectors with the covariance matrix.
FIGURE 6-2 The Modules within a Parametric VaR Calculator
Typically, VaR will be reported both for the institution as a whole and for individual desks and traders as if they were stand-alone units. Organizational data for the bank is stored and used to determine which derivative vectors should be added together for each business unit.
The VaR calculator is typically run overnight in a batch process with the intention of showing management the risk profile at the start of the next day. Ideally, at the start of the day, senior management should get a report showing the current position of the bank, how much could be lost in the coming day, and the main causes of such a loss. Management can then act to hedge or reduce any positions that it considers to be too dangerous.
As noted above, the VaR calculator is typically run overnight in a batch process. Intraday calculations are also desirable for two reasons:
• Some fast-moving positions can build up significant risks very quickly, and management would like early warning.
• As discussed later, some banks limit the maximum amount of VaR that a trader can have. If this is the case, traders must have some way of knowing whether the next trade would cause these limits to be violated.
The industry is still grappling with the problems of quickly calculating intraday VaR, but the most common approach is to use an incremental calculation. In an incremental calculation, the overnight batch process produces the covariance matrix and the derivative vectors for all the bank’s existing positions. Traders are then given VaR calculators on their desks that have sufficient functionality to create derivative vectors for any new trades that they are considering. These vectors are added to their existing derivative vector to calculate the new VaR. This is illustrated in Figure 6-3. The intraday calculator is similar to the full, overnight calculator except the intraday calculator stores the market data and derivative data from the previous night’s batch.
FIGURE 6-3 Calculation of Intra-Day Parametric VaR
Diagram of how an intraday VaR calculator operates. Traders use this VaR calculator to calculate the VaR for potential trades by creating a derivative vector for the new trade and adding it to the vectors for their existing trades.
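A minimal sketch of the incremental calculation is shown below, assuming the overnight batch has already produced the covariance matrix and the trader’s existing derivative vector; the derivative vector for the proposed new trade is purely hypothetical:

import numpy as np

def parametric_var(D, C, z=2.32):
    # 99% Parametric VaR for a derivative vector D and covariance matrix C.
    return z * np.sqrt(D @ C @ D)

# Stored from the overnight batch (values reused from the earlier examples).
sigma = np.array([0.02, 0.005])
rho = np.array([[1.0, -0.6], [-0.6, 1.0]])
C = np.outer(sigma, sigma) * rho
D_existing = np.array([174.7, -563.0])
# Hypothetical derivative vector for a trade being considered intraday.
D_new_trade = np.array([-50.0, 200.0])
print(parametric_var(D_existing, C))                # VaR before the trade
print(parametric_var(D_existing + D_new_trade, C))  # VaR if the trade is done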
Conceptually, historical simulation is the simplest VaR technique, but it takes significantly more time to run than Parametric VaR. The historical-simulation approach takes the market data for the last 250 days and calculates the percentage change for each risk factor on each day. Each percentage change is then applied to today’s market values to produce 250 scenarios for tomorrow’s values. For each of these scenarios, the portfolio is valued using full, nonlinear pricing models. The third-worst result is then selected as the 99% VaR.
As an example, let’s consider calculating the VaR for a five-year, zero-coupon bond paying $100. We start by looking back at the previous trading days and noting the five-year rate on each day. We then calculate the proportion by which the rate changed from one day to the next:
Δt = (rt − rt−1) / rt−1
Scenarios are then created for tomorrow’s rate by applying the proportional change to today’s rate:
rScenario,k = rToday (1 + Δt)
We shift from a subscript of t to k because there is a conceptual shift. We use data from the past days to create scenarios of what could happen tomorrow. The scenarios therefore do not represent what has happened, but what could happen in the next day. Using these scenarios, we value the bond using the usual formula:
and then calculate the change in value:
ΔVk = ValueScenario,k − ValueToday
Table 6-2 gives an example of 10 days of data (rather than 250 days). The “Rate” column shows the rate observed at the end of each day. The next columns show the proportional change and the scenario that would occur if that change were to happen starting from today’s rate. The final columns show the bond value and loss in each scenario. In this example, the worst-case change is a loss of 39 cents, which is a rough estimate of the 10-percentile loss.
TABLE 6-2 Example of Historical VaR Calculation
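The mechanics can be sketched in Python; because the rate history of Table 6-2 is only summarized above, the daily five-year rates below are hypothetical:

# Historical-simulation VaR for a 5-year zero-coupon bond paying $100.
rates = [0.060, 0.061, 0.059, 0.060, 0.062, 0.061, 0.060, 0.058, 0.059, 0.060, 0.061]

def bond_value(r, cash_flow=100.0, t=5):
    return cash_flow / (1 + r) ** t

r_today = rates[-1]
value_today = bond_value(r_today)
value_changes = []
for k in range(1, len(rates)):
    change = (rates[k] - rates[k - 1]) / rates[k - 1]       # proportional daily change
    r_scenario = r_today * (1 + change)                     # apply it to today's rate
    value_changes.append(bond_value(r_scenario) - value_today)
value_changes.sort()
print(round(value_changes[0], 2))   # worst scenario; with 250 days, take the third worst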
There are two main advantages of using historical simulation:
• It is easy to communicate the results throughout the organization because the concepts are easily explained.
• There is no need to assume that the changes in the risk factors have a structured parametric probability distribution (e.g., no need to assume they are Joint-Normal with stable correlation).
The disadvantages are due to using the historical data in such a raw form:
• The result is often dominated by a single, recent, specific crisis, and it is very difficult to test other assumptions. The effect of this is that Historical VaR is strongly backward-looking, meaning the bank is, in effect, protecting itself from the last crisis, but not necessarily preparing itself for the next.
• There can also be an unpleasant “window effect.” When 250 days have passed since the crisis, the crisis observation drops out of our window for historical data, and the reported VaR suddenly drops from one day to the next. This often causes traders to mistrust the VaR because they know there has been no significant change in the risk of the trading operation, and yet the quantification of the risk has changed dramatically.
Monte Carlo simulation is also known as Monte Carlo evaluation (MCE). It estimates VaR by randomly creating many scenarios for future rates, using nonlinear pricing models to estimate the change in value for each scenario, and then calculating VaR according to the worst losses.
Monte Carlo simulation has two significant advantages:
• Unlike Parametric VaR, it uses full pricing models and can therefore capture the effects of nonlinearities.
• Unlike Historical VaR, it can generate an infinite number of scenarios and therefore test many possible future outcomes.
Monte Carlo has two important disadvantages:
• The calculation of Monte Carlo VaR can take 1000 times longer than Parametric VaR because the potential price of the portfolio has to be calculated thousands of times.
• Unlike Historical VaR, it typically requires the assumption that the risk factors have a Normal or Log-Normal distribution.
The Monte Carlo approach assumes that there is a known probability distribution for the risk factors. The usual implementation of Monte Carlo assumes a stable, Joint-Normal distribution for the risk factors; this is the same assumption used for Parametric VaR. The analysis calculates the covariance matrix for the risk factors in the same way as Parametric VaR, but, unlike Parametric VaR, it then decomposes the matrix as described below. The decomposition ensures that the risk factors are correlated in each scenario. The scenarios start from today’s market condition and go one day forward to give possible values at the end of the day. Full, nonlinear pricing models are then used to value the portfolio under each of the end-of-day scenarios. For bonds, nonlinear pricing means using the bond-pricing formula rather than duration, and for options, it means using a pricing formula such as Black-Scholes rather than just using the Greeks.
From the scenarios, VaR is selected to be the 1-percentile worst loss. For example, if 1000 scenarios were created, the 99% VaR would be the tenth-worst result. Figure 6-4 summarizes the Monte Carlo approach:
Most of the Monte Carlo approach is conceptually simple. The one mathematically difficult step is to decompose the covariance matrix in such a way as to allow us to create random scenarios with the same correlation as the historical market data. For example, in the previous example of a Sterling bond held by a U.S. bank, we assumed a correlation of −0.6 between the interest rate and exchange rate. In other words, when the interest rate increases, we would expect that the exchange rate would tend to decrease. One way to think of this is that 60% of the change in the exchange rate is driven by changes in the interest rate. The other 40% is driven by independent, random factors. The trick is to create random scenarios that properly capture such relationships.
If we just have two factors, we can easily create correlated random numbers in the following way:
FIGURE 6-4 Illustration of the Process for Monte Carlo VaR
• Draw a random number, z1, from a Standard Normal distribution.
• Multiply z1 by the standard deviation of the first risk factor (σA) to create the first risk factor for that scenario, fA:
fA = σAz1, z1 ∼ N(0,1)
• Multiply z1 by the correlation, ρA,B.
• Draw a second independent random number, z2.
• Multiply z2 by the square root of one minus the correlation squared, √(1 − ρA,B²).
• Add the two results together to create a random number (y) that has a standard deviation of one and correlation ρA,B with fA:
• Multiply y by the standard deviation of the second risk factor, σB, to create the second risk factor for that scenario, fB:
fB = σBy
This process can be summarized in the following equations:
For the previous bond example, we would create changes in the risk factors rp and FX by using the following equations:
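These steps can be sketched as follows for the bond example, using the Table 6-1 values for the volatilities and the correlation:

import random
from math import sqrt

sigma_FX, sigma_r, rho = 0.02, 0.005, -0.6

def one_scenario():
    z1 = random.gauss(0.0, 1.0)      # first independent standard Normal draw
    z2 = random.gauss(0.0, 1.0)      # second independent standard Normal draw
    d_FX = sigma_FX * z1                                   # change in the FX rate
    d_rp = sigma_r * (rho * z1 + sqrt(1 - rho ** 2) * z2)  # correlated change in the rate
    return d_FX, d_rp

for _ in range(3):
    print(one_scenario())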
Unfortunately, this simple approach does not work if there are more than two risk factors. If there are many risk factors, we need to create the correlation by decomposing the covariance matrix using either Cholesky decomposition or Eigen-value decomposition. We will give an overview of each approach.
Cholesky decomposition finds a new matrix, A, such that the transpose of A times A equals the covariance matrix, C:
C = ATA
A is also required to be upper triangular, i.e., all the elements below the main diagonal are zero; e.g.:
As a two-dimensional example, assume that we have a covariance matrix with known covariances a, b, and c:
We now wish to find the elements of A to satisfy the following equation:
A will be as required if we define its elements as follows:
α² = a
αβ = b
β² + φ² = c
This can be solved as follows:
α = √a,  β = b/√a,  φ = √(c − b²/a)
Now we take two random numbers, z1 and z2, drawn independently from Normal distributions with a mean of zero and standard deviation of one to create a vector Z:
Z = [z1 z2]
If we multiply A by Z, we get a vector, F, of two random risk factors that are correlated according to the original covariance matrix:
If you recognize that b/√(a × c) is the correlation between the risk factors, you will see that in this two-dimensional example, the result comes out to be the same as our previous simple approach for getting correlated random numbers.
As a further example, recall the statistics of the risk factors for our example of the 100-pound bond:
σrp = 0.5%
σFX = 0.02
ρFX,rp = −0.6
For this example, the covariance and Cholesky matrices are as follows:
This gives us the following equations for changes in the risk factors:
δFX = 0.02 × z1
δrp = −0.003 × z1 + 0.004 × z2
In the two-dimensional example above, we had two risk factors and could easily calculate the Cholesky decomposition by hand. For a larger number of risk factors, the equations become more tedious, but many software packages include Cholesky decomposition. If you need to program a Cholesky decomposition, you can find suitable algorithms in Numerical Recipes.1
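For example, NumPy provides a Cholesky routine; note that it returns a lower-triangular matrix L with C = LLT, i.e., the transpose of the upper-triangular A used above, so correlated scenarios are generated by multiplying L by the vector of independent Normal draws:

import numpy as np

sigma_FX, sigma_r, rho = 0.02, 0.005, -0.6
C = np.array([[sigma_FX ** 2, rho * sigma_FX * sigma_r],
              [rho * sigma_FX * sigma_r, sigma_r ** 2]])
L = np.linalg.cholesky(C)          # lower triangular, C = L @ L.T
print(L)                           # [[0.02, 0], [-0.003, 0.004]]
z = np.random.standard_normal(2)   # independent standard Normal draws
d_FX, d_rp = L @ z                 # dFX = 0.02 z1, drp = -0.003 z1 + 0.004 z2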
Cholesky decomposition is relatively straightforward to program, but the algorithm used to find the Cholesky matrix does not work if the matrix is not positive definite. To be positive definite, all the Eigenvalues (see below) of the covariance matrix must be positive. In practical terms, this means that none of the risk factors can have a perfect correlation with another factor. This condition often breaks down when constructing covariance matrices, either because there is not enough historical data to show that variables are independent, or because of small errors in the data. In practice, Cholesky decomposition tends to fail if there are more than 10 to 20 risk factors in the covariance matrix.
The alternative to Cholesky decomposition is Eigenvalue decomposition, which is also known as Principal Components analysis. It is more difficult to program than Cholesky decomposition, but it will work for covariance matrices that are not positive definite. This means that it will work for covariance matrices with hundreds of risk factors. Eigenvalue decomposition only fails if different parts of the matrix were built with data from different time periods (because inconsistencies in the data may cause negative variances for some of the principal components). Eigenvalue decomposition also has the advantage that it can give intuitive insights into the structure of the random risk factors, allowing us to identify the main drivers of risk. This can help us reduce the number of simulations needed.
Eigenvalue decomposition works by looking for two matrices, Λ and E, to satisfy the following equation:
C = ETΛE
C is the covariance matrix. Λ is a square matrix in which all the elements other than the main diagonal are zero:
E is such that when it is multiplied by its transpose, the result is the identity matrix:
I = ETE
Because Λ is diagonal, it can easily be broken into two parts. This allows us to decompose the covariance matrix into the product of a matrix B = √Λ E and its transpose:
C = ETΛE = (√ΛE)T(√ΛE) = BTB
To generate correlated random numbers, we use the matrix B from Eigenvalue decomposition in the same way as the A matrix obtained from Cholesky decomposition:
As an example, for our two-dimensional bond example, the results are as follows:
This gives the following result after multiplying out the matrices:
δFX = −0.154 × 0.004 × z1 + 0.02 × 0.988 × z2
δrp = 0.004 × 0.988 × z1 − 0.02 × 0.154 × z2
For those with some experience with matrix math, Appendix D digs deeper into why it should be that Eigenvalue decomposition produces properly correlated numbers.
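A sketch using NumPy’s eigh routine for the same two-factor covariance matrix is given below; here B is built as √Λ E, and the resulting scenario equations should match the δFX and δrp expressions above, up to the arbitrary sign of each Eigenvector:

import numpy as np

sigma_FX, sigma_r, rho = 0.02, 0.005, -0.6
C = np.array([[sigma_FX ** 2, rho * sigma_FX * sigma_r],
              [rho * sigma_FX * sigma_r, sigma_r ** 2]])
eigenvalues, eigenvectors = np.linalg.eigh(C)       # columns are the Eigenvectors of C
B = np.diag(np.sqrt(eigenvalues)) @ eigenvectors.T  # B such that B.T @ B = C
print(B)
z = np.random.standard_normal(2)                    # independent standard Normal draws
d_FX, d_rp = B.T @ z                                # correlated changes in the risk factors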
The Eigenvectors also have special properties that are useful for speeding up Monte Carlo evaluations. Each Eigenvector defines a market movement that is by definition independent of the other movements (due to the requirement that I = ETE). The best illustration of the special properties of Eigenvectors is the Eigenvalue decomposition of a yield curve.
Let us consider the U.S. government yield curve. Below are the standard deviation (S), correlation (R), and covariance (C) matrices for absolute changes in the 3-month, 1-year, 5-year, and 20-year interest rates. This is followed by the calculation of the B matrix, which is very interesting:
The Eigenvalue decomposition of C is as follows:
Let us look at how B would be used to create random scenarios:
Notice that the rows of B describe changes in the rates that are a shift, twist, flex, and wiggle:
Wiggle = z1[ 0.01 −0.01 0.01 −0.01]
Flex = z2[ 0.01 −0.02 0.00 0.01]
Twist = z3[ −0.04 −0.01 0.01 0.02]
Shift = z4[ 0.03 0.05 0.06 0.05]
For a given change in the random factor (z), the shift has the strongest effect, and the wiggle is the weakest.
If z4 increases by 1, all the rates shift up by the amount shown on the bottom row of the B matrix; i.e., the 3-month rate will shift up by 3bps, the 1-year rate will shift up by 5bps, the 5-year rate will shift up by 6bps, and the 20-year rate will shift up by 5bps. When z3 increases by 1, the 3-month and 1-year rates twist down, but the 5-year and 20-year rates twist up, according to the third row of the matrix. When z2 increases by 1, the 1-year rate flexes down a little, but the other rates stay the same or increase slightly, as in the second row of B. When z1 increases by 1, there is a small wiggle down of the 1- and 20-year rates, and a wiggle up of the 3-month and 5-year rates. In summary, this matrix shows us the extent to which changes in the yield curve can be regarded as independent movements of shift, twist, flex, and wiggle.
Figure 6-5 plots the four rows of the B matrix for U.S. interest rates; these are the principal components of any rate shift. Notice that for a 1-unit change in the random z factor, the shift has the highest impact, and the wiggle has very little influence on the final shape. If we kept z1 and z2 constant, and just let z3 and z4 be random, we would capture most of the uncertainty in rates because the flex and wiggle are comparatively small.
Monte Carlo evaluation is generally considered to be computationally intensive; i.e., it takes a long time to compute the results, compared with Parametric VaR. Creating the random scenarios for the risk factors takes relatively little time to run once it is programmed. The slow part of running Monte Carlo VaR is doing the pricing of all instruments under each scenario. Several techniques have been developed to reduce the computation time. Here, we will discuss four of the most popular:
• Parallel processing
• Stabilization of results
• Variance-reduction techniques
• Approximate pricing
Let’s look at each in more detail:
FIGURE 6-5 Principal Component Analysis of the U.S. Yield Curve
This figure plots the four rows of the B matrix for the U.S. interest rates. In a Monte Carlo simulation, each row is multiplied by a separate random number drawn from a Standard Normal distribution, and then the rows are added together. The result is a set of scenarios that tend to have large shifts, moderate twists, small flexes, and even smaller wiggles.
Parallel processing simply uses several computers and simultaneously evaluates different groups of scenarios or instruments on each. Some care has to be taken to distribute the processes and collate the results, but this is relatively easy. The main drawback is the cost of hardware.
Stabilization of results reduces the number of scenarios that need to be run. If we allow the scenarios to be completely different on one day compared with the next, it is quite likely that the results will change not because there has been any fundamental change in the risk, but because different random scenarios have been tested. The common approach to reducing this problem is to run many scenarios each day to “average out” the random fluctuations.
An alternative approach is to run fewer scenarios but fix the Normally distributed, independent, random numbers that are used to create the scenarios and only allow the correlation matrix and portfolio composition to change from day to day. The consequence is that the results only change due to real market and portfolio changes.
The random numbers can be fixed either by generating the numbers once and storing them, or by using the same seed number to start the random sequence each day. Although fixing the random numbers sounds straightforward, it does require discipline when writing the program to ensure that the same random number will be used in the same place for every run.
There is a slight disadvantage to fixing the random numbers because there is less opportunity to test obscure scenarios that may cause unusual losses. As a compromise, you may allow a proportion of the random numbers to change each day.
Variance-reduction techniques choose the numbers to be more evenly distributed than in the pure random case. This reduces the randomness in the results and allows us to use fewer scenarios. In some cases, the number of scenarios may be reduced by a factor of 10 to 100 while still maintaining the same accuracy.
There are a number of techniques that do this. The most easily implemented techniques are Antithetic sampling, Stratification, Importance sampling, and the Latin Hypercube. They are briefly outlined below.
Antithetic sampling is the easiest variance reduction technique. If we wanted to create N random scenarios, we would first create N/2 random, normally distributed numbers as usual, then take the negative of all those random numbers to be the second set of scenarios. This guarantees that the overall set of random numbers will be perfectly balanced about the mean.
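A minimal sketch of antithetic sampling for the standard Normal draws that feed the scenario generator:

import random

def antithetic_normals(n):
    # n standard Normal draws in which the second half mirrors the first.
    half = [random.gauss(0.0, 1.0) for _ in range(n // 2)]
    return half + [-z for z in half]

z_draws = antithetic_normals(1000)
print(sum(z_draws))    # essentially zero: the set is perfectly balanced about the mean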
For Stratification, we divide the sample space into N cells and randomly take a single sample from within each cell. This ensures that the samples will be well-spread throughout the sample space.
As an example, consider two risk factors, fA and fB, whose possible values are uniformly distributed between zero and one. Here we have a two-dimensional sample space. Assume that we want to take four samples from this space. In this case, we would cut each dimension in two and create our four samples as follows:
Sample 1
fA,1 = 0.5 × u, u ∼ U(0,1)
fB,1 = 0.5 × u, u ∼ U(0,1)
Sample 2
fA,2 = 0.5 + 0.5 × u, u ∼ U(0,1)
fB,2 = 0.5 × u, u ∼ U(0,1)
Sample 3
fA,3 = 0.5 × u, u ∼ U(0,1)
fB,3 = 0.5 + 0.5 × u, u ∼ U(0,1)
Sample 4
fA,4 = 0.5 + 0.5 × u, u ∼ U(0,1)
fB,4 = 0.5 + 0.5 × u, u ∼ U(0,1)
Here, u ∼ U(0,1) means that u is a random number sampled from a uniform distribution between zero and one. Figure 6-6 sketches typical resulting samples.
In general, it is necessary to ensure that there is an equal probability of a sample occurring in each cell. If we wanted to create variables with a distribution other than uniform, we first create uniform, stratified, random numbers as above and then transform them to have the distribution that we need. The transformation is done by using the inverse of the cumulative probability function to map from a uniform distribution to the required distribution. For a Normal distribution, this mapping is sketched in Figure 6-7. The symbol N() denotes a Normal distribution, and Φ denotes a Standard Normal distribution with a mean of zero and standard deviation of one.
In a spreadsheet application, such as Microsoft’s Excel, uniformly distributed, random numbers can be created using the command “rand()”. These can be converted into having a standard Normal distribution using the command “Norminv(rand(),0,1)”. If we were stratifying one random factor into four strata, we could create four stratified, random numbers from the following commands:
Z1 = Norminv(0.00 + 0.25 * rand(), 0, 1)
Z2 = Norminv(0.25 + 0.25 * rand(), 0, 1)
Z3 = Norminv(0.50 + 0.25 * rand(), 0, 1)
Z4 = Norminv(0.75 + 0.25 * rand(), 0, 1)
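The same four stratified draws can be produced in Python with the standard library, where NormalDist().inv_cdf plays the role of Norminv:

import random
from statistics import NormalDist

inv_norm = NormalDist(mu=0.0, sigma=1.0).inv_cdf
strata = 4
# One uniform draw per stratum, then mapped through the inverse Normal distribution.
z = [inv_norm((k + random.random()) / strata) for k in range(strata)]
print(z)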
Let us move on to Importance sampling, which biases the selection of random numbers towards the places that are of most importance to us, e.g., crises in the tails of the distribution. This can be done by creating random numbers that have a higher standard deviation than the true risk factors. Once the results have been calculated, their probability is adjusted according to the relative probabilities of the true distribution and the sample distribution.
The Latin Hypercube is most useful if there are only one or two risk factors that are particularly important in causing losses. The Latin Hypercube technique starts in a similar way to Stratified sampling in that we divide each dimension into a number of segments. However, we then take only one sample from each segment. The result is illustrated in Figure 6-8. Notice that each dimension is very finely covered, but not all combinations of the two variables are explored.
FIGURE 6-6 Results from Stratified Sampling
Stratified sampling is a variance-reduction technique enabling selected numbers to be more evenly distributed than those in the pure random case.
FIGURE 6-7 Creation of a Normally Distributed Variable from a Uniformly Distributed Variable
Here, the set of uniformly distributed, random numbers (f) are put into the inverse of the Normal distribution to create a set of Normally distributed numbers (z). If a number in the uniform set is close to zero, it will produce a number close to negative infinity in the Normal set. A uniform number close to 0.5 becomes close to zero, and a uniform number close to one becomes close to infinity.
In using Variance reduction, it is important to note that as the number of risk factors increases, it becomes more difficult to create well-balanced pseudorandom numbers. A common technique, therefore, is to concentrate the random numbers on the most important risk factors and let the lesser factors be purely random. If Eigen-value decomposition has been used, the most important risk factors are the ones corresponding to the largest element in the Λ matrix.
FIGURE 6-8 Results from a Latin Hypercube
The Latin Hypercube ensures that samples do not overlap in any dimension.
Approximate pricing reduces the computation time for each instrument by simply taking less time to evaluate each scenario. This is done by using simple models that are not as accurate, but can be run quickly. A typical example would be to use the Black-Scholes equation to approximate the value of an option that would be more accurately priced by binomial trees.
In this chapter, we explained how to calculate VaR using three methods that are in common use: Parametric VaR, Historical simulation, and Monte Carlo simulation. Next, we will explain how to attribute VaR to the source of each risk.
As discussed earlier in the chapter, binning is important for discounting cash flows because it enables us to reduce the amount of data that we need to handle. Without binning, we would need to store the yield curve and rate volatilities for every day until the final cash flow in the portfolio was due, typically around 30 years. By binning, we group all future cash flows onto a limited number of points on the yield curve, e.g., overnight, 3 months, 6 months, 1 year, 2 years, 5 years, 10 years, and 30 years. A crude way of doing this would be to move every cash flow to the nearest point on the yield curve, e.g., a payment of $100 in 4 years would be considered to be a payment of $100 at 5 years; however, this would distort measures such as duration. An alternative is to represent a single, true cash flow as a series of fictitious cash flows spread over several points on the yield curve.
In binning the cash flows, we typically try to preserve some combination of the cash-flow amount, present value, and duration. If we want to meet all three conditions, we need three free variables, i.e., cash flows at three different points on the yield curve. This is generally considered to be excessive. Instead, we just try to satisfy two of the conditions by altering the cash flows at the two points just before and after the true cash flow.
Let us consider mapping a cash flow, C5, that is expected to occur five months from now. This will be represented by cash flows at the three-month and six-month points.
If we wish to preserve the cash flow and duration, we must choose the cash flows at the three-month and six-month points (C3 and C6) so that the total cash flow and the interest-rate sensitivity of the mapped cash flows equal those of the original payment:
As an example, assume that the cash flow at 5 months is $100, the 3-month rate is 5%, the 5-month rate is 5.1%, and the 6-month rate is 5.15%; then the equations become:
Cash flow: 100 = C3 + C6
Duration: − 0.388 × 100 = −0.235 × C3 − 0.464 × C6
This can be solved to obtain the required dummy cash flows:
C3 = $33.2
C6 = $66.8
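A sketch that solves these two equations with NumPy, using the rounded sensitivities quoted above, reproduces the $33.2 and $66.8 split:

import numpy as np

C5 = 100.0                            # true cash flow at 5 months
s3, s5, s6 = -0.235, -0.388, -0.464   # sensitivities per unit cash flow, as quoted above
# Row 1 preserves the cash flow; row 2 preserves the interest-rate sensitivity.
A = np.array([[1.0, 1.0],
              [s3, s6]])
b = np.array([C5, s5 * C5])
C3, C6 = np.linalg.solve(A, b)
print(round(C3, 1), round(C6, 1))     # 33.2 66.8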
In the introduction to Parametric VaR, the equations were worked out in terms of absolute changes in the risk factors. For example, if the exchange rate went from 1.5 to 1.6, we considered that to be a change of 0.1 rather than a relative change of 7%. Relative changes are also called “returns.” Working out the equations in terms of relative changes is a little more complex, but makes the results more accurate because the relative changes typically have a more Normal distribution than the absolute changes. This appendix shows how our previous example would be modified to use relative changes.
The example is a bond denominated in British pounds with a single payment and owned by a U.S. bank. As it is held by a U.S. bank, the value should be converted to dollars. The value of the bond in dollars is the value in pounds multiplied by the FX rate:
The change in value due to both a change in rates and a change in FX is given by the sum of individual changes:
We can divide and multiply this equation by the current rates to get an expression in terms of returns:
We now define the derivative vectors to include multiplication by the current rate:
The rest of the VaR calculation continues as before but now using the standard deviations of the historical returns rather than the absolute values:
Note that the variance and covariance equations neglect the subtraction of the small mean, which for daily rate changes is usually a reasonable assumption. If we wished to add the extra complication of subtracting the mean, we would use the following:
The construction of a covariance matrix is central to both Parametric VaR and Monte Carlo VaR. There are two approaches to constructing the matrix that are in common use: the random walk and the exponentially weighted moving average.
The random-walk assumption for the Covariance Matrix is that the correlations and standard deviations are stable over time. This allows us to use the usual calculation for the variances and covariances:
Here, xi,t represents risk factor i at time t. If VaR is being calculated using absolute changes, then xi,t is simply the change in the factor fi from day t – 1 to day t:
xi,t = fi,t − fi,t−1
If VaR is being calculated using relative changes, then xi,t is the relative change:
In practice, the mean change is much smaller than the standard deviation of the change and can be neglected with little loss of accuracy. Typically, between 180 and 250 days of historical data are used. Decreasing the number of days means that there are fewer samples for accurately estimating the parameters. However, increasing the number of days means including periods when market conditions could have been significantly different from today.
The Exponentially Weighted Moving Average (EWMA) is slightly more difficult to program but is better for estimating the latest covariances than the simple random walk, and has become an industry standard. The EWMA approach assumes that recent data gives better information about market conditions than past data. It therefore weights recent changes more heavily. The covariance is calculated recursively using yesterday’s estimate of the covariance and today’s market change, as follows:
Here, λ is a decay factor, typically with a value between 0.9 and 0.99, xi,t is the market change for risk factor i on day t, and σij,t−1 is the previous day’s estimate of the covariance between risk factors i and j. The above expression is equivalent to the weighted sum of the daily changes, as shown below:
With λ equal to 0.9, today’s data will have a weight of 0.1, and data from 20 days ago will have a weight of (1 − λ)λ²⁰, i.e., 0.012. With such a weighting, data from more than 20 days ago has very little influence on the EWMA result, which means that the EWMA will be more responsive to changes in the market than the simple random-walk estimate.
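A sketch of one EWMA update step for a single covariance term, with λ = 0.9; the daily changes fed in are hypothetical:

def ewma_cov_update(prev_cov, x_i, x_j, lam=0.9):
    # Blend yesterday's covariance estimate with today's cross-product of changes.
    return lam * prev_cov + (1 - lam) * x_i * x_j

cov = -0.00006                             # yesterday's estimate of the FX/rate covariance
cov = ewma_cov_update(cov, 0.015, -0.004)  # update with today's (hypothetical) changes
print(cov)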
If you want to dig deeper into why it should be that Eigenvalue decomposition produces properly correlated numbers, consider calculating the covariance of the random factors in F. The covariance of the random factors is given by the expected value of the outer product of F and its transpose:
CF = E(FTF)
We can now replace F with its value in terms of the results of the Eigenvalue decomposition, B, and the uncorrelated, random numbers, Z.
E(FTF) = E(BTZTZB) = BTE(ZTZ)B
Z is a set of uncorrelated numbers with a standard deviation of one; therefore, E(ZTZ) is the identity matrix:
BTE(ZTZ) B = BTIB = BTB
BTB was constructed to equal the covariance matrix of historical changes in the original risk factors:
C = BTB
Therefore, the covariance matrix of the newly generated, random numbers equals the covariance matrix of the historical data:
CF = C
1 Numerical Recipes in C: The Art of Scientific Computing, W. H. Press et al., Cambridge University Press, 1997.