In the previous chapter, we gave a qualitative overview of the measurement of risk, and we started to build a quantitative framework. We quantified risk as the amount of economic capital consumed, and this capital was shown to depend on the probability distribution of potential losses. This quantification in terms of the probability distribution of potential losses is at the core of modern risk measurement, and we will spend many chapters discussing how the distribution can be obtained for different types of risks. In this chapter, we will give a deeper explanation of probability distributions and discuss other statistical techniques that are commonly used in risk measurement. The chapter is intended to gather together the core statistical techniques that will be used in the rest of the book, giving a single place of reference so that in later chapters we will not need to digress from finance each time we apply a statistical technique.
As a reader, you can treat this chapter in one of two ways: read it all now and get the pain over with, or skip to the chapters that are of particular interest to you and refer back to this chapter when needed. If you are fortunate enough to be familiar with statistics already, you can just read the section headers to make sure that there is nothing that you have missed, and then go on to the market-risk chapters.
We will cover the following topics in this chapter:
• The creation of histograms and probability distributions from empirical data.
• The statistical parameters used to describe the distribution of losses: mean, standard deviation, skew, and kurtosis.
• Examples of market-risk and credit-risk loss distributions to give an understanding of the practical problems that we face.
• The idealized distributions that are used to describe risk: the Normal, Log-Normal, and Beta probability distributions.
• The use of confidence intervals, confidence levels, and percentiles.
• How to include correlations between random losses.
• The statistics for a sum of separate losses.
• The equations that can be used to describe a random time series, such as the evolution of interest rates.
• A brief reminder of addition and multiplication for matrices. Matrices will be used later to make complex equations more readable.
The primary use of probability densities in risk measurement is to show us the likelihood of any given level of losses. For example, we can use them to show us the probability of losses from a given portfolio being greater than a million dollars in one day.
We will introduce probability densities by describing how they can be empirically constructed from raw data. The first step will be to create histograms of the data. We will then modify the histogram to become a graph of probability and further modify it to become a graph of probability density.
Let us return to the example in Chapter 2, in which we had 10 possible scenarios for asset value at the end of the year. In Chapter 2, we quickly went from the sample distribution to probability distributions. Here we will do the same, but more thoroughly. The possible values are shown again in Table 3-1. We can define ranges of possible asset values and count how many samples fall within each range. These ranges are also called bins or buckets because we place each sample into one of the bins. For our example, we define the ranges to be $2 increments, from $96 to $106, which gives the results of Table 3-2. We can plot these results as the histogram in Figure 3-1.
The histogram displays how many occurrences or samples fall within each range or bin. We can restate this in terms of the probability of one random sample falling in any given bin. The probability is calculated by dividing the number of samples in each bin by the total number of samples. If we define ni to be the number of samples falling in bin number i (as in Table 3-2), and N to be the total number of samples (in this case, 10), then we can calculate the probability Pi of a sample falling in bin i. The probability is given by the simple equation:
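That is:

$$P_i = \frac{n_i}{N}$$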
In many cases, it will be easier to work with probability densities rather than raw probabilities. The probability density is defined to be the probability of a sample falling in a given range divided by the width of the range (w). In the example above, we chose the width of the range to be $2. The probability density is signified by a lowercase p:
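In symbols:

$$p_i = \frac{P_i}{w} = \frac{n_i}{N\,w}$$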
TABLE 3-1 Results of 10 Credit-Loss Scenarios
TABLE 3-2 Number of Occurrences in Each Range
The probability density for this example is plotted in Figure 3-2, which looks very similar to the histogram of Figure 3-1. In the histogram, the y-axis showed the number of samples per bin. In the probability density, the y-axis shows the number of samples per bin divided by the total number of samples and the bin width:
FIGURE 3-1 Histogram of 10 Credit-Loss Scenarios
Histogram showing how many results fall in each range.
FIGURE 3-2 Probability Density for the Credit-Loss Example
The distribution of results scaled to be a probability density by dividing the number of results in each range by the total number of results and the width of the range, which in this case is $2.
The probability density can be used to tell us the probability of a variable falling in a given range. From this we can also calculate the cumulative probability. This is the probability of the random variable falling below a given number. The cumulative probability can be estimated by multiplying the probability density by the bin width to get probabilities for each bin, and by summing up all the probabilities for values less than or equal to the given number. The cumulative probability (CP) up to the given number (xi) can be expressed by the following summation:
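Written out, this summation is:

$$CP(x_i) = \sum_{k=1}^{i} p_k\,w$$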
FIGURE 3-3 Cumulative Probability for the Credit-Loss Example
The graph shows the cumulative probability, i.e., the probability of a result falling in or below the given range.
Here, $\sum_{k=1}^{i}$ means the sum of the probability densities for all the bins from 1 to i.
For our example, the graph of the cumulative probability constructed this way is shown in Figure 3-3, in which the y-axis shows the probability that the random variable will be less than the value shown in the x-axis.
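To make these steps concrete, here is a purely illustrative Python sketch that performs the same calculations. The ten sample values are placeholders rather than the actual figures from Table 3-1; only the $2 bins from $96 to $106 follow the example.

```python
import numpy as np

# Placeholder samples standing in for the ten scenario values of Table 3-1
samples = np.array([97.0, 99.0, 100.0, 100.5, 101.0,
                    101.5, 102.0, 103.0, 104.0, 105.0])

bin_edges = np.arange(96.0, 108.0, 2.0)     # $2-wide bins from $96 to $106
counts, edges = np.histogram(samples, bins=bin_edges)

N = len(samples)                            # total number of samples
w = edges[1] - edges[0]                     # bin width ($2)

probability = counts / N                    # P_i = n_i / N
density = probability / w                   # p_i = P_i / w
cumulative = np.cumsum(probability)         # cumulative probability up to the top of each bin

for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
    print(f"{lo:.0f}-{hi:.0f}: n={counts[i]}, P={probability[i]:.2f}, "
          f"p={density[i]:.3f}, CP={cumulative[i]:.2f}")
```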
In the discussion above, we described the random variable of asset value in graphical form. The properties of the variable can also be quantified in terms of mean, standard deviation, skew, and kurtosis. In calculating these statistics, we need to bear in mind that there are two tightly related but subtly different sets of statistics. The first set is the mean, standard deviation, skew, and kurtosis that describe the actual underlying process that produces the random results. The second set is the mean, standard deviation, skew, and kurtosis of the results that we observe. This second set is called the sample statistics.
In general, we cannot know the true statistics of a process; we can only observe the individual results, calculate the sample statistics, and then use these as the best estimates of the statistics of the underlying process. Here we shall use an apostrophe (‘) to denote the statistics of the underlying process.
The mean is typically denoted by the Greek letter μ or by a bar over the symbol for the random variable, e.g., . The mean value of a variable produced by a random process is the sum of the possible results, each weighted by the probability of the result. If the process can only produce a discrete number of results, the mean is the weighted sum of the results:
Here, N is the total number of possible results, and P(xi) is the probability of result xi. If the underlying process is continuous, then there is an infinite number of possible results, and we use integration to calculate the mean:
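In the notation used here, the discrete and continuous forms are:

$$\mu = \sum_{i=1}^{N} P(x_i)\,x_i, \qquad \mu = \int_{-\infty}^{\infty} x\,p(x)\,dx$$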
Here, p(x) is the probability density function for x, and p(x)dx is the probability of x falling in the range dx. The mean value of the underlying process is also called the expected value, denoted by E(x).
The sample mean of a set of random results is simply the sum of the results divided by the number of results (N):
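In symbols:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$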
Here, xi represents the individual results. The probability function, p(x), is implied by the distribution of the values that were observed. For the 10-sample example of asset values, we calculate the mean as follows:
The standard deviation gives a measure of the degree to which the random results vary away from the mean. This is a key statistic in risk measurement because it gives a measure of how different the results could be from the desired result of making a profit. The standard deviation is generally denoted by σ. The standard deviation squared is called the variance:
Variance = σ²
The variance for a random process is the probability-weighted sum of the square of the differences between the result and the mean. For a process giving discrete results, the standard deviation is given by the following:
For a continuous variable, it is given by the integral:
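Written out, the two forms (with the standard deviation being the square root of the variance) are:

$$\sigma^2 = \sum_{i=1}^{N} P(x_i)\,(x_i - \mu)^2, \qquad \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\,dx$$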
The sample variance is given by the sum of the squared deviations from the sample mean divided by N – 1.
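In symbols, with the bar denoting the sample mean:

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$$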
The division by N – 1 ensures that the expected value of the sample variance equals the variance of the underlying distribution. The derivation of the factor N – 1 is a little complex, but depends on the fact that the sample mean is not necessarily the same as the true mean. For our example, the variance is 8.2, and the standard deviation is 2.9:
The skew is a measure of the asymmetry of the distribution. In risk measurement, it tells us whether the probability of winning is similar to the probability of losing.
The skew for discrete and continuous processes is calculated as follows:
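The standard moment-based definitions, for the discrete and continuous cases respectively, are:

$$\text{Skew} = \sum_{i=1}^{N} P(x_i)\,\frac{(x_i - \mu)^3}{\sigma^3}, \qquad \text{Skew} = \int_{-\infty}^{\infty} \frac{(x - \mu)^3}{\sigma^3}\, p(x)\,dx$$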
The sample skew includes terms to ensure that the expected value of the sample skew equals the skew of the underlying process:
For our example, the sample skew is calculated as follows:
Kurtosis is useful in describing extreme events, e.g., losses that are so bad that they only have a 1 in 1000 chance of happening. Consider two different trading portfolios whose values have the same mean, standard deviation, and skew, but different kurtosis. Every 1000 days, the portfolios could be expected to suffer a “bad” loss. In these extreme events, the portfolio with the higher kurtosis would suffer worse losses than the portfolio with lower kurtosis. We will illustrate this later with actual market data.
The kurtosis for discrete and continuous processes is calculated as follows:
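In the same way as for the skew, but using the fourth power of the deviations:

$$\text{Kurtosis} = \sum_{i=1}^{N} P(x_i)\,\frac{(x_i - \mu)^4}{\sigma^4}, \qquad \text{Kurtosis} = \int_{-\infty}^{\infty} \frac{(x - \mu)^4}{\sigma^4}\, p(x)\,dx$$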
The kurtosis of a sample is calculated as follows:
For our example, the kurtosis is 4.4:
Later in this chapter, we will discuss the Normal probability distribution. The kurtosis for the Normal distribution is three, and this distribution is so commonly used that some researchers define the "excess kurtosis" as the kurtosis calculated above minus three, i.e.:
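$$\text{Excess kurtosis} = \text{Kurtosis} - 3$$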
Distributions with a kurtosis greater than the Normal distribution are said to have leptokurtosis.
Up to this point we have illustrated probability distributions, probability densities, and statistics using the simple example with 10 different outcomes for asset values. Now let us apply this language and theory to examine real, historical financial data.
Table 3-3 shows the annual rate of default over the last 19 years on bonds rated by Standard & Poor’s. The data is plotted as a histogram in Figure 3-4. Along the x-axis is the proportion of bonds that defaulted. (100 basis points equals 1%.) The y-axis shows the number of years in which the loss rate fell within each of the 50 basis-point ranges from 0 to 400.
Let us now discuss the information that we can obtain from this data. In 7 of the 19 years, the loss rate was between 50 and 100 basis points. The worst annual loss was 297 basis points. The mean loss rate was 127 basis points, with a standard deviation of 72 basis points. The skew was 0.9 (reflecting the asymmetry), and the excess kurtosis was 0.24 (reflecting the extreme event of a 297-basis-point loss).
Figure 3-5 gives the histogram of daily relative changes in the S&P 500 equity index over 10 years, from January of 1990 to December of 2000. The relative change for day T is calculated as the change in the level of the index divided by the level one day previously:
TABLE 3-3 Bond-Default Rates over 19 Years
The maximum change in one day was 5%, the minimum was –7%, the mean daily change was 0.05% (14%/year), the standard deviation was 0.9%, and the skew and kurtosis were –0.2 and 7.5 respectively. From this data we can make several observations. The standard deviation of daily returns is much higher than the mean. The shape of the probability distribution is close to being symmetric and has a familiar bell shape, but it has a kurtosis that is significantly greater than 3, and therefore is not a Normal distribution.
The size of the bins along the x-axis is in 0.5% increments from –7.5% to +7.5%. The y-axis shows the number of results falling in each bin, but it is not very useful to say that with 2800 samples, around 700 will fall within the bin between 0 and 0.5%. An alternative is to divide each result by the total number of samples and the bin width, to produce the probability density. This gives us a result that is independent of the number of samples and the sizes of the bins. The sample probability-density function for the S&P 500 is shown in Figure 3-6. The only change compared with Figure 3-5 is the scale on the y-axis.
FIGURE 3-4 Histogram of Bond Losses over 19 Years
Histogram of bond-default rates over 19 years. The x-axis shows the proportion of bonds that defaulted, in basis points, with 100 basis points equaling 1%. The y-axis shows the number of years in which the loss rate fell within each of the 50-basis-point ranges from 0 to 400.
FIGURE 3-5 Histogram of Daily Returns for the S&P 500
Histogram of daily relative changes in the S&P equity index over a 10-year period.
FIGURE 3-6 Probability Density of Daily Returns for the S&P 500
Daily return on the S&P 500 index presented as a probability-density function.
In the examples above, we simply plotted the observed data in histograms and calculated their statistics. An alternative is to use an equation to describe the probability-density function. A probability-density function is an equation that gives a value for the probability density as a function of the possible value of the random variable and parameters such as the mean, standard deviation, skew, and kurtosis. This allows us to know the complete probability distribution without collecting vast amounts of data. Here we will discuss three distributions that are commonly used in risk measurement: the Normal distribution, the Log-normal distribution, and the Beta distribution.
The Normal distribution is also known as the Gaussian distribution or Bell curve. It is the distribution most commonly used to describe the random changes in market-risk factors, such as exchange rates, interest rates, and equity prices. This distribution is very common in nature because of the Central Limit Theorem, which states that if a large number of independent, identically distributed random numbers are added together, the outcome will tend to be Normally distributed. The equation for the Normal distribution is as follows:
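The standard form of the density is:

$$p[x] = \frac{1}{\sigma\sqrt{2\pi}}\;e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$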
Here, p[x] is the probability density for the variable taking the value x. p[x] depends on three variables: the value of x, the mean of the distribution, μ, and the standard deviation, σ. The skew of the Normal distribution is always zero, and the kurtosis is always three. Figure 3-7 shows p[x] for different values of x with a mean of one and a standard deviation of two.
Earlier, we summed the sample results to estimate the cumulative probability. With a probability-density function, we use integration rather than summation to obtain the cumulative-probability function:
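Using y as the variable of integration:

$$CP(x) = \int_{-\infty}^{x} p(y)\,dy$$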
Here, y is the symbol that we use for integration, and x is the upper limit of the integration interval. The cumulative probability for x tells us the probability of a random sample having a value that is less than x. Figure 3-8 shows the cumulative-probability function for the Normal function with a mean of one and standard deviation of two.
You may have noticed the similarity between the Normal function and the distribution of the S&P returns (Figure 3-6). Figure 3-9 shows the S&P 500 returns with a Normal distribution of the same mean (0.05%) and standard deviation (0.9%) imposed upon them. The Normal distribution is the dotted line. The curves are reasonably similar, but notice that the peak for the raw data is higher and to the right. This is because there are a few extreme negative days (with losses of up to –7%), which drag down the mean of the fitted Normal distribution and inflate its standard deviation. This can be seen in Figure 3-10, which expands the y-axis to show the detail of the low-probability events. Notice that the curve obtained directly from the S&P data has a greater probability density out in the tails, especially the few days of losses around –7%. This is reflected in the high kurtosis of 7.5. Notice also that there are more extreme losses than gains, reflected in the negative skew of –0.7. This is a common problem when using the Normal distribution to represent market data, and one that we will return to in the chapters on market risk.
FIGURE 3-7 The Normal Probability-Density Function
FIGURE 3-8 The Normal Cumulative-Probability Function
FIGURE 3-9 Comparison of Normal Distribution with Actual S&P Data
This figure shows the S&P 500 returns compared with a Normal distribution of the same mean (0.05%) and standard deviation (0.9%) imposed on it.
The Log-normal distribution is useful for describing variables which cannot have a negative value, such as interest rates and stock prices. If the variable has a Log-normal distribution, then the log of the variable will have a Normal distribution:
FIGURE 3-10 Comparison of Normal Distribution with Actual S&P Data—Detail
Conversely, if you have a variable that is Normally distributed, and you want to produce a variable that has a Log-normal distribution, take the exponential of the Normal variable:
Figure 3-11 shows the Log-normal distribution corresponding to a Normal distribution with a mean of zero and a standard deviation of one. Notice that the distribution is highly skewed.
FIGURE 3-11 Illustration of the Log-Normal Distribution
FIGURE 3-12 Comparison of Beta Distribution with Actual Credit-Loss Data
The Beta distribution is useful in describing credit-risk losses, which are typically highly skewed, as we saw in the bond-default rates in Figure 3-4.
The formula for the Beta distribution is quite complex; however, it is available in most spreadsheet applications. As with the Normal distribution, it only requires two parameters (in this case called α and β) to define the shape. α and β are functions of the desired mean and standard deviation of the distribution; they are calculated as follows:
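For reference, the density on the interval from zero to one, and the standard moment-matching expressions for α and β in terms of the desired mean m and standard deviation s, are as follows (the book's own layout of these formulas may differ slightly):

$$p(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}, \qquad \alpha = m\left(\frac{m(1-m)}{s^2} - 1\right), \qquad \beta = (1-m)\left(\frac{m(1-m)}{s^2} - 1\right)$$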
For the bond-default data that we examined earlier, the mean is 1.27%, and the standard deviation is 0.62%. The values of α and β are therefore 4.15 and 322, respectively. With these values, the Beta distribution has the shape shown in Figure 3-12. Figure 3-12 compares the Beta distribution with the histogram obtained by binning the raw data. (The probability density was obtained from the histogram by dividing by the number of samples, 19, and the bin width, 50 bps, or 0.005.) The figures show that the Beta distribution effectively smooths the empirical data.
Confidence intervals are one of the ways that we can use probability distributions to make statements about the probabilities of future events. Confidence intervals allow us to state with a given level of certainty the range of values that a variable is likely to take. For example, the probability that x will fall between a and b is given by:
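In symbols:

$$P(a \le x \le b) = \int_{a}^{b} p(x)\,dx$$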
Here, a is the lower end of the interval, b is the upper end, and p(x) is the probability-density function. For example, consider a Normal distribution with a mean of 0 and standard deviation of 1. This is called a Standard Normal distribution. There is a 68% probability that a random variable from a Standard Normal distribution will fall between –1 and +1. For this variable, a 68% confidence interval is therefore +/–1. Table 3-4 shows a range of confidence intervals and the associated probabilities for the Standard Normal distribution.
In risk management, confidence levels are often more useful than confidence intervals because we are usually concerned with the downside risk or worst-case level. The confidence level is a single number rather than a range. It is the level that will not be exceeded, with a given probability. For example, there is only a 5% chance that a variable drawn from a Standard Normal distribution will have a value greater than 1.64. We can therefore say that the 95% confidence level for this variable is 1.64. Table 3-5 gives several confidence levels for the Standard Normal distribution.
Closely related to the confidence level is the percentile. For example, the 99th percentile (or "99%ile") is the level such that 99% of the results fall below that level. For a Normal distribution, the 99%ile is at the mean plus 2.32 times the standard deviation. When dealing with empirical data, the 99%ile is the result that is 99% of the way from the start of the data series after the series has been ordered. If, for example, there were 300 variables in a series, the 99%ile would be the third from the end of the ordered series.
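As an illustrative sketch (not from the book), the Python fragment below reads a confidence level from the Standard Normal distribution and takes a percentile from an empirical series; the simulated daily returns are placeholder data.

```python
import numpy as np
from scipy.stats import norm

# 95% confidence level for a Standard Normal variable:
# the level exceeded with only 5% probability (about 1.64)
print(norm.ppf(0.95))

# 99th percentile of an empirical series: order the data and take the value
# below which 99% of the observations fall
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.009, size=300)   # placeholder data
ordered = np.sort(daily_returns)
percentile_99 = ordered[int(0.99 * len(ordered))]     # with 300 points, the third value from the end
print(percentile_99, np.percentile(daily_returns, 99))
```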
TABLE 3-4 Confidence Intervals for the Standard Normal Distribution
TABLE 3-5 Confidence Levels for the Standard Normal Distribution
So far, we have been discussing the statistics of isolated variables, such as the change in equity prices. We also need to describe the extent to which two variables move together, e.g., the changes in equity prices and changes in interest rates.
If two random variables show a pattern of tending to increase at the same time, then they are said to have a positive correlation. If one tends to decrease when the other increases, they have a negative correlation; and if they are completely independent, and there is no relationship between the movement of x and y, they are said to have zero correlation.
The quantification of correlation starts with covariance. The covariance of two variables can be thought of as an extension from calculating the variance for a single variable. Earlier, we defined the variance as follows:
The variance for each of two separate variables, x and y, can therefore be calculated as:
The covariance between the variables is calculated by multiplying together, at each observation, the deviations of the two variables from their means:
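One common sample form, using the same N – 1 convention as the sample variance (the exact convention may differ), is:

$$\sigma_{x,y} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})$$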
If the change in x is always positive when the change in y is positive, the covariance will come out to be a large positive number. If, when the change in x is positive, the change in y is sometimes positive and sometimes negative, the terms will tend to cancel each other out, and the covariance will tend towards zero.
The correlation is defined by normalizing the covariance with respect to the individual variances:
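In symbols:

$$\rho_{x,y} = \frac{\sigma_{x,y}}{\sigma_x\,\sigma_y}$$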
The maximum possible value for a correlation is one, and the minimum is negative one. The correlation of a variable with itself is simply the variance divided by the variance, and is therefore always equal to one.
In risk measurement, we are often interested in finding the statistics for a result which is the sum of many variables. For example, the loss on a portfolio is the sum of the losses on the individual instruments. Similarly, the trading loss over a year is the sum of the losses on the individual days. Let us consider an example in which y is the sum of two random numbers, x1 and x2.
The mean of the sum of two random numbers is simply the sum of the means of the individual numbers:
The variance of a sum of random variables is the sum of the variances of the individual variables, plus the covariances:
This relationship is used in many places in risk measurement, and for those who are interested, can be derived as follows:
The examples above were for two variables. In general, if y is a sum of q variables, then:
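In symbols, with μi and σi denoting the mean and standard deviation of variable i, and ρi,j the correlation between variables i and j:

$$\mu_y = \sum_{i=1}^{q}\mu_i, \qquad \sigma_y^2 = \sum_{i=1}^{q}\sum_{j=1}^{q}\rho_{i,j}\,\sigma_i\,\sigma_j$$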
It is worth noting that the correlation between variables a and b is the same as the correlation between b and a, i.e.,
$$\rho_{a,b} = \rho_{b,a}$$
One particularly useful application of this equation is when the correlation between the variables is zero. This assumption is commonly made for day-to-day changes in market variables. If we make this assumption, then the variance of the loss over multiple days is simply the sum of the variances for each day:
If we further assume that the variance of the loss on each day is the same, then the variance over T days is simply T times the variance over one day:
By taking the square root of each side, we can show that the standard deviation of the loss over T days is the standard deviation of the loss over one day, multiplied by the square root of T:
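In symbols, the last two statements are:

$$\sigma_{T\,\text{days}}^2 = T\,\sigma_{1\,\text{day}}^2, \qquad \sigma_{T\,\text{days}} = \sqrt{T}\;\sigma_{1\,\text{day}}$$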
This relationship is commonly used in the analysis of market risk to predict how bad the cumulative losses could be over multiple days.
For many random phenomena, such as equity prices and interest rates, the value in one time period will depend on the value in previous periods. For example, a very simple model for stock prices would be to say that the price is a random walk in which the stock price, S, on any given day is equal to the previous day’s price plus a random number, xt:
$$S_t = S_{t-1} + x_t$$
A typical assumption is that the random number is drawn from a Standard Normal distribution, multiplied by the required standard deviation:
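In this notation, and as a reconstruction consistent with the explanation that follows, the model becomes:

$$S_t = S_{t-1} + \sigma\,\sqrt{\Delta T}\;N(0,1)$$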
Here, St is the stock price at time t, σ is the standard deviation of the stock price over one period, and ΔT is the length of each time step, measured in the same time units as σ. N(0, 1) signifies a Normal distribution with a mean of zero and standard deviation of one. If σ were the daily standard deviation, and the time steps were days, then ΔT would equal 1. If σ were the daily standard deviation, and the time steps were years, then ΔT would equal 250 (the number of trading days in a year). In practice, the random walk of equities is closer to being a geometric process, i.e., as the stock price increases, the size of the random changes also increases. This geometric process is included in the model below:
We can further complicate the model by adding a mean expected growth rate, μ:
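Reconstructions of the geometric model and of its extension with a drift term, consistent with the descriptions above, are:

$$S_t = S_{t-1} + S_{t-1}\,\sigma\sqrt{\Delta T}\;N(0,1), \qquad S_t = S_{t-1} + S_{t-1}\left(\mu\,\Delta T + \sigma\sqrt{\Delta T}\;N(0,1)\right)$$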
This describes the evolution of stock prices reasonably well but is not good for describing interest rates. Interest rates do not have a long-term expected growth, but instead, over long periods they tend to return to a set level called the long-term average.
There are many models for random interest-rate processes. One of the most straightforward is the Cox-Ingersoll-Ross (CIR) model. The CIR model is as follows:
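In a discrete-time form consistent with the description below, the model can be written as:

$$r_t = r_{t-1} + c\,(r_m - r_{t-1})\,\Delta T + \sigma\,\sqrt{r_{t-1}}\,\sqrt{\Delta T}\;N(0,1)$$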
Here, rt is the rate at time t, c is the rate of decay towards rm, and rm is the long-term mean for the rate. Whenever rt is greater than rm, the term c(rm – rt) becomes negative and tends to move the rates back down. The square root of r in the last term scales the random disturbance so that when rates are low, the disturbance will be low, and there is a low possibility of creating negative interest rates.
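As a rough sketch of how such a model can be simulated, the Python fragment below steps a rate path forward using the discretized dynamics above. All parameter values (c, rm, σ, the starting rate) are assumed purely for illustration.

```python
import numpy as np

c, r_m, sigma = 0.2, 0.05, 0.02        # decay rate, long-term mean, volatility (assumed)
dt = 1.0 / 250.0                       # daily steps, 250 trading days per year
steps = 250

rng = np.random.default_rng(42)
r = np.empty(steps + 1)
r[0] = 0.03                            # assumed starting rate of 3%

for t in range(1, steps + 1):
    eps = rng.standard_normal()
    # mean reversion towards r_m plus a disturbance scaled by the square root of the rate
    r[t] = (r[t - 1]
            + c * (r_m - r[t - 1]) * dt
            + sigma * np.sqrt(max(r[t - 1], 0.0)) * np.sqrt(dt) * eps)

print(r[-1])
```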
When there are many variables, the normal algebraic expressions become cumbersome. An alternative way of writing these expressions is in matrix form.
Matrices are just representations of the parameters in an equation. You may have used matrices in physics to represent distances in multiple dimensions, e.g., in the x, y, and z coordinates. In risk, matrices are commonly used to represent weights on different risk factors, such as interest rates, equities, FX, and commodity prices.
For example, we could say that the value of an equity portfolio was the sum of the number (n) of each equity held multiplied by the value (v) of each:
$$V_{\text{Portfolio}} = n_a v_a + n_b v_b + n_c v_c$$
This can also be written in matrix notation as follows:
Notice that matrices and vectors are typically represented by capital letters and scalars by lowercase letters.
In this section, we give a quick reminder of the basic matrix operations. Consider a row vector, A. The transpose of A is a column vector:
Consider a matrix, M, with two rows and three columns. The transpose is a matrix with three rows and two columns:
Matrices can be summed by separately adding the individual elements, i.e., the first element of one vector is added to the first element of the other vector. Consider two row vectors, A and B. The sum of two row vectors is another row vector:
Matrices are multiplied together by multiplying the rows of the first vector with the columns of the second vector. Consider a row vector, A, and a column vector, B. The product of these is a scalar:
Now consider a row vector, A, and a matrix, C. The product is a row vector:
As further illustration, consider A and C with numerical values:
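The numerical illustration itself is not reproduced here; instead, the short numpy sketch below (with made-up values) demonstrates the transpose, addition, and multiplication rules just described.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0]])        # a row vector
B = np.array([[4.0, 5.0, 6.0]])        # another row vector

print(A.T)                             # transpose: a column vector
print(A + B)                           # element-by-element sum: a row vector
print(A @ B.T)                         # row vector times column vector: a 1x1 scalar

C = np.array([[1.0, 0.0],              # a matrix with three rows and two columns
              [2.0, 1.0],
              [0.0, 3.0]])
print(A @ C)                           # row vector times matrix: a row vector
```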
By following the rules for matrix operations, we can find matrices that represent the answers to systems of equations. We will use matrices on a couple of occasions in this book, notably in calculating Parametric Value at Risk (VaR). In Parametric VaR, we will use matrices to replace summations of products. For example, we earlier defined the variance for a sum of correlated variables to be as follows:
This can be written in matrix form simply as:
Here, S and R are defined as follows:
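A reconstruction consistent with the two-factor expansion that follows, with S the row vector of standard deviations and R the matrix of correlations, is:

$$\sigma_y^2 = S\,R\,S^T, \qquad S = \begin{pmatrix}\sigma_1 & \sigma_2 & \cdots & \sigma_q\end{pmatrix}, \qquad R = \begin{pmatrix}1 & \rho_{1,2} & \cdots & \rho_{1,q}\\ \rho_{2,1} & 1 & \cdots & \rho_{2,q}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{q,1} & \rho_{q,2} & \cdots & 1\end{pmatrix}$$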
If we only had two risk factors, the calculation would be as follows:
which gets us back to the algebraic expression for variance.
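As an illustrative check (with assumed values for the standard deviations and the correlation), the numpy sketch below evaluates the matrix form and confirms that it matches the two-variable algebraic expression for the variance of a sum.

```python
import numpy as np

S = np.array([[0.02, 0.03]])           # standard deviations of two risk factors (assumed)
rho = 0.5                              # assumed correlation between the two factors
R = np.array([[1.0, rho],
              [rho, 1.0]])             # correlation matrix

variance_matrix = (S @ R @ S.T).item() # matrix form of the variance

# the same result from the algebraic expression for two variables
variance_algebraic = S[0, 0]**2 + S[0, 1]**2 + 2 * rho * S[0, 0] * S[0, 1]

print(variance_matrix, variance_algebraic)
```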
In this chapter, we gathered together the core statistical techniques that are used in the balance of the book to measure various forms of risk. Next, we will look at the main instruments that banks trade and why they trade them. We will also explore how each instrument can be valued to manage risk effectively.