Monte Carlo methods are often the only feasible approach to cope with high-dimensional problems. Nevertheless, they may require an excessive computational effort to produce a reliable answer. Hence, strategies to reduce the dimensionality of a problem are welcome anyway. Moreover, they can also be put to good use when applying alternative approaches, such as low-discrepancy sequences. Apart from computational convenience, there is another driving force behind dimensionality reduction. While in this chapter we are mainly concerned with model building, in the next one we will deal with model estimation. A rich model may be quite appealing in principle, but it turns into a nightmare if it calls for a huge amount of data for its estimation, and a badly estimated data will make the output of the most sophisticated Monte Carlo simulation utterly useless, if not worse.
Data and dimensionality reduction methods are an important and wide topic in multivariate statistics, and we cannot provide anything but some hints on the most relevant tools for applications in finance and economics, to provide the reader with a feeling for them. In this section we outline two methods:
The second approach should not be confused with factor analysis, which is another multivariate analysis technique for dimensionality reduction. The main difference is that factor analysis techniques deal with latent and nonobservable factors, arguably due to their origin in psychology. The factor models we consider here are quite common in finance and aim at easing some statistical estimation issues when a large number of assets is analyzed.
To introduce principal component analysis (PCA) in the most intuitive way, let us have a look at Fig. 3.42, which shows a scatterplot of observations of two random variables X1 and X2. It is clear that the two random variables are positively correlated. Now imagine that, for some reason, we can only afford dealing with a single variable. In order to streamline our model of uncertainty, one possibility would be to get rid of one of the two variables, but which one? Since the variables are related and both display some non-negligible variability, discarding one the two variables looks like a very crude approach to dimensionality reduction. We should try to come up with a single variable trying to capture the most information. Now, consider the two axes in the figure, labeled with variables Z1 and Z2. Of course, we may represent the very same set of observations in terms of Z1 and Z2. This is just a change of coordinates, accomplished in two steps:
FIGURE 3.42 Visualization of PCA.
Vector rotation is accomplished by multiplication by an orthogonal matrix. A square matrix A is orthogonal if its columns are a set of orthogonal unit vectors,39 which implies
This also shows that this matrix can be inverted by transposition. To see that multiplication by such a matrix is a rotation, we may observe that it does not change the norm of vectors:
Since orthogonal matrices enjoy plenty of nice properties, we might be able to find a linear transformation of variables with some interesting structure, Indeed, in Fig. 3.42 we observe two important facts about the new variables Z1 and Z2:
Now, if we had to get rid of one variable, we would have no doubt: Z1 conveys the most information, and the residual one is not only less relevant, but also uncorrelated.
Now that we have a feeling for what PCA is and why it may be useful, we may investigate it in more depth following two different routes to introduce PCA:
Let us pursue both viewpoints, which actually lead us to same conclusions, as we will see.
Let p be the dimension of the random vector X and n the number of available joint observations. Such observations are collected into the following matrix:
where the element [X]kj in row k and column j is the component j of observation k, i.e., X(k)j.
A linear data transformation, including centering, can be written as
where A . We assume that data have already been centered, in order to ease notation. Hence,
The Zi variables are called principal components: Z1 is the first principal component. Now, let us consider the sample covariance matrix of X, denoted by Sx, with entries
Since we assume centered data, this matrix can be expressed as follows:
Now, we may also find the corresponding sample covariance matrix for Z, Sz. However, we would like to find a matrix A such that the resulting principal components are uncorrelated; in other words, Sz should be diagonal:40
where SZi2. is the sample variance of each principal component. The matrix A should diagonalize the sample covariance matrix Sx. To diagonalize Sx, we should consider the product
where matrix P is orthogonal and its columns consist of the normalized eigenvectors of Sx; since this is symmetric, its eigenvectors are indeed orthogonal. The diagonalized matrix consists of the eigenvalues λi, i = 1, …, p, of Sx. Putting everything together, we see that the rows ai, i = 1, …, p, of matrix A should be the normalized eigenvectors:
We also observe that the sample variances of the principal components Zi are the eigenvalues of Sx:
If we sort eigenvalues in decreasing order, λ1 ≥ λ2 ≥ … ≥ λp, we can see that indeed Z1 is the first principal component, accounting for most variability. Then, the second principal component Z2 is uncorrected with Z1 and is the second in rank. The fraction of variance explained by the first q components is
Taking the first few components, we can account for most variability and reduce the problem dimension by replacing the original variables by the principal components.
Another view of PCA is obtained by interpreting the first principal component in terms of orthogonal projection. Consider a unit vector u
p, and imagine projecting the observed vector X on u. This yields a vector parallel to u, of length u
X. Since u has unit length, the projection of observation X(k) on u is
We are projecting p-dimensional observations on just one axis, and, of course, we would like to have an approximation that is as good as possible. More precisely, we should find u in such a way that the distance between the originally observed vector X(k) and its projection is as small as possible. If we have a sample of n observations, we should minimize the average distance
which looks much like a least-squares problem. This amounts to an orthogonal projection of the original vectors on u, where we know that the original and the projected vectors are orthogonal. Hence, we can apply the Pythagorean theorem to rewrite the problem:
In order to minimize the left-hand side of this equality, on the average, we have to maximize
subject to the condition ||u|| = 1. The problem can be restated as
But we know that, assuming that data are centered, the sample covariance matrix is ; hence, the problem is equivalent to
In plain English, what we want is finding one dimension on which multidimensional data should be projected, in such a way that the variance of the projected data is maximized. This makes sense from a least-squares perspective, but it also has an intuitive appeal: The dimension along which we maximize variance is the one providing the most information.
To solve the problem above, we may associate the constraint (3.97) with a Lagrange multiplier λ and augment the objective function (3.96) to obtain the Lagrangian function:41
The gradient of the Lagrangian function with respect to u is
and setting it to zero yields the first-order optimality condition
This amounts to saying that λ must be an eigenvalue of the sample covariance matrix, but which one? We can rewrite the objective function (3.96) as follows:
Hence, we see that λ should be the largest eigenvalue of SX, u is the corresponding normalized eigenvector, and we obtain the same result as in the previous section. Furthermore, we should continue on the same route, by asking for another direction in which variance is maximized, subject to the constraint that it is orthogonal to the first direction we found. Since eigenvectors of a symmetric matrix are orthogonal, we see that indeed we will find all of them, in decreasing order of the corresponding eigenvalues.
Principal component analysis in practice is carried out on sampled data, but it may be instructive to consider an example where both the probabilistic and the statistical sides are dealt with.42 Consider first a random variable with bivariate normal distribution, X ~ N(0, Σ), where
and ρ > 0. Thus, X1 and X2 are standard normal variables with positive correlation ρ. To find the eigenvalues of Σ, we must find its characteristic polynomial and solve the corresponding equation
This yields the two eigenvalues λ1 = 1 + ρ and λ2 = 1 − ρ. Note that both eigenvalues are non-negative, since ρ [−1, 1] is a correlation coefficient. To find the first eigenvector, associated with λ1, we consider the system of linear equations
Clearly, the two equations are linearly dependent and any vector such that u1 = u2 is an eigenvector. By a similar token, any vector such that u1 = −u2 is an eigenvector corresponding to λ2. Two normalized eigenvectors are
These are the rows of the transformation matrix
Since we are dealing with standard normals, μ = 0 and the first principal component is
The second principal component is
As a further check, let us compute the variance of the first principal component:
Figure 3.43 shows the level curves of the joint density of X when ρ = 0.85. Since correlation is positive, the main axis of the ellipses has positive slope. It is easy to see that along that direction we have the largest variability.
FIGURE 3.43 Level curves of a multivariate normal with ρ = 0.85.
Practical PCA is carried out on sampled data, and R provides us with a suitable function called prcomp. The R code in Fig. 3.44 samples 2000 points from the above bivariate normal distribution, with ρ = 0.85, carries out PCA, and produces the following output:
FIGURE 3.44 Code to carry out PCA on sampled data.
> summary(PCA) Importance of components: PC1 PC2 Standard deviation 1.3574 0.38496 Proportion of Variance 0.9256 0.07444 Cumulative Proportion 0.9256 1.00000 > print(PCA) Standard deviations: [1] 1.3573917 0.3849576 Rotation: PC1 PC2 [1,] −0.7133313 0.7008270 [2,] −0.7008270 −0.7133313
As usual in R, we may get the essential information about an object by using functions print or summary; here we are applying them on an object, PCA, which is a rich data structure including all of the relevant information about the results of the analysis. Let us check these results against the theory, starting with sample statistics, which are fairly close to the theoretical values (we also compute centered data along the way):
> m <- colMeans(X);m [1] −0.013842466 −0.009333299 > Xc <- X − (matrix(1,NN,1) %*% m) > S <- cov(X);S [,1] [,2] [1,] 1.0103328 0.8470275 [2,] 0.8470275 0.9803719
Then, let us compute eigenvalues and eigenvectors of the sample covariance matrix, and check them against the output of PCA:
> vv <- eigen(S) > eigval <- vv$values; eigval [1] 1.8425123 0.1481923 > sqrt(eigval) [1] 1.3573917 0.3849576 > PCA$sdev [1] 1.3573917 0.3849576 > eigvet <- vv$vectors;eigvet [,1] [,2] [1,] −0.7133313 0.7008270 [2,] −0.7008270 −0.7133313
Note that if we do not feel comfortable with eigenvalues and eigenvectors, we may also obtain the principal components directly.
> A <- PCA$rotation;A PC1 PC2 [1,] −0.7133313 0.7008270 [2,] −0.7008270 −0.7133313
We clearly see that the first principal component accounts for most variability, precisely:
We may also find the transformed variables directly:
> z <- PCA$x > cov(z) PC1 PC2 PC1 1.842512e+00 2.636065e-16 PC2 2.636065e-16 1.481923e-01 > head(z) PC1 PC2 [1,] −2.3881699 −0.83075134 [2,] 0.6764307 −1.07220758 [3,] 1.5929255 −0.33190052 [4,] 2.3407987 0.06721413 [5,] 1.6718914 −0.10795592 [6,] −0.8683767 0.56002701 > head(Xc %*% A) PC1 PC2 [1,] −2.3881699 −0.83075134 [2,] 0.6764307 −1.07220758 [3,] 1.5929255 −0.33190052 [4,] 2.3407987 0.06721413 [5,] 1.6718914 −0.10795592 [6,] −0.8683767 0.56002701
Finally, let us draw the scatterplots of the original variables and the principal components, as well as the boxplots of the principal components:
> par(mfrow=c(1,2)) > plot(X[,1], X[,2]) > plot(z[,1], z[,2]) > par(mfrow=c(1,1)) > boxplot(z[,1], z[,2], names=c(“Z1”,“Z2”))
The first plot, displayed in Fig. 3.45, shows the positive correlation of the original variables and the lack of correlation of the principal components. Note that the two scales here are different, however, and this hides the relative amounts of variability; the difference in variability is pretty evident in the two boxplots of Fig. 3.46.
FIGURE 3.45 Scatterplots of original variables and principal components.
FIGURE 3.46 Boxplots of principal components.
PCA can be applied in a Monte Carlo setting by reducing the number of random variables we have to sample from. A less obvious role is played in the contest of low-discrepancy sequences. As we shall see in Chapter 9, these sequences can replace random sampling in intermediate-sized problems, but may not be as effective in truly high-dimensional settings. Then, one can use low-discrepancy sequences to sample from the principal components, which have the most impact on the output, possibly using random sampling for the remaining ones. We should mention, however, that in order to apply such ideas we have to figure out the relationship between the distribution of the original variables and the principal components, which need not to be an easy task in general.
Another typical application of PCA is in reducing the number of risk factors in financial modeling. Consider, for instance, the term structure of interest rates, which specifies a set of (annualized) interest rates applying on different time horizons. To use the most general notation, let R(0, t, t + τ) be continuously compounded forward interest rate applying on the time period [t, t + τ], as seen at time 0. This is a forward rate, as it is the rate that can be contracted now for a time interval starting in the future. If we consider the set of rates R(t, t, t + τ) as a function of τ, we have the term structure of spot rates, i.e., the rates applying at time t for several maturities τ. Usually, the term structure is increasing in τ, but it moves in time, changing its shape in various ways. Clearly, modeling the term structure requires the definition of a model for each maturity; furthermore, these movements must preserve some basic consistency, in order to prevent arbitrage opportunities. Measuring and managing interest rate risk is complicated by the presence of so many interrelated risk factors, and Monte Carlo simulation can be challenging as well. One possible approach to make the problem more manageable is to replace the term structure by a few key risk factors combining rates at different maturities, possibly obtained by PCA.43 In typical models, the first few principal components can be interpreted and related to the level, slope, and curvature of the term structure.
Now let us consider another quite practical question: If we have a large set of jointly distributed random variables, what is the required effort to estimate their covariance structure? Equivalently, how many correlations we need? The covariance matrix is symmetric; hence, if we have n random variables, we have to estimate n variances and n(n − 1)/2 covariances. This amounts to
entries. Hence, if n = 1000, we should estimate 500,500 entries in the covariance matrix. A daunting task, indeed! If you think that such a case will never occur in practice, please consider a practical portfolio management problem. You might well consider 1000 assets for inclusion in the portfolio. In such a case, can we estimate the covariance matrix? What you know about statistical inference tells that you might need a lot of data to come up with a reliable estimate of a parameter. If you have to estimate a huge number of parameters, you need a huge collection of historical data. Unfortunately, many of them would actually tell us nothing useful: Would you use data from the 1940s to characterize the distribution of returns for IBM stock shares now? We need a completely different approach to reduce our estimation requirements.
The returns of a stock share are influenced by many factors. Some are general economic factors, such as inflation and economic growth. Others are specific factors of a single firm, depending on its management strategy, product portfolio, etc. In between, we may have some factors that are quite relevant for a group of firms within a specific industrial sectors, much less for others; think of the impact of oil prices on energy or telecommunications.
Rather than modeling uncertain returns individually, we might try to take advantage of this structure of general and specific factors. Let us take the idea to an extreme and build a simple model whereby there is one systematic factor common to all of stock shares, and a specific factor for each single firm. Formally, we represent the random return for the stock share of firm i as
where
The model above is called single-factor model for obvious reasons, but what are its advantages from a statistical perspective? Let us check how many unknown parameters we should estimate in order to evaluate expected return and variance of return for an arbitrary portfolio. To begin with, observe that for a portfolio with weights wi we have
Then,
where μm is the expected return of the market portfolio (more generally, the expected value of whatever systematic risk factor we choose). From this, we see that we need to estimate:
These add up to 2n + 1 parameters. Variance is a bit trickier, but we may use the diagonality condition (3.99) to eliminate covariances and obtain
where σm2 is the variance of the systematic risk factor and σi2 is the variance of each idiosyncratic risk factor, i = 1, …, n. They amount to n + 1 additional parameters that we should estimate, bringing the total to 3n + 2 parameters. In the case of n = 1000 assets, we have a grand total of 3,002 parameters; this is a large number, anyway, but it pales when compared with the 500,500 entries of the full covariance matrix.
In passing, we note that factor models also play a key role in financial economics. The capital asset pricing model (CAPM) is essentially a single-factor model, based on a systematic risk factor related to the return of the market portfolio; if we add some suitable hypotheses to the model, we are led to an important, though controversial, equilibrium model. If more factors are included in the model, plus some hypotheses on the structure of the market and the behavior of investors, we obtain another equilibrium model, arbitrage pricing theory (APT). The validity of these models is challenged by alternative views, as they are not just statistical but equilibrium models, relying on questionable assumptions. Nevertheless, the idea of factor models is widely used by practitioners who do not subscribe to CAPM or APT.
From a Monte Carlo point of view, factor models may simplify scenario generation. There might be some computational saving, but this is arguably not the really important point. What is more relevant is that we can provide the simulation model with more reliable parameter estimates and inputs, with a positive impact on output quality.
In this section we outline risk-neutral pricing for financial derivatives, most prominently options, for the sake of the unfamiliar reader. What we cover here is mostly relevant to Chapter 11. If a reader is not directly interested to this topic, but she wants nevertheless to see how Monte Carlo methods are used in this setting as an example, here are the essential results:
This justifies the application of Monte Carlo methods to estimate the expected payoff and the option price as a consequence; however, we have to use a modified model for the dynamics of the underlying asset. Typically, this just means changing the drift in a stochastic differential equation.
Options are financial derivatives, i.e., assets whose value depends on some other variable. This can be illustrated by considering a call option, a contract between two counterparts, written on a non-dividend-paying stock: The option holder, i.e., the party buying the option, has the right, but not the obligation, to buy the underlying asset from the option writer, i.e., the party selling the option on the primary market, at a prescribed future time and at a price established now. This is called the strike price, denoted by K. Let us denote the current price of the underlying asset by S0 (known when the option is written) and the future price at a time t = T by ST; the latter is, at time t = 0, a random variable. The time t = T corresponds to the option maturity. A European-style option can only be exercised at maturity. On the contrary, American-style options can be exercised at any time before their expiration date. The call option will be exercised only if ST ≥ K, since there would be no point in using the option to buy at a price K that is larger than the prevailing (spot) market price ST at time T. Then, the payoff of the option is, from the holder’s viewpoint,
To interpret this, consider the possibility of exercising the option and buying at price K an asset that can be immediately sold at ST. Market structure is not really that simple, but some options are settled in cash, so the payoff really has a monetary nature; this occurs for options written on a nontraded market index or on interest rates. This payoff is a piecewise linear function and is illustrated in Fig. 3.47. Put options, unlike calls, grant the holder the right to sell the underlying asset at the strike price. Hence, the payoff of a put option is
FIGURE 3.47 Payoff of a call option.
Put options have a positive payoff when the asset price drops, much like short selling strategies, whereby one borrows and sells an asset, hoping to buy the asset back at a lower price in order to give it back to the lender.
When the future asset price ST will be realized, at time t = T, the payoff will be clearly determined. What is not that clear is the fair price for the contract at time t = 0. It is easy to see that a put or call option must have a positive price, as their payoff to the option holder cannot be negative; the option writer, on the contrary, can suffer a huge loss; hence, the latter requires an option premium, paid at time t = 0, to enter into this risky contract.
The easiest way to get the essential message about option pricing is by referring to the binomial model of Section 3.2.1, based on multiplicative shocks and a generalization of the Bernoulli random variable. As shown in Fig. 3.48, the future price of the underlying asset can be either uS0 or dS0, with probabilities pu and pd, respectively. Conditional on the asset price at maturity, the option will provide its holder with a payoff fT, which can either be fu or fd. These values are also illustrated in Fig. 3.48. For instance, in the case of a call option with strike price K, we have
FIGURE 3.48 One-step binomial model for option pricing.
The pricing problem consists of finding a fair price f0 for the option, i.e., its value at time t = 0. Intuitively, one would guess that the two probabilities pu and pd play a major role in determining f0. What is not quite clear is the role that should be played by the overall risk aversion of market participants. However, a simple principle can be exploited in order to simplify our task. The key idea is to build a replicating portfolio, i.e., a portfolio consisting of elementary assets, such that its payoff replicates the option payoff. If we can do so and we know the current prices of the assets in the replicating portfolio, then it is easy to argue that the option value cannot be different from the value of the replicating portfolio. If this were not the case, the intuitive law of one price would be violated. We would have two equivalent portfolios with different prices, and we would just buy the cheaper one and sell the more expensive one to lock in a risk-free profit starting with no money. Such a money making machine is called an arbitrage opportunity, and option pricing relies on the assumption that arbitrage opportunities are ruled out. To be precise, what we need to assume is that they cannot exist in equilibrium, as investors would take immediate advantage of them, in such a way to move prices back to an equilibrium.
The replicating portfolio should certainly include the underlying asset, but we also need an additional one. Financial markets provide us with such an asset, in the form of a risk-free investment. We may consider a bank account yielding a risk-free interest rate r; we assume here that this rate is continuously compounded, so that if we invest $1 now, we will own $ert at a generic time instant t. Equivalently, we may think of a riskless zero-coupon bond, with initial price B0 = 1 and future price B1 = erT at time t = T, when the option matures. In order to find the number of stock shares and riskless bonds that we should include in the replicating portfolio, we just need to set up a system of two linear equations. Let us denote the number of stock shares by Δ and the number of bonds by . The initial value of this portfolio is
(3.100)
and its future value, depending on the realized state, will be either
Note that future value of the riskless bond does not depend on the realized state. If this portfolio has to replicate the option payoff, we should enforce the following two conditions:
(3.101)
Solving the system yields
Hence, by the law of one price, the option value now must be
A striking fact is that this relationship does not involve the objective probabilities pu and pd. However, let us consider the quantities
which happen to multiply the option payoffs in Eq. (3.102). We may notice two interesting features of πu and πd:
The last condition does make economic sense. If we had erT < d < u, the risky stock share would perform better than the risk-free bond in any scenario, and we would just borrow money at the risk-free rate and buy a huge amount of the stock share. On the contrary, if the risk-free rate is always larger than the stock price in the future, one could make easy money by selling the stock share short and investing the proceeds in the risk-free asset. In fact, we can interpret πu and πd as probabilities, and Eq. (3.102) can be interpreted in turn as the discounted expected value of the option payoff,
provided that we do not use the “objective” probabilities pu and pd, but πu and πd. Here the notation E[·] emphasizes this change in the probability measure. These are called risk-neutral probabilities, a weird name that we can easily justify. In fact, if we use the probabilities πu and πd, the expected value of the underlying asset at maturity is
But how is it possible that the expected return of a risky asset is just the risk-free rate? This might happen only in a world where investors do not care about risk, but only about the expected return; in other words, they do not ask for any risk premium. In such a risk-neutral world, the expected return of all assets, at equilibrium, should be the same, and if there is a risk-free asset, its return will be the expected return for all risky assets as well. This observation justifies the name associated with the probabilities that we use in option pricing. Indeed, what we have just illustrated is the tip of a powerful iceberg, the risk-neutral pricing approach, which plays a pivotal role in financial engineering.
The binomial model yields an intuitive and enlightening result. However, we cannot certainly represent uncertainty in stock prices by a Bernoulli random variable. If we introduce more states of nature, we have two ways to build a replicating portfolio: Either we introduce more assets, or we introduce a dynamic replicating strategy, i.e., we rebalance the portfolio several times along the option life. The second approach sounds much more promising, as it relies on the same elementary assets as before. In the limit, if we assume rebalancing in continuous time, we can rely on the machinery of stochastic calculus. Let us assume that the underlying asset price follows a geometric Brownian motion,
where to streamline notation a bit we use St and Wt rather than S(t) and W(t), and let us consider the viewpoint of the option writer, who holds a short position in the call option. Let us denote the fair option price at time t, when the underlying asset price is St, by f(St, t). Note that we are taking for granted that the option price depends on these two variables only, which makes sense for a path-independent, vanilla option. In the case of Asian options, where the payoff depends on average prices, things are not that simple. Using Itô’s lemma, we may write a stochastic differential equation for f:
Just as in the binomial case, what we know is the option value at maturity,
and what we would like to know is f(S0, 0), the fair option price now. The option holder has a position of value −f(St, t) and is subject to considerable risk: If the stock price rockets up, the option holder will exercise, and the writer will have to buy the underlying asset at a high price, unless some countermeasures are taken. One possibility is to buy some stock shares, say Δ, and build a hedged portfolio of value
Equation (3.104) does not suggest an immediate way to find the option price, but it would look a little bit nicer without the random term dS. Indeed, it is easy to see that, by choosing
we can hedge risk by eliminating the random term dS, which includes the increment of the Wiener process. By differentiating Π, we find
With the above choice of Δ, our portfolio is riskless; hence, by no-arbitrage arguments, it must earn the risk-free interest rate r:
Eliminating dΠ between Eqs. (3.105) and (3.106), we obtain
and finally
(3.107)
Now we have a deterministic partial differential equation (PDE for short) describing an option value f(S, t). This is the Black–Scholes–Merton PDE. Such a PDE must be solved with suitable boundary conditions, but before doing so, the following observations are in order about the choice of Δ:
What does not quite look consistent with the binomial model is that there we obtain the price as an expected value, whereas here we have a PDE. Actually, they are two sides of the same coin, and the gap can be bridged by one version of the Feynman–Kač formula.
THEOREM 3.17 Feynman–Kač representation theorem. Consider the partial differential equation
and let F = F(x, t) be a solution satisfying the boundary condition
Then, under technical conditions, F(x, t) can be represented as
where Xt is a stochastic process satisfying the differential equation
with initial condition Xt = x.
The notation Ex,t[·] is used to point out that this is a conditional expectation, given that at time t the value of the stochastic process is Xt = x. From a mathematical viewpoint, the theorem is a consequence of how the stochastic integral in the sense of Itô is defined (see [1] for a clear proof). From a physical point of view, it is a consequence of the connection between Brownian motion (which is a diffusion process) and a certain type of PDEs which can be transformed into the heat equation.
The application of the representation theorem to the Black–Scholes–Merton equation, for an option with payoff function , immediately yields
which is consistent with Eq. (3.103). We point out once more that the expectation is taken under a risk-neutral measure, which essentially means that we work as if the stochastic differential equation for St were
By recalling the result of Eq. (3.93), we see that under the risk-neutral measure the price St is lognormal and may be written as
where ~ N(0, 1). Then, it is a fairly easy, eve though a bit tedious, exercise in integration to prove the celebrated Black–Scholes–Merton formula (BSM formula for short), which gives the price C of a vanilla European-style call option:
where
and Φ(x) is the cumulative distribution function for the standard normal distribution:
It is easy to use R to implement the above formula and plot the call price for various underlying asset prices and times to maturity. The code in Fig. 3.49 produces the plot in Fig. 3.50. Note that the implementation is a bit naive, in the sense that it does not really work with some input arguments. For instance, when time to maturity is zero and the asset price is the same as the strike price, the function returns NaN. This is the reason why in the last line of the script we plot the option value at maturity by computing the payoff. The reader is invited to improve the code, including some consistency check on the input (for instance, volatility cannot be negative).
FIGURE 3.49 Naive implementation of the Black–Scholes–Merton formula for a vanilla call.
FIGURE 3.50 Call option prices.
The same reasoning can be applied to price a put option, but there is a very useful theorem that solves the problem.
THEOREM 3.18 (Put–call parity) Consider a call and a put options written on the same non-dividend-paying stock, with the same strike price K and time to maturity T. Then their prices C0 and P0 at time t = 0 are related as follows:
where S0 is the current stock price and r is the continuously compounded risk-free interest rate.
PROOF Consider a portfolio consisting of a call option and an amount of cash given by Ke−rT. At maturity, the value of this portfolio is
By a similar token, at maturity the value of a portfolio consisting of a put option and one share of the underlying stock is
Since the two portfolios have the same value in any future scenario, they cannot have a different value now, which implies the statement of the theorem.
This proof is instructive and nontechnical, which is why we have included it: It does not rely on any model and is just based on a no-arbitrage consideration, in the form of the law of one price. When we plug the BSM formula for the call option price into Eq. (3.109), we find a similar formula for the put option price:
where d1 and d2 are the same expressions occurring in the call option price. A fundamental advantage of the BSM formula, apart from its computational appeal, is that we may evaluate option sensitivities as well. For instance, the number of shares Δ in the hedging portfolio for a call option is
(3.111)
This quantity is commonly referred to as “delta,” and it may be interpreted as the sensitivity of the option price to changes in the underlying asset price. By differentiating the put–call parity relationship with respect to S0 we immediately find the corresponding delta for the put option:
(3.112)
Other sensitivities may be obtained, collectively nicknamed as the greeks:
where ϕ(·) is the PDF of a standard normal, not to be confused with the CDF Φ(·). Equation (3.113) defines and evaluates the option gamma, i.e., the second-order derivative of the option price with respect to the underlying asset price. Equation (3.114) defines and evaluates the option vega, i.e., the first-order derivative of the option price with respect to volatility. From put–call parity we observe that these two greeks are the same for call and put options.
Another relevant greek is theta, which is related to the passage of time:
We see that different formulas are required for call and put options, but there is a subtle point here: We have expressed call and option prices at time t = 0, as a function of time to maturity T. A more general statement would be recast as a function of time to maturity T − t. This is easily accomplished by replacing T with T − t in Eqs. (3.108) and (3.110). It is important to realize that the derivatives defining the option thetas in Eqs. (3.115) and (3.116) are with respect to t, which is then set to t = 0 to find the current theta. Typically, option thetas are negative, which shows that option values are subject to a decay over time.
The option sensitivities, among other things, may be used to evaluate the risk involved in holding a portfolio of options, as we will see in Section 13.3. All of the above formulas are implemented in the R code of Fig. 3.51.
FIGURE 3.51 R code to evaluate the put option price and the essential greeks within the BSM framework.
The BSM formula relies on the fact that we can replicate an option. A market in which any option (more generally, any contingent claim) can be replicated, is called a complete market. But the obvious question arises: If the BSM formula were correct, why should one trade options? Indeed, in a complete market, one can replicate any derivative, which turns out to be a redundant and useless asset. A fundamental theorem states that if there are no arbitrage opportunities in the market, then there exists a risk-neutral measure that can be used for pricing. But it is market completeness that ensures its uniqueness. However, markets in real life are incomplete, and even if we rule out arbitrage opportunities, we fail to find a unique risk-neutral measure for pricing purposes.
A market can be incomplete for many reasons:
The last case is quite relevant for interest rate derivatives. We can model uncertainty in interest rates using Itô stochastic differential equations, but we cannot use the interest rate itself in a replicating portfolio. Hence, we cannot build a replicating portfolio using the underlying risk factor. Nevertheless, we still can get rid of risk. In fact, the idea here is not to replicate a risky derivative, but to build a riskless portfolio using two assets depending on the same risk factor.
Assume that we consider the short rate rt as the underlying risk factor, and let us denote by Z1(rt, t) and Z2(rt, t) be the prices of two traded assets, whose value depends on the short rate rt at time t. These assets can be as simple as two zero-coupon bonds, which explains the notation, or more complex interest rate derivatives. Let us assume the following general model for the interest rate:44
(3.117)
where the drift rate may feature mean reversion. Indeed, the model includes the Vasicek and the Cox–Ingersoll–Ross models of Section 3.7.5. Using Itô’s lemma as usual, we may find the stochastic differential equation for the derivative value Z(rt, t):
(3.118)
We may also get rid of risk by setting up a portfolio involving both Z1 and Z2 in the right proportions. Let us build a portfolio as follows:
for a suitable choice of Δ. We note that the above choice implies:
From the first condition, we see that we obtain ∂Π/∂r = 0 if we choose
Actually, we are using the same trick as in the BSM case, but now it involves two risky assets depending on the same risk factor. That choice of Δ eliminates the impact of both the underlying risk and the drift of the short rate:
Since this portfolio is risk-free, we can also write
By linking Eqs. (3.121) and (3.122), taking advantage of Eqs. (3.119) and (3.120), and rearranging, we find
(3.123)
Substituting Δ and rearranging again yields the fundamental pricing equation
We note that the drift term m(rt, t) does not play any role, and this is consistent with the BSM case. However, here we are relating two derivative prices. We can observe that the two ratios above refer to generic derivatives depending on the risk factor rt, which implies that for any such derivative they are equal to some function of time and the factor itself. Let us define a function m* (rt, t) as minus the ratios in Eq. (3.124); we change the sign just for the sake convenience, as this allows to write the following pricing PDE:
(3.125)
We notice that this is quite close to the BSM partial differential equation, but it involves an unknown function m* (rt, t). Using the Feynman–Kač theorem, we can think of the derivative price as the expected value of the payoff, if the risk factor follows the risk-adjusted dynamics
(3.126)
Once again, we notice that there is a change in drift, which reflects a change in probability measure. However, due to market incompleteness, we do not know a priori which measure we should use, as uniqueness of the risk-neutral measure requires completeness. The difficulty is that the drift is related to a market price of risk, which does play a role here since we cannot hedge using the underlying asset directly. The way out of the dilemma requires calibrating the model using some traded derivatives. In the case of interest rates, we might use the quoted prices of bonds or other liquid assets to formulate a model calibration problem and solve the corresponding optimization model to estimate the function m*(rt,t). Having pinned down the right risk-adjusted measure, we may price other derivatives by using analytical formulas, when available, or by using numerical methods to solve the PDE, like finite differences, or by estimating the price in expectation form by Monte Carlo methods, thanks to the Feynman–Kač representation theorem.
From our viewpoint, we will assume that the model has already been calibrated and that we may proceed to price the derivative asset by estimating the risk-neutral expectation directly. We will see some examples related to interest rates in Section 6.3 and in Chapter 11.
1 T. Björk. Arbitrage Theory in Continuous Time (2nd ed.). Oxford University Press, Oxford, 2004.
2 P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods (2nd ed.). Springer, New York, 1991.
3 C. Brooks. Introductory Econometrics for Finance (2nd ed.). Cambridge University Press, Cambridge, 2008.
4 J.Y. Campbell and L.M. Viceira. Strategic Asset Allocation. Oxford University Press, Oxford, 2002.
5 P. Embrechts, C. Kluppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. Springer, Berlin, 2000.
6 P. Embrechts, F. Lindskog, and A. McNeil. Modelling dependence with copulas and applications to risk management. In S.T. Rachev, editor, Handbook of Heavy Tailed Distributions in Finance, pp. 329–384. Elsevier, Amsterdam, 2003.
7 W.H. Greene. Econometric Analysis (7th ed.). Prentice Hall, Upper Saddle River, NJ, 2011.
8 G. Grimmett and D. Stirzaker. Probability and Random Processes (3rd ed.). Oxford University Press, Oxford, 2003.
9 J.D. Hamilton. Time Series Analysis. Princeton University Press, Princeton, NJ, 1994.
10 W. Härdle and L. Simar. Applied Multivariate Statistical Analysis (2nd ed.). Springer, Berlin, 2007.
11 S. Hubbert. Essential Mathematics for Market Risk Management. Wiley, Hoboken, NJ, 2012.
12 J.C. Hull. Options, Futures, and Other Derivatives (8th ed.). Prentice Hall, Upper Saddle River, NJ, 2011.
13 H. Joe. Multivariable Models and Dependence Concepts. Chapman & Hall/CRC, Boca Raton, FL, 2001.
14 J. Knight and S. Satchell, editors. Linear Factor Models in Finance. Elsevier, Amsterdam, 2005.
15 D.P. Kroese, T. Taimre, and Z.I. Botev. Handbook of Monte Carlo Methods. Wiley, Hoboken, NJ, 2011.
16 L. Martellini, P. Priaulet, and S. Priaulet. Fixed-Income Securities: Valuation, Risk Management, and Portfolio Strategies. Wiley, Chichester, 2003.
17 T. Mikosch. Elementary Stochastic Calculus with Finance in View. World Scientific Publishing, Singapore, 1998.
18 R.B. Nelsen. An Introduction to Copulas. Springer, New York, 1999.
19 E.E. Qian, R.H. Hua, and E.H. Sorensen, editors. Quantitative Equity Portfolio Management: Modern Techniques and Applications. CRC Press, London, 2007.
20 A.C. Rencher. Methods of Multivariate Analysis (2nd ed.). Wiley, New York, 2002.
21 S. Shreve. Stochastic Calculus for Finance (vols. I & II). Springer, New York, 2003.
22 R.S. Tsay. An Introduction to the Analysis of Financial Data with R. Wiley, Hoboken, NJ, 2013.
23 P. Veronesi. Fixed Income Securities: Valuation, Risk, and Risk Management. Wiley, Hoboken, NJ, 2010.
1 See Section 4.4.
2See Section 8.3.
3What we use here as return is referred to as rate of return in some treatments, but we sometimes prefer to streamline terminology a bit.
4For the unfamiliar reader, we formally introduce covariance later in Section 3.3.
5To put it bluntly, relying on past returns is akin to driving by just looking at the rear-view mirror.
6This is a streamlined version of the model described in [4, Chapter 7]. See also Chapter 10 to see how such problems may be tackled by stochastic dynamic programming.
7 Throughout this book, we use log rather than ln, but we always mean natural logarithms.
8 Actually, the precise definition of Gaussian processes does not only involve normal marginals but also jointly normal distributions.
9 Needless to say, Monte Carlo methods play a key role in more complicated cases; see Chapter 11.
10 We have outlined how a simple binomial model may be calibrated by moment matching in Section 2.5.1.
11 We recall the definition of the factorial of an integer number, n! ≡ n · (n − 1) · (n − 2) ··· 2·1; by convention, 0! = 1.
12 Such a mechanism is found, e.g., in rejection sampling, as we shall see in Section 5.4.
13 One could wonder why we typically use X ~ N(μ, σ2), referring to variance, and not X ~ N(μ, σ), referring to standard deviation. The reason is that this notation easily generalizes to the multivariate case X ~ N(μ, Σ), where Σ is the covariance matrix; see Section 3.3.
14 See Section 8.6.2.
15 We do not deal with stable distributions in this introductory treatment, since they require more advanced concepts; see, e.g., [15, pp. 129–131].
16See Section 6.3.2, where we deal with sample path generation for continuous-time stochastic processes.
17See Section 4.6 and Chapter 14.
18Actually, they are sampled from a joint normal distribution with μ1 = μ2 = 10, σ1 = σ2 = 5. The multivariate normal distribution is introduced in Section 3.3.4.
19You are advised not to try estimating Kendall’s correlation with R on a very large data set, as this may require a lot of computation and is likely to block your computer.
20See, e.g., [7] for a proof.
21See, e.g., [6] or [18].
22These counterexamples are borrowed from [18, p. 6].
23See the inverse transform method in Chapter 5.
24See, e.g., [6].
25See [5] for details.
26The denominator in Eq. (3.51) is related to the estimate of autocovariance. In the literature, sample autocovariance is typically obtained by dividing the sum by T, even though this results in a biased estimator. If we divide by T − k, we might obtain an autocovariance matrix which is not positive semidefinite; see Brockwell and Davis [2], pp. 220–221.
27 Actually, expression is much more powerful; please refer to the online help.
28 To be precise, this representation requires that the two polynomials ϕ(B) and Θ(B) satisfy some additional technical conditions that are beyond the scope of this book.
29 The acronym VAR is a bit unfortunate, as it may be confused with variance or value-at-risk. To avoid confusion, variance is typically written as Var and value-at-risk as VaR; some authors use V@R for the latter, and we will follow that practice in this book.
30 The square-root rule, which we consider in Section 3.7.1, suggests that on very short time spans volatility dominates drift.
31 The parameters used in the simulation are calibrated on real data in [22, p. 202].
32 Sample path generation for stochastic differential equations is the subject of Chapter 6.
33 This has some implications on the evaluation of risk measures like value-at-risk; see Chapter 13.
34We have already made a quite similar point in Example 3.3.
35See Example 8.9.
36We have already met this function in the first chapter; please refer to Fig. 1.5 for programming notes. Later, we will describe a more efficient version.
37See Section 3.2.3.
38One might well wonder whether an increasing exponential trend makes sense for the prices of stock shares, as we do not observe this in real life. One reason is that dividends are paid, bringing down the share price.
39We should maybe talk about orthonormal vectors and matrices, but we will follow the more common parlance.
40See Property 3.6 in Section 3.3.4.1.
41 Strictly speaking, we are in trouble here, since we are maximizing a convex quadratic form. However, we may replace the equality constraint by an inequality constraint, which results in a concave problem, i.e., a problem in which the optimal solution is on the boundary of the feasible solution. It turns out that the solution we pinpoint using the Lagrange multiplier method is the right one.
42 This example has been adapted from [10, Chapter 9].
43 See, e.g., Chapters 3 and 6 of [16].
44The treatment below is related to [23].