Chapter Three: Stochastic Modeling in Finance and Economics

Monte Carlo methods are often the only feasible approach to cope with high-dimensional problems. Nevertheless, they may require an excessive computational effort to produce a reliable answer. Hence, strategies to reduce the dimensionality of a problem are welcome anyway. Moreover, they can also be put to good use when applying alternative approaches, such as low-discrepancy sequences. Apart from computational convenience, there is another driving force behind dimensionality reduction. While in this chapter we are mainly concerned with model building, in the next one we will deal with model estimation. A rich model may be quite appealing in principle, but it turns into a nightmare if it calls for a huge amount of data for its estimation, and a badly estimated data will make the output of the most sophisticated Monte Carlo simulation utterly useless, if not worse.

Data and dimensionality reduction methods are an important and wide topic in multivariate statistics, and we cannot provide anything but some hints on the most relevant tools for applications in finance and economics, to provide the reader with a feeling for them. In this section we outline two methods:

Principal component analysis, PCA for short

Factor models

The second approach should not be confused with factor analysis, which is another multivariate analysis technique for dimensionality reduction. The main difference is that factor analysis techniques deal with latent and nonobservable factors, arguably due to their origin in psychology. The factor models we consider here are quite common in finance and aim at easing some statistical estimation issues when a large number of assets is analyzed.

3.8.1 PRINCIPAL COMPONENT ANALYSIS (PCA)

To introduce principal component analysis (PCA) in the most intuitive way, let us have a look at Fig. 3.42, which shows a scatterplot of observations of two random variables X₁ and X₂. It is clear that the two random variables are positively correlated. Now imagine that, for some reason, we can only afford dealing with a single variable. In order to streamline our model of uncertainty, one possibility would be to get rid of one of the two variables, but which one? Since the variables are related and both display some non-negligible variability, discarding one the two variables looks like a very crude approach to dimensionality reduction. We should try to come up with a single variable trying to capture the most information. Now, consider the two axes in the figure, labeled with variables Z₁ and Z₂. Of course, we may represent the very same set of observations in terms of Z₁ and Z₂. This is just a change of coordinates, accomplished in two steps:

Data have been centered, which is obtained by subtracting the sample mean from the observations: X − . This is just a shift in the origin of reference axes, but it is often recommended to improve numerical stability in data analysis algorithms.

The coordinates Z₁ and Z₂ are obtained by a suitable rotation of the axes.

Vector rotation is accomplished by multiplication by an orthogonal matrix. A square matrix A is orthogonal if its columns are a set of orthogonal unit vectors,³⁹ which implies

This also shows that this matrix can be inverted by transposition. To see that multiplication by such a matrix is a rotation, we may observe that it does not change the norm of vectors:

Since orthogonal matrices enjoy plenty of nice properties, we might be able to find a linear transformation of variables with some interesting structure, Indeed, in Fig. 3.42 we observe two important facts about the new variables Z₁ and Z₂:

1. They are uncorrected.

2. The first variable is associated with the largest variance.

Now, if we had to get rid of one variable, we would have no doubt: Z₁ conveys the most information, and the residual one is not only less relevant, but also uncorrelated.

Now that we have a feeling for what PCA is and why it may be useful, we may investigate it in more depth following two different routes to introduce PCA:

1. As a means of taking linear combinations of variables, in such a way that they are uncorrelated.

2. As a way to combine variables so that the first component has maximal variance, the second one has the next maximal variance, and so forth.

Let us pursue both viewpoints, which actually lead us to same conclusions, as we will see.

3.8.1.1 A geometric view of PCA

Let p be the dimension of the random vector X and n the number of available joint observations. Such observations are collected into the following matrix:

where the element [X]_kj in row k and column j is the component j of observation k, i.e., X^(k)_j.

where A

. We assume that data have already been centered, in order to ease notation. Hence,

The Z_i variables are called principal components: Z₁ is the first principal component. Now, let us consider the sample covariance matrix of X, denoted by S_x, with entries

Now, we may also find the corresponding sample covariance matrix for Z, S_z. However, we would like to find a matrix A such that the resulting principal components are uncorrelated; in other words, S_z should be diagonal:⁴⁰

where S_{Z_i}². is the sample variance of each principal component. The matrix A should diagonalize the sample covariance matrix S_x. To diagonalize S_x, we should consider the product

where matrix P is orthogonal and its columns consist of the normalized eigenvectors of S_x; since this is symmetric, its eigenvectors are indeed orthogonal. The diagonalized matrix consists of the eigenvalues λ_i, i = 1, …, p, of S_x. Putting everything together, we see that the rows a_i, i = 1, …, p, of matrix A should be the normalized eigenvectors:

We also observe that the sample variances of the principal components Z_i are the eigenvalues of S_x:

If we sort eigenvalues in decreasing order, λ₁ ≥ λ₂ ≥ … ≥ λ_p, we can see that indeed Z₁ is the first principal component, accounting for most variability. Then, the second principal component Z₂ is uncorrected with Z₁ and is the second in rank. The fraction of variance explained by the first q components is

Taking the first few components, we can account for most variability and reduce the problem dimension by replacing the original variables by the principal components.

3.8.1.2 Another view of PCA

Another view of PCA is obtained by interpreting the first principal component in terms of orthogonal projection. Consider a unit vector u

^p, and imagine projecting the observed vector X on u. This yields a vector parallel to u, of length uX. Since u has unit length, the projection of observation X^(k) on u is

We are projecting p-dimensional observations on just one axis, and, of course, we would like to have an approximation that is as good as possible. More precisely, we should find u in such a way that the distance between the originally observed vector X^(k) and its projection is as small as possible. If we have a sample of n observations, we should minimize the average distance

which looks much like a least-squares problem. This amounts to an orthogonal projection of the original vectors on u, where we know that the original and the projected vectors are orthogonal. Hence, we can apply the Pythagorean theorem to rewrite the problem:

In order to minimize the left-hand side of this equality, on the average, we have to maximize

But we know that, assuming that data are centered, the sample covariance matrix is

; hence, the problem is equivalent to

In plain English, what we want is finding one dimension on which multidimensional data should be projected, in such a way that the variance of the projected data is maximized. This makes sense from a least-squares perspective, but it also has an intuitive appeal: The dimension along which we maximize variance is the one providing the most information.

To solve the problem above, we may associate the constraint (3.97) with a Lagrange multiplier λ and augment the objective function (3.96) to obtain the Lagrangian function:⁴¹

This amounts to saying that λ must be an eigenvalue of the sample covariance matrix, but which one? We can rewrite the objective function (3.96) as follows:

Hence, we see that λ should be the largest eigenvalue of S_X, u is the corresponding normalized eigenvector, and we obtain the same result as in the previous section. Furthermore, we should continue on the same route, by asking for another direction in which variance is maximized, subject to the constraint that it is orthogonal to the first direction we found. Since eigenvectors of a symmetric matrix are orthogonal, we see that indeed we will find all of them, in decreasing order of the corresponding eigenvalues.

3.8.1.3 A small numerical example

Principal component analysis in practice is carried out on sampled data, but it may be instructive to consider an example where both the probabilistic and the statistical sides are dealt with.⁴² Consider first a random variable with bivariate normal distribution, X ~ N(0, Σ), where

and ρ > 0. Thus, X₁ and X₂ are standard normal variables with positive correlation ρ. To find the eigenvalues of Σ, we must find its characteristic polynomial and solve the corresponding equation

This yields the two eigenvalues λ₁ = 1 + ρ and λ₂ = 1 − ρ. Note that both eigenvalues are non-negative, since ρ

[−1, 1] is a correlation coefficient. To find the first eigenvector, associated with λ₁, we consider the system of linear equations

Clearly, the two equations are linearly dependent and any vector such that u₁ = u₂ is an eigenvector. By a similar token, any vector such that u₁ = −u₂ is an eigenvector corresponding to λ₂. Two normalized eigenvectors are

Since we are dealing with standard normals, μ = 0 and the first principal component is

As a further check, let us compute the variance of the first principal component:

Figure 3.43 shows the level curves of the joint density of X when ρ = 0.85. Since correlation is positive, the main axis of the ellipses has positive slope. It is easy to see that along that direction we have the largest variability.

Practical PCA is carried out on sampled data, and R provides us with a suitable function called prcomp. The R code in Fig. 3.44 samples 2000 points from the above bivariate normal distribution, with ρ = 0.85, carries out PCA, and produces the following output:

As usual in R, we may get the essential information about an object by using functions print or summary; here we are applying them on an object, PCA, which is a rich data structure including all of the relevant information about the results of the analysis. Let us check these results against the theory, starting with sample statistics, which are fairly close to the theoretical values (we also compute centered data along the way):

Then, let us compute eigenvalues and eigenvectors of the sample covariance matrix, and check them against the output of PCA:

Note that if we do not feel comfortable with eigenvalues and eigenvectors, we may also obtain the principal components directly.

We clearly see that the first principal component accounts for most variability, precisely:

Finally, let us draw the scatterplots of the original variables and the principal components, as well as the boxplots of the principal components:

The first plot, displayed in Fig. 3.45, shows the positive correlation of the original variables and the lack of correlation of the principal components. Note that the two scales here are different, however, and this hides the relative amounts of variability; the difference in variability is pretty evident in the two boxplots of Fig. 3.46.

3.8.1.4 Applications of PCA

PCA can be applied in a Monte Carlo setting by reducing the number of random variables we have to sample from. A less obvious role is played in the contest of low-discrepancy sequences. As we shall see in Chapter 9, these sequences can replace random sampling in intermediate-sized problems, but may not be as effective in truly high-dimensional settings. Then, one can use low-discrepancy sequences to sample from the principal components, which have the most impact on the output, possibly using random sampling for the remaining ones. We should mention, however, that in order to apply such ideas we have to figure out the relationship between the distribution of the original variables and the principal components, which need not to be an easy task in general.

Another typical application of PCA is in reducing the number of risk factors in financial modeling. Consider, for instance, the term structure of interest rates, which specifies a set of (annualized) interest rates applying on different time horizons. To use the most general notation, let R(0, t, t + τ) be continuously compounded forward interest rate applying on the time period [t, t + τ], as seen at time 0. This is a forward rate, as it is the rate that can be contracted now for a time interval starting in the future. If we consider the set of rates R(t, t, t + τ) as a function of τ, we have the term structure of spot rates, i.e., the rates applying at time t for several maturities τ. Usually, the term structure is increasing in τ, but it moves in time, changing its shape in various ways. Clearly, modeling the term structure requires the definition of a model for each maturity; furthermore, these movements must preserve some basic consistency, in order to prevent arbitrage opportunities. Measuring and managing interest rate risk is complicated by the presence of so many interrelated risk factors, and Monte Carlo simulation can be challenging as well. One possible approach to make the problem more manageable is to replace the term structure by a few key risk factors combining rates at different maturities, possibly obtained by PCA.⁴³ In typical models, the first few principal components can be interpreted and related to the level, slope, and curvature of the term structure.

3.8.2 FACTOR MODELS

Now let us consider another quite practical question: If we have a large set of jointly distributed random variables, what is the required effort to estimate their covariance structure? Equivalently, how many correlations we need? The covariance matrix is symmetric; hence, if we have n random variables, we have to estimate n variances and n(n − 1)/2 covariances. This amounts to

entries. Hence, if n = 1000, we should estimate 500,500 entries in the covariance matrix. A daunting task, indeed! If you think that such a case will never occur in practice, please consider a practical portfolio management problem. You might well consider 1000 assets for inclusion in the portfolio. In such a case, can we estimate the covariance matrix? What you know about statistical inference tells that you might need a lot of data to come up with a reliable estimate of a parameter. If you have to estimate a huge number of parameters, you need a huge collection of historical data. Unfortunately, many of them would actually tell us nothing useful: Would you use data from the 1940s to characterize the distribution of returns for IBM stock shares now? We need a completely different approach to reduce our estimation requirements.

The returns of a stock share are influenced by many factors. Some are general economic factors, such as inflation and economic growth. Others are specific factors of a single firm, depending on its management strategy, product portfolio, etc. In between, we may have some factors that are quite relevant for a group of firms within a specific industrial sectors, much less for others; think of the impact of oil prices on energy or telecommunications.

Rather than modeling uncertain returns individually, we might try to take advantage of this structure of general and specific factors. Let us take the idea to an extreme and build a simple model whereby there is one systematic factor common to all of stock shares, and a specific factor for each single firm. Formally, we represent the random return for the stock share of firm i as

α_i and β_i are parameters to be estimated.

R_m is a random variable representing the systematic risk factor; the subscript m stands for market; indeed, financial theory suggests that a suitable systematic factor could be the return of a market portfolio consisting of all stock shares, with a proportion depending on their relative capitalization with respect to the whole market.

_i is a random variable representing specific risk, which in financial parlance is also referred to as idiosyncratic risk; a natural assumption about these variables is that E[

_i] = 0 (otherwise, we would include the expected value into α_i). Another requirement is that the systematic factor really captures whatever the stock returns have in common, and that the specific factors are independent. Typically, we do not require independence, but only lack of correlation. These requirements can be formalized as

(3.98) equation

(3.99) equation

In the next chapter, where we deal with linear regression, we will see that condition (3.98) is actually ensured by model estimation procedures based on least squares. On the contrary, condition (3.99) is just an assumption, resulting in a so-called diagonal model, since the covariance matrix of specific factors is diagonal.

The model above is called single-factor model for obvious reasons, but what are its advantages from a statistical perspective? Let us check how many unknown parameters we should estimate in order to evaluate expected return and variance of return for an arbitrary portfolio. To begin with, observe that for a portfolio with weights w_i we have

where μ_m is the expected return of the market portfolio (more generally, the expected value of whatever systematic risk factor we choose). From this, we see that we need to estimate:

n parameters α_i, i = 1, …, n

n parameters β_i, i = 1, …, n

The expected value μ_m

These add up to 2n + 1 parameters. Variance is a bit trickier, but we may use the diagonality condition (3.99) to eliminate covariances and obtain

where σ_m² is the variance of the systematic risk factor and σ_i² is the variance of each idiosyncratic risk factor, i = 1, …, n. They amount to n + 1 additional parameters that we should estimate, bringing the total to 3n + 2 parameters. In the case of n = 1000 assets, we have a grand total of 3,002 parameters; this is a large number, anyway, but it pales when compared with the 500,500 entries of the full covariance matrix.

In passing, we note that factor models also play a key role in financial economics. The capital asset pricing model (CAPM) is essentially a single-factor model, based on a systematic risk factor related to the return of the market portfolio; if we add some suitable hypotheses to the model, we are led to an important, though controversial, equilibrium model. If more factors are included in the model, plus some hypotheses on the structure of the market and the behavior of investors, we obtain another equilibrium model, arbitrage pricing theory (APT). The validity of these models is challenged by alternative views, as they are not just statistical but equilibrium models, relying on questionable assumptions. Nevertheless, the idea of factor models is widely used by practitioners who do not subscribe to CAPM or APT.

From a Monte Carlo point of view, factor models may simplify scenario generation. There might be some computational saving, but this is arguably not the really important point. What is more relevant is that we can provide the simulation model with more reliable parameter estimates and inputs, with a positive impact on output quality.

3.9 Risk-neutral derivative pricing

In this section we outline risk-neutral pricing for financial derivatives, most prominently options, for the sake of the unfamiliar reader. What we cover here is mostly relevant to Chapter 11. If a reader is not directly interested to this topic, but she wants nevertheless to see how Monte Carlo methods are used in this setting as an example, here are the essential results:

Options (and other derivatives) can be priced by taking the expected value of the discounted payoff.

The expectation must be taken under a risk-neutral (or risk-adjusted) probability measure.

This justifies the application of Monte Carlo methods to estimate the expected payoff and the option price as a consequence; however, we have to use a modified model for the dynamics of the underlying asset. Typically, this just means changing the drift in a stochastic differential equation.

Options are financial derivatives, i.e., assets whose value depends on some other variable. This can be illustrated by considering a call option, a contract between two counterparts, written on a non-dividend-paying stock: The option holder, i.e., the party buying the option, has the right, but not the obligation, to buy the underlying asset from the option writer, i.e., the party selling the option on the primary market, at a prescribed future time and at a price established now. This is called the strike price, denoted by K. Let us denote the current price of the underlying asset by S₀ (known when the option is written) and the future price at a time t = T by S_T; the latter is, at time t = 0, a random variable. The time t = T corresponds to the option maturity. A European-style option can only be exercised at maturity. On the contrary, American-style options can be exercised at any time before their expiration date. The call option will be exercised only if S_T ≥ K, since there would be no point in using the option to buy at a price K that is larger than the prevailing (spot) market price S_T at time T. Then, the payoff of the option is, from the holder’s viewpoint,

To interpret this, consider the possibility of exercising the option and buying at price K an asset that can be immediately sold at S_T. Market structure is not really that simple, but some options are settled in cash, so the payoff really has a monetary nature; this occurs for options written on a nontraded market index or on interest rates. This payoff is a piecewise linear function and is illustrated in Fig. 3.47. Put options, unlike calls, grant the holder the right to sell the underlying asset at the strike price. Hence, the payoff of a put option is

Put options have a positive payoff when the asset price drops, much like short selling strategies, whereby one borrows and sells an asset, hoping to buy the asset back at a lower price in order to give it back to the lender.

When the future asset price S_T will be realized, at time t = T, the payoff will be clearly determined. What is not that clear is the fair price for the contract at time t = 0. It is easy to see that a put or call option must have a positive price, as their payoff to the option holder cannot be negative; the option writer, on the contrary, can suffer a huge loss; hence, the latter requires an option premium, paid at time t = 0, to enter into this risky contract.

3.9.1 OPTION PRICING IN THE BINOMIAL MODEL

The easiest way to get the essential message about option pricing is by referring to the binomial model of Section 3.2.1, based on multiplicative shocks and a generalization of the Bernoulli random variable. As shown in Fig. 3.48, the future price of the underlying asset can be either uS₀ or dS₀, with probabilities p_u and p_d, respectively. Conditional on the asset price at maturity, the option will provide its holder with a payoff f_T, which can either be f_u or f_d. These values are also illustrated in Fig. 3.48. For instance, in the case of a call option with strike price K, we have

The pricing problem consists of finding a fair price f₀ for the option, i.e., its value at time t = 0. Intuitively, one would guess that the two probabilities p_u and p_d play a major role in determining f₀. What is not quite clear is the role that should be played by the overall risk aversion of market participants. However, a simple principle can be exploited in order to simplify our task. The key idea is to build a replicating portfolio, i.e., a portfolio consisting of elementary assets, such that its payoff replicates the option payoff. If we can do so and we know the current prices of the assets in the replicating portfolio, then it is easy to argue that the option value cannot be different from the value of the replicating portfolio. If this were not the case, the intuitive law of one price would be violated. We would have two equivalent portfolios with different prices, and we would just buy the cheaper one and sell the more expensive one to lock in a risk-free profit starting with no money. Such a money making machine is called an arbitrage opportunity, and option pricing relies on the assumption that arbitrage opportunities are ruled out. To be precise, what we need to assume is that they cannot exist in equilibrium, as investors would take immediate advantage of them, in such a way to move prices back to an equilibrium.

The replicating portfolio should certainly include the underlying asset, but we also need an additional one. Financial markets provide us with such an asset, in the form of a risk-free investment. We may consider a bank account yielding a risk-free interest rate r; we assume here that this rate is continuously compounded, so that if we invest $1 now, we will own $e^rt at a generic time instant t. Equivalently, we may think of a riskless zero-coupon bond, with initial price B₀ = 1 and future price B₁ = e^rT at time t = T, when the option matures. In order to find the number of stock shares and riskless bonds that we should include in the replicating portfolio, we just need to set up a system of two linear equations. Let us denote the number of stock shares by Δ and the number of bonds by

. The initial value of this portfolio is

Note that future value of the riskless bond does not depend on the realized state. If this portfolio has to replicate the option payoff, we should enforce the following two conditions:

A striking fact is that this relationship does not involve the objective probabilities p_u and p_d. However, let us consider the quantities

which happen to multiply the option payoffs in Eq. (3.102). We may notice two interesting features of π_u and π_d:

1. They add up to one, i.e., π_u + π_d = 1.

2. They are non-negative, i.e., π_u, π_d ≥ 0, provided that d ≤ e^rT ≤ u.

The last condition does make economic sense. If we had e^rT < d < u, the risky stock share would perform better than the risk-free bond in any scenario, and we would just borrow money at the risk-free rate and buy a huge amount of the stock share. On the contrary, if the risk-free rate is always larger than the stock price in the future, one could make easy money by selling the stock share short and investing the proceeds in the risk-free asset. In fact, we can interpret π_u and π_d as probabilities, and Eq. (3.102) can be interpreted in turn as the discounted expected value of the option payoff,

provided that we do not use the “objective” probabilities p_u and p_d, but π_u and π_d. Here the notation E[·] emphasizes this change in the probability measure. These are called risk-neutral probabilities, a weird name that we can easily justify. In fact, if we use the probabilities π_u and π_d, the expected value of the underlying asset at maturity is

But how is it possible that the expected return of a risky asset is just the risk-free rate? This might happen only in a world where investors do not care about risk, but only about the expected return; in other words, they do not ask for any risk premium. In such a risk-neutral world, the expected return of all assets, at equilibrium, should be the same, and if there is a risk-free asset, its return will be the expected return for all risky assets as well. This observation justifies the name associated with the probabilities that we use in option pricing. Indeed, what we have just illustrated is the tip of a powerful iceberg, the risk-neutral pricing approach, which plays a pivotal role in financial engineering.

3.9.2 A CONTINUOUS-TIME MODEL FOR OPTION PRICING: THE BLACK–SCHOLES–MERTON FORMULA

The binomial model yields an intuitive and enlightening result. However, we cannot certainly represent uncertainty in stock prices by a Bernoulli random variable. If we introduce more states of nature, we have two ways to build a replicating portfolio: Either we introduce more assets, or we introduce a dynamic replicating strategy, i.e., we rebalance the portfolio several times along the option life. The second approach sounds much more promising, as it relies on the same elementary assets as before. In the limit, if we assume rebalancing in continuous time, we can rely on the machinery of stochastic calculus. Let us assume that the underlying asset price follows a geometric Brownian motion,

where to streamline notation a bit we use S_t and W_t rather than S(t) and W(t), and let us consider the viewpoint of the option writer, who holds a short position in the call option. Let us denote the fair option price at time t, when the underlying asset price is S_t, by f(S_t, t). Note that we are taking for granted that the option price depends on these two variables only, which makes sense for a path-independent, vanilla option. In the case of Asian options, where the payoff depends on average prices, things are not that simple. Using Itô’s lemma, we may write a stochastic differential equation for f:

and what we would like to know is f(S₀, 0), the fair option price now. The option holder has a position of value −f(S_t, t) and is subject to considerable risk: If the stock price rockets up, the option holder will exercise, and the writer will have to buy the underlying asset at a high price, unless some countermeasures are taken. One possibility is to buy some stock shares, say Δ, and build a hedged portfolio of value

Equation (3.104) does not suggest an immediate way to find the option price, but it would look a little bit nicer without the random term dS. Indeed, it is easy to see that, by choosing

we can hedge risk by eliminating the random term dS, which includes the increment of the Wiener process. By differentiating Π, we find

With the above choice of Δ, our portfolio is riskless; hence, by no-arbitrage arguments, it must earn the risk-free interest rate r:

Now we have a deterministic partial differential equation (PDE for short) describing an option value f(S, t). This is the Black–Scholes–Merton PDE. Such a PDE must be solved with suitable boundary conditions, but before doing so, the following observations are in order about the choice of Δ:

It is consistent with the value it takes in the binomial case, which is just an increment ratio and becomes a derivative in the continuous case.

It eliminates the dependence with respect to the true drift μ, which is related to the expected return of the underlying asset; again, this is consistent with the binomial model, where the expected value of the future asset price, based on the objective probabilities, does not plat any role.

What does not quite look consistent with the binomial model is that there we obtain the price as an expected value, whereas here we have a PDE. Actually, they are two sides of the same coin, and the gap can be bridged by one version of the Feynman–Kač formula.

THEOREM 3.17 Feynman–Kač representation theorem. Consider the partial differential equation

The notation E_x,t[·] is used to point out that this is a conditional expectation, given that at time t the value of the stochastic process is X_t = x. From a mathematical viewpoint, the theorem is a consequence of how the stochastic integral in the sense of Itô is defined (see [1] for a clear proof). From a physical point of view, it is a consequence of the connection between Brownian motion (which is a diffusion process) and a certain type of PDEs which can be transformed into the heat equation.

The application of the representation theorem to the Black–Scholes–Merton equation, for an option with payoff function

, immediately yields

which is consistent with Eq. (3.103). We point out once more that the expectation is taken under a risk-neutral measure, which essentially means that we work as if the stochastic differential equation for S_t were

By recalling the result of Eq. (3.93), we see that under the risk-neutral measure the price S_t is lognormal and may be written as

where

~ N(0, 1). Then, it is a fairly easy, eve though a bit tedious, exercise in integration to prove the celebrated Black–Scholes–Merton formula (BSM formula for short), which gives the price C of a vanilla European-style call option:

and Φ(x) is the cumulative distribution function for the standard normal distribution:

It is easy to use R to implement the above formula and plot the call price for various underlying asset prices and times to maturity. The code in Fig. 3.49 produces the plot in Fig. 3.50. Note that the implementation is a bit naive, in the sense that it does not really work with some input arguments. For instance, when time to maturity is zero and the asset price is the same as the strike price, the function returns NaN. This is the reason why in the last line of the script we plot the option value at maturity by computing the payoff. The reader is invited to improve the code, including some consistency check on the input (for instance, volatility cannot be negative).

The same reasoning can be applied to price a put option, but there is a very useful theorem that solves the problem.

THEOREM 3.18 (Put–call parity) Consider a call and a put options written on the same non-dividend-paying stock, with the same strike price K and time to maturity T. Then their prices C₀ and P₀ at time t = 0 are related as follows:

where S₀ is the current stock price and r is the continuously compounded risk-free interest rate.

PROOF Consider a portfolio consisting of a call option and an amount of cash given by Ke^−rT. At maturity, the value of this portfolio is

By a similar token, at maturity the value of a portfolio consisting of a put option and one share of the underlying stock is

Since the two portfolios have the same value in any future scenario, they cannot have a different value now, which implies the statement of the theorem.

This proof is instructive and nontechnical, which is why we have included it: It does not rely on any model and is just based on a no-arbitrage consideration, in the form of the law of one price. When we plug the BSM formula for the call option price into Eq. (3.109), we find a similar formula for the put option price:

where d₁ and d₂ are the same expressions occurring in the call option price. A fundamental advantage of the BSM formula, apart from its computational appeal, is that we may evaluate option sensitivities as well. For instance, the number of shares Δ in the hedging portfolio for a call option is

This quantity is commonly referred to as “delta,” and it may be interpreted as the sensitivity of the option price to changes in the underlying asset price. By differentiating the put–call parity relationship with respect to S₀ we immediately find the corresponding delta for the put option:

where ϕ(·) is the PDF of a standard normal, not to be confused with the CDF Φ(·). Equation (3.113) defines and evaluates the option gamma, i.e., the second-order derivative of the option price with respect to the underlying asset price. Equation (3.114) defines and evaluates the option vega, i.e., the first-order derivative of the option price with respect to volatility. From put–call parity we observe that these two greeks are the same for call and put options.

We see that different formulas are required for call and put options, but there is a subtle point here: We have expressed call and option prices at time t = 0, as a function of time to maturity T. A more general statement would be recast as a function of time to maturity T − t. This is easily accomplished by replacing T with T − t in Eqs. (3.108) and (3.110). It is important to realize that the derivatives defining the option thetas in Eqs. (3.115) and (3.116) are with respect to t, which is then set to t = 0 to find the current theta. Typically, option thetas are negative, which shows that option values are subject to a decay over time.

The option sensitivities, among other things, may be used to evaluate the risk involved in holding a portfolio of options, as we will see in Section 13.3. All of the above formulas are implemented in the R code of Fig. 3.51.

3.9.3 OPTION PRICING IN INCOMPLETE MARKETS

The BSM formula relies on the fact that we can replicate an option. A market in which any option (more generally, any contingent claim) can be replicated, is called a complete market. But the obvious question arises: If the BSM formula were correct, why should one trade options? Indeed, in a complete market, one can replicate any derivative, which turns out to be a redundant and useless asset. A fundamental theorem states that if there are no arbitrage opportunities in the market, then there exists a risk-neutral measure that can be used for pricing. But it is market completeness that ensures its uniqueness. However, markets in real life are incomplete, and even if we rule out arbitrage opportunities, we fail to find a unique risk-neutral measure for pricing purposes.

Transaction costs make continuous-time hedging not practical.

Some risk factors, such as jumps and stochastic volatilities, cannot be hedged.

The underlying factor of an option is not a tradeable asset.

The last case is quite relevant for interest rate derivatives. We can model uncertainty in interest rates using Itô stochastic differential equations, but we cannot use the interest rate itself in a replicating portfolio. Hence, we cannot build a replicating portfolio using the underlying risk factor. Nevertheless, we still can get rid of risk. In fact, the idea here is not to replicate a risky derivative, but to build a riskless portfolio using two assets depending on the same risk factor.

Assume that we consider the short rate r_t as the underlying risk factor, and let us denote by Z₁(r_t, t) and Z₂(r_t, t) be the prices of two traded assets, whose value depends on the short rate r_t at time t. These assets can be as simple as two zero-coupon bonds, which explains the notation, or more complex interest rate derivatives. Let us assume the following general model for the interest rate:⁴⁴

where the drift rate may feature mean reversion. Indeed, the model includes the Vasicek and the Cox–Ingersoll–Ross models of Section 3.7.5. Using Itô’s lemma as usual, we may find the stochastic differential equation for the derivative value Z(r_t, t):

We may also get rid of risk by setting up a portfolio involving both Z₁ and Z₂ in the right proportions. Let us build a portfolio as follows:

Actually, we are using the same trick as in the BSM case, but now it involves two risky assets depending on the same risk factor. That choice of Δ eliminates the impact of both the underlying risk and the drift of the short rate:

We note that the drift term m(r_t, t) does not play any role, and this is consistent with the BSM case. However, here we are relating two derivative prices. We can observe that the two ratios above refer to generic derivatives depending on the risk factor r_t, which implies that for any such derivative they are equal to some function of time and the factor itself. Let us define a function m^* (r_t, t) as minus the ratios in Eq. (3.124); we change the sign just for the sake convenience, as this allows to write the following pricing PDE:

We notice that this is quite close to the BSM partial differential equation, but it involves an unknown function m^* (r_t, t). Using the Feynman–Kač theorem, we can think of the derivative price as the expected value of the payoff, if the risk factor follows the risk-adjusted dynamics

Once again, we notice that there is a change in drift, which reflects a change in probability measure. However, due to market incompleteness, we do not know a priori which measure we should use, as uniqueness of the risk-neutral measure requires completeness. The difficulty is that the drift is related to a market price of risk, which does play a role here since we cannot hedge using the underlying asset directly. The way out of the dilemma requires calibrating the model using some traded derivatives. In the case of interest rates, we might use the quoted prices of bonds or other liquid assets to formulate a model calibration problem and solve the corresponding optimization model to estimate the function m^*(r_t,t). Having pinned down the right risk-adjusted measure, we may price other derivatives by using analytical formulas, when available, or by using numerical methods to solve the PDE, like finite differences, or by estimating the price in expectation form by Monte Carlo methods, thanks to the Feynman–Kač representation theorem.

From our viewpoint, we will assume that the model has already been calibrated and that we may proceed to price the derivative asset by estimating the risk-neutral expectation directly. We will see some examples related to interest rates in Section 6.3 and in Chapter 11.

For further reading

Many of the topics that we have considered in this chapter are reviewed, e.g., in [11].

There are many excellent textbooks in probability theory; [8] is an example of the top-level ones in terms of quality and readability.

For some additional information on data dependence and copulas, see [13] or [18].

A good reference for introductory econometrics in finance, including VAR, ARCH, and GARCH models, is [3]. A deeper treatment is offered in [7].

As to time series models, you may refer to [2] or the extensive treatment in [9]. A practical introduction focusing on financial time series analysis with R is [22].

Readers interested in stochastic calculus may refer to [17] for a gentle introduction or to [21] for a deeper treatment.

Several methods in multivariate statistics, including PCA, are covered in [10] and [20].

The role of factor models in finance is illustrated, e.g., in [14] and [19].

References

1 T. Björk. Arbitrage Theory in Continuous Time (2nd ed.). Oxford University Press, Oxford, 2004.

2 P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods (2nd ed.). Springer, New York, 1991.

3 C. Brooks. Introductory Econometrics for Finance (2nd ed.). Cambridge University Press, Cambridge, 2008.

4 J.Y. Campbell and L.M. Viceira. Strategic Asset Allocation. Oxford University Press, Oxford, 2002.

5 P. Embrechts, C. Kluppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. Springer, Berlin, 2000.

6 P. Embrechts, F. Lindskog, and A. McNeil. Modelling dependence with copulas and applications to risk management. In S.T. Rachev, editor, Handbook of Heavy Tailed Distributions in Finance, pp. 329–384. Elsevier, Amsterdam, 2003.

7 W.H. Greene. Econometric Analysis (7th ed.). Prentice Hall, Upper Saddle River, NJ, 2011.

8 G. Grimmett and D. Stirzaker. Probability and Random Processes (3rd ed.). Oxford University Press, Oxford, 2003.

9 J.D. Hamilton. Time Series Analysis. Princeton University Press, Princeton, NJ, 1994.

10 W. Härdle and L. Simar. Applied Multivariate Statistical Analysis (2nd ed.). Springer, Berlin, 2007.

11 S. Hubbert. Essential Mathematics for Market Risk Management. Wiley, Hoboken, NJ, 2012.

12 J.C. Hull. Options, Futures, and Other Derivatives (8th ed.). Prentice Hall, Upper Saddle River, NJ, 2011.

13 H. Joe. Multivariable Models and Dependence Concepts. Chapman & Hall/CRC, Boca Raton, FL, 2001.

14 J. Knight and S. Satchell, editors. Linear Factor Models in Finance. Elsevier, Amsterdam, 2005.

15 D.P. Kroese, T. Taimre, and Z.I. Botev. Handbook of Monte Carlo Methods. Wiley, Hoboken, NJ, 2011.

16 L. Martellini, P. Priaulet, and S. Priaulet. Fixed-Income Securities: Valuation, Risk Management, and Portfolio Strategies. Wiley, Chichester, 2003.

17 T. Mikosch. Elementary Stochastic Calculus with Finance in View. World Scientific Publishing, Singapore, 1998.

19 E.E. Qian, R.H. Hua, and E.H. Sorensen, editors. Quantitative Equity Portfolio Management: Modern Techniques and Applications. CRC Press, London, 2007.

20 A.C. Rencher. Methods of Multivariate Analysis (2nd ed.). Wiley, New York, 2002.

21 S. Shreve. Stochastic Calculus for Finance (vols. I & II). Springer, New York, 2003.

22 R.S. Tsay. An Introduction to the Analysis of Financial Data with R. Wiley, Hoboken, NJ, 2013.

23 P. Veronesi. Fixed Income Securities: Valuation, Risk, and Risk Management. Wiley, Hoboken, NJ, 2010.

³What we use here as return is referred to as rate of return in some treatments, but we sometimes prefer to streamline terminology a bit.

⁴For the unfamiliar reader, we formally introduce covariance later in Section 3.3.

⁵To put it bluntly, relying on past returns is akin to driving by just looking at the rear-view mirror.

⁶This is a streamlined version of the model described in [4, Chapter 7]. See also Chapter 10 to see how such problems may be tackled by stochastic dynamic programming.

⁷ Throughout this book, we use log rather than ln, but we always mean natural logarithms.

⁸ Actually, the precise definition of Gaussian processes does not only involve normal marginals but also jointly normal distributions.

⁹ Needless to say, Monte Carlo methods play a key role in more complicated cases; see Chapter 11.

¹⁰ We have outlined how a simple binomial model may be calibrated by moment matching in Section 2.5.1.

¹¹ We recall the definition of the factorial of an integer number, n! ≡ n · (n − 1) · (n − 2) ··· 2·1; by convention, 0! = 1.

¹² Such a mechanism is found, e.g., in rejection sampling, as we shall see in Section 5.4.

¹³ One could wonder why we typically use X ~ N(μ, σ²), referring to variance, and not X ~ N(μ, σ), referring to standard deviation. The reason is that this notation easily generalizes to the multivariate case X ~ N(μ, Σ), where Σ is the covariance matrix; see Section 3.3.

¹⁵ We do not deal with stable distributions in this introductory treatment, since they require more advanced concepts; see, e.g., [15, pp. 129–131].

¹⁶See Section 6.3.2, where we deal with sample path generation for continuous-time stochastic processes.

¹⁸Actually, they are sampled from a joint normal distribution with μ₁ = μ₂ = 10, σ₁ = σ₂ = 5. The multivariate normal distribution is introduced in Section 3.3.4.

¹⁹You are advised not to try estimating Kendall’s correlation with R on a very large data set, as this may require a lot of computation and is likely to block your computer.

²⁶The denominator in Eq. (3.51) is related to the estimate of autocovariance. In the literature, sample autocovariance is typically obtained by dividing the sum by T, even though this results in a biased estimator. If we divide by T − k, we might obtain an autocovariance matrix which is not positive semidefinite; see Brockwell and Davis [2], pp. 220–221.

²⁷ Actually, expression is much more powerful; please refer to the online help.

²⁸ To be precise, this representation requires that the two polynomials ϕ(B) and Θ(B) satisfy some additional technical conditions that are beyond the scope of this book.

²⁹ The acronym VAR is a bit unfortunate, as it may be confused with variance or value-at-risk. To avoid confusion, variance is typically written as Var and value-at-risk as VaR; some authors use V@R for the latter, and we will follow that practice in this book.

³⁰ The square-root rule, which we consider in Section 3.7.1, suggests that on very short time spans volatility dominates drift.

³¹ The parameters used in the simulation are calibrated on real data in [22, p. 202].

³² Sample path generation for stochastic differential equations is the subject of Chapter 6.

³³ This has some implications on the evaluation of risk measures like value-at-risk; see Chapter 13.

³⁶We have already met this function in the first chapter; please refer to Fig. 1.5 for programming notes. Later, we will describe a more efficient version.

³⁸One might well wonder whether an increasing exponential trend makes sense for the prices of stock shares, as we do not observe this in real life. One reason is that dividends are paid, bringing down the share price.

³⁹We should maybe talk about orthonormal vectors and matrices, but we will follow the more common parlance.

⁴¹ Strictly speaking, we are in trouble here, since we are maximizing a convex quadratic form. However, we may replace the equality constraint by an inequality constraint, which results in a concave problem, i.e., a problem in which the optimal solution is on the boundary of the feasible solution. It turns out that the solution we pinpoint using the Lagrange multiplier method is the right one.

3.8 Dimensionality reduction