In the previous chapter we studied continuity of operators from a normed space X to a normed space Y. In this chapter, we will study differentiation: we will define the (Fréchet) derivative of a map f : X → Y at a point x0 ∈ X. Roughly speaking, the derivative f′(x0) of f at a point x0 will be a continuous linear transformation f′(x0) : X → Y that provides a linear approximation of f in the vicinity of x0.
As an application of the notion of differentiation, we will indicate its use in solving optimisation problems in normed spaces, for example for real-valued maps living on C1[a, b]. At the end of the chapter, we'll apply our results to the concrete case of solving the optimisation problem

  (P)  minimise ∫_a^b F(x(t), x′(t), t) dt
       subject to x ∈ C1[a, b], x(a) = ya, x(b) = yb.
Setting the derivative of a relevant functional arising from (P) equal to the zero linear transformation, we get a condition for an extremal curve x∗, called the Euler-Lagrange equation. Thus, instead of the algebraic equation obtained, for example, in the minimisation of a polynomial p : R → R using ordinary calculus, for the problem (P) the Euler-Lagrange equation is a differential equation. The solution of this differential equation is then the sought-after function x∗ that solves the optimisation problem (P).
At the end of this chapter, we will also briefly see an application of the language developed in this chapter to Classical Mechanics, where we will describe the Lagrangian equations and the Hamiltonian equations for simple mechanical systems. This will also serve as a stepping stone to a discussion of Quantum Mechanics in the next chapter.
Let us first revisit the situation in ordinary calculus, where f : R → R, and let us rewrite the definition of the derivative of f at x0 ∈ R in a manner that lends itself to generalisation to the case of maps between normed spaces. Recall that for a function f : R → R, the derivative at a point x0 is the slope of the straight line that best approximates f around x0.
Let f : R → R and let x0 ∈ R. Then f is said to be differentiable at x0 with derivative f′(x0) ∈ R if

  lim_{x→x0} (f(x) − f(x0))/(x − x0) = f′(x0),

that is, for every ε > 0, there exists a δ > 0 such that whenever x ∈ R satisfies 0 < |x − x0| < δ, there holds that

  |(f(x) − f(x0))/(x − x0) − f′(x0)| < ε,

that is,

  |f(x) − f(x0) − f′(x0) · (x − x0)| / |x − x0| < ε.
If we now imagine f instead to be a map from a normed space X to another normed space Y, then bearing in mind that the norm is a generalisation of the absolute value in R, we may try mimicking the above definition: the denominator |x − x0| above can be replaced by ||x − x0||, and the absolute value in the numerator can be replaced by the norm in Y, since f(x) − f(x0) lives in Y:

  lim_{x→x0} ||f(x) − f(x0) − [?](x − x0)|| / ||x − x0|| = 0.

But what object must there be in the box [?] above?
Since f(x), f(x0) live in Y, we expect the term f′(x0)(x − x0) to be also in Y. As x − x0 is in X, f′(x0) should take this into Y. So we see that it is natural that we should not expect f′(x0) to be a number (as was the case when X = Y = R), but rather we expect it to be a certain mapping from X to Y. We will in fact want it to be a continuous linear transformation from X to Y. Why? We will see this later, but a short answer is that with this definition, we can prove analogous theorems from ordinary calculus, and we can use these theorems in applications to solve (e.g. optimisation) problems. After this rough motivation, let us now see the precise definition.
Definition 3.1. (Derivative).
Let X, Y be normed spaces, f : X → Y be a map, and x0 ∈ X.
Then f is said to be differentiable at x0 if there exists a continuous linear transformation L : X → Y such that for every ε > 0, there exists a δ > 0 such that whenever x ∈ X satisfies 0 < ||x − x0|| < δ, we have

  ||f(x) − f(x0) − L(x − x0)|| < ε · ||x − x0||.
(If f is differentiable at x0, then it can be shown that there can be at most one continuous linear transformation L such that the above statement holds. We will prove this below in Theorem 3.1, page 124.)
The unique continuous linear transformation L is denoted by f′(x0), and is called the derivative of f at x0.
If f is differentiable at every point x ∈ X, then f is said to be differentiable.
Before we see some simple illustrative examples on the calculation of the derivative, let us check that this is a genuine extension of the notion of differentiability from ordinary calculus. Over there the concept of derivative was very simple, and f′(x0) was just a number. Now we will see that over there too, it was actually a continuous linear transformation, but it just so happens that any continuous linear transformation from R to R is simply given by multiplication by a fixed number. We explain this below.
Coincidence of our new definition with the old definition when we have X = Y = R, f : R → R, x0 ∈ R.
(1) Differentiable in the old sense ⇒ differentiable in the new sense.

Let

  lim_{x→x0} (f(x) − f(x0))/(x − x0)

exist and be the number f′(x0). Define the map L : R → R by

  L(v) = f′(x0) · v, v ∈ R.

Then L is a linear transformation, as verified below.

(L1) For all u, v ∈ R, L(u + v) = f′(x0) · (u + v) = f′(x0) · u + f′(x0) · v = L(u) + L(v).

(L2) For every α ∈ R and every v ∈ R, L(α · v) = f′(x0) · (α · v) = α · (f′(x0) · v) = α · L(v).

L is continuous since |L(v)| = |f′(x0) · v| = |f′(x0)||v| for all v ∈ R. We know that

  lim_{x→x0} (f(x) − f(x0))/(x − x0) = f′(x0),

that is, for every ε > 0, there exists a δ > 0 such that whenever x ∈ R satisfies 0 < |x − x0| < δ, we have

  |(f(x) − f(x0))/(x − x0) − f′(x0)| < ε,

that is,

  |f(x) − f(x0) − L(x − x0)| = |f(x) − f(x0) − f′(x0) · (x − x0)| < ε · |x − x0|.

So f is differentiable in the new sense too, and its new-sense derivative at x0 is L, that is, (f′(x0))(v) = f′(x0) · v, v ∈ R, where the left-hand side is the new-sense derivative (a linear map) acting on v, and the right-hand side involves the old-sense derivative (a number).
(2) Differentiable in the new sense ⇒ differentiable in the old sense.

Suppose there is a continuous linear transformation f′(x0) : R → R such that for every ε > 0, there exists a δ > 0 such that whenever x ∈ R satisfies 0 < |x − x0| < δ, we have

  |f(x) − f(x0) − (f′(x0))(x − x0)| < ε · |x − x0|.

Consider the number (f′(x0))(1) ∈ R. Then if x ∈ R, we have, by the linearity of f′(x0),

  (f′(x0))(x − x0) = (x − x0) · (f′(x0))(1).

So there exists a number, namely (f′(x0))(1), such that for every ε > 0, there exists a δ > 0 such that whenever x ∈ R satisfies 0 < |x − x0| < δ,

  |(f(x) − f(x0))/(x − x0) − (f′(x0))(1)| < ε.

Consequently, f is differentiable at x0 in the old sense, and furthermore, the old-sense derivative is the number (f′(x0))(1).
The derivative as a local linear approximation. We know that in ordinary calculus, for a function f : R → R that is differentiable at x0 ∈ R, the number f′(x0) has the interpretation of being the slope of the tangent line to the graph of the function at the point (x0, f(x0)), and the tangent line itself serves as a local linear approximation to the graph of the function. Imagine zooming into the point (x0, f(x0)) using lenses of greater and greater magnification: then there is little difference between the graph of the function and the tangent line. We now show that also in the more general set-up, when f is a map from a normed space X to a normed space Y that is differentiable at a point x0 ∈ X, f′(x0) can be interpreted as giving a local linear approximation to the mapping f near the point x0, and we explain this below. Let ε > 0. Then we know that for all x close enough to x0 and distinct from x0, we have

  ||f(x) − f(x0) − f′(x0)(x − x0)|| / ||x − x0|| < ε,

that is, ||f(x) − f(x0) − f′(x0)(x − x0)|| < ε · ||x − x0||. So for all x close enough to x0,

  f(x) − f(x0) − f′(x0)(x − x0) ≈ 0 ∈ Y,

that is, and upon rearranging,

  f(x) ≈ f(x0) + (f′(x0))(x − x0).
The above says that near x0, f(x) − f(x0) looks like the action of the linear transformation f′(x0) on x − x0. We will keep this important message in mind because it will help us calculate the derivative in concrete examples. Given an f, for which we need to find f′(x0), our starting point will always be to calculate f(x) − f(x0) and to guess which linear transformation L would give f(x) − f(x0) ≈ L(x − x0) for x near x0. So we would start by writing f(x) − f(x0) = L(x − x0) + error, and then show that the error term is mild enough for the definition of the derivative to be verified. We will soon see this in action below, but first let us make an important remark.
Remark 3.1. In our definition of the derivative, why do we insist that the derivative f′(x0) of f : X → Y at x0 ∈ X should be a continuous linear transformation, that is, why not settle just for it being a linear transformation (without demanding continuity)? The answer to this question is tied to wanting the implication

  f differentiable at x0 ⇒ f continuous at x0.

We know this holds with the usual derivative concept in ordinary calculus when f : R → R. If we want this property to hold also in our more general setting of normed spaces, then just having f′(x0) be a linear transformation won't do; in addition we also need the continuity.
On the other hand, for solving optimisation problems, one can prove useful optimisation theorems using a weaker notion of the derivative, even though with this weaker notion, differentiability at a point no longer implies continuity at that point. The weaker notion is called the Gateaux derivative1, while our stronger notion is the Fréchet derivative. As we'll only use the Fréchet derivative, we refer to it simply as "the derivative".
Example 3.1. Let X, Y be normed spaces, and let T : X → Y be a continuous linear transformation. We ask: given x0 ∈ X, what is T′(x0)?

Let us do some rough work first. We would like to fill the question mark in the box below with a continuous linear transformation so that

  T(x) − T(x0) ≈ [?](x − x0)

for x close to x0. But owing to the linearity of T, we know that for all x ∈ X,

  T(x) − T(x0) = T(x − x0),

and (the right-hand side) T is already a continuous linear transformation. So we make a guess that T′(x0) = T! Let us check this now.

Let ε > 0. Choose any δ > 0, for example, δ = 1. Then whenever x ∈ X satisfies 0 < ||x − x0|| < δ = 1, we have

  ||T(x) − T(x0) − T(x − x0)|| = ||0|| = 0 < ε · ||x − x0||.
Hence T′(x0) = T. Note that as the choice of x0 was arbitrary, we have in fact obtained that for all x ∈ X, T′(x) = T! This is analogous to the observation in ordinary calculus that a linear function x ↦ m · x has the same slope at all points, namely the number m.
Example 3.2. Consider f : C[a, b] → R given by

  f(x) = ∫_a^b (x(t))² dt, x ∈ C[a, b].
Let x0 ∈ C[a, b]. What is f′(x0)?
As before, we begin with some rough work to make a guess for f′(x0): we seek a continuous linear transformation L so that for x ∈ C[a, b] near x0, f(x) − f(x0) ≈ L(x − x0). We have

  f(x) − f(x0) = ∫_a^b ((x(t))² − (x0(t))²) dt = ∫_a^b 2·x0(t)·(x(t) − x0(t)) dt + ∫_a^b (x(t) − x0(t))² dt = L(x − x0) + ∫_a^b (x(t) − x0(t))² dt,

where L : C[a, b] → R is given by

  L(h) = ∫_a^b 2·x0(t)·h(t) dt, h ∈ C[a, b].

This L is a continuous linear transformation, since it is a special case of Example 2.11, page 72 (when A := 2·x0 and B = 0). Let us now check that the "ε-δ definition of differentiability" holds with this L. For x ∈ C[a, b],

  |f(x) − f(x0) − L(x − x0)| = ∫_a^b (x(t) − x0(t))² dt,

and so

  |f(x) − f(x0) − L(x − x0)| ≤ (b − a) · ||x − x0||∞².

So if 0 < ||x − x0||∞, then

  |f(x) − f(x0) − L(x − x0)| / ||x − x0||∞ ≤ (b − a) · ||x − x0||∞.

Let ε > 0. Set δ := ε/(b − a). Then δ > 0 and if 0 < ||x − x0||∞ < δ,

  |f(x) − f(x0) − L(x − x0)| ≤ (b − a) · ||x − x0||∞ · ||x − x0||∞ < (b − a) · δ · ||x − x0||∞ = ε · ||x − x0||∞.

So f′(x0) = L. In other words, f′(x0) is the continuous linear transformation from C[a, b] to R given by

  (f′(x0))(h) = ∫_a^b 2·x0(t)·h(t) dt, h ∈ C[a, b].
So as opposed to the ordinary calculus case, one must stop thinking of the derivative as being a mere number, but instead, in the context of maps between normed spaces, the derivative at a point is itself a map, in fact a continuous linear transformation. So the answer to the question
should always begin with the phrase
“f′(x0) is the continuous linear transformation from X to Y given by ···”.
To emphasise this, let us see some particular cases of our calculation of f′(x0) above, for specific choices of x0.
In particular, we have that the derivative of f at the zero function 0, namely f′(0), is the zero linear transformation 0 : C[a, b] → R that sends every h ∈ C[a, b] to the number 0: 0(h) = 0, h ∈ C[a, b].
Similarly, for the constant function 1 ∈ C[a, b] (1(t) = 1 for all t ∈ [a, b]), the derivative f′(1) is the continuous linear transformation from C[a, b] to R given by (f′(1))(h) = ∫_a^b 2·h(t) dt, h ∈ C[a, b].
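For readers who like to see this numerically, the following Python sketch (our own illustration; the particular x0 and h are arbitrary choices) checks that the Fréchet quotient from Example 3.2 indeed shrinks as ||h||∞ → 0.

```python
import numpy as np

# A minimal numerical sketch of Example 3.2: for f(x) = ∫_0^1 x(t)^2 dt and the
# claimed derivative (f'(x0))(h) = ∫_0^1 2 x0(t) h(t) dt, the Fréchet quotient
# |f(x0 + h) - f(x0) - L(h)| / ||h||_inf should tend to 0 as ||h||_inf -> 0.
t = np.linspace(0.0, 1.0, 10_001)
x0 = np.sin(t)                         # a sample point x0 of C[0, 1]
h = np.cos(3 * t)                      # a sample direction h

f = lambda x: np.trapz(x ** 2, t)      # the functional f
L = lambda k: np.trapz(2 * x0 * k, t)  # the claimed derivative at x0

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    k = eps * h
    quotient = abs(f(x0 + k) - f(x0) - L(k)) / np.max(np.abs(k))
    print(f"||h||_inf = {np.max(np.abs(k)):.1e}, quotient = {quotient:.1e}")
# the printed quotient shrinks linearly with ||h||_inf, as the theory predicts
```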
Exercise 3.1. Consider f : C[0, 1] → R given by
Let x0 ∈ C[0, 1]. What is f′(x0)? What is f′(0)?
We now prove something we had mentioned earlier, but which we haven't proved yet: if f is differentiable at x0, then its derivative is unique.
Theorem 3.1. Let X, Y be normed spaces. If f : X → Y is differentiable at x0 ∈ X, then there is a unique continuous linear transformation L such that for every ε > 0, there is a δ > 0 such that whenever x ∈ X satisfies 0 < ||x − x0|| < δ, there holds

  ||f(x) − f(x0) − L(x − x0)|| < ε · ||x − x0||.
Proof. Suppose that L1, L2 : X → Y are two continuous linear transformations such that for every ε > 0, there is a δ > 0 such that whenever x ∈ X satisfies 0 < ||x − x0|| < δ, there holds

  ||f(x) − f(x0) − L1(x − x0)|| < ε · ||x − x0||,   (3.1)
  ||f(x) − f(x0) − L2(x − x0)|| < ε · ||x − x0||.   (3.2)

Suppose that L1(h0) ≠ L2(h0) for some h0 ∈ X. Clearly h0 ≠ 0 (for otherwise L1(0) = 0 = L2(0)!). Take ε = 1/n for some n ∈ N. Then there exists a δn > 0 such that whenever x ∈ X satisfies 0 < ||x − x0|| < δn, the inequalities (3.1), (3.2) hold. With

  x := x0 + (δn/(2||h0||)) · h0,

we have that x ≠ x0, and

  ||x − x0|| = δn/2 < δn.

So (3.1), (3.2) hold for this x. The triangle inequality gives

  ||L1(x − x0) − L2(x − x0)|| ≤ ||f(x) − f(x0) − L2(x − x0)|| + ||f(x) − f(x0) − L1(x − x0)|| < (2/n) · ||x − x0||.

Upon rearranging, using the linearity of L1, L2 and ||x − x0|| = (δn/(2||h0||)) · ||h0||, we obtain

  ||L1(h0) − L2(h0)|| < (2/n) · ||h0||.

As the choice of n ∈ N was arbitrary, it follows that ||L1(h0) − L2(h0)|| = 0, and so L1(h0) = L2(h0), a contradiction. This completes the proof.
Exercise 3.2. (Differentiability ⇒ continuity).
Let X, Y be normed spaces, x0 ∈ X, and f : X → Y be differentiable at x0.
Prove that f is continuous at x0.
Exercise 3.3. Consider f : C1[0, 1] → R defined by f(x) = (x′(1))2, x ∈ C1[0, 1]. Is f differentiable? If so, compute f′(x0) at x0 ∈ C1[0, 1].
Exercise 3.4. (Chain rule).
Given distinct x1, x2 in a normed space X, define the straight line γ : R → X passing through x1, x2 by γ(t) = (1 − t)x1 + tx2, t ∈ R.
(1) Prove that if f : X → R is differentiable at γ(t0), for some t0 ∈ R, then f ∘ γ : R → R is differentiable at t0, and (f ∘ γ)′(t0) = (f′(γ(t0)))(x2 − x1).
(2) Deduce that if g : X → R is differentiable and g′(x) = 0 for every x ∈ X, then g is constant.
From ordinary calculus, we know the following two facts that enable one to solve optimisation problems for f : R → R.
Fact 1. If x∗ ∈ R is a minimiser of f, then f′(x∗) = 0.
Fact 2. If f″(x) ≥ 0 for all x ∈ R and f′(x∗) = 0, then x∗ is a minimiser of f.
The first fact gives a necessary condition for minimisation (and allows one to narrow the possibilities for minimisers — together with the knowledge of the existence of a minimiser, this is a very useful result since it then tells us that the minimiser x∗ has to be one which satisfies f′(x∗) = 0). On the other hand, the second fact gives a sufficient condition for minimisation.
Analogously, we will prove the following two results in this section, but now for a real-valued function f : X → R on a normed space X.
Fact 1. If x∗ ∈ X is a minimiser of f, then f′(x∗) = 0.
Fact 2. If f is convex and f′(x∗) = 0, then x∗ is a minimiser of f.
We mention that there is no loss of generality in assuming that we have a minimisation problem, as opposed to a maximisation one. This is because we can just look at −f instead of f. (If f : S → R is a given function on a set S, then defining −f : S → R by (−f)(x) = −f(x), x ∈ S, we see that x∗ ∈ S is a maximiser for f if and only if x∗ is a minimiser for −f .)
Optimisation: necessity of vanishing derivative
Theorem 3.2. Let X be a normed space, and let f : X → R be a function that is differentiable at x∗ ∈ X . If f has a minimum at x∗, then f′(x∗) = 0.
Let us first clarify what 0 above means: 0 : X → R is the continuous linear transformation that sends everything in X to 0 ∈ R: 0(h) = 0, h ∈ X.
So to say that “f′(x∗) = 0” is the same as saying that for all h ∈ X, (f′(x∗))(h) = 0.
Proof. Suppose that f′(x∗) ≠ 0. Then there exists a vector h0 ∈ X such that (f′(x∗))(h0) ≠ 0. Clearly this h0 must be a nonzero vector (because the linear transformation f′(x∗) takes the zero vector in X to the zero vector in R, which is 0). Replacing h0 by −h0 if necessary, we may assume that (f′(x∗))(h0) = −|(f′(x∗))(h0)| < 0. Let ε > 0. Then there exists a δ > 0 such that whenever x ∈ X satisfies 0 < ||x − x∗|| < δ, we have

  |f(x) − f(x∗) − (f′(x∗))(x − x∗)| < ε · ||x − x∗||.

Thus whenever 0 < ||x − x∗|| < δ, we have, using f(x) ≥ f(x∗),

  −(f′(x∗))(x − x∗) ≤ f(x) − f(x∗) − (f′(x∗))(x − x∗) < ε · ||x − x∗||.

Hence whenever 0 < ||x − x∗|| < δ,

  −(f′(x∗))(x − x∗) < ε · ||x − x∗||.

Now we will construct a special x using the h0 from before. Take

  x := x∗ + (δ/(2||h0||)) · h0.

Then x ≠ x∗ and

  ||x − x∗|| = δ/2 < δ.

Using the linearity of f′(x∗), we obtain

  −(f′(x∗))(x − x∗) = −(δ/(2||h0||)) · (f′(x∗))(h0) = (δ/(2||h0||)) · |(f′(x∗))(h0)| < ε · (δ/2).

Thus, |(f′(x∗))(h0)| < ε · ||h0||. As ε > 0 was arbitrary, |(f′(x∗))(h0)| = 0, and so (f′(x∗))(h0) = 0, a contradiction.
We remark that the condition f′(x∗) = 0 is a necessary condition for x∗ to be a minimiser, but it is not sufficient. This is analogous to the situation in optimisation in R: if we look at f : R → R given by f(x) = x³, x ∈ R, then with x∗ := 0, we have that f′(x∗) = 3x∗² = 3 · 0² = 0, but clearly x∗ = 0 is not a minimiser of f.
Example 3.3. Let f : C[0, 1] → R be given by f(x) = (x(1))³, x ∈ C[0, 1]. Then f′(0) = 0. (Here the 0 on the left-hand side is the zero function in C[0, 1], while the 0 on the right-hand side is the zero linear transformation 0 : C[0, 1] → R.) Indeed, given ε > 0, we may set δ := min{√ε, 1}, and then we have that whenever x ∈ C[0, 1] satisfies 0 < ||x − 0||∞ < δ,

  |f(x) − f(0) − 0(x − 0)| = |x(1)|³ ≤ ||x||∞³ < δ² · ||x||∞ ≤ ε · ||x||∞.
But 0 is not a minimiser for f. For example, with x := −α · 1 ∈ C[0, 1], where α > 0, we have f(x) = (−α)3 = −α3 < 0 = f(0), showing that 0 is not2 a minimiser.
Exercise 3.5. Let f : C[a, b] → R be given by

  f(x) = ∫_a^b (x(t))² dt, x ∈ C[a, b].

In Example 3.2, page 122, we showed that if x0 ∈ C[a, b], then f′(x0) is given by

  (f′(x0))(h) = ∫_a^b 2·x0(t)·h(t) dt, h ∈ C[a, b].
(1)Find all x0 ∈ C[a, b] for which f′(x0) = 0.
(2)If we know that x∗ ∈ C[a, b] is a minimiser for f, what can we say about x∗?
Optimisation: sufficiency in the convex case
We will now show that if f : X → R is a convex function, then a vanishing derivative at some point is enough to conclude that the function has a minimum at that point. Thus the condition "f″(x) ≥ 0 for all x ∈ R" from ordinary calculus when X = R, is now replaced by "f is convex" when X is a general normed space. We will see that in the special case when X = R (and when f is twice continuously differentiable), convexity is precisely characterised by the second derivative condition above. We begin by giving the definition of a convex function.
Definition 3.2. (Convex set, convex function) Let X be a normed space.
(1)A subset C ⊂ X is said to be a convex set if for every x1, x2 ∈ C, and all α ∈ (0, 1), (1 − α) · x1 + α · x2 ∈ C.
(2) Let C be a convex subset of X. A map f : C → R is said to be a convex function if for every x1, x2 ∈ C, and all α ∈ (0, 1),

  f((1 − α) · x1 + α · x2) ≤ (1 − α) · f(x1) + α · f(x2).   (3.3)

The geometric interpretation of the inequality (3.3), when X = R, is that the graph of a convex function lies below all of its chords.
Exercise 3.6. Let a < b, ya, yb be fixed real numbers.
Show that S := {x ∈ C1[a, b] : x(a) = ya and x(b) = yb} is a convex set.
Exercise 3.7. (||·|| is a convex function).
If X is a normed space, then prove that the norm x ↦ ||x|| : X → R is convex.
Exercise 3.8. (Convex set versus convex function).
Let X be a normed space, C be a convex subset of X, and let f : C → R. Define the epigraph of f by

  U(f) := {(x, y) ∈ C × R : y ≥ f(x)}.

Intuitively, we think of U(f) as the "region above the graph of f". Show that f is a convex function if and only if U(f) is a convex set.
Exercise 3.9. Suppose that f : X → R is a convex function on a normed space X.
If n ∈ N and x1, ···, xn ∈ X, then show that

  f((x1 + ··· + xn)/n) ≤ (f(x1) + ··· + f(xn))/n.
Convexity of functions living on R.
We will now see that for twice continuously differentiable functions f : (a, b) → R, convexity of f is equivalent to the condition that f″(x) ≥ 0 for all x ∈ (a, b). This test will actually help us to show convexity of some functions living on spaces like C1[a, b].
If one were to use the definition of convexity alone, then the verification can be cumbersome. Consider for example the function f : R → R given by f(x) = x², x ∈ R. To verify that this function is convex, we note that for x1, x2 ∈ R and α ∈ (0, 1),

  (1 − α)x1² + αx2² − ((1 − α)x1 + αx2)² = α(1 − α)(x1 − x2)² ≥ 0.
On the other hand, we will now prove the following result.
Theorem 3.3. Let f : (a, b) → R be twice continuously differentiable. Then f is convex if and only if for all x ∈ (a, b), f″(x) ≥ 0.
The convexity of x ↦ x² is now immediate, as (x²)″ = 2 > 0 for all x ∈ R.

Example 3.4. We have (eˣ)″ = eˣ > 0 for all x ∈ R, and so x ↦ eˣ is convex. Consequently, for all x1, x2 ∈ R and all α ∈ (0, 1), we have the inequality

  e^((1−α)x1 + αx2) ≤ (1 − α)e^(x1) + α·e^(x2).
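The following small Python experiment spot-checks this inequality at randomly sampled points (the sample values are arbitrary, purely illustrative choices):

```python
import numpy as np

# Spot-check of Example 3.4: e^{(1-a)x1 + a x2} <= (1-a) e^{x1} + a e^{x2}.
rng = np.random.default_rng(1)
for _ in range(5):
    x1, x2 = rng.uniform(-3, 3, size=2)
    alpha = rng.uniform(0, 1)
    lhs = np.exp((1 - alpha) * x1 + alpha * x2)
    rhs = (1 - alpha) * np.exp(x1) + alpha * np.exp(x2)
    print(f"{lhs:10.4f} <= {rhs:10.4f}  {lhs <= rhs}")   # always True
```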
Exercise 3.10. Consider the function f : R → R given by … Show that f is convex.
Proof. (Of Theorem 3.3.) Only if part: Let x, y ∈ (a, b) and x < u < y.
Set

  α := (u − x)/(y − x).

Then α ∈ (0, 1), and u = (1 − α)x + αy. As f is convex,

  f(u) ≤ (1 − α)f(x) + αf(y),

that is,

  (y − x)f(u) ≤ (y − u)f(x) + (u − x)f(y).   (3.4)

From (3.4), (y − x)f(u) ≤ (u − x)f(y) + (y − x + x − u)f(x), that is,

  (y − x)(f(u) − f(x)) ≤ (u − x)(f(y) − f(x)),

and so

  (f(u) − f(x))/(u − x) ≤ (f(y) − f(x))/(y − x).   (3.5)

From (3.4), we also have (y − x)f(u) ≤ (u − y + y − x)f(y) + (y − u)f(x), that is, (y − x)f(u) − (y − x)f(y) ≤ (u − y)f(y) − (u − y)f(x), and so

  (f(y) − f(x))/(y − x) ≤ (f(y) − f(u))/(y − u).   (3.6)

Combining (3.5) and (3.6),

  (f(u) − f(x))/(u − x) ≤ (f(y) − f(x))/(y − x) ≤ (f(y) − f(u))/(y − u).

Passing to the limit as u ↘ x and u ↗ y,

  f′(x) ≤ (f(y) − f(x))/(y − x) ≤ f′(y).

Hence f′ is increasing, and so for all x ∈ (a, b), f″(x) ≥ 0.
If part: Since f″(x) ≥ 0 for all x ∈ (a, b), it follows that f′ is increasing. Indeed, by the Fundamental Theorem of Calculus, if a < x < y < b, then

  f′(y) − f′(x) = ∫_x^y f″(t) dt ≥ 0.
Now let a < x < y < b, α ∈ (0, 1), and u := (1 − α)x + αy. Then x < u < y.
By the Mean Value Theorem,

  (f(u) − f(x))/(u − x) = f′(v) for some v ∈ (x, u).

Similarly,

  (f(y) − f(u))/(y − u) = f′(w) for some w ∈ (u, y).

As w > v, we have f′(w) ≥ f′(v), and so

  (f(u) − f(x))/(u − x) ≤ (f(y) − f(u))/(y − u).

Rearranging, we obtain f(u) ≤ (1 − α)f(x) + αf(y).
Thus f is convex.
Example 3.5. Consider the function f : C[a, b] → R given by

  f(x) = ∫_a^b (x(t))² dt, x ∈ C[a, b].

Is f convex? We will show below that f is convex, using the convexity of the map ξ ↦ ξ² : R → R. Let x1, x2 ∈ C[a, b] and α ∈ (0, 1). Then for all c, d ∈ R, ((1 − α)c + αd)² ≤ (1 − α)c² + αd². Hence for each t ∈ [a, b], with c := x1(t), d := x2(t), we obtain

  ((1 − α)x1(t) + αx2(t))² ≤ (1 − α)(x1(t))² + α(x2(t))².

Integrating over [a, b] gives f((1 − α)x1 + αx2) ≤ (1 − α)f(x1) + αf(x2).
Consequently, f is convex.
Exercise 3.11. (Convexity of the arc length functional.)
Let f : C1[0, 1] → R be given by

  f(x) = ∫_0^1 √(1 + (x′(t))²) dt, x ∈ C1[0, 1].
Prove that f is convex.
Example 3.6. Let us revisit Example 0.1, page viii.
There S := {x ∈ C1[0, T] : x(0) = 0 and x(T) = Q}, and f : S → R was given by

  f(x) = ∫_0^T (a·(x′(t))² + b·x′(t)) dt, x ∈ S,

where a, b, Q > 0 are constants. Let us check that f is convex. The convexity of the map

  x ↦ ∫_0^T a·(x′(t))² dt

follows from the convexity of η ↦ η² : R → R. The map

  x ↦ ∫_0^T b·x′(t) dt

is constant on S because

  ∫_0^T x′(t) dt = x(T) − x(0) = Q − 0 = Q,
and so this map is trivially convex. Hence f, being the sum of two convex functions, is convex too.
We now prove the following result on the sufficiency of the vanishing derivative for a minimiser in the case of convex functions.
Theorem 3.4.
Let X be a normed space and f : X → R be convex and differentiable. If x∗ ∈ X is such that f′(x∗) = 0, then f has a minimum at x∗ .
Proof. Suppose that x0 ∈ X and f(x0) < f(x∗). Define φ : R → R by φ(t) = f(tx0 + (1 − t)x∗), t ∈ R. The function φ is convex, since if α ∈ (0, 1) and t1, t2 ∈ R, then we have

  φ((1 − α)t1 + αt2) = f((1 − α)·(t1x0 + (1 − t1)x∗) + α·(t2x0 + (1 − t2)x∗)) ≤ (1 − α)·φ(t1) + α·φ(t2).

Also, from Exercise 3.4 on page 125 (with γ(t) := (1 − t)x∗ + tx0), φ is differentiable at 0, and

  φ′(0) = (f′(x∗))(x0 − x∗) = 0(x0 − x∗) = 0.

We have φ(1) = f(x0) < f(x∗) = φ(0). By the Mean Value Theorem, there exists a θ ∈ (0, 1) such that

  φ′(θ) = (φ(1) − φ(0))/(1 − 0) = f(x0) − f(x∗) < 0 = φ′(0).

But this is a contradiction because φ is convex (and so φ′ must be increasing; see the proof of the "only if" part of Theorem 3.3).
Thus there cannot exist an x0 ∈ X such that f(x0) < f(x∗).
Consequently, f has a minimum at x∗.
Exercise 3.12. Consider f : C[0, 1] → R given by

  f(x) = ∫_0^1 (x(t))² dt, x ∈ C[0, 1].

Let x0 ∈ C[0, 1]. From Example 3.2, page 122, f′(x0) : C[0, 1] → R is given by

  (f′(x0))(h) = ∫_0^1 2·x0(t)·h(t) dt, h ∈ C[0, 1].

Prove that f′(x0) = 0 if and only if x0(t) = 0 for all t ∈ [0, 1].
We have also seen, in Example 3.5, page 131, that f is a convex function.
Find all solutions to the optimisation problem

  minimise f(x) subject to x ∈ C[0, 1].
Theorem 3.5. Let
(1) x∗ ∈ S = {x ∈ C1[a, b] : x(a) = ya, x(b) = yb},
(2) (ξ, η, τ) ↦ F(ξ, η, τ) : R³ → R have continuous partial derivatives of order ≤ 2,
(3) f : S → R be given by

  f(x) = ∫_a^b F(x(t), x′(t), t) dt, x ∈ S,

(4) X := {h ∈ C1[a, b] : h(a) = 0, h(b) = 0},
(5) ϕ : X → R be given by ϕ(h) = f(x∗ + h), h ∈ X.
Then ϕ′(0) = 0 if and only if x∗ ∈ S satisfies the Euler-Lagrange equation:

  (d/dt)((∂F/∂η)(x∗(t), x∗′(t), t)) = (∂F/∂ξ)(x∗(t), x∗′(t), t), t ∈ [a, b].
Definition 3.3. Such an x∗ ∈ S, which satisfies the Euler-Lagrange equation, is said to be stationary for the functional f.
Note that X defined above in Theorem 3.5 is a vector space, since it is a subspace of C1[a, b] (Exercise 1.3, page 7), and it inherits the ||·||1,∞-norm from C1[a, b]. To prove Theorem 3.5, we will need the following result.
Lemma 3.1. ("Fundamental lemma of the calculus of variations"). If k ∈ C[a, b] is such that

  ∫_a^b k(t)·h′(t) dt = 0 for all h ∈ C1[a, b] with h(a) = h(b) = 0,

then there exists a constant c such that k(t) = c for all t ∈ [a, b].
Of course, if k ≡ c, then by the Fundamental Theorem of Calculus,

  ∫_a^b k(t)·h′(t) dt = c·(h(b) − h(a)) = 0

for all h ∈ C1[a, b] that satisfy h(a) = h(b) = 0. The remarkable thing is that the converse is true, namely that this special property forces k to be a constant.
Proof. Set

  c := (1/(b − a))·∫_a^b k(τ) dτ.

(If k ≡ c, then

  (1/(b − a))·∫_a^b k(τ) dτ = c,

so the c defined above is the constant that k "is supposed to be".)

Define h0 : [a, b] → R by

  h0(t) := ∫_a^t (k(τ) − c) dτ, t ∈ [a, b].

Then h0 ∈ C1[a, b], and h0(a) = 0 = h0(b). Since h0′(t) = k(t) − c, t ∈ [a, b], we obtain

  0 = ∫_a^b k(t)·h0′(t) dt = ∫_a^b (k(t) − c)·h0′(t) dt + c·(h0(b) − h0(a)) = ∫_a^b (k(t) − c)² dt.

Thus k(t) − c = 0 for all t ∈ [a, b], and so k ≡ c.
Proof. (Of Theorem 3.5). We note that h ∈ X if and only if x∗ + h ∈ S. (Indeed, if h ∈ X, then

  (x∗ + h)(a) = x∗(a) + h(a) = ya + 0 = ya and (x∗ + h)(b) = x∗(b) + h(b) = yb + 0 = yb,

and so x∗ + h ∈ S. Vice versa, if x∗ + h ∈ S, then

  h(a) = (x∗ + h)(a) − x∗(a) = ya − ya = 0 and h(b) = (x∗ + h)(b) − x∗(b) = yb − yb = 0.

Consequently, h ∈ X.) Thus ϕ is well-defined.
What is ϕ′(0)? For h ∈ X, we have

  ϕ(h) − ϕ(0) = f(x∗ + h) − f(x∗) = ∫_a^b (F(x∗(t) + h(t), x∗′(t) + h′(t), t) − F(x∗(t), x∗′(t), t)) dt.

By Taylor's Formula for F, we know that

  F(ξ0 + p, η0 + q, τ0 + r) = F(ξ0, η0, τ0) + (∂F/∂ξ)(ξ0, η0, τ0)·p + (∂F/∂η)(ξ0, η0, τ0)·q + (∂F/∂τ)(ξ0, η0, τ0)·r + (1/2)·[p q r]·HF(ξ0 + θp, η0 + θq, τ0 + θr)·[p q r]ᵀ

for some θ such that 0 < θ < 1. We will apply this for each fixed t ∈ [a, b], with ξ0 := x∗(t), p := h(t), η0 := x∗′(t), q := h′(t), τ0 := t, r := 0, and we will obtain a θ ∈ (0, 1) for which the above formula works. If I change the t, then I will get a possibly different θ ∈ (0, 1). So we have that the θ depends on t ∈ [a, b]. This gives rise to a function Θ : [a, b] → (0, 1) so that

  ϕ(h) − ϕ(0) = ∫_a^b ((∂F/∂ξ)(x∗(t), x∗′(t), t)·h(t) + (∂F/∂η)(x∗(t), x∗′(t), t)·h′(t)) dt + (1/2)·∫_a^b [h(t) h′(t) 0]·HF(v(t))·[h(t) h′(t) 0]ᵀ dt,

where

  v(t) := (x∗(t) + Θ(t)·h(t), x∗′(t) + Θ(t)·h′(t), t), t ∈ [a, b],

and HF(·) denotes the Hessian of F:

  HF = [ ∂²F/∂ξ²   ∂²F/∂ξ∂η   ∂²F/∂ξ∂τ ]
       [ ∂²F/∂η∂ξ  ∂²F/∂η²    ∂²F/∂η∂τ ]
       [ ∂²F/∂τ∂ξ  ∂²F/∂τ∂η   ∂²F/∂τ²  ].
From the above, we make a guess for ϕ′(0): define L : X → R by

  L(h) = ∫_a^b (A(t)·h(t) + B(t)·h′(t)) dt, h ∈ X,

where A(t) := (∂F/∂ξ)(x∗(t), x∗′(t), t) and B(t) := (∂F/∂η)(x∗(t), x∗′(t), t), t ∈ [a, b].

We have seen that L is a continuous linear transformation in Example 2.11, page 72. For h ∈ X with ||h||1,∞ ≤ 1,

  |ϕ(h) − ϕ(0) − L(h)| = (1/2)·|∫_a^b [h(t) h′(t) 0]·HF(v(t))·[h(t) h′(t) 0]ᵀ dt| ≤ (1/2)·∫_a^b M·(|h(t)| + |h′(t)|)² dt ≤ 2M(b − a)·||h||1,∞²,

where

  M := sup over (ξ, η, τ) ∈ K of the absolute values of the second-order partial derivatives of F at (ξ, η, τ).

We note that for each t ∈ [a, b], the point

  v(t) = (x∗(t) + Θ(t)·h(t), x∗′(t) + Θ(t)·h′(t), t)

in R³ belongs to a ball with centre (x∗(t), x∗′(t), t) and radius ||h||1,∞. But x∗, x∗′ are continuous, and so these centres (x∗(t), x∗′(t), t), for different values of t ∈ [a, b], lie inside some big compact set in R³. And if we look at balls with radius, say 1, around these centres, we get a somewhat bigger compact set, say K, in R³. Since the second-order partial derivatives of F are all continuous, it follows that their absolute values are bounded on K. Hence M is finite.

Let ε > 0, and set

  δ := min{1, ε/(2M(b − a) + 1)}.

If h ∈ X satisfies 0 < ||h − 0||1,∞ = ||h||1,∞ < δ, then

  |ϕ(h) − ϕ(0) − L(h)| ≤ 2M(b − a)·||h||1,∞² < ε·||h||1,∞.
Consequently, ϕ′(0) = L.
(Only if part). So far we've calculated ϕ′(0) and found out that it is the continuous linear transformation L. Now suppose that ϕ′(0) = L = 0, that is, for all h ∈ X, Lh = 0, and so for all h ∈ C1[a, b] with h(a) = h(b) = 0,

  ∫_a^b (A(t)·h(t) + B(t)·h′(t)) dt = 0.

We would now like to use the technical result (Lemma 3.1) we had shown. So we rewrite the above integral and convert the term in the integrand which involves h into a term involving h′, by using integration by parts:

  ∫_a^b A(t)·h(t) dt = [(∫_a^t A(τ) dτ)·h(t)]_{t=a}^{t=b} − ∫_a^b (∫_a^t A(τ) dτ)·h′(t) dt = −∫_a^b (∫_a^t A(τ) dτ)·h′(t) dt,

because h(a) = h(b) = 0. So for all h ∈ C1[a, b] with h(a) = h(b) = 0, we have

  ∫_a^b (B(t) − ∫_a^t A(τ) dτ)·h′(t) dt = 0.

By Lemma 3.1,

  B(t) − ∫_a^t A(τ) dτ = c, t ∈ [a, b],

for some constant c. By differentiating with respect to t, we obtain

  B′(t) = A(t), t ∈ [a, b],

which is precisely the Euler-Lagrange equation.
(If part). Now suppose that x∗ satisfies the Euler-Lagrange equation, that is, A(t) − B′(t) = 0 for all t ∈ [a, b]. For h ∈ X, we have

  L(h) = ∫_a^b (A(t)·h(t) + B(t)·h′(t)) dt = ∫_a^b (B′(t)·h(t) + B(t)·h′(t)) dt = [B(t)·h(t)]_{t=a}^{t=b} = 0,

using h(a) = h(b) = 0. Thus for all h ∈ X, L(h) = 0.
Consequently, ϕ′(0) = L = 0.
Corollary 3.1. Let
(1) S = {x ∈ C1[a, b] : x(a) = ya, x(b) = yb},
(2) (ξ, η, τ) ↦ F(ξ, η, τ) : R³ → R have continuous partial derivatives of order ≤ 2,
(3) f : S → R be given by

  f(x) = ∫_a^b F(x(t), x′(t), t) dt, x ∈ S.

Then we have:
(a) If x∗ is a minimiser of f, then it satisfies the Euler-Lagrange equation:

  (d/dt)((∂F/∂η)(x∗(t), x∗′(t), t)) = (∂F/∂ξ)(x∗(t), x∗′(t), t), t ∈ [a, b].

(b) If f is convex, and x∗ ∈ S satisfies the Euler-Lagrange equation, then x∗ is a minimiser of f.
Proof. Let X := {h ∈ C1[a, b] : h(a) = 0, h(b) = 0}, and ϕ : X → R be given by ϕ(h) = f(x∗ + h), h ∈ X. Then ϕ is well-defined (as in the proof of Theorem 3.5).
(a) We claim that ϕ has a minimum at 0 ∈ X. Indeed, for h ∈ X, we have x∗ + h ∈ S, and since x∗ is a minimiser of f,

  ϕ(h) = f(x∗ + h) ≥ f(x∗) = ϕ(0).

So by Theorem 3.2, page 126, ϕ′(0) = 0. From Theorem 3.5, page 134, it follows that x∗ satisfies the Euler-Lagrange equation.
(b) Now let f be convex and x∗ ∈ S satisfy the Euler-Lagrange equation. By Theorem 3.5, it follows that ϕ′(0) = 0. The convexity of f makes ϕ convex as well. Indeed, if h1, h2 ∈ X, and α ∈ (0, 1), then

  ϕ((1 − α)h1 + αh2) = f((1 − α)(x∗ + h1) + α(x∗ + h2)) ≤ (1 − α)f(x∗ + h1) + αf(x∗ + h2) = (1 − α)ϕ(h1) + αϕ(h2).

Recall that in Theorem 3.4, page 132, we had shown that for a convex function, the derivative vanishing at a point implies that that point is a minimiser for the function. Since ϕ is convex, and because ϕ′(0) = 0, 0 is a minimiser of ϕ. We claim that x∗ is a minimiser of f.
Indeed, if x ∈ S, then x = x∗ + (x − x∗) = x∗ + h, where h := x − x∗ ∈ X.
Hence

  f(x) = ϕ(h) ≥ ϕ(0) = f(x∗).

This completes the proof.
Let us revisit Example 0.1, page viii, and solve it by observing that it falls in the class of problems considered in the above result.
Example 3.7. Recall that S := {x ∈ C1[0, T] : x(0) = 0 and x(T) = Q}, so that we have a = 0, b = T, ya = 0 and yb = Q. The cost function f : S → R was given by

  f(x) = ∫_0^T (a·(x′(t))² + b·x′(t)) dt, x ∈ S,

where a, b, Q > 0 are constants, and F : R³ → R is given by

  F(ξ, η, τ) = a·η² + b·η, (ξ, η, τ) ∈ R³.

So this problem does fall into the class of problems covered by Corollary 3.1. In order to apply the result to solve this problem, we compute

  (∂F/∂ξ)(ξ, η, τ) = 0 and (∂F/∂η)(ξ, η, τ) = 2a·η + b.

The Euler-Lagrange equation for x∗ ∈ S is:

  (d/dt)(2a·x∗′(t) + b) = 0, t ∈ [0, T].

By the Fundamental Theorem of Calculus, it follows that 2a·x∗′(t) + b is constant on [0, T], and so there is a constant A such that x∗′(t) = A, t ∈ [0, T]; integrating again, we obtain a constant B such that x∗(t) = At + B, t ∈ [0, T]. But since x∗ ∈ S, we also have that x∗(0) = 0 and x∗(T) = Q, which we can use to find the constants A, B: A · 0 + B = 0, and A · T + B = Q, so that B = 0 and A = Q/T. Consequently, by part (a) of the conclusion in Corollary 3.1, we know that if x∗ is a minimiser of f, then

  x∗(t) = (Q/T)·t, t ∈ [0, T].

On the other hand, we had checked in Example 3.6 that f is convex. And we know that the x∗ given above satisfies the Euler-Lagrange equation. Consequently, by part (b) of the conclusion in Corollary 3.1, we know that this x∗ is a minimiser. So we have shown, using Corollary 3.1, that the minimiser of f is precisely x∗(t) = (Q/T)·t, t ∈ [0, T].
So we have solved our optimal mining question.
And we now know that the optimal mining operation is given by the humble straight line!
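The computation in Example 3.7 can also be reproduced symbolically. The following sketch uses sympy's euler_equations (the symbol names are our own choices) to re-derive the straight line x∗(t) = (Q/T)·t:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t, T, Q = sp.symbols('t T Q', positive=True)
a, b = sp.symbols('a b', positive=True)   # the cost constants of Example 0.1
x = sp.Function('x')

# The integrand F(x, x', t) = a*eta^2 + b*eta of the mining-cost functional.
F = a * x(t).diff(t) ** 2 + b * x(t).diff(t)
eq = euler_equations(F, x(t), t)[0]
print(eq)                                  # -> essentially 2*a*x''(t) = 0

# Solving x'' = 0 with the boundary conditions x(0) = 0, x(T) = Q:
sol = sp.dsolve(eq, x(t), ics={x(0): 0, x(T): Q})
print(sol)                                 # Eq(x(t), Q*t/T), the straight line
```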
Exercise 3.13. (Euclidean plane).
Let P1 = R² with ||(x, t)||1 := √(x² + t²) for (x, t) ∈ P1.
Set S := {x ∈ C1[a, b] : x(a) = xa, x(b) = xb}.
Given x ∈ S, the map t ↦ γx(t) := (x(t), t) is a curve in the Euclidean plane P1, and we define its arc length by

  L(γx) := ∫_a^b ||γx′(t)||1 dt = ∫_a^b √((x′(t))² + 1) dt.
Show that the straight line joining (xa, a) and (xb, b) has the smallest arc length.
Exercise 3.14. (Galilean spacetime).
Let P0 = R² with ||(x, t)||0 := |t| for (x, t) ∈ P0.
Set S := {x ∈ C1[a, b] : x(a) = xa, x(b) = xb}.
Given x ∈ S, the map t ↦ γx(t) := (x(t), t) is a curve in the plane P0, and we define its arc length by

  L(γx) := ∫_a^b ||γx′(t)||0 dt = ∫_a^b 1 dt = b − a.
Show that all the curves γx joining (xa, a) and (xb, b) have the same arc length. (If we think of P0 as the collection of all events (=“here and now”), with the coordinates provided by an “inertial frame3” choice, then this arc length is the pre-relativistic absolute time between the two events (xa, a) and (xb, b).)
Exercise 3.15. (Minkowski spacetime).
Let P−1 = R² with ||(x, t)||−1 := √(t² − x²) (defined when |x| ≤ |t|) for (x, t) ∈ P−1.
Set S := {x ∈ C1[a, b] : x(a) = xa, x(b) = xb, and for all a ≤ t ≤ b, |x′(t)| < 1}.
Given x ∈ S, the map t ↦ γx(t) := (x(t), t) is a curve in the plane P−1, and we define its arc length by

  L(γx) := ∫_a^b ||γx′(t)||−1 dt = ∫_a^b √(1 − (x′(t))²) dt.
Show4 that among all the curves γx joining (xa, a) and (xb, b), the straight line has the largest(!) arc length.
(P−1 can be thought of as the special relativistic spacetime of all events, with the coordinates provided by an "inertial frame" choice. Then the arc length L(γx) is the proper time between the two events (xa, a) and (xb, b), which may be thought of as the time recorded by a clock carried by an observer along its worldline γx. The fact that the straight line has the largest length accounts for the aging of the travelling sibling in the famous Twin Paradox: Imagine two twins, say Seeta and Geeta, who are separated at birth, event (0, 0) in an inertial frame, and meet again in adulthood at the event (0, T). Seeta, the meek twin, doesn't move in the inertial frame, described by the straight line γS joining the events (0, 0) and (0, T). Meanwhile, the other feisty twin, Geeta, travels in a spaceship (never exceeding the speed of light, 1), with a worldline given by γG, starting at (0, 0), and ending to meet the twin at (0, T).
There is no longer any surprise that Seeta has aged far more than Geeta, thanks to our inequality that L(γG) < L(γS). Resting is rusting!)
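A small numerical illustration of this inequality (the travelling worldline below is an arbitrary choice with |x′| < 1 and x(0) = x(T) = 0):

```python
import numpy as np

# Proper time along a worldline t -> (x(t), t) with |x'(t)| < 1 is
# the integral of sqrt(1 - x'(t)^2); it is maximised by the straight line.
T = 10.0
t = np.linspace(0.0, T, 100_001)

proper_time = lambda xprime: np.trapz(np.sqrt(1.0 - xprime ** 2), t)

seeta_xprime = np.zeros_like(t)                  # stays home: x' = 0
geeta_xprime = 0.8 * np.sin(2 * np.pi * t / T)   # travels out and back, |x'| <= 0.8
print(proper_time(seeta_xprime))   # 10.0 = T: the straight worldline
print(proper_time(geeta_xprime))   # strictly less: the traveller ages less
```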
Exercise 3.16. (Euler-Lagrange Equation: vector valued case).
The results in this section can be generalised to the case when f has the form

  f(x1, ···, xn) = ∫_a^b F(x1(t), ···, xn(t), x1′(t), ···, xn′(t), t) dt,

where (ξ1, ···, ξn, η1, ···, ηn, τ) ↦ F(ξ1, ···, ξn, η1, ···, ηn, τ) : R^(2n+1) → R is a function with continuous partial derivatives of order ≤ 2, and x1, ···, xn are n continuously differentiable functions of the variable t ∈ [a, b].
Then, following a similar analysis as before, we obtain n Euler-Lagrange equations to be satisfied by the minimiser (x1∗, ···, xn∗): for t ∈ [a, b], and k ∈ {1, ···, n},

  (d/dt)((∂F/∂ηk)(x1∗(t), ···, xn∗(t), x1∗′(t), ···, xn∗′(t), t)) = (∂F/∂ξk)(x1∗(t), ···, xn∗(t), x1∗′(t), ···, xn∗′(t), t).
Let us see an application of this to the planar motion of a body under the action of the gravitational force field (planet around the sun).
If x1(t) = r(t) (the distance to the sun), and x2(t) = φ(t) (the radial angle), then the function to be minimised is the action

  ∫_{ti}^{tf} ((m/2)·((r′(t))² + (r(t))²·(φ′(t))²) + G·M·m/r(t)) dt,

where m is the mass of the planet, M is the mass of the sun, and G is the gravitational constant. Show that the Euler-Lagrange equations give

  m·r″(t) = m·r(t)·(φ′(t))² − G·M·m/(r(t))²  and  (d/dt)(m·(r(t))²·φ′(t)) = 0.
(The latter equation shows that the angular momentum, L(t) := mr(t)2 φ′(t), is conserved, and this gives Kepler’s Second Law, saying that a planet sweeps equal areas in equal times.)
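The conservation of angular momentum can also be watched numerically; in the following sketch the units (G·M = 1, m = 1) and the initial data are arbitrary assumptions made for the illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Euler-Lagrange equations in units with G*M = 1, m = 1:
#   r'' = r φ'^2 - 1/r^2   and   φ'' = -2 r' φ' / r  (from d/dt(r^2 φ') = 0).
def rhs(t, y):
    r, rdot, phi, phidot = y
    return [rdot, r * phidot ** 2 - 1.0 / r ** 2, phidot, -2.0 * rdot * phidot / r]

sol = solve_ivp(rhs, (0.0, 50.0), [1.0, 0.0, 0.0, 1.1], rtol=1e-10, atol=1e-12)
angular_momentum = sol.y[0] ** 2 * sol.y[3]             # r^2 φ' along the orbit
print(angular_momentum.max() - angular_momentum.min())  # ~ 0: conserved
```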
Exercise 3.17. (Euler-Lagrange Equation: several independent variables). Suppose that Ω ⊂ R^d is a "region" (an open, path-connected set), and that

  (x1, ···, xd, u, p1, ···, pd) ↦ L(x1, ···, xd, u, p1, ···, pd) : Ω × R × R^d → R

is a given C2 function (called the Lagrangian density).
We are interested in finding u ∈ C1(Ω) which minimise I : C1(Ω) → R given by

  I(u) = ∫_Ω L(x1, ···, xd, u(x), u_{x1}(x), ···, u_{xd}(x)) dx.

(Here subscripts indicate respective partial derivatives: for example, u_{xk} := ∂u/∂xk.)
It can be shown that a necessary condition for u to be a minimiser of I is that it satisfies the Euler-Lagrange equation below:

  (∂L/∂u) − Σ_{k=1}^{d} (∂/∂xk)(∂L/∂pk) = 0 in Ω.
(Note that the Euler-Lagrange equation above is now a Partial Differential Equation (PDE), rather than the Ordinary Differential Equation (ODE) we had met in Theorem 3.5, page 134.)
Let us consider examples of writing the Euler-Lagrange equation.
(1)(Minimal area surfaces).
Consider a smooth surface in R³ which is the graph of (x, y) ↦ u(x, y) defined on an open set Ω ⊂ R².
The area of the surface is given by:

  S(u) = ∫_Ω √(1 + (u_x)² + (u_y)²) dx dy.

Show that if u is a minimiser, then u must satisfy the PDE

  (1 + u_y²)·u_xx − 2·u_x·u_y·u_xy + (1 + u_x²)·u_yy = 0.

Verify that, for example, the helicoid u(x, y) = tan⁻¹(y/x) solves this PDE.
Also, in the case of the helicoid, show that a parametric representation of the surface is given by x(s, t) = s · cos t, y(s, t) = s · sin t, z(s, t) = t, by setting s := √(x² + y²) and t := tan⁻¹(y/x). Plot the surface5 with Maple.
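Since the Maple command has not been reproduced above, here is an assumed stand-in in Python/matplotlib that plots the same parametric surface:

```python
import numpy as np
import matplotlib.pyplot as plt

# The helicoid x = s cos t, y = s sin t, z = t, for s in [0, 1], t in [0, 4π].
s, t = np.meshgrid(np.linspace(0.0, 1.0, 50), np.linspace(0.0, 4 * np.pi, 200))
ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(s * np.cos(t), s * np.sin(t), t)
plt.show()
```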
(2)(Wave equation).
Consider a vibrating string of length 1, whose ends are fixed.
If u(x, t) denotes the displacement at position x and time t, where 0 ≤ x ≤ 1, then the potential energy at time t is given by

  E_p(t) = (1/2)·∫_0^1 (u_x(x, t))² dx,

and the kinetic energy is

  E_k(t) = (1/2)·∫_0^1 (u_t(x, t))² dx.

For u : [0, 1] × [0, T] → R, set

  I(u) = ∫_0^T (E_k(t) − E_p(t)) dt = (1/2)·∫_0^T ∫_0^1 ((u_t(x, t))² − (u_x(x, t))²) dx dt.

Prove that if u∗ minimises I, then it satisfies the wave equation

  (u∗)_tt = (u∗)_xx.
Show that if f : R → R is
- twice continuously differentiable,
- odd (f(x) = −f(−x) for all x ∈ R), and
- periodic with period 2 (that is, f(x + 2) = f(x) for all x ∈ R),
then u given by

  u(x, t) = (f(x + t) + f(x − t))/2

is such that
- it solves the wave equation,
- with the boundary conditions u(0, ·) = 0 = u(1, ·), and
- the initial conditions u(·, 0) = f (position) and u_t(·, 0) = 0 (velocity).
Interpret the solution graphically.
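As a sanity check of the last part, one can take the assumed sample choice f(x) = sin(πx), which is smooth, odd and 2-periodic (so u(x, t) = sin(πx)cos(πt)), and verify the wave equation and the boundary/initial conditions numerically:

```python
import numpy as np

f = lambda x: np.sin(np.pi * x)
u = lambda x, t: 0.5 * (f(x + t) + f(x - t))    # the candidate solution

x = np.linspace(0.0, 1.0, 201)
print(np.max(np.abs(u(x, 0.0) - f(x))))         # initial position u(., 0) = f: ~ 0
print(abs(u(0.0, 0.37)), abs(u(1.0, 0.37)))     # fixed ends u(0, .) = u(1, .) = 0
d = 1e-4                                        # second differences for u_tt, u_xx
utt = (u(x, 0.3 + d) - 2 * u(x, 0.3) + u(x, 0.3 - d)) / d ** 2
uxx = (u(x + d, 0.3) - 2 * u(x, 0.3) + u(x - d, 0.3)) / d ** 2
print(np.max(np.abs(utt - uxx)))                # wave equation u_tt = u_xx: ~ 0
```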
The aim of this section is to apply the Euler-Lagrange equation to illustrate some basic ideas in classical mechanics. Also, this brief discussion will provide some useful background for discussing Quantum Mechanics later on, as an application of Hilbert spaces and their operators.
Newtonian Mechanics. Consider the motion t ↦ q∗(t) of a classical point particle of mass m along a straight line. Here q∗(t) denotes the position of the particle at time t.
Then the evolution of q∗ is described by Newton's Law, which says that the "mass times the acceleration equals the force acting", that is, if F(x) is the force at position x, then

  m·q∗″(t) = F(q∗(t)).

Together with the particle's initial position q∗(ti) = qi, and initial velocity q∗′(ti) = vi, the above equation determines a unique q∗.
Principle of Stationary Action. An alternative formulation of Newtonian Mechanics is given by the “Principle of Stationary6 Action”, which is more useful because it lends itself to generalisations for other types of physical situations, for example in describing the electromagnetic field (when there are no particles). In that sense it is more fundamental as it provides a unifying language.
First, let us define the potential V : R → R as follows. Choose any x0 ∈ R, and set

  V(x) := −∫_{x0}^{x} F(ξ) dξ, x ∈ R.

V is thought of as the work done against the force to go from x0 to x. (Because of the fact that x0 was chosen arbitrarily, the potential V for a force F is not unique. By the Fundamental Theorem of Calculus, we have

  V′(x) = −F(x), x ∈ R,

and so it can be seen that if V, Ṽ are potentials for F, then, as (Ṽ − V)′ = 0, there is a constant c ∈ R such that Ṽ(x) = V(x) + c, x ∈ R.) We define the kinetic energy of the particle at time t, along a trajectory q, as

  (m/2)·(q′(t))².
Consider, for q ∈ C1[ti, tf] with q(ti) = xi and q(tf) = xf, the action

  A(q) := ∫_{ti}^{tf} L(q(t), q′(t)) dt,

where L is called the Lagrangian, given by L(x, v) = (m/2)·v² − V(x).
Note that along an imagined trajectory q of a particle, the integrand L(q(t), q′(t)) is the kinetic energy minus the potential energy at time t.
The Principle of Stationary Action in Classical Mechanics says that the motion q∗ of the particle moving from position xi at time ti to position xf at time tf is such that Ã′(0) = 0, where à : X → R is given by

  Ã(h) = A(q∗ + h), h ∈ X := {h ∈ C1[ti, tf] : h(ti) = 0 = h(tf)}.

By Theorem 3.5, page 134, the Euler-Lagrange equation is equivalent to Ã′(0) = 0, and so the motion q∗ is described by

  (d/dt)((∂L/∂v)(q∗(t), q∗′(t))) = (∂L/∂x)(q∗(t), q∗′(t)), t ∈ [ti, tf].

Using

  (∂L/∂v)(x, v) = m·v and (∂L/∂x)(x, v) = −V′(x) = F(x),

we obtain Newton's equation of motion,

  m·q∗″(t) = F(q∗(t)), t ∈ [ti, tf].
Here are a couple of examples.
Example 3.8. (The falling stone).
Let x ≥ 0 denote the height above the surface of the Earth of a stone of mass m. Then its potential energy is given by V(x) = mgx. Thus

  L(x, v) = (m/2)·v² − mgx.

Suppose the stone starts from initial height x0 > 0 at time 0, with initial speed 0. Then the height q∗(t) at time t is described by

  m·q∗″(t) = −mg,

that is, q∗″(t) = −g. Using the initial conditions q∗(0) = x0 and q∗′(0) = 0, we obtain q∗′(t) = −g·t, and so

  q∗(t) = x0 − (g/2)·t².
Example 3.9. (The Harmonic Oscillator).
The harmonic oscillator is the simplest oscillating system, where we can imagine a body of mass m attached to a spring with spring constant k, oscillating about its equilibrium position. For a displacement of x from the equilibrium position of the mass, a restoring force of magnitude kx is imparted on the mass by the spring, that is, F(x) = −kx. So V(x) = (k/2)·x², and

  L(x, v) = (m/2)·v² − (k/2)·x².

The equation of motion is

  m·q∗″(t) = −k·q∗(t),

describing the displacement q∗(t) from the equilibrium position at time t. If v0 is the velocity at time t = 0, and the initial position is q∗(0) = 0, then the unique solution is

  q∗(t) = v0·√(m/k)·sin(√(k/m)·t).

(It can be easily verified that this q∗ satisfies the equation of motion as well as the initial conditions q∗(0) = 0 and q∗′(0) = v0.)
The maximum displacement is

  x_max = v0·√(m/k),

and the period of oscillation is

  2π·√(m/k).
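The closed-form solution can be cross-checked against a direct numerical integration of the equation of motion (the values of m, k, v0 below are arbitrary illustrative choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Compare q*(t) = v0 sqrt(m/k) sin(sqrt(k/m) t) with a numerical solution of
# m q'' = -k q, q(0) = 0, q'(0) = v0.
m, k, v0 = 2.0, 3.0, 0.5
w = np.sqrt(k / m)
sol = solve_ivp(lambda t, y: [y[1], -(k / m) * y[0]], (0.0, 20.0), [0.0, v0],
                rtol=1e-10, atol=1e-12, dense_output=True)
t = np.linspace(0.0, 20.0, 1000)
print(np.max(np.abs(sol.sol(t)[0] - (v0 / w) * np.sin(w * t))))   # ~ 0
```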
“Symmetries” of the Lagrangian give rise to “conservation laws”:
Law of Conservation of Energy. Since the Lagrangian L(x, v) does not depend on t (that is, it possesses "the symmetry of being invariant under time translations"), we will now see that this results in the Law of Conservation of Energy. Define the energy E(t) along q∗ at time t by

  E(t) := (m/2)·(q∗′(t))² + V(q∗(t)).

Then we have

  E′(t) = m·q∗′(t)·q∗″(t) + V′(q∗(t))·q∗′(t) = q∗′(t)·(m·q∗″(t) − F(q∗(t))) = 0.

Hence the energy E is constant, that is, it is conserved.
Law of Conservation of Momentum. Now suppose that the Lagrangian does not depend on the position, that is, L(x, v) = l(v) for some function l. Define the momentum p∗ along q∗ at time t by

  p∗(t) := (∂L/∂v)(q∗(t), q∗′(t)) = l′(q∗′(t)).

Then, by the Euler-Lagrange equation,

  p∗′(t) = (d/dt)((∂L/∂v)(q∗(t), q∗′(t))) = (∂L/∂x)(q∗(t), q∗′(t)) = 0,

and so p∗ is constant, that is, the momentum is conserved.
Remark 3.2. (Noether’s Theorem).
The above two results are special cases of a much more general result, called Noether’s Theorem, roughly stating that every differentiable symmetry of the action has a corresponding conservation law. This result is fundamental in theoretical physics. We refer the interested reader to the book [Neuenschwander (2011)].
Example 3.10. (Particle in a Potential Well).
Consider a particle of mass m moving along a line, and which is acted upon by a force

  F(x) = −V′(x), x ∈ R,

generated by a potential V. The associated Lagrangian is

  L(x, v) = (m/2)·v² − V(x).

Suppose that the motion of the particle is described by q∗ for t ≥ 0. If

  E := (m/2)·(q∗′(0))² + V(q∗(0)),

then for all t ≥ 0, we have by the Law of Conservation of Energy, that

  (m/2)·(q∗′(t))² + V(q∗(t)) = E,

and so

  q∗′(t) = ±√((2/m)·(E − V(q∗(t)))).

This implies that V(q∗(t)) ≤ E. Hence the particle cannot leave the potential well if V(x) → ∞ as x → ±∞.
If the velocity of the particle is always positive while moving from initial position x0 at time t = 0 to a position x > x0 at time t, then by integrating,

  t = ∫_{x0}^{x} dξ / √((2/m)·(E − V(ξ))).

If in this manner the particle reaches x1, where E = V(x1), then we may ask if the travel time t1 from x0 to x1 is finite.
The above expression reveals that t1 < ∞ if and only if

  ∫_{x0}^{x1} dξ / √(E − V(ξ)) < ∞.
In particular, in the case of the harmonic oscillator, where

  V(x) = (k/2)·x²,

we have that the time of travel from the initial position x0 = 0 to the maximum displacement x_max = √(2E/k) is finite, and is given by

  t1 = ∫_0^{x_max} dξ / √((2/m)·(E − (k/2)·ξ²)) = √(m/k) · arcsin(1) = (π/2)·√(m/k),

which is, as expected, one-fourth of the period of oscillation 2π·√(m/k).
Hamiltonian Mechanics.
The momentum p∗ is defined by

  p∗(t) := (∂L/∂v)(q∗(t), q∗′(t)).

Since L(x, v) = (m/2)·v² − V(x), we have

  p∗(t) = m·q∗′(t).

The Euler-Lagrange equation,

  (d/dt)((∂L/∂v)(q∗(t), q∗′(t))) = (∂L/∂x)(q∗(t), q∗′(t)),

can be re-written as

  p∗′(t) = −V′(q∗(t)).

It turns out that the above two equations can be expressed in a much more symmetrical manner, with the introduction of the Hamiltonian,

  H(q, p) := p²/(2m) + V(q), (q, p) ∈ R²,

as follows. Note that

  (∂H/∂p)(q, p) = p/m and (∂H/∂q)(q, p) = V′(q).

Thus

  q∗′(t) = (∂H/∂p)(q∗(t), p∗(t)) and p∗′(t) = −(∂H/∂q)(q∗(t), p∗(t)).

These two equations are equivalent to p∗ = m·q∗′ together with the Euler-Lagrange equation. The space {(q, p) ∈ R²} is called the phase plane, where the position-momentum pairs live. Each point (q, p) in the phase plane is thought of as a possible state of the particle. Given an initial state (q0, p0), the coupled first order differential equations describing the evolution of the state, namely the Hamiltonian equations

  q∗′(t) = (∂H/∂p)(q∗(t), p∗(t)), p∗′(t) = −(∂H/∂q)(q∗(t), p∗(t)), (q∗(0), p∗(0)) = (q0, p0),

for t ≥ 0, describe a curve t ↦ (q∗(t), p∗(t)) in the phase plane, called a phase plane trajectory. The collection of all phase plane trajectories, corresponding to various initial conditions, is called the phase portrait. (For the harmonic oscillator, the phase portrait consists of the ellipses p²/(2m) + (k/2)·q² = E, one for each energy level E > 0, together with the equilibrium point (0, 0).)
We also observe that the Hamiltonian H, evaluated along a phase plane trajectory t ↦ (q∗(t), p∗(t)), is

  H(q∗(t), p∗(t)) = (p∗(t))²/(2m) + V(q∗(t)) = (m/2)·(q∗′(t))² + V(q∗(t)) = E(t),

the energy, which by the Law of Conservation of Energy, is a constant. So the phase plane trajectories are contained in level sets of the Hamiltonian H. Another proof of this constancy of the function H along phase plane trajectories, based on the Hamiltonian equations, is given below (where we have suppressed writing the argument t):

  (d/dt)H(q∗, p∗) = (∂H/∂q)·q∗′ + (∂H/∂p)·p∗′ = (∂H/∂q)·(∂H/∂p) + (∂H/∂p)·(−(∂H/∂q)) = 0.

This sort of a calculation can be used to calculate the time evolution of any "observable" (q, p) ↦ f(q, p) along phase plane trajectories in the phase plane, as explained in the next paragraph.
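The constancy of H along trajectories is easy to observe numerically as well; in the following sketch (with arbitrary m, k and initial state), Hamilton's equations for the harmonic oscillator are integrated and H is evaluated along the way:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hamilton's equations q' = dH/dp = p/m, p' = -dH/dq = -k q for V(q) = k q^2/2.
m, k = 1.0, 4.0
H = lambda q, p: p ** 2 / (2 * m) + 0.5 * k * q ** 2
sol = solve_ivp(lambda t, y: [y[1] / m, -k * y[0]], (0.0, 30.0), [1.0, 0.0],
                rtol=1e-10, atol=1e-12)
energy = H(sol.y[0], sol.y[1])
print(energy.min(), energy.max())   # both ~ 2.0 = H(1, 0): conserved
```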
Poissonian Mechanics.
All the (mechanical) physical characteristics are functions of the state. For example, in our one-dimensional motion of the particle, the coordinate functions (q, p) ↦ q and (q, p) ↦ p give, for a state (q, p) of the particle, the position, respectively the momentum, of the particle. Similarly,

  (q, p) ↦ p²/(2m) + V(q)

gives the energy. Motivated by these considerations, we take

  C∞(R²) := {F : R² → R : F is infinitely many times continuously differentiable}

as the collection of all observables.
We now introduce a binary operation {·, ·} : C∞(R²) × C∞(R²) → C∞(R²), which is connected with the evolution of the mechanical system.
Given two observables F and G in C∞(R²), define the new observable {F, G} ∈ C∞(R²), called the Poisson bracket of F, G, by

  {F, G} := (∂F/∂q)·(∂G/∂p) − (∂F/∂p)·(∂G/∂q).

The Poisson bracket can be used to express the evolution of an observable F. Suppose that our particle, moving along a line, evolves along the phase plane trajectory (q∗, p∗) in the phase plane according to Hamilton's equations for a Hamiltonian H. Then the evolution of the observable F ∈ C∞(R²) along the trajectory (q∗, p∗) is given by (again suppressing t):

  (d/dt)F(q∗, p∗) = (∂F/∂q)·q∗′ + (∂F/∂p)·p∗′ = (∂F/∂q)·(∂H/∂p) − (∂F/∂p)·(∂H/∂q) = {F, H}(q∗, p∗).
In particular, if {F, H} = 0 (as for example is the case when F = H!), then F is a conserved quantity.
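The Poisson bracket is also easy to experiment with symbolically. The following sympy sketch (with the oscillator Hamiltonian as a sample choice) computes {Q, P}, {H, H} and {Q, H}:

```python
import sympy as sp

# The Poisson bracket {F, G} = (dF/dq)(dG/dp) - (dF/dp)(dG/dq).
q, p, m, k = sp.symbols('q p m k', positive=True)
bracket = lambda F, G: sp.simplify(sp.diff(F, q) * sp.diff(G, p)
                                   - sp.diff(F, p) * sp.diff(G, q))

Q, P = q, p                                # position and momentum observables
H = p ** 2 / (2 * m) + k * q ** 2 / 2      # the oscillator Hamiltonian
print(bracket(Q, P))   # 1    (cf. Exercise 3.19)
print(bracket(H, H))   # 0:   H is conserved along its own flow
print(bracket(Q, H))   # p/m: the evolution equation q' = {Q, H}
```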
It can be shown that C∞(R²) forms a Lie algebra with the Poisson bracket, that is, the following properties hold:

  {αF + βG, H} = α{F, H} + β{G, H},   (3.7)
  {F, G} = −{G, F},   (3.8)
  {F, {G, H}} + {G, {H, F}} + {H, {F, G}} = 0,   (3.9)

for α, β ∈ R and any F, G, H ∈ C∞(R²). (Here H need not be the Hamiltonian!)
We will see in the next chapter that the role played in classical mechanics by the Poisson bracket

  {F, G} = (∂F/∂q)·(∂G/∂p) − (∂F/∂p)·(∂G/∂q)

of observables F, G ∈ C∞(R²) is performed in quantum mechanics by the commutator

  [A, B] := AB − BA

of observables A, B (which are operators on a Hilbert space H).
Exercise 3.18. Prove (3.7)-(3.9).
Exercise 3.19. (Position and Momentum).
Let Q ∈ C∞(R²), given by Q(q, p) = q, be the position observable, and P ∈ C∞(R²), given by P(q, p) = p, be the momentum observable.
Show that {Q, P} = 1.
1See for example [Luenberger (1969)].
2In fact, not even a “local” minimiser because ||x − 0||∞ = α can be chosen as small as we please.
3A coordinate system is inertial if particles which are "free", that is, not acted upon by any force, move in straight lines with a uniform speed.
4Here we tacitly ignore the fact that the set S doesn't quite have the form that we have been considering, since we have the extra constraint |x′(t)| < 1 for all t. Despite this extra condition, a version of Corollary 3.1 holds, mutatis mutandis, with an appropriately adapted proof: instead of X := {h ∈ C1[a, b] : h(a) = 0 = h(b)}, we work in the open subset X0 := {h ∈ X : |h′(t)| < 1 for all t ∈ [a, b]} of X. We won't spell out the details here, but we simply use the Euler-Lagrange equation in this exercise.
5This surface has the least area with a helix as its boundary.
6It is standard to use “Least” rather than “Stationary” because in many cases the action is actually minimised.