A normed space has two structures: a linear one (the underlying vector space), and a topological one (the norm). So when we study maps between normed spaces, it is natural to focus on maps which are well-behaved with these structures, and we’ll do this now. In particular, we’ll study:
(1) linear transformations (well-behaved with respect to the linear structure),
(2) continuous maps (well-behaved with respect to the topological structure),
(3) continuous linear transformations (well-behaved with respect to both structures).
In the context of normed spaces, continuous linear transformations are most important, and these are sometimes also called bounded linear operators.
The reason for this terminology will become clear in Theorem 2.6 (page 67). We’ll see that the set of all bounded linear operators is itself a vector space, with obvious pointwise operations of addition and scalar multiplication, and it also has a natural notion of a norm, called the operator norm. Equipped with the operator norm, the vector space of bounded linear operators is a Banach space, provided that the co-domain is a Banach space. This is a useful result, which we will use in order to prove the existence of solutions to integral and differential equations.
Linear transformations are maps that respect vector space operations.
Definition 2.1. (Linear transformation).
Let X and Y be vector spaces over K (R or C).
A map T : X → Y is called a linear transformation if:
(L1) For all x1, x2 ∈ X, T(x1 + x2) = T(x1) + T(x2).
(L2) For all x ∈ X and all α ∈ K, T(α · x) = α · T(x).
Example 2.1. (Linear galore!)
(1)D : C1[a, b] → C[a, b] given by Dx = x′, x ∈ C1[a, b] is a linear transformation, since
(L1) D(x + y) = (x + y)′ = x′ + y′ = Dx + Dy for all x, y ∈ C1[a, b];
(L2) D(αx) = (αx)′ = α · x′ = α · Dx for all α ∈ R and x ∈ C1[a, b].
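(2) Let m, n ∈ N and X = Rn and Y = Rm. Let A = [aij] ∈ Rm×n be an m × n real matrix. Then the map TA : Rn → Rm, given by TAx := Ax for all x ∈ Rn (matrix multiplication of A with the column vector x), is a linear transformation from Rn to Rm. Indeed, for all x, y ∈ Rn,
TA(x + y) = A(x + y) = Ax + Ay = TAx + TAy,
and so (L1) holds. Moreover, for all α ∈ R and x ∈ Rn,
TA(α · x) = A(α · x) = α · (Ax) = α · TAx,
and so (L2) holds as well. Hence TA is a linear transformation.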
(3) Let X = Y = ℓ2. Define the left/right shift operators L, R as follows: if x = (xn)n∈N ∈ ℓ2, then
L(x1, x2, x3, ···) := (x2, x3, x4, ···) and R(x1, x2, x3, ···) := (0, x1, x2, ···).
Then it is easy to see that R and L are linear transformations.
(4) Let X := c (⊂ ℓ∞), the space of all real-valued convergent sequences, and Y = R. The map L : c → R, given by L((an)n∈N) := lim_{n→∞} an for (an)n∈N ∈ c, is a linear transformation (using the algebra of limits).
Recall that given a linear transformation T : X → Y, we can associate with T two natural subspaces, of X and Y, respectively,
the kernel of T, ker T := {x ∈ X : Tx = 0Y} ⊂ X, and
the range of T, ran T := {y ∈ Y : ∃x ∈ X such that y = Tx} ⊂ Y .
In the above example (4) of the linear transformation L : c → R, we have
ker L = c0 (the set of sequences convergent with limit 0),
ran L = R (since for every r ∈ R, the constant sequence (r)n∈N converges to r).
(5) The map I : C[a, b] → R, given by I(x) := ∫_a^b x(t) dt for all x ∈ C[a, b], is a linear transformation.
(6)Let S := {h ∈ C1[a, b] : h(a) = h(b) = 0}. From Exercise 1.3 (page 7), we see that S is a subspace of C1[a, b]. Let A, B ∈ C[a, b] be fixed functions.
Let L : S → R be given by
Let us check that L is a linear transformation. We have:
(L1) For all h1, h2 ∈ S,
(L2) For all h ∈ S and all α ∈ R,
Thus L is a linear transformation.
(7)Let C1(R), C2(R) denote the vector spaces of once, respectively twice continuously differentiable real-valued functions in R with pointwise operations. For f ∈ C2(R) and g ∈ C1(R), consider the initial value problem for the one (spatial) dimensional wave equation:
Let C2(R × [0, ∞)) denote the vector space of all twice continuously differentiable functions (x, t) ↦ u(x, t) : R × [0, ∞) → R, again with pointwise operations. Then it can be shown that the unique solution uf,g in C2(R × [0, ∞)) to (IVP) is given by d’Alembert’s Formula,
Then the map (f, g) ↦ uf,g : C2(R) × C1(R) → C2(R × [0, ∞)) is a linear transformation.
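The verification of (L1) and (L2) for a matrix map x ↦ Ax, as in item (2) of Example 2.1, can also be spot-checked on a computer. The following is a minimal Python sketch under stated assumptions: the matrix A, the vectors x, y and the scalar alpha are hypothetical sample data, and the helper names apply_matrix, add and scale are made up for this illustration. Exact rational arithmetic is used so that the equalities are tested exactly. A check on sample data is of course no substitute for the proof.

```python
from fractions import Fraction

def apply_matrix(A, x):
    """Return the matrix-vector product Ax, where A is a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def add(x, y):
    """Componentwise sum of two vectors."""
    return [xi + yi for xi, yi in zip(x, y)]

def scale(alpha, x):
    """Componentwise scalar multiple of a vector."""
    return [alpha * xi for xi in x]

# Hypothetical sample data: a 2 x 3 matrix, two vectors in R^3, one scalar.
A = [[Fraction(1), Fraction(2), Fraction(-1)],
     [Fraction(0), Fraction(3), Fraction(1, 2)]]
x = [Fraction(1), Fraction(-2), Fraction(4)]
y = [Fraction(5), Fraction(1, 3), Fraction(0)]
alpha = Fraction(-7, 2)

# (L1): A(x + y) = Ax + Ay
l1_holds = apply_matrix(A, add(x, y)) == add(apply_matrix(A, x), apply_matrix(A, y))
# (L2): A(alpha x) = alpha (Ax)
l2_holds = apply_matrix(A, scale(alpha, x)) == scale(alpha, apply_matrix(A, x))
print(l1_holds, l2_holds)
```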
Here are some non-examples.
Example 2.2. (Not quite linear!)
(1) If ·∗ denotes complex conjugation, then the complex conjugation map z ↦ z∗ : C → C is not a linear transformation, since although (L1) is satisfied: (z + w)∗ = z∗ + w∗ (z, w ∈ C), we see that (L2) isn’t: indeed, (i · 1)∗ = i∗ = –i ≠ i · 1∗.
(2)Consider the map T : R2 → R2 defined by
Then T is not a linear transformation since (L1) is not satisfied.
Indeed,
while
If α ∈ R\{0} and then we have
If α = 0 and
So for all Thus (L2) holds.
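The failure of (L2) for complex conjugation in item (1) of Example 2.2 can be reproduced with Python's built-in complex numbers. This is only a small sketch: the sample numbers z and w are arbitrary choices, and the helper name conj is hypothetical.

```python
def conj(z):
    """Complex conjugation z -> z*."""
    return z.conjugate()

# Arbitrary sample points (chosen so that the floating point sums are exact).
z, w = 2 + 3j, -1 + 0.5j

# (L1) holds on this sample: (z + w)* = z* + w*.
additive = conj(z + w) == conj(z) + conj(w)

# (L2) fails for the complex scalar alpha = i and the vector 1:
# (i * 1)* = -i, whereas i * (1*) = i.
alpha = 1j
homogeneous = conj(alpha * 1) == alpha * conj(1 + 0j)

print(additive, homogeneous)
```

Note that the same map does satisfy (L2) for real scalars α, so whether it counts as linear depends on the scalar field over which C is regarded as a vector space.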
Notation 2.1. We will denote the set of all linear transformations from the vector space X to the vector space Y by L(X, Y). Recall from elementary linear algebra that L(X, Y) is itself a vector space (over the common field K for X, Y) with pointwise operations: if T, S ∈ L(X, Y), then we define T + S ∈ L(X, Y) by (T + S)(x) = Tx + Sx, for all x ∈ X, and if α ∈ K and T ∈ L(X, Y), then we define α · T ∈ L(X, Y) by (α · T)(x) = α · (Tx), for all x ∈ X. What is the zero vector in this vector space L(X, Y)? It is the “zero linear transformation” 0 : X → Y, given by 0x = 0Y, for all x ∈ X, where 0Y denotes the zero vector in Y.
If X = Y, then we write L(X) instead of L(X, X).
Exercise 2.1. Consider the two maps S1, S2 : C[0, 1] → R given by
Show that S1 is not a linear transformation, while S2 is.
Exercise 2.2. Let a, b be nonzero real numbers, and consider the two real-valued functions f1, f2 defined on R by f1(t) = e^{at} cos(bt) and f2(t) = e^{at} sin(bt), t ∈ R. Then f1 and f2 are vectors belonging to the infinite-dimensional vector space C1(R), consisting of all continuously differentiable functions from R to R. Denote by Sf1,f2 the span of the two functions f1 and f2.
(1)Prove that f1 and f2 are linearly independent in C1(R).
(2) Show that the differentiation map D : Sf1,f2 → Sf1,f2, given by Df = f′, is a linear transformation.
(3)What is the matrix [D]B of D with respect to the (ordered) basis B = (f1, f2)?
(4)Prove that D is invertible, and write down the matrix corresponding to the inverse of D.
(5) Compute the indefinite integrals ∫ e^{at} cos(bt) dt and ∫ e^{at} sin(bt) dt.
Let X and Y be normed spaces. As there is a notion of distance between pairs of vectors in either space (provided by the norm of the difference of the pair of vectors in each respective space), one can talk about continuity of maps. Within the huge collection of all maps, the class of continuous maps forms an important subset. Continuous maps play a prominent role in functional analysis since they possess some useful properties.
Before discussing the case of a function between normed spaces, let us first of all recall the notion of continuity of a function f : R → R.
In everyday speech, a ‘continuous’ process is one that proceeds without gaps or interruptions or sudden changes. What does it mean for a function f : R → R to be continuous? The common informal definition of this concept states that a function f is continuous if one can sketch its graph without lifting the pencil. In other words, the graph of f has no breaks in it. If a break does occur in the graph, then this break will occur at some point. Thus (based on this visual view of continuity), we first give the formal definition of the continuity of a function at a point below. Next, if a function is continuous at each point, then it will be called continuous. If a function has a break at a point, say x0, then even if points x are close to x0, the points f(x) do not get close to f(x0).
This motivates the definition of continuity in calculus, which guarantees that if a function is continuous at a point x0, then we can make f(x) as close as we like to f(x0), by choosing x sufficiently close to x0.
Definition 2.2. A function f : R → R is continuous at x0 if for every ε > 0, there exists a δ > 0 such that for all x ∈ R satisfying |x – x0| < δ, we have that |f(x) – f(x0)| < ε.
f : R → R is continuous if for every x0 ∈ R, f is continuous at x0.
We now define the set of continuous maps from a normed space X to a normed space Y.
We observe that in the definition of continuity in ordinary calculus, if x, y are real numbers, then |x–y| is a measure of the distance between them, and that the absolute value | · | is a norm in the finite (one) dimensional normed space R. So it is natural to define continuity in arbitrary normed spaces by simply replacing the absolute values by the corresponding norms, since the norm provides a notion of distance between vectors.
Definition 2.3. (Continuity of maps between normed spaces).
Let X and Y be normed spaces over K (R or C). Let x0 ∈ X. A map f : X → Y is continuous at x0 if for every ε > 0, there exists a δ > 0 such that for all x ∈ X satisfying ||x – x0|| < δ, we have ||f(x) – f(x0)|| < ε. A map f : X → Y is continuous if for all x0 ∈ X, f is continuous at x0.
We will soon study when linear transformations are continuous, but first let us consider some examples of nonlinear maps.
Example 2.3. Consider the map S : C[0, 1] → R, given by
We’ll show that S is continuous. (As usual, C[0, 1] is endowed with the supremum norm.) Suppose that x0 ∈ C[0, 1]. Let ε > 0. As we would like to make |S(x) – S(x0)| small, let us first consider this expression. We have
if ||x – x0||∞ < δ, where δ > 0 is some number. We ought to choose δ > 0 suitably so as to make the right-hand side above smaller than ε. There is no unique way to do this, and anything one can justify works. We set
Whenever ||x – x0||∞ < δ, in light of the above computation, we have
Thus S is continuous at x0. As the choice of x0 was arbitrary, it follows that S is continuous (on C[0, 1]).
Example 2.4. c00 is the subspace of ℓ∞ of all finitely supported sequences. c00 is a normed space with the supremum norm inherited from ℓ∞.
Consider the map s : c00 → R given by
We’ll show that s is not continuous at 0. Suppose on the contrary that it is. With ε = 1/4 > 0, there exists a δ > 0 such that if ||a||∞ = ||a – 0||∞ < δ, then we are guaranteed that |s(a) – s(0)| = |s(a) – 0| = |s(a)| < ε = 1/4. If
then
So for all m sufficiently large, we must have ||am||∞ < δ, giving in turn that |s(am)| < 1/4. But for all m we have
a contradiction. Hence s is not continuous at 0.
Exercise 2.3. (Rationale for the C1[a, b] norm.)
This exercise concerns the norm on C1[a, b] we have chosen to use. Since we want to be able to use ordinary analytic operations such as passage to the limit, then, given a function f : C1[0, 1] → R, it is reasonable to choose a norm such that f is continuous. As our f, let us take the arc length function given by
We show in the following sequence of exercises that f is not continuous if we equip C1[0, 1] with the supremum norm || · ||∞ induced from C[0, 1].
(1)Calculate f(0). (The arc length of the graph of the constant function taking value 0 everywhere on [0, 1] is obviously 1, and check that the above formula delivers this.)
(2)Now consider Using
for all
and the periodicity of sin(2πnt) (the graph of sin(2πnt) on
is repeated n times in [0, 1]), conclude that
(3)Show that f is not continuous at 0. (Prove this by contradiction. Note that by taking larger and larger n, ||xn – 0||∞ can be made as small as we please, but f(xn) doesn’t stay close to f(0).)
(4) Show that the arc length function f is continuous if we equip C1[0, 1] with the norm || · ||1,∞. It may be useful to note that by using the triangle inequality in (R2, || · ||2), we have for a, b ∈ R that
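The phenomenon behind Exercise 2.3 can be observed numerically. The sketch below makes two assumptions that are not taken from the text: the arc length of the graph of x is computed with the usual formula ∫_0^1 (1 + x′(t)²)^{1/2} dt, and the test functions are xn(t) = sin(2πnt)/√n, chosen here purely for illustration. For these functions ||xn||∞ = 1/√n tends to 0, while the arc length is at least ∫_0^1 |xn′(t)| dt = 4√n, so it cannot stay close to the arc length 1 of the zero function. The function names are hypothetical, and the integral is approximated by the midpoint rule.

```python
import math

def arc_length_of_xn(n, steps=100_000):
    """Midpoint-rule approximation of the arc length of the graph of
    x_n(t) = sin(2*pi*n*t) / sqrt(n) over [0, 1] (assumed test function)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        # Derivative x_n'(t) = 2*pi*sqrt(n)*cos(2*pi*n*t).
        slope = 2.0 * math.pi * math.sqrt(n) * math.cos(2.0 * math.pi * n * t)
        total += math.sqrt(1.0 + slope * slope) * h
    return total

def sup_norm_of_xn(n):
    """Supremum norm of x_n on [0, 1]; |sin| attains the value 1 at t = 1/(4n)."""
    return 1.0 / math.sqrt(n)

# Sup norms shrink, arc lengths grow.
lengths = {n: arc_length_of_xn(n) for n in (1, 4, 16, 64)}
for n, length in lengths.items():
    print(n, sup_norm_of_xn(n), length)
```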
Exercise 2.4. Let (X, || · ||) be a normed space. Show that the norm || · || : X → R is a continuous map.
We’ll now learn an important property of continuous maps:
“inverse images” of open sets under a continuous map are open.
In fact, we shall see that this property is a characterisation of continuity. First let’s fix some notation. Let f : X → Y be a map between the normed spaces X and Y, and let V ⊂ Y. We set f–1(V) := {x ∈ X : f(x) ∈ V}, and call it the inverse image of V under f. Clearly f–1(Y) = X and f–1(∅) = ∅.
Exercise 2.5. Let f : R → R be given by f(x) = cos x (x ∈ R).
Find f–1(V), where V = {–1, 1}, V = {1}, V = [–1, 1], V = R,
On the other hand if U ⊂ X, then we set f(U) := {f(x) ∈ Y : x ∈ U}, and call it the image of U under f.
Exercise 2.6. Let f : R → R be given by f(x) = cos x (x ∈ R).
Find f(U), where U = R, U = [0, 2π], U = [δ, δ + 2π] where δ > 0.
Theorem 2.1. Let X, Y be normed spaces and f : X → Y be a map.
Then f is continuous on X if and only if
for every V open in Y, f–1(V) is open in X.
Proof.
(If) Let c ∈ X, and let ε > 0. Consider the open ball B(f(c), ε) with center f(c) and radius ε in Y. We know that this open ball V := B(f(c), ε) is an open set in Y. Thus we also know that f–1(V) = f–1(B(f(c), ε)) is an open set in X. But the point c ∈ f–1(B(f(c), ε)), because f(c) ∈ B(f(c), ε) (||f(c) – f(c)|| = 0 < ε!). So by the definition of an open set, there is a δ > 0 such that B(c, δ) ⊂ f–1(B(f(c), ε)). In other words, whenever x ∈ X satisfies ||x – c|| < δ, we have x ∈ f–1(B(f(c), ε)), that is, f(x) ∈ B(f(c), ε), which implies ||f(x) – f(c)|| < ε. Hence f is continuous at c. But the choice of c ∈ X was arbitrary. Consequently f is continuous on X. See the picture on the left.
(Only if) Now let f be continuous, and let V be an open subset of Y. We would like to show that f–1(V) is open. So let c ∈ f–1(V). Then f(c) ∈ V. As V is open, there is a small open ball B(f(c), ε) with center f(c) and radius ε that is contained in V. By the continuity of f at c, there is a δ > 0 such that whenever ||x – c|| < δ, we have ||f(x) – f(c)|| < ε, that is, f(x) ∈ B(f(c), ε) ⊂ V. But this means that B(c, δ) ⊂ f–1(V). Indeed, if x ∈ B(c, δ), then ||x – c|| < δ and so by the above, f(x) ∈ V, that is, x ∈ f–1(V). Consequently, f–1(V) is open in X. See the picture on the right above.
Note that the theorem does not claim that for every U open in X, f(U) is open in Y. Consider for example X = Y = R equipped with the Euclidean norm, and the constant function f(x) = c (x ∈ R), which is clearly continuous. But note that direct images of open sets are not always open under f : indeed X = R is open in X = R, but f(X) = {c} is not open in Y = R.
Corollary 2.1. Let X, Y be normed spaces and f : X → Y be a map.
Then f is continuous on X if and only if
for every F closed in Y, f–1(F) is closed in X.
Proof. If F ⊂ Y, then f–1(Y\F) = X\(f–1(F)).
Exercise 2.7. Fill in the details of the proof of Corollary 2.1.
Theorem 2.2. Let X, Y, Z be normed spaces, and f : X → Y, g : Y → Z be continuous maps. Then the composition map g ∘ f : X → Z, defined by (g ∘ f)(x) := g(f(x)) (x ∈ X), is continuous.
Proof. Let W be open in Z. Then since g is continuous, g–1(W) is open in Y. Also, since f is continuous, f–1(g–1(W)) is open in X. Finally, we note that (g ∘ f)–1(W) = f–1(g–1(W)). So g ∘ f is continuous.
Exercise 2.8. In the proof of Theorem 2.2, we used (g ∘ f)–1(W) = f–1(g–1(W)). Check this.
Exercise 2.9. Let X be a normed space and f : X → R be a continuous map. Determine if the following statements are true or false.
(1){x ∈ X : f(x) < 1} is an open set.
(2){x ∈ X : f(x) > 1} is an open set.
(3){x ∈ X : f(x) = 1} is an open set.
(4){x ∈ X : f(x) ≤ 1} is a closed set.
(5){x ∈ X : f(x) = 1} is a closed set.
(6){x ∈ X : f(x) = 1 or f(x) = 2} is a closed set.
(7){x ∈ X : f(x) = 1} is a compact set.
We have the following characterisation of continuous maps in terms of convergence of sequences: “Continuous maps preserve convergent sequences”.
Theorem 2.3. Let X, Y be normed spaces, c ∈ X, and let f : X → Y. Then the following two statements are equivalent:
(1)f is continuous at c.
(2)For every sequence (xn)n∈N in X such that (xn)n∈N converges to c, (f(xn))n∈N converges to f(c).
Proof.
(1) ⇒ (2): Suppose that f is continuous at c. Let (xn)n∈N be a sequence in X such that (xn)n∈N converges to c. Let ε > 0. Then there exists a δ > 0 such that for all x ∈ X satisfying ||x – c|| < δ, we have ||f(x) – f(c)|| < ε. As the sequence (xn)n∈N converges to c, for this δ > 0, there exists an N ∈ N such that whenever n > N, ||xn – c|| < δ. But then by the above, ||f(xn) – f(c)|| < ε. So we have shown that for every ε > 0, there is an N ∈ N such that for all n > N, ||f(xn) – f(c)|| < ε. In other words, the sequence (f(xn))n∈N converges to f(c).
(2) ⇒ (1): Suppose that f is not continuous at c. Thus there is an ε > 0 such that for every δ > 0, there is an x ∈ X such that ||x – c|| < δ, but ||f(x) – f(c)|| ≥ ε. We will use this statement to construct a sequence (xn)n∈N for which the conclusion in (2) does not hold. Let δ = 1/n, for n ∈ N, and denote a corresponding x as xn: thus, ||xn – c|| < δ = 1/n, but ||f(xn) – f(c)|| ≥ ε. Clearly the sequence (xn)n∈N is convergent with limit c, but (f(xn))n∈N does not converge to f(c) since ||f(xn) – f(c)|| ≥ ε for all n ∈ N. Consequently if (1) does not hold, then (2) does not hold. In other words, we have shown that (2) ⇒ (1).
Exercise 2.10. Let X, Y be normed spaces. Find all continuous maps f : X → Y such that for all x ∈ X, f(x) + f(2x) = 0. Hint:
Exercise 2.11. (∗)(Continuity of the determinant; {invertible matrices} is open). Show that the determinant M → det M : (Rn×n, || · ||∞) → (R, | · |) is continuous. Prove that the set of invertible matrices is open in (Rn×n, || · ||∞). Hint: det–1{0}.
In this section we will learn about a very useful result in Optimisation Theory, on the existence of global minimisers of real-valued continuous functions on compact sets.
Theorem 2.4.
If (1) K is a compact subset of a normed space X,
(2) Y is a normed space, and
(3) f : X → Y is a function that is continuous at each x ∈ K,
then f(K) is a compact subset of Y.
Proof. Suppose that (yn)n∈N is a sequence contained in f(K). Then for each n ∈ N, there exists an xn ∈ K such that yn = f(xn). Thus we obtain a sequence (xn)n∈N in the set K. As K is compact, there exists a convergent subsequence, say (xnk)k∈N, with limit L ∈ K. As f is continuous, it preserves convergent sequences. So (f(xnk))k∈N = (ynk)k∈N is convergent with limit f(L) ∈ f(K). Consequently, f(K) is compact.
Now we prove the aforementioned result which turns out to be very useful in Optimisation Theory, namely that a real-valued continuous function on a compact set attains its maximum/minimum on the compact set. This is a generalisation of the Extreme Value Theorem we had learnt earlier, where the compact set in question was just the interval [a, b].
Theorem 2.5. (Weierstrass).
If (1) K is a nonempty compact subset of a normed space X, and
(2) f : X → R is a function that is continuous at each x ∈ K, then there exists a c ∈ K such that f(c) = sup{f(x) : x ∈ K}.
We note that since c ∈ K, f(c) ∈ {f(x) : x ∈ K}, and so the supremum above is actually a maximum: f(c) = max{f(x) : x ∈ K}.
Also, under the same hypothesis of the above result, there exists a minimiser in K, that is, there exists a d ∈ K such that f(d) = inf{f(x) : x ∈ K} = min{f(x) : x ∈ K}.
This follows from the above result by just looking at –f, that is by applying the above result to the function g : X → R given by g(x) = –f(x) (x ∈ X).
Proof. (Of Theorem 2.5.) We know that the image of K under f, namely the set f(K), is compact and hence bounded. So {f(x) : x ∈ K} is bounded. It is also nonempty since K is nonempty. But by the least upper bound property of R, a nonempty bounded subset of R has a least upper bound. Thus M := sup{f(x) : x ∈ K} ∈ R. Now consider M – 1/n (n ∈ N). This number cannot be an upper bound for {f(x) : x ∈ K}. So there must be an xn ∈ K such that f(xn) > M – 1/n. In this manner we get a sequence (xn)n∈N in K. As K is compact, (xn)n∈N has a convergent subsequence (xnk)k∈N with limit, say c, belonging to K. As f is continuous at c, (f(xnk))k∈N is convergent as well with limit f(c). But from the inequalities f(xnk) > M – 1/nk (k ∈ N), it follows upon letting k → ∞ that f(c) ≥ M. On the other hand, from the definition of M, we also have that f(c) ≤ M. So f(c) = M.
Example 2.5. Since the set is compact in R3 and since the function x ↦ x1 + x2 + x3 is continuous on R3, it follows that the optimisation problem
has a minimiser.
Remark 2.1. (∗) In Optimisation Theory, one often meets necessary conditions for a minimiser, that is, results of the following form:
(Here are certain mathematical conditions, such as the Lagrange multiplier equations.) Now such a result has limited use as such since even if we find all
which satisfy
, we can’t conclude that there is one that is a minimiser. But now suppose that we know that f is continuous on F and that F is compact. Then we know that a minimiser exists, and so we know that among the
that satisfy
, there is at least one which is a minimiser.
Notation 2.2. We will denote the set of all continuous maps from the normed space X to the normed space Y by C(X, Y).
In this section we study those linear transformations from a normed space X to a normed space Y that are also continuous.
Notation 2.3. We denote the set of all continuous linear transformations from the normed space X to the normed space Y by CL(X, Y), that is, CL(X, Y) := C(X, Y) ∩ L(X, Y). If X = Y, then we denote CL(X, X) simply by CL(X).
We begin by giving a characterisation of continuous linear transformations.
Theorem 2.6.
Let X and Y be normed spaces, and T : X → Y be a linear transformation. Then the following properties of T are equivalent:
(1)T is continuous.
(2)T is continuous at 0.
(3) There exists an M > 0 such that for all x ∈ X, ||Tx||Y ≤ M ||x||X.
We’ll see the proof below. But let us first remark that the useful part is the equivalence of (1) and (3), since by just showing the existence/lack of the bound M, we can conclude the continuity/lack of continuity of the given linear transformation. So we don’t have to go through the rigmarole of verifying the ε-δ definition: rather, a simple estimate, as stipulated in (3), suffices. Note also that it seems miraculous that continuity at just one point (at 0) delivers continuity everywhere on X! This miracle happens because the map T is not any old map, but rather a linear transformation. Here is an elementary example.
Example 2.6. (The left shift and right shift operators).
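The left shift operator, L : ℓ2 → ℓ2, given by
L(a1, a2, a3, ···) := (a2, a3, a4, ···), (an)n∈N ∈ ℓ2,
is a linear transformation. We have for all (an)n∈N ∈ ℓ2 that
||L(an)n∈N||2 = (|a2|² + |a3|² + ···)^{1/2} ≤ (|a1|² + |a2|² + |a3|² + ···)^{1/2} = ||(an)n∈N||2,
and so L ∈ CL(ℓ2) (item (3) of Theorem 2.6 holds with M = 1). The right shift operator R : ℓ2 → ℓ2, given by R(a1, a2, a3, ···) := (0, a1, a2, ···), (an)n∈N ∈ ℓ2, is also a linear transformation which is continuous, thanks to the equality
||R(an)n∈N||2 = (0² + |a1|² + |a2|² + ···)^{1/2} = ||(an)n∈N||2
for all (an)n∈N ∈ ℓ2.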
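The norm relations for the two shift operators in Example 2.6 can be tried out on finitely supported sequences, stored as Python lists with the trailing zeros omitted. This is a sketch under assumptions: the left shift is taken with the usual convention L(a1, a2, a3, ···) = (a2, a3, a4, ···), the sample sequence a is arbitrary, and the function names are hypothetical.

```python
import math

def left_shift(a):
    """L(a1, a2, a3, ...) = (a2, a3, ...): drop the first term."""
    return a[1:]

def right_shift(a):
    """R(a1, a2, a3, ...) = (0, a1, a2, ...): prepend a zero."""
    return [0.0] + a

def norm2(a):
    """The l^2 norm of a finitely supported sequence."""
    return math.sqrt(sum(abs(t) ** 2 for t in a))

a = [3.0, -4.0, 12.0]  # arbitrary sample; all later terms are zero

# ||La|| <= ||a|| (the first term is lost), while ||Ra|| = ||a||.
print(norm2(a), norm2(left_shift(a)), norm2(right_shift(a)))
```

On this sample, applying the left shift after the right shift returns the original sequence, whereas applying them in the other order loses the first term.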
Proof. (Of Theorem 2.6.) We will show the three implications (1)⇒(2), (2)⇒(3), and (3)⇒(1), which are enough to get all the three equivalences (and six implications) given in the statement of the theorem.
(1)⇒(2). This is just the definition of continuity on X. Indeed, T has to be continuous at each point in X, and in particular at 0 ∈ X.
(2)⇒(3). Take ε := 1 > 0. Then there exists a δ > 0 such that whenever ||x – 0|| = ||x|| < δ, we have that ||Tx – T0|| = ||Tx – 0|| = ||Tx|| < 1. Let’s check that this yields:
for all x ∈ X, ||Tx|| ≤ (2/δ) ||x||. (2.2)
First consider x = 0. Then ||Tx|| = ||T0|| = ||0|| = 0 = (2/δ) · 0 = (2/δ) ||x||.
And so the claim in (2.2) holds because we have in fact an equality.
On the other hand, now suppose that x ≠ 0. Set y := (δ/(2||x||)) · x. Then
||y|| = (δ/(2||x||)) ||x|| = δ/2 < δ,
and so ||Ty|| < 1, that is, by the linearity of T,
(δ/(2||x||)) ||Tx|| < 1.
Upon rearranging, we obtain (2.2). So the claim in (3) holds with M := 2/δ.
(3)⇒(1). Let M > 0 be such that for all x ∈ X, ||Tx|| ≤ M ||x||. Let x0 ∈ X, and ε > 0. Set δ := ε/M > 0. Then whenever ||x – x0|| < δ, we have
||Tx – Tx0|| = ||T(x – x0)|| ≤ M ||x – x0|| < M δ = ε.
So T is continuous at x0. But as x0 ∈ X was arbitrary, T is continuous.
Example 2.7. (Norm on C1[a, b] revisited). Consider the differentiation mapping D : C1[0, 1] → C[0, 1] defined by (Dx)(t) = x′(t), t ∈ [0, 1], x ∈ C1[0, 1]. We had seen that D is a linear transformation. Let’s now investigate if D is also continuous.
(1) We will show that D is not continuous if both C1[0, 1] and C[0, 1] are equipped with the || · ||∞ norm. Suppose on the contrary that the map D is continuous. Because D is a linear transformation, it follows from Theorem 2.6 that there exists an M > 0 such that for all x ∈ C1[0, 1],
||Dx||∞ = ||x′||∞ ≤ M ||x||∞.
But if we take x = t^n (n ∈ N), then we have
||x||∞ = sup{|t^n| : t ∈ [0, 1]} = 1 and ||x′||∞ = sup{|n t^{n–1}| : t ∈ [0, 1]} = n,
and so ||Dx||∞ = ||x′||∞ = n ≤ M ||x||∞ = M · 1, that is, n ≤ M for all n ∈ N, which is clearly not true. So D is not continuous.
(2) However, D is continuous if C1[0, 1] is equipped with the || · ||1,∞ norm:
||x||1,∞ := ||x||∞ + ||x′||∞, x ∈ C1[0, 1],
while C[0, 1] has the usual supremum norm || · ||∞. Indeed, we have for all x ∈ C1[0, 1], that ||Dx||∞ = ||x′||∞ ≤ ||x||∞ + ||x′||∞ = ||x||1,∞.
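The two estimates in Example 2.7 can be illustrated numerically for the functions x = t^n. The sketch below approximates the supremum norm on [0, 1] by a maximum over a uniform grid (an approximation in general, though exact here because t^n and n t^{n–1} are largest at the grid point t = 1). The function names sup_norm and ratios are hypothetical. The first ratio ||Dx||∞ / ||x||∞ grows without bound, while the second ratio ||Dx||∞ / ||x||1,∞ stays below 1.

```python
def sup_norm(f, samples=10_001):
    """Approximate the supremum norm on [0, 1] by the maximum of |f|
    over a uniform grid that contains both endpoints."""
    return max(abs(f(k / (samples - 1))) for k in range(samples))

def ratios(n):
    """Return (||Dx|| / ||x||_inf, ||Dx|| / ||x||_{1,inf}) for x(t) = t^n."""
    x = lambda t: t ** n              # x(t) = t^n
    dx = lambda t: n * t ** (n - 1)   # (Dx)(t) = n t^(n-1)
    norm_x, norm_dx = sup_norm(x), sup_norm(dx)
    return norm_dx / norm_x, norm_dx / (norm_x + norm_dx)

for n in (1, 5, 50):
    print(n, ratios(n))
```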
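Example 2.8. (If X, Y are finite dimensional, then L(X, Y) = CL(X, Y).) Let X = (Rn, || · ||2), Y = (Rm, || · ||2) and let A ∈ Rm×n be given by A = [aij], with rows indexed by i = 1, ···, m and columns indexed by j = 1, ···, n.
Let TA : Rn → Rm be the linear transformation given by TAx := Ax for all x ∈ Rn. Then for all x ∈ Rn, by the Cauchy-Schwarz inequality in Rn,
(||TAx||2)² = Σ_{i=1}^m (Σ_{j=1}^n aij xj)² ≤ Σ_{i=1}^m (Σ_{j=1}^n aij²)(Σ_{j=1}^n xj²) = (Σ_{i=1}^m Σ_{j=1}^n aij²) (||x||2)²,
and so ||TAx||2 ≤ (Σ_{i=1}^m Σ_{j=1}^n aij²)^{1/2} ||x||2. Hence TA is continuous.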
Remark 2.2. We know that every linear transformation on finite dimensional vector spaces X, Y can be represented by TA once bases for X, Y have been chosen. Also we know that all norms on finite-dimensional normed spaces are equivalent to each other. It follows from these two facts that every linear transformation between finite dimensional normed spaces is continuous.
Example 2.9. Let and let
For x = (xj)j∈N ∈ ℓ2, set
We claim that TA : ℓ2 → ℓ2 is a continuous linear transformation on ℓ2. Firstly, TAx ∈ ℓ2, since
Moreover, it is easily seen that TA ∈ L(ℓ2). Moreover, by Theorem 2.6, the computation above shows that TA ∈ CL(ℓ2).
Example 2.10. (Integral operators).
Suppose that A : [0, 1] × [0, 1] → R is such that
We think of A as a “non-discrete/continuous” analogue of a square matrix: the indices i, j are replaced by the “non-discrete/continuous” indices t, τ.
Then the map TA : L2[0, 1] → L2[0, 1], defined by
is a continuous linear transformation. The following picture illustrates the action of TA on x schematically, highlighting the analogy with matrix multiplication.
We note that for x ∈ L2[0, 1],
(The inequality in the second line above, is the Cauchy-Schwarz inequality in L2[0, 1], and it follows from the general Cauchy-Schwarz inequality in inner product spaces, which will be shown in Theorem 4.1, page 157; see also Example 4.3, page 159. We’ll accept this for now.) So TAx ∈ L2[0, 1], and TA ∈ CL(L2[0, 1]).
Operators TA are called integral operators. It used to be common to call the function A, which plays the role of the matrix, the “kernel” of the integral operator. Many variations of the integral operator are possible.
Example 2.11. We had seen on page 55 that with S := {h ∈ C1[a, b] : h(a) = h(b) = 0}
and A, B ∈ C[a, b], the map L : S → R given by
is a linear transformation. Now we ask: is L continuous? Here we equip S ⊂ C1[a, b] with the norm || · ||1,∞. For h ∈ S,
where In the above, we have used
Hence L is continuous.
Example 2.12. (∗)(Fourier transform).
Let L1(R) be the space of all complex valued Lebesgue integrable functions on R, with the usual L1-norm:
||f||1 := ∫_R |f(x)| dx, f ∈ L1(R).
For f ∈ L1(R), its Fourier transform is the function f̂ : R → C defined by
Then f̂ is a continuous function on R, and it is also bounded because
The vector space Cb(R) of all complex-valued continuous functions on R that are bounded, is a normed space with the supremum norm:
||g||∞ := sup{|g(x)| : x ∈ R}, g ∈ Cb(R).
(We won’t check this; the proof is analogous to Example 1.9, page 10.) Thus from the above, we have f̂ ∈ Cb(R). It is also easy to check that f ↦ f̂ : L1(R) → Cb(R) is a linear transformation, and it is continuous, thanks to the estimate above, giving ||f̂||∞ ≤ ||f||1.
Remark 2.3. Owing to the characterisation of continuous linear transformations by the existence of a bound as in item (3) of Theorem 2.6 above, they are sometimes called bounded linear operators.
Exercise 2.12. Show that if A ∈ Rm×n, then ker A = {x ∈ Rn : Ax = 0} is a closed subspace of Rn.
Exercise 2.13. (∗) Prove that every subspace of Rn is closed.
Hint: Construct a linear transformation whose kernel is the given subspace.
Exercise 2.14. Let C[a, b] be endowed with the || · ||∞-norm.
(1)Show that is a continuous linear transformation.
(2)Prove that if converges to f in C[a, b], then
Exercise 2.15. (Convolution operator).
If f ∈ L1(R), then the corresponding convolution operator f∗ : L∞(R) → L∞(R) is given by
Show that f∗ is well-defined and that f∗ ∈ CL(L∞(R)).
Exercise 2.16. Let Y = {f ∈ L2(R) : f = f(–·)} be the set of all even functions in L2(R). Show that Y is a closed subspace of L2(R).
Hint: View Y as the kernel of a suitable map in CL(L2(R)).
Consider the set CL(X, Y) of all continuous linear transformations from a normed space X to a normed space Y. We will show that CL(X, Y) is a normed space, with pointwise operations (inherited from L(X, Y)), and the “operator norm” || · || : CL(X, Y) → R given by ||T|| := sup{||Tx|| : x ∈ X, ||x|| ≤ 1}, T ∈ CL(X, Y).
Let us first show that CL(X, Y) is a subspace of L(X, Y), making it a vector space in its own right.
Proposition 2.1. CL(X, Y) is a subspace of L(X, Y).
Proof. We have:
(S1) Let S, T ∈ CL(X, Y). Then there exist MS, MT > 0 such that for all x ∈ X, ||Sx|| ≤ MS ||x|| and ||Tx|| ≤ MT ||x||. So for all x ∈ X,
||(S + T)x|| = ||Sx + Tx|| ≤ ||Sx|| + ||Tx|| ≤ MS ||x|| + MT ||x|| = (MS + MT) ||x||.
Thus S + T ∈ CL(X, Y) too.
(S2) Let α ∈ K and T ∈ CL(X, Y). There exists an M > 0 such that for all x ∈ X, ||Tx|| ≤ M||x||. So ||(αT)x|| = ||α(Tx)|| = |α| ||Tx|| ≤ |α|M||x||. Hence αT ∈ CL(X, Y).
(S3) The zero linear transformation 0 ∈ CL(X, Y) because for all x ∈ X, ||0x|| = ||0Y|| = 0 ≤ 1 · ||x||.
Consequently, CL(X, Y) is a subspace of L(X, Y).
Next we show that the operator norm || · || : CL(X, Y) → R given by
||T|| := sup{||Tx|| : x ∈ X, ||x|| ≤ 1}, T ∈ CL(X, Y),
is indeed a norm on CL(X, Y). First let us check that ||T|| is a well-defined number. If we set S := {||Tx|| : x ∈ X, ||x|| ≤ 1}, then we note that this is a subset of the real numbers. Let us observe that this is a nonempty set that is bounded above:
(1) S ≠ ∅ because if we take x = 0X ∈ X, then ||x|| = ||0X|| = 0 ≤ 1, and so ||Tx|| = ||T0X|| = ||0Y|| = 0 ∈ S.
(2) S is bounded above. As T ∈ CL(X, Y), there is an M > 0 such that for all x ∈ X, ||Tx|| ≤ M||x||. We claim that M is an upper bound of S. Indeed, if x ∈ X and ||x|| ≤ 1, then ||Tx|| ≤ M||x|| ≤ M · 1 = M.
Since S is a nonempty subset of R which is bounded above, it follows from the Least Upper Bound Property of R that the supremum of S exists: so for all T ∈ CL(X, Y), ||T|| := sup{||Tx|| : x ∈ X, ||x|| ≤ 1} < ∞. In order to do our verification that this operator norm || · || is a norm on CL(X, Y), the following two results will be useful.
Lemma A. Let T ∈ CL(X, Y).
If M > 0 is such that for all x ∈ X, ||Tx|| ≤ M||x||, then ||T|| ≤ M.
Proof. If x ∈ X and ||x|| ≤ 1, then ||Tx|| ≤ M||x|| ≤ M · 1 = M. So M is an upper bound of S = {||Tx|| : x ∈ X, ||x|| ≤ 1}. Thus sup S ≤ M, that is, ||T|| ≤ M.
Lemma B. Let T ∈ CL(X, Y). Then for all x ∈ X, ||Tx|| ≤ ||T|| ||x||.
Proof.
1° x = 0. Then ||Tx|| = ||T0|| = ||0|| = 0 = ||T|| · 0 = ||T|| ||0|| = ||T|| ||x||.
2° Suppose that 0X ≠ x ∈ X. Let y := (1/||x||) · x.
Then ||y|| = (1/||x||) ||x|| = 1 ≤ 1.
Thus ||Ty|| ∈ S, and so ||Ty|| ≤ sup S = ||T||, that is, (1/||x||) ||Tx|| ≤ ||T||.
Rearranging, we get ||Tx|| ≤ ||T|| ||x||.
Lemmas A and B together tell us that for a T ∈ CL(X, Y), ||T|| is allowed as an “M” in
for all x ∈ X, ||Tx|| ≤ M ||x||,
and moreover it is the smallest possible such number M, in the sense that any other allowed M has got to be at least as large as ||T||.
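Theorem 2.7. The operator norm, || · || : CL(X, Y) → R, given by
||T|| := sup{||Tx|| : x ∈ X, ||x|| ≤ 1}, T ∈ CL(X, Y),
is a norm on CL(X, Y).
Proof. We have:
(N1) For T ∈ CL(X, Y), ||T|| = sup{||Tx|| : x ∈ X, ||x|| ≤ 1} ≥ 0, since ||Tx|| ≥ 0 for all x.
If T ∈ CL(X, Y) and ||T|| = 0, then by Lemma B, for all x ∈ X, ||Tx|| ≤ ||T|| ||x|| = 0||x|| = 0, and so ||Tx|| = 0, that is, Tx = 0Y for all x ∈ X. So T = 0, the zero linear transformation.
(N2) For α ∈ K and T ∈ CL(X, Y),
||α · T|| = sup{||α · Tx|| : x ∈ X, ||x|| ≤ 1} = sup{|α| ||Tx|| : x ∈ X, ||x|| ≤ 1} = |α| sup{||Tx|| : x ∈ X, ||x|| ≤ 1} = |α| ||T||.
(N3) Let T, S ∈ CL(X, Y). Then for all x ∈ X,
||(T + S)x|| = ||Tx + Sx|| ≤ ||Tx|| + ||Sx|| ≤ ||T|| ||x|| + ||S|| ||x|| = (||T|| + ||S||) ||x||,
from which it follows (Lemma A) that ||T + S|| ≤ ||T|| + ||S||.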
Example 2.13. Recall Example 2.8, page 69.
Let Rn and Rm be equipped with the Euclidean || · ||2-norm.
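Let A = [aij] ∈ Rm×n, and TA ∈ CL(Rn, Rm) be the continuous linear transformation given by TAx = Ax, x ∈ Rn.
Then we’d seen that for all x ∈ Rn, ||TAx||2 ≤ (Σ_{i=1}^m Σ_{j=1}^n aij²)^{1/2} ||x||2. So, by Lemma A,
||TA|| ≤ (Σ_{i=1}^m Σ_{j=1}^n aij²)^{1/2}.
So we have an estimate for ||TA|| in terms of the matrix coefficients aij. But there does not exist a general “formula” for ||TA|| in terms of the matrix coefficients except in the special cases n = 1 or m = 1, when A has a single column or a single row and ||TA|| is the Euclidean norm of that column or row (so that ||TA|| = |a11| if n = m = 1). It can be seen that the map
A ↦ (Σ_{i=1}^m Σ_{j=1}^n aij²)^{1/2} : Rm×n → R
is also a norm on Rm×n, and is called the Hilbert-Schmidt norm of A.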
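The estimate of Example 2.13 can be probed numerically. Since ||TA|| is the supremum of ||Ax||2 / ||x||2 over nonzero x, the largest such ratio over any sample of vectors is a lower bound for ||TA||, and so it can never exceed the Hilbert-Schmidt norm. The following sketch uses a hypothetical 2 × 2 matrix and random sample vectors; the function names are made up for this illustration, and the sampled value only approximates ||TA|| from below.

```python
import math
import random

def apply_matrix(A, x):
    """Return the matrix-vector product Ax, where A is a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm2(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in x))

def hilbert_schmidt(A):
    """Square root of the sum of the squares of all matrix entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

A = [[1.0, 2.0], [0.0, 2.0]]  # hypothetical sample matrix

random.seed(0)  # fixed seed so that the run is reproducible
largest_ratio = 0.0
for _ in range(10_000):
    x = [random.gauss(0.0, 1.0) for _ in A[0]]  # random direction in R^2
    largest_ratio = max(largest_ratio, norm2(apply_matrix(A, x)) / norm2(x))

# Sampled lower bound for ||T_A||, followed by the Hilbert-Schmidt bound.
print(largest_ratio, hilbert_schmidt(A))
```

For this particular matrix the sampled ratio stays strictly below the Hilbert-Schmidt norm, which suggests that the estimate is in general not an equality.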
Exercise 2.17. (Diagonal operator norm; operator norm needn’t be attained.) Let (λn)n∈N be a bounded sequence in K, and let Λ : ℓ2 → ℓ2 be given by Λ(a1, a2, a3, ···) = (λ1a1, λ2a2, λ3a3, ···) for all (a1, a2, a3, ···) ∈ ℓ2. Show that Λ ∈ CL(ℓ2) and
||Λ|| = sup{|λn| : n ∈ N}.
Now let Show that there is no x ∈ ℓ2 such that ||x||2 ≤ 1 and ||Λx||2 = ||Λ||. This gives an example showing that the operator norm need not be attained.
Exercise 2.18. (Schauder basis). Let X be a Banach space. A sequence of vectors (en)n∈N in X is a Schauder basis for X if for every x ∈ X, there exists a unique sequence of numbers (ξn)n∈N such that
x = Σ_{n=1}^∞ ξn en (with the series converging in the norm of X).
Let 1 ≤ p < ∞, and en = (0, ···, 0, 1, 0, ···) be the sequence in ℓp with nth term equal to 1 and all others 0. Show that (en)n∈N is a Schauder basis for ℓp.
Hint: For uniqueness use the continuity of the “coordinate map” φn : x ↦ xn, selecting the nth term of the sequence x.
Remark. A Banach space X that has a Schauder basis is separable, that is, there exists a countable dense subset in X (for example the finite linear combinations of the en with rational coefficients). The converse question, namely whether every separable Banach space has a Schauder basis, was an open problem for a long time. In 1973, the Swedish mathematician Per Enflo finally constructed an example of a separable Banach space that does not have a Schauder basis.
Exercise 2.19. (Invariant subspace, and the Invariant Subspace Problem)
(1)Prove that the averaging operator A : ℓ∞ → ℓ∞, defined by
$$A(x_1, x_2, x_3, \cdots) = \Big(x_1,\ \frac{x_1 + x_2}{2},\ \frac{x_1 + x_2 + x_3}{3},\ \cdots\Big),$$
is a continuous linear transformation. What is the operator norm of A?
(2)(∗) A subspace Y of a normed space X is said to be an invariant subspace with respect to a linear transformation T : X → X if TY ⊂ Y. Let A ∈ CL(ℓ∞) be the averaging operator from part (1). Show that the subspace c of ℓ∞, consisting of all convergent sequences, is an invariant subspace of the averaging operator A. Hint: Show that if x ∈ c has limit L, then Ax has limit L.
Remark. Invariant subspaces are useful since they are helpful in studying complicated operators by breaking them down into smaller operators acting on invariant subspaces. This is already familiar to the student from the diagonalisation procedure in linear algebra, where one decomposes the vector space into eigenspaces, and in these eigenspaces the linear transformation acts trivially. One of the open problems in functional analysis is the invariant subspace problem:
Does every T ∈ CL(H) on a separable complex Hilbert space H have a non-trivial invariant subspace?
Hilbert spaces are just special types of Banach spaces, in which the norm is induced by an inner product, and we will learn about Hilbert spaces in Chapter 4. Non-trivial means that the invariant subspace must be different from {0} or H. In the case of Banach spaces, the answer to the above question is “no”: during the annual meeting of the American Mathematical Society in Toronto in 1976, Per Enflo (again!) announced the existence of a Banach space and a bounded linear operator on it without any non-trivial invariant subspace.
Now that we know CL(X, Y) is a normed space with the operator norm, it is natural to ask if CL(X, Y) is complete, that is, if CL(X, Y) is a Banach space. It turns out that CL(X, Y) is a Banach space if and only if Y is a Banach space, and we’ll show this in the next section.
We’ll see that CL(X, Y) is a Banach space if and only if Y is a Banach space. In this section the “if” part will be shown, and the “only if” part will be done in Remark 2.9, page 109.
Theorem 2.8. If Y is a Banach space, and X is any normed space, then CL(X, Y) is a Banach space.
Proof. Let (Tn)n∈N be a Cauchy sequence in CL(X, Y). Let x ∈ X. Claim: (Tnx)n∈N is Cauchy in Y.
Indeed, for all n, m, ||Tnx – Tmx|| ≤ ||Tn – Tm|| ||x||.
As Y is Banach, (Tnx)n∈N converges in Y, with limit, say Tx ∈ Y.
So we get a map x ↦ Tx : X → Y.
Questions: (a) Is T ∈ CL(X, Y)?
(b) Does Tn → T in CL(X, Y)?
(a) Is T a linear transformation?
If x1, x2 ∈ X, then (Tnx1)n∈N converges to Tx1 in Y, and (Tnx2)n∈N converges to Tx2 in Y. Thus (Tnx1 + Tnx2)n∈N = (Tn(x1 + x2))n∈N converges to Tx1 + Tx2 in Y. But we know that (Tn(x1 + x2))n∈N converges to T(x1 + x2) in Y. By the uniqueness of limits, T(x1 + x2) = Tx1 + Tx2.
Let α ∈ K and x ∈ X. Then (Tnx)n∈N converges to Tx in Y. So we have (α · (Tnx))n∈N = (Tn(α · x))n∈N converges to α · (Tx) in Y. But (Tn(α · x))n∈N converges to T(α · x) in Y. So α · T(x) = T(α · x).
Is T continuous? Let ε = 1. Then there exists an N ∈ N such that for all n, m > N, ||Tn – Tm|| < ε = 1. So for all n > N, ||Tn – TN+1|| < 1. Thus for n > N and x ∈ X, ||Tnx – TN+1x|| ≤ ||Tn – TN+1|| ||x|| ≤ 1 · ||x||. Passing to the limit n → ∞, we obtain ||Tx – TN+1x|| ≤ ||x|| for all x ∈ X. So for all x ∈ X, ||Tx|| ≤ ||Tx – TN+1x|| + ||TN+1x|| ≤ ||x|| + ||TN+1|| ||x|| = (1 + ||TN+1||) ||x||. Conclusion: T ∈ CL(X, Y).
(b) Is it true that Tn → T in CL(X, Y)?
Let ε > 0. Then there exists an N ∈ N such that for all n, m > N, we have ||Tn – Tm|| < ε. So for all n, m > N and all x ∈ X, we obtain that ||Tnx – Tmx|| ≤ ||Tn – Tm|| · ||x|| ≤ ε||x||. Passing to the limit as m → ∞, we get that for all n > N and x ∈ X, ||Tnx – Tx|| ≤ ε||x||. Hence for all n > N, ||Tn – T|| ≤ ε.
Corollary 2.2. If X is a normed space over K, then the dual space of X, X′ := CL(X, K), is a Banach space with the operator norm.
Corollary 2.3. If X is a Banach space, then CL(X) := CL(X, X) is a Banach space with the operator norm.
Remark 2.4. (“Hilbert” versus Banach spaces). In Chapter 4, we’ll meet Hilbert spaces: a Hilbert space is a special type of a Banach space in which the norm is induced by an “inner product”. If instead of Banach spaces, we are interested only in Hilbert spaces, then the notion of a Banach space is still indispensable, since for a Hilbert space H, the normed space CL(H) is typically only a Banach space, and not a Hilbert space in general.
Many claims in this section won’t be proved, but are included to provide the reader with a “road map”. The main content of the section is the definitions of the three operator topologies and the illustrative examples. One who wants to know more could embark on a deeper study, as offered for example in [Pedersen (1989)] or [Rudin (1976)].
Let a set X be equipped with two topologies, and let X1 (respectively X2) denote the set X equipped with the first (respectively second) topology. If the identity map x ↦ x : X1 → X2 is continuous, namely if every set open in X2 is open in X1, one says that the first topology is stronger than the second, or that the second topology is weaker/coarser/smaller than the first. Of all the topologies on the set X, there is a strongest one (discrete topology), namely the one for which all subsets of X are open, and there is a weakest one (trivial topology), namely the one for which only X, ∅ are open.
Now suppose we have a set X, and a family F = {fi : X → R | i ∈ I} of maps. Then of course there exists at least one topology on X with respect to which all the maps fi are continuous, namely the discrete topology on X. However, there is also a “less wasteful/more efficient/weakest” topology on X that makes all the maps fi, i ∈ I, continuous, characterized by the following: U is open in this topology on X if for every x ∈ U, there exist a finite number of indices i1, ···, in ∈ I and intervals (a1, b1), ···, (an, bn) such that x ∈ {y ∈ X : fik (y) ∈ (ak, bk), k = 1, ···, n} ⊂ U. It can be shown that this gives a topology T on X, and for any other topology T′ on X that makes the maps fi, i ∈ I, continuous, we have T ⊂ T′.
We had seen that CL(X, Y) is a normed space with the operator norm ||T|| := sup{||Tx|| : x ∈ X, ||x|| ≤ 1}, T ∈ CL(X, Y). We call the resulting topology the uniform operator topology on CL(X, Y); it is the weakest topology making each map in the family
$$\{\, T \mapsto \|T - S\| : CL(X, Y) \to \mathbb{R} \ \mid\ S \in CL(X, Y) \,\}$$
continuous. A subset U ⊂ CL(X, Y) is open in the uniform operator topology on CL(X, Y) if for each T ∈ U, there exists an ε > 0 such that {S ∈ CL(X, Y) : ||S – T|| < ε} ⊂ U. A sequence (Tn)n∈N converges to T ∈ CL(X, Y) in the uniform operator topology if
$$\lim_{n \to \infty} \|T_n - T\| = 0.$$
We remark that besides the uniform operator topology on CL(X, Y), there are weaker topologies (with fewer open sets), on CL(X, Y), called the Strong Topology and the Weak Topology. Here are the definitions, although in this basic introduction, we won’t use these useful alternative topologies much.
Definition 2.4. (Strong Operator Topology)
Let X, Y be normed spaces. Then the weakest topology on CL(X, Y) which makes each map in the family
$$\{\, T \mapsto Tx : CL(X, Y) \to Y \ \mid\ x \in X \,\}$$
continuous, is called the strong operator topology on CL(X, Y). A subset U ⊂ CL(X, Y) is open in the strong operator topology on CL(X, Y) if for each T ∈ U, there exists an ε > 0 and finitely many x1, ···, xn ∈ X such that {S ∈ CL(X, Y) : ||Sxk – Txk|| < ε, k = 1, ···, n} ⊂ U. A sequence (Tn)n∈N converges to T ∈ CL(X, Y) in the strong operator topology if for all x ∈ X,
$$\lim_{n \to \infty} \|T_n x - Tx\| = 0.$$
Example 2.14. (Strong but not uniform convergence).
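For n ∈ N, let Pn ∈ CL(ℓ2) be the “projection operator” given by
$$P_n(a_1, a_2, a_3, \cdots) = (a_1, \cdots, a_n, 0, 0, \cdots), \quad (a_1, a_2, a_3, \cdots) \in \ell^2.$$
We claim that (Pn)n∈N converges to the identity operator I ∈ CL(ℓ2) in the strong operator topology.
Indeed, for every a ∈ ℓ2,
$$\|Ia - P_n a\|_2^2 = \|(0, \cdots, 0, a_{n+1}, a_{n+2}, \cdots)\|_2^2 = \sum_{k=n+1}^{\infty} |a_k|^2 \to 0 \quad \text{as } n \to \infty.$$
But the sequence (Pn)n∈N does not converge to the identity I ∈ CL(ℓ2) in the uniform operator topology. Let’s show this by contradiction.
Suppose it does converge to I with respect to the operator norm. With ε := 1/2 > 0, there exists an N ∈ N such that ||PN – I|| < 1/2. So if eN+1 ∈ ℓ2 is the sequence with the (N + 1)st term 1 and all others 0, then we have
$$1 = \|e_{N+1}\|_2 = \|(I - P_N)e_{N+1}\|_2 \le \|I - P_N\|\,\|e_{N+1}\|_2 < \tfrac{1}{2} \cdot 1 = \tfrac{1}{2},$$
a contradiction!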
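The difference between the two modes of convergence can also be watched numerically. The following Python sketch is only an illustration under stated assumptions: sequences in ℓ2 are replaced by finitely supported lists of length M, the sample sequence a with a_k = 1/k is a hypothetical choice, and the helper names are not from the text.

```python
import math

def norm2(a):
    """The l2 norm of a finitely supported sequence stored as a list."""
    return math.sqrt(sum(abs(t) ** 2 for t in a))

def project(n, a):
    """P_n a: keep the first n terms of a and replace the others by 0."""
    return [t if k < n else 0.0 for k, t in enumerate(a)]

def distance_to_projection(n, a):
    """||Ia - P_n a||_2 for the list a."""
    return norm2([t - p for t, p in zip(a, project(n, a))])

def unit_vector(m, length):
    """e_m: the m-th term (1-based) is 1, all others are 0."""
    return [1.0 if k == m - 1 else 0.0 for k in range(length)]

M = 2000                                    # truncation length (assumption)
a = [1.0 / k for k in range(1, M + 1)]      # hypothetical sample sequence
ns = (10, 100, 1000)

# For a fixed a, the distance ||a - P_n a||_2 shrinks as n grows ...
tails = [distance_to_projection(n, a) for n in ns]

# ... but for each n the unit vector e_{n+1} is moved by I - P_n to itself,
# so ||I - P_n|| is at least 1 for every n.
gaps = [distance_to_projection(n, unit_vector(n + 1, M)) for n in ns]

print(tails)
print(gaps)
```

The first list decreases towards 0 (strong convergence at the vector a), while the second list stays at 1, so the operator norms ||I – Pn|| cannot tend to 0.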
A yet weaker topology than the strong operator topology is the weak operator topology, defined below.
Definition 2.5. (Weak Operator Topology)
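Let X, Y be normed spaces. Let Y′ := CL(Y, K). Then the weakest topology on CL(X, Y) which makes each map in the family
$$\{\, T \mapsto \varphi(Tx) : CL(X, Y) \to \mathbb{K} \ \mid\ x \in X,\ \varphi \in Y' \,\}$$
continuous, is called the weak operator topology on CL(X, Y). A subset U ⊂ CL(X, Y) is open in the weak operator topology on CL(X, Y) if for all T ∈ U, there exists an ε > 0, finitely many x1, ···, xn ∈ X, and φ1, ···, φn ∈ Y′ such that
$$\{\, S \in CL(X, Y) : |\varphi_k(Sx_k) - \varphi_k(Tx_k)| < \varepsilon,\ k = 1, \cdots, n \,\} \subset U.$$
A sequence (Tn)n∈N converges to T ∈ CL(X, Y) in the weak operator topology if for all φ ∈ Y′ and for all x ∈ X,
$$\lim_{n \to \infty} \varphi(T_n x) = \varphi(Tx).$$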
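The following table summarises this:
Topology on CL(X, Y) | Tn → T means
Uniform operator topology | ||Tn – T|| → 0
Strong operator topology | for all x ∈ X, ||Tnx – Tx|| → 0
Weak operator topology | for all x ∈ X and all φ ∈ Y′, φ(Tnx) → φ(Tx)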
Example 2.15. (Weak but not strong convergence).
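Let R ∈ CL(ℓ2) be given by ℓ2 ∋ (a1, a2, a3, ···) ↦ (0, a1, a2, a3, ···), the right shift operator. We claim that the sequence of powers $(R^n)_{n \in \mathbb{N}}$ converges to 0 ∈ CL(ℓ2) in the weak operator topology. We’ll use a result which will be proved later on in Theorem 2.14, page 104 (and also in Chapter 4, Theorem 4.10, page 189):
For each φ ∈ CL(ℓ2, C) =: (ℓ2)′, there is an xφ = (xφ(k))k∈N ∈ ℓ2, such that
$$\varphi(a) = \sum_{k=1}^{\infty} a_k\,(x_\varphi(k))^{*} \quad \text{for all } a = (a_k)_{k \in \mathbb{N}} \in \ell^2.$$
(Here ·∗ denotes complex conjugation.)
Using the Cauchy-Schwarz inequality (page 159), for all a ∈ ℓ2, φ ∈ (ℓ2)′,
$$|\varphi(R^n a)| = \Big| \sum_{k=1}^{\infty} a_k\,(x_\varphi(n+k))^{*} \Big| \le \|a\|_2 \Big( \sum_{k=n+1}^{\infty} |x_\varphi(k)|^2 \Big)^{1/2} \to 0 \quad \text{as } n \to \infty.$$
Thus $(R^n)_{n \in \mathbb{N}}$ converges to 0 ∈ CL(ℓ2) in the weak operator topology.
If e1 := (1, 0, 0, ···) ∈ ℓ2, then $R^n e_1$ = (0, ···, 0, 1, 0, ···), the sequence with (n + 1)st term 1 and all others 0. So $\|R^n e_1\|_2 = 1$, n ∈ N. Thus it is not the case that $\lim_{n \to \infty} R^n e_1 = 0 = 0e_1$.
So $(R^n)_{n \in \mathbb{N}}$ does not converge to 0 in the strong operator topology.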
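Here is a small Python sketch of the same phenomenon for the powers of the right shift. It is an illustration only: sequences are finitely supported lists, and the representing sequence x_phi (with terms 1/k, cut off after M terms) and the vector a are hypothetical choices, not data from the text.

```python
import math

def right_shift_power(n, a):
    """Apply the right shift n times: n zeros are placed in front of a."""
    return [0.0] * n + list(a)

def norm2(a):
    return math.sqrt(sum(abs(t) ** 2 for t in a))

def functional(x_phi, a):
    """phi(a) = sum over k of a_k * conj(x_phi(k)).

    Both lists stand for finitely supported sequences, so terms beyond
    the end of either list contribute 0.
    """
    return sum(ak * complex(xk).conjugate() for ak, xk in zip(a, x_phi))

M = 500                                          # cut-off (assumption)
x_phi = [1.0 / k for k in range(1, M + 1)]       # hypothetical representing sequence
a = [1.0, 0.5, 0.25]                             # hypothetical vector in l2
ns = (1, 10, 100)

# |phi(R^n a)| tends to 0: this is what weak operator convergence to 0 asks for.
weak_values = [abs(functional(x_phi, right_shift_power(n, a))) for n in ns]

# ||R^n a||_2 stays equal to ||a||_2: no strong operator convergence to 0.
strong_values = [norm2(right_shift_power(n, a)) for n in ns]

print(weak_values)
print(strong_values)
```

The values of the functional decay, whereas the norms of the shifted vectors do not change at all, in line with the fact that R preserves the ℓ2 norm.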
If T ∈ CL(X, Y), S ∈ CL(Y, Z), then the composition ST : X → Z of T, S is defined by (ST)(x) = S(T(x)), x ∈ X.
It is easily checked that ST is linear. Moreover, it is continuous too, since for all x ∈ X, we have ||(ST)(x)|| = ||S(T(x))|| ≤ ||S|| ||Tx|| ≤ ||S|| ||T|| ||x||. Moreover, the above inequality shows that ||ST|| ≤ ||S|| ||T||.
In particular, if X is a normed space, then CL(X), besides possessing a natural addition and scalar multiplication (both defined pointwise), also possesses a natural multiplication of elements of CL(X), namely composition (S, T) ↦ ST : CL(X) × CL(X) → CL(X). So CL(X) is an “algebra”. Loosely speaking, an algebra is a vector space in which there is also available a nice way of multiplying vectors and producing new vectors.
Definition 2.6. (Algebra). An algebra is a vector space V in which an associative and distributive multiplication is defined, that is,
$$u(vw) = (uv)w, \quad (u + v)w = uw + vw, \quad u(v + w) = uv + uw$$
for all u, v, w ∈ V, and which is related to scalar multiplication so that
$$\alpha(uv) = (\alpha u)v = u(\alpha v)$$
for all u, v ∈ V and all α ∈ K. We call e ∈ V a multiplicative identity element if for all v ∈ V, one has ev = v = ve.
The algebra V := CL(X) has a multiplicative identity element, namely the identity operator I. The identity operator is the map I : X → X, given by Ix = x, x ∈ X. The operator I clearly belongs to CL(X) (with ||I|| = 1), and I serves as the multiplicative identity element of the algebra CL(X): IT = T = TI for all T ∈ CL(X).
Definition 2.7. (Normed and Banach algebras).
A normed algebra is an algebra V equipped with a norm || · || that satisfies:
$$\|uv\| \le \|u\|\,\|v\| \quad \text{for all } u, v \in V.$$
A Banach algebra is a normed algebra which is complete.
We note that V := CL(X) is a normed algebra. We’d seen earlier that CL(X) is a Banach space if X is a Banach space. So CL(X) is a Banach algebra if X is a Banach space.
Let us note that as opposed to vector addition in CL(X), vector multiplication (that is, composition) in CL(X) is in general not commutative. Here is an example. Take X = R2. Let T be clockwise rotation by π/2, and S be reflection in the x-axis, that is,
$$T(x_1, x_2) = (x_2, -x_1), \quad S(x_1, x_2) = (x_1, -x_2), \quad (x_1, x_2) \in \mathbb{R}^2.$$
Then one can check that TS ≠ ST. This can also be seen by following the distinct fates of the point (1, 0) under TS and under ST: TS(1, 0) = T(1, 0) = (0, –1), whereas ST(1, 0) = S(0, –1) = (0, 1).
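The check that TS ≠ ST is a two-line matrix computation. The Python sketch below assumes the standard matrices of the two maps with respect to the usual basis of R2 (clockwise rotation by π/2 sends (x, y) to (y, –x), reflection in the x-axis sends (x, y) to (x, –y)); the helper names are illustrative.

```python
def mat_mul(P, Q):
    """Matrix product PQ; as a map, PQ means: first apply Q, then P."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def apply(P, v):
    """Apply the matrix P to the vector v."""
    return tuple(sum(p * t for p, t in zip(row, v)) for row in P)

T = [[0, 1], [-1, 0]]     # clockwise rotation by pi/2 (assumed matrix form)
S = [[1, 0], [0, -1]]     # reflection in the x-axis (assumed matrix form)

TS = mat_mul(T, S)
ST = mat_mul(S, T)
commutator = [[TS[i][j] - ST[i][j] for j in range(2)] for i in range(2)]

print(apply(TS, (1, 0)))  # fate of (1, 0) under TS
print(apply(ST, (1, 0)))  # fate of (1, 0) under ST
print(commutator)         # nonzero, so T and S do not commute
```

The two printed points differ, and the matrix TS – ST is not the zero matrix.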
The commutator of A, B ∈ CL(X) is defined by [A, B] = AB – BA, and “measures” the lack of commutativity of A and B. The above example shows that the commutator need not be 0. In Exercise 2.22, page 86, we will investigate the “largeness” of the commutator in finite and infinite dimensional spaces X. This plays a role in Quantum Mechanics. We’ll show in Chapter 4 (page 204) that for “observables” A, B, the Heisenberg Uncertainty Relation holds:
$$\Delta A \cdot \Delta B \ge \tfrac{1}{2}\,\big|\langle [A, B] \rangle\big|.$$
We won’t explain this right now, but we simply notice that the commutator makes an appearance on the right-hand side.
If dim X = d < ∞, and T, S ∈ CL(X) are such that TS = I, then ST = I too. So TS = I ⇒ TS = ST = I. (Let us show this. First of all, if TS = I, then ker S = {0}. Indeed, if Sx = 0, then x = Ix = (TS)x = T(Sx) = T0 = 0.
Next observe that if {v1, ···, vd} is a basis for X, then {Sv1, ···, Svd} are linearly independent: if αks are scalars such that α1Sv1 + ··· + αdSvd = 0, then S(α1v1 + ··· + αdvd) = 0, and so α1v1 + ··· + αdvd = 0, making all αks zeros. Hence {Sv1, ···, Svd} must be a basis for X. For x ∈ X, there exist βks in K such that x = β1Sv1 + ··· + βdSvd = S(β1v1 + ··· + βdvd); and so STx = STS(β1v1 + ··· + βdvd) = SI(β1v1 + ··· + βdvd) = x.)
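However, if dim X = ∞, then it can happen that TS = I, but ST ≠ I. Consider for example the left/right shift operators on ℓ2. We have LR = I as LR(a1, a2, a3, ···) = L(0, a1, a2, ···) = (a1, a2, ···) = I(a1, a2, a3, ···), for all (a1, a2, a3, ···) ∈ ℓ2. But RL ≠ I since
$$RL(a_1, a_2, a_3, \cdots) = R(a_2, a_3, \cdots) = (0, a_2, a_3, \cdots) \ne (a_1, a_2, a_3, \cdots) \quad \text{whenever } a_1 \ne 0.$$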
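The one-sidedness of the shifts is easy to see in a few lines of Python. In this sketch a tuple stands for a finitely supported sequence (all terms after the last listed one are 0); the function names and the sample sequence are illustrative assumptions.

```python
def left_shift(a):
    """L(a1, a2, a3, ...) = (a2, a3, ...): the first term is discarded."""
    return tuple(a[1:])

def right_shift(a):
    """R(a1, a2, a3, ...) = (0, a1, a2, ...): a zero is put in front."""
    return (0,) + tuple(a)

a = (5, 7, 9)                        # hypothetical sample sequence
LR = left_shift(right_shift(a))      # first R, then L
RL = right_shift(left_shift(a))      # first L, then R

print(LR)   # the original sequence comes back
print(RL)   # the first term has been lost
```

Applying L after R returns the sequence we started with, while applying R after L has irretrievably replaced the first term by 0.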
This prompts the following definition.
Definition 2.8. (Invertible operator) Let X be a normed space. An element A ∈ CL(X) is said to be invertible if there exists a B ∈ CL(X) such that AB = I = BA.
Inverses are unique. This follows from the associativity of composition.
Proposition 2.2. If A ∈ CL(X) is invertible, then there exists a unique B ∈ CL(X) such that AB = I = BA.
The unique inverse of an invertible A ∈ CL(X) is denoted by $A^{-1}$ ∈ CL(X).
Proof. If B1, B2 ∈ CL(X) satisfy AB1 = I = B1A and AB2 = I = B2A, then B1 = IB1 = (B2A)B1 = B2(AB1) = B2I = B2.
Proposition 2.3. If A ∈ CL(X) is invertible, then A is bijective.
Proof. If x, y ∈ X are such that Ax = Ay, then $A^{-1}(Ax) = A^{-1}(Ay)$, that is, Ix = Iy, and so x = y. Thus A is injective/one-to-one.
If y ∈ X, then $x := A^{-1}y$ ∈ X, and so $Ax = A(A^{-1}y) = Iy = y$. Hence A is surjective/onto too.
If A ∈ CL(X) is bijective, then the inverse map is automatically a linear transformation. In the case when dim X < ∞, we have L(X) = CL(X). So in this case the inverse is automatically continuous too. So if dim X < ∞, then A ∈ CL(X) is invertible if and only if A is a bijection.
In the infinite dimensional case, is it still true that if A ∈ CL(X) is a bijection, then A must be invertible? The answer is “yes” if X is a Banach space. The proof is not immediate, and we will show this below, using a deep result called the “Open Mapping Theorem”. But first, let us see an example showing that in non-Banach spaces, the inverses of continuous bijections may fail to be continuous.
Example 2.16. (Bijection, but not invertible.)
Recall that c00 is the subspace of ℓ∞ of all finitely supported sequences. Consider the map A : c00 → c00 given by
$$A(x_1, x_2, x_3, \cdots) = \Big(x_1,\ \frac{x_2}{2},\ \frac{x_3}{3},\ \cdots\Big), \quad (x_1, x_2, x_3, \cdots) \in c_{00}.$$
Then A is linear, and continuous (because ||Ax||∞ ≤ ||x||∞ for all x ∈ c00). It is also easily seen that A is injective and surjective. So A is a bijection. However, it is not invertible. Indeed, if otherwise, B ∈ CL(c00) is the inverse, then we would have, with em := (0, ···, 0, 1, 0, ···) (mth term 1, all others 0), that
$$m = \|m\,e_m\|_\infty = \|BA(m\,e_m)\|_\infty = \|Be_m\|_\infty \le \|B\|\,\|e_m\|_\infty = \|B\|,$$
giving m ≤ ||B|| for all m ∈ N, a contradiction. But we aren’t shocked by this example, since c00 is not complete with the supremum norm, and the equivalence of bijectivity with invertibility is supposed to hold for operators in a Banach space.
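The unboundedness of the inverse map can be made concrete. The Python sketch below assumes that A divides the nth term of a finitely supported sequence by n, so that its set-theoretic inverse multiplies the nth term by n; lists stand for finitely supported sequences, exact fractions avoid rounding, and all names are illustrative.

```python
from fractions import Fraction

def sup_norm(x):
    """Supremum norm of a finitely supported sequence stored as a list."""
    return max((abs(t) for t in x), default=0)

def A(x):
    """Assumed form of A: the n-th term is divided by n."""
    return [Fraction(t) / n for n, t in enumerate(x, start=1)]

def A_inverse(x):
    """The inverse map: the n-th term is multiplied by n."""
    return [Fraction(t) * n for n, t in enumerate(x, start=1)]

def e(m):
    """e_m: m-th term 1, all others 0."""
    return [0] * (m - 1) + [1]

x = [3, -1, 4]                                   # hypothetical test sequence
round_trip = A_inverse(A(x))                     # gives back x

# Each e_m has sup norm 1, but its image under the inverse has sup norm m.
growth = [sup_norm(A_inverse(e(m))) for m in (1, 10, 100, 1000)]

print(round_trip)
print(growth)
```

Since the images of the unit vectors have arbitrarily large norm, no constant M can satisfy ||Bx||∞ ≤ M||x||∞ for all x ∈ c00.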
Exercise 2.20. (When is the diagonal operator invertible?)
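Let (λn)n∈N be a bounded sequence in K, and consider Λ ∈ CL(ℓ2) given by
$$\Lambda(a_1, a_2, a_3, \cdots) = (\lambda_1 a_1, \lambda_2 a_2, \lambda_3 a_3, \cdots), \quad (a_1, a_2, a_3, \cdots) \in \ell^2.$$
Show that Λ is invertible in CL(ℓ2) if and only if
$$\inf_{n \in \mathbb{N}} |\lambda_n| > 0.$$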
Exercise 2.21. Let X be a normed space, and suppose that A, B ∈ CL(X).
Show that if I + AB is invertible, then I + BA is also invertible, with the inverse $(I + BA)^{-1}$ given by $I - B(I + AB)^{-1}A$.
Remark. This identity can be used to show that the nonzero points of the spectra of AB and BA coincide. Here λ is said to be in the spectrum of an operator T if λI – T is not invertible in CL(X).
Exercise 2.22. ([A, B] can’t be “large” for A, B ∈ CL(X).)
(1)The trace, tr(A), of a square matrix A = [aij] ∈ Cd×d is the sum of its diagonal entries: tr(A) = a11 + ··· + add. It can be shown that tr(A + B) = tr(A) + tr(B) and that tr(AB) = tr(BA). Prove that there cannot exist A, B in Cd×d such that AB – BA = I, where I denotes the d × d identity matrix.
(2)Let X be a normed space, and A, B be in CL(X). Show that if AB – BA = I, then for all n ∈ N, $AB^n - B^nA = nB^{n-1}$, where we set $B^0 := I$. Taking the operator norm on both sides of $AB^n - B^nA = nB^{n-1}$, conclude that we can never have AB – BA = I with A, B ∈ CL(X).
(3)Let C∞(R) denote the set of all functions f : R → R such that for all n ∈ N, the nth derivative $f^{(n)}$ exists. It is clear that C∞(R) is a vector space with pointwise operations. Consider the operators A, B : C∞(R) → C∞(R) given as follows:
$$(Af)(x) = f'(x), \quad (Bf)(x) = x f(x), \quad x \in \mathbb{R},\ f \in C^\infty(\mathbb{R}).$$
(The operators A and B appear as the momentum operator and the position operator in Quantum Mechanics.) Show that AB – BA = I, where I denotes the identity on C∞(R).
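On polynomials the identity AB – BA = I can be tested mechanically. The sketch below assumes that A is differentiation and B is multiplication by the independent variable, and it only treats polynomials, stored as coefficient lists [c0, c1, c2, ...]; it is a sanity check, not a proof, and the helper names are illustrative.

```python
def differentiate(p):
    """A: the derivative of c0 + c1 x + c2 x^2 + ... as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def multiply_by_x(p):
    """B: multiplication by x shifts every coefficient up by one degree."""
    return [0] + list(p)

def subtract(p, q):
    """Coefficientwise difference p - q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def trim(p):
    """Remove trailing zero coefficients (keeping at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def commutator(p):
    """(AB - BA)p = (x p)' - x p'."""
    return trim(subtract(differentiate(multiply_by_x(p)),
                         multiply_by_x(differentiate(p))))

p = [1, 2, 3]            # hypothetical polynomial 1 + 2x + 3x^2
print(commutator(p))     # gives back the coefficients of p
```

This is just the product rule in disguise: (x f)′ – x f′ = f.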
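Theorem 2.9. (Neumann Series Theorem).
Let X be a Banach space, and A ∈ CL(X) be such that ||A|| < 1.
Then (1) I – A is invertible in CL(X),
(2) $(I - A)^{-1} = I + A + A^2 + \cdots + A^n + \cdots = \sum_{n=0}^{\infty} A^n$, and
(3) $\|(I - A)^{-1}\| \le \dfrac{1}{1 - \|A\|}$.
In particular, I – A : X → X is bijective: for each y ∈ X, there exists a unique solution x ∈ X of the equation x – Ax = y, and moreover,
$$\|x\| \le \frac{1}{1 - \|A\|}\,\|y\|,$$
so that x depends continuously on y.
This plays a role in integral equation theory:
$$x(t) - \int_a^b k(t, s)\,x(s)\,ds = y(t), \quad t \in [a, b],$$
where y, k are given, and x is the unknown function.
(This is called the Fredholm equation of the second kind.)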
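Proof. (Of the Neumann Series Theorem). For all n ∈ N, $\|A^n\| \le \|A\|^n$.
As ||A|| < 1, the geometric series $\sum_{n=0}^{\infty} \|A\|^n$ converges. By comparison, $\sum_{n=0}^{\infty} \|A^n\|$ converges too.
As X is Banach, so is CL(X). Since all absolutely convergent series in the Banach space CL(X) converge, it follows that
$$S := \sum_{n=0}^{\infty} A^n$$
converges in CL(X). Is this S the inverse of I – A? For n ∈ N, define
$$S_n := I + A + A^2 + \cdots + A^n.$$
Then we know that $\lim_{n \to \infty} S_n = S$ in CL(X). We have
$$AS_n = S_nA = A + A^2 + \cdots + A^{n+1} = S_{n+1} - I.$$
Since ||ASn – AS|| ≤ ||A|| ||Sn – S|| and ||SnA – SA|| ≤ ||A|| ||Sn – S||, it follows, by passing to the limit as n → ∞, that SA = AS = S – I. This gives (I – A)S = I = S(I – A). Hence I – A is invertible in CL(X) and
$$(I - A)^{-1} = S = \sum_{n=0}^{\infty} A^n.$$
Moreover,
$$\|(I - A)^{-1}\| = \|S\| \le \sum_{n=0}^{\infty} \|A^n\| \le \sum_{n=0}^{\infty} \|A\|^n = \frac{1}{1 - \|A\|}.$$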
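The partial sums of the Neumann series can be computed directly in the finite dimensional case X = R2. The following Python sketch uses a hypothetical 2 × 2 matrix A; its Hilbert-Schmidt norm is √0.30 < 1, which bounds the operator norm, so the theorem applies. The helper names are illustrative, not from the text.

```python
def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def mat_add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def mat_sub(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def max_abs_entry(P):
    return max(abs(t) for row in P for t in row)

def neumann_partial_sum(A, n):
    """S_n = I + A + A^2 + ... + A^n."""
    S = identity(len(A))
    power = identity(len(A))
    for _ in range(n):
        power = mat_mul(power, A)   # next power of A
        S = mat_add(S, power)
    return S

A = [[0.2, 0.1], [0.3, 0.4]]        # hypothetical matrix with ||A|| < 1
I2 = identity(2)

# Size of (I - A) S_n - I, which equals -A^(n+1) and hence tends to 0.
residuals = [max_abs_entry(mat_sub(mat_mul(mat_sub(I2, A),
                                           neumann_partial_sum(A, n)), I2))
             for n in (1, 5, 50)]
S50 = neumann_partial_sum(A, 50)

print(residuals)
print(S50)                           # close to the exact inverse of I - A
```

For this A the exact inverse of I – A is (1/0.45)[[0.6, 0.1], [0.3, 0.8]], and S50 agrees with it up to rounding error.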
Exercise 2.23. Consider the system
in the unknown variables (x1, x2) ∈ R2. If I denotes the 2 × 2 identity matrix, then this system can be written as (I – K)x = y, where
(1)Show that if R2 is equipped with the norm || · ||2, then ||K|| < 1.
Conclude that (2.3) has a unique solution (denoted by x in the sequel).
(2)Find out the unique solution x by computing (I – K)–1.
(3)Write a computer program to compute xn = (I + K + ··· + Kn)y and the relative error ||x – xn||2/||x||2 for various values of n (say, until the relative error is less than 1%). Note the slow convergence of the Neumann series.
Exercise 2.24. Let X be a Banach space, and let A ∈ CL(X) be such that ||A|| < 1.
For n ∈ N, let $P_n := (I + A)(I + A^2)(I + A^4) \cdots (I + A^{2^n})$.
(1)Using induction, show that $(I - A)P_n = I - A^{2^{n+1}}$ for all n ∈ N.
(2)Prove that (Pn)n∈N is convergent in CL(X) to $(I - A)^{-1}$.
Exercise 2.25. (∗)(The set of invertibles is open, and $\cdot^{-1}$ is continuous.)
Let X be a Banach space and GL(X) denote the set of all invertible continuous linear transformations on X.
(1)Prove that GL(X) is an open subset of CL(X) in the usual operator norm topology.
(2)Prove that T ↦ $T^{-1}$ is continuous on GL(X), that is, for all T0 ∈ GL(X) and each ε > 0, there exists a δ > 0 such that if T ∈ CL(X) satisfies ||T – T0|| < δ, then T ∈ GL(X) and $\|T^{-1} - T_0^{-1}\| < \varepsilon$.
The exponential of an operator. Let X be a Banach space and let A ∈ CL(X). We will now study the exponential operator $e^A$ ∈ CL(X).
For a ∈ R, one defines the exponential $e^a$ ∈ R by
$$e^a := \sum_{n=0}^{\infty} \frac{1}{n!} a^n = 1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots.$$
The exponential function is useful, because it provides a solution to the initial value problem for the most basic differential equation
$$x'(t) = a\,x(t)\ \ (t \in \mathbb{R}), \quad x(0) = x_0.$$
(Here x(t) ∈ R and x0 ∈ R.) The unique solution is given by $x(t) = e^{ta}x_0$, t ∈ R. This fundamental differential equation arises in all sorts of applications, for example, radioactive decay, Newton’s law of cooling, continuous compound interest, population growth, etc.
For A ∈ CL(X), we will show that an analogous definition,
$$e^A := \sum_{n=0}^{\infty} \frac{1}{n!} A^n = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$$
(where we have simply replaced the little a by capital A!) works, and the series converges in CL(X). Then the map t ↦ $e^{tA}x_0$ provides a solution to the analogous initial value problem, but now in the Banach space X, with the initial condition x0 ∈ X.
Theorem 2.10. Let X be a Banach space, and A ∈ CL(X).
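Then the series $e^A := \sum_{n=0}^{\infty} \frac{1}{n!} A^n$ converges in CL(X).
Proof. The real series $\sum_{n=0}^{\infty} \frac{1}{n!} \|A\|^n$ converges (to $e^{\|A\|}$). Since for n ∈ N we have
$$\Big\| \frac{1}{n!} A^n \Big\| \le \frac{1}{n!} \|A\|^n,$$
by the Comparison Test, the series $\sum_{n=0}^{\infty} \frac{1}{n!} A^n$ converges absolutely. So it converges in the Banach space CL(X).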
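For matrices, the partial sums of the exponential series are easy to compute. The Python sketch below is an illustration under assumptions: the two 2 × 2 matrices N and J are hypothetical test cases, chosen because their exponentials are known in closed form (N² = 0, and J² = –I gives $e^J = (\cos 1)I + (\sin 1)J$), and the helper names are not from the text.

```python
def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def exp_partial_sum(A, terms=30):
    """I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    total = identity(len(A))
    term = identity(len(A))
    for n in range(1, terms):
        # term becomes A^n / n! from A^(n-1) / (n-1)!
        term = [[entry / n for entry in row] for row in mat_mul(term, A)]
        total = [[s + t for s, t in zip(rs, rt)]
                 for rs, rt in zip(total, term)]
    return total

N = [[0.0, 1.0], [0.0, 0.0]]     # nilpotent: N^2 = 0, so e^N = I + N
J = [[0.0, 1.0], [-1.0, 0.0]]    # J^2 = -I, so e^J is a rotation matrix

exp_N = exp_partial_sum(N)
exp_J = exp_partial_sum(J)

print(exp_N)
print(exp_J)
```

Note that exp_N has the entry 1 in the two diagonal positions and 0 in the lower left corner, so it is not obtained by exponentiating the entries of N one by one.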
Remark 2.5. (∗) Recall that when a ∈ R, we have
$$\frac{d}{dt} e^{ta} = a\,e^{ta}.$$
Similarly, it can be shown that when A ∈ CL(X),
$$\frac{d}{dt} e^{tA} = A\,e^{tA} = e^{tA}A.$$
The last equality is not superfluous, since commutativity of multiplication in CL(X) is not always guaranteed, but it turns out that A does commute with $e^{tA}$. Formally, the above result is not surprising, as can be seen by differentiating the series for $e^{tA}$ termwise with respect to t:
$$\frac{d}{dt} \sum_{n=0}^{\infty} \frac{t^n}{n!} A^n = \sum_{n=1}^{\infty} \frac{t^{n-1}}{(n-1)!} A^n = A \sum_{n=1}^{\infty} \frac{t^{n-1}}{(n-1)!} A^{n-1} = A\,e^{tA}.$$
A rigorous justification can be given using the fact that $e^{(t+s)A} = e^{tA}e^{sA}$ for all s, t ∈ R. In general, if A, B ∈ CL(X) commute, that is, AB = BA, then $e^{A+B} = e^A e^B$. This shows that $e^A$ is always invertible in CL(X). Indeed, since A commutes with –A, we have $e^{-A}e^{A} = e^{A-A} = e^{0} = I = e^{A}e^{-A}$.
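Now let x0 ∈ X, A ∈ CL(X), and consider the initial value problem:
$$x'(t) = Ax(t)\ \ (t \in \mathbb{R}), \quad x(0) = x_0.$$
Then $x(t) := e^{tA}x_0$, t ∈ R, solves the initial value problem because
$$x'(t) = \frac{d}{dt}\big(e^{tA}x_0\big) = A\,e^{tA}x_0 = Ax(t),$$
with $x(0) = e^{0A}x_0 = e^{0}x_0 = Ix_0 = x_0$.
Moreover, the solution is unique, since if $\tilde{x}$ is any solution, then
$$\frac{d}{dt}\big(e^{-tA}\tilde{x}(t)\big) = -A\,e^{-tA}\tilde{x}(t) + e^{-tA}\tilde{x}'(t) = -e^{-tA}A\tilde{x}(t) + e^{-tA}A\tilde{x}(t) = 0,$$
so that $e^{-tA}\tilde{x}(t) = e^{-0A}\tilde{x}(0) = Ix_0 = x_0$ for all t, giving
$$\tilde{x}(t) = e^{tA}x_0$$
for all t ∈ R. Hence the solution t ↦ $e^{tA}x_0$, t ∈ R, is unique.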
Initial value problems in Banach spaces of the above type arise from initial boundary value problems for partial differential equations and their discretisations. More generally, the operator A in the initial value problem is then “unbounded”, and similar to t ↦ $e^{tA}$, one can then associate a “C0-semigroup $(e^{tA})_{t \ge 0}$ generated by the infinitesimal generator A”. The solution to the initial value problem is given by $x(t) = e^{tA}x_0$ for t ≥ 0. For example, the initial value problem for the diffusion equation with the homogeneous Dirichlet boundary conditions
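$$\frac{\partial u}{\partial t}(x, t) = \frac{\partial^2 u}{\partial x^2}(x, t)\ \ (0 < x < 1,\ t > 0), \quad u(0, t) = 0 = u(1, t), \quad u(x, 0) = u_0(x),$$
gives the initial value problem for the following ordinary differential equation in the Banach space L2[0, 1]:
$$\frac{dx}{dt}(t) = Ax(t)\ \ (t \ge 0), \quad x(0) = u_0,$$
where x(t) = u(·, t) ∈ L2[0, 1], and A : D(A) (⊂ L2[0, 1]) → L2[0, 1] is an unbounded operator given by
$$D(A) = \{ f \in L^2[0, 1] : f', f'' \in L^2[0, 1],\ f(0) = 0 = f(1) \}$$
and
$$Af = f'', \quad f \in D(A).$$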
This completes our (rather long!) Remark 2.5.
Example 2.17. (Computing eA for diagonalisable A). Consider the system
With x = (x1, x2), this system can be written as x′(t) = Ax(t), where
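We know that given the initial condition x0 = (x1(0), x2(0)) ∈ R2, the unique solution is $x(t) = e^{tA}x_0$. This raises the question: how do we compute $e^{tA}$?
There are several ways, but let us consider a method which works for diagonalisable As. First we note that if
$$D = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix}, \quad \text{then} \quad D^n = \begin{bmatrix} \lambda_1^n & & \\ & \ddots & \\ & & \lambda_d^n \end{bmatrix},$$
and so
$$e^D = \sum_{n=0}^{\infty} \frac{1}{n!} D^n = \begin{bmatrix} e^{\lambda_1} & & \\ & \ddots & \\ & & e^{\lambda_d} \end{bmatrix}.$$
Note in particular that $e^0 = I$, and so calculating $e^A$ cannot be the same as taking exponentials of the entries of A!
Now suppose that A is diagonalisable, that is, $A = PDP^{-1}$ where D is diagonal and P is invertible. Then $A^n = PD^nP^{-1}$ and so
$$e^A = \sum_{n=0}^{\infty} \frac{1}{n!} PD^nP^{-1} = P \Big( \sum_{n=0}^{\infty} \frac{1}{n!} D^n \Big) P^{-1} = Pe^DP^{-1}.$$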
Let’s see this method in action when where a, b ∈ R.
By computing the eigenvalues and eigenvectors of A, we can write
In particular, our initial value problem for (2.4) has the solution (putting a = 1 and b = 2 above)
So we’ve seen how to compute $e^A$ if the matrix A is diagonalisable. However, not all matrices are diagonalisable. For example, consider the matrix
$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
The eigenvalues of this matrix are both 0, and so if it were diagonalisable, say $A = PDP^{-1}$, then the diagonal matrix D must be the zero matrix. But then $A = PDP^{-1} = P0P^{-1} = 0$, and we have arrived at a contradiction since A ≠ 0! So this A is not diagonalisable.
In general, however, every matrix has what is called a Jordan canonical form, that is, there exists an invertible P such that $P^{-1}AP = D + N$, where D is diagonal, N is nilpotent (that is, there exists an n ∈ N such that $N^n = 0$), and D and N commute. Then the exponential of A is:
$$e^A = Pe^{D+N}P^{-1} = Pe^De^NP^{-1} = Pe^D \Big( I + N + \frac{1}{2!}N^2 + \cdots + \frac{1}{(n-1)!}N^{n-1} \Big) P^{-1}.$$
But the computation of a P taking A to its Jordan form requires some sophisticated linear algebra, and we won’t treat this here. The interested reader is referred to [Hirsch and Smale (1974), Chapter 6].
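When the splitting into a diagonal and a nilpotent part is visible by inspection, the recipe can be tried without any sophisticated linear algebra. The Python sketch below treats a hypothetical 2 × 2 matrix that is already in Jordan form (so P = I): A = D + N with D = 2I and N² = 0, which gives the closed form e²(I + N). This closed form is compared with a partial sum of the exponential series; all names are illustrative.

```python
import math

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def exp_partial_sum(A, terms=40):
    """Partial sum I + A + A^2/2! + ... of the exponential series."""
    total = identity(len(A))
    term = identity(len(A))
    for n in range(1, terms):
        term = [[entry / n for entry in row] for row in mat_mul(term, A)]
        total = [[s + t for s, t in zip(rs, rt)]
                 for rs, rt in zip(total, term)]
    return total

# Hypothetical example: A = D + N with D = 2I and N = [[0, 1], [0, 0]].
A = [[2.0, 1.0], [0.0, 2.0]]

# D and N commute and N^2 = 0, so e^A = e^D e^N = e^2 (I + N).
e2 = math.exp(2.0)
closed_form = [[e2, e2], [0.0, e2]]

series = exp_partial_sum(A)
error = max(abs(series[i][j] - closed_form[i][j])
            for i in range(2) for j in range(2))
print(series, closed_form, error)
```

The two matrices agree up to rounding error, and the nilpotent part contributes only the finite sum I + N.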
Exercise 2.26. ($e^{A+B} \ne e^A e^B$).
Compute $e^A$ and $e^B$, where A, B are the nilpotent matrices
Give an example of matrices A, B ∈ R2×2 for which $e^{A+B} \ne e^A e^B$.
In this section, we will show Theorem 2.11, the “Open Mapping Theorem”. The proofs in this section are somewhat more technical than the rest of the sections of this chapter.
Definition 2.9. (Open map) Let X, Y be normed spaces.
T ∈ CL(X, Y) is called open if for all open sets U ⊂ X, T(U) is open in Y.
Proposition 2.4.
Let X, Y be normed spaces, T ∈ CL(X, Y), and B := {x ∈ X : ||x|| ≤ 1}. Then the following are equivalent:
(1)T is open.
(2)There exists a δ > 0 such that B(0Y, δ) ⊂ T(B).
Proof.
(2)⇒(1): Suppose that there exists a δ > 0 such that B(0Y, δ) ⊂ T(B). Let U be open in X. If y0 ∈ T(U), then y0 = Tx0 for some x0 ∈ U. As U is open, there exists an r > 0 such that the open ball B(x0, r) with centre x0 and radius r is contained in U. We claim that the open ball B(y0, δr/2) is contained in T(U). If y ∈ B(y0, δr/2), then ||y – y0|| < δr/2, that is, ||(2/r)(y – y0)|| < δ, and so (2/r)(y – y0) ∈ B(0Y, δ) ⊂ T(B). Hence there exists an x ∈ B such that (2/r)(y – y0) = Tx, that is, we have y = T((r/2)x + x0). But as ||((r/2)x + x0) – x0|| = (r/2)||x|| ≤ (r/2) · 1 < r, we see that (r/2)x + x0 ∈ B(x0, r) ⊂ U. Consequently, y ∈ T(U), as desired.
(1)⇒(2): Suppose that T is open. Then T(B(0X, 1)), the image of the open set B(0X, 1), must be open. But 0Y = T0X ∈ T(B(0X, 1)), and so, there must exist a δ > 0 such that the open ball B(0Y, δ) ⊂ T(B(0X, 1)) ⊂ T(B), as wanted.
Lemma 2.1. (Baire Lemma)
Let (1) X be a Banach space, and
(2) (Fn)n∈N be a sequence of closed sets in X such that $X = \bigcup_{n \in \mathbb{N}} F_n$.
Then there exist an n ∈ N and a nonempty open set U such that U ⊂ Fn.
Proof. We assume none of the sets Fn contain a nonempty open subset and construct a Cauchy sequence that converges to a point, which lies in none of the Fn, contradicting the fact that the Fns cover X.
First let us observe that whenever a closed set F in X does not contain any nonempty open set, its complement X∖F is dense in X. (To see this, let x ∈ X, and r > 0. We’d like to show that B(x, r) ∩ (X∖F) ≠ ∅. If x ∈ X∖F, then x ∈ B(x, r) ∩ (X∖F), and we are done. On the other hand, if x ∉ X∖F, then x ∈ F. But as F doesn’t contain any nonempty open set, it won’t, in particular, contain B(x, r). So there must be an element y in B(x, r) which is not in F. But this means that y ∈ X∖F, and so we’ve got y ∈ B(x, r) ∩ (X∖F), as wanted.) By our assumption, it follows that X∖Fn is dense in X for all n ∈ N.
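Let x1 be any element in the nonempty (dense!) open set X∖F1. Let r1 > 0 be such that the closed ball $\overline{B}(x_1, r_1)$ := {y ∈ X : ||y – x1|| ≤ r1} satisfies $\overline{B}(x_1, r_1)$ ⊂ X∖F1. As X∖F2 is dense in X, there exists an x2 ∈ B(x1, r1) ∩ (X∖F2). As B(x1, r1) ∩ (X∖F2) is open, we can find an r2 < r1/2 such that $\overline{B}(x_2, r_2)$ ⊂ B(x1, r1) ∩ (X∖F2). As X∖F3 is dense in X, there exists an x3 ∈ B(x2, r2) ∩ (X∖F3). As B(x2, r2) ∩ (X∖F3) is open, we can find an r3 < r1/4 such that $\overline{B}(x_3, r_3)$ ⊂ B(x2, r2) ∩ (X∖F3).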
Proceeding in this manner, we obtain a sequence (xn)n∈N, with the term xn+1 ∈ B(xn, rn). If n > m, then B(xn, rn) ⊂ B(xm, rm), and so we have $\|x_n - x_m\| < r_m \le r_1/2^{m-1}$, which tends to 0 as m → ∞. Thus (xn)n∈N is Cauchy, and as X is Banach, also convergent, say, to x ∈ X. With a fixed m, in the inequality above, if we pass to the limit as n → ∞, then we obtain ||x – xm|| ≤ rm, that is, x belongs to the closed ball $\overline{B}(x_m, r_m)$ ⊂ X∖Fm. As the choice of m ∈ N was arbitrary, for all m ∈ N, x ∉ Fm. But this contradicts the fact that the Fms cover X.
Exercise 2.27.
Show that a Hamel basis of a Banach space can only be finite or uncountable.
Before proving the Open Mapping Theorem, we’ll give some notation and a useful technical result. For subsets A, B of a normed space X and a scalar α, we set αA := {αa : a ∈ A}, and A + B := {a + b : a ∈ A, b ∈ B}.
Lemma 2.2. Let X be a normed space, and A ⊂ X satisfy
(1)A is symmetric, that is, –A = A,
(2)A is mid-point convex, that is, for all x, y ∈ A, (x + y)/2 ∈ A, and
(3)there is a nonempty open set U ⊂ A.
Then there exists a δ > 0 such that B(0, δ) ⊂ A.
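Proof. First note that for a fixed scalar α ≠ 0, and an a ∈ X, the maps x ↦ x + a : X → X and x ↦ αx : X → X, are both continuous, with continuous inverses (x ↦ x – a and x ↦ $\alpha^{-1}x$, respectively).
Hence if U is open in X, then U + {–a} is open in X.
So $U + (-A) = \bigcup_{a \in A} (U + \{-a\})$ is open in X. Thus
$$\tfrac{1}{2}\big(U + (-A)\big)$$
is open in X.
If a ∈ U, then
$$0 = \frac{a + (-a)}{2} \in \tfrac{1}{2}\big(U + (-A)\big).$$
Moreover, as U ⊂ A = –A and A is mid-point convex, $\tfrac{1}{2}(U + (-A)) \subset \tfrac{1}{2}(A + A) \subset A$. Thus there exists a δ > 0 such that B(0, δ) ⊂ $\tfrac{1}{2}(U + (-A))$ ⊂ A.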
Theorem 2.11. (Open Mapping Theorem).
Let X, Y be Banach spaces, and T ∈ CL(X, Y) be surjective.
Then T is open.
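Proof. Let B := {x ∈ X : ||x|| ≤ 1}. Then $X = \bigcup_{n \in \mathbb{N}} nB$. Thanks to the surjectivity of T, we have $Y = \bigcup_{n \in \mathbb{N}} T(nB)$. Thus certainly $Y = \bigcup_{n \in \mathbb{N}} \overline{T(nB)}$, where the overline denotes closure in Y. It can be checked that $\overline{T(nB)} = n\overline{T(B)}$. By the Baire Lemma, there exists an n ∈ N such that $n\overline{T(B)}$ contains a nonempty open set. But since the map y ↦ ny : Y → Y is continuous with a continuous inverse, it follows that $\overline{T(B)}$ contains a nonempty open set too. By Lemma 2.2, there exists a δ > 0 such that B(0Y, δ) ⊂ $\overline{T(B)}$. We will now show that this implies
$$B(0_Y, \delta/2) \subset T(B), \tag{2.5}$$
giving the required openness of T by Proposition 2.4. Let y be such that ||y|| < δ/2. We must show that there exists an x ∈ B with y = Tx. Using B(0Y, δ) ⊂ $\overline{T(B)}$, it can be seen that
$$B(0_Y, \delta/2^n) \subset \overline{T\big(\tfrac{1}{2^n}B\big)} \quad \text{for all } n \in \mathbb{N}. \tag{2.6}$$
From (2.6), with n = 1, it follows that we can arbitrarily closely approximate y by elements from T(B/2). Thus there exists an x1 with ||x1|| ≤ 1/2 such that ||y – Tx1|| < δ/4, that is, y – Tx1 ∈ B(0, δ/4). From (2.6) again it follows (with n = 2) that we can arbitrarily closely approximate y – Tx1 by an element Tx2 with ||x2|| ≤ 1/4: ||y – Tx1 – Tx2|| < δ/8. Proceeding in this manner, we can inductively construct a sequence (xn)n∈N such that: $\|x_n\| \le 1/2^n$ and $\|y - Tx_1 - Tx_2 - \cdots - Tx_{n-1}\| < \delta/2^n$.
As $\|x_n\| \le 1/2^n$, the series $\sum_{n=1}^{\infty} x_n$ is absolutely convergent, and
$$\Big\| \sum_{n=1}^{\infty} x_n \Big\| \le \sum_{n=1}^{\infty} \|x_n\| \le \sum_{n=1}^{\infty} \frac{1}{2^n} = 1.$$
If we denote the sum of the series $\sum_{n=1}^{\infty} x_n$ by x, then
$$Tx = T\Big( \lim_{n \to \infty} \sum_{k=1}^{n} x_k \Big) = \lim_{n \to \infty} (Tx_1 + \cdots + Tx_n) = y,$$
thanks to the continuity of T. Since ||x|| ≤ 1, this proves the desired inclusion (2.5).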
Corollary 2.4. If X, Y are Banach spaces, and T ∈ CL(X, Y) is bijective, then $T^{-1}$ ∈ L(Y, X) is continuous.
We then refer to T as a normed space isomorphism, and say that X, Y are isomorphic (as normed spaces), written X ≅ Y.
Proof. T is open, and so if U is open in X, T(U) is open in Y. But $(T^{-1})^{-1}(U) = \{y \in Y : T^{-1}y \in U\} = \{y \in Y : y \in T(U)\} = T(U)$. Thus the inverse images of open sets under $T^{-1}$ are open, showing that $T^{-1}$ is continuous.
Exercise 2.28. Construct a continuous and surjective, but not open, f : R → R.
Exercise 2.29. (Closed Graph Theorem).
The aim of this exercise is to prove the Closed Graph Theorem:
Let X, Y be Banach spaces and T : X → Y be a linear transformation.
Then T is continuous if and only if its graph G(T) is closed in X × Y.
Here X × Y has the norm ||(x, y)|| := max{||x||, ||y||}, (x, y) ∈ X × Y, and the set G(T) := {(x, Tx) : x ∈ X} ⊂ X × Y is the graph of T.
The “only if” part is easy to see. If (xn, Txn) → (x, y), then xn → x, and as T is continuous, ||Txn – Tx|| ≤ ||T|| ||xn – x||, so that Txn → Tx. But Txn → y, and so, by the uniqueness of limits, Tx = y. Thus (xn, Txn) → (x, Tx) ∈ G(T), showing that G(T) is closed.
Show the “if” part. Hint: Consider p : G(T) → X, where p((x, Tx)) = x, x ∈ X.
We give below another important application of the Baire Lemma.
Theorem 2.12. (Uniform Boundedness Principle).
Suppose that
(1)X and Y are Banach spaces,
(2)Ti ∈ CL(X, Y), i ∈ I, is a “pointwise bounded” family, that is, for all x ∈ X, $\sup_{i \in I} \|T_i x\| < +\infty$.
Then the family is “uniformly bounded”, that is, $\sup_{i \in I} \|T_i\| < +\infty$.
Proof. For n ∈ N, $F_n := \{x \in X : \sup_{i \in I} \|T_i x\| \le n\} = \bigcap_{i \in I} \{x \in X : \|T_i x\| \le n\}$ is mid-point convex, symmetric, and closed, as Fn is the intersection of the mid-point convex, symmetric, and closed sets {x ∈ X : ||Tix|| ≤ n}, i ∈ I.
From (2), we have $X = \bigcup_{n \in \mathbb{N}} F_n$, and so by the Baire Lemma, there exists an n such that Fn contains a nonempty open set. By Lemma 2.2, there exists a δ > 0 such that the ball B(0, δ) with center 0 and radius δ is contained in Fn, that is, if ||x|| < δ, then for all i ∈ I we have ||Tix|| ≤ n. We claim that ||Tix|| ≤ (2n/δ)||x|| for all x ∈ X and all i ∈ I. Clearly this is true if x = 0, since then both sides of the inequality are equal to 0. On the other hand, if x ≠ 0, then $y := \frac{\delta}{2\|x\|}\,x$ has norm ||y|| = δ/2 < δ, and so we must have ||Tiy|| ≤ n, which, using the linearity of Ti and the positive homogeneity of the norm, delivers, upon a rearrangement, the desired inequality. Thus ||Ti|| ≤ 2n/δ for all i ∈ I, and thus $\sup_{i \in I} \|T_i\| \le 2n/\delta$.
Corollary 2.5. (Banach-Steinhaus Theorem).
Let (1) X, Y be Banach spaces, and
(2) (Tn)n∈N in CL(X, Y) be such that lim_{n→∞} Tnx exists for all x ∈ X.
Then x ↦ lim_{n→∞} Tnx : X → Y belongs to CL(X, Y).
Proof. It is clear that the map x ↦ lim_{n→∞} Tnx : X → Y is linear.
It remains to show that it is continuous too. Set Tx := lim_{n→∞} Tnx, x ∈ X. For each x ∈ X, (Tnx)n∈N is convergent, and in particular, bounded: sup_{n∈N} ||Tnx|| < +∞.
Hence by the Uniform Boundedness Principle, there exists an M such that for all n ∈ N, ||Tn|| ≤ M. This gives, for each fixed x ∈ X, that ||Tnx|| ≤ ||Tn|| ||x|| ≤ M||x|| for all n ∈ N.
Passing to the limit n → ∞ yields ||Tx|| ≤ M||x||. As the choice of x was arbitrary, this holds for all x, and consequently, the linear transformation T is continuous.
For a linear transformation T ∈ L(X) on a finite dimensional vector space X over C, the set of eigenvalues of T is known as its spectrum σ(T), and has cardinality at most dim X. But in infinite dimensional complex vector spaces, strange things may happen, for example linear transformations may have no eigenvalues at all or finitely many or (countably/uncountably) infinitely many! First of all, here is a natural definition of eigenvectors and eigenvalues, extending our prior familiarity with eigenvalues from elementary linear algebra. We remind the reader that the prefix eigen is derived from German, meaning “one’s own”.
Definition 2.10. (Eigenvalues and eigenvectors). Let X be a normed space and T ∈ CL(X). Then λ ∈ C is called an eigenvalue of T if there exists a nonzero vector x ∈ X such that Tx = λx. Such a nonzero vector x is then called an eigenvector of T corresponding to the eigenvalue λ.
Example 2.18. (Uncountably many eigenvalues).
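Let λ ∈ D := {z ∈ C : |z| < 1}. If x := (1, λ, λ², λ³, ···), then as |λ| < 1,
$$\|x\|_2^2 = \sum_{n=0}^{\infty} |\lambda|^{2n} = \frac{1}{1-|\lambda|^2} < \infty,$$
and so x ∈ ℓ2. Clearly x ≠ 0 too.
We see that x is an eigenvector of the left shift operator L ∈ CL(ℓ2) because
$$Lx = (\lambda, \lambda^2, \lambda^3, \cdots) = \lambda(1, \lambda, \lambda^2, \cdots) = \lambda x.$$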
Thus each point in the open unit disk6 is an eigenvalue of L.
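The eigenvector relation for the left shift can be checked numerically on a finite truncation of x = (1, λ, λ², ···). The following is only a sketch: the value of `lam`, the truncation length, and the helper names are illustrative choices, and a truncated list can of course only imitate an element of ℓ2.

```python
def left_shift(x):
    """Left shift L(x1, x2, x3, ...) = (x2, x3, ...) on a finite truncation."""
    return x[1:]

def geometric_vector(lam, n_terms):
    """First n_terms entries of x = (1, lam, lam^2, ...)."""
    return [lam ** k for k in range(n_terms)]

lam = 0.3 + 0.4j                 # illustrative point of the open unit disk, |lam| = 0.5
x = geometric_vector(lam, 60)

# L x agrees with lam * x on every coordinate that survives the truncation.
Lx = left_shift(x)
residual = max(abs(Lx[k] - lam * x[k]) for k in range(len(Lx)))

# Partial sums of ||x||_2^2 approach the closed form 1 / (1 - |lam|^2).
norm_sq = sum(abs(t) ** 2 for t in x)
closed_form = 1.0 / (1.0 - abs(lam) ** 2)

print(residual, norm_sq, closed_form)
```

Any other `lam` with modulus below 1 can be substituted; for moduli close to 1 more terms are needed before the partial sums settle.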
Example 2.19. (No eigenvalues). On the other hand, the right shift operator R ∈ CL(ℓ2) has no eigenvalues. Suppose that λ ∈ C is such that Rx = λx for some x = (xn)n∈N ∈ ℓ2. Then
(0, x1, x2, x3, ···) = (λx1, λx2, λx3, ···).
Suppose first that λ ≠ 0. Then from the above, λx1 = 0 gives x1 = 0. Next, λx2 = x1 now gives x2 = 0. Proceeding in this manner, we obtain x1 = x2 = x3 = ··· = 0, and so x = 0.
On the other hand, if λ = 0, then (0, x1, x2, x3, ···) = (λx1, λx2, λx3, ···) shows immediately that x1 = x2 = x3 = ··· = 0, and so x = 0.
Consequently, R has no eigenvalues.
Note that when dim X < ∞, and T ∈ CL(X), then
λ ∈ C is an eigenvalue of T if and only if λI – T is not invertible.
So the points in the spectrum σ(T) are exactly the ones where λI – T fails to be invertible in CL(X). This prompts the following natural concept in the general case, that is, when dim X ≤ ∞.
Definition 2.11. (Spectrum and resolvent).
Let X be a normed space and T ∈ CL(X).
We say that λ ∈ C belongs to the spectrum σ(T) of T if λI – T is not invertible in CL(X). Thus
σ(T) = C \ ρ(T), where ρ(T) := {λ ∈ C : λI – T is invertible in CL(X)}.
The set ρ(T) is called the resolvent set of T.
The set σp(T) of all eigenvalues of T is called the point spectrum of T.
We have that σp(T) ⊂ σ(T), since if λ ∈ σp(T), then there exists a nonzero vector x such that Tx = λx, that is, (λI – T)x = 0, showing that λI – T is not injective, and hence can’t be invertible either!
We’ll now show that if X is Banach and T ∈ CL(X), then σ(T) is a compact nonempty subset of C.
Theorem 2.13. Let X be a Banach space and T ∈ CL(X).
Then
(1) σ(T) ⊂ {λ ∈ C : |λ| ≤ ||T||}.
(2)ρ(T) is an open subset of C.
(3)σ(T) is a compact subset of C.
(4) σ(T) is nonempty.
Proof. (1) Let |λ| > ||T|| ≥ 0. Then ||T/λ|| = ||T||/|λ| < 1, and so I – T/λ is invertible in CL(X). Thus, as λ ≠ 0, we have that λI – T = λ(I – T/λ) is invertible in CL(X) too.
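(2) Let λ0 ∈ ρ(T). Then for λ ∈ C,
$$\lambda I - T = (\lambda_0 I - T) - (\lambda_0 - \lambda) I = (\lambda_0 I - T)\big(I - (\lambda_0 - \lambda)(\lambda_0 I - T)^{-1}\big).$$
So (λ0I – T)⁻¹(λI – T) = I – A, where A := (λ0 – λ)(λ0I – T)⁻¹.
For λ ≈ λ0, A has small norm, in particular, ||A|| = |λ0 – λ| ||(λ0I – T)⁻¹|| < 1. Hence it follows that (λ0I – T)⁻¹(λI – T) =: S is invertible in CL(X). So we conclude that λI – T = (λ0I – T)S (being the product of two invertible operators in CL(X)) is also invertible in CL(X).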
(3) σ(T) is bounded (as σ(T) ⊂ B(0, ||T||) := {z ∈ C : |z| ≤ ||T||}), and also it is closed (because its complement C\σ(T) = ρ(T) is open). So σ(T) is compact.
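(4)(∗)7 Let σ(T) = ∅. Then f(z) := (zI – T)⁻¹ ∈ CL(X) for all z ∈ C.
In particular, T⁻¹ = –f(0) exists, and is not 0.
Let φ ∈ (CL(X))′ be such that φ(T⁻¹) ≠ 0.
Such a φ exists by the Hahn-Banach Theorem (Exercise 2.38, page 109).
Let g : R2 → C be given by g(r, θ) = φ(f(re^{iθ})), for all (r, θ) ∈ R2.
We will show that g ∈ C1 (R2, C) by showing that it has continuous first order partial derivatives (which will in turn be used in the calculations, and also to justify a differentiation under the integral sign).
Using the resolvent identity (Exercise 2.30, page 102), we have, for h ≠ 0,
$$\frac{g(r+h,\theta) - g(r,\theta)}{h} = \varphi\Big(\frac{f((r+h)e^{i\theta}) - f(re^{i\theta})}{h}\Big) = -e^{i\theta}\,\varphi\big(f((r+h)e^{i\theta})\, f(re^{i\theta})\big).$$
Using continuity of φ and that of the inverse operation (Exercise 2.25, page 88), it follows from the above calculation, that
$$\frac{\partial g}{\partial r}(r,\theta) = -e^{i\theta}\,\varphi\big((f(re^{i\theta}))^2\big).$$
Similarly, $\frac{\partial g}{\partial \theta}(r,\theta) = -i r e^{i\theta}\,\varphi\big((f(re^{i\theta}))^2\big)$, so that $\frac{\partial g}{\partial \theta} = i r \frac{\partial g}{\partial r}$.
Define F : R → C by $F(r) := \int_0^{2\pi} g(r,\theta)\, d\theta$. By differentiating under the integral sign, we obtain
$$F'(r) = \int_0^{2\pi} \frac{\partial g}{\partial r}(r,\theta)\, d\theta.$$
Consequently,
$$i r F'(r) = \int_0^{2\pi} \frac{\partial g}{\partial \theta}(r,\theta)\, d\theta = g(r, 2\pi) - g(r, 0) = 0.$$
So F′(r) = 0 for all r ≠ 0, and by the continuity of F′, also for r = 0. Hence F is constant, and we have
$$F(r) = F(0) = 2\pi\,\varphi(f(0)) = -2\pi\,\varphi(T^{-1}) \quad \text{for all } r \in \mathbb{R}.$$
Now for r > ||T|| and all θ,
$$\|f(re^{i\theta})\| = \frac{1}{r}\Big\|\Big(I - \frac{T}{re^{i\theta}}\Big)^{-1}\Big\| \le \frac{1}{r}\cdot\frac{1}{1 - \|T\|/r} = \frac{1}{r - \|T\|} \longrightarrow 0 \quad \text{as } r \to \infty.$$
Fix r such that |φ(f(re^{iθ}))| < |φ(T⁻¹)|/2 for all θ ∈ [0, 2π]. Then
$$2\pi\,|\varphi(T^{-1})| = |F(r)| \le \int_0^{2\pi} |\varphi(f(re^{i\theta}))|\, d\theta < 2\pi \cdot \frac{|\varphi(T^{-1})|}{2},$$
giving 2 < 1, a contradiction. This completes the proof.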
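Part (1) of the proof rests on the invertibility of I – T/λ when |λ| > ||T||, which comes from the Neumann series. As a rough numerical sketch in the finite dimensional case X = R2, the partial sums of Σ (T/λ)^k can be used to approximate (λI – T)⁻¹. The matrix `T`, the value of `lam`, and the helper functions below are hypothetical choices made only for this illustration.

```python
def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
T = [[0.5, 0.25], [0.0, 0.5]]   # illustrative matrix; its operator norm is below 1
lam = 2.0                       # so |lam| > ||T||

# Partial sums of sum_{k >= 0} (T/lam)^k approximate (I - T/lam)^{-1}.
S = [[0.0, 0.0], [0.0, 0.0]]
power = I2
for _ in range(60):
    S = mat_add(S, power)
    power = mat_mul(power, scale(1.0 / lam, T))

# (lam I - T)^{-1} = (1/lam) (I - T/lam)^{-1}
resolvent = scale(1.0 / lam, S)
lamI_minus_T = mat_add(scale(lam, I2), scale(-1.0, T))
check = mat_mul(lamI_minus_T, resolvent)   # expected to be close to the identity

print(check)
```

Sixty terms are far more than needed here, since the terms decay like (1/4)^k; the count was chosen generously rather than tuned.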
Example 2.20. (Spectrum of the left shift operator).
Consider the left shift operator L ∈ CL(ℓ2). Then ||L|| ≤ 1. So it follows that σ(L) ⊂ {z ∈ C : |z| ≤ 1}. As {z ∈ C : |z| < 1} ⊂ σp(L) ⊂ σ(L), and because σ(L) is closed, it follows that {z ∈ C : |z| ≤ 1} ⊂ σ(L) too. So σ(L) = {z ∈ C : |z| ≤ 1}.
We now claim that σp(L) = {z ∈ C : |z| < 1}. We had seen earlier that {z ∈ C : |z| < 1} ⊂ σp(L). Now we’ll show the reverse inclusion.
To this end, let λ ∈ σp(L) with eigenvector x = (xn)n∈N.
Then (x2, x3, ···) = L(x1, x2, x3, ···) = λ(x1, x2, x3, ···).
So λxn = xn+1 for all n, giving (by induction) xn = λⁿ⁻¹x1 for all n.
As ℓ2 ∋ x ≠ 0, we have
$$0 < \|x\|_2^2 = \sum_{n=1}^{\infty} |x_n|^2 = |x_1|^2 \sum_{n=1}^{\infty} |\lambda|^{2(n-1)} < \infty,$$
so that x1 ≠ 0, and the geometric series with common ratio |λ|² converges. So |λ| < 1, and we get the reverse inclusion σp(L) ⊂ {z ∈ C : |z| < 1}.
We will return to this topic on spectral theory when we deal with operators on a Hilbert space, and also in the context of compact operators.
Exercise 2.30. (Resolvent Identity). Let X be a normed space, T ∈ CL(X) and λ, μ ∈ ρ(T). Prove that (λI – T)⁻¹ – (μI – T)⁻¹ = (μ – λ)(λI – T)⁻¹(μI – T)⁻¹.
Exercise 2.31. (Spectral radius). Let X be a Banach space, and T ∈ CL(X). Define the spectral radius of T by rσ(T) := max{|λ| : λ ∈ σ(T)}.
(1) Prove that rσ(T) ≤ ||T||.
(2)Show that for TA ∈ CL(R2), A := , then rσ(TA) < ||TA||.
Here R2 has the usual Euclidean || · ||2-norm.
Remark. In this connection, the Gelfand-Beurling Formula8 says that:
If X is Banach and T ∈ CL(X), then rσ(T) = lim_{n→∞} ||Tⁿ||^{1/n}.
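To get a feeling for the formula, one can compute ||Tⁿ||^{1/n} for a small matrix. The sketch below uses a hypothetical upper triangular 2×2 matrix `A` (not the matrix from Exercise 2.31) whose only eigenvalue is 0.5, and computes the operator norm with respect to the Euclidean norm as the square root of the largest eigenvalue of the transpose of M times M. The helper names are illustrative.

```python
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(M):
    """Operator norm of a real 2x2 matrix for the Euclidean norm:
    the square root of the largest eigenvalue of M^T M."""
    p = M[0][0] ** 2 + M[1][0] ** 2              # (M^T M)_{11}
    s = M[0][1] ** 2 + M[1][1] ** 2              # (M^T M)_{22}
    q = M[0][0] * M[0][1] + M[1][0] * M[1][1]    # (M^T M)_{12}
    largest = 0.5 * ((p + s) + math.sqrt((p - s) ** 2 + 4.0 * q ** 2))
    return math.sqrt(largest)

def nth_root_norm(A, n):
    """||A^n||^(1/n), computed by repeated multiplication."""
    power = A
    for _ in range(n - 1):
        power = mat_mul(power, A)
    return op_norm(power) ** (1.0 / n)

A = [[0.5, 1.0], [0.0, 0.5]]    # illustrative matrix with spectral radius 0.5
values = {n: nth_root_norm(A, n) for n in (1, 5, 25, 100)}

for n, v in values.items():
    print(n, v)
```

The exponents are kept moderate on purpose: for much larger n the entries of Aⁿ become so small that their squares underflow in floating point, and the computed norm is then no longer reliable.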
Exercise 2.32. Let X be a Banach space, T ∈ CL(X), and λ ∈ σ(T).
Prove that λ² belongs to the spectrum of T².
Hint: Use (λ²I – T²) = (λI – T)(λI + T) = (λI + T)(λI – T).
Remark. More generally, the Spectral Mapping Theorem9 says that:
If X is a Banach space, T ∈ CL(X), p = c0 + c1z + · · · + cd z^d ∈ C[z] (a polynomial with complex coefficients), and p(T) := c0I + c1T + · · · + cd T^d, then we have σ(p(T)) = p(σ(T)) := {p(λ) : λ ∈ σ(T)}.
Exercise 2.33. (Spectrum of the diagonal operator).
Let (λn)n∈N be a sequence in C which is convergent to 0, and consider Λ ∈ CL(ℓ2) given by Λ(a1, a2, a3, · · ·) = (λ1a1, λ2a2, λ3a3, · · ·) for all (a1, a2, a3, · · ·) ∈ ℓ2.
Show that {λn : n ∈ N} ⊂ σp(Λ) ⊂ {λn : n ∈ N} ∪ {0} = σ(Λ).
Remark. (Spectral Theorem for Compact Operators).
In Chapter 5, we will learn that this Λ is an example of a “compact operator”; see Example 5.3 on page 214. More generally, one can show the Spectral Theorem for Compact Operators, which says that for a compact operator K on an infinite dimensional Hilbert space H,
(1) σ(K) \ {0} = σp(K) \ {0}, and σ(K) is countable,
(2) 0 is the only accumulation point of σ(K),
(3) for all λ ∈ σp(K) \ {0}, dim ker(λI – K) = dim ker(λ*I – K*) < ∞.
Exercise 2.34. (Approximate spectrum).
(1) Let X be a Banach space, and T ∈ CL(X). A number λ ∈ C is said to belong to the approximate spectrum σap(T) of T if there exists a sequence (xn)n∈N of vectors from X such that ||xn|| = 1 for all n ∈ N, and Txn – λxn → 0 in X. Prove that σap(T) ⊂ σ(T).
(2) Let Λ ∈ CL(ℓ2) be the diagonal operator corresponding to a convergent (and hence bounded) sequence (λn)n∈N. Prove that lim_{n→∞} λn ∈ σap(Λ).
Exercise 2.35. (Point spectrum of the position operator).
Let X be a normed space, and let A : DA → X be an “unbounded operator10”, where the domain DA is a subspace of X. Then the point spectrum of the unbounded operator A is defined in an analogous manner as before: σp(A) := {λ ∈ C : there exists an x ∈ DA\{0} such that Ax = λx}.
Now consider the position operator Q : DQ → L2(R), arising in Quantum Mechanics, where DQ := {Ψ ∈ L2(R) : (x → xΨ(x)) =: QΨ ∈ L2(R)}, and (QΨ)(x) := xΨ(x), for almost all x ∈ R, and all Ψ ∈ DQ.
Show that σp(Q) = ∅.
Remark. So Q has no eigenvectors in DQ ⊂ L2(R). However, when we learn elementary distribution theory later on in Chapter 6, we’ll see that xδλ = λδλ for all λ ∈ R, where δλ is the “Dirac distribution” with support at λ ∈ R. See Example 6.11 on page 251.
Definition 2.12. (Dual space of a normed space).
Let X be a normed space over K. Then the normed space CL(X, K), equipped with the operator norm, is called the dual space of X. One denotes the dual space of X simply by X′. Elements of the dual space are sometimes called bounded linear functionals.
Recall that a consequence of Theorem 2.8 (page 78) was Corollary 2.2, which says that X′ is always a Banach space, even if X isn’t. This is because K = R or C are both Banach spaces.
Given a concrete X, like Rd or ℓp, it is sometimes possible to “recognize” X′, that is, to establish a (normed space) isomorphism from X′ to some other Banach space, for example (ℓp)′ ≃ ℓq for 1 ≤ p < ∞ and 1/p + 1/q = 1 (Theorem 2.14).
Such results are called representation theorems, and we will see a few such results now, and also later on in the chapter on Hilbert spaces (Chapter 4), when we will learn about the Riesz Representation Theorem, page 189.
Theorem 2.14. For 1 ≤ p < ∞, (ℓp)′ ≃ ℓq, where 1/p + 1/q = 1.
(Here the understanding is that if p = 1, then q = ∞.)
Proof. (Sketch). We consider K = R for simplicity. Let 1 < p < ∞.
By Hölder’s Inequality, |a1b1 + · · · + anbn| ≤ ||(a1, · · ·, an)||p ||(b1, · · ·, bn)||q, with equality if akbk ≥ 0 for all k and (|a1|^p, · · ·, |an|^p) is a multiple of (|b1|^q, · · ·, |bn|^q). Let T ∈ CL(ℓp, R). Let ek ∈ ℓp be the sequence (0, · · ·, 0, 1, 0, · · ·) with kth term 1, and all others 0. Fix n ∈ N. Let a = (a1, · · ·, an, 0, · · ·) ∈ ℓp be such that (|a1|^p, · · ·, |an|^p) is a multiple of (|Te1|^q, · · ·, |Ten|^q) (i.e., ak := sgn(Tek)|Tek|^{q/p}, k = 1, · · ·, n).
Then, if a ≠ 0,
$$\|T\| \ge \frac{|Ta|}{\|a\|_p} = \frac{|a_1 Te_1 + \cdots + a_n Te_n|}{\|a\|_p} = \|(Te_1, \cdots, Te_n)\|_q.$$
Passing to the limit n → ∞, we get (Te1, Te2, Te3, · · ·) ∈ ℓq. So we get a continuous linear transformation ι := (T ↦ (Te1, Te2, Te3, · · ·)) : CL(ℓp, R) → ℓq. It can be checked that this map ι is injective and surjective. As ι is bijective, it is an isomorphism (Corollary 2.4).
If p = 1, then let us now show that (ℓ1)′ ≃ ℓ∞. This is easier to see, since if T ∈ CL(ℓ1, R), then we get immediately that for all k, |Tek| = |Tek|/||ek||1 ≤ ||T||, giving (Tek)k∈N ∈ ℓ∞.
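The equality case of Hölder's Inequality that drives this proof can be checked numerically. In the sketch below the vector `b` stands in for (Te1, · · ·, Ten); its entries, the exponent `p`, and the helper `norm` are hypothetical choices for illustration, and the vector `a` is built so that |ak|^p = |bk|^q with ak bk ≥ 0.

```python
p = 3.0
q = p / (p - 1.0)            # conjugate exponent, so that 1/p + 1/q = 1

def norm(v, r):
    """The r-norm of a finite vector v."""
    return sum(abs(t) ** r for t in v) ** (1.0 / r)

b = [2.0, -1.0, 0.5, 3.0]    # illustrative values playing the role of (Te_1, ..., Te_n)

# a_k := sgn(b_k) |b_k|^(q/p), hence |a_k|^p = |b_k|^q and a_k b_k >= 0
a = [(1.0 if t >= 0 else -1.0) * abs(t) ** (q / p) for t in b]

lhs = abs(sum(s * t for s, t in zip(a, b)))   # |a_1 b_1 + ... + a_n b_n|
rhs = norm(a, p) * norm(b, q)                 # ||a||_p ||b||_q
ratio = lhs / norm(a, p)                      # the lower bound obtained for ||T||

# A generic vector only gives the inequality, not equality.
c = [1.0, 1.0, 1.0, 1.0]
generic_lhs = abs(sum(s * t for s, t in zip(c, b)))
generic_rhs = norm(c, p) * norm(b, q)

print(lhs, rhs, ratio, norm(b, q))
```

With this choice of `a`, the quotient `ratio` reproduces the q-norm of `b`, which is the bound used in the proof sketch.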
Remark 2.6. (The dual space (ℓ∞)′ ≠ ℓ1.)
If a = (an)n∈N ∈ ℓ1, then define the functional φa ∈ CL(ℓ∞, R) = (ℓ∞)′ by
$$\varphi_a(b) = \sum_{n=1}^{\infty} a_n b_n, \quad b = (b_n)_{n\in\mathbb{N}} \in \ell^\infty.$$
Then a ↦ φa : ℓ1 → (ℓ∞)′ is an injective linear transformation. It is continuous since |φa(b)| ≤ ||b||∞||a||1 for all b ∈ ℓ∞, giving ||φa|| ≤ ||a||1. However it is not surjective, and this can be shown by using the Hahn-Banach Theorem (see Theorem 2.15 on page 108), which says that a continuous linear functional on a subspace of a normed space can be extended to the whole normed space while preserving the operator norm of the functional. To see how this gives the non-surjectivity of the map a ↦ φa : ℓ1 → (ℓ∞)′ above, let us consider the subspace c ⊂ ℓ∞ of all convergent sequences, and the “limit functional” λ : c → R, given by
$$\lambda\big((b_n)_{n\in\mathbb{N}}\big) = \lim_{n\to\infty} b_n, \quad (b_n)_{n\in\mathbb{N}} \in c.$$
Then λ ∈ CL(c, R) = c′, and ||λ|| = 1. By the Hahn-Banach Theorem, this functional λ on the subspace c of ℓ∞ has an extension Λ ∈ CL(ℓ∞, R). But now we see that Λ can’t be φa for some a ∈ ℓ1. Otherwise, with en ∈ c ⊂ ℓ∞ being the sequence with nth term 1 and all others 0, we have
$$a_n = \varphi_a(e_n) = \Lambda(e_n) = \lambda(e_n) = 0$$
for all n, showing that a = 0, and so Λ = φa = 0, which is clearly false, since Λ(1, 1, 1, · · ·) = λ(1, 1, 1, · · ·) = 1 ≠ 0! So (ℓ∞)′ is “bigger” than ℓ1.
Remark 2.7. If 1 ≤ p < ∞, then it can be shown that
where
Exercise 2.36. Consider the subspace c0 ⊂ ℓ∞ consisting of all sequences that converge to 0. Prove that ℓ1 ≃ (c0)′.
Exercise 2.37. (∗) (Dual of C[a, b]). In this exercise we will learn a representation of the dual space of C[a, b]. A function μ : [a, b] → R is said to be of bounded variation on [a, b] if its total variation var(μ) on [a, b] is finite, where
$$\operatorname{var}(\mu) := \sup_{P \in \mathcal{P}} \sum_{j=1}^{n} |\mu(t_j) - \mu(t_{j-1})|.$$
Here 𝒫 is the set of all partitions of [a, b]. A partition of [a, b] is a finite set P = {t0, t1, · · ·, tn–1, tn} with t0 := a < t1 < · · · < tn–1 < b =: tn.
(1) Show that the set of all functions of bounded variation on [a, b], with the usual pointwise operations, forms a vector space, denoted BV[a, b].
Define || · || : BV[a, b] → [0, +∞) by ||μ|| := |μ(a)| + var(μ), for μ ∈ BV[a, b].
(2)Prove that || · || is a norm on BV[a, b].
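The Riemann-Stieltjes integral: Let x ∈ C[a, b] and μ ∈ BV[a, b]. For a partition of [a, b], say P = {t0, t1, · · ·, tn–1, tn}, let δP be the length of a largest interval [tj–1, tj], that is, δP := max{t1 – t0, · · ·, tn – tn–1}, and set
$$S(P) := \sum_{j=1}^{n} x(t_j)\,\big(\mu(t_j) - \mu(t_{j-1})\big).$$
Then it can be shown that there exists a unique real number, denoted by
$$\int_a^b x(t)\, d\mu(t),$$
called the Riemann-Stieltjes integral of x over [a, b] with respect to μ, such that for every ε > 0 there is a δ > 0 such that if P is a partition of [a, b] satisfying δP < δ, then
$$\Big|\int_a^b x(t)\, d\mu(t) - S(P)\Big| < \varepsilon.$$
The usual linearity of the integral (as with the ordinary Riemann integral) holds:
$$\int_a^b (\alpha x(t) + \beta y(t))\, d\mu(t) = \alpha \int_a^b x(t)\, d\mu(t) + \beta \int_a^b y(t)\, d\mu(t), \quad x, y \in C[a,b],\ \alpha, \beta \in \mathbb{R}.$$
(3) Prove that
$$\Big|\int_a^b x(t)\, d\mu(t)\Big| \le \|x\|_\infty \operatorname{var}(\mu),$$
where x ∈ C[a, b] and μ ∈ BV[a, b].
(4) Conclude that every μ ∈ BV[a, b] gives rise to a φµ ∈ CL(C[a, b], R), given by $\varphi_\mu(x) := \int_a^b x(t)\, d\mu(t)$, x ∈ C[a, b],
and that ||φµ|| ≤ var(μ).
The following converse result was proved by F. Riesz: For all φ ∈ CL(C[a, b], R), there exists a μ ∈ BV[a, b] such that
$$\varphi(x) = \int_a^b x(t)\, d\mu(t) \quad \text{for all } x \in C[a,b],$$
and ||φ|| = var(μ). In other words, every element of (C[a, b])′ can be represented by a Riemann-Stieltjes integral.
(5) For the functional x ↦ x(a) on C[a, b], find a corresponding μ ∈ BV[a, b].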
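The approximating sums for the Riemann-Stieltjes integral are easy to experiment with. The sketch below assumes that each term evaluates x at the right endpoint tj of the subinterval [tj–1, tj] (for continuous x the limit does not depend on this choice), and it uses the illustrative data x(t) = t and μ(t) = t² on [0, 1], for which the integral equals the ordinary integral of 2t² over [0, 1], namely 2/3. The function names are hypothetical.

```python
def riemann_stieltjes_sum(x, mu, partition):
    """Sum over j of x(t_j) * (mu(t_j) - mu(t_{j-1})) for the partition t_0 < ... < t_n.
    Evaluating x at the right endpoint is an assumption of this sketch."""
    return sum(
        x(partition[j]) * (mu(partition[j]) - mu(partition[j - 1]))
        for j in range(1, len(partition))
    )

def variation_on(mu, partition):
    """Sum over j of |mu(t_j) - mu(t_{j-1})|, a lower bound for var(mu)."""
    return sum(
        abs(mu(partition[j]) - mu(partition[j - 1]))
        for j in range(1, len(partition))
    )

a, b, n = 0.0, 1.0, 2000
P = [a + (b - a) * j / n for j in range(n + 1)]   # uniform partition of [a, b]

x = lambda t: t          # illustrative continuous integrand
mu = lambda t: t * t     # illustrative increasing integrator, var(mu) = mu(1) - mu(0) = 1

approx = riemann_stieltjes_sum(x, mu, P)
var_P = variation_on(mu, P)

print(approx, var_P)
```

For this monotone μ the sum `var_P` already equals the total variation, so the output also illustrates the estimate from part (3): the approximate integral is bounded by the supremum of |x|, which is 1 here, times the variation.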
Dual spaces are important because, among other things, they allow us to define dual operators. Here is the definition.
Definition 2.13. (Dual operator).
Let X, Y be normed spaces, and T ∈ CL(X, Y). We define the dual operator (of T), T′ ∈ CL(Y′, X′), by (T′ψ)(x) = ψ(Tx), for all x ∈ X and ψ ∈ Y′.
Several things need to be checked here:
(1)For ψ ∈ Y′, does T′ψ belong to X′?
(2)Does T′ ∈ CL(Y′, X′)?
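Let us begin with (1). If ψ ∈ Y′, then we have that
(L1) for all x1, x2 ∈ X,
(T′ψ)(x1 + x2) = ψ(T(x1 + x2)) = ψ(Tx1 + Tx2) = ψ(Tx1) + ψ(Tx2) = (T′ψ)(x1) + (T′ψ)(x2),
(L2) for all α ∈ K and x ∈ X,
(T′ψ)(αx) = ψ(T(αx)) = ψ(αTx) = αψ(Tx) = α(T′ψ)(x).
Hence T′ψ ∈ L(X, K). Moreover T′ψ is continuous because for all x ∈ X,
|(T′ψ)(x)| = |ψ(Tx)| ≤ ||ψ|| ||Tx|| ≤ ||ψ|| ||T|| ||x||.    (2.7)
Now let’s check (2), that is, that T′ ∈ CL(Y′, X′). We have
(L1) for all ψ1, ψ2 ∈ Y′, for all x ∈ X,
(T′(ψ1 + ψ2))(x) = (ψ1 + ψ2)(Tx) = ψ1(Tx) + ψ2(Tx) = (T′ψ1)(x) + (T′ψ2)(x) = (T′ψ1 + T′ψ2)(x),
and so T′(ψ1 + ψ2) = T′(ψ1) + T′(ψ2),
(L2) for all α ∈ K, for all ψ ∈ Y′, for all x ∈ X,
(T′(αψ))(x) = (αψ)(Tx) = αψ(Tx) = α(T′ψ)(x) = (α(T′ψ))(x),
and so T′(αψ) = α(T′ψ).
Thus T′ is linear. It is also continuous, because (2.7) gives ||T′ψ|| ≤ ||ψ|| ||T|| for all ψ, that is, T′ ∈ CL(Y′, X′) and ||T′|| ≤ ||T||.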
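In finite dimensions the dual operator is just the transpose. As a small sketch, take X = R3, Y = R2, let T be given by a 2×3 matrix `A`, and represent a functional ψ on R2 by a vector `c` via ψ(y) = c · y. Then T′ψ is represented by the transpose of A applied to c, and the defining identity (T′ψ)(x) = ψ(Tx) can be checked directly. All concrete numbers and helper names below are illustrative.

```python
def apply(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def functional(c):
    """The linear functional y -> c . y represented by the vector c."""
    return lambda y: sum(ci * yi for ci, yi in zip(c, y))

A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]          # illustrative T : R^3 -> R^2
c = [2.0, 5.0]                  # represents psi in (R^2)'
x = [1.0, -2.0, 4.0]            # a test vector in R^3

psi = functional(c)
dual_psi = functional(apply(transpose(A), c))   # represents T' psi in (R^3)'

lhs = dual_psi(x)               # (T' psi)(x)
rhs = psi(apply(A, x))          # psi(T x)

print(lhs, rhs)                 # both equal 64.0
```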
Example 2.21. Consider D := (x ↦ x′) : C1[0, 1] → C[0, 1], x ∈ C1[0, 1]. Then D′ : (C[0, 1])′ → (C1[0, 1])′ is given by (D′ψ)(x) = ψ(Dx) = ψ(x′), ψ ∈ (C[0, 1])′, x ∈ C1[0, 1]. But by the result of F. Riesz quoted in Exercise 2.37, every ψ ∈ (C[0, 1])′ can be represented by some element μψ ∈ BV[0, 1], so that
$$\psi(y) = \int_0^1 y(t)\, d\mu_\psi(t), \quad y \in C[0,1].$$
Thus if ψ ∈ (C[0, 1])′, then
$$(D'\psi)(x) = \int_0^1 x'(t)\, d\mu_\psi(t), \quad x \in C^1[0,1],$$
where μψ ∈ BV[0, 1] is the function representing ψ.
Sometimes problems for an operator can be simplified by looking at the dual operator, making the consideration of dual spaces and dual operators a useful endeavour.
Remark 2.8. (Dual versus adjoint operators). When we learn about Hilbert spaces, we will learn about the notion of the adjoint T* ∈ CL(Y, X) of an operator T ∈ CL(X, Y), where X, Y are Hilbert spaces, and we can use the Riesz Representation Theorem (which we will also learn there) to represent elements of Y′, X′ by elements of Y, X. (The next sentence should be read after the discussion of the adjoint operator and the Riesz Representation Theorem.) If X, Y are Hilbert spaces, and T ∈ CL(X, Y), then for Y ∋ yψ ≡ ψ ∈ Y′, we have for all x ∈ X that
(T′ψ)(x) = ψ(Tx) = 〈Tx, yψ〉 = 〈x, T*yψ〉,
where we identified T*yψ ∈ X with the functional x → 〈x, T*yψ〉 : X → K in X′. In this sense the notions of adjoint and dual “coincide” in this context of operators on Hilbert spaces.
Hahn-Banach Theorem11. Finally, we will learn a fundamental result, known as the Hahn-Banach Theorem, which says that X′ always contains sufficiently many elements to separate points of X : for x ≠ y in X, there exists a φ ∈ X′ such that φ(x) ≠ φ(y). In this sense, the elements of X′ play the role of “coordinates” for the points of X (which is the kind of thinking one is used to in elementary linear algebra when X = Kd).
Theorem 2.15. (Hahn-Banach).
Let (1) X be a normed space,
(2)Y ⊂ X be a linear subspace,
(3)φ ∈ CL(Y, K).
Then there exists a Φ ∈ CL(X, K) such that Φ|Y = φ and ||Φ|| = ||φ||.
In other words: “Every continuous linear functional on a subspace Y of a normed space X possesses a norm-preserving extension to the entire normed space X”.
Before proving the Hahn-Banach Theorem, we will now list a few important consequences one obtains from it.
Corollary 2.6. Let X be a normed space and x0 ∈ X. Then there exists an element Φ ∈ X′ such that Φ(x0) = ||x0|| and ||Φ|| = 1.
Proof. Let Y := span {x0} and φ : Y → K be given by φ(y) = α||x0|| for y = αx0 ∈ Y, α ∈ K. Then φ is linear. Moreover, φ is a continuous map because |φ(y)| = |φ(αx0)| = |α||x0||| = ||αx0|| = ||y||, that is, |φ(y)| = ||y|| for all y ∈ Y. Hence ||φ|| = 1. By Hahn-Banach there now exists an extension Φ with the desired property.
As mentioned earlier, once we have the Hahn-Banach Theorem, one has the ability of distinguishing elements of X using elements from X′. This is shown in the two corollaries below.
Corollary 2.7. Let x and y be elements in a normed space X with x ≠ y. Then there exists a Φ ∈ X′ such that Φ(x) ≠ Φ(y). (In other words, X′ separates the points of X.)
Proof. By Corollary 2.6 applied to x – y ≠ 0, we can take Φ ∈ X′ with Φ(x) – Φ(y) = Φ(x – y) = ||x – y|| ≠ 0.
Exercise 2.38. Let X be a complex normed space and x∗ ∈ X\{0}. Show that there exists a φ∗ ∈ CL(X, C) such that φ∗(x∗) ≠ 0.
Remark 2.9. (CL(X, Y) Banach ⇒ Y Banach).
Fix any nonzero x∗ ∈ X. By Exercise 2.38, there exists a φ ∈ CL(X, C), such that φ(x∗) ≠ 0. Let (yn)n∈N be a Cauchy sequence in Y. For n ∈ N, define Tn ∈ CL(X, Y) by
Tnx := (φ(x)/φ(x∗)) yn, x ∈ X.
Then using the linearity of φ, it follows that Tn is linear. Also, Tn is continuous because
||Tnx|| = (|φ(x)|/|φ(x∗)|) ||yn|| ≤ (||φ|| ||yn||/|φ(x∗)|) ||x||, x ∈ X.
A similar computation also gives that for n, m ∈ N,
||Tn – Tm|| ≤ (||φ||/|φ(x∗)|) ||yn – ym||,
showing that (Tn)n∈N is a Cauchy sequence (since (yn)n∈N is Cauchy). As CL(X, Y) is Banach, the Cauchy sequence (Tn)n∈N is convergent, with limit, say, T ∈ CL(X, Y). But for x ∈ X,
||Tnx – Tx|| ≤ ||Tn – T|| ||x||,
and so we have that for all x ∈ X, (Tnx)n∈N converges to Tx.
In particular, with x = x∗, (Tnx∗)n∈N = (yn)n∈N converges to Tx∗ =: y.
Hence Y is a Banach space!
Since X′ is itself a normed space, we know that X′ too has a dual space (X′)′ =: X″, and X″ is called the bidual of X. For x ∈ X, now consider the map φx : X′ → K, given by
φx(ψ) = ψ(x), ψ ∈ X′.
It is clear that φx is linear. Moreover, it is continuous too, since
|φx(ψ)| = |ψ(x)| ≤ ||ψ|| ||x|| = ||x|| ||ψ||, ψ ∈ X′.
We thus see that ||φx|| ≤ ||x||.
If x = 0, the zero vector in X, we have φx = 0, the zero linear transformation in CL(X′, K). So ||φx|| = 0 = ||x|| in this case.
If x ≠ 0, then from Corollary 2.6 it follows that there exists a ψ ∈ X′\{0} for which |ψ(x)| = ||x|| and ||ψ|| = 1. So |φx(ψ)| = ||x|| ||ψ||, and we get the reverse inequality ||φx|| ≥ ||x|| too. Hence ||φx|| = ||x|| in this case as well.
So we have the following third consequence of the Hahn-Banach Theorem:
Corollary 2.8. Let X be a normed space and x ∈ X. Then the map φx on X′, given by φx(ψ) = ψ(x), ψ ∈ X′, has the operator norm ||φx|| = ||x||.
Thus the map x ↦ φx : X → X″ is a linear isometric embedding of X in X″. If we consider the elements of X as bounded linear functionals on X′ (by identifying x with φx), then:
X ⊂ X″,    (2.8)
where the norm of x on X agrees with the norm of φx in X″. Sometimes, the map x ↦ φx from X into X″ is also surjective. In that case, the space X is called reflexive and the inclusion (2.8) is replaced by the equality:
X = X″.
For proving the Hahn-Banach theorem, we will need a few preliminaries. We will first prove the theorem in the case K = R, and then show how the result for the case K = C can be derived from the real case.
In the following lemma, we consider a normed space X over R, and instead of a norm, we consider a more general function p : X → R such that
p(x + y) ≤ p(x) + p(y) for all x, y ∈ X,    (2.9)
p(αx) = αp(x) for all α ≥ 0 and all x ∈ X,    (2.10)
that is, a subadditive and positive-homogeneous functional.
Lemma 2.3. (Hahn-Banach Lemma).
Let X be a normed space over R and p : X → R satisfy (2.9) and (2.10). Furthermore, let Y ⊂ X be a subspace and φ : Y → R be a linear map such that
φ(y) ≤ p(y) for all y ∈ Y.    (2.11)
Then there exists a linear map Φ : X → R such that Φ|Y = φ, and
Φ(x) ≤ p(x) for all x ∈ X.    (2.12)
(That is, there exists a linear extension of φ to X preserving the estimate.)
Proof. (∗) This is a rather technical proof, but the idea of the proof is to extend φ “one dimension at a time”. Let x0 ∈ X\Y. Every vector x ∈ Y + (span {x0}) has a unique decomposition x = αx0 + y, with y ∈ Y and α ∈ R. An extension Φ of φ to Y + (span {x0}) is given by Φ(x) = αr + φ(y), where r, which ought to be Φ(x0), will be chosen so that (2.12) holds, that is:
αr + φ(y) ≤ p(αx0 + y) for all α ∈ R and all y ∈ Y.    (2.13)
Owing to the positive homogeneity of p, it is sufficient to choose r such that (2.13) is satisfied with α = 1 and α = –1:
r + φ(y) ≤ p(x0 + y) for all y ∈ Y,    (2.14)
–r + φ(y) ≤ p(–x0 + y) for all y ∈ Y.    (2.15)
Indeed, once these hold, then multiplication with t > 0 yields
tr + φ(ty) ≤ p(tx0 + ty) and –tr + φ(ty) ≤ p(–tx0 + ty) for all y ∈ Y,
which, in light of the fact that every element from Y can be written in the form ty with y ∈ Y, gives (2.13) with α = ±t ≠ 0. For α = 0, (2.13) is already satisfied according to the hypothesis. Now the inequalities (2.14) and (2.15) are equivalent to the statement:
φ(y) – p(–x0 + y) ≤ r ≤ –φ(z) + p(x0 + z) for all y, z ∈ Y.
But there exists such an r precisely if all numbers φ(y) – p(–x0 + y), with y ∈ Y, lie to the left of all numbers –φ(z) + p(x0 + z), with z ∈ Y, that is,
φ(y) – p(–x0 + y) ≤ –φ(z) + p(x0 + z) for all y, z ∈ Y,    (2.16)
that is, if for all y, z ∈ Y, φ(y) + φ(z) ≤ p(–x0 + y) + p(x0 + z). But this is indeed the case since we have for all y, z ∈ Y that
φ(y) + φ(z) = φ(y + z) ≤ p(y + z) = p((–x0 + y) + (x0 + z)) ≤ p(–x0 + y) + p(x0 + z).
Now from (2.16) it follows that:
sup_{y∈Y} (φ(y) – p(–x0 + y)) ≤ inf_{z∈Y} (–φ(z) + p(x0 + z)),
and it is sufficient to choose, for instance,
r := sup_{y∈Y} (φ(y) – p(–x0 + y)).
(In general, the sup and the inf here are unequal, and we can choose r arbitrarily from an interval.) Now the number r also satisfies (2.14) and (2.15) and thus from (2.13), we have obtained an extension to Y + (span {x0}) such that (2.12) holds.
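The one-dimensional extension step can be made concrete in a toy case. The sketch below takes X = R2 with p the 1-norm, Y = span{(1, 0)}, φ(t, 0) = t and x0 = (0, 1); all of these are hypothetical choices for illustration. The supremum and infimum that bound the admissible values of r are taken over a finite sample of Y only, so in general they are approximations; in this particular example the sample already attains them.

```python
def p(x):
    """Sublinear functional on R^2: the 1-norm (an illustrative choice)."""
    return abs(x[0]) + abs(x[1])

def phi(y):
    """Linear functional on Y = span{(1, 0)}; it satisfies phi(y) <= p(y)."""
    return y[0]

x0 = (0, 1)                                     # a vector outside Y
Y_samples = [(t, 0) for t in range(-50, 51)]    # finite integer sample of Y

# sup over y of phi(y) - p(-x0 + y), and inf over z of -phi(z) + p(x0 + z)
lower = max(phi(y) - p((y[0] - x0[0], y[1] - x0[1])) for y in Y_samples)
upper = min(-phi(z) + p((z[0] + x0[0], z[1] + x0[1])) for z in Y_samples)

def extension(r):
    """Phi(alpha * x0 + y) = alpha * r + phi(y); here alpha = x[1] and y = (x[0], 0)."""
    return lambda x: x[1] * r + x[0]

Phi = extension(lower)     # any r in [lower, upper] would do
grid = [(s, t) for s in range(-5, 6) for t in range(-5, 6)]
dominated = all(Phi(x) <= p(x) for x in grid)

print(lower, upper, dominated)
```

In this example the admissible values of r fill the whole interval from `lower` to `upper`, which illustrates that the extension is in general far from unique.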
Now the idea is that we extend φ one dimension at a time in order to get an extension to the space X. If X were finite dimensional, then it is clear that this can be done. After dim X – dim Y steps we will have obtained a linear transformation Φ : X → R that satisfies (2.12).
In the general case, the proof goes through, in essentially the same manner, by successive one-dimensional extensions, but we won’t be able to get an extension to X in finitely many steps. In order to complete the process, we will use Zorn’s Lemma.
Zorn’s Lemma
Zorn’s Lemma says that a partially ordered set P with the property that every chain has an upper bound in P possesses a maximal element. The terms are explained below.
A partial order on a set P is a relation ≤ on P satisfying
•(transitivity) for all x, y, z ∈ P, x ≤ y, y ≤ z ⇒ x ≤ z,
•(antisymmetry) for all x, y ∈ P, x ≤ y, y ≤ x ⇒ x = y,
•(reflexivity) for all x ∈ P, x ≤ x.
A set with a partial order is called a partially ordered set.
A familiar example is R with the usual relation ≤, but the situation can be much more general: for example, consider R2 with the order: (a, b) ≤ (c, d) if a ≤ c and b ≤ d. This latter example justifies the terminology partial. Indeed, ≤ is not a total order because not every pair of elements can be compared with ≤: we have neither (0, 1) ≤ (1, 0) nor (1, 0) ≤ (0, 1).
A subset C of P is said to be bounded above if there exists an element u ∈ P such that x ≤ u for all x ∈ C. The element u ∈ P is then called an upper bound of C.
A subset C of P is said to be a chain if for all x, y ∈ C, there holds x ≤ y or y ≤ x. Thus on a chain C, ≤ forms a total order since any two elements in C can be compared with ≤. The set R2 with the above order ≤ is not a chain, since neither (0, 1) ≤ (1, 0) nor (1, 0) ≤ (0, 1). However, the diagonal {(x, x) : x ∈ R} is a chain.
An element m ∈ P is called maximal if whenever x ∈ P and m ≤ x we have that x = m.
Zorn’s Lemma (named after the mathematician Max Zorn) is an axiom in Set Theory. It can be shown that it is equivalent with the Axiom of Choice: for every family Ai, i ∈ I, of nonempty sets Ai, there exists a map I ∋ i → xi ∈ Ai.
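The notions of comparability, chain and maximal element can be tried out on finite subsets of R2 with the componentwise order described above. The following sketch uses hypothetical helper names; it is only meant to make the definitions tangible, since on finite sets no appeal to Zorn's Lemma is needed.

```python
from itertools import product

def leq(u, v):
    """Componentwise order on R^2: (a, b) <= (c, d) if a <= c and b <= d."""
    return u[0] <= v[0] and u[1] <= v[1]

def comparable(u, v):
    return leq(u, v) or leq(v, u)

def is_chain(points):
    """True if any two elements of the finite set `points` are comparable."""
    return all(comparable(u, v) for u, v in product(points, repeat=2))

def maximal_elements(points):
    """Elements m such that m <= x implies x == m, within `points`."""
    return [m for m in points if all(x == m for x in points if leq(m, x))]

square = [(0, 0), (0, 1), (1, 0), (1, 1)]
diagonal = [(t, t) for t in range(5)]
corner = [(0, 0), (0, 1), (1, 0)]

print(comparable((0, 1), (1, 0)))   # False: the order is only partial
print(is_chain(square), is_chain(diagonal))
print(maximal_elements(corner))     # two maximal elements, neither is a largest one
```

The last line shows that a maximal element need not be unique, and need not be an upper bound of the whole set.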
In order to apply Zorn’s Lemma to complete the proof of the Hahn-Banach Lemma, we proceed as follows.
Consider the set P of all pairs (Z, ψ), where Z is a subspace of X with Y ⊂ Z ⊂ X, and ψ : Z → R is a linear transformation extending φ such that ψ(z) ≤ p(z) for all z ∈ Z.
We define the partial order ≤ on P by defining (Z, ψ) ≤ (Z′, ψ′) if Z ⊂ Z′ and ψ = ψ′|Z. Then every chain in P has an upper bound, as explained below.
If C is a chain in P, then we can construct an upper bound (ZC, ψC) of C as follows: Let ZC be the union of all subspaces Z, with (Z, ψ) ∈ C and let ψC be the common extension of the linear transformations ψ. More precisely, for z ∈ ZC, there exists a (Z, ψ) ∈ C such that z ∈ Z, and we define ψC(z) = ψ(z). This definition of ψC(z) is independent of the choice of (Z, ψ). Indeed, if (Z′, ψ′) also belongs to C, and z ∈ Z′, then we have (Z, ψ) ≤ (Z′, ψ′) or (Z′, ψ′) ≤ (Z, ψ), and so ψ is the restriction of ψ′ or vice versa. In either case, we have ψ(z) = ψ′(z). The map ψC : ZC → R so defined is linear: Indeed, if z, z′ belong to ZC, then there exists a (Z, ψ) ∈ C such that z ∈ Z and there exists a (Z′, ψ′) ∈ C such that z′ ∈ Z′. We have Z ⊂ Z′ or Z′ ⊂ Z. Suppose that Z′ ⊂ Z (the other case is analogous). Then also z′ ∈ Z so that for α, α′ ∈ R, we have αz + α′z′ ∈ Z ⊂ ZC, and so it follows that ZC is a subspace of X, and ψC(αz + α′z′) = ψ(αz + α′z′) = αψ(z) + α′ψ(z′) = αψC(z) + α′ψC(z′). Finally, ψC satisfies the inequality ψC(z) ≤ p(z), z ∈ ZC, since indeed ψ(z) ≤ p(z) for all z ∈ Z, for all (Z, ψ) ∈ C. Thus we see that (ZC, ψC) belongs to P and that (Z, ψ) ≤ (ZC, ψC) for all (Z, ψ) ∈ C. This completes the proof that every chain in P has an upper bound.
By Zorn’s Lemma, P has a maximal element (Z∗, Φ). Then Z∗ = X. Indeed, if Z∗ ≠ X, then there exists an x∗ ∈ X\Z∗, and then from the first part of the proof of the Hahn-Banach Lemma, it follows that we can extend Φ to Z∗ + (span{x∗}) with the same estimate given by p, contradicting the maximality of (Z∗,Φ). Thus we have a linear Φ : X → R that extends φ : Y → R, while satisfying the estimate (2.12). This completes the proof of the Hahn-Banach Lemma!
We will now apply this Hahn-Banach Lemma to prove the Hahn-Banach Theorem, first of all in the case when K = R.
Proof. (Of the Hahn-Banach Theorem; real case.)
Let φ : Y → R be a continuous linear transformation. Then we have:
|φ(y)| ≤ ||φ|| ||y|| for all y ∈ Y.    (2.17)
Now we apply the Hahn-Banach Lemma with p(x) := ||φ|| ||x||, x ∈ X. From (2.17), we certainly have for all y ∈ Y, φ(y) ≤ |φ(y)| ≤ ||φ|| ||y|| = p(y). Thus, by the Hahn-Banach Lemma, there exists a linear map Φ : X → R, extending φ to X, that moreover satisfies the estimate that for all x ∈ X, Φ(x) ≤ p(x) = ||φ|| ||x||. Replacing x by –x, we obtain –Φ(x) ≤ ||φ|| ||x||, and so for all x ∈ X, |Φ(x)| ≤ ||φ|| ||x||. Hence it follows that Φ is continuous and that ||Φ|| ≤ ||φ||. Since φ is the restriction of Φ, we have, on the other hand, also that
||φ|| = sup{|Φ(y)| : y ∈ Y, ||y|| ≤ 1} ≤ ||Φ||.
Hence ||Φ|| = ||φ||. This proves the Hahn-Banach Theorem in the real case.
The proof for complex scalars can be derived from the real case. We remark that real versions of the Hahn-Banach Theorem were first proved independently by Hahn and by Banach. The complex version was given by Bohnenblust and Sobczyk, following the ideas of Murray.
Proof. (Of the Hahn-Banach Theorem; complex case.)
Let X be a normed space over C. By restricting the multiplication with scalars to real numbers, we obtain a normed space over R, which we denote simply by XR. If Φ : X → C is a linear transformation, then ΦR : XR → R, given by ΦR(x) = Re(Φ(x)), x ∈ XR, is also a linear transformation. We now observe below that Φ is completely determined by its “real part” ΦR. For complex z = a + ib, with a, b ∈ R, we have iz = –b + ia, and hence Im(z) = –Re(iz). So Im(Φ(x)) = –Re(iΦ(x)) = –Re(Φ(ix)) = –ΦR(ix). Thus
Φ(x) = ΦR(x) – iΦR(ix), x ∈ X.    (2.18)
Now if ΦR : XR → R is R-linear, then the right-hand side expression of (2.18) determines a C-linear map Φ : X → C:
(1)It is clear that Φ is R-linear.
(2) We have Φ(ix) = ΦR(ix) – iΦR(–x) = ΦR(ix) + iΦR(x) = i(ΦR(x) – iΦR(ix)) = iΦ(x).
Since every complex number is of the form a + ib, with a, b ∈ R, it follows from here and the above part (1) that Φ is also C-linear.
Finally we show that Φ is continuous if and only if its real part ΦR is continuous, and moreover, ||Φ|| = ||ΦR||.
(1) If Φ is continuous, then since |ΦR(x)| = |Re(Φ(x))| ≤ |Φ(x)| ≤ ||Φ|| ||x||, we have that ΦR is continuous and moreover ||ΦR|| ≤ ||Φ||.
(2) Now suppose that ΦR is continuous and that Φ is given by (2.18). For x ∈ X, let θ ∈ R be such that Φ(x) = e^{iθ} |Φ(x)|. As |Φ(x)| is real,
|Φ(x)| = e^{–iθ}Φ(x) = Φ(e^{–iθ}x) = Re(Φ(e^{–iθ}x)) = ΦR(e^{–iθ}x) ≤ ||ΦR|| ||e^{–iθ}x|| = ||ΦR|| ||x||,
so that Φ is continuous, and moreover ||Φ|| ≤ ||ΦR||.
The proof of the Hahn-Banach theorem in the complex case can now be completed as follows. Let φ ∈ CL(Y, C) and let φR ∈ CL(YR, R) be the real part of φ. Then, by the real case, there exists an extension ΦR ∈ CL(XR, R) of φR to XR with ||ΦR|| = ||φR||. Let Φ ∈ CL(X, C) be defined by (2.18). Then Φ is an extension of φ, and ||Φ|| = ||ΦR|| = ||φR|| = ||φ||.
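The passage between a complex linear functional and its real part can be checked on X = C2. Since Im(Φ(x)) = –ΦR(ix), the functional is recovered as Φ(x) = ΦR(x) – iΦR(ix), and rotating x by e^{–iθ}, where θ is the argument of Φ(x), makes ΦR attain the value |Φ(x)|. In the sketch below the functional `Phi` and the test vector are hypothetical choices.

```python
import cmath

def Phi(x):
    """An illustrative C-linear functional on C^2."""
    return (2 - 1j) * x[0] + (0.5 + 3j) * x[1]

def Phi_R(x):
    """Real part of Phi, an R-linear functional on C^2 viewed as a real space."""
    return Phi(x).real

def rebuild(x):
    """Recover Phi from its real part: Phi(x) = Phi_R(x) - i * Phi_R(i x)."""
    ix = [1j * t for t in x]
    return Phi_R(x) - 1j * Phi_R(ix)

x = [1 + 2j, -3 + 0.5j]

# Rotation used in the norm comparison: Phi(x) = e^{i theta} |Phi(x)|.
theta = cmath.phase(Phi(x))
rotated = [cmath.exp(-1j * theta) * t for t in x]

print(Phi(x), rebuild(x))
print(abs(Phi(x)), Phi_R(rotated))
```

The second printed pair agrees up to rounding, which is the numerical counterpart of the estimate |Φ(x)| = ΦR(e^{–iθ}x) used to compare the two operator norms.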
Exercise 2.39. (Hamel basis). Let X be a vector space over any field F. Show that there exists a subset B ⊂ X such that B is linearly independent, and span B = X. Such a set is called a Hamel basis of X.
Exercise 2.40. Let X, Y be vector spaces over a field F. Show that any function f : B → Y defined on a Hamel basis B of X can be extended to a linear transformation F : X → Y, that is, F |B = f. Hint: Every vector in X can be uniquely expressed as a linear combination of vectors from B.
Exercise 2.41. Let X be an infinite dimensional normed space, and let Y be a nontrivial normed space. Prove that there exists a linear transformation from X to Y which is not continuous.
Exercise 2.42. R is a vector space over Q, and hence has a Hamel basis B. Prove that B is necessarily uncountable.
Exercise 2.43. (Additive discontinuous F : R → R). Show that there exists a function F : R → R such that for all x, y ∈ R, F(x + y) = F(x) + F(y), but F is not continuous on R.
Exercise 2.44. (∗)(Banach limits).
Consider the subspace c of ℓ∞ comprising convergent sequences.
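Let l : c → K be the limit functional given by l((xn)n∈N) = lim_{n→∞} xn, (xn)n∈N ∈ c.
(1) Show that l is an element in the dual space CL(c, K) of c, when c is equipped with the induced norm from ℓ∞.
Let Y ⊂ ℓ∞ be given by Y := {(xn)n∈N ∈ ℓ∞ : lim_{n→∞} (x1 + · · · + xn)/n exists}.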
(2)Show that Y is a subspace of ℓ∞.
(3)Prove that for all x ∈ ℓ∞, x – Sx ∈ Y, where S : ℓ∞ → ℓ∞ denotes the left shift operator: S(x1, x2, x3, ···) = (x2, x3, x4, ···), (xn)n∈N ∈ ℓ∞.
(4)Prove that c ⊂ Y.
(5) Show that there exists an L ∈ CL(ℓ∞, K) such that L|c = l and LS = L.
This gives a generalisation of the concept of a limit, and the number Lx is called a Banach limit of a (possibly divergent!) sequence x ∈ ℓ∞.
Hint: First observe that L0 : Y → K defined by L0x = lim_{n→∞} (x1 + · · · + xn)/n, x = (xn)n∈N ∈ Y,
is an extension of the functional l from c to Y. Now use the Hahn-Banach Theorem to extend L0 from Y to ℓ∞.
(6) Find the Banach limit of the divergent sequence ((–1)ⁿ)n∈N.
The proof of the Open Mapping Theorem given in §2.5, and the proof of the Hahn-Banach Theorem given in §2.7 are based on [Thomas (1997)].
1 This has nothing to do with the null space: ker TA := {x ∈ L2[0, 1] : TAx = 0}, which is also called the kernel of the integral operator.
2 In Fourier/Harmonic Analysis, this is sometimes called the Cesàro summation operator.
3 That is, what Δψ, 〈·〉ψ, etc. mean.
4 The “geometric” series in (2) is called the Neumann series, after the German mathematician Carl Neumann, who used it in connection with the Dirichlet problem.
5 For the definition/existence of a Hamel basis, see Exercise 2.39, page 115.
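6 We remark that if we look at the “matrix” corresponding to L, while thinking of vectors in ℓ2 as “infinite columns”, then the action of L is described by
$$\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \end{bmatrix} = \begin{bmatrix} x_2 \\ x_3 \\ x_4 \\ \vdots \end{bmatrix},$$
a matrix with all diagonal entries equal to 0, and with 1s along an “upper” diagonal. So in this case, our “matricial intuition” would have led us astray, since based on the above matrix, reminiscent of a Jordan block in finite dimensional linear algebra, one would be tempted to hastily guess that L has the only eigenvalue 0!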
7 The usual proof of this is by using some tools from complex analysis. We will instead follow the proof from [Singh (2006)] relying on real analysis techniques.
8 See for example [Taylor and Lay (1980), page 287, Theorem 3.2].
9 See for example [Taylor and Lay (1980), page 279, Theorem 3.4].
10 By an unbounded operator, we mean a linear transformation that is not continuous.
11 Named after the mathematicians Hans Hahn and Stefan Banach.