Differential Calculus

Abstract

We could cite various forerunners of differential calculus, including Descartes, Fermat and Cavalieri, but Newton and Leibniz should be remembered as the true pioneers of the field. This dual paternity created terrible priority disputes where the only certainty is the complexity of the controversy. Newton’s “fluxions” and Leibniz’s “vanishing quantities” are analogous to our modern concepts of derivative and differential, respectively. The extension of these ideas to functions of several variables was due to Euler (who introduced partial derivatives) and Clairaut. After contributions by many other mathematicians, including Volterra and Hadamard, the concept of a derivative of arbitrary order (Lemma-Definition 1.4) was ultimately introduced by Fréchet between 1909 and 1925. The inverse mapping theorem was proved by Lagrange in 1770, and the simplest case of the implicit function theorem was proved by Cauchy around 1833, followed by the case of vector-valued functions of several variables by Dini in 1877. Gateaux then extended Fréchet’s earlier ideas to develop his concept of differential, which was presented in a posthumous publication in 1919. The “convenient” form of differentials, one of their most recent incarnations, was introduced by Frölicher, Kriegl and Michor in the early 1980s (p. 73). These differentials were intended for mappings taking values in locally convex non-normable spaces, in particular nuclear Fréchet spaces (see, in particular, section 5.3.2 on manifolds of mappings). Wherever possible, this chapter therefore chooses to present differential calculus for mappings which take values in locally convex spaces rather than the normed vector spaces considered by the standard approach (nothing essential changes). 
Nonetheless, for the inverse mapping theorem and the implicit function theorem (Theorems 1.29 and 1.30), we will restrict attention to the Banach case (for a more general context, see [GLÖ 06], which uses yet another concept of differentiability distinct from any of those mentioned above). Both theorems are proved in full detail, including the Banach analytic case, by explicitly filling in the hints from [WHI 65]. Together with the Carathéodory theorems mentioned below, these two theorems are the most profound results of this chapter.

Keywords

Calculus of variations; “Convenient” differentials; Differential Calculus; Existence and uniqueness theorems; Fréchet differential calculus; Gateaux differentials; Lagrange variations; Mappings of class Cp; Parameter dependence; Taylor’s formulas

1.1 Introduction

We could cite various forerunners of differential calculus, including Descartes, Fermat and Cavalieri, but Newton and Leibniz should be remembered as the true pioneers of the field. This dual paternity created terrible priority disputes where the only certainty is the complexity of the controversy [HAL 80]. Newton’s “fluxions” and Leibniz’s “vanishing quantities” are analogous to our modern concepts of derivative and differential, respectively. The extension of these ideas to functions of several variables was due to Euler (who introduced partial derivatives) and Clairaut. After contributions by many other mathematicians, including Volterra and Hadamard, the concept of a derivative of arbitrary order (Lemma-Definition 1.4) was ultimately introduced by Fréchet between 1909 and 1925. The inverse mapping theorem was proved by Lagrange in 1770, and the simplest case of the implicit function theorem was proved by Cauchy around 1833, followed by the case of vector-valued functions of several variables by Dini in 1877. Gateaux then extended Fréchet’s earlier ideas to develop his concept of differential, which was presented in a posthumous publication in 1919. The “convenient” form of differentials, one of their most recent incarnations, was introduced by Frölicher, Kriegl and Michor in the early 1980s ([KRI 97], p. 73). These differentials were intended for mappings taking values in locally convex non-normable spaces, in particular nuclear Fréchet spaces (see, in particular, section 5.3.2 on manifolds of mappings). Wherever possible, this chapter therefore chooses to present differential calculus for mappings which take values in locally convex spaces rather than the normed vector spaces considered by the standard approach (nothing essential changes). 
Nonetheless, for the inverse mapping theorem and the implicit function theorem (Theorems 1.29 and 1.30), we will restrict attention to the Banach case (for a more general context, see [GLÖ 06], which uses yet another concept of differentiability distinct from any of those mentioned above). Both theorems are proved in full detail, including the Banach analytic case, by explicitly filling in the hints from [WHI 65]. Together with the Carathéodory theorems mentioned below, these two theorems are the most profound results of this chapter.

The classical Cauchy–Lipschitz conditions for the existence and uniqueness of solutions of ordinary differential equations were considerably weakened by Carathéodory ([CAR 27], section 576 and following) using measure and integration theory ([P2], section 4.1), which was an extremely recent development at the time. It was judged useful to give a full proof of Carathéodory’s theorems in section 1.5.1, since the available literature seems to expect the reader to reassemble this proof from a scattered collection of isolated results and special cases, at the expense of clarity. The parameter dependence of solutions is studied in section 1.5.3. Again, proofs are given in full, with the exception of one result: the differentiability of solutions with respect to the initial condition when the classical hypothesis (Corollary 1.82) is replaced by Lusin’s condition. This generalization is important (Remark 1.84) and the proof is not uninteresting ([ALE 87], Chapter 2, section 2.5.6); it is omitted from these pages not because it is too difficult but because it is too long.

1.2 Fréchet differential calculus

1.2.1 General conventions

The conventions presented below apply throughout the entire volume. The conventions listed under (II) are motivated by tensor calculus (Chapter 4).

(I) $\mathbb{K}$ denotes the field of real or complex numbers. Two elements ∞ and ω are adjoined to the set of integers $\mathbb{Z}$. The usual order relation on $\mathbb{Z}$ is extended by the convention $n < \infty < \omega$ for every integer $n \in \mathbb{Z}$, with $\infty + n = n + \infty = \infty$, $\omega + n = n + \omega = \omega$. We define $\mathbb{N}_{\mathbb{K}} = \mathbb{N} \cup \{\infty, \omega\}$ if $\mathbb{K} = \mathbb{R}$ and $\mathbb{N}_{\mathbb{K}} = \{0, \omega\}$ if $\mathbb{K} = \mathbb{C}$, and $\mathbb{N}_{\mathbb{K}}^{\times} = \mathbb{N}_{\mathbb{K}} - \{0\}$. See also the convention (C) in section 2.2.1(II).

For $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}^n$, recall that $\alpha! = \alpha_1! \ldots \alpha_n!$. For $\alpha \in \mathbb{Z}^n$, $|\alpha| = \alpha_1 + \ldots + \alpha_n$.

The locally convex vector spaces considered below are defined over $\mathbb{K}$ and are always Hausdorff, with the exception of semi-normed spaces. We will use the Landau notation: let X be a topological space, a some point in X, F a locally convex space, and $f : X \to \mathbb{R}_+$, $g : X \to F$ two mappings; suppose that f(x) > 0 in some neighborhood of a. We write $g = O(f)\ (x \to a)$ (respectively $g = o(f)\ (x \to a)$) and say that g is dominated by f (respectively is negligible with respect to f) in the neighborhood of a if the function g/f is bounded in some neighborhood of a (respectively tends to 0 as the variable x tends to a). If there are several such mappings, we can write $O_1(f)$, $O_2(f)$ (respectively $o_1(f)$, $o_2(f)$), etc.

(II) Unless otherwise stated, the dual of a locally convex space E is denoted by $E^\vee$¹. Let $(e_i)$ be a basis of the finite-dimensional vector space E, $(e^{\vee i})$ the dual basis, $(e'_i)$ some other basis of E and $(e'^{\vee i})$ its dual basis ([P1], section 3.1.3(VI)). In practice, we can unambiguously write $e_{i'}$ for $e'_{i'}$, $e^{\vee i'}$ for $e'^{\vee i'}$. Similarly, given a change-of-basis matrix $A = (A_i^{i'})$ (where i ranges over the rows and i′ ranges over the columns), we can write $A^{-1} = (A_{i'}^{i}) = \frac{1}{\det A}\alpha^T$, where α is the matrix of cofactors of A ([P1], section 2.3.11(V)); thus, $\sum_{i'} A_i^{i'} A_{i'}^{j} = \delta_i^j$². Let $x = \sum_i x^i e_i \in E$; E is typically a left $\mathbb{K}$-vector space and so the vector x can be represented by the row $(x^i)$ with respect to the basis $(e_i)$ ([P1], section 3.1.3(II)). By contrast, $E^\vee$ is typically a right $\mathbb{K}$-vector space; thus, if $x^\vee = \sum_i x_i^\vee e^{\vee i} \in E^\vee$, the covector $x^\vee$ can be represented by the column $(x_i^\vee)$ with respect to the basis $(e^{\vee i})$ ([P1], section 3.1.3(IV)).

Memory aid: Indices such as i, j, k refer to the old basis; primed indices such as i′, j′, k′ refer to the new basis and are written as superscripts for the change-of-basis matrix A or as subscripts for its inverse $A^{-1}$. The indices of the components of a vector are usually written as superscripts; the indices of the components of a covector are usually written as subscripts. The indices of a sequence of vectors are always subscripts; the indices of a sequence of covectors are always superscripts. This is motivated by “Einstein’s summation convention” (see Remark 4.2 in section 4.2.1).

If x ∈ E, x ^∨ ∈ E ^∨, the duality bracket of these two vectors is written as 〈x, x ^∨〉. A change of basis in E and E ^∨ may therefore be written as:

$$e_i = \sum_{j'} A_i^{j'} e_{j'}, \quad e_{j'} = \sum_{i} A_{j'}^{i} e_i, \quad e^{\vee i} = \sum_{j'} e^{\vee j'} A_{j'}^{i}, \quad e^{\vee j'} = \sum_{i} e^{\vee i} A_i^{j'}.$$

[1.1]

Let $x = \sum_i x^i e_i \in E$ and $x^\vee = \sum_i e^{\vee i} x_i^\vee \in E^\vee$. Then, $x = \sum_{j'} x^{j'} e_{j'}$ and $x^\vee = \sum_{j'} e^{\vee j'} x_{j'}^\vee$, where:

$$x_i^\vee = \sum_{j'} A_i^{j'} x_{j'}^\vee, \quad x_{j'}^\vee = \sum_{i} A_{j'}^{i} x_i^\vee, \quad x^i = \sum_{j'} x^{j'} A_{j'}^{i}, \quad x^{j'} = \sum_{i} x^i A_i^{j'}.$$

[1.2]
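The transformation rules [1.1] and [1.2] can be checked numerically. The following Python sketch is only an illustration under assumptions that are not part of the text: E is taken to be ℝ², the matrix A and the components of x and x^∨ are arbitrary values, and all function names are hypothetical. It verifies that the duality bracket 〈x, x^∨〉 does not depend on the chosen basis.

```python
# Row convention for vectors, column convention for covectors (as in (II)).
# A[i][jp] plays the role of A_i^{j'}; A_inv[jp][i] plays the role of A_{j'}^i.

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the cofactor formula A^{-1} = (1/det A) * alpha^T."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def new_vector_components(x, A):
    """x^{j'} = sum_i x^i A_i^{j'} (row times matrix)."""
    n = len(A)
    return [sum(x[i] * A[i][jp] for i in range(n)) for jp in range(n)]

def new_covector_components(x_dual, A_inv):
    """x_{j'} = sum_i A_{j'}^i x_i (inverse matrix times column)."""
    n = len(A_inv)
    return [sum(A_inv[jp][i] * x_dual[i] for i in range(n)) for jp in range(n)]

def bracket(x, x_dual):
    """Duality bracket <x, x_dual> computed from components in dual bases."""
    return sum(xi * xdi for xi, xdi in zip(x, x_dual))

# Arbitrary illustrative data (assumptions, not taken from the text).
A = [[2.0, 1.0], [1.0, 1.0]]
A_inv = inverse_2x2(A)
x = [3.0, -1.0]
x_dual = [0.5, 4.0]

x_new = new_vector_components(x, A)
x_dual_new = new_covector_components(x_dual, A_inv)
```

The row (x^i) is multiplied on the right by A, whereas the column (x_i^∨) is multiplied on the left by A^{-1}, so the two changes cancel in the bracket.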

Since $\mathbb{K}$ is commutative, we do not need to distinguish between left $\mathbb{K}$-vector spaces and right $\mathbb{K}$-vector spaces. Therefore, the duality bracket of $x \in E$, $x^\vee \in E^\vee$ can also be written as $\langle x^\vee, x \rangle$. For example, if $f : \Omega \to \mathbb{K}$ is a differentiable function in some non-empty open subset Ω of E, then, at each point a of Ω, we have $Df(a) \in E^\vee$ (see Lemma-Definition 1.4). If $h_a \in E$, it might seem more convenient to write $\langle Df(a), h_a \rangle$ instead of $\langle h_a, Df(a) \rangle$ for the quantity $Df(a).h_a$, but the reverse notation is also perfectly justifiable (section 2.2.4(IV)).

(III) Recall the following fact ([P2], section 3.9.3(II)): let $E_1, \ldots, E_n$ be normed vector spaces, each equipped with a norm |.|, and suppose that F is a semi-normed vector space equipped with a semi-norm $|.|_\gamma$. The space $\mathcal{L}(E_1, \ldots, E_n; F)$ of continuous n-linear mappings from $E_1 \times \ldots \times E_n$ into F is a semi-normed vector space when equipped with the semi-norm $\|.\|_\gamma$ defined for any mapping $u \in \mathcal{L}(E_1, \ldots, E_n; F)$ by:

$$\|u\|_\gamma = \sup_{|x_1|, \ldots, |x_n| \le 1} |u(x_1, \ldots, x_n)|_\gamma.$$

[1.3]

If F is a Hausdorff locally convex space whose topology is induced by the family of semi-norms $(|.|_\gamma)_{\gamma \in \Gamma}$ ([P2], section 3.3.3), then $\mathcal{L}(E_1, \ldots, E_n; F)$ is a Hausdorff locally convex space when equipped with the family of semi-norms $(\|.\|_\gamma)_{\gamma \in \Gamma}$. This space is quasi-complete whenever F is quasi-complete ([P2], section 3.4.2(II)).

Remark 1.1

If F is a normed vector space with norm |.|, the index γ and the set Γ can simply be omitted. This simplification only causes problems in sections 1.3.1 and 1.3.3. In sections 1.2.4 and 1.2.5, read “Banach space” instead of “quasi-complete locally convex space” if the simplification has been made.

Write $\mathcal{L}_n(E; F)$ for the space $\mathcal{L}(E_1, \ldots, E_n; F)$ whenever $E_i = E$ for all $i \in \{1, \ldots, n\}$. An element u of $\mathcal{L}_n(E; F)$ is said to be symmetric if, for all $(h^1, \ldots, h^n) \in E^n$ and every permutation $\sigma \in \mathfrak{S}_n$, we have $u(h^1, \ldots, h^n) = u(h^{\sigma(1)}, \ldots, h^{\sigma(n)})$. The set of symmetric elements of $\mathcal{L}_n(E; F)$ is written as $\mathcal{L}_{n,s}(E; F)$ and is a vector subspace of $\mathcal{L}_n(E; F)$; this subspace is equipped with the family of semi-norms $(\|.\|_\gamma)_{\gamma \in \Gamma}$. If $h \in E$ and $u \in \mathcal{L}_{n,s}(E; F)$, set:

$$u.h^n = u.(\underbrace{h, \ldots, h}_{n \text{ terms}}).$$

If E is finite-dimensional, then we have the following canonical isomorphism:

$$\mathcal{L}_{n,s}(E; F) \cong \operatorname{Hom}_{\mathbb{K}}(\underbrace{E \otimes \ldots \otimes E}_{n \text{ terms}}; F).$$

[1.4]

(IV) See also the conventions (C1) (section 2.2.1 (III) , p. 55), (C2) (section 2.2.2 (III) , p. 58), (C3) (section 6.2.1 (II), p. 253).

1.2.2 Fréchet differential

(I) Let E, F be two locally convex spaces and ϕ a mapping from U into F, where U is some neighborhood of 0 in E.

Definition 1.2

We say that ϕ is tangent to 0 if, for every neighborhood W of 0 in F , there exists a balanced neighborhood V ⊂ U of 0 in E such that, for every t ∈ K with sufficiently small | t |, we have ϕ (tV) ⊂ o (t) W (where o : K → K ).

Lemma 1.3

i) If ϕ is linear and tangent to 0, then ϕ = 0.

ii) If E and F are normed vector spaces with norm |.|, then ϕ is tangent to 0 if and only if | ϕ (x)| = o (| x |).

Proof

(i) With the notation of Definition 1.2, we have $\phi(V) \subset \frac{o(t)}{t} W$, so $\phi(V) \subset \{0\}$ and ϕ = 0. (ii): exercise.

From this lemma, we immediately deduce the following claim (ii):

Lemma-Definition 1.4

Let a be some point of E, U some neighborhood of 0 in E , and f : U + a → F some mapping where a + U := {a + x : x ∈ U}.

i) We say that f is (Fréchet) differentiable at a if there exists a continuous linear mapping $L \in \mathcal{L}(E; F)$ such that the mapping $h \mapsto f(a + h) - f(a) - L.h$ (defined in U) is tangent to 0.
ii) This mapping L is uniquely determined.
iii) This mapping is written as L = D f (a) (or L = d _a f or L = f′ (a)) and is called the (Fréchet) differential of f at the point a.

The notion of the Fréchet differential is especially fruitful when E is a normed vector space. In this case, if A is an open subset of E, F is a locally convex space, a is a point of A, and f : A → F is a mapping, then f is differentiable at a with differential D f (a) at this point if and only if

$$\lim_{h \to 0,\ h \ne 0} \frac{f(a + h) - f(a) - Df(a).h}{|h|} = 0.$$

The canonical isomorphism $\mathcal{L}(\mathbb{K}; F) \cong F : u \mapsto u.1$ ([P2], section 3.3.8) reintroduces the usual notion of derivative, including the notion of a “complex derivative” when $\mathbb{K} = \mathbb{C}$ ([P2], section 4.2.1, Theorem-Definition 4.45). Indeed, keeping the same notation as before, we obtain:

Corollary-Definition 1.5

If $E = \mathbb{K}$, the function $f : U + a \to F$ is differentiable at the point $a \in \mathbb{K}$ (in the above sense) if and only if it is differentiable at a (in the usual sense of differentiability of functions of one real or complex variable), in which case the element $\dot{f}(a) = Df(a).1$ of F is the derivative of f at a.

Theorem 1.6

(Rolle) Let [a, b] be a compact interval of $\mathbb{R}$ with non-empty interior and suppose that $\varphi : [a, b] \to \mathbb{R}$ is a continuous function in [a, b] that is differentiable in ]a, b[ and which satisfies φ(a) = φ(b). Then, there exists c ∈ ]a, b[ such that $\dot{\varphi}(c) = 0$.

Proof

If φ is constant in [a, b], then it is differentiable and has zero derivative on ]a, b[. Otherwise, it attains its lower and upper bounds ([P2], section 2.3.7, Theorem 2.42). Since φ(a) = φ(b), at least one of these values is attained at some point c ∈ ]a, b[, which must therefore satisfy $\dot{\varphi}(c) = 0$ (exercise).

If f is differentiable at the point a, then it is also continuous at this point (exercise). Let A be a non-empty open subset of E and write $\mathcal{D}_a(A; F)$ for the set of mappings from A into F that are differentiable at the point a ∈ A. The mapping $\mathcal{D}_a(A; F) \to \mathcal{L}(E; F) : f \mapsto Df(a)$ is known as the differentiation operator at the point a and is $\mathbb{K}$-linear.

Definition 1.7

Let $f \in \mathcal{D}_a(A; F)$. The rank rk(Df(a)) ([P1], section 3.1.10, Definition 3.38(ii)) is said to be the rank of f at the point a and is written as $\mathrm{rk}_a(f)$.

(II) Let E be a real Hilbert space, A some non-empty open subset of E, and $f : A \to \mathbb{R}$ a function that is differentiable at the point a ∈ A. Then, $Df(a) \in E^\vee$. By Riesz’s theorem ([P2], section 3.10.2(IV), Theorem 3.15(1)), there exists some uniquely determined element $x^* \in E$ such that Df(a) coincides with the linear form $h \mapsto \langle x^* | h \rangle$.

Definition 1.8

The element x ^* specified above is written as grad _a (f) or ∇_a f and is called the gradient of f at the point a. It satisfies 〈∇_a f | h〉 ≔ Df(a). h for all h ∈ E.

(III) Some of the classical results of differential calculus ([DIE 93], Volume 1, Chapter 8) are reproduced below. A few are stated in a slightly more general form than the cited reference; however, the reasoning is identical and entirely straightforward in each case.

Theorem 1.9

(chain rule) Let E, F be normed vector spaces, G a locally convex space, A some non-empty open subset of E, a some point of A, B an open subset of F containing f(a), $f \in \mathcal{D}_a(A; F)$, and $g \in \mathcal{D}_{f(a)}(B; G)$. Then, $g \circ f \in \mathcal{D}_a(A; G)$ and

$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$

[1.5]

Example 1.10

Let E be a Banach space. The set $\mathfrak{H} \subset \mathcal{L}(E)$ of continuous linear bijections is open in $\mathcal{L}(E)$ ([P2], section 3.4.1(II), Lemma 3.50). The mapping $u \mapsto u^{-1}$ from $\mathfrak{H}$ onto $\mathfrak{H}$ is differentiable, and its differential at the point $u_0$ is the continuous linear mapping $s \mapsto -u_0^{-1} \circ s \circ u_0^{-1}$ (from $\mathcal{L}(E)$ into $\mathcal{L}(E)$). The reader may wish to prove this result as an exercise or refer to Lemma 1.24 of section 1.2.5(II).
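As a sanity check of Example 1.10, we can compare the standard formula $s \mapsto -u_0^{-1} \circ s \circ u_0^{-1}$ for the differential of inversion with a difference quotient. The sketch below assumes E = ℝ², so that the elements of ℒ(E) are 2×2 matrices; the matrices u0 and s and all function names are arbitrary, hypothetical choices, and a finite-difference comparison is only a plausibility test, not a proof.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(A):
    """Inverse of a 2x2 matrix (cofactor formula)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def predicted_differential(u0, s):
    """-u0^{-1} . s . u0^{-1}: the differential of u -> u^{-1} at u0, applied to s."""
    u0_inv = inverse(u0)
    p = matmul(matmul(u0_inv, s), u0_inv)
    return [[-p[i][j] for j in range(2)] for i in range(2)]

def difference_quotient(u0, s, t):
    """((u0 + t.s)^{-1} - u0^{-1}) / t for a small real number t."""
    shifted = [[u0[i][j] + t * s[i][j] for j in range(2)] for i in range(2)]
    inv_shifted, inv_u0 = inverse(shifted), inverse(u0)
    return [[(inv_shifted[i][j] - inv_u0[i][j]) / t for j in range(2)]
            for i in range(2)]

# Arbitrary illustrative data (assumptions, not taken from the text).
u0 = [[2.0, 1.0], [1.0, 3.0]]
s = [[0.5, -1.0], [2.0, 0.25]]

exact = predicted_differential(u0, s)
approx = difference_quotient(u0, s, 1e-6)
max_gap = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
```

With these values, the gap between the two matrices is of the order of the step t, as expected from a first-order difference quotient.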

(IV) Let E ₁, …, E _n be normed vector spaces. Then, E = E ₁ × … × E _n can be canonically equipped with the structure of a normed vector space ([P2], section 3.4.1 (I)). Let A be a non-empty open subset of E and suppose that a = (a ¹,…,a ⁿ) ∈ A. Suppose further that F is a locally convex space. If f : A → F : (x ¹, …, x ⁿ) ↦ f (x ¹, …, x ⁿ) is differentiable at the point a, then the mapping

$$h_i \mapsto f(a^1, \ldots, a^{i-1}, a^i + h_i, a^{i+1}, \ldots, a^n)$$

is defined in some open neighborhood of 0 in $E_i$ and differentiable at 0 for all $i \in \{1, \ldots, n\}$ (exercise); its differential at this point is written as $D_i f(a)$ (or $d_i f(a)$ or $\frac{\partial f}{\partial x^i}(a)$).

Definition 1.11

The mapping D _i f (a) is called the i-th partial differential of f at the point a.

Given the conditions stated above, the differential D f (a) can be expressed in terms of the partial differentials D _i f (a) as follows:

$$Df(a).h = \sum_{i=1}^{n} D_i f(a).h^i,$$

[1.6]

where $h = (h^1, \ldots, h^n)$. If $E_i = \mathbb{K}$, the element $\partial_i f(a) = D_i f(a).1 \in F$ is called the i-th partial derivative of f at the point a; writing $(e_i)_{1 \le i \le n}$ for the canonical basis of $\mathbb{K}^n$, we have

$$\partial_i f(a) = D_i f(a).1 \in F.$$

[1.7]

If $\mathbb{K} = \mathbb{R}$, the existence of the partial differentials $D_i f(a)$ (i = 1, …, n) does not imply the existence of the differential Df(a) (however, see Theorem 1.16(ii)).
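A classical counterexample makes this remark concrete. The function below is a standard textbook choice, not one taken from this chapter: f(x, y) = xy/(x² + y²) with f(0, 0) = 0. Both partial derivatives exist at the origin, but f equals 1/2 along the diagonal, so it is not continuous at the origin and therefore not differentiable there. The Python sketch only illustrates this numerically; the names are hypothetical.

```python
def f(x, y):
    """f(x, y) = x*y / (x^2 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_quotients(t):
    """Difference quotients of f at the origin along the two coordinate axes."""
    return ((f(t, 0.0) - f(0.0, 0.0)) / t, (f(0.0, t) - f(0.0, 0.0)) / t)

# Steps t = 10^-1, ..., 10^-7 approaching 0.
steps = [10.0 ** (-k) for k in range(1, 8)]

# Along the axes f vanishes, so both partial derivatives at (0, 0) are 0.
axis_quotients = [partial_quotients(t) for t in steps]

# Along the diagonal f is constantly 1/2, so f(t, t) does not tend to f(0, 0) = 0.
diagonal_values = [f(t, t) for t in steps]
```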

If $E_j = \mathbb{K}$ for all $j \in \{1, \ldots, n\}$ and $F = \mathbb{K}^m$, then the linear mapping Df(a) can be represented with respect to the canonical bases by the Jacobian matrix $[\partial_i f^j(a)]_{1 \le i \le n,\ 1 \le j \le m}$, where $f = (f^1, \ldots, f^m)$. If n = m, the determinant of this matrix is called the Jacobian of f at the point a and is written as $\frac{\partial(f^1, \ldots, f^n)}{\partial(x^1, \ldots, x^n)}(a)$.
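In finite dimension, the chain rule [1.5] says that the Jacobian matrix of g ∘ f is the product of the Jacobian matrices of g and f. The following Python sketch checks this on an example. Everything in it is an assumption made for illustration: the mappings f and g, the point a, the helper names, the central finite-difference approximation, and the layout of the matrix (row j for the component f^j, column i for the variable x^i, i.e. the column-vector layout rather than the row convention of section 1.2.1).

```python
import math

def jacobian(func, a, eps=1e-6):
    """Central finite-difference Jacobian: entry [j][i] approximates d f^j / d x^i at a."""
    m, n = len(func(a)), len(a)
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        plus, minus = list(a), list(a)
        plus[i] += eps
        minus[i] -= eps
        f_plus, f_minus = func(plus), func(minus)
        for j in range(m):
            J[j][i] = (f_plus[j] - f_minus[j]) / (2 * eps)
    return J

def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def f(x):
    """Illustrative mapping R^2 -> R^2."""
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def g(y):
    """Illustrative mapping R^2 -> R^2."""
    return [math.exp(y[0]) + y[1], y[0] * y[1]]

a = [0.3, -0.7]
J_composite = jacobian(lambda x: g(f(x)), a)
J_product = matmul(jacobian(g, f(a)), jacobian(f, a))
chain_rule_gap = max(abs(J_composite[i][j] - J_product[i][j])
                     for i in range(2) for j in range(2))
```

The two matrices agree up to the discretization and rounding error of the finite differences.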

(V) In the following, $\mathbb{K} = \mathbb{R}$. Let F be a locally convex space and $|.|_\gamma$ a continuous semi-norm on F.

Theorem 1.12

(mean value theorem, 1st version) Let I = [α, β] be a compact interval of $\mathbb{R}$, f a continuous mapping from I into F and $\varphi_\gamma$ a continuous mapping from I into $\mathbb{R}$. Suppose that there exists a subset $D \subset \mathring{I}$ with countable complement in $\mathring{I}$ such that f and $\varphi_\gamma$ both have a derivative at every point ξ ∈ D and $|\dot{f}(\xi)|_\gamma \le \dot{\varphi}_\gamma(\xi)$. Then, $|f(\beta) - f(\alpha)|_\gamma \le \varphi_\gamma(\beta) - \varphi_\gamma(\alpha)$.

Proof

This is a classical result³ when F is a normed vector space with norm |.|; we simply need to replace |.| by $|.|_\gamma$ and φ by $\varphi_\gamma$.

Let E be a normed vector space with norm |.|. With the notation of [1.3] (with n = 1), we have the following result:

Theorem 1.13

(mean value theorem, 2nd version) Let f be a mapping taking values in F that is defined and continuous in some neighborhood of the closed segment S = [a, a + h] joining the points a, a + h of E ([P2], section 3.3.1 ). Suppose that f is differentiable at every point of S.

i) Given any subset Θ of [0, 1] with countable complement in [0, 1],
$$|f(a + h) - f(a)|_\gamma \le |h| . \sup_{t \in \Theta} \|Df(a + t.h)\|_\gamma.$$

ii) Let $L \in \mathcal{L}(E; F)$. Given any subset Ξ of [a, a + h] with countable complement in [a, a + h], we have $|f(a + h) - f(a) - L.h|_\gamma \le M_\gamma . |h|$ with $M_\gamma = \sup_{x \in \Xi} \|Df(x) - L\|_\gamma$. In particular, if $M_\gamma = \sup_{x \in \Xi} \|Df(x) - Df(a)\|_\gamma$, then
$$|f(a + h) - f(a) - Df(a).h|_\gamma \le M_\gamma |h|.$$

[1.8]

iii) Let $F = \mathbb{K} = \mathbb{R}$. There exists θ ∈ ]0, 1[ such that
$$f(a + h) - f(a) = Df(a + \theta.h).h.$$

Proof

(i) Let $g : [0, 1] \to F : t \mapsto f(a + t.h)$. By Theorem 1.9, $\dot{g}(t) = Df(a + t.h).h$, so $|\dot{g}(t)|_\gamma \le N_\gamma |h|$ for every t ∈ Θ, where $N_\gamma = \sup_{t \in \Theta} \|Df(a + t.h)\|_\gamma$. The upper bound of (i) is therefore obtained by applying Theorem 1.12 with f replaced by g and $\varphi_\gamma(t) = N_\gamma . t . |h|$.

(ii) Can be deduced by applying (i) to the function $\xi \mapsto f(\xi) - L.(\xi - a)$.
(iii) Let $g : [0, 1] \to \mathbb{R} : t \mapsto f(a + t.h) - f(a) - (f(a + h) - f(a)).t$. Since g(0) = g(1) = 0, Rolle’s theorem (Theorem 1.6) implies the existence of a number θ ∈ ]0, 1[ such that $\dot{g}(\theta) = 0$, i.e. $Df(a + \theta.h).h = f(a + h) - f(a)$.
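The bound of Theorem 1.13(i) can be illustrated numerically in the simplest setting. The sketch below assumes E = F = ℝ² with the Euclidean norm; the mapping f, the points a and h, and the function names are arbitrary, hypothetical choices, and the supremum over the segment is only approximated by sampling 1001 values of t, so this is a plausibility check rather than a verification.

```python
import math

def f(x, y):
    """Illustrative mapping R^2 -> R^2."""
    return (math.sin(x) + y, x * y)

def differential(x, y):
    """Matrix of Df(x, y) with respect to the canonical bases (computed by hand)."""
    return [[math.cos(x), 1.0], [y, x]]

def operator_norm(M):
    """Euclidean operator norm of a 2x2 matrix: sqrt of the largest eigenvalue of M^T M."""
    p = M[0][0] ** 2 + M[1][0] ** 2
    r = M[0][1] ** 2 + M[1][1] ** 2
    q = M[0][0] * M[0][1] + M[1][0] * M[1][1]
    largest = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q ** 2)
    return math.sqrt(largest)

a, h = (0.0, 0.0), (1.0, 1.0)
fa = f(a[0], a[1])
fah = f(a[0] + h[0], a[1] + h[1])

# Left-hand side |f(a + h) - f(a)|.
increment = math.hypot(fah[0] - fa[0], fah[1] - fa[1])

# Right-hand side |h| . sup_t ||Df(a + t.h)||, with the sup sampled on a grid of [0, 1].
samples = [k / 1000 for k in range(1001)]
sup_norm = max(operator_norm(differential(a[0] + t * h[0], a[1] + t * h[1]))
               for t in samples)
bound = math.hypot(h[0], h[1]) * sup_norm
```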

Theorem 1.13 has the following two corollaries:

Corollary 1.14

Let A be a connected open subset of E and suppose that f : A → F is differentiable in A. If the differential D f : x ↦ D f (x) is zero in the complement of a countable subset of A, then f must be constant in A.

Corollary 1.15

Let (|.|_γ)_γ ∈ Γ be a family of semi-norms that induce the topology of F, A some convex open subset of E and f : A → F a differentiable mapping in A. If, for all γ ∈ Γ, there exists a real number k _γ > 0 such that sup_x ∈ A || D f (x)||_γ ≤ k _γ, then f is Lipschitz ([P2],section 2.4.3(II)) and hence uniformly continuous.

Proof

Let x′, x″ ∈ A. Since A is convex, the segment [x′, x″] is contained in A, so by Theorem 1.13(i) | f (x′) − f (x″)|_γ ≤ k _γ | x′ − x″| for all γ ∈ Γ, which means that f is Lipschitz.

1.2.3 Mappings of class C^p

In this section, K = ℝ .

(I) Let E be a normed vector space and suppose that F is a locally convex space. Write $\mathcal{L}_n(E; F)$ for the space of continuous n-linear mappings from $E^n$ into F equipped with the family of semi-norms [1.3].

Let A be a non-empty open subset of E and suppose that f : A → F is a mapping. We say that f is of class $C^0$ if it is continuous in A. We say that it is of class $C^1$ if it is differentiable in A and its differential $Df : A \to \mathcal{L}(E; F)$ is continuous. We say that f is twice differentiable (respectively is of class $C^2$) in A if its differential Df is differentiable (respectively is of class $C^1$) in A. At any given point a ∈ A, the second differential D(Df)(a), written as $D^2 f(a)$, is an element of $\mathcal{L}(E; \mathcal{L}(E; F))$; this space can be identified with $\mathcal{L}_2(E; F)$ ([P2], section 3.9.3(II)). Similarly, we may recursively define a p-times differentiable mapping (respectively a mapping of class $C^p$) in A for any integer p ≥ 1, as well as the p-th order differential $D^p f(a) \in \mathcal{L}_p(E; F)$ for any point a ∈ A. We say that f is of class $C^\infty$ if f is of class $C^p$ for every p ≥ 1. We write $C^p(A; F)$ for the $\mathbb{K}$-vector space of mappings of class $C^p$ from A into F (0 ≤ p ≤ ∞).

From ([DIE 93], Volume 1, (8.12.1), (8.16.6)), mutatis mutandis, we have the following result:

Theorem 1.16

Let F be a locally convex space.

i) (Schwarz’s theorem) Let E be a normed vector space, A some non-empty open subset of E and f : A → F a p-times differentiable mapping (p ≥ 2) at some point a ∈ A. Then, the differential $D^p f(a)$ is symmetric and therefore belongs to $\mathcal{L}_{p,s}(E; F)$.

ii) Let $E_1, \ldots, E_n$ be normed vector spaces (n ≥ 2) and suppose that A is a non-empty open subset of $E_1 \times \ldots \times E_n$. For any p such that 1 ≤ p ≤ ∞, f is of class $C^p$ in A if and only if the $n^p$ partial derivatives of order p of f exist and are continuous in A (where $n^\infty := \infty$).

By Theorem 1.16(ii), a mapping f from a non-empty open subset Ω of $\mathbb{R}^n$ into $\mathbb{C}$ is of class $C^p$ if and only if its $n^p$ partial derivatives exist and are continuous, i.e. $f \in \mathcal{E}^p(\Omega)$ ([P2], section 4.3.1(I)).

(II) In the context of Theorem 1.16(i), when $E = \mathbb{K}^n$, introducing “symbolic powers” gives us an easy way to express $D^p f(a).h^p$ (with the notation of section 1.2.1(III)) in terms of the partial derivatives of order p:

$$\partial_{i_1} \ldots \partial_{i_p} f(a) := D_{i_1} \ldots D_{i_p} f(a).(e_{i_1}, \ldots, e_{i_p}),$$

where $D_i$ denotes the partial differential in the i-th variable. Writing $h = (h^1, \ldots, h^n)$, we have $D^2 f(a).h^2 = \sum_{1 \le i, j \le n} \partial_i \partial_j f(a).h^i h^j$⁴. This can be expressed as $\left(\sum_{1 \le i \le n} \partial_i f(a).h^i\right)^{[2]}$ by developing the latter like any other squared parentheses and replacing $(\partial_i f(a).h^i)(\partial_j f(a).h^j)$ by $\partial_i \partial_j f(a).h^i h^j$. With the same conventions, we can continue inductively to obtain the following result:

$$D^p f(a).h^p = \left(\sum_{1 \le i \le n} \partial_i f(a).h^i\right)^{[p]}.$$

If $E_i = \mathbb{K}$ (i = 1, …, n), $F = \mathbb{K}$, and $f : A \to \mathbb{K}$ is twice differentiable at some point a of A, then the matrix $Hf(a) = (\partial_i \partial_j f(a))_{1 \le i, j \le n}$ is called the Hessian matrix⁵ of f at the point a. Identifying the vectors of $\mathbb{K}^n$ with the columns of this matrix gives $D^2 f(a).h^2 = h^T . Hf(a) . h$.
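The identity $D^2 f(a).h^2 = h^T . Hf(a) . h$ can be illustrated as follows. The Python sketch assumes 𝕂 = ℝ and n = 2; the function f, its hand-computed Hessian, the point a, the vector h and all names are hypothetical choices made for the example. The quadratic form is compared with a central second difference of t ↦ f(a + t.h) at t = 0, which approximates the second derivative of this function of one variable.

```python
import math

def f(x, y):
    """Illustrative function R^2 -> R."""
    return x * x * y + math.sin(y)

def hessian(x, y):
    """Hessian matrix of f computed by hand: f_xx = 2y, f_xy = f_yx = 2x, f_yy = -sin(y)."""
    return [[2.0 * y, 2.0 * x], [2.0 * x, -math.sin(y)]]

def quadratic_form(H, h):
    """h^T . H . h."""
    return sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))

def second_difference(a, h, t=1e-3):
    """Central second difference of s -> f(a + s.h) at s = 0."""
    def phi(s):
        return f(a[0] + s * h[0], a[1] + s * h[1])
    return (phi(t) - 2.0 * phi(0.0) + phi(-t)) / (t * t)

a, h = (1.0, 0.5), (1.0, 2.0)
exact = quadratic_form(hessian(a[0], a[1]), h)
approx = second_difference(a, h)
```

The symmetry of the matrix returned by `hessian` reflects Schwarz’s theorem (Theorem 1.16(i)) for this particular f.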

(III) Suppose that 1 ≤ p ≤ ∞. The following results can be shown by induction (exercise). The composition of two mappings of class $C^p$ is also of class $C^p$. If E is a normed vector space, A some non-empty open subset of E, F and G locally convex spaces, u a continuous linear mapping from F into G and f : A → F a mapping of class $C^p$ in A, then u ∘ f is of class $C^p$ in A and $D^p(u \circ f) = u \circ D^p f$. With the same hypotheses on E and A, if $F_1$, $F_2$, and G are locally convex spaces, [.,.] is a continuous bilinear mapping from $F_1 \times F_2$ into G, and $f_i$ is a mapping of class $C^p$ from A into $F_i$ for i = 1, 2, then $[f_1, f_2]$ is of class $C^p$ from A into G and the so-called Leibniz rule holds:

$$D[f_1, f_2] = [Df_1, f_2] + [f_1, Df_2].$$

[1.9]

(IV) Nemytskii operator. Let F be a Banach space, J a compact interval of $\mathbb{R}$ with non-empty interior, and $C^p(J; F)$ the space of mappings of class $C^p$ (p ≥ 1) from J into F. Then, $C^p(J; F)$ is a Banach space when equipped with the norm

$$\|\varphi\|_p = \sup_{0 \le k \le p,\ t \in J} |\varphi^{(k)}(t)|$$

(exercise). Let U be an open neighborhood of 0 in F, f : U × J → F a mapping of class $C^p$ and $\mathcal{N} : \mathcal{U} \times \mathring{J} \to F$ the so-called Nemytskii operator, defined by $\mathcal{N}(\varphi, t) = f(\varphi(t), t)$, where $\mathcal{U} := \{\varphi \in C^p(J; F) : \varphi(t) \in U,\ \forall t \in \mathring{J}\}$.

Theorem 1.17

The Nemytskii operator $\mathcal{N}$ is of class $C^p$ and

$$D\mathcal{N}(\varphi, t).(h, \tau) = D_1 f(\varphi(t), t).(h(t) + \dot{\varphi}(t).\tau) + D_2 f(\varphi(t), t).\tau.$$

[1.10]

Proof

The operator $\mathcal{N}$ is the composition

$$\mathcal{U} \times \mathring{J} \xrightarrow{\ \mathrm{ev} \times 1_{\mathring{J}}\ } F \times \mathring{J} \xrightarrow{\ f\ } F,$$

where $\mathrm{ev} : C^p(J; F) \times \mathring{J} \to F$ is the evaluation operator defined by ev(φ, t) = φ(t). But ev is of class $C^p$ and

$$D\,\mathrm{ev}(\varphi, t).(h, \tau) = h(t) + \dot{\varphi}(t).\tau$$

[1.11]

(exercise*: see [ABR 83], Proposition 2.4.17). Since f is of class $C^p$, so is $\mathcal{N}$ by (III), and [1.10] follows from [1.5].

Remark 1.18

The operator $\mathcal{N}$ defined above is stated slightly more generally than the classical Nemytskii operator, which does not allow t to vary (to recover the classical case, set the increment τ to 0 in [1.10]).

1.2.4 Taylor’s formulas

Throughout this section, $\mathbb{K} = \mathbb{R}$, and F denotes a quasi-complete locally convex space. Readers who are only interested in mappings taking values in a Banach space (see Remark 1.1) may skip (I) and (II) below.

(I) Mackey convergence. Let B be a balanced convex subset (also known as an “absolutely convex” subset ([KÖT 79], Volume 1, section 16.1(2))) that is closed and bounded in F, and write $F_1 = \bigcup_{n=1}^{\infty} nB$. The gauge $p_B$ of B in $F_1$ ([P2], section 3.3.2(I)) is a norm on $F_1$ (exercise) and $F_1$ is a Banach space, written as $F_B$, when equipped with this norm.

Definition 1.19

We say that a sequence (x _n) of elements of F is Mackey convergent to 0 (or locally convergent to 0 ( [KÖT 79] , Volume 1, section 28.3)), if there exists a balanced, closed and bounded subset B containing 0 and all the x _n such that (x _n) converges to 0 in F _B.

Mackey convergence of a sequence of elements of F implies the usual notion of convergence and is equivalent to it whenever F is a Fréchet space or the strong dual of an infrabarreled Schwartz space ([HOG 71], p. 8, Example 3), such as the distribution spaces studied in ([P2], section 4.4.1).

(II) Generalized Riemann integral To state Taylor’s formulas in a general framework, it will be useful to introduce the generalized Riemann integral of a function of a real variable taking values in a quasi-complete locally convex space⁶.

Lemma-Definition 1.20

Let F be a quasi-complete locally convex space, I a non-empty open interval of $\mathbb{R}$, c some point of I and f : I → F a continuous mapping.

i) There exists a unique differentiable mapping ∫ f : I → F such that (∫ f)(c) = 0 and (∫ f)^′ = f .
ii) Let a, b ∈ I. Write ∫_a ^b f(t)dt = (∫ f)(b) − (∫ f)(a); this quantity is called the (generalized) Riemann integral of f from a to b.
iii) If f : [a, b] → F is Lipschitz, consider the points $a = t_0 < t_1 < \ldots < t_n = b$ of the interval [a, b], and the Riemann sum $S_n = \sum_{i=0}^{n-1} (t_{i+1} - t_i) f(t_i)$. If n → ∞ and $|t_{i+1} - t_i| \to 0$ for all $i \in \{0, \ldots, n - 1\}$, then $S_n \to \int_a^b f(t) dt$ in the sense of Mackey convergence.

Proof

See [KRI 97], Chapter 1, Lemma 2.5 and Proposition 2.7.
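To illustrate the Riemann sums of Lemma-Definition 1.20(iii), the following Python sketch takes F = ℝ², where Mackey convergence reduces to ordinary convergence. The mapping f, the uniform subdivision and the function names are assumptions made for the example. The exact integral of t ↦ (cos t, t²) over [0, 1] is (sin 1, 1/3), and the error of the left Riemann sums is observed to decrease as the subdivision is refined.

```python
import math

def f(t):
    """A Lipschitz mapping from [0, 1] into R^2 (arbitrary illustrative choice)."""
    return (math.cos(t), t * t)

def riemann_sum(func, a, b, n):
    """S_n = sum_{i=0}^{n-1} (t_{i+1} - t_i) . f(t_i) on the uniform subdivision of [a, b]."""
    step = (b - a) / n
    total = [0.0, 0.0]
    for i in range(n):
        value = func(a + i * step)
        total[0] += step * value[0]
        total[1] += step * value[1]
    return (total[0], total[1])

# Exact value of the integral of f over [0, 1], computed by hand.
exact = (math.sin(1.0), 1.0 / 3.0)

def error(n):
    """Largest componentwise gap between S_n and the exact integral."""
    s = riemann_sum(f, 0.0, 1.0, n)
    return max(abs(s[0] - exact[0]), abs(s[1] - exact[1]))

errors = [error(n) for n in (10, 100, 1000)]
```

For a Lipschitz integrand the error of such sums is at most proportional to the mesh of the subdivision, which matches the roughly tenfold decrease between consecutive entries of `errors`.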

If F is a Banach space, this recovers the classical notion of the integral of a continuous function. The integration operator ∫_a ^b defined above satisfies the usual properties ([KRI 97], Chapter 1, Corollary 2.6). In particular, we have the following result:

Theorem 1.21

(integral mean value theorem) Let I = [α, β] be a compact interval of $\mathbb{R}$ and let f : I → F be a continuous mapping. For every continuous semi-norm $|.|_\gamma$ of F,

$$\left|\int_\alpha^\beta f(t).dt\right|_\gamma \le \int_\alpha^\beta |f(t)|_\gamma . dt \le (\beta - \alpha) . \sup_{t \in I} |f(t)|_\gamma.$$

Proof

Let $g(t) = \int_\alpha^t f(s).ds$. By the definition of the integral, g is differentiable in I and $|\dot{g}(t)|_\gamma = \dot{\varphi}(t)$, where $\dot{\varphi}(t) = |f(t)|_\gamma$. Applying the mean value theorem (Theorem 1.12) to g therefore gives the desired inequalities.

(III) Taylor’s formulas. Let E be a normed vector space with norm |.|, A some non-empty open subset of E and F a quasi-complete locally convex space whose topology is defined by a family of semi-norms $(|.|_\gamma)_{\gamma \in \Gamma}$. With the conventions of section 1.2.1 and p ≥ 1, we have the following result:

Theorem 1.22

i) Let f : A → F be a mapping of class $C^p$ in A. If the segment [a, a + h] is contained in A, then f satisfies Taylor’s formula
$$f(a + h) = \sum_{k=0}^{p-1} \frac{1}{k!} D^k f(a).h^k + r_p(h),$$

[1.12]

where the residual $r_p(h)$ is the integral (Laplace residual)
$$r_p(h) = \int_0^1 \frac{(1 - t)^{p-1}}{(p - 1)!} D^p f(a + t.h).h^p . dt.$$

ii) As |h| → 0, we can alternatively express the residual in the form of Young’s residual:
$$r_p(h) = \frac{1}{p!} D^p f(a).h^p + o(|h|^p).$$

iii) Let γ ∈ Γ and suppose that there exists some real number M > 0 such that $\sup_{x \in ]a, a + h[} \|D^p f(x)\|_\gamma \le M$. Then, $|r_p(h)|_\gamma$ is upper bounded by the Lagrange residual:
$$|r_p(h)|_\gamma \le \frac{M}{p!} |h|^p.$$

iv) If $\mathbb{K} = F = \mathbb{R}$, suppose that f is of class $C^{p-1}$ in A and admits a differential of order p, $D^p f(x)$, at every point of the open segment ]a, a + h[. The Lagrange residual can be expressed as follows: there exists θ ∈ ]0, 1[ such that
$$r_p(h) = \frac{D^p f(a + \theta.h)}{p!} . h^p.$$

Proof

i) For p = 1,
$$f(a + 1.h) - f(a) = \int_0^1 \frac{d}{dt} f(a + t.h) . dt = \int_0^1 Df(a + t.h).h . dt.$$

For p = 2, we can simply evaluate the above integral by parts. This integral is of the form $\int_0^1 u . dv$ with $u = Df(a + t.h).h$ and $v = -(1 - t)$; hence, it is equal to $[u.v]_0^1 - \int_0^1 v . du = Df(a).h + \int_0^1 (1 - t) D^2 f(a + t.h).h^2 . dt$. Continuing inductively gives (i).

ii) Since $D^p f$ is continuous, for all ε > 0, there exists a real number $r_\gamma > 0$ such that $\|D^p f(a + t.h) - D^p f(a)\|_\gamma \le p!\,\varepsilon$ whenever $|h| \le r_\gamma$ and 0 ≤ t ≤ 1. Since $\int_0^1 \frac{(1 - t)^{p-1}}{(p - 1)!} dt = \frac{1}{p!}$, the first inequality of the integral mean value theorem (Theorem 1.21) then implies:
$$\left| r_p(h) - \frac{1}{p!} D^p f(a).h^p \right|_\gamma = \left| \int_0^1 \frac{(1 - t)^{p-1}}{(p - 1)!} \left( D^p f(a + t.h) - D^p f(a) \right).h^p . dt \right|_\gamma \le \int_0^1 \frac{(1 - t)^{p-1}}{(p - 1)!} \left\| D^p f(a + t.h) - D^p f(a) \right\|_\gamma . dt . |h|^p \le \varepsilon . |h|^p.$$

iii) This again follows from (i) and the integral mean value theorem, since
$$|r_p(h)|_\gamma \le \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} \big| D^p f(a + t.h).h^p \big|_\gamma \, dt \le \frac{M.|h|^p}{(p-1)!} \int_0^1 (1-t)^{p-1} \, dt = \frac{M}{p!}\, |h|^p.$$

iv) First, consider the case of a function g that is defined and of class $C^{p-1}$ in some open neighborhood of a compact interval [a, b] of ℝ and assumed to have a p-th derivative g ^(p) (t) in ]a, b[. Set
$$\varphi_{p-1}(t) = g(b) - \sum_{k=0}^{p-1} \frac{(b-t)^k}{k!}\, g^{(k)}(t)$$

and $\psi_p(t) = \varphi_{p-1}(t) - \lambda.\frac{(b-t)^p}{p!}$, where λ is chosen so that ψ _p (a) = 0. Since ψ _p (b) = 0, Rolle’s theorem (Theorem 1.6) implies that there exists a point c ∈ ]a, b[ such that ψ _p ′(c) = 0, i.e. λ = g ^(p) (c), and hence
$$\varphi_{p-1}(a) - \frac{(b-a)^p}{p!}\, g^{(p)}(c) = 0.$$

We now simply need to apply this result to the mapping g (t) = f (a + t.h), t ∈ [0, 1].
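As an illustrative numerical check of Theorem 1.22 (a minimal sketch, not part of the original argument), we may take E = F = ℝ, f = exp, a = 0, h = 0.5 and p = 4. The choice of f, the function names and the midpoint quadrature rule are assumptions made only for this example. The sketch compares the actual residual with the Laplace integral residual and with the Lagrange bound M |h|^p / p!.

```python
import math

def taylor_partial_sum(a, h, p):
    """Sum_{k=0}^{p-1} f^(k)(a) h^k / k! for f = exp (every derivative of exp is exp)."""
    return sum(math.exp(a) * h**k / math.factorial(k) for k in range(p))

def laplace_residual(a, h, p, steps=20000):
    """Midpoint-rule approximation of the integral residual r_p(h) for f = exp."""
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) / steps
        total += (1 - t)**(p - 1) / math.factorial(p - 1) * math.exp(a + t * h) * h**p
    return total / steps

a, h, p = 0.0, 0.5, 4
exact_residual = math.exp(a + h) - taylor_partial_sum(a, h, p)
integral_residual = laplace_residual(a, h, p)
# M bounds |f^(p)| = exp on the open segment ]a, a + h[ (here h > 0).
lagrange_bound = math.exp(a + h) * abs(h)**p / math.factorial(p)
print(exact_residual, integral_residual, lagrange_bound)
```

The first two printed values should agree up to quadrature error, and both should lie below the third.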

Corollary 1.23

Let E be a real normed vector space, A some non-empty subset of E, f : A → ℝ a mapping of class C ^p (2 ≤ p ≤ ∞) and a some point of A.

i) For f to have a relative minimum (or local minimum) at the point a, the smallest integer n ≤ p such that D ⁿ f (a) is non-zero (if any such integer n exists) must necessarily be even; furthermore, D ⁿ f (a).h ⁿ ≥ 0 for every non-zero vector h ∈ E.
ii) Conversely, if A is convex, n ≤ p is even, D ⁱ f (a) = 0 for all i ∈ {1, …, n − 1}, and D ⁿ f (x) .h ⁿ > 0 for all x ∈ A and every h ≠ 0, then f has a strict relative minimum at the point a.
iii) If D ⁱ f (a) = 0 for all i ∈ {1, …, n − 1}, n ≤ p is even, and there exists ε > 0 such that D ⁿ f (x).h ⁿ > ε | h |ⁿ for every x ∈ A and every h ≠ 0, then f has a strict relative minimum at the point a (if E is finite-dimensional, this sufficient condition is still valid with ε = 0).

Proof

i) follows from Taylor’s formula with Young’s residual; ii) and iii) follow from Taylor’s formula with the Lagrange residual. If E is finite-dimensional, the unit sphere $S_1 := \{h \in E : |h| = 1\}$ is compact by Riesz’s theorem ([P2], section 3.2.3, Theorem 3.11); thus, the product $S_1^n$ is compact by Tychonov’s theorem ([P2], section 2.3.7, Theorem 2.43) and D ⁿ f (x).h ⁿ has a minimum ε > 0 in $S_1^n$ (ibid., Theorem 2.39).

(IV) If E = E ₁ × … × E _n, where each E _i is a normed vector space, let α be the multi-index (α ₁, …, α _n), α _i ∈ ℕ (i ∈ {1, …, n}). Define

$$D^\alpha := D_1^{\alpha_1} \cdots D_n^{\alpha_n}, \qquad h^\alpha := h_1^{\alpha_1} \cdots h_n^{\alpha_n} \quad (h = (h_1, \ldots, h_n));$$

the operator D ^α is called the partial differential of order α. If each E _i is equal to ℝ, write ∂^α for D ^α ([P2], section 4.3.1 (I)). Let U be an open neighborhood of a in E and suppose that the function f : U → F is of class C ^p in U. Then, Taylor’s formula can be rewritten as follows:

$$f(a + h) = \sum_{|\alpha| \le p-1} \frac{1}{\alpha!}\, D^\alpha f(a).h^\alpha + r_p(h).$$

[1.13]
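To make the multi-index form [1.13] concrete, here is a minimal sketch for E = ℝ × ℝ and F = ℝ. The function f(x, y) = exp(x) sin(y), the base point and the increment are assumptions chosen only because every partial derivative of f has a closed form. The bound used at the end is the real-valued Lagrange estimate, with all partial derivatives bounded by exp(x) on the segment.

```python
import math
from itertools import product

def partial(alpha, x, y):
    """Partial derivative of order alpha = (a1, a2) of f(x, y) = exp(x) sin(y)."""
    a1, a2 = alpha  # a1 is unused: differentiating exp(x) in x leaves it unchanged
    return math.exp(x) * math.sin(y + a2 * math.pi / 2)

def taylor_polynomial(a, h, p):
    """Sum over |alpha| <= p-1 of D^alpha f(a) h^alpha / alpha!, with alpha! = a1! a2!."""
    total = 0.0
    for a1, a2 in product(range(p), repeat=2):
        if a1 + a2 <= p - 1:
            coeff = partial((a1, a2), *a) / (math.factorial(a1) * math.factorial(a2))
            total += coeff * h[0]**a1 * h[1]**a2
    return total

a, h, p = (0.2, 0.3), (0.1, -0.2), 6
exact = math.exp(a[0] + h[0]) * math.sin(a[1] + h[1])
residual = exact - taylor_polynomial(a, h, p)
# Every partial derivative is bounded by exp(x) <= exp(0.3) on the segment [a, a + h].
bound = math.exp(0.3) * (abs(h[0]) + abs(h[1]))**p / math.factorial(p)
print(residual, bound)
```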

1.2.5 Analytic functions

(I) Power series Let E be a normed vector space with norm |.| and suppose that F is a Hausdorff quasi-complete locally convex space (see Remark 1.1 for the case where F is normable). With the notation of section 1.2.1, let $\hat{\mathcal{S}}(E;F)$ be the K-vector space of formal power series S = ∑_p S _p, where S _p = c _p.X ^p and $c_p \in \mathcal{L}_{p,s}(E;F)$. Let (|.|_γ)_γ ∈ Γ be a family of semi-norms which induces the topology of F and let r > 0; we write ‖S‖_{γ, r} = ∑_p r ^p‖c _p‖_γ and

$$\mathcal{S}_r(E;F) = \{ S \in \hat{\mathcal{S}}(E;F) : \|S\|_{\gamma,r} < \infty, \ \forall \gamma \in \Gamma \}; \qquad \mathcal{S}(E;F) = \bigcup_{r > 0} \mathcal{S}_r(E;F).$$

The set $\mathcal{S}(E;F)$ is a K-vector space called the space of convergent power series. If $S \in \mathcal{S}(E;F)$, we say that $\rho(S) := \sup\{r > 0 : S \in \mathcal{S}_r(E;F)\}$ is the radius of convergence of S. Suppose that ρ (S) > 0; if we replace the indeterminate X with an element h ∈ E such that | h | < ρ (S), then the family c _p.h ^p is summable ([P2], section 3.2.1 (III)), as can be seen by adapting the proof of ([P2], section 3.4.1 (I), Theorem 3.41), and the mapping h ↦ S (h) is continuous in the open set | h | < ρ (S). If F is a Banach space and ρ (S) > 0, then the power series S is absolutely convergent in | h | < ρ (S) and normally convergent in | h | ≤ r′ for every r′ such that 0 < r′ < ρ (S) ([P2], section 4.3.2 (I)).

(II) Analytic functions Let A be a non-empty open subset of E. We say that a function f from A into F is analytic (or is a mapping of class C ^ω) if, for each point a ∈ A, there exists a convergent series $S \in \mathcal{S}(E;F)$, denoted by f _a, such that f (a + h) = f _a (h) for every h ∈ E with sufficiently small norm. This definition generalizes ([P2], section 4.3.2 (I), Definition 4.74). Write $C^\omega(A;F)$ for the K-vector space of analytic functions from A into F. If K = ℝ and $f \in C^\omega(A;F)$, then f is of class C ^∞ in A, and so is each of its differentials D ^p f (p ≥ 1). Every mapping $f \in C^\omega(A;F)$ admits the following Taylor series expansion at the point a, which converges in | h | < ρ(f _a):

$$f(a + h) = \sum_{p=0}^{\infty} \frac{1}{p!}\, D^p f(a).h^p.$$

[1.14]

If | a | < ρ(f _a), then the radius of convergence of the Taylor expansion of f at the point a is greater than or equal to ρ (f _a) − | a |. If A = E and ρ (f _a) = +∞, then the function f is said to be entire.

Let E, F be Banach spaces, G a quasi-complete locally convex space, A an open subset of E, f : A → F an analytic function, B an open subset of F containing f (A) and g : B → G an analytic function. Then, g ∘ f is analytic (exercise); see ([BOU 82a], 3.2.7), ([WHI 65], p. 1079).

The principle of analytic continuation ([P2], section 4.3.2, Theorem 4.76) can be generalized as follows (exercise: see [WHI 65], p. 1080): let E and F be two Banach spaces, Ω a connected open subset of E and f, g two analytic functions from Ω into F. If f and g coincide in any non-empty open subset of Ω, then they must be equal.

Lemma 1.24

Let E be a Banach space and write ℌ for the subset of invertible operators in ℒ(E). Let ℐ : ℌ → ℌ : u ↦ u ^− 1. The mapping ℐ is analytic and satisfies $D\mathcal{I}(u_0).h = -u_0^{-1}.h.u_0^{-1}$ for every u ₀ ∈ ℌ.

Proof

We know that ℌ is open in ℒ(E) ([P2], section 3.4.1 (II), Corollary 3.49). Let u ₀ ∈ ℌ and s ∈ ℒ(E). We have u ₀ + s = u ₀ (1_E − v), where v = − u ₀ ^− 1. s. If || v || < 1, then 1_E − v is invertible with inverse ∑_n ≥ 0 v ⁿ (ibid.). Hence, if $\|s\| < 1/\|u_0^{-1}\|$, u ₀ + s has inverse ∑_n ≥ 0(− u ₀ ^− 1. s)ⁿ u ₀ ^− 1, which shows that ℐ is analytic. Furthermore, ∑_n ≥ 0(− u ₀ ^− 1. s)ⁿ u ₀ ^− 1 = u ₀ ^− 1 − u ₀ ^− 1. s. u ₀ ^− 1 + o(‖s‖).
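The Neumann series used in this proof can be checked numerically in the simplest case E = ℝ², where ℒ(E) is the space of 2 × 2 real matrices. The following is a minimal sketch; the matrices u0 and s and all helper names are assumptions made for the example. It compares the exact inverse of u₀ + s with the partial sums of ∑ (−u₀⁻¹.s)ⁿ u₀⁻¹, the two-term partial sum being the first-order expansion u₀⁻¹ − u₀⁻¹.s.u₀⁻¹.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def neg(a):
    return [[-a[i][j] for j in range(2)] for i in range(2)]

def inverse(a):
    """Closed-form inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

def neumann_inverse(u0, s, terms):
    """Partial sum over n < terms of (-u0^{-1} s)^n u0^{-1}."""
    u0_inv = inverse(u0)
    v = neg(matmul(u0_inv, s))          # v = -u0^{-1} s
    power = [[1.0, 0.0], [0.0, 1.0]]    # v^0
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(terms):
        total = add(total, matmul(power, u0_inv))
        power = matmul(power, v)
    return total

u0 = [[2.0, 1.0], [0.0, 3.0]]
s = [[0.01, -0.02], [0.03, 0.01]]           # small perturbation, so ||v|| < 1
exact = inverse(add(u0, s))
first_order = neumann_inverse(u0, s, 2)     # u0^{-1} - u0^{-1} s u0^{-1}
full_series = neumann_inverse(u0, s, 30)
print(max_abs_diff(exact, first_order), max_abs_diff(exact, full_series))
```

The first printed error is of second order in s, while the second is at the level of rounding error.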

(III) Holomorphic functions Let E be a normed complex vector space with norm |.|, A some non-empty open subset of E, and F a complex quasi-complete locally convex space. Goursat’s theorem ([P2], section 4.2.4, Proposition 4.56) can be generalized as follows ([BOU 82a], 3.1.1): the function f : A → F is analytic if and only if it is holomorphic (i.e. complex-differentiable). If this condition is satisfied for E = E ₁ × … × E _n, let a =(a ₁,…, a _n) ∈ A, r = (r ₁, …, r _n), where r _i > 0, and $c_\alpha = \frac{1}{\alpha!} D^\alpha f(a)$, where α is the multi-index (α ₁,…, α _n). The Cauchy inequalities ([P2], section 4.3.2 (II), Lemma-Definition 4.78(2)) can be generalized as follows (exercise): for $r^\alpha := r_1^{\alpha_1} \cdots r_n^{\alpha_n}$,

$$\|c_\alpha\|_\gamma \le \frac{M_\gamma}{r^\alpha} \quad \text{if} \quad M_\gamma := \sup_{|\xi_i - a_i| \le r_i \,:\, i = 1, \ldots, n} |f(\xi)|_\gamma < \infty.$$

Hence ([P2], section 4.3.2 (II), Theorem-Definition 4.81(3)), if f is entire in E and bounded in F, then it must be constant (Liouville’s theorem). The statement of Hartogs’ theorem ([P2], section 4.3.2 (II), Corollary 4.80) also holds, mutatis mutandis, for a function f : A → F, where A is an open subset of E ₁ × … × E _n, and each E _i is a complex normed vector space: any such function is analytic if and only if it is analytic in each of its variables when the others are held fixed.
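The Cauchy inequalities can be illustrated in the simplest case E = F = ℂ (n = 1). The sketch below is only an example under stated assumptions: f = exp, a = 0, r = 2, and the Taylor coefficients are approximated by the trapezoid rule applied to the Cauchy integral formula (a standard technique that is not taken from the text). For this f the supremum over the closed disc is attained at θ = 0 on the circle, which is one of the sample points.

```python
import cmath
import math

def cauchy_coefficient(f, a, r, n, samples=64):
    """Trapezoid-rule approximation of c_n = (1/2 pi i) * integral of f(xi)/(xi-a)^(n+1) over |xi-a| = r."""
    total = 0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        total += f(a + r * cmath.exp(1j * theta)) * cmath.exp(-1j * n * theta)
    return total / (samples * r**n)

def sup_on_circle(f, a, r, samples=64):
    """Largest sampled value of |f| on the circle |xi - a| = r."""
    return max(abs(f(a + r * cmath.exp(2j * math.pi * k / samples))) for k in range(samples))

a, r = 0.0, 2.0
M = sup_on_circle(cmath.exp, a, r)                      # equals exp(2) for this f
coefficients = [cauchy_coefficient(cmath.exp, a, r, n) for n in range(9)]
for n, c in enumerate(coefficients):
    print(n, abs(c), M / r**n)                          # |c_n| against the bound M / r^n
```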

Theorem 1.25

(maximum modulus) Let E (respectively F ) be a complex Banach space (respectively quasi-complete Hausdorff locally convex space), A some connected non-empty open subset of E and f : A → F a holomorphic function. Let |.|_γ be a continuous semi-norm on F . If the function | f |_γ : x ↦ | f (x)|_γ is not constant, then it does not have a maximum in A.

Proof

1) Let us begin by showing the result by contradiction when E = F = ℂ. Suppose that | f | has a maximum in A. By translation, we may assume that 0 ∈ A and that this maximum is attained at 0. Let c ₀ = f (0). If f is not constant, then there exists b _m ≠ 0 such that f (z) = c ₀ (1 + b _m z ^m + z ^m.h (z)), where h is holomorphic in A and satisfies h (0) = 0. Choose r > 0 such that | z | ≤ r implies z ∈ A and $|h(z)| \le \tfrac{1}{2} |b_m|$. Let t ∈ ℝ be such that $e^{imt} = |b_m| / b_m$. For z = re ^it, we have
$$|1 + b_m z^m + z^m h(z)| \ge 1 + \tfrac{1}{2} |b_m|\,|z|^m,$$

which is a contradiction.

2) In the case where E = ℂ, we can similarly argue by contradiction by assuming that there exist z ₀, z ₁ ∈ A such that | f (z) |_γ ≤ | f (z ₀)|_γ for all z ∈ A and | f (z ₁)|_γ < | f (z ₀)|_γ. Let V = {λ. f (z ₀) : λ ∈ ℂ} and η : V → ℂ : λ. f (z ₀) ↦ λ.| f (z ₀)|_γ. Then, | η |_γ = 1, where $|\eta|_\gamma := \sup_{y \in V,\ |y|_\gamma \le 1} |\eta(y)|$. By the Hahn–Banach theorem ([P2], section 3.3.4(II), Theorem 3.25), there exists a continuous linear form ξ ∈ F^∨ extending η such that | ξ |_γ = 1. Therefore, for all z ∈ A, | ξ ∘ f (z)| ≤ | f (z)|_γ ≤ | f (z ₀)|_γ = | ξ ∘ f (z ₀)|, so ξ ∘ f is constant by (1). Hence, | ξ ∘ f (z ₁)| = | ξ ∘ f (z ₀)| = | f (z ₀)|_γ and | ξ ∘ f (z ₁)| ≤ | f (z ₁)|_γ < | f (z ₀)|_γ, contradiction.
3) In the general case, let g (ξ) = f (z ₀ + ξ (z − z ₀)) and suppose that | f (z)|_γ ≤ | f (z ₀)|_γ for all z ∈ A. Then, g is holomorphic in Ω = {ξ ∈ ℂ : | ξ | < 1 + r} for sufficiently small r > 0 and z sufficiently close to z ₀. Therefore, | g (ξ)|_γ ≤ | f (z ₀)|_γ = | g (0)|_γ, and g is constant in Ω by (2). Thus, g (0) = g (1), so f (z) = f (z ₀). The set of z ∈ A satisfying this condition is non-empty, open and closed in A, and so must be equal to A ([P2], section 2.3.8).

1.2.6 The implicit function theorem and its consequences

(I) Banach–Caccioppoli fixed point theorem

Definition 1.26

Let (X, d) be a metric space and f : X → X a mapping.

i) We say that a point ξ ∈ X is a fixed point of f if f (ξ) = ξ.
ii) We say that f is a contraction if there exists some constant k, 0 ≤ k < 1, such that, for all x, x′ ∈ X, d (f (x), f (x′)) ≤ k.d (x, x′).

Theorem 1.27

(Banach–Caccioppoli fixed point theorem) Every contraction in a complete metric space has a unique fixed point.

Proof

a) Uniqueness: If f (ξ) = ξ and f (ξ′) = ξ′, then d (f (ξ), f (ξ′)) = d (ξ, ξ′) and d (f (ξ), f (ξ′)) ≤ k.d (ξ, ξ′), so d (ξ, ξ′) ≤ k.d (ξ, ξ′), and thus (1 − k).d (ξ, ξ′) ≤ 0. Since 1 – k > 0, this implies that d (ξ, ξ′) = 0 and ξ = ξ′.

b) Existence: We will use the method of successive approximation: let (x _n) be the sequence of elements of X defined from some arbitrary starting point x ₀ ∈ X by the recurrence relation $x_{n+1} = f(x_n)$. For all n ≥ 1,
$$d(x_{n+1}, x_n) \le k.d(x_n, x_{n-1}) \le \ldots \le k^n.d(x_1, x_0),$$

and so for all n ≥ 0 and p ≥ 1, by the triangle inequality,
$$d(x_{n+p}, x_n) \le \sum_{i=0}^{p-1} d(x_{n+p-i}, x_{n+p-i-1}) \le \underbrace{\Big( \sum_{i=0}^{p-1} k^{p-i-1} \Big)}_{\le 1/(1-k)} k^n.d(x_1, x_0).$$

Hence, (x _n) is a Cauchy sequence. Since X is complete, (x _n) must have some limit ξ in X; but f is continuous, so ξ = f (ξ).
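The method of successive approximation is easy to try numerically. As a minimal sketch (the choice X = [0, 1], f = cos and the helper name are assumptions made for the example): cos maps [0, 1] into itself and is a contraction there with constant k = sin 1 < 1, by the mean value theorem. Letting p → ∞ in the estimate above gives the a priori bound d(x_n, ξ) ≤ kⁿ.d(x₁, x₀)/(1 − k), which the code checks.

```python
import math

def fixed_point_iterates(f, x0, n):
    """Return [x_0, x_1, ..., x_n] with x_{j+1} = f(x_j)."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

k = math.sin(1.0)                        # contraction constant of cos on [0, 1]
xs = fixed_point_iterates(math.cos, 1.0, 200)
xi = xs[-1]                              # numerically converged fixed point of cos
d10 = abs(xs[1] - xs[0])

# A priori bound: d(x_n, xi) <= k^n / (1 - k) * d(x_1, x_0).
for n in (1, 5, 10, 20):
    print(n, abs(xs[n] - xi), k**n / (1 - k) * d10)
```

The actual error decreases faster than the bound, since the bound uses the worst contraction constant over the whole interval.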

(II) Inverse mapping and implicit function theorems Below, we assume that 0 < p ≤ ω.

Definition 1.28

A diffeomorphism of class C ^p (or a C ^p -diffeomorphism) is a bijection of class C ^p whose inverse bijection is also of class C ^p .

This definition can be localized in the obvious way as we did for homeomorphisms ([P2], section 2.3.4 (III)). Every diffeomorphism (respectively local diffeomorphism) is clearly a homeomorphism (respectively local homeomorphism). A local diffeomorphism of class C ^p is also known as an étale mapping of class C ^p (see [P2], section 5.3.2 (II)).

Theorem 1.29

(inverse mapping theorem) Let E and F be Banach spaces and suppose that f is a mapping of class C ^p taking values in F and defined in a neighborhood of some point a ∈ E. Let b = f (a) and suppose that D f (a) is bijective. Then, f is a local diffeomorphism of class C ^p from some neighborhood U of a into some neighborhood V of b; the inverse diffeomorphism g : V → U (of class C ^p) satisfies D g (b) = D f (a)^− 1 .

Proof

1) Preliminary: Since $Df(a) \in \mathcal{L}(E;F)$ is bijective, it is a linear homeomorphism by the Banach inverse operator theorem ([P2], section 3.2.3, Theorem 3.12(2)(i)). Hence, E and F are isomorphic and can be identified. We can also assume that D f (a) = 1_E (left-multiplying f by D f (a)^− 1 if necessary) and reduce to the case where a = 0 and b = f (a) = 0 by translation. Let ϕ (x) = x − f (x). We have ϕ (0) = 0 and Dϕ (0) = 0, and, since Dϕ is continuous, there exists r > 0 such that $|x| \le r \Rightarrow \|D\phi(x)\| \le \tfrac{1}{2}$. The mean value theorem (Theorem 1.13) then implies that $|\phi(x)| \le \tfrac{1}{2} |x|$ whenever | x | ≤ r, i.e. ϕ (B _r ^c (0)) ⊂ B_r/2 ^c (0), where B _r ^c(0) ≔ {x ∈ E : | x | ≤ r}.
2) Existence of an inverse mapping g : B _r/2 ^c (0) → B _r ^c (0):
Let y ∈ B _r/2 ^c (0). We will show that there exists a unique element x ∈ B _r ^c (0) such that f (x) = y. Let ψ _y (x) = y + x − f (x) = y + ϕ (x). If | y | ≤ r/2 and | x | ≤ r, then | ψ _y (x)| ≤ r, so ψ _y is a mapping from B _r ^c (0) into B _r ^c (0). For all x ₁, x ₂ ∈ B _r ^c (0),
$$|\psi_y(x_1) - \psi_y(x_2)| = |\phi(x_1) - \phi(x_2)| \le \tfrac{1}{2} |x_1 - x_2|.$$

Since B _r ^c (0) is a complete metric space ([P2], section 2.4.4(II), Lemma 2.77), Theorem 1.27 implies that ψ _y has a unique fixed point in this set. This fixed point x satisfies y + x − f (x) = x, so f (x) = y, and x = g (y).
3) Continuity of g : For x ₁, x ₂ ∈ B _r ^c (0), we have:
$$x_1 - x_2 = \underbrace{(x_1 - f(x_1))}_{\phi(x_1)} + f(x_1) - f(x_2) - \underbrace{(x_2 - f(x_2))}_{\phi(x_2)};$$
$$|x_1 - x_2| \le |f(x_1) - f(x_2)| + |\phi(x_1) - \phi(x_2)| \le |f(x_1) - f(x_2)| + \tfrac{1}{2} |x_1 - x_2|.$$

Therefore, | x ₁ − x ₂ | ≤ 2 | f (x ₁) − f (x ₂)|.

4) Differentiability of g : Let y _i = f (x _i), y _i ∈ B _r/2 ^c (0), x _i ∈ B _r ^c (0) (i = 1, 2). Then:
$$g(y_1) - g(y_2) - Df(x_2)^{-1}.(y_1 - y_2) = x_1 - x_2 - Df(x_2)^{-1}.(f(x_1) - f(x_2)).$$

By (1), the norm of Dϕ (x ₂) is at most 1/2, so D f (x ₂) = 1_E − Dϕ (x ₂) is invertible with || D f (x ₂)^− 1 || ≤ 2 (Neumann series, as in the proof of Lemma 1.24). Therefore,
$$g(y_1) - g(y_2) - Df(x_2)^{-1}.(y_1 - y_2) = o_1(|x_1 - x_2|) = o_2(|y_1 - y_2|),$$

which shows that g is differentiable and D g (y) = D f (x)^− 1 in B _r/2 (0).

5) Class of g : If K = ℂ , the generalized Goursat theorem (section 1.2.5 (III)) implies that p = ω. Consider the case where K = ℝ . Since D f and g are continuous and ℐ is analytic (Lemma 1.24), D g = ℐ ∘ D f ∘ g is continuous, and g is therefore of class C ¹. By induction, it follows that if f is of class C ^p (1 ≤ p ≤ ∞), then g is of class C ^p.
Suppose that f is analytic and therefore can be expressed as an absolutely convergent series f(x) = ∑_p c _p. x ^p(| x | < ρ). By (1), c ₀ = 0, c ₁ = 1_E, so y = x + c ₂.x ² + c ₃.x ³ + … According to a classical procedure dating back to Newton, x can be expressed in terms of y as a formal series x = y + ∑_i ≥ 2 d _i. y ⁱ. The terms d _i (i ≥ 2) are determined recursively: y ² = x ² + 2c ₂.x ³ + …, y ³ = x ³ + …, which gives y = x + c ₂. (y ² − 2c ₂.x ³) + …, and x = y − c ₂ .y ² + (2c₂ ² − c₃).y ³ + …, where c ₂ ².y ³ := c ₂ (y, c ₂ (y, y)). We now need to study the convergence of this series. There exists K > 0 such that || c _i || ≤ γ_i, where $\gamma_i = K / \rho^i$ (i ≥ 2). Thus, the majorant series
$$\eta = \xi - \sum_{i \ge 2} \gamma_i.\xi^i = \xi - K \frac{\xi^2}{\rho^2} \sum_{i \ge 0} \frac{\xi^i}{\rho^i}$$

[1.15]

converges for | ξ | < ρ, with sum $\varphi(\xi) = \xi - \frac{K}{\rho} \frac{\xi^2}{\rho - \xi}$. As before, we can construct an inverse formal series ξ = η + ∑_i ≥ 2 δ _i. η ⁱ. Cauchy showed that this series has a non-zero radius of convergence as follows. By elementary arithmetic, the relation η = φ (ξ) is invertible and we may write ξ = ψ (η), for ξ ∈ ]−∞, ξ ₁[, where $\xi_1 = \rho \big( 1 - \sqrt{K/(K+\rho)} \big) > 0$, and η ∈ ]−∞, η ₁[, where $\eta_1 = 2K + \rho - 2\sqrt{K(K+\rho)} > 0$. It is easy to check that ψ can be expanded into a power series in a neighborhood of 0 ([KNO 51], section 107). Since || d _i || ≤ δ _i for all i ≥ 2, section 1.2.5 (I) shows that the formal series y + ∑_i ≥ 2 d _i. y ⁱ therefore has a non-zero radius of convergence.
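Steps (2) and (5) of this proof can be illustrated in the scalar case E = F = ℝ. The following minimal sketch uses assumed data: f(x) = x + c₂x² + c₃x³ with c₂ = 0.5 and c₃ = 0.25, for which |Dϕ(x)| ≤ 1/2 holds on |x| ≤ r = 0.3. It computes g(y) as the fixed point of ψ_y(x) = y + x − f(x) and compares it with the first terms y − c₂y² + (2c₂² − c₃)y³ of the reverted series.

```python
def f(x, c2=0.5, c3=0.25):
    """Scalar test mapping with f(0) = 0 and Df(0) = 1."""
    return x + c2 * x**2 + c3 * x**3

def local_inverse(y, iterations=100):
    """Fixed point of psi_y(x) = y + x - f(x), computed by successive approximation from x = 0."""
    x = 0.0
    for _ in range(iterations):
        x = y + x - f(x)
    return x

def reverted_series(y, c2=0.5, c3=0.25):
    """Truncation of the inverse series: x = y - c2 y^2 + (2 c2^2 - c3) y^3."""
    return y - c2 * y**2 + (2 * c2**2 - c3) * y**3

y = 0.1                        # satisfies |y| <= r/2 with r = 0.3
x_fixed = local_inverse(y)
x_series = reverted_series(y)
print(x_fixed, x_series, f(x_fixed))
```

The two approximations of g(y) differ only by the neglected higher-order terms of the series.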

Theorem 1.30

(implicit function theorem) Let E, F and G be Banach spaces, A some non-empty open subset of E × F and f : A → G a mapping of class C ^p in A. Let (a, b) ∈ A be such that f (a, b) = 0 and suppose that $D_2 f(a,b) \in \mathcal{L}(F;G)$ is bijective. There exists some neighborhood U ₀ of a in E such that, for every connected open set U ⊂ U ₀ containing a, there is a unique continuous mapping u from U into F satisfying u(a) = b, (x, u (x)) ∈ A, and f (x, u(x)) = 0 for all x ∈ U. Furthermore, u is of class C ^p in U and, for all x ∈ U,

$$Du(x) = -\big( D_2 f(x, u(x)) \big)^{-1} \circ D_1 f(x, u(x)).$$

[1.16]

Proof

1) Existence of an implicit function: Since D ₂ f (a, b) is bijective, it is an isomorphism from F onto G, and we may consider that F = G; furthermore, replacing f by D ₂ f (a, b)^− 1.f if necessary, we may assume that D ₂ f (a, b) = 1_F. Let φ : A → E × F :(x, y) ↦ (x, f (x, y)). We have
$$D\varphi(a,b) = \begin{pmatrix} 1_E & 0 \\ D_1 f(a,b) & 1_F \end{pmatrix},$$

so Dφ (a, b) is invertible in ℒ(E × F). By Theorem 1.29, φ is a local diffeomorphism of class C ^p and admits an inverse local diffeomorphism ψ of class C ^p .
Write ψ (x, z) = (x, h (x, z)), where h is defined and of class C ^p in some neighborhood of φ (a, b) = (a, 0) and takes values in F. Finally, set u (x) = h (x, 0). Then, u is of class C ^p in some neighborhood of a and takes values in F. Thus, there exists some neighborhood U ₀ of a in E such that, for all x ∈ U ₀,
$$(x, f(x, u(x))) = \varphi(x, u(x)) = \varphi(x, h(x, 0)) = \varphi(\psi(x, 0)) = (x, 0)$$

and u (a) = h (a, 0), so (a, u (a)) = ψ (a, 0) = φ ^− 1 ((a, 0)) = (a, b), and therefore u (a) = b. Hence, u is an “implicit function” of class C ^p.

2) Uniqueness of the implicit function: Since φ is a local homeomorphism, there exist a neighborhood U′₀ of a and a neighborhood V ₀ of b such that, for each x ∈ U′₀, there is a unique y ∈ V ₀ satisfying φ (x, y) = (x, 0) (see Figure 1.1, where Γ is the graph of f). We may assume that U′₀ is the same U ₀ as above (replacing U ₀ by U ₀ ∩ U′₀ if necessary).
If a continuous mapping v : U ₀ → F satisfies v (a) = b and f (x, v(x)) = 0 for all x ∈ U ₀, then we may assume that v(x) ∈ V ₀ for all x ∈ U ₀, further reducing the neighborhood U ₀ of a if necessary. We may similarly assume that u, like v, is defined in U ₀. Let U ⊂ U ₀ be a connected neighborhood of a and let M = {x ∈ U : u (x) = v (x)}. Then, a ∈ M and M is closed in U ([P2], section 2.3.3 (II), Lemma 2.30). We will show that M is also open in U. By the hypotheses, the mapping x ↦ D ₂ f(x, u (x)) is continuous and D ₂ f (a, b) = 1_F, so (again reducing the neighborhood U ₀ if necessary) we may assume that D ₂ f (x, u(x)) is invertible for all x ∈ U ₀. Let a′ ∈ M. There exist a neighborhood U _a′ ⊂ U of a′ and a neighborhood V _a′ ⊂ V ₀ of b′ = u (a′) such that, for all x ∈ U _a′, u (x) is the only solution y of f (x, y) = 0 satisfying y ∈ V _a′. Given that v is continuous at a′ and v (a′)= u (a′), there exists a neighborhood W ⊂ U _a′ of a′ such that v (x) ∈ V _a′ whenever x ∈ W. Therefore, v (x) = u (x) for all x ∈ W, which proves that M is open. The set M is non-empty, open, and closed in the connected space U, which implies that M = U ([P2], section 2.3.8).
3) Calculation of D u(x): Since f(x, u(x)) = 0 in U, the chain rule (Theorem 1.9) implies that
$$D_1 f(x, u(x)) + D_2 f(x, u(x)) \circ Du(x) = 0,$$

[1.17]

which gives us [1.16].
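Formula [1.16] can be checked on a scalar example. In the sketch below, everything concrete is an assumption made for illustration: E = F = G = ℝ, f(x, y) = x² + y² − 1 and (a, b) = (0.6, 0.8), so that u(x) = √(1 − x²) near a. The implicit function is computed by Newton's method in the variable y (a substitute for the construction used in the proof, which relies on Theorem 1.29), and D u(x) given by [1.16] is compared with a central finite difference.

```python
import math

def f(x, y):
    return x * x + y * y - 1.0

def d1f(x, y):
    """Partial derivative of f with respect to x."""
    return 2.0 * x

def d2f(x, y):
    """Partial derivative of f with respect to y (invertible when y != 0)."""
    return 2.0 * y

def implicit_u(x, y_start, iterations=50):
    """Solve f(x, y) = 0 for y near y_start by Newton's method in the second variable."""
    y = y_start
    for _ in range(iterations):
        y -= f(x, y) / d2f(x, y)
    return y

a, b = 0.6, 0.8                              # f(a, b) = 0 and D_2 f(a, b) = 1.6 != 0
x = 0.65
u_x = implicit_u(x, b)
du_formula = -d1f(x, u_x) / d2f(x, u_x)      # formula [1.16] in the scalar case
eps = 1e-6
du_numeric = (implicit_u(x + eps, b) - implicit_u(x - eps, b)) / (2 * eps)
print(u_x, du_formula, du_numeric)
```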

Remark 1.31

i) The implicit function theorem answers the following question: what condition makes it possible to express y = b + Δy uniquely as a function of x = a + Δx (so that y = u (x)) whenever f (x, y) = 0 in an open neighborhood A of (a, b)? First of all, with suitable continuity hypotheses, neglecting second-order terms, and assuming that Δx, Δy are sufficiently small, we have $\frac{\partial f}{\partial x}(x,y).\Delta x + \frac{\partial f}{\partial y}(x,y).\Delta y \simeq 0$. Hence:
$$\Delta y \simeq \underbrace{-\big( D_2 f(x,y) \big)^{-1} \circ D_1 f(x,y)}_{Du(x)}.\Delta x$$

if D ₂ f (x, y) is invertible, or equivalently if D ₂ f (a, b) is invertible by continuity of D ₂ f. Figure 1.1 shows that the functional relation y = u (x) might only be valid in a sufficiently small neighborhood of (a, b). If x belongs to a connected neighborhood U ₀ of a, the variable y such that f (x, y) = 0 can be made arbitrarily close to b, provided that U ₀ is chosen small enough.
ii) The reader may wish to find an expression for [1.16] in the finite-dimensional case using Jacobian matrices ([DIE 93], Volume 1, (10.2.2)); the composition of two linear mappings translates to the product of their matrices.
iii) The statement of Theorem 1.30 no longer holds when F and G are arbitrary Fréchet spaces [SER 72]; however, it remains valid when E is a non-complete normed vector space (see [SCH 93], Volume 2, Theorem 3.8.5).

(III) Immersions, submersions, subimmersions, the rank theorem In the following, we assume that 0 < p ≤ ω.

Corollary-Definition 1.32

1) Let E, F be Banach spaces, A an open subset of E, a ∈ A, and i : A → F a mapping of class C ^p such that i (a) = 0, D i (a) is injective and its image F ₁ = im (D i (a)) splits in F ([P2], section 3.2.2 (IV)), i.e. admits a topological complement F ₂ (ibid.). Then, there exist a local homeomorphism r : F → F ₁ × F ₂ in some neighborhood of 0 and an open neighborhood U ⊂ A of a in E such that r ∘ i induces a diffeomorphism of class C ^p from U onto an open subset of F ₁. The local homeomorphism r is a local diffeomorphism of class C ^p .
2) The mapping i defined above is called an immersion of class C ^p .