Keywords
Calculus of variations; “Convenient” differentials; Differential Calculus; Existence and uniqueness theorems; Fréchet differential calculus; Gateaux differentials; Lagrange variations; Mappings of class Cp; Parameter dependence; Taylor’s formulas
1.1 Introduction
We could cite various forerunners of differential calculus, including Descartes, Fermat and Cavalieri, but Newton and Leibniz should be remembered as the true pioneers of the field. This dual paternity created terrible priority disputes where the only certainty is the complexity of the controversy [HAL 80]. Newton’s “fluxions” and Leibniz’s “vanishing quantities” are analogous to our modern concepts of derivative and differential, respectively. The extension of these ideas to functions of several variables was due to Euler (who introduced partial derivatives) and Clairaut. After contributions by many other mathematicians, including Volterra and Hadamard, the concept of a derivative of arbitrary order (Lemma-Definition 1.4) was ultimately introduced by Fréchet between 1909 and 1925. The inverse mapping theorem was proved by Lagrange in 1770, and the simplest case of the implicit function theorem was proved by Cauchy around 1833, followed by the case of vector-valued functions of several variables by Dini in 1877. Gateaux then extended Fréchet’s earlier ideas to develop his concept of differential, which was presented in a posthumous publication in 1919. The “convenient” form of differentials, one of their most recent incarnations, was introduced by Frölicher, Kriegl and Michor in the early 1980s ([KRI 97], p. 73). These differentials were intended for mappings taking values in locally convex non-normable spaces, in particular nuclear Fréchet spaces (see, in particular, section 5.3.2 on manifolds of mappings). Wherever possible, this chapter therefore chooses to present differential calculus for mappings which take values in locally convex spaces rather than the normed vector spaces considered by the standard approach (nothing essential changes). 
Nonetheless, for the inverse mapping theorem and the implicit function theorem (Theorems 1.29 and 1.30), we will restrict attention to the Banach case (for a more general context, see [GLÖ 06], which uses yet another concept of differentiability distinct from any of those mentioned above). Both theorems are proved in full detail, including the Banach analytic case, by explicitly filling in the hints from [WHI 65]. Together with the Carathéodory theorems mentioned below, these two theorems are the most profound results of this chapter.
The classical Cauchy–Lipschitz conditions for the existence and uniqueness of solutions of ordinary differential equations were considerably weakened by Carathéodory ([CAR 27], section 576 and following) using measure and integration theory ([P2], section 4.1), which was an extremely recent development at the time. It was judged useful to give a full proof of Carathéodory’s theorems in section 1.5.1, since the available literature seems to expect the reader to reassemble this proof from a scattered collection of isolated results and special cases, at the expense of clarity. The parameter dependence of solutions is studied in section 1.5.3. Again, proofs are given in full, with the exception of one result: the differentiability of solutions with respect to the initial condition when the classical hypothesis (Corollary 1.82) is replaced by Lusin’s condition. This generalization is important (Remark 1.84) and the proof is not uninteresting ([ALE 87], Chapter 2, section 2.5.6); it is omitted from these pages not because it is too difficult but because it is too long.
1.2 Fréchet differential calculus
1.2.1 General conventions
The conventions presented below apply throughout the entire volume. The conventions listed under (II) are motivated by tensor calculus (Chapter 4).
(I) K denotes the field of real or complex numbers. Two elements ∞ and ω are adjoined to the set of integers ℤ. The usual order relation on ℤ is extended by the convention n < ∞ < ω for every integer n ∈ ℤ, with ∞ + n = n + ∞ = ∞, ω + n = n + ω = ω. We define $\mathbb{N}_K = \mathbb{N} \cup \{\infty, \omega\}$ if $K = \mathbb{R}$ and $\mathbb{N}_K = \{0, \omega\}$ if $K = \mathbb{C}$, and $\mathbb{N}_K^\times = \mathbb{N}_K - \{0\}$. See also the convention (C) in section 2.2.1(II).
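For $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{N}^n$, recall that $\alpha! = \alpha_1! \cdots \alpha_n!$. For $\alpha \in \mathbb{Z}^n$, $|\alpha| = \alpha_1 + \dots + \alpha_n$.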
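The multi-index conventions above can be illustrated with a short sketch. The helper names `multi_factorial` and `multi_length` are hypothetical and chosen only for this illustration; multi-indices are represented as Python tuples.

```python
from math import factorial, prod

def multi_factorial(alpha):
    """Return alpha! = alpha_1! * ... * alpha_n! for a tuple of non-negative integers."""
    return prod(factorial(a) for a in alpha)

def multi_length(alpha):
    """Return |alpha| = alpha_1 + ... + alpha_n for a tuple of integers."""
    return sum(alpha)

alpha = (2, 0, 3)
print(multi_factorial(alpha))  # 2! * 0! * 3! = 12
print(multi_length(alpha))     # 2 + 0 + 3 = 5
```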
The locally convex vector spaces considered below are defined over K and are always Hausdorff, with the exception of semi-normed spaces. We will use the Landau notation: let X be a topological space, a some point in X, F a locally convex space, and $f : X \to \mathbb{R}_+$, $g : X \to F$ two mappings; suppose that f(x) > 0 in some neighborhood of a. We write g = O(f) (x → a) (respectively g = o(f) (x → a)) and say that g is dominated by f (respectively is negligible with respect to f) in the neighborhood of a if the function g/f is bounded in some neighborhood of a (respectively tends to 0 as the variable x tends to a). If there are several such mappings, we can write $O_1(f)$, $O_2(f)$ (respectively $o_1(f)$, $o_2(f)$), etc.
(II) Unless otherwise stated, the dual of a locally convex space E is denoted by $E^\vee$ ¹. Let $(e_i)$ be a basis of the finite-dimensional vector space E, $(e^{\vee i})$ the dual basis, $(e'_{i'})$ some other basis of E and $(e'^{\vee i'})$ its dual basis ([P1], section 3.1.3(VI)). In practice, we can unambiguously write $e_{i'}$ for $e'_{i'}$, $e^{\vee i'}$ for $e'^{\vee i'}$. Similarly, given a change-of-basis matrix $A = (A_i^{i'})$ (where i ranges over the rows and i′ ranges over the columns), we can write $A^{-1} = (A_{i'}^{i}) = \frac{1}{\det A}\,\alpha^T$, where α is the matrix of cofactors of A ([P1], section 2.3.11(V)); thus, $\sum_{i'} A_i^{i'} A_{i'}^{j} = \delta_i^j$ ². Let $x = \sum_i x^i e_i \in E$; E is typically a left K-vector space and so the vector x can be represented by the row $(x^i)$ with respect to the basis $(e_i)$ ([P1], section 3.1.3(II)). By contrast, $E^\vee$ is typically a right K-vector space; thus, if $x^\vee = \sum_i x_i^\vee e^{\vee i} \in E^\vee$, the covector $x^\vee$ can be represented by the column $(x_i^\vee)$ with respect to the basis $(e^{\vee i})$ ([P1], section 3.1.3(IV)).
Memory aid: Indices such as i, j, k refer to the old basis; primed indices such as i′, j′, k′ refer to the new basis and are written as superscripts for the change-of-basis matrix A or as subscripts for its inverse $A^{-1}$. The indices of the components of a vector are usually written as superscripts; the indices of the components of a covector are usually written as subscripts. The indices of a sequence of vectors are always subscripts; the indices of a sequence of covectors are always superscripts. This is motivated by “Einstein’s summation convention” (see Remark 4.2 in section 4.2.1).
If x ∈ E, x∨ ∈ E∨, the duality bracket of these two vectors is written as 〈x, x∨〉. A change of basis in E and E∨ may therefore be written as:
Since K is commutative, we do not need to distinguish between left K-vector spaces and right K-vector spaces. Therefore, the duality bracket of x ∈ E, x∨ ∈ E∨ can also be written as 〈x∨, x〉. For example, if f : Ω → K is a differentiable function in some non-empty open subset Ω of E, then, at each point a of Ω, we have Df(a) ∈ E∨ (see Lemma-Definition 1.4). If $h_a \in E$, it might seem more convenient to write 〈Df(a), $h_a$〉 instead of 〈$h_a$, Df(a)〉 for the quantity Df(a).$h_a$, but the reverse notation is also perfectly justifiable (section 2.2.4(IV)).
(III) Recall the following fact ([P2], section 3.9.3(II)): let $E_1, \dots, E_n$ be normed vector spaces, each equipped with a norm |.|, and suppose that F is a semi-normed vector space equipped with a semi-norm $|.|_\gamma$. The space $\mathcal{L}(E_1, \dots, E_n; F)$ of continuous n-linear mappings from $E_1 \times \dots \times E_n$ into F is a semi-normed vector space when equipped with the semi-norm $\|.\|_\gamma$ defined for any mapping $u \in \mathcal{L}(E_1, \dots, E_n; F)$ by:
$$\|u\|_\gamma = \sup_{|x_1|, \dots, |x_n| \le 1} |u(x_1, \dots, x_n)|_\gamma.$$
[1.3]
If F is a Hausdorff locally convex space whose topology is induced by the family of semi-norms $(|.|_\gamma)_{\gamma \in \Gamma}$ ([P2], section 3.3.3), then $\mathcal{L}(E_1, \dots, E_n; F)$ is a Hausdorff locally convex space when equipped with the family of semi-norms $(\|.\|_\gamma)_{\gamma \in \Gamma}$ defined by [1.3]. This space is quasi-complete whenever F is quasi-complete ([P2], section 3.4.2(II)).
Remark 1.1
If F is a normed vector space with norm |.|, the index γ and the set Γ can simply be omitted. This simplification only causes problems in sections 1.3.1 and 1.3.3. In sections 1.2.4 and 1.2.5, read “Banach space” instead of “quasi-complete locally convex space” if the simplification has been made.
Write $\mathcal{L}_n(E; F)$ for the space $\mathcal{L}(E_1, \dots, E_n; F)$ whenever $E_i = E$ for all i ∈ {1, …, n}. An element u of $\mathcal{L}_n(E; F)$ is said to be symmetric if, for all $(h_1, \dots, h_n) \in E^n$ and every permutation $\sigma \in S_n$, we have $u(h_1, \dots, h_n) = u(h_{\sigma(1)}, \dots, h_{\sigma(n)})$. The set of symmetric elements of $\mathcal{L}_n(E; F)$ is written as $\mathcal{L}_{n,s}(E; F)$ and is a vector subspace of $\mathcal{L}_n(E; F)$; this subspace is equipped with the family of semi-norms $(\|.\|_\gamma)_{\gamma \in \Gamma}$. If h ∈ E and $u \in \mathcal{L}_{n,s}(E; F)$, set:
$$u.h^n = u.(\underbrace{h, \dots, h}_{n\ \text{terms}}).$$
If E is finite-dimensional, then we have the following canonical isomorphism:
(I) Let E, F be two locally convex spaces and ϕ a mapping from U into F, where U is some neighborhood of 0 in E.
Definition 1.2
We say that ϕ is tangent to 0 if, for every neighborhood W of 0 in F, there exists a balanced neighborhood V ⊂ U of 0 in E such that, for every t ∈ K with sufficiently small |t|, we have ϕ(tV) ⊂ o(t)W (where o : K → K).
Lemma 1.3
i) If ϕ is linear and tangent to 0, then ϕ = 0.
ii) If E and F are normed vector spaces with norm |.|, then ϕ is tangent to 0 if and only if |ϕ(x)| = o(|x|).
Proof
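(i) With the notation of Definition 1.2, the linearity of ϕ gives tϕ(V) = ϕ(tV) ⊂ o(t)W, so $\phi(V) \subset \frac{o(t)}{t} W$ for every sufficiently small t ≠ 0. Since W is arbitrary and F is Hausdorff, ϕ(V) ⊂ {0}; since V is absorbing, ϕ = 0. (ii): exercise.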
From this lemma, we immediately deduce the following claim (ii):
Lemma-Definition 1.4
Let a be some point of E, U some neighborhood of 0 in E, and f : U + a → F some mapping where a + U := {a + x : x ∈ U}.
i) We say that f is (Fréchet) differentiable at a if there exists a continuous linear mapping $L \in \mathcal{L}(E; F)$ such that the mapping h ↦ f(a + h) − f(a) − L.h (defined in U) is tangent to 0.
ii) This mapping L is uniquely determined.
iii) This mapping is written as L = Df(a) (or $L = d_a f$ or L = f′(a)) and is called the (Fréchet) differential of f at the point a.
The notion of the Fréchet differential is especially fruitful when E is a normed vector space. In this case, if A is an open subset of E, F is a locally convex space, a is a point of A, and f : A → F is a mapping, then f is differentiable at a with differential Df (a) at this point if and only if
$$\lim_{h \to 0,\ h \ne 0} \frac{f(a+h) - f(a) - Df(a).h}{|h|} = 0.$$
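The limit characterization above can be checked numerically on a simple case. The following is a minimal sketch, assuming E = F = ℝ² with the Euclidean norm; the mapping `f`, its hand-computed differential `Df`, and the point `a` are illustrative choices, not taken from the text. The ratio |f(a + h) − f(a) − Df(a).h| / |h| should shrink as h tends to 0.

```python
import math

def f(x, y):
    # Illustrative smooth mapping from R^2 into R^2 (an assumption of this sketch).
    return (x * x * y, math.sin(x) + y)

def Df(x, y, h):
    # Differential of f at (x, y) applied to h = (h1, h2), computed by hand.
    h1, h2 = h
    return (2 * x * y * h1 + x * x * h2, math.cos(x) * h1 + h2)

def norm(v):
    return math.hypot(*v)

def remainder_ratio(a, h):
    """Return |f(a + h) - f(a) - Df(a).h| / |h|."""
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    lin = Df(a[0], a[1], h)
    rem = (fah[0] - fa[0] - lin[0], fah[1] - fa[1] - lin[1])
    return norm(rem) / norm(h)

a = (1.0, 2.0)
# Increments h = (t, -t) with t = 10^-1, ..., 10^-5.
ratios = [remainder_ratio(a, (10.0 ** -k, -(10.0 ** -k))) for k in range(1, 6)]
for r in ratios:
    print(r)
```

Each printed ratio should be smaller than the previous one, which is the behavior expected of a remainder that is negligible with respect to |h|.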
The canonical isomorphism $\mathcal{L}(K; F) \cong F : u \mapsto u.1$ ([P2], section 3.3.8) reintroduces the usual notion of derivative, including the notion of a “complex derivative” when $K = \mathbb{C}$ ([P2], section 4.2.1, Theorem-Definition 4.45). Indeed, keeping the same notation as before, we obtain:
Corollary-Definition 1.5
If E = K, the function f : U + a → F is differentiable at the point a ∈ K (in the above sense) if and only if it is differentiable at a (in the usual sense of differentiability of functions of one real or complex variable), in which case the element $\dot f(a) = Df(a).1$ of F is the derivative of f at a.
Theorem 1.6
(Rolle) Let [a, b] be a compact interval of ℝ with non-empty interior and suppose that φ : [a, b] → ℝ is a continuous function in [a, b] that is differentiable in ]a, b[ and which satisfies φ(a) = φ(b). Then, there exists c ∈ ]a, b[ such that $\dot\varphi(c) = 0$.
Proof
If φ is constant in [a, b], then it is differentiable and has zero derivative on ]a, b[. Otherwise, it attains its lower and upper bounds ([P2], section 2.3.7, Theorem 2.42). Since φ(a) = φ(b) and φ is not constant, at least one of these values is attained at some point c ∈ ]a, b[, which must therefore satisfy $\dot\varphi(c) = 0$ (exercise).
If f is differentiable at the point a, then it is also continuous at this point (exercise). Let A be a non-empty open subset of E and write $D_a(A; F)$ for the set of mappings from A into F that are differentiable at the point a ∈ A. The mapping $D_a(A; F) \to \mathcal{L}(E; F) : f \mapsto Df(a)$ is known as the differentiation operator at the point a and is K-linear.
Definition 1.7
Let $f \in D_a(A; F)$. The rank rk(Df(a)) ([P1], section 3.1.10, Definition 3.38(ii)) is said to be the rank of f at the point a and is written as $\mathrm{rk}_a(f)$.
(II) Let E be a real Hilbert space, A some non-empty open subset of E, and f:A→ℝ a function that is differentiable at the point a ∈ A. Then, Df (a) ∈ E∨. By Riesz’s theorem ([P2], section 3.10.2(IV), Theorem 3.15(1)), there exists some uniquely determined element x* ∈ E such that Df (a) coincides with the linear form h ↦ 〈x∗| h〉.
Definition 1.8
The element x* specified above is written as $\mathrm{grad}_a(f)$ or $\nabla_a f$ and is called the gradient of f at the point a. It satisfies $\langle \nabla_a f \mid h \rangle := Df(a).h$ for all h ∈ E.
(III) Some of the classical results of differential calculus ([DIE 93], Volume 1, Chapter 8) are reproduced below. A few are stated in a slightly more general form than the cited reference; however, the reasoning is identical and entirely straightforward in each case.
Theorem 1.9
(chain rule) Let E, F be normed vector spaces, G a locally convex space, A some non-empty open subset of E, a some point of A, B an open subset of F containing f(a), $f \in D_a(A; F)$, and $g \in D_{f(a)}(B; G)$. Then, $g \circ f \in D_a(A; G)$ and
$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$
[1.5]
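Formula [1.5] can be sanity-checked numerically. The sketch below assumes E = F = ℝ² and G = ℝ; the mappings `f` and `g`, their hand-computed differentials, and the helper names are illustrative assumptions. It compares Dg(f(a)).(Df(a).h) with a central finite difference of g ∘ f in the direction h.

```python
import math

def f(x, y):
    # Illustrative mapping R^2 -> R^2.
    return (x * y, x + y * y)

def Df(x, y, h):
    # Differential of f at (x, y) applied to h, computed by hand.
    return (y * h[0] + x * h[1], h[0] + 2 * y * h[1])

def g(u, v):
    # Illustrative mapping R^2 -> R.
    return u * u + math.sin(v)

def Dg(u, v, k):
    # Differential of g at (u, v) applied to k, computed by hand.
    return 2 * u * k[0] + math.cos(v) * k[1]

def chain_rule(a, h):
    """D(g o f)(a).h computed as Dg(f(a)).(Df(a).h)."""
    b = f(*a)
    return Dg(b[0], b[1], Df(a[0], a[1], h))

def finite_difference(a, h, eps=1e-6):
    """Central difference of g o f at a in the direction h."""
    plus = g(*f(a[0] + eps * h[0], a[1] + eps * h[1]))
    minus = g(*f(a[0] - eps * h[0], a[1] - eps * h[1]))
    return (plus - minus) / (2 * eps)

a, h = (1.0, 2.0), (0.3, -0.7)
exact = chain_rule(a, h)
approx = finite_difference(a, h)
print(abs(exact - approx))  # expected to be very small
```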
Example 1.10
Let E be a Banach space. The set $\mathfrak{H} \subset \mathcal{L}(E)$ of continuous linear bijections is open in $\mathcal{L}(E)$ ([P2], section 3.4.1(II), Lemma 3.50). The mapping $u \mapsto u^{-1}$ from $\mathfrak{H}$ onto $\mathfrak{H}$ is differentiable, and its differential at the point $u_0$ is the continuous linear mapping $s \mapsto -u_0^{-1} \circ s \circ u_0^{-1}$ (from $\mathcal{L}(E)$ into $\mathcal{L}(E)$). The reader may wish to prove this result as an exercise or refer to Lemma 1.24 of section 1.2.5(II).
(IV) Let E1, …, En be normed vector spaces. Then, E = E1 × … × En can be canonically equipped with the structure of a normed vector space ([P2], section 3.4.1(I)). Let A be a non-empty open subset of E and suppose that a = (a1,…,an) ∈ A. Suppose further that F is a locally convex space. If f : A → F : (x1, …, xn) ↦ f (x1, …, xn) is differentiable at the point a, then the mapping
$$h_i \mapsto f(a_1, \dots, a_{i-1}, a_i + h_i, a_{i+1}, \dots, a_n)$$
is defined in some open neighborhood of 0 in $E_i$ and differentiable at 0 for all i ∈ {1, …, n} (exercise); its differential at this point is written as $D_i f(a)$ (or $d_i f(a)$ or $\frac{\partial f}{\partial x_i}(a)$).
Definition 1.11
The mapping $D_i f(a)$ is called the i-th partial differential of f at the point a.
Given the conditions stated above, the differential Df (a) can be expressed in terms of the partial differentials Dif (a) as follows:
$$Df(a).h = \sum_{i=1}^{n} D_i f(a).h_i,$$
[1.6]
where $h = (h_1, \dots, h_n)$. If $E_i = K$, the element $\partial_i f(a) = D_i f(a).1 \in F$ is called the i-th partial derivative of f at the point a; writing $(e_i)_{1 \le i \le n}$ for the canonical basis of $K^n$, we have
$$\partial_i f(a) = D_i f(a).1 = Df(a).e_i \in F.$$
[1.7]
If K=ℝ, the existence of the partial differentials Dif (a)(i = 1, …, n) does not imply the existence of the differential Df (a) (however, see Theorem 1.16(ii)).
If $E_j = K$ for all j ∈ {1, …, n} and $F = K^m$, then the linear mapping Df(a) can be represented with respect to the canonical bases by the Jacobian matrix $[\partial_j f_i(a)]_{1 \le i \le m,\ 1 \le j \le n}$, where $f = (f_1, \dots, f_m)$. If n = m, the determinant of this matrix is called the Jacobian of f at the point a and is written as $\frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}(a)$.
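As an illustration of the Jacobian matrix, here is a minimal numerical sketch for K = ℝ, n = 2 and m = 3. It assumes the layout in which rows are indexed by the components of f and columns by the variables; the mapping `f` and the helper `jacobian_fd` are hypothetical names introduced only for this example, and the partial derivatives are approximated by central differences rather than computed exactly.

```python
import math

def f(x):
    # Illustrative mapping R^2 -> R^3.
    x1, x2 = x
    return [x1 * x1 * x2, math.sin(x1) + x2, x1 * x2]

def jacobian_fd(func, a, eps=1e-6):
    """Approximate the Jacobian of func at a; entry [i][j] is d f_i / d x_j."""
    m = len(func(a))
    n = len(a)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(a)
        minus = list(a)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return jac

a = [1.0, 2.0]
jac = jacobian_fd(f, a)
# Partial derivatives computed by hand at a = (1, 2), for comparison.
expected = [[4.0, 1.0], [math.cos(1.0), 1.0], [2.0, 1.0]]
for row in jac:
    print(row)
```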
(V) In the following, K=ℝ. Let F be a locally convex space and |.|γ a continuous semi-norm on F.
Theorem 1.12
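(mean value theorem, 1st version) Let I = [α, β] be a compact interval of ℝ, f a continuous mapping from I into F and $\varphi_\gamma$ a continuous mapping from I into ℝ. Suppose that there exists a countable subset $D \subset I^\circ$ such that f and $\varphi_\gamma$ both have a derivative at every point $\xi \in I^\circ \setminus D$ and $|\dot f(\xi)|_\gamma \le \dot\varphi_\gamma(\xi)$. Then, $|f(\beta) - f(\alpha)|_\gamma \le \varphi_\gamma(\beta) - \varphi_\gamma(\alpha)$.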
Proof
This is a classical result ³ when F is a normed vector space with norm |.|; we simply need to replace |.| by $|.|_\gamma$ and φ by $\varphi_\gamma$.
Let E be a normed vector space with norm |.|. With the notation of [1.3] (with n = 1), we have the following result:
Theorem 1.13
(mean value theorem, 2nd version) Let f be a mapping taking values in F that is defined and continuous in some neighborhood of the closed segment S = [a, a + h] joining the points a, a + h of E ([P2], section 3.3.1). Suppose that f is differentiable at every point of S.
i) Given any subset Θ of [0, 1] with countable complement in [0, 1],
$$|f(a+h) - f(a)|_\gamma \le |h| . \sup_{t \in \Theta} \|Df(a + t.h)\|_\gamma.$$
ii) Let $L \in \mathcal{L}(E; F)$. Given any subset Ξ of [a, a + h] with countable complement in [a, a + h], we have $|f(a+h) - f(a) - L.h|_\gamma \le M_\gamma . |h|$ with $M_\gamma = \sup_{x \in \Xi} \|Df(x) - L\|_\gamma$. In particular, if $M_\gamma = \sup_{x \in \Xi} \|Df(x) - Df(a)\|_\gamma$, then
$$|f(a+h) - f(a) - Df(a).h|_\gamma \le M_\gamma |h|.$$
[1.8]
iii) Let F = K = ℝ. There exists θ ∈ ]0, 1[ such that
$$f(a+h) - f(a) = Df(a + \theta.h).h.$$
Proof
(i) Let g : [0, 1] → F : t ↦ f(a + t.h). By Theorem 1.9, $\dot g(t) = Df(a + t.h).h$, so $|\dot g(t)|_\gamma \le N_\gamma |h|$ for every t ∈ Θ, where $N_\gamma = \sup_{t \in \Theta} \|Df(a + t.h)\|_\gamma$. The upper bound of (i) is therefore obtained by applying Theorem 1.12 with f replaced by g and $\varphi_\gamma(t) = N_\gamma . t . |h|$.
(ii) Can be deduced by applying (i) to the function ξ ↦ f(ξ) − L.(ξ − a), whose differential at ξ is Df(ξ) − L.
(iii) Let $g : [0, 1] \to \mathbb{R} : t \mapsto f(a + t.h) - f(a) - (f(a+h) - f(a)).t$. Since g(0) = g(1) = 0, Rolle’s theorem (Theorem 1.6) implies the existence of a number θ ∈ ]0, 1[ such that $\dot g(\theta) = 0$, i.e. $Df(a + \theta.h).h = f(a+h) - f(a)$.
Corollary 1.14
Let A be a connected open subset of E and suppose that f : A → F is differentiable in A. If the differential Df : x ↦ Df(x) is zero in the complement of a countable subset of A, then f must be constant in A.
Corollary 1.15
Let $(|.|_\gamma)_{\gamma \in \Gamma}$ be a family of semi-norms that induce the topology of F, A some convex open subset of E and f : A → F a differentiable mapping in A. If, for all γ ∈ Γ, there exists a real number $k_\gamma > 0$ such that $\sup_{x \in A} \|Df(x)\|_\gamma \le k_\gamma$, then f is Lipschitz ([P2], section 2.4.3(II)) and hence uniformly continuous.
Proof
Let x′, x″ ∈ A. Since A is convex, the segment [x′, x″] is contained in A, so by Theorem 1.13(i) | f (x′) − f (x″)|γ ≤ kγ | x′ − x″| for all γ ∈ Γ, which means that f is Lipschitz.
1.2.3 Mappings of class Cp
In this section, K=ℝ.
(I) Let E be a normed vector space and suppose that F is a locally convex space. Write $\mathcal{L}_n(E; F)$ for the space of continuous n-linear mappings from $E^n$ into F equipped with the family of semi-norms [1.3].
Let A be a non-empty open subset of E and suppose that f : A → F is a mapping. We say that f is of class $C^0$ if it is continuous in A. We say that it is of class $C^1$ if it is differentiable in A and its differential $Df : A \to \mathcal{L}(E; F)$ is continuous. We say that f is twice differentiable (respectively is of class $C^2$) in A if its differential Df is differentiable (respectively is of class $C^1$) in A. At any given point a ∈ A, the second differential D(Df)(a), written as $D^2 f(a)$, is an element of $\mathcal{L}(E; \mathcal{L}(E; F))$; this space can be identified with $\mathcal{L}_2(E; F)$ by ([P2], section 3.9.3(II)). Similarly, we may recursively define a p-times differentiable mapping (respectively a mapping of class $C^p$) in A for any integer p ≥ 1, as well as the p-th order differential $D^p f(a) \in \mathcal{L}_p(E; F)$ for any point a ∈ A. We say that f is of class $C^\infty$ if f is of class $C^p$ for every p ≥ 1. We write $C^p(A; F)$ for the K-vector space of mappings of class $C^p$ from A into F (0 ≤ p ≤ ∞).
From ([DIE 93], Volume 1, (8.12.1), (8.16.6)), mutatis mutandis, we have the following result:
Theorem 1.16
Let F be a locally convex space.
i) (Schwarz’s theorem) Let E be a normed vector space, A some non-empty open subset of E and f : A → F a p-times differentiable mapping (p ≥ 2) at some point a ∈ A. Then, the differential $D^p f(a)$ is symmetric and therefore belongs to $\mathcal{L}_{p,s}(E; F)$.
ii) Let $E_1, \dots, E_n$ be normed vector spaces (n ≥ 2) and suppose that A is a non-empty open subset of $E_1 \times \dots \times E_n$. For any p such that 1 ≤ p ≤ ∞, f is of class $C^p$ in A if and only if the $n^p$ partial derivatives of order p of f exist and are continuous in A (where $n^\infty := \infty$).
By Theorem 1.16(ii), a mapping f from a non-empty open subset Ω of $\mathbb{R}^n$ into ℂ is of class $C^p$ if and only if its $n^p$ partial derivatives exist and are continuous, i.e. $f \in \mathcal{E}^p(\Omega)$ ([P2], section 4.3.1(I)).
(II) In the context of Theorem 1.16(i), when $E = K^n$, introducing “symbolic powers” gives us an easy way to express $D^p f(a).h^p$ (with the notation of section 1.2.1(III)) in terms of the partial derivatives of order p:
$$\partial_{i_1} \dots \partial_{i_p} f(a) := D_{i_1} \dots D_{i_p} f(a).(e_{i_1}, \dots, e_{i_p}),$$
where $D_i$ denotes the partial differential in the i-th variable. Writing $h = (h_1, \dots, h_n)$, we have $D^2 f(a).h^2 = \sum_{1 \le i,j \le n} \partial_i \partial_j f(a).h_i h_j$ ⁴. This can be expressed as $\big(\sum_{1 \le i \le n} \partial_i f(a).h_i\big)^{[2]}$ by expanding the latter like any other squared parenthesis and replacing $(\partial_i f(a).h_i)(\partial_j f(a).h_j)$ by $\partial_i \partial_j f(a).h_i h_j$. With the same conventions, we can continue inductively to obtain the following result:
$$D^p f(a).h^p = \Big(\sum_{1 \le i \le n} \partial_i f(a).h_i\Big)^{[p]}.$$
If $E_i = K$ (i = 1, …, n), F = K, and f : A → K is twice differentiable at some point a of A, then the matrix $H f(a) = (\partial_i \partial_j f(a))_{1 \le i,j \le n}$ is called the Hessian matrix ⁵ of f at the point a. Identifying the vectors of $K^n$ with the columns of this matrix gives $D^2 f(a).h^2 = h^T . H f(a) . h$.
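The identity between the second differential and the Hessian quadratic form can be illustrated numerically. The sketch below assumes K = ℝ and n = 2; the function `f`, its hand-computed Hessian, and the helper names are illustrative assumptions. The quadratic form hᵀ.H f(a).h is compared with a second-order central difference of t ↦ f(a + t.h) at t = 0.

```python
def f(x, y):
    # Illustrative function R^2 -> R.
    return x ** 3 * y + y ** 2

def hessian(x, y):
    # Matrix of second partial derivatives of f, computed by hand.
    return [[6 * x * y, 3 * x ** 2], [3 * x ** 2, 2.0]]

def quadratic_form(H, h):
    """Return h^T . H . h."""
    return sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))

def second_difference(a, h, t=1e-3):
    """Central second difference of s -> f(a + s.h) at s = 0."""
    plus = f(a[0] + t * h[0], a[1] + t * h[1])
    minus = f(a[0] - t * h[0], a[1] - t * h[1])
    return (plus - 2 * f(*a) + minus) / (t * t)

a, h = (1.0, 2.0), (0.5, -1.0)
exact = quadratic_form(hessian(*a), h)
approx = second_difference(a, h)
print(exact, approx)
```

The Hessian here is symmetric, in line with Schwarz’s theorem.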
(III) Suppose that 1 ≤ p ≤ ∞. The following results can be shown by induction (exercise). The composition of two mappings of class $C^p$ is also of class $C^p$. If E is a normed vector space, A some non-empty open subset of E, F and G locally convex spaces, u a continuous linear mapping from F into G and f : A → F a mapping of class $C^p$ in A, then u ∘ f is of class $C^p$ in A and $D^p(u \circ f) = u \circ D^p f$. With the same hypotheses on E and A, if $F_1$, $F_2$, and G are locally convex spaces, [.,.] is a continuous bilinear mapping from $F_1 \times F_2$ into G, and $f_i$ is a mapping of class $C^p$ from A into $F_i$ for i = 1, 2, then $[f_1, f_2]$ is of class $C^p$ from A into G and the so-called Leibniz rule holds:
$$D[f_1, f_2] = [Df_1, f_2] + [f_1, Df_2].$$
[1.9]
(IV) Nemytskii operator Let F be a Banach space, J a compact interval of ℝ with non-empty interior, and $C^p(J; F)$ the space of mappings of class $C^p$ (p ≥ 1) from J into F. Then, $C^p(J; F)$ is a Banach space when equipped with the norm
$$\|\varphi\|_p = \sup_{0 \le k \le p,\ t \in J} |\varphi^{(k)}(t)|$$
(exercise). Let U be an open neighborhood of 0 in F, f : U × J → F a mapping of class $C^p$ and $N : \mathcal{U} \times J^\circ \to F$ the so-called Nemytskii operator, defined by $N(\varphi, t) = f(\varphi(t), t)$, where $\mathcal{U} := \{\varphi \in C^p(J; F) : \varphi(t) \in U\ \forall t \in J^\circ\}$.
Theorem 1.17
The Nemytskii operator N is of class $C^p$ and
$$DN(\varphi, t).(h, \tau) = D_1 f(\varphi(t), t).\big(h(t) + \dot\varphi(t).\tau\big) + D_2 f(\varphi(t), t).\tau.$$
[1.10]
Proof
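The operator N is the composition
$$\mathcal{U} \times J^\circ \xrightarrow{\ \mathrm{ev} \times 1_{J^\circ}\ } F \times J^\circ \xrightarrow{\ f\ } F,$$
where $\mathrm{ev} : C^p(J; F) \times J^\circ \to F$ is the evaluation operator defined by ev(φ, t) = φ(t), so that the first arrow sends (φ, t) to (φ(t), t). But ev is of class $C^p$ and
$$D\,\mathrm{ev}(\varphi, t).(h, \tau) = h(t) + \dot\varphi(t).\tau$$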
[1.11]
(exercise*: see [ABR 83], Proposition 2.4.17). Since f is of class Cp, so is N by (III), and [1.10] follows from [1.5].
Remark 1.18
The operator N defined above is stated slightly more generally than the classical Nemytskii operator, which does not allow t to vary (to recover the classical case, set the increment τ to 0 in [1.10]).
1.2.4 Taylor’s formulas
Throughout this section, K=ℝ, and F denotes a quasi-complete locally convex space. Readers who are only interested in mappings taking values in a Banach space (see Remark 1.1) may skip (I) and (II) below.
(I) Mackey convergence Let B be a balanced convex subset (also known as an “absolutely convex” subset ([KÖT 79], Volume 1, section 16.1(2))) that is closed and bounded in F, and write $F_1 = \bigcup_{n=1}^{\infty} nB$. The gauge $p_B$ of B in $F_1$ ([P2], section 3.3.2(I)) is a norm on $F_1$ (exercise) and $F_1$ is a Banach space, written as $F_B$, when equipped with this norm.
Definition 1.19
We say that a sequence $(x_n)$ of elements of F is Mackey convergent to 0 (or locally convergent to 0 ([KÖT 79], Volume 1, section 28.3)), if there exists a balanced, convex, closed and bounded subset B containing 0 and all the $x_n$ such that $(x_n)$ converges to 0 in $F_B$.
Mackey convergence of a sequence of elements of F implies the usual notion of convergence and is equivalent to it whenever F is a Fréchet space or the strong dual of an infrabarreled Schwartz space ([HOG 71], p. 8, Example 3), such as the distribution spaces studied in ([P2], section 4.4.1).
(II) Generalized Riemann integral To state Taylor’s formulas in a general framework, it will be useful to introduce the generalized Riemann integral of a function of a real variable taking values in a quasi-complete locally convex space ⁶.
Lemma-definition 1.20
Let F be a quasi-complete locally convex space, I a non-empty open interval of ℝ, c some point of I and f : I → F a continuous mapping.
i) There exists a unique differentiable mapping ∫f : I → F such that (∫f)(c) = 0 and (∫f)′ = f.
ii) Let a, b ∈ I. Write $\int_a^b f(t)\,dt = (\int f)(b) - (\int f)(a)$; this quantity is called the (generalized) Riemann integral of f from a to b.
iii) If f : [a, b] → F is Lipschitz, consider the points $a = t_0 < t_1 < \dots < t_n = b$ of the interval [a, b], and the Riemann sum $S_n = \sum_{i=0}^{n-1} (t_{i+1} - t_i) f(t_i)$. If n → ∞ and $|t_{i+1} - t_i| \to 0$ for all i ∈ {0, …, n − 1}, then $S_n \to \int_a^b f(t)\,dt$ in the sense of Mackey convergence.
Proof
See [KRI 97], Chapter 1, Lemma 2.5 and Proposition 2.7.
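The Riemann sums of Lemma-Definition 1.20(iii) can be illustrated in the simplest vector-valued setting. The sketch below assumes F = ℝ² with the maximum norm and uses a uniform subdivision of [0, 1]; the mapping `f` and the helper `riemann_sum` are illustrative assumptions. In a finite-dimensional space, Mackey convergence coincides with ordinary convergence, so only the ordinary error is monitored.

```python
import math

def f(t):
    # Illustrative Lipschitz mapping [0, 1] -> R^2.
    return (math.cos(t), t * t)

def riemann_sum(func, a, b, n):
    """Left Riemann sum of func over [a, b] with n equal subintervals."""
    step = (b - a) / n
    total = [0.0, 0.0]
    for i in range(n):
        value = func(a + i * step)
        total[0] += step * value[0]
        total[1] += step * value[1]
    return total

# Integral of f over [0, 1], computed by hand component by component.
exact = (math.sin(1.0), 1.0 / 3.0)
errors = []
for n in (10, 100, 1000):
    s = riemann_sum(f, 0.0, 1.0, n)
    errors.append(max(abs(s[0] - exact[0]), abs(s[1] - exact[1])))
print(errors)  # the error should decrease as the subdivision is refined
```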
If F is a Banach space, this recovers the classical notion of the integral of a continuous function. The integration operator $\int_a^b$ defined above satisfies the usual properties ([KRI 97], Chapter 1, Corollary 2.6). In particular, we have the following result:
Theorem 1.21
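(integral mean value theorem) Let I = [α, β] be a compact interval of ℝ and let f : I → F be a continuous mapping. For every continuous semi-norm $|.|_\gamma$ of F,
$$\Big| \int_\alpha^\beta f(t).dt \Big|_\gamma \le \int_\alpha^\beta |f(t)|_\gamma .dt \le (\beta - \alpha) . \sup_{t \in I} |f(t)|_\gamma.$$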
Proof
Let $g(t) = \int_\alpha^t f(s).ds$ and $\varphi(t) = \int_\alpha^t |f(s)|_\gamma .ds$. By the definition of the integral, g and φ are differentiable in I and $|\dot g(t)|_\gamma = \dot\varphi(t)$, where $\dot\varphi(t) = |f(t)|_\gamma$. Applying the mean value theorem (Theorem 1.12) to g therefore gives the first inequality; the second follows by bounding $|f(t)|_\gamma$ by its supremum over I.
(III) Taylor’s formulas Let E be a normed vector space with norm |.|, A some non-empty open subset of E and F a quasi-complete locally convex space whose topology is defined by a family of semi-norms $(|.|_\gamma)_{\gamma \in \Gamma}$. With the conventions of section 1.2.1 and p ≥ 1, we have the following result:
Theorem 1.22
i) Let f : A → F be a mapping of class $C^p$ in A. If the segment [a, a + h] is contained in A, then f satisfies Taylor’s formula
$$f(a+h) = \sum_{k=0}^{p-1} \frac{1}{k!} D^k f(a).h^k + r_p(h),$$
[1.12]
where the residual $r_p(h)$ is the (Laplace residual) integral
$$r_p(h) = \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!} D^p f(a + t.h).h^p .dt.$$
ii) As |h| → 0, we can alternatively express the residual in the form of Young’s residual:
$$r_p(h) = \frac{1}{p!} D^p f(a).h^p + o(|h|^p).$$
iii) Let γ ∈ Γ and suppose that there exists some real number M > 0 such that $\sup_{x \in ]a, a+h[} \|D^p f(x)\|_\gamma \le M$. Then, $|r_p(h)|_\gamma$ is upper bounded by the Lagrange residual:
$$|r_p(h)|_\gamma \le \frac{M}{p!} |h|^p.$$
iv) If K = F = ℝ, suppose that f is of class $C^{p-1}$ in A and admits a differential of order p, $D^p f(x)$, at every point of the open segment ]a, a + h[. The Lagrange residual can be expressed as follows: there exists θ ∈ ]0, 1[ such that
$$r_p(h) = \frac{1}{p!} D^p f(a + \theta.h).h^p.$$
Proof
i) For p = 1,
$$f(a + 1.h) - f(a) = \int_0^1 \frac{d}{dt} f(a + t.h)\,dt = \int_0^1 Df(a + t.h).h.dt.$$
For p = 2, we can simply evaluate the above integral by parts. This integral is of the form $\int_0^1 u.dv$ with $u = Df(a + t.h).h$ and $v = -(1 - t)$; hence, it is equal to $[u.v]_0^1 - \int_0^1 v.du = Df(a).h + \int_0^1 (1-t) D^2 f(a + t.h).h^2 .dt$. Continuing inductively gives (i).
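ii) Since $D^p f$ is continuous, for all ε > 0, there exists a real number $r_\gamma > 0$ such that $\|D^p f(a + t.h) - D^p f(a)\|_\gamma \le p!\,\varepsilon$ whenever $|h| \le r_\gamma$ and 0 ≤ t ≤ 1. Since $\int_0^1 \frac{(1-t)^{p-1}}{(p-1)!}\,dt = \frac{1}{p!}$, the first inequality of the integral mean value theorem (Theorem 1.21) then implies:
$$\Big| r_p(h) - \frac{1}{p!} D^p f(a).h^p \Big|_\gamma \le \int_0^1 \frac{(1-t)^{p-1}}{(p-1)!}\, p!\,\varepsilon\, |h|^p .dt = \varepsilon |h|^p.$$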
iv) First, consider the case of a function g that is defined and of class $C^{p-1}$ in some open neighborhood of a compact interval [a, b] of ℝ and assumed to have a p-th derivative $g^{(p)}(t)$ in ]a, b[. Set
$$\varphi_{p-1}(t) = g(b) - \sum_{k=0}^{p-1} \frac{(b-t)^k}{k!} g^{(k)}(t)$$
and $\psi_p(t) = \varphi_{p-1}(t) - \lambda . \frac{(b-t)^p}{p!}$, where λ is chosen so that $\psi_p(a) = 0$. Since $\psi_p(b) = 0$, Rolle’s theorem (Theorem 1.6) implies that there exists a point c ∈ ]a, b[ such that $\dot\psi_p(c) = 0$. Since $\dot\varphi_{p-1}(t) = -\frac{(b-t)^{p-1}}{(p-1)!} g^{(p)}(t)$, this gives $\lambda = g^{(p)}(c)$, and hence
$$\varphi_{p-1}(a) - \frac{(b-a)^p}{p!} g^{(p)}(c) = 0.$$
We now simply need to apply this result to the mapping g(t) = f(a + t.h), t ∈ [0, 1].
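The different forms of the residual in Theorem 1.22 can be compared numerically in the scalar case. The sketch below assumes E = F = ℝ, f = exp, a = 0, h = 0.5 and p = 4; these choices and the helper names are illustrative. The residual computed directly from [1.12] is compared with a midpoint-rule approximation of the Laplace integral and with the Lagrange bound, taking M = exp(h) as an upper bound of the p-th derivative on ]0, h[.

```python
import math

def taylor_polynomial(h, p):
    """Sum of h^k / k! for k < p: the Taylor polynomial of exp at a = 0."""
    return sum(h ** k / math.factorial(k) for k in range(p))

def laplace_residual(h, p, nodes=2000):
    """Midpoint-rule approximation of the integral residual r_p(h) for exp at a = 0."""
    step = 1.0 / nodes
    total = 0.0
    for i in range(nodes):
        t = (i + 0.5) * step
        # Integrand: (1 - t)^(p-1) / (p-1)! * D^p f(a + t.h).h^p with f = exp.
        total += (1 - t) ** (p - 1) / math.factorial(p - 1) * math.exp(t * h) * h ** p
    return total * step

h, p = 0.5, 4
residual = math.exp(h) - taylor_polynomial(h, p)   # r_p(h) from Taylor's formula
integral = laplace_residual(h, p)                  # Laplace form, approximated
lagrange_bound = math.exp(h) / math.factorial(p) * abs(h) ** p
print(residual, integral, lagrange_bound)
```

The first two printed values should agree up to the quadrature error, and both should lie below the third.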
Corollary 1.23
Let E be a real normed vector space, A some non-empty open subset of E, f : A → ℝ a mapping of class $C^p$ (2 ≤ p ≤ ∞) and a some point of A.
i) For f to have a relative minimum (or local minimum) at the point a, the smallest integer n ≤ p such that $D^n f(a)$ is non-zero (if any such integer n exists) must necessarily be even; furthermore, $D^n f(a).h^n \ge 0$ for every non-zero vector h ∈ E.
ii) Conversely, if A is convex, n ≤ p is even, $D^i f(a) = 0$ for all i ∈ {1, …, n − 1}, and $D^n f(x).h^n > 0$ for all x ∈ A and every h ≠ 0, then f has a strict relative minimum at the point a.
iii) If $D^i f(a) = 0$ for all i ∈ {1, …, n − 1}, n ≤ p is even, and there exists ε > 0 such that $D^n f(x).h^n > \varepsilon |h|^n$ for every x ∈ A and every h ≠ 0, then f has a strict relative minimum at the point a (if E is finite-dimensional, this sufficient condition is still valid with ε = 0).
Proof
i) follows from Taylor’s formula with Young’s residual; ii) and iii) follow from Taylor’s formula with the Lagrange residual. If E is finite-dimensional, the unit sphere $S_1 := \{h \in E : |h| = 1\}$ is compact by Riesz’s theorem ([P2], section 3.2.3, Theorem 3.11); thus, $(S_1)^n$ is compact by Tychonov’s theorem ([P2], section 2.3.7, Theorem 2.43) and $D^n f(x).h^n$ has a minimum ε > 0 in $(S_1)^n$ (ibid., Theorem 2.39).
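(IV) If $E = E_1 \times \dots \times E_n$, where each $E_i$ is a normed vector space, let α be the multi-index $(\alpha_1, \dots, \alpha_n)$, $\alpha_i \in \mathbb{N}$ (i ∈ {1, …, n}). Define
$$D^\alpha := D_1^{\alpha_1} \dots D_n^{\alpha_n}, \qquad h^\alpha := h_1^{\alpha_1} \dots h_n^{\alpha_n} \quad (h = (h_1, \dots, h_n));$$
the operator $D^\alpha$ is called the partial differential of order α. If each $E_i$ is equal to ℝ, write $\partial^\alpha$ for $D^\alpha$ ([P2], section 4.3.1(I)). Let U be an open neighborhood of a in E and suppose that the function f : U → F is of class $C^p$ in U. Then, Taylor’s formula can be rewritten as follows:
$$f(a+h) = \sum_{|\alpha| \le p-1} \frac{1}{\alpha!} D^\alpha f(a).h^\alpha + r_p(h).$$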
[1.13]
1.2.5 Analytic functions
(I) Power series Let E be a normed vector space with norm |.| and suppose that F is a Hausdorff quasi-complete locally convex space (see Remark 1.1 for the case where F is normable). With the notation of section 1.2.1, let $\mathfrak{S}(E; F)$ be the K-vector space of formal power series $S = \sum_p S_p$, where $S_p = c_p . X^p$ and $c_p \in \mathcal{L}_{p,s}(E; F)$. Let $(|.|_\gamma)_{\gamma \in \Gamma}$ be a family of semi-norms which induces the topology of F and let r > 0; we write $\|S\|_{\gamma, r} = \sum_p r^p \|c_p\|_\gamma$ and
$$\mathcal{S}_r(E; F) = \{S \in \mathfrak{S}(E; F) : \|S\|_{\gamma, r} < \infty\ \ \forall \gamma \in \Gamma\}; \qquad \mathcal{S}(E; F) = \bigcup_{r > 0} \mathcal{S}_r(E; F).$$
The set $\mathcal{S}(E; F)$ is a K-vector space called the space of convergent power series. If $S \in \mathcal{S}(E; F)$, we say that $\rho(S) := \sup\{r > 0 : S \in \mathcal{S}_r(E; F)\}$ is the radius of convergence of S. Suppose that ρ(S) > 0; if we replace the indeterminate X with an element h ∈ E such that |h| < ρ(S), then the family $(c_p . h^p)$ is summable ([P2], section 3.2.1(III)), as can be seen by adapting the proof of ([P2], section 3.4.1(I), Theorem 3.41), and the mapping h ↦ S(h) is continuous in the open set |h| < ρ(S). If F is a Banach space and ρ(S) > 0, then the power series S is absolutely convergent in |h| < ρ(S) and normally convergent in |h| ≤ r′ for every r′ such that 0 < r′ < ρ(S) ([P2], section 4.3.2(I)).
(II) Analytic functions Let A be a non-empty open subset of E. We say that a function f from A into F is analytic (or is a mapping of class $C^\omega$) if, for each point a ∈ A, there exists a convergent series $S \in S\{E;F\}$, denoted by $f_a$, such that $f(a+h) = f_a(h)$ for every h ∈ E with sufficiently small norm. This definition generalizes ([P2], section 4.3.2(I), Definition 4.74). Write $C^\omega(A;F)$ for the K-vector space of analytic functions from A into F. If K = ℝ and $f \in C^\omega(A;F)$, then f is of class $C^\infty$ in A, and so is each of its differentials $D^p f$ (p ≥ 1). Every mapping $f \in C^\omega(A;F)$ admits the following Taylor series expansion at the point a, which converges in $|h| < \rho(f_a)$:
\[ f(a+h) = \sum_{p=0}^{\infty} \frac{1}{p!} D^p f(a).h^p. \tag{1.14} \]
If | a | < ρ(fa), then the radius of convergence of the Taylor expansion of f at the point a is greater than or equal to ρ (fa) − | a |. If A = E and ρ (fa) = +∞, then the function f is said to be entire.
Let E, F be Banach spaces, G a quasi-complete locally convex space, A an open subset of E, f : A → F an analytic function, B an open subset of F containing f (A) and g : B → G an analytic function. Then, g ∘ f is analytic (exercise); see ([BOU 82a], 3.2.7), ([WHI 65], p. 1079).
The principle of analytic continuation ([P2], section 4.3.2, Theorem 4.76) can be generalized as follows (exercise: see [WHI 65], p. 1080): let E and F be two Banach spaces, Ω a connected open subset of E and f, g two analytic functions from Ω into F. If f and g coincide in any non-empty open subset of Ω, then they must be equal.
Lemma 1.24
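Let E be a Banach space and write $\mathfrak{H}$ for the subset of invertible operators in $\mathcal{L}(E)$. Let $\mathcal{I} : \mathfrak{H} \to \mathfrak{H} : u \mapsto u^{-1}$. The mapping $\mathcal{I}$ is analytic and satisfies $D\mathcal{I}(u_0).h = -u_0^{-1}.h.u_0^{-1}$ for every $u_0 \in \mathfrak{H}$.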
Proof
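We know that $\mathfrak{H}$ is open in $\mathcal{L}(E)$ ([P2], section 3.4.1(II), Corollary 3.49). Let $u_0 \in \mathfrak{H}$ and $s \in \mathcal{L}(E)$. We have $u_0 + s = u_0(1_E - v)$, where $v = -u_0^{-1}.s$. If ‖v‖ < 1, then $1_E - v$ is invertible with inverse $\sum_{n \ge 0} v^n$ (ibid.). Hence, if $\|s\| < 1/\|u_0^{-1}\|$, then $u_0 + s$ has inverse $\sum_{n \ge 0} (-u_0^{-1}.s)^n u_0^{-1}$, which shows that $\mathcal{I}$ is analytic. Furthermore, $\sum_{n \ge 0} (-u_0^{-1}.s)^n u_0^{-1} = u_0^{-1} - u_0^{-1}.s.u_0^{-1} + o(\|s\|)$, which gives the stated expression for $D\mathcal{I}(u_0)$.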
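The Neumann series used in this proof can be checked numerically. The following is only a sketch under stated assumptions: 2×2 real matrices stand in for the operators of ℒ(E), and the matrices `u0`, `s` as well as all helper names are hypothetical choices made for the illustration. It compares a partial sum of the series with a directly computed inverse, then compares the first-order approximation u0⁻¹ − u0⁻¹.s.u0⁻¹ with the same inverse.

```python
# Illustration only: 2x2 real matrices play the role of operators in L(E).

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(a, b):
    """Entrywise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mscale(c, a):
    """Scalar multiple of a 2x2 matrix."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def inverse2(a):
    """Closed-form inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def neumann_inverse(u0, s, terms=60):
    """Partial sum of sum_{n>=0} (-u0^{-1}.s)^n . u0^{-1}."""
    u0_inv = inverse2(u0)
    v = mscale(-1.0, matmul(u0_inv, s))      # v = -u0^{-1}.s, as in the proof
    power = [[1.0, 0.0], [0.0, 1.0]]         # v^0
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(terms):
        total = madd(total, power)
        power = matmul(power, v)
    return matmul(total, u0_inv)

def max_abs_diff(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

# Hypothetical data: u0 is invertible and s is small enough for convergence.
u0 = [[2.0, 1.0], [0.0, 3.0]]
s = [[0.1, -0.2], [0.05, 0.1]]

exact = inverse2(madd(u0, s))
series_error = max_abs_diff(neumann_inverse(u0, s), exact)

# First-order approximation u0^{-1} - u0^{-1}.s.u0^{-1} of (u0 + s)^{-1}.
u0_inv = inverse2(u0)
linear = madd(u0_inv, mscale(-1.0, matmul(matmul(u0_inv, s), u0_inv)))
first_order_error = max_abs_diff(linear, exact)
```

Here `series_error` is at the level of rounding error, while `first_order_error` is of second order in the size of `s`, as the o(‖s‖) term in the proof suggests.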
(III) Holomorphic functions Let E be a normed complex vector space with norm |.|, A some non-empty open subset of E, and F a complex quasi-complete locally convex space. Goursat’s theorem ([P2], section 4.2.4, Proposition 4.56) can be generalized as follows ([BOU 82a], 3.1.1): the function f : A → F is analytic if and only if it is holomorphic (i.e. complex-differentiable). If this condition is satisfied for $E = E_1 \times \dots \times E_n$, let $a = (a_1, \dots, a_n) \in A$, $r = (r_1, \dots, r_n)$, where $r_i > 0$, and $c_\alpha = \frac{1}{\alpha!} D^\alpha f(a)$, where α is the multi-index $(\alpha_1, \dots, \alpha_n)$. The Cauchy inequalities ([P2], section 4.3.2(II), Lemma-Definition 4.78(2)) can be generalized as follows (exercise): for $r^\alpha := r_1^{\alpha_1} \cdots r_n^{\alpha_n}$,
\[ \|c_\alpha\|_\gamma \le \frac{M_\gamma}{r^\alpha} \quad \text{if} \quad M_\gamma := \sup_{|\xi_i - a_i| \le r_i,\ i = 1, \dots, n} |f(\xi)|_\gamma < \infty. \]
Hence ([P2], section 4.3.2(II), Theorem-Definition 4.81(3)), if f is entire in E and bounded in F, then it must be constant (Liouville’s theorem). The statement of Hartogs’ theorem ([P2], section 4.3.2(II), Corollary 4.80) also holds, mutatis mutandis, for a function f : A → F, where A is an open subset of E1 × … × En, and each Ei is a complex normed vector space: any such function is analytic if and only if it is analytic in each of its variables when the others are held fixed.
Theorem 1.25
(maximum modulus) Let E (respectively F) be a complex Banach space (respectively quasi-complete Hausdorff locally convex space), A some connected non-empty open subset of E and f : E → F a holomorphic function. Let $|.|_\gamma$ be a continuous semi-norm on F. If the function $|f|_\gamma : x \mapsto |f(x)|_\gamma$ is not constant, then it does not have a maximum in A.
Proof
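1) Let us begin by showing the result by contradiction when E = F = ℂ. Suppose that |f| is not constant and has a maximum in A. By translation, we may assume that 0 ∈ A and that this maximum is attained at 0. Let $c_0 = f(0)$; then $c_0 \neq 0$, since otherwise f would vanish identically. As f is not constant, there exist an integer m ≥ 1 and $b_m \neq 0$ such that $f(z) = c_0(1 + b_m z^m + z^m.h(z))$, where h is holomorphic in A and satisfies h(0) = 0. Choose r > 0 such that |z| ≤ r implies z ∈ A and $|h(z)| \le \frac{1}{2}|b_m|$. Let t ∈ ℝ be such that $e^{imt} = |b_m|/b_m$, so that $b_m z^m = |b_m| r^m$ for $z = re^{it}$. For this z, we have
\[ |1 + b_m z^m + z^m h(z)| \ge 1 + \tfrac{1}{2}|b_m z^m|, \]
so $|f(z)| > |c_0| = |f(0)|$, which is a contradiction.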
2) In the case where E = ℂ, we can similarly argue by contradiction by assuming that there exist $z_0, z_1 \in A$ such that $|f(z)|_\gamma \le |f(z_0)|_\gamma$ for all z ∈ A and $|f(z_1)|_\gamma < |f(z_0)|_\gamma$. Let $V = \{\lambda.f(z_0) : \lambda \in ℂ\}$ and $\eta : V \to ℂ : \lambda.f(z_0) \mapsto \lambda.|f(z_0)|_\gamma$. Then, $|\eta|_\gamma = 1$, where $|\eta|_\gamma := \sup\{|\eta(y)| : y \in V,\ |y|_\gamma \le 1\}$ (and likewise for a linear form on F, with y ranging over F). By the Hahn–Banach theorem ([P2], section 3.3.4(II), Theorem 3.25), there exists a continuous linear form $\xi \in F^\vee$ extending η such that $|\xi|_\gamma = 1$. Therefore, for all z ∈ A, $|\xi \circ f(z)| \le |f(z)|_\gamma \le |f(z_0)|_\gamma = |\xi \circ f(z_0)|$, so ξ ∘ f is constant by (1). Hence, $|\xi \circ f(z_1)| = |\xi \circ f(z_0)| = |f(z_0)|_\gamma$ and $|\xi \circ f(z_1)| \le |f(z_1)|_\gamma < |f(z_0)|_\gamma$, a contradiction.
3) In the general case, let $g(\xi) = f(z_0 + \xi(z - z_0))$ and suppose that $|f(z)|_\gamma \le |f(z_0)|_\gamma$ for all z ∈ A. Then, g is holomorphic in $\Omega = \{\xi \in ℂ : |\xi| < 1 + r\}$ for sufficiently small r > 0 and z sufficiently close to $z_0$. Therefore, $|g(\xi)|_\gamma \le |f(z_0)|_\gamma = |g(0)|_\gamma$, and g is constant in Ω by (2). Thus, g(0) = g(1), so $f(z) = f(z_0)$. The set of z ∈ A satisfying this condition is non-empty, open and closed in A, and so must be equal to A ([P2], section 2.3.8).
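A familiar consequence of the maximum modulus theorem is that a non-constant holomorphic function which is continuous on a closed disc attains the maximum of its modulus on the boundary circle. The sketch below illustrates this in the scalar case E = F = ℂ. The function z ↦ z² + 1, the grid resolution and all names are assumptions made for the illustration; a finite grid search is of course not a proof.

```python
import cmath
import math

def f(z):
    """Hypothetical test function, holomorphic on the whole plane."""
    return z * z + 1.0

def sample_disc(radial_steps=50, angular_steps=360):
    """Return (|f(z)|, |z|) at the polar grid point of the closed unit disc
    where |f| is largest."""
    best = (-1.0, 0.0)
    for i in range(radial_steps + 1):
        r = i / radial_steps
        for j in range(angular_steps):
            z = cmath.rect(r, 2.0 * math.pi * j / angular_steps)
            best = max(best, (abs(f(z)), r))
    return best

max_modulus, radius_of_max = sample_disc()
```

For this choice of f we have |f(z)| ≤ 1 + |z|², so every interior grid point gives a value strictly below 2, whereas the boundary point z = 1 gives exactly 2; the search therefore returns a boundary radius.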
1.2.6 The implicit function theorem and its consequences
(I) Banach–Caccioppoli fixed point theorem
Definition 1.26
Let (X, d) be a metric space and f : X → X a mapping.
i)We say that a point ξ ∈ X is a fixed point of f if f (ξ) = ξ.
ii)We say that f is a contraction if there exists some constant k, 0 ≤ k < 1, such that, for all x, x′ ∈ X, d (f (x), f (x′)) ≤ k.d (x, x′).
Theorem 1.27
(Banach–Caccioppoli fixed point theorem) Every contraction in a complete metric space has a unique fixed point.
Proof
a)Uniqueness: If f (ξ) = ξ and f (ξ′) = ξ′, then d (f (ξ), f (ξ′)) = d (ξ, ξ′) and d (f (ξ), f (ξ′)) ≤ k.d (ξ, ξ′), so d (ξ, ξ′) ≤ k.d (ξ, ξ′), and thus (1 − k).d (ξ, ξ′) ≤ 0. Since 1 – k > 0, this implies that d (ξ, ξ′) = 0 and ξ = ξ′.
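b) Existence: We will use the method of successive approximation: let $(x_n)$ be the sequence of elements of X defined from some arbitrary starting point $x_0 \in X$ by the recurrence relation $x_{n+1} = f(x_n)$. For all n ≥ 0 and m ≥ 0, we have $d(x_{n+1}, x_n) \le k^n d(x_1, x_0)$ by induction, and therefore, by the triangle inequality,
\[ d(x_{n+m}, x_n) \le \sum_{j=n}^{n+m-1} k^j \, d(x_1, x_0) \le \frac{k^n}{1-k} \, d(x_1, x_0). \]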
Hence, (xn) is a Cauchy sequence. Since X is complete, (xn) must have some limit ξ in X; but f is continuous, so ξ = f (ξ).
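The method of successive approximation translates directly into a short program. The following sketch assumes the concrete contraction f = cos on the complete metric space [0, 1] (cos maps [0, 1] into itself and |cos′| ≤ sin 1 < 1 there); the function names, tolerance and starting point are hypothetical choices. It also checks the standard a priori bound d(xₙ, ξ) ≤ kⁿ/(1 − k)·d(x₁, x₀) after 20 steps.

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) until two successive iterates differ by at most tol.

    Returns the last iterate and the number of iterations performed."""
    x = x0
    for n in range(1, max_iter + 1):
        x_next = f(x)
        if abs(x_next - x) <= tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

def iterate(f, x0, n):
    """Return the n-th iterate x_n of the recurrence x_{j+1} = f(x_j)."""
    x = x0
    for _ in range(n):
        x = f(x)
    return x

k = math.sin(1.0)                      # contraction constant of cos on [0, 1]
xi, steps = fixed_point(math.cos, 1.0)

x1 = math.cos(1.0)
a_priori_bound_20 = k**20 / (1.0 - k) * abs(x1 - 1.0)
error_after_20 = abs(iterate(math.cos, 1.0, 20) - xi)
```

The value `xi` approximates the unique solution of cos ξ = ξ in [0, 1], and `error_after_20` stays below `a_priori_bound_20`, as the geometric estimate predicts.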
(II) Inverse mapping and implicit function theorems Below, we assume that 0 < p ≤ ω.
Definition 1.28
A diffeomorphism of class $C^p$ (or a $C^p$-diffeomorphism) is a bijection of class $C^p$ whose inverse bijection is also of class $C^p$.
This definition can be localized in the obvious way as we did for homeomorphisms ([P2], section 2.3.4(III)). Every diffeomorphism (respectively local diffeomorphism) is clearly a homeomorphism (respectively local homeomorphism). A local diffeomorphism of class Cp is also known as an étale mapping of class Cp (see [P2], section 5.3.2(II)).
Theorem 1.29
(inverse mapping theorem) Let E and F be Banach spaces and suppose that f is a mapping of class $C^p$ taking values in F and defined in a neighborhood of some point a ∈ E. Let b = f(a) and suppose that Df(a) is bijective. Then, f is a local diffeomorphism of class $C^p$ from some neighborhood U of a into some neighborhood V of b; the inverse diffeomorphism g : V → U (of class $C^p$) satisfies $Dg(b) = Df(a)^{-1}$.
Proof
1) Preliminary: Since $Df(a) \in \mathcal{L}(E;F)$ is bijective, it is a linear homeomorphism by the Banach inverse operator theorem ([P2], section 3.2.3, Theorem 3.12(2)(i)). Hence, E and F are isomorphic and can be identified. We can also assume that $Df(a) = 1_E$ (left-multiplying f by $Df(a)^{-1}$ if necessary) and reduce to the case where a = 0 by translation. Let ϕ(x) = x − f(x). We have Dϕ(0) = 0, and, since Dϕ is continuous, there exists r > 0 such that $|x| \le r \Rightarrow \|D\phi(x)\| \le \frac{1}{2}$. The mean value theorem (Theorem 1.13) then implies that $|\phi(x)| \le \frac{1}{2}|x|$ whenever |x| ≤ r, i.e. $\phi(B_r^c(0)) \subset B_{r/2}^c(0)$, where $B_r^c(0) := \{x \in E : |x| \le r\}$.
2) Existence of an inverse mapping $g : B_{r/2}^c(0) \to B_r^c(0)$: Let $y \in B_{r/2}^c(0)$. We will show that there exists a unique element $x \in B_r^c(0)$ such that f(x) = y. Let $\psi_y(x) = y + x - f(x)$. If $|y| \le \frac{r}{2}$ and |x| ≤ r, then $|\psi_y(x)| \le r$, so $\psi_y$ is a mapping from $B_r^c(0)$ into $B_r^c(0)$. For all $x_1, x_2 \in B_r^c(0)$,
\[ |\psi_y(x_1) - \psi_y(x_2)| = |\phi(x_1) - \phi(x_2)| \le \tfrac{1}{2}|x_1 - x_2|. \]
Since $B_r^c(0)$ is a complete metric space ([P2], section 2.4.4(II), Lemma 2.77), Theorem 1.27 implies that $\psi_y$ has a unique fixed point in this set. This fixed point x satisfies y + x − f(x) = x, so f(x) = y, and x = g(y).
4) Differentiability of g: Let $y_i = f(x_i)$, $y_i \in B_{r/2}^c(0)$, $x_i \in B_r^c(0)$ (i = 1, 2). Then:
\[ g(y_1) - g(y_2) - Df(x_2)^{-1}.(y_1 - y_2) = x_1 - x_2 - Df(x_2)^{-1}.(f(x_1) - f(x_2)). \]
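By taking sufficiently small r > 0, we can guarantee that $\|Df(x_2)^{-1}\| \le 2$. Moreover, since $x_i = y_i + \phi(x_i)$ and ϕ is Lipschitz with constant 1/2 on $B_r^c(0)$, we have $|x_1 - x_2| \le 2|y_1 - y_2|$. Therefore,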
\[ g(y_1) - g(y_2) - Df(x_2)^{-1}.(y_1 - y_2) = o_1(x_1 - x_2) = o_2(y_1 - y_2), \]
which shows that g is differentiable and $Dg(y) = Df(x)^{-1}$ in $B_{r/2}(0)$.
5) Class of g: If K = ℂ, the generalized Goursat theorem (section 1.2.5(III)) implies that p = ω. Consider the case where K = ℝ. Since Df and g are continuous and $\mathcal{I}$ is analytic (Lemma 1.24), $Dg = \mathcal{I} \circ Df \circ g$ is continuous, and g is therefore of class $C^1$. By induction, it follows that if f is of class $C^p$ (1 ≤ p ≤ ∞), then g is of class $C^p$. Suppose that f is analytic and therefore can be expressed as an absolutely convergent series $f(x) = \sum_p c_p.x^p$ (|x| < ρ). By (1), $c_0 = 0$, $c_1 = 1_E$, so $y = x + c_2.x^2 + c_3.x^3 + \dots$ According to a classical procedure dating back to Newton, x can be expressed in terms of y as a formal series $x = y + \sum_{i \ge 2} d_i.y^i$. The terms $d_i$ (i ≥ 2) are determined recursively: $y^2 = x^2 + 2c_2.x^3 + \dots$, $y^3 = x^3 + \dots$, which gives $y = x + c_2.(y^2 - 2c_2.x^3) + \dots$, and $x = y - c_2.y^2 + (2c_2^2 - c_3).y^3 + \dots$, where $c_2^2.y^3 := c_2(y, c_2(y, y))$. We now need to study the convergence of this series. There exists K > 0 such that $\|c_i\| \le \gamma_i$, where $\gamma_i = K/\rho^i$ (i ≥ 2). Thus, the majorant series
\[ \eta = \xi - \sum_{i \ge 2} \gamma_i.\xi^i = \xi - K \frac{\xi^2}{\rho^2} \sum_{i \ge 0} \frac{\xi^i}{\rho^i} \tag{1.15} \]
converges for |ξ| < ρ, with sum $\varphi(\xi) = \xi - \frac{K}{\rho} \frac{\xi^2}{\rho - \xi}$. As before, we can construct an inverse formal series $\xi = \eta + \sum_{i \ge 2} \delta_i.\eta^i$. Cauchy showed that this series has a non-zero radius of convergence as follows. By an elementary calculation, the relation η = φ(ξ) is invertible and we may write ξ = ψ(η), for $\xi \in \left]-\infty, \xi_1\right[$, where $\xi_1 = \rho\left(1 - \sqrt{\frac{K}{K+\rho}}\right) > 0$, and $\eta \in \left]-\infty, \eta_1\right[$, where $\eta_1 = 2K + \rho - 2\sqrt{K(K+\rho)} > 0$. It is easy to check that ψ can be expanded into a power series in a neighborhood of 0 ([KNO 51], section 107). By section 1.2.5(I), the formal series $y + \sum_{i \ge 2} d_i.y^i$ therefore has a non-zero radius of convergence.
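Two steps of this proof lend themselves to a small numerical experiment in the scalar case E = F = ℝ: the construction of g as the fixed point of ψ_y(x) = y + x − f(x), and the first coefficients −c₂ and 2c₂² − c₃ of the inverse series. The sketch below is an illustration under assumptions, not part of the proof: the test map f(x) = x + x²/4 (so c₂ = 1/4, c₃ = 0, with exact local inverse 2(√(1 + y) − 1)), the rational values chosen for the symbolic check, and every function name are hypothetical.

```python
import math
from fractions import Fraction

def f(x):
    """Toy map with f(0) = 0 and Df(0) = 1 (c2 = 1/4, c3 = 0)."""
    return x + x * x / 4.0

def inverse_by_contraction(y, iterations=100):
    """Approximate g(y) by iterating psi_y(x) = y + x - f(x) from x = 0."""
    x = 0.0
    for _ in range(iterations):
        x = y + x - f(x)
    return x

def reversion_coefficients(c2, c3):
    """Coefficients d2, d3 of x = y + d2*y**2 + d3*y**3 + ... in the scalar case."""
    return -c2, 2 * c2 * c2 - c3

ORDER = 4  # all series below are truncated modulo x**4

def multiply(p, q):
    """Product of two coefficient lists, truncated modulo x**ORDER."""
    out = [Fraction(0)] * ORDER
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < ORDER:
                out[i + j] += a * b
    return out

def compose(outer, inner):
    """outer(inner(x)) modulo x**ORDER; inner must have zero constant term."""
    result = [Fraction(0)] * ORDER
    power = [Fraction(1)] + [Fraction(0)] * (ORDER - 1)   # inner**0
    for coeff in outer:
        result = [r + coeff * t for r, t in zip(result, power)]
        power = multiply(power, inner)
    return result

# Exact check of the coefficients for arbitrary (hypothetical) rational c2, c3.
c2, c3 = Fraction(3, 2), Fraction(-5, 7)
d2, d3 = reversion_coefficients(c2, c3)
composed = compose([Fraction(0), Fraction(1), d2, d3],
                   [Fraction(0), Fraction(1), c2, c3])

# Numerical check for the toy map f at a small value of y.
y = 0.1
exact = 2.0 * (math.sqrt(1.0 + y) - 1.0)
iterated = inverse_by_contraction(y)
e2, e3 = reversion_coefficients(0.25, 0.0)
truncated = y + e2 * y**2 + e3 * y**3
```

The list `composed` equals the coefficients of x modulo x⁴, which confirms the two coefficients in exact arithmetic; `iterated` agrees with `exact` up to rounding, while `truncated` differs from it by a term of order y⁴.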
Theorem 1.30
(implicit function theorem) Let E, F and G be Banach spaces, A some non-empty open subset of E × F and f : A → G a mapping of class $C^p$ in A. Let (a, b) ∈ A be such that f(a, b) = 0 and suppose that $D_2 f(a, b) \in \mathcal{L}(F;G)$ is bijective. There exists some neighborhood $U_0$ of a in E such that, for every connected open set $U \subset U_0$ containing a, there is a unique continuous mapping u from U into F satisfying u(a) = b, (x, u(x)) ∈ A, and f(x, u(x)) = 0 for all x ∈ U. Furthermore, u is of class $C^p$ in U and, for all x ∈ U,
\[ Du(x) = -D_2 f(x, u(x))^{-1} \circ D_1 f(x, u(x)). \tag{1.16} \]
Proof
1) Existence of an implicit function: Since $D_2 f(a, b)$ is bijective, it is an isomorphism from F onto G, and we may consider that F = G; furthermore, replacing f by $D_2 f(a, b)^{-1}.f$ if necessary, we may assume that $D_2 f(a, b) = 1_F$. Let φ : A → E × F : (x, y) ↦ (x, f(x, y)). We have
\[ D\varphi(a, b) = \begin{pmatrix} 1_E & 0 \\ D_1 f(a, b) & 1_F \end{pmatrix}, \]
so Dφ(a, b) is invertible in $\mathcal{L}(E \times F)$. By Theorem 1.29, φ is a local diffeomorphism of class $C^p$ and admits an inverse local diffeomorphism ψ of class $C^p$. Write ψ(x, z) = (x, h(x, z)), where h is defined and of class $C^p$ in some neighborhood of (a, 0) = φ(a, b) and takes values in F. Finally, set u(x) = h(x, 0). Then, u is of class $C^p$ in some neighborhood of a and takes values in F. Thus, there exists some neighborhood $U_0$ of a in E such that, for all $x \in U_0$,
\[ (x, f(x, u(x))) = \varphi(x, u(x)) = \varphi(x, h(x, 0)) = \varphi(\psi(x, 0)) = (x, 0) \]
and u(a) = h(a, 0), so $(a, u(a)) = \psi(a, 0) = \varphi^{-1}((a, 0)) = (a, b)$, and therefore u(a) = b. Hence, u is an “implicit function” of class $C^p$.
2)Uniqueness of the implicit function: Since φ is a local homeomorphism, there exist a neighborhood U′0 of a and a neighborhood V0 of b such that there is a unique (x, y) in U′0 × V0 satisfying φ (x, y) = (x, 0) (see Figure 1.1, where Γ is the graph of f). We may assume that U′0 is the same U0 as above (replacing U0 by U0 ∩ U′0 if necessary). If a continuous mapping v : U0 → F satisfies v (a) = b and f (x, v(x)) = 0 for all x ∈ U0, then we may assume that v(x) ∈ V0 for all x ∈ U0, further reducing the neighborhood U0 of a if necessary. We may similarly assume that u, like v, is defined in U0. Let U ⊂ U0 be a connected neighborhood of a and suppose that M = {x ∈ U : u (x) = v (x)}. Then, a ∈ M and M is closed in U ([P2], section 2.3.3(II), Lemma 2.30). We will show that M is also open in U. By the hypotheses, the mapping x ↦ D2f(x, u (x)) is continuous and D2f (a, b) = 1F, so (again reducing the neighborhood U0 if necessary) we may assume that D2f (x, u(x)) is invertible for all x ∈ U0. Let a′ ∈ M. There exist a neighborhood Ua′ ⊂ U of a′ and a neighborhood Va′ ⊂ V0 of b′ = u (a′) such that, for all x ∈ Ua′, u (x) is the only solution y of f (x, y) = 0 satisfying y ∈ Va′. Given that v is continuous at a′ and v (a′)=u (a′), there exists a neighborhood W ⊂ Ua′ of a′ such that v (x) ∈ Va′ whenever x ∈ W. Therefore, v (x) = u (x) for all x ∈ W, which proves that M is open. The set M is non-empty, open, and closed in the connected space U, which implies that M = U ([P2], section 2.3.8).
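3) Calculation of Du(x): Since f(x, u(x)) = 0 in U, the chain rule (Theorem 1.9) implies that $D_1 f(x, u(x)) + D_2 f(x, u(x)) \circ Du(x) = 0$ for all x ∈ U. Since $D_2 f(x, u(x))$ is invertible, this gives [1.16].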
i) The implicit function theorem answers the following question: what condition makes it possible to express y = b + Δy uniquely as a function of x = a + Δx (so that y = u(x)) whenever f(x, y) = 0 in an open neighborhood A of (a, b)? First of all, with suitable continuity hypotheses, neglecting second-order terms, and assuming that Δx, Δy are sufficiently small, we have $\frac{\partial f}{\partial x}(x, y).\Delta x + \frac{\partial f}{\partial y}(x, y).\Delta y \simeq 0$. Hence:
\[ \Delta y \simeq \underbrace{-D_2 f(x, y)^{-1} \circ D_1 f(x, y)}_{Du(x)} . \Delta x \]
if $D_2 f(x, y)$ is invertible, or equivalently if $D_2 f(a, b)$ is invertible by continuity of $D_2 f$. Figure 1.1 shows that the functional relation y = u(x) might only be valid in a sufficiently small neighborhood of (a, b). If x belongs to a connected neighborhood $U_0$ of a, the variable y such that f(x, y) = 0 can be made arbitrarily close to b, provided that $U_0$ is chosen small enough.
ii) The reader may wish to find an expression for [1.16] in the finite-dimensional case using Jacobian matrices ([DIE 93], Volume 1, (10.2.2)); the composition of two linear mappings translates to the product of their matrices.
iii) The statement of Theorem 1.30 no longer holds when F and G are arbitrary Fréchet spaces [SER 72]; however, it remains valid when E is a non-complete normed vector space (see [SCH 93], Volume 2, Theorem 3.8.5).
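The formula for Du(x) in the implicit function theorem can be tested numerically in the simplest finite-dimensional situation, E = F = G = ℝ, where the Jacobian matrices reduce to 1×1 matrices. The sketch below assumes the hypothetical example f(x, y) = x² + y² − 1 near (a, b) = (0.6, 0.8), where D₂f(a, b) = 1.6 is invertible; the implicit function is computed by Newton's method in the variable y, which is a choice made for this illustration and not the construction used in the proof.

```python
def f(x, y):
    """Hypothetical constraint: the unit circle x**2 + y**2 = 1."""
    return x * x + y * y - 1.0

def d1f(x, y):
    """Partial derivative of f with respect to x."""
    return 2.0 * x

def d2f(x, y):
    """Partial derivative of f with respect to y."""
    return 2.0 * y

def implicit_u(x, y_start=0.8, iterations=50):
    """Solve f(x, y) = 0 for y near y_start by Newton's method in y."""
    y = y_start
    for _ in range(iterations):
        y -= f(x, y) / d2f(x, y)
    return y

def du_formula(x):
    """Du(x) = -D2f(x, u(x))^{-1} . D1f(x, u(x)), here a quotient of real numbers."""
    y = implicit_u(x)
    return -d1f(x, y) / d2f(x, y)

a = 0.6
step = 1e-6
finite_difference = (implicit_u(a + step) - implicit_u(a - step)) / (2.0 * step)
```

At a = 0.6 the formula gives −a/u(a) = −0.75, and the central finite difference of the numerically computed u agrees with this value to within the expected discretization and rounding error.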
(III) Immersions, submersions, subimmersions, the rank theorem In the following, we assume that 0 < p ≤ ω.
Corollary-Definition 1.32
1) Let E, F be Banach spaces, A an open subset of E, a ∈ A, and i : A → F a mapping of class $C^p$ such that i(a) = 0, Di(a) is injective and its image $F_1 = \mathrm{im}(Di(a))$ splits in F ([P2], section 3.2.2(IV)), i.e. admits a topological complement $F_2$ (ibid.). Then, there exist a local homeomorphism $r : F \to F_1 \times F_2$ in some neighborhood of 0 and an open neighborhood U ⊂ A of a in E such that r ∘ i induces a diffeomorphism of class $C^p$ from U onto an open subset of $F_1$. The local homeomorphism r is a local diffeomorphism of class $C^p$.
2) The mapping i defined above is called an immersion of class $C^p$.