© Springer Nature Singapore Pte Ltd. 2020
X.-D. Zhang, A Matrix Algebra Approach to Artificial Intelligence. https://doi.org/10.1007/978-981-15-2770-8_2

2. Matrix Differential

Xian-Da Zhang1  
(1)
Department of Automation, Tsinghua University, Beijing, China
 
 Deceased

The matrix differential is a generalization of the differential of a multivariate function. The matrix differential (including the matrix partial derivative and gradient) is an important operational tool in matrix algebra and in the optimization problems arising in machine learning, neural networks, support vector machines, and evolutionary computation. This chapter is concerned with the theory and methods of the matrix differential.

2.1 Jacobian Matrix and Gradient Matrix

In this section we discuss the partial derivatives of real functions. Table 2.1 summarizes the notation used for real functions.
Table 2.1
Symbols of real functions

Function type | Variable $$\mathbf {x}\in \mathbb {R}^m$$ | Variable $$\mathbf {X}\in \mathbb {R}^{m\times n}$$
Scalar function $$f\in \mathbb {R}$$ | $$f(\mathbf {x}):~\mathbb {R}^m\rightarrow \mathbb {R}$$ | $$f(\mathbf {X}):~\mathbb {R}^{m\times n}\rightarrow \mathbb {R}$$
Vector function $$\mathbf {f}\in \mathbb {R}^p$$ | $$\mathbf {f}(\mathbf {x}):~\mathbb {R}^m\rightarrow \mathbb {R}^p$$ | $$\mathbf {f}(\mathbf {X}):~\mathbb {R}^{m\times n} \rightarrow \mathbb {R}^p$$
Matrix function $$\mathbf {F}\in \mathbb {R}^{p\times q}$$ | $$\mathbf {F}(\mathbf {x}):~\mathbb {R}^m\rightarrow \mathbb {R}^{p\times q}$$ | $$\mathbf {F}(\mathbf {X}):~\mathbb {R}^{m \times n}\rightarrow \mathbb {R}^{p\times q}$$

2.1.1 Jacobian Matrix

First, we introduce definitions of the partial derivative operators and the Jacobian matrix.
  1. 1.
The row partial derivative operator with respect to an m × 1 vector x is defined as
    
$$\displaystyle \begin{aligned} {\boldsymbol \nabla}_{{\mathbf{x}}^T}^{\,} \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial\,{\mathbf{x}}^T}=\left [\frac{\partial}{\partial x_1^{\,}},\ldots ,\frac{\partial}{\partial x_m^{\,}}\right ],{} \end{aligned} $$
    (2.1.1)
and the row partial derivative vector of a real scalar function f(x) with respect to its m × 1 vector variable x is given by
    
$$\displaystyle \begin{aligned} {\boldsymbol \nabla}_{{\mathbf{x}}^T}^{\,} f(\mathbf{x})=\frac{\partial f(\mathbf{x})}{\partial\,{\mathbf{x}}^T}=\left [\frac{\partial f(\mathbf{x})} {\partial x_1^{\,}},\ldots , \frac{\partial f(\mathbf{x})}{\partial x_m^{\,}}\right ].{} \end{aligned} $$
    (2.1.2)
     
  2. 2.
The row partial derivative operator with respect to an m × n matrix X is defined as
    
$$\displaystyle \begin{aligned} {\boldsymbol \nabla}_{(\mathrm{vec}\,\mathbf{X})^T}\stackrel{\mathrm{def}}{=}\frac{\partial}{\partial (\mathrm{vec}\,\mathbf{X})^T}=\left [\frac{\partial} {\partial x_{11}^{\,}},\ldots ,\frac{\partial}{\partial x_{m1}^{\,}},\ldots ,\frac{\partial}{\partial x_{1n}^{\,}},\ldots ,\frac{\partial}{\partial x_{mn}^{\,}} \right ], \end{aligned} $$
    (2.1.3)
and the row partial derivative vector of a real scalar function f(X) with respect to its matrix variable
$$\mathbf {X}\in \mathbb {R}^{m\times n}$$
is given by
    
$$\displaystyle \begin{aligned} {\boldsymbol \nabla}_{(\mathrm{vec}\,\mathbf{X})^T}^{\,} f(\mathbf{X})\!=\!\frac{\partial f(\mathbf{X})}{\partial (\mathrm{vec}\,\mathbf{X})^T}\!=\!\left [\frac{\partial f(\mathbf{X} )} {\partial x_{11}^{\,}},\ldots ,\frac{\partial f( \mathbf{X})}{\partial x_{m1}^{\,}},\ldots ,\frac {\partial f(\mathbf{X})}{\partial x_{1n}^{\,}},\ldots , \frac{\partial f(\mathbf{X})}{\partial x_{mn}^{\,}} \right ]{}. \end{aligned} $$
    (2.1.4)
     
  3. 3.
The Jacobian operator with respect to an m × n matrix X is defined as
$$\displaystyle \begin{aligned} {\boldsymbol \nabla}_{{\mathbf{X}}^T}^{\,} \stackrel{\mathrm{def}}{=}\frac{\partial}{\partial\,{\mathbf{X}}^T}=\left [\begin{array}{ccc} \dfrac{\partial}{\partial x_{11}^{\,}} &\cdots &\dfrac{\partial}{\partial x_{m1}^{\,}}\\ \vdots &\ddots &\vdots\\ \dfrac{\partial}{\partial x_{1n}^{\,}} &\cdots &\dfrac{\partial}{\partial x_{mn}^{\,}}\end{array}\right ],{} \end{aligned} $$
    (2.1.5)
and the Jacobian matrix of a real scalar function f(X) with respect to its matrix variable
$$\mathbf {X}\in \mathbb {R}^{m\times n}$$
is given by
$$\displaystyle \begin{aligned} \mathbf{J}={\boldsymbol \nabla}_{{\mathbf{X}}^T}^{\,} f(\mathbf{X})=\frac{\partial f(\mathbf{X})}{\partial\,{\mathbf{X}}^T}=\left [\begin{array}{ccc} \dfrac{\partial f(\mathbf{X})}{\partial x_{11}^{\,}} &\cdots &\dfrac{\partial f(\mathbf{X})}{\partial x_{m1}^{\,}}\\ \vdots &\ddots &\vdots\\ \dfrac{\partial f(\mathbf{X})}{\partial x_{1n}^{\,}} &\cdots &\dfrac{\partial f(\mathbf{X})}{\partial x_{mn}^{\,}}\end{array}\right ]\in\mathbb{R}^{n\times m}.{} \end{aligned} $$
    (2.1.6)
     
There is the following relationship between the Jacobian matrix and the row partial derivative vector:

$$\displaystyle \begin{aligned} {\boldsymbol \nabla}_{(\mathrm{vec}\,\mathbf{X})^T}^{\,} f(\mathbf{X})=\mathrm{rvec}(\mathbf{J})=\left (\mathrm{vec}({\mathbf{J}}^T)\right )^T.{} \end{aligned} $$
(2.1.7)
This important relation is the basis of the Jacobian matrix identification.

As a matter of fact, the Jacobian matrix is more useful than the row partial derivative vector.

The following definition gives a specific expression for the Jacobian matrix of a p × q real matrix function F(X) with an m × n matrix variable X.

Definition 2.1 (Jacobian Matrix [6])
Let the vectorization of a p × q matrix function F(X) be given by

$$\displaystyle \begin{aligned} \mathrm{vec}\, \mathbf{F}(\mathbf{X})\stackrel{\mathrm{def}}{=}[f_{11}(\mathbf{X}),\ldots ,f_{p1}(\mathbf{X}), \ldots , f_{1q}(\mathbf{X}),\ldots ,f_{pq}(\mathbf{X})]^T\ \in\mathbb{R}^{pq}. \end{aligned} $$
(2.1.8)
Then the pq × mn Jacobian matrix of F(X) is defined as

$$\displaystyle \begin{aligned} \mathbf{J}={\boldsymbol \nabla}_{(\mathrm{vec}\mathbf{X})^T}\mathbf{F}(\mathbf{X})\stackrel{\mathrm{def}}{=}\frac{\partial\,\mathrm{vec} \mathbf{F}(\mathbf{X})}{\partial (\mathrm{vec} \mathbf{X})^T}\ \in\mathbb{R}^{pq\times mn}{} \end{aligned} $$
(2.1.9)
whose specific expression J is given by
$$\displaystyle \begin{aligned} \mathbf{J}=\left [\begin{array}{ccccccc} \dfrac{\partial f_{11}}{\partial x_{11}} &\cdots &\dfrac{\partial f_{11}}{\partial x_{m1}} &\cdots &\dfrac{\partial f_{11}}{\partial x_{1n}} &\cdots &\dfrac{\partial f_{11}}{\partial x_{mn}}\\ \vdots & &\vdots & &\vdots & &\vdots\\ \dfrac{\partial f_{p1}}{\partial x_{11}} &\cdots &\dfrac{\partial f_{p1}}{\partial x_{m1}} &\cdots &\dfrac{\partial f_{p1}}{\partial x_{1n}} &\cdots &\dfrac{\partial f_{p1}}{\partial x_{mn}}\\ \vdots & &\vdots & &\vdots & &\vdots\\ \dfrac{\partial f_{pq}}{\partial x_{11}} &\cdots &\dfrac{\partial f_{pq}}{\partial x_{m1}} &\cdots &\dfrac{\partial f_{pq}}{\partial x_{1n}} &\cdots &\dfrac{\partial f_{pq}}{\partial x_{mn}}\end{array}\right ].{} \end{aligned} $$
(2.1.10)

2.1.2 Gradient Matrix

The partial derivative operator in column form is referred to as the gradient vector operator.

Definition 2.2 (Gradient Vector Operators)
The gradient vector operators with respect to an m × 1 vector x and to an m × n matrix X are, respectively, defined as

$$\displaystyle \begin{aligned} \nabla_{\mathbf{x}}^{\,} \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial\,\mathbf{x}}=\left [\frac{\partial}{\partial x_1^{\,}}, \ldots , \frac{\partial}{\partial x_m^{\,}}\right ]^T \end{aligned} $$
(2.1.11)
and

$$\displaystyle \begin{aligned} \nabla_{\mathrm{vec}\,\mathbf{X}}^{\,} \stackrel{\mathrm{def}}{=}\frac{\partial}{\partial\,\mathrm{vec}\,\mathbf{X}}=\left [\frac{\partial}{\partial x_{11}^{\,}},\ldots ,\frac{\partial}{\partial x_{m1}^{\,}},\ldots ,\frac {\partial}{\partial x_{1n}^{\,}},\ldots ,\frac{\partial} {\partial x_{mn}^{\,}} \right ]^T. \end{aligned} $$
(2.1.12)
Definition 2.3 (Gradient Matrix Operator)
The gradient matrix operator with respect to an m × n matrix X, denoted as 
$$\nabla _{\mathbf {X}}=\frac {\partial } {\partial \,\mathbf {X}}$$
, is defined as
$$\displaystyle \begin{aligned} \nabla_{\mathbf{X}}^{\,} \stackrel{\mathrm{def}}{=}\frac{\partial}{\partial\,\mathbf{X}}=\left [\begin{array}{ccc} \dfrac{\partial}{\partial x_{11}^{\,}} &\cdots &\dfrac{\partial}{\partial x_{1n}^{\,}}\\ \vdots &\ddots &\vdots\\ \dfrac{\partial}{\partial x_{m1}^{\,}} &\cdots &\dfrac{\partial}{\partial x_{mn}^{\,}}\end{array}\right ]. \end{aligned} $$
(2.1.13)
Definition 2.4 (Gradient Vectors)
The gradient vectors of functions f(x) and f(X) are, respectively, defined as

$$\displaystyle \begin{aligned} \nabla_{\mathbf{x}}^{\,} f(\mathbf{x})&\stackrel{\mathrm{def}}{=}\left [\frac {\partial f( \mathbf{x})}{\partial x_1^{\,}}, \ldots ,\frac{\partial f( \mathbf{x})}{\partial x_m^{\,}}\right ]^T= \frac {\partial f( \mathbf{x})}{\partial\mathbf{x}}, \end{aligned} $$
(2.1.14)

$$\displaystyle \begin{aligned} \nabla_{\mathrm{vec}\,\mathbf{X}}^{\,} f(\mathbf{X})&\stackrel{\mathrm{def}}{=}\left [\frac{\partial f(\mathbf{X})}{\partial x_{11}^{\,}}, \ldots ,\frac{\partial f(\mathbf{X})}{\partial x_{m1}^{\,}},\ldots ,\frac{\partial f(\mathbf{X})}{\partial x_{1n}^{\,}},\ldots , \frac{\partial f(\mathbf{X})}{\partial x_{mn}^{\,}} \right ]^T. \end{aligned} $$
(2.1.15)
Definition 2.5 (Gradient Matrix)
The gradient matrix of the function f(X) is defined as
$$\displaystyle \begin{aligned} \nabla_{\mathbf{X}}^{\,} f(\mathbf{X})=\frac{\partial f(\mathbf{X})}{\partial\,\mathbf{X}}=\left [\begin{array}{ccc} \dfrac{\partial f(\mathbf{X})}{\partial x_{11}^{\,}} &\cdots &\dfrac{\partial f(\mathbf{X})}{\partial x_{1n}^{\,}}\\ \vdots &\ddots &\vdots\\ \dfrac{\partial f(\mathbf{X})}{\partial x_{m1}^{\,}} &\cdots &\dfrac{\partial f(\mathbf{X})}{\partial x_{mn}^{\,}}\end{array}\right ]\in\mathbb{R}^{m\times n}.{} \end{aligned} $$
(2.1.16)
For a real matrix function 
$$\mathbf {F}(\mathbf {X})\in \mathbb {R}^{p\times q}$$
with matrix variable 
$$\mathbf {X}\in \mathbb {R}^{m\times n}$$
, its gradient matrix is defined as

$$\displaystyle \begin{aligned} \nabla_{\mathbf{X}}^{\,} \mathbf{F}(\mathbf{X})=\frac{\partial\,(\mathrm{vec}\,\mathbf{F}(\mathbf{X}))^T}{\partial\,\mathrm{vec} \mathbf{X}}=\left ( \frac{\partial \mathrm{vec} \mathbf{F}(\mathbf{X})}{\partial (\mathrm{vec}\,\mathbf{X})^T}\right )^T.{} \end{aligned} $$
(2.1.17)
Comparing Eq. (2.1.16) with Eq. (2.1.6), we have

$$\displaystyle \begin{aligned} \nabla_{\mathbf{X}}^{\,} f(\mathbf{X})={\mathbf{J}}^T. \end{aligned} $$
(2.1.18)
Similarly, comparing Eq. (2.1.17) with Eq. (2.1.9) gives

$$\displaystyle \begin{aligned} \nabla_{\mathbf{X}}\mathbf{F}(\mathbf{X})={\mathbf{J}}^T. \end{aligned} $$
(2.1.19)
That is to say, the gradient matrix of a real scalar function f(X) or a real matrix function F(X) is equal to the transpose of respective Jacobian matrix.

An obvious fact is that, given a real scalar function f(x), its gradient vector is directly equal to the transpose of the partial derivative vector. In this sense, the partial derivative in row vector form is a covariant form of the gradient vector, so the row partial derivative vector is also known as the cogradient vector. Similarly, the Jacobian matrix is sometimes called the cogradient matrix. The cogradient is a covariant operator [3] that itself is not the gradient, but is related to the gradient.

For this reason, the row partial derivative operator $${\boldsymbol \nabla}_{{\mathbf {x}}^T}=\partial /\partial \,{\mathbf {x}}^T$$ and the Jacobian operator $${\boldsymbol \nabla}_{{\mathbf {X}}^T}=\partial /\partial \,{\mathbf {X}}^T$$ are known as covariant forms of the gradient operator, or cogradient operators.

The direction of the negative gradient 
$$-\nabla _{\mathbf {x}}^{\,} f(\mathbf {x})$$
is known as the gradient flow direction of the function f(x) at the point x, and is expressed as

$$\displaystyle \begin{aligned} \dot{\mathbf{x}}=-\nabla_{\mathbf{x}}^{\,} f(\mathbf{x})\quad \text{or}\quad  \dot{\mathbf{X}}=-\nabla_{\mathbf{X}}^{\,} f(\mathbf{X}). \end{aligned} $$
(2.1.20)
From the definition formula of the gradient vector, we have the following conclusive remarks:
  • In the gradient flow direction, the function f(x) decreases at its maximum rate. Conversely, in the opposite direction (i.e., the positive gradient direction), the function increases at its maximum rate.

  • Each component of the gradient vector gives the rate of change of the scalar function f(x) in the component direction.
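The two remarks above can be illustrated with a minimal numerical sketch (not from the text): discretizing the gradient flow of a quadratic function drives it to its minimizer. The matrix Q, the vector b, and the step size below are illustrative assumptions.

```python
import numpy as np

# f(x) = 0.5 x^T Q x - b^T x has gradient vector (2.1.14): grad f(x) = Q x - b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite (assumed)
b = np.array([1.0, -1.0])

grad = lambda x: Q @ x - b

x = np.array([2.0, 2.0])
for _ in range(200):
    x = x - 0.1 * grad(x)                # discretized gradient flow (2.1.20)

x_star = np.linalg.solve(Q, b)           # the minimizer satisfies grad f(x*) = 0
print(np.allclose(x, x_star))
```

Along the trajectory, each step moves in the direction of steepest descent, and the iterate converges to the point where the gradient vanishes.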

2.1.3 Calculation of Partial Derivative and Gradient

The gradient computation of a real function with respect to its matrix variable has the following properties and rules [5]:
  1. 1.

    If f(X) = c, where c is a real constant and X is an m × n real matrix, then the gradient 
$$\partial c/\partial \mathbf {X}={\mathbf {O}}_{m\times n}^{\,}$$
.

     
  2. 2.
    Linear rule: If f(X) and g(X) are two real-valued functions of the matrix variable X, and 
$$c_1^{\,}$$
and 
$$c_2^{\,}$$
are two real constants, then
    
$$\displaystyle \begin{aligned} \begin{aligned} \frac{\partial [c_1^{\,} f(\mathbf{X})+c_2^{\,} g( \mathbf{X})]}{\partial\mathbf{X}}= c_1^{\,} \frac{\partial f(\mathbf{X})} {\partial\,\mathbf{X}}+c_2^{\,}\frac{\partial g(\mathbf{X})}{\partial\mathbf{X}} \end{aligned} . \end{aligned} $$
    (2.1.21)
     
  3. 3.
    Product rule: If f(X), g(X), and h(X) are real-valued functions of the matrix variable X, then
    
$$\displaystyle \begin{aligned} \begin{aligned} \frac {\partial (f(\mathbf{X})g(\mathbf{X}))}{\partial\,\mathbf{X}}=g(\mathbf{X})\frac {\partial f(\mathbf{X})}{\partial \mathbf{X}}+f(\mathbf{X}) \frac{\partial g(\mathbf{X})}{\partial\,\mathbf{X}} \end{aligned} \end{aligned} $$
    (2.1.22)
    and
    
$$\displaystyle \begin{aligned} \frac {\partial (f(\mathbf{X}) g(\mathbf{X})h(\mathbf{X}))}{\partial\mathbf{X}}&=g(\mathbf{X})h(\mathbf{X})\frac{\partial f(\mathbf{X})}{\partial\mathbf{X}}+f(\mathbf{X})h(\mathbf{X})\frac{\partial g(\mathbf{X})}{\partial\,\mathbf{X}}\\ &\quad  +f(\mathbf{X})g(\mathbf{X})\frac{\partial h(\mathbf{X})}{\partial\,\mathbf{X}}. \end{aligned} $$
    (2.1.23)
     
  4. 4.
    Quotient rule: If g(X) ≠ 0, then
    
$$\displaystyle \begin{aligned} \begin{aligned} \frac{\partial (f(\mathbf{X})/g(\mathbf{X}))} {\partial\,\mathbf{X}}=\frac 1{g^2 (\mathbf{X})}\left ( g(\mathbf{X})\frac {\partial f(\mathbf{X})}{\partial\,\mathbf{X}}-f(\mathbf{X})\frac{\partial g(\mathbf{X})}{\partial\,\mathbf{X}}\right ) \end{aligned}. \end{aligned} $$
    (2.1.24)
     
  5. 5.
    Chain rule: If X is an m × n matrix and y = f(X) and g(y) are, respectively, the real-valued functions of the matrix variable X and of the scalar variable y, then
    
$$\displaystyle \begin{aligned} \begin{aligned} \frac {\partial g(f(\mathbf{X}))}{\partial\,\mathbf{X}}=\frac{\mathrm{d}g(y)}{\mathrm{d}y}\frac{\partial f(\mathbf{X})} {\partial\,\mathbf{X}} \end{aligned}. \end{aligned} $$
    (2.1.25)
     
As an extension, if g(F(X)) = g(F), where 
$$\mathbf {F}=[f_{kl}]\in \mathbb {R}^{p\times q}$$
and 
$$\mathbf {X}=[x_{ij}]\in \mathbb {R}^{m\times n}$$
, then the chain rule is given by Petersen and Pedersen [7] as

$$\displaystyle \begin{aligned} \left [\frac{\partial g(\mathbf{F})}{\partial\,\mathbf{X}}\right ]_{ij}=\frac{\partial g(\mathbf{F})}{\partial x_{ij}}= \sum_{k=1}^p\sum_{l=1}^q \frac {\partial g(\mathbf{F})}{\partial f_{kl}}\frac{\partial f_{kl}}{\partial x_{ij}}. \end{aligned} $$
(2.1.26)
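The chain rule (2.1.26) can be checked numerically. The following NumPy sketch (not from the text) takes the illustrative choices g(F) = Σ f_kl² (so ∂g∕∂f_kl = 2 f_kl) and F(X) = AXB (so ∂f_kl∕∂x_ij = a_ki b_jl); the double sum in (2.1.26) then collapses to 2 AᵀFBᵀ, which is compared against central finite differences.

```python
import numpy as np

# Hedged numerical check of the chain rule (2.1.26); A, X, B are random data.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # p x m
X = rng.standard_normal((3, 5))   # m x n
B = rng.standard_normal((5, 2))   # n x q

F = A @ X @ B
# Right-hand side of (2.1.26): sum_{k,l} (dg/df_kl)(df_kl/dx_ij) = 2 A^T F B^T
chain = 2 * A.T @ F @ B.T

# Left-hand side via central finite differences of g(F(X)) = ||A X B||_F^2
g = lambda Y: np.sum((A @ Y @ B) ** 2)
eps, num = 1e-6, np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X); E[i, j] = eps
        num[i, j] = (g(X + E) - g(X - E)) / (2 * eps)

print(np.allclose(chain, num, atol=1e-5))
```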

When computing the partial derivative of the functions f(x) and f(X), it is necessary to make the following basic assumption.

Independence Assumption

Given a real-valued function f, we assume that the vector variable 
$$\mathbf {x}=[x_i^{\,} ]_{i=1}^m\in \mathbb {R}^m$$
and the matrix variable 
$$\mathbf {X}=[x_{ij}^{\,} ]_{i=1, j=1}^{m,n}\in \mathbb {R}^{m \times n}$$
do not themselves have any special structure; namely, the entries of x (and X) are independent.

The independence assumption can be expressed in mathematical form as follows:

$$\displaystyle \begin{aligned} \frac{\partial x_i^{\,}}{\partial x_j^{\,}}=\delta_{ij}^{\,} =\left\{\begin{array}{ll} 1,&~~i=j;\\ 0,&~~\text{otherwise}.\end{array}\right.{} \end{aligned} $$
(2.1.27)

$$\displaystyle \begin{aligned} \frac{\partial x_{kl}^{\,}}{\partial x_{ij}^{\,}}=\delta_{ki}^{\,} \delta_{lj}^{\,} =\left\{\begin{array}{ll} 1,&~~k=i \text{ and }l=j;\\ 0, &~~\text{otherwise}.\end{array}\right.{}\end{aligned} $$
(2.1.28)

These expressions on independence are the basic formulas for partial derivative computation.

Example 2.1
Consider the real function

$$\displaystyle \begin{aligned} f(\mathbf{X})={\mathbf{a}}^T\mathbf{XX}^T \mathbf{b}=\sum_{k=1}^m\sum_{l=1}^m a_k^{\,}\left (\sum_{p=1}^n x_{kp}x_{lp}\right )b_l^{\,} ,\quad  \mathbf{X}\in \mathbb{R}^{m\times n},~\mathbf{a},\mathbf{b}\in\mathbb{R}^{m\times 1}. \end{aligned}$$
Using Eq. (2.1.28), we can see easily that

$$\displaystyle \begin{aligned} \left[\frac{\partial f(\mathbf{X})}{\partial\,{\mathbf{X}}^T}\right]_{ij}&=\frac{\partial f(\mathbf{X})}{\partial x_{ji}}= \sum_{k=1}^m\sum_{l=1}^m \sum_{p=1}^n \frac{\partial a_k^{\,} x_{kp}x_{lp}b_l^{\,}}{\partial x_{ji}}\\ &=\sum_{k=1}^m \sum_{l=1}^m \sum_{p=1}^n\left [ a_k^{\,} x_{lp}b_l^{\,}\frac{\partial x_{kp}}{\partial x_{ji}}+a_k^{\,} x_{kp} b_l^{\,} \frac{\partial x_{lp}}{\partial x_{ji}}\right ]\\ &=\sum_{l=1}^m a_j^{\,} x_{li}b_l^{\,} + \sum_{k=1}^m a_k^{\,} x_{ki} b_j^{\,} \\ &=\left [{\mathbf{X}}^T\mathbf{b} \right]_i a_j^{\,} +\left [{\mathbf{X}}^T\mathbf{a}\right ]_i b_j^{\,} , \end{aligned} $$
which yields, respectively, the Jacobian matrix and the gradient matrix as follows:

$$\displaystyle \begin{aligned} \mathbf{J}={\mathbf{X}}^T(\mathbf{ba}^T+\mathbf{ab}^T)\quad  \text{and}\quad  \nabla_{\mathbf{X}}^{\,} f(\mathbf{X})={\mathbf{J}}^T=(\mathbf{ba}^T+\mathbf{ab}^T)\mathbf{X}. \end{aligned}$$
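As a numerical sanity check of Example 2.1 (a sketch, not part of the original text), the gradient (baᵀ + abᵀ)X can be compared against central finite differences on random data:

```python
import numpy as np

# Check that the gradient of f(X) = a^T X X^T b is (b a^T + a b^T) X.
rng = np.random.default_rng(2)
m, n = 4, 3
X = rng.standard_normal((m, n))
a, b = rng.standard_normal(m), rng.standard_normal(m)

grad = (np.outer(b, a) + np.outer(a, b)) @ X   # closed form from Example 2.1

f = lambda Y: a @ Y @ Y.T @ b
eps, num = 1e-6, np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X); E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-6))
```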
Example 2.2
Let F(X) = AXB, where 
$$\mathbf {A}\in \mathbb {R}^{p\times m},\mathbf {X}\in \mathbb {R}^{m\times n},\mathbf {B}\in \mathbb {R}^{n\times q}$$
. We have

$$\displaystyle \begin{aligned} &\frac{\partial f_{kl}^{\,}}{\partial x_{ij}^{\,}}=\frac{\partial [\mathbf{AXB}]_{kl}}{\partial x_{ij}^{\,}}=\frac{\partial \left (\sum_{u=1}^m \sum_{v=1}^n a_{ku}^{\,} x_{uv}^{\,} b_{vl}^{\,}\right )}{\partial x_{ij}}=b_{jl}^{\,} a_{ki}^{\,}\\ \Rightarrow&~\nabla_{\mathbf{X}}^{\,} (\mathbf{AXB})= \mathbf{B}\otimes {\mathbf{A}}^T~~ \Rightarrow~~ \mathbf{J}=(\nabla_{\mathbf{X}}^{\,} (\mathbf{AXB}))^T={\mathbf{B}}^T\otimes \mathbf{A}. \end{aligned} $$
That is, the pq × mn Jacobian matrix is J = B T ⊗A, and the mn × pq gradient matrix is given by 
$$\nabla _{\mathbf {X}}^{\,} (\mathbf {AXB})=\mathbf {B}\otimes {\mathbf {A}}^T$$
.
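Example 2.2 can likewise be verified numerically (a sketch, not from the text) via the vectorization identity vec(AXB) = (Bᵀ ⊗ A) vec X, which shows the Jacobian matrix is Bᵀ ⊗ A. Note that NumPy flattens row-major by default, so `order='F'` is needed for the column-stacking vec operator.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))   # p x m
X = rng.standard_normal((3, 5))   # m x n
B = rng.standard_normal((5, 2))   # n x q

vec = lambda M: M.flatten(order="F")   # column-stacking vec
J = np.kron(B.T, A)                    # pq x mn Jacobian matrix

print(np.allclose(vec(A @ X @ B), J @ vec(X)))
```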

2.2 Real Matrix Differential

Although direct computation of partial derivatives 
$$\partial f_{kl}^{\,} /\partial x_{ji}^{\,}$$
or 
$$\partial f_{kl}^{\,} /\partial x_{ij}^{\,}$$
can be used to find the Jacobian matrices or gradient matrices of many matrix functions, for more complex functions (such as the inverse matrix, the Moore–Penrose inverse matrix, and exponential functions of a matrix) direct computation of the partial derivatives is complicated and difficult. Hence it is natural to seek an easily remembered and effective mathematical tool for computing the Jacobian matrices and gradient matrices of real scalar functions and real matrix functions. Such a tool is the matrix differential.

2.2.1 Calculation of Real Matrix Differential

The differential of an m × n matrix 
$$\mathbf {X}=[x_{ij}^{\,} ]$$
is known as the matrix differential, denoted dX, which is still an m × n matrix and is defined as 
$$\mathrm {d} \mathbf {X}=[\mathrm {d}x_{ij}^{\,} ]_{i=1,j=1}^{m,n}$$
.

Consider the differential of a trace function tr(U). We have

$$\displaystyle \begin{aligned} \mathrm{d}(\mathrm{tr}\,\mathbf{U})=\mathrm{d}\bigg(\sum_{i=1}^n u_{ii}^{\,}\bigg)=\sum_{i=1}^n \mathrm{d}u_{ii}^{\,} =\mathrm{tr}(\mathrm{d} \mathbf{U}), \end{aligned}$$
namely d(tr U) = tr(dU).
The matrix differential of the matrix product UV is given in element form:

$$\displaystyle \begin{aligned}{}[\mathrm{d}(\mathbf{UV})]_{ij}^{\,} &=\mathrm{d}\left ( [\mathbf{UV}]_{ij}^{\,}\right )=\mathrm{d} \bigg( \sum_k u_{ik}^{\,} v_{kj}^{\,}\bigg) =\sum_k \mathrm{d}(u_{ik}^{\,} v_{kj}^{\,} )\\ &=\sum_k \left ( (\mathrm{d}u_{ik}^{\,} )v_{kj}^{\,} +u_{ik}^{\,} \mathrm{d}v_{kj}^{\,} \right )=\sum_k (\mathrm{d}u_{ik}^{\,} )v_{kj}^{\,} + \sum_k u_{ik}^{\,} \mathrm{d} v_{kj}^{\,}\\ &=[(\mathrm{d}\mathbf{U})\mathbf{V} ]_{ij}^{\,} + [\mathbf{U}(\mathrm{d}\mathbf{V})]_{ij}^{\,} . \end{aligned} $$
Then, we have the matrix differential d(UV) = (dU)V + U(dV).
Common computation formulas for matrix differential are given as follows [6, pp. 148–154]:
  1. 1.

    The differential of a constant matrix is a zero matrix, namely dA = O.

     
  2. 2.

    The matrix differential of the product αX is given by d(αX) = α dX.

     
  3. 3.

    The matrix differential of a transposed matrix is equal to the transpose of the original matrix differential, namely d(X T) = (dX)T.

     
  4. 4.

    The matrix differential of the sum (or difference) of two matrices is given by d(U ±V) = dU ±dV. More generally, we have d(aU ± bV) = a ⋅dU ± b ⋅dV.

     
  5. 5.
    The matrix differentials of the functions UV and UVW, where U = F(X), V = G(X), W = H(X), are, respectively, given by
    
$$\displaystyle \begin{aligned} \mathrm{d}(\mathbf{UV})&=(\mathrm{d}\mathbf{U})\mathbf{V}+\mathbf{U}(\mathrm{d}\mathbf{V}) \end{aligned} $$
    (2.2.1)
    
$$\displaystyle \begin{aligned} \mathrm{d}(\mathbf{UVW})&=(\mathrm{d}\mathbf{U})\mathbf{VW}+ \mathbf{U}(\mathrm{d}\mathbf{V})\mathbf{W}+\mathbf{UV}(\mathrm{d}\mathbf{W}). \end{aligned} $$
    (2.2.2)

    If A and B are constant matrices, then d(AXB) = A(dX)B.

     
  6. 6.
    The differential of the matrix trace d(tr X) is equal to the trace of the matrix differential dX, namely
    
$$\displaystyle \begin{aligned} \mathrm{d}(\mathrm{tr}\,\mathbf{X})=\mathrm{tr}(\mathrm{d} \mathbf{X}). \end{aligned} $$
    (2.2.3)

    In particular, the differential of the trace of the matrix function F(X) is given by d(tr F(X)) = tr(dF(X)).

     
  7. 7.
    The differential of the determinant of X is given by
    
$$\displaystyle \begin{aligned} \mathrm{d}|\mathbf{X}|=|\mathbf{X}| \mathrm{tr}({\mathbf{X}}^{-1}\mathrm{d} \mathbf{X}). \end{aligned} $$
    (2.2.4)

    In particular, the differential of the determinant of the matrix function F(X) is computed by d|F(X)| = |F(X)|tr(F −1(X)dF(X)).

     
  8. 8.
    The matrix differential of the Kronecker product is given by
    
$$\displaystyle \begin{aligned} \mathrm{d}(\mathbf{U}\otimes \mathbf{V})=(\mathrm{d}\mathbf{U})\otimes\mathbf{V}+ \mathbf{U} \otimes \mathrm{d}\mathbf{V}. \end{aligned} $$
    (2.2.5)
     
  9. 9.
    The matrix differential of the Hadamard product is computed by
    
$$\displaystyle \begin{aligned} \mathrm{d}(\mathbf{U}\odot \mathbf{V})=(\mathrm{d}\mathbf{U})\odot \mathbf{V}+\mathbf{U}\odot \mathrm{d}\mathbf{V}. \end{aligned} $$
    (2.2.6)
     
  10. 10.
    The matrix differential of the inverse matrix is given by
    
$$\displaystyle \begin{aligned} \mathrm{d}({\mathbf{X}}^{-1})=-{\mathbf{X}}^{-1}(\mathrm{d} \mathbf{X}){\mathbf{X}}^{-1}. \end{aligned} $$
    (2.2.7)
     
  11. 11.
    The differential of the vectorization function vec X is equal to the vectorization of the matrix differential, i.e.,
    
$$\displaystyle \begin{aligned} \mathrm{d}\,\mathrm{vec}\,\mathbf{X}=\mathrm{vec}(\mathrm{d} \mathbf{X}). \end{aligned} $$
    (2.2.8)
     
  12. 12.
    The differential of the matrix logarithm is given by
    
$$\displaystyle \begin{aligned} \mathrm{d}\log \mathbf{X}={\mathbf{X}}^{-1}\mathrm{d} \mathbf{X}. \end{aligned} $$
    (2.2.9)

    In particular, 
$$\mathrm {d}\log \mathbf {F}(\mathbf {X})={\mathbf {F}}^{-1}(\mathbf {X})\,\mathrm {d} \mathbf {F}(\mathbf {X})$$
.

     
  13. 13.
The matrix differentials of the Moore–Penrose inverse X†, of X†X, and of XX† are given by
    
$$\displaystyle \begin{aligned} \mathrm{d}({\mathbf{X}}^\dagger)=&\,-{\mathbf{X}}^\dagger (\mathrm{d} \mathbf{X}){\mathbf{X}}^\dagger +{\mathbf{X}}^\dagger({\mathbf{X}}^\dagger)^T (\mathrm{d} {\mathbf{X}}^T)(\mathbf{I}-\mathbf{XX}^\dagger )\\ &\,+( \mathbf{I}-{\mathbf{X}}^\dagger \mathbf{X})(\mathrm{d} {\mathbf{X}}^T)({\mathbf{X}}^\dagger)^T {\mathbf{X}}^\dagger , \end{aligned} $$
    (2.2.10)
    
$$\displaystyle \begin{aligned} \mathrm{d}({\mathbf{X}}^\dagger\mathbf{X})=&\,{\mathbf{X}}^\dagger (\mathrm{d} \mathbf{X})(\mathbf{I}-{\mathbf{X}}^\dagger\mathbf{X})+\left ( {\mathbf{X}}^\dagger (\mathrm{d} \mathbf{X})(\mathbf{I}-{\mathbf{X}}^\dagger\mathbf{X})\right )^T, \end{aligned} $$
    (2.2.11)
    
$$\displaystyle \begin{aligned} \mathrm{d}(\mathbf{XX}^\dagger)=&\, (\mathbf{I}-\mathbf{XX}^\dagger)(\mathrm{d} \mathbf{X}){\mathbf{X}}^\dagger +\left ( (\mathbf{I}- \mathbf{XX}^\dagger)(\mathrm{d} \mathbf{X}){\mathbf{X}}^\dagger\right )^T. \end{aligned} $$
    (2.2.12)
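Rule (2.2.7) for the differential of the inverse matrix can be checked to first order numerically; in the following NumPy sketch (not from the text), the matrix X and the small perturbation dX are illustrative choices.

```python
import numpy as np

# (X + dX)^{-1} - X^{-1} should agree with -X^{-1} (dX) X^{-1}
# up to second-order terms in dX.
rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4)) + 6 * np.eye(4)   # keep X well conditioned
dX = 1e-6 * rng.standard_normal((4, 4))           # small perturbation

lhs = np.linalg.inv(X + dX) - np.linalg.inv(X)
rhs = -np.linalg.inv(X) @ dX @ np.linalg.inv(X)

print(np.allclose(lhs, rhs, atol=1e-9))
```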
     

2.2.2 Jacobian Matrix Identification

In multivariate calculus, the multivariate function 
$$f(x_1^{\,}, \ldots , x_m^{\,} )$$
is said to be differentiable at the point 
$$(x_1^{\,}, \ldots , x_m^{\,})$$
, if a change in 
$$f(x_1^{\,}, \ldots , x_m^{\,} )$$
can be expressed as

$$\displaystyle \begin{aligned} \Delta f(x_1^{\,} ,\ldots ,x_m^{\,} )&=f(x_1^{\,} +\Delta x_1^{\,} ,\ldots ,x_m^{\,} +\Delta x_m^{\,} )-f(x_1^{\,} ,\ldots , x_m^{\,} )\\ &=A_1^{\,} \Delta x_1^{\,} +\cdots +A_m^{\,} \Delta x_m^{\,} +O(\Delta x_1^{\,} ,\ldots ,\Delta x_m^{\,} ), \end{aligned} $$
where 
$$A_1^{\,} ,\ldots ,A_m^{\,}$$
are independent of 
$$\Delta x_1^{\,} ,\ldots , \Delta x_m^{\,}$$
, respectively, and 
$$O(\Delta x_1^{\,} ,$$

$$\ldots , \Delta x_m^{\,} )$$
denotes the second-order and the higher-order terms in 
$$\Delta x_1^{\,} ,\ldots , \Delta x_m^{\,}$$
. In this case, the partial derivative 
$$\partial f/\partial x_1^{\,} ,\ldots , \partial f/\partial x_m^{\,}$$
must exist, and

$$\displaystyle \begin{aligned} \frac{\partial f}{\partial x_1^{\,}}=A_1^{\,} ,\quad  \ldots \quad  ,\quad  \frac {\partial f}{\partial x_m^{\,}}=A_m^{\,} . \end{aligned}$$
The linear part of the change 
$$\Delta f(x_1^{\,} ,\ldots ,x_m^{\,} )$$
,

$$\displaystyle \begin{aligned} A_1^{\,} \Delta x_1^{\,} +\cdots +A_m^{\,} \Delta x_m^{\,} =\frac{\partial f}{\partial x_1^{\,}}\mathrm{d}x_1^{\,} +\cdots + \frac{\partial f}{\partial x_m^{\,}}\mathrm{d}x_m^{\,} , \end{aligned}$$
is said to be the differential or first-order differential of the multivariate function 
$$f(x_1^{\,} ,\ldots ,x_m^{\,} )$$
and is denoted by

$$\displaystyle \begin{aligned} \mathrm{d}f(x_1^{\,} ,\ldots ,x_m^{\,} )=\frac{\partial f}{\partial x_1^{\,}}\,\mathrm{d}x_1^{\,} +\cdots + \frac{\partial f} {\partial x_m^{\,}}\,\mathrm{d}x_m^{\,} .{} \end{aligned} $$
(2.2.13)

The sufficient condition for a multivariate function 
$$f(x_1^{\,} ,\ldots , x_m^{\,} )$$
to be differentiable at the point 
$$(x_1^{\,} ,\ldots ,x_m^{\,} )$$
is that the partial derivatives 
$$\partial f/\partial x_1^{\,},\ldots , \partial f/\partial x_m^{\,}$$
exist and are continuous.

Equation (2.2.13) provides two Jacobian matrix identification methods.
  • For a scalar function f(x) with variable 
$$\mathbf {x}=[x_1^{\,} ,\ldots ,x_m^{\,} ]^T\in \mathbb {R}^m$$
, if regarding the elements 
$$x_1^{\,} ,\ldots ,x_m^{\,}$$
as m variables, and using Eq. (2.2.13), then we can directly obtain the differential of the scalar function f(x) as follows:
$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{x})=\frac{\partial f(\mathbf{x})}{\partial x_1^{\,}}\,\mathrm{d}x_1^{\,} +\cdots +\frac{\partial f(\mathbf{x})}{\partial x_m^{\,}}\,\mathrm{d}x_m^{\,} =\left [\frac{\partial f(\mathbf{x})}{\partial x_1^{\,}},\ldots ,\frac{\partial f(\mathbf{x})}{\partial x_m^{\,}}\right ]\left [\begin{array}{c}\mathrm{d}x_1^{\,}\\ \vdots\\ \mathrm{d}x_m^{\,}\end{array}\right ] \end{aligned} $$
    or
    
$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{x})=\frac{\partial f(\mathbf{x})}{\partial\,{\mathbf{x}}^T}\,\mathrm{d} \mathbf{x},{} \end{aligned} $$
    (2.2.14)
    where 
$$\frac {\partial f(\mathbf {x})}{\partial \,{\mathbf {x}}^T}=\Big [\frac {\partial f(\mathbf {x})}{\partial x_1}, \ldots , \frac {\partial f(\mathbf {x})}{\partial x_m}\Big ]$$
and dx = [dx 1, …, dx m]T. If denoting the row vector 
$$\mathbf {A}=\frac {\partial f(\mathbf {x})}{\partial \, {\mathbf {x}}^T}$$
, then the first-order differential in (2.2.14) can be represented as a trace:
    
$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{x})=\frac{\partial f(\mathbf{x})}{\partial\,{\mathbf{x}}^T}\mathrm{d}\mathbf{x}=\mathbf{A}\mathrm{d}\mathbf{x}=\mathrm{tr}(\mathbf{A}\,\mathrm{d}\mathbf{x}) \end{aligned}$$
    because Adx is a scalar, and for any scalar α we have α = tr(α). This shows that there is an equivalence relationship between the Jacobian matrix of the scalar function f(x) and its matrix differential as follows:
    
$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{x})=\mathrm{tr}(\mathbf{A}\,\mathrm{d} \mathbf{x})\quad  \Leftrightarrow \quad  \mathbf{J}=\frac{\partial f(\mathbf{x})}{\partial\,{\mathbf{x}}^T}=\mathbf{A}. \end{aligned} $$
    (2.2.15)
    In other words, if the differential of the function f(x) is denoted as df(x) = tr(A dx), then the matrix A is just the Jacobian matrix of the function f(x).
  • For a scalar function f(X) with variable 
$$\mathbf {X}=[{\mathbf {x}}_1^{\,} ,\ldots ,{\mathbf {x}}_n^{\,} ]\in \mathbb {R}^{m\times n}$$
, if denoting 
$${\mathbf {x}}_j^{\,} =[ x_{1j}^{\,} ,\ldots ,x_{mj}^{\,} ]^T, j=1,\ldots ,n$$
, then Eq. (2.2.13) becomes
$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{X})&=\frac{\partial f(\mathbf{X})}{\partial x_{11}^{\,}}\,\mathrm{d}x_{11}^{\,} +\cdots +\frac{\partial f(\mathbf{X})}{\partial x_{m1}^{\,}}\,\mathrm{d}x_{m1}^{\,} +\cdots +\frac{\partial f(\mathbf{X})}{\partial x_{1n}^{\,}}\,\mathrm{d}x_{1n}^{\,} +\cdots +\frac{\partial f(\mathbf{X})}{\partial x_{mn}^{\,}}\,\mathrm{d}x_{mn}^{\,}\\ &={\boldsymbol \nabla}_{(\mathrm{vec}\,\mathbf{X})^T}^{\,} f(\mathbf{X})\,\mathrm{d}\,\mathrm{vec}\,\mathbf{X}.{} \end{aligned} $$
    (2.2.16)
    By the relationship between the row partial derivative vector and the Jacobian matrix in Eq. (2.1.7), 
$${\boldsymbol \nabla }_{(\mathrm {vec}\,\mathbf {X})^T}^{\,} f(\mathbf {X})=\left (\mathrm {vec}({\mathbf {J}}^T)\right )^T$$
, Eq. (2.2.16) can be written as
    
$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{X})=(\mathrm{vec}\,{\mathbf{A}}^T)^T\mathrm{d}(\mathrm{vec}\,\mathbf{X}),{} \end{aligned} $$
    (2.2.17)
    where
$$\displaystyle \begin{aligned} \mathbf{A}=\mathbf{J}=\left [\begin{array}{ccc} \dfrac{\partial f(\mathbf{X})}{\partial x_{11}^{\,}} &\cdots &\dfrac{\partial f(\mathbf{X})}{\partial x_{m1}^{\,}}\\ \vdots &\ddots &\vdots\\ \dfrac{\partial f(\mathbf{X})}{\partial x_{1n}^{\,}} &\cdots &\dfrac{\partial f(\mathbf{X})}{\partial x_{mn}^{\,}}\end{array}\right ]\in\mathbb{R}^{n\times m} \end{aligned} $$
    (2.2.18)
    is the Jacobian matrix of the scalar function f(X). Using the relationship between the vectorization operator vec and the trace function tr(B TC) = (vec B)Tvec C, and letting B = A T and C = dX, then Eq. (2.2.17) can be expressed in the trace form as
    
$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{X})=\mathrm{tr}(\mathbf{A} \mathrm{d} \mathbf{X}).{} \end{aligned} $$
    (2.2.19)
    This can be regarded as the canonical form of the differential of a scalar function f(X).

The above discussion shows that once the matrix differential of a scalar function df(X) is expressed in its canonical form, we can identify the Jacobian matrix and/or the gradient matrix of the scalar function f(X), as stated below.

Proposition 2.1
If a scalar function f(X) is differentiable at the point X, then its Jacobian matrix A can be directly identified as follows [6]:

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{x})=\mathrm{tr}(\mathbf{A}\mathrm{d} \mathbf{x})\quad  &\Leftrightarrow\quad  \mathbf{J}=\mathbf{A}, \end{aligned} $$
(2.2.20)

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{X})=\mathrm{tr}(\mathbf{A}\mathrm{d} \mathbf{X})\quad  &\Leftrightarrow\quad  \mathbf{J}=\mathbf{A}.{} \end{aligned} $$
(2.2.21)
Proposition 2.1 motivates the following effective approach to directly identifying the Jacobian matrix J of a scalar function f(X):
  1. 1.

    Find the differential df(X) of the real function f(X), and denote it in the canonical form as df(X) = tr(A dX).

     
  2. 2.

    The Jacobian matrix is directly given by A.

     
The following are two main points in applying Proposition 2.1:
  • Any scalar function f(X) can always be written in the form of a trace function, because f(X) = tr(f(X)).

  • No matter where dX appears initially in the trace function, we can place it in the rightmost position via the trace property tr(C(dX)B) = tr(BC dX), giving the canonical form df(X) = tr(A dX).

It has been shown [6] that the Jacobian matrix A is uniquely determined: if there are A_1 and A_2 such that df(X) = tr(A_1 dX) = tr(A_2 dX), then A_1 = A_2.

Since the gradient matrix is the transpose of the Jacobian matrix for a given real function f(X), Proposition 2.1 implies in addition that

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{X})=\mathrm{tr}(\mathbf{A} \mathrm{d} \mathbf{X})\ \Leftrightarrow\ \nabla_{\mathbf{X}}^{\,} f( \mathbf{X}) ={\mathbf{A}}^T. \end{aligned} $$
(2.2.22)
Because the Jacobian matrix A is uniquely determined, the gradient matrix is uniquely determined as well.
Jacobian Matrices of Trace Functions

Example 2.3

The differential of the trace tr(X TAX) is given by

$$\displaystyle \begin{aligned} \mathrm{d}\,\mathrm{tr}({\mathbf{X}}^T\mathbf{AX})&=\mathrm{tr} \left (\mathrm{d}({\mathbf{X}}^T\mathbf{AX})\right )=\mathrm{tr} \left ( (\mathrm{d} \mathbf{X})^T\mathbf{AX}+ {\mathbf{X}}^T\mathbf{A}\mathrm{d} \mathbf{X}\right )\\ &=\mathrm{tr} \left ((\mathrm{d} \mathbf{X})^T\mathbf{AX}\right )+ \mathrm{tr}({\mathbf{X}}^T\mathbf{A}\mathrm{d} \mathbf{X})=\mathrm{tr} \left ( (\mathbf{AX})^T\mathrm{d} \mathbf{X}\right )+\mathrm{tr}({\mathbf{X}}^T \mathbf{A}\mathrm{d} \mathbf{X})\\ &=\mathrm{tr} \left ( {\mathbf{X}}^T ({\mathbf{A}}^T+ \mathbf{A})\mathrm{d} \mathbf{X} \right ), \end{aligned} $$
which yields the gradient matrix

$$\displaystyle \begin{aligned} \frac{\partial\, \mathrm{tr}({\mathbf{X}}^T\mathbf{AX})}{\partial\,\mathbf{X}}=\left ({\mathbf{X}}^T ({\mathbf{A}}^T+ \mathbf{A})\right )^T=( \mathbf{A}+{\mathbf{A}}^T)\mathbf{X}. \end{aligned} $$
(2.2.23)
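The gradient (2.2.23) identified via the matrix differential can be double-checked numerically; the following sketch (not from the text, with random illustrative data) compares (A + Aᵀ)X against central finite differences.

```python
import numpy as np

# Check that the gradient of tr(X^T A X) is (A + A^T) X.
rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 3))

grad = (A + A.T) @ X                       # closed form (2.2.23)

f = lambda Y: np.trace(Y.T @ A @ Y)
eps, num = 1e-6, np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X); E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-6))
```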

Similarly, we can compute the differential matrices and Jacobian matrices of other typical trace functions.

Table 2.2 summarizes the differential matrices and Jacobian matrices of several typical trace functions [6].
Table 2.2

Differential matrices and Jacobian matrices of trace functions

| $f(\mathbf{X})$ | Differential $\mathrm{d}f(\mathbf{X})$ | Jacobian matrix $\mathbf{J}=\partial f(\mathbf{X})/\partial\,\mathbf{X}^T$ |
|---|---|---|
| $\mathrm{tr}(\mathbf{X})$ | $\mathrm{tr}(\mathbf{I}\,\mathrm{d}\mathbf{X})$ | $\mathbf{I}$ |
| $\mathrm{tr}(\mathbf{X}^{-1})$ | $-\mathrm{tr}(\mathbf{X}^{-2}\mathrm{d}\mathbf{X})$ | $-\mathbf{X}^{-2}$ |
| $\mathrm{tr}(\mathbf{AX})$ | $\mathrm{tr}(\mathbf{A}\,\mathrm{d}\mathbf{X})$ | $\mathbf{A}$ |
| $\mathrm{tr}(\mathbf{X}^2)$ | $2\,\mathrm{tr}(\mathbf{X}\,\mathrm{d}\mathbf{X})$ | $2\mathbf{X}$ |
| $\mathrm{tr}(\mathbf{X}^T\mathbf{X})$ | $2\,\mathrm{tr}(\mathbf{X}^T\mathrm{d}\mathbf{X})$ | $2\mathbf{X}^T$ |
| $\mathrm{tr}(\mathbf{X}^T\mathbf{AX})$ | $\mathrm{tr}\left(\mathbf{X}^T(\mathbf{A}+\mathbf{A}^T)\mathrm{d}\mathbf{X}\right)$ | $\mathbf{X}^T(\mathbf{A}+\mathbf{A}^T)$ |
| $\mathrm{tr}(\mathbf{XAX}^T)$ | $\mathrm{tr}\left((\mathbf{A}+\mathbf{A}^T)\mathbf{X}^T\mathrm{d}\mathbf{X}\right)$ | $(\mathbf{A}+\mathbf{A}^T)\mathbf{X}^T$ |
| $\mathrm{tr}(\mathbf{XAX})$ | $\mathrm{tr}\left((\mathbf{AX}+\mathbf{XA})\mathrm{d}\mathbf{X}\right)$ | $\mathbf{AX}+\mathbf{XA}$ |
| $\mathrm{tr}(\mathbf{AX}^{-1})$ | $-\mathrm{tr}\left(\mathbf{X}^{-1}\mathbf{AX}^{-1}\mathrm{d}\mathbf{X}\right)$ | $-\mathbf{X}^{-1}\mathbf{AX}^{-1}$ |
| $\mathrm{tr}(\mathbf{AX}^{-1}\mathbf{B})$ | $-\mathrm{tr}\left(\mathbf{X}^{-1}\mathbf{BAX}^{-1}\mathrm{d}\mathbf{X}\right)$ | $-\mathbf{X}^{-1}\mathbf{BAX}^{-1}$ |
| $\mathrm{tr}\left((\mathbf{X}+\mathbf{A})^{-1}\right)$ | $-\mathrm{tr}\left((\mathbf{X}+\mathbf{A})^{-2}\mathrm{d}\mathbf{X}\right)$ | $-(\mathbf{X}+\mathbf{A})^{-2}$ |
| $\mathrm{tr}(\mathbf{XAXB})$ | $\mathrm{tr}\left((\mathbf{AXB}+\mathbf{BXA})\mathrm{d}\mathbf{X}\right)$ | $\mathbf{AXB}+\mathbf{BXA}$ |
| $\mathrm{tr}(\mathbf{XAX}^T\mathbf{B})$ | $\mathrm{tr}\left((\mathbf{AX}^T\mathbf{B}+\mathbf{A}^T\mathbf{X}^T\mathbf{B}^T)\mathrm{d}\mathbf{X}\right)$ | $\mathbf{AX}^T\mathbf{B}+\mathbf{A}^T\mathbf{X}^T\mathbf{B}^T$ |
| $\mathrm{tr}(\mathbf{AXX}^T\mathbf{B})$ | $\mathrm{tr}\left(\mathbf{X}^T(\mathbf{BA}+\mathbf{A}^T\mathbf{B}^T)\mathrm{d}\mathbf{X}\right)$ | $\mathbf{X}^T(\mathbf{BA}+\mathbf{A}^T\mathbf{B}^T)$ |
| $\mathrm{tr}(\mathbf{AX}^T\mathbf{XB})$ | $\mathrm{tr}\left((\mathbf{BA}+\mathbf{A}^T\mathbf{B}^T)\mathbf{X}^T\mathrm{d}\mathbf{X}\right)$ | $(\mathbf{BA}+\mathbf{A}^T\mathbf{B}^T)\mathbf{X}^T$ |

Here $\mathbf{A}^{-2}=\mathbf{A}^{-1}\mathbf{A}^{-1}$
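As a numerical sanity check of one of the less obvious entries, the sketch below (an illustration with random matrices, not part of the text) verifies that the differential of $\mathrm{tr}(\mathbf{AX}^{-1})$ yields the gradient $-(\mathbf{X}^{-1}\mathbf{AX}^{-1})^T$, including the minus sign:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned, invertible

f = lambda X: np.trace(A @ np.linalg.inv(X))

# Jacobian from Table 2.2: J = -X^{-1} A X^{-1}; the gradient is J^T
Xi = np.linalg.inv(X)
grad = -(Xi @ A @ Xi).T

eps = 1e-6
num = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(X); E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-5)
```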

Jacobian Matrices of Determinant Functions

Consider the Jacobian matrix identification of typical determinant functions.

Example 2.4

For the nonsingular matrix XX T, we have

$$\displaystyle \begin{aligned} \mathrm{d}|\mathbf{XX}^T|&=|\mathbf{XX}^T|\,\mathrm{tr} \left ((\mathbf{XX}^T)^{-1}\mathrm{d}(\mathbf{XX}^T)\right )\\ &=|\mathbf{XX}^T|\left (\mathrm{tr} \left ((\mathbf{XX}^T)^{-1}(\mathrm{d} \mathbf{X}){\mathbf{X}}^T\right )+\mathrm{tr} \left ( (\mathbf{XX}^T)^{-1}\mathbf{X}(\mathrm{d} \mathbf{X})^T\right )\right )\\ &=|\mathbf{XX}^T|\left ( \mathrm{tr} \left ({\mathbf{X}}^T (\mathbf{XX}^T)^{-1}\mathrm{d} \mathbf{X}\right ) +\mathrm{tr} \left ( {\mathbf{X}}^T( \mathbf{XX}^T)^{-1}\mathrm{d} \mathbf{X} \right )\right )\\ &=\mathrm{tr} \left (2 |\mathbf{XX}^T|\,{\mathbf{X}}^T (\mathbf{XX}^T)^{-1}\mathrm{d} \mathbf{X}\right ).\end{aligned} $$
By Proposition 2.1, we get the gradient matrix

$$\displaystyle \begin{aligned} \frac{\partial |\mathbf{XX}^T|}{\partial\,\mathbf{X}}=2|\mathbf{XX}^T|\,(\mathbf{XX}^T)^{-1}\mathbf{X}.\end{aligned} $$
(2.2.24)
Similarly, let $\mathbf {X}\in \mathbb {R}^{m\times n}$. If $\mathrm{rank}(\mathbf{X})=n$, i.e., $\mathbf{X}^T\mathbf{X}$ is invertible, then

$$\displaystyle \begin{aligned} \mathrm{d}|{\mathbf{X}}^T\mathbf{X}|=\mathrm{tr} \left ( 2|{\mathbf{X}}^T\mathbf{X}|({\mathbf{X}}^T\mathbf{X})^{-1}{\mathbf{X}}^T\mathrm{d} \mathbf{X} \right ),\end{aligned} $$
(2.2.25)
and hence $\partial\,\vert{\mathbf{X}}^T\mathbf{X}\vert /\partial\,\mathbf{X}=2\vert{\mathbf{X}}^T\mathbf{X}\vert\,\mathbf{X}({\mathbf{X}}^T\mathbf{X})^{-1}$.
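Equation (2.2.25) can likewise be verified by finite differences; the following sketch (illustrative only, with a random full-column-rank X of arbitrary size 5 × 3) checks the gradient $2\vert\mathbf{X}^T\mathbf{X}\vert\,\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
X = rng.standard_normal((m, n))   # full column rank with probability 1

f = lambda X: np.linalg.det(X.T @ X)

# Closed-form gradient implied by Eq. (2.2.25)
grad = 2 * f(X) * X @ np.linalg.inv(X.T @ X)

eps = 1e-6
num = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X); E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

assert np.allclose(grad, num, rtol=1e-4, atol=1e-6)
```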

Similarly, we can compute the differential matrices and Jacobian matrices of other typical determinant functions.

Table 2.3 summarizes the real matrix differentials and the Jacobian matrices of several typical determinant functions.
Table 2.3

Differentials and Jacobian matrices of determinant functions

| $f(\mathbf{X})$ | Differential $\mathrm{d}f(\mathbf{X})$ | Jacobian matrix $\mathbf{J}=\partial f(\mathbf{X})/\partial\,\mathbf{X}^T$ |
|---|---|---|
| $\vert\mathbf{X}\vert$ | $\vert\mathbf{X}\vert\,\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X})$ | $\vert\mathbf{X}\vert\,\mathbf{X}^{-1}$ |
| $\log \vert\mathbf{X}\vert$ | $\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X})$ | $\mathbf{X}^{-1}$ |
| $\vert\mathbf{X}^{-1}\vert$ | $-\vert\mathbf{X}^{-1}\vert\,\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X})$ | $-\vert\mathbf{X}^{-1}\vert\,\mathbf{X}^{-1}$ |
| $\vert\mathbf{X}^2\vert$ | $2\vert\mathbf{X}\vert^2\,\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X})$ | $2\vert\mathbf{X}\vert^2\mathbf{X}^{-1}$ |
| $\vert\mathbf{X}^k\vert$ | $k\vert\mathbf{X}\vert^k\,\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X})$ | $k\vert\mathbf{X}\vert^k\mathbf{X}^{-1}$ |
| $\vert\mathbf{XX}^T\vert$ | $2\vert\mathbf{XX}^T\vert\,\mathrm{tr}\left(\mathbf{X}^T(\mathbf{XX}^T)^{-1}\mathrm{d}\mathbf{X}\right)$ | $2\vert\mathbf{XX}^T\vert\,\mathbf{X}^T(\mathbf{XX}^T)^{-1}$ |
| $\vert\mathbf{X}^T\mathbf{X}\vert$ | $2\vert\mathbf{X}^T\mathbf{X}\vert\,\mathrm{tr}\left((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathrm{d}\mathbf{X}\right)$ | $2\vert\mathbf{X}^T\mathbf{X}\vert\,(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ |
| $\log \vert\mathbf{X}^T\mathbf{X}\vert$ | $2\,\mathrm{tr}\left((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathrm{d}\mathbf{X}\right)$ | $2(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ |
| $\vert\mathbf{AXB}\vert$ | $\vert\mathbf{AXB}\vert\,\mathrm{tr}\left(\mathbf{B}(\mathbf{AXB})^{-1}\mathbf{A}\,\mathrm{d}\mathbf{X}\right)$ | $\vert\mathbf{AXB}\vert\,\mathbf{B}(\mathbf{AXB})^{-1}\mathbf{A}$ |
| $\vert\mathbf{XAX}^T\vert$ | $\vert\mathbf{XAX}^T\vert\,\mathrm{tr}\left(\left(\mathbf{AX}^T(\mathbf{XAX}^T)^{-1}+(\mathbf{XA})^T(\mathbf{XA}^T\mathbf{X}^T)^{-1}\right)\mathrm{d}\mathbf{X}\right)$ | $\vert\mathbf{XAX}^T\vert\left(\mathbf{AX}^T(\mathbf{XAX}^T)^{-1}+(\mathbf{XA})^T(\mathbf{XA}^T\mathbf{X}^T)^{-1}\right)$ |
| $\vert\mathbf{X}^T\mathbf{AX}\vert$ | $\vert\mathbf{X}^T\mathbf{AX}\vert\,\mathrm{tr}\left(\left((\mathbf{X}^T\mathbf{AX})^{-T}(\mathbf{AX})^T+(\mathbf{X}^T\mathbf{AX})^{-1}\mathbf{X}^T\mathbf{A}\right)\mathrm{d}\mathbf{X}\right)$ | $\vert\mathbf{X}^T\mathbf{AX}\vert\left((\mathbf{X}^T\mathbf{AX})^{-T}(\mathbf{AX})^T+(\mathbf{X}^T\mathbf{AX})^{-1}\mathbf{X}^T\mathbf{A}\right)$ |

2.2.3 Jacobian Matrix of Real Matrix Functions

Let 
$$f_{kl}=f_{kl}^{\,} (\mathbf {X})$$
be the entry of the kth row and lth column of the real matrix function F(X); then 
$$\mathrm {d}f_{kl}(\mathbf {X})=[\mathrm {d} \mathbf {F}(\mathbf {X})]_{kl}^{\,}$$
represents the differential of the scalar function 
$$f_{kl}^{\,} (\mathbf {X})$$
with respect to the variable matrix X. From Eq. (2.2.16) we have
$$\displaystyle \begin{aligned} \mathrm{d}f_{kl}(\mathbf{X})=\sum_{i=1}^m\sum_{j=1}^n\frac{\partial f_{kl}(\mathbf{X})}{\partial x_{ij}}\,\mathrm{d}x_{ij},\qquad k=1,\ldots ,p;\ l=1,\ldots ,q, \end{aligned} $$
or, collecting all pq entries,

$$\displaystyle \begin{aligned} \mathrm{d} \mathrm{vec} \mathbf{F}(\mathbf{X})=\mathbf{A} \mathrm{d} \mathrm{vec} \mathbf{X},{} \end{aligned} $$
(2.2.26)
where

$$\displaystyle \begin{aligned} \mathrm{d} \mathrm{vec} \mathbf{F}(\mathbf{X})&= [\mathrm{d}f_{11}(\mathbf{X}),\ldots ,\mathrm{d}f_{p1}(\mathbf{X}),\ldots ,\mathrm{d}f_{1q} (\mathbf{X}),\ldots ,\mathrm{d}f_{pq}(\mathbf{X})]^T, \end{aligned} $$
(2.2.27)

$$\displaystyle \begin{aligned} \mathrm{d}\,\mathrm{vec} \mathbf{X}&=[\mathrm{d}x_{11}^{\,} ,\ldots ,\mathrm{d}x_{m1}^{\,} ,\ldots ,\mathrm{d}x_{1n}^{\,} ,\ldots ,\mathrm{d}x_{mn}^{\,} ]^T, \end{aligned} $$
(2.2.28)
$$\displaystyle \begin{aligned} \mathbf{A}=\begin{bmatrix} \dfrac{\partial f_{11}(\mathbf{X})}{\partial x_{11}} &\cdots &\dfrac{\partial f_{11}(\mathbf{X})}{\partial x_{m1}} &\cdots &\dfrac{\partial f_{11}(\mathbf{X})}{\partial x_{mn}}\\ \vdots & &\vdots & &\vdots\\ \dfrac{\partial f_{pq}(\mathbf{X})}{\partial x_{11}} &\cdots &\dfrac{\partial f_{pq}(\mathbf{X})}{\partial x_{m1}} &\cdots &\dfrac{\partial f_{pq}(\mathbf{X})}{\partial x_{mn}} \end{bmatrix}\in\mathbb{R}^{pq\times mn}, \end{aligned} $$
with rows ordered as in (2.2.27) and columns ordered as in (2.2.28).
(2.2.29)
In other words, the matrix A is the Jacobian matrix 
$$\mathbf {J}=\frac {\partial \,\mathrm {vec}\mathbf {F}(\mathbf {X})} {\partial (\mathrm {vec}\,\mathbf {X})^T}$$
of the matrix function F(X).

Let 
$$\mathbf {F}( \mathbf {X})\in \mathbb {R}^{p\times q}$$
be a matrix function including X and X T as variables, where 
$$\mathbf {X}\in \mathbb {R}^{m\times n}$$
.

Theorem 2.1 ([6])
Given a matrix function 
$$\mathbf {F}(\mathbf {X}):\mathbb {R}^{m\times n}\to \mathbb {R}^{p\times q}$$
, its pq × mn Jacobian matrix can be identified as follows:

$$\displaystyle \begin{aligned} \mathrm{d}\mathbf{F}(\mathbf{X})&=\mathbf{A}(\mathrm{d} \mathbf{X})\mathbf{B}+\mathbf{C}(\mathrm{d} {\mathbf{X}}^T)\mathbf{D},\\ \Leftrightarrow~ \mathbf{J}&=\frac{\partial\,\mathrm{vec}\,\mathbf{F}(\mathbf{X})} {\partial (\mathrm{vec}\,\mathbf{X})^T}=({\mathbf{B}}^T\otimes\mathbf{A})+({\mathbf{D}}^T\otimes\mathbf{C}){\mathbf{K}}_{mn}, \end{aligned} $$
(2.2.30)
and the mn × pq gradient matrix can be determined from

$$\displaystyle \begin{aligned} \nabla_{\mathbf{X}}^{\,} \mathbf{F}(\mathbf{X})=\frac{\partial (\mathrm{vec}\,\mathbf{F}(\mathbf{X}))^T}{\partial\, \mathrm{vec}\, \mathbf{X}}=(\mathbf{B}\otimes{\mathbf{A}}^T)+{\mathbf{K}}_{nm}(\mathbf{D}\otimes{\mathbf{C}}^T). \end{aligned} $$
(2.2.31)
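Theorem 2.1 can be confirmed numerically for the bilinear function F(X) = AXB + CX^TD (an illustrative choice, since then dF(X) = A(dX)B + C(dX^T)D exactly). The sketch below builds the commutation matrix K_mn explicitly and compares (B^T ⊗ A) + (D^T ⊗ C)K_mn with the Jacobian obtained from differences; column-major reshaping plays the role of vec:

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, n, q = 2, 3, 4, 2
A = rng.standard_normal((p, m)); B = rng.standard_normal((n, q))
C = rng.standard_normal((p, n)); D = rng.standard_normal((m, q))

F = lambda X: A @ X @ B + C @ X.T @ D          # dF = A(dX)B + C(dX^T)D
vec = lambda M: M.reshape(-1, order='F')       # column-major vectorization

# Commutation matrix K_mn: K @ vec(X) = vec(X^T) for X of size m x n
K = np.zeros((m * n, m * n))
for i in range(m):
    for j in range(n):
        K[j + i * n, i + j * m] = 1.0

# Jacobian from Theorem 2.1 (Eq. 2.2.30)
J = np.kron(B.T, A) + np.kron(D.T, C) @ K

# F is linear in X here, so exact differences recover the Jacobian
X = rng.standard_normal((m, n))
num = np.zeros((p * q, m * n))
for k in range(m * n):
    E = np.zeros(m * n); E[k] = 1.0
    num[:, k] = vec(F(X + E.reshape(m, n, order='F'))) - vec(F(X))

assert np.allclose(J, num)
```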
Table 2.4 summarizes the matrix differentials and Jacobian matrices of some real functions.
Table 2.4

Matrix differentials and Jacobian matrices of real functions

| Functions | Matrix differential | Jacobian matrix |
|---|---|---|
| $f(x):~\mathbb{R}\rightarrow\mathbb{R}$ | $\mathrm{d}f(x)=A\,\mathrm{d}x$ | $A\in\mathbb{R}$ |
| $f(\mathbf{x}):~\mathbb{R}^m\rightarrow\mathbb{R}$ | $\mathrm{d}f(\mathbf{x})=\mathbf{A}\,\mathrm{d}\mathbf{x}$ | $\mathbf{A}\in\mathbb{R}^{1\times m}$ |
| $f(\mathbf{X}):~\mathbb{R}^{m\times n}\rightarrow\mathbb{R}$ | $\mathrm{d}f(\mathbf{X})=\mathrm{tr}(\mathbf{A}\,\mathrm{d}\mathbf{X})$ | $\mathbf{A}\in\mathbb{R}^{n\times m}$ |
| $\mathbf{f}(\mathbf{x}):~\mathbb{R}^m\rightarrow\mathbb{R}^p$ | $\mathrm{d}\mathbf{f}(\mathbf{x})=\mathbf{A}\,\mathrm{d}\mathbf{x}$ | $\mathbf{A}\in\mathbb{R}^{p\times m}$ |
| $\mathbf{f}(\mathbf{X}):~\mathbb{R}^{m\times n}\rightarrow\mathbb{R}^p$ | $\mathrm{d}\mathbf{f}(\mathbf{X})=\mathbf{A}\,\mathrm{d}(\mathrm{vec}\,\mathbf{X})$ | $\mathbf{A}\in\mathbb{R}^{p\times mn}$ |
| $\mathbf{F}(\mathbf{x}):~\mathbb{R}^m\rightarrow\mathbb{R}^{p\times q}$ | $\mathrm{d}\,\mathrm{vec}\,\mathbf{F}(\mathbf{x})=\mathbf{A}\,\mathrm{d}\mathbf{x}$ | $\mathbf{A}\in\mathbb{R}^{pq\times m}$ |
| $\mathbf{F}(\mathbf{X}):~\mathbb{R}^{m\times n}\rightarrow\mathbb{R}^{p\times q}$ | $\mathrm{d}\mathbf{F}(\mathbf{X})=\mathbf{A}(\mathrm{d}\mathbf{X})\mathbf{B}+\mathbf{C}(\mathrm{d}\mathbf{X}^T)\mathbf{D}$ | $(\mathbf{B}^T\otimes\mathbf{A})+(\mathbf{D}^T\otimes\mathbf{C})\mathbf{K}_{mn}\in\mathbb{R}^{pq\times mn}$ |

Table 2.5 lists some matrix functions and their Jacobian matrices.
Table 2.5

Differentials and Jacobian matrices of matrix functions

| $\mathbf{F}(\mathbf{X})$ | $\mathrm{d}\mathbf{F}(\mathbf{X})$ | Jacobian matrix |
|---|---|---|
| $\mathbf{X}^T\mathbf{X}$ | $\mathbf{X}^T\mathrm{d}\mathbf{X}+(\mathrm{d}\mathbf{X}^T)\mathbf{X}$ | $(\mathbf{I}_n\otimes\mathbf{X}^T)+(\mathbf{X}^T\otimes\mathbf{I}_n)\mathbf{K}_{mn}$ |
| $\mathbf{XX}^T$ | $\mathbf{X}(\mathrm{d}\mathbf{X}^T)+(\mathrm{d}\mathbf{X})\mathbf{X}^T$ | $(\mathbf{I}_m\otimes\mathbf{X})\mathbf{K}_{mn}+(\mathbf{X}\otimes\mathbf{I}_m)$ |
| $\mathbf{AX}^T\mathbf{B}$ | $\mathbf{A}(\mathrm{d}\mathbf{X}^T)\mathbf{B}$ | $(\mathbf{B}^T\otimes\mathbf{A})\mathbf{K}_{mn}$ |
| $\mathbf{X}^T\mathbf{BX}$ | $\mathbf{X}^T\mathbf{B}\,\mathrm{d}\mathbf{X}+(\mathrm{d}\mathbf{X}^T)\mathbf{BX}$ | $\mathbf{I}\otimes(\mathbf{X}^T\mathbf{B})+((\mathbf{BX})^T\otimes\mathbf{I})\mathbf{K}_{mn}$ |
| $\mathbf{AX}^T\mathbf{BXC}$ | $\mathbf{A}(\mathrm{d}\mathbf{X}^T)\mathbf{BXC}+\mathbf{AX}^T\mathbf{B}(\mathrm{d}\mathbf{X})\mathbf{C}$ | $((\mathbf{BXC})^T\otimes\mathbf{A})\mathbf{K}_{mn}+\mathbf{C}^T\otimes(\mathbf{AX}^T\mathbf{B})$ |
| $\mathbf{AXBX}^T\mathbf{C}$ | $\mathbf{A}(\mathrm{d}\mathbf{X})\mathbf{BX}^T\mathbf{C}+\mathbf{AXB}(\mathrm{d}\mathbf{X}^T)\mathbf{C}$ | $(\mathbf{BX}^T\mathbf{C})^T\otimes\mathbf{A}+(\mathbf{C}^T\otimes(\mathbf{AXB}))\mathbf{K}_{mn}$ |
| $\mathbf{X}^{-1}$ | $-\mathbf{X}^{-1}(\mathrm{d}\mathbf{X})\mathbf{X}^{-1}$ | $-(\mathbf{X}^{-T}\otimes\mathbf{X}^{-1})$ |
| $\mathbf{X}^k$ | $\sum_{j=1}^k\mathbf{X}^{j-1}(\mathrm{d}\mathbf{X})\mathbf{X}^{k-j}$ | $\sum_{j=1}^k(\mathbf{X}^T)^{k-j}\otimes\mathbf{X}^{j-1}$ |
| $\log\mathbf{X}$ | $\mathbf{X}^{-1}\mathrm{d}\mathbf{X}$ | $\mathbf{I}\otimes\mathbf{X}^{-1}$ |
| $\exp(\mathbf{X})$ | $\sum_{k=0}^\infty\frac{1}{(k+1)!}\sum_{j=0}^k\mathbf{X}^j(\mathrm{d}\mathbf{X})\mathbf{X}^{k-j}$ | $\sum_{k=0}^\infty\frac{1}{(k+1)!}\sum_{j=0}^k(\mathbf{X}^T)^{k-j}\otimes\mathbf{X}^j$ |
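The entry for $\mathbf{X}^{-1}$ in Table 2.5, including its minus sign, can be checked numerically; the following sketch (illustrative, with a well-conditioned random X) compares $-(\mathbf{X}^{-T}\otimes\mathbf{X}^{-1})$ with a central-difference Jacobian of $\mathrm{vec}(\mathbf{X}^{-1})$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
X = rng.standard_normal((n, n)) + n * np.eye(n)   # safely invertible
vec = lambda M: M.reshape(-1, order='F')
inv = np.linalg.inv

# Jacobian of F(X) = X^{-1} from Table 2.5: -(X^{-T} kron X^{-1})
J = -np.kron(inv(X).T, inv(X))

eps = 1e-6
num = np.zeros((n * n, n * n))
for k in range(n * n):
    dX = np.zeros(n * n); dX[k] = eps
    dX = dX.reshape(n, n, order='F')
    num[:, k] = (vec(inv(X + dX)) - vec(inv(X - dX))) / (2 * eps)

assert np.allclose(J, num, atol=1e-5)
```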

Example 2.5
Let F(X, Y) = X ⊗Y be the Kronecker product of two matrices 
$$\mathbf {X}\in \mathbb {R}^{p\times m}$$
and 
$$\mathbf {Y}\in \mathbb {R}^{n\times q}$$
. Consider the matrix differential dF(X, Y) = (dX) ⊗Y + X ⊗ (dY). By the vectorization formula vec(X ⊗Y) = (I m ⊗K qp ⊗I n)(vec X ⊗vec Y), we have

$$\displaystyle \begin{aligned} \mathrm{vec}(\mathrm{d} \mathbf{X}\otimes\mathbf{Y})&=({\mathbf{I}}_m\otimes{\mathbf{K}}_{qp}\otimes{\mathbf{I}}_n)(\mathrm{d}\,\mathrm{vec} \mathbf{X}\otimes \mathrm{vec}\,\mathbf{Y})\\ &=({\mathbf{I}}_m\otimes{\mathbf{K}}_{qp}\otimes{\mathbf{I}}_n)({\mathbf{I}}_{pm} \otimes \mathrm{vec}\,\mathbf{Y})\mathrm{d}\,\mathrm{vec} \mathbf{X}, \end{aligned} $$
(2.2.32)

$$\displaystyle \begin{aligned} \mathrm{vec}(\mathbf{X}\otimes\mathrm{d}\mathbf{Y})&=({\mathbf{I}}_m \otimes{\mathbf{K}}_{qp}\otimes{\mathbf{I}}_n)(\mathrm{vec}\,\mathbf{X}\otimes \mathrm{d}\,\mathrm{vec} \mathbf{Y})\\ &=({\mathbf{I}}_m\otimes{\mathbf{K}}_{qp}\otimes{\mathbf{I}}_n)(\mathrm{vec}\,\mathbf{X}\otimes {\mathbf{I}}_{nq})\mathrm{d}\,\mathrm{vec} \mathbf{Y}. \end{aligned} $$
(2.2.33)
Hence, the Jacobian matrices with respect to the variable matrices X and Y are, respectively, given by

$$\displaystyle \begin{aligned} {\mathbf{J}}_{\mathbf{X}}^{\,} (\mathbf{X}\otimes\mathbf{Y})&=({\mathbf{I}}_m\otimes{\mathbf{K}}_{qp}\otimes{\mathbf{I}}_n)({\mathbf{I}}_{pm}\otimes \mathrm{vec}\,\mathbf{Y}), \end{aligned} $$
(2.2.34)

$$\displaystyle \begin{aligned} {\mathbf{J}}_{\mathbf{Y}}^{\,} (\mathbf{X}\otimes\mathbf{Y})&=({\mathbf{I}}_m\otimes{\mathbf{K}}_{qp}\otimes{\mathbf{I}}_n)(\mathrm{vec}\,\mathbf{X}\otimes {\mathbf{I}}_{nq}). \end{aligned} $$
(2.2.35)
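The vectorization formula used in Example 2.5 can be verified directly. In the sketch below (illustrative dimensions; the helper `K(a, b)` for the commutation matrix is ours), the identity $\mathrm{vec}(\mathbf{X}\otimes\mathbf{Y})=(\mathbf{I}_m\otimes\mathbf{K}_{qp}\otimes\mathbf{I}_n)(\mathrm{vec}\,\mathbf{X}\otimes\mathrm{vec}\,\mathbf{Y})$ is checked numerically:

```python
import numpy as np

def K(a, b):
    # Commutation matrix: K(a, b) @ vec(M) = vec(M.T) for M of size a x b
    Kab = np.zeros((a * b, a * b))
    for i in range(a):
        for j in range(b):
            Kab[j + i * b, i + j * a] = 1.0
    return Kab

rng = np.random.default_rng(5)
p, m, n, q = 2, 3, 2, 2
X = rng.standard_normal((p, m)); Y = rng.standard_normal((n, q))
vec = lambda M: M.reshape(-1, order='F')

lhs = vec(np.kron(X, Y))
rhs = np.kron(np.kron(np.eye(m), K(q, p)), np.eye(n)) @ np.kron(vec(X), vec(Y))
assert np.allclose(lhs, rhs)
```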

The analysis and examples in this section show that the first-order real matrix differential is indeed an effective mathematical tool for identifying the Jacobian matrix and the gradient matrix of a real function; moreover, this tool is simple and easy to apply.

2.3 Complex Gradient Matrices

In many engineering applications, observed data are usually complex. In these cases, the objective function of an optimization problem is a real-valued function of a complex vector or matrix. Hence, the gradient of the real objective function with respect to the complex vector or matrix variable is a complex vector or complex matrix. This complex gradient has the following two forms:
  • Complex gradient: the gradient of the objective function with respect to the complex vector or matrix variable itself;

  • Conjugate gradient: the gradient of the objective function with respect to the complex conjugate vector or matrix variable.

2.3.1 Holomorphic Function and Complex Partial Derivative

Before discussing the complex gradient and conjugate gradient, it is necessary to recall the relevant facts about complex functions.

Definition 2.6 (Complex Analytic Function [4])

Let 
$$D\subseteq \mathbb {C}$$
be the definition domain of the function 
$$f:D\rightarrow \mathbb {C}$$
. The function f(z) with complex variable z is said to be a complex analytic function in the domain D if f(z) is complex differentiable, namely 
$$\displaystyle  \lim _{\Delta z\rightarrow 0}\frac {f(z+\Delta z)-f(z)}{\Delta z}$$
exists for all z ∈ D.

In the standard framework of complex functions, a complex function f(z) (where z = x + jy) is written in the real coordinates $r\stackrel {\mathrm {def}}{=}(x,y)$ as f(r) = f(x, y).

The terminology “complex analytic function” is commonly replaced by the completely synonymous term “holomorphic function.” Note that a complex function which is (real) analytic in the real variables x and y is not necessarily holomorphic in the complex variable z = x + jy, i.e., it may be complex nonanalytic.

A complex function f(z) can always be expressed in terms of its real part u(x, y) and imaginary part v(x, y) as

$$\displaystyle \begin{aligned} f(z)=u(x,y)+\mathrm{j}v(x,y), \end{aligned}$$
where z = x + jy and both u(x, y) and v(x, y) are real functions.
For a holomorphic scalar function, the following four statements are equivalent [2]:
  1. 1.

    The complex function f(z) is a holomorphic (i.e., complex analytic) function.

     
  2. 2.

    The derivative $f'(z)$ of the complex function exists and is continuous.

     
  3. 3.
    The complex function f(z) satisfies the Cauchy–Riemann condition
    
$$\displaystyle \begin{aligned} \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad  \text{and} \quad  \frac{\partial v}{\partial x} = - \frac{\partial u}{\partial y}. \end{aligned} $$
    (2.3.1)
     
  4. 4.

    All derivatives of the complex function f(z) exist, and f(z) has a convergent power series.

     
The Cauchy–Riemann condition is also called the Cauchy–Riemann equations. The function f(z) = u(x, y) + jv(x, y) is a holomorphic function only when both the real functions u(x, y) and v(x, y) satisfy the Laplace equations at the same time:

$$\displaystyle \begin{aligned} \frac{\partial^2 u(x,y)}{\partial x^2}+\frac{\partial^2u(x,y)}{\partial y^2}=0\quad  \text{and}\quad  \frac{\partial^2v(x,y)}{\partial x^2}+\frac{\partial^2 v(x,y)}{\partial y^2}=0. \end{aligned} $$
(2.3.2)
A real function g(x, y) is called a harmonic function, if it satisfies the Laplace equation

$$\displaystyle \begin{aligned} \frac{\partial^2 g(x,y)}{\partial x^2}+\frac{\partial^2 g(x,y)}{\partial y^2}= 0. \end{aligned} $$
(2.3.3)
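The Cauchy–Riemann condition (2.3.1) and the Laplace equations (2.3.2) are easy to check symbolically for a concrete holomorphic function; the sketch below (an illustration, using SymPy) does so for f(z) = z^2, whose real and imaginary parts are u = x^2 − y^2 and v = 2xy:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
# f(z) = z^2 with z = x + j y has u = x^2 - y^2 and v = 2 x y
u = x**2 - y**2
v = 2 * x * y

# Cauchy-Riemann condition (2.3.1)
assert sp.simplify(sp.diff(u, x) - sp.diff(v, y)) == 0
assert sp.simplify(sp.diff(v, x) + sp.diff(u, y)) == 0

# Both parts are harmonic: Laplace equations (2.3.2)
assert sp.diff(u, x, 2) + sp.diff(u, y, 2) == 0
assert sp.diff(v, x, 2) + sp.diff(v, y, 2) == 0
```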

A complex function f(z) = u(x, y) + jv(x, y) is not a holomorphic function if either of the two real functions u(x, y) and v(x, y) fails to satisfy the Cauchy–Riemann condition or the Laplace equations.

Although the power function $z^n$, the exponential function $\mathrm{e}^z$, the logarithmic function
$$\ln z$$
, the sine function 
$$\sin z$$
, and the cosine function 
$$\cos z$$
are holomorphic functions, i.e., analytic functions in the complex plane, many commonly used functions are not holomorphic. A natural question to ask is whether there is a general representation form $f(z,\cdot )$ instead of f(z) such that $f(z,\cdot )$ is always holomorphic. The key to solving this problem is to adopt $f(z,z^*)$ instead of f(z), as shown in Table 2.6.
Table 2.6

Forms of complex-valued functions

| Function | Variables $z,z^*\in \mathbb {C}$ | Variables $\mathbf {z},{\mathbf {z}}^* \in \mathbb {C}^m$ | Variables $\mathbf {Z},\mathbf {Z}^*\in \mathbb {C}^{m\times n}$ |
|---|---|---|---|
| $f\in\mathbb{C}$ | $f(z,z^*),~f:\mathbb{C}\times\mathbb{C}\rightarrow\mathbb{C}$ | $f(\mathbf{z},\mathbf{z}^*),~f:\mathbb{C}^m\times\mathbb{C}^m\rightarrow\mathbb{C}$ | $f(\mathbf{Z},\mathbf{Z}^*),~f:\mathbb{C}^{m\times n}\times\mathbb{C}^{m\times n}\rightarrow\mathbb{C}$ |
| $\mathbf{f}\in\mathbb{C}^p$ | $\mathbf{f}(z,z^*),~\mathbf{f}:\mathbb{C}\times\mathbb{C}\rightarrow\mathbb{C}^p$ | $\mathbf{f}(\mathbf{z},\mathbf{z}^*),~\mathbf{f}:\mathbb{C}^m\times\mathbb{C}^m\rightarrow\mathbb{C}^p$ | $\mathbf{f}(\mathbf{Z},\mathbf{Z}^*),~\mathbf{f}:\mathbb{C}^{m\times n}\times\mathbb{C}^{m\times n}\rightarrow\mathbb{C}^p$ |
| $\mathbf{F}\in\mathbb{C}^{p\times q}$ | $\mathbf{F}(z,z^*),~\mathbf{F}:\mathbb{C}\times\mathbb{C}\rightarrow\mathbb{C}^{p\times q}$ | $\mathbf{F}(\mathbf{z},\mathbf{z}^*),~\mathbf{F}:\mathbb{C}^m\times\mathbb{C}^m\rightarrow\mathbb{C}^{p\times q}$ | $\mathbf{F}(\mathbf{Z},\mathbf{Z}^*),~\mathbf{F}:\mathbb{C}^{m\times n}\times\mathbb{C}^{m\times n}\rightarrow\mathbb{C}^{p\times q}$ |

In the framework of complex derivatives, the formal partial derivatives of complex numbers are defined as

$$\displaystyle \begin{aligned} \frac {\partial}{\partial z}=\frac 12 \left ( \frac {\partial}{\partial x}-\mathrm{j}\,\frac {\partial}{\partial y} \right ),\quad \frac {\partial}{\partial z^*}=\frac 12 \left ( \frac {\partial}{\partial x}+\mathrm{j}\,\frac {\partial}{\partial y} \right ).{} \end{aligned} $$
(2.3.4)
The formal partial derivatives above were presented by Wirtinger [8] in 1927, so they are sometimes called Wirtinger partial derivatives.
On the partial derivatives of the complex variable z = x + jy, a basic assumption is the independence of its real and imaginary parts:

$$\displaystyle \begin{aligned} \frac{\partial x}{\partial y}=0\quad \text{and}\quad  \frac{\partial y}{\partial x}=0. \end{aligned} $$
(2.3.5)
From both the definition of the partial derivative and the above independence assumption, it is easy to conclude that

$$\displaystyle \begin{aligned} \frac{\partial z}{\partial z^*}&=\frac{\partial x}{\partial z^*}+\mathrm{j}\frac{\partial y}{\partial z^*}=\frac 12 \left (\frac{\partial x} {\partial x}+\mathrm{j}\,\frac {\partial x}{\partial y}\right )+\mathrm{j} \frac 12 \left ( \frac{\partial y}{\partial x}+\mathrm{j}\,\frac{\partial y}{\partial y}\right )\\ &=\frac 12 (1+0)+\mathrm{j}\frac 12 (0+ \mathrm{j}),\\ \frac{\partial z^*}{\partial z}&=\frac{\partial x}{\partial z}-\mathrm{j}\frac{\partial y}{\partial z}=\frac 12\left (\frac{\partial x}{\partial x}-\mathrm{j}\frac {\partial x}{\partial y}\right )-\mathrm{j}\frac 12\left (\frac{\partial y}{\partial x}-\mathrm{j}\frac{\partial y}{\partial y}\right )=\frac 12 (1-0)-\mathrm{j}\frac 12(0-\mathrm{j}). \end{aligned} $$
That is to say,

$$\displaystyle \begin{aligned} \frac {\partial z}{\partial z^*}=0\qquad \text{and}\qquad  \frac{\partial z^*}{\partial z}=0.{} \end{aligned} $$
(2.3.6)

Equation (2.3.6) reveals a basic result in the theory of complex variables: the complex variable z and its complex conjugate $z^*$ can be treated as independent variables.

Therefore, when finding the complex partial derivative $\nabla_z f(z,z^*)$ and the complex conjugate partial derivative
$$\nabla _{z^*}^{\,} f(z,z^* )$$
, the complex variable z and the complex conjugate variable $z^*$ can be regarded as two independent variables:

$$\displaystyle \begin{aligned} \nabla_z f(z,z^* )=\left .\frac{\partial f(z,z^*)}{\partial z}\right |{}_{z^*=\mathrm{const}},\quad  \nabla_{z^*}^{\,} f(z,z^* )=\left .\frac{\partial f(z,z^*)} {\partial z^*} \right |{}_{z=\mathrm{const}}.{} \end{aligned} $$
(2.3.7)
This implies that when any nonholomorphic function f(z) is written as $f(z,z^*)$, it becomes holomorphic, because, for a fixed $z^*$ value, the function $f(z,z^*)$ is analytic on the whole complex plane z = x + jy, and, for a fixed z value, the function $f(z,z^*)$ is analytic on the whole complex plane $z^*=x-\mathrm{j}y$; see, e.g., [2].
Table 2.7 is a comparison between the nonholomorphic and holomorphic representation forms of complex functions.
Table 2.7

Nonholomorphic and holomorphic functions

| Functions | Nonholomorphic | Holomorphic |
|---|---|---|
| Coordinates | $r\stackrel {\mathrm {def}}{=} (x,y)\in \mathbb {R}\times \mathbb {R}$, $z=x+\mathrm {j}\,y$ | $c\stackrel {\mathrm {def}}{=} (z, z^*)\in \mathbb {C}\times \mathbb {C}$, $z=x+\mathrm {j}\,y,~z^*=x-\mathrm {j}\,y$ |
| Representation | $f(r)=f(x,y)$ | $f(c)=f(z,z^*)$ |

The function f(z) = |z|^2 itself is not a holomorphic function with respect to z, but $f(z,z^*)=|z|^2=zz^*$ is holomorphic, because its first-order partial derivatives $\partial |z|^2/\partial z=z^*$ and $\partial |z|^2/\partial z^*=z$ exist and are continuous.
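The Wirtinger derivatives (2.3.4) of f(z) = |z|^2 can be approximated numerically. In the sketch below (an illustration; the helper `wirtinger` is ours, not a library routine), central differences in x and y recover $\partial f/\partial z=z^*$ and $\partial f/\partial z^*=z$:

```python
import numpy as np

def wirtinger(f, z, eps=1e-6):
    # Formal partial derivatives of Eq. (2.3.4), via central differences in x and y
    dfdx = (f(z + eps) - f(z - eps)) / (2 * eps)
    dfdy = (f(z + 1j * eps) - f(z - 1j * eps)) / (2 * eps)
    d_dz = 0.5 * (dfdx - 1j * dfdy)     # d/dz   = (d/dx - j d/dy)/2
    d_dzc = 0.5 * (dfdx + 1j * dfdy)    # d/dz^* = (d/dx + j d/dy)/2
    return d_dz, d_dzc

z0 = 1.3 - 0.7j
f = lambda z: abs(z) ** 2               # f(z, z^*) = z z^*

d_dz, d_dzc = wirtinger(f, z0)
assert np.allclose(d_dz, np.conj(z0), atol=1e-6)   # d f / d z   = z^*
assert np.allclose(d_dzc, z0, atol=1e-6)           # d f / d z^* = z
```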

The following are common formulas and rules for the complex partial derivatives:
  1. 1.
    The conjugate partial derivative of the complex conjugate function 
$$\frac {\partial f^*(z,z^*)}{\partial z^*}$$
:
    
$$\displaystyle \begin{aligned} \frac{\partial f^*(z,z^*)}{\partial z^*}=\left (\frac{\partial f(z,z^*)}{\partial z}\right )^*. \end{aligned} $$
    (2.3.8)
     
  2. 2.
    The partial derivative of the conjugate of the complex function 
$$\frac {\partial f^*(z,z^*)}{\partial z}$$
:
    
$$\displaystyle \begin{aligned} \frac{\partial f^*(z,z^*)}{\partial z}=\left (\frac{\partial f(z,z^*)}{\partial z^*}\right )^*. \end{aligned} $$
    (2.3.9)
     
  3. 3.
    Complex differential rule
    
$$\displaystyle \begin{aligned} \mathrm{d}f(z,z^*)=\frac{\partial f(z,z^*)}{\partial z}\mathrm{d}z+\frac{\partial f(z,z^* )}{\partial z^*}\mathrm{d}z^*. \end{aligned} $$
    (2.3.10)
     
  4. 4.
    Complex chain rule
    
$$\displaystyle \begin{aligned} \frac{\partial h(g(z,z^*))}{\partial z}&=\frac{\partial h(g(z,z^*))}{\partial g(z,z^*)}\frac{\partial g(z,z^*)} {\partial z}+\frac{\partial h(g(z,z^*))}{\partial g^*(z,z^*)}\frac{\partial g^*(z,z^*)}{\partial z}, \end{aligned} $$
    (2.3.11)
    
$$\displaystyle \begin{aligned} \frac{\partial h(g(z,z^*))}{\partial z^*}&=\frac{\partial h(g(z,z^*))}{\partial g(z,z^*)}\frac{\partial g(z,z^*)} {\partial z^*}+\frac{\partial h(g(z,z^*))}{\partial g^*(z,z^*)}\frac{\partial g^*(z,z^*)}{\partial z^*}. \end{aligned} $$
    (2.3.12)
     

2.3.2 Complex Matrix Differential

Consider the complex matrix function F(Z) and the holomorphic complex matrix function $\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)$.

On holomorphic complex matrix functions, the following statements are equivalent [1]:
  • The matrix function F(Z) is a holomorphic function of the complex matrix variable Z.

  • The complex matrix differential $\mathrm {d}\,\mathrm {vec} \mathbf {F}(\mathbf {Z})=\frac {\partial \,\mathrm {vec} \mathbf {F}(\mathbf {Z})}{\partial (\mathrm {vec} \mathbf {Z})^T}\mathrm {d}\,\mathrm {vec} \mathbf {Z}$ holds.

  • For all Z, 
$$\frac {\partial \,\mathrm {vec} \mathbf {F}(\mathbf {Z})}{\partial (\mathrm {vec}\,{\mathbf {Z}}^*)^T}=\mathbf {O}$$
(zero matrix) holds.

  • For all Z, 
$$\frac {\partial \,\mathrm {vec} \mathbf {F}(\mathbf {Z})}{\partial (\mathrm {vec} \mathrm {Re}\mathbf {Z})^T}+\mathrm {j}\frac {\partial \,\mathrm {vec} \mathbf {F}(\mathbf {Z})} {\partial (\mathrm {vec} \mathrm {Im} \mathbf {Z})^T}=\mathbf {O}$$
holds.

The complex matrix function $\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)$ is obviously a holomorphic function, and its matrix differential is

$$\displaystyle \begin{aligned} \mathrm{d}\,\mathrm{vec} \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial\,\mathrm{vec} \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)}{\partial (\mathrm{vec} \mathbf{Z})^T}\mathrm{d}\,\mathrm{vec} \mathbf{Z}+\frac{\partial\,\mathrm{vec} \mathbf{F}(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial (\mathrm{vec} {\mathbf{Z}}^*)^T}\mathrm{d}\,\mathrm{vec} {\mathbf{Z}}^*. \end{aligned} $$
(2.3.13)
The complex matrix differential 
$$\mathrm {d}\,\mathbf {Z}=[\mathrm {d}Z_{ij}]_{i=1,j=1}^{m,n}$$
has the following properties [1]:
  1. 1.

    Transpose: $\mathrm{d}\mathbf{Z}^T=\mathrm{d}(\mathbf{Z}^T)=(\mathrm{d}\mathbf{Z})^T$.

     
  2. 2.

    Hermitian transpose: $\mathrm{d}\mathbf{Z}^H=\mathrm{d}(\mathbf{Z}^H)=(\mathrm{d}\mathbf{Z})^H$.

     
  3. 3.

    Conjugate: $\mathrm{d}\mathbf{Z}^*=\mathrm{d}(\mathbf{Z}^*)=(\mathrm{d}\mathbf{Z})^*$.

     
  4. 4.

    Linearity (additive rule): d(Y + Z) = dY + dZ.

     
  5. 5.
    Chain rule: If F is a function of Y, while Y is a function of Z, then
    
$$\displaystyle \begin{aligned} \mathrm{d}\,\mathrm{vec} \mathbf{F}=\frac{\partial\,\mathrm{vec} \mathbf{F}}{\partial (\mathrm{vec} \mathbf{Y})^T}\mathrm{d}\,\mathrm{vec} \mathbf{Y}=\frac{\partial\,\mathrm{vec} \mathbf{F}}{\partial (\mathrm{vec} \mathbf{Y})^T}\frac{\partial\,\mathrm{vec} \mathbf{Y}}{\partial (\mathrm{vec} \mathbf{Z})^T}\mathrm{d} \,\mathrm{vec} \mathbf{Z}, \end{aligned}$$

    where 
$$\frac {\partial \,\mathrm {vec} \mathbf {F}}{\partial (\mathrm {vec} \mathbf {Y})^T}$$
and 
$$\frac {\partial \,\mathrm {vec} \mathbf {Y}}{\partial (\mathrm {vec}\,\mathbf {Z})^T}$$
are the normal complex partial derivative and the generalized complex partial derivative, respectively.

     
  6. 6.
    Multiplication rule:
    
$$\displaystyle \begin{aligned} \mathrm{d}(\mathbf{UV})&=(\mathrm{d}\mathbf{U})\mathbf{V}+\mathbf{U}(\mathrm{d}\mathbf{V})\\ \mathrm{d} \mathrm{vec}(\mathbf{UV})&=({\mathbf{V}}^T\otimes \mathbf{I}) \mathrm{d} \mathrm{vec} \mathbf{U}+(\mathbf{I}\otimes \mathbf{U}) \mathrm{d} \mathrm{vec} \mathbf{V}. \end{aligned} $$
     
  7. 7.

    Kronecker product: d(Y ⊗Z) = dY ⊗Z + Y ⊗dZ.

     
  8. 8.

    Hadamard product: d(Y ⊙Z) = dY ⊙Z + Y ⊙dZ.

     

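The vectorized multiplication rule in property 6 can be checked numerically; the following sketch (an illustration, with random complex matrices and small increments) confirms that d vec(UV) = (V^T ⊗ I)d vec U + (I ⊗ U)d vec V up to the second-order term (dU)(dV):

```python
import numpy as np

rng = np.random.default_rng(6)
p, m, n = 2, 3, 2
U = rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))
V = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
dU = 1e-7 * (rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m)))
dV = 1e-7 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
vec = lambda M: M.reshape(-1, order='F')

# d vec(UV) = (V^T kron I_p) d vec U + (I_n kron U) d vec V
lhs = vec((U + dU) @ (V + dV) - U @ V)
rhs = np.kron(V.T, np.eye(p)) @ vec(dU) + np.kron(np.eye(n), U) @ vec(dV)

# agreement up to the second-order term (dU)(dV)
assert np.allclose(lhs, rhs, atol=1e-12)
```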
Let us consider the relationship between the complex matrix differential and the complex partial derivative.

First, the complex differential rule for scalar variables,

$$\displaystyle \begin{aligned} \mathrm{d}f(z,z^*)=\frac{\partial f(z,z^*)}{\partial z}\,\mathrm{d} z+\frac{\partial f(z,z^* )}{\partial z^*} \,\mathrm{d} z^* \end{aligned} $$
(2.3.14)
is easily extended to a complex differential rule for the multivariate real scalar function 
$$f(\cdot )=f((z_1^{\,} , z_1^*),\ldots ,(z_m^{\,} , z_m^*))$$
:

$$\displaystyle \begin{aligned} \mathrm{d}f(\cdot )&=\frac{\partial f(\cdot )}{\partial z_1^{\,}}\,\mathrm{d} z_1^{\,} +\cdots +\frac{\partial f(\cdot )} {\partial z_m^{\,}}\,\mathrm{d} z_m^{\,} +\frac{\partial f(\cdot )}{\partial z_1^*}\,\mathrm{d} z_1^*+\cdots + \frac{\partial f(\cdot )}{\partial z_m^*}\,\mathrm{d} z_m^*\\ &=\frac {\partial f(\cdot )}{\partial\,{\mathbf{z}}^T}\,\mathrm{d} \mathbf{z} +\frac {\partial f(\cdot )}{\partial\,{\mathbf{z}}^H} \,\mathrm{d} {\mathbf{z}}^*. \end{aligned} $$
(2.3.15)
Here, 
$$\mathrm {d}\mathbf {z}=[\mathrm {d}z_1^{\,} ,\ldots ,\mathrm {d}z_m^{\,}]^T$$
, 
$$\mathrm {d}{\mathbf {z}}^*=[\mathrm {d}z_1^*,\ldots ,\mathrm {d}z_m^* ]^T$$
. This complex differential rule is the basis of the complex matrix differential.
In particular, if f(⋅) = f(z, z ), then

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{z},{\mathbf{z}}^*)=\frac {\partial f(\mathbf{z},{\mathbf{z}}^*)}{\partial\,{\mathbf{z}}^T}\,\mathrm{d} \mathbf{z}+\frac{\partial f(\mathbf{z},{\mathbf{z}}^*)}{\partial\,{\mathbf{z}}^H}\,\mathrm{d} {\mathbf{z}}^* \end{aligned}$$
or

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{z},{\mathbf{z}}^*)= \mathrm{D}_{\mathbf{z}}^{\,} f(\mathbf{z},{\mathbf{z}}^*)\,\mathrm{d} \mathbf{z}+ \mathrm{D}_{{\mathbf{z}}^*}^{\,} f(\mathbf{z},{\mathbf{z}}^*)\,\mathrm{d} {\mathbf{z}}^*,{} \end{aligned} $$
(2.3.16)
where

$$\displaystyle \begin{aligned} \mathrm{D}_{\mathbf{z}}^{\,} f(\mathbf{z},{\mathbf{z}}^*)&=\left .\frac{\partial f(\mathbf{z},{\mathbf{z}}^*)}{\partial\,{\mathbf{z}}^T}\right |{}_{{\mathbf{z}}^* =\mathrm{const}}=\left [\frac {\partial f(\mathbf{z},{\mathbf{z}}^*)}{\partial z_1^{\,}}, \ldots ,\frac{\partial f(\mathbf{z},{\mathbf{z}}^*)} {\partial z_m^{\,}}\right ], \end{aligned} $$
(2.3.17)

$$\displaystyle \begin{aligned} \mathrm{D}_{{\mathbf{z}}^*}^{\,} f(\mathbf{z},{\mathbf{z}}^*)&=\left .\frac{\partial f(\mathbf{z},{\mathbf{z}}^*)}{\partial\,{\mathbf{z}}^H}\right |{}_{\mathbf{z} =\mathrm{const}}=\left [\frac {\partial f(\mathbf{z},{\mathbf{z}}^*)}{\partial z_1^*}, \ldots ,\frac {\partial f(\mathbf{z},{\mathbf{z}}^*)} {\partial z_m^*}\right ] \end{aligned} $$
(2.3.18)
are, respectively, the cogradient vector and the conjugate cogradient vector of the real scalar function f(z, z ), while

$$\displaystyle \begin{aligned} \mathrm{D}_{\mathbf{z}}^{\,} =\frac{\partial}{\partial\,{\mathbf{z}}^T}\stackrel{\mathrm{def}}{=} \left [\frac {\partial} {\partial z_1^{\,}},\ldots ,\frac {\partial} {\partial z_m^{\,}}\right ],\quad  \mathrm{D}_{{\mathbf{z}}^*}^{\,} =\frac{\partial}{\partial\,{\mathbf{z}}^H}\stackrel{\mathrm{def}}{=} \left [\frac {\partial} {\partial z_1^*},\ldots ,\frac {\partial} {\partial z_m^*} \right ] \end{aligned} $$
(2.3.19)
are termed the cogradient operator and the conjugate cogradient operator of complex vector variable 
$$\mathbf {z}\in \mathbb {C}^m$$
, respectively.
For 
$$\mathbf {z}=\mathbf {x}+\mathrm {j}\mathbf {y}=[z_1^{\,} ,\ldots ,z_m^{\,} ]^T\hskip -0.3mm\in \mathbb {C}^m$$
with 
$$\mathbf {x}=[x_1^{\,} ,\ldots , x_m^{\,} ]^T\hskip -0.3mm\in \mathbb {R}^m,\mathbf {y}=[y_1^{\,} ,\ldots ,y_m^{\,} ]^T\hskip -0.3mm \in \mathbb {R}^m$$
, due to the independence between the real part 
$$x_i^{\,}$$
and the imaginary part 
$$y_i^{\,}$$
, if applying the complex partial derivative operators

$$\displaystyle \begin{aligned} \mathrm{D}_{z_i^{\,}}^{\,} = \frac{\partial} {\partial z_i^{\,}}=\frac 12\left (\frac {\partial}{\partial x_i^{\,}}-\mathrm{j} \frac{\partial}{\partial y_i^{\,}} \right ),\quad  \mathrm{D}_{z_i^*}^{\,} =\frac{\partial}{\partial z_i^*}=\frac 12 \left (\frac {\partial}{\partial x_i^{\,}}+\mathrm{j} \frac {\partial}{\partial y_i^{\,}} \right ), \end{aligned} $$
(2.3.20)
to each element of the row vector 
$${\mathbf {z}}^T=[z_1^{\,} ,\ldots ,z_m^{\,} ]$$
, then we obtain the following complex cogradient operator:

$$\displaystyle \begin{aligned} \mathrm{D}_{\mathbf{z}}^{\,} =\frac{\partial}{\partial\,{\mathbf{z}}^T}=\frac 12\left (\frac{\partial}{\partial\,{\mathbf{x}}^T}- \mathrm{j}\frac{\partial} {\partial\,{\mathbf{y}}^T} \right ) \end{aligned} $$
(2.3.21)
and the complex conjugate cogradient operator

$$\displaystyle \begin{aligned} \mathrm{D}_{{\mathbf{z}}^*}^{\,} =\frac{\partial}{\partial\,{\mathbf{z}}^H}=\frac 12 \left (\frac{\partial}{\partial\,{\mathbf{x}}^T} +\mathrm{j}\frac{\partial}{\partial\,{\mathbf{y}}^T}\right ). \end{aligned} $$
(2.3.22)
Similarly, the complex gradient operator and the complex conjugate gradient operator are, respectively, defined as

$$\displaystyle \begin{aligned} \nabla_{\mathbf{z}}^{\,} =\frac{\partial} {\partial\,\mathbf{z}}\stackrel{\mathrm{def}}{=} \left [\frac {\partial}{\partial z_1^{\,}},\ldots ,\frac {\partial}{\partial z_m^{\,}}\right ]^T,\quad  \nabla_{{\mathbf{z}}^*}^{\,} =\frac{\partial} {\partial\,{\mathbf{z}}^*}\stackrel{\mathrm{def}}{=} \left [\frac {\partial} {\partial z_1^*},\ldots ,\frac {\partial} {\partial z_m^*} \right ]^T. \end{aligned} $$
(2.3.23)
Hence, the complex gradient vector and the complex conjugate gradient vector of the real scalar function f(z, z ) are, respectively, defined as

$$\displaystyle \begin{aligned} \nabla_{\mathbf{z}}^{\,} f(\mathbf{z},{\mathbf{z}}^*)&=\left .\frac{\partial f(\mathbf{z},{\mathbf{z}}^*)}{\partial\,\mathbf{z}} \right |{}_{{\mathbf{z}}^*=\,\text{const vector}}=(\mathrm{D}_{\mathbf{z}}^{\,} f(\mathbf{z},{\mathbf{z}}^*))^T, \end{aligned} $$
(2.3.24)

$$\displaystyle \begin{aligned} \nabla_{{\mathbf{z}}^*}^{\,} f(\mathbf{z},{\mathbf{z}}^*)&=\left .\frac{\partial f(\mathbf{z},{\mathbf{z}}^*)}{\partial\,{\mathbf{z}}^*} \right |{}_{\mathbf{z}\,=\,\text{const vector}}=(\mathrm{D}_{{\mathbf{z}}^*}^{\,} f(\mathbf{z},{\mathbf{z}}^*))^T. \end{aligned} $$
(2.3.25)
On the other hand, applying the complex partial derivative operators in (2.3.20) to each element of the complex vector
$$\mathbf {z}=[z_1^{\,} ,\ldots , z_m^{\,} ]^T$$
, the complex gradient operator and the complex conjugate gradient operator are obtained as follows:

$$\displaystyle \begin{aligned} \nabla_{\mathbf{z}}^{\,} =\frac{\partial} {\partial\,\mathbf{z}}=\frac 12\left (\frac {\partial} {\partial\,\mathbf{x}}-\mathrm{j} \frac{\partial} {\partial\,\mathbf{y}} \right ),\quad  \nabla_{{\mathbf{z}}^*}^{\,} =\frac{\partial} {\partial\,{\mathbf{z}}^*}=\frac 12 \left (\frac{\partial} {\partial\,\mathbf{x}}+\mathrm{j} \frac{\partial}{\partial\,\mathbf{y}}\right ). \end{aligned} $$
(2.3.26)
By the above definitions, it is easy to obtain

$$\displaystyle \begin{aligned} \frac{\partial {\mathbf{z}}^T}{\partial\mathbf{z}}=\frac{\partial {\mathbf{x}}^T}{\partial \mathbf{z}}+\mathrm{j}\frac{\partial {\mathbf{y}}^T}{\partial\mathbf{z}} =\frac 12\left (\frac{\partial{\mathbf{x}}^T}{\partial \mathbf{x}}-\mathrm{j}\frac{\partial{\mathbf{x}}^T}{\partial\mathbf{y}}\right )+\mathrm{j}\frac 12 \left ( \frac{\partial{\mathbf{y}}^T}{\partial\mathbf{x}}-\mathrm{j}\frac{\partial{\mathbf{y}}^T}{\partial\mathbf{y}}\right )={\mathbf{I}}_{m\times m}, \end{aligned}$$
and

$$\displaystyle \begin{aligned} \frac{\partial{\mathbf{z}}^T}{\partial{\mathbf{z}}^*}=\frac{\partial{\mathbf{x}}^T}{\partial{\mathbf{z}}^*}+\mathrm{j}\frac{\partial{\mathbf{y}}^T}{\partial{\mathbf{z}}^*}= \frac 12\left (\frac{\partial{\mathbf{x}}^T}{\partial\mathbf{x}}+\mathrm{j}\frac{\partial{\mathbf{x}}^T}{\partial\mathbf{y}}\right )+\mathrm{j}\frac 12\left ( \frac{\partial{\mathbf{y}}^T}{\partial\mathbf{x}}+\mathrm{j}\frac{\partial{\mathbf{y}}^T}{\partial\mathbf{y}}\right )={\mathbf{O}}_{m\times m}. \end{aligned}$$
Summarizing the above results together with their conjugates, transposes, and conjugate transposes, we have the following important identities:

$$\displaystyle \begin{aligned} \frac{\partial {\mathbf{z}}^T}{\partial \mathbf{z}}&=\mathbf{I},\quad \ \, \frac{\partial {\mathbf{z}}^H}{\partial {\mathbf{z}}^*}=\mathbf{I},\quad \ \,\frac{\partial \mathbf{z}}{\partial {\mathbf{z}}^T}=\mathbf{I},\quad \ \,\frac {\partial {\mathbf{z}}^*}{\partial {\mathbf{z}}^H}=\mathbf{I}, \end{aligned} $$
(2.3.27)

$$\displaystyle \begin{aligned} \frac{\partial {\mathbf{z}}^T}{\partial {\mathbf{z}}^*}&=\mathbf{O},\quad \frac{\partial {\mathbf{z}}^H}{\partial \mathbf{z}}=\mathbf{O},\quad  \frac{\partial \mathbf{z}}{\partial {\mathbf{z}}^H}=\mathbf{O},\quad  \frac {\partial {\mathbf{z}}^*}{\partial {\mathbf{z}}^T}=\mathbf{O}. \end{aligned} $$
(2.3.28)
The above results show that the complex vector variable z and its complex conjugate vector variable z* can be viewed as two independent variables. This important fact is not surprising because the angle between z and z* is π∕2, i.e., they are orthogonal to each other. Hence, we can summarize the rules for using the cogradient operator and gradient operator below.
  • When using the complex cogradient operator ∂∕∂z^T or the complex gradient operator ∇_z, the complex conjugate vector variable z* can be handled as a constant vector.

  • When using the complex conjugate cogradient operator ∂∕∂z^H or the complex conjugate gradient operator ∇_{z*}, the vector variable z can be handled as a constant vector.
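These rules can be checked numerically. The sketch below (using NumPy; the test function f(z, z*) = z^H z and the finite-difference step are illustrative assumptions, not from the text) approximates the operators in (2.3.26) by central differences in the real and imaginary parts, recovering ∇_z f = z* and ∇_{z*} f = z:

```python
import numpy as np

def wirtinger_grads(f, z, h=1e-6):
    """Approximate the complex gradients of Eq. (2.3.26) by central differences:
    df/dz_i = (df/dx_i - j df/dy_i)/2,  df/dz_i* = (df/dx_i + j df/dy_i)/2."""
    gz = np.zeros(z.size, dtype=complex)
    gzc = np.zeros(z.size, dtype=complex)
    for i in range(z.size):
        e = np.zeros(z.size, dtype=complex); e[i] = 1.0
        dfdx = (f(z + h*e) - f(z - h*e)) / (2*h)        # partial w.r.t. x_i
        dfdy = (f(z + 1j*h*e) - f(z - 1j*h*e)) / (2*h)  # partial w.r.t. y_i
        gz[i] = 0.5*(dfdx - 1j*dfdy)    # entry of the gradient vector
        gzc[i] = 0.5*(dfdx + 1j*dfdy)   # entry of the conjugate gradient vector
    return gz, gzc

rng = np.random.default_rng(0)
z = rng.normal(size=3) + 1j*rng.normal(size=3)
f = lambda z: np.real(np.vdot(z, z))     # f(z, z*) = z^H z = sum of |z_i|^2

gz, gzc = wirtinger_grads(f, z)
assert np.allclose(gz, np.conj(z), atol=1e-4)   # grad_z f = z*, with z* held constant
assert np.allclose(gzc, z, atol=1e-4)           # grad_z* f = z, with z held constant
```

Here f is real-valued, so the two numerical gradients come out as complex conjugates of each other, consistent with the relations above.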

Now, we consider the real scalar function f(Z, Z*) with variables
$$\mathbf {Z},\,{\mathbf {Z}}^*\in \mathbb {C}^{m\times n}$$
. Performing the vectorization of Z and Z*, respectively, from Eq. (2.3.16) we get the first-order complex differential rule for the real scalar function f(Z, Z*):

$$\displaystyle \begin{aligned} \mathrm{d} f(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial f(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,\mathbf{Z})^T}\,\mathrm{d}\,\mathrm{vec}\,\mathbf{Z}+\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,{\mathbf{Z}}^*)^T}\,\mathrm{d}\,\mathrm{vec}\,{\mathbf{Z}}^*, {} \end{aligned} $$
(2.3.29)
where

$$\displaystyle \begin{aligned} \frac{\partial f(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,\mathbf{Z})^T}=\left [\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial Z_{11}},\ldots ,\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial Z_{m1}},\ldots ,\frac {\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial Z_{1n}},\ldots ,\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)} {\partial Z_{mn}}\right ],\\ \frac{\partial f(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,{\mathbf{Z}}^*)^T}=\left [\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial Z_{11}^*},\ldots ,\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial Z_{m1}^*},\ldots ,\frac {\partial f(\mathbf{Z},{\mathbf{Z}}^*)} {\partial Z_{1n}^*},\ldots ,\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial Z_{mn}^*}\right ]. \end{aligned} $$
We define the complex cogradient vector and the complex conjugate cogradient vector as

$$\displaystyle \begin{aligned} \mathrm{D}_{\mathrm{vec}\,\mathbf{Z}} f(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial f(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,\mathbf{Z})^T},\quad  \mathrm{D}_{\mathrm{vec}\,{\mathbf{Z}}^*} f(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial f(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,{\mathbf{Z}}^*)^T}, \end{aligned} $$
(2.3.30)
and the complex gradient vector and the complex conjugate gradient vector of the function f(Z, Z*) as

$$\displaystyle \begin{aligned} \nabla_{\mathrm{vec}\,\mathbf{Z}} f(\mathbf{Z},{\mathbf{Z}}^* )=\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,\mathrm{vec}\,\mathbf{Z}},\quad  \nabla_{\mathrm{vec}\,{\mathbf{Z}}^*} f(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,\mathrm{vec}\,{\mathbf{Z}}^*}. \end{aligned} $$
(2.3.31)
The conjugate gradient vector 
$$\nabla _{\mathrm {vec}\,{\mathbf {Z}}^*} f(\mathbf {Z},{\mathbf {Z}}^*)$$
has the following properties [1]:
  1. 1.

    The conjugate gradient vector of the function f(Z, Z*) at an extreme point is equal to the zero vector, i.e., 
$$\nabla _{\mathrm {vec}\,{\mathbf {Z}}^*} f(\mathbf {Z},{\mathbf {Z}}^*)=\mathbf {0}$$
.

     
  2. 2.

    The conjugate gradient vector 
$$\nabla _{\mathrm {vec}\,{\mathbf {Z}}^*} f(\mathbf {Z},{\mathbf {Z}}^*)$$
and the negative conjugate gradient vector 
$$-\nabla _{\mathrm {vec}\,{\mathbf {Z}}^*} f(\mathbf {Z},{\mathbf {Z}}^*)$$
point in the directions of steepest ascent and steepest descent of the function f(Z, Z*), respectively.

     
  3. 3.

    The rate of the steepest increase is 
$$\|\nabla _{\mathrm {vec}\,{\mathbf {Z}}^*} f(\mathbf {Z},{\mathbf {Z}}^*)\|{ }_2^{\,}$$
.

     
  4. 4.

    The conjugate gradient vector 
$$\nabla _{\mathrm {vec}\,{\mathbf {Z}}^*} f(\mathbf {Z},{\mathbf {Z}}^*)$$
and the negative conjugate gradient vector 
$$-\nabla _{\mathrm {vec}\,{\mathbf {Z}}^*} f(\mathbf {Z},{\mathbf {Z}}^*)$$
can be used, respectively, as the update directions in gradient ascent algorithms and gradient descent algorithms.
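As a numerical illustration of this property (a minimal NumPy sketch; the cost f(z, z*) = ‖Az − b‖₂², whose conjugate gradient vector is A^H(Az − b), and the chosen sizes are assumptions for the example), descending along the negative conjugate gradient reaches the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3)) + 1j*rng.normal(size=(5, 3))
b = rng.normal(size=5) + 1j*rng.normal(size=5)

# Step size: mu * lambda_max(A^H A) = 1 < 2 guarantees convergence
mu = 1.0 / np.linalg.eigvalsh(A.conj().T @ A).max()

z = np.zeros(3, dtype=complex)
for _ in range(5000):
    grad_conj = A.conj().T @ (A @ z - b)  # conjugate gradient vector of ||Az - b||^2
    z = z - mu * grad_conj                # step along the negative conjugate gradient

z_ls = np.linalg.lstsq(A, b, rcond=None)[0]  # closed-form least-squares solution
assert np.allclose(z, z_ls, atol=1e-6)
```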

     
Furthermore, the complex Jacobian matrix and the complex conjugate Jacobian matrix of the real scalar function f(Z, Z*) are, respectively, as follows:
$$\displaystyle \begin{aligned} {\mathbf{J}}_{\mathbf{Z}}^{\,}=\mathrm{D}_{\mathbf{Z}}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^T}=\begin{bmatrix} \dfrac{\partial f}{\partial Z_{11}}&\cdots &\dfrac{\partial f}{\partial Z_{m1}}\\ \vdots &\ddots &\vdots \\ \dfrac{\partial f}{\partial Z_{1n}}&\cdots &\dfrac{\partial f}{\partial Z_{mn}}\end{bmatrix}\in \mathbb{C}^{n\times m}, \end{aligned} $$
(2.3.32)
$$\displaystyle \begin{aligned} {\mathbf{J}}_{{\mathbf{Z}}^*}^{\,}=\mathrm{D}_{{\mathbf{Z}}^*}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^H}=\begin{bmatrix} \dfrac{\partial f}{\partial Z_{11}^*}&\cdots &\dfrac{\partial f}{\partial Z_{m1}^*}\\ \vdots &\ddots &\vdots \\ \dfrac{\partial f}{\partial Z_{1n}^*}&\cdots &\dfrac{\partial f}{\partial Z_{mn}^*}\end{bmatrix}\in \mathbb{C}^{n\times m}. \end{aligned} $$
(2.3.33)
Similarly, the complex gradient matrix and the complex conjugate gradient matrix of the real scalar function f(Z, Z*) are, respectively, given by
$$\displaystyle \begin{aligned} \nabla_{\mathbf{Z}}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,\mathbf{Z}}=\begin{bmatrix} \dfrac{\partial f}{\partial Z_{11}}&\cdots &\dfrac{\partial f}{\partial Z_{1n}}\\ \vdots &\ddots &\vdots \\ \dfrac{\partial f}{\partial Z_{m1}}&\cdots &\dfrac{\partial f}{\partial Z_{mn}}\end{bmatrix}\in \mathbb{C}^{m\times n} \end{aligned} $$
(2.3.34)
and
$$\displaystyle \begin{aligned} \nabla_{{\mathbf{Z}}^*}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)=\frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^*}=\begin{bmatrix} \dfrac{\partial f}{\partial Z_{11}^*}&\cdots &\dfrac{\partial f}{\partial Z_{1n}^*}\\ \vdots &\ddots &\vdots \\ \dfrac{\partial f}{\partial Z_{m1}^*}&\cdots &\dfrac{\partial f}{\partial Z_{mn}^*}\end{bmatrix}\in \mathbb{C}^{m\times n}. \end{aligned} $$
(2.3.35)
Summarizing the definitions above, we have the following relations among the various complex partial derivatives of the real scalar function f(Z, Z*):
  • The conjugate gradient (or cogradient) vector is equal to the complex conjugate of the gradient (or cogradient) vector; and the conjugate Jacobian (or gradient) matrix is equal to the complex conjugate of the Jacobian (or gradient) matrix.

  • The gradient (or conjugate gradient) vector is equal to the transpose of the cogradient (or conjugate cogradient) vector, namely
    
$$\displaystyle \begin{aligned} \nabla_{\mathrm{vec}\,\mathbf{Z}}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)=\mathrm{D}_{\mathrm{vec}\,\mathbf{Z}}^Tf(\mathbf{Z},{\mathbf{Z}}^*),\quad  \nabla_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)=\mathrm{D}_{\mathrm{vec}\,{\mathbf{Z}}^*}^Tf(\mathbf{Z},{\mathbf{Z}}^*). \end{aligned} $$
    (2.3.36)
  • The cogradient (or conjugate cogradient) vector is equal to the transpose of the vectorization of Jacobian (or conjugate Jacobian) matrix:
    
$$\displaystyle \begin{aligned} \mathrm{D}_{\mathrm{vec}\,\mathbf{Z}}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)=\left (\mathrm{vec}\,{\mathbf{J}}_{\mathbf{Z}}^{\,}\right )^T,\quad  \mathrm{D}_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*) =\left (\mathrm{vec}\,{\mathbf{J}}_{{\mathbf{Z}}^*}^{\,}\right )^T. \end{aligned} $$
    (2.3.37)
  • The gradient (or conjugate gradient) matrix is equal to the transpose of the Jacobian (or conjugate Jacobian) matrix:
    
$$\displaystyle \begin{aligned} \nabla_{\mathbf{Z}}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)={\mathbf{J}}_{\mathbf{Z}}^T\quad \text{and}\quad  \nabla_{{\mathbf{Z}}^*}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)={\mathbf{J}}_{{\mathbf{Z}}^*}^T. \end{aligned} $$
    (2.3.38)
Here are the rules of operation for the complex gradient.
  1. 1.

    If f(Z, Z*) = c (a constant), then its gradient matrix and conjugate gradient matrix are equal to the zero matrix, namely ∂c∕∂Z = O and ∂c∕∂Z* = O.

     
  2. 2.
    Linear rule: If f(Z, Z*) and g(Z, Z*) are scalar functions, and 
$$c_1^{\,}$$
and 
$$c_2^{\,}$$
are complex numbers, then
    
$$\displaystyle \begin{aligned} \frac{\partial (c_1^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)+c_2^{\,} g(\mathbf{Z},{\mathbf{Z}}^*))}{\partial\,{\mathbf{Z}}^*}=c_1^{\,} \frac {\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^*}+c_2^{\,}\frac{\partial g(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^*}. \end{aligned}$$
     
  3. 3.
    Multiplication rule:
    
$$\displaystyle \begin{aligned} \frac {\partial f(\mathbf{Z},{\mathbf{Z}}^*)g(\mathbf{Z},{\mathbf{Z}}^*)}{\partial{\mathbf{Z}}^*}=g(\mathbf{Z},{\mathbf{Z}}^*)\frac{\partial f(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^*}+f(\mathbf{Z},{\mathbf{Z}}^*)\frac{\partial g(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^*}. \end{aligned}$$
     
  4. 4.
    Quotient rule: If g(Z, Z*) ≠ 0, then
    
$$\displaystyle \begin{aligned} \frac{\partial f/g}{\partial\,{\mathbf{Z}}^*} = \frac 1{g^2 (\mathbf{Z},{\mathbf{Z}}^*)}\left [ g(\mathbf{Z},{\mathbf{Z}}^*) \frac{\partial f(\mathbf{Z},{\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^*}-f(\mathbf{Z},{\mathbf{Z}}^*)\frac{\partial g( \mathbf{Z},{\mathbf{Z}}^*)}{\partial\,{\mathbf{Z}}^*}\right ]. \end{aligned}$$
     
If h(Z, Z*) = g(F(Z, Z*), F*(Z, Z*)), then the chain rule gives

$$\displaystyle \begin{aligned} \frac{\partial h(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial\,\mathrm{vec}\,\mathbf{Z}}=&\,\frac{\partial g(\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*),{\mathbf{F}}^* (\mathbf{Z},{\mathbf{Z}}^*))}{\partial \left (\mathrm{vec}\,\mathbf{F}(\mathbf{Z}, {\mathbf{Z}}^*)\right )^T}\frac{\partial \left (\mathrm{vec}\, \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)\right )^T}{\partial\,\mathrm{vec}\,\mathbf{Z}}\\ &\,+\frac{\partial g(\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*),{\mathbf{F}}^*(\mathbf{Z},{\mathbf{Z}}^*))}{\partial\left (\mathrm{vec}\,{\mathbf{F}}^*(\mathbf{Z}, {\mathbf{Z}}^*)\right )^T}\frac{\partial\left (\mathrm{vec}\,{\mathbf{F}}^*(\mathbf{Z},{\mathbf{Z}}^*)\right )^T}{\partial\,\mathrm{vec}\,\mathbf{Z}}, \end{aligned} $$
(2.3.39)
and

$$\displaystyle \begin{aligned} \frac{\partial h(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial\,\mathrm{vec}\,{\mathbf{Z}}^*}=&\,\frac{\partial g(\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*), {\mathbf{F}}^* (\mathbf{Z},{\mathbf{Z}}^*))}{\partial \left (\mathrm{vec}\,\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)\right )^T}\frac{\partial\left (\mathrm{vec} \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)\right )^T}{\partial\,\mathrm{vec}\,{\mathbf{Z}}^*}\\ &\,+\frac{\partial g(\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*),{\mathbf{F}}^*(\mathbf{Z},{\mathbf{Z}}^*))}{\partial\left (\mathrm{vec} {\mathbf{F}}^*(\mathbf{Z}, {\mathbf{Z}}^*)\right )^T}\frac{\partial\left (\mathrm{vec}\,{\mathbf{F}}^*(\mathbf{Z},{\mathbf{Z}}^*)\right )^T}{\partial\,\mathrm{vec}\,{\mathbf{Z}}^*}. \end{aligned} $$
(2.3.40)
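A small worked check of this chain rule (a NumPy sketch; the composite h(z, z*) = |a^H z|² = g(w, w*) with inner function w = a^H z and g(w, w*) = ww* is an illustrative choice): the chain rule gives ∇_{z*} h = (∂g∕∂w*)(∂w*∕∂z*) = (a^H z)a, which matches a Wirtinger finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.normal(size=4) + 1j*rng.normal(size=4)
z = rng.normal(size=4) + 1j*rng.normal(size=4)
h = lambda z: np.abs(np.vdot(a, z))**2    # h(z, z*) = |a^H z|^2, real-valued

# Chain rule: grad_z* h = (dg/dw*) (dw*/dz*) = (a^H z) a, since w* = a^T z*
grad_conj = np.vdot(a, z) * a

# Wirtinger finite-difference estimate: dh/dz_i* = (h_x + j h_y)/2
step = 1e-6
num = np.zeros(4, dtype=complex)
for i in range(4):
    e = np.zeros(4, dtype=complex); e[i] = 1.0
    hx = (h(z + step*e) - h(z - step*e)) / (2*step)
    hy = (h(z + 1j*step*e) - h(z - 1j*step*e)) / (2*step)
    num[i] = 0.5*(hx + 1j*hy)

assert np.allclose(num, grad_conj, atol=1e-4)
```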

2.3.3 Complex Gradient Matrix Identification

Letting 
$$\mathbf {A}=\mathrm {D}_{\mathbf {Z}}^{\,} f(\mathbf {Z}, {\mathbf {Z}}^*)$$
 and 
$$\mathbf {B}=\mathrm {D}_{{\mathbf {Z}}^*}^{\,} f(\mathbf {Z}, {\mathbf {Z}}^*)$$
, we have

$$\displaystyle \begin{aligned} \frac{\partial f(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,\mathbf{Z})^T}&=\mathrm{rvec} \mathrm{D}_{\mathbf{Z}}^{\,} f(\mathbf{Z}, {\mathbf{Z}}^*)=\mathrm{rvec} \mathbf{A}=(\mathrm{vec}({\mathbf{A}}^T))^T, \end{aligned} $$
(2.3.41)

$$\displaystyle \begin{aligned} \frac{\partial f(\mathbf{Z}, {\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,\mathbf{Z})^H}&=\mathrm{rvec} \mathrm{D}_{{\mathbf{Z}}^*}^{\,} f(\mathbf{Z}, {\mathbf{Z}}^*)=\mathrm{rvec} \mathbf{B}=(\mathrm{vec}({\mathbf{B}}^T))^T. \end{aligned} $$
(2.3.42)
Hence, Eq. (2.3.29) can be rewritten as

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{Z},{\mathbf{Z}}^*)=(\mathrm{vec}({\mathbf{A}}^T))^T\mathrm{d} \mathrm{vec}\,\mathbf{Z}+(\mathrm{vec}({\mathbf{B}}^T))^T\mathrm{d} \mathrm{vec} {\mathbf{Z}}^*.{} \end{aligned} $$
(2.3.43)
Using the identity tr(C^T D) = (vec C)^T vec D, Eq. (2.3.43) can be written as

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{Z},{\mathbf{Z}}^*)=\mathrm{tr}(\mathbf{A} \mathrm{d} \mathbf{Z}+\mathbf{B} \mathrm{d} {\mathbf{Z}}^*).{} \end{aligned} $$
(2.3.44)
Proposition 2.2
Given a scalar function 
$$f(\mathbf {Z},{\mathbf {Z}}^*):\mathbb {C}^{m\times n}\times \mathbb {C}^{m\times n} \to \mathbb {C}$$
, its complex Jacobian and gradient matrices can be, respectively, identified by

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{Z},{\mathbf{Z}}^*)&=\mathrm{tr}(\mathbf{A} \mathrm{d}\mathbf{Z}+ \mathbf{B} \mathrm{d}{\mathbf{Z}}^*)\quad  \Leftrightarrow\quad  \bigg \{\begin{aligned} &{\mathbf{J}}_{\mathbf{Z}}^{\,}=\mathbf{A},\\ &{\mathbf{J}}_{{\mathbf{Z}}^*}^{\,} =\mathbf{B};\end{aligned}{} \end{aligned} $$
(2.3.45)

$$\displaystyle \begin{aligned} \mathrm{d}f(\mathbf{Z},{\mathbf{Z}}^*)&=\mathrm{tr}(\mathbf{A} \mathrm{d}\mathbf{Z}+ \mathbf{B} \mathrm{d}{\mathbf{Z}}^*)\quad  \Leftrightarrow\quad \left\{\begin{aligned} &\nabla_{\mathbf{Z}}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)={\mathbf{A}}^T,\\ &\nabla_{{\mathbf{Z}}^*}^{\,} f(\mathbf{Z},{\mathbf{Z}}^*)={\mathbf{B}}^T.\end{aligned} \right.{} \end{aligned} $$
(2.3.46)

That is to say, the complex gradient matrix and the complex conjugate gradient matrix are identified as the transposes of the matrices A and B, respectively.

Table 2.8 lists the complex gradient matrices of several trace functions.
Table 2.8

Complex gradient matrices of trace functions

f(Z, Z*) | df | ∂f∕∂Z | ∂f∕∂Z*
tr(AZ) | tr(A dZ) | A^T | O
tr(AZ^H) | tr(A^T dZ*) | O | A
tr(ZAZ^T B) | tr((AZ^T B + A^T Z^T B^T) dZ) | B^T ZA^T + BZA | O
tr(ZAZB) | tr((AZB + BZA) dZ) | (AZB + BZA)^T | O
tr(ZAZ* B) | tr(AZ* B dZ + BZA dZ*) | B^T Z^H A^T | A^T Z^T B^T
tr(ZAZ^H B) | tr(AZ^H B dZ + A^T Z^T B^T dZ*) | B^T Z* A^T | BZA
tr(AZ^{-1}) | −tr(Z^{-1} A Z^{-1} dZ) | −Z^{-T} A^T Z^{-T} | O
tr(Z^k) | k tr(Z^{k−1} dZ) | k(Z^T)^{k−1} | O
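Two rows of Table 2.8 can be verified numerically with the sketch below (NumPy; the matrix sizes and the elementwise finite-difference scheme based on Eq. (2.3.26) are illustrative assumptions):

```python
import numpy as np

def num_grad_Z(f, Z, conj=False, h=1e-6):
    """Elementwise Wirtinger partials via Eq. (2.3.26):
    df/dZ_ij = (f_x - j f_y)/2,  df/dZ*_ij = (f_x + j f_y)/2."""
    G = np.zeros(Z.shape, dtype=complex)
    for idx in np.ndindex(*Z.shape):
        E = np.zeros(Z.shape, dtype=complex); E[idx] = 1.0
        fx = (f(Z + h*E) - f(Z - h*E)) / (2*h)
        fy = (f(Z + 1j*h*E) - f(Z - 1j*h*E)) / (2*h)
        G[idx] = 0.5*(fx + 1j*fy) if conj else 0.5*(fx - 1j*fy)
    return G

rng = np.random.default_rng(2)
m, n = 3, 4                       # illustrative sizes, Z in C^{m x n}
Z  = rng.normal(size=(m, n)) + 1j*rng.normal(size=(m, n))
A1 = rng.normal(size=(n, m)) + 1j*rng.normal(size=(n, m))   # A for tr(AZ)
A2 = rng.normal(size=(n, n)) + 1j*rng.normal(size=(n, n))   # A for tr(ZAZ^H B)
B  = rng.normal(size=(m, m)) + 1j*rng.normal(size=(m, m))

# Row tr(AZ):  df = tr(A dZ),  grad_Z = A^T,  grad_Z* = O
f1 = lambda Z: np.trace(A1 @ Z)
assert np.allclose(num_grad_Z(f1, Z), A1.T, atol=1e-4)
assert np.allclose(num_grad_Z(f1, Z, conj=True), 0, atol=1e-4)

# Row tr(ZAZ^H B):  grad_Z = B^T Z* A^T,  grad_Z* = BZA
f2 = lambda Z: np.trace(Z @ A2 @ Z.conj().T @ B)
assert np.allclose(num_grad_Z(f2, Z), B.T @ Z.conj() @ A2.T, atol=1e-4)
assert np.allclose(num_grad_Z(f2, Z, conj=True), B @ Z @ A2, atol=1e-4)
```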

Table 2.9 lists the complex gradient matrices of several determinant functions.
Table 2.9

Complex gradient matrices of determinant functions

f(Z, Z*) | df | ∂f∕∂Z | ∂f∕∂Z*
|Z| | |Z| tr(Z^{-1} dZ) | |Z| Z^{-T} | O
|ZZ^T| | 2|ZZ^T| tr(Z^T (ZZ^T)^{-1} dZ) | 2|ZZ^T| (ZZ^T)^{-1} Z | O
|Z^T Z| | 2|Z^T Z| tr((Z^T Z)^{-1} Z^T dZ) | 2|Z^T Z| Z (Z^T Z)^{-1} | O
|ZZ*| | |ZZ*| tr(Z* (ZZ*)^{-1} dZ + (ZZ*)^{-1} Z dZ*) | |ZZ*| (Z^H Z^T)^{-1} Z^H | |ZZ*| Z^T (Z^H Z^T)^{-1}
|Z* Z| | |Z* Z| tr((Z* Z)^{-1} Z* dZ + Z (Z* Z)^{-1} dZ*) | |Z* Z| Z^H (Z^T Z^H)^{-1} | |Z* Z| (Z^T Z^H)^{-1} Z^T
|ZZ^H| | |ZZ^H| tr(Z^H (ZZ^H)^{-1} dZ + Z^T (Z* Z^T)^{-1} dZ*) | |ZZ^H| (Z* Z^T)^{-1} Z* | |ZZ^H| (ZZ^H)^{-1} Z
|Z^H Z| | |Z^H Z| tr((Z^H Z)^{-1} Z^H dZ + (Z^T Z*)^{-1} Z^T dZ*) | |Z^H Z| Z* (Z^T Z*)^{-1} | |Z^H Z| Z (Z^H Z)^{-1}
|Z^k| | k|Z|^k tr(Z^{-1} dZ) | k|Z|^k Z^{-T} | O
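The first row of Table 2.9 can be checked numerically (a NumPy sketch; the 4 × 4 size and the perturbation scale are illustrative assumptions) by comparing the first-order expansion d|Z| = |Z| tr(Z^{-1} dZ) against the actual change of the determinant:

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(4, 4)) + 1j*rng.normal(size=(4, 4))
dZ = 1e-7*(rng.normal(size=(4, 4)) + 1j*rng.normal(size=(4, 4)))  # small perturbation

# First-order expansion from Table 2.9: d|Z| = |Z| tr(Z^{-1} dZ)
lhs = np.linalg.det(Z + dZ) - np.linalg.det(Z)            # actual change
rhs = np.linalg.det(Z) * np.trace(np.linalg.solve(Z, dZ)) # first-order prediction
assert abs(lhs - rhs) < 1e-9

# Equivalent gradient form: grad_Z |Z| = |Z| Z^{-T}, with d|Z| = tr(grad^T dZ)
grad = np.linalg.det(Z) * np.linalg.inv(Z).T
assert abs(lhs - np.trace(grad.T @ dZ)) < 1e-9
```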

If 
$$\mathbf {f}(\mathbf {z},{\mathbf {z}}^*)=[f_1^{\,} (\mathbf {z},{\mathbf {z}}^*),\ldots ,f_n^{\,} (\mathbf {z},{\mathbf {z}}^*)]^T$$
is an n × 1 complex vector function with m × 1 complex vector variable, then
$$\displaystyle \begin{aligned} \begin{bmatrix} \mathrm{d}f_1^{\,}(\mathbf{z},{\mathbf{z}}^*)\\ \vdots \\ \mathrm{d}f_n^{\,}(\mathbf{z},{\mathbf{z}}^*)\end{bmatrix} =\begin{bmatrix} \dfrac{\partial f_1}{\partial z_1}&\cdots &\dfrac{\partial f_1}{\partial z_m}\\ \vdots &\ddots &\vdots \\ \dfrac{\partial f_n}{\partial z_1}&\cdots &\dfrac{\partial f_n}{\partial z_m}\end{bmatrix} \begin{bmatrix}\mathrm{d}z_1\\ \vdots \\ \mathrm{d}z_m\end{bmatrix} +\begin{bmatrix} \dfrac{\partial f_1}{\partial z_1^*}&\cdots &\dfrac{\partial f_1}{\partial z_m^*}\\ \vdots &\ddots &\vdots \\ \dfrac{\partial f_n}{\partial z_1^*}&\cdots &\dfrac{\partial f_n}{\partial z_m^*}\end{bmatrix} \begin{bmatrix}\mathrm{d}z_1^*\\ \vdots \\ \mathrm{d}z_m^*\end{bmatrix}, \end{aligned}$$
which can simply be written as

$$\displaystyle \begin{aligned} \mathrm{d}\mathbf{f}(\mathbf{z},{\mathbf{z}}^*)={\mathbf{J}}_{\mathbf{z}}^{\,} \mathrm{d}\mathbf{z}+{\mathbf{J}}_{{\mathbf{z}}^*}^{\,}\mathrm{d}{\mathbf{z}}^*,{}\end{aligned} $$
(2.3.47)
where 
$$\mathrm {d}\mathbf {f}(\mathbf {z},{\mathbf {z}}^*)=[\mathrm {d}f_1^{\,} (\mathbf {z},{\mathbf {z}}^*), \ldots ,\mathrm {d}f_n^{\,} (\mathbf {z},{\mathbf {z}}^*)]^T$$
, while
$$\displaystyle \begin{aligned} {\mathbf{J}}_{\mathbf{z}}^{\,}=\frac{\partial\,\mathbf{f}(\mathbf{z},{\mathbf{z}}^*)}{\partial\,{\mathbf{z}}^T}=\begin{bmatrix} \dfrac{\partial f_1}{\partial z_1}&\cdots &\dfrac{\partial f_1}{\partial z_m}\\ \vdots &\ddots &\vdots \\ \dfrac{\partial f_n}{\partial z_1}&\cdots &\dfrac{\partial f_n}{\partial z_m}\end{bmatrix}\in \mathbb{C}^{n\times m} \end{aligned} $$
(2.3.48)
and
$$\displaystyle \begin{aligned} {\mathbf{J}}_{{\mathbf{z}}^*}^{\,}=\frac{\partial\,\mathbf{f}(\mathbf{z},{\mathbf{z}}^*)}{\partial\,{\mathbf{z}}^H}=\begin{bmatrix} \dfrac{\partial f_1}{\partial z_1^*}&\cdots &\dfrac{\partial f_1}{\partial z_m^*}\\ \vdots &\ddots &\vdots \\ \dfrac{\partial f_n}{\partial z_1^*}&\cdots &\dfrac{\partial f_n}{\partial z_m^*}\end{bmatrix}\in \mathbb{C}^{n\times m} \end{aligned} $$
(2.3.49)
are, respectively, the complex Jacobian matrix and the complex conjugate Jacobian matrix of the vector function f(z, z*).
For a p × q matrix function F(Z, Z*) with m × n complex matrix variable Z, if 
$$\mathbf {F} ( \mathbf {Z},{\mathbf {Z}}^*) = [{\mathbf {f}}_1^{\,} (\mathbf {Z},{\mathbf {Z}}^*),\ldots ,{\mathbf {f}}_q^{\,} (\mathbf {Z},{\mathbf {Z}}^*)]$$
, then 
$$\mathrm {d}\mathbf {F}(\mathbf {Z},{\mathbf {Z}}^*) = [\mathrm {d}{\mathbf {f}}_1^{\,} (\mathbf {Z},{\mathbf {Z}}^*),\ldots ,\mathrm {d} {\mathbf {f}}_q^{\,} (\mathbf {Z},{\mathbf {Z}}^*)]$$
, and (2.3.47) holds for the vector functions 
$${\mathbf {f}}_i^{\,} (\mathbf {Z},{\mathbf {Z}}^*),i=1,\ldots ,q$$
. This implies that
$$\displaystyle \begin{aligned} \mathrm{d}\,\mathrm{vec}\,\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)=\begin{bmatrix} \mathrm{D}_{\mathrm{vec}\,\mathbf{Z}}^{\,}{\mathbf{f}}_1^{\,}(\mathbf{Z},{\mathbf{Z}}^*)\\ \vdots \\ \mathrm{D}_{\mathrm{vec}\,\mathbf{Z}}^{\,}{\mathbf{f}}_q^{\,}(\mathbf{Z},{\mathbf{Z}}^*)\end{bmatrix}\mathrm{d}\,\mathrm{vec}\,\mathbf{Z}+\begin{bmatrix} \mathrm{D}_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,}{\mathbf{f}}_1^{\,}(\mathbf{Z},{\mathbf{Z}}^*)\\ \vdots \\ \mathrm{D}_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,}{\mathbf{f}}_q^{\,}(\mathbf{Z},{\mathbf{Z}}^*)\end{bmatrix}\mathrm{d}\,\mathrm{vec}\,{\mathbf{Z}}^*, \end{aligned} $$
(2.3.50)
where 
$$\mathrm {D}_{\mathrm {vec}\,\mathbf {Z}}^{\,} {\mathbf {f}}_i^{\,} (\mathbf {Z}, {\mathbf {Z}}^*)=\frac {\partial \,{\mathbf {f}}_i^{\,} (\mathbf {Z},{\mathbf {Z}}^* )}{\partial (\mathrm {vec} \mathbf {Z})^T}\in \mathbb {C}^{p\times mn}$$
and 
$$\mathrm {D}_{\mathrm {vec}\,{\mathbf {Z}}^*}^{\,}{\mathbf {f}}_i^{\,} (\mathbf {Z},{\mathbf {Z}}^*)=\frac {\partial \,{\mathbf {f}}_i^{\,} (\mathbf {Z},{\mathbf {Z}}^* )}{\partial (\mathrm {vec} {\mathbf {Z}}^*)^T}\in \mathbb {C}^{p\times mn}$$
.
Equation (2.3.50) can be simply rewritten as

$$\displaystyle \begin{aligned} \mathrm{d} \mathrm{vec} \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)=\mathbf{A} \mathrm{d} \mathrm{vec} \mathbf{Z}+ \mathbf{B} \mathrm{d} \mathrm{vec} {\mathbf{Z}}^*~\in\mathbb{C}^{pq},{} \end{aligned} $$
(2.3.51)
where

$$\displaystyle \begin{aligned} \mathrm{d} \mathrm{vec} \mathbf{Z}&=[\mathrm{d}Z_{11},\ldots ,\mathrm{d} Z_{m1},\ldots ,\mathrm{d}Z_{1n},\ldots ,\mathrm{d}Z_{mn}]^T,\\ \mathrm{d} \mathrm{vec} {\mathbf{Z}}^*&=[\mathrm{d}Z_{11}^*,\ldots ,\mathrm{d}Z_{m1}^*, \ldots ,\mathrm{d}Z_{1n}^*,\ldots ,\mathrm{d}Z_{mn}^*]^T,\\ \mathrm{d} \mathrm{vec} \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)&=[\mathrm{d}f_{11}(\mathbf{Z},{\mathbf{Z}}^*),\ldots , \mathrm{d}f_{p1}(\mathbf{Z}, {\mathbf{Z}}^*), \ldots ,\mathrm{d}f_{1q}(\mathbf{Z}, {\mathbf{Z}}^*),\\ & \qquad  \ldots ,\mathrm{d}f_{pq}(\mathbf{Z},{\mathbf{Z}}^*)]^T, \end{aligned} $$
$$\displaystyle \begin{aligned} \mathbf{A}={\mathbf{J}}_{\mathrm{vec}\,\mathbf{Z}}^{\,}=\frac{\partial\,\mathrm{vec}\,\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,\mathbf{Z})^T}\in \mathbb{C}^{pq\times mn} \end{aligned}$$
and
$$\displaystyle \begin{aligned} \mathbf{B}={\mathbf{J}}_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,}=\frac{\partial\,\mathrm{vec}\,\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)}{\partial (\mathrm{vec}\,{\mathbf{Z}}^*)^T}\in \mathbb{C}^{pq\times mn}. \end{aligned}$$
The complex gradient matrix and the complex conjugate gradient matrix of the matrix function F(Z, Z*) are, respectively, defined as

$$\displaystyle \begin{aligned} \nabla_{\mathrm{vec}\,\mathbf{Z}}^{\,}\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)&=\frac{\partial (\mathrm{vec}\,\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*))^T} {\partial\,\mathrm{vec}\,\mathbf{Z}}=({\mathbf{J}}_{\mathrm{vec}\,\mathbf{Z}}^{\,} )^T, \end{aligned} $$
(2.3.52)

$$\displaystyle \begin{aligned} \nabla_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,}\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)&=\frac{\partial (\mathrm{vec}\,\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*))^T} {\partial\,\mathrm{vec}\,{\mathbf{Z}}^*}=({\mathbf{J}}_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,} )^T. \end{aligned} $$
(2.3.53)
Proposition 2.3
For a complex matrix function 
$$\mathbf {F}(\mathbf {Z},{\mathbf {Z}}^*)\in \mathbb {C}^{p\times q}$$
with 
$$\mathbf {Z},{\mathbf {Z}}^* \in \mathbb {C}^{m\times n}$$
, we have its complex Jacobian matrix and the complex gradient matrix:

$$\displaystyle \begin{aligned} \mathrm{d} \mathrm{vec} \mathbf{F}( \mathbf{Z},{\mathbf{Z}}^*)&=\mathbf{A} \mathrm{d} \mathrm{vec}\,\mathbf{Z}+\mathbf{B} \mathrm{d} \mathrm{vec} {\mathbf{Z}}^*~\Leftrightarrow~\left\{ \begin{aligned} &{\mathbf{J}}_{\mathrm{vec}\mathbf{Z}}^{\,}=\mathbf{A},\\ &{\mathbf{J}}_{\mathrm{vec}{\mathbf{Z}}^*}^{\,} =\mathbf{B},\end{aligned}\right. \end{aligned} $$
(2.3.54)

$$\displaystyle \begin{aligned} \mathrm{d} \mathrm{vec} \mathbf{F}( \mathbf{Z},{\mathbf{Z}}^*)&=\mathbf{A} \mathrm{d} \mathrm{vec} \mathbf{Z}+ \mathbf{B} \mathrm{d} \mathrm{vec} {\mathbf{Z}}^*~\Leftrightarrow~ \left\{\begin{aligned} &\nabla_{\mathrm{vec} \mathbf{Z}}^{\,} \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)={\mathbf{A}}^T,\\ &\nabla_{\mathrm{vec}{\mathbf{Z}}^*}^{\,} \mathbf{F}(\mathbf{Z},\mathbf{ Z}^*)={\mathbf{B}}^T.\end{aligned}\right. \end{aligned} $$
(2.3.55)
If dF(Z, Z*) = A(dZ)B + C(dZ*)D, then the vectorization result is given by

$$\displaystyle \begin{aligned} \mathrm{d}\mathrm{vec}\,\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)=({\mathbf{B}}^T\otimes \mathbf{A})\mathrm{d}\mathrm{vec}\, \mathbf{Z}+({\mathbf{D}}^T\otimes \mathbf{ C})\mathrm{d} \mathrm{vec}{\mathbf{Z}}^*. \end{aligned}$$
By Proposition 2.3 we have the following identification formula:

$$\displaystyle \begin{aligned} \mathrm{d} \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)=\mathbf{A} (\mathrm{d}\mathbf{Z})\mathbf{B}+\mathbf{C} (\mathrm{d}{\mathbf{Z}}^*)\mathbf{D}~\Leftrightarrow~\bigg \{\begin{aligned} &{\mathbf{J}}_{\mathrm{vec}\,\mathbf{Z}}^{\,} ={\mathbf{B}}^T\otimes \mathbf{A},\\ &{\mathbf{J}}_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,} ={\mathbf{D}}^T\otimes\mathbf{C}.\end{aligned} \end{aligned} $$
(2.3.56)
Similarly, if dF(Z, Z*) = A(dZ)^T B + C(dZ*)^T D, then we have the result

$$\displaystyle \begin{aligned} \mathrm{d} \mathrm{vec} \mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)&=({\mathbf{B}}^T\otimes \mathbf{A}) \mathrm{d} \mathrm{vec} {\mathbf{Z}}^T+({\mathbf{D}}^T\otimes \mathbf{C}) \mathrm{d} \mathrm{vec} {\mathbf{Z}}^H\\ &=({\mathbf{B}}^T\otimes \mathbf{A}){\mathbf{K}}_{mn}\mathrm{d}\mathrm{vec}\mathbf{Z}+({\mathbf{D}}^T\otimes \mathbf{C}){\mathbf{K}}_{mn} \mathrm{d} \mathrm{vec} {\mathbf{Z}}^*, \end{aligned} $$
where we have used the vectorization property 
$$\mathrm {vec} {\mathbf {X}}_{m\times n}^T={\mathbf {K}}_{mn}\mathrm {vec} \mathbf {X}$$
. By Proposition 2.3, the following identification formula is obtained:

$$\displaystyle \begin{aligned} \mathrm{d}\mathbf{F}(\mathbf{Z},{\mathbf{Z}}^*)=\mathbf{A}(\mathrm{d}\mathbf{Z})^T\mathbf{B}+\mathbf{C}(\mathrm{d}{\mathbf{Z}}^*)^T\mathbf{D}~\Leftrightarrow~\begin{cases} {\mathbf{J}}_{\mathrm{vec}\, \mathbf{Z}}^{\,} =({\mathbf{B}}^T\otimes \mathbf{A}){\mathbf{K}}_{mn},\\ {\mathbf{J}}_{\mathrm{vec}\,{\mathbf{Z}}^*}^{\,} =({\mathbf{D}}^T\otimes \mathbf{C})\mathbf{ K}_{mn}.\end{cases} \end{aligned} $$
(2.3.57)

The above equations show that, as in the vector case, the key to identifying the gradient matrix and conjugate gradient matrix of a matrix function F(Z, Z*) is to write its matrix differential in one of the canonical forms dF(Z, Z*) = A(dZ)B + C(dZ*)D or dF(Z, Z*) = A(dZ)^T B + C(dZ*)^T D.
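The identification rules (2.3.56) and (2.3.57) lend themselves to a direct numerical check (a NumPy sketch; the linear map F(Z) = AZB, the dimensions, and the explicit construction of the commutation matrix K_{mn} are illustrative assumptions). Column-stacking vec(·) corresponds to a Fortran-order reshape:

```python
import numpy as np

rng = np.random.default_rng(4)
p, m, n, q = 2, 3, 4, 5
A = rng.normal(size=(p, m)) + 1j*rng.normal(size=(p, m))
B = rng.normal(size=(n, q)) + 1j*rng.normal(size=(n, q))
Z = rng.normal(size=(m, n)) + 1j*rng.normal(size=(m, n))
dZ = 1e-8*(rng.normal(size=(m, n)) + 1j*rng.normal(size=(m, n)))

vec = lambda X: X.reshape(-1, order='F')   # column-stacking vectorization

# F(Z) = A Z B is linear, so dF = A (dZ) B exactly, and by (2.3.56)
# the Jacobian matrix is J_vecZ = B^T (Kronecker) A:
dF = A @ (Z + dZ) @ B - A @ Z @ B
J = np.kron(B.T, A)
assert np.allclose(vec(dF), J @ vec(dZ), atol=1e-12)

# Commutation matrix K_{mn}: vec(X^T) = K_{mn} vec(X), as used in (2.3.57)
K = np.zeros((m*n, m*n))
for i in range(m):
    for j in range(n):
        K[j + i*n, i + j*m] = 1.0   # X_ij moves from slot i+j*m to slot j+i*n
assert np.allclose(vec(Z.T), K @ vec(Z))
```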

Table 2.10 lists the corresponding relationships between the first-order complex matrix differential and the complex Jacobian matrix, where 
$$\mathbf {z}\in \mathbb {C}^m,\mathbf {Z}\in \mathbb {C}^{m\times n},\mathbf {F}\in \mathbb {C}^{p\times q}$$
.
Table 2.10

Complex matrix differential and complex Jacobian matrix

Function | Matrix differential | Jacobian matrix
f(z, z*) | df(z, z*) = a dz + b dz* | ∂f∕∂z = a, ∂f∕∂z* = b
f(z, z*) | df(z, z*) = a^T dz + b^T dz* | ∂f∕∂z^T = a^T, ∂f∕∂z^H = b^T
f(Z, Z*) | df(Z, Z*) = tr(A dZ + B dZ*) | ∂f∕∂Z^T = A, ∂f∕∂Z^H = B
F(Z, Z*) | d vec F = A d vec Z + B d vec Z* | ∂ vec F∕∂(vec Z)^T = A, ∂ vec F∕∂(vec Z*)^T = B
F(Z, Z*) | dF = A(dZ)B + C(dZ*)D | ∂ vec F∕∂(vec Z)^T = B^T ⊗ A, ∂ vec F∕∂(vec Z*)^T = D^T ⊗ C
F(Z, Z*) | dF = A(dZ)^T B + C(dZ*)^T D | ∂ vec F∕∂(vec Z)^T = (B^T ⊗ A)K_{mn}, ∂ vec F∕∂(vec Z*)^T = (D^T ⊗ C)K_{mn}
Brief Summary of This Chapter

This chapter has presented the matrix differential for real and complex matrix functions. The matrix differential is a powerful tool for finding the gradient vectors/matrices that drive the updates in optimization, as we will see in the next chapter.