© Springer Nature Switzerland AG 2019
Tho Le-Ngoc and Ruikai Mai, Hybrid Massive MIMO Precoding in Cloud-RAN, Wireless Networks, https://doi.org/10.1007/978-3-030-02158-0_2

2. Background

Tho Le-Ngoc (1) and Ruikai Mai (1)
(1) Department of Electrical and Computer Engineering, McGill University, Montréal, QC, Canada

From a signal processing point of view, the novel aspect of hybrid precoding/combining in comparison with the conventional fully digital precoding/combining lies in the introduction of the precoding/combining stage in the RF domain as a result of driving a large-scale antenna array by a limited number of RF chains. By treating the cascade of the RF stage and multiple-input multiple-output (MIMO) channel as the effective channel, the system model of massive MIMO is analogous to the conventional counterpart, where various solution techniques have been proposed for the single-user and multi-user scenarios. On the other hand, problem formulation and optimization of the RF component depend on the choice of the baseband component in a joint RF-baseband design. Therefore, before we embark on the study of hybrid precoding/combining design, we give a brief review of the related background and recent developments in this chapter, which serve as the basis for our proposed research in the subsequent chapters.

2.1 Linear Precoding and Combining for Point-to-Point MIMO

2.1.1 Baseband Digital Solutions for Conventional MIMO

Let us consider a point-to-point/single-user (SU) communication link, where a transmitter with N t transmit antennas sends 
$$N_{\mathrm {s}}=\min \left (N_{\mathrm {t}},N_{\mathrm {r}}\right )$$
data streams to a receiver with N r receive antennas. Specifically, the receive signal 
$$\mathbf {y}\in \mathbb {C}^{N_{\mathrm {r}}}$$
is given as

$$\displaystyle \begin{aligned} \mathbf{y}=\mathbf{H}\mathbf{x}+\mathbf{n}=\mathbf{H}\mathbf{F}\mathbf{s}+\mathbf{n}, \end{aligned}$$
where 
$$\mathbf {H}\in \mathbb {C}^{N_{\mathrm {r}}\times N_{\mathrm {t}}}$$
is the flat-fading MIMO channel matrix with (i, j)th entry representing the channel coefficient between the ith receive antenna and the jth transmit antenna, 
$$\mathbf {x}=\mathbf {F}\mathbf {s}\in \mathbb {C}^{N_{\mathrm {t}}}$$
is the linearly precoded transmit signal with 
$$\mathbf {F}\in \mathbb {C}^{N_{\mathrm {t}}\times N_{\mathrm {s}}}$$
as the linear precoder and 
$$\mathbf {s}\sim \mathcal {CN}(\mathbf {0},{\mathbf {I}}_{N_{\mathrm {s}}})$$
as the Gaussian-coded data streams, and 
$$\mathbf {n}\sim \mathcal {CN}(\mathbf {0},\sigma _{n}^{2}{\mathbf {I}}_{N_{\mathrm {r}}})$$
is zero-mean, spatially white Gaussian noise with variance 
$$\sigma _{n}^{2}$$
. At the output of a linear receive combiner 
$$\mathbf {W}\in \mathbb {C}^{N_{\mathrm {r}}\times N_{\mathrm {s}}}$$
, the signal estimate is given as

$$\displaystyle \begin{aligned} \hat{\mathbf{s}}={\mathbf{W}}^{\mathrm{H}}\mathbf{H}\mathbf{F}\mathbf{s}+{\mathbf{W}}^{\mathrm{H}}\mathbf{n}={\mathbf{W}}^{\mathrm{H}}\mathbf{U}\boldsymbol{\varLambda}{\mathbf{V}}^{\mathrm{H}}\mathbf{F}\mathbf{s}+{\mathbf{W}}^{\mathrm{H}}\mathbf{n}, \end{aligned}$$
where the second equality comes from the singular value decomposition (SVD) H = UΛV^H. From this relation, if the linear precoder F and linear combiner W are chosen as the N s right and left singular vectors corresponding to the N s dominant singular values of the channel H, respectively, the effective channel W^H HF becomes diagonal. In other words, the Gaussian vector channel is converted into parallel non-interfering Gaussian scalar sub-channels. In combination with power allocation, it was shown in [1] that the channel capacity can be achieved. Equipped with this insight, joint design with respect to other criteria such as minimum mean square error (MMSE) and bit error rate (BER) minimization was also studied in [2–6]. In particular, with the definition of the mean square error (MSE) matrix as

$$\displaystyle \begin{aligned} \mathbf{E}=\mathbb{E}\big[\left(\hat{\mathbf{s}}-\mathbf{s}\right)\left(\hat{\mathbf{s}}-\mathbf{s}\right)^{\mathrm{H}}\big]=\left({\mathbf{W}}^{\mathrm{H}}\mathbf{H}\mathbf{F}-{\mathbf{I}}_{N_{\mathrm{s}}}\right)\left({\mathbf{W}}^{\mathrm{H}}\mathbf{H}\mathbf{F}-{\mathbf{I}}_{N_{\mathrm{s}}}\right)^{\mathrm{H}}+\sigma_{n}^{2}{\mathbf{W}}^{\mathrm{H}}\mathbf{W}, \end{aligned}$$
where the ith diagonal element of E, i.e., E_{i,i}, represents the MSE of the ith data stream s_i, the authors in [4] recognized that popular design criteria such as mutual information, sum MSE, and minimum BER are in fact either Schur-convex or Schur-concave functions of the MSEs, i.e., diag(E). For such criteria, the optimal linear precoder, used together with a linear MMSE equalizer, must be chosen such that the effective channel W^H HF is diagonalized (up to a unitary rotation in the Schur-convex case). For both cases, the optimal linear precoder is derived from the right singular vectors of the MIMO channel H while the optimal linear MMSE combiner is the well-known linear Wiener filter, i.e.,

$$\displaystyle \begin{aligned} {\mathbf{W}}_{\mathrm{LMMSE}}=\left(\mathbf{H}\mathbf{F}{\mathbf{F}}^{\mathrm{H}}{\mathbf{H}}^{\mathrm{H}}+\sigma_{n}^{2}{\mathbf{I}}_{N_{\mathrm{r}}}\right)^{-1}\mathbf{H}\mathbf{F}. \end{aligned}$$
In doing this, matrix optimization problems are reduced to convex vector optimization problems, which can be efficiently solved by waterfilling-based algorithms.
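To make the diagonalizing structure concrete, the following NumPy sketch builds the SVD-based precoder with water-filling power allocation for the capacity criterion and checks that both the SVD combiner and the linear Wiener filter yield a diagonal effective channel. The dimensions, the random channel, and the helper name `waterfill` are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def waterfill(gains, total_power):
    """Water-filling over parallel sub-channels (illustrative helper).

    gains: effective gains lambda_i^2 / sigma_n^2, sorted in descending order.
    Returns powers p_i = max(mu - 1/gains_i, 0) that sum to total_power.
    """
    gains = np.asarray(gains, dtype=float)
    for k in range(len(gains), 0, -1):                      # try the k strongest sub-channels
        mu = (total_power + np.sum(1.0 / gains[:k])) / k    # candidate water level
        p = mu - 1.0 / gains[:k]
        if p[-1] >= 0:                                      # weakest active stream is feasible
            return np.concatenate([p, np.zeros(len(gains) - k)])
    return np.zeros(len(gains))

# Assumed toy setup: i.i.d. Rayleigh channel, Nt = 4, Nr = 3.
rng = np.random.default_rng(0)
Nt, Nr, Pt, sigma2 = 4, 3, 10.0, 1.0
Ns = min(Nt, Nr)
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

# H = U diag(lam) V^H; singular values are returned in descending order.
U, lam, Vh = np.linalg.svd(H, full_matrices=False)
p = waterfill(lam**2 / sigma2, Pt)

F = Vh.conj().T[:, :Ns] * np.sqrt(p)      # precoder: right singular vectors, power-loaded
W_svd = U[:, :Ns]                          # combiner: left singular vectors
H_eff = W_svd.conj().T @ H @ F             # diagonal effective channel

# Linear Wiener filter W = (H F F^H H^H + sigma^2 I)^{-1} H F for the same precoder.
HF = H @ F
W_lmmse = np.linalg.solve(HF @ HF.conj().T + sigma2 * np.eye(Nr), HF)
G = W_lmmse.conj().T @ HF                  # also diagonal

capacity = np.sum(np.log2(1 + p * lam**2 / sigma2))   # bits per channel use
```

With this structure the matrix problem reduces to choosing the power vector `p`, which is the vector optimization mentioned above.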

2.1.2 Progressive Baseband Digital Solutions for Conventional MIMO ARQ

In order to improve link reliability, modern wireless communications systems are often equipped with hybrid automatic repeat request (ARQ) mechanisms in practice. As mentioned in the previous section, linear MIMO precoding and combining are useful techniques for improving the efficiency and reliability of the system by exploiting the inherent spatial diversity created by multiple antennas. On the other hand, the combination of ARQ with precoding design makes it possible to exploit diversity in the temporal dimension. Such an idea is illustrated in Fig. 2.1, where each branch represents one round of ARQ retransmission, and the combiner performs linear combination across all retransmissions. In particular, the receive signal for the mth round of ARQ retransmission 
$${\mathbf {y}}_{m}\in \mathbb {C}^{N_{\mathrm {r}}}$$
reads as

$$\displaystyle \begin{aligned} {\mathbf{y}}_{m}={\mathbf{H}}_{m}{\mathbf{F}}_{m}\mathbf{s}+{\mathbf{n}}_{m}, \end{aligned}$$
Fig. 2.1 Linear precoding and combining for M retransmissions of a packet

where H m is the MIMO channel realization during the mth round of retransmission, F m is the linear precoder, and 
$${\mathbf {n}}_{m}\sim \mathcal {CN}(\mathbf {0},\sigma _{n}^{2}{\mathbf {I}}_{N_{\mathrm {r}}})$$
is the zero-mean, temporally and spatially white Gaussian noise with variance 
$$\sigma _{n}^{2}$$
, i.e., 
$$\mathbb {E}[{\mathbf {n}}_{m}{\mathbf {n}}_{l}^{\mathrm {H}}]=\delta _{ml}\sigma _{n}^{2}{\mathbf {I}}_{N_{\mathrm {r}}}$$
. When viewed independently, each round of ARQ retransmission is nothing but a conventional point-to-point MIMO system as presented in the previous section. Following the observation that each retransmission is likely to experience independent channel fading, the current channel realization H m is modeled as independent of the past realizations H 1, …, H m−1. If the linear precoder F m is designed in accordance with the current channel realization H m while taking into account past transmissions as represented by F 1, …, F m−1, this will allow us to leverage the temporal diversity and therefore further performance improvement can be expected. Towards this end, the objective function can be formulated as

$$\displaystyle \begin{aligned} \mathcal{I}\left(\mathbf{s};{\mathbf{y}}_{1},{\mathbf{y}}_{2},\ldots,{\mathbf{y}}_{m}\right)=\log_{2}\det\left({\mathbf{I}}_{N_{\mathrm{s}}}+\sum_{i=1}^{m-1}{\mathbf{F}}_{i}^{\mathrm{H}}{\mathbf{H}}_{i}^{\mathrm{H}}{\mathbf{H}}_{i}{\mathbf{F}}_{i}+{\mathbf{F}}_{m}^{\mathrm{H}}{\mathbf{H}}_{m}^{\mathrm{H}}{\mathbf{H}}_{m}{\mathbf{F}}_{m}\right). \end{aligned}$$
The unique aspect of MIMO ARQ is that past transmissions cannot be changed while future retransmissions might not be necessary. Therefore, it makes sense to optimize the precoder F m only based on the current channel state information (CSI) H m for each (re-)transmission while taking into account previous retransmissions 
$$\sum _{i=1}^{m-1}{\mathbf {F}}_{i}^{\mathrm {H}}{\mathbf {H}}_{i}^{\mathrm {H}}{\mathbf {H}}_{i}{\mathbf {F}}_{i}$$
. Accordingly, the problem is formulated as

$$\displaystyle \begin{aligned} \sup_{{\mathbf{F}}_{m}}\;\mathcal{I}\left(\mathbf{s};{\mathbf{y}}_{1},{\mathbf{y}}_{2},\ldots,{\mathbf{y}}_{m}\right)\quad \mathrm{s.t.}\;\mathrm{tr}\left({\mathbf{F}}_{m}{\mathbf{F}}_{m}^{\mathrm{H}}\right)\leq P_{\mathrm{t}}. \end{aligned}$$
This is the basic idea behind sequential precoder design with respect to hybrid ARQ where the same data streams s are resent for each round of retransmission. The work in [7] solved such an optimization problem by showing that the optimal design must diagonalize the summation 
$$\sum _{i=1}^{m}{\mathbf {F}}_{i}^{\mathrm {H}}{\mathbf {H}}_{i}^{\mathrm {H}}{\mathbf {H}}_{i}{\mathbf {F}}_{i}$$
. The issue of sum MSE minimization was addressed in [8]. Instead of the highly complex ML receiver as implied by the expression of 
$$\mathcal {I}(\mathbf {s};{\mathbf {y}}_{1},{\mathbf {y}}_{2},\ldots ,{\mathbf {y}}_{m})$$
, a linear precoder in conjunction with nonlinear MMSE-decision feedback equalizer (DFE) was derived in [9]. By relaxing the assumption of full CSI to partial CSI, i.e., channel transmit covariance, the objective of ergodic mutual information, i.e.,
../images/470489_1_En_2_Chapter/470489_1_En_2_Equh_HTML.png
is formulated and solved suboptimally in [10]. Extension to non-Gaussian inputs, i.e., practical modulation schemes, for mutual information maximization was made in [11].
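The sequential structure of the objective can be sketched in a few lines of NumPy. The code below evaluates the accumulated mutual information expression of this section round by round, carrying the contribution of past (fixed) transmissions forward as a single matrix. It follows the expression as written above, which implicitly assumes normalized noise. The per-round precoder used here (right singular vectors with equal power) is only a naive placeholder and is not the optimal design of [7]; the function name `accumulated_mi` is hypothetical.

```python
import numpy as np

def accumulated_mi(history, H_m, F_m):
    """Evaluate log2 det(I + history + F_m^H H_m^H H_m F_m).

    history is the sum of F_i^H H_i^H H_i F_i over past rounds i < m.
    Returns the mutual information value and the updated history matrix.
    """
    G = H_m @ F_m
    A = history + G.conj().T @ G
    _, logdet = np.linalg.slogdet(np.eye(A.shape[0]) + A)   # natural-log determinant
    return logdet / np.log(2), A

rng = np.random.default_rng(1)
Nt = Nr = Ns = 2
Pt, M = 1.0, 3

history = np.zeros((Ns, Ns), dtype=complex)
mi_per_round = []
for m in range(M):
    # Independent channel realization for each retransmission.
    H_m = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
    # Placeholder precoder (assumption): right singular vectors, equal power, tr(F F^H) = Pt.
    _, _, Vh = np.linalg.svd(H_m)
    F_m = Vh.conj().T[:, :Ns] * np.sqrt(Pt / Ns)
    mi, history = accumulated_mi(history, H_m, F_m)
    mi_per_round.append(mi)
```

Since each round adds a positive semidefinite term inside the determinant, the accumulated value never decreases with additional retransmissions; an optimized F m would exploit the eigenstructure of `history` rather than ignore it.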

2.1.3 Hybrid RF-Baseband Solutions for Massive MIMO

Let us shift our focus to a narrowband massive MIMO system as illustrated in Fig. 2.2, where the number of transmit antennas N t and that of receive antennas N r are on the order of tens or even hundreds, i.e., N t, N r ≫ 1. As explained in Sect. 1.2.1, because of high hardware complexity and power consumption, the fully digital architecture is not practically realizable for massive MIMO. Instead, the large-scale antenna arrays are driven by a limited number of n t RF chains at the transmitter and n r RF chains at the receiver with n t ≪ N t and n r ≪ N r. In this case, the number of data streams that can be physically supported must satisfy 
$$N_{\mathrm {s}}\leq \min (n_{\mathrm {t}},n_{\mathrm {r}})$$
. The received signal reads as

$$\displaystyle \begin{aligned} \mathbf{y}=\mathbf{H}\mathbf{x}+\mathbf{n}, \end{aligned}$$
where 
$$\mathbf {H}\in \mathbb {C}^{N_{\mathrm {r}}\times N_{\mathrm {t}}}$$
is the block fading channel matrix normalized as 
$$\mathbb {E}[||\mathbf {H}||{ }_{\mathrm {F}}^{2}]=N_{\mathrm {t}}N_{\mathrm {r}}$$
, 
$$\mathbf {x}\in \mathbb {C}^{N_{\mathrm {t}}}$$
is the transmitted signal satisfying the normalized power constraint 
$$\mathrm {tr}(\mathbb {E}[\mathbf {x}{\mathbf {x}}^{\mathrm {H}}])\leq P_{\mathrm {t}}$$
, and 
$$\mathbf {n}\sim \mathcal {CN}(\mathbf {0},\sigma _{n}^{2}{\mathbf {I}}_{N_{\mathrm {r}}})$$
is the zero-mean, spatially white Gaussian noise with variance 
$$\sigma _{n}^{2}$$
. In particular,

$$\displaystyle \begin{aligned} \mathbf{x}=\mathbf{F}\mathbf{s}={\mathbf{F}}_{\mathrm{RF}}{\mathbf{F}}_{\mathrm{BB}}\mathbf{s}, \end{aligned}$$
Fig. 2.2 Hybrid precoding and combining with RF phase-shifters

where F = F RF F BB is a linear precoder composed of two stages: an RF analog component 
$${\mathbf {F}}_{\mathrm {RF}}\in \mathbb {C}^{N_{\mathrm {t}}\times n_{\mathrm {t}}}$$
and a baseband digital component 
$${\mathbf {F}}_{\mathrm {BB}}\in \mathbb {C}^{n_{\mathrm {t}}\times N_{\mathrm {s}}}$$
, and 
$$\mathbf {s}\sim \mathcal {CN}(\mathbf {0},{\mathbf {I}}_{N_{\mathrm {s}}})$$
is the normalized data stream. The signal estimate is given by

$$\displaystyle \begin{aligned} \hat{\mathbf{s}}={\mathbf{W}}^{\mathrm{H}}\mathbf{y}={\mathbf{W}}_{\mathrm{BB}}^{\mathrm{H}}{\mathbf{W}}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{y}, \end{aligned}$$
where W = W RF W BB is a linear two-stage combiner consisting of an RF analog combiner 
$${\mathbf {W}}_{\mathrm {RF}}\in \mathbb {C}^{N_{\mathrm {r}}\times n_{\mathrm {r}}}$$
and a baseband digital combiner 
$${\mathbf {W}}_{\mathrm {BB}}\in \mathbb {C}^{n_{\mathrm {r}}\times N_{\mathrm {s}}}$$
. To reduce hardware complexity, RF elements might consist of only phase-shifters. In other words, the elements of the RF precoder F RF and RF combiner W RF are of constant modulus, i.e., 
$$\vert [{\mathbf {F}}_{\mathrm {RF}}]_{i,j}\vert =\frac {1}{\sqrt {N_{\mathrm {t}}}},\,i=1,\ldots ,N_{\mathrm {t}},j=1,\ldots ,n_{\mathrm {t}}$$
and 
$$\vert [{\mathbf {W}}_{\mathrm {RF}}]_{p,q}\vert =\frac {1}{\sqrt {N_{\mathrm {r}}}},p=1,\ldots ,N_{\mathrm {r}},q=1,\ldots ,n_{\mathrm {r}}$$
. On the assumption of perfect CSI, one noticeable difference in the formulation of hybrid precoding and combining from its fully digital counterpart in Sect. 2.1.1 is the introduction of the RF stages F RF and W RF. Although by treating 
$${\mathbf {W}}_{\mathrm {RF}}^{\mathrm {H}}\mathbf {H}{\mathbf {F}}_{\mathrm {RF}}$$
as the effective MIMO channel, previous baseband digital solutions for conventional MIMO are readily applicable, the RF components nonetheless need to be carefully optimized for the best performance. For example, optimal power loading schemes for fully digital solutions are unlikely to remain optimal without taking into account the presence of the RF precoder since 
$$\mathrm {tr}(\mathbb {E}[\mathbf {x}{\mathbf {x}}^{\mathrm {H}}])=\Vert {\mathbf {F}}_{\mathrm {RF}}{\mathbf {F}}_{\mathrm {BB}}\Vert _{\mathrm {F}}^{2}\leq P_{\mathrm {t}}$$
. In combination with the nonconvex constant-modulus constraints, such problems are in fact challenging to solve.
To jointly optimize the RF precoder and combiner, one may attempt to maximize the capacity of the effective channel. This motivates the consideration of the mutual information as the criterion, which is defined between the baseband-precoded signal and the signal estimate after linear RF combining, i.e.,

$$\displaystyle \begin{aligned} \mathcal{I}_{\mathrm{RF}}=\log_{2}\det\left({\mathbf{I}}_{N_{\mathrm{s}}}+\frac{P_{\mathrm{t}}}{N_{\mathrm{s}}}{\mathbf{F}}_{\mathrm{RF}}^{\mathrm{H}}{\mathbf{H}}^{\mathrm{H}}{\mathbf{W}}_{\mathrm{RF}}\left({\mathbf{W}}_{\mathrm{RF}}^{\mathrm{H}}{\mathbf{W}}_{\mathrm{RF}}\right)^{-1}{\mathbf{W}}_{\mathrm{RF}}^{\mathrm{H}}\mathbf{H}{\mathbf{F}}_{\mathrm{RF}}\right), \end{aligned}$$
with equal power allocation. It was shown in [12] that the optimal RF precoder (combiner) is related to the right (left) singular vectors corresponding to the n t (n r) largest singular values of the MIMO channel H. Built upon this insight, the constant-modulus RF precoder and combiner were further derived by extracting the phases of their optimal counterparts. In doing this, the solutions can adapt flexibly to the instantaneous channel variation. The downside is that the optimal solutions need to be computed first based on the perfect CSI. Such requirements were relaxed in [13] where the hybrid combining aspect was addressed with respect to the channel receive correlation. On the other hand, when strong spatial correlation is present in the MIMO channel, one simplifying and effective alternative is to reduce the design of RF transmit and receive phase-shifting to beam selection from the N t-dimensional discrete Fourier transform (DFT) codebook 
$$\mathcal {D}_{\mathrm {t}}=[\frac {1}{\sqrt {N_{\mathrm {t}}}}e^{-\frac {\jmath 2\pi mn}{N_{\mathrm {t}}}}],m,n=0,\ldots ,N_{\mathrm {t}}-1$$
and the N r-dimensional DFT codebook 
$$\mathcal {D}_{\mathrm {r}}=[\frac {1}{\sqrt {N_{\mathrm {r}}}}e^{-\frac {\jmath 2\pi pq}{N_{\mathrm {r}}}}],p,q=0,\ldots ,N_{\mathrm {r}}-1$$
, respectively [14]. Different from the independently and identically distributed (i.i.d.) Rayleigh fading channel which corresponds to a rich scattering environment, a strongly correlated channel experiences limited scattering, and thus has a directional power spectrum in the angle domain. Intuitively, if the chosen DFT beams are well aligned with the directional scattering, it is unlikely for such an RF beamformer to cause a severe loss of channel power. Nonetheless, since the DFT beamforming directions correspond to uniformly quantized angles of arrival (AoAs)/angles of departure (AoDs), phase-shifting adaptive to instantaneous CSI as in [12] is more advantageous as the channel becomes increasingly independent. Besides, in a multi-user setting, the coarse resolution provided by DFT might not be sufficient to achieve good spatial separation between users or good coverage when the users are not located in a favorable direction.
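The two RF design options discussed above can be illustrated with a short NumPy sketch: phase extraction from the dominant singular vectors, and beam selection from the DFT codebook. The array sizes and the random channel are assumptions, and the beam selection rule (keep the n t beams that capture the most channel power) is one plausible criterion chosen for illustration rather than the rule of [14].

```python
import numpy as np

rng = np.random.default_rng(2)
Nt, Nr, nt, nr = 16, 8, 2, 2          # antennas and RF chains (assumed values)
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

# Option 1: phase extraction from the dominant singular vectors.
U, lam, Vh = np.linalg.svd(H)
V_dom = Vh.conj().T[:, :nt]            # right singular vectors of the nt largest singular values
U_dom = U[:, :nr]                      # left singular vectors of the nr largest singular values
F_rf = np.exp(1j * np.angle(V_dom)) / np.sqrt(Nt)   # keep phases only, modulus 1/sqrt(Nt)
W_rf = np.exp(1j * np.angle(U_dom)) / np.sqrt(Nr)   # keep phases only, modulus 1/sqrt(Nr)
H_eff = W_rf.conj().T @ H @ F_rf       # nr x nt effective channel seen by the baseband

# Option 2: beam selection from the N-dimensional DFT codebook.
def dft_codebook(N):
    idx = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

D_t = dft_codebook(Nt)
beam_power = np.linalg.norm(H @ D_t, axis=0) ** 2    # channel power captured by each beam
best = np.argsort(beam_power)[::-1][:nt]             # indices of the nt strongest beams
F_dft = D_t[:, best]
```

Both `F_rf` and `F_dft` satisfy the constant-modulus constraint by construction; they differ in that the former requires the SVD of the instantaneous channel while the latter only requires ranking a fixed set of beams.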
The work in [15] treated perfect CSI-based hybrid precoding and combining with RF phase-shifters in the context of mmWave massive MIMO systems. Subject to the power constraint 
$$\Vert {\mathbf {F}}_{\mathrm {RF}}{\mathbf {F}}_{\mathrm {BB}}\Vert _{\mathrm {F}}^{2}\leq P_{\mathrm {t}}$$
, the hybrid precoder and combiner are obtained by optimizing the mutual information 
$$\mathcal {I}\left (\mathbf {s};\mathbf {y}\right )$$
, i.e.,
../images/470489_1_En_2_Chapter/470489_1_En_2_Equm_HTML.png
where 
$${\mathbf {R}}_{n}\triangleq \sigma _{n}^{2}{\mathbf {W}}_{\mathrm {BB}}^{\mathrm {H}}{\mathbf {W}}_{\mathrm {RF}}^{\mathrm {H}}{\mathbf {W}}_{\mathrm {RF}}{\mathbf {W}}_{\mathrm {BB}}$$
is the noise correlation after hybrid combining. In contrast with the study in [12], where the RF precoder and combiner were optimized with respect to the RF-processed channel capacity, the effects of baseband precoding and combining are explicitly incorporated in seeking a joint RF-baseband design. Due to the difficulty posed by the nonconvex constant-modulus constraint on the RF elements, joint optimization of the hybrid precoder {F RF, F BB} and hybrid combiner {W RF, W BB} is intractable. Instead, the authors focused on joint optimization of RF-baseband precoding (combining) by abstracting the effect of hybrid combining (precoding). In view of the constant-modulus property of the array steering vectors, the design of the RF precoder (RF combiner) was formulated as selecting the best array steering vectors among those corresponding to the path AoDs (AoAs). In combination with the baseband digital precoder (combiner), such a hybrid design was cast as matrix reconstruction of the fully digital optimal solution for conventional MIMO as in Sect. 2.1.1, and a near-optimal precoding (combining) solution was found via an orthogonal matching pursuit algorithm. Although this technique has low complexity, one immediate drawback is that extensive information about the AoDs (AoAs) of all the propagation paths is required. In an attempt to address this practical limitation, the idea of sparse approximation was further applied to channel estimation in [16]. The sub-optimal restriction of RF elements to the space of array steering vectors incurs severe performance loss under certain channel conditions, and therefore, a general formulation of constant-modulus RF design is needed. Following the idea of matrix reconstruction, improved hybrid solutions were devised in [17] using Grassmann manifold optimization and in [18] based on approximate joint diagonalization of matrices. To obviate the need for computing the fully digital optimal solution, a heuristic method was proposed in [19].
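The matrix-reconstruction view of hybrid precoding can be sketched as follows. The code greedily selects steering vectors from a dictionary to form F RF, refits F BB by least squares after each selection, and finally rescales to meet the power constraint. It is a minimal sketch in the spirit of the orthogonal matching pursuit approach described above, not a reproduction of the algorithm in [15]; the uniform linear array with half-wavelength spacing, the path angles, and all function names are assumptions made for illustration.

```python
import numpy as np

def ula_steering(N, angles):
    """Steering vectors of an N-element ULA (half-wavelength spacing assumed), one column per angle."""
    n = np.arange(N)[:, None]
    return np.exp(1j * np.pi * n * np.sin(angles)[None, :]) / np.sqrt(N)

def omp_hybrid_precoder(F_opt, A_t, n_rf, p_t):
    """Approximate F_opt by F_rf @ F_bb with the columns of F_rf drawn from the dictionary A_t."""
    F_res = F_opt.copy()
    chosen = []
    for _ in range(n_rf):
        corr = A_t.conj().T @ F_res                            # correlate dictionary with residual
        chosen.append(int(np.argmax(np.sum(np.abs(corr) ** 2, axis=1))))
        F_rf = A_t[:, chosen]
        F_bb = np.linalg.lstsq(F_rf, F_opt, rcond=None)[0]     # least-squares baseband refit
        resid = F_opt - F_rf @ F_bb
        norm = np.linalg.norm(resid)
        if norm < 1e-12:                                       # exact reconstruction reached early
            break
        F_res = resid / norm
    F_bb = F_bb * np.sqrt(p_t) / np.linalg.norm(F_rf @ F_bb)   # ||F_rf F_bb||_F^2 = p_t
    return F_rf, F_bb

# Assumed sparse channel with L propagation paths.
rng = np.random.default_rng(4)
Nt, Nr, Ns, n_rf, L = 32, 8, 2, 4, 6
aod = np.deg2rad([-50.0, -30.0, -10.0, 10.0, 30.0, 50.0])
aoa = np.deg2rad([-40.0, -20.0, 0.0, 20.0, 40.0, 60.0])
A_t, A_r = ula_steering(Nt, aod), ula_steering(Nr, aoa)
alpha = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
H = np.sqrt(Nt * Nr / L) * (A_r * alpha) @ A_t.conj().T

F_opt = np.linalg.svd(H)[2].conj().T[:, :Ns]       # fully digital reference (unit-norm columns)
F_rf, F_bb = omp_hybrid_precoder(F_opt, A_t, n_rf, p_t=Ns)
rel_err = np.linalg.norm(F_opt - F_rf @ F_bb) / np.linalg.norm(F_opt)
```

Because the dictionary consists of steering vectors, the constant-modulus constraint on `F_rf` holds automatically, which is the reason for restricting the search to this set. The sketch also makes the stated drawback visible: the dictionary `A_t` must be built from known path angles.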
In [20], multi-layer codebook design for both RF phase-shifters and baseband precoder was considered for wideband orthogonal frequency-division multiplexing (OFDM) systems. In this case, beam-steering in the RF domain is commonly applied across all sub-carriers while baseband precoding is specific to each sub-carrier.

2.2 Multi-User MIMO Precoding

2.2.1 Precoding for Conventional MIMO: From Linear to Nonlinear

As illustrated in Fig. 2.3, let us consider a traditional multi-user (MU)-MIMO broadcast channel (BC) where a base station (BS) with N t transmit antennas is serving K ≤ N t single-antenna users. Let 
$${\mathbf {h}}_{i}^{\mathrm {H}}\in \mathbb {C}^{1\times N_{\mathrm {t}}}$$
represent the channel between the ith user and the BS, and 
$$\mathbf {H}\triangleq [{\mathbf {h}}_{1},\ldots ,{\mathbf {h}}_{K}]^{\mathrm {H}}\in \mathbb {C}^{K\times N_{\mathrm {t}}}$$
. The received signals for all K users can be collectively written as

$$\displaystyle \begin{aligned} \mathbf{y}=\left[{\mathbf{h}}_{1},\ldots,{\mathbf{h}}_{K}\right]^{\mathrm{H}}\mathbf{x}+\mathbf{n}=\mathbf{H}\mathbf{x}+\mathbf{n}, \end{aligned}$$
where 
$$\mathbf {x}\in \mathbb {C}^{N_{\mathrm {t}}}$$
is the transmit signal, and 
$$\mathbf {n}\sim \mathcal {CN}(\mathbf {0},\sigma _{n}^{2}{\mathbf {I}}_{K})$$
is the zero-mean, spatially white Gaussian noise with variance 
$$\sigma _{n}^{2}$$
. It is well known that the sum capacity is achieved by encoding x with dirty paper coding (DPC) [21, 22]. However, DPC involves sophisticated random coding and binning, and is therefore too computationally intensive to be useful in practice. Alternatively, a linear precoding scheme, channel inversion or zero forcing (ZF), is usually preferred because of its low complexity and its relative robustness to channel estimation errors. ZF completely eliminates inter-user interference at the expense of transmit power enhancement. In this case, the transmit signal is given as

$$\displaystyle \begin{aligned} \mathbf{x}=\beta^{-1}{\mathbf{H}}^{\mathrm{H}}\left(\mathbf{H}{\mathbf{H}}^{\mathrm{H}}\right)^{-1}\mathbf{s}, \end{aligned}$$
where β −1 is the power scaling factor to enforce the power constraint 
$$\mathrm {tr}(\mathbb {E}[\mathbf {x}{\mathbf {x}}^{\mathrm {H}}])\leq P_{\mathrm {t}}$$
, and s = [s 1, s 2, …, s K]T represents the data streams for the K users. When multi-user diversity is present in the system, i.e., there is a large number of users with heterogeneous channel realizations, it is possible to achieve the sum channel capacity by simply combining ZF precoding with greedy user selection [23, 24]. However, the effectiveness of such linear schemes might become questionable when users experience the same large-scale channel fading and require an equal-rate performance. If the system is fully loaded 
$$\frac {N_{\mathrm {t}}}{K}=1$$
, i.e., the number of spatially multiplexed users K is equal to the maximum number of data streams N t that can be physically supported, the sum rate unfortunately does not grow linearly with the number of users [25]. Note that the MIMO channel matrix 
$$\mathbf {H}\in \mathbb {C}^{K\times N_{\mathrm {t}}}$$
in this case is square. It was observed in [25] that all the eigenvalues except one of H have magnitudes of comparable order. On the contrary, the inverse of the peculiar, or ill-behaving, eigenvalue has an infinite mean. Intuitively, this implies that when zero-forcing (channel inversion) precoding is used to serve the K homogeneous users, transmission along the eigenvector associated with the ill-behaving eigenvalue would consume most of the power. As a result, the sum rate performance does not grow linearly with the number of users.
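The ZF precoder and its power scaling can be written out directly. The NumPy sketch below builds the channel-inversion precoder, computes the scaling factor so that the power constraint is met with equality, and exposes the power enhancement term tr((HH^H)^-1) that drives the behavior discussed above. The dimensions, the random channels, and the helper name `zf_precoder` are illustrative assumptions.

```python
import numpy as np

def zf_precoder(H, Pt):
    """Channel-inversion precoder F = beta^{-1} H^H (H H^H)^{-1} with tr(F F^H) = Pt."""
    P = H.conj().T @ np.linalg.inv(H @ H.conj().T)     # unnormalized right pseudo-inverse
    enhancement = np.trace(P.conj().T @ P).real         # equals tr((H H^H)^{-1})
    beta_inv = np.sqrt(Pt / enhancement)
    return beta_inv * P, beta_inv, enhancement

rng = np.random.default_rng(3)
K, Pt = 4, 1.0

def rayleigh(K, Nt):
    return (rng.standard_normal((K, Nt)) + 1j * rng.standard_normal((K, Nt))) / np.sqrt(2)

# Lightly loaded system (Nt = 2K) versus fully loaded system (Nt = K).
H = rayleigh(K, 2 * K)
F, beta_inv, enh = zf_precoder(H, Pt)
eff = H @ F                                             # beta^{-1} * I: no inter-user interference

H_sq = rayleigh(K, K)
_, beta_inv_sq, enh_sq = zf_precoder(H_sq, Pt)

# The enhancement is the sum of inverse eigenvalues of H H^H, so a single
# small eigenvalue is enough to dominate the transmit power budget.
inv_eigs_sq = 1.0 / np.linalg.eigvalsh(H_sq @ H_sq.conj().T)
```

Each user receives its symbol scaled by `beta_inv`, so a large enhancement term translates directly into a low receive SNR for all users.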
Fig. 2.3 MU-MIMO downlink linear precoding for single-antenna users

To avoid transmission on the sub-channel with a poor channel gain, one possible solution is to use regularized ZF (RZF) precoding [25], where instead of direct inversion of the channel, a regularized channel is inverted. In other words,

$$\displaystyle \begin{aligned} \mathbf{x}=\beta^{-1}{\mathbf{H}}^{\mathrm{H}}\left(\mathbf{H}{\mathbf{H}}^{\mathrm{H}}+\alpha\mathbf{I}\right)^{-1}\mathbf{s},{} \end{aligned} $$
(2.1)
where α is the regularization factor to be optimized. Clearly, such an RZF precoding solution includes channel inversion as a special case when α = 0. In other words, the introduction of the regularization factor α represents an additional degree of freedom (DoF) for beamforming optimization. Under the assumption that users receive the same signal-to-interference-plus-noise ratio (SINR), it was proved in [25] that the asymptotically optimal α in the sense of SINR maximization is 
$$\alpha ^{\star }=\frac {K\sigma _{n}^{2}}{P_{\mathrm {t}}}$$
. Such an RZF precoder can be equivalently obtained by an MMSE formulation [26]. In addition to modifying the linear precoding front-end, modifying the signal structure is another feasible avenue. Here, as illustrated in Fig. 2.4, in lieu of the original data symbol d, its perturbed version s is transmitted [27], i.e.,
Fig. 2.4 MU-MIMO downlink nonlinear precoding for single-antenna users


$$\displaystyle \begin{aligned} \mathbf{s}=\mathbf{d}+\tau\mathbf{q},{} \end{aligned} $$
(2.2)
where 
$$\mathbf {q}\in \mathbb {Z}^{K}+\jmath \mathbb {Z}^{K}$$
is a complex integer vector, and τ > 0 is a design parameter chosen to ensure correct recovery of d by modulo decoding. In particular, by defining the element-wise modulo-τ operation for a real vector z as

$$\displaystyle \begin{aligned} f_{\tau}\left(\mathbf{z}\right)=\mathbf{z}-\left\lfloor \frac{\mathbf{z}}{\tau}+0.5\right\rfloor \tau, \end{aligned}$$
it is required that

$$\displaystyle \begin{aligned} f_{\tau}\left(\Re\left\{ \mathbf{s}\right\} \right)+\jmath f_{\tau}\left(\Im\left\{ \mathbf{s}\right\} \right)=\mathbf{d}. \end{aligned}$$
In other words, enhancement of the system performance is enabled through introducing extra DoFs in the signal domain, as represented by the addition and removal of a perturbation vector q. To search for q, the authors in [27] proposed to minimize the transmit power as the cost function, or equivalently,

$$\displaystyle \begin{aligned} {\mathbf{q}}_{\mathrm{pow}}=\operatorname*{\mbox{arg min}}_{\mathbf{q}\in\mathbb{Z}^{K}+\jmath\mathbb{Z}^{K}}\big\Vert{\mathbf{H}}^{\mathrm{H}}(\mathbf{H}{\mathbf{H}}^{\mathrm{H}}+\xi\mathbf{I})^{-1}\left(\mathbf{d}+\tau\mathbf{q}\right)\big\Vert^{2}, \end{aligned}$$
where ξ = 0 and ξ = α correspond to using ZF and RZF precoding as the linear front-end, respectively. Because of the nonlinear nature of the search for q, the probability distribution of s in (2.2) is generally not known. In an attempt to circumvent this difficulty, the MSE in [28] is defined as the squared distance between the perturbed signal and the signal estimate before modulo decoding conditioned on d, i.e.,

$$\displaystyle \begin{aligned} \mathrm{MSE}|{}_{\mathbf{d}}=\mathbb{E}\left[\left\Vert \mathbf{s}-\beta\mathbf{y}\right\Vert ^{2}\big|\mathbf{d}\right], \end{aligned}$$
where the averaging is over noise. Intuitively, if the signal estimate β y is close to the perturbed signal s, the recovered data streams should be close to the original data streams d after the removal of the perturbation vector by modulo decoding. Minimization of MSE|d results in the RZF precoder in (2.1) with the same 
$$\alpha ^{\star }=\frac {K\sigma _{n}^{2}}{P_{\mathrm {t}}}$$
, and the perturbation vector

$$\displaystyle \begin{aligned} {\mathbf{q}}_{\mathrm{mse}}=\operatorname*{\mbox{arg min}}_{\mathbf{q}\in\mathbb{Z}^{K}+\jmath\mathbb{Z}^{K}}\left\Vert -\mathbf{L}\mathbf{d}-\tau\mathbf{L}\mathbf{q}\right\Vert ^{2}, \end{aligned}$$
where the lower triangular matrix L is defined through Cholesky factorization, i.e.,

$$\displaystyle \begin{aligned} \left(\mathbf{H}{\mathbf{H}}^{\mathrm{H}}+\alpha^{\star}\mathbf{I}\right)^{-1}={\mathbf{L}}^{\mathrm{H}}\mathbf{L}. \end{aligned}$$
In terms of coded BER, the MSE-based solution performs noticeably better than its power minimization-based counterpart [28].
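The perturbation mechanism can be illustrated with a small NumPy sketch of the power-minimizing search and the modulo decoding step. For simplicity, the search is an exhaustive enumeration over a small box of Gaussian integers, which is an assumption made for readability; practical implementations use lattice search algorithms instead. The QPSK alphabet, the value of τ, and the function names are likewise illustrative choices.

```python
import itertools
import numpy as np

def mod_tau(z, tau):
    """Element-wise modulo-tau operation f_tau(z) = z - floor(z/tau + 0.5) * tau."""
    return z - np.floor(z / tau + 0.5) * tau

def vp_precode(H, d, tau, xi=0.0, q_range=1):
    """Search q in a small integer box to minimize ||H^H (H H^H + xi I)^{-1} (d + tau q)||^2.

    xi = 0 gives the ZF front-end; xi > 0 gives the regularized front-end.
    """
    K = H.shape[0]
    P = H.conj().T @ np.linalg.inv(H @ H.conj().T + xi * np.eye(K))
    ints = range(-q_range, q_range + 1)
    best = None
    for re in itertools.product(ints, repeat=K):
        for im in itertools.product(ints, repeat=K):
            q = np.array(re) + 1j * np.array(im)
            x = P @ (d + tau * q)
            power = np.linalg.norm(x) ** 2
            if best is None or power < best[0]:
                best = (power, q, x)
    return best

rng = np.random.default_rng(6)
Nt = K = 2
Pt = 1.0
H = (rng.standard_normal((K, Nt)) + 1j * rng.standard_normal((K, Nt))) / np.sqrt(2)
d = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)   # QPSK symbols
tau = 2 * np.sqrt(2)                       # assumed value, large enough that |Re d|, |Im d| < tau/2

p_vp, q, x = vp_precode(H, d, tau)         # ZF front-end with perturbation
p_zf = np.linalg.norm(H.conj().T @ np.linalg.inv(H @ H.conj().T) @ d) ** 2   # q = 0 baseline

beta_inv = np.sqrt(Pt / p_vp)              # power scaling at the transmitter
y = H @ (beta_inv * x)                     # noiseless reception, for illustration only
r = y / beta_inv                           # users rescale by beta
d_hat = mod_tau(r.real, tau) + 1j * mod_tau(r.imag, tau)   # modulo decoding removes tau * q
```

Since q = 0 is always among the candidates, the perturbed transmit power `p_vp` can never exceed the unperturbed power `p_zf`, and in the noiseless case the modulo operation returns the original symbols exactly.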

Since the perturbation vector is a vector of Gaussian integers, the search can be viewed as a closest-point search in a lattice. If d is picked from a high-order square constellation such as 16-, 64-, and 256-QAM as defined in Long-Term Evolution (LTE), and the optimal perturbation vector is found exactly, then by approximating the equiprobable discrete constellation points as continuous and uniformly distributed in a hyper-rectangle, the resulting errors from the search can be accordingly treated as uniformly distributed [29]. If the perturbation vector is obtained by power minimization, the error is nothing but the transmit power. In contrast to [30], where the transmit power is obtained by numerically solving a set of fixed-point equations, such a lattice-theoretic approximation yields a mathematically tractable lower bound. This insight was leveraged for channel vector quantization design in [31], and for greedy user selection to alleviate the concern of power enhancement in cooperative ZF beamforming [32]. While the per-BS power constraint is more practical for multi-cell downlink beamforming, the MMSE vector perturbation (MMSE-VP) precoder unfortunately has to be numerically optimized in that case [33].

When users are equipped with multiple antennas, an enhanced system performance is expected from joint design of nonlinear precoding and linear combining. So far, the focus has been exclusively on non-iterative methods for the benefit of low complexity. The basic idea is to first use block diagonalization (BD) to eliminate inter-user interference and hence create parallel SU-MIMO channels, and then perform VP across the spatially multiplexed data streams in conjunction with SU-MIMO precoding and combining for each user. In [34], over each SU-MIMO channel, ZF-VP is used for spatial multiplexing while treating each receive antenna as a virtual user. In doing so, users only need to know the power scaling factor β for detection. Such a design, motivated by the need for signaling overhead reduction, was shown to approach the performance of waterfilling-based solutions [1]. The work in [35] instead combined BD with MMSE-VP, and demonstrated the benefit of geometric mean decomposition-based joint design [5] in terms of improved BER. By exploiting uniform channel decomposition [6], the linear MMSE receiver could also be incorporated [36, 37], which further decreases the BER. Assuming matched filtering at the users, a non-iterative approach for cooperative ZF-VP beamforming was proposed in [38] and compared with the linear counterpart.

2.2.2 Hybrid RF-Baseband Solutions for Massive MIMO

For a multi-user massive MIMO system where the BS employs the hybrid precoding architecture, the transmit signal is expressed as

$$\displaystyle \begin{aligned} \mathbf{x}=\mathbf{F}\mathbf{s}={\mathbf{F}}_{\mathrm{RF}}{\mathbf{F}}_{\mathrm{BB}}\mathbf{s}. \end{aligned}$$
Subject to the transmit power constraint 
$$\Vert {\mathbf {F}}_{\mathrm {RF}}{\mathbf {F}}_{\mathrm {BB}}\Vert _{\mathrm {F}}^{2}\leq P_{\mathrm {t}}$$
, joint RF-baseband optimization can be performed sequentially in light of the following equivalence [39]:

$$\displaystyle \begin{aligned} \inf_{\left\{ {\mathbf{F}}_{\mathrm{RF}},{\mathbf{F}}_{\mathrm{BB}}\right\} }\phi\left({\mathbf{F}}_{\mathrm{RF}},{\mathbf{F}}_{\mathrm{BB}}\right)=\inf_{{\mathbf{F}}_{\mathrm{RF}}}\left\{ \inf_{{\mathbf{F}}_{\mathrm{BB}}}\phi\left({\mathbf{F}}_{\mathrm{RF}},{\mathbf{F}}_{\mathrm{BB}}\right)\right\} =\inf_{{\mathbf{F}}_{\mathrm{BB}}}\left\{ \inf_{{\mathbf{F}}_{\mathrm{RF}}}\phi\left({\mathbf{F}}_{\mathrm{RF}},{\mathbf{F}}_{\mathrm{BB}}\right)\right\} , \end{aligned}$$
which holds for any cost function 
$$\phi \left (\cdot \right )$$
. Here, the digital precoder F BB at baseband generally employs traditional MU-MIMO linear schemes such as ZF and RZF, and the interest lies mainly in deriving the RF analog precoder F RF subject to additional constraints such as constant-modulus elements, i.e., phase-shifters, and partial CSI.

RF phase-shifting design with perfect CSI was addressed in [40, 41]. In [40], the phase-shifters in the RF domain are heuristically derived based on extracting the phases of the maximum ratio transmission (MRT) beamformer. In combination with the baseband ZF precoding, the hybrid solution was shown to perform close to the fully digital ZF counterpart in terms of sum rate. This idea was further pursued in a multi-antenna user setting in [41], where hybrid combining was also considered with the constant-modulus RF combiner reduced to DFT beam selection.
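A minimal NumPy sketch of this kind of two-stage design is given below: the RF stage keeps only the phases of the MRT beamformer H^H, and the baseband stage applies ZF to the resulting low-dimensional effective channel. It is meant as an illustration of the idea described above rather than a reproduction of the scheme in [40]; the use of one RF chain per user, the dimensions, and the noise variance are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
Nt, K, Pt, sigma2 = 64, 4, 1.0, 0.1      # assumed system parameters
H = (rng.standard_normal((K, Nt)) + 1j * rng.standard_normal((K, Nt))) / np.sqrt(2)

# RF stage: phases of the MRT beamformer H^H, one RF chain per user (n_t = K).
F_rf = np.exp(1j * np.angle(H.conj().T)) / np.sqrt(Nt)      # Nt x K, constant modulus

# Baseband stage: ZF on the K x K effective channel H F_rf.
H_eff = H @ F_rf
F_bb = H_eff.conj().T @ np.linalg.inv(H_eff @ H_eff.conj().T)

# Scale the baseband precoder so that ||F_rf F_bb||_F^2 = Pt.
F_bb *= np.sqrt(Pt) / np.linalg.norm(F_rf @ F_bb)

gain = H @ F_rf @ F_bb                    # scaled identity: interference-free links
snr = np.abs(np.diag(gain)) ** 2 / sigma2
sum_rate = np.sum(np.log2(1 + snr))       # bits per channel use
```

Note that the baseband only needs the K x K matrix `H_eff` rather than the full K x N t channel, which is what makes the baseband stage cheap once the RF stage is fixed.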

It is worth mentioning that one basic assumption in the aforementioned works [15–19, 40, 41] is that perfect knowledge of the high-dimensional MIMO channel is available at the BS. This assumption, however, could be problematic in closed-loop frequency-division duplexing (FDD) systems, in that an enormous number of channel estimates needs to be frequently fed back [42]. In an effort to reduce the channel estimation overhead, one interesting idea is to adjust RF processing solely based on the statistical CSI while updating baseband processing according to the instantaneous effective CSI [43–47]. Because statistical CSI varies slowly, it does not need to be updated frequently, which reduces the feedback overhead. In the presence of the RF stage, the dimension of the effective channel seen by the baseband is significantly smaller than that of the original MIMO channel, and thus timely update of instantaneous CSI becomes feasible.

So far, studies on two-timescale hybrid precoding have largely focused on scenarios where single-antenna users are assumed to be clustered. According to the one-ring correlation channel model [48, 49], the transmit correlation matrix is related to the mean AoA and angle spread, which are in turn determined by a user’s location relative to the BS and its surrounding scattering environment. Therefore, it makes sense to geographically divide users into clusters, and assume that users associated with the same cluster share the same transmit correlation matrix, as illustrated in Fig. 2.5. This observation motivates us to view the interference experienced by each user as a combination of inter-cluster and intra-cluster interference, which can be handled by RF precoding and baseband precoding, respectively. The authors in [43, 44] proposed the technique of joint spatial division and multiplexing, where the statistical CSI-based RF precoder is employed to separate users into non-interfering clusters by BD. This technique relies on the premise that the subspaces spanned by the transmit correlation matrices do not significantly overlap with each other. Depending on the availability of CSI, baseband precoding can be carried out jointly across all user clusters or separately for each cluster of users. It was concluded that the sum rate of per-cluster precoding would become interference limited when inter-cluster interference is not effectively suppressed. Instead of trying to null inter-cluster interference as in BD, the work in [50] derived the RF precoder with the aim of striking a balance between cluster-wise self-transmission and interference leakage power. A lower bound on the cost function, the expected signal-to-leakage-plus-noise ratio (SLNR), was proposed and maximized by well-known trace quotient algorithms. Not surprisingly, since self-transmission is not overly penalized by allowing controlled inter-cluster interference, the sum rate is increased.
The authors in [46] considered a design of statistical CSI-based RF phase-shifting to maximize the worst ergodic rate. Based on the insight that the columns of the DFT matrix well approximate the eigenvectors of the transmit correlation matrix for uniform linear arrays (ULAs) in the large array regime (N t → ∞), the problem is reduced to DFT beam selection. When channel estimation errors are taken into account, such constant-modulus solutions can even outperform the fully digital ZF solution. In [51], the MSE across all users is minimized for OFDM systems where elements in both the RF and the baseband domain are constrained to be of constant modulus in an attempt to avoid a high peak-to-average power ratio.
Fig. 2.5 Massive MIMO downlink precoding for clustered users
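The BD-based cluster separation used in joint spatial division and multiplexing can be illustrated numerically. The following is a minimal NumPy sketch, not the algorithm of [43, 44]: the uniform angular spectrum used to build the correlation matrices, the function names, and the parameters `r` (dominant rank kept per cluster) and `b` (RF beams per cluster) are illustrative assumptions, and the resulting RF precoders are unconstrained rather than constant modulus.

```python
import numpy as np

def one_ring_correlation(n_t, mean_deg, spread_deg, spacing=0.5, n_grid=400):
    """ULA transmit correlation for a uniform angular spectrum (assumed one-ring-style model)."""
    angles = np.deg2rad(mean_deg + np.linspace(-spread_deg, spread_deg, n_grid))
    n = np.arange(n_t)[:, None]
    A = np.exp(-2j * np.pi * spacing * n * np.sin(angles)[None, :])  # steering vectors
    return (A @ A.conj().T) / n_grid

def dominant_eigvecs(R, r):
    """Return the r eigenvectors of the Hermitian matrix R with the largest eigenvalues."""
    _, V = np.linalg.eigh(R)  # eigenvalues in ascending order
    return V[:, ::-1][:, :r]

def bd_rf_precoders(R_list, r, b):
    """Per-cluster RF precoders orthogonal to the other clusters' dominant subspaces."""
    U = [dominant_eigvecs(R, r) for R in R_list]
    B = []
    for g, R in enumerate(R_list):
        others = np.hstack([U[k] for k in range(len(U)) if k != g])
        # Orthonormal basis of the orthogonal complement of span(others).
        Q, _ = np.linalg.qr(others, mode="complete")
        E0 = Q[:, others.shape[1]:]
        # Inside that complement, keep the b strongest directions of cluster g.
        B.append(E0 @ dominant_eigvecs(E0.conj().T @ R @ E0, b))
    return U, B

# Three hypothetical user clusters seen from a 64-antenna ULA.
R_list = [one_ring_correlation(64, m, 10) for m in (-45, 0, 45)]
U, B = bd_rf_precoders(R_list, r=12, b=4)
# Largest coupling between any cluster's RF beams and another cluster's dominant subspace.
leak = max(np.abs(B[g].conj().T @ U[k]).max()
           for g in range(3) for k in range(3) if k != g)
print(f"max |B_g^H U_k| for k != g: {leak:.2e}")
```

The construction requires (G − 1)r + b ≤ N t for G clusters, which reflects the premise that the dominant subspaces of different clusters do not overlap significantly. Energy outside the retained rank `r` is not nulled and shows up as residual inter-cluster interference.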

It is worth mentioning that the effectiveness of the hybrid precoding solutions with two-timescale CSI as in [43, 50, 52] relies on the assumption that users are naturally partitioned into groups, and that users in the same group experience identical channel spatial correlation. This is, however, too restrictive in practice. Thus, the works in [53–55] proposed various user grouping algorithms based on the distance between the subspaces of the transmit spatial correlation matrices, and RF beamforming was adapted to the centroid of the correlation matrices for each user group. Since the mean correlation is only a rough approximation, the resulting RF beams are unlikely to create near-perfect spatial separation in this case, which casts doubt on the feasibility of group-wise spatial multiplexing at baseband.
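To make the notion of subspace-distance-based grouping concrete, the following is a minimal K-means-style sketch in NumPy. It is not the algorithm of any of [53–55]: the squared chordal-type distance, the initialization from the first users, the helper `ula_correlation`, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def ula_correlation(n_t, mean_deg, spread_deg, n_grid=200):
    """Hypothetical helper: ULA correlation, uniform angular spectrum, half-wavelength spacing."""
    ang = np.deg2rad(mean_deg + np.linspace(-spread_deg, spread_deg, n_grid))
    A = np.exp(-1j * np.pi * np.arange(n_t)[:, None] * np.sin(ang)[None, :])
    return A @ A.conj().T / n_grid

def dominant_subspace(R, r):
    """Orthonormal basis of the r-dimensional dominant eigenspace of Hermitian R."""
    _, V = np.linalg.eigh(R)  # eigenvalues in ascending order
    return V[:, -r:]

def chordal_distance(U1, U2):
    """Equals ||U1 U1^H - U2 U2^H||_F^2 for orthonormal bases of equal dimension."""
    return 2 * U1.shape[1] - 2 * np.linalg.norm(U1.conj().T @ U2, "fro") ** 2

def group_users(R_users, n_groups, r, n_iter=10):
    """Assign users to groups by subspace distance; centroids come from mean correlations."""
    subspaces = [dominant_subspace(R, r) for R in R_users]
    centroids = subspaces[:n_groups]  # crude initialization (assumption of this sketch)
    labels = [0] * len(R_users)
    for _ in range(n_iter):
        labels = [int(np.argmin([chordal_distance(U, C) for C in centroids]))
                  for U in subspaces]
        for g in range(n_groups):
            members = [R for R, lab in zip(R_users, labels) if lab == g]
            if members:  # leave the centroid unchanged if a group is empty
                centroids[g] = dominant_subspace(sum(members) / len(members), r)
    return labels, centroids

# Six hypothetical users scattered around two directions, seen from a 32-antenna ULA.
means = [-40, 30, -38, 32, -42, 28]
R_users = [ula_correlation(32, m, 8) for m in means]
labels, centroids = group_users(R_users, n_groups=2, r=4)
print(labels)
```

Because each centroid is built from an averaged correlation matrix, users within a group generally do not share the centroid subspace exactly, which is the source of the imperfect spatial separation noted above.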

When it comes to multi-cell systems, additional design issues, such as limited inter-BS cooperation in terms of signal and local CSI exchange, per-BS power constraints, and inter-cell interference, need to be properly addressed. For example, in [45], only statistical CSI was assumed to be globally available, and the cluster-wise RF precoding was constrained to the null space of the superimposed transmit correlation matrices for interference reduction. In conjunction with local CSI-based ZF solutions, the RF solutions were derived to maximize a general utility function of spectral efficiency. In [52], RF precoding was designed with the objective of minimizing interference leakage power with linear pricing. Under the assumption that statistical CSI used for RF precoder update is outdated, a subspace tracking and compensation algorithm on Grassmann manifolds was proposed. The concept of deterministic equivalents was exploited in [47] to approximate the SINR chance constraint by deterministic functions, from which the RF precoding solutions were then obtained. In [56], MMSE-VP was employed for instantaneous CSI-based two-stage precoder design in cooperative multi-cell systems.

2.3 Summary

For point-to-point massive MIMO systems, previous work has shown that even with a reduced number of RF chains, instantaneous CSI-based two-stage precoding/combining solutions are capable of delivering a performance comparable to their fully digital counterparts. This is especially the case when the MIMO channel is correlated, as commonly found in a directional propagation environment such as mmWave channels. However, because of the tremendous overhead of estimating the high-dimensional MIMO channel, one might be interested to know whether such an observation remains valid when perfect CSI in the RF domain is relaxed to statistical CSI, e.g., channel covariance. Besides, it is noted that joint precoder-combiner optimization in conventional MIMO gives a substantial performance enhancement. Nonetheless, such an approach has not yet been applied to the derivation of hybrid precoding and combining solutions. Finally, the design and evaluation of statistical CSI-based RF phase-shifting remain an open issue.

When packet retransmission is incorporated through hybrid ARQ mechanisms in conventional MIMO systems, previous research efforts have demonstrated that by exploiting the temporal diversity in the linear precoder optimization, the system performance can be improved. Unfortunately, such solutions cannot be directly applied to massive MIMO. To begin with, it has been assumed that the received signals from the past rounds of retransmission are fully accessible to the baseband. However, such an assumption raises concerns about the storage requirement and processing complexity when the received signals are of high dimensions. A potential remedy is to introduce hybrid RF-baseband combining, which reduces the dimension of the received signals to be combined through RF preprocessing. In doing this, the baseband only has access to the low-dimensional received signals at the output of the RF combiner. On the other hand, when the hybrid precoding structure with RF phase-shifting is employed, it is necessary for the optimization of baseband precoding to take into account the design of RF phase-shifting and hybrid combining at the receiver. Hence, a novel design of hybrid precoding and combining is required.

In a multi-user massive MIMO environment, the principle of hybrid RF-baseband precoding is to first create spatial separation of users in the RF beam domain and then perform spatial multiplexing at baseband. In particular, by assuming that users are geographically clustered in hotspots, it suffices to adjust the RF precoder based on the statistical CSI in consideration of the baseband precoder adaptive to the instantaneous effective CSI. As a result, only two-timescale CSI is required, which significantly reduces the channel estimation overhead. It is worth mentioning that the existing work has restricted its attention to linear precoding schemes such as ZF and RZF at baseband. Unfortunately, the linear schemes suffer severe power loss when a maximum number of equal-rate users is spatially multiplexed. In contrast, by introducing a perturbation vector as additional DoFs for performance optimization, VP effectively addresses this issue. A direct consequence of hybrid precoding with a reduced number of RF chains is that the number of data streams that can be physically supported is limited. Thus, it is desirable for hybrid precoding to perform well in fully loaded systems for utility maximization. In view of the drawback of linear schemes in this case, it is natural to explore how nonlinear VP techniques can be combined with the two-timescale CSI-based hybrid precoding design.

When extended to a multi-cell massive MIMO environment, the hybrid precoder needs to take into account the presence of inter-cell interference as well. The effectiveness of inter-cell interference mitigation depends on the degree of inter-cell cooperation in terms of the information exchange allowed. For network MIMO processing, the technique of hybrid precoding design for a single cell does not carry over directly. In particular, because of the per-BS power constraint, closed-form expressions for the linear baseband front-end are no longer available. Given the difficulty of optimizing the statistical CSI-based RF precoder with respect to traditional performance metrics, e.g., mutual information and MSE, the existing work has largely relied on heuristics. We remark that although the use of two-timescale CSI provides an effective alternative for addressing the issue of channel estimation overhead, it is somewhat restrictive from the perspectives of optimization and applicability. For example, in the absence of analytical baseband solutions, iterative procedures are generally required for joint RF-baseband optimization. It is not clear how such alternating optimization can be carried out when the design variables are adapted on different time scales. Besides, the assumption that users are geographically clustered and that the user clusters are well separated can turn out to be too idealistic. Hence, novel design approaches that overcome such disadvantages while retaining a channel estimation overhead comparable to that of two-timescale CSI are desired.