Currently, there are two kinds of signal detectors for MIMO systems: linear signal detectors and nonlinear signal detectors [1]. Linear signal detection algorithms include the conventional ZF and MMSE algorithms [2], as well as several recently proposed linear algorithms [3–5]. Although these linear signal detection algorithms have the advantage of low complexity, their deficiency in detection accuracy cannot be ignored, especially when the number of user antennas is close to or equal to the number of base-station antennas [3]. The optimal signal detector is the nonlinear ML signal detector, but its complexity grows exponentially with the number of transmit antennas, so it cannot be implemented for massive MIMO systems [6]. The SD detector [7] and the K-best detector [8] are two variations of the ML detector; they balance computational complexity and performance by controlling the number of nodes searched in each stage. However, the QR decomposition used in these nonlinear signal detectors leads to high computational complexity and low parallelism because it involves matrix operations such as element elimination. A detector with low complexity, high accuracy and high processing parallelism is therefore urgently needed.
This chapter first introduces several conventional nonlinear MIMO signal detection algorithms in Sect. 4.1. Section 4.2 presents a K-best signal detection and preprocessing algorithm for high-order MIMO systems that combines Cholesky sorted QR decomposition and partial iterative lattice reduction (CHOSLAR) [9]. Section 4.3 presents another new signal detection algorithm, the TASER algorithm [10].
4.1 Conventional Nonlinear MIMO Signal Detection Algorithm
4.1.1 ML Signal Detection Algorithm
Consider a MIMO system with $N_{\text{t}}$ transmit antennas and $N_{\text{r}}$ receive antennas. The received signal vector $\varvec{y} \in C^{N_{\text{r}}}$ is

$$ \varvec{y} = \varvec{Hs} + \varvec{n}, \tag{4.1} $$

where $\varvec{s} \in \varOmega^{N_{\text{t}}}$ is the transmitted symbol vector drawn from the constellation set $\varOmega$, $\varvec{H} \in C^{N_{\text{r}} \times N_{\text{t}}}$ is the channel matrix whose element $h_{j,i}$ denotes the channel gain between transmit antenna $i\;(i = 1,2,\ldots,N_{\text{t}})$ and receive antenna $j\;(j = 1,2,\ldots,N_{\text{r}})$, and $\varvec{n} \in C^{N_{\text{r}}}$ is an additive white Gaussian noise vector whose entries follow $N(0,\sigma^{2})$. The conditional probability of receiving $\varvec{y}$ given $\varvec{H}$ and $\varvec{s}$ is

$$ P\left( \varvec{y} \,|\, \varvec{H},\varvec{s} \right) = \frac{1}{\left( \uppi\sigma^{2} \right)^{M}}\exp \left( -\frac{1}{\sigma^{2}}\left\| \varvec{y} - \varvec{Hs} \right\|^{2} \right). \tag{4.2} $$

The ML detector selects the symbol vector $\varvec{s}$ that maximizes this likelihood, which is equivalent to minimizing the Euclidean distance:

$$ \tilde{\varvec{s}} = \mathop{\arg\max}\limits_{\varvec{s} \in \varOmega} P\left( \varvec{y} \,|\, \varvec{H},\varvec{s} \right) = \mathop{\arg\min}\limits_{\varvec{s} \in \varOmega} \left\| \varvec{y} - \varvec{Hs} \right\|^{2}. \tag{4.3} $$

Applying the QR decomposition $\varvec{H} = \varvec{QR}$, the distance metric can be rewritten as

$$ \begin{aligned} \left\| \varvec{y} - \varvec{Hs} \right\|^{2} & = \left\| \varvec{QQ}^{\text{H}} \left( \varvec{y} - \varvec{Hs} \right) + \left( \varvec{I}_{N_{\text{r}}} - \varvec{QQ}^{\text{H}} \right)\left( \varvec{y} - \varvec{Hs} \right) \right\|^{2} \\ & = \left\| \varvec{QQ}^{\text{H}} \left( \varvec{y} - \varvec{Hs} \right) \right\|^{2} + \left\| \left( \varvec{I}_{N_{\text{r}}} - \varvec{QQ}^{\text{H}} \right)\left( \varvec{y} - \varvec{Hs} \right) \right\|^{2} \\ & = \left\| \varvec{Q}^{\text{H}} \varvec{y} - \varvec{Rs} \right\|^{2} + \left\| \left( \varvec{I}_{N_{\text{r}}} - \varvec{QQ}^{\text{H}} \right)\varvec{y} \right\|^{2} \\ & = \left\| \varvec{Q}^{\text{H}} \varvec{y} - \varvec{Rs} \right\|^{2} \\ & = \left\| \varvec{y}^{\prime} - \varvec{Rs} \right\|^{2}, \end{aligned} \tag{4.4} $$

where $\varvec{y}^{\prime} = \varvec{Q}^{\text{H}} \varvec{y}$, $\varvec{R}$ is an upper triangular matrix, and the term $\left\| \left( \varvec{I}_{N_{\text{r}}} - \varvec{QQ}^{\text{H}} \right)\varvec{y} \right\|^{2}$ is dropped because it does not depend on $\varvec{s}$. Since $\varvec{R}$ is upper triangular, the minimization decomposes stage by stage:

$$ \begin{aligned} \tilde{\varvec{x}} & = \mathop{\arg\min}\limits_{\varvec{s} \in \varOmega} \left\| \varvec{y}^{\prime} - \varvec{Rs} \right\|^{2} \\ & = \mathop{\arg\min}\limits_{\varvec{s} \in \varOmega} \left( \sum\limits_{i = 1}^{N_{\text{t}}} \left| y^{\prime}_{i} - \sum\limits_{j = i}^{N_{\text{t}}} R_{i,j} s_{j} \right|^{2} + \sum\limits_{i = N_{\text{t}} + 1}^{N_{\text{r}}} \left| y^{\prime}_{i} \right|^{2} \right) \\ & = \mathop{\arg\min}\limits_{\varvec{s} \in \varOmega} \left( \sum\limits_{i = 1}^{N_{\text{t}}} \left| y^{\prime}_{i} - \sum\limits_{j = i}^{N_{\text{t}}} R_{i,j} s_{j} \right|^{2} \right) \\ & = \mathop{\arg\min}\limits_{\varvec{s} \in \varOmega} \left[ f_{N_{\text{t}}}\left( s_{N_{\text{t}}} \right) + f_{N_{\text{t}} - 1}\left( s_{N_{\text{t}}},s_{N_{\text{t}} - 1} \right) + \cdots + f_{1}\left( s_{N_{\text{t}}},s_{N_{\text{t}} - 1},\ldots,s_{1} \right) \right], \end{aligned} \tag{4.5} $$

where the partial metric $f_{k}\left( s_{N_{\text{t}}},s_{N_{\text{t}} - 1},\ldots,s_{k} \right)$ is defined as

$$ f_{k}\left( s_{N_{\text{t}}},s_{N_{\text{t}} - 1},\ldots,s_{k} \right) = \left| y^{\prime}_{k} - \sum\limits_{j = k}^{N_{\text{t}}} R_{k,j} s_{j} \right|^{2}. \tag{4.6} $$

This minimization can be interpreted as a search over a tree with $N_{\text{t}}$ stages, as shown in Fig. 4.1.
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig1_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig1_HTML.png)
Search tree in the ML signal detection algorithm
The root node expands into S nodes in the first stage, where S is the number of possible values of each symbol in the given modulation mode; each first-stage node has the value $f_{N_{\text{t}}}\left( s_{N_{\text{t}}} \right)$. Each first-stage node then expands S child nodes to form the second stage, giving $S^{2}$ nodes in total, each with value $f_{N_{\text{t}}}\left( s_{N_{\text{t}}} \right) + f_{N_{\text{t}} - 1}\left( s_{N_{\text{t}}},s_{N_{\text{t}} - 1} \right)$, and so on; the last generated stage contains $S^{N_{\text{t}}}$ child nodes, each carrying the complete sum in Eq. (4.5). The optimal solution is found by examining all the nodes.
The ML signal detection algorithm searches all nodes to find the optimal one and then estimates the transmitted signal, so it is clearly the optimal estimation algorithm. Although the ML signal detection algorithm achieves the best performance, its computational complexity increases exponentially with the number of transmit antennas, the number of bits per modulated symbol and the length of the processed data. The computational complexity is $O\left( M^{N_{\text{t}}} \right)$, where M is the number of constellation points and $N_{\text{t}}$ is the number of transmit antennas. For example, in a 4 × 4 MIMO system with 16-QAM modulation, the search amount for each symbol block is as high as $16^{4}$ = 65,536. Therefore, the ML signal detection algorithm is difficult to apply in practical communication systems with high-order modulation (large M) and a large number of transmit antennas (large $N_{\text{t}}$). To reduce the computational complexity, approximate detection algorithms are needed [11].
4.1.2 SD Signal Detection Algorithm and K-Best Signal Detection Algorithm
To achieve near-optimal performance with reduced computational complexity, several nonlinear signal detectors have been proposed. Typical among them are tree-search algorithms [12, 13]. So far, many near-ML signal detection algorithms have been presented, including low-complexity tree-search algorithms. The K-best signal detection algorithm [14] searches K nodes in each stage to find the possible transmitted signals, while the SD signal detection algorithm [15] searches a hypersphere around the received signal vector. However, apart from the ML signal detection algorithm, no other signal detection algorithm can achieve full diversity gain [16]. The fixed-complexity sphere decoding (FSD) signal detection algorithm [17] exploits the underlying lattice structure of the received signal and is considered the most promising algorithm for approaching ML detection performance at reduced computational complexity. In conventional small-scale MIMO systems the algorithm performs well, but its computational complexity remains unbearable when the antenna array or the modulation order grows (for example, 128 transmit antennas with 64-QAM modulation) [6].
These algorithms are described in detail below.
4.1.2.1 K-Best Signal Detection Algorithm
The K-best signal detection algorithm retains only the $K$ best candidate nodes at each stage of the tree search. Figure 4.2 illustrates the resulting search tree for $N_{\text{t}} = 2$, where only $K$ paths survive each stage.
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig2_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig2_HTML.png)
Search tree in the K-best signal detection algorithm.
© [2018] IEEE. Reprinted, with permission, from Ref. [11]
Algorithm 4.1
K-best signal detection algorithm
![../images/478198_1_En_4_Chapter/478198_1_En_4_Figa_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Figa_HTML.png)
The K-best signal detection algorithm can achieve near-optimal performance with fixed complexity and moderate parallelism. The fixed complexity depends on the number of reserved candidate nodes $K$, the modulation order and the number of transmit antennas, so the total number of nodes searched by the K-best algorithm is fixed in advance. Although the K-best signal detection algorithm has these advantages, it takes into account neither the noise variance nor the channel conditions. In addition, the K-best signal detection algorithm extends each of the K reserved paths in every stage to all possible child nodes, so great complexity is required to enumerate these child nodes, especially for high-order modulation and a large number of survivor paths. The algorithm also needs to compute and sort all candidate paths of each stage before pruning the tree, and this sorting is very time-consuming. Moreover, the algorithm suffers a large bit error rate when the value of $K$ is low.
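The breadth-first keep-K-survivors procedure described above can be sketched as follows; the upper-triangular $\varvec{R}$ and QPSK alphabet are illustrative values (not from the text), and $\varvec{y}^{\prime}$ is assumed to be the already-preprocessed $\varvec{Q}^{\text{H}}\varvec{y}$.

```python
# Illustrative upper-triangular R (as produced by QR preprocessing) and QPSK set.
R = [[1.5 + 0j, 0.4 + 0.2j, -0.3j],
     [0.0 + 0j, 1.2 + 0j, 0.5 - 0.1j],
     [0.0 + 0j, 0.0 + 0j, 0.9 + 0j]]
OMEGA = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def k_best_detect(yp, R, omega, K):
    """Breadth-first tree search keeping the K smallest-PED partial paths per stage."""
    nt = len(R)
    paths = [(0.0, [])]                      # (PED, [s_nt, ..., s_i])
    for i in range(nt - 1, -1, -1):          # stage nt down to stage 1
        children = []
        for ped, syms in paths:
            for s in omega:
                cand = syms + [s]
                # interference from already-decided symbols s_{i+1}..s_{nt}
                interf = sum(R[i][j] * cand[nt - 1 - j] for j in range(i + 1, nt))
                e = yp[i] - R[i][i] * s - interf
                children.append((ped + abs(e) ** 2, cand))
        children.sort(key=lambda p: p[0])    # sort all K*|omega| children...
        paths = children[:K]                 # ...and keep only K survivors
    return tuple(reversed(paths[0][1]))      # reorder to (s_1, ..., s_nt)

# Noise-free sanity check: the zero-PED path survives, so the true vector is found.
s_true = (-1 + 1j, 1 - 1j, 1 + 1j)
yp = [sum(R[i][j] * s_true[j] for j in range(3)) for i in range(3)]
assert k_best_detect(yp, R, OMEGA, 4) == s_true
```

The `sort` followed by truncation is exactly the per-stage sorting cost criticized above: each stage generates $K \cdot |\varOmega|$ children but retains only $K$.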
4.1.2.2 SD Signal Detection Algorithm
The SD signal detection algorithm restricts the search to a hypersphere of radius $r_{\text{s}}$ centered on the received signal vector, as illustrated in Fig. 4.3.
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig3_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig3_HTML.png)
Principle of SD signal detection algorithm.
© [2018] IEEE. Reprinted, with permission, from Ref. [11]
The SD estimate is

$$ \hat{\varvec{s}}_{\text{SD}} = \mathop{\arg\min}\limits_{\varvec{s} \in 2^{QN_{\text{t}}}} \left\{ \left\| \varvec{y} - \varvec{Hs} \right\|^{2} \le r_{\text{s}}^{2} \right\}. \tag{4.7} $$

Decomposing the channel matrix $\varvec{H}$ into a unitary matrix $\varvec{Q}$ and an upper triangular matrix $\varvec{R}$ with $\varvec{H} = \varvec{QR}$, Eq. (4.7) becomes

$$ \hat{\varvec{s}}_{\text{SD}} = \mathop{\arg\min}\limits_{\varvec{s} \in 2^{QN_{\text{t}}}} \left\{ \left\| \tilde{\varvec{y}} - \varvec{Rs} \right\|^{2} \le r_{\text{s}}^{2} \right\}, \tag{4.8} $$

where $\tilde{\varvec{y}} = \varvec{Q}^{\text{H}}\varvec{y}$. Because $\varvec{R}$ is upper triangular, the total distance $d_{1} = \left\| \tilde{\varvec{y}} - \varvec{Rs} \right\|^{2}$ can be accumulated recursively as the partial Euclidean distance (PED)

$$ d_{i} = d_{i + 1} + \left| \tilde{y}_{i} - \sum\limits_{j = i}^{N_{\text{t}}} R_{i,j} s_{j} \right|^{2} = d_{i + 1} + \left| e_{i} \right|^{2}, \tag{4.9} $$

with the recursion starting from $d_{N_{\text{t}} + 1} = 0$, where stage $i$ corresponds to the $i$th antenna symbol. The resulting search tree is shown in Fig. 4.4.
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig4_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig4_HTML.png)
Search tree in the SD signal detection algorithm.
© [2018] IEEE. Reprinted, with permission, from Ref. [11]
The search algorithm starts from the root node, i.e., the first child node in Stage $N_{\text{t}}$, where Stage $i$ represents the $i$th antenna symbol at the transmitter, and computes the PED of each visited node. If the PED (i.e., $d_{i}$) is less than the sphere radius $r_{\text{s}}$, the search process continues downward until it reaches Stage 1. Otherwise, if $d_{i}$ is greater than the sphere radius $r_{\text{s}}$, the search has left the hypersphere; all child nodes below this node are skipped and the search continues along another path. The step-by-step search proceeds in this way until a valid leaf node in the first stage is estimated.
The choice of the radius $r_{\text{s}}$ in the SD signal detection algorithm has a considerable influence on the search process. If the radius is too small, the metrics of all first-stage nodes, or of all nodes at some intermediate stage, may exceed it, so the optimal solution cannot be obtained. The SD-pruning signal detection algorithm effectively solves this problem: it first sets the radius to infinity, updates it when the last stage is reached, and keeps updating it whenever a smaller value is found, so that the performance of the SD-pruning signal detection algorithm reaches that of the ML signal detection algorithm.
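The depth-first search with radius updating described above can be sketched as follows; the triangular $\varvec{R}$ and QPSK set are illustrative values, and the infinite initial radius corresponds to the SD-pruning variant.

```python
# Illustrative upper-triangular R and QPSK set (hypothetical values).
R = [[1.5 + 0j, 0.4 + 0.2j, -0.3j],
     [0.0 + 0j, 1.2 + 0j, 0.5 - 0.1j],
     [0.0 + 0j, 0.0 + 0j, 0.9 + 0j]]
OMEGA = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def sd_detect(yp, R, omega, radius2=float("inf")):
    """Depth-first sphere search with pruning; the squared radius shrinks at
    every leaf, so the final answer matches the ML solution (SD-pruning)."""
    nt = len(R)
    best = {"r2": radius2, "s": None}

    def search(i, partial, ped):
        if ped >= best["r2"]:
            return                       # prune: node lies outside the sphere
        if i < 0:
            best["r2"], best["s"] = ped, tuple(partial[j] for j in range(nt))
            return                       # valid leaf reached: tighten radius
        for s in omega:
            interf = sum(R[i][j] * partial[j] for j in range(i + 1, nt))
            e = yp[i] - R[i][i] * s - interf
            partial[i] = s
            search(i - 1, partial, ped + abs(e) ** 2)
        del partial[i]

    search(nt - 1, {}, 0.0)
    return best["s"]

# Noise-free sanity check: SD with pruning returns the ML (true) vector.
s_true = (1 + 1j, -1 - 1j, 1 - 1j)
yp = [sum(R[i][j] * s_true[j] for j in range(3)) for i in range(3)]
assert sd_detect(yp, R, OMEGA) == s_true
```

The `ped >= best["r2"]` test is the hypersphere check of Eq. (4.9): once a partial path leaves the sphere, its entire subtree is abandoned.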
4.1.2.3 FSD Signal Detection Algorithm
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig5_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig5_HTML.png)
Search tree in the FSD signal detection algorithm.
© [2018] IEEE. Reprinted, with permission, from Ref. [11]
Algorithm 4.2
FSD signal detection algorithm
![../images/478198_1_En_4_Chapter/478198_1_En_4_Figb_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Figb_HTML.png)
A conventional FSD signal detection algorithm has fixed complexity but takes no account of noise and channel conditions. A simplified version of the FSD signal detection algorithm, described in Ref. [18], introduces path selection into the remaining stages, so that the FSD signal detection algorithm can be highly parallel and fully pipelined [19].
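The fixed-complexity idea (full expansion at the top stage, a single best-child expansion per remaining stage, no sorting across branches) can be sketched as follows; $\varvec{R}$ and the QPSK set are illustrative values, and this is only one common FSD expansion pattern, not necessarily the exact configuration of Refs. [17, 18].

```python
# Illustrative upper-triangular R and QPSK set (hypothetical values).
R = [[1.5 + 0j, 0.4 + 0.2j, -0.3j],
     [0.0 + 0j, 1.2 + 0j, 0.5 - 0.1j],
     [0.0 + 0j, 0.0 + 0j, 0.9 + 0j]]
OMEGA = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def fsd_detect(yp, R, omega):
    """Fixed-complexity search: fully expand the top stage, then extend each
    branch with the single best child per stage; every branch is independent,
    which is what makes the algorithm fully parallel and pipelinable."""
    nt = len(R)
    best_s, best_ped = None, float("inf")
    for s_top in omega:                      # full expansion: |omega| branches
        syms = {nt - 1: s_top}
        ped = abs(yp[nt - 1] - R[nt - 1][nt - 1] * s_top) ** 2
        for i in range(nt - 2, -1, -1):      # single expansion below
            interf = sum(R[i][j] * syms[j] for j in range(i + 1, nt))
            s_i = min(omega, key=lambda s: abs(yp[i] - R[i][i] * s - interf) ** 2)
            syms[i] = s_i
            ped += abs(yp[i] - R[i][i] * s_i - interf) ** 2
        if ped < best_ped:
            best_ped, best_s = ped, tuple(syms[j] for j in range(nt))
    return best_s

# Noise-free sanity check: the branch rooted at the true top symbol has PED 0.
s_true = (1 - 1j, -1 + 1j, -1 - 1j)
yp = [sum(R[i][j] * s_true[j] for j in range(3)) for i in range(3)]
assert fsd_detect(yp, R, OMEGA) == s_true
```

Each of the $|\varOmega|$ branches performs the same fixed amount of work, which is the "fixed complexity" property the text refers to.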
4.2 CHOSLAR Algorithm
4.2.1 System Model
Consider a high-order MIMO system with $N_{\text{r}} = N_{\text{t}} = N$ and channel matrix $\varvec{H}$. With the QR decomposition $\varvec{H} = \varvec{QR}$, where $\varvec{Q}$ is unitary and $\varvec{R}$ is upper triangular, the ML estimate can be written as

$$ \hat{\varvec{s}}_{\text{ML}} = \mathop{\arg\min}\limits_{\varvec{s} \in \varOmega} \left\| \varvec{Q}^{\text{H}} \varvec{y} - \varvec{Rs} \right\|^{2} = \mathop{\arg\min}\limits_{\varvec{s} \in \varOmega} \left\| \hat{\varvec{y}} - \varvec{Rs} \right\|^{2}, \tag{4.10} $$

where $\hat{\varvec{y}} = \varvec{Q}^{\text{H}}\varvec{y}$. For soft-output detection, the log-likelihood ratio (LLR) of bit $b$ of the symbol transmitted from antenna $i$ is computed as

$$ \text{LLR}\left( s_{i,b} \right) = \frac{1}{\sigma^{2}}\left( \mathop{\min}\limits_{\varvec{s} \in s_{i,b}^{0}} \left\| \hat{\varvec{y}} - \varvec{Rs} \right\|^{2} - \mathop{\min}\limits_{\varvec{s} \in s_{i,b}^{1}} \left\| \hat{\varvec{y}} - \varvec{Rs} \right\|^{2} \right), \tag{4.11} $$

where $s_{i,b}^{0}$ and $s_{i,b}^{1}$ denote the subsets of the constellation set $\varOmega$ in which bit $s_{i,b}$ equals 0 and 1, respectively.
Matrix $\varvec{R}$ is an upper triangular matrix, so the tree search starts with the last element of $\varvec{s}$. The scale of the channel matrix $\varvec{H}$ increases as the scale of the MIMO system increases (e.g., from 2 × 2 to 16 × 16). Because of its high complexity and data dependency, QR decomposition is difficult to implement in hardware, especially for high-order MIMO systems. Therefore, the computational complexity and parallelism of QR decomposition are currently the main factors limiting the throughput of high-order MIMO systems.
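For small systems, the two minima in Eq. (4.11) can be evaluated by exhaustive enumeration. The sketch below assumes a QPSK Gray mapping (bit 0 on the real sign, bit 1 on the imaginary sign), which is an illustrative choice rather than a mapping specified in the text.

```python
import itertools

# Assumed QPSK Gray mapping: key (b0, b1) -> constellation point.
OMEGA = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}

def max_log_llr(yhat, R, sigma2, i, b):
    """Max-log LLR of bit b on antenna i: a direct evaluation of the two
    minima in Eq. (4.11) over all candidate vectors."""
    nt = len(R)
    best = {0: float("inf"), 1: float("inf")}
    for bits in itertools.product(OMEGA.keys(), repeat=nt):
        s = [OMEGA[bv] for bv in bits]
        d2 = 0.0
        for r in range(nt):                      # ||yhat - R s||^2, R triangular
            e = yhat[r] - sum(R[r][c] * s[c] for c in range(r, nt))
            d2 += abs(e) ** 2
        best[bits[i][b]] = min(best[bits[i][b]], d2)
    return (best[0] - best[1]) / sigma2

# Noise-free example: the sign of each LLR matches the transmitted bit.
R = [[1.5 + 0j, 0.4 + 0.2j], [0.0 + 0j, 1.2 + 0j]]
bits_true = ((0, 0), (1, 1))
s = [OMEGA[bv] for bv in bits_true]
yhat = [sum(R[r][c] * s[c] for c in range(r, 2)) for r in range(2)]
assert max_log_llr(yhat, R, 0.5, 0, 0) < 0   # transmitted bit 0 -> negative LLR
assert max_log_llr(yhat, R, 0.5, 1, 1) > 0   # transmitted bit 1 -> positive LLR
```

In practice the K-best candidate list replaces this exhaustive enumeration, which is why the quality of the list (and hence of the preprocessing) directly affects soft-output accuracy.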
4.2.2 QR Decomposition
QR decomposition is the process of factorizing the estimated channel matrix $\varvec{H}$ into a unitary matrix $\varvec{Q}$ and an upper triangular matrix $\varvec{R}$ [8]. Gram–Schmidt (GS) [21, 24, 25], Householder transformation (HT) [26] and Givens rotation (GR) [22, 27] are three widely used QR decomposition algorithms. A simplified GS algorithm achieving stable permutation QR decomposition is proposed in Ref. [28]. In the GR algorithm, the computation of matrix $\varvec{Q}$ can be replaced by applying the same rotation operations used to obtain the upper triangular matrix $\varvec{R}$
. However, in high-order MIMO systems, as the size of matrix H grows, eliminating only one element at a time is inefficient. Another disadvantage of the GR algorithm is that a rotation cannot be executed until the two elements to its left have been eliminated. The parallelism of the GR algorithm is therefore constrained, resulting in lower data throughput in high-order MIMO systems. To reduce the computational complexity while maintaining the diversity gain, a lattice reduction (LR) algorithm was proposed in Ref. [20] as a preprocessing method that conditions the estimated channel matrix in polynomial time according to the Lenstra–Lenstra–Lovász (LLL) algorithm [20]. In addition, the LR-based MMSE signal detection algorithm proposed in Ref. [21] and the LR-based zero-forcing decision feedback (ZF-DF) algorithm proposed in Ref. [29] outperform conventional linear detection algorithms. However, LR requires a huge number of conditional checks and column exchanges, resulting in uncertain data throughput, low parallelism and long latency in hardware implementation, especially as the size of the MIMO system increases. The LR-based ZF-DF algorithm also requires QR decomposition of the channel matrix, and QR decomposition is very sensitive to the order of the search stages; its scale grows with the system dimensions. Therefore, as the numbers of users and antennas increase, the preprocessing part of the K-best signal detection algorithm (including QR decomposition and LR) exhibits high complexity and low parallelism [20], which becomes one of the main factors limiting the performance of the whole detector, especially when the channel matrix is not slowly varying. After preprocessing, the detection procedure itself is also a rather complicated part of the whole process. The number of child nodes in each stage determines the complexity of the K-best signal detection algorithm.
Seeking the optimal solution in a smaller vector space can greatly reduce the number of child nodes. Therefore, when the preprocessed channel matrix has better properties (i.e., is closer to orthogonal), the complexity of the detection procedure itself can be greatly reduced.
Ref. [8] gives a detailed comparison of the computational complexity of the GS, HT and GR algorithms in terms of the number of real-valued operations. For a complex-valued system, the GS and GR algorithms require approximately the same number of real-domain multiplications, while the HT algorithm needs more computation because it must also compute the unitary matrix $\varvec{Q}$ [26]. Therefore, although the HT algorithm can eliminate elements column by column, its hardware consumption is unaffordable. An improved GS algorithm for QR decomposition is proposed in Ref. [25], in which the matrices $\varvec{Q}$ and $\varvec{R}$ as well as the intermediate results are expanded for parallel design; however, compared with the conventional GS algorithm, the number of multiplications is not reduced. In Ref. [23], a simplified GS algorithm for matrix decomposition is proposed that achieves low complexity, especially for stable permutation QR decomposition of slowly varying channels; however, when the channel is not slowly varying, this algorithm suffers a nonnegligible loss in detection accuracy. A time-dependent channel-matrix model is considered in Ref. [30], in which only some columns of the matrix are updated over time; an algorithm combining an approximate matrix-retention scheme with an exact QR-decomposition update scheme is proposed there to reduce the computational complexity. In Ref. [8], a GR algorithm using coordinate rotation digital computer (CORDIC) units simplifies the complex arithmetic units in QR decomposition through iterative shift-and-add operations; a decomposition scheme based on a real-valued system model is also proposed in Ref. [8] to further reduce the number of arithmetic operations. However, this simplified QR decomposition combined with K-best signal detection suffers a serious loss of accuracy compared with the ML signal detection algorithm. Other nonlinear signal detection algorithms, such as the TASER algorithm [10] and the probabilistic data association (PDA) algorithm [31], are also unsuitable for high-order modulation systems: they incur serious accuracy loss, and their hardware implementation is very difficult.
Once $\varvec{R}$ and $\hat{\varvec{y}}$ are available, a decision-feedback approximation of the ML solution $\hat{\varvec{s}}^{\text{ML}}$ can be obtained by back-substitution:

$$ \hat{s}_{i} = \left( \hat{y}_{i} - \sum\limits_{j = i + 1}^{N} R_{i,j} s_{j} \right) \Big/ R_{i,i}, \tag{4.12} $$

where $\hat{y}_{i}$ is the $i$th element of $\hat{\varvec{y}}$ and $R_{i,i}$ is the $i$th diagonal element of the upper triangular matrix $\varvec{R}$ obtained from the channel matrix $\varvec{H}$. During triangularization, the $i$th diagonal element of $\varvec{R}$ is the column norm

$$ R_{i,i} = \text{norm}\left( \varvec{H}\left( i:N,i \right) \right) = \sqrt{\sum\limits_{j = i}^{N} H_{j,i}^{2}}, \tag{4.13} $$

i.e., $R_{i,i}$ is the norm of the remaining part of the $i$th column of the partially eliminated channel matrix.
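Eq. (4.12) combined with per-stage quantization gives the familiar successive-interference-cancelation detector. A minimal sketch, assuming an illustrative $\varvec{R}$ and QPSK alphabet:

```python
# Illustrative upper-triangular R and QPSK set (hypothetical values).
R = [[1.5 + 0j, 0.4 + 0.2j, -0.3j],
     [0.0 + 0j, 1.2 + 0j, 0.5 - 0.1j],
     [0.0 + 0j, 0.0 + 0j, 0.9 + 0j]]
OMEGA = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def sic_detect(yhat, R, omega):
    """Back-substitution with per-stage slicing: Eq. (4.12), then quantization
    of each unconstrained estimate to the nearest constellation point."""
    nt = len(R)
    s = [0j] * nt
    for i in range(nt - 1, -1, -1):          # last antenna first
        interf = sum(R[i][j] * s[j] for j in range(i + 1, nt))
        u = (yhat[i] - interf) / R[i][i]     # unconstrained estimate, Eq. (4.12)
        s[i] = min(omega, key=lambda c: abs(u - c) ** 2)   # slice
    return tuple(s)

# Noise-free sanity check: each unconstrained estimate is exact.
s_true = (1 + 1j, 1 - 1j, -1 + 1j)
yhat = [sum(R[i][j] * s_true[j] for j in range(3)) for i in range(3)]
assert sic_detect(yhat, R, OMEGA) == s_true
```

This is the $K = 1$ special case of the K-best search, which is why error propagation from early wrong decisions is its main weakness.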
4.2.3 Lattice Reduction
Lattice reduction transforms the channel matrix $\varvec{H}$ into

$$ \bar{\varvec{H}} = \varvec{HT}, \tag{4.14} $$

where $\varvec{T}$ is a unimodular transformation matrix, so that the reduced matrix $\bar{\varvec{H}}$ is better conditioned (more nearly orthogonal) than $\varvec{H}$. In the LLL algorithm, $\bar{\varvec{H}}$ is characterized through the upper triangular factor $\varvec{R}$ of its QR decomposition and a parameter $\delta$ with $0.25 < \delta < 1$. Column swapping on $\varvec{H}$ is driven by the diagonal condition

$$ \delta \left| R_{k - 1,k - 1} \right| > \left| R_{k,k} \right|, \quad k = 2,3,\ldots,N, \tag{4.15} $$

and the size-reduction condition bounds the off-diagonal elements of $\varvec{R}$:

$$ \frac{1}{2}\left| R_{k - 1,k - 1} \right| > \left| R_{k - 1,r} \right|, \quad 2 \le k \le r \le N. \tag{4.16} $$
However, the LR algorithm requires multiple conditional checks and column exchanges, resulting in uncertain data throughput, low parallelism and long latency. Sorted QR decomposition is essentially similar to the LR algorithm: both obtain better matrix properties by adjusting the channel matrix. Consequently, the K-best signal detection algorithm can achieve near-ML accuracy with a small number of branch extensions. However, as the scale of the MIMO system increases, the computational complexity of this preprocessing becomes uncontrollable. Sorted QR decomposition and LR together incur high computational complexity, especially LR, which requires an uncertain number of operations with low parallelism.
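The two conditions can be checked mechanically on a given upper-triangular $\varvec{R}$. The sketch below treats Eq. (4.15) as the trigger for a column swap and Eq. (4.16) as the size-reduction requirement, following the inequalities as stated in the text; the example matrices are illustrative.

```python
def lll_violations(R, delta=0.75):
    """List violations of the book's two LLL-style conditions on an
    upper-triangular R (1-indexed k in the text maps to 0-indexed here)."""
    n = len(R)
    issues = []
    for k in range(1, n):
        # Eq. (4.15): a column swap is indicated when delta*|R[k-1,k-1]| > |R[k,k]|
        if delta * abs(R[k - 1][k - 1]) > abs(R[k][k]):
            issues.append(("swap", k))
    for k in range(1, n):
        for r in range(k, n):
            # Eq. (4.16): size reduction requires |R[k-1,r]| < 0.5*|R[k-1,k-1]|
            if abs(R[k - 1][r]) >= 0.5 * abs(R[k - 1][k - 1]):
                issues.append(("size", k, r))
    return issues

# A well-conditioned R passes both checks; a skewed one fails both.
R_good = [[2.0, 0.5], [0.0, 2.0]]
R_bad = [[2.0, 1.5], [0.0, 1.0]]
assert lll_violations(R_good) == []
assert len(lll_violations(R_bad)) == 2
```

The data-dependent number of violations (and hence of swap/reduce iterations) is exactly the source of the uncertain throughput criticized above.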
4.2.4 Cholesky Preprocessing
4.2.4.1 Cholesky-Sorted QR Decomposition
In the QR decomposition of $\varvec{H}$, most of the computation is spent on the unitary matrix $\varvec{Q}$: obtaining $\varvec{Q}$ explicitly is expensive, yet only $\varvec{Q}^{\text{H}}\varvec{y}$ and the upper triangular matrix $\varvec{R}$ are needed for detection. The key observation is that, with $\varvec{H} = \varvec{QR}$,

$$ \varvec{H}^{\text{H}} \varvec{H} = \left( \varvec{QR} \right)^{\text{H}} \varvec{QR} = \varvec{R}^{\text{H}} \left( \varvec{Q}^{\text{H}} \varvec{Q} \right)\varvec{R} = \varvec{R}^{\text{H}} \varvec{R}. \tag{4.17} $$

Hence the Gram matrix $\varvec{A} = \varvec{H}^{\text{H}}\varvec{H} = \varvec{R}^{\text{H}}\varvec{R}$ of the channel matrix $\varvec{H}$ admits the Cholesky factorization $\varvec{A} = \varvec{LL}^{\text{H}}$, and $\varvec{R}$ can be obtained directly as $\varvec{L}^{\text{H}}$ without ever forming $\varvec{Q}$. The remaining quantity $\varvec{Q}^{\text{H}}\varvec{y}$ can also be computed without $\varvec{Q}$: since $\varvec{H} = \varvec{QR}$ implies $\varvec{Q} = \varvec{HR}^{-1}$,

$$ \varvec{Q}^{\text{H}} \varvec{y} = \left( \varvec{HR}^{-1} \right)^{\text{H}} \varvec{y} = \left( \varvec{R}^{-1} \right)^{\text{H}} \varvec{H}^{\text{H}} \varvec{y}. \tag{4.18} $$
In Eq. (4.18), the direct computation of $\varvec{Q}^{\text{H}}\varvec{y}$ is replaced by the inversion of the upper triangular matrix $\varvec{R}$ and two matrix–vector multiplications. The computational complexity of inverting the upper triangular matrix is significantly lower than that of directly computing the matrix $\varvec{Q}$. In the GR algorithm, the same rotations that eliminate the elements of matrix $\varvec{H}$ are applied to the vector $\varvec{y}$, so that $\varvec{Q}^{\text{H}}\varvec{y}$ is obtained without explicitly solving for $\varvec{Q}$; computing the matrix $\varvec{Q}$ itself, however, requires a large number of multiplications. The computational complexity of these algorithms is compared in the subsequent chapters.
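Eqs. (4.17) and (4.18) can be verified numerically. The sketch below, assuming a hypothetical 2 × 2 complex channel, builds $\varvec{R}$ by Cholesky factorization of the Gram matrix and obtains $\hat{\varvec{y}} = (\varvec{R}^{-1})^{\text{H}}\varvec{H}^{\text{H}}\varvec{y}$ by forward substitution, never forming $\varvec{Q}$.

```python
import math

def conj_T(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def cholesky_upper(A):
    """Upper-triangular R with R^H R = A (A Hermitian positive definite)."""
    n = len(A)
    R = [[0j] * n for _ in range(n)]
    for i in range(n):
        d = A[i][i] - sum(abs(R[k][i]) ** 2 for k in range(i))
        R[i][i] = math.sqrt(d.real)
        for j in range(i + 1, n):
            s = sum(R[k][i].conjugate() * R[k][j] for k in range(i))
            R[i][j] = (A[i][j] - s) / R[i][i]
    return R

def solve_yhat(H, y):
    """yhat = (R^{-1})^H H^H y, via forward substitution on R^H yhat = H^H y."""
    A = matmul(conj_T(H), H)                 # Gram matrix, Eq. (4.17)
    R = cholesky_upper(A)
    n = len(R)
    z = [sum(H[r][i].conjugate() * y[r] for r in range(len(y))) for i in range(n)]
    yhat = [0j] * n
    for i in range(n):                       # R^H is lower triangular
        s = sum(R[k][i].conjugate() * yhat[k] for k in range(i))
        yhat[i] = (z[i] - s) / R[i][i].conjugate()
    return R, yhat

# Hypothetical 2x2 channel and received vector.
H = [[1 + 1j, 0.5 + 0j],
     [0.2 - 0.3j, 1 - 0.5j]]
y = [1 + 0j, 0.5 + 0.5j]
R, yhat = solve_yhat(H, y)
```

The forward substitution replaces the explicit inverse $(\varvec{R}^{-1})^{\text{H}}$, which is the standard trick for triangular systems; the identities $\varvec{R}^{\text{H}}\varvec{R} = \varvec{H}^{\text{H}}\varvec{H}$ and $\varvec{R}^{\text{H}}\hat{\varvec{y}} = \varvec{H}^{\text{H}}\varvec{y}$ confirm correctness.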
The next question is how to combine the sorting operation with QR decomposition. In the Cholesky-based algorithm, the QR decomposition is realized through the Gram matrix $\varvec{A} = \varvec{H}^{\text{H}}\varvec{H}$. Based on the Cholesky decomposition, the matrix $\varvec{R}$ is generated row by row; the pseudocode of the Cholesky decomposition is shown in Algorithm 4.3.
Algorithm 4.3
Cholesky decomposition algorithm
![../images/478198_1_En_4_Chapter/478198_1_En_4_Figc_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Figc_HTML.png)
Take $N = 4$ as an example. For $i = 1$, the Gram matrix $\varvec{A} = \varvec{H}^{\text{H}}\varvec{H}$ directly provides the initial column norms

$$ V_{k} = \varvec{h}_{k}^{\text{H}} \varvec{h}_{k} = \text{norm}^{2}\left( \varvec{h}_{k} \right) = A_{k,k}, \quad k = 1,2,3,4, \tag{4.19} $$

i.e., $A_{k,k}$ is the squared norm of the $k$th column of $\varvec{H}$, read off the diagonal of $\varvec{A}$. For $i = 2$, after the first row of $\varvec{R}$ has been computed, the norms $V_{k}$ are updated as

$$ \begin{aligned} V_{k} & = \text{norm}^{2}\left( \varvec{h}_{k} \right) - R_{i - 1,k}^{\text{H}} R_{i - 1,k} \\ & = A_{k,k} - R_{i - 1,k}^{\text{H}} R_{i - 1,k}, \quad k = 2,3,4, \end{aligned} \tag{4.20} $$

so $V_{k}$ is still obtained from the diagonal of $\varvec{A}$ without touching $\varvec{H}$; the same update applies for $i = 3, 4$. The update for $i = 2$ is equivalent to the partial triangularization of $\varvec{H}$:

$$ \left[ \begin{array}{cccc} H_{1,1} & H_{1,2} & H_{1,3} & H_{1,4} \\ H_{2,1} & H_{2,2} & H_{2,3} & H_{2,4} \\ H_{3,1} & H_{3,2} & H_{3,3} & H_{3,4} \\ H_{4,1} & H_{4,2} & H_{4,3} & H_{4,4} \end{array} \right] \to \left[ \begin{array}{cccc} R_{1,1} & R_{1,2} & R_{1,3} & R_{1,4} \\ 0 & H^{\prime}_{2,2} & H^{\prime}_{2,3} & H^{\prime}_{2,4} \\ 0 & H^{\prime}_{3,2} & H^{\prime}_{3,3} & H^{\prime}_{3,4} \\ 0 & H^{\prime}_{4,2} & H^{\prime}_{4,3} & H^{\prime}_{4,4} \end{array} \right], \tag{4.21} $$

where $H^{\prime}_{i,j}$ denotes the updated elements of $\varvec{H}$, and the $k$th column norm becomes

$$ V_{k} = \sum\limits_{j = i}^{4} \left( H^{\prime}_{j,k} \right)^{\text{H}} H^{\prime}_{j,k}, \quad k = 2,3,4. \tag{4.22} $$

Avoiding these explicit updates of $\varvec{H}$ saves approximately $\frac{2}{3}N^{3}$ multiplications.
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig6_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig6_HTML.png)
Difference between the Cholesky sorted QR decomposition algorithm and a conventional algorithm. a Sorted QRD using the proposed algorithm, b sorted QRD using a conventional algorithm.
© [2018] IEEE. Reprinted, with permission, from Ref. [9]
Compared with other Cholesky decomposition algorithms for MIMO detection [29, 32, 33], the algorithm proposed in this section differs slightly in purpose and implementation details. In the ZF-DF algorithm and the successive interference cancelation (SIC) detector [29, 33], only the diagonal elements of R are obtained through the QR decomposition: the Gram matrix is factorized by the LDL algorithm into a lower triangular matrix L with unit diagonal elements and a diagonal matrix D with real entries, and the diagonal elements of R can be computed once the matrix has been decomposed. However, in the K-best signal detection algorithm, the whole upper triangular matrix R needs to be calculated. Therefore, R cannot be obtained simply by the LDL algorithm; other algorithms would have to be used instead, which affects the subsequent K-best stage. In the CHOSLAR algorithm of this section, the required matrices are obtained directly by matrix decomposition, so the subsequent K-best search can start within a very short time. In Ref. [32], the Cholesky algorithm is used to decompose and invert the Gram matrix in a linear MMSE signal detector. Compared with the above algorithms, the CHOSLAR algorithm performs the Cholesky sorted QR decomposition first and then executes the LR and K-best algorithms, so it is applicable to high-performance nonlinear detection. In short, the algorithm in this section is more favorable to the subsequent K-best search than the other algorithms. It also includes sorting operations in the decomposition process, which greatly improves the detector accuracy compared with the Cholesky decomposition algorithms in Refs. [29, 32, 33]: the obtained matrix has flatter diagonal elements, which is conducive to the LR and K-best computations and increases the detection accuracy. In conventional QR decomposition, the sorting operation is used to optimize the matrix R [21, 24, 25, 27]. In the sorted QR decomposition algorithm of this section, the norm of each column of the matrix H has already been computed when the Gram matrix A is adjusted. As a result, the algorithm needs no additional addition operations, because the adjustment of A is implemented as part of the Cholesky decomposition process. Conventional QR decomposition, whether by the GR algorithm [27] or the GS algorithm [21, 24, 25], operates directly on the matrix H: the decomposition proceeds column by column, and the sorting operation requires an extra step to compute the norms of all remaining columns that have not yet been decomposed.
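A minimal sketch of this Cholesky-based computation of R may make the idea concrete. The Python below is our own illustration (the names `gram` and `cholesky_upper` are not from Ref. [9], and the sorting step is omitted): it obtains the upper triangular factor R directly from the Gram matrix A = H^H H, without forming Q by column-by-column orthogonalization.

```python
# Sketch: obtain QR's upper triangular factor R via Cholesky decomposition
# of the Gram matrix A = H^H H (assumes H has full column rank).

def gram(H):
    """Gram matrix A = H^H H of an m x n complex matrix H."""
    m, n = len(H), len(H[0])
    return [[sum(H[i][r].conjugate() * H[i][c] for i in range(m))
             for c in range(n)] for r in range(n)]

def cholesky_upper(A):
    """Upper triangular R with A = R^H R (Cholesky decomposition)."""
    n = len(A)
    R = [[0j] * n for _ in range(n)]
    for i in range(n):
        # diagonal entry: residual column norm (the V_k of the text)
        d = A[i][i] - sum(abs(R[k][i]) ** 2 for k in range(i))
        R[i][i] = d.real ** 0.5
        for j in range(i + 1, n):
            s = A[i][j] - sum(R[k][i].conjugate() * R[k][j] for k in range(i))
            R[i][j] = s / R[i][i]
    return R
```

In a sorted variant, one would pick, before computing row i of R, the remaining column with the smallest residual norm V_k, since those norms are already available on the diagonal of the partially processed Gram matrix.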
4.2.4.2 PILR
In this section, a PILR algorithm is described to reduce the number of column exchanges while maintaining a constant throughput.
![$$ N/2 + 1 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq165.png)
![$$ k $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq166.png)
![$$ \mu $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq167.png)
![$$ \mu = \text{round}\left( {R_{k - 1,k} /R_{k - 1,k - 1} } \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ23.png)
![$$ k - 1 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq168.png)
![$$ \varvec{R} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq169.png)
![$$ \mu $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq170.png)
![$$ k $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq171.png)
![$$ R_{1:k,k} \leftarrow R_{1:k,k} - \mu R_{1:k,k - 1} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ24.png)
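The size-reduction update of Eqs. (4.23) and (4.24) can be sketched as follows. This is our illustrative Python (0-based column indexing, unlike the 1-based notation of the text): only rows 0..k of column k need updating, because R is upper triangular.

```python
def size_reduce_column(R, k):
    """One size-reduction step of partial lattice reduction: subtract an
    integer multiple mu of column k-1 from column k so that |R[k-1][k]|
    becomes small relative to R[k-1][k-1]."""
    mu = round(R[k - 1][k] / R[k - 1][k - 1])
    if mu != 0:
        for i in range(k + 1):          # rows 0..k (upper triangular R)
            R[i][k] -= mu * R[i][k - 1]
    return mu
```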
![$$ k $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq173.png)
![$$ k - 1 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq174.png)
![$$ \varvec{R} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq175.png)
![$$ \varvec{H} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq176.png)
![$$ \hat{\varvec{R}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq177.png)
![$$ \hat{\varvec{H}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq178.png)
![$$ \varvec{R} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq179.png)
![$$ 2 \times 2 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq180.png)
![$$ \varvec{\theta} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq181.png)
![$$ \varvec{\theta}= \left[ {\begin{array}{*{20}c} {a^{\text{H}} ,} & {b^{\text{H}} } \\ { - b,} & a \\ \end{array} } \right], $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ26.png)
![$$ a $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq182.png)
![$$ b $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq183.png)
![$$ a = \frac{{\hat{R}_{k - 1,k - 1} }}{{\sqrt {\hat{R}_{k - 1,k - 1}^{2} { + }\hat{R}_{k,k - 1}^{2} } }},\,b = \frac{{\hat{R}_{k,k - 1} }}{{\sqrt {\hat{R}_{k - 1,k - 1}^{2} { + }\hat{R}_{k,k - 1}^{2} } }} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ27.png)
![$$ k $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq184.png)
![$$ k - 1 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq185.png)
![$$ \theta $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq186.png)
![$$ \varvec{R} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq187.png)
![$$ \left[ {\begin{array}{*{20}l} {\hat{R}_{k - 1,k - 1} ,} \hfill & {\hat{R}_{k - 1,k} ,} \hfill & \ldots \hfill \\ {0,} \hfill & {\hat{R}_{k,k} ,} \hfill & \ldots \hfill \\ \end{array} } \right] \leftarrow \theta \times \left[ {\begin{array}{*{20}l} {\hat{R}_{k - 1,k} ,} \hfill & {\hat{R}_{k - 1,k - 1} ,} \hfill & \ldots \hfill \\ {\hat{R}_{k,k} ,} \hfill & {0,} \hfill & \ldots \hfill \\ \end{array} } \right] $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ28.png)
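The 2 × 2 rotation of Eqs. (4.26)–(4.28) can be checked numerically. The sketch below is our real-valued rendering of Eq. (4.27): it applies the rotation to the two affected rows so that the subdiagonal entry introduced by a column exchange becomes zero.

```python
def givens_restore(row_k1, row_k):
    """Apply the 2x2 Givens rotation of Eq. (4.26) to rows k-1 and k of R
    so the subdiagonal entry row_k[0] (i.e., R[k][k-1]) is annihilated."""
    r = (row_k1[0] ** 2 + row_k[0] ** 2) ** 0.5
    a, b = row_k1[0] / r, row_k[0] / r            # Eq. (4.27)
    new_k1 = [a * x + b * y for x, y in zip(row_k1, row_k)]
    new_k = [-b * x + a * y for x, y in zip(row_k1, row_k)]
    return new_k1, new_k                           # new_k[0] == 0
```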
![$$ \hat{\varvec{R}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq188.png)
![$$ \hat{R}_{k,k} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq189.png)
![$$ \hat{R}_{1,1} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq190.png)
![$$ \varvec{R} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq191.png)
![$$ 16 \times 16 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq192.png)
![$$ 8 \times 8 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq193.png)
![$$ \varvec{R} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq194.png)
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig7_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig7_HTML.png)
a Element values of the unsorted matrix R, b element values of the sorted matrix R.
© [2018] IEEE. Reprinted, with permission, from Ref. [9]
Algorithm 4.4
CHOSLAR algorithm
![../images/478198_1_En_4_Chapter/478198_1_En_4_Figd_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Figd_HTML.png)
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fige_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fige_HTML.png)
In Ref. [34], conventional LR and partial LR operations are used to optimize the matrix. First, a conventional algorithm uses the identity matrix to track all column adjustments, where the channel matrix H can be used directly. Then, when the full size reduction is executed, the original LR [29] and partial LR [34] algorithms iterate over the elements one by one from the first row to the last, whereas the algorithm proposed in this section operates in the opposite direction, which makes row-wise operation possible and improves parallelism. As will be shown in Chap. 5, the algorithm admits a high-throughput ASIC hardware architecture, which demonstrates this parallelism. Finally, in the conventional LR algorithm the entire matrix needs to be adjusted, whereas the algorithm in this section combines LR with the sorted QR decomposition; both adjust the matrix and give it more advantageous properties. The PILR algorithm runs iteratively from the last row up to the (N/2 + 1)-th row. The algorithm exploits this characteristic to reduce the total number of column exchanges.
4.2.5 Improved K-Best Detector and Its Performance Simulation
![$$ \hat{\varvec{s}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq204.png)
![$$ 16 \times 16 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq205.png)
![$$ \hat{s}_{N} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq206.png)
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig8_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig8_HTML.png)
![$$ \hat{s}_{N} = \hat{y}_{N} /\hat{R}_{N,N} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ29.png)
![$$ \hat{s}_{N} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq207.png)
![$$ \hat{\varvec{s}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq208.png)
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig9_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig9_HTML.png)
Enumeration of constellation points.
© [2018] IEEE. Reprinted, with permission, from Ref. [9]
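The breadth-first K-best search itself can be sketched as follows. This is our illustrative Python for a small real-valued upper triangular system, not the optimized enumeration of Ref. [9]: at each tree level, every surviving candidate is expanded over the full constellation and only the K lowest-cost branches survive.

```python
def k_best_detect(R, y, symbols, K):
    """Breadth-first K-best search: estimate s from y = R s + n with
    upper triangular R, keeping the K best partial candidates per level."""
    N = len(R)
    candidates = [([], 0.0)]            # (symbols s[row+1..N-1], partial cost)
    for row in range(N - 1, -1, -1):
        expanded = []
        for partial, cost in candidates:
            for sym in symbols:
                s_tail = [sym] + partial            # s[row..N-1]
                resid = y[row] - sum(R[row][row + j] * s_tail[j]
                                     for j in range(N - row))
                expanded.append((s_tail, cost + resid ** 2))
        expanded.sort(key=lambda t: t[1])           # keep K best survivors
        candidates = expanded[:K]
    return candidates[0][0]
```

The constellation-point enumeration of the figure above serves to reduce how many children actually need to be generated and sorted at each level.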
![$$ 16 \times 16 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq209.png)
![$$ K = 10 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq210.png)
![$$ 8 \times 8 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq211.png)
![$$ 16 \times 16 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq212.png)
![$$ K $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq213.png)
![$$ (K = 8,10,12) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq214.png)
![$$ 10^{ - 5} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq215.png)
![$$ K = 8 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq216.png)
![$$ K = 10 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq217.png)
![$$ (K = 10) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq218.png)
![$$ 16 \times 16 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq219.png)
![$$ 10^{ - 5} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq220.png)
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig10_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig10_HTML.png)
Comparison of BER performance. a PILR and full-matrix LR, b different K values, c different algorithms.
© [2018] IEEE. Reprinted, with permission, from Ref. [9]
![$$ 16 \times 16 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq221.png)
![$$ 64 \times 64 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq222.png)
![$$ 128 \times 128 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq223.png)
![$$ 64 \times 64 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq224.png)
![$$ 128 \times 128 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq225.png)
![$$ 64 \times 64 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq226.png)
![$$ 128 \times 128 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq227.png)
![$$ K = 14 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq228.png)
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig11_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig11_HTML.png)
BER performance comparison with different configurations and different modulation. a 64 × 64 MIMO 64QAM, b 128 × 128 MIMO 64QAM, c 16 × 16 MIMO 256QAM.
© [2018] IEEE. Reprinted, with permission, from Ref. [9]
![$$ \varvec{H} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq229.png)
![$$ \varvec{H} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq230.png)
![$$ \varvec{H} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq231.png)
![$$ (16 \times 32) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq232.png)
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig12_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig12_HTML.png)
Comparison of BER performance. a Asymmetric (16 × 32) MIMO system, b Kronecker channel (16 × 16 MIMO, 64QAM).
© [2018] IEEE. Reprinted, with permission, from Ref. [9]
![$$ N\left( {0,d\left( \varvec{z} \right)\varvec{I}_{B} } \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq233.png)
![$$ d\left( \varvec{z} \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq234.png)
![$$ d\left( \varvec{z} \right) = C/\left\| {\varvec{z} - \varvec{b}} \right\|^{\kappa } $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq235.png)
![$$ \varvec{b} \in \varvec{R}^{2} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq236.png)
![$$ \kappa $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq237.png)
![$$ \parallel \cdot\parallel $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq238.png)
![$$ C $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq239.png)
![$$ 10{ \lg }C \sim N\left( {0,\sigma_{\text{sf}}^{2} } \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq240.png)
![$$ \varvec{R}_{\text{r}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq241.png)
![$$ \varvec{R}_{\text{t}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq242.png)
![$$ \xi $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq243.png)
![$$ \varvec{H} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq244.png)
![$$ \varvec{H} = \varvec{R}_{\text{r}}^{1/2} \varvec{H}_{{\text{i}\text{.i}\text{.d}\text{.}}} \sqrt {d\left( \varvec{z} \right)} \varvec{R}_{\text{t}}^{1/2} , $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ30.png)
![$$ \varvec{H}_{{{\text{i}}.{\text{i}}.{\text{d}}.}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq245.png)
![$$ r = 500{\text{m}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq246.png)
![$$ \varvec{z} \in \varvec{R}^{2} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq247.png)
![$$ \kappa = 3.7 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq248.png)
![$$ \sigma_{\text{sf}}^{2} = 5 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq249.png)
![$$ \rho = r^{\kappa } /2 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq250.png)
![$$ \xi = 0.2 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq251.png)
4.2.6 Summary and Analysis
![$$ \varvec{H}^{\text{H}} \varvec{H} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq252.png)
![$$ \varvec{A} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq253.png)
![$$ \varvec{H}^{\text{H}} \varvec{H} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq254.png)
![$$ 2N^{3} + 2N^{2} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq255.png)
![$$ N^{3} - N $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq256.png)
![$$ \varvec{A} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq257.png)
![$$ N^{3} - 2N^{2} + 5N $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq258.png)
![$$ N^{3} - 3N^{2} + 3N $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq259.png)
![$$ \varvec{Q} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq260.png)
![$$ \frac{2}{3}N^{3} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq261.png)
![$$ N = 16 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq262.png)
Computational complexity of different detection algorithms in a higher order MIMO system
Algorithm | Real addition | Real multiplication
---|---|---
GS | |
GR | |
HT | |
Cholesky sorted QR decomposition | |
![$$ 16 \times 16 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq271.png)
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig13_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig13_HTML.png)
Column exchange comparison between LR for non-sorted QR decomposition and LR for sorted QR decomposition. a LR for non-sorted QRD, b LR for sorted QRD.
© [2018] IEEE. Reprinted, with permission, from Ref. [9]
Considering the parallelism of hardware implementation, the Cholesky sorted QR decomposition algorithm can eliminate an entire column of the matrix at one time, while the conventional GR algorithm based on paired CORDIC can only eliminate one element at a time. Moreover, the elements in the columns to the left must already have been eliminated before a new column of elements can be removed. The dependence between these eliminations limits parallelism, especially as the numbers of users and antennas increase in higher order MIMO systems. For example, for a 16 × 16 matrix, the GR algorithm based on paired CORDIC takes four rounds to eliminate all elements in the first column. Therefore, compared with conventional QR decomposition, the Cholesky sorted QR decomposition algorithm achieves higher parallelism and lower preprocessing delay. In addition, the parallelism of the size reduction process in the LR algorithm is also improved, because the column updates of the matrices are implemented row by row rather than element by element.
4.3 TASER Algorithm
4.3.1 System Model
The TASER algorithm is applicable to two scenarios: coherent data detection in MU-MIMO wireless systems and JED in large-scale single-input multiple-output (SIMO) wireless systems.
First, consider the coherent data detection scenario in an MU-MIMO wireless system. The channel model is given in Eq. (4.1) and the ML detection problem in Eq. (4.3); it is assumed that the channel matrix has already been estimated.
For conventional small-scale MIMO systems, a series of SD algorithms is presented in Ref. [15]. However, the computational complexity of these algorithms increases exponentially with the number of transmitting antennas, so they are not suitable for massive MIMO systems. When the ratio of the number of receiving antennas to the number of transmitting antennas exceeds 2 in a massive MU-MIMO system, some recently proposed linear algorithms can achieve near-ML performance [3]. However, when this ratio is close to 1, the BER performance of these linear algorithms is too poor to be acceptable [3].
![$$ \bar{\varvec{y}} = \varvec{\bar{H}\bar{s}} + \varvec{\bar{n}}, $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ31.png)
![$$ \begin{aligned} & \bar{\varvec{y}} = \left[ {\begin{array}{*{20}l} {\text{Re}\left\{ \varvec{y} \right\}} \hfill \\ {\text{Im} \left\{ \varvec{y} \right\}} \hfill \\ \end{array} } \right],\,\bar{\varvec{H}} = \left[ {\begin{array}{*{20}c} {\text{Re}\left\{ \varvec{H} \right\}} & { - \text{Im} \left\{ \varvec{H} \right\}} \\ {\text{Im} \left\{ \varvec{H} \right\}} & {\text{Re}\left\{ \varvec{H} \right\}} \\ \end{array} } \right] \\ & \bar{\varvec{s}} = \left[ {\begin{array}{*{20}l} {\text{Re}\left\{ \varvec{s} \right\}} \hfill \\ {\text{Im} \left\{ \varvec{s} \right\}} \hfill \\ \end{array} } \right], \, \bar{\varvec{n}} = \left[ {\begin{array}{*{20}l} {\text{Re}\left\{ \varvec{n} \right\}} \hfill \\ {\text{Im} \left\{ \varvec{n} \right\}} \hfill \\ \end{array} } \right] \\ \end{aligned} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ32.png)
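The real-valued decomposition of Eq. (4.32) is mechanical; a minimal sketch (the helper name `to_real` is ours) lifts the complex model y = Hs + n to an equivalent real model of twice the dimension:

```python
def to_real(H, y):
    """Real-valued decomposition of Eq. (4.32): stack real and imaginary
    parts so that the complex model y = H s + n becomes a real one."""
    m, n = len(H), len(H[0])
    # top block rows: [Re{H}, -Im{H}]
    Hb = [[H[i][j].real for j in range(n)] + [-H[i][j].imag for j in range(n)]
          for i in range(m)]
    # bottom block rows: [Im{H}, Re{H}]
    Hb += [[H[i][j].imag for j in range(n)] + [H[i][j].real for j in range(n)]
           for i in range(m)]
    yb = [v.real for v in y] + [v.imag for v in y]
    return Hb, yb
```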
![$$ \bar{\varvec{s}}^{{\text{ML}}} = \mathop {\arg \hbox{min} }\limits_{{\tilde{\varvec{s}} \in \chi N}} \text{Tr}\left( {\tilde{\varvec{s}}^{\text{H}} \varvec{T\tilde{s}}} \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ33.png)
For QPSK, is the matrix of
,
. The range of its element value is
. In this way, the solution
can be converted back to the solution
in the complex domain. For BPSK,
is the matrix of
,
. The matrix
of
is defined. At this time
, so
.
![$$ K + 1 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq294.png)
![$$ N_{\text{r}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq295.png)
![$$ \varvec{Y} = \varvec{hs}^{\text{H}} + \varvec{N} , $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ34.png)
![$$ \varvec{Y} \in \varvec{C}^{{N_{\text{r}} \times \left( {K + 1} \right)}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq296.png)
![$$ K + 1 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq297.png)
![$$ \varvec{h} \in \varvec{C}^{{N_{\text{r}} }} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq298.png)
![$$ K + 1 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq299.png)
![$$ \varvec{s}^{\text{H}} \in \varvec{O}^{{1 \times \left( {K + 1} \right)}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq300.png)
![$$ K + 1 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq301.png)
![$$ \varvec{N} \in \varvec{C}^{{N_{\text{r}} \times \left( {K + 1} \right)}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq302.png)
![$$ N_{0} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq303.png)
![$$ \left\{ {\hat{\varvec{s}}^{{\text{JED}}} ,\hat{\varvec{h}}} \right\} = \mathop {\arg \hbox{min} }\limits_{{\varvec{s} \in {\mathbf{O}}^{K + 1} ,\varvec{h} \in {\mathbf{C}}^{B} }} \left\| {\varvec{Y} - \varvec{hs}^{\text{H}} } \right\|_{\text{F}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ35.png)
Note that both outputs of the JED have a phase ambiguity: multiplying the estimated signal vector by a phase and the estimated channel by its conjugate yields another valid solution. To avoid this problem, assume that the first transmitted entry is known to the receiver.
![$$ \hat{\varvec{s}}^{{\text{JED}}} = \mathop {\arg \hbox{max} }\limits_{{\varvec{s} \in {\mathbf{O}}^{K + 1} }} \left\| {\varvec{Ys}} \right\|_{2} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ36.png)
is the estimate of the channel vector. When the number of time slots
is small, Eq. (4.36) can be solved exactly with a low-complexity SD algorithm [37]. However, the complexity of the SD algorithm becomes very high when the number of time slots is large. In contrast to the coherent ML detection problem in Eq. (4.33), it is not advisable to approximate the solution of Eq. (4.36) by a linear algorithm, because the possible value of every entry of
is infinite after the constraint
is relaxed to
.
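For intuition, the JED criterion of Eq. (4.36) can be solved by brute force when K is tiny. The sketch below is our illustrative Python for the real-valued BPSK case; this exhaustive search over all sign vectors is exactly what SD and TASER are designed to avoid for larger K.

```python
from itertools import product

def jed_brute_force(Y):
    """Maximize ||Y s||_2 over s in {+-1}^(K+1) with s[0] = +1 fixed,
    which removes the sign (phase) ambiguity. Y is Nr x (K+1), real BPSK."""
    K1 = len(Y[0])
    best, best_val = None, -1.0
    for tail in product((1, -1), repeat=K1 - 1):
        s = (1,) + tail
        val = sum(sum(Y[i][j] * s[j] for j in range(K1)) ** 2
                  for i in range(len(Y)))
        if val > best_val:
            best, best_val = s, val
    return list(best)
```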
![$$ s_{0} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq312.png)
![$$ \parallel \varvec{Ys}\parallel_{2} = \parallel y_{0} s_{0} + \varvec{Y}_{\text{r}} \varvec{s}_{\text{r}} \parallel_{2} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq313.png)
![$$ \varvec{Y}_{\text{r}} = \left[ {y_{1} , \ldots ,y_{K} } \right] $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq314.png)
![$$ \varvec{s}_{\text{r}} = \left[ {s_{1} , \ldots ,s_{K} } \right]^{\text{H}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq315.png)
![$$ \bar{\varvec{y}} = \left[ {\begin{array}{*{20}c} {\text{Re} \left\{ {y_{0} s_{0} } \right\}} \\ {\text{Im} \left\{ {y_{0} s_{0} } \right\}} \\ \end{array} } \right],\,\bar{\varvec{H}} = \left[ {\begin{array}{*{20}c} {\text{Re} \left\{ {\varvec{Y}_{\text{r}} } \right\}} & {\text{ - }\text{Im} \left\{ {\varvec{Y}_{\text{r}} } \right\}} \\ {\text{Im} \left\{ {\varvec{Y}_{\text{r}} } \right\}} & {\text{Re} \left\{ {\varvec{Y}_{\text{r}} } \right\}} \\ \end{array} } \right],\,\bar{\varvec{s}} = \left[ {\begin{array}{*{20}c} {\text{Re} \left\{ {\varvec{s}_{\text{r}} } \right\}} \\ {{\text{Im}}\left\{ {\varvec{s}_{\text{r}} } \right\}} \\ \end{array} } \right] $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ37.png)
![$$ \parallel y_{0} s_{0} + \varvec{Y}_{\text{r}} \varvec{s}_{\text{r}} \parallel_{2} = \parallel \bar{\varvec{y}} + \varvec{\bar{H}\bar{s}}\parallel_{2} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq316.png)
![$$ \bar{\varvec{s}}^{{\text{JED}}} = \mathop {\arg \hbox{min} }\limits_{{\tilde{\varvec{s}} \in \chi N}} \text{Tr}\left( {\tilde{\varvec{s}}^{\text{T}} \varvec{T\tilde{s}}} \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ38.png)
For QPSK, is the matrix of
,
. The range of its element value is
. For BPSK,
is the matrix of
,
. The matrix
of
is defined. Similar to the coherent ML problem, the solution
can be converted to the solution in the complex domain.
Next, we introduce how to find approximate solutions of Eqs. (4.33) and (4.38) with the same SDR-based algorithm.
4.3.2 Semi-definite Relaxation
SDR is well known for its use in solving the coherent ML problem [36]. It can significantly reduce the computational complexity for BPSK and QPSK modulation systems. SDR can not only provide near-ML performance but also achieve the same diversity order as the ML detector [38]. However, SDR has rarely been used to solve the ML JED problem.
![$$ \hat{\varvec{S}} = \mathop {\arg \hbox{min} }\limits_{{\varvec{S} \in {\mathbf{R}}^{N \times N} }} \text{Tr}\left( {\varvec{TS}} \right),\,\text{s}\text{.t}\text{.}\,\text{diag}\left( \varvec{S} \right) = 1,\,\text{rank}\left( \varvec{S} \right) = 1, $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ39.png)
![$$ {\text{Tr}}\left( {\varvec{s}^{\text{H}} \varvec{Ts}} \right) = {\text{Tr}}\left( {\varvec{Tss}^{\text{H}} } \right) = {\text{Tr}}\left( {\varvec{TS}} \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq329.png)
![$$ \varvec{S} = \varvec{ss}^{\text{H}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq330.png)
![$$ \varvec{s} \in {\mathcal{X}}^{N} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq331.png)
![$$ N $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq332.png)
![$$ \hat{\varvec{S}} = \mathop {\arg \hbox{min} }\limits_{{\varvec{S} \in {\mathbf{R}}^{N \times N} }} \text{Tr}\left( {\varvec{TS}} \right) ,\,\text{s}\text{.t}\text{.}\,\text{diag}\left( \varvec{S} \right) = 1,\,\varvec{S}\underline{ \succ } 0 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ40.png)
Constraint makes
a positive semi-definite (PSD) matrix. If the rank of the result from Eq. (4.40) is 1, then
in
is an exact solution of Eqs. (4.33) and (4.38), which means that SDR solves the original problem optimally. If the rank of
is greater than 1, the ML solution can be estimated by taking the signs of the leading eigenvector of
or adopting a random scheme [36].
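When the SDR solution has rank 1 (S = s s^T), the transmitted signs can be read off from the leading eigenvector. A minimal sketch using power iteration (our choice of method for the illustration, not prescribed by the text):

```python
def leading_eigvec_signs(S, iters=50):
    """Estimate the signs of the leading eigenvector of a PSD matrix S by
    power iteration; the global sign is fixed via the first entry."""
    n = len(S)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    flip = 1.0 if v[0] >= 0 else -1.0      # resolve the global sign
    return [1 if flip * x >= 0 else -1 for x in v]
```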
4.3.3 Algorithm Analysis
In this section, we will introduce a new algorithm, TASER, which is used to approximate the SDP problem presented in Eq. (4.40).
![$$ \varvec{S} \ge 0 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq339.png)
![$$ \varvec{S} = \varvec{L}^{\text{H}} \varvec{L} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq340.png)
![$$ \varvec{L} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq341.png)
![$$ N \times N $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq342.png)
![$$ \hat{\varvec{L}} = \mathop {\arg \hbox{min} }\limits_{\varvec{L}} \text{Tr}\left( {\varvec{LTL}^{\text{H}} } \right) ,\,\text{s}\text{.t}\text{. }\left\| {l_{k} } \right\|_{2} = 1,\quad \forall k $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ41.png)
Equation (4.41) uses a two-norm constraint to replace the constraint in Eq. (4.40). The sign bits of the last row of the solution L from Eq. (4.41) are taken as the solution to the problems in Eqs. (4.33) and (4.38), because if the rank of the solution is 1, the last row of L must contain the relevant eigenvector. If the rank is greater than 1, a near-ML solution needs to be extracted. In Ref. [39], it is proposed that the last row of the result of a Cholesky decomposition can be used as a rank-1 approximation of the PSD matrix. The simulation results in Chap. 5 show that the performance achieved by this approximation is close to that of the exact SDR detector based on eigenvalue decomposition. This approach avoids the eigenvalue decomposition and randomization of conventional schemes, thus reducing complexity.
An effective algorithm is proposed to solve the triangular SDP of Eq. (4.41) directly. However, the problem in Eq. (4.41) is non-convex, so finding the optimal solution is difficult. In the TASER algorithm, forward-backward splitting (FBS), a method designed for convex optimization problems, is applied to the non-convex problem of Eq. (4.41) [40]. This method cannot guarantee convergence to the optimal solution of the non-convex problem. Therefore, in Chap. 5, the TASER algorithm will be proved to converge to a critical point, and simulation results also show that it achieves BER performance similar to that of the ML algorithm.
![$$ \hat{\varvec{x}} = \mathop {\arg \hbox{min} }\limits_{\varvec{x}} f\left( \varvec{x} \right) + g\left( \varvec{x} \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq350.png)
![$$ f $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq351.png)
![$$ g $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq352.png)
![$$ \varvec{x}^{\left( t \right)} = \text{prox}_{g} \left( {\varvec{x}^{{\left( {t - 1} \right)}} - \tau^{{\left( {t - 1} \right)}} {\nabla }f\left( {\varvec{x}^{{\left( {t - 1} \right)}} } \right);\tau^{{\left( {t - 1} \right)}} } \right),\quad t = 1,2, \ldots $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ42.png)
![$$ t_{ \hbox{max} } $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq353.png)
![$$ \left\{ {\tau^{\left( t \right)} } \right\} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq354.png)
![$$ {\nabla }f\left( \varvec{x} \right) $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq355.png)
![$$ f $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq356.png)
![$$ g $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq357.png)
![$$ \text{prox}_{g} \left( {\varvec{z};\tau } \right) = \mathop {\arg \hbox{min} }\limits_{\varvec{x}} \left\{ {\tau g\left( \varvec{x} \right) + \frac{1}{2}\left\| {\varvec{x} - \varvec{z}} \right\|_{2}^{2} } \right\} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_Equ43.png)
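The FBS iteration of Eqs. (4.42) and (4.43) is easiest to see on a toy problem. The sketch below is our own example, unrelated to the detection problem itself: it minimizes f(x) = ½(x − 3)² subject to x ∈ [−1, 1], where g is the indicator of the interval and its proximal operator is a simple projection (clipping).

```python
def fbs_scalar(x0, tau=0.5, iters=100):
    """Forward-backward splitting for f(x) = 0.5*(x-3)^2 with g the
    indicator of [-1, 1]: gradient step on f, then prox of g."""
    x = x0
    for _ in range(iters):
        x = x - tau * (x - 3.0)          # forward (gradient) step
        x = max(-1.0, min(1.0, x))       # backward step: projection
    return x
```

Because the unconstrained minimizer x = 3 lies outside the interval, the iteration settles at the boundary point x = 1, the constrained minimizer.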
In order to solve Eq. (4.41) approximately with FBS, we define f(L) = Tr(L T L^H) and take g as the indicator function of the norm constraints (its value is 0 when the constraint is satisfied and infinity otherwise). The gradient can then be expressed as ∇f(L) = tril(2LT), where tril(·) denotes extraction of the lower triangular part. Although the function g is non-convex, the proximal operator still has a closed-form solution.
To be friendly to hardware implementation, the step-size rule proposed in Ref. [40] is not used here; instead, a fixed step size is used to improve the convergence rate of the TASER algorithm [41]. The value of the fixed step size is proportional to the reciprocal of the Lipschitz constant of the gradient function
, where
is the spectral norm of matrix
and
is dependent on the system’s regulation parameters, used to improve the convergence rate of TASER algorithm [41]. To further improve the convergence rate of FBS, we need to preprocess the problem in Eq. (4.41). First, the diagonal scaling matrix
is computed, which is used to scale the matrix
to get a matrix with
.
is a matrix with a main diagonal of 1. The processor that implements this operation, called the Jacobian preprocessor, is used to increase the conditional number of the original PSD matrix
[42]. Then run FBS to get the lower triangular matrix
. In the process of preprocessing, we also need to correct the proximity operators. In this case,
, where
is the
column of
. Because we only take the symbol bits of the last row of
to estimate the ML problem, here we can take only the symbols of the normalized triangular matrix
.
The pseudocode of the TASER algorithm is shown in Algorithm 4.5. Its inputs are the preprocessed matrix $\hat{\varvec{T}}$, the scaling matrix $\varvec{D}$, and the step size $\tau$; the feasible matrix $\hat{\varvec{L}}^{\left( 0 \right)} = \varvec{D}^{ - 1}$ is used for initialization. The main loop of the TASER algorithm repeatedly applies the gradient step and the proximal operator until the maximum number of iterations $t_{\max}$ is reached. In most cases, only a few iterations are required to obtain BER performance close to that of ML detection.
Algorithm 4.5
TASER algorithm
![../images/478198_1_En_4_Chapter/478198_1_En_4_Figf_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Figf_HTML.png)
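The loop of Algorithm 4.5 can be sketched in NumPy as follows. This is a minimal illustration under the definitions given above, not the hardware-oriented implementation: the random PSD matrix standing in for $\tilde{\varvec{T}}$, the BPSK-style sign readout, and the name `taser_sketch` are assumptions for illustration only.

```python
import numpy as np

def taser_sketch(T_tilde, alpha=0.99, t_max=50):
    """FBS loop of the TASER sketch: Jacobi scaling, gradient step, column-wise prox."""
    D = np.sqrt(np.diag(T_tilde))            # diagonal scaling, D_kk = sqrt(T_kk)
    T_hat = T_tilde / np.outer(D, D)         # scaled matrix with unit main diagonal
    tau = alpha / np.linalg.norm(T_hat, 2)   # fixed step ~ reciprocal of Lipschitz constant
    L_hat = np.diag(1.0 / D)                 # feasible lower triangular initialization
    for _ in range(t_max):
        V = L_hat - tau * np.tril(2.0 * L_hat @ T_hat)  # gradient step, kept lower triangular
        norms = np.linalg.norm(V, axis=0)               # column norms
        L_hat = V / (D * norms)              # prox: rescale column k to norm 1/D_kk
    # Only the signs of the last row are needed for the symbol estimate;
    # since D > 0, the diagonal scaling does not change the signs.
    return np.sign(L_hat[-1, :])

# Illustration with a random PSD stand-in for T_tilde:
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
s_hat = taser_sketch(B @ B.T + np.eye(5))
```

The gradient step and the `tril` restriction keep the iterate lower triangular throughout, and the column rescaling enforces the corrected constraint $\left\| {\hat{\varvec{l}}_{k} } \right\|_{2} = 1/D_{kk}$ exactly at every iteration.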
The TASER algorithm uses FBS to solve a non-convex problem, which raises two questions. The first is whether the algorithm converges at all; the second is whether a local minimum of the non-convex problem corresponds to the minimum of the convex SDP. We address these two questions next.
For the first question, although using FBS to minimize over positive semi-definite factorizations is still novel, there has been a lot of research on the convergence of FBS for non-convex problems. Ref. [43] gives a condition for the convergence of FBS on non-convex problems: the problem to be solved must be semi-algebraic. Equation (4.41) meets this condition exactly. A rigorous proof is given below.
Proposition 4.3.1
If FBS (Algorithm 4.5) is used to solve Eq. (4.41) with a step size $\tau \le 1/\left\| {2\tilde{\varvec{T}}} \right\|_{2}$, then the iterative sequence $\left\{ {\varvec{L}^{\left( t \right)} } \right\}$ will converge to a critical point.
Proof
The function $f$ is a polynomial, and the constraint set of Eq. (4.41) is the solution set of the polynomial system $\left\| {\varvec{l}_{k} } \right\|_{2}^{2} = 1$, $k = 1,2, \ldots$, so the problem is semi-algebraic. Theorem 5.3 in Ref. [43] shows that if the step size is upper-bounded by the inverse of the Lipschitz constant of the gradient of the objective function, then the iterative sequence $\left\{ {\varvec{L}^{\left( t \right)} } \right\}$ converges. Here the Lipschitz constant is the spectral radius (2-norm) of the matrix $2\tilde{\varvec{T}}$.□
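The Lipschitz bound used in the proof can be checked numerically. The snippet below is only an illustration (the random matrix is a stand-in for $\tilde{\varvec{T}}$): it computes the step-size bound $\tau = 1/\left\| {2\tilde{\varvec{T}}} \right\|_{2}$ and verifies the Lipschitz inequality for the gradient $\text{tril}\left( {2\varvec{L}\tilde{\varvec{T}}} \right)$ on random iterates.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
T = B @ B.T                                  # random PSD matrix standing in for T_tilde
lip = np.linalg.norm(2.0 * T, 2)             # Lipschitz constant: spectral norm of 2T
tau = 1.0 / lip                              # largest step size covered by Proposition 4.3.1

# Empirical check: ||grad(L1) - grad(L2)||_F <= lip * ||L1 - L2||_F.
grad = lambda L: np.tril(2.0 * L @ T)
L1, L2 = rng.standard_normal((2, 6, 6))
lhs = np.linalg.norm(grad(L1) - grad(L2))
rhs = lip * np.linalg.norm(L1 - L2)
```

The inequality holds for any pair of iterates because `tril` can only shrink the Frobenius norm and $\left\| {\varvec{M}\tilde{\varvec{T}}} \right\|_{F} \le \left\| {\tilde{\varvec{T}}} \right\|_{2} \left\| \varvec{M} \right\|_{F}$.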
The Jacobi preprocessor leads to a problem of the same form as Eq. (4.41), except that the constraint is $\left\| {\hat{\varvec{l}}_{k} } \right\|_{2} = 1/D_{kk}$ and the step size is $\tau \le 1/\left\| {2\hat{\varvec{T}}} \right\|_{2}$, so Proposition 4.3.1 applies as well. However, this proposition only guarantees convergence to a stationary point, not to a minimum point (in practice, a minimum point is often found). Nevertheless, this guarantee is much stronger than those of other known low-complexity SDP algorithms; for example, the non-convex augmented Lagrangian scheme adopted in Ref. [44] cannot guarantee convergence.
The second question is whether a local minimum of Eq. (4.41) corresponds to the minimum of the convex SDP in Eq. (4.40). Ref. [44] shows that a local minimum of Eq. (4.41) is a minimum of the SDP in Eq. (4.40) when the factors $\varvec{L}$ and $\varvec{L}^{\text{T}}$ are not constrained to triangular form. Nevertheless, here $\varvec{L}$ and $\varvec{L}^{\text{T}}$ are constrained to triangular form, as this simplifies the hardware architecture designed in Chap. 5.
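The triangular constraint loses no expressive power: any positive (semi)definite matrix admits a triangular factorization, which is why restricting the factor to triangular form still covers every feasible $\varvec{S}$. A small illustration (the random matrix is a stand-in for the SDP variable; NumPy's Cholesky routine returns the lower triangular variant $\varvec{S} = \varvec{C}\varvec{C}^{\text{T}}$):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
S = B @ B.T + 5.0 * np.eye(5)        # positive definite stand-in for the SDP variable
C = np.linalg.cholesky(S)            # lower triangular factor with S = C @ C.T
recon_ok = np.allclose(C @ C.T, S)   # exact reconstruction (up to rounding)
tri_ok = np.allclose(C, np.tril(C))  # C really is lower triangular
```

What is lost by the triangular restriction is only the unconstrained-factor argument of Ref. [44], not the reachable set of matrices $\varvec{S}$.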
4.3.4 Performance Analysis
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig14_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig14_HTML.png)
Fig. 4.14 VER performance comparison of MIMO systems with different configurations. a BPSK, b QPSK.
© [2018] IEEE. Reprinted, with permission, from Ref. [10]
For the $128 \times 8\,(N_{\text{r}} \times N_{\text{t}})$ massive MIMO system, we can see that the performance of all detectors is close to the optimal performance, including the SIMO lower bound; this result is expected [46]. For the $64 \times 16$ massive MIMO system, only linear detection has a rather small performance loss, and the other detectors perform better. As for the $32 \times 32$ massive MIMO system, we can see that the TASER algorithm can still achieve near-ML performance, which is obviously superior to the MMSE algorithm and the K-best algorithm $(K = 5)$ (even with the SD algorithm, ML detection complexity remains very high). Figure 4.14a, b also show the fixed-point performance of the TASER algorithm, which exhibits only a small performance loss (an SNR penalty below 0.2 dB at the 1% VER point).
![../images/478198_1_En_4_Chapter/478198_1_En_4_Fig15_HTML.png](../images/478198_1_En_4_Chapter/478198_1_En_4_Fig15_HTML.png)
Fig. 4.15 BER performance comparison of a SIMO system with $N_{\text{r}} = 16$, $K + 1 = 16$, and $t_{\max} = 20$. a BPSK, b QPSK.
© [2018] IEEE. Reprinted, with permission, from Ref. [10]
4.3.5 Computational Complexity
![$$ t_{ \hbox{max} } $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq409.png)
![$$ t_{ \hbox{max} } N_{\text{t}}^{3} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq410.png)
![$$ t_{ \hbox{max} } N_{\text{t}}^{2} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq411.png)
![$$ t_{ \hbox{max} } N_{\text{r}} N_{\text{t}} $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq412.png)
![$$ 32 \times 32 $$](../images/478198_1_En_4_Chapter/478198_1_En_4_Chapter_TeX_IEq413.png)