© Springer Nature Switzerland AG 2019
C. Bocci and L. Chiantini, An Introduction to Algebraic Statistics with Tensors, UNITEXT 118, https://doi.org/10.1007/978-3-030-24624-2_8

8. Marginalization and Flattenings

Cristiano Bocci and Luca Chiantini
Dipartimento di Ingegneria dell’Informazione e Scienze Matematiche, Università di Siena, Siena, Italy

We collect in this chapter some of the most useful operations on tensors, in view of the applications to Algebraic Statistics.

8.1 Marginalization

The concept of marginalization goes back to the beginnings of the statistical analysis of discrete random variables, when mainly pairs of variables were compared and correlated. In this setting, distributions in the total correlation correspond to matrices, and it was natural to annotate the sums of rows and columns at the edges (margins) of the matrices.

Definition 8.1.1

For matrices of given type $$n\times m$$ over a field K, which can be seen as points of the vector space $$K^{n,m}$$, the marginalization  is the linear map $$\mu :K^{n,m}\rightarrow K^n\times K^m$$ which sends the matrix $$A=(a_{ij})$$ to the pair $$((v_1,\dots ,v_n),(w_1,\dots ,w_m))\in K^n\times K^m$$, where
$$ v_i= \sum _{j=1}^m a_{ij},\qquad w_j= \sum _{i=1}^n a_{ij}.$$
In practice, the $$v_i$$’s are the sums of the rows, while the $$w_j$$’s are the sums of the columns.

Notice that the marginalization of A can equivalently be written as $$((1,\dots ,1)A^t,(1,\dots ,1)A)$$, where the all-ones row vectors have the appropriate lengths.

For instance, the marginalization of a $$3\times 3$$ matrix records the three row sums and the three column sums, which one can annotate at the margins of the matrix.
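In computational practice the marginalization of a matrix reduces to taking row and column sums. The following NumPy fragment is a minimal sketch added here for illustration; the matrix A is an arbitrary choice, not taken from the text.

import numpy as np

# An arbitrary 3x3 matrix, chosen only to illustrate Definition 8.1.1.
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

v = A.sum(axis=1)   # v_i = sum_j a_ij : the row sums
w = A.sum(axis=0)   # w_j = sum_i a_ij : the column sums

print(v)   # [ 6 15 24]
print(w)   # [12 15 18]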
The notion can be extended (at the cost only of some notational complication) to tensors of any dimension.

Definition 8.1.2

For tensors of given type $$a_1\times \dots \times a_n$$ over a field K, which can be seen as points of the vector space $$K^{a_1,\dots , a_n}$$, the marginalization  is the linear map $$\mu :K^{a_1,\dots , a_n}\rightarrow K^{a_1}\times \dots \times K^{a_n}$$ which sends the tensor $$A=(\alpha _{q_1\dots q_n})$$ to the n-tuple $$((v_{11},\dots ,v_{1a_1}),\dots ,(v_{n1},\dots ,v_{na_n}))\in K^{a_1}\times \dots \times K^{a_n}$$, where
$$ v_{ij}= \sum _{q_i=j} \alpha _{q_1\dots q_n},$$
i.e., in each sum we fix the ith index and take the sum over all the elements of the tensor in which the ith index is equal to j.

Example 8.1.3

The marginalization of the $$3\times 2 \times 2$$ tensor of Example 6.1.6 is ((18, 36), (9, 18, 27), (18, 36)).
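In computations, the ith component of the marginalization is obtained by summing the tensor over all indices except the ith one. The following NumPy sketch is added here for illustration; the helper marginalization and the $$3\times 2\times 2$$ tensor below are arbitrary choices, not taken from the text.

import numpy as np

def marginalization(T):
    # Tuple of the components of the marginalization of T (Definition 8.1.2).
    n = T.ndim
    return tuple(T.sum(axis=tuple(k for k in range(n) if k != i)) for i in range(n))

T = np.arange(12).reshape(3, 2, 2)   # an arbitrary 3x2x2 tensor
for u in marginalization(T):
    print(u)
# The three vectors lie in K^3, K^2, K^2 and all have the same sum of entries,
# namely the sum of all the entries of T.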

Since the marginalization $$\mu $$ is a linear map, we can analyze its linear properties. It is immediate to see that, except for trivial cases, $$\mu $$ is not injective.

Example 8.1.4

Even for $$2\times 2$$ matrices, the matrix
$$M= \begin{pmatrix} 1 & -1\\ -1 & 1\end{pmatrix}$$
belongs to the Kernel of $$\mu $$. Indeed, M generates the Kernel of $$\mu $$.

Even the surjectivity of $$\mu $$ fails in general. This is obvious for $$2\times 2$$ matrices, since $$\mu $$ is a noninjective linear map between $$K^4$$ and itself.

Indeed, if $$((v_1,\dots ,v_n),(w_1,\dots ,w_m))$$ is the marginalization of the matrix $$A=(a_{ij})$$, then clearly $$v_1+\dots +v_n$$ is equal to the sum of all the entries of A, thus it is also equal to $$w_1+\dots +w_m$$. We prove that this is the only restriction for a vector in $$K^n\times K^m$$ to belong to the image of the marginalization.

Proposition 8.1.5

A vector $$((v_1,\dots ,v_n),(w_1,\dots ,w_m))\in K^n\times K^m$$ is the marginalization of a matrix $$A=(a_{ij})\in K^{n,m}$$ if and only if
$$v_1+\dots +v_n = w_1+\dots +w_m.$$

Proof

The fact that the condition is necessary follows from the few lines before the proposition. We need to prove that the condition is sufficient. So, assume that $$\sum v_i= \sum w_j.$$ A way to prove the claim, which can be easily extended even to higher dimensional tensors, is the following. Write $$e_i$$ for the element in $$K^n$$ with 1 in the ith position and 0 elsewhere, so that $$e_1,\dots ,e_n$$ is the canonical basis of $$K^n$$. Define similarly $$e'_1,\dots ,e'_m\in K^m$$. It is clear that any pair $$(e_i,e'_j)$$ belongs to the image of $$\mu $$: just take the marginalization of the matrix having $$a_{ij}=1$$ and all the remaining entries equal to 0. So, it is sufficient to prove that if $$v_1+\dots +v_n = w_1+\dots +w_m$$, then $$(v,w)=((v_1,\dots ,v_n),(w_1,\dots ,w_m))$$ belongs to the subspace generated by the $$(e_i,e'_j)$$’s. Assume that $$n\le m$$ (if $$n>m$$, just take the transpose). Then notice that
$$\begin{aligned} (v,w)=v_1(e_1,e'_1)&+\sum _{i=2}^n (v_1+\dots +v_i-w_1-\dots -w_{i-1})(e_i,e'_i)\\ &+\sum _{i=1}^{n-1} (w_1+\dots +w_i-v_1-\dots -v_i) (e_{i+1},e'_i)\\ &+\sum _{j=n+1}^{m} w_j\bigl((e_n,e'_j)-(e_n,e'_n)\bigr), \end{aligned}$$
where the last sum is empty when $$n=m$$; a direct check on each coordinate shows that the two sides agree. $$\square $$

Corollary 8.1.6

The image of $$\mu $$ has dimension $$n+m-1$$. Thus the kernel of $$\mu $$ has dimension $$nm-n-m+1$$.
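The dimension count of Corollary 8.1.6 can be checked numerically: writing down the matrix of $$\mu $$ with respect to the standard bases, its rank always turns out to be $$n+m-1$$. The following NumPy sketch is added here for illustration; the helper marginalization_matrix is ours.

import numpy as np

def marginalization_matrix(n, m):
    # Matrix of mu : K^{n,m} -> K^n x K^m with respect to the standard bases.
    M = np.zeros((n + m, n * m))
    for i in range(n):
        for j in range(m):
            col = i * m + j    # coordinate corresponding to the elementary matrix E_ij
            M[i, col] = 1      # E_ij contributes to the ith row sum
            M[n + j, col] = 1  # E_ij contributes to the jth column sum
    return M

for n, m in [(2, 2), (3, 3), (2, 5)]:
    rank = np.linalg.matrix_rank(marginalization_matrix(n, m))
    print(n, m, rank, n + m - 1)   # the last two values always agree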

We can extend the previous computation to tensors of arbitrary dimension.

Proposition 8.1.7

A vector
$$((v_{11},\dots ,v_{1a_1}),\dots ,(v_{n1},\dots ,v_{na_n}))\in K^{a_1}\times \dots \times K^{a_n}$$
is the marginalization of a tensor of type $$a_1\times \dots \times a_n$$ if and only if
$$v_{11}+\dots +v_{1a_1}= v_{21}+\dots +v_{2a_2} =\dots =v_{n1}+\dots +v_{na_n}.$$

Definition 8.1.8

If the marginalization $$\mu (T)$$ of a tensor T is the element $$((v_{11},\dots ,v_{1a_1}),\dots ,(v_{n1},\dots ,v_{na_n}))$$ of $$K^{a_1}\times \dots \times K^{a_n}$$, then the vector $$\mu (T)_i=(v_{i1},\dots ,v_{ia_i})\in K^{a_i}$$ is called the i-contraction of T.

Next, we will study the connections between the marginalization of a tensor T and the tensor product of its contractions.

Proposition 8.1.9

Let $$T\in K^{a_1,\dots ,a_n}$$ be a tensor of rank 1. Let $$u_i\in K^{a_i}$$ be the i-contraction of T. Then T is a scaling of $$u_1\otimes \dots \otimes u_n$$.

Proof

Assume $$T=w_1\otimes \dots \otimes w_n$$ where $$w_i=(w_{i1},\dots ,w_{ia_i})$$. Then $$T_{i_1,\dots ,i_n}= w_{1i_1}\cdots w_{ni_n}$$, hence $$u_i=(u_{i1},\dots , u_{ia_i})$$ where
$$ u_{ij} =\sum _{j_i=j} w_{1j_1}\cdots w_{nj_n}= w_{ij}\sum w_{1j_1}\cdots \hat{w}_{ij_i} \cdots w_{nj_n},$$
where the hat means that the ith factor is omitted and the last sum ranges over all the indices $$j_s$$ with $$s\ne i$$. Hence $$u_i$$ is a multiple of $$w_i$$, by the factor $$\sum w_{1j_1}\cdots \hat{w}_{ij_i} \cdots w_{nj_n}$$. $$\square $$

Remark 8.1.10

We can be even more precise about the scaling factor of the previous proposition. Namely, assume $$T=w_1\otimes \dots \otimes w_n$$ where $$w_i=(w_{i1},\dots ,w_{ia_i})$$, and set $$W_i= w_{i1}+\dots +w_{ia_i}$$. Then $$u_1\otimes \dots \otimes u_n=WT$$, where
$$ W= \prod _{i=1}^n W_1\cdots \hat{W}_i \cdots W_n= (W_1\cdots W_n)^{n-1}.$$

Example 8.1.11

Let $$T\in K^{2,2,2}$$ be the rank 1 tensor $$(1,2)\otimes (3,4)\otimes (5,6)$$, i.e., $$T_{ijk}$$ is the product of the ith entry of (1, 2), the jth entry of (3, 4) and the kth entry of (5, 6). The marginalization of T is $$(u_1,u_2,u_3)=((77,154), (99,132),(105,126))$$. Clearly, $$(77,154)=77(1,2)$$, $$(99,132)=33(3,4)$$ and $$(105,126)=21(5,6)$$. Here $$W_1=3$$, $$W_2=7$$, $$W_3=11$$, so that
$$ u_1\otimes u_2\otimes u_3= (3\cdot 7)(3\cdot 11)(7\cdot 11)\, T = 53361\, T.$$
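The identity of Remark 8.1.10 can be verified numerically on the tensor of Example 8.1.11. The following NumPy sketch is added here for illustration; np.multiply.outer computes the tensor product of two arrays.

import numpy as np

w1, w2, w3 = np.array([1, 2]), np.array([3, 4]), np.array([5, 6])
T = np.multiply.outer(np.multiply.outer(w1, w2), w3)   # the rank 1 tensor w1 x w2 x w3

u1 = T.sum(axis=(1, 2))   # 77 * (1, 2)
u2 = T.sum(axis=(0, 2))   # 33 * (3, 4)
u3 = T.sum(axis=(0, 1))   # 21 * (5, 6)

U = np.multiply.outer(np.multiply.outer(u1, u2), u3)
W = (w1.sum() * w2.sum() * w3.sum()) ** 2              # (3 * 7 * 11)^2 = 53361
print(np.array_equal(U, W * T))                        # True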

When T has rank $$>1$$, then clearly it cannot coincide with a multiple of $$u_1\otimes \dots \otimes u_n$$. For a general T, however, the product of its contractions $$u_1\otimes \dots \otimes u_n$$ determines, after a suitable rescaling, a rank 1 approximation of T.


Example 8.1.12

Consider the tensor T which is an approximation of $$(1,2)\otimes (2,3)\otimes (3,4)$$: only one entry has been changed, namely $$T_{222}$$, lowered from 24 to 20. The marginalization of T is $$(u_1,u_2,u_3)=((35,66),(42,59),(45,56))$$, and the sum S of all the entries of T is 101. The product $$u_1\otimes u_2 \otimes u_3$$, divided by $$S^2= 10201$$, gives a rank 1 tensor not really far from T: for instance, its entry of position (1, 1, 1) is $$35\cdot 42\cdot 45/10201\approx 6.5$$, against $$T_{111}=6$$.
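A minimal sketch of this marginalization-based rank 1 approximation is added here for illustration; the helper rank_one_from_marginals is ours, and the normalization by $$S^{n-1}$$, with S the sum of all the entries, is the one adopted above and in Example 8.3.13.

import numpy as np

def rank_one_from_marginals(T):
    # Rank 1 approximation u_1 x ... x u_n / S^(n-1), with S the sum of all entries of T.
    n, S = T.ndim, T.sum()
    approx = np.ones(())   # start from a 0-dimensional array and build the outer product
    for i in range(n):
        u_i = T.sum(axis=tuple(k for k in range(n) if k != i))
        approx = np.multiply.outer(approx, u_i)
    return approx / S ** (n - 1)

# The tensor of Example 8.1.12: (1,2) x (2,3) x (3,4) with the entry T_222 lowered to 20.
T = np.multiply.outer(np.multiply.outer([1, 2], [2, 3]), [3, 4]).astype(float)
T[1, 1, 1] = 20.0
A = rank_one_from_marginals(T)
print(A[0, 0, 0], A[1, 1, 1])   # roughly 6.5 and 21.4, against the entries 6 and 20 of T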

Indeed, for some purposes, the (suitably rescaled) product of the contractions can be considered as a good rank 1 approximation of a given tensor. We warn the reader that, on the other hand, there are other methods for the rank 1 approximation of tensors which, in many cases, produce a result much closer to the best possible rank 1 approximation. See, for instance, Remark 8.3.12.

8.2 Contractions

The concept of i-contraction of a tensor can be extended to any subset of indices. In this case, we will talk about partial contraction . The definition is straightforward, though it can look odd at first sight, since it requires many levels of indices.

Definition 8.2.1

For a tensor T of type $$a_1\times \dots \times a_n$$ and for any subset J of the set $$[n]=\{1,\dots ,n\}$$, of cardinality $$n-q$$, define the J-contraction  $$T^J$$ as follows. Set $$\{1,\dots , n\}\setminus J=\{j_1,\dots ,j_q\}$$. For any choice of $$k_1,\dots , k_q$$ with $$1\le k_s\le a_{j_s}$$, put
$$ T^J_{k_1,\dots ,k_q} = \sum T_{i_1,\dots ,i_n},$$
where the sum ranges over all the entries $$T_{i_1,\dots ,i_n}$$ in which $$i_{j_s}=k_s$$ for $$s=1,\dots ,q$$.
An example is in order.

Example 8.2.2

Consider a $$2\times 2 \times 2$$ tensor T.
The contraction of T along $$J=\{2\}$$ means that we take the sum of the left face with the right face, so that, e.g., $$T^J_{11}=T_{111}+T_{121}$$, and so on. We get
$$T^J=\begin{pmatrix} 28 & 38 \\ 14 & 21\end{pmatrix}.$$
The contraction of T along $$J'=\{3\}$$ means that we take the sum of the front face with the rear face, so that, e.g., $$T^{J'}_{11}=T_{111}+T_{112}$$, and so on. We get
$$T^{J'}=\begin{pmatrix} 30 & 36 \\ 15 & 20\end{pmatrix}.$$
Instead, if we take the contraction along $$J''=\{2,3\}$$, we get a vector $$T^{J''}\in K^2$$, whose entries are the sums of the upper and the lower faces. Indeed $$T^{J''}_{1}=T_{111}+T_{112}+T_{121}+T_{122}$$, so that
$$ T^{J''} =(66, 35).$$
In this case, $$T^{J''}$$ is the first contraction of T.
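In coordinates, the J-contraction simply sums the tensor over the indices in J. The following NumPy sketch is added here for illustration; note that the axes are numbered starting from 0, so that the book's $$J=\{2\}$$ becomes the axis 1.

import numpy as np

def contraction(T, J):
    # J-contraction of T (Definition 8.2.1): sum out the axes listed in J.
    return T.sum(axis=tuple(J))

T = np.arange(8).reshape(2, 2, 2)   # an arbitrary 2x2x2 tensor
print(contraction(T, {1}))          # contraction along the second index (J = {2} in the text)
print(contraction(T, {2}))          # contraction along the third index
print(contraction(T, {1, 2}))       # the first contraction of T, a vector in K^2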

The last observation of the previous example generalizes.

Proposition 8.2.3

If $$J=\{1,\dots ,n\}\setminus \{i\}$$, then $$T^J$$ is the ith contraction of T.

Proof

Just compare the two definitions.

Remark 8.2.4

The contraction along $$J\subset \{1,\dots ,n\}$$ determines a linear map between spaces of tensors.

If $$J'\subset J\subset \{1,\dots ,n\}$$, then the contraction $$T^{J}$$ can be obtained as a further contraction of $$T^{J'}$$, along the indices in $$J\setminus J'$$.

The relation between the rank of a tensor and the ranks of its contractions is expressed as follows:

Proposition 8.2.5

If T has rank 1, then every contraction of T is either 0 or it has rank 1. As a consequence, for every $$J\subset \{1,\dots ,n\}$$,
$$ \mathrm {rank}(T^J) \le \mathrm {rank}T.$$

Proof

In view of Remark 8.2.4, it is enough to prove the first statement when J is a singleton. Assume, for simplicity, that $$J=\{n\}$$. Then by definition
$$ T^J_{i_1\dots i_{n-1}} = \sum _{j=1}^{a_n} T_{i_1\dots i_{n-1}j}.$$
If $$T=v_1\otimes \dots \otimes v_n$$, where $$v_i=(v_{i1},\dots ,v_{ia_i})$$, then $$T_{i_1\dots i_{n-1}j}= v_{1i_1}\cdots v_{n-1\ i_{n-1}}v_{nj}$$. It follows that, setting $$w=v_{n1}+\dots +v_{na_n}$$, then
$$ T^J_{i_1\dots i_{n-1}}=wv_{1i_1}\cdots v_{n-1\ i_{n-1}},$$
so that $$T^J=wv_1\otimes \dots \otimes v_{n-1}$$.

The second statement follows immediately from the first one and from the linearity of the contraction. Indeed, if $$T=T_1+\dots +T_r$$, where each $$T_i$$ has rank 1, then $$T^J=T_1^J+\dots + T^J_r$$.

The following proposition is indeed an extension of Proposition 8.1.9.

Proposition 8.2.6

Let $$T\in K^{a_1,\dots ,a_n}$$ be a tensor of rank 1. Let $$Q_1,\dots ,Q_m$$ be a partition of $$\{1,\dots ,n\}$$, with the property that every element of $$Q_i$$ is smaller than every element of $$Q_{i+1}$$, for all i. Define $$J_i=\{1,\dots ,n\} \setminus Q_i$$.

Then the tensor product $$T^{J_1}\otimes \dots \otimes T^{J_m}$$ is a scalar multiple of T.

Proof

The proof is straightforward when we take the (maximal) partition where $$m=n$$ and $$Q_i=\{i\}$$ for all i. Indeed in this case $$T^{J_i}$$ is the ith contraction of T, and we can apply Proposition 8.1.9.

For the general case, we can use Proposition 8.1.9 again and induction on n. Indeed, by assumption, if $$j_i=\min \{Q_i\}-1$$, then the kth contraction of $$T^{J_i}$$ is equal to the $$(j_i+k)$$th contraction of T. $$\square $$


Example 8.2.7

Consider the rank 1 tensor $$T=(1,4)\otimes (1,2)\otimes (1,2)\in K^{2,2,2}$$ and consider the partition $$Q_1=\{1,2\}$$, $$Q_2=\{3\}$$, so that $$J_1=\{3\}$$ and $$J_2=\{1,2\}$$. Then
$$T^{J_1}=\begin{pmatrix} 3 & 6\\ 12 & 24 \end{pmatrix}, $$
while $$T^{J_2}=(15,30)$$. Taking the tensor product of the two contractions we get $$T^{J_1}\otimes T^{J_2} =45\, T$$, where 45 is precisely the sum of all the entries of T.
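Proposition 8.2.6 is easy to test numerically: for a rank 1 tensor and a partition into two parts, the tensor product of the two partial contractions is the tensor itself multiplied by the sum of all its entries. The following sketch is added here for illustration, with an arbitrary rank 1 tensor.

import numpy as np

a, b, c = np.array([1, 4]), np.array([1, 2]), np.array([1, 2])
T = np.multiply.outer(np.multiply.outer(a, b), c)   # a rank 1 tensor in K^{2,2,2}

T_J1 = T.sum(axis=2)        # contraction along J_1 = {3}
T_J2 = T.sum(axis=(0, 1))   # contraction along J_2 = {1, 2}
P = np.multiply.outer(T_J1, T_J2)

print(np.array_equal(P, T.sum() * T))   # True: here the scalar is the total sum, 45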

8.3 Scan and Flattening

The following operations are natural for tensors and often allow a direct computation of the rank.

Definition 8.3.1

Let $$T\in K^{a_1,\dots ,a_n}$$ be a tensor. For any $$j=1,\dots ,n$$ we denote by $$Scan(T)_j$$  the set of $$a_j$$ tensors obtained by fixing the index j in all the possible ways.

Technically, $$Scan(T)_j$$ is the set of tensors $$T_1,\dots ,T_{a_j}$$ in $$K^{a_1,\dots , \hat{a}_j,\dots ,a_n}$$ such that
$$ (T_q)_{i_1 \dots i_{n-1}} = T_{i_1\dots q\dots i_{n-1}},$$
where the index q occurs in the jth place.

By applying the definition recursively, we obtain the definition of scan of a tensor T along a subset $$J\subset \{1,\dots , n\}$$ of two, three, etc., indices.


Example 8.3.2

Consider a $$3\times 2 \times 2$$ tensor T. The scan $$Scan(T)_1$$ is given by the three matrices
$$\begin{pmatrix} 2 & 3\\ 2 & 4 \end{pmatrix}\qquad \begin{pmatrix} 1 & 1\\ 4 & 8 \end{pmatrix}\qquad \begin{pmatrix} 3 & 4\\ 5 & 6 \end{pmatrix}, $$
while $$Scan(T)_2$$ is the pair of matrices
$$\begin{pmatrix} 2 & 2\\ 1 & 4\\ 3 & 5 \end{pmatrix}\qquad \begin{pmatrix} 3 & 4\\ 1 & 8\\ 4 & 6 \end{pmatrix}. $$
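In NumPy the scan is just the splitting of a tensor into its slices along one axis. A minimal sketch is added here for illustration; the helper scan is ours and uses 0-based axes.

import numpy as np

def scan(T, j):
    # Scan(T)_j : the a_j tensors obtained by fixing the jth index (0-based here).
    return [np.take(T, q, axis=j) for q in range(T.shape[j])]

T = np.arange(12).reshape(3, 2, 2)   # an arbitrary 3x2x2 tensor
for S in scan(T, 0):                 # three 2x2 matrices
    print(S)
for S in scan(T, 1):                 # two 3x2 matrices
    print(S)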

Remark 8.3.3

There is an obvious relation between the scan and the contraction of a tensor T. If $$J\subset \{1, \dots , n\}$$ is any subset of the set of indices, then the J-contraction of T equals the sum of the tensors of the scan of T along J (each of these tensors has its indices ranging in $$J'=\{1, \dots , n\}\setminus J$$).

We define the flattening of a tensor by taking the scan along one index, and arranging the resulting tensors in one big tensor.

Definition 8.3.4

Let $$T\in K^{a_1,\dots , a_n}$$ be a tensor. The flattening  of T along the last index is defined as follows. For any positive $$q\le a_{n-1}a_n$$, one finds uniquely defined integers $$\alpha ,\beta $$ such that $$q-1=(\beta -1) a_{n-1}+(\alpha -1)$$, with $$1\le \beta \le a_n$$, $$1\le \alpha \le a_{n-1}$$. Then the flattening of T along the last index is the tensor $$FT\in K^{a_1,\dots ,a_{n-2},a_{n-1}a_n}$$ with:
$$ FT_{i_1\dots i_{n-2}q}=T_{i_1\dots i_{n-2}\alpha \beta }.$$
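The index bookkeeping of Definition 8.3.4 translates directly into code. The following NumPy sketch is added here for illustration; it builds FT entry by entry, so that the correspondence $$q-1=(\beta -1) a_{n-1}+(\alpha -1)$$ is explicit.

import numpy as np

def flattening(T):
    # Flattening of T along the last index (Definition 8.3.4).
    *head, a_second_last, a_last = T.shape
    FT = np.zeros((*head, a_second_last * a_last), dtype=T.dtype)
    for beta in range(1, a_last + 1):
        for alpha in range(1, a_second_last + 1):
            q = (beta - 1) * a_second_last + (alpha - 1) + 1
            FT[..., q - 1] = T[..., alpha - 1, beta - 1]
    return FT

T = np.multiply.outer(np.multiply.outer([1, 2], [1, 2]), [1, 4])   # a rank 1 tensor
print(flattening(T))   # a 2x4 matrix of rank 1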

Example 8.3.5

Consider a $$2\times 2\times 2$$ tensor T whose flattening is the $$2\times 4$$ matrix:
$$\begin{pmatrix}1 & 3 & 3 & 2\\ 1 & 4 & 8 & 6 \end{pmatrix}.$$
The flattening of the rank 1 tensor $$(1,2)\otimes (1,2)\otimes (1,4)$$ is the $$2\times 4$$ matrix of rank 1:
$$\begin{pmatrix}1 & 2 & 4 & 8\\ 2 & 4 & 8 & 16 \end{pmatrix}.$$

Remark 8.3.6

One can apply the flattening procedure after a permutation of the indices. In this way, in fact, one can define the flattening along any of the indices. We leave the details to the reader.

Moreover, one can take an ordered series of indices and perform a sequence of flattening procedures, in order to reduce the dimension of the tensor.

The final target, which is usually the most useful one for applications, is the reduction of a tensor to a (usually rather huge) matrix, by performing $$n-2$$ flattenings of an n-dimensional tensor. If we do not use permutations, the final output is a matrix of size $$a_1\times (a_2\cdots a_n)$$.

The reason why flattenings are useful for the analysis of tensors is based on the following property.

Proposition 8.3.7

A tensor T has rank 1 if and only if all of its flattenings have rank 1.

Proof

Let $$T\in K^{a_1,\dots ,a_n}$$ be a tensor of rank 1, $$T=v_1\otimes \dots \otimes v_n$$ where $$v_i=(v_{i1},\dots , v_{ia_i})$$. Since $$T\ne 0$$, then also $$FT\ne 0$$. Then one computes directly that the flattening FT is equal to $$v_1\otimes \dots \otimes v_{n-2}\otimes w$$, where
$$ w= (v_{n-1\ 1}v_{n1},\dots , v_{n-1\ a_{n-1}}v_{n1},\ v_{n-1\ 1}v_{n2},\dots ,v_{n-1\ a_{n-1}}v_{na_n}).$$
Conversely, recall from Theorem 6.4.13 that a nonzero tensor T has rank 1 when, for all choices of distinct $$p,q=1,\dots , n$$ and numbers $$1\le \alpha ,\gamma \le a_p$$, $$1\le \beta ,\delta \le a_q$$, one has
$$\begin{aligned} T_{i_1\cdots i_{p-1}\alpha i_{p+1}\dots i_{q-1} \beta i_{q+1}\dots i_n}&\cdot T_{i_1\cdots i_{p-1}\gamma i_{p+1}\dots i_{q-1} \delta i_{q+1}\dots i_n}\\ -T_{i_1\cdots i_{p-1}\alpha i_{p+1}\dots i_{q-1} \delta i_{q+1}\dots i_n}&\cdot T_{i_1\cdots i_{p-1}\gamma i_{p+1}\dots i_{q-1} \beta i_{q+1}\dots i_n}=0 \end{aligned}$$
(8.1)
If we take the flattening over the two indices p and q, the left-hand side of the previous equation is a $$2\times 2$$ minor of that flattening, hence it vanishes when the flattening has rank 1. $$\square $$

The second tensor in Example 8.3.5 shows the flattening of a $$2\times 2\times 2$$ tensor of rank 1.

Notice that one implication of the Proposition indeed works for any rank. Namely, if T has rank r, then all of its flattenings have rank $$\le r$$.

On the other hand, the converse does not hold, at least when the rank is big. For instance, there are tensors of type $$2\times 2\times 2$$ and rank 3 (see Example 6.4.15) whose $$2\times 4$$ flattenings, of course, cannot have rank 3.

../images/485183_1_En_8_Chapter/485183_1_En_8_Figk_HTML.png

Example 8.3.8

Consider a $$2\times 2\times 2$$ tensor T which has rank $$>1$$, because some determinants of its faces are not 0, but whose flattening is the $$2\times 4$$ matrix
$$\begin{pmatrix}1 & 8 & 2 & 4\\ 2 & 16 & 4 & 8 \end{pmatrix}$$
which has rank 1.

We have the following straightforward result:

Proposition 8.3.9

Let FT have rank 1, $$FT=v_1\otimes \dots \otimes v_{n-2}\otimes w$$, with $$w\in K^{a_{n-1}a_n}$$. Assume that we can split w into $$a_{n-1}$$ pieces of length $$a_n$$ which are proportional, i.e., assume that there is $$v_n=(\alpha _1,\dots ,\alpha _{a_n})$$ with
$$w= (\beta _1\alpha _1,\dots ,\beta _1\alpha _{a_n},\beta _2\alpha _1,\dots ,\beta _{a_{n-1}}\alpha _{a_n}).$$
Then T has rank 1, and setting $$v_{n-1}=(\beta _1,\dots , \beta _{a_{n-1}})$$, then:
$$T= v_1\otimes \dots \otimes v_{n-2}\otimes v_{n-1}\otimes v_n.$$

As a corollary of Proposition 8.3.7, we get:

Proposition 8.3.10

If a tensor T has rank r, then its flattening has rank $$\le r$$.

Proof

The flattening is a linear operation in the space of tensors. Thus, if $$T=T_1+\dots + T_r$$ with each $$T_i$$ of rank 1, then also $$FT=FT_1+\cdots + FT_r$$, and each $$FT_i$$ has rank 1. The claim follows.

Of course, the rank of the flattening FT can be strictly smaller than the rank of T. For instance, we know from Example 6.4.15 that there are $$2\times 2\times 2$$ tensors T of rank 3. The flattening FT, which is a $$2\times 4$$ matrix, cannot have rank bigger than 2.

Remark 8.3.11

Here is one application of the flattening procedure to the computation of the rank.

Assume we are given a tensor $$T\in K^{a_1,\dots ,a_n}$$ and assume we would like to know if the rank of T is r. If $$r< a_1$$, then we can perform a series of flattenings along the last indices, obtaining a matrix $$F^*T$$ of size $$a_1\times (a_2\cdots a_n)$$. Then, we can compute the rank of the matrix (and we have plenty of fast procedures to do this). If $$F^*T$$ has rank $$>r$$, then there is no chance that the original tensor T has rank r. If $$F^*T$$ has rank r, then this can be taken as a cue toward the fact that $$\mathrm {rank}(T)=r$$.

Of course, a similar process is possible, by using permutations on the indices, when $$r \ge a_1$$ but $$r<a_i$$ for some i.

The flattening process is clearly invertible, so that one can reconstruct the original tensor T from the flattening FT, thus also from the matrix $$F^*T$$ resulting from a process of $$n-2$$ flattenings.

On the other hand, since a matrix of rank $$r>1$$ has infinitely many decompositions as a sum of r matrices of rank 1, by taking one decomposition of $$F^*T$$ as a sum of r matrices of rank 1 one cannot hope to automatically reconstruct from it a decomposition of T as a sum of r tensors of rank 1.

Indeed, the existence of a decomposition for T is subject to the existence of a decomposition for $$F^*T$$ in which every summand satisfies the condition of Proposition 8.3.9.

Remark 8.3.12

One can try to find an approximation of a given tensor T with a tensor of prescribed, small rank $$r<a_1$$ by taking the matrix $$F^*T$$, resulting from a process of $$n-2$$ flattenings, and considering the rank r approximation for $$F^*T$$ obtained by the standard SVD approximation process for matrices (see [1]).

For instance, one can find in this way a rank 1 approximation for a tensor, which in principle is not equal to the rank 1 approximation obtained by the marginalization (see Example 8.1.12).
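A sketch of this procedure is added here for illustration: flatten T to an $$a_1\times (a_2\cdots a_n)$$ matrix, truncate its SVD to rank r, and reshape back. The reshaping below uses NumPy's default ordering rather than the index convention of Definition 8.3.4; since the same identification is inverted at the end, the resulting approximation of T does not depend on this choice.

import numpy as np

def low_rank_approx_by_flattening(T, r):
    # Rank r approximation of T from a truncated SVD of its full flattening F*T.
    M = T.reshape(T.shape[0], -1)                  # F*T as an a_1 x (a_2 * ... * a_n) matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_r = (U[:, :r] * s[:r]) @ Vt[:r, :]           # best rank r approximation of the matrix
    return M_r.reshape(T.shape)

T = np.random.default_rng(0).random((2, 3, 4))     # an arbitrary 2x3x4 tensor
A = low_rank_approx_by_flattening(T, 1)
print(np.linalg.matrix_rank(A.reshape(2, -1)))     # 1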


Example 8.3.13

Consider the $$2\times 2\times 2$$ tensor T with $$T_{111}=T_{222}=1$$ and all the other entries equal to 0. Proceeding as in Example 8.1.12, the contractions of T are (1, 1), (1, 1), (1, 1), whose tensor product, divided by $$4=(1+1)\cdot (1+1)$$, determines a rank 1 approximation $$T_1$$, all of whose entries are equal to 1/4.
The flattening of T is the matrix:
$$FT=\begin{pmatrix}1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The rank 1 approximation of FT, by the SVD process, is:
$$\begin{pmatrix}1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
This matrix defines the tensor $$T_2$$, with $$(T_2)_{111}=1$$ and all the other entries equal to 0, which is another rank 1 approximation of T.

If we consider the tensors as vectors in $$K^8$$, then the natural distance $$d(T,T_1)$$ is $$\sqrt{3/2}$$, while the natural distance $$d(T,T_2)$$ is 1. So $$T_2$$ is “closer” to T than $$T_1$$. On the other hand, for some purposes, one could consider $$T_1$$ as a better rank 1 approximation of T than $$T_2$$ (e.g., it preserves marginalization).
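The two approximations of Example 8.3.13 can be compared directly; the following self-contained sketch is added here for illustration. Since the two singular values of FT are equal, the rank 1 SVD truncation is not unique, but its distance from T is the same for every choice.

import numpy as np

T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[1, 1, 1] = 1.0

# Rank 1 approximation from the contractions: (1,1) x (1,1) x (1,1) / 4.
T1 = np.full((2, 2, 2), 0.25)

# Rank 1 approximation from the SVD of the 2x4 flattening.
M = T.reshape(2, 4)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
T2 = (s[0] * np.outer(U[:, 0], Vt[0, :])).reshape(2, 2, 2)

print(np.linalg.norm(T - T1))   # sqrt(3/2), approximately 1.2247
print(np.linalg.norm(T - T2))   # 1.0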

8.4 Exercises

Exercise 15

Prove the assertion in Example 8.1.4: the matrix M defined there generates the Kernel of the marginalization.

Exercise 16

Find generators for the Kernel of the marginalization map of $$3\times 3$$ matrices.

Exercise 17

Find generators for the image of the marginalization map of tensors and prove Proposition 8.1.7.

Exercise 18

Prove the statements of Remark 8.2.4.