Complex Projective Algebraic Statistics

No, we are not exaggerating. We are instead simplifying.

In fact, many of the phenomena associated with the main statistic models are better understood if studied, at least in the first measure, from the projective point of view and on an algebraically closed numerical field.

The main link between Algebraic Statistics and Projective Algebraic Geometry is based on the constructions of this chapter.

4.1 Motivations

We have seen as many models of interest in the fields of statistics are defined, in the space of the distributions of a system S, by (polynomials) algebraic equations of degree greater than one. To understand such models, a mathematical approach is to initially study the subsets of a space defined by the vanishing of polynomials. This is equivalent to studying the theory of solutions of systems of polynomial equations of arbitrary degree, which goes under the name of Algebraic Geometry.

The methods of Algebraic Geometry are based on various theories: certainly, on Linear and Multi-linear Algebra, but also on rings theory (in particular, on the theory of rings of polynomials) and on Complex Analysis. We will repeat here only a part of the main results that can be applied to statistical problems. It must be kept in mind, however, that the theory in question is rather developed, and topics that will not be introduced here, they could be important in the future also from a statistical point of view.

The first step to take is to define the ambient space in which we move. Being to study solutions of nonlinear equations, from an algebraic point of view, it is natural to pass from the real field, fundamental for applications but without some algebraic elements, to the complex field which, being algebraically closed, allows a complete understanding of the solutions of polynomial equations.

We will then have to expand, from a theoretical point of view, the space of distributions, to admit points with complex coordinates. These points, corresponding to distributions with complex numbers, will allow us a more immediate characterization of the sets of solutions of algebraic systems. Naturally, when we have to reread the results in the common statistical theory, we will have to return to models exclusively defined on the real field, that intersects with the real space contained in every complex space. This final step, which in general poses technical problems that are absolutely nontrivial, can, however, be overlooked in a first reading, where the indications that we will get on the complexes will still help us to understand of real phenomena.

Once this first enlargement is accepted, to arrive at an understanding even more in-depth of algebraic phenomena, it is opportune to operate a second step, perhaps apparently even more challenging: the passage from affine spaces $\mathbb C^n$ to the associated projective spaces.

The reasons for this second enlargement are justified, from a geometric point of view, with the need to work in compact ambients. Compactness is indeed an essential property for our geometric understanding. In very descriptive terms, thanks to the introduction of points to infinity, we will avoid to lose the solutions when, passing to the limit, we should have phenomena of parallelism or however asymptotic phenomena. The possibility of following the reasonings passing to the limit is one of the winning cards that geometry in projective setting offers, compared to that in similar environments.

Of course, projective compactification, to make sense in statistical problems, it must be carried out properly, differentiating, for example, the passage to the limit of the various random variables.

For those who find too cumbersome the procedure to use homogeneous coordinates to describe distributions on random systems, perhaps it is worth remembering that a similar procedure, in statistics, is always present: the normalization. In practice, if we have a distribution D on a random variable x having states $s_1, \dots , s_n$ , then it is natural to replace D with the $\bar{D}$ distribution obtained dividing each $$D_x (s_i)$$ by the sampling $\sum D_x (s_j)$ (in the event that x is not neutral with respect to D). Note that, in doing so, in the space of distributions that concerns the variable x, we get to replace the point $(D_x (s_1), \dots , D_x (s_n))$ with the point $(\bar{D}_x (s_1), \dots , \bar{D}_x (s_n))$ . If, in the affine space, the point is changed, in the projective space, where the n -tuple represents homogeneous coordinates, passing to normalization, the point does not change! In effect, every point in the projective space can always be represented by homogeneous coordinates $(a_1, \dots , a_n)$ such that $\sum a_j = 1$ .

From another point of view, the classical statistical theory, in the space $\mathbb R^n$ of the distributions of a random variable x as above, tended to shrink to the hyperplane defined by the equation $\sum a_j = 1$ , as we have seen, for example, in the statement of Varcenko’s Theorem. Instead, projective statistical theory works on distributions up to scaling, then does not need this restriction, since from a projective point of view, normalization, like any other scaling, is irrelevant.

Therefore, it is quite simple to convince ourselves that, in the end, these are two equivalent approaches. The difficulty of passing from one to another basically resides in habit. The advantage of using the projective language consists in being able to access directly to the vast literature on Algebraic Geometry which, in many ways, mainly uses this terminology.

During this chapter, we will use many topics from the Part III. We strongly recommend to the reader which is not expert in Algebraic Geometry to study such chapters before to start the treatment of projective algebraic models.

4.2 Projective Algebraic Models

What is the link of all of this with Algebraic Statistics?

When we meet a distribution $D\in \mathcal {D}(S)$ , where S is a system of random variables, we are, in practice, collecting a set of observed data. To the aim of our interpretation of data, is usually irrelevant (within certain reasonable limits) the sampling of the variables.

If, for example, we are evaluating the efficiency of a medicine, to administer the medicine to 100 sick people and having 50 healed suggests that the medicine is efficient in about the 50% of cases. From this point of view, we obtain the same conclusion if we administer the medicine to 120 sick and record 60 healed.

So, in our setting, we will consider that a distribution gives the same information on the analyzed phenomenon than any of its scaling.

In Statistics, the problem is solved by choosing, among all possible scaling of a distribution, the associated probabilistic distribution, introduced in Definition 1.3.2. Such distribution is uniquely determined, given a distribution D, but it exists only when all variables have sampling different from 0 in D.

The associated probabilistic distribution lies in a linear subspace of $\mathcal {D}(S)=K^{a_1}\times \cdots \times K^{a_n}$ , defined by the vectors $v=((v_{11},\dots , v_{1a_1}),\dots ,(v_{n1},\dots , v_{na_n}))$ satisfying the equations

$v_{11}+\dots + v_{1a_1} = \dots = v_{n1}+\dots +v_{na_n} = 1.$

Instead, in our setting, we prefer to consider the multi-projective space of distributions.

Definition 4.2.1

Given a system S, with random variables $x_1,\dots , x_n$ , we call multi-projective space of distributions the multi-projective space

$\mathbb P(\mathcal {D}(S)) =\mathbb P^{a_1}\times \dots \times \mathbb P^{a_n}$

where

has

states. (note the increasing by 1!).

The elements of $\mathbb P(\mathcal {D}(S))$ are thus equivalence classes, each of them containing a distribution D and all its scaling. In such a way we can link more easily (Projective) Algebraic Geometry with the study of significative statistical models.

In this new view, only the statistical models which are independent of the scaling (multiconi) in the space of distributions, are significative.

Because the overwhelming majority of important models (if correctly interpreted) are independent of the scaling, so they are cones, the depth investigation of our theory will not be affected.

Definition 4.2.2

A K-projective model on S is a subset of $\mathbb P(\mathcal {D}_K(S))$ . A K-projective algebraic model on S is a model corresponding to a (multi)projective variety, i.e., defined by the vanishing of multi-homogeneous polynomials.

There is a natural surjection from $\mathcal {D}(S)\setminus \{0\}$ to $\mathbb P(\mathcal {D}(S))$ . The preimage of a projective model on S, in such projection, is a model M on S with the following property:

$\begin{aligned} \text {if } D\in M \text { and }D' \text { is a scaling of }D, \text { with }D'\ne 0, \text { then }D'\in M. \end{aligned}$

Example 4.2.3

The independence model can be thought as a projective algebraic model, because it is defined by many multi-homogeneous equations (see Theorem 6.4.13).

A linear model is a projective algebraic model when its defining linear polynomials do not have a constant term.

Example 4.2.4

The projective algebraic models on a system S with a single variable (as in the case of the total correlation) are strictly related with the cones of the vector space $\mathcal {D}(S)$ . Every cone defines a projective model on S. Vice versa, given a projective model on S, its preimage in the projection $\mathcal {D}(S)\rightarrow \mathbb P(S)$ is a cone.

Example 4.2.5

Consider the system S formed by two ordinary dice. The projective space of distributions is $\mathbb P^5\times \mathbb P^5$ . Instead, the projective space of distribution of $\Pi S$ is $\mathbb P^{35}$ , corresponding to the space of $6\times 6$ matrices.

If S is a system where the two variables represent a die and a coin, the projective space of distributions is $\mathbb P^5\times \mathbb P^1$ . In this case, it is easy to observe that the only variable in the total correlation $\Pi S$ has 12 states, hence the projective space of distributions of $\Pi S$ is $\mathbb P^{11}$ .

If a system S has n Boolean variables, then its projective space of distributions is a product of n copies of $\mathbb P^1$ , and the space of distributions of $\Pi S$ is $\mathbb P^{2^n-1}$ .

4.3 Parametric Models

We are now able to define projective parametric models on A. For this aim, we will use Definition 9.3.1 of projective map, in Sect. 9.3.

Definition 4.3.1

If S, T are random systems, we call projective connection any projective map $\Gamma : \mathbb P(\mathcal {D}(S))\rightarrow \mathbb P(\mathcal {D}(T))$ . Notice, in particular, that if $\Gamma$ is a projective connection, then the image of any scaling $$D'$$ of a distribution D is a scaling of $\Gamma (D)$ .

We say that a model M is projective parametric if it is the image $\mathbb P(\mathcal {D}(S))$ of a projective connection $\Gamma$ .

Many interesting parametric models have a counterpart projective parametric.

Example 4.3.2

The independence model is projective parametric. As a matter of fact let S be a system with random variables $x_1,\dots , x_n$ and let $$a_i+1$$ be the number of states of the variable $$x_i$$ . Hence, the total correlation $\Pi S$ has a unique variable, with $\Pi (a_i+1)$ states.

The model of independence of S corresponds to the map

$\mathbb P(\mathcal {D}(S))=\mathbb P^{a_1}\times \dots \times \mathbb P^{a_n}\rightarrow \mathbb P(\mathcal {D}(\Pi S))=\mathbb P^M$

( $M=-1+\Pi (a_i+1)$ ) defined by

${\left\{ \begin{array}{ll} \quad \vdots &{}=\qquad \qquad \vdots \\ t_{i_1,\dots , i_n} &{}= \quad s_{1i_1}s_{2i_2}\cdots s_{ni_n} \\ \quad \vdots &{}=\qquad \qquad \vdots \end{array}\right. }$

where we have numbered the coordinates of an element of $\mathbb P(\mathcal {D}(\Pi X))$ , as usual, identifying this element as a tensor.

It is evident from the same definition that the model of independence corresponds to a Segre variety (compare with Definition 10.5.9).

Notice that, in general, M is quite bigger than $$a_i$$ . For example, if $$n=3$$ and $$a_1=a_2=a_3=3$$ , then $$M=63$$ and the model corresponds to the Segre variety of $\mathbb P^3\times \mathbb P^3\times \mathbb P^3$ embedded in $\mathbb P^{63}$ .

We recall that the product $\mathbb P^1\times \mathbb P^1$ is not isomorphic to $\mathbb P^2$ . Through the Segre map, $\mathbb P^1\times \mathbb P^1$ corresponds to a surface in $\mathbb P^3$ , image of the map

$((x_1,x_2),(y_1,y_2))\mapsto (x_1y_1, x_1y_2,x_2y_1,x_2y_2)$

that is, in parametric form

${\left\{ \begin{array}{ll} a_{11} &{}=\quad x_1y_1 \\ a_{12} &{}=\quad x_1y_2 \\ a_{21} &{}=\quad x_2y_1 \\ a_{22} &{}=\quad x_2y_2 \end{array}\right. }$

This surface, which represents the (projective) model of independence of a Boolean system with two variables, is defined by a single equation (determinant of the corresponding $2\times 2$ matrix) $a_{11}a_{22}=a_{12}a_{21}$ .

Example 4.3.3

On a random system with three variables $$x_1, x_2, x_3$$ , the model without triple correlation of Example 3.2.5 it is not, strictly speaking, parametric projective.

In fact, taking up the terminology of the example, this model is defined by considering the model $$S'$$

given by the union of the total correlations of the three subsystems of S that are obtained by canceling one of the variables in turn. $$S'$$

also has three variables, corresponding to $$(x_1,x_2), (x_1,x_3)$$

, and

. The models without triple correlation are obtained from the connection from $$S'$$

to S, which sends every triplet of matrices $(A,B, C)\in \mathcal {D}(S')$ , with $A\in \mathbb C^{d_1,d_2}$ , $B\in \mathbb C^{d_1,d_3}$ , $C\in \mathbb C^{d_2,d_3}$ , to the tensor $D\in \mathcal {D}(\Pi S)$ defined by

It is clear that all components of this map are multi-homogeneous of the same degree, but, in general, they do not define a map

$\mathbb P^{d_1d_2-1}\times \mathbb P^{d_1d_3-1}\times \mathbb P^{d_2d_3-1}\rightarrow \mathbb P^{d_1d_2d_3-1}.$

because even if A, B, C are all different from zero, however it is not clear that their image is not zero.

In order to obtain our model as the image of a well-defined projective map, we must restrict to a subvariety of $\mathbb P^{d_1d_2-1}\times \mathbb P^{d_1d_3-1}\times \mathbb P^{d_2d_3-1}$ .

For instance, when the three variables are Boolean, we must restrict the model to a suitable model X of distributions on $$S'$$

, so that we get a well-defined map from a variety $X\subset \mathbb P^3\times \mathbb P^3\times \mathbb P^3$ to $\mathbb P^7$ . This map is obtained by composing the Segre map $\mathbb P^3\times \mathbb P^3\times \mathbb P^3\rightarrow \mathbb P^{63}$ with a suitable projection $\mathbb P^{63}\rightarrow \mathbb P^7$ . It is a matter of computations that the subvariety X should not intersect a configuration of products of linear spaces, containing, for instance,

$\begin{matrix} L_1\times \mathbb P^3\times L''_1 &{} \text{ where } &{} L_1=\{x_1=x_3=0\} &{} L''_1=\{z_3=z_4=0\} \\ L_2\times L'_2 \times \mathbb P^3 &{} \text{ where } &{} L_2=\{x_1=x_2=0\} &{} L''_1=\{y_3=y_4=0\} \\ \dots &{} &{} \end{matrix}$

The fact that the image of a Segre map can be interpreted as (projective) model of independence of an random system, by Theorem 6.4.13, guarantees us that the Segre varieties are all projective varieties.

Let us see how, in general, there are projective parametric models which are not algebraic models.

Example 4.3.4

Consider two systems S, $$S'$$ , both with a single random variable.

We identify both the projective spaces of distribution over $\mathbb R$ , $\mathbb P(\mathcal {D}(S))$ and $\mathbb P(\mathcal {D}(S'))$ , with $\mathbb P_\mathbb R^1$ . We can define a projective connection $\Gamma :\mathbb P(\mathcal {D}(S))\rightarrow \mathbb P(\mathcal {D}(S'))$ by posing $\Gamma (x_0,x_1)=(x_0^2,x_1^2)$ .

It is easy to check that the image W of $\Gamma$ contains infinitely many points of $\mathbb P^1_R$ . However, it does not contain all points: as a matter of fact, the point with homogenous coordinates $$(1,-1)$$ is not in the image.

On the other hand, each projective variety in $\mathbb P_\mathbb R^1$ , being defined by the vanishing of a homogeneous polynomial in two variables, or coincides with $\mathbb P_\mathbb R^1$ , or it can only contain a finite number of points.

So W can not be a projective variety.

Example 4.3.5

Let us go back to the situation represented in Example 3.3.4.

Recall that the initial situation corresponds to a Boolean system S with a variable (of states A, B) while the final situation corresponded to a system $$S'$$ with only one variable that could take the 3 values AA, AB, BB.

The connection $\Gamma$ , defined by $\Gamma (a, b) = (a^2,2ab, b^2)$ , is clearly a projective map between $\mathbb P(\mathcal {D}(S)) = \mathbb P_\mathbb R^1$ and $\mathbb P(\mathcal {D}(S ')) = \mathbb P_\mathbb R^2$ . The image corresponds to the subset $W \subset \mathbb P_\mathbb R^2$ defined by the points satisfying the equation $$y^2 = 4xz $$ .

It should be noted, however, that not all the homogeneous coordinates of these points can be obtained in the map. In fact the point P of coordinates (1, 2, 1) is in the image (we get it for $$(a, b) = (1,1)$$ , but no pair of $\mathbb R^2$ gives $$(-1, -2, -1)$$ , which are also coordinates of P.

From Chow’s Theorem (see Theorem 10.6.3), it immediately follows:

Theorem 4.3.6

Each projective parametric model is a projective algebraic model.

This theorem generalizes the situation already seen for the model of independence and explains how each projective parametric model can be defined by homogeneous polynomial equations.

The proof of Chow’s Theorem also explains theoretically how the homogeneous equations of a projective parametric model can be found.

In practice, as one can imagine, when the number of variables grows, it is not easy to follow step by step the directions, and find an effective set of equations, even with the help of computational tools. The use of Groebner bases, which we will introduce later, allows an optimization of the procedure (see Chap. 13).

The advantage of presenting a model with homogeneous equations (implicit equations), rather than through parametric equations, should instead be clear in the daily practice of Algebraic Statistics: to test whether a given phenomenon, i.e., a given distribution, falls within the model imagined by a theory (in more imaginative words: if an experiment confirms or less a theory), once the implicit equations are known, it is sufficient to check if the distribution satisfies them. A similar computation is elementary, for every single equation. In everyday practice, the complication derives only from the fact that normally each model is described by an astronomical number of equations, sometimes with approximate coefficients. However, these problems can be managed by methods of sample searching and error checking.

Instead, having to show that a given distribution belongs to a model of which only parametric equations are known, the problem moves to show the existence of parameters for which the parameterization returns the starting distribution. Such a problem of existence is extremely difficult to control, even in the presence of a few, precise equations. Imagine when the equations are thousands, with approximate coefficients!

4. Complex Projective Algebraic Statistics