The relationship and importance of the normal distribution to quantitative genetics is clear; however, the closeness of the relationship between observed progeny performances and theoretical distributions will be related to the model upon which the relationship is based. For example, we assumed that all uppercase letter alleles were of equal additive value, which of course may not be true. It is important that an appropriate model of inheritance is applied; otherwise other derived statistics (i.e. heritabilities, see later in Chapter 6), which are potentially of great value in plant breeding, will be biased and are likely to be highly misleading.
Let us examine the basic model applied to quantitative situations and see how the model can be tested for its appropriateness to the situation or inheritance of specific characters.
Consider again the cross between two canola cultivars described above. The yield of the higher-yielding parent ($\bar{P}_1$) is 620 kg/plot, the yield of the lower-yielding parent ($\bar{P}_2$) is 500 kg/plot, and the yield of the $F_1$ ($\bar{F}_1$) is 560 kg/plot. Assume a model of additive genetic effects, where we have:
$$\bar{P}_1 - m = m - \bar{P}_2 = [a]$$
where the difference between the performance of the parents is divided in half (i.e. $[a] = (\bar{P}_1 - \bar{P}_2)/2 = 60$ kg/plot), and indicates the additive effect. Note that [a] carries no sign.
$m$ is called the mid-parent value and is midway in value between the performance of $\bar{P}_1$ and $\bar{P}_2$. Therefore:
$$\bar{P}_1 = m + [a], \qquad \bar{P}_2 = m - [a]$$
The term [a] is used to indicate the summation of the additive effects over all loci involved, however many this may be. In the example shown we assumed a completely additive model of inheritance and the $F_1$ performance was indeed equal to $m$. Therefore in the absence of dominance, the mid-parent value will equal the performance of the $F_1$. Dominance will be detected in cases where the performance of the $F_1$ is not equal to $m$.
Let us return again to the canola cross, and continue to assume the relationship between uppercase alleles adding to a base yield and lowercase alleles adding nothing. Previously, we did not consider dominant alleles and their effect on the distribution.
Now let us assume a two-locus model of inheritance for yield, with two alleles per locus. Assume also that $A$ is dominant to $a$, but $B$ and $b$ are additive. Therefore $AA = Aa > aa$, so one (or two) $A$ alleles add 60 kg/plot to the base weight. Therefore $AA$ adds 60 kg/plot, $Aa$ also adds 60 kg/plot, and each $B$ allele adds 30 kg/plot. The $\bar{F}_1 = AaBb = 500 + 60 + 30 = 590$ kg/plot. Now $m = \tfrac{1}{2}(620 + 500) = 560$ kg/plot, so $\bar{F}_1 > m$; clearly the $F_1$ is not equal to $m$, and we have a case of dominance.
When the population is examined we see that the basic bell-shaped curve (below) has now been skewed to the right, as a greater frequency of progeny have higher yield due to the effect of the dominant $A$ allele. The average (mean, shown by the arrow) performance of the $F_2$ is now 575 kg/plot.
 |  |  | AAbB |  |
 |  |  | AABb |  |
 |  | AAbb | AaBb |  |
 |  | Aabb | aABb | aABB |
 | aabB | aAbb | AabB | AaBB |
aabb | aaBb | aaBB | aAbB | AABB |
500 | 530 | 560 | 590 | 620 |
We can now expand this idea to a three-loci, two-allele example as before, and we have 64 possible genotypes with 7 possible phenotypes. Assume that $A$ is dominant to $a$, but that $B$, $b$, $C$ and $c$ are all additive, and uppercase alleles add 20 kg/plot to the base yield of 500 kg/plot. We now have the $\bar{F}_1 = AaBbCc = 500 + 40 + 20 + 20 = 580$ kg/plot (as $Aa = AA = 40$ kg/plot). Again we see that the $F_1$ performance is higher than the mid-parent value ($m = 560$ kg/plot), reflecting the dominance that $A$ exhibits.
As with the two loci case, the distribution of phenotypes with three genes is similarly skewed to the right. In this instance, the average (mean, shown by the arrow) performance of the $F_2$ generation would be 570 kg/plot.
 |  |  |  | AABBcc |  |  |
 |  |  |  | AABbcC |  |  |
 |  |  |  | AABbCc |  |  |
 |  |  | aabBCC | AAbBCc |  |  |
 |  |  | aaBBcC | AAbBcC |  |  |
 |  |  | aaBBCc | AAbbCC |  |  |
 |  |  | aaBbCC | aAbBcC |  |  |
 |  |  | AAbbcC | aAbBCc | AABBcC |  |
 |  |  | AAbbCc | aAbbCC | AABBCc |  |
 |  |  | AAbBcc | AabbCC | AAbBCC |  |
 |  | aabbCC | AABbcc | AabBcC | AABbCC |  |
 |  | aabBcC | aAbbcC | AabBCc | AaBBCc |  |
 |  | aabBCc | aAbbCc | AaBbcC | AaBBcC |  |
 |  | aaBbcC | aAbBcc | AaBbCc | AaBbCC |  |
 |  | aaBbCc | aABbcc | AaBBcc | AabBCC |  |
 | aabbcC | aaBBcc | AabbcC | aABbcC | aABBCc |  |
 | aabbCc | AAbbcc | AabbCc | aABbCc | aABBcC | AABBCC |
 | aabBcc | aAbbcc | AabBcc | aABBcc | aABbCC | aABBCC |
aabbcc | aaBbcc | Aabbcc | AaBbcc | aaBBCC | aAbBCC | AaBBCC |
500 | 520 | 540 | 560 | 580 | 600 | 620 |
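The two histograms above can be checked by enumerating every $F_2$ genotype. The following is a minimal sketch, assuming independent (unlinked) loci and the allele effects stated in the text; the function names `f2_distribution` and `mean_of` are illustrative and not part of the original.

```python
from collections import Counter
from itertools import product


def f2_distribution(base, effects, dominant):
    """Return {yield: number of F2 genotypes} for independent two-allele loci.

    effects  -- kg/plot added per uppercase allele at each locus
    dominant -- True where one uppercase allele gives the full two-allele effect
    """
    # Each locus carries 0, 1 or 2 uppercase alleles; the heterozygote is
    # listed twice because it can arise in two ways (e.g. Aa and aA).
    per_locus = [(0, 1, 1, 2)] * len(effects)
    counts = Counter()
    for genotype in product(*per_locus):
        total = base
        for n_upper, effect, is_dominant in zip(genotype, effects, dominant):
            if is_dominant and n_upper > 0:
                n_upper = 2  # the heterozygote performs like the homozygote
            total += n_upper * effect
        counts[total] += 1
    return dict(sorted(counts.items()))


def mean_of(distribution):
    """Mean yield weighted by the number of genotypes in each class."""
    n = sum(distribution.values())
    return sum(value * count for value, count in distribution.items()) / n


# Two loci: A dominant (Aa = AA = 60 kg/plot), B additive (30 kg/plot per allele).
two_loci = f2_distribution(500, effects=[30, 30], dominant=[True, False])
# Three loci: A dominant, B and C additive, 20 kg/plot per uppercase allele.
three_loci = f2_distribution(500, effects=[20, 20, 20],
                             dominant=[True, False, False])

print(two_loci, mean_of(two_loci))
print(three_loci, mean_of(three_loci))
```

Under these assumptions the class counts sum to 16 and 64 genotypes respectively, and the means agree with the 575 and 570 kg/plot quoted in the text.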
The keen observer will have noted two points:
Consider again the inheritance model that we have for additive effects: $\bar{P}_1 = m + [a]$ and $\bar{P}_2 = m - [a]$.
Figure 5.3 The degree of skewness of a six-loci, two-allele system is shown for no dominance, one dominant locus, three dominant loci and five dominant loci. Note that increasing the number of dominant loci results in greater skewness.
As we have seen, the $F_1$ performance does not always coincide with the mid-parent value. In the two-loci case we had:
$$\bar{F}_1 = 590 \text{ kg/plot}, \qquad m = 560 \text{ kg/plot}$$
So we now need to add a second parameter to the model which represents the amount of dominance [d], and where:
$$[d] = \bar{F}_1 - m$$
and from this we have:
$$\bar{F}_1 = m + [d]$$
Unlike [a], which is always positive, [d] can be either positive or negative (i.e. the $F_1$ has a higher or lower value than $m$, respectively).
Using these three parameters we can now proceed to determine the expected performance of any generation. We have:
$$\bar{P}_1 = m + [a], \qquad \bar{P}_2 = m - [a], \qquad \bar{F}_1 = m + [d]$$
Assume that the $P_1$, $P_2$ and $F_1$ generations are grown so that $m$, [a] and [d] can be calculated from their means; then it can be seen why $m$, [a] and [d] are referred to as the components of the generation means.
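As a small illustration of how the three components follow from the parental and $F_1$ means, here is a sketch using the canola figures from the text (the function name `components_from_means` is hypothetical):

```python
def components_from_means(p1_mean, p2_mean, f1_mean):
    """Return (m, [a], [d]) from the means of P1, P2 and F1."""
    m = (p1_mean + p2_mean) / 2   # mid-parent value
    a = (p1_mean - p2_mean) / 2   # additive component, half the parental difference
    d = f1_mean - m               # dominance component, F1 deviation from mid-parent
    return m, a, d


# Purely additive canola example: the F1 equals the mid-parent value.
print(components_from_means(620, 500, 560))   # (560.0, 60.0, 0.0)
# Two-loci example with A dominant: the F1 exceeds the mid-parent value.
print(components_from_means(620, 500, 590))   # (560.0, 60.0, 30.0)
```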
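In addition to $P_1$, $P_2$ and $F_1$, three further generations are commonly considered: the $F_2$ (derived from the $F_1$), the first backcross to $P_1$ ($B_1 = F_1 \times P_1$) and the first backcross to $P_2$ ($B_2 = F_1 \times P_2$).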
These three generations are typified by genetic segregation. It is therefore necessary to derive the proportions of the different genotypes, and their relative contributions to the means, in the various generations.
The Aa gene segregates with gamete formation in the $F_1$, thus giving, in the $F_2$ generation, the genotypes AA, Aa and aa in the ratio 1 AA : 2 Aa : 1 aa or, as proportions, $\tfrac{1}{4}$, $\tfrac{1}{2}$ and $\tfrac{1}{4}$. The mean expression of the AA plants is $m + a$. But since only a quarter of the $F_2$ generation is of the AA genotype, it contributes $\tfrac{1}{4}(m + a)$ to the $F_2$ generation mean. Similarly, since the mean expression of the aa plants is $m - a$ and they make up a quarter of the $F_2$ generation, aa plants contribute $\tfrac{1}{4}(m - a)$ to the $F_2$ generation mean. Finally, since half the $F_2$ generation has the genotype Aa, which has a mean expression of $m + d$, the heterozygotes contribute $\tfrac{1}{2}(m + d)$ to the $F_2$ generation mean. The term [a] is a summation of additive effects over all loci, and [d] likewise of dominance effects; therefore, for simplicity, we assume (without proof):
$$\bar{F}_2 = m + \tfrac{1}{2}[d]$$
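For a single locus, the three genotype contributions described above can be summed directly. This is a worked sketch in which lowercase $a$ and $d$ denote the single-locus additive and dominance effects (a notational assumption made here):

$$\bar{F}_2 = \tfrac{1}{4}(m + a) + \tfrac{1}{2}(m + d) + \tfrac{1}{4}(m - a) = m + \tfrac{1}{2}d$$

The additive terms cancel because the two homozygotes are equally frequent, so only half of the dominance effect remains in the mean.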
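Considering a single gene model again, then the generation $B_1$ ($F_1 \times P_1$, i.e. Aa × AA) is composed of $\tfrac{1}{2}$ AA and $\tfrac{1}{2}$ Aa. Again assuming without proof that [a] is an accumulation of all additive effects and [d] an accumulation of all dominance effects, we have:
$$\bar{B}_1 = m + \tfrac{1}{2}[a] + \tfrac{1}{2}[d]$$
Similarly the generation $B_2$ ($F_1 \times P_2$, i.e. Aa × aa) would be:
$$\bar{B}_2 = m - \tfrac{1}{2}[a] + \tfrac{1}{2}[d]$$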
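The expected mean of each of the six generations can be sketched as a small lookup of coefficients of $m$, [a] and [d]. The values $m = 560$, $[a] = 60$ and $[d] = 30$ are taken from the two-loci canola example; the names `COEFFICIENTS` and `expected_means` are illustrative only.

```python
# Coefficients of (m, [a], [d]) in the expected mean of each generation
# under the additive-dominance model.
COEFFICIENTS = {
    "P1": (1, 1.0, 0.0),
    "P2": (1, -1.0, 0.0),
    "F1": (1, 0.0, 1.0),
    "F2": (1, 0.0, 0.5),
    "B1": (1, 0.5, 0.5),
    "B2": (1, -0.5, 0.5),
}


def expected_means(m, a, d):
    """Expected generation means for given m, [a] and [d]."""
    return {gen: cm * m + ca * a + cd * d
            for gen, (cm, ca, cd) in COEFFICIENTS.items()}


means = expected_means(m=560, a=60, d=30)
for generation, value in means.items():
    print(f"{generation}: {value:.1f} kg/plot")
```

With these inputs the expected $F_2$ mean is 575 kg/plot, in agreement with the mean of the skewed two-loci distribution shown earlier.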
Earlier, a model which we shall now call the additive–dominance model was put forward that purports to explain the inheritance of quantitative (continuously varying) characters exclusively in terms of the additive and dominance properties of the single gene difference that underlies it. It is now necessary to consider whether a model based on only additive and dominance genetic differences is adequate. From a plant breeding standpoint it is important to know whether the inheritance of a character under selection is controlled by an additive–dominance model because many assumptions, notably the response to selection, are based on heritability, and in estimating heritability it is usually assumed that this model is appropriate. Testing the additive–dominance model in quantitative genetics should be regarded as directly comparable to testing for the absence of linkage or epistasis with qualitative inheritance. However, with genes showing qualitative differences the most common testing method to compare frequencies of genotypes or phenotypes is a $\chi^2$ test. In quantitative genetics, genotypes (or more correctly the phenotypes) do not fall into distinct classes and hence frequency $\chi^2$ tests are not appropriate. So we need an equivalent but appropriate test.
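First, recall that the components $m$, [a] and [d] are derived from the generation means of the $P_1$, $P_2$ and $F_1$ generations:
$$\bar{P}_1 = m + [a], \qquad \bar{P}_2 = m - [a], \qquad \bar{F}_1 = m + [d]$$
From these, similar equations can be formulated for the generation means of the $F_2$, $B_1$ and $B_2$ generations:
$$\bar{F}_2 = m + \tfrac{1}{2}[d], \qquad \bar{B}_1 = m + \tfrac{1}{2}[a] + \tfrac{1}{2}[d], \qquad \bar{B}_2 = m - \tfrac{1}{2}[a] + \tfrac{1}{2}[d]$$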
These six equations lead to various predictive relationships between the means of different combinations of the generations. For example, if the additive–dominance model we have developed is correct, then it can be predicted that:
$$2\bar{B}_1 - \bar{P}_1 - \bar{F}_1 = 0$$
This relationship is easily seen by the substitution of the appropriate combinations of $m$, [a] and [d] for the three generation means:
$$2\left(m + \tfrac{1}{2}[a] + \tfrac{1}{2}[d]\right) - \left(m + [a]\right) - \left(m + [d]\right) = 0$$
It can be seen that all the terms on the left-hand side of the last equation cancel out. This test is called the A-scaling test.
Another relationship is:
$$2\bar{B}_2 - \bar{P}_2 - \bar{F}_1 = 0$$
and:
$$2\left(m - \tfrac{1}{2}[a] + \tfrac{1}{2}[d]\right) - \left(m - [a]\right) - \left(m + [d]\right) = 0$$
Once again, all the terms on the left-hand side cancel out. This test is called the B-scaling test.
A final relationship, known as the C-scaling test, is:
$$4\bar{F}_2 - 2\bar{F}_1 - \bar{P}_1 - \bar{P}_2 = 0$$
This last test can be shown to hold true in the same way as for the two tests above.
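As a worked check, and taking the C-scaling relationship in its usual form $4\bar{F}_2 - 2\bar{F}_1 - \bar{P}_1 - \bar{P}_2 = 0$, substituting the model expectations for the four generation means gives:

$$4\left(m + \tfrac{1}{2}[d]\right) - 2\left(m + [d]\right) - \left(m + [a]\right) - \left(m - [a]\right) = (4 - 2 - 1 - 1)\,m + (2 - 2)\,[d] + (1 - 1)\,[a] = 0$$

so the terms in $m$, [a] and [d] each cancel separately, just as in the A- and B-scaling relationships.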
These relationships are based on the model we have proposed for predicting the means of the various generations. Would you expect this relationship to hold if we substituted the means that we had actually measured? In other words would you expect the above equations to exactly equal zero? The answer is no, and in fact, it would be quite surprising if, for example, the sum of the means of the $P_1$ and $F_1$ generations exactly equalled twice the mean of the $B_1$ generation. While $2\bar{B}_1$ might be approximately equal to $\bar{P}_1 + \bar{F}_1$, error variation would give rise to random variation in all three means resulting in some overall discrepancy.
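Thus, from the above and thinking now in terms of measured generation means, we have the relationships:
$$A = 2\bar{B}_1 - \bar{P}_1 - \bar{F}_1, \qquad B = 2\bar{B}_2 - \bar{P}_2 - \bar{F}_1, \qquad C = 4\bar{F}_2 - 2\bar{F}_1 - \bar{P}_1 - \bar{P}_2$$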
where A, B and C are all expected effectively to equal zero. If they do equal zero, or at least are not too far from it, then there is no reason to suspect that the additive–dominant model is inadequate as an explanation of the inheritance of the continuously varying character in question. On the other hand, if they do deviate markedly from zero, then there is reason to doubt the adequacy of the model as an explanation of the inheritance of the character in question. This is a classic instance of the need for objective statistical tests to decide whether A, B or C differ significantly from zero, or whether any discrepancies observed could simply be fairly ascribed to chance. If the discrepancies could reasonably be attributed to chance, then the model can be provisionally accepted as an adequate description of reality. If this explanation is too unlikely, then the hypotheses (that A, B and C all equal zero within the bounds of sampling error) must be rejected and therefore also the additive–dominant model upon which they are based. The statistical tests are called the A-scaling test, B-scaling test and C-scaling test (and there are many others possible). However, the A-scaling test will be used here to represent the principles involved in all of them.
The basis of the A-scaling test is that the value of A is compared to the value predicted on the assumption that an additive–dominant model is adequate (i.e. $A = 0$). The question is then asked: “If the hypothesis is true (i.e. $A = 0$), what is the probability that any difference between observation (A) and prediction (zero) could be due to chance?”
Conventionally, if the probability that the difference was due to chance is less than 0.05 (i.e. 5%, or 1 in 20) then the null hypothesis (in this case, that A is equal to 0) is rejected and the alternative hypothesis (that $A \neq 0$) is accepted. In accepting the alternative hypothesis, it is also accepted that $2\bar{B}_1 - \bar{P}_1 - \bar{F}_1$ is not equal to zero, and that an additive–dominant model is inadequate in this particular instance.
In comparing the actual value of A with its predicted value (zero), what factors must be taken into account? Clearly the magnitude of the discrepancy (i.e. the actual value of A itself) must be considered. Another important factor is the variability in A from one experiment to another. If A varied enormously from one experiment to the next, then the mean value of A would have to be relatively large for it to be significantly different from zero. On the other hand, if the value of A were relatively constant from experiment to experiment, then even quite a small value of A could be accepted as significantly different from zero.
Finally, values based on relatively few plants are likely to be less convincing than values based on the measurement of many plants. Thus sample size is also highly relevant.
All of this perhaps sounds like a pretty tall order. In fact, biometricians have provided us with a method of relating the difference between the actual and predicted values of A to the variability in A from experiment to experiment, and also a statistical table in which the probability of obtaining such a difference by chance, given the number of plants measured, can be looked up. The equation is:
$$t = \frac{A - 0}{\mathrm{se}(A)} = \frac{A}{\mathrm{se}(A)}$$
In fact, as you might have already noticed, the A-scaling test is just a particular application of Student's $t$ test, which you may have come across elsewhere.
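In order to calculate $t$, A has to be divided by its standard error (se). We have noted that $A = 2\bar{B}_1 - \bar{P}_1 - \bar{F}_1$, where $\bar{B}_1$, $\bar{P}_1$ and $\bar{F}_1$ are the measured (not the predicted) means of the $B_1$, $P_1$ and $F_1$ generations. But what is the standard error of A? A standard error is, like a standard deviation, the square root of a variance, as shown earlier. In fact, the standard error of A is the square root of the variance of A ($V_A$), and is represented by $\sqrt{V_A}$. Therefore:
$$V_A = 4V_{\bar{B}_1} + V_{\bar{P}_1} + V_{\bar{F}_1}$$
where $V_{\bar{B}_1}$ is the variance of the mean of the $B_1$ generation, $V_{\bar{P}_1}$ is the variance of the mean of the $P_1$ generation and $V_{\bar{F}_1}$ is the variance of the mean of the $F_1$ generation. The factor of 4 arises because $\bar{B}_1$ is multiplied by 2 in A, and the variance of twice a quantity is four times the variance of that quantity.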
It is essential to realize that the variance of the mean of a generation (i.e. $V_{\bar{x}}$) is not the same as the variance between plants in that generation. In principle, the variance of the mean is calculated by growing adequate numbers of plants representing the generation in several different plots or experiments, calculating a generation mean for each experiment, and then calculating the variance of these different means (effectively treating them as the raw data). A variance of the mean so determined is less than the variance between all the individual plants grown in all the experiments. Fortunately, it is not necessary to perform several different experiments as described. Biometricians have demonstrated that a satisfactory estimate of the variance of the mean is obtained by dividing the variance derived from a single sample of plants by the number of plants measured that contribute to the estimate of the mean. That is:
$$V_{\bar{x}} = \frac{V_x}{n}$$
As an example of this, consider that the height of 50 individual plants (i.e. $n = 50$) of a pure-line barley cultivar was recorded and that the average plant height of all plants measured was calculated to be 100 cm, with a variance of
. The variance and the standard error of the mean of this sample of plant heights would be:
So, given a set of individual measurements, you should be able to calculate the mean, the variance and the standard deviation of the population, of which the data you are given can be assumed to be an unbiased sample. Furthermore, you could calculate both the variance of the mean and therefore its standard error.
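The calculation described above can be sketched with the Python standard library. The five plant heights below are hypothetical values invented for illustration; they are not data from the text.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, variance, variance of the mean, standard error of the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)   # sample variance, divisor n - 1
    variance_of_mean = variance / n          # variance between plants divided by n
    return mean, variance, variance_of_mean, math.sqrt(variance_of_mean)


# Hypothetical plant heights in cm (illustrative only).
heights = [98.0, 102.0, 100.0, 97.0, 103.0]
mean, variance, variance_of_mean, standard_error = mean_and_standard_error(heights)
print(mean, variance, variance_of_mean, round(standard_error, 3))
```

Note that the variance of the mean is always smaller than the variance between plants, and shrinks as more plants are measured.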
Let us consider now a simple example where two homozygous barley cultivars ($P_1$ and $P_2$) are cross-pollinated and a sample of $F_1$ seed is backcrossed to the higher yielding parent ($P_1$) to produce $B_1$ seed. Now if $P_1$, $F_1$ and $B_1$ seeds were planted in a properly randomized experiment and the height of each plant recorded, then the following means and standard errors might have resulted:
Now the variance of each family would be:
Therefore it follows that the variance of A would be:
and the standard error of A is given by:
Now to consider the mean value of A, this is given by:
Finally:
So, a value of $t$ has been calculated for these data as 0.085. This is based upon both the deviation of A from its expected value of zero (i.e. 2.0) and the variability found in the $P_1$, $F_1$ and $B_1$ plants measured, all the variability being summarized in the standard error of A (i.e. 23.6). The following question now arises: Is the deviation we have observed statistically significant?
In order to decide this, it is necessary to account for the number of plants measured in each generation upon which the values of A and $\mathrm{se}(A)$ are based. In fact, it is not the number of plants as such that is used, but the relevant numbers of degrees of freedom, where the degrees of freedom of A = the degrees of freedom of $B_1$ + the degrees of freedom of $P_1$ + the degrees of freedom of $F_1$.
Degrees of freedom have previously been mentioned in connection with the $\chi^2$ test. Generally, the number of degrees of freedom associated with a generation is one fewer than the number of plants representing that generation. Thus, if 11 plants of each of generations $P_1$, $F_1$ and $B_1$ were measured, then the degrees of freedom of $A = 10 + 10 + 10 = 30$. It is necessary to look up the value of $t$ (i.e. 0.085) for 30 degrees of freedom in a table of probabilities for $t$. As the $t$ value we obtained is smaller in magnitude than the tabulated value with 30 degrees of freedom (2.042 at the 5% level), there is no reason to reject the additive–dominance model in this instance, and so it is provisionally accepted as an adequate explanation of the inheritance of the character in question.
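The steps of the A-scaling test can be sketched as a single function. The generation means and variances of the means used below are hypothetical, chosen only so that they reproduce the values of A (2.0) and its standard error (about 23.6) quoted in the text; the function name and argument layout are likewise assumptions.

```python
import math

# Tabulated two-tailed 5% value of Student's t for 30 degrees of freedom.
T_CRITICAL_30_DF = 2.042


def a_scaling_test(b1, p1, f1):
    """Each argument is (mean, variance of the mean, number of plants).

    Returns (A, se(A), t, degrees of freedom).
    """
    a_value = 2 * b1[0] - p1[0] - f1[0]
    variance_a = 4 * b1[1] + p1[1] + f1[1]   # 2 * B1 mean contributes 4 * variance
    se_a = math.sqrt(variance_a)
    t = a_value / se_a
    df = (b1[2] - 1) + (p1[2] - 1) + (f1[2] - 1)
    return a_value, se_a, t, df


# Hypothetical inputs (illustrative only), 11 plants per generation.
a_value, se_a, t, df = a_scaling_test(
    b1=(96.0, 100.0, 11),
    p1=(100.0, 90.0, 11),
    f1=(90.0, 67.0, 11),
)
print(a_value, round(se_a, 1), round(t, 3), df)
print("reject model" if abs(t) > T_CRITICAL_30_DF else "no reason to reject model")
```

The same function can be adapted to the B- and C-scaling tests by changing the generations and the coefficients applied to their means and variances.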
The procedure described above can be repeated in a similar way for the B- and the C-scaling tests. Indeed, sets of such scaling tests can be devised to cover any combination of types of family that may be available.
As an alternative, however, to testing the various expected relationships one at a time, a procedure was proposed by a researcher called L.L. Cavalli in 1952, which is known as the joint scaling test. This test effectively combines the whole set of scaling tests into one and thus offers a more general, more convenient, more adaptable and more informative approach.
The joint scaling test consists of estimating the model's parameters, $m$, [a] and [d], from the means of all the families available, followed by a comparison of these observed means with their expected values derived from the estimates of the three parameters. This makes it clear at once that at least three types of family are necessary if the parameters of the model are to be estimated. However, with only three types of family available, no test can be made of the goodness of fit of the model since in such a case a perfect fit must be obtained between the observed means and their expectations derived from the estimates of the three parameters. So to provide such a test, at least four types of family must be raised.
The procedure for the joint scaling test is illustrated by considering the example given in Mather and Jinks's seminal textbook Introduction to Biometrical Genetics. The data they presented have been truncated for simplicity, and so differences due to rounding errors may occur. Their example consists of a cross between two pure-breeding varieties of rough tobacco (Nicotiana rustica). The means and variances of the means for plant height of the parental, $F_1$, $F_2$ and first back-cross families ($B_1$ and $B_2$) derived from this cross are shown in Table 5.7.
Table 5.7 Means and variances of the means for plant height of two parental lines ($P_1$ and $P_2$), the $F_1$ and $F_2$ progeny, and the first backcross families ($B_1$ and $B_2$) derived from crossing $P_1$ to $P_2$.
Generation | Number of plants | Variance of the mean ($V_{\bar{x}}$) | Weight ($1/V_{\bar{x}}$) | Model: $m$ | Model: [a] | Model: [d] | Observed |
$P_1$ | 20 | 1.033 | 0.968 | 1 | 1 | 0 | 116.30 |
$P_2$ | 20 | 1.452 | 0.669 | 1 | −1 | 0 | 98.45 |
$F_1$ | 60 | 0.970 | 1.031 | 1 | 0 | 1 | 117.67 |
$F_2$ | 160 | 0.492 | 2.034 | 1 | 0 | 1/2 | 111.78 |
$B_1$ | 120 | 0.489 | 2.046 | 1 | 1/2 | 1/2 | 116.00 |
$B_2$ | 120 | 0.613 | 1.630 | 1 | −1/2 | 1/2 | 109.16 |
Also shown in this table is the number of plants that were evaluated from each generation. Family size was deliberately varied with the kind of family. It was set as low as 20 for the genetically uniform parents and in excess of 100 for the $F_2$ and backcrosses, to compensate for the greater variation expected in these segregating families. All plants were individually randomized at the time of sowing so that the variation within families reflects all the non-heritable sources of variation to which the experiment is exposed. With this design the estimate of the variance of a family mean ($V_{\bar{x}}$), valid for use in the joint scaling test, is obtained in the usual way, by dividing the variance within the family by the number of individuals in that family. Reference to these variances shows that the greater family size of the segregating generations has more than compensated for their greater expected variability, in that the variances of their family means are smaller than those of the non-segregating families.
Six equations are available for estimating $m$, [a] and [d], and these are obtained by equating the observed family means to their expectations as given above. The coefficients of $m$, [a] and [d] in the six equations are listed with the collected data in Table 5.8.
Table 5.8 Coefficients of $m$, [a] and [d] in the parents ($P_1$ and $P_2$), the $F_1$ and $F_2$ generations and both back-cross generations ($B_1$ and $B_2$) and the observed plant height of each family.
Generation | Model: $m$ | Model: [a] | Model: [d] | Observed |
$P_1$ | 1 | 1 | 0 | 116.30 |
$P_2$ | 1 | −1 | 0 | 98.45 |
$F_1$ | 1 | 0 | 1 | 117.67 |
$F_2$ | 1 | 0 | 1/2 | 111.78 |
$B_1$ | 1 | 1/2 | 1/2 | 116.00 |
$B_2$ | 1 | −1/2 | 1/2 | 109.16 |
There are three more equations than there are parameters to be estimated ($m$, [a] and [d]), therefore a least-squares technique can be used. The six generation means to which we are fitting the $m$, [a] and [d] model are not known with equal precision; for example, the variance of the mean ($V_{\bar{x}}$) of $P_2$ is almost three times that of the $B_1$. The best estimates will be obtained, therefore, if generation means are weighted in relation to the accuracy of the estimates. The appropriate weights in this instance are the reciprocals of the variances of the means. For the first entry in the data (Table 5.7), $P_1$, the weight is given by $1/V_{\bar{x}} = 1/1.033 = 0.968$ and so on for the other families.
The six equations and their weights may be combined to give three equations whose solution will lead to weighted least-squares estimates of $m$, [a] and [d] as follows. In order to obtain the first of these three equations each of the six equations is multiplied through by the coefficient of $m$ that it contains, and by its weight, and the six are then summed. When we weight each line of the array by the coefficient of $m$ (which is always equal to 1) we have:
Generation | $m$ | [a] | [d] | Weight × observed |
$P_1$ | 0.968 | 0.968 | 0 | 112.541 |
$P_2$ | 0.669 | −0.669 | 0 | 65.848 |
$F_1$ | 1.031 | 0 | 1.031 | 121.327 |
$F_2$ | 2.034 | 0 | 1.017 | 227.376 |
$B_1$ | 2.046 | 1.023 | 1.023 | 237.316 |
$B_2$ | 1.630 | −0.815 | 0.815 | 177.931 |
Sum | 8.3775 | 0.5067 | 3.8860 | 942.340 |
The second and third equations are found in the same way using the coefficient of [a] for the second and of [d] for the third, along with the weights as multipliers.
To illustrate, the next line is found in the same way by multiplying each of the lines by the coefficients of [a] (i.e. 1, −1, 0, 0, 1/2, −1/2), and then summing columns thus:
Generation | $m$ | [a] | [d] | Weight × observed |
$P_1$ | 0.968 | 0.968 | 0 | 112.541 |
$P_2$ | −0.669 | 0.669 | 0 | −65.848 |
$F_1$ | 0 | 0 | 0 | 0 |
$F_2$ | 0 | 0 | 0 | 0 |
$B_1$ | 1.023 | 0.5115 | 0.5115 | 118.658 |
$B_2$ | −0.815 | 0.4075 | −0.4075 | −88.966 |
Sum | 0.5067 | 2.556 | 0.104 | 76.385 |
Finally, the third line is obtained by multiplying through by the coefficients of [d] (i.e. 0, 0, 1, 1/2, 1/2, 1/2), and then summing the columns thus:
Generation | $m$ | [a] | [d] | Weight × observed |
$P_1$ | 0 | 0 | 0 | 0 |
$P_2$ | 0 | 0 | 0 | 0 |
$F_1$ | 1.031 | 0 | 1.031 | 121.327 |
$F_2$ | 1.017 | 0 | 0.5085 | 113.688 |
$B_1$ | 1.023 | 0.5115 | 0.5115 | 118.658 |
$B_2$ | 0.815 | −0.4075 | 0.4075 | 88.966 |
Sum | 3.8860 | 0.104 | 2.4585 | 442.639 |
We then have three simultaneous equations, known as normal equations, which may be solved in a variety of ways to yield estimates of $m$, [a] and [d]. A general approach to the solution is by way of matrix inversion. The three equations are rewritten in the form:
$$J M = S$$
where J is known as the information matrix, M is the estimate of the parameters and S is the matrix of the scores.
The solution then takes the general form $M = J^{-1} S$, where $J^{-1}$ is the inverse of the information matrix and is itself a variance–covariance matrix.
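The construction and solution of the normal equations can be sketched in plain Python. The weights, coefficients and observed means are those listed in Table 5.7; the Gauss-Jordan inversion routine is simply one convenient choice made here, not necessarily the procedure used by the original authors, and because the tabulated data are truncated the results agree with the quoted estimates only to about two decimal places.

```python
import math

# weight, coefficients of (m, [a], [d]), observed mean (from Table 5.7)
FAMILIES = {
    "P1": (0.968, (1.0, 1.0, 0.0), 116.30),
    "P2": (0.669, (1.0, -1.0, 0.0), 98.45),
    "F1": (1.031, (1.0, 0.0, 1.0), 117.67),
    "F2": (2.034, (1.0, 0.0, 0.5), 111.78),
    "B1": (2.046, (1.0, 0.5, 0.5), 116.00),
    "B2": (1.630, (1.0, -0.5, 0.5), 109.16),
}


def normal_equations(families):
    """Build the information matrix J and the score vector S."""
    J = [[0.0] * 3 for _ in range(3)]
    S = [0.0] * 3
    for weight, coeffs, observed in families.values():
        for i in range(3):
            S[i] += weight * coeffs[i] * observed
            for j in range(3):
                J[i][j] += weight * coeffs[i] * coeffs[j]
    return J, S


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        divisor = aug[col][col]
        aug[col] = [x / divisor for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


J, S = normal_equations(FAMILIES)
J_inv = invert(J)
# M = J^-1 S gives the weighted least-squares estimates of m, [a] and [d].
estimates = [sum(J_inv[i][j] * S[j] for j in range(3)) for i in range(3)]
# Diagonal of J^-1 holds the variances of the estimates.
std_errors = [math.sqrt(J_inv[i][i]) for i in range(3)]

for name, est, se in zip(("m", "[a]", "[d]"), estimates, std_errors):
    print(f"{name} = {est:.3f} (se {se:.3f})")
```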
The inversion may be achieved by any one of a number of standard procedures; for our example, inversion leads to the following solution:
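The estimate of $m$ is then:
$$m = 107.322$$
The standard error (se) of $m$ is the square root of the corresponding diagonal element of $J^{-1}$.
In a similar way:
$$[a] = 8.1997, \qquad [d] = 10.0587$$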
All are highly significantly different from zero when looked up in a table of normal deviates.
The adequacy of the additive–dominance model may now be tested by predicting the six family means using these estimates of $m$, [a] and [d].
For example:
$$\bar{B}_1 = m + \tfrac{1}{2}[a] + \tfrac{1}{2}[d]$$
On the basis of this model and using the estimates obtained, it has the expected value:
$$107.322 + \tfrac{1}{2}(8.1997) + \tfrac{1}{2}(10.0587) = 116.451$$
This expectation and those for the other five families are listed in Table 5.9.
Table 5.9 Observed plant heights from both parents ($P_1$ and $P_2$), the $F_1$ and $F_2$, and both backcross ($B_1$ and $B_2$) generations along with the expected plant height using the joint scaling test parameters, and the difference between the observed and expected plant height.
Family | Observed | Expected | Obs–Exp |
$P_1$ | 116.300 | 115.522 | 0.778 |
$P_2$ | 98.450 | 99.122 | −0.672 |
$F_1$ | 117.675 | 117.381 | 0.294 |
$F_2$ | 111.778 | 112.351 | −0.573 |
$B_1$ | 116.000 | 116.451 | −0.451 |
$B_2$ | 109.161 | 108.252 | 0.909 |
The agreement with the observed values appears to be very close, and in no case is the deviation more than 0.83% of the observed value. The goodness of fit of this model can be tested statistically by a $\chi^2$ test. Since the data comprise six observed means, and three parameters have been estimated (i.e. $m$, [a] and [d]), then the $\chi^2$ value has $6 - 3 = 3$ degrees of freedom.
The contribution made to the $\chi^2$ by $P_1$, for example, is the squared difference between Observed and Expected divided by the variance (or in our case we can multiply by one over the variance, that is, the weight). So, for example, $(116.300 - 115.522)^2 \times 0.968 = 0.586$. Summing the six such contributions, one from each of the six types of family, gives a $\chi^2$ of 3.411 for 3 degrees of freedom, which has a probability of between 0.40 and 0.30. The model must therefore be regarded as adequate (i.e. there is no evidence of anything beyond additive and dominance effects).
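The goodness-of-fit calculation can be sketched as follows, using the observed means from Table 5.9, the weights from Table 5.7 and the parameter estimates quoted in the text. The variable names are illustrative, and the 5% critical value for 3 degrees of freedom (7.815) is the standard tabulated figure.

```python
# Weighted chi-square for the fit of the additive-dominance model.
WEIGHTS = {"P1": 0.968, "P2": 0.669, "F1": 1.031,
           "F2": 2.034, "B1": 2.046, "B2": 1.630}
OBSERVED = {"P1": 116.300, "P2": 98.450, "F1": 117.675,
            "F2": 111.778, "B1": 116.000, "B2": 109.161}
# Coefficients of ([a], [d]); the coefficient of m is always 1.
COEFFS = {"P1": (1.0, 0.0), "P2": (-1.0, 0.0), "F1": (0.0, 1.0),
          "F2": (0.0, 0.5), "B1": (0.5, 0.5), "B2": (-0.5, 0.5)}

m, a, d = 107.322, 8.1997, 10.0587   # joint scaling test estimates from the text
CHI2_CRITICAL_3_DF = 7.815           # tabulated 5% point for 3 degrees of freedom

expected = {gen: m + ca * a + cd * d for gen, (ca, cd) in COEFFS.items()}
contributions = {gen: WEIGHTS[gen] * (OBSERVED[gen] - expected[gen]) ** 2
                 for gen in OBSERVED}
chi2 = sum(contributions.values())
df = len(OBSERVED) - 3               # six means minus three estimated parameters

for gen in OBSERVED:
    print(f"{gen}: expected {expected[gen]:.3f}, contribution {contributions[gen]:.3f}")
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
print("model adequate" if chi2 < CHI2_CRITICAL_3_DF else "model inadequate")
```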
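The individual scaling tests, A, B and C, referred to earlier can, of course, also be used to test the model. Thus with the present data:
$$A = 2\bar{B}_1 - \bar{P}_1 - \bar{F}_1 = 2(116.000) - 116.300 - 117.675 = -1.975$$
$$V_A = 4V_{\bar{B}_1} + V_{\bar{P}_1} + V_{\bar{F}_1} = 4(0.489) + 1.033 + 0.970 = 3.959$$
leading to $\mathrm{se}(A) = \sqrt{3.959} = 1.990$.
Thus $t = -1.975/1.990 = -0.99$, which, when compared in Student's $t$ test statistical tables, does not differ significantly from the expected value of 0.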
The joint scaling test leads to the same conclusion as the A-, B- or C-scaling tests. However, the joint scaling test does more than test the adequacy of the additive–dominance model. It also provides the ‘best’ estimates of all the parameters required (and their standard errors) to account for differences among family means when the model is adequate. If you try to estimate $m$, [a] and [d] with the procedure shown earlier you will find values of $m = 107.375$, $[a] = 8.925$ and $[d] = 10.300$. These estimates do not differ markedly from those estimated (107.322, 8.1997, 10.0587, respectively) from the joint scaling test. However, the difference may in some cases be of importance, and an additional factor of relevance is that the joint scaling test can also be readily extended to more complex situations.
To conclude, in this example the best estimates show that the additive and dominance components are of the same order of magnitude, and since [d] is significantly positive, on average alleles that increase final height must be dominant more often than alleles that decrease it.
In some instances the results from generation testing will lead to the conclusion that an additive–dominance model of inheritance does not adequately account for the data. There are many possible explanations for this, and here only three, in order of increasing genetic complexity, will be mentioned briefly.
In the presence of BB, the difference between the AA and aa genotypes is units. However, in the presence of bb, the difference between AA and aa is
units. Of course, another way of looking at the matter might be to say that the difference between BB and bb is
units in the presence of AA, but
units in the presence of aa. Either way, it can be seen that there is interaction between the alleles at different loci and that an additive–dominant model of inheritance cannot adequately account for the situation. In fact, it is possible to add epistasis to our model. This is usually done by adding symbols: aa, for interaction between loci that are homozygous, ad for those between loci where one is heterozygous and one homozygous and dd for loci that are heterozygous.
In general it is actually quite straightforward to take into account other genetic phenomena by inclusion of appropriate parameters in the basic additive–dominant model of inheritance, and thus increasingly account for more complex genetic inheritance.
Although you should be aware of the existence of these complications, they will not be explored in any further detail in this book. Moreover, it is often found that, for most characters of interest to plant breeders, the additive–dominant model is adequate – if it fails we are then aware that the situation is more complex and act accordingly. Also, since what is of primary practical interest is the ratio of the additive genetic variance in a generation to the variance attributable to all causes (environmental, additive, dominant and all other genetic phenomena), it is often unnecessary to itemize them individually.
The concept of linkage between different loci located on the same chromosome was introduced in the qualitative genetics section. Quantitatively inherited characters are controlled by alleles at multiple loci. Yield, for example, is a highly complex character which is related to a multitude of other characters, like seedling germination and emergence, flowering times, partition, photosynthesis efficiency, nitrogen uptake efficiency, and so on, plus a susceptibility or resistance to a wide range of stresses including diseases and pests. Even if a single gene were responsible for each of the individual factors involved in yield potential (which is not the case), it is easy to see that there will be hundreds or even thousands of genes that influence yield. Given that the number of chromosomes in crop species is small ($2n = 34$ in sunflower, $2n = 18$ in lettuce, $2n = 38$ in rapeseed, $2n = 20$ in maize, $2n = 42$ in wheat, $2n = 14$ in barley, $2n = 24$ in rice, $2n = 22$ in bean, and $2n = 48$ in potato), then linkage will always be a major factor in the inheritance of quantitatively inherited traits. So this, as with other quantitative effects, adds another level of complexity. In general, the complexity of the genetics has meant that many questions remain unanswered. Some questions that might be asked are:
The concept of quantitative trait loci (QTL) was first raised by Karl Sax in 1923. Sax reported examining yield on a segregating progeny from a cross between two homozygous common bean (Phaseolus vulgaris) lines. One parent was homozygous for coloured seed while the other had white seed. A single gene at the P-locus determined seed colour, with PP alleles for coloured seed and pp for white seed. On inspection of seed weights, Sax found that PP lines produced seeds with an average weight of 30.7 g/100 seeds, heterozygotes (Pp) produced seed with 28.3 g/100 seeds, while pp lines had lowest seed weights (26.4 g/100 seeds). From this Sax introduced the concept that the quantitative loci determining seed weight were linked to the single gene locus for seed colour.
The potential of expanding this concept in plant breeding attracted the attention of many researchers after Sax's work was published. However, few advances were achieved because plant breeders were forced to work mainly with morphologically visible single-gene traits and major-gene mutants. These were not the most suitable for investigating QTLs because:
These defects have been corrected by the introduction of molecular markers, which tend to be numerous, do not affect the plant phenotype, and are often co-dominant, allowing the heterozygotes to be differentiated from the homozygous parental types.
In plant breeding, QTLs have greatest potential in marker-assisted selection for quantitatively inherited traits that have low heritability or that are difficult or expensive to screen or evaluate.
The process involved in QTL analysis will be illustrated using a simple simulated example where two homozygous parents are hybridized to produce $F_1$ plants. One parent was homozygous AABBCC at the A-, B- and C-bands, respectively, while the other parent was homozygous aabbcc. Traditionally, these bands were separated and observed on an autoradiograph, with each band representing an allele. Increasingly, though, these bands are resolved in sequencing machines and therefore appear as peaks. It should be noted that in this example, A is not dominant to a, etc.
Thirty-two homozygous lines were derived from the $F_1$ family using doubled-haploidy techniques (see Chapter 8). These lines were grown in a four replicate field trial to determine the yield of each line. In addition, the lines were polymorphic for three loci that appeared to be located on the same chromosome. The banding at the three molecular markers (identified simply as A-, B- and C-bands, AA, BB, and CC, respectively), along with the yield of each line, is shown in Table 5.10. We use doubled haploids in this example for simplicity, as there will be no heterozygotes in the population. This makes some of the calculations simpler, as dominance effects can be ignored. However, the principle is the same and the analysis can be carried out using any segregating population resulting from a two-parent cross.
Table 5.10 Yield of 32 doubled haploid canola lines, and genotype of each line at the A-, B-, and C-loci.
Line | A-locus | B-locus | C-locus | Yield | Line | A-locus | B-locus | C-locus | Yield |
1 | AA | BB | CC | 107.80 | 17 | aa | BB | cc | 112.41 |
2 | AA | BB | CC | 113.57 | 18 | aa | bb | cc | 104.93 |
3 | AA | BB | cc | 111.68 | 19 | aa | bb | cc | 104.62 |
4 | aa | bb | CC | 101.09 | 20 | AA | BB | CC | 114.68 |
5 | aa | bb | cc | 91.29 | 21 | AA | BB | CC | 110.79 |
6 | aa | bb | cc | 112.24 | 22 | AA | bb | cc | 101.47 |
7 | aa | bb | cc | 97.17 | 23 | AA | BB | cc | 116.61 |
8 | aa | bb | cc | 95.75 | 24 | aa | bb | CC | 101.95 |
9 | aa | BB | CC | 113.52 | 25 | aa | bb | cc | 106.33 |
10 | aa | BB | CC | 119.27 | 26 | aa | bb | cc | 95.42 |
11 | AA | bb | cc | 98.40 | 27 | AA | BB | CC | 121.85 |
12 | AA | BB | CC | 106.82 | 28 | AA | BB | CC | 111.94 |
13 | AA | BB | CC | 117.61 | 29 | AA | bb | cc | 105.45 |
14 | AA | BB | CC | 112.88 | 30 | AA | bb | cc | 99.15 |
15 | AA | bb | CC | 101.58 | 31 | aa | BB | CC | 116.49 |
16 | aa | BB | CC | 119.27 | 32 | aa | bb | cc | 100.21 |
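Because every line in Table 5.10 is a doubled haploid, the proportion of lines in which two marker loci carry alleles from different parents gives a direct estimate of the recombination frequency between them. The sketch below counts these recombinant lines for each pair of loci. The conversion to centimorgans uses Haldane's mapping function, which is an assumption made here for illustration; the text does not say which mapping function was used for the map.

```python
import math
from itertools import combinations

# Marker genotypes of lines 1 to 32 from Table 5.10, one three-letter code per
# line: upper case = allele from the AABBCC parent, lower case = aabbcc parent.
LINES = (
    "ABC ABC ABc abC abc abc abc abc "
    "aBC aBC Abc ABC ABC ABC AbC aBC "
    "aBc abc abc ABC ABC Abc ABc abC "
    "abc abc ABC ABC Abc Abc aBC abc"
).split()
LOCI = "ABC"


def recombination_fraction(lines, i, j):
    """Proportion of lines whose alleles at loci i and j come from different parents."""
    recombinants = sum(g[i].isupper() != g[j].isupper() for g in lines)
    return recombinants / len(lines)


def haldane_cm(r):
    """Map distance in cM from a recombination fraction r < 0.5 (Haldane, assumed)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


fractions = {}
for i, j in combinations(range(len(LOCI)), 2):
    pair = f"{LOCI[i]}-{LOCI[j]}"
    fractions[pair] = recombination_fraction(LINES, i, j)
    print(pair, fractions[pair], round(haldane_cm(fractions[pair]), 1))
```

The A–C fraction is smaller than the sum of the A–B and B–C fractions, which is the usual reason for converting recombination frequencies to additive map distances before drawing the map.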
Mapping of the three qualitative loci is done according to the method described earlier, and the map is as follows:
which, when converted to cMs, is:
The first stage in QTL analysis is to determine whether there are indeed significant differences between the progeny lines. This is done by carrying out a simple analysis of variance. In our example, there were indeed significant differences between these lines (see Table 5.11).
Table 5.11 Degrees of freedom and mean squares from the analysis of variance of seed yield on 32 doubled haploid lines grown in a four replicate randomized complete block design.
Source | df | Mean square |
Between haploid lines | 31 | [missing] |
Replicate blocks | 3 | 321.1 ns |
Replicate error | 93 | 401.4 |
*** .
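The degrees of freedom in Table 5.11 follow directly from the layout of a randomized complete block design. The small sketch below shows the bookkeeping for 32 lines in four replicate blocks; the function name is hypothetical.

```python
def rcbd_degrees_of_freedom(n_lines, n_reps):
    """Degrees of freedom for a randomized complete block design."""
    df_lines = n_lines - 1           # between lines
    df_blocks = n_reps - 1           # between replicate blocks
    df_error = df_lines * df_blocks  # replicate error (lines x blocks)
    df_total = n_lines * n_reps - 1  # all plots minus one
    return {"lines": df_lines, "blocks": df_blocks,
            "error": df_error, "total": df_total}


df = rcbd_degrees_of_freedom(n_lines=32, n_reps=4)
print(df)
```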
Where significant differences in yield are detected between the lines, can this difference in yield potential be explained by an association between yield and the single marker bands?
Assume, for simplicity here, that genotypes with A-bands have genotype AA, and those without have genotype aa, and similarly for the B- and C-bands. The average yield of each single band genotype can be calculated by adding the yields of lines carrying the same bands at each locus and dividing by the number of individual lines in that class. For example, the average of all lines that have the AA bands is 109.52, while for those that have the aa bands it is 105.23. Similarly, the yield of the BB band types is 114.31 compared with 100.44 for bb, and for CC types it is 112.05 compared with 102.70 for cc types. A pattern emerges: lines carrying the BB band rather than the bb band have the largest yield advantage. Lines carrying the CC band rather than the cc band also have an advantage (albeit smaller than with the B-band), while AA and aa lines differ only slightly. To attach significance to these differences requires partition of the sum of squares for differences between lines into a between-genotype sum of squares (BG–SS) and a within-genotype sum of squares (WG–SS).
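In this simple example, there are 16 lines that are AA and 16 that are aa. Similarly, there are 16 lines in each of the BB, bb, CC and cc classes. The example is therefore completely balanced, and in this instance the lines' sum of squares is partitioned by a simple orthogonal contrast. In actual experiments, the number of individuals in each class is likely to vary, and the BG–SS partition is completed, on a line-mean basis, by:
$$\text{BG–SS} = n_{11}\bar{x}_{11}^{2} + n_{22}\bar{x}_{22}^{2} - \frac{(n_{11}\bar{x}_{11} + n_{22}\bar{x}_{22})^{2}}{n_{11} + n_{22}}$$
where $\bar{x}_{11}$ is the mean of lines with the 11 genotype, and $n_{11}$ is the number of lines with the 11 genotype, with $\bar{x}_{22}$ and $n_{22}$ defined in the same way for the alternative homozygous genotype (written here as 22). In this example, for the AA and aa genotypes we would have: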
The sum of squares for variation within genotypes (WG–SS) is obtained by subtracting the variation between types (above) from the total sum of squares between lines:
$$\text{WG–SS} = \text{SS}_{\text{between lines}} - \text{BG–SS}$$
In the case of the AA and aa bands we have:
There is 1 degree of freedom for the between-genotype sum of squares, while the degrees of freedom for the within-genotype sum of squares are the total number of lines minus two (in our example, $32 - 2 = 30$).
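The partition described above can be sketched as follows, using the line yields and marker genotypes as listed in Table 5.10. This is a minimal illustration on a line-mean basis, so the sums of squares it prints are not on the same scale as plot-based mean squares from a replicated trial. The function and variable names are hypothetical.

```python
# Yields of lines 1 to 32 as listed in Table 5.10.
YIELDS = [
    107.80, 113.57, 111.68, 101.09, 91.29, 112.24, 97.17, 95.75,
    113.52, 119.27, 98.40, 106.82, 117.61, 112.88, 101.58, 119.27,
    112.41, 104.93, 104.62, 114.68, 110.79, 101.47, 116.61, 101.95,
    106.33, 95.42, 121.85, 111.94, 105.45, 99.15, 116.49, 100.21,
]
# Marker genotypes of the same lines (upper case = AABBCC parent allele).
GENOTYPES = (
    "ABC ABC ABc abC abc abc abc abc "
    "aBC aBC Abc ABC ABC ABC AbC aBC "
    "aBc abc abc ABC ABC Abc ABc abC "
    "abc abc ABC ABC Abc Abc aBC abc"
).split()


def partition(yields, classes):
    """Split the between-lines sum of squares into BG-SS and WG-SS."""
    n = len(yields)
    grand_mean = sum(yields) / n
    total_ss = sum((y - grand_mean) ** 2 for y in yields)
    groups = {}
    for y, c in zip(yields, classes):
        groups.setdefault(c, []).append(y)
    # Between-genotype SS: class size times squared deviation of the class mean.
    bg_ss = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                for v in groups.values())
    wg_ss = total_ss - bg_ss  # within-genotype SS by subtraction
    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_ratio = (bg_ss / df_between) / (wg_ss / df_within)
    return {"total_ss": total_ss, "bg_ss": bg_ss, "wg_ss": wg_ss,
            "df_between": df_between, "df_within": df_within,
            "f_ratio": f_ratio}


results = {}
for i, locus in enumerate("ABC"):
    results[locus] = partition(YIELDS, [g[i] for g in GENOTYPES])
    r = results[locus]
    print(locus, round(r["bg_ss"], 1), round(r["wg_ss"], 1), round(r["f_ratio"], 2))
```

Because the function groups lines by whatever class labels it is given, it also handles unequal class sizes, which is the situation expected in actual experiments.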
Completing this operation for the other two bands, we have the mean squares from three analyses of variance (see Table 5.12).
Table 5.12 Degrees of freedom and mean squares from the analyses of variance of seed yield between and within progeny that are polymorphic at the AA:aa, BB:bb and CC:cc loci.
Source | df | AA–aa locus | BB–bb locus | CC–cc locus |
Between genotypes | 1 | 2,351 ns | [missing] | [missing] |
Within genotypes | 30 | [missing] | 463 ns | [missing] |
Replicate error | 93 | 459 | 459 | 459 |
*
**
***
The within genotypes effect is tested against the replicate error, while the between genotype effect is tested against the within genotype mean square.
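As a small worked instance of this testing scheme, the sketch below forms the variance ratio for the one within-genotype mean square that is fully legible in Table 5.12, the B-locus value of 463, tested against the replicate error of 459. The helper name is hypothetical, and no tabulated critical value is computed here.

```python
def f_ratio(ms_effect, ms_error):
    """Variance ratio of an effect mean square to its error mean square."""
    return ms_effect / ms_error


# Within-genotype mean square at the BB-bb locus against replicate error,
# with 30 and 93 degrees of freedom respectively (Table 5.12).
f_within_b = f_ratio(ms_effect=463.0, ms_error=459.0)
print(round(f_within_b, 3))
```

A ratio this close to one is in keeping with the "ns" marked against that entry. The between-genotype mean square would instead be divided by the within-genotype mean square for the same locus.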
Clearly, there is a significant relationship between seed yield and alleles at the B-band. Similarly, some relationship exists between the C-band and yield, although the variability within genotypes CC and cc is highly significant, hence weakening the QTL relationship. There is no relationship between seed yield and alleles at the A-band.
From the above analysis of variance, our best guess as to the position of the QTL would be between the B- and C-bands, and nearer to the B than the C. The position of the QTL on the chromosome can be determined using a number of statistical techniques. The simplest technique involves regression, and will be illustrated here.
Now the difference in yield between genotypes at each band is an indication of the linkage between the QTL and the single band position. In this example we have:
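Given a simple additive–dominance model of inheritance, we find that lines that have BB and the favourable QTL genotype (written here as $QQ$) will have expectation of $m + a$, and this genotype will occur in the population with frequency $\tfrac{1}{2}(1 - r)$, where $r$ is the recombination frequency between the B-band and the QTL. The BB lines without the QTL ($qq$) will be $m - a$, and will occur in the population with frequency $\tfrac{1}{2}r$. Similarly, for bb we have $bbQQ$ lines with expectation $m + a$ and frequency $\tfrac{1}{2}r$, and $bbqq$ lines with expectation $m - a$ and frequency $\tfrac{1}{2}(1 - r)$. The BB lines therefore average $m + a(1 - 2r)$ and the bb lines $m - a(1 - 2r)$, so that half the difference between the BB and bb genotypes, $\delta$, is equal to $a(1 - 2r)$.
There is therefore a linear relationship between the $\delta$s and $(1 - 2r)$:
$$\delta = a(1 - 2r)$$
where the regression slope is an estimate of $a$.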
The estimate of the slope and the accuracy of fit of the regression depend upon $r$, the recombination frequency between each of the three single band positions and the QTL. We know the map distances between A–B and B–C. Therefore all that is now required is to substitute in recombination frequencies to find the recombination frequency with the least departure from regression in a regression analysis of variance. It is usual to start with the assumption that the QTL is located at the A-band position and complete a regression analysis. Then assume that the QTL is 2 cM from A, towards B, and carry out another analysis. Repeat this operation until it is assumed that the QTL is located at the C-locus. Thereafter determine which of the regression analyses has the best fit (the smallest departure from regression term), and the QTL is taken to be located at that map location. From this the recombination frequencies between the various single band positions and the QTL can be calculated to determine the usefulness of the linkage with the QTL and hence its usefulness in practice.
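The scanning procedure can be sketched as below. Several choices are assumptions made for illustration rather than the method used for this example: the A–B and B–C recombination frequencies are taken as twice the halved values (15.7% and 9.0%) quoted for this example, map distances use Haldane's mapping function, and the line is fitted through the origin because the model has no intercept. The three yield values are those listed against the A-, B- and C-loci in the table of trial positions for this example.

```python
import math

# Yield values listed for the A-, B- and C-loci in the trial-position table.
DELTA = {"A": 2.145, "B": 6.935, "C": 4.475}
# Assumed marker recombination frequencies (twice the quoted halved values).
R_AB, R_BC = 0.314, 0.180


def haldane_cm(r):
    """Map distance in cM from a recombination fraction (Haldane, assumed)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Recombination fraction from a map distance in cM (Haldane, assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


MARKER_POS = {"A": 0.0, "B": haldane_cm(R_AB)}
MARKER_POS["C"] = MARKER_POS["B"] + haldane_cm(R_BC)


def fit_at(qtl_pos):
    """Regress delta on (1 - 2r) through the origin for one trial QTL position."""
    xs = [1.0 - 2.0 * haldane_r(abs(qtl_pos - MARKER_POS[k])) for k in "ABC"]
    ys = [DELTA[k] for k in "ABC"]
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    rss = sum((y - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, rss


# Step from the A-band towards the C-band in 2 cM increments.
scan = {pos: fit_at(pos) for pos in range(0, int(MARKER_POS["C"]) + 1, 2)}
best_pos = min(scan, key=lambda pos: scan[pos][1])
best_slope, best_rss = scan[best_pos]
print(best_pos, round(MARKER_POS["B"], 1), round(best_slope, 2), round(best_rss, 3))
```

Under these assumptions the position with the smallest residual sum of squares falls close to the B-band, and the fitted slope is the corresponding estimate of the additive effect.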
To avoid duplication, the speculative map distances between each single band position and the QTL in our example are shown only for positions around the map location with minimum departure from regression:
 | A-locus | B-locus | C-locus | Residual sum of squares |
Half-difference in yield | 2.145 | 6.935 | 4.475 | |
Recombination frequency with QTL | 0.300 | 0.013 | 0.193 | 1932 |
 | 0.310 | 0.003 | 0.183 | 802 |
 | 0.3198 | 0.01478 | 0.17148 | 0 |
 | 0.330 | 0.017 | 0.163 | 339 |
From this the resulting map, including the QTL, would be:
which, when converted to cMs, is:
In this simple case the QTL and the B-bands are tightly linked and therefore selection based on the B-band would be highly effective in selecting for the QTL, and hence high seed yield. Actual examples in plant breeding, however, are rarely this close. The close proximity of the QTL to the B-band is also a reflection of the high recombination frequencies (low linkage) between the three bands in this simple example. If the recombination frequencies between A–B and B–C were halved (i.e. 15.7% and 9.0%), the QTL would have a recombination of 3% with the B-band position. Similarly, this example looked at only three bands on a single chromosome; in real situations, many chromosomes will be involved and more loci examined on each chromosome. However, the underlying theory is the same.
893 tall plants with long leaf margin hairs;
313 tall plants with short margin hairs;
0 short plants with long leaf margin hairs; and
394 short plants with short margin hairs.
Give a genetic explanation of what underlies the observed frequency of phenotypes and test your theory using a suitable statistical test.
Determine the percentage recombination between the foot-rot and yellow stripe rust loci.
Given the frequency of phenotypes with the above percentage recombination, how many plants would need to be evaluated to be 99% certain of obtaining at least one plant that is resistant to both foot-rot and yellow stripe rust? In contrast, how many would need to be evaluated to be 99% certain of obtaining at least one plant that is resistant to both foot rot and yellow stripe rust, given independent assortment of the two loci involved?
A breeding programme aims to produce potato parental lines that are either quadruplex or triplex for the potato cyst nematode resistant allele. A cross is made between two parents, which are known to be duplex for the resistant allele. What would be the expected genotype and phenotype of progeny from this duplex × duplex cross?
A properly designed experiment was carried out in canola where both parent lines were grown alongside two families of progeny plants from the cross between them. The average yield of plants from each of the four families, the variance of each family and the number of plants evaluated from each family were:
Family | Mean yield | Variance of yield | Number of plants |
[missing] | 1,901 | 74 | 21 |
[missing] | 1,502 | 68 | 21 |
[missing] | 1,429 | 69 | 21 |
[missing] | 1,888 | 102 | 31 |
Use an appropriate test to determine whether the additive–dominance model of inheritance is adequate to explain the genetic variation in yield in canola.
What can you conclude from the results obtained above? What could have caused this to occur?
From a properly designed field trial that included and
families, the following yield estimates were obtained.
From these family means, estimate the expected value of and
, based on the additive–dominance model of inheritance
L-G- | L-gg | llG- | llgg |
891 | 312 | 0 | 397 |
Complete an appropriate analysis to explain and interpret this segregation pattern.