Chapter 5
Genetics and Plant Breeding

5.1 Introduction

Early plant breeders, basically farmers, did not have any knowledge of the inheritance of characters in which they were interested. The only knowledge they possessed was that the most productive offspring tended to originate from the most productive plants, and that the better flavour types tended to be derived from parent plants which were, themselves, of better flavour. None the less, the achievements of these breeders were remarkable and should never be underestimated. They moulded most of the crops as we recognize them today from their wild and weedy ancestral types.

In modern plant breeding schemes it is recognized, however, that it is very much more effective and efficient (or indeed essential) to have a basic knowledge of the inheritance or genetics of traits for which selection is to be carried out.

There are generally five different areas of genetics that have been applied to plant breeding:

Qualitative genetics, where inheritance is controlled by alleles at a single locus, or at very few loci.
Population genetics, which deals with the behaviour (or frequency) of alleles in populations and the conditions under which they remain in equilibrium or change, thus allowing predictions to be made about the properties and changes expected in populations.
Quantitative genetics, for traits where the variation is determined by alleles at more than a few loci; traits that are said to be controlled by polygenic systems. Quantitative genetics is concerned with describing the variation present in terms of statistical parameters such as progeny means, variances and covariances.
Cytogenetics, the study of the behaviour and properties of chromosomes, being the structural units that carry the genes that govern expression of all the traits.
Molecular genetics, where studies are carried out at the molecular level. Molecular techniques have been developed to investigate and handle both qualitative and quantitative characters. Although the details of molecular genetics are generally outside the scope of this book, the impact of molecular genetic techniques is increasing, important and relevant.

5.2 Qualitative genetics

Few sciences have as clear-cut a beginning as modern genetics. As mentioned previously, early plant breeders were aware of some associations between parent plant and offspring, and at various times in history researchers had carried out experiments to study such associations. However, experimental genetics with real meaning began in the middle of the 19th century with the work of Gregor Johann Mendel, and was only fully appreciated after the turn of the 20th century.

Mendel's definitive experiments were carried out in a monastery garden on pea lines. The flowers of pea plants are so constructed as to favour self-pollination, and as a result the majority of lines used by Mendel were either homozygous or near-homozygous genotypes. Mendel's choice of peas as an experimental plant species offered a tremendous advantage over many other plant species he might have chosen. The differences in characters he chose were also fortuitous. Therefore many present-day scientists have argued that Mendel had a great deal of luck associated with his findings because of the choices he made over what to study. When this is combined with segregation ratios that are better than might be expected by chance, many have concluded that he must ‘have already foreseen the results he expected to obtain’.

Before considering an example of one of Mendel's experiments, there are a few general points to be made about his experiments. Others, before Mendel, had made controlled hybridizations or crosses within various species. So why did Mendel's crosses, rather than those of earlier workers, provide the basis for the modern science of genetics? First and foremost, Mendel had a brilliant analytical mind that enabled him to interpret his results in ways that defined the principles of heredity. Secondly, Mendel was a proficient experimentalist. He knew how to carry out experiments in such a way as to maximize the chances of obtaining meaningful results. He knew how to simplify data in a meaningful way. As parents in his crosses, he chose individuals that differed by sharply contrasting characters (now known to be controlled by single genes). Finally, as noted above, he used true breeding (homozygous) lines as parents in the crosses he studied.

Among other elements in Mendel's success were the simple, logical sequence of crosses that he made and the careful numerical counts of his progeny that he recorded with reference to the easily definable characteristics on whose inheritance he focused his attention. It should be noted that many of the features listed as reasons for Mendel's success are very similar to the criteria necessary to carry out a successful plant breeding programme (including the ‘luck element’).

Let us consider some of Mendel's experiments as an introduction to qualitative inheritance. These results are presented in Table 5.1. When Mendel crossed plants from a round-seeded line with plants from a wrinkled-seeded line, all of the first generation ($F_1$) had round seeds. The characteristic of only one of the parental types was therefore represented in the progeny. In the next generation ($F_2$), achieved by selfing the $F_1$, both round-seed and wrinkled-seed were found in the progeny. Mendel's count of the two types in the $F_2$ was 5,474 round-seed and 1,850 wrinkled-seed, a ratio of 2.96:1.

c05-math-0001 — **Table 5.1** Results from some of Mendel's crossing experiments with peas.

Mendel found it notable that the same general result occurred when he made crosses between plants from lines differing for other characters. Another example is when he crossed peas with yellow cotyledons with ones with green cotyledons; the $F_1$ plants all had yellow cotyledons, while he found a ratio of 3.01 yellows to 1 green in the $F_2$. Almost identical results were obtained when long-stemmed plants were crossed with short-stemmed plants, and when plants with axial inflorescence were crossed with plants with terminal inflorescence.

How did Mendel interpret his generalized findings? One of the keys to his solution was his recognition that in the $F_1$ the hereditary basis for the character that fails to be expressed was not lost. The expression of this character appears again in the $F_2$ generation. Recognizing the idea of dominance and recessiveness in heterozygous genotypes and the particulate nature of the heritable factors was the overwhelming genius of Gregor Mendel. This laid the foundation of genetics and hence the explanation underlying the most important features of qualitative genetics.

5.2.1 Genotype/phenotype relationships

Within genetic studies there are two interrelated points. The first is concerned with the actual genetic makeup of individual plants or segregating populations and is referred to as the genotype. The second is related to what is actually expressed or observed in individual plants or segregating populations, and is termed the phenotype.

In the absence of any environmental variation (which can often be assumed with qualitative, single, major gene inheritance), the most frequent cause of difference between genotype and phenotype is due to dominance effects. For example, consider a single locus and two alleles ($A$ and $a$). If two diploid homozygous lines are crossed where one has the genotype AA and the other aa, then the $F_1$ will be heterozygous at this locus (i.e. Aa). If this resembles exactly the AA parent, then allele $A$ is said to be dominant over allele $a$. On selfing the $F_1$, genotypes will occur in the ratio $1\,AA : 2\,Aa : 1\,aa$. If, however, the allele $A$ is completely dominant to $a$, the phenotype of the $F_2$ will be $3\,A\_ : 1\,aa$. Therefore, complete dominance occurs when the $F_1$ shows exactly the same phenotype as one of the two parents in the cross.

Compare these ratios to those observed by Mendel (Table 5.1). You will notice that Mendel's data do not fit exactly to the 3:1 ratio that would be expected, given that each trait is controlled by a single completely dominant gene. It should, however, be remembered that gamete segregation in meiosis and pairing in fertilization are random events. Therefore it is highly unlikely, because of sampling variation, that exact ratios are found in such experiments. Indeed, applying statistical considerations, many modern researchers have claimed that Mendel's data appear to be ‘too good a fit’ for purely random events. Whether these geneticists are correct or not does not, however, detract in any way from the remarkable achievements of Mendel.
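The closeness of Mendel's round : wrinkled counts to a 3:1 ratio can be put on a numerical footing. The sketch below uses a Pearson chi-square goodness-of-fit statistic, which is a common choice for this purpose but is an assumption here rather than a method named in the text; the function name `chi_square` is likewise illustrative.

```python
def chi_square(observed, expected_ratio):
    """Pearson chi-square statistic for observed counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    statistic = 0.0
    for obs, part in zip(observed, expected_ratio):
        expected = total * part / ratio_sum  # count expected under the ratio
        statistic += (obs - expected) ** 2 / expected
    return statistic


# Mendel's F2 counts quoted above: 5,474 round and 1,850 wrinkled seeds.
round_wrinkled = chi_square([5474, 1850], [3, 1])
print(f"chi-square = {round_wrinkled:.3f}")
```

With two phenotype classes there is one degree of freedom, for which the 5% critical value is 3.841. A statistic far below that value indicates a very close fit to 3:1, which is the basis of the ‘too good a fit’ argument.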

5.2.2 Segregation of qualitative genes in diploid species

To illustrate segregation of qualitative genes, and indeed many following concepts, consider a series of simple examples. Firstly, consider two pairs of single genes in spring barley (H. vulgare L.). Say that dwarf barley plants differ in plant height from tall types at the t-locus, with tall types given the genetic constitution TT, and dwarf lines tt. Barley lines with 6-row ears differ from lines with 2-row ears, where the central florets do not set, at the S-locus, with 6-row SS types being dominant over 2-row ss types. Let us consider the case where two homozygous lines are artificially hybridized, where one parent is homozygous tall and 6-row and the other is dwarf and 2-row. First consider the phenotypes expected for each trait.

When the tall $F_1$ is backcrossed to the dwarf parent, a ratio of 1 tall : 1 dwarf is obtained in the resulting progeny. Similarly, when the six-row $F_1$ is crossed with a homozygous two-row, a ratio of 1 six-row : 1 two-row is obtained. Evidently, the difference between tall and dwarf behaves as a single major gene with the tall allele showing complete dominance to the dwarf allele; and the difference between six-row and two-row is also a single gene with six-row showing complete dominance to two-row. In terms of segregating alleles (i.e. genotypes), the above example would be:

Where TT and Tt, and similarly SS and Ss, have the same phenotype, we get the 3:1 phenotypic segregation ratio. If each allele exerts equal effect (additive, and no dominance) then we would have three expected phenotypes in the Mendelian ratio of one tall, two intermediate and one short, corresponding to the genotypic ratio 1 TT : 2 Tt : 1 tt, or one 6-row, two intermediate and one 2-row.

Now assume that a large number of the $F_2$ plants were allowed to set selfed seed: what would be the ratio of phenotypes and genotypes in the $F_3$ generation? To answer this, consider that only the heterozygous $F_2$ plants will segregate (i.e. TT and tt types are now homozygous for that trait and so will breed true for that character). Of course the heterozygous $F_2$ has the same genotype as the $F_1$, and so it is no surprise that it segregates in the same ratio. At $F_3$ we therefore have $\tfrac{1}{4}\,TT : \tfrac{1}{2}\left(\tfrac{1}{4}\,TT : \tfrac{2}{4}\,Tt : \tfrac{1}{4}\,tt\right) : \tfrac{1}{4}\,tt$, which results in 3/8 TT : 2/8 Tt : 3/8 tt. Similar results of 3/8 SS : 2/8 Ss : 3/8 ss would be obtained for 6-row versus 2-row. Applying these simple rules and expanding this to later generations, we get:

Note that each successive generation of selfing reduces the proportion of heterozygotes by half (i.e. 1/2 Tt at $F_2$, 2/8 Tt at $F_3$, 2/16 Tt at $F_4$ and 2/32 Tt at $F_5$, etc.).
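The rule used above (homozygotes breed true, heterozygotes segregate 1 : 2 : 1) can be applied generation by generation. The following is a minimal sketch, with illustrative function names, that tabulates the expected TT : Tt : tt frequencies from the $F_1$ onwards using exact fractions.

```python
from fractions import Fraction


def self_one_generation(freqs):
    """Advance (TT, Tt, tt) frequencies by one generation of selfing."""
    hom_dom, het, hom_rec = freqs
    # Homozygotes breed true; each heterozygote gives 1 TT : 2 Tt : 1 tt.
    return (hom_dom + het / 4, het / 2, hom_rec + het / 4)


def selfing_series(last_generation):
    """Return {generation: (TT, Tt, tt)} for F1 up to F<last_generation>."""
    freqs = (Fraction(0), Fraction(1), Fraction(0))  # the F1 is entirely Tt
    series = {1: freqs}
    for generation in range(2, last_generation + 1):
        freqs = self_one_generation(freqs)
        series[generation] = freqs
    return series


for generation, (tall_hom, het, dwarf_hom) in selfing_series(5).items():
    print(f"F{generation}: TT = {tall_hom}, Tt = {het}, tt = {dwarf_hom}")
```

`Fraction` reduces its values, so 2/8 Tt at the $F_3$ is reported as 1/4, 2/16 at the $F_4$ as 1/8, and so on.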

Let us now consider how these two different pairs of alleles behave with respect to each other in inheritance. One way to study this is to cross individuals that differ in both characteristics:

When this cross is made, the $F_1$ shows both dominant characteristics, tall and 6-row. Plant breeders' interests in genetics mainly relate to selection, and as no selection takes place at the $F_1$ stage, the interest begins when the self-pollinated progeny of the $F_1$ is considered. In this case the $F_1$ individuals are assumed to produce equal frequencies of four kinds of gametes during meiosis (TS, Ts, tS and ts). An easy way to illustrate the possible combinations of $F_2$ progeny is to use a Punnett square, where the four gamete types from the male parent are listed in a row along the top, and the four kinds from the female parent are listed in a column down the left-hand side. The 16 possible genotype combinations are then obtained by filling in the square, that is:

Gametes from female parent	Gametes from male parent
	TS	Ts	tS	ts
TS	TSTS	TSTs	TStS	TSts
Ts	TsTS	TsTs	TstS	Tsts
tS	tSTS	tSTs	tStS	tSts
ts	tsTS	tsTs	tstS	tsts

This also can be written, putting the allele representations for the same locus together, as:

Gametes from female parent	Gametes from male parent
	TS	Ts	tS	ts
TS	TTSS	TTSs	TtSS	TtSs
Ts	TTSs	TTss	TtSs	Ttss
tS	TtSS	TtSs	ttSS	ttSs
ts	TtSs	Ttss	ttSs	ttss

Collecting the like genotypes we get the following frequency of genotypes:

1/16 TTSS;	2/16 TTSs;	1/16 TTss
2/16 TtSS;	4/16 TtSs;	2/16 Ttss
1/16 ttSS;	2/16 ttSs;	1/16 ttss

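and frequency of phenotypes:

9/16 T_S_ (tall, 6-row);	3/16 T_ss (tall, 2-row);	3/16 ttS_ (short, 6-row);	1/16 ttss (short, 2-row)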

As with the single gene case above, if the alleles do not show dominance then there would indeed be nine different phenotypes in the ratio:

1 TTSS : 2 TTSs : 1 TTss : 2 TtSS : 4 TtSs : 2 Ttss : 1 ttSS : 2 ttSs : 1 ttss
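The Punnett-square bookkeeping for the two-locus cross can be reproduced by enumerating the 16 gamete combinations. This is a minimal sketch assuming independent assortment and complete dominance of T and S; the helper names are illustrative.

```python
from collections import Counter
from itertools import product

GAMETES = ["TS", "Ts", "tS", "ts"]  # equally frequent from a TtSs F1


def genotype(female, male):
    """Combine two gametes, grouping alleles by locus (dominant allele first)."""
    t_locus = "".join(sorted(female[0] + male[0]))  # 'T' sorts before 't'
    s_locus = "".join(sorted(female[1] + male[1]))
    return t_locus + s_locus


def phenotype(geno):
    """Phenotype under complete dominance of T (tall) and S (6-row)."""
    height = "tall" if "T" in geno[:2] else "short"
    ear = "6-row" if "S" in geno[2:] else "2-row"
    return f"{height}, {ear}"


cells = [genotype(f, m) for f, m in product(GAMETES, repeat=2)]
genotype_counts = Counter(cells)
phenotype_counts = Counter(phenotype(g) for g in cells)

for geno, count in sorted(genotype_counts.items()):
    print(f"{count}/16 {geno}")
for pheno, count in phenotype_counts.most_common():
    print(f"{count}/16 {pheno}")
```

Counting genotypes gives the nine classes with their 1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1 frequencies, and collapsing them by phenotype gives the four dominance classes.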

In most cases when developing inbred cultivars, plant breeders carry out selection based on plant phenotype amongst early generation segregating populations (i.e. $F_2$ and $F_3$). It is obvious that 75% of the $F_2$ plants will be heterozygous at one or both loci, and dominance effects can mask the actual genotypes that are to be selected. Single $F_2$ plant selections for the recessive traits (short stature and 2-row) allow for identification of the desired genotype (ttss) but only 1/16th of $F_2$ plants will be of this type, while the recessive expression of the trait in most plants is hidden due to dominance.

The effects of heterozygosity on selection can be reduced through successive rounds of self-pollination. Plant breeders, therefore, do not only select for single gene characters at the $F_2$ stage. Consider now what would happen if a sample of $F_2$ plants from the above example were selfed: what would be the resulting segregation expected at the $F_3$ stage?

There are nine different genotypes at the $F_2$ stage, TTSS, TTSs, TTss, TtSS, TtSs, Ttss, ttSS, ttSs and ttss, and they occur in the ratio 1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1, respectively. Obviously if any genotype is homozygous at a locus then these plants will not segregate at that locus. For example, plants with the genetic constitution of TTSS will always produce plants with the TTSS genotype. $F_2$ plants with a genotype of TTSs will not segregate for the Tt locus and will only segregate for the Ss locus. Similarly, $F_2$ plants with a genotype of TtSs will segregate for both loci, with the same segregation frequencies as the $F_1$.

From this we have:

Summing over rows, this results in a genotypic segregation ratio of 9/64 TTSS : 6/64 TTSs : 9/64 TTss : 6/64 TtSS : 4/64 TtSs : 6/64 Ttss : 9/64 ttSS : 6/64 ttSs : 9/64 ttss.

The phenotypic expectation at the $F_3$ would therefore be:

25/64 T_S_ (tall, 6-row) : 15/64 T_ss (tall, 2-row) : 15/64 ttS_ (short, 6-row) : 9/64 ttss (short, 2-row)

This result could be obtained in a simpler manner by considering the segregation ratio of each trait separately and using these to form a Punnett square. For example, the frequency of homozygous tall (TT), heterozygous tall (Tt) and homozygous short (tt) at the $F_2$ is 1:2:1, respectively. Similarly, the frequency of SS, Ss and ss is also 1:2:1. We can use these frequencies to construct a Punnett square such as:

This would give the same genotypic and phenotypic frequencies as shown above. It is, however, necessary to be familiar with the more direct method for situations where segregation is not independent (i.e. in cases of linkage).

It should be noted, as in the single gene case previously described, that increased selfing results in increased homozygosity. Therefore with increased generations of selfing there is an increase in the frequency of expression of recessive traits. If, in this case, a breeder wished to retain the 6-row short plant type, then it would be expected that 3/16 (approximately 19%) of all $F_2$ plants will be of this type ($ttS\_$), and only 1/3 of these would at this stage be homozygous for both traits. If selection were delayed until the $F_3$ generation, then 15/64 (or just over 23%) of the population would be of the desired type, and now 60% of the selections would be homozygous for both genes. Obviously, if selection is delayed until after an indefinitely large number of generations of selfing, then the frequency of the desired types would approach 25% (and all homozygous).
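These selection figures follow from the simpler method of treating each locus separately and multiplying the single-locus frequencies, which is valid only when the two loci assort independently. A minimal sketch, with illustrative function names:

```python
from fractions import Fraction


def locus_frequencies(generation):
    """(homozygous dominant, heterozygous, homozygous recessive) at F<generation>."""
    het = Fraction(1, 2 ** (generation - 1))  # heterozygosity halves each selfing
    hom = (1 - het) / 2
    return hom, het, hom


def short_six_row(generation):
    """Frequency of the ttS_ phenotype and the share of it that is ttSS."""
    _, _, tt = locus_frequencies(generation)
    ss_dom, ss_het, _ = locus_frequencies(generation)
    tt_ss_dom = tt * ss_dom  # independent loci: multiply single-locus frequencies
    tt_ss_het = tt * ss_het
    desired = tt_ss_dom + tt_ss_het
    return desired, tt_ss_dom / desired


for generation in (2, 3, 6, 10):
    freq, fixed = short_six_row(generation)
    print(f"F{generation}: ttS_ = {freq} ({float(freq):.1%}), "
          f"homozygous share = {float(fixed):.1%}")
```

As the generation number grows, the frequency of short 6-row plants approaches 1/4 and the homozygous share approaches 100%.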

5.2.3 Qualitative loci linkage

The principle of independent assortment of alleles at different loci is one of the cornerstones on which an understanding of qualitative genetics is based. Independent assortment of alleles does not, however, always occur. When certain different allelic pairs are involved in crosses, deviation from independent assortment regularly occurs if the loci are located on the same arm of the same chromosome. This will mean that there will be a tendency of parental combinations to remain together, which is expressed in the relative frequency of new combinations, and is the phenomenon of linkage.

It is often desirable to have knowledge of linkage in plant breeding to help predict segregation patterns in various generations and to help in selection decisions. All genetic linkages can be broken by successive generations of sexual reproduction, although it may take many rounds of recombination before tight linkages are broken. In general, however, if two traits of interest are adversely linked (i.e. the desired combination appears with lower than expected frequency), then increased opportunities for recombination need to be given before selection takes place.

When $F_1$ individuals (say AaBb) produce equal numbers of gametes of the four possible genotypic combinations (AB, Ab, aB, ab), selfing them gives $F_2$ progeny in the phenotypic ratio of $9:3:3:1$. Similarly, we find the genetic ratio of $1:1:1:1$ on test-crossing the $F_1$ to a complete recessive (aabb). Any significant deviation from these expected ratios is an indication of linkage between the two gene loci.

Consider a second simple example of recombination frequencies derived from a test cross, how the test cross can be used to estimate recombination ratios, and hence ascertain linkage between characters. Consider again the dwarfing gene in spring barley, but this time with a third single gene that confers resistance to barley powdery mildew. Mildew-resistant genotypes are designated as RR and susceptible genotypes as rr. A cross is made between two barley genotypes where one parent is homozygous tall and mildew resistant (i.e. TTRR); and the other is short and mildew susceptible (i.e. ttrr). We would expect the $F_1$ to be tall and resistant (i.e. TtRr). When the heterozygous $F_1$ is crossed with a completely recessive genotype (i.e. ttrr) we would expect to have a 1:1:1:1 ratio of phenotypes, tall and resistant: tall and susceptible: short and resistant: short and susceptible. When this cross was carried out and the progeny from the test cross was examined, the following frequencies of phenotypes and genotypes were the ones actually observed.

Linkage is obviously indicated by these results, as the observed frequencies are markedly different from those expected at 50:50:50:50, on the basis of independent assortment. The two phenotypes, tall and resistant and short and susceptible, occur at a considerably higher frequency (79 and 81, respectively) than we expected. You will readily note that these are the same phenotypes as the two parents. The other two phenotypes (the non-parental, or recombinant, types) were observed with much lower frequency (18 and 22, respectively). Added together, the two recombinant types (short and resistant, and tall and susceptible) only account for 40 (20%) out of the 200 test cross progeny examined, while we expected their contribution to collectively make up 50% of the progeny. Therefore the $F_1$ plants are producing TR gametes or tr gametes in 80% of meiotic events, or a frequency of 0.4 for each type. Similarly, recombinant Tr or tR gametes are produced in only 20% of meiotic events, or a frequency of 0.1 for each type. Knowing the frequency of the four gamete types, we can now proceed to predict the frequency of genotypes and phenotypes we would expect in the $F_2$ generation using a Punnett square.

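Collecting like phenotypes together results in:

0.66 tall and resistant (T_R_) : 0.09 tall and susceptible (T_rr) : 0.09 short and resistant (ttR_) : 0.16 short and susceptible (ttrr)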

From a breeding standpoint, compare these phenotypic ratios to those that would have been expected with no linkage (i.e. $9:3:3:1$). If, as might be expected, the aim was to identify individual plants which were short and resistant, then the actual occurrence of these types would drop from 19% to 9%, by more than half. It should also be noted that 20% recombination is not particularly high.
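The effect of linkage on the expected phenotype frequencies after selfing can be worked through by weighting each Punnett-square cell by the product of its two gamete frequencies. The sketch below assumes a coupling-phase TR/tr parent, complete dominance at both loci and the same recombination fraction in male and female gametes; the function names are illustrative.

```python
from collections import defaultdict
from itertools import product


def gamete_frequencies(recombination):
    """Gametes from a TR/tr (coupling) plant for a given recombination fraction."""
    parental = (1 - recombination) / 2
    recombinant = recombination / 2
    return {"TR": parental, "tr": parental, "Tr": recombinant, "tR": recombinant}


def selfed_phenotypes(recombination):
    """Expected phenotype frequencies after selfing, T and R completely dominant."""
    gametes = gamete_frequencies(recombination)
    totals = defaultdict(float)
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2):
        height = "tall" if "T" in (g1[0], g2[0]) else "short"
        mildew = "resistant" if "R" in (g1[1], g2[1]) else "susceptible"
        totals[(height, mildew)] += p1 * p2  # weight of this Punnett-square cell
    return dict(totals)


linked = selfed_phenotypes(0.2)    # 20% recombination, as estimated above
unlinked = selfed_phenotypes(0.5)  # independent assortment

for key in linked:
    print(f"{key[0]} and {key[1]}: linked {linked[key]:.4f}, "
          f"unlinked {unlinked[key]:.4f}")
```

Short and resistant plants fall from 3/16 (0.1875) under independent assortment to 0.09 with 20% recombination, in line with the 19% and 9% quoted above.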

Overall, therefore, genes on the same chromosome (particularly on the same chromosome arm) are linked and do not segregate independently as they would when they are located on different chromosomes. Two heterozygous loci may be linked in coupling or repulsion. Coupling is present when desirable alleles at two loci are present together on the same chromosome and the unfavourable alleles are on another (e.g. TR/tr). Repulsion is present when a desirable allele at one locus is on the same chromosome as an unfavourable allele at another locus (e.g. Tr/tR).

In the test cross $TtRr \times ttrr$ the situations shown in Table 5.2 can arise.

c05-math-0116 — **Table 5.2** Expected frequency of genotypes resulting from a test cross where *TtRr* is crossed to *ttrr* with independent assortment, and with coupling and repulsion linkage.

The percentage recombination, expressed as the number of map units between loci, can be calculated from the equation:

Depending on what the initial parents were, this will be estimated as:
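As a hedged sketch, percentage recombination is taken here to be the number of recombinant test cross progeny divided by the total number of progeny, multiplied by 100. Which classes count as recombinant depends on the linkage phase of the heterozygous parent, so the function takes that phase as an argument. The function name and the dictionary layout are illustrative assumptions.

```python
def recombination_percent(counts, parental_gametes):
    """Percentage recombination from test cross progeny counts.

    counts           -- progeny counts keyed by the gamete received from the
                        heterozygous parent
    parental_gametes -- the two non-recombinant gametes of that parent,
                        ("TR", "tr") for coupling or ("Tr", "tR") for repulsion
    """
    total = sum(counts.values())
    recombinants = sum(n for gamete, n in counts.items()
                       if gamete not in parental_gametes)
    return 100 * recombinants / total


# Counts from the barley example. Which recombinant class had 18 and which had
# 22 is an assumption here; it does not affect the estimate.
coupling_counts = {"TR": 79, "tr": 81, "Tr": 18, "tR": 22}
coupling = recombination_percent(coupling_counts, ("TR", "tr"))

# Hypothetical counts for a repulsion-phase (Tr/tR) parent, for illustration only.
repulsion_counts = {"TR": 18, "tr": 22, "Tr": 79, "tR": 81}
repulsion = recombination_percent(repulsion_counts, ("Tr", "tR"))

print(coupling, repulsion)
```

Both calls give 20 map units: the recombinant classes change with the phase of the parent, but the estimate does not.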

Linkage can also be detected, but less efficiently, by selfing the $F_1$ and observing the segregation ratio of the $F_2$ to determine whether there is any deviation from the expectation, based on independent assortment (e.g. 1 TTRR : 2 TTRr : 1 TTrr : 2 TtRR : 4 TtRr : 2 Ttrr : 1 ttRR : 2 ttRr : 1 ttrr).

The relative positions of three loci can be mapped by considering the frequency of progeny from a three gene test cross. To examine three gene test crosses, consider the following cross between two parents where one parent has the genotype AABBCC and the other has the genotype aabbcc. The $F_1$ would have the genotype AaBbCc. In order to perform the test cross it is necessary to cross the $F_1$ family with a completely recessive genotype (i.e. aabbcc, in this case the recessive parent) and observe the frequency of the eight possible phenotypes. In our example the frequencies are as shown in Table 5.3.

**Table 5.3** Observed and expected genotypes obtained from backcrossing the $F_1$ progeny from a cross between two parents where one parent has the genotype *AABBCC* and the other has the genotype *aabbcc*, to the recessive (*aabbcc*) parent.

If all three loci segregated independently, for example if they were on different chromosomes, we would expect the phenotype frequencies of the eight genotypes to be the same at 149.375 (or the total number of observations divided by the expected number of phenotype classes, $1195/8$). Obviously, the observed frequencies are very different from those expected.

The first point to note is that the frequency of the parental genotypes is considerably higher than expected, while all other, recombinant, types are less than expected.

Next, it is necessary to know the relative order in which the loci appear on the chromosome. Such ordering or arrangement of loci along a chromosome is known as a genetic map. The middle locus can usually be determined from the phenotype that is observed at the lowest frequency. In this example the lowest observed phenotypes are $ABc$ and $abC$, both of which are effectively parental types at the $A$ and $B$ loci but involve non-parental combinations with $C$. As recombination of only the centre gene will involve a double crossover event, the frequency of occurrence will be lowest. Conversely, the pair with most phenotypic observed classes for recombinants (not counting double crossovers) involves the outside genes. From this, the $C$-locus would appear to be the middle one (i.e. requires a recombination event between the $A$- and $C$-locus, plus recombination between the $C$- and $B$-locus).

Consider the possible combinations of alleles and the frequency at which they occur. For the $c05-math-0147$ loci we have:

So percentage of $c05-math-0149$ .

For $c05-math-0150$ loci we have:

So percentage of $c05-math-0152$ .

For the $c05-math-0153$ loci we have:

So percentage of $c05-math-0155$ .

We obtain the map distances as being equivalent to the percentage recombinants and so we have:

From the map it is clear that the map distance $c05-math-0157$ to $c05-math-0158$ $c05-math-0159$ is less than the added distance A–C plus C–B. The method of calculation used to estimate these distances assumes that only the non-parental types (i.e. recombinants) are included as the results of recombination, as would be the case where the linkage between two loci is considered. Where three loci are involved it is necessary to include the double crossover recombination events in estimating the distance between the furthest two loci. In this case we could calculate the $c05-math-0160$ distance from:

Therefore, the percentage of $c05-math-0173$ , giving an estimated genetic distance between A–B of $c05-math-0174$ , which is now the same distance as estimated by simply adding $c05-math-0175$ to $c05-math-0176$ .
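The three-point reasoning can be traced with a short sketch. The counts below are hypothetical, invented purely for illustration, and are not those of Table 5.3; the parents are assumed to be ABC and abc, so an upper-case allele marks one parent and a lower-case allele the other.

```python
from itertools import combinations

# Hypothetical test cross counts keyed by the gamete from the heterozygous
# parent. These numbers are invented and are NOT the data of Table 5.3.
counts = {
    "ABC": 350, "abc": 350,  # parental types
    "Abc": 60, "aBC": 60,
    "AbC": 85, "aBc": 85,
    "ABc": 5, "abC": 5,      # rarest classes: double crossovers
}
LOCI = "ABC"
total = sum(counts.values())


def is_recombinant(gamete, i, j):
    """True when loci i and j carry alleles from different parents."""
    return gamete[i].isupper() != gamete[j].isupper()


def recombination_fraction(i, j):
    """Fraction of progeny that are recombinant between loci i and j."""
    return sum(n for g, n in counts.items() if is_recombinant(g, i, j)) / total


pairwise = {LOCI[i] + LOCI[j]: recombination_fraction(i, j)
            for i, j in combinations(range(3), 2)}

# The pair with the largest recombination fraction holds the two outer loci.
outer = max(pairwise, key=pairwise.get)
middle = (set(LOCI) - set(outer)).pop()

# Double crossovers are recombinant between the middle locus and both outer loci.
m = LOCI.index(middle)
o1, o2 = (LOCI.index(locus) for locus in outer)
doubles = sum(n for g, n in counts.items()
              if is_recombinant(g, o1, m) and is_recombinant(g, m, o2)) / total

# Each double crossover is two recombination events between the outer loci.
corrected_outer = pairwise[outer] + 2 * doubles

print(pairwise, middle, round(corrected_outer, 4))
```

With these invented counts the uncorrected distance between the outer loci (29%) is less than the sum of the two inner distances (13% + 18%); adding twice the double crossover frequency (2 × 1%) restores additivity.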

This indicates a general flaw in linkage map distance estimation, in that where there is no ‘centre gene’ locus, it is impossible to detect double recombination events, and as a result map distances estimated directly from recombination frequencies will tend to underestimate the actual frequency of crossover events.

To avoid such discrepancies, recombination frequencies, which are not additive, are usually converted to a cM scale using the function published by J.B.S. Haldane in 1919, being:

where $c05-math-0178$ is the recombination frequency. From this we see that the map distances transform to:
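Haldane's mapping function is commonly written as $d = -50\,\ln(1 - 2r)$ when the distance $d$ is in centimorgans and $r$ is the recombination fraction. The sketch below assumes this standard form, which presumes no crossover interference; the function names are illustrative.

```python
import math


def haldane_cm(recombination):
    """Haldane map distance in centimorgans for a recombination fraction < 0.5."""
    if not 0 <= recombination < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * recombination)


def haldane_inverse(centimorgans):
    """Recombination fraction implied by a Haldane map distance in cM."""
    return 0.5 * (1.0 - math.exp(-centimorgans / 50.0))


for r in (0.05, 0.10, 0.20, 0.40):
    print(f"r = {r:.2f} -> {haldane_cm(r):.2f} cM")
```

For small values of $r$ the distance in cM is close to the percentage recombination, but it grows increasingly larger as $r$ approaches 0.5. Distances on this scale add along the chromosome, whereas the recombination fractions themselves do not.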

The standard error of these map distances can be calculated using the equation:

where $c05-math-0181$ is the recombination frequency, and $c05-math-0182$ is the number of plants observed in the three-way test cross.
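A minimal sketch of such a calculation is given below. It assumes the usual binomial standard error of a proportion, $\sqrt{r(1 - r)/n}$, which may differ in form from the equation intended above; the function name is illustrative.

```python
import math


def recombination_se(recombination, plants):
    """Binomial standard error of a recombination fraction from `plants` progeny."""
    return math.sqrt(recombination * (1.0 - recombination) / plants)


# The two-locus barley example: 20% recombination among 200 test cross progeny.
se = recombination_se(0.2, 200)
print(f"r = 0.20 +/- {se:.3f}")
```

Because the number of plants appears under the square root, quadrupling the number of plants observed halves the standard error.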

5.2.4 Pleiotropy

Very tight linkage between two loci can be confused with pleiotropy, the control of two or more characters by a single gene. For example, the linkage of resistance to the soybean cyst nematode and seed coat colour seems to be a case of pleiotropy because the two characters are always inherited together. The only way linkage and pleiotropy can be distinguished is effectively the negative way, that is, to find a crossover product such as a progeny homozygous for resistance and yellow seed coat. Resistance and yellow seed coat could never occur in a true breeding individual if true pleiotropy was present. Thousands of individuals may have to be grown to break a tight linkage and prove that it is not actually pleiotropy.

5.2.5 Epistasis

Different loci may be independent of each other in their segregation and recombination patterns, but independence of gene transmission, however, does not necessarily imply independence of gene action or expression. In fact, in terms of its final expression in the phenotype of the individual, no gene acts by itself.

A character can be controlled by genes that are inherited independently but that interact to form the final phenotype. The interaction of genes at different loci that affect the same character is called either non-allelic interaction or epistasis. Epistasis was originally used to describe two different genes that affect the same character, one that masks the expression of the other. The gene that masks the other is said to be epistatic to it. The gene that is masked was termed hypostatic. Epistasis causes deviations from the common phenotypic ratios in the $F_2$ such as 9:3:3:1, which indicates segregation of two independent genes, each with complete dominance.

To examine epistasis, consider the following simple examples. A white kernel wheat cultivar is crossed to one with coloured kernels. The $F_1$ progeny all have coloured kernels. This situation could easily be interpreted as indicating that coloured kernels (CC) are dominant to white (cc). However, when the $F_2$ progeny is examined, the ratio observed is 15 coloured kernels to 1 white kernel, a much higher proportion of coloured kernels than expected (3:1) if a single dominant gene is responsible. In fact, two genes control kernel colour in wheat (gene $A$ and gene $B$). Coloured wheat kernels are produced from a precursor product being acted on by an enzyme, or in this case either one of two enzymes, enzyme A or enzyme B. So when a homozygous wheat cultivar with coloured kernels (AABB) is crossed to one with white kernels (aabb), the $F_1$ (AaBb) produces both the A enzyme and the B enzyme, and hence all kernels are coloured. Now phenotypes from the $F_2$ progeny should have a ratio of $9\,A\_B\_ : 3\,A\_bb : 3\,aaB\_ : 1\,aabb$. However, coloured kernels can result from either the $A$-gene (enzyme A) or the $B$-gene (enzyme B), acting alone or together, so the first three types ($A\_B\_$, $A\_bb$ and $aaB\_$) all have coloured kernels and only the aabb types have white kernels. This is called duplicate dominant epistasis, where $A$ is epistatic to $B$ and bb, and $B$ is epistatic to $A$ and aa.

Consider now a cross between two sweet pea cultivars, both homozygous, one with coloured flowers (anthocyanin) and the other with white flowers (no anthocyanin). All $F_1$ plants have coloured flowers. Again this could be interpreted as a situation whereby flower colour is controlled by a single dominant gene (CC). However, when the $F_2$ progeny is examined the ratio observed is 9 coloured flowers: 7 white flowers, certainly not a simple 3:1 ratio, or indeed a 9:3:3:1 ratio, expected in a simple dominant case of one and two genes, respectively. Once again it is known that two genes control flower colour in sweet pea (gene $C$ and gene $P$). Anthocyanin production (and hence coloured flowers) in sweet pea requires two operations whereby a precursor is acted on by an enzyme C, produced by the presence of the dominant $C$-allele, to produce an intermediate product, which is then acted on by a different enzyme P, produced by the dominant $P$-allele. Therefore anthocyanin and coloured flowers only occur when both the $C$- and $P$-alleles are present. So coloured sweet pea flowers can only result from both the enzyme C ($C$-gene) and enzyme P ($P$-gene) produced together (i.e. 9/16 $C\_P\_$). From the other possible types: $C\_pp$ produces enzyme C, but not enzyme P; $ccP\_$ produces enzyme P but not enzyme C; while ccpp produces neither enzyme, and so all these types have white flowers. This is termed duplicate recessive epistasis whereby cc is epistatic to $P$, and pp is epistatic to $C$.

One final example relates to fruit colour in squash, where two loci are involved. At one locus, fruit colour is recessive to no colour, and this first locus must be homozygous for the recessive allele before the colour gene at the second locus is expressed. At the first locus, white fruit colour ($W$) is dominant over coloured fruit ($w$). At the second locus, yellow fruit colour ($G$) is dominant over green ($g$). When a heterozygous $WwGg$ plant is self-pollinated, three different fruit colours, white : yellow : green, are observed in a 12:3:1 ratio. All $W\_$ phenotypes (i.e. 9/16 $W\_G\_$ and 3/16 $W\_gg$) are white, because of dominant epistasis where $W$ is epistatic to $G$ and $g$. Phenotypes with $wwG\_$ (3/16) have yellow fruit, while wwgg genotypes (1/16) have green fruit.
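The three ratios described above (15:1, 9:7 and 12:3:1) can be checked by enumerating the 16 equally likely gamete unions from a selfed double heterozygote. The following is a minimal sketch, assuming two unlinked loci with complete dominance at each; the function name `f2_phenotype_ratio` and the generic locus labels A and B are illustrative choices, not notation from the text.

```python
from collections import Counter
from itertools import product


def f2_phenotype_ratio(phenotype_of):
    """Count F2 phenotypes (out of 16) from selfing a double heterozygote.

    phenotype_of(has_dom_1, has_dom_2) maps the presence of a dominant
    allele at locus 1 and at locus 2 to a phenotype label.
    """
    # Four equally frequent gametes: AB, Ab, aB, ab (loci assumed unlinked).
    gametes = [a + b for a, b in product("Aa", "Bb")]
    counts = Counter()
    for g1, g2 in product(gametes, repeat=2):
        has_dom_1 = "A" in (g1[0], g2[0])
        has_dom_2 = "B" in (g1[1], g2[1])
        counts[phenotype_of(has_dom_1, has_dom_2)] += 1
    return dict(counts)


# Wheat kernels: either dominant allele alone gives colour (duplicate dominant).
wheat = f2_phenotype_ratio(lambda a, b: "coloured" if (a or b) else "white")

# Sweet pea flowers: both dominant alleles are needed (duplicate recessive).
sweet_pea = f2_phenotype_ratio(lambda c, p: "coloured" if (c and p) else "white")

# Squash fruit: dominant W masks the second locus (dominant epistasis).
squash = f2_phenotype_ratio(
    lambda w, g: "white" if w else ("yellow" if g else "green")
)

print(wheat, sweet_pea, squash)
```

Only the phenotype rule changes between the three cases; the underlying 9:3:3:1 genotypic segregation is the same in each.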

All possible phenotypic ratios in the $F_2$ for two unlinked genes, as influenced by dominance at each locus and epistasis between loci, are shown in Table 5.4.

**Table 5.4** Phenotypic ratios of progeny in the $F_2$ generation for two unlinked genes (where $A$ is dominant to $a$, and $B$ is dominant to $b$), with epistasis between loci.

In some ways the interaction between two different loci, in which the allele at one locus affects the expression of the alleles at another, will remind you of the phenomenon of dominance. The two phenomena are essentially different, however. Dominance always refers to the expression of one member of a pair of alleles relative to the other at the same locus; epistasis is the term generally used to describe the effects of non-allelic genes (i.e. alleles at different loci) on each other's expression, in other words their interaction.

5.2.6 Qualitative inheritance in tetraploid species

Compared with crop species that are cultivated as pure inbred lines, comparatively little research has been carried out on inheritance in autotetraploid species. This is primarily because the major autotetraploid crop species are either clonally reproduced (e.g. potato) or are outbreeding species that suffer severe inbreeding depression (e.g. alfalfa). Many (or all) of these cultivars are highly heterozygous, and hence it is not as easy to carry out simple genetic experiments.

However, major gene inheritance in autotetraploids has been the topic of research and breeding groups with the aim of parental development. Consider, for example, the potato crop, where there are several single traits that are controlled by major genes such as resistance to Potato Virus X, Potato Virus Y, Potato Cyst Nematode (Globodera rostochiensis) and late blight (Phytophthora infestans). All these qualitative traits show complete dominance. The technique used to develop parents in a breeding programme is aimed at increasing the proportion of desirable offspring in sexual crosses, and in the extreme at avoiding the need to test breeding lines for the presence of the allele of interest. The technique is called multiplex breeding.

In tetraploid crops any genotype may be nulliplex (aaaa), having no copies of the desired allele at the specific $A$ locus; simplex (Aaaa), having only one copy of the desirable resistance allele ($A$); duplex (AAaa), having two copies of the allele; triplex (AAAa), having three copies of the allele; or quadruplex (AAAA), having four copies of the allele (i.e. homozygous at that locus). If the alleles show dominance, then genotypes that have at least one copy of the allele will, phenotypically, appear identical in terms of their resistance. But they will differ in their effectiveness as parents in a breeding programme. To determine the genotype of a clonal line, test crossing is necessary.

To illustrate the usefulness of multiplex breeding in potato, consider the problem of developing a parental line which, when crossed to any other line (irrespective of the genotype of the second parent), will give progeny all of which will be resistant to potato cyst nematode by having at least one copy of the $H_1$ gene (a qualitative resistance gene conferring resistance to all UK populations of the damaging nematode Globodera rostochiensis, and which has been shown to give relatively durable resistance).

The aim of multiplex breeding is therefore to develop parental lines that are either triplex or quadruplex (three or four copies of the desirable allele, respectively) for the $H_1$ allele. When crossed to any other parent, these multiplex lines will produce progeny that have at least a single copy of the $H_1$ gene and, due to dominance, all will be phenotypically resistant to G. rostochiensis nematodes. It will therefore not be necessary to screen for G. rostochiensis nematode resistance in these progeny. The ratios of resistant to susceptible amongst the progeny of genotypes derived by crossing a simplex, duplex, triplex and quadruplex to a nulliplex are shown in Table 5.5.

**Table 5.5** The ratios of resistant to susceptible progeny amongst the genotypes derived by crossing a simplex, duplex, triplex and quadruplex resistant gene parent to a nulliplex parent.

| Cross | Resistant : susceptible |
|---|---|
| Simplex (Aaaa) × nulliplex (aaaa) | 1 : 1 |
| Duplex (AAaa) × nulliplex (aaaa) | 5 : 1 |
| Triplex (AAAa) × nulliplex (aaaa) | 1 : 0 |
| Quadruplex (AAAA) × nulliplex (aaaa) | 1 : 0 |

Genotype ratios from all possible cross-combinations between nulliplex, simplex, duplex, triplex or quadruplex parents are given in Table 5.6.

**Table 5.6** Genotype ratios from all possible cross-combinations between nulliplex (N), simplex (S), duplex (D), triplex (T) or quadruplex (Q) parents.

| Cross | Nulliplex (N) | Simplex (S) | Duplex (D) | Triplex (T) | Quadruplex (Q) |
|---|---|---|---|---|---|
| Nulliplex (aaaa) | All N | | | | |
| Simplex (Aaaa) | 1S : 1N | 1D : 2S : 1N | | | |
| Duplex (AAaa) | 1D : 4S : 1N | 1T : 5D : 5S : 1N | 1Q : 8T : 18D : 8S : 1N | | |
| Triplex (AAAa) | 1D : 1S | 1T : 2D : 1S | 1Q : 5T : 5D : 1S | 1Q : 2T : 1D | |
| Quadruplex (AAAA) | All D | 1T : 1D | 1Q : 4T : 1D | 1Q : 1T | All Q |
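The ratios in Table 5.6 can be derived rather than memorized. The sketch below assumes simple chromosomal segregation in an autotetraploid (each gamete receives a random pair of the four homologous chromosomes, with no double reduction); that assumption, and the function names, are illustrative and not taken from the text.

```python
from collections import Counter
from fractions import Fraction
from functools import reduce
from itertools import combinations, product
from math import gcd

NAMES = {4: "Q", 3: "T", 2: "D", 1: "S", 0: "N"}


def gametes(genotype):
    """Count gametes by number of A alleles: every pair of the 4 chromosomes."""
    return Counter(pair.count("A") for pair in combinations(genotype, 2))


def cross(parent1, parent2):
    """Offspring counts keyed by dosage (copies of A) for parent1 x parent2."""
    offspring = Counter()
    for (d1, n1), (d2, n2) in product(gametes(parent1).items(),
                                      gametes(parent2).items()):
        offspring[d1 + d2] += n1 * n2
    return offspring


def ratio(offspring):
    """Reduce counts to the simplest whole-number ratio."""
    divisor = reduce(gcd, offspring.values())
    return {dosage: n // divisor for dosage, n in offspring.items()}


def describe(offspring):
    """Format a ratio in the style of Table 5.6, e.g. '1D : 4S : 1N'."""
    reduced = ratio(offspring)
    return " : ".join(f"{reduced[d]}{NAMES[d]}" for d in sorted(reduced, reverse=True))


def resistant_fraction(parent1, parent2):
    """Fraction of progeny carrying at least one dominant A allele."""
    offspring = cross(parent1, parent2)
    resistant = sum(n for dosage, n in offspring.items() if dosage >= 1)
    return Fraction(resistant, sum(offspring.values()))


print(describe(cross("AAaa", "AAaa")))      # duplex x duplex
print(resistant_fraction("AAaa", "aaaa"))   # duplex x nulliplex
```

Under the same assumption, `resistant_fraction` returns 1 whenever one parent is triplex or quadruplex, which is the property that multiplex breeding exploits.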

It was at first thought that developing multiplex parents would be very difficult; instead it proved simply to be a matter of effort and application. The main difficulty lies in the fact that progeny tests need to be carried out to test the genetic makeup of the dominant parental lines (very similar to backcrossing where the non-recurrent parent has a single recessive gene of interest). Consider one of the worst situations, where both starting parents are simplex. Three-quarters of the progeny will be resistant, but only a quarter will be duplex. So a quarter of the progeny will be nulliplex, hence susceptible, and so upon testing can be immediately discarded. The resistant progeny, however, need to be testcrossed to a nulliplex to distinguish the duplex from simplex genotypes. Once identified, the selected duplex lines are intercrossed or selfed. In their progeny, 1/36 will be quadruplex and 8/36 will be triplex. A single round test cross with a nulliplex will be necessary to distinguish, on the one hand, the triplex and quadruplex lines from, on the other, those that are either duplex or simplex. A second round test cross using the progeny of the first test cross will be necessary (i.e. a second backcross to a nulliplex line) in order to distinguish the quadruplex from triplex lines.

Once quadruplex lines have been identified, these can be continually intercrossed, or selfed, without further need to test. Similarly, quadruplex or triplex parental lines can be used in cross combination with any other parental lines and 100% of the resulting progeny will contain at least one of the dominant resistant alleles and show resistance. Multiplex breeding can be used in a similar way to develop parents in hybrid cross combinations.

5.2.7 The chi-square test

If plant breeders have an understanding of the inheritance of simply inherited characters, it is possible to predict the frequency of desired genes and genotypes in a breeding population. Breeders often ask questions relating to the nature of inheritance as well as the number of alleles or loci involved in the inheritance of a particular character of importance. For example, it is often valuable to determine whether a single gene is dominant, recessive or additive in inheritance. In the dominant/recessive case, the $F_2$ family will segregate in a 3:1 dominant : recessive ratio, while in the additive case it will segregate in a 1:2:1 homozygous dominant : heterozygous : homozygous recessive ratio.

In segregating families such as those noted above, the ratios actually observed will not be exactly as predicted, due to sampling error. For example, a coin is expected to fall as heads or tails in the ratio 1:1, but if tossed 10 times it does not always result in 5 heads and 5 tails. The breeder is therefore faced with interpreting the ratios that are observed in terms of what is expected, and it might be necessary to decide whether a particular ratio is really 1:2 or 1:3 or 3:4 or 9:7. How can this be done objectively? The answer is to use chi-square tests.

It should be clear that the significance of a given deviation is related to the size of the sample. If we expect a 1:1 ratio in a test involving six individuals, an observed ratio of 4:2 is not at all bad. But if the test involves 600 individuals, an observed ratio of 400:200 is clearly a long way off. Similarly, if we test 40 individuals, again expecting a 1:1 ratio, and find a deviation of 10 in each class, this deviation seems serious:

| | Class 1 | Class 2 |
|---|---|---|
| Observed | 30 | 10 |
| Expected | 20 | 20 |
| Deviation | +10 | −10 |

But if we test 200 individuals, the same numerical deviation seems reasonably enough explained as a purely chance effect:

| | Class 1 | Class 2 |
|---|---|---|
| Observed | 110 | 90 |
| Expected | 100 | 100 |
| Deviation | +10 | −10 |

The statistical test most commonly used when such a problem arises is simple in design and application. Each deviation is squared, and each squared deviation is then divided by the expected number in its class. The resulting quotients are then all added together to give a single value, called the chi-square ($\chi^2$). To substitute symbols for words, let $d$ represent the respective deviations (observed minus expected), and $e$ the corresponding expected values, then:

$$\chi^2 = \sum \frac{d^2}{e}$$

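We can calculate chi-square for the two arbitrary examples above to show how this value relates the magnitude of the deviation to the size of the sample. For the sample of 40 individuals (20 expected in each class):

$$\chi^2 = \frac{(10)^2}{20} + \frac{(-10)^2}{20} = 5 + 5 = 10$$

and for the sample of 200 individuals (100 expected in each class):

$$\chi^2 = \frac{(10)^2}{100} + \frac{(-10)^2}{100} = 1 + 1 = 2$$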

You might note that the value of chi-square is much larger for the smaller population, even though deviations in the two populations are numerically the same. In view of our earlier common-sense comparison of the two, this is a practical demonstration that the calculated value of chi-square is related to the significance of a deviation. It has the virtue of reducing many different samples, of different sizes and with different numerical deviations, to a common scale for comparison.
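The calculation is short enough to express directly in code. This is a minimal sketch, assuming a 1:1 expectation in both samples so that a deviation of 10 corresponds to observed counts of 30:10 and 110:90; the function name `chi_square` is an illustrative choice.

```python
def chi_square(observed, expected):
    """Sum of squared deviations, each divided by its expected count."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Same numerical deviation (10 per class), different sample sizes.
small_sample = chi_square(observed=[30, 10], expected=[20, 20])
large_sample = chi_square(observed=[110, 90], expected=[100, 100])

print(small_sample, large_sample)  # 10.0 2.0
```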

The chi-square test can also be applied to samples including more than two classes. For example, the table below shows the chi-square analysis of the barley plant height and row-number example given earlier. Suppose that a test cross was carried out where the heterozygous $F_1$ (TtSs) is crossed to a genotype with the recessive alleles at these loci. When this was actually done, and 400 test cross progeny were grown out, there were 112 tall and 6-row, 89 tall and 2-row, 93 short and 6-row, and 106 short and 2-row. Inspection of this would show that there is a higher proportion of each of the original parental types (tall/6-row and short/2-row) than the recombinant types. The question is, therefore: is this linkage, or simple random sampling variation? To determine this we would use the $\chi^2$ test.

| Phenotype | Observed | Expected | Deviation ($d$) | $d^2/e$ |
|---|---|---|---|---|
| Tall, 6-row | 112 | 100 | +12 | 1.44 |
| Tall, 2-row | 89 | 100 | −11 | 1.21 |
| Short, 6-row | 93 | 100 | −7 | 0.49 |
| Short, 2-row | 106 | 100 | +6 | 0.36 |
| Total | 400 | 400 | 0 | 3.50 |

The number of degrees of freedom in tests of genetic ratios is almost always one less than the number of genotypic classes. To be more precise, it is the number of observable data that are independent. For example, in a two-gene test cross there are four possible phenotypes, expected in equal frequency. So, in our example, 400 plants were observed and there were 112 in the first group, 89 in the second group and 93 in the third group; then by definition, to sum to 400, there must be 106 in the last group ($400 - 112 - 89 - 93 = 106$). Therefore our chi-square test would have 3 degrees of freedom. In just the same way, if we were testing two groups in tests of 1:1 or 3:1 ratios, they will have one degree of freedom.

Do not confuse assigning degrees of freedom to genetic frequencies with degrees of freedom in $2 \times 2$ contingency tables. In a two-way contingency table, with pre-assigned row and column totals, one value can be filled arbitrarily, but the others are then fixed by the fact that the total must add up to the precise number of observations involved in that row or column. When there are four classes, any three are usually free, but the fourth is fixed. Thus, when there are four classes, there are usually three degrees of freedom.

Remembering the example given earlier with 40 individuals, the calculated $\chi^2 = 10$ for one degree of freedom. We can look up this value in the probability tables for chi-square, and in this case chance alone would be expected in considerably less than one in a hundred independent trials to produce as large a deviation as that obtained. We cannot reasonably accept chance alone as being responsible for this particular deviation; it represents an event that would occur, on a chance basis, much less often than the one-time-in-20 that is conventionally taken as the point of rejection, and less often than even the one-time-in-100 that is regarded as highly significant.

In the case of the example we noted for barley, the expected frequency of each of the four phenotypes, if no linkage is present, would be 100 (total of 400, with four equal expectations), which would lead to deviations of 12, 11, 7 and 6; these squared, divided by the expected value and summed, $(12^2 + 11^2 + 7^2 + 6^2)/100$, give a $\chi^2$ value of 3.5. This value is compared with probability tables for $\chi^2$ with 3 degrees of freedom: it corresponds to a probability of about 0.3 (between the tabulated 50% and 30% values) and is well below the value of 7.81 required for significance at the accepted 5% probability level. We therefore say that there is no evidence that linkage exists, and that the observed deviations are likely to have occurred by random chance (sampling error).
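Where printed probability tables are not to hand, the tail probability can be computed directly. The sketch below uses closed-form expressions for the upper tail of the chi-square distribution that hold only for 1 and 3 degrees of freedom (the two cases used in this section); the function name is illustrative, and a statistics library would normally be used for the general case.

```python
import math


def chi_square_p_value(chi2, df):
    """Upper-tail probability of chi-square for df = 1 or df = 3 only."""
    tail = math.erfc(math.sqrt(chi2 / 2))
    if df == 1:
        return tail
    if df == 3:
        return tail + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
    raise ValueError("closed form implemented for df = 1 or 3 only")


# Barley test cross: four classes, 100 expected in each.
observed = [112, 89, 93, 106]
expected = [100, 100, 100, 100]
barley_chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
barley_p = chi_square_p_value(barley_chi2, df=3)

# The 40-individual example: chi-square of 10 with one degree of freedom.
small_sample_p = chi_square_p_value(10, df=1)

print(round(barley_chi2, 2), round(barley_p, 2), small_sample_p < 0.01)
```

The barley probability is far above 0.05, so the deviation is attributed to sampling error; the 40-individual example falls below 0.01.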

Improper use of chi-square

The two most important reservations regarding the straightforward use of the chi-square method in genetics are:

Chi-square can usually be applied only to numerical frequencies themselves, not to percentages or ratios derived from the frequencies. For example, if in an experiment one expects equal numbers in each of two classes, but observes 8 in one class and 12 in the other, we might express the observed numbers as 40% and 60%, and the expected as 50% in each class. A chi-square value computed from these percentages cannot be used directly for the determination of the probability. When the classes are large, a chi-square value computed from percentages can be used, if it is multiplied by $n/100$, where $n$ is the total number of individuals observed.
Chi-square cannot properly be applied to distributions in which the expected frequency of any class is less than 5. In fact, some statisticians suggest that a particular correction be applied if the frequency of any class is less than 50. However, the approximations involved in chi-squares are close enough for most practical purposes when there are more than 5 expected in each class.
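The first reservation can be illustrated with the 8:12 example. This is a small sketch with an illustrative helper `chi_square`; it shows that percentages inflate the statistic and that multiplying by $n/100$ recovers the value obtained from the counts.

```python
import math


def chi_square(observed, expected):
    """Sum of squared deviations, each divided by its expected value."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n = 20                                             # total individuals observed
from_counts = chi_square([8, 12], [10, 10])        # correct: uses frequencies
from_percent = chi_square([40, 60], [50, 50])      # misleading: uses percentages
rescaled = from_percent * n / 100                  # corrected by n/100

print(from_counts, from_percent, rescaled)
assert math.isclose(from_counts, rescaled)
```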

5.2.8 Family size necessary in qualitative genetic studies

It is usual in genetic work for the scope of the experiments to be limited by such considerations as lack of available space, labour or money. It is therefore essential to make the best use of available resources, which will be a function of the number of plants/plots that need to be raised. Achieving this usually requires considerable care and planning of experiments. Often statistical considerations can be of great value.

In many experiments it is desirable to be able to pick out certain genotypes, usually (but not always) homozygous, with the aim of developing superior cultivars. It may also be necessary to detect some particular nonconformity. For example, to detect any linkage effects will involve test crossing $F_1$ lines onto a recessive parent. It would be advantageous to keep the population size down to a minimum while also assuring with high probability that the experiment will be sufficiently large to detect the required differences.

Let us consider now the question of detecting homozygotes in a segregating population. Any progeny that fails to segregate could have derived from a homozygous parent or could fail to show segregation because of sampling error. The greater the numbers of individuals that are examined, the lower the probability that sampling error will interfere with the interpretation of experimental results. The minimum size of progeny designed to test a particular ratio is then a statistical question involving consideration of the probability that any individual in a family derived from a heterozygote will be of the recessive type, and also the permissible maximum probability of obtaining a misleading result.

Consider the following example. Suppose you wish to test a series of individuals phenotypically dominant for a single gene, in order to identify the homozygous individuals, by using a test cross to a homozygous recessive. The progeny of the homozygotes will not show segregation, whereas progeny from the heterozygotes will segregate in a 1:1 ratio (i.e. $1\,Aa : 1\,aa$). The important error that can arise will be from our failure to detect any segregants in the progenies from crosses that actually do have a heterozygous parent. Let us also assume that we do not want the test to fail with greater probability than once in 100 experiments or tests (i.e. a probability of 0.01). In the progeny of a heterozygote, each individual has a chance of 1/2 of containing a dominant allele. A family of $n$ will therefore consist entirely of individuals with the dominant allele, with no recessive segregants, with probability $(1/2)^n$. This is the misleading (error) result which must be avoided and must not occur at a higher frequency than 1 in 100. The minimum value of $n$ is then given by the solution of the equation:

$$\left(\frac{1}{2}\right)^n = 0.01$$

Taking natural logarithms this becomes:

$$n \ln(0.5) = \ln(0.01)$$

Therefore:

$$n = \frac{\ln(0.01)}{\ln(0.5)}$$

In our example, $n = 6.64$. Therefore the smallest family size needed would be seven or more, in order to be at least 99% certain of detecting the difference.

This formula can be generalized to:

$$n = \frac{\ln(1 - p)}{\ln(1 - x)}$$

where $n$ is the number that need to be evaluated, $p$ is the probability of having at least one type of interest (e.g. $p = 0.90$ for 90% certainty) and $x$ is the frequency of the desirable genotype/phenotype.
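The generalized relationship, $n = \ln(1 - p)/\ln(1 - x)$, is easy to tabulate for different levels of certainty. The sketch below rounds up to the next whole plant; the function name `min_family_size` and the second worked case are illustrative and do not come from the text.

```python
import math


def min_family_size(p, x):
    """Smallest family size giving probability p of at least one individual
    of the type of interest, where x is the frequency of that type."""
    if not (0 < p < 1 and 0 < x < 1):
        raise ValueError("p and x must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - p) / math.log(1 - x))


# Test cross of a heterozygote: recessive segregants expected at frequency 1/2.
test_cross_99 = min_family_size(p=0.99, x=0.5)

# Hypothetical second case: a recessive F2 class at frequency 1/4, 95% certainty.
f2_recessive_95 = min_family_size(p=0.95, x=0.25)

print(test_cross_99, f2_recessive_95)
```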

It is often necessary in plant breeding to estimate the number of individuals that need to be grown or screened in order to identify at least $r$ individuals (where $r$ is greater than one). It is never a good idea simply to multiply $n$ (from above), the number that needs to be grown to ensure at least one, by $r$, the number required, as this will result in a gross overestimation.

An alternative is to use an extension to the above equation. The mathematics behind this equation is outside the scope of this book. However, the equation itself can be useful. It is:

$$n = \frac{\left[2(r - 0.5) + z^2(1 - q)\right] + z\left[z^2(1 - q)^2 + 4(1 - q)(r - 0.5)\right]^{1/2}}{2q}$$

where $n$ is the total number of plants that need to be grown; $r$ is the number of plants with the required alleles that need to be recovered; $q$ is the frequency of plants with the desired alleles; $p$ is the probability of recovering the desired number of plants with the desired alleles; and $z$ is the value from the cumulative normal frequency distribution corresponding to probability $p$ (the standardized normal deviate below which a proportion $p$ of the distribution lies). For the sake of simplicity and to cover the most commonly used situations, $z = 1.645$ for $p = 0.95$ (i.e. 95% certainty) and $z = 2.326$ for $p = 0.99$ (i.e. 99% certainty).

For example, consider the frequency of homozygous recessive genotypes (aabb) resulting from the cross $AABB \times aabb$ at the $F_3$ stage. How many $F_3$ lines would need to be evaluated to be 95% certain of obtaining ten homozygous recessive genotypes? Here the probability of aabb is 9/64 (i.e. $3/8 \times 3/8$), so $q = 0.1406$, $r = 10$ and $z = 1.645$, therefore:

$$n = \frac{\left[2(10 - 0.5) + 1.645^2(1 - 0.1406)\right] + 1.645\left[1.645^2(1 - 0.1406)^2 + 4(1 - 0.1406)(10 - 0.5)\right]^{1/2}}{2 \times 0.1406} \approx 110.3$$

Therefore a minimum of 111 $F_3$ lines would need to be grown.
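The worked example can be reproduced numerically. The sketch below implements the normal-approximation formula with the symbols $r$, $q$ and $z$ as defined in the text, and adds an exact binomial calculation as an independent cross-check; the function names `plants_needed` and `prob_at_least` are illustrative assumptions.

```python
import math


def plants_needed(r, q, z):
    """Approximate number of plants to grow to recover at least r plants of a
    type with frequency q; z is the normal deviate for the chosen certainty."""
    first = 2 * (r - 0.5) + z ** 2 * (1 - q)
    second = z * math.sqrt(z ** 2 * (1 - q) ** 2 + 4 * (1 - q) * (r - 0.5))
    return (first + second) / (2 * q)


def prob_at_least(r, n, q):
    """Exact binomial probability of recovering at least r plants out of n."""
    fewer = sum(math.comb(n, k) * q ** k * (1 - q) ** (n - k) for k in range(r))
    return 1 - fewer


q_aabb = (3 / 8) ** 2                       # aabb frequency at F3 = 9/64
n_exact = plants_needed(r=10, q=q_aabb, z=1.645)
n_lines = math.ceil(n_exact)                # round up to whole lines

print(round(n_exact, 1), n_lines)
print(prob_at_least(10, n_lines, q_aabb))   # achieved certainty with n_lines
```

For comparison, multiplying the single-plant requirement by ten would call for roughly twice as many lines, which is the overestimation the text warns against.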

5.3 Quantitative genetics

5.3.1 The basis of continuous variation

With qualitative inheritance, the segregating individual phenotypes are usually easily distinguished and fall into a few phenotypic classes (i.e. tall or short; white or red; resistant or susceptible). Characters that are controlled by multiple genes do not fall into such simple classifications. Let us consider a hectare field of potatoes, planted with one cultivar, and so every single plant in the field should be of identical genotype (ignoring mutations and errors). In that field there are likely to be 11,000 plants. At harvest, you have been given the task of harvesting all 11,000 plants and weighing the tubers that come from each plant separately. Would you expect all the potato plants to have exactly the same weight of potatoes? Probably not. Indeed, the weights can be presented in the form of a histogram (Figure 5.1) where 11,000 yields were divided into 17 weight classes. The variation in weight is obvious; some plants produce less than 0.5 kg of tubers, while others produce over 5 kg. Most, however, are grouped around the average of 3.2 kg per plant.

**Figure 5.1** Yield of tubers from individual potato plants taken at random from a single clonally reproduced potato cultivar.

Yield in potato, as in other crops, is polygenically inherited: it is not controlled by a single gene but by many genes, all acting collectively. With so many genes and processes involved, there is more chance that some of them will be affected by differences in the ‘environment’. As the example above involved harvesting plants that are all clones of the same genotype, the variation observed is due entirely to the environmental conditions to which each plant was subjected.

One of the major differences between single gene inheritance and multiple gene inheritance is that the former is relatively less affected or influenced by environmental conditions than the latter, at least in terms of the differences being expressed. When potato yields are being recorded we are recording the phenotype, but in reality as breeders we are interested in the genotype. The two are related by the equation:

$$P = G + E + GE + \sigma_e^2$$

where P is the phenotype, G is the genotypic effect, E is the effect of the environment in which the genotype is grown, GE is the interaction between the genotype and the environment, and $\sigma_e^2$ is a random error term. Unfortunately, the fact that agronomic practices are part of the environment plants face is often neglected. Therefore, in order to reduce the overall environmental impact on the field expression of the genotype, it is essential to provide the most uniform agronomic management possible to breeding trials at all times.

As noted, single gene characters are often relatively less affected by environment, and so do not show much influence of $G \times E$ interactions (i.e. the difference in expression of the two alleles is large in comparison with the variation caused by environmental changes). For example, a potato genotype with white flowers (a qualitatively inherited trait) will always have white flowers in any environment in which the plants produce flowers. Conversely, quantitatively inherited traits are greatly influenced by environmental conditions, and $G \times E$ interactions are common and can be large. The greatest difficulty plant breeders face is dealing with quantitative traits, and in particular detecting the better genotypes based on their phenotypic performance. This is why it is critically important to employ appropriate experimental design techniques in genetic experiments and in plant breeding programmes.

A major part of quantitative genetics research related to plant breeding has been directed towards partitioning the variation that is observed (i.e. phenotypic variation) into its genetic and non-genetic portions. Once achieved, this can be taken further by dividing the genetic portion into that which is additive in nature and that which is non-additive (quite often dominance variation). Obviously, in breeding self-pollinating crops, the additive genetic variance is of primary importance since it is that portion of the variation attributable to homozygous gene combinations in the population and is what the breeder is aiming for (i.e. homozygous lines). On the other hand, variance due to dominance is related to the degree of heterozygosity in the population and will be reduced (to zero) over time with inbreeding as breeding lines move towards homozygosity, but will be important when it comes to out-breeding species.

Let us now return to the potato weights as one of many possible examples of continuous variation. If the frequency distribution of potato yields is inspected, there are two points to note: (1) the distribution is symmetrical (i.e. there are as many high-yielding plants as really low-yielding ones); and (2) the majority of potato yields were clustered around a weight in the middle of the distribution. As we have taken a class interval of 0.3 kg to produce this distribution, the figure does not look particularly continuous. However, we know that potato yields do not go up in increments of 0.3 kg but show a more continuous and gradual range of variation. If we use more class intervals in this example we will produce a smoother histogram, and if we use infinitely small class intervals it will result in a truly continuous bell-shaped curve. A curve of this shape is called a normal distribution. It occurs in a wide variety of aspects of plant growth, and is particularly important in quantitative genetics.

5.3.2 Describing continuous variation

The normal distribution

The 11,000 potato plant weights discussed above are a sample, albeit a large one, of possible potato weights from individual plants. It is possible to predict mathematically the frequency distribution for the population as a whole (i.e. every possible potato plant of that cultivar grown), provided it is assumed that the sample is representative of the population (i.e. that our sample is an unbiased sample of all that was possible).

It is not necessary to actually draw normal distributions (which, even with the aid of computer graphics, are difficult to do accurately). Most of the important properties of a normal distribution can be characterized by two statistics, the mean or average ($\mu$) of the distribution and the standard deviation ($\sigma$), a measure of the ‘spread’ of the distribution.

There are in fact two means, the mean of the sample and the mean of the population from which the sample was drawn. The latter is represented by the symbol $\mu$, and is, in reality, seldom known precisely. The sample mean is represented by $\bar{x}$ (spoken ‘x bar’), and it can be known with complete accuracy. Generally, the best estimate of a population mean ($\mu$) is the actual mean of an unbiased sample drawn from it ($\bar{x}$). The population mean is thus best estimated as:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $\sum_{i=1}^{n} x_i$ is the sum of all $x_i$ values from $i = 1$ to $i = n$.

The standard deviation is an ideal statistic to examine the variation that exists within a dataset. For any normal distribution, approximately 68% of the population sampled will be within one standard deviation either side of the mean, approximately 95% will be within two standard deviations (Figure 5.2), and approximately 99.7% will be within three standard deviations of the mean.
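These proportions follow from the normal distribution itself and can be checked with the error function in Python's standard library. The helper name `fraction_within` is an illustrative choice.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations
    either side of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(fraction_within(k), 3))
```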

**Figure 5.2** 95% of a population that is normally distributed will lie within two standard deviations of the population mean.

Once again, it is necessary to distinguish between the actual standard deviation of a population, all of whose members have been measured, and the estimated standard deviation of a population based on measuring a sample of individuals from it. The standard deviation of a population is represented by the symbol $\sigma$ (Greek letter sigma) and is defined thus:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

while the standard deviation of a sample is represented by the symbol $s$, and is given by:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Another measure of the spread of data around the mean is the variance, which is the square of the standard deviation. The estimated variance of a population (i.e. obtained from the sample) is given by:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

and the actual variance of an entire population (if every member of it has been measured) is given by:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

Calculators are often programmed to give means and either standard deviations or variances with a few key strokes once data have been entered. However, in case it is necessary to derive these descriptive statistics semi-manually, it is useful to know about alternative equations for $s$ and $s^2$:

$$s = \sqrt{\frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}} \qquad s^2 = \frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}$$

Although these look more complicated than those given previously, they are easier to use because the mean does not have to be worked out first (which would entail entering all the data into the calculator twice). Note the difference between $\sum x^2$ (each value of $x$ squared and then the squares totalled) and $\left(\sum x\right)^2$ (the values of $x$ totalled and then the sum squared).
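The single-pass form can be sketched as follows, accumulating only $n$, $\sum x$ and $\sum x^2$, and then compared against the standard library's two-pass `statistics.variance`. The data values are invented purely for illustration, and the function name `sample_stats` is an assumption.

```python
import math
import statistics


def sample_stats(values):
    """Mean, sample variance and sample standard deviation from running
    totals of x and x squared (the mean is not needed in advance)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return sum_x / n, variance, math.sqrt(variance)


# Hypothetical plant measurements, used only to exercise the formulas.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, variance, std_dev = sample_stats(data)

print(mean, variance, std_dev)
assert math.isclose(variance, statistics.variance(data))
```

With very large values and small spread, the subtraction in this form can lose precision in floating-point arithmetic, which is one reason library routines use the two-pass definition.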

Standard deviations, as measures of spread around the mean, are probably intuitively more understandable than variances; for example, 68% of the population fall within one standard deviation of the mean. Why introduce the complication of variances? Well, variances are additive in a way standard deviations are not. Thus, if the variances attributable to a variety of factors have been estimated, it is mathematically valid to sum them to estimate the variance due to all the factors acting together. Similarly, the reverse is also true, in that a total variance can be partitioned into the variances attributable to a variety of individual factors. These operations, which are used extensively in quantitative genetics, cannot be so readily performed with standard deviations.

Variation between datasets

Two basic procedures are frequently used in quantitative genetics to interpret the variation and relationship that exists between characters, or between one character evaluated in different environments. These are simple linear regression and correlation.

A straight-line regression can be adequately described by two estimates: the slope, or gradient of the line, (b) and the intercept on the $y$-axis (a). These are related by the equation:

$$y = a + bx$$

It can be seen that b is the gradient of the line, because a change of one unit on the $x$-axis results in a change of b units on the $y$-axis. If $x$ and $y$ both increase (or both decrease) together, the gradient is positive. If, however, $x$ increases while $y$ decreases or vice versa, then the gradient is negative. When $x = 0$, the equation for $y$ reduces to:

$$y = a$$

and a is therefore the point at which the regression line crosses the $y$-axis. This intercept value may be equal to, greater than or less than zero.

The formulation and theory behind regression analysis are not within the scope of this book and will not be described here. However, the gradient of the best-fitting straight line (also known as the regression coefficient) for a collection of points whose coordinates are $x$ and $y$ is estimated as:

$$b = \frac{SP(x,y)}{SS(x)}$$

where SP(x,y) is the sum of products of the deviations of $x$ and $y$ from their respective means ($\bar{x}$ and $\bar{y}$) and SS(x) is the sum of the squared deviations of $x$ from its mean. It will be useful to have an understanding of regression analysis and to remember the basic regression equations.

Now, SP(x,y) is given by the equation:

$$SP(x,y) = \sum (x - \bar{x})(y - \bar{y})$$

although in practice it is usually easier to calculate it using the equation:

$$SP(x,y) = \sum xy - \frac{\sum x \sum y}{n}$$

The comparable equations for SS(x) are:

$$SS(x) = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

Notice that a sum of squares is really a special case of a sum of products. You should also note that if each $x$ value is exactly equal to its corresponding $y$ value, then the equation used to estimate b becomes $b = SS(x)/SS(x) = 1$.

Having determined b, the intercept value is found by substituting the mean values of $x$ and $y$ into the rearranged equation:

$$a = \bar{y} - b\bar{x}$$

In regression analysis it is always assumed that one character is the dependent variable and the other is the independent variable. For example, it is common to compare parental expression with progeny expression (see Chapter 6), in which case progeny expression would be considered the dependent variable and parental expression the independent variable. The expression of progeny is obviously dependent upon the expression of their parents, and not vice versa.
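The regression calculations described above can be sketched in a few lines. The function names and the parent and progeny values are hypothetical, used only to illustrate the arithmetic; the sums of products and squares are computed from totals rather than from deviations.

```python
def sum_of_products(xs, ys):
    """SP(x,y) from totals: sum of x*y minus (sum of x)(sum of y)/n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def sum_of_squares(xs):
    """SS(x) is the special case of a sum of products of x with itself."""
    return sum_of_products(xs, xs)


def fit_line(xs, ys):
    """Return (a, b): the intercept and slope of the regression of y on x."""
    b = sum_of_products(xs, ys) / sum_of_squares(xs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    a = mean_y - b * mean_x
    return a, b


parent = [1, 2, 3, 4, 5]     # independent variable (hypothetical values)
progeny = [2, 4, 5, 4, 5]    # dependent variable (hypothetical values)
a, b = fit_line(parent, progeny)
print(f"b = {b:.2f}, a = {a:.2f}")  # b = 0.60, a = 2.20
```

Swapping the two arguments gives a different line, which is why the choice of dependent and independent variable matters in regression.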

In addition, the degree of association between two or more characters can be examined statistically by the use of correlation analysis. Correlation analysis is similar in many ways to simple regression, but in correlations there is no need to assign one set of values to be the dependent variable while the other is said to be the independent variable. Correlation coefficients ($r$) are calculated from the equation:

$$r = \frac{SP(x,y)}{\sqrt{SS(x) \times SS(y)}}$$

where SP(x,y) is again the sum of products between the two variables, SS(x) is the sum of squares of one variable ($x$) and SS(y) is the sum of squares of the second variable ($y$), and:

$$SS(x) = \sum (x - \bar{x})^2 \quad \text{and} \quad SS(y) = \sum (y - \bar{y})^2$$

These can, of course, sometimes be calculated more easily by:

$$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n} \quad \text{and} \quad SS(y) = \sum y^2 - \frac{(\sum y)^2}{n}$$

Correlation coefficients ($r$) range in value from $-1$ to $+1$. Values of $r$ approaching $+1$ show very good positive association between two sets of data (i.e. high values for one variable are consistently associated with high values of the other). In this case, we say that the two variables are positively correlated. Values of $r$ that are near to $-1$ show strong negative association between two sets of data (i.e. a high value for one variable is consistently associated with a low value of the other). In this case we say that the two variables are negatively correlated. Values of $r$ that are near to zero indicate that there is no linear association between the variables. In this case a high value for one variable can be associated with a high, medium or low value of the other.
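As a minimal sketch, the correlation coefficient can be computed from the same sums of products and squares used for regression. The three small datasets are hypothetical and were chosen to give a positive, a perfectly negative and a zero correlation.

```python
import math


def sum_of_products(xs, ys):
    """SP(x,y) from totals; SS(x) is obtained as sum_of_products(xs, xs)."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def correlation(xs, ys):
    """r = SP(x,y) / sqrt(SS(x) * SS(y)); symmetric in x and y."""
    sp = sum_of_products(xs, ys)
    return sp / math.sqrt(sum_of_products(xs, xs) * sum_of_products(ys, ys))


x = [1, 2, 3, 4, 5]
print(round(correlation(x, [2, 4, 5, 4, 5]), 3))   # 0.775  (positive association)
print(round(correlation(x, [10, 8, 6, 4, 2]), 3))  # -1.0   (perfect negative association)
print(round(correlation(x, [1, 2, 3, 2, 1]), 3))   # 0.0    (no linear association)
```

The last dataset rises and then falls, so the two variables are clearly related, yet $r$ is zero: the coefficient measures only straight-line association.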

5.3.3 Relating quantitative genetics and the normal distribution

Consider two homozygous canola (Brassica napus) cultivars ($P_1$ and $P_2$). The yield potential of $P_1$ is 620 kg/plot, which is higher than that of $P_2$, which has a yield potential of 500 kg/plot. When these two cultivars were crossed and the $F_1$ produced, the yield of the $F_1$ progeny was exactly midway between the two parents (560 kg/plot). This would suggest that additive genetic effects rather than dominance were present.

If yield in canola were controlled by a single locus and two alleles (which it is not), we would have:

$$P_1\,(AA) \times P_2\,(aa) \rightarrow F_1\,(Aa)$$

It should be noted here that upper and lower case letters denoting alleles do not signify dominance as in qualitative inheritance, but rather simply differentiate between alleles. It is common to assign uppercase letters to alleles from the parent with the greater expression of the trait, which is designated $P_1$.

When there is only one locus with two alleles involved, we would assume that the uppercase alleles each add 60 kg/plot to the base performance of a plant, and lowercase alleles add nothing. In this case the base performance is equal to that of $P_2$ (500 kg/plot). Therefore, $P_1 = AA = 500 + 60 + 60 = 620$ kg/plot, $P_2 = aa = 500 + 0 + 0 = 500$ kg/plot, and the $F_1 = Aa = 500 + 60 + 0 = 560$ kg/plot. In the $F_2$ we have a ratio of 1 AA : 2 Aa : 1 aa, and we would have three types of plants in the population: AA (620 kg/plot), Aa (560 kg/plot) and aa (500 kg/plot).

Obviously, yield in canola is not controlled by two alleles at a single locus. However, let us progress gradually and assume that two loci each with two alleles are involved. We now have:

$$P_1\,(AABB) \times P_2\,(aabb) \rightarrow F_1\,(AaBb)$$

In this case (assuming alleles at different loci have equal effects), each uppercase allele would add 30 kg/plot to the base weight. The $F_1 = AaBb = 500 + 30 + 30 = 560$ kg/plot (the same as if only one gene was involved). However, in the $F_2$ we have 16 possible allele combinations that can be grouped according to the number of uppercase alleles (or yield potential).

		AAbb
		AaBb
	aabB	aABb	AABb
	aaBb	AabB	AAbB
	aAbb	aAbB	AaBB
aabb	Aabb	aaBB	aABB	AABB
500	530	560	590	620
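The grouping of $F_2$ allele combinations into yield classes can be reproduced by brute-force enumeration. The sketch below assumes, as in the text, equal additive effects at every locus and independent segregation; the function name and its parameters are hypothetical.

```python
from collections import Counter
from itertools import product


def f2_yield_classes(n_loci, low_parent=500.0, high_parent=620.0):
    """Count F2 allele combinations in each yield class (kg/plot).

    Each of the 2 * n_loci allele positions is uppercase (1) or lowercase (0)
    with equal probability, giving 4 ** n_loci equally likely combinations.
    """
    effect = (high_parent - low_parent) / (2 * n_loci)  # gain per uppercase allele
    counts = Counter()
    for alleles in product((0, 1), repeat=2 * n_loci):
        counts[low_parent + effect * sum(alleles)] += 1
    return dict(sorted(counts.items()))


print(f2_yield_classes(1))  # {500.0: 1, 560.0: 2, 620.0: 1}
print(f2_yield_classes(2))  # {500.0: 1, 530.0: 4, 560.0: 6, 590.0: 4, 620.0: 1}
```

Calling `f2_yield_classes(3)` enumerates 64 combinations in the same way, with each uppercase allele worth 20 kg/plot.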

Extending in the same manner one more time, we see that the frequency distribution of the phenotypic classes in the $F_2$ generation, when three genes having equal additive effects segregate independently, is:

			aaBbCC
			aAbBcC
			aAbBCc
			aAbbCC
			AabbCC
		aabbCC	AabBcC	AABBcc
		aabBcC	AabBCc	AABbCc
		aabBCc	AaBbcC	AABbcC
		aaBbcC	AaBbCc	AAbBCc
		aaBbCc	AaBBcc	AAbBcC
		aaBBcc	aabBCC	AAbbCC
		aAbbcC	aaBBcC	AaBBCc
		aAbbCc	aaBBCc	AaBBcC
		aAbBcc	aABbcC	AaBbCC
	aabbcC	aABbcc	aABbCc	AabBCC	AABBcC
	aabbCc	AabbcC	aABBcc	aABBCc	AABBCc
	aabBcc	AabbCc	AAbbcC	aABBcC	AAbBCC
	aaBbcc	AabBcc	AAbbCc	aABbCC	AABbCC
	aAbbcc	AaBbcc	AAbBcc	aAbBCC	aABBCC
aabbcc	Aabbcc	AAbbcc	AABbcc	aaBBCC	AaBBCC	AABBCC
500	520	540	560	580	600	620

In this case each uppercase allele adds only 20 kg/plot to the base weight of 500 kg/plot. This is determined in the same way as for the one- and two-gene models, although it is considerably more involved. You should note once more that the $F_1$ would have a yield potential of 560 kg/plot, exactly the same as in the single- and two-gene cases.

Even with only three loci and two alleles at each locus, it should be obvious that we are moving closer to a shape resembling a normal distribution. When four, five and six loci are considered, the frequency distribution has 9, 11 and 13 phenotypic classes, respectively (that is, $2n + 1$ classes for $n$ loci). It is fairly easy to visualize, therefore, that with only a modest number of loci, with segregating alleles acting in a more or less equal additive way, truly continuous variation in a character would be approximated quite closely. Quantitative inheritance deals with many loci and alleles, often too many to consider trying to estimate the number, and therefore explains the ubiquity of the normal distribution. Just as the mean and variance can describe the normal distribution, many of the important elements of the inheritance of a character can be described and explained using progeny means and genetic variances.
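For larger numbers of loci, listing every genotype quickly becomes impractical, but the class frequencies follow directly from binomial coefficients. This is a sketch under the same assumptions as the text (two alleles per locus, equal additive effects, independent segregation); the function name is hypothetical.

```python
from math import comb


def f2_class_counts(n_loci):
    """Number of F2 allele combinations carrying 0, 1, ..., 2n uppercase alleles."""
    n_alleles = 2 * n_loci
    return [comb(n_alleles, k) for k in range(n_alleles + 1)]


for n_loci in range(1, 7):
    counts = f2_class_counts(n_loci)
    # loci, number of phenotypic classes, total combinations (4 ** n_loci)
    print(n_loci, len(counts), sum(counts))

print(f2_class_counts(3))  # [1, 6, 15, 20, 15, 6, 1]
```

The number of phenotypic classes is always $2n + 1$ for $n$ loci, and the counts become progressively more bell-shaped as $n$ increases.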

Phenotype of parents	$F_1$ progeny	No. in $F_2$ progeny	$F_2$ ratio, dominant : recessive by phenotype
Round $\times$ wrinkled, seed	round	5,474 r : 1,850 w	2.96 : 1
Yellow $\times$ green, cotyledon	yellow	6,022 y : 2,001 g	3.01 : 1
Long $\times$ short, stem	long	787 lo : 277 sh	2.84 : 1
Axial $\times$ terminal, inflorescence	axial	651 ax : 207 ter	3.14 : 1

Parents	$c05-math-0028$	$c05-math-0029$
$c05-math-0030$	Tall	6-row

Parents	$c05-math-0033$	$c05-math-0034$
$c05-math-0035$	Tt	Ss
$c05-math-0036$	TT:Tt:tt	SS:Ss:ss
	1:2:1	1:2:1

$F_4$	7/16 TT	:	2/16 Tt	:	7/16 tt
$F_5$	15/32 TT	:	2/32 Tt	:	15/32 tt
$F_6$	31/64 TT	:	2/64 Tt	:	31/64 tt
$\vdots$
$F_\infty$	1/2 TT	:	0 Tt	:	1/2 tt

Tall and resistant (TtRr)	$c05-math-0094$	79
Short and resistant (ttRr)	$c05-math-0095$	18
Tall and susceptible (Ttrr)	$c05-math-0096$	22
Short and susceptible (ttrr)	$c05-math-0097$	81
Total	$c05-math-0098$	200

	$c05-math-0073$ parents
$c05-math-0074$		TTSS	TTSs	TTss	TtSS	TtSs	Ttss	ttSS	ttSs	ttss	Total
		4/64	8/64	4/64	8/64	16/64	8/64	4/64	8/64	4/64
TTSS		4	2	–	2	1	–	–	–	–	9
TTSs		–	4	–	–	2	–	–	–	–	6
TTss		–	2	4	–	1	2	–	–	–	9
TtSS		–	–	–	4	2	–	–	–	–	6
TtSs		–	–	–	–	4	–	–	–	–	4
Ttss		–	–	–	–	2	4	–	–	–	6
ttSS		–	–	–	2	1	–	4	2	–	9
ttSs		–	–	–	–	2	–	–	4	–	6
ttss		–	–	–	–	1	2	–	2	4	9
											64

Gametes	$c05-math-0078$	$c05-math-0079$	$c05-math-0080$
$c05-math-0081$	1/16 TTSS	2/16 TtSS	1/16 ttSS
$c05-math-0082$	2/16 TTSs	4/16 TtSs	2/16 ttSs
$c05-math-0083$	1/16 TTss	2/16 Ttss	1/16 ttss

Gametes from	Gametes from male parent
female parent	$c05-math-0102$	$c05-math-0103$	$c05-math-0104$	$c05-math-0105$
$c05-math-0106$	TTRR	TTRr	TtRR	TtRr
	0.16	0.04	0.04	0.16
$c05-math-0107$	TTRr	TTrr	TtRr	Ttrr
	0.04	0.01	0.01	0.04
$c05-math-0108$	TtRR	TtRr	ttRR	ttRr
	0.04	0.01	0.01	0.04
$c05-math-0109$	TtRr	Ttrr	ttRr	ttrr
	0.16	0.04	0.04	0.16

Independent assortment	Coupling linkage	Repulsion linkage
(1/4) TR	$c05-math-0116$	$c05-math-0117$
(1/4) Tr	$c05-math-0118$	$c05-math-0119$
(1/4) tR	$c05-math-0120$	$c05-math-0121$
(1/4) tr	$c05-math-0122$	$c05-math-0123$

Phenotype	Observed	Expected
$c05-math-0127$	500	149.375
aabbcc	510	149.375
$c05-math-0128$	50	149.375
$c05-math-0129$	55	149.375
$c05-math-0130$	35	149.375
$c05-math-0131$	38	149.375
$c05-math-0132$	4	149.375
$c05-math-0133$	3	149.375
Total	1195

Non-recombinant types	$c05-math-0161$	500	$c05-math-0162$
	aabb	510	$c05-math-0163$
Single recombinant types	$c05-math-0164$	$c05-math-0165$	$c05-math-0166$
	$c05-math-0167$	$c05-math-0168$	$c05-math-0169$
$c05-math-0170$ crossover recombinants		$c05-math-0171$	$c05-math-0172$

$c05-math-0233$ phenotype				Explanation
$c05-math-0234$	$c05-math-0235$	$c05-math-0236$	aabb
9	3	3	1	No epistasis.
9	3	4	0	Recessive epistasis: aa epistatic to $c05-math-0237$ and bb.
12	0	3	1	Dominant epistasis: $c05-math-0238$ epistatic to $c05-math-0239$ , or bb.
13	0	3	0	Dominant and recessive epistasis: $c05-math-0240$ epistatic to $c05-math-0241$ and bb; bb epistatic to $c05-math-0242$ and aa. $c05-math-0243$ and bb produce identical phenotypes.
9	0	7	0	Duplicate recessive epistasis: aa epistatic to $c05-math-0244$ , and bb; and bb epistatic to $c05-math-0245$ and aa.
15	0	0	1	Duplicate dominant epistasis: $c05-math-0246$ epistatic to $c05-math-0247$ and bb; $c05-math-0248$ epistatic to $c05-math-0249$ and aa.

Cross type	Phenotype Resistant : Susceptible	% resistant in progeny
$c05-math-0253$	1 : 1	50
$c05-math-0254$	5 : 1	83
$c05-math-0255$	1 : 0	100
$c05-math-0256$	1 : 0	100

observed	30	10
expected	20	20
obs−exp (difference)	10	$-10$

	Sample of 40		Sample of 200
	individuals		individuals
Observed (obs)	30	10	90	110
Expected (exp)	20	20	100	100
obs−exp $c05-math-0266$	10	10	10	10
$(\text{obs}-\text{exp})^2$	100	100	100	100
$(\text{obs}-\text{exp})^2/\text{exp}$	5	5	1	1
$\chi^2$	10		2

	Tall,	Tall,	Short,	Short,
	6-row	2-row	6-row	2-row
Observed (obs)	112	89	93	106
Expected (exp)	100	100	100	100
obs−exp $c05-math-0273$	12	11	7	6
$(\text{obs}-\text{exp})^2$	144	121	49	36
$(\text{obs}-\text{exp})^2/\text{exp}$	1.44	1.21	0.49	0.36
$\chi^2 = 3.50$ n.s.
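The chi-square values in the tables above can be reproduced with a short sketch. The function names are hypothetical, and the degrees of freedom are assumed to be the number of classes minus one, which holds when the expected ratio is fixed in advance. The critical values are the standard tabulated 5% points.

```python
def chi_square(observed, expected):
    """Sum of (obs - exp)^2 / exp over all phenotypic classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Standard tabulated 5% critical values for 1 to 3 degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815}


def is_significant(observed, expected):
    """Assumes degrees of freedom = number of classes - 1."""
    df = len(observed) - 1
    return chi_square(observed, expected) > CRITICAL_5_PERCENT[df]


print(chi_square([30, 10], [20, 20]))                               # 10.0
print(chi_square([90, 110], [100, 100]))                            # 2.0
print(round(chi_square([112, 89, 93, 106], [100, 100, 100, 100]), 2))  # 3.5
print(is_significant([112, 89, 93, 106], [100, 100, 100, 100]))     # False
```

The two-class examples show that the same absolute deviation of 10 from expectation is significant in a sample of 40 but not in a sample of 200.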