Chapter 5
Genetics and Plant Breeding

5.1 Introduction

Early plant breeders, basically farmers, did not have any knowledge of the inheritance of characters in which they were interested. The only knowledge they possessed was that the most productive offspring tended to originate from the most productive plants, and that the better flavour types tended to be derived from parent plants which were, themselves, of better flavour. None the less, the achievements of these breeders were remarkable and should never be underestimated. They moulded most of the crops as we recognize them today from their wild and weedy ancestral types.

In modern plant breeding schemes it is recognized, however, that it is very much more effective and efficient (or indeed essential) to have a basic knowledge of the inheritance or genetics of traits for which selection is to be carried out.

There are generally five different areas of genetics that have been applied to plant breeding:

  • Qualitative genetics, where inheritance is controlled by alleles at a single locus, or at very few loci.
  • Population genetics, which deals with the behaviour (or frequency) of alleles in populations and the conditions under which they remain in equilibrium or change, thus allowing predictions to be made about the properties and changes expected in populations.
  • Quantitative genetics, for traits where the variation is determined by alleles at more than a few loci; traits that are said to be controlled by polygenic systems. Quantitative genetics is concerned with describing the variation present in terms of statistical parameters such as progeny means, variances and covariances.
  • Cytogenetics, the study of the behaviour and properties of chromosomes, being the structural units that carry the genes that govern expression of all the traits.
  • Molecular genetics, where studies are carried out at the molecular level. Molecular techniques have been developed to investigate and handle both qualitative and quantitative characters. Although the details of molecular genetics are generally outside the scope of this book, the impact of molecular genetic techniques is increasing, important and relevant.

5.2 Qualitative genetics

Few sciences have as clear-cut a beginning as modern genetics. As mentioned previously, early plant breeders were aware of some associations between parent plant and offspring, and at various times in history researchers had carried out experiments to study such associations. However, experimental genetics with real meaning began in the middle of the 19th century with the work of Gregor Johann Mendel, and was only fully appreciated after the turn of the 20th century.

Mendel's definitive experiments were carried out in a monastery garden on pea lines. The flowers of pea plants are so constructed as to favour self-pollination, and as a result the majority of lines used by Mendel were either homozygous or near-homozygous genotypes. Mendel's choice of peas as an experimental plant species offered a tremendous advantage over many other plant species he might have chosen. The differences in characters he chose were also fortuitous. Therefore many present-day scientists have argued that Mendel had a great deal of luck associated with his findings because of the choices he made over what to study. When this is combined with segregation ratios that are better than might be expected by chance, many have concluded that he must ‘have already foreseen the results he expected to obtain’.

Before considering an example of one of Mendel's experiments, there are a few general points to be made about his experiments. Others, before Mendel, had made controlled hybridizations or crosses within various species. So why did Mendel's crosses, rather than those of earlier workers, provide the basis for the modern science of genetics? First and foremost, Mendel had a brilliant analytical mind that enabled him to interpret his results in ways that defined the principles of heredity. Secondly, Mendel was a proficient experimentalist. He knew how to carry out experiments in such a way as to maximize the chances of obtaining meaningful results. He knew how to simplify data in a meaningful way. As parents in his crosses, he chose individuals that differed by sharply contrasting characters (now known to be controlled by single genes). Finally, as noted above, he used true breeding (homozygous) lines as parents in the crosses he studied.

Among other elements in Mendel's success were the simple, logical sequence of crosses that he made and the careful numerical counts of his progeny that he recorded with reference to the easily definable characteristics on whose inheritance he focused his attention. It should be noted that many of the features listed as reasons for Mendel's success are very similar to the criteria necessary to carry out a successful plant breeding programme (including the ‘luck element’).

Let us consider some of Mendel's experiments as an introduction to qualitative inheritance. These results are presented in Table 5.1. When Mendel crossed plants from a round-seeded line with plants from a wrinkled-seeded line, all of the first generation ($F_1$) had round seeds. The characteristic of only one of the parental types was therefore represented in the progeny. In the next generation ($F_2$), achieved by selfing the $F_1$, both round-seed and wrinkled-seed were found in the progeny. Mendel's count of the two types in the $F_2$ was 5,474 round-seed and 1,850 wrinkled-seed, a ratio of 2.96:1.

Table 5.1 Results from some of Mendel's crossing experiments with peas.

| Phenotype of parents | $F_1$ progeny | No. in $F_2$ progeny | $F_2$ ratio, dominant : recessive by phenotype |
|---|---|---|---|
| Round × wrinkled seed | Round | 5,474 r : 1,850 w | 2.96 : 1 |
| Yellow × green cotyledon | Yellow | 6,022 y : 2,001 g | 3.01 : 1 |
| Long × short stem | Long | 787 lo : 277 sh | 2.84 : 1 |
| Axial × terminal inflorescence | Axial | 651 ax : 207 ter | 3.14 : 1 |

Mendel found it notable that the same general result occurred when he made crosses between plants from lines differing for other characters. Another example is when he crossed peas with yellow cotyledons with ones with green cotyledons; the $F_1$s all had yellow cotyledons, while he found a ratio of 3.01 yellows to 1 green in the $F_2$. Almost identical results were obtained when long-stemmed plants were crossed with short-stemmed plants, and when plants with axial inflorescence were crossed with plants with terminal inflorescence.

How did Mendel interpret his generalized findings? One of the keys to his solution was his recognition that in the $F_1$ the hereditary basis for the character that fails to be expressed was not lost. The expression of this character appears again in the $F_2$ generation. Recognizing the idea of dominance and recessiveness in heterozygous genotypes and the particulate nature of the heritable factors was the overwhelming genius of Gregor Mendel. This laid the foundation of genetics and hence the explanation underlying the most important features of qualitative genetics.

5.2.1 Genotype/phenotype relationships

Within genetic studies there are two interrelated points. The first is concerned with the actual genetic makeup of individual plants or segregating populations and is referred to as the genotype. The second is related to what is actually expressed or observed in individual plants or segregating populations, and is termed the phenotype.

In the absence of any environmental variation (which can often be assumed with qualitative, single, major gene inheritance), the most frequent cause of difference between genotype and phenotype is due to dominance effects. For example, consider a single locus and two alleles (A and a). If two diploid homozygous lines are crossed where one has the genotype AA and the other aa, then the $F_1$ will be heterozygous at this locus (i.e. Aa). If this resembles exactly the AA parent, then allele A is said to be dominant over allele a. On selfing the $F_1$, genotypes will occur in the ratio 1 AA : 2 Aa : 1 aa. If, however, the allele A is completely dominant to a, the phenotype of the $F_2$ will be 3 A_ : 1 aa. Therefore, complete dominance occurs when the $F_1$ shows exactly the same phenotype as one of the two parents in the cross.

Compare these ratios to those observed by Mendel (Table 5.1). You will notice that Mendel's data do not fit exactly to the 3:1 ratio that would be expected, given that each trait is controlled by a single completely dominant gene. It should, however, be remembered that gamete segregation in meiosis and pairing in fertilization are random events. Therefore it is highly unlikely, because of sampling variation, that exact ratios are found in such experiments. Indeed, applying statistical considerations, many modern researchers have claimed that Mendel's data appear to be ‘too good a fit’ for purely random events. Whether these geneticists are correct or not does not, however, detract in any way from the remarkable achievements of Mendel.
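The comparison with a 3:1 expectation can be made concrete with a chi-square goodness-of-fit test. The following is a minimal sketch in Python, using the $F_2$ counts from Table 5.1; the function name and the use of the 5% critical value for one degree of freedom (3.841) are choices made here for illustration, not part of Mendel's analysis.

```python
# Chi-square goodness-of-fit of the F2 counts in Table 5.1 to a 3:1 ratio.
# The 5% critical value for 1 degree of freedom is an illustrative threshold.
CRITICAL_5PCT_1DF = 3.841

mendel_f2 = {
    "seed shape (round : wrinkled)": (5474, 1850),
    "cotyledon colour (yellow : green)": (6022, 2001),
    "stem length (long : short)": (787, 277),
    "inflorescence (axial : terminal)": (651, 207),
}


def chi_square_3_to_1(dominant, recessive):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = dominant + recessive
    expected = (0.75 * total, 0.25 * total)
    observed = (dominant, recessive)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


results = {trait: chi_square_3_to_1(*counts) for trait, counts in mendel_f2.items()}

for trait, chi2 in results.items():
    verdict = "consistent with 3:1" if chi2 < CRITICAL_5PCT_1DF else "deviates from 3:1"
    print(f"{trait}: chi-square = {chi2:.3f} ({verdict})")
```

All four statistics fall well below the critical value, which is the sense in which the observed ratios are compatible with (and, some argue, suspiciously close to) 3:1.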

5.2.2 Segregation of qualitative genes in diploid species

To illustrate segregation of qualitative genes, and indeed many following concepts, consider a series of simple examples. Firstly, consider two pairs of single genes in spring barley (Hordeum vulgare L.). Say that dwarf barley plants differ in plant height from tall types at the T-locus, with tall types given the genetic constitution TT, and dwarf lines tt. Barley lines with 6-row ears differ from lines with 2-row ears, where the lateral florets do not set, at the S-locus, with 6-row SS types being dominant over 2-row ss types. Let us consider the case where two homozygous lines are artificially hybridized, where one parent is homozygous tall and 6-row and the other is dwarf and 2-row. First consider the phenotypes expected for each trait.

| Generation | Plant height | Ear type |
|---|---|---|
| Parents | Tall × Dwarf | 6-row × 2-row |
| $F_1$ | Tall | 6-row |

When the tall $F_1$ is backcrossed to the dwarf parent, a ratio of 1 tall : 1 dwarf is obtained in the resulting progeny. Similarly, when the six-row $F_1$ is crossed with a homozygous two-row, a ratio of 1 six-row : 1 two-row is obtained. Evidently, the difference between tall and dwarf behaves as a single major gene with the tall allele showing complete dominance to the dwarf allele; and the difference between six-row and two-row is also a single gene with six-row showing complete dominance to two-row. In terms of segregating alleles (i.e. genotypes), the above example would be:

| Generation | Plant height | Ear type |
|---|---|---|
| Parents | TT × tt | SS × ss |
| $F_1$ | Tt | Ss |
| $F_2$ | TT : Tt : tt = 1 : 2 : 1 | SS : Ss : ss = 1 : 2 : 1 |

Where TT and Tt, and similarly SS and Ss, have the same phenotype, we get the 3:1 phenotypic segregation ratio. If each allele exerts equal effect (additive, and no dominance) then we would have three expected phenotypes in the Mendelian ratio of one tall, two intermediate and one short, according to the 1 TT : 2 Tt : 1 tt, or one 6-row, two intermediate and one 2-row.

Now assume that a large number of the $F_2$ plants were allowed to set selfed seed: what would be the ratio of phenotypes and genotypes in the $F_3$ generation? To answer this, consider that only the heterozygous $F_2$ plants will segregate (i.e. TT and tt types are now homozygous for that trait and so will breed true for that character). Of course the heterozygous $F_2$ has the same genotype as the $F_1$, and so it is no surprise that it segregates in the same ratio. At $F_3$ we therefore have 1/4 TT : 1/2 (1/4 TT : 2/4 Tt : 1/4 tt) : 1/4 tt, which results in 3/8 TT : 2/8 Tt : 3/8 tt. Similar results of 3/8 SS : 2/8 Ss : 3/8 ss would be obtained for 6-row versus 2-row. Applying these simple rules and expanding this to later generations, we get:

$F_4$: 7/16 TT : 2/16 Tt : 7/16 tt
$F_5$: 15/32 TT : 2/32 Tt : 15/32 tt
$F_6$: 31/64 TT : 2/64 Tt : 31/64 tt
⋮
$F_\infty$: 1/2 TT : 0 Tt : 1/2 tt

Note that each successive generation of selfing reduces the proportion of heterozygotes by half (i.e. 1/2 Tt at $F_2$, 2/8 Tt at $F_3$, 2/16 Tt at $F_4$ and 2/32 Tt at $F_5$, etc.).
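The halving of heterozygosity with each generation of selfing can be sketched in a few lines of Python. This is an illustrative calculation only; the function names are hypothetical, and exact fractions are used so that the output can be compared with the series above (note that Python reduces fractions, so 2/8 is printed as 1/4).

```python
from fractions import Fraction


def self_generation(freqs):
    """Apply one generation of selfing to (TT, Tt, tt) frequencies.

    Homozygotes breed true; each heterozygote segregates 1/4 : 1/2 : 1/4.
    """
    hom_dom, het, hom_rec = freqs
    quarter = Fraction(1, 4)
    return (hom_dom + quarter * het, het / 2, hom_rec + quarter * het)


def selfing_series(last_generation):
    """Genotype frequencies from F1 up to the requested generation."""
    freqs = (Fraction(0), Fraction(1), Fraction(0))  # the F1 is all Tt
    series = {1: freqs}
    for generation in range(2, last_generation + 1):
        freqs = self_generation(freqs)
        series[generation] = freqs
    return series


series = selfing_series(6)
for generation, (tt_dom, het, tt_rec) in series.items():
    print(f"F{generation}: {tt_dom} TT : {het} Tt : {tt_rec} tt")
```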

Let us now consider how these two different pairs of alleles behave with respect to each other in inheritance. One way to study this is to cross individuals that differ in both characteristics:

Tall, 6-row (TTSS) × Dwarf, 2-row (ttss)

When this cross is made, the $F_1$ shows both dominant characteristics, tall and 6-row. Plant breeders' interests in genetics mainly relate to selection, and as no selection takes place at the $F_1$ stage, the interest begins when the self-pollinated progeny of the $F_1$ is considered. In this case the $F_1$ individuals are assumed to produce equal frequencies of four kinds of gametes during meiosis (TS, Ts, tS and ts). An easy way to illustrate the possible combinations in the $F_2$ progeny is to use a Punnett square, where the four gamete types from the male parent are listed in a row along the top, and the four kinds from the female parent are listed in a column down the left-hand side. The 16 possible genotype combinations are then obtained by filling in the square, that is:

| Gametes from female parent \ Gametes from male parent | TS | Ts | tS | ts |
|---|---|---|---|---|
| TS | TSTS | TSTs | TStS | TSts |
| Ts | TsTS | TsTs | TstS | Tsts |
| tS | tSTS | tSTs | tStS | tSts |
| ts | tsTS | tsTs | tstS | tsts |

This also can be written, putting the allele representations for the same locus together, as:

| Gametes from female parent \ Gametes from male parent | TS | Ts | tS | ts |
|---|---|---|---|---|
| TS | TTSS | TTSs | TtSS | TtSs |
| Ts | TTSs | TTss | TtSs | Ttss |
| tS | TtSS | TtSs | ttSS | ttSs |
| ts | TtSs | Ttss | ttSs | ttss |

Collecting the like genotypes we get the following frequency of genotypes:

1/16 TTSS; 2/16 TTSs; 1/16 TTss
2/16 TtSS; 4/16 TtSs; 2/16 Ttss
1/16 ttSS; 2/16 ttSs; 1/16 ttss

and frequency of phenotypes:

9 T_S_ (tall, 6-row) : 3 T_ss (tall, 2-row) : 3 ttS_ (dwarf, 6-row) : 1 ttss (dwarf, 2-row)

As with the single gene case above, if the alleles do not show dominance then there would indeed be nine different phenotypes in the ratio:

1 TTSS : 2 TTSs : 1 TTss : 2 TtSS : 4 TtSs : 2 Ttss : 1 ttSS : 2 ttSs : 1 ttss
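The Punnett square for the two-locus cross can also be enumerated programmatically. The sketch below, with hypothetical helper names, combines the four gamete types from each parent, collects like genotypes, and then groups them into phenotypes under the complete dominance assumed in this example.

```python
from collections import Counter
from itertools import product

GAMETES = ["TS", "Ts", "tS", "ts"]  # produced in equal frequency by the TtSs F1


def genotype(gamete_1, gamete_2):
    """Combine two gametes, writing the alleles of each locus together."""
    # sorted() places the upper-case (dominant) allele first, e.g. 'tT' -> 'Tt'
    height_locus = "".join(sorted(gamete_1[0] + gamete_2[0]))
    ear_locus = "".join(sorted(gamete_1[1] + gamete_2[1]))
    return height_locus + ear_locus


def phenotype(geno):
    """Phenotype under complete dominance of T (tall) and S (6-row)."""
    height = "tall" if "T" in geno else "dwarf"
    ear = "6-row" if "S" in geno else "2-row"
    return f"{height}, {ear}"


genotype_counts = Counter(
    genotype(female, male) for female, male in product(GAMETES, repeat=2)
)

phenotype_counts = Counter()
for geno, count in genotype_counts.items():
    phenotype_counts[phenotype(geno)] += count

print(dict(genotype_counts))   # counts out of 16 for the nine genotypes
print(dict(phenotype_counts))  # the 9 : 3 : 3 : 1 phenotypic classes
```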

In most cases when developing inbred cultivars, plant breeders carry out selection based on plant phenotype amongst early generation segregating populations (i.e. $F_2$ and $F_3$). It is obvious that 75% of the $F_2$ plants will be heterozygous at one or both loci, and dominance effects can mask the actual genotypes that are to be selected. Single $F_2$ plant selections for the recessive traits (short stature and 2-row) allow for identification of the desired genotype (ttss) but only 1/16th of $F_2$ plants will be of this type, while the recessive expression of the trait in most plants is hidden due to dominance.

The effects of heterozygosity on selection can be reduced through successive rounds of self-pollination. Plant breeders, therefore, do not only select for single gene characters at the $F_2$ stage. Consider now what would happen if a sample of $F_2$ plants from the above example were selfed: what would be the resulting segregation expected at the $F_3$ stage?

There are nine different genotypes at the $F_2$ stage, TTSS, TTSs, TTss, TtSS, TtSs, Ttss, ttSS, ttSs and ttss, and they occur in the ratio 1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1, respectively. Obviously if any genotype is homozygous at a locus then these plants will not segregate at that locus. For example, plants with the genetic constitution of TTSS will always produce plants with the TTSS genotype. $F_2$ plants with a genotype of TTSs will not segregate for the Tt locus and will only segregate for the Ss locus. Similarly, $F_2$ plants with a genotype of TtSs will segregate for both loci, with the same segregation frequencies as the $F_1$.

From this we have:

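Columns are the $F_2$ parent genotypes (with their frequencies); rows are the $F_3$ genotypes they produce; entries are in 64ths.

| $F_3$ genotype | TTSS (4/64) | TTSs (8/64) | TTss (4/64) | TtSS (8/64) | TtSs (16/64) | Ttss (8/64) | ttSS (4/64) | ttSs (8/64) | ttss (4/64) | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| TTSS | 4 | 2 |   | 2 | 1 |   |   |   |   | 9 |
| TTSs |   | 4 |   |   | 2 |   |   |   |   | 6 |
| TTss |   | 2 | 4 |   | 1 | 2 |   |   |   | 9 |
| TtSS |   |   |   | 4 | 2 |   |   |   |   | 6 |
| TtSs |   |   |   |   | 4 |   |   |   |   | 4 |
| Ttss |   |   |   |   | 2 | 4 |   |   |   | 6 |
| ttSS |   |   |   | 2 | 1 |   | 4 | 2 |   | 9 |
| ttSs |   |   |   |   | 2 |   |   | 4 |   | 6 |
| ttss |   |   |   |   | 1 | 2 |   | 2 | 4 | 9 |
| Total |   |   |   |   |   |   |   |   |   | 64 |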

Summing over rows, this results in a genotypic segregation ratio of 9/64 TTSS : 6/64 TTSs : 9/64 TTss : 6/64 TtSS : 4/64 TtSs : 6/64 Ttss : 9/64 ttSS : 6/64 ttSs : 9/64 ttss.

The phenotypic expectation at the $F_3$ would therefore be:

25 T_S_ : 15 T_ss : 15 ttS_ : 9 ttss

This result could be obtained in a simpler manner by considering the segregation ratio of each trait separately and using these to form a Punnett square. For example, the frequency of homozygous tall (TT), heterozygous tall (Tt) and homozygous short (tt) at the $F_2$ is 1:2:1, respectively. Similarly, the frequency of SS, Ss and ss is also 1:2:1. We can use these frequencies to construct a Punnett square such as:

| Gametes | 1/4 TT | 2/4 Tt | 1/4 tt |
|---|---|---|---|
| 1/4 SS | 1/16 TTSS | 2/16 TtSS | 1/16 ttSS |
| 2/4 Ss | 2/16 TTSs | 4/16 TtSs | 2/16 ttSs |
| 1/4 ss | 1/16 TTss | 2/16 Ttss | 1/16 ttss |

This would give the same genotypic and phenotypic frequencies as shown above. It is, however, necessary to be familiar with the more direct method for situations where segregation is not independent (i.e. in cases of linkage).

It should be noted, as in the single gene case previously described, that increased selfing results in increased homozygosity. Therefore with increased generations of selfing there is an increase in the frequency of expression of recessive traits. If, in this case, a breeder wished to retain the 6-row short plant type, then it would be expected that 3/16 (approximately 19%) of all $F_2$ plants will be of this type (ttS_), and only 1/3 of these would at this stage be homozygous for both traits. If selection were delayed until the $F_3$ generation, then 15/64 (or just over 23%) of the population would be of the desired type, and now 60% of the selections would be homozygous for both genes. Obviously, if selection is delayed until after an infinite number of generations of selfing, then the frequency of the desired types would be 25% (and all homozygous).
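These frequencies follow from multiplying the single-locus frequencies at each generation, which holds only when the two loci segregate independently. A minimal sketch, with hypothetical function names, that reproduces the $F_2$ and $F_3$ figures and shows the approach to 25%:

```python
from fractions import Fraction


def single_locus(generation):
    """(dominant homozygote, heterozygote, recessive homozygote) at F_generation.

    Heterozygosity is 1/2 at F2 and halves with every further selfing.
    """
    het = Fraction(1, 2 ** (generation - 1))
    hom = (1 - het) / 2
    return hom, het, hom


def six_row_short(generation):
    """Frequency of the 6-row short phenotype (ttS_) and the proportion of
    those plants that are already true breeding (ttSS)."""
    ss_dom, ss_het, _ = single_locus(generation)  # SS and Ss at the S-locus
    _, _, tt = single_locus(generation)           # tt at the T-locus
    desired = (ss_dom + ss_het) * tt
    true_breeding = ss_dom * tt
    return desired, true_breeding / desired


for generation in (2, 3, 6, 12):
    desired, homozygous_share = six_row_short(generation)
    print(f"F{generation}: {float(desired):.3f} 6-row short, "
          f"{float(homozygous_share):.2f} of them homozygous")
```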

5.2.3 Qualitative loci linkage

The principle of independent assortment of alleles at different loci is one of the cornerstones on which an understanding of qualitative genetics is based. Independent assortment of alleles does not, however, always occur. When certain different allelic pairs are involved in crosses, deviation from independent assortment regularly occurs if the loci are located on the same arm of the same chromosome. This will mean that there will be a tendency of parental combinations to remain together, which is expressed in the relative frequency of new combinations, and is the phenomenon of linkage.

It is often desirable to have knowledge of linkage in plant breeding to help predict segregation patterns in various generations and to help in selection decisions. All genetic linkages can be broken by successive generations of sexual reproduction, although it may take many rounds of recombination before tight linkages are broken. In general, however, if two traits of interest are adversely linked (i.e. the desired combination appears with lower than expected frequency), then increased opportunities for recombination need to be given before selection takes place.

Selfing $F_1$ individuals (say AaBb) that produce equal numbers of gametes of the four possible genotypic combinations (AB, Ab, aB, ab) leads to an $F_2$ phenotypic ratio of 9 A_B_ : 3 A_bb : 3 aaB_ : 1 aabb. Similarly, we find the genotypic ratio of 1 AaBb : 1 Aabb : 1 aaBb : 1 aabb on test-crossing the $F_1$ to a complete recessive (aabb). Any significant deviation from these expected ratios is an indication of linkage between the two gene loci.

Consider a second simple example of recombination frequencies derived from a test cross, how the test cross can be used to estimate recombination ratios, and hence ascertain linkage between characters. Consider again the dwarfing gene in spring barley, but this time with a third single gene that confers resistance to barley powdery mildew. Mildew-resistant genotypes are designated as RR and susceptible genotypes as rr. A cross is made between two barley genotypes where one parent is homozygous tall and mildew resistant (i.e. TTRR); and the other is short and mildew susceptible (i.e. ttrr). We would expect the $F_1$ to be tall and resistant (i.e. TtRr). When the heterozygous $F_1$ is crossed with a completely recessive genotype (i.e. ttrr) we would expect to have a 1:1:1:1 ratio of phenotypes, tall and resistant: tall and susceptible: short and resistant: short and susceptible. When this cross was carried out and the progeny from the test cross was examined, the following frequencies of phenotypes and genotypes were the ones actually observed.

Tall and resistant (TtRr) = 79
Short and resistant (ttRr) = 18
Tall and susceptible (Ttrr) = 22
Short and susceptible (ttrr) = 81
Total = 200

Linkage is obviously indicated by these results, as the observed frequencies are markedly different from those expected at 50:50:50:50, on the basis of independent assortment. The two phenotypes, tall and resistant and short and susceptible, occur at a considerably higher frequency (79 and 81, respectively) than we expected. You will readily note that these are the same phenotypes as the two parents. The other two phenotypes (the non-parental, or recombinant, types) were observed with much lower frequency (18 and 22, respectively). Added together, the two recombinant types (short and resistant, and tall and susceptible) only account for 40 (20%) out of the 200 test-cross progeny examined, while we expected their contribution to collectively make up 50% of the progeny. Therefore the $F_1$ plants are producing TR gametes or tr gametes in 80% of meiotic events, or a frequency of 0.4 for each type. Similarly, recombinant Tr or tR gametes are produced in only 20% of meiotic events, or a frequency of 0.1 for each type. Knowing the frequency of the four gamete types, we can now proceed to predict the frequency of genotypes and phenotypes we would expect in the $F_2$ generation using a Punnett square.

| Gametes from female parent \ Gametes from male parent | TR (0.4) | Tr (0.1) | tR (0.1) | tr (0.4) |
|---|---|---|---|---|
| TR (0.4) | TTRR 0.16 | TTRr 0.04 | TtRR 0.04 | TtRr 0.16 |
| Tr (0.1) | TTRr 0.04 | TTrr 0.01 | TtRr 0.01 | Ttrr 0.04 |
| tR (0.1) | TtRR 0.04 | TtRr 0.01 | ttRR 0.01 | ttRr 0.04 |
| tr (0.4) | TtRr 0.16 | Ttrr 0.04 | ttRr 0.04 | ttrr 0.16 |

Collecting like phenotypes together results in:

0.66 T_R_ : 0.09 T_rr : 0.09 ttR_ : 0.16 ttrr

From a breeding standpoint, compare these phenotypic ratios to those that would have been expected with no linkage (i.e. 9/16 T_R_ : 3/16 T_rr : 3/16 ttR_ : 1/16 ttrr). If, as might be expected, the aim was to identify individual plants which were short and resistant, then the actual occurrence of these types would drop from 19% to 9%, that is, by more than half. It should also be noted that 20% recombination is not particularly high.
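The effect of linkage on the $F_2$ can be explored for any recombination frequency by weighting the Punnett square with the gamete frequencies. The sketch below assumes a coupling-phase $F_1$ (TR/tr), as in this example; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import product


def f2_phenotypes(recombination):
    """F2 phenotype frequencies from selfing a coupling-phase F1 (TR/tr).

    `recombination` is the recombination frequency as a proportion
    (0.5 corresponds to independent assortment).
    """
    parental = (1 - recombination) / 2
    recombinant = recombination / 2
    gametes = {"TR": parental, "tr": parental, "Tr": recombinant, "tR": recombinant}

    frequencies = defaultdict(float)
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2):
        height = "tall" if "T" in g1[0] + g2[0] else "short"
        mildew = "resistant" if "R" in g1[1] + g2[1] else "susceptible"
        frequencies[(height, mildew)] += p1 * p2
    return dict(frequencies)


linked = f2_phenotypes(0.2)    # 20% recombination, as estimated above
unlinked = f2_phenotypes(0.5)  # independent assortment

for phenotype_class in linked:
    print(phenotype_class, round(linked[phenotype_class], 4),
          round(unlinked[phenotype_class], 4))
```

With 20% recombination the short, resistant class falls from 3/16 to 0.09, matching the comparison made in the text.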

Overall, therefore, genes on the same chromosome (particularly on the same chromosome arm) are linked and do not segregate independently as they would when they are located on different chromosomes. Two heterozygous loci may be linked in coupling or repulsion. Coupling is present when desirable alleles at two loci are present together on the same chromosome and the unfavourable alleles are on another (e.g. TR/tr). Repulsion is present when a desirable allele at one locus is on the same chromosome as an unfavourable allele at another locus (e.g. Tr/tR).

In the test cross TtRr × ttrr the situations shown in Table 5.2 can arise.

Table 5.2 Expected frequency of genotypes resulting from a test cross where TtRr is crossed to ttrr with independent assortment, and with coupling and repulsion linkage.

| Independent assortment | Coupling linkage | Repulsion linkage |
|---|---|---|
| (1/4) TR | (> 1/4) TR | (< 1/4) TR |
| (1/4) Tr | (< 1/4) Tr | (> 1/4) Tr |
| (1/4) tR | (< 1/4) tR | (> 1/4) tR |
| (1/4) tr | (> 1/4) tr | (< 1/4) tr |

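The percentage recombination, expressed as the number of map units between loci, can be calculated from the equation:

$$\text{Recombination (\%)} = \frac{\text{number of recombinant progeny}}{\text{total number of progeny}} \times 100$$

Depending on what the initial parents were, this will be estimated as:

$$\frac{Tr + tR}{TR + Tr + tR + tr} \times 100 \quad \text{(coupling)}$$

or

$$\frac{TR + tr}{TR + Tr + tR + tr} \times 100 \quad \text{(repulsion)}$$

where TR, Tr, tR and tr denote the numbers of test cross progeny derived from each $F_1$ gamete type.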
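Applied to the barley test cross described earlier (79, 18, 22 and 81 progeny), the calculation can be sketched as follows. The function names are hypothetical, and the chi-square test against equal class frequencies is added here as one common way of judging whether the deviation from 1:1:1:1 is significant (5% critical value for three degrees of freedom: 7.815).

```python
def recombination_percent(counts, parental_classes):
    """Percentage of test cross progeny that are recombinant (non-parental)."""
    total = sum(counts.values())
    recombinants = sum(
        number for cls, number in counts.items() if cls not in parental_classes
    )
    return 100 * recombinants / total


def chi_square_equal_classes(counts):
    """Chi-square statistic against equal expected numbers in every class."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((number - expected) ** 2 / expected for number in counts.values())


# Observed test cross progeny; the F1 was in coupling (TR/tr), so the
# parental classes are TtRr and ttrr.
test_cross = {"TtRr": 79, "ttRr": 18, "Ttrr": 22, "ttrr": 81}

percent = recombination_percent(test_cross, parental_classes={"TtRr", "ttrr"})
chi2 = chi_square_equal_classes(test_cross)

print(f"Recombination: {percent:.1f}% (map units)")
print(f"Chi-square against 1:1:1:1 = {chi2:.1f} (critical value 7.815, 3 d.f.)")
```

For a repulsion-phase $F_1$ the same function applies with the parental classes set to Ttrr and ttRr.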

Linkage can also be detected, but less efficiently, by selfing the $F_1$ and observing the segregation ratio of the $F_2$ to determine whether there is any deviation from the expectation, based on independent assortment (e.g. 1 TTRR : 2 TTRr : 1 TTrr : 2 TtRR : 4 TtRr : 2 Ttrr : 1 ttRR : 2 ttRr : 1 ttrr).

The relative positions of three loci can be mapped by considering the frequency of progeny from a three gene test cross. To examine three gene test crosses, consider the following cross between two parents where one parent has the genotype AABBCC and the other has the genotype aabbcc. The $F_1$ would have the genotype AaBbCc. In order to perform the test cross it is necessary to cross the $F_1$ family with a completely recessive genotype (i.e. aabbcc, in this case the recessive parent) and observe the frequency of the eight possible phenotypes. In our example the frequencies are as shown in Table 5.3.

Table 5.3 Observed and expected genotypes obtained from backcrossing the $F_1$ progeny from a cross between two parents where one parent has the genotype AABBCC and the other has the genotype aabbcc, to the recessive (aabbcc) parent.

Phenotype Observed Expected
AaBbCc 500 149.375
aabbcc 510 149.375
c05-math-0128 50 149.375
c05-math-0129 55 149.375
c05-math-0130 35 149.375
c05-math-0131 38 149.375
c05-math-0132 4 149.375
c05-math-0133 3 149.375
Total 1195

If all three loci segregated independently, for example if they were on different chromosomes, we would expect the phenotype frequencies of the eight genotypes to be the same at 149.375 (or the total number of observations divided by the expected number of phenotype classes, 1195/8). Obviously, the observed frequencies are very different from those expected.

The first point to note is that the frequency of the parental genotypes is considerably higher than expected, while all other, recombinant, types are less than expected.

Next, it is necessary to know the relative order in which the loci appear on the chromosome. Such ordering or arrangement of loci along a chromosome is known as a genetic map. The middle locus can usually be determined from the phenotype that is observed at the lowest frequency. In this example the lowest observed phenotypes are AaBbcc and aabbCc, both of which are effectively parental types at the A and B loci but involve non-parental combinations with C. As recombination of only the centre gene will involve a double crossover event, the frequency of occurrence will be lowest. Conversely, the pair with most phenotypic observed classes for recombinants (not counting double crossovers) involves the outside genes. From this, the C-locus would appear to be the middle one (i.e. it requires a recombination event between the A- and C-locus, plus recombination between the C- and B-locus).

Consider the possible combinations of alleles and the frequency at which they occur. For the c05-math-0147 loci we have:

equation

So percentage of c05-math-0149.

For c05-math-0150 loci we have:

equation

So percentage of c05-math-0152.

For the c05-math-0153 loci we have:

equation

So percentage of c05-math-0155.

We obtain the map distances as being equivalent to the percentage recombinants and so we have:

equation

From the map it is clear that the map distance A to B (178 recombinants out of 1195, or 14.9 map units) is less than the added distance A–C plus C–B. The method of calculation used to estimate these distances assumes that only the non-parental types (i.e. recombinants) are included as the results of recombination, as would be the case where the linkage between two loci is considered. Where three loci are involved it is necessary to include the double crossover recombination events in estimating the distance between the furthest two loci. In this case we could calculate the A–B distance from:

Non-recombinant types c05-math-0161 500 c05-math-0162
aabb 510 c05-math-0163
Single recombinant types c05-math-0164 c05-math-0165 c05-math-0166
c05-math-0167 c05-math-0168 c05-math-0169
c05-math-0170 crossover recombinants c05-math-0171 c05-math-0172

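Therefore, counting the four single-crossover classes once (50 + 55 + 35 + 38 = 178) and the double-crossover classes twice (2 × (4 + 3) = 14), the percentage of recombinants is (178 + 14)/1195 × 100 = 16.07%, giving an estimated genetic distance between A–B of 16.07 map units, which is now the same distance as estimated by simply adding A–C to C–B.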

This indicates a general flaw in linkage map distance estimation, in that where there is no ‘centre gene’ locus, it is impossible to detect double recombination events, and as a result map distances will always provide underestimates of actual recombination frequencies.
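The three-locus arithmetic can be sketched in Python. One caution: the observed totals below are those of Table 5.3, but the assignment of the 50/55 and 35/38 counts to particular single-crossover classes (and of 4 and 3 to the two double-crossover classes) is an assumption made here for illustration. The A–B figures do not depend on that assignment; the separate A–C and C–B values do. Classes are labelled by the alleles contributed by the $F_1$ gamete, in the order A, B, C.

```python
# Test cross progeny labelled by the F1 gamete (alleles at A, B, C).
# ASSUMPTION: which single-crossover class carries which count is illustrative.
progeny = {
    "ABC": 500, "abc": 510,  # parental types
    "Abc": 50, "aBC": 55,    # assumed: single crossover between A and C
    "AbC": 35, "aBc": 38,    # assumed: single crossover between C and B
    "ABc": 4, "abC": 3,      # double crossovers (only the centre locus C differs)
}
LOCUS_INDEX = {"A": 0, "B": 1, "C": 2}
total = sum(progeny.values())


def pairwise_recombination(locus_1, locus_2):
    """Percentage of progeny that are non-parental for the two named loci."""
    i, j = LOCUS_INDEX[locus_1], LOCUS_INDEX[locus_2]
    recombinants = sum(
        count for gamete, count in progeny.items()
        if gamete[i].isupper() != gamete[j].isupper()
    )
    return 100 * recombinants / total


a_b = pairwise_recombination("A", "B")  # misses the double crossovers
a_c = pairwise_recombination("A", "C")
c_b = pairwise_recombination("C", "B")

print(f"A-B (two-point estimate): {a_b:.2f}")
print(f"A-C + C-B: {a_c + c_b:.2f}")
```

The two-point A–B estimate (14.90) falls short of the summed intervals (16.07) by exactly twice the double-crossover frequency, which is the underestimation described above.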

To avoid such discrepancies, recombination frequencies, which are not additive, are usually converted to a cM scale using the function published by J.B.S. Haldane in 1919, being:

$$d = -50 \ln(1 - 2r)$$

where $r$ is the recombination frequency (expressed as a proportion) and $d$ is the map distance in cM. From this we see that the map distances transform to:

equation

The standard error of these map distances can be calculated using the equation:

$$SE = \sqrt{\frac{r(1 - r)}{n}}$$

where $r$ is the recombination frequency, and $n$ is the number of plants observed in the three-way test cross.
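As a worked illustration, the sketch below applies Haldane's mapping function in its usual form, d = -50 ln(1 - 2r) cM, and assumes the binomial standard error sqrt(r(1 - r)/n) for a recombination frequency estimated from n test cross progeny. Both forms, the symbols r, n and d, and the function names are assumptions of this sketch. The input is the two-point A–B estimate from the three-locus example (178 recombinants among 1195 plants).

```python
import math


def haldane_cm(r):
    """Haldane map distance in cM for a recombination proportion r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination proportion must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def recombination_se(r, n):
    """Binomial standard error of a recombination proportion from n progeny."""
    return math.sqrt(r * (1.0 - r) / n)


n_plants = 1195
r_ab = 178 / n_plants  # two-point A-B recombination proportion

print(f"r = {r_ab:.4f}")
print(f"Haldane distance = {haldane_cm(r_ab):.2f} cM")
print(f"SE of r = {recombination_se(r_ab, n_plants):.4f}")
```

The transformed distance (about 17.7 cM) exceeds the raw 14.9%, reflecting the correction for undetected double crossovers.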

5.2.4 Pleiotropy

Very tight linkage between two loci can be confused with pleiotropy, the control of two or more characters by a single gene. For example, the linkage of resistance to the soybean cyst nematode and seed coat colour seems to be a case of pleiotropy because the two characters are always inherited together. The only way linkage and pleiotropy can be distinguished is effectively the negative way, that is, to find a crossover product such as a progeny homozygous for resistance and yellow seed coat. Resistance and yellow seed coat could never occur in a true breeding individual if true pleiotropy was present. Thousands of individuals may have to be grown to break a tight linkage and prove that it is not actually pleiotropy.

5.2.5 Epistasis

Different loci may be independent of each other in their segregation and recombination patterns, but independence of gene transmission does not necessarily imply independence of gene action or expression. In fact, in terms of its final expression in the phenotype of the individual, no gene acts by itself.

A character can be controlled by genes that are inherited independently but that interact to form the final phenotype. The interaction of genes at different loci that affect the same character is called either non-allelic interaction or epistasis. Epistasis was originally used to describe two different genes that affect the same character, one that masks the expression of the other. The gene that masks the other is said to be epistatic to it. The gene that is masked was termed hypostatic. Epistasis causes deviations from the common phenotypic ratios in $F_2$ such as 9:3:3:1, which indicates segregation of two independent genes, each with complete dominance.

To examine epistasis, consider the following simple examples. A white kernel wheat cultivar is crossed to one with coloured kernels. The $F_1$ progeny all have coloured kernels. This situation could easily be interpreted as indicating that coloured kernels (CC) are dominant to white (cc). However, when the $F_2$ progeny is examined, the ratio observed is 15 coloured kernels to 1 white kernel, a much higher proportion of coloured kernels than expected (3:1) if a single dominant gene is responsible. In fact, two genes control kernel colour in wheat (gene A and gene B). Coloured wheat kernels are produced from a precursor product being acted on by an enzyme, or in this case either one of two enzymes, enzyme A or enzyme B. So when a homozygous wheat cultivar with coloured kernels (AABB) is crossed to one with white kernels (aabb), the $F_1$ (AaBb) produces both the A enzyme and the B enzyme, and hence all kernels are coloured. Now phenotypes from the $F_2$ progeny should have a ratio of 9 A_B_ : 3 A_bb : 3 aaB_ : 1 aabb. However, coloured kernels can result from either the A-gene (enzyme A) or the B-gene (enzyme B), acting alone or together, so then the first 3 types (A_B_, A_bb and aaB_) all have coloured kernels and only the aabb types have white kernels. This is called duplicate dominant epistasis where A is epistatic to B and bb, and B epistatic to A and aa.

Consider now a cross between two sweet pea cultivars, both homozygous, one with coloured flowers (anthocyanin) and the other with white flowers (no anthocyanin). All $F_1$ plants have coloured flowers. Again this could be interpreted as a situation whereby flower colour is controlled by a single dominant gene (CC). However, when the $F_2$ progeny is examined the ratio observed is 9 coloured flowers : 7 white flowers, certainly not the simple 3:1 ratio or the 9:3:3:1 ratio expected in a simple dominant case of one and two genes, respectively. Once again it is known that two genes control flower colour in sweet pea (gene C and gene P). Anthocyanin production (and hence coloured flowers) in sweet pea requires two operations: a precursor is acted on by enzyme C, produced in the presence of the dominant C-allele, to give an intermediate product, which is then acted on by a different enzyme P, produced in the presence of the dominant P-allele. Therefore anthocyanin and coloured flowers occur only when both the C- and P-alleles are present, that is, when enzyme C (C-gene) and enzyme P (P-gene) are produced together (i.e. 9/16 C_P_). Of the other possible types, C_pp produces enzyme C but not enzyme P; ccP_ produces enzyme P but not enzyme C; while ccpp produces neither enzyme, and so all these types have white flowers. This is termed duplicate recessive epistasis, whereby cc is epistatic to P, and pp is epistatic to C.

One final example relates to fruit colour in squash, where two loci are involved. At the first locus, white fruit (no colour) is dominant (W) over coloured fruit (w), so this locus must be homozygous for the recessive allele before the colour gene at the second locus is expressed. At the second locus yellow fruit colour (G) is dominant over green (g). When a heterozygous $F_1$ (WwGg) is self-pollinated, three different fruit colours, white : yellow : green, are observed in a 12:3:1 ratio. All W_ phenotypes (i.e. 9/16 W_G_ and 3/16 W_gg) are white, because of dominant epistasis where W is epistatic to G and g. Phenotypes with wwG_ (3/16) have yellow fruit, while wwgg genotypes (1/16) have green fruit.

All possible phenotypic ratios in the $F_2$ for two unlinked genes, as influenced by dominance at each locus and epistasis between loci, are shown in Table 5.4.

Table 5.4 Phenotypic ratios of progeny in the $F_2$ generation for two unlinked genes (where A is dominant to a, and B is dominant to b), with epistasis between loci.

$F_2$ phenotype                Explanation
A_B_   A_bb   aaB_   aabb
9      3      3      1      No epistasis.
9      3      4      0      Recessive epistasis: aa epistatic to B and bb.
12     0      3      1      Dominant epistasis: A epistatic to B and bb.
13     0      3      0      Dominant and recessive epistasis: A epistatic to B and bb; bb epistatic to A and aa. A and bb produce identical phenotypes.
9      0      7      0      Duplicate recessive epistasis: aa epistatic to B and bb; bb epistatic to A and aa.
15     0      0      1      Duplicate dominant epistasis: A epistatic to B and bb; B epistatic to A and aa.
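
The three worked examples can be checked by brute force. The sketch below is illustrative only: it assumes two unlinked loci with complete dominance, enumerates the 16 equally likely gamete combinations from a selfed AaBb plant, and applies a phenotype rule to each. The function name `f2_ratio` and the rule functions are hypothetical helpers, not part of any established library, and the squash loci W and G are mapped onto A and B for convenience.

```python
from collections import Counter
from itertools import product


def f2_ratio(phenotype):
    """Count F2 phenotypes (out of 16) from selfing the dihybrid AaBb.

    `phenotype` is a caller-supplied rule mapping (has_A, has_B) to a label,
    where has_A is True when at least one dominant A allele is present.
    """
    counts = Counter()
    # Each parent contributes one of four equally likely gametes: AB, Ab, aB, ab.
    gametes = list(product("Aa", "Bb"))
    for (a1, b1), (a2, b2) in product(gametes, repeat=2):
        has_A = "A" in (a1, a2)
        has_B = "B" in (b1, b2)
        counts[phenotype(has_A, has_B)] += 1
    return dict(counts)


# Wheat kernel colour: either enzyme is sufficient (duplicate dominant epistasis).
wheat = f2_ratio(lambda A, B: "coloured" if (A or B) else "white")

# Sweet pea flower colour: both enzymes are required (duplicate recessive epistasis).
sweet_pea = f2_ratio(lambda A, B: "coloured" if (A and B) else "white")

# Squash fruit colour: the dominant allele at the first locus masks the second
# locus (dominant epistasis).
squash = f2_ratio(lambda A, B: "white" if A else ("yellow" if B else "green"))

print(wheat)      # 15 coloured : 1 white
print(sweet_pea)  # 9 coloured : 7 white
print(squash)     # 12 white : 3 yellow : 1 green
```

Changing only the rule function reproduces the other rows of Table 5.4, which makes the point that the genotypic segregation is always the same and only the mapping from genotype to phenotype differs.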

In some ways the interaction between two different loci, in which the allele at one locus affects the expression of the alleles at another, will remind you of the phenomenon of dominance. The two phenomena are essentially different, however. Dominance always refers to the expression of one member of a pair of alleles relative to the other at the same locus; epistasis is the term generally used to describe the effects of non-allelic genes (i.e. alleles at different loci) on each other's expression, in other words their interaction.

5.2.6 Qualitative inheritance in tetraploid species

Compared with crop species that are cultivated as pure inbred lines, comparatively little research has been carried out on inheritance in autotetraploid species. This is primarily because the major autotetraploid crop species are either clonally reproduced (e.g. potato) or are outbreeding species that suffer severe inbreeding depression (e.g. alfalfa). Many (or all) of these cultivars are highly heterozygous, and hence it is not as easy to carry out simple genetic experiments.

However, major gene inheritance in autotetraploids has been the topic of research/breeding groups with the aim of parental development. Consider, for example, the potato crop, where there are several single traits that are controlled by major genes, such as resistance to Potato Virus X, Potato Virus Y, potato cyst nematode (Globodera rostochiensis) and late blight (Phytophthora infestans). All these qualitative traits show complete dominance. The technique used to develop parents in a breeding programme is aimed at increasing the proportion of desirable offspring in sexual crosses and, in the extreme, at avoiding the need to test breeding lines for the presence of the allele of interest. The technique is called multiplex breeding.

In tetraploid crops any genotype may be nulliplex (aaaa), having no copies of the desired allele at the specific A locus; simplex (Aaaa), having only one copy of the desirable resistance allele (A); duplex (AAaa), having two copies of the allele; triplex (AAAa), having three copies of the allele; or quadruplex (AAAA), having four copies of the allele (i.e. homozygous at that locus). If the alleles show dominance, then genotypes that have at least one copy of the allele will, phenotypically, appear identical in terms of their resistance. But they will differ in their effectiveness as parents in a breeding programme. To determine the genotype of a clonal line, test crossing is necessary.

To illustrate the usefulness of multiplex breeding in potato, consider the problem of developing a parental line which, when crossed to any other line (irrespective of the genotype of the second parent), will give progeny all of which will be resistant to potato cyst nematode by having at least one copy of the H1 gene (a qualitative resistance gene conferring resistance to all UK populations of the damaging nematode Globodera rostochiensis, and which has been shown to give relatively durable resistance).

The aim of multiplex breeding is therefore to develop parental lines that are either triplex or quadruplex (three or four copies of the desirable allele, respectively) for the H1 allele. When crossed to any other parent, these multiplex lines will produce progeny that have at least a single copy of the H1 gene and, due to dominance, all will be phenotypically resistant to G. rostochiensis nematodes. It will therefore not be necessary to screen for G. rostochiensis nematode resistance in these progeny. The ratios of resistant to susceptible amongst the progeny of genotypes derived by crossing a simplex, duplex, triplex and quadruplex to a nulliplex are shown in Table 5.5.

Table 5.5 The ratios of resistant to susceptible progeny amongst the genotypes derived by crossing a simplex, duplex, triplex and quadruplex resistant gene parent to a nulliplex parent.

Cross type                              Phenotype                  % resistant
                                        Resistant : Susceptible    in progeny
Simplex × nulliplex (Aaaa × aaaa)       1 : 1                      50
Duplex × nulliplex (AAaa × aaaa)        5 : 1                      83
Triplex × nulliplex (AAAa × aaaa)       1 : 0                      100
Quadruplex × nulliplex (AAAA × aaaa)    1 : 0                      100

Genotype ratios from all possible cross-combinations between nulliplex, simplex, duplex, triplex or quadruplex parents are given in Table 5.6.

Table 5.6 Genotype ratios from all possible cross-combinations between nulliplex (N), simplex (S), duplex (D), triplex (T) or quadruplex parents (Q).

Cross               Nulliplex (N)   Simplex (S)          Duplex (D)                 Triplex (T)     Quadruplex (Q)
Nulliplex (aaaa)    All N
Simplex (Aaaa)      1S : 1N         1D : 2S : 1N
Duplex (AAaa)       1D : 4S : 1N    1T : 5D : 5S : 1N    1Q : 8T : 18D : 8S : 1N
Triplex (AAAa)      1D : 1S         1T : 2D : 1S         1Q : 5T : 5D : 1S          1Q : 2T : 1D
Quadruplex (AAAA)   All D           1T : 1D              1Q : 4T : 1D               1Q : 1T         All Q
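
The entries of Tables 5.5 and 5.6 can be derived from gamete frequencies. The following is a minimal sketch, assuming simple chromosome segregation (each gamete receives a random pair of the four homologous chromosomes, with no double reduction); the function names `gametes`, `cross` and `resistant_fraction` are illustrative and not taken from any library.

```python
from collections import Counter
from fractions import Fraction
from functools import reduce
from itertools import combinations
from math import gcd

NAMES = "NSDTQ"  # index = number of copies of the dominant allele A


def gametes(copies):
    """Frequencies of 0, 1 or 2 copies of A in the gametes of a tetraploid parent."""
    chromosomes = [1] * copies + [0] * (4 - copies)
    pairs = Counter(sum(pair) for pair in combinations(chromosomes, 2))
    return {k: Fraction(v, 6) for k, v in pairs.items()}  # 6 possible pairs


def cross(copies1, copies2):
    """Progeny genotype ratio in smallest whole numbers, e.g. {'D': 1, 'S': 4, 'N': 1}."""
    freq = Counter()
    for g1, p1 in gametes(copies1).items():
        for g2, p2 in gametes(copies2).items():
            freq[g1 + g2] += p1 * p2
    # Every progeny frequency is a multiple of 1/36 (6 x 6 gamete pairs).
    counts = {k: int(f * 36) for k, f in freq.items()}
    common = reduce(gcd, counts.values())
    return {NAMES[k]: counts[k] // common for k in sorted(counts, reverse=True)}


def resistant_fraction(copies1, copies2):
    """Fraction of progeny with at least one copy of A (complete dominance assumed)."""
    ratio = cross(copies1, copies2)
    total = sum(ratio.values())
    return Fraction(total - ratio.get("N", 0), total)


print(cross(2, 2))               # duplex x duplex
print(resistant_fraction(2, 0))  # duplex x nulliplex
print(resistant_fraction(3, 0))  # triplex x nulliplex
```

Under these assumptions a duplex × nulliplex cross gives 5/6 (about 83%) resistant progeny, and any cross involving a triplex or quadruplex parent gives only resistant progeny, which is the rationale for multiplex breeding.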

It was at first thought that developing multiplex parents would be very difficult; instead it proved simply to be a matter of effort and application. The main difficulty lies in the fact that progeny tests need to be carried out to test the genetic makeup of the dominant parental lines (very similar to backcrossing where the non-recurrent parent has a single recessive gene of interest). Consider one of the worst situations, where both starting parents are simplex. Three-quarters of the progeny will be resistant, but only a quarter will be duplex. So a quarter of the progeny will be nulliplex, hence susceptible, and so upon testing can be immediately discarded. The resistant progeny, however, need to be testcrossed to a nulliplex to distinguish the duplex from simplex genotypes. Once identified, the selected duplex lines are intercrossed or selfed. In their progeny, 1/36 will be quadruplex and 8/36 will be triplex. A single round test cross with a nulliplex will be necessary to distinguish, on the one hand, the triplex and quadruplex lines from, on the other, those that are either duplex or simplex. A second round test cross using the progeny of the first test cross will be necessary (i.e. a second backcross to a nulliplex line) in order to distinguish the quadruplex from triplex lines.

Once quadruplex lines have been identified, these can be continually intercrossed, or selfed, without further need to test. Similarly, quadruplex or triplex parental lines can be used in cross combination with any other parental lines and 100% of the resulting progeny will contain at least one of the dominant resistant alleles and show resistance. Multiplex breeding can be used in a similar way to develop parents in hybrid cross combinations.

5.2.7 The chi-square test

If plant breeders have an understanding of the inheritance of simply inherited characters, it is possible to predict the frequency of desired genes and genotypes in a breeding population. Breeders often ask questions relating to the nature of inheritance as well as the number of alleles or loci involved in the inheritance of a particular character of importance. For example, it is often valuable to determine whether a single gene is dominant, recessive or additive in inheritance. In the dominant/recessive case, the $F_2$ family will segregate in a 3:1 dominant : recessive ratio, while in the additive case it will segregate in a 1:2:1 homozygous dominant : heterozygous : homozygous recessive ratio.

In segregating families such as those noted above, the ratios actually observed will not be exactly as predicted due to sampling error. For example, a coin is expected to fall as heads or tails in the ratio 1:1, but if tossed 10 times it does not always result in 5 heads and 5 tails. The breeder is therefore faced with interpreting the ratios that are observed in terms of what is expected, and it might be necessary to decide whether a particular ratio is really 1:2 or 1:3 or 3:4 or 9:7. How can this be done objectively? The answer is to use chi-square tests.

It should be clear that the significance of a given deviation is related to the size of the sample. If we expect a 1:1 ratio in a test involving six individuals, an observed ratio of 4:2 is not at all bad. But if the test involves 600 individuals, an observed ratio of 400:200 is clearly a long way off. Similarly, if we test 40 individuals and find a deviation of 10 in each class, this deviation seems serious:

observed                30     10
expected                20     20
obs−exp (difference)    10    −10

But if we test 200 individuals, the same numerical deviation seems reasonably enough explained as a purely chance effect:

observed                90    110
expected               100    100
obs−exp (difference)   −10     10

The statistical test most commonly used when such a problem arises is simple in design and application. Each deviation is squared, and each squared deviation is then divided by the expected number in its class. The resulting quotients are then all added together to give a single value, called the chi-square ($\chi^2$). To substitute symbols for words, let $d$ represent the respective deviations (observed minus expected), and $e$ the corresponding expected values, then:

$$\chi^2 = \sum \frac{d^2}{e}$$

We can calculate chi-square for the two arbitrary examples above to show how this value relates the magnitude of the deviation to the size of the sample.

                            Sample of 40       Sample of 200
                            individuals        individuals
Observed (obs)              30       10        90       110
Expected (exp)              20       20        100      100
obs−exp ($d$)               10       −10       −10      10
$d^2$                       100      100       100      100
$d^2/e$                     5        5         1        1
$\chi^2 = \sum (d^2/e)$     10                 2

You might note that the value of chi-square is much larger for the smaller population, even though deviations in the two populations are numerically the same. In view of our earlier common-sense comparison of the two, this is a practical demonstration that the calculated value of chi-square is related to the significance of a deviation. It has the virtue of reducing many different samples, of different sizes and with different numerical deviations, to a common scale for comparison.
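
The calculation is easily scripted. This is a minimal sketch of the formula as described above (sum of squared deviations divided by the expected values); the function name `chi_square` is an illustrative choice.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


small = chi_square([30, 10], [20, 20])     # sample of 40 individuals
large = chi_square([90, 110], [100, 100])  # sample of 200 individuals
print(small, large)  # 10.0 2.0
```

The same numerical deviation of 10 gives a chi-square five times larger in the sample of 40 than in the sample of 200, because each squared deviation is scaled by a smaller expected value.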

The chi-square test can also be applied to samples including more than two classes. For example, the table below shows the chi-square analysis of the tall × short barley example given earlier. Suppose that a test cross was carried out in which the heterozygous $F_1$ (TtSs) was crossed to a genotype homozygous for the recessive alleles at these loci. When this was actually done, and 400 test cross progeny were grown out, there were 112 tall and 6-row, 89 tall and 2-row, 93 short and 6-row, and 106 short and 2-row. Inspection shows that there is a higher proportion of each of the original parental types (tall/6-row and short/2-row) than of the recombinant types. The question is, therefore: is this linkage, or simple random sampling variation? To determine this we would use the $\chi^2$ test.

                    Tall,     Tall,     Short,    Short,
                    6-row     2-row     6-row     2-row
Observed (obs)      112       89        93        106
Expected (exp)      100       100       100       100
obs−exp ($d$)       12        −11       −7        6
$d^2$               144       121       49        36
$d^2/e$             1.44      1.21      0.49      0.36
$\chi^2 = \sum (d^2/e) = 3.50$ n.s.

The number of degrees of freedom in tests of genetic ratios is almost always one less than the number of genotypic classes. To be more precise, it is the number of observable data that are independent. For example, in a two-gene test cross there are four possible phenotypes, expected in equal frequency. So, in our example, 400 plants were observed and there were 112 in the first group, 89 in the second group and 93 in the third group; then by definition, to sum to 400, there must be 106 in the last group (400 − 112 − 89 − 93 = 106). Therefore our chi-square test would have 3 degrees of freedom. In just the same way, tests of two groups, such as tests of 1:1 or 3:1 ratios, have one degree of freedom.

Do not confuse assigning degrees of freedom to genetic frequencies with degrees of freedom in $2 \times 2$ contingency tables. In a two-way contingency table, with pre-assigned row and column totals, one value can be filled arbitrarily, but the others are then fixed by the fact that the total must add up to the precise number of observations involved in that row or column. When there are four classes, any three are usually free, but the fourth is fixed. Thus, when there are four classes, there are usually three degrees of freedom.

Recall the example given earlier with 40 individuals, for which the calculated $\chi^2 = 10$ with one degree of freedom. We can look up this value in the probability tables for chi-square, and in this case chance alone would be expected to produce as large a deviation as that obtained in considerably less than one in a hundred independent trials. We cannot reasonably accept chance alone as being responsible for this particular deviation; it represents an event that would occur, on a chance basis, much less often than the one-time-in-20 that we have agreed on as our point of rejection, and less often than even the one-time-in-100 that we decided to regard as highly significant.

In the case of the barley example, the expected frequency of each of the four phenotypes, if no linkage is present, would be 100 (a total of 400, with four equal expectations), which leads to deviations of 12, 11, 7 and 6 (ignoring sign); these, squared, divided by the expected value (100) and summed, give a $\chi^2$ value of 3.5. When this value is compared with the probability tables for $\chi^2$ with 3 degrees of freedom, it falls between the tabulated values for the 50% and 30% probability levels (about 2.37 and 3.66), nowhere near the 5% probability level that we accept as showing significance. We therefore say that there is no evidence that linkage exists, and that the observed deviations are likely to have occurred by random chance (sampling error).
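
Instead of consulting printed tables, the probabilities can be computed directly. The sketch below is limited to 1 and 3 degrees of freedom, for which the upper-tail probability of chi-square has a simple closed form in terms of the complementary error function; the function name `chi_square_p_value` is hypothetical, and a statistics library would normally be used for other degrees of freedom.

```python
from math import erfc, exp, pi, sqrt


def chi_square_p_value(chi2, df):
    """Probability of a chi-square at least this large (df = 1 or 3 only)."""
    if df == 1:
        return erfc(sqrt(chi2 / 2))
    if df == 3:
        return erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)
    raise ValueError("this sketch only handles 1 or 3 degrees of freedom")


# Barley test cross: four classes expected in equal numbers.
observed = [112, 89, 93, 106]
expected = [100, 100, 100, 100]
barley_chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

p_barley = chi_square_p_value(barley_chi2, df=3)  # roughly 0.32, not significant
p_sample_of_40 = chi_square_p_value(10, df=1)     # well below 0.01

print(round(barley_chi2, 2), round(p_barley, 2), round(p_sample_of_40, 4))
```

A probability of roughly 0.3 for the barley data means that deviations at least this large would arise by chance in about one trial in three, whereas the sample of 40 gives a probability far below the 1% level.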

Improper use of chi-square

The two most important reservations regarding the straightforward use of the chi-square method in genetics are:

  • Chi-square can usually be applied only to numerical frequencies themselves, not to percentages or ratios derived from the frequencies. For example, if in an experiment one expects equal numbers in each of two classes, but observes 8 in one class and 12 in the other, we might express the observed numbers as 40% and 60%, and the expected as 50% in each class. A chi-square value computed from these percentages cannot be used directly for the determination of the probability. When the classes are large, a chi-square value computed from percentages can be used, if it is multiplied by $n/100$, where $n$ is the total number of individuals observed.
  • Chi-square cannot properly be applied to distributions in which the expected frequency of any class is less than 5. In fact, some statisticians suggest that a particular correction be applied if the frequency of any class is less than 50. However, the approximations involved in chi-squares are close enough for most practical purposes when there are more than 5 expected in each class.
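
The first reservation can be illustrated with the 8 : 12 example. This sketch assumes the rescaling factor is $n/100$, with $n$ the total number of individuals; the helper `chi_square` is an illustrative function, not a library call.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


counts_obs = [8, 12]
counts_exp = [10, 10]
n = sum(counts_obs)

# The same data expressed as percentages of the total.
pct_obs = [100 * o / n for o in counts_obs]  # 40% and 60%
pct_exp = [100 * e / n for e in counts_exp]  # 50% and 50%

from_counts = chi_square(counts_obs, counts_exp)  # correct value
from_percent = chi_square(pct_obs, pct_exp)       # inflated value
rescaled = from_percent * n / 100                 # back on the count scale

print(from_counts, from_percent, rescaled)
```

Computed from percentages the statistic is five times too large here, because percentages behave like a sample of 100 individuals when only 20 were observed.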

5.2.8 Family size necessary in qualitative genetic studies

It is usual in genetic work for the scope of the experiments to be limited by such considerations as lack of available space, labour or money. It is therefore essential to make the best use of available resources, which will be a function of the number of plants/plots that need to be raised. Achieving this usually requires considerable care and planning of experiments. Often statistical considerations can be of great value.

In many experiments it is desirable to be able to pick out certain genotypes, usually (but not always) homozygous, with the aim of developing superior cultivars. It may also be necessary to detect some particular nonconformity. For example, to detect any linkage effects will involve test crossing $F_1$ lines onto a recessive parent. It would be advantageous to keep the population size down to a minimum while also ensuring with high probability that the experiment will be sufficiently large to detect the required differences.

Let us consider now the question of detecting homozygotes in a segregating population. Any progeny that fails to segregate could have derived from a homozygous parent or could fail to show segregation because of sampling error. The greater the numbers of individuals that are examined, the lower the probability that sampling error will interfere with the interpretation of experimental results. The minimum size of progeny designed to test a particular ratio is then a statistical question involving consideration of the probability that any individual in a family derived from a heterozygote will be of the recessive type, and also the permissible maximum probability of obtaining a misleading result.

Consider the following example. Suppose you wish to test a series of individuals phenotypically dominant for a single gene, in order to identify the homozygous individuals, by using a test cross to a homozygous recessive. The progeny of the homozygotes will not show segregation, whereas progeny from the heterozygotes will segregate in a 1:1 ratio (i.e. 1 Aa : 1 aa). The important error that can arise will be from our failure to detect any segregants in the progenies from crosses that actually do have a heterozygous parent. Let us also assume that we do not want the test to fail with greater probability than once in 100 experiments or tests (i.e. a probability of 0.01). In the progeny of a heterozygote, each individual has a chance of 1/2 of carrying the dominant allele. The probability that all the individuals in a family of size $n$ carry the dominant allele is therefore $(1/2)^n$. This is the probability of observing no recessive segregants, which is the misleading (error) result that must be avoided and must not occur at a higher frequency than 1 in 100. The minimum value of $n$ is then given by the solution of the equation:

$$\left(\frac{1}{2}\right)^n = \frac{1}{100}$$

Taking natural logarithms this becomes:

$$n \ln\left(\frac{1}{2}\right) = \ln\left(\frac{1}{100}\right)$$

Therefore:

$$n = \frac{\ln(0.01)}{\ln(0.5)}$$

In our example, $n = 6.64$. Therefore the smallest family size needed would be seven, in order to be at least 99% certain of detecting the difference.

This formula can be generalized to:

$$n = \frac{\ln(1 - p)}{\ln(1 - x)}$$

where $n$ is the number that needs to be evaluated, $p$ is the probability of having at least one type of interest (e.g. $p = 0.90$ for 90% certainty) and $x$ is the frequency of the desirable genotype/phenotype.
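
As a quick check of the arithmetic, the sketch below assumes the generalized formula has the form $n = \ln(1 - p)/\ln(1 - x)$, which reduces to the test cross example when $p = 0.99$ and $x = 0.5$. The function name `min_family_size` is an illustrative choice.

```python
from math import ceil, log


def min_family_size(p, x):
    """Smallest whole family size giving probability >= p of at least one
    individual of a type that occurs with frequency x."""
    if not (0 < p < 1 and 0 < x < 1):
        raise ValueError("p and x must lie strictly between 0 and 1")
    return ceil(log(1 - p) / log(1 - x))


print(min_family_size(0.99, 0.5))   # test cross example: 6.64 rounded up to 7
print(min_family_size(0.95, 0.25))  # e.g. one recessive homozygote in an F2
```

Rounding up is essential, since rounding 6.64 down to 6 would leave the probability of a misleading result at 1/64, which is greater than the permitted 1/100.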

It is often necessary in plant breeding to estimate the number of individuals that need to be grown or screened in order to identify at least $r$ individuals (where $r$ is greater than one). It is never a good idea simply to multiply $n$ (from above), the number that needs to be grown to ensure at least one, by $r$, the number required, as this will result in a gross overestimation.

An alternative is to use an extension to the above equation. The mathematics behind this equation is outside the scope of this book. However, the equation itself can be useful. It is:

$$n = \frac{2(r - 0.5) + z^2(1 - q) + z\sqrt{z^2(1 - q)^2 + 4(1 - q)(r - 0.5)}}{2q}$$

where $n$ is the total number of plants that need to be grown; $r$ is the number of plants with the required alleles that need to be recovered; $q$ is the frequency of plants with the desired alleles; $p$ is the probability of recovering the desired number of plants with the desired alleles; and $z$ is the standard normal deviate at which the cumulative normal frequency distribution equals $p$, and so is a function of probability $p$. For the sake of simplicity and to cover the most commonly used situations, $z = 1.645$ for $p = 0.95$ (i.e. 95% certainty) and $z = 2.326$ for $p = 0.99$ (i.e. 99% certainty).

For example, consider the frequency of homozygous recessive genotypes (aabb) resulting from the cross AABB × aabb at the $F_3$ stage. How many $F_3$ lines would need to be evaluated to be 95% certain of obtaining ten homozygous recessive genotypes? Here the probability of aabb is 9/64 (i.e. $3/8 \times 3/8$), so $q = 0.1406$, $r = 10$ and $p = 0.95$, therefore $z = 1.645$.

$$n = \frac{2(10 - 0.5) + 1.645^2(1 - 0.1406) + 1.645\sqrt{1.645^2(1 - 0.1406)^2 + 4(1 - 0.1406)(10 - 0.5)}}{2 \times 0.1406} \approx 110.3$$

Therefore a minimum of 111 $F_3$ lines would need to be grown.
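
The worked example can be reproduced in a few lines. This sketch assumes the equation has the form $n = [2(r - 0.5) + z^2(1 - q) + z\sqrt{z^2(1 - q)^2 + 4(1 - q)(r - 0.5)}]/2q$, which returns the 111 lines quoted in the text; the function name `plants_needed` and the lookup table `Z_VALUES` are illustrative.

```python
from math import ceil, log, sqrt

# z values quoted in the text for the two most commonly used certainty levels.
Z_VALUES = {0.95: 1.645, 0.99: 2.326}


def plants_needed(r, q, p=0.95):
    """Plants to grow to recover at least r plants of frequency q with probability p."""
    z = Z_VALUES[p]
    n = (2 * (r - 0.5) + z**2 * (1 - q)
         + z * sqrt(z**2 * (1 - q)**2 + 4 * (1 - q) * (r - 0.5))) / (2 * q)
    return ceil(n)


q_aabb = (3 / 8) ** 2  # frequency of aabb among F3 lines = 9/64
needed = plants_needed(10, q_aabb, p=0.95)

# Naive alternative: the number needed for one plant, multiplied by ten.
naive = 10 * ceil(log(1 - 0.95) / log(1 - q_aabb))

print(needed, naive)
```

The naive figure is nearly twice the value from the equation, which shows how much multiplying the single-plant family size overestimates the number of lines required.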

5.3 Quantitative genetics

5.3.1 The basis of continuous variation

With qualitative inheritance, the segregating individual phenotypes are usually easily distinguished and fall into a few phenotypic classes (i.e. tall or short; white or red; resistant or susceptible). Characters that are controlled by multiple genes do not fall into such simple classifications. Let us consider a hectare field of potatoes, planted with one cultivar, and so every single plant in the field should be of identical genotype (ignoring mutations and errors). In that field there are likely to be 11,000 plants. At harvest, you have been given the task of harvesting all 11,000 plants and weighing the tubers that come from each plant separately. Would you expect all the potato plants to have exactly the same weight of potatoes? Probably not. Indeed, the weights can be presented in the form of a histogram (Figure 5.1) where 11,000 yields were divided into 17 weight classes. The variation in weight is obvious; some plants produce less than 0.5 kg of tubers, while others produce over 5 kg. Most, however, are grouped around the average of 3.2 kg per plant.

Figure 5.1 Yield of tubers from individual potato plants taken at random from a single clonally reproduced potato cultivar.

Yield in potato, as in other crops, is polygenically inherited. Yield is therefore not controlled by a single gene but by many genes acting collectively. With so many genes involved, there is a greater chance that some of the processes they control will be affected by differences in the ‘environment’. As the example above involved harvesting plants that are all clones of the same genotype, the variation observed is due entirely to the environmental conditions to which each plant was subjected.

One of the major differences between single gene inheritance and multiple gene inheritance is that the former is relatively less affected or influenced by the environmental conditions compared with the latter, at least in terms of the differences being expressed. When potato yields are being recorded we are recording the phenotype, but in reality as breeders we are interested in the genotype. The two are related by the equation:

$$P = G + E + GE + e$$

where P is the phenotype, G is the genotypic effect, E is the effect of the environment in which the genotype is grown, GE is the interaction between the genotype and the environment, and $e$ is a random error term. Unfortunately, the fact that agronomic practices are part of the environment plants face is often neglected. Therefore, in order to reduce the overall environmental impact on the field expression of the genotype, it is essential to provide the most uniform agronomic management possible to breeding trials at all times.

As noted, single gene characters are often relatively less affected by environment, and so do not show much influence of $G \times E$ interactions (i.e. the difference in expression of the two alleles is large in comparison with the variation caused by environmental changes). For example, a potato genotype with white flowers (a qualitatively inherited trait) will always have white flowers in any environment in which the plants produce flowers. Conversely, quantitatively inherited traits are greatly influenced by environmental conditions, and $G \times E$ interactions are common and can be large. The greatest difficulty plant breeders face is dealing with quantitative traits, and in particular detecting the better genotypes based on their phenotypic performance. This is why it is critically important to apply appropriate experimental design techniques to genetic experiments and to plant breeding programmes.

A major part of quantitative genetics research related to plant breeding has been directed towards partitioning the variation that is observed (i.e. phenotypic variation) into its genetic and non-genetic portions. Once achieved, this can be taken further by dividing the genetic portion into that which is additive in nature and that which is non-additive (quite often dominance variation). Obviously, in breeding self-pollinating crops, the additive genetic variance is of primary importance, since it is that portion of the variation attributable to homozygous gene combinations in the population and is what the breeder is aiming for (i.e. homozygous lines). On the other hand, variance due to dominance is related to the degree of heterozygosity in the population and will be reduced (to zero) over time with inbreeding as breeding lines move towards homozygosity, but will be important when it comes to out-breeding species.

Let us now return to the potato weights as one of many possible examples of continuous variation. If the frequency distribution of potato yields is inspected, there are two points to note: (1) the distribution is symmetrical (i.e. there are as many high-yielding plants as really low-yielding ones); and (2) the majority of potato yields are clustered around a weight in the middle of the distribution. As we have taken a class interval of 0.3 kg to produce this distribution, the figure does not look particularly continuous. However, we know that potato yields do not go up in increments of 0.3 kg but show a more continuous and gradual range of variation. If we use more class intervals in this example we will produce a smoother histogram, and if we use infinitely small class intervals the result will be a truly continuous bell-shaped curve. This bell-shaped curve is called a normal distribution, and it occurs in a wide variety of aspects of plant science relating to plant growth, particularly quantitative genetics.

5.3.2 Describing continuous variation

The normal distribution

The 11,000 potato plant weights discussed above are a sample, albeit a large one, of possible potato weights from individual plants. It is possible to predict mathematically the frequency distribution for the population as a whole (i.e. every possible potato plant of that cultivar grown), provided it is assumed that the sample is representative of the population (i.e. that our sample is an unbiased sample of all that was possible).

It is not necessary to actually draw normal distributions (which, even with the aid of computer graphics, are difficult to do accurately). Most of the important properties of a normal distribution can be characterized by two statistics, the mean or average ($\mu$) of the distribution and the standard deviation ($\sigma$), a measure of the ‘spread’ of the distribution.

There are in fact two means, the mean of the sample and the mean of the population from which the sample was drawn. The latter is represented by the symbol $\mu$, and is, in reality, seldom known precisely. The sample mean is represented by $\bar{x}$ (spoken ‘$x$ bar’), and it can be known with complete accuracy. Generally, the best estimate of a population mean ($\mu$) is the actual mean of an unbiased sample drawn from it ($\bar{x}$). The population mean is thus best estimated as:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $\sum_{i=1}^{n} x_i$ is the sum of all $x$ values from $x_1$ to $x_n$.

The standard deviation is an ideal statistic to examine the variation that exists within a dataset. For any normal distribution, approximately 68% of the population sampled will be within one standard deviation from either side of the mean, approximately 95% will be within two standard deviations (Figure 5.2), and approximately 99% will be within three standard deviations of the mean.

Figure 5.2 95% of a population that is normally distributed will lie within two standard deviations of the population mean.
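
The percentages quoted for one, two and three standard deviations follow from the normal distribution itself. A minimal sketch using only the error function from the Python standard library is shown below; the function name `within_k_sd` is illustrative.

```python
from math import erf, sqrt


def within_k_sd(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))
```

The values are close to 68%, 95% and just over 99%, whatever the mean and standard deviation of the particular normal distribution.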

Once again, it is necessary to distinguish between the actual standard deviation of a population, all of whose members have been measured, and the estimated standard deviation of a population based on measuring a sample of individuals from it. The standard deviation of a population is represented by the symbol $\sigma$ (Greek letter sigma) and is defined thus:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

while the standard deviation of a sample is represented by the symbol $s$, and is given by:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Another measure of the spread of data around the mean is the variance, which is the square of the standard deviation. The estimated variance of a population (i.e. obtained from the sample) is given by:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

and the actual variance of an entire population (if every member of it has been measured) is given by:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

Calculators are often programmed to give means and either standard deviations or variances with a few key strokes once data have been entered. However, in case it is necessary to derive these descriptive statistics semi-manually, it is useful to know about alternative equations for $s$ and $\sigma$:

$$s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}} \qquad \sigma = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n}}$$

Although these look more complicated than those given previously, they are easier to use because the mean does not have to be worked out first (which would entail entering all the data into the calculator twice). Note the difference between $\sum x_i^2$ (each value of $x$ squared and then the squares totalled) and $\left(\sum x_i\right)^2$ (the values of $x$ totalled and then the sum squared).
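As an illustration of the alternative equations, the sketch below computes the mean together with both standard deviations from the two running totals only (the sum of the values and the sum of their squares). The function name `describe` and the five yield values are hypothetical and chosen purely for the example.

```python
import math


def describe(values):
    """Return (mean, sample SD, population SD) using the alternative
    equations, which need only the sum of x and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)                    # values totalled
    sum_x2 = sum(x * x for x in values)    # each value squared, then totalled
    mean = sum_x / n
    # Sum of squared deviations from the mean, without computing deviations.
    ss = sum_x2 - sum_x ** 2 / n
    s = math.sqrt(ss / (n - 1))            # estimate from a sample
    sigma = math.sqrt(ss / n)              # whole population measured
    return mean, s, sigma


# Hypothetical plot yields (kg/plot), for illustration only.
yields = [500, 530, 560, 590, 620]
mean, s, sigma = describe(yields)
print(mean, round(s, 2), round(sigma, 2))
```

For these five values the mean is 560, with $s$ close to 47.43 and $\sigma$ close to 42.43. The sample estimate is the larger of the two because it divides by $n - 1$ rather than $n$.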

Standard deviations, as measures of spread around the mean, are probably intuitively more understandable than variances; for example, approximately 68% of the population fall within one standard deviation of the mean. Why introduce the complication of variances? Well, variances are additive in a way standard deviations are not. Thus, if the variances attributable to a variety of factors have been estimated, and those factors act independently of one another, it is mathematically valid to sum them to estimate the variance due to all the factors acting together. Similarly, the reverse is also true, in that a total variance can be partitioned into the variances attributable to a variety of individual factors. These operations, which are used extensively in quantitative genetics, cannot be so readily performed with standard deviations.
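The additivity of variances can be seen in a small numerical sketch. The two sets of effects below are hypothetical, and the example assumes that the two factors act independently, so that every level of one factor occurs equally often with every level of the other.

```python
import math
from itertools import product
from statistics import pstdev, pvariance

# Hypothetical effects of two independent factors on a trait.
effect_a = [0.0, 20.0]
effect_b = [0.0, 30.0, 60.0]

# Every combination of the two factors acting together.
totals = [a + b for a, b in product(effect_a, effect_b)]

var_a = pvariance(effect_a)
var_b = pvariance(effect_b)
var_total = pvariance(totals)

print("variances:", var_a, "+", var_b, "vs", var_total)
print("standard deviations:",
      round(pstdev(effect_a) + pstdev(effect_b), 2),
      "vs", round(pstdev(totals), 2))
```

Here the two variances (100 and 600) sum to the variance of the combined values (700), whereas the two standard deviations sum to about 34.49, well above the standard deviation of the combined values (about 26.46).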

Variation between datasets

Two basic procedures are frequently used in quantitative genetics to interpret the variation within, and the relationship between, different characters, or one character evaluated in different environments. These are simple linear regression and correlation.

A straight-line regression can be adequately described by two estimates: the slope, or gradient, of the line ($b$) and the intercept on the $y$-axis ($a$). These are related by the equation:

$$y = a + bx$$

It can be seen that $b$ is the gradient of the line, because a change of one unit on the $x$-axis results in a change of $b$ units on the $y$-axis. If $x$ and $y$ both increase (or both decrease) together, the gradient is positive. If, however, $x$ increases while $y$ decreases or vice versa, then the gradient is negative. When $x = 0$, the equation for $y$ reduces to:

$$y = a$$

and $a$ is therefore the point at which the regression line crosses the $y$-axis. This intercept value may be equal to, greater than or less than zero.

The formulation and theory behind regression analysis are not within the scope of this book and will not be described here. However, the gradient of the best-fitting straight line (also known as the regression coefficient) for a collection of points whose coordinates are $x_i$ and $y_i$ is estimated as:

$$b = \frac{\mathrm{SP}(x,y)}{\mathrm{SS}(x)}$$

where SP(x,y) is the sum of products of the deviations of $x$ and $y$ from their respective means ($\bar{x}$ and $\bar{y}$) and SS(x) is the sum of the squared deviations of $x$ from its mean. It will be useful to have an understanding of regression analysis and to remember the basic regression equations.

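Now, SP(x,y) is given by the equation:

$$\mathrm{SP}(x,y) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

although in practice it is usually easier to calculate it using the equation:

$$\mathrm{SP}(x,y) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

The comparable equations for SS(x) are:

$$\mathrm{SS}(x) = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$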

Notice that a sum of squares is really a special case of a sum of products. You should also note that if every $x$ value is exactly equal to its paired $y$ value, then the equation used to estimate $b$ becomes $b = \mathrm{SS}(x)/\mathrm{SS}(x) = 1$.

Having determined $b$, the intercept value is found by substituting the mean values of $x$ and $y$ into the rearranged equation:

$$a = \bar{y} - b\bar{x}$$
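The regression equations above can be put together in a few lines of Python. This is a minimal sketch rather than a full regression analysis; the function names and the parent and progeny values are hypothetical and serve only to show the arithmetic.

```python
def sum_of_products(xs, ys):
    """SP(x,y) by the 'easier' equation: sum of xy minus (sum x)(sum y)/n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def linear_regression(xs, ys):
    """Return (a, b) for the best-fitting line y = a + bx."""
    n = len(xs)
    sp_xy = sum_of_products(xs, ys)
    ss_x = sum_of_products(xs, xs)   # a sum of squares is SP of x with itself
    b = sp_xy / ss_x                 # gradient (regression coefficient)
    a = sum(ys) / n - b * sum(xs) / n  # intercept from the two means
    return a, b


# Hypothetical parent (x) and progeny (y) values lying exactly on a line.
parent = [1, 2, 3, 4, 5]
progeny = [3, 5, 7, 9, 11]
a, b = linear_regression(parent, progeny)
print("intercept a =", a, "gradient b =", b)
```

With these values SP(x,y) is 20 and SS(x) is 10, giving a gradient of 2 and an intercept of 1. Passing the same list as both $x$ and $y$ gives a gradient of exactly 1, as noted above.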

In regression analysis it is always assumed that one character is the dependent variable and the other is the independent variable. For example, it is common to compare parental expression with progeny expression (see Chapter 6). In this case progeny expression would be considered the dependent variable and parental expression the independent variable, because the expression of progeny is obviously dependent upon the expression of their parents, and not vice versa.

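In addition, the degree of association between two or more characters can be examined statistically by the use of correlation analysis. Correlation analysis is similar in many ways to simple regression, but in correlations there is no need to assign one set of values to be the dependent variable while the other is said to be the independent variable. Correlation coefficients ($r$) are calculated from the equation:

$$r = \frac{\mathrm{SP}(x,y)}{\sqrt{\mathrm{SS}(x) \times \mathrm{SS}(y)}}$$

where SP(x,y) is again the sum of products between the two variables, SS(x) is the sum of squares of one variable ($x$) and SS(y) is the sum of squares of the second variable ($y$), and:

$$\mathrm{SP}(x,y) = \sum (x_i - \bar{x})(y_i - \bar{y}) \qquad \mathrm{SS}(x) = \sum (x_i - \bar{x})^2 \qquad \mathrm{SS}(y) = \sum (y_i - \bar{y})^2$$

These can, of course, sometimes be calculated more easily by:

$$\mathrm{SP}(x,y) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} \qquad \mathrm{SS}(x) = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} \qquad \mathrm{SS}(y) = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$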

Correlation coefficients ($r$) range in value from $-1$ to $+1$. Values of $r$ approaching $+1$ show very good positive association between two sets of data (i.e. high values for one variable are consistently associated with high values of the other). In this case, we say that the two variables are positively correlated. Values of $r$ that are near to $-1$ show a strong negative association between two sets of data (i.e. a high value for one variable is consistently associated with a low value of the other). In this case we say that the two variables are negatively correlated. Values of $r$ that are near to zero indicate that there is no linear association between the variables. In this case a high value for one variable can be associated with a high, medium or low value of the other.
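The three situations just described can be reproduced with a short sketch. The function names and the small datasets are hypothetical; they were chosen so that the coefficients come out at the extremes and at zero.

```python
import math


def sum_of_products(xs, ys):
    """SP(x,y) = sum of xy minus (sum x)(sum y)/n; SS(x) is SP(x,x)."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def correlation(xs, ys):
    """Correlation coefficient r = SP(x,y) / sqrt(SS(x) * SS(y))."""
    ss_x = sum_of_products(xs, xs)
    ss_y = sum_of_products(ys, ys)
    return sum_of_products(xs, ys) / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
r_positive = correlation(x, [2, 4, 6, 8, 10])   # y rises with x
r_negative = correlation(x, [10, 8, 6, 4, 2])   # y falls as x rises
r_none = correlation(x, [2, 1, 0, 1, 2])        # y falls, then rises
print(r_positive, r_negative, r_none)
```

Note that swapping the two arguments of `correlation` leaves the result unchanged, which reflects the fact that neither variable is treated as dependent. The third dataset is also a reminder that a coefficient of zero rules out only a straight-line association: there $y$ clearly changes with $x$, but not linearly.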

5.3.3 Relating quantitative genetics and the normal distribution

Consider two homozygous canola (Brassica napus) cultivars ($P_1$ and $P_2$). The yield potential of $P_1$ is 620 kg/plot, and is higher than that of $P_2$, which has a yield potential of 500 kg/plot. When these two cultivars were crossed and the $F_1$ produced, the yield of the $F_1$ progeny was exactly midway between both parents (560 kg/plot). This would suggest that additive genetic effects rather than dominance were present.

If yield in canola were controlled by a single locus and two alleles (which it is not), we would have:

$P_1 \times P_2$
$AA \times aa$

It should be noted here that upper and lower case letters denoting alleles do not signify dominance as in qualitative inheritance, but rather simply differentiate between alleles. It is common to assign uppercase letters to alleles from the parent with the greater expression of the trait, which is designated as $P_1$.

When there is only one locus with two alleles involved, we would assume that the uppercase alleles each add 60 kg/plot to the base performance of a plant, and lowercase alleles add nothing. In this case the base performance is equal to $P_2 = 500$ kg/plot. Therefore, $P_1 = AA = 500 + 60 + 60 = 620$ kg/plot, $P_2 = aa = 500 + 0 + 0 = 500$ kg/plot, and the $F_1 = Aa = 500 + 60 + 0 = 560$ kg/plot. In the $F_2$ we have a ratio of 1 AA : 2 Aa : 1 aa, and we would have three types of plants in the population: $AA = 620$ kg/plot, $Aa = 560$ kg/plot and $aa = 500$ kg/plot.

Obviously, yield in canola is not controlled by two alleles at a single locus. However, let us progress gradually and assume that two loci each with two alleles are involved. We now have:

$P_1 \times P_2$
$AABB \times aabb$

In this case (assuming alleles at different loci have equal effects), each uppercase allele would add 30 kg/plot to the base weight. The $F_1 = AaBb = 500 + 30 + 0 + 30 + 0 = 560$ kg/plot (the same as if only one gene was involved). However, in the $F_2$ we have 16 possible allele combinations that can be grouped according to the number of uppercase alleles (or yield potential).

            AAbb
            AaBb
      aabB  aABb  AABb
      aaBb  AabB  AAbB
      aAbb  aAbB  AaBB
aabb  Aabb  aaBB  aABB  AABB
500   530   560   590   620   (kg/plot)
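The grouping of allele combinations by yield potential can be generated rather than written out by hand. The sketch below is one way of doing so, assuming equal additive effects at every locus as in the text; the function name `f2_yield_counts` and its parameters are our own and are used only for illustration.

```python
from collections import Counter
from itertools import product


def f2_yield_counts(n_loci, base=500, high=620):
    """Count the F2 allele combinations falling in each yield class,
    assuming every uppercase allele adds the same amount to the base."""
    per_allele = (high - base) / (2 * n_loci)
    counts = Counter()
    # Each of the 2 * n_loci allele positions is either uppercase (1)
    # or lowercase (0); all combinations are equally likely in the F2.
    for alleles in product((0, 1), repeat=2 * n_loci):
        counts[base + per_allele * sum(alleles)] += 1
    return dict(sorted(counts.items()))


for n_loci in (1, 2):
    print(n_loci, "locus/loci:", f2_yield_counts(n_loci))
```

With one locus the counts are 1 : 2 : 1 for 500, 560 and 620 kg/plot, and with two loci they are 1 : 4 : 6 : 4 : 1 across the five classes from 500 to 620 kg/plot.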

Extending in the same manner one more time, the frequency distribution of the phenotypic classes in the $F_2$ generation, when three genes with equal additive effects segregate independently, is:

                        aaBbCC
                        aAbBcC
                        aAbBCc
                        aAbbCC
                        AabbCC
                aabbCC  AabBcC  AABBcc
                aabBcC  AabBCc  AABbCc
                aabBCc  AaBbcC  AABbcC
                aaBbcC  AaBbCc  AAbBCc
                aaBbCc  AaBBcc  AAbBcC
                aaBBcc  aabBCC  AAbbCC
                aAbbcC  aaBBcC  AaBBCc
                aAbbCc  aaBBCc  AaBBcC
                aAbBcc  aABbcC  AaBbCC
        aabbcC  aABbcc  aABbCc  AabBCC  AABBcC
        aabbCc  AabbcC  aABBcc  aABBCc  AABBCc
        aabBcc  AabbCc  AAbbcC  aABBcC  AAbBCC
        aaBbcc  AabBcc  AAbbCc  aABbCC  AABbCC
        aAbbcc  AaBbcc  AAbBcc  aAbBCC  aABBCC
aabbcc  Aabbcc  AAbbcc  AABbcc  aaBBCC  AaBBCC  AABBCC
500     520     540     560     580     600     620     (kg/plot)

In this case each single uppercase allele adds only 20 kg/plot to the base weight of 500 kg/plot. This is determined in the same way as for the one and two gene models, although it is considerably more involved. You should note once more that the $F_1$ would have had a yield potential of 560 kg/plot, exactly the same as in the single and two gene cases.

Even with only three loci and two alleles at each locus, it should be obvious that we are moving closer to a shape resembling a normal distribution. When four, five and six loci are considered, the possible genotypes fall into 9, 11 and 13 phenotypic classes, respectively (two more than twice the number of loci, less one). It is fairly easy to visualize, therefore, that with only a modest number of loci, with segregating alleles acting in a more or less equal additive way, truly continuous variation in a character would be approximated quite closely. Quantitative inheritance deals with many loci and alleles, often too many to consider trying to estimate the number, and therefore explains the ubiquity of the normal distribution. Just as the mean and variance can describe the normal distribution, many of the important elements of the inheritance of a character can be described and explained using progeny means and genetic variances.
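The way the number of phenotypic classes grows with the number of loci can be sketched without listing every genotype. Under the same assumptions as above (two alleles per locus, equal additive effects, independent segregation), the number of allele combinations carrying $k$ uppercase alleles out of $2n$ is a binomial coefficient. The function name below is hypothetical.

```python
from math import comb


def f2_class_frequencies(n_loci):
    """Relative frequency of each F2 phenotypic class, indexed by the
    number of uppercase alleles (0 to 2 * n_loci)."""
    total = 4 ** n_loci  # number of equally likely allele combinations
    return [comb(2 * n_loci, k) / total for k in range(2 * n_loci + 1)]


for n_loci in range(1, 7):
    freqs = f2_class_frequencies(n_loci)
    print(f"{n_loci} loci: {len(freqs)} classes, "
          f"most common class = {max(freqs):.3f} of the population")
```

A model with $n$ loci therefore has $2n + 1$ classes, and as $n$ increases each class holds a smaller share of the population, so the stepped histogram comes to resemble a smooth bell-shaped curve.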