Chapter 9
Contemporary Approaches in Plant Breeding

9.1 Introduction

In Chapter 7, selection was introduced and the different forms of selection imposed in plant breeding were described. However, plant breeders increasingly use biotechnology protocols, including techniques that work at the molecular level, to help select and multiply plants, and a range of such techniques is now available. These have been developed over the last 40 years and have made, or are starting to make, significant contributions to the production of new cultivars. They include in vitro propagation and genetic markers (mainly molecular markers), which are increasingly used in breeding programmes to supplement the more traditional selection methods already described.

9.2 Tissue culture

A variety of techniques have been developed under the broad umbrella of tissue culture. It is not the intention to cover the details of these techniques but to briefly consider a couple of them, enough to give an idea of their application.

9.2.1 Doubled haploids

Establishing true-breeding, homozygous lines (as noted earlier) is an essential part of developing new cultivars in many crop species. These homozygous lines are used either as cultivars in their own right or as parents in hybrid variety development. Traditionally, plant breeders have used inbreeding, the process of selfing or mating between close relatives, to achieve homozygosity, a process that is time-consuming. The opportunity to produce plants from gametic, haploid cells has therefore been the goal of many plant breeders, since this technique can produce ‘instant’ inbred lines once the chromosomes of the haploids are doubled. The time needed to produce truly homozygous lines can be significantly reduced, which in turn can shorten the time needed to develop new cultivars. The additional cost and complexity of producing doubled haploid lines can often be justified by the faster breeding cycle they enable.

The genetic phenomenon critical to obtaining homozygous lines is the formation of haploid gametes by meiosis. During this type of cell division the chromosome number is halved, and each chromosome is represented only once in each cell (assuming the species is basically a diploid one). If such gametic, haploid cells can be induced to develop into plantlets (i.e. they are switched from their normal gametophytic development to sporophytic development; note that in many lower plants a free-living haploid phase is a normal part of the life cycle), a haploid plant can develop. This plant can then be treated (usually with a chemical called colchicine) to double its chromosomes, producing a completely homozygous genotype (a doubled haploid).

From a quantitative genetics perspective, the main advantage of doubled haploid plants is the elimination of heterozygous individuals, as doubling the chromosomes of haploid cells produces only homozygous individuals (i.e. either AA or aa, using a single locus as an example). Because the frequency of recessive homozygous individuals is doubled, from the 25% found in an F2 population to 50% in a doubled haploid population, the probability of identifying superior breeding material is increased accordingly.
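The advantage grows quickly when several loci are considered together. Assuming n unlinked loci, each segregating for two alleles, the chance that a line carries one specified homozygous genotype (e.g. aa at every locus) is (1/4)^n in an F2 but (1/2)^n in a doubled haploid population derived from the F1. A minimal Python sketch of this arithmetic (the function names are illustrative only):

```python
from fractions import Fraction

def p_target_f2(n_loci: int) -> Fraction:
    """Chance that an F2 plant is homozygous for a specified allele at all n unlinked loci."""
    return Fraction(1, 4) ** n_loci

def p_target_dh(n_loci: int) -> Fraction:
    """Same chance for a doubled haploid line derived from F1 gametes."""
    return Fraction(1, 2) ** n_loci

for n in (1, 3, 5):
    f2, dh = p_target_f2(n), p_target_dh(n)
    # The doubled haploid advantage is a factor of 2**n
    print(f"{n} loci: F2 1/{f2.denominator}, DH 1/{dh.denominator}, ratio {dh / f2}")
```

With five loci, for example, one F2 plant in 1024 is expected to carry the target genotype, compared with one doubled haploid line in 32.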

Techniques used for producing haploids in vitro

Although the generation of doubled haploids is a very attractive technique to many plant breeders, haploid plants occur only rarely in nature. However, plant tissue culture has allowed plants to be produced from gametic cells cultured in vitro.

Even though haploid plants can be regenerated from both male and female sex cells, it is generally the male cells (microspores or pollen) that have proven most successful in the regeneration of large numbers of haploid and doubled haploid lines. This is partly because pollen is much easier to collect than eggs, and partly because, in general, many more pollen grains than eggs are produced.

There are, of course, exceptions, and some examples include:

  • The relative ease with which haploid barley plants can be produced from female sex cells. Interspecific crosses between cultivated barley (Hordeum vulgare) and the wild species H. bulbosum, followed by in vitro culture of rescued immature embryos, result in haploid plants because the H. bulbosum chromosomes are eliminated during embryo development.
  • Dihaploids from tetraploid potatoes have been produced in large quantities using interspecific hybridization between cultivated potato (Solanum tuberosum) and a diploid relative (S. phureja). The cross of tetraploid S. tuberosum as female with diploid S. phureja as male would be expected to produce only triploid offspring, but it does not. Instead, relatively few seeds are obtained, and these are predominantly tetraploid (as a result of the production of unreduced (2n) pollen by S. phureja). Among the rest are some of the expected sterile triploids, but also some maternal dihaploids arising from the egg. Lines of S. phureja have been selected that produce a high frequency of dihaploid seed, greater than 70%. In addition, such pollen parents have been selected to carry a homozygous dominant embryo spot marker, which makes visual identification of the non-dihaploid seed easy.
  • In bread wheat, pollination with maize pollen (an intergeneric wide cross) induces the production of haploid plants from female sex cells, which can be recovered as full, fertile plants through a combination of tissue culture and chromosome doubling with colchicine.
  • In maize, the crossing of inbreds with haploid inducer genotypes has recently enabled the cost-effective development of doubled haploid lines in an increasing number of both private and public breeding programmes.

There are other haploid induction mechanisms, but the most widely applicable are anther culture and microspore (immature pollen grain) culture in vitro. The anthers are, of course, the flower organs in which microspores mature into pollen grains under normal conditions (i.e. in vivo). The production of haploid plants from anther culture has been reported for over 200 species of higher plants. However, although the technique offers great potential for plant breeding, examples of its application on a large, practical scale are limited; some are provided by commercial programmes in rice, wheat, barley, rye, canola, tobacco, potato, pepper and maize.

Probably the most successful examples of the use of doubled haploid plants are commercial programmes in canola and maize, which use them on a regular and increasing basis. The approaches most commonly deployed to obtain doubled haploid plants in canola and maize are microspore culture and crossing with haploid inducer stocks, respectively.

9.2.2 Some potential issues

Genotype dependence

One factor that has limited the use of anther culture in practical plant breeding programmes is that even the different variants of the protocol often show strong genotype dependence. A protocol that is effective for one genotype therefore often needs to be modified (sometimes substantially) to succeed with another genotype or, more to the point, with a range of genotypes.

Somaclonal variation

The techniques noted above involve producing plants that have been regenerated following in vitro culture. Variation can often be detected among such regenerated plants, and this variation has been termed somaclonal variation. Its frequency has been suggested to reflect the occurrence and length of a callus phase. In a haploid production scheme it is therefore essential that callus stages are kept to a minimum, so that somaclonal variation is kept below an acceptable threshold.

Non-random recovery of haploid lines

An important requirement underlying the application of haploids in a plant breeding context is that the population of homozygous lines derived from chromosome doubling of the haploids is a random sample of the possible gametic array. In other words, unconscious selection (effectively gametic selection) must be avoided. The genetic combinations recovered from haploid systems may be disproportionately composed of combinations from one of the original parents used to make the hybrid from which the anthers were taken. An obvious possibility is that one parent has a much greater propensity for regeneration in culture, or is more responsive to plant growth regulators; the genes responsible for its response, and the chromosome segments linked to them, would then be over-represented among the regenerated lines. Experimental studies have shown that non-random recovery of gametic combinations can occur and can be influenced by the culture protocol used.
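One common way to screen a doubled haploid population for this kind of distortion is to test each marker for departure from the 1:1 ratio of the two parental alleles expected in a random sample of F1 gametes. A minimal sketch, assuming the marker has already been scored in every line (the counts are invented for illustration):

```python
import math

def segregation_chi2(count_a: int, count_b: int) -> tuple[float, float]:
    """Chi-square test of a 1:1 ratio between the two parental alleles at one marker.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts of parent-1 and parent-2 alleles among 200 doubled haploid lines
chi2, p = segregation_chi2(128, 72)
print(f"chi2 = {chi2:.2f}, p = {p:.6f}")
```

A significant result at several adjacent markers would point to a chromosome region favoured during culture, although distortion can also have other causes, so such a test flags candidates rather than proving gametic selection.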

Constraints on recombination opportunities

As discussed in previous chapters, recombination is the main engine by which new genetic combinations and genetic variation are created, and it is therefore of paramount importance to plant breeders. A traditional, pedigree-based breeding programme offers several rounds of meiosis in which chromosomes can recombine, whereas doubled haploid lines are in many cases derived from the gametes of F1 plants, so only a single meiosis contributes. This could constrain the ability of a breeding programme to create genetic variation, but the constraint can be addressed in several effective ways, such as inducing doubled haploid plants from heterozygous F2 plants instead of F1 plants.
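The size of this constraint can be illustrated for two linked loci with recombination frequency r per meiosis. Doubled haploids derived from F1 gametes capture a single meiosis, so the expected proportion of recombinant lines is simply r. Lines taken to homozygosity by repeated selfing accumulate further opportunities for recombination, and the classical expectation for such recombinant inbred lines is 2r/(1 + 2r). A short sketch comparing the two (function names are illustrative):

```python
def dh_recombinant_fraction(r: float) -> float:
    """Expected proportion of recombinant lines for doubled haploids from F1 gametes (one meiosis)."""
    return r

def ril_recombinant_fraction(r: float) -> float:
    """Expected proportion of recombinant lines after selfing to homozygosity,
    using the classical recombinant inbred line result R = 2r / (1 + 2r)."""
    return 2 * r / (1 + 2 * r)

for r in (0.01, 0.05, 0.10, 0.25):
    print(f"r = {r:.2f}: DH from F1 {dh_recombinant_fraction(r):.3f}, "
          f"selfed lines {ril_recombinant_fraction(r):.3f}")
```

For tightly linked loci the selfed lines show almost twice as many recombinants, which is the gap that deriving doubled haploids from F2 plants partly closes.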

Practical applications of haploids

Progress in evaluating gametic-derived plants under field conditions has increased dramatically, and reports from numerous crops have underlined the importance of continued research in this area. A previously unforeseen advantage of doubled haploid plants has been an improvement in the quality of field data in hybrid crops evaluated by testcrossing (Section 4.6.3), because only homozygous genetic classes need to be tested in the field. In maize, for example, traditional testcross generations derived from F2 populations comprised a mixture of homozygous and heterozygous genotypes, whereas doubled haploid testcross generations derived from an F2 population carry only the two homozygous genotypes at each locus, lacking the heterozygous class.

It has been suggested that developing haploids in a practical breeding scheme will not be as effective as might be expected. In particular, concerns have been raised regarding:

  • The cost of producing haploids.
  • The inability to easily produce large numbers of homozygous lines through haploidy.
  • The deleterious variation that is sometimes exposed as a result of recessive alleles in the original material or mutational/somaclonal variation induced as a result of the in vitro techniques.
  • The dependence on the genotype of the parental material used in influencing the frequency of haploids produced – which often means that the very material the breeder most wants to use is non-responsive and haploids are not easily obtained.

Nevertheless, as methods and protocols are refined, haploids are likely to become easier and cheaper to produce on a routine basis, and their impact on plant breeding programmes will then be greater. To date, very few cultivars have been introduced as a direct result of haploidy (those produced in China perhaps being the exception). However, there is little doubt that these techniques have added valuable information for plant breeders on a number of aspects of genetics and tissue culture.

As noted earlier, one limitation to the widespread use of doubled haploids in many crops is the inability to produce large enough numbers of plants from culture. Regeneration frequencies are improving continuously, however, which will not only widen the range of species in which the technique can be applied but also increase the potential for using it in other ways. For example, deliberately applying positive selection pressure for certain characteristics during the culture phase (in vitro selection) will become even more attractive. This might also be combined with induced mutagenesis during microsporogenesis, for example to produce novel resistance to fungal or bacterial pathogens or to herbicides.

9.2.3 In vitro multiplication

In vitro multiplication of breeding lines can have two main benefits (particularly in clonal species) in relation to plant breeding programmes:

  • Plants propagated in vitro can generally be established free of disease, and can be used to: help maintain stocks of breeding lines; facilitate long-term germplasm storage; facilitate international exchange of material; and reduce the length of quarantine periods.
  • Short ‘generation’ times and fast growth mean that rapid increases in plant numbers can readily be achieved.

Both the above have particular importance to clonal crops, which tend to have a relatively low multiplication rate as a result of their vegetative mode of propagation and which are particularly susceptible to viral and bacterial diseases that tend to be multiplied and transmitted through each clonal generation.
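The speed of in vitro increase comes from its geometric nature: if every plantlet is subcultured and each produces several new shoots per cycle, numbers multiply at every cycle. A minimal sketch, assuming a constant multiplication rate and no losses (the rate of 4 shoots per cycle is a hypothetical figure, not one taken from any particular crop):

```python
def plantlets_after(cycles: int, rate: int, start: int = 1) -> int:
    """Plantlets after a number of subculture cycles, assuming every plantlet
    is subcultured and multiplies by `rate` in each cycle."""
    return start * rate ** cycles

# Hypothetical multiplication rate of 4 shoots per explant per cycle
for cycles in range(1, 7):
    print(cycles, plantlets_after(cycles, rate=4))
```

Under these assumed figures a single explant gives 4096 plantlets after six cycles; real rates vary widely with species, genotype and protocol, and losses at each cycle would reduce the total.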

Good examples of maintaining high health status and offering rapid plant regeneration include potato and strawberry. Other, perhaps less well-developed, examples include in vitro propagation of date and oil palms. In these crops rapid plant regeneration was found to offer an alternative to the slow and lengthy process of propagating side shoots in date palm, and more uniform planting material in oil palm. However, in date palm the process is still very genotype dependent, and in oil palm the initial protocols produced an unacceptably high frequency of sterile palms; these protocols are now being revised and would appear to offer practical possibilities.

9.3 Molecular markers in plant breeding

Although plant breeders have successfully practised their art for many centuries, genetics is a subject that really only ‘came of age’ in the twentieth century with the rediscovery of Mendel's work. Since then research in genetics has covered many aspects of the inheritance of qualitative and quantitative traits, but plant breeders usually still have little, or no, information about:

  • the locations of many of these loci in the genome or on which chromosome they reside;
  • the number of loci involved in any trait;
  • the relative size of the contribution of individual alleles at each locus to the observed phenotype, except where there is an obvious major effect (e.g. dwarfing genes affecting height).

9.3.1 Theory of using markers

The concept of associating easily visualized markers in plants with loci affecting qualitative and quantitative variation in traits of interest is not new; it was first proposed by Karl Sax in 1923, while studying the association between seed colour and seed size in common beans. Since then a variety of contributions have been made to the general concept and theory of using mapped genetic markers for identifying, locating and manipulating genes of specific interest. The basic idea is straightforward. If a trait is difficult to score for whatever reason (e.g. it shows continuous variation, its assessment is detailed and time-consuming, or it is only expressed after several years of growth), an easily scored marker determined by a locus genetically linked to the one affecting the trait would be an attractive surrogate way to monitor the locus of interest.

The concept, therefore, is to use the marker locus as a point of reference for the chromosomal segment in the vicinity of the gene that is really of interest. The approach requires that alternative alleles at the marker locus are associated, through linkage, with the different alleles at the locus of real interest, thus effectively marking the sections of the homologous chromosomes that carry the alleles determining the particular expression of the trait we are trying to select.

The association of these marked chromosome segments with the expression of specific quantitative characters can be evaluated while other chromosomal regions in the same individuals vary at random. The aim, therefore, is to find molecular markers closely associated with the loci determining desirable expression of polygenic characters such as yield or quality, so that selection can be based on these markers rather than on phenotypic observations.

The segregating nature of F2 populations (produced by selfing an F1 from a cross between two homozygous inbred lines) often makes this generation ideal for studying quantitatively inherited characters. Investigations have also been carried out using BC1 generations, although the information obtained is likely to be reduced (to approximately half) compared with that obtainable from F2s. In species where doubled haploid populations are available, such as canola and maize, they have been used extensively in the search for molecular markers for selection schemes, because of the reduced genetic complexity they represent.
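The simplest version of this approach, single-marker analysis, compares the trait means of lines carrying the alternative marker alleles. A minimal sketch for a doubled haploid population, in which every line is homozygous at the marker; the genotype codes 'A' and 'B' and the data are invented, and in practice a significance test (or interval mapping across many markers) would accompany the estimate:

```python
from statistics import mean

def single_marker_effect(genotypes, phenotypes):
    """Single-marker analysis in a doubled haploid population.

    genotypes: marker scores, 'A' = parent-1 allele, 'B' = parent-2 allele
    phenotypes: trait values for the same lines, in the same order
    Returns (mean of A lines, mean of B lines, half the difference between them).
    """
    a_vals = [p for g, p in zip(genotypes, phenotypes) if g == "A"]
    b_vals = [p for g, p in zip(genotypes, phenotypes) if g == "B"]
    mean_a, mean_b = mean(a_vals), mean(b_vals)
    return mean_a, mean_b, (mean_a - mean_b) / 2

# Hypothetical data: 8 doubled haploid lines scored at one marker, yield in t/ha
genotypes = ["A", "A", "B", "A", "B", "B", "A", "B"]
yields = [6.1, 5.9, 5.2, 6.3, 5.0, 5.4, 6.0, 5.1]
print(single_marker_effect(genotypes, yields))
```

If the marker lies at recombination frequency r from the QTL, the half-difference estimates (1 - 2r)a rather than the full additive effect a, so effects are underestimated unless the marker is tightly linked.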

With an adequate number of uniformly spaced markers, it is possible to identify and characterize linkage groups, which correspond to the chromosomes. It is also possible to construct a genetic map (a graphical array of ordered molecular markers along each chromosome, together with the genetic distances between them, expressed in centiMorgans) detailed enough that all major genetic factors associated with a quantitative trait can be located rather easily; by following the presence or absence of the different alleles, their individual and interactive effects can then be described.

Markers in plants could assist plant breeders in developing a better understanding of the genes underlying characters of interest, as well as providing breeders and geneticists with a powerful approach for mapping and manipulating individual loci associated with the expression of these traits. In addition, if the marker loci are tightly linked to loci controlling other qualitative or quantitative characters, then much of the selection in a plant breeding scheme could be carried out by identifying a specific set of alleles at the marker loci.

The ability to identify loci that affect specific quantitative traits (termed quantitative trait loci, QTL) should not only allow these loci to be handled in a much more deterministic manner, but also provide a more powerful means of investigating epistasis, pleiotropy and the genetic basis of heterosis. The effective use of mapped genetic markers therefore enables advances in cultivar development and selection procedures.
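Map distances are derived from recombination frequencies between pairs of markers. Because recombination frequency is not additive over long intervals (it cannot exceed 0.5), a mapping function converts it to centiMorgans: Haldane's function assumes no crossover interference, while Kosambi's allows for some. A sketch, assuming a doubled haploid population in which recombinant lines can be counted directly (the counts are hypothetical):

```python
import math

def recombination_fraction(recombinants: int, total: int) -> float:
    """Estimate r in a doubled haploid population as the share of lines
    carrying a non-parental combination of alleles at two markers."""
    return recombinants / total

def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (no crossover interference); valid for r < 0.5."""
    return -50 * math.log(1 - 2 * r)

def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM (allows some interference); valid for r < 0.5."""
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))

# Hypothetical: 20 recombinant lines out of 200 doubled haploids
r = recombination_fraction(20, 200)
print(f"r = {r:.3f}, Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
# prints: r = 0.100, Haldane 11.16 cM, Kosambi 10.14 cM
```

For small r both functions give distances close to 100r cM; they diverge as markers get further apart.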

Genetic markers associated with the expression of morphological characters have been used in plants for a long time, and marker maps have been assembled. These maps are quite well developed in a number of species (e.g. wheat, maize, potato, barley, peas and tomatoes, and also forest species), but they have generally had rather limited use because of the difficulty of finding or generating such markers and their genotype-specific nature. This is changing very rapidly, though, because of:

  • The dramatic reduction in the cost of molecular data as a result of technological progress. A molecular data point today costs at most one-hundredth of what it did 10 years ago.
  • The progress made in developing breeding approaches based on molecular markers in animal breeding, which are being readily deployed in plants.
  • The partial or complete sequencing of the genomes of many important crop species, which has provided a massive number of molecular markers. For instance, sequence analysis of a diverse set of maize inbred lines yielded over 3 million SNPs (single nucleotide polymorphisms).

The characteristics of a ‘good’ marker system are:

  • the marker phenotypes are easy, quick and inexpensive to score;
  • the markers are neutral in terms of their phenotypes, and so have no deleterious effects on fitness and no effects on any other traits, including undesirable epistatic interactions with any other traits;
  • they reveal a high level of polymorphism (i.e. allelic richness);
  • they are robust enough so they can be readily transferred between laboratories and researchers in diverse countries;
  • they are stable in expression over environments;
  • they can be assessed early in the development of the plant (at the seedling stage) and/or in tissue culture, and require little plant material as a source of DNA. Thus evaluation is possible without the need to grow a plant for months or even years before it can be scored. Indeed, a molecular marker that could be assessed in just a fraction of an embryo without compromising the viability of the remaining embryo would enable the selection of only those seeds carrying appropriate alleles, and remove the need to plant all those seeds carrying alternative alleles. The use of embryo fractions in plant breeding is not new: in the 1960s and 1970s it was used very successfully in Canada to identify canola breeding lines lacking erucic acid in their fatty acid profiles;
  • the scoring should be non-destructive, so that desirable individuals can be selected and grown to maturity;
  • codominance in expression of the alternative alleles, so that heterozygotes can be distinguished from dominant homozygotes. However, the increasing use of doubled haploid breeding approaches in crops such as maize and wheat makes this otherwise important requirement less critical, since doubled haploid populations contain no heterozygotes.
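The cost of dominance can be seen by listing what a breeder can actually score in an F2 derived from an F1 heterozygous at a marker. A small sketch (the allele names M and m are illustrative):

```python
from collections import Counter
from itertools import product

def f2_marker_classes(codominant: bool) -> Counter:
    """Expected F2 marker classes (out of 4 equally likely gamete unions) from an Mm F1.

    A dominant marker scores only presence ('+') or absence ('-') of allele M,
    whereas a codominant marker reveals the full genotype.
    """
    classes = Counter()
    for g1, g2 in product("Mm", repeat=2):  # four equally likely gamete combinations
        genotype = "".join(sorted(g1 + g2))  # 'MM', 'Mm' or 'mm'
        if codominant:
            classes[genotype] += 1
        else:
            classes["+" if "M" in genotype else "-"] += 1
    return classes

print(f2_marker_classes(codominant=False))  # two classes in a 3:1 ratio
print(f2_marker_classes(codominant=True))   # three classes in a 1:2:1 ratio
```

With a dominant marker the MM and Mm plants fall into the same class; in a doubled haploid population only MM and mm occur, so both scoring modes carry the same information.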

9.3.2 Types of marker systems

Any type of genetic marker that has the above properties (or many of them) may be suitable for marker-based applications in the investigation and manipulation of quantitative traits, but the question is really: how closely do they conform to the ideal requirements given above?

The types of markers that can and have been used in plant breeding include:

  • Morphological markers, which are basically those seen simply by looking at a plant's phenotype, including characters such as pigmentation, dwarfism, leaf shape, absence of petals, and so on. It is possible, of course, to choose ones that are easily scored, but the difficulties with morphological markers are that they: cannot always be scored early in development (e.g. flower colour); are often associated with deleterious effects (e.g. albinism); are often relatively rare; are not always expressed independently of the environment in which the plants grow; and often show dominance.
  • Biochemical markers, such as isozyme markers. Isozymes (an abbreviation for isoenzymes) are variant forms of an enzyme which are functionally identical but can be distinguished by electrophoresis (in other words, when placed in an electric field). Under these circumstances the different forms of the enzyme migrate different distances depending on their charge, size and shape. Isozymes have been used very successfully in certain aspects of plant breeding and genetics since they: generally appear to be nearly neutral in their effects on fitness; are rarely associated with undesirable phenotypic effects on other traits; are usually free of environmental influence; and can often be extracted from tissue early in development. So they have a number of inherent properties that allow them to be used effectively for characterizing, and selecting for, qualitative and quantitative characters. Unfortunately, the number of genetic markers provided by isozyme assays is limited, and they can be either codominant or dominant in expression. As a result, the use of isozymes as genetic markers did not allow the full potential of genetic mapping to be realized.

  • Molecular markers, which detect genetic variation directly at the DNA level. There are basically two systems by which molecular markers are generated, and these need to be described briefly to allow an understanding of their application, although a thorough description is beyond the scope of this book. The two systems can conveniently be classified as non-PCR-based methods and PCR-based methods. Before briefly describing each, it is worth pointing out that molecular markers are simply differences in the DNA between individuals, groups, species, taxa, and so on. Clearly, the type and level of DNA variation we want to examine depends on the level of distinction we are interested in and the questions we are trying to answer.

In reality, the practical impact of morphological and biochemical markers has been negligible because of their constraints, and molecular markers are the most appropriate option available for breeding programmes.

Given these characteristics of molecular markers, particularly their virtually unlimited numbers, it is no surprise that the possibilities opened up by molecular markers in the 1990s were greeted with some excitement, and they are seen as a major change in the potential to exploit the ideas for using markers advocated some 70 years earlier.

9.3.3 Molecular markers

Non-PCR methods – DNA/DNA hybridization

The first and most widely known of these is restriction fragment length polymorphism (RFLP), originally developed in human genetics. Other non-PCR methods do exist, for example the use of tandemly repeated regions of DNA, known as mini-satellites or micro-satellites, but these will not be described here.

RFLP analysis involves digesting the DNA into fragments with restriction enzymes, which cut DNA at sites with very specific sequences (different enzymes recognize different sequences). The fragments are then separated by gel electrophoresis (as for isozymes, according to their differing mobilities in an electric field). To visualize their positions, they are ‘blotted’ onto a filter, where they are hybridized with a labelled (usually radioactive) ‘probe’. The probe is a fragment of DNA, which may come from a known gene, an expressed sequence or an unknown part of the genome. The blotted DNA is first denatured into single strands (rather than the usual double-stranded state of DNA), as is the probe; when the two are brought together, the probe hybridizes (by hydrogen bonding) wherever it finds a complementary sequence. The filter is then washed to remove excess probe, leaving only probe bound to the sample DNA. If the filter is exposed to X-ray film, the developed film shows where the probe remains, and hence where the sample contained DNA with a sequence complementary to the probe. The pattern of bands obtained in this way is called the restriction fragment pattern. Using different combinations of enzymes and probes gives a wide range of possibilities for exposing variation in DNA sequences.
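The origin of the length polymorphism can be illustrated with a toy in-silico digest. A point mutation that destroys a restriction site merges two fragments into one longer fragment, so a probe that hybridizes within that region detects a band of a different size. A sketch assuming the EcoRI recognition site (GAATTC, cut after the G) and two invented allele sequences:

```python
import re

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
CUT_OFFSET = 1         # EcoRI cuts between G and AATTC

def digest(seq: str, site: str = ECORI_SITE, offset: int = CUT_OFFSET) -> list[int]:
    """Return fragment lengths (bp) after cutting seq at every occurrence of site."""
    cuts = [m.start() + offset for m in re.finditer(site, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Two hypothetical alleles of the same 76 bp region; in allele_2 a point
# mutation (GAATTC -> GACTTC) has destroyed the second EcoRI site.
allele_1 = "A" * 20 + "GAATTC" + "T" * 30 + "GAATTC" + "C" * 14
allele_2 = "A" * 20 + "GAATTC" + "T" * 30 + "GACTTC" + "C" * 14

print(digest(allele_1))  # [21, 36, 19]
print(digest(allele_2))  # [21, 55]
```

A probe matching the middle of the region would light up a 36 bp band for allele_1 and a 55 bp band for allele_2, and a heterozygote would show both, which is why RFLPs are codominant. Real fragments are typically hundreds to thousands of base pairs long.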

RFLPs are highly reproducible, they show codominance in their expression and are reliably specific. However, they are relatively time-consuming, rather expensive, not easy to automate, require fairly large amounts of ‘clean’ DNA, and tend to use radioactive probes for best results. Although they are extremely useful for detailed genetic analyses, they are not well suited to the needs of breeding programmes.

PCR-based methods – arbitrarily primed techniques – multi-locus systems

The most commonly used approach in the past was randomly amplified polymorphic DNA (RAPD). The technique basically involves using a single ‘arbitrary’ primer, a 10-nucleotide-long DNA sequence, in a PCR reaction. A key ingredient of the PCR reaction is DNA polymerase, an enzyme that synthesizes a new DNA strand complementary to a template strand; a thermally stable DNA polymerase, commonly Taq polymerase, is used. The primer anneals to complementary sequences in the DNA sample being studied and ‘primes’ the DNA polymerase to start DNA synthesis. A product is amplified only where the primer anneals to both strands at sites that face each other and lie close enough together (generally within a few thousand base pairs). These amplification products can be resolved on agarose gels.

The advantages of RAPDs are that they require only small amounts of relatively crude DNA; they need only modest, widely available equipment (a thermal cycler and electrophoresis apparatus); and no prior knowledge of the gene or DNA sequence is required. The technique is also fast and relatively inexpensive. However, the results can vary considerably with slight changes in PCR conditions or ingredients, which makes transfer between laboratories less than ideal, and RAPD markers are dominant. Despite the early excitement and their very low cost, RAPDs are very rarely used on a regular basis in breeding applications.

More reliable methods have since been developed, including amplified fragment length polymorphism (AFLP), which is not only more repeatable but also reveals many more markers per assay, and inter-simple sequence repeats (ISSRs, or anchored microsatellites). However, the details of these are beyond our present remit. AFLP analysis includes a DNA digestion step with restriction enzymes, which tends to preclude its routine use as a high-throughput marker system in breeding programmes.
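The logic of single-primer amplification, and why it yields dominant markers, can be sketched with exact string matching. This is a simplified model: the 10-mer primer and template sequences are invented, and real RAPD primers also anneal with mismatches, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_all(seq: str, motif: str) -> list[int]:
    """Start positions of every exact occurrence of motif in seq."""
    hits, pos = [], seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits

def rapd_products(template: str, primer: str, max_len: int = 3000) -> list[int]:
    """Sizes of products amplified by a single primer (exact matches only).

    A product needs a primer site read on the top strand (primer extends rightwards)
    followed, within max_len bp, by a reverse-complement site (primer annealing to
    the top strand and extending leftwards).
    """
    forward = find_all(template, primer)
    reverse = find_all(template, reverse_complement(primer))
    sizes = []
    for f in forward:
        for r in reverse:
            size = r + len(primer) - f
            if r > f and size <= max_len:
                sizes.append(size)
    return sorted(sizes)

PRIMER = "GTGACGTAGG"  # hypothetical arbitrary 10-mer
individual_1 = "A" * 50 + PRIMER + "T" * 200 + reverse_complement(PRIMER) + "A" * 50
# In individual_2 a mutation has altered the second primer site
individual_2 = "A" * 50 + PRIMER + "T" * 200 + "CCTAAGTCAC" + "A" * 50

print(rapd_products(individual_1, PRIMER))  # [220]
print(rapd_products(individual_2, PRIMER))  # []
```

The variant allele simply lacks the band, so a heterozygote carrying one copy of each allele looks identical to individual_1: the marker is scored as present or absent, i.e. it is dominant.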

PCR methods – site targeted techniques – single locus systems

Rather than using arbitrary primers, it is possible to design primers specifically for use in PCR. There are a number of ways to do this, but one approach targets simple sequence repeats (SSRs), also known as microsatellites. Microsatellites are short DNA motifs (for example CA) repeated in tandem; they are ubiquitous throughout the genome and are generally highly variable in the number of repeat units. The DNA ‘flanking’ each repeat differs depending on where it lies in the genome (i.e. the site at which it is found is unique). So one can ‘fish’ for repeats in genome libraries cloned in E. coli, using simple repeats as probes, then sequence the positive clones and design PCR primers in the unique flanking sequences, so that each primer pair amplifies the repeat at a single locus. This generates much more robust markers while retaining all the advantages of PCR technology. SSRs have been used extensively in breeding programmes and genetic research, although they are rapidly being replaced by single nucleotide polymorphisms (SNPs, pronounced “snips”). SNPs represent an outstanding source of molecular markers because they exploit variation directly at the level of the DNA sequence: when aligned DNA sequences of two individuals are compared, nucleotide variants such as single point mutations or indels (insertions/deletions) can be detected, and very specific PCR assays can be developed for them. SNPs are far more abundant than any other known type of molecular marker; in maize, sequencing of a diverse set of inbreds found that 1 base pair in every 44 was polymorphic.

SNPs are robust, amenable to automation, and able to uncover more genetic variation than any previously developed molecular marker system. In addition, their codominant nature increases the genetic insight they provide. They are typically scored on automated platforms, such as genotyping arrays or DNA sequencers, rather than on gels, which further contributes to their reliability and transferability among laboratories. Thousands of SNPs are now publicly available for the crops most important to humankind.
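Because the primers sit in unique flanking sequence, each primer pair amplifies one locus, and alleles differ in amplicon length according to how many repeat units they carry. A toy sketch with invented flanking sequences and a (CA)n repeat, showing why SSRs are codominant (the flank sequences and function names are hypothetical):

```python
import re

LEFT_FLANK = "TTGACCTAGG"   # hypothetical unique sequence carrying the forward primer site
RIGHT_FLANK = "GGATCCTTGA"  # hypothetical unique sequence carrying the reverse primer site
MOTIF = "CA"

def ssr_allele(n_repeats: int) -> str:
    """Amplicon for an allele with n tandem copies of the motif between the flanks."""
    return LEFT_FLANK + MOTIF * n_repeats + RIGHT_FLANK

def repeat_count(amplicon: str, motif: str = MOTIF) -> int:
    """Length, in motif units, of the longest uninterrupted run of the motif."""
    runs = re.findall(f"(?:{motif})+", amplicon)
    return max((len(run) // len(motif) for run in runs), default=0)

def bands(n_allele_1: int, n_allele_2: int) -> list[int]:
    """Distinct amplicon sizes (bp) observed for a diploid genotype."""
    return sorted({len(ssr_allele(n_allele_1)), len(ssr_allele(n_allele_2))})

print(bands(12, 12))                 # homozygote: [44]
print(bands(12, 15))                 # heterozygote: [44, 50]
print(repeat_count(ssr_allele(15)))  # 15
```

The heterozygote shows both allele sizes, so it can be told apart from either homozygote.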
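The comparison of aligned sequences from two individuals, the starting point for SNP discovery, can be sketched as a column-by-column scan. This assumes the sequences have already been aligned to equal length with '-' marking gaps; the two sequences are invented, and a real pipeline would also merge adjacent gap columns into a single indel and filter out sequencing errors:

```python
def call_variants(seq_1: str, seq_2: str, gap: str = "-") -> list[tuple[int, str, str, str]]:
    """Compare two sequences already aligned to the same length.

    Returns (position, kind, base_1, base_2) for each differing column, where kind
    is 'SNP' for a substitution and 'indel' where one sequence carries a gap.
    """
    if len(seq_1) != len(seq_2):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for pos, (b1, b2) in enumerate(zip(seq_1, seq_2)):
        if b1 != b2:
            kind = "indel" if gap in (b1, b2) else "SNP"
            variants.append((pos, kind, b1, b2))
    return variants

# Hypothetical aligned fragments from two inbred lines
line_1 = "ATGCTAGCTTACG-ATCGA"
line_2 = "ATGCTGGCTTACGGATCCA"
for variant in call_variants(line_1, line_2):
    print(variant)
```

Each reported position is a candidate site for which a specific assay could then be designed.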

9.3.4 Uses of molecular markers in breeding programmes

Molecular markers can therefore be used to identify cultivars (DNA fingerprinting), to distinguish one cultivar from another (perhaps one already released), or to prove proprietary ownership of specific cultivars. With a modest set of markers it is possible to produce a DNA fingerprint that is unique (or nearly so) and can potentially be used to identify a particular genotype. On the same principle, it is possible to detect DNA that should not be present, and so to check that a cultivar is pure and free from contaminants. A further possibility is to assess how diverse genotypes are at the DNA level, and hence their level of difference (genetic distance), when choosing them as parents (e.g. parents of hybrid cultivars).
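A minimal sketch of fingerprinting and genetic distance, assuming inbred lines scored at biallelic SNP markers and coded 0 or 2 (copies of a reference allele). The simple proportion of differing markers used here is only one of several possible distance measures (Nei's and Rogers' distances are common alternatives), and the line names and genotypes are invented:

```python
from typing import Optional, Sequence

def marker_distance(line_1: Sequence[Optional[int]], line_2: Sequence[Optional[int]]) -> float:
    """Proportion of markers scored in both lines at which the lines differ.

    Missing scores (None) are skipped.
    """
    pairs = [(a, b) for a, b in zip(line_1, line_2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no markers scored in both lines")
    return sum(a != b for a, b in pairs) / len(pairs)

# Hypothetical fingerprints of three inbred lines at eight SNP markers
fingerprints = {
    "Line_A": [0, 2, 2, 0, 0, 2, 0, 2],
    "Line_B": [0, 2, 0, 0, 2, 2, 0, 2],
    "Line_C": [2, 0, 0, 2, 2, 0, 2, 0],
}

# Fingerprinting: every line should have a distinct marker profile
all_unique = len({tuple(v) for v in fingerprints.values()}) == len(fingerprints)
print("fingerprints unique:", all_unique)

names = list(fingerprints)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        print(x, y, marker_distance(fingerprints[x], fingerprints[y]))
```

Here Line_A and Line_C differ at every marker, so on this crude measure they would be the most contrasting pair of parents.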

Marker-assisted backcrossing

When a gene of interest can be shown to be linked to a molecular marker, scoring the marker can help to accelerate the backcrossing process. This is because mature plants need not be grown to identify which backcross individuals carry the allele of interest. The approach is particularly helpful where the trait, although determined by a major gene, is difficult or time-consuming to score, or is expressed only late in development (e.g. fruit colour). Molecular markers can also identify which backcross progeny have recovered the greatest proportion of the rest of the recurrent parent's genome (the genetic background). Marker-assisted backcrossing has enabled the development of trait integration approaches, in which a transgene is introgressed quickly and accurately into the elite lines of a breeding programme. Winter nurseries allow several crop generations to be grown within a calendar year. When marker-assisted backcrossing is combined with such nurseries, variety development can be dramatically accelerated.
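The two uses of markers in backcrossing can be sketched as follows:

  1. Foreground selection: keeping plants that carry the target allele.
  2. Background selection: ranking those plants by recovery of the recurrent parent genome.

The sketch makes the following assumptions:

  1. Each backcross plant is scored at one marker linked to the target gene and at several unlinked background markers.
  2. Calls are coded 'RR' (homozygous for the recurrent parent allele) or 'RD' (heterozygous, carrying the donor allele).
  3. The plant identifiers and calls are hypothetical.

For comparison, `expected_recovery` gives the standard expectation without selection, 1 − (1/2)^(t+1) for generation BC_t.

```python
def expected_recovery(bc_generation):
    """Expected recurrent-parent genome proportion in BC<n>, without selection.

    The F1 carries 50% donor genome and each backcross halves the donor share.
    """
    return 1 - 0.5 ** (bc_generation + 1)


def background_recovery(calls):
    """Observed recurrent-parent proportion from background marker calls ('RR' = 1, 'RD' = 0.5)."""
    score = {"RR": 1.0, "RD": 0.5}
    return sum(score[c] for c in calls) / len(calls)


def select_backcross_plants(plants, n_best):
    """Keep carriers of the donor allele at the target marker, ranked by background recovery."""
    carriers = [p for p in plants if p["target"] == "RD"]
    carriers.sort(key=lambda p: background_recovery(p["background"]), reverse=True)
    return [p["id"] for p in carriers[:n_best]]


# Hypothetical BC1 plants
bc1 = [
    {"id": "BC1-01", "target": "RD", "background": ["RR", "RD", "RR", "RR"]},
    {"id": "BC1-02", "target": "RR", "background": ["RR", "RR", "RR", "RR"]},  # lacks target allele
    {"id": "BC1-03", "target": "RD", "background": ["RD", "RD", "RR", "RD"]},
    {"id": "BC1-04", "target": "RD", "background": ["RR", "RR", "RR", "RR"]},
]
print(expected_recovery(1))             # 0.75
print(select_backcross_plants(bc1, 2))  # ['BC1-04', 'BC1-01']
```

Plant BC1-02 has the best background of all but is discarded, because it has lost the allele being introgressed.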

Molecular markers can also provide breeders with vital information about the legitimacy of any cross. This is particularly valuable in establishing whether a supposed wide (interspecific) cross is a rare genuine event or the result of an accidental illegitimate pollination. Indeed, when used together with cytogenetic information, markers can show very precisely which chromosomes, or parts of chromosomes, are present in interspecific hybrids or in generations derived from them. This application is critical in tree breeding programmes, where many years may pass between making a cross and field-testing its progeny.

Once a number of markers have been generated, they can be used to build a genetic map of the species. Such a map is an ordered array of genetic markers along the chromosomes that also shows the genetic distances between them. It gives a much clearer picture of where different genes lie on the chromosomes, and therefore of the linkage that might be expected between simply inherited traits. This in turn helps to determine the most appropriate selection strategy.
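Genetic distances on such maps are usually expressed in centimorgans (cM) and estimated from the proportion of recombinant progeny between two markers. The sketch below assumes a population in which recombinants can be counted directly, such as a backcross. It converts the recombination fraction using two standard map functions, Haldane's and Kosambi's. The chapter does not specify which function is used, and the counts are hypothetical.

```python
import math


def recombination_fraction(recombinants, total):
    """Proportion of progeny showing a recombinant marker combination."""
    return recombinants / total


def haldane_cm(r):
    """Haldane map function (assumes no crossover interference); valid for 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)


def kosambi_cm(r):
    """Kosambi map function (allows for some crossover interference); valid for 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))


r = recombination_fraction(12, 120)  # hypothetical: 12 recombinants among 120 backcross plants
print(f"r = {r:.2f}, Haldane = {haldane_cm(r):.1f} cM, Kosambi = {kosambi_cm(r):.1f} cM")
# prints: r = 0.10, Haldane = 11.2 cM, Kosambi = 10.1 cM
```

For small recombination fractions both functions give roughly 100 × r cM. The corrections matter more as markers get further apart, because undetected double crossovers become more likely.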

QTL mapping

Suppose a genetic map based on a mapping population is available, together with phenotypic data for a trait collected from the same population. The positions of the QTL underlying that trait can then be established on the map, both in absolute terms (i.e. on which chromosome) and in relative terms (i.e. where on that chromosome). Neighbouring molecular markers can then be used to ‘tag’ each QTL, so that its segregation can be followed and the markers used as surrogates for the QTL. In breeding programmes this could be used, for instance, to select on the presence or absence of molecular markers linked to a given QTL, instead of on phenotypic assessments of the trait, thus reducing time and expense. It might also allow early identification, at the plantlet stage, of individuals carrying a QTL that is expressed only in later stages of crop development.

Although conceptually appealing and straightforward, selection in breeding programmes based on QTL or their neighbouring markers has enjoyed very limited success at best. One reason is that it remains difficult to assess the expression of quantitative traits accurately, and in ways that are relevant to the agronomic conditions in which the cultivars will finally be grown. Genotype × environment interactions can pose as large a problem for QTL as they do for traditional evaluation and selection. QTL analysis might, however, give plant breeders a better understanding of the genetic basis of genotype × environment interactions, epistasis and heterosis. It is also clear that, for many traits, some regions of DNA account for rather large parts of the observed quantitative variation. If these could be handled effectively, the effort saved could be focused on the regions that remain undefined.
Another significant issue is that the expression of many QTL has turned out to depend on the genetic background. A QTL discovered in one population might therefore exert a different allelic effect, or none at all, in another population. It is therefore essential that QTL be thoroughly validated across genetic backgrounds and environments before breeding schemes based on them are deployed to enhance the efficiency of selection.
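The simplest way to detect the marker–QTL associations discussed above is single-marker analysis. Individuals of the mapping population are grouped by their genotype at a marker, and the trait means of the groups are compared. The sketch below makes the following assumptions:

  1. The population consists of inbred lines with two marker classes, coded 'A' and 'B' after the two parents.
  2. The groups are compared with a Welch t-statistic.
  3. The data are hypothetical.

In practice, interval mapping or composite interval mapping would normally be used, with proper genome-wide significance thresholds.

```python
from math import sqrt
from statistics import mean, variance


def single_marker_test(genotypes, phenotypes):
    """Compare trait means between the two marker classes of a biparental population.

    Returns (difference in means, Welch t-statistic). A large |t| suggests the
    marker is linked to a QTL affecting the trait.
    """
    class_a = [y for g, y in zip(genotypes, phenotypes) if g == "A"]
    class_b = [y for g, y in zip(genotypes, phenotypes) if g == "B"]
    diff = mean(class_a) - mean(class_b)
    se = sqrt(variance(class_a) / len(class_a) + variance(class_b) / len(class_b))
    return diff, diff / se


# Hypothetical data: six inbred lines scored at two markers, plus a yield-like trait
trait = [5.0, 6.0, 7.0, 2.0, 3.0, 4.0]
markers = {
    "M1": ["A", "A", "A", "B", "B", "B"],  # co-segregates with high trait values
    "M2": ["A", "B", "A", "B", "A", "B"],  # largely unrelated to the trait
}
for name, calls in markers.items():
    diff, t = single_marker_test(calls, trait)
    print(f"{name}: difference = {diff:.2f}, t = {t:.2f}")
```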

To a large extent, the limited impact of QTL mapping on applied plant breeding reflects the quantitative genetics of agronomically relevant traits. These traits are controlled by a large number of QTL, each with a small effect, and their expression is strongly influenced by the environment. Another significant limitation lies in the way QTL research has been organized: QTL are first detected, and their effects are then estimated from the same data. In addition, the biparental populations used to map QTL often lack breeding relevance and add costs and time. A likely consequence is that QTL effects are biased (typically overestimated) and QTL with small effects go undetected. These pitfalls can be mitigated by association mapping approaches, which rely on populations closer to those of ‘real-life’ breeding programmes. Nevertheless, the issue of biased genetic effects remains.

Association mapping

Association mapping approaches have been developed to overcome some of the pitfalls of QTL mapping, mainly its lack of breeding relevance. They exploit linkage disequilibrium (the non-random association of alleles at different loci), which allows genetic mapping to be carried out in sets of genotypes rather than in purpose-built mapping populations. For instance, an association mapping project might assemble several hundred individuals encompassing elite breeding lines, breeding germplasm and other sources of genetic variation. The molecular markers most often used in association mapping are SNPs, and significant marker–trait associations are established through statistical analyses. The typical outcome of association mapping is a set of haplotypes (combinations of SNPs within a small chromosomal block) that are statistically associated with the expression of the trait under study. These haplotypes could then be used as a selection tool.
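Association mapping relies on markers being in linkage disequilibrium (LD) with the genes that affect the trait. A common measure of LD between two biallelic SNPs is r², computed from haplotype and allele frequencies. The sketch below assumes phased haplotypes with alleles coded 0 and 1. The example haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 linkage disequilibrium between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs coded 0/1,
    one pair per phased haplotype (chromosome).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at SNP 2
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


complete = [(1, 1)] * 4 + [(0, 0)] * 4
partial = [(1, 1)] * 3 + [(0, 0)] * 3 + [(1, 0), (0, 1)]
independent = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2
for label, haps in [("complete", complete), ("partial", partial), ("independent", independent)]:
    print(label, ld_r_squared(haps))
# complete 1.0
# partial 0.25
# independent 0.0
```

A SNP in high LD with a causal variant (r² close to 1) acts as a good proxy for it. This is why dense SNP panels are needed when LD extends over only short distances.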

There have been over 25 years of research with molecular markers and thousands of published papers on QTL mapping, and, on a smaller scale, on association mapping. (The first paper linking genetic maps based on molecular markers with the detection of QTL was published in 1988.) Despite this, there are very few documented cases of successful or routine use of either approach in applied breeding programmes. Further reasons for this disappointing contribution include:

  1. The use of small biparental mapping populations that lack relevance to the populations used by breeders.
  2. Sparse genetic maps.
  3. Inappropriate data analysis.
  4. Poor phenotypic datasets.
  5. A lack of thorough validation, in both the genetic and the environmental context, before routine use.

In addition, the allelic effects of most reported QTL might be valuable from a research perspective. In many cases, however, they do not support genetic gains large enough to offset the cost of detecting the QTL in the context of breeding programmes.

A new approach known as genomic selection, or genome-wide selection, has recently been developed to overcome the pitfalls of the molecular marker and marker-assisted selection approaches briefly described above. It is a good example of plant breeding borrowing heavily from animal breeding concepts, having originally been proposed by Theo Meuwissen and colleagues in Norway in 2001. Genomic selection (GS) does not refer to the use of molecular markers identified through QTL or association mapping as an aid to plant breeding. Rather, it represents an entirely new perspective on the exploitation of molecular data in breeding programmes.

Genomic selection

Genomic selection builds on the availability of large numbers of molecular markers (mostly SNPs) to estimate marker effects across the entire genome, rather than at individual QTL. GS thus aims to capture the entire set of both large- and small-effect QTL. In principle, this enables it to account for all of the genetic variance for a trait in a population, whereas QTL approaches can capture only a limited proportion of that variance. GS integrates all these genetic effects into so-called genomic estimated breeding values (GEBVs), which represent the genetic merit of individuals.

GEBVs are calculated using a training population for which a large amount of genotypic information (hundreds to thousands of SNPs) and high-quality phenotypic data are available. Conceptually, GEBVs represent the ultimate selection criterion, as they encompass the entire genome rather than just some markers or QTL/haplotypes. Selection based on GEBVs is then imposed on breeding populations for which sufficient genotypic information is available. It is important to stress that only genotypic, not phenotypic, information is required from those breeding populations. This represents a paradigm shift from the way breeding programmes have successfully been run for many years, relying entirely on phenotypic information to make selection decisions.
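The two stages described above can be sketched with ridge regression, the idea underlying the widely used RR-BLUP model:

  1. Estimate all marker effects at once in a training population.
  2. Predict GEBVs for selection candidates that have been genotyped but not phenotyped.

This is a minimal sketch under the following assumptions:

  1. Genotypes are coded as allele dosages (0, 1, 2).
  2. The shrinkage parameter `lam` is simply fixed, rather than estimated from variance components.
  3. All data are hypothetical.

Real GS models are fitted with dedicated mixed-model software on thousands of markers.

```python
def fit_marker_effects(X, y, lam=1.0, n_iter=500):
    """Ridge-regression estimates of all marker effects at once.

    X: training genotypes, one row per individual, allele dosages 0/1/2.
    y: training phenotypes. lam: shrinkage applied equally to every marker.
    Solved by coordinate descent; returns (mean, marker means, effects).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]  # centred genotypes
    mu = sum(y) / n
    b = [0.0] * p
    resid = [yi - mu for yi in y]  # y - mu - Z b, with b = 0 to start
    for _ in range(n_iter):
        for j in range(p):
            zj = [row[j] for row in Z]
            # re-fit marker j against the residual with its own effect added back
            rhs = sum(z * (r + z * b[j]) for z, r in zip(zj, resid))
            new_bj = rhs / (sum(z * z for z in zj) + lam)
            resid = [r - z * (new_bj - b[j]) for z, r in zip(zj, resid)]
            b[j] = new_bj
    return mu, means, b


def gebv(X, means, b):
    """Genomic estimated breeding values (deviations from the training mean)."""
    return [sum((x - m) * bj for x, m, bj in zip(row, means, b)) for row in X]


# Hypothetical training population: 8 lines genotyped at 3 SNPs and phenotyped
train_X = [[0, 0, 0], [2, 0, 0], [0, 2, 0], [0, 0, 2],
           [1, 1, 1], [2, 2, 0], [1, 0, 2], [0, 1, 1]]
train_y = [10.0, 14.0, 10.0, 11.0, 12.5, 14.0, 13.0, 10.5]
mu, means, effects = fit_marker_effects(train_X, train_y, lam=0.001)

# Selection candidates: genotyped only, no phenotypes needed
candidates = [[2, 0, 2], [0, 2, 0], [1, 1, 0]]
values = gebv(candidates, means, effects)
ranking = sorted(range(len(candidates)), key=lambda i: values[i], reverse=True)
print("marker effects:", [round(e, 2) for e in effects])
print("GEBVs:", [round(v, 2) for v in values], "ranking:", ranking)
```

The key design point is that every marker gets an effect, shrunk towards zero, instead of only the markers that pass a significance test. This is how small-effect QTL are retained.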

As would be expected with any novel approach, many research issues need to be resolved before GS can have a meaningful impact on breeding programmes. One is how training populations should be created so as to maximize the capture of genetic variation while retaining breeding relevance. Another concerns the complex statistical models required to estimate the genetic merit of individuals. An often overlooked issue is the phenotypic basis of GS. Because selection is imposed on breeding populations that lack phenotypic data, the phenotypic data used to estimate GEBVs must be of the highest quality and accuracy. In other words, these datasets must be based on the best available experimental designs, the best agronomic and trait data collection procedures and the most appropriate statistical analyses based on linear mixed models. Together these lead to datasets with high heritability.
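The link between experimental design and heritability can be made concrete with the usual formula for broad-sense heritability on an entry-mean basis:

H² = σ²G / (σ²G + σ²GE/e + σ²ε/(re))

Here e is the number of environments, r is the number of replicates per environment, and the variance components are estimated from a linear mixed model. The sketch below takes the variance components as given, using hypothetical values, to show how testing in more environments and replicates raises heritability.

```python
def entry_mean_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Broad-sense heritability of genotype means across environments and replicates.

    var_g: genotypic variance; var_ge: genotype x environment variance;
    var_e: residual (plot) variance. All would normally come from a mixed model.
    """
    phenotypic = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    return var_g / phenotypic


# Hypothetical variance components for a yield trial
for n_env, n_rep in [(1, 2), (2, 2), (4, 2), (4, 3)]:
    h2 = entry_mean_heritability(1.0, 0.5, 2.0, n_env, n_rep)
    print(f"{n_env} environment(s) x {n_rep} replicates: H2 = {h2:.2f}")
```

Adding environments reduces both the genotype × environment term and the residual term. Adding replicates within environments reduces only the residual term.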

Even though genomic selection is still in its infancy, several research projects are already underway in wheat, maize, barley and the forest species eucalyptus. Particular research issues remain, such as our still rather limited ability to establish commercially relevant genotype-to-phenotype relationships. Nevertheless, it is envisaged that in the years to come GS will revolutionize plant breeding by shortening breeding cycles and increasing the overall efficiency of breeding programmes.

9.3.5 Issues with markers

In many instances, molecular marker techniques (for selection, say) are more expensive and more technically demanding than other selection options. It is therefore not cost-effective to set up the necessary laboratory facilities and train staff to handle only a few crosses, or where the financial returns on breeding are low.

Finding a molecular marker that is associated with a major gene or a QTL of interest is not always difficult. Ensuring that the marker is close enough not to be separated from the gene by subsequent recombination is harder. Also, establishing that a marker combination applies across a range of crosses, rather than just the one in which it was identified, takes considerable time and effort.
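Why a marker must be tightly linked can be shown with simple arithmetic. Let r be the recombination fraction between the marker and the gene. Suppose one meiosis per generation can separate them, as in the heterozygous parent in each backcross generation. The probability that the marker–gene association survives n generations is then (1 − r)^n. The sketch below assumes this simple model, and the recombination fractions are illustrative.

```python
def prob_linkage_retained(r, generations):
    """Probability that no recombination separates marker and gene over the given generations.

    Assumes one informative meiosis per generation and a constant recombination fraction r.
    """
    return (1 - r) ** generations


for r in (0.01, 0.05, 0.20):
    print(f"r = {r:.2f}: retained after 5 generations with probability "
          f"{prob_linkage_retained(r, 5):.2f}")
```

With r = 0.20 the association is broken in about two out of three lineages after five generations. With r = 0.01 it survives in about 95% of them.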

Nevertheless, the exploitation of QTL offers great potential that has yet to be realized in practical terms. Developing good and reliable QTL will require a great deal of well-designed and accurate field-testing. As already noted, genotype × environment interactions may pose as large a problem for QTL as they do for traditional selection. Finally, results obtained by different research teams need to be more repeatable. In QTL analyses, different researchers sometimes identify different loci as responsible for the major differences in expression of the character of interest. Some of these discrepancies can reasonably be ascribed to different alleles segregating in different crosses, or being expressed at different levels in different circumstances (note the similar problems with heritability estimates). But there are also technical differences that need to be resolved before the true potential of QTL can be realized.

The only reported successful examples of deploying QTL information in large-scale, commercial breeding efforts represent private endeavours based in the US: Monsanto's maize breeding programme and Pioneer's soybean breeding programme.

An unforeseen positive consequence of deploying genetic markers in breeding programmes has been a growing awareness of the need for high-quality phenotypic datasets. Unless the field phenotypic data are of sufficiently high quality, the promise and potential of molecular markers will not be realized. Nor will the expensive investment in laboratories, molecular biology, bioinformatics and staff needed to set up marker-assisted selection deliver a return.

9.3.6 The increasing availability of genome sequences

Several developments have together made the sequencing of crop genomes routine:

  1. The success of the human genome sequencing project.
  2. The continuous advancement of sequencing approaches.
  3. The dramatic reduction in sequencing costs, driven by the quest to sequence a human genome for under US$1000. This is reflected, for instance, in current claims of a cost below US$0.10 per megabase (10⁶ base pairs of DNA) sequenced.
  4. The availability of computing power and software to process, analyse and make sense of the deluge of genomic information.

A typical crop genome sequencing effort encompasses the following generic steps, although both the steps chosen and the extent of the analyses may vary considerably:

  1. Production of sequencing libraries, in which the plant DNA to be sequenced is fragmented, prepared (in some approaches, cloned) and subjected to shotgun sequencing through high-throughput approaches;
  2. Assembly of the sequence reads in order to reconstruct the genome just sequenced;
  3. Annotation, by assigning putative functions to the gene sequences identified. This is carried out by computational (in silico) analysis using powerful software and gene identification algorithms. However, final biological confirmation of the function of a putative gene is often achieved only through further experimentation;
  4. Further analyses such as synteny analyses, search for orthologous genes, patterns of genome duplication/rearrangement/loss, evolutionary analyses, and in some cases SNP identification.
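Step 2 (assembly) can be illustrated with a toy greedy overlap assembler, which repeatedly merges the pair of reads with the longest suffix-prefix overlap. This is a deliberately simplified sketch. It assumes error-free reads from one strand and a genome without repeats, and the reads are hypothetical. Real assemblers use more sophisticated graph-based methods (for example, de Bruijn graphs) to cope with sequencing errors, repeats and billions of reads.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of read a that equals a prefix of read b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a matching suffix
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two contigs with the longest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical error-free reads sampled from the sequence ACGTTGCATCCGAAT
reads = ["CATCCGA", "ACGTTGC", "CCGAAT", "TTGCATC"]
print(greedy_assemble(reads))  # ['ACGTTGCATCCGAAT']
```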

The first plant genome to be sequenced, that of Arabidopsis thaliana, was made available in 2000. Since then the genomes of over 20 plant species have been sequenced, including some of humankind's major crops, such as barley, maize, potato, soybean, sorghum, tomato and wheat. Horticulturally important crops such as Brassica oleracea, cucumber, melon and watermelon have also had their genomes sequenced, as have fruit species such as apple, cacao and diploid strawberry. Chickpea (Cicer arietinum) is a key staple for many people and the second most widely grown legume globally, and its genomic sequence is also being reported. This is particularly encouraging, as it shows that so-called ‘orphan’ crops are also benefiting from genome sequencing efforts.

Only a few years ago, sequencing a crop genome was a major, expensive undertaking. However, the molecular and bioinformatic technology and expertise now available allow the genome of a crop species to be sequenced and analysed within a calendar year. It is foreseen that in the years to come genome sequencing will become even more widespread and accessible, and genomic sequences will therefore become a commodity.

The current paradigm of plant breeding is phenotype-rich but genotype-poor. The availability of genomic information for an increasing number of crop species will transform it into one that is both phenotype-rich and genotype-rich. This in turn will benefit plant breeding in a number of ways.

Increasing availability of molecular markers

The use of molecular markers in crop breeding is relatively new, having begun in the 1980s. Whatever the approach (QTL mapping, association mapping or genomic selection), until recently a major hurdle to establishing agronomically meaningful marker–trait associations was the limited number of available genetic markers. Cheap sequencing approaches have changed this situation dramatically by providing plant breeders and researchers with thousands of SNPs. In some crops, tens of thousands of SNPs, or even more, are available for establishing marker–trait associations. Until now, even the most advanced genetic maps in crop species relied on only hundreds of molecular markers. From now on, however, the number of genetic markers available should not be a constraint in either developed or developing countries. Moreover, genome sequences could be made available not only for specific genotypes but for very large numbers of individuals, whether from genetic mapping efforts or from genetic improvement programmes. At the time of writing, an exciting new approach called genotyping-by-sequencing is being deployed in crop species such as barley and wheat. It enables marker discovery and genotyping to be carried out at the same time.

Molecular basis of genetic variation

Sequencing efforts are providing increasing evidence of significant structural variation among the genomes of different individuals within the same species. The best-documented crop example to date is maize. Here, comparative analysis of the genomic sequences of inbreds has revealed significant differences not only in gene copy number but also in the presence or absence of genes. It was previously thought that different individuals of a given species carried the same set of genes. Phenotypic differences were assumed to arise from different allelic forms of those genes or from non-allelic (epistatic) interactions. The presence of some genes in certain maize inbreds and their absence from others might contribute to important biological mechanisms such as heterosis. It might also shed light on the basis of quantitative variation and genotype × environment interactions.

Identification of agronomically relevant genes

It is known in cattle and chickens that genomic regions underlying traits selected by humans show lower levels of variation in ‘improved’ than in ‘non-improved’ individuals. Such regions of reduced variation and skewed allele frequencies might harbour important genes associated with domestication. These could be agronomically relevant genes or genes controlling the expression of other genes. This approach has been demonstrated in rice through the sequencing and thorough analysis of the genomes of 50 diverse wild and cultivated rice lines. The analysis led to the discovery of over 6 million SNPs and the identification of thousands of candidate genes that might have been artificially selected during the domestication of this important crop.
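The idea of scanning for regions of reduced variation can be sketched in two steps:

  1. Compute nucleotide diversity (π, the average proportion of differing sites between pairs of sequences) in windows along aligned sequences from wild and cultivated individuals.
  2. Flag windows where diversity in the cultivated group is much lower than in the wild group.

This is a simplified illustration with hypothetical sequences and an arbitrary threshold. It is not the method of the rice study described above, which used genome-wide data and more formal statistics.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise proportion of differing sites (pi) among aligned sequences of equal length."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    diffs = [sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs]
    return sum(diffs) / len(pairs)


def low_diversity_windows(wild, cultivated, window, ratio_threshold=0.5):
    """Start positions of windows where pi(cultivated) / pi(wild) falls below the threshold."""
    flagged = []
    for start in range(0, len(wild[0]) - window + 1, window):
        pi_wild = nucleotide_diversity([s[start:start + window] for s in wild])
        pi_cult = nucleotide_diversity([s[start:start + window] for s in cultivated])
        if pi_wild > 0 and pi_cult / pi_wild < ratio_threshold:
            flagged.append(start)
    return flagged


# Hypothetical aligned sequences: the first window has lost all diversity in cultivation
wild = ["ACGTACGT", "ACTTAGGT", "AGGTACCT"]
cultivated = ["ACGTACGT", "ACGTAGGT", "ACGTACCT"]
print(low_diversity_windows(wild, cultivated, window=4))  # [0]
```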

Another elegant example from rice recently enabled the identification of functional nucleotide polymorphisms (DNA polymorphisms associated with the expression of a phenotypic trait) in the gene DTH2 (‘Days To Heading on chromosome 2’). These polymorphisms correlate with early flowering and help to explain the geographical expansion of rice cultivation in Asia. They probably represent a target of human selection for adaptation to long-day photoperiod conditions.

Think questions

  1. List four uses for molecular markers in plant breeding.
  2. Describe any advantages of using molecular markers in plant breeding.
  3. Describe one difficulty that might be encountered in utilizing QTLs in plant breeding selection.
  4. A cross is made between two parents in a hybrid wheat breeding programme, where P1 is the female parent and P2 is the male parent. When you run RFLP analysis you see electrophoretic patterns like those below.

    P1 is the lane pattern for the female parent, P2 the male parent, and other lanes (a to l) were observed from a sample of seed of the F2 progeny.

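    [Figure: RFLP electrophoretic patterns for lanes P1, P2 and F2 progeny a to l, with the stripe rust reaction of each genotype shown below its lane]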

    Explain what could have caused the pattern shown in lane f. It is known that Parent 1 carries a single gene conferring resistance to yellow (stripe) rust. The stripe rust reaction of each genotype is given below its lane (Res = resistant to stripe rust). Indicate on the diagram a possible molecular marker for the stripe rust resistance gene.

  5. Describe three applications involving tissue culture or in vitro plant propagation in a plant breeding scheme.