Top texture: © Laguna Design / Science Source;
Chapter Opener: © T-flex/Shutterstock, Inc.
Today, the field of molecular biology focuses on the mechanisms by which cellular processes are carried out by the various biological macromolecules in the cell, with a particular emphasis on the structure and function of genes and genomes. Molecular biology as a field, however, was originally born from the development of tools and methods that allow the direct manipulation of DNA both in vitro and in vivo in numerous organisms.
Two essential items in the molecular biologist’s toolkit are restriction endonucleases, which allow DNA to be cut into precise pieces, and cloning vectors, such as plasmids or phages used to “carry” inserted foreign DNA fragments for the purpose of producing more material or a protein product. The term genetic engineering was originally used to describe the range of manipulations of DNA that became possible with the ability to clone a gene by placing its DNA into another context in which it could be propagated. From this beginning, when recombinant DNA was used as a tool to analyze gene structure and expression, we moved to the ability to change the DNA content of bacteria and eukaryotic cells by directly introducing cloned DNA that could become part of the genome. Then, by changing the genetic content in conjunction with the ability to develop an animal from an embryonic cell, it became possible to generate multicellular eukaryotes with deletions or additions of specific genes that are inherited via the germline. We now use genetic engineering to describe a range of activities including the manipulation of DNA, the introduction of changes into specific somatic cells within an animal or plant, and even changes in the germline itself.
As research has advanced, more and more sensitive methods for detecting and amplifying DNA have been developed. Now that we have entered the era of routine whole-genome sequencing, studies of the function and expression of entire genomes have become commonplace. This chapter discusses some of the most common methods used in molecular biology, ranging from the very first tools developed by molecular biologists to some of the most recently developed methods to assess the content.
Nucleases are one of the most valuable tools in a molecular biology laboratory. One class of enzymes, the restriction endonucleases (discussed shortly), was critical for the cloning revolution. Nucleases are enzymes that degrade nucleic acids, the opposite function of polymerases. They hydrolyze, or break, an ester bond in a phosphodiester linkage between adjacent nucleotides in a polynucleotide chain, as shown in FIGURE 2.1.
FIGURE 2.1 The target of a phosphatase is shown in (a), a terminal phosphomonoester bond. The target of a nuclease is shown in (b), the phosphodiester bond between two adjacent nucleotides. Note that the nuclease can cleave either the first ester bond from the 3′ end of the terminal nucleotide (b1) or the second ester bond from the 5′ end of the next nucleotide (b2). Nucleases can cleave internal bonds (c) as an endonuclease, or begin at an end and progress into the fragment (d) as an exonuclease.
There is another, related class of enzymes that can hydrolyze an ester bond in a nucleotide chain: a monoesterase, usually called a phosphatase. The critical difference between a phosphatase and a nuclease is shown in Figure 2.1. A phosphatase can only hydrolyze a terminal ester bond linking a phosphate (or di- or triphosphate) to a terminal nucleotide at the 3′ or 5′ end, whereas a nuclease can hydrolyze an internal ester bond in a diester link between adjacent nucleotides.
Phosphatases are important enzymes in the laboratory because they allow the removal of a terminal phosphate from a polynucleotide chain. This is often required for a subsequent step of connecting, or ligating, chains together. It also allows one to replace the phosphate with a radioactive 32P-labeled phosphate.
Nucleases can be divided into groups based on a number of different features. We can distinguish between endonucleases and exonucleases as shown in Figure 2.1. An endonuclease can hydrolyze internal bonds within a polynucleotide chain, whereas an exonuclease must begin at the end of a chain and hydrolyze from that end position.
The specificity of nucleases ranges from none to extreme. Nucleases can be specific for DNA, as DNases, or RNA, as RNases, or even be specific for a DNA/RNA hybrid, as RNase H (which cleaves the RNA strand of a hybrid duplex). Nucleases can be specific for single-stranded nucleotide chains, for duplex chains, or can act on both.
When a nuclease—either endo- or exo-—hydrolyzes an ester bond in a phosphodiester linkage, it will have specificity for either of the two ester bonds, generating either 5′ nucleotides or 3′ nucleotides, as shown in Figure 2.1. An exonuclease can attack a polynucleotide chain from either the 5′ end and hydrolyze 5′ to 3′ or attack from the 3′ end and hydrolyze 3′ to 5′ (Figure 2.1).
Nucleases might have a sequence preference, such as pancreatic RNase A, which preferentially cuts after a pyrimidine, or T1 RNase, which cuts single-stranded RNA chains after a G. At the extreme end of sequence specificity lie the restriction endonucleases, usually called restriction enzymes. These are endonucleases from eubacteria and Archaea that recognize a specific DNA sequence. Their name typically derives from the bacteria in which they were discovered. For example, EcoRI is the first restriction enzyme from an Escherichia coli R strain.
Broadly speaking, there are three different classes of restriction enzymes and several subclasses. In 1978, the Nobel Prize in Physiology or Medicine was awarded to Daniel Nathans, Werner Arber, and Hamilton Smith for the discovery of restriction endonucleases. It was this discovery that enabled scientists to develop the methods to clone DNA, as shown in the next section. Thousands of restriction enzymes are known, many of which are now commercially available. Restriction enzymes have to do two things: (1) recognize a specific sequence, and (2) cut, or restrict, at or near that sequence.
The type II restriction enzymes (with several subgroups) are the most common. Type II enzymes are distinguished because the recognition site and cleavage site are the same. These sites range in length from 4 to 8 base pairs (bp). The sites are typically inversely palindromic, that is, they read the same in the 5′-to-3′ direction on the two complementary strands, as shown in FIGURE 2.2. Restriction enzymes can cut the DNA in two different ways, as demonstrated in Figure 2.2. The first and more common is a staggered cut, which leaves single-stranded overhangs, or “sticky ends.” The overhang can be a 3′ or a 5′ overhang. The second way is a blunt double-stranded cut, which does not leave an overhang. An additional level of specificity determines whether the enzyme will cut DNA containing a methylated base. The degree of specificity in the site also varies. Most enzymes are very specific, whereas some will allow multiple bases at one or two positions within the site.
FIGURE 2.2 (a) A restriction endonuclease may cleave its recognition site and make a staggered cut, leaving a 5′ overhang or a 3′ overhang. (b) A restriction endonuclease may cleave its recognition site and make a blunt end cut.
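The palindromic property of a recognition site can be checked computationally: a site is palindromic in this sense when it equals its own reverse complement. The following is a minimal sketch in Python. The function names are hypothetical and chosen only for illustration, and the third test sequence is an arbitrary hexamer, not a known recognition site.

```python
# Translation table that maps each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return site.upper() == reverse_complement(site)


print(is_palindromic_site("GAATTC"))  # EcoRI recognition site -> True
print(is_palindromic_site("GGATCC"))  # BamHI recognition site -> True
print(is_palindromic_site("GAATTA"))  # arbitrary hexamer -> False
```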
Restriction enzymes from different bacteria can have the same recognition site but cut the DNA differently. One might make a blunt cut and the other might make a staggered cut, or one might leave a 3′ overhang, whereas the second might leave a 5′ overhang. Enzymes that share a recognition site are called isoschizomers; those that cut that site at a different position are more specifically called neoschizomers.
Types I and III enzymes differ from type II enzymes in that the recognition site and cleavage site are different and are usually not palindromes. With a type I enzyme, the cleavage site can be up to 1,000 bp away from the recognition site. Type III enzymes have closer cleavage sites, usually 20 to 30 bp away.
A restriction map represents a linear sequence of the sites at which particular restriction enzymes find their targets. When a DNA molecule is cut with a suitable restriction enzyme, it is cleaved into distinct, negatively charged fragments. These fragments can be separated on the basis of their size by gel electrophoresis (described later, in the section DNA Separation Techniques). By analyzing the restriction fragments of DNA, it is possible to generate a map of the original molecule in the form shown in FIGURE 2.3. The map shows the positions at which particular restriction enzymes cut DNA. The DNA is divided into a series of regions of defined lengths that lie between sites recognized by the restriction enzymes. A restriction map can be obtained for any sequence of DNA, irrespective of whether we have any knowledge of its function. If the sequence of the DNA is known, we can generate a restriction map in silico by simply searching for the recognition sites of known enzymes. Knowing the restriction map of a DNA sequence of interest is extremely valuable in DNA cloning, which is described in the next section.
FIGURE 2.3 A restriction map is a linear sequence of sites separated by defined distances on DNA. The map identifies the three sites cleaved by enzyme A and the two sites cleaved by enzyme B. Thus, A produces four fragments, which overlap those of B, and B produces three fragments, which overlap those of A.
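An in silico restriction map of a known sequence amounts to a string search followed by a subtraction of neighboring cut positions. The sketch below assumes a linear molecule and complete digestion. The 40-bp test sequence is made up for illustration, and the function names and the `cut_offset` parameter are hypothetical.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of a recognition site."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def fragment_lengths(dna: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand fragment lengths of a linear molecule after complete digestion.

    cut_offset is the number of bases of the site that lie 5' of the cut
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in find_sites(dna, site)]
    edges = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Made-up 40-bp sequence containing two EcoRI sites.
toy = "ATGC" + "GAATTC" + "TTAGGCCATAGCTAG" + "GAATTC" + "CGATTACGA"

print(find_sites(toy, "GAATTC"))           # [4, 25]
print(fragment_lengths(toy, "GAATTC", 1))  # [5, 21, 14]
```

Two sites in a linear molecule give three fragments, and the fragment lengths sum to the length of the original molecule. This is the same bookkeeping that underlies the map in Figure 2.3.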
Cloning has a simple definition: To clone something is to make an identical copy, whether it is done by a photocopy machine on a piece of paper, cloning Dolly the sheep, or cloning DNA, which is discussed here. Cloning can also be considered an amplification process, in which we currently have one copy and we want many identical copies. Cloning DNA typically involves recombinant DNA. This also has a simple definition: a DNA molecule from two (or more) different sources.
To clone a fragment of DNA, we must create and copy a recombinant DNA molecule many times. There are two different DNAs needed: a vector, or cloning vehicle, and an insert, or the molecule to be cloned. The two most popular classes of vectors are derived from plasmids and viruses, respectively.
Over the years, vectors have been specifically engineered for safety, selection ability, and high growth rate. “Safety” means that the vector will not integrate into a genome (unless engineered specifically for that purpose) and the recombinant vector will not autotransfer to another cell. (We discuss selection later.) In general, about a microgram of vector DNA will be ligated with about a microgram of the insert DNA that we want to clone. Both the vector and insert should be restricted with the same restriction endonuclease to create compatible DNA ends.
Let us now examine the details and the variables that will affect the process, beginning with the insert—the DNA fragment that we want to amplify. The insert could come from one of many different sources, such as restricted genomic DNA—either size selected on an agarose gel or unselected, a larger fragment from another clone to be subcloned (i.e., taking a smaller part of the larger fragment), a PCR fragment (see the section PCR and RT-PCR later in this chapter), or even a DNA fragment synthesized in vitro. The size and the nature of the fragment ends must be known. Are the ends blunt or do they have overhanging single strands (recall the section “Nucleases” earlier in this chapter), and if so, what are their sequences? The answer to this question comes from how the fragments were created (what restriction enzyme[s] were used to cut the DNA, or what PCR primers were used to amplify the DNA).
The vector is selected based on the answers to these questions. For this exercise, a common type of plasmid cloning vector called a blue/white selection vector is used, as shown in FIGURE 2.4. This vector has been constructed with a number of important elements. It has an ori, or origin of replication (see the chapter titled DNA Replication), to allow plasmid replication, which will provide the actual amplification step, in a bacterial cell. It contains a gene that codes for resistance to the antibiotic ampicillin, ampr, which will allow selection of bacteria that contain the vector. It also contains the E. coli lacZ gene (see the chapter titled The Operon), which will allow selection of an insert DNA fragment in the vector.
FIGURE 2.4 (a) A plasmid that contains three key sites (an origin of replication, ori; a gene for ampicillin resistance, ampr; and lacZ with an MCS), together with the insert DNA to be cloned, is restricted with EcoRI. (b) Restricted insert fragments and vector will be combined and (c) ligated together. The final pool of this DNA will be transformed into E. coli.
The lacZ gene has been engineered to contain a multiple cloning site (MCS). This is an oligonucleotide sequence with a series of different restriction endonuclease recognition sites arranged in tandem in the same reading frame as the lacZ gene itself. This is the heart of blue/white selection. The lacZ gene codes for the β-galactosidase (β-gal) enzyme, which cleaves the galactoside bond in lactose. It will also cleave the galactoside bond in an artificial substrate called X-gal (5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside), which can be added to bacterial growth media and has a blue color when cleaved by the intact enzyme. If a fragment of DNA is cloned (inserted) into the MCS, the lacZ gene will be disrupted, inactivating it, and the resulting β-gal will no longer be able to cleave X-gal, resulting in white bacterial colonies rather than blue colonies. This is the blue/white selection mechanism.
Let us now begin the cloning experiment. Following along in Figure 2.4, both the vector and the insert are cut with the same restriction enzyme in order to generate compatible single-stranded sticky ends. One variable here is the choice of enzyme: different enzymes that recognize different restriction sites can be used, as long as they generate the same overhang sequence. An enzyme that makes a blunt cut can also be used, although that will make the next step, ligation, less efficient. Two completely different ends with different overhangs can also be used if an exonuclease is used to trim the ends and produce blunt ends. (Continuing with the same reasoning, randomly sheared DNA can also be used if the ends are then blunted for ligation.) If forced to use a type I or type III restriction enzyme, the ends must also be blunted. An important alternative is to use two different restriction enzymes that leave different overhangs on each end. The advantages to this are that neither the vector nor the insert will self-circularize, and the orientation of how the insert goes into the vector can be controlled; this is called directional cloning. The vector selected must have the appropriate restriction endonuclease sites.
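Whether two enzymes leave compatible ends can be worked out from the recognition site and the position of the cut. The sketch below is an illustration under stated assumptions: the site is palindromic, so the bottom strand is cut at the mirror-image position of the top-strand cut. The function names are hypothetical. The cut positions used for EcoRI, BamHI, BglII, KpnI, and SmaI are the commonly cited ones and do not come from this chapter.

```python
def describe_ends(site: str, cut_offset: int) -> tuple[str, str]:
    """Classify the ends left by an enzyme that cuts within a palindromic site.

    cut_offset is the number of bases of the site to the left of the
    top-strand cut. Returns (end type, overhang sequence written 5' to 3').
    """
    site = site.upper()
    mirror = len(site) - cut_offset  # bottom-strand cut, in top-strand coordinates
    if cut_offset == mirror:
        return ("blunt", "")
    if cut_offset < mirror:
        return ("5' overhang", site[cut_offset:mirror])
    return ("3' overhang", site[mirror:cut_offset])


def compatible(end_a: tuple[str, str], end_b: tuple[str, str]) -> bool:
    """Ends can be ligated directly if the type and overhang sequence match."""
    return end_a == end_b


ecori = describe_ends("GAATTC", 1)  # G^AATTC
bamhi = describe_ends("GGATCC", 1)  # G^GATCC
bglii = describe_ends("AGATCT", 1)  # A^GATCT
kpni = describe_ends("GGTACC", 5)   # GGTAC^C
smai = describe_ends("CCCGGG", 3)   # CCC^GGG

print(ecori, bamhi, kpni, smai)
print(compatible(bamhi, bglii))  # different sites, same GATC overhang -> True
print(compatible(ecori, bamhi))  # different overhangs -> False
```

BamHI and BglII illustrate the point made above: the two enzymes recognize different sites but generate the same overhang, so their ends can be joined.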
The next step is to combine the two pools of DNA fragments, vector and insert, in order to connect or ligate them. A 5- or 10-to-1 molar ratio of insert to vector is usually used. If you use too much vector, vector–vector dimers will be produced. If you use too much insert, multiple inserts per vector will be produced. The size of the insert is important; too large (over ~10 kilobases [kb]) an insert will not be efficiently cloned in a plasmid vector, which will necessitate using an alternative virus-based vector. Ligation is often performed overnight on ice to slow the ligation reaction and generate fewer multimers.
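Because the ratio is molar rather than by mass, the amount of insert needed depends on the relative lengths of the two molecules. A minimal sketch of the arithmetic follows, assuming double-stranded DNA so that mass is proportional to length times the number of molecules. The function name and the example quantities (50 ng of a 3-kb vector and a 1-kb insert) are hypothetical.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_to_vector_ratio: float) -> float:
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    For double-stranded DNA, mass scales with length, so the average mass
    per base pair cancels out of the calculation.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector_ratio


# Hypothetical example: 50 ng of a 3-kb vector, a 1-kb insert, 5:1 molar ratio.
print(round(insert_mass_ng(50, 3000, 1000, 5), 1))  # 83.3
```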
The pool of randomly generated ligated DNA molecules is now used to “transform” E. coli. Transformation is the process by which DNA is introduced into a host cell. E. coli does not normally undergo physiological transformation. As a result, DNA must be forced into the cell. There are two common methods of transformation: washing the bacteria in a high salt wash of calcium chloride (CaCl2), or electroporation, in which an electric current is applied. Both methods create small pores or holes in the cell wall. Even with these methods, only a tiny fraction of bacterial cells will be transformed. The strain of E. coli is important. It should not have a restriction system or a modification system to methylate the incoming DNA. The strain should also be compatible with the blue/white system, which means that it should supply the complementing portion of LacZ (the lacZ α fragment carried on most plasmids does not produce active β-gal without it). DH5α is a commonly used strain.
Transformation results in a pool of multiple types of bacteria, most of which are not wanted because they either contain a vector with no insert or have not taken up any DNA at all. We must now select the handful of bacteria that contain recombinant plasmids from the millions that do not. The transformed bacterial cells are plated on an agar plate containing the antibiotic ampicillin, the color indicator X-gal, and an artificial β-gal inducer called isopropylthiogalactoside (IPTG). The ampicillin in the plate will kill the vast majority of bacterial cells, namely all of those that have not been transformed with the ampr plasmid. The remaining bacteria can now grow and form visible colonies. As shown in FIGURE 2.5, there are two different types of colonies: blue ones, which contain a vector without an insert (because β-gal cleaved X-gal into a blue compound), and white ones, in which the inactivated β-gal did not cleave X-gal, so the colonies remained colorless.
FIGURE 2.5 After transformation into E. coli of restricted and ligated vector plus insert DNA, the bacterial cells are plated onto agar plates containing ampicillin, IPTG, and the color indicator, X-gal. Overnight incubation at 37°C will yield both blue and white colonies. The white colonies will be used to prepare DNA for further analysis.
This is not quite the end of the story. False-positive clones, such as those that were formed as vector-only dimers, must be identified and removed. To do so, plasmid DNA must be at least partly purified from each candidate colony, restricted, and run on a gel to check for the insert size. Sequencing the fragment to be absolutely certain a random contaminant has not been cloned is also suggested (see the section DNA Sequencing later in this chapter).
In the example in the section Cloning earlier in the chapter, we described the use of a vector that is designed simply for amplifying insert DNA, with inserts up to ~10 kb. It is often desirable to clone larger inserts, though, and sometimes the goal is not just to amplify the DNA but also to express cloned genes in cells, investigate properties of a promoter, or create various fusion proteins (defined shortly). TABLE 2.1 summarizes the properties of the most common classes of cloning vectors. These include vectors based on bacteriophage genomes, which can be used in bacteria but have the disadvantage that only a limited amount of DNA can be packaged into the viral coat (although more than can be carried in a plasmid). The advantages of plasmids and phages are combined in the cosmid, which propagates like a plasmid but uses the packaging mechanism of phage lambda to deliver the DNA to the bacterial cells. Cosmids can carry inserts of up to 47 kb (the maximum length of DNA that can be packaged into the phage head).
TABLE 2.1 Cloning vectors may be based on plasmids or phages or may mimic eukaryotic chromosomes.
Vector | Features | Isolation of DNA | DNA Limit |
---|---|---|---|
Plasmid | High copy number | Physical | 10 kb |
Phage | Infects bacteria | Via phage packaging | 20 kb |
Cosmid | High copy number | Via phage packaging | 48 kb |
BAC | Based on F plasmid | Physical | 300 kb |
YAC | Origin + centromere + telomere | Physical | > 1 Mb |
Two vectors used for cloning the largest possible DNA inserts are the yeast artificial chromosome (YAC) and the human artificial chromosome (HAC). A YAC has a yeast origin to support replication, a centromere to ensure proper segregation, and telomeres to afford stability. In effect, it is propagated just like a yeast chromosome and can carry inserts measured in the megabase (Mb) length range. The HAC is the newest addition to the line of vectors and it offers the advantage of having virtually unlimited capacity.
There is an extremely useful class of vectors known as shuttle vectors that we can use in more than one species of host cell. The example shown in FIGURE 2.6 contains origins of replication and selectable markers for both E. coli and the yeast Saccharomyces cerevisiae. It can replicate as a circular multicopy plasmid in E. coli. It has a yeast centromere, and it also has yeast telomeres adjacent to BamHI restriction sites so that cleavage with BamHI generates a YAC that can be propagated in yeast.
FIGURE 2.6 pYAC2 is a cloning vector with features to allow replication and selection in both bacteria and yeast. Bacterial features (shown in blue) include an origin of replication and antibiotic resistance gene. Yeast features (shown in red and yellow) include an origin, centromere, two selectable markers, and telomeres.
Other vectors, such as expression vectors, can contain promoters to drive expression of genes. Any open reading frame can be inserted into the vector and expressed without further modification. These promoters can be continuously active, or they can be inducible so that the inserted gene is expressed only under specific conditions.
Alternatively, the goal might be to study the function of a cloned promoter of interest in order to understand the normal regulation of a gene. In this case, rather than using the actual gene, we can use an easily detected reporter gene under control of the promoter of interest.
The type of reporter gene that is most appropriate depends on whether we are interested in quantitating the efficiency of the promoter (and, for example, determining the effects of mutations in it or the activities of transcription factors that bind to it) or determining its tissue-specific pattern of expression. FIGURE 2.7 summarizes a common system for assaying promoter activity. A cloning vector is created that has a eukaryotic promoter linked to the coding region of the luciferase gene, which encodes the enzyme responsible for bioluminescence in the firefly. In general, a transcription termination signal is added to ensure the proper generation of the mRNA. The hybrid vector is introduced into target cells, and the cells are grown and subjected to any appropriate experimental treatments. The level of luciferase activity is measured by addition of its substrate, luciferin. Luciferase activity results in light emission that can be measured at 562 nanometers (nm) and is directly proportional to the amount of enzyme that was made, which in turn depends upon the activity of the promoter.
FIGURE 2.7 Luciferase (derived from fireflies such as the one shown here) is a popular reporter gene. The graph shows the results from mammalian cells transfected with a luciferase vector driven by a minimal promoter or the promoter plus a putative enhancer. The levels of luciferase activity correlate with the activities of the promoters.
Photo © Cathy Keifer/Dreamstime.com.
Some very striking reporters are now available for visualizing gene expression. The lacZ gene, described in the blue/white selection strategy earlier, also serves as a very useful reporter gene. FIGURE 2.8 shows what happens when the lacZ gene is placed under the control of a promoter that regulates the expression of a gene in the nervous system. The tissues in which this promoter is normally active can be visualized by providing the X-gal substrate to stain the embryo.
FIGURE 2.8 Expression of a lacZ gene can be followed in the mouse by staining for β-gal (in blue). In this example, lacZ was expressed under the control of a promoter of a mouse gene that is expressed in the nervous system. The corresponding tissues can be visualized by blue staining.
Photo courtesy of Robb Krumlauf, Stowers Institute for Medical Research.
One of the most popular reporters that can be used to visualize patterns of gene expression is green fluorescent protein (GFP), which is obtained from jellyfish. GFP is a naturally fluorescent protein that, when excited with one wavelength of light, emits fluorescence in another wavelength. In addition to the original GFP, numerous variants that fluoresce in different colors, such as yellow (YFP), cyan (CFP), and blue (BFP), have been developed. We can use GFP and its variants as reporter genes on their own, or we can use them to generate fusion proteins in which a protein of interest is fused to GFP and can thus be visualized in living tissues, as is shown in the example in FIGURE 2.9.
FIGURE 2.9 (a) Since the discovery of GFP, derivatives that fluoresce in different colors have been engineered. (b) A live transgenic mouse expressing human rhodopsin (a protein expressed in the retina of the eye) fused to GFP.
(a) Photo courtesy of Joachim Goedhart, Molecular Cytology, SILS, University of Amsterdam. (b) © Eye of Science/Science Source.
Vectors are introduced into different species in a variety of ways. Bacteria and simple eukaryotes like yeast can be transformed easily, using chemical treatments that permeabilize the cell membranes (as discussed in the section Cloning earlier in this chapter). Many types of cells cannot be transformed so easily, though, and we must use other methods, as summarized in FIGURE 2.10. Some types of cloning vectors use natural methods of infection to pass the DNA into the cell, such as a viral vector that uses the viral infective process to enter the cell. Liposomes are small spheres made from artificial membranes, which can contain DNA or other biological materials. Liposomes can fuse with plasma membranes and release their contents into the cell. Microinjection uses a very fine needle to puncture the cell membrane. A solution containing DNA can be introduced into the cytoplasm or directly into the nucleus for cases in which the nucleus is large enough to be chosen as a target (such as an egg). The thick cell walls of plants are an impediment to many transfer methods; thus, the “gene gun” was invented as a means to overcome this obstacle. A gene gun shoots very small particles into the cell by propelling them through the wall at high velocity. The particles can consist of gold or nanospheres coated with DNA. This method now has been adapted for use with a variety of species, including mammalian cells.
FIGURE 2.10 DNA can be released into target cells by methods that pass it across the membrane naturally, such as by means of a viral vector (in the same way as a viral infection) or by encapsulating it in a liposome (which fuses with the membrane). Alternatively, it can be passed manually, by microinjection, or by coating it on the exterior of nanoparticles that are shot into the cell by a “gene gun” that punctures the membrane at very high velocity.
There are a number of different ways to detect DNA and RNA. The classical method relies on the ability of nucleic acids to absorb light at 260 nanometers. The amount of light absorbed is proportional to the amount of nucleic acid present. There is a slight difference in the amount of absorption by single-stranded versus double-stranded nucleic acids, but not DNA versus RNA. Protein contamination can affect the outcome, but because proteins absorb maximally at 280 nm, tables have been published of 260/280 ratios that allow quantitation of the amount of nucleic acid present.
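The proportionality between absorbance and amount can be turned into a simple calculation. The sketch below uses widely quoted approximate conversion factors for a 1-cm path length (about 50 µg/mL per A260 unit for double-stranded DNA, 33 for single-stranded DNA, and 40 for RNA). These factors and the function names are assumptions for illustration and are not taken from this chapter.

```python
# Approximate concentration (ug/mL) giving an A260 of 1.0 in a 1-cm cuvette.
# Commonly quoted values, assumed here for illustration.
UG_PER_ML_PER_A260 = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}


def concentration_ug_per_ml(a260: float, kind: str = "dsDNA",
                            dilution_factor: float = 1.0) -> float:
    """Estimate the nucleic acid concentration of the undiluted sample."""
    return a260 * UG_PER_ML_PER_A260[kind] * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; lower values suggest protein contamination."""
    return a260 / a280


# Hypothetical reading: a 1:10 dilution of a plasmid preparation.
print(concentration_ug_per_ml(0.25, "dsDNA", 10))  # 125.0
print(round(purity_ratio(0.25, 0.135), 2))         # 1.85
```

A ratio of roughly 1.8 is often quoted for clean DNA, although the exact value depends on the buffer and the instrument.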
DNA and RNA can be nonspecifically stained with ethidium bromide (EtBr) to make visualization more sensitive. EtBr is an organic tricyclic compound that binds strongly to double-stranded DNA (and RNA) by intercalating into the double helix between the stacked base pairs. Because it intercalates into DNA, EtBr is a strong mutagen, and care must be taken when using it. EtBr fluoresces when exposed to ultraviolet (UV) light, which increases the sensitivity. SYBR green is a safer alternative DNA stain.
We now focus on the detection of specific sequences of nucleic acids. The ability to identify a specific sequence relies on hybridization of a probe with a known sequence to a target. The probe can detect and bind to a sequence to which it is complementary. The match does not need to be perfect, but as the match percentage decreases, the stability of the nucleic acid hybrid decreases. G-C base pairs are more stable than A-T base pairs, so base composition (usually referred to as % G-C) is an important variable. The second set of variables that affects hybrid stability is extrinsic; it includes the buffer conditions (concentration and composition) and the temperature at which hybridization occurs. Together, these conditions define the stringency under which the hybridization is carried out.
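The effect of base composition on hybrid stability can be illustrated with a rough calculation. The sketch below computes % G-C and applies the Wallace rule, a rule of thumb for short oligonucleotides that assigns 2°C to each A or T and 4°C to each G or C. The rule is not described in this chapter, ignores salt concentration and mismatches, and is not appropriate for long probes; it is used here only to show the trend. The function names and probe sequences are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature (degrees C) for a short oligonucleotide."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Three made-up 14-nucleotide probes with increasing G-C content.
for probe in ("ATATATATATATAT", "GAATTCGGATCCAT", "GCGCGCGCGCGCGC"):
    print(probe, round(gc_percent(probe), 1), wallace_tm(probe))
```

For probes of equal length, the estimated melting temperature rises with % G-C, which is why a single set of hybridization conditions is more stringent for an A-T-rich probe than for a G-C-rich one.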
The probe functions as a single-stranded molecule (if it is double stranded, it must be melted). The target can be single stranded or double stranded. If the target is double stranded, it also must be melted to single strands to begin the hybridization process. The reaction can take place in solution (e.g., during sequencing or PCR; see the sections DNA Sequencing and PCR and RT-PCR later in this chapter), or it can be performed when the target has been bound to a membrane support such as a nitrocellulose filter (see the section Blotting Methods later in this chapter). The target can be DNA (called a Southern blot) or RNA (called a Northern blot); the probe is usually DNA.
For this exercise, let’s use a Southern blot from an experiment in which we have restricted a large DNA fragment into smaller fragments and subcloned the individual fragments (see the section Cloning earlier in this chapter). Starting with the clones on the plate from Figure 2.5, we can isolate plasmid DNA from each white clone and restrict the DNA with the same restriction enzymes used to clone the fragments. The DNA fragments will be separated on an agarose gel and blotted onto nitrocellulose (see the section DNA Separation Techniques later in this chapter).
To increase the sensitivity beyond that of optical detection, the probe must be labeled. We begin with radiolabeling and then describe alternative labeling without radioactivity. For most reactions, 32P is used, but 33P (with a longer half-life but less penetrating ability) and 3H (for special purposes described later) are also used. Probes can be radiolabeled in several different ways. One is end labeling, in which a strand of DNA (that has no 5′ phosphate) is labeled by using a kinase and 32P. Alternatively, a probe can be generated by nick translation or random priming with 32P using the Klenow DNA polymerase fragment and labeled nucleotides (see the chapter titled DNA Replication) or during a PCR reaction (see the section PCR and RT-PCR later in this chapter).
In performing nucleic acid hybridization studies, standard procedures are typically used that allow hybridization over a large range of G-C content. Hybridization experiments are performed in a standardized buffer called saline sodium citrate (SSC), which is usually prepared as a 20× concentrated stock solution. Hybridization is typically carried out within a standard temperature range of 45°C to 65°C, depending upon the required stringency.
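Working solutions are prepared from the 20× stock by simple dilution (C1 × V1 = C2 × V2). The sketch below shows the arithmetic. The function name and the example volumes are hypothetical, and the target strengths shown are only examples, not a recommended protocol.

```python
def dilution_volumes(final_ml: float, target_x: float,
                     stock_x: float = 20.0) -> tuple[float, float]:
    """Return (mL of stock, mL of water) needed for the requested working strength."""
    if target_x > stock_x:
        raise ValueError("target strength cannot exceed the stock strength")
    stock_ml = final_ml * target_x / stock_x
    return stock_ml, final_ml - stock_ml


# Hypothetical example: 500 mL of 2x buffer from a 20x stock.
print(dilution_volumes(500, 2))  # (50.0, 450.0)
```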
The actual hybridization between a labeled probe and a target DNA bound to a membrane usually takes place in a closed (or sealed) container in a buffer that contains a set of molecules to reduce background hybridization of the probe to the filter. Hybridization experiments typically are performed overnight to ensure maximum probe-to-target hybridization. The hybridization reaction is stochastic and depends upon the abundance of each different sequence. The more copies of a sequence, the greater the chance of a given probe molecule encountering its complementary sequence.
The next step is to wash the filter to remove all of the probe that is not specifically bound to a complementary sequence of nucleic acid. Depending on the type of experiment, the stringency of the wash is usually set quite high to avoid spurious results. Higher stringency conditions include higher temperature (closer to the melting temperature of the probe) and lower concentration of cations. (Lower salt concentrations result in less shielding of the negative phosphate groups of the DNA backbone, which in turn inhibits strand annealing.) In some experiments, however, where one is looking specifically for hybridization to targets with a lower percentage of match (e.g., finding a copy of species X DNA using a probe from species Y), hybridization would be performed at lower stringency.
The last step is the identification of which target DNA band on the gel (and thus the filter) has been bound by the radiolabeled probe. The washed nitrocellulose filter is subjected to autoradiography: the dried filter is placed against a sheet of X-ray film. To amplify the radioactive signal, intensifying screens can be used. These are special screens placed on either side of the filter/film pair that act to bounce the radiation back through the film. Alternatively, a phosphorimaging screen (a solid-state liquid scintillation device) can be used. This is more sensitive and faster than X-ray film, but results in somewhat lower resolution. The length of time for autoradiography is empirical. An estimate of the total radioactivity can be made with a handheld radiation monitor. Sample results are shown in FIGURE 2.11. One band on the filter has blackened the X-ray film. The film can be aligned to the filter to determine which band corresponds to the probe.
FIGURE 2.11 A cartoon of an autoradiogram of a gel prepared from the colonies described in Figure 2.5. The gel was blotted onto nitrocellulose and probed with a radioactive gene fragment. Lane 1 contains a set of standard DNA size markers. Lane 2 is the original vector cleaved with EcoRI. Lanes 3 to 6 each contain plasmid DNA from one of the white clones from Figure 2.4 that was restricted with EcoRI. A cartoon of the photograph of the gel is on the left; the radioactive bands are marked with an asterisk.
A simple modification of the autoradiography procedure, called in situ hybridization, allows one to peer into a cell and determine the location, at a microscopic level, of specific nucleic acid sequences. A few steps in the process are modified so that hybridization takes place between the probe, usually labeled with 3H, and complementary nucleic acids in an intact cell or tissue. The goal is to determine exactly where the target is located. The cell or tissue slice is mounted on a microscope slide. Following hybridization, a photographic emulsion instead of film is applied to the slide, covering it. The emulsion, when developed, is transparent to visible light so that it is possible to see the exact location in the cell where the grains in the emulsion blackened by the radioactivity are located. Development time can be weeks to months because 3H has less energetic radiation and its longer half-life results in lower activity.
There are nonradioactive alternatives to the procedures described here that use either colorimetric or fluorescence labeling. A digoxigenin-labeled probe is commonly used in colorimetric procedures. The probe bound to target is localized with an anti-digoxigenin antibody coupled to alkaline phosphatase to develop color. The advantage is speed: results are typically visible within a single day, although sensitivity is usually lower than with radioactivity. Fluorescence in situ hybridization (FISH) is another very common nonradioactive procedure that uses a fluorescently labeled probe. This method is illustrated in FIGURE 2.12. Multiple fluorophores in different colors are available (about a dozen now), and ratios of different probe color combinations can be used to create additional colors.
FIGURE 2.12 Fluorescence in situ hybridization (FISH).
Data from an illustration by Darryl Leja, National Human Genome Research Institute (www.genome.gov).
These procedures are more picturesque but less quantitative than traditional scintillation counting. At best, they can be called semiquantitative. It is possible to use an optical scanner to quantitate the amount of signal produced on film, but care must be taken to ensure the time of exposure during the experiment is within a linear range.
With a few exceptions, the individual pieces of DNA (chromosomes) making up a living organism’s genome are on the order of Mb in length, making them too physically large to be manipulated easily in the laboratory. Individual genes or chromosomal regions of interest by contrast are often quite small and readily manageable, on the order of hundreds or a few thousand bp in length. A necessary first step, therefore, in many experimental processes investigating a specific gene or region, is to break the large original chromosomal DNA molecule down into smaller manageable pieces and then begin isolation and selection of the particular relevant fragment or fragments of interest. This breakage can be done by mechanical shearing of chromosomes, in a process that produces breakages randomly to produce a uniform size distribution of assorted molecules. This approach is useful if randomness in breakpoints is required, such as to create a library of short DNA molecules that “tile” or partially overlap one another while together representing a much larger genomic region, such as an entire chromosome or genome. Alternatively, restriction endonucleases (see the section Nucleases earlier in this chapter) can be employed to cut large DNA molecules into defined shorter segments in a way that is reproducible. This reproducibility is frequently useful, in that a DNA section of interest can be identified in part by its size. Consider a hypothetical gene, genX, on a bacterial chromosome, with the entire gene lying between two EcoRI sites spaced 2.3 kb apart. Digestion of the bacterial DNA with EcoRI will yield a range of small DNA molecules, but genX will always occur on the same 2.3-kb fragment. Depending on the size and complexity of the starting genome, there might be several other DNA segments of similar size produced, or in a simple enough system, this 2.3-kb size might be unique to the genX fragment. 
In this latter case, detection or visualization of a 2.3-kb fragment is enough to definitively identify the presence of genX. Many of the earliest laboratory techniques developed in working with DNA relate to separating and concentrating DNA molecules based on size expressly to take advantage of these concepts. The ability to separate DNA molecules based on size allows for taking a complex mixture of many fragment sizes and selecting a much smaller, less complex subset of interest for further study.
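The genX example can be made concrete with a short simulation. The following is a minimal sketch, assuming a linear DNA molecule tracked on one strand only; the function name digest and the toy chromosome are hypothetical illustrations, and the only biological fact built in is that EcoRI recognizes GAATTC and cleaves after the G.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear DNA at every occurrence of `site`; return fragment lengths in order.

    cut_offset=1 models EcoRI, which cleaves after the G of G^AATTC.
    Only the top strand is tracked, so sticky-end overhangs are ignored.
    """
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


# Hypothetical toy chromosome: genX lies between two EcoRI sites 2,300 bp apart.
toy_chromosome = "A" * 500 + "GAATTC" + "C" * 2294 + "GAATTC" + "T" * 700
fragments = digest(toy_chromosome)
print(fragments)  # the middle fragment is the one that carries genX
```

Because the cut sites are fixed by the sequence, repeating the digest always yields the same fragment sizes, which is the reproducibility the text relies on.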
The simplest method for separation and visualization of DNA molecules based on size is gel electrophoresis. In neutral agarose gel (the most basic type of gel), electrophoresis is done by preparing a small slab of gel in an electrically conductive, mildly basic buffer. Although similar to the gelatins used to make dessert dishes, this type of gel is made from agarose, a polysaccharide that is derived from seaweed and has very uniform molecular sizes. Preparation of agarose gels of a specific percentage of agarose by mass (usually in the range of 0.8%–3%) creates, in effect, a molecular sieve, with a “mesh” pore size being determined by the percentage of agarose (higher percentages yielding smaller pores). The gel is poured in a molten state into a rectangular container, with discrete wells being formed near one end of the product. After cooling and solidifying, the slab is submerged in the same conductive, mildly alkaline buffer and samples of mixed DNA fragments are placed in the preformed wells. A DC electric current is then applied to the gel, with the positive charge being at the opposite end of the gel from the wells. The alkalinity of the solution ensures that the DNA molecules have a uniform negative charge from their backbone phosphates, and the DNA fragments begin to be drawn electrostatically toward the positive electrode. Shorter DNA fragments are able to move through the agarose pores with less resistance than longer fragments, and so over time the smallest DNA molecules move the farthest from the wells and the largest move the least. All fragments of a given size will move at about the same rate, effectively concentrating any population of equal-sized molecules into a discrete band at the same distance from the well. The addition of a DNA-binding fluorescent dye to the gel, such as ethidium bromide or SYBR green, stains these DNA bands such that they can be directly seen by eye when the gel is exposed to fluorescence-exciting light. 
In practice, a standard sample consisting of a set of DNA molecules of a known size is run in one of the wells, with sizes of bands in other wells estimated in comparison to the standard, as shown in FIGURE 2.13. DNA molecules of roughly 50 to 10,000 bp can be quickly separated, identified, and sized to within about 10% accuracy by this simple method, which remains a common laboratory technique. DNA molecules can be separated not only by size but also by shape. Supercoiled DNA, which is compact compared to relaxed or linear DNA, migrates more rapidly on a gel, and the more supercoiling, the faster the migration, as shown in FIGURE 2.14.
FIGURE 2.13 DNA sizes can be determined by gel electrophoresis. (a) A DNA of standard size and a DNA of unknown size are run in two lanes of a gel, depicted schematically. (b) The migration of the DNAs of known size in the standard is graphed to create a standard curve (migration distance in cm versus log bp). The point shown in green is for the DNA of unknown size.
Data from an illustration by Michael Blaber, Florida State University.
FIGURE 2.14 Supercoiled DNA molecules separated by agarose gel electrophoresis. Lane 1 contains untreated negatively supercoiled DNA (lower band). Lanes 2 and 3 contain the same DNA that was treated with a type 1 topoisomerase for 5 and 30 minutes, respectively. The topoisomerase makes a single-strand break in the DNA and relaxes negative supercoils in single steps (one supercoil relaxed per strand broken and reformed).
Reproduced from: Keller, W. 1975. Proc Natl Acad Sci USA 72:2550–2554. Photo courtesy of Walter Keller, University of Basel.
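The standard-curve sizing described for Figure 2.13 amounts to fitting a straight line to log10(size) versus migration distance and reading the unknown off that line. The sketch below is illustrative only: the ladder values are invented to be perfectly log-linear, and real gels deviate from linearity at the extremes of the size range.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_cm, standards):
    """Estimate fragment size (bp) from migration distance using a size ladder.

    `standards` is a list of (distance_cm, size_bp) pairs from the marker lane.
    """
    distances = [d for d, _ in standards]
    log_sizes = [math.log10(bp) for _, bp in standards]
    slope, intercept = fit_line(distances, log_sizes)
    return 10 ** (slope * distance_cm + intercept)


# Hypothetical ladder: (migration distance in cm, size in bp).
ladder = [(1.0, 8000), (2.0, 4000), (3.0, 2000), (4.0, 1000), (5.0, 500)]
unknown_bp = estimate_size(3.5, ladder)
print(round(unknown_bp))
```

A band that runs between the 2,000-bp and 1,000-bp markers is assigned an intermediate size on the logarithmic scale rather than the arithmetic midpoint.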
Variations on this method primarily relate to changing the gel matrix from agarose to other molecules such as synthetic polyacrylamides, which can have even more precisely controlled pore sizes. These can offer finer size resolution of DNA molecules from roughly 10 to 1,500 base pairs in size. Both resolution and sensitivity are further improved by making these types of gels as thin as possible, normally requiring that they be formed between glass plates for mechanical strength. When chemical denaturants such as urea are added to the buffer system, the DNA molecules are forced to unfold (losing any secondary structures) and take on hydrodynamic properties related only to molecule length. This approach can clearly resolve DNA molecules differing in length by only a single nucleotide. Denaturing polyacrylamide electrophoresis is a key component of the classic DNA sequencing technique whereby the separation and detection of a series of single nucleotide–length difference DNA products allows for the reading of the underlying order of nucleotide bases.
Another method for separating DNA molecules from other contaminating biomolecules, or in some cases for fractionation of specific small DNA molecules from other DNAs, is through the use of gradients, as depicted in FIGURE 2.15. The most frequent implementation of this is isopycnic banding, which is based on the fact that specific DNA molecules have unique densities based on their G-C content. Under the influence of extreme g-forces, such as through ultracentrifugation, a high-concentration solution of a salt (such as cesium chloride) will form a stable density gradient from low density (near top of tube/center of rotor) to high density (near bottom of tube/outside of rotor). When placed on top of this gradient (or even mixed uniformly within the gradient) and subjected to continued centrifugation, individual DNA molecules will migrate to a position in the gradient where their density matches that of the surrounding medium. Individual DNA bands can then be either visualized (e.g., through the incorporation of DNA-binding fluorescent dyes in the gradient matrix and exposure to fluorescence excitation) or recovered by careful puncture of the centrifuge tube and fractional collection of the tube contents. This method can also be used to separate double-stranded from single-stranded molecules and RNA from DNA molecules, again based solely on density differences.
FIGURE 2.15 Gradient centrifugation separates samples based on their density.
Choice of the gradient matrix material, its concentration, and the centrifugation conditions can influence the total density range separated by the process, with very narrow ranges being used to fractionate one particular type of DNA molecule from others, and wider ranges being used to separate DNAs in general from other biomolecules. Historically, one of the best known uses of this technique was in the Meselson–Stahl experiment of 1958 (introduced in the Genes Are DNA and Encode RNAs and Polypeptides chapter), in which the stepwise density changes in the DNA genomes of bacteria shifted from growth in “heavy” nitrogen (15N) to “regular” nitrogen (14N) were observed. The method’s capacity to differentially band DNA with pure 15N, half 15N/half 14N, and pure 14N conclusively demonstrated the semiconservative nature of DNA replication. Now, the method is most frequently employed as a large-scale preparative purification technique with wider density ranges to purify DNAs as a group away from proteins and RNAs.
The classic method of DNA sequencing called dideoxy sequencing has not changed significantly since Frederick Sanger and colleagues developed the technique in 1977. This method requires many identical copies of the DNA, obtained either through cloning or by PCR; an oligonucleotide primer that is complementary to a short stretch of the DNA; DNA polymerase; deoxynucleotides (dNTPs: dATP, dCTP, dGTP, and dTTP); and dideoxynucleotides (ddNTPs). Dideoxynucleotides are modified nucleotides that can be incorporated into the growing DNA strand but lack the 3′ hydroxyl group needed to attach the next nucleotide. Thus, their incorporation terminates the synthesis reaction. The ddNTPs are added at much lower concentrations than the normal nucleotides so that they are incorporated at low rates, randomly.
Originally, four separate reactions were necessary, with a single different ddNTP added to each one. The reason for this was that the strands were labeled with radioisotopes and could not be distinguished from each other on the basis of the label. Thus, the reactions were loaded into adjacent lanes on a denaturing acrylamide gel and separated by electrophoresis at a resolution that distinguished between strands differing by a length of one nucleotide. The gel was transferred to a solid support, dried, and exposed to film. The results were read from the bottom of the gel (the shortest fragments) to the top, with a band appearing in the ddATP lane indicating that the strand terminated with an adenine, the next band appearing in the ddTTP lane indicating that the next base was a thymine, and so on. Read lengths were typically 500 to 1,000 bp.
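The logic of reading a four-lane dideoxy gel can be sketched in a few lines. This is an idealized illustration, assuming that every possible chain-terminated product is present and that each length is resolved perfectly; the function names are hypothetical.

```python
def sanger_fragments(new_strand):
    """All chain-terminated products of one synthesis: (length, terminating base)."""
    return [(i + 1, base) for i, base in enumerate(new_strand.upper())]


def read_gel(fragments):
    """Reconstruct the synthesized strand from four ddNTP lanes.

    Each fragment is assigned to the lane of its terminating ddNTP, and the
    bands are then read in order of increasing length (shortest first).
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in fragments:
        lanes[base].append(length)
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[length] for length in sorted(base_at_length))


# Hypothetical newly synthesized strand; loading order does not matter.
products = list(reversed(sanger_fragments("ATGCCGTA")))
print(read_gel(products))
```

The sequence recovered is that of the newly synthesized strand, which is complementary to the template being sequenced.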
A major advance was the use of a different fluorescent label for each ddNTP in place of radioactivity. This allowed a single reaction to be run that is read as the strands are hit with a laser and pass by an optical sensor. The information about which ddNTP terminated the fragment is fed directly into a computer. The second modification was the replacement of large slabs of polyacrylamide gels with very thin, long, glass capillary tubes filled with gel (as described previously in the section DNA Separation Techniques). These tubes can dissipate heat more rapidly, allowing the electrophoresis to be run at a higher voltage, greatly reducing the time required for separation. A schematic illustrating this process is shown in FIGURE 2.16. As the figure illustrates, the process is automated and machine based. These modifications, with their resulting automation and increased throughput, ushered in the era of whole-genome sequencing. This was the process used to sequence the first set of genomes, including the human genome. It was relatively slow and very expensive. The determination of the human genome sequence took several years and cost several billion dollars to complete.
FIGURE 2.16 DideoxyNTP sequencing using fluorescent tags.
The next generation of sequencing technologies that followed sought to eliminate the need for time-consuming gel separation and reliance on human labor. Modifications of procedures and new instrumentation beginning in about 2005—sometimes called next-generation sequencing (NGS) or (now) second-generation NGS—aided in the automation and scaling up of the procedure. This still required PCR amplification of the starting material, which is first randomly fragmented and then amplified. Individual amplified fragments (typically very short—a few hundred bp) are anchored to a solid support and read out one base, in one set of fragments, at a time, in a massively parallel array. These modifications allow sequencing on a very large scale at a much lower cost per kb of DNA than the original first-generation methods.
This technology, sometimes called sequencing-by-synthesis or wash-and-scan sequencing, relies on the detection and identification of each nucleotide as it is added to a growing strand. In one such application, the primer is tethered to a glass surface and the complementary DNA to be sequenced anneals to the primer. Sequencing proceeds by adding polymerase and fluorescently labeled nucleotides individually, washing away any unused dNTPs. After illuminating with a laser, the nucleotide that has been incorporated into the DNA strand can be detected. Other versions use nucleotides with reversible termination so that only one nucleotide can be incorporated at a time even if there is a stretch of homopolymeric DNA (such as a run of adenines). Still another version, called pyrosequencing, detects the release of pyrophosphate from the newly added base. These second-generation systems utilize amplification of material to produce massively parallel analysis runs, but the drawback is that there are typically very short read lengths. The data then require computation to stitch them together into what are called contigs (contiguous sequences).
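The computational step of stitching short reads into contigs can be illustrated with a greedy overlap merge. This is a deliberately simplified sketch, not the algorithm used by any particular assembler: it assumes error-free reads from a single strand, ignores repeats, and does not handle reads wholly contained inside other reads. The function names and the min_len parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads  # no overlaps remain; each entry is a contig
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


# Hypothetical short reads covering one region.
contigs = greedy_assemble(["ATGGCGT", "GCGTACCT", "ACCTGAA"])
print(contigs)
```

With very short reads, many spurious overlaps arise in repetitive regions, which is one reason short read lengths are described as a drawback.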
Technology is now moving from this second generation to a set of third-generation NGS systems. Third-generation sequencing is a collection of methods that avoids the problems of amplification by direct sequencing of the material, DNA or RNA, still giving multiple short (but longer than second-generation sequencing) reads by using single-molecule sequencing (SMS) templates fixed to a surface for sequencing. Again, different companies are proposing different platforms that use different methods to examine the single molecules of DNA. Among these real-time sequencing methods in development are nanopore sequencing and tunneling currents sequencing. The first aims to detect individual nucleotides as a DNA sequence is run through a silicon nanopore; the second, through a channel. Tiny transistors are used to control a current passing through the pore. As a nucleotide passes through, it disturbs the current in a manner unique to its chemical structure. If successful, these technologies have the advantage of reading DNA by simply using electronics, with no chemistry or optical detection required. Nevertheless, there are many kinks to work out of the process before it becomes feasible. Other methods under development include examination by electron microscopy and single-base synthesizing. The accuracy might not be as high as second-generation systems, but read lengths are longer, approaching 1,000 bp.
Few advances in the life sciences have had the broad-reaching and even paradigm-shifting impact of the polymerase chain reaction (PCR). Although evidence exists that the underlying core principles of the method were understood and in fact used in practice by a few isolated people prior to 1983, credit for independent conceptualization of the mature technology and foresight of its applications must go to Kary Mullis, who was awarded the 1993 Nobel Prize in Chemistry for his insight.
The underlying concepts are simple and based on the knowledge that DNA polymerases require a template strand with an annealed primer containing a 3′ hydroxyl to commence strand extension. The steps of PCR are illustrated in FIGURE 2.17. While in the context of normal cellular DNA replication (see the chapter titled DNA Replication) this primer is in the form of a short RNA molecule provided by DNA primase, it can equally well be provided in the form of a short, single-stranded synthetic DNA oligonucleotide having a defined sequence complementary to the 3′ end of any known sequence of interest. Heating of the double-stranded target sequence of interest (known as the “template molecule,” or just “template” for short) to near 100°C in an appropriate buffer causes thermal denaturation as the template strands melt apart from each other (Figure 2.17a and b). Rapid cooling to the annealing temperature (typically set a few degrees below the melting temperature, or Tm, of the primer/template pair), together with a vast molar excess of the short, kinetically active synthetic primer, ensures that a primer molecule finds and appropriately anneals to its complementary target sequence more rapidly than the original opposing strand can do so (Figure 2.17c). If presented to a polymerase, this annealed primer presents a defined location from which to commence primer extension (Figure 2.17d). In general, this extension will occur until either the polymerase is forced off the template or it reaches the 5′ end of the template molecule and effectively runs out of template to copy.
FIGURE 2.17 Denaturation (a) and rapid cooling (b) of a DNA template molecule in the presence of excess primer allow the primer to hybridize to any complementary sequence region of the template (c). This provides a substrate for polymerase action and primer extension (d), creating a complementary copy of one template strand downstream from the primer.
The ingenuity of PCR arises from simultaneously incorporating a nearby second primer of opposing polarity (i.e., complementary to the opposite strand to which the first primer anneals) and then subjecting the mixture of template, two primers (at high concentrations), thermostable DNA polymerase, and dNTP-containing polymerase buffer to repeated cycles of thermal denaturation, annealing, and primer extension. Consider just the first cycle of the process: Denaturation and annealing occur as described earlier, but with both primers, creating the situation depicted in FIGURE 2.18. If polymerase extension is allowed to proceed for a short period of time (on the order of 1 minute per 1,000 base pairs), each of the primers will be extended out and past the location of the other, thus creating a new complementary annealing site for the opposing primer. Raising the temperature back to denaturation stops the primer elongation process and displaces the polymerases and newly created strands. As the system is cooled again to the annealing temperature, each of the newly formed short, single DNA strands serves as an annealing site for its opposite polarity primer. In this second thermal cycle, extension of the primers proceeds only as far as the template exists, that is, to the 5′ end of the opposing primer sequence. The process has now made both strands of the short, defined, precisely primer-to-primer DNA sequence.
Repeating the thermal steps of denaturation, annealing, and primer extension leads to an exponential increase (2^N, where N is the number of thermal cycles) in the number of this defined product, allowing for phenomenal levels of “sequence amplification.” Close consideration of the process reveals that even though this also creates uncertain length products from the extension of each primer off the original template molecule with each cycle, these products accrue in a linear fashion and are quickly vastly outnumbered by the primer-to-primer defined product, known as the amplicon. In fact, within 40 thermal cycles of an idealized PCR reaction, a single template DNA molecule generates approximately 10^12 amplicons, more than enough to go from an invisible target to a clearly visible fluorescent dye–stained product.
FIGURE 2.18 Thermally driven cycles of primer extension where primers of opposite polarity have nearby priming sites on each of the two template strands lead to the exponential production of the short, primer-to-primer–defined sequence (the “amplicon”).
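The contrast between linear and exponential accumulation can be checked with simple strand bookkeeping. The sketch below assumes an idealized reaction that starts from one double-stranded template and in which every strand is copied in every cycle; the function name is hypothetical.

```python
def pcr_strand_counts(cycles):
    """Count single strands after `cycles` rounds of idealized PCR.

    Returns (original, long_products, amplicon_strands):
      original          the two strands of the starting template
      long_products     copies made off an original strand (one defined end)
      amplicon_strands  primer-to-primer strands (both ends defined)
    """
    original, long_products, amplicon_strands = 2, 0, 0
    for _ in range(cycles):
        new_long = original                           # originals yield long products
        new_short = long_products + amplicon_strands  # all other templates yield amplicon strands
        long_products += new_long
        amplicon_strands += new_short
    return original, long_products, amplicon_strands


for n in (1, 2, 3, 10, 40):
    print(n, pcr_strand_counts(n))
```

The long products grow by only two strands per cycle, whereas the amplicon strands roughly double, so after 40 cycles the count is on the order of 10^12 double-stranded amplicons.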
Perhaps not surprisingly, there are many technical complexities underlying this deceptively simple description. Primer design must take into account issues such as DNA secondary structures, uniqueness of sequence, and similarity of Tm between primers. Use of a thermostable polymerase (that is, one that is not inactivated by the high temperatures used in the denaturation steps) is an essential concept identified by Mullis and coworkers. Within this constraint, however, different enzyme sources with differing properties (e.g., exonuclease activities for increased accuracy) can be exploited to meet individual application needs. Buffer composition (including agents such as DMSO to help reduce secondary structural barriers to effective amplification, and inclusion of divalent cations such as Mg2+ at sufficient concentration not to be depleted by chelation to nucleotides) often needs some optimization for effective reactions. In general, the PCR process works best when the primers are within short distances of each other (100 to 500 base pairs), but well optimized reactions have been successful at distances into the tens of kilobases. “Hot start” techniques—frequently through covalent modification of the polymerase—can be employed to ensure that no inappropriate primer annealing and extension can occur prior to the first denaturation step, thereby avoiding the production of incorrect products. Generally, somewhere around 40 thermal cycles marks an effective limit for a PCR reaction with good kinetics in the presence of appropriate template, as depletion of dNTPs into amplicons effectively occurs around this point and a “plateau phase” occurs wherein no more product is made. Conversely, if the appropriate template was not present in the reaction, proceeding beyond 40 cycles primarily increases the likelihood of production of rare, incorrect products.
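One of the primer design checks mentioned above, similarity of Tm between primers, can be approximated quickly. The sketch below uses the Wallace rule, a common rule of thumb for short oligonucleotides (2°C per A or T and 4°C per G or C); it is not a method described in this chapter, it ignores salt and primer concentration, and the primer sequences and the 5°C tolerance are hypothetical.

```python
def wallace_tm(primer):
    """Rough melting temperature (degrees C) of a short oligo by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def tm_matched(forward, reverse, max_difference=5):
    """True if the two primers have estimated Tm values within `max_difference`."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_difference


# Hypothetical primer pair.
forward_primer = "AGCGTCATTCGGATCAAG"
reverse_primer = "TTGACCGATAGCTTGCCA"
print(wallace_tm(forward_primer), wallace_tm(reverse_primer))
print(tm_matched(forward_primer, reverse_primer))
```

A pair with widely different estimates would force a compromise annealing temperature, at which one primer anneals poorly or the other anneals nonspecifically.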
Pairing PCR with a preliminary reverse transcription step (either random-primed or using one of the PCR primers to direct activity of the RNA-dependent DNA polymerase [reverse transcriptase]) allows for RNA templates to be converted to cDNA and then subject to regular PCR, in a variation known as reverse transcription PCR (RT-PCR). In general, the subsequent discussion uses the term PCR to refer to both PCR and RT-PCR.
Detection of PCR products can be done in a number of ways. Postreaction “endpoint techniques” include gel electrophoresis and DNA-specific dye staining. Long a staple of molecular biological techniques (described earlier in the section DNA Separation Techniques), this is a simple but effective technique to rapidly visualize both that an amplicon was produced and that it is of an expected size. If the particular application requires exact, to-the-nucleotide product sizing, capillary electrophoresis can be used instead. Hybridization of PCR products to microarrays or suspension bead arrays can be used to detect specific amplicons when more than one product sequence might come out of an assay. These in turn use a variety of methods for amplicon labeling, including chemiluminescence, fluorescence, and electrochemical techniques. Alternatively, real-time PCR methodologies employ some way of directly detecting the ongoing production of amplicons in the reaction vessel, most commonly through monitoring a direct or indirect fluorescence change linked to amplicon production by optical methods. These methods allow the reaction vessel to stay sealed throughout the process. In contrast to endpoint methods, for which final amplicon concentration bears little relationship to starting template concentration, real-time methods show good correlations between the thermocycle number at which clear signals are measurable (usually referred to as the threshold cycle, CT) and the starting template concentration. Thus, real-time methods are effective template quantification approaches. As a result, these methods are often referred to as quantitative PCR (qPCR) methods.
Conceptually, the simplest method for real-time PCR detection is based on the use of dyes that selectively bind and become fluorescent in the presence of double-stranded DNA, such as SYBR green. Production of a PCR product during thermocycling leads to an exponential increase in the amount of double-stranded product present at the annealing and extension thermal steps of each cycle. The real-time instrument monitors fluorescence in each reaction tube during these thermal steps of each cycle and calculates the change in fluorescence per cycle to generate a sigmoidal amplification curve. A cutoff threshold value placed approximately midrange in the exponential phase of this curve is used for calculating the CT of each sample and can be used for quantitation if appropriate controls are present.
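Quantitation from CT values is usually done against a dilution series of known template amounts. The sketch below assumes that approach: it fits CT against log10(starting copies) for a set of standards and then interpolates an unknown. The data are invented for illustration, and the efficiency formula (10^(-1/slope) - 1) is the conventional one rather than something stated in this chapter.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_curve(standards):
    """Fit CT = slope * log10(copies) + intercept from (copies, CT) pairs."""
    logs = [math.log10(copies) for copies, _ in standards]
    cts = [ct for _, ct in standards]
    return fit_line(logs, cts)


def copies_from_ct(ct, slope, intercept):
    """Interpolate starting copy number for a sample with threshold cycle `ct`."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series: (starting copies, measured CT).
standards = [(1e6, 18.07), (1e5, 21.39), (1e4, 24.71), (1e3, 28.03)]
slope, intercept = standard_curve(standards)
efficiency = 10 ** (-1 / slope) - 1   # 1.0 means perfect doubling each cycle
unknown_copies = copies_from_ct(23.05, slope, intercept)
print(round(slope, 2), round(efficiency, 2), round(unknown_copies))
```

A slope near -3.3 cycles per tenfold dilution corresponds to near-perfect doubling, which ties the standard curve back to the exponential amplification described earlier.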
A potential issue with this approach is that the reporter dyes are not sequence specific, so any spurious products produced by the reaction can lead to false-positive signals. In practice, this is usually controlled for by performance of a melt point analysis at the end of regular thermocycling. The reaction is cooled to the annealing temperature, and then the temperature is slowly raised while fluorescence is constantly monitored. Specific amplicons will have a characteristic melt point at which fluorescence is lost, whereas nonspecific amplicons will demonstrate a broad range of melt points, giving a gradual loss in sample fluorescence.
A number of alternate approaches use probe-based fluorescence reporters, which avoid this potential nonspecific signal. Probe-based approaches work through the application of a process called fluorescence resonance energy transfer (FRET). In simple terms, FRET occurs when two fluorophores are in close proximity and the emission wavelength of one (the reporter) matches the excitation wavelength of the other (the quencher). Photons emitted at the reporter dye emission wavelength are effectively captured by the nearby quencher dye and reemitted at the quencher emission wavelength. In the simplest form of this approach, two short oligonucleotide probes with homology to adjoining sequences within the expected amplicon are included in the assay reaction; one probe carries the reporter dye, and the other the quencher. If specific PCR product is formed in the reaction, at each annealing step these two probes can anneal to the single-stranded product and thereby place the reporter and quencher molecules close to each other. Illumination of the reaction with the excitation wavelength of the reporter dye will lead to FRET and fluorescence at the quencher dye’s characteristic emission frequency. By contrast, if the homologous template for the probe molecules is not present (i.e., the expected PCR product), the two dyes will not be colocalized and excitation of the reporter dye will lead to fluorescence at its emission frequency. This is illustrated in FIGURE 2.19. As with the DNA-binding dye approach, the real-time instrument monitors the quencher emission wavelength during each cycle and generates a similar sigmoidal amplification curve. Multiple alternate ways of exploiting FRET for this process exist, including 5′ fluorogenic nuclease assays, molecular beacons, and molecular scorpions. Although the details of these differ, the underlying concept is similar and all generate data in a similar fashion.
FIGURE 2.19 Fluorescence resonance energy transfer (FRET) occurs only when the reporter and quencher fluorophores are very close to each other, leading to the detection of light at the quencher emission frequency when the reporter is stimulated by light of its excitation frequency. If the reporter and quencher are not colocalized, stimulation of the reporter instead leads to detection of light at the reporter emission frequency. By placing the reporter and quencher fluorophores on single-stranded nucleic acid probes complementary to the expected amplicon, different variations on this method can be designed such that the occurrence of FRET can be used to monitor the production of sequence-specific amplicons.
The applications of the PCR process are incredibly diverse. The simple appearance or nonappearance of an amplicon in a properly controlled reaction can be taken as evidence for the presence or absence, respectively, of the assay target template. This leads to medical applications such as the detection of infectious disease agents at sensitivities, specificities, and speeds much greater than those of alternate methods. Whereas the two primer sites must be of known sequence, the internal section can be any sequence of a general length, which leads directly to applications for which a PCR product for a region known to vary between species (or even between individuals) can be produced and subjected to sequence analysis to identify the species (or individual identity, in the latter case) of the sample template. Coupled with single-molecule sensitivity, this has provided criminal forensics with tools powerful enough to identify individuals from residual DNA on crime scene evidence as small as cigarette butts, smudged fingerprints, or a single hair. Evolutionary biologists have made use of PCR to amplify DNA from well-preserved samples, such as insects encased in amber millions of years old, with subsequent sequencing and phylogenetic analysis, yielding fascinating results on the continuity and evolution of life on Earth. Quantitative real-time approaches have applications in medicine (e.g., monitoring viral loads in transplant patients), research (e.g., examining transcriptional activation of a specific target gene in a single cell), or environmental monitoring (e.g., water purification quality control).
In general, PCR reactions are run with carefully optimized Tm values that maximize sensitivity and amplification kinetics while ensuring that primers will only anneal to their exact hybridization matches. Lowering the Tm of a PCR reaction (in effect, relaxing the reaction stringency and allowing primers to anneal to not quite perfect hybridization partners) has useful applications as well, such as in searching a sample for an unknown sequence suspected to be similar to a known one. This technique has been successfully employed for the discovery of new virus species, when primers matching a similar virus species are employed. Similarly, during a PCR-directed cloning of a gene or region of interest, planned mismatches in the primer sequence and slightly lowered Tms can be used to introduce wanted mutations in a process called site-directed mutagenesis. It’s possible to perform differential detection of single nucleotide polymorphisms (SNPs) (see the chapter titled The Content of the Genome), which can be directly indicative of particular genotypes or serve as surrogate linked markers for nearby genetic targets of interest, through the design of PCR primers with a 3′ terminal nucleotide specific to the expected polymorphism. At the optimal Tm, this final crucial nucleotide can only hybridize and provide a 3′ hydroxyl to the waiting polymerase if the matching single nucleotide polymorphism occurs. This process is known by several names, including amplification refractory mutation system (ARMS) or allele-specific primer extension (ASPE).
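The logic of allele-specific priming can be illustrated with a small sketch. It assumes a deliberately simplified rule: a primer is extended only if its 3′ terminal base is paired and the rest of the primer has no more than an allowed number of mismatches. The sequences, function names, and the mismatch parameter are all hypothetical; real assays depend on reaction conditions that this model ignores.

```python
def primer_extends(primer, target_site, max_internal_mismatches=0):
    """Return True if the polymerase is offered a paired 3' end.

    Both sequences are written 5'->3' in the same sense as the primer,
    so identical bases stand for correct pairing with the opposite strand.
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must be the same length")
    internal_mismatches = sum(
        p != t for p, t in zip(primer[:-1], target_site[:-1])
    )
    three_prime_paired = primer[-1] == target_site[-1]
    return three_prime_paired and internal_mismatches <= max_internal_mismatches


def detected_alleles(sample_sites, allele_primers):
    """List the alleles whose specific primer extends on any sample copy."""
    return sorted(
        allele
        for allele, primer in allele_primers.items()
        if any(primer_extends(primer, site) for site in sample_sites)
    )


# Hypothetical locus: the SNP is the last base of the primer-binding site.
SHARED = "GATTACAGGCTTC"
ALLELE_PRIMERS = {"A": SHARED + "A", "G": SHARED + "G"}

heterozygote = [SHARED + "A", SHARED + "G"]
homozygote_g = [SHARED + "G", SHARED + "G"]

print(detected_alleles(heterozygote, ALLELE_PRIMERS))  # ['A', 'G']
print(detected_alleles(homozygote_g, ALLELE_PRIMERS))  # ['G']
```

Running one reaction per allele-specific primer and noting which reactions yield product is what gives the genotype readout.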
The PCR process described thus far has been restricted to amplification of a single target per reaction, or simplex PCR. Although this is the most common application, it is possible to combine multiple, independent PCR reactions into a single reaction, allowing for an experiment to query a single, minute specimen for the presence, absence, or possibly the amount of multiple unrelated sequences. This multiplex PCR is particularly useful in forensics applications and medical diagnostic situations, but entails rapidly increasing levels of complexity in ensuring that multiple primer sets do not have unwanted interactions that lead to undesired false products. At best, multiplexing tends to result in loss of some sensitivity for each individual PCR due to effective competition between them for limited polymerase and nucleotides.
A final point of interest to many students with regard to PCR is its consideration from a philosophical perspective. In practice, performance of this now incredibly pervasive method requires the use of a thermostable polymerase, as previously indicated. These polymerases (of which there are a number of varieties) primarily derive from bacterial DNA polymerases originally identified in extremophiles living in boiling hot springs and deep-sea volcanic thermal vents. Few people would have been likely to suspect that studying deep-sea thermal vent microbes would be of such direct importance in so many other aspects of science, including those that impact on their daily lives. These unexpected links between topics serve to highlight the importance of basic research on all manner of subjects; critical discoveries can come from the least expected avenues of exploration.
After nucleic acids are separated by size in a gel matrix, they can be detected using dyes that are sequence-nonspecific, or specific sequences can be detected using a method generically referred to as blotting. Although slower and more involved than direct visualization by fluorescent dye staining, blotting techniques have two major advantages: They have a greatly increased sensitivity relative to dye staining, and they allow for the specific detection of defined sequences of interest among many similarly sized bands on a gel.
The method was first developed for application to DNA agarose gels and was briefly introduced in the section Nucleic Acid Detection. In this form, the method is referred to as Southern blotting (after the method’s inventor, Dr. Edwin Southern). A schematic of this process is shown in FIGURE 2.20. A regular agarose gel is made and run (and if desired, stained) as described previously. Following this, the gel is soaked in an alkali buffer to denature the DNA, and then placed in contact with a sheet of porous membrane (commonly nitrocellulose or nylon). Next, a buffer is drawn through the gel and then the membrane either by capillary action (e.g., by wicking into a stack of dry paper towel) or by a gentle vacuum pressure. This slow flow of buffer in turn draws each nucleic acid band in the gel out of the gel matrix and onto the membrane surface. Nucleic acids bind to the membrane, which in many cases is positively charged to increase efficiency of DNA binding. This, in effect, creates a “contact print” of the order and position of all nucleic acid bands as size-resolved in the gel. To make the elution of large DNA molecules from the gel matrix more efficient, the gel is sometimes treated with a mild acid after electrophoresis but before transfer. This induces nucleic acid depurination and creates random strand breaks in the DNA within the gel, such that large molecules are broken into smaller subsections that elute more readily but remain in the same physical location as their original gel band.
FIGURE 2.20 To perform a Southern blot, DNA digested with restriction enzymes is electrophoresed to separate fragments by size. Double-stranded DNA is denatured in an alkali solution either before or during blotting. The gel is placed on a wick (such as a sponge) in a container of transfer buffer and a membrane (nylon or nitrocellulose) is placed on top of the gel. Absorbent materials such as paper towels are placed on top. Buffer is drawn from the reservoir through the gel by capillary action, transferring the DNA to the membrane. The membrane is then incubated with a labeled probe (usually DNA). The unbound probe is washed away, and the bound probe is detected by autoradiography or phosphorimaging. In Northern blotting, RNA is run on a gel rather than DNA.
Following transfer, the nucleic acids are fixed to the membrane either through drying or through exposure to ultraviolet light, which can create physical crosslinks between the membrane and the nucleic acids (primarily pyrimidines). The blot is now ready for blocking, where it is immersed in a warmed, low-salt buffer containing materials that will bind to and block areas of the blot that might bind organic compounds nonspecifically. Following blocking, a probe molecule is introduced. The probe consists of a labeled (isotopically or chemically, e.g., through incorporation of biotinylated nucleotides) copy of the target sequence of interest, which is either synthesized as a single-stranded oligonucleotide, or (if double stranded) has been heat denatured and rapidly cooled to place it in a single-stranded form. When this is added to the warmed buffer and allowed to incubate with the blocked membrane, the probe will attempt to hybridize to homologous sequences on the membrane surface. Following this hybridization step, the membrane is generally washed in warm buffer without a probe or blocking agent to remove nonspecifically associated probe molecules, and then visualized; in the case of isotopically labeled probes, this can be done by simply exposing the membrane to a piece of film or a phosphor-imager screen. Decay of the label (usually ³²P or ³⁵S) leads to the production of an image in which any hybridized DNA bands become visible on the developed film or scanned phosphor screen. For chemically labeled probes, chemiluminescent or fluorescent detection strategies are used in an analogous manner.
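The overall logic of a Southern blot (digest, separate by size, then ask which band the probe finds) can be mimicked in a few lines. This is a toy model under stated assumptions: the DNA is a short made-up linear sequence, the enzyme is EcoRI (cutting G^AATTC on the top strand), and "hybridization" is reduced to an exact match of the probe to either strand of a fragment.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def digest(dna, site="GAATTC", cut_offset=1):
    """Cut the top strand cut_offset bases into every occurrence of site."""
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def southern_blot(dna, probe):
    """Return (fragment length, probe hybridizes?) from largest to smallest,
    mirroring band order from the well downward."""
    bands = sorted(digest(dna), key=len, reverse=True)
    probe_rc = reverse_complement(probe)
    return [(len(f), probe in f or probe_rc in f) for f in bands]


# Hypothetical 62-base sequence with two EcoRI sites.
DNA = "ATGC" * 5 + "GAATTC" + "TTGACCGGTACCATGGCA" + "GAATTC" + "CCGG" * 3
PROBE = "TGCCATGGTACC"  # complementary to part of the middle fragment

print(southern_blot(DNA, PROBE))  # [(24, True), (21, False), (17, False)]
```

All three fragments would be visible with a nonspecific dye, but only the 24-base band would appear on the probed membrane.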
A final benefit of the Southern blotting technique is that the observed band intensity is related to the amount of target on the membrane—in other words, it is a quantitative method. If a suitable standard (e.g., a dilution series of unlabeled probe sequence) is included in the gel, comparison of this standard to target band intensities allows for determination of target quantity in the starting sample. This information can be useful for applications such as determining viral copy number in a host cell sample.
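Quantitation against a dilution series amounts to fitting a standard curve and reading the unknown off it. The sketch below assumes that band intensity rises linearly with the amount of target over the range used, which holds only within the linear response range of the detection method. The amounts, intensities, and units are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_intensity(intensity, slope, intercept):
    """Invert the standard curve to estimate the amount of target."""
    return (intensity - intercept) / slope


# Hypothetical dilution series: picograms loaded vs. measured band intensity.
standard_pg = [1.0, 2.0, 4.0, 8.0]
standard_intensity = [110.0, 210.0, 410.0, 810.0]

slope, intercept = fit_line(standard_pg, standard_intensity)
unknown_pg = quantity_from_intensity(510.0, slope, intercept)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, unknown={unknown_pg:.2f} pg")
```

An unknown band whose intensity falls outside the range spanned by the standards should be rerun at a different dilution rather than extrapolated.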
Numerous variations on the Southern-blot approach exist, including use of specialized gel systems for the initial separation of DNAs. For example, two-dimensional gels can be used to separate DNA molecules by shape as well as size. FIGURE 2.21 illustrates a two-dimensional mapping technique used to identify replication intermediates, a method used extensively in studies of replication and replication repair. In this method, restriction fragments of replicating DNA are electrophoresed in a first dimension that separates by mass and a second dimension where movement is determined more by shape. Different types of replicating molecules follow characteristic paths, measured by their deviation from the line that would be followed by a linear molecule of DNA that doubled in size. A simple Y-structure (which occurs when a fragment is in the midst of replication, but does not itself contain an origin of replication) follows a continuous path in which one fork moves along the linear fragment. An inflection point occurs when all three branches are the same length and the structure therefore deviates most extensively from linear DNA. Analogous considerations determine the paths of double Y-structures or bubbles (bubbles indicate a bidirectional fork, thus an origin of replication, within the fragment). An asymmetric bubble follows a discontinuous path, with a break at the point at which the bubble is converted to a Y-structure as one fork runs off the end.
FIGURE 2.21 One application of Southern blotting allows detection of fragments separated by shape as well as size. In this example, the position of a replication origin and the number of replicating forks determine the shape of a replicating restriction fragment, which can be followed by its electrophoretic path (solid line). The dashed line shows the path for a linear DNA.
Another variation of the Southern-blot approach is the use of a denaturing gel matrix for an otherwise analogous process on RNA molecules (referred to as northern blotting). In this case, there is no initial digestion step, so intact RNA molecules are separated by size, usually on a formaldehyde or other denaturing gel, which eliminates RNA secondary structures. This allows measurement of actual RNA sizes and, like Southern blotting, provides a similarly quantitative method for detection of any type of RNA. If mRNA is the target of interest, it is possible to separate mRNA from all the other classes of RNA in the cell. mRNA (and some noncoding RNA) differs from other RNAs in that it is polyadenylated (it has a string of adenine residues added to the 3′ end; see the RNA Splicing and Processing chapter). Poly(A)+ mRNA can therefore be enriched by use of an oligo(dT) column, in which short oligomers of deoxythymidine, or oligo(dT), are immobilized on a solid support and used to capture mRNA from the total RNA in a sample. This is illustrated in FIGURE 2.22.
FIGURE 2.22 Poly(A)+ RNA can be separated from other RNAs by fractionation on an oligo(dT) column.
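The selection principle behind the oligo(dT) column can be expressed as a simple filter: molecules with a sufficiently long 3′ run of A residues are retained, and everything else flows through. The minimum tail length of 15 and the RNA sequences below are arbitrary assumptions for illustration only.

```python
import re


def oligo_dt_select(rnas, min_tail=15):
    """Split RNAs into (bound, flow_through) name lists.

    An RNA is treated as bound if it ends in at least min_tail A residues.
    """
    tail = re.compile(rf"A{{{min_tail},}}$")
    bound = [name for name, seq in rnas.items() if tail.search(seq)]
    flow_through = [name for name in rnas if name not in bound]
    return bound, flow_through


# Hypothetical total RNA sample (sequences written 5'->3').
TOTAL_RNA = {
    "mRNA_1": "AUGGCUUCAGGA" + "A" * 20,
    "rRNA_fragment": "GGCUACGUAGCUCAGUUGG",
    "tRNA_like": "GCGGAUUUAGCUCAGCCA",
}

bound, flow_through = oligo_dt_select(TOTAL_RNA)
print("bound:", bound)
print("flow-through:", flow_through)
```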
A conceptually similar process for proteins based on protein-separation gels and blotting to membrane is known as western blotting. This method is depicted in FIGURE 2.23. There are some key differences between the procedures for blotting proteins compared to nucleic acids. First, protein-separation gels typically contain the detergent SDS, which serves to unfold the proteins so that they will migrate according to size rather than shape. It also provides a uniform negative charge to all proteins so that they will migrate toward the positive pole of the gel. (In the absence of SDS, each protein has a specific individual charge at a given pH; it is possible to separate proteins based on these charges, rather than size, in a technique called isoelectric focusing.)
FIGURE 2.23 In a western blot, proteins are separated by size on an SDS gel, transferred to a nitrocellulose membrane, and detected by using an antibody. The primary antibody detects the protein and the enzyme-linked secondary antibody detects the primary antibody. The secondary antibody is detected in this example via addition of a chemiluminescent substrate, which results in emission of light that can be detected on X-ray film.
After the proteins are separated on the gel, they are transferred to a nitrocellulose membrane using an electric current to effect the transfer, rather than the capillary or vacuum methods used for nucleic acids. The most significant difference in western blotting is the method of detecting proteins on the membrane. Complementary base pairing can’t be used to detect a protein, so westerns use antibodies to recognize the protein of interest. The antibody can either recognize the protein itself, if such an antibody is available, or it can recognize an epitope tag that has been fused to the protein sequence. An epitope tag is a short peptide sequence that is recognized by a commercially available antibody; the DNA encoding the tag can be cloned in-frame to a gene of interest, resulting in a product containing the epitope (typically at the N- or C-terminus of the protein). Sequences for the most commonly used epitope tags (such as the HA, FLAG, and myc tags) are often available in expression vectors for ease of fusion (see the section Cloning Vectors Can Be Specialized for Different Purposes earlier in this chapter).
The antibody that recognizes the target on the membrane is known as the primary antibody. The final stage of western blotting is detection of the primary antibody with a secondary antibody, which is the antibody that can be visualized. Secondary antibodies are raised in a different species from the primary antibody used and recognize the constant region of the primary antibody (e.g., a “goat antirabbit” antibody will recognize a primary antibody raised in a rabbit; see the chapter titled Somatic DNA Recombination and Hypermutation in the Immune System for a review of antibody structure). The secondary antibody is typically linked to a moiety that allows its visualization—for example, a fluorescent dye or an enzyme such as alkaline phosphatase or horseradish peroxidase. These enzymes serve as visualization tools because they can convert added substrates to a colored product (colorimetric detection) or can release light as a reaction product (chemiluminescent detection). Use of primary and secondary antibodies (rather than linking a visualizer to the primary antibody) increases the sensitivity of western blotting. The result is semiquantitative detection of the protein of interest.
Continuing in the same vein, techniques used to identify interactions between DNA and proteins (through protein gel separation and blotting followed by probing with a DNA) are southwestern blotting; when an RNA probe is used, the technique is northwestern blotting.
A logical technical progression from Southern and northern blotting is the microarray. Instead of having the unknown sample on the membrane and the probe in solution, this effectively reverses the two. These originated in the form of “slot-blots” or “dot-blots,” whereby a researcher would spot individual DNA sequences of interest directly onto a hybridization membrane in an ordered pattern, with each spot consisting of a different, single, known sequence. Drying of the membrane immobilized these spots, creating a premade blotting array. In use, the researcher would then take a nucleic acid sample of interest, such as total cellular DNA, and then fragment and randomly and uniformly label this DNA (originally with a radioisotopic label). This labeled mix of sample DNA could then be used exactly as in a Southern blot as a probe to hybridize to the premade blot. Labeled DNA sequences homologous to any of the array spots would hybridize and be retained in the known, fixed location of that spot and be visualized by autoradiography. By viewing the autoradiogram and knowing the physical location of each specific probe spot, the pattern of hybridized versus nonhybridized spots could be read out to indicate the presence or absence of each of the corresponding known sequences in the unknown sample.
Technological improvements to this approach followed rapidly through miniaturization of the size and physical density of the immobilized spots, going from membranes with 30 to 100 spots to glass microscope slides with up to 1,000 spots. Today, silicon chip substrates have hundreds of thousands and up to a million or more individual spots in an area about the size of a postage stamp.
To visualize the distinct spots in such a high-density array, automated optical microscopy is used and fluorescence has replaced radiolabeling both to allow for increased spatial resolution (higher spot density) and easier quantification of each hybridization signal. In parallel with the increased total number of spots per array, the length of each unique probe has generally become shorter, allowing for each spot in the array to be specific to a smaller target area—in effect, giving greater “resolution” on a molecular scale. Although the potential applications of microarrays are really limited only by the user’s imagination, there are a number of particular applications for which they have become standard tools.
The first of these is in gene expression profiling, wherein a total mRNA sample from a specimen of interest (e.g., tissue in a disease state or under a particular environmental challenge) is collected and converted en masse to cDNA by a random primed reverse transcription. A label is incorporated into the cDNA during its synthesis (either through use of labeled nucleotides or by labeling the primers themselves); this can be either a fluorophore (“direct labeling”) or a hapten (such as biotin), which can at a later stage be exposed to a fluorophore conjugate that will bind the hapten (for biotin, a streptavidin–phycoerythrin conjugate might be used) in what is called “indirect labeling.” This labeled cDNA is then hybridized to an array where the immobilized spots consist of complementary strands to a number of known mRNAs from the target organism. Hybridization, washing, and visualization allow for the detection of those spots that have bound their complementary labeled cDNA and thus the readout of which genes are being expressed in the original sample. This process is depicted in FIGURE 2.24. This method is fairly quantitative, meaning that the observed signal on each spot corresponds reasonably well to the original level of its particular mRNA. Clever selection of the sequence of each of the immobilized spots, such as choosing short probe sequences that are complementary to particular alternate exons of a gene, can even allow the method to differentiate and quantitate the relative levels of alternate splicing products from a single gene. By comparison of the data from such experiments performed in parallel on experimental tissue and control tissue, an experiment can collect a snapshot of the total cellular “global” changes in gene expression patterns, often with useful insight into the state or condition of the experimental tissue.
FIGURE 2.24 Gene expression arrays are used to detect the levels of all the expressed genes in an experimental sample. mRNAs are isolated from control and experimental cells or tissues and reverse transcribed in the presence of fluorescently labeled nucleotides (or primers), resulting in labeled cDNAs with different fluorophores (red and green strands) for each sample. Competitive hybridization of the red and green cDNAs to the microarray is proportional to the relative abundance of each mRNA in the two samples. The relative levels of red and green fluorescence are measured by microscopic scanning and are displayed as a single color. Red or orange indicates increased expression in the red (experimental) sample, green or yellow-green indicates lower expression, and yellow indicates equal levels of expression in the control and experiment.
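The red/green readout of a two-color array is commonly summarized as a log2 ratio per spot, so that equal expression gives 0 and a twofold change in either direction gives +1 or −1. The sketch below assumes background-corrected intensities and uses a twofold cutoff for calling a change; both the cutoff and the gene names and values are illustrative assumptions, not part of the method as described here.

```python
import math


def log2_ratio(red, green):
    """log2 of experimental (red) over control (green) intensity."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)


def classify_spot(red, green, threshold=1.0):
    """Call a spot 'up', 'down', or 'unchanged' using an assumed cutoff."""
    ratio = log2_ratio(red, green)
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical spot intensities: (red = experimental, green = control).
SPOTS = {
    "geneA": (4000.0, 1000.0),
    "geneB": (500.0, 2000.0),
    "geneC": (1200.0, 1000.0),
}

for gene, (red, green) in SPOTS.items():
    print(gene, round(log2_ratio(red, green), 2), classify_spot(red, green))
```

In the figure's color scheme, "up" spots would appear red or orange, "down" spots green, and "unchanged" spots yellow.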
A second major application is in genotyping. Analysis of the human genome (and those of other organisms) has led to the identification of large numbers of single nucleotide polymorphisms (SNPs), which are single nucleotide substitutions at a specific genetic locus (see the chapter titled The Content of the Genome). Individual SNPs occur at known frequencies, which often differ between populations. The most straightforward examples are where the SNP creates a missense mutation within a gene of interest, such as one involved in the metabolism of a drug. People carrying one allele of the SNP might clear a drug from circulation at a very different rate from those with an alternate allele, and thus determination of a patient’s allele at this SNP can be an important consideration in choosing an appropriate drug dosage. An example of this that has come all the way from theory into everyday use is CYP450 SNP genotyping to determine appropriate dosage of the anticoagulant warfarin. Another is in SNP genotyping of the K-Ras oncogene in some types of cancer patients in order to determine whether EGFR-inhibitory drugs will be of therapeutic value. Other SNPs might be of no direct biological consequence but can become a valuable genetic marker if found to be closely associated with a particular allele of interest; that is, if in genetic terms the two are closely linked. Hundreds of thousands of SNPs have been mapped in the human genome, and arrays that can be probed with a subject’s DNA allow for the genotype at each of these to be simultaneously determined, with concurrent determination of what the linked genetic alleles are. In effect, this allows for much of the genotype of the subject to be inferred from a single experiment at vastly less time and expense than actually sequencing the entire subject genome.
With a view toward the future, however, it should be noted that SNP genotyping—in the common case of linked alleles as opposed to direct missense mutation alleles—is indirect inference and has at least some potential for being inaccurate.
Sequencing, on the other hand, is definitive. If emerging sequencing technologies improve to the point of delivering an entire human genome in 24 hours at a cost competitive with SNP genotyping, sequencing might become the dominant approach for genotyping.
A third major application of DNA microarrays is array-comparative genomic hybridization (array-CGH). This is a technique that is augmenting, and in some cases replacing, cytogenetics for the detection and localization of chromosomal abnormalities that change the copy number of a given sequence—that is, deletions or duplications. In this technique, the array chip, known as a tiling array, is spotted with an organism’s genomic sequences that together represent the entire genome; the higher the density of the array, the smaller the genetic region each spot represents and thus the higher resolution the assay can provide. Two DNA samples (one from normal control tissue and one from the tissue of interest) are each randomly labeled with a different fluorophore, such that one sample, for example, is green and the other is red (similar to the mRNA labeling described earlier for the expression arrays). These two differentially labeled specimens are mixed at exactly equal ratios for total DNA, and then hybridized to the chip. Regions of DNA that occur equally in the two samples will hybridize equally to their complementary array spots, giving a “mixed” color signal. By comparison, any DNA regions that occur more in one sample than the other will outcompete and thus show a stronger color on their complementary probe spot than will the deficient sample. Computer-assisted image analysis can read out and quantitate small color changes on each array spot and thus detect hemizygous loss or duplication of even very small regions in a test sample. The resolution and facility for automation provided by this technique compared to conventional cytogenetics is leading to its increasing adoption in diagnostic settings for the detection of chromosomal copy number changes associated with a range of hereditary diseases.
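The size of the color shift expected in array-CGH follows from simple arithmetic on copy number. Assuming a diploid control (two copies) and an idealized signal that is strictly proportional to copy number, a hemizygous loss gives a test-to-control ratio of 1/2 (log2 = −1), and a single-copy duplication gives 3/2 (log2 ≈ 0.58). The function names and candidate copy numbers below are illustrative, and real data are noisier than this model suggests.

```python
import math


def expected_log2_ratio(test_copies, control_copies=2):
    """Idealized log2(test/control) signal for a spot.

    A complete (homozygous) deletion has no defined log ratio in this
    simple model, so copy numbers must be positive.
    """
    if test_copies <= 0 or control_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log2(test_copies / control_copies)


def call_copy_number(observed_log2, control_copies=2, candidates=(1, 2, 3, 4)):
    """Pick the candidate copy number whose expected ratio is closest."""
    return min(
        candidates,
        key=lambda c: abs(expected_log2_ratio(c, control_copies) - observed_log2),
    )


print(expected_log2_ratio(1))            # -1.0 (hemizygous loss)
print(round(expected_log2_ratio(3), 3))  # 0.585 (single-copy duplication)

# Hypothetical observed log2 ratios for three spots.
for observed in (-0.95, 0.05, 0.55):
    print(observed, "->", call_copy_number(observed), "copies")
```

The duplication signal is smaller in magnitude than the deletion signal, which is one reason quantitative image analysis is needed to read these arrays reliably.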
Tiling arrays are also often used for chromatin immunoprecipitation studies, which can identify sequences interacting with a DNA-binding protein or complex on a genome-wide scale; this is described in the section Chromatin Immunoprecipitation.
In addition to the chip-like solid-phase arrays described, lower-density arrays for focused applications (with up to a few hundred targets, as opposed to millions) can be made in microbead-based formats. In these approaches, each microscopic bead has a distinct optical signal or code, and its surface can be coated with the target DNA sequence. Different bead codes can be mixed and matched into a single labeled sample of DNA or cDNA and then sorted, detected, and quantitated by optical and/or flow sorting methods. Although of much lower density than chip-type arrays, bead arrays can be modified and adapted much more readily to suit a particular focused biological question, and in practice they show faster three-dimensional hybridization kinetics than chips, which effectively have two-dimensional kinetics.
Most of the methods discussed thus far in this chapter are in vitro methods that allow the detection or manipulation of nucleic acids or proteins that have been isolated from cells (or produced synthetically). Many other powerful molecular techniques have been developed, however. These techniques either allow direct visualization of the in vivo behavior of macromolecules (e.g., imaging of GFP fusions in live cells) or allow researchers to take a “snapshot” of the in vivo localization or interactions of macromolecules at a particular condition or point in time.
There are numerous proteins that function by interacting directly with DNA, such as chromatin proteins, or the factors that perform replication, repair, and transcription. Although much of our understanding of these processes is derived from in vitro reconstitution experiments, it is critical to map the dynamics of protein–DNA interactions in living cells in order to fully understand these complex functions. The powerful technique of chromatin immunoprecipitation (ChIP) was developed to capture such interactions. (Chromatin refers to the native state of eukaryotic DNA in vivo, in which it is packaged extensively with proteins; this is discussed in the Chromatin chapter.) ChIP allows researchers to detect the presence of any protein of interest at a specific DNA sequence in vivo.
FIGURE 2.25 shows the process of ChIP. This method depends on the use of an antibody to detect the protein of interest. As was discussed earlier for western blots (see the section Blotting Methods earlier in this chapter), this antibody can be against the protein itself, or against an epitope-tagged target.
FIGURE 2.25 Chromatin immunoprecipitation detects protein–DNA interactions in the native chromatin context in vivo. Proteins and DNA are crosslinked, chromatin is broken into small fragments, and an antibody is used to immunoprecipitate the protein of interest. Associated DNA is then purified and analyzed by either identifying specific sequences by PCR (as shown), or by labeling the DNA and applying to a tiling array to detect genome-wide interactions.
The first step in ChIP is typically the crosslinking of the cell (or tissue or organism) of interest by fixing it with formaldehyde. This serves two purposes: (1) It kills the cell and arrests all ongoing processes at the time of fixation, providing the snapshot of cellular activity; and (2) it covalently links any protein and DNA that are in very close proximity, thus preserving protein–DNA interactions through the subsequent analysis. ChIP can be performed on cells or tissues under different experimental conditions (e.g., different phases of the cell cycle, or after specific treatments) to look for changes in protein–DNA interactions under different conditions.
After crosslinking, the chromatin is then isolated from the fixed material and cleaved into small chromatin fragments, usually 200 to 1,000 bp each. This can be achieved by sonication, which uses high-intensity sound waves to nonspecifically shear the chromatin. Nucleases (either sequence-specific or sequence-nonspecific) can also be used to fragment the DNA. These small chromatin fragments are then incubated with the antibody against the protein target of interest. These antibodies can then be used to immunoprecipitate the protein by pulling the antibodies out of the solution using heavy beads coated with a protein (such as Protein A) that binds to the antibodies.
After washing away unbound material, the remaining material contains the protein of interest still crosslinked to any DNA it was associated with in vivo. This is sometimes called a “guilt by association” assay, because the DNA target is only isolated due to its interaction with the protein of interest. The final stages of ChIP entail reversal of the crosslinks so that the DNA can be purified, and specific DNA sequences can be detected using PCR. Quantitative (real-time) PCR is usually the method of choice for detecting the DNA of a limited number of targets of interest.
In addition to revealing the presence of a specific protein at a given DNA sequence (e.g., a transcription factor bound to the promoter of a gene of interest), highly specialized antibodies can provide even more detailed information. For example, antibodies can be developed that distinguish between different posttranslational modifications of the same protein. As a result, ChIP can distinguish the difference between RNA polymerase II engaged in initiation at the promoter of a gene from pol II that has entered the elongation phase of transcription, because pol II is differentially phosphorylated in these two states (see the Eukaryotic Transcription chapter), and antibodies exist that recognize these phosphorylation events.
Certain variations on the ChIP procedure allow researchers to query the localization of a given protein (or modified version of a protein) across large genomic regions, or even entire genomes. In two of the most powerful variations, known as ChIP-on-chip and ChIP-seq, the only difference from a conventional ChIP is the fate of the DNA that is purified from the immunoprecipitated material. Rather than querying specific sequences in this DNA via PCR, the DNA is either labeled in bulk and hybridized to a DNA microarray (ChIP-on-chip; usually a genome tiling array, such as described in the previous section), or is directly subjected to deep sequencing (ChIP-seq; this is now the most popular method). Either method allows a researcher to obtain a genome-wide footprint of all of the binding sites of the protein of interest. For example, putative origins of replication (which are difficult to identify in multicellular eukaryotes) can be detected en masse by performing a ChIP against proteins in the origin recognition complex (ORC).
An organism that gains new genetic information from the addition of foreign DNA is described as transgenic. For simple organisms such as bacteria or yeast, it is easy to generate transgenics by transformation with DNA constructs containing sequences of interest. Transgenesis in multicellular organisms, however, can be much more challenging.
The approach of directly injecting DNA can be used with mouse eggs, as shown in FIGURE 2.26. Plasmids carrying the gene of interest are injected into the nucleus of the oocyte or into the pronucleus of the fertilized egg. The egg is implanted into a pseudopregnant mouse (a mouse that has mated with a vasectomized male to trigger a receptive state). After birth, the recipient mouse can be examined to see whether it has gained the foreign DNA, and, if so, whether it is expressed. Typically, a minority (~15%) of the injected mice carry the transfected sequence. In general, multiple copies of the plasmid appear to have been integrated in a tandem array into a single chromosomal site. The number of copies varies from 1 to 150, and they are inherited by the progeny of the injected mouse. The level of gene expression from transgenes introduced in this way is highly variable, both due to copy number and the site of integration. A gene can be highly expressed if it integrates within an active chromatin domain, but not if it integrates in or near a silenced region of the chromosome.
FIGURE 2.26 Transfection can introduce DNA directly into the germline of animals.
Photo reproduced from: Chambon, P. 1981. Sci Am 244:60–71. Used with permission of Pierre Chambon, Institute of Genetics and Molecular and Cellular Biology, College of France.
Transgenesis with novel or mutated genes can be used to study genes of interest in the whole animal. In addition, defective genes can be replaced by functional genes using transgenic techniques. One example is the cure of the defect in the hypogonadal mouse. The hpg mouse has a deletion that removes the distal part of the gene coding for the precursor to gonadotropin-releasing hormone (GnRH) and GnRH-associated peptide (GAP). As a result, the mouse is infertile. When an intact hpg gene is introduced into the mouse by transgenic techniques, it is expressed in the appropriate tissues. FIGURE 2.27 summarizes experiments to introduce a transgene into a line of hpg–homozygous mutant mice. The resulting progeny are normal. This provides a striking demonstration that expression of a transgene under normal regulatory control can be indistinguishable from the behavior of the normal allele.
FIGURE 2.27 Hypogonadism can be averted in the progeny of hpg mice by introducing a transgene that has the wild-type sequence.
Although these techniques are promising, there are impediments to using them to cure human genetic defects. The transgene must be introduced into the germline of the preceding generation, the ability to express a transgene is not predictable, and an adequate level of expression of a transgene can be obtained in only a small minority of the transgenic individuals. In addition, the large number of transgenes that might be introduced into the germline, and their erratic expression, could pose problems in cases in which overexpression of the transgene is harmful. In other cases, the transgene can integrate near an oncogene and activate it, promoting carcinogenesis.
A more versatile approach for studying the functions of genes is to eliminate the gene of interest. Transgenesis methods allow DNA to be added to cells or animals, but to understand the function of a gene, it is most useful to be able to remove the gene or its function and observe the resulting phenotype. The most powerful techniques for changing the genome use gene targeting to delete or replace genes by homologous recombination. Gene deletions are usually referred to as knockouts, whereas replacement of a gene with an alternative mutated version is called a knock-in.
In simple organisms such as yeast, this is again a very simple process in which DNA encoding a selectable marker flanked by short regions of homology to a target gene is transformed into the yeast. As little as 40 bp or so of homology will result in extremely efficient replacement of the target gene by the introduced marker gene, via homologous recombination using the short regions of homology.
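The design of such a replacement cassette can be sketched in a few lines of Python. This is a simplified string model under stated assumptions: the function names and the coordinate convention (0-based, end-exclusive) are hypothetical, and `replace_by_homology` only mimics the outcome of recombination at the two flanking homology regions rather than its mechanism.

```python
def build_knockout_cassette(chromosome: str, orf_start: int, orf_end: int,
                            marker: str, homology: int = 40) -> str:
    """Return the marker flanked by `homology` bp from each side of the target ORF."""
    if orf_start < homology or orf_end + homology > len(chromosome):
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = chromosome[orf_start - homology:orf_start]
    downstream = chromosome[orf_end:orf_end + homology]
    return upstream + marker + downstream


def replace_by_homology(chromosome: str, cassette: str, homology: int = 40) -> str:
    """Model the result of recombination at both homology arms.

    Everything between the two arms in the chromosome is replaced by the
    sequence between the arms in the cassette.
    """
    left, right = cassette[:homology], cassette[-homology:]
    i = chromosome.find(left)
    if i < 0:
        raise ValueError("left homology arm not found")
    j = chromosome.find(right, i + homology)
    if j < 0:
        raise ValueError("right homology arm not found")
    return chromosome[:i] + cassette + chromosome[j + homology:]
```

In this model the target ORF disappears from the chromosome and the marker takes its place, which is why selecting for the marker also selects for the deletion.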
In some organisms, and in mammalian cells in culture, there is no good method for deleting endogenous genes. Instead, researchers use knockdown approaches, which reduce the amount of a gene product (RNA or protein) produced, even while the endogenous gene is intact. There are several different knockdown methods, but one of the most powerful is the use of RNA interference (RNAi) to selectively target specific mRNAs for destruction. (RNAi is described in the Regulatory RNA chapter.) Briefly, introduction of double-stranded RNA into most eukaryotic cells triggers a response in which these RNAs are cleaved by a nuclease called Dicer into 21 bp dsRNA fragments (siRNAs), unwound into single strands, and then used by a ribonucleoprotein complex, RISC (the RNA-induced silencing complex), to find and anneal to mRNAs containing complementary sequence. When a fully complementary mRNA is found, it is cleaved and destroyed. In practice, this means that the mRNA for any gene can be targeted for destruction by introduction of a dsRNA designed to anneal to the target of interest. The means of introducing the dsRNA depends on the species being targeted; in mammalian cells, one method is transfection with DNA encoding a self-annealing RNA that forms a hairpin containing the targeting sequence. For many species, researchers are developing siRNA libraries that allow systematic elimination of large sets of target mRNAs, one at a time, providing a powerful new tool for genetic screening.
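The targeting logic of RNAi rests on base complementarity, which is easy to model. The Python sketch below enumerates every 21-nucleotide window of an mRNA and the single-stranded guide that would anneal to it. The function names are hypothetical, and the model ignores everything that matters for real siRNA design beyond perfect complementarity (strand selection, overhangs, off-target matches).

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Normalize a sequence to uppercase RNA letters."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return to_rna(rna).translate(_COMPLEMENT)[::-1]


def candidate_guides(mrna: str, length: int = 21):
    """List (position, guide) pairs, one per window of the mRNA."""
    mrna = to_rna(mrna)
    return [(i, reverse_complement(mrna[i:i + length]))
            for i in range(len(mrna) - length + 1)]


def would_be_cleaved(mrna: str, guide: str) -> bool:
    """True if the mRNA contains a site fully complementary to the guide."""
    return reverse_complement(guide) in to_rna(mrna)
```

A guide with even a single mismatch fails the `would_be_cleaved` test in this model, mirroring the requirement for full complementarity described above.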
In some multicellular organisms, gene deletion is possible, but the process is more complicated than in organisms like yeast. In mammals, the target is usually the genome of an ES cell, which is then used to generate a mouse with the knockout. ES cells are derived from the mouse blastocyst (an early stage of development, which precedes implantation of the egg in the uterus). FIGURE 2.28 illustrates the general approach.
FIGURE 2.28 ES cells can be used to generate mouse chimeras, which breed true for the transfected DNA when the ES cell contributes to the germline.
ES cells are transfected with DNA in the usual way (most often by microinjection or electroporation). By using a donor that carries an additional sequence, such as a drug-resistance marker or some particular enzyme, it is possible to select ES cells that have obtained an integrated transgene carrying any particular donor trait. This results in a population of ES cells in which there is a high proportion carrying the marker.
These ES cells are then injected into a recipient blastocyst. The ability of the ES cells to participate in normal development of the blastocyst forms the basis of the technique. The blastocyst is implanted into a foster mother, and in due course develops into a chimeric mouse. Some of the tissues of the chimeric mice are derived from the cells of the recipient blastocyst; other tissues are derived from the injected ES cells. The proportion of tissues in the adult mouse that are derived from cells in the recipient blastocyst and from injected ES cells varies widely in individual progeny; if a visible marker (e.g., coat-color gene) is used, areas of tissue representing each type of cell can be seen.
To determine whether the ES cells contributed to the germline, the chimeric mouse is crossed with a mouse that lacks the donor trait. Any progeny that have the trait must be derived from germ cells that have descended from the injected ES cells. By this means, it is known that an entire mouse has been generated from an original ES cell!
When a donor DNA is introduced into the cell, it might insert into the genome by either nonhomologous or homologous recombination. Homologous recombination is relatively rare, probably representing <1% of all recombination events, and thus occurring at a frequency of ~10⁻⁷. By designing the donor DNA appropriately, though, we can use selective techniques to identify those cells in which homologous recombination has occurred.
FIGURE 2.29 illustrates the knockout technique that is used to disrupt endogenous genes. The basis for the technique is the design of a knockout construct with two different markers that will allow nonhomologous and homologous recombination events in the ES cells to be distinguished. The donor DNA is homologous to a target gene, but has two key modifications. First, the gene is inactivated by interrupting or replacing an exon with a gene encoding a selectable marker (most often the neoR gene that confers resistance to the drug G418 is used). Second, a counterselectable marker (a gene that can be selected against) is added on one side of the gene; for example, the thymidine kinase (TK) gene of the herpes simplex virus.
FIGURE 2.29 A transgene containing neo within an exon and TK downstream can be selected by resistance to G418 and loss of TK activity.
When this knockout construct is introduced into an ES cell, homologous and nonhomologous recombinations will result in different outcomes. Nonhomologous recombination inserts the entire construct, including the flanking TK gene. These cells are resistant to neomycin, and they also express thymidine kinase, which makes them sensitive to the drug ganciclovir (thymidine kinase phosphorylates ganciclovir, which converts it to a toxic product). In contrast, homologous recombination involves two exchanges within the sequence of the donor gene, resulting in the loss of the flanking TK gene. Cells in which homologous recombination has occurred therefore gain neomycin resistance in the same way as cells that have nonhomologous recombination, but they do not have thymidine kinase activity, and so are resistant to ganciclovir. Thus, plating the cells in the presence of neomycin plus ganciclovir specifically selects those in which homologous recombination has replaced the endogenous gene with the donor gene.
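The logic of this double selection can be written as a small truth table. The Python below is a minimal sketch: the `Cell` class and the function names are hypothetical, and G418 stands in for the neomycin-type selection described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    label: str
    has_neo: bool  # carries the resistance marker that interrupts the exon
    has_tk: bool   # carries the flanking thymidine kinase gene


def survives(cell: Cell, g418: bool = True, ganciclovir: bool = True) -> bool:
    """Apply positive (G418) and negative (ganciclovir) selection to one cell."""
    if g418 and not cell.has_neo:
        return False  # no resistance marker: killed by G418
    if ganciclovir and cell.has_tk:
        return False  # TK converts ganciclovir to a toxic product
    return True


CELLS = [
    Cell("no integration", has_neo=False, has_tk=False),
    Cell("nonhomologous integration", has_neo=True, has_tk=True),
    Cell("homologous recombination", has_neo=True, has_tk=False),
]


def select(cells, g418: bool = True, ganciclovir: bool = True):
    """Return the labels of the cells that survive the chosen drugs."""
    return [c.label for c in cells if survives(c, g418, ganciclovir)]
```

With both drugs present only the homologous recombinants survive; omitting ganciclovir lets the random integrants through as well.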
The presence of the neoR gene in an exon of the donor gene disrupts translation, and thereby creates a null allele. A particular target gene can therefore be knocked out by this means; once a mouse with one null allele has been obtained, it can be bred to generate the homozygote. This is a powerful technique for investigating whether a particular gene is essential, and what functions in the animal are perturbed by its loss. Sometimes phenotypes can even be observed in the heterozygote.
A major extension of our ability to manipulate a target genome has been made possible by using the phage Cre/lox system to engineer site-specific recombination in a eukaryotic cell. The Cre enzyme catalyzes a site-specific recombination reaction between two lox sites, which are identical 34-bp sequences. FIGURE 2.30 shows that the consequence of the reaction is to excise the stretch of DNA between the two lox sites.
FIGURE 2.30 The Cre recombinase catalyzes a site-specific recombination between two identical lox sites, releasing the DNA between them.
Structure from Protein Data Bank: 1OUQ. E. Ennifar, et al. 2003. Nucleic Acids Res 31:5449–5460.
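The outcome of Cre-mediated excision can be modeled as a simple string operation. The Python sketch below is an illustration under stated assumptions: the function name `cre_excise` is hypothetical, the 34-bp sequence used for the lox site is the commonly cited loxP sequence (it is not given in this chapter), and only two sites in the same orientation on one strand are considered.

```python
# Commonly cited loxP sequence (assumption; not stated in the text).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(dna: str, lox: str = LOXP):
    """Excise the DNA between the first two lox sites.

    Returns (remaining, excised). One lox site is left behind in the
    remaining DNA. `excised` is the intervening sequence only (the lox
    site released with it is not included), or None if fewer than two
    sites were found and nothing was excised.
    """
    first = dna.find(lox)
    if first < 0:
        return dna, None
    second = dna.find(lox, first + len(lox))
    if second < 0:
        return dna, None
    excised = dna[first + len(lox):second]
    remaining = dna[:first + len(lox)] + dna[second + len(lox):]
    return remaining, excised
```

A sequence with only one lox site is returned unchanged, which reflects the requirement for a pair of sites.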
The great utility of the Cre/lox system is that it requires no additional components and works when the Cre enzyme is produced in any cell that has a pair of lox sites. FIGURE 2.31 shows that we can control the reaction to make it work in a particular cell by placing the cre gene under the control of a regulated promoter. The procedure begins with two mice. One mouse has the cre gene, typically controlled by a promoter that can be turned on specifically in a certain cell or under certain conditions. The other mouse has a target sequence flanked by lox sites. When we cross the two mice, the progeny have both elements of the system; the system can be turned on by controlling the promoter of the cre gene. This allows the sequence between the lox sites to be excised in a controlled way.
FIGURE 2.31 By placing the Cre recombinase under the control of a regulated promoter, it is possible to activate the excision system only in specific cells. One mouse is created that has a promoter-cre construct, and another that has a target sequence flanked by lox sites. The mice are crossed to generate progeny that have both constructs. Then excision of the target sequence can be triggered by activating the promoter.
The Cre/lox system can be combined with the knockout technology to give us even more control over the genome. Inducible knockouts can be made by flanking the neoR gene (or any other gene that is used similarly in a selective procedure) with lox sites. After the knockout has been made, the target gene can be reactivated by causing Cre to excise the neoR gene in some particular circumstance (such as in a specific tissue).
FIGURE 2.32 shows a modification of this procedure that allows a knock-in to be created. Basically, we use a construct in which some mutant version of the target gene is used to replace the endogenous gene, relying on the usual selective procedures. Then, when the inserted gene is reactivated by excising the neoR sequence, we have in effect replaced the original gene with a different version.
FIGURE 2.32 An endogenous gene is replaced in the same way as when a knockout is made (see Figure 2.29), but the neomycin gene is flanked by lox sites. After the gene replacement has been made using the selective procedure, the neomycin gene can be removed by activating Cre, leaving an active insert.
A useful variant of this method is to introduce a wild-type copy of the gene of interest in which the gene itself (or one of its exons) is flanked by lox sites. This results in a normal animal that can be crossed to a mouse containing Cre under control of a tissue-specific or otherwise regulated promoter. The offspring of this cross are conditional knockouts, in which the function of the gene is lost only in cells that express Cre. This is particularly useful for studying genes that are essential for embryonic development; conventional knockouts of genes in this class would be lethal in homozygous embryos and thus are very difficult to study.
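The two-mouse logic of a conditional knockout can be captured in a toy model. In the Python sketch below, the `Mouse` class, the `cross` function, and the tissue names are all hypothetical; the model assumes the progeny of interest inherits both constructs and that all copies of the target gene are flanked by lox sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mouse:
    floxed_target: bool = False            # target gene flanked by lox sites
    cre_tissues: frozenset = frozenset()   # tissues where the cre promoter is active


def cross(a: Mouse, b: Mouse) -> Mouse:
    """Return a progeny that carries the constructs of both parents."""
    return Mouse(a.floxed_target or b.floxed_target,
                 a.cre_tissues | b.cre_tissues)


def gene_functional(mouse: Mouse, tissue: str) -> bool:
    """The gene is lost only where lox sites and Cre expression coincide."""
    return not (mouse.floxed_target and tissue in mouse.cre_tissues)
```

Neither parent loses gene function in this model; only the progeny does, and only in the tissue where Cre is expressed.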
Recently, several technologies have emerged that allow direct editing of target sequences in the genome in vivo. These methods are all based on endonucleases that can be targeted very specifically to genomic sites. The double-strand breaks created by these nucleases are then repaired by the cell’s own repair machinery (homologous recombination or nonhomologous end-joining; see the Repair Systems chapter), and this repair generates sequence alterations. These changes can include gene mutation, deletion, insertion, or even precise gene editing or correction based on a provided donor template.
The specificity and outcomes of these techniques depend on the specific targeting of endonucleases to only the site(s) of interest. Four general classes of nucleases are used: zinc finger nucleases (ZFNs), meganucleases, transcription activator-like effector nucleases (TALENs), and, most recently, the CRISPR/Cas9 system. The basic characteristics of these systems are summarized in TABLE 2.2.
TABLE 2.2 Basic features of endonuclease-based genome-editing systems.
Genome-Editing Tool | Derivation | Targeting | Characteristics |
---|---|---|---|
ZFN | Zinc finger DNA–binding domain fused to FokI restriction endonuclease | Multifinger arrays selected for binding to desired target site | Pros: Can trigger both NHEJ and HR; modest size. Con: Generating specificity to desired target can be labor-intensive |
TALEN | TALE proteins from Xanthomonas bacteria (plant pathogens) fused to FokI restriction endonuclease | ~35 amino acid TALE repeats each bind specific DNA base pairs, strung together to match target sequence | Pro: Can be designed for virtually any sequence. Con: Large size makes in vivo delivery challenging |
Meganuclease | Homing endonucleases (e.g., I-SceI) | Homing endonuclease reengineered/selected to recognize desired target | Pros: Cleavage produces 3′ overhang (more recombinogenic); small size for ease of delivery. Con: Limits to the number of sequences recognized |
CRISPR/Cas9 | RNA-guided nucleases from bacterial adaptive immune system | Sequence of the guide RNA (gRNA) component provides target specificity | Pro: Can just change gRNA sequence rather than engineer new proteins for each target site. Con: Target sequences slightly limited by requirement for a short motif 3′ to the target site |
ZFNs take advantage of the fact that zinc finger (ZF) DNA binding domains (discussed in the chapter titled Eukaryotic Transcription) are modular domains that each recognize a 3-bp sequence and can be strung together into multifinger domains to recognize longer sequences. A combination of engineering and selection allows the creation of ZF arrays that will target a locus of interest. The ZF portion is fused to the endonuclease domain of the FokI restriction enzyme to create the ZFN, which then dimerizes to make a DSB at the desired site.
Similarly, TALENs utilize a modular DNA binding repeat; in this case, a set of conserved 33–35 amino acid repeats derived from the TALE proteins of the Xanthomonas bacterial plant pathogens. Each TALE repeat recognizes a single base pair (determined by two variable amino acids within the 33–35 aa repeat), so multiple TALE repeats can be strung together to recognize virtually any sequence (with the only requirement that there be a T at the 5′ end of the target). As for ZFNs, the TALE array is fused to the FokI enzyme to provide the cleavage. A downside of TALENs is that because each base pair in the target site is recognized by an approximately 35 aa motif, targeting sequences long enough to be unique in the genome can result in very large TALENs, which makes delivery into target cells or tissues more challenging.
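The modular recognition rules for ZFNs and TALENs translate into simple arithmetic on the target sequence. The Python sketch below is illustrative: the function names are hypothetical, and as a simplification it counts one TALE repeat for every base in the supplied target, including the required 5′ T (real designs may treat that leading T differently).

```python
def zinc_finger_triplets(target: str, bp_per_finger: int = 3):
    """Split a target into the consecutive 3-bp units, one per zinc finger."""
    if len(target) % bp_per_finger:
        raise ValueError("target length must be a multiple of bp_per_finger")
    return [target[i:i + bp_per_finger]
            for i in range(0, len(target), bp_per_finger)]


def tale_array_size(target: str, aa_per_repeat=(33, 35)):
    """Return (repeats, min_aa, max_aa) for a TALE array matching the target.

    One repeat per base pair; the repeat length range comes from the text.
    """
    if not target.upper().startswith("T"):
        raise ValueError("target must have a T at its 5' end")
    repeats = len(target)
    low, high = aa_per_repeat
    return repeats, repeats * low, repeats * high
```

For an 18-bp target, six zinc fingers would be needed, whereas the TALE array alone would span roughly 600 amino acids before the FokI domain is added, which illustrates why TALENs are comparatively large.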
The meganucleases, despite their name, are actually the smallest of these editing nucleases and thus the easiest to deliver (in fact, several meganucleases with different specificities could be delivered simultaneously for multiplex editing). These nucleases are derived from naturally occurring homing endonucleases, a family of nucleases encoded within introns or as self-splicing inteins. These nucleases naturally recognize long, usually asymmetric, sites of up to 40 bp that typically occur only 1 or 2 times in a genome. (The large target sites are the origin of the name.) Meganucleases can be engineered or selected to recognize novel sequences, but because they lack the modular nature of ZFNs and TALENs, this can be difficult.
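A rough calculation shows why long recognition sites are so rare. The Python sketch below estimates how often a specific site would occur by chance. The function name is hypothetical, and the estimate assumes a random sequence with equal frequencies of the four bases, which real genomes only approximate; the genome size in the usage note is likewise an approximate, illustrative figure.

```python
def expected_sites(site_length: int, genome_size_bp: int,
                   both_strands: bool = True) -> float:
    """Expected chance occurrences of one specific site in a random genome.

    Each position matches with probability (1/4) ** site_length; the
    number of positions is approximated by the genome size.
    """
    strands = 2 if both_strands else 1
    return strands * genome_size_bp / 4 ** site_length
```

For a genome of about 3 × 10⁹ bp, a 6-bp site is expected more than a million times, whereas an 18-bp site is expected less than once, so sites as long as those recognized by homing endonucleases are effectively unique unless the sequence is present by descent.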
The most recent—and most exciting—gene editing tool to be developed is based on the CRISPR-Cas RNA-guided nucleases that form the basis of a bacterial adaptive immune response against viruses and plasmids. The CRISPR-Cas system is described in more detail in the chapter titled Regulatory RNA. Briefly, the CRISPR-Cas system involves integration of invading nucleic acids into CRISPR loci, where they are transcribed into CRISPR RNAs (crRNAs). These then form a complex with a trans-activating crRNA and Cas (CRISPR-associated) proteins. The crRNA then targets cleavage of complementary DNA sequences. To adapt this system for gene editing, the two RNAs are fused into a single guide RNA (gRNA), and changes to a portion of this sequence can be used to define desired targets. This is an enormous advantage over the other technologies, which need to engineer novel proteins for every desired target sequence. The same Cas9 protein can simply be delivered with a gRNA (or several!) designed against the site of interest. Cas9 proteins do require a short (about 3 bp) protospacer-adjacent motif (PAM) 3′ to the target site, which can limit some target sequences. Recent efforts have focused on developing Cas9 proteins with different PAM specificities to expand this repertoire as well as developing Cas9 variants with increased specificity to reduce off-target cleavage.
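The PAM requirement can be made concrete with a short search routine. The Python sketch below scans one strand of a DNA sequence for candidate target sites followed immediately by a PAM on the 3′ side. The function name is hypothetical, and the 20-nt target length and the NGG motif are assumptions based on the widely used Streptococcus pyogenes Cas9 (the text specifies only a short motif of about 3 bp).

```python
import re


def find_cas9_targets(dna: str, protospacer_len: int = 20,
                      pam_regex: str = "[ACGT]GG"):
    """Return (position, target, pam) for every site followed by a PAM.

    Only the strand given is scanned; the other strand can be searched by
    passing in its reverse complement. A lookahead is used so that
    overlapping sites are all reported.
    """
    dna = dna.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]
```

Changing `pam_regex` models a Cas9 variant with a different PAM specificity, which changes the set of sites that can be targeted without any change to the search itself.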
With these techniques, we are able to investigate the functions and regulatory features of genes in whole animals. The ability to introduce DNA into the genome allows us to make changes in it, add new genes that have had particular modifications introduced in vitro, or inactivate existing genes. Thus, it becomes possible to delineate the features responsible for tissue-specific gene expression. Gene editing techniques have already begun to show promise as a gene therapy tool to treat human genetic disorders and other diseases. For example, ZFNs have been used in Phase 1 clinical trials to modify the CCR5 receptor (used by HIV to enter cells) in HIV-infected patients. All of the gene editing tools are being utilized in preclinical studies. Ultimately, we can expect routinely to replace or repair defective genes in the genome in a targeted manner.
DNA can be manipulated and propagated by using the techniques of cloning. These include digestion by restriction endonucleases, which cut DNA at specific sequences, and insertion into cloning vectors, which permit DNA to be maintained and amplified in host cells such as bacteria. Cloning vectors can have specialized functions, as well, such as allowing expression of the product of a gene of interest, or fusion of a promoter of interest to an easily assayed reporter gene.
DNA (and RNA) can be detected nonspecifically by the use of dyes that bind independent of sequence. Specific nucleic acid sequences can be detected by using base complementarity. Specific primers can be used to detect and amplify particular DNA targets via PCR. RNA can be reverse transcribed into DNA to be used in PCR; this is known as reverse transcription PCR (RT-PCR). Labeled probes can be used to detect DNA or RNA on Southern or Northern blots, respectively. Proteins are detected on western blots using antibodies.
Sequencing technology is advancing rapidly. The original cost to determine the human genome sequence was about $1 billion. By the beginning of 2012, multiple individuals had their sequence determined. For many individuals, both normal and tumor-derived sequences have now been determined and compared for a price of just a few thousand dollars. The original goal of the next-generation sequencing methodologies was a $1,000 genome, a target that has now been reached.
DNA microarrays are solid supports (usually silicon chips or glass slides) on which DNA sequences corresponding to ORFs or complete genomic sequences are arrayed. Microarrays are used to detect gene expression, for SNP genotyping, and to detect changes in DNA copy number as well as many other applications.
Protein–DNA interactions can be detected in vivo using chromatin immunoprecipitation. The DNA obtained in a chromatin immunoprecipitation experiment can be used as a probe on a genome tiling array, or it can be sequenced directly, to map all localization sites for a given protein in the genome.
New sequences of DNA can be introduced into a cultured cell by transfection or into an animal egg by microinjection. The foreign sequences can become integrated into the genome, often as large tandem arrays. The array appears to be inherited as a unit in a cultured cell. The sites of integration appear to be random. A transgenic animal arises when the integration event occurs in a genome that enters the germ cell lineage. Often a transgene responds to tissue and temporal regulation in a manner that resembles the endogenous gene. Under conditions that promote homologous recombination, an inactive sequence can be used to replace a functional gene, thus creating a knockout, or deletion, of the target locus. Extensions of this technique can be used to make conditional knockouts, where the activity of the gene can be turned on or off (such as by Cre-dependent recombination), and knock-ins, where a donor gene specifically replaces a target gene. Transgenic mice can be obtained by injecting recipient blastocysts with ES cells that carry transfected DNA. Knockdowns, most commonly achieved by using RNA interference, can be used to eliminate gene products in cell types for which knockout technologies are not available. New genome editing technologies based on targeted endonucleases have dramatically expanded our capacity to make changes to genomes in vivo.
Olorunniji, F. J., Rosser, S. J., and Stark, W. M. (2016). Site-specific recombinases: Molecular machines for the Genetic Revolution. Biochem. J. Mar 15;473(6), 673–84.
Wang, H., La Russa, M., and Qi, L. S. (2016). CRISPR/Cas9 in genome editing and beyond. Annu. Rev. Biochem. Apr 25. [Epub ahead of print] PMID: 27145843.