
CHAPTER 18: Eukaryotic Transcription


18.1 Introduction

Initiation of transcription on a chromatin template that is already opened requires the enzyme RNA polymerase to bind at the promoter and transcription factors to bind to enhancers. In vitro transcription on a DNA template requires a different subset of transcription factors than are needed to transcribe a chromatin template (we examine how chromatin is opened in the chapter titled Eukaryotic Transcription Regulation). Any protein that is needed for the initiation of transcription, but that is not itself part of RNA polymerase, is defined as a transcription factor. Many transcription factors act by recognizing cis-acting sites on DNA. Binding to DNA, however, is not the only means of action for a transcription factor. A factor may recognize another factor, recognize RNA polymerase, or be incorporated into an initiation complex only in the presence of several other proteins. The ultimate test for membership in the transcription apparatus is functional: A protein must be needed for transcription to occur at a specific promoter or set of promoters.

A significant difference between the transcription of eukaryotic and prokaryotic RNAs is that in bacteria transcription takes place on a DNA template, whereas in eukaryotes transcription takes place on a chromatin template. Chromatin changes everything and must be taken into account at every step. The chromatin must be in an open structure, and, even in an open structure, nucleosome octamers must be moved or removed from promoter sequences before transcription factors and RNA polymerase can bind. This can sometimes require transcription from a silent or cryptic promoter either on the same strand or on the antisense strand.

A second major difference is that the bacterial RNA polymerase, with its sigma factor subunit, can read the DNA sequence to find and bind to its promoter. A eukaryotic RNA polymerase cannot read DNA. Initiation at eukaryotic promoters therefore involves a large number of factors that must prebind to a variety of cis-acting elements and other factors already bound to the DNA before the RNA polymerase can bind. These factors are called basal transcription factors. The RNA polymerase then binds to this basal transcription factor–DNA complex. This binding region is defined as the core promoter, the region containing all the binding sites necessary for RNA polymerase to bind and function. RNA polymerase itself binds around the start point of transcription, but does not directly contact the extended upstream region of the promoter. By contrast, bacterial promoters discussed in the chapter titled Prokaryotic Transcription are largely defined in terms of the binding site for RNA polymerase in the immediate vicinity of the start point.

Whereas bacteria have a single RNA polymerase that transcribes all three major classes of genes, transcription in eukaryotic cells is divided into three classes. Each class is transcribed by a different RNA polymerase:

  • RNA polymerase I transcribes 18S/28S rRNA.

  • RNA polymerase II transcribes mRNA and some small RNAs.

  • RNA polymerase III transcribes tRNA, 5S ribosomal RNA, and also some other small RNAs.

This is the current picture of the major classes of genes. As we will see in the chapter titled Regulatory RNA, recent discoveries by whole genome tiling arrays and deep sequencing of cellular RNA have uncovered a new world of antisense transcripts, intergenic transcripts, and heterochromatin transcripts. Virtually the entire genome is transcribed from both strands. Not much is currently known about the promoters for these classes or their function and regulation, but it is known that many (possibly most) of these transcripts are produced by RNA polymerase II.

Basal transcription factors are needed for initiation, but most are not required subsequently. For the three eukaryotic RNA polymerases, the transcription factors, rather than the RNA polymerases themselves, are responsible for recognizing the promoter DNA sequence. For all eukaryotic RNA polymerases, the basal transcription factors create a structure at the promoter to provide the target that is recognized by the RNA polymerase. For RNA polymerases I and III, these factors are relatively simple, but for RNA polymerase II they form a sizeable group. The basal factors join with RNA polymerase II to form a complex surrounding the start point, and they determine the site of initiation. The basal factors together with RNA polymerase constitute the basal transcription apparatus.

The promoters for RNA polymerases I and II are (mostly) upstream of the start point, but a large number of promoters for RNA polymerase III lie downstream (within the transcription unit) of the start point. Each promoter contains characteristic sets of short conserved sequences that are recognized by the appropriate class of basal transcription factors. RNA polymerases I and III each recognize a relatively restricted set of promoters, and thus rely upon a small number of accessory factors.

Promoters utilized by RNA polymerase II show much more variation in sequence and have a modular organization. All RNA polymerase II promoters have sequence elements close to the start point of transcription that are bound by the basal apparatus and the polymerase to establish the site of initiation. Other sequences farther upstream or downstream, called enhancer sequences, determine whether the promoter is expressed, and, if expressed, whether this occurs in all cell types or is cell type specific.

The enhancer is a second type of site involved in transcription and is identified by sequences that stimulate initiation. Enhancer elements are often targets for tissue-specific or temporal regulation. Some enhancers bind transcription factors that function by short-range interactions and are located near the promoter, whereas others can be located thousands of base pairs away. FIGURE 18.1 illustrates the general properties of promoters and enhancers. A regulatory site that binds more negative regulators than positive regulators to control transcription is called a silencer. As can be seen in Figure 18.1, promoters and enhancers are sequences that bind a variety of proteins that control transcription, and in that regard are actually quite similar to each other. Enhancers, like promoters, can also bind RNA polymerase and initiate transcription of an RNA called eRNA (enhancer RNA) as discussed in the chapter called Regulatory RNA. These eRNAs may promote enhancer/promoter interactions by DNA looping, often through intermediates called coactivators. The components of an enhancer or a silencer resemble those of the promoter in that they consist of a variety of modular elements that can bind positive regulators or negative regulators in a closely packed array. Enhancers do not need to be near the promoter. They can be upstream, inside a gene, or beyond the end of a gene, and their orientation relative to the gene does not matter.

FIGURE 18.1 A typical gene transcribed by RNA polymerase II has a promoter that extends upstream from the site where transcription is initiated. The promoter contains several short-sequence (~10 bp) elements that bind transcription factors, dispersed over ~100 bp. An enhancer containing a more closely packed array of elements that also bind transcription factors may be located several hundred base pairs to several kilobases distant. (DNA may be coiled or otherwise rearranged so that transcription factors at the promoter and at the enhancer interact to form a large protein complex.)

Promoters that are constitutively expressed and needed in all cells (their genes are sometimes called housekeeping genes) have upstream sequence elements that are recognized by ubiquitous activators. No one element/factor combination is an essential component of the promoter, which suggests that initiation by RNA polymerase II may be regulated in many different ways. Promoters that are expressed only in certain times or places have sequence elements that require activators that are available only at those times or places.

Because chromatin is a general negative regulator, eukaryotic transcription is most often under positive regulation: A transcription factor is provided under tissue-specific control to activate a promoter or set of promoters that contain a common target sequence. This is a multistep process that first involves opening the chromatin and binding the basal transcription factors, and then binding the polymerase. Regulation by specific repression of a target promoter is less common.

A eukaryotic transcription unit generally contains a single gene, and termination typically occurs beyond the end of the coding region. Termination lacks the regulatory importance that applies in prokaryotic systems. RNA polymerases I and III terminate at discrete sequences in defined reactions, but the mode of termination by RNA polymerase II is not clear. The significant event in generating the 3′ end of an mRNA, however, is not the termination event itself, but rather a cleavage reaction in the primary transcript (see the chapter titled RNA Splicing and Processing).

18.2 Eukaryotic RNA Polymerases Consist of Many Subunits

The three eukaryotic RNA polymerases have different locations in the nucleus that correspond with the different genes that they transcribe. The most prominent of the three with regard to activity is the enzyme RNA polymerase I, which resides in the nucleolus and is responsible for transcribing the genes coding for the 18S and 28S rRNA. It accounts for most cellular RNA synthesis (in terms of quantity).

The other major enzyme is RNA polymerase II, which is located in the nucleoplasm (i.e., the part of the nucleus excluding the nucleolus). It represents most of the remaining cellular activity and is responsible for synthesizing most of the heterogeneous nuclear RNA (hnRNA), the precursor for most mRNA and a lot more. The classical definition was that hnRNA includes everything but rRNA and tRNA in the nucleus (again, classically, mRNA is only found in the cytoplasm). With modern molecular tools, it is now possible to look a little closer at hnRNA. Researchers have found many low-abundance RNAs that are very important, plus many others that are just now beginning to be understood. mRNA is the least abundant of the three major RNAs, accounting for just 2% to 5% of the cytoplasmic RNA.

RNA polymerase III is a minor enzyme in terms of activity, but it produces a collection of stable, essential RNAs. This nucleoplasmic enzyme synthesizes the 5S rRNAs, tRNAs, and other small RNAs that constitute more than a quarter of the cytoplasmic RNAs.

All eukaryotic RNA polymerases are large proteins, functioning as complexes of approximately 500 kD. They typically have about 12 subunits. The purified enzymes can undertake template-dependent transcription of RNA, but are not able to initiate selectively at promoters. The general constitution of a eukaryotic RNA polymerase II enzyme as typified in Saccharomyces cerevisiae is illustrated in FIGURE 18.2. The two largest subunits are homologous to the β and β′ subunits of bacterial RNA polymerase. Three of the remaining subunits are common to all the RNA polymerases; that is, they are also components of RNA polymerases I and III. Note that there is no subunit related to the bacterial sigma factor. Its function is contained in the basal transcription factors.

FIGURE 18.2 Some subunits are common to all classes of eukaryotic RNA polymerases and some are related to bacterial RNA polymerase. This drawing is a simulation of purified yeast RNA polymerase II run on an SDS gel to separate the subunits by size.

The largest subunit in RNA polymerase II has a carboxy-terminal domain (CTD), which consists of multiple repeats of a consensus sequence of seven amino acids. The sequence is unique to RNA polymerase II. Yeast has about 26 repeats and mammals have about 50. The number of repeats is important because deletions that remove (typically) more than half of the repeats are lethal. The CTD can be highly phosphorylated on serine or threonine residues. The CTD is involved in regulating the initiation reaction (see the section later in this chapter titled Initiation Is Followed by Promoter Clearance and Elongation), transcription elongation, and all aspects of mRNA processing, even export of mRNA to the cytoplasm.
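The repeat structure of the CTD lends itself to a simple count. The Python sketch below is illustrative only: the heptad consensus YSPTSPS is the commonly cited sequence and is an assumption here, because the text above does not spell it out; the function names and the mismatch allowance are hypothetical. The viability check encodes the "more than half" rule of thumb from the paragraph above, which the text itself qualifies as typical rather than absolute.

```python
# Assumption: commonly cited CTD heptad consensus (Tyr-Ser-Pro-Thr-Ser-Pro-Ser);
# the chapter text does not state the sequence.
CTD_CONSENSUS = "YSPTSPS"

def count_heptad_repeats(ctd, max_mismatches=1):
    """Count in-frame 7-residue windows that match the consensus.

    Windows are read consecutively from the first residue; a window counts
    as a repeat if it differs from the consensus at no more than
    max_mismatches positions. A trailing partial window is ignored.
    """
    count = 0
    for i in range(0, len(ctd) - 6, 7):
        window = ctd[i:i + 7]
        mismatches = sum(a != b for a, b in zip(window, CTD_CONSENSUS))
        if mismatches <= max_mismatches:
            count += 1
    return count

def deletion_probably_tolerated(repeats_remaining, repeats_wild_type):
    """Rule of thumb from the text: losing more than half the repeats is typically lethal."""
    return repeats_remaining * 2 >= repeats_wild_type
```

For example, a toy tail of three exact heptads followed by one heptad with a single substitution is counted as four repeats when one mismatch is allowed and as three when none is allowed.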

The RNA polymerases of mitochondria and chloroplasts are smaller, and they resemble bacterial RNA polymerase rather than any of the nuclear enzymes (because they evolved from eubacteria). Of course, the organelle genomes are much smaller; thus the resident polymerase needs to transcribe relatively few genes, and the control of transcription is likely to be very much simpler. These enzymes are more similar to bacteriophage enzymes that do not need to respond to a more complex environment.

A major practical distinction between the eukaryotic enzymes is drawn from their response to the bicyclic octapeptide α-amanitin (the toxic compound in Amanita mushroom species). In essentially all eukaryotic cells, the activity of RNA polymerase II is rapidly inhibited by low concentrations of α-amanitin (resulting in transcriptional shutdown leading to acute liver toxicity in Amanita poisoning). RNA polymerase I is not inhibited. The response of RNA polymerase III is less well conserved; in animal cells it is inhibited by high levels, but in yeast and insects it is not inhibited.
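The inhibition pattern just described can be written as a small lookup. The Python sketch below encodes only what the paragraph states; the function name, the category labels, and the organism groupings are illustrative assumptions. Because the text singles out insects as an exception among animals, they are treated here as a separate group, and any group the text does not mention returns an explicit "unknown" (None) for RNA polymerase III.

```python
# Category labels are illustrative, not standard terminology.
HIGH = "inhibited by low concentrations"
LOW = "inhibited only by high concentrations"
NONE = "not inhibited"

def amanitin_response(polymerase, organism_group="animal"):
    """Return the alpha-amanitin response of RNA polymerase I, II, or III.

    organism_group matters only for RNA polymerase III, whose response
    is described in the text as less well conserved.
    """
    if polymerase == "I":
        return NONE
    if polymerase == "II":
        return HIGH
    if polymerase == "III":
        if organism_group in ("yeast", "insect"):
            return NONE
        if organism_group == "animal":
            return LOW
        return None  # not covered by the text
    raise ValueError("polymerase must be 'I', 'II', or 'III'")
```

Read in reverse, the same table suggests how the assay is used: transcription that is blocked at low α-amanitin concentrations points to RNA polymerase II.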

18.3 RNA Polymerase I Has a Bipartite Promoter

RNA polymerase I transcribes only the genes for ribosomal RNA from a single type of promoter in a special region of the nucleus called the nucleolus. The precursor transcript includes the sequences of both large 28S and small 18S rRNAs, which are later processed by cleavages and modifications. Ribosome assembly also occurs in the nucleolus. There are many copies of the rRNA transcription unit. They alternate with nontranscribed spacers and are organized in a cluster, as discussed in the chapter titled Clusters and Repeats. The organization of the promoter, and the events involved in initiation, are illustrated in FIGURE 18.3. RNA polymerase I exists as a holoenzyme that contains additional factors required for initiation and is recruited by its transcription factors directly as a giant complex to the promoter.

FIGURE 18.3 Transcription units for RNA polymerase I have a core promoter separated by ~70 bp from the upstream promoter element. UBF binding to the UPE increases the ability of core-binding factor to bind to the core promoter. Core-binding factor (SL1) positions RNA polymerase I at the start point.

The promoter consists of two separate regions. The core promoter surrounds the start point, extending from −45 to +20, and is sufficient for transcription to initiate. It is generally G-C rich (unusual for a promoter), except for the only conserved sequence element, a short A-T–rich sequence around the start point. The core promoter’s efficiency, however, is very much increased by the upstream promoter element (UPE, sometimes also called the upstream control element, or UCE). The UPE is another G-C–rich sequence related to the core promoter sequence, extending from −180 to −107. This type of organization is common to pol I promoters in many species, although the actual sequences vary widely.

RNA polymerase I requires two ancillary transcription factors to recognize the promoter sequence. The factor that binds to the core promoter is SL1 (or TIF-1B and Rib1 in different species), which consists of four protein subunits. Two of the components of SL1 are the TATA-binding protein (TBP), a factor that also is required for initiation by RNA polymerases II and III, and a second component that is homologous to the RNA polymerase II factor TFIIB (see the section in this chapter titled TBP Is a Universal Factor). TBP does not bind directly to G-C–rich DNA, and DNA binding is the responsibility of the other components of SL1. It is likely that TBP interacts with RNA polymerase, probably with a common subunit or a feature that has been conserved among polymerases. SL1 enables RNA polymerase I to initiate from the promoter at a low basal frequency.

SL1 has primary responsibility for RNA polymerase recruitment, proper localization of polymerase at the start point, and promoter escape. As will be discussed later, a comparable function is provided for RNA polymerases II and III by a factor that consists of TBP and other proteins. Thus, a common feature in initiation by all three polymerases is a reliance on a “positioning factor” that consists of TBP associated with proteins that are specific for each type of promoter. The exact mode of action is different for each of the TBP-dependent positioning factors; at the promoter for RNA polymerase I it does not bind DNA, whereas at TATA box–containing promoters for RNA polymerase II it is the principal means for locating the factor on DNA.

For high-frequency initiation, the transcription factor UBF is required. This is a single polypeptide that binds to a G-C–rich element in the UPE. UBF has multiple functions. UBF is required to maintain open chromatin structure. It prevents histone H1 binding, and therefore prevents assembly of inactive chromatin. It stimulates promoter release by the RNA polymerase, and it stimulates SL1. One indication of how UBF interacts with SL1 is given by the importance of the spacing between UBF and the core promoter. This can be changed by distances involving integral numbers of turns of DNA, but not by distances that introduce half turns. UBF binds to the minor groove of DNA and wraps the DNA in a loop of almost 360° on the protein surface, with the result that the core promoter and UPE come into close proximity, enabling UBF to stimulate binding of SL1 to the promoter.
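The spacing argument can be made concrete with a little arithmetic: changing the spacing by a whole number of helical turns leaves two sites on the same face of the DNA, whereas a half turn moves them to opposite faces. The Python sketch below assumes a helical repeat of about 10.5 bp per turn for B-form DNA, a value that is not given in the text; the function names and the 45° tolerance are likewise illustrative assumptions.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-form DNA; not stated in the text

def rotational_offset_degrees(delta_bp, bp_per_turn=BP_PER_TURN):
    """Angular displacement (0 to 180 degrees) between two sites
    after the spacing between them changes by delta_bp base pairs."""
    fraction_of_turn = (delta_bp / bp_per_turn) % 1.0
    angle = fraction_of_turn * 360.0
    # Rotating by 350 degrees is the same as rotating by 10 degrees the other way.
    return min(angle, 360.0 - angle)

def keeps_same_face(delta_bp, tolerance_degrees=45.0):
    """True if the two sites stay on roughly the same face of the helix."""
    return rotational_offset_degrees(delta_bp) <= tolerance_degrees
```

Under these assumptions, an insertion of 21 bp (two full turns) keeps the sites in phase, whereas an insertion of 5 bp (about half a turn) places them on nearly opposite faces of the helix.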

Figure 18.3 shows initiation as a series of sequential interactions. RNA polymerase I, however, exists as a holoenzyme that contains most or all of the factors required for initiation, and it is probably recruited directly to the promoter. Following initiation, RNA polymerase I, like RNA polymerase II, requires a special factor, the RNA polymerase I Paf1 complex, for efficient elongation.

18.4 RNA Polymerase III Uses Downstream and Upstream Promoters

Recognition of promoters by RNA polymerase III strikingly illustrates the relative roles of transcription factors and the polymerase enzyme. The promoters fall into three general classes that are recognized in different ways by different groups of factors. The promoters for classes I and II, 5S and tRNA genes, are internal; they lie downstream of the start point. The promoters for class III snRNA (small nuclear RNA) genes lie upstream of the start point in the more conventional manner of other promoters. In both internal and external promoters, the individual elements that are necessary for promoter function consist exclusively of sequences recognized by transcription factors, which, in turn, direct the binding of RNA polymerase.

The structures of the three types of promoters for RNA polymerase III are summarized in FIGURE 18.4. Two of the promoter types are internal promoters. Each contains a bipartite structure, in which two short sequence elements are separated by a variable sequence. The 5S ribosomal gene type 1 promoter consists of a boxA sequence separated by an intermediate element (IE) from a boxC sequence; the entire boxA-IE-boxC region is often referred to as the 5S internal control region (ICR). In yeast, only the boxC element is required for transcription. The tRNA type 2 promoter consists of a boxA sequence separated from a boxB sequence. A common group of type 3 promoters encoding other small RNAs have three sequence elements that are all located upstream of the start point; these same elements are also present in a number of RNA polymerase II promoters.

FIGURE 18.4 Promoters for RNA polymerase III may consist of bipartite sequences downstream of the start point, with boxA separated from either boxC or boxB, or they may consist of separated sequences upstream of the start point (Oct, PSE, TATA).

The detailed interactions are different at the two types of internal promoter, but the principle is the same. TFIIIC binds downstream of the start point, either independently (tRNA type 2 promoters) or in conjunction with TFIIIA (5S type 1 promoters). The presence of TFIIIC enables the positioning factor TFIIIB to bind at the start point. RNA polymerase III is then recruited.

FIGURE 18.5 summarizes the stages of reaction at type 2 internal promoters used for tRNA genes. The distance between boxA and boxB can vary because many tRNA genes contain a small intron. TFIIIC binds to both boxA and boxB. This enables TFIIIB to bind at the start point. At this point RNA polymerase III can bind.

FIGURE 18.5 Internal type 2 pol III promoters use binding of TFIIIC to boxA and boxB sequences to recruit the positioning factor TFIIIB, which recruits RNA polymerase III.

The difference at type 1 internal promoters (for 5S genes) is that TFIIIA must bind at boxA to enable TFIIIC to bind at boxC. TFIIIA is a 5S sequence-specific binding factor that binds to the promoter and to the 5S RNA as a chaperone and gene regulator. FIGURE 18.6 shows that once TFIIIC has bound, events follow the same course as at type 2 promoters, with TFIIIB (which contains the ubiquitous TBP) binding at the start point and RNA polymerase III joining the complex. Type 1 promoters are found only in the genes for 5S rRNA.

FIGURE 18.6 Internal type 1 pol III promoters use the assembly factors TFIIIA and TFIIIC, at boxA and boxC, to recruit the positioning factor TFIIIB, which recruits RNA polymerase III.

TFIIIA and TFIIIC are assembly factors, whose sole role is to assist the binding of the positioning factor TFIIIB at the correct location. Once TFIIIB has bound, TFIIIA and TFIIIC can be removed from the promoter without affecting the initiation reaction. TFIIIB remains bound in the vicinity of the start point, and its presence is sufficient to allow RNA polymerase III to identify and bind at the start point. Thus, TFIIIB is the only true initiation factor required by RNA polymerase III. This sequence of events explains how the promoter boxes downstream can cause RNA polymerase to bind at the start point, farther upstream. Although the ability to transcribe these genes is conferred by the internal promoter, changes in the region immediately upstream of the start point can alter the efficiency of transcription.
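The ordered assembly at the two internal promoter types can be summarized as a dependency chain in which each component needs only its immediate predecessor to be present. The Python sketch below is a deliberately simplified model of the text; the dictionary keys, the function name, and the idea of representing bound factors as a set are illustrative assumptions, not a description of any real software.

```python
# Order of binding at internal pol III promoters, as described in the text.
ASSEMBLY_ORDER = {
    "type1": ["TFIIIA", "TFIIIC", "TFIIIB", "RNA polymerase III"],  # 5S genes
    "type2": ["TFIIIC", "TFIIIB", "RNA polymerase III"],            # tRNA genes
}

def can_bind(factor, promoter_type, bound):
    """True if `factor` can join, given the set of factors currently bound.

    Each component requires only the component immediately before it
    in the assembly order; the first component binds unaided.
    """
    order = ASSEMBLY_ORDER[promoter_type]
    position = order.index(factor)
    return position == 0 or order[position - 1] in bound
```

In this model, `can_bind("RNA polymerase III", "type1", {"TFIIIB"})` is true even though TFIIIA and TFIIIC are absent from the set, which mirrors the observation that the assembly factors can be removed once TFIIIB is in place.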

TFIIIC is a large protein complex (more than 500 kD), which is comparable in size to RNA polymerase itself, and contains six subunits. TFIIIA is a member of an interesting class of proteins containing a nucleic acid–binding motif called a zinc finger. The positioning factor TFIIIB consists of three subunits. It includes the same protein factor TBP that is present in the core-binding factor SL1 used for pol I promoters and (as we will see later in the section titled TBP Is a Universal Factor) in the corresponding transcription factor TFIID used by RNA polymerase II. It also contains Brf, which is related to the transcription factor TFIIB that is used by RNA polymerase II and to a subunit in the RNA polymerase I SL1 factor. The third subunit is called B″; it is dispensable if the DNA duplex is partially melted, which suggests that its function is to initiate the transcription bubble. The role of B″ may be comparable to the role played by sigma factor in bacterial RNA polymerase (see the chapter titled Prokaryotic Transcription).

The upstream region has a conventional role in the third class of polymerase III promoters. The example shown in Figure 18.4 has three upstream elements. These elements are also found in promoters for snRNA genes that are transcribed by RNA polymerase II. (Genes for some snRNAs are transcribed by RNA polymerase II, whereas others are transcribed by RNA polymerase III.) The upstream elements function in a similar manner in promoters for both RNA polymerases II and III.

Initiation at an upstream (class III) promoter for RNA polymerase III can occur on a short region that immediately precedes the start point and contains only the TATA element. Efficiency of transcription, however, is much increased by the presence of the enhancer proximal sequence element (PSE) and OCT (so named because it has an 8-bp binding sequence) elements. The factors that bind at these elements interact cooperatively. The PSE may be essential at promoters used by RNA polymerase II, whereas it is stimulatory in promoters used by RNA polymerase III.

The TATA element confers specificity for the type of polymerase (II or III) that is recognized by an snRNA promoter. It is bound by a factor that includes TBP, which actually recognizes the sequence in DNA. TBP is associated with other proteins, which are specific for the type of promoter. The function of TBP and its associated proteins is to position the RNA polymerase correctly at the start point. This is described in more detail later in the sections on RNA polymerase II.

The factors work in the same way for both types of promoters for RNA polymerase III. The factors bind at the promoter before RNA polymerase itself can bind. They form a preinitiation complex that directs binding of the RNA polymerase. RNA polymerase III does not itself recognize the promoter sequence, but binds adjacent to factors that are themselves bound just upstream of the start point. For the type 1 and type 2 internal promoters, the assembly factors ensure that TFIIIB (which includes TBP) is bound just upstream of the start point, thereby providing the positioning information. For the upstream promoters, TFIIIB binds directly to the region including the TATA box. This means that, irrespective of the location of the promoter sequences, factor(s) are bound close to the start point in order to direct binding of RNA polymerase III. In all cases, the chromatin must be modified and in an open configuration.

18.5 The Start Point for RNA Polymerase II

The basic organization of the apparatus for transcribing protein-coding genes was revealed by the discovery that purified RNA polymerase II can catalyze synthesis of mRNA, but that it cannot initiate transcription unless an additional extract is added. The purification of this extract led to the definition of the general transcription factors, or basal transcription factors—a group of proteins that are needed for initiation by RNA polymerase II at all promoters. RNA polymerase II in conjunction with these factors constitutes the basal transcription apparatus that is needed to transcribe any promoter. The general factors are described as TFIIX, where X is a letter that identifies the individual factor. The subunits of RNA polymerase II and the general transcription factors are conserved among eukaryotes.

Our starting point for considering promoter organization is to define the core promoter as the shortest sequence at which RNA polymerase II can initiate transcription. A core promoter can, in principle, be expressed in any cell (though in practice a core promoter alone results in little or no transcription in the chromatin context in vivo). It is the minimum sequence that enables the general transcription factors to assemble at the start point. These factors are involved in the mechanics of binding to DNA and enable RNA polymerase II to recognize the promoter and initiate transcription. A core promoter functions at only a low efficiency. Other proteins, called activators, a different class of transcription factors, are required for the proper level of function (see the section titled Enhancers Contain Bidirectional Elements That Assist Initiation later in this chapter). The activators are not described systematically, but have casual names reflecting their histories of identification.

We might expect any sequence components involved in the binding of RNA polymerase and general transcription factors to be conserved at most or all promoters, as is the case for pol I and pol III promoters. As with bacterial promoters, when promoters for RNA polymerase II are compared, homologies in the regions near the start point are restricted to rather short sequences. These elements correspond with the sequences implicated in promoter function by mutation. FIGURE 18.7 shows the construction of a typical pol II core promoter with three of the most common pol II promoter elements. However, the eukaryotic pol II promoter is far more structurally diverse than the bacterial promoter and the promoters for pol I and III. In addition to the three major elements, a number of minor elements can also serve to define the promoter.

FIGURE 18.7 A minimal pol II promoter may have a TATA box ~25 bp upstream of the Inr. The TATA box has the consensus sequence of TATAA. The Inr has pyrimidines (Y) surrounding the CA at the start point. The DPE is downstream of the start point. The sequence shows the coding strand.

The start point does not have an extensive homology of sequence, but there is a tendency for the first base of mRNA to be A, flanked on either side by pyrimidines. (This description is also valid for the CAT start sequence of bacterial promoters.) This region is called the initiator (Inr), and it may be described in the general form Py₂CAPy₅, where Py stands for any pyrimidine. The Inr is contained between positions −3 and +5.
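The general form of the Inr (two pyrimidines, then CA, then five pyrimidines) translates directly into a pattern that can be scanned across a DNA sequence. The Python sketch below is a minimal illustration, assuming the coding strand is supplied as a plain string of A, C, G, and T; the function name is hypothetical, and a match is only a candidate, because sequence alone does not establish that a site is used as a start point.

```python
import re

# Two pyrimidines (C or T), then CA, then five pyrimidines.
# The lookahead lets overlapping candidates be reported as well.
INR_PATTERN = re.compile(r"(?=([CT]{2}CA[CT]{5}))")

def find_inr_start_points(seq):
    """Return the 0-based index of the A in each Inr-like match.

    The A is the fourth base of the motif and corresponds to the
    putative first base of the transcript.
    """
    seq = seq.upper()
    return [match.start() + 3 for match in INR_PATTERN.finditer(seq)]
```

In the toy sequence `GGTCCATTCTCGG`, the motif `TCCATTCTC` begins at index 2, so the function reports the A at index 5.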

Many promoters have a sequence called the TATA box, usually located approximately 25 bp upstream of the start point in higher eukaryotes. It constitutes the only upstream promoter element that has a relatively fixed location with respect to the start point. The consensus sequence of this core element is TATAA, usually followed by three more A-T base pairs (see the chapter titled Prokaryotic Transcription for a discussion of consensus sequence). The TATA box tends to be surrounded by G-C–rich sequences, which could be a factor in its function. It is almost identical with the sequence of the −10 box found in bacterial promoters; in fact, it could pass for one except for the difference in its location at −25 instead of −10. (The exception is in yeast, where the TATA box is more typically found at −90.) Single-base substitutions in the TATA box may act as up or down mutations, depending on how closely the original sequence matches the consensus sequence and how different the mutant sequence is. Typically, substitutions that introduce a G-C base pair are the most severe.
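Because the TATA box sits at a relatively fixed distance from the start point, a search for it can be restricted to a window upstream of a known start point. The Python sketch below checks for an exact match to the five-base consensus TATAA; the function name and the default window of −35 to −20 are assumptions chosen to bracket the "approximately 25 bp upstream" position, and the window would need to be moved for yeast. Real TATA boxes often deviate from the consensus, so an exact-match test like this one will miss some of them.

```python
TATA_CONSENSUS = "TATAA"

def has_tata_box(seq, tss_index, upstream_window=(35, 20)):
    """True if TATAA begins within the upstream window of the start point.

    tss_index is the 0-based index of the +1 base, so position -n lies
    at index tss_index - n. The default window allows the motif to begin
    anywhere from -35 to -20 (an assumed range, not from the text).
    """
    far, near = upstream_window
    first = max(0, tss_index - far)   # earliest allowed motif start
    last = tss_index - near           # latest allowed motif start
    if last < 0:
        return False
    region = seq.upper()[first:last + len(TATA_CONSENSUS)]
    return TATA_CONSENSUS in region
```

A promoter for which this returns False is not necessarily TATA-less in the biological sense; it only lacks an exact consensus match in the chosen window.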

Promoters that do not contain a TATA element are called TATA-less promoters. Surveys of promoter sequences suggest that 50% or more of promoters may be TATA-less. When a promoter does not contain a TATA box, it often contains another element, the downstream promoter element (DPE), which is located at +28 to +32 within the transcription unit.

Typical core promoters consist either of a TATA box plus Inr or of an Inr plus DPE, although other combinations with minor elements exist as well.

18.6 TBP Is a Universal Factor

Before transcription initiation can begin, the chromatin has to be modified and remodeled to the open configuration, and any nucleosome octamer positioned over the promoter has to be moved or removed at all classes of eukaryotic promoters (we examine this aspect of transcription control more closely in the chapter titled Eukaryotic Transcription Regulation). At that point it is possible for a positioning factor to bind to the promoter. Each class of RNA polymerase is assisted by a positioning factor that contains TBP associated with other components. Recall that TBP stands for TATA-binding protein; it was initially so named because it was a protein that bound to the TATA box in RNA polymerase II genes. It was subsequently discovered to also be part of the positioning factors SL1 for RNA polymerase I (see the section earlier in this chapter titled RNA Polymerase I Has a Bipartite Promoter) and TFIIIB for RNA polymerase III (see the section titled RNA Polymerase III Uses Downstream and Upstream Promoters). For these latter two RNA polymerases, TBP does not recognize the TATA box sequence (except in type 3 pol III promoters); thus, the name is misleading. In addition, many RNA polymerase II promoters lack TATA boxes, but still require the presence of TBP.

For RNA polymerase II, the positioning factor is TFIID, which consists of TBP associated with up to 14 other subunits called TAFs (for TBP-associated factors). Some TAFs are stoichiometric with TBP; others are present in lesser amounts, which means that there are multiple TFIID variants. TFIIDs containing different TAFs could recognize promoters with different combinations of conserved elements described in the previous section, The Start Point for RNA Polymerase II. Some TAFs are tissue specific. The total mass of TFIID typically is about 800 kD. The TAFs in TFIID were originally named in the form TAFII00, for example, where the number 00 gives the molecular mass of the subunit. Recently, the RNA polymerase II TAFs have been renamed TAF1, TAF2, and so forth; in this nomenclature TAF1 is the largest TAF, TAF2 is the next largest, and homologous TAFs in different species thus have the same names.

FIGURE 18.8 shows that the positioning factor recognizes the promoter in a different way in each case. At promoters for RNA polymerase III, TFIIIB binds adjacent to TFIIIC. At promoters for RNA polymerase I, SL1 binds in conjunction with UBF. TFIID is solely responsible for recognizing promoters for RNA polymerase II. At a promoter that has a TATA element, TBP binds specifically to the TATA box, but at TATA-less promoters, the TAFs have the role of recognizing other promoter elements, including the Inr and DPE. Whatever its means of entry into the initiation complex, TBP has the common purpose of interacting with the RNA polymerase.

FIGURE 18.8 RNA polymerases are positioned at all promoters by a factor that contains TBP.

TBP has the unusual property of binding to DNA in the minor groove. (The vast majority of DNA-binding proteins bind in the major groove.) The crystal structure of TBP suggests a detailed model for its binding to DNA. FIGURE 18.9 shows that it surrounds one face of DNA, forming a “saddle” around a stretch of the minor groove, which is bent to fit into this saddle. In effect, the inner surface of TBP binds to DNA, and the larger outer surface is available to extend contacts to other proteins. The DNA-binding site consists of a C-terminal domain that is conserved between species, and the variable N-terminal tail is exposed to interact with other proteins. It is a measure of the conservation of mechanism in transcriptional initiation that the DNA-binding sequence of TBP is 80% conserved between yeast and humans.

FIGURE 18.9 A view in cross-section shows that TBP surrounds DNA from the side of the narrow groove. TBP consists of two related (40% identical) conserved domains, which are shown in light and dark blue. The N-terminal region varies extensively and is shown in green. The two strands of the DNA double helix are in light and dark gray.

Photo courtesy of Stephen K. Burley.

Binding of TBP may be inconsistent with the presence of nucleosome octamers. Nucleosomes form preferentially by placing A-T−rich sequences with the minor grooves facing inward (see the chapter titled Chromatin); as a result, they could prevent binding of TBP. This may explain why the presence of a nucleosome at the promoter prevents initiation of transcription.

TBP binds to the minor groove and bends the DNA by approximately 80°, as illustrated in FIGURE 18.10. The TATA box bends toward the major groove, widening the minor groove. The distortion is restricted to the 8 bp of the TATA box; at each end of the sequence the minor groove has its usual width of about 5 Å, but at the center of the sequence the minor groove is greater than 9 Å. This is a deformation of the structure, but it does not actually separate the strands of DNA because base pairing is maintained. The extent of the bend can vary with the exact sequence of the TATA box and is correlated with the efficiency of the promoter.

FIGURE 18.10 The cocrystal structure of TBP with DNA from −40 to the start point shows a bend at the TATA box that widens the narrow groove where TBP binds.

Photo courtesy of Stephen K. Burley.

This structure has several functional implications. By changing the spatial organization of DNA on either side of the TATA box, it allows the transcription factors and RNA polymerase to form a closer association than would be possible on linear DNA. The bending at the TATA box corresponds energetically to unwinding of about one-third of a turn of DNA, and is compensated by a positive writhe.

The presence of TBP in the minor groove, combined with other proteins binding in the major groove, creates a high density of protein–DNA contacts in this region. Binding of purified TBP to DNA in vitro protects about one turn of the double helix at the TATA box, typically extending from −37 to −25. Binding of the TFIID complex in the initiation reaction, however, regularly protects the region from −45 to −10.

Within TFIID as a free protein complex, the factor TAF1 binds to TBP, where it occupies the concave DNA-binding surface. In fact, the structure of the binding site, which lies in the N-terminal domain of TAF1, mimics the surface of the minor groove in DNA. This molecular mimicry allows TAF1 to control the ability of TBP to bind to DNA; the N-terminal domain of TAF1 must be displaced from the DNA-binding surface of TBP in order for TFIID to bind to DNA.

Strikingly, a number of TAFs resemble histones: 9 of 14 TAFs contain a histone fold domain, though in most cases the TAFs lack the residues of this domain that are responsible for DNA binding. Four TAFs do have some intrinsic DNA binding ability: TAF4b, TAF12, TAF9, and TAF6 are (distant) homologs of histones H2A, H2B, H3, and H4, respectively. (The histones form the basic complex that binds DNA in eukaryotic chromatin; see the chapter titled Chromatin.) TAF4b/TAF12 and TAF9/TAF6 form heterodimers using the histone-fold motif; together they may form the basis for a structure resembling a histone octamer. Such a structure may be responsible for non-sequence-specific interactions of TFIID with DNA. Histone folds are also used in pairwise interactions between other TAFIIs.

Some of the TAFIIs may be found in other complexes as well as in TFIID. In particular, the histone-like TAFIIs also are found in protein complexes that modify the structure of chromatin prior to transcription (see the chapter titled Eukaryotic Transcription Regulation).

18.7 The Basal Apparatus Assembles at the Promoter

In a cell, gene promoters can be found in three basic types of chromatin with respect to activity. The first is an inactive gene in closed chromatin. The second is a potentially active gene in open chromatin bound with RNA polymerase, called a poised gene. Promoters in this class may assemble the basal apparatus, but they cannot proceed to transcribe the gene without a second signal to start transcription. Heat-shock genes are poised so that they can be activated immediately upon a rise in temperature. The third class (which we will examine shortly) is a gene being turned on in open chromatin.

What has been largely unexplored until recently is the involvement of noncoding RNA (ncRNA) transcripts in gene activation. Numerous recent examples have been described in which transcription of ncRNAs regulates transcription of nearby or overlapping protein-coding genes. The production of these functional ncRNAs (also referred to as cryptic unstable transcripts, or CUTs) is much more common than originally believed. A significant number of active promoters have transcripts generated upstream of the promoters (known as promoter upstream transcripts, or PROMPTs). PROMPTs are transcribed in both sense and antisense orientations relative to the downstream promoter and may play a regulatory role in transcription. The many roles of ncRNAs in transcriptional regulation are discussed further in the chapter titled Regulatory RNA.

The initiation process requires the basal transcription factors to act in a defined order to build a complex that will be joined by RNA polymerase. The series of events summarized in FIGURE 18.11 is one model. It is important to remember that RNA polymerase II promoters are structurally very diverse. Once a polymerase is bound, its activity then is controlled by enhancer-binding transcription factors.

FIGURE 18.11 An initiation complex assembles at promoters for RNA polymerase II by an ordered sequence of association with transcription factors. TFIID consists of TBP plus its associated TAFs as shown in the top panel; TBP alone, rather than TFIID, is shown in the remaining panels for simplicity.

Data from M. E. Maxon, J. A. Goodrich, and R. Tjian, Genes Dev. 8 (1994): 515–524.

A promoter for RNA polymerase II often consists of two types of regions. The core promoter contains the start point itself, typically identified by the Inr, and often includes either a nearby TATA box or DPE; additional less common elements may be found as well. The efficiency and specificity with which a promoter is recognized, however, depend upon short sequences farther upstream, which are recognized by a different group of transcription factors, sometimes called activators. In general, the target sequences are about 100 bp upstream of the start point, but sometimes they are more distant. Binding of activators at these sites may influence the formation of the initiation complex at (probably) any one of several stages. Promoters are organized on a principle of “mix and match.” A variety of elements can contribute to promoter function, but none is essential for all promoters.

The first step in activating a TATA box–containing promoter in open chromatin is initiated when the TBP subunit of TFIID directs its binding to the TATA box. This may be enhanced by upstream elements acting through a coactivator. (TFIID also recognizes the Inr sequence at the start point, the DPE, and possibly other promoter elements.) TFIIB binds downstream of the TATA box, adjacent to TBP in a region called the B recognition element (BRE), thus extending contacts along one face of the DNA from −10 to +10. The crystal structure of the ternary complex shown in FIGURE 18.12 extends this model. TFIIB makes contacts in the minor groove downstream of the TATA box, and contacts the major groove upstream of the TATA box. In archaeans, the homolog of TFIIB actually makes sequence-specific contacts with the promoter in the BRE region. This step is believed to be the major determinant in the establishment of promoter polarity, which way the RNA polymerase faces, and thus which strand is the template strand. TFIIB may provide the surface that is, in turn, recognized by RNA polymerase, so that it is responsible for the directionality of the polymerase binding. TFIIB also has a major role in recruiting RNA pol II to the TFIID/TFIIA/promoter DNA complex, assisting in the conversion from the closed to the open complex, and selecting the transcription start site (TSS).

FIGURE 18.12 Two views of the ternary complex of TFIIB-TBP-DNA show that TFIIB binds along the bent face of DNA. The two strands of DNA are green and yellow, TBP is blue, and TFIIB is red and purple.

Photo courtesy of Stephen K. Burley.

The crystal structure of TFIIB with RNA polymerase shows that three domains of the factor interact with the enzyme. As illustrated schematically in FIGURE 18.13, an N-terminal zinc ribbon from TFIIB contacts the enzyme near the site where RNA exits; it is possible that this interferes with the exit of RNA and influences the switch from abortive initiation to promoter escape. An elongated “finger” of TFIIB is inserted into the polymerase active center. The C-terminal domain interacts with the RNA polymerase and with TFIID to stabilize initial promoter melting. It also determines the path of the DNA where it contacts the factors TFIIE, TFIIF, and TFIIH, which may align them in the basal factor complex.

FIGURE 18.13 TFIIB binds to DNA and contacts RNA polymerase near the RNA exit site and at the active center, and orients it on DNA. Compare with Figure 18.12, which shows the polymerase structure engaged in transcription.

The factor TFIIF is a heterotetramer consisting of two types of subunits and is required for PIC (preinitiation complex) assembly. The larger subunit (RAP74) has an ATP-dependent DNA helicase activity that could be involved in melting the DNA at initiation. The smaller subunit (RAP30) has some homology to the regions of bacterial sigma factor that contact the core polymerase; it binds tightly to RNA polymerase II. TFIIF may assist in bringing RNA polymerase II to the assembling transcription complex and is required, along with TFIIB, for transcription start-site selection. The complex of TBP and TAFs may interact with the CTD tail of RNA polymerase, and interaction with TFIIB may also be important when TFIIF/polymerase joins the complex.

Polymerase binding extends the sites that are protected downstream to +15 on the template strand and +20 on the nontemplate strand. The enzyme extends the full length of the complex because additional protection is seen at the upstream boundary.

What happens at TATA-less promoters? The same general transcription factors, including TFIID, are needed. The Inr provides the positioning element; TFIID binds to it via an ability of one or more of the TAFs to recognize the Inr directly. Other TAFs in TFIID also recognize the DPE element downstream from the start point. The function of TBP at these promoters is more like that at promoters for RNA polymerase I and at internal promoters for RNA polymerase III.

When a TATA box is present, it determines the location of the start point. Its deletion causes the site of initiation to become erratic, although any overall reduction in transcription is relatively small. Indeed, some TATA-less promoters lack unique start points, so initiation occurs within a cluster of start points. The TATA box aligns the RNA polymerase via the interaction with TFIID and other factors so that it initiates at the proper site. Binding of TBP to TATA is the predominant feature in recognition of the promoter, but two large TAFs (TAF1 and TAF2) also contact DNA in the vicinity of the start point and influence the efficiency of the reaction.

Whereas most of the genes that RNA polymerase II transcribes are protein-coding mRNA genes, RNA pol II also transcribes some of the minor class snRNA genes. These have a similar, but not identical, promoter. Transcription of snRNA and the snoRNA (small nucleolar) genes in the nucleolus requires a specific modification of the CTD, a specific methylation of an Arg residue.

Assembly of the RNA polymerase II initiation complex provides an interesting contrast with prokaryotic transcription. Bacterial RNA polymerase is essentially a coherent aggregate with intrinsic ability to recognize and bind the promoter DNA; the sigma factor, needed for initiation but not for elongation, becomes part of the enzyme before DNA is bound, although it may later be released. RNA polymerase II can bind to the promoter, but only after separate transcription factors have bound. The transcription factors play a role analogous to that of bacterial sigma factor—to allow the basic polymerase to recognize DNA specifically at promoter sequences—but have evolved more independence. Indeed, the factors are primarily responsible for the specificity of promoter recognition. Only some of the factors participate in protein–DNA contacts (and only TBP and certain TAFs make sequence-specific contacts); thus protein–protein interactions are important in the assembly of the complex.

Although assembly can take place just at the core promoter in vitro, this reaction is not sufficient for transcription in vivo, where interactions with activators that recognize the more upstream elements are required. The activators interact with the basal apparatus at various stages during its assembly (see the chapter titled Eukaryotic Transcription Regulation).

18.8 Initiation Is Followed by Promoter Clearance and Elongation

Promoter melting (DNA unwinding) is necessary to begin the process of transcription. TFIIH is required for the formation of the open complex in conjunction with ATP hydrolysis to provide torsional stress for unwinding. Some final steps are then needed to release the RNA polymerase from the promoter once the first nucleotide bonds have been formed. Promoter clearance is the key regulated step in eukaryotes in determining if a poised gene or an active gene will be transcribed. This step is controlled by enhancers. (Note that the key step in bacterial transcription is conversion of the closed complex to the open complex; see the chapter titled Prokaryotic Transcription.) Most of the general transcription factors are required solely to bind RNA polymerase to the promoter, but some act at a later stage.

The transcription factors that bind enhancers usually do not directly contact elements at the promoter to control it, but rather bind to a coactivator that binds to the promoter elements. Mediator is one of the most common coactivators. It is a very large multisubunit protein complex that, in multicellular eukaryotes, can contain 30 subunits or more. Many cell-type and gene-specific forms of Mediator contain a common core of subunits conserved from yeast to humans that integrate signals from many enhancer-bound transcription factors. Both poised and active genes require the interaction of the transcription factors bound to enhancers with the promoter by DNA looping with Mediator as the intermediate.

The last factors to join the initiation complex are TFIIE and TFIIH. They act at the later stages of initiation for unwinding the DNA. Binding of TFIIE causes the boundary of the region protected downstream to be extended by another turn of the double helix, to +30. TFIIH is the only general transcription factor that has multiple independent enzymatic activities. Its several activities include an ATPase, helicases of both polarities, and a kinase activity that can phosphorylate the CTD tail of RNA polymerase II (on serine 5 of the heptapeptide repeat). TFIIH is an exceptional factor that may also play a role in elongation. Its interaction with DNA downstream of the start point is required for RNA polymerase to escape from the promoter. TFIIH is also involved in repair of damage to DNA (see the chapter titled Repair Systems).
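
The footprinting boundaries quoted in this chapter can be collected into a small Python sketch that tracks how the protected region grows as each factor joins. The coordinates are those given in the text (TFIID from −45 to −10, TFIIB contacts from −10 to +10, polymerase to +20 on the nontemplate strand, TFIIE to +30). The data layout, the function name, and the use of a single interval per step are illustrative assumptions, and the model ignores strand-specific differences such as the +15 boundary on the template strand.

```python
# Protection boundaries quoted in the text, relative to the start point.
# None means the text gives no new boundary on that side for that step.
FOOTPRINT_STEPS = [
    ("TFIID", -45, -10),
    ("TFIIB", -10, 10),               # contacts along one face of the DNA
    ("RNA polymerase II", None, 20),  # nontemplate-strand boundary
    ("TFIIE", None, 30),              # one further turn downstream
]

def cumulative_protection(steps):
    """Yield (factor, upstream, downstream) after each factor joins,
    widening the protected interval whenever a boundary moves outward."""
    upstream, downstream = None, None
    for factor, up, down in steps:
        if up is not None:
            upstream = up if upstream is None else min(upstream, up)
        if down is not None:
            downstream = down if downstream is None else max(downstream, down)
        yield factor, upstream, downstream
```

Iterating over `cumulative_protection(FOOTPRINT_STEPS)` gives one interval per assembly step, which makes the progressive downstream extension easy to see.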

On a linear template, ATP hydrolysis, TFIIE, and the helicase activity of TFIIH (provided by the XPB and XPD subunits) are required for polymerase movement. This requirement is bypassed with a supercoiled template. This suggests that TFIIE and TFIIH are required to melt DNA to allow polymerase movement to begin. The helicase activity of the XPB subunit of TFIIH is responsible for the actual melting of DNA.

RNA polymerase II stutters when it starts transcription. (The result is not dissimilar to the abortive initiation of bacterial RNA polymerase discussed in the chapter titled Prokaryotic Transcription, although the mechanism is different.) RNA polymerase II terminates after a short distance; small oligonucleotides of 4 to 5 nucleotides are unstable; and the crystal structures of these RNA–DNA hybrids are unordered. Only longer hybrids have proper base pairing. The short RNA products are degraded rapidly. The suggestion is that this abortive initiation is a form of promoter proofreading. To extend elongation into the transcription unit, a kinase complex, P-TEFb, is required. P-TEFb contains the CDK9 kinase, which is a member of the kinase family that controls the cell cycle. P-TEFb acts on the CTD to phosphorylate it further (on serine 2 of the heptapeptide repeat). It is not yet understood why this effect is required at some promoters but not others or how it is regulated.

Phosphorylation of the CTD tail is needed to release RNA polymerase II from the promoter and transcription factors so that it can make the transition to the elongating form, as shown in FIGURE 18.14. Real-time observation of live cells shows a bursting pattern that is gene specific, rather than continuous initiation. The phosphorylation pattern on the CTD is dynamic during the elongation process, catalyzed and controlled by multiple protein kinases, including P-TEFb, and phosphatases. Most of the basal transcription factors are released from the promoter at this stage.

FIGURE 18.14 Modification of the RNA polymerase II CTD heptapeptide during transcription. The CTD of RNA polymerase II when it enters the preinitiation complex is unphosphorylated. Phosphorylation of Ser residues serves as binding sites for both mRNA processing enzymes and kinases that catalyze further phosphorylation as described in the figure.

Reprinted from Trends Genet., vol. 24, S. Egloff and S. Murphy, Cracking the RNA polymerase II CTD code, pp. 280–288. Copyright 2008, with permission from Elsevier [http://www.sciencedirect.com/science/journal/01689525].

The CTD is involved, directly or indirectly, in processing mRNA while it is being synthesized and after it has been released by RNA polymerase II. Sites of phosphorylation on the CTD serve as a recognition or anchor point for other proteins to dock with the polymerase. The capping enzyme (guanylyl transferase), which adds the G residue to the 5′ end of newly synthesized mRNA, binds to CTD phosphorylated at serine 5, the first phosphorylation event catalyzed by TFIIH. This may be important in enabling it to modify (and thus protect) the 5′ end as soon as it is synthesized. Subsequently, serine 2 phosphorylation by P-TEFb leads to recruitment of a set of proteins called SCAFs to the CTD, and they, in turn, bind to splicing factors. This may be a means of coordinating transcription and splicing. Some components of the cleavage/polyadenylation apparatus used during transcription termination also bind to the CTD phosphorylated at serine 2. Oddly enough, they do so at the time of initiation, so that RNA polymerase is ready for the 3′ end processing reactions as soon as it sets out. Finally, export from the nucleus through the nuclear pore is also controlled by the CTD and may be coordinated with 3′ end processing. All of this suggests that the CTD may be a general focus for connecting other processes with transcription. In the cases of capping and splicing, the CTD functions indirectly to promote formation of the protein complexes that undertake the reactions. In the case of 3′ end generation, it may participate directly in the reaction. Control of the life history of an mRNA does not end here. Recent data show that in yeast a subset of mRNAs exist whose cytoplasmic stability or turnover is directly controlled by the promoter/upstream activating sequence (UAS). Binding sites for specific transcription factors control recruitment of stability/instability factors that bind to the mRNA during transcription.
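
The recruitment logic described above can be summarized as a lookup from kinase to phosphorylated serine to docking factors. The following Python sketch encodes only the relationships stated in the text; the dictionary names and the function are hypothetical, and the model deliberately omits phosphatases and the other kinases that make the real phosphorylation pattern dynamic.

```python
# Which serine of the heptapeptide repeat each kinase phosphorylates.
KINASE_TARGET = {"TFIIH": "Ser5", "P-TEFb": "Ser2"}

# Which processing factors dock on each phosphorylated serine.
RECRUITED_BY = {
    "Ser5": ["capping enzyme (guanylyl transferase)"],
    "Ser2": [
        "SCAFs (which bind splicing factors)",
        "cleavage/polyadenylation components",
    ],
}

def bound_factors(kinases_that_have_acted):
    """Return the factors expected to dock on the CTD after the given
    kinases have acted, in the order in which the kinases are listed."""
    factors = []
    for kinase in kinases_that_have_acted:
        mark = KINASE_TARGET[kinase]
        factors.extend(RECRUITED_BY[mark])
    return factors
```

An unphosphorylated CTD, as found in the preinitiation complex, corresponds to an empty kinase list and therefore to no docked processing factors.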

The key event in determining whether (and when, in the case of a poised or paused polymerase; see the following discussion) a gene will be expressed is promoter clearance: release from the promoter, which is regulated by PAF-1, the gatekeeper for regulation of gene expression. Once that has occurred and initiation factors are released, there is a transition to the elongation phase. The transcription complex now consists of the RNA polymerase II, the basal factors TFIIE and TFIIH, and all of the enzymes and factors bound to the CTD. Elongation factors such as TFIIF and TFIIS, and others that prevent inappropriate pausing, may be present in another large complex called the super elongation complex (SEC).

The RNA polymerase, like the ribosome, functions as a Brownian ratchet where random fluctuations are stabilized and (usually) converted into forward motion by the binding of nucleotides. This, then, means that forward as well as backward or backtracking motion occurs. Backtracking also occurs when an incorrect nucleotide is inserted and the duplex structure of the 3′ end is improperly base paired. Backtracking is a necessary component of the fidelity mechanism. The dynamics of this are controlled by the underlying DNA sequence context and elongation factors such as TFIIF, TFIIS, Elongin, and a number of others.
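
The ratchet idea can be made concrete with a toy simulation: a one-dimensional random walk in which each step is forward with some probability and backward otherwise. This is only a sketch under strong assumptions. The parameter `p_forward` is a hypothetical stand-in for the stabilizing effect of nucleotide binding, every step is treated as independent, and sequence context and elongation factors are not modeled.

```python
import random

def ratchet_walk(steps, p_forward, seed=0):
    """Simulate a biased one-dimensional random walk.

    Each step moves the polymerase +1 (forward translocation) with
    probability p_forward, otherwise -1 (backtracking). Returns the net
    displacement in nucleotides after `steps` steps.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    position = 0
    for _ in range(steps):
        position += 1 if rng.random() < p_forward else -1
    return position
```

With `p_forward` near 0.5 the walk makes little net progress, whereas values closer to 1 give nearly uninterrupted forward motion, which mirrors the way nucleotide binding usually converts random fluctuations into forward motion.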

As discussed earlier in the section The Basal Apparatus Assembles at the Promoter, considerable heterogeneity can exist in the DNA sequence elements that comprise the core promoter that can lead to promoter specificity of different genes. One of these elements is known as the pause button, a G-C–rich sequence typically located downstream from the start of initiation. This element has been found in a surprising number of Drosophila developmental genes, among others. Release from pausing requires a separate set of regulatory steps controlled by the gene’s enhancer and a 7SK snRNA that provides a link between the enhancer, the polymerase, and a required chromatin mark. P-TEFb is required to phosphorylate negative regulating pause factors in order to inactivate them and to phosphorylate the CTD for release. A subset of human genes in a paused state is regulated by the oncogene transcription factor cMyc (see the chapter titled Replication Is Connected to the Cell Cycle). P-TEFb is specifically recruited to these genes by cMyc in order to release them from the paused state.

In summary, the general process of initiation is similar to that catalyzed by bacterial RNA polymerase. Binding of RNA polymerase generates a closed complex, which is converted at a later stage to an open complex in which the DNA strands have been separated. In the bacterial reaction, formation of the open complex completes the necessary structural change to DNA; a difference in the eukaryotic reaction is that further unwinding of the template is needed after this stage.

This complex now has to transcribe a chromatin template, through nucleosomes. The whole gene may be in open chromatin, especially if it is not too large, or only the area around the promoter. Some genes, like the Duchenne muscular dystrophy gene (DMD), can be megabases in size and require many hours to transcribe. The histone octamers must be transiently modified—in some cases temporarily disassembled—and then reassembled on the template (see the chapters titled Chromatin and Eukaryotic Transcription Regulation for more details). The octamer itself is changed by this process, having some of the canonical histone H3 replaced by the variant H3.3 during active transcription.

A model exists in which the first polymerase to leave the promoter acts as a pathfinder polymerase. Its major function is to ensure that the entire gene is in open chromatin. It carries with it enzyme complexes to facilitate transcription through nucleosomes. Both the initiation factor TFIIF and the elongation factor TFIIS are required. Histone H2B is dynamically monoubiquitinated in actively transcribed chromatin. This modification is required for the second step, methylation of histone H3, which is, in turn, required for the recruitment of chromatin remodelers (see the chapters titled Chromatin and Eukaryotic Transcription Regulation).

The most recent model has each polymerase using a chromatin-remodeling complex together with a histone chaperone to remove an H2A–H2B dimer, leaving a hexamer (in place of the octamer), which is easier to temporarily displace. These modifications also are necessary to reassemble the nucleosome octamer on the DNA in the wake of the RNA polymerase (see the Chromatin chapter).

In both bacteria and eukaryotes, there is a direct link from RNA polymerase to the activation of DNA repair. The basic phenomenon was first observed because transcribed genes are preferentially repaired. It was then discovered that it is only the template strand of DNA that is the target—the nontemplate strand is repaired at the same rate as bulk DNA. When RNA polymerase encounters DNA damage in the template strand, it stalls because it cannot use the damaged sequences as a template to direct complementary base pairing. This explains the specificity of the effect for the template strand (damage in the nontemplate strand does not impede progress of the RNA polymerase). Stalled polymerase at a damage site recruits a pair of proteins, CSA and CSB (proteins with the name CS are encoded by genes in which mutations lead to the disease Cockayne syndrome). The general transcription factor TFIIH, already present with the elongating polymerase, is essential to the repair process. TFIIH is found in alternative forms, which consist of a core associated with other subunits.

TFIIH has a common function in both initiating transcription and repairing damage. The same TFIIH helicase subunits (XPB and XPD) create the initial transcription bubble and melt DNA at a damaged site. Subunits with the name XP are encoded by genes in which mutations cause the disease xeroderma pigmentosum, which causes a predisposition to cancer. The role of TFIIH subunits in DNA repair is discussed in detail in the Repair Systems chapter.

The repair function may require modification or degradation of a stalled RNA polymerase. The large subunit of RNA polymerase is degraded by the ubiquitylation pathway when the enzyme stalls at sites of ultraviolet (UV) damage. The connection between the transcription/repair apparatus as such and the degradation of RNA polymerase is not yet fully understood. It is possible that removal of the polymerase is necessary once it has become stalled.

18.9 Enhancers Contain Bidirectional Elements That Assist Initiation

We have largely considered the promoter as an isolated region responsible for binding RNA polymerase. Eukaryotic promoters do not necessarily function alone, though. In most cases, the activity of a promoter is enormously increased by the presence of an enhancer located at a variable distance from the core promoter. Some enhancers function through long-range interactions of tens of kilobases; others function through short-range interactions and may lie quite close to the core promoter.

One of the first common elements to be described near the promoter was the sequence at −75, now called the CAAT box, named for its consensus sequence. It is often located close to −80, but it can function at distances that vary considerably from the start point. It functions in either orientation. Susceptibility to mutations suggests that the CAAT box plays a strong role in determining the efficiency of the promoter, but does not influence its specificity. A second common upstream element is the GC box at −90, which contains the sequence GGGCGG. Often, multiple copies are present in the promoter, and they occur in either orientation. The GC box, too, is a relatively common element near the promoter.

The concept that the enhancer is distinct from the promoter reflects two characteristics. The position of the enhancer relative to the promoter need not be fixed, but can vary substantially. FIGURE 18.15 shows that it can be upstream, downstream, or within a gene (typically in introns). In addition, it can function in either orientation (i.e., it can be inverted) relative to the promoter. Manipulations of DNA show that an enhancer can stimulate any promoter placed in its vicinity, even tens of kilobases away in either direction.

FIGURE 18.15 An enhancer can activate a promoter from upstream or downstream locations, and its sequence can be inverted relative to the promoter.

Like the promoter, an enhancer (or its alter ego, a silencer) is a modular element constructed of short DNA sequence elements that bind various types of transcription factors. Enhancers can be simple or complex depending on the number of binding elements and the type of transcription factors they bind.

One way to divide up the world of enhancer-binding transcription factors is to consider positive and negative factors. Transcription factors can be positive and stimulate transcription (as activators) or can be negative and repress transcription (as repressors). At any given time in a cell, as determined by its developmental history, that cell will contain a mixture of transcription factors that can bind to an enhancer. If more activators bind than repressors, the element will be an enhancer. If more repressors bind than activators, the element will be a silencer.

Another way to examine the transcription factors that bind enhancers is by function. The first class we will consider is called true activators; that is, they function by both binding specific DNA sites and making contact with the basal machinery at the promoter, either directly by themselves, or, more commonly, through coactivators like Mediator. This class functions equally well on a DNA template or a chromatin template. Two additional classes of activators have completely different mechanisms of activation. One includes activators that function by recruiting chromatin-modification enzymes and chromatin-remodeling complexes. Many activators function both as true activators and by recruiting chromatin modifiers. The third class includes architectural transcription factors. Their sole function is to change the structure of the DNA, typically to bend it. Bending can bring together two transcription factors separated by a short distance so that they act synergistically. In the next section, Enhancers Work by Increasing the Concentration of Activators Near the Promoter, we examine more closely how the different classes of activators and repressors work together in an enhancer, and in the chapter titled Eukaryotic Transcription Regulation, we examine transcription regulation in more detail.

Elements analogous to enhancers, called upstream activating sequences (UASs), are found in yeast. They can function in either orientation at variable distances upstream of the promoter, but cannot function when located downstream. They have a regulatory role: The UAS is bound by the regulatory protein(s) that activates the genes downstream.

Reconstruction experiments in which the enhancer sequence is removed from the DNA and then is inserted elsewhere show that normal transcription can be sustained as long as the enhancer is present anywhere on the DNA molecule (and no insulators are present in the intervening DNA; see the Chromatin chapter). If a β-globin gene is placed on a DNA molecule that contains an enhancer, its transcription is increased in vivo more than 200-fold, even when the enhancer is several kilobases upstream or downstream of the start point, in either orientation. It has not yet been discovered at what distance the enhancer fails to work.

18.10 Enhancers Work by Increasing the Concentration of Activators Near the Promoter

Enhancers function by binding combinations of transcription factors, either positive or negative, that control the promoter and, by extension, gene expression. The promoter is the site where, in open chromatin, basal transcription factors prebind so that RNA polymerase can find the promoter. How can an enhancer stimulate initiation at a promoter that can be located any distance away on either side of it?

Enhancer function involves interaction with the basal apparatus at the core promoter element. Enhancers are modular, like promoters. Some elements are found in both long-range enhancers and enhancers near promoters. Some individual elements found near promoters share with distal enhancers the ability to function at variable distance and in either orientation. Thus, the distinction between long-range and short-range enhancers is blurred.

The essential role of the enhancer may be to increase the concentration of activator in the vicinity of the promoter (vicinity in this sense being a relative term) in cis. Numerous experiments have demonstrated that the level of gene expression (i.e., the rate of transcription) is proportional to the net number of activator-binding sites. Typically, the more activators bound at an enhancer site, the higher the level of expression.

The Xenopus laevis ribosomal RNA enhancer is able to stimulate transcription from its RNA polymerase I promoter. This stimulation is relatively independent of location, and the enhancer still functions when it is removed from the chromosome and placed with its promoter on a circular plasmid. Stimulation does not occur when the enhancer and promoter are on separate plasmids, but when the enhancer is placed on a plasmid that is catenated (interlocked) with a second plasmid that contains the promoter, initiation is almost as effective as when the enhancer and promoter are on the same circular molecule, as shown in FIGURE 18.16 (even though, in this case, the enhancer is acting on its promoter in trans). Again, this suggests that the critical feature is localization of the protein bound at the enhancer, which increases its chance of contacting a protein bound at the promoter.

FIGURE 18.16 An enhancer may function by bringing proteins into the vicinity of the promoter. An enhancer and promoter on separate circular DNAs do not interact as in (c), but can interact when the two molecules are catenated as in (b).

If proteins bound at an enhancer several kilobases distant from a promoter interact directly with proteins bound in the vicinity of the start point, the organization of DNA must be flexible enough to allow the enhancer and promoter to be closely located. This requires the intervening DNA to be extruded as a large “loop.” Such loops have now been directly observed in the case of enhancers.

What limits the activity of an enhancer? Typically it works upon the nearest promoter. In some situations an enhancer is located between two promoters, but activates only one of them on the basis of specific protein–protein contacts between the complexes bound at the two elements. The action of an enhancer may be limited by an insulator—an element in DNA that prevents the enhancer from acting on promoters beyond the insulator (see the Chromatin chapter).

18.11 Gene Expression Is Associated with Demethylation

Methylation of DNA is one of several epigenetic regulatory events that influence the activity of a promoter (see the chapter titled Epigenetics I). Methylation at the promoter usually prevents transcription, and those methyl groups must be removed in order to activate a promoter. This effect is well characterized at promoters for both RNA polymerase I and RNA polymerase II. In effect, methylation is a reversible regulatory event, though DNA methylation patterns can also be stably maintained over many cell divisions. DNA methylation can be triggered by modifications to histones that include deacetylation and protein methylation (see the Chromatin chapter).

Methylation also occurs in a particular epigenetic phenomenon known as imprinting. In this case, modification occurs in sex-specific patterns in sperm or oocyte, with the result that maternal and paternal alleles are differentially expressed in the next generation (see the chapter titled Epigenetics II).

Methylation at promoters for RNA polymerase II occurs on the 5 position of C (producing 5-methylcytosine, or 5mC) at CG doublets (also referred to as CpG doublets) by two different classes of DNA methyltransferases. DNMT1 is a maintenance enzyme that methylates the new C in a hemimethylated CG doublet after replication. The DNMT3 enzymes (DNMT3A and DNMT3B) initiate de novo methylation of an unmethylated CG doublet. Although DNA methylation has been understood for some time, the mechanism of demethylation has been mysterious. Recently, the role of TET (ten eleven translocation) enzymes in demethylation of mammalian DNA has been proposed. These enzymes were originally identified as being involved in epigenetic inheritance and can convert 5mC to 5-hydroxymethylcytosine as the first step in a DNA damage excision repair pathway. A somewhat different DNA repair mechanism is known to be used for demethylation in plants.

Classically, the distribution of methyl groups was examined by taking advantage of restriction enzymes that cleave target sites containing the CG doublet. Two types of restriction activity are compared in FIGURE 18.17. These isoschizomers are enzymes that cleave the same target sequence in DNA, but have a different response to its state of methylation. It is now possible through direct DNA sequencing to determine the methylome, or pattern of 5mC at single-base resolution in an organism.

FIGURE 18.17 The restriction enzyme MspI cleaves all CCGG sequences whether or not they are methylated at the second C, but HpaII cleaves only unmethylated CCGG tetramers.
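The logic of comparing the two isoschizomers can be sketched in a few lines. This is an illustrative model only: it considers methylation at the second C of CCGG, as in the figure, and the function name, the way methylation is recorded (a set of cytosine positions), and the example sequence are all assumptions made for the sketch.

```python
def cut_sites(seq, methylated_positions, enzyme):
    """Return 0-based start positions of CCGG sites the enzyme would cleave.

    methylated_positions: set of 0-based indices of methylated cytosines.
    enzyme: "MspI" (cleaves whether or not the second C is methylated) or
            "HpaII" (cleaves only when the second C is unmethylated).
    """
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError("enzyme must be 'MspI' or 'HpaII'")
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 3):
        if seq[i:i + 4] != "CCGG":
            continue
        second_c_methylated = (i + 1) in methylated_positions
        if enzyme == "MspI" or not second_c_methylated:
            sites.append(i)
    return sites


# Hypothetical fragment with two CCGG sites; the second site is methylated.
dna = "AACCGGTTTTCCGGAA"
methylated = {11}  # index of the second C in the site that starts at position 10

msp = cut_sites(dna, methylated, "MspI")
hpa = cut_sites(dna, methylated, "HpaII")
print("MspI cuts at:", msp)
print("HpaII cuts at:", hpa)
print("Methylated sites:", [s for s in msp if s not in hpa])
```

Sites cut by MspI but not by HpaII are the ones inferred to be methylated, which is the comparison the classical restriction analysis relies on.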

Many genes show a pattern in which the state of methylation is constant at most sites but varies at others. Some of the sites are methylated in all tissues examined; some sites are unmethylated in all tissues. A minority of sites are methylated in tissues in which the gene is not expressed, but are not methylated in tissues in which the gene is active. Even active genes that are unmethylated in the promoter region are typically methylated within the gene body, but usually not at the 3′ end. Thus, an active gene may be described as undermethylated.

Experiments with the drug 5-azacytidine produce indirect evidence that demethylation can result in gene expression. The drug is incorporated into DNA in place of deoxycytidine and cannot be methylated, because the 5 position is blocked. This leads to the appearance of demethylated sites in DNA as the consequence of replication.
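The way replication dilutes methylation when new strands cannot be methylated can be followed with simple bookkeeping. The sketch below is an idealized model rather than a description of the actual experiments: it assumes a single CG site, a starting duplex methylated on both strands, and no methylation at all of newly made strands. The function name is hypothetical.

```python
def duplex_states_after(rounds):
    """Count duplex states at one CG site after `rounds` of replication
    in which new strands are never methylated (idealized assumption).

    Starts from a single duplex methylated on both strands.
    Returns (fully_methylated, hemimethylated, unmethylated) duplex counts.
    """
    full, hemi, unmeth = 1, 0, 0
    for _ in range(rounds):
        # Every parental strand templates a new, unmethylated strand:
        #   fully methylated duplex -> 2 hemimethylated duplexes
        #   hemimethylated duplex   -> 1 hemimethylated + 1 unmethylated
        #   unmethylated duplex     -> 2 unmethylated duplexes
        new_hemi = 2 * full + hemi
        new_unmeth = hemi + 2 * unmeth
        full, hemi, unmeth = 0, new_hemi, new_unmeth
    return full, hemi, unmeth


for n in range(4):
    full, hemi, unmeth = duplex_states_after(n)
    total = full + hemi + unmeth
    print(f"round {n}: full={full} hemi={hemi} unmethylated={unmeth} "
          f"(fraction unmethylated {unmeth / total:.2f})")
```

Under these assumptions, fully demethylated duplexes first appear after the second round of replication and then make up a growing share of the population.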

The phenotypic effects of 5-azacytidine include the induction of changes in the state of cellular differentiation. For example, muscle cells are induced to develop from non-muscle-cell precursors. The drug also activates genes on a silent X chromosome, which is consistent with the idea that the state of methylation is connected with chromosomal inactivity.

As well as examining the state of methylation of resident genes, we can compare the results of introducing methylated or nonmethylated DNA into new host cells. Such experiments show a clear correlation: The methylated gene is inactive, but the unmethylated gene is active.

What is the extent of the undermethylated region? In the chicken α-globin gene cluster in adult erythroid cells, the undermethylation is confined to sites that extend from about 500 bp upstream of the first of the two adult α genes to about 500 bp downstream of the second. Sites of undermethylation are present in the entire region, including the spacer between the genes. The region of undermethylation coincides with the region of maximum sensitivity to DNase I (see the Chromatin chapter). This argues that undermethylation is a feature of a domain that contains a transcribed gene or genes. As with many changes in chromatin, it seems likely that the absence of methyl groups is associated with the ability to be transcribed rather than with the act of transcription itself.

The problem in interpreting the general association between undermethylation and gene activation is that only a minority (sometimes a small minority) of the methylated sites are involved. It is likely that the state of methylation is critical at specific sites or in a restricted region. It is also possible that a reduction in the level of methylation (or even the complete removal of methyl groups from some stretch of DNA) is part of some structural change needed to permit transcription to proceed.

In particular, demethylation at the promoter may be necessary to make it available for the initiation of transcription. In the γ-globin gene, for example, the presence of methyl groups in the region around the start point, between −200 and +90, suppresses transcription. Removal of the three methyl groups located upstream of the start point, or of the three methyl groups located downstream, does not relieve the suppression. Removal of all methyl groups, though, allows the promoter to function. Transcription may therefore require a methyl-free region at the promoter (see the next section, CpG Islands Are Regulatory Targets). There are exceptions to this general relationship.

Some genes, however, can be expressed even when they are extensively methylated. Any connection between methylation and expression thus is not universal in an organism, but the general rule is that methylation prevents gene expression, and demethylation is required for expression.

18.12 CpG Islands Are Regulatory Targets

The origin of DNA methylation may have been as a defense mechanism to prevent inserted sequences such as viruses and transposable elements from being expressed. In both plants and animals, these sequences and simple repeat sequences are uniformly methylated.

It is now possible to examine the full methylome of an entire genome in multiple tissues at multiple times during development. The majority of methylation occurs in CpG islands in the 5′ regions of some genes and is connected with the effect of methylation on gene expression. These islands are detected by the presence of an increased density of the dinucleotide sequence CpG (CpG = 5′-CG-3′). A significant minority of methylation, however, is not found in CpG islands.

The CpG doublet occurs in vertebrate DNA at only about 20% of the frequency that would be expected from the proportion of G-C base pairs. (This may be because when CpG doublets are methylated on C, spontaneous deamination of methyl-C converts it to T, which, if incorrectly repaired, introduces a mutation that removes the doublet.) In certain regions, however, the density of CpG doublets reaches the predicted value; in fact, it is increased by a factor of 10 relative to the rest of the genome. The CpG doublets in these regions are generally unmethylated.
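The comparison between the observed frequency of CpG and the frequency expected from base composition can be made explicit. One commonly used form of the calculation takes the expected number of CpG doublets in a sequence of length N as (number of C × number of G) / N. The sketch below applies that form; the function name and the toy sequences are hypothetical, and real analyses use much longer sequences.

```python
def cpg_observed_over_expected(seq):
    """Ratio of observed CpG doublets to the number expected from base composition.

    Expected count is (number of C * number of G) / sequence length.
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")  # CG cannot overlap itself, so count() finds every doublet
    expected = c_count * g_count / n
    return observed / expected


# Toy sequences (hypothetical): same base composition, different CpG content.
print(cpg_observed_over_expected("ACGTACGT"))  # every C is followed by G
print(cpg_observed_over_expected("CATGCATG"))  # no CpG doublets at all
```

A ratio near 0.2 corresponds to the depletion described for bulk vertebrate DNA, whereas a ratio approaching 1 corresponds to a region in which CpG density reaches the predicted value.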

These CpG-rich islands have an average G-C content of about 60%, compared with the 40% average in bulk DNA. They take the form of stretches of DNA typically 1 to 2 kb long. The human genome has about 45,000 such islands. Some of the islands are present in repeated Alu elements and may just be the consequence of their high G-C content. The human genome sequence confirms that, excluding these, there are about 29,000 islands. The mouse genome has fewer islands, about 15,500. About 10,000 of the predicted islands in both species appear to reside in a context of sequences that are conserved between the species, suggesting that these may be the islands with regulatory significance. The structure of chromatin in these regions has changes associated with gene expression when the CpG islands are unmethylated (see the Chromatin chapter). The content of histone H1 is reduced (which probably means that the structure is less compact); the other histones are extensively acetylated (a feature that tends to be associated with gene expression); and DNase-hypersensitive sites or sites nearly devoid of histone octamers (as would be expected of active promoters) are present. The presence of methylated CpG sites precludes the presence of the histone variant H2A.Z in nucleosomes.

In several cases, CpG-rich islands begin just upstream of a promoter and extend downstream into the transcribed region before petering out. FIGURE 18.18 compares the density of CpG doublets in a “general” region of the genome with a CpG island identified from the DNA sequence. The CpG island surrounds the 5′ region of the APRT gene, which is constitutively expressed.

FIGURE 18.18 The typical density of CpG doublets in mammalian DNA is ~1/100 bp, as seen for a γ-globin gene. In a CpG-rich island, the density is increased to more than 10 doublets/100 bp. The island in the APRT gene starts ~100 bp upstream of the promoter and extends ~400 bp into the gene. Each vertical line represents a CpG doublet.
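The density comparison in the figure amounts to counting CpG doublets per 100 bp along a sequence. The following sketch does this with non-overlapping windows. It is a simplified illustration: the function name, the window handling, and the synthetic test sequence are assumptions, and a doublet that straddles a window boundary is not counted.

```python
def cpg_per_window(seq, window=100):
    """Count CpG doublets in consecutive, non-overlapping windows.

    Trailing bases that do not fill a complete window are ignored,
    and a CG split across two windows is not counted.
    """
    seq = seq.upper()
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        counts.append(seq[start:start + window].count("CG"))
    return counts


# Synthetic sequence (hypothetical): 100 bp with a single CpG,
# followed by 100 bp with 20 CpG doublets.
bulk_like = "CG" + "AT" * 49
island_like = "GCGCA" * 20
counts = cpg_per_window(bulk_like + island_like)

print(counts)
print([c > 10 for c in counts])  # windows exceeding 10 doublets per 100 bp
```

Windows with more than 10 doublets per 100 bp match the density given for a CpG-rich island, whereas a count of about 1 matches the typical density of bulk DNA.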

All of the housekeeping genes that are constitutively expressed have CpG islands; this accounts for about half of the islands. The remaining islands occur at the promoters of tissue-regulated genes; approximately 50% of these genes have islands. In these cases, the islands are unmethylated irrespective of the state of expression of the gene, so that CpG island methylation is not correlated with transcriptional state for tissue-specific genes. The presence of unmethylated CpG-rich islands may be necessary, but is not sufficient, for transcription. Thus, the presence of unmethylated CpG islands may be taken as an indication that a gene is potentially active rather than inevitably transcribed. Many islands that are unmethylated in an animal become methylated in cell lines in tissue culture (or in some cancers); this could be connected with the inability of these lines to express all of the functions typical of the tissue from which they were derived. The one clear example in which there is a strong correlation between promoter methylation and gene expression is when promoter CpG islands become methylated in the mammalian inactive X chromosome (see the chapter titled Epigenetics II).

Methylation of a CpG island can affect transcription. One of two mechanisms can be involved:

  • Methylation of a binding site for some factor may prevent it from binding. This happens in a case of binding to a regulatory site other than the promoter (see the chapter titled Epigenetics I).

  • Methylation may cause specific repressors to bind to the DNA.

Repression is caused by either of two types of protein that bind to methylated CpG sequences. The protein MeCP1 requires the presence of several methyl groups to bind to DNA, whereas MeCP2 and a family of related proteins can bind to a single methylated CpG base pair. This explains why a methylation-free zone is required for initiation of transcription. Binding of proteins of either type prevents transcription in vitro by a nuclear extract.

MeCP2, which directly represses transcription by interacting with complexes at the promoter, also interacts with the Sin3 repressor complex, which contains histone deacetylase activities. This observation provides a direct connection between two types of repressive modifications: methylation of DNA and deacetylation of histones.

Although promoters that contain CpG islands (approximately 60% CpG density) or that show no CpG enrichment (approximately 20% CpG density) exhibit a generally poor correlation between promoter methylation and transcription, a third class of promoters appears to be consistently regulated by CpG methylation. Approximately 12% of human genes contain so-called weak CpG islands, in which the density of CpGs is about 30%, intermediate between the other two classes of promoters. These genes show a strong inverse relationship between promoter CpG methylation and RNA polymerase II occupancy.

The absence of methyl groups is associated with gene expression (or at least the potential for expression). However, supposing that the state of methylation provides a general means for controlling gene expression presents some difficulties. In the case of Drosophila melanogaster (and other Dipteran insects), there is very little methylation of DNA (although one methyltransferase, Dnmt2, has been identified, its importance is unclear), and there is no methylation of DNA in the nematode Caenorhabditis elegans or in yeast. The other differences between inactive and active chromatin appear to be the same as in species that display methylation. Thus, in these organisms, any role that methylation has in vertebrates is replaced by some other mechanism.

The three changes that occur in typical active genes are as follows:

  • A hypersensitive chromatin site(s) is established near the promoter.

  • The chromatin of a domain, including the transcribed region, becomes more sensitive to DNase I.

  • The DNA of the same region is undermethylated.

All of these changes are necessary for transcription.

Summary

Of the three eukaryotic RNA polymerases, RNA polymerase I transcribes rDNA and accounts for the majority of activity, RNA polymerase II transcribes structural genes for mRNA and has the greatest diversity of products, and RNA polymerase III transcribes small RNAs. The enzymes have similar structures, with two large subunits and many smaller subunits; the enzymes have some common subunits.

None of the three RNA polymerases recognize their promoters directly. A unifying principle is that transcription factors have primary responsibility for recognizing the characteristic sequence elements of any particular promoter, and they serve, in turn, to bind the RNA polymerase and to position it correctly at the start point. At each type of promoter, histone octamers must be removed or moved. The initiation complex is then assembled by a series of reactions in which individual factors join (or leave) the complex. The factor TBP is required for initiation by all three RNA polymerases. In each case it provides one subunit of a transcription factor that binds in the vicinity of the start point.

An RNA polymerase II promoter consists of a number of short-sequence elements in the region upstream of the start point. Each element is bound by one or more transcription factors. The basal apparatus, which consists of the TFII factors, assembles at the start point and enables RNA polymerase to bind. The TATA box (if there is one) near the start point, and the initiator region immediately at the start point, are responsible for selection of the exact start point at promoters for RNA polymerase II. TBP binds directly to the TATA box when there is one; in TATA-less promoters it is located near the start point by binding to the Inr or to the DPE downstream. After binding of TFIID, the other general transcription factors for RNA polymerase II assemble the basal transcription apparatus at the promoter. Other elements in the promoter, located upstream of the TATA box, bind activators that interact with the basal apparatus. The activators and basal factors are released when RNA polymerase begins elongation.

The CTD of RNA polymerase II is phosphorylated during the initiation reaction. It provides a point of contact for proteins that modify the RNA transcript, including the 5′ capping enzyme, splicing factors, the 3′ processing complex, and mRNA export from the nucleus. As the RNA polymerase moves through the transcription unit, histone octamers must be modified and/or removed to allow passage.

Promoters may be stimulated by enhancers, sequences that can act at great distances and in either orientation on either side of a gene. Enhancers also consist of sets of elements, although they are more compactly organized. Some elements are found close to promoters and in distant enhancers. Enhancers function by assembling a protein complex that interacts with the proteins bound at the promoter, which requires that the intervening DNA be “looped out.”

CpG islands contain concentrations of CpG doublets and often surround the promoters of constitutively expressed genes, although they are also found at the promoters of regulated genes. The island including a promoter must be unmethylated for that promoter to be able to initiate transcription. A specific protein binds to the methylated CpG doublets and prevents initiation of transcription.

References

18.1 Introduction

Review
  1. Kim, T.-K., and Shiekhattar, R. (2015). Architectural and functional commonalities between enhancers and promoters. Cell 162, 948–959.

Research
  1. Hah, N., Benner, C., Chang, L.-W., Yu, R. T., Downes, M., and Evans, R. M. (2015). Inflammation-sensitive super enhancer forms domains of coordinately regulated enhancer RNAs. Proc. Natl. Acad. Sci. USA 112, E297–E302.

18.2 Eukaryotic RNA Polymerases Consist of Many Subunits

Reviews
  1. Doi, R. H., and Wang, L. F. (1986). Multiple prokaryotic RNA polymerase sigma factors. Microbiol. Rev. 50, 227–243.

  2. Young, R. A. (1991). RNA polymerase II. Annu. Rev. Biochem. 60, 689–715.

18.3 RNA Polymerase I Has a Bipartite Promoter

Reviews
  1. Grummt, I. (2003). Life on a planet of its own: regulation of RNA polymerase I transcription in the nucleolus. Genes Dev. 17, 1691–1702.

  2. Leslie, M. (2014). Central command. Science 345, 506–507.

  3. Mathews, D. A., and Olson, W. M. (2006). What is new in the nucleolus? EMBO. Rep. 7, 870–873.

  4. Paule, M. R., and White, R. J. (2000). Survey and summary: transcription by RNA polymerases I and III. Nucleic Acids Res. 28, 1283–1298.

Research
  1. Bell, S. P., Learned, R. M., Jantzen, H. M., and Tjian, R. (1988). Functional cooperativity between transcription factors UBF1 and SL1 mediates human ribosomal RNA synthesis. Science 241, 1192–1197.

  2. Knutson, B. A., and Hahn, S. (2011). Yeast Rrn7 and human TAFIB are TFIIB-related RNA polymerase I general transcription factors. Science 333, 1637–1640.

  3. Kuhn, C. D., Geiger, S. R., Baumli, S., Gartmann, M., Gerber, J., Jennebach, S., Mielke, T., Tschochner, H., Beckmann, R., and Cramer P. (2007). Functional architecture of RNA polymerase I. Cell 131, 1260–1273.

  4. Naidu, S., Friedrich, J. K., Russell, J., and Zomerdijk, J. C. B. M. (2011). TAFIB is a TFIIB-like component of the basal transcription machinery for RNA polymerase I. Science 333, 1640–1642.

  5. Sanji, E., Poortinga, G., Sharkey, K., Hung, S., Holloway, T. P., Quin, J., Robb, E., Wong, L. H., Thomas, W. G., Stefanousky, V., Moss, T., Rothblum, L., Hannan, K. M., McArthur, G. A., Pearson, R. B., and Hannan, R. D. (2008). UBF levels determine the number of active rRNA genes in mammals. J. Cell Bio. 183, 1259–1274.

  6. Zhang, Y., Sikes, M. L., Beyer, A. L., and Schneider, D. A. (2009). The PafI complex is required for efficient transcription elongation by RNA polymerase I. Proc. Natl Acad. Sci. USA 106, 2153–2158.

18.4 RNA Polymerase III Uses Downstream and Upstream Promoters

Reviews
  1. Geiduschek, E. P., and Tocchini-Valentini, G. P. (1988). Transcription by RNA polymerase III. Annu. Rev. Biochem. 57, 873–914.

  2. Schramm, L., and Hernandez, N. (2002). Recruitment of RNA polymerase III to its target promoters. Genes Dev. 16, 2593–2620.

Research
  1. Bogenhagen, D. F., Sakonju, S., and Brown, D. D. (1980). A control region in the center of the 5S RNA gene directs specific initiation of transcription: II. The 3′ border of the region. Cell 19, 27–35.

  2. Canella, D., Praz, V., Reina, J. H., Cousin, P., and Hernandez, N. (2010). Defining the RNA polymerase III transcriptome: genome-wide localization of the RNA polymerase III transcription machinery in human cells. Genome Res. 20, 710–721.

  3. Galli, G., Hofstetter, H., and Birnstiel, M. L. (1981). Two conserved sequence blocks within eukaryotic tRNA genes are major promoter elements. Nature 294, 626–631.

  4. Kassavatis, G. A., Braun, B. R., Nguyen, L. H., and Geiduschek, E. P. (1990). S. cerevisiae TFIIIB is the transcription initiation factor proper of RNA polymerase III, while TFIIIA and TFIIIC are assembly factors. Cell 60, 235–245.

  5. Kassavetis, G. A., Joazeiro, C. A., Pisano, M., Geiduschek, E. P., Colbert, T., Hahn, S., and Blanco, J. A. (1992). The role of the TATA-binding protein in the assembly and function of the multisubunit yeast RNA polymerase III transcription factor, TFIIIB. Cell 71, 1055–1064.

  6. Kassavetis, G. A., Letts, G. A., and Geiduschek, E. P. (1999). A minimal RNA polymerase III transcription system. EMBO J. 18, 5042–5051.

  7. Kunkel, G. R., and Pederson, T. (1988). Upstream elements required for efficient transcription of a human U6 RNA gene resemble those of U1 and U2 genes even though a different polymerase is used. Genes Dev. 2, 196–204.

  8. Pieler, T., Hamm, J., and Roeder, R. G. (1987). The 5S gene internal control region is composed of three distinct sequence elements, organized as two functional domains with variable spacing. Cell 48, 91–100.

  9. Sakonju, S., Bogenhagen, D. F., and Brown, D. D. (1980). A control region in the center of the 5S RNA gene directs specific initiation of transcription: I. The 5′ border of the region. Cell 19, 13–25.

18.5 The Start Point for RNA Polymerase II

Reviews
  1. Butler, J. E., and Kadonaga, J. T. (2002). The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 16, 2583–2592.

  2. Smale, S. T., Jain, A., Kaufmann, J., Emami, K. H., Lo, K., and Garraway, I. P. (1998). The initiator element: a paradigm for core promoter heterogeneity within metazoan protein-coding genes. Cold Spring Harb Symp Quant Biol. 63, 21–31.

  3. Smale, S. T., and Kadonaga, J. T. (2003). The RNA polymerase II core promoter. Annu. Rev. Biochem. 72, 449–479.

  4. Woychik, N. A., and Hampsey, M. (2002). The RNA polymerase II machinery: structure illuminates function. Cell 108, 453–463.

Research
  1. Burke, T. W., and Kadonaga, J. T. (1996). Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters. Genes Dev. 10, 711–724.

  2. Singer, V. L., Wobbe, C. R., and Struhl, K. (1990). A wide variety of DNA sequences can functionally replace a yeast TATA element for transcriptional activation. Genes Dev. 4, 636–645.

  3. Smale, S. T., and Baltimore, D. (1989). The “initiator” as a transcription control element. Cell 57, 103–113.

18.6 TBP Is a Universal Factor

Reviews
  1. Berk, A. J. (2000). TBP-like factors come into focus. Cell 103, 5–8.

  2. Burley, S. K., and Roeder, R. G. (1996). Biochemistry and structural biology of TFIID. Annu. Rev. Biochem. 65, 769–799.

  3. Hernandez, N. (1993). TBP, a universal eukaryotic transcription factor? Genes Dev. 7, 1291–1308.

  4. Lee, T. I., and Young, R. A. (1998). Regulation of gene expression by TBP-associated proteins. Genes Dev. 12, 1398–1408.

  5. Orphanides, G., Lagrange, T., and Reinberg, D. (1996). The general transcription factors of RNA polymerase II. Genes Dev. 10, 2657–2683.

Research
  1. Crowley, T. E., Hoey, T., Liu, J. K., Jan, Y. N., Jan, L. Y., and Tjian, R. (1993). A new factor related to TATA-binding protein has highly restricted expression patterns in Drosophila. Nature 361, 557–561.

  2. Horikoshi, M., Hai, T., Lin, Y. S., Green, M. R., and Roeder, R. G. (1988). Transcription factor ATF interacts with a TATA factor to facilitate establishment of a preinitiation complex. Cell 54, 1033–1042.

  3. Kim, J. L., Nikolov, D. B., and Burley, S. K. (1993). Cocrystal structure of TBP recognizing the minor groove of a TATA element. Nature 365, 520–527.

  4. Kim, Y., Geiger, J. H., Hahn, S., and Sigler, P. B. (1993). Crystal structure of a yeast TBP/TATA-box complex. Nature 365, 512–520.

  5. Liu, D., Ishima, R., Tong, K. I., Bagby, S., Kokubo, T., Muhandiram, D. R., Kay, L. E., Nakatani, Y., and Ikura M. (1998). Solution structure of a TBP-TAFII230 complex: protein mimicry of the minor groove surface of the TATA box unwound by TBP. Cell 94, 573–583.

  6. Martinez, E., Chiang, C. M., Ge, H., and Roeder, R. G. (1994). TATA-binding protein-associated factors in TFIID function through the initiator to direct basal transcription from a TATA-less class II promoter. EMBO. J. 13, 3115–3126.

  7. Nikolov, D. B., Hu, S.-H., Lin, J., Gasch, A., Hoffmann, A., Horikoshi, M., Chua, N.-H., Roeder, R. G., and Burley S. K. (1992). Crystal structure of TFIID TATA-box binding protein. Nature 360, 40–46.

  8. Ogryzko, V. V., Kotani, T., Zhang, X., Schiltz, R. L., Howard, T., Yang, X. J., Howard, B. H., Qin, J., and Nakatani, Y. (1998). Histone-like TAFs within the PCAF histone acetylase complex. Cell 94, 35–44.

  9. Sprouse R. O., Karpova, T. A., Mueller, F., Dasgupta, A., McNally, J. G., and Auble, D. T. (2008). Regulation of TATA-binding protein dynamics in living yeast cells. Proc. Natl Acad. Sci. USA 105, 13304–13308.

  10. Verrijzer, C. P., Chen, J. L., Yokomori, K., and Tjian, R. (1995). Binding of TAFs to core elements directs promoter selectivity by RNA polymerase II. Cell 81, 1115–1125.

  11. Wu, J., Parkhurst, K. M., Powell, R. M., Brenowitz, M., and Parkhurst, L. J. (2001). DNA bends in TATA-binding protein-TATA complexes in solution are DNA sequence-dependent. J. Biol. Chem. 276, 14614–14622.

18.7 The Basal Apparatus Assembles at the Promoter

Reviews
  1. Egloff, S., and Murphy, S. (2008). Cracking the RNA polymerase II CTD code. Trends Genet. 24, 280–288.

  2. Muller F., Demeny, M. A., and Tora, L. (2007). New problems in RNA polymerase II transcription initiation: matching the diversity of core promoters with a variety of promoter recognition factors. J. Biol. Chem. 282, 14685–14689.

  3. Nikolov, D. B., and Burley, S. K. (1997). RNA polymerase II transcription initiation: a structural view. Proc. Natl. Acad. Sci. USA 94, 15–22.

  4. Zawel, L., and Reinberg, D. (1993). Initiation of transcription by RNA polymerase II: a multi-step process. Prog. Nucleic Acid Res. Mol. Biol. 44, 67–108.

Research
  1. Buratowski, S., Hahn, S., Guarente, L., and Sharp, P. A. (1989). Five intermediate complexes in transcription initiation by RNA polymerase II. Cell 56, 549–561.

  2. Burke, T. W., and Kadonaga, J. T. (1996). Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters. Genes Dev. 10, 711–724.

  3. Bushnell, D. A., Westover, K. D., Davis, R. E., and Kornberg, R. D. (2004). Structural basis of transcription: an RNA polymerase II-TFIIB cocrystal at 4.5 angstroms. Science 303, 983–988.

  4. Carninci, P., et al. (2006). Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626–635.

  5. Fishburn, J., Tomko, E., Galburt, E., and Hahn, S. (2015). Double-stranded DNA translocase activity of transcription factor TFIIH and the mechanism of RNA polymerase II open complex formation. Proc. Natl. Acad. Sci. USA 112, 3961–3966.

  6. Kostrewa, D., Zeller, M. E., Armache, K. J., Seizl, M., Leike, K., Thomm, M., and Cramer, P. (2009). RNA polymerase II-TFIIB structure and mechanism of transcription initiation. Nature 462, 323–330.

  7. Liu, X., Bushnell, D. A., Wang, D., Calero, G., and Kornberg, R. D. (2010). Structure of an RNA polymerase II-TFIIB complex and the transcription initiation mechanism. Science 327, 206–209.

  8. Sims, R. J., III, Rojas, L. A., Beck, D., Bonasio, R., Schuller, R., Drury, W. J., III, Eick, D., and Reinberg, D. (2011). The C-terminal domain of RNA polymerase II is modified by site-specific methylation. Science 332, 99–103.

18.8 Initiation Is Followed by Promoter Clearance and Elongation

Reviews
  1. Ares, M., Jr., and Proudfoot, N. J. (2005). The Spanish connection: transcription and mRNA processing get even closer. Cell 120, 163–166.

  2. Calvo, O., and Manley, J. L. (2003). Strange bedfellows: polyadenylation factors at the promoter. Genes Dev. 17, 1321–1327.

  3. Hartzog, G. A., and Quan, T. K. (2008). Just the FACTs: Histone H2B ubiquitylation and nucleosome dynamics. Mol. Cell 31, 2–4.

  4. Lehmann, A. R. (2001). The xeroderma pigmentosum group D (XPD) gene: one gene, two functions, three diseases. Genes Dev. 15, 15–23.

  5. Liu, X., Bushnell, D. A., Silva, D. A., Huang, X., and Kornberg, R. D. (2011). Initiation complex structure and promoter proofreading. Science 333, 633–637.

  6. Nair, G., and Raj, A. (2011). Time-lapse transcription. Science 332, 431–432.

  7. Price, D. H. (2000). P-TEFb, a cyclin dependent kinase controlling elongation by RNA polymerase II. Mol. Cell Biol. 20, 2629–2634.

  8. Selth, L. A., Sigurdsson, S., and Svejstrup, J. Q. (2010). Transcript elongation by RNA polymerase II. Annu. Rev. Biochem. 79, 271–293.

  9. Woychik, N. A., and Hampsey, M. (2002). The RNA polymerase II machinery: structure illuminates function. Cell 108, 453–463.

Research
  1. Bregman, A., Avraham-Kelbert, M., Barkai, O., Duek, L., Gutman, A., and Choder, M. (2011). Promoter elements regulate cytoplasmic mRNA decay. Cell 147, 1473–1483.

  2. Chen, F. X., Woodfin, A. R., Gardini, A., Rickels, R. A., Marshall, S. A., Smith, E. R., Shiekhattar, R., and Shilatifard, A. (2015). PAF1, a molecular regulator of promoter-proximal pausing by RNA polymerase II. Cell 162, 1003–1015.

  3. Cheung, A. C., and Cramer, P. (2011). Structural basis of RNA polymerase backtracking, arrest and reactivation. Nature 471, 249–253.

  4. Douziech, M., Coin, F., Chipoulet, J. M., Arai, Y., Ohkuma, Y., Egly, J. M., and Coulombe, B. (2000). Mechanism of promoter melting by the xeroderma pigmentosum complementation group B helicase of transcription factor IIH revealed by protein-DNA photo-cross-linking. Mol. Cell Biol. 20, 8168–8177.

  5. Fong, N., and Bentley, D. L. (2001). Capping, splicing, and 3′ processing are independently stimulated by RNA polymerase II: different functions for different segments of the CTD. Genes Dev. 15, 1783–1795.

  6. Goodrich, J. A., and Tjian, R. (1994). Transcription factors IIE and IIH and ATP hydrolysis direct promoter clearance by RNA polymerase II. Cell 77, 145–156.

  7. Hendrix, D. A., Hong, J. W., Zeitlinger, J., Rokhsar, D. S., and Levine, M. S. (2008). Promoter elements associated with RNA polymerase II stalling in the Drosophila embryo. Proc. Natl. Acad. Sci. USA 105, 7762–7767.

  8. Hirota, K., Miyoshi, T., Kugou, K., Hoffman, C. S., Shibata, T., and Ohta, K. (2008). Stepwise chromatin remodeling by a cascade of transcription initiation of non-coding RNAs. Nature 456, 130–135.

  9. Holstege, F. C., van der Vliet, P. C., and Timmers, H. T. (1996). Opening of an RNA polymerase II promoter occurs in two distinct steps and requires the basal transcription factors IIE and IIH. EMBO J. 15, 1666–1677.

  10. Kim, T. K., Ebright, R. H., and Reinberg, D. (2000). Mechanism of ATP-dependent promoter melting by transcription factor IIH. Science 288, 1418–1422.

  11. Lans, H., Marteijn, J. A., Schumacher, B., Hoeijmakers, J. H. J., Jansen, G., and Vermeulen, W. (2010). Involvement of global genome repair, transcription coupled repair, and chromatin remodeling in UV DNA damage response changes during development. PLoS Genet. 6(5), e1000941.

  12. Liu, W., Ma, Q., Wong, K., Li, W., Ohgi, K., Zhang, J., and Aggarwal, A. K. (2013). Brd4 and JMJD6-associated anti-pause enhancers in regulation of transcriptional pause release. Cell 155, 1581–1595.

  13. Luse, D. S., Spangler, L. C., and Ujvari, A. (2011). Efficient and rapid nucleosome traversal by RNA polymerase II depends on a combination of transcription elongation factors. J. Biol. Chem. 286, 6040–6048.

  14. Montanuy, I., Torremocha, R., Hernandez-Munain, C., and Suñé, C. (2008). Promoter influences transcription elongation: TATA-box element mediates the assembly of processive transcription complexes responsive to cyclin-dependent kinase 9. J. Biol. Chem. 283, 7368–7378.

  15. Plaschka, C., Lariviere, L., Wenzeck, L., Seizl, M., Hemann, M., Tegunov, D., Petrotchenko, E. V., Borchers, C. H., Baumeister, W., Herzog, F., Villa, E., and Cramer, P. (2015). Architecture of the RNA polymerase II-Mediator core initiation complex. Nature 518, 376–380.

  16. Rahl, P. B., Lin, C. Y., Seila, A. C., Flynn, R. A., McCuine, S., Burge, C. B., Sharp, P. A., and Young, R. A. (2010). c-Myc regulates transcriptional pause release. Cell 141, 432–445.

  17. Spangler, L., Wang, X., Conaway, J. W., Conaway, R. C., and Dvir, A. (2001). TFIIH action in transcription initiation and promoter escape requires distinct regions of downstream promoter DNA. Proc. Natl. Acad. Sci. USA 98, 5544–5549.

18.9 Enhancers Contain Bidirectional Elements That Assist Initiation

Reviews
  1. Bulger, M., and Groudine, M. (2011). Functional and mechanistic diversity of distal transcription enhancers. Cell 144, 327–339.

  2. Muller, M. M., Gerster, T., and Schaffner, W. (1988). Enhancer sequences and the regulation of gene transcription. Eur. J. Biochem. 176, 485–495.

Research
  1. Banerji, J., Rusconi, S., and Schaffner, W. (1981). Expression of a β-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299–308.

18.10 Enhancers Work by Increasing the Concentration of Activators Near the Promoter

Review
  1. Blackwood, E. M., and Kadonaga, J. T. (1998). Going the distance: a current view of enhancer action. Science 281, 60–63.

Research
  1. Mueller-Storm, H. P., Sogo, J. M., and Schaffner, W. (1989). An enhancer stimulates transcription in trans when attached to the promoter via a protein bridge. Cell 58, 767–777.

  2. Zenke, M., Grundström, T., Matthes, H., Wintzerith, M., Schatz, C., Wildeman, A., and Chambon, P. (1986). Multiple sequence motifs are involved in SV40 enhancer function. EMBO J. 5, 387–397.

18.11 Gene Expression Is Associated with Demethylation

Review
  1. Nabel, C. S., and Kohli, R. M. (2011). Demystifying DNA demethylation. Science 333, 1229–1230.

Research
  1. Zemach, A., McDaniel, I. E., Silva, P., and Zilberman, D. (2010). Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328, 916–919.

18.12 CpG Islands Are Regulatory Targets

Reviews
  1. Bird, A. (2002). DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21.

  2. Lee, T. F., Zhai, J., and Meyers, B. C. (2010). Conservation and divergence in eukaryotic DNA methylation. Proc. Natl. Acad. Sci. USA 107, 9027–9028.

Research
  1. Antequera, F., and Bird, A. (1993). Number of CpG islands and genes in human and mouse. Proc. Natl. Acad. Sci. USA 90, 11995–11999.

  2. Boyes, J., and Bird, A. (1991). DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. Cell 64, 1123–1134.

  3. Lister, R., Pelizzola, M., Dowen, R. H., Hawkins, R. D., Hon, G., Tonti-Filippini, J., Nery, J. R., Lee, L., Ye, Z., Ngo, Q. M., Edsall, L., Antosiewicz-Bourget, J., Stewart, R., Ruotti, V., Millar, A. H., Thomson, J. A., Ren, B., and Ecker, J. R. (2009). Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322.

  4. Zilberman, D., Coleman-Derr, D., Ballinger, T., and Henikoff, S. (2008). Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature 456, 125–130.