The basic principle of gene regulation is that expression is controlled by a regulator that interacts with a specific sequence or structure in DNA or mRNA at some stage prior to the synthesis of protein. The stage of expression that is controlled can be transcription, when the target for regulation is DNA, or translation, when the target for regulation is RNA. Control during transcription can be at initiation, elongation, or termination. The regulator can be a protein or an RNA. “Controlled” can mean that the regulator turns off (represses) or turns on (activates) the target. Expression of many genes can be coordinately controlled by a single regulator gene on the principle that each target contains a copy of the sequence or structure that the regulator recognizes. Regulators may themselves be regulated, most typically in response to small molecules whose supply responds to environmental conditions. Regulators may also be controlled by other regulators to form complex circuits or networks.
Many protein regulators work on the principle of allosteric changes. The protein has two binding sites—one for a nucleic acid target, the other for a small molecule. Binding of the small molecule to its site changes the conformation in such a way as to alter the affinity of the other site for the nucleic acid. The way in which this happens is known in detail for the lac repressor in Escherichia coli (see the chapter titled The Operon). Protein regulators are often multimeric, with a symmetrical organization that allows two subunits to contact a palindromic or repeated target on DNA. This can generate cooperative binding effects that create a more sensitive response to regulation.
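The sharpening effect of cooperative binding can be illustrated with the Hill equation, a standard phenomenological description of binding in which the Hill coefficient n reflects the degree of cooperativity. The Python sketch below is illustrative only: it is not a model of the lac repressor, and the half-saturation constant and Hill coefficients are arbitrary assumptions. It computes how large a change in concentration is needed to move a binding site from 10% to 90% occupancy.

```python
def fraction_bound(conc: float, k_half: float, n: float) -> float:
    """Hill equation: fractional occupancy of a binding site at a given
    regulator (or ligand) concentration; n is the Hill coefficient."""
    return conc**n / (k_half**n + conc**n)


def conc_for_occupancy(theta: float, k_half: float, n: float) -> float:
    """Inverse of the Hill equation: the concentration giving occupancy theta."""
    return k_half * (theta / (1.0 - theta)) ** (1.0 / n)


def response_range(k_half: float, n: float) -> float:
    """Fold change in concentration needed to go from 10% to 90% occupancy."""
    return conc_for_occupancy(0.9, k_half, n) / conc_for_occupancy(0.1, k_half, n)


if __name__ == "__main__":
    K = 1.0  # arbitrary half-saturation constant (illustrative units)
    for n in (1, 2, 4):
        print(f"n = {n}: {response_range(K, n):.1f}-fold change for 10% -> 90% bound")
```

For noncooperative binding (n = 1) an 81-fold change in concentration is needed; with n = 2 a 9-fold change suffices, and with n = 4 about a 3-fold change. This is the sense in which cooperative binding yields a more switch-like, sensitive response.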
Regulation via RNA uses changes in secondary structure (base pairing) as the guiding principle. The ability of an RNA to shift between different conformations with regulatory consequences is the nucleic acid’s alternative to the allosteric changes of protein conformation. The changes in structure may result from either intramolecular or intermolecular interactions.
It was once thought that RNA served merely as an intermediary or a structural component: mRNA carried the blueprint for the synthesis of a protein, rRNA was the structural component of the ribosome, and tRNA shuttled amino acids to the ribosome. It is now clear that there is a vast RNA world in which RNAs have numerous functions: mRNA can regulate its own translation (see the chapter titled The Operon), rRNA catalyzes peptide bond formation (see the Translation chapter), and tRNAs participate in the mechanism of fidelity of translation (see the Translation chapter).
The RNA world extends far beyond the three major RNA types (mRNA, rRNA, and tRNA) to include dozens of different RNAs. These RNAs can function, for example, as guide RNAs or as splicing cofactors. In addition, a large and very heterogeneous class of RNAs with known and suspected regulatory functions is described here and in the chapter titled Regulatory RNA. However, many of the mysteries of this new RNA world have yet to be resolved.
As seen in the chapter titled The Operon, an mRNA is more than simply an open reading frame (ORF). Regions in the bacterial 5′ untranslated region (UTR) contain elements that, because transcription and translation are coupled, can control transcription termination. The 5′ UTR sequence itself can determine whether an mRNA is a “good” message, which supports a high level of translation, or a “poor” message, which does not. Another type of element in a 5′ UTR that can control expression of the mRNA is a riboswitch. A riboswitch is an RNA domain containing a sequence that can change its secondary structure to control its activity. This change can be mediated by small metabolites. It is important to note that RNA structural change can occur at the level of secondary structure (how the RNA folds) or tertiary structure (how the RNA arms and loops associate with one another). These are independent structural features.
Dozens of different riboswitches have been identified, each responding to a different ligand. The RNA domain that binds the metabolite is called the aptamer. Ligand binding to the aptamer causes a structural change in the expression platform, the remainder of the riboswitch that carries out its regulatory function. One type of riboswitch is an RNA element that can assume alternate base-pairing configurations (controlled by metabolites in the environment) that affect translation of the mRNA. FIGURE 29.1 illustrates the regulation of the system that produces the metabolite GlcN6P (glucosamine-6-phosphate). The gene glmS codes for an enzyme that synthesizes GlcN6P from fructose-6-phosphate and glutamine. GlcN6P is a fundamental intermediate in cell wall biosynthesis in bacteria. The mRNA contains a long 5′ UTR upstream of the coding region. (Extra-long 5′ or 3′ UTRs are a clue that there may be regulatory elements in them.) Within the 5′ UTR is a ribozyme, a sequence of RNA that has catalytic activity (see the Catalytic RNA chapter). In this case, the catalytic activity is an endonuclease that cleaves its own RNA. It is activated by binding of the metabolite product, GlcN6P, to the aptamer region of the ribozyme. The consequence is that accumulation of GlcN6P activates the ribozyme, which cleaves the mRNA, which, in turn, prevents further translation. This is an exact parallel to allosteric control of a repressor protein by the end product of a metabolic pathway. There are numerous examples of such riboswitches in bacteria.
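The logic of this feedback loop can be made concrete with a toy model. The Python sketch below is an illustration under stated assumptions, not a quantitative model of glmS: it assumes that the fraction of glmS mRNA cleaved equals the fraction of ribozymes occupied by GlcN6P (simple hyperbolic binding), that GlcN6P synthesis is proportional to the fraction of mRNA left intact, and that GlcN6P is consumed at a first-order rate. The function names and parameters (v_max, k_d, decay) are hypothetical.

```python
def fraction_cleaved(glcn6p: float, k_d: float) -> float:
    """Assumed fraction of glmS mRNA that is self-cleaved: equal to the
    fraction of ribozymes with GlcN6P bound (simple hyperbolic binding)."""
    return glcn6p / (k_d + glcn6p)


def steady_state_glcn6p(v_max: float = 1.0, k_d: float = 1.0, decay: float = 1.0,
                        feedback: bool = True, dt: float = 0.01,
                        steps: int = 5000) -> float:
    """Euler integration of dG/dt = v_max * (intact mRNA fraction) - decay * G,
    where G is the GlcN6P level; returns G after `steps` time steps."""
    g = 0.0
    for _ in range(steps):
        # With feedback, ligand-triggered cleavage removes mRNA from translation;
        # without it, all mRNA stays intact and translatable.
        intact = 1.0 - fraction_cleaved(g, k_d) if feedback else 1.0
        g += dt * (v_max * intact - decay * g)
    return g


if __name__ == "__main__":
    print(f"with ribozyme feedback:    {steady_state_glcn6p():.3f}")
    print(f"without ribozyme feedback: {steady_state_glcn6p(feedback=False):.3f}")
```

With all parameters set to 1, the loop settles at a GlcN6P level of about 0.618 (the positive root of G^2 + G - 1 = 0), compared with 1.0 when cleavage is switched off. In this toy model, the ribozyme lowers the steady-state level of its own pathway's product, much as an end-product-controlled repressor would.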
Not all riboswitches encode a ribozyme that controls mRNA stability. Other riboswitches have alternate configurations of the RNA that allow or prevent expression of the mRNA by affecting ribosome binding. Riboswitches are found predominantly in bacteria and less commonly in eukaryotes.
An interesting eukaryotic riboswitch that controls alternative splicing has been described in the fungus Neurospora. The gene NMT1 (involved in vitamin B1 synthesis) produces an mRNA precursor with a single intron that has two splice donor sites (see the chapter titled RNA Splicing and Processing). Alternative use of these two sites produces a functional or nonfunctional message depending on the concentration of a vitamin B1 metabolite, thiamine pyrophosphate (TPP). Thus, product concentration controls product formation, a form of repressible control. The selection of the splice site is controlled by a riboswitch in the intron. At a low concentration of TPP, the riboswitch blocks the distal splice donor site, so the proximal splice donor site is chosen, as shown in FIGURE 29.2. This splice produces a functional mRNA. At a high concentration of TPP, TPP binds the riboswitch and alters its configuration so that it no longer blocks the distal splice donor site; use of this alternate site produces a nonfunctional mRNA.
Noncoding RNAs (ncRNAs) and their genes, such as rRNA and tRNA, have been known since the 1950s. Whole families of new ncRNAs and their genes have been identified since then. These include snRNAs involved in splicing, snoRNAs involved in processing large RNAs such as rRNAs (see the chapter titled RNA Splicing and Processing), and microRNAs (described in the chapter titled Regulatory RNA). These RNAs can generally be divided by size into large (rRNA-sized), medium (tRNA-sized), and small (microRNA-sized) classes. This section focuses on the large-size class, the long noncoding RNAs (lncRNAs).
Experiments using both whole-genome tiling arrays (which probe not just genes but whole genomes) and massive, whole-cell RNA-sequencing experiments have shown that the vast majority of the eukaryotic genome is transcribed. This includes gene regions, of course, but surprisingly it also includes both the coding and noncoding strands of genes, the regions between genes, telomeres, and centromeres. It is estimated that as many as 70% of human genes produce an antisense RNA. This pattern varies with the cell type and is presumably regulated. Transcription from both the coding (sense) and noncoding (antisense) strands can give rise to noncoding RNAs with regulatory functions. Another ncRNA class is long intergenic noncoding RNA (lincRNA), which, as the name implies, originates from intergenic regions previously assumed to contain no information. In addition to genes, antisense gene regions, and the regions between genes, promoters and enhancers are transcribed as well, giving rise to pRNAs (promoter RNAs, sometimes called PROMPTs) and eRNAs (enhancer RNAs).
A systematic, focused effort to examine the human genome in depth and understand its functional genomic content, called the Encyclopedia of DNA Elements (ENCODE) project, began in 2003. Subsequently, the model organism ENCODE (modENCODE) projects were begun, focusing on the Caenorhabditis elegans and Drosophila melanogaster genomes. The first phase of these projects examined about 1% of the human genome and the entire C. elegans and Drosophila genomes.
At the start of the modENCODE project, C. elegans was known to have about 1000 ncRNAs. Data now support a model showing more than 21,000 ncRNAs, called the 21k set. (Note that C. elegans has about 19,000 classical genes, but what is the definition of a gene?) A second set, comprising about 7000 ncRNAs (called the 7k set), has been culled from the first by fine-tuning the identification model. This in itself demonstrates the difficulty of distinguishing potentially genuine functional transcripts from accidental transcription events.
Base pairing offers a powerful means for one RNA to control the activity of another. Many cases have been identified in both prokaryotes and eukaryotes where a (usually rather short) single-stranded RNA base pairs with a complementary region of an mRNA, and as a result it prevents expression of the mRNA. One of the early illustrations of this effect was provided by an artificial situation in which antisense genes were introduced into eukaryotic cells.
Antisense genes are constructed by reversing the orientation of a gene with regard to its promoter, so that the “antisense” strand is transcribed into an antisense noncoding RNA, as illustrated in FIGURE 29.3. Synthesis of antisense RNA can inactivate a target RNA in either prokaryotic or eukaryotic cells. An antisense RNA is in effect an RNA regulator. Quantitation of the effect is not entirely reliable, but it seems that an excess (perhaps a considerable excess) of the antisense RNA may be necessary.
At what stage does the antisense RNA inhibit expression? It could in principle prevent transcription of the authentic gene, processing of its RNA product, or translation of the messenger. Results with different systems show that the inhibition depends on formation of RNA–RNA duplex molecules, but this can occur either in the nucleus or in the cytoplasm. In the case of an antisense gene stably carried by a cultured cell, sense–antisense RNA duplexes form in the nucleus, preventing normal processing and/or transport of the sense RNA. In another case, injection of antisense RNA into the cytoplasm inhibits translation by forming duplex RNA in the 5′ region of the mRNA.
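The complementarity that underlies antisense regulation can be sketched in a few lines of Python. Because an antisense gene is transcribed from the opposite strand, its RNA is the reverse complement of the sense RNA, and the two can form an antiparallel duplex. The helper names below (antisense_of, can_pair, target_site) and the example sequences are hypothetical, and the sketch considers only perfect Watson-Crick pairing, ignoring G·U wobble pairs, mismatches, and RNA secondary structure, all of which matter in real cells.

```python
# Watson-Crick complements for RNA bases
COMPLEMENT = str.maketrans("ACGU", "UGCA")
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def antisense_of(sense_rna: str) -> str:
    """RNA transcribed from the opposite DNA strand: the reverse complement
    of the sense RNA, written 5'->3'."""
    return sense_rna.upper().translate(COMPLEMENT)[::-1]


def can_pair(rna_a: str, rna_b: str) -> bool:
    """True if two RNAs (both written 5'->3') form a perfect antiparallel
    Watson-Crick duplex along their full length."""
    if len(rna_a) != len(rna_b):
        return False
    # Antiparallel: the 5' end of one strand pairs with the 3' end of the other.
    return all((a, b) in WATSON_CRICK
               for a, b in zip(rna_a.upper(), reversed(rna_b.upper())))


def target_site(mrna: str, antisense: str) -> int:
    """0-based position in the mRNA of the region the antisense RNA can pair
    with, or -1 if no perfectly complementary region exists."""
    # The reverse complement is its own inverse, so this recovers the sense sequence.
    return mrna.upper().find(antisense_of(antisense))


if __name__ == "__main__":
    sense = "AUGGCUAGC"
    anti = antisense_of(sense)
    print(anti, can_pair(sense, anti), target_site("GGGAUGGCUAGCAAA", anti))
```

For example, antisense_of("AUGGCUAGC") returns "GCUAGCCAU", and target_site locates its pairing partner at position 3 of the made-up message "GGGAUGGCUAGCAAA", a region that includes the AUG start codon. This is analogous to the case above in which duplex formation in the 5′ region of an mRNA inhibits translation.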
This technique offers a powerful approach for turning off genes at will; for example, the function of a regulatory gene can be investigated by introducing an antisense version. An extension of this technique is to place the antisense gene under the control of a promoter that is itself subject to regulation. The target gene can then be turned off and on by regulating the production of antisense RNA. This technique allows investigation of the importance of the timing of expression of the target gene.
Antisense RNA in eukaryotes has been known for some time. The first genome-sequencing projects demonstrated that nested genes (genes located within the introns of other genes) are widespread; they are more common than was first thought, comprising as much as 5% to 10% of genes. If the nested gene is transcribed from the opposite strand, then antisense RNA is produced. This overlapping, opposite-strand arrangement will also lead to transcriptional interference (TI), because the two genes cannot both be transcribed at the same time.
Transcriptional interference has emerged as a significant mechanism of transcriptional regulation, and it can occur either when an interfering RNA is produced in the antisense orientation, as described earlier, or when it is produced in the sense orientation. For example, the yeast SER3 gene (involved in serine biosynthesis) is normally repressed in the presence of serine and induced in its absence. It turns out that under serine-rich, repressive conditions, a noncoding RNA is expressed from the intergenic region upstream of the SER3 promoter and is transcribed from the same strand as SER3, across its promoter. This RNA (named for its gene, the SER3 regulatory gene, or SRG1) does not encode a protein, but its high expression ultimately disrupts transcription initiation at the SER3 promoter. SRG1 is induced by serine; its transcription by RNA pol II together with the elongation factor Paf1 results in the recruitment of histone modification factors and the chromatin remodeling complex SWI/SNF, which in turn leads to the deposition of a nucleosome on the SER3 promoter, preventing transcription. The end product of the biosynthetic pathway, serine, thus regulates SER3 by causing transcriptional interference at the SER3 promoter by a sense transcript. It is important to note that in transcriptional interference it can be transcription per se, rather than the RNA product, that is responsible for the regulatory effect.
A direct role for antisense RNA in transcription control has been demonstrated in the yeast Saccharomyces cerevisiae. The gene PHO84 is regulated in part by a class of noncoding RNAs called cryptic unstable transcripts, or CUTs. As shown in FIGURE 29.4, in addition to the promoter at the 5′ end of the gene, there is another, unregulated promoter on the opposite strand. This promoter requires the Set1 histone methyltransferase for activity and produces an antisense RNA. Under normal conditions, this RNA is rapidly degraded as it is produced, by the TRAMP (Trf4/Air2/Mtr4 polyadenylation) complex and the exosome RNase complex (see the mRNA Stability and Localization chapter). In the absence of degradation, or in aging cells, the antisense RNA persists. This antisense RNA, or CUT, works in cis to recruit histone deacetylase enzymes that remove acetyl groups from histones, thereby causing the chromatin over the gene region to be remodeled and condensed so that the gene can no longer be transcribed (see the Eukaryotic Transcription Regulation chapter). This is gene-specific remodeling directed by the antisense RNA and does not extend to the neighboring genes. The effect can also be brought about in trans by a second, exogenous copy of PHO84 carried on a plasmid; this is called transcriptional gene silencing, or TGS, a phenomenon often seen in plants.
Since this discovery, similar examples of ncRNAs that result in alteration of local chromatin structure have been described, such as a long RNA transcribed from the GAL1-10 locus (see the Eukaryotic Transcription Regulation chapter) that also results in histone deacetylation (as well as methylation) to promote GAL gene repression through chromatin remodeling. ncRNAs also prevent Ty retrotransposition through changes in chromatin structure in trans; this is reminiscent of the role of piRNAs in Drosophila (discussed in the chapter titled Regulatory RNA).
This phenomenon may be quite widespread. In human HeLa cells, when a component of the RNA degradation machinery is disabled, vast amounts of upstream transcripts (pRNAs, or PROMPTs) are observed from all three major classes of active promoters. These RNAs are capped at their 5′ end and polyadenylated at their 3′ end. Like CUTs in yeast, they are very unstable. This transcription can occur in both directions and may reflect the availability of open chromatin at promoters.
In addition to promoter-derived ncRNAs (PROMPTs), enhancers are also transcribed and give rise to eRNAs. It has been proposed that these eRNAs (through base pairing with PROMPTs) can establish the enhancer–promoter interactions necessary for initiating transcription.
Although some of these long ncRNAs, such as PROMPTs and CUTs, are clearly derived from the promoters or gene bodies of classical genes, others are derived from intergenic regions and are not associated with classical genes. One of the best examples, known for some time, is Xist (described in the chapter titled Epigenetics II). Ten different proteins bind to Xist RNA to exclude RNA Pol II and silence transcription. Xist is also responsible for recruiting the Polycomb repressive complex. (Interestingly, Xist itself is regulated by its antisense partner transcript, Tsix.) Whereas Xist acts only in cis, on the X chromosome from which it is transcribed, other lincRNAs can act in trans, on multiple chromosomes. In response to DNA damage, p53 acting as a transcription factor activates multiple lincRNAs. One of these, lincRNA-p21 (see the chapter titled Replication Is Connected to the Cell Cycle), is itself targeted to multiple sites and acts as a transcription repressor.
Another well-characterized lincRNA is human HOTAIR (HOX transcript antisense RNA); its name reportedly also reflects the belief of many, at the time of its discovery, that this field of research was so much hot air. It is transcribed from the developmental HOXC homeotic gene cluster but targets multiple genes on other chromosomes. At its target loci, it acts as a scaffold to assemble Polycomb repressive complex 2 (PRC2; see the chapter titled Epigenetics I), reprogramming chromatin structure and silencing genes that should be turned off. HOTAIR expression has also been found to be deregulated in several cancers, where it is associated with a poor prognosis.
In general, ncRNAs can be classified by where they act: in cis, as with CUTs and PROMPTs, or in trans, as with HOTAIR. They can also be classified by mechanism. ncRNAs can work as antisense RNAs, either by directly binding to their counterparts or by transcriptional interference. ncRNAs can function by binding a protein and targeting it to a specific gene or region. Many ncRNAs work as scaffolds for chromatin modifiers and remodelers, either in cis or in trans. Alternatively, an ncRNA can bind a protein and act as an allosteric modifier.
It is becoming clear that lncRNAs play an important role beyond gene regulation: they are also critical to the overall structure of the nucleus itself, as shown in FIGURE 29.5. Chromosomes are not simply thrown into the nucleus randomly; rather, they occupy specific nuclear territories and are organized into domains called topologically associated domains (TADs; also discussed in the chapter titled Chromatin). Homologous chromosomes must also be able to find each other at certain times in the meiotic cell cycle. This organization has been referred to as the chroperon.
Gene expression can be regulated positively by factors that activate a gene or negatively by factors that repress a gene. Translation may be controlled by regulators that interact with mRNA. The regulatory products may be proteins, which often are controlled by allosteric interactions in response to the environment, or RNAs, which function by base pairing with the target nucleic acids to change the target’s secondary structure or interfere with its function. Small metabolites can also bind to RNA aptamer domains and effect an alteration in secondary structure, as seen in riboswitches. Regulatory networks can be created by linking regulators so that the production or activity of one regulator is controlled by another.
ncRNAs such as antisense RNA are used in both bacterial and eukaryotic cells as a powerful system to regulate gene expression. This regulation can be direct, at the level of interference with an RNA polymerase, or indirect, by affecting the chromatin configuration of the gene and, more broadly, the organization of chromosomes within the nucleus and of the nucleus itself. Antisense transcripts can also function in the cytoplasm by giving rise to a host of small regulatory RNAs.