In V(D)J recombination, the DNA encoding a complete antibody V region is assembled from V, D, and J (heavy chain) or from V and J (light chain) segments that are initially separated by many kilobases of DNA. Each developing B cell generates a novel pair of heavy- and light-chain variable region coding sequences by recombination of its genomic DNA. This recombinational event is catalyzed by a set of enzymes (Table 6-2), many of which are also involved in nonhomologous end-joining (NHEJ) DNA repair functions that occur in all cells. Because V(D)J recombination entails cutting both strands of the DNA, and because inappropriate recombination is potentially catastrophic for the cell, mechanisms have evolved that restrict antigen receptor gene recombination events to the appropriate sites on the Ig genes and ensure that they occur only during defined periods of B- and T-cell development.
Protein | Function in V(D)J recombination | Immunological consequences of protein deficiency |
---|---|---|
RAG1/2 | Antigen receptor gene recombinase complex. DNA cleavage is mediated by RAG1. Epigenetic targeting is directed by RAG2. | Severe combined immunodeficiency (SCID) |
Terminal deoxynucleotidyl transferase (TdT) | Adds nontemplated (N) nucleotides to V-D and D-J joints of Ig heavy chain and all joints of TCR chains in a template-independent manner. | Reduced N-nucleotide addition is seen at coding joints |
High mobility group B proteins 1 and 2 (HMGB1/2) | Stabilize binding of RAG1/2 to recombination signal sequences (RSSs). Stabilize the bend introduced by RAG1/2 into the DNA of the 23-bp-spacer RSS. | No information available |
Ku70/80 | Complex is recruited to DNA double-strand (DS) breaks. Stabilizes and aligns DNA ends prior to repair. Essential for both signal and coding joint repair in Ig and TCR genes. Recruits DNA-PKcs protein. | SCID occurs in the absence of either or both Ku proteins. Knockout mice are also small in size and sterile |
DNA-PKcs | A protein kinase that forms a complex with Ku70/80. It phosphorylates and activates Artemis. It recruits the ligation machinery. | SCID occurs. Knockout mice otherwise develop normally |
Artemis | Once Artemis has been phosphorylated by DNA-PKcs, it opens the hairpin at the coding end. | Artemis-deficient B and T cells have blocked formation of coding joints and accumulation of hairpin-sealed coding ends. Mice lacking Artemis have severely impaired B- and T-cell development |
DNA ligation complex: DNA ligase IV, XRCC4, and XLF (Cernunnos) | XRCC4 maintains stability of DNA ligase IV and stimulates its catalytic activity. XRCC4 may also help to align DNA ends. DNA ligase IV is required for ligation of cut DNA ends, at both the coding and the signal joints. | Lack of DNA ligase IV or of XRCC4 causes a complete block in lymphoid development (SCID). Mice lacking XLF show enhanced sensitivity to radiation-induced DNA damage, but do not develop significant immunodeficiency |
Unusual DNA polymerases, such as DNA Pol μ and DNA Pol λ | These polymerases add nucleotides at Ig heavy chain and TCR antigen receptor loci. In the absence of TdT activity, some N-nucleotide addition is observed, which is thought to be the result mainly of DNA Pol μ action. Whereas DNA Pol λ requires a template strand, DNA Pol μ, like TdT, can act in a template-independent manner. Pol μ participates primarily in heavy-chain and Pol λ in light-chain rearrangements. | Defects in the development of hematopoietic cells |
Ataxia telangiectasia mutated (ATM) protein | A kinase that binds double-strand breaks in DNA and blocks entry into the cell cycle until the breaks can be repaired. The ATM protein is recruited to the double-strand breaks by the MRN (Mre11/Rad50/Nbs1) complex. | Lymphopenia and predisposition to thymic lymphomas characterized by translocations involving TCR genes |
This degree of accuracy is accomplished in part by the fact that the recombination enzymes recognize specific DNA sequence motifs called recombination signal sequences (RSSs). These sequences also ensure that one of each type of segment (V and J for the light chain, or V, D, and J for the heavy chain) is included in the recombined heavy- and light-chain genes. During cleavage and ligation of the segments, the DNA is edited in various ways, adding further variability to the recombined gene. As you will see later in this chapter, similar, although not identical, mechanisms operate to generate complete T-cell receptor genes in developing thymocytes. We will also discuss the roles of epigenetic modifications and chromatin structure in restricting recombination to the relevant regions of antigen receptor genes.
Different immunoglobulin variable region gene segments are recombined at specific stages in lymphoid development (see Chapter 9, Figure 9-4). As B cells develop in the bone marrow, the first step in the creation of a mature immunoglobulin receptor is recombination that brings together a D and a JH gene segment. This step occurs very early in development, between the pre-pro-B-cell and early pro-B-cell stages, while the cell is still in the bone marrow and just beginning its differentiation into a mature B cell (see Chapter 9). Recombination between the VH and D-JH segments follows during the pro-B-cell stage.
If V(D)J recombination is successful and a heavy-chain variable region is generated, the resulting heavy-chain protein is placed onto the cell surface in combination with a nonvariable pair of proteins, VpreB and λ5 (called a surrogate light chain), to form a pre-B-cell receptor (Figure 6-7a). Signaling from the pre-B-cell receptor halts heavy-chain recombination, initiates several rounds of proliferation, and then calls for the beginning of light-chain recombination. Light-chain recombination occurs at the small pre-B-cell stage of B-cell development. Light-chain recombination in the mouse initiates at the κ locus, and if this is not successful, continues at the λ locus. In each case, recombination occurs on one allele at a time. In humans, light-chain recombination may start at either the κ or the λ locus. Expression of an intact membrane IgM B-cell receptor (Figure 6-7b) shuts off further light-chain gene rearrangement.
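The order in which mouse light-chain loci are tried can be sketched as a simple loop. This is only an illustrative model of the sequence described above (κ first, then λ, one allele at a time); the function name `rearrange_light_chain` and the allele labels are hypothetical, and the caller decides in advance which attempts count as successful.

```python
# Order of attempts in the mouse, per the text: kappa before lambda,
# one allele at a time. The labels themselves are illustrative.
MOUSE_LIGHT_CHAIN_ORDER = [
    "kappa allele 1",
    "kappa allele 2",
    "lambda allele 1",
    "lambda allele 2",
]

def rearrange_light_chain(successful_attempts):
    """Try each locus/allele in order and stop at the first success.

    successful_attempts: set of labels whose rearrangement would succeed.
    Returns (attempts_made, winning_label_or_None).
    """
    attempts = []
    for label in MOUSE_LIGHT_CHAIN_ORDER:
        attempts.append(label)
        if label in successful_attempts:
            # An intact B-cell receptor shuts off further rearrangement.
            return attempts, label
    return attempts, None

print(rearrange_light_chain({"kappa allele 2", "lambda allele 1"}))
```

In this toy run the second κ allele succeeds, so the λ locus is never attempted. The model does not cover humans, where the text notes that rearrangement may begin at either locus.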
In the late 1970s, investigators sequencing light-chain genes first described two blocks of highly conserved sequence, a nonamer (a set of 9 bp) and a heptamer (a set of 7 bp), that occur in the noncoding regions upstream of each J segment. The heptamer appeared to end exactly at the J region coding sequence. Further sequencing showed that the same motif was repeated in an inverted manner on the downstream side of the V region coding sequences, again with the heptamer sequence ending flush with the V region gene segment (Figure 6-8a).
Between the nonamer and heptamer sequences, the researchers described a spacer sequence of either 12 or 23 bp in length. Although the nucleotide sequence of the RSS spacer is not well conserved, the significance of the spacer lengths was immediately clear; 12 nucleotides is about the length of one turn of the double helix, and 23 nucleotides is about two turns. In this way, the spacer sequence ensures that the ends of the nonamer and heptamer closest to the spacers would be on the same side of the double helix and therefore accessible to binding by the same enzyme. The investigators correctly concluded that they had discovered the DNA signal sequence that directs recombination between the V and J gene segments. They therefore termed this “heptamer-spacer-nonamer” motif the recombination signal sequence or RSS.
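To make the heptamer-spacer-nonamer layout concrete, the sketch below scans a DNA string for an RSS-like motif. It rests on several assumptions: the heptamer CACAGTG and nonamer ACAAAAACC are the commonly cited consensus sequences rather than values given in this section, real RSSs often deviate from consensus, only one strand and one orientation (heptamer first) is searched, and the name `find_rss` is hypothetical.

```python
import re

# Commonly cited consensus sequences (an assumption; real RSSs vary).
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"

# Heptamer, then a spacer of exactly 12 or 23 bases of any sequence,
# then the nonamer. Spacer sequence is not conserved, only its length.
RSS_PATTERN = re.compile(
    rf"{HEPTAMER}(?P<spacer>[ACGT]{{12}}|[ACGT]{{23}}){NONAMER}"
)

def find_rss(dna):
    """Return (start_index, spacer_length) for each consensus RSS found."""
    return [
        (match.start(), len(match.group("spacer")))
        for match in RSS_PATTERN.finditer(dna.upper())
    ]

# Two made-up flanking sequences, one with each spacer length.
flank_12 = "GGATCC" + HEPTAMER + "T" * 12 + NONAMER + "AA"
flank_23 = "TT" + HEPTAMER + "G" * 23 + NONAMER

print(find_rss(flank_12))
print(find_rss(flank_23))
```

A spacer of any other length is rejected, which mirrors the point that spacer length, not spacer sequence, is what matters.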
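Nucleotide sequencing of the RSS demonstrated that it consists of three elements: a conserved heptamer (7 bp) that lies directly adjacent to the coding sequence, a spacer of either 12 or 23 bp whose sequence is not well conserved, and a conserved nonamer (9 bp).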
In the heavy-chain gene segments, a similar pattern was noted. The spacer regions separating the heptamer and nonamer pairs were 23 bp in length following the V segments and preceding the J segments, and 12 bp in length before and after the D segments. The relative locations of the 12- and 23-bp spacers (Figure 6-8b) suggested that the VDJ recombinase enzyme is designed to pair one RSS bearing a 12-bp spacer with a second RSS that includes a 23-bp spacer, something we now know to be the case. This is referred to as the 12/23 rule.
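The 12/23 rule for the heavy chain can be expressed as a small lookup. This is a minimal sketch using only the spacer lengths stated above; the dictionary layout and the name `can_join` are illustrative choices, not part of any standard tool.

```python
# Spacer length of the RSS on each side of a heavy-chain segment type,
# as described in the text: 23 bp after V, 12 bp on both sides of D,
# and 23 bp before J.
HEAVY_CHAIN_SPACERS = {
    "V": {"downstream": 23},
    "D": {"upstream": 12, "downstream": 12},
    "J": {"upstream": 23},
}

def can_join(upstream_segment, downstream_segment, spacers=HEAVY_CHAIN_SPACERS):
    """True if the two facing RSSs pair a 12-bp spacer with a 23-bp spacer."""
    left = spacers[upstream_segment].get("downstream")
    right = spacers[downstream_segment].get("upstream")
    return {left, right} == {12, 23}

for pair in [("V", "D"), ("D", "J"), ("V", "J")]:
    print(pair, can_join(*pair))
```

Under these spacer assignments V-to-D and D-to-J joins satisfy the rule, whereas a direct V-to-J join would pair two 23-bp spacers and is disallowed.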
Figures 6-9a and b illustrate the manner in which the RSSs act to bring together the appropriate gene segments during the generation of complete light-chain and heavy-chain variable region genes.
In the early 1990s two proteins, encoded by RAG1 (recombination activating gene 1) and RAG2 (recombination activating gene 2), were shown to be required for recombining antibody variable region gene segments. The RAG1 and RAG2 genes are just 8 kb apart and are transcribed in opposite directions. RAG gene expression occurs only in cells of the immune system, is developmentally regulated in both T and B cells, and coincides with those periods during lymphoid development when receptor genes are being assembled (see Chapters 8 and 9). The RAG1/2 protein complex is required for RSS recognition and targeted cleavage of the DNA at the junction between the RSS and the respective variable region–coding segments.
The functional RAG1/2 complex occurs as a tetramer, with each protein being represented twice in the active protein complex. Recent x-ray crystallographic analysis has shed light on the relative locations of the two RAG1 and two RAG2 monomers within the tetrameric complex (Figure 6-10a and b).
Figure 6-10c shows the location of functionally important sequences in the primary structures of RAG1 and RAG2. The active site of the recombinase complex is located in the RAG1 subunit and contains two aspartic acid residues and one glutamic acid residue (D600, D708, and E962, respectively). This active site “DDE motif” is found in many enzymes that cleave DNA, such as endonucleases, transposases, and recombinases. Figure 6-10c also identifies the domains on RAG1 that bind to both the nonameric and heptameric regions of the RSS and shows that the heptamer-binding region overlaps with the part of RAG1 that interacts with RAG2. ZnA and ZnB on RAG1 are regions of the protein that form zinc fingers, elongated protein domains stabilized by the coordination of a Zn2+ ion.
The ZnA region of the protein assists in the initial RAG1 binding to active chromatin via its ability to interact with histone H3. However, once the RAG1/2 complex is in place, this binding then inhibits cleavage of DNA by RAG1/2. The ubiquitin ligase activity located in the same region of the protein is thought to ubiquitinylate the H3 protein, allowing release of RAG1 to mediate its DNA cleavage function. More recently, this same ubiquitin ligase activity has been shown to auto-activate the RAG1 recombinase activity.
The core region necessary for RAG2 activity lies at the amino-terminal end of the RAG2 molecule, while the plant homeodomain (PHD) region helps to guide the complex to active DNA bearing the H3K4me3 histone mark. Threonine residue T490 on RAG2 is phosphorylated during the S, G2, and M phases of the cell cycle, and this phosphorylation triggers the destruction of RAG2. This ensures that the complex does not cut DNA while the cell is undergoing division.
Biochemical experiments have demonstrated that the essential activities of the RAG1/2 complex can be accomplished by a so-called “core complex,” which consists of residues 384–1008 of RAG1 and residues 1–383 of RAG2.
Only three of the proteins implicated in V(D)J recombination are unique to lymphocytes: RAG1, RAG2, and terminal deoxynucleotidyl transferase (TdT). Like RAG1/2, TdT is expressed only in developing lymphocytes. It adds nontemplated (“N”) nucleotides to the free 3′ termini of coding ends of heavy-chain V, D, and J segments following their cleavage by the RAG1/2 recombinase. (These nucleotides are designated as “nontemplated” because they are not present in the germline, but rather are added to the DNA of a somatic cell.) TdT activity therefore contributes to the generation of additional receptor gene diversity in the CDR3 region of the antibody heavy chain.
Other proteins participating in the recombination process are not lymphoid specific. The high mobility group B proteins 1 and 2 (HMGB1 and HMGB2) act interchangeably to enhance RAG1/2 binding to the RSS and may also facilitate DNA bending at the recombination site. Whereas binding of the RSSs by RAG1/2 requires only RSS and HMGB proteins, other cellular factors, most of which are part of the nonhomologous end-joining (NHEJ) pathway of DNA repair, are necessary to accomplish V(D)J recombination. The involvement of particular proteins at various steps in this process was deduced from observations of V(D)J recombination in natural and artificially generated systems lacking one or more of the proteins. The proteins known to participate in V(D)J joining are described in Table 6-2. Clinical Focus Box 6-2 further describes some of the immunodeficiencies suffered by individuals with mutated or insufficient activities of the enzymes involved in V(D)J recombination. Additional descriptions of these immunodeficiency syndromes can be found in Chapter 18.
The process of V(D)J recombination occurs in several well-defined stages (Overview Figure 6-11). The end product of each successful rearrangement is an intact Ig gene, in which V and J (light chain) segments or V, D, and J (heavy chain) segments are made contiguous (flush) with one another, to create a complete heavy- or light-chain gene. The new joints in the antibody V region gene, created by this recombination process, are referred to as coding joints. During the process of V(D)J recombination of the heavy-chain variable region, or of V-J recombination of the λ-chain variable region, the intervening DNA is deleted and lost as an excision circle, or episome (Figure 6-11a). In the case of the κ light-chain gene, about 50% of the Vκ gene segments in the germ line are found in the opposite transcriptional orientation to the Jκ gene segments. In these cases, the intervening DNA is inverted and the excised sequences are retained on the chromosome upstream of the recombined gene (see Figure 6-11b). Regardless of whether the joints between the two RSS heptamers are lost as excision circles or retained in upstream DNA, they are referred to as signal joints.
Overview Figure 6-11: Recombination of Immunoglobulin Variable Region Genes
(a) Recombination between V and J segments arranged in the same transcriptional orientation. The V cluster (V1, V2, V3) and the J cluster (J1, J2, J3) are shown, each segment flanked by its RSS. RAG1/2 proteins bind at the RSSs, and synapsis occurs between the V and J RSSs, looping out the intervening DNA. RAG1/2 cleaves the DNA, and hairpins form at the V and J coding ends. The hairpins are opened and the coding ends are ligated; in the example shown, V1 is joined to J2 at the coding joint. Ligation of the signal ends joins the two RSSs in a signal joint, and the intervening DNA is lost as an excision circle (episome).
(b) Recombination between V and J segments arranged in opposite transcriptional orientations. RAG1/2 binds, and synapsis occurs between the RSS of V2 and the RSS of a J segment. Cleavage by RAG1/2 is followed by hairpin formation at the coding ends. Resolution of the hairpins and ligation at the coding joint leave a signal joint, composed of the two joined RSSs, on the chromosome.
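The difference between deletional and inversional joining can be sketched by treating the locus as an ordered list of elements, each with a strand. This is a toy model under stated assumptions: element names are hypothetical labels, the J segment is taken to lie downstream on the "+" strand, and the function name `recombine` is illustrative only.

```python
def flip(element):
    """Reverse the strand of a (name, strand) element."""
    name, strand = element
    return (name, "-" if strand == "+" else "+")

def recombine(locus, v="V", j="J"):
    """Join v to j. Returns (chromosome_after, excision_circle)."""
    names = [name for name, _ in locus]
    iv, ij = names.index(v), names.index(j)
    if locus[iv][1] == "+":
        # Same orientation: everything between V and J (both RSSs and
        # the intervening DNA) leaves the chromosome as a circle.
        return locus[: iv + 1] + locus[ij:], locus[iv + 1 : ij]
    # Opposite orientation: the fragment from V up to the J RSS is
    # inverted in place, so nothing is lost from the chromosome.
    inverted = [flip(element) for element in reversed(locus[iv:ij])]
    return locus[:iv] + inverted + locus[ij:], []

same = [("V", "+"), ("V-RSS", "+"), ("intervening", "+"),
        ("J-RSS", "+"), ("J", "+")]
opposite = [("V-RSS", "-"), ("V", "-"), ("intervening", "+"),
            ("J-RSS", "+"), ("J", "+")]

print(recombine(same))
print(recombine(opposite))
```

In both outputs V ends up directly next to J (the coding joint). In the first case the two RSSs leave together on the circle; in the second they sit side by side upstream of the recombined gene, which corresponds to the retained signal joint.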
The first phase of the recombination process, DNA recognition and cleavage, is catalyzed by the RAG1/2 proteins acting in concert with an HMGB1/2 protein. The second phase, end processing and joining, requires, in addition to RAG1/2, a more complex set of enzymatic activities: Artemis, other NHEJ proteins, and TdT (for heavy-chain recombination only). The individual steps involved in the process of recombination between Vκ and Jκ segments are shown sequentially in Figure 6-12.
Figure 6-12: Steps in recombination between Vκ and Jκ gene segments.
Step 1: RAG1/2 and HMGB1/2 proteins bind to the RSSs and catalyze synapse formation between the V and J segments. The DNA folds over so that the V and J segments line up, each with the protein complex bound to its adjacent RSS.
Step 2: RAG1/2 makes a single-strand nick at the exact 5′ border of the heptameric RSSs bordering both the V and the J segments.
Step 3: The hydroxyl group liberated by the nick at the 3′ end of the coding strand attacks the corresponding phosphate group on the noncoding strand of both the V and the J segments, yielding a covalently sealed hairpin coding end and a blunt signal end.
Step 4: Signal-end joining ligates the ends of the two RSS heptamers that were originally in contact with the V and J coding sequences. In the resulting signal joint, the RSS with the 12-bp spacer and the RSS with the 23-bp spacer are joined heptamer to heptamer, with a nonamer at each outer side.
Step 5: Opening of the hairpin can result in a 5′ overhang, a 3′ overhang, or a blunt end, depending on whether the loop is cut on the side nearest the 5′ end, on the side nearest the 3′ end, or in the middle. The most common result generated by Artemis is a 3′ overhang.
Step 6: Cleavage of the hairpin generates sites for P-nucleotide addition. In the example shown, Artemis opens the V hairpin asymmetrically, leaving the sequence TCGA at the 3′ end of the coding strand next to the V region, and opens the J hairpin to leave ATAT at the 3′ end of the noncoding strand next to the J region. DNA repair enzymes fill in the complementary strands, so that the bottom strand next to the V region reads AGCT.
Step 7: Ligation of the light-chain V and J segments by DNA ligase IV and XRCC4. In the example, one strand of the V-J joint reads TCGATATA and the complementary strand reads AGCTATAT; the four nucleotides in the middle are labeled as P nucleotides.
Step 1 Recognition of the recombination signal sequence (RSS) by the RAG1/RAG2 enzyme complex. The RAG1/2 recombinase tetramer forms a complex with the RSS next to one of the two gene segments to be joined. Binding is usually, but not always, initiated at the RSS containing the 12-bp spacer. Binding of the RAG1/2 complex is enhanced by the HMGB1/2 proteins, which may also serve to induce and stabilize bending of the DNA, facilitating its cleavage. The second RSS is then bound by the RAG1/2 complex and the two gene segments to be joined are brought into close contact (synapsis). Current models based on recent crystallographic studies suggest that binding of one type of spacer induces a conformational change in the RAG1/2 DNA-binding site that specifically accommodates the opposite type of spacer, thus enforcing the 12/23 rule.
Step 2 One-strand cleavage at the junction of the coding and signal sequences. The RAG1 protein then creates single-strand nicks, 5′ of the heptameric signal sequence on the coding strand of each V segment (i.e., at the junction between the V segment and the heptamer) and at the heptamer–J region junction. (Figure 6-12 shows this process for the V segment only.)
Step 3 Formation of V and J region hairpins and blunt signal ends. The free 3′-hydroxyl group at the end of the coding strand of the V segment now attacks the phosphate group on the opposite, noncoding V strand, forming a new covalent phosphodiester bond across the double helix and yielding a DNA hairpin structure on the V segment side of the break. This is called the coding end. Simultaneously, a blunt DNA end is formed at the edge of the heptameric signal sequence as a result of making a clean cut through both strands of DNA, with no overhang. This is the signal end. The same process occurs simultaneously on the J side of the incipient joint. At this stage, the RAG1/2 proteins and HMGB1/2 proteins are still associated with the coding and signal ends of both the V and J segments in a postcleavage complex. The serine/threonine kinase protein, ataxia telangiectasia mutated (ATM), is thought to play an important role in stabilizing this complex and minimizing aberrant recombination events at this point in the process.
Step 4 Ligation of the signal ends. The NHEJ protein, DNA ligase IV, then ligates the free blunt ends to form the signal joint.
Step 5 Hairpin cleavage. Next, the hairpins at the ends of the V and J regions are opened by the endonuclease, Artemis, in one of three ways. The identical bond that was formed by the reaction described in step 3 may be reopened to create a blunt end at the coding joint. Alternatively, the hairpin may be opened asymmetrically either on the “top” or on the “bottom” strand, to yield a 5′ or a 3′ overhang, respectively. Artemis is a member of the NHEJ pathway and requires activation by the NHEJ kinase, DNA-PKcs, which binds to the DNA hairpin ends via its DNA-binding protein subunits Ku70/80. The most common overhang created by Artemis-mediated cleavage at immunoglobulin gene junctions is a 3′ overhang that leaves two unpaired residues. In addition to hairpin opening, the Artemis–DNA-PKcs complex also possesses both single- and double-stranded DNA endonuclease activity that is capable of removing several DNA bases or base pairs on each side of the nascent joint. This activity is rarely observed at the signal joint, but occurs often at the coding joint. The number of nucleotides that can be lost on each side of the joint ranges from 0 to 14.
Step 6 Overhang extension can lead to addition of palindromic nucleotides. In Ig light-chain rearrangements, nucleotide overhangs resulting from the steps described previously can act as substrates for NHEJ DNA repair enzymes, leading to double-stranded palindromic (P) nucleotides at the coding joint. For example, the top row of bases in the V region shown in Figure 6-12, step 6, reading in the 5′ to 3′ direction, reads TCGA. Reading backward on the bottom strand from the point of ligation also yields TCGA. The palindromic nature of the bases at this joint is a direct function of an asymmetric hairpin-opening reaction. P-nucleotide addition can also occur at both the V-D and D-J joints of the heavy-chain gene segments but, as described below, other processes can intervene to add further diversity at the VH-D and D-JH junctions.
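The origin of the palindrome can be checked with a few lines of code. The sketch below models only the case described above, in which the hairpin is nicked on the bottom strand so that the top strand gains a 3′ overhang that is then filled in; the function name `open_hairpin` and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def open_hairpin(coding_top, nick_offset):
    """Open a hairpin coding end by nicking the bottom strand.

    coding_top: top strand of the coding end, 5'->3', ending at the tip.
    nick_offset: number of bases from the tip at which the bottom strand
        is nicked (0 reopens the original bond and gives a blunt end).
    Returns (top strand after opening and fill-in, P nucleotides).
    """
    # Bases that used to be on the bottom strand swing over to extend the
    # top strand, so they are the reverse complement of its last bases.
    p_nucleotides = revcomp(coding_top[-nick_offset:]) if nick_offset > 0 else ""
    return coding_top + p_nucleotides, p_nucleotides

top, p = open_hairpin("GGTC", 2)
junction = top[-4:]
print(top, p, junction == revcomp(junction))
```

With a coding end finishing in TC and a nick two bases from the tip, the top strand ends in TCGA, the same palindrome as in the Figure 6-12 example.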
Step 7 Ligation of light-chain V and J segments. DNA ligase IV repairs the signal joints, as well as the coding joints. DNA ligase IV is usually found in complex with XRCC4, which helps to activate it. However, whereas at the signal joints, ligation almost always occurs without the addition or deletion of any nucleotides, the situation can be more complex at the coding joint. The enzymes of the NHEJ pathway include polymerases as well as Artemis and DNA ligase. As mentioned earlier, the endonuclease activity of Artemis will sometimes nibble at the coding ends after hairpin opening (see Table 6-2). In addition, the DNA polymerases associated with the NHEJ pathway, in particular DNA polymerase (Pol) λ and DNA Pol μ, are less faithful than the conventional DNA polymerase even when acting in a template-dependent manner. Even more dramatically, DNA Pol μ, like TdT, is capable of polymerizing DNA in a non–template-dependent manner and is therefore capable of adding random nucleotides at the coding joint.
Thus, NHEJ repair mechanisms can generate significant nucleotide diversity at the light-chain coding joint, even in the absence of TdT, which acts mainly at the heavy-chain joints.
Comparative sequence analysis of germ-line and mature B-cell Ig genes demonstrated that particularly extensive addition of nontemplated nucleotides could be identified in heavy-chain sequences. These additional nucleotide sequences occurred at both the VH-D and D-JH joints. In addition, careful comparative sequencing of germ-line versus somatic B-cell Ig heavy-chain sequences revealed that nucleotides were also often lost at these junctions. Two distinct types of enzyme-catalyzed activities are responsible for these findings in VH sequences.
Step 8 Exonuclease trimming. Exonuclease activity trims back the edges of the V region DNA joints. Since the RAG proteins themselves can trim DNA near a 3′ flap, it is possible that the RAG proteins may cut off some of the lost nucleotides. Alternatively, as described in step 5, the Artemis–DNA-PKcs complex could be the enzyme responsible for the V(D)J-associated endonuclease function. Extensive exonuclease trimming is more common at the two heavy-chain V gene joints (V-D and D-J) than at the light-chain V-J joint. In cases where trimming is extensive, it can lead to the loss of the entire D region as well as the elimination of any P nucleotides formed as a result of asymmetric hairpin cleavage.
Step 9 N-nucleotide addition. (Most probably occurs simultaneously with step 8.) Nontemplated (N) nucleotides are added by TdT to the coding joints of heavy-chain genes after hairpin cleavage. This enzyme can add up to 20 nucleotides to each side of the joint. The two ends are held together throughout this process by the RAG1/2 enzyme complex. TdT-mediated N-nucleotide addition at the coding joints of the heavy-chain genes is more commonly observed than at light-chain joints, because TdT is expressed at the earliest phases of V(D)J recombination when the heavy chain, but not the light-chain, genes are being rearranged. TdT activity is then usually turned off before light-chain rearrangements begin in mice, although residual TdT activity is found during light-chain rearrangement in humans. In addition, as described above, some nontemplated nucleotide addition at light-chain joints may be mediated by the NHEJ DNA Pol μ.
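The combined effect of trimming (step 8) and N-nucleotide addition (step 9) on a single joint can be illustrated with a small simulation. This is a rough sketch, not a biological model: the limits of 14 trimmed and 20 added nucleotides per side come from the text, but drawing lengths uniformly and choosing the four bases with equal probability are simplifying assumptions, and the sequences and the name `make_coding_joint` are hypothetical.

```python
import random

def make_coding_joint(left_end, right_end, rng, max_trim=14, max_n=20):
    """Return one simulated coding joint; N nucleotides are in lowercase."""
    # Nuclease trimming: lose 0..max_trim bases from each coding end.
    trim_left = rng.randint(0, min(max_trim, len(left_end)))
    trim_right = rng.randint(0, min(max_trim, len(right_end)))
    left = left_end[: len(left_end) - trim_left]
    right = right_end[trim_right:]
    # TdT-like addition: 0..max_n random bases on each side of the joint.
    n_left = "".join(rng.choice("acgt") for _ in range(rng.randint(0, max_n)))
    n_right = "".join(rng.choice("acgt") for _ in range(rng.randint(0, max_n)))
    return left + n_left + n_right + right

rng = random.Random(0)              # fixed seed so runs are repeatable
v_end = "TGTGCGAGAGATAGGC"          # made-up V coding end
d_end = "GGTATAACTGGAACGAC"         # made-up D coding end
for _ in range(3):
    print(make_coding_joint(v_end, d_end, rng))
```

Each call yields a joint of a different length and sequence from the same two germ-line ends, which is the point of junctional diversity.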
Step 10 Ligation and repair of the heavy-chain gene. This final step is identical to the ligation and repair for the light-chain genes and is mediated by DNA ligase IV acting in concert with XRCC4.
When considering the creation of an immunoglobulin variable region gene, we must always take into account the fact that nucleotide addition and/or exonuclease trimming at the V(D)J joints does not necessarily occur in sets of three nucleotides, and so can lead to out-of-phase joining. Recombined V region sequences in which nucleotide addition or trimming has disrupted the correct reading frame for translation cannot encode antibody molecules, and such rearrangements are said to be unproductive. If recombination at one heavy-chain locus is unproductive, rearrangement at the other allele is immediately initiated. Unproductive rearrangement at both alleles leads to apoptosis of the developing cell as it fails to receive necessary survival signals from the pre-BCR (see Chapter 9). Once light-chain rearrangements begin, sequential rearrangement of light-chain alleles occurs if prior rearrangements are unsuccessful.
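The reading-frame argument reduces to simple arithmetic, sketched below. The check is deliberately simplified and rests on assumptions: the V coding sequence is taken to start in frame, the joined sequence is required to be a whole number of codons with no stop codon, and the function name `is_productive` and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_productive(v_coding, junction, j_coding):
    """Simplified test: in-frame length and no in-frame stop codon.

    v_coding and j_coding are the coding ends after any trimming;
    junction holds the net P and N nucleotides between them.
    """
    joined = (v_coding + junction + j_coding).upper()
    if len(joined) % 3 != 0:
        return False  # J segment is read out of frame
    codons = [joined[i : i + 3] for i in range(0, len(joined), 3)]
    return not any(codon in STOP_CODONS for codon in codons)

v, j = "ATGGCT", "GGTACC"  # made-up coding ends
print(is_productive(v, "GCA", j))  # three added bases keep the frame
print(is_productive(v, "GC", j))   # two added bases shift the frame
print(is_productive(v, "TAA", j))  # in frame, but creates a stop codon
```

Only the first joint passes; the other two would count as unproductive rearrangements under this simplified test.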
The cleavage and rearrangement of DNA segments within somatic cells of a mammalian genome is an unusual occurrence and led scientists to question how such a mechanism might have evolved. Compelling evidence now supports the evolutionary origin of the genes encoding the RAG1/2 complex as a transposon unit that hopped into a primitive antigen receptor gene; this is discussed further in Evolution Box 6-3.
A Central Mechanism of the Adaptive Immune System Has a Surprising Evolutionary Origin
The hallmark feature of the adaptive immune system is the existence of highly diverse antigen receptors generated by V(D)J gene rearrangements effected by the RAG1/2 recombinase. Thus, understanding how the RAG1/2 recombinase evolved is key to understanding the formation of the adaptive immune system during vertebrate evolution. A series of seminal observations made in the early 1990s gave rise to the notion that the process of V(D)J recombination may have its beginnings in the insertion of a transposase gene into an ancestral antigen receptor gene. Transposition and V(D)J recombination share a conceptual framework in that they are both systems in which DNA segments are “book-ended” by short signal sequences and subsequently moved around the genome by mechanisms that cleave and rejoin DNA. As more experiments have been conducted, this idea has been progressively fleshed out, and it now seems extremely likely that the RAG system has evolved from an ancient transposon.
Structurally, the RSSs bear all the hallmarks of a recognition unit for transposase activity. All transposons bear cis-acting sequences (sequences on the same strand of DNA that will be attacked) that must be recognized by a transposase enzyme. These are usually short sequences (terminal inverted repeats, or TIRs) at either end of the transposon. The presence of the RSSs on either side of each of the variable gene segments in both T and B cells suggests their origin as recognition sequences for transposable elements. Sequence analysis of vertebrate RSSs demonstrated considerable sequence homology with the TIR sequences of the Transib family of transposons. This homology is most pronounced in the conserved heptameric sequence. Furthermore, the RAG1 enzyme itself has been shown to have considerable sequence homology to the Transib transposase enzyme, lending further support to the evolutionary origin of V(D)J recombination as a form of transposition. Notably, however, no homolog for RAG2 was found in the Transib transposase system.
Lending further support to the notion that V(D)J recombination has its origins in transposition, V(D)J recombination and transposon excision and repair have been shown to be very similar, mechanistically. During the movement of a transposon, the two transposon ends are held together in a nucleoprotein complex, perfectly analogous to the synaptic complex generated during V(D)J recombination. At the catalytic site of RAG1, V(D)J recombination uses three acidic residues: DDE (two aspartic acid residues and a glutamic acid residue). This same triad of acidic amino acids is characteristic of the DNA cleavage sites of the DDE family of transposases. The acidic triad coordinates metal ions (most probably magnesium ions in vivo) in both the DDE transposases and in RAG1. Investigators demonstrated that both the DDE transposases and RAG1 catalyze single-strand nicking followed by hairpin formation mediated by transesterification.
If RAG1 indeed evolved from a Transib transposase, the next obvious question was whether the Transib transposase could mediate Ig gene segment recombination. To investigate this, scientists set up a test system in which mouse 3T3 fibroblasts were first transfected with a RAG1/2 recombination substrate (Figure 1). Note that, in this substrate, the transcriptional direction of the Vκ gene segment is opposite to that of the Jκ gene segment; recall that this requires that the intervening sequence be inverted during the process of V-J recombination (Figure 6-11b). Inversion of this sequence in the test recombination substrate would then allow expression of the xanthine-guanine phosphoribosyltransferase (GPT) gene, which confers resistance to the antibiotic mycophenolic acid (MPA). Recombination events could therefore be screened for simply by looking for cells that survived in culture with MPA.
Before recombination, the strand has a Vκ region with an adjacent spacer, a GPT region, and Jκ regions 1 and 2, each with an adjacent spacer. The first J region is oriented opposite to the other regions on the strand. After recombination, the spacers from the V and first J regions are joined to form a signal joint, the GPT region is oriented in the opposite direction from what it was before, and the V and first J regions are joined and oriented in the same direction as the GPT region. The second J region remains unaltered.
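The inversional joining described for the test substrate can be sketched as a toy model in Python. This is a minimal sketch under stated assumptions, not the investigators' construct: the element names, their order, and the strand labels are hypothetical and are inferred only from the figure description, and each RSS is treated as a single unit.

```python
from typing import List, Tuple

# An element is (name, strand); strand is "+" or "-".
Element = Tuple[str, str]


def flip(element: Element) -> Element:
    """Return the same element on the opposite strand."""
    name, strand = element
    return (name, "-" if strand == "+" else "+")


def invert_between(substrate: List[Element], start: int, end: int) -> List[Element]:
    """Invert substrate[start:end]: reverse the order and flip every strand."""
    inverted = [flip(e) for e in reversed(substrate[start:end])]
    return substrate[:start] + inverted + substrate[end:]


# Hypothetical layout of the unrearranged substrate (an assumption).
substrate = [
    ("Vk", "+"),
    ("RSS_V", "+"),
    ("GPT", "-"),     # assumed to be in the non-expressed orientation
    ("Jk1", "-"),     # opposite orientation to Vk
    ("RSS_J1", "-"),
    ("RSS_J2", "+"),
    ("Jk2", "+"),
]

# Cleavage falls between Vk|RSS_V and between Jk1|RSS_J1,
# so the elements at indices 1..3 are inverted as a block.
recombined = invert_between(substrate, 1, 4)

for name, strand in recombined:
    print(name, strand)
```

In this model the coding joint (Vk next to Jk1), the retained signal joint (RSS_V next to RSS_J1), and the flipped GPT element all fall out of a single inversion, while Jk2 is untouched.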
Initial control experiments showed that transfection with genes encoding RAG1 and RAG2 gave rise to the expected high frequency of recombination events that conformed to the 12/23 rule. Transfection with RAG1 genes alone yielded fewer recombination events, but, surprisingly, it continued to allow recombination of Vκ and Jκ segments even when both segments were flanked by the same type of RSS. It therefore appears that one of the roles of the RAG2 protein in the vertebrate immune system is to enforce the 12/23 rule in Ig gene recombination.
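The logic of the 12/23 rule, and the observation that it was relaxed when RAG1 was supplied without RAG2, can be expressed as a small Python check. This is a qualitative sketch only: the function name and the `enforce_12_23` flag are hypothetical, and the model says nothing about recombination frequency.

```python
def recombination_permitted(spacer_a: int, spacer_b: int,
                            enforce_12_23: bool = True) -> bool:
    """Return True if two RSSs with these spacer lengths (bp) may pair.

    enforce_12_23=True models the rule as observed with RAG1 plus RAG2.
    enforce_12_23=False models the relaxed pairing reported for RAG1 alone.
    """
    for spacer in (spacer_a, spacer_b):
        if spacer not in (12, 23):
            raise ValueError("RSS spacers are expected to be 12 or 23 bp")
    if not enforce_12_23:
        return True
    # The rule: one 12-bp RSS must pair with one 23-bp RSS.
    return {spacer_a, spacer_b} == {12, 23}


print(recombination_permitted(12, 23))                       # True
print(recombination_permitted(12, 12))                       # False
print(recombination_permitted(12, 12, enforce_12_23=False))  # True
```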
The next set of experiments tested whether the Transib transposase from Helicoverpa zea, Hztransib, could bring about recombination guided by the 12-bp RSS and 23-bp RSS signal sequences. Transfection of 3T3 cells with the recombination substrate and the Hztransib transposase enzyme gene alone yielded no recombination activity. Remarkably, however, when the Hztransib gene was simultaneously transfected along with the mouse RAG2 gene, recombination occurred at a level fully one-third of that catalyzed by the mouse RAG1/2 complex. Sequence analysis confirmed that this recombination yielded perfectly normal signal and coding joints.
These and other findings have led to a model of RAG evolution in which a Transib transposase gene served as the evolutionary precursor of RAG1, and the RAG2 gene was separately acquired by a Transib element to form the vertebrate RAG1-RAG2 transposon (Figure 2). The evolutionary origins of RAG2 are currently unknown, although it may have existed as a host factor in the genome of the organism in which the transposon first arose.
"The illustration shows a transposable element from the Transib family and an unknown RAG2 gene combining to form a transposase. The text corresponding to step 1 reads, Insertion of the RAG1/2 complex from a transposon and an unknown RAG2 ancestral gene. In step 2, the insertion of the transposase into an ancestral receptor gene takes place, followed by excision of the recombinase-coding genes and several rounds of duplication of the antigen receptor gene segments to form a chain with two V elements, two J elements, and RAG1 and RAG2 genes."
Once the primitive RAG transposon was assembled, it is not difficult to envisage that it could insert into exons of primitive cell surface receptor genes. This insertion could be followed variously by gene duplication events and genetic separation of the recombinase activity from between the terminal inverted RSS repeats to elsewhere in the genome. This would result in rapid evolution of receptor genes and lead to the complex receptor gene structure we know today.
The above description allows us to understand how such an immensely diversified antibody repertoire can be generated from a finite amount of genetic material. To summarize, the diversity of the naïve BCR repertoire is shaped by the following mechanisms (Table 6-3, first two columns):
Mechanism | Used in B cells | Used in T cells | Comments |
---|---|---|---|
Multiple germ-line V(D)J genes | Yes | Yes | The mouse Vλ locus has undergone a severe contraction, and, therefore, only 5% of mouse light chains are of the λ type. The TCR γ-chain locus also has few V genes. J region diversity is notably higher in TCR α-chain genes than in other TCR or Ig genes |
Light-chain segment use | κ and λ variable regions encoded by V and J segments | α and γ variable regions encoded by V and J segments | |
Heavy-chain segment use | VH regions encoded by V, D, and J segments | β and δ variable regions encoded by V, D, and J segments | |
Absolute dependence on RAG1/2 expression | Yes | Yes | |
Junctional diversity: P-nucleotide and N-nucleotide addition | Yes | Yes | Many fewer N nucleotides found in Ig light chains because of developmental regulation of TdT |
Multiple D regions per recombined chain | No | Present only in TCR δ | The presence of two D segments allows an additional site for N-nucleotide addition |
Allelic exclusion of receptor gene expression | Absolute | Allelic exclusion of TCR α genes is not absolute | |
On activation, secretes product with the same binding site as the receptor | Yes | No | |
Nature of constant region determines function | Yes; constant region of secreted antibody product determines its function. Constant region of membrane receptor anchors receptor in membrane and connects with signal transduction complex | No secreted product. Constant region of membrane receptor anchors receptor in membrane and connects with signal transduction complex | |
Receptor genes undergo somatic hypermutation following antigenic stimulation | Yes | No | |
Mechanisms 3, 4, and 5 give rise to striking sequence diversity at the junctions between gene segments, and result in the formation of the highly variable CDR3 regions of the antibody heavy and light chains.
Together, these five mechanisms are responsible for the creation of the repertoire of BCRs that is available to organisms before any contact with pathogens or other antigens has occurred, the so-called naïve BCR repertoire.
Note that we have described the process of the generation of the primary Ig variable region repertoire as it occurs in humans and rodents. Although the same principles apply to most vertebrate species, different species have evolved their own variations. For example, the process of gene conversion is used in chickens and rabbits, and some species, such as sheep and cows, use somatic hypermutation in the generation of the primary as well as the antigen-experienced repertoire.
In attempting to understand the complex process of V(D)J recombination and its regulation, investigators must address the question of how Ig gene recombination happens only at particular stages of B-cell development and how two RSSs, located many kilobases or even megabases apart in the linear DNA sequence, are brought into sufficiently close apposition for accurate recombination to succeed.
Because of its capacity to induce genome instability by introducing double-strand breaks in DNA, the expression of the RAG1/2 complex must be tightly regulated. The enzyme complex is expressed only in lymphoid cells at specific periods in lymphoid development (see Chapters 8 and 9). Furthermore, as described earlier, it is inactivated prior to the cell’s entry into S phase, when double-stranded breaks in the DNA might interfere with regulated chromatin distribution into daughter cells. Kinases coupled to the cell cycle phosphorylate RAG2 prior to the G1-S cell cycle transition, targeting RAG2 for ubiquitin-dependent protein degradation prior to entry into S phase (see Figure 6-10c).
However, once RAG1/2 is expressed, how does the cell ensure that its activity is appropriately restricted to the correct sites on the chromatin? Although the native RAG recombinase complex is quite difficult to isolate, a core, catalytically active RAG1/2 heterodimer can be purified quite readily, and has been used to determine the binding specificity and orientation of the recombinase.
On isolated DNA fragments, the core RAG recombinase binds specifically to recombination signal sequences (RSSs), although it also binds to other sites on the genome that lack extensive sequence homology to the RSS. The nonamer-binding domains of RAG1 interact with the A-rich tract of the RSS nonamer. Additional regions in the RAG1 core also interact with the RSS heptamer and with the end of the V, D, or J coding sequence, as well as mediating the catalytic reaction. In general, the isolated core recombinase operating on purified DNA fragments tolerates considerably more sequence variation in the RSS spacer, the heptamer, and even the nonamer than is observed in vivo.
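The idea of an RSS as a heptamer, a spacer of fixed length, and a nonamer, with some tolerance for sequence variation, can be illustrated with a short Python scan. This is a sketch under assumptions: the consensus heptamer CACAGTG and nonamer ACAAAAACC are the widely cited consensus sequences and are not given in this section, the `max_mismatches` parameter is a hypothetical stand-in for tolerated variation, and only one strand is scanned.

```python
from typing import List, Tuple

HEPTAMER = "CACAGTG"    # commonly cited consensus (assumption, not from this section)
NONAMER = "ACAAAAACC"   # commonly cited consensus (assumption, not from this section)


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq: str, max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return (start, spacer_length) for each candidate RSS on the given strand.

    A candidate is heptamer + 12- or 23-bp spacer + nonamer, allowing up to
    max_mismatches in total across the heptamer and nonamer.
    """
    seq = seq.upper()
    hits = []
    for spacer in (12, 23):
        total = len(HEPTAMER) + spacer + len(NONAMER)
        for i in range(len(seq) - total + 1):
            hept = seq[i:i + len(HEPTAMER)]
            non = seq[i + len(HEPTAMER) + spacer:i + total]
            if mismatches(hept, HEPTAMER) + mismatches(non, NONAMER) <= max_mismatches:
                hits.append((i, spacer))
    return sorted(hits)


# Synthetic examples (not real genomic sequence).
exact_12 = "TT" + HEPTAMER + "A" * 12 + NONAMER + "GG"
variant_12 = "TT" + HEPTAMER + "A" * 12 + "ACAAAAAGC" + "GG"

print(find_rss(exact_12))                      # [(2, 12)]
print(find_rss(variant_12))                    # []
print(find_rss(variant_12, max_mismatches=1))  # [(2, 12)]
```

Raising `max_mismatches` loosely mirrors the more permissive behavior of the isolated core recombinase; it is not a model of binding affinity.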
In vivo, the catalytic activity of RAG1/2 occurs in an extraordinarily complex chromosomal environment, and analysis of V(D)J recombination regulatory mechanisms in the native chromosomal context has required the refinement of techniques capable of analyzing the interactions between proteins and nuclear DNA folded within its native chromatin structure. Three techniques in particular are described in Chapter 20: chromatin immunoprecipitation (ChIP); multicolor, three-dimensional fluorescence in situ hybridization (3-D FISH); and methods that identify DNA sequences that interact with one another within the context of active chromatin (e.g., Hi-C). These approaches were all used to generate the information described in this section.
RAG1/2 binding is affected by particular epigenetic modifications on the histones associated with target sequences. Recall that eukaryotic DNA is wound around histone octamers to form nucleosomes. The core DNA that is directly associated with each nucleosome is 147 base pairs in length, and nucleosomes are separated from one another by linker DNA sequences of up to 80 base pairs in length that interact with histone H1. This “beads on a string” nucleosomal DNA is then coiled into structures of increasing complexity. Histone modifications, such as methylation or acetylation, can affect the degree to which the DNA in the associated chromatin is accessible to enzymatic activities, such as recombination or transcription, by altering the extent of nucleosome packing. The nature of histone modifications or epigenetic marks associated with a set of genes is referred to as its “histone code.” Alterations in the histone code of chromatin associated with immunoglobulin DNA during B-cell development signal the onset of receptiveness of the Ig locus to transcription and recombination.
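The nucleosome dimensions given above lend themselves to a quick back-of-the-envelope estimate. The Python sketch below counts how many nucleosomes would fit along a stretch of DNA; the 100-kb region and the linker lengths tried are illustrative assumptions, not measurements from any Ig locus.

```python
CORE_BP = 147        # core DNA per nucleosome, from the text
MAX_LINKER_BP = 80   # upper bound on linker length, from the text


def nucleosome_count(region_bp: int, linker_bp: int) -> int:
    """Estimate how many nucleosome repeats fit in a region of region_bp."""
    if region_bp < 0:
        raise ValueError("region_bp must be non-negative")
    if not 0 <= linker_bp <= MAX_LINKER_BP:
        raise ValueError("linker_bp must be between 0 and 80")
    return region_bp // (CORE_BP + linker_bp)


# Hypothetical 100-kb stretch, bracketed by the shortest and longest linkers.
print(nucleosome_count(100_000, 0))    # 680
print(nucleosome_count(100_000, 50))   # 507
print(nucleosome_count(100_000, 80))   # 440
```

Even a modest stretch of DNA therefore carries hundreds of nucleosomes, each a potential site for the histone marks discussed here.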
Analysis of the biochemical basis for RAG recombinase binding to chromatin shows that the RAG2 plant homeodomain region (see Figure 6-10c) interacts with histone H3 that has been trimethylated at the lysine in position 4 of the histone’s amino acid sequence (H3K4me3). This H3K4me3 modification is typically found at transcriptional start sites in active chromatin. Disruption of this RAG2-histone interaction inhibits V(D)J recombination. Biochemical experiments have shown that RAG2 binding to H3K4me3 increases the affinity of the RAG complex for its DNA substrates, possibly by inducing an activating conformational change in RAG1.
Furthermore, it has long been known that one of the earliest steps in Ig gene recombination is the transcription of noncoding RNA from promoters in DNA regions near the Ig gene segments. This germ-line transcription, irrespective of the nature of the RNA product, confirms that the DNA is now accessible for enzymatic manipulation. RNA polymerase II, the enzyme that transcribes the immunoglobulin genes and initiates the germ-line transcription described above, often travels with histone methyltransferases. This association suggests that the histone modifications that signal active chromatin are mechanistically linked to the germ-line transcription event and that, together, they signal the readiness of the germ-line immunoglobulin DNA for recombination.
In immunoglobulin genes, both the trimethylated lysine histone modification and acetylation of histone residues, which also signals open chromatin, are concentrated in the J-gene segment regions of both heavy and light chain–encoding DNA, with a few trimethylated histones found in association with J-proximal D gene segments. Thus, the nature of the histone code directs the recombination apparatus first to the J regions of immunoglobulin heavy and light chains.
Because the V, D, and J gene segments are so spread out along the chromosome, higher order chromatin structure must also play a role in the regulation of V(D)J recombination. Chromatin visualization techniques have shown that chromatin folds extensively into loops of various lengths that cluster into the form of rosettes (Figure 6-13a). Clustering is regulated by the binding of proteins to specific sites on the DNA. Notable among these site-specific DNA-binding proteins is the factor CTCF, which binds specifically to regions with the DNA sequence CCCTC. Recent experiments have demonstrated that the three-dimensional structure of these loops is altered in real time in a surprisingly orderly fashion and affects variable region recombination.
"Part a shows the chromatin configurations. In the pre-pro-B cell, the distal and proximal V regions each form complex loops of six bulges, forming a rosette shape. Next to the proximal V regions, the D regions and JH region form a loop, and the CH region forms another loop, after which the sequence forms a long tail. In the pro-B cell, the two rosette-shaped V regions are partially overlapped, with the D and J loop and C loop next to them.
Part b shows the micrographs of the two chromatin configurations. The pre-pro-B micrograph shows scattered blue shapes with a group of three small pink shapes in one area and a pair of two small pink shapes in another area, each representing Ig genes. The pro-B micrograph shows two small separate pink shapes among scattered blue shapes."
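The CCCTC sequence named above as the CTCF recognition site can be located with a simple two-strand scan. The Python sketch below is illustrative only: it searches for the exact five-base core on either strand, whereas CTCF sites identified experimentally are longer and more degenerate, so a bare five-base match will occur frequently by chance. The function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ctcf_core_sites(seq: str, motif: str = "CCCTC") -> List[Tuple[int, str]]:
    """Return (position, strand) for each exact match to the core motif.

    Strand "+" means the motif appears as written; "-" means its reverse
    complement appears, i.e. the motif lies on the opposite strand.
    """
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


# Synthetic example sequence (not genomic data).
print(find_ctcf_core_sites("AACCCTCTTGAGGGAA"))  # [(2, '+'), (9, '-')]
```

Reporting the strand matters because the orientation of paired sites is one of the features that loop-level analyses of chromatin typically track.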
Detailed analysis of the three-dimensional structure of the Ig heavy-chain gene locus has indicated that it initially appears to be arranged in space into three rosette-containing chromatin regions. One of these regions contains the distal VH genes (those farthest from the D region); a second contains the proximal VH genes (those nearest to the D region); and the third contains the gene segments of the D, JH, and CH regions. Since recombination events are topologically limited to include only genes within a rosette, the rosette loop that contains the D, JH, and CH regions defines the scope of RAG activity in the earliest B lymphoid precursors, pre-pro-B cells. Once DH-JH recombination has occurred, the loop structure is altered to allow VH-DH recombination at the pro-B-cell stage. Figure 6-13a illustrates the change in the structure of the Ig loci as development proceeds.
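The topological restriction described here, in which only gene segments within the same chromatin compartment can recombine, can be written as a lookup in Python. This is a deliberately coarse sketch: the compartment labels are hypothetical, and treating the contracted pro-B locus as a single compartment is a simplifying assumption rather than a structural claim.

```python
# Compartment assignment of IgH gene segment groups at each stage (assumed labels).
COMPARTMENTS = {
    "pre-pro-B": {
        "distal_VH": "rosette_1",
        "proximal_VH": "rosette_2",
        "DH": "rosette_3",
        "JH": "rosette_3",
        "CH": "rosette_3",
    },
    "pro-B": {
        "distal_VH": "contracted_locus",
        "proximal_VH": "contracted_locus",
        "DH": "contracted_locus",
        "JH": "contracted_locus",
        "CH": "contracted_locus",
    },
}


def can_recombine(segment_a: str, segment_b: str, stage: str) -> bool:
    """Return True if both segment groups share a compartment at this stage."""
    layout = COMPARTMENTS[stage]
    return layout[segment_a] == layout[segment_b]


print(can_recombine("DH", "JH", "pre-pro-B"))           # True
print(can_recombine("proximal_VH", "DH", "pre-pro-B"))  # False
print(can_recombine("distal_VH", "DH", "pro-B"))        # True
```

The model captures only spatial proximity; it ignores the histone marks and nuclear positioning that also gate RAG activity.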
The alteration in chromatin topology can be visualized microscopically as a locus contraction event (Figure 6-13b) and has been shown to depend on the binding of transcription factors, including Pax-5, to the chromatin. It is thought that Pax-5, a key transcription factor inducing the formation of B lymphocytes (see Chapter 9), interacts with proteins that control the formation of the base of the loops. Once V(D)J recombination has occurred successfully on one allele, the inactive chromosome is decontracted.
Although it is tempting to speculate that selective placement of histone marks and regulated contraction of the chromosomal regions bearing Ig genes can together explain the exquisitely controlled ordering of Ig gene rearrangements, it is now clear that other factors are involved. Specifically, observations of the Igκ locus demonstrated that it is contracted in both pro-B cells and pre-B cells and has the potential for similar levels of long-range interactions at both of these cell stages. However, Vκ rearrangements do not occur at the pro-B-cell stage of development, but rather are delayed until after IgH rearrangements are complete, at the pre-B-cell stage. If the Vκ locus is contracted as early as the pro-B-cell stage, what is preventing Vκ rearrangement from occurring then?
A further aspect of the regulation of RAG activity concerns the manner in which the intra-nuclear localization of the antigen receptor chromatin is altered in order to make available the relevant genes to the recombinase. Within the nucleus, inactive chromatin is found in regions associated with the nuclear lamina, which lies immediately inside the nuclear membrane. Indeed, some parts of the chromatin can be shown to be tethered to the nuclear lamina. Such inactive chromatin is unable to participate in either transcription or recombination.
In contrast, chromatin located in the general nucleoplasm tends to be more active. Considerable data now suggest that antigen receptor loci move away from the nuclear envelope prior to recombination and that those alleles that are excluded from productive rearrangement re-associate with the envelope once recombination terminates. The movement away from the nuclear lamina occurs subsequent to increased histone acetylation at Ig loci.
Figure 6-14 illustrates the sequence of movement of chromosomes within the nucleus during B-cell development. The IgH locus in hematopoietic progenitor cells and early pre-pro-B cells is associated with the inner nuclear lamina. Thus, colocalization with the nuclear lamina as well as the physical nature of the chromatin loops ensures that the only transcriptional and associated recombinational events that can occur are restricted to the heavy-chain D and J regions. As the B cell enters the pro-B-cell stage, the V gene locus moves away from the nuclear lamina and the entire locus contracts under the influence of Pax-5, facilitating rearrangements with distal as well as proximal VH gene segments.
"In the pre-pro-B cell, heavy chain D-JH rearrangement begins. Linear clusters of IgH and Igκ are around the edges of the nucleus. Next, in the pro-B cell, heavy chain VH-DJH rearrangement occurs. A CD19 component goes through the nuclear lamina and the IgH and Igκ clusters each form a circular chain. Next, in the pre-B cell, heavy chain (pre-BCR) expression and light chain VL-J rearrangement occurs. A pre-BCR antibody is attached to the outside of the nucleus and the IgH and Igκ clusters have switched places."
Correspondingly, movement of the Igκ locus into the central nucleoplasmic region has also been demonstrated at the pre-B-cell stage, when light-chain rearrangement occurs, indicating that the positioning of the immunoglobulin loci in the nuclear environment, as well as the extent of locus contraction at Ig genes, together determine the capacity for recombination.
Following a productive recombination at one Ig receptor allele, the potential for recombination at the corresponding allele is shut down, a process known as allelic exclusion (discussed further below). At this point, it remains unclear whether allelic exclusion is correlated with movement of the excluded allele back toward the nuclear lamina. However, in the case of the IgH locus, suppression of the inactive allele has been associated with relocation to heterochromatic regions of the nucleus through the action of the ataxia telangiectasia mutated (ATM) protein.