The Mechanism of V(D)J Recombination

In V(D)J recombination, the DNA encoding a complete antibody V region is assembled from V, D, and J (heavy chain) or from V and J (light chain) segments that are initially separated by many kilobases of DNA. Each developing B cell generates a novel pair of heavy- and light-chain variable region coding sequences by recombination of its genomic DNA. This recombinational event is catalyzed by a set of enzymes (Table 6-2), many of which are also involved in nonhomologous end-joining (NHEJ) DNA repair functions that occur in all cells. Because V(D)J recombination entails cutting both strands of the DNA, and because inappropriate recombination is potentially catastrophic for the cell, mechanisms have evolved that restrict antigen receptor gene recombination events to the appropriate sites on the Ig genes and ensure that they occur only during defined periods of B- and T-cell development.

TABLE 6-2 Proteins involved in V(D)J recombination

| Protein | Function in V(D)J recombination | Immunological consequences of protein deficiency |
| --- | --- | --- |
| Lymphoid-Specific Proteins | | |
| RAG1/2 | Antigen receptor gene recombinase complex. DNA cleavage is mediated by RAG1. Epigenetic targeting is directed by RAG2. | Severe combined immunodeficiency (SCID) |
| Terminal deoxynucleotidyl transferase (TdT) | Adds nontemplated (N) nucleotides to V-D and D-J joints of Ig heavy chain and all joints of TCR chains in a template-independent manner. | Reduced N-nucleotide addition is seen at coding joints |
| Non–Lymphoid-Specific Proteins | | |
| High mobility group B proteins 1 and 2 (HMGB1/2) | Stabilize binding of RAG1/2 to recombination signal sequences (RSSs). Stabilize introduction of bend into 23 RSS DNA by RAG1/2. | No information available |
| Non–Lymphoid-Specific Proteins of the Nonhomologous End-Joining (NHEJ) DNA Repair Pathway | | |
| Ku70/80 | Complex is recruited to DNA double-strand (DS) breaks. Stabilizes and aligns DNA ends prior to repair. Essential for both signal and coding joint repair in Ig and TCR genes. Recruits DNA-PKcs protein. | SCID occurs in the absence of either or both Ku proteins. Knockout mice are also small in size and sterile |
| DNA-PKcs | A protein kinase that forms a complex with Ku70/80. It phosphorylates and activates Artemis. It recruits the ligation machinery. | SCID occurs. Knockout mice otherwise develop normally |
| Artemis | Once Artemis has been phosphorylated by DNA-PKcs, it opens the hairpin on the coding end joint. | Artemis-deficient B and T cells have blocked formation of coding joints and accumulation of hairpin-sealed coding ends. Mice lacking Artemis have severely impaired B- and T-cell development |
| DNA ligation complex: DNA ligase IV, XRCC4, and XLF (Cernunnos) | XRCC4 maintains stability of DNA ligase IV and stimulates its catalytic activity. XRCC4 may also help to align DNA ends. DNA ligase IV is required for ligation of cut DNA ends, at both the coding and the signal joints. | Lack of DNA ligase IV or of XRCC4 causes a complete block in lymphoid development (SCID). Mice lacking XLF show enhanced sensitivity to radiation-induced DNA damage, but do not develop significant immunodeficiency |
| Unusual DNA polymerases, such as DNA Pol μ and DNA Pol λ | These polymerases add nucleotides at Ig heavy chain and TCR antigen receptor loci. In the absence of TdT activity, some N-nucleotide addition is observed, which is thought to be the result mainly of DNA Pol μ action. Whereas DNA Pol λ requires a template strand, DNA Pol μ, like TdT, can act in a template-independent manner. Pol μ participates primarily in heavy-chain and Pol λ in light-chain rearrangements. | Defects in the development of hematopoietic cells |
| Ataxia telangiectasia mutated (ATM) protein | A kinase that binds double-stranded breaks in DNA and blocks entry into the cell cycle until the breaks can be repaired. The ATM protein is recruited to the double-strand breaks by the MRN (Mre11/Rad50/Nbs1) complex. | Lymphopenia and predisposition to thymic lymphomas characterized by translocations involving TCR genes |

This degree of accuracy is achieved in part because the recombination enzymes recognize specific DNA sequence motifs called recombination signal sequences (RSSs). These sequences also ensure that one of each type of segment (V and J for the light chain, or V, D, and J for the heavy chain) is included in the recombined heavy- and light-chain genes. During cleavage and ligation of the segments, the DNA is edited in various ways, adding further variability to the recombined gene. As you will see later in this chapter, similar, although not identical, mechanisms operate to generate complete T-cell receptor genes in developing thymocytes. We will also discuss the roles of epigenetic modifications and chromatin structure in restricting recombination to the relevant regions of antigen receptor genes.

V(D)J Recombination in Lymphocytes Is a Highly Regulated Sequential Process

Different immunoglobulin variable region gene segments are recombined at specific stages in lymphoid development (see Chapter 9, Figure 9-4). As B cells develop in the bone marrow, the first step in the creation of a mature immunoglobulin receptor is recombination that brings together a D and a JH gene segment. This step occurs very early in development, between the pre-pro-B-cell and early pro-B-cell stages, when the cell is just beginning its differentiation into a mature B cell (see Chapter 9). Recombination between the VH and D-JH segments follows during the pro-B-cell stage.

If V(D)J recombination is successful and a heavy-chain variable region is generated, the resulting heavy-chain protein is placed onto the cell surface in combination with a nonvariable pair of proteins, VpreB and λ5 (called a surrogate light chain), to form a pre-B-cell receptor (Figure 6-7a). Signaling from the pre-B-cell receptor halts heavy-chain recombination, initiates several rounds of proliferation, and then calls for the beginning of light-chain recombination. Light-chain recombination occurs at the small pre-B-cell stage of B-cell development. Light-chain recombination in the mouse initiates at the κ locus, and if this is not successful, continues at the λ locus. In each case, recombination occurs on one allele at a time. In humans, light-chain recombination may start at either the κ or the λ locus. Expression of an intact membrane IgM B-cell receptor (Figure 6-7b) shuts off further light-chain gene rearrangement.

Two illustrations show the pre-BCR and BCR complexes.

FIGURE 6-7 Pre-BCR and BCR complexes. (a) The pre-BCR: the μ heavy chain is expressed on the cell surface prior to light-chain rearrangement, in concert with the surrogate light chain, made up of VpreB and λ5, and the signaling components CD79α and CD79β (Igα, Igβ). The pre-BCR complex signals via Igα, Igβ that heavy-chain rearrangement is completed and initiates several rounds of proliferation, after which light-chain rearrangement begins. (b) The mature BCR complex is shown here for comparison.

Recombination Is Directed by Recombination Signal Sequences

In the late 1970s, investigators sequencing light-chain genes first described two conserved sequence blocks, a nonamer (a set of 9 bp) and a heptamer (a set of 7 bp), in the noncoding regions upstream of each J segment. The heptamer appeared to end exactly at the J region coding sequence. Further sequencing showed that the same motif was repeated in an inverted manner on the downstream side of the V region coding sequences, again with the heptamer sequence ending flush with the V region gene segment (Figure 6-8a).

Two illustrations show the two conserved sequences in light chain and heavy chain DNA function.

FIGURE 6-8 Two conserved sequences in light-chain and heavy-chain DNA function as recombination signal sequences (RSSs). (a) Both signal sequences consist of a conserved heptamer and conserved AT-rich nonamer separated by nonconserved spacers of 12 or 23 bp. (b) The two types of RSS have characteristic locations within λ-chain, κ-chain, and heavy-chain germ-line DNA. During DNA rearrangement of the Ig heavy and light chains, gene segments adjacent to the 12-bp RSS can join only with segments adjacent to the 23-bp RSS.

Between the nonamer and heptamer sequences, the researchers described a spacer sequence of either 12 or 23 bp in length. Although the nucleotide sequence of the RSS spacer is not well conserved, the significance of the spacer lengths was immediately clear; 12 nucleotides is about the length of one turn of the double helix, and 23 nucleotides is about two turns. In this way, the spacer sequence ensures that the ends of the nonamer and heptamer closest to the spacers would be on the same side of the double helix and therefore accessible to binding by the same enzyme. The investigators correctly concluded that they had discovered the DNA signal sequence that directs recombination between the V and J gene segments. They therefore termed this “heptamer-spacer-nonamer” motif the recombination signal sequence or RSS.

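Nucleotide sequencing of the RSS demonstrated that it consists of three elements:

  • A conserved heptamer, which lies flush with the coding sequence of the gene segment
  • A nonconserved spacer of either 12 or 23 bp
  • A conserved, AT-rich nonamer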

In the heavy-chain gene segments, a similar pattern was noted. The spacer regions separating the heptamer and nonamer pairs were 23 bp in length following the V segments and preceding the J segments, and 12 bp in length before and after the D segments. The relative locations of the 12- and 23-bp spacers (Figure 6-8b) suggested that the VDJ recombinase enzyme is designed to pair one RSS bearing a 12-bp spacer with a second RSS that includes a 23-bp spacer, something we now know to be the case. This is referred to as the 12/23 rule.
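The 12/23 rule described above can be expressed as a simple pairing check. The following Python fragment is a minimal sketch rather than a model of the enzyme: the spacer lengths for the heavy-chain segments are taken from the text, while the dictionary layout and the function names (obeys_12_23_rule, can_join) are hypothetical and chosen only for illustration.

```python
# Spacer lengths (bp) of the RSSs that flank heavy-chain gene segments,
# as stated in the text. The data layout and names are illustrative only.
HEAVY_CHAIN_RSS = {
    "V": {"downstream": 23},
    "D": {"upstream": 12, "downstream": 12},
    "J": {"upstream": 23},
}


def obeys_12_23_rule(spacer_a: int, spacer_b: int) -> bool:
    """Return True when one RSS has a 12-bp spacer and the other a 23-bp spacer."""
    return {spacer_a, spacer_b} == {12, 23}


def can_join(upstream_segment: str, downstream_segment: str) -> bool:
    """Check whether two heavy-chain segments can recombine directly.

    The upstream segment contributes the RSS on its downstream side, and the
    downstream segment contributes the RSS on its upstream side.
    """
    spacer_a = HEAVY_CHAIN_RSS[upstream_segment]["downstream"]
    spacer_b = HEAVY_CHAIN_RSS[downstream_segment]["upstream"]
    return obeys_12_23_rule(spacer_a, spacer_b)


for pair in [("D", "J"), ("V", "D"), ("V", "J")]:
    print(pair, can_join(*pair))
```

Under these assumptions, D-to-J and V-to-D pairings satisfy the rule, whereas a direct V-to-J pairing in the heavy chain does not, because both RSSs carry 23-bp spacers.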

Figures 6-9a and b illustrate the manner in which the RSSs act to bring together the appropriate gene segments during the generation of complete light-chain and heavy-chain variable region genes.

Illustrations show the recombination between gene segments in germ-line light chain DNA and heavy chain DNA.

FIGURE 6-9 Recombination between gene segments is required to generate complete variable region light- and heavy-chain genes. (a) Recombination between a V region (in this case, V3) and a J region (in this case, J3) generates a single V-J light-chain gene in each B cell. The recombinase enzymes recognize the RSS downstream of the V region (orange triangle) and upstream of the J region (brown triangle). In every case, an RSS with a 12-bp (one-turn) spacer is paired with an RSS with a 23-bp (two-turn) spacer. This ensures that there is no inadvertent V-V or J-J joining in Ig genes. (b) Recombination of V (blue), D (purple), and J (red) segments creates a complete heavy-chain variable region gene. Again, the recombinase enzyme recognizes the RSS sequences downstream of the V region, up- and downstream of the D region, and upstream of the J region, pairing 23-bp spacers with 12-bp spacers.

Gene Segments Are Joined by a Diverse Group of Proteins

In the early 1990s two proteins, encoded by RAG1 (recombination activating gene 1) and RAG2 (recombination activating gene 2), were shown to be required for recombining antibody variable region gene segments. The RAG1 and RAG2 genes are just 8 kb apart and are transcribed in opposite directions. RAG gene expression occurs only in cells of the immune system, is developmentally regulated in both T and B cells, and coincides with those periods during lymphoid development when receptor genes are being assembled (see Chapters 8 and 9). The RAG1/2 protein complex is required for RSS recognition and targeted cleavage of the DNA at the junction between the RSS and the respective variable region–coding segments.

The functional RAG1/2 occurs as a tetramer, with each protein being represented twice in the active protein complex. Recent x-ray crystallographic analysis has shed light on the relative locations of the two RAG1 and two RAG2 monomers within the tetrameric complex (Figure 6-10a and b).

Illustrations show the structural features and gene sequence of the RAG 1 and 2 recombinase proteins.

FIGURE 6-10 Structural features of the RAG1/2 recombinase proteins. (a) The tetrameric RAG1/2 proteins are shown in three dimensions drawn from x-ray crystallographic analysis, in complex with the RSSs, positioning the 12- and 23-bp spacer sequences to enable cleavage at the boundary of the coding sequence and the heptamer of the RSS. (b) A hypothetical model of how two coding regions to be joined may be arranged spatially, stabilized by the RAG1 and RAG2 recombinase complex. (c) The RAG1 protein is 1040 amino acids in length. The region of RAG1 that binds to the RSS nonamer lies to the amino-terminal side of the catalytic site, which contains three acidic amino acids, D600, D708, and E962, necessary for DNA cleavage. The heptamer binding site and several residues that interact with RAG2 also lie within the core region. The ZnA region is important for homodimerization with the RAG1 partner in the functional tetrameric protein. The amino-terminal 383 amino acids in RAG1 enhance its catalytic activity. The ubiquitin ligase activity ubiquitinylates histone H3, and may aid in releasing initial tight binding of RAG1 and allowing catalysis to proceed. RAG2 interacts with RAG1 and, like RAG1, also exists as a dimer in the RAG1/2 tetramer. The amino-terminal core region of RAG2 enhances the binding of RAG1 to DNA and is required for DNA cleavage to occur. A plant homeodomain (PHD) finger (so named because it was originally discovered in plants) binds specifically to a trimethylated lysine residue at position 4 in histone H3 (H3K4me3) and is critical for guiding RAG1/2 to regions of active chromatin. At the carboxyl-terminal end of RAG2 is a threonine residue, T490, which is phosphorylated during the S, G2, and M phases of the cell cycle, inducing RAG2 proteolysis. Therefore RAG2 is active only in nondividing cells.

Figure 6-10c shows the location of functionally important sequences in the primary structures of RAG1 and RAG2. The active site of the recombinase complex is located in the RAG1 subunit and contains two aspartic acid residues and one glutamic acid residue (D600, D708, and E962, respectively). This active site “DDE motif” is found in many enzymes that cleave DNA, such as endonucleases, transposases, and recombinases. Figure 6-10c also identifies the domains on RAG1 that bind to both the nonameric and heptameric regions of the RSS and shows that the heptamer-binding region overlaps with the part of RAG1 that interacts with RAG2. ZnA and ZnB on RAG1 are regions of the protein that form zinc fingers, elongated protein domains stabilized by the coordination of a Zn2+ ion.

The ZnA region of the protein assists in the initial RAG1 binding to active chromatin via its ability to interact with histone H3. However, once the RAG1/2 complex is in place, this binding then inhibits cleavage of DNA by RAG1/2. The ubiquitin ligase activity located in the same region of the protein is thought to ubiquitinylate the H3 protein, allowing release of RAG1 to mediate its DNA cleavage function. More recently, this same ubiquitin ligase activity has been shown to auto-activate the RAG1 recombinase activity.

The core region necessary for RAG2 activity lies at the amino-terminal end of the RAG2 molecule, while the plant homeodomain (PHD) region helps to guide the complex to active DNA bearing the H3K4me3 histone mark. Threonine residue T490 on RAG2 is phosphorylated during the S, G2, and M phases of the cell cycle, and this phosphorylation triggers the destruction of RAG2. This ensures that the complex does not cut DNA while the cell is undergoing division.

Biochemical experiments have demonstrated that the essential activities of the RAG1/2 complex can be accomplished by a so-called “core complex,” which consists of residues 384–1008 of RAG1 and residues 1–383 of RAG2.

Only three of the proteins implicated in V(D)J recombination are unique to lymphocytes: RAG1, RAG2, and terminal deoxynucleotidyl transferase (TdT). Like RAG1/2, TdT is also expressed only in developing lymphocytes. It adds nontemplated (“N”) nucleotides to the free 3′ termini of coding ends of heavy-chain V, D, and J segments following their cleavage by RAG1/2 recombinases. (These nucleotides are designated as “nontemplated” because they are not present in the germline, but rather are added to the DNA of a somatic cell.) TdT activity therefore contributes to the generation of additional receptor gene diversity in the CDR3 region of the antibody heavy chain.

Other proteins participating in the recombination process are not lymphoid specific. The high mobility group B proteins 1 and 2 (HMGB1 and HMGB2) act interchangeably to enhance RAG1/2 binding to the RSS and may also facilitate DNA bending at the recombination site. Whereas binding of the RSSs requires only the RAG1/2 and HMGB proteins, other cellular factors, most of which are part of the nonhomologous end-joining (NHEJ) pathway of DNA repair, are necessary to accomplish V(D)J recombination. The involvement of particular proteins at various steps in this process was deduced from observations of V(D)J recombination in natural and artificially generated systems lacking one or more of the proteins. The proteins known to participate in V(D)J joining are described in Table 6-2. Clinical Focus Box 6-2 further describes some of the immunodeficiencies suffered by individuals with mutated or insufficient activities of the enzymes involved in V(D)J recombination. Additional descriptions of these immunodeficiency syndromes can be found in Chapter 18.

V(D)J Recombination Occurs in a Series of Well-Regulated Steps

The process of V(D)J recombination occurs in several well-defined stages (Overview Figure 6-11). The end product of each successful rearrangement is an intact Ig gene, in which V and J (light chain) segments or V, D, and J (heavy chain) segments are made contiguous (flush) with one another, to create a complete heavy- or light-chain gene. The new joints in the antibody V region gene, created by this recombination process, are referred to as coding joints. During the process of V(D)J recombination of the heavy-chain variable region, or of V-J recombination of the λ-chain variable region, the intervening DNA is deleted and lost as an excision circle, or episome (Figure 6-11a). In the case of the κ light-chain gene, about 50% of the Vκ gene segments in the germ line are found in the opposite transcriptional orientation to the Jκ gene segments. In these cases, the intervening DNA is inverted and the excised sequences are retained on the chromosome upstream of the recombined gene (see Figure 6-11b). Regardless of whether the joints between the two RSS heptamers are lost as excision circles or retained in upstream DNA, they are referred to as signal joints.

The first phase of the recombination process, DNA recognition and cleavage, is catalyzed by the RAG1/2 proteins acting in concert with an HMGB1/2 protein. The second phase, end processing and joining, requires, in addition to RAG1/2, a more complex set of enzymatic activities: Artemis, other NHEJ proteins, and TdT (for heavy-chain recombination only). The individual steps involved in the process of recombination between Vκ and Jκ segments are shown sequentially in Figure 6-12.

Illustrations show the steps in the mechanism of V (D) J recombination for V kappa to J kappa joining.

FIGURE 6-12 Mechanism of V(D)J recombination, illustrated for Vκ-to-Jκ joining. The RAG1/2 tetramer (aqua ovals) and HMGB1/2 proteins bind to the RSSs and catalyze synapse formation. The coding (5′ → 3′) strand of DNA is drawn as a thick line, and the noncoding (3′ → 5′) strand as a thin line. For steps 2 to 5 we show only the events associated with the Vκ region gene segment, although the single-strand cleavage, hairpin formation, and templated nucleotide addition occur simultaneously at the borders of the Vκ and Jκ segments. In this example, the V and J regions are encoded in the same direction on the chromosome, and so the DNA encoding the RSSs and the intervening DNA is released into the nucleus as a circular episome and will be lost on cell division. The DNA that was on the coding strand of the V region prior to rearrangement is shown in bold. The signal joint is between the residues that were in contiguity with the V and J regions, respectively. Only the heptamer sequence is written out, to preserve clarity. Nucleotides encoded in the germ-line genome are shown in black; P nucleotides are in blue; and nontemplated nucleotides added by TdT at heavy-chain VD and DJ joints are shown in red. Steps 8, 9, and 10 occur only in heavy-chain loci. See text for details.

Step 1 Recognition of the recombination signal sequence (RSS) by the RAG1/RAG2 enzyme complex. The RAG1/2 recombinase tetramer forms a complex with the RSS next to one of the two gene segments to be joined. Binding is usually, but not always, initiated at the RSS containing the 12-bp spacer. Binding of the RAG1/2 complex is enhanced by the HMGB1/2 proteins, which may also serve to induce and stabilize bending of the DNA, facilitating its cleavage. The second RSS is then bound by the RAG1/2 complex and the two gene segments to be joined are brought into close contact (synapsis). Current models based on recent crystallographic studies suggest that binding of one type of spacer induces a conformational change in the RAG1/2 DNA-binding site that specifically accommodates the opposite type of spacer, thus enforcing the 12/23 rule.

Step 2 One-strand cleavage at the junction of the coding and signal sequences. The RAG1 protein then creates single-strand nicks, 5′ of the heptameric signal sequence on the coding strand of each V segment (i.e., at the junction between the V segment and the heptamer) and at the heptamer–J region junction. (Figure 6-12 shows this process for the V segment only.)

Step 3 Formation of V and J region hairpins and blunt signal ends. The free 3′-hydroxyl group at the end of the coding strand of the V segment now attacks the phosphate group on the opposite, noncoding V strand, forming a new covalent phosphodiester bond across the double helix and yielding a DNA hairpin structure on the V segment side of the break. This is called the coding end. Simultaneously, a blunt DNA end is formed at the edge of the heptameric signal sequence as a result of making a clean cut through both strands of DNA, with no overhang. This is the signal end. The same process occurs simultaneously on the J side of the incipient joint. At this stage, the RAG1/2 proteins and HMGB1/2 proteins are still associated with the coding and signal ends of both the V and J segments in a postcleavage complex. The serine/threonine kinase protein, ataxia telangiectasia mutated (ATM), is thought to play an important role in stabilizing this complex and minimizing aberrant recombination events at this point in the process.

Step 4 Ligation of the signal ends. The NHEJ protein, DNA ligase IV, then ligates the free blunt ends to form the signal joint.

Step 5 Hairpin cleavage. Next, the hairpins at the ends of the V and J regions are opened by the endonuclease, Artemis, in one of three ways. The identical bond that was formed by the reaction described in step 3 may be reopened to create a blunt end at the coding joint. Alternatively, the hairpin may be opened asymmetrically either on the “top” or on the “bottom” strand, to yield a 5′ or a 3′ overhang, respectively. Artemis is a member of the NHEJ pathway and requires activation by the NHEJ kinase, DNA-PKcs, which binds to the DNA hairpin ends via its DNA-binding protein subunits Ku70/80. The most common overhang created by Artemis-mediated cleavage at immunoglobulin gene junctions is a 3′ overhang that leaves two unpaired residues. In addition to hairpin opening, the Artemis–DNA-PKcs complex also possesses both single- and double-stranded DNA endonuclease activity that is capable of removing several DNA bases or base pairs on each side of the nascent joint. This activity is rarely observed at the signal joint, but occurs often at the coding joint. The number of nucleotides that can be lost on each side of the joint ranges from 0 to 14.

Step 6 Overhang extension can lead to addition of palindromic nucleotides. In Ig light-chain rearrangements, nucleotide overhangs resulting from the steps described previously can act as substrates for NHEJ DNA repair enzymes, leading to double-stranded palindromic (P) nucleotides at the coding joint. For example, the top row of bases in the V region shown in Figure 6-12, step 6, reading in the 5′ to 3′ direction, reads TCGA. Reading backward on the bottom strand from the point of ligation also yields TCGA. The palindromic nature of the bases at this joint is a direct function of an asymmetric hairpin-opening reaction. P-nucleotide addition can also occur at both the V-D and D-J joints of the heavy-chain gene segments but, as described below, other processes can intervene to add further diversity at the VH-D and D-JH junctions.
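The link between asymmetric hairpin opening and palindromic sequence can be checked with a few lines of Python. This is a simplified sketch, not a description of Artemis itself: it models only the case in which the nick falls on the bottom strand and leaves a 3′ overhang on the top strand, and the example coding end GGTC and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def open_hairpin(coding_strand: str, offset: int) -> str:
    """Return the top strand after the hairpin is nicked on the bottom strand.

    coding_strand is the top strand of the coding end, read 5' to 3'. In the
    hairpin, its 3' end is covalently joined to the 5' end of the bottom strand.
    offset = 0 reopens the original bond and gives a blunt end.
    offset > 0 nicks that many bases into the bottom strand, so those bases
    stay attached to the top strand as a 3' overhang (future P nucleotides).
    """
    bottom_strand = reverse_complement(coding_strand)
    return coding_strand + bottom_strand[:offset]


def is_palindromic(seq: str) -> bool:
    """A DNA palindrome reads the same 5' to 3' on both strands."""
    return seq == reverse_complement(seq)


top = open_hairpin("GGTC", 2)  # hypothetical coding end ending in ...TC
print(top, top[-4:], is_palindromic(top[-4:]))
```

With a coding end that terminates in TC, opening the hairpin two bases into the bottom strand extends the top strand to end in TCGA, the same palindrome used as the example in step 6. With an offset of 0, no bases are transferred and no P nucleotides arise.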

Step 7 Ligation of light-chain V and J segments. DNA ligase IV repairs the signal joints, as well as the coding joints. DNA ligase IV is usually found in complex with XRCC4, which helps to activate it. However, whereas at the signal joints, ligation almost always occurs without the addition or deletion of any nucleotides, the situation can be more complex at the coding joint. The enzymes of the NHEJ pathway include polymerases as well as Artemis and DNA ligase. As mentioned earlier, the endonuclease activity of Artemis will sometimes nibble at the coding ends after hairpin opening (see Table 6-2). In addition, the DNA polymerases associated with the NHEJ pathway, in particular DNA polymerase (Pol) λ and DNA Pol μ, are less faithful than the conventional DNA polymerase even when acting in a template-dependent manner. Even more dramatically, DNA Pol μ, like TdT, is capable of polymerizing DNA in a non–template-dependent manner and is therefore capable of adding random nucleotides at the coding joint.

Thus, NHEJ repair mechanisms can generate significant nucleotide diversity at the light-chain coding joint, even in the absence of TdT, which acts mainly at the heavy-chain joints.

Comparative sequence analysis of germ-line and mature B-cell Ig genes demonstrated that particularly extensive addition of nontemplated nucleotides could be identified in heavy-chain sequences. These additional nucleotide sequences occurred at both the VH-D and D-JH joints. In addition, careful comparative sequencing of germ-line versus somatic B-cell Ig heavy-chain sequences revealed that nucleotides were also often lost at these junctions. Two distinct types of enzyme-catalyzed activities are responsible for these findings in VH sequences.

Step 8 Exonuclease trimming. Exonuclease activity trims back the edges of the V region DNA joints. Since the RAG proteins themselves can trim DNA near a 3′ flap, it is possible that the RAG proteins may cut off some of the lost nucleotides. Alternatively, as described in step 5, the Artemis–DNA-PKcs complex could be the enzyme responsible for the V(D)J-associated endonuclease function. Extensive exonuclease trimming is more common at the two heavy-chain V gene joints (V-D and D-J) than at the light-chain V-J joint. In cases where trimming is extensive, it can lead to the loss of the entire D region as well as the elimination of any P nucleotides formed as a result of asymmetric hairpin cleavage.

Step 9 N-nucleotide addition. (Most probably occurs simultaneously with step 8.) Nontemplated (N) nucleotides are added by TdT to the coding joints of heavy-chain genes after hairpin cleavage. This enzyme can add up to 20 nucleotides to each side of the joint. The two ends are held together throughout this process by the RAG1/2 enzyme complex. TdT-mediated N-nucleotide addition at the coding joints of the heavy-chain genes is more commonly observed than at light-chain joints, because TdT is expressed at the earliest phases of V(D)J recombination when the heavy chain, but not the light-chain, genes are being rearranged. TdT activity is then usually turned off before light-chain rearrangements begin in mice, although residual TdT activity is found during light-chain rearrangement in humans. In addition, as described above, some nontemplated nucleotide addition at light-chain joints may be mediated by the NHEJ DNA Pol μ.

Step 10 Ligation and repair of the heavy-chain gene. This final step is identical to the ligation and repair for the light-chain genes and is mediated by DNA ligase IV acting in concert with XRCC4.

When considering the creation of an immunoglobulin variable region gene, we must always take into account the fact that nucleotide addition and/or exonuclease trimming at the V(D)J joints does not necessarily occur in sets of three nucleotides, and so can lead to out-of-phase joining. Recombined V segment sequences in which nucleotide addition or trimming has caused the loss of the correct reading frame for translation cannot encode antibody molecules, and such rearrangements are said to be unproductive. If recombination at one heavy-chain locus is unproductive, rearrangement at the other allele is immediately initiated. Unproductive rearrangement at both alleles leads to apoptosis of the developing cell as it fails to receive necessary survival signals from the pre-BCR (see Chapter 9). Once light-chain rearrangements begin, sequential rearrangement of light-chain alleles occurs if prior rearrangements are unsuccessful.
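The arithmetic behind out-of-phase joining is simple modular counting. The sketch below assumes, purely for illustration, that the untouched germ-line segments would join in frame, so only the net change in junction length matters. It ignores stop codons that junctional nucleotides might introduce, and the function name is hypothetical.

```python
def preserves_reading_frame(trimmed_bases: int, added_bases: int) -> bool:
    """Return True when the net change in junction length is a multiple of 3.

    trimmed_bases: nucleotides removed from the coding ends by trimming.
    added_bases: P and N nucleotides added before ligation.
    Assumes the unmodified segments would join in frame (an illustrative
    simplification) and does not check for stop codons.
    """
    net_change = added_bases - trimmed_bases
    return net_change % 3 == 0


# Losing 4 bases and gaining 7 shifts the junction by +3: still in frame.
print(preserves_reading_frame(trimmed_bases=4, added_bases=7))
# Losing 2 bases and gaining 3 shifts the junction by +1: out of frame.
print(preserves_reading_frame(trimmed_bases=2, added_bases=3))
```

Because the net change can take any integer value, many junctions fall out of frame, which is why the availability of a second allele matters for the developing cell.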

The cleavage and rearrangement of DNA segments within somatic cells of a mammalian genome is an unusual occurrence and led scientists to question how such a mechanism might have evolved. Compelling evidence now supports the evolutionary origin of the genes encoding the RAG1/2 complex as a transposon unit that hopped into a primitive antigen receptor gene; this is discussed further in Evolution Box 6-3.

Five Mechanisms Generate Antibody Diversity in Naïve B Cells

The above description allows us to understand how such an immensely diversified antibody repertoire can be generated from a finite amount of genetic material. To summarize, the diversity of the naïve BCR repertoire is shaped by the following mechanisms (Table 6-3, first two columns):

  1. Multiple gene segments exist at heavy (V, D, and J) and light-chain (V and J) loci. These can be combined with one another to provide extensive combinatorial diversity.
  2. Heavy-chain/light-chain combinatorial diversity: The same heavy chain can combine with different light chains, and vice versa. The combination of different heavy- and light-chain pairs to form a complete antibody molecule provides further opportunities for increasing the number of available antibody combining sites.
  3. P-nucleotide addition results when the DNA hairpin at the coding joint of heavy and light chains is cleaved asymmetrically. Filling in the single-stranded DNA piece resulting from this asymmetric cleavage generates a short palindromic sequence.
  4. Exonuclease trimming sometimes occurs at the V-D-J and V-J junctions, causing loss of nucleotides.
  5. Nontemplated (N)-nucleotide addition by TdT in heavy-chain V-D and D-J junctions and from DNA polymerase μ in both heavy and light chains.
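Mechanisms 1 and 2 are multiplicative, which a short calculation makes concrete. The segment counts in this sketch are placeholders invented for illustration, not measured numbers for any species, and the function name is hypothetical. Junctional diversity (mechanisms 3 to 5) is deliberately left out because it cannot be counted this simply.

```python
from math import prod


def combinatorial_diversity(heavy: dict, light_loci: dict) -> int:
    """Count heavy/light variable-region pairings from segment numbers alone.

    heavy maps segment type (V, D, J) to the number of usable segments.
    light_loci maps each light-chain locus to its own V and J counts.
    A B cell uses one light-chain locus, so the loci are summed, not multiplied.
    """
    heavy_combinations = prod(heavy.values())
    light_combinations = sum(prod(locus.values()) for locus in light_loci.values())
    return heavy_combinations * light_combinations


# Placeholder counts, chosen only to make the arithmetic easy to follow.
example_heavy = {"V": 40, "D": 20, "J": 5}
example_light = {"kappa": {"V": 30, "J": 5}, "lambda": {"V": 30, "J": 4}}

print(combinatorial_diversity(example_heavy, example_light))
```

With these placeholder values, 4000 heavy-chain combinations pair with 270 light-chain combinations to give 1,080,000 possible receptors before any junctional diversity is considered.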

TABLE 6-3 Comparison of the mechanisms for the generation and expression of diversity among B-cell and T-cell receptor molecules

| Mechanism | Used in B cells | Used in T cells | Comments |
| --- | --- | --- | --- |
| Multiple germ-line V(D)J genes | Yes | Yes | The mouse Vλ locus has undergone a severe contraction, and, therefore, only 5% of mouse light chains are of the λ type. The TCR γ-chain locus also has few V genes. J region diversity is notably higher in TCR α-chain genes than in other TCR or Ig genes |
| Light-chain segment use | κ and λ variable regions encoded by V and J segments | α and γ variable regions encoded by V and J segments | |
| Heavy-chain segment use | VH regions encoded by V, D, and J segments | β and δ variable regions encoded by V, D, and J segments | |
| Absolute dependence on RAG1/2 expression | Yes | Yes | |
| Junctional diversity: P-nucleotide and N-nucleotide addition | Yes | Yes | Many fewer N nucleotides found in Ig light chains because of developmental regulation of TdT |
| Multiple D regions per recombined chain | No | Present only in TCR δ | The presence of two D segments allows an additional site for N-nucleotide addition |
| Allelic exclusion of receptor gene expression | Absolute | Allelic exclusion of TCR α genes is not absolute | |
| On activation, secretes product with the same binding site as the receptor | Yes | No | |
| Nature of constant region determines function | Yes; constant region of secreted antibody product determines its function. Constant region of membrane receptor anchors receptor in membrane and connects with signal transduction complex | No secreted product. Constant region of membrane receptor anchors receptor in membrane and connects with signal transduction complex | |
| Receptor genes undergo somatic hypermutation following antigenic stimulation | Yes | No | |

Mechanisms 3, 4, and 5 give rise to striking sequence diversity at the junctions between gene segments, and result in the formation of the highly variable CDR3 regions of the antibody heavy and light chains.
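Mechanisms 3, 4, and 5 can be illustrated with a toy simulation of a single coding end. This is a sketch under stated assumptions, not a model of the enzymology: the function names, the example sequence, and the order and extent of each step are hypothetical choices made for illustration. Opening the hairpin `offset` bases away from its tip leaves a single-stranded overhang whose fill-in extends the coding end by the reverse complement of its last `offset` bases, which is what makes the P-nucleotide stretch palindromic.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def add_p_nucleotides(coding_end, offset):
    """Asymmetric hairpin opening: offset 0 means opening at the tip (no P nucleotides)."""
    if offset == 0:
        return coding_end
    return coding_end + reverse_complement(coding_end[-offset:])

def trim(seq, n):
    """Exonuclease trimming: remove n nucleotides from the end."""
    return seq[:max(0, len(seq) - n)]

def add_n_nucleotides(seq, n, rng):
    """Nontemplated addition: append n randomly chosen nucleotides."""
    return seq + "".join(rng.choice("ACGT") for _ in range(n))

def process_coding_end(coding_end, offset, trimmed, n_added, rng):
    """Apply hairpin opening, trimming, and N addition in that (assumed) order."""
    seq = add_p_nucleotides(coding_end, offset)
    seq = trim(seq, trimmed)
    return add_n_nucleotides(seq, n_added, rng)

if __name__ == "__main__":
    rng = random.Random(1)  # seeded so the run is repeatable
    # Hypothetical coding end; opening 2 bases from the tip adds 2 P nucleotides.
    print(add_p_nucleotides("ACGTTG", 2))
    print(process_coding_end("ACGTTG", 2, 1, 3, rng))
```

Because each of the three parameters can vary from one rearrangement to the next, two cells that join the same pair of gene segments will usually end up with different junctional sequences.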

Together, these five mechanisms are responsible for the creation of the repertoire of BCRs that is available to organisms before any contact with pathogens or other antigens has occurred, the so-called naïve BCR repertoire.

Note that we have described the process of the generation of the primary Ig variable region repertoire as it occurs in humans and rodents. Although the same principles apply to most vertebrate species, different species have evolved their own variations. For example, the process of gene conversion is used in chickens and rabbits, and some species, such as sheep and cows, use somatic hypermutation in the generation of the primary as well as the antigen-experienced repertoire.

The Regulation of V(D)J Gene Recombination Involves Chromatin Alteration

In attempting to understand the complex process of V(D)J recombination and its regulation, investigators must address the question of how Ig gene recombination happens only at particular stages of B-cell development and how two RSSs, located many kilobases or even megabases apart in the linear DNA sequence, are brought into sufficiently close apposition for accurate recombination to succeed.

Because of its capacity to induce genome instability by introducing double-strand breaks in DNA, the expression of the RAG1/2 complex must be tightly regulated. The enzyme complex is expressed only in lymphoid cells at specific periods in lymphoid development (see Chapters 8 and 9). Furthermore, as described earlier, it is inactivated prior to the cell’s entry into S phase, when double-strand breaks in the DNA might interfere with regulated chromatin distribution into daughter cells. Kinases coupled to the cell cycle phosphorylate RAG2 prior to the G1-S cell cycle transition, targeting RAG2 for ubiquitin-dependent protein degradation prior to entry into S phase (see Figure 6-10c).

However, once RAG1/2 is expressed, how does the cell ensure that its activity is appropriately restricted to the correct sites on the chromatin? Although the native RAG recombinase complex is quite difficult to isolate, a core, catalytically active RAG1/2 heterodimer can be purified quite readily, and has been used to determine the binding specificity and orientation of the recombinase.

On isolated DNA fragments, the core RAG recombinase binds specifically to recombination signal sequences (RSSs), although it also binds to other sites on the genome that lack extensive sequence homology to the RSS. The nonamer-binding domains of RAG1 interact with the A-rich tract of the RSS nonamer. Additional regions in the RAG1 core also interact with the RSS heptamer and with the end of the V, D, or J coding sequence, as well as mediating the catalytic reaction. In general, the isolated core recombinase operating on purified DNA fragments tolerates considerably more sequence variation in the RSS spacer, the heptamer, and even the nonamer than is observed in vivo.
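The idea of tolerated sequence variation can be made concrete with a small RSS scanner. This is a rough sketch, not a model of RAG binding affinity: it uses the widely cited consensus heptamer (CACAGTG) and nonamer (ACAAAAACC) with 12- or 23-bp spacers, which are not stated in this section, and it treats a simple mismatch count as a crude stand-in for tolerance. The function names and the test sequence are hypothetical.

```python
HEPTAMER = "CACAGTG"    # consensus RSS heptamer
NONAMER = "ACAAAAACC"   # consensus RSS nonamer (note the A-rich tract)
SPACERS = (12, 23)      # spacer lengths in base pairs

def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def find_rss(seq, max_mismatches=0):
    """Return (start, spacer_length, mismatch_count) for each candidate RSS.

    Spacer sequence is ignored; only heptamer and nonamer are scored.
    """
    hits = []
    for start in range(len(seq)):
        for spacer in SPACERS:
            nonamer_start = start + len(HEPTAMER) + spacer
            end = nonamer_start + len(NONAMER)
            if end > len(seq):
                continue
            total = (mismatches(seq[start:start + len(HEPTAMER)], HEPTAMER)
                     + mismatches(seq[nonamer_start:end], NONAMER))
            if total <= max_mismatches:
                hits.append((start, spacer, total))
    return hits

if __name__ == "__main__":
    # Hypothetical fragment carrying a 12-bp-spacer RSS with one heptamer change.
    variant = "GG" + "CACAGTC" + "T" * 12 + NONAMER + "GG"
    print(find_rss(variant, max_mismatches=0))
    print(find_rss(variant, max_mismatches=1))
```

Raising `max_mismatches` mimics the more permissive behavior of the isolated core recombinase: sites that a strict search rejects become acceptable targets.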

In vivo, the catalytic activity of RAG1/2 occurs in an extraordinarily complex chromosomal environment, and analysis of V(D)J recombination regulatory mechanisms in the native chromosomal context has required the refinement of techniques capable of analyzing the interactions between proteins and nuclear DNA folded within its native chromatin structure. Three such techniques are described in Chapter 20: chromatin immunoprecipitation (ChIP); multicolor, three-dimensional fluorescence in situ hybridization (3-D FISH); and methods that identify DNA sequences that interact with one another within the context of active chromatin (e.g., Hi-C). These approaches were all used to generate the information described in this section.

Changes in Histone Marks

RAG1/2 binding is affected by particular epigenetic modifications on the histones associated with target sequences. Recall that eukaryotic DNA is wound around histone octamers to form nucleosomes. The core DNA that is directly associated with each nucleosome is 147 nucleotides in length, and nucleosomes are separated from one another by linker DNA sequences of up to 80 nucleotides in length that interact with histone H1. This “beads on a string” nucleosomal DNA is then coiled into structures of increasing complexity. Histone modifications, such as methylation or acetylation, can affect the degree to which the DNA in the associated chromatin is accessible to enzymatic activities, such as recombination or transcription, by altering the extent of nucleosome packing. The nature of histone modifications or epigenetic marks associated with a set of genes is referred to as its “histone code.” Alterations in the histone code of chromatin associated with immunoglobulin DNA during B-cell development signal the onset of receptiveness of the Ig locus to transcription and recombination.

Analysis of the biochemical basis for RAG recombinase binding to chromatin shows that the RAG2 plant homeodomain region (see Figure 6-10c) interacts with histone H3 that has been trimethylated at the lysine in position 4 of the histone’s amino acid sequence (H3K4me3). This H3K4me3 modification is typically found at transcriptional start sites in active chromatin. Disruption of this RAG2-histone interaction inhibits V(D)J recombination. Biochemical experiments have shown that RAG2 binding to H3K4me3 increases the affinity of the RAG complex for its DNA substrates, possibly by inducing an activating conformational change in RAG1.

Furthermore, it has long been known that one of the earliest steps in Ig gene recombination is the transcription of noncoding RNA from promoters in DNA regions near the Ig gene segments. This germ-line transcription, irrespective of the nature of the RNA product, confirms that the DNA is now accessible for enzymatic manipulation. RNA polymerase II, the enzyme that transcribes the immunoglobulin genes and initiates the germ-line transcription process alluded to above, often travels with the histone methyltransferases, suggesting that the histone modifications that signal active chromatin are mechanistically linked to the germ-line transcription event and that together, they signal the readiness of the germ-line immunoglobulin DNA for recombination.

In immunoglobulin genes, both the trimethylated lysine histone modification and acetylation of histone residues, which also signals open chromatin, are concentrated in the J-gene segment regions of both heavy and light chain–encoding DNA, with a few trimethylated histones found in association with J-proximal D gene segments. Thus, the nature of the histone code directs the recombination apparatus first to the J regions of immunoglobulin heavy and light chains.

Changes in Higher Order Chromatin Structure

Because the V, D, and J gene segments are so spread out along the chromosome, higher order chromatin structure must also play a role in the regulation of V(D)J recombination. Chromatin visualization techniques have shown that chromatin folds extensively into loops of various lengths that cluster into the form of rosettes (Figure 6-13a). Clustering is regulated by the binding of proteins to specific sites on the DNA. Notable among these site-specific DNA-binding proteins is the factor CTCF, which binds specifically to regions with the DNA sequence CCCTC. Recent experiments have demonstrated that the three-dimensional structure of these loops is altered in real time in a surprisingly orderly fashion and affects variable region recombination.

Illustrations show the chromatin configurations in a pre-pro-B cell and a pro-B cell, and micrographs show the same.

FIGURE 6-13 Three-dimensional organization of chromosomal regions containing V, D, and J segments changes during B-cell development. (a) In the earliest stage of B-cell development, the pre-pro stage, the region of the chromosome encoding the heavy chain of the Ig protein is folded into three clearly demarcated rosettes. One of these includes loops of DNA encoding the distal VH regions (those farthest from the CH complex); the second the more proximal array of VH regions; and the third the D, JH, and CH regions. Since recombination occurs only within, but not between rosettes, this functionally restricts recombination to occurring at the D-JH, but not the VH-D, junctions during the earliest developmental stage. At the pro-B-cell stage, the rosette structure is altered and VH-D recombination is permitted. (b) Chromosomes were labeled with bacterial artificial chromosome (BAC) probes that include the entire Ig region. In pre-pro-B cells, the Ig genes could be seen clustered into two or three regions, whereas in pro-B cells, in which H-chain rearrangement is occurring, chromosomal contraction and re-localization bring the H-chain chromosome into a single cluster.

Detailed analysis of the three-dimensional structure of the Ig heavy-chain gene locus has indicated that it initially appears to be arranged in space into three rosette-containing chromatin regions. One of these regions contains the distal VH genes (those farthest from the D region); a second contains the proximal VH genes (those nearest to the D region); and the third contains the gene segments of the D, JH, and CH regions. Since recombination events are topologically limited to include only genes within a rosette, the rosette loop that contains the D, JH, and CH regions defines the scope of RAG activity in the earliest B lymphoid precursors, pre-pro-B cells. Once DH-JH recombination has occurred, the loop structure is altered to allow VH-DH recombination at the pro-B-cell stage. Figure 6-13a illustrates the change in the structure of the Ig loci as development proceeds.
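The topological rule described here, that recombination is limited to segments sharing a rosette, can be expressed as a simple lookup. This is a deliberately simplified sketch: the rosette numbering, the segment-group labels, and the treatment of the pro-B locus as a single merged cluster are assumptions made for illustration.

```python
# Rosette membership of each heavy-chain segment group at two stages.
# Rosette numbers are arbitrary labels; only equality matters.
PRE_PRO_B = {
    "distal_VH": 0,
    "proximal_VH": 1,
    "D": 2,
    "JH": 2,
    "CH": 2,
}

# ASSUMED simplification: after locus contraction all groups share one cluster.
PRO_B = {group: 0 for group in PRE_PRO_B}

def can_recombine(group_a, group_b, rosettes):
    """Recombination is permitted only between groups in the same rosette."""
    return rosettes[group_a] == rosettes[group_b]

if __name__ == "__main__":
    for stage, rosettes in (("pre-pro-B", PRE_PRO_B), ("pro-B", PRO_B)):
        print(stage,
              "D-JH:", can_recombine("D", "JH", rosettes),
              "proximal VH-D:", can_recombine("proximal_VH", "D", rosettes),
              "distal VH-D:", can_recombine("distal_VH", "D", rosettes))
```

The lookup reproduces the ordering in the text: only D-JH joining is possible in the pre-pro-B configuration, and VH-D joining becomes possible once the loop structure changes.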

The alteration in chromatin topology can be visualized microscopically as a locus contraction event (Figure 6-13b) and has been shown to depend on the binding of transcription factors, including Pax-5, to the chromatin. It is thought that Pax-5, a key transcription factor inducing the formation of B lymphocytes (see Chapter 9), interacts with proteins that control the formation of the base of the loops. Once V(D)J recombination has occurred successfully on one allele, the inactive chromosome is decontracted.

Although it is tempting to speculate that selective placement of histone marks and regulated contraction of the chromosomal regions bearing Ig genes can together explain the exquisitely controlled ordering of Ig gene rearrangements, it is now clear that other factors are involved. Specifically, observations of the Igκ locus demonstrated that it is contracted in both pro-B cells and pre-B cells and has the potential for similar levels of long-range interactions at both of these cell stages. However, Vκ rearrangements do not occur at the pro-B-cell stage of development, but rather are delayed until after IgH rearrangements are complete, at the pre-B-cell stage. If the Vκ locus is contracted as early as the pro-B-cell stage, what is preventing Vκ rearrangement from occurring then?

The Intranuclear Localization of Antigen Receptor Chromatin

A further aspect of the regulation of RAG activity concerns the manner in which the intra-nuclear localization of the antigen receptor chromatin is altered in order to make available the relevant genes to the recombinase. Within the nucleus, inactive chromatin is found in regions associated with the nuclear lamina, which lies immediately inside the nuclear membrane. Indeed, some parts of the chromatin can be shown to be tethered to the nuclear lamina. Such inactive chromatin is unable to participate in either transcription or recombination.

In contrast, chromatin located in the general nucleoplasm tends to be more active. Considerable data now suggest that antigen receptor loci move away from the nuclear envelope prior to recombination and that those alleles that are excluded from productive rearrangement re-associate with the envelope once recombination terminates. The movement away from the nuclear lamina occurs subsequent to increased histone acetylation at Ig loci.

Figure 6-14 illustrates the sequence of movement of chromosomes within the nucleus during B-cell development. The IgH locus in hematopoietic progenitor cells and early pre-pro-B cells is associated with the inner nuclear lamina. Thus, colocalization with the nuclear lamina as well as the physical nature of the chromatin loops ensures that the only transcriptional and associated recombinational events that can occur are restricted to the heavy-chain D and J regions. As the B cell enters the pro-B-cell stage, the V gene locus moves away from the nuclear lamina and the entire locus contracts under the influence of Pax-5, facilitating rearrangements with distal as well as proximal VH gene segments.

An illustration shows the configurations of Ig chromosomal regions in a pre-pro-B cell, a pro-B cell, and a pre-B cell.

FIGURE 6-14 Nuclear positioning of IgH and Igκ loci alters during B-cell development. In pre-pro-B cells, both light- and heavy-chain immunoglobulin loci are located close to the nuclear lamina, in regions of heterochromatin. The Ig chromosomal regions are extended and do not permit recombination. In pro-B cells, in which Ig heavy-chain recombination has been initiated, the heavy-chain chromosomes can be found in the interior of the nucleus and the Ig chromosomal regions are contracted, so as to bring regions of recombination into closer proximity. The light-chain chromosome is also contracted, but it remains at the periphery by the nuclear envelope. In pre-B cells, light-chain recombination occurs. In these cells, again, both light- and heavy-chain chromosomes are contracted, but this time, the heavy-chain chromosomes are located at the periphery of the nucleus and the light-chain chromosomes are brought into a more central location. See text for details and chapter-opening photo for fluorescence microscopy data.

Correspondingly, movement of the Igκ locus into the central nucleoplasmic region has also been demonstrated at the pre-B-cell stage, when light-chain rearrangement occurs, indicating that the positioning of the immunoglobulin loci in the nuclear environment, as well as the extent of locus contraction at Ig genes, together determine the capacity for recombination.
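The joint requirement for central positioning and locus contraction can be summarized as a two-condition rule. The sketch below encodes the locus states as described for Figure 6-14; the data structure, the names, and the rule itself are simplifying assumptions, and the rule is meant to cover V-segment rearrangement only (D-JH joining at the earliest stage lies outside it).

```python
from typing import NamedTuple

class LocusState(NamedTuple):
    position: str      # "lamina" (nuclear periphery) or "interior"
    contracted: bool

# Locus states by developmental stage, following the Figure 6-14 description.
STATES = {
    "pre-pro-B": {"IgH": LocusState("lamina", False),
                  "Igk": LocusState("lamina", False)},
    "pro-B":     {"IgH": LocusState("interior", True),
                  "Igk": LocusState("lamina", True)},
    "pre-B":     {"IgH": LocusState("lamina", True),
                  "Igk": LocusState("interior", True)},
}

def permissive_for_v_rearrangement(stage, locus):
    """ASSUMED rule: a locus must be both interior and contracted."""
    state = STATES[stage][locus]
    return state.position == "interior" and state.contracted

if __name__ == "__main__":
    for stage in STATES:
        for locus in ("IgH", "Igk"):
            print(stage, locus, permissive_for_v_rearrangement(stage, locus))
```

Under this rule the Igκ locus in pro-B cells is contracted but still not permissive, because it remains at the nuclear periphery, which is consistent with the delay of Vκ rearrangement until the pre-B-cell stage.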

Following a productive recombination at one Ig receptor allele, the potential for recombination at the corresponding allele is shut down, a process known as allelic exclusion (discussed further below). At this point, it remains unclear whether allelic exclusion is correlated with movement of the excluded allele back toward the nuclear lamina. However, in the case of the IgH locus, suppression of the inactive allele has been associated with relocation to heterochromatic regions of the nucleus under the influence of the action of the ataxia telangiectasia mutated (ATM) protein.