**2. Organization of the Ig Loci**

114 Gene Duplication

In this article we will focus on the genes encoding the VH domain of Ig (Fig. 1A) as well as those encoding the so-called "constant regions" (Cγ) of IgG, the mammalian flagship antibody isotype. We use as examples the sequences of duplicated VH genes in swine and bats (opposite extremes) and the duplicated Cγ genes encoding the subclasses of swine IgG, as evidence to suggest how this polygeny occurred.

[Figure 1. Panel A: Ig domain. Panel B: multidomain Ig, with the VH and VL domains labeled.]

Fig. 1. Duplication/diversification of Ig genes resulted in macromolecules with repeating units. A. The variable heavy chain domain (VH) with its characteristic β-barrel or Ig fold. The dark polypeptides connecting the "β-barrel staves" contain the CDR regions. B. Complete Igs are multidomain molecules comprised of many Ig fold domains. The CDR-containing peptides occur only in the VH and VL domains. The remaining C-domains comprise the constant region of the Ig. The "monomeric Ig" shown in Fig. 1B is bivalent, with two identical VH/VL pairs that contain the antigen binding sites.

In the interests of those who are not immunologists, we describe the different Ig-loci, how they vary among vertebrates and the processes involved in the generation of the antibody repertoire (Section 2). Section 3 discusses the gene duplication phenomenon which resulted in the polygeny that characterizes the vertebrate Ig genome, while Section 4 reviews the somatic processes that lead to the synthesis and secretion of antibodies in higher vertebrates. Section 5 discusses the selection processes involved in gene usage. Finally, we provide data from studies in fetal/neonatal piglets, newborn rabbits and the chicken, to support the view that only a small number of the many duplicated V-region Ig genes are actually used. We provide examples in which only one or a few VH genes are needed to generate the antibody repertoire so long as the machinery for somatic recombination and somatic mutation is in place. Based on these examples and comparing them to antibody repertoire development in lower vertebrates, we hypothesize that the extensive polygeny in the Ig loci of higher vertebrates exists as an evolutionary vestige but is retained because of its redundancy value. While the recent duplication/diversification of Cγ allows for specialized effector function, IgG in rabbits did not diversify, yet few would argue against the success of this mammalian order. Thus, many of the Cγ duplicons in other mammals may also have been retained for

### **2.1 Translocon organization of gene segments characterizes higher vertebrates**

Figure 2A shows the organization of the light and heavy chain loci. The heavy chain locus can be divided into subloci that, from 5' to 3', are known as the V, D, J and C regions. The light chain loci are similar but lack D subloci. As discussed above, the V, D and J regions (V and J for the light chains) encode the antigen-binding site of the heavy and light chains, and are comprised of a large number of duplicated gene segments whose number varies among species (Table 1). The VH and VL gene segments are the largest (~300 nucleotides) and encode both framework regions (FR) and CDR1 and CDR2. The FRs encode the β-pleated sequences of the β-barrel (Fig. 1A). Displayed in linear fashion, FR1, CDR1, FR2, CDR2 and FR3 comprise a VH (or VL) gene (Figs. 4 and 5). The 3' portion of the JH segment (after the tryptophan codon; Fig. 8) encodes FR4, while CDR3 results from the recombination of V-D-J or V-J (see Section 4).
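The linear layout described above can be restated as a small lookup table. The sketch below is only an illustration for the heavy chain: `REGIONS`, `regions_encoded_by` and `regions_encoded_only_by` are names introduced here for convenience and do not come from any published tool.

```python
# Regions of a rearranged heavy chain variable domain, listed 5' to 3',
# paired with the germline segment(s) that contribute to each one.
# This simply restates the paragraph above; all names are hypothetical.
REGIONS = [
    ("FR1", ("V",)),
    ("CDR1", ("V",)),
    ("FR2", ("V",)),
    ("CDR2", ("V",)),
    ("FR3", ("V",)),
    ("CDR3", ("V", "D", "J")),  # formed at the V-D-J junction
    ("FR4", ("J",)),            # 3' portion of the JH segment
]


def regions_encoded_by(segment):
    """Return every region to which the given germline segment contributes."""
    return [name for name, sources in REGIONS if segment in sources]


def regions_encoded_only_by(segment):
    """Return the regions encoded entirely by one germline segment."""
    return [name for name, sources in REGIONS if sources == (segment,)]


print(regions_encoded_only_by("V"))  # ['FR1', 'CDR1', 'FR2', 'CDR2', 'FR3']
print(regions_encoded_by("J"))       # ['CDR3', 'FR4']
```

For a light chain the same table would apply with the D entry removed from CDR3, since CDR3 then results from V-J recombination.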

Fig. 2. The translocon organization of Ig genes of mammals. A. Organization of the variable region gene segments of the human heavy chain (VH), kappa (κ) and lambda (λ) loci. Brackets indicate the number (n) of gene segments of a particular type. Switch regions are depicted with diagonal stripes. B. Organization of the constant region of the heavy chain locus of human and rabbit. The site of intralocus segment duplication in humans is indicated.

Immunoglobulin Polygeny: An Evolutionary Perspective 117

### **2.2 The Ig loci in sharks and the chicken are differently organized**

Figure 3 illustrates the organization of V-D-J-C segments in three different shark species, in which the segments are arranged as repeating cassettes rather than in translocon fashion. Interestingly, Figure 2A also shows that an apparent evolutionary remnant of this form of organization is still found in the lambda light chain locus of mammals. In the shark, an entire cassette is used for encoding an antibody; recombination among cassettes is unusual. Furthermore, segments within the cassettes of certain sharks are fused in the genome so recombination (Section 4) does not occur. It is believed that the tandem repeat system of sharks later evolved into the translocon system (Marchalonis et al., 1998). In the translocon system, recombination among the various V, D, and J segments can occur and the rearranged VDJ is later spliced to a C region exon (Fig. 4A).
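One rough way to see what the translocon arrangement adds is to count the V-D-J combinations that each organization permits. The sketch below is a toy model under stated assumptions: each cassette is treated as a single fixed combination with no recombination between cassettes, junctional diversity is ignored, and the segment counts are hypothetical values chosen only for illustration. The function names are likewise introduced here.

```python
from math import prod


def translocon_combinations(n_v, n_d, n_j):
    """Any V may join any D and any J, so the choices multiply."""
    return prod((n_v, n_d, n_j))


def cluster_combinations(n_clusters):
    """Each cassette is used as a unit, so each contributes one combination."""
    return n_clusters


# Hypothetical example: the same 30 gene segments arranged two ways.
# Translocon: 10 V + 10 D + 10 J segments in separate subloci.
print(translocon_combinations(10, 10, 10))  # 1000
# Cassettes: 10 clusters, each holding one V, one D and one J.
print(cluster_combinations(10))             # 10
```

Under these assumptions the same number of segments yields far more combinations in a translocon than in a cassette arrangement, which is the contrast the text draws between the two systems.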

The chicken also displays a translocon system but there is only one functional VH (and one functional Vλ; not shown), multiple highly similar DH segments and only one JH. All VH genes upstream of VH1 in the chicken are pseudogenes (Fig. 3; Ratcliffe, 2006). These pseudo VH genes are used in somatic gene conversion (SGC) to create the chicken antibody repertoire (Reynaud et al., 1987; Ratcliffe, 2006).

[Figure 3. Shark: three cluster types, each drawn as VH-DH-DH-JH-CH. Chicken: ψVH-VH1-DH-JH-CH. Values shown in the figure: n=25, n=14, n=1, n=200.]

Fig. 3. Organization of heavy chain loci in sharks and chicken. Three different types of clusters are shown for sharks, some in which VDJs are fused in the genome; n = number of repeating clusters. Modified from Dooley & Flajnik, 2006. In the diagram for chicken, the number (n) of gene segments of each type is indicated. Only VH1 of chicken is a functional VH gene.

**3. Duplication and diversification of Ig genes**

### **3.1 VH genes display evidence of duplication and genomic gene conversion**

Genomic gene conversion was originally described in yeast (Meselson & Radding, 1975; Szostak et al., 1983) and is a form of non-reciprocal recombination in which the end result

The C-region sublocus is composed of exons of the genes that encode the "constant" domains of the antibody molecule (Fig. 1B). Fig. 2A illustrates the four exons that encode IgM (Cμ). Each exon encodes one of the constant region domains illustrated in Fig. 1B. Each domain possesses the β-barrel structure that characterizes the minimal structure of proteins encoded by IGSF genes (Fig. 1B). Within the C-region sublocus, sets of exons encode different antibody isotypes, e.g. IgM, IgD, IgG, IgE and IgA (Fig. 2B). Two variations of the C-region sublocus are illustrated by the human and rabbit. The distribution of the encoded isotypes among common vertebrates is summarized in Table 2. Of interest to the theme of this review, the C-region sublocus contains stretches of exons of duplicated Cγ genes encoding IgG subclasses (Fig. 2B; most mammals) or multiple IgA subclasses (Fig. 2B; in rabbit).

Table 1. Variable region gene duplication among mammalian antibody genes

| Species | VH (F\*) | DH | JH | Vλ (F\*) | Jλ | Cλ | Vκ (F\*) | Jκ | Cκ |
|---|---|---|---|---|---|---|---|---|---|
| Human | 87 (7) | 30 | 9 | 70 (7) | 7 | 7\*\* | 66 (7) | 5 | 1 |
| Mouse | >100 (14) | 11 | 4 | 3 (3) | 4 | 4\*\* | 140 (4) | 4 | 1 |
| Rat | >100 (11) | ? | 5 | 15 (4) | 1 | 1 | 18 (?) | 6 | ? |
| Rabbit | >100 (1) | 12 | 6 | ? (?) | 2 | 2 | >36 (?) | 5 | 2 |
| Swine | >20 (1) | 2 | 1 | ? (4) | >3 | >3 | 60 (2) | 5 | 1 |
| Horse | >10 (2) | >7 | >5 | 25 (3) | 4 | 4 | >20 (?) | 5 | 1 |
| Cattle | >15 (2) | 3 | 5 | 30 (?) | >2 | 4 | ? (?) | ? | 1 |
| Sheep | >10 (1) | ? | 6 | >100 (3) | 2 | 2 | 10 (4) | 3 | 1 |
| Camelid | VH 50 (1); VHH 42 (1) | 10 | 6 | ? (?) | ? | 2 | ? (?) | ? | ? |
| Bat | >250 (5) | ? | 13 | ? (?) | ? | ? | ? (?) | ? | ? |
| Opossum | 12 | ? | ? | 30 (3) | 6 | 6 | 35 (4) | >2 | 1 |
| Platypus | 25 (1) | >5 | 7 | 15-25 (2) | 6 | 4 | ? (4) | ? | ? |

\* Number of families (F) of variable region genes.

\*\* J-C occurs as duplicons (see Fig. 2A).

Table 2. Constant region gene duplication among mammalian antibody genes

| Species | IgM (Cμ) | IgD (Cδ) | IgG (Cγ) | IgE (Cε) | IgA (Cα) | Cλ | Cκ |
|---|---|---|---|---|---|---|---|
| Human | 1 | 1 | >4 1\* | >1 1\* | 2 | >4 3\* | 1 |
| Mouse | 1 | 1 | 4 | 1 | 1 | >3 1\* | 1 |
| Rat | 1 | 1 | 4 | 1 | 1 | 1 | ? |
| Rabbit | 1 | 0 | 1 | 1 | 13 | 8 | 2 |
| Swine | 1 | 1 | 6 | 1 | 1 | >3 | 1 |
| Horse | 1 | 1 | 7 | 1 | 1 | 4 | 1 |
| Cattle | 1 | 1 | 3 | 1 | 1 | 4 | 1 |
| Sheep | 1 | 1 | >2 | 1 | 1 | >1 | 1 |
| Camel | 1 | ? | >3 4\* | ? | ? | 2 | 1 |
| Cat | 1 | ? | >2 | 1 | ? | >1 | 1 |
| Dog | 1 | 1 | 4 | 1 | 1 | >1 | 1 |
| Bat | 1 | 1\*\* | 1 5\*\* | 1 | 1 | ? | ? |
| Opossum | 1 | 0 | 1 | 1 | 1 | 6 | 1 |
| Platypus | 1 | 0 | 2 | 1 | 2 | 4 | ? |

\* Additional pseudogenes

\*\* Varies with species
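The heavy chain columns of Table 1 can be turned into a rough count of possible V-D-J segment combinations per species. The sketch below is illustrative only: it takes the tabulated numbers at face value, stores an entry such as ">100" as the lower bound 100, skips species with unknown ("?") entries, and ignores junctional diversity and light chain pairing. Table 1 does not say how many of these segments are functional, so the products say nothing about actual gene usage. `HEAVY_CHAIN` and `vdj_lower_bound` are names introduced here.

```python
# Heavy chain segment counts (VH, DH, JH) copied from Table 1.
# ">" entries are stored as lower bounds; "?" entries are stored as None.
HEAVY_CHAIN = {
    "Human": (87, 30, 9),
    "Mouse": (100, 11, 4),    # VH listed as >100
    "Rat": (100, None, 5),    # VH listed as >100, DH unknown
    "Rabbit": (100, 12, 6),   # VH listed as >100
    "Swine": (20, 2, 1),      # VH listed as >20
    "Cattle": (15, 3, 5),     # VH listed as >15
}


def vdj_lower_bound(counts):
    """Multiply the VH, DH and JH counts; return None if any count is unknown."""
    if any(c is None for c in counts):
        return None
    n_v, n_d, n_j = counts
    return n_v * n_d * n_j


for species, counts in HEAVY_CHAIN.items():
    print(species, vdj_lower_bound(counts))
# Human 23490
# Mouse 4400
# Rat None
# Rabbit 7200
# Swine 40
# Cattle 225
```

On these figures the swine heavy chain locus offers far fewer segment combinations than the human locus, which is the kind of contrast that bears on whether extensive polygeny is needed to generate an antibody repertoire.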
