**Origin of the Genetic Code and Genetic Disorder**

### Kenji Ikehara

*The Open University of Japan, Nara Study Center; International Institute for Advanced Studies of Japan, Japan*

### **1. Introduction**

Genetic disorders are illnesses caused by abnormalities in genetic sequences or in chromosome structure. Most base substitutions that could lead to genetic disorders are held to a low level, affecting only about one person in several thousand to several million, by replication repair systems and by the robustness of the genetic code, which is discussed in this chapter. Once a person is affected by a genetic disorder, however, the resulting disease is likely to be serious and lifelong. In addition, it is very difficult to restore the substituted bases that cause a genetic disease to the original bases once the disorder has arisen. This makes genetic disorders a major problem from the standpoint of medical treatment.

The mutations that cause genetic disorders are scattered throughout genes and their neighboring regions, as shown in Figure 1 (A). It is also known that many genetic diseases are induced by single-base substitutions, that is, missense or nonsense mutations, in regions encoding the amino acid sequences of proteins. For instance, sickle-cell anemia, one of the classical genetic disorders, is caused by a one-base replacement from A to U at the sixth codon of the β-globin gene, which results in a single amino acid substitution from glutamic acid to valine and produces an abnormal hemoglobin called hemoglobin S (Figure 1 (B)). Hemoglobin S aggregates inside red blood cells, especially at low oxygen levels, and distorts their shape, resulting in anemia, although it also gives the patient resistance to malaria. Phenylketonuria (PKU), adenosine deaminase (ADA) deficiency and galactosemia are likewise caused by one-base replacements in the genes for phenylalanine hydroxylase, adenosine deaminase and galactose-1-phosphate uridylyltransferase (GALT), respectively (Table 1). Deletions and insertions of a small number of bases, which cause frameshift mutations in a protein-coding sequence, may of course also impair normal life activities, because a frameshift mutation changes the entire amino acid sequence downstream of the mutation site. Base substitutions may also occur in transcriptional and translational control regions, splicing sites and so on, where they affect various steps of gene expression and lead to the synthesis of lower or higher amounts of protein than normal, resulting in many kinds of genetic diseases (Figure 1 (A)).
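The effect of a one-base replacement on the encoded amino acid can be checked with a few lines of Python. The sketch below is only an illustration: the function name `classify_substitution` is hypothetical, and the Glu codon GAG used for the sickle-cell example is the commonly cited codon at that site, which the text above does not state explicitly.

```python
from itertools import product

BASES = "UCAG"
# Universal (standard) genetic code; codons are ordered UUU, UUC, UUA, UUG, UCU, ...
# and "*" marks a termination codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(codon, position, new_base):
    """Classify a one-base substitution in an mRNA codon.

    `position` is 1, 2 or 3. Returns (mutant_codon, old_aa, new_aa, kind),
    where kind is "synonymous", "missense" or "nonsense".
    """
    mutant = codon[:position - 1] + new_base + codon[position:]
    old_aa, new_aa = CODE[codon], CODE[mutant]
    if new_aa == old_aa:
        kind = "synonymous"
    elif new_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return mutant, old_aa, new_aa, kind


# A -> U at the second position of the Glu codon GAG (the sickle-cell replacement).
print(classify_substitution("GAG", 2, "U"))
# A third-position change of the same codon, for comparison.
print(classify_substitution("GAG", 3, "A"))
```

The same call distinguishes the three outcomes discussed in this section: an unchanged amino acid, an amino acid replacement, and a premature termination codon.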


Fig. 1. (A) Possible mutation sites that may affect various steps of gene expression and the catalytic functions of proteins. Dark and white horizontal bars indicate exons, which encode the amino acid sequence of a protein, and introns, which carry no genetic information for protein synthesis, respectively. The capital letters P and T denote a promoter for transcription initiation and a terminator required for termination of mRNA synthesis, respectively. Thick upward open arrows, thick upward closed arrows and thin downward arrows indicate insertions of DNA sequences, deletions of DNA sequences and one-base substitutions, respectively. (B) Amino acid replacement observed in a classical and well-known genetic disorder, sickle-cell anemia. Red letters indicate the replaced amino acid and the replaced base in the mRNA sequence.


Table 1. Examples of representative genetic disorders caused by one-base replacements in genetic sequences encoding the amino acid sequences of proteins

| Genetic disorder | Inheritance | Gene |
|---|---|---|
| Hailey-Hailey disease | Autosomal dominant | ATP2C1 |
| Adenosine deaminase deficiency | Autosomal recessive | ADA |
| Thalassemia | | globins |
| Alstrom syndrome | | ALMS1 |
| Tangier disease | | ABCA1 |
| Phenylketonuria | | PAH |
| Galactosemia | | GALT |
| Aicardi-Goutieres syndrome | X-linked dominant | RNAses |
| Bernard-Soulier syndrome | | GPIs |
| Wiskott-Aldrich syndrome | X-linked recessive | WASp |
| Fabry disease | | α-Gal A |
| Ornithine transcarbamoylase deficiency | | OTC |

Base substitutions can occur in any gene encoding a functional protein across the whole genome. In fact, about ten thousand genetic diseases are known to date; several of those caused by one-base replacements (monogenic disorders) are listed in Table 1.

In this chapter, I discuss genetic disorders caused by one-base replacements in coding regions, because I would like to examine the relationships among the robustness of the universal genetic code, base substitutions in codons and genetic disorders from the standpoint of the origin of the genetic code. The term "the universal genetic code" is used in this chapter for the code widely used by extant organisms, instead of "the standard genetic code", which many textbooks in biochemistry and molecular biology have adopted since the discovery of non-universal genetic codes in the mitochondria of mammals, in protozoa and in some bacteria. I do so to emphasize that almost all organisms on this planet actually use this genetic code. I believe that understanding the relationship between this robustness and base substitutions will contribute to the discovery of appropriate treatments for many genetic disorders in the future.

Amino acid substitutions that do not greatly affect normal protein function are observed, as is known from single nucleotide polymorphisms in humans. However, mammals evolve quite slowly because of their long generation times (about 25 years in humans), so such amino acid substitutions have occurred at a comparatively low frequency. In contrast, amino acids of microbial proteins have been substituted at a high frequency without greatly affecting protein function, because microbial proteins evolve rapidly owing to enormous cell numbers and very short division times (about 20-30 minutes for *Escherichia coli*). Therefore, to survey a wide range of amino acid substitutions that leave protein function largely intact, it is suitable to compare the amino acid sequence of a microbial protein with that of a homologous protein, as shown in Figure 2.


Fig. 2. Alignment of the amino acid sequences of two small homologous single-stranded DNA binding proteins, from *Aquifex aeolicus* (147 amino acids) and *Carboxydothermus hydrogenoformans* (142 amino acids). Red bold and black letters indicate substituted and conserved amino acids between the two sequences, respectively. A hyphen (-) marks an amino acid position deleted from one of the sequences. The homology between the two single-stranded DNA binding proteins, whose sequences were obtained from GenBank at http://www.ncbi.nlm.nih.gov/genbank/, is 38%.


Panels (A) Hydropathy and (B) α-Helix of Table 2 both use the universal genetic code table below (color coding of the original panels not shown). Columns give the second codon base.

| 1st base | U | C | A | G | 3rd base |
|---|---|---|---|---|---|
| U | Phe | Ser | Tyr | Cys | U |
| U | Phe | Ser | Tyr | Cys | C |
| U | Leu | Ser | Term | Term | A |
| U | Leu | Ser | Term | Trp | G |
| C | Leu | Pro | His | Arg | U |
| C | Leu | Pro | His | Arg | C |
| C | Leu | Pro | Gln | Arg | A |
| C | Leu | Pro | Gln | Arg | G |
| A | Ile | Thr | Asn | Ser | U |
| A | Ile | Thr | Asn | Ser | C |
| A | Ile | Thr | Lys | Arg | A |
| A | Met | Thr | Lys | Arg | G |
| G | Val | Ala | Asp | Gly | U |
| G | Val | Ala | Asp | Gly | C |
| G | Val | Ala | Glu | Gly | A |
| G | Val | Ala | Glu | Gly | G |



Fig. 3. Numbers of permissible amino acid substitutions observed between two pairs of homologous proteins: from *S. coelicolor* (left column) to *S. aureus* (top row) RelA proteins (left-hand number in each cell), and from *A. aeolicus* (left column) to *C. hydrogenoformans* (top row) single-stranded DNA binding proteins (right-hand number in each cell). In the original figure, amino acid replacements arising from base substitutions at the first, the second and the third codon positions are shown in blue, yellow and red boxes, respectively. Green, orange and white boxes indicate amino acid replacements induced by base substitutions at the first or the second codon position, at the first or the third codon position, and by other base substitutions, respectively. The base substitutions at the respective codon positions were deduced from those amino acid replacements between the two homologous proteins that can arise from one-base substitutions. The amino acid sequences used for the alignment were obtained from GenBank at http://www.ncbi.nlm.nih.gov/genbank/

| From / To | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | | 0,0 | 4,0 | 6,0 | 0,0 | 1,2 | 2,0 | 2,0 | 1,0 | 2,0 | 2,0 | 4,0 | 1,0 | 2,0 | 3,1 | 6,0 | 2,0 | 4,1 | 0,0 | 3,0 |
| C | 0,0 | | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 |
| D | 0,0 | 1,0 | | 5,1 | 1,0 | 1,0 | 0,0 | 0,0 | 4,0 | 1,0 | 2,0 | 2,0 | 0,0 | 3,0 | 0,0 | 2,0 | 2,1 | 0,0 | 0,0 | 0,0 |
| E | 1,0 | 0,0 | 1,5 | | 1,1 | 0,1 | 0,0 | 1,1 | 5,0 | 0,1 | 1,0 | 1,1 | 1,1 | 3,0 | 3,2 | 2,3 | 2,1 | 1,0 | 0,0 | 2,0 |
| F | 0,0 | 0,0 | 0,0 | 0,0 | | 0,0 | 0,0 | 2,3 | 0,0 | 1,1 | 0,0 | 0,0 | 0,0 | 1,0 | 1,1 | 0,0 | 0,0 | 1,0 | 0,0 | 5,0 |
| G | 1,0 | 0,0 | 1,0 | 1,0 | 0,0 | | 0,0 | 0,0 | 5,0 | 0,0 | 0,0 | 3,1 | 0,0 | 2,1 | 1,1 | 2,0 | 1,0 | 0,0 | 0,0 | 1,0 |
| H | 1,0 | 0,0 | 1,1 | 1,0 | 0,0 | 1,0 | | 0,0 | 0,0 | 0,0 | 0,0 | 2,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 |
| I | 0,0 | 0,0 | 0,0 | 1,0 | 0,0 | 0,0 | 0,0 | | 0,0 | 3,3 | 1,0 | 0,0 | 0,1 | 0,0 | 0,0 | 0,0 | 0,0 | 7,3 | 0,0 | 1,0 |
| K | 2,0 | 0,0 | 2,1 | 4,0 | 1,0 | 0,0 | 1,0 | 1,1 | | 0,0 | 0,0 | 0,0 | 2,0 | 0,1 | 3,0 | 0,1 | 0,1 | 1,2 | 0,0 | 1,0 |
| L | 1,0 | 0,0 | 0,0 | 0,0 | 3,3 | 1,0 | 0,0 | 14,0 | 0,0 | | 5,1 | 0,0 | 0,0 | 2,0 | 1,0 | 0,0 | 1,2 | 5,1 | 0,0 | 2,0 |
| M | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 3,0 | 0,0 | 5,1 | | 0,0 | 0,0 | 1,0 | 0,0 | 0,0 | 0,0 | 2,0 | 0,0 | 1,0 |
| N | 0,0 | 0,0 | 2,2 | 1,1 | 0,0 | 2,0 | 0,0 | 0,0 | 1,0 | 0,0 | 0,0 | | 0,0 | 1,0 | 0,0 | 0,0 | 1,1 | 0,0 | 0,0 | 0,0 |
| P | 1,1 | 0,0 | 1,0 | 1,0 | 0,0 | 2,0 | 0,0 | 1,0 | 1,0 | 1,0 | 0,0 | 2,0 | | 0,0 | 2,0 | 2,0 | 1,0 | 1,0 | 0,0 | 1,0 |
| Q | 0,0 | 0,0 | 1,0 | 5,0 | 0,0 | 0,0 | 2,0 | 0,0 | 2,1 | 0,0 | 0,0 | 1,0 | 0,1 | | 3,0 | 0,0 | 2,1 | 0,0 | 0,0 | 0,0 |
| R | 0,0 | 0,0 | 3,0 | 4,1 | 0,0 | 1,0 | 0,0 | 2,0 | 17,1 | 1,0 | 0,0 | 6,0 | 1,1 | 2,0 | | 3,0 | 1,0 | 1,0 | 1,0 | 0,0 |
| S | 3,0 | 1,0 | 4,0 | 0,0 | 0,0 | 0,0 | 1,0 | 1,0 | 5,0 | 1,0 | 0,0 | 5,0 | 0,0 | 1,2 | 1,1 | | 3,2 | 2,0 | 0,0 | 1,0 |
| T | 2,0 | 0,0 | 1,0 | 0,0 | 0,0 | 1,0 | 0,0 | 3,0 | 0,0 | 2,0 | 2,0 | 5,0 | 0,0 | 0,0 | 0,1 | 6,0 | | 3,1 | 0,0 | 0,0 |
| V | 4,1 | 0,0 | 0,0 | 2,1 | 1,1 | 2,0 | 1,0 | 15,0 | 1,0 | 5,0 | 2,0 | 1,0 | 1,0 | 1,0 | 0,0 | 0,0 | 4,0 | | 0,0 | 0,1 |
| W | 2,1 | 0,0 | 0,0 | 0,0 | 1,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 1,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | | 0,1 |
| Y | 1,0 | 0,0 | 1,0 | 0,0 | 3,1 | 1,0 | 1,1 | 1,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,0 | 0,1 | 0,0 | 0,0 | 0,1 | 0,1 | |

| Protein | 1st | 2nd | 3rd | 1st or 2nd | 1st or 3rd | Others |
|---|---|---|---|---|---|---|
| RelA | 119 | 93 | 13 | 10 | 8 | 154 |
| SS-DNA.B | 21 | 13 | 6 | 2 | 5 | 29 |
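The deduction described in the caption of Figure 3, from an observed amino acid replacement back to the codon position that must have changed, can be sketched in Python as follows. This is a minimal illustration under the assumption that only one-base substitutions are considered; the function name `substitution_positions` is hypothetical and is not taken from the original analysis.

```python
from itertools import product

BASES = "UCAG"
# Universal genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = termination.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_positions(aa_from, aa_to):
    """Return the codon positions (1, 2, 3) at which a single base change can
    turn some codon for `aa_from` into some codon for `aa_to`.

    An empty set means the replacement needs more than one base change.
    """
    positions = set()
    for codon, aa in CODE.items():
        if aa != aa_from:
            continue
        for pos in range(3):
            for base in BASES:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant != codon and CODE[mutant] == aa_to:
                    positions.add(pos + 1)
    return positions


# One-letter amino acid symbols, as in the matrix of Figure 3.
for pair in [("L", "I"), ("R", "K"), ("D", "E"), ("S", "T"), ("L", "F"), ("A", "W")]:
    print(pair, sorted(substitution_positions(*pair)))
```

A result with a single position corresponds to the single-position classes in the figure, a result such as first-or-second or first-or-third corresponds to the mixed classes, and an empty result corresponds to replacements that fall under "others".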

As seen in Figure 2, many amino acid substitutions are observed between the two homologous single-stranded DNA binding proteins. Substitutions caused by base changes at the first codon position were more frequent than those caused by base changes at the second codon position (see the table in Figure 3). Similar results were obtained for the two large homologous stringent response proteins, *Streptomyces coelicolor* RelA and *Staphylococcus aureus* RelA (Figure 3). This can be interpreted as meaning that amino acids with similar chemical and physical properties tend to be arranged in the same column of the genetic code table (Table 2 (A), (B), (C) and (D)).

The universal genetic code is redundant and has a highly non-random structure. Typically, two codons that differ only at the third codon position encode the same amino acid with high probability, owing to the degeneracy of the genetic code at that position. In addition, codons that differ only at the first codon position usually encode amino acids that are different but rather similar in their chemical/physical properties.
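The degeneracy at the third codon position can be made concrete by counting, for each codon position, how many one-base substitutions of a sense codon leave the encoded amino acid unchanged. The following Python sketch is only an illustration; the function name `synonymous_counts` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Universal genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = termination.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_counts():
    """For each codon position, count one-base substitutions of the 61 sense codons.

    Returns {position: (synonymous, total)}; changes to a termination codon
    are included in the total but are never synonymous.
    """
    counts = {1: [0, 0], 2: [0, 0], 3: [0, 0]}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                counts[pos + 1][1] += 1
                if CODE[mutant] == aa:
                    counts[pos + 1][0] += 1
    return {pos: tuple(pair) for pos, pair in counts.items()}


for pos, (syn, total) in synonymous_counts().items():
    print(f"codon position {pos}: {syn} of {total} substitutions are synonymous")
```

Under this count, almost all synonymous substitutions fall at the third position, a few at the first position (within the Leu and Arg codon families), and none at the second position.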


Table 2. Color representation of the chemical/physical properties of amino acids, based on the values given in Stryer's "Biochemistry" (Berg *et al*., 2002). (A) Hydrophobicities and (B) α-helix propensities of amino acids in the universal genetic code table. Letters in red, yellow and blue boxes represent amino acids with large, middle and small hydrophobicities in (A), and the corresponding degrees of α-helix propensity in (B), respectively.

It can be seen in Table 2 that the amino acids encoded by the 16 codons in the same column fall, with high probability, into boxes of one or two colors, for example in the two leftmost columns of Table 2 (A) and the leftmost column of Table 2 (D). In contrast, no row with boxes of a single color is observed in Table 2 (A), (B), (C) or (D). This means that amino acids with similar chemical/physical properties tend to be arranged in the same column, whereas those with rather different properties tend to be arranged in the same row. As a result, the genetic code is highly robust against changes in protein function caused by base substitutions in protein-coding sequences, especially at the third and the first codon positions. My GNC-SNS primitive genetic code hypothesis on the origin and evolution of the genetic code (Ikehara et al., 2002), which is described in Section 3, can reasonably explain this robustness, which may stem from the origin and evolutionary processes of the code. Here N denotes any of the four bases (A, U/T, G and C), and S denotes G or C.
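The column and row structure described above can be probed numerically. The sketch below compares the average change in hydropathy caused by one-base substitutions at each codon position. It is an illustration under stated assumptions: it uses the Kyte-Doolittle hydropathy index rather than the values from Stryer's "Biochemistry" on which Table 2 is based, it considers only changes from one amino acid to a different amino acid, and the function name `mean_hydropathy_shift` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Universal genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = termination.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy index (assumption: not the scale used for Table 2).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def mean_hydropathy_shift():
    """Mean absolute hydropathy change per codon position.

    Only one-base substitutions that replace one amino acid with a different
    amino acid are averaged; synonymous changes and changes to or from
    termination codons are skipped.
    """
    totals = {1: [0.0, 0], 2: [0.0, 0], 3: [0.0, 0]}
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODE[mutant]
                if mutant == codon or new_aa in ("*", aa):
                    continue
                totals[pos + 1][0] += abs(KD[aa] - KD[new_aa])
                totals[pos + 1][1] += 1
    return {pos: total / n for pos, (total, n) in totals.items()}


for pos, shift in mean_hydropathy_shift().items():
    print(f"codon position {pos}: mean |hydropathy change| = {shift:.2f}")
```

A change at the second codon position moves the codon to another column of the code table, whereas a change at the first position stays within a column, so the second position is expected to give the largest average shift on this scale.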



Table 2. (Continued). (C) β-sheet and (D) turn/coil structure propensities of amino acids in the universal genetic code table. Letters in red, yellow and blue boxes represent large, middle and small β-sheet and turn/coil propensities, respectively. The meanings of the colored boxes in (C) and (D) are the same as in (A) and (B), described above. The secondary structure propensities of amino acids (β-sheet, (C); turn/coil, (D)) were obtained from Stryer's "Biochemistry" (Berg *et al*., 2002).

### **2. Significance of the Genetic Code for life**

The genetic code plays a very important role in transferring genetic information from a DNA nucleotide sequence to the amino acid sequence of a protein, such as an enzyme or a transporter of a chemical compound (Figure 4). However, the genetic code has generally been regarded as a simple representation of the relationship between a unit of genetic information, that is, a codon composed of three bases (a triplet), and an amino acid in a protein sequence, as described in representative textbooks such as Stryer's "Biochemistry" (Berg *et al*., 2002). It seems to me that the significance of the genetic code is currently underestimated, judging from my idea that protein 0th-order structures, which are specific amino acid compositions favorable for effectively producing water-soluble globular proteins even by random synthesis (see Section 4), are implicitly written in the genetic code table (see Figure 7 in Section 3).

Genetic information, which is stored in base sequences, or more precisely in codon sequences, on DNA, is passed from a parent cell to progeny cells through DNA replication. In parallel, the information is transcribed into mRNA and then translated into the amino acid sequence of a protein according to the genetic code, when necessary. The various organic molecules required for life are synthesized by enzyme proteins in metabolic pathways (Figure 4). Therefore, it is no exaggeration to say that the genetic code is even more significant for life than genes and proteins, or that the genetic code is the most important component of the fundamental life system. Understanding the origin and evolutionary processes of the genetic code should therefore be important for understanding the framework of the genetic code and the relationship between amino acid substitutions and the one-base substitutions that cause genetic disorders.

Fig. 4. Role of the genetic code in the fundamental life system of modern organisms, which is composed of genes, the genetic code and proteins (enzymes). The genetic code mediates between two main elements: genetic function, carried by DNA (mRNA), and catalytic function, carried out by proteinaceous catalysts (enzymes) that form a chemical network, or metabolism. Genetic information on DNA is transmitted to progeny cells by replication (Step 1) and transcribed into mRNA (Step 2) when necessary. The genetic information transferred to mRNA is translated into the corresponding amino acid sequence of a protein (Step 3) through the genetic code, which mediates between genetic information and catalytic function. The universal genetic code used by extant organisms on the earth comprises 64 codons and 20 amino acids (see Table 2).

### **3. Origin of the Genetic Code (GNC-SNS primitive genetic code hypothesis)**

Our studies on the origin of the genetic code were initiated from the search for a prospective spot on a DNA sequence, from which an entirely new gene encoding an entirely new functional protein will be created, when an extant organism using the universal genetic code has to adapt to a new environment. The spot was searched based on the six necessary conditions for producing water-soluble globular proteins as described below. The six conditions used for the search are hydropathy, α-helix, β-sheet and turn/coil formabilities,

Origin of the Genetic Code and Genetic Disorder 11

was more primitive one than SNS by using the four more essential conditions which acidic amino acid and basic amino acid compositions were excluded from the six conditions described above. From the results, it was found that [GADV]-proteins encoded by GNC codons well satisfied the four structural conditions, when roughly equal amounts of [GADV]-amino acids were contained in the proteins (Figure 6 (B)). Where [GADV] represents four amino acids of Gly, Ala, Asp and Val, and square bracket ([ ]) was used to discriminate amino acids, especially G and A which are described by one-letter symbols of amino acids, from nucleic acid bases, G and A. It means that even the [GADV]-polypeptide chains with a quite simple amino acid composition could be folded into water-soluble

(A) (B)

Fig. 6. (A) Dot plot analysis of SNS genetic code. Dots concentrated in the respective boxes indicate that the six conditions (hydropathy, α-helix, β-sheet and turn/coil formabilities, and acidic and basic amino acid contents) were satisfied. It means that polylpeptide chains encoded by SNS code could be folded into water-soluble globular structures when bases are contained in the respective rates at three codon positions. (B) Dot plot analysis of GNC code On the other hand, other codes encoding four amino acids, which were picked out from the columns or rows in the universal genetic code table, did not satisfy the four structural conditions, except for GNG code, which is a modified form of the GNC code (Ikehara et al, 2002). Moreover, it was also confirmed that genetic code composed of three amino acids lined in universal genetic code table did not satisfy the four conditions for protein structure formation, suggesting that the GNC code would be used as the most primeval genetic code on the primitive earth (Ikehara et al, 2002). Then, I concluded that SNS primitive genetic code evolved from the GNC primeval genetic code by C and G introductions at the first and

50/100

100 50 GC Content (%)

Dots concentrated in the respective boxes of Figure 6 (B) indicate that the four conditions (hydropathy, α-helix, β-sheet and turn/coil formabilities) were satisfied. It means that polylpeptide chains encoded by GNC code could be folded into water-soluble globular

the third codon positions, respectively (Figure 7 (A)).

100

C2

50

100/0

100/0

50

100/0

Ba se C om p o sition (% )

50

T2

A2

G2

0 50

50 60 70 80 90 100

25

25

25

25

GC Content (%)

structures at a high probability.

G3 C3

50 50/100

GC Content (%)

G2 C2 A2 T2

100

C1

G1

100

0/100

0/100

0

Base

Composition (%)

acidic amino acid and basic amino acid contents of proteins, which were obtained as average values plus/minus standard deviations of water-soluble globular proteins in extant micro-organisms. From the results, it was found that non-stop frames, which appear on antisense strands of GC-rich genes (GC-NSF(a)s) at a high probability, have the strongest possibility to create entirely new genes, not new modified type of genes or homologous genes (Figure 5) (Ikehara et al., 1996). Where GC-NSF(a) means nonstop frame on antisense strand of GC-rich gene. That is because hypothetical proteins encoded by GC-NSF(a)s satisfied the six conditions and because the probability of non-stop frame (NSF) appearance on the GC-rich anticodon sequences was enough high (Ikehara, 2002).

The GC-NSF(a) hypothesis on creation of the first family genes under the universal genetic code led us to propose a subsequent theory on the origin of the genetic code, the GNC-SNS primitive genetic code hypothesis (Ikehara et al., 2002). GNC and SNS represent four codons (GUC, GCC, GAC and GGC) and 16 codons (GUC, GCC, GAC, GGC, GUG, GCG, GAG, GGG, CUG, CCG, CAG, CGG, CUC, CCC, CAC and CGC), respectively. I briefly describe below the two clues from which the hypothesis was obtained. The first is that base sequences of the GC-NSF(a)s were rather similar to repeating sequences of SNS. The second is that hypothetical proteins encoded by the GNC code, a part of the SNS code, satisfied the four conditions (hydropathy, α-helix, β-sheet and turn/coil formabilities of proteins) for folding polypeptide chains into water-soluble globular structures (Ikehara et al., 2002). In the following paragraphs, the progress of the investigation from the discovery of the origin of genes to the GNC-SNS primitive genetic code hypothesis is described more precisely.

Fig. 5. GC-NSF(a) primitive gene hypothesis for creation of "original ancestor genes" under the universal genetic code. The hypothesis predicts that new "original ancestor genes" originate from nonstop frames on antisense strands of GC-rich genes (GC-NSF(a)s)

Firstly, we found that base compositions at the three codon positions of the GC-NSF(a)s were similar to those of SNS. In fact, hypothetical polypeptide chains encoded by only the SNS code, which contains neither A nor U at the first and third codon positions, satisfied the six conditions, suggesting that polypeptides encoded by the SNS code could be folded into water-soluble globular structures at a high probability (Figure 6 (A)). This indicates that the SNS code has sufficient ability to encode proteins with definite levels of catalytic activity. At this point, I proposed the SNS hypothesis on the origin of the genetic code about fifteen years ago (Ikehara & Yoshida, 1998).

However, the SNS code, composed of 16 codons and 10 amino acids, must have been too complex to be established as the first genetic code from the beginning. So, I further searched for a code more primitive than SNS by using the four more essential conditions, that is, the six conditions described above with the acidic and basic amino acid compositions excluded. From the results, it was found that [GADV]-proteins encoded by GNC codons satisfied the four structural conditions well when roughly equal amounts of the [GADV]-amino acids were contained in the proteins (Figure 6 (B)). Here, [GADV] represents the four amino acids Gly, Ala, Asp and Val, and the square brackets ([ ]) are used to distinguish the one-letter amino acid symbols, especially G and A, from the nucleic acid bases G and A. This means that even [GADV]-polypeptide chains with a quite simple amino acid composition could be folded into water-soluble structures at a high probability.

Fig. 6. (A) Dot plot analysis of the SNS genetic code. Dots concentrated in the respective boxes indicate that the six conditions (hydropathy, α-helix, β-sheet and turn/coil formabilities, and acidic and basic amino acid contents) were satisfied. This means that polypeptide chains encoded by the SNS code could be folded into water-soluble globular structures when bases are contained at the respective rates at the three codon positions. (B) Dot plot analysis of the GNC code. Dots concentrated in the respective boxes of Figure 6 (B) indicate that the four conditions (hydropathy, α-helix, β-sheet and turn/coil formabilities) were satisfied. This means that polypeptide chains encoded by the GNC code could be folded into water-soluble globular structures when the four bases are contained at the respective rates at the second codon position.

On the other hand, other codes encoding four amino acids, which were picked out from the columns or rows of the universal genetic code table, did not satisfy the four structural conditions, except for the GNG code, which is a modified form of the GNC code (Ikehara et al., 2002). Moreover, it was also confirmed that genetic codes composed of three amino acids lined up in the universal genetic code table did not satisfy the four conditions for protein structure formation, suggesting that the GNC code was used as the most primeval genetic code on the primitive earth (Ikehara et al., 2002). I therefore concluded that the SNS primitive genetic code evolved from the GNC primeval genetic code through the introduction of C and G at the first and the third codon positions, respectively (Figure 7 (A)).

Thus, I proposed the GNC-SNS hypothesis on the origin of the genetic code about ten years ago (Ikehara et al., 2002), suggesting that the universal genetic code originated from the GNC code through the SNS code by capturing new codons up and down in the genetic code table (Figure 7 (B)).


Fig. 7. GNC-SNS hypothesis on the origin and evolutionary pathway of the genetic code. (A) In the hypothesis, it is supposed that the universal genetic code originated from the GNC primeval genetic code through the SNS primitive genetic code. Elucidation of the most primitive GNC code made it possible to propose the GADV hypothesis on the origin of life. (B) Alternative representation of the origin and evolutionary pathway of the genetic code. The universal genetic code originated from the GNC primeval genetic code (red row), successively followed by the capture of GNG codons (orange row) and CNS codons (yellow rows), resulting in formation of the SNS code. Therefore, it is considered that the universal genetic code evolved from the GNC code through the introduction of the remaining rows up and down.

Universal genetic code table shown in Figure 7 (B) (row colors not reproduced):

| 1st base | U | C | A | G | 3rd base |
|---|---|---|---|---|---|
| U | Phe | Ser | Tyr | Cys | U |
| U | Phe | Ser | Tyr | Cys | C |
| U | Leu | Ser | Term | Term | A |
| U | Leu | Ser | Term | Trp | G |
| C | Leu | Pro | His | Arg | U |
| C | Leu | Pro | His | Arg | C |
| C | Leu | Pro | Gln | Arg | A |
| C | Leu | Pro | Gln | Arg | G |
| A | Ile | Thr | Asn | Ser | U |
| A | Ile | Thr | Asn | Ser | C |
| A | Ile | Thr | Lys | Arg | A |
| A | Met | Thr | Lys | Arg | G |
| G | Val | Ala | Asp | Gly | U |
| G | Val | Ala | Asp | Gly | C |
| G | Val | Ala | Glu | Gly | A |
| G | Val | Ala | Glu | Gly | G |

Due to the evolutionary process of the genetic code, amino acids with similar chemical/physical properties have been arranged in the same column at a high probability (Table 2). Consequently, replacements between two amino acids located in the same column have been permitted at a high probability, and the robustness of the genetic code has been generated. I now believe that the GNC code stepped up to the SNS primitive genetic code, encoding ten amino acids with 16 SNS codons, via the GNS code (8 codons and 5 amino acids). After that, the SNS code evolved into the universal genetic code, which encodes 20 amino acids and three stop signals with 64 codons (Ikehara & Yoshida, 1998; Ikehara et al., 2002). The GNC-SNS primitive genetic code hypothesis states that the universal genetic code (NNN: 4×4×4 = 4³ = 64 codons), which is both formally and substantially a triplet code, originated from the formally triplet but substantially singlet GNC code (1×4×1 = 4¹ = 4 codons) encoding the four [GADV]-amino acids, through the formally triplet but substantially doublet SNS code (2×4×2 = 4² = 16 codons) encoding 10 amino acids (Figure 7) (Ikehara, 2009).
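The codon and amino acid counts quoted for the GNC, GNS, SNS and universal (NNN) codes can be checked with a short sketch. It expands each degenerate pattern (S = G or C, N = any base) and translates the resulting codons with the standard genetic code; the helper names `expand` and `summarize` are illustrative assumptions, not part of the original work.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first, second, third base); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degenerate base symbols used in the text.
DEGENERATE = {"G": "G", "C": "C", "S": "GC", "N": "UCAG"}


def expand(pattern):
    """Expand a degenerate codon pattern such as 'SNS' into concrete codons."""
    return ["".join(codon) for codon in product(*(DEGENERATE[s] for s in pattern))]


def summarize(pattern):
    """Return (number of codons, sorted one-letter amino acids encoded)."""
    codons = expand(pattern)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    return len(codons), "".join(sorted(amino_acids))


if __name__ == "__main__":
    for pattern in ("GNC", "GNS", "SNS", "NNN"):
        n_codons, amino_acids = summarize(pattern)
        print(f"{pattern}: {n_codons} codons, {len(amino_acids)} amino acids ({amino_acids})")
```

Because the patterns are expanded combinatorially, the same helper can be applied to other candidate codes taken from rows or columns of the table.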

The evolutionary process of the genetic code from the GNC code, which encodes four amino acids with quite different chemical/physical properties, to the universal genetic code through the SNS code arranged amino acids with similar chemical and physical properties in the same columns, and those with largely different properties in the same rows, at high probabilities (Table 2). It is therefore considered that the robustness of the genetic code originated from this evolutionary process, as suggested by the GNC-SNS primitive genetic code hypothesis. This view of the robustness of the genetic code is consistent with the permissible amino acid substitutions observed between two homologous proteins, as given in Figures 2 and 3. As described below, the GNC-SNS primitive genetic code hypothesis led to the ideas of protein 0th-order structures and of the origin of life, the GADV hypothesis or [GADV]-protein world hypothesis (Ikehara, 2005; Ikehara, 2009).
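The column-wise arrangement described above can be illustrated numerically. The sketch below measures, for each codon position, the mean absolute change in hydropathy caused by single-base substitutions between sense codons of the standard genetic code. The Kyte-Doolittle hydropathy scale is used here as an assumption; the chapter's own hydropathy values may differ, and the function name is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code in UCAG order (first, second, third base); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy values (assumed scale for this illustration).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy_change(position):
    """Mean |hydropathy change| over all single-base substitutions at one
    codon position (0, 1 or 2), ignoring changes to or from stop codons."""
    changes = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            mutant_aa = CODON_TABLE[mutant]
            if mutant_aa == "*":
                continue
            changes.append(abs(HYDROPATHY[amino_acid] - HYDROPATHY[mutant_aa]))
    return sum(changes) / len(changes)


if __name__ == "__main__":
    for position, label in enumerate(("first", "second", "third")):
        print(f"{label} codon position: {mean_hydropathy_change(position):.2f}")
```

With this scale, substitutions at the second codon position (which move an amino acid to a different column) change hydropathy the most, and substitutions at the third position the least, in line with the robustness argument in the text.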

### **4. The universal genetic code and protein 0th-order structure**

Discussion of protein structure formation usually begins with the primary structure, or amino acid sequence, of a protein, not with its amino acid composition. Stryer's textbook "Biochemistry" (Berg et al., 2002) states that the information needed to specify the catalytically active structure of ribonuclease is contained in its amino acid sequence. Studies on the folding of polypeptide chains, which were mainly carried out with small proteins, have established the generality of this central principle of biochemistry: sequence specifies conformation. One of the reasons may be that one-dimensional base sequences on DNA, or genes, encode the amino acid sequences, or primary structures, of proteins.

On the other hand, I happened to use amino acid composition to investigate protein structure formability, through the six or four conditions described above. This approach gave interesting results and conclusions, such as the GC-NSF(a) hypothesis on creation of the first family genes and the GNC-SNS primitive genetic code hypothesis, as described in Section 3. During the investigation of the origin of the genetic code, I noticed the significance of specific amino acid compositions satisfying the four conditions (hydropathy and α-helix, β-sheet and turn propensities) or the six conditions (the same four plus acidic and basic amino acid compositions) for folding polypeptide chains into water-soluble globular structures. The conditions were obtained as the respective average values plus/minus standard deviations of presently existing water-soluble globular proteins from seven micro-organisms carrying genomes with widely distributed GC contents. The structure formability of one protein is the same as that of other proteins randomly assembled with the same amino acid composition. This means that every protein synthesized by random peptide bond formation among amino acids in the specific amino acid composition could be similarly folded into a water-soluble globular structure, but each into a different structure, since the proteins have the same amino acid composition but different sequences from each other.
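The reason a composition-based condition cannot distinguish between randomly assembled proteins can be shown with a minimal sketch. It builds several random [GADV]-sequences with the same composition and computes a composition-derived index for each. Mean Kyte-Doolittle hydropathy is used here as a stand-in for one of the conditions; the scale, the 100-residue length and the function names are assumptions made for illustration.

```python
import random

# Kyte-Doolittle hydropathy values for the four [GADV]-amino acids
# (assumed scale; the chapter's own index may differ).
HYDROPATHY = {"G": -0.4, "A": 1.8, "D": -3.5, "V": 4.2}


def mean_hydropathy(sequence):
    """Average hydropathy of a sequence; depends only on its composition."""
    return sum(HYDROPATHY[residue] for residue in sequence) / len(sequence)


def random_protein(composition, rng):
    """Randomly order residues with the given counts, mimicking random
    peptide bond formation within a fixed amino acid composition."""
    residues = [aa for aa, count in composition.items() for _ in range(count)]
    rng.shuffle(residues)
    return "".join(residues)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    composition = {"G": 25, "A": 25, "D": 25, "V": 25}  # equal [GADV] amounts
    for _ in range(3):
        protein = random_protein(composition, rng)
        print(protein[:30] + "...", round(mean_hydropathy(protein), 3))
```

Every sequence drawn this way has a different order of residues but the same index value, which is the sense in which a specific amino acid composition, rather than a specific sequence, determines whether the conditions are met.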


The most important point for creation of entirely new proteins encoded by the first family genes is the formation of a water-soluble globular structure through random synthesis among amino acids in a protein 0th-order structure, because a quite large number of possible catalytic sites for an organic compound could appear on the surface of one globular protein. The number of possible catalytic sites can be estimated, from combinations of amino acids located on the protein surface, at several hundred. I have named such a specific amino acid composition favorable for protein structure formation a protein 0th-order structure (Ikehara, 2009). Examples are the compositions containing roughly equal amounts of the four [GADV]-amino acids (Gly [G], Ala [A], Asp [D] and Val [V]) and of ten amino acids (the [GADV]-amino acids plus Glu [E], Leu [L], Pro [P], His [H], Gln [Q] and Arg [R]), encoded by the GNC and SNS codes and named the [GADV]- or GNC-protein 0th-order structure and the SNS-protein 0th-order structure, respectively. This means that the protein 0th-order structures are implicitly written in the universal genetic code table (Figure 7 (B)).

**Origins of genes and proteins:** The genetic code plays a central role in connecting genetic function with catalytic function in the fundamental life system, as described above (Figure 4). Under the GNC code, the first genes must have been composed of base sequences carrying only GNC codons, which were produced by random phosphodiester bond formation among GNC codons. Subsequently, the first double-stranded (GNC)n gene would have been created by complementary strand synthesis against the single-stranded (GNC)n gene.


Fig. 8. Two routes for producing new genes. Once one original double-stranded (GNC)n gene was produced, new genes were easily produced through two routes, using the two base sequences of the original gene (one from the sense sequence and the other from the antisense sequence). By route 1, new genes could be produced as modified genes of the original gene, or homologous genes in a gene family; by route 2, new genes could be created as "entirely new genes", or the first family genes.

Creation of the first double-stranded (GNC)n gene following establishment of the GNC primeval genetic code became the most important step leading to the emergence of life, since the invention of double-stranded genes made it possible for the first time to transmit genetic information from parents to progeny and to evolve it through accumulation of base substitutions and selection of more effective genetic sequences (Ikehara, 2009).

[Figure 8 schematic. One original (GNC)n gene:

5'-ggcgccgtcgtcgtcggcgacgccgcc gtcggcgtcggcgtcgacggcgtcggcggcgac-3'
3'-ccgcggcagcagcagccgctgcggcgg cagccgcagccgcagctgccgcagccgccgctg-5'

Panel labels: Gene Duplication; route 1 (a modified gene from sense sequence); route 2 (a new original gene from antisense sequence); Accumulation of Mutation; Original genetic function.]

Base compositions at the three codon positions on sense strands of (GNC)n genes are substantially the same as those on antisense strands, owing to the self-complementary structure of double-stranded (GNC)n genes. Thus, it is easily supposed that, after creation of the first double-stranded (GNC)n gene, GNC codon sequences on antisense strands could be utilized as a field for creation of entirely new functional genes encoding the first ancestor proteins of homologous protein families, since GNC codon sequences on antisense strands are quite different from those on sense strands and can in fact be regarded as random arrangements of GNC codons. In addition, (GNC)n sequences on antisense strands must encode [GADV]-proteins satisfying the four conditions for producing water-soluble globular proteins at a high probability (Ikehara, 2002) (Figure 6 (B)). New genetic information could also be created from duplicated sense sequences, as proposed by Ohno (1970). However, the duplicated sense sequences could be utilized only for encoding homologous proteins in a family (route 1). In contrast, one of the two antisense sequences obtained after gene duplication could provide a field for production of a protein quite different from all proteins that existed before (route 2) (Figure 8) (Ikehara, 2009).
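The self-complementary property of (GNC)n genes can be checked directly on the example sequence given in Figure 8. The sketch below takes that sense strand (written with T, as in the figure), forms its reverse complement, and confirms that the antisense strand also consists only of GNC codons while encoding a different [GADV]-sequence. The function names are illustrative assumptions.

```python
# Sense strand of the example (GNC)n gene from Figure 8 (20 codons).
SENSE = (
    "ggcgccgtcgtcgtcggcgacgccgcc"
    "gtcggcgtcggcgtcgacggcgtcggcggcgac"
).upper()

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The four GNC codons (DNA form) and the [GADV]-amino acids they encode.
GNC_CODE = {"GGC": "G", "GCC": "A", "GAC": "D", "GTC": "V"}


def reverse_complement(sequence):
    """Antisense strand read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def codons(sequence):
    """Split a sequence into consecutive triplets."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def translate_gnc(sequence):
    """Translate a (GNC)n sequence; raises KeyError on a non-GNC codon."""
    return "".join(GNC_CODE[codon] for codon in codons(sequence))


if __name__ == "__main__":
    antisense = reverse_complement(SENSE)
    print("sense protein:    ", translate_gnc(SENSE))
    print("antisense protein:", translate_gnc(antisense))
```

The reverse complement of each GNC codon is itself a GNC codon (GGC pairs with GCC, and GAC with GTC), so the antisense frame stays within the GNC code while its codon order and encoded amino acids differ from the sense strand.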

As seen in Figure 6 (B), [GADV]-proteins must have rigidity similar to that of extant proteins when they contain less than one quarter glycine and more than one quarter alanine. Therefore, it is supposed that [GADV]-proteins produced on the primitive earth in the absence of any genetic function, that is, before creation of the first gene, were more flexible than presently existing proteins, since they should have contained more of the flexible turn/coil-forming amino acid glycine than of the rigid α-helix-forming amino acid alanine. The reason is that glycine would have been synthesized prebiotically more easily, and accumulated on the primitive earth in larger amounts, than alanine. [GADV]-proteins produced on the primitive earth must therefore have been more flexible than extant proteins, which usually recognize one organic compound with high catalytic activity and high specificity. The flexible [GADV]-proteins would inevitably have had only quite low catalytic activities. Even the low activities of the first [GADV]-proteins would have been effective in leading to creation of the first genetic code, the first gene and the first life on the primitive earth, because [GADV]-proteins with low catalytic activity must have been important for developing new metabolic pathways on the primitive earth without any genetic information.

Formation of flexible but inefficient [GADV]-proteins was also essential for creating newly born proteins, or the first family proteins, even after the first double-stranded (GNC)n gene was produced, because proteins newly produced with quite low enzymatic activities could evolve into mature enzymes through accumulation of base substitutions and selection of more efficient enzymes with more rigid structures and higher specificities for one organic compound.

In fact, I believe that entirely new proteins have been created and selected from water-soluble globular proteins encoded by GC-NSF(a)s, which are similar to (SNS)n or SNS repeating sequences, even at present, when necessary. Initially, entirely new proteins could be produced by transcription from cryptic promoters and translation of anticodon sequences on GC-rich genes, if the proteins had the prerequisite catalytic functions (Figure 5). The newly born proteins, composed of 20 kinds of amino acids, would evolve into mature enzymes with more rigid structures and high specificity for one specific organic compound through accumulation of mutations and selection for efficient enzymatic activity, as in the case of [GADV]-proteins encoded by (GNC)n anticodon sequences. I have now understood the important role of protein 0th-order structures or specific amino acid compositions in

Origin of the Genetic Code and Genetic Disorder 17

positions, respectively. Therefore, the robustness of the genetic code could protect from destroy of protein's active state at a high probability, even if base substitutions occurred at the third and the first codon positions in genetic sequences and even when amino acid substitutions were introduced at the sites of secondary structures as α-helix and β-sheet structures. In contrast, base substitutions at the second codon positions would affect largely the protein functions, leading to the genetic disorders at a high probability, as shown in Figure 9. According to the GNC-SNS primitive genetic code hypothesis, it is considered that the genetic code originated from GNC successively to SNS and finally to the universal genetic code as expanding the code up and down in the genetic code table as described in Section 3. From the evolutionary pathway of the genetic code, it can be understood that codons encoding amino acids with similar and with chemically different amino acids were arranged in columns and rows of the genetic code table, respectively. In other words, it is considered that the genetic code evolved as raising coding capacity to modulate the protein function, and as capturing new codons encoding new amino acids into vacant positions of the previous code table during evolutionary process. Therefore, the robustness of the genetic code could be generated from the origin and evolutionary processes of the genetic code, as

1. Base substitution at the first codon position, but introducing no base change at the second position, does not destroy protein function at a high probability, since codons in the same column of the genetic code table code for amino acids with comparatively similar chemical/physical properties, because amino acids with the same color background are arranged in two and one columns out of four columns of hydrophacy and turn/coil tables, respectively. This can be also confirmed from the facts shown in

2. Base substitution at the second codon position largely destroys protein function at a high probability, since codons located in the same row of the genetic code table encode amino acids with quite different chemical/physical properties (Table 2). Certainly, amino acids with the same color background are not observed on any row of four tables, except for one row having two termination codons in Table 2 (C). Amino acids with two different color backgrounds are arranged in eighteen out of 64 rows of the four tables of Table 2, otherwise amino acids in the same rows have three color

3. Base substitutions at the third codon position induce no amino acid replacement due to the degeneracy of the genetic code and substitutions between amino acids with similar chemical/physical properties, such as Phe-Leu, Asp-Glu, His-Gln and so on, are

Generally speaking, only base substitutions occurred at the second codon position, not at the first and third codon positions, induce substitutions between amino acids with largely different chemical and physical properties. The skillful location of codons in the genetic code table gives the genetic code robustness against base substitutions on genetic sequences, which is derived from the origin and evolutionary process of the genetic code, as suggested

Genetic disorders are actually caused by base changes on autosomes and sex-chromosomes as X-chromosome, or on genomes in organelles as mitochondria. The genetic disorders are

by the GNC-SNS primitive genetic code hypothesis (Ikehara et al., 2005).

**7. The universal genetic code and genetic disorder** 

described below.

Table 2.

backgrounds.

observed at a high probability.

creation of entirely new proteins or the first family proteins. As a matter of course, mechanisms for the creation of entirely new proteins intimately related to the creation of entirely new genes. These new concepts on the origins of the genetic code, proteins and genes led to the GADV hypothesis on the origin of life.

### **5. GNC primeval genetic code and origin of life**

In this Section, I will describe briefly GADV hypothesis on the origin of life, since the hypothesis, which I have proposed, is intimately related to the origin of the genetic code or the GNC primeval genetic code.

RNA world hypothesis has been proposed as a key idea for solving the "chicken and egg dilemma" observed between genes and proteins or the origin of life and has been widely accepted by many investigators at the present time. While I have proposed a novel hypothesis on the origin of life as GADV hypothesis, suggesting that life originated from [GADV]-protein world, which was composed of [GADV]-proteins accumulated by pseudoreplication of the proteins in the absence of any genetic function (Ikehara, 2002; Ikehara, 2005, Ikehara, 2009). In the hypothesis, it is assumed that life emerged from the world through establishment of GNC primeval genetic code followed by formation of singlestranded and double-stranded (GNC)n genes.

I believe that the most important point for solving the riddle on the origin of life would be to understand the origin and evolutionary processes of the fundamental life system, which is composed of genetic function, genetic code and catalytic function (Figure 4), not always to solve the "chicken and egg dilemma" observed between genes and protein, as considered in the RNA world hypothesis. Therefore, the GADV hypothesis would be far more rational to explain the origin of life than the RNA world hypothesis, because the former can easily explain formation processes of the fundamental life system composed of genes, the genetic code and proteins comprehensively as well as the "chicken and egg dilemma" (Ikehara, 2009). Contrary to that, the RNA hypothesis probably cannot explain the ways how the fundamental life system was created, because the hypothesis based on self-replication of RNA, which is carried out by polymerization of nucleotides one-by-one, cannot explain the origins of the genetic code and genes, which are composed of codons having triplet nucleotide sequences.

### **6. Robustness of the universal genetic code**

Most genetic disorders are quite rare as causing the disorders at a ratio of only one person in every thousands or millions. The frequency of a genetic disorder caused by one-base substitution mainly relies on mutation rate. But, as given in Figures 2 and 3, in the cases of homologous microbial proteins belonging in the same protein family, many amino acid substitutions are observed without largely affecting protein function. The reasons are given as followings. The first one is because, utilization of many kinds of amino acids would be permissible in flexible regions of a protein at a high probability, such as turn/coil structures connecting two secondary structures and unstructured segments observed at C-terminal segment and/or at N-terminal segment at a high frequency, as can be seen in Figure 2. The second one could be attributed to the robustness of the universal genetic code, making it possible to use the same amino acids and different amino acids but with similar chemical and physical properties, when base substitutions occurred at the third and the first codon

creation of entirely new proteins or the first family proteins. As a matter of course, mechanisms for the creation of entirely new proteins intimately related to the creation of entirely new genes. These new concepts on the origins of the genetic code, proteins and

In this Section, I will describe briefly GADV hypothesis on the origin of life, since the hypothesis, which I have proposed, is intimately related to the origin of the genetic code or

RNA world hypothesis has been proposed as a key idea for solving the "chicken and egg dilemma" observed between genes and proteins or the origin of life and has been widely accepted by many investigators at the present time. While I have proposed a novel hypothesis on the origin of life as GADV hypothesis, suggesting that life originated from [GADV]-protein world, which was composed of [GADV]-proteins accumulated by pseudoreplication of the proteins in the absence of any genetic function (Ikehara, 2002; Ikehara, 2005, Ikehara, 2009). In the hypothesis, it is assumed that life emerged from the world through establishment of GNC primeval genetic code followed by formation of single-

I believe that the most important point for solving the riddle on the origin of life would be to understand the origin and evolutionary processes of the fundamental life system, which is composed of genetic function, genetic code and catalytic function (Figure 4), not always to solve the "chicken and egg dilemma" observed between genes and protein, as considered in the RNA world hypothesis. Therefore, the GADV hypothesis would be far more rational to explain the origin of life than the RNA world hypothesis, because the former can easily explain formation processes of the fundamental life system composed of genes, the genetic code and proteins comprehensively as well as the "chicken and egg dilemma" (Ikehara, 2009). Contrary to that, the RNA hypothesis probably cannot explain the ways how the fundamental life system was created, because the hypothesis based on self-replication of RNA, which is carried out by polymerization of nucleotides one-by-one, cannot explain the origins of the genetic code and genes, which are composed of codons having triplet

Most genetic disorders are quite rare as causing the disorders at a ratio of only one person in every thousands or millions. The frequency of a genetic disorder caused by one-base substitution mainly relies on mutation rate. But, as given in Figures 2 and 3, in the cases of homologous microbial proteins belonging in the same protein family, many amino acid substitutions are observed without largely affecting protein function. The reasons are given as followings. The first one is because, utilization of many kinds of amino acids would be permissible in flexible regions of a protein at a high probability, such as turn/coil structures connecting two secondary structures and unstructured segments observed at C-terminal segment and/or at N-terminal segment at a high frequency, as can be seen in Figure 2. The second one could be attributed to the robustness of the universal genetic code, making it possible to use the same amino acids and different amino acids but with similar chemical and physical properties, when base substitutions occurred at the third and the first codon

genes led to the GADV hypothesis on the origin of life.

**5. GNC primeval genetic code and origin of life** 

the GNC primeval genetic code.

nucleotide sequences.

stranded and double-stranded (GNC)n genes.

**6. Robustness of the universal genetic code** 

positions, respectively. Therefore, the robustness of the genetic code could protect from destroy of protein's active state at a high probability, even if base substitutions occurred at the third and the first codon positions in genetic sequences and even when amino acid substitutions were introduced at the sites of secondary structures as α-helix and β-sheet structures. In contrast, base substitutions at the second codon positions would affect largely the protein functions, leading to the genetic disorders at a high probability, as shown in Figure 9. According to the GNC-SNS primitive genetic code hypothesis, it is considered that the genetic code originated from GNC successively to SNS and finally to the universal genetic code as expanding the code up and down in the genetic code table as described in Section 3. From the evolutionary pathway of the genetic code, it can be understood that codons encoding amino acids with similar and with chemically different amino acids were arranged in columns and rows of the genetic code table, respectively. In other words, it is considered that the genetic code evolved as raising coding capacity to modulate the protein function, and as capturing new codons encoding new amino acids into vacant positions of the previous code table during evolutionary process. Therefore, the robustness of the genetic code could be generated from the origin and evolutionary processes of the genetic code, as described below.


Generally speaking, only base substitutions at the second codon position, not those at the first and third codon positions, induce substitutions between amino acids with largely different chemical and physical properties. This arrangement of codons in the genetic code table gives the genetic code robustness against base substitutions in genetic sequences; the robustness derives from the origin and evolutionary process of the genetic code, as suggested by the GNC-SNS primitive genetic code hypothesis (Ikehara et al., 2005).
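The position dependence described above can be explored with a rough sketch. It enumerates every single-base substitution between sense codons of the standard genetic code and, for each codon position, counts synonymous changes and averages the absolute change in hydropathy. The Kyte-Doolittle hydropathy scale is used here as an assumption; it is not the classification of Table 2, and the function name `position_stats` is hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy index (an assumed stand-in for Table 2).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def position_stats():
    """Tally sense-to-sense single-base substitutions per codon position."""
    stats = [{"synonymous": 0, "missense": 0, "sum_abs": 0.0} for _ in range(3)]
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # skip termination codons as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*":
                    continue  # nonsense changes are left out of this sketch
                if new_aa == aa:
                    stats[pos]["synonymous"] += 1
                else:
                    stats[pos]["missense"] += 1
                    stats[pos]["sum_abs"] += abs(KD[aa] - KD[new_aa])
    return stats


stats = position_stats()
# Mean absolute hydropathy change per substitution (synonymous counted as 0).
means = [s["sum_abs"] / (s["synonymous"] + s["missense"]) for s in stats]
for pos, (s, m) in enumerate(zip(stats, means), start=1):
    print(f"position {pos}: synonymous={s['synonymous']}, "
          f"missense={s['missense']}, mean |d hydropathy|={m:.2f}")
```

Under these assumptions the second codon position shows no synonymous changes and the largest mean hydropathy change, which is consistent with the argument of this Section; a different property scale could shift the exact values.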

### **7. The universal genetic code and genetic disorder**

Genetic disorders are caused by base changes on autosomes, on sex chromosomes such as the X chromosome, or on the genomes of organelles such as mitochondria. The genetic disorders are classified by the location of the genetic elements as autosomal, X-linked, Y-linked and mitochondrial. It is now known that many patients suffer from genetic disorders induced by one-base substitutions in DNA. Several representative genetic disorders are described in Table 1. For simplicity, genetic diseases induced by deletions and insertions in genetic sequences are excluded from the Table. The number of genetic disorders could approach the total number of genes (about twenty to thirty thousand in humans), since almost all genes are essential for organisms to live.

Summary of Figure 9: number of amino acid replacements observed in OTCD, classified by the codon position of the base change (1,2: first or second position; 1,3: first or third position).

| Protein | 1st | 2nd | 3rd | 1,2 | 1,3 | others |
|---|---|---|---|---|---|---|
| OTCD | 35 | 60 | 7 | 1 | 10 | 2 |

Besides classification by the location of the genetic change, the disorders are also classified by the manner in which the disease appears in descendants, as dominant or recessive. Genetic disorders caused by mutations in DNA sequences encoding metabolic enzymes, which lead to reduced enzyme activity, such as ADA (adenosine deaminase) deficiency and PKU (phenylketonuria), are generally inherited in a recessive manner. Autosomal recessive genetic disorders do not appear in the children if either parent has two normal copies of the gene, and are inherited with a 25% chance if both parents are carriers of the disorder. In contrast, Huntington's disease and neurofibromatosis, which are caused by inheritance of an abnormal gene from either parent, are inherited in a dominant manner. Therefore, each child has a 50% chance of inheriting the genetic disorder if just one parent has a dominant gene defect.
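The 25% and 50% figures follow from counting the equally likely allele combinations. A minimal sketch is given below, assuming a single autosomal gene with two alleles and equal transmission of each parental allele; the function name `offspring_genotypes` and the genotype letters are hypothetical.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Genotype probabilities for one child.

    Each parent is a two-letter genotype such as "Aa"; each allele is
    assumed to be transmitted with probability 1/2.
    """
    probabilities = {}
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted(allele1 + allele2))  # "aA" becomes "Aa"
        probabilities[genotype] = probabilities.get(genotype, 0) + Fraction(1, 4)
    return probabilities


# Recessive disorder, both parents carriers: affected children are "aa".
recessive = offspring_genotypes("Aa", "Aa")
print(recessive["aa"])

# Dominant disorder, one affected heterozygous parent: affected are "Dd".
dominant = offspring_genotypes("Dd", "dd")
print(dominant["Dd"])
```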

Genetic disorders caused by one-base substitutions are induced when base changes in genetic sequences go beyond the framework of the robust genetic code, or when the base changes prevent proteins from satisfying the conditions for formation of water-soluble globular structures, resulting in collapse of the protein structures. As I have discussed in this Chapter, even a single amino acid replacement is likely to cause a genetic disorder if the one-base substitution occurs at the second codon position. As can be seen in Figure 9, ornithine transcarbamoylase deficiency (OTCD) appears more frequently when an amino acid is replaced by another amino acid whose codon differs at the second codon position than when the replacement occurs between amino acids whose codons differ at the first codon position.
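The comparison can be made concrete with the summary counts given with Figure 9 (35, 60, 7, 1, 10 and 2 replacements for the first, second, third, first-or-second, first-or-third and other categories). The short sketch below only converts those counts into shares of the total; the dictionary keys are illustrative labels.

```python
# Summary counts of OTCD amino acid replacements from Figure 9,
# keyed by the codon position of the causative base change.
otcd_counts = {
    "1st": 35,
    "2nd": 60,
    "3rd": 7,
    "1st or 2nd": 1,
    "1st or 3rd": 10,
    "others": 2,
}

total = sum(otcd_counts.values())
shares = {label: count / total for label, count in otcd_counts.items()}

for label, share in shares.items():
    print(f"{label:>10}: {otcd_counts[label]:3d} ({share:.1%})")
```

On these counts, replacements attributable to the second codon position make up slightly more than half of the total, against roughly 30% for the first position.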

This contrasts remarkably with the amino acid replacements observed between homologous proteins with similarly active catalytic functions, as given in Figures 2 and 3. It therefore suggests that repressing base substitutions at the second codon position in genetic sequences is important for protection against genetic diseases. To accomplish this, it is necessary to recognize the bases at the second position of codons. As genetic sequences, or genes, are codon sequences and not mere nucleotide sequences, it would be possible to discriminate the bases at the second codon position from those at the other two codon positions, based on the different base compositions at the three positions in codons. The reason is that codons in genetic sequences encoding microbial proteins are already known to have specific base compositions at each of the three base positions. For example, in GC-rich genes guanine is generally observed more frequently than the other three bases at the first codon position, whereas the four bases occur in relatively equal amounts at the second codon position (Ikehara et al., 1996). At present, however, it is almost impossible to devise a strategy for protecting against base substitutions at the second codon position. Nevertheless, it is important to recognize the facts described above as the first step toward discovering strategies for repressing base replacements at the second codon position in genetic sequences. New genetic treatments, once discovered, will release human beings from genetic disorders in the future.
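The position-specific base composition mentioned above is straightforward to measure for any coding sequence. The following sketch assumes an in-frame DNA coding sequence given as a string; the function name `base_composition_by_position` and the toy sequence are hypothetical and chosen only to make the pattern visible.

```python
from collections import Counter


def base_composition_by_position(cds):
    """Return base frequencies at codon positions 1, 2 and 3.

    `cds` is assumed to be an in-frame DNA coding sequence; trailing
    bases that do not complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one codon")
    counts = [Counter(), Counter(), Counter()]
    for index in range(usable):
        counts[index % 3][cds[index]] += 1
    codons = usable // 3
    return [{base: counts[pos][base] / codons for base in "ACGT"}
            for pos in range(3)]


# Toy sequence made of GNC codons: GCC GAC GGC GTC.
composition = base_composition_by_position("GCCGACGGCGTC")
for position, frequencies in enumerate(composition, start=1):
    print(position, frequencies)
```

In this toy case guanine fills the first position and the four bases are equally frequent at the second position; real genes would show weaker but still position-specific biases.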




Fig. 9. Amino acid replacements observed in a genetic disorder, ornithine transcarbamoylase deficiency (OTCD). Letters in the leftmost column and the top row indicate, in one-letter symbols, the amino acids of normal ornithine transcarbamoylase and those of the mutated ornithine transcarbamoylase causing OTCD, respectively. Blue, yellow and red boxes indicate amino acid substitutions caused by base changes at the first, the second and the third codon positions, respectively. Green, orange and white boxes indicate amino acid replacements induced by base substitutions at the first or the second codon position, at the first or the third codon position, and by other base substitutions, respectively. The color box representation is the same as in Figure 3. Data on the amino acid replacements observed in OTCD were obtained from Natural Variants in the Protein Knowledgebase (UniProtKB) at http://www.uniprot.org/uniprot/P00480

### **8. Conclusion**

Genetic disorders arising from one-base substitutions in genes encoding amino acid sequences of proteins are induced by base substitutions at the second codon position more frequently than by those at the first codon position. This fact is intimately related to the robustness of the genetic code, which derives from the origin and evolutionary process of the genetic code. According to the GNC-SNS primitive genetic code hypothesis, which I have proposed, the universal genetic code originated from the GNC code through the SNS code, expanding up and down in the genetic code table. Owing to this origin and evolutionary process, amino acids with similar chemical and physical properties are located in the same columns. This arrangement of amino acids in the genetic code table keeps the induction of genetic disorders at a low rate, because one-base substitutions at the first codon position are unlikely to affect protein function greatly. I would like to emphasize that it is important to understand correctly the main cause of genetic disorders as the first step toward protection against these diseases, and that this recognition will release human beings from many genetic disorders someday.

#### **9. Acknowledgment**

I am grateful to Dr. Tadashi Oishi (Narasaho College) for his encouragement of our research on the GNC-SNS hypothesis on the genetic code and the GADV hypothesis on the origin of life.

#### **10. References**

Berg, J. M., Tymoczko, J. L., & Stryer, L. (2002) Biochemistry, 5th ed. New York: W. H. Freeman and Company.

Ikehara, K. (2002) Origins of gene, genetic code, protein and life: comprehensive view of life system from a GNC-SNS primitive genetic code hypothesis. *J. Biosci.*, 27, 165-186.

Ikehara, K. (2005) Possible steps to the emergence of life: The [GADV]-protein world hypothesis. *Chem. Record*, 5, 107-118.

Ikehara, K. (2009) Pseudo-replication of [GADV]-proteins and origin of life. *Int. J. Mol. Sci.* (*International Journal of Molecular Sciences*), Vol. 10, No. 4, 1525-1537.

Ikehara, K., Amada, F., Yoshida, S., Mikata, Y., & Tanaka, A. (1996) A possible origin of newly-born bacterial genes: significance of GC-rich nonstop frame on antisense strand. *Nucl. Acids Res.*, 24, 4249-4255.

Ikehara, K., Omori, Y., Arai, R., & Hirose, A. (2002) A novel theory on the origin of the genetic code: a GNC-SNS hypothesis. *J. Mol. Evol.*, 54, 530-538.

Ikehara, K., & Yoshida, Y. (1998) SNS hypothesis on the origin of the genetic code. *Viva Origino*, 26, 301-310.

Ohno, S. (1970) Evolution by Gene Duplication, Springer: Heidelberg, Germany.

## **Inbreeding and Genetic Disorder**

Gonzalo Alvarez1, Celsa Quinteiro2 and Francisco C. Ceballos1

*1Departamento de Genética, Facultad de Biología, Universidad de Santiago de Compostela, 2Fundación Pública Gallega de Medicina Genómica, Hospital Clínico Universitario, Santiago de Compostela Spain* 

#### **1. Introduction**

Inbreeding is usually defined as mating between relatives, and the progeny resulting from a consanguineous mating between two related individuals is said to be inbred (Cavalli-Sforza & Bodmer, 1971; Hedrick, 2005; Vogel & Motulsky, 1997). As a result of inheriting the same chromosomal segment through both parents, who inherited it from a common ancestor, individuals born of consanguineous unions have a number of chromosome segments that are homozygous. Therefore, inbreeding increases the amount of homozygosity and, consequently, recessive alleles hidden by heterozygosity with dominant alleles will be expressed through inbreeding. On this basis, it is expected that recessive traits such as many human genetic disorders will occur with increased frequency in the progeny of consanguineous couples. In addition, since many recessive alleles present in natural populations have harmful effects on the organism, inbreeding usually leads to a decrease in size, vigor and reproductive fitness.

In a broad sense, inbreeding can occur under two quite different biological situations. First, there may be inbreeding because of restricted population size. The degree of relationship between the individuals in a population depends on the size of that population, since individuals are more closely related to each other in a small population than in a large one. Thus, inbreeding is a phenomenon frequently associated with small populations. Second, inbreeding can occur in a large population as a form of nonrandom mating, when the frequency of consanguineous matings is higher than that expected by chance. In this case, the population will show a homozygote excess with respect to a random mating population in which genotypic frequencies are expected to be in Hardy-Weinberg equilibrium.

The greatest extent of inbreeding is found in plants. A number of plant species are predominantly self-fertilizing, which means that most individuals reproduce by self-fertilization, the most extreme form of inbreeding. In animals, inbreeding is less prevalent than in plants, even though some invertebrates, such as some Hymenoptera, have brother-sister matings. Inbreeding also plays a very important role in animal and plant breeding, because the number of breeding individuals in breeding programs is often not large. The inbreeding effects associated with small population size must therefore be considered in the context of animal and plant breeding.

In humans, consanguineous marriage is frequent in many populations. In fact, it has recently been estimated that consanguineous couples and their progeny account for about 10.4% of the 6.7 billion global population (Bittles & Black, 2010). First-cousin marriage and other types of consanguineous union are frequent in a number of current populations from different parts of the world. The extent of inbreeding of an individual is usually measured by his or her inbreeding coefficient. The coefficient of inbreeding (F) is the probability that an individual receives, at a given autosomal locus, two alleles that are identical by descent or, equivalently, the proportion of the individual's autosomal genome expected to be homozygous by descent (autozygous) (Cavalli-Sforza & Bodmer, 1971; Hedrick, 2005). If genealogical information is available for a given individual, his or her inbreeding coefficient can be computed by pedigree analysis. The computation of the genealogical inbreeding coefficient assumes neutrality with respect to natural selection, so that the transmission probabilities of alleles can be calculated from Mendelian ratios. In humans, the most extreme cases of inbreeding correspond to incestuous unions, defined as mating between biological first-degree relatives, i.e., father-daughter, mother-son and brother-sister. The progeny of an incestuous union have an inbreeding coefficient of 1/4 (0.25) in all three cases. Offspring of uncle-niece, first-cousin and second-cousin marriages have F = 1/8 (0.125), 1/16 (0.0625) and 1/64 (0.0156), respectively. In complex genealogies, the depth of the pedigree is very important for the computation of the inbreeding coefficient. In some cases, genealogical data from the most recent four or five generations seem to be sufficient to capture most of the information relevant to the calculation of the inbreeding coefficient (Balloux et al., 2004). This is because recent inbreeding events have a disproportionately large influence on an individual's inbreeding coefficient relative to events deeper in the pedigree. However, in some large and complex pedigrees, ancestral or remote consanguinity can make a substantial contribution to the inbreeding of a given individual, and the exploration of pedigrees limited to a shallow depth carries the risk of underestimating the degree to which individuals are inbred (Alvarez et al., 2009; Boyce, 1983; MacCluer et al., 1983). Where remote consanguinity is important, inbreeding coefficients must be computed from extended pedigrees in order to obtain an accurate measure of the inbreeding level.
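The values of F quoted above for simple pedigrees can be reproduced with the standard path-counting formula, F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A), summed over every loop that connects the two parents through a common ancestor. The following is a minimal sketch, not a general pedigree program: the function name, the `PEDIGREES` dictionary and the loop encoding are illustrative assumptions, and all common ancestors are assumed to be non-inbred.

```python
from fractions import Fraction


def inbreeding_coefficient(loops):
    """Sum (1/2)**(n1 + n2 + 1) * (1 + F_A) over all pedigree loops.

    Each loop is a tuple (n1, n2, f_ancestor): n1 and n2 are the numbers of
    generations from the father and from the mother up to a common ancestor,
    and f_ancestor is that ancestor's own inbreeding coefficient.
    """
    total = Fraction(0)
    for n1, n2, f_ancestor in loops:
        total += Fraction(1, 2) ** (n1 + n2 + 1) * (1 + Fraction(f_ancestor))
    return total


# Hypothetical encodings of simple pedigrees; every common ancestor is
# assumed to be non-inbred (f_ancestor = 0).
PEDIGREES = {
    "father-daughter": [(0, 1, 0)],             # the father is the common ancestor
    "brother-sister": [(1, 1, 0), (1, 1, 0)],   # two shared parents
    "uncle-niece": [(1, 2, 0), (1, 2, 0)],      # the uncle's two parents
    "first cousins": [(2, 2, 0), (2, 2, 0)],    # two shared grandparents
    "second cousins": [(3, 3, 0), (3, 3, 0)],   # two shared great-grandparents
}

for name, loops in PEDIGREES.items():
    f = inbreeding_coefficient(loops)
    print(f"{name}: F = {f} ({float(f):.4f})")
```

In a deep or complex pedigree the same sum simply runs over many more loops, which is why truncating the genealogy at a shallow depth can only lower the computed F.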

Studies of genome-wide homozygosity based on genome scan technology have opened new avenues for inbreeding research. Genome-wide homozygosity may be used to estimate the inbreeding coefficient of a given individual when genealogical information is not available. Furthermore, the study of genome-wide homozygosity is very important for the identification of recessive disease genes through homozygosity mapping, as well as for the investigation of the effects of homozygosity on traits of biomedical importance. Long homozygous chromosomal segments have been detected in human chromosomes by the analysis of polymorphic markers in whole-genome scans (Broman & Weber, 1999; McQuillan et al., 2008). These long tracts, in which homozygous markers occur in an uninterrupted sequence, are often termed runs of homozygosity (ROH) and can arise in the genome through a number of mechanisms (Broman & Weber, 1999; Gibson et al., 2006). The most obvious explanation for such tracts is autozygosity, where the same chromosomal segment has been passed to a child by parents who both inherited it from a common ancestor. The length of an autozygous segment reflects its age, since haplotypes are broken up by recombination at meiosis: long tracts are expected to arise from close inbreeding, whereas a short autozygous segment is likely to result from the mating of very distantly related individuals. Homozygous tracts are significantly more common in chromosome regions with high linkage disequilibrium and low recombination but, since linkage disequilibrium is a local phenomenon, this mechanism would produce only short homozygous segments (Broman & Weber, 1999; Gibson et al., 2006). A genomic measure of individual autozygosity, termed Froh, has been defined as the proportion of the autosomal genome in runs of homozygosity above a specified length threshold:

$$F_{roh} = \frac{\sum L_{roh}}{L_{auto}}$$

where ΣLroh is the total length of all ROHs in the individual above a specified minimum length and Lauto is the length of the autosomal genome covered by the genomic markers (McQuillan et al., 2008). In a genome-wide study based on a 300,000 SNP panel, a strong correlation (r = 0.86) was found between Froh and the genealogical inbreeding coefficient (F) among 249 individuals from the isolated population of the Orkney Isles in northern Scotland, for which complete and reliable pedigree data were available (McQuillan et al., 2008). Froh values were computed for a range of minimum-length thresholds (0.5, 1.5 and 5 Mb), and the mean Froh obtained with the 5 Mb threshold was the closest to the value of F computed from pedigree data. ROHs shorter than 3 or 4 Mb were not uncommon in unrelated individuals. The size of autozygous segments and their distribution throughout the human genome have been investigated in inbred individuals with recessive Mendelian disorders (Woods et al., 2006). In a whole-genome scan of 10,000 SNPs, individuals affected with a recessive disease whose parents were first cousins, drawn from two populations with a long history of consanguinity (Pakistani and Arab), presented on average 20 homozygous segments exceeding 3 cM (range 7-32 segments), and the homozygous segment associated with the recessive disease had a mean size of 26 cM (range 5-70 cM). The proportion of their genomes that was homozygous varied from 5 to 20%, with a mean value of 11%. This figure is about 5 percentage points above the expected value for the offspring of a first-cousin union (F = 0.0625), but it must be taken into account that the proportion of the genome identical by descent shows large stochastic variation (Carothers et al., 2006). Moreover, the individuals analyzed were children of first cousins who presented a genetic disorder, so they were a biased sample of first-cousin progeny. Using genome scan technology, several studies have shown that extended tracts of genomic homozygosity are globally widespread in many human populations and that they provide valuable information on a population's demographic history, such as past consanguinity and population isolation (Kirin et al., 2010; Nalls et al., 2009).
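The definition of Froh translates directly into a short calculation. The sketch below is only an illustration of the formula under stated assumptions: the function name `froh`, the ROH lengths and the autosomal length of 2,800 Mb are hypothetical values chosen for the example, and a segment exactly at the threshold is counted (the source does not say how ties are treated).

```python
def froh(roh_lengths_mb, autosome_length_mb, min_length_mb):
    """Return Froh = (sum of ROH lengths at or above the threshold) / Lauto."""
    if autosome_length_mb <= 0:
        raise ValueError("autosome_length_mb must be positive")
    total_roh = sum(length for length in roh_lengths_mb if length >= min_length_mb)
    return total_roh / autosome_length_mb


# Hypothetical individual: ROH lengths in Mb (illustrative values, not real data).
roh_lengths = [0.7, 1.2, 2.0, 3.5, 6.0, 12.0, 25.0]
L_AUTO_MB = 2800.0  # assumed marker-covered autosomal length, for illustration only

# The three minimum-length thresholds mentioned in the text.
for threshold in (0.5, 1.5, 5.0):
    value = froh(roh_lengths, L_AUTO_MB, threshold)
    print(f"threshold {threshold} Mb: Froh = {value:.4f}")
```

Raising the threshold discards the short runs that are common even in unrelated individuals, which is consistent with the 5 Mb threshold giving the value closest to the pedigree-based F in the Orkney study.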

Autozygosity has practical implications for the identification of human disease genes. Homozygosity mapping is the method of choice for mapping human genes that cause rare recessive Mendelian diseases (Botstein & Risch, 2003; Lander & Botstein, 1987). The method consists of searching for a region of the genome that is autozygous in individuals from consanguineous families who are affected by a given disease. The disease locus is thus detected on the basis that the adjacent region will be homozygous by descent in such inbred individuals. The method is also known as autozygosity or consanguinity mapping and has the advantage that relatively few individuals are required. Homozygosity mapping became practical with the discovery of multiple highly polymorphic markers. The first polymorphic markers used were restriction fragment length polymorphisms, followed by short sequence repeats and, more recently, single nucleotide polymorphisms (SNPs) (Woods et al., 2004). Between 1995 and 2003, nearly 200 studies were published in which homozygosity mapping was used to map human genes causing rare recessive disease phenotypes (Botstein & Risch, 2003).


| Type of marriage | Hindu | Muslim | Christian |
|---|---|---|---|
| Non-consanguineous (F = 0) | 62.0 | 72.9 | 78.1 |
| Beyond second cousin (F < 0.0156) | 4.5 | 3.5 | 3.4 |
| Second cousin (F = 0.0156) | 1.7 | 2.5 | 1.6 |
| First cousin (F = 0.0625) | 10.8 | 17.5 | 6.8 |
| Uncle-niece (F = 0.125) | 21.0 | 3.7 | 10.2 |

Table 1. Consanguineous marriages (%) by religion in Karnataka, India. The inbreeding coefficient (F) of the offspring is given for each type of marriage. (From Bittles et al., 1991)

Recently, the strategy of homozygosity mapping has been extended to the analysis of single individuals by means of high-density genome scans, in order to circumvent the limitation imposed by the number of consanguineous families required for the analysis (Hildebrandt et al., 2009). Homozygosity mapping in single individuals who carry homozygous disease gene mutations by descent from an unknown distant ancestor may yield a single genomic candidate region small enough to allow successful gene identification. In the affected individual, remote consanguinity will lead to fewer and shorter homozygous intervals that contain the disease gene. In an analysis of 72 individuals with known homozygous mutations in 13 different recessive genes, homozygosity mapping with a whole-genome scan of 250,000 SNPs detected the disease gene in homozygous segments as short as 2 Mb, containing an average of only 16 candidate genes (Hildebrandt et al., 2009).
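The core idea of homozygosity mapping, finding the region that is homozygous in every affected individual, can be sketched as an interval intersection. This is a deliberately simplified illustration and not the procedure used in the cited studies: the function name, the representation of homozygous segments as (start, end) positions in Mb on a single chromosome, and the example data are all assumptions.

```python
def shared_homozygous_regions(individuals):
    """Return the intervals that lie inside a homozygous segment of every individual.

    `individuals` is a list with one entry per affected person; each entry is a
    list of (start, end) homozygous segments on the same chromosome, in Mb.
    """
    if not individuals:
        return []

    def intersect(segments_a, segments_b):
        # Keep every non-empty overlap between a segment of A and a segment of B.
        overlaps = []
        for start_a, end_a in segments_a:
            for start_b, end_b in segments_b:
                start, end = max(start_a, start_b), min(end_a, end_b)
                if start < end:
                    overlaps.append((start, end))
        return sorted(overlaps)

    shared = sorted(individuals[0])
    for segments in individuals[1:]:
        shared = intersect(shared, segments)
    return shared


# Hypothetical homozygous segments (Mb) for three affected relatives.
affected = [
    [(10.0, 40.0), (60.0, 90.0)],
    [(25.0, 70.0)],
    [(30.0, 65.0)],
]
print(shared_homozygous_regions(affected))
```

Each additional affected individual can only shrink the shared region, which is why relatively few individuals are needed to narrow the candidate interval; with a single individual, the candidate regions are simply that person's own homozygous segments.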

### **2. Consanguineous marriage around the world**

Studies of the prevalence and pattern of consanguineous marriage in human populations show that consanguinity is widespread in many current populations around the world (Bittles, 2001, 2006). In the demographic literature, a consanguineous marriage is usually defined as a union between individuals who are related as second cousins or closer (F ≥ 0.0156 for their progeny). This arbitrary limit is based on the perception that an inbreeding coefficient below 0.0156 has biological effects not very different from those found in the general population. It has been estimated that consanguineous couples and their progeny currently account for 10.4% of the global population (Bittles & Black, 2010). Marriage between first cousins (F = 0.0625 for their progeny) is considered the most prevalent consanguineous union in human populations. Marriage between second cousins is also very frequent. Globally, unions between uncle and niece or between double first cousins (F = 0.125 for their progeny in both cases) are less common; however, certain populations show a high incidence of uncle-niece unions. Regarding incestuous unions between biological first-degree relatives (father-daughter, mother-son, brother-sister; F = 0.25 for their progeny in all three cases), a universal taboo on mating within the nuclear family exists in all societies. Incest is illegal in many countries and specifically forbidden by the five major religions, even though incestuous practices can be found sporadically in any society. The prevalence of incest around the world is difficult to establish because of its illegality and its association with social stigma (Bennett et al., 2002).

Consanguinity is not homogeneously distributed around the globe, so certain geographic areas can be associated with a high incidence of consanguinity. The distribution of consanguineous marriages in four continents (Europe, America, Asia and Africa), obtained from data available at the web portal Consanguinity/Endogamy Resource (consang.net), is shown in Figure 1. This web portal compiles data on the global prevalence of consanguinity from more than two hundred studies performed since the middle of the 20th century. These studies gathered marital information from household and school surveys, pedigree analysis, civil registrations and censuses, obstetric and hospital inpatient records, and religious dispensations for more than 450 populations from 90 countries. In this data set, 63.0% of the populations are from Asia, 19.1% from South America, 8.9% from Europe, 6.4% from Africa and just 2.6% from Central and North America. In general, a more favorable attitude toward consanguinity is found in populations from Asia and Africa. In Sub-Saharan Africa, for example, 35 to 50% of marriages are between relatives. In Egypt, on average, 42.1% of total marriages are consanguineous, with a preference for double first-cousin and second-cousin unions, although there is great heterogeneity among populations owing to different beliefs and cultural backgrounds. The most consanguineous populations studied so far are found in Asia. In Afghanistan, for instance, 55.4% of marriages are between relatives. Among the traditionally nomadic *Qashqai* of Iran, up to 73.5% of marriages are consanguineous. Table 1 shows the results of a 10-year study performed in the cities of Bangalore and Mysore in the State of Karnataka, South India, which involved a total of 107,518 marriages (Bittles et al., 1991). For the entire sample, 31.4% of all unions were consanguineous, and the mean consanguinity, measured as the average inbreeding coefficient ($\alpha = \Sigma p_i F_i$, where $p_i$ is the proportion of marriages of type $i$ and $F_i$ the inbreeding coefficient of their progeny), was 0.0299. Consanguinity was most prevalent among Hindus, with 33.5% of consanguineous marriages, and they had the highest average consanguinity (α = 0.0333) because of their high rate of uncle-niece marriages. In the Muslim community, 23.7% of marriages were consanguineous, with an average consanguinity of 0.0160. Muslims avoid uncle-niece marriage because this type of consanguineous union is proscribed by the Quran. First-cousin marriage was the most prevalent consanguineous union in the Muslim community. Among Christians in Karnataka, 18.6% of marriages were consanguineous, including both uncle-niece and first-cousin marriages, with an average consanguinity of 0.0173. Unlike Asia and Africa, Europe and America appear to have a less favorable attitude toward consanguinity, since in most populations fewer than 10% of marriages are consanguineous (Figure 1). In Europe, consanguinity appears to be more prevalent in Southern countries such as Spain and Italy, where consanguineous unions represent 3.5% and 1.6% of total marriages, respectively. Northern European countries appear to have a lower incidence of consanguineous marriages, for instance 0.3% in Great Britain, 0.4% in Norway and 0.4% in Hungary. The American continent seems to be very similar to Europe. In South America, the average frequency of consanguineous marriages in 39 Brazilian populations is 4.2%, with different preferences for union type depending on the community. In Colombia and Ecuador, data from six populations indicate that consanguineous marriages represent 2.8% and 2.9% of total marriages, respectively. In the USA, it has been estimated that only 0.2% of total marriages are consanguineous, based on a couple of populations from Wisconsin, a nationwide sample of more than 130,000 people and a couple of minority populations.
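The average inbreeding coefficient α = Σ p_i F_i can be checked against the percentages in Table 1. The sketch below is only an arithmetic illustration: the variable and function names are hypothetical, and marriages beyond second cousin (F < 0.0156) are assumed to contribute F = 0 because their exact coefficients are not given, so the results agree with the reported values only to within about 0.0001.

```python
# Inbreeding coefficient of the progeny for each type of marriage (Table 1).
# Assumption: unions beyond second cousin are treated as F = 0.
F_BY_UNION = {
    "beyond_second_cousin": 0.0,
    "second_cousin": 0.0156,
    "first_cousin": 0.0625,
    "uncle_niece": 0.125,
}

# Percentage of marriages of each type, by religion (Table 1).
PERCENT_BY_RELIGION = {
    "Hindu": {"beyond_second_cousin": 4.5, "second_cousin": 1.7,
              "first_cousin": 10.8, "uncle_niece": 21.0},
    "Muslim": {"beyond_second_cousin": 3.5, "second_cousin": 2.5,
               "first_cousin": 17.5, "uncle_niece": 3.7},
    "Christian": {"beyond_second_cousin": 3.4, "second_cousin": 1.6,
                  "first_cousin": 6.8, "uncle_niece": 10.2},
}


def mean_inbreeding(percentages):
    """Return alpha = sum(p_i * F_i), with p_i given as percentages."""
    return sum(pct / 100.0 * F_BY_UNION[union] for union, pct in percentages.items())


for religion, percentages in PERCENT_BY_RELIGION.items():
    print(f"{religion}: alpha = {mean_inbreeding(percentages):.4f}")
```

The calculation makes the role of union type explicit: uncle-niece marriages carry twice the weight of first-cousin marriages, so the Hindu community has the highest α even though first-cousin marriage is more frequent among Muslims.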



61.9% of consanguineous unions, and Christian Anabaptists Mennonites from Kansas with


Fig. 1. Percentage of consanguineous marriages in human populations from four continents. (Data from consang.net)


Consanguinity studies in population minorities, isolates and migrants reveal that there is great heterogeneity between close communities around the world. Figure 2 shows the incidence of consanguineous marriages in population minorities, isolates and migrants for more than 100 populations from 22 countries (data from consang.net). In the nomadic Bedouin Baggara Arab community that inhabits Nyertiti state in Sudan, for example, 71.7% of marriages are consanguineous, with a clear preference for first-cousin unions. In Japan, where only 8.98% of all marriages are consanguineous, an isolate population such as the Arihara community in the Kansai region presented 47.8% consanguineous marriages. The Samaritan isolate community in Israel has a clear preference for first-cousin unions: while other Hebrew communities in Israel have on average 7.6% consanguineous unions, Samaritans have 46.4%. In Europe, some migrant populations maintain their traditions while living abroad. For instance, the Pakistani community of Bradford in Great Britain has 67% consanguineous marriages, with an average consanguinity of 0.0377. The Pakistani community in Norway also has a high incidence of consanguineous unions, with 31% of marriages being consanguineous. In the United States, where first-cousin marriages are a criminal offence in eight states and illegal in a further 31 states, exceptions have been incorporated to permit uncle-niece marriage within the Jewish community of Rhode Island. A high incidence of consanguineous marriages has been reported in isolated minorities in the USA, such as a Gypsy community from Boston with 61.9% consanguineous unions and Christian Anabaptist Mennonites from Kansas with 33.0% of their marriages being between relatives.

Fig. 2. Percentage of consanguineous marriages in minorities, isolates and migrant populations around the world. (Data from consang.net)

Consanguineous marriage is favored in many societies, especially in Asia and Africa, as a means of preserving family goods and lands (Bittles, 2006). Social and cultural advantages such as strengthened family ties, enhanced female autonomy, more stable marital relationships, greater compatibility with in-laws, lower domestic violence, lower divorce rates or simplified premarital arrangements, along with economic considerations, may be the actual motives for the preference for consanguineous unions, particularly in rural societies. Furthermore, consanguinity was also common among European royalty and aristocracy up until the middle of the 1900s, and is still occasionally present today in wealthy families and the aristocracy. Consanguineous marriage is not restricted to any specific society or religion, although the attitude of different societies toward consanguinity is highly influenced by religious beliefs or creeds (Bittles et al., 1991). Marriage regulations in Islam permit first-cousin and double first-cousin unions, and the Quran expressly prohibits uncle-niece marriages. Unlike Islam, the Hindu attitude toward consanguinity is non-uniform. The Aryan Hindus of northern India prohibit marriages between relatives for approximately seven generations. By comparison, Dravidian Hindus of south India strongly favor marriage between first cousins of the type mother's brother's daughter, and particularly in the states of Andhra Pradesh, Karnataka and Tamil Nadu uncle-niece marriages are also widely contracted (Table 1). Buddhism and its two major branches, Theravada and Mahayana, which are spread throughout Asia, prohibit any type of consanguineous relationship in marriage. The attitudes of Christianity and Judaism toward consanguinity are based on the book of Leviticus, the third book of the Hebrew Bible and the Torah. Many examples of consanguineous unions are cited in the biblical texts, for example Abraham and Sarah, identified as half siblings (*Genesis* 20:12)

or Moses' parents, related as nephew and aunt (*Exodus* 6:20). However, the book of Leviticus states that "None of you shall approach any one of his close relatives to uncover nakedness. I am the Lord" (*Leviticus* 18:6). Despite this passage, Leviticus has been interpreted in different ways. The lax Judaic interpretation of Leviticus led its followers to permit first-cousin and even uncle-niece unions. The Christian attitude toward consanguineous marriage is characterized by its lack of uniformity. Orthodox churches interpret Leviticus strictly and prohibit consanguineous marriage of any form. For members of the Latin Church, the effect of the rules in Leviticus was to prohibit marriage with a biological relative, usually up to and including third cousins. Dispensation could, however, be granted at diocesan level for related couples who wished to marry within the prohibited degrees of consanguinity, albeit with payment of an appropriate benefaction to the church. Among the constellation of churches that arose from the Protestant Reformation, the existing biblical guidelines were generally adopted, although the closest form of approved union has usually been between first cousins. Paradoxically, the highest rates of consanguineous unions recorded in Europe, both historically and today, appear in the southern Roman Catholic countries rather than in the northern Protestant countries. The same pattern holds for the Catholic countries of South and Central America in comparison with the Protestants, Anabaptists, Anglicans and Restorationists of North America.

### **3. Inbreeding and genetic disease**

In his classic study of inborn errors of metabolism, Archibald Garrod noted that an unusually high proportion of patients with alkaptonuria were the progeny of consanguineous marriages. Since this observation, made in the early years of the 20th century, a very large number of studies have consistently shown that recessive traits occur with increased frequency in the progeny of consanguineous mates, and this outcome is one of the most important clinical consequences of inbreeding. In Europe and Japan, for example, the frequency of first-cousin marriages among the parents of individuals affected by recessive traits such as albinism, phenylketonuria, congenital ichthyosis and microcephaly is remarkably higher than the frequency of first-cousin marriages in the corresponding general population (Bodmer & Cavalli-Sforza, 1976; pp. 372-377). In general, the rarer the disease, the higher the proportion of consanguineous marriages among the parents of affected individuals. Similarly, the closer the inbreeding, the stronger the effect. The genetic explanation for these observations is simple and derives from basic principles of population genetics. In a random mating population, the frequency of recessive homozygotes *aa* will be $q^2$ for an allele *a* that has frequency $q$, according to the Hardy-Weinberg law. In an inbred population with inbreeding level $F$, the frequency of recessive homozygotes will be $q^2 + (1 - q)qF$, and therefore the ratio of the frequency of the homozygote *aa* in an inbred population relative to a random mating one will be $1 + F(1 - q)/q$. The ratio is very large for low allele frequencies and increases with the level of inbreeding. For example, when $F = 1/16$, corresponding to the progeny of a first-cousin marriage, and $q = 0.01$, there are more than seven times as many affected individuals in the inbred group as in the non-inbred population. For illustrative purposes, Table 2 shows the risk of recessive disease among progeny of first-cousin marriages and among progeny of unrelated parents for three values of allelic frequency. On this rationale, parental consanguinity can be a useful criterion in clinical diagnosis. Thus, when the parents of a patient suffering from a previously unknown disease are consanguineous, the diagnosis of a recessive genetic disease deserves serious consideration.

| Allele frequency (q) | Offspring of unrelated parents | Offspring of first cousins |
|---|---|---|
| 0.01 | 1:10,000 | 1:1,400 |
| 0.005 | 1:40,000 | 1:3,000 |
| 0.001 | 1:1,000,000 | 1:16,000 |

Table 2. Risk of affected individual for rare recessive disease in offspring of unrelated and first-cousin parents
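The figures in Table 2 follow directly from the two expressions given in the text, $q^2$ for unrelated parents and $q^2 + (1 - q)qF$ with $F = 1/16$ for first cousins. The short Python sketch below recomputes them; the function names (`recessive_risk`, `risk_ratio`, `one_in`) are illustrative choices and are not taken from the chapter, and the printed values are unrounded, whereas the table rounds them.

```python
# Minimal sketch: risk of a rare recessive disease under inbreeding level F.
# Function names are hypothetical; the formulas are those quoted in the text.

F_FIRST_COUSIN = 1.0 / 16.0  # inbreeding coefficient of first-cousin offspring


def recessive_risk(q, F=0.0):
    """Frequency of aa homozygotes: q^2 + (1 - q) * q * F."""
    return q * q + (1.0 - q) * q * F


def risk_ratio(q, F):
    """Risk in inbred offspring relative to random mating: 1 + F(1 - q)/q."""
    return 1.0 + F * (1.0 - q) / q


def one_in(risk):
    """Express a risk as the N in '1:N', rounded to the nearest integer."""
    return round(1.0 / risk)


if __name__ == "__main__":
    for q in (0.01, 0.005, 0.001):
        unrelated = one_in(recessive_risk(q))
        cousins = one_in(recessive_risk(q, F_FIRST_COUSIN))
        ratio = risk_ratio(q, F_FIRST_COUSIN)
        print(f"q={q}: unrelated 1:{unrelated}, first cousins 1:{cousins}, "
              f"ratio {ratio:.2f}")
```

For $q = 0.01$ the ratio is about 7.19, which matches the statement that first-cousin offspring are affected more than seven times as often, and the ratio grows as $q$ becomes smaller.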

Several studies have reported a number of detrimental health effects in the progeny of consanguineous marriages. In general, the offspring of consanguineous couples presented increased levels of morbidity and significant medical problems such as major malformations, congenital anomalies and structural birth defects in the first few days of life (Bennett et al., 2002; Bittles, 2001, 2006). Estimating the absolute risk for the offspring of consanguineous unions is often very difficult because many factors, such as sociodemographic variables and methods of subject ascertainment, influence the risk in a given population. Many of these non-genetic variables are difficult to control for in the data analysis. One way to circumvent such problems is to compare the risk in the offspring of consanguineous marriages with that in the offspring of non-consanguineous unions. In a compilation based on data from a number of studies, the increased risk of a significant birth defect in the progeny of a first-cousin marriage varied between 1.7 and 2.8% above that of the non-consanguineous population (Bennett et al., 2002). A considerable number of abnormalities have also been reported in the offspring of first-degree incestuous unions. A compilation of data from several studies shows that 11.7% (25/213) of the incestuous progeny presented known autosomal recessive disorders, 16.0% (34/213) congenital malformations, 11.7% (25/213) nonspecific severe intellectual impairment and 14.6% (31/213) mild intellectual impairment (Bennett et al., 2002).

In contrast with the extensive evidence on the effect of inbreeding on Mendelian diseases, the contribution of consanguinity to complex or multifactorial diseases is less well understood. There is, however, growing evidence for adverse effects of inbreeding on complex human diseases of public health importance. The relationship between inbreeding and blood pressure (BP), and the related late-onset disease, essential hypertension, has been investigated in isolate populations from the Dalmatian islands, Croatia (Rudan et al., 2003b). A strong linear relationship was found between the inbreeding coefficient (F) and both systolic and diastolic BP among 2760 adult individuals from 25 villages within Croatian island isolates. The individual inbreeding coefficient was computed for each study participant based on pedigree information from four to five ancestral generations. An increase in F of 0.01 corresponded to an increase of approximately 3 mm Hg in systolic and 2 mm Hg in diastolic BP, and 10-15% of the total variation in BP in those populations could be explained by recessive or partially recessive quantitative trait locus (QTL) alleles. It was estimated that several hundred (300-600) recessive QTLs could contribute to BP variation. Moreover, it was inferred that inbreeding accounts for 36% of all hypertension in those populations. Dalmatian island populations have also been used to investigate the relationship between inbreeding and the prevalence of 10 late-onset complex diseases: coronary heart disease,

stroke, cancer, schizophrenia, epilepsy, uni/bipolar depression, asthma, adult type diabetes, gout and peptic ulcer, which are commonly occurring disorders on those islands (Rudan et al., 2003a). The study was carried out in 14 isolate villages on three neighboring islands in middle Dalmatia, which present a wide range of levels of inbreeding and endogamy and a relatively uniform environment, so that the potential effects of inbreeding on those complex diseases may be detected. Disease prevalence was investigated by comparing villages grouped by level of inbreeding as high (average F = 0.036), moderate (average F = 0.013) and low (average F = 0.006). An increase in disease prevalence across villages associated with an increase in average inbreeding coefficient was observed for gout, depression, peptic ulcer, schizophrenia, cancer, epilepsy, coronary heart disease, stroke and asthma (the last three not statistically significant) but not for type 2 diabetes (Table 3). The results indicated that between 23% and 48% of the incidence of these disorders in the population sample (other than type 2 diabetes) could be attributed to inbreeding. These findings provide indirect evidence in support of a major polygenic component to disease susceptibility due to many deleterious recessive alleles located throughout the genome. Rudan et al. (2003a) have suggested that the genetic component of late-onset diseases may be caused by a large number of rare variants in numerous genes maintained at low frequency in populations by mutation-selection balance, according to the common disease/rare variant (CD/RV) hypothesis (Wright et al., 2003). From this point of view, the study of inbred populations could be very useful in the detection of genetic effects on complex disease, since inbred individuals will show stronger phenotypic effects than outbred individuals, in whom most alleles are present in heterozygotes (Rudan et al., 2003b).

Several lines of evidence suggest that inbreeding is also an important risk factor in susceptibility to infectious diseases in humans. The association between inbreeding and susceptibility to infectious disease has been investigated through microsatellite genome scan data for tuberculosis (TB) in The Gambia, leprosy in India and persistent hepatitis B virus infection in both The Gambia and Italy (Lyons et al., 2009b). In this study, inbreeding coefficients were estimated from correlations in heterozygosity among markers because genealogical information was not available for the studied individuals; $r^2$ values between heterozygosities were calculated from two sets of randomly selected unlinked markers. In The Gambia, where the frequency of first-cousin marriage is approximately 30%, the correlations in heterozygosity among markers were larger in affected individuals than in unaffected ones for both hepatitis and TB. This result suggests that inbred individuals are more common among the infected cases for both hepatitis and TB and, therefore, that consanguinity appears to increase significantly the risk of these two major infectious causes of death in humans. Significant differences in $r^2$ values between affected and unaffected individuals were not found for persistent hepatitis in the Italian genome scan, probably due to the low levels of inbreeding in that population. Correlations in heterozygosity among markers did not differ between affected and unaffected individuals for leprosy in India, where the frequency of consanguineous marriages is high, suggesting no effect of inbreeding on this infectious disease. Furthermore, evidence for an association between infectious disease and homozygosity has also been reported. In a case-control study of fatal invasive bacterial diseases in Kenyan children, performed using a genome-wide scan with microsatellite markers, homozygosity was significantly increased in 148 children aged <13 years who died of invasive bacterial diseases such as bacteraemia, meningitis and neonatal sepsis compared with a control sample of 137 age-matched healthy children (Lyons et al., 2009a). Of a total of 134 microsatellite markers analyzed, homozygosity was strongly associated with mortality at five markers. These results indicate that homozygosity contributes significantly to the risk of childhood death due to invasive bacterial disease.

| Disease | High inbreeding (mean F = 0.036) | Moderate inbreeding (mean F = 0.013) | Low inbreeding (mean F = 0.006) |
|---|---|---|---|
| Coronary heart disease | 13.28\*\*\* | 11.95\*\*\* | 11.23 |
| Stroke | 2.43\*\*\* | 2.79\*\*\* | 1.73 |
| Cancer | 14.54\*\*\* | 13.44\*\*\* | 1.93 |
| Schizophrenia | 11.23\*\*\* | 10.96\*\*\* | 0.14 |
| Uni/bipolar depression | 10.26\*\*\* | 17.63\*\*\* | 4.51 |
| Asthma | 3.63\*\*\* | 2.64\*\*\* | 2.60 |
| Type II diabetes | 6.02\*\*\* | 7.35\*\*\* | 6.77 |
| Gout | 9.25\*\*\* | 7.19\*\*\* | 3.96 |
| Peptic ulcer | 6.92\*\*\* | 4.29\*\*\* | 2.18 |
| Epilepsy | 1.47\*\*\* | 0.78\*\*\* | 0.31 |

Statistical significance (P values) in the highly and moderately inbred groups is calculated against the low inbreeding group: \* P<0.05; \*\* P<0.01; \*\*\* P<0.001

Table 3. Prevalence (%) of 10 complex diseases in groups of villages with relatively "high", "moderate" and "low" inbreeding coefficient (F) in Dalmatia islands, Croatia. (From Rudan et al., 2003a)

### **4. Inbreeding depression**

One of the adverse effects of consanguineous mating is the phenomenon of inbreeding depression. In population genetics, inbreeding depression is usually defined as the decreased fitness of offspring from related parents (Charlesworth & Willis, 2009). Inbreeding depression occurs in many species of animals and plants as well as in humans and is caused by increased homozygosity of individuals. There are two major hypotheses to explain how increased homozygosity can lower fitness. The "overdominance hypothesis" suggests that heterozygotes at loci determining fitness are superior to homozygotes for either allele so that heterozygote advantage (overdominance) is responsible for inbreeding depression. The "partial dominance hypothesis" assumes that inbreeding depression is caused by recessive or partially recessive deleterious alleles maintained in the population at low frequencies by mutation-selection balance. A number of studies on the genetics of quantitative fitness traits in *Drosophila* and other species suggest that inbreeding depression is predominantly caused by deleterious alleles generated by mutation and kept at low frequency in the population by natural selection, even though some alleles maintained at higher frequencies by some form of balancing selection, such as heterozygote advantage or temporal, spatial or frequency-dependent selection, could also be involved (Charlesworth & Charlesworth, 1999; Charlesworth & Willis, 2009).

The first experimental research on the harmful effects of consanguinity, including inbreeding depression, was performed by Charles Darwin and published in his book "*The effects of cross and self-fertilization in the vegetable kingdom*" (Darwin, 1876). Darwin carried out carefully controlled experiments in the Down House greenhouse that involved self-fertilization and outcrossing between unrelated individuals in 57 plant species. In these experiments the offspring of self-fertilized plants were on average shorter, flowered later, weighed less and produced fewer seeds than the progeny of cross-fertilized plants. Through these experiments Darwin documented the phenomenon of inbreeding depression for numerous plant species. Darwin's laborious study of inbreeding had its origin in his interest in plant reproductive systems. In fact, his experiments were performed to explain why numerous plant species have systems that prevent self-fertilization and why reproduction by outcrossing is prevalent in nature. However, it is very likely that Darwin also had a personal interest in this matter. Charles Darwin was married to his first cousin Emma Wedgwood and they had 10 children. Darwin was worried about the health of his children, who were very often ill, and three of them died before adulthood. Darwin's own ill health led him to fear that his children could have inherited his medical problems, but he also suspected that his marriage to his first cousin might have caused some of his children's health problems (Jones, 2008; Moore, 2005). For a long time, it was commonly accepted that Charles Darwin's concerns about the harmful effects of first-cousin marriage were unjustified because they were based on an extrapolation from the ill effects of self-fertilization in plants to the outcomes of first-cousin marriage in humans. Nevertheless, recent research on both survival and fertility in the Darwin/Wedgwood dynasty supports the view that inbreeding was indeed involved in a number of health problems of Darwin's children (Berra et al., 2010; Golubovsky, 2008). 
First-cousin marriage was widely accepted among the upper middle class of Victorian England, so the first-cousin marriage of Charles and Emma was not unusual at that time. In fact, three of Emma's brothers were married to relatives: Josiah Wedgwood III married his first cousin Caroline Darwin, who was Charles's sister; Hensleigh Wedgwood married his first cousin Frances MacKintosh; and Henry Wedgwood married his double first cousin Jessie Wedgwood. All these consanguineous marriages are represented in the pedigree of the Darwin/Wedgwood dynasty shown in Figure 3, which was specifically constructed to compute inbreeding coefficients for Charles Darwin, his progeny and related families by combining genealogical information obtained from numerous sources. The inbreeding coefficients computed from the Darwin/Wedgwood pedigree show that some individuals of the dynasty had rather high levels of inbreeding. Thus, the children of Henry Wedgwood had a high inbreeding coefficient (F = 0.1255) because their parents were double first cousins. The progeny of both Charles Darwin and Josiah Wedgwood III had a moderate inbreeding coefficient (F = 0.0630), and the progeny of Hensleigh Wedgwood had an inbreeding coefficient of 0.0625. Charles Darwin's mother, Susannah Wedgwood, and her brother, Josiah Wedgwood II, had very low inbreeding values (F = 0.0039). All the remaining individuals in the pedigree depicted in Figure 3 had F = 0, as did Charles Darwin and his father, Robert Darwin. From these data, a statistically significant positive association between child mortality (deaths from birth to 10 years) and inbreeding coefficient was detected in the progeny of 25 marriages belonging to four consecutive generations of the Darwin/Wedgwood dynasty (Berra et al., 2010). Child mortality was clearly higher in those families whose progeny had a high inbreeding coefficient (Figure 4). 
Mean child mortality in the progeny of 21 non-consanguineous Darwin/Wedgwood marriages was 10.67%, whereas progeny mortality was nearly twice as high in the consanguineous marriages: 20.00% in those families with F = 0.0625 - 0.0630 and 16.67% in the one family with F = 0.1255. In Darwin's own family, the offspring of Charles and Emma had an inbreeding coefficient of 0.0630 and showed one of the highest mortalities (30.0%) among the 25 Darwin/Wedgwood families investigated. Of Darwin's three children who died before adulthood (Anne Elizabeth, Mary Eleanor and Charles Waring), the cause of death is known for two. Anne Elizabeth (1841-1851), Darwin's second child and first daughter, probably died of childhood tuberculosis, and Charles Waring (1856-1859), Darwin's last child, died of scarlet fever. The recent evidence of inbreeding as an important risk factor in susceptibility to infectious diseases such as hepatitis and tuberculosis, as well as the association between homozygosity and childhood mortality resulting from invasive bacterial disease (Lyons et al., 2009a,b), gives strong support to the hypothesis that inbreeding was directly involved in a number of health problems in Darwin's children. Furthermore, it has also been suggested that inbreeding might have influenced the fertility of Darwin's children (Golubovsky, 2008). It is known that three of Charles Darwin's six children with a long-term marriage history suffered from infertility (William Erasmus, Henrietta and Leonard), and a likely cause of that unexplained infertility might be the segregation of some recessive autosomal meiotic mutation manifested in Darwin's progeny as a result of inbreeding.
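The inbreeding coefficients quoted above come from a deep pedigree, but the textbook values for a first-cousin and a double first-cousin marriage can be reproduced with a short recursive kinship calculation. The sketch below is illustrative only: the pedigree encoding, the individual labels and the function names are hypothetical, and it is not the software used by Berra et al. (2010). It relies on the standard identity that an individual's F equals the kinship coefficient of its parents.

```python
from functools import lru_cache


def make_kinship(pedigree):
    """Build a kinship function for a pedigree.

    pedigree maps each individual to (father, mother); founders map to
    (None, None) and are assumed unrelated and non-inbred.
    """

    @lru_cache(maxsize=None)
    def depth(ind):
        # Generations separating an individual from its most remote founder.
        parents = [p for p in pedigree[ind] if p is not None]
        return 0 if not parents else 1 + max(depth(p) for p in parents)

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        # Always expand the deeper individual, which cannot be an ancestor
        # of the other one.
        if depth(a) < depth(b):
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    return kinship


def inbreeding_coefficient(pedigree, individual):
    """F of an individual = kinship coefficient of its two parents."""
    father, mother = pedigree[individual]
    return make_kinship(pedigree)(father, mother)


# Hypothetical pedigree: the parents of "child" are first cousins.
FIRST_COUSINS = {
    "gf": (None, None), "gm": (None, None),
    "s1": (None, None), "s2": (None, None),   # unrelated spouses
    "p1": ("gf", "gm"), "p2": ("gf", "gm"),   # siblings
    "c1": ("p1", "s1"), "c2": ("p2", "s2"),   # first cousins
    "child": ("c1", "c2"),
}

# Hypothetical pedigree: two brothers marry two sisters, and their
# children (double first cousins) have a child together.
DOUBLE_FIRST_COUSINS = {
    "a": (None, None), "b": (None, None),
    "c": (None, None), "d": (None, None),
    "bro1": ("a", "b"), "bro2": ("a", "b"),
    "sis1": ("c", "d"), "sis2": ("c", "d"),
    "c1": ("bro1", "sis1"), "c2": ("bro2", "sis2"),
    "child": ("c1", "c2"),
}

print(inbreeding_coefficient(FIRST_COUSINS, "child"))         # 0.0625
print(inbreeding_coefficient(DOUBLE_FIRST_COUSINS, "child"))  # 0.125
```

The values reported for the Darwin/Wedgwood families (0.0630 and 0.1255) are slightly above these textbook figures, presumably because the extended pedigree also captures more remote common ancestors.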

At present, there is extensive evidence on the harmful effects of inbreeding on survival before adulthood in humans. Most of this empirical evidence comes from mortality data on the progeny of first cousins, as this type of marriage is the most prevalent consanguineous union in human populations. A compilation based on data from 31 studies for various stages of prereproductive mortality showed that the offspring of consanguineous marriages have a higher risk of mortality than the offspring of unrelated parents (Khalt & Khoury, 1991; Khoury et al., 1987). The median relative risk for the progeny of first-cousin marriages compared with non-consanguineous progeny was 1.41 for prereproductive mortality, including all deaths from stillbirths to deaths below 20 years. In another meta-analysis, based on data from 38 populations located in eastern and southern Asia, the Middle East, Africa, Europe and South America, the progeny of first cousins showed an absolute increase in mortality from birth to a median age of 10 years of 4.4% ± 4.6 (Bittles & Neel, 1994; Figure 5). The most recent compilation on inbreeding depression for survival in humans, which comprised 69 populations resident in 15 countries across four continents, revealed an absolute increase in mortality from approximately 6 months' gestation to an average of 10 years of age of 3.5% among first-cousin progeny (Bittles & Black, 2010). It should be emphasized, however, that the above figures represent population averages and do not reflect the fact that the magnitude of inbreeding depression is highly variable among human populations. Thus, the absolute increase in mortality at the first-cousin level varied from nearly zero to approximately 19% across populations in one of the above-mentioned compilations (Bittles & Neel, 1994). 
For high inbreeding levels, the available evidence on inbreeding depression is at present less abundant than that for the moderate inbreeding resulting from first-cousin marriage. In humans, the most extreme cases of close inbreeding correspond to incestuous unions such as father-daughter, mother-son and brother-sister. Several studies have investigated the adverse effects of inbreeding in children of such incestuous unions, but the results are difficult to interpret because of difficulties associated with sample size, unbiased data and suitable controls (Adams & Neel, 1967; Carter, 1967; Seemanová, 1971).

Fig. 3. Pedigree of the Darwin/Wedgwood dynasty. (From Berra et al., 2010)


Fig. 4. Mortality from birth to 10 years and inbreeding coefficient (F) in offspring of 25 marriages of the Darwin/Wedgwood dynasty (n = number of marriages) (Data from Berra et al., 2010)

Fig. 5. Mortality in offspring of first cousin and non-consanguineous marriages in Brazil (average of 8 populations), Pakistan (average of 9 populations), India (average of 10 populations), Japan (average of 7 populations) and France (average of 2 populations). (Data from Bittles & Neel, 1994)



| King | F | King's wife | F | Type of consanguineous marriage |
|---|---|---|---|---|
| Philip I (1478-1506) | 0.025 | Joanna I of Castile | 0.039 | Third cousins |
| Charles I (1500-1558) | 0.037 | Isabella of Portugal | 0.101 | First cousins |
| Philip II (1527-1598) | 0.123 | Mary of Portugal | 0.123 | Double first cousins |
| | | Mary I of England | 0.008 | First cousins once removed |
| | | Elizabeth of Valois | 0.001 | Remote kinship |
| | | Anna of Habsburg | 0.106 | Uncle-niece |
| Philip III¹ (1578-1621) | 0.218 | Margaret of Habsburg | 0.139 | First cousins once removed |
| Philip IV (1605-1665) | 0.115 | Elizabeth of Bourbon | 0.007 | Third cousins |
| | | Mariana of Habsburg | 0.155 | Uncle-niece |
| Charles II² (1661-1700) | 0.254 | Maria Luise d'Orleans | 0.078 | Second cousins |
| | | Maria Anna of Neoburg | 0.008 | Remote kinship |

1. Child of Philip II and Anna of Habsburg
2. Child of Philip IV and Mariana of Habsburg

Table 4. Inbreeding coefficient (F) of the Spanish Habsburg kings and their wives (From Alvarez et al., 2009)

The European royal dynasties of the Modern Age provide very rich material for the study of the effects of high inbreeding levels in humans (Alvarez et al., 2009). Consanguineous marriages such as uncle-niece, first-cousin and other non-incestuous unions were very frequent in those dynasties over prolonged periods of time, and the genealogical records available in the historical sources are so extensive and accessible that inbreeding coefficients can be computed with extreme precision from extended pedigrees. One of the most important European royal dynasties of the Modern Age was the Habsburg dynasty (also known as the House of Austria), and the Spanish branch of this dynasty ruled over the world-wide Spanish Empire from 1517 until 1700. During this time, the six kings of the Spanish Habsburg branch contracted 11 marriages, and 9 (81.8%) of them were consanguineous unions at the degree of third cousins or closer: two uncle-niece marriages, one double first-cousin marriage, one first-cousin marriage and other consanguineous unions. The inbreeding coefficient of the Spanish Habsburg kings, computed from an extended pedigree up to 16 generations in depth that involves more than 3,000 individuals, increased strongly over the generations from 0.025 for King Philip I, the founder of the dynasty, to 0.254 for Charles II, the last Spanish Habsburg king (Table 4). The progeny of the Spanish Habsburg kings suffered substantial inbreeding depression for survival: inbreeding at the level of first cousins (F = 0.0625) exerted an adverse effect on survival to 10 years (miscarriages, stillbirths and neonatal deaths not included) of 17.8% ± 12.3. The relationship between survival and inbreeding coefficient in the progeny of 71 Habsburg marriages belonging to both branches of the dynasty (Spanish and Austrian Habsburgs) is shown in Figure 6 (unpublished results). 
The evidence of strong inbreeding depression for survival in the Habsburg dynasty is confirmed by this large data set. The absolute decrease in survival to 10 years for the progeny of a first-cousin marriage was 13.54% ± 5.40, and the cost of inbreeding for an F value of 0.254, which is the inbreeding coefficient of Charles II, was 54.99%. Statistically significant deviations from a linear relationship between survival and inbreeding coefficient were not detected by the nonlinearity t test, which compares the change in mean survival between two low levels of F with that between two high levels of F (Lynch & Walsh, 1998, pp. 267-268). Likewise, departures from linearity were not detected for log-transformed data. These results must be taken with caution, however, because the statistical power of the test was probably not high enough to rule out factors that could promote deviations from linearity, such as epistatic interactions among loci or purging selection. In any case, these findings suggest that deviations from linearity for inbreeding depression on survival may not be very strong in humans so that, at least as a first approximation, estimates of inbreeding depression obtained from low inbreeding levels could be linearly extrapolated to predict the extent of depression for high inbreeding in a given population.
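The linear extrapolation described above amounts to simple proportional scaling, as the following minimal sketch shows. It assumes that the depression is exactly proportional to F (a straight line through the origin); the function name is hypothetical and the inputs are the rounded values quoted in the text.

```python
def extrapolate_depression(delta_ref, f_ref, f_target):
    """Scale a depression estimate linearly from one F value to another.

    Assumes depression is proportional to F (no intercept, no curvature).
    """
    slope = delta_ref / f_ref          # percentage points per unit of F
    return slope * f_target


# Decrease in survival to 10 years at the first-cousin level (F = 0.0625),
# extrapolated to the inbreeding coefficient of Charles II (F = 0.254).
cost_charles_ii = extrapolate_depression(13.54, 0.0625, 0.254)
print(round(cost_charles_ii, 2))  # 55.03
```

The result is close to the 54.99% quoted above; the small difference presumably reflects rounding of the 13.54% figure.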

Fig. 6. Survival and inbreeding coefficient (F) of offspring of 71 marriages from the Habsburg royal dynasty

The Spanish Habsburg dynasty died out when Charles II, the last king of the dynasty, died in 1700, because no children were born from his two marriages. The inbreeding depression on survival suffered by the dynasty was a relevant factor contributing to its extinction but, ultimately, an effect of inbreeding on morbidity was probably also involved in the extinction of the Spanish Habsburg lineage. Charles II had important physical and mental disabilities and suffered from a number of different diseases during his life, hence being known in Spanish history as *El Hechizado* ("The Hexed") (Gargantilla, 2005). In the light of current clinical genetics, and taking into account that Charles II had an extremely high inbreeding coefficient (F = 0.254), which means that approximately 25.4% of his autosomal genome is expected to be autozygous, a tentative hypothesis


different recessive genetic disorders. From this perspective, inbreeding effects on both survival and fertility due to prolonged consanguineous marriage led to the fall of the Spanish Habsburg lineage which constitutes one of the most dramatic examples of

Because most studies of inbreeding depression in humans have focused on prereproductive stages of the life cycle, the effects of inbreeding on fitness traits such as fertility have received less attention. The analysis of the effects of inbreeding on reproductive success is subject to a number of potential limitations associated with lack of control for important sociodemographic variables such as age at marriage, literacy, use of contraceptives and duration of marriage. A number of studies have compared fertility in consanguineous unions with that of unrelated couples, thereby investigating the effect of the degree of relatedness between spouses on fertility. The relatedness between individuals is usually expressed as their kinship coefficient (θ), which is equal to the inbreeding coefficient (F) of their offspring (Hedrick, 2005, p. 269; Lynch & Walsh, 1998, pp. 135-140). In several studies, the total number of offspring (completed fertility) produced by related couples has been found to be higher than that of unrelated ones (Bittles et al., 2002; Helgason et al., 2008). In a meta-analysis based on data from 30 human populations located in India, Pakistan, Japan, Kuwait and Turkey, the number of live-born children produced by non-consanguineous unions was compared with the number of live-born children in four categories of consanguineous unions: double first cousin or uncle-niece (θ = 0.125 in both cases), first cousin (θ = 0.0625), first cousin once removed/double second cousin (θ = 0.0313), and second cousin (θ = 0.0156) (Bittles et al., 2002). A positive association between kinship and fertility was found at all levels of kinship tested, although the differences between consanguineous and non-consanguineous couples were statistically significant only for first-cousin couples. 
Since these positive associations between consanguinity and fertility could largely be due to uncontrolled sociodemographic variables, Bittles et al. (2002) performed an analysis based on data on first-cousin marriages from the National Family and Health Survey conducted in India during 1992-1993. Multivariate analysis showed that fertility is strongly influenced by a number of factors, such as illiteracy, earlier age at marriage, lower contraceptive use, duration of marriage and reproductive compensation, which were, in turn, positively associated with consanguineous marriage. When the effects of these factors were adjusted for in the multivariate analysis, differences in fertility between first-cousin and non-consanguineous couples were not detected. In contrast with these results based on a large data set, some studies provide convincing evidence for a positive association between kinship and fertility in particular human populations. Thus, a significant positive association between kinship and fertility was detected in a study of all known couples of the Icelandic population born between 1800 and 1965 (Helgason et al., 2008). Iceland is one of the most socioeconomically and culturally homogeneous societies in the world and is characterized by relatively low levels of inbreeding. The kinship of couples was computed to a depth of up to 10 generations from each couple, so that differences in fertility across a fine scale of kinship values could be assessed. Research on the effect of inbreeding on fertility at the individual level has also been performed by measuring fertility in inbred males and females. A significant effect of inbreeding on female fecundity was found in a 15-year study performed in Hutterite colonies in South Dakota (Ober et al., 1999). The socio-economic conditions are relatively uniform within the Hutterite community so that

detrimental effects of inbreeding in humans.

based on the simultaneous occurrence in this king of two recessive genetic disorders has been advanced to explain most of his complex clinical profile, including his impotence/infertility which in last instance led to the extinction of the Spanish Habsburg lineage (Alvarez et al., 2009). According to contemporary writings, Charles II was often described as "big headed" and "weak breast-fed baby". He was unable to speak until the age of 4, and could not walk until the age of 8. He was short, weak and quite lean and thin. He was described as a person showing very little interest on his surroundings (abulic personality). He first marries at 18 and later at 29, leaving no descendants. His first wife talks of his premature ejaculation, while his second spouse complaints about his impotency. He suffers from sporadic hematuria and intestinal problems (frequent diarrhoea and vomits). He looked like an old person when he was only 30 years old, suffering from edemas on his feet, legs, abdomen and face. During the last years of his life he barely can stand up, and suffers from hallucinations and convulsive episodes. His health worsens until his premature death when he was 39, after an episode of fever, abdominal pain, hard breathing and comma. From these evidences, two recessive genetic disorders, combined pituitary hormone deficiency (CPHD, OMIM 26260) and distal renal tubular acidosis (dRTA, OMIM 602722), could explain an important part of the complex clinical profile of Charles II. Combined pituitary hormone deficiency leads to a multiple endocrine deficit of pituitary hormones: thyroid stimulating hormone (TSH), growth hormone (GH), prolactin (PRL), gonadotropin and adrenocorticotropic hormone (ACTH) (University of Washington, Genetest.gov). 
This disease shows a slow progression and is frequently caused by a genetic disorder produced by mutations of some of the transcription factors expressed in the pituitary gland, such as *PROP1* (5q), *POU1F1* (3p), *LHX3* (9q), *LHX4* (1q), *HESX1* (3p), *TBX19* (1q), *SOX2* (3q) and *SOX3* (Xq). Mutations occurring in *PROP1* are the most frequent genetic cause of hereditary CPHD, and they are inherited as autosomal recessives. Mutations in *PROP1* are associated with progressive endocrine deficiencies highly variable in both, intensity and in the first clinic sign manifestation (Kelberman & Dattani, 2007; Reynaud et al., 2005). Charles II showed clinical characteristics of hypothyroidism such as muscular weakness, hypotonia, delayed onset of speech and abulic behaviour, and the lack of GH could account for his short stature. His hypogonadotropic hypogonadism could explain his infertility/impotency, and a PRL deficit has been associated with decreased fertility in males. ACTH deficit usually presents in adults with common gastrointestinal symptoms such as nausea, vomit and diarrhoea. At the same time, the patients are fatigued, with general weakness, asthenia and hypotension. Any additional physical stress will exacerbate these clinical manifestations, often resulting in intense abdominal pain, fever, lethargy followed by hypovolemic vascular collapse (Agarwal et al., 2000; McGraw Hill, Access Medicine). The variety and scope of clinical symptoms afflicting Charles II could have been caused by an additional disease responsible for his muscular weakness at a young age, rickets, hematuria and his big head relative to his body size. These symptoms might have been manifestations of a secondary metabolic alteration originated in a renal disease such as severe hyperchloremic hypokalemic distal renal tubular acidosis (dRTA). 
This disease presents with alterations of the urine acidification mechanisms leading to severe metabolic hyperchloremic hypokalemic acidosis, prominent renal tract calcification with persistent hematuria and rickets. It may be caused by autosomal recessive mutations in *ATP6V0A4* (7q) or *ATP6V1B1* (2q) genes (Stover et al., 2002; Vargas-Possou et al., 2006). In this way, most of the symptoms showed by Charles II might be caused by these two

based on the simultaneous occurrence in this king of two recessive genetic disorders has been advanced to explain most of his complex clinical profile, including the impotence/infertility that ultimately led to the extinction of the Spanish Habsburg lineage (Alvarez et al., 2009). Contemporary writings often described Charles II as "big headed" and as a "weak breast-fed baby". He was unable to speak until the age of 4 and could not walk until the age of 8. He was short, weak and quite lean. He was described as showing very little interest in his surroundings (abulic personality). He first married at 18 and again at 29, leaving no descendants. His first wife spoke of his premature ejaculation, while his second wife complained of his impotence. He suffered from sporadic hematuria and intestinal problems (frequent diarrhoea and vomiting). At only 30 years of age he looked like an old man, with edema of the feet, legs, abdomen and face. During the last years of his life he could barely stand and suffered from hallucinations and convulsive episodes. His health worsened until his premature death at 39, after an episode of fever, abdominal pain, laboured breathing and coma. On this evidence, two recessive genetic disorders, combined pituitary hormone deficiency (CPHD, OMIM 262600) and distal renal tubular acidosis (dRTA, OMIM 602722), could explain an important part of the complex clinical profile of Charles II. Combined pituitary hormone deficiency leads to a multiple endocrine deficit of pituitary hormones: thyroid stimulating hormone (TSH), growth hormone (GH), prolactin (PRL), gonadotropin and adrenocorticotropic hormone (ACTH) (University of Washington, Genetest.gov).
This disease progresses slowly and is frequently caused by mutations in transcription factors expressed in the pituitary gland, such as *PROP1* (5q), *POU1F1* (3p), *LHX3* (9q), *LHX4* (1q), *HESX1* (3p), *TBX19* (1q), *SOX2* (3q) and *SOX3* (Xq). Mutations in *PROP1* are the most frequent genetic cause of hereditary CPHD and are inherited as autosomal recessives. They are associated with progressive endocrine deficiencies that are highly variable both in intensity and in the first clinical sign to appear (Kelberman & Dattani, 2007; Reynaud et al., 2005). Charles II showed clinical characteristics of hypothyroidism such as muscular weakness, hypotonia, delayed onset of speech and abulic behaviour, and the lack of GH could account for his short stature. Hypogonadotropic hypogonadism could explain his infertility/impotence, and a PRL deficit has been associated with decreased fertility in males. ACTH deficit usually presents in adults with common gastrointestinal symptoms such as nausea, vomiting and diarrhoea. At the same time, patients are fatigued, with general weakness, asthenia and hypotension. Any additional physical stress exacerbates these clinical manifestations, often resulting in intense abdominal pain, fever and lethargy followed by hypovolemic vascular collapse (Agarwal et al., 2000; McGraw Hill, Access Medicine). The variety and scope of the clinical symptoms afflicting Charles II suggest an additional disease responsible for his muscular weakness at a young age, rickets, hematuria and large head relative to his body size. These symptoms might have been manifestations of a secondary metabolic alteration originating in a renal disease such as severe hyperchloremic hypokalemic distal renal tubular acidosis (dRTA).
This disease presents with alterations of the urine acidification mechanisms leading to severe hyperchloremic hypokalemic metabolic acidosis, prominent renal tract calcification with persistent hematuria, and rickets. It may be caused by autosomal recessive mutations in the *ATP6V0A4* (7q) or *ATP6V1B1* (2q) genes (Stover et al., 2002; Vargas-Possou et al., 2006). In this way, most of the symptoms shown by Charles II might have been caused by these two different recessive genetic disorders. From this perspective, the effects of inbreeding on both survival and fertility arising from prolonged consanguineous marriage led to the fall of the Spanish Habsburg lineage, which constitutes one of the most dramatic examples of the detrimental effects of inbreeding in humans.

Because most studies of inbreeding depression in humans have focused on prereproductive stages of the life cycle, the effects of inbreeding on fitness traits such as fertility have received less attention. The analysis of the effects of inbreeding on reproductive success is subject to a number of potential limitations associated with lack of control for important sociodemographic variables such as age at marriage, literacy, use of contraceptives and duration of marriage. A number of studies have compared fertility in consanguineous unions with that of unrelated couples, thereby investigating the effect of the degree of relatedness between spouses on fertility. The relatedness between individuals is usually expressed as their kinship coefficient (θ), which is equal to the inbreeding coefficient (F) of their offspring (Hedrick, 2005, p. 269; Lynch & Walsh, 1998, pp. 135-140). In several studies, the total number of offspring (completed fertility) produced by related couples has been found to be higher than that of unrelated ones (Bittles et al., 2002; Helgason et al., 2008). In a meta-analysis based on data from 30 human populations located in India, Pakistan, Japan, Kuwait and Turkey, the number of live-born children produced by non-consanguineous unions was compared with the number of live-born children in four categories of consanguineous unions: double first cousin or uncle-niece (θ = 0.125 in both cases), first cousin (θ = 0.0625), first cousin once removed/double second cousin (θ = 0.0313), and second cousin (θ = 0.0156) (Bittles et al., 2002). A positive association between kinship and fertility was found at all levels of kinship tested, although the differences between consanguineous and non-consanguineous couples were statistically significant only for first cousin couples.
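The kinship values quoted above can be reproduced from a pedigree with the standard recursive definition of the kinship coefficient. The following is a minimal sketch, not a method taken from the cited studies: the toy pedigree, the individual labels and the helper name `make_kinship` are hypothetical and serve only to illustrate the calculation.

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (father, mother).
# Founders have unknown parents, written as (None, None).
PEDIGREE = {
    "GF": (None, None), "GM": (None, None),  # shared grandparents
    "SA": (None, None), "SB": (None, None),  # unrelated spouses
    "A": ("GF", "GM"), "B": ("GF", "GM"),    # full siblings
    "C1": ("A", "SA"), "C2": ("SB", "B"),    # first cousins
}


def make_kinship(pedigree):
    """Return a cached function computing the kinship coefficient of two individuals."""

    @lru_cache(maxsize=None)
    def depth(ind):
        # Generation depth: 0 for founders, otherwise 1 + the deepest known parent.
        parents = [p for p in pedigree[ind] if p is not None]
        return 0 if not parents else 1 + max(depth(p) for p in parents)

    @lru_cache(maxsize=None)
    def kinship(i, j):
        if i is None or j is None:
            return 0.0  # unknown parents are treated as unrelated
        if i == j:
            father, mother = pedigree[i]
            # Self-kinship is (1 + F_i) / 2, where F_i is the kinship of i's parents.
            return 0.5 * (1.0 + kinship(father, mother))
        # Expand the deeper individual, which cannot be an ancestor of the other.
        if depth(i) < depth(j):
            i, j = j, i
        father, mother = pedigree[i]
        return 0.5 * (kinship(father, j) + kinship(mother, j))

    return kinship


kinship = make_kinship(PEDIGREE)
theta_first_cousins = kinship("C1", "C2")  # equals F of a child of C1 and C2
theta_uncle_niece = kinship("A", "C2")
```

In this toy pedigree the first-cousin pair gives θ = 0.0625 and the uncle-niece pair gives θ = 0.125, matching the categories listed above; by definition, the same numbers are the inbreeding coefficients F of the offspring of such unions.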
Since these positive associations between consanguinity and fertility could largely be due to uncontrolled sociodemographic variables, Bittles et al. (2002) analysed data on first cousin marriages from the National Family and Health Survey conducted in India during 1992-1993. Multivariate analysis showed that fertility is strongly influenced by factors such as illiteracy, earlier age at marriage, lower contraceptive use, duration of marriage and reproductive compensation, which were, in turn, positively associated with consanguineous marriage. When the effects of these factors were adjusted for in the multivariate analysis, no differences in fertility between first cousin and non-consanguineous couples were detected. In contrast with these results based on a large data set, some studies provide convincing evidence for a positive association between kinship and fertility in particular human populations. Thus, a significant positive association between kinship and fertility was detected in a study of all known couples of the Icelandic population born between 1800 and 1965 (Helgason et al., 2008). Iceland is one of the most socioeconomically and culturally homogeneous societies in the world and is characterized by relatively low levels of inbreeding. The kinship of couples was computed to a depth of up to 10 generations from each couple so that differences in fertility could be assessed across a fine scale of kinship values.

Research on the effect of inbreeding on fertility at an individual level has also been performed by measuring fertility in inbred males and females. A significant effect of inbreeding on female fecundity was found in a 15-year study performed in Hutterite colonies in South Dakota (Ober et al., 1999). Socio-economic conditions are relatively uniform within the Hutterite community, so that inbreeding effects can be studied without the confounding effects of uncontrolled socioeconomic variables. Hutterite women with F ≥ 0.04 showed significantly reduced fecundity, as evidenced by longer interbirth intervals. There were no significant effects of the father's F or of the kinship of couples on the interbirth interval. In contrast, completed family sizes did not differ between the more and the less inbred Hutterite women born after 1920, even though the adverse effect of inbreeding on fecundity was evident in those cohorts. These results suggest that reproductive compensation may be occurring in the more inbred, less fecund women, probably to achieve a culturally defined optimal family size. An adverse effect of inbreeding on female fecundity has also been found in a study performed in a small and isolated village in the Swiss Alps where socio-economic factors are rather homogeneous (Postma et al., 2010). A significant negative effect of the mother's inbreeding level on completed family size was detected, so that inbred women had fewer children. By contrast, no effect of either the inbreeding coefficient of the fathers or the kinship coefficient of the couples was detected. Moreover, some empirical evidence suggests that the sensitivity of fertility to inbreeding might vary with parental age. The effect of consanguineous marriages on reproduction, studied in a cohort of women born in the late 19th century in north-eastern Quebec, Canada, showed that the inbreeding coefficient of the father strongly affects reproduction rates over the reproductive period, as inbred fathers showed a strong asymmetry in the number of children produced during the first half in comparison with the second half (Robert et al., 2009). These results suggest that temporal aspects of reproduction may be relevant in the study of inbreeding depression for fertility in humans.

#### **5. Conclusion**

Inbreeding, defined as mating between relatives, is a phenomenon that occurs in many animal and plant species as well as in humans. The genetic effects of inbreeding arise because an inbred individual will frequently inherit the same gene from each parent, who in turn inherited it from a common ancestor. In this way, inbreeding increases homozygosity, so that recessive traits such as many human genetic disorders occur with increased frequency in the progeny of consanguineous couples. Studies of genome-wide homozygosity using genome scan technology have opened new possibilities for understanding inbreeding from a genomic perspective. Long homozygous chromosomal segments have been detected through whole-genome scans in human chromosomes. These long homozygous tracts are the result of autozygosity (homozygosity by descent), because inbred individuals have chromosomal segments that are homozygous as a result of inheriting identical genomic segments through both parents. The distribution of such homozygous tracts throughout the genome has been studied in inbred individuals affected by recessive Mendelian disorders, providing valuable information on the genomic architecture underlying human genetic diseases associated with inbreeding. Recent research has shown that extended tracts of genomic homozygosity are widespread in many human populations, providing new perspectives for the study of past consanguinity and population isolation. Autozygosity also has practical implications for the identification of human disease genes. Thus, at present, homozygosity mapping is the method of choice for mapping human genes that cause recessive traits using the DNA of affected children from consanguineous marriages. This approach involves the detection of the disease locus on the basis that the adjacent region will be homozygous by descent in such inbred children.
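The idea of scanning a genome for long homozygous tracts can be illustrated with a simple run-detection routine. This is only a sketch under stated assumptions: genotypes are coded as alternate-allele counts (0, 1 or 2, with 1 meaning heterozygous), markers are sorted by position on a single chromosome, and the function name, thresholds and example data are hypothetical. Practical analyses usually also tolerate occasional genotyping errors and missing calls, which this sketch does not.

```python
def runs_of_homozygosity(genotypes, positions, min_snps, min_length_bp):
    """Return (start_bp, end_bp) for each uninterrupted homozygous tract.

    genotypes: alternate-allele counts per marker (0 or 2 = homozygous, 1 = heterozygous)
    positions: base-pair coordinates of the same markers, in ascending order
    """
    tracts = []
    start = None  # index where the current homozygous run began
    # A trailing heterozygous sentinel closes a run that reaches the last marker.
    for idx, g in enumerate(list(genotypes) + [1]):
        if g != 1 and start is None:
            start = idx
        elif g == 1 and start is not None:
            end = idx - 1
            n_snps = end - start + 1
            length = positions[end] - positions[start]
            if n_snps >= min_snps and length >= min_length_bp:
                tracts.append((positions[start], positions[end]))
            start = None
    return tracts


# Hypothetical data: ten markers spaced 100 kb apart.
genotypes = [0, 2, 1, 0, 0, 2, 2, 0, 1, 2]
positions = [100_000 * (k + 1) for k in range(10)]
tracts = runs_of_homozygosity(genotypes, positions, min_snps=3, min_length_bp=300_000)
```

With these illustrative thresholds, only the five-marker tract between 400 kb and 800 kb is retained. In homozygosity mapping, tracts of this kind that are shared by affected children of consanguineous parents point to candidate regions for the disease locus.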

Consanguineous marriage is frequent in many human populations all over the world. The highest rates occur in north and sub-Saharan Africa, the Middle East, and west, central, and south Asia, where, in some populations, 20 to 60% of all marriages are between relatives. First-cousin marriage is the most common form of consanguineous union in most human populations. There are clear social and economic advantages to consanguinity, mainly associated with the maintenance of family structure and property, particularly in rural societies. Consanguineous marriage cannot be linked to any specific religion or religious rules: it is practiced among people of various religions, and attitudes towards it vary among followers of the same religion. Offspring of consanguineous parents are at risk both for monogenic autosomal recessive disorders and for conditions with multifactorial inheritance. Consanguineous marriage increases the chance that both members of a couple will carry any recessive variant that is being transmitted in their family, and that this variant will manifest in the homozygous state in their children. Thus, a large number of studies have reported this outcome as one of the most important clinical consequences of consanguineous marriage. In general, the offspring of consanguineous couples present increased levels of morbidity and significant medical problems such as major malformations, congenital anomalies and structural birth defects. Furthermore, consanguinity has been implicated in susceptibility to a number of complex diseases such as heart disease, cancer, depression, gout, peptic ulcer, schizophrenia, epilepsy and asthma. Consanguinity has also been shown to be a risk factor for infection by a diverse range of pathogens responsible for a number of human infectious diseases.
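The increased risk of recessive disorders can be made concrete with the standard population-genetics expression for the frequency of recessive homozygotes under inbreeding, q²(1 − F) + qF, where q is the frequency of the recessive allele. This relation is not stated in the text above and is introduced here as an assumption for illustration; the allele frequency used is hypothetical.

```python
def recessive_homozygote_frequency(q, F):
    """Expected frequency of recessive homozygotes for allele frequency q and inbreeding coefficient F."""
    # Alleles identical by descent (probability F) are homozygous with probability q;
    # otherwise (probability 1 - F) the two alleles pair at random, with probability q^2.
    return q * q * (1.0 - F) + q * F


q = 0.01                                                  # hypothetical recessive allele frequency
outbred = recessive_homozygote_frequency(q, 0.0)          # random mating
first_cousin = recessive_homozygote_frequency(q, 0.0625)  # offspring of first cousins
relative_risk = first_cousin / outbred
```

Under these assumptions, the expected frequency of affected offspring rises roughly sevenfold for first-cousin progeny, and the relative increase becomes larger as the allele becomes rarer.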

The phenomenon of inbreeding depression, that is, the reduced survival and fertility of the offspring of related individuals, has been documented in many human populations and reflects the consequences of increased homozygosity for alleles affecting reproductive fitness. Estimates of inbreeding depression in survival have been obtained for a number of human populations by comparing prereproductive mortality in the progeny of first-cousin and non-consanguineous marriages. The mean increase in mortality among the offspring of first-cousin marriages (F = 0.0625) was 4.4% ± 4.6 in data from 38 human populations worldwide, and a more recent estimate obtained from 69 populations was 3.5%; it should be emphasized, however, that the extent of inbreeding depression in survival varies widely among populations. By contrast, there is little information on inbreeding depression in survival for inbreeding levels higher than those of first-cousin progeny. Recent studies of European royal dynasties of the Modern Age, in which inbreeding coefficients were much higher than that of first-cousin progeny, are filling this gap. These studies are expected to provide a deeper understanding of the genetic basis of inbreeding depression in human populations.

#### **6. References**

Adams, M.S. & Neel, J.V. (1967). Children of Incest. *Pediatrics*, Vol. 40, No. 1, pp. 55-62

Agarwal, G.; Bhatia, V., Cook, S. & Thomas, P.Q. (2000). Adrenocorticotropin Deficiency in Combined Pituitary Hormone Deficiency Patients Homozygous for a Novel PROP1 Deletion. *Journal of Clinical Endocrinology and Metabolism*, Vol. 85, pp. 4556-4561

Alvarez, G.; Ceballos, F.C. & Quinteiro, C. (2009). The Role of Inbreeding in the Extinction of a European Royal Dynasty. *PLoS ONE*, 4(4): e5174. doi:10.1371/journal.pone.0005174

Balloux, F.; Amos, W. & Coulson, T. (2004). Does Heterozygosity Estimate Inbreeding in Real Populations? *Molecular Ecology*, Vol. 13, pp. 3021-3031

Bennett, R.L.; Motulsky, A.G., Bittles, A., Hudgins, L., Uhrich, S. et al. (2002). Genetic Counseling and Screening of Consanguineous Couples and Their Offspring: Recommendations of the National Society of Genetic Counselors. *Journal of Genetic Counseling*, Vol. 11, No. 2, pp. 97-119

Berra, T.M.; Alvarez, G. & Ceballos, F.C. (2010). Was the Darwin/Wedgwood Dynasty Adversely Affected by Consanguinity? *BioScience*, Vol. 60, No. 5, pp. 376-383

Bittles, A.H. (2001). Consanguinity and Its Relevance to Clinical Genetics. *Clinical Genetics*, Vol. 60, pp. 89-98

Bittles, A.H. (2006). A Background Summary of Consanguineous Marriage. Available from http://www.consang.net

Bittles, A.H. & Black, M.L. (2010). Consanguinity, Human Evolution, and Complex Diseases. *Proceedings of the National Academy of Sciences USA*, Vol. 107, No. Suppl. 1, pp. 1779-1786

Bittles, A.H.; Grant, J.C., Sullivan, S.G. & Hussain, R. (2002). Does Inbreeding Lead to Decreased Human Fertility. *Annals of Human Biology*, Vol. 29, No. 2, pp. 111-130

Bittles, A.H.; Mason, W.M., Greene, J. & Appaji Rao, N. (1991). Reproductive Behavior and Health in Consanguineous Marriages. *Science*, Vol. 252, pp. 789-794

Bittles, A.H. & Neel, J.V. (1994). The Costs of Human Inbreeding and Their Implications for Variations at the DNA Level. *Nature Genetics*, Vol. 8, pp. 117-121

Bodmer, W.F. & Cavalli-Sforza, L.L. (1976). *Genetics, Evolution, and Man*, W.H. Freeman and Company, ISBN 0-7167-0573-7, San Francisco, USA

Botstein, D. & Risch, N. (2003). Discovering Genotypes Underlying Human Phenotypes: Past Successes for Mendelian Disease, Future Approaches for Complex Disease. *Nature Genetics*, Vol. 33, pp. 228-237

Boyce, A.J. (1983). Computation of Inbreeding and Kinship Coefficients on Extended Pedigrees. *Journal of Heredity*, Vol. 74, pp. 400-404

Broman, K.W. & Weber, J.L. (1999). Long Homozygous Chromosomal Segments in Reference Families from the Centre d'Étude du Polymorphisme Humain. *The American Journal of Human Genetics*, Vol. 65, pp. 1493-1500

Carothers, A.D.; Rudan, I., Kolcic, I., Polasek, O., Hayward, C. et al. (2006). Estimating Human Inbreeding Coefficients: Comparison of Genealogical and Marker Heterozygosity Approaches. *Annals of Human Genetics*, Vol. 70, pp. 666-676

Carter, C.O. (1967). Children of Incest. *Lancet*, Vol. i, pp. 436

Cavalli-Sforza, L.L. & Bodmer, W.F. (1971). *The Genetics of Human Populations*, W.H. Freeman and Company, ISBN 0-7167-1018-8, San Francisco, USA

Charlesworth, B. & Charlesworth, D. (1999). The Genetic Basis of Inbreeding Depression. *Genetical Research*, Vol. 74, pp. 329-340

Charlesworth, D. & Willis, J.H. (2009). The Genetics of Inbreeding Depression. *Nature Reviews Genetics*, Vol. 10, pp. 783-796

Consanguinity/Endogamy Resource. April 2009. Date of access March 2011. Available from: http://www.consang.net/index.php/Main\_Page

Darwin, C.R. (1876). *The Effects of Cross and Self-Fertilization in the Vegetable Kingdom*, John Murray, London, England

Gargantilla, P. (2005). *Enfermedades de los Reyes de España: Los Austrias*, La Esfera de los Libros, ISBN 84-9734-338-7, Madrid, Spain

Gibson, J.; Morton, N.E. & Collins, A. (2006). Extended Tracts of Homozygosity in Outbred Human Populations. *Human Molecular Genetics*, Vol. 15, No. 5, pp. 789-795

Golubovsky, M. (2008). Unexplained Infertility in Charles Darwin's Family: Genetic Aspect. *Human Reproduction*, Vol. 23, pp. 1237-1238

Hedrick, P.W. (2005). *Genetics of Populations*, Jones and Bartlett Publishers, ISBN 0-7637-4772-6, Sudbury, Massachusetts, USA

Helgason, A.; Pálsson, S., Gudbjartsson, D., Kristjánsson, P. & Stefánsson, K. (2008). An Association Between the Kinship and Fertility of Human Couples. *Science*, Vol. 319, pp. 813-816

Hildebrandt, F.; Heeringa, S.F., Rüschendorf, F., Attanasio, M., Nürnberg, G., et al. (2009). A Systematic Approach to Mapping Recessive Disease Genes in Individuals from Outbred Populations. *PLoS Genetics*, 5(1): e1000353. doi:10.1371/journal.pgen.1000353

Jones, S. (2008). *Darwin's Island. The Galapagos in the Garden of England*, Little, Brown, ISBN 978-1-4087-0000-6, London, England

Kelberman, D. & Dattani, M.T. (2007). Hypothalamic and Pituitary Development: Novel Insights Into the Aetiology. *European Journal of Endocrinology*, Vol. 157, pp. S3-S14

Khlat, M. & Khoury, M. (1991). Inbreeding and Diseases: Demographic, Genetic, and Epidemiologic Perspectives. *Epidemiologic Reviews*, Vol. 13, pp. 28-41

Khoury, M.J.; Cohen, B.H., Chase, G.A. & Diamond, E.L. (1987). An Epidemiologic Approach to the Evaluation of the Effect of Inbreeding on Prereproductive Mortality. *American Journal of Epidemiology*, Vol. 125, No. 2, pp. 251-262

Kirin, M.; McQuillan, R., Franklin, C.S., Campbell, H., McKeigue, P.M. & Wilson, J.F. (2010). Genomic Runs of Homozygosity Record Population History and Consanguinity. *PLoS ONE*, 5(11): e13996. doi:10.1371/journal.pone.0013996

Lander, E.S. & Botstein, D. (1987). Homozygosity Mapping: A Way to Map Human Recessive Traits with the DNA of Inbred Children. *Science*, Vol. 236, pp. 1567-1570

Lynch, M. & Walsh, B. (1998). *Genetics and Analysis of Quantitative Traits*, Sinauer Associates, ISBN 0-87893-481-2, Sunderland, Massachusetts, USA

Lyons, E.J.; Amos, W., Berkley, J.A., Mwangi, I., Shafi, M. et al. (2009a). Homozygosity and Risk of Childhood Death Due to Invasive Bacterial Disease. *BMC Medical Genetics*, 10:55. doi:10.1186/1471-2350-10-55

Lyons, E.J.; Frodsham, A.J., Zhang, L., Hill, A.V.S. & Amos, W. (2009b). *Biology Letters*, Vol. 5, pp. 574-576

MacCluer, J.W.; Boyce, A.J., Dyke, B., Weitkamp, L.R., Pfennig, D.W. et al. (1983). Inbreeding and Pedigree Structure in Standardbred Horses. *Journal of Heredity*, Vol. 74, pp. 394-399

McGraw Hill. March 2011. In: Access Medicine. Date of access March 2011. Available from: http://accessmedicine.com/features.aspx

McQuillan, R.; Leutenegger, A.L., Abdel-Rahman, R., Franklin, C.S., Pericic, M. et al. (2008). Runs of Homozygosity in European Populations. *The American Journal of Human Genetics*, Vol. 83, pp. 359-372

Moore, J. (2005). Good Breeding: Darwin Doubted His Own Family's "Fitness". *Natural History*, Vol. 114, pp. 45-46


**3** 

**Cytogenetic Techniques in Diagnosing Genetic Disorders**

Kannan Thirumulu Ponnuraj

*Universiti Sains Malaysia, Malaysia*

**1. Introduction**

Painter's discovery of the giant banded salivary chromosomes of *Drosophila* in 1934 had a tremendous impact on cytological work in *Drosophila*. It became possible to identify the chromosomes individually and to discern specific segments of each chromosome. Cytogenetics then bloomed with the establishment of the human chromosome number as 46 in 1956. Since then, many advancements and improvements have taken place, and the combination of techniques has made cytogenetics an indisputable tool in diagnosing various genetic disorders. Human cytogenetics has now completed a glorious 50 years since the discovery of the chromosome number in normal human cells. This chapter provides an insight into the fundamentals of cytogenetics and its importance in the diagnosis of commonly occurring syndromes and disorders.

**2. History of cytogenetics**

When the genetic importance of the polytene chromosomes of Diptera was rediscovered in the early thirties, almost every *Drosophila* geneticist started studying the salivary glands. Nageli, the Swiss botanist, first described thread-like structures in the nuclei of plant cells in the 1840s and called them "transitory cytoblasts"; these are what are now called chromosomes. Later, the term "chromosome" was coined by Waldeyer in 1888, after staining techniques had been developed that made them easier to discern (*chroma* = Greek for colour; *soma* = Greek for body). In 1909, Johannsen coined the term 'gene'. This triggered the beginning of modern cytogenetics, yet progress moved at a snail's pace. Attempts continued to determine the number of chromosomes, which became a matter of great concern among researchers. The quality of the chromosome preparations was poor and the counts varied from one attempt to the next. Even determining the diploid number of a mammalian species was considered a difficult accomplishment. The chromosomes were crowded in metaphase, and considerations of the biological function of the chromosomes, and in particular of modern genetics, were beyond the scope of cytological research in the 19th century. It was quite cumbersome to obtain good slides with well-spread metaphases for easy counting. However, in the 1950s, the advent of new techniques for chromosome preparation, such as the addition of colcemid and hypotonic treatment, led to the establishment of the diploid number of chromosomes in man as 46 (Tjio

