**3. Impact of DNA analysis on avian systematics and phylogeny**

When James Watson and Francis Crick described the structure of DNA in 1953 [1–3], a new era began in biology and, with some delay, in ornithology. In the decades that followed, new technologies emerged for studying DNA and genetics: DNA sequencing was established in 1977, the polymerase chain reaction (PCR) was invented by Kary Mullis in 1983 and published in 1985, and Next Generation Sequencing (NGS) appeared after 2000. NGS, also called high-throughput sequencing, enables the parallel sequencing of millions of DNA fragments. NGS is thus the method of choice for the analysis of complete genomes and transcriptomes [1–3, 13, 14].

## **3.1 DNA as a marker for phylogeny**

Deoxyribonucleic acid (DNA) is a macromolecule composed of linearly coupled nucleotides. Each nucleotide consists of a base, deoxyribose (a pentose sugar), and a phosphate group. The pyrimidine bases cytosine (C) and thymine (T) have a single ring containing two nitrogen atoms, and the purine bases adenine (A) and guanine (G) have a fused double ring containing four nitrogen atoms. Unlike DNA, ribonucleic acid (RNA) contains uracil (U) instead of thymine and ribose instead of deoxyribose; deoxyribose lacks the hydroxyl group that ribose carries at the 2′-position. DNA thus contains the bases A, T, G, and C, and RNA the bases A, U, G, and C. The two DNA strands are complementary and form a double helix, in which A pairs with T and G with C (**Figure 2**) [1, 3].
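The pairing rules can be made concrete with a few lines of Python. The following is a minimal sketch, not part of the original chapter: the function names are illustrative, and it assumes sequences that contain only the four unambiguous bases. It also relies on the well-established fact that the two strands of the helix run in opposite directions, which is why the partner strand is usually written as the reverse complement.

```python
# Watson-Crick pairing as stated in the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


def to_rna_alphabet(strand: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet (T replaced by U)."""
    return strand.upper().replace("T", "U")


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
print(to_rna_alphabet("ATGC"))     # AUGC
```

Because each base determines its partner, either strand is sufficient to reconstruct the other.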

The DNA double helix is located in the nucleus of all eukaryotic cells as a linear, i.e. filamentous, macromolecule (**Figure 2**). Depending on the species, the nuclear genome (i.e., the DNA in the nucleus) is organized in a specific number of chromosomes [1–3]. During the growth of an organism, cells have to multiply at a high rate. Before a cell divides, the DNA of the mother cell is duplicated in a process termed DNA replication, so that each daughter cell receives an identical copy of the mother cell's genome. Cells are never generated *de novo*; every cell that exists today derives from a mother cell. This continuous chain of cell divisions must have existed since the first ancestral cell; thus all cells that exist today are connected, and their DNA can be traced back to the origin of life.

Except for germ cells, all vertebrate cells have a double (diploid) set of chromosomes. Each offspring receives one haploid (single) set of chromosomes from the mother and one from the father via the gametes (the germ cells that unite at fertilization). These two haploid genomes are similar but not 100% identical. The genetic variability among individuals arises during the formation of germ cells, in a process called meiosis.

The vertebrate genome is thought to contain about 21,000 protein-coding genes and another 9,000 genes encoding diverse RNAs. Together, these genes constitute the genotype of an individual. Not all genes are active at the same time; they are regulated in a cell- and development-specific manner. The traits that result from the expression of the active genes are called the phenotype. Epigenetic processes can influence the phenotype and phenotypic variability [3].

*DNA Analyses Have Revolutionized Studies on the Taxonomy and Evolution in Birds*


*DOI: http://dx.doi.org/10.5772/intechopen.97013*


**Figure 2.** *Schematic view of nuclear and mitochondrial DNA in birds.*

In addition to the nuclear genome (ncDNA), all animals carry DNA in their mitochondria (mtDNA). Mitochondria are cell organelles that originally arose from bacteria through symbiosis and whose main function is to provide ATP, the fuel of the cell [3]. As in bacteria, mtDNA is a circular chromosome; in vertebrates it consists of approximately 16,000 to 19,000 base pairs. It contains 13 genes encoding enzymes or other proteins involved in electron transport, 22 genes for transfer RNAs (tRNAs, which are required in protein biosynthesis), and two genes for ribosomal RNAs (rRNAs, which are important for the structure and function of ribosomes) (**Figure 2**). Since each animal cell contains several hundred to a thousand mitochondria, and each mitochondrion contains five to ten mtDNA copies, a cell holds several thousand identical copies of mtDNA. The mtDNA makes up about 1% of the total DNA of a cell and is particularly suitable for research in molecular evolution and phylogenetics. In contrast to nuclear DNA, mtDNA is almost exclusively inherited maternally. Because mtDNA exhibits more sequence variation than protein-coding ncDNA, the sequence analysis of mtDNA has been widely used to study bird taxonomy and phylogenetics [13–16].

Most sequence differences in DNA are due to point mutations, i.e. the exchange of one of the four DNA bases A, T, G, and C for another. Some point mutations arise from internal mechanisms that occur spontaneously and regularly. These include biochemical alterations of DNA bases (through depurination, deamination, dimerization, and oxidation) and the incorporation of tautomeric bases [3]. External causes of point mutations include UV light, X-rays, ionizing radiation from radioactivity or cosmic rays, and mutagens (mutation-inducing substances). Most mutations are repaired by special enzymes before the chromosomes are duplicated for cell division. This is one of the great advantages of the double helix: even if the information on one DNA strand has been altered by a mutation, it is still correctly present on the complementary strand, which the repair enzymes can use as a back-up copy [3].
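Point differences between two homologous sequences can be listed with a short comparison. The sketch below is illustrative only: the two sequences are invented, the function name is hypothetical, and the code assumes that both sequences are already aligned, have equal length, and contain only A, C, G, and T. The labels "transition" (purine to purine or pyrimidine to pyrimidine) and "transversion" (purine to pyrimidine or the reverse) are standard terminology that the chapter itself does not introduce.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def point_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str, str]]:
    """List every position at which two aligned sequences differ.

    Each entry is (position, base_in_a, base_in_b, kind), where kind is
    "transition" if both bases belong to the same chemical class and
    "transversion" otherwise. Positions are 0-based.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for position, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == b:
            continue
        same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        kind = "transition" if same_class else "transversion"
        differences.append((position, a, b, kind))
    return differences


# Invented example sequences, not real bird data.
for entry in point_differences("ATGCCGTA", "ATACCGTT"):
    print(entry)
```

In this toy pair, position 2 carries a G to A exchange (a transition) and position 7 an A to T exchange (a transversion).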

Most mutations occur in somatic cells (body cells). These somatic mutations are not passed on to the offspring and perish with the death of the individual.

*Birds - Challenges and Opportunities for Business, Conservation and Research*



Only mutations in germline cells (gametes or sex cells) can be inherited. Most mutations have no or negative consequences. Only in rare cases does a mutated gene or allele give its carrier a selective advantage, i.e. a better adaptation to its environment and thereby greater reproductive success. When we analyze DNA sequences or genome structures of organisms living today, we essentially see only mutations that were either neutral or had a positive selection value. Carriers of mutations with negative consequences did not withstand the selection pressure: they often had little or no reproductive success and disappeared.

Only germline mutations can reach the next generation. If they are not selected against, they may persist in subsequent generations. The genome of an individual may therefore differ from that of conspecifics by millions of nucleotide exchanges, all inherited from its ancestors. These nucleotide exchanges can be detected by DNA sequencing and used to reconstruct the Tree of Life. A driver for the evolution of divergent DNA sequence lineages is geographic or ecological separation. If a population becomes isolated on an island and there is no further exchange of individuals with the ancestral population, independent sequence evolution sets in, as outlined in **Figure 3**. This process is the basis for the Tree of Life.

The mutation rate is characteristic of individual genes and can be used to infer the dates of ancient evolutionary divergence events. This is the concept of the "Biological Clock" (also known as the molecular clock), which is widely used in phylogenetics [3, 14].
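The arithmetic behind such a clock can be sketched in a few lines. This is a simplified illustration rather than the method of any particular study. It assumes a strictly constant substitution rate and ignores multiple substitutions at the same site. Under these assumptions, two lineages that split t million years ago each accumulate changes independently, so their pairwise distance d is approximately 2rt, and t is approximately d / (2r). The function name and the numbers in the example are hypothetical.

```python
def divergence_time(distance: float, rate_per_lineage: float) -> float:
    """Estimate the time since two lineages split under a strict clock.

    distance          -- substitutions per site between the two sequences
    rate_per_lineage  -- substitutions per site per million years, per lineage
    Returns the estimated divergence time in million years.
    """
    if rate_per_lineage <= 0:
        raise ValueError("the substitution rate must be positive")
    # Both lineages change after the split, hence the factor of two.
    return distance / (2.0 * rate_per_lineage)


# Hypothetical values chosen only to illustrate the calculation.
print(divergence_time(distance=0.04, rate_per_lineage=0.01))
```

With these invented inputs (4% sequence divergence and 1% change per lineage per million years), the estimate is about two million years. In practice, the rate must be calibrated for each gene, for example against dated fossils or geological events.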

Darwin postulated variability of traits within populations as a prerequisite for Natural Selection. We now know that this variability exists and is due to diverse mutations in protein-coding genes and in genes for transcription factors. Mutations in regulatory genes sometimes lead to more pronounced morphological changes. This variability is exploited, for example, in artificial selection for animal and plant breeding. Darwin already recognized this high plasticity, from which a breeder can generate new forms in just a few generations, such as the various cabbage vegetables bred from the wild cabbage plant or domestic dogs from wolves (see [3]).

**Figure 3.** *Geographic or ecological separations of populations lead to sequence evolution and phylogeny.*


## **3.2 DNA–DNA hybridization**


Charles Sibley was the first scientist to use DNA analysis to study avian systematics. When Sibley embarked on his DNA work in 1975, DNA sequencing had not yet been invented. Sibley instead employed DNA–DNA hybridization analysis, in which DNA melting temperatures are compared. Together with Jon Ahlquist, Sibley investigated the DNA melting profiles of more than 1700 bird taxa. In 1990, they published their results as "Phylogeny and Classification of Birds" [7]. Sibley used the DNA–DNA hybridization data to propose a novel avian taxonomy, published in the same year as "Distribution and Taxonomy of Birds of the World" [12].
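The comparison of melting temperatures can be illustrated with a small calculation. The sketch below is an assumption-laden toy and not a reconstruction of Sibley and Ahlquist's actual protocol: the melting curves are invented, the function name is hypothetical, and the midpoint is found by simple linear interpolation. The underlying idea is that a hybrid duplex formed from the DNA of two different species contains mismatches and therefore melts at a lower temperature than a duplex formed from a single species; the larger the drop, the more distant the two species.

```python
def melting_midpoint(temperatures: list[float], fraction_melted: list[float]) -> float:
    """Temperature at which half of the duplex DNA has melted.

    The curve is given as matching lists of temperatures (ascending) and
    the fraction of single-stranded DNA measured at each temperature.
    The midpoint is located by linear interpolation between two readings.
    """
    points = list(zip(temperatures, fraction_melted))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("the curve never crosses 50% melted")


# Invented melting curves (temperature in degrees Celsius).
temps = [60, 65, 70, 75, 80, 85, 90]
same_species = [0.00, 0.02, 0.10, 0.30, 0.70, 0.95, 1.00]
hybrid = [0.00, 0.10, 0.30, 0.70, 0.90, 0.98, 1.00]

delta = melting_midpoint(temps, same_species) - melting_midpoint(temps, hybrid)
print(f"melting point lowered by about {delta:.1f} degrees")
```

A single number per species pair is all this approach yields, which hints at why its resolution is limited compared with sequence data.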

Sibley and Ahlquist [12] grouped many orders and families of birds correctly, but as we know today, they were completely wrong about others [1]. For example, New World vultures are not storks, as Sibley had assumed, but cluster at the base of the Accipitriformes. DNA–DNA hybridization has severe shortcomings: it does not provide sufficient resolution and suffers from laboratory artifacts. Sibley and Ahlquist [7] knew the limitations of DNA–DNA hybridization but had no choice, because it was the only DNA method available at that time.

## **3.3 DNA sequence analysis**

We can isolate DNA from any bird tissue, such as blood and muscle; DNA can also be obtained from feathers or buccal swabs. Using PCR with specific primers, single genes (so-called marker genes) can be amplified and then sequenced with the Sanger chain termination method. **Figure 4** gives a schematic view of the procedure from DNA sample to phylogeny.

Even the sequence analysis of a few marker genes from mitochondria (e.g. COI, cytochrome b, ND2) or from the nuclear genome is often very informative and enables reliable phylogeny reconstructions. The choice of marker genes differs between animals and plants and also depends on whether one wants to study evolutionarily young or old relationships.
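How aligned marker sequences turn into a tree can be shown with a deliberately simple distance method. The sketch below computes the proportion of differing sites between each pair of sequences (the p-distance) and then clusters the taxa with UPGMA. UPGMA is chosen here only because it is compact; it assumes clock-like evolution, and the chapter does not state which phylogeny programs or methods were used. The taxon names and sequences are invented, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def upgma(sequences: dict[str, str]):
    """Cluster taxa by UPGMA and return the tree as nested tuples."""
    trees = {i: name for i, name in enumerate(sequences)}   # cluster id -> subtree
    sizes = {i: 1 for i in trees}                           # cluster id -> leaf count
    dist = {
        (i, j): p_distance(sequences[trees[i]], sequences[trees[j]])
        for i, j in combinations(trees, 2)
    }
    next_id = len(trees)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)                      # closest pair of clusters
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[tuple(sorted((i, k)))]
            d_jk = dist[tuple(sorted((j, k)))]
            # Distance to the merged cluster: average weighted by cluster size.
            dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Invented 12-base alignment, not real bird data.
alignment = {
    "taxon_A": "ATGCATGCATGC",
    "taxon_B": "ATGCATGCATGA",
    "taxon_C": "ATGAATTCATGA",
    "taxon_D": "TTGAATTCGTGT",
}
print(upgma(alignment))
# ('taxon_D', ('taxon_C', ('taxon_A', 'taxon_B')))
```

The two most similar sequences (taxon_A and taxon_B) are joined first, and the most divergent one (taxon_D) ends up as the sister to all others. Published avian phylogenies typically rely on model-based approaches such as maximum likelihood or Bayesian inference rather than on this simple clustering.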

**Figure 4.** *From a sample with DNA to a phylogeny reconstruction.*

After 2000, next generation sequencing (NGS) became available, in which whole genomes are analyzed by parallel sequencing [13]. Hundreds of millions of short DNA sequences can be generated in a single NGS run. Bioinformaticians then assemble these sequences into longer DNA segments and assign them to known genes ("annotation"). Homologous DNA sequences are aligned and, as with marker genes, evaluated using phylogeny programs. With the new high-throughput sequencers, larger and more comprehensive collections of genes, or even complete genomes and transcriptomes, can be sequenced [13, 14].
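The assembly step can be illustrated with a toy example. The following sketch repeatedly merges the two fragments with the longest suffix-to-prefix overlap. It is a minimal illustration under strong assumptions: reads are error-free, all come from the same strand, and the genome has no repeats. Real assemblers use graph-based algorithms and handle sequencing errors, so this is not how production software works. The reads and function names are invented.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by their best overlap until none remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap(left, right, min_len)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlapping pair left; return separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three invented overlapping reads.
print(greedy_assemble(["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]))
# ['ATGGCGTACGTTAGC']
```

Short reads make this step hard in practice: when a repeat is longer than a read, several merges look equally good, and the assembler cannot tell which one reflects the true genome.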

The 454 pyrosequencer from Roche represented the first generation of NGS sequencers. Several companies developed new NGS strategies and sequencers, such as Illumina, SOLiD, IonTorrent, and PacBio [1, 13, 14]. Illumina technology is the market leader at present; these sequencers generate up to 250 million short sequences (50 to 200 nucleotides) in a single lane. Short sequences pose a number of problems for bioinformatics, so developers are working on sequencers that generate longer reads. Third-generation sequencers from PacBio or based on nanopore sequencing are beginning to reach the laboratory. Longer sequences make it possible to localize a sequence on a chromosome and to reconstruct complete gene assemblies, including repetitive elements. Longer, high-quality reads are important for reconstructing phylogenies [14].

Several thousand genome sequences are now available, mainly from prokaryotes. The number of genome sequences from animals is comparatively small, but enough are already available to reconstruct the large-scale phylogenomics of animal groups such as birds. It is foreseeable that within a few years the phylogeny of most evolutionary lineages can be reliably reconstructed via genome sequencing (see Chapter 4).
