**6.7 Scans for selective sweeps**

The domestic dog is thought to be the most recent species of the canid family, within which three phylogenetic groups, or clades, are distinguished; the domestic dog belongs to the same clade as the gray wolf, coyote and jackals [31]. The dog is thought to have appeared about 40,000 years ago, with the first steps in its domestication taking place in East Asia [32]. Most of the domestic breeds we know today, however, are the result of human selection over the past two or three centuries, and many of the most popular modern breeds were created in Europe in the 19th century. Some breeds, such as the greyhound and the dog of the pharaohs, were already present in the ancient world. Genome-level studies have highlighted a stratification of genetic variability within dog breeds. Recent sequencing methods and SNP arrays make it possible to screen the whole genome for signatures of selection.

#### **Table 2.**

*SNP genotyping using a SNP chip array in dogs.*

To identify selective sweeps, sequencing data are aligned to the reference genome. The presence of genes with a large number of outliers indicates a positive or negative effect of selection. A genome scan can be used to distinguish genome-wide processes (expected mainly to reflect demographic history) from processes acting at individual loci. Genome scans may, however, suffer from inflated numbers of false positives under hierarchical spatial structure coupled with isolation-by-distance dynamics. Under positive selection, the fitness of the population increases because of a new (or rare) mutation. In a hard sweep, the frequency of some variants increases, as does linkage disequilibrium. Kim et al. [33] compared 127 dogs (sport-hunting vs. terrier breeds) for sporting characteristics. The study identified the main SNPs (cardio-circulatory, muscular and neuronal systems) and the selection signatures involved in the sport-hunting breeds. **Table 3** reports an example of GWAS and selective sweep analysis in dogs [29].
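The idea of scanning the genome for outlier regions can be illustrated with a minimal sketch. It assumes that a per-SNP differentiation or sweep statistic has already been computed by some other tool; the function name, the 50 kb window size and the z-score cutoff of 3 are illustrative choices and are not taken from the studies cited above.

```python
import numpy as np

def window_outliers(positions, scores, window_bp=50_000, z_cutoff=3.0):
    """Average a per-SNP statistic in non-overlapping windows and flag outliers.

    positions : base-pair coordinates of SNPs on one chromosome
    scores    : a per-SNP statistic computed elsewhere (assumed input)
    Returns a list of (window_start_bp, window_mean) for outlier windows.
    """
    positions = np.asarray(positions)
    scores = np.asarray(scores, dtype=float)

    # Assign each SNP to a window by integer division of its position.
    window_ids = positions // window_bp
    unique_ids = np.unique(window_ids)
    means = np.array([scores[window_ids == w].mean() for w in unique_ids])

    # Standardise window means against the chromosome-wide distribution.
    sd = means.std()
    if sd == 0:
        return []  # no variation between windows, so nothing stands out
    z = (means - means.mean()) / sd

    return [(int(w) * window_bp, float(m))
            for w, m, zi in zip(unique_ids, means, z) if zi > z_cutoff]
```

A fixed z-score cutoff is only a convenience here. As noted above, population structure can inflate the number of false positives, so in practice thresholds are usually calibrated against the demographic background.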

#### **7. Genome applications in the canine sector**

The canine genome project was launched in the early 1990s. After some preliminary results, in 2003 a first sequence of the dog's genome was obtained from a female boxer, which is now the reference sequence for the dog [34]. The availability of a high-quality canine genome has revolutionized the way in which geneticists operate. The first version of the boxer's genome, sequenced at a coverage of 7.5x, covered nearly 99 percent of the animal's genome. The genome sequence provided a first description of the organization of the genome, the number of genes and the presence of repeated elements. Somewhat surprisingly, it revealed a high number of short interspersed nuclear elements (SINEs) throughout the genome, some of them located at positions from which they could affect gene expression. For example, the insertion of a SINE into the gene encoding the receptor for hypocretin (a neuropeptide hormone found in the hypothalamus) causes narcolepsy in the Doberman. Similarly, the insertion of a SINE into the SILV gene, which is known to be linked to the pigmentation process, is responsible for a particular mottled color called merle. The 2003 sequence comprises approximately 2.4 billion bases and revealed the existence of approximately 19,000 genes. For about 75% of genes, the homology (resulting from shared ancestral material) between dog, human and mouse is very high. The study also found that many genes have no gaps in their sequence, which is helpful when studying the correlation between a given gene and a disease.

During its evolution, the dog's genome has accumulated more than two million SNPs. These markers are proving crucial in understanding the role of genetic variability within a breed and between breeds. SNPs, analyzed by means of DNA microarrays or bead arrays, can make an important contribution to genome-wide association studies (GWAS) aimed at identifying the genes responsible for complex traits in dogs. A microarray with around 170,000 SNPs is currently available. By comparing data from dogs affected by a certain disease with data from healthy individuals, it is possible to identify quickly the genes responsible for the disease. Dog breeds differ not only in overall body size but also in leg length, head shape and many other morphological characteristics. In the dog, the phenotypic variability of several traits is very high compared with other living terrestrial mammals. The first molecular study on the genetic aspects of dog morphology was conducted at the University of Utah [35, 36]. Called the Georgie Project (in memory of a dog), the study focused on the Portuguese water dog, a breed that is ideal for this type of study because it descends from a small number of ancestors. In the project, DNA samples from more than a thousand dogs were collected, and a complete genome scan using 500 microsatellite markers was carried out. For these animals, in addition to the genealogical and medical data, more than 90 anatomical measurements were obtained from a series of five radiographs taken of each animal during the first phase of the study. Based on the analysis of these data, four primary principal components (PCs) were identified (**Figure 4**).
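The reduction of many anatomical measurements to a few principal components can be sketched as follows. This is a generic PCA via singular value decomposition, not the procedure used in the Georgie Project; the function name, the standardisation step and the default of four components are assumptions made for illustration, and the sketch assumes that no trait is constant across animals.

```python
import numpy as np

def principal_components(measurements, n_components=4):
    """PCA of an (animals x traits) matrix via singular value decomposition.

    Returns (scores, loadings, explained_variance_ratio) for the leading
    components. Traits are centred and scaled to unit variance first, so
    measurements recorded in different units contribute comparably.
    """
    X = np.asarray(measurements, dtype=float)
    X = X - X.mean(axis=0)   # centre each trait
    X = X / X.std(axis=0)    # scale each trait (assumes non-constant traits)

    # Rows of Vt are the principal axes; singular values come sorted.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)

    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], explained[:n_components]
```

Each row of the returned loadings shows how strongly every measurement contributes to a component, which is how a component can be interpreted, for example, as overall size.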


#### *Canine Genetics and Genomics DOI: http://dx.doi.org/10.5772/intechopen.95781*

*Canine Genetics, Health and Medicine*

#### **Table 3.**

*Example of GWAS and selective sweep analysis in dogs.*

| Analysis | Data, tools and settings |
| --- | --- |
| Dataset | 268 dogs representing 130 breeds; samples with ≥10x coverage, selecting two males and two females |
| **GWAS** | Vcftools60; **Gemma** - linear-mixed model methods, elimination of variants with missing value >1; **R Studio** - Manhattan, correlation and box plots. **Phenotypes used in the study:** canids catalog, kinship, aggressiveness, boldness, bulky, drop ears, furnishing, hairless, height, large ears, length of fur, life span, long legs, muscled, tail curl, weight, white chest, white head |
| **Identification of positively selected genes** | **Beagle** - infer the haplotype phase; **XP-EHH** - splitting the genome into non-overlapping segments of 50 kb; **Xpclr** - phased genotype input, non-overlapping windows (50 kb), 600 SNPs within each window, correlation level cutoff of 0.95 |

**Figure 4.** *Example of PCA (principal component analysis) of genotypic data (autosomal) of three dog populations.*

The analysis of the genome scans and principal components (PCs) revealed 44 putative quantitative trait loci (QTLs) on 22 chromosomes. QTLs are identified by means of a complex statistical analysis and mark the genome regions that contribute to the expression of a given quantitative trait. Of particular interest is a locus on chromosome 15 (CFA15), which showed a strong association with body size. Although it is only one of seven loci thought to affect body size, it was chosen as the starting point. To find the gene underlying the CFA15 locus, several SNPs were identified and the resulting set of markers was genotyped. The distribution of these markers showed a single peak near the insulin-like growth factor-1 (IGF1) gene, which encodes a growth factor known to influence body size in humans and mice. IGF1 was analyzed in detail, revealing only two specific combinations of alleles (called haplotypes), one of which is present in 96% of the population. The haplotype associated with small size was called B, while the one associated with large size was called I. Dogs homozygous for haplotype B showed a smaller average body size, dogs homozygous for I were larger, and heterozygous dogs were intermediate in size. The Georgie Project is important for the number of genes discovered. In addition to the genes related to head shape, body size, leg length and many other traits, further genes were discovered that control sexual dimorphism [37, 38]. This dimorphism is observed in almost all mammals, but its mechanisms are not yet fully understood. Indeed, a gene was found on chromosome 15 that interacts with other genes to make males larger and females smaller. On average, females of the Portuguese water dog breed are 15% smaller than the males.
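The pattern described above for the two IGF1 haplotypes (B/B smallest, I/I largest, heterozygotes intermediate) is what an additive effect looks like. A minimal sketch of how such a pattern could be summarised is given below; the function names and the genotype labels `"B/B"`, `"B/I"` and `"I/I"` are hypothetical conventions chosen for this example, not the analysis performed in the original study.

```python
from statistics import mean

def mean_size_by_genotype(records):
    """Group (genotype, size) pairs and return the mean size per genotype."""
    groups = {}
    for genotype, size in records:
        groups.setdefault(genotype, []).append(size)
    return {g: mean(values) for g, values in groups.items()}

def additive_effect(records):
    """Least-squares slope of size on the number of I haplotypes (0, 1 or 2).

    Under a purely additive model the slope estimates the average change
    in size per copy of the I haplotype. `records` must be a list, since
    it is traversed twice.
    """
    xs = [genotype.count("I") for genotype, _ in records]
    ys = [size for _, size in records]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

With real data, the records would hold one haplotype-based genotype and one size measurement (or size-related principal component score) per dog. An intermediate heterozygote mean together with a clearly positive slope would be consistent with the additive pattern reported for this locus.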
