**2. Polymorphisms**

Genetic polymorphism, as defined by Cavalli-Sforza and Bodmer, is the occurrence in the same population of two or more alleles at one locus, each with appreciable frequency, where the minimum frequency is typically taken as 1% [8]. An allele is one of the variant forms of a gene at a specific locus on homologous chromosomes. The alleles of a polymorphism are observed more often in the general population than mutations are. The most common type of polymorphism in the human genome is the single-nucleotide polymorphism (SNP) [9].

#### **2.1. Single nucleotide polymorphism (SNP or snip)**

SNPs are popular molecular genetic markers in disease genetics studies and pharmacogenomic research. A SNP is a single base change in a DNA sequence, usually with two alternative nucleotides possible at a given position. The variation occurs at a specific position in the genome and has an allele frequency of 1% or greater [10]. Around 325 million SNPs have been identified in the human genome, 15 million of which are present at frequencies of 1% or higher across different populations worldwide [11]. An example of a SNP is shown in **Figure 1**, which compares the DNA sequences of two individuals at a specific position of the human genome. The sequence of person 1 has a C nucleotide at this position, as do most other people (the majority group), whereas the sequence of person 2 has a T, which is carried by a minority of the population. There is therefore a SNP at this position, with alleles C and T.
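The two-individual comparison described for Figure 1 can be sketched in a few lines. The sequences and the function name `variant_positions` are hypothetical and chosen only for illustration. A difference between two individuals is only a candidate site: it counts as a SNP in the sense used here only if the rarer allele reaches a frequency of about 1% in the population.

```python
def variant_positions(seq1, seq2):
    """Return (index, base1, base2) for every position where two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


# Hypothetical aligned sequences for two individuals (0-based positions).
person_1 = "AAGCCTA"
person_2 = "AAGCTTA"

for index, base_1, base_2 in variant_positions(person_1, person_2):
    print(f"position {index}: person 1 has {base_1}, person 2 has {base_2}")
```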


Polymorphisms


http://dx.doi.org/10.5772/intechopen.76728

**Figure 1.** Single nucleotide polymorphism (SNP). (http://en.wikipedia.org/wiki/Single-nucleotide\_polymorphism).


The majority of SNPs have two alleles, which represent a substitution of one base for another. The allele carried on each of an individual's two chromosome copies may differ. The allele that occurs more frequently in the general population is called the "major" allele, and the allele that is rarer is designated the "minor" allele. Because humans are diploid (they carry two copies of each chromosome), an individual can have one of several genotypes: homozygous for the major allele, homozygous for the minor allele, or heterozygous [9]. Many SNPs are correlated with one another, so it is difficult to distinguish the SNP that affects the phenotype from the several SNPs associated with it [12].
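As a minimal sketch of these terms, the following code counts alleles in a set of diploid genotypes, labels the major and minor allele, and classifies each genotype. The sample data and all function names are hypothetical, and the code assumes a biallelic site at which both alleles are observed.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes written as two letters, e.g. "CT"."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # each genotype contributes two alleles
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def classify_genotype(genotype, major, minor):
    """Label one genotype relative to the major and minor allele."""
    if genotype == major * 2:
        return "homozygous major"
    if genotype == minor * 2:
        return "homozygous minor"
    return "heterozygous"


# Hypothetical sample of 10 individuals (20 chromosomes).
sample_genotypes = ["CC"] * 6 + ["CT"] * 3 + ["TT"] * 1

freqs = allele_frequencies(sample_genotypes)
major = max(freqs, key=freqs.get)   # more frequent allele
minor = min(freqs, key=freqs.get)   # less frequent allele
maf = freqs[minor]                  # minor allele frequency

print(f"major allele {major}: {freqs[major]:.2f}")
print(f"minor allele {minor}: {maf:.2f}")
print("meets the 1% threshold:", maf >= 0.01)
print(Counter(classify_genotype(g, major, minor) for g in sample_genotypes))
```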

SNPs are identified and characterized by sequencing the same genomic region in several populations [13, 14]. The sample size of the population being resequenced is important. In general, larger sample sizes are needed to identify SNPs at the lower end of the minor allele frequency spectrum. The minor allele frequency (MAF) is the frequency at which the less common allele occurs in a given population. According to population genetics theory, for a SNP detection rate of 99%, a SNP with a MAF of 5% or greater needs 48 chromosomes, whereas a SNP with a MAF of 1% or greater requires 192 chromosomes for verification of the SNP genotype [15].
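The link between sample size and MAF can be illustrated with a deliberately simple sampling model. It assumes that chromosomes are drawn independently and that a site is detected once both alleles have been seen at least once. This is not the population-genetic calculation behind the figures cited from [15]: it treats a single site at exactly the given frequency, so its numbers are not expected to match the 48 and 192 chromosomes quoted above. The function names are hypothetical.

```python
def detection_probability(maf, n_chromosomes):
    """Probability that both alleles appear at least once among n independently sampled chromosomes."""
    p_only_major = (1.0 - maf) ** n_chromosomes  # every draw is the major allele
    p_only_minor = maf ** n_chromosomes          # every draw is the minor allele
    return 1.0 - p_only_major - p_only_minor


def chromosomes_needed(maf, target):
    """Smallest number of chromosomes whose detection probability reaches the target."""
    n = 2
    while detection_probability(maf, n) < target:
        n += 1
    return n


for maf in (0.05, 0.01):
    n = chromosomes_needed(maf, 0.99)
    print(f"MAF {maf:.2f}: {n} chromosomes for 99% detection under this simple model")
```

The qualitative point matches the text: the rarer the minor allele, the more chromosomes must be sampled before it is likely to be observed.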

Currently, large-scale SNP genotyping can be performed by automated machines, which facilitates genetic association studies using DNA-based markers. The Human Genome Project ranked SNP discovery and characterization as high priorities [16, 17] and encouraged the public and private sectors [4, 18, 19] to push toward these objectives.

4 Genetic Diversity and Disease Susceptibility

more than 9 million reported in public databases [5–7]. In this chapter, the definitions of several terms such as polymorphism, minor allele frequency (MAF), allele frequency, haplotype, and linkage disequilibrium (LD) are clarified. Moreover, SNPs, genome-wide association studies (GWAS), methods to detect SNPs, and the application of SNPs in association with diseases and drug development are the main topics discussed.

#### *2.1.1. Effects of SNPs location*

The location of a SNP influences its effect on gene products and other features. SNPs within a gene may alter protein structure. SNPs in a regulatory region outside a gene may affect when and how the gene is turned on, which affects the quantity of protein produced. They may also affect gene splicing, transcription factor binding, or the sequence of non-coding RNA. SNPs that are not in the proximity of a gene may be used as genetic markers for locating disease-causing genes (**Figure 2**).
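How a SNP within a coding sequence can alter a protein may be illustrated by translating a codon before and after a single-base substitution. The sketch below uses the standard genetic code. The three outcomes it reports are commonly called synonymous (same amino acid), missense (different amino acid), and nonsense (premature stop codon). The function name and the example codons are hypothetical, and the code assumes that the reference codon is not itself a stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change at a 0-based position within a reference codon."""
    ref_aa = CODON_TABLE[codon]
    mutated = codon[:position] + alt_base + codon[position + 1:]
    alt_aa = CODON_TABLE[mutated]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_coding_snp("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val)
print(classify_coding_snp("CAG", 0, "T"))  # CAG (Gln) -> TAG (stop)
```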

#### *2.1.2. Types of SNPs*

As described earlier, SNPs may fall within the coding sequences of a gene, the non-coding regions of a gene, or the intergenic regions (regions between genes). SNPs in the coding region of a gene are divided into two types: synonymous and nonsynonymous. Synonymous SNPs do not change the amino acid sequence of the protein and do not affect protein function. Nonsynonymous SNPs are divided into two types: missense and nonsense. In a missense SNP, the single nucleotide change produces a codon that codes for a different amino acid, which may result in a nonfunctional protein. In a nonsense SNP, the point mutation changes a codon to a stop codon, resulting in a nonfunctional protein product. SNPs in the non-coding regions of a gene or in the intergenic regions may affect gene splicing (SNPs in intron regions), transcription factor binding (SNPs in the 5′ untranslated region), messenger RNA degradation, or the sequence of non-coding RNA. A SNP located upstream or downstream of a gene that affects gene expression is referred to as an expression SNP (eSNP).

**Figure 2.** SNP location.

the chromosomes, and each individual is represented twice to account for the maternal and paternal contributions [9]. The fundamental difference between haplotypes and individual genotypes at SNPs is that the alleles are assigned to a chromosome.

Haplotypes provide information about the exchange of DNA during meiosis (recombination), which is useful for locating disease-associated mutations by linkage methods. Recombination also has an effect on linkage disequilibrium.

#### **3.4. Linkage disequilibrium (LD)**

In population genetics, linkage disequilibrium (LD) is the non-random association of alleles at different loci in a given population; the loci may or may not be on the same chromosome. Loci are said to be in linkage disequilibrium when the frequency of association of their different alleles is higher or lower than would be expected if the loci were independent and associated randomly [23]. LD analysis can detect differences between the SNP patterns of two groups and reveal which pattern is most likely associated with a disease-causing gene or with the response to certain drugs.

LD is an important concept in genetic studies that aim to identify and localize genes related to disease susceptibility. LD is commonly used to indicate that two genes are physically linked. It is defined as the difference between the observed frequency of a particular combination of alleles at two loci and the frequency expected for random formation of haplotypes from the alleles. In the absence of LD, the frequency of a particular allele at a given locus is independent of the alleles at other linked loci. LD plays a crucial role in current methods for mapping genes associated with complex diseases or traits. The level of LD is influenced by a number of factors, such as genetic linkage, selection, the rate of recombination, the rate of mutation, genetic drift, non-random mating, and population structure.

#### **3.5. Genome-wide association studies (GWAS)**

GWAS identify common disease-causing variants by using high-throughput genotyping equipment to examine hundreds of thousands of common SNPs and to compare these common genetic variants in large numbers of affected cases (patients) with those in unaffected controls (non-patients) to determine whether they are associated with the disease (**Figure 3**) [24, 25].

In most chromosome regions there is strong association among SNPs; therefore, only a few SNPs in each region are selected for analysis and used to predict the alleles of the remaining SNPs in that region. Accurate mapping of the LD patterns among SNPs, which differ across ancestral groups, is required for selecting the best tag SNPs. The need for precise LD maps to support genetic association studies stimulated the development of the human haplotype map [26, 27]. GWAS pinpoint genes that may contribute to the risk of developing a disease. The data derived from GWAS inform about disease etiology, therapeutic targets, and gene function [28].
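The definition of LD as the difference between an observed haplotype frequency and the frequency expected under independence can be written as D = p(AB) − p(A)·p(B). The sketch below computes D from phased two-SNP haplotypes. It also reports r², a commonly used normalized measure that is not mentioned in the text and is included here as an assumption about what a reader may want. The haplotype data and the function name are hypothetical.

```python
def ld_stats(haplotypes, allele_1, allele_2):
    """Return (D, r_squared) for two SNPs from phased haplotypes written as two letters.

    allele_1 is the reference allele at the first SNP, allele_2 at the second SNP.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_1 for h in haplotypes) / n
    p_b = sum(h[1] == allele_2 for h in haplotypes) / n
    p_ab = sum(h == allele_1 + allele_2 for h in haplotypes) / n

    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical data: the two SNPs always travel together (complete LD).
linked = ["CG"] * 6 + ["TA"] * 4
# Hypothetical data: all four haplotypes are equally common (no LD).
unlinked = ["CG"] * 4 + ["CA"] * 4 + ["TG"] * 4 + ["TA"] * 4

print("linked:   D = %.3f, r^2 = %.3f" % ld_stats(linked, "C", "G"))
print("unlinked: D = %.3f, r^2 = %.3f" % ld_stats(unlinked, "C", "G"))
```

When r² between two SNPs is close to 1, the allele at one SNP predicts the allele at the other, which is the property that tag SNP selection relies on.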

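The case-control comparison that a GWAS performs at each SNP can be sketched as a chi-square test on a 2×2 table of allele counts. This is one simple allelic test under stated assumptions, not the method of any particular study. The counts are hypothetical, and the threshold `GENOME_WIDE_ALPHA` is a commonly used convention that does not come from the text.

```python
import math

# Commonly used genome-wide significance threshold (a convention, assumed here).
GENOME_WIDE_ALPHA = 5e-8


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi_square, p_value, odds_ratio).
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi_square, p_value, odds_ratio


# Hypothetical counts: 100 cases and 100 controls, so 200 alleles per group.
chi_square, p_value, odds_ratio = allelic_association(60, 140, 40, 160)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}, odds ratio = {odds_ratio:.2f}")
print("genome-wide significant:", p_value < GENOME_WIDE_ALPHA)
```

Because hundreds of thousands of SNPs are tested in one study, a nominally small p-value at a single SNP is not sufficient evidence on its own, which is why a much stricter threshold is applied.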