**4. Common DNA-based molecular markers**

The development and use of molecular methods for the detection of DNA molecular markers is one of the most significant advances in the field of molecular genetics. Mapping the human genome requires a set of genetic markers to which we can relate the position of genes. Some of these markers are genes; others are SNPs and VNTRs. Molecular markers can be used to mark positions in genomes for various purposes such as mapping human diseases, pharmacogenetics, and human identification.

#### **4.1. Single nucleotide polymorphisms**

A single base-pair change produces a single nucleotide variant, and such variants probably account for many genetic conditions caused by a single gene or by multiple genes. SNPs represent the major source of human genomic variability. Because the exact number of SNPs is unknown, it is difficult to give a direct estimate of the number of SNPs in the human genome, but more than 5 million have been recorded in different public and private databases and about 4 million have been validated [23]. "The data from the Human Genome project revealed that human nucleotide sequence differs every 1000–1500 bases from one individual to another" [24]. "The SNP Map working group observed that two haploid genomes differ at 1 nucleotide per 1331 bp". Over 60,000 SNPs, however, lie within genes, and some of them are associated with diseases [2].
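The densities quoted above can be turned into a rough count of differences between two haploid genomes. The sketch below is simple arithmetic only; the haploid genome length of about 3.1 billion bp is an assumed round figure that does not come from the text, and the function name is hypothetical.

```python
# Rough arithmetic only. The genome length is an assumed round figure,
# not a value taken from the text.
HAPLOID_GENOME_BP = 3_100_000_000   # assumption: about 3.1 billion bp
BP_PER_DIFFERENCE = 1331            # from the text: 1 difference per 1331 bp


def expected_differences(genome_bp: int, bp_per_difference: float) -> float:
    """Expected number of single-nucleotide differences between two haploid genomes."""
    return genome_bp / bp_per_difference


# Range implied by "every 1000-1500 bases", plus the 1-per-1331 point estimate.
low = expected_differences(HAPLOID_GENOME_BP, 1500)
high = expected_differences(HAPLOID_GENOME_BP, 1000)
point = expected_differences(HAPLOID_GENOME_BP, BP_PER_DIFFERENCE)

print(f"{low:,.0f} to {high:,.0f} differences; about {point:,.0f} at 1 per 1331 bp")
```

Under this assumption, any two haploid genomes would be expected to differ at roughly 2 to 3 million positions.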

Single nucleotide polymorphisms within protein-coding regions are either synonymous or nonsynonymous. Synonymous polymorphisms cause no amino acid change in the protein produced (silent mutations); they are considered to have no effect on the organism and are said to be selectively silent. Nonsynonymous substitutions change the encoded amino acid sequence and are either missense mutations, which change the protein through codon alteration, or nonsense mutations, which result in a chain termination codon [3].
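The classification above can be sketched in a few lines using the standard genetic code. This is a minimal illustration, not a variant-annotation tool: the function name is hypothetical, codons are given on the coding (sense) strand, and the GAG to GTG change commonly cited for sickle cell anemia is used as an example that does not come from the text.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base codon change as synonymous, missense, or nonsense.

    Hypothetical helper for illustration; stop-loss changes are not treated
    separately in this sketch.
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be exactly three bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # silent: same amino acid
    if alt_aa == "*":
        return "nonsense"        # new chain termination codon
    return "missense"            # different amino acid


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("CAG", "TAG"))  # Gln -> stop
```

The three calls illustrate one case of each class, in the order synonymous, missense, nonsense.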

Single nucleotide polymorphisms within a coding sequence can cause genetic diseases, including sickle cell anemia. SNPs responsible for a disease can also occur in any genetic region that can *eventually* affect the expression of genes, for example, in promoter regions. The effect of SNPs in noncoding regions is still debated. Much of the genome consists of regulatory elements that control gene expression, but these regions have remained largely unexplored in clinical diagnostics because of the high cost of whole-genome sequencing and the challenges of interpretation. Clinical diagnostic sequencing currently focuses on identifying causal mutations in the exome, where most disease-causing mutations are known to occur.

Another important group of SNPs is the one that alters the primary structure of a protein involved in drug metabolism; these SNPs are targets for pharmacogenetics studies.

Not all SNPs are causative, however. Some SNPs are in close association with, and therefore segregate with, a disease-causing sequence, so the presence of the SNP correlates with the presence of the disease or an increased risk of developing it; these SNPs are useful in diagnostics, disease prediction, and other applications [3].

Because of their abundance and the availability of high-throughput analysis technologies, single nucleotide polymorphisms can be used as genetic markers for constructing high-resolution genetic maps and for carrying out disease association studies. SNPs have become important in the development of, and research on, genetic markers [14].

There are numerous strategies for discovering new single nucleotide variants (SNVs). The most common and well-known method is direct sequencing followed by comparison with a public or other sequence database [25, 26], or locus-specific amplification of a target genomic region followed by sequence comparison [27, 28]; prescreening prior to sequence determination is needed. SNV detection encompasses two broad areas: (1) scanning DNA sequences for previously unknown polymorphisms and (2) screening (genotyping) individuals for known polymorphisms. Scanning for new SNVs can be further classified into two approaches: the global (or random) approach and the regional (targeted) approach [14]. Certain methods have been developed for scanning for SNVs randomly in the genome, "such as representation shotgun sequencing [14, 29], primer-ligation-mediated PCR [14, 30] and degenerate oligonucleotide-primed PCR" [14, 31].
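The sequence-comparison step described above can be illustrated with a minimal sketch. It assumes the sample read has already been aligned to the reference without gaps, so both strings have the same length; real pipelines use dedicated aligners and variant callers, and the function name and sequences here are hypothetical.

```python
def find_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes an ungapped, pre-aligned pair of sequences; positions where
    either base is not A, C, G, or T (for example N) are ignored.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    snvs = []
    for pos, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            snvs.append((pos, ref_base, alt_base))
    return snvs


# Hypothetical sequences: one substitution at position 5 (A -> T).
print(find_snvs("ACGTACGT", "ACGTTCGT"))
```

Scanning finds previously unknown differences in this way; genotyping a known polymorphism only requires checking the base at one predetermined position.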

Haplotypes are groups of SNPs that are generally inherited together. Haplotypes can have stronger correlations with diseases or other phenotypic effects compared with individual SNPs and may therefore provide increased diagnostic accuracy in some cases [32].
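As a small illustration of treating SNPs jointly rather than one at a time, the sketch below counts haplotype frequencies from phased chromosomes. It assumes the alleles at each SNP are already phased and written as one string per chromosome; the data and the function name are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(phased_chromosomes: list[str]) -> dict[str, float]:
    """Return the relative frequency of each haplotype string.

    Each string lists the alleles carried by one chromosome at the same
    ordered set of SNP positions (phasing is assumed to be known).
    """
    counts = Counter(phased_chromosomes)
    total = sum(counts.values())
    return {haplotype: n / total for haplotype, n in counts.items()}


# Hypothetical sample: 8 chromosomes typed at 3 SNPs.
chromosomes = ["ACG", "ACG", "TCA", "ACG", "TTA", "TCA", "ACG", "TCA"]
for haplotype, freq in sorted(haplotype_frequencies(chromosomes).items()):
    print(haplotype, freq)
```

Three bi-allelic SNPs could in principle form eight haplotypes; observing only a few of them, as in this toy sample, reflects alleles being inherited together.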

described as follows: macrosatellites, with sequence repeats longer than 100 bp, are the largest of the tandem DNA repeats and are located on one or multiple chromosomes [11]; minisatellites are stretches of DNA characterized by repeat patterns of moderate length, 10–100 bp and usually less than 50 bp [9, 18]; and microsatellites, also known as short tandem repeats (STRs), have repeat units of less than 10 bp [3].

**3.4. Structural and copy number variations**

Structural and copy number variations (CNVs) are another frequent source of genome variability [6, 19, 20]. The term CNVs encompasses previously introduced terms such as large-scale copy number variants (LCVs) [19], copy number polymorphisms (CNPs) [20], and intermediate-sized variants (ISVs) [21]. Some currently used terms are: structural variation, a genomic alteration (e.g., an inversion) that involves segments of DNA > 1 kb; copy number polymorphism, a duplication or deletion event involving > 1 kb of DNA [22]; and intermediate-sized structural variant, a structural variant that is ∼8–40 kb in size, which can refer to a CNV or a balanced structural rearrangement (e.g., an inversion) [21].

#### **4.2. Microsatellites (short tandem repeats)**

28 Genetic Diversity and Disease Susceptibility

Microsatellites, or short tandem repeats (STRs), consist of repeat units (motifs) of less than 10 bp. Because of their high variability, microsatellite loci are often used in forensics, population genetics, and genetic genealogy. Significant associations have been demonstrated between microsatellite variants and many diseases [15].

Depending on the search algorithm, there are approximately 700,000–1,000,000 microsatellite loci with repeat units 2–6 bp long in the human reference genome [33, 34]. Di- and tetranucleotide repeats constitute about 75% of microsatellites, with the remaining loci containing tri-, penta-, and hexanucleotide repeats. Within genes, STRs are nonrandomly distributed across protein-coding sequences, untranslated regions (UTRs), and introns. STRs containing dinucleotide repeat units are much more abundant in regulatory or UTR regions than in other genomic regions. In the coding regions of genes, repeats mostly have either trimeric or hexameric repeat units, likely as a result of selection against frameshift mutations [34, 35]. "The mutation rates of STRs often lie between 10<sup>−3</sup> and 10<sup>−6</sup> per cell generation which is 10- to 10<sup>5</sup>-fold higher than the average mutation rates observed in nonrepeated regions of the genome" [36, 37].
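Because the number of loci reported depends on the search algorithm, it may help to see how simple such a search can be. The following is a minimal sketch, not one of the algorithms used in the cited studies: it finds perfect tandem repeats with a regular expression, and the thresholds (unit length 2–6 bp, at least three copies) and the function name are assumptions chosen for illustration.

```python
import re


def find_strs(sequence: str, min_unit: int = 2, max_unit: int = 6,
              min_repeats: int = 3) -> list[tuple[int, str, int]]:
    """Return (0-based start, repeat unit, copy number) for perfect tandem repeats.

    The lazy quantifier makes the regex prefer the shortest repeat unit, so
    ACACACAC is reported as AC x 4 rather than ACAC x 2. Imperfect
    (interrupted) repeats are not detected by this sketch.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical sequences: five CAG copies, then four AC copies, each with flanks.
print(find_strs("TT" + "CAG" * 5 + "TT"))
print(find_strs("GG" + "AC" * 4 + "GT"))
```

Changing `min_repeats` or allowing interrupted repeats changes how many loci are reported, which is one reason published counts differ.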

"Polymorphism of tandem repeats within protein-coding regions reveals that tandem repeat variation is an important source of variation in many proteins, many of this variation is of significant impact on protein function. Tandem repeats has been associated with a number of diseases and phenotypic conditions, changes in the protein products of genes, leading to diseases, other tandem repeat polymorphisms in noncoding regions are known to modify function through their impact on gene regulation". "These polymorphisms can arise from events such as unequal crossover, replication slippage or double-strand break repair" [38].

Variations in the STR length play a significant role in modulating gene expression and STRs are likely to be general regulatory elements; regulatory STRs manifest significant polymorphism because of their high intrinsic mutation rate [15].

There are examples of distinctive phenotypic changes and diseases that are directly associated with increases or decreases in microsatellite repeat arrays. In the Huntington disease gene, for example, the mutation that causes the disease is a trinucleotide expansion of CAG repeats from the normal range of 11–14 copies to an abnormal range of at least 38 copies. The extra CAG repeats cause extra glutamine to be produced [9]. There are more than 40 neurological diseases in humans, such as spinocerebellar ataxia with polyglutamine tracts, that are caused by microsatellite motif length changes in trinucleotide arrays [39].

Testing candidate genes for polymorphisms in exons, promoters, splice sites, or other regulatory regions will have to be done using SNP testing, because SNPs are the most common polymorphisms and are more likely to be responsible for phenotypic variation. For complex phenotypic traits and candidate loci, single-locus SNP analyses provide less information than multi-allelic microsatellites because of the bi-allelic nature of the markers. However, haplotype frequency analysis may improve the accuracy [40]. Recently, polymorphic tandem repeated sequences and copy number variations have emerged as important sources of genomic diversity that facilitate the study of genetic variations in health and diseases.

DNA Polymorphisms: DNA-Based Molecular Markers and Their Application in Medicine

http://dx.doi.org/10.5772/intechopen.79517

real-time PCR, hybridization techniques using DNA microarray chips, and genome sequencing; each technique has its own advantages and disadvantages.

**5.1. Restriction fragment length polymorphism with Southern blot**

A restriction endonuclease cuts DNA at a specific sequence pattern known as a restriction endonuclease recognition site. Alleles that differ at such sites therefore yield fragments of different lengths, which can be distinguished by gel electrophoresis. These differences can arise from a number of genetic events, including point mutation in a restriction site, mutation that creates a new restriction site, insertion, deletion, and repeated sequences. The first polymorphic RFLP was described in 1980. RFLPs were the original DNA targets used for human identification, parentage testing, and gene mapping.

The method of hybridizing DNA with probes is called Southern blotting, after its inventor, Southern [41]. RFLP analysis requires relatively large amounts of DNA. Hence, it cannot be performed on samples degraded by environmental factors, and it also takes longer to give results [42, 43]. It has now been replaced by PCR-RFLP, which avoids the Southern blot.

**5.2. Polymerase chain reaction**

Particular DNA sequences are amplified in vitro with the help of specifically chosen primers and a DNA polymerase enzyme. The amplified fragments are separated electrophoretically and detected by different staining methods. Real-time PCR, a useful modification of PCR, can detect polymorphisms by various methodologies using real-time PCR chemistries, for example, the TaqMan assay or molecular beacons.

**5.3. Genomic array technology**

Genomic array technology is a type of hybridization analysis that allows simultaneous study of large numbers of targets or samples. In 1987, the macroarray evolved into the microarray. Tens of thousands of targets can be screened simultaneously in a very small area. Automated depositing systems (arrayers) can place thousands of spots on a glass substrate the size of a microscope slide (chip); with representative sequences of each gene spotted in triplicate, the entire human genome can be screened simultaneously on a single chip. This technique facilitates the identification of specific homozygous and heterozygous alleles by comparing the difference in hybridization of the target DNA with each redundant probe. Microarrays are also used to characterize genetic diversity and drug responses, to identify new drug targets, and to assess the toxicological properties of chemicals and pharmaceuticals [44].

**5.4. Sequencing**

Since technologies for rapid DNA sequencing became available, they have been widely used. There has been great progress in the detection of single nucleotide variants (SNVs) by direct sequencing, but intermediate-sized (from 50 bp to 50 kb) structural variants (SVs) remain a challenge. Such variants are too small to detect with cytogenetic methods but too large to reliably discover with short-read DNA sequencing. Recent high-quality genome
