**3. Techniques to detect SNPs**

A wide range of methods is available for SNP detection. SNP detection technologies have evolved with advances in reporter systems, fluorescent probes, enzymatic assays, and highly sensitive instruments, and above all with rapid progress in high-throughput sequencing and bioinformatic tools. In the post-genomic era, the accuracy and sensitivity of detection methods have increased while costs have fallen.

SNP detection has one of two aims: identifying a novel polymorphism that has not previously been described, or screening for an already-known polymorphism. Detection techniques can be divided into two main groups: (i) in vitro and (ii) in silico techniques (**Figure 1**). In vitro techniques comprise non-sequencing, sequencing, and re-sequencing methods.

## **3.1 In vitro techniques**

#### *3.1.1 Non-sequencing techniques*

The earliest non-sequencing techniques are restriction digestion-based techniques such as restriction fragment length polymorphisms (RFLPs), cleaved amplified polymorphic sequences (CAPS), and derived cleaved amplified polymorphic sequences (dCAPS). These techniques exploit the creation or disruption of a restriction enzyme recognition site [33]. Another group of non-sequencing techniques is the DNA conformation techniques, which comprise denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE), and single-strand conformation polymorphism (SSCP). These techniques separate DNA fragments of the same length but different base composition according to their three-dimensional conformation [34]. Chip-based methods such as DNA microarrays are based on DNA hybridization and rely on the biochemical principle of nucleotide complementarity. Affymetrix and Illumina SNP chips hybridize fragmented single-stranded DNA to arrays containing thousands of nucleotide probe sequences that are designed to bind to a target DNA sequence [35]. Targeting induced local lesions in genomes (TILLING) is a reverse genetics approach that combines chemical mutagenesis with a sensitive mutation detection method, denaturing HPLC (DHPLC) [36]. The need for several optimization steps, and for probes hundreds of bases long that cover only a small fraction of the region of interest, made the non-sequencing methods laborious and expensive. However, several newly developed approaches provide greater efficiency. Of all these methodologies, direct DNA sequencing technologies are considered the most widely used and most useful for SNP detection.

#### **Figure 1.**

*Techniques to detect SNPs. In vitro techniques include non-sequencing, sequencing, and re-sequencing methods. In silico techniques comprise bioinformatic tools with several output-/input-oriented algorithms.*

*Single Nucleotide Polymorphisms (SNPs) in Plant Genetics and Breeding DOI: http://dx.doi.org/10.5772/intechopen.91886*
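To illustrate how the restriction-based markers described in this subsection (RFLP, CAPS, dCAPS) read out a SNP, the following minimal Python sketch checks whether a single-base change creates or abolishes a recognition site and how that changes the digestion fragment lengths. The sequences are invented for illustration and the function names are hypothetical. The sketch assumes a linear fragment and a palindromic site (EcoRI, GAATTC, cutting after the first G), so only one strand is scanned.

```python
# Illustrative sketch: effect of a SNP on a restriction digest (CAPS-style readout).
# Assumption: palindromic recognition site, linear DNA fragment.

ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts between G and AATTC


def site_positions(seq: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def digest_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return the fragment lengths after complete digestion of a linear fragment."""
    cuts = [p + cut_offset for p in site_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical amplicon: the two alleles differ at one base inside the EcoRI site.
ref_allele = "ATGCGAATTCGGTACCTTAG"   # contains GAATTC
alt_allele = "ATGCGAGTTCGGTACCTTAG"   # A>G change destroys the site

print(digest_lengths(ref_allele, ECORI_SITE, ECORI_CUT_OFFSET))  # [5, 15]
print(digest_lengths(alt_allele, ECORI_SITE, ECORI_CUT_OFFSET))  # [20]
```

On a gel, the reference allele in this toy example would appear as two bands and the alternative allele as a single uncut band.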

*The Recent Topics in Genetic Polymorphisms*

#### *3.1.2 Sequencing techniques*

One of the first sequencing-based techniques designed for SNP detection is locus-specific PCR amplification. In this approach, a large number of loci are targeted with locus-specific PCR primers, and the genomic PCR products are sequenced directly. Another sequencing-based technique is reduced representation shotgun (RRS) sequencing. This method relies on the fact that genomic fragments of the same origin and size show the same migration pattern in gel electrophoresis [37]. Comparison of the overlapping regions of bacterial artificial chromosome (BAC) or P1-derived artificial chromosome (PAC) clones is another sequencing-based approach for SNP discovery [38].
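The clone-overlap approach can be pictured with a short Python sketch: once the overlapping region of two clones has been aligned, every position where the two sequences carry different bases is a candidate SNP. This is only a minimal illustration under stated assumptions. The sequences are invented, the function name is hypothetical, the overlap is assumed to be already aligned and of equal length, and positions with gaps or ambiguous bases are simply skipped.

```python
# Illustrative sketch: candidate SNPs in the aligned overlap of two clone sequences.

def overlap_mismatches(clone_a: str, clone_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_in_a, base_in_b) for every mismatching position."""
    if len(clone_a) != len(clone_b):
        raise ValueError("aligned overlap sequences must have the same length")
    candidates = []
    for pos, (a, b) in enumerate(zip(clone_a.upper(), clone_b.upper())):
        # Only unambiguous bases are compared; gaps ('-') and 'N' are ignored.
        if a in "ACGT" and b in "ACGT" and a != b:
            candidates.append((pos, a, b))
    return candidates


print(overlap_mismatches("ACGTTAGC", "ACGTCAGC"))  # [(4, 'T', 'C')]
```

A real analysis would also weigh base quality, since a mismatch between two clones can reflect a sequencing error rather than a true polymorphism.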

#### *3.1.3 Re-sequencing techniques*

In addition to these techniques, there are re-sequencing approaches, including matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF/MS) and pyrosequencing. MALDI-TOF/MS differentiates genotypes by comparing the mass of DNA fragments after a single-ddNTP primer extension reaction. This technique does not require labeling, and detection depends on the mass of the incorporated ddNTP [39]. Pyrosequencing is a rapid re-sequencing approach in which each nucleotide incorporated by a DNA polymerase is detected by measuring the release of pyrophosphate (PPi) [40].
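The pyrosequencing readout can be sketched in a few lines of Python. Nucleotides are dispensed one at a time. When the dispensed nucleotide matches the next base to be synthesized, it is incorporated and PPi is released, and a run of identical bases is incorporated in a single dispensation. The sketch below is an idealized, noise-free model that assumes the signal is exactly proportional to the number of incorporated nucleotides. The sequences, dispensation order, and function name are illustrative assumptions.

```python
# Illustrative sketch: idealized pyrogram peak heights for a strand being synthesized.

def pyrogram(synthesized: str, dispensation: str) -> list[int]:
    """Return the number of nucleotides incorporated at each dispensation.

    `synthesized` is the strand built by the polymerase (5'->3');
    `dispensation` is the order in which nucleotides are added.
    """
    pos = 0
    heights = []
    for nucleotide in dispensation:
        incorporated = 0
        # A homopolymer run is incorporated in one dispensation.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            incorporated += 1
            pos += 1
        heights.append(incorporated)
    return heights


order = "ACGTACGTAC"
print(pyrogram("GATTC", order))  # allele 1: [0, 0, 1, 0, 1, 0, 0, 2, 0, 1]
print(pyrogram("GACTC", order))  # allele 2: [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
```

The two alleles differ at the sixth and eighth dispensations, which is how a SNP appears as a change in the peak pattern.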

## **3.2 In silico techniques**

As described above, many experimental strategies are currently available for SNP detection. In vitro SNP detection methods often consist of several laborious steps or require specialized instruments, which makes the process costly and complex.

Developments in sequencing technologies have reduced costs and have been accompanied by rapid progress in next-generation sequencing (NGS) and related bioinformatic computing resources. These developments have accelerated whole-genome association studies (WGAS) and the identification of many new SNPs in model and non-model plants. In the post-genomic era, SNPs have become a commonly used marker system in many plants, with advantages such as stability, ease of use, considerably low mutation rates, and suitability for high-throughput genotyping [41]. NGS platforms generate a considerable amount of data, which creates a need for alternative data storage methods and shorter processing times.

In silico methods are easy to apply to SNPs that occur in known genomes or sequences of a species of interest. Bioinformatic research is constantly producing online and stand-alone tools, new software, and algorithms to analyze SNPs. Recently developed open-source and freely available bioinformatic software has sped up SNP detection and reduced its cost. The important point is the selection of the software: the sequencing platform, file requirements, algorithmic background, operating system, and organism of interest all affect the choice of bioinformatic platform or pipeline. Many databases and resources are available today to describe SNPs in many plants.

#### **Figure 2.**

*A common workflow schema for in silico SNP mining. The steps and algorithms change according to the input data type. If the data set is de novo assembly output, a clustering step is needed. If the data set is based on reference-mapped output, a mapping step is required. To mine possible SNPs, the clustering and mapping steps are followed by alignment, variant calling, annotation, and diversity analysis, in that order.*
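The reference-based branch of the in silico workflow (mapping, alignment, variant calling) can be illustrated with a minimal Python sketch. It does not reproduce the algorithm of any tool named in this chapter. It assumes that reads are already mapped without gaps, so each read is given as a start position plus its sequence, and it uses two illustrative thresholds, `min_depth` and `min_alt_fraction`, that are assumptions rather than recommended values.

```python
# Illustrative sketch: naive SNP calling from reads already mapped to a reference.
from collections import Counter


def call_snps(reference, mapped_reads, min_depth=3, min_alt_fraction=0.3):
    """Return (position, ref_base, alt_base, alt_count, depth) for candidate SNPs.

    `mapped_reads` is a list of (start, sequence) pairs; alignments are ungapped.
    """
    # Alignment step: stack the bases of all reads over each reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in mapped_reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < len(reference) and base in "ACGT":
                pileup[pos][base] += 1

    # Variant-calling step: report positions where a non-reference base is frequent.
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos]
        alternatives = [(n, base) for base, n in counts.items() if base != ref_base]
        if not alternatives:
            continue
        alt_count, alt_base = max(alternatives)
        if alt_count / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, alt_count, depth))
    return calls


reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTTCGT"), (3, "TTCGTA"), (4, "TCGTAC")]
print(call_snps(reference, reads))  # [(4, 'A', 'T', 3, 4)]
```

Production pipelines additionally use base and mapping quality scores and handle insertions and deletions, which this sketch deliberately ignores.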

The methods used for SNP mining are quite similar for database-derived and high-throughput sequencing-derived data. NGS technologies such as Illumina GA/Solexa, SOLiD™, and Oxford Nanopore sequencing generate large amounts of sequence data and therefore many new SNPs. The method of choice may vary with the source data and the approach. Different analysis steps apply to the two types of sequence data: reference sequence data, acquired from species for which a reference sequence is accessible, and de novo sequence data. In either case there are three main steps: (i) group the sequence reads according to their sequence similarity, and confirm the identity of reads covering the same part of the genome or having the same transcript origin; (ii) align the reads; and (iii) scan for sequence variants (**Figure 2**).

If a reference sequence is available, the first step is to choose a homology search tool to map the new sequence reads to the reference. Several tools are available for global or local mapping of whole-genome data, such as BLAST and SSAHA. For short reads derived from Illumina platforms, specially developed tools such as SOAP and MAQ are available. The UniGene set is designed for mapping transcript data. The next step is to select a multiple or pairwise alignment tool to align the mapped reads to the reference sequence. Software tools such as CAP3 and Phrap have been used extensively for this purpose.

For de novo sequence data, an additional step called clustering is needed to group the short sequence reads. TeraClu, TGICL, and d2cluster are the most commonly used tools to split the input data and assemble it into individual contigs. After the clustering step, all nucleotides from individual reads at the identical position on the gene are aligned in the same way using CAP3 and Phrap.

The final step is SNP calling or validation. If the fragments come with trace files or base quality scores, PolyBayes, PolyPhred, novoSNP, and SNPdetector are well-known tools. For de novo SNP mining, AutoSNP, QualitySNP, and MAVIANT can be used. Several SNP mining tools and databases are available for plants, such as dbSNP, ESTree DB, POLYMORPH, SNiPlay, AutoSNPdb, and IRIS. Although these are frequently used, reliable, and accurate tools, new tools and platforms continue to be developed.

**4. Importance of SNPs for crop improvement**

Single nucleotide polymorphisms (SNPs) contribute to genetic diversity among individuals of a species and occur throughout the genome at frequencies that differ between species. SNPs can cause phenotypic diversity among individuals, such as differences in plant or fruit color, fruit size, ripening, flowering time adaptation, crop quality, grain yield, or tolerance to various abiotic and biotic factors [42]. A SNP in an exon of a gene can change an amino acid, but it can also be silent. SNPs can also occur in noncoding regions, where they can influence promoter activity and thus gene expression and, ultimately, the production of a functional protein. Therefore, identifying functional SNPs in genes and determining their effects on the phenotype can lead to a better understanding of their effects on gene function for product development [43].

Conventional breeding and marker-assisted breeding are two approaches to plant breeding [42, 44]. However, publications on the application of molecular markers in plant breeding, compared with conventional breeding, have increased significantly over the past 15 years. Plant breeding forms and will

