**5. Nucleic acid identification by sequencing methods**

Sequencing is the basic method for determining the nucleotide sequence of DNA and RNA molecules. Several sequencing methods and variants exist, and the methodology continues to evolve as the properties of nucleic acids are successfully adapted and applied.

Sanger method – in 1977, Sanger et al. [53] reported a sequencing method now known as the Sanger method. It is similar to fragment analysis in that the reaction includes a DNA template, radioactively labeled dNTPs and dideoxynucleotides (ddNTPs: ddA, ddG, ddC, and ddT, which lack the 3′-hydroxyl group that is essential for phosphodiester bond formation), T7 DNA polymerase (which is able to incorporate 2′,3′-dideoxynucleotides), a primer (forward or reverse), and reaction buffer. The annealing, labeling, and termination steps are performed on different thermoblocks, and the polymerase reaction is run at 37°C. At each elongation step, the polymerase can incorporate either a dNTP or a ddNTP, depending on their relative concentrations. Elongation proceeds if a dNTP is added and stops if a ddNTP is added at the 3′ end of the strand. The resulting fragments differ in size (length) and, in gel electrophoresis, migrate toward the positive electrode at a rate inversely proportional to their molecular weight. The method can differentiate fragments that differ in length by only 1 bp.
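The chain-termination principle can be illustrated with a minimal Python sketch. It is an idealized model, not a description of any laboratory software: it assumes that termination occurs at every position at least once, ignores the primer, and uses hypothetical function names.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5):
    """Return the newly synthesized strand (5'->3') for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_fragments(new_strand):
    """For each ddNTP, list the lengths of the fragments that end with that base.

    A fragment of length n ends with the ddNTP incorporated at position n.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest fragment to the longest one."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[n] for n in sorted(base_by_length))


template = "TACGGTCA"                      # hypothetical template, written 3'->5'
lanes = terminated_fragments(synthesized_strand(template))
print(lanes)
print(read_gel(lanes))
```

Because fragments that differ by a single nucleotide are resolved, ordering them by length and noting which ddNTP ended each one recovers the synthesized strand.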

This served as the basis for the development of another approach, in which fluorescent dyes are used instead of radioactive isotopes – cycle sequencing or capillary sequencing (the term "capillary" stems from the fact that electrophoresis is performed in a special matrix in capillary tubes, and fluorescence is detected by means of a laser beam). The reaction components are the same as those in the Sanger method, but are mixed in such a way as to allow thermal cycling: denaturation, annealing, elongation, and generation of a balanced population of short and long fragments, using the same principle as the Sanger method (Figure 5A). Specialized software is used to process the detected fluorescence of each fragment and to plot the result as an electropherogram (Figure 5B); the results can be saved in ABI, FASTA, and PAUP file formats.

A limitation of the method is the size of the fragments that can be sequenced: a maximum of 800–1000 nucleotides per run.

**Figure 5.** Generation of fragments by incorporation of dNTPs and ddNTPs (labeled) in the Sanger method and cycle sequencing (A). Sample electropherogram (portion) of the beta-lactamase OXA-48 gene of *Klebsiella pneumoniae* strain OXA48BG, NCBI, GenBank accession number KJ959619.1 [54] (B).

The sequences obtained using forward and reverse primers are analyzed by programs available either as freeware [55, 56] or as commercial software. It is important always to check the sequencing results in the file generated by the software against the electropherogram: there are often discrepancies between the nucleotide sequence in the file and that in the electropherogram. In such cases, the electropherogram should be considered more reliable, but the background effect should also be accounted for.
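One simple way to find positions that deserve a second look at the electropherogram is to compare the forward read with the reverse complement of the reverse read. The sketch below is only an illustration under strong assumptions: both reads are taken to cover exactly the same region, with no insertions or deletions, and the function names are hypothetical.

```python
_COMPLEMENT_TABLE = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT_TABLE)[::-1]


def read_discrepancies(forward, reverse):
    """List (position, forward_base, reverse_base) where the two reads disagree.

    Positions are 1-based and refer to the forward read.
    Assumes both reads span the same region and contain no indels.
    """
    reverse_as_forward = reverse_complement(reverse)
    if len(forward) != len(reverse_as_forward):
        raise ValueError("reads must be equal in length")
    return [
        (i + 1, f, r)
        for i, (f, r) in enumerate(zip(forward.upper(), reverse_as_forward))
        if f != r
    ]


# Hypothetical reads: the two strands disagree at position 4.
print(read_discrepancies("ATGCCAGT", "ACTGACAT"))
```

Each reported position would then be checked by eye in both electropherograms rather than resolved automatically.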

In the alignment of sequences (first, between the reads obtained with the F and R primers of a sample and, second, between different samples), it is important for them to be equal in length. The next step is sequence analysis – phylogenetic analysis, genetic distance analysis, etc. For example, in phylogenetic analysis, it is particularly important to choose the mathematical model that is most appropriate for each particular case. Rather than choosing a model by hand, one can use software programs especially designed to determine the most appropriate model depending on the sequence length, the number of sequences, potential substitutions, etc. For example, jModelTest [57] analyzes and selects among 89 different models.
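As a small illustration of genetic distance analysis, the sketch below computes the proportion of differing sites (p-distance) between two aligned sequences and corrects it with the Jukes-Cantor model, one of the simplest substitution models. The choice of model and the function names are assumptions made for the example; it does not reproduce what jModelTest does.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be equal in length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3), defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAT")   # hypothetical aligned sequences
print(p, jukes_cantor(p))
```

The length check mirrors the requirement that aligned sequences be equal in length; the correction grows as p increases because multiple substitutions at the same site become more likely.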

### **5.1. Next-Generation Sequencing (NGS) platforms**

The way is now open for nucleic acid research using NGS technologies. This part of the chapter will focus on three NGS platforms: Illumina, Nanopore, and Ion Torrent.

#### **5.2. Illumina sequencing by synthesis**


18 Nucleic Acids - From Basic Aspects to Laboratory Tools

The Illumina technology is based on the principle described above, i.e., incorporation of fluorescently labeled dNTPs by DNA polymerase using a DNA or a cDNA template in a sequence of cycles. The identification of incorporated nucleotides is based on fluorophore excitation. The whole process includes the following steps:

- […] on the operator and the consumables/method used). It is important for the target DNA (total DNA, PCR products, cDNA) to be of the highest possible purity.
- […] fragmentation of the target DNA, followed by 5′ and 3′ ligation of fragments by […]
- […] single-stranded DNA fragment anneals (by hybridization) to a complementary oligonucleotide on the flow-cell surface (resulting in an ∩-shaped bridge-like structure); then, a DNA polymerase enzyme synthesizes a complementary DNA strand by incorporation of unlabeled nucleotides. The resulting double-stranded DNA template is denatured, which releases the two single-stranded DNA fragments both from one another and from the flow-cell surface at one end: if the initial ssDNA fragment had its 5′ end free, then it is restored and the new copy remains bound at its 3′ end (which is complementary to the free 5′ end). Thus, the templates generated by bridge amplification form clusters of single-stranded fragments bound at one end, and the number of DNA templates grows. (Then, the reverse templates are removed and only the forward strands remain – clones identical to the original templates.)
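The cycle-by-cycle identification of nucleotides on which the platform relies can be sketched as a toy model in Python. It assumes that exactly one labeled nucleotide is incorporated and read per cycle; the dye names and the dye-to-base assignment are hypothetical and do not describe the actual chemistry or optics of any instrument.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical assignment of one fluorophore colour to each labeled dNTP.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequencing_cycles(template_3to5, n_cycles):
    """Return the fluorescence signal recorded in each cycle.

    In every cycle the polymerase adds the labeled nucleotide that is
    complementary to the next template base, and its dye is detected.
    """
    signals = []
    for template_base in template_3to5[:n_cycles]:
        incorporated = COMPLEMENT[template_base]
        signals.append(DYE_FOR_BASE[incorporated])
    return signals


def call_bases(signals):
    """Translate the recorded colours back into a nucleotide sequence."""
    return "".join(BASE_FOR_DYE[signal] for signal in signals)


signals = sequencing_cycles("TACGGTCA", n_cycles=5)   # hypothetical template
print(signals)
print(call_bases(signals))
```

The read length equals the number of cycles performed, and every template in a cluster is expected to emit the same signal in a given cycle.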


This technique allows multiplex analysis to be carried out by creating distinct libraries based on the so-called index sequences (short nucleotide sequences specific for each library that can be used like a barcode), which are attached during the library preparation step. Different libraries are first prepared individually and are then combined and loaded in the same flow-cell lane. The labeled libraries are sequenced simultaneously, in a single run of the equipment, and at the end of the process, the sequences are exported in a single file. Next, a demultiplexing algorithm is used to separate the sequences into different files based on their barcode. This is followed by alignment with reference sequences of interest.
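The demultiplexing step can be sketched in Python as follows. This is a minimal illustration that assumes each read arrives together with its index sequence and that only exact index matches are accepted; the index sequences, library names, and function name are hypothetical, and production tools typically also tolerate a limited number of mismatches.

```python
from collections import defaultdict


def demultiplex(reads, index_to_library):
    """Group reads by library according to their index (barcode) sequence.

    reads: iterable of (index_sequence, read_sequence) pairs.
    index_to_library: dict mapping each known index to a library name.
    Reads with an unknown index are collected under 'undetermined'.
    """
    bins = defaultdict(list)
    for index, sequence in reads:
        library = index_to_library.get(index, "undetermined")
        bins[library].append(sequence)
    return dict(bins)


# Hypothetical pooled run containing two indexed libraries.
index_to_library = {"ACGT": "library_1", "GGCC": "library_2"}
reads = [
    ("ACGT", "TTTT"),
    ("GGCC", "AAAA"),
    ("ACGT", "CCCC"),
    ("NNNN", "GGGG"),
]
print(demultiplex(reads, index_to_library))
```

Each resulting group corresponds to one of the separate files produced after the run, which would then be aligned to the reference sequences of interest.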

The platform is compatible with different library preparation methods depending on the purpose of the sequencing analysis (whole genome sequencing, targeted sequencing, mRNA sequencing, 16S rRNA gene sequencing, etc.). This is possible because the sequencing steps that follow library preparation are the same in every case and do not depend on the library preparation method. There are ready-to-use, standardized library preparation kits designed for different sequencing purposes, and new approaches and modifications continue to be developed [58].
