**3. Second-generation sequencing**

sequencing project using the Sanger method and the 373A DNA semiautomated sequencers to generate large batches of cDNA sequences with an average length of 397 bases, which they named "expressed sequence tags" (ESTs) and used as substrates and markers for RNA contig and transcriptome mapping. These improvements, together with the establishment of GenBank (http://www.ncbi.nlm.nih.gov/genbank) in 1982, resulted in the generation of hundreds of thousands more DNA sequences throughout the 1980s and 1990s [34–36] and right up to the beginning of the new millennium, with the publication of the first draft sequence of the human genome [43, 44].

A sudden increase in the number of DNA and RNA sequences generated for GenBank between 1992 and 2004 (http://www.ncbi.nlm.nih.gov/genbank/statistics) resulted mostly from three main factors: the development of automated sequencers and the emergence of service providers; the industrialization of sequencing and the establishment of sequencing centers and international consortiums; and the continued development of computing hardware and software to store and analyze nucleotide sequences. The automated-industrialized approach based on random or shotgun sequencing was initiated by The Institute for Genomic Research (TIGR) in Rockville, Maryland, and resulted in the publication of 337 new human genes and 48 homologous genes from other organisms [42]. By 1999, the TIGR venture had generated 83 million nucleotides of cDNA sequence, 87,000 human cDNA sequences, and the complete genome sequences of two bacterial species, *Haemophilus influenzae* [45] and *Mycoplasma genitalium* [46]. This success was in part due to the development of the TIGR sequence assembler, an innovative computer program to assemble vast amounts of EST data [47]. By the end of 2001, automated sequencers (such as the fully automated Prism 3700 with 96 capillaries, which could produce 1.6×10⁵ bases of sequence data per day), sequencing centers, and international consortiums (such as TIGR in the USA, the Sanger Centre in the United Kingdom, and RIKEN in Japan) had produced the complete genomic sequences of the bacteria *E. coli* and *Bacillus subtilis*, the yeast *Saccharomyces cerevisiae*, the nematode *C. elegans*, the fruit fly *Drosophila melanogaster*, the plant *Arabidopsis thaliana*, and the human genome (see references cited by Stein [48]). Although sequencing was still hugely expensive and time consuming, Sanger sequencing was by then the dominant method.

Pundits then placed DNA sequencing in a postgenomic era and predicted functional genomics, SNPs, and transcript arrays as the future of biological investigation [49, 50]. Indeed, after the establishment of the first Affymetrix GeneChip microarrays in 1996, the decade saw a rapid growth in DNA array technology and applications for various gene expression studies in prokaryotes and eukaryotes [21, 51, 52]. Nevertheless, the output of genomic and RNA sequencing had neither finished nor slowed; new sequencing methods continued to emerge after 2005 to challenge the cost and supremacy of the Sanger dideoxy method [34–36]. These new methods became known as next-generation sequencing because they were designed to employ massively parallel strategies to produce large amounts of sequence from multiple samples at very high throughput and at a high degree of sequence coverage, which compensates for the lower accuracy of individual reads compared with Sanger sequencing. These different approaches brought the cost of sequencing the genome down from \$100 million in 2001 to less than \$10,000 in 2014 [53].
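The idea that a high degree of sequence coverage can offset the lower accuracy of individual reads can be illustrated with a small calculation. The sketch below is not taken from the cited works. It assumes an idealized model in which reads of equal length land uniformly at random on the genome (so the uncovered fraction is approximately e^(-c) for mean coverage c), base-calling errors are independent between reads, and the consensus base is chosen by simple majority vote. The function names and the example numbers are hypothetical.

```python
from math import comb, exp


def mean_coverage(num_reads, read_length, genome_size):
    """Mean depth of coverage c = N * L / G (assumes all reads have length L)."""
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage):
    """Expected fraction of positions with no read, assuming uniformly
    random read placement (Poisson approximation): exp(-c)."""
    return exp(-coverage)


def majority_error(per_read_error, depth):
    """Probability that strictly more than half of `depth` independent reads
    are wrong at one position (simplified right/wrong error model).
    Use odd depths to avoid ties."""
    need = depth // 2 + 1
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(need, depth + 1)
    )


# Hypothetical example: one million 100-base reads on a 5-megabase genome.
c = mean_coverage(1_000_000, 100, 5_000_000)
print("mean coverage:", c)
print("expected uncovered fraction:", fraction_uncovered(c))

# Consensus error for a 1% per-read error rate at increasing odd depths.
for depth in (1, 3, 5, 11):
    print(depth, majority_error(0.01, depth))
```

Under these assumptions the consensus error falls steeply as depth increases, which is why platforms with less accurate individual reads can still yield reliable sequence. Real data violate the independence assumption (for example, through systematic, context-dependent errors), so the numbers should be read as an upper bound on the benefit.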


A more detailed history of the development of the first- and next-generation sequencing platforms has been presented in a number of previous reviews [2–6, 11, 34–36, 54]. Table 1 outlines the basic features and performances of the common next-generation sequencing platforms. The basic characteristics of second-generation sequencing technology are as follows. Shotgun sequencing of randomly fragmented genomic DNA (fgDNA) or of cDNA reverse transcribed from RNA is performed without the need for cloning via a foreign host cell; instead, linker and/or adapter sequences are ligated to the fgDNA or cDNA to construct template libraries. Library amplification is performed on a solid surface or on beads isolated within miniature emulsion droplets or arrays. Nucleotide incorporation is monitored directly during the sequencing procedure, either by luminescence detection or by changes in electrical charge. NGS generates many millions of short nucleotide reads in parallel in a much shorter time than the Sanger sequencing method. The reads generated by NGS are digital, that is, countable, and therefore enable direct quantitative comparisons. Either single-end or paired-end reads can be obtained from the fragment ends.
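Several of the characteristics listed above (random fragmentation, reads taken from one or both fragment ends, and countable "digital" reads) can be made concrete with a toy simulation. The sketch below is purely illustrative and does not model any particular platform: the function names, fragment sizes, and read length are hypothetical, adapters and sequencing errors are ignored, and the second read of each pair is assumed to be reported as the reverse complement of the fragment's far end. Counts per million is used here as one simple way of scaling read counts for comparison between samples.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def random_fragments(template, n_fragments, min_len, max_len, rng):
    """Cut n_fragments random substrings from template (toy shotgun library).
    Requires min_len <= len(template)."""
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, min(max_len, len(template)))
        start = rng.randint(0, len(template) - length)
        fragments.append(template[start:start + length])
    return fragments


def paired_end_reads(fragment, read_length):
    """Read 1 is the start of the fragment; read 2 is the reverse
    complement of its end, as if read from the opposite strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million."""
    total = sum(counts.values())
    return {name: 1e6 * n / total for name, n in counts.items()}


# Hypothetical usage: a random 200-base template and three fragments.
rng = random.Random(0)
template = "".join(rng.choice("ACGT") for _ in range(200))
for fragment in random_fragments(template, 3, 50, 80, rng):
    print(paired_end_reads(fragment, 25))

# Hypothetical read counts for two transcripts in one sample.
print(counts_per_million({"transcript_a": 1, "transcript_b": 3}))
```

Because each read is a discrete observation, the counts for a given gene or region can be compared directly between samples once they are scaled to a common total, which is the sense in which the reads are described as digital.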
