Numerous DNA and RNA library kits and machines are available for the semiautomated or fully automated preparation of libraries for both second- and third-generation sequencing. These include GemCode from 10x Genomics (http://10xgenomics.com) and RainDance's ThunderStorm (http://raindancetech.com) for all sequencing platforms, cBot for the Illumina platform [58], and Ion Chef and Ion OneTouch for the Ion Torrent platform [59]. All of these kits and auxiliary machines aim to reduce the workload and costs of the main platform sequencers. The DNA libraries are labeled with barcode sample tags, such as the multiplex identifier (MID) for Roche/454 sequencing, so that the libraries can be pooled and sequenced together in a multiplex amplicon sequencing step, thereby maximizing the sequence output of each sequencing run. After library construction, the DNA fragments are clonally amplified by emulsion PCR with microbeads [4, 6, 60] or by solid-phase PCR using primers attached to a solid surface [4, 61, 62] to generate enough single-stranded DNA molecules and a detectable signal for reliable sequencing data [54]. Roche 454, Life Technologies' SOLiD, and Ion Torrent platforms use emulsion PCR, whereas Illumina's HiSeq/MiSeq platforms use solid-phase PCR [4]. More recently, isothermal PCR amplification on the solid surface of a flow cell [62] was developed for the SOLiD 5500 W series of sequencing machines.

A problem with preparing sequencing libraries by PCR amplification is that PCR introduces GC bias, a major source of unwanted variation and errors in sequencing coverage [63]. Using alternatives to PCR amplification improves library complexity and the coverage of high-GC regions and reduces the number of duplicate reads [64]. A number of PCR-free library preparation kits are available commercially, such as NEXTflex PCR-Free from Bioo Scientific, the Accel NGS 2S PCR-free library kit from Swift Biosciences, and the Illumina TruSeq DNA PCR-Free Sample Preparation Kit that uses ligation amplification for Illumina and other sequencing platform systems.

**3.2. NGS platforms**

The main features and performance of five commonly used second-generation sequencing technologies, which have been reviewed in detail by others [2–4, 11, 36, 54], are shown in Table 1.

| NGS platform/company/max output per run | Read length per run (bp) | No. reads per run | Time (h or days) | Cost per $10^{6}$ bases (USD) | Raw error rate (%) | Platform cost (USD approx.) | Chemistry |
|---|---|---|---|---|---|---|---|
| *First generation* | | | | | | | |
| Sanger/Life Technologies/84 kb | 800 | 1 | 2 h | 2400 | 0.3 | 95,000 | Dideoxy terminator |
| *Second generation* | | | | | | | |
| 454 GS FLX+/Roche/0.7 Gb | 700 | $1 \times 10^{6}$ | 24/48 h | 10 | 1 | 500,000 | Pyrosequencing |
| GS Junior/Roche/70 Mb | 500 | $1 \times 10^{5}$ | 18 h | 9 | | 100,000 | Pyrosequencing |
| HiSeq/Illumina/1500 Gb | 2×150 | $5 \times 10^{9}$ | 27/240 h | 0.1 | 0.8 | 750,000 | Reversible terminators |
| MiSeq/Illumina/15 Gb | 2×300 | $3 \times 10^{8}$ | 27 h | 0.13 | 0.8 | 125,000 | Reversible terminators |
| SOLiD/Life Technologies/120 Gb | 50 | $1 \times 10^{9}$ | 14 days | 0.13 | 0.01 | 350,000 | Ligation |
| Revolocity/BGI/3000 Gb | 50 | $1 \times 10^{9}$ | 14 days | 0.01 | 0.01 | $12 \times 10^{6}$ | Nanoball/ligation |

**Table 1.** Basic features and performance of NGS platforms. Sources are [4, 11, 20, 54, 115]. For comparison of the NGS outputs, the human genome has $3 \times 10^{9}$ bp or 3 Gb.

#### *3.2.1. Roche 454 pyrosequencing*

Roche 454 pyrosequencing, a form of sequencing by synthesis (SBS), was the first commercially successful second-generation sequencing system. It was developed by 454 Life Sciences in 2005 and acquired by Roche in 2007 (http://www.my454.com). This technology uses pyrosequencing chemistry, in which visible light is produced by an ATP sulfurylase, luciferase, and DNA polymerase enzyme system in proportion to the amount of pyrophosphate released during repeated nucleotide incorporation into the newly synthesized DNA chain, and is then detected and measured [2, 4, 6]. The system was miniaturized and massively parallelized using PicoTiterPlates to produce more than 200,000 reads of 100 to 150 bp each, with an output of 20 Mb per run, in 2005 [6]. The upgraded 454 GS FLX Titanium system released by Roche in 2008 improved the average read length to 700 bp with an accuracy of 99.997% and an output of 0.7 Gb of data per run within 24 h. The GS Junior bench-top sequencer produced a read length of 700 bp with a throughput of 70 Mb and a run time of 10 to 18 h. The major drawbacks of this technology are the high cost of reagents and the high error rates in homopolymer repeats. The estimated cost per million bases is \$10 for Roche 454 compared with \$0.07 for the Illumina HiSeq 2000 [54]. A more serious challenge for users of this technology is Roche's announcement that it will no longer supply or service the 454 sequencing machines or the pyrosequencing reagents and chemicals after 2016 [65].
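The link between light signal and nucleotide incorporation, and the reason homopolymer repeats are error-prone, can be illustrated with a toy base caller. The Python sketch below is an illustration only and is not Roche's base-calling algorithm: the flow order `TACG`, the normalized signal values, and the function name `call_flowgram` are assumptions made for this example. Each flow's signal is rounded to the nearest integer to estimate how many identical bases were incorporated in that flow.

```python
def call_flowgram(signals, flow_order="TACG"):
    """Convert normalized per-flow light signals into a base sequence.

    Toy model (an assumption for illustration): a signal near 1.0 means one
    nucleotide was incorporated in that flow, near 2.0 means two, and so on.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporated nucleotides.
        count = max(0, int(signal + 0.5))
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical signals for flows T, A, C, G, T, A, C, G.
clean = [1.02, 0.05, 2.10, 0.97, 0.03, 3.95, 0.08, 1.01]
print(call_flowgram(clean))

# The same read with a slightly weaker homopolymer signal in the sixth flow (A).
noisy = [1.02, 0.05, 2.10, 0.97, 0.03, 3.45, 0.08, 1.01]
print(call_flowgram(noisy))
```

In this toy model, a modest drop in one flow's signal changes the called length of the A homopolymer, which produces an insertion or deletion error rather than a substitution.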

#### *3.2.2. Illumina (Solexa) HiSeq and MiSeq sequencing*

Illumina (http://www.illumina.com) purchased the Solexa Genome Analyzer in 2006 and commercialized it in 2007 [66, 67]. Today, it is the most successful sequencing system, with a claimed >70% share of the market, particularly through the HiSeq and MiSeq platforms. The Illumina sequencer differs from the Roche 454 sequencer in that it performs sequencing by synthesis with removable, fluorescently labeled chain-terminating nucleotides, which produce a larger output at lower reagent cost [4, 6, 66]. The clonally enriched template DNA for sequencing is generated by PCR bridge amplification (also known as cluster generation) into miniaturized colonies called polonies [66]. Compared with most other systems, the output of sequencing data per run is higher (600 Gb), the read lengths are shorter (approximately 100 bp), the cost is lower, and the run times are much longer (3–10 days) [54]. Illumina provides six industrial-level sequencing machines (NextSeq 500; HiSeq 2500, 3000, and 4000; and HiSeq X Five and X Ten) with mid to high output (120–1500 Gb), as well as a compact laboratory sequencer, the MiSeq, which, although small, has an output of 0.3 to 15 Gb and fast turnaround times suitable for targeted sequencing in clinical and small-laboratory applications [68]. The MiSeq uses the same sequencing and polony technology as the high-end machines, but it can provide sequencing results in 1 to 2 days at much reduced cost [54]. Illumina's new method of synthetic long reads using TruSeq technology reportedly improves *de novo* assembly and resolves complex, highly repetitive transposable elements [69].
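To put these output figures in context, the expected mean fold coverage of a genome can be estimated as the total number of sequenced bases divided by the genome size. The Python sketch below uses the approximate human genome size of 3 Gb and run outputs quoted in this section. The function names `fold_coverage` and `reads_to_output_bp` are hypothetical, and the estimate is deliberately simplified: it ignores duplicate reads, unmapped reads, and uneven coverage.

```python
HUMAN_GENOME_BP = 3e9  # approximate human genome size (3 Gb)


def fold_coverage(output_bp, genome_bp=HUMAN_GENOME_BP):
    """Mean fold coverage = total sequenced bases / genome size."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return output_bp / genome_bp


def reads_to_output_bp(n_reads, read_length_bp):
    """Total bases produced by n_reads reads of a fixed read length."""
    return n_reads * read_length_bp


# Run outputs in base pairs, taken from the figures quoted in the text.
runs = {
    "HiSeq run, 600 Gb": 600e9,
    "MiSeq run, 15 Gb": 15e9,
    "454 GS FLX+ run, 0.7 Gb": 0.7e9,
}
for name, output_bp in runs.items():
    print(f"{name}: about {fold_coverage(output_bp):.2f}x of a human genome")
```

The same arithmetic links read count and read length to output: one million reads of 700 bp give `reads_to_output_bp(1_000_000, 700)`, or 0.7 Gb.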

#### *3.2.3. Sequencing by Oligonucleotide Ligation and Detection (SOLiD)*

Sequencing by Oligonucleotide Ligation and Detection (SOLiD) is a next-generation sequencing instrument marketed by Life Technologies (http://www.lifetechnologies.com) and first released in 2008 by Applied Biosystems Instruments (ABI). It is based on 2-nucleotide sequencing by ligation (SBL) [4, 6, 66], a procedure that involves the sequential annealing of probes to the template and their subsequent ligation. Sequencers on the market today, such as the 5500 W series, are suitable for small- and large-scale projects involving whole genomes, exomes, and transcriptomes. Previously, sample preparation and amplification were similar to those of Roche 454 sequencing [66]. However, the upgrade to Wildfire chemistry has enabled greater throughput and simpler workflows by replacing beads with direct *in situ* amplification on FlowChips and paired-end sequencing [62]. The SOLiD 5500 W series sequencing reactions still use fluorescently labeled octamer probes in repeated cycles of annealing and ligation, which are interrogated and eventually decoded in a complex subtractive process using Exact Call Chemistry that has been well described by others [2, 36, 66]. The advantage of this method is its accuracy, because each base is interrogated twice. The major disadvantages are the short read lengths (50–75 bp), the very long run times of 7 to 14 days, and the need for state-of-the-art computational infrastructure and expert computing personnel to analyze the raw data.
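The idea that each base is interrogated twice can be illustrated with the commonly described two-base (color-space) encoding, in which every pair of adjacent bases is reported as one of four colors. The Python sketch below assumes the usual mapping in which a color is the XOR of 2-bit base codes (A=0, C=1, G=2, T=3). The function names are hypothetical, and the sketch does not model Exact Call Chemistry or the instrument's actual decoding software.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colors(sequence):
    """Return the first base plus one color (0-3) per adjacent base pair."""
    colors = [
        BASE_TO_BITS[a] ^ BASE_TO_BITS[b]
        for a, b in zip(sequence, sequence[1:])
    ]
    return sequence[0], colors


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and its colors."""
    bases = [first_base]
    for color in colors:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ color])
    return "".join(bases)


reference = "ATGGCA"
first, ref_colors = encode_colors(reference)
print(ref_colors)
assert decode_colors(first, ref_colors) == reference

# A real single-base variant (third base G -> T) changes two adjacent colors.
_, snp_colors = encode_colors("ATTGCA")
changed = [i for i, (x, y) in enumerate(zip(ref_colors, snp_colors)) if x != y]
print(changed)
```

Because every base contributes to two neighboring colors, a true single-base variant alters two adjacent colors, whereas an isolated single-color change is more likely a measurement error. This is also why the raw color-space data require specialized analysis before they can be read as bases.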
