#### *3.2.4. DNA nanoball sequencing by BGI Retrovolocity*

Complete Genomics (http://www.completegenomics.com) developed DNA nanoball sequencing (DNBS) as a hybrid of sequencing by hybridization and ligation [70]. Small fragments (440–500 bp) of genomic DNA or cDNA are amplified into DNA nanoballs by rolling-circle replication, which requires the construction of complete circular templates before the nanoballs are generated. The DNA nanoballs are deposited onto an arrayed flow cell at high density, with one nanoball per well. Up to 10 bases of the template are read in the 5′ and 3′ directions from each adapter. Since only short sequences adjacent to the adapters are read, this sequencing format resembles a multiplexed form of mate-pair sequencing similar to using Exact Call Chemistry in SOLiD sequencing [2, 36, 66]. Ligated sequencing probes are then removed, and a new pool of probes is added, specific for a different interrogated position. The cycle of annealing, ligation, washing, and image recording is repeated for all 10 positions adjacent to one terminus of one adapter. This process is then repeated for the seven remaining adapter termini. Although the developers have sequenced the whole human genome, the major disadvantages of DNBS are the short read length and the long duration of sequencing projects. The claimed reagent cost for sequencing a whole human genome is under \$5000. The major advantage of this approach is the high density of the arrays and therefore the high number of DNBs (~350 million) that can be sequenced. In 2015, the Chinese genomics service company BGI-Shenzhen acquired Complete Genomics and introduced the Retrovolocity system for large-scale, high-quality whole-genome and whole-exome sequencing with 50x coverage per genome and with the sample-to-assembled-genome process completed in less than 8 days [71]. Complete Genomics claims to have sequenced more than 20,000 whole human genomes over 5 years and has published widely on the use of its NGS platform. The company provides public access to a repository of data from 69 human genomes and a cancer data set of two matched tumor and normal sample pairs at http://www.completegenomics.com/public-data/.
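The read-length limitation of DNBS can be made concrete with a little arithmetic. The sketch below is only an upper-bound estimate built from the figures quoted in the text (up to 10 bases per adapter terminus, eight termini in total, roughly 350 million DNBs). Treating the 350 million DNBs as the content of a single array is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Figures taken from the text; how they combine is an illustrative assumption.
BASES_PER_TERMINUS = 10         # "up to 10 bases" read from each adapter terminus
ADAPTER_TERMINI = 8             # one terminus plus the "seven remaining" termini
DNBS_PER_ARRAY = 350_000_000    # "~350 million" DNBs, assumed here to be one array

# Upper bound on bases read from a single nanoball.
max_bases_per_dnb = BASES_PER_TERMINUS * ADAPTER_TERMINI

# Upper bound on raw bases from one fully loaded array.
max_bases_per_array = max_bases_per_dnb * DNBS_PER_ARRAY

print(max_bases_per_dnb)            # bases per DNB
print(max_bases_per_array / 1e9)    # gigabases per array
```

Under these assumptions each nanoball contributes at most a few dozen bases, so the platform depends on array density rather than read length for its throughput.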

#### *3.2.5. Ion Torrent*

tides that are able to produce a larger output at lower reagent cost [4, 6, 66]. The clonally enriched template DNA for sequencing is generated by PCR bridge amplification (also known as cluster generation) into miniaturized colonies called polonies [66]. Compared with most other systems, the output of sequencing data per run is higher (600 Gb), the read lengths are shorter (approximately 100 bp), the cost is lower, and the run times are much longer (3–10 days) [54]. Illumina provides six industrial-level sequencing machines (NextSeq 500, HiSeq series 2500, 3000, and 4000, and HiSeq X series Five and Ten) with mid to high output (120–1500 Gb), as well as a compact laboratory sequencer called the MiSeq, which, although small in size, has an output of 0.3 to 15 Gb and fast turnover rates suitable for targeted sequencing in clinical and small laboratory applications [68]. The MiSeq uses the same sequencing and polony technology as the high-end machines, but it can provide sequencing results in 1 to 2 days at much reduced cost [54]. Illumina's new method of synthetic long reads using TruSeq technology apparently improves *de novo* assembly and resolves complex, highly repetitive transposable elements [69].

Sequencing by Oligonucleotide Ligation and Detection (SOLiD) is a next-generation sequencing instrument marketed by Life Technologies (http://www.lifetechnologies.com) and first released in 2008 by Applied Biosystems Instruments (ABI). It is based on 2-nucleotide sequencing by ligation (SBL) [4, 6, 66]. This procedure involves sequential annealing of probes to the template and their subsequent ligation. Sequencers on the market today, such as the 5500 W series, are suitable for small- and large-scale projects involving whole genomes, exomes, and transcriptomes. Previously, sample preparation and amplification were similar to those of Roche 454 sequencing [66]. However, the upgrades to Wildfire chemistry have enabled greater throughput and simpler workflows by replacing beads with direct *in situ* amplification on FlowChips and paired-end sequencing [62]. The SOLiD 5500 W series sequencing reactions still use fluorescently labeled octamer probes in repeated cycles of annealing and ligation that are interrogated and eventually deciphered in a complex subtractive process using Exact Call Chemistry, which has been well described by others [2, 36, 66]. The advantage of this method is its accuracy, because each base is interrogated twice. The major disadvantages are the short read lengths (50–75 bp), the very long run times of 7 to 14 days, and the need for state-of-the-art computational infrastructure and expert computing personnel for analysis of the raw data.


#### *3.2.3. Sequencing by Oligonucleotide Ligation and Detection (SOLiD)*

10 Next Generation Sequencing - Advances, Applications and Challenges

Ion Torrent technology (http://www.iontorrent.com) was developed by the inventors of 454 sequencing [60], introducing two major changes. Firstly, the nucleotide sequences are detected electronically by changes in the pH of the surrounding solution, which are proportional to the number of incorporated nucleotides, rather than by the generation of light and detection using optical components. Secondly, the sequencing reaction is performed within a microchip that is amalgamated with flow cells and electronic sensors at the bottom of each cell. The incorporated nucleotide is converted to an electronic signal detected by the electronic sensors. The two sequencers on the market that use Ion Torrent technology are the high-throughput Proton sequencer with more than 165 million sensors and the Ion Personal Genome Machine (PGM), a bench-top sequencer with 11.1 million sensors. There are four sequencing chips to choose from [72]. The Ion PI Chip is used with the Proton sequencer, and the Ion 314, 316, or 318 Chips are used with the Ion PGM. The Ion 314 Chip provides the fewest reads at 0.5 million reads per chip, whereas the Ion 318 Chip provides the most reads at up to 5.5 million reads per chip. The Proton sequencer provides a higher throughput (10–100 Gb vs. 20 Mb–1 Gb) and more reads per run (660 Mb vs. 11 Mb) than the PGM chips, but the read lengths (200–500 bp), run time (4–5 h), and accuracy (99%) are similar [54, 72]. Sample preparation for the generation of DNA libraries is similar to that used for Roche 454 sequencing but can be simplified with the use of the Ion Chef system for automated template preparation and chip loading. The Ion Torrent chip is used with an ion-sensitive field-effect transistor sensor that has been engineered to detect individual protons produced during the sequencing reaction. The chip is placed within the flow cell and is sequentially flushed with individual unlabeled dNTPs in the presence of the DNA polymerase. Incorporation of a nucleotide into the DNA chain releases protons (H⁺) and changes the pH of the surrounding solution in proportion to the number of incorporated nucleotides. The major disadvantages of the system are problems in reading homopolymer stretches and repeats. The major advantages seem to be the relatively longer read lengths, flexible workflow, reduced turnaround time, and a lower price than those provided by the other platforms [54, 73].
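The relationship between the pH signal and the number of incorporated nucleotides can be sketched as a simple flow-space base caller. This is a minimal illustration, not the vendor's algorithm. It assumes the signal of each flow has already been normalized so that a value near 1.0 corresponds to one incorporation, and the flow order `TACG`, the example signals, and the function name `call_bases` are all hypothetical.

```python
def call_bases(flow_order, signals):
    """Convert per-flow signals into bases.

    Each signal is assumed to be normalized so that ~1.0 means one
    incorporated nucleotide, ~2.0 a homopolymer of length two, and so on.
    """
    called = []
    for nucleotide, signal in zip(flow_order, signals):
        # Round to the nearest whole number of incorporations; never negative.
        run_length = max(0, round(signal))
        called.append(nucleotide * run_length)
    return "".join(called)


# Two cycles of a hypothetical T, A, C, G flow order with made-up signals.
flow_order = "TACG" * 2
signals = [1.04, 0.02, 0.97, 2.10, 0.05, 2.93, 0.00, 1.01]
print(call_bases(flow_order, signals))  # TCGGAAAG
```

The sketch also hints at the homopolymer problem noted above. The signal must be resolved into an integer run length, so the same absolute noise that is harmless when distinguishing 0 from 1 becomes harder to tolerate when distinguishing, for example, 6 from 7 identical bases.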
