**2.2. Next Generation Sequencing (NGS)**

### *2.2.1. Principle of NGS technologies*

The development of Next Generation Sequencing (NGS), also called Massively Parallel Sequencing (MPS), with its immense capacity (up to 1 terabase (Tb) of data per run with the Illumina HiSeq 2500 system), represents major technical progress in the field of human genetics. Since late 2004, three principal NGS suppliers have made sequencers commercially available (listed in Table 2): (i) 454 GS FLX and GS Junior from Roche, (ii) Genome Analyzer, HiSeq and MiSeq from Illumina, and (iii) SOLiD and Ion Torrent PGM from Life Technologies-Applied Biosystems (AB) [15]. These technologies differ in their sample library preparation workflows, which are tailored to each of the current NGS sequencers (e.g. Illumina, Life Technologies, Roche).

454 pyrosequencing is a sequencing-by-synthesis method that measures the release of inorganic pyrophosphate upon incorporation of nucleotides, converting it into a luciferase chemiluminescent signal through a series of enzymatic reactions. Ion Torrent semiconductor technology is also based on a sequencing-by-synthesis approach but measures the pH changes (instead of light) induced by the release of hydrogen ions as nucleotides are incorporated. SOLiD technology is a ligation-based sequencing system: DNA ligase is used to identify the nucleotide present at a given position in a DNA sequence, and each base is read twice, which increases accuracy, even for homopolymeric regions. Base detection uses a mixture of labelled oligonucleotides, which query the input strand with ligase. In the Illumina system, clonal amplification is performed by a process termed 'bridge amplification', which comprises two basic steps: initial priming and extension of the single-stranded, single-molecule template, followed by bridge amplification of the immobilized template with immediately adjacent primers to form clusters. For sequencing, only dye-labelled terminators are added; the sequence at that position is then determined for all clusters; next, the dye is cleaved and another round of dye-labelled terminators is added.
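The flow-based sequencing-by-synthesis principle shared by pyrosequencing and semiconductor sequencing can be illustrated with a toy model. The sketch below is not taken from any vendor software: it assumes that one nucleotide species is offered per flow in a fixed, repeating order (the `flow_order` default is a hypothetical choice) and that the signal, whether light or a pH change, is noise-free and exactly proportional to the number of bases incorporated in that flow.

```python
from itertools import cycle


def flowgram(read_sequence, flow_order="TACG", max_flows=400):
    """Return idealised (base, signal) pairs for a flow-based read.

    Assumption: the signal of a flow equals the number of consecutive bases
    of the offered species that are incorporated (the homopolymer length).
    Real instruments add noise and phasing effects, which are ignored here.
    """
    signals = []
    pos = 0
    for n_flows, base in enumerate(cycle(flow_order)):
        # Stop when the whole read is synthesised or the flow budget is spent
        if pos >= len(read_sequence) or n_flows >= max_flows:
            break
        run = 0
        while pos < len(read_sequence) and read_sequence[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


def call_bases(signals):
    """Convert idealised flow signals back into a base sequence."""
    return "".join(base * run for base, run in signals)


if __name__ == "__main__":
    for base, signal in flowgram("TTAGC"):
        print(base, signal)
```

In this model a homopolymer such as `TT` produces a single flow with a doubled signal rather than two separate events, which hints at why estimating homopolymer length from signal intensity is harder for flow-based chemistries than for approaches that interrogate one base per cycle.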


| Platform | GAII, HiSeq & MiSeq | Ion Torrent | SOLiD | GS FLX & GS Junior |
| --- | --- | --- | --- | --- |
| Methodology | Sequencing by synthesis (reversible termination) | Sequencing by synthesis | Sequencing by ligation | Sequencing by synthesis (pyrosequencing) |
| Loading | Adaptors on template DNA bind high density primers across surface of slide | Adaptors on template DNA bind primers on beads, one molecule per bead | Adaptors on template DNA bind primers on beads, one molecule per bead | Adaptors on template DNA bind primers on beads, one molecule per bead |
| Clonal amplification | Bridge PCR: surface | Emulsion PCR: clusters on beads | Emulsion PCR: clusters on beads | Emulsion PCR: clusters on beads |
| Parallelisation | Random array on flow cell | Beads loaded on a chip | Beads bonded to high density glass slide | Beads loaded onto high density plate |
| Detection | Fluorescence | H+ ions (sensitive pH meter) | Fluorescence | Light (luciferase) |

**Table 2.** Overview of the main NGS technologies

### *2.2.1.1. Target enrichment*

Targeted re-sequencing isolates genomic regions of interest in a sample library, making it possible to focus efficiently and cost-effectively on a small subset of the genome, such as an exome, a particular chromosome, a set of genes or a single region of interest such as a whole gene. Two main strategies can be envisaged: capture [16, 17] or amplification [18-20] of the relevant genomic DNA, as shown in Table 3.


**Table 3.** Main strategies proposed for target enrichment
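Whichever enrichment strategy is chosen, its efficiency is commonly summarised as the fraction of sequenced reads that fall on the intended regions. The sketch below shows one simple way to compute such a figure. The function name, the interval convention (0-based, half-open `(chrom, start, end)` tuples) and the coordinates are all illustrative assumptions, not part of any specific enrichment kit or pipeline.

```python
def on_target_fraction(read_intervals, target_intervals):
    """Fraction of reads overlapping at least one target region.

    Intervals are (chrom, start, end) tuples, assumed 0-based and half-open.
    A read counts as on-target if it shares at least one base with a target.
    """
    if not read_intervals:
        return 0.0

    # Group targets by chromosome so each read is only compared locally
    targets_by_chrom = {}
    for chrom, start, end in target_intervals:
        targets_by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in read_intervals:
        if any(start < t_end and t_start < end
               for t_start, t_end in targets_by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(read_intervals)


if __name__ == "__main__":
    # Made-up coordinates for illustration only
    targets = [("chr7", 1000, 2000)]
    reads = [
        ("chr7", 950, 1050),   # overlaps the target
        ("chr7", 2000, 2100),  # starts just past the target end
        ("chr1", 1000, 1100),  # wrong chromosome
        ("chr7", 1500, 1600),  # inside the target
    ]
    print(on_target_fraction(reads, targets))
```

The linear scan over targets is adequate for a handful of genes; for an exome-scale target list an interval tree or sorted-interval search would be a more reasonable choice.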

### *2.2.1.2. Advantages and limits*

The main advantages of the technology relate to its capacity to process a large number of samples in parallel. NGS technologies save time and lower the cost per patient of step 1 and, particularly, step 2 molecular analyses. However, they also have significant limitations, such as high error rates, enrichment of rare variants and a large proportion of missing values, as well as the fact that most current analytical methods are designed for population-based association studies. With second generation sequencing, the isolated targets must be clonally amplified to generate sufficient signal for detection during the sequencing run, producing clusters of many thousands of identical DNA targets. In addition, each NGS platform generates different read lengths, ranging from short (e.g. 35 bases) to long (over 500 bases). For a number of applications, including targeted re-sequencing, ChIP-Seq and RNA-Seq, short reads are highly informative and adequate. Conversely, longer reads are more suitable for *de novo* genome assembly, mapping of high-homology regions (related gene families and pseudogenes) and sequencing of repetitive DNA regions, such as introns. This is an important consideration, since short read lengths can make accurate assembly and alignment computationally challenging.
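To make the effect of read length concrete, the relationship between read number, read length and sequencing depth can be sketched with the standard mean-coverage approximation, C = N × L / G, where N is the number of reads, L the read length and G the size of the sequenced target. The example below is a minimal illustration that assumes every read maps uniquely and uniformly to the target; the 250 kb target size and the 30x depth are hypothetical values, while the 35 and 500 base read lengths are the ones quoted above.

```python
import math


def mean_coverage(n_reads, read_length, target_size):
    """Mean depth of coverage, C = N * L / G (idealised, uniform mapping)."""
    return n_reads * read_length / target_size


def reads_needed(coverage, read_length, target_size):
    """Smallest number of reads giving at least the requested mean coverage."""
    return math.ceil(coverage * target_size / read_length)


if __name__ == "__main__":
    target_size = 250_000  # hypothetical 250 kb targeted region
    depth = 30             # hypothetical desired mean depth
    for read_length in (35, 500):
        print(read_length, reads_needed(depth, read_length, target_size))
```

Under these assumptions, reaching the same mean depth takes roughly fourteen times more 35-base reads than 500-base reads, so short-read platforms compensate with much higher read counts per run.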

For some clinical applications, because NGS produces massive amounts of data, analysis and interpretation are time-consuming, non-trivial and a real challenge, even when only specific portions of a genome are analysed.
