*2.2.1.3. Bioinformatics software to analyse NGS data*

The generated data are then analysed with bioinformatics tools. Quality criteria such as call confidence and coverage help ensure optimal sensitivity and specificity of a sequencing run. Bioinformatics analyses are frequently performed with software developed by the sequencer manufacturers and comprise three steps: (1) base calling and quality score computation, (2) assembly and alignment, and (3) variant calling and annotation. Some laboratories have developed their own in-house pipelines, mainly to apply filters that make it easy to focus on disease-causing mutations. Others use freely available tools to realign sequencing reads, such as BWA (http://bio-bwa.sourceforge.net), or to perform a new variant calling, such as SAMtools and GATK [21-23]. Viewers such as the Integrative Genomics Viewer (http://www.broadinstitute.org/igv/) allow visual inspection of the data, and the Galaxy website is an integrative platform for viewing and processing files generated by NGS. Many databases useful for variant annotation, some of which are integrated into other programs or pipelines, are publicly available, including the 1000 Genomes Project [24] and the dbSNP database [25] (further described in part 3.1.1).
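To make the filtering step concrete, the following Python code is a minimal sketch of the kind of filter an in-house pipeline might apply after variant calling, assuming each call has already been annotated with its call quality, read depth and, where available, a population allele frequency (for example from the 1000 Genomes Project). The `Variant` class, the `filter_variants` function, the thresholds and the positions are hypothetical choices for illustration, not values recommended in this chapter or used by any specific commercial software.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int                        # 1-based position, as in VCF
    ref: str
    alt: str
    qual: float                     # Phred-scaled call quality
    depth: int                      # number of reads covering the site
    pop_af: Optional[float] = None  # population allele frequency; None if absent from the database

    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


def filter_variants(variants: Iterable[Variant],
                    known_pathogenic: Set[VariantKey],
                    min_qual: float = 30.0,   # hypothetical threshold
                    min_depth: int = 30,      # hypothetical threshold
                    max_pop_af: float = 0.01  # hypothetical threshold
                    ) -> List[Variant]:
    """Return confidently called variants that are either known pathogenic or rare."""
    kept = []
    for v in variants:
        # 1. Discard low-confidence or poorly covered calls.
        if v.qual < min_qual or v.depth < min_depth:
            continue
        # 2. Never drop a variant already listed as disease-causing,
        #    whatever its population frequency.
        if v.key() in known_pathogenic:
            kept.append(v)
            continue
        # 3. Drop common polymorphisms; variants absent from the
        #    population database are kept as potentially novel.
        if v.pop_af is None or v.pop_af <= max_pop_af:
            kept.append(v)
    return kept


# Hypothetical calls on chromosome 7 (positions are placeholders, not real CFTR variants).
calls = [
    Variant("7", 1000, "A", "G", qual=50, depth=120, pop_af=0.30),  # common polymorphism
    Variant("7", 2000, "C", "T", qual=60, depth=150),               # novel, well covered
    Variant("7", 3000, "G", "A", qual=12, depth=8),                 # low confidence
    Variant("7", 4000, "T", "C", qual=80, depth=200, pop_af=0.02),  # listed as pathogenic
]
known = {("7", 4000, "T", "C")}
print([v.pos for v in filter_variants(calls, known)])  # [2000, 4000]
```

The list of known disease-causing variants is checked before the frequency filter because some CF-causing variants can be relatively frequent in certain populations; a frequency cut-off alone could discard them.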

Several avenues are currently being explored (e.g. improvement of bioinformatics tools, comparison of NGS approaches between CF laboratories, development of databases including newly detected NGS variants, publication of guidelines and definition of a model diagnostic report) to make NGS-based CF molecular diagnosis suitable for different clinical and familial situations.

### *2.2.2. Indications and choice of CFTR analysed regions*

For Mendelian diseases caused by mutations in a single gene, such as *CFTR* in cystic fibrosis, sequencing the entire genome or exome is still unnecessary and expensive. Collaborations between companies specialized in molecular diagnosis and academic laboratories have recently led to new NGS-based molecular diagnosis tools. Combining (i) enrichment of the regions of interest by hybrid capture, circularization or polymerase chain reaction (PCR) with (ii) high-throughput sequencing now provides a time-efficient and economical way to perform these analyses. First-line use of NGS is currently being set up in laboratories, which will soon call into question the stepwise, tree-like CF diagnostic strategy. As shown in Figure 2, in our current CF molecular diagnosis strategy, NGS can be used instead of 'classical' techniques to detect a panel of common mutations (step 1) or to analyse the '*CFTR* exome' (step 2). The technical procedures are similar for both, but filters can be used to focus on the regions that contain mutations. Multiplicom® and Illumina® offer locus-specific designs for library preparation for molecular analysis of the '*CFTR* exome'. CE-marked kits for *in vitro* diagnostic use (CE-IVD) and efficient bioinformatics tools are commercially available; the bioinformatics analysis can be performed with in-house pipelines or subcontracted to commercial companies (Sophia Genetics®). SNVs and indels are detected with high sensitivity and specificity, and CNV detection is now available for some amplicon-based methods.
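The region filters mentioned above can be pictured as restricting variant calls to a set of target intervals (a small panel for step 1, all coding regions for step 2). The sketch below assumes targets stored in the BED convention (0-based start, exclusive end) and variant positions in the 1-based VCF convention; the `TargetRegions` class and all coordinates are hypothetical and do not correspond to real *CFTR* exons.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # BED-style: 0-based start (inclusive), end (exclusive)


class TargetRegions:
    """Sorted, non-overlapping target intervals on a single chromosome."""

    def __init__(self, intervals: List[Interval]):
        merged: List[Interval] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or touching interval: extend the previous one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        self.intervals = merged
        self.starts = [s for s, _ in merged]

    def contains(self, pos_1based: int) -> bool:
        """True if a 1-based (VCF-style) position falls inside a target."""
        pos0 = pos_1based - 1                    # convert to 0-based
        i = bisect_right(self.starts, pos0) - 1  # last interval starting at or before pos0
        return i >= 0 and pos0 < self.intervals[i][1]


# Hypothetical coordinates, for illustration only (not real CFTR exon positions).
exome_targets = TargetRegions([(100, 200), (150, 260), (500, 620)])
variant_positions = [101, 260, 261, 300, 620]
on_target = [p for p in variant_positions if exome_targets.contains(p)]
print(on_target)  # [101, 260, 620]
```

Overlapping targets, as produced by tiled amplicons, are merged first so that a binary search over interval starts is enough to test each position.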

Furthermore, in patients with a definite clinical diagnosis of CF (positive sweat test, CF clinical features) who carry no or only one CF mutation, step 3 analysis could ultimately be proposed, in combination with the analysis of potential modifier genes (see next section).

Three studies in the literature have reported NGS-based *CFTR* sequencing (Table 4) in CF patients, CF carriers or control samples [18, 26, 27]. Abou Tayoun and colleagues [26] first proposed a proof of concept for '*CFTR* exome' analysis by NGS on 79 samples. Target enrichment was performed by PCR amplification (AmpliSeq Panel, Life Technologies) and sequencing on the Ion Torrent platform (PGM®). Their sequencing achieved a minimum coverage of 100X (depending on whether the Ion 314 or 318 chip was used). The two other studies sequenced the whole *CFTR* locus (close to 250 kb), including deep intronic *CFTR* regions (Figure 2, step 3). Trujillano *et al*. [27] reported *CFTR* re-sequencing by hybridization capture on a custom NimbleGen SeqCap EZ Choice array, using a HiSeq2000 (Illumina®), in a set of 92 samples. They highlighted the precise characterization of the breakpoints of seven genomic rearrangements in *CFTR*.

208 Cystic Fibrosis in the Light of New Research


We performed complete *CFTR* gene sequencing of DNA samples from patients with a confirmed clinical diagnosis of CF but an incomplete genotype [18]. Although the large, previously unexplored intronic regions are expected to harbour few mutations (about 1%–3% of CF mutations), we identified a new pathogenic mutation that creates a pseudo-exon (Table 4). We also compared hybridization capture and long-range PCR for target enrichment, and used a small-scale NGS platform (GS Junior Sequencer, 454 Life Sciences®) for sequencing. Some promising variants were subsequently confirmed as deleterious by *in vitro*/*ex vivo* functional assays. However, classifying most of the detected intronic variants will be a long and difficult process. This approach is currently under development for CF diagnosis.
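The minimum coverage of 100X reported by Abou Tayoun and colleagues [26] is the kind of criterion a laboratory might check for every target before interpreting results. The sketch below assumes that per-base read depths have already been extracted for each target (for instance with a tool such as SAMtools); the target names, depth values and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def coverage_summary(depths: Dict[str, List[int]],
                     min_depth: int = 100) -> Dict[str, Tuple[float, int]]:
    """For each target, return (fraction of bases with depth >= min_depth, minimum depth)."""
    summary: Dict[str, Tuple[float, int]] = {}
    for target, per_base in depths.items():
        if not per_base:
            # No depth data at all: treat the target as entirely uncovered.
            summary[target] = (0.0, 0)
            continue
        covered = sum(1 for d in per_base if d >= min_depth)
        summary[target] = (covered / len(per_base), min(per_base))
    return summary


def failing_targets(depths: Dict[str, List[int]], min_depth: int = 100) -> List[str]:
    """Names of targets where some bases fall below min_depth (to be flagged for review)."""
    return [name for name, (fraction, _) in coverage_summary(depths, min_depth).items()
            if fraction < 1.0]


# Hypothetical per-base depths for two amplicons.
depths = {
    "amplicon_A": [150, 180, 210, 190],
    "amplicon_B": [120, 95, 60, 110],
}
print(coverage_summary(depths))  # {'amplicon_A': (1.0, 150), 'amplicon_B': (0.5, 60)}
print(failing_targets(depths))   # ['amplicon_B']
```

Reporting both the covered fraction and the minimum depth distinguishes a target that narrowly misses the threshold from one with a genuine coverage gap.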



**Table 4.** Comparison of three NGS strategies for *CFTR* sequencing
