### *3.1.2.3. The French molecular database CFTR-France*

*CFTR*-France [48] has been under development since 2012 with the aim of collecting, storing and processing any category of variant identified in the *CFTR* gene, thanks to the collaboration of nine French laboratories with high expertise in the molecular analysis of this gene. Its specificity is to compile and annotate all categories of variants (disease-causing, non-disease-causing and variants of unknown clinical significance) identified by collaborators in patients affected with CF or CFTR-RD, in foetuses with abnormal ultrasonography findings (e.g. echogenic bowel), in newborns with a pending or inconclusive diagnosis and in asymptomatic individuals carrying at least one sequence variation on each *CFTR* gene (i.e. carrying two variations in *trans*). The database includes the main clinical data of these individuals, genetic information from familial segregation studies and various variant annotations (frequency in patient and control populations, sequence homology, predicted or experimentally assessed functional impact, etc.), allowing the analysis of genotype/phenotype relationships.

Thus, by collecting all phenotypes, *CFTR*-France reflects the phenotypic spectrum of a large number of mutations. It also reports mutations in complex alleles with their association frequencies (calculated over all individuals recorded in the database), and gives the up-to-date HGVS nomenclature of mutations.

Data collected in *CFTR*-France are provided by level 2 (specialised) and reference laboratories, so patients analysed only by level 1 laboratories (which search for the most common mutations) are not included in the database.

Note: Access to *CFTR*-France is currently restricted to collaborators. A public access program is in progress for the medical and scientific community and for patients and families.

### **3.2.** *In silico* **prediction analyses**

### *3.2.1. Variants located in exons and exon-intron boundaries*

### *3.2.1.1. Prediction tools for the assessment of the impact on protein*

Methods for predicting the effect of amino acid substitutions use protein sequence, structure and/or annotation. Disease-causing mutations that affect protein function tend to occur at evolutionarily conserved sites and/or at key positions in the protein structure. Multiple sequence alignment of orthologous sequences reveals which positions have been conserved through evolution, and these positions are presumed to be important for protein function. Annotation can enhance prediction for variants located in structurally and functionally important domains, but this information is often sparse.
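
As an illustration of how conservation can be read from a multiple sequence alignment, the following minimal sketch scores each alignment column by its Shannon entropy. It is not the method of any particular prediction tool: the function names, the normalisation by log2(20) and the toy sequences (which are not real CFTR orthologues) are assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-')."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Return one score per column, from 0 (maximally variable) to 1 (invariant)."""
    max_entropy = math.log2(20)  # 20 standard amino acids
    return [1.0 - column_entropy(col) / max_entropy for col in zip(*alignment)]


# Toy alignment: hypothetical sequences chosen only to show the calculation.
toy_alignment = [
    "MKFGE",
    "MKYGE",
    "MRFGD",
    "MKWGE",
]

profile = conservation_profile(toy_alignment)
for position, score in enumerate(profile, start=1):
    print(f"column {position}: conservation = {score:.2f}")
```

In this toy case the invariant columns score 1, and the column with three different residues scores lowest. Real tools combine such conservation signals with substitution matrices, structural features or annotation.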

The efficiency of prediction tools in assessing the possible pathogenicity of missense variants in the *CFTR* gene is of major interest, since missense variants constitute the vast majority of VUS identified in patients. Diagnostic laboratories frequently use these tools, particularly in problematic situations. Unfortunately, their performance has not been clearly established, and results (i.e. scores of pathogenicity) may be discordant for a given variant.

214 Cystic Fibrosis in the Light of New Research

Predictions of the impact of non-synonymous substitutions in CFTR are mainly based on multiple sequence alignment of orthologous sequences. Indeed, even though a partial 3D model of the CFTR protein has been established [49, 50], prediction tools do not take these structural elements into account in the final 'score of pathogenicity'. It is classically recommended to use several prediction tools in order to obtain concordant predictions that can be considered for variant interpretation.
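
The idea of retaining only concordant predictions can be sketched as follows. This is a hypothetical helper, not a published procedure: it assumes that each tool's output has already been reduced to a harmonised label ('deleterious' or 'neutral'), whereas real tools return scores on different scales, and the example calls are invented for illustration.

```python
from collections import Counter


def consensus_call(predictions, min_agreement=1.0):
    """Return the majority label if enough tools agree, otherwise 'discordant'.

    predictions   : dict mapping a tool name to a harmonised label.
    min_agreement : fraction of tools (above 0.5) that must share the label;
                    1.0 means that all tools must agree.
    """
    if not predictions:
        return "no prediction"
    label, votes = Counter(predictions.values()).most_common(1)[0]
    if votes / len(predictions) >= min_agreement:
        return label
    return "discordant"


# Invented calls for a single missense variant (illustration only).
example_calls = {
    "SIFT": "deleterious",
    "PolyPhen-2": "deleterious",
    "SuSPect": "neutral",
}

print(consensus_call(example_calls))                     # strict: all tools must agree
print(consensus_call(example_calls, min_agreement=0.6))  # relaxed: simple majority
```

Requiring full agreement is the more conservative choice. A relaxed threshold returns more calls, at the cost of including variants on which the tools disagree.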

Table 5 summarizes the bioinformatics programs classically used by diagnostic laboratories [51-53] and the new software SuSPect [54].


**Table 5.** Bioinformatics tools for the prediction of amino acid changes: websites, characteristics and output format [55-59]

A recent study has emphasised the importance of sequence alignments for the performance of prediction tools [60]. The authors constructed custom multiple sequence alignments, called phenotype-optimized sequence ensembles (POSEs), which were tested on a training set of *CFTR* mutations.

An earlier study had already suggested that providing SIFT or PolyPhen-2 with custom alignments increased their performance relative to the default alignments employed by the algorithms [61]. This could explain why, although Alamut© is a highly attractive tool owing to its ease of use, some predictions obtained by running each tool separately with a custom alignment could differ from Alamut© results (obtained with default alignments).

### *3.2.1.2. Prediction tools for the assessment of the impact on pre-mRNA splicing*

Splicing involves the recognition of exons within large pre-mRNA molecules and the precise removal of the flanking introns. Three elements constitute the core splicing signals: the intronic branch point, the acceptor site (or 3' splice site), including a variable upstream polypyrimidine tract (PPT), and the donor site (or 5' splice site). These core human splice site motifs contain only part of the information that defines exons; the rest is carried by less conserved splicing regulatory elements. The latter are located within the exon or the flanking introns and promote or inhibit exon recognition through exonic/intronic splicing enhancers (ESE or ISE) or silencers (ESS or ISS), respectively (Figure 3).

**Figure 3.** A schematic of key splicing motifs and regulatory elements. Adapted by Le Guédard-Méreuze S. from Wang and Burge, 2008 [62, 63].

Many bioinformatics tools have been developed to predict which splicing modification is the most probable for a given sequence variation (exon skipping, activation of cryptic splice sites or use of *de novo* splice sites), or whether the variant may be considered neutral with regard to splicing. Most algorithms were developed on the basis of biostatistical and experimental analyses of the information contained in the genomic sequence. They provide a score that depends on the strength of the splice site considered. Indeed, the strength of splicing motifs is a key parameter for predicting the impact of a sequence variation. The performance of these tools has been widely studied by comparing predictions with the results of experimental assays for various genes, including *CFTR* [39, 63-66]. In 2012, Houdayer and collaborators performed a large-scale study of VUCS in BRCA genes in order to assess the performance of six prediction tools [67]. This work provided guidelines for the proper use of these tools and for the interpretation of prediction results.
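
Because these tools return a score reflecting splice site strength, a common way to read their output is to compare the score of the variant sequence with that of the wild-type sequence. The sketch below shows this comparison only; the scores are invented, the function names are hypothetical and the 15% default threshold is an arbitrary placeholder, not a value taken from the guidelines cited above.

```python
def relative_score_change(wt_score, variant_score):
    """Relative change of a splice site score (variant versus wild-type)."""
    if wt_score <= 0:
        raise ValueError("wild-type score must be positive to compute a relative change")
    return (variant_score - wt_score) / wt_score


def flag_splice_variant(wt_score, variant_score, threshold=0.15):
    """Return True when the score drops by at least `threshold` (as a fraction).

    The default threshold is illustrative only; each tool needs its own,
    experimentally validated cut-off.
    """
    return relative_score_change(wt_score, variant_score) <= -threshold


# Invented donor-site scores for two hypothetical variants.
print(flag_splice_variant(wt_score=8.0, variant_score=6.0))  # 25% decrease
print(flag_splice_variant(wt_score=8.0, variant_score=7.6))  # 5% decrease
```

A flagged variant is a candidate for experimental confirmation rather than a demonstrated splicing defect.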

Table 6 summarizes the principles and main characteristics of the most 'popular' bioinformatics programs and of ASSEDA, a recently developed program [68-73].


**Table 6.** Main characteristics of several splicing prediction tools [74-79]

It is important to note that the consequences on splicing of exonic synonymous and non-synonymous *CFTR* variants must be assessed, as suggested by recent experimental studies [39, 80].

### *3.2.2. Deep intronic variants*

Examples of the insertion of intronic sequences, called pseudo-exons (or cryptic exons), into mature transcripts of various genes are becoming ever more numerous, and their role in human diseases has been largely demonstrated. We saw in section 2.2 that NGS strategies currently allow scanning of *CFTR* deep intronic regions [18], resulting in a growing number of newly identified deep intronic variants.

The bioinformatics tools described above, which assess the impact of variants on splicing, can also be used to evaluate deep intronic mutations. We tested these algorithms on mutations identified in CF patients after NGS sequencing of the entire *CFTR* locus, and they showed satisfactory results [18]. Indeed, prediction tools allowed the selection of possible disease-causing mutations (i.e. with a predicted impact on splicing through inclusion of pseudo-exons), and predictions were confirmed by *in vitro* functional studies using minigene constructs (see section 3.3.1.1) and by direct analysis of aberrant transcripts from nasal epithelial cells of patients (see section 3.3.2.1).

### **3.3.** *In vitro***/***ex vivo* **functional analyses**

### *3.3.1. Cell lines transfection experiments*

The type of cells used for transfection depends on the tissue studied and on the clinical context. Pulmonary (BEAS-2B, A549, Calu-3) or intestinal/colonic (CACO-2, T84) immortalized cell lines (SV40-transformed or carcinoma-derived) contain an appropriate concentration of transcriptional and splicing factors for CFTR protein synthesis. Cells stably transfected with mutated *CFTR* can also be used (CFBe41o-, CFPAC-1). Stable expression is usually obtained by lentiviral transduction, and transient expression by transfection with a chemical agent (Polyfect, interferin). In this case, the endogenous CF molecular and cellular context (inflammation) should also be considered.

### *3.3.1.1. Splicing assessment*

Minigenes are autonomous circular constructs containing a promoter and exons, and are produced by clonal amplification in bacteria [81, 82]. They contain a genomic segment from the gene of interest (here *CFTR*) that includes an exon and its flanking intronic regions (their length can range from ten to thousands of nucleotides, with an average of 300 bp), or only intronic regions when the potential creation of a pseudo-exon is evaluated. To determine whether a mutation is responsible for altered splicing, minigenes can also include *cis*-regulatory elements if these are affected (ESE, ESS, ISE and/or ISS) [83]. These regions of interest are framed by two invariable exons, which are part of the system. Every transfection assay in cell lines compares the wild-type and mutated (through site-directed mutagenesis) constructs [84]. All *CFTR* exons are needed to produce a mature and functional protein. Thus, a modification of the transcript in the *in vitro* system suggests that the assessed *CFTR* change has a deleterious effect on exon splicing. An ever-increasing number of minigene studies have been performed to assess the pathogenicity of *CFTR* variants [39, 60, 80, 84]. Despite its limitations, this approach is of high interest in the overall strategy for the characterization of rare sequence variations.

### *3.3.1.2. Expression vectors for the quantification of mRNA, protein or CFTR-specific chloride conductance*

Full-length *CFTR* cDNA is classically inserted into an expression vector system (e.g. pcDNA3 or p-Tracer) downstream of a promoter that may be drug-activated (G418, tetracycline or doxycycline activation). To assess point variants or small indels, site-directed mutagenesis is carried out (usually with QuikChange Mutagenesis kits®, Agilent Technologies). To assess the molecular consequences of large rearrangements involving one or more exons, a truncated *CFTR* cDNA can be inserted in the expression vector [85-87]. Transient or stable transfection can be performed in eukaryotic cells (described in section 3.3.1). A 3-HA tag (in the fourth loop of CFTR) can be introduced to visualize protein expression easily. Measurement of mRNA expression and evaluation of the function and localization of the CFTR protein can then be performed for each alternative transcript construct and compared with the wild-type.

Automated real-time RT-PCR allows the relatively straightforward quantification of mRNA transcripts with specific primers and appropriate reference genes for normalization. The mRNA level gives an indication of the expected quantity and quality of the protein.
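
One widely used way to compute relative mRNA levels normalized to a reference gene is the 2^-ΔΔCt calculation, sketched below. The text does not specify which calculation is applied, so this is given as an assumption; the method also presumes near-100% amplification efficiency for both primer pairs, and the Ct values are invented.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript by the 2^-ddCt calculation.

    Each Ct is the threshold cycle measured by real-time RT-PCR:
    'sample' is the mutated construct, 'control' the wild-type construct,
    'target' the transcript of interest and 'ref' the reference gene.
    """
    delta_sample = ct_target_sample - ct_ref_sample     # normalise the sample
    delta_control = ct_target_control - ct_ref_control  # normalise the control
    return 2.0 ** -(delta_sample - delta_control)


# Invented Ct values: the mutant construct crosses the threshold two cycles
# later than the wild-type after normalisation to the reference gene.
fold_change = relative_expression(
    ct_target_sample=26.0, ct_ref_sample=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(f"mutant / wild-type mRNA level = {fold_change:.2f}")
```

With these values the normalised difference is two cycles, which corresponds to a four-fold lower transcript level in the mutant construct.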

Protein assessment consists of complementary experiments for protein quantification and for the evaluation of protein maturation or cellular localization. The main techniques are detailed below. The effect of variants on CFTR expression and maturation is assessed by immunoblotting, based on the detection of the immature (core-glycosylated, B band, ~150 kDa) and/or mature (additional glycosylation in the Golgi, C band, ~170-190 kDa) CFTR forms. Long-term pulse-chase experiments can provide additional information on the lifetime of CFTR in cellular compartments [88]. Immunocytochemical assays (based on immunofluorescence, IF) can highlight the cellular localization of the CFTR protein. However, most difficulties noted in IF experiments relate to non-specific antibody staining and to the effect of sample processing on the characteristics of cell development. Moreover, confusion may occur between the cell surface (where CFTR is active) and the subsurface (where it is not). Therefore, more sensitive and specific antibodies, as well as co-localization assays with other cell surface markers (such as β-tubulin or WGA), are needed. Finally, this remains a qualitative or semi-quantitative method.

CFTR function and activity, i.e. CFTR-specific chloride conductance, can be determined by patch-clamp electrophysiology, the halide-selective electrode technique, radioisotope efflux assays and fluorescence-based halide efflux measurement. The use of an appropriate CFTR-activating drug (such as forskolin, isobutylmethylxanthine (IBMX), isoproterenol, terbutaline, genistein, adenosine, etc.) or ATP, followed by the specific CFTR inhibitor CFTRinh-172, makes it possible to distinguish CFTR-dependent from CFTR-independent chloride transport. To date, the easiest approach developed consists of measuring iodide efflux by fluorescence. YFP fluorescence is dependent on YFP expression levels and iodide concentration. Compared with conventional plate-bound CFTR functional assays, the flow cytometric approach can be used to study CFTR function in cell suspension. It may be further adapted to study CFTR function in heterologous cell populations, using cell surface markers and selection of cells that display high CFTR function. Technical limitations include the need to perform this assay in specialized centres (using expensive imaging equipment).

All these methods offer the possibility to evaluate the functional consequences of molecular abnormalities on CFTR and finally improve the classification of variants.

### *3.3.2. Ex vivo CF patients' cells assays*
