212 Cystic Fibrosis in the Light of New Research

**3.1. Epidemiological data and locus specific databases dedicated to** *CFTR*

Some variants, such as frequent mutations (found in more than 1% of CF mutated alleles) and nonsense or frameshift mutations, are readily classified as pathogenic. However, the frequent identification of rare sequence alterations of unknown pathogenicity (VUCS, VUS) substantially complicates test interpretation. Moreover, their number will increase with the spread of NGS technologies. To facilitate classification of these variants, CF laboratories have to combine several tools, such as central mutation databases or *CFTR* locus specific databases, *in silico* prediction tools and *ex vivo/in vivo* functional analyses [38, 39].

*3.1.1. Core or central mutation databases*

Their goal is to collect all sequence variations detected in all genes and to describe each mutation briefly. These databases are used to assess the frequency of a variation (minor allele frequency (MAF) lower or higher than 1%) in the general population (i.e. unaffected individuals). Since a MAF higher than 1% reflects a low probability for the variant to be pathogenic, such data may be highly informative for the interpretation of variants.

*3.1.1.1. The National Center for Biotechnology Information (NCBI) short genetic variations database*

*dbSNP*

dbSNP [25] is the most comprehensive directory of single nucleotide variations. It catalogues short variations in nucleotide sequences from a wide range of organisms. Genetic variations may be common, thus representing true polymorphisms, or they may be rare. Some of these rare human entries have additional information associated with them, including disease associations, genotype information and allele origin, as some variations are somatic rather than germ-line events. Genotype and allele frequency information for various populations from different studies, including data from the HapMap project, is also available.

1000 Genomes [24] aimed to find most genetic variants that have frequencies of at least 1% in samples from five populations: East Asian, South Asian, African, European and American ancestries. As in dbSNP, genotype and allele frequency information is available for a large number of variants [40].

*3.1.1.2. Databases specifically collecting data from NGS projects*

Exome Variant Server [41] is a database that collects data from the NHLBI GO Exome Sequencing Project (ESP). This project aimed to discover novel genes and mechanisms contributing to various disorders by sequencing the protein coding regions of the human genome (i.e. the exome) using NGS technology. As the *CFTR* gene is widely studied, this tool would not be of added value compared to dbSNP and 1000 Genomes.

*3.1.2. Locus Specific Databases (LSDBs) dedicated to CFTR*

LSDBs are now recognized as the best mode of collecting and curating lists of mutations related to human genetic diseases [42]. They compile in a single bioinformatics tool disease-causing

*3.1.2.1. Cystic fibrosis mutation database*
The Cystic Fibrosis Mutation Database (CFMDB) [44], also called 'CFTR1', is an open access database dedicated to the collection of sequence variations in the *CFTR* gene for the international CF genetics research community. It was initiated by the Cystic Fibrosis Genetic Analysis Consortium (CFGAC) in 1989 and is maintained by the Cystic Fibrosis Centre at the Hospital for Sick Children in Toronto. CFMDB allows laboratories to submit new variants directly by filling out a standardized online form, with the possibility to detail phenotypic data, genotype (i.e. other variants identified in the patient) or epidemiological data. The key aim of this database is to collect the largest possible number of *CFTR* sequence variations identified in patients, relatives and partners. On the other hand, because the submission procedure applies only to the initial report of each variant, CFMDB does not provide frequency data, which are available in the two other databases described below. Finally, contributors do not always follow HGVS recommendations, and the same variant can be reported by several laboratories under different names, possibly leading to misinterpretation or misreporting in diagnostic reports.
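The naming problem described above can be illustrated with a minimal sketch, assuming a small hand-curated alias table. The table `ALIASES`, the helper names and the laboratory labels are hypothetical and are not part of CFMDB; the two variants shown are the common legacy, protein-level and cDNA-level names of F508del and G542X. The sketch only groups reports that refer to the same variant under different names.

```python
from collections import defaultdict

# Hypothetical alias table (illustration only, not a CFMDB resource):
# one HGVS cDNA name per variant, with other names seen in reports.
ALIASES = {
    "c.1521_1523delCTT": {"F508del", "ΔF508", "deltaF508", "p.Phe508del"},
    "c.1624G>T": {"G542X", "p.Gly542*"},
}


def build_lookup(aliases):
    """Map every known name (case-insensitively) to its HGVS cDNA name."""
    lookup = {}
    for hgvs_name, other_names in aliases.items():
        lookup[hgvs_name.casefold()] = hgvs_name
        for name in other_names:
            lookup[name.casefold()] = hgvs_name
    return lookup


def group_reports(reports, lookup):
    """Group (laboratory, reported name) pairs by normalized variant name.

    Names missing from the alias table are kept unchanged, so they remain
    visible instead of being silently merged with another variant.
    """
    grouped = defaultdict(list)
    for laboratory, reported_name in reports:
        cleaned = reported_name.strip()
        key = lookup.get(cleaned.casefold(), cleaned)
        grouped[key].append(laboratory)
    return dict(grouped)


# Hypothetical submissions from different laboratories.
reports = [
    ("Lab A", "F508del"),
    ("Lab B", "p.Phe508del"),
    ("Lab C", "c.1521_1523delCTT"),
    ("Lab D", "G542X"),
    ("Lab E", "c.9999A>G"),  # made-up name, absent from the alias table
]
grouped = group_reports(reports, build_lookup(ALIASES))
```

In practice such a table would have to be curated and validated against reference sequences; a string lookup of this kind does not check that a name is valid HGVS nomenclature.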

*3.1.2.2. Clinical and functional translation of CFTR database CFTR2*

CFTR2 [45] is a website designed to provide information about specific CF mutations to patients, researchers and the general public. For each mutation included in the database, it indicates whether the mutation causes cystic fibrosis when combined with another CF-causing mutation, and it gives clinical and biological information (sweat chloride, lung function, pancreatic status and *Pseudomonas* infection rates) on patients carrying the mutation. A specific section for health practitioners and scientists provides more in-depth and research-related information.

The goal of the CFTR2 project is to categorize all mutations seen in CF patients as disease-causing (always resulting in CF when combined with another CF-causing mutation), neutral, or mutations of varying clinical consequences (CF and CFTR-RD). Mutations that have not been fully analysed are considered of unknown clinical significance.
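The four categories above can be written down as a small data structure. The following is only an illustrative sketch: the names `Category` and `interpret_pair` are hypothetical, the two variants are assumed to be in *trans*, and the handling of neutral and unknown variants is a simplifying assumption of this example rather than a rule taken from CFTR2. It is not a clinical decision tool.

```python
from enum import Enum


class Category(Enum):
    """The four categories named in the text (labels are illustrative)."""
    CF_CAUSING = "CF-causing"
    VARYING = "varying clinical consequences"
    NEUTRAL = "neutral"
    UNKNOWN = "unknown clinical significance"


def interpret_pair(first, second):
    """Coarse reading of two variants assumed to be in trans.

    Only the first rule restates the definition given in the text; the
    remaining rules are assumptions made for this sketch.
    """
    pair = (first, second)
    if first is Category.CF_CAUSING and second is Category.CF_CAUSING:
        # From the text: a CF-causing mutation always results in CF
        # when combined with another CF-causing mutation.
        return "CF expected"
    if Category.UNKNOWN in pair:
        # Assumption: a variant that has not been fully analysed
        # prevents any conclusion from categories alone.
        return "undetermined"
    if Category.NEUTRAL in pair:
        # Assumption: a neutral variant does not contribute to disease.
        return "CF not expected from this pair"
    # At least one variant of varying clinical consequences remains.
    return "variable (CF or CFTR-RD)"


example = interpret_pair(Category.CF_CAUSING, Category.VARYING)
```

The conservative ordering (unknown before neutral) is a design choice of the sketch: when any variant is of unknown clinical significance, no category-based conclusion is returned.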

The major advantages of CFTR2 are (i) the collection of detailed clinical characteristics on large cohorts of individuals [46], which provides useful information related to a given genotype, and (ii) the results of functional testing, which are key arguments for the final interpretation of mutations [47].

However, this database collects clinical and genetic data only from CF patients (through national registries), which can bias the assessment of the phenotypic spectrum: several mutations are considered CF-causing although they have also been reported in CFTR-RD patients in *trans* with other CF-causing mutations.
