### *3.1.1. Core or central mutation databases*

The goal of these databases is to collect all sequence variations detected in all genes and to describe each mutation briefly. They are used to assess the frequency of a variation in the general population (i.e. unaffected individuals), typically by checking whether its minor allele frequency (MAF) is lower or higher than 1%. Since a MAF higher than 1% reflects a low probability for the variant to be pathogenic, such data may be highly informative for the interpretation of variants.
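As an illustration of the 1% criterion, the following minimal sketch computes a MAF from allele counts and flags whether it exceeds the threshold. The names `AlleleCounts`, `minor_allele_frequency` and `is_common` are hypothetical helpers introduced here, not part of any of the databases described in this section, and the counts are invented for the example.

```python
from dataclasses import dataclass

# 1% cut-off quoted in the text; treated here as an adjustable assumption.
MAF_THRESHOLD = 0.01


@dataclass
class AlleleCounts:
    """Observed allele counts at one biallelic site (illustrative data only)."""
    ref_count: int  # number of chromosomes carrying the reference allele
    alt_count: int  # number of chromosomes carrying the alternate allele


def minor_allele_frequency(counts: AlleleCounts) -> float:
    """Return the frequency of the less common of the two alleles."""
    total = counts.ref_count + counts.alt_count
    if total == 0:
        raise ValueError("no alleles observed at this site")
    return min(counts.ref_count, counts.alt_count) / total


def is_common(counts: AlleleCounts, threshold: float = MAF_THRESHOLD) -> bool:
    """True when the MAF is strictly higher than the threshold."""
    return minor_allele_frequency(counts) > threshold


# Hypothetical counts from 5,000 individuals (10,000 chromosomes each).
frequent = AlleleCounts(ref_count=9500, alt_count=500)
rare = AlleleCounts(ref_count=9995, alt_count=5)
print(is_common(frequent), is_common(rare))
```

A variant flagged as common in unaffected individuals is less likely to be pathogenic, but the flag alone does not classify a variant; it is one piece of evidence to be weighed with the other data discussed in this chapter.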

### *3.1.1.1. The National Center for Biotechnology Information (NCBI) short genetic variations database dbSNP*

dbSNP [25] is the most comprehensive directory of single nucleotide variations. It catalogues short variations in nucleotide sequences from a wide range of organisms. Genetic variations may be common, thus representing true polymorphisms, or they may be rare. Some of the rare human entries carry additional information, including disease associations, genotype information and allele origin, as some variations are somatic rather than germ-line events. Genotype and allele frequency information for various populations from different studies, including data from the HapMap project, is also available.

### *3.1.1.2. Databases specifically collecting data from NGS projects*

1000 Genomes [24] aimed to find most genetic variants that have frequencies of at least 1% in samples from five populations of East Asian, South Asian, African, European and American ancestry. As in dbSNP, genotype and allele frequency information is available for a large number of variants [40].

Exome Variant Server [41] is a database that collects data from the NHLBI GO Exome Sequencing Project (ESP). This project aimed to discover novel genes and mechanisms contributing to various disorders by sequencing the protein-coding regions of the human genome (i.e. the exome) using NGS technology. As the *CFTR* gene is widely studied, this tool would not be of added value compared with dbSNP and 1000 Genomes.

### *3.1.2. Locus Specific Databases (LSDBs) dedicated to CFTR*

LSDBs are now recognized as the best mode of collecting and curating lists of mutations related to human genetic diseases [42]. They compile in a single bioinformatics tool disease-causing and non-disease-causing sequence variations identified by genetics laboratories in families with a history of a given Mendelian disease. The most sophisticated ones integrate clinical and biological data, information on the geographic and/or ethnic origin, frequency of variations in the general population, mutation hot spots and all useful information for diagnosis, prognosis and the evaluation of genotype/phenotype relationships [43].

Here we choose to detail three LSDBs dedicated to *CFTR* that provide complementary information for the interpretation and characterization of variants identified in diagnostic practice.
