**4. One disease (too) many theories**

30 Neuromuscular Disorders

and 222 healthy controls revealed a unique association of FSHD with the 161 allele and the 4qA sequence. In particular, the haplotype 4A166 associated with D4Z4-reduced alleles was detected in multiple unaffected relatives of two independent families, and the 4B163 haplotype was associated with 17 FSHD-sized alleles carried by healthy subjects (Lemmers et al., 2007). On this basis it has been hypothesized that FSHD can develop only in a specific "permissive" chromosomal background represented by the haplotype 4A161. Following this hypothesis, proximal and distal sequences of the 4A161 chromosome were compared with those of "non-permissive" ones, such as 4B163 and 10A166. This approach led to the identification of a single nucleotide polymorphism (SNP, AT(T/C)AAA) in the adjacent pLAM sequence, immediately distal to the D4Z4 array. In particular, 4A161 and two other uncommon permissive variants, 4A159 and 4A168, presented the ATTAAA variant, which has been interpreted as a polyadenylation signal able to stabilize the *DUX4* transcript (Figure 4a).

By contrast, sequences associated with the non-permissive chromosomes 10A166 and 4B did not allow the expression of *DUX4* (Lemmers et al., 2010a). Analysis of more than 300 unrelated FSHD patients and 5 families with one or more FSHD patients carrying a D4Z4-reduced allele strongly supported the hypothesis that the last 4qA D4Z4 unit, together with the directly adjacent pLAM sequence including the ATTAAA variant, is necessary for FSHD development (Lemmers et al., 2010a). On this basis it has been proposed that FSHD arises through a toxic gain of function attributable to the stabilized distal *DUX4* transcript (Lemmers et al., 2010a) (Figure 4b). Despite the intriguing premise, the notion that FSHD is a fully penetrant autosomal dominant disorder caused by the reduction of D4Z4 repeat number associated with the 4A161PAS haplotype is challenged by recently published data. First, a study conducted on 750 unrelated FSHD families from Italy revealed that the frequency of individuals carrying two D4Z4-reduced alleles (compound heterozygotes) is 2.7%, a frequency much higher than expected for a fully penetrant autosomal dominant disorder with a prevalence of 1 in 20,000. Interestingly, in these families with compound heterozygosity, 25% of relatives carrying D4Z4-reduced alleles and 4A161PAS are healthy (Scionti et al., 2012a). Second, characterization of 253 unrelated FSHD probands from the Italian National Registry for FSHD showed that only 127 of them (50.1%) carry D4Z4 alleles with 1-8 D4Z4 units associated with 4A161PAS, whereas the remaining FSHD probands carry different haplotypes or alleles with a greater number of D4Z4 repeats (Scionti et al., 2012b). Third, molecular analysis of 801 normal

Fig. 4. **Schematic representation of the current view of permissive and non-permissive haplotypes. a.** Permissive haplotypes. **b.** Non-permissive haplotype. The ATTAAA variant creates a polyadenylation signal (PAS) that stabilizes the *DUX4* transcript and has been postulated to be the critical factor causing FSHD.
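The permissive/non-permissive distinction described above comes down to a single base in the pLAM hexamer AT(T/C)AAA. The following is a minimal sketch of that check, assuming the pLAM sequence is available as a plain string. The function name `classify_plam_pas` and the toy sequences are hypothetical, and ATCAAA is simply the alternative allele implied by the AT(T/C)AAA notation.

```python
def classify_plam_pas(plam_sequence: str) -> str:
    """Classify a pLAM sequence by the AT(T/C)AAA polymorphism.

    Returns "permissive" when the ATTAAA hexamer (interpreted in the text as a
    polyadenylation signal) is present, "non-permissive" when only the ATCAAA
    hexamer is found, and "unknown" when neither hexamer occurs.
    """
    seq = plam_sequence.upper()
    if "ATTAAA" in seq:
        return "permissive"
    if "ATCAAA" in seq:
        return "non-permissive"
    return "unknown"


# Toy sequences for illustration only; they are NOT real pLAM sequences.
examples = {
    "toy_permissive": "GGCCATTAAAGTC",
    "toy_non_permissive": "GGCCATCAAAGTC",
    "toy_neither": "GGCCGGGGGGGTC",
}
results = {name: classify_plam_pas(seq) for name, seq in examples.items()}
print(results)
```

A plain substring search is used here only to keep the example short. An actual analysis would presumably inspect the polymorphic position within an aligned pLAM sequence rather than search for the hexamer anywhere in the string.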

After the genetic correlation between D4Z4 and FSHD was established, the most difficult task has been to explain the role of D4Z4 in disease development. D4Z4 might directly cause FSHD through *DUX4* expression; on the other hand, D4Z4 reduction might indirectly cause FSHD by exerting long-distance effects. None of the proposed models entirely explains the mechanism leading to disease, and the scientific community has not reached a consensus on this point.

Fig. 5. **Models for the molecular basis of FSHD. A**. Healthy individuals carry 11–150 units of D4Z4, whereas FSHD patients have fewer than 11 repeats. **B. DIRECT MECHANISM**: reduction of the D4Z4 repeat array leads to the synthesis of the *DUX4* transcript, which is normally not produced, through changes in D4Z4 heterochromatin and/or stabilization of *DUX4* mRNA. **C. INDIRECT MECHANISM:** the reduction of D4Z4 repeats leads to modifications of the spatial and structural organization of chromatin, generating changes of transcriptional control over the expression of candidate genes localized in *cis* or in *trans*.
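Panels A and B of Fig. 5 can be read as a simple decision rule. The sketch below only illustrates what the direct model predicts and is not a diagnostic tool. The names `PERMISSIVE_HAPLOTYPES`, `is_contracted` and `direct_model_predicts_fshd` are hypothetical, the threshold of 11 units is taken from the caption, and the permissive haplotypes are those named in the text (4A161, 4A159 and 4A168).

```python
# Haplotypes reported in the text as carrying the ATTAAA variant.
PERMISSIVE_HAPLOTYPES = {"4A161", "4A159", "4A168"}

# Fig. 5A: healthy individuals carry 11-150 D4Z4 units.
CONTRACTION_THRESHOLD = 11


def is_contracted(d4z4_units: int) -> bool:
    """Return True for a partially reduced array (1 to 10 units).

    Zero units is excluded on purpose: the chapter lists complete loss of
    the array as not associated with FSHD.
    """
    if d4z4_units < 0:
        raise ValueError("number of D4Z4 units cannot be negative")
    return 1 <= d4z4_units < CONTRACTION_THRESHOLD


def direct_model_predicts_fshd(d4z4_units: int, haplotype: str) -> bool:
    """Prediction of the direct (DUX4) model: contraction AND permissive background."""
    return is_contracted(d4z4_units) and haplotype in PERMISSIVE_HAPLOTYPES


print(direct_model_predicts_fshd(5, "4A161"))   # contracted, permissive
print(direct_model_predicts_fshd(5, "4B163"))   # contracted, non-permissive
print(direct_model_predicts_fshd(20, "4A161"))  # normal-sized array
```

The output is the model's prediction, not an observed outcome: the chapter reports healthy carriers of contracted 4A161PAS alleles as well as FSHD probands with other haplotypes or longer arrays, which is exactly why the rule is disputed.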

Facioscapulohumeral Muscular Dystrophy: From Clinical Data to Molecular Genetics and Return 33


In examining all the models that have been proposed it is important to remember the following essential points:

- 80-85% of FSHD patients carry a reduction in D4Z4, whereas loss of the whole array is not associated with FSHD;
- 15-20% of FSHD patients have a normal number of repeats;
- No specific 4q haplotype is associated with FSHD;
- 25% of relatives carrying D4Z4-reduced alleles who are older than 56 years do not have FSHD;
- Individuals bearing an allele with a reduced number of repeats (4-8 units) make up 3% of the healthy population;
- Repeat reduction in the highly homologous D4Z4 copy on chromosome 10 is not associated with FSHD;
- Penetrance of FSHD is not complete and its severity does not clearly correlate with the number of repeats.

Notably, in all proposed models, epigenetic changes such as methylation or histone modifications are invoked as an additional level of complexity that might help in interpreting the complex correlation between genotype and phenotype in FSHD. In this paragraph we will briefly describe the main mechanisms that have been proposed. In the following paragraphs all the factors that have been considered important for disease onset will be explained in detail.

#### **4.1 Direct mechanism: DUX4**

The most recent model proposed to explain FSHD pathogenesis is based on the idea that the most distal copy of the *DUX4* gene, whose open reading frame is present in each D4Z4 repeat, is transcribed and that the expression of this gene has a direct role in FSHD pathophysiology. It was first proposed that partial reduction of the D4Z4 repeat array results in destabilization of the D4Z4 heterochromatin and in the inappropriate upregulation of the *DUX4* gene (Gabriels et al., 1999; Hewitt et al., 1994). However, this hypothesis has never been proven. The model was subsequently modified by introducing the concept of a "permissive" chromosomal background, namely a single nucleotide polymorphism in the pLAM sequence that provides a polyadenylation signal (PAS) for the *DUX4* transcript. This should stabilize the *DUX4* transcript from the most distal D4Z4 unit on 4q chromosomes, resulting in disease through a toxic gain-of-function mechanism (Lemmers et al., 2010b) (Figure 5B).

#### **4.2 Indirect mechanism: Indirect overexpression of candidate genes**

All the other models proposed to explain the role of D4Z4-reduced alleles in FSHD pathogenesis predict that D4Z4 reduction modifies the spatial and structural organization of chromatin at 4q35. As a consequence, transcriptional control over the expression of candidate genes localized in *cis* or in *trans* is lost (Figure 5C).

#### *Cis***-spreading model: 4q35 genes derepression**

D4Z4 contains heterochromatic DNA elements. It was thus reasoned that D4Z4 and surrounding sequences would be tightly packed as heterochromatin. Based on this idea, it was hypothesized that loss of D4Z4 repeats would produce a local chromatin relaxation (i.e., loss of heterochromatinization) and, consequently, the transcriptional upregulation of genes near D4Z4, possibly in a distance-related manner (Hewitt et al., 1994; Winokur et al., 1994).

The identification of a repressor complex that binds to a specific 27 bp DNA element within D4Z4 (Gabellini et al., 2002) supports the *cis*-spreading model. Consistently, three 4qter genes (*FRG2*, *FRG1*, and *ANT1*) were found to be upregulated in the muscle of FSHD patients (Gabellini et al., 2002).

#### *Cis***-looping model: 4q35 genes derepression**

According to this model, D4Z4 is able to interact with target gene(s) through long-distance loops only when the D4Z4 contraction impairs the formation of normal D4Z4 intra-array loops. The hypothesis that normal-sized D4Z4 repeats form intra-array loops is supported by the size distribution of D4Z4 repeats, which is multimodal, with equidistant peaks 65 kb apart (van Overveld et al., 2000).

#### **Insulator model**

D4Z4 is localized between the distal heterochromatic telomeric sequences and the euchromatic sequences further upstream. It has thus been proposed that it might act as an insulator (van Deutekom, 1996a). Reduction of the repeat array would impair the separation between domains and, consequently, the spreading of heterochromatin would silence proximal genes in *cis*. This model is supported by the finding that D4Z4 itself acts as an insulator, which interferes with enhancer–promoter communication and protects from position effect (Ottaviani et al., 2009). Results obtained with different experimental approaches demonstrated that binding of both the transcription factor CTCF (CCCTC-binding factor) and the A-type lamins (intermediate filament proteins) is necessary for D4Z4 insulator function. In this model, the contracted *D4Z4* array in FSHD associates with CTCF and A-type lamins at the nuclear periphery, resulting in both *cis* and *trans* insulation of gene(s) physiologically interacting with the 4q35 terminal sequences (Ottaviani et al., 2010). This may lead to the misregulation of these genes and to the FSHD phenotype.

#### *Cis* **model: Nuclear localization**

The FSHD genomic region at 4q35.2 is consistently and specifically localized at the nuclear envelope (Petrov et al., 2006; Ottaviani et al., 2009) in proliferating myoblasts, fibroblasts, lymphoblasts, and differentiated myotubes. Interestingly, it is not the D4Z4 repeat itself that mediates the interaction with the nuclear envelope but a chromosome 4 genomic region just proximal to the D4Z4 repeat (D4S139) (Masny et al., 2004; Ottaviani et al., 2009). Since the FSHD region is localized to the nuclear periphery, an alternative model for FSHD pathogenesis has been proposed. In this model, improper interaction with transcription factors or chromatin modifiers at the nuclear envelope could induce aberrant expression of genes localized in *cis* or in *trans*. However, a differential localization of normal and FSHD alleles to the nuclear periphery has not been observed (Masny et al., 2004).

#### **Trans-effect model: Genome wide effect**

It has also been postulated that reduction of D4Z4 might have more genome-wide effects, affecting other pathways, such as the slow-to-fast fiber differentiation pathway (Celegato et al., 2006) and the response to oxidative stress and myogenic differentiation pathway (Winokur et al., 2003b). Because D4Z4 can be regarded as a docking platform for protein factors, loss of repeats may generate a local imbalance in the availability of D4Z4-binding proteins in the cell, and/or lead to new interactions with different proteins at the disease allele.

prevalently located in subtelomeric or pericentromeric regions (Winokur et al., 1994; van Geel et al., 1999; van Geel et al., 2002). However, only the *FRG2* copies on chromosomes 4 and 10 show 98% identity, differing by just five nucleotide mismatches in the ORF (Rijkers et al., 2004). Experiments demonstrated that the *FRG2* promoter is sensitive to the presence of D4Z4 repeat units, making *FRG2* an interesting candidate gene for FSHD pathophysiology (Rijkers et al., 2004). Indeed, it has been shown that overexpression of *FRG2* is obtained by suppressing the activity of the D4Z4 recognition complex (DRC) (Gabellini et al., 2002). Moreover, data suggest that in muscle biopsies from FSHD patients *FRG2* overexpression inversely correlates with D4Z4 repeat number (Gabellini et al., 2002). However, the overexpression of *FRG2* in FSHD is still controversial. Although there is general agreement that *FRG2* mRNA is virtually absent in most human tissues, there is no consensus regarding the expression of *FRG2* in samples from FSHD patients. *FRG2* overexpression was reported in differentiating, but not proliferating, myoblasts of FSHD patients (Rijkers et al., 2004). The overexpression of *FRG2* in FSHD myotubes has not been fully confirmed in other works (Arashiro et al., 2009; Cheli et al., 2011; Masny et al., 2010; Osborne et al., 2007). The different outcomes of expression studies may be explained by the intrinsic difficulty of detecting *FRG2* mRNA, due to its low expression level, and by the presence of multiple copies of *FRG2* in the genome. Moreover, *FRG2* is not represented in the gene arrays currently used for RNA expression studies. Whether *FRG2* is involved in FSHD pathogenesis is still under discussion. Indeed, muscle-specific overexpression of *FRG2* in mice does not result in an aberrant phenotype (Figure 6A) (Gabellini et al., 2006), and FSHD patients with proximal deletions encompassing *FRG2* have been found (Lemmers et al., 2003). Nevertheless, it is worth mentioning that *FRG2* appeared late in evolution, together with the D4Z4 repeats, and is not present in the mouse genome, which makes the mouse model of *FRG2* overexpression not conclusive. The function of this protein is still unknown.

#### **6.2 FSHD region gene 1 (***FRG1***)**

In the human genome the *FRG1* gene is located 125 kb centromeric to the D4Z4 array on chromosome 4. As for many other genes from the 4q subtelomeric region, several copies of *FRG1* are present in the human genome (van Deutekom et al., 1996c). The *FRG1* copy on chromosome 4 encodes a 258-amino acid protein. Although the FRG1 protein does not share significant overall homology with any known protein, it contains two nuclear localization signals in the N-terminal region (NLSs, aa 22-25 and 29-32), a bipartite NLS in the C-terminal region (aa 253-261), a single fascin-like domain (aa 58-176), indicative of an actin-binding protein (Figure 6B), and one potential RNA-binding domain (aa 22-35) homologous to those of several RNA-binding proteins (RBPs). The FRG1 protein is highly conserved among invertebrates and vertebrates: human FRG1 shares 42% identity with the *C. elegans* protein, 81% identity with the *Xenopus* protein and 97% identity with the mouse protein (Figure 6B). The high level of conservation across species suggests that FRG1 might have a very important function that has been preserved during evolution.

Since its discovery, FSHD Region Gene 1 (*FRG1*) has been considered a candidate gene for FSHD (van Deutekom et al., 1996c). Analysis of its expression level in muscle tissues obtained from FSHD patients and healthy subjects showed that *FRG1* was abnormally upregulated in FSHD-affected muscles. Significantly, in lymphocytes from FSHD patients its expression was equivalent to that observed in normal tissue, indicating that this overexpression in FSHD is muscle-specific (Gabellini et al., 2002). Consistent with this evidence,
