pathological aggregates, which are crucibles of amyotrophic lateral sclerosis (ALS) pathogenesis.

**Keywords:** disordered regions, repeats, motifs of low complexity, Alzheimer's disease, amyotrophic lateral sclerosis (ALS)

**1. Introduction**

Misfolded proteins that form aggregates underlie many diseases. There is no definitive answer to the question of what causes the death of cells in which such aggregates are found: is it a defence mechanism of the cell, or simply cell death? Normally, cells that accumulate such aggregates undergo programmed cell death, or apoptosis, and the fragments of apoptotic cells are removed by phagocytes. However, aggregates such as amyloid fibrils are known to resist various proteases [1]. This resistance can prevent efferocytosis (the phagocytic clearance of apoptotic cells) from being completed, so the aggregates accumulate in tissues. In any case, the formation of aggregates is undoubtedly closely associated with the development of many fatal diseases. Several models describe the process of fibril formation. According to one of them, proteins must first be unfolded or partially folded before aggregation can begin [2]. Indeed, denaturing conditions are known to promote fibril formation. At the same time, the peptides and proteins involved in the pathogenesis of amyloidoses such as type II diabetes, Alzheimer's disease, and Parkinson's disease aggregate without prior unfolding. These data, however, support rather than contradict the general rule, because under physiological conditions most of these proteins lack a definite structure, i.e., they are natively unstructured [3]. Yet most natively unfolded proteins do not aggregate in vivo [4]. Moreover, unstructured proteins are resistant to denaturing conditions, i.e., to stress factors, above all high temperatures [5]. It has also been shown that the absence of structure does not correlate with the capacity to aggregate [6].

Therefore, to avoid the spontaneous self-assembly of protein molecules, evolutionary selection has increased the content of amino acids that inhibit protein aggregation, such as proline and glycine [7], as well as the content of charged amino acids [8]. Globular proteins, by contrast, contain many amyloidogenic regions and avoid aggregation by folding rapidly into a globular structure. This shows that protein unfolding is necessary but not sufficient for the formation of amyloid fibrils. Most likely, there are specific solvent-exposed sequence motifs that are more prone to aggregation than other regions of the amino acid sequence. Experimental data support the hypothesis that small regions of a protein molecule are responsible for its amyloidogenic behavior [9–11].

102 Update on Amyotrophic Lateral Sclerosis

**2. Role of repeats in protein aggregation**

It remains unclear what mechanism underlies the earliest stage of irreversible pathological protein aggregation and how this process is triggered in healthy organisms. The key role in the development of systemic amyloidosis is thought to belong to so-called primes, i.e., factors that accelerate pathological aggregation. Such primes can be infectious agents as well as regions of protein molecules that contain low-complexity motifs, especially when these motifs are repeated. It has been shown that the more often repeats occur in a protein sequence, the less structured the protein is, and most homorepeats are unstructured [12–15]. This does not hold for fibrillar proteins [16], whose capacity to aggregate depends strongly on their amino acid sequence [17–19]. For example, immunoglobulin domains whose sequences are less than 30–40% identical lose their capacity to co-aggregate [18]. Bioinformatics analysis has established that in many multidomain proteins the sequence identity between domains is below 40%. This suggests that low sequence identity is how the domains avoid aggregating with one another.
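The sequence-identity argument above rests on simple arithmetic: the number of identical aligned positions divided by the number of positions compared. The following is a minimal Python sketch. It assumes the two domain sequences are already aligned, with gaps written as `-`, and it skips gapped columns. The function names and the 40% cut-off constant are hypothetical illustrations, not the authors' bioinformatics pipeline. Other identity conventions, such as dividing by the length of the shorter sequence, give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped, so identity
    is computed only over columns where both sequences have a residue.
    This is one of several possible normalisation conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical cut-off: the upper end of the 30-40% range discussed in the text.
CO_AGGREGATION_THRESHOLD = 40.0


def may_co_aggregate(aligned_a: str, aligned_b: str,
                     threshold: float = CO_AGGREGATION_THRESHOLD) -> bool:
    """Crude flag: True if the identity reaches the (assumed) threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACDEFGHIKL", "ACDEFWWWWW")` gives 50.0, because five of the ten compared positions match, so this toy pair would be flagged as above the assumed threshold.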

Repeats in proteins and their specific roles are a focus of active research. The repeats under study include PGMG (GPGM) and PNN repeats in PM27 proteins involved in biomineralization [20], NPNA (NANP) repeats in the circumsporozoite protein of *Plasmodium falciparum*, YSPTSPS repeats in RNA polymerase II, PHGGGWGQ repeats in the prion protein, YGHGGG(N) and YNHGGG(G) repeats in glycine-rich plant proteins, PGQGQQ, PGQGQQGQQ, and GYYPTSPQQ repeats in wheat gluten, and the FGGMGGGKGG repeat in the bivalve protein abductin (*Aequipecten*). As a rule, the conformational loops formed by these repeats are stabilized by interactions with various cations and by noncovalent interactions, in particular between aromatic groups. Many tandem repeats have been shown to add plasticity and mobility to proteins [21]. *P. falciparum* leads in the occurrence of repeats: 35% of its 5300 proteins contain repeats. Moreover, 24% of *P. falciparum* proteins have asparagine-rich prion-like domains, compared with only 3.4% in flies [22]. The occurrence of asparagine repeats in our HRaP database (http://bioinfo.protres.ru/hrap/) [14] likewise shows that *P. falciparum* is the leader. Interestingly, one of the functions of these homorepeats is associated with the parasitic lifestyle [14]. For example, the *P. falciparum* protein Q8IKW2\_PLAF7 (1304 aa, 493.P\_falciparum) contains asparagine homorepeats of various lengths, the longest reaching 41 amino acids. The main functions of this protein are associated with processes such as protein deacylation and chromatin "silencing" [14].
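Counting homorepeats such as the asparagine runs mentioned above amounts to finding maximal uninterrupted runs of a single residue. The sketch below does this with a regular expression. It is an illustration under our own assumptions, with 0-based positions and a hypothetical `min_length` cut-off of 2. It is not the algorithm used to build HRaP.

```python
import re
from typing import List, Tuple


def homorepeats(sequence: str, residue: str,
                min_length: int = 2) -> List[Tuple[int, int]]:
    """Return (start, length) for every uninterrupted run of `residue`.

    Start positions are 0-based. Only runs of at least `min_length`
    residues are reported; the default of 2 is an arbitrary choice.
    """
    if len(residue) != 1:
        raise ValueError("residue must be a single amino acid letter")
    # One or more consecutive copies of the residue form a maximal run.
    pattern = re.compile(re.escape(residue.upper()) + "+")
    runs: List[Tuple[int, int]] = []
    for match in pattern.finditer(sequence.upper()):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.start(), length))
    return runs


def longest_homorepeat(sequence: str, residue: str) -> int:
    """Length of the longest uninterrupted run of `residue` (0 if absent)."""
    runs = homorepeats(sequence, residue, min_length=1)
    return max((length for _, length in runs), default=0)
```

For the toy sequence `"MKNNNSTNNNNNNQ"`, `homorepeats(seq, "N")` returns `[(2, 3), (7, 6)]` and `longest_homorepeat(seq, "N")` returns 6.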

## **3. Cell stress and generation of stress granules**

Cell stress can also be an important factor in initiating aggregation. This holds even though every cell, being regularly exposed to damaging stress, has developed an intricate defence system. A striking example of such a strategy for waiting out stress is the formation of stress granules (SGs), in which untranslated mRNAs and RNA-binding proteins assemble into ribonucleoprotein (RNP) complexes, halting protein synthesis and thereby conserving the cell's energy. In this case, only the proteins required for cell survival are synthesized [23]. Once the stress ends, SGs disassemble rapidly and the "released" mRNAs resume their function. However, the residence time of these proteins in SGs may increase for some reason, or their concentration in SGs may exceed normal levels. In either case, SG disassembly can be impeded, creating favorable conditions for the emergence of a "center of aggregation initiation" that may trigger the transition to irreversible pathological protein aggregation [24]. A detailed analysis of the mechanisms of SG assembly and disassembly can provide new insight into the development of the associated diseases and suggest novel therapeutic approaches. Since the main function of SGs is to protect cells from stress, many studies seek to identify the factors that shift the balance from reversible aggregation towards pathology after the stress ends (**Figure 1**).

**Figure 1.** Schematic representation of the conformational states involved in the self-assembly and disassembly of prion-like domains, and of a possible transition to irreversible pathological aggregation. Modified and adapted from Li et al. [25].

RNA-binding proteins are thought to promote SG formation through the self-assembly of their prion-like domains [26–28]. In any case, it has been established that it is the unstructured part of RNA-binding proteins that is responsible for forming hydrogels and binding to them [29]. Thus, the FUS protein retained its capacity to form a gel after removal of the C-terminal region corresponding to the RNA-binding domain. It lost this capacity upon removal of the N-terminal unstructured region corresponding to the prion-like domain. The capacity to bind to such hydrogels has been established for the RNA-binding proteins hnRNPA2, RBM3, hnRNPA1, TIA1, CPEB2, FMRP, CIRBP, and TDP43, as well as for yeast Sup35. Hydrogel formation was also demonstrated for hnRNPA2. Hydrogels formed from different proteins were shown to retain proteins to different degrees [29].
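Prion-like domains are usually recognized by their compositional bias. One example is enrichment in asparagine and glutamine, as in the asparagine-rich domains of *P. falciparum* proteins mentioned earlier in this chapter. As a rough illustration, the sketch below scans a sequence with a sliding window and reports spans whose combined Q+N fraction reaches a threshold. The window length (60) and threshold (0.3) are hypothetical defaults. This is a crude heuristic, not a published predictor such as PLAAC. It will also miss prion-like domains whose bias lies mainly in other residues, such as glycine, serine, or tyrosine.

```python
from typing import List, Tuple

QN = frozenset("QN")  # residues counted toward the compositional bias


def qn_fraction_profile(sequence: str, window: int = 60) -> List[float]:
    """Fraction of Q and N residues in each sliding window.

    Entry i covers sequence[i:i + window]. Returns an empty list if the
    sequence is shorter than the window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    if len(seq) < window:
        return []
    count = sum(1 for aa in seq[:window] if aa in QN)
    profile = [count / window]
    for i in range(1, len(seq) - window + 1):
        # Slide by one: drop the residue leaving, add the residue entering.
        if seq[i - 1] in QN:
            count -= 1
        if seq[i + window - 1] in QN:
            count += 1
        profile.append(count / window)
    return profile


def qn_rich_regions(sequence: str, window: int = 60,
                    threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Merge qualifying windows into (start, end) spans, end exclusive."""
    regions: List[Tuple[int, int]] = []
    for start, fraction in enumerate(qn_fraction_profile(sequence, window)):
        if fraction < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlaps the previous span: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

Consider a toy sequence of 20 alanines, then 10 glutamines, then 20 alanines. For it, `qn_rich_regions(seq, window=10, threshold=0.5)` returns `[(15, 35)]`. Every window that overlaps the glutamine run by at least five residues qualifies, and the overlapping windows are merged into one span.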
