**2. The intriguing shape diversity of parvovirus telomeres**

#### **2.1 Size, GC content, and shape of parvovirus ends**

Although *Parvoviridae* genomes have been extensively studied, in particular for phylogenetic and evolutionary analyses, the sequences and characteristics of their telomeres are not clearly described. We therefore analyzed the *Parvoviridae* TR sequences publicly available in the NCBI GenBank database. Complete *Parvoviridae* genomes were downloaded, and at least one representative virus per genus was selected based on its notoriety and on the information available on the International Committee on Taxonomy of Viruses (ICTV) website. TRs were annotated following the GenBank annotation or information available in the literature (**Table S1**). TR sequences of homotelomeric genomes were then verified by aligning the 5′ and 3′ ends. Sequences differing in length and showing no homology (no palindromic region common to the 5′ and 3′ ends) were discarded from the data set. Finally, the presence of palindromic regions was verified with RNAfold (method described below) [7]. A total of 40 *Parvoviridae* 5′ and 3′ TR sequences were extracted for further analysis: 17 *Densovirinae*, 22 *Parvovirinae*, and 1 unclassified *Parvoviridae* telomeres.
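The verification of homotelomeric ends can be illustrated with a minimal Python sketch. It assumes that, in a homotelomeric genome, the right end read on the same strand is approximately the reverse complement of the left end; the function names and the toy sequence are hypothetical, and the similarity score from `difflib` is only a crude stand-in for a proper sequence alignment.

```python
from difflib import SequenceMatcher

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def end_similarity(left_tr: str, right_tr: str) -> float:
    """Crude similarity (0 to 1) between the left TR and the reverse
    complement of the right TR. This is not an alignment score."""
    # autojunk=False: otherwise difflib treats frequent characters as junk,
    # which distorts comparisons on a four-letter alphabet.
    matcher = SequenceMatcher(None, left_tr.upper(),
                              reverse_complement(right_tr), autojunk=False)
    return matcher.ratio()

# Hypothetical toy ends, not taken from any GenBank record.
left = "AACGGTTCCGATTACG"
right = reverse_complement(left)
print(end_similarity(left, right))
```

Pairs with a low score, or with very different lengths, would be candidates for exclusion under the criteria described above.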

First, the length of each TR was determined and listed in **Table S1**. Interestingly, TR length varies within a single genus (**Table 1**), for example from 122 nucleotides for PcDV to 550 nucleotides for GmDV, both in the genus *Ambidensovirus*. Second, the GC percentage was calculated for each parvovirus TR. **Figure 1** highlights the diversity of TR GC content among parvoviruses. The minimum and maximum GC contents were observed for the left TRs of AalDV2 and AAV2, with 32.4% and 69.7%, respectively. Within the genus *Ambidensovirus*, the GC percentage ranges from 35% to 58.5%, the lowest value being that of the 5′ TR of PcDV (**Figure 1b**). By comparison, telomeres of human chromosomes contain 50–55% of G and C bases, whereas the whole human genome contains 40.9% GC on average [8].
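The GC percentage reported here is a simple base count. The following Python sketch shows the calculation on a toy sequence; the function name is hypothetical, and the sketch does not reproduce the APE program used for Figure 1.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    # Count only A, C, G, and T so that gaps or N do not bias the ratio.
    valid = sum(seq.count(base) for base in "ACGT")
    if valid == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / valid

# Toy sequence (hypothetical, not a real terminal repeat).
print(gc_content("GGCCAATT"))
```

Applied to each TR sequence from Table S1, such a function would give the per-virus values that a plot like Figure 1 summarizes.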

To visualize the general shape and the secondary structures of the viral TRs, the folding of each parvovirus TR was predicted with the RNAfold program using the parameters of the Turner model for single-stranded RNA and DNA and of the Matthews model for double-stranded DNA [7]. Additionally, the mFold program was used in DNA mode to corroborate the predictions [9]. The most thermodynamically stable structures, or minimum free energy (MFE) structures, obtained on the RNAfold web server were used to propose a classification of the TRs (**Figure 2**). Four groups were defined according to the number of hairpin loops at the TR extremity and named H1 (previously named U- and I-shapes in the literature), H2 (corresponding to J-, Y-, and T-shapes), H3, and H4 (**Figure 2**, **Table S1**). This classification, based on the number of terminal hairpin loops after folding and on additional structural characteristics, may be more informative and precise than the global shape. Moreover, this nomenclature is applicable to all parvovirus TRs. Interestingly, TR sequences and shapes differ within a genus. For example, among *Ambidensovirus*, the CpDV and DicDV 5′ TRs are both classified in the H1 group although they share only 43% sequence homology. In the genus *Bocaparvovirus*, the HBoV1 and BpV1 left ends are 62% homologous in sequence but form a terminal H1 and H2 shape, respectively. Phylogenetic and evolutionary analyses of *Parvoviridae* have been built on the homology of NS1 proteins; telomeres have never been considered as a classification criterion.
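The hairpin-loop count behind the H1 to H4 grouping can be illustrated from a dot-bracket string, the text notation in which RNAfold reports a predicted structure. The Python sketch below is a simplification: it counts every hairpin loop in the structure, whereas the classification above considers the hairpin loops at the TR extremity, so the two agree only for simple folds. The function names and the toy structures are hypothetical.

```python
import re

def count_hairpin_loops(structure: str) -> int:
    """Count hairpin loops in a dot-bracket structure.

    A hairpin loop is an opening bracket followed only by unpaired
    positions ('.') up to its closing bracket.
    """
    return len(re.findall(r"\(\.*\)", structure))

def shape_class(structure: str) -> str:
    """Map the hairpin-loop count onto the labels H1 to H4 (sketch only)."""
    n = count_hairpin_loops(structure)
    if not 1 <= n <= 4:
        raise ValueError(f"{n} hairpin loops: outside the H1-H4 scheme")
    return f"H{n}"

# Toy structures (hypothetical): a single stem-loop and a T-like shape.
print(shape_class("((((....))))"))
print(shape_class("((((((..((((...))))..((((...))))..))))))"))
```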


#### **Table 1.**

*Minimal and maximal length of five-prime and three-prime terminal repeats within genera of the* Parvoviridae *family.*

#### **Figure 1.**

*GC content of five-prime terminal repeats of the* Parvoviridae *family: differences inside the* Parvovirinae *(a) and the* Densovirinae *(b) subfamilies. The telomere sequences were downloaded from the NCBI GenBank database (see Table S1 for accession numbers). GC content was calculated using the APE program.*

#### **2.2 Comprehensive analysis of DNA secondary structural elements**

The global analysis of the parvovirus TRs has highlighted their broad diversity, even within the same genus. To study TR divergence, an in-depth prediction of the secondary structures was performed, followed by a principal component analysis (PCA). Secondary structure elements (**Figure 2**) and non-B form DNA structures were included as variables in the PCA.

*The Diversity of Parvovirus Telomeres DOI: http://dx.doi.org/10.5772/intechopen.102684*

#### **Figure 2.**

*Examples of five-prime terminal repeat folding among the* Parvoviridae *family. The most thermodynamically stable structures or minimum free energy (MFE) structures were obtained using the RNAfold program (RNA mode: http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi). Four groups (H1 to H4) were generated according to the number of hairpin loops (in blue) found at the five-prime TR extremity. The following DNA secondary structures were identified and counted: stems (green), multiloops (red), interior loops (yellow), and hairpin loops (blue).*

Non-canonical structures are likely to be recognized by cellular proteins and thus to be essential in virus-host interactions. For example, a recent study reported that special DNA structures, such as quadruplexes, can preferentially bind to IFI16 and trigger more potent type I IFN responses than those produced by the same sequence in dsDNA [10]. Such structures are intrinsic to many viral genomes, such as those of EBV and HPV [11]. Rich in GC, viral telomeres may also contain non-B DNA structures, such as G-quadruplexes (G4) or triplexes.

Therefore, putative G4 and triplexes were searched for in all the parvoviral TRs. G4 are non-canonical DNA secondary structures formed by G-rich sequences (**Figure 3a**). Present in human telomeres, they are suggested to participate in the maintenance of chromosome stability [12]. G4 have also been shown to be present and to play major roles in almost all virus families [13]. They have been described in some parvovirus telomeres [14] but have not been systematically predicted in all parvovirus ends. G4 were predicted with the online tool QGRS-mapper [15] using the following search parameters: QGRS max length 45, min G-group size 2, loop size from 0 to 12. These criteria are deliberately strict to increase the stringency and relevance of the G4 prediction. Three values were collected: the raw number of predicted G4 with overlaps, the raw number without overlaps, and the QGRS max-score, which rewards the G4 that are more likely to form. The *erythroparvovirus* B19V contains four G4 without overlaps, which is the maximum number of these non-B motifs for a parvovirus TR. Including overlaps, the CeDV and SifDV TRs harbor the highest number of putative tetraplex DNA structures, with 296 G4. Of note, the two *brevidensoviruses* lack any predicted G4 in their ends. No correlation exists between TR length and the number of predicted G4 (data not shown).
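The kind of motif search described above can be sketched with a regular expression. The Python code below is only an approximation written for illustration, not the QGRS-mapper algorithm: it assumes four G-groups of exactly `g_size` guanines separated by loops of 0 to `max_loop` bases, reports at most one motif per start position, and computes no G-score. The function names and the toy sequence are hypothetical.

```python
import re

def find_putative_g4(seq, g_size=2, max_loop=12, max_len=45):
    """Return (start, motif) pairs for putative G4 motifs, overlaps included."""
    run = "G{%d}" % g_size
    loop = "[ACGT]{0,%d}?" % max_loop
    # The lookahead lets a match begin at every position, so overlapping
    # motifs are reported; lazy loops favor short loops for each start.
    pattern = re.compile("(?=(%s(?:%s%s){3}))" % (run, loop, run))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(motif) <= max_len:
            hits.append((match.start(), motif))
    return hits

def count_non_overlapping(hits):
    """Greedy left-to-right count of motifs that do not overlap."""
    count, next_free = 0, 0
    for start, motif in hits:
        if start >= next_free:
            count += 1
            next_free = start + len(motif)
    return count

# Toy sequence (hypothetical, not a parvovirus TR).
hits = find_putative_g4("GGAGGAGGAGGTTTGGAGGAGGAGG")
print(len(hits), count_non_overlapping(hits))
```

The gap between the two counts in such a sketch mirrors the difference reported above between the numbers of G4 with and without overlaps.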

In parallel, triplexes are important non-B form DNA structures for protein recognition, such as the binding of the p53 factor [16]. Triplexes can form at homopurine:homopyrimidine sequences with mirror symmetry (**Figure 3d**). The triplex package for R was used to predict the existence of intramolecular triplex DNA structures in the parvovirus TRs [17]. Only two triplexes were found, both in the ends of the *dependoparvovirus* BAAV.
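The sequence requirement stated above, a homopurine or homopyrimidine tract with mirror symmetry, can be illustrated with a deliberately crude Python sketch. It is not the algorithm of the triplex package: it reports only maximal tracts that are perfect mirror repeats and ignores mismatches, spacers, and scoring. The function name, the `min_len` threshold, and the toy sequence are hypothetical.

```python
import re

def mirror_tracts(seq, min_len=10):
    """Return (start, tract) for maximal homopurine or homopyrimidine
    tracts that read identically in both directions (mirror repeats)."""
    hits = []
    # [AG]+ matches purine-only runs, [CT]+ matches pyrimidine-only runs.
    for match in re.finditer(r"[AG]+|[CT]+", seq.upper()):
        tract = match.group(0)
        if len(tract) >= min_len and tract == tract[::-1]:
            hits.append((match.start(), tract))
    return hits

# Toy sequence (hypothetical): a 12-nt purine mirror repeat flanked by CC.
print(mirror_tracts("CCAGGAAGGAAGGACC"))
```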

#### **Figure 3.**

*Two non-B secondary structures: G-quadruplex (G4) and triplex. (a) 3D representation of a theoretical G4. (b) 3D representation of a G4 in the parvovirus B19 five-prime telomere and correspondence to the G tetrads sequence (c). (d) Triplex theoretical 2D representation. (e) 3D representation of a triplex of the five-prime bovine AAV terminal repeat and (f) correspondence to the sequence. (b) and (e) figures were obtained using the Jmol software (Jmol: An open-source Java viewer for chemical structures in 3D. http://www.jmol.org/).*

Finally, a PCA was performed on the forty left TRs with the following variables: length, GC content, shape, max G-score, raw number of G4 with overlaps, and the secondary structure elements (hairpin loops, interior loops, junction loops, and stems) collected from the RNAfold analysis. The R package FactoMineR was used [18]. The main PCA variables are the stems and loops (**Figure 4a**). The "hairpin loops" criterion is one of the most important elements allowing the division of parvoviruses into groups, hence the relevance of our proposed classification into shapes H1 to H4. Clustering was subsequently performed on the three most informative dimensions, which correspond to more than 70% of the cumulative variance. Five clusters were obtained (**Figure 4b**).
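The cumulative variance used to choose the number of dimensions can be illustrated with a small, dependency-free Python sketch of a scaled PCA. It does not reproduce FactoMineR: it standardizes the variables, builds their correlation matrix, and extracts the leading eigenvalues by power iteration with deflation. All function names and the toy data are hypothetical, and every variable is assumed to be non-constant.

```python
import math
import random

def standardize(rows):
    """Center each column and scale it to unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)]
            for i in range(p)]

def leading_eigenvalues(mat, k, iters=1000, eps=1e-12):
    """Largest k eigenvalues of a symmetric matrix (power iteration)."""
    p = len(mat)
    mat = [row[:] for row in mat]
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    values = []
    for _component in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        lam = 0.0
        for _step in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < eps:  # no variance left in the deflated matrix
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = sum(v[i] * mat[i][j] * v[j]
                      for i in range(p) for j in range(p))
        values.append(lam)
        # Deflation: remove the component just found.
        for i in range(p):
            for j in range(p):
                mat[i][j] -= lam * v[i] * v[j]
    return values

def cumulative_variance(rows, k):
    """Cumulative fraction of variance carried by the first k dimensions."""
    corr = correlation_matrix(standardize(rows))
    total, out = 0.0, []
    for lam in leading_eigenvalues(corr, k):
        total += lam / len(corr)  # the trace of a correlation matrix is p
        out.append(total)
    return out

# Toy table (hypothetical): rows are TRs, columns are two correlated variables.
print(cumulative_variance([[1, 2], [2, 4], [3, 6], [4, 8]], 2))
```

With a real table of forty TRs and the variables listed above, the third value of such a cumulative list would be the quantity compared against the 70% threshold.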

Cluster 1, which includes HBoV1 and AAV2, one of the best-known parvoviruses in the gene therapy field, is characterized by a high value for the variable "GC content." Parvovirus B19 belongs to cluster 2, a group characterized by high values for the G4 scores and TR length. Cluster 3 mainly depends on the shape class. Individuals in cluster 4 hold a similar number of multiloops and hairpin loops. Finally, viruses in cluster 5 share many DNA structural features (stems, interior loops, hairpin loops, multiloops, and length). Clusters do not perfectly correlate with the phylogenetic classification (**Figure 4c**); however, we observed that cluster 1 is composed only of *Dependoparvovirus* and *Bocaparvovirus* and that cluster 5 contains two *Ambidensovirus*, GmDV and PsiDV. Interestingly, cluster 5 differs markedly from the other groups.
