processing of precursor RNAs. Besides, in several taxa, mt-mRNAs, rRNAs, and even tRNAs may be oligoadenylated or polyadenylated [16]. This has potentially dual and opposite roles: it promotes transcript stability or offers a target for initiating degradation. Overlapping genes on the same DNA strand occur throughout Metazoa [17]. Therefore, the termination points of the protein-encoding genes can be difficult to infer, as stop codons (generally UAA or UAG) may be absent. It is accepted that abbreviated stop codons (U or UA) are converted to UAA codons by polyadenylation after transcript cleavage, and this has been confirmed by analyses of transcripts in some cases [18]. Sometimes, the initiation codon may also not have been detected. For several protein-encoding genes, the question of a possible overlap with adjacent downstream or upstream *trn* genes is often raised [19]. Moreover, overlaps between adjacent mt-*trn* genes are frequent, but this is outside our topic [19, 20].

Incidentally, in 2004, during a search for chaetognath mt-*trn* genes [21], it was observed that tRNAs bear nt triplets corresponding to stop or start codons at precise conserved positions, and this constitutes the original topic of this chapter.

6 Mitochondrial DNA - New Insights

**2. Material and methods**

Most of the research was done in two databases that include primary sequences and graphical representations of tRNA 2D structures. tRNAdb (http://trna.bioinf.uni-leipzig.de/DataOutput/) contained more than 12,000 *trn* genes from 579 species belonging to prokaryotes and eukaryotes, whereas mitotRNAdb (http://mttrna.bioinf.uni-leipzig.de/mtDataOutput/) recorded 30,525 metazoan mt-*trn* genes belonging to 1418 species [22]. Despite a bias for Metazoa, these two databases provide powerful and fast search engines. Alignments were generated by Clustal W (www.ebi.ac.uk/clustalw/), whereas secondary structures were predicted by tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/) [23]. BLAST analyses were conducted using the website: https://blast.ncbi.nlm.nih.gov/Blast.cgi.

**3. Results and discussion**

**3.1. Frequencies of TAR10 and ATR49 triplets in various taxa**

Visual observation of deduced tRNA 2D structures suggested that nt triplets that could correspond to stop or start codons are particularly well represented at specific positions. The UAR (R for purine) triplets lie at positions 8–10 in the standard numbering and are therefore named UAR10, whereas the potential initiation codons whose last nt is at position 49 are called AUR49 (**Figure 1**). We chose to number the codons only according to their last nt because nt 47 is frequently missing in the metazoan mt-tRNAs. Analyses focus on DNA; hence, these triplets are usually annotated TAR or ATR instead of UAR or AUR. All the tRNAs that bear one or both of these codons are named ss-tRNAs (ss for stop and start), or ss-*trn* for the corresponding genes. Using the tRNAdb and mitotRNAdb databases, the frequencies of these triplets were investigated in different taxa, including nuclear and organelle genomes for eukaryotes (**Table 1**). Excluding taxa for which the number of *trn* genes is too low for statistical analysis, TAR10 always occurs at high frequencies, whether in prokaryotic, nuclear, or organelle genomes. Values range from 41.1% for fungi to 81.6% for pseudocoelomates. In all the taxa and all tRNA species combined, the percentage of TAG10 triplets is always significantly higher than that of TAA10. The differences are very large in prokaryotic and nuclear genomes, since the percentage of TAA10 is always less than 1, whereas that of TAG10 is at least 40. Within the organelle genomes, the difference is smaller, but the two percentages still differ by a factor of 2.5–22.

Metazoan taxa are in italics. *Abbreviation*: Nb, number of *trn* genes. TAR + ATR, % of *trn* genes bearing the two types of triplet. \* Mitochondria.

**Table 1.** Percentage of TAR10 and ATR49 triplets in various taxa independently of the V-R size.
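The triplet definitions used in this section can be illustrated with a minimal Python sketch. It is not the procedure used with tRNAdb or mitotRNAdb; it assumes, hypothetically, that each *trn* gene has already been mapped to the standard tRNA numbering and is supplied as a dictionary from integer position to DNA nucleotide, with missing positions (such as nt 47) simply absent. The function names `triplet_ending_at` and `triplet_percentages` are illustrative only.

```python
from collections import Counter

TAR = {"TAA", "TAG"}  # stop-like triplets (R = purine)
ATR = {"ATA", "ATG"}  # start-like triplets


def triplet_ending_at(numbered, last_pos):
    """Return the triplet whose last nt sits at `last_pos`.

    `numbered` maps standard tRNA positions (integers) to nucleotides.
    The two preceding nts are the two nearest positions actually present,
    so a missing nt 47 does not shift the triplet named after position 49.
    """
    if last_pos not in numbered:
        return None
    preceding = sorted(p for p in numbered if p < last_pos)[-2:]
    if len(preceding) < 2:
        return None
    return "".join(numbered[p] for p in preceding) + numbered[last_pos]


def triplet_percentages(genes):
    """Percentage of genes bearing TAR10, ATR49, or both triplets."""
    counts = Counter()
    for numbered in genes:
        has_tar = triplet_ending_at(numbered, 10) in TAR
        has_atr = triplet_ending_at(numbered, 49) in ATR
        counts["TAR10"] += has_tar
        counts["ATR49"] += has_atr
        counts["TAR10+ATR49"] += has_tar and has_atr
    n = len(genes)
    return {key: 100.0 * value / n for key, value in counts.items()} if n else {}


if __name__ == "__main__":
    # Toy genes (not real sequences): only the positions of interest are given.
    gene_a = {8: "T", 9: "A", 10: "G", 46: "A", 48: "T", 49: "G"}  # nt 47 missing
    gene_b = {8: "T", 9: "A", 10: "C", 47: "G", 48: "C", 49: "G"}
    print(triplet_percentages([gene_a, gene_b]))
```

Numbering the triplet by its last nt, as in the text, is what lets the same lookup work whether or not nt 47 is present. Positions carrying letter suffixes in long variable regions are not handled in this sketch.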

As the TAR10 triplet (principally TAG) is present in at least 40% of the *trn* genes for all taxa and genomic systems combined, it could have been present in *trn* genes of the Last Universal Common Ancestor (LUCA), which presumably lived some 3.5–3.8 billion years ago [24]. It is probably an ancestral character that was present in proto-*trn* sequences. As the percentage of TAA10 strongly increases in *trn* genes of organelles, one can ask whether this character was already present in their bacterial ancestor. It is now assumed that, despite their diversity, all mitochondria derive from an endosymbiotic α-proteobacterium that was integrated into a host cell related to Asgard Archaea approximately 1.5–2 billion years ago [25]. However, the earliest fossils possessing features typical of fungi date to 2.4 billion years ago [26]. Moreover, eukaryotic cells would be chimeras of an archaebacterium and one or more Eubacteria [27]. In addition, all current models for the origin of eukaryotes suggest that the eukaryotic common ancestor had mitochondria. As the level of TAA10 is very low in *trn* genes of α-proteobacteria, it could therefore be a derived trait that may be related to the increase in AT% in mtDNA and/or to recognition constraints imposed by mt-aaRSs and modification enzymes. Similarly, it is generally accepted that all chloroplasts and their derivatives are derived from a single cyanobacterial ancestor [28], and in current cyanobacteria, the percentages of TAG10 and TAA10 triplets are 62.5 and 3.6, respectively. The increase in the percentage of TAA10 thus characterizes organelles.

In all the taxa, for all tRNA species combined, the ATR49 triplets are always present in smaller numbers than TAR10. Moreover, their numbers are negligible except in organelle genomes, mainly mitochondria. The low level of ATR49 triplets in Pseudocoelomata is due to the frequent absence of the T-arm in their mt-tRNAs. In mitochondria, the frequency of ATG49 is higher than that of ATA49 in some taxa, while the opposite occurs in others. This variability is not surprising, given approximately 2 billion years of mtDNA evolution [29]. It must be noted that the nt G is overrepresented at the 5′-end of the 5′-acceptor- and D-stems, quite often at the 5′-end of the T-stem, but rarely at the equivalent position of the anticodon-stem. In taxa where the percentage of ATA49 is higher than that of ATG49, G is most often not the majority nt at the 5′-end of the T-stem. Moreover, differences between the relative percentages of ATG49 and ATA49 could be due, at least in part, to variations in the AT% of organelle DNAs. The percentage of ATR49 is very low in α-proteobacteria, lower than in Proteobacteria or Eubacteria as a whole, and it is also very low in cyanobacteria. The significant rate of ATR49 triplets would therefore seem to be a derived condition of organelle DNAs rather than a conserved primitive state lost in current prokaryotes. This trait probably appeared during the transition from endosymbiotic bacterium to permanent organelle, which implied massive evolutionary changes including genome reduction, endosymbiotic and lateral gene transfers, the emergence of new genes, and the retargeting of proteins [25]. The timing of the mt-endosymbiosis and of the transition from proto-mitochondria to mitochondria is uncertain, but one might place the origin of the ATR49 triplets between the first eukaryotic common ancestor (FECA) and the last eukaryotic common ancestor (LECA). A second event occurred, at least in the mitochondria of the ancestors of Opisthokonta (i.e., Metazoa and Fungi), which would have led to a net increase in the number of ATR49 triplets. ATR49 means that the last two nts of the V-R are AT. It turns out that this mainly concerns the mt-*trn* genes whose V-R has only 4 nts, which are almost exclusively present in the Fungi/Metazoa clade.

There are large differences in the frequencies of the TAR10 and ATR49 triplets depending on the species of *trn* genes (**Table 2**) and on the taxa (data not shown). The selective variations in some taxa suggest that the increase in frequency for some types of triplets would be much more recent than mentioned above; decreases are also observed. There are, however, very conservative trends, such as the presence of ATR49 triplets in genes specifying tRNA-Ala. Analyses of mt-*trn* genes of Deuterostomia, for which a great number of sequences of each type are available (from 1085 to 1382), show that only the tRNA-Cys and tRNA-Glu species have intermediate TAR10 percentages (**Table 2**). In all other tRNA species, the values are extreme: 9 tRNA species have values ranging from 0.4 to 9.8%, and 10 have values greater than 82.4% (**Table 2**). In contrast, half of the tRNA species have low ATR49 percentages (≤10.8), and percentages are ≥77.8 for only four types. There also appears to be a tendency for tRNA species with very high or very low percentages of TAR10 to have low ATR49 percentages (the tRNA species with the 7 highest and the 8 lowest TAR10 percentages exhibit 10 of the 11 lowest percentages for ATR49).


True Mitochondrial tRNA Punctuation and Initiation Using Overlapping Stop and Start Codons…

http://dx.doi.org/10.5772/intechopen.75555




The *trn* genes are represented by the three-letter codes of the amino acids. The tRNAs are ordered by decreasing TAR10 percentage. Percentage values ≥77.8, between 17.0 and 56.5, and ≤10.8 are highlighted in yellow, blue, and green, respectively.

**Table 2.** Percentages of TAR10 and ATR49 by mt-*trn* gene species in Deuterostomia.


**3.2. Examples of putative implications of TAR10 and ATR49 as stop and start codons**

In order to investigate possible implications of TAR10 and ATR49 triplets in translation, searches were performed in GenBank using the keywords "TAA stop codon is completed by the addition of 3' A residues to the mRNA", "alternative start codon", or "start codon not determined", together with mitochondrion (or mitochondrial DNA) complete genome. It was then checked whether a *trn* gene lay upstream (for a start codon) or downstream (for a stop codon) of the protein-encoding gene. When a *trn* gene was found, TAR10 or ATR49 triplets were searched for, and the same investigation was then made in conspecific mt-genomes. Using this strategy, these triplets have been found only in metazoan mtDNAs, in which overlapping mt-*trn* genes have long been known.
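The stop-codon side of this search can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually applied to the GenBank records: the genome is taken as a plain DNA string on the coding strand, coordinates are 0-based, the *trn* gene lies immediately downstream of the protein-encoding gene, and the first 10 nts of the *trn* gene have no insertion or deletion relative to the standard numbering. The function names and the toy sequence are hypothetical.

```python
def first_in_frame_stop(genome, cds_start, stops=("TAA", "TAG")):
    """Index of the first nt of the first in-frame stop codon, or None.

    Scanning starts at `cds_start` and simply continues codon by codon,
    so the stop may be found inside a downstream trn gene.
    """
    for i in range(cds_start, len(genome) - 2, 3):
        if genome[i:i + 3] in stops:
            return i
    return None


def overlap_with_trn(stop_index, trn_start):
    """Number of trn nts covered by the ORF, stop codon included."""
    stop_end = stop_index + 3  # exclusive end of the stop codon
    return max(0, stop_end - trn_start)


def stop_is_tar10(stop_index, trn_start):
    """True if the stop codon occupies trn positions 8-10 (offsets 7-9)."""
    return stop_index == trn_start + 7


if __name__ == "__main__":
    # Toy example (not a real sequence): 8 nts of coding sequence
    # followed directly by a trn gene whose positions 8-10 read TAG.
    cds_part = "ATGGCTGC"
    trn = "GCCAAGTTAGCTTA"
    genome = cds_part + trn
    trn_start = len(cds_part)

    stop = first_in_frame_stop(genome, cds_start=0)
    print(stop, overlap_with_trn(stop, trn_start), stop_is_tar10(stop, trn_start))
```

When the first in-frame stop is a TAR10 triplet, the computed overlap is 10 nts, which is the situation discussed for the *cox1*/*trnT* gene pair. The set of stop codons is a parameter because it depends on the mt-genetic code of the taxon under study.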

An example of the putative use of TAR10 triplets as stop codons is presented in **Table 3** for a subclass of parasitic flatworms (Platyhelminthes: Eucestoda). Their mt-genetic code has only UAG and UAA as stop codons, which avoids possible bias due to the use of other types of termination codons. In 51 of 66 complete mt-genomes, the first in-frame potential stop codon of the *cox1* gene is in the downstream *trnT* gene (24 cases with TAG10, suggesting a 10 nt overlap between the *cox1* and *trnT* genes). Authors who consider such a long overlap impossible have proposed a number of alternative options favoring overlap avoidance (e.g., [30]). (1) *cox1* might use an earlier atypical stop codon. (2) The 3′-end of the *cox1* mRNA could have an abbreviated stop codon (U or UA instead of UAG10) upstream of the *trnT* gene, which is completed by polyadenylation. (3) If, in the potential long transcript, cleavage occurred just after G10, the *cox1* mRNA would end with the complete UAG10 as stop codon, and the first 10 nts of the *trnT* gene would be added by an unknown editing process. (4) The *trnT* gene would be shorter at its 5′-end, lacking nts 1 to 8 or 9; this has been proposed for the mt-*trnT* of Cyclophyllidea (*Echinococcus granulosus*, *Hymenolepis diminuta*, and *Taenia crassiceps*). If the full stop codon is used, then there is only a single nt (G10) overlap between *cox1* and *trnT*. Moreover, if the end of the *cox1* gene is at the level of T9, the stop codon would be completed by polyadenylation, whereas if the protein gene has a complete stop codon, the nt G10 would be added by editing. In the alternative structures, the D-arm is absent, whereas it is typical for this tRNA in digeneans (a class of Platyhelminthes) and in other phyla. However, mt-*trnT* genes from Cyclophyllidea for which the first potential stop codon is at different positions (upstream or downstream of the *trnT* gene, or within this gene but upstream or downstream of TAG10, or at TAG10 itself) exhibit similar secondary structures, including a D-arm. In addition, the high level of nt conservation in the 5′-end of the *trnT* genes of Cestoda (i.e., G1, G2, T7, T8, A9, G10, T11, T12, and A14) strongly suggests that the 5′-acceptor-stem and the D-stem are under positive selection. All this implies that the hypothesis of D-armless tRNAs is, in our view, improbable.

Species names are followed by their accession number(s). \*: sequences for which the authors considered that there was an abbreviated stop codon located upstream of the *trn* sequence. Symbols: <sup>X</sup>, TAR10 was the first in-frame putative stop codon; §, \$, &, and μ, the putative stop codon was upstream of the *trn* gene (*tg*), in the *tg* but upstream of TAR10 (or nts 8–10), in the *tg* but downstream of TAR10 (or nts 8–10), or downstream of the *tg*, respectively. *Abbreviations*: *P. c*., *Pseudanoplocephala crawfordi*; Proteoce., Proteocephalidea; stop cod., putative stop codon according to the authors of the sequences.

**Table 3.** Position of the first complete in-frame stop codon of the *cox1* gene versus the following *trnT* gene in Cestoda (Platyhelminthes).

Concerning the putative ATR49 start codon, the number of complete mt-genomes found in GenBank using the keywords mentioned previously was relatively low; moreover, in some cases, the upstream gene encoded a protein or specified an rRNA, and/or there was only one mention for a given taxon. A significant example within Deuterostomia (frogs) is presented in **Table 4**. In the superfamily Hyloidea, the ATA49 triplet is frequently the first potential complete start codon at the level of the gene pair encoding NAD1 and specifying tRNA-Leu2. In two families (Bufonidae, Hylidae), for all the sequences (16 belonging to 14 different species), the first ATR triplet found in frame in the ORF of the *nd1* gene is ATA49. For four sequences belonging to three other frog families, the ATR49 triplet is missing from the *trnL2* gene; moreover, an ATA triplet is entirely contained in the V-R of the *trnL2* gene of *Heleophryne regis*, but it is not in frame with the following gene. For these last four cases, the authors of the sequences proposed alternative start codons. This seems obligatory, but it has not been experimentally verified. According to several authors who have sequenced parts of the mtDNAs of Hylidae, the *nd1* gene would start at ATA49 in about 140 sequences (e.g., Roelants and Bossuyt [31]).

In the two studied taxa, BLAST analyses of the NCBI EST and SRA (Sequence Read Archive) databases have been performed, but no result supports the proposed hypotheses: transcripts starting at an ATR49 or terminating at a TAR10 were not found. However, for each taxon, few mt-transcripts are available, and fully matured transcripts are even rarer.

<sup>X</sup>: ATA49 as the first putative in-frame start codon. \*: "start codon not determined" according to the authors of the sequences. Alternative start codons are given by the authors of the sequences. §: alternative start codon in the *trn* gene. *Abbreviations*: Dendro, Dendrobatidae; Cerato., Ceratophryidae; Heleo., Heleophrynidae.

**Table 4.** Position of the first putative start codon of the *nad1* gene versus the upstream gene specifying tRNA-Leu2 in Hyloidea, a superfamily of frogs.

**3.3. Why do mt-*ss-trn* genes with TAG10 and ATR49 triplets as putative stop and start codons only occur in Metazoa?**

Foremost, biases in the search strategy cannot be excluded, but the important point is that the mt-genomes of animals, fungi, protists, and plants differ drastically in all major characteristics, including gene content and size. Generally, metazoans have ultra-compact mtDNAs (from c. 10,000 to c. 50,000 bp); usually, nonfunctional sequences are rapidly eliminated, and there are short intergenic regions and frequent overlaps [13]. However, nonbilaterian mt-genomes show greater variation in size, gene content, shape, and genetic code [32]. The mtDNA size ranges from 30,000 to 90,000 bp in fungi, and intergenic regions are generally relatively long. A broader range of mtDNA size is found in higher plants (from 0.2 × 10<sup>6</sup> to about 11.3 × 10<sup>6</sup> bp [33]), and the largest known mt-genome in this lineage exceeds the sizes of reduced bacterial and nuclear genomes [34]. The increased sizes of plant mtDNAs are mostly due to noncoding DNA sequences, large inserted nuclear regions, and many introns and not to a large increase
