Mitochondrial DNA - New Insights

Directed Mutations Recode Mitochondrial Genes: From Regular to Stopless Genetic Codes

http://dx.doi.org/10.5772/intechopen.80871

*3.3.5. Stop codon depletion in antisense strand and stop codon translation: ND4l and ND5*

Annotations in GenBank for *Aleurodicus dispersus*' ND4l and ND5 do not match proteins homologous to the corresponding NADH dehydrogenase subunits. For ND4l, the peptide translated from frame 0 of the antisense of that gene (this frame includes five stops) yields a short alignment with the regular protein in *Aleurodicus dugesii* (24 residues, 87% similarity, e value 8.5, not shown). Hence, the annotated ORF in *Aleurodicus dispersus* is probably a stop-codon-depleted antisense sequence (the corresponding frame in *Aleurodicus dugesii* has four stops), which does not code for the regular ND4l protein. This stop codon depletion in *Aleurodicus dispersus* introduced five stops in the sense strand frame that apparently codes for ND4l according to the above-described alignment.

For ND5, the peptide translated from the +1 frame of the antisense of the GenBank-annotated sequence is homologous over its complete length to NADH dehydrogenase subunit 5 of *Aleurodicus dugesii* (89% similarity, e value 0, not shown). This frame has a single stop codon that aligns with serine in *Aleurodicus dugesii* (see the discussion of insertion of serine at stops in the previous section).

*3.3.6. Stop codon depletion in antisense strand, frameshift, and stop translation: ND4*

A further mitochondrial gene for which the GenBank annotation does not produce the expected protein for *Aleurodicus dispersus* is ND4. Blastp alignment analyses detect peptides homologous with the regular NADH dehydrogenase subunit 4 of *Aleurodicus dugesii* when translating frames +1 and +2 of the antisense of the GenBank-annotated ND4 gene. The regular protein is encoded in antisense frame +1 until residue 297 (89% similarity, e value 3 × 10⁻¹²⁶, not shown). This alignment includes a single stop, matching tyrosine in *Aleurodicus dugesii*. Part of the remaining protein is coded by a stopless stretch of antisense strand frame +2, where residues 341–427 align with the regular protein from *Aleurodicus dugesii* (73% similarity, e value 2 × 10⁻²¹). Hence, the annotated ORF in *Aleurodicus dispersus* is a stop-codon-depleted antisense frame that codes for an apparently different protein; the actual NADH dehydrogenase subunit 4 is encoded by two frames, one containing a stop, on the opposite strand.

**4. General discussion**

**4.1. Genomic stop codon-depletion: overall analysis**

**Table 1** presents numbers of stops in all six frames of the mitochondrial genes of *Aleurodicus dugesii* and of *Aleurodicus dispersus*, according to the strand presented in GenBank. In *Aleurodicus dugesii*, the frame coding for the regular protein is always the only stopless frame. In *Aleurodicus dispersus*, the frame(s) coding for the regular proteins are underlined; these are not necessarily stopless, and are not necessarily the only stopless frame.

**Table 1.** TAR stop codons in the six frames of mitogenome-encoded genes of *Aleurodicus dugesii* and *Aleurodicus dispersus*.

In order to account for slight variations in gene sizes, I compare percentages of stop codons averaged across all six frames in the two species, gene by gene. Mean stop codon percentages decrease in 11 among 13 protein coding genes of *Aleurodicus dispersus*, as compared to *Aleurodicus dugesii*. This is a significant majority of cases according to a one tailed sign test using a binomial distribution and assuming equal probability of getting more or fewer stop codons in any of these mitogenes (P = 0.00562). This overall stop codon depletion occurs in all seven "recoded" genes. Stop codon depletion occurs qualitatively in four among the six genes with regular, unchanged coding structure. This tendency is not statistically significant for this subgroup of genes when using the robust, but blunt nonparametric sign test. A paired t test between mean percentages of stop codons averaged across frames nevertheless indicates a statistically significant decrease in stops in *Aleurodicus dispersus*, as compared to *Aleurodicus dugesii*, for these six genes as well. This result suggests that stop codon depletion occurred across all or at least most of this genome, and for most frames, not only for genes whose coding structure was altered, and not only for frames that became ORFs.
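The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's actual procedure: the function names are hypothetical, stops are taken to be the TAR codons (TAA, TAG) only, incomplete trailing codons are ignored, and the sign test is computed as the plain upper binomial tail P(X ≥ k) with p = 1/2. With that convention, 11 decreases among 13 genes give 92/8192 ≈ 0.0112, which is twice the value quoted in the text, so the convention used there presumably differs.

```python
from math import comb

STOPS = {"TAA", "TAG"}  # TAR stop codons (assumption: no other stops counted)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_percent(seq: str, offset: int) -> float:
    """Percentage of complete codons that are TAR stops, reading from `offset`."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return 100.0 * sum(codon in STOPS for codon in codons) / len(codons)


def mean_six_frame_stop_percent(seq: str) -> float:
    """Average the stop percentage over three frames on each strand."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    values = [stop_percent(s, off) for s in strands for off in range(3)]
    return sum(values) / len(values)


def sign_test_upper_tail(successes: int, n: int) -> float:
    """P(X >= successes) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# Toy sequence (not real data): one stop in frame 0, none in the other five frames.
print(mean_six_frame_stop_percent("ATGTAAGGG"))
print(sign_test_upper_tail(11, 13))
```

Averaging percentages rather than raw counts is what makes genes of slightly different lengths comparable between the two species.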

Presumably, unknown mechanisms associated with replication depleted stop codons in this species' mitogenome, perhaps cumulatively over several replication or DNA editing cycles. Total stop codon depletion in some frames produced new ORFs. Natural selection against stop codons presumably enhanced unknown enzymatic phenomena, eliminating stop codons in these frames. It seems plausible that these frames in usual mitogenomes code for proteins translated by stop suppressor tRNAs. Specific unknown conditions in *Aleurodicus dispersus* may favor enhanced expression of peptides coded by frames that usually include stops in other mitogenomes, such as that of *Aleurodicus dugesii*. These constraints would have ultimately caused genomic stop codon depletion in *Aleurodicus dispersus*. In regular mitogenomes, stop codon translation downregulates expression of these unusual peptides in favor of proteins coded by regular ORFs. In *Aleurodicus dispersus*, this hierarchy may be nonexistent (when two stopless ORFs occur in a gene) or reversed (as in several mitogenes of *Lepidochelys olivacea* [9]), with translation of the unusual peptide not requiring stop codon suppression, and translation of regular mitochondrial proteins requiring tRNAs with anticodons matching stop codons.

#### **4.2. Coding redundancy between frames and tolerating ribosomal frameshifts**

The original hypothesis of frame shiftability suggests that different frames of a gene code for somewhat similar peptides, presumably because the genetic code is optimized to tolerate frameshifts [83–85]. This hypothesis suggests that redundancy among frames in *Aleurodicus dispersus* should be greater than in the closely related *Aleurodicus dugesii* where coding seems regular, assuming that changes in coding structure increase redundancy among frames for coding protein variants with similar functions.


I used ClustalX to align the regular protein with peptides coded by the +1 and +2 frames of the same coding strand, for each of *Aleurodicus dispersus* and *Aleurodicus dugesii*. Numbers of amino acids that were identical in the alignment were divided by total peptide lengths. This proportion for genes from *Aleurodicus dispersus* is plotted as a function of the corresponding proportion for *Aleurodicus dugesii* for alignments between frame 0 and frame +1 (**Figure 4**). Redundancy between frames 0 and +1 is greater in mitogenes of *Aleurodicus dispersus* in nine among twelve genes (there was no difference between these species for gene COII), a statistically significant majority according to a one tailed sign test (P = 0.0365). This tendency however does not exist for alignments between frames 0 and +2 (redundancy in *Aleurodicus dispersus* is greater in six among thirteen genes).
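The redundancy measure described above can be sketched as follows. This is a minimal illustration, not the author's script. It assumes the two peptides are already aligned (for example, exported from ClustalX with `-` marking gaps), and it assumes that "total peptide length" means the ungapped length of the regular (frame 0) protein, which the text does not specify. The function names and the example values are hypothetical.

```python
def identity_fraction(aligned_regular: str, aligned_shifted: str) -> float:
    """Identical aligned residues divided by the ungapped regular-protein length."""
    if len(aligned_regular) != len(aligned_shifted):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        a == b and a != "-" for a, b in zip(aligned_regular, aligned_shifted)
    )
    regular_length = sum(ch != "-" for ch in aligned_regular)
    return identical / regular_length if regular_length else 0.0


def count_signs(pairs):
    """Count genes where the first value is greater, equal, or smaller.

    `pairs` holds one (A. dispersus, A. dugesii) redundancy pair per gene.
    Ties are reported separately so they can be left out of a sign test,
    as the text does for COII.
    """
    greater = sum(x > y for x, y in pairs)
    ties = sum(x == y for x, y in pairs)
    smaller = sum(x < y for x, y in pairs)
    return greater, ties, smaller


# Toy alignment (not real data): 3 identical residues over 4 regular residues.
print(identity_fraction("MK-LV", "MKALI"))

# Toy redundancy pairs for three hypothetical genes.
print(count_signs([(0.10, 0.08), (0.05, 0.05), (0.07, 0.09)]))
```

Reporting ties separately makes explicit why the sign test in the text is run on twelve genes rather than thirteen.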

This analysis tentatively indicates that stop codon depletion and coding by frameshifting and translation of stop codons might associate with a phenomenon increasing tolerance to frameshifts during translation. Indeed, frequencies of off frame stop codons in mitochondrial genes are inversely correlated to predicted ribosomal RNA stability [86–88], suggesting that genes adapt to avoid negative effects of ribosomal frameshifts [44, 89, 90]. Stop codon-depletion may enable coding for more proteins, in addition to increasing redundancies between frames. Several effects could explain why statistical support for increased redundancy between frames is not very strong. This hypothesis should be further tested, both experimentally, as done by Wang et al. [83–85], and by other bioinformatics analyses. For example, one can expect that frameshift tolerance biases exist for identity at amino acids that are not easily replaced by other amino acids (e.g., cysteine), but less so for mutable ones (leucine, isoleucine, etc.). The preliminary tests presented here are not incompatible with the frameshift tolerance hypothesis [83–85].

**Figure 4.** Redundancy between peptides coded by frames 0 and +1 for mitochondrial genes of *Aleurodicus dispersus* as a function of the corresponding redundancy for *Aleurodicus dugesii*. Filled symbols are for genes with unusual coding structure (frameshifts, stop depletion creating new ORFs, stop codon translation).

It is important to understand in this context that the discovery of the genetic code, among the greatest fundamental discoveries, is not over but still in progress. Indeed, coding sequences include much more information than generally believed, even beyond RNA editing (RDD [91]), systematic transformations during replication [44–46] and transcription [39, 47–51], and translation along expanded codons [32–43]. Cryptic codes [92, 93], such as those described by the well-developed theory of the natural circular code [94–112], which regulate the ribosomal translation frame [113–116] and protein cotranslational folding [117], remain to be described and decoded.

#### **4.3. Sequencing artifacts and genome annotation**


During the redescription of the recoded *Lepidochelys olivacea* mitogenome [9], an anonymous reviewer suggested that sequencing errors mimicked frameshifting mutations (insertion/deletion), producing the impression of frame recoding. This explanation is incompatible with the phenomena described in *Lepidochelys* and *Aleurodicus*, because these involve numerous specific mutations in stop codon-specific nucleotide contexts, totally depleting stop codons in usually noncoding frames, and introducing stop codons in usually stopless, regular ORFs. A frameshifting mutation inserts or deletes a nucleotide within a regular ORF, which is thereby split between two frames. This does not deplete stop codons occurring in noncoding frames, nor introduce stops in the frameshifted ORF. ORF creation in usually noncoding frames by stop codon depletion in *Lepidochelys olivacea* [9] and *Aleurodicus dispersus* probably originates from natural, enzymatic, directed mutations [118] or other processes causing directed mutations, such as transposon-mediated directed mutations [119, 120].

Recoding probably occurs beyond mitogenomes. However, the short, highly conserved mitogenomes [121] are best suited to manual reannotation, a necessary first stage to detect events where genes are recoded from one genetic code to another. I suggest that annotations of genomes, and mitogenomes in particular, systematically take into account phenomena such as swinger sequences [51] and directed stop codon depletions that may result in ORFs that do not code for regular recognized proteins as presented here, especially in genomes/genes that seem unusual and remain in an unverified status in GenBank. **Figure 5** summarizes the changes in coding structures that occurred in *Aleurodicus dispersus* due to recoding, as compared to the "ancestral" regular situation in *A. dugesii*. The proposed stopless genetic code in *Aleurodicus dispersus* presumably introduces serine at stops TAR and differs from previously described alternative arthropod mitochondrial genetic codes, which usually recode codons AGR [122, 123].
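As an illustration of what translating with the proposed stopless code involves, the sketch below translates a nucleotide sequence with the invertebrate mitochondrial code (NCBI translation table 5) and optionally reads TAR stops as serine. It is only a sketch under assumptions: the function name and the `stop_as` parameter are hypothetical, the serine assignment is the presumption stated in the text rather than an established code, and codons containing ambiguous bases are rendered as `X`.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), with codons in the
# conventional order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, stop_as: str = "*") -> str:
    """Translate complete codons of `seq`, writing stops as `stop_as`.

    With stop_as="*" this is the regular table 5 translation. With
    stop_as="S" the TAR stops are read as serine (the presumed stopless code).
    """
    seq = seq.upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks ambiguous codons
        peptide.append(stop_as if aa == "*" else aa)
    return "".join(peptide)


# Toy sequence (not real data): ATG TAA AGA TGA
print(translate("ATGTAAAGATGA"))               # regular reading
print(translate("ATGTAAAGATGA", stop_as="S"))  # stops read as serine
```

Translating every frame both ways during reannotation makes frames that are stopless only under stop codon translation visible alongside regular ORFs.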

**Figure 5.** Classical and unusual mitogenome structures of whiteflies (*Aleurodicus dugesii*; *A. dispersus*). In *A. dispersus*, GenBank annotates genes ND1, ND4l, ND4, ND5, ND6 and CytB erroneously as stopless frames coding for other proteins. A different frame with stops codes for the metabolically usual protein. CytB: both frames on same strand; other genes: opposite strands.

necessitates translating stop codons (a stopless genetic code), while frames including stop codons and therefore not considered as ORFs become stop codon depleted, and hence corresponding peptides are coded by the regular invertebrate mitochondrial genetic code. This situation where peptides coded by regular and stopless genetic codes are swapped might reflect a reversal in hierarchies of needs for the expressions of the respective peptides, specific to *Aleurodicus dispersus*. The requirement for tRNAs translating stop codons would regulate these respective expressions, *de facto* swapping between regular and stopless genetic codes. I suggest that the enzymatically directed stop codon depletion is related to the process that caused directed introductions of stop codons in the coding frames of HIV proteins integrated in the nuclear genomes of "elite" HIV controller individuals [68, 69].

**Acknowledgements**

This study was supported by Méditerranée Infection and the National Research Agency under the program "Investissements d'avenir," reference ANR-10-IAHU-03, and the A\*MIDEX project (no ANR-11-IDEX-0001-02).

**Conflict of interest**

The author declares no conflict of interest.

**Author details**

Hervé Seligmann1,2\*

\*Address all correspondence to: varanuseremius@gmail.com

1 Aix-Marseille University, URMITE, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Hospitalo-Universitary Institute, Méditerranée-Infection, Marseille, France

2 The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel

**References**

[1] Seligmann H. Phylogeny of genetic codes and punctuation codes within genetic codes. Bio Systems. 2015;**129**:36-43

[2] Seligmann H. Alignment-based and alignment-free methods converge with experimental data on amino acids coded by stop codons at split between nuclear and mitochondrial genetic codes. Bio Systems. 2018;**167**:33-46
