**2. Evolution of the genome: why thymine was key**

The genomes of all organisms on earth today are encoded by four nucleobases: the two purines, adenine (A) and guanine (G), and the two pyrimidines, cytosine (C) and thymine (T). A nucleobase conjugated to (deoxy)ribose is termed a nucleoside, and a nucleoside further conjugated to one or more phosphate groups is termed a nucleotide. The nucleobases A, T, G and C are incorporated into DNA as deoxynucleoside triphosphates (dNTPs) and are held in sequence by covalent attachment to the DNA backbone (a chain of covalently linked deoxyribose and phosphate groups), forming a DNA strand. Two complementary DNA strands pair through hydrogen bonds between nucleobases on opposite strands, forming the classic double-helix structure of DNA. In 'normal' DNA, A always pairs with T and G always pairs with C. Uracil (U) is another nucleobase, found mostly in RNA, which is synthesised from nucleoside triphosphates (NTPs). RNA can form structures similar to those of DNA, except that A pairs with U instead of T and the RNA backbone incorporates ribose instead of deoxyribose. RNA can also encode genetic information (mRNA), but it is functionally more diverse and can carry out roles similar to those of proteins (tRNA and ribozymes). One might ask why two such similar systems for carrying genetic information are needed and why they differ. To answer these questions, it helps to explore the evolution of the genome.
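The pairing rules described above can be written out as a small lookup table. The following is a minimal illustrative sketch in Python; the names `DNA_PAIRS`, `RNA_PAIRS`, `complement` and `reverse_complement` are hypothetical and chosen only for this example.

```python
# Canonical base-pairing rules as described in the text (illustrative only).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(strand: str, pairs: dict) -> str:
    """Return the partner base for every position, in the same left-to-right order."""
    return "".join(pairs[base] for base in strand)


def reverse_complement(strand: str, pairs: dict) -> str:
    """Return the partner strand read in the opposite direction.

    The two strands of a double helix run antiparallel, so the partner
    strand is conventionally written reversed.
    """
    return complement(strand, pairs)[::-1]


if __name__ == "__main__":
    print(complement("ATGC", DNA_PAIRS))          # DNA: A pairs with T, G with C
    print(complement("AUGC", RNA_PAIRS))          # RNA: A pairs with U instead of T
    print(reverse_complement("ATGC", DNA_PAIRS))  # partner strand, antiparallel
```

The only difference between the two tables is the partner of A, which is the T versus U distinction that the rest of this section explores.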

Life on earth is thought to have originated ~3.8 billion years ago from what is termed the 'primordial soup' (or prebiotic soup). The 'RNA world' hypothesis states that the first complex organic molecules to form in this prebiotic world were RNA based. In support of this hypothesis, analyses of carbonaceous meteorites that have fallen to earth have found a range of carbon-containing molecules, including U, A and G (but not C), a composition thought to represent that of a very young earth (reviewed in [7]). Additionally, a formamide-based scenario has proposed that formamide (available on the prebiotic earth [8]) could be the starting point for generating all the components of RNA, under conditions thought to have been present on the prebiotic earth, with U being generated in good yield [9–12]. This leads to the belief that hereditary genetic information might originally have been carried by RNA. It is thought that there was eventually a transition to a DNA-based hereditary system, since the deoxyribose-containing DNA backbone is much more stable than the ribose-containing RNA backbone [13–15] and RNA replication seems to be far more error prone than DNA replication [16]. In turn, this allowed larger, more complex genomes to evolve and, therefore, complex multicellular organisms to form.

Potentially, at some point RNA and other molecules were concentrated in a membrane-like structure (making certain catalytic reactions feasible), forming the first RNA cells with metabolism. RNA can catalyse reactions (ribozymes), encode genetic information, transport amino acids (tRNA) and catalyse peptide-bond formation (ribosomes). The idea that RNA was the first molecule to carry out these critical cellular functions is based on the extensive involvement of ribosomal RNA in peptide-bond formation, which suggests that proteins potentially became essential later in evolution [17]. For the transition from an RNA cell to a DNA cell to occur, a mechanism for converting NTPs (containing ribose) into dNTPs (containing deoxyribose) must have evolved. Ribonucleotide reductases (RNRs) typically catalyse the conversion of nucleoside diphosphates (NDPs) in eukaryotic cells, or NTPs in prokaryotic cells, into dNDPs or dNTPs, respectively, and are thought to share a common ancestor [18, 19]. Interestingly, it is thought that the first DNA cell would have incorporated U (instead of T) into its genome. This is supported by the fact that RNR can directly convert ATP, UTP, GTP and CTP into their corresponding dNTPs, whereas dTTP requires extra steps: conversion of deoxyuridine monophosphate (dUMP) to dTMP by thymidylate synthase (TS or TYMS), followed by two phosphorylation steps by kinases to produce dTTP [20]. Because the route to dTTP is convoluted whereas dUTP is synthesised more simply, it would make sense in an evolutionary context that the initial DNA cell first incorporated U into its DNA, which was later replaced by T.

## *The Importance of the Fifth Nucleotide in DNA: Uracil DOI: http://dx.doi.org/10.5772/intechopen.110267*

One might ask why T-based DNA was needed when U-based DNA performs the same task and is energetically cheaper to make. The answer lies in C deamination, which produces U and occurs at a relatively fast rate. A deaminated C produces a G:U mismatch and, because U pairs with A, leads to mutated DNA during replication. In U-based DNA, a U produced by C deamination and a U that is normally incorporated into DNA are chemically identical; a G:U mismatch would therefore be difficult to identify as damaged DNA, particularly in primitive cells without sophisticated DNA repair mechanisms. To overcome this issue, cells evolved to incorporate T instead of U into their DNA, so that any U produced by C deamination is recognisably foreign. As a result, T-based DNA produced more stable genomes, which is especially important for evolving and maintaining the large, complex genomes found in, for example, multicellular organisms.
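The argument above can be illustrated with a toy model. The Python sketch below is a deliberate simplification under stated assumptions: strands are plain strings, replication copies a template base by base, and a hypothetical repair system can flag a base only by its chemical identity. The function names (`deaminate`, `replicate`, `foreign_bases`) are invented for this example and do not describe any real repair pathway.

```python
def deaminate(strand: str, position: int) -> str:
    """Replace the C at `position` with U, mimicking cytosine deamination."""
    if strand[position] != "C":
        raise ValueError("only cytosine deaminates to uracil in this model")
    return strand[:position] + "U" + strand[position + 1:]


def replicate(template: str, thymine_based: bool = True) -> str:
    """Copy a template base by base; U templates A exactly as T does."""
    partner_of_a = "T" if thymine_based else "U"
    pairs = {"A": partner_of_a, "T": "A", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in template)


def foreign_bases(strand: str, thymine_based: bool = True) -> list:
    """Positions that can be flagged as damage from base identity alone.

    Assumption: in T-based DNA every U is foreign; in U-based DNA a
    deaminated C is chemically identical to a normal U, so nothing is flagged.
    """
    if not thymine_based:
        return []
    return [i for i, base in enumerate(strand) if base == "U"]


if __name__ == "__main__":
    # T-based genome: the deaminated C stands out as a U.
    damaged_t = deaminate("GACT", 2)
    print(damaged_t, foreign_bases(damaged_t, thymine_based=True))

    # U-based genome: the new U is indistinguishable from the existing one.
    damaged_u = deaminate("GACU", 2)
    print(damaged_u, foreign_bases(damaged_u, thymine_based=False))

    # If the lesion is not repaired, replication pairs the U with A,
    # so the copied strand carries A where the original partner had G.
    print(replicate("GACT"), replicate(damaged_t))
```

In the T-based case the lesion can be located before replication fixes it as a mutation; in the U-based case the model has nothing to flag, which is the difficulty the paragraph describes.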

For T-based DNA to be viable, the cell evolved three key enzymes: dUTP nucleotidohydrolase (dUTPase), thymidylate synthase (TS) and uracil-DNA glycosylases (UDGs). After dUTP is synthesised via RNR, it is efficiently hydrolysed to dUMP by dUTPase. This has a two-fold effect: it lowers dUTP levels, reducing U misincorporation into DNA, and it produces the TS substrate (dUMP, as discussed above), leading to an increase in dTTP. The ratio of the dUTP and dTTP pools determines the rate of U misincorporation into DNA, because DNA polymerases have difficulty distinguishing between dTTP and dUTP, which are identical molecules except for a single methyl group. In fact, DNA polymerases readily incorporate dUTP as well as dTTP, depending on their relative concentrations and nucleotide availability [21]. If U is misincorporated or C deamination occurs, a UDG removes the U, creating an abasic site (a position in a DNA strand where no nucleobase is present), which a primitive form of DNA damage repair could then have corrected. With the evolution of these three enzymes, cells were able to rapidly detect and repair C deamination (which is always mutagenic) and to reduce U misincorporation as well.
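The dependence of U misincorporation on the dUTP:dTTP pool ratio can be sketched numerically. The Python example below assumes, as a simplification of the statement above, that a polymerase does not discriminate between dUTP and dTTP at all, so the chance of inserting U opposite A equals the share of dUTP in the combined pool. The function names and all pool sizes are hypothetical and purely illustrative; they are not measured values.

```python
def uracil_fraction(dutp: float, dttp: float) -> float:
    """Fraction of A-templated positions expected to receive U instead of T.

    Assumption: the polymerase picks dUTP or dTTP purely in proportion
    to their concentrations (no discrimination).
    """
    total = dutp + dttp
    if total <= 0:
        raise ValueError("the combined dUTP + dTTP pool must be positive")
    return dutp / total


def expected_uracils(adenine_sites: int, dutp: float, dttp: float) -> float:
    """Expected number of misincorporated U over a given number of A sites."""
    return adenine_sites * uracil_fraction(dutp, dttp)


if __name__ == "__main__":
    # Hypothetical pools in arbitrary units.
    # Without dUTPase: dUTP accumulates alongside dTTP.
    print(expected_uracils(1000, dutp=50.0, dttp=50.0))

    # With dUTPase: dUTP is hydrolysed to dUMP, which feeds TS and dTTP synthesis,
    # so the dUTP pool shrinks while the dTTP pool grows.
    print(expected_uracils(1000, dutp=1.0, dttp=99.0))
```

Under this assumption, lowering dUTP and raising dTTP both push the fraction towards zero, which is why dUTPase acts on the problem from two directions at once.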
