36 Next Generation Sequencing - Advances, Applications and Challenges

[192–195, 247]. Different classes of TEs are found in eukaryotic genomes, where they contribute at least 50% of the human genome [237] and up to 90% of the maize genome [252]. In humans, there are solitary Long Terminal Repeats (LTRs) and LTR retrotransposons (endogenous retroviruses, ERVs) that are characterized by the presence of LTRs at both ends; Long Interspersed Nuclear Elements (LINEs) like L1 that represent families of non-LTR TEs about 6 kb in length and encode two proteins, a nucleic acid chaperone and a reverse transcriptase/nuclease for retrotransposition; Nonautonomous Miniature Inverted-Repeat Transposable Elements (MITEs); Mammalian-wide Interspersed Repeats (MIRs), an ancient family of tRNA-derived SINEs exapted as enhancers and regulatory sequences; and Short Interspersed Nuclear Elements (SINEs) like Alu that are usually less than 300 bp and need a helper transposable element like L1 for transposition [245]. Most ERVs, SINEs, and LINEs in the human genome are now remnants of past insertions and are no longer capable of actively "jumping" like functional TEs [238, 245, 248]. Indeed, many of the ancient TE relics have undergone exaptation and developed new functions, such as transcript repeat elements, within regulatory gene networks to generate lineage-specific adaptation [145, 249].

The importance of widespread HGT in creating genomic diversity in microbes has been highlighted by the many comparative genomic studies using metagenome data [191]. Comparative genomic analysis of different strains of *E. coli* revealed that up to 30% of genes in pathogenic strains were acquired by HGT, often creating duplication events and modifying metabolic networks by adding operons that encode two or more enzymes [253]. Comparative genomics of photosynthetic prokaryotes revealed that they have evolved as complex mosaics via multiple HGT events [254]. Similarly, photosynthetic gene clusters and gene clusters that encode various toxins, resistance genes, metabolic genes, and components of secretion systems appear to be the products of HGT [247, 253–255]. Indeed, many HGT events were probably mediated by genomic mobile elements, such as bacteriophages, plasmids, viruses, transposable elements, and toxin/antitoxin systems that are persistent in all life forms [191, 228, 246, 255, 256].

Before the new millennium, transposons and repeat elements were largely viewed as junk and as parasites that imposed an unnecessary burden on the genome. Comparative genomics and online databases dedicated to transposons and repeat elements such as SINEs, LINEs, and ERVs, however, began to change this picture in the 1990s, and it soon became evident that these elements were drivers of evolutionary innovation. Many integrated transposons mutate with time to interact with the host transcriptional machinery and therefore provide a useful substrate for the evolution of novel regulatory elements [145, 228, 255–258]. Moreover, some of the ancient integrated retrotransposons appear to have been involved in advantageous segmental genomic duplications, such as in the major histocompatibility complex region [259–261], and others have dispersed regulatory controls to provide coordinated regulation across the genome [257, 258].

**8.8. Agrigenomics**

Agrigenomics or agricultural genomics can be defined as the research and development activities that translate NGS and genomics technology into a better understanding of plant

**8.9. Humanomics, personomics, and health**

Knowledge of the human genome and its genetic and molecular processes (humanomics) has grown considerably since the first draft assembly was published in 2001 [262]. The first human hybrid genome took about 15 years to sequence and assemble, and when released to the public, it covered 90% of the euchromatic genome, contained about 250,000 gaps, and had many errors in the nucleotide sequence [43, 44]. Ten years after the publication of the first human draft sequence, six more human genome sequences were completed with much greater coverage and accuracy, enabling more informative comparisons to be made between them [7, 8, 79]. Studies by the 1000 Genomes Project [10], the Personal Genome Project [263], the HapMap Consortium [264], and the Pan-Asian Single Nucleotide Polymorphism Project [265] revealed the enormous sequence diversity that exists between individuals. Since then, 225 Ethiopian and Egyptian genomes were compared to reconstruct their population history out of Africa [266], 911 genomes from 10 populations of African, East Asian, and European ancestries were sequenced to elucidate novel patterns and signatures of genetic differentiation [267], and whole-exome sequences from 951 genomes of a ClinSeq cohort were compared to discover new loss-of-function mutations [268]. Today, there are many 1000-genome projects for humans, and WGS of the human genome for personalized medicine (personomics) is already a reality for 2,638 Icelanders [9] and for some others [269, 270] of the 7.3 billion individuals currently populating the globe (http://www.worldometers.info/worldpopulation).

Veeramah and Hammer [271] recently reviewed the usefulness of NGS to sequence ancient DNA samples for phylogenetic and evolutionary studies and for the reconstruction of human population history. Some of these NGS studies have helped to refine the demographic histories of human evolution. These studies include those of the ancient DNA of extinct hominins (Neanderthals, Denisovans) and ancient modern humans such as 7,000-year-old Mesolithic hunter-gatherers in northwestern Spain, Neolithic and post-Neolithic (5,300- to 4,000-year-old) hunter-gatherers and farmers in Scandinavia, a 4,000-year-old Paleo-Eskimo from southern Greenland, and a 24,000-year-old and a 17,000-year-old South-Central Siberian [271]. NGS of ancient nonhuman genomes such as those of pathogens, parasites, and domesticated animals and plants can also provide new information about human history with regard to lifestyles, health, and the spread of agriculture [272].

NGS has allowed a detailed analysis of single nucleotide variants (SNVs), structural variants (SVs), and methylations in coding and noncoding regions and an assessment of their role in human disease [9, 14, 15, 19, 22, 25, 29, 30, 123, 125–127, 144, 148, 151, 152]. The establishment of the International HapMap Project in 2003 (Table 3) to develop a "hapmap" of human haplotype genomes from samples of large populations was an important initiative to find genes and genomic variations (SNP and CNV frequencies, genotypes, and phased haplotypes) that affect health and disease [264]. More than 97 million validated SNPs (dbSNP) have been discovered from human genome sequencing projects, and many of the variants have been linked to a range of medical and phenotypic conditions and catalogued at dbGaP (Table 3), the database of genotypes and phenotypes [273]. In July 2015, dbGaP had links to 592 disease and phenotype studies and 3,711 data sets. In addition to SNVs, small and large SVs that are duplicated, deleted, or rearranged relative to the reference sequences and individuals have been identified in NGS studies and associated with various diseases [9, 30, 127]. NGS has been used to diagnose rare Mendelian diseases and genetically heterogeneous complex disorders, such as X-linked intellectual disability, congenital disorders, cancer genome heterogeneity, and fetal aneuploidy [13, 15, 123, 125, 208, 209, 274, 275]. The impact of NGS on the diagnosis of rare genetic diseases is evidenced by the growth of the genes and OMIM database [49, 276], which has doubled in size since 2007 [274]. However, it should be noted that NGS does not always reveal causative mutations but instead may provide a list of possible candidates. Many detected SNPs, SNVs, and SVs have not been associated with a disease or phenotype, and many diseases still await a genetic or genomic cause. NGS in human studies must be used with caution because sequencing errors and amplification biases produce significant false-positive and false-negative rates.
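The caution about false positives and false negatives can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the number of sites, the number of true variants, and both error rates are assumptions chosen for the example, not values taken from the cited studies. It shows how even a very small per-site false-positive rate translates into thousands of spurious calls when applied across a whole genome.

```python
def expected_false_calls(sites, true_variants, fp_rate, fn_rate):
    """Return (false_positives, false_negatives, ppv) for one genome.

    sites         -- number of positions examined (hypothetical input)
    true_variants -- positions that truly differ from the reference
    fp_rate       -- probability a non-variant site is called as a variant
    fn_rate       -- probability a true variant is missed
    """
    non_variant_sites = sites - true_variants
    false_positives = non_variant_sites * fp_rate
    false_negatives = true_variants * fn_rate
    true_positives = true_variants - false_negatives
    # Positive predictive value: fraction of reported variants that are real
    ppv = true_positives / (true_positives + false_positives)
    return false_positives, false_negatives, ppv


# Assumed, illustrative inputs: 3e9 sites, 4e6 true variants,
# a 1-in-100,000 false-positive rate and a 2% false-negative rate.
fp, fn, ppv = expected_false_calls(3_000_000_000, 4_000_000, 1e-5, 0.02)
print(f"expected false positives: {fp:,.0f}")
print(f"expected false negatives: {fn:,.0f}")
print(f"positive predictive value: {ppv:.4f}")
```

Under these assumed rates, the false calls are a small fraction of all calls, yet in absolute terms they can far outnumber the handful of candidate causative variants a diagnostic study is looking for, which is why independent validation of candidates is usually needed.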

Soon et al. [30] listed and reviewed the various NGS methods employed in the ENCODE project to annotate and analyze the transcriptome, map elements, and identify the methylation patterns of the whole human genome. The information in ENCODE and other databases such as GTEx, FANTOM, NIH ROADMAP, and BLUEPRINT (Table 3) has enabled researchers to map genetic variants to gene regulatory regions and assess indirect links to disease. RegulomeDB, based on the accumulation of nongenic functional regulatory regions obtained from ENCODE, is a useful resource for the evaluation of polymorphisms of regulatory regions [276]. Although disease-associated SNPs obtained from GWAS may point to gene coding regions, they actually might reside in regulatory sites of downstream genes that are in linkage disequilibrium with the reported SNPs [262]. RNA-seq and NGS have confirmed that 98% of the human genome is transcribed from noncoding genomic regions, that only about 2% of the human genome codes for peptides and proteins with about 20,000 distinct protein-coding genes, and that alternative splicing seems to occur for 90% of protein-coding genes to yield many more different types of proteins than genes [134, 135, 151, 152]. The vast majority of the human genome is not functionless "junk DNA" as previously thought [262]; rather, it can be viewed as DNA/RNA "dark matter" expressing hundreds to millions of transcribed short and long noncoding RNA molecules that have important regulatory roles in transcription, translation, transport, metabolism, and innate immunity [133]. Some of these are the interspersed retroelements such as Alu and L1 and endogenous retroviruses (ERVs) that evolved before and during primate history to function as regulators of transcription and translation [24, 25, 142, 145, 257, 258].

NGS has especially revolutionized the field of cancer genomics, revealing mutations, amplifications, deletions, translocations, and dysregulation of noncoding and coding RNAs to provide a better understanding of the complex genetics and loss of regulation in cancer [15, 25, 29, 208, 209, 275]. For example, paired-end sequencing showed that about half of structural rearrangements in breast cancer genomes were fusion transcripts resulting from the rearrangements of segmental tandem duplications involving multiple genes [277]. Similarly, other cancer types were found to be dominated by duplications, translocations, structural variations, and complex rearrangements called "chromothripsis" that involve chromosomal rearrangements as single events confined to genomic regions in one or a few chromosomes [278]. NGS has also been applied to circulating tumor cells isolated from body fluids (blood, urine, sputum, saliva, and stools) [30, 274]. A genomic landscape and a catalogue of somatic mutations in cancer are provided on the Internet at COSMIC (Table 3, [275]). Thus, NGS potentially provides cancer patients with opportunities for personalized diagnosis and optimized therapeutic treatment [279, 280].
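To illustrate in general terms how paired-end reads can reveal structural rearrangements, the following minimal sketch labels a read pair by the kind of rearrangement its mapping is consistent with. It is not the method used in [277]; the `ReadPair` type, the `classify_pair` function, the expected insert size, and the tolerance are all hypothetical, and a forward/reverse library is assumed in which a concordant pair maps to one chromosome with the leftmost read on the "+" strand and the rightmost on the "-" strand.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapped positions of the two reads of one pair (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, expected_insert=400, tolerance=150):
    """Return the rearrangement type this read pair is consistent with."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Order the two reads by mapped position on the shared chromosome
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        # Reads point away from each other (everted pair)
        return "tandem_duplication"
    span = right_pos - left_pos
    if span > expected_insert + tolerance:
        return "deletion"
    if span < expected_insert - tolerance:
        return "insertion"
    return "concordant"


# Made-up pairs for illustration
examples = [
    ReadPair("chr1", 1000, "+", "chr1", 1400, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),
    ReadPair("chr1", 5000, "+", "chr1", 1000, "-"),
    ReadPair("chr1", 1000, "+", "chr8", 2000, "-"),
]
for p in examples:
    print(p.chrom1, p.pos1, p.chrom2, p.pos2, classify_pair(p))
```

A single discordant pair is weak evidence on its own; real analyses would typically require several independent pairs supporting the same breakpoint before reporting a rearrangement.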

The integration of NGS data obtained from whole genome, exome, transcriptome, and methylome to build up individual genomic profiles is a growing reality in human health care. Recently, Chen et al. [269] developed "integrated personal omic profiling" in an individual by sequencing their genome at high accuracy and profiling their transcriptome, metabolome, and proteome over a 14-month period. In the study, they tracked the emergence of type 2 diabetes and assessed the individual's genetic make-up and disease risks. Others have performed similar studies demonstrating that monitoring the longitudinal trends and changes within individuals is an important future protocol for the diagnosis, management, and treatment of disease [9, 81, 270]. The challenges for personomics, however, remain formidable at many levels, not least the time, cost, and effort required to gather, process, and interpret the data [101]. The cost benefits of NGS for personomics have yet to be assessed, with many economic, security, personal, familial, social, and ethical issues to be considered and resolved.
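The idea of monitoring longitudinal changes within an individual can be sketched as a comparison of new measurements against that person's own baseline rather than against a population average. The example below is a simplified illustration and not the analysis performed by Chen et al. [269]; the function name `flag_deviations`, the z-score threshold, and the measurement values are assumptions made up for the example.

```python
from statistics import mean, stdev


def flag_deviations(baseline, follow_up, threshold=3.0):
    """Return (index, value, z) for follow-up values far from the baseline.

    baseline  -- earlier measurements from the same individual
    follow_up -- later measurements to be screened
    threshold -- absolute z-score at which a value is flagged (assumed)
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation, needs >= 2 values
    if sigma == 0:
        raise ValueError("baseline has no variation")
    flagged = []
    for i, value in enumerate(follow_up):
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            flagged.append((i, value, z))
    return flagged


# Made-up measurements of one analyte, in arbitrary units
baseline = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8]
follow_up = [5.1, 5.3, 6.4, 7.0]
for index, value, z in flag_deviations(baseline, follow_up):
    print(f"time point {index}: value {value} (z = {z:.1f})")
```

Using the individual's own history as the reference is the design choice being illustrated here: a value can lie within a population's normal range and still represent a marked shift for that person.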
