178 Next Generation Sequencing - Advances, Applications and Challenges

**2. Genome databases**

Initial efforts to sequence viral genomes led to the accumulation of genomic data in primary repositories such as GenBank [39], the European Molecular Biology Laboratory (EMBL) [40] and the DNA Data Bank of Japan (DDBJ) [41]. These data continue to grow within the International Nucleotide Sequence Database Collaboration (INSDC) [42]. Genome databases and resources dedicated to viruses were developed subsequently [43–47], and lists of useful databases, resources and analysis tools have also been compiled previously [13, 48]. Most of these resources archive complete genome sequences, their annotations and derived data such as viral variants, multiple sequence alignments (MSAs) and phylogenetic trees. Some of the viral genome resources are briefly described below.

**2.1. National Center for Biotechnology Information (NCBI) viral genome resource**

This reference resource is designed to catalogue publicly available genomic sequences of viruses deposited in INSDC [49]. It aims to curate reference genome sequences and draws on expert knowledge to annotate them and to identify important viral sequences.

**2.2. ViralZone**

This resource is developed and maintained at the Swiss Institute of Bioinformatics. Its objective is to link textbook knowledge, fact sheets and images to genomic and proteomic data in order to facilitate the study of viral diversity [50].

**2.3. Virus Pathogen Database and Analysis Resource (ViPR)**

ViPR [51] is supported under the Bioinformatics Resource Centers (BRC) programme of the National Institute of Allergy and Infectious Diseases (NIAID). The database currently provides access to molecular data on viruses, including complete genomes from 14 viral families. Users are provided with analytical and visualization tools for metadata-driven statistical sequence analysis, data filtering and analytical workflows, as well as a personal workbench.

In addition to these, several organism-specific resources have been developed, such as the HCV Database [52] for *Hepatitis C virus*, IVDB [53] for *Influenza virus* and a resource for HIV [54].

Annotation of sequence (gene/genome/protein) records is an integral step in the downstream processing of database entries. A well-curated reference record serves as a template for transferring annotations of features such as gene boundaries, associated functions (molecular/cellular/pathway) and non-coding regions [49]. Such annotations are highly useful in subsequent analysis and model building. The challenges of managing dedicated resources for viral genomes differ from those of genomic databases for model and other organisms: the pace of sequencing and the volume of genomic data being generated complicate both the identification of reference genomes and the annotation of the genomes of strains and isolates. Additionally, to study the spatio-temporal evolution and to model the viral popula‐

**3. Impact of NGS technologies on virology**

Molecular analysis of viruses using data generated by NGS has revolutionized virology. Beyond improving the understanding of sequence–structure–function relationships, it has also given rise to new areas of research, such as phyloinformatics and immunoinformatics, which translate raw data into information. The information generated from these independent yet interlinked areas fits together like the pieces of a jigsaw puzzle (Figure 1), leading to an improved understanding of viral diseases and, thereby, to the development of antiviral therapies.

#### **3.1. Unravelling mutational landscapes in viral quasispecies**

Viral quasispecies are mutant swarms generated mainly by RNA viruses during replication, which is error-prone because RNA-dependent RNA polymerase lacks proofreading activity. The resulting mosaic is a dynamic distribution of non-identical but related replicons that cannot be resolved using conventional sequencing approaches. Hence, quasispecies remained unexplored for a considerable time, even though Eigen put forth the theoretical concept of quasispecies in 1970 [55]. With the advent of NGS technologies, generating large genomic datasets became a reality, but sequencing errors still made it difficult to distinguish true genetic variants. Andino's group developed Circular Sequencing (CirSeq), an experimental approach that generates templates consisting of tandem repeats of circularized genomic RNA fragments [56]. Because every genomic fragment is sequenced redundantly as several repeats, CirSeq drastically reduces sequencing error: the consensus of the repeats lowers the theoretical error rate to close to 10⁻¹¹, which enables the entire mutational spectrum of an RNA virus population to be captured. CirSeq was employed to study seven serial passages of *Poliovirus* replicated in HeLa cells. Mutation frequencies were computed for every passage, and the fitness effects of the mutations were determined and mapped onto the 3D structures of the viral proteins. As expected, the majority of the detected mutations were neutral substitutions, highlighting robustness as a driving force for adaptation and evolution [56]. This study clearly delineates the viral mutations responsible for quasispecies structure and highlights the extent of genetic variation that can be maintained in a population.
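The error reduction gained from redundant repeats can be illustrated with a small Python sketch. It assumes that each read consists of whole tandem copies of one fragment whose length is already known, and it uses a simplified unanimity rule (a base is called only if all copies agree, otherwise `N`); the function names are hypothetical and this is not the processing pipeline of [56]. Assuming further that errors in different copies are independent, with a per-copy error rate e spread evenly over the three wrong bases, the residual consensus error for k copies is 3(e/3)^k = e^k/3^(k−1). The figure of about 10⁻¹¹ quoted above depends on the actual error rates and repeat numbers in [56] and is not derived here. A per-passage mutation frequency (mismatches per called base) is included as well.

```python
def split_repeats(read: str, period: int) -> list[str]:
    """Split a tandem-repeat read into full copies of one fragment.

    The repeat length (`period`) is assumed to be known in advance;
    a trailing partial copy is discarded.
    """
    return [read[i:i + period] for i in range(0, len(read) - period + 1, period)]


def cirseq_consensus(read: str, period: int, min_copies: int = 3) -> str:
    """Call a base only where every copy agrees; otherwise emit 'N'."""
    copies = split_repeats(read, period)
    if len(copies) < min_copies:
        raise ValueError(f"need at least {min_copies} copies, got {len(copies)}")
    # zip(*copies) yields one column of aligned bases per fragment position
    return "".join(
        column[0] if len(set(column)) == 1 else "N" for column in zip(*copies)
    )


def consensus_error_rate(per_base_error: float, copies: int) -> float:
    """Residual error of the unanimous consensus: the chance that all copies
    carry the same wrong base, assuming independent errors spread evenly
    over the three alternative bases, i.e. 3 * (e / 3) ** k."""
    return 3 * (per_base_error / 3) ** copies


def mutation_frequency(consensus_reads: list[str], reference: str) -> float:
    """Mismatches per called (non-'N') base against an ungapped reference."""
    mismatches = called = 0
    for seq in consensus_reads:
        for observed, expected in zip(seq, reference):
            if observed == "N":
                continue  # unresolved positions are not counted
            called += 1
            mismatches += observed != expected
    return mismatches / called if called else 0.0


if __name__ == "__main__":
    fragment = "ACGTTGCAAG"
    # Toy read: three copies, the second with one error (G -> T at offset 5)
    read = fragment + "ACGTTTCAAG" + fragment
    consensus = cirseq_consensus(read, period=len(fragment))
    print(consensus)  # ACGTTNCAAG
    print(mutation_frequency([consensus], fragment))  # 0.0
    print(consensus_error_rate(1e-3, 3))
```

Requiring unanimity discards some true bases (the `N` in the toy example) in exchange for a much lower residual error; a majority vote would retain more positions at a higher error cost. A real CirSeq workflow also has to locate repeat boundaries and take base qualities into account, which this sketch does not attempt.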

Microevolution in an evolving quasispecies population is responsible for the sequence diversity of *Porcine reproductive and respiratory syndrome virus* (PRRSV). PRRSV is the causative agent of late-term reproductive failure in sows and respiratory distress in pigs and hence has a large economic impact. The genomic complexity of PRRSV, arising from multiple circulating genotypes, results in antigenic diversity, which in turn has hindered the development of effective vaccines [57]. Sanger sequencing had identified open reading frames ORF5 and ORF7 as polymorphic regions of the viral genome that encode major immunogenic epitopes. To study genome-wide polymorphisms, PRRSV was deep sequenced, and amino acid substitutions in ORFs 2–7 were examined in PRRSV strains obtained from pigs lacking B and T cells [58]. By analysing nucleotide substitutions over time, followed by comparative genomics with non-pathogenic variants, this study documented the role of mutation and selection in maintaining the pathogenicity or fitness of PRRSV.
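Relating nucleotide substitutions to the amino acid substitutions they cause comes down to comparing codons. The Python sketch below assumes in-frame, ungapped, equal-length ORF sequences and the standard genetic code (NCBI translation table 1); the function name and toy sequences are hypothetical and are neither code nor data from [58]. Each nucleotide difference is classified as synonymous or non-synonymous, and applying it to samples from successive time points would give a simple time series of substitutions.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_substitutions(ref: str, alt: str) -> list[tuple[int, str, str, str]]:
    """Classify each nucleotide difference between two in-frame ORF sequences.

    Assumes `ref` and `alt` are aligned without gaps, start at the first
    codon position and have a length that is a multiple of 3. Each
    difference is judged on its own against the reference codon, so two
    changes in the same codon are not combined.
    """
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be equal-length multiples of 3")
    results = []
    for pos, (r, a) in enumerate(zip(ref, alt)):
        if r == a:
            continue
        offset = pos % 3
        ref_codon = ref[pos - offset:pos - offset + 3]
        mut_codon = ref_codon[:offset] + a + ref_codon[offset + 1:]
        # Codons with ambiguous bases (e.g. N) translate to 'X'
        ref_aa = CODON_TABLE.get(ref_codon, "X")
        alt_aa = CODON_TABLE.get(mut_codon, "X")
        if "X" in (ref_aa, alt_aa):
            kind = "ambiguous"
        elif ref_aa == alt_aa:
            kind = "synonymous"
        else:
            kind = "non-synonymous"
        # Report 1-based position, nucleotide change, amino acid change, class
        results.append((pos + 1, f"{r}>{a}", f"{ref_aa}>{alt_aa}", kind))
    return results


if __name__ == "__main__":
    # Hypothetical 3-codon fragment (Met-Ala-Lys), not PRRSV data
    for row in classify_substitutions("ATGGCTAAA", "ATGGCCGAA"):
        print(row)
```

In the toy example, the T>C change at position 6 is synonymous (GCT and GCC both encode alanine), whereas the A>G change at position 7 replaces lysine with glutamic acid and is therefore non-synonymous.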
