#### *4.4.1. Virus Recombination Mapper (ViReMa)*

ViReMa was developed to detect recombination events in viral genome data generated by NGS [93]. It can detect both inter-virus and virus–host recombination. It can also detect insertion and substitution events, as well as multiple recombination junctions within a single read.

#### **Algorithm steps:**

186 Next Generation Sequencing - Advances, Applications and Challenges

*Rhinovirus A, -B* and *-C*, each of which is further subdivided into distinct serotypes. The STRUCTURE-based analysis revealed a strong evidence for existence of seven genetically distinct subpopulations (with *F*<sub>ST</sub> = 0.45, *p* = 0). *Rhinovirus A* and *Rhinovirus C* were subdivided into four and two subpopulations respectively, whereas *Rhinovirus B* species remain undivided. Furthermore, usage of both the admixture and the linkage models (with burn-in of 20,000 and burn-length of 40,000) helped to resolve the role of recombination in diversification of subpopulations. In case of *Rhinovirus A*, intra-species recombination was common, whereas in case of *Rhinovirus C*, intra- and inter-species recombination were observed to cause diversity [82].

#### **4.3. Methods to compute linkage disequilibrium**

Linkage equilibrium refers to the statistical independence of alleles at all loci and indicates evidence of free recombination [83]. Thus, linkage disequilibrium is a measure of the correlation between the occurrences of nucleotides at different loci of the genome. The extent to which recombination occurs can be estimated in terms of the degree of linkage disequilibrium [84] using measures made available by specialized programs such as Linkage Analysis (LIAN) [83] and DNA Sequence Polymorphism (DnaSP) [85]. The extent of linkage can be inferred based on the following parameters.

**i. Standardized index of association, *I*<sup>S</sup><sub>A</sub>:** It is a measure of the degree of haplotype-wide linkage derived from a given dataset. *I*<sup>S</sup><sub>A</sub> is computed using a formula, *I*<sup>S</sup><sub>A</sub> = [1/(*e*−1)] [(*V*<sub>D</sub>/*V*<sub>E</sub>)−1], where '*V*<sub>D</sub>' represents the observed variance of pairwise distances between haplotypes and '*V*<sub>E</sub>' represents the expected variance when all loci are in linkage equilibrium. The term [(*V*<sub>D</sub>/*V*<sub>E</sub>)−1] is the function of rate of recombination, which is zero in case of linkage equilibrium. The number of loci analysed is denoted by '*e*'. The value of *I*<sup>S</sup><sub>A</sub> can be computed by using the program called LIAN (for Linkage Analysis), which requires haplotype data as an input. This program implements both a Monte Carlo and an algebraic method to test the null hypothesis: *V*<sub>D</sub> = *V*<sub>E</sub>.

**ii. |*D*'| and *r*<sup>2</sup>:** The |*D*'| measure is the absolute value of the difference between the observed and the expected haplotype frequency in the absence of linkage disequilibrium, which is normalized by the maximum (or minimum) possible value of this difference. The squared value of the difference between the observed and the expected haplotype frequency normalized by the variance of the allele frequency is denoted by *r*<sup>2</sup>. These measures can be computed using DnaSP program [85]. The values for these measures can range between 0 (no linkage disequilibrium) and 1 (complete linkage disequilibrium) [84, 86].

#### **Case studies:**

LD provides a good measure for analysing the extent of recombination in viruses [82, 87]. For example, in case of *Rhinoviruses*, low values for LD measures (*I*<sup>S</sup><sub>A</sub> = 0.0666, *p* < 10<sup>−4</sup>; |*D*'| = 0.5409 and that of *r*<sup>2</sup> = 0.0613) were observed and correlated well with the evidence of recombination obtained using independent methods [82]. Similarly, LD analyses in serotypes of *Foot and mouth disease virus* [87] helped to reveal low values of |*D*'| and *r*<sup>2</sup>, supporting high recombination.
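The linkage disequilibrium measures described in Section 4.3 can be illustrated with a small Python sketch. It is not the LIAN or DnaSP implementation. The function names, the restriction of |*D*'| and *r*<sup>2</sup> to biallelic sites, the sample-size-corrected gene diversity used for the expected variance, and the toy haplotypes are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def d_prime_and_r2(haplotypes, i, j):
    """Return (|D'|, r^2) for sites i and j (assumed biallelic)."""
    n = len(haplotypes)
    a, b = haplotypes[0][i], haplotypes[0][j]  # arbitrary reference alleles
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    allele_var = p_a * (1 - p_a) * p_b * (1 - p_b)
    if d_max == 0 or allele_var == 0:
        raise ValueError("both sites must be polymorphic")
    return abs(d) / d_max, d * d / allele_var


def standardized_index_of_association(haplotypes):
    """Return [1/(e-1)] * [(V_D/V_E) - 1] for equal-length haplotypes."""
    n = len(haplotypes)
    n_loci = len(haplotypes[0])
    # Pairwise distance: number of loci at which two haplotypes differ.
    dists = [sum(x != y for x, y in zip(h1, h2))
             for h1, h2 in combinations(haplotypes, 2)]
    mean_d = sum(dists) / len(dists)
    v_d = sum((d - mean_d) ** 2 for d in dists) / len(dists)
    # Assumed estimator of V_E: sum over loci of h_j * (1 - h_j),
    # where h_j is the sample-size-corrected gene diversity at locus j.
    v_e = 0.0
    for locus in range(n_loci):
        counts = Counter(h[locus] for h in haplotypes)
        h_j = (n / (n - 1)) * (1 - sum((c / n) ** 2 for c in counts.values()))
        v_e += h_j * (1 - h_j)
    return (v_d / v_e - 1) / (n_loci - 1)


# Toy data: two sites in complete linkage disequilibrium.
linked = ["AA", "AA", "GG", "GG"]
d_prime, r2 = d_prime_and_r2(linked, 0, 1)
isa = standardized_index_of_association(linked)
```

For the toy haplotypes, the two sites are perfectly correlated, so all three measures sit at the upper end of their range. Real analyses would rely on the significance tests provided by the dedicated programs.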


#### *4.4.2. Recombination Detection Program version 4 (RDP4) package*

The RDP4 package [94] brings together various methods that have been developed to detect recombination. It identifies significant evidence of recombination events based on the *p-value* and identifies each potential recombinant sequence along with both of its parents (major and minor). The main strength of the package is that it does not require a prior set of known non-recombinant reference sequences. The starting point of the analysis is an MSA of genomic sequences.

#### **Algorithm steps:**

**i.** The RDP4 package sequentially tests every combination of three sequences in the MSA (a triplet) for potential evidence that one of the three is a recombinant and the other two are its parents. Various recombination detection methods are used, such as the RDP method [95], BOOTSCAN [96, 97], the maximum chi-square (MAXCHI) method [98, 99], CHIMAERA [99], 3SEQ [100], the gene conversion method (GENECONV) [101], the Sister Scanning method (SISCAN) [102], LARD [103], Topal/Difference of Sums of Squares (DSS) [104] and the DNA distance plot.
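The triplet scan described in this step can be pictured with a short Python sketch. It only enumerates the hypotheses that would be tested, and it does not reproduce any of the RDP4 detection methods. The function name `triplet_hypotheses` and the sequence labels are hypothetical.

```python
from itertools import combinations


def triplet_hypotheses(sequence_names):
    """Yield (candidate_recombinant, parent_1, parent_2) for every triplet."""
    for triplet in combinations(sequence_names, 3):
        # Each member of the triplet is considered in turn as the recombinant,
        # with the remaining two sequences treated as its putative parents.
        for idx, candidate in enumerate(triplet):
            parents = triplet[:idx] + triplet[idx + 1:]
            yield candidate, parents[0], parents[1]


names = ["seq1", "seq2", "seq3", "seq4"]  # illustrative labels only
hypotheses = list(triplet_hypotheses(names))
```

The number of triplets grows roughly with the cube of the number of sequences, which is one reason a practical tool needs to limit the size of the alignment it scans.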


#### **Salient feature:**

RDP4 package provides a unified interface for multiple methods and facilitates visualization of recombination events using genomic data (up to 2,500 sequences).

#### **Limitations:**


#### **4.5. Methods for selection pressure analysis**

Natural selection is one of the fundamental evolutionary processes that shape the genetic structure of viral populations. The ratio of the non-synonymous substitution rate (*dN*) to the synonymous substitution rate (*dS*) is a useful means to infer selection pressure based on a codon alignment for a particular gene. Positive selection (*dN/dS* > 1) increases the frequency of advantageous alleles, whereas negative selection (*dN/dS* < 1) is responsible for purging (removal) of deleterious alleles.
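As a minimal illustration of how the *dN/dS* ratio is read, the following Python sketch classifies a pair of rate estimates. The function name, the tolerance, and the "neutral" label for a ratio of exactly 1 are assumptions added here. In practice the classification rests on a statistical test rather than on the point estimate alone.

```python
def classify_selection(dn, ds, tolerance=1e-9):
    """Return (dN/dS, label) for given rate estimates; illustrative only."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        return omega, "neutral"  # assumed label, not taken from the text
    return omega, ("positive" if omega > 1.0 else "negative")


omega_pos, label_pos = classify_selection(0.3, 0.1)
omega_neg, label_neg = classify_selection(0.05, 0.5)
```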

Broadly, selection pressure can be classified as pervasive or episodic. Pervasive selection acts across all the lineages in a phylogenetic tree, whereas episodic selection operates on only a few lineages. Various statistical methods for analysing pervasive and episodic selection are available on the Datamonkey web server of the Hypothesis testing using Phylogenies (HyPhy) software package [107–109].

#### *4.5.1. Single Likelihood Ancestor Counting (SLAC)*

**Principle:** This method belongs to a class called counting methods [110]. It is suitable for pervasive selection analysis and involves estimating the number of non-synonymous and synonymous changes that have occurred at each codon throughout the evolutionary history of the sample. It involves reconstructing the ancestral sequences using a likelihood-based method [111].
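The counting idea can be sketched in Python as follows. This is not the SLAC implementation. It assumes the ancestral codons have already been reconstructed and are supplied as (ancestor, descendant) pairs for one codon site, uses the standard genetic code, and handles only single-nucleotide changes. Any normalisation and significance testing that a full method would apply are left out. The function name and example codons are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_changes(branches):
    """Count (synonymous, non-synonymous) changes at one codon site.

    branches: iterable of (ancestral_codon, descendant_codon) pairs, one per
    branch of the tree, with ancestral states assumed already reconstructed.
    """
    syn = nonsyn = 0
    for anc, desc in branches:
        if anc == desc:
            continue  # no change along this branch
        if sum(a != d for a, d in zip(anc, desc)) != 1:
            raise ValueError("sketch handles single-nucleotide changes only")
        if CODON_TABLE[anc] == CODON_TABLE[desc]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Toy branches for a single codon site (illustrative values).
branches = [("GAA", "GAG"), ("GAA", "GAT"), ("GAA", "GAA"), ("CTT", "CTC")]
syn_count, nonsyn_count = count_changes(branches)
```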
