## **3. SARS-CoV-2 variants and mutations in the spike protein**

Genetic sequencing studies have revealed numerous neutral or mildly deleterious mutations, mainly single nucleotide polymorphisms (SNPs) and insertions/deletions (indels). However, a small percentage of mutations alter fitness and help the virus adapt. These substitutions or deletions can change peptide polarity and affect the structure and function of the viral proteins responsible for infectivity, transmissibility, and antigenicity [36]. Several databases and nomenclature systems have been established to classify genome sequences and to track the epidemiology and genetic evolution of SARS-CoV-2. GISAID (The Global Initiative on Sharing All Influenza Data, https://www.epicov.org) contains millions of SARS-CoV-2 whole-genome sequences [37]. The GISAID nomenclature categorizes genomes into clades (based on marker mutations), which helps in understanding large-scale diversity patterns and geographic dispersal. The Pango nomenclature (https://cov-lineages.org/) is one of the most widely used systems; it assigns newly identified genomes to a lineage based on the global phylogenetic tree. Based on the extensive sequencing data and observations available,

the WHO has classified the SARS-CoV-2 variants that may pose an increased public health risk into the following three groups: variants of concern (VOC), variants of interest (VOI), and variants under monitoring (VUM).


According to comprehensive data provided by the WHO as of May 2, 2021 (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/), the above groups are summarized in **Table 1**.

RNA viruses have the highest mutation rates, between 1 in 10,000 and 1 in 1,000,000 mutations per base pair, because RNA-dependent RNA polymerases (RdRp) lack proofreading ability [38]. However, coronavirus family viruses have a proofreading


#### **Table 1.**

*Summary of VOI, VOC, and VUM as published by the WHO.*

*Perspective Chapter: Bioinformatics Study of the Evolution of SARS-CoV-2 Spike Protein DOI: http://dx.doi.org/10.5772/intechopen.105915*


#### **Table 2.**

*Collected key mutations of SARS-CoV-2, which have a substantial impact on the viral pathogenicity.*

mechanism due to the exoribonuclease (ExoN) domain of nsp14 [39]. Although this was expected to contribute to a low mutation rate, more than 6 million viral genomes were captured within 2 years (GISAID). Furthermore, the first fitness-enhancing mutation in the spike protein was identified only a few months after SARS-CoV-2 emerged [40]. These findings could be a consequence of the sheer number of infections on a global scale. In addition, Gribble et al. [41] have experimentally demonstrated that nsp14-ExoN may play a critical role in RNA recombination events during viral replication that can generate genetic variants (**Table 2**).
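
To put the quoted mutation-rate range into perspective, the following minimal sketch multiplies a per-base rate by a genome length to estimate the expected number of mutations per genome copy. The genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference) is an assumption that is not stated in the text, and the calculation ignores proofreading, selection, and recombination.

```python
# Rough arithmetic: expected mutations per genome copy = per-base rate x genome length.
# ASSUMPTION: 29,903 nt is the length of the Wuhan-Hu-1 reference genome;
# this number does not appear in the chapter and is used for illustration only.
GENOME_LENGTH_NT = 29_903


def expected_mutations_per_copy(rate_per_base: float,
                                genome_length: int = GENOME_LENGTH_NT) -> float:
    """Return the expected number of mutations introduced per genome copy."""
    return rate_per_base * genome_length


# The two ends of the range quoted in the text (1 in 10,000 and 1 in 1,000,000).
for rate in (1e-4, 1e-6):
    expected = expected_mutations_per_copy(rate)
    print(f"rate {rate:.0e} per base: ~{expected:.2f} mutations per genome copy")
```

Under these assumptions the range spans roughly three mutations per copy down to about one mutation in every thirty copies, which illustrates why very large infection numbers can still generate many variants even at a low per-base rate.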

As shown in **Table 2**, the most essential mutations of SARS-CoV-2 are found in the receptor-binding domain (RBD) of the spike protein, but that is not the whole story. Other regions are critical to the success of different lineages, such as the antibody-binding regions on or near the RBD, on the N-terminal domain (NTD), and on the S2 domain. Mutations there mainly contribute to the antibody escape of the lineages. Some mutations stabilize binding by stabilizing the open conformation, and others add or remove glycosylation sites, which is also important. There is also the furin cleavage site, one of the vital mutation points, which carried a mutation in the *delta* variant. Mutations also influence testing capabilities. For example, the deletion S:69/70del caused the so-called S dropout in PCR testing because it lies in the PCR primer region. This effect was later cleverly exploited to quickly distinguish between *alpha* and non-*alpha* infections and between *omicron* and non-*omicron* infections. There are also mutations in the N gene, present mainly in the *omicron* lineage, that lower the sensitivity of lateral flow tests (so-called quick antigen tests) and can even cause detection to fail. Lastly, mutations in other proteins are significant for novel drug design. The phylogenetic analysis of SARS-CoV-2 is presented in **Figure 4** (the reader is referred to an excellent resource: https://covariants.org/).

The *omicron* variant mutations potentially attenuate the efficacy of therapeutic antibodies and enhance binding to ACE2. Of even greater concern, *omicron* infections have been reported in vaccinated individuals in South Africa and Hong Kong [44]. The recent study by Yin et al. reported the biochemical and structural

#### **Figure 4.**

*The chart shows the molecular clock, i.e., the number of mutations in a particular sequence as a function of sampling time. The colors represent variants (clades). The thick line represents the average mutation rate. 21L (BA.2) omicron, as well as 22A (BA.4), 22B (BA.5), and 22C (BA.2.12.1), clearly deviate from the average. Nextstrain; 2022-05-31 [42, 43].*

characterization of the spike protein trimer of the SARS-CoV-2 *omicron* variant and its binding to ACE2. The data show that the *omicron* variant RBD is less stable and more dynamic than the WT RBD [35]. *Omicron* differs significantly from all previous versions of SARS-CoV-2. Its mutations have in turn led to important consequences for the behavior of the virus, including significant epidemiological characteristics [42, 43]. Namely, the mutations altered the area recognized by neutralizing antibodies and reduced the effectiveness of the immune response elicited by previous variants and vaccines. Of particular interest are the neutralizing antibodies that still recognize the virus, because knowing them will help to improve prophylactic and therapeutic strategies and to respond adequately to future variants. Excellent research on antibody escape has been done by J.D. Bloom et al. [45]. Bloom's lab plotted antibody escape as a function of the spike mutation site for Moderna's vaccine serum and for convalescent serum. The plots in **Figure 5** show that a mutation at site 484 or 456 would allow the virus to escape from previously acquired immunity. The mutation at residue 484, present in the *beta* and *gamma* lineages as E484K and in *omicron* as E484A, confirmed Bloom's work. A mutation at site 456, on the other hand, is so far not present in any of the WHO's variants of interest.

*Omicron* variant mutations thus contributed to the successful person-to-person transmission of the virus and to its faster spread. Surprisingly, a virus with so many mutations continues to bind effectively to the ACE2 receptor. *Omicron* is thought to combine various mutations (some of them known from other variants), some of which simultaneously help it escape the immune response and bind to ACE2 [43]. The mutations have affected the mechanism of viral entry into the cell. Unlike other variants, which enter by fusion with the cytoplasmic membrane of the host cell, *omicron* uses another mechanism of entry, namely uptake by the endosome. According to one hypothesis, this may at least partially explain *omicron*'s preference for the upper respiratory tract (nose and throat) [42]. The mutations have also affected the spectrum of hosts: it is known

#### **Figure 5.**

*Bloom plots (https://jbloomlab.github.io/SARS2\_RBD\_Ab\_escape\_maps/) top: Moderna's vaccine serum and bottom: Convalescent serum; on 2022-05-31 [45].*

that SARS-CoV-2 can infect a wide range of domestic and wild animals, including cats, dogs, ferrets, hamsters, leopards, minks, and deer, but not mice and rats. However, unlike "conventional" SARS-CoV-2 variants, *omicron* can bind to the ACE2 receptor in turkeys, chickens, mice, and rats (linked to the N501Y and Q498R mutations) [1, 45].

All in all, *omicron* consists of several sublineages/subvariants (BA.1, BA.2, BA.3, BA.4, BA.5), some of which (BA.1, BA.2) have already become established worldwide, while others (BA.4, BA.5) are currently growing. BA.4 and BA.5 appeared in late December 2021 and early January 2022; they are more transmissible than earlier versions of *omicron* (BA.2 and especially BA.1) and may partially escape the immune protection provided by vaccination or by infection with previous variants. At the beginning of May 2022, BA.4 and BA.5 accounted for 60–75% of the cases in South Africa and have been registered in a number of other countries in Europe and North America [42, 46]. According to the European Centre for Disease Prevention and Control (ECDC; https://www.ecdc.europa.eu/en/newsevents/epidemiological-update-sars-cov-2-omicron-sub-lineages-ba4-and-ba5), as of May 13, 2022, BA.5 already represents 37% of the cases in Portugal and is expected to become dominant by May 22, 2022. *Omicron* has a large number of mutations (50 compared with the original variant of SARS-CoV-2 isolated in Wuhan at the end of 2019). Its origin is not yet fully established. Clarifying it is extremely important not only from a theoretical point of view, but also because it will help us to be better prepared to manage future variants [42]. Among the more recognized hypotheses, we can distinguish four: First, accumulation of mutations during its transmission from person to person. It is known that, unlike other RNA viruses, coronaviruses, including SARS-CoV-2, carry an editing enzyme system that helps

them to correct errors occurring during RNA synthesis [47, 48]; Second, the appearance of such a large number of mutations can be facilitated by infection of immunosuppressed individuals, in whose bodies the virus persists for a long time, which creates conditions for its continuous reproduction and for the selection of mutations that evade the immune response; Third, the emergence may be related to its circulation (and consequent accumulation of mutations) in animals. This shows that we also need to monitor the fate of the virus in the animal kingdom [42]; and Fourth, changes in coronaviruses can be induced through recombination; in this case, the formation of the next vital generation may be the result of combining genetic information. It has been reported that this process takes place not only in bats but also in humans, and that it can lead to the emergence of new variants and strains [49, 50].

## **4. In silico comprehensive mutagenesis**

We can postulate that new viral variants, along with key (canonical) mutations, especially at the receptor-binding domain (RBD) of the spike protein (Spro), improve the ability of the virus to recognize the relevant host receptor (ACE2) via steric adaptation and new interactions with the binding partner. To inspect all possible mutations at the Spro RBD, we performed a comprehensive *in silico* mutagenesis study using FoldX [46]. 3D complexes of wild-type Spro and of the Spro FoldX mutants were used iteratively for ΔΔG prediction. All possible mutations of the RBD of the SARS-CoV-2 S protein (PDB ID: 6M0J), with the sequence from K417 to Y505 (length 89; *KIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGY*), were calculated, for a total of 1780 point mutations (89 positions × 20 amino acids), and the resulting heatmap is presented in **Figure 6** [49]. Each point mutation calculation was repeated once, and the no-change mutations (a residue replaced by itself) were kept for validation purposes; all of them produced ΔΔG energies below 0.1 kcal/mol.
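
The size of the mutation set can be reproduced with a short enumeration. The sketch below is only an illustration of how the 1780 labels arise from the 89-residue segment and the 20 standard amino acids; it does not call FoldX, and the function and variable names are hypothetical rather than taken from the study.

```python
# Enumerate every single-residue substitution over the K417-Y505 segment.
# The names below are illustrative; this is not the authors' pipeline and it
# does not run FoldX, it only generates labels such as "N501Y".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
RBD_SEGMENT = (
    "KIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERD"
    "ISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGY"
)
FIRST_POSITION = 417  # residue number of the first letter (K417)


def enumerate_point_mutations(segment=RBD_SEGMENT, first_position=FIRST_POSITION):
    """Yield (label, is_no_change) for every position and every amino acid."""
    for offset, wild_type in enumerate(segment):
        position = first_position + offset
        for mutant in AMINO_ACIDS:
            yield f"{wild_type}{position}{mutant}", mutant == wild_type


mutations = list(enumerate_point_mutations())
labels = [label for label, _ in mutations]
no_change = [label for label, same in mutations if same]

# Segment length, total mutation count, and number of no-change controls.
print(len(RBD_SEGMENT), len(mutations), len(no_change))  # prints: 89 1780 89
```

The 89 no-change labels (for example K417K) correspond to the controls described above, so the set contains 1691 genuine substitutions.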

Upon structural inspection and superimposition (PDB ID: 6M0J, 7DK3), the Spro RBD-ACE2 interface was identified as: 417 LYS, 445 VAL, 446 GLY, 449 TYR, 453 TYR, 455 LEU, 456 PHE, 473 TYR, 475 ALA, 476 GLY, 484 GLU, 486 PHE, 487 ASN, 489 TYR, 493 GLN, 496 GLY, 498 GLN, 500 THR, 501 ASN, 502 GLY, 503 VAL, 505 TYR. Reference key mutations with ample experimental data for validation, such as E484K, Q493N, Q493Y, and N501Y, lie on this interface. Namely, N501Y confers increased binding affinity to human ACE2, while N501T shows reduced host ACE2-binding affinity *in vitro*, according to the UniProt reference P0DTC2 [47, 48].
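
A listing of this kind is easy to cross-check against the K417 to Y505 sequence. The following sketch, which assumes the sequence and numbering given for the mutagenesis study and uses hypothetical helper names, verifies that every listed interface residue matches the sequence and that the reference mutations fall on listed positions.

```python
# Cross-check the interface listing against the K417-Y505 segment.
# Helper names are illustrative; the sequence and numbering are those quoted
# in the text for the mutagenesis study.
RBD_SEGMENT = (
    "KIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERD"
    "ISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGY"
)
FIRST_POSITION = 417
THREE_TO_ONE = {
    "LYS": "K", "VAL": "V", "GLY": "G", "TYR": "Y", "LEU": "L", "PHE": "F",
    "ALA": "A", "GLU": "E", "ASN": "N", "GLN": "Q", "THR": "T",
}
INTERFACE_LISTING = (
    "417 LYS, 445 VAL, 446 GLY, 449 TYR, 453 TYR, 455 LEU, 456 PHE, 473 TYR, "
    "475 ALA, 476 GLY, 484 GLU, 486 PHE, 487 ASN, 489 TYR, 493 GLN, 496 GLY, "
    "498 GLN, 500 THR, 501 ASN, 502 GLY, 503 VAL, 505 TYR"
)


def parse_interface(listing):
    """Turn '417 LYS, 445 VAL, ...' into {417: 'K', 445: 'V', ...}."""
    residues = {}
    for entry in listing.split(","):
        number, name = entry.split()
        residues[int(number)] = THREE_TO_ONE[name]
    return residues


def find_mismatches(residues, segment=RBD_SEGMENT, first_position=FIRST_POSITION):
    """Return positions whose listed residue differs from the sequence."""
    return [pos for pos, aa in residues.items()
            if segment[pos - first_position] != aa]


interface = parse_interface(INTERFACE_LISTING)
mismatches = find_mismatches(interface)
reference_mutations = ["E484K", "Q493N", "Q493Y", "N501Y"]
# A label such as "E484K" carries its position between the first and last letter.
on_interface = {m: int(m[1:-1]) in interface for m in reference_mutations}

print(len(interface), mismatches, on_interface)
```

An empty mismatch list indicates that the residue names and numbers in the listing are consistent with the quoted sequence.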

#### **Figure 6.**

*We conducted a full RBD 417–505 mutagenesis study using FoldX in order to assess the key mutations and their effects on the stability of the system; in total 1780 point mutations.*


We observed FoldX total energies of 0.37, 0.62, and −0.95 kcal/mol upon the point mutations E484K, Q493N, and Q493Y, respectively, in accordance with the literature [50]. Furthermore, the canonical *delta* mutations L452R and E484Q displayed insignificant FoldX force field Δ energies of 0.04 and 0.09 kcal/mol, respectively [51]. The *omicron* variant possesses the following mutations at the Spro RBD that are not in contact with the ACE2 binding partner: G339D, S371L, S373P, and S375F. In the ACE2 protein-protein interaction (PPI) interface, however, the following mutations are present, with calculated FoldX Δ energies in kcal/mol: K417N: −0.34, G446S: 2.99, N440K: −0.61, S477N: 0.14, T478K: −0.18, E484A: 1.31, Q493K: −1.20, G496S: −0.06, Q498R: −0.93, N501Y: 6.18 and Y505H: 1.62. The results confirm that the observed ensemble of mutations substantially modifies the resulting PPI, in accordance with the literature [52–60]. It should be stressed that FoldX evaluations are single-point only and that detailed binding energetics should be studied further by experiment-supported molecular dynamics (MD). We postulate, however, that such *in silico* interaction profiling approaches could be developed further to quickly assess key mutations or mutation ensembles for future study.
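
As a simple illustration of how such single-point values might be triaged for follow-up, the sketch below ranks the reported *omicron* energies and groups them by magnitude. The 0.5 kcal/mol cutoff is a hypothetical choice made for this example only, not a threshold used in the study, and no physical interpretation of the sign is implied.

```python
# Rank the reported omicron FoldX energies (kcal/mol) and bin them by magnitude.
# ASSUMPTION: the 0.5 kcal/mol cutoff is a hypothetical value for illustration;
# it is not a threshold taken from the study.
OMICRON_FOLDX = {
    "K417N": -0.34, "G446S": 2.99, "N440K": -0.61, "S477N": 0.14,
    "T478K": -0.18, "E484A": 1.31, "Q493K": -1.20, "G496S": -0.06,
    "Q498R": -0.93, "N501Y": 6.18, "Y505H": 1.62,
}
CUTOFF = 0.5

# Sort from the most positive to the most negative energy.
ranked = sorted(OMICRON_FOLDX.items(), key=lambda item: item[1], reverse=True)

large_positive = [name for name, energy in ranked if energy >= CUTOFF]
large_negative = [name for name, energy in ranked if energy <= -CUTOFF]
near_zero = [name for name, energy in ranked if abs(energy) < CUTOFF]

for name, energy in ranked:
    print(f"{name:>6s} {energy:+.2f}")
print("large positive:", large_positive)
print("large negative:", large_negative)
print("near zero:", near_zero)
```

Because the values are single-point estimates, such a grouping can only suggest which mutations to examine first by experiment or MD.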
