## **3. Typing of** *Salmonella* **Enteritidis**

#### **3.1 Serotyping**

Serotyping has consistently been the basis of public health surveillance of *Salmonella* and, thanks to the development of novel bioinformatics tools (see Section 3.3), has retained its primary role as a first-line typing method in the era of WGS. Serotypes of *Salmonella* are defined by two types of antigens: the heat-stable, somatic O antigen, a component of the lipopolysaccharide envelope covering the organism and an important virulence factor, and the H antigen, which is present on the flagella of the organism [49]. The antigenic properties of the O antigen are denoted by numerals, e.g., 1,9,12 for SE. In contrast, the H antigens are described using one or a few letters for the phase I antigen (e.g., g, m for SE), or as a combination of letters and numbers when the flagella also bear a phase II antigen (e.g., r and 1, 2 for Heidelberg). Agglutination assays are performed on the organisms using antibodies that recognize specific antigenic molecules and that are developed through a laborious cross-absorption process against other serovars [50]. The result is an elaborate classification scheme, developed by Kauffmann and White [51, 52], which has now led to the identification of some 2,600 serotypes of *Salmonella*. The complexity is further increased by the ability of plasmids and prophages to alter the expression of some of the antigens, which has led to a frequent re-evaluation of some serovar designations. Fortunately, these alterations are fairly rare and the serotyping scheme has served well since it was first proposed by Schütze in 1920 [53]. Of the large number of *Salmonella* serovars identified so far, only a relatively small number, perhaps no more than 100 serovars, are commonly associated with foodborne illnesses [54, 55].
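The antigenic formula described above lends itself to a simple data structure. The following Python sketch is an illustration only: the class `AntigenicFormula`, the function `serovar_for` and the one-entry lookup table are hypothetical names introduced here, and only the SE antigens given in the text (O: 1,9,12; phase I H: g,m; no phase II) are encoded.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AntigenicFormula:
    """O antigens, phase I H antigens and (optional) phase II H antigens."""
    o: Tuple[str, ...]
    h1: Tuple[str, ...]
    h2: Tuple[str, ...] = ()

    def __str__(self) -> str:
        # Join each antigen group with commas; an absent phase is shown as "-".
        groups = (self.o, self.h1, self.h2)
        return ":".join(",".join(g) if g else "-" for g in groups)


# Hypothetical, deliberately tiny lookup table; the real scheme lists
# some 2,600 serotypes.
SEROVARS = {
    AntigenicFormula(o=("1", "9", "12"), h1=("g", "m")): "Enteritidis",
}


def serovar_for(formula: AntigenicFormula) -> str:
    """Return the serovar name for an agglutination result, if it is in the table."""
    return SEROVARS.get(formula, "not in table")


observed = AntigenicFormula(o=("1", "9", "12"), h1=("g", "m"))
print(observed, "->", serovar_for(observed))
```

A frozen dataclass is used so that a formula can serve directly as a dictionary key.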

#### **3.2 Traditional subtyping procedures for** *Salmonella* **Enteritidis**

There are two approaches to the subspecies characterization of SE. Phenotypic tests rely on the biochemical properties of the live organism, the most prominent example being phage typing. More recently, DNA-based approaches, or genotypic tests, have dominated the field, the most widely used being pulsed-field gel electrophoresis (PFGE). Over the last few years, whole genome sequencing (WGS) of SE has become the dominant subtyping method in the developed world.

#### *3.2.1 Pulsed-field gel electrophoresis (PFGE)*

PFGE can be used to characterize bacterial isolates based on the distribution of restriction enzyme sites in the organism's DNA. For *Salmonella*, the electrophoretic mobility of the DNA fragments generated by digestion with the restriction enzyme *Xba*I or *Bln*I produces a characteristic fingerprint pattern that is used to subtype the isolate. Between 2009 and 2019, the Canadian Food Inspection Agency used PFGE for outbreak investigations as one of two subtyping tests for SE, the other being phage typing. Despite the presence of hundreds of different PFGE types among field isolates of SE, only two PFGE types predominated, each comprising thousands of isolates in the Canadian PulseNet database. The two commonest Canadian primary PFGE types, namely SEN.XAI 0003 and SEN.XAI 0006, accounted for 33.8% and 19.2%, respectively, of the Canadian SE isolates documented in the PulseNet database between 2012 and 2017 (Ogunremi, Allain and Nadon, unpublished). The predominance of only a few PFGE types of SE has long been recognized as a consequence of the poor discriminatory ability of the technique for analyzing the relatedness of SE isolates (**Table 1**) rather than a reflection of an evolutionary dominance of a few circulating strains [56]. These observations led to the pursuit of WGS as an alternative approach [57].
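The principle behind a PFGE fingerprint, and the reason it can miss differences between isolates, can be illustrated with an *in silico* digest. The sketch below is a simplification under stated assumptions: it treats the DNA as a linear molecule (the bacterial chromosome is circular), uses a toy sequence rather than an SE genome, and the function name `digest` is hypothetical. The *Xba*I recognition sequence TCTAGA, cut after the first T, is the only enzyme-specific fact used.

```python
import re

XBAI_SITE = "TCTAGA"  # XbaI recognition sequence; the enzyme cuts T^CTAGA
CUT_OFFSET = 1        # cut position relative to the start of the site


def digest(sequence, site=XBAI_SITE, cut_offset=CUT_OFFSET):
    """Return fragment lengths (largest first) from a linear in silico digest."""
    seq = sequence.upper()
    # A lookahead finds every occurrence of the site, including overlapping ones.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    lengths = [end - start for start, end in zip(bounds, bounds[1:])]
    return sorted(lengths, reverse=True)


# Toy sequence with two XbaI sites (not real SE sequence).
toy = "AAAA" + XBAI_SITE + "GGGGGG" + XBAI_SITE + "CC"
# A single base change outside the restriction sites.
variant = toy.replace("GGGGGG", "GGAGGG")

print(digest(toy))      # fragment lengths of the original sequence
print(digest(variant))  # same fragment lengths despite the base change
```

Because the base change in `variant` does not create or destroy a restriction site, both sequences yield the same fragment sizes. This mirrors how genetically distinct SE isolates can share one PFGE type.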

#### *3.2.2 Phage typing*

In contrast to PFGE, phage typing is a phenotypic test that exploits the ability of certain bacteriophages, i.e., viruses that infect bacteria, to differentially attach to and gain entry into strains of bacteria. The phage type of an SE strain is defined by its pattern of susceptibility to a bacteriophage or a combination of bacteriophages, resulting in lysis of the bacterial cell [58]. A large number of phage types of SE have been described in Canada and elsewhere; however, phage types 8, 13 and 13a were observed to predominate in Canada [59]. This observation may not reflect the presence of a few circulating, dominant strains of SE in Canada, but may instead be a consequence of the inadequacy of phage typing as a discriminatory tool that can accurately delineate the population structure of SE in Canada, similar to PFGE as discussed above (see Section 3.2.1 and **Table 1**). The plasticity of phage types also diminishes the usefulness of phage typing as a subtyping tool. Factors such as the restriction system within the bacteria, the ability of lipopolysaccharides and outer membranes to adsorb the bacteriophage, and the immune system of the vertebrate host infected by the bacteria can alter the phage type of an organism [60]. The reagents used for phage typing require very rigorous quality control and yet test performance can differ markedly among laboratories [61]. Changes occurring within an organism, such as the acquisition or loss of an IncN plasmid [62, 63], transfer of an IncX plasmid [64] or loss of the lipopolysaccharide layer [65], have been shown to lead to poor test reproducibility. Thus, two isolates with the same phage type may in fact be unrelated and, conversely, two isolates that show distinct phage types may be closely related. As a result of these factors, phage typing shows inadequate discriminatory power, partial typeability and poor reproducibility [66].


*Tracking* Salmonella *Enteritidis in the Genomics Era: Clade Definition Using a SNP-PCR Assay… DOI: http://dx.doi.org/10.5772/intechopen.98309*

*The single nucleotide polymorphism-polymerase chain reaction (SNP-PCR) assay was used to test* Salmonella *Enteritidis (SE) isolates, and a representative strain for each designated clade (from 1 to 25) is shown in comparison to traditional and whole genome sequence-based subtyping results. Only the SNP-PCR and EnteroBase core-genome multi-locus sequence typing (cgMLST) supplemented with hierarchical clustering (HierCC) showed distinct resolution of the representative strains. All other methods, including 7-gene MLST, phage typing and pulsed-field gel electrophoresis (PFGE), did not provide adequate discriminatory ability for strain differentiation, outbreak investigation or tracking SE from farm to fork. N/A: Not available.*

#### **Table 1.**

*Clade designation of* Salmonella *Enteritidis organisms depicting a representative strain for each clade and comparison with the results of traditional and new subtyping assays.*

#### *3.2.3 Multiple locus variable-number tandem repeat analysis (MLVA) assay*

MLVA is a molecular typing method based on PCR amplification of polymorphic regions of the DNA that contain variable numbers of tandemly repeated sequences. The method has been standardized by PulseNet International and applied to epidemiological investigations of SE either as a supplement to or a substitute for PFGE subtyping [67, 68]. An advantage of MLVA is that typing results are designated by a numeric sequence of tandem repeat counts. This simple, easy-to-understand nomenclature facilitates the reporting and exchange of test results between laboratories, and translates into reliable tracking of an organism during epidemiological investigations. The discriminatory ability of MLVA has been variously reported to be superior to [69], equivalent to [70] or poorer than [71] that of PFGE.
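The numeric MLVA nomenclature can be sketched as follows: for each locus, the repeat count is estimated from the amplicon size after subtracting the flanking sequence, and the counts are joined into a profile string. This is a minimal illustration, not the PulseNet protocol; the locus names, flank lengths, repeat unit sizes and amplicon sizes are all hypothetical values chosen for the example.

```python
# Hypothetical loci: name -> (flanking bp included in the amplicon, repeat unit bp)
LOCI = {
    "locus_A": (100, 7),
    "locus_B": (150, 33),
    "locus_C": (90, 12),
}


def repeat_count(amplicon_bp, flank_bp, unit_bp):
    """Estimate the number of tandem repeats from a measured amplicon size."""
    return round((amplicon_bp - flank_bp) / unit_bp)


def mlva_profile(amplicon_sizes):
    """Join the per-locus repeat counts, in a fixed locus order, into one string."""
    return "-".join(
        str(repeat_count(amplicon_sizes[locus], flank, unit))
        for locus, (flank, unit) in LOCI.items()
    )


# Hypothetical amplicon sizes (bp) measured for one isolate.
isolate = {"locus_A": 128, "locus_B": 249, "locus_C": 150}
print(mlva_profile(isolate))
```

Two laboratories that agree on the locus order can compare such strings directly, which is the portability advantage noted above.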

Detailed genetic studies of SE have consistently pointed to the underlying cause of the poor discriminatory ability of the available subtyping tools: isolates of SE are extremely similar (i.e., highly clonal), which makes it difficult to find a definitive, distinguishing trait that could be used to track lineages [70, 72, 73]. The timely arrival and increasing adoption of WGS has altered the analytical landscape.

#### **3.3 Application of whole genome sequencing (WGS) in** *Salmonella* **Enteritidis: identification and characterization**

The development of WGS procedures has heralded the application of a powerful technology for the identification and characterization of SE [57], which has been used for outbreak investigations [74], trace-back procedures [75] and surveillance [76]. Furthermore, WGS analysis of SE has provided insights into the phylogenetic relatedness of isolates and the presence and prevalence of antimicrobial resistance genes, novel mobile elements, virulence markers and bacteriophages in strains of the organism isolated from humans, food animals, production facilities and environmental sources [77–79]. Relevant to developing long-term control and intervention strategies are the insights to be gained from the increasing application of WGS to the understanding of the transmission dynamics of SE, as was done in Chile to infer possible transmission of SE between gulls, poultry, and humans [80]. Bioinformatics approaches that allow useful information to be mined from genome sequences are discussed next.

#### *3.3.1 Whole genome-based serotyping*

Serovar prediction can now be performed on *Salmonella* isolates for which a whole genome sequence is available, replacing the laborious agglutination assay (see Section 3.1) with an *in silico* analysis of the nucleotide sequence of the organism. Effectively, the traditional gold standard of serology based on the Kauffmann-White scheme has been replaced in the developed economies with *in silico* approaches [81]. Two of the most widely used tools for this purpose are the *Salmonella In Silico* Typing Resource (SISTR) software and the SeqSero2 software [82, 83].

SISTR is an open, web-based bioinformatics platform capable of rapid *in silico* analyses of minimally processed draft assemblies of *Salmonella* genomes to generate accurate serovar designations. A collection of markers previously developed for the various *Salmonella* serovars formed the basis of the new tool [84]. The performance of SISTR is enhanced by the integration of additional multilocus sequence typing tools (see Section 3.3.2), an approach which, as a separate platform, has been suggested as a replacement for the use of serotypes to define taxonomic as well as evolutionary groups of *Salmonella* [55]. SeqSero, which was launched in 2015, was developed to use the *rfb* cluster, *fliC* and *fljB* to categorize *Salmonella* according to serovar using draft genome assemblies [83]. A subsequent improvement of the software, released as SeqSero2, added markers at the level of the genus, species and subspecies as well as for certain serotypes. Furthermore, a k-mer-based algorithm was included that allows a genome to be analyzed and the result made available within seconds [85].

#### *3.3.2 Multilocus sequence typing*

Multilocus sequence typing (MLST) evaluates the nucleotide sequences of multiple housekeeping genes of an organism as a means of establishing similarities or differences among isolates [86]. Each distinct sequence of a housekeeping gene is assigned an allele number, and the allele numbers are strung together into a profile that defines the organism. Although the MLST scheme was developed using the bacterium *Neisseria meningitidis* [86], the electronic portability of sequence data and the ease of incorporating additional genes made the approach a natural fit for WGS, and it has gained application in food safety. This has given rise to the widely used EnteroBase (https://enterobase.warwick.ac.uk/) [87], an integrated web-based platform that permits the upload and analysis of short-read Illumina sequences. The original MLST scheme, which was based on six housekeeping genes [86], has thereby been expanded into a series of flexible applications for *Salmonella*: seven genes (legacy MLST); 3002 genes identified as the core genome of *Salmonella* (core genome MLST, cgMLST); and 21,065 orthologous genes detected in a set of 537 *Salmonella* genomes (whole genome MLST, wgMLST). Despite the adoption of wgMLST by PulseNet International [88], an influential international body which oversees regulatory subtyping procedures for foodborne bacteria, EnteroBase's Sequence Type (ST) became a widely adopted subtype descriptor for *Salmonella*. However, the ST does not provide adequate resolution for epidemiological concordance and outbreak-level discrimination [89]. To address this challenge, EnteroBase additionally provides core genome STs (cgSTs), complemented with a newly described hierarchical clustering scheme (HierCC) offering 11 levels of genetic resolution for *Salmonella* (**Table 1**) [87, 90]. The result is a tool that appears to provide the needed resolution for strain differentiation in the context of disease outbreaks.
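The allele-and-profile logic of MLST can be sketched in a few lines of Python. Everything in this example is hypothetical: the three loci (`geneA` to `geneC`), the short sequences, and the allele and ST numbers are toy values, and the lookup tables stand in for the curated databases that a platform such as EnteroBase maintains. The same logic extends from seven loci to the thousands used in cgMLST.

```python
# Hypothetical allele tables: locus -> {sequence: allele number}
ALLELES = {
    "geneA": {"ATGACC": 1, "ATGACT": 2},
    "geneB": {"GGCTTA": 1, "GGCTTG": 2},
    "geneC": {"CCATGA": 1},
}

# Hypothetical table mapping allelic profiles to sequence types (STs).
SEQUENCE_TYPES = {
    (1, 1, 1): 1,
    (2, 1, 1): 2,
}


def allelic_profile(isolate):
    """Translate each locus sequence into its allele number (None if unknown)."""
    return tuple(ALLELES[locus].get(isolate[locus]) for locus in ALLELES)


def sequence_type(isolate):
    """Return the ST for an isolate, or None for a profile not yet in the table."""
    return SEQUENCE_TYPES.get(allelic_profile(isolate))


isolate = {"geneA": "ATGACT", "geneB": "GGCTTA", "geneC": "CCATGA"}
print(allelic_profile(isolate), sequence_type(isolate))
```

A profile that is absent from the table returns `None` here; in a real scheme it would be submitted for assignment of a new ST.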

#### *3.3.3 Single nucleotide polymorphism (SNP) pipelines*

Single base substitutions are among the commonest variations in genomes, and the resulting polymorphisms can form the basis for the characterization of a microbe, including SE. SNPs are detected as nucleotide changes at specific locations in a genome after aligning or comparing it to a designated reference genome. Bioinformatics pipelines have been developed to automate the alignment and the identification of the variants, and a number of those in common use are described here. SNVPhyl, which was developed at the Public Health Agency of Canada, identifies high-quality SNPs among a set of selected isolates and is useful for generating phylogenetic trees from these SNPs [91]. Public Health England developed SnapperDB, also a high-quality SNP pipeline, which analyzes microbial genomes, evaluates genetic distances among the genomes and infers relatedness of strains [92]. Parsnp detects core genome SNPs in bacterial genomes and, with the aid of the adjunct interactive tool Gingr, can be used to display informative overviews for specific sub-clades and genomic regions [93]. The kSNP tool detects SNPs in the pan-genome and is distinctive in being able to compare genomes without requiring genome alignment or a reference genome [94].
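The core step shared by reference-based SNP pipelines, comparing an aligned sequence to a reference position by position, can be illustrated as below. This is a toy sketch and not the algorithm of SNVPhyl, SnapperDB, Parsnp or kSNP: it assumes the sequences are already aligned to equal length, ignores read mapping and quality filtering, and the function names and sequences are hypothetical.

```python
VALID_BASES = set("ACGT")


def call_snps(reference, sample):
    """Return {1-based position: (reference base, sample base)} for each SNP.

    Positions where either sequence has an ambiguous base or gap are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, base) in enumerate(pairs, start=1):
        if ref_base in VALID_BASES and base in VALID_BASES and ref_base != base:
            snps[position] = (ref_base, base)
    return snps


def snp_distance(seq_a, seq_b):
    """Number of unambiguous positions at which two aligned sequences differ."""
    return len(call_snps(seq_a, seq_b))


reference = "ACGTACGTAC"   # toy reference, not real SE sequence
isolate_1 = "ACGAACGTAC"
isolate_2 = "ACGAACCTNC"   # the N is skipped rather than called as a SNP

print(call_snps(reference, isolate_1))
print(call_snps(reference, isolate_2))
print(snp_distance(isolate_1, isolate_2))
```

Pairwise distances of this kind are the usual input for the phylogenetic trees that such pipelines produce.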

#### **3.4 Rationale for developing a new reliable, rapid, robust, cost-effective, epidemiologically concordant, easily implementable subtyping tool**

A strategy aimed at developing a tool capable of differentiating lineages within the highly clonal *S.* Enteritidis will likely require interrogating a significant amount of the bacterial DNA information. Massively parallel sequencing technology [95], which deduces the entire nucleotide sequence of an organism, appeared at the outset to be the most viable option for addressing this need. The use of genome sequences for taxonomy, including strain differentiation, could conceivably work well with strains showing significant genetic diversity, e.g., >5% differences among unrelated strains. However, it may be very difficult for a clonal organism such as SE, where diversity between unrelated strains could be as little as 1% and the similar regions of the genome would have to be ignored in order to focus on the dissimilar portions and obtain an accurate quantitative estimate of relatedness. This may explain the failure to use whole genome sequences to develop a reliable estimation of genetic distance by means of a phylogenetic tree for a group of SE isolates (Ogunremi et al., unpublished data) using a method shown to work for other bacteria [96].

Consequently, this led to an effort to develop, analyze and characterize the genomes of SE. During the early phase of this endeavor, which involved a select number of SE isolates from Canada, 669 SNPs were detected in the genome of SE [57]. Subsequent analysis of 135 SE genomes present in GenBank in 2014 led to the identification of a total of 1440 SNPs, providing a robust resource that was exploited for SNP-based strain differentiation and clustering of foodborne SE isolates [57]. Thus, despite the universal acceptance of the usefulness of whole genome sequences for microbes, individual organisms such as the highly clonal SE may pose a unique challenge that might require a more focused analysis of carefully selected targets within the genome.

## **4. Single nucleotide polymorphism-polymerase chain reaction test (SNP-PCR) as a new, nomenclature friendly procedure**

#### **4.1 History and development of** *Salmonella* **Enteritidis lineages/clades and SNP-PCR**

Most existing molecular methods investigate only very small portions or attributes of the entire bacterial genome. PFGE, for example, identifies restriction enzyme patterns in the genome, whereas WGS-based procedures have detailed information on the entire genome available as a basis for comparison and discrimination. To that end, extremely small differences, such as single nucleotide polymorphisms (SNPs), can be identified and used for subtyping as long as these attributes are consistently preserved in a particular bacterial lineage. Notably, Allard and colleagues [97] carried out a bioinformatics analysis of a total of 104 SE genomes belonging, for the most part, to the predominant PFGE pattern (JEGX01.0004). They described a total of 9 clades and found 366 genes that showed variation, i.e., presence or absence, in the SE genome. This observation complemented and expanded on an earlier study by another laboratory which showed that two isolates of SE with the same phage type, PT 13a, were differentiated by a relatively large number of loci, i.e., 250 SNPs [73]. Similarly, WGS sequence reads have been mapped to a specific reference genome, for instance that of SE strain P125109, to find SNPs that were then used to build maximum-likelihood phylogenetic trees.

Another study involving 55 SE strains selected from clinical and environmental samples in Minnesota and Ohio from 2001 to 2014 showed the existence of only two major groups [98]. Furthermore, WGS-based SNP analysis of 675 SE isolates from 45 countries revealed a global epidemic clade and two new clades that were geographically restricted to distinct regions of Africa [99]. Using a closely related serovar, *S.* Gallinarum, as an outgroup, a maximum-likelihood phylogenetic tree was constructed based on the alignment of a total of 42,373 SNPs [99]. In addition, a SNP-based phylogenetic structure of 401 European SE isolates implicated outbreaks correlating with national and international egg distribution networks [75].

Thus, genetic variation that could allow the development of a routine subtyping tool for tracking purposes is present and demonstrable within the SE genome, but was apparently not fully exploited, given the small number of subgroupings in each of the reported, sampled populations. This presented a need to properly mine the SE genome and develop a highly discriminatory subtyping procedure. In exploring this need, our hypothesis was that the use of a large number of SNPs may not necessarily improve the power of discrimination: more is not necessarily better, and a large number of uninformative loci may be counterproductive and undesirable for strain differentiation. As a first step, the whole genomes of 11 SE isolates obtained in Canada were sequenced and compared to the reference strain SE P125109 (phage type 4), which led to the identification of 1361 SNP loci in the SE genome [100]. Subsequent selection of 60 SNPs spread throughout the genome and distributed among different gene types and intergenic locations led to the development of a rapid, inexpensive, fluorescence-based real-time PCR subtyping assay [55].

#### **4.2 The SNP-PCR subtyping procedure**

The SNP-PCR genotyping assay is an allele-specific, single-amplification procedure based on the specific binding of one of two competing forward primers, 18–20 nucleotides long, which differ by a single nucleotide at the locus of interest. A single reverse primer completes the amplification process, leading to the accumulation of an amplicon bearing the SNP of interest. Each of the allele-specific primers is designed with a specific tail that allows complementary binding to a commercially provided, customized sequence labeled with a fluorescent dye, FAM or HEX for allele 1 or 2, respectively (LGC Genomics, Beverly, MA). Thus, the first cycle of amplification ensures that the specific forward oligonucleotide present in the primer mix binds to the sequence containing the SNP and excludes the other primer. The reverse primer, also 18–20 nucleotides long, binds and elongates the fragment during amplification, ensuring that the tail sequence is present; the accumulating fragment therefore carries either the FAM or the HEX fluorescent label, depending on which of the two allele-specific primers bound initially, which in turn is dictated by whether the nucleotide at the SNP position corresponds to allele 1 or allele 2. Detection is thus based on a fluorescently labeled sequence that assigns the allele number to either of the two nucleotides that may occupy the SNP position. The SNP alleles are compiled for all SE strains at the 60 loci and used as input for evolutionary history analyses by the Maximum Parsimony method, implemented in the Molecular Evolutionary Genetics Analysis (MEGA X) software [101]. The distinct groupings of the SE isolates are identified as clades, and each is given a numerical designation starting from 1.
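The allele-calling step described above can be sketched in Python. This is an illustration under stated assumptions rather than the assay's actual analysis software: the signal thresholds, the four-locus panel and the clade profiles are hypothetical, and the direct profile lookup replaces the Maximum Parsimony analysis used in the chapter. Only the dye assignment (FAM for allele 1, HEX for allele 2) is taken from the text.

```python
def call_allele(fam, hex_signal, min_signal=1.0, min_ratio=3.0):
    """Call allele 1 (FAM) or allele 2 (HEX) from end-point fluorescence.

    Returns None when both signals are weak or neither dye clearly dominates.
    The thresholds are hypothetical and would be set per instrument.
    """
    if max(fam, hex_signal) < min_signal:
        return None  # no amplification
    if fam >= min_ratio * hex_signal:
        return 1
    if hex_signal >= min_ratio * fam:
        return 2
    return None      # ambiguous: a haploid genome should give one clear signal


def snp_profile(readings):
    """Convert (FAM, HEX) readings for each locus into a tuple of allele calls."""
    return tuple(call_allele(fam, hex_signal) for fam, hex_signal in readings)


# Hypothetical four-locus allele profiles and the clades they define.
CLADE_PROFILES = {
    (1, 1, 2, 1): 1,
    (1, 2, 2, 1): 2,
    (2, 2, 1, 1): 3,
}


def assign_clade(readings):
    """Return the clade for a set of readings, or None if the profile is unknown."""
    return CLADE_PROFILES.get(snp_profile(readings))


readings = [(5.0, 0.4), (0.3, 4.8), (0.2, 6.1), (7.0, 0.5)]
print(snp_profile(readings), "-> clade", assign_clade(readings))
```

Requiring one dye to dominate by a fixed ratio reflects the expectation that a bacterial isolate carries a single allele at each locus; a mixed signal is better reported as a no-call than forced into either allele.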

Following the development of the SNP-PCR procedure, our initial application of the assay to a group of 55 SE isolates obtained in Canada led to the recognition of 12 clades of SE [57].

#### **4.3 Twenty five circulating clades of** *Salmonella* **Enteritidis**

Recently, the laboratory validation of the SNP-PCR assay was completed using 1,127 SE isolates obtained from food, animal, human and environmental sources in Canada and Europe, and we observed a total of 25 circulating clades of SE (**Table 1**; Ogunremi et al., manuscript in preparation). In addition, 13 other globally distributed isolates identified from published papers [98, 99], as well as the widely used reference SE strain P125109 (phage type 4), were included in a phylogenetic comparison using the Maximum Parsimony method. These strains were distributed across the resulting phylogenetic tree and fell into distinct SE clades, providing further validation of the ability of the SNP-PCR tool to cluster strains appropriately while distinguishing among different strains (Ogunremi et al., manuscript in preparation). The validation procedure demonstrated the robustness of the assay, its capacity to estimate genetic distances and relatedness among and between clades, and its relevance in constructing an evolutionary map of SE following the testing of a large number of isolates.

#### **4.4 Advantages of SNP-PCR: nomenclature and population structure**

Previous studies aimed at evaluating the population structure of the highly clonal SE have reported fewer lineages and clades among the isolates tested. For instance, a study of 675 very diverse isolates collected over many decades (1948–2013) in 45 countries and 6 continents revealed the presence of only 3 clades; a subgroup of 58 isolates was identified but could not be clustered by the method used by the authors [99]. Yet another study demonstrated 9 clades among a large but PFGE-uniform group of isolates [97]. These studies, which showed limited diversity among SE populations, contrast with our observations and underscore the excellent discrimination observed for SE using the validated SNP-PCR assay. The SNP-PCR compares well with the cgMLST-HierCC function in EnteroBase in discriminating among strains chosen to represent SE clades from a very diverse SE population from a variety of sources and different continents (**Table 1**; Ogunremi et al., in preparation).

Apart from being a highly discriminatory and robust assay, the SNP-PCR is very cost-effective. Reagent costs are estimated at Can\$0.25 per SNP per isolate, so that testing all 60 SNPs (Can\$15) is cheaper than the traditional, less discriminatory subtyping assays (Can\$26 for phage typing and Can\$36 for two-enzyme PFGE analysis in reagent costs) or WGS (Can\$100). The SNP-PCR validation procedure (described above) showed that only 17 SNP loci need to be tested to assign an isolate to a clade, and the test performed very well on crude, boiled bacterial extract, obviating the need for DNA purification and yielding further savings in reagents, labour and time.
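The reagent cost comparison can be worked through explicitly. The short sketch below uses only the figures quoted in the text (Can\$0.25 per SNP per isolate; Can\$26, Can\$36 and Can\$100 for phage typing, two-enzyme PFGE and WGS); the function and variable names are hypothetical, and labour and equipment costs are deliberately left out.

```python
COST_PER_SNP = 0.25  # Can$ per SNP per isolate (reagents only, as quoted in the text)

# Reagent costs per isolate for the other methods, as quoted in the text (Can$).
OTHER_METHODS = {"phage typing": 26.0, "two-enzyme PFGE": 36.0, "WGS": 100.0}


def snp_pcr_reagent_cost(n_loci, n_isolates=1):
    """Reagent cost in Can$ of testing n_loci SNPs on n_isolates isolates."""
    return COST_PER_SNP * n_loci * n_isolates


for label, n_loci in (("full 60-SNP panel", 60), ("17-SNP clade panel", 17)):
    cost = snp_pcr_reagent_cost(n_loci)
    print(f"{label}: Can${cost:.2f} per isolate")
    for method, other in OTHER_METHODS.items():
        print(f"  {method}: Can${other:.2f} ({other / cost:.1f}x the SNP-PCR cost)")
```

On these figures the full panel costs Can\$15.00 per isolate and the reduced 17-locus panel Can\$4.25 per isolate.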

Another important attribute of the SNP-PCR is that it is equally adaptable to a few samples or a large number of samples. Whereas Illumina WGS requires a prescribed number of samples per run (e.g., 20 *Salmonella* strains using the MiSeq version 3 library kit with 600-cycle sequencing, which runs for 65 hours), the SNP-PCR can be used to test one or a few samples with the appropriate controls without any cost penalty for the small volume of analysis. At the other end of the scale, a single PCR run can handle a 384-well plate loaded with hundreds of samples, with the machine run completed in 2 hours. The labor costs of running the SE SNP-PCR test (2 h PCR time) and analyzing the results are at least an order of magnitude lower than those of any other subtyping approach, including traditional molecular tests and WGS. The SNP-PCR test shows very good reproducibility (95%) in tests conducted in six laboratories.

The SNP-PCR satisfies all seven criteria expected of an ideal subtyping test: cost-effectiveness, rapid performance, robust results, typeability, high discrimination, reproducibility, and epidemiological concordance [66].

## **5. Conclusions**

The bacterial pathogen *Salmonella* Enteritidis is one of the most prevalent causes of foodborne illness in humans worldwide, yet tracking a strain of the organism through the food safety system is challenging because of its clonal nature, evident at the genomic level, which historically has resulted in poorly discriminating laboratory typing methods. The current application of genomics has led to the development of comprehensive and highly discriminatory tools; however, there are still challenges with the interpretation of the outputs and the application of the methods to differentiate between outbreaks and sporadic infections. The effect is a poorly understood population structure of SE.

This chapter illustrates the existence of 25 clades of SE, which should be useful for defining the population structure and tracking the pathogen from farm to fork. The phylogenetic relationships among the 25 clades of SE were obtained using a population of 1,127 isolates from a variety of sources in Canada and Europe. The validated SNP-PCR assay displayed the attributes of an ideal subtyping test and can be implemented in resource-deprived countries where routine genome sequencing remains unaffordable, as well as in resource-rich countries when characterizing a few isolates may not justify the expense of a genome sequencing run, or for surveillance where characterizing a large number of lower-priority, non-clinical but valuable isolates is desirable.

### **Acknowledgements**

This work was funded by the Canadian Food Inspection Agency, Ontario Ministry of Agriculture, Food and Rural Affairs, and Genome Research and Development Initiative of the Government of Canada.

#### **Conflict of interest**

The authors declare no conflict of interest.

#### **Acronyms and abbreviations**

