The Pangenome of *Pseudomonas aeruginosa*

*Mauricio Corredor, Juan David Patiño-Salazar, Diana Carolina Castaño and Amalia Muñoz-Gómez*

#### **Abstract**

This review summarizes the most important reports on the *Pseudomonas aeruginosa* pangenome. Pan-genomics has tackled some fundamental concerns in pathogenic bacteria. PATRIC and other databases store more than 9000 *P. aeruginosa* genomes. Mining these data is an opportunity for discoveries related to antibiotic resistance, virulence, pathogenicity, fitness, and evolution, among others. The different published pangenomes of *P. aeruginosa* show that this species has an open pangenome and that its accessory genome is larger than its core genome. Horizontal gene transfer (HGT) is one important source of the *P. aeruginosa* genome. In recent years, various authors have built *P. aeruginosa* pangenomes from as few as five genomes to more than 1300. The largest of these works analyzed 54,272 genes and found a very small core genome (only 665 genes). Other studies with fewer strains or genomes identified a larger core genome, almost 20% of the pangenome. Nevertheless, taken together, these works show that the accessory plus unique genome is larger than the core genome in *P. aeruginosa*.

**Keywords:** pangenome, pan-genome, *Pseudomonas aeruginosa*, bacteria, antibiotic resistome

#### **1. Introduction**

Pan-genomics (pangenomics) is an innovative approach for describing how the pan-genome (pangenome) of a species is built, and it relies on comparative genomics, among other methods. The pangenome is divided into two main classes: the core genome and the accessory genome. Thousands of unknown bacteria and other microorganisms are exposed to natural and manufactured antibiotics, toxins, and harmful compounds, to the highest and lowest temperatures, to extreme pH, and to competing species. Pangenomics is also a powerful approach for identifying the thousands of genes involved in these responses. Virulence genes, phenotype-related genes, and environmentally expressed genes acquired by horizontal gene transfer (HGT) belong to either the core or the accessory genome. These concepts offer a new point of view from which to approach *Pseudomonas aeruginosa*: a new pathosystem, multidrug-resistant, and an old human pathogen. Two notable books that delve more specifically into pangenomics and the pangenome have been published recently [1, 2].

Different authors have published *P. aeruginosa* pangenomes, some of them on the web [3–7] (https://pangenome.org/Pseudomonas\_aeruginosa). In addition, other valuable databases store *P. aeruginosa* genomes (https://www.pseudomonas.com/ or https://patricbrc.org/search/?and(keyword(Pseudomonas),keyword(aeruginosa)), with 9954 complete and draft genomes). These authors published different pangenomes using varying numbers of *P. aeruginosa* genomes: Sharma et al. [3] used 5 genomes, Fischer et al. [5] used 100 genomes, Ding et al. [7] used 153 genomes on their web page, Mosquera-Rendón et al. [4] used 181 genomes, and Freschi et al. [6] used 1311 genomes. The following sections discuss how many genomes are needed to resolve whether the pangenome is open or closed.

#### **Table 1.**

*List of some pangenomics tools. Some addresses were inactivated after publication. To obtain unavailable tools, please contact the corresponding author of the respective reference. More than 40 pangenomic tools are now available on online platforms or for local use [12].*

In the pangenome, the core (or central) genome is the set of genes common to all the examined genomes from the genome pool of a given species. In other words, it represents the genes present in all strains of a species. The accessory, variable, or flexible genome (for some authors, the dispensable genome), on the other hand, represents the genes that are not present in all strains of a species [8], http://www.metagenomics.wiki/pdf/definition/pan-genomeb [9, 10]. These terms are key in pangenomics for obtaining a meaningful pangenome of a species. The next sections also consider the robustness of the gene orthology analysis used to delimit the core and accessory genomes.
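The definitions above can be illustrated with a minimal sketch that partitions gene families by how many genomes carry them. The function name `partition_pangenome`, the strain names, and the gene-family labels are hypothetical and chosen only for illustration. Real pipelines first cluster genes into orthologous families and often relax the "all strains" criterion to tolerate draft assemblies.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene families into core, accessory, and unique sets.

    genomes: dict mapping a strain name to the set of gene-family
    identifiers found in that strain (hypothetical input format).
    """
    n = len(genomes)
    # Count in how many genomes each gene family occurs.
    counts = Counter(fam for fams in genomes.values() for fam in fams)
    core = {fam for fam, c in counts.items() if c == n}
    # "Unique" families occur in exactly one genome (only meaningful for n > 1).
    unique = {fam for fam, c in counts.items() if c == 1 and n > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Invented toy data: three strains and five gene families.
toy = {
    "strain1": {"famA", "famB", "famC", "famD"},
    "strain2": {"famA", "famB", "famE"},
    "strain3": {"famA", "famB", "famC"},
}
core, accessory, unique = partition_pangenome(toy)
print(sorted(core), sorted(accessory), sorted(unique))
```

In this toy case the families carried by all three strains form the core, the family shared by two strains is accessory, and the families seen in a single strain are unique.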

Concomitantly, multiple pangenomic tools have been developed and tested over the last ten years. Of course, one can start from genome annotation instead of using the available databases, since classical pangenomics uses only annotated genes and orthology analysis. We mention only some of these tools because so many are available today. Furthermore, besides database strategies designed to run on a personal server [11], there are now online resources that allow users to quickly build their own pangenome analysis (see **Table 1**).

#### **2. What is a pangenome?**

Pangenomic analysis requires at least two genomes, and usually many more, of bacteria, archaea, fungi, or any other eukaryote. It can provide the broadest resolution of genetic variation, including pathogenicity islands, virulence genes, mobile elements, transposons, horizontal gene transfer, shared orthologs and synteny, plasticity, evolution, and others. The development of pangenomics has promoted advances in many fields, such as bioinformatics and computational biology, comparative genomics, molecular medicine, molecular epidemiology, agronomy and food science, and many more [18]. An early motivation for pangenomics was experimental evidence that, for some species, new genes continue to be discovered even after the genomes of several strains have been sequenced [19]. Given that the number of unique genes is vast, the pangenome of a bacterial species might be orders of magnitude larger than any single genome, as predicted by [20] more than 14 years ago.

#### *DOI: http://dx.doi.org/10.5772/intechopen.108187 The Pangenome of* Pseudomonas aeruginosa

#### **Table 2.**

*Key terminology in the bioinformatics world [8]; http://www.metagenomics.wiki/pdf/definition/pan-genomeb [9, 10].*

The relationship between the core and accessory genomes determines whether the pangenome of a species is closed or open. An open pangenome grows whenever new genomes are added, whereas a closed pangenome does not increase in size when a new genome is added [10]. **Table 2** summarizes the terms applied to the pangenome and pangenomics. It is worth mentioning that the term "unique genome" refers to solitary genes that are not shared among strains.
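The open versus closed distinction can be made concrete with a minimal sketch that tracks pangenome size as genomes are added one at a time. The function name `pangenome_growth` and the toy gene sets are hypothetical. Published analyses usually average over many random genome orderings and fit a growth model to the curve, which this sketch does not attempt.

```python
def pangenome_growth(genomes, order):
    """Return cumulative pangenome sizes and new-gene counts per added genome.

    genomes: dict mapping a genome name to its set of gene families.
    order: list of genome names giving the order of addition.
    """
    seen = set()
    sizes, new_genes = [], []
    for name in order:
        novel = genomes[name] - seen  # families not observed so far
        seen |= genomes[name]
        new_genes.append(len(novel))
        sizes.append(len(seen))
    return sizes, new_genes


# Invented toy data: the first collection keeps yielding new families,
# the second stops growing after the first genome.
open_like = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "d"},
    "g3": {"a", "b", "e", "f"},
    "g4": {"a", "c", "g"},
}
closed_like = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b"},
    "g3": {"a", "c"},
    "g4": {"b", "c"},
}
order = ["g1", "g2", "g3", "g4"]
open_sizes, open_new = pangenome_growth(open_like, order)
closed_sizes, closed_new = pangenome_growth(closed_like, order)
print(open_sizes, open_new)
print(closed_sizes, closed_new)
```

A curve that keeps rising with every added genome suggests an open pangenome, while a curve that flattens suggests a closed one. With only a handful of genomes the two cases can be hard to tell apart.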

#### **3. From pangenomics to pangenome**

The term pan-genome, now usually written pangenome, was first used by Sigaux [21] in cancer research to describe a public database containing an assessment of genome and transcriptome alterations in types of tumors, tissues, and experimental models. Later, Tettelin et al. [22], working with bacterial genomes, defined a microbial pangenome as the combination of a core genome, carrying the genes present in all strains, and a dispensable genome (also described as a flexible or accessory genome), composed of genes absent from one or more strains [23]. A generalization of this representation could contain not only the genes (transposons, promoters, other mobile elements such as those acquired by HGT, microRNAs, etc.) but also other variations present in the collection of genomes.

#### **Figure 1.**

*Pangenomics tools in the worldwide context. Pangenomics is useful from the treatment of patients to evolutionary studies. HGT and virulence genes are investigated today with pangenomics to address the MDR problem of P. aeruginosa.*

The study of the bacterial pangenome has many applications: in clinical microbiology, in the study of resistance genes, HGT, virulence, and pathogenicity, and in classifying species and tracing their evolution and clonal spread in the environment (see **Figure 1**). Pangenomics offers a wealth of information about human-associated bacterial species [12].

#### **4. Could pangenome redefine bacteria species?**

In his excellent chapter on the prokaryotic species concept, Bobay [24] draws on pangenome studies and takes into account the approaches of Dobzhansky [25] and Mayr [26]. Speciation, understood as the sustained interruption of gene flow between populations, operates in microorganisms directly or indirectly, but because they do not engage in sexual reproduction *stricto sensu*, they escape the classic concept of species. Bobay [24] states that the definition of species has direct consequences for the definition of the pangenome, and that 16S rRNA is clearly not entirely cohesive as a criterion, concluding: "studies focusing on the evolution of bacterial pangenomes should be based on rigorous species delimitation since the misclassification of a single genome can lead to dramatic overestimates or underestimates of the size of a species' pangenome".
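The effect of a single misclassified genome on pangenome estimates can be sketched with a toy example. The function name `core_and_pan_sizes` and all gene-family labels are hypothetical, and the numbers only illustrate the direction of the bias, not its magnitude in real data.

```python
def core_and_pan_sizes(genomes):
    """Return (core size, pangenome size) for a dict of gene-family sets."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # families present in every genome
    pan = set.union(*gene_sets)          # families present in any genome
    return len(core), len(pan)


# Invented toy data: three genomes of one species sharing five families.
species = {
    "s1": {"c1", "c2", "c3", "c4", "c5", "a1"},
    "s2": {"c1", "c2", "c3", "c4", "c5", "a2"},
    "s3": {"c1", "c2", "c3", "c4", "c5", "a1"},
}
# A genome from another species, wrongly labeled as the same species.
outlier = {"c1", "c2", "x1", "x2", "x3", "x4"}

before = core_and_pan_sizes(species)
after = core_and_pan_sizes({**species, "misclassified": outlier})
print(before, after)
```

Adding the outlier shrinks the core, because it lacks families shared by every true member, and inflates the pangenome, because its foreign families are counted as new accessory content.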

Pangenomics is a fundamental tool for studying the entire repertoire of gene families in the genomes of pathogenic bacterial clades such as *P. aeruginosa*. It not only provides the whole set of genes shared by this species but can also be applied in interspecies differentiation analysis, mining the wealth of genome data for species-specific genes [27]. Wang et al. [28] mined novel specific target gene sequences of *P. aeruginosa* through pangenome analysis and established high-specificity, high-sensitivity PCR and quantitative real-time PCR (qPCR) methods based on these targets. They analyzed the whole-genome sequences of 1,000 *P. aeruginosa* strains and compared them with other *Pseudomonas* species. A notable problem is that some virulence factors are missing or mutated in *P. aeruginosa* strains, which can result in false positives, while the pathogenic factors that remain may still pose a potential threat of food poisoning (Baloyi et al. [29], cited by Wang et al. [28]).

Applying pangenomics to *P. aeruginosa*, a species with an open pangenome, shows that this species constantly and continuously accepts external genes from other strains and species, which gives it genetic richness and fitness. Gene mutations and DNA and RNA methylation give *P. aeruginosa* broad diversity, expanding its variety and allowing it to succeed in diverse environments. Hilker et al. [30] compared some clonal genomes and indicated that the differential genetic repertoire of clones maintains a habitat-independent gradient of virulence in the *P. aeruginosa* population.

*P. aeruginosa* is thoroughly characterized in the NCBI Taxonomy database (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=287) [31, 32]. However, genomics and the classical taxonomic definition (pseudo, *false*; monas, *unit*) follow different approaches: the first is based on sequencing, whereas the second is based on metabolism and 16S rRNA, although it is worth remembering that 16S rRNA must also be sequenced. Defining a species also requires a certain number of strains. In this respect, classical taxonomy and the pangenome are similar: both need a certain number of strains, and the more strains the better. In summary, the pangenome brings genomics and classical taxonomy together. The contribution of *P. aeruginosa* is remarkable here because new tools must complement existing information and research and never break the consensus. When Carl Woese and colleagues redefined taxonomic classification based on 16S rRNA [33–35], there was persistent initial reluctance to adopt it. Today, classification with 16S rRNA is the norm. However, studies of certain strains and their metabolism found that 16S rRNA also falls short for classification [36]. Pangenomics comes precisely in response to these inconsistencies of 16S rRNA [37, 38].

### **5. Core and accessory genome of** *P. aeruginosa*

As previously mentioned, different *P. aeruginosa* pangenomes were developed by Sharma et al. [3], Fischer et al. [5], Ding et al. [7], Mosquera-Rendón et al. [4], and Freschi et al. [6], using 5, 100, 153, 181, and 1311 genomes, respectively. The works ranging from 100 to 181 genomes almost agreed on the sizes of the core and accessory genomes. Mosquera-Rendón et al. [4] found that the core genome was approximately 15% of the pangenome (2503/16,820 genes), while on the web page of Ding et al. [7] for *P. aeruginosa*, 16,327 of 18,780 total genes belong to the accessory genome and 2453 genes to the core genome (**Figure 2**). The pangenome of Freschi et al. [6] consists of 54,272 genes: 665 core genes, 26,420 accessory genes, and 27,187 unique genes. That work applied a robust bioinformatic cutoff to discriminate homologous genes, eliminating potential orthologs.
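As a worked check of the figures quoted in the preceding paragraph, the following sketch computes the core-genome share of each pangenome from the reported counts. Only the gene counts come from the text; the dictionary layout and the function name `core_percent` are illustrative choices.

```python
# Gene counts as quoted in the text for each study.
reported = {
    "Mosquera-Rendon et al. [4]": {"core": 2503, "total": 16820},
    "Ding et al. [7]": {"core": 2453, "total": 18780},
    "Freschi et al. [6]": {"core": 665, "total": 54272},
}


def core_percent(core, total):
    """Core genome as a percentage of the whole pangenome."""
    return 100.0 * core / total


percentages = {
    study: core_percent(c["core"], c["total"]) for study, c in reported.items()
}
for study, pct in percentages.items():
    print(f"{study}: core = {pct:.1f}% of the pangenome")

# Consistency check on the Freschi et al. breakdown quoted in the text.
freschi_parts = 665 + 26420 + 27187
```

The two studies with 153 and 181 genomes give core shares of roughly 13% and 15%, whereas the 1311-genome study gives about 1.2%, which quantifies how strongly the core estimate depends on the number of genomes and the clustering criteria.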

It will most likely be necessary to adjust the core and accessory genome measurements for *P. aeruginosa*, given the differences mentioned above. One factor that helps is that *P. aeruginosa* has become an influential species in many settings, not just clinical or environmental ones. Therefore, the large number of available strains, coupled with improved bioinformatics tools and the complete sequencing of many more strains, will improve the estimates of the core and accessory genomes. Genomics and pangenomics still focus on coding information, leaving aside epigenomic information such as microRNAs and methylation, among others.

#### **Figure 2.**

*The P. aeruginosa pangenome from Ding et al. [7] in the interactive database https://pangenome.org/Pseudomonas\_aeruginosa. This website, developed with the panX tool, allows comparing the pangenome data of different pathogens and rapidly displays the results of pangenomic analyses.*
