### **2.2 Metagenomic tools used in symbiosis and co-evolution studies**

In recent years, the use of genomic approaches has revealed an unprecedented bacterial diversity and ubiquity in different types of samples (**Figure 1**), through the analysis of 16S ribosomal sequences [1, 2, 5, 6, 18, 19, 22, 25–27]. These techniques have allowed the molecular analysis of populations and of how different biological processes have been established, controlled, and evolved [5, 28, 29].

The analysis of metagenomic composition has been carried out with different programs (QIIME, QIIME2, and MOTHUR) that align the reads against a database of ribosomal genes (GreenGenes, SILVA, and RDP) and assign them to operational taxonomic units (OTUs), using a distance of 3% (that is, 97% sequence identity) and a confidence threshold of 80% [29–33]. Once the OTUs have been assigned, these programs allow the determination of diversity indices and richness, principal component analysis, and the rarefaction of the samples [1, 2, 5, 19, 25, 29–32, 34, 35].
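The diversity, richness, and rarefaction calculations mentioned above can be illustrated with a minimal sketch. It is not the implementation used by QIIME or MOTHUR; the function names and the toy OTU table are assumptions made for illustration, and only the standard Shannon and Simpson formulas are taken as given.

```python
import math
import random
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson diversity expressed as 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def observed_richness(counts):
    """Number of OTUs observed at least once."""
    return sum(1 for c in counts if c > 0)


def rarefy(otu_counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a dict OTU -> read count."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the number of reads")
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical OTU table for a single sample (OTU name -> read count).
sample = {"OTU_1": 50, "OTU_2": 30, "OTU_3": 15, "OTU_4": 5}
rarefied = rarefy(sample, depth=40)
print(observed_richness(rarefied.values()), shannon_index(list(rarefied.values())))
```

Rarefying every sample to the same depth before computing the indices is what makes richness comparable between samples sequenced to different depths.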

Other taxonomic classifiers (Kraken, Kraken2, OneCodex) work on short, previously trimmed single- or paired-end reads and compare them with the databases available in each program. Kraken makes use of the RefSeq database: the reads are divided into fragments known as k-mers, which are compared with sequenced genomes [34, 36]. The output files of these programs are provided in tabular format (tsv), which facilitates their export and processing in other programs such as R and its vegan package, where studies of richness, diversity, and rarefaction can be carried out [12, 34, 35, 37, 38].

#### **Figure 1.**

*Genomic and metagenomic techniques for the analysis of different samples.*
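The k-mer comparison described for Kraken can be sketched with a toy classifier. This is a simplified illustration, not Kraken's algorithm: Kraken resolves k-mers shared between genomes through a lowest common ancestor on the taxonomy, whereas this sketch simply ignores shared k-mers. The reference sequences, taxon names, and function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify(read, index, k):
    """Assign the read to the taxon supported by most taxon-specific k-mers."""
    votes = Counter()
    for km in kmers(read, k):
        taxa = index.get(km, ())
        if len(taxa) == 1:  # simplification: skip k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return None  # unclassified read
    return votes.most_common(1)[0][0]


# Hypothetical miniature "database" of two reference sequences.
references = {"taxon_A": "ACGTACGTTAGC", "taxon_B": "TTGACCAGGTCA"}
index = build_index(references, k=4)
print(classify("ACGTTAGC", index, k=4))
```

Because classification only needs exact dictionary lookups, no alignment is computed, which is the main reason k-mer classifiers are fast on large read sets.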

The use of different taxonomic binning programs has made it possible to determine the presence of ubiquitous microbial phyla in samples from arctic, temperate, and tropical environments, such as *Proteobacteria, Actinobacteria*, and *Cyanobacteria*, which are considered cosmopolitan phyla. The main difference between sites is the proportion of each taxon, which reflects the conditions of each environment [6, 8–10, 39–41]. A similar pattern has been observed when studying the microbiome in different animal models, where the phyla *Proteobacteria, Actinobacteria, Firmicutes*, and *Bacteroidetes* have been reported among those of greater relative abundance [4, 9, 39, 42–46]. This shows that microbial communities are highly dynamic: the physical-chemical factors of the site, health status, and nutrition shape the metagenome and can determine how reactive a microbial community is to environmental changes.

The use of genomic tools has made it possible to identify the core microbiome of different organisms, given that, despite living in different habitats, they share similar bacterial communities. This implies the existence of biological filters that shape the bacterium-host interactions, resulting in a stable relationship with the holobiont [2, 28, 45–47]. In the case of *Apis mellifera*, a global core microbiome formed by *Proteobacteria, Firmicutes, Bacteroidetes,* and *Actinobacteria* has been identified, together with a high amount of lactic acid bacteria, which have a beneficial activity in the health of the host organism due to their involvement in the immunomodulation of the intestinal microflora [25, 39, 48]. The presence of symbiont microorganisms within the intestinal tract of different animal species (*A. mellifera*, *Litopenaeus vannamei, Mus musculus, Homo sapiens*) has been reported as necessary for survival, since their cooperative behavior increases the vigor of a community [28, 39, 47, 49, 50]. Recent studies in fecal samples of farm animals have revealed the presence of intervening sequences (IVS), which are host-specific and provide a basis for differentiating microorganisms derived from different hosts [51].

The role of microbial communities within a host is important. Given the delicate balance of these associations, any alteration in the microbiome composition could cause disease in the host organism [6, 12, 39, 45]. Previous studies have revealed that microbial diversity is significantly reduced in diseased individuals of different species. This could be because alterations in the microbiome composition skew the association between the host and the microbiome, producing dysbiosis and increasing the number of opportunistic pathogens [6, 12, 13, 16, 27, 39, 45, 52]. In marine environments, the continual presence of pathogens has been observed in environmental samples [16, 44] and in several marine organisms (*L. vannamei* and *M. nipponense*) [17, 18, 39, 45, 46]. The continual presence of pathogens in low proportion has been reported during the life cycle of these species, suggesting an active *in situ* infection in which the host has co-evolved with the parasitic organisms and developed mechanisms to cope with their pathogenic mechanisms [13, 16, 17, 27, 45, 46, 52]. It has been observed that the developmental stage of *L. vannamei* influences the pathogenic response to *Vibrio*: the proportion of protective commensal bacteria, *Bacteroides* and *Propionibacterium*, tends to decrease as the host ages, whereas the presence of *Vibrio* increases in diseased individuals [18]. Other mechanisms of coevolution have shown that processes of parasitism and predation can influence the global exchange of resources in an ecosystem. Studies conducted in *Escherichia coli* and the bacterial predator *Myxococcus xanthus* have shown that both predator and prey exhibited accelerated genome evolution when compared to controls, where the predator (*M. xanthus*) showed adaptations to cell mucoidy and the prey (*E. coli*) showed adaptations to outer membrane proteases [7].

#### *The Use of Bioinformatic Tools in Symbiosis and Co-Evolution Studies DOI: http://dx.doi.org/10.5772/intechopen.86559*

The functional analysis of the microbial communities has been carried out using the PICRUSt program. This program estimates the gene families present in a metagenome by phylogenetic comparison with sequences of gene families previously reported in databases. These predictions are pre-calculated for genes that code for proteins present in Clusters of Orthologous Groups (COG) or in the Kyoto Encyclopedia of Genes and Genomes (KEGG) [53]. Differences in these predicted functions between samples can be assessed with the STAMP software, which offers several statistical tests, effect-size estimates, and multiple-test corrections [54]. The use of the aforementioned protocols has allowed the observation of various attributes in environmental samples related to carbon fixation, amino acid metabolism, and signal transduction in lakes, swamps, and other water bodies [9, 10, 16, 22, 44, 55]. These reports also showed the presence of several bacterial taxa (*Actinobacteria, Verrucomicrobia*, and *Proteobacteria*) that were able to synthesize several extracellular enzymes that digest organic matter [9, 16, 24] or mineralize other nutrients [22, 44].
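The core arithmetic behind this kind of functional prediction can be sketched as follows: OTU abundances are normalized by the predicted 16S rRNA copy number and then multiplied by a pre-calculated table of gene-family copies per OTU. The sketch below is an illustration under assumptions, not PICRUSt code; the function name, the input tables, and the KO identifiers (used here only as placeholders) are hypothetical.

```python
def predict_functions(otu_abundance, copy_number, gene_content):
    """Predict gene-family abundances for one sample.

    otu_abundance: OTU -> read count
    copy_number:   OTU -> predicted 16S rRNA copies per genome
    gene_content:  OTU -> {gene family -> predicted copies per genome}
    """
    predicted = {}
    for otu, reads in otu_abundance.items():
        # Correct for taxa carrying several 16S copies per genome.
        normalized = reads / copy_number[otu]
        for family, copies in gene_content[otu].items():
            predicted[family] = predicted.get(family, 0.0) + normalized * copies
    return predicted


# Hypothetical inputs; the KO identifiers are placeholders, not real annotations.
otu_abundance = {"OTU_1": 100, "OTU_2": 40}
copy_number = {"OTU_1": 4, "OTU_2": 2}
gene_content = {
    "OTU_1": {"K00001": 1, "K00002": 2},
    "OTU_2": {"K00002": 1},
}
print(predict_functions(otu_abundance, copy_number, gene_content))
```

The resulting table of predicted gene-family abundances per sample is the kind of input that a tool such as STAMP would then compare between groups.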

The influence of the microbiome on host function has been proposed as a co-evolutionary process in which the functionality and composition of the microbiome can be influenced by the feeding habits of the host [4, 21], and the host can take advantage of specialized microorganisms that are able to synthesize metabolites not originally present in the environment [6, 39]. The consumption of seaweeds by Japanese individuals allows the introduction of algae-associated bacteria, which transfer the genes involved in the degradation of algal sulphated polysaccharides to competent gut-resident bacteria through a process known as horizontal gene transfer [28]. Certain marine invertebrates (*Elysia chlorotica*) that feed on algae are able to maintain the algal plastids as photosynthetically active symbionts, which allows the use of photosynthates as a food source [26]. These examples of coevolutionary processes show how the functionality of the microbiome could be influenced by the dietary habits of the host, since these metabolic add-ons allow the host to thrive in otherwise adverse environmental conditions (oligotrophic habitats).

### **2.3 Metatranscriptomic tools used in symbiosis and co-evolution studies**

Metatranscriptomics allows parallel relations to be established between the host and the microbiome, but these studies require a series of previous steps in order to obtain unbiased information, such as the removal of rRNA and the enrichment of microbial mRNA (**Figure 2**) [17, 19–21, 56].

The assembly of genomes and transcriptomes uses short sequences that are separated into fragments known as *k-mers*, which are aligned and compared graphically (de Bruijn graphs) in order to perform the *de novo* reconstruction of the genome or transcriptome. Several programs such as Velvet, SOAP, Trinity, and FLASH are capable of performing this reconstruction, and can use a reference genome or transcriptome if one is available [8, 57–63]. The Trinity platform is capable not only of assembling but also of mapping reads to the assembly (Bowtie1 and Bowtie2), computing basic statistics of the assembly, quantifying transcripts (RSEM, Salmon, eXpress, and Kallisto), and performing differential expression analysis of transcripts (edgeR, DESeq2, ROTS, and limma/voom) [8].
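The de Bruijn graph idea described above can be illustrated with a small sketch. It is a toy example, not the algorithm of Velvet or Trinity: there is no error correction, coverage handling, or reverse-complement logic. The function names and the three reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix -> suffix,
    where prefix and suffix are its first and last (k-1) bases."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one
    distinct successor and no node is visited twice."""
    contig, node, seen = start, start, {start}
    while True:
        successors = set(graph.get(node, ()))
        if len(successors) != 1:
            break  # dead end or branch point
        node = successors.pop()
        if node in seen:
            break  # cycle (repeat) detected
        seen.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping reads drawn from the sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))
```

Stopping at branch points is what splits real assemblies into contigs: repeats and sequencing errors create nodes with several successors that cannot be resolved from k-mers alone.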

Metatranscriptomic studies have revealed the functions of microorganisms within a host or in different environments and have identified, in both host and microbiome, transcripts related mainly to metabolic processes associated with nutrient uptake. These observations suggest that the symbiotic chemoautotrophic bacteria provide organic compounds to the host organism, which uses them for its nutrition [11, 20, 45, 64]. In fact, recent studies have reported that more than a third of genes are shared among living organisms, especially those related to the central metabolic pathways (glycolysis, TCA cycle, oxidative phosphorylation, and purine and pyrimidine metabolism), which could increase the efficiency of digestion of several biomolecules [11, 17, 26, 45].

#### **Figure 2.**

*Transcriptomic and metatranscriptomic techniques for the analysis of different samples.*

The use of bioinformatic tools in metatranscriptomic studies has allowed the visualization of host-microbiome interactions, especially those related to primary metabolism [13, 38, 52]. The shared enzymatic modules are visualized through identifiers derived from KEGG Orthology (KO) and Enzyme Commission (EC) numbers on the iPath3 platform [65]. On this platform, it is possible to overlay the metabolic functions of host and symbiont using the EC and KO identifiers on different metabolic maps (general metabolic pathways, bacterial metabolism, and secondary metabolism), showing graphically the enzymatic modules of each organism and highlighting those with a shared function.
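Before any map is drawn, the host-symbiont overlap described above reduces to a set comparison of identifiers. The following sketch shows that step only; the function names, colors, and KO identifiers are hypothetical placeholders, and the exact selection format expected by iPath3 is not reproduced here and should be taken from its documentation.

```python
def overlap_ko(host_kos, symbiont_kos):
    """Split KO identifiers into shared, host-only, and symbiont-only groups."""
    host, symbiont = set(host_kos), set(symbiont_kos)
    return {
        "shared": sorted(host & symbiont),
        "host_only": sorted(host - symbiont),
        "symbiont_only": sorted(symbiont - host),
    }


def color_assignments(groups, palette):
    """Pair every identifier with the color of its group (illustrative only)."""
    return [(ko, palette[group]) for group, kos in groups.items() for ko in kos]


# Placeholder identifiers and an arbitrary palette chosen for this example.
host = ["K00001", "K00002", "K00003"]
symbiont = ["K00002", "K00003", "K00004"]
palette = {"shared": "#ff0000", "host_only": "#0000ff", "symbiont_only": "#00aa00"}

groups = overlap_ko(host, symbiont)
for ko, color in color_assignments(groups, palette):
    print(ko, color)
```

The same comparison works unchanged for EC numbers, since both kinds of identifier are handled as plain strings.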

Metatranscriptomic studies have shown that microorganisms are capable of generating complex trophic networks, communicating with each other through chemical signals in a process known as quorum sensing [12, 26, 56]. However, this process is not restricted to microorganisms: recent studies have suggested an inter-kingdom quorum sensing [4, 17, 21].

### **2.4 Metaservers used in metagenomic and metatranscriptomic studies**

The bioinformatic tools mentioned in this chapter are open-source programs that require the user to have a UNIX or macOS operating system installed, more than 16 GB of RAM, more than 500 GB of hard disk storage, and knowledge of the UNIX command line [16, 19, 20, 22, 24, 30–32, 35, 46, 48, 51]. These requirements can be an obstacle for those who want to start a bioinformatic analysis; however, there are other options, such as metaservers, which allow the data to be processed in a graphical environment.

The metaservers are web service providers that assemble a series of programs and applications that otherwise are dispersed. Among the most used metaservers are Galaxy, TRUFA, and MG-RAST [22, 24, 34].


Galaxy: a collaborative initiative that provides a free set of bioinformatics tools and programs, ranging from quality control of sequences (FastQC), sequence editors, and data grouping tools to tools for assembly (Trinity), sequence mapping (Bowtie), transcript quantification (Salmon and Kallisto), and metagenomic analysis (Mothur, Vegan, Kraken, and Krona) [34]. Being an open initiative, Galaxy is available through a series of servers that offer different programs, such as the functional prediction of a metagenome by PICRUSt (Langille Lab and Huttenhower Lab) and servers dedicated to the functional annotation of transcriptomes (ANASTASIA).

TRUFA: Transcriptome User-Friendly Analysis [22], developed by the Institute of Physics of Cantabria, is a free server that contains several programs exclusively for transcriptomic (metatranscriptomic) analysis, ranging from quality control (FastQC and PRINSEQ) and sequence trimming (CutAdapt) to sequence assembly (Trinity), quantification of transcripts (RSEM and eXpress), and functional annotation (BLAST2GO and HMMER). The files can be edited beforehand, and individual modules of the platform can be accessed directly, such as the functional annotation module when the sequences are already assembled.

MG-RAST: Metagenomic Rapid Annotations using Subsystems Technology [24] is an open platform capable of analyzing sequences from different NGS platforms (Illumina, PacBio, and Nanopore). Unlike the aforementioned servers, MG-RAST has a pipeline that includes quality control of the sequences, removal of adapters, detection of transcript isoforms, taxonomic comparison, and functional assignment. This server has several databases against which the results can be analyzed regarding function (SEED, KEGG, COG, and NOG) and taxonomy (ITS, SILVA, RDP, and GreenGenes). It also has tools to export the data in tabular format, in FASTA format, or as a BIOM-type matrix.

BLAST2GO: a sequence annotator that performs searches against the NCBI databases; its basic version includes the BLAST algorithms and allows taxonomic filters to be added in order to accelerate the annotation. It also allows searches of protein domains (InterProScan), the classification of proteins based on the Gene Ontology (GO) database, interaction maps between GO terms, functional enrichment analysis (Fisher's exact test), and the analysis of the metabolic modules present in KEGG. The PRO version of this program allows several annotations to be run at the same time, using CloudBlast services, and supports other types of analysis such as the differential expression of transcripts [37].
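The enrichment analysis mentioned above rests on Fisher's exact test, whose one-sided form is a hypergeometric tail probability. The sketch below shows that calculation with the Python standard library only; it is not BLAST2GO code, and the function name and the gene counts in the example are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    N: annotated genes in the background set
    K: background genes carrying the GO term
    n: genes in the test set (e.g. differentially expressed)
    k: test-set genes carrying the GO term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 40 of 1000 background genes carry the term,
# and 8 of the 50 genes in the test set carry it.
print(fisher_enrichment_p(k=8, n=50, K=40, N=1000))
```

When many GO terms are tested at once, the resulting p-values are usually adjusted for multiple testing before any term is reported as enriched.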
