### **3. Phylogenetic diversity of microbial communities based on 16S rDNA gene**

The taxonomic and phylogenetic diversity of a microbial community can also be estimated by sequencing and analyzing the gene for the small-subunit ribosomal RNA (16S rRNA), since this sequence has long been considered a stable marker and has been central to microbial systematics over the last 30 years. The 16S ribosomal ribonucleic acid is a key component of the small subunit of prokaryotic ribosomes and a central player in the cellular biology of microorganisms; it takes part in translating genetic information into proteins [20]. Because DNA is much easier to sequence than RNA, the DNA segment coding for 16S rRNA is what is actually sequenced (**Figure 3**). This gene fragment has several features that have made it a "quasi-gold standard" for bacterial taxonomy:


*Metagenomics-Based Phylogeny and Phylogenomic DOI: http://dx.doi.org/10.5772/intechopen.89492*

#### **Figure 3.**

*Prokaryotic ribosome general representation and variable sequence regions used in microbial phylogenetic diversity estimations.*


*a Nonredundant, manually curated small subunit (16S, SSU) ribosomal RNA sequences.*

*b The dataset contains 23,629 SSU sequences, each representing a single bacterial type strain, up to June 2017.*

*c Phylotypes with validly published names.*

*Most popular public databases for depositing and analyzing sequences of the 16S ribosomal gene.*

#### **3.1 16S community profiling by analysis of ribosomal amplicons**

Microbial diversity is measured as a function of the richness and abundance of distinct taxa in a community [25]. Obtaining representative DNA sequences from the entire community is essential to make valid inferences. Profiling a microbial community through 16S gene analysis generally consists of four steps (**Figure 4**). Several computational tools have been developed to analyze microbial communities through the 16S gene marker; however, estimating the total microbial diversity in any environment is still a major challenge [6, 26–28]. Among the factors involved, we highlight two: (i) processing huge amounts of data pushes the limits of modern computing, and (ii) the analyses require expertise that can take years of training to acquire. Fortunately, many tools developed in recent years aim to make bioinformatics platforms for this type of analysis more user-friendly, and there are sites dedicated to hosting computational alternatives for almost every need, for example, https://github.com/.
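To make the idea of diversity as a function of richness and abundance concrete, the following minimal Python sketch computes three widely used summaries (observed richness, the Shannon index, and the Simpson index) from per-taxon read counts. The taxon names and counts are hypothetical, and the choice of these particular indices is an assumption made for illustration; it is not taken from any specific pipeline discussed in this chapter.

```python
import math
from collections import Counter


def diversity_summary(taxon_counts):
    """Return richness, Shannon index (natural log) and Simpson index (1 - sum p^2)."""
    counts = [c for c in taxon_counts.values() if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    proportions = [c / total for c in counts]
    richness = len(counts)                                  # number of observed taxa
    shannon = -sum(p * math.log(p) for p in proportions)    # weighs richness and evenness
    simpson = 1.0 - sum(p * p for p in proportions)         # chance two reads differ in taxon
    return {"richness": richness, "shannon": shannon, "simpson": simpson}


# Hypothetical read-to-taxon assignments for a single sample
assignments = ["Bacillus"] * 50 + ["Pseudomonas"] * 30 + ["Streptomyces"] * 20
summary = diversity_summary(Counter(assignments))
print(summary)
```

Two communities with the same richness can differ in the Shannon and Simpson values when their abundances are distributed differently, which is why abundance is part of the definition above.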

**Table 3.**

**Figure 4.** *General steps for profiling a microbial community through 16S gene analysis.*

A good example of these platforms for profiling microbial communities is the Microbiome Taxonomic Profiling (MTP) pipeline from the EzBioCloud site (https://www.ezbiocloud.net/contents/16smtp) [24]. Its main advantages are that it is free, that no knowledge of the Linux environment is needed to carry out the analyses, and that several types of output are available, such as functional profiles, taxonomic and phylogenetic structure, and on-demand comparison with other published microbiome data. New users of EzBioCloud must first open an account on the site (https://www.ezbiocloud.net/signup?from=addMTP); after that, they can upload up to 100,000 reads per sample and begin the analysis. The general steps for profiling on the platform are listed in **Box 1**.

The platform has an intuitive, user-friendly interface that guides the beginner through every stage of the analysis. The first step is to upload the next-generation sequencing data (16S amplicon reads). The user then requests the MTP pipeline, and the analysis starts. In a relatively short time, the results portal becomes available with a summary of the preprocessing: the reads retained after filtering out low-quality and chimeric amplicons, read-length statistics, and taxonomic read assignments at the species level.

Other outputs in the results portal cover several diversity indices, taxonomic composition and hierarchy, and graphical displays such as Krona [29]. MTP implements seven diversity indices; among them is the phylogenetic diversity index, a measure of biodiversity that considers the phylogenetic differences between taxa and weighs variables such as taxonomic diversity and species abundances or distributions.
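As an illustration of how a phylogenetic diversity index can be computed, the sketch below implements one common formulation, Faith's PD, as the sum of branch lengths on the rooted tree connecting the observed taxa. This is an assumption made for illustration: the text does not state which formula MTP uses, and this simple variant relies on presence or absence only, without abundance weighting. The tree, its branch lengths, and the function name `faith_pd` are hypothetical.

```python
def faith_pd(parent_of, present_taxa):
    """Sum the branch lengths on the paths linking the observed taxa to the root.

    parent_of maps each non-root node to a (parent, branch_length) pair.
    Each branch is counted once, even when shared by several taxa.
    """
    visited = set()
    total = 0.0
    for taxon in present_taxa:
        node = taxon
        # Walk toward the root, stopping at the root or at a branch already counted
        while node in parent_of and node not in visited:
            visited.add(node)
            parent, length = parent_of[node]
            total += length
            node = parent
    return total


# Hypothetical rooted tree: node -> (parent, branch length to parent)
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}

pd_ab = faith_pd(tree, ["A", "B"])        # branches A, B and n1
pd_abc = faith_pd(tree, ["A", "B", "C"])  # adds the long branch leading to C
print(pd_ab, pd_abc)
```

Under this formulation, adding a distantly related taxon such as the hypothetical "C" raises the index more than adding a close relative would, which is the sense in which the index accounts for phylogenetic differences between taxa.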

#### **3.2 Extracting 16S sequences from assembled data**

#### **Box 1.**

*16S-based microbiome taxonomic profiling pipeline used in EzBioCloud.*

Sometimes we do not have a set of short DNA reads but rather sequences already assembled into contiguous regions of variable size. Such is the case for genomes assembled from metagenomes or contigs from complex metagenomes. Inferring taxonomic diversity from this type of data usually requires other strategies. One of the most useful is to predict all the rRNA sequences contained in the assembly and cluster them according to their identity (which implies building a list of nonredundant sequences) to define operational taxonomic units. A simple way to address this problem is to use the Barrnap software [27]; it runs from the Unix command line and consumes few computational resources, so rRNA sequences can be extracted from several complex microbiomes on a personal computer. Barrnap outputs all predicted sequences, which for bacteria include 5S, 16S, and 23S rRNA. The sequences can be saved to a text file and subsequently analyzed with third-party phylogenetic software to establish evolutionary relationships between taxa. A suitable platform for this purpose is SeaView [28], which offers sequence alignment and curation utilities as well as a set of phylogenetic reconstruction methods, such as PhyML, which uses maximum likelihood algorithms and seven different evolutionary models. Distance methods such as Neighbor Joining and BioNeighbor Joining are also available, both with seven different methods to calculate distances between sequences. The platform is open access, is a graphical application that runs on Unix and Windows, and is very intuitive.
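The extraction step described in this section can be sketched in Python as follows. The sketch assumes that the rRNA predictions are available as GFF3 text in which 16S features carry an attribute such as `Name=16S_rRNA` (the layout Barrnap is understood to produce, treated here as an assumption), and that the contigs are already loaded into a dictionary. The contig names, the toy sequences, and the helper `extract_16s` are hypothetical. Collapsing exact duplicates is only a stand-in for the identity-based clustering mentioned above, which in practice uses a similarity threshold.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def extract_16s(gff_text, contigs):
    """Return a sorted, nonredundant list of 16S sequences annotated in GFF3 text."""
    sequences = set()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF3 header or comment lines
        fields = line.split("\t")
        if len(fields) < 9 or "16S" not in fields[8]:
            continue  # keep only features whose attributes mention 16S
        seqid, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        subseq = contigs[seqid][start - 1:end]  # GFF3 coordinates are 1-based, inclusive
        if strand == "-":
            subseq = reverse_complement(subseq)
        sequences.add(subseq.upper())
    return sorted(sequences)


# Hypothetical toy assembly and predictions (real 16S genes are about 1,500 bp long)
contigs = {"contig1": "AAACCCGGGTTT", "contig2": "TTACGT"}
gff_lines = [
    "##gff-version 3",
    "\t".join(["contig1", "barrnap", "rRNA", "4", "9", "0", "+", ".", "Name=16S_rRNA"]),
    "\t".join(["contig1", "barrnap", "rRNA", "1", "3", "0", "+", ".", "Name=5S_rRNA"]),
    "\t".join(["contig2", "barrnap", "rRNA", "1", "6", "0", "-", ".", "Name=16S_rRNA"]),
]
unique_16s = extract_16s("\n".join(gff_lines), contigs)
print(unique_16s)
```

The resulting list can be written to a FASTA file and opened in an alignment and tree-building program such as SeaView.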

### **4. Open-source software for phylogenetic and phylogenomic surveys**

Genome-based comparisons play an essential role in the current taxonomy and phylogeny of the Bacteria and Archaea domains and will eventually replace the single-gene approach dominated by 16S rRNA gene phylogeny. The exponential growth in international databases of complete genomes and draft genomes with significant completeness values and low contamination (<5%) has led to an approach to phylogenetic analysis in which whole-genome information has become a more conservative fingerprint of the taxonomic categories. The current challenges for science involve improving existing methods for data acquisition and processing, since comparative analysis, even among modest-sized microbial genomes, can be computationally expensive. Here we present a list of open-source tools that are easy to use and have modest hardware requirements, so that biologists can apply them to study microbial diversity in a phylogenetic context (**Table 4**).

#### **Table 4.**

*Open-source software for metagenomics-based profiling and phylogenies.*
