### **6. Soil metagenomics: pipelines and outputs**

Metagenomics began as an approach for cloning and screening environmental DNA, and it has since driven significant advances in microbiology, evolution, and ecology [91, 92]. The first projects not only proved the concept of metagenomics but also revealed enormous gene diversity within the microbial world. The steps in soil metagenomics are listed below and shown in **Figure 2**.

#### **Figure 1.**

*Metagenomic analysis of environmental microbial sampling based on nucleic acids.*



*Soil Metagenomics: Prospects and Challenges DOI: http://dx.doi.org/10.5772/intechopen.93306*

*Mycorrhizal Fungi - Utilization in Agriculture and Forestry*

#### **Table 1.**

*Examples of soil amplicon sequencing done so far covering different habitat types.*

| Origin | Sequencing platform/amplicon analysis technique | Total sequencing size | Country | Results | References |
| --- | --- | --- | --- | --- | --- |
| Potato field | Pyrosequencing | 1674 OTUs | USA | Identification of potato soilborne pathogens | [95] |
| Soil of 3 islands in the Yellow Sea | Pyrosequencing | 10,166 reads | South Korea | Wood decomposing, plant-parasitic, endophytic, ectomycorrhizal and saprotrophic fungi | [96] |
| 6 sites of forest and grassland soils | Pyrosequencing | 598,962 sequences | Germany | Identification of 17 bacterial phyla and 4 proteobacterial classes | [97] |
| Pea field | Pyrosequencing | 55,460 sequences | Denmark | Fungal species, diversity, and community composition of the phyla Ascomycota and Basidiomycota | [98] |
| Hitchiti *Pinus* forest, previously used for cotton cultivation | Deep Ion Torrent sequencing | >3,000,000 sequences | USA | Identification of 12 fungal strains | [99] |
| Sossego copper mine | Pyrosequencing | 10,978 OTUs | Brazil | 36 bacterial phyla and five proteobacteria classes | [100] |
| Riverine wetland soil | Illumina system | 1872 OTUs | USA | 56 different phyla belonging to aromatic hydrocarbon degraders, chitin degraders, chlorophenol degraders and atrazine metabolizers | [101] |
| Solid biomedical dumpsites | Illumina system | 1,706,442 sequences | Tanzania | 31 bacterial phyla | [102] |
| Grave-soil human cadavers | Illumina system | 1,729,482 reads | USA | Identification of 45 decomposing microbes | [103] |
| *Zea mays* fields | Illumina MiSeq | 2,453,023 reads | UK, France, Italy | Comparative account of soil microorganisms of three different sites | [104] |
| Pepper field | Illumina system | 4147 OTUs | Spain | Studying soil-borne pathogens | [105] |
| Tomato | Illumina amplicon sequencing analysis and phytohormone measurements | 337,961 high-quality reads and 647 fungal OTUs | Denmark | Identification of 27 endophytic fungi and root hormone quantification | [106] |
| Solid waste dumping site, Chite river site, Turial river site, Tuikual river site | Illumina system | 111,3884 sequences | India | Identified 27 proteobacteria and bacteroidetes | [107] |

#### **Figure 2.**

*The layout of metagenomics showing collection of samples from agricultural field and analysis.*

#### **6.1 Sample processing**

Sampling is the first and most crucial step. The extracted DNA must be of high quality for metagenomic library construction and sequencing. For communities that are associated with a host, fractionation or selective lysis is well suited. Any fractionation step should be checked to confirm that the target is adequately enriched with little contamination.

#### **6.2 Sequencing and assembly**

Metagenomic sequencing depends heavily on the sequencing platform used. Next-generation sequencing (NGS) technologies such as Illumina/Solexa, 454/Roche, and Oxford Nanopore are routinely used for metagenomic projects. Because individual short reads cover only small fragments, they must be assembled into contigs to recover longer, ideally full-length, sequences. Assembly is therefore a key step in metagenomics and can be accomplished by co-assembly or de novo assembly. De novo assembly, however, requires sophisticated computational tools and assemblers (e.g., MetaVelvet and Meta-IDBA).
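The idea of merging overlapping short reads into contigs can be illustrated with a toy greedy overlap merger. This is only a minimal sketch for intuition: it is not how MetaVelvet or Meta-IDBA work (both are graph-based assemblers), and the function names and the `min_len` threshold are assumptions introduced here for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy illustration only: no error correction, no reverse complements,
    and quadratic pair search, so it suits only a handful of reads.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:  # nothing left to merge
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
```

For example, the three reads `ATGGCGT`, `GCGTACC`, and `TACCTGA` overlap by four bases each and collapse into one contig, whereas a read with no sufficient overlap stays as a separate contig.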

#### **6.3 Binning and annotation**

Binning is the process of sorting DNA sequences into groups that represent individual genomes. One approach exploits the conserved nucleotide composition of genomes; another searches the DNA fragments against a reference and bins them by similarity. Some binning algorithms, such as MetaCluster and PhymmBL, use both composition and similarity. If the goal of the analysis is to reconstruct genomes from large contigs, the minimum contig length for this strategy should be 30,000 bp or longer. Annotation then proceeds in two steps: in feature prediction, genes and other genomic elements are labelled on the assembled sequences, while functional annotation maps the predicted features against an existing database. Sequences that cannot be mapped represent a large reservoir of potential novelty in metagenomic samples. Several reference databases can be used for functional annotation, including TIGRFAM, KEGG, eggNOG, and PFAM.
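Composition-based binning can be sketched with k-mer frequency profiles. The example below is a simplified illustration and does not reproduce the algorithms of MetaCluster or PhymmBL; the choice of tetranucleotides (k = 4), cosine similarity, and all function names are assumptions made here. Only the 30,000 bp length threshold comes from the text.

```python
from collections import Counter
from itertools import product
from math import sqrt

MIN_CONTIG_LEN = 30_000  # minimum contig length quoted in the text


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of every DNA k-mer in `seq`, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing ambiguous bases (e.g. N) are ignored
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two composition profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_contigs(contigs: dict[str, str],
                   min_len: int = MIN_CONTIG_LEN) -> dict[str, str]:
    """Keep only contigs long enough for genome reconstruction."""
    return {name: s for name, s in contigs.items() if len(s) >= min_len}


def assign_bin(contig: str, bin_profiles: dict[str, list[float]],
               k: int = 4) -> str:
    """Assign a contig to the bin whose profile is most similar to its own."""
    profile = kmer_profile(contig, k)
    return max(bin_profiles,
               key=lambda name: cosine_similarity(profile, bin_profiles[name]))
```

In practice, the bin profiles would be derived from reference genomes or from clusters of long contigs; a similarity-based binner would replace `assign_bin` with a search against a reference database.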

#### **6.4 Statistical analysis and data sharing**

Statistical assessment of metagenomic data is vital for establishing the significance of the results, but it requires an appropriate experimental design with proper replication. Sharing metagenomic data demands a substantial computational framework as well as storage capacity. Several centralized services provide standard formats for recording and documenting experimental details.
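Why replication matters can be made concrete with an exact permutation test. The sketch below is one possible way to compare a summary statistic (for example, a diversity index per replicate) between two treatments; the test choice and the function name are assumptions made for illustration, not a method prescribed by the chapter.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a: list[float], group_b: list[float]) -> float:
    """Two-sided exact permutation p-value for a difference in group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes, so it is only practical for small numbers of replicates.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # small tolerance guards against floating-point rounding
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With three replicates per treatment there are only 20 possible splits, so even perfectly separated groups cannot give a two-sided p-value below 2/20 = 0.1. This is one reason why designs with few replicates limit what can be concluded.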

### **7. Future road map**

Robust extraction and characterization of soil microbial DNA through amplicon sequencing have transformed studies in ecology and the environmental sciences. In essence, metagenomic analysis of nucleic acids gives direct access to the genomes of the uncultivated, underexploited majority of microbial life. Aided by developments in sequencing technologies, microbiologists have discovered many novel species, genera, and genes. The unprecedented range of soil types warrants continued exploration across a variety of agricultural and environmental settings. The ability to survey soil microbial communities at ever greater capacity arguably holds the greatest promise for answering many open questions about the microbial world. Molecular methods, including metagenomics, have revolutionized microbial ecology, yet the vast majority of microorganisms still cannot be linked to their metabolic roles within a soil community. The increased sequencing capacity of high-throughput sequencing technologies has helped to characterize and quantify soil diversity. However, these methods are usually used to process many samples at relatively shallow depth rather than to survey the genomes within a single sample thoroughly. **Figure 3** describes the various applications of metagenomics.

#### **Figure 3.**

*A brief account of applications of metagenomics in different fields.*

Along with high diversity, methodological biases pose a considerable challenge for soil microbial characterization. These biases arise from soil sampling, DNA extraction, adsorption of nucleic acids to soil particles, contributions of extracellular DNA, sample preparation, sequencing protocols, sequence analysis, and functional annotation. Because current sequencing technologies produce millions of reads, difficulties in interpreting the results add to the problems microbial ecologists face in determining how various microorganisms contribute to the many processes in soil. Without a suitable benchmark methodology or dataset for verifying the fidelity of amplicon or metagenomic analyses, it is impossible to assess whether the presence and activity of organisms are adequately captured. Furthermore, methodological limitations that prevent the detection of some active and abundant soil bacteria could lead to an equally serious degree of misinterpretation. No single DNA isolation protocol can be regarded as adequate on its own. Likewise, taxonomic and functional characterization of the soil microbiota would benefit critically from a blend of strategies.

Exact replicates are difficult to obtain because the composition of soil microbial communities changes. A further challenge is that the total number of species in a single soil sample is unknown, with widely varying estimates. A crucial first step toward addressing several of these problems is to begin developing a substantial catalog of all microbial community members and functions for at least one reference soil. Such a relatively comprehensive reference dataset would shed light on the as-yet-unknown shape of the soil microbial species frequency distribution and might serve as a definitive guide for assessing shifts in community composition across soil landscapes (i.e., beta diversity). Put simply, the extent of bias in any individual approach (e.g., a single DNA extraction method) could be determined explicitly by comparing extraction strategies against a detailed characterization of the selected reference soil. For instance, the isolation and characterization of cells via single-cell genomics can support targeted phylogenetic analysis. Coupled with extensive DNA-based characterization of the reference soil's microbial diversity, this research initiative should ideally assess several levels of gene expression: RNA (metatranscriptomics), proteins (metaproteomics), and metabolites (metametabolomics). By identifying how a reference soil is structured, both temporally and spatially, the information from this coordinated effort could supply missing links between standard soil analyses and the underlying composition of soil microbial communities.
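Comparing community profiles, whether across soil landscapes (beta diversity) or between extraction methods applied to the same reference soil, requires a dissimilarity measure. The sketch below uses Bray-Curtis dissimilarity, one common choice that the chapter itself does not name; the function names and the taxon-count dictionaries are hypothetical and serve only as an illustration.

```python
def bray_curtis(sample_a: dict[str, float], sample_b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two taxon-abundance profiles.

    0.0 means identical abundances; 1.0 means no shared taxa.
    """
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


def pairwise_dissimilarity(
    profiles: dict[str, dict[str, float]]
) -> dict[tuple[str, str], float]:
    """Bray-Curtis dissimilarity for every unordered pair of profiles."""
    names = sorted(profiles)
    return {
        (x, y): bray_curtis(profiles[x], profiles[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
```

Applied to profiles obtained from the same reference soil with different extraction methods, large pairwise values would point to method-specific bias rather than to real differences in community composition.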

An in-depth exploration of a single reference soil would involve experiments well beyond the typical metagenomic analyses applied to soil samples. The effort will require considerable benchmarking of the sampling technique itself, which is tied to identifying a suitable reference site. Such an endeavor would call for a coordinated interdisciplinary consortium with expertise spanning chemistry, soil physics, biochemistry, microbiology, and bioinformatics. Its outcomes could provide an objective foundation for creating standardized protocols for ongoing and future soil microbiological investigations.
