**4. Biomarkers**

466 Environmental Risk Assessment of Soil Contamination

**3. Metabolomics bioinformatics**

Information processing by bioinformatics tools and computational biology methods has become essential for solving complex biological problems in genomics, proteomics, and metabolomics. Understanding "omics" data requires both common statistical and computational methods because of the multi-dimensionality and complexity of the data.

Data-analytical methods for the study of biological systems, as developed in the field of computational biology, provide a suite of indispensable tools to survey the outcome of metabolomics studies. First, computational biology allows a fast screening of the large biological and chemical data sets generated (Shulaev, 2006), and therefore the identification of the most relevant metabolites, i.e. compounds specifically representative of the metabolic changes in the model system following exposure to different concentrations of organic and inorganic toxicants. As a result of the large number of variables (metabolites) studied, metabolomics studies have considerable statistical power for the systematic detection of biological responses to environmental changes (van Ravenzwaay et al., 2012). Second, the mathematical models developed in computational biology allow the identification of relationships between the external stimuli and the metabolic response (Zhang et al., 2010). Third, the application of computational algorithms to structural biology makes it possible to discover the structure-function relationships of new macromolecular compounds, their functional enzymatic conversion and changes in their activity, as well as their molecular interactions and relationships with other compounds in the pathways in which they are involved (Jimenez-Lopez et al., 2013). Moreover, it is possible to detect patterns in such biological responses and establish significant dose-response relationships. In addition, pattern recognition reduces the metabolomics data from hundreds of variables to two or three components that are orthogonal to each other. Overall, this advance of computational biology has been possible due to three significant technological breakthroughs: high-information-content data streams, novel bio-statistical methods, and the computational power to analyse these data.

Data processing and statistical analyses are commonly performed using multivariate analyses (typically principal component analysis (PCA) and/or partial least squares (PLS) regression) and univariate analyses (t-test) (Brown et al., 2010; Jones et al., 2014; McKelvie et al., 2011; Yuk et al., 2013). These analyses are performed in combination with the quantification and identification of the metabolites. Subsequently, biological interpretation of the data is neces‐

purposes (Kammenga et al., 2000). Finally, the biological interpretation seeks the links between metabolome data and underlying metabolic networks through metabolite set enrichment, pathway analysis and metabolic network inference (Trygg et al., 2006). Thus, finding metabolite relationships is essential to determine comprehensive and meaningful metabolic changes as a biological response to environmental stimuli (Ellis et al., 2012; Morrison et al., 2007). Accordingly, such extensive evaluation of the impact of pollutants on the metabolism of target organisms is the approach that can add value to the assessment of soil health and the viability of soil organisms undergoing stress from pollution.
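The pattern-recognition step described in this section, in which hundreds of metabolite variables are reduced to two or three mutually orthogonal components, can be illustrated with a small principal component analysis. The sketch below is only an illustration under stated assumptions, not the procedure used in the cited studies: the intensity values are hypothetical, the function names are invented for this example, and the components are obtained by simple power iteration with deflation rather than by a dedicated statistics package.

```python
import math
import random


def center_columns(rows):
    """Subtract each metabolite's mean so that PCA captures variation only."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix between metabolites (columns)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(matrix, vector):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvalue and unit-length eigenvector."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


def pca(rows, n_components=2):
    """Return sample scores, component loadings and explained variances."""
    centered = center_columns(rows)
    cov = covariance(centered)
    size = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        value, vector = leading_eigenpair(cov)
        components.append(vector)
        variances.append(value)
        # Deflation: remove the component just found before extracting the next.
        cov = [[cov[i][j] - value * vector[i] * vector[j] for j in range(size)]
               for i in range(size)]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, components, variances


# Hypothetical intensities: rows are samples, columns are four metabolites.
control = [[10.0, 5.0, 3.0, 8.0], [11.0, 5.5, 2.8, 8.2], [9.5, 4.8, 3.1, 7.9]]
exposed = [[6.0, 7.5, 3.0, 4.0], [5.5, 8.0, 3.2, 4.1], [6.2, 7.2, 2.9, 3.8]]

scores, components, variances = pca(control + exposed, n_components=2)
for label, (pc1, pc2) in zip(["control"] * 3 + ["exposed"] * 3, scores):
    print(f"{label:8s} PC1={pc1:7.3f} PC2={pc2:7.3f}")
```

In this toy data set the first component is expected to separate control from exposed samples, which is the kind of pattern a scores plot is used to reveal. In practice an established statistics library, together with appropriate scaling of the metabolite intensities, would normally be preferred.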

The somewhat secondary significance of biological responses in soil contamination assessment was customarily associated with the limitations of biomarkers as measurable responses to contaminants, which classically could only provide an indication of exposure to contaminants in soil (Sanchez-Hernandez, 2006). The development of metabolomics, considered an "emerging field" as late as mid-2010, has provided the tools for the determination of multiple biomarkers across different levels of biological organization, and therefore a better assessment of the ecological consequences of contamination. Since the creation of the first metabolomics web database, METLIN (Smith et al., 2005), 60,000 metabolites have been incorporated, a rapid development closely related to the evolution of mass spectrometry instrumentation and data analysis tools. Currently, the number of databases and registered metabolites is continuously increasing. Table 2 lists some of the most relevant operative databases together with their websites. Further information on metabolomics databases can be obtained from the Metabolomics Society (http://www.metabolomicssociety.org). For instance, ChemSpider is an aggregated database of organic molecules containing more than 20 million compounds from many different providers. At present the database contains information from sources as diverse as a marine natural products database, the ACD-Labs chemical databases, the EPA's DSSTox databases and a series of chemical vendors. It has extensive search utilities, and most compounds have a large number of calculated physicochemical property values.

One of the goals of bioinformatics is to establish automated and efficient ways to integrate large biological datasets from multiple sources. This objective is challenging because data sources are heterogeneous in terms of their functions, structures, data access methods and dissemination formats. In addition, the enormous quantity of information produced by "omics" is handled via computers that systematically analyze and store the accumulating sequence, structure and function data. Databases are essential in metabolomics because they provide a rapid and specific tool to identify the compounds isolated from an organism exposed to a particular environmental challenge. For example, the KNApSAcK package provides tools for analysing datasets of mass spectra as well as for retrieving information on metabolites by entering the name of a metabolite, the name of an organism, a molecular weight or a molecular formula. A list of the metabolites associated with a taxonomic class can be obtained by searching with the taxonomic name, and information on individual metabolites can then be retrieved from that list. The NIST Chemistry WebBook provides access to chemical and physical property data for chemical species. The data provided on the site are from collections maintained by the NIST Standard Reference Data Program and outside contributors. Data in the NIST Chemistry WebBook can be found by direct searches for chemical species or by indirect searches based on related data. Specific databases are also being developed, such as LIPID MAPS, currently the largest database of lipid molecular structures. In addition, SetupX combines mass spectrometric and biological metadata, which is a step forward in the organization of the information generated by metabolomics analysis.



Metabolomic databases are thus accompanied by an accurate description of the biological study design and by metadata reporting on the laboratory workflow, from sample preparation to data processing.
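The kind of lookup described above, in which a compound is retrieved by name, molecular formula or molecular weight, can be sketched with a small local table. This is a hedged illustration only: it does not call KNApSAcK, METLIN or any other real service, the record list and function names are hypothetical, and the search assumes a neutral monoisotopic mass (no adduct correction) and simple formulas without brackets.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "P": 30.9737620, "S": 31.9720707,
}

# Hypothetical local records standing in for a metabolite database.
RECORDS = [
    {"name": "D-Glucose", "formula": "C6H12O6"},
    {"name": "D-Fructose", "formula": "C6H12O6"},
    {"name": "L-Alanine", "formula": "C3H7NO2"},
    {"name": "L-Lactic acid", "formula": "C3H6O3"},
    {"name": "Succinic acid", "formula": "C4H6O4"},
]


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C6H12O6'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def search_by_name(text):
    """Case-insensitive substring match on the compound name."""
    return [r["name"] for r in RECORDS if text.lower() in r["name"].lower()]


def search_by_formula(formula):
    """Exact match on the molecular formula string."""
    return [r["name"] for r in RECORDS if r["formula"] == formula]


def search_by_mass(observed_mass, ppm=10.0):
    """Names whose monoisotopic mass lies within `ppm` of the observed mass."""
    hits = []
    for record in RECORDS:
        mass = monoisotopic_mass(record["formula"])
        if abs(mass - observed_mass) / mass * 1e6 <= ppm:
            hits.append(record["name"])
    return hits


print(search_by_name("alanine"))
print(search_by_formula("C4H6O4"))
print(search_by_mass(180.0634))
```

The mass search returns both hexoses because isomers share a molecular formula and therefore a monoisotopic mass. This is one reason why databases that also hold spectral or chromatographic information are needed for confident identification.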


**Table 2** Selected metabolomic databases.

| Database | Website |
| --- | --- |
| METLIN | http://metlin.scripps.edu/index.php |
| KEGG | http://www.genome.jp/kegg/pathway.html |
| SetupX | http://fiehnlab.ucdavis.edu/projects/binbase\_setupx |
| IIMDB | http://metabolomics.pharm.uconn.edu/iimdb/ |
| LIPID MAPS | http://www.lipidmaps.org/ |
| ChemSpider | http://www.chemspider.com/ |
| MassBank | http://www.massbank.jp/ |
| HMP | http://www.hmdb.ca/ |
| KNApSAcK | http://kanaya.naist.jp/KNApSAcK/ |
| NIST | http://webbook.nist.gov/chemistry/ |

Currently, standard analyses focus on the determination of amino acids, mono- and disaccharides, lipids/fatty acids, short chain fatty acids and small phenolics. Accordingly, it is already possible to begin standardizing metabolomics analysis. For instance, the Northwest Metabolomics Research Center (University of Washington) has established a relevant list of target compounds to evaluate biological responses to changes in the environment. The list of compounds is summarized in Table 3.


**Table 3** Summary of metabolites and metabolic pathways representative of biological responses to environmental stimuli.

The information on metabolites and metabolic pathways has been obtained from the website of the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/). According to the research results summarized in Table 1, the implementation of metabolomics in the assessment of soil contamination indicates that contaminants in soil affect several of the major metabolic pathways in living organisms (Table 3), including glycolysis, the tricarboxylic acid cycle and amino acid metabolism. Moreover, data analysis indicates an overall reduction in the production of the associated metabolites. For instance, interference with specialized amino acid pathways results in a decreased synthesis of purine and pyrimidine nucleotides (Brown et al., 2010; McKelvie et al., 2011). These nucleotides are essential for the production of the energy (ATP molecules) that drives most of the enzymatic reactions in living organisms. Protein synthesis is consequently hampered as well, which explains the negative effect on processes such as antioxidant activity.
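The step from a list of altered metabolites to the pathways they point to is often formalised as an over-representation test, one simple form of the metabolite set enrichment and pathway analysis mentioned in Section 3. The sketch below uses a hypergeometric upper-tail probability and rests on several assumptions: the pathway memberships are abridged textbook lists rather than KEGG downloads, the set of changed metabolites is hypothetical, the function names are invented for this example, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(background, in_pathway, changed, overlap):
    """P(X >= overlap) when `changed` metabolites are drawn without replacement."""
    upper = min(in_pathway, changed)
    favourable = sum(
        comb(in_pathway, i) * comb(background - in_pathway, changed - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, changed)


def over_representation(changed, pathways, background):
    """Return {pathway: (overlap size, p-value)} for each pathway."""
    changed = set(changed) & set(background)
    results = {}
    for name, members in pathways.items():
        members = set(members) & set(background)
        overlap = len(changed & members)
        p_value = hypergeom_upper_tail(
            len(background), len(members), len(changed), overlap
        )
        results[name] = (overlap, p_value)
    return results


# Abridged, illustrative pathway memberships (not a database export).
PATHWAYS = {
    "TCA cycle": {
        "citrate", "isocitrate", "2-oxoglutarate", "succinate",
        "fumarate", "malate", "oxaloacetate",
    },
    "Glycolysis": {
        "glucose", "glucose 6-phosphate", "fructose 6-phosphate",
        "fructose 1,6-bisphosphate", "phosphoenolpyruvate", "pyruvate",
    },
}
AMINO_ACIDS = {
    "alanine", "glycine", "leucine", "valine",
    "glutamate", "lysine", "phenylalanine",
}
BACKGROUND = PATHWAYS["TCA cycle"] | PATHWAYS["Glycolysis"] | AMINO_ACIDS

# Hypothetical metabolites found to change after exposure.
CHANGED = {"citrate", "succinate", "fumarate", "malate", "alanine"}

results = over_representation(CHANGED, PATHWAYS, BACKGROUND)
for name, (overlap, p_value) in results.items():
    print(f"{name}: overlap={overlap}, p={p_value:.4f}")
```

A small p-value suggests that the changed metabolites fall in that pathway more often than chance would predict given the measured background. The choice of background set strongly affects the result, so it should reflect the metabolites the analytical platform could actually detect.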

Another emerging group of biomarkers, as highlighted in several studies, is lipids (Rochfort et al., 2009; Sanchez-Hernandez, 2006). Rochfort et al. (2009) indicate that lipophilic extracts can be used in field-based metabolomics experiments to investigate different treatment effects on earthworms. Lipid metabolism is highly sensitive to environmental contaminants (Vega-López et al., 2013), with increasing production of lipoprotein vesicles and an increasing lipid peroxidation rate during the early stages of the biological response to the presence of a toxicant (Lankadurai et al., 2011). Relatedly, earthworm esterases have been proposed as biomarkers for pesticide contamination in soil (Sanchez-Hernandez, 2010). Esterases are directly involved in the natural tolerance of earthworms to pesticides and can therefore be used as specific biomarkers. Furthermore, their characterization by a metabolomics approach might help to select the appropriate earthworm species for regulatory toxicity testing. Overall, the increasing specificity of the research performed in ecotoxicogenomics will allow a realistic and meaningful incorporation of biological responses in ecological risk assessment.
