**3. Metabolomics bioinformatics**

Information processing with bioinformatics tools and computational biology methods has become essential for solving complex biological problems in genomics, proteomics, and metabolomics. Interpreting "omics" data requires both standard statistical methods and computational methods because the data are multidimensional and complex.

Data-analytical methods developed in computational biology for the study of biological systems provide a suite of indispensable tools for surveying the outcome of metabolomics studies. First, computational biology allows fast screening of the large biological and chemical data sets generated (Shulaev, 2006) and, therefore, identification of the most relevant metabolites, i.e., the compounds most representative of the metabolic changes in the model system following exposure to different concentrations of organic and inorganic toxicants. Because of the large number of variables (metabolites) studied, metabolomics studies have considerable statistical power for the systematic detection of biological responses to environmental changes (van Ravenzwaay et al., 2012). Second, the mathematical models developed in computational biology allow the identification of relationships between the external stimuli and the metabolic response (Zhang et al., 2010). Third, the application of computational algorithms to structural biology makes it possible to discover the structure-function relationships of new macromolecular compounds, their functional enzymatic conversion and changes in their activity, and their molecular interactions and relationships with other compounds in the pathways in which they are involved (Jimenez-Lopez et al., 2013). Moreover, it is possible to detect patterns in such biological responses and to establish significant dose-response relationships. In addition, pattern recognition reduces metabolomics data from hundreds of variables to two or three mutually orthogonal components. Overall, these advances in computational biology have been made possible by three significant technological breakthroughs: high-information-content data streams, novel biostatistical methods, and the computational power to analyse these data.

Data processing and statistical analyses are commonly performed using multivariate analyses, typically principal component analysis (PCA) and/or partial least squares (PLS) regression analysis, and univariate analyses such as the t-test (Brown et al., 2010; Jones et al., 2014; McKelvie et al., 2011; Yuk et al., 2013). These analyses are performed in combination with the quantification and identification of the metabolites. Subsequently, biological interpretation of the data is necessary for understanding the link between the external stimulus and the metabolic response of the organisms.

Principal component analysis is the most widely used multivariate statistical approach in metabolomics. It explains the overall variability in a data set through a set of uncorrelated variables called principal components (PCs), which are linear combinations of the original variables (Trygg et al., 2006). PCA thus reduces the dimensionality of the data to a low-dimensional plane, such as PC1 versus PC2. Samples are organized in a PCA scores plot according to the similarities between their metabolic profiles, so the scores plot (e.g., PC1 versus PC2) allows a visual examination of the relationships between samples. In a 1-D PCA loadings plot, the contribution (or weight) of each metabolite to the discrimination of the sample classes along one component is represented by the intensity of the metabolite peak. In a 2-D PCA loadings plot, the discriminating metabolites are identified as the points scattered farthest from the tight cluster of points near the origin.
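
As an illustration of how scores and loadings arise, the following is a minimal sketch of PCA computed by singular value decomposition of the mean-centered data matrix. The data are synthetic and purely hypothetical (six control and six exposed samples, 30 metabolites, with the first five metabolites elevated in the exposed group), and the function name `pca` is an assumption for this example; scaling choices such as unit-variance or Pareto scaling, which are often applied in practice, are omitted.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA by SVD of the mean-centered data matrix.

    X has shape (n_samples, n_metabolites). Returns the sample scores,
    the metabolite loadings, and the fraction of variance explained
    by each retained principal component.
    """
    Xc = X - X.mean(axis=0)                          # mean-center each metabolite
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the PCs
    loadings = Vt[:n_components].T                   # metabolite weights on the PCs
    explained = (S ** 2) / np.sum(S ** 2)            # variance fraction per PC
    return scores, loadings, explained[:n_components]

# Hypothetical data (not from any cited study): rows 0-5 are controls,
# rows 6-11 are exposed; metabolites 0-4 are elevated in the exposed group.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 30))
X[6:, :5] += 6.0

scores, loadings, explained = pca(X)

# Metabolites with the largest absolute PC1 loadings lie farthest from
# the origin in a loadings plot and drive the class separation.
top = np.argsort(-np.abs(loadings[:, 0]))[:5]

print("PC1 scores, controls:", np.round(scores[:6, 0], 2))
print("PC1 scores, exposed: ", np.round(scores[6:, 0], 2))
print("Variance explained by PC1, PC2:", np.round(explained, 3))
print("Top PC1 metabolites (column indices):", sorted(top.tolist()))
```

In this sketch the two groups separate along PC1 in the scores, and the metabolites responsible for that separation are the ones with the largest absolute PC1 loadings.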

Other widely used multivariate statistical tools in metabolomics are PLS regression analysis and PLS discriminant analysis (PLS-DA). Both PLS regression and PLS-DA are methods for sample classification in which predefined variables are added to maximize the separation between the sample classes and to construct predictive models. For PLS regression, the predefined variables are measurable quantities such as the contaminant exposure concentration. Validation methods such as leave-one-out cross-validation are used to test the robustness of the models generated by PLS regression, PLS-DA, orthogonal PLS (OPLS), and OPLS-DA (Whitfield Åslund et al., 2011).
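
The idea of PLS regression validated by leave-one-out cross-validation can be sketched as follows. This is a minimal single-response PLS (PLS1) implementation using the NIPALS algorithm, written only for illustration; it is not the procedure of any cited study. The function names (`pls1_fit`, `pls1_predict`, `loo_q2`), the synthetic exposure concentrations, and the choice of two components are all assumptions. The cross-validated statistic Q² is computed as 1 − PRESS / total sum of squares.

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    """Fit a single-response PLS regression (PLS1) with NIPALS."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centered residual matrices
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weights: direction of maximal covariance with y
        t = Xr @ w                           # sample scores on this component
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        c = yr @ t / tt                      # y loading
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # coefficients in the original variable space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def loo_q2(X, y, n_components=2):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / total sum of squares."""
    n = len(y)
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i            # hold out sample i
        model = pls1_fit(X[train], y[train], n_components)
        pred = pls1_predict(model, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Hypothetical data: 4 exposure concentrations x 5 replicates, 40 metabolites
# whose levels respond linearly to concentration, plus measurement noise.
rng = np.random.default_rng(1)
conc = np.repeat([0.0, 1.0, 5.0, 10.0], 5)
profile = rng.normal(size=40)
X = np.outer(conc, profile) + rng.normal(scale=0.5, size=(20, 40))

q2 = loo_q2(X, conc)
print("Leave-one-out Q2:", round(float(q2), 3))
```

A Q² close to 1 indicates that the model predicts left-out samples well, whereas a value near or below zero would suggest that the apparent separation does not generalize.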

Although metabolomics studies mostly use multivariate statistics, univariate statistical analyses can add to the information gained from a study. For example, t-tests can be used to assess the significance of the separation between the control and stressed organisms in PCA and PLS-DA scores plots. They can also be used to determine which metabolites in the ¹H NMR spectra of the treatment class increased or decreased significantly relative to the controls.
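
A per-metabolite comparison of this kind might look like the sketch below, which uses `scipy.stats.ttest_ind`. The data are synthetic and hypothetical, the helper name `compare_groups` is an assumption, and the use of Welch's unequal-variance form (`equal_var=False`) and a 0.05 threshold are choices made for this example rather than settings taken from the cited studies.

```python
import numpy as np
from scipy import stats

def compare_groups(control, exposed, alpha=0.05):
    """Two-sample t-test for each metabolite (column).

    Returns the t statistics, the p-values, and a label per metabolite:
    'increased', 'decreased', or 'unchanged' relative to the controls.
    """
    # equal_var=False selects Welch's t-test (an assumption of this sketch)
    t_stat, p_val = stats.ttest_ind(exposed, control, axis=0, equal_var=False)
    direction = np.where(
        p_val < alpha,
        np.where(t_stat > 0, "increased", "decreased"),
        "unchanged",
    )
    return t_stat, p_val, direction

# Hypothetical metabolite intensities: 8 controls and 8 exposed organisms,
# 6 metabolites; metabolite 0 rises and metabolite 1 falls on exposure.
rng = np.random.default_rng(2)
control = rng.normal(loc=10.0, scale=1.0, size=(8, 6))
exposed = rng.normal(loc=10.0, scale=1.0, size=(8, 6))
exposed[:, 0] += 5.0
exposed[:, 1] -= 5.0

t_stat, p_val, direction = compare_groups(control, exposed)
for j in range(control.shape[1]):
    print(f"metabolite {j}: t = {t_stat[j]:6.2f}, p = {p_val[j]:.3g}, {direction[j]}")
```

The same function could be applied to PC1 or PC2 scores, passed as single-column arrays, to assess whether the control and stressed groups separate significantly in a scores plot.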
