
is minimized. The learning rate, which determines the rate (or speed) at which the weights change, is a key parameter. If the learning rate is too high (and, as a consequence, the learning step too large), the network may become unstable. Conversely, if the learning rate is too low (i.e., a very small learning step is used), the network may take too long to converge, or it may become trapped in a local minimum. The challenge is to strike a balance between the learning rate and convergence to the lowest possible minimum. To find the minimum, a **gradient descent** algorithm (i.e., one based on the derivative of the error, typically computed with the chain rule) with **momentum** (m) was used. Specifically, a fraction m of the previous weight change is added to the current weight update. This allows the weight updates to ignore small ridges in the error surface, thus reducing the possibility of being trapped in a local minimum. Momentum values ranging anywhere between 0 and 1 can be used. A **backpropagation** of errors algorithm is commonly combined with gradient descent, and it was employed here. In general, backpropagation strives to find, by gradient descent, a set of weights that minimizes the error between the output of the network and the target output.

The predictive ability of an ANN depends on the transfer function (**Figure 3b**), on the learning rule applied, and on the network's architecture. The **architecture of the network** consists of the number of neurons in each layer, the number of layers involved, the transfer function, and how the layers are connected to each other (and to the network's inputs). A sequential, **feed-forward architecture** is the most widely used [28–32]. In this architecture, each neuron receives inputs from the neurons of the preceding layer, and its output becomes an input to the neurons of the following layer. An example of spectral scans used for training a feed-forward network architecture is shown in **Figure 4**.
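To make the update rule concrete, a minimal sketch (not the authors' implementation) of gradient descent with momentum and backpropagation for a small feed-forward network is shown below; the learning rate, momentum value, layer sizes, and synthetic data are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: one hidden-layer feed-forward network trained by
# backpropagation with gradient descent plus momentum.
# lr (learning rate), m (momentum) and the toy data are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))             # 50 "scans", 4 inputs each
y = X @ np.array([0.5, -0.2, 0.1, 0.3])  # synthetic target

w1 = rng.normal(scale=0.1, size=(4, 6)); b1 = np.zeros(6)
w2 = rng.normal(scale=0.1, size=(6, 1)); b2 = np.zeros(1)
vel = [np.zeros_like(p) for p in (w1, b1, w2, b2)]

lr, m = 0.05, 0.9                        # learning rate and momentum fraction
for epoch in range(200):
    # forward pass (tanh hidden layer, linear output)
    h = np.tanh(X @ w1 + b1)
    out = h @ w2 + b2
    err = out - y[:, None]

    # backpropagation of the error (chain rule)
    g_out = 2 * err / len(X)
    g_w2 = h.T @ g_out
    g_b2 = g_out.sum(0)
    g_h = (g_out @ w2.T) * (1 - h**2)
    g_w1 = X.T @ g_h
    g_b1 = g_h.sum(0)

    # gradient descent with momentum: a fraction m of the previous
    # update is added to the current weight change
    params, grads = (w1, b1, w2, b2), (g_w1, g_b1, g_w2, g_b2)
    for i, (p, g) in enumerate(zip(params, grads)):
        vel[i] = m * vel[i] - lr * g
        p += vel[i]
```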

**Figure 3.** (a) Simplified illustration of a neuron and (b) of a linear transfer function. As shown above, the input to a neuron incorporates an adjustable input bias, *b*. The input *x* is multiplied by the strength of the weight (*w*), and the weight is adjusted during learning. The product *w*\**x* is passed through a transfer function (F).
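In equation form, the neuron of **Figure 3** computes F(w\*x + b); the following toy snippet, with purely hypothetical values, illustrates this for a linear transfer function.

```python
def neuron(x, w, b, F=lambda z: z):
    """Single neuron of Figure 3: weighted input plus bias, passed through
    a transfer function F (linear, i.e., identity, by default)."""
    return F(w * x + b)

# hypothetical values, for illustration only
print(neuron(x=0.8, w=1.5, b=0.2))   # linear transfer function -> 1.4
```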


For validation of network performance (i.e., the ability to predict **A** (analyte) and **I** (interferent) concentrations) when given an "unknown" scan (i.e., one for which the network is asked to predict the correct or expected value), one spectral scan at a time is fed into the network (bottom frame of **Figure 4**) and the network returns a "predicted" concentration for the analyte (**A**) and for the interferent (**I**). For validation (as is typical in this field of research), the performance of ANNs was compared with that of a typical chemometric method, such as partial least squares (PLS).
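As a rough illustration of this scan-by-scan validation step (the model below is a stand-in, not the trained network described in the text):

```python
import numpy as np

# Illustration only: feed one "unknown" scan at a time to a trained model and
# read back predicted analyte (A) and interferent (I) concentrations.
# `trained_net` is a placeholder for the fitted ANN, not the actual model.
def trained_net(scan):
    return np.array([scan.mean(), scan.std()])    # stands in for [A, I]

unknown_scans = np.random.rand(3, 100)            # three hypothetical scans
for scan in unknown_scans:
    A_pred, I_pred = trained_net(scan)
    print(f"predicted A = {A_pred:.3f}, predicted I = {I_pred:.3f}")
```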


**3. Brief background on PLS**

The key objective in PLS, or PLS regression (as it is often called), is to develop a mathematical model relating the spectral response (e.g., spectral scans) to the concentration of an interfered analyte over the concentration range of interest. In this case, a calibration model is developed using one of several PLS algorithms; in this work, a non-linear iterative partial least squares (NIPALS) algorithm was used. For more details on PLS, the review by Geladi and Kowalski [33] is recommended. Additional references abound; for brevity, only a selected few are listed [34–40].
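For readers unfamiliar with NIPALS, a simplified sketch of the procedure for a single response variable (PLS1) is given below; it is illustrative only (mean-centering is assumed to be done beforehand, and refinements used in practice are omitted) and is not presented as the exact algorithm used in this work.

```python
import numpy as np

def nipals_pls1(X, y, n_components):
    """Simplified NIPALS sketch for PLS with a single response (PLS1).
    Returns regression coefficients relating spectra X to concentrations y.
    Assumes X and y are already mean-centered."""
    X, y = X.copy(), y.astype(float)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)           # weight vector
        t = X @ w                        # scores (latent variable)
        p = X.T @ t / (t @ t)            # X loadings
        c = (y @ t) / (t @ t)            # y loading
        X -= np.outer(t, p)              # deflate X
        y -= c * t                       # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # regression coefficients in the original variable space
    return W @ np.linalg.solve(P.T @ W, q)

# illustrative use with synthetic, mean-centered "spectra"
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 50))
y = X[:, :3].sum(axis=1)
Xc, yc = X - X.mean(0), y - y.mean()
b = nipals_pls1(Xc, yc, n_components=3)
print("RMSE of fit:", np.sqrt(np.mean((Xc @ b - yc) ** 2)))
```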

In PLS, an appropriate number of latent variables (or principal components) must be used: if too few variables are employed, the model is inadequate (underfitting); if too many are utilized, both signal and noise are modeled (overfitting). For comparison purposes, the same spectral scans were used (where possible) to evaluate the predictive ability of PLS and of ANNs. An example of spectral scans used with PLS is shown in **Figure 5**.
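One common way to choose the number of latent variables, sketched below as an assumption rather than the procedure used in this work, is to examine cross-validated prediction error as the component count grows (scikit-learn and the synthetic data are used purely for convenience):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Cross-validated error versus number of latent variables (illustrative).
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 40))                  # synthetic "spectral scans"
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=60)

for k in range(1, 9):
    mse = -cross_val_score(PLSRegression(n_components=k), X, y,
                           cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{k} latent variables: CV-MSE = {mse:.3f}")
# too few components underfit; too many begin to model noise as well
```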

**Figure 5.** PLS. Top frame: example spectral scans. Bottom frame: simplified diagram of the PLS algorithm. To facilitate meaningful comparisons, the simulated and experimentally obtained spectral scans were (where possible) the same as those used for the ANNs.
