spectroscopy include the extremely weak Raman signal and the presence of undesirable noise sources such as the intense fluorescence background present in biological samples.

2. Instrumentation

A Raman spectrometer useful for in vivo measurements should be an integrated system that provides real-time spectral acquisition and analysis [1]. A Raman system for in vivo measurements includes a light source, optics for light delivery to and collection from the sample, a spectrograph with a detector, and a computer interface (Figure 1). Lasers are used as the excitation source for Raman spectroscopy because they can deliver enough power to the sample to detect Raman spectra within a reasonable integration time. However, laser power, integration time, and wavelength must all be considered when optimizing a Raman system for in vivo biomedical applications. For example, to avoid tissue damage, the maximum permissible exposure (MPE, defined by ANSI) and the resulting temperature increase must be taken into account. Laser power must therefore be selected to achieve a good signal-to-noise ratio while minimizing tissue damage.

In biological tissue, fluorophores can generate signals that mask or overwhelm the weak Raman signal. Several approaches have been proposed to avoid this fluorescence background, including excitation in the near infrared (NIR) [2]. Most biological fluorophores do not have their peak emission in this region of the spectrum, so NIR excitation produces a lower fluorescence background than visible or UV excitation. For these reasons, most Raman spectroscopy systems for skin diagnosis use a 785-nm diode laser as the excitation source: it is a low-cost light source that generates little fluorescence and penetrates deep into human tissue.

For light delivery and collection in clinical applications, optical fibers are the most common choice. The design of the Raman fiber probe varies with the clinical application. For Raman spectroscopy of the skin, the probe consists of a single central delivery fiber surrounded by several collection fibers.

The selection of a suitable detection system is an important issue for Raman spectroscopy. The

Figure 1. Schematic of a typical Raman system for in vivo biomedical applications.

294 Raman Spectroscopy

3. Data preprocessing

A major issue in biological Raman spectroscopy is the presence of undesirable background signals from different sources, such as intrinsic fluorescence, noise introduced by the instrumentation, and noise generated by external sources.

#### 3.1. Smoothing and denoising

The main sources of noise in Raman spectra of biological samples are shot noise, fluorescence background, flicker noise, dark current, and thermal noise. Thermal noise and dark signal can be reduced by using a high-quality Raman system with a thermoelectrically cooled spectrometer. In most Raman spectra, shot noise, which arises from the particle nature of light, is the predominant noise source. The shot noise associated with a measurement of n counts is approximately n<sup>1/2</sup>, so the signal-to-noise ratio (S/N ≈ n/n<sup>1/2</sup> = n<sup>1/2</sup>) improves as the number of counts n increases. In other words, S/N can be improved by increasing the integration (averaging) time: the signal grows in proportion to time, whereas the shot noise grows only as its square root. Many noise removal techniques can be applied to Raman spectra. Smoothing is often used to remove high-frequency components from Raman spectra, on the assumption that noise appears as high-frequency fluctuations whereas the signal is of low frequency. One smoothing technique is Fourier filtering [3]. In this technique, the higher frequency components, which are considered to be only noise, are removed, and the lower frequency components are used to reconstruct a noise-free Raman spectrum. One drawback of this method is that removing the higher frequency components can introduce artifacts and distortion in the Raman spectra. A commonly used smoothing technique is Savitzky-Golay (SG) filtering, a moving window-based local polynomial fitting procedure [4]. As the moving window size increases, some Raman bands may disappear. It is therefore very important to choose appropriate parameters, such as the polynomial order and the moving window size, to avoid losing Raman information.
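The shot-noise argument and the two smoothing filters described above can be illustrated with a short NumPy sketch. By the n<sup>1/2</sup> rule, a band with 10,000 counts has a shot-noise-limited S/N of about 100, and quadrupling the integration time doubles it. The function names, the mirrored-edge handling in the SG filter, and the `keep_fraction` cut-off used for Fourier filtering are illustrative assumptions, not parameters prescribed by the cited methods. In practice, `scipy.signal.savgol_filter` provides a tested Savitzky-Golay implementation.

```python
import numpy as np


def shot_noise_snr(counts):
    """Shot-noise-limited S/N for n detected counts: n / sqrt(n) = sqrt(n)."""
    return np.sqrt(counts)


def savitzky_golay(y, window, order):
    """Savitzky-Golay smoothing: fit a polynomial of degree `order` in each
    moving window of `window` points and keep its value at the window centre.

    Edges are handled by mirroring the spectrum (an assumption; other
    edge rules exist).
    """
    y = np.asarray(y, dtype=float)
    if window % 2 == 0 or window <= order:
        raise ValueError("window must be odd and larger than order")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix of the local polynomial: columns offsets**0 ... offsets**order
    design = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps window values to the fitted value at offset 0
    weights = np.linalg.pinv(design)[0]
    padded = np.pad(y, half, mode="reflect")
    return np.correlate(padded, weights, mode="valid")


def fourier_lowpass(y, keep_fraction):
    """Fourier filtering: zero the high-frequency components and transform back."""
    y = np.asarray(y, dtype=float)
    coeffs = np.fft.rfft(y)
    cutoff = max(1, int(len(coeffs) * keep_fraction))
    coeffs[cutoff:] = 0.0  # components above the cut-off are treated as noise
    return np.fft.irfft(coeffs, n=len(y))
```

A larger `window` (or a smaller `keep_fraction`) smooths more aggressively, which is exactly the trade-off discussed above: narrow Raman bands are attenuated along with the noise.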
Other smoothing methods include locally weighted scatterplot smoothing (LOWESS) [5] and wavelet filtering [6]. In wavelet filtering, the spectrum is decomposed using the discrete wavelet transform, which isolates the noise by localizing it in space and frequency. Once isolated, the noise can be set to zero, and the inverse wavelet transform is used to reconstruct the data. In all of these methods, parameters must be chosen carefully so that important Raman bands are not eliminated during smoothing.
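As a minimal illustration of the wavelet approach, the sketch below uses the simplest wavelet (Haar) with hard thresholding. Detail coefficients smaller than a chosen threshold are treated as noise and set to zero before the inverse transform. The Haar choice, the number of levels, and the threshold value are assumptions for illustration only. Practical Raman work usually relies on smoother wavelet families, such as those provided by the PyWavelets library.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar wavelet transform (len(x) must be even)."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)  # low-frequency (smooth) part
    detail = (even - odd) / np.sqrt(2)  # high-frequency part, where noise concentrates
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar transform."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x


def wavelet_denoise(y, levels, threshold):
    """Decompose, zero small detail coefficients (hard thresholding), reconstruct."""
    approx = np.asarray(y, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        # Coefficients below the threshold are treated as noise and set to zero
        details.append(np.where(np.abs(detail) > threshold, detail, 0.0))
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx
```

With `threshold=0` the spectrum is reconstructed unchanged. Raising the threshold removes more noise, but it can also remove weak, narrow Raman bands, which is the parameter-selection risk noted above.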

#### 3.2. Background removal

As mentioned in the last section, one noise source in biological Raman spectra is the fluorescence background. This intrinsic fluorescence emission is several orders of magnitude more intense than the Raman scattering of biological tissues. The fluorescence therefore appears as a strong, broad background that obscures the Raman signals and must be removed before the Raman spectra can be analyzed. Background elimination has been performed using two approaches: experimental and computational. Experimental methods involve changes in the instrumentation and include shifted excitation [7], photobleaching [8], and time gating [9]. Their drawbacks are relatively complex instrumentation, long acquisition times, and alterations of the sample that can make the analysis of biological samples difficult. Computational approaches, on the other hand, are easy to implement, inexpensive, and fast. Such methods include polynomial fitting [10–12], Fourier transform [13], wavelet transform [13], first- and second-order differentiation [14], multiplicative signal correction [15], linear programming [16], a geometric approach [17], asymmetric least squares [18], methods based on iteratively reweighted quantile regression [19], iterative exponential smoothing [20], and morphological operators [21, 22]. The most widely used method, however, is polynomial fitting, because of its simplicity. In this method, a polynomial is fitted to the spectrum and then subtracted from it to eliminate the background. The choice of polynomial order is extremely important: a high-order polynomial may treat Raman bands as part of the background and may be affected by high-frequency noise. To address this issue, several modified polynomial fitting methods have been proposed. Figure 2 shows in vivo Raman spectra of mouse skin tissue with and without fluorescence removal using the polynomial fitting method.
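The polynomial approach, and the iterative clipping idea behind many modified polynomial methods (often called ModPoly), can be sketched with NumPy's `Polynomial.fit`. This method rescales the wavenumber axis internally for numerical stability. The sketch is generic and uses assumed defaults (polynomial order 5, up to 100 iterations); it is not the implementation used in the cited studies. With `n_iter=1` it reduces to the plain single polynomial fit described above.

```python
import numpy as np


def modpoly_baseline(wavenumbers, intensities, order=5, n_iter=100, tol=1e-6):
    """Estimate a fluorescence baseline by iterative polynomial fitting.

    Each pass fits a polynomial to the working spectrum and clips every
    point that lies above the fit down to the fit, so Raman peaks are
    progressively excluded from the background estimate. With n_iter=1
    this is the plain single polynomial fit.
    """
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(intensities, dtype=float).copy()
    baseline = y.copy()
    for _ in range(n_iter):
        # Polynomial.fit maps x onto [-1, 1] internally, which keeps
        # high-order fits over a wavenumber axis well conditioned.
        fit = np.polynomial.Polynomial.fit(x, y, order)
        baseline = fit(x)
        clipped = np.minimum(y, baseline)
        if np.linalg.norm(clipped - y) <= tol * np.linalg.norm(y):
            break  # nothing left above the fit: converged
        y = clipped
    return baseline
```

The background-corrected spectrum is `intensities - modpoly_baseline(wavenumbers, intensities)`. The `order` argument is the polynomial order discussed above and needs the same care in its choice.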

One such modified method, the algorithm proposed by Zhao et al. [11] and also known as the Vancouver Raman algorithm (VRA), is widely used for baseline correction in biomedical applications because of its effectiveness and simplicity. Its main advantage is that it accounts for the effects of noise and for the contribution of the Raman signal when estimating the baseline.
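The simplified sketch below loosely follows the published description of the I-ModPoly approach by Zhao et al. [11]. The standard deviation of the fit residual (DEV) acts as a noise allowance. Points lying more than DEV above the first fit are dropped as probable Raman peaks, and iteration stops when DEV changes by less than 5%. This paraphrase is not the authors' code. Details such as how DEV is computed, and which points are reconstructed, are assumptions that may differ from the original algorithm.

```python
import numpy as np


def imodpoly_baseline(wavenumbers, intensities, order=5, max_iter=100, stop=0.05):
    """Simplified I-ModPoly-style baseline estimate (illustrative paraphrase).

    DEV, the standard deviation of the fit residual, is used as a noise
    allowance: points more than DEV above the first fit are excluded as
    probable Raman peaks, and on every pass points above (fit + DEV) are
    pulled down to (fit + DEV). Iteration stops when DEV changes by less
    than `stop` (relative).
    """
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(intensities, dtype=float).copy()
    keep = np.ones(y.shape, dtype=bool)
    prev_dev = None
    baseline = y.copy()
    for i in range(max_iter):
        fit = np.polynomial.Polynomial.fit(x[keep], y[keep], order)
        baseline = fit(x)
        dev = np.std(y[keep] - baseline[keep])
        if dev == 0:
            break  # exact polynomial: nothing left to remove
        if i == 0:
            # Peak removal: drop points lying more than DEV above the first fit
            keep = y <= baseline + dev
        # Reconstruction: clip intensities to at most (fit + DEV)
        y = np.minimum(y, baseline + dev)
        if prev_dev is not None and abs(dev - prev_dev) / dev < stop:
            break
        prev_dev = dev
    return baseline
```

Compared with the plain iterative clipping in ModPoly, the DEV allowance keeps the baseline from being dragged down into the noise. Excluding the peak points after the first pass also limits how strongly intense Raman bands pull the fit upward.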


Raman Spectroscopy for In Vivo Medical Diagnosis http://dx.doi.org/10.5772/intechopen.72933 297

Figure 2. In vivo Raman spectra of skin with fluorescence (top) and without fluorescence (bottom) using the polynomial fitting method.

#### 3.3. Normalization

Raman spectra from the same sample can have different intensity levels if they are acquired at different times or under different experimental conditions, such as changes in laser power. Normalization compensates for these differences by making the intensity of a given Raman band of the same material the same, or as similar as possible, in all spectra recorded under the same experimental parameters. One approach is normalization to area, in which the spectrum is scaled so that the total area under it is 1.0. A closely related approach, vector normalization, divides the intensity at each frequency by the square root of the sum of the squares of all intensities. These normalizations are useful when the spectra do not share a common band. Their advantage is that they do not depend on any single band; their disadvantage is that the background can contribute to the normalization [1]. Another approach is peak normalization, which uses the intensity at the central frequency of a particular Raman band as a reference (internal or external). The 1660 cm<sup>-1</sup> (amide I) and 1450 cm<sup>-1</sup> (C-H vibrations) bands are commonly used as references because their intensities are not significantly affected by other changes in the sample [23]. This method assumes that the reference band does not change from one spectrum to another. It is therefore not suitable when the nature of the samples could lead to a shift in the band position.

4. Chemometrics

Chemometrics uses mathematical and statistical methods to extract chemical or physical information from chemical data or, in the case considered here, spectroscopic data. One way to identify components in a sample is to use individual bands. This is not the best option, however, because a single band is not specific to one molecule: many molecules have a band at the same position. A more precise identification uses multiple bands or the complete spectrum. Such an approach treats each point in a spectrum as a variable, and spectroscopic data can be displayed as a matrix where columns represent the
