*Microalgae - From Physiology to Application*

Till now the best taxonomic differentiation is still obtained using classical inverted microscopy. Unfortunately, this method is time-consuming, human based, and requires appropriate technical skills; this eliminates the possibility of its application for continuous online monitoring. Nearly single-cell flow cytometric analysis, based on light scattering by the cells and fluorescence of the chlorophylls and the phycobilins, can be easily automated, but it is appropriate only for unicellular species and is useless for numerous industrially cultured filamentous strains [7, 8]. The main problem of all chemical methods (e.g., high-performance liquid chromatography (HPLC) [9, 10]) is that during chemical sample preparation most of the information about the peculiarities of individual species is lost, and the residual information is not enough for species/strain discrimination inside cyanobacterial genera and is suitable only for rough differentiation of large classes of phytoplankton. Thus, the analysis of in vivo fluorescence spectra is the only noninvasive technique for obtaining qualitative information about phytoplankton abundance and composition, as is continuously demonstrated by various publications [10–17]. The relative phytoplankton abundance can be calculated once initial assumptions about the phytoplankton classes are presented and their pigment compositions have been made [7, 12, 13].

Maybe the first attempt to use phycoerythrins as chemotaxonomic markers was made by Glazer et al. [18] for red algae in 1982, but until now fluorescence spectra of phycobilins do not appear to be useful at familial, ordinal, and class levels in taxonomic studies. Although the investigation in [18] concerns only purified high-molecular-weight phycoerythrin from red algae, this work clearly demonstrates the possibility of correct taxonomic analysis on the basis of structural differences in phycobiliproteins, which can serve as intrinsic fingerprints for taxa and genera in phytoplankton diversity. Later, the correlation between the distribution of the biliproteins and the genera of Cryptophyceae was discussed in [19]. In 1985, Yentsch and Phinney [20] proposed an ataxonomic technique that utilized the spectral fluorescence signatures of major ocean phytoplankton. Seppälä [16] used spectral fluorescence signals to detect changes in the phytoplankton community. In 2002, Beutler et al. reported a reduced model of the fluorescence from the cyanobacterial photosynthetic apparatus designed for the in situ detection of cyanobacteria and presented a commercially available diveable instrument for online monitoring of phytoplankton structure [11].

However, the correct classification of cyanobacterial species on the basis of their bulk fluorescence signature is hampered by alterations in pigment composition within one strain, which depend on the physiological state of the culture (community) and environmental conditions [21]. On the other hand, several researchers show that nutrient and light limitations do not significantly change the initial fluorescence spectra and cannot impede species discrimination [17, 22]. The recent rapid development of confocal microscope functionality has opened new directions in subcellular biology research [23, 24]. Confocal laser scanning

Self-fluorescence originates from excited states that were lost before photochemistry took place and usually represents a small fraction of the excited state decay in a functional photosynthetic complex. Nevertheless, this small fraction can be easily detected by confocal laser scanning microscopy (CLSM). With confocal fluorescence microscopy, very small excitation and detection areas can be investigated, so that single cells can be studied in vivo under nondamaging conditions. Single-cell detection can provide information on small peculiarities that is regularly buried in normal ensemble-average experiments. This is thus a good way to study the time evolution and spectroscopic properties of individual cells. Both steady-state and time-resolved fluorescence measurements can be used for probing the organization and functioning of photosynthetic systems by means of CLSM.

**2. Materials and methods**

#### **2.1 Cyanobacterial strains and cultivation conditions**

All work on preparing cyanobacterial cultures for this research was carried out at the Core Facility Center "Centre for Culture Collection of Microorganisms" of the Research Park of St. Petersburg State University. In the CALU collection [25] at the core facility center, cyanobacterial strains were maintained in semiliquid agar (0.8%) medium no. 6 after Gromov [26] in 5–6 mL test tubes under cotton plugs. The strains were stored at 14°C under constant illumination of 2000 lux and were recultivated every 2–3 months.

Cyanobacteria used in this investigation were grown in liquid medium no. 6. A stock culture was first prepared in 30 mL of medium and incubated for 2 weeks at room temperature under continuous illumination from fluorescent lamps. To maintain a constant volume, 5 mL of medium was added to the stock culture every 2 weeks. All experiments in this study were conducted with cultures presumably in the logarithmic phase of growth.

In this work, 23 cyanobacterial strains from the CALU collection were used:

1. *Anabaena variabilis* Kutz. CALU 824, ponds near Old Petergof, Saint Petersburg, Russia.


2. *Arthrospira* sp. CALU 1712, Gulf of Finland, Saint Petersburg, Russia.

3. *Geitlerinema* sp. CALU 1315, Lake Kyzyl-Tash, Ozersk, Chelyabinsk, Russia.

4. *Geitlerinema* sp. CALU 1718, Lake Kamenetz, Pskov, Russia.

5. *Leptolyngbya* sp. CALU 1713, river Tikhaya, Saint Petersburg, Russia.

6. *Leptolyngbya* sp. CALU 1715, Gulf of Finland, Saint Petersburg, Russia.

7. *Leptolyngbya* sp. CALU 1750, Lake Tarasovskoe, Saint Petersburg, Russia.

8. *Lyngbya* sp. CALU 1804, Lake Valdai, Novgorod, Russia.

9. *Merismopedia* sp. CALU 666 *punctata* Meyen f., Pinar del Rio, Rio de Soroa, Cuba.

10. *Microcystis firma* sp. CALU 398 (Breb. et Lenorm) Schmidle, Turkmenbashi Canal, Turkmenistan.

11. *Myxosarcina chroococcoides* Geitl. sp. CALU 601, Russia.

12. *Nostoc* sp. CALU 1763, Lake Ladoga, Saint Petersburg, Russia.

13. *Nostoc* sp. CALU 1817, springs on the Island Big Solovetsky, White Sea, Russia.

14. *Oscillatoria* sp. CALU 1415, ponds in Vorkuta region, Russia.

15. *Oscillatoria* sp. CALU 1416, ponds in Vorkuta region, Russia.

16. *Phormidium favosum* CALU 624, brook Ammersbek, Hamburg, Germany.

17. *Plectonema* sp. CALU 457, pond in Strelna, Saint Petersburg, Russia.

18. *Pleurocapsa* sp. CALU 1126, Lake Ladoga, Saint Petersburg, Russia.

19. *Spirulina platensis* sp. CALU 550 (Nordst.), Czech Republic.

20. *Synechococcus* sp. CALU 535, ponds near Old Petergof, Saint Petersburg, Russia.

21. *Synechococcus* sp. CALU 756, Czech Republic.

22. *Synechococcus* sp. CALU 1409, ponds in Vorkuta region, Russia.

23. *Synechocystis* sp. CALU 1336 *aquatilis*, Gulf of Finland, Saint Petersburg, Russia.

Fluorescent and corresponding transmission photomicrographs, obtained via CLSM, for several strains from the CALU collection are presented in **Figure 1**. In further illustrations, only the CALU numbers of the corresponding strains are used for clarity.

**Figure 1.**

*CLSM fluorescent and transmission photomicrographs for eight cyanobacterial strains from CALU collection. The white bar indicates the object scale.*

*Self-Fluorescence of Photosynthetic System: A Powerful Tool for Investigation of Microalgal…*

*DOI: http://dx.doi.org/10.5772/intechopen.88785*

#### **2.2 Confocal laser scanning microscopy**

Confocal laser scanning microscopes are distinguished by their high spatial and temporal resolution [23, 24]. Modern laser scanning microscopes are unique tools for visualizing cellular structures and analyzing dynamic processes inside single cells. They exceed classical light microscopes especially in their axial resolution, which makes it possible to acquire optical sections (slices) of a specimen. Apart from simple imaging, confocal laser scanning microscopes are designed for the quantification and analysis of image-coded information. Among other things, they allow easy determination of fluorescence intensities, distances, areas, and their changes over time. Newer CLSM acquisition tools include the detection of quantitative properties of the emitted light, such as spectral signatures and fluorescence lifetimes. The most impressive feature of modern CLSMs is their capability for single-cell microscopic spectroscopy, which makes it possible to obtain spectroscopic information from inside single cells and small regions.

In the present investigation, a Leica TCS-SP5 microscope was used to study living cyanobacterial cells. Fluorescence emission spectra of the intact cells were measured at eight excitation wavelengths corresponding to all available laser lines: 458, 476, 488, 496, and 514 nm are the lines of the Ar laser, 405 nm is the line of the diode UV laser, and 543 and 633 nm are the lines of the HeNe laser. In all presented experiments, the laser power settings were as follows: 29% of the Ar laser power was reflected onto the sample with an acousto-optical tunable filter (AOTF), and the further power percentage for its laser lines was 30% for the 458 nm line and 10% for all other lines. The 405 nm line of the diode UV laser was reflected onto the sample at 3%; the HeNe laser lines at 543 and 633 nm were reflected at 10 and 2%, respectively. An acousto-optical beam splitter (AOBS) was used to transmit sample fluorescence to the detector.

Emission spectra between 520 and 785 nm were recorded using the lambda scan function of the "Leica Confocal Software" by sequentially acquiring a series ("stack") of 38–45 images, each with a 6 nm fluorescence detection bandwidth and a 6 nm wavelength step. For obtaining fluorescence-intensity information, images of 512 × 512 pixels were collected with a 63× glycerol immersion lens (glycerol 80% H2O) with a numerical aperture of 1.3 (objective HCX PL APO 63.0 × 1.30 GLYC 37°C UV) and an additional digital zoom factor of 5–9 (depending on the cyanobacterial strain). One pixel corresponds to 53.5 × 53.5 nm. The photomultiplier (PMT) voltages ranged from 900 to 1100 V. The fluorescence emission images were accompanied by transmission images (in the parallel channel), collected by a transmission detector with photomultiplier voltages ranging from 300 to 500 V. For better signal yield, lambda scans were performed with the "low speed" setting (400 Hz) in bidirectional scan mode and with a pinhole setting of 1 Airy unit (the inner light circle of the diffraction pattern of a point light source, which corresponds to a diameter of 102.9 μm with the lens used; see [23]). Regions of interest (ROIs) representing single cells or subcellular regions were used to calculate fluorescence spectra.
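The lambda-scan settings above can be checked with a little arithmetic. The following minimal sketch assumes that the 6 nm detection windows are contiguous and tiled upward from 520 nm; the function name `lambda_scan_windows` is hypothetical, and the actual window placement in the Leica software may differ.

```python
def lambda_scan_windows(start_nm=520.0, stop_nm=785.0,
                        bandwidth_nm=6.0, step_nm=6.0):
    """Return (low, high) detection windows lying entirely inside the range.

    Assumption: windows start at start_nm and advance by step_nm.
    """
    windows = []
    low = start_nm
    while low + bandwidth_nm <= stop_nm:
        windows.append((low, low + bandwidth_nm))
        low += step_nm
    return windows


windows = lambda_scan_windows()
# Center wavelength of each detection window, in nm.
centers = [(low + high) / 2 for low, high in windows]
print(len(windows), centers[0], centers[-1])  # 44 523.0 781.0
```

Under these assumptions the full 520–785 nm range yields 44 images per stack, which falls within the reported 38–45 images.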

For 2D imaging, to raise the sensitivity and contrast, images were recorded at the 405 nm excitation wavelength (diode UV laser) with a Leica HyD hybrid detector, which strongly improves contrast in comparison to PMTs. The HyD gain was set to 100 V. Images of 1024 × 1024 and 2048 × 2048 pixels were collected with a 63× glycerol immersion lens (glycerol 80% H2O) with a numerical aperture of 1.3 (objective HCX PL APO 63.0 × 1.30 GLYC 37°C UV) and an additional digital zoom factor of 10–35. The fluorescence emission images were accompanied by transmission images (in the parallel channel). The images were recorded with a pinhole setting of 1 Airy unit.

In CLSM applications, the laser light density at the focal point is high, but it is generally deposited in short "dwell times" during the laser scanning process. Dwell time and the intervals between illuminations may influence photodamage and saturation of the photosynthetic apparatus of living cells. Thus, since most chromophores bleach under high laser excitation energies, a bleach test should be performed [27]. It was shown experimentally that phycoerythrin (PE) and phycocyanin (PC), as accessory pigments, were especially sensitive to photobleaching, while the fluorescence of chlorophyll a (Chl a) and allophycocyanin (APC) remained stable in intact living cells [27]. During detection, the fluorescence of the main accessory pigments of each cyanobacterial strain should be monitored, and the changes in their fluorescence should not exceed 10–20%.

In this investigation, the power of individual laser lines was chosen according to the photodamage they cause. Repeated spectra were obtained under the selected excitation power at a fixed point in a cell to check whether the excitation would affect the cells. At the excitation energies chosen above (laser line percentages), the fluorescence spectra varied only within the experimental error during 10–15 records. When the excitation energy was increased, both the height and the center of the bands varied strongly with time because of photodamage or structural breakdown in the photosynthetic systems. In experiments where several laser lines were used, the first spectrum was recorded again at the end of each series to control the initial state of the cell. It should be pointed out that the whole procedure of fluorescence spectra recording used in this study was designed to minimize preparatory manipulation, so as to conduct a noninvasive investigation of small amounts of experimental material and to prevent any damage to living cells.
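The bleach-test criterion described above can be expressed as a simple check on repeated records. This is a minimal sketch, not the authors' procedure: it assumes that the change is measured on a single intensity value per record (for example, the peak of an accessory-pigment band) relative to the first record, and it uses the stricter 10% end of the stated 10–20% range as the default. The function names and the sample numbers are hypothetical.

```python
def max_relative_change(intensities):
    """Largest deviation from the first record, as a fraction of that record."""
    reference = intensities[0]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return max(abs(value - reference) / reference for value in intensities)


def passes_bleach_test(intensities, tolerance=0.10):
    """True if no record deviates from the first one by more than tolerance."""
    return max_relative_change(intensities) <= tolerance


# Invented example records (arbitrary units), for illustration only.
stable_records = [100.0, 98.0, 101.0, 97.0, 99.0]
bleached_records = [100.0, 90.0, 78.0, 65.0, 52.0]

print(passes_bleach_test(stable_records))    # True
print(passes_bleach_test(bleached_records))  # False
```

Re-recording the first spectrum at the end of a series, as described in the text, corresponds to appending one more record to the list before running the check.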

To exclude unpredictable variations in the physiological state of the investigated cultures, fluorescence spectra were taken from cells of one strain several times, on different days and at various developmental stages of the culture. The variations in spectrum shape and intensity among cells of one strain were found to be insignificant.

#### **2.3 Data processing**


#### *2.3.1 Data preprocessing*

The main difficulty of the discrimination problem considered here resulted from the high nonuniformity of the initial data and the different numbers of observations for different strains. The small size of the initial dataset, as well as the sophisticated nature of the experimental data, required a complex preprocessing procedure. The original experimental data comprise 307 sets of self-fluorescence spectra obtained from cyanobacterial cells belonging to 23 different strains. Each observation in the dataset is described by a series of seven spectra taken from a single cell by means of CLSM. Each initial spectrum is an array of 38–45 numbers, which correspond to the fluorescence intensities at specific emission wavelengths in the range from 520 to 785 nm. In contrast to previous investigations, which used the full spectrum of the samples for classification [12–14], we used a set of integral and statistical characteristics describing the shape of each spectrum.

To extract a set of classification parameters from the initial data, a computer program was developed in MATLAB. By means of this program, normalization, interpolation, extrapolation, and smoothing of the raw spectra were carried out to eliminate random noise and measurement fluctuations. All spectra were reduced to the same scale and data array size, the first derivative of the initial spectra was taken, and the fast Fourier transform (FFT) was performed to exclude random noise caused by the low intensity of the exciting and emitted light. The specific values characterizing the shape of the obtained curves and the spectral composition of their derivatives were then calculated.

All selected classification parameters can be divided into three groups: (i) asymmetry and excess (skewness and kurtosis excess); (ii) the fluorescence emission percentage for individual pigments in four main spectral regions (phycoerythrin, 573–586 nm; phycocyanin and allophycocyanin, 649–661 nm; chlorophyll a PSII, 674–689 nm; chlorophyll a PSI, 711–727 nm); and (iii) the frequency characteristics of the corresponding first-derivative Fourier transforms for each plot (mean values in three specified frequency domains: 43–58 μm⁻¹, 95–110 μm⁻¹, and 123–135 μm⁻¹). A detailed description of the extraction of classification parameters is given in Zhangirov et al. [28].
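The three parameter groups can be illustrated with a short sketch. This is not the authors' MATLAB program [28]: it is written in Python with the standard library only, it omits interpolation and smoothing, it treats the normalized spectrum as a distribution over wavelength when computing asymmetry and excess, and it uses a plain discrete Fourier transform in place of an FFT. All function names and the synthetic two-band spectrum are assumptions made for illustration.

```python
import cmath
import math


def normalize(intensities):
    """Scale a spectrum so that its intensities sum to 1."""
    total = sum(intensities)
    return [value / total for value in intensities]


def shape_moments(wavelengths, weights):
    """Skewness and excess kurtosis of a normalized spectrum,
    treated as a distribution over wavelength (an assumption)."""
    mean = sum(w * x for x, w in zip(wavelengths, weights))
    variance = sum(w * (x - mean) ** 2 for x, w in zip(wavelengths, weights))
    std = math.sqrt(variance)
    skewness = sum(w * ((x - mean) / std) ** 3 for x, w in zip(wavelengths, weights))
    excess = sum(w * ((x - mean) / std) ** 4 for x, w in zip(wavelengths, weights)) - 3.0
    return skewness, excess


def band_fraction(wavelengths, weights, low_nm, high_nm):
    """Share of the total emission that falls inside [low_nm, high_nm]."""
    return sum(w for x, w in zip(wavelengths, weights) if low_nm <= x <= high_nm)


def first_derivative(values, step):
    """Forward finite-difference derivative on a uniform grid."""
    return [(b - a) / step for a, b in zip(values, values[1:])]


def dft_magnitudes(values):
    """Magnitudes of the non-negative-frequency DFT terms (naive O(n^2) sum)."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(values)))
        for k in range(n // 2 + 1)
    ]


# Synthetic spectrum on a 6 nm grid: two Gaussian bands (invented numbers).
wavelengths = [523.0 + 6.0 * i for i in range(44)]
raw = [
    math.exp(-((x - 655.0) / 12.0) ** 2) + 0.6 * math.exp(-((x - 685.0) / 10.0) ** 2)
    for x in wavelengths
]
weights = normalize(raw)

skewness, excess = shape_moments(wavelengths, weights)
bands = {
    "PE": (573, 586),
    "PC/APC": (649, 661),
    "Chl a 674-689": (674, 689),
    "Chl a 711-727": (711, 727),
}
fractions = {
    name: band_fraction(wavelengths, weights, low, high)
    for name, (low, high) in bands.items()
}
derivative_spectrum = dft_magnitudes(first_derivative(weights, 6.0))
```

Averaging `derivative_spectrum` over selected index ranges would give frequency-domain parameters analogous to the third group; the mapping from DFT index to the μm⁻¹ domains quoted in the text depends on grid details that are not specified here.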

#### *2.3.2 Linear discriminant analysis*

Linear discriminant analysis (LDA) is well known and often applied in biology to various classification problems [15, 17, 29, 30]. It is a statistical technique for classifying samples into two or more groups (classes) [31, 32] that uses linear combinations of independent variables to form the basis of a classification scheme. In our case, the independent variables are the 63 classification parameters extracted from each set of single-cell self-fluorescence spectra.

LDA builds n linear discriminant functions, where n is the number of classes, and each observation is described by a row vector of classification parameters. A sample is assigned to the class whose discriminant function takes the maximal value for the sample's row vector. Discriminant analysis has two very useful applications. First, it identifies the set of classification parameters needed to discriminate between known groups; that is, it identifies which parameters are necessary to discriminate between known cyanobacterial strains. Second, the analysis can be used to classify an unknown sample (within a certain probability) into a known group of species or strains. The high classification accuracy of LDA is due to the fact that it works with the distribution functions of the classification parameters and their statistical characteristics, which makes it possible to build a better classification model. However, LDA imposes strong restrictions on correlations between classification parameters.
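For reference, under the textbook assumptions of Gaussian classes with a shared covariance matrix, the discriminant function of class $k$ and the resulting decision rule take the standard form below. The symbols are introduced here for illustration and do not come from the source: $\mathbf{x}$ is the parameter vector of a sample (written as a column vector), $\boldsymbol{\mu}_k$ is the mean vector of class $k$, $\Sigma$ is the pooled within-class covariance matrix, and $\pi_k$ is the prior probability of class $k$. The custom implementation used in this work [33] may differ in detail.

$$
\delta_k(\mathbf{x}) = \mathbf{x}^{\mathsf{T}} \Sigma^{-1} \boldsymbol{\mu}_k
- \tfrac{1}{2}\, \boldsymbol{\mu}_k^{\mathsf{T}} \Sigma^{-1} \boldsymbol{\mu}_k
+ \ln \pi_k ,
\qquad
\hat{k} = \arg\max_{k} \, \delta_k(\mathbf{x})
$$

The need to invert $\Sigma$ is one way to see why strongly correlated parameters are a problem: they make $\Sigma$ nearly singular.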

In addition, LDA makes it possible to reduce the dimension of the feature space. This so-called linear Fisher discriminant analysis (LFDA) is a data classification method that divides the samples into groups whose boundaries are determined by threshold coordinate values. The goal of the method is to find informative projections by maximizing a function constructed from the projection matrix, the between-class scatter matrix, and the within-class scatter matrix. In this procedure, the first component (canonical discriminant function) carries the largest discriminating power, and the classification is performed in the three-dimensional space defined by the three largest components. The selection of the best classification parameters is based on the criterion that the dissimilarity between classification parameters of different species/strains should be greater than that within the same group. In practice, LFDA is based on the solution of an eigenvalue problem: the eigenvectors with the highest eigenvalues are used to construct a lower-dimensional space, while the other dimensions are neglected.
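In its standard textbook form, the function being maximized and the associated eigenvalue problem can be written as follows. The notation is an assumption added for illustration: $W$ is the projection matrix, $S_B$ and $S_W$ are the between-class and within-class scatter matrices, $\boldsymbol{\mu}_k$ and $n_k$ are the mean and size of class $k$, and $\boldsymbol{\mu}$ is the overall mean.

$$
J(W) = \frac{\left| W^{\mathsf{T}} S_B W \right|}{\left| W^{\mathsf{T}} S_W W \right|},
\qquad
S_B = \sum_{k} n_k (\boldsymbol{\mu}_k - \boldsymbol{\mu})(\boldsymbol{\mu}_k - \boldsymbol{\mu})^{\mathsf{T}},
\qquad
S_W = \sum_{k} \sum_{i \in k} (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^{\mathsf{T}}
$$

$$
S_B \mathbf{w} = \lambda \, S_W \mathbf{w}
$$

The columns of $W$ are the eigenvectors $\mathbf{w}$ with the largest eigenvalues $\lambda$; keeping three of them gives the three-dimensional space mentioned in the text.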

Stepwise discriminant analysis (SDA) was also used in this investigation, at the stage of selecting the most valuable classification parameters, to determine which parameters discriminate best between the specified groups of observations. The standardized coefficient of each variable in each discriminant function represents the contribution of the respective parameter to the discrimination between groups.

The calculations were performed in MATLAB software using custom-built programs [33].

#### *2.3.3 Artificial neural network*

Artificial neural networks (ANNs) are currently being used in a variety of applications with great success [8, 34–36]. In contrast with conventional programs for data analysis, neural networks follow an adaptive approach. They are flexible and eminently suited for application to complex data structures that are not apt for other data analysis methods like cluster analysis or principal component analysis. Their first main advantage is that they do not require a user-specified problem solving algorithm (as is the case with classic programming), but instead they "learn" from examples, much like human beings. Their second main advantage is that they possess an inherent generalization ability. This means that they can identify and respond to patterns that are similar but not identical to the ones on which they have been trained.

An ANN can be described as a mathematical model of a specific structure, consisting of a number of single processing elements (called artificial neurons) arranged in interconnected layers. An active neuron multiplies each input by its weight, sums the products, and passes the sum through a transfer function to produce the output [37]. The ANN is made up of a group of interconnected artificial neurons belonging to different layers, while neurons inside one layer are independent. The ANN consists of input, hidden, and output layers. Each neuron transforms its input and sends outputs to the other neurons to which it is connected.

There are many fundamentally different types and architectures of neural networks. In this paper, a feed-forward ANN (FFANN) is used for solving the classification problem considered [34, 37]. **Figure 2** illustrates the model of the ANN used in this work. Due to the simplicity of the classification problem to be solved, a multilayer feed-forward neural network (NN) with one hidden layer was considered. A hyperbolic tangent was used as the activation function at the hidden layer and the Softmax function at the output layer, which allows the output layer to be interpreted as the distribution of probabilities of belonging to each of the classes. On both layers, a bias neuron with a signal equal to unity is added. The size of the input layer ($N_\mathrm{in}$) depends on the number of classification parameters. The number of neurons in the output layer was fixed and equal to the number of classes ($N_\mathrm{out} = 16$). The number of neurons in the hidden layer was estimated by the relation $N_\mathrm{h} \sim \sqrt{N_\mathrm{in} N_\mathrm{out}}$ [37].

Learning in ANNs is accomplished through special training algorithms developed on the basis of learning rules presumed to mimic the learning mechanisms of biological systems. In supervised learning, the network is trained with a dataset of observations and optimized based on its ability to predict a set of known outcomes. The deviation of the network solution from the target (true) value is computed, and the error is propagated backward from the output layer to adjust the connection weights. Since in our case the activation function at the output layer was Softmax, the loss function was calculated via the cross-entropy method. Many training algorithms have been developed according to these learning rules; in this investigation, the method of adaptive moment estimation (Adam) was chosen for further calculations [38].

In the training phase, a sample set of classification parameters and the known solution (the strain number of the corresponding cell) are presented iteratively to the network. The neuron weights (ANN parameters) are adjusted in small steps until the network has learned the training examples. In the experiments described

**Figure 2.**

*The multilayer feed-forward artificial neural network. The input classification parameters are fed to the input layer of ANN, and signals are propagated through the network via internal neurons to the output layer. In this way input signal pattern and output signal pattern are associated with each other.*

Learning in ANNs is accomplished through special training algorithms developed based on learning rules presumed to mimic the learning mechanisms of biological systems. According to supervised learning, the network is trained with a dataset of observations and optimized basing on its ability to predict a set of known outcomes. The deviation of the network solution from the target (true) value is computed, and the calculation of the error is propagated backward from the output layer to adjust the connection weights. Since in our case the activation function at the output layer was determined as Softmax, the loss function was calculated via cross-entropy method. A lot of special training algorithms were developed according to learning rules. In this investigation the method of adaptive moment estimation (Adam) was chosen for further calculations [38].

In the training phase, a sample set of classification parameters and the known solution (the strain number of the corresponding cell) are forced iteratively upon the network. The neuron's weights (ANN parameters) are adjusted in small steps until the network has learned the training examples. In the experiments described

#### **Figure 2.**

*The multilayer feed-forward artificial neural network. The input classification parameters are fed to the input layer of ANN, and signals are propagated through the network via internal neurons to the output layer. In this way input signal pattern and output signal pattern are associated with each other.*

*Microalgae - From Physiology to Application*

LDA builds n linear discriminant functions, where n is a number of classes and a row vector with a number of parameters describes each observation. The decision of the sample belonging to the class is based on the selecting of the maximal discriminant function for the sample row vector. Discriminant analysis has two very useful applications. First, it identifies a set of classification parameters that are needed to discriminate between known groups, that is, sets of classification parameters can be identified that are necessary to discriminate between known cyanobacteria strains. Second, the analysis can be used to classify an unknown sample (within a certain probability) into a known group of species or strains. The high classification accuracy of LDA is due to the fact that it works with distribution functions for classification parameters and their statistical characteristics, which allows to build better classification model. However, LDA has strong restrictions on

In addition, LDA allows to reduce dimension of the feature space. This so-called linear Fisher discriminant analysis (LFDA) is a data classification method, which classifies the samples by dividing them into groups. The boundaries of these groups are determined by threshold coordinate values. The goal of this method is to find the informative projections by maximizing the function constructed of the projective matrix, the between-class scatter matrix, and the within-class scatter matrix. In this procedure, the first largest component (canonical discriminant function) is the maximal, and the classifications are performed using the three-dimensional space defined by the three largest components. The selection of the best classification parameters is based on the criterion that the dissimilarity between classification parameters of different species/strains should be greater than between those of the same group. Actually, LFDA bases on a solution of eigenvalue problem. The eigenvectors with the first highest eigenvalues are used to construct a lower dimensional

Also a stepwise discriminant analysis (SDA) was used in this investigation at the stage of selection of the most valuable classification parameters to determine which parameters discriminate better between the specified groups of observations. Standardized coefficients for each variable in each discriminated function represent the contribution of the respective parameter to the discrimination between groups. The calculations were performed in MATLAB software using custom-built

Artificial neural networks (ANNs) are currently being used in a variety of applications with great success [8, 34–36]. In contrast with conventional programs for data analysis, neural networks follow an adaptive approach. They are flexible and eminently suited for application to complex data structures that are not apt for other data analysis methods like cluster analysis or principal component analysis. Their first main advantage is that they do not require a user-specified problem solving algorithm (as is the case with classic programming), but instead they "learn" from examples, much like human beings. Their second main advantage is that they possess an inherent generalization ability. This means that they can identify and respond to patterns that are similar but not identical to the ones on which they have

ANN can be described as a mathematical model of a specific structure, consisting of a number of the single processing elements (called artificial neurons), arranged in interconnected layers. An active neuron multiplies each input vector by its weight, sums the products, and passes the sum through a transfer function to produce the output [37]. The ANN is made up of a group of interconnected artificial

the presence of correlations between classification parameters.

space, while the other dimensions are neglected.

**10**

programs [33].

been trained.

*2.3.3 Artificial neural network*

in this study, the training procedure has 500 iterations (epochs). After training, the network is tested. In this test phase, the characteristics of a number of cyanobacterial cells with known identities are fed to the network, and the solutions are compared with these known identities. In this study, after training the network was capable of recognizing about 96% of cyanobacterial cells in the test set. The analysis of generalization quality of ANN is identical to the test procedure; only the identities of the cells are not known beforehand.
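To make the architecture described above more concrete, the following minimal Python sketch computes one forward pass through a network with a single tanh hidden layer, a Softmax output layer, and a bias signal of unity on both layers, and evaluates the cross-entropy loss for one sample. It is an illustration only and not the MATLAB code used in the study: the number of input parameters (`N_IN = 9`), the uniform weight initialization, the rounding applied to the square-root rule, and all function names are assumptions made for the example. Training itself is not shown.

```python
import math
import random


def hidden_size(n_in: int, n_out: int) -> int:
    """Rule of thumb from the text: N_h ~ sqrt(N_in * N_out) (rounding is an assumption)."""
    return max(1, round(math.sqrt(n_in * n_out)))


def init_layer(n_inputs: int, n_neurons: int, rng: random.Random):
    # One weight row per neuron; the extra last weight multiplies the bias signal (= 1).
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


def weighted_sums(weights, inputs):
    extended = list(inputs) + [1.0]  # append the bias neuron signal
    return [sum(w * x for w, x in zip(row, extended)) for row in weights]


def softmax(z):
    shift = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, w_out):
    hidden = [math.tanh(s) for s in weighted_sums(w_hidden, x)]
    return softmax(weighted_sums(w_out, hidden))


def cross_entropy(probs, true_class: int) -> float:
    # Loss for a single sample whose true class index is known.
    return -math.log(probs[true_class])


N_IN, N_OUT = 9, 16  # N_IN = 9 is hypothetical; N_OUT = 16 classes as in the text
N_H = hidden_size(N_IN, N_OUT)

rng = random.Random(0)
w_hidden = init_layer(N_IN, N_H, rng)
w_out = init_layer(N_H, N_OUT, rng)

x = [rng.random() for _ in range(N_IN)]  # stand-in for one cell's parameters
probs = forward(x, w_hidden, w_out)
loss = cross_entropy(probs, true_class=3)
```

With these assumed sizes the rule gives 12 hidden neurons, and `probs` holds 16 values that sum to one, which is what allows the output layer to be read as class membership probabilities.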

The ratio of the training sample to the test sample in this investigation was 70:30. The other parameters of the selected training algorithm were as follows: the acceptable error threshold was 0.01, the bandwidth parameter (size of the error control window) was 20, the moment parameter was 0.1, and the regularization parameter was 0.001. The learning rate was set to 0.01, and the number of training epochs lay in the range from 300 to 800.
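As an illustration of how the learning rate enters the weight update, the sketch below implements a single step of the standard Adam rule in plain Python and applies it to a toy one-parameter problem. Only the learning rate of 0.01 is taken from the text. The decay rates `beta1` and `beta2` and the constant `eps` are the commonly used defaults and are assumptions here, since the text does not state how its moment and regularization parameters map onto Adam's settings. The function name `adam_step` and the toy objective are likewise hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; returns (theta, m, v)."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g      # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g  # second-moment estimate
        m_hat = m_i / (1.0 - beta1 ** t)           # bias correction, t starts at 1
        v_hat = v_i / (1.0 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Toy objective f(w) = w^2 with gradient 2w, standing in for the network loss.
theta, m, v = [1.0], [0.0], [0.0]
history = []
for t in range(1, 501):
    grad = [2.0 * theta[0]]
    theta, m, v = adam_step(theta, grad, m, v, t)
    history.append(theta[0])
```

In a real network `theta` would hold all connection weights and `grad` would come from backpropagating the cross-entropy loss, but the update rule for each weight has the same form.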

The main criterion for assessing the quality of ANN operation is the classification accuracy. There are several approaches to evaluating the accuracy of classification. In the case considered here, the classification accuracy is calculated for each class separately, as the ratio of the number of correctly classified observations of the class to the total number of observations in that class. The average classification accuracy over all classes is then obtained. In this case it is possible to build an error matrix of size N × N (where N is the number of classes) and to present the results in a bar chart in which the classification accuracy for each class can be visualized (see **Figure 8** in "Ataxonomic differentiation of cyanobacterial strains on the base of single-cell fluorescence spectra").
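The per-class accuracy and the N × N error matrix described above can be sketched in a few lines of Python. The labels below are a made-up three-class toy example, not data from the study, and the function names are hypothetical. Rows of the matrix are taken to be true classes and columns predicted classes, which is an assumption about orientation.

```python
def error_matrix(true_labels, predicted_labels, n_classes):
    """N x N matrix: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Correctly classified observations of a class / all observations of that class."""
    accuracies = []
    for k, row in enumerate(matrix):
        total = sum(row)
        # A class with no observations has an undefined accuracy.
        accuracies.append(row[k] / total if total else float("nan"))
    return accuracies


# Hypothetical labels for a 3-class toy example (not data from the study).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]

cm = error_matrix(y_true, y_pred, n_classes=3)
acc = per_class_accuracy(cm)
mean_acc = sum(acc) / len(acc)  # average over classes, each class weighted equally
```

Because each class contributes equally to `mean_acc`, a poorly recognized class with few cells lowers the average as much as a large one, which differs from the overall fraction of correctly classified cells.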

On the basis of the classification accuracy analysis, it is possible to evaluate the quality of ANN training as well as the quality of internal and external generalization. In our case, the quality of external generalization was evaluated on the basis of a priori knowledge about new species provided by an expert. To validate the correctness of the neural network operation, the results of the NN classification were compared with the results of the LDA.

The ANN architecture presented in this paper, as well as the learning algorithm and its parameters, was determined by studying various configurations. After training, the selected model consistently gives a classification accuracy of at least 95%. In this study, the ANN was simulated using MATLAB software [33].
