**Automatic Interpretation of Melanocytic Images in Confocal Laser Scanning Microscopy**

Marco Wiltgen and Marcus Bloice

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/63404

#### **Abstract**


50 Microscopy and Analysis


The incidence of melanoma doubles every 20 years. Early detection of malignant changes improves the success of therapy. Confocal laser scanning microscopy (CLSM) enables the noninvasive examination of skin tissue. To diminish the need for training and to improve diagnostic accuracy, computer-aided diagnostic systems are required. Two approaches are presented: a multiresolution analysis and an approach based on deep layer convolutional neural networks. For the diagnosis of CLSM views, architectural structures such as micro-anatomic structures and cell nests are used as guidelines by the dermatologists. Features based on the wavelet transform enable an exploration of architectural structures at different spatial scales, so that the subjective diagnostic criteria are reproduced objectively. A tree-based machine-learning algorithm captures the decision structure explicitly, and the decision steps are used as diagnostic rules. Deep layer neural networks require no a priori domain knowledge; they are capable of learning their own discriminatory features through the direct analysis of image data. However, deep layer neural networks require large amounts of processing power to learn. Therefore, modern neural network training is performed on graphics cards, which typically possess many hundreds of small, modestly powerful cores that calculate massively in parallel. Readers will learn how to apply multiresolution analysis and modern deep learning neural network techniques to medical image analysis problems.

**Keywords:** confocal laser scanning microscopy, skin lesions, multiresolution image analysis, convolutional neural networks, machine learning, computer-aided diagnosis

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **1. Introduction**

The skin is the largest organ of the body; its surface comprises up to two square meters. It is the organ in direct contact with the environment and is therefore exposed to environmental influences such as solar radiation, temperature and infections. The skin consists of three main layers: the epidermis, the dermis and the hypodermis (subcutis), whereby each layer is subdivided into several sublayers (strata) [1]. As the outermost layer, the epidermis provides a protective barrier at the body's surface which keeps water in the body, protects against heat and ultraviolet radiation and prevents infections (caused by bacteria, fungi, parasites, etc.) [2, 3]. The horny layer (stratum corneum), the top layer of the epidermis, undergoes a continuous process of renewal (every 4 weeks). Keratinocytes, which represent 90% of the cells in the epidermis, protect the body against ultraviolet radiation. They are derived from epidermal stem cells residing in the lower part of the epidermis (stratum basale). During their lifetime, they migrate through the different strata of the epidermis, pressed towards the epidermis surface by the continuously succeeding cells. During this migration, the keratinocytes undergo multiple stages of differentiation, whereby they change shape and composition and are filled with keratin. The different stages and corresponding strata are represented in **Figure 1**. Keratin, a structural protein, is the key structural material making up the outer layer of the epidermis and protects the cells from damage and stress. On their way to the outermost strata, the keratinocytes lose liquid and become hornier. Corneocytes are keratinocytes that have completed their differentiation program; they are dead cells in the stratum corneum and are shed off (by desquamation) as new ones come in.
Keratinocytes protect against ultraviolet radiation by taking up melanosomes from epidermal melanocytes. Melanosomes are vesicles which contain the endogenous photoprotectant molecule melanin. Melanocytes are melanin-producing cells which comprise between 5 and 10% of the cells in the basal layer (stratum basale) of the epidermis. The production of the skin pigment melanin is stimulated by ultraviolet radiation (melanogenesis). Melanocytes have several arm-like structures (dendrites) that stretch out to connect them with many keratinocytes. Once synthesized, melanin is contained in the melanosomes and moved along the dendrites to reach the keratinocytes. The melanin molecules are stored within keratinocytes (and melanocytes) in the perinuclear area, around the nucleus, where they protect the DNA against ultraviolet radiation. A melanin molecule transforms nearly all of the absorbed radiation energy into heat by ultrafast internal conversion of the energy from the excited electronic states into vibrational modes. This ultrafast conversion shortens the lifetime of the excited states and therefore prevents the formation of harmful free radicals.

The dermis is connected to the epidermis through a basement membrane (a thin sheet of fibres) and provides anchoring and nourishment for the epidermis. The dermis contains collagen (stability), elastic fibres (elasticity) and an extrafibrillar matrix as structural components. The papillary region (stratum papillare) of the dermis is composed of connective tissue which extends towards the epidermis. These finger-like projections are called papillae and strengthen the connection between the dermis and the epidermis. In addition to the structural components, blood vessels are present in the dermis, providing nourishment for the dermal and epidermal cells. Furthermore, the dermis contains hair follicles, sweat glands and lymphatic vessels. (In addition to the presented components, the dermis also contains mechanoreceptors that enable the sense of touch and thermoreceptors that provide the sense of heat.) The hypodermis lies beneath the dermis. Its tasks comprise energy storage, heat insulation and the connection of the skin with inner structures such as muscles and bones. The hypodermis consists primarily of loose connective tissue and adipocytes (fat cells), which are grouped together in lobules (subcutaneous fat). Furthermore, the hypodermis contains larger blood vessels and nerves than those found in the dermis.

**Figure 1.** The layer architecture of the epidermis.




## **2. Malignant melanoma and benign nevi**

The primary cause of the increasing number of melanomas is extreme sun exposure during sun-bathing (especially for people with low levels of skin pigment). Malignant melanoma is a type of cancer that develops from the pigment-containing melanocytes [4]. Melanomas are mainly caused by DNA damage resulting from ultraviolet radiation [5]. Strongly pigmented people are observed to be less susceptible to (sun-induced) melanomas, which demonstrates the protective function of melanin. At the early stage, melanocytes begin out-of-control growth [5]. In a later stage (invasive melanoma), the melanoma may grow into the surrounding tissue and can spread around the body through lymph or blood vessels deeper in the skin. People with melanomas at the early stage are treated by surgical removal of the skin lesion. In cases where the melanoma has spread, patients are treated by immunotherapy or chemotherapy. Most people are cured if spreading has not occurred; therefore, the early and reliable recognition of melanomas is of special importance [6]. The difference between a benign and a malignant tumour is its invasive potential: if a tumour lacks the ability to invade adjacent tissues and to metastasize, it is benign, whereas a malignant tumour is invasive or metastatic. A nevus (birthmark) is a sharply circumscribed and benign chronic lesion of the skin. The melanocytic nevus results from benign proliferation of the dendritic melanocytes. Due to the pigment melanin, nevi are mostly brown. Nevus cells are related to the melanocytes, but they lack dendrites and are oval in shape. They are typically arranged in cell nests. The majority of acquired nevi appear during childhood up to young adulthood (the first two decades of life). A melanocytic nevus present at birth is called a congenital nevus; these are rare, occurring in about one in every 100 newborns. Nevi are harmless. However, 25% of malignant melanomas arise from pre-existing nevi.

#### **3. Confocal laser scanning microscopy**

In conventional microscopy, the entire field of a tissue sample is simultaneously illuminated by light and displayed. Although the brightest light intensity results from the focal point of the objective lens, other parts of the tissue are still illuminated, resulting in a large unfocused background. This background noise diminishes the image quality. Both conventional and confocal laser scanning microscopy (CLSM) can use reflected light to image a tissue sample, the reflected light from the illuminated spot being re-collected by the objective lens. In addition to the reflected light from the focal point, the scattered light from sample points outside the focus (coming from places above or below it) is projected by the optical system of the microscope and therefore contributes to the image assembly. This blurs and obscures the resulting image. Confocal microscopy overcomes this problem by placing a pinhole in the conjugate focal plane (hence the designation confocal) that allows only the light emanating from the desired focal spot to pass through [7]. Any light outside of the focal plane (the scattered light) is blocked. **Figure 2** shows the principle: the out-of-focus light (red), coming from places above the selected focal plane, is blocked by the pinhole in the conjugate focal plane. The in-focus light from the focal plane (blue) passes through the pinhole and is detected. Blurring is thereby avoided and sharp, detailed images are produced (in other words, the image information from multiple depths in the sample is not superimposed). In confocal microscopy, a light beam is directed by a dichroic mirror to the objective lens, where it is focused into a small focal volume at a layer within the tissue sample (**Figure 3**). A laser with a near-infrared wavelength is used as a coherent monochromatic light source. The same microscope objective gathers the reflected light from the illuminated spot in the sample.
The dichroic mirror separates the reflected light from the incident light and deflects it to the detector. Before the light reaches the detector, the out of focus sections are blocked by the pinhole in the conjugate focal plane. The in focus light that passes through the pinhole is measured.


**Figure 2.** Principle of the confocal (left) and laser scanning (right) microscopy.



The detector, usually a photomultiplier tube or an avalanche photodiode, amplifies the reflected light signal and transforms its intensity into an electrical signal that is recorded by a computer. In contrast to conventional microscopy, there is never a complete image of the sample at any given instant; rather, only one point in the selected plane of the sample is observed at a time. To create an image, light from every point in the plane (x-axis, y-axis) must be recorded. This is done by a raster scanning mechanism using two motor-driven, high-speed oscillating mirrors that pivot on mutually perpendicular axes. Coordination of the two mirrors, one scanning along the x-axis and the other along the y-axis, produces the rectilinear raster scan (**Figure 2**). During the scanning process, the detected signal is transferred to a computer that collects all the 'point images' of the sample and serially constructs the image pixel by pixel. The brightness of a resulting image pixel corresponds to the relative intensity of the reflected light. The contrast in the images results from variations in the refractive index of microstructures within the tissue. Information can be collected from different focal planes by raising or lowering the objective lens; successive planes then make up a 'z-stack': a sequence of images captured at the same horizontal position (x- and y-axes) at different depths (z-axis). The images are taken en face (horizontally).

**Figure 3.** Principle of the confocal laser scanning microscope.

Confocal laser scanning microscopy is performed with a Vivascope 1000 (Lucid Inc., USA), which uses a diode laser at 830 nm wavelength and a power of <35 mW at tissue level. A ×30 water-immersion objective lens with a numerical aperture of 0.9 is used, with water as the immersion medium. The spatial resolution is 0.5–1.0 μm in the lateral and 3–5 μm in the axial dimension.

The images cover a field of view of 0.5 × 0.5 mm. Up to 16 layers per lesion can be scanned. All images, stored in BMP file format, are monochrome images with a spatial resolution of 640 × 480 pixels and a grey level resolution of 8 bits.
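For readers who want to reproduce the analysis, such an image can be loaded into a numerical array for processing. The sketch below assumes the Pillow library is available; since no real Vivascope file can be shipped here, a synthetic 640 × 480 8-bit image is written to an in-memory BMP and read back.

```python
import io

import numpy as np
from PIL import Image


def load_clsm_bmp(source):
    """Load an 8-bit monochrome BMP and return it as a float array in [0, 1]."""
    img = Image.open(source).convert("L")  # 'L' = 8-bit grayscale
    return np.asarray(img, dtype=np.float64) / 255.0


# Demonstration on a synthetic stand-in image (no real CLSM file is assumed):
synthetic = Image.fromarray(np.random.randint(0, 256, (480, 640), dtype=np.uint8))
buffer = io.BytesIO()
synthetic.save(buffer, format="BMP")
buffer.seek(0)

pixels = load_clsm_bmp(buffer)  # array of shape (480, 640), values in [0, 1]
```

Normalising the 8-bit grey levels to [0, 1] is a common convention before filtering; the original study may of course have used the raw integer values.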

## **4. Interpretation of confocal laser scanning microscopic images**

The reflectivity of the tissue depends on its chemical structures. Melanin and melanosomes have a high refractive index, which contributes strongly to the contrast of the resulting image [8–10]. Due to such dominating variations of the refractive index, only a certain part of the incident light is reflected. This makes the appearance of the tissue in a CLSM image very different from conventional histological views. The power of the 830 nm laser limits the imaging depth to a maximum of 350 μm, corresponding to the papillary dermis (higher power could damage the skin). **Figure 4** shows the views of different skin layers [11]. The stratum corneum shows large polygonal anucleated corneocytes (A); skin folds and marks appear as dark structures. The next layer is the stratum granulosum (B). The stratum spinosum (C) contains keratinocytes in a honeycomb pattern. In the stratum basale (D), the basal cells are uniform in size, show higher reflections than spinous keratinocytes and appear very intense. The dermatological guidelines for the interpretation of melanocytic skin lesions in CLSM views are as follows.

**Figure 4.** CLSM views of normal skin.

For the diagnosis of CLSM views of benign common nevi and malignant melanoma, architectural structures such as micro-anatomic structures and cell nests play an important role [12]. Monomorphic melanocytic cells, melanocytic cell nests and readily detected keratinocyte cell borders are suggestive of benign nevi, whereas polymorphic melanocytic cells, disarray of the melanocytic architecture and poorly defined keratinocyte cell borders are suggestive of melanoma (**Figure 5**). The images are taken from the centre of the tumours.

**Figure 5.** CLSM images of malignant melanoma (left) and common benign nevi (right).


Layers from the plane of the spinous keratinocytes (polygonal cells) to the plane of the basal cells (dermo-epidermal junction) are used for diagnosis.

## **5. Analysis of tissue structures at different scales**

As shown in the previous section, the information at different scales (from coarse structures to details) plays a crucial role in the diagnosis of CLSM images of skin lesions. Wavelet analysis is a method to analyse visual data by taking into account scale information [13].

**Figure 6.** Scale-space sequence of a successively Laplacian of Gaussian-filtered image.

The multiple resolutions enable a scale-invariant interpretation of an image. **Figure 6** illustrates the principle of scale space analysis for four levels of scale (clockwise direction). In the top left image (scale 1), the feature detection responds to fine texture. The images at higher scales are generated by a Laplacian of Gaussian filter (LoG(x, y)), also known as the Marr-Hildreth operator or Marr wavelet (**Figure 7**), whereby the width (σ) of the Gaussian increases step by step.

$$\text{LoG}(x, y) = \frac{1}{\pi \sigma^4} \left( \frac{x^2 + y^2}{2\sigma^2} - 1 \right) e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

The blue and red colours indicate positive and negative values. The images become increasingly blurred and smaller details (or regions) progressively disappear; the detected features are then associated with larger scale scene structure. The multiresolution analysis is closely analogous to the human visual system, which seems to prefer methods of analysis that run from coarse to fine and, repeating the same process, obtain new information at the end of each cycle [14] (**Figure 6**, counter-clockwise direction). The wavelet decomposition can be realized as a convolution of the image with a filter bank consisting of high-pass and low-pass filters [15], whereby, for example, a first-order derivative can be used as a convolution kernel for the high-pass filter and a moving average as a kernel for the low-pass filter. In our study, the filter coefficients are defined by the Daubechies 4 wavelet transform.

**Figure 7.** Shape of the Laplacian of Gaussian convolutional filter kernel.

The wavelet decomposition performs a multiresolution analysis, whereby the image is successively decomposed by the filter operations followed by sub-sampling. The (pyramidal) algorithm consists of several steps and operates as follows: at the beginning, the image rows are filtered by the high-pass filter and in parallel by the low-pass filter (**Figure 8**). Both operations yield an image (the results are called sub-bands): one shows details (high pass) and the other is smoothed (low pass). Sub-sampling is done by removing every second column in both sub-bands. Subsequently, the columns of both sub-bands are independently high-pass and low-pass filtered. This results in four sub-bands, which differ by the kind of filtering. Again a sub-sampling is done, by removing every second row in each sub-band. This is the end of the first step. The mixed-filtered (high-low pass, etc.) sub-bands are stored; only the doubly low-passed sub-band is processed in the second step (**Figure 8**), which repeats the operations of the first step. Again this results in four sub-bands, and the smoothed sub-band is used as the input for the following step. At every step, the resulting sub-bands are reduced to half the resolution. The sub-bands with higher spatial resolution contain the detailed information (high pass), whereas the sub-bands with low resolution represent the large scale, coarse information (low pass). The output of the wavelet decomposition consists of the remaining 'smooth-…-smooth' component and all the accumulated 'detail' components. In other words, via the wavelet decomposition, the image array is decomposed into several sub-bands representing information at different scales. The output of the last low-pass filtering is the mean grey level of the image.
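One step of this pyramidal algorithm can be sketched as follows. The low-pass coefficients are those of the 4-tap Daubechies scaling filter (often written D4), computed here from √3 rather than hard-coded, and the high-pass filter is its quadrature mirror. Periodic boundary handling is a simplifying assumption; a library such as PyWavelets would offer other edge modes.

```python
import numpy as np


def d4_filters():
    """Daubechies 4-tap scaling (low-pass) filter and its quadrature mirror."""
    s3 = np.sqrt(3.0)
    h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
    g = np.array([h[3], -h[2], h[1], -h[0]])  # high-pass: alternating-sign reversal
    return h, g


def filter_downsample(x, f):
    """Circular convolution along the last axis, then keep every second sample."""
    n = x.shape[-1]
    idx = (np.arange(n)[:, None] + np.arange(len(f))[None, :]) % n
    return (x[..., idx] * f).sum(axis=-1)[..., ::2]


def dwt2_step(img):
    """One pyramidal step: rows then columns, yielding four half-size sub-bands."""
    h, g = d4_filters()
    lo = filter_downsample(img, h)  # rows low-pass, columns halved
    hi = filter_downsample(img, g)  # rows high-pass, columns halved
    ll = filter_downsample(lo.T, h).T  # smooth-smooth (input to the next step)
    lh = filter_downsample(lo.T, g).T  # mixed-filtered detail sub-bands
    hl = filter_downsample(hi.T, h).T
    hh = filter_downsample(hi.T, g).T
    return ll, lh, hl, hh


# Sanity check: a constant image has no detail, so the three high-pass
# sub-bands vanish and all the signal collects in the smooth-smooth band.
ll, lh, hl, hh = dwt2_step(np.ones((8, 8)))
```

Repeating `dwt2_step` on the returned `ll` band realizes the successive steps of the decomposition described above.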

image (scale 1), the feature detection responds to fine texture. The images at higher scales are generated by a Laplacian of Gaussian filter (LoG(x, y)), which is also known as Marr-Hildreth operator or Marr wavelet (**Figure 7**), whereby the kernel size (σ) of the Gaussian increases step

> 4 2 1x y LoG(x, y) 1 e

The blue and red colours indicate positive and negative values. The images become increas‐ ingly blurred and smaller details (or regions) progressively disappear. The detected features are then associated with a larger scale scene structure. The multiresolution analysis is closely analogous to the human vision system which seems to prefer methods of analysis that run from coarse to fine and, repeating the same process, obtain new information at the end of each cycle [14] (**Figure 6** counter clockwise direction). The wavelet decomposition can be realized as a convolution of the image with a filter bank, consisting of high pass and low pass filters [15]. Whereby, for example a first-order derivative can be used as a convolution kernel for the highpass filter and a moving average as a kernel for the low-pass filter. In our study, the filter

The wavelet decomposition performs a multi resolution analysis, whereby the image is successively decomposed by the filter operations followed by sub-sampling. The (pyramidal) algorithm consists of several steps and operates as follows: at the beginning, the image rows are filtered by the high-pass filter and in parallel by the low-pass filter (**Figure 8**). From both operations result two images (which are called sub-bands), one shows details (high pass) and the other is smoothed out (low pass). The sub-sampling is done by removing every second column in both sub-bands. Subsequently, the columns of both sub-bands are high-pass and

coefficients are defined by the Daubechies 4 wavelet transform.

**Figure 7.** Shape of the Laplacian of Gaussian convolutional filter kernel.

ps s

2

= ç - ÷× è ø

2 2 2

s

2

2 2 x y

<sup>+</sup> æ + ö -

by step.

58 Microscopy and Analysis
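One step of the pyramidal filter-bank decomposition can be sketched in a few lines of Python. This is a minimal illustration using the simple Haar filter pair rather than the Daubechies 4 filters used in the study; the function and variable names are our own.

```python
import numpy as np

def dwt2_haar(img):
    """One step of the pyramidal wavelet decomposition (Haar filters for
    simplicity; the chapter uses Daubechies 4).  Rows are low- and
    high-pass filtered with every second column dropped, then the same
    is applied to the columns, yielding four half-resolution sub-bands."""
    lo = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass (moving average)
    hi = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass (first difference)

    def analyse(x, h):
        # filter along the last axis, then keep every second sample
        return np.apply_along_axis(lambda r: np.convolve(r, h)[1::2], -1, x)

    L = analyse(img, lo)        # rows low-passed,  columns halved
    H = analyse(img, hi)        # rows high-passed, columns halved
    LL = analyse(L.T, lo).T     # smooth approximation (input to next step)
    LH = analyse(L.T, hi).T     # horizontal details
    HL = analyse(H.T, lo).T     # vertical details
    HH = analyse(H.T, hi).T     # diagonal details
    return LL, LH, HL, HH

img = np.arange(64, dtype=float).reshape(8, 8)
LL, LH, HL, HH = dwt2_haar(img)
print(LL.shape)  # (4, 4): half the resolution in each direction
```

Repeating the step on `LL` builds the pyramid; for a perfectly flat image all detail sub-bands are zero and only the smooth component survives, matching the description above.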

After the dissection of the quadratic sub-bands, they are usually arranged in a quadratic configuration, whereby the three sub-bands of the first step fill 3/4 of the square, the three sub-bands of the second step fill 3/16 of the square, etc. The sub-bands representing successively decreasing scales are labelled with increasing indices (**Figure 9**). The architectural structure information is thus accumulated along the sequence of sub-bands (from coarse to fine). In image processing, it is convenient to display the smoothed image as the lowest sub-band in the upper left corner of the quadratic sub-band configuration. The coefficient values in the different sub-bands reflect architectural and cell structures at different scales.

**Figure 8.** The multiresolution filter bank of the wavelet decomposition.

**Figure 9.** The sub-bands resulting from the successive high and low pass filter operations.

The tissue features are derived from statistical properties of the sub-band coefficients. For the ith sub-band of size N × N, the coefficients are given by:

$$\mathbf{d}_{i} = \left\{ d_{i}(k,l) \;\middle|\; k,l = 1, \ldots, N \right\}$$

The texture features are based on the variations of the coefficients within each sub-band and on the weighted sum of all the coefficients in each sub-band. The standard deviations of the coefficients inside the single sub-bands and the energy and entropy of the different sub-bands are calculated and used as features (for details see: [16]). The standard deviation of the coefficients indicates how pronounced the tissue structures in the considered sub-band are at the given scale. The total energy of the coefficients in a given sub-band shows to what degree the structures at the corresponding scale contribute to the image. The distribution of the energy over the sub-bands is represented in a power spectrum, enabling an evaluation of their relative contributions.
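As an illustration, the three sub-band statistics can be computed as follows. The exact normalisations used in [16] may differ; the function name and the small epsilon guard are our own.

```python
import numpy as np

def subband_features(d, eps=1e-12):
    """Texture features of one sub-band of wavelet coefficients d:
    standard deviation, total energy, and entropy of the normalised
    energy distribution (illustrative definitions)."""
    std = d.std()
    energy = np.sum(d ** 2)
    p = d ** 2 / (energy + eps)            # energy share of each coefficient
    entropy = -np.sum(p * np.log2(p + eps))
    return std, energy, entropy

d = np.array([[1.0, -1.0], [2.0, 0.0]])    # toy 2 x 2 sub-band
std, energy, entropy = subband_features(d)
print(round(energy, 2))  # 6.0
```

A sub-band dominated by a few strong coefficients yields low entropy; evenly spread energy yields high entropy, which is what makes the entropy useful as a texture descriptor.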

The next task in automated image analysis is the use of machine-learning algorithms for classification based on the feature values [17]. The algorithm learns, by use of a training set, how to assign the tissue images to given classes. The algorithm can then apply the gained knowledge to predict the class of unknown tissue. By means of the classification procedure, the primary inhomogeneous set of CLSM samples, consisting of a mix of malignant melanoma and benign common nevi cases, is split into homogeneous subsets, which are assigned to one of the two tumour classes: common benign nevi or malignant melanoma. A homogeneous subset contains only CLSM images with similar feature values, representing one specific kind of tissue. For the discrimination of the CLSM images, the CART (Classification and Regression Trees) algorithm is used [18].

The tree representation consists of different nodes and branches. There is a root node, several leaf (terminal) nodes and inner nodes (**Figure 10**). The first node in the tree is the root node. It contains the feature values of the whole set of CLSM image samples. A leaf node is a homogeneous node which contains only samples belonging to the same class of tissue. The inner nodes contain more or less inhomogeneous sample sets. A branch in the decision tree involves the testing of one particular texture feature (binary tree). Then, the considered node, which is the parent node, is split into two child nodes (**Figure 10**).

**Figure 10.** Generation of a decision tree.


The feature is tested by comparing its numerical value with a threshold value that divides the value range. The threshold value is selected automatically by the algorithm in such a way that the subsets of samples in the child nodes are purer than the set in the parent node. To this purpose, an information measure is used which indicates the degree of homogeneity; the value in the leaf nodes is zero, and the higher the value of an inner node, the higher is its inhomogeneity. At every branch in the tree, subsets with smaller values of the information measure are generated. The decision tree is generated recursively (details are shown in: [16]). The algorithm consists in principle of three parts: the determination of the optimal split at every node; the decision whether a node is a leaf node or an inner node; and the assignment of a leaf node to a specific class (**Figure 10**). To classify an unknown sample, it is routed down the tree according to the values of the different features. When a leaf node is reached, the sample is classified according to the class assigned to the leaf. The tree-based machine-learning algorithm captures the decision structure explicitly. That means the generated decision rules follow the 'modus ponens' form, with a precondition and a conclusion part, and are intelligible in such a manner that they can be understood, discussed and used as diagnostic rules:

*IF (… and Condition1 and Condition2) THEN (Class = A)*
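The threshold selection at a single node can be sketched as follows. This is a toy illustration using the Gini impurity as the information measure; the feature values and class labels are hypothetical, and the CART software used in the study does considerably more (pruning, surrogate splits, etc.).

```python
import numpy as np

def gini(labels):
    """Gini impurity: zero for a homogeneous (leaf) node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Find the threshold on one feature that makes the two child
    nodes purest (minimal weighted impurity), as CART does at every
    branch of the binary tree."""
    best_t, best_imp = None, np.inf
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp

# hypothetical feature values (e.g. a sub-band energy) and tumour classes
energy = np.array([0.2, 0.3, 0.4, 1.1, 1.3, 1.5])
label = np.array(["nevus", "nevus", "nevus",
                  "melanoma", "melanoma", "melanoma"])
t, imp = best_split(energy, label)
print(t, imp)  # 0.4 0.0 -> rule: IF energy <= 0.4 THEN (Class = nevus)
```

Here both child nodes are pure (impurity 0), so each becomes a leaf and the split reads directly as a diagnostic rule of the modus ponens form above.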

In total, 39 different features are calculated for 16 frequency bands (labelled from 0 to 15). The mean value is calculated from the first four frequency bands; therefore, 13 values result for each feature. The highest frequency bands contain only information about very fine grey level variations, such as noise, and are therefore not considered for the image analysis. The procedure for image analysis (including feature extraction and calculation) was developed with the 'Interactive Data Language' software tool IDL (IDL 7.1, ITT Visual Information Solutions). The tree classification is done by the CART analysis software from Salford Systems, San Diego, USA.

## **6. Biological motivation for neural networks**

A neuron is an electrically excitable cell that receives, processes and transmits information as electrochemical signals. It consists of several dendrites, the soma and an axon (**Figure 11**). The soma is the cell body which contains the nucleus and all the necessary cytoplasmic cell structures. The dendrites are cytoplasmic extensions of the cell body with many branches allowing the cell to receive signals from other neurons. The axon is a special extension which carries signals away from the soma. At its terminal, the axon undergoes extensive branching, enabling communication with many target cells. The neurons maintain voltage gradients across their membranes. Ion channels, embedded in the membrane, enable the generation of intracellular-extracellular ion migrations. The resulting changes in the cross-membrane polarization generate an electrochemical pulse, known as the action potential. These changes in the cross-membrane potential are transferred as a wave of successive depolarization and repolarization processes along the cell's axon. The axon terminal contains synapses, specialized connections to target neurons, where neurotransmitter chemicals are released. Synaptic signals may be excitatory or inhibitory. Once the pulse from the soma along the axon reaches the synapses, a neurotransmitter is released at the synaptic cleft. The neurotransmitter molecules bind to the receptors in the post-synaptic membrane (of the target neuron) and open ion channels. Then, the electrochemical pulse is transmitted to the target neuron.

**Figure 11.** Microanatomy of a natural neuron (left), principle of an artificial neuron (right).

An artificial neuron is a mathematical model of a biological neuron. Artificial neurons mimic the behaviour of biological neurons. The input of the artificial neuron is represented by a vector **x** = (x1, x2, ..., xn), whereby its dimension reflects the number of contributing dendrites (**Figure 11**). In the mathematical model, each 'dendrite' contributes individually, through a weighted signal, to the input signal. The weight vector **w**j = (wj1, wj2, ..., wjn) simulates the effect of the synaptic neurotransmitters, whereby positive values represent excitatory and negative values inhibitory behaviour (a weight of zero means that there is no connection between the involved neurons). The summation function represents the soma of neuron j. The exciting and inhibiting signals are added in the function:

$$z\_j = \sum\_i x\_i w\_{ji}$$

The firing behaviour of the neuron is represented by the activation function. Its activation depends on the output of the summation function *zj* and a threshold value Θ. If the summation function exceeds the threshold, the neuron is firing and transmits an output signal yj :

$$y_{j} = \Phi(z_{j} - \Theta)$$

The biological motivation of the activation function is the threshold potential in natural neurons. Step and sigmoid functions are often used as transfer functions.
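The two equations above combine into a complete artificial neuron. The following sketch uses the sigmoid as the activation function Φ; the input and weight values are purely illustrative.

```python
import numpy as np

def neuron(x, w, theta=0.2):
    """A single artificial neuron: weighted sum z_j = sum_i x_i * w_ji,
    then a sigmoid activation shifted by the threshold theta."""
    z = np.dot(x, w)                       # summation function (the 'soma')
    return 1.0 / (1.0 + np.exp(-(z - theta)))  # activation y_j = phi(z_j - theta)

x = np.array([0.5, -1.0, 2.0])   # inputs arriving via three 'dendrites'
w = np.array([0.8, 0.2, 0.5])    # excitatory (+) and inhibitory (-) weights
print(neuron(x, w))              # sigmoid(1.2 - 0.2) = sigmoid(1.0), about 0.73
```

With a step function instead of the sigmoid, the neuron would output exactly 0 or 1, directly mirroring the fire/no-fire behaviour of the biological cell.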

## **7. Artificial neural networks**


Artificial neural networks consist of a number of artificial neurons, the computational units, which are interconnected. Each unit performs some small calculation based on inputs it receives from other units, whereby the associated weight factors can be tuned. This tuning occurs by allowing the network to analyse many examples of previously observed data. The most common type of neural network is the feed forward neural network (containing no loops), and in such networks, the computational units are organised into layers from an input layer, where data are fed into the network, to an output layer, where the result of the network's computation is outputted in the form of a classification result or regression result (**Figure 12**). Traditionally, each neuron in a layer is connected to all other neurons in the previous or subsequent layers (fully connected network). Between the output and input layers are hidden layers, and networks that consist of more than one hidden layer are known as *deep learning* algorithms. Such feed forward neural networks have been shown to be universal approximators, that is to say they can learn to approximate any continuous function to arbitrary precision, given enough hidden neurons [19]. Neural networks must be trained. The training data are previous observations that have been collected, and the task of the network is to learn a function which should map new input data to a classification label.

**Figure 12.** Structure of a feed forward artificial neural network.

In general, feed forward neural networks are supervised machine-learning algorithms. **Figure 12** shows a network with three layers: (1) an input layer, where data are fed in, (2) a hidden layer consisting of neurons that each contain an activation function that reads in data from the input neurons, performs some calculation, and outputs a value, and (3) an output layer that reads data from the hidden layer and makes a prediction based on this input. All connections between neurons have independently adjustable weights (Section 6). All layers are fully connected meaning that each neuron in the input layer is connected to every neuron in the hidden layer. The network learns by adjusting the weights between each of the connected neurons until the network makes good predictions by minimising an error function (backpropagation algorithm).
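A forward pass through such a three-layer, fully connected network can be sketched in a few lines. The layer sizes and random weights are illustrative; training would adjust `W1` and `W2` by backpropagation, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a 4-3-2 fully connected feed forward network (sizes are illustrative)
W1 = rng.normal(size=(4, 3))   # input layer  -> hidden layer weights
W2 = rng.normal(size=(3, 2))   # hidden layer -> output layer weights

def forward(x):
    """One forward pass: every neuron reads all neurons of the previous
    layer (fully connected), applies its activation function, and feeds
    the next layer."""
    h = sigmoid(x @ W1)        # hidden layer activations
    return sigmoid(h @ W2)     # output layer, e.g. two class scores

y = forward(np.array([0.1, 0.7, -0.3, 0.5]))
print(y.shape)  # (2,)
```

Interpreting the two outputs as class scores (e.g. nevus vs. melanoma) turns the forward pass into a classifier; the error between these scores and the known labels is what the backpropagation algorithm minimises.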

Fully connected neural networks are useful where individual features of a dataset are not very informative. In image data, where an individual pixel is not likely to be very informative taken on its own, a local combination of pixels may very well be informative and represent an object of interest. However, neural networks are also far more computationally intensive than many other machine-learning algorithms, with the number of tuneable parameters quickly growing into the millions as the network increases in depth or size. Also, neural networks typically work on image data directly, without feature reduction, meaning the dimensionality of the data being analysed by neural networks is much higher than that of other algorithms, which often work on extracted features. One could therefore summarise that neural networks are most useful for high-*m*, high-*n* problems: problems where there exist many observations (*n*) of high-dimensional data (*m*). Of late, neural network algorithms have re-emerged as a popular technique in machine learning, especially in the field of image analysis. This re-emergence is due to a number of recent developments in neural network design as well as independent hardware developments. In real-world applications, their usage has grown beyond image analysis; they have also been shown to be useful for other tasks, such as natural language processing and artificial intelligence [20, 21]. Three advancements in recent years, in particular, have resulted in this upsurge in the usage of neural networks.

First, hardware advancements have made it feasible for larger neural networks to be trained in reasonable amounts of time. As mentioned previously, neural networks that learn on very high-dimensional data require many neurons and layers, meaning networks can consist of many millions of parameters that need to be tuned. This results in large network architectures that have, for a long time, been unfeasibly difficult to train on standard desktop workstations. However, computational enhancements have meant this is no longer the case. These computational advancements are the result of rapid developments in graphics processing unit (GPU) technology, driven by the ever-increasing requirements of the gaming industry, which have resulted in great improvements in the parallel processing power of GPUs. In 3D gaming, the vast majority of processing power is spent on matrix and vector multiplications, such as transforms and perspective calculations, in order to depict the 3D worlds of games in 2D to the user. Such matrix calculations can be performed in parallel, and hence gaming GPUs have evolved to be particularly suited to such parallel processing tasks. To this end, GPUs typically consist of boards with many small, less powerful cores that can perform highly parallel computations. While CPUs tend to possess 2–4 large and fast cores, GPUs possess many hundreds of smaller cores. Crucially, almost 90% of the computational effort required to train a neural network is spent on vector, matrix and tensor operations, meaning neural networks can benefit from all the recent technological advancements in GPU technology. Indeed, with Moore's Law no longer holding, parallelised algorithms may, in future, be the only way to analyse very large data [22].
Second, empirical data have shown that neural networks with large numbers of hidden layers outperform many algorithms at several machine-learning tasks, especially in computer vision, object recognition and object detection. Deeper and deeper neural networks, with larger and larger numbers of neurons, have achieved human-level performance at very human-like tasks, such as playing video games [23] and playing the game of Go [24]. Deeper networks, however, contain more neurons, each of which needs to perform some calculation and have its associated weight tuned, resulting in longer training times and larger memory requirements. Again, advances in hardware and optimisation techniques have meant that ever deeper networks are now trainable within reasonable timeframes [25].

Third, more data are being permanently stored, archived and saved than ever before. This is especially true in fields such as medicine, where large amounts of data are accumulated during routine activities. In the past, these data might have been archived or stored on offline tape drives, or even discarded. This is no longer necessarily true, as the cost per GB of storage has declined rapidly, meaning easier access to more data and less likelihood of data being discarded. Deep learning algorithms require very large datasets to train, and the ability of individuals to store large amounts of data has meant such algorithms are being applied to these problems more often.


Traditional feed forward neural networks consist of layers, where each neuron is connected to every other neuron in the layers above and below it. These are known as fully connected, or affine, layers. Fully connected neural networks do not consider the spatial relation between pixels in an image. Pixels which are close together are treated exactly like pixels which are far apart when being processed by the network. For the learning of high-level features, this is suboptimal. In terms of image analysis, one particular type of neural network algorithm has stood out as being especially adept at image classification and object recognition. This is the convolutional neural network. The idea behind convolutional neural networks is to restrict the network to take inputs only from spatially nearby neurons. In other words, the layers are not fully connected, as in the example in **Figure 12**.

## **8. Convolutional neural networks**

In the fields of image analysis, object detection and pattern recognition, convolutional neural networks are the state-of-the-art algorithms for practical applications. Following on from our previous work, where we applied multiresolution analysis and CART as a tree-based machine-learning method (Section 5), we decided to test the applicability of convolutional neural networks at a similar classification task. Because neural networks learn their own discriminatory, high-level features, the dataset requires no pre-processing or feature extraction, with the exception of image resizing and pixel value normalisation. This is in direct contrast to our previous efforts, where a dedicated feature extraction phase was necessary. Convolutional neural networks (CNNs), in effect, emulate the way in which classical pattern recognition works, where local features (edges, corners, etc.) are extracted and combined to generate higher-level representations that can be used for object recognition. Convolutional neural networks are locally connected: each neuron is connected only to those that are spatially close (local receptive fields) in the previous layer, mimicking the visual cortex of some animals. Pixels that are closer to each other are more strongly correlated than those which are further away from each other, and this is something which the convolutional neural network has been designed to account for through its architecture [26].

Network architectures with fully connected layers do not take into account the spatial structure of the images. Instead of using a network architecture which is tabula rasa, convolutional neural networks (CNNs) try to take advantage of spatial structures in images. They use three basic ideas: local receptive fields, shared weights and pooling. It is helpful to represent the input image as a square of neurons, whose values correspond to the pixel intensities. Then, only small, localized regions of the input image are connected to a neuron in the first hidden layer. Such a region in the input image is called the local receptive field for the corresponding hidden neuron. In other words, the hidden neuron learns to analyse its particular local receptive field. If the receptive field has a size of 5 × 5 pixels, then the hidden neuron is connected via 5 × 5 = 25 weights, which are adjusted during learning. The input of the hidden neuron is given by the summation function:

$$y_{j,k} = \sum_{l=0}^{4} \sum_{m=0}^{4} w_{l,m}\, b_{j+l,\, k+m}$$

The value bx,y denotes the input activation at position (x, y). The output of the hidden neuron is given by the activation function, for example the sigmoid function. The convolutional operation can be considered as a sliding window which travels over the image, with the window centre moving one or more pixels at a time; this is defined by the stride length. If the window is moved by one pixel, the stride length is 1. For each position of the local receptive field, there is a different hidden neuron in the first hidden layer. The map from the input layer to the hidden layer (convolutional layer) is called a feature map. The weights *wl,m* defining the feature map are the shared weights. The shared weights define the convolution kernel (convolution is generally the workhorse of image processing). The pixels in the local receptive field are multiplied element-wise with the kernel. Feature maps are generated using only neurons which are spatially close to each other, known as spatial connectivity. Each feature map is defined by a specific set of shared weights, enabling the network to detect different kinds of features (edges, corners, etc.). The CNN therefore learns objects related to their spatial structure. For image analysis purposes, more than one feature map is required; therefore, a complete convolutional layer consists of several different feature maps.

In addition to the convolutional layers, CNNs also contain pooling layers, which usually follow immediately after the convolutional layers. Pooling layers simplify the information in the output from the convolutional layer by generating a condensed feature map (this removes the positional information of the features learned, meaning the learned features are position invariant). For example, each unit in the pooling layer may summarize a region of 2 × 2 neurons in the previous convolutional layer. Pooling is done for each feature map separately.
The final layer in the convolutional network is a fully connected layer. This layer connects every neuron from the last pooling layer to every one of the output neurons.
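The convolution with shared weights and the 2 × 2 max-pooling can be sketched directly in Python. This is an illustrative implementation (as in most CNN libraries, the 'convolution' is computed as a cross-correlation, i.e. without flipping the kernel); the 32 × 32 input and averaging kernel are arbitrary choices.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid convolution with stride 1: the shared-weight kernel slides
    over the image, and each output neuron sees only its 5 x 5 local
    receptive field (computed as cross-correlation, CNN convention)."""
    kh, kw = kernel.shape
    H = img.shape[0] - kh + 1
    W = img.shape[1] - kw + 1
    out = np.empty((H, W))
    for j in range(H):
        for k in range(W):
            out[j, k] = np.sum(img[j:j + kh, k:k + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """2 x 2 max-pooling: condense each region to its strongest
    response, discarding the exact position of the feature."""
    H, W = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:H * size, :W * size].reshape(H, size, W, size).max(axis=(1, 3))

img = np.random.default_rng(1).random((32, 32))
kernel = np.ones((5, 5)) / 25.0          # one set of shared weights
fmap = conv2d(img, kernel)               # one feature map
print(fmap.shape)            # (28, 28): 32 - 5 + 1 = 28
print(max_pool(fmap).shape)  # (14, 14)
```

A complete convolutional layer would apply several such kernels, one per feature map, each followed by its own pooling step.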

pixels in an image. Pixels which are close together are treated exactly like pixels which are far apart when being processed by the network. For the learning of high-level features, this is suboptimal. In terms of image analysis, one particular type of neural network algorithm has stood out as being especially adept at image classification and object recognition. This is the convolutional neural network. The idea behind convolutional neural networks is to restrict the network to take inputs only from spatially nearby neurons. In other words, the layers are not

fully connected, as in the example in **Figure 12**.

## **8. Convolutional neural networks**

In the fields of image analysis, object detection and pattern recognition, convolutional neural networks are the state-of-the-art algorithm for practical applications. Following on from our previous work, where we applied multiresolution analysis and CART as a tree-based machine-learning method (Section 5), we decided to test the applicability of convolutional neural networks on a similar classification task. Because neural networks learn their own discriminatory, high-level features, the dataset requires no pre-processing or feature extraction, with the exception of image resizing and pixel value normalisation. This is in direct contrast to our previous efforts, where a dedicated feature extraction phase was necessary. Convolutional neural networks (CNN), in effect, emulate the way in which classical pattern recognition works, where local features (edges, corners, etc.) are extracted and combined to generate higher-level representations that can be used for object recognition. Convolutional neural networks are locally connected: each neuron is connected only to those that are spatially close (local receptive fields) in the previous layer, mimicking the visual cortex of some animals. Pixels that are close to each other are more strongly correlated than those which are further apart, and this is something which the convolutional neural network has been designed to account for through its architecture [26].

Network architectures with fully connected layers do not take into account the spatial structure of the images. Instead of using a network architecture which is tabula rasa, convolutional neural networks try to take advantage of the spatial structures in images. They use three basic ideas: local receptive fields, shared weights and pooling. It is helpful to represent the input image as a square of neurons, whose values correspond to the pixel intensities. Then, only small, localized regions of the input image are connected to a neuron in the first hidden layer. Such a region in the input image is called the local receptive field for the corresponding hidden neuron. In other words, the hidden neuron learns to analyse its particular local receptive field. If the receptive field has a size of 5 × 5 pixels, then the hidden neuron is connected by 5 × 5 weights, which are adjusted during learning. The input of the hidden neuron at position (j, k) is given by the summation function:

$$y\_{j,k} = \sum\_{l=0}^{4} \sum\_{m=0}^{4} w\_{l,m} \, x\_{j+l,\,k+m} + b$$

where $w\_{l,m}$ are the shared weights, $b$ is the shared bias and $x\_{j+l,\,k+m}$ is the input activation at the corresponding position inside the receptive field.
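This summation can be sketched in pure Python. The 8 × 8 image, the uniform weights and the bias below are made-up toy values for illustration only:

```python
# Sketch: the input of one hidden neuron whose 5x5 local receptive
# field starts at pixel (j, k) of the input image.

FIELD = 5  # receptive field size in pixels

def hidden_neuron_input(image, weights, bias, j, k):
    """y_{j,k} = sum_{l,m} w_{l,m} * x_{j+l, k+m} + b"""
    total = bias
    for l in range(FIELD):
        for m in range(FIELD):
            total += weights[l][m] * image[j + l][k + m]
    return total

# An 8x8 toy "image" and one shared 5x5 weight kernel.
image = [[(r + c) % 3 for c in range(8)] for r in range(8)]
weights = [[0.1] * FIELD for _ in range(FIELD)]

y = hidden_neuron_input(image, weights, bias=0.5, j=0, k=0)

# Because the weights and bias are shared, sliding (j, k) over the
# image yields a feature map: 8 - 5 + 1 = 4, so the map is 4x4.
feature_map = [[hidden_neuron_input(image, weights, 0.5, j, k)
                for k in range(4)] for j in range(4)]
```

Sharing one weight kernel across all positions is what makes every neuron in a feature map detect the same local pattern, just at different places in the image.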

A depiction of a typical seven-layer convolutional neural network can be seen in **Figure 13**. Images are read into the network in the input layer. From this input, four feature maps are generated, which are subsampled in a max-pooling phase. Both phases are then repeated once more, before connecting to a conventional fully connected layer, which is finally connected to the output layer. CNNs often contain multiple fully connected layers before the final output layer, and modern CNNs can contain many convolution/max-pooling pairs.

**Figure 13.** The structure of a typical seven-layer convolutional neural network.

**Figure 14** describes the convolutional layer and max-pooling layer in more detail. The input into the convolutional neural network is a vector $\mathbf{x} \in \mathbb{R}^{1 \times m}$, and the input layer has one neuron per feature. However, the layers can be thought of as having their neurons arranged as depicted in **Figures 11** and **12**. In the example shown, a 5 × 5 kernel is used with a stride of 1, which results in a feature map of size (32 − 5 + 1) × (32 − 5 + 1) = 28 × 28. Typically, a convolutional layer is followed by a max-pooling layer, which acts as a type of sub-sampling, in this case halving the size of the previous feature map (**Figure 13**).
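A 2 × 2 max-pooling step can be sketched in plain Python as follows (the feature-map values are toy numbers for illustration):

```python
# Sketch: 2x2 max-pooling, which halves each dimension of a
# feature map by keeping the maximum of each 2x2 block.

def max_pool_2x2(fmap):
    """Return the max over each non-overlapping 2x2 block."""
    return [
        [max(fmap[r][c], fmap[r][c + 1],
             fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 1],
]
pooled = max_pool_2x2(feature_map)  # 4x4 -> 2x2
# pooled == [[4, 2], [2, 7]]
```

Applied to the 28 × 28 feature map above, the same function would yield a 14 × 14 map, which is the halving referred to in the text.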

**Figure 14.** Principle of the convolutional layers and max-pooling layers [27].

Convolutional neural networks possess several characteristics that make them very suitable for the analysis of histological images. First, convolutional neural networks are capable of building models which are translation invariant and robust to transformations in the images, such as rotation, and they can learn features which are robust to scaling. They also generate models which are position invariant. This is especially important for microscopy imagery, where a lesion, for example, has no 'right way up', and cannot even be rotationally normalised.

## **9. Deep learning analysis of a CLSM image dataset**

As stated previously, the goal was to train a model that would classify newly seen images as either malignant or benign. The neural network that was designed was based on the structure of the LeNet-5 convolutional neural network and was developed using the Keras deep learning library for Python [26]. The network consisted of a total of eight layers: the input layer, two pairs of convolutional and max-pooling layers, two fully connected layers, and the output layer. The rectified linear unit (ReLU) was used throughout as the neuron nonlinearity. The ReLU is a computational unit which uses a ramp function [the rectifier f(x) = max(0, x)] and is currently the most popular activation function for deep neural networks. Because of the depth of the network, a graphics processing unit (GPU) was used, which greatly increases the speed at which the network can train. In terms of hardware, a midrange NVidia gaming GPU with 2 GB of dedicated video memory and 640 cores was used for training the network. The card is capable of 1306 GFLOP/s and has a memory bandwidth of 86.4 GB/s. At the time of writing, the card can be purchased for under \$150. The card was installed in a Linux workstation with 32 GB of RAM and a 3.5 GHz 6-core AMD processor running the Xubuntu 14.04 operating system. To illustrate the differences in computational power between a GPU and a CPU, and to demonstrate the enormous impact using a GPU can have on training times, we benchmarked our code. Training the network over 20 epochs required 2 min 4 s, averaged over three runs, when using the GPU. When using the CPU, this time was 57 min 59 s for 20 epochs (also averaged over three runs), nearly 30 times slower. Experimenting with different parameters, or testing new network structures, can become very tedious when hours of computation are required per run or experiment. The GPU reduces this time to minutes.
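As a quick sanity check, the reported times imply the following speedup:

```python
# Speedup implied by the benchmark above (20 epochs each).
gpu_s = 2 * 60 + 4     # 2 min 4 s on the GPU, in seconds
cpu_s = 57 * 60 + 59   # 57 min 59 s on the CPU, in seconds

print(round(cpu_s / gpu_s, 1))  # → 28.1, i.e. nearly 30x faster
```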

Dropout was used to control overfitting at two points in the network's structure: once after the convolutional and max-pooling pairs, and once again after the first fully connected layer. Dropout helps to control overfitting by randomly setting a certain set percentage of the neurons' weights to zero, effectively forcing the network to relearn those weights, with the intention of mitigating the learning of noise. The output of the network is finally determined by a sigmoid logistic function, squashing the results of the entire network to a value between 0 and 1. Values closer to 1 are therefore classified as being malignant, while values closer to 0 refer to a benign prediction. Such an output can also be used to examine the network's confidence in a classification, with a value of 0.99 meaning a highly confident malignant prediction and a value of 0.51 representing an unconfident malignant prediction.
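Both mechanisms can be sketched in plain Python. Note this is the common "inverted dropout" formulation applied to activations (with the survivors rescaled); the chapter does not specify the exact variant used, and the values below are toy numbers:

```python
import math
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of activations during
    training and scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]

def sigmoid(z):
    """Logistic function squashing the network output to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(42)
hidden = dropout([0.7, -1.2, 3.1, 0.4], rate=0.5, rng=rng)  # training only

p = sigmoid(2.3)  # final network output for one image
label = 'malignant' if p > 0.5 else 'benign'
```

At test time dropout is disabled (`training=False`), so the full network is used for prediction.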

#### **9.1. Input into the neural network**


Images are read directly by the neural network. The only pre-processing performed was to resize the images from 640 × 480 to 64 × 64 pixels. Images are read by the neural network as a series of pixel values stored in a vector, so that a single image is stored as a vector *x*. The dataset consisted of *n* = 6897 images, each 64 × 64 pixels in size, corresponding to a dimensionality of *m* = 4096. The entire dataset is therefore stored in an *n* × *m* matrix:

$$\mathbf{X} \in \mathbb{R}^{n \times m} = \begin{bmatrix} x\_1^{(1)} & \cdots & x\_m^{(1)} \\ \vdots & \ddots & \vdots \\ x\_1^{(n)} & \cdots & x\_m^{(n)} \end{bmatrix}$$

To reduce the memory footprint, neural networks are typically trained using mini-batches, which are randomly selected subsets of *X*. Targets, or labels, are stored in an *n*-dimensional column vector:

$$\mathbf{y} = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(n)} \end{bmatrix}, \qquad y^{(i)} \in \{0, 1\}, \quad 0 = \text{Benign}, \; 1 = \text{Malignant}.$$

Therefore, to input an image into a neural network, it must first be converted into a vector of pixel values. Each image vector's label is stored numerically in a separate target vector, *y*. Once these have been prepared, a training matrix *X*train, a test matrix *X*test, and their corresponding target vectors *y*train and *y*test must also be generated.
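These preparation steps can be sketched in plain Python. The 2 × 2 images, the labels and the 75/25 split below are toy assumptions (the chapter's images are 64 × 64 and its split differs):

```python
import random

def to_vector(image):
    """Flatten a 2D image (a list of rows) into one row vector."""
    return [pixel for row in image for pixel in row]

# Four toy 2x2 "images" with labels (0 = benign, 1 = malignant).
images = [[[0, 1], [2, 3]], [[4, 5], [6, 7]],
          [[8, 9], [1, 0]], [[3, 3], [3, 3]]]
labels = [0, 1, 0, 1]

X = [to_vector(img) for img in images]  # the n x m matrix
y = list(labels)                        # the n-dimensional target vector

# Shuffle, then split into training and test portions.
random.seed(42)
idx = list(range(len(X)))
random.shuffle(idx)
split = int(0.75 * len(X))
X_train = [X[i] for i in idx[:split]]
y_train = [y[i] for i in idx[:split]]
X_test = [X[i] for i in idx[split:]]
y_test = [y[i] for i in idx[split:]]
```

Shuffling before the split ensures that the order in which the images were collected does not leak into the training/test partition.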

#### **9.2. Keras**

Recently, a number of frameworks have been developed for deep learning, ranging from low-level, general-purpose math expression compilers, such as Theano, to higher-level frameworks such as Torch. For this analysis, the Keras framework was used. Keras is written in Python and is based on the Theano framework. It offers high-level control over network construction, abstracting away the low-level Theano code and making it possible to design neural network structures in a layer-wise, modular fashion. Layers and functionality are added to the network piece by piece and are finally compiled into a complete network once the desired structure has been built. Users of Python can install Keras using pip, by typing pip install keras at the command prompt. Keras has a number of requirements, including Theano (which can also be installed using pip, with pip install theano).

Briefly, once Keras has been correctly installed and successfully imported into the environment, a convolutional neural network is created by instantiating an object of the Sequential class, and then by adding layers to this object until the desired network is complete. For example, a convolutional layer can be added to the network using the add function: model.add(Convolution2D(…)). Configuring network properties, such as when to use dropout or specifying which activation function should be used, is also performed using the add function of the model object. The network is built in this way until the desired structure has been defined, and is then compiled using the model object's compile function. As Keras is based on Theano, the model is generated into Theano code, which itself is compiled into CUDA C++ code and subsequently run on the GPU. Upon successful compilation of the model, it can be trained on a dataset using the fit function, which takes the training dataset as one of its parameters. A trained model can then be tested on the held-back test data using the trained model's evaluate function. Full Python source code for the generation of the model can be found in this book chapter's GitHub repository at https://github.com/mdbloice/CLSM-classification. This source file contains a complete implementation of the network, including the generation of all the plots and figures shown in Section 10.
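As an illustrative sketch (not the authors' exact code, which is in the GitHub repository above), a LeNet-5-style network with the layer sequence described in Section 9 might be built as follows. The filter counts (4 and 8), kernel sizes and dense-layer width (64) are assumptions, and the calls use the current Keras API, which differs slightly from the version available when the chapter was written:

```python
# Eight-layer structure: input, two convolution/max-pooling pairs,
# two fully connected layers, sigmoid output. Layer sizes are
# illustrative assumptions only.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(4, (5, 5), activation='relu', input_shape=(64, 64, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(8, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))       # dropout after the conv/pool pairs
model.add(Flatten())
model.add(Dense(64, activation='relu'))    # first fully connected layer
model.add(Dropout(0.5))        # dropout after the first dense layer
model.add(Dense(64, activation='relu'))    # second fully connected layer
model.add(Dense(1, activation='sigmoid'))  # output squashed to (0, 1)

model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
# model.fit(X_train, y_train, epochs=20)  # then model.evaluate(...)
```

The sigmoid output corresponds to the benign/malignant squashing described in Section 9, and the two Dropout layers match the two points at which the chapter applies dropout.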

## **10. Results**

#### **10.1. Multiresolution analysis**

Overall, 857 images of benign common nevi (408 images) and malignant melanoma (449 images) were used as the study set [29]. To gain more insight into the classification performance, a percentage split was performed, using 66% of the dataset for training and the remaining instances (34%) as the test set (**Table 1**): 572 cases (276 benign common nevi, 296 malignant melanomas) in the training set and 285 cases (132 benign common nevi, 153 malignant melanomas) in the test set.

**Table 1.** Classification results for features based on multiresolution analysis.


The CART classification shows a correct mean classification rate of 97.3% on the training set and 81.1% on the test set. In this study, the images were resized to 512 × 512 pixels. To illustrate the differences in the wavelet sub-bands of both tissues, the spectra of the wavelet coefficient standard deviations are shown for typical views of benign common nevi and malignant melanoma (**Figure 15**). The images of benign common nevi show pronounced architectural structures (so-called tumour nests), whereas the images of malignant melanoma show melanoma cells and connective tissue with few or no architectural structures. These visual findings are reflected by the wavelet coefficients inside the different sub-bands. The standard deviations of the wavelet coefficients in the lower and medium frequency bands (4–10) show higher values for benign common nevi than for malignant melanoma tissue, indicating more pronounced structures at different orders of magnitude. The tissue of malignant melanoma appears more homogeneous (due to a loss of structure), and the cells are larger than in the case of benign common nevi. The standard deviations in the sub-bands with higher indices (representing finer and more pronounced structures) are therefore lower than in the case of benign common nevi.
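To make the notion of sub-band statistics concrete, here is a minimal pure-Python sketch: a single-level 2D Haar transform and the standard deviation of each detail sub-band. This is a simplified stand-in for the multiresolution features of Section 5, and the normalisation (division by 4) is an illustrative choice:

```python
import statistics

def haar_level(img):
    """One Haar level: split an even-sized 2D image into the
    approximation (LL) and detail (LH, HL, HH) sub-bands."""
    LL, LH, HL, HH = [], [], [], []
    for r in range(0, len(img), 2):
        ll, lh, hl, hh = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll.append((a + b + d + e) / 4)  # approximation
            lh.append((a - b + d - e) / 4)  # horizontal detail
            hl.append((a + b - d - e) / 4)  # vertical detail
            hh.append((a - b - d + e) / 4)  # diagonal detail
        LL.append(ll); LH.append(lh); HL.append(hl); HH.append(hh)
    return LL, LH, HL, HH

def subband_std(band):
    """Standard deviation of all coefficients in one sub-band."""
    return statistics.pstdev([x for row in band for x in row])
```

Applying `haar_level` repeatedly to the LL band yields the coarser scales; the per-band standard deviations then form a spectrum of the kind plotted in Figure 15.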

**Figure 15.** Sub-band spectra for benign common nevi (right) and malignant melanoma (left).

The analysis of the classification tree shows that seven classification nodes indicate benign common nevi and six nodes malignant melanoma. The visual examination of the selected nodes demonstrates characteristic monomorphic melanocytic cells and melanocytic cell nests for benign common nevi [28, 29]. In contrast, polymorphic melanocytic cells, a disarray of the melanocytic architecture, and poorly defined or absent keratinocytic cell borders are characteristic of malignant melanomas.

#### **10.2. Convolutional deep learning neural network**

For this study, a dataset consisting of 6897 CLSM images of skin lesions was obtained from our university hospital. The dataset consisted of images of skin lesions in layers of various depths. Before training, the images were randomised and placed into a training set and test set, with the training set consisting of 5000 images and the test set consisting of 1897 images (**Table 2**). It is important to note that, in the case of this project, each image was treated individually, and not treated as belonging to one particular patient or even lesion. The test set, therefore, contained different layers or lesions from potentially the same patient as the training set, as a single patient may have had several scans or may have been examined on multiple occasions.


| | **Full dataset** | **Training set** | **Test set** |
|---|---|---|---|
| Total | 6,897 | 5,000 | 1,897 |
| Benign | 3,607 | 2,655 | 952 |
| Malignant | 3,290 | 2,345 | 945 |

**Table 2.** The distribution of the classes in the whole dataset and in the training and test set.

Class imbalance occurs when a training set has far more samples of one particular class than another. A small class imbalance existed in the dataset analysed in this chapter, with the samples of benign nevi slightly outnumbering the samples of malignant melanoma (317 more samples of the former than of the latter). A number of techniques can be employed to address class imbalance, such as data augmentation (generating synthetic data from the original dataset) or simply discarding samples to better balance the dataset. In the case of our dataset, the class imbalance was not severe enough to be problematic. When the training and test sets were split, however, we ensured that the test set was largely balanced. Class imbalance also affects how results, such as accuracy and precision/recall, should be interpreted when analysing a trained model on a highly imbalanced test set.

The network, after training for 20 epochs, achieved 93% accuracy on the unseen test set. The model's accuracy on the test set during training, as well as the model's error rate on the training set through each of the 20 epochs, are shown in **Figure 16**.

**Figure 16.** The model's accuracy on the test set and its logistic loss against the training set.

Loss on the training set eventually reduces to almost 0 (meaning that at this point it is overfitting heavily), while the accuracy of the model on the unseen test set fluctuates but tends towards approximately 90%. The accuracy of the final model after epoch 20, when training was terminated, was 93%. A confusion matrix, shown in **Figure 17**, describes the model's accuracy on the test set, in terms of absolute numbers of predicted and actual labels for both the benign and malignant classes.

**Figure 17.** Confusion matrix.

Here, all true/false positives and true/false negatives can be seen. From these values, the precision, recall (sensitivity) and *F*1 score were calculated, as shown in **Table 3**. The *F*1 score is a weighted average of the precision and recall, given by:

$$F\_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
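As a quick check, the values in Table 3 can be reproduced from the confusion matrix of Figure 17, taking malignant as the positive class:

```python
# Confusion-matrix counts from Table 4 (malignant = positive class).
tp, fn = 893, 52   # malignant correctly / wrongly classified
fp, tn = 84, 868   # benign wrongly / correctly classified

precision = tp / (tp + fp)   # 893 / 977
recall = tp / (tp + fn)      # 893 / 945 (sensitivity)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision, 2), round(recall, 2),
      round(f1, 2), round(accuracy, 2))
# → 0.91 0.94 0.93 0.93
```

These match the malignant row of Table 3 and the 93% overall accuracy reported above.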


| | **Precision** | **Recall (Sensitivity)** | ***F*1 score** | **Support** |
|---|---|---|---|---|
| Benign | 0.94 | 0.91 | 0.93 | 952 |
| Malignant | 0.91 | 0.94 | 0.93 | 945 |
| **Avg/total** | **0.93** | **0.93** | **0.93** | **1897** |

**Table 3.** The generated model's precision, recall and F1 score measured against the test set.

**Table 4** describes the results of the model in absolute terms, with results for the model's predicted labels for both classes versus the actual labels for each class. As well as this, the total number of actual and predicted labels is shown.

| **Actual** | **Predicted Benign** | **Predicted Malignant** | **Total** |
|---|---|---|---|
| Benign | **868** | 84 | 952 |
| Malignant | 52 | **893** | 945 |
| All | 920 | 977 | 1897 |

**Table 4.** The generated model's predicted labels versus the actual labels, measured on the test set.

#### **10.3. Transfer learning**

Transfer learning is a term that can be applied to several aspects of machine learning. In the case of neural network-based machine-learning approaches, transfer learning often refers to the act of using a pre-trained network as the starting point for a learning procedure, rather than starting with a network that has been initialised with random weights. This is often performed as a time-saving measure, but can also be done when the new data to be classified are scarce. However, it is only sensible when the data used for pre-training are similar to the new data to be classified, and it constrains the practitioner to using a network with the same architecture as the pre-trained model. It is therefore not usable in all situations: it does not make sense to use, say, a network pre-trained on the ImageNet dataset (a commonly used benchmarking dataset, containing millions of samples of 1000 classes of images) in the context of CLSM lesion classification.

However, there exist several types of laser scanner-based approaches to skin lesion analysis where the use of transfer learning may be beneficial. Other methods in the field include two-photon excitation fluorescence microscopy, second-harmonic imaging microscopy, fluorescence-lifetime imaging microscopy and coherent anti-Stokes Raman microscopy. Whether or not transfer learning could indeed be implemented in this context would depend entirely on how well the features learned during pre-training match the features that exist in the new data (in other words, whether the learned features transfer well from one domain to the other). For example, several new methods produce colour images, which would mean the features learned in the analysis described here would likely not transfer well to this new domain (of course, colour images could be converted to greyscale). However, it is conceivable that other technologies that also produce greyscale images could make use of a pre-trained network, and thus benefit from pre-trained weight initialisation and therefore transfer learning.

The machine-learning community often makes pre-trained networks available for others to use, such as in the Model Zoo (https://github.com/BVLC/caffe/wiki/Model-Zoo). Some of the networks available in the Model Zoo took many weeks to train on powerful hardware, and the collection is considered a very useful resource by many who do not have the time or the computational resources available to them for such an involved learning task. Of course, a pre-trained network could be made available to the CLSM or skin lesion analysis community, if the network were trained on a sufficiently large dataset and if indeed the learned features transferred well to other domains.

## **11. Discussion**

Confocal laser scanning microscopy is a technique for obtaining high-resolution optical images with depth selectivity. It enables the noninvasive examination of skin cancer in real time. This makes CLSM very suitable for the screening and early recognition of skin tumours, which augments the success of the therapy. The training of pathologists to acquire and refine their visual diagnostic skills is very time-consuming. To implement diagnostic capabilities on a computer, it is of considerable interest to understand how the diagnostic process unfolds and which texture features are critical for a successful diagnosis. For medical diagnosis, it is also important that the automated diagnostic process can be understood and retraced.

The multiresolution approach with wavelet features mimics the diagnostic guidelines of the dermatopathologist, who uses multiscale features for the examination of CLSM views. The decision rules generated by machine-learning algorithms, such as CART, represent explicit knowledge that can be used to analyse and refine the diagnostic process. The generated rules can be implemented in viewer software, which enables a visual evaluation of the diagnostic performance by the dermatologist. This can be used as a training aid for dermatologists in education. As shown in Section 10, the algorithm correctly classifies 78.0% of the benign common nevi cases and 84.1% of the malignant melanoma cases in the test set. In comparison, the human observer reaches a sensitivity of 85.5% and a specificity of 80.1% (overall performance 82.8%).

Although the CART algorithm discriminates the training set automatically (unsupervised), the feature extraction algorithm is predefined. Algorithms based on artificial neural networks neither perform nor require hand-defined analyses of the image features with predefined (filtering) methods. Instead, they use neural computation inspired by the visual system of mammals. Neural networks process an image using a hierarchical processing architecture which mimics the way the visual cortex processes visual stimuli, from the primary cortex (V1) to different layers (V2–V8) which are selective for different components of the visual stimuli, such as orientation, colour, size, depth and motion. Neural networks are well suited to detecting similarities in images. However, the distributed representation of the acquired knowledge complicates the extraction of the diagnostic information: they reveal nothing about the inference mechanism leading to a classification in a form that is easily readable for the human observer.

Nevertheless, a recent example illustrates why artificial neural networks will play an ever more important role in automated medical diagnostic systems. A recent study reported that pigeons (*Columba livia*) proved to have a remarkable ability to discriminate benign from malignant human breast histopathology images and to detect cancer-relevant microcalcifications in mammogram images after differential training with food reinforcement [30]. The pigeons indicated their discrimination via two distinctively coloured response buttons; for a correct discrimination, food was immediately provided by a dispenser. The pigeons proved not only to be capable of image memorisation but were able to extend the learned skills to novel tissue images. Their diagnostic skills thus resemble those of trained humans. It should be noted that these capabilities were acquired without the benefit of verbal instructions, as is the case with human education.
The low-level vision capabilities of pigeons appear to be equivalent to those of humans; feedforward and hierarchical processing seem to dominate. It can be assumed that pigeons do not explicitly analyse the images with predefined criteria and explicit instructions, as humans do. The reinforcement training of the pigeons resembles the training of artificial neural networks. Given their high diagnostic accuracy, pigeons may serve as a model for the development and improvement of artificial networks (or vice versa). We still do not know in detail how pigeons differentiate such complex visual stimuli, but colour, size, shape, texture and configurational cues seem to contribute. Their visual discrimination performance may guide basic research in artificial neural networks towards computer-assisted image diagnostic systems. Experienced dermatopathologists report that a beginner (a person in education) examines CLSM views strictly according to the dermatological guidelines (Section 4), as the computer does in multiresolution analysis. Drawing on the large number of previously viewed specimens, an experienced person instead judges CLSM views more by their overall visual appearance (personal communication). This is similar to the image analysis performed by a trained neural network. The receptive field of a sensory neuron is the particular region of the visual field in which a stimulus triggers the firing of that neuron. In vision research, it is known that a cat's visual cortex only develops its receptive fields if it receives visual stimuli in the first months of life [31]. The receptive fields in the primary visual cortex can be thought of as 'feature detectors' or 'flexible categorizers': they learn the structure of the input patterns and become sensitive to combinations that are frequently repeated [14]. This underlines the importance of convolutional neural networks in image processing and analysis.
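
The parallel between receptive fields and convolutional layers can be made concrete in a few lines of code. The sketch below applies a 3 × 3 Sobel kernel, used here as a hand-picked stand-in for a learned orientation-selective filter, to a toy image; in a convolutional neural network the kernel weights would be learned from data rather than fixed. The image and kernel are illustrative only.

```python
import numpy as np

# A vertical-edge (Sobel) kernel acts like an orientation-selective
# receptive field: it responds where intensity changes left to right.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

def cross_correlate(image, k):
    """Valid-mode 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = k.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# Toy image: dark left half, bright right half (one vertical edge)
img = np.zeros((8, 8))
img[:, 4:] = 1.0

response = cross_correlate(img, kernel)
# 'response' is large along the edge and exactly zero in the flat
# regions: the behaviour of a feature detector tuned to vertical edges.
```

Stacking such filters, with learned weights and interleaved pooling, is what lets a deep network build the hierarchy of increasingly abstract feature detectors described above.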

Given the relatively small dataset, the performance of the trained neural network model is encouraging. However, despite the good accuracy of the trained model, the results must be considered a proof of concept rather than a model that could be used in a clinical setting. For example, the images were collected from a single department at one hospital in a single region of Austria. Judging the potential real-world accuracy of a trained model would require a far larger dataset, collected from several regions worldwide and carefully curated to ensure that no unintentional bias is introduced (by only collecting data from patients of a certain age range, for example). Trained on such a dataset, a model of this kind could be used in real-world clinical settings as a diagnostic aid.

The work here shows that deep layer neural networks have the capacity to learn the high-level discriminatory features required to classify malignant and benign skin lesions. This can be achieved without any dedicated feature engineering phase, data pre-processing or a priori domain knowledge: in the case of the CLSM image classification task presented here, all that was required was a labelled dataset of previous observations. However, it is also true that neural networks require far more training data than traditional machine vision methods that work on extracted features. This is due to the very high dimensionality of the data, which in our case was ℝ<sup>4096</sup>, in contrast to the analysis of the extracted features, where the dimensionality was ℝ<sup>39</sup>. To compensate for the far higher dimensionality, a much larger dataset is therefore a necessity. In other words, deep learning neural networks are most suitable for data with 'high m, high n' properties: high-dimensional data, such as images, of which many samples exist. Such datasets are common in the medical domain, so deep learning should be of special interest to researchers in healthcare informatics.
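
The gap between the two input spaces can be made concrete with back-of-the-envelope arithmetic. The 64 × 64 patch size is inferred here from the stated dimensionality of ℝ<sup>4096</sup>, and the 100-unit layer is an arbitrary illustrative choice; only the two dimensionalities come from the text.

```python
# Back-of-the-envelope sketch of the dimensionality gap described above.
raw_pixels = 64 * 64        # direct image input: R^4096 (64x64 assumed)
wavelet_features = 39       # hand-engineered feature input: R^39

# Weights in a single fully connected layer of 100 units (illustrative):
weights_raw = raw_pixels * 100              # 409,600 parameters to estimate
weights_features = wavelet_features * 100   # 3,900 parameters to estimate

ratio = weights_raw / weights_features
# Roughly two orders of magnitude more parameters, hence the need for
# far more labelled samples when learning directly from raw pixels.
```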

As parallelized hardware advances, Moore's law begins to plateau and the amount of stored data increases, algorithms that take advantage of this perfect storm will become ever more relevant. We have shown in this chapter that classical approaches to image classification can indeed be emulated by deep neural networks fed with large amounts of observed data. In fields such as medicine, where data are in abundance, highly parallelized algorithms may be the only approach that can deal with such large data sources in a meaningful way. Fortunately, this is no longer the domain of specialized research institutes with access to cluster computing: such algorithms can be trained without large investments in hardware, on a standard desktop workstation equipped with a modestly priced GPU.

## **Author details**


Marco Wiltgen\* and Marcus Bloice

\*Address all correspondence to: marco.wiltgen@medunigraz.at

Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria

## **References**


[3] Bolognia JL, Jorizzo JL, Schaffer JV. Dermatology: Expert Consult Premium Edition. 3rd ed. Saunders, UK; 2012. ISBN 978-0723435716.

[4] Markovic SN, Erickson LA, Flotte TJ, Kottschade LA. Metastatic malignant melanoma. G Ital Dermatol Venereol. 2009;144(1):1–26.

[5] Oliveria S, Saraiya M, Geller A, Heneghan M, Jorgensen C. Sun exposure and risk of melanoma. Arch Dis Child. 2006;1(2):131–138.

[6] Friedman R, Rigel D, Kopf A. Early detection of malignant melanoma: the role of physician examination and self-examination of the skin. CA Cancer J Clin. 1985;35(3):130–151.

[7] Pawley JB. Handbook of Biological Confocal Microscopy. 3rd ed. Springer, Berlin; 2006. ISBN 0-387-25921-X.

[8] Paoli J, Smedh M, Ericson MB. Multiphoton laser scanning microscopy—a novel diagnostic method for superficial skin cancers. Semin Cutan Med Surg. 2009;28(3):190–195.

[9] Patel DV, McGhee CN. Contemporary in vivo confocal microscopy of the living human cornea using white light and laser scanning techniques: a major review. Clin Exp Ophthalmol. 2007;35(1):71–88.

[10] Rajadhyaksha M. Confocal microscopy of skin cancers: translational advances toward clinical utility. Conf Proc IEEE Eng Med Biol Soc. 2009;1:3231–3233.

[11] Hofmann-Wellenhof R, Pellacani G, Malvehy J, Soyer HP (eds). Reflectance Confocal Microscopy for Skin Diseases. Springer, Berlin Heidelberg; 2012. ISBN 978-3-642-21996-2.

[12] Pellacani G, Cesinaro AM, Seidenari S. In vivo assessment of melanocytic nests in nevi and melanomas by reflectance confocal microscopy. Mod Pathol. 2005;18:469–474.

[13] Prasad L, Iyengar SS. Wavelet Analysis with Applications to Image Processing. CRC Press, Boca Raton; 1997.

[14] Marr D. Vision. W.H. Freeman, New York; 1982.

[15] Strang G, Nguyen T. Wavelets and Filter Banks. Wellesley-Cambridge Press, MA, USA; 1996.

[16] Wiltgen M. Confocal laser scanning microscopy in dermatology: manual and automated diagnosis of skin tumours. In: Wang CC, editor. Laser Scanning, Theory and Applications. InTech, Croatia; 2011. p. 133–170. ISBN 978-953-307-205-0.

[17] Murphy KP. Machine Learning: A Probabilistic Perspective. MIT Press, USA; 2012.

[18] Breiman L, Friedman J, Olshen RA, Stone CJ. Classification and Regression Trees. Chapman & Hall, New York, London; 1993.

[19] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2(5):359–366.


## **Super‐Resolution Confocal Microscopy Through Pixel Reassignment**

Longchao Chen, Yuling Wang and Wei Song

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/63192

#### **Abstract**

Confocal microscopy has gained great popularity for the observation of biological microstructures and dynamic processes. Its resolution enhancement comes from shrinking the pinhole size, which, however, severely degrades the imaging signal-to-noise ratio (SNR). A recently developed super-resolution method based on the pixel reassignment technique achieves a factor of √2 resolution improvement over the optical diffraction limit, and reaches a twofold improvement with additional deconvolution. More importantly, the approach allows a better imaging SNR at a lateral resolution similar to that of standard confocal microscopy. Pixel reassignment can be realized both computationally and optically, with the optical realization allowing much faster acquisition of super-resolution images. In this chapter, the development and advancement of super-resolution confocal microscopy through the pixel reassignment method are summarized, and its capabilities for imaging biological structures and interactions are presented.

**Keywords:** super resolution, confocal microscopy, pixel reassignment, computational realization, optical realization

### **1. Introduction**

Better understanding of biological processes at the cellular and subcellular level depends closely on the direct visualization of cellular microstructures. Among the various microscopic techniques, fluorescence microscopy offers the ability to observe, in real time, molecular specificities in living biological samples down to the cellular and/or subcellular scale, and has thus found broad application in cell biology and neuroscience. However, the spatial resolution of conventional microscopy is optically diffraction-limited, restricting its lateral resolution to ∼250 nm and its axial resolution to ∼600 nm (primarily determined by the numerical aperture of the microscope objective). As a result, it is very challenging to resolve subcellular structures with conventional microscopic technologies, because their microstructures are comparable to (or even finer than) the diffraction-limited resolution.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Fortunately, a number of novel fluorescence microscopic techniques with super-resolution capability have been established in recent years to overcome the optical diffraction limit, allowing the observation of many cellular and subcellular structures that are not resolvable by conventional fluorescence microscopy. For example, by sharpening the point-spread function of the microscope through suppression of the fluorescence emission at the rim of a focused laser spot, stimulated emission depletion (STED) microscopy breaks the optical diffraction limit and achieves a resolution as high as ∼30 nm [1]. Localization-based techniques, such as stochastic optical reconstruction microscopy (STORM) and photoactivated localization microscopy (PALM), enable imaging at a resolution of ∼20 nm [2, 3]. Structured illumination microscopy (SIM) applies spatially structured illumination to shift high spatial frequencies into the low-frequency range that can be collected by the microscope [4]. These methods achieve an order-of-magnitude improvement in spatial resolution over conventional fluorescence microscopy. Super-resolution microscopy therefore opens new windows for observing previously unresolved cellular structures and offers great potential for elucidating biological processes at the subcellular and molecular scale [4].

Among these high-resolution fluorescence microscopic techniques, confocal microscopy, the first super-resolution imaging technique, is one of the most widely used imaging approaches with moderately enhanced spatial resolution. Using a focused laser as the excitation source in combination with a pinhole in front of the detector to block out-of-focus signals, confocal microscopy can in principle improve the spatial resolution by a factor of 2. In practice, however, confocal microscopy is valued more for its sectioning capability than for super-resolution, because the twofold resolution improvement is hardly accessible in standard confocal microscopy. The resolution of confocal microscopy depends on the pinhole diameter: higher resolution comes from a smaller pinhole. Such a small pinhole rejects the unwanted out-of-focus light, but parts of the desired in-focus emission are filtered out at the same time. As a result, the signal-to-noise ratio (SNR) decreases drastically as the pinhole shrinks, which in turn practically deteriorates the spatial resolution. Moreover, the fluorescence efficiency within biological samples is often weak, so a relatively large pinhole diameter is typically chosen for the sake of imaging SNR. Standard confocal microscopy is therefore practically unable to provide super-resolution imaging.

In order to achieve improved spatial resolution and better imaging SNR simultaneously in confocal microscopy, the light/fluorescence signals should be detected with an array of nearly closed pinholes instead of a single pinhole [5]. The images acquired by the individual pinholes within the array have the same resolution but different SNR levels [6]. To overcome this limitation, a method applying the pixel reassignment technique was proposed, which appropriately sums the signals from the nearly closed pinholes and thereby enables a simultaneous improvement of resolution and SNR. In this chapter, we present the state-of-the-art super-resolution techniques based on pixel reassignment. Section 2 first presents the principle of pixel reassignment, followed by two different ways of realizing it; representative super-resolution images of biological specimens are also summarized in this section. Finally, some advances in super-resolution confocal microscopy through pixel reassignment are discussed.

## **2. Super resolution by pixel reassignment**


The concept of pixel reassignment was first proposed more than two decades ago to address these drawbacks of standard confocal microscopy [5]. As described, reducing the pinhole diameter towards zero gives, in theory, the finest lateral resolution in confocal microscopy, but it generates fluorescence images with a very low SNR owing to the dramatically degraded light collection efficiency. Although the pinhole size can be enlarged to one Airy unit for better imaging SNR, lateral resolution is then sacrificed. Instead of a single pinhole, a pinhole array is therefore used for light detection, followed by a reconstruction algorithm for image formation. As a result, standard confocal microscopy combined with the pixel reassignment operation is capable of enhancing its lateral resolution and simultaneously achieving a higher imaging SNR.

**Figure 1.** Schematic diagram illustrating the principles of pixel reassignment. (a) One-dimensional representation of pixel reassignment. Two pinholes (left and right) within an array are displaced by a distance *a* from the excitation focus; they detect light signals mostly originating from the location of the peak of the product of PSFdet(*x*−*a*) and PSFex(*x*). In the case that PSFdet and PSFex are identical (i.e. neglecting the Stokes shift), the maximum of PSFeff occurs at a distance *a*/2 from the excitation focus. Thus, the detected light signals from the displaced pinholes are reassigned to the position centred between the excitation focus (the well-aligned pinhole) and the original detection spot. (b) Pixel reassignment operation. The top panel shows the excitation foci (blue circles) created by scanning the illuminating laser across the sample, with four excitation foci of diameter *a* separated by a distance *D*. Bottom: two pixel reassignment operations for increasing the image resolution. The lower left panel represents a twofold reduction of the foci without altering their distance; the lower right panel displays an increase of the foci distance to 2*D* while maintaining all foci sizes. These two implementations produce equivalent image reconstructions, differing only by a global scaling factor.

#### **2.1. Principle of pixel reassignment**

Pixel reassignment demonstrates great potential for improving both lateral resolution and imaging SNR. Instead of summing the signals directly, as in conventional imaging, each signal is reassigned to the particular location from which it most probably originates. **Figure 1(a)** illustrates the principle of pixel reassignment in terms of the excitation and detection point-spread functions (PSFs) [7]. The excitation PSF (PSFex, blue line) represents the distribution of the corresponding excitation focus. At a displaced pinhole, the detection PSF (PSFdet, green line) is centred on the detection axis, with a distributed probability of signal detection around that pinhole. The effective PSF (PSFeff, red line) results from the overlap (multiplication) of PSFdet and PSFex. The well-aligned pinhole is coaxial with the excitation focus, realizing the maximal signal detection probability. As a pinhole detector moves away from the axis of the excitation focus, the signal acquisition probability decreases because of the reduced overlap; consequently, the nearly closed off-axis pinhole detectors yield lower-SNR images.

In the pixel reassignment implementation, a camera (analogous to a pinhole array) rather than a point detector is commonly employed, because its individual pixels can be considered infinitely small pinholes. Neglecting the Stokes shift in single-photon fluorescence and assuming identical PSFdet and PSFex, the maximal probability of signal acquisition (i.e. the peak of PSFeff) lies midway between the peaks of PSFdet and PSFex. **Figure 1(b)** shows two methods for the pixel reassignment operation: either a twofold local contraction of the excitation foci without altering the distance between them (lower left panel of **Figure 1(b)**), or a twofold increase of the distance between the foci while maintaining their original size (lower right panel of **Figure 1(b)**) [8]. By reassigning the signals from all pixels within the detector array (i.e. all displaced pinholes, as shown in **Figure 1(a)**) to their particular locations, a sharper and higher-SNR image is eventually achieved.
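
The midway property stated above is easy to verify numerically. In the sketch below (illustrative only; the unit-width Gaussian PSFs and the pinhole displacement *a* = 2 are arbitrary choices), a displaced detection PSF is multiplied by the excitation PSF and the peak of the product is located:

```python
import numpy as np

a = 2.0                                   # pinhole displacement (arbitrary)
x = np.linspace(-5.0, 5.0, 2001)          # 1D sample coordinate

psf_ex = np.exp(-x**2 / 2.0)              # excitation PSF, centred at 0
psf_det = np.exp(-(x - a)**2 / 2.0)       # detection PSF, centred at a
psf_eff = psf_ex * psf_det                # effective PSF (their product)

peak = x[np.argmax(psf_eff)]
# For identical Gaussian PSFs the product peaks at a/2: the detected
# signal most probably originates midway between excitation focus
# and displaced pinhole, so that is where it is reassigned.
```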

The pixel reassignment technique improves the resolution by a factor of √2 without sacrificing SNR, and the resolution can be further improved, by a deconvolution algorithm, up to a factor of 2 [9, 10]. Although the spatial resolution of the pixel reassignment technique is still lower than that of other super-resolution methods, such as STED and STORM [1–3], it overcomes some of their shortcomings. The technique inherits all the advantages of standard confocal microscopy, including a high imaging rate, acceptable excitation intensity, optical sectioning capability and a broad choice of fluorescent dyes and/or proteins, making it a readily accessible technology for a variety of biological investigations.

Pixel reassignment can be considered an alternative to SIM, theoretically achieving the same spatial resolution improvement as standard SIM through its point-like illumination. In addition, the technique is more flexible than standard SIM: the pixel reassignment operation can be implemented both computationally and experimentally (by adaptation of the optical system). Unlike the computational mode, which is time-consuming in raw data processing, pixel reassignment realized by optical means can obtain super-resolution images with fast acquisition. More details on these two methods for realizing pixel reassignment are presented below.

#### **2.2. Computational realization of pixel reassignment**

#### *2.2.1. Image scanning microscopy*


Image scanning microscopy (ISM), proposed by C. Müller and J. Enderlein in 2009, is a super-resolution microscopic technique based on pixel reassignment [11]. The system is modified from a standard confocal microscope by replacing the point detector (normally a photomultiplier tube) with an electron-multiplying CCD (EMCCD) camera (labelled 9), as shown in **Figure 2(a)**. The camera takes an image at each spatial position of the scanning focus, and a pixel reassignment algorithm then sums the raw images to reconstruct an ISM image, which improves the lateral resolution from 244 nm to 198 nm.

**Figure 2.** Super‐resolution image scanning microscopy (ISM) with computational pixel reassignment. (a) The schemat‐ ic diagram of ISM system. Fluorescence excitation (1); a super‐continuum white light laser equipped with an acousto‐ optic tunable filter; nonpolarizing beam splitter cube (2); dichroic mirror (3); piezo scanning mirror (4); 4f telescope configuration (5); microscope objective (6); beam diagnostic camera (7); confocal aperture with 200 μm diameter (8); EMCCD camera for fluorescence detection (9). (b) Super‐resolution imaging fluorescent beads with 100‐nm diameter. Left panel: Confocal microscopy image; middle panel: ISM image; right panel: Fourier‐weighted ISM image. Scale bar: 1 μm. (c) Linear cross‐sectional distribution along the horizontal axis of an individual bead image in (b). Adapted with permission from reference [11].

Furthermore, deconvolution improves the lateral resolution to 150 nm, 1.63-fold better than the raw-data image, as shown in **Figure 2(b)** and **(c)**. Note that the pinhole in ISM (labeled 8) filters the out-of-focus light, maintaining the same optical sectioning capability as a standard confocal microscope. In this work, the lateral resolution improvement to 198 nm does not rely entirely on the pinhole, owing to its relatively large diameter, which in turn yields a high imaging SNR. Therefore, with computational pixel reassignment, ISM provides images optimized for both spatial resolution and imaging SNR.
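The chapter does not state which deconvolution algorithm was used; Richardson-Lucy iteration is one common choice for fluorescence images. A minimal 1-D, noise-free sketch (illustrative only, not the authors' processing code):

```python
import numpy as np

def richardson_lucy(observed, psf, n_iter=50):
    """Minimal 1-D Richardson-Lucy deconvolution (noise-free sketch)."""
    psf = psf / psf.sum()
    psf_mirror = psf[::-1]
    estimate = np.full_like(observed, observed.mean())
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)   # avoid division by zero
        estimate *= np.convolve(ratio, psf_mirror, mode="same")
    return estimate

# Two point emitters 10 samples apart, blurred by a Gaussian PSF of similar
# width, are barely resolved; deconvolution deepens the dip between them.
truth = np.zeros(101)
truth[45] = truth[55] = 1.0
psf = np.exp(-np.arange(-15, 16)**2 / (2 * 4.0**2))
observed = np.convolve(truth, psf / psf.sum(), mode="same")
restored = richardson_lucy(observed, psf)
```

After the iterations, the dip between the two peaks is markedly deeper in `restored` than in `observed`, which is the sense in which deconvolution "sharpens" the reassigned image.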

#### *2.2.2. Multifocal structured illumination microscopy*

ISM demonstrates multiple advantages, including the same optical sectioning capability as standard confocal microscopy, enhanced lateral resolution, and high fluorescence collection efficiency [11]. However, it suffers from a slow frame rate due to the EMCCD camera (10 ms acquisition per scanning position), making it time-consuming to visualize three-dimensional (3D) microstructures.

To speed up image acquisition, Shroff et al. developed multifocal structured illumination microscopy (MSIM) in 2011, using a sparse lattice of excitation foci (similar to swept-field or spinning-disk confocal microscopy) [9]. As shown in **Figure 3**, MSIM applies a digital micromirror device (DMD) to generate the sparse lattice illumination patterns.

**Figure 3.** The schematic of multifocal structured illumination microscopy (MSIM). Lasers at 561 and 488 nm serve as illumination sources. Both laser outputs are combined with a dichroic (DC). After beam expansion, both lasers are directed onto a digital micromirror device (DMD). The resulting pattern is de-expanded by a pair of lenses and is subsequently delivered by the tube lens and microscope objective inside the microscope (not shown) into the sample. Mechanical shutters (SH) placed in front of the laser outputs are used for switching the illumination on or off. Adapted with permission from reference [9].

After a series of reconstruction steps (with open-source software), MSIM enables 3D subdiffractive imaging with resolution doubling, achieving a lateral resolution of 145 nm and an axial resolution of 400 nm. Moreover, it provides significantly faster image acquisition, at one 2D image per second.


For super-resolution MSIM, data acquisition and processing are implemented as follows (see **Figure 4** for detailed procedures). First, the sample is excited with a sparse, multifocal excitation pattern. Second, the resulting fluorescence image is recorded with a camera, and digital pinholes around each fluorescent focus are applied to reject the out-of-focus emission. Afterwards, pixel reassignment with 2× scaling is used to process the resulting image. These steps are repeated until the entire imaging region has been fully illuminated. A super-resolution image is then obtained through digital summation of all such pinholed and scaled images, and deconvolution brings the total resolution improvement to twofold.
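The pinhole-scale-sum pipeline above can be sketched as follows (a simplified illustration, not the authors' open-source reconstruction code; the function name, the Gaussian digital pinhole, and the integer-pixel focus grid are assumptions). The key trick is that, on a 2× upsampled output grid, reassignment amounts to shifting each pinholed frame by its focus position before summing, so emission recorded at focus + δ lands at 2·focus + δ:

```python
import numpy as np

def mpss(raw_frames, foci_per_frame, shape, pinhole_sigma=2.0):
    """Multifocal-excited, Pinholed, Scaled, Summed (MPSS) reconstruction sketch.

    raw_frames: iterable of 2-D camera images of size `shape`.
    foci_per_frame: for each frame, a list of (row, col) excitation-focus pixels.
    """
    out = np.zeros((2 * shape[0], 2 * shape[1]))       # 2x upsampled output grid
    rr, cc = np.mgrid[0:shape[0], 0:shape[1]]
    for frame, foci in zip(raw_frames, foci_per_frame):
        for fr, fc in foci:
            # digital pinhole: keep only emission close to this excitation focus
            pinholed = frame * np.exp(-((rr - fr)**2 + (cc - fc)**2)
                                      / (2 * pinhole_sigma**2))
            # reassignment with 2x scaling: a camera pixel at focus + delta
            # contributes at 2*focus + delta, i.e. shift the pinholed frame
            # by (fr, fc) on the fine grid before summing
            out[fr:fr + shape[0], fc:fc + shape[1]] += pinholed
    return out
```

Summing over all frames of a full scan tiles the field of view; deconvolving `out` would then yield the final MSIM image.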

**Figure 4.** Super-resolution MSIM realization. The top left figure represents a wide-field image produced with a uniformly illuminated pattern on the sample. The right panel provides the reconstruction procedure for the first, tenth, and final raw images of a 120-frame sequence. The lower left figure displays the super-resolution MSIM image obtained by deconvolving the summed image. Adapted with permission from reference [9].

The resolution improvement of MSIM is demonstrated by imaging antibody-labeled microtubules in human osteosarcoma (U2OS) cells embedded in Fluoromount, as shown in **Figure 5**. Compared to the wide-field images, the multifocal-excited, pinholed, scaled, and summed (MPSS) images have both higher resolution and better contrast (**Figure 5(b)**). In **Figure 5(d)**, the full-width at half maximum (FWHM) of the microtubule intensity profile is estimated at about 145 nm in MSIM images, a twofold resolution enhancement compared with wide-field microscopy (∼299 nm). Moreover, the frame rate for acquiring an image with a field of view of 48 × 49 μm is up to 1 Hz in MSIM, more than 6500-fold faster acquisition than the ISM technology [11].

**Figure 5.** Resolution doubling of MSIM by imaging antibody-labeled microtubules in human osteosarcoma (U2OS) cells. (a) MSIM imaging of microtubules labeled with Alexa Fluor 488 in a fixed cell. The MSIM image is formed from 224 raw images taking ∼1 s total acquisition time, with 4.5 ms per image. Scale bar: 5 μm. (b) Magnified images of the boxed region in (a). Top panel: a wide-field image; middle panel: an MPSS image; bottom panel: an MPSS and deconvolved (MSIM) image. Scale bars: 5 μm. (c) Close-up images of the boxed regions in (b). Scale bars: 1 μm. (d) Intensity profiles along the colored lines in (b), giving FWHM values of 299 nm in wide-field microscopy, 224 nm in MPSS, and 145 nm in MSIM, respectively. Adapted with permission from reference [9].

#### **2.3. Optical realization of pixel reassignment**

Pixel reassignment implemented by computational means is capable of doubling the resolution relative to wide-field imaging [9, 11]. The limitation, however, is that these methods are fundamentally time-consuming compared to standard confocal microscopy, because a large number of raw images must be acquired and processed. Recently, optically realized pixel reassignment has been developed to overcome this limitation by adapting the optical imaging system instead of using digital data-processing operations, producing images with comparable improvement in spatial resolution [8, 10, 12].

#### *2.3.1. Instant structured illumination microscopy*


Instant structured illumination microscopy (ISIM) was developed by Shroff et al. in 2013 [10]. It is analogous to MSIM, but its pixel reassignment operates optically instead of through digital computation. As shown in **Figure 6**, the DMD used in MSIM is replaced with a converging microlens array, which generates the multifocal excitation pattern in ISIM. Correspondingly, a matched pinhole array is added to physically reject the out-of-focus emission. With this modification, optical pixel reassignment is realized by a second matched microlens array that produces a twofold local contraction of each fluorescent focus. The fluorescence emission pattern is imaged onto a camera by galvanometer scanning. Eventually, the pinholed and scaled images are optically summed, enabling a 2-fold resolution enhancement.

**Figure 6.** Principle of instant structured illumination for super-resolution realization. A multifocal excitation pattern is produced with a converging microlens array. For fluorescence detection, a pinhole array that matches the microlens array rejects the out-of-focus fluorescence signals. Afterwards, a second, matched microlens array produces a twofold local contraction of each pinholed fluorescence emission. A galvanometer performs raster scanning of the multifocal excitation and summation of the multifocal emission, thus producing a super-resolution image during each camera exposure. Adapted with permission from reference [10].

ISIM demonstrates 3D super-resolution imaging with a lateral resolution of 145 nm and an axial resolution of 350 nm, nearly comparable with MSIM. Moreover, the optical operation of pixel reassignment gives ISIM a 100 Hz frame rate, allowing super-resolution real-time imaging (almost 100-fold faster than MSIM); taking the data-processing time into account, the speed-up factor exceeds 10000. In addition, the low illumination power in ISIM (∼5–50 W/cm<sup>2</sup>) mitigates photobleaching. As a result, ISIM can perform imaging over tens of time points without obvious photobleaching or photodamage. In **Figure 7**, the rapid growth (∼3.5 μm/s) of the endoplasmic reticulum (ER) is monitored by ISIM even though the formation and growth of new ER tubules take less than 140 ms; such fast biological processes blur in previously developed technologies such as MSIM and ISM [9, 11]. These capabilities make ISIM a powerful tool for time-lapse super-resolution imaging of living biological samples.

**Figure 7.** ISIM demonstrates high frame rate of imaging endoplasmic reticulum (ER) at 100 Hz. (a) The first image from 200 time points. ER labeled with GFP‐Sec61A within MRL‐TR‐transformed human lung fibroblasts. Scale bar: 10 μm. (b) Magnification of image with the large white box in (a). White arrows point out the growth process of an ER tubule; blue arrows represent the remodeling of an ER tubule. Scale bar: 5 μm. (c) Magnification of the image with the small white box in (a), displaying the dynamic formation of a new tubule within 140 ms. Scale bar: 200 nm. Adapted with permission from reference [10].

#### *2.3.2. Re‐scan confocal microscopy*

Rescan confocal microscopy (RCM) is another optical realization of the pixel reassignment technique, proposed by De Luca et al. in 2013 [12]. Compared with ISIM, an RCM is easier to build because the system can be readily modified from a standard confocal microscope, as shown in **Figure 8**. Optical pixel reassignment in RCM is realized as follows: the focal lengths of lenses L2 and L3 are chosen to produce a twofold local contraction of the fluorescent focus spot. Equivalently, the final fluorescence image is magnified twofold while the original size of the fluorescence foci is maintained.


**Figure 8.** The schematic of rescan confocal microscopy (RCM). Unit 1: a standard confocal microscope with a set of scanning mirrors for scanning the excitation light and de-scanning the emission light. Unit 2: a re-scanning configuration for 'writing' the light that passes the pinhole onto the CCD camera. Although the pinhole has a relatively large diameter, the resolution is improved twofold, making the design much more photon-efficient than conventional confocal microscopes of similar resolution. Adapted with permission from reference [12].

This process is accomplished by appropriately changing the angular amplitude of the rescanner. The ratio of the angular amplitudes of the two scanners, expressed by the sweep factor M, changes the properties of the rescan microscope. For M = 1, the microscope has the same lateral resolution as a wide-field microscope, defined by the well-known optical diffraction limit; for M = 2, it achieves super-resolution. The rescanner delivers the fluorescence emission onto the camera pixels, and the camera remains in exposure during rescanning for optical summation of the fluorescent foci.
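The effect of the sweep factor can be checked with a simple 1-D model (an illustration written for this chapter; equal Gaussian excitation and emission PSFs are assumed). A point emitter is scanned, its descanned emission spot is rewritten at M times the scan angle, and the camera image is referred back to object coordinates by dividing by the system magnification M:

```python
import numpy as np

def rescan_psf(M, sigma=1.0, n=2001, span=15.0):
    """Object-space PSF of a rescan microscope with sweep factor M (1-D model)."""
    x0 = np.linspace(-span, span, n)              # scan positions
    u = np.linspace(-span, span, n)               # camera coordinate
    exc = np.exp(-x0**2 / (2 * sigma**2))         # excitation felt by the emitter
    # rescanned emission spot for each scan position, summed on the camera
    camera = np.exp(-(u[:, None] - (M - 1) * x0[None, :])**2
                    / (2 * sigma**2)) @ exc
    return u / M, camera / camera.max()           # object space = camera / magnification

def fwhm(x, y):
    above = x[y >= 0.5]
    return above[-1] - above[0]

x1, I1 = rescan_psf(M=1)                          # wide-field-limited resolution
x2, I2 = rescan_psf(M=2)                          # super-resolution mode
print(fwhm(x2, I2) / fwhm(x1, I1))                # ≈ 0.71, i.e. a sqrt(2) narrower PSF
```

In this model M = 1 reproduces the wide-field emission PSF exactly, while M = 2 narrows the effective PSF by √2, consistent with the bead measurements quoted for RCM.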

The lateral resolution improvement of RCM is quantified by imaging 100-nm fluorescent beads. The FWHM is found to decrease from 245 nm (±15 nm) in wide-field imaging to 170 nm (±10 nm) in RCM imaging, an improvement of roughly √2 without deconvolution. The resolution improvement is also confirmed by visualizing fluorescently labeled microtubules of HUVEC cells in **Figure 9(a)–(f)**. To demonstrate the capability of RCM for monitoring dynamics, time-lapse imaging of living HeLa cells expressing EB3-GFP at the growing ends of microtubules is performed. As shown in **Figure 9(g)**, RCM is able to track these fast dynamics (0.5 μm/s) with the combined advantages of improved resolution, high sensitivity, and a sufficient imaging rate (1 fps).

**Figure 9.** Fluorescently labeled microtubules in HUVEC cells imaged by RCM with sweep factor M=1 (a), which gives an image with the resolution of a wide-field fluorescence microscope, determined by the diffraction limit. In double-sweep mode (sweep factor M=2) (b), RCM improves the resolution. Junctions of microtubules (c, e) and parallel microtubules (d, f) are unresolved at wide-field resolution (c, d) but distinguished by RCM in double-sweep mode (e, f). (g) Screenshots from an RCM time-lapse series of living HeLa cells at M=2 demonstrate the monitoring of fast dynamic structures (0.5 μm/s). Scale bars: 1 μm. Adapted with permission from reference [12].

#### *2.3.3. Two‐photon instant structured illumination microscopy*

RCM improves resolution by a factor of up to 2 compared with wide-field imaging while possessing optical sectioning capabilities like those of a traditional confocal microscope [8]. Two-photon excitation offers even better optical sectioning based on its nonlinear effect: infrared excitation light minimizes optical scattering in tissue, and fluorescence arises only from two-photon absorption. These properties effectively increase the penetration depth and simultaneously suppress the background signal, making two-photon excitation an ideal imaging technique for thick samples.

Two-photon instant structured illumination microscopy (2P ISIM), presented by Shroff et al. in 2014, is a combination of RCM-style optical reassignment and the two-photon excitation technique, as shown in **Figure 10(a)** [8]. As in RCM, an additional scanning component is introduced in 2P ISIM for the optical realization of pixel reassignment. In **Figure 10(b)–(d)**, 2P ISIM provides better resolution than the diffraction-limited two-photon excitation mode when imaging microtubules, and applying deconvolution further improves the lateral resolution in **Figure 10(c)**. Using 100-nm fluorescent beads as imaging targets, 2P ISIM achieves a lateral resolution of ∼150 nm and an axial resolution of ∼400 nm; with deconvolution, this amounts to a factor-of-2 resolution enhancement compared with conventional two-photon wide-field imaging (∼311 nm).


**Figure 10.** Schematic diagram of two-photon instant structured illumination microscopy (2P ISIM) and its imaging capabilities. (a) A pulsed femtosecond laser (2PE) serves as the two-photon excitation source (red line). Fluorescence (green line) is collected and delivered onto a camera. HWP: half-wave plate; POL: polarizer; EXC 2D GALVO: galvanometric mirror for scanning the excitation laser; DC: dichroic mirror; IX-70: microscope part housing objective and sample (not shown); EM 2D GALVO: galvanometric mirror for rescanning the fluorescence emission. (b)–(d) Resolution enhancement of 2P ISIM. (b) 2P ISIM image of immunolabeled microtubules in a fixed U2OS human osteosarcoma cell after deconvolution processing. (c) Magnified view of the yellow rectangular region in (b), indicating the resolution improvement of deconvolved 2P ISIM compared with both 2P wide-field microscopy (2P WF) and 2P ISIM. (d) Fluorescence intensity profiles of the microtubules highlighted with green, red, and blue lines in (c). Scale bars: 10 μm in (b) and 3 μm in (c). Adapted with permission from reference [8].

To demonstrate the enhanced penetration ability of 2P ISIM in living thick samples, embryos of transgenic *Caenorhabditis elegans* expressing GFP-H2B are imaged in **Figure 11**. With 1P illumination, both imaging resolution and contrast severely degrade at depths of more than ∼15 μm from the coverslip surface due to strong scattering in deep tissue (**Figure 11(a), (b)**). This degradation cannot be compensated by increasing the exposure time, which mainly raises the background noise. Two-photon excitation in 2P ISIM effectively suppresses the out-of-focus emission; thus, subnuclear chromatin structures are clearly observed down to a depth of ∼30 μm in **Figure 11(c), (d)**, where the fluorescence signals decrease only slightly with depth.

**Figure 11.** Enhanced penetration ability of 2P ISIM. (a, b) 1P ISIM images of a nematode embryo expressing GFP-H2B in nuclei. (a) Cross sections of the worm embryo at different axial positions. Scale bar: 10 μm. (b) Magnifications of the yellow rectangular regions in (a). Scale bar: 3 μm. Degradation in imaging contrast is observed as the depth increases. (c, d) 2P ISIM visualizes the subnuclear chromatin structure throughout nematode embryos. (c) Cross sections at representative axial positions. Scale bar: 10 μm. (d) Magnifications of the yellow rectangular regions in (c), indicating better resolution, higher contrast, and larger imaging depth compared with 1P ISIM. Scale bar: 2 μm. Adapted with permission from reference [8].

### **3. Conclusion**

In this chapter, we have reviewed super-resolution confocal microscopy (and two-photon microscopy) realized through pixel reassignment, implemented both computationally and optically. These methods demonstrate the combined advantages of resolution improvement, high fluorescence collection efficiency, optical sectioning capability, and fast image acquisition, enabling the investigation of biological structures and processes in 3D at the cellular and even macromolecular level. Additionally, because the method builds directly on standard confocal and/or two-photon microscopy, it relaxes the requirements on fluorescent probes and/or labeling methods that are indispensable in some other super-resolution fluorescence microscopy technologies, such as STORM and PALM [2, 3].

More importantly, the development of these techniques is not limited to the laboratory stage. In 2015, the first commercial setup, the LSM 800, was introduced by Carl Zeiss [13]. It is, in principle, based on ISM but replaces the EMCCD camera with a 32-channel linear GaAsP-PMT array (i.e., the Airyscan detector shown in **Figure 12**). The highest imaging speed of the LSM 800 at 512×512 pixels is up to 8 Hz, tremendously faster than ISM. We therefore expect that super-resolution microscopy based on the pixel reassignment technique has great potential for boosting imaging acquisition speed, and thus for providing a better understanding of intracellular molecular interactions and dynamic processes within living biological specimens.

**Figure 12.** Schematic diagram of the Airyscan detector in the LSM 800. In brief, a hexagonal microlens array (a) collects incident light and is directly connected to the ends (b) of a fiber bundle (c). The other ends (d) of the fibers are in contact with a linear GaAsP-PMT array (e) serving as the detector. An area detector is thus created, onto which the Airy disk is imaged via a zoom optic configuration. Note that each single detector element, replacing the classical pinhole, acts as a separate pinhole in Airyscan detection. Adapted with permission from reference [13].


In addition to imaging acquisition speed, multicolor fluorescence microscopy is desirable for investigating the interactions between different structures or biomolecules by labeling them with distinct colors. Possible interactions can be revealed by the co-localization of the different dyes and/or proteins. Standard fluorescence microscopy, however, may give inaccurate co-localization due to its diffraction-limited resolution. In combination with pixel reassignment, multicolor imaging is anticipated to provide high-resolution imaging of biological interactions within live cells.

In MSIM and ISIM based on the pixel reassignment approach [9, 10], both super-resolution imaging capability and color differentiation have been demonstrated, with the advantages of an easily configured optical system and weak cross-talk between the different colors. However, switching laser lines to excite the different fluorophores sequentially can induce spatial mismatch between the images. It is therefore preferable to excite all fluorophores simultaneously and collect their fluorescence synchronously. Multiple detectors with appropriate dichroic mirrors and emission filters can be used to collect the different fluorescence signals in separate detection channels; alternatively, an imaging spectrometer can be applied to record the spectral features of the fluorophores.

Synchronous imaging decreases the probability of fluorescence photobleaching thanks to the low light exposure, benefiting long-term monitoring of living samples. However, cross-talk between the different fluorophores is common because of their broad, overlapping excitation and emission bands. Although the cross-talk can in principle be removed by selecting dyes with well-separated, non-overlapping emission spectra, such dyes are often unavailable, which restricts this strategy in multicolor imaging. Linear spectral unmixing analysis is a solution for eliminating the cross-talk effect in spectral imaging [14]. The spectrum of the mixed fluorescence signal is expressed as a linear combination of the component dye spectra [15], so the concentration or intensity of the fluorescence from each dye can be precisely recovered. Based on this analysis, both spatial mismatch and cross-talk are mitigated in multicolor imaging of live cells.
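The unmixing step reduces to ordinary least squares. In the sketch below (written for illustration; the two reference spectra are synthetic, strongly overlapping Gaussians standing in for measured dye spectra), each pixel's measured spectrum is modeled as a linear combination of known component spectra and the per-dye contributions are recovered with `numpy.linalg.lstsq`:

```python
import numpy as np

# Reference emission spectra of two dyes sampled at 64 wavelength channels
# (synthetic Gaussians; real reference spectra would be measured)
wl = np.linspace(500, 700, 64)
ref = np.stack([np.exp(-(wl - 560)**2 / (2 * 25**2)),    # dye 1
                np.exp(-(wl - 590)**2 / (2 * 25**2))],   # dye 2
               axis=1)                                    # shape (64, 2)

def unmix(measured, ref):
    """Least-squares linear unmixing: solve measured ≈ ref @ c for the
    per-dye contributions c (non-negativity is not enforced in this sketch)."""
    c, *_ = np.linalg.lstsq(ref, measured, rcond=None)
    return c

# A mixed pixel containing 70 % dye 1 and 30 % dye 2 is recovered exactly
mixed = ref @ np.array([0.7, 0.3])
print(unmix(mixed, ref))          # → approximately [0.7, 0.3]
```

With noisy data, non-negative least squares (e.g. `scipy.optimize.nnls`) is often preferred so that the recovered dye contributions stay physical.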

**Figure 13.** Multicolor RCM reveals cellular microstructures labeled with different dyes. (a) Simultaneous RCM imaging of the nucleus and lysosomes labeled with SYTO 82 and LysoTracker Red, respectively, in a live bEnd.3 cell. Based on linear spectral unmixing analysis, the nucleus (c) and lysosomes (d) are differentiated according to their corresponding spectral features (e). (b) Overlaid image of the RCM images from (c) and (d). Scale bar: 5 μm.

In **Figure 13**, we establish a multicolor RCM with simultaneous excitation of different fluorophores and synchronous collection of their fluorescence. Linear spectral unmixing analysis is implemented for the spectral differentiation of live cells stained with different dyes. The SYTO 82-labeled nucleus and LysoTracker Red-stained lysosomes within live bEnd.3 cells are imaged by RCM with a spectrometer as the spectral detector. The nucleus and lysosomes are captured simultaneously, followed by linear spectral unmixing based on the known spectral features of the two dyes (which overlap severely, as shown in **Figure 13(e)**). **Figure 13(b)–(d)** gives a clear separation of the two kinds of subcellular organelles. This approach is powerful for investigating the dynamic interactions of subcellular structures.

## **Acknowledgements**


This work was supported in part by the National Natural Science Foundation of China grant Nos. 61505238 and 11504042, Daqing Normal University Youth Foundation No. 12ZR12, Daqing Normal University Doctoral Foundation No. 15ZR03, Natural Science Foundation of Heilongjiang Province Nos. A200506 and QC2015066, Science and Technology Research Project of Heilongjiang Province Education Department No. 12543002, and Guidance of Science and Technology Plan Projects of Daqing City No. szdfy‐2015‐59.

## **Author details**

Longchao Chen1, Yuling Wang2 and Wei Song1\*

\*Address all correspondence to: weisong1220@gmail.com

1 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen University Town, Shenzhen, China

2 School of Mechatronics Engineering, Daqing Normal University, Ranghulu District Xibin Road, Daqing, China

## **References**

