We are IntechOpen, the first native scientific publisher of Open Access books.

- 3,350+ Open Access books available
- 108,000+ international authors and editors
- 114M+ downloads
- Delivered to 151 countries
- Our authors are among the top 1% most cited scientists
- 12.2% of contributors are from top 500 universities

A selection of our books is indexed in the Book Citation Index in Web of Science™ Core Collection (BKCI).

## Interested in publishing with us? Contact book.department@intechopen.com

Numbers displayed above are based on latest data collected. For more information visit www.intechopen.com

## **Meet the editor**

Stefan G. Stanciu, D.Eng., is currently a principal scientific researcher at the Center for Microscopy, Microanalysis and Information Processing of University "Politehnica" of Bucharest, where he leads the Information Processing Group. His research interests lie in the area of signal and image processing and computer vision, with special emphasis on applications targeted at microscopy imaging by scanning light and scanning probe techniques. High-priority topics on his research agenda include the development of novel image processing, image fusion and computer vision techniques aimed at increasing the level of information extracted from microscopy images and at enhancing the visualization of microscopy data, as well as the development of novel image quality metrics, image analysis, image understanding and artificial intelligence. His research interests also extend to optoelectronics, optics and photonics.

### Contents

**Preface XI**

Chapter 1 **Laser Probe 3D Cameras Based on Digital Optical Phase Conjugation 1**
Zhiyang Li

Chapter 2 **ISAR Signal Formation and Image Reconstruction as Complex Spatial Transforms 27**
Andon Lazarov

Chapter 3 **Low Bit Rate SAR Image Compression Based on Sparse Representation 51**
Alessandra Budillon and Gilda Schirinzi

Chapter 4 **Polygonal Representation of Digital Curves 71**
Dilip K. Prasad and Maylor K. H. Leung

Chapter 5 **Comparison of Border Descriptors and Pattern Recognition Techniques Applied to Detection and Diagnose of Faults on Sucker-Rod Pumping System 91**
Fábio Soares de Lima, Luiz Affonso Guedes and Diego R. Silva

Chapter 6 **Temporal and Spatial Resolution Limit Study of Radiation Imaging Systems: Notions and Elements of Super Resolution 109**
Faycal Kharfi, Omar Denden and Abdelkader Ali

Chapter 7 **Practical Imaging in Dermatology 135**
Ville Voipio, Heikki Huttunen and Heikki Forsvik

Chapter 8 **Microcalcification Detection in Digitized Mammograms: A Neurobiologically-Inspired Approach 161**
Juan F. Ramirez-Villegas and David F. Ramirez-Moreno

Chapter 9 **Compensating Light Intensity Attenuation in Confocal Scanning Laser Microscopy by Histogram Modeling Methods 187**
Stefan G. Stanciu, George A. Stanciu and Dinu Coltuc



### Preface

We live in a time when digital information plays a key role in various fields. Whether we look towards communications, industry, medicine, scientific research or entertainment, we find digital images to be heavily employed. The high volume of stored and exchanged digital images, together with the increasing availability of advanced digital image acquisition and display techniques and devices, has created a growing need for novel, fast and intelligent algorithms for the manipulation of digital images. The development of advanced, fast and reliable algorithms for digital image pre- and postprocessing, digital image compression, digital image segmentation and computer vision, 2D and 3D data visualization, image metrology and other related subjects represents at this time a high-priority field of research, as current trends and the technological advances now taking place promise an exponential rise in the impact of such topics in the years to come.

This book presents several recent advances that are related to, or fall under, the umbrella of 'digital image processing'. Its purpose is to provide insight into the possibilities offered by digital image processing algorithms in various fields. Digital image processing is a highly multidisciplinary field, and therefore the chapters in this book cover a wide range of topics. The presented mathematical algorithms are accompanied by graphical representations and illustrative examples for enhanced readability. The chapters are written in a manner that allows even a reader with basic experience and knowledge of digital image processing to properly understand the presented algorithms. Hopefully, scientists working in various fields will become aware of the high potential that such algorithms provide, and students will become more interested in this field and enhance their knowledge accordingly. Concurrently, the information in this book is structured so that fellow scientists can use it to push the development of the presented subjects even further.

I would like to thank the authors of the chapters for their valuable contributions, and the editorial team at InTech for providing full support in bringing this book to its current form. I sincerely hope that this book will benefit a wide audience.

> **D.Eng. Stefan G. Stanciu**
> Center for Microscopy – Microanalysis and Information Processing
> University "Politehnica" of Bucharest
> Romania


### **Laser Probe 3D Cameras Based on Digital Optical Phase Conjugation**

Zhiyang Li

*College of Physical Science and Technology, Central China Normal University, Wuhan, Hubei, P. R. China*

#### **1. Introduction**

A camera makes a picture by projecting objects onto the image plane of an optical lens, where the image is recorded with film or a CCD or CMOS image sensor. The pictures thus generated are two-dimensional, and the depth information is lost. However, in many fields depth information is becoming more and more important. In industry, the shape of a component or a die needs to be measured accurately for quality control, automated manufacturing, solid modelling, etc. In auto-navigation, the three-dimensional coordinates of a changing environment need to be acquired in real time to aid path planning for vehicles or intelligent robots. In driving assistance systems, any obstacle in front of a car should be detected within 0.01 second. Even in making 3D movies for true 3D display in the near future, three-dimensional coordinates need to be recorded at a frame rate of at least 25 f/s. For the past few decades intensive research has been carried out and various optical methods have been investigated [Chen, et al., 2000], yet they still cannot fulfil every requirement of present-day applications regarding measuring speed, accuracy, measuring range/area or convenience. For example, although interferometric methods provide very high measuring precision [Yamaguchi, et al., 2006; Barbosa, & Lino, 2007], they are sensitive to speckle noise and vibration and perform measurements only over small areas. Structured light projection methods provide good precision and full-field measurements [Srinivasan, et al., 1984; Guan, et al., 2003], yet the measuring width is still limited to several meters; besides, they often encounter shading problems. Stereovision is a convenient means for large-field measurements without active illumination, but stereo matching often turns out to be very complicated and results in high reconstruction noise [Asim, 2008]. To overcome these drawbacks, improvements and new methods appear constantly.
For example, time-of-flight (TOF) used to be a point-to-point method [Moring, 1989]; nowadays commercial 3D-TOF cameras are available [Stephan, et al., 2008]. Silicon retina sensors have also been developed which support event-based stereo matching [Jürgen & Christoph, 2011]. Among all these efforts, those employing cameras appear more desirable because they are non-contact, relatively cheap, easy to deploy and provide full-field measurements.

The chapter introduces a new camera, a so-called laser probe 3D camera: a camera reinforced with many thousands of laser probes projected onto objects, whose pre-known positions help to determine the three-dimensional coordinates of the objects under investigation. The most challenging task in constructing such a 3D camera is the generation of this huge number of laser probes, with the position of each laser probe independently adaptable according to the shape of an object. In Section 2 we explain how the laser probes can be created by means of digital optical phase conjugation, an accurate method for optical wavefront reconstruction that we put forward earlier [Zhiyang, 2010a, 2010b]. In Section 3 we demonstrate how the laser probes can be used to construct 3D cameras dedicated to various applications, such as micro 3D measurement, fast obstacle detection and 360-deg shape measurement. In Section 4 we discuss further characteristics of laser probe 3D cameras, such as measuring speed, energy consumption and resistance to external interference. Finally, a short summary is given in Section 5.

#### **2. Generation of laser probes via digital optical phase conjugation**

To build a laser-probe 3D camera, one first needs to find a way to project many thousands of laser probes simultaneously into preset positions. Viewed as a whole, the optical field formed by all the laser probes makes this a problem of optical wavefront reconstruction. Although various methods for optical wavefront reconstruction have been reported, few of them can fulfil the above task. For example, an optical lens system can focus a light beam and move it around with a mechanical gear, but it can hardly adjust its focal length quickly enough to produce so many laser probes, far and near, within the time of a camera snapshot. Traditional optical phase conjugate reflection is an efficient way to reconstruct optical wavefronts [Yariv, & Peper, 1977; Feinberg, 1982]. However, it reproduces, or reflects, only existing optical wavefronts based on nonlinear optical effects; that is to say, to generate the above-mentioned laser probes one would first have to find another way to create beforehand the same laser probes, with energy high enough to trigger the nonlinear optical effect. Holography, meanwhile, can reconstruct only static optical wavefronts, since high resolution holographic plates have to be used.

To perform real-time digital optical wavefront reconstruction it is promising to employ spatial light modulators (SLMs) [Amako, et al. 1993; Matoba, et al. 2002; Kohler, et al. 2006]. An SLM can modulate the amplitude or phase of an optical field pixel by pixel in space. Liquid crystal SLMs offer several million pixels, and the width of each pixel can be fabricated as small as 10 micrometers in the case of a projection-type liquid crystal panel. However, the pixel size is still much larger than the wavelength to be employed in a laser probe 3D camera. Given the sensitive wavelength range of a CCD or CMOS image sensor, it is preferable to produce laser probes with a wavelength in the range of 0.35~1.2 micrometers, or 0.7~1.2 micrometers to avoid interference with human eyes if necessary. The wavelength is thus about ten times smaller than the pixel pitch of an SLM, so with bare SLMs only slowly varying optical fields can be reconstructed with acceptable precision. Unfortunately, the optical field formed by many thousands of laser probes may be extremely complex.
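To make the pitch limitation concrete: the finest phase grating a pixelated modulator can write has a period of two pixels, so the largest steering angle follows from the grating equation as sin *θ*max = *λ*/(2*p*), where *p* is the pixel pitch. A back-of-the-envelope sketch (the numbers assume the 10 μm pitch and roughly 1 μm wavelength quoted above; it is an illustration, not a figure from the chapter):

```python
import math

def max_deflection_deg(wavelength_um, pixel_pitch_um):
    """Largest steering angle for a phase grating sampled at the pixel pitch:
    the finest writable grating period is two pixels, so sin(theta) = lam / (2p)."""
    return math.degrees(math.asin(wavelength_um / (2 * pixel_pitch_um)))

print(round(max_deflection_deg(1.0, 10.0), 2))    # ~2.87 degrees
print(round(max_deflection_deg(0.532, 10.0), 2))  # ~1.52 degrees
```

With deflection angles of only a few degrees, a bare SLM can indeed shape only slowly varying fields, which is why the taper described next is needed.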

Recently we introduced an adiabatic waveguide taper to decompose an optical field, however dramatically it varies in space, into a simpler form that is easier to rebuild [Zhiyang, 2010a]. As illustrated in Fig.1, such an adiabatic taper consists of a plurality of single-mode waveguides. At the narrow end of the taper the single-mode waveguides couple to each other, while at the wide end they become optically isolated from each other. When an optical field is incident on the left, narrow end of the taper, it travels to the right, wide end and is decomposed into the fundamental mode field of each isolated single-mode waveguide. Since these fundamental mode fields are separated from each other in space, they can be reconstructed using a pair of low-resolution SLMs and a micro lens array (MLA), as illustrated in Fig.2.
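The decomposition step can be pictured as projecting the incoming field onto each guide's fundamental mode and keeping the complex coefficients. A toy sketch with Gaussian stand-ins for the fundamental modes (the mode shapes, spacings and the incident field are hypothetical, chosen only to show the projection and re-synthesis):

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 2001)
dx = x[1] - x[0]
# Toy "fundamental modes": well-separated, unit-norm Gaussians, one per guide.
centers = np.arange(-15, 16, 5)
modes = [np.exp(-(x - c)**2) for c in centers]
modes = [m / np.sqrt(np.sum(m * m) * dx) for m in modes]

field = 0.8 * modes[2] + 0.6j * modes[4]           # an incident field in the guide basis
coeffs = [np.sum(field * m) * dx for m in modes]   # overlap integrals <m_i, E>
rebuilt = sum(c * m for c, m in zip(coeffs, modes))
print(np.allclose(rebuilt, field, atol=1e-4))      # True: the mode set spans the field
```

Each coefficient is exactly the amplitude and phase that the SLM pair must impose on the corresponding isolated guide at the wide end.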

Fig. 1. Structure of an adiabatic waveguide taper.

Fig. 2. Device to perform digital optical phase conjugation.

For the device in Fig.2 we may adjust the gray scale of each pixel of the SLMs so that it modulates the amplitude and the phase of the illuminating laser beam properly [Neto, et al. 1996; Tudela, et al. 2004], reconstructing a conjugate field proportional to the decomposed fundamental mode field within each isolated single-mode waveguide at the right, wide end. Due to the reciprocity of an optical path, the digitally reconstructed conjugate light field within each isolated single-mode waveguide travels back to the left, narrow end of the taper, where the fields combine and create an optical field proportional to the original incident optical field. Since the device in Fig.2 rebuilds optical fields via digital optical phase conjugation, it automatically gets rid of all the aberrations inherent in conventional optical lens systems. For example, suppose an object A2B2 is placed in front of the optical lens and forms an image A1B1 of poor quality. The reconstructed conjugate image in front of the narrow end of the taper bears all the aberrations of A1B1. However, due to reciprocity, the light exiting from the reconstructed conjugate image of A1B1 follows the same path and returns to its original starting place, restoring A2B2 with exactly the same shape. So the resolution of a digital optical phase conjugation device is merely limited by diffraction, which can be described by

$$dx = \frac{\lambda}{2\sin\theta} \tag{1}$$

where *θ* is the half cone angle of the light beam arriving at a point on the image plane, as indicated in Fig.2. The half cone angle *θ* can be estimated from the critical angle *θ<sup>c</sup>* of incidence of the taper through the relation tan(*θ*)/tan(*θc*) = L1/L2 = |A1B1|/|A2B2| = 1/*βx*, where *β<sup>x</sup>* is the vertical amplification ratio of the whole optical lens system. When SLMs with 1920×1080 pixels are employed, the width of the narrow end of an adiabatic waveguide taper with a refraction index of 1.5 reaches 0.458 mm for *λ*=0.532 μm, or 0.860 mm for *λ*=1 μm, to support Ns=1920 guided eigenmodes. When a 3×3 array of SLMs with the same pixel count is employed, the width of the narrow end of the taper increases to 1.376 mm for *λ*=0.532 μm, or 2.588 mm for *λ*=1 μm, to support a total of Ns=3×1920=5760 guided eigenmodes. The reconstructed conjugate image A1B1 right in front of the narrow end of the taper may have the same height as the taper. Fig.3 plots the lateral resolution at different distances Z from the taper (left), and for different sizes of the reconstructed image A2B2 (middle and right), with *θ<sup>c</sup>*=80º; the resolution for *λ*=0.532 μm is plotted in green and that for *λ*=1 μm in red. It can be seen that within a distance of Z=0~1000 μm the resolution is jointly determined by the wavelength and the pixel number Ns of the SLMs. The optical lens is taken away temporarily, since there is no room for it when Z is less than 1 mm. However, when |A2B2| is larger than 40 mm, the resolution becomes independent of wavelength; it degrades linearly as the pixel number Ns of the SLMs decreases and as the size of |A2B2| increases. When |A2B2|=100 m, the resolution is about 10.25 mm for Ns=1920 and 3.41 mm for Ns=5760 respectively.

Fig. 3. Lateral resolution of a laser probe at a distance Z in the range of 0~1000 μm (left); or with |A2B2| in the range of 1~100 mm (middle); and 0.1~100 m (right), for *λ*=0.532 μm (green line) and *λ*=1 μm (red line).

To see more clearly how the device works, Fig.4 simulates the reconstruction of a single light spot via digital optical phase conjugation. The simulation used the same software and followed the same procedure as described in Ref. [Zhiyang, 2010a]. In the calculation *λ*=1.032 μm, the number of eigenmodes equals 200 and the perfectly matched layer has a thickness of -0.15i. The adiabatic waveguide taper has a refraction index of 1.5. To save time, only the first stack of the taper, which has a height of 20 micrometers and a length of 5 micrometers, was taken into consideration. A small point light source was placed 25 micrometers away from the taper in the air. As can be seen from Fig.4a, the light emitted from the point light source propagates from left to right, enters the first stack of the taper and stimulates various eigenmodes within the taper. The amplitudes and phases of all the guided eigenmodes at the right-side end of the first stack of the taper were transferred to their conjugate forms and used as input on the right side. As can be seen from Fig.4b, the light returned to the left side and rebuilt a point light source with expanded size.

Fig. 4. Reconstruction of a single light spot via digital optical phase conjugation. (a) Distribution for incident light, left: 2-D field; right: 1-D electrical component at Z=0. (b) Distribution for rebuilt light, left: 2-D field; right: 1-D electrical component at Z=0.
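As a sanity check on the large-distance figures quoted above, Eq. (1) can be evaluated directly together with the relation tan(*θ*) = tan(*θc*)/*βx*. A small numeric sketch (it assumes the taper's narrow-end width plays the role of |A1B1|, which is how the text uses it):

```python
import math

def lateral_resolution_um(lam_um, theta_c_deg, a1b1_mm, a2b2_mm):
    """Eq. (1), dx = lam / (2 sin(theta)), with the half cone angle theta obtained
    from tan(theta) = tan(theta_c) / beta_x and beta_x = |A2B2| / |A1B1|."""
    beta_x = a2b2_mm / a1b1_mm
    theta = math.atan(math.tan(math.radians(theta_c_deg)) / beta_x)
    return lam_um / (2.0 * math.sin(theta))

# theta_c = 80 deg, lambda = 0.532 um, |A2B2| = 100 m, narrow-end widths
# 0.458 mm (Ns=1920) and 1.376 mm (Ns=5760) as quoted above.
print(round(lateral_resolution_um(0.532, 80, 0.458, 100_000) / 1000, 2))  # ~10.2 mm
print(round(lateral_resolution_um(0.532, 80, 1.376, 100_000) / 1000, 2))  # ~3.41 mm
```

The results reproduce the ~10.25 mm (Ns=1920) and 3.41 mm (Ns=5760) values to within rounding, and show the linear scaling with |A2B2|/Ns in the large-distance regime.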

incidence of the taper through the relation tan(*θ*)/tan(*θc*) = L1/L2 = |A1B1|/|A2B2| = 1/*βx*, where *βx* is the vertical amplification ratio of the whole optical lens system. When SLMs with 1920×1080 pixels are employed, the width of the narrow end of an adiabatic waveguide taper with a refractive index of 1.5 reaches 0.458 mm for *λ*=0.532 μm, or 0.860 mm for *λ*=1 μm, to support Ns=1920 guided eigenmodes. When a 3×3 array of SLMs with the same pixel count is employed, the width of the narrow end of the taper increases to 1.376 mm for *λ*=0.532 μm, or 2.588 mm for *λ*=1 μm, to support a total of Ns=3×1920=5760 guided eigenmodes. The reconstructed conjugate image A1B1 right in front of the narrow end of the taper may have the same height as the taper. Fig. 3 plots the lateral resolution at different distances Z from the taper (left), and for different sizes of the reconstructed image A2B2 (middle and right), with *θc*=80º; the resolution for *λ*=0.532 μm is plotted in green and that for *λ*=1 μm in red. It can be seen that within a distance of Z=0~1000 μm the resolution is jointly determined by the wavelength and the pixel number Ns of the SLMs. The optical lens is taken away temporarily since there is no room for it when Z is less than 1 mm. However, when |A2B2| is larger than 40 mm, the resolution becomes independent of wavelength; it decreases linearly with the pixel number Ns of the SLMs and increases linearly with the size of |A2B2|. When |A2B2|=100 m, the resolution is about 10.25 mm for Ns=1920 and 3.41 mm for Ns=5760.

Fig. 3. Lateral resolution of a laser probe at a distance Z in the range of 0~1000 μm (left); or with |A2B2| in the range of 1~100 mm (middle) and 0.1~100 m (right), for *λ*=0.532 μm (green line) and *λ*=1 μm (red line).

To see more clearly how the device works, Fig. 4 simulates the reconstruction of a single light spot via digital optical phase conjugation. The simulation used the same software and followed the same procedure as described in Ref. [Zhiyang, 2010a]. In the calculation *λ*=1.032 μm, the number of eigenmodes equals 200, and the perfectly matched layer has a thickness of −0.15i. The adiabatic waveguide taper has a refractive index of 1.5. To save time only the first stack of the taper, which has a height of 20 micrometers and a length of 5 micrometers, was taken into consideration. A small point light source was placed 25 micrometers away from the taper in the air. As can be seen from Fig. 4a, the light emitted from the point light source propagates from left to right, enters the first stack of the taper and stimulates various eigenmodes within the taper. The amplitudes and phases of all the guided eigenmodes at the right-side end of the first stack of the taper were transferred to their conjugate forms and used as input on the right side. As can be seen from Fig. 4b, the light returned to the left side and rebuilt a point light source with expanded size.

(a). Distribution of incident light; left: 2-D field; right: 1-D electrical component at Z=0.
(b). Distribution of rebuilt light; left: 2-D field; right: 1-D electrical component at Z=0.

Fig. 4. Reconstruction of a single light spot via digital optical phase conjugation.

From the X-directional field distribution one can see that the rebuilt light spot has a half-maximum width of about 1 μm, which is very close to the resolution of 0.83 μm predicted by Eq. 1, if the initial width of the point light source is discounted.
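The time-reversal step at the heart of this simulation can be sketched numerically: for a lossless, reciprocal system the taper's mode coupling acts as a unitary matrix, and feeding the conjugated output modes back through it reproduces the conjugate of the input field. The sketch below uses a random unitary as a stand-in assumption for the real taper's mode-coupling matrix; it is an illustration of the principle, not the chapter's eigenmode solver.

```python
# Toy illustration of digital optical phase conjugation in a lossless,
# reciprocal linear system: if the taper maps mode amplitudes a to b = S a
# with S unitary, conjugating b and propagating it back (matrix S^T, by
# reciprocity) reproduces conj(a), i.e. the time-reversed input field.
# S is a random unitary stand-in for the real taper's mode coupling.
import numpy as np

rng = np.random.default_rng(0)
n_modes = 200                                   # number of guided eigenmodes, as in the simulation
# Build a random unitary via QR decomposition of a complex Gaussian matrix
m = rng.normal(size=(n_modes, n_modes)) + 1j * rng.normal(size=(n_modes, n_modes))
S, _ = np.linalg.qr(m)

a = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)   # input mode amplitudes
b = S @ a                        # forward propagation through the taper
a_rebuilt = S.T @ np.conj(b)     # conjugate the output and propagate back

print(np.allclose(a_rebuilt, np.conj(a)))       # True: the input field is rebuilt
```

The identity used here is S^T S* = (S† S)* = I for unitary S, which is why phase conjugation rebuilds the source regardless of how scrambled the mode coupling is.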

Fig. 5 demonstrates how multiple light spots can be reconstructed simultaneously via digital optical phase conjugation. The simulation parameters were the same as in Fig. 4. Three small point light sources were placed 25 micrometers away from the taper, separated by 15 micrometers from each other along the vertical direction. As can be seen from Fig. 5a, the light emitted from the three point light sources propagates from left to right, enters the first stack of the taper and stimulates various eigenmodes within the taper.



(a). Distribution of incident light; left: 2-D field; right: 1-D electrical component at Z=0.
(b). Distribution of rebuilt light; left: 2-D field; right: 1-D electrical component at Z=0.

Fig. 5. Reconstruction of three light spots via digital optical phase conjugation.

The amplitudes and phases of all the guided eigenmodes at the right-side end of the first stack of the taper were recorded. This can also be done in a cumulative way: place one point light source at one position at a time and record the amplitudes and phases of the stimulated guided eigenmodes on the right side; then, for each stimulated guided eigenmode, sum up the amplitudes and phases recorded in the successive steps. Due to the linearity of the system, the resulting amplitudes and phases for each stimulated guided eigenmode are the same as those obtained by placing all three point light sources at their places at the same time. Next, the conjugate forms of the recorded guided eigenmodes were used as input on the right side. As can be seen from Fig. 5b, the light returned to the left side and rebuilt the three point light sources at the same positions but with expanded size. As explained in Ref. [Zhiyang, 2010a], more than 10000 light spots could be generated simultaneously using 8-bit SLMs. Each light spot produces a light cone, or a so-called laser probe.

#### **3. Configurations of laser-probe 3D cameras**

Once a large number of laser probes can be produced, we may employ them to construct 3D cameras for various applications. Four typical configurations, each dedicated to a particular application, are presented in the following four subsections. Subsection 3.1 provides a simple configuration for micro 3D measurement, while Subsection 3.2 focuses on fast obstacle detection in a large volume for auto-navigation and safe driving. The methods and theory set up in Subsection 3.2 also apply in the remaining subsections. Subsection 3.3 discusses the combination of a laser probe 3D camera with stereovision for full-field real-time 3D measurements. Subsection 3.4 briefly discusses strategies for accurate static 3D measurements, including large-size and 360-deg shape measurements for industry inspection. The resolution of each configuration is also analyzed.

#### **3.1 Micro 3D measurement**


To measure the three-dimensional coordinates of a micro object, we may put it under a digital microscope and search the surface with laser probes as illustrated in Fig. 6. When the tip of a laser probe touches the surface it produces a light spot of minimum size, and the preset position Z0 of the tip gives the vertical coordinate of the object. When the tip lies at a height ΔZ below or above the surface, the diameter of the light spot scattered by the surface expands to Δd. From the geometric relation illustrated in Fig. 6 it is easy to see that,

$$
\Delta Z = \frac{Z_0}{d}\,\Delta d \tag{2}
$$

where d is the width of the narrow end of the adiabatic waveguide taper. From Eq. 2 it is clear that the depth resolution depends on the minimum detectable size of Δd. The minimum detectable size on the image plane of the objective lens is limited by the pixel size of the CCD or CMOS image sensor, W0/N0, where W0 is the width of an image sensor containing N0 pixels. When mapped back onto the object plane, the minimum detectable size of Δd is W0/βN0,

Fig. 6. Set-up for micro 3D measurement with laser probes incident from below the object.

where β is the amplification ratio of the objective lens. However, if W0/βN0 is smaller than the diffraction-limited spot size, which is approximately λ/2NA for a well-designed objective lens with numerical aperture NA, the minimum detectable size of Δd is limited instead by λ/2NA.

Using Eq. 2 we can estimate the resolution of ΔZ. As discussed in the previous section, when SLMs with 1920×1080 pixels are employed, the width of the narrow end of an adiabatic waveguide taper with a refractive index of 1.5 reaches d=0.458 mm for *λ*=0.532 μm. When a 3×3 array of SLMs with the same pixel count is employed, d increases to 1.376 mm. Assuming that a 1/2-inch-wide CMOS image sensor with 1920×1080 pixels is placed on the image plane of the objective lens, we have W0/N0 ≈ 12.7 mm/1920 = 6.6 μm. For typical ×4 (NA=0.1), ×10 (NA=0.25), ×40 (NA=0.65) and ×100 (NA=1.25) objective lenses, the diffraction-limited spot sizes λ/2NA are about 2.66, 1.06, 0.41, and 0.21 μm respectively. At a distance of Z0=1 mm, according to Eq. 2, the depth resolutions ΔZ for the above ×4, ×10, ×40, ×100 objective lenses are 5.81, 2.32, 0.89, and 0.46 μm for d=0.458 mm, or 1.93, 0.77, 0.30, and 0.15 μm for d=1.376 mm respectively.
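These figures follow from Eq. 2 with Δd taken as the larger of the sensor-limited size W0/βN0 and the diffraction-limited size λ/2NA. A minimal numeric sketch (function and variable names are illustrative, not from the chapter):

```python
# Depth resolution of the micro 3D measurement setup (Eq. 2):
#   delta_Z = (Z0 / d) * delta_d
# where delta_d is limited either by the sensor pixel pitch mapped back to
# the object plane, W0 / (beta * N0), or by the diffraction limit
# lambda / (2 * NA), whichever is larger. All lengths in micrometers.

WAVELENGTH_UM = 0.532          # lambda
W0_UM = 12700.0                # 1/2-inch sensor width, ~12.7 mm
N0 = 1920                      # horizontal pixel count of the sensor
Z0_UM = 1000.0                 # preset tip distance Z0 = 1 mm

def depth_resolution(beta, na, d_um):
    """Return delta_Z in micrometers for magnification beta, numerical
    aperture na and taper narrow-end width d_um, per Eq. 2."""
    pixel_limit = W0_UM / (beta * N0)              # sensor-limited delta_d
    diffraction_limit = WAVELENGTH_UM / (2 * na)   # lambda / 2NA
    delta_d = max(pixel_limit, diffraction_limit)
    return Z0_UM / d_um * delta_d

for beta, na in [(4, 0.1), (10, 0.25), (40, 0.65), (100, 1.25)]:
    dz_single = depth_resolution(beta, na, 458.0)    # one SLM, d = 0.458 mm
    dz_array = depth_resolution(beta, na, 1376.0)    # 3x3 SLM array, d = 1.376 mm
    print(f"x{beta}: dZ = {dz_single:.2f} um (d=0.458 mm), {dz_array:.2f} um (d=1.376 mm)")
```

Running this reproduces the resolutions quoted above (5.81 μm down to 0.15 μm); for all four lenses the diffraction limit, not the pixel pitch, is the binding constraint.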

In the above discussion we have not taken into consideration the influence of the refractive index of a transparent object. Although it is possible to compensate for this influence once the refractive index is known, another way to avoid it is to insert the narrow end of an adiabatic waveguide taper above the objective lens. This can be done with the help of a small half-transparent, half-reflective beam splitter M as illustrated in Fig. 7. This arrangement offers better depth resolution due to the increased cone angle of the laser probes, at the cost of a troublesome calibration for each objective lens. When searching for the surface of an object, the tips of the laser probes are pushed down slowly toward the object. From the monitored successive digital images it is easy to tell when a particular laser probe touches a particular place on the object. Since the laser probes propagate in the air, the influence of the internal refractive index of the object is eliminated.

Fig. 7. Set-up for micro 3D measurement with laser probes incident from above the objective lens.

Besides discrete laser probes, a laser probe generating unit could also project structured light beams. That means a laser probe 3D camera could also work in structured-light projection mode. It has been demonstrated that by means of structured light projection a lateral resolution of 1 μm and a height resolution of 0.1 μm can be achieved [Leonhardt, et al. 1994].

#### **3.2 Real-time large volume 3D detection**

When investigating a large field, we need to project laser probes to far-away distances. As a result the cone angles of the laser probes become extremely small. A laser probe


might look like a straight laser stick, which makes it difficult to tell where the tip is. In such a case we may use two laser probe generating units and let the laser probes coming from different units meet at preset positions. Since the two laser probe generating units can be separated by a relatively large distance, the angle between two laser probes pointing to the same preset position increases greatly. Therefore the coordinates of objects can be

Fig. 8 illustrates the basic configuration of a laser probe 3D camera constructed with two laser probe generating units U1,2 and a conventional CMOS digital camera C. The camera C lies in the middle of U1,2. In Fig. 8 the laser probe generating unit U1 emits a single laser probe plotted as a red line, while U2 emits a single laser probe plotted as a green line. The two laser probes meet at a preset point A. An auxiliary blue dashed ray is drawn, which originates at the optic centre of the optical lens of the camera C and passes through point A. It is understandable that all the object points lying along the blue dashed line will map onto the same pixel A' of the CMOS image sensor. If an object lies on a plane P1 in front of point A, the camera captures two light spots, with the light spot produced by the red laser probe lying at a pixel distance of -Δj1 on the right side of A' and the light spot produced by the green laser probe lying at a pixel distance of -Δj2 on the left side of A', as illustrated in Fig. 9a. When an object lies on a plane P2 behind point A, the light spots produced by the red and green laser probes exchange their positions, as illustrated in Fig. 9c. When an object sits right at point A the camera captures a single light spot at A', as illustrated in Fig. 9b. Suppose the digital camera C in Fig. 8 has a total of N pixels along the horizontal direction, covering a scene of width W at distance Z; then the X-directional distance Δd1 (or Δd2) between a red (or green) laser probe and the blue dashed line in real space can be estimated from the pixel distance Δj1 (or Δj2) on the captured image by,

$$\Delta d_{1,2} = \frac{W}{N}\,\Delta j_{1,2} = \frac{2Z\tan\alpha}{N}\,\Delta j_{1,2} \tag{3}$$

where α is the half view angle. As illustrated in Fig. 8 and Fig. 9a-c, Δd1,2 is positive when the light spot caused by the red (or green) laser probe lies on the left (or right) side of A'. For illustrative purposes the laser probes emitted from different units are plotted in different colours.

Fig. 8. Basic configuration of a laser probe 3D camera.


In a real laser probe 3D camera all the laser probes may have the same wavelength. To distinguish them we may set the laser probes emitted from one unit slightly higher in the vertical direction than those emitted from the other unit, as illustrated in Fig. 9d-f.

Fig. 9. Images of laser probes reflected by an object located at different distances. Left: in front of A; Middle: right at A; Right: behind A.

From the X-directional distance Δd1,2 it is easy to derive the Z-directional distance ΔZ of the object from the preset position A using the geometric relation,

$$\frac{\Delta d_{1,2}}{\Delta Z} = \frac{D}{2Z_0} \tag{4}$$

where D is the spacing between the two laser probe generating units U1,2 and Z0 is the preset distance of point A. From Eqs. 3-4 it is not difficult to find,

$$Z = Z_0 + \Delta Z = \frac{DNZ_0}{DN - 4Z_0\tan\alpha\,\Delta j_{1,2}} \tag{5}$$

After differentiation and some rearrangement, Eq.5 yields,

$$dZ = \frac{4Z^2\tan\alpha}{DN}\,dj_{1,2} \tag{6}$$

where dZ and dj1,2 are small deviations, i.e. the measuring precisions of ΔZ and Δj1,2 respectively. It is noticeable in Eq. 6 that the preset distance Z0 of a laser probe exerts little influence on the measuring precision of ΔZ. Usually Δj1,2 can be measured with half-pixel precision. Assuming D=1000 mm, tan α=0.5 and dj1,2=0.5, Fig. 10 plots the precision dZ calculated from Eq. 6 when a commercial video camera with 1920×1080 pixels, N=1920 (blue line), or a dedicated camera with 10k×10k pixels, N=10k (red line), is employed. As can be seen from Fig. 10, the depth resolution changes with the square of the object distance Z. At distances of 100, 10, 5, and 1 m, the depth resolutions are 5263, 53, 13, and 0.5 mm for N=1920, which reduce to 1000, 10, 2.5, and 0.1 mm respectively for N=10k. These depth resolutions are acceptable in many applications considering the field is as wide as 100 m at a distance of 100 m. From Eq. 6 it is clear that to improve the depth resolution one can increase D or N, or both. But the most convenient way is to decrease α, that is, to make a close-up of the object. For example, when tan α decreases from 0.5 to 0.05, the measuring precision of Z would


improve by 10 times. That is to say, a 0.5m wide object lying at a distance of 5m from the camera could be measured with a depth precision of 1.3mm (N= 1920), or 0.25mm(N=10k), if its image covers the whole area of the CCD or CMOS image sensor.
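The precisions quoted above follow directly from Eq. 6. A minimal numeric sketch (the function name is illustrative; precisions come out in millimetres):

```python
# Depth precision of the two-unit laser probe camera (Eq. 6):
#   dZ = 4 * Z**2 * tan_alpha / (D * N) * dj
# D: separation of the two generating units, N: horizontal pixel count,
# dj: pixel-position precision (half a pixel). All lengths in mm.

def depth_precision_mm(z_mm, d_mm=1000.0, n_pixels=1920, tan_alpha=0.5, dj=0.5):
    """Small-deviation depth precision dZ at object distance z_mm, per Eq. 6."""
    return 4.0 * z_mm**2 * tan_alpha / (d_mm * n_pixels) * dj

for z_m in (1, 5, 10, 100):
    dz_1920 = depth_precision_mm(z_m * 1000.0)                    # commercial camera
    dz_10k = depth_precision_mm(z_m * 1000.0, n_pixels=10_000)    # dedicated 10k camera
    print(f"Z = {z_m:>3} m: dZ = {dz_1920:8.2f} mm (N=1920), {dz_10k:8.2f} mm (N=10k)")
```

The quadratic growth of dZ with Z is visible immediately, as is the tenfold gain from reducing tan α (pass `tan_alpha=0.05` to reproduce the close-up case).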

Fig. 10. Depth resolution of a laser probe 3D camera in the range of 0~10 m (left) and 0~100 m (right), with D=1000 mm, tan α=0.5, dj1,2=0.5, and N=1920 (blue) or 10k (red).

To acquire the three-dimensional coordinates of a large scene the laser probe generating units should emit hundreds or thousands of laser probes. For convenience, only one laser probe per unit is shown in Fig. 8. In Fig. 11 six laser probes are plotted for each unit. It

Fig. 11. Propagations of laser probes with preset destinations at distance Z0.


is easy to see that as the number of laser probes increases the situation becomes quite complicated. It is true that each laser probe from one unit meets one particular laser probe from the other unit at the six preset points A1-6 respectively. However, the same laser probe also crosses the other five laser probes from the other unit at points other than A1-6. In general, if each laser probe generating unit produces Np laser probes, a total of Np×Np cross points are made by them, and only Np of these lie at preset positions. The other (Np-1)×Np undesired cross points may cause false measurements. Consider the two cross points on plane Z1 and the four cross points on plane Z2 marked with small black circles: we cannot distinguish them from the preset points A2-5, since they all sit on the blue dashed lines and therefore share the same pixel positions on the captured images. As a result it is impossible to tell whether the object is located around the preset points A1-6 or near the plane Z1 or Z2. To resolve this ambiguity we should first find where the planes Z1 and Z2 are located.

As illustrated in Fig. 11, since the optic centre of the optical lens of the digital camera C is placed at the origin (0, 0), the X-Z coordinates of the optic centres of the two laser probe emitting units U1,2 become (D/2, 0) and (-D/2, 0) respectively. Denoting the X-Z coordinates of the Np preset points Ai as (Xi, Z0), i=1,2,…,Np, the equations for the red, blue and green lines can be written respectively as,

$$X = \frac{D}{2} + \left(X_i - \frac{D}{2}\right)\frac{Z}{Z_0}, \quad i = 1,2,\dots,N_p \tag{7}$$

$$X = X_j\,\frac{Z}{Z_0}, \quad j = 1,2,\dots,N_p \tag{8}$$

$$X = -\frac{D}{2} + \left(X_k + \frac{D}{2}\right)\frac{Z}{Z_0}, \quad k = 1,2,\dots,N_p \tag{9}$$

where i, j and k are independent indexes for the preset points Ai, Aj and Ak. The cross points where a red line, a blue line and a green line meet can be found by solving the linear equations Eqs. 7-9, which yields,

$$Z = \frac{D}{D + X_k - X_i}\,Z_0 \tag{10a}$$

$$X = X_j\,\frac{Z}{Z_0} \tag{10b}$$

$$X_j = \frac{X_k + X_i}{2} \tag{10c}$$

When X=Xi=Xj=Xk, Eq. 10a gives Z=Z0; these are the coordinates of the Np preset points. When Xk≠Xi we have Z≠Z0, which gives the coordinates of the cross points that cause ambiguity, like those marked with black circles on plane Z1 or Z2 in Fig. 11.
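The counting argument above (Np preset crossings out of Np×Np) can be checked directly from Eq. 10a and Eq. 10c. The sketch below uses an assumed set of preset destinations (the X values and Z0 are illustrative, not from the chapter); crossings with negative Z lie behind the units and are harmless in practice.

```python
# Enumerate intersections of red (U1) and green (U2) probe rays using
# Eq. 10a / Eq. 10c. Preset destinations sit at (X_i, Z0); any crossing
# with X_k != X_i lies off the Z0 plane and can cause a false reading.
D = 1.0                                    # separation of the two units, metres
Z0 = 8.0                                   # preset destination plane, metres
X_PRESETS = [4.0, 2.0, 0.0, -2.0, -4.0]    # assumed X_i values, 2 m spacing

crossings = []
for x_i in X_PRESETS:        # red ray aimed at (x_i, Z0)
    for x_k in X_PRESETS:    # green ray aimed at (x_k, Z0)
        # With 2 m spacing and D = 1 m, D + x_k - x_i is never zero,
        # so the rays are never parallel and Eq. 10a always applies.
        z = D / (D + x_k - x_i) * Z0       # Eq. 10a: crossing depth
        x_j = (x_k + x_i) / 2.0            # Eq. 10c: index of the blue line hit
        crossings.append((x_i, x_k, x_j, z, x_k == x_i))

preset = [c for c in crossings if c[4]]        # crossings at Z = Z0
spurious = [c for c in crossings if not c[4]]  # potential false measurements
print(f"{len(preset)} preset crossings, {len(spurious)} spurious ones")
```

With Np=5 this prints 5 preset and 20 spurious crossings, i.e. Np and (Np-1)×Np, matching the count in the text.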

One way to eliminate above false measurements is to arrange more laser probes with preset destinations at different Z0 that helps to verify whether the object is located near the preset


points. To avoid further confusion, it is important that laser probes for different Z0 are arranged on different planes, as indicated in Fig.12. Laser probes arranged on the same plane perpendicular to the Y-Z plane share the same cut line on the Y-Z plane. Since the optic centres of the two laser probe emitting units U1,2 and of the optical lens of digital camera C all sit at (0,0) on the Y-Z plane, if we arrange the laser probes for a particular distance Z0 on the same plane perpendicular to the Y-Z plane, they will cross each other only on that plane, with no chance of crossing the laser probes arranged on other planes perpendicular to the Y-Z plane.

Fig. 12. Laser probes arranged on different planes perpendicular to Y-Z plane.

In what follows, we design a laser probe 3D camera for auto-navigation and driving assistant systems, demonstrating in detail how the laser probes can be arranged to provide accurate and unambiguous depth measurement. In view of safety, a 3D camera for auto-navigation or driving assistant systems should detect obstacles within a very short time, acquiring three-dimensional coordinates within a range from 1 to 100m and over a relatively large view angle 2θ. In the following design we let θ ≈ 26.6º, so that *tg*θ=0.5. Since the device is to be mounted within a car, we may choose a large separation between the two laser probe generating units, D=1m, which provides the depth resolution plotted in Fig.10. To avoid the above false measurements we project laser probes with preset destinations on seven different planes, at Z0=2,4,8,14,26,50, and 100m. In addition, the X-directional spaces between adjacent preset destinations are all set to ΔX=Xi-Xi+1=2m, where the preset destination with the lower index number assumes the larger X coordinate. The propagation of these laser probes in the air is illustrated in Figs.13-14. In Fig.13 the propagation of the laser probes over a short range, from zero to the preset destinations, is drawn on the left side, while the propagation of the same laser probes over the entire range of 0~100m is drawn on the right side. In Fig.14 only the propagation of the laser probes over the entire range of 0~100m is shown. The optic centres of the first and second laser probe generating units U1,2 are located at (0.5,0) and (-0.5,0) respectively, while the camera C sits at the origin (0,0). The red and green lines stand for the laser probes emitted from U1 and U2 respectively. The solid blue lines connect the optic centre of the optical lens with the preset destinations of the laser probes on a given plane at Z0; they play the same auxiliary role as the dashed blue lines in Fig.8.
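The arrangement above can be sketched numerically. The following snippet is an illustrative sketch, not code from the chapter; the function name and the descending-X convention are my own. It enumerates the preset destinations on each plane, using the field width W = 2·Z0·*tg*θ = Z0 for *tg*θ = 0.5 and the spacing ΔX = 2m:

```python
# Illustrative sketch: preset laser-probe destinations for the driving-assist
# design (planes at Z0 = 2, 4, 8, 14, 26, 50, 100 m; tg(theta) = 0.5; dX = 2 m).
TG_THETA = 0.5
DX = 2.0

def preset_points(z0):
    """Return (X, Z0) preset destinations spread across the field width W = Z0."""
    w = 2.0 * z0 * TG_THETA          # field width at distance z0 (here W = Z0)
    count = int(w // DX) + 1         # probes fitting at spacing dX (cf. Eq.12)
    x_max = w / 2.0                  # largest X; destinations descend by dX
    return [(x_max - k * DX, z0) for k in range(count)]

for z0 in (2, 4, 8, 14, 26, 50, 100):
    print(z0, len(preset_points(z0)))
```

The per-plane counts produced (2, 3, 5, 8, 14, 26 and 51) agree with Eq.12 of the text, Np = Z0/2 + 1.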

Laser Probe 3D Cameras Based on Digital Optical Phase Conjugation 15


First let's check Fig.13a for Z0=2m. Since ΔX=Xi-Xi+1=2m, Eq.10a becomes,

$$Z = \frac{1}{1+(k-i)\Delta X}Z_0 = \frac{1}{1+2n}Z_0 \tag{11}$$

where n=k-i is an even integer; for an odd integer n, Eq.10c cannot be satisfied. Therefore Z≤Z0/5 if n≠0, which implies that all the possible undesired cross points are located much closer to the camera C. In addition, since the field width at Z0 is W=2Z0·*tg*θ=Z0, the total number of laser probes that can be arranged within a width W is

$$N_p = \frac{W}{\Delta X} + 1 = \frac{Z_0}{2} + 1 \tag{12}$$

According to Eq.12, Np=2 at Z0=2m. As the maximum value of n in Eq.11 is Np-1=1 and n must be an even integer, we have n=0. This means that besides the 2 preset points there are no other cross points. In Fig.13a, the left figure shows only two cross points, at the preset destinations at 2m. We find no extra cross points in the right figure, which plots the propagation of the same laser probes over the large range of 0~100m. In addition, close observation shows that at large distances the X-directional distance between a red (or green) line and an adjacent blue line approaches one fourth of the X-directional distance between two adjacent blue lines. This phenomenon can be explained as follows.
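The cross-point depths of Eq.11 are easy to verify numerically. The sketch below is my own illustration, and its coordinate conventions are assumptions: preset X coordinates are taken to increase with index so that Xk − Xi = (k−i)ΔX, with a red probe emitted from (D/2, 0) and a green probe from (−D/2, 0):

```python
# Illustrative check of Eq.11: depth where a red probe aimed at (X_i, Z0) meets
# a green probe aimed at (X_k, Z0) with X_k = X_i + n*dX (assumed convention).
D, DX = 1.0, 2.0

def cross_depth(z0, n, xi=0.0):
    xk = xi + n * DX
    # red:   X =  D/2 + (xi - D/2) * Z/z0
    # green: X = -D/2 + (xk + D/2) * Z/z0
    # Setting the two equal gives Z/z0 = D / (D + xk - xi).
    return D / (D + xk - xi) * z0

print(cross_depth(4.0, 0))   # n = 0: Z = Z0 = 4.0, a preset point
print(cross_depth(4.0, 2))   # n = 2: Z = Z0/5 = 0.8, the extra cross point of Fig.13b
```

For even n this reproduces Eq.11, Z = Z0/(1+2n); for odd n the blue-line index j=(k+i)/2 required by Eq.10c is not an integer, so no three-line cross point exists there.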

From Z=Z0 to Z=Z0+ΔZ, the X-directional distance between a red (or green) line and an adjacent blue line increases from zero to Δd1,2 as described by Eq.4, while the X-directional distance between adjacent blue lines changes from ΔX to ΔX'. It is easy to find that,

$$\frac{\Delta X}{Z_0} = \frac{\Delta X'}{Z_0 + \Delta Z} \tag{13}$$

Rearranging Eq.13,

14 Digital Image Processing


Fig. 13. Propagations of laser probes with destinations at a). 2m; b). 4m; c). 8m; and d). 14m.

$$\Delta X' = \frac{\Delta X}{Z_0}(Z_0 + \Delta Z) \tag{14}$$

Dividing Eq.4 by Eq.14, we get,

$$\frac{\Delta d_{1,2}}{\Delta X'} = \frac{D}{2\Delta X}\frac{\Delta Z}{Z_0 + \Delta Z} \tag{15}$$

From Eq.15 we can see that Δd1,2/ΔX' approaches 1/4 when ΔZ>>Z0. It can also be seen that Δd1,2/ΔX' becomes -1/4 when ΔZ = -Z0/2. In combination, from Z0/2 to infinity both red and green lines are centred around the blue lines, with X-directional deviations no larger than one fourth of the X-directional distance between adjacent blue lines at the same distance, and thus with no chance of intersecting each other. This implies that no ambiguity occurs if the laser probes with preset destinations at Z0 are used to measure the depth of an object located within the range from Z0/2 to infinity. As shown in Fig.13a, using laser probes with preset destinations at Z0=2m, from the monitored pictures we can definitely tell whether there is an object, and where it is, within the range of 1~100m, if we search around the


preset image position A' and confine the searching pixel range Δj to less than one fourth of the pixel number between two adjacent preset image positions. Since the Np preset points are distributed evenly over a width W, which covers a total of N pixels, Δj ≤ N/4Np. If N=1000, Δj ≤ 125.

Next let's check Fig.13b for Z0=4m. Using Eq.12, we have Np=3. Since the maximum value of n in Eq.11 is Np-1=2 and n must be an even integer, we have n=0, 2. This means that besides the 3 preset points there is Np-n=3-2=1 extra cross point at Z=Z0/5=0.8m, which is clearly seen in the left figure of Fig.13b. The number of extra cross points decreases by n because j=(k+i)/2=k-n/2, as required by Eq.10c, cannot take every value from 1 to Np. As discussed above, using laser probes with preset destinations at Z0=4m, from the captured pictures we can definitely tell whether there is an object, and where it is, within the range of 2~100m, if we confine the searching pixel range to Δj ≤ N/4Np. If N=1000, Δj ≤ 83.
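The search-window rule Δj ≤ N/4Np used in the two examples above amounts to a one-line computation; a sketch (illustrative only, the function name is my own):

```python
# Illustrative sketch of the search-window rule dj <= N/(4*Np): the pixel range
# searched around each preset image position A' on an N-pixel-wide sensor.
def search_range(n_pixels, n_probes):
    return n_pixels // (4 * n_probes)

print(search_range(1000, 2))   # Z0 = 2 m plane (Np = 2): 125 pixels
print(search_range(1000, 3))   # Z0 = 4 m plane (Np = 3): 83 pixels
```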

Fig. 14. Propagations of laser probes with destinations at a). 26m; b). 50m; and c). 100m.

Similarly, both the preset points and the extra cross points are observed exactly as predicted by Eq.12 for Z0=8,14,26,50, and 100m, as illustrated in Fig.13c-d and Fig.14. With the above arrangement a wide object at a certain distance Z might be hit by laser probes with preset destinations on different planes, while a narrow object might still be missed by all the above laser probes, since the X-directional spaces between adjacent laser probes are more than ΔX=2m for Z>Z0, although they decrease to ΔX/2=1m at Z0/2. To detect narrow objects we may add another 100 groups of laser probes with the same preset destinations at Z0 but on different planes perpendicular to the Y-Z plane, each group shifted by ΔX/100=20mm along the X direction, as illustrated in Fig.15. With all these laser probes a slender object as narrow as 20mm, see object O1 in Fig.15a, would be caught without exception in a single measurement. But if an object is not tall enough to cover several rows of laser probes, see object O2 in Fig.15a, it may still escape detection. To increase the possibility of detecting objects with small height we may re-arrange the positions of the laser probes by inserting each row of laser probes from the lower half between the rows of laser probes in the upper half. As a result the maximum X-directional shift between adjacent rows of laser probes reduces from 2-0.02=1.98m to 1m, as illustrated in Fig.15b. As can be seen, the same object O2 now gets caught by a laser probe in the fourth row.

Fig. 15. Arrangements of laser probes with the same destination on the X-Y plane.


In the above design we arranged 100 groups of laser probes with destinations at Z0=2,4,8,14,26,50, and 100m respectively. That is to say, a total of 100×(1+3+5+8+14+26+51)+1=10801 laser probes have been employed. With so many laser probes an object as narrow as 20mm can be detected within a single measurement, or a single frame, without ambiguity. If the object is located between 50 and 100m, it will be detected correctly by any laser probe hitting it. If it comes closer to the camera, although it might be incorrectly reported by laser probes with destinations at Z0=100m, or 50m, etc., it will be correctly reported by laser probes with destinations at smaller Z0. Considering that the measuring ranges of the laser probes overlap greatly, that the X-directional space between adjacent laser probes at Z0/2 is half of that at Z0, and that the car bearing the camera, or the object itself, is moving, an object as narrow as 10mm or much less has a great chance of being detected, i.e. hit by at least one laser probe, within one or several frames.
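The probe budget quoted above can be checked directly (a trivial sketch; the list simply restates the per-plane counts given in the text):

```python
# Total probe count for the design: 100 groups of the per-plane counts quoted
# in the text for Z0 = 2, 4, 8, 14, 26, 50, 100 m, plus one extra probe.
per_plane = [1, 3, 5, 8, 14, 26, 51]
total = 100 * sum(per_plane) + 1
print(total)   # 10801
```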

#### **3.3 Real-time large volume full field 3D measurement**

Fig. 16. A laser probe 3D camera combined with a stereo vision camera.

Usually a laser probe 3D camera as discussed in the previous subsection can acquire at maximum about 10⁴ three-dimensional coordinates per frame using 8-bit SLMs. If dense three-dimensional coordinates need to be acquired in real time, a laser probe 3D camera can be combined with a pair of stereovision cameras. The accurate three-dimensional coordinates coming directly from the laser probe 3D camera, plus those derived from stereovision, make up a complete description of the full field. More importantly, the laser probe 3D camera helps greatly in matching, noise suppression and calibration for stereovision. As illustrated in Fig.16, a pair of digital cameras C1,2 for stereovision have been added to the device in Fig.8, separated by a distance D1. D1 might adopt a value larger or


smaller than D. Each laser probe puts a mark on the object. From the image captured by camera C the absolute location of the object can be calculated using Eq.5, and the positions of the same mark in the pictures captured by cameras C1 and C2 can then be predicted. In other words, a mark in a picture captured by camera C1 can easily be matched with the same mark in the picture captured by camera C2. Without these marks the matching between the pictures captured by cameras C1 and C2 might be very difficult, or impossible over many pixels, creating serious noise in the 3D reconstruction. The matching of the remaining pixels around the marks can then be performed quickly with a reduced searching range. The marks also serve as an accurate and efficient means for the calibration of cameras C1 and C2. In stereovision it is very important to align one camera accurately with the other, which brings great trouble when changing the focal length of one camera to zoom in or out, since the same changes must be made instantly to the focal length of the other camera. With the help of the laser marks, one camera needs only to follow roughly the changes of the other. This is because all the pictures captured by the cameras are projections of objects onto the image plane, which is determined by the location, orientation and focal length of the camera. Usually camera C is fixed at the origin of coordinates as illustrated in Fig.16. The only unknown parameter, *tg*θ in Eq.5, which is related to the focal length, is pre-calibrated. It can also be determined on the spot, for every picture captured by camera C, based on the fact that the same object detected by neighbouring laser probes with different preset destinations Z0 should have nearly the same depth Z as predicted by Eq.5. Even for very rough objects *tg*θ can be properly determined by a least squares fit over the depths of many pairs of neighbouring laser probes.
Next the unknown locations, orientations and focal lengths of cameras C1,2 can be derived from the image positions of hundreds of laser probes whose absolute coordinates are pre-calculated using Eq.5. Then, by stretching, rotation, or a combination of the two, the pictures coming from camera C1 can easily be transformed to match the pictures from camera C2. After the above pre-processing, stereo matching can be performed on the overlapped region of the image pairs from cameras C1,2.
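The transformation step can be sketched as a least-squares fit of a 2-D similarity transform (rotation, scale and translation) over the imaged laser marks. This is my own minimal illustration, not the chapter's algorithm; points are represented as complex numbers so the transform is z' = a·z + b:

```python
# Illustrative sketch: fit z' = a*z + b (a encodes rotation+scale, b translation)
# by least squares over matched laser-mark positions, then map C1 pixels to C2.
def fit_similarity(marks_c1, marks_c2):
    z1 = [complex(x, y) for x, y in marks_c1]
    z2 = [complex(x, y) for x, y in marks_c2]
    m1, m2 = sum(z1) / len(z1), sum(z2) / len(z2)
    # Closed-form least-squares solution about the centroids.
    a = sum((w - m2) * (z - m1).conjugate() for z, w in zip(z1, z2)) \
        / sum(abs(z - m1) ** 2 for z in z1)
    b = m2 - a * m1
    return a, b

def warp(a, b, point):
    z = a * complex(*point) + b
    return (z.real, z.imag)

# Marks as seen by C1, and the same marks as seen by C2 (rotated 90 deg, shifted):
c1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
c2 = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
a, b = fit_similarity(c1, c2)
print(warp(a, b, (1.0, 1.0)))   # approximately (1.0, 4.0)
```

With exact correspondences the fit recovers the transform exactly; with noisy mark positions it returns the least-squares estimate, which is the role the laser marks play in calibrating C1 against C2.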

When the laser probes are arranged as discussed in the previous subsection, we say that the laser probe 3D camera is working in detection mode. For continuous measurements, once the locations of all the objects have been found in a frame, the laser probes can be rearranged much more densely near the surfaces of the known objects, so that more three-dimensional coordinates can be acquired within successive frames. In this case we say that the laser probe 3D camera is working in tracing mode. After some frames in tracing mode, a laser probe 3D camera should return to detection mode for one frame to check whether new objects have appeared in the field. In Fig.16 the number of laser probes is increased to 11 for tracing mode. It can be seen that as the number of laser probes increases, the number of extra crossing points also increases compared with that in Fig.11. Nevertheless these extra cross points are harmless, since we already know the object lies around Z0, away from Z1 to Z3.

The stereovision pictures recorded by cameras C1,2 bear the images of many laser probes, which are harmful for later 3D display. Although these marks could be cleaned away by post image processing, a preferable approach is to separate the visible light from the infrared laser probes with a beam splitter. As illustrated in Fig.17, the beam splitter BS reflects the infrared laser probes onto image transducer CCD1, while passing the visible light onto image transducer CCD2. The advantage of employing two image transducers within one camera

Laser Probe 3D Cameras Based on Digital Optical Phase Conjugation 21

C unchanged, see Fig.18c. From captured picture before and after the movement of U1,2 the exact displacement could be calculated from the displacements of the same laser probes.

For 360-deg shape measurement, we can mount the two laser probe generating units U1,2 and the camera C on two separate circular tracks, with the object under investigation placed at the centre. When the measurement at a certain angle is done, the camera C and laser probe generating units U1,2 could be move to a new view angle separately following the same strategies as discussed above. With the help of laser probes on the overlapped region we can determine over how much angle have U1,2 and camera C each moved, which makes it easy to transfer the local coordinates at a certain view angle accurately to the global coordinates. Otherwise additional measure has to be taken to monitor the positions of U1,2

In shape measurement we can chose CCD or CMOS image sensors with large pixel number to achieve high vertical and lateral resolution at the cost of reduced frame rate. Usually shape measurements are carried out at small fixed distances, if we let D = 2Z, Eq.6 simplifies

Eq.16 implies that the vertical resolution is the same as the lateral resolution determined by the image sensor. When an image sensor with a total of 10k×10k pixels is used, we have N=10000. Then for a subdivision with an area of 100×100mm2, i.e., W=100mm, both vertical and lateral resolutions reach 5μm for dj1,2=0.5. By a least squares fit method sub-pixel resolution for the image position of laser probes are possible [Maalen-Johansen, 1993; Clarke, et al., 1993]. When 1/20 sub-pixel resolution is obtained after least squares fit, dj1,2=0.05, above said resolution improves to 0.5 μm and the relative error reaches 5×10-6.

As discussed in subsection 3.2 within a single frame about 10801 points could be acquired using a laser probe 3D camera, each providing a three dimensional coordinate for the detected object. Usually this number of three dimensional coordinates is enough for industry inspection. The feature sizes, such as width, height, diameter, thickness, etc., could all be derived from measured data. If dense coordinates are needed for very complex shapes, they could be acquired from successive frames. For example, within 100 successive frames, which last 10 seconds at a frame rate of 10f/s, a total of about 100×10801≈106 three dimensional coordinates could be acquired. Between each frame the laser probes shift a little their preset positions along horizontal or vertical direction. When combined together these 106 three dimensional coordinates provide a well description of the entire component. In addition, to detect possible vibrations of the object during successive frames we can fix the positions of a small fraction of laser probes throughout the measurements. The movements of the images of these fixed laser probes help to reveal and eliminate the movements of the

In previous section we discussed four typical configurations of laser probe 3D cameras and their measuring precision. In this section we will provide more analysis concerning such

1,2

<sup>W</sup> dZ dj N (16)

Then measurement on the new subdivision S2 could be carried out.

and camera C.

objects relative to the camera.

**4. Characteristics of laser-probe 3D cameras** 

to,

is that the electronic amplifier for each image transducer may adopt a different gain value so that the dark one does not get lost in the bright one. This is beneficial especially when working in strong day light. If both cameras C1 and C adopted the same structure as illustrated in Fig.17, camera C2 could be taken away because images for visible light from cameras C1 and C are enough to make up a stereovision.

Fig. 17. A digital camera capable of recording infrared and visible light images separately.

#### **3.4 Static large size or 360-deg shape measurement**

In shape measurement for industry inspection the measuring accuracy is more crucial than the measuring speed. Usually the component under investigation stays at a fixed position or moves slowly along a line during measurement. To improve the measuring precision we can first divide the entire area under investigation into many subdivisions, say 10×10 subdivisions, then measure each subdivision with much improved depth resolution due to the reduced view angle, as discussed in Section 2. Dense and accurate three-dimensional coordinates of the entire area could be obtained by patching all the measurements together [Gruen, 1988; Heipke, 1992]. The patching, or aligning, between adjacent subdivisions becomes easy and accurate with the help of laser probes. We can arrange adjacent subdivisions so that they overlap slightly. As illustrated in Fig.18, when shifting from one subdivision S1 to an adjacent subdivision S2 we move the camera C and the laser probe generating units U1,2 separately. First we move the camera C to the new subdivision S2 and keep U1,2 unchanged, see Fig.18b. From the images of the laser probes in the overlap region in the pictures taken before and after the movement of camera C, we can find exactly how much the camera C has moved. This could be accomplished using the fact that the laser probes in the overlap region stay at fixed positions. Next we move U1,2 to the new subdivision S2 with the camera C unchanged, see Fig.18c. From the pictures captured before and after the movement of U1,2, the exact displacement could be calculated from the displacements of the same laser probes. Then the measurement on the new subdivision S2 could be carried out.

Fig. 18. Steps to move laser probe generating unit U1,2 and camera C separately to an adjacent subdivision: a) Step 1; b) Step 2; c) Step 3.
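The camera-shift estimation described above can be sketched numerically. The helper `estimate_shift` and the sample coordinates below are ours, not the chapter's; the sketch simply fits a common translation to the image displacements of the fixed laser probes in the overlap region:

```python
# Estimate the in-plane shift of camera C between two pictures from the image
# positions of the same (fixed) laser probes seen in the overlap region.
def estimate_shift(before, after):
    """before/after: lists of (x, y) image positions of matched laser probes.
    The least-squares translation is the mean displacement of the probes."""
    n = len(before)
    dx = sum(a[0] - b[0] for b, a in zip(before, after)) / n
    dy = sum(a[1] - b[1] for b, a in zip(before, after)) / n
    return dx, dy

# Probes at fixed positions, imaged before and after a (3, -2) pixel camera shift.
before = [(10.0, 20.0), (40.0, 25.0), (70.0, 22.0)]
after  = [(13.0, 18.0), (43.0, 23.0), (73.0, 20.0)]
print(estimate_shift(before, after))  # (3.0, -2.0)
```

Because the probes stay at fixed positions on the object, the mean displacement of their images is a least-squares estimate of the camera's shift; in practice outliers would be rejected before averaging.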

For 360-deg shape measurement, we can mount the two laser probe generating units U1,2 and the camera C on two separate circular tracks, with the object under investigation placed at the centre. When the measurement at a certain angle is done, the camera C and the laser probe generating units U1,2 could be moved to a new view angle separately, following the same strategy as discussed above. With the help of the laser probes in the overlapped region we can determine through what angle U1,2 and camera C have each moved, which makes it easy to transfer the local coordinates at a certain view angle accurately into the global coordinates. Otherwise additional measures would have to be taken to monitor the positions of U1,2 and camera C.

In shape measurement we can choose CCD or CMOS image sensors with a large pixel count to achieve high vertical and lateral resolution at the cost of a reduced frame rate. Usually shape measurements are carried out at small fixed distances; if we let D = 2Z, Eq.6 simplifies to,

$$\text{dZ} = \frac{\text{W}}{\text{N}}\,\text{dj}_{1,2} \tag{16}$$

Eq.16 implies that the vertical resolution is the same as the lateral resolution determined by the image sensor. When an image sensor with a total of 10k×10k pixels is used, we have N=10000. Then for a subdivision with an area of 100×100 mm², i.e., W=100mm, both the vertical and lateral resolutions reach 5μm for dj1,2=0.5. By a least squares fit method, sub-pixel resolution for the image positions of the laser probes is possible [Maalen-Johansen, 1993; Clarke, et al., 1993]. When 1/20 sub-pixel resolution is obtained after the least squares fit, i.e., dj1,2=0.05, the above resolution improves to 0.5 μm and the relative error reaches 5×10⁻⁶.
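The numbers above follow directly from Eq.16; a short script (the function name is ours) reproduces them:

```python
# Depth resolution per Eq.16: dZ = (W / N) * dj, valid when D = 2Z.
def depth_resolution_mm(W_mm, N, dj):
    """W_mm: subdivision width, N: pixels per row, dj: image-position error (pixels)."""
    return (W_mm / N) * dj

# 10k x 10k sensor over a 100 mm wide subdivision.
full_pixel = depth_resolution_mm(100.0, 10_000, 0.5)   # ~0.005 mm = 5 um
sub_pixel  = depth_resolution_mm(100.0, 10_000, 0.05)  # ~0.0005 mm = 0.5 um
relative_error = sub_pixel / 100.0                     # ~5e-6 of the field width

print(full_pixel, sub_pixel, relative_error)
```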

As discussed in subsection 3.2, within a single frame about 10801 points could be acquired using a laser probe 3D camera, each providing a three-dimensional coordinate of the detected object. Usually this number of three-dimensional coordinates is enough for industry inspection. The feature sizes, such as width, height, diameter, thickness, etc., could all be derived from the measured data. If dense coordinates are needed for very complex shapes, they could be acquired from successive frames. For example, within 100 successive frames, which last 10 seconds at a frame rate of 10f/s, a total of about 100×10801 ≈ 10⁶ three-dimensional coordinates could be acquired. Between frames the laser probes shift their preset positions slightly along the horizontal or vertical direction. When combined, these 10⁶ three-dimensional coordinates provide a good description of the entire component. In addition, to detect possible vibrations of the object during successive frames, we can fix the positions of a small fraction of the laser probes throughout the measurements. The movements of the images of these fixed laser probes help to reveal and eliminate the movements of the object relative to the camera.

#### **4. Characteristics of laser-probe 3D cameras**

In the previous section we discussed four typical configurations of laser probe 3D cameras and their measuring precision. In this section we provide more analysis concerning such characteristics as processing speed, power consumption, and resistance to external interferences, and compare them with those of other measuring methods.

#### **4.1 Image processing speed**

The processing, i.e., the derivation of depth information from the pictures captured by a laser probe 3D camera, is simple compared with many other methods like structured light projection, stereovision, etc. One only needs to search on the left and right sides of the pre-known image positions of the preset laser probes to see whether there is a pair of light spots reflected by objects. In other words, one only needs to check whether there are local intensity maxima, or whether the image intensities exceed a certain value, on the left and right sides of Np pre-known pixels within a searching range of ±N/4Np pixels. The searching stops once the pair of light spots is found. Therefore on one image row, a total of Np×2N/4Np = N/2 pixels need to be checked at maximum. Considering that pairs of light spots reflected by objects lie symmetrically around the pre-known image positions, when one light spot is detected the other can be found easily on the other side after one or two steps of searching. This implies that the maximum number of searching steps could be reduced to about N/4. Usually the pre-known pixels are arranged on every other row, so only about one eighth of the total pixels of an image need to be checked. Once a pair of light spots is detected, the depth of the object can be calculated easily using Eq.5 with fewer than 10 operations. For the laser probe 3D camera given in the last section, at most 10801 points need to be calculated. Since the working frequencies of many ARM and FPGA chips have reached 500MHz~1GHz, one operation can be performed within 100ns on many embedded systems. Then the total time to calculate 10801 points is less than 10801×10×100ns ≈ 0.01s. It means that an obstacle in front of a car could be reported within a single frame.
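The search procedure can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation; the threshold test and the sample row are our assumptions:

```python
# Search left and right of a pre-known probe position j on one image row for a
# symmetric pair of bright spots reflected by an object.
def find_pair(row, j, search_range, threshold):
    """row: pixel intensities; returns (left, right) spot indices or None."""
    for d in range(1, search_range + 1):
        left, right = j - d, j + d
        if left < 0 or right >= len(row):
            break
        # Reflected spots lie symmetrically around the pre-known position j.
        if row[left] > threshold and row[right] > threshold:
            return left, right
    return None

row = [0] * 101
row[47] = row[53] = 200              # a reflected pair, symmetric about pixel 50
print(find_pair(row, 50, 10, 100))   # (47, 53)
print(find_pair(row, 80, 10, 100))   # None: no pair around this probe position
```

The early return on the first symmetric hit mirrors the text's point that the search stops as soon as the pair is found, keeping the per-row work well below N pixels.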

#### **4.2 Laser power**

Since the laser power is focused into each laser probe rather than projected over the entire field, a laser probe 3D camera may be equipped with a relatively low power laser source. For micro 3D measurement a laser power of less than 20mW might be enough, because almost all the laser energy might be gathered by the objective lens and shed onto the image transducer, except for the absorption by the SLMs and optical lenses. However, if optical manipulation of micro- or nano-particles is to be carried out, higher energy might be necessary [MacDonald, 2002; Grier, 2003]. For industry inspection a laser power of less than 2W might be enough, since the measurements are usually carried out within about 1m. For obstacle detection within a range of 1~100m, if on average 1~5mW should be assigned to each laser probe for a total of 10801 laser probes, and considering an absorption of 90% by the SLMs and optical lenses, a laser power of 100~500W might be necessary. To reduce the energy absorption, dedicated gray scale SLMs should be employed. In a gray scale SLM the colour filters can be omitted, which results in a threefold decrease in energy absorption as well as a threefold increase in the available pixel number. In addition, among the said 10801 laser probes, those with preset destinations at near distances might be assigned much lower energy. Therefore the total energy for a laser probe 3D camera to measure an area of 100m×100m at a distance of 100m might be reduced to within 20~100W, much less than the lamp power of an LCD projector. The laser power could be further reduced several times if sensitive CCD or CMOS image sensors are employed. Other optical methods could hardly work with such low light power over so large an area. For example, in the structured light projection method, if an output light energy of 20~100W is projected over the same area, the illuminating intensity is only 2~10mW per square meter. In contrast, at a distance of 100m, the diameter of a laser probe could be as small as ~10mm, as discussed in Section 2. It means that even with an average energy of 1~5mW, the illuminating intensity provided by each laser probe could reach 1~5mW per 25*π* square millimetres, which is about 6,400 times (≈10⁶/50*π*) higher than that available in the structured light projection method at the same distance.
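The intensity comparison can be checked with a few lines (a sketch; the variable names are ours, and the figures follow directly from the numbers stated above):

```python
import math

# Structured light: 20~100 W spread over a 100 m x 100 m area.
area_m2 = 100 * 100
structured_mW_per_m2 = (20e3 / area_m2, 100e3 / area_m2)   # 2 ~ 10 mW/m^2

# Laser probe: 1~5 mW focused into a spot ~10 mm in diameter at 100 m.
spot_area_mm2 = math.pi * (10 / 2) ** 2                    # 25*pi mm^2
probe_mW_per_m2 = (1 / spot_area_mm2 * 1e6,                # ~1.3e4 mW/m^2
                   5 / spot_area_mm2 * 1e6)                # ~6.4e4 mW/m^2

ratio = probe_mW_per_m2[0] / structured_mW_per_m2[0]       # = 1e6 / (50*pi)
print(structured_mW_per_m2, probe_mW_per_m2, round(ratio))  # ratio ~ 6366
```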

#### **4.3 Resistance to interferences**


There are various interferences that may decrease the measuring accuracy, to name a few: environmental light, vibration, and the colour, reflectivity and orientation of the object under investigation. In subsection 3.3 we discussed how to eliminate the influence of environmental light with a beam splitter. In subsection 3.4 we introduced a method to detect and eliminate the vibration of an object during successive measuring frames. Since a laser probe 3D camera determines the absolute location of an object by the positions, rather than the exact intensities, of the reflected images of laser probes that are focused with diffraction-limited precision, the colour, reflectivity or orientation of the object exerts limited influence on the measuring results, especially in fast obstacle detection.

There is another interference source that usually receives little attention but is of vital importance in practice, i.e., mutual interference between identical active devices. When several users turn on their laser probe 3D cameras at the same time, will every camera produce good results like it usually does when it works alone? It is true that one camera would now capture the laser probes projected by other cameras. Fortunately, few of the images of laser probes projected by other cameras would lie symmetrically around the pre-known image positions of the laser probes projected by the camera itself, since the laser probes from different devices are projected from different places at different angles. In image processing, to suppress this mutual interference, we can discard all single light spots, or pairs of light spots lying asymmetrically around the pre-known image positions. In addition, referring to Fig.15, we may store in each camera many sets of laser probe arrangements that differ in their vertical pattern, or are rotated by a different angle within the vertical plane. When a camera observes the existence of laser probes from other cameras by turning off its own laser probes, it may choose the arrangement of laser probes that coincides least with the existing ones. Considering that the number of laser probes projected by one camera is about 100 times less than the camera's total pixel number, about ten cameras might work side by side at the same time without much interference with each other. In addition, a laser probe 3D camera could also distinguish its own laser probes from those emitted by other cameras from at least 4 successive frames, with its own laser probes turned on and off repeatedly. Those light spots that appear and disappear in step with this pattern are very likely the images produced by its own laser probes. Furthermore, for a professional laser probe 3D camera, several laser sources with different wavelengths may be incorporated. Accordingly, narrow-band changeable beam splitters should be used in Fig.17. When other cameras exist, a camera may shift to the least occupied wavelength. With all the above strategies, several tens of laser probe 3D cameras may work well side by side at the same time.
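The on/off identification over successive frames can be sketched as a simple per-spot test. The 4-frame alternating pattern below is our illustrative assumption, not a scheme prescribed by the chapter:

```python
# Distinguish a camera's own laser probes by toggling them on/off over 4 frames.
# A spot is accepted as the camera's own only if it appears exactly in the
# frames where the camera's probes were switched on.
def is_own_spot(seen, pattern=(True, False, True, False)):
    """seen: whether the spot was detected in each of the 4 successive frames."""
    return tuple(seen) == pattern

print(is_own_spot([True, False, True, False]))  # True: follows the on/off pattern
print(is_own_spot([True, True, True, True]))    # False: probe from another camera
```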


Chen, F.; Brown, G. M. & Song, M. (2000). Overview of three-dimensional shape measurement using optical methods, *Opt. Eng.* Vol.39, No.1, pp.10–22. Clarke, T.A.; Cooper, M.A.R. & Fryer, J.G.(1993). An estimation for the random error in sub-

Feinberg, J. (1982). Self-pumped continuous-wave phase-conjugation using internal

Gruen, A.W. (1988). Geometrically constrained multiphoto matching, Photogramm. *Eng.* 

Guan, C.; Hassebrook, L.G. & Lau D.L. (2003). Composite structured light pattern for three-

Heipke, C. (1992). A global approach for least squares image matching and surface

Leonhardt, K; Droste, U. & Tiziani, H.J. (1994). Microshape and rough surface analysis by

Jürgen, K. & Christoph S. (2011), Address-event based stereo vision with bio-inspired silicon

Kohler, C.; Schwab, X. & Osten, W. (2006). Optimally tuned spatial light modulators for

Maalen-Johansen, I.(1993). On the precision of sub-pixel measurements in videometry,

MacDonald, M.P., et al. (2002). Creation and manipulation of three-dimensional optically

Matoba, O., et al. (2002). Real-time three-dimensional object reconstruction by use of a phase-encoded digital hologram, *Appl.Opt.,* Vol.41, No.29, pp.6187-6192. Moring, I. (1989). Active 3-D vision system for automatic model-based shape inspection,

Neto, L.G.; Robergy, D.; & Sheng, Y. (1996). Full-range, continuous, complex modulation by

Srinivasan, V.; Liu, H.C. & Halioua M. (1984). Automated phase-measuring profilometry of

Stephan, H.; Thorsten, R & Bianca, H. (2008). A Performance Review of 3D TOF Vision

Tudela, R., et al. (2004). Wavefront reconstruction by adding modulation capabilities of two

Yamaguchi, I. et al. (2006). Surface shape measurement by phase-shifting digital holography

Yariv, A. & Peper, D.M. (1977). Amplified reflection, phase conjugation, and oscillation in

the use of two coupled liquid-crystal televisions, *Appl.Opt.,* Vol.23, No.23, pp.4567-

Systems in Comparison to Stereo Vision Systems, In: *Stereo Vision,* Asim Bhatti,

recognition in object space, *Photogramm. Eng. Remote Sens.,* Vol.58, No.3, pp.317-

retina imagers, In: *Advances in theory and applications of stereo vision,* Asim Bhatti,

Grier, D.G. (2003). A revolution in optical manipulation, *Nature,* Vol.424, pp.810-816.

pp.161-168.

323.

4576.

reflection, *Opt.Lett.,* Vol.7, pp.486.

*Proc.SPIE,* Vol.2252, pp.169-178.

*Opt. Lasers Eng.,* Vol.10, pp.3-4.

*Remote Sens.,* Vol.54, No.5, pp.633-641.

dimensional video. *Opt Express,* Vol.11, pp.406–417.

fringe projection, *Appl. Opt.,* Vol.33, pp.7477-7488.

pp.165-188, InTech, ISBN:978-953-307-516-7, Vienna, Austria

digital holography, *Appl.Opt.,* Vol.45, No.5, pp.960-967.

trapped structures, *Since,* Vol.296, pp.1101-1103.

3-D diffuse objects. *Appl Opt.,* Vol.23, pp.3105–3108.

pp.103-120, InTech, ISBN: 978-953-7619-22-0, Vienna, Austria.

liquid crystal devices, *Opt.Eng.,* Vol.43, No.11, pp.2650-2657.

with a wavelength shift, *Appl. Opt.,* Vol.45, pp.7610–7616.

general four wave mixing, *Opt.Lett.,*Vol.1,No.1, p.16.

pixel target location and its use in the bundle adjustment, *Proc.SPIE*, Vol.2252,

### **5. Conclusion**

In summery, the chapter puts forth a laser probe 3D camera that offers depth information lost in conventional 2D cameras. Via digital optical phase conjugation, it projects hundreds and thousands of laser probes precisely onto preset destinations to realize accurate and quick three dimensional coordinate measurement. A laser probe 3D camera could be designed with vertical and lateral resolutions from sub-micrometer to several micrometers for micro object or medium sized component measurement. It could also be configured for real-time 3D measurement over a large volume—for example, over a distance of 1~100m with a view angle larger than 50º—and detect any obstacle as narrow as 20mm or much less within a single frame or 0.01 second, which is of great use for auto-navigation, safe-driving, intelligent robot, etc.

The laser probes in a 3D camera not only make a 3D measurement simple and quick, but also help a lot in accurate patching for large size or 360-deg shape measurement, monitoring and elimination of vibration, depression of mutual influence when many laser probe 3D cameras work side by side, etc. When incorporated with stereovision they make the stereo matching easy and accurate. More than that, they offer an efficient means for camera calibration so that when one camera zooms in or out, another camera may follow only roughly rather than exactly, alleviating the stringent equipment requirements in stereo movie industry.

With its diffraction limited resolution to digitally reconstruct any optical wavefront, however complex it is, digital optical phase conjugation opened a way for many new techniques. The laser probe 3D camera discussed in the chapter is only one of the many possible applications. Since huge number of laser probes with varying intensity could be created precisely at lots of preset points, pointing more laser probes into each preset point using a large array of laser probe generating units and taking one such preset point as a 3D pixel, real-time true 3D display over a very large space with fine quality could become a reality. High power laser beams could also be formed, accurately focused and steered via digital optical phase conjugation, which may find wide applications in such fields as nuclear fusion, space laser communication, and so on. In micro world arrays of laser probes with sub-micrometer resolution could be employed for fast micro- or nano-partical assembling, operations on DNA, stereo information storage, etc.

#### **6. Acknowledgment**

The work is financially supported by self-determined research funds of CCNU from the colleges' basic research and operation of MOE. It is also partly supported by a key project No.104120 from Ministry of Education of P.R.China.

#### **7. References**

Amako, J.; Miura, H. & Sonehara, T. (1993). Wavefront control using liquid-crystal devices, *Appl.Opt.,* Vol.32, No.23, pp.4323-4329.

Asim, B. (2008). *Stereo Vision*, InTech, ISBN: 978-953-7619-22-0, Vienna, Austria.

Barbosa, E. A. & Lino, A. (2007). Multiwavelength electronic speckle pattern interferometry for surface shape measurement, *Appl. Opt.,* Vol.46, pp.2624–2631.

In summery, the chapter puts forth a laser probe 3D camera that offers depth information lost in conventional 2D cameras. Via digital optical phase conjugation, it projects hundreds and thousands of laser probes precisely onto preset destinations to realize accurate and quick three dimensional coordinate measurement. A laser probe 3D camera could be designed with vertical and lateral resolutions from sub-micrometer to several micrometers for micro object or medium sized component measurement. It could also be configured for real-time 3D measurement over a large volume—for example, over a distance of 1~100m with a view angle larger than 50º—and detect any obstacle as narrow as 20mm or much less within a single frame or 0.01 second, which is of great use for auto-navigation, safe-driving,

The laser probes in a 3D camera not only make a 3D measurement simple and quick, but also help a lot in accurate patching for large size or 360-deg shape measurement, monitoring and elimination of vibration, depression of mutual influence when many laser probe 3D cameras work side by side, etc. When incorporated with stereovision they make the stereo matching easy and accurate. More than that, they offer an efficient means for camera calibration so that when one camera zooms in or out, another camera may follow only roughly rather than exactly, alleviating the stringent equipment requirements in stereo

With its diffraction limited resolution to digitally reconstruct any optical wavefront, however complex it is, digital optical phase conjugation opened a way for many new techniques. The laser probe 3D camera discussed in the chapter is only one of the many possible applications. Since huge number of laser probes with varying intensity could be created precisely at lots of preset points, pointing more laser probes into each preset point using a large array of laser probe generating units and taking one such preset point as a 3D pixel, real-time true 3D display over a very large space with fine quality could become a reality. High power laser beams could also be formed, accurately focused and steered via digital optical phase conjugation, which may find wide applications in such fields as nuclear fusion, space laser communication, and so on. In micro world arrays of laser probes with sub-micrometer resolution could be employed for fast micro- or nano-partical assembling,

The work is financially supported by self-determined research funds of CCNU from the colleges' basic research and operation of MOE. It is also partly supported by a key project

Amako, J.; Miura, H. & Sonehara, T. (1993). Wavefront control using liquid-crystal devices,

Barbosa, E. A. & Lino, A. (2007). Multiwavelength electronic speckle pattern interferometry

Asim, B. (2008). *Stereo Vision*, InTech, ISBN: 978-953-7619-22-0, Vienna, Austria.

for surface shape measurement, *Appl. Opt.,* Vol.46, pp.2624–2631.

**5. Conclusion** 

intelligent robot, etc.

movie industry.

**6. Acknowledgment**

**7. References** 

operations on DNA, stereo information storage, etc.

No.104120 from Ministry of Education of P.R.China.

*Appl.Opt.,* Vol.32, No.23, pp.4323-4329.


**2** 

Andon Lazarov *Burgas Free University* 

*Bulgaria* 

**ISAR Signal Formation and Image** 

**Reconstruction as Complex Spatial Transforms** 

Inverse aperture synthesis in the radar theory is a recording of the complex reflective pattern (complex microwave hologram) of a moving target as a complex signal. The trajectory of moving target limited by the radar's antenna pattern or time of observation is referred to as inverse synthetic aperture, and radar using the principle of inverse aperture synthesis is inverse synthetic aperture radar (ISAR). The spatial distribution of the reflectivity function of the target referred to as a target image can be retrieved from the


### **ISAR Signal Formation and Image Reconstruction as Complex Spatial Transforms**

Andon Lazarov *Burgas Free University Bulgaria* 

#### **1. Introduction**


Inverse aperture synthesis in radar theory is the recording of the complex reflective pattern (complex microwave hologram) of a moving target as a complex signal. The trajectory of the moving target, limited by the radar's antenna pattern or the time of observation, is referred to as an inverse synthetic aperture, and a radar using the principle of inverse aperture synthesis is an inverse synthetic aperture radar (ISAR). The spatial distribution of the reflectivity function of the target, referred to as a target image, can be retrieved from the received complex signals by applying image reconstruction techniques.

Conventional ISAR systems are coherent radars. When the radar utilizes a range-Doppler principle to obtain the desired image, the range resolution of the radar image is directly related to the bandwidth of the transmitted radar signal, while the cross-range resolution is obtained from the Doppler frequency gradient generated by the radial displacement of the object relative to the radar.

A common approach in the ISAR technique is division of the arbitrary movement of the target into radial displacement of its mass centre and rotational motion about the mass centre. The radial displacement is considered not informative and is compensated, and only the rotational motion is used for signal processing and image reconstruction. In this case feature extraction is decomposed into motion compensation and image reconstruction (Li et al., 2001). Multiple ISAR image reconstruction techniques have been created, which can be divided into parametric and nonparametric methods according to the signal model description and the methods of target feature extraction (Berizzi et al., 2002; Martorella et al., 2003; Berizzi et al., 2004). Range-Doppler is the simplest nonparametric technique, implemented by a two-dimensional inverse Fourier transform (2-D IFT). When the image becomes blurred due to a significant change of the effective rotation vector or a large aspect angle variation during the integration time, motion compensation is applied, which consists of coarse range alignment and fine phase correction, called an autofocus algorithm. It is performed via tracking and polynomial approximation of the signal history from a dominant or well isolated point scatterer on the target (Chen & Andrews, 1980), referred to as the dominant scatterer algorithm or prominent point processing, or from a synthesized scatterer such as the centroid of multiple scatterers (Wu et al., 1995), referred to as the multiple scatterer algorithm. An autofocus technique for random translational motion compensation based on the definition of an entropy image cost function is developed in (Xi et al., 1999). A time window technique for suitable


selection of the signals to be coherently processed and to provide a focused image is suggested in (Martorella & Berizzi, 2005). A robust autofocus algorithm based on a flexible parametric signal model for motion estimation and feature extraction in ISAR imaging of moving targets, via minimizing a nonlinear least squares cost function, is proposed in (Li et al., 2001). Joint time-frequency transforms for radar range-Doppler imaging and ISAR motion compensation via an adaptive joint time-frequency technique are presented in (Chen & Qian, 1998; Qian & Chen, 1998).

In the present chapter, assuming the target to be imaged is an assembly of generic point scatterers, an ISAR concept comprising three-dimensional (3-D) geometry and kinematics; short monochromatic, linear frequency modulated (LFM) and phase code modulated (PCM) signals; and target imaging algorithms is thoroughly considered. Based on functional analysis, an original interpretation of the mathematical descriptions of ISAR signal formation and image reconstruction as a direct and an inverse spatial transform, respectively, is suggested. It is proven that the Doppler frequency of a particular generic point is congruent with its space coordinate at the moment of imaging. In this sense, ISAR image reconstruction is in its essence a technique of total radial motion compensation of a moving target. Without resort to the signal history of a dominant point scatterer, a higher-order motion compensation algorithm based on image entropy minimization is created.

#### **2. ISAR complex signal of a point target (scatterer)**

#### **2.1 Kinematic equation of a moving point target**

The Doppler frequency induced by the radial displacement of the target with respect to the point of observation is a major characteristic in ISAR imaging. It requires analysis of the kinematics and signal reflected by moving target. Consider an ISAR placed in the origin of the coordinate system (Oxy) and the point **A** as an initial position with vector **R**(0) at the moment *t* = 0, and the point **B** as a current or final position with vector **R**(*t*) at the moment *t*  (Fig. 1).

Fig. 1. Kinematics of a point target.

Assume a point target is moving at a vector velocity **v**, and then the kinematic vector equation can be expressed as

$$\mathbf{R}(t) = \mathbf{R}(0) + \mathbf{v}.t \tag{1}$$

which in matrix form can be rewritten as


$$\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} + \begin{bmatrix} v_x \\ v_y \end{bmatrix} t \tag{2}$$

where $x(0) = R(0)\cos\alpha$ and $y(0) = R(0)\sin\alpha$ are the coordinates of the initial position of the target (point **A**); $R(0) = \sqrt{x^2(0) + y^2(0)}$ is the module of the initial vector; $\alpha$ is the initial aspect angle; $v_x = v\cos\beta$ and $v_y = v\sin\beta$ are the coordinates of the vector velocity; $v$ is the module of the vector velocity; and $\beta$ is the angle between the vector velocity and the $Ox$ axis.

The time dependent distance ISAR – point target can be expressed as

$$R(t) = \sqrt{R^2(0) + (v t)^2 + 2\left(\mathbf{R}(0)\cdot\mathbf{v}\right)t}\,, \tag{3}$$

where $\mathbf{R}(0)\cdot\mathbf{v}$ is the inner product, defined by $\mathbf{R}(0)\cdot\mathbf{v} = x(0)v_x + y(0)v_y$ or $\mathbf{R}(0)\cdot\mathbf{v} = R(0)\,v\cos\theta$; $\theta$ is the angle between the position vector $\mathbf{R}(0)$ and the vector velocity $\mathbf{v}$, defined by the equation

$$\theta = \arccos\frac{x(0)\,v_x + y(0)\,v_y}{R(0)\,v}. \tag{4}$$

Then Eq. (3) can be rewritten as

$$R(t) = \sqrt{R^2(0) + (v t)^2 + 2R(0)(v t)\cos\theta}\,, \tag{5}$$

where *R*(0) is the distance to the target at the moment *t* 0, measured on *OA*, the initial line of sight (LOS).

The radial velocity of the target at the moment *t* is defined by differentiation of Eq. (5), i.e.

$$v_r(t) = \frac{dR(t)}{dt} = \frac{v^2 t + R(0)\,v\cos\theta}{\sqrt{R^2(0) + (v t)^2 + 2R(0)\,v\,t\cos\theta}}\,. \tag{6}$$

If $t = 0$, the radial velocity is $v_r(0) = v\cos\theta$. In case the angle $\theta = 0$, then $v_r(0) = v$. At the moment $t = T$ when $vT = -R(0)\cos\theta$, the target is on the traverse, then $v_r(T) = 0$ and $T = -R(0)\cos\theta/v$, which for the kinematics in Fig. 1 requires the angle $\theta$ to have a value $\theta > \pi/2$. The time variation of the radial velocity of the target causes a time dependent Doppler shift in the frequency of the signal reflected from the target.
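The traverse condition above is easy to check numerically. Below is a minimal sketch of Eq. (6); the parameter values are illustrative and not taken from the chapter:

```python
import math

def radial_velocity(t, R0, v, theta):
    """Radial velocity v_r(t) of Eq. (6) for straight-line motion."""
    num = v ** 2 * t + R0 * v * math.cos(theta)
    den = math.sqrt(R0 ** 2 + (v * t) ** 2 + 2 * R0 * v * t * math.cos(theta))
    return num / den

# Illustrative values: theta > pi/2, so the traverse moment T is positive.
R0, v, theta = 1000.0, 29.0, 2 * math.pi / 3
T = -R0 * math.cos(theta) / v   # traverse moment: v_r(T) = 0

print(radial_velocity(0.0, R0, v, theta))            # v*cos(theta), about -14.5
print(abs(radial_velocity(T, R0, v, theta)) < 1e-9)  # True: zero on the traverse
```

At $t = 0$ the routine reproduces $v_r(0) = v\cos\theta$, and at $t = T$ the radial velocity vanishes, confirming the traverse condition.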

#### **2.2 Doppler frequency of a moving point target**

Assume that the ISAR emits to the target a continuous sinusoidal waveform, i.e.


$$s(t) = A_0 \exp(j\omega t), \tag{7}$$

where $A_0$ is the amplitude of the emitted waveform, $\omega = 2\pi f = 2\pi c/\lambda$ is the angular frequency, $f$ is the carrier frequency, $\lambda$ is the wavelength of the emitted waveform, and $c = 3\cdot 10^8$ m/s is the speed of light in vacuum.

The signal reflected from the target can be defined as a time delayed replica of the emitted waveform, i.e.

$$s(t) = A_i \exp\left(j\omega(t - t_i)\right), \tag{8}$$

where $A_i$ is the amplitude of the reflected signal, $t_i = \dfrac{2R_i(t)}{c}$ is the time delay of the replica of the emitted waveform, and $R_i(t)$ is the radial slant range distance to the target, calculated by Eq. (5). Define the general phase of the reflected signal as

$$\Phi(t) = \omega\left(t - \frac{2R_i(t)}{c}\right). \tag{9}$$

Then the current angular frequency of the reflected signal can be determined as

$$\hat{\omega}(t) = \frac{d\Phi(t)}{dt} = \omega - \frac{2\omega}{c}\frac{dR_i(t)}{dt}\,, \tag{10}$$

or, equivalently,

$$\hat{\omega}(t) = \frac{d\Phi(t)}{dt} = \omega - \frac{4\pi}{\lambda}\frac{dR_i(t)}{dt}\,, \tag{11}$$

where $\omega_D(t) = \dfrac{4\pi}{\lambda}\dfrac{dR_i(t)}{dt}$ is the angular time dependent Doppler frequency.

For a closing target, $\dfrac{dR_i(t)}{dt} < 0$, the angular Doppler frequency is negative, $\omega_D(t) < 0$, and the current angular frequency of the signal reflected from the target, $\hat{\omega}(t)$, increases, i.e. $\hat{\omega}(t) = \omega - \omega_D(t) > \omega$. For a receding target, $\dfrac{dR_i(t)}{dt} > 0$, the angular Doppler frequency is positive, $\omega_D(t) > 0$, and the current frequency of the signal reflected from the target, $\hat{\omega}(t)$, decreases, i.e. $\hat{\omega}(t) = \omega - \omega_D(t) < \omega$.

Based on Eq. (6) the angular Doppler frequency can be expressed as

$$\omega_D(t) = \frac{4\pi}{\lambda}\frac{v^2 t + R(0)\,v\cos\theta}{\sqrt{R^2(0) + (v t)^2 + 2R(0)\,v\,t\cos\theta}}\,. \tag{12}$$

Accordingly the absolute Doppler frequency can be defined as

$$F_D(t) = \frac{2}{\lambda}\cdot\frac{v^2 t + R(0)\,v\cos\theta}{\sqrt{R^2(0) + (v t)^2 + 2R(0)\,v\,t\cos\theta}}\,. \tag{13}$$

If $t = 0$, then $F_D(0) = \frac{2}{\lambda}v\cos\theta$. If $\theta = 0$, then $F_D(0) = \frac{2}{\lambda}v$. At the moment $t = T$, i.e. $vT = -R(0)\cos\theta$, the target is on the traverse, and then $F_D(T) = 0$. For $\theta = 0$, $F_D(t) = \frac{2}{\lambda}v$ is constant, while for $\theta = \pi/2$, $F_D(t) = \frac{2}{\lambda}\cdot\frac{v^2 t}{\sqrt{R^2(0) + (v t)^2}}$. Hence, in case $\theta \neq 0$ the Doppler frequency is time dependent during the aperture synthesis, the coherent processing interval (CPI), but only one value has a meaning for ISAR imaging: the value defined at the moment of imaging, which will be proven in subsection 3.3.
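The sign behaviour of Eq. (13) can be verified directly. A short sketch, with illustrative parameters that are not the chapter's:

```python
import math

def doppler_frequency(t, R0, v, theta, lam):
    """Absolute Doppler frequency F_D(t) of Eq. (13)."""
    num = v ** 2 * t + R0 * v * math.cos(theta)
    den = math.sqrt(R0 ** 2 + (v * t) ** 2 + 2 * R0 * v * t * math.cos(theta))
    return (2.0 / lam) * num / den

R0, v, theta, lam = 1.0e4, 29.0, 0.6 * math.pi, 0.03
T = -R0 * math.cos(theta) / v   # zero-Doppler (imaging) moment

print(doppler_frequency(0.0, R0, v, theta, lam) < 0)       # True: closing before T
print(abs(doppler_frequency(T, R0, v, theta, lam)) < 1e-6)  # True: F_D(T) = 0
print(doppler_frequency(2 * T, R0, v, theta, lam) > 0)      # True: receding after T
```

The Doppler frequency is negative while the target closes, crosses zero at the traverse moment, and becomes positive as the target recedes, exactly as the derivation above predicts.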

#### **2.3 Numerical experiments**

#### **2.3.1 Example 1**


Assume that the point target is moving at the velocity $v = 29$ m/s and is illuminated by a continuous waveform with wavelength $\lambda = 3\cdot 10^{-2}$ m (frequency $f = 10^{10}$ Hz). The CPI time is $t = 712$–$722$ s, the initial distance $R(0) = 10^5$ m, the guiding angle $0.9\pi$ and the position angle $\pi/3$. The calculation results of the current signal frequency and the Doppler frequency are illustrated in Figs. 2(a) and (b).

Fig. 2. Current ISAR signal and Doppler frequency caused by time varying radial velocity.

It is worth noting that the current signal frequency decreases during the CPI due to the alteration of the value and sign of the Doppler frequency, which varies from −3 to 3 Hz. At the moment $t = 717$ s the Doppler frequency is zero. The time instant where the Doppler frequency changes its sign (zero Doppler differential) can be regarded as the moment of target imaging.

Computational results for the imaginary and real parts of the ISAR signal reflected by a point target with time varying radial velocity are presented in Figs. 3(a) and (b). The variation of the current frequency of the signal due to the time dependent Doppler frequency of the point target can be clearly seen. The wide bandwidth of the Doppler variation in the signal allows multiple point scatterers to be potentially resolved at the moment of imaging.
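A signal of this kind can be reproduced in a few lines from Eqs. (5), (8) and (9). In this sketch only the range-dependent (baseband) phase is modelled, the carrier being removed, and the relation $\theta = 0.9\pi - \pi/3$ between the guiding and position angles is an assumption made for illustration:

```python
import math
import cmath

def slant_range(t, R0, v, theta):
    """R(t) of Eq. (5)."""
    return math.sqrt(R0 ** 2 + (v * t) ** 2 + 2 * R0 * v * t * math.cos(theta))

# Example 1 parameters: v = 29 m/s, lambda = 3e-2 m, R(0) = 1e5 m, CPI 712-722 s.
v, lam, R0 = 29.0, 3.0e-2, 1.0e5
theta = 0.9 * math.pi - math.pi / 3   # assumed combination of the two angles

ts = [712.0 + 0.01 * n for n in range(1000)]
# Baseband reflected signal: only the phase -4*pi*R(t)/lambda is kept.
signal = [cmath.exp(-1j * 4.0 * math.pi * slant_range(t, R0, v, theta) / lam)
          for t in ts]
print(len(signal))   # 1000 unit-magnitude samples over the CPI
```

The real and imaginary parts of `signal` correspond to the two oscillating curves of the kind shown in Fig. 3, with the instantaneous frequency slowing down near the zero-Doppler moment.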


Fig. 3. Imaginary and real part of ISAR signal reflected by a point target.

#### **2.3.2 Example 2**

It is assumed that the point target moves at the velocity $v = 29$ m/s and is illuminated with a continuous waveform with wavelength $\lambda = 10^{-2}$ m (frequency $f = 3\cdot 10^{10}$ Hz). The CPI time is $t = 0$–$2$ s, the initial distance $R(0) = 30$ m, the guiding angle $\pi$ and the position angle $0$. The calculation results of the current signal frequency and the Doppler frequency are illustrated in Figs. 4(a) and (b).

Fig. 4. Current ISAR signal frequency and Doppler frequency with a constant radial velocity.

It can be seen that the current signal frequency takes two constant values during the CPI due to the constant Doppler frequency with two signs, −5.8 kHz and +5.8 kHz ($2v/\lambda = 5800$ Hz). At the moment $t = 1.04$ s the Doppler frequency alters its sign. The time instant where the Doppler frequency changes its sign (zero Doppler differential) can be regarded as the moment of point target imaging, which means that one point target can be resolved.

Fig. 5. Imaginary and real part of ISAR signal reflected by a point target.

Computational results for the imaginary and real parts of the ISAR signal reflected by a point target with constant radial velocity are presented in Figs. 5(a) and (b). The change of the phase in the imaginary part of the ISAR signal can be clearly seen.
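The figures quoted in this example follow directly from Eqs. (6) and (13); a quick check with the stated parameters, where the position angle $\theta = \pi$ corresponds to motion directly along the line of sight:

```python
import math

v, lam, R0, theta = 29.0, 1.0e-2, 30.0, math.pi   # Example 2 parameters

t_zero = -R0 * math.cos(theta) / v   # moment where the Doppler changes sign
f_doppler = 2.0 * v / lam            # constant Doppler magnitude for theta = pi

print(round(t_zero, 2))   # -> 1.03, matching the ~1.04 s read off Fig. 4
print(f_doppler)          # -> 5800.0 Hz, i.e. +/-5.8 kHz
```

The zero-crossing instant $R(0)/v \approx 1.03$ s and the constant magnitude $2v/\lambda = 5.8$ kHz agree with the values discussed above.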

#### **3. ISAR signal formation and imaging with a sequence of monochromatic short pulses**

#### **3.1 3-D ISAR geometry and kinematics**


The basic characteristic in ISAR imaging is the time dependent distance between a particular generic point of the target and the ISAR. Consider the 3-D geometry of an ISAR scenario with the radar and a moving target in the coordinate system $Oxyz$ (Fig. 6). The target is located in a regular grid, defined in the coordinate system $O'XYZ$. The generic point scatterer $\mathbf{g}$ from the target area is specified by the index vector $(i, j, k)$, i.e. $\mathbf{g} = (i, j, k)$. The position vector $\mathbf{R}_{ijk}(p)$ of the $ijk$-th generic point scatterer in the coordinate system $Oxyz$ at the moment $p$ is described by the following vector equation

$$\mathbf{R}_{ijk}(p) = \mathbf{R}_{00'}(0) + \mathbf{V}T_p\left(\frac{N}{2} - p\right) + \mathbf{A}\mathbf{R}_{ijk}, \tag{14}$$

where $\mathbf{R}_{ijk}(p) = [x_{ijk}(p), y_{ijk}(p), z_{ijk}(p)]^T$; $x_{ijk}(p)$, $y_{ijk}(p)$ and $z_{ijk}(p)$ are the current coordinates of the generic point; $T_p$ denotes the pulse repetition period; $p = \overline{1, N}$ denotes the index of the emitted pulse; $N$ denotes the full number of pulses emitted during the CPI; $\mathbf{R}_{00'}(0) = [x_{00'}(0), y_{00'}(0), z_{00'}(0)]^T$ is the position vector of the target geometric centre, which locates the point $O'$ at the moment $p = \frac{N}{2}$; $\mathbf{V} = [V_x, V_y, V_z]^T$ denotes the vector velocity with coordinates $V_x = V\cos\alpha$, $V_y = V\cos\beta$ and $V_z = V\cos\delta$; $\mathbf{R}_{ijk} = [X_{ijk}, Y_{ijk}, Z_{ijk}]^T$ denotes the position vector of the $ijk$-th generic point; $X_{ijk} = i\Delta X$, $Y_{ijk} = j\Delta Y$ and $Z_{ijk} = k\Delta Z$ denote

ISAR Signal Formation and Image Reconstruction as Complex Spatial Transforms 35

The components *ABC* , , of the normal vector are determined by the components of the position vector 00' 00' 00' 00' (0) (0), (0), (0), *<sup>T</sup>* **<sup>R</sup>** *xyz* , vector velocity of the target and vector position of an arbitrary reference point 00 0 0 [ (0), (0), (0)]*<sup>T</sup>* **<sup>R</sup>** *xyz* in the coordinate system

> 00' 0 00' 0 00' 0 00' 0 00' 0 00' 0

00' 21 22 23

( ) (0) . . . <sup>2</sup>

*z p aaa <sup>Z</sup> <sup>N</sup> z V pT*

Eq. (20) is used in calculation of the time delay of the signal reflected by a particular generic

Consider 3-D ISAR scenario (Fig. 6) and a generic point **g** from the target illuminated by

( ) .rect exp( . ) *<sup>t</sup> st A <sup>j</sup> <sup>t</sup> T*

1, 0 1, rect

0, otherwise.

**g g <sup>g</sup> ,** (22)

m/s is the speed of the light in vacuum; is the wavelength of the signal; *T* is the timewith

( ) ( , ) rect exp{ [ ( )]} *ttp s pt a jttp <sup>T</sup>*

**<sup>g</sup>**

*T*

*t*

 

 

*t*

*T*

*x p aaa X <sup>N</sup> y p y V pT a a a Y*

*x p ijk ijk ijk y p ijk ijk ijk z p*

(0) . . <sup>2</sup>

then the distance between the generic point and ISAR can be expressed as

sequence of short monochromatic pulses, each of which is described by

*z y x z y x*

The projection of the vector equation (14) on Cartesian coordinates yields

*<sup>N</sup> x V pT*

(0) . <sup>2</sup> ( )

00'

00'

point scatterer from the target area while signal modeling.

where *A* is the amplitude of the emitted signal, 2 *<sup>c</sup>*

The signal reflected by the generic point scatter can be written as

( )

**3.2 Short pulse ISAR signal formation** 

of the emitted pulse.

*A Vy y Vz z B Vz z Vx x C Vx x Vy y*

 

[ (0) (0)] [ (0) (0)]; [ (0) (0)] [ (0) (0)]; [ (0) (0)] [ (0) (0)].

11 12 13

31 32 33

1 2 22 <sup>2</sup> () () () () *Rp x p y p z p ijk ijk ijk ijk* (20)

(18)

, (19)

**,** (21)

is the angular frequency; *<sup>c</sup>* <sup>8</sup> 3.10

*Oxyz* by expressions

the discrete coordinates of the *ijk* th generic point in the coordinate system *O XYZ* ' ; *X* , *Y* and *Z* denote the dimensions of the grid cell; cos , cos and 2 2 cos 1 cos cos are the guiding cosines; *V* is the module of the vector velocity.

Fig. 6. Geometry of 3-D ISAR scenario.

The elements of the transformation matrix **A** in Eq. (14) are determined by the Euler expressions

$$\begin{aligned} a\_{11} &= \cos\psi\cos\phi - \sin\psi\cos\theta\sin\phi; \\ a\_{12} &= -\cos\psi\sin\phi - \sin\psi\cos\theta\cos\phi; \\ a\_{13} &= \sin\psi\sin\theta; \\ a\_{21} &= \sin\psi\cos\phi + \cos\psi\cos\theta\sin\phi; \quad a\_{31} = \sin\theta\sin\phi; \\ a\_{22} &= -\sin\psi\sin\phi + \cos\psi\cos\theta\cos\phi; \quad a\_{32} = \sin\theta\cos\phi; \\ a\_{23} &= -\cos\psi\sin\theta; \quad a\_{33} = \cos\theta. \end{aligned} \tag{15}$$

The projection angles , and , defining the space orientation of the 3-D grid are calculated by components *ABC* , , of the normal vector to the plane that specifies the position of the target, and coordinates of the vector velocity, i.e.

$$\mathcal{W} = \arctan\left(-\frac{A}{B}\right); \theta = \arccos\frac{\mathcal{C}}{\left[\left(A\right)^2 + \left(B\right)^2 + \left(\mathcal{C}\right)^2\right]^2} \tag{16}$$

$$\varphi = \arccos \frac{V\_x B - V\_y A}{\{\left[\left(A\right)^2 + \left(B\right)^2\right] \left[\left(V\_x\right)^2 + \left(V\_y\right)^2 + \left(V\_z\right)^2\right]\}^{\frac{1}{2}}} \cdot \tag{17}$$


The components $A$, $B$, $C$ of the normal vector are determined by the components of the position vector $\mathbf{R}\_{00'}(0) = [x\_{00'}(0), y\_{00'}(0), z\_{00'}(0)]^{T}$, the vector velocity of the target, and the vector position of an arbitrary reference point $\mathbf{R}\_{0}(0) = [x\_{0}(0), y\_{0}(0), z\_{0}(0)]^{T}$ in the coordinate system $Oxyz$, by the expressions

$$\begin{aligned} A &= V\_z[y\_{00'}(0) - y\_0(0)] - V\_y[z\_{00'}(0) - z\_0(0)]; \\ B &= V\_x[z\_{00'}(0) - z\_0(0)] - V\_z[x\_{00'}(0) - x\_0(0)]; \\ C &= V\_y[x\_{00'}(0) - x\_0(0)] - V\_x[y\_{00'}(0) - y\_0(0)]. \end{aligned} \tag{18}$$

The projection of the vector equation (14) on Cartesian coordinates yields

$$
\begin{bmatrix} x\_{ijk}(p) \\ y\_{ijk}(p) \\ z\_{ijk}(p) \end{bmatrix} = \begin{bmatrix} x\_{00'}(0) + V\_x\left(\frac{N}{2} - p\right)T\_p \\ y\_{00'}(0) + V\_y\left(\frac{N}{2} - p\right)T\_p \\ z\_{00'}(0) + V\_z\left(\frac{N}{2} - p\right)T\_p \end{bmatrix} + \begin{bmatrix} a\_{11} & a\_{12} & a\_{13} \\ a\_{21} & a\_{22} & a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{bmatrix} \begin{bmatrix} X\_{ijk} \\ Y\_{ijk} \\ Z\_{ijk} \end{bmatrix} \tag{19}
$$

then the distance between the generic point and ISAR can be expressed as

$$R\_{ijk}(p) = \left[x\_{ijk}^2(p) + y\_{ijk}^2(p) + z\_{ijk}^2(p)\right]^{\frac{1}{2}}. \tag{20}$$

Eq. (20) is used to calculate the time delay of the signal reflected by a particular generic point scatterer in the target area during signal modeling.
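The geometry above can be sketched numerically. The following is a minimal NumPy illustration of Eqs. (15), (19) and (20): the Euler matrix rotates a grid point into $Oxyz$, the rectilinear translation term moves the grid origin, and the range is the Euclidean norm. The radar is assumed at the origin of $Oxyz$, and all numeric values are toy inputs, not from the chapter.

```python
import numpy as np

def euler_matrix(psi, theta, phi):
    """Transformation matrix A from Eq. (15), built from the Euler angles."""
    cps, sps = np.cos(psi), np.sin(psi)
    cth, sth = np.cos(theta), np.sin(theta)
    cph, sph = np.cos(phi), np.sin(phi)
    return np.array([
        [cps*cph - sps*cth*sph, -cps*sph - sps*cth*cph,  sps*sth],
        [sps*cph + cps*cth*sph, -sps*sph + cps*cth*cph, -cps*sth],
        [sth*sph,                sth*cph,                 cth],
    ])

def generic_point_range(p, N, Tp, R00, V, A_mat, XYZ_ijk):
    """Coordinates of the ijk-th generic point at pulse p, Eq. (19),
    and its distance to the radar, Eq. (20) (radar at the origin of Oxyz)."""
    translation = R00 + V * (N / 2 - p) * Tp      # rectilinear motion of the grid origin
    r = translation + A_mat @ XYZ_ijk             # grid-to-Oxyz transform, Eq. (19)
    return np.sqrt(np.sum(r**2))                  # Eq. (20)

# toy numbers (hypothetical, for illustration only)
A_mat = euler_matrix(0.1, 0.2, 0.3)
R = generic_point_range(p=0, N=256, Tp=1e-3,
                        R00=np.array([0.0, 1e4, 3e3]),
                        V=np.array([-200.0, 0.0, 0.0]),
                        A_mat=A_mat, XYZ_ijk=np.array([1.5, 0.5, 0.5]))
```

Because the matrix in Eq. (15) is a proper Euler rotation, it is orthogonal, which is a convenient sanity check on any implementation of it.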

#### **3.2 Short pulse ISAR signal formation**

Consider the 3-D ISAR scenario (Fig. 6) and a generic point **g** from the target, illuminated by a sequence of short monochromatic pulses, each described by

$$s(t) = A.\operatorname{rect}\left(\frac{t}{T}\right) \exp(j\omega t), \tag{21}$$

$$\operatorname{rect}\frac{t}{T} = \begin{cases} 1, \,\, 0 \le \frac{t}{T} < 1, \\ 0, \,\,\,\,\,\,\text{otherwise.} \end{cases}$$

where *A* is the amplitude of the emitted signal; $\omega = \frac{2\pi c}{\lambda}$ is the angular frequency; $c = 3\cdot10^{8}$ m/s is the speed of light in vacuum; $\lambda$ is the wavelength of the signal; *T* is the timewidth of the emitted pulse.

The signal reflected by the generic point scatterer can be written as

$$s\_{\mathbf{g}}(p,t) = a\_{\mathbf{g}} \operatorname{rect} \frac{t - t\_{\mathbf{g}}(p)}{T} \exp\{j\omega[t - t\_{\mathbf{g}}(p)]\} \tag{22}$$

ISAR Signal Formation and Image Reconstruction as Complex Spatial Transforms 37


$$\text{rect}\frac{t - t\_{\mathbf{g}}(p)}{T} = \begin{cases} 1, & 0 \le \frac{t - t\_{\mathbf{g}}(p)}{T} < 1, \\ 0, & \text{otherwise}. \end{cases}$$

where $t\_{\mathbf{g}}(p) = \frac{2R\_{\mathbf{g}}(p)}{c}$ is the time delay of the signal; **g** stands for the discrete vector coordinate that locates the generic point scatterer in the target area **G**; $a\_{\mathbf{g}}$ stands for the magnitude of the 3-D discrete image function; $t \bmod T\_p$ is the slow time; *p* denotes the number of the emitted pulse; $T\_p$ is the pulse repetition period; $t - pT\_p$ is the fast time, presented as $k.T$, where *k* is the number of the range bin in which the ISAR signal is placed. The demodulated ISAR signal from the target area is

$$s(p,k) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \operatorname{rect} \frac{k.T - t\_{\mathbf{g}}(p)}{T} . \exp\{-j\omega t\_{\mathbf{g}}(p)\}. \tag{23}$$

The expression (23) is a weighted complex series of finite complex exponential base functions. It can be regarded as an asymmetric complex transform of the 3-D image function $a\_{\mathbf{g}}$, $\mathbf{g}\in\mathbf{G}$, defined over the whole discrete target area **G**, into the 2-D signal plane $s(p,k)$.
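The signal model of Eq. (23) can be sketched directly in NumPy: for each pulse *p* and each scatterer **g**, the rect term selects the range bin holding the echo and the exponential carries the delay-induced phase. This is a minimal sketch with toy parameters; the function name and argument layout are illustrative assumptions, not from the chapter.

```python
import numpy as np

def isar_signal(a, t_delay, N, K, T, wavelength):
    """Demodulated short-pulse ISAR signal s(p,k), Eq. (23).

    a        : (G,) scatterer magnitudes a_g
    t_delay  : (N, G) time delays t_g(p) for every pulse p and scatterer g
    T        : pulse width (= range-bin duration here)
    """
    omega = 2 * np.pi * 3e8 / wavelength            # carrier angular frequency
    s = np.zeros((N, K), dtype=complex)
    k = np.arange(1, K + 1)
    for p in range(N):
        for g in range(len(a)):
            # rect((k.T - t_g(p))/T) selects the range bin holding the echo
            arg = k * T - t_delay[p, g]
            gate = (arg >= 0) & (arg < T)
            s[p, gate] += a[g] * np.exp(-1j * omega * t_delay[p, g])
    return s

# single fixed scatterer: its echo lands in one range bin for every pulse
delays = np.full((4, 1), 5.5e-7)                    # 4 pulses, 1 scatterer
s_demo = isar_signal(np.array([1.0]), delays, N=4, K=10, T=1e-7, wavelength=0.03)
```

For a moving target the delays `t_delay[p, g]` would follow Eqs. (19)-(20), so the phase history along *p* encodes the Doppler information exploited in Section 3.3.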

#### **3.3 Image reconstruction from a short pulse ISAR signal**

Eq. (23) can be rewritten as

$$s(p,k) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \cdot \text{rect}\frac{k.T - \frac{2R\_{\mathbf{g}}(p)}{c}}{T} \cdot \exp\left[-j\frac{4\pi}{\lambda}R\_{\mathbf{g}}(p)\right] \tag{24}$$

Formally for each *k*th range cell the image function can be extracted by the inverse transform

$$\hat{a}\_{\mathbf{g}} = \sum\_{p=1}^{N} s(p,k) \exp\left[j\frac{4\pi}{\lambda} R\_{\mathbf{g}}(p)\right], \tag{25}$$

where *p* is the number of the emitted pulse and *N* is the full number of emitted pulses during the coherent processing interval (CPI).

Because $s(p,k)$ is a 2-D signal, only a 2-D image function $\hat{a}\_{\mathbf{g}}$ can be extracted. Eq. (25) is a symmetric complex inverse spatial transform, or inverse projective operation, of the 2-D signal plane $s(p,k)$ into the 2-D image function $\hat{a}\_{\mathbf{g}}$, and can be regarded as a spatial correlation between $s(p,k)$ and $\exp\left[j\frac{4\pi}{\lambda}R\_{\mathbf{g}}(p)\right]$. Moreover, Eq. (25) can be interpreted as a total compensation of the phases induced by the radial displacement $R\_{\mathbf{g}}(p)$ of the target. The Taylor expansion of the distance to the generic point, $R\_{\mathbf{g}}(p)$, at the moment of imaging is

$$R\_{\mathbf{g}}(p) = r\_{\mathbf{g}} + v\_{\mathbf{g}}(pT\_p) + \frac{a\_{\mathbf{g}}}{2!}(pT\_p)^2 + \frac{h\_{\mathbf{g}}}{3!}(pT\_p)^3 + \dots \tag{26}$$

where $r\_{\mathbf{g}}$, $v\_{\mathbf{g}}$, $a\_{\mathbf{g}}$ and $h\_{\mathbf{g}}$ are the distance, radial velocity, radial acceleration and jerk of the generic point, respectively, at the moment of imaging.
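A quick numerical check of Eq. (26): for a point in rectilinear motion, the second-order Taylor model already tracks the exact range history closely over a short CPI. The geometry and velocity below are toy values chosen for illustration; the radial derivatives follow from differentiating $R(t) = \lVert \mathbf{R}\_0 + \mathbf{V}t \rVert$.

```python
import numpy as np

# Exact range history R(p) = |R0 + V*(p*Tp)| of a rectilinearly moving point,
# against its Taylor model r + v*(p*Tp) + (a/2!)*(p*Tp)^2 from Eq. (26).
Tp, N = 1e-3, 200
R0 = np.array([8e3, 6e3, 0.0])               # position at the moment of imaging
V = np.array([-150.0, 30.0, 0.0])            # constant velocity
t = np.arange(N) * Tp
R_exact = np.linalg.norm(R0[None, :] + V[None, :] * t[:, None], axis=1)

r = np.linalg.norm(R0)                       # distance at the moment of imaging
v = R0 @ V / r                               # radial velocity
a = (V @ V - v**2) / r                       # radial acceleration
R_taylor = r + v * t + 0.5 * a * t**2
max_err = np.max(np.abs(R_exact - R_taylor))
```

The residual is dominated by the jerk term $\frac{h\_{\mathbf{g}}}{3!}(pT\_p)^3$, which for this geometry stays well below a millimeter over the 0.2 s CPI — the reason truncating Eq. (26) at low order is acceptable for motion compensation.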

( ) ( ) 1, 0 1, rect

coordinate that locates the generic point scatterer in the target area **G**, *a***g** stands for the magnitude of the 3-D discrete image function, mod *<sup>p</sup> tt T* is the slow time, *p* denotes the number of the emitted pulse, *Tp* is the pulse repetition period, *<sup>p</sup> t t pT* is the fast time, presented as *t kT* . , where *k* is the number of range bin, where the ISAR signal is placed.

> . () ( , ) rect .exp{ ( )]} *kT t p <sup>s</sup> <sup>p</sup> k a <sup>j</sup> <sup>t</sup> <sup>p</sup> <sup>T</sup>*

The expression (23) is a weighted complex series of finite complex exponential base functions. It can be regarded as an asymmetric complex transform of the 3-D image function

> 2 () . <sup>4</sup> ( , ) .rect .exp ( ) *<sup>R</sup> <sup>p</sup> k T <sup>c</sup> <sup>s</sup> <sup>p</sup> k a <sup>j</sup> <sup>R</sup> <sup>p</sup> <sup>T</sup>*

 **g**

Formally for each *k*th range cell the image function can be extracted by the inverse transform

<sup>4</sup> <sup>ˆ</sup> ( , )exp ( )

*a spk j R p*

where *p* is the number of emitted pulse, *N* is the full number of emitted pulses during CPI.

Because *spk* (,) is a 2-D signal, only a 2-D image function *a*ˆ**g** can be extracted. Eq. (25) is a symmetric complex inverse spatial transform or inverse projective operation of the 2-D signal plane *spk* (,)into 2-D image function *a*ˆ**<sup>g</sup>** , and can be regarded as a spatial correlation

compensation of phases, induced by radial displacement *R* ( ) *p* **<sup>g</sup>** of the target. Taylor

2 3 ( ) ( ) ( ) ( ) ... 2! 3! *pp p a h*

where *r***<sup>g</sup>** , *v***<sup>g</sup>** , *a***g** and *h***<sup>g</sup>** is the distance, radial velocity, acceleration and jerk of the generic

*R p r v pT pT pT* **g g**

expansion of the distance to the generic point, *R* ( ) *p* **<sup>g</sup>** at the moment of imaging is

**g g**

**g g**

**<sup>g</sup>**

*a***<sup>g</sup>** , **g G** , defined for a whole discrete target area **G** into 2-D signal plane *spk* (,).

*ttp ttp <sup>T</sup> <sup>T</sup>* 

 

**g**

The demodulated ISAR signal from the target area is

**3.3 Image reconstruction from a short pulse ISAR signal** 

**g G**

1

*N*

*p*

**g G**

where 2 () ( ) *<sup>R</sup> <sup>p</sup> t p <sup>c</sup>* **<sup>g</sup>**

Eq. (23) can be rewritten as

between *spk* (,) and <sup>4</sup> exp ( ) *<sup>j</sup> <sup>R</sup> <sup>p</sup>*

point, respectively at the moment of imaging.

0, otherwise.

**<sup>g</sup>** is the time delay of the signal, **g** stands for the discrete vector

**g**

**g g** (25)

**<sup>g</sup>** . Moreover, Eq. (25) can be interpreted as a total

**g gg** , (26)

. (23)

(24)

Due to range uncertainty of generic points placed in the *k*th range resolution cell, *r***<sup>g</sup>** can be assumed constant, and (25) can be written as

$$\hat{a}\_{\mathbf{g}} = \exp\left(j\frac{4\pi}{\lambda}r\_{\mathbf{g}}\right)\sum\_{p=1}^{N}s(p,k)\exp\left(j\frac{4\pi}{\lambda}\left[v\_{\mathbf{g}}(pT\_{p}) + \frac{a\_{\mathbf{g}}}{2!}(pT\_{p})^{2} + \frac{h\_{\mathbf{g}}}{3!}(pT\_{p})^{3} + \dots\right]\right). \tag{27}$$

Eq. (27) stands for a procedure of total motion compensation of every generic point from the *k*th range resolution cell*.* The range distance $r\_{\mathbf{g}}$ does not influence the image reconstruction and can be removed from Eq. (27), i.e.

$$\hat{a}\_{\mathbf{g}} = \sum\_{p=1}^{N} s(p,k) \exp\left(j2\pi\left[\frac{2}{\lambda}v\_{\mathbf{g}}(pT\_p) + \frac{1}{\lambda}a\_{\mathbf{g}}(pT\_p)^2 + \frac{1}{3\lambda}h\_{\mathbf{g}}(pT\_p)^3 + \dots\right]\right). \tag{28}$$

For each *k*th range cell the term $\frac{2}{\lambda}v\_{\mathbf{g}}$ stands for the Doppler frequency, whereas terms such as $\frac{2}{\lambda}a\_{\mathbf{g}}$, $\frac{2}{\lambda}h\_{\mathbf{g}}$, ..., denote the higher order derivatives of the time dependent Doppler frequency, defined at the moment of imaging.

If the Doppler frequency of the generic points in the *k*th range cell is constant, or tends to a constant, during the CPI, Eq. (28) reduces to the following equation of radial motion compensation

$$\hat{a}\_{\mathbf{g}} = \sum\_{p=1}^{N} s(p,k) \exp\left(j2\pi\frac{2}{\lambda}v\_{\mathbf{g}}(pT\_p)\right). \tag{29}$$

Denote $\frac{2}{\lambda}v\_{\mathbf{g}} = \hat{p}.F\_D$, where $F\_D = \frac{1}{NT\_p}$ is the Doppler frequency step and $\hat{p}$ is the unknown Doppler index at the moment of imaging; then the complex image function $\hat{a}\_{\mathbf{g}} = \hat{a}\_{\mathbf{g}}(\hat{p},k)$ in discrete space coordinates can be written as

$$\hat{a}\_{\mathbf{g}}(\hat{p},k) = \sum\_{p=1}^{N} s(p,k) \exp\left(j2\pi \frac{p\hat{p}}{N}\right). \tag{30}$$

Eq. (30) stands for an IFT of $s(p,k)$ for each *k*th range resolution cell and can be considered as phase and/or motion compensation of first order.
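Eq. (30) maps directly onto a per-range-cell inverse FFT. One detail worth making explicit: NumPy's `ifft` uses the $e^{+j2\pi p\hat{p}/N}$ kernel of Eq. (30) but divides by $N$, so the plain sum is recovered by rescaling. A minimal sketch, with a synthetic one-scatterer signal as input:

```python
import numpy as np

def reconstruct_first_order(s):
    """Eq. (30): azimuth image per range cell via an IFT over the pulse index.
    numpy's ifft computes (1/N) * sum_p s[p] * exp(+j*2*pi*p*phat/N),
    so we rescale by N to match the plain sum in Eq. (30)."""
    N = s.shape[0]
    return N * np.fft.ifft(s, axis=0)

# a single scatterer with constant Doppler index p_hat = 7, sitting in
# range cell k = 5, peaks at azimuth bin 7 after reconstruction
N, K, p_hat = 64, 16, 7
p = np.arange(N)
s = np.zeros((N, K), dtype=complex)
s[:, 5] = np.exp(-1j * 2 * np.pi * p * p_hat / N)   # echo phase history
image = reconstruct_first_order(s)
```

The peak magnitude equals $N$ because all $N$ pulse phasors add coherently once the first-order phase is matched, which is exactly the "radial motion compensation" reading of Eq. (29).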

Denote $a\_1 = \frac{2}{\lambda}v\_{\mathbf{g}}$, $a\_2 = \frac{2\pi}{\lambda}a\_{\mathbf{g}}$ and $a\_3 = \frac{2\pi}{3\lambda}h\_{\mathbf{g}}$; then (28) can be rewritten as

$$\hat{a}\_{\mathbf{g}} = \sum\_{p=1}^{N} s(p, k) \exp\left(j[a\_2(pT\_p)^2 + a\_3(pT\_p)^3 + \dots]\right) \exp\left(j2\pi a\_1(pT\_p)\right). \tag{31}$$

Denote $\Phi(p) = a\_2(pT\_p)^2 + \dots + a\_m(pT\_p)^m$ as a phase correction and/or motion compensation function of higher order; then

$$\hat{a}\_{\mathbf{g}}(\hat{p},k) = \sum\_{p=1}^{N} \left[ s(p,k) \exp\left(j\Phi(p)\right) \right] \cdot \exp\left(j2\pi \frac{p\hat{p}}{N}\right). \tag{32}$$


where $\hat{a}\_{\mathbf{g}}(\hat{p},k)$ denotes the complex azimuth image of the target and $\hat{p}$ denotes the unknown index of the azimuth space coordinate, equal to the unknown Doppler index of the generic point scatterer from the target at the moment of imaging. The polynomial coefficients $a\_m$, $m = 2, 3, \dots$ are calculated iteratively by applying an image quality criterion, which will be discussed in subsection 4.4.

Eq. (32) can be interpreted as an ISAR image reconstruction procedure implemented through the inverse Fourier transform (IFT) of a phase corrected ISAR signal into a complex azimuth image $\hat{a}\_{\mathbf{g}}(\hat{p},k)$ for each *k*th range cell. In this sense the ISAR signal $s(p,k)$ can be referred to as a spatial frequency spectrum, whereas $\hat{a}\_{\mathbf{g}}(\hat{p},k)$ can be referred to as a spatial image function defined at the moment of imaging. Based on Eq. (32), two steps of the image reconstruction algorithm can be outlined.

*Step 1* Compensate the phases, induced by higher order radial movement, by multiplication of $s(p,k)$ with the exponential term $\exp[j\Phi(p)]$, i.e.

$$
\hat{s}(p,k) = s(p,k) \exp[j\Phi(p)]\tag{33}
$$

*Step 2* Compensate the phases induced by first order radial displacement of generic points in the *k*th range cell by applying IFT (extract complex image), i.e.

$$\hat{a}\_{\mathbf{g}}(\hat{p},k) = \sum\_{p=1}^{N} \hat{s}(p,k) . \exp\left(j2\pi \frac{p\hat{p}}{N}\right) \tag{34}$$

Complex image extraction can be implemented by the inverse fast Fourier transform (IFFT). The algorithm can be applied if the phase correction function $\Phi(p)$ is known in advance. Otherwise only the IFT can be applied; the uncompensated radial acceleration and jerk of the target then remain and the image becomes blurred (unfocused). In order to obtain a focused image, motion compensation of second, third and/or higher order has to be applied, which means the coefficients of the higher order terms in $\Phi(p)$ have to be determined. The definition and application of these terms in image reconstruction is called an autofocus procedure, accomplished by an optimization step search algorithm (SSA), which will be discussed in subsection 4.4.
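The two-step procedure of Eqs. (33)-(34) can be demonstrated on a synthetic range-cell signal with a known quadratic phase error. The coefficient values below are toy assumptions; in practice $\Phi(p)$ would come from the autofocus search, not be known a priori.

```python
import numpy as np

N, Tp = 128, 1e-3
p = np.arange(N)
p_hat, a2 = 20, 4e3          # Doppler bin and quadratic phase coefficient (toy values)

# range-cell signal with first-order (Doppler) and uncompensated quadratic phase
s = np.exp(-1j * (2 * np.pi * p * p_hat / N + a2 * (p * Tp)**2))

# IFT without phase correction: energy smears over many Doppler bins
blurred = np.abs(N * np.fft.ifft(s))

# Step 1, Eq. (33): multiply by exp(+j*Phi(p)) with Phi(p) = a2*(p*Tp)^2
s_corr = s * np.exp(1j * a2 * (p * Tp)**2)

# Step 2, Eq. (34): IFT extracts the focused complex image
focused = np.abs(N * np.fft.ifft(s_corr))
```

After Step 1 the residual phase is purely linear, so Step 2 collapses all the energy into the single Doppler bin $\hat{p} = 20$ with peak magnitude $N$; without it the quadratic term spreads the response, which is the "blurred (unfocused)" case described above.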

#### **4. ISAR signal formation and imaging with a sequence of LFM waveforms**

#### **4.1 LFM waveform**

Consider the 3-D ISAR scenario (Fig. 6) and a target illuminated by a sequence of LFM waveforms, each described by

$$s(t) = A.\operatorname{rect}\left(\frac{t}{T}\right) \exp\left[-j\left(\omega t + bt^2\right)\right], \tag{35}$$

where $t - pT\_p$ is the fast time and $t \bmod T\_p$ is the slow time; *p* is the index of the emitted pulse; $T\_p$ is the pulse repetition period; $\omega = \frac{2\pi c}{\lambda}$ is the carrier angular frequency; $c = 3\cdot 10^{8}$ m/s is the speed of light; $\lambda$ is the wavelength of the signal; *T* is the timewidth of a LFM waveform; $b = \frac{2\pi\Delta F}{T}$ is the LFM rate. The bandwidth $\Delta F$ of the transmitted waveform provides the dimension of the range resolution cell, i.e. $\Delta R = c/2\Delta F$.
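The waveform parameters above tie together numerically as follows. This sketch samples the chirp term of Eq. (35) at complex baseband (the carrier factor is omitted); the bandwidth, sampling rate and LFM-rate convention are illustrative assumptions consistent with the definitions in the text.

```python
import numpy as np

c = 3.0e8                      # speed of light, m/s
T = 1.0e-6                     # timewidth of the LFM waveform, s
dF = 150.0e6                   # chirp bandwidth Delta F, Hz (toy value)
b = 2 * np.pi * dF / T         # LFM rate, b = 2*pi*dF/T
dR = c / (2 * dF)              # range resolution cell, Delta R = c/(2*dF)

fs = 4 * dF                    # complex-baseband sampling rate (an assumption)
n = round(T * fs)              # number of samples inside rect(t/T)
t = np.arange(n) / fs
s = np.exp(-1j * b * t**2)     # chirp term of Eq. (35); carrier exp(-j*w*t) omitted
```

With $\Delta F = 150$ MHz the range cell is $\Delta R = 1$ m, which is the resolution figure the demodulation and range-compression steps of Section 4.2 ultimately deliver.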

#### **4.2 LFM ISAR signal model**

38 Digital Image Processing

where *a*ˆ ˆ(,) *p k* **<sup>g</sup>** denotes the complex azimuth image of the target, *p*ˆ denotes the unknown index of the azimuth space coordinate equal to the unknown Doppler index of the generic point scatterer from the target at the moment of imaging. The polynomial coefficients *am* , *m* 2, 3, are calculated iteratively via applying image quality criterion, which will be discussed

Eq. (32) can be interpreted as an ISAR image reconstruction procedure implemented through inverse Fourier transform (IFT) of a phase corrected ISAR signal into a complex azimuth image *a*ˆ ˆ(,) *p k* **<sup>g</sup>** for each *k*th range cell. In this sense the ISAR signal*spk* (,) can be referred to as a spatial frequency spectrum whereas *a*ˆ ˆ(,) *p k* **<sup>g</sup>** can be referred to as a spatial image function defined at the moment of imaging. Based on Eq. (32) two steps of image

*Step 1* Compensate the phases, induced by higher order radial movement, by multiplication

*Step 2* Compensate the phases induced by first order radial displacement of generic points in

<sup>ˆ</sup> ˆˆ ˆ ( , ) ( , ).exp 2 *N*

*pp a pk spk j <sup>N</sup>*

Complex image extraction can be implemented by inverse fast Fourier transform (IFFT). The algorithm can be implemented if the phase correction function ( ) *p* is preliminary known. Otherwise only IFT can be applied. Then non compensated radial acceleration and jerk of the target still remain and the image becomes blurred (unfocused). In order to obtain a focused image motion compensation of second, third and/or higher order has to be applied, that means coefficients of higher order terms in ( ) *p* have to be determined. The definition and application of these terms in image reconstruction is named an autofocus procedure accomplished by an optimization step search algorithm (SSA) which will be discussed in

**4. ISAR signal formation and imaging with a sequence of LFM waveforms** 

Consider 3-D ISAR scenario (Fig. 6) and a target illuminated by sequence of LFM

 <sup>2</sup> ( ) .rect exp *<sup>t</sup> st A <sup>j</sup> t bt T*

where *<sup>p</sup> t t pT* is the fast time and mod *<sup>p</sup> tt T* is the slow time; *p* is the index of emitted

m/s is the speed of the light; is the wavelength of the signal; *T* is the timewidth of a LFM

 , (35)

is the carrier angular frequency; <sup>8</sup> *<sup>c</sup>* 3.10

1

*p*

*spk spk j p* ˆ( , ) ( , )exp[ ( )] (33)

**<sup>g</sup>** (34)

in subsections 4.4.

subsection 4.4.

**4.1 LFM waveform** 

waveforms, each of which is described by

pulse; *Tp* is the pulse repetition period; 2 *<sup>c</sup>*

reconstruction algorithm can be outlined.

of *sp*, *k* with the exponential term exp[ ( )] *j p* , i.e.

the *k*th range cell by applying IFT (extract complex image), i.e.

The deterministic component of the ISAR signal, reflected by the **g**th generic point scatterer has the form

$$s\_{\mathbf{g}}(p,t) = a\_{\mathbf{g}} \operatorname{rect} \frac{t - t\_{\mathbf{g}}(p)}{T} \exp\left\{-j\left[\omega\left(t - t\_{\mathbf{g}}(p)\right) + b\left(t - t\_{\mathbf{g}}(p)\right)^2\right]\right\} \tag{36}$$

$$\operatorname{rect} \frac{t - t\_{\mathbf{g}}(p)}{T} = \begin{cases} 1, 0 \le \frac{t - t\_{\mathbf{g}}(p)}{T} < 1\\ 0, \text{otherwise} \end{cases}$$

where $a\_{\mathbf{g}}$ is the reflection coefficient of the **g**th generic point scatterer, a 3-D image function; $t\_{\mathbf{g}}(p) = \frac{2R\_{\mathbf{g}}(p)}{c}$ is the round trip time delay of the signal from the **g**th generic point scatterer; $R\_{\mathbf{g}}(p) = \operatorname{mod}[\mathbf{R}\_{\mathbf{g}}(p)]$; $t = [k\_{\mathbf{g}\min}(p) + k - 1]\Delta T$ is the fast time; $k = \overline{1, K(p) + K}$ is the sample number of a LFM pulse; $K = T/\Delta T$ is the full number of samples of the LFM pulse; $\Delta T$ is the time duration of a LFM sample; $k\_{\mathbf{g}\min}(p) = \frac{t\_{\mathbf{g}\min}(p)}{\Delta T}$ is the number of the radar range cell where the signal reflected by the nearest point scatterer of the target is detected; $t\_{\mathbf{g}\min}(p) = \frac{2R\_{\mathbf{g}\min}(p)}{c}$ is the minimal time delay of the SAR signal reflected from the nearest point scatterer of the target; $K(p) = k\_{\mathbf{g}\max}(p) - k\_{\mathbf{g}\min}(p)$ is the relative time dimension of the target; $k\_{\mathbf{g}\max}(p) = \frac{t\_{\mathbf{g}\max}(p)}{\Delta T}$ is the number of the radar range bin where the signal reflected by the farthest point scatterer of the target is detected; $t\_{\mathbf{g}\max}(p) = \frac{2R\_{\mathbf{g}\max}(p)}{c}$ is the maximum time delay of the SAR signal reflected from the farthest point scatterer of the target. The ISAR signal in discrete form can be written as

$$s(p,k) = \sum\_{\mathbf{g}\in\mathbf{G}} s\_{\mathbf{g}}(p,k) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \operatorname{rect}\left[\frac{\hat{t}\_{\mathbf{g}}(p,k)}{T}\right] \exp\left\{-j\left[\omega\hat{t}\_{\mathbf{g}}(p,k) + b\left(\hat{t}\_{\mathbf{g}}(p,k)\right)^2\right]\right\} \tag{37}$$

$$\operatorname{rect}\left[\frac{\hat{t}\_{\mathbf{g}}(p,k)}{T}\right] = \begin{cases} 1 \text{ if } 0 \le \frac{\hat{t}\_{\mathbf{g}}(p,k)}{T} < 1, \\ 0 & \text{otherwise.} \end{cases}$$

where $\hat{t}\_{\mathbf{g}}(p,k) = [k\_{\mathbf{g}\min}(p) + k - 1]\Delta T - t\_{\mathbf{g}}(p)$.
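The discrete LFM signal of Eq. (37) can be sketched as follows. The relative fast time $\hat{t}\_{\mathbf{g}}(p,k)$ is computed per scatterer, the rect gates the samples falling inside the pulse, and the chirped phase is accumulated. The function name, argument layout and toy parameters are illustrative assumptions.

```python
import numpy as np

def lfm_isar_signal(a, t_delay, K, K_rel, dT, omega, b):
    """Discrete LFM ISAR signal s(p,k), Eq. (37).
    a: (G,) reflection coefficients; t_delay: (N,G) round-trip delays t_g(p);
    K: samples per LFM pulse (T = K*dT); K_rel: relative time dimension K(p)."""
    N, G = t_delay.shape
    T = K * dT
    s = np.zeros((N, K_rel + K), dtype=complex)
    k = np.arange(1, K_rel + K + 1)
    for p in range(N):
        # k_min(p) = t_min(p)/dT, rounded to the nearest integer range cell
        k_min = int(round(t_delay[p].min() / dT))
        for g in range(G):
            # t_hat(p,k) = [k_min(p) + k - 1]*dT - t_g(p)
            t_hat = (k_min + k - 1) * dT - t_delay[p, g]
            gate = (t_hat >= 0) & (t_hat < T)
            s[p, gate] += a[g] * np.exp(-1j * (omega * t_hat[gate]
                                               + b * t_hat[gate]**2))
    return s

# demo: one scatterer whose delay is exactly 200 range samples (dT is a power
# of two so the delay/dT ratio is exact in floating point)
dT = 2.0 ** -27
delays = np.full((3, 1), 200 * dT)
s_demo = lfm_isar_signal(np.array([1.0]), delays, K=8, K_rel=4, dT=dT,
                         omega=2 * np.pi * 1e10, b=1e14)
```

Each pulse row then contains exactly $K$ chirped samples starting at the scatterer's range cell, which is the structure that the demodulation of Section 4.2 operates on.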


Demodulation of the ISAR signal return is performed by its multiplication by the complex conjugated emitted waveform, i.e.

$$\hat{s}(p,t) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \operatorname{rect} \frac{t - t\_{\mathbf{g}}(p)}{T} \exp\left\{j\left[\omega\left(t - t\_{\mathbf{g}}(p)\right) + b\left(t - t\_{\mathbf{g}}(p)\right)^2\right]\right\}.\exp\left[-j\left(\omega t + bt^2\right)\right] \tag{38}$$

which yields

$$\hat{s}(p,t) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \operatorname{rect} \frac{t - t\_{\mathbf{g}}(p)}{T} \exp\left\{-j\left[\left(\omega + 2bt\right)t\_{\mathbf{g}}(p) - bt\_{\mathbf{g}}^2(p)\right]\right\}. \tag{39}$$

Denote the current angular frequency of the emitted LFM pulse as $\omega(t) = \omega + 2bt$, which in discrete form can be expressed as $\omega\_k = \omega + 2b(k-1)\Delta T$, where $\omega$ is the carrier angular frequency, *b* is the chirp rate, *k* is the LFM sample number, and $\Delta T$ is the time length of the sample; then Eq. (39) can be rewritten as

$$\hat{s}(p,t) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \operatorname{rect} \frac{t - \frac{2R\_{\mathbf{g}}(p)}{c}}{T} \exp\left[-j\left(2\omega(t)\frac{R\_{\mathbf{g}}(p)}{c} - b\left(\frac{2R\_{\mathbf{g}}(p)}{c}\right)^2\right)\right], \tag{40}$$

which in discrete form can be expressed as

$$\hat{s}(p,k) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \operatorname{rect} \frac{\hat{t}\_{\mathbf{g}}(p,k)}{T} \exp\left[-j\left(2\omega\_{k}\frac{R\_{\mathbf{g}}(p)}{c} - b\left(\frac{2R\_{\mathbf{g}}(p)}{c}\right)^{2}\right)\right]. \tag{41}$$

Eq. (41) can be interpreted as a spatial transform of the 3-D image function $a\_{\mathbf{g}}$ into the 2-D ISAR signal plane $\hat{s}(p,k)$ by the finite transformation operator, the exponential term

$$\exp\left[-j\left(2\omega\_k\frac{R\_{\mathbf{g}}(p)}{c} - b\left(\frac{2R\_{\mathbf{g}}(p)}{c}\right)^2\right)\right]. \tag{42}$$

Formally, the 3-D image function $a\_{\mathbf{g}}$ should be extracted from the 2-D ISAR signal plane by the inverse spatial transform, but due to a theoretical limitation based on the number of measurement parameters only a 2-D image function may be extracted, i.e.

$$\hat{a}\_{\mathbf{g}} = \sum\_{p=1}^{N} \sum\_{k=1}^{K(p)+K} \hat{s}(p,k) . \exp\left[j\left(2\omega\_k \frac{R\_{\mathbf{g}}(p)}{c} - b\left(\frac{2R\_{\mathbf{g}}(p)}{c}\right)^2\right)\right],\ \mathbf{g}\in\mathbf{G}. \tag{43}$$

Extraction of the image function is a procedure of complete phase compensation of the signals reflected by all point scatterers from the object, which means total compensation of the target movement during the CPI. The argument of the exponential term in (43), $2\omega\_k\frac{R\_{\mathbf{g}}(p)}{c} - b\left(\frac{2R\_{\mathbf{g}}(p)}{c}\right)^2$, is a complex function, infinitely differentiable in a neighborhood of the moment of imaging, which allows a 2-D Taylor expansion in the range and azimuth directions to be applied, i.e. the following polynomial of higher order to be defined

$$\begin{aligned} \left(2\omega\_k \frac{R\_\mathbf{g}(p)}{c} - b \left(\frac{2R\_\mathbf{g}(p)}{c}\right)^2\right) = {}& a\_0 + a\_1. \left(pT\_p\right) + a\_2. \left(pT\_p\right)^2 + \dots + a\_m \left(pT\_p\right)^m \\ & + b\_1. \left(k\Delta T\right) + b\_2. \left(k\Delta T\right)^2 + \dots + b\_m \left(k\Delta T\right)^m + c\_2 \left(pT\_p\right) \left(k\Delta T\right) + \dots \end{aligned} \tag{44}$$

The constant term $a\_0$ has nothing to do with phase correction and can be neglected. The linear terms $a\_1.(pT\_p)$ and $b\_1.(k\Delta T)$ are redefined as $a\_1.(pT\_p) = 2\pi\frac{p\hat{p}}{N}$ and $b\_1.(k\Delta T) = 2\pi\frac{k\hat{k}}{\hat{K}}$, where $\hat{k} = \hat{k}(\mathbf{g})$ and $\hat{p} = \hat{p}(\mathbf{g})$ denote the new unknown range and cross-range space coordinates of the **g**th generic point at the instant of imaging, and $\hat{K} = K(p) + K$ denotes the number of range cells for each emitted pulse. The sum of the higher order terms is signified as

$$\Phi(p,k) = a\_2 \cdot \left(pT\_p\right)^2 + \dots + a\_m \left(pT\_p\right)^m + b\_2 \cdot \left(k\Delta T\right)^2 + \dots + b\_m \left(k\Delta T\right)^m + c\_2 \left(pT\_p\right)\left(k\Delta T\right) + \dots \tag{45}$$

then


$$
\omega\_k\frac{2R\_{\mathbf{g}}(p)}{c} - b\left(\frac{2R\_{\mathbf{g}}(p)}{c}\right)^2 = 2\pi \frac{p\hat{p}}{N} + 2\pi \frac{k\hat{k}}{\hat{K}} + \Phi(p,k)\,. \tag{46}
$$

Substitute (46) in (43), then

$$\hat{a}\_{\mathbf{g}}(\hat{p},\hat{k}) = \sum\_{p=1}^{N} \sum\_{k=1}^{\hat{K}} \hat{s}(p,k) \exp\left[j\left(\Phi(p,k) + 2\pi \frac{p\hat{p}}{N} + 2\pi \frac{k\hat{k}}{\hat{K}}\right)\right],\tag{47}$$

where $\hat{p} = \overline{1,N}$ and $\hat{k} = \overline{1,K(p)+K}$.

Eq. (47) can be rewritten as

$$\hat{a}\_{\mathbf{g}}(\hat{p},\hat{k}) = \sum\_{p=1}^{N} \left[ \sum\_{k=1}^{\hat{K}} \hat{s}(p,k) . \exp\left[j\Phi(p,k)\right] . \exp\left(j2\pi \frac{k\hat{k}}{\hat{K}}\right) \right] \exp\left(j2\pi \frac{p\hat{p}}{N}\right). \tag{48}$$

Eq. (48) can be considered as an image reconstruction computational procedure, which reveals the 2-D discrete complex image function $\hat{a}\_{\mathbf{g}}(\hat{p},\hat{k})$.

#### **4.3 LFM ISAR image reconstruction algorithm**

Based on the previous analysis the following image reconstruction steps can be defined.

*Step 1* Compensate the higher-order phase terms by multiplying the complex matrix $\hat{s}(p,k)$ by the complex exponential function $\exp[j\Phi(p,k)]$, i.e.

$$
\tilde{\mathbf{s}}(p,k) = \hat{\mathbf{s}}(p,k) \exp\left[j\Phi(p,k)\right] \tag{49}
$$


*Step 2* Range compress $\tilde{s}(p,k)$ by a discrete IFT, i.e.

$$\tilde{s}(p,\hat{k}) = \frac{1}{\hat{K}} \sum\_{k=1}^{\hat{K}} \tilde{s}(p,k) . \exp\left(j2\pi \frac{k\hat{k}}{\hat{K}}\right). \tag{50}$$

*Step 3* Azimuth compress $\tilde{s}(p,\hat{k})$, i.e. extract a complex image by IFT

$$a\_{\mathbf{g}}(\hat{p}, \hat{k}) = \frac{1}{N} \sum\_{p=1}^{N} \tilde{s}(p, \hat{k}) . \exp\left(j2\pi \frac{p\hat{p}}{N}\right). \tag{51}$$

*Step 4* Compute the modulus of the complex image by

$$\left| a\_{\mathbf{g}}(\hat{p}, \hat{k}) \right| = \left| \frac{1}{N} \sum\_{p=1}^{N} \tilde{s}(p, \hat{k}) . \exp\left( j2\pi \frac{p\hat{p}}{N} \right) \right|. \tag{52}$$
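As an illustrative sketch (not part of the chapter), Steps 1-4 map directly onto array operations in NumPy: the normalized sums in (50) and (51) are inverse DFTs along the range and cross-range axes. The demodulated signal matrix `s_hat` and the phase-polynomial matrix `phi` are assumed to be given.

```python
import numpy as np

def reconstruct_lfm_image(s_hat, phi):
    """Sketch of Steps 1-4 for LFM ISAR image reconstruction.

    s_hat : (N, K) complex demodulated ISAR signal matrix.
    phi   : (N, K) real phase correction polynomial Phi(p, k) of Eq. (45).
    """
    # Step 1: compensate the higher-order phase terms, Eq. (49).
    s_corrected = s_hat * np.exp(1j * phi)
    # Step 2: range compression by a discrete IFT over k, Eq. (50).
    s_range = np.fft.ifft(s_corrected, axis=1)
    # Step 3: azimuth compression by a discrete IFT over p, Eq. (51).
    image = np.fft.ifft(s_range, axis=0)
    # Step 4: modulus of the complex image, Eq. (52).
    return np.abs(image)
```

With `phi` set to zero this reduces to the plain 2-D IFT of (53), which is the starting point when the phase correction function is unknown.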

The aforementioned algorithm is feasible only if the phase correction function $\Phi(p,k)$ is a priori known. Otherwise, a focused image is impossible to extract. In this case, taking into account the linear property of the computational operations in (48), the image extraction algorithm may start with a 2-D IFT (range and cross-range compression) of the demodulated ISAR signal, the complex matrix $\hat{s}(p,k)$, i.e.

$$a\_{\mathbf{g}}(\hat{p},\hat{k}) = \sum\_{p=1}^{N} \left[ \sum\_{k=1}^{\hat{K}} \hat{s}(p,k) . \exp\left(j2\pi \frac{k\hat{k}}{\hat{K}}\right) \right] \exp\left(j2\pi \frac{p\hat{p}}{N}\right). \tag{53}$$

It is worth noting that the 2-D IFT can be interpreted as a spatial correlation of the complex frequency spectrum $\hat{s}(p,k)$ with the exponential terms $\exp\left(j2\pi\frac{k\hat{k}}{\hat{K}}\right)$ and $\exp\left(j2\pi\frac{p\hat{p}}{N}\right)$, which reveal the unknown range, $\hat{k}$, and cross-range, $\hat{p}$, space coordinates of the 2-D image function $\hat{a}\_{\mathbf{g}}(\hat{p},\hat{k})$ in the area of all possible values $\hat{p} = \overline{1,N}$ and $\hat{k} = \overline{1,K(p)+K}$.

#### **4.4 Autofocusing phase correction by image entropy minimization**

If the image obtained by only range (50) and azimuth (51) compression is blurred, a higher-order phase correction has to be applied, i.e. $\tilde{s}(k,p) = \hat{s}(k,p)\exp[j\Phi(p)]$ is performed. The phase correction, or motion compensation of higher order, is an autofocus procedure. It requires determination of the coefficients $a\_2, \dots, a\_m$, $b\_2, \dots, b\_m$ and $c\_2$ of the polynomial (45). The computational load is reduced if $\Phi(p,k) = \Phi(p)$ for each $k$, i.e. (45) is limited to

$$\Phi(p) = a\_2(pT\_p)^2 + \dots + a\_m(pT\_p)^m \,. \tag{54}$$

An iterative SSA is applied to find the optimal values of the coefficients, using entropy as a cost function to evaluate the quality of the image. At the first step $a\_2$ is calculated, at the second $a\_3$, etc. The exact value of each coefficient $a\_m$, $m = 2,3,\dots$, is computed iteratively, starting from $a\_m = 0$ and increasing it in steps of $0.01$ as long as the image quality improves. If the image quality does not improve or gets worse, go on to the computation of the next coefficient $a\_{m+1}$ or stop the procedure. In practice the quadratic term has the major impact on the phase correction process.

Let $\Phi\_s(p)$ be the phase correction function defined at the $s$th iteration; then the phase correction is accomplished by

$$
\tilde{s}\_s(p,k) = \hat{s}(p,k) \exp(-j\Phi\_s(p))\,. \tag{55}
$$

After the current phase correction and image extraction by range and cross-range (azimuth) compression, calculate a power-normalized image as

$$I\_s(\hat{p}, \hat{k}) = \frac{\left| a\_{\mathbf{g},s}(\hat{p}, \hat{k}) \right|^2}{\sum\_{\hat{p}=1}^N \sum\_{\hat{k}=1}^{\hat{K}} \left| a\_{\mathbf{g},s}(\hat{p}, \hat{k}) \right|^2} \,. \tag{56}$$

Calculate entropy of the normalized ISAR image

$$H\_s = -\sum\_{\hat{p}=1}^{N} \sum\_{\hat{k}=1}^{\hat{K}} I\_s(\hat{p}, \hat{k}) \ln[I\_s(\hat{p}, \hat{k})] \,. \tag{57}$$

The estimate of the optimal values of coefficients corresponds to the minimum of the entropy image cost function, i.e.

$$
\hat{a}\_m = \arg\min\_{a\_m} \left\{ H\_s[I\_s(\hat{p}, \hat{k})] \right\}.\tag{58}
$$

The procedure is repeated until the global minimum value of the entropy *Hs* is acquired.
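The entropy-driven search can be sketched as follows, restricted (as in practice) to the dominant quadratic term of (54). A grid search over $a_2$ stands in for the iterative SSA, and all function and variable names are illustrative assumptions, not from the chapter.

```python
import numpy as np

def image_entropy(image):
    """Entropy of the power-normalized image, Eqs. (56)-(57)."""
    power = np.abs(image) ** 2
    I = power / power.sum()          # Eq. (56)
    I = I[I > 0]                     # avoid log(0)
    return -np.sum(I * np.log(I))    # Eq. (57)

def autofocus_quadratic(s_hat, Tp, a2_grid):
    """Pick the quadratic coefficient a2 of Phi(p) = a2*(p*Tp)**2 that
    minimizes the image entropy, Eqs. (55) and (58)."""
    N = s_hat.shape[0]
    p = np.arange(N)
    best_a2, best_H = None, np.inf
    for a2 in a2_grid:
        phase = a2 * (p * Tp) ** 2                     # quadratic Phi_s(p)
        s_corr = s_hat * np.exp(-1j * phase)[:, None]  # Eq. (55)
        image = np.fft.ifft2(s_corr)                   # range + azimuth compression
        H = image_entropy(image)
        if H < best_H:
            best_a2, best_H = a2, H
    return best_a2, best_H
```

For a fast-maneuvering target the entropy has multiple local minima, so the search interval has to be wide enough to capture the global one.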

#### **4.5 Numerical experiment**


To verify the properties of the LFM ISAR signal model and to prove the correctness of the image reconstruction algorithm, a numerical experiment is carried out. Assume the target, a Mig-35, detected in the 3-D coordinate system $O'XYZ$, is moving rectilinearly in a coordinate system $Oxyz$. Kinematical parameters: velocity $V = 400$ m/s; guiding angles $0.92$, $0.5$ and $0.42$; coordinates of the mass-center at the moment $p = N/2$: $x\_{00'}(0) = 36.3\cdot10^{3}$ m, $y\_{00'}(0) = 71.3\cdot10^{3}$ m, $z\_{00'}(0) = 5\cdot10^{3}$ m; reference coordinates: $x\_{0}(0) = 10$ m, $y\_{0}(0) = 5\cdot10^{4}$ m, $z\_{0}(0) = 2\cdot10^{3}$ m. LFM emitted pulse: wavelength $\lambda = 3\cdot10^{-2}$ m; pulse repetition period $T\_p = 1.32\cdot10^{-2}$ s; LFM pulse timewidth $T = 10^{-6}$ s; number of LFM samples $K = 300$; timewidth of an LFM sample $\Delta T = 0.33\cdot10^{-8}$ s; bandwidth $\Delta F = 1.5\cdot10^{8}$ Hz; LFM rate $b = 3\cdot10^{14}$ s$^{-2}$; number of emitted pulses during CPI $N = 500$. Dimensions of the grid cell: $\Delta X = \Delta Y = \Delta Z = 0.5$ m; reference points on the grid axes $X, Y, Z$: $i = \overline{1,64}$, $j = \overline{1,64}$, $k = \overline{1,10}$. Target intensities $a\_{ijk} = 0.01$, out-of-target intensities $a\_{ijk} = 0.001$.

The complex spatial frequency spectrum and 2-D space image function are presented in Figs. 7 and 8, respectively. The entropy and final focused image of Mig-35 are illustrated in Figs. 9 and 10, respectively.


#### **5. ISAR signal formation and imaging with a sequence of PCM waveforms**

#### **5.1 PCM waveform**


Fig. 7. Complex ISAR signal - complex spatial frequency spectrum: (a) imaginary part of the ISAR signal; (b) real part of the ISAR signal.

Fig. 8. 2-D isometric space image function and 2-D unfocused image of Mig-35 after azimuth compression by the second IFT: (a) module of the 2-D space image function; (b) unfocused image of Mig-35.

Fig. 9. Image entropy and final focused image of Mig-35 by step 47 and minimal entropy 6.2: (a) evolution of the image entropy; (b) final focused image of Mig-35.

Consider the 3-D ISAR scenario (Fig. 6) and a target illuminated by a sequence of phase-code modulated (PCM) pulse trains (bursts). Each PCM pulse train is described by

$$s(t) = A \mathbf{rect}\,\frac{t}{T} \exp\left\{-j\left[\omega t + \pi b(t) + \varphi\_0\right]\right\},\tag{59}$$

where $t - pT\_p$ is the fast time and $t \bmod T\_p$ is the slow time; $p$ is the index of the emitted pulse train; $T\_p$ is the burst repetition period; $\varphi\_0$ is the initial phase of a PCM pulse; $k = \overline{1,K}$ is the index of the PCM segment; $K = T/\Delta T$ is the full number of PCM segments; $T$ is the time duration of the phase-code modulated pulse train; $\Delta T$ is the timewidth of the phase segment; $b(t) \in \{0,1\}$ is the binary parameter of the PCM train.
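For illustration, a sampled version of the waveform (59) can be generated as follows. This is a hypothetical sketch using the Barker-13 code of Section 5.4; the amplitude, carrier frequency and oversampling factor are free parameters, not values from the chapter.

```python
import numpy as np

# Binary parameter b of the Barker-13 code as given in Section 5.4:
# b = 1 on segments 6, 7, 10 and 12, b = 0 elsewhere (k = 1..13).
BARKER13_B = np.array([0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0])

def pcm_pulse(A, omega, delta_t, samples_per_segment):
    """Sampled PCM pulse train, Eq. (59) with phi_0 = 0:
    s(t) = A * exp(-j * [omega * t + pi * b(t)]) over one burst."""
    K = BARKER13_B.size
    dt = delta_t / samples_per_segment
    t = np.arange(K * samples_per_segment) * dt
    # b(t) is piecewise constant over segments of width delta_t.
    b = np.repeat(BARKER13_B, samples_per_segment)
    return A * np.exp(-1j * (omega * t + np.pi * b))
```

The factor $e^{-j\pi b(t)} = \pm 1$ simply flips the sign of the carrier on the coded segments, reproducing the Barker sign sequence.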

#### **5.2 PCM ISAR signal model**

The deterministic component of the ISAR signal return reflected by the **g**th generic point scatterer, assuming $\varphi\_0 = 0$, is defined by

$$s\_{\mathbf{g}}(p,t) = a\_{\mathbf{g}} \operatorname{rect} \frac{t - t\_{\mathbf{g}}(p)}{T} \exp\left\{-j\left[\omega(t - t\_{\mathbf{g}}(p)) + \pi b(t)\right]\right\},\tag{60}$$

$$\operatorname{rect} \frac{t - t\_{\mathbf{g}}(p)}{T} = \begin{cases} 1, & \text{if } 0 \le \frac{t - t\_{\mathbf{g}}(p)}{T} < 1;\\ 0, & \text{otherwise.} \end{cases}$$

The deterministic component of the ISAR signal return reflected from the target for every *p* th pulse train is described by

$$s(p,t) = \sum\_{\mathbf{g} \in \mathbf{G}} a\_{\mathbf{g}} \operatorname{rect} \frac{t - t\_{\mathbf{g}}(p)}{T} \exp\left\{-j\left[\omega(t - t\_{\mathbf{g}}(p)) + \pi b(t)\right]\right\} \,. \tag{61}$$

Eq. (61) is a weighted complex series of finite base functions, the ISAR signals from all generic points. It can be regarded as an asymmetric complex transform of the 3-D image function $a\_{\mathbf{g}}$, $\mathbf{g} \in \mathbf{G}$, into a 2-D signal plane $\hat{s}(k,p)$. When computing $\operatorname{rect}[(t - t\_{\mathbf{g}}(p))/T]$, the time delays $t\_{\mathbf{g}}(p)$ are arranged in ascending order. An index $\hat{k}$ different from this order is introduced, i.e. $\hat{k} \to t\_{\mathbf{g}}(p)$. Denote $\hat{t}\_{\mathbf{g}}^{\hat{k}}(p) = (k\_{\mathbf{g}\min}(p) + \hat{k} - 1)\Delta T - t\_{\mathbf{g}}(p)$; then Eq. (60) in discrete form can be written as

$$s(p,k) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \operatorname{rect} \frac{\hat{t}\_{\mathbf{g}}^{\hat{k}}(p)}{T} \exp\left\{-j\left[\omega\hat{t}\_{\mathbf{g}}^{\hat{k}}(p) + \pi b((k-\hat{k}+1)\Delta T)\right]\right\}.\tag{62}$$

$$\operatorname{rect} \frac{\hat{\mathbf{f}}\_{\mathbf{g}}^{\hat{k}}(p)}{T} = \begin{cases} 1, & \text{if } 0 \le \frac{\hat{\mathbf{f}}\_{\mathbf{g}}^{\hat{k}}(p)}{T} < 1\\ 0, & \text{otherwise} \end{cases}.$$


where $\hat{k}$ stands for the current range number $k$ for which $\operatorname{rect}[\hat{t}\_{\mathbf{g}}^{\hat{k}}(p)/T]$ yields 1 for the first time. For many time delays $t\_{\mathbf{g}}(p)$ the index $\hat{k}$ may have one and the same value. The index $\hat{k}$ is considered as a space discrete range coordinate of the **g**th generic point at the moment of imaging.

#### **5.3 PCM ISAR image reconstruction procedure**

Based on the phase demodulated ISAR signal

$$\hat{s}(p,k) = \sum\_{\mathbf{g}\in\mathbf{G}} a\_{\mathbf{g}} \operatorname{rect} \frac{\hat{t}\_{\mathbf{g}}^{\hat{k}}(p)}{T} \exp\left(-j\left[\omega(k\_{\mathbf{g}\min}(p)\Delta T - t\_{\mathbf{g}}^{\hat{k}}(p)) + \pi b((k-\hat{k}+1)\Delta T)\right]\right). \tag{63}$$

Formally the 3-D image function $a\_{\mathbf{g}}$ should be extracted from the 2-D ISAR signal plane by the inverse spatial transform, but due to a theoretical limitation based on the number of measurement parameters only a 2-D image function may be determined, i.e.

$$\hat{a}\_{\mathbf{g}} = \sum\_{p=1}^{N} \sum\_{k=\hat{k}}^{\hat{k}+K} s(p,k) \exp\left\{ j \left[ \omega(k\_{\mathbf{g}\min}(p)\Delta T - t\_{\mathbf{g}}^{\hat{k}}(p)) + \pi b((k-\hat{k}+1)\Delta T) \right] \right\},\tag{64}$$

Eq. (64) can be rewritten as

$$\hat{a}\_{\mathbf{g}} = \sum\_{p=1}^{N} \sum\_{k=\hat{k}}^{\hat{k}+K} \left[ \hat{\mathbf{s}}(p,k) \exp\{j\pi b(k-\hat{k}+1)\Delta T\} \right] \cdot \exp\left\{j\omega \left[k\_{\mathbf{g}\min}(p)\Delta T - t\_{\mathbf{g}}^{\hat{k}}(p)\right] \right\}.\tag{65}$$

The Taylor expansion of the phase term $\omega[k\_{\mathbf{g}\min}(p)\Delta T - t\_{ijk}^{\hat{k}}(p)]$ can be presented as a polynomial function of higher order, i.e.

$$\omega\left[k\_{\mathbf{g}\min}(p)\Delta T - t\_{ijk}^{\hat{k}}(p)\right] = a\_0 + a\_1(pT\_p) + a\_2(pT\_p)^2 + \dots + a\_m(pT\_p)^m \,. \tag{66}$$

The linear term $a\_1(pT\_p)$ is reduced to $\frac{2\pi}{N}\hat{p}p$ and considered as a Fourier operator; $\hat{p}$ is the discrete unknown coordinate of the *g*th generic point scatterer placed in the $k$th range cell, and $N$ is the number of emitted PCM trains during the CPI. The constant term has nothing to do with the image reconstruction and is removed. The remaining sum of terms in (66) is denoted as

$$\Phi(p) = a\_2(pT\_p)^2 + \dots + a\_m(pT\_p)^m \,, \tag{67}$$

then (65) can be expressed as

$$\hat{a}\_{\mathbf{g}}\left(\hat{p},\hat{k}\right) = \sum\_{p=1}^{N} \sum\_{k=\hat{k}}^{\hat{k}+K} \left[ \hat{s}(p,k) \exp\left[j\pi b((k-\hat{k}+1)\Delta T)\right] \right] \exp\left\{j\left[\frac{2\pi}{N}p\hat{p} + \Phi(p)\right]\right\},\tag{68}$$

Based on the linearity of the operations (68) can be rewritten as

$$\hat{a}\_{\mathbf{g}}(\hat{p},\hat{k}) = \sum\_{p=1}^{N} \left\{ \sum\_{k=\hat{k}}^{\hat{k}+K} [\hat{s}(p,k).\exp(j\Phi(p))].\exp[j\pi b(k-\hat{k}+1)\Delta T] \right\} \exp\left\{j\left[\frac{2\pi}{N}\hat{p}p\right]\right\}.\tag{69}$$

Accordingly, the image extraction algorithm can be outlined as follows.

*Step 1* Phase correction by multiplication of the phase demodulated ISAR signal with an exponential phase correction function, i.e.

$$
\tilde{s}(p,k) = \hat{s}(p,k) . \exp[j\Phi(p)]. \tag{70}
$$

*Step 2* Range compression is performed by correlating the phase-corrected ISAR signal $\tilde{s}(p,k)$ with a reference function, the complex conjugate of the transmitted PCM signal, $\exp[j\pi b((k-\hat{k}+1)\Delta T)]$, i.e.

$$\tilde{s}(p,\hat{k}) = \sum\_{k=\hat{k}}^{\hat{k}+K} \tilde{s}(p,k) \exp\{j\pi b((k-\hat{k}+1)\Delta T)\}\,, \tag{71}$$

where $p = \overline{1,N}$ and $\hat{k} = \overline{1,K(p)}$.


*Step 3* Azimuth compression and complex image extraction by Fourier transform of the range compressed ISAR data, i.e.

$$
\hat{a}\_{\mathbf{g}}(\hat{p},\hat{k}) = \sum\_{p=1}^{N} \tilde{s}(p,\hat{k}) \exp\left\{ j \left[ \frac{2\pi}{N} \hat{p}p \right] \right\}.\tag{72}
$$

Then the modulus of the target image can be calculated by

$$\left| \hat{a}\_{\mathbf{g}}(\hat{p}, \hat{k}) \right| = \left| \sum\_{p=1}^{N} \tilde{s}(p, \hat{k}) \exp\left\{ j \left[ \frac{2\pi}{N} \hat{p} p \right] \right\} \right|. \tag{73}$$
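Assuming each admissible range cell $\hat{k}$ starts a $K$-sample code window, that the range cells are already aligned, and that the phase polynomial $\Phi(p)$ of (67) is supplied, Steps 1-3 and the modulus (73) can be sketched as follows (a hypothetical implementation, with the sum in (71) taken over $K$ samples):

```python
import numpy as np

def reconstruct_pcm_image(s_hat, b_code, phi_p):
    """Sketch of Steps 1-3 for PCM ISAR image reconstruction.

    s_hat  : (N, M) complex phase-demodulated ISAR signal, Eq. (63).
    b_code : (K,) binary PCM parameters b(k), k = 1..K.
    phi_p  : (N,) phase correction polynomial Phi(p), Eq. (67).
    """
    N, M = s_hat.shape
    K = b_code.size
    # Step 1: phase correction, Eq. (70).
    s1 = s_hat * np.exp(1j * phi_p)[:, None]
    # Reference function: complex conjugate of the code phase,
    # exp[j*pi*b((k - k_hat + 1) * dT)], Eq. (71).
    ref = np.exp(1j * np.pi * b_code)
    # Step 2: range compression by correlation along k for every k_hat.
    n_cells = M - K + 1                      # admissible range cells
    s2 = np.empty((N, n_cells), dtype=complex)
    for k_hat in range(n_cells):
        s2[:, k_hat] = s1[:, k_hat:k_hat + K] @ ref
    # Step 3: azimuth compression with the exp(+j*2*pi*p*p_hat/N)
    # kernel of Eq. (72) (ifft, up to its 1/N factor), then the
    # modulus of the complex image, Eq. (73).
    return np.abs(np.fft.ifft(s2, axis=0))
```

With `phi_p` set to zero this is the blind start of (74); for a Barker code the range sidelobes after correlation do not exceed unity relative to the mainlobe of value $K$.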

The aforementioned algorithm is feasible only if the phase correction function $\Phi(p,k)$ is a priori known. Otherwise, a focused image is impossible to extract. In this case, taking into account the linear property of the computational operations in (68), the image extraction algorithm may start with correlation along the range coordinate (range compression) and a Fourier transform along the cross-range coordinate (azimuth compression) of the demodulated ISAR signal, the complex matrix $\hat{s}(p,k)$, i.e.

$$\hat{a}\_{\mathbf{g}}(\hat{p},\hat{k}) = \sum\_{p=1}^{N} \left\{ \sum\_{k=\hat{k}}^{\hat{k}+K} \hat{s}(p,k) \exp[j\pi b((k-\hat{k}+1)\Delta T)] \right\} \exp\left\{j\left[\frac{2\pi}{N}\hat{p}p\right]\right\}.\tag{74}$$

If the image obtained by only range (71) and azimuth (72) compression is blurred, a higher-order phase correction algorithm has to be applied. It requires determination of the coefficients $a\_2, \dots, a\_m$ in the polynomial (67) using the phase correction SSA described in Section 4.4.

#### **5.4 Numerical experiment**

A numerical experiment is carried out to verify the properties of the PCM ISAR signal model and to prove the correctness of the image reconstruction algorithm. It is assumed that the target, a helicopter, is detected in a coordinate system $O'XYZ$ and illuminated by a Barker


PCM burst, and moving rectilinearly in a coordinate system $Oxyz$. Kinematic parameters: velocity module $V = 25$ m/s; velocity guiding angles $0$, $0.5$ and $0.5$; coordinates of the mass-centre: $x\_{00'}(0) = 0$ m, $y\_{00'}(0) = 5\cdot10^{4}$ m, $z\_{00'}(0) = 3\cdot10^{3}$ m. Barker's PCM binary function $b(t)$: $b(t) = 0$ if $t \in (1\text{-}5,8,9,11,13)\Delta T$, and $b(t) = 1$ if $t \in (6,7,10,12)\Delta T$; wavelength $\lambda = 3\cdot10^{-2}$ m; burst repetition period $T\_p = 5\cdot10^{-3}$ s; PCM sample timewidth $\Delta T = 3.3\cdot10^{-9}$ s; number of burst samples $K = 13$; sample index $k = \overline{1,13}$; PCM burst timewidth $T = 42.9\cdot10^{-9}$ s; number of bursts emitted during CPI $N = 500$. Grid cell dimensions $\Delta X = \Delta Y = \Delta Z = 0.5$ m. Reference points on the axes $X, Y, Z$: $i, j = \overline{1,100}$ and $k = \overline{1,40}$, respectively. Isotropic point scatterers are placed at each node of the regular grid. Target intensities $a\_{ijk} = 0.01$, out-of-target intensities $a\_{ijk} = 0.001$.

The real and imaginary parts of the complex Barker's PCM ISAR signal are presented in Fig. 10, the final image (2-D space image function) in Fig. 11, and the entropy evolution in Fig. 12.

Fig. 10. Complex Barker's PCM ISAR signal as a complex spatial frequency spectrum: (a) real part; (b) imaginary part.

Fig. 11. Final image, the 2-D space image function (pseudo-color maps): (a) unfocused ISAR image; (b) focused ISAR image.

Fig. 12. Entropy evolution: minimum $H = 7.43$ at the optimal value $a_2 = 390$.

#### **6. Conclusion**

48 Digital Image Processing


In the present chapter, a mathematical description and an original interpretation of ISAR signal formation and imaging have been suggested. It has been illustrated that these two operations can be interpreted as direct and inverse spatial complex transforms, respectively. It has been proven that image extraction is a threefold procedure comprising phase correction, range compression (performed by IFT for LFM waveforms and by cross-correlation for PCM waveforms), and azimuth compression (performed by IFT in both cases). It has been underlined that image reconstruction is a procedure of total motion compensation, i.e. compensation of all phases induced by the target motion. Only the phases proportional to the distances from the ISAR to the point scatterers on the target at the moment of their imaging remain; these phases give the ISAR image its complex character. The drawback of the proposed higher-order motion compensation algorithm is the existence of multiple local minima in the entropy evolution when the target maneuvers quickly. To find the global minimum of the entropy and the optimal values of the polynomial coefficients, the computation has to be extended over a wide interval of their variation. The subject of future research is the exploration of the image reconstruction algorithm with higher-order terms and cross-terms of the phase correction polynomial while the target exhibits complicated movement.
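The entropy-minimization step described above can be sketched as a one-dimensional search over the quadratic coefficient of the phase correction polynomial. The snippet below is a minimal illustration under simplifying assumptions, not the chapter's implementation: the point-target signal model, the coefficient name `a2`, and the candidate grid are all hypothetical, and azimuth compression is reduced to a plain IFT.

```python
import numpy as np

def image_entropy(img):
    """Shannon entropy of the normalized image intensity distribution."""
    p = np.abs(img) ** 2
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def autofocus_grid_search(signal, a2_grid):
    """For each candidate a2, compensate a quadratic phase error
    exp(j*a2*(n/N)^2) along slow time, form the image by IFT (azimuth
    compression), and keep the candidate minimizing image entropy."""
    N, K = signal.shape                       # bursts x range samples
    n = np.arange(N)[:, None]
    best_H, best_a2 = np.inf, None
    for a2 in a2_grid:
        corrected = signal * np.exp(-1j * a2 * (n / N) ** 2)
        img = np.fft.ifft(corrected, axis=0)  # azimuth compression by IFT
        H = image_entropy(img)
        if H < best_H:
            best_H, best_a2 = H, a2
    return best_H, best_a2

# Toy check: a point target with a known quadratic phase error of a2 = 50
N, K = 64, 4
n = np.arange(N)[:, None]
sig = np.exp(1j * 50.0 * (n / N) ** 2) * np.ones((1, K))
H_min, a2_hat = autofocus_grid_search(sig, [0.0, 25.0, 50.0, 75.0])
```

For well-behaved targets the entropy curve is close to unimodal and a coarse grid followed by refinement suffices; for fast-maneuvering targets, as noted above, the multiple local minima force a search over a wide interval of the coefficients.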

#### **7. Acknowledgement**

This chapter is supported by NATO Science for Peace and Security (SPS) Programme: NATO: ESP. EAP. CLG 983876.

#### **8. References**

Li, J.; Wu, R.; Chen, V. (2001). Robust autofocus algorithm for ISAR imaging of moving targets, *IEEE Transactions on Aerospace and Electronic Systems*, Vol. 37, No 3, (July 2001), pp. 1056-1069, ISSN 0018-9251

Berizzi, F.; Dalle Mese, E. & Martorella, M. (2002). Performance analysis of a contrast-based ISAR autofocusing algorithm. *Proceedings of 2002 IEEE Radar Conference*, pp. 200-205, ISBN 0-7803-7357-X, Long Beach, CA, USA, Apr. 22-25, 2002

Martorella, M.; Haywood, B.; Berizzi, F. & Dalle Mese, E. (2003). Performance analysis of an ISAR contrast-based autofocusing algorithm using real data. *Proceedings of 2003 IEEE Radar Conference*, pp. 30-35, ISBN 0-7803-7870-9, Adelaide, SA, Australia, Sept. 3-5, 2003

Berizzi, F.; Martorella, M.; Haywood, B. M.; Dalle Mese, E. & Bruscoli, S. (2004). A survey on ISAR autofocusing techniques. *Proceedings of IEEE ICIP 2004*, Singapore, Oct. 24-27, 2004

Chen, C. & Andrews, H.C. (1980). Target motion induced radar imaging. *IEEE Transactions on Aerospace and Electronic Systems*, Vol. 16, (January 1980), pp. 2-14, ISSN 0018-9251

Wu, H.; Delisle, G. Y. & Fang, D. G. (1995). Translational motion compensation in ISAR image processing. *IEEE Transactions on Image Processing*, Vol. 4, No 11, (November 1995), pp. 1561-1571, ISSN 0018-9251

Xi, L.; Guosui, L. & Ni, J. (1999). Autofocusing of ISAR images based on entropy minimization, *IEEE Transactions on Aerospace and Electronic Systems*, Vol. 35, No 4, (October 1999), pp. 1240-1252, ISSN 0018-9251

Martorella, M. & Berizzi, F. (2005). Time windowing for highly focused ISAR image reconstruction, *IEEE Transactions on Aerospace and Electronic Systems*, Vol. 41, No 3, (July 2005), pp. 992-1006, ISSN 1729-8806

Chen, V. & Qian, S. (1998). Joint time-frequency transform for radar range-Doppler imaging. *IEEE Transactions on Aerospace and Electronic Systems*, Vol. 34, No 2, (April 1998), pp. 486-499, ISSN 0018-9251

Qian, S. & Chen, V. (1998). ISAR motion compensation via adaptive joint time-frequency technique, *IEEE Transactions on Aerospace and Electronic Systems*, Vol. 34, No 2, (April 1998), pp. 670-677, ISSN 0018-9251

### **Low Bit Rate SAR Image Compression Based on Sparse Representation**

Alessandra Budillon and Gilda Schirinzi

*Dipartimento per le Tecnologie - Università degli Studi di Napoli "Parthenope", Italy*

#### **1. Introduction**

Synthetic aperture radar (SAR) is an active remote sensing tool operating in the microwave range of the electromagnetic spectrum. It uses the motion of the radar transmitter to synthesize an antenna aperture much larger than the actual one, in order to yield high spatial resolution radar images (Curlander & McDonough, 1991). It has been applied to military surveillance, terrain mapping, and other fields, owing to its ability to operate in all weather conditions, day and night.

In the last few years, high quality images of the Earth produced by SAR systems, carried on a variety of airborne and spaceborne platforms, have become increasingly available. With the increasing resolution of SAR images, it is of great interest to find efficient ways to store the high volume of SAR image data in real time, and to compress SAR images with higher compression performance for communication channels of limited bandwidth.

SAR images have some special characteristics, so different from those of the incoherent optical images to which compression algorithms are commonly applied, that they significantly affect the design of an image compression algorithm. First is the speckle phenomenon, which results from the coherent radiation and processing and severely degrades the quality of SAR images. When illuminated by the SAR, each target contributes backscattered energy which, along with phase and power changes, is then coherently summed over all scatterers. This summation can be either high or low, depending on constructive or destructive interference. Second is the very high dynamic range of SAR images, also attributable to the coherent nature of the imaging process. Within a resolution cell of an image, the transduced image domain value is related to the radar cross section per unit area of the corresponding patch of illuminated terrain (Eichel & Ives, 1999). This specific cross section can vary over a considerable range. Most natural terrain, being rough relative to the wavelengths employed, exhibits relatively low values of this parameter, in the vicinity of -15 dBsm/m², while flat, smooth surfaces such as lakes exhibit even lower values. On the other hand, man-made objects, especially those of conducting materials with large flat surfaces and right angles, can have specific cross sections of +60 dBsm/m² and higher.

These differences mean that encoding/decoding algorithms designed for optical data may not be optimized or even appropriate for SAR data.


Many efforts have been made to develop suitable compression techniques for the bit stream necessary for raw data and/or focused image coding (Kwok & Johnson, 1989, Pascazio & Schirinzi, 2003, Eichel & Ives, 1999, Dony & Haykin, 1997, Baxter, 1999).

The most widely used compression techniques are based on Block Adaptive Quantization (BAQ), owing to its simplicity for coding and decoding. The algorithm is based on the observation that the (complex) SAR signal commonly has a Gaussian distribution, with real and imaginary parts mutually independent and practically uncorrelated between adjacent pixels. It divides the data into blocks and, for each block, computes the standard deviation $\sigma$ in order to determine the optimum quantizer, which adapts to the changing levels of the signal (Kwok & Johnson, 1989). A non-uniform quantizer (or Lloyd-Max quantizer) that minimizes the Mean Squared Error (MSE) for a given number of quantization levels (Goyal et al., 1998) is commonly used. The minimum block size is selected in order to guarantee Gaussian statistics within a block; the maximum block size is limited by the fact that the signal power should be approximately constant within the block.
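The per-block procedure can be sketched as follows. This is a simplified illustration rather than the BAQ of Kwok & Johnson: a uniform mid-rise grid over $[-2\sigma, 2\sigma]$ stands in for the tabulated Lloyd-Max levels, and the data are treated as a single real-valued channel (a real BAQ would quantize the I and Q channels separately).

```python
import numpy as np

def baq_encode(data, block=32, bits=2):
    """Block Adaptive Quantization sketch: per block, estimate the
    standard deviation and quantize with a grid scaled to it. A uniform
    mid-rise grid over [-2*sigma, 2*sigma] stands in here for the
    tabulated Lloyd-Max levels of an actual BAQ."""
    L = 2 ** bits
    recon = np.empty_like(data, dtype=float)
    for start in range(0, len(data), block):
        blk = data[start:start + block]
        sigma = blk.std() + 1e-12             # block-adaptive scale
        step = 4.0 * sigma / L                # grid spans [-2*sigma, 2*sigma]
        idx = np.clip(np.floor(blk / step), -L // 2, L // 2 - 1)
        recon[start:start + block] = (idx + 0.5) * step
    return recon

# 2 bits/sample on Gaussian data: each block uses at most 2**bits values
rng = np.random.default_rng(1)
data = rng.standard_normal(256)
recon = baq_encode(data)
```

Only the level index (here 2 bits) and one scale factor per block need to be transmitted, which is what makes BAQ attractive for on-board, real-time use.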

Transform coding algorithms have also been applied to SAR intensity images (Dony & Haykin, 1997, Eichel & Ives, 1999, Baxter, 1999) and to SAR raw data (Pascazio & Schirinzi, 2003). They are based on the decomposition of the signal to be encoded in an orthonormal basis. Each decomposition coefficient is then approximated by a quantized variable. The role of the signal decomposition is to decorrelate the signal and to make the subsequent quantization process easier. The coding performance depends on the choice of the basis. The best basis compacts the image energy into the fewest coefficients. The small number of significant coefficients in the transformed domain results in a sparse representation that can be coded with fewer bits, due to its low entropy. An entropy coder can then be used as the last step in the compression scheme. In (Dony & Haykin, 1997) a method combining the Karhunen-Loeve transform and Vector Quantization is proposed; in (Eichel & Ives, 1999) simply a 2-D Fourier transformation is used; in (Baxter, 1999) a compression system based on the Gabor transform is adopted; and in (Pascazio & Schirinzi, 2003) a transform coding compression method using a wavelet basis is applied to SAR raw data. Wavelets have also been applied to SAR images by (Zeng & Cumming, 2001, Xingsong et al., 2004): in the first case a tree-structured wavelet transform was proposed, while in the second case a compression scheme combining the wavelet packet transform, quadtree classification and universal trellis-coded quantization was adopted.

With the aim of reducing the number of significant representation coefficients, and so obtaining a sparse representation, overcomplete dictionaries, or frames, have recently been proposed (Goyal et al., 1998). A frame is a set of column vectors, just as a transform, but with a larger number of vectors than the number of elements in each vector. The representation of the observed data in terms of an overcomplete basis is not unique, so a constraint has to be enforced to recover uniqueness. To achieve a sparse representation, the introduced constraint can be the minimization of the number of significant coefficients by using an $\ell_1$-norm based penalty. It can be shown that in this case the problem to be solved is a linear programming problem, which can be viewed as a Maximum a Posteriori (MAP) estimation problem with a Laplacian prior distribution assumption (Hyvarinen et al., 1999).

A possible choice of the basis is the overcomplete Independent Component Analysis (ICA) basis (Hyvarinen et al., 1999, Algra, 2000), which allows modeling the data as a mixture of non-Gaussian and "almost" statistically independent sources, so that the representation coefficients, owing to their weak correlation, can be efficiently coded using a scalar quantizer.

In this chapter, the performance of a compression method based on an overcomplete ICA representation, coupled with an entropy constrained scalar quantizer (Pascazio & Schirinzi, 2003) optimized for the Laplacian statistics of the ICA coefficients, and using a proper bit allocation strategy, first proposed in (Budillon et al., 2005), is analyzed in detail and validated on different sets of real data obtained with the ERS1, COSMO-SkyMed and TerraSAR-X sensors.

#### **2. Overcomplete ICA**


Independent Component Analysis (ICA) (Hyvarinen et al., 1999) has been proposed as a statistical generative model that represents the observed data as a linear transformation of variables that are non-Gaussian and mutually independent. The model is the following:

$$\mathbf{x} = \mathbf{A}\mathbf{s} \tag{1}$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_m]^T$ is the random vector representing the observed data, $\mathbf{s} = [s_1, s_2, \ldots, s_n]^T$ is the random vector of the independent components, and $\mathbf{A}$ is an unknown constant matrix, called the mixing matrix or basis matrix.

The overcomplete ICA paradigm (Hyvarinen et al., 1999) assumes $n > m$. This means we have a larger number of independent components and can more easily adapt to the signal statistics. Since the matrix $\mathbf{A}$ is not invertible, even if it is known, the estimation of the independent components is an underdetermined problem that does not admit a unique solution. A constraint on the statistical distribution of the ICA coefficients is therefore introduced to solve the problem.

It can be shown (Hyvarinen et al., 1999) that, assuming a Laplacian distribution, the optimal estimation of the coefficients $\hat{s}_i$ leads to the minimization of their $\ell_1$-norm under the constraint $\mathbf{x} = \mathbf{A}\mathbf{s}$:

$$\hat{\mathbf{s}} = \underset{\mathbf{x} = \mathbf{A}\mathbf{s}}{\operatorname{argmin}} \sum\_{i} |s\_i| \tag{2}$$

The choice of the Laplacian distribution is convenient for the compression application, since it yields a sparse representation with a small number of non-zero coefficients $\hat{s}_i$.

For the estimation of the basis matrix we adopted a modification of FastICA proposed in (Hyvarinen et al., 1999), which searches for a "quasi-orthogonal" basis.
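The $\ell_1$ minimization of Eq. (2) reduces to a linear program via the standard split $\mathbf{s} = \mathbf{u} - \mathbf{v}$ with $\mathbf{u}, \mathbf{v} \ge 0$. The sketch below illustrates this on synthetic data; it is a generic basis pursuit demonstration, not the chapter's ICA pipeline — the random mixing matrix `A`, its size, and the LP solver are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def l1_sparse_code(A, x):
    """Basis pursuit, Eq. (2): min ||s||_1 subject to A s = x, via the
    standard LP split s = u - v with u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v) = ||s||_1
    A_eq = np.hstack([A, -A])          # equality constraint A u - A v = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v

# Overcomplete case: m = 4 observations, n = 8 components, 2 of them active
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
s_true = np.zeros(8)
s_true[[1, 5]] = [2.0, -1.5]
x = A @ s_true
s_hat = l1_sparse_code(A, x)
```

Because the true sparse source is itself a feasible point, the recovered $\hat{\mathbf{s}}$ is guaranteed to satisfy the constraint exactly and to have an $\ell_1$-norm no larger than that of the true source.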

#### **3. Entropy constrained scalar quantizer and bit allocation**

In this section we analyze the performance of a scalar quantizer optimized for Laplacian-distributed coefficients. In particular, we consider an entropy constrained scalar quantizer,


defined as the quantizer minimizing the quadratic distortion for a given value of the entropy of the quantized coefficients (Jayant & Noll, 1984).

It is already known that, under the high-resolution quantization hypothesis, i.e. if the number of quantization levels is sufficiently large, a uniform quantizer is optimal for a large class of probability distributions [9]. However, at low bit rates, a uniform quantizer is not optimal. Among the non-uniform quantizers, we consider the threshold quantizer (or quasi-uniform quantizer), which assigns zero to the coefficients whose amplitude lies inside a proper interval $[-T, T]$, and uniformly quantizes the others with a quantization step $\Delta$. All the samples whose absolute value exceeds a certain saturation amplitude $K\sigma$ are quantized to the highest or lowest quantization level (see Fig. 1), depending on their sign. The saturation factor $K$ can thus be defined as the ratio between the quantizer saturation value and the standard deviation $\sigma$ of the coefficient. Note that this quantizer is symmetric and works with an even number of quantization levels.

Fig. 1. Six levels threshold quantizer.

If the decomposition basis is chosen so that many coefficients are close to zero and few of them have a large amplitude, the threshold quantizer tends to an optimal entropy constrained quantizer (Mallat & Falzon, 1998). The larger the number of coefficients close to zero, the closer the threshold quantizer is to the optimal one.
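A minimal implementation of such a threshold quantizer might look as follows. The reconstruction levels here are simply the centres of the uniform cells, which is one reasonable parameterization; the actual levels of the chapter's quantizer (Fig. 1) are fixed by the optimization of this section, so treat `T`, `delta` and `L` below as free parameters of a sketch.

```python
import numpy as np

def threshold_quantize(c, T, delta, L):
    """Threshold (quasi-uniform) quantizer: coefficients with |c| <= T are
    set to zero (the dead zone); the others fall into uniform cells of
    width `delta`, with everything beyond the last of the L/2 cells per
    side saturated to it. Reconstruction is the cell centre."""
    q = np.zeros_like(c, dtype=float)
    cells_per_side = L // 2                   # L is an even level count
    mask = np.abs(c) > T
    idx = np.minimum(np.ceil((np.abs(c[mask]) - T) / delta), cells_per_side)
    q[mask] = np.sign(c[mask]) * (T + (idx - 0.5) * delta)
    return q

# Six-level quantizer, T = 1, delta = 1: dead zone, two cells, saturation
c = np.array([0.5, 1.2, 2.4, -10.0])
q = threshold_quantize(c, T=1.0, delta=1.0, L=6)   # -> [0.0, 1.5, 2.5, -3.5]
```

The dead zone is what produces the many exact zeros exploited by the significance map described next.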

At low bit rates, the decomposition coefficients are coarsely quantized, and many are set to zero. It is thus convenient to scan the coefficient frame in a predefined order and store the positions of zero versus non-zero quantized coefficients in a binary significance map. In the same scanning order, the amplitudes of the non-zero quantized coefficients are also entropy encoded with Huffman or arithmetic coding and transmitted together with the coded map.


The total number of bits necessary to encode a data frame is given by the number of bits necessary to encode the significance map, plus the number of bits necessary to encode the significant coefficients. Of course, the total bit rate decreases as the coefficients' probability density function (pdf) becomes more peaked, so that the number of significant coefficients decreases and the significance map becomes more correlated (Mallat & Falzon, 1998).

The performance of the quantizer considered depends on three parameters: the threshold value $T$, the number of quantization levels $L$, and the saturation factor $K$. These parameters must be set in such a way as to minimize distortion for an assigned rate value. The optimization of the threshold quantizer performance with respect to the parameters $T$, $L$ and $K$ can be performed once for a unit variance Laplacian variable. Its performance for Laplacian signals with variance $\sigma^2$ can then be simply inferred from that found for the unit variance Laplacian signal, by multiplying the threshold value by $\sigma$ and the obtained distortion by the variance $\sigma^2$.

A desired bit rate can be achieved for different values of the coder parameters. The optimal coder parameters $\hat{T}$, $\hat{L}$, $\hat{K}$ for an assigned bit rate can be found by minimizing the distortion, which for the Laplacian distribution can be expressed in an analytical form, following the method presented in (Pascazio & Schirinzi, 2003) for the Gaussian case.

The minimum distortion-rate curve $\hat{D}(R) = D(R, \hat{T}, \hat{L}, \hat{K})$ for a unit variance Laplacian distribution obtained using the threshold entropy constrained quantizer is shown in Fig. 2 (solid line), where it is compared with the curve obtained with an entropy constrained uniform quantizer (dotted line) (Algra, 2000).

Fig. 2. Distortion-rate function for an optimally threshold quantized Laplacian signal of unit variance (solid line), compared with that obtained with an optimal uniform quantizer (dots).

Low Bit Rate SAR Image Compression Based on Sparse Representation 57

Fig. 3. Scheme of the proposed compression method: the SAR images are partitioned into 8×8 blocks and projected onto the overcomplete ICA bases (matrix *A*) to obtain the ICA coefficients *s*; the coefficients are quantized with the optimal threshold quantizer (*ŝ*) and encoded (*c*) into the coded frames, with a bit budget distribution stage driving the quantizers.

We can note that the optimal threshold quantizer performs better than a uniform one at very low bit rates (lower than 1.8), and reduces to a uniform one at higher bit rates. This behavior can be conveniently exploited when a proper bit allocation is adopted, since in this case very low rates must often be assigned to certain blocks, even if the total bit budget is not very small. If we apply the same threshold quantizer to the entire frame of coefficients, we obtain the performance shown in Fig. 2. Different results can be obtained by using different quantizers for the different blocks into which the coefficient frame is decomposed. In particular, we can use different quantizers and associated coders optimized for the statistics of each block. A bit allocation algorithm can then be used to distribute bits among the blocks.

A procedure that can be used for optimally distributing the assigned number of bits among the different blocks is described in (Pascazio & Schirinzi, 2003). It imposes an equal average distortion per block and assigns more bits to the blocks with a larger variance with respect to those assigned to the blocks with a lower variance. The optimal number of bits for each block is determined by exploiting the minimum distortion curve of Fig. 2 (Pascazio & Schirinzi, 2003).
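The equal-distortion allocation can be sketched as follows; since the measured curve of Fig. 2 is not available here, the classical high-rate model *D*(*R*) = 2<sup>−2*R*</sup> stands in for it:

```python
import numpy as np

def allocate_bits(variances, total_rate):
    """Equal-average-distortion bit allocation (sketch). The chapter
    inverts the measured curve of Fig. 2; here D(R) = 2^(-2R) stands in.
    Bisect on the per-block distortion target d: block b receives the
    rate R_b >= 0 with variances[b] * D(R_b) = d, until sum(R_b) equals
    the total budget."""
    v = np.asarray(variances, dtype=float)

    def rates(d):
        # invert v * 2^(-2R) = d  ->  R = max(0, 0.5 * log2(v / d))
        return np.maximum(0.0, 0.5 * np.log2(v / d))

    lo, hi = 1e-12, float(v.max())
    for _ in range(200):                  # geometric bisection on d
        mid = np.sqrt(lo * hi)
        if rates(mid).sum() > total_rate:
            lo = mid                      # too many bits: raise the target
        else:
            hi = mid
    return rates(hi)

# Blocks with larger variance receive more bits, as stated in the text.
R = allocate_bits([4.0, 1.0, 0.25], total_rate=6.0)
print(np.round(R, 2))   # about [3, 2, 1]
```

With the 2<sup>−2*R*</sup> model and these variances the solution is exact: each doubling of the standard deviation costs one extra bit per sample.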

#### **4. Numerical results**

To test the performance of the proposed method we considered different SAR images obtained with different SAR sensors, using ERS-1, COSMO-SkyMed and TerraSAR-X data. We also wanted to test the performance on different kinds of areas that, due to different backscatter characteristics, may have different local statistical characteristics: we used three kinds of images, covering agricultural, suburban, and countryside areas.

Each single-look intensity image has been subdivided into data frames in the azimuth and range directions. Note that the SAR image pixels are floating-point valued with a dynamic range of about 50 dB. Moreover, the SAR images are affected by the presence of speckle, typical of images generated by coherent systems, which has to be preserved to keep the information contained in the image.

Each frame has been subdivided into blocks of 8×8 pixels. The overcomplete ICA bases have been computed using the algorithms presented in (Hyvarinen et al., 1999), starting from a set of 8×8 training vectors. The training set size has to be larger than *m*. The value used for the ratio *m*/*n* is about 0.7.

Then, the ICA coefficients of each frame have been computed using Eq. (2). The ICA coefficients have then been quantized using the optimal threshold quantizer of Fig. 2, which, besides exhibiting a better performance, has the advantage of allowing any fractional bit rate value. For the bit budget distribution among the different coefficient vectors, the bit allocation procedure described in Section 3 has been adopted. To sum up, for each image we followed the scheme reported in Fig. 3.
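The whole pipeline of Fig. 3 can be mocked up in a few lines. The random dictionary, the pseudoinverse analysis step and the uniform quantizer below are only placeholders for the learned ICA bases, the chapter's Eq. (2) and the optimal threshold quantizer, respectively:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 64, 91                      # 8x8 blocks; m/n ~ 0.7 as in the text
A = rng.standard_normal((m, n))    # stand-in for the learned ICA bases
A /= np.linalg.norm(A, axis=0)     # unit-norm atoms

image = rng.laplace(size=(64, 64)) # toy "frame" in place of SAR data
# split the 64x64 frame into 64 vectorized 8x8 blocks (one per column)
blocks = image.reshape(8, 8, 8, 8).transpose(0, 2, 1, 3).reshape(-1, m).T

# Analysis: minimum-norm coefficients (placeholder for the chapter's Eq. (2)).
s = np.linalg.pinv(A) @ blocks
s_hat = np.round(s * 4) / 4        # crude uniform quantizer, step 0.25
x_hat = A @ s_hat                  # synthesis, as in the chapter's Eq. (1)

D = np.sum((blocks - x_hat) ** 2) / np.sum(blocks ** 2)
print(D < 0.05)                    # small normalized distortion
```

Note that the overcompleteness (*n* > *m*) is visible in the shapes: 64-dimensional blocks yield 91 coefficients each, which is why the rate per coefficient is *m*/*n* times the rate per pixel.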

Different quality parameters can be considered to evaluate the performance of the compression method. In particular, one of the most meaningful parameters is the signal-to-noise ratio (SNR), or its reciprocal, the normalized average distortion *D*, computed on the SAR images obtained after the processing of the compressed data. We have chosen to evaluate the normalized distortion in order to compare the obtained results with the distortion-rate curve reported in Fig. 2.


Fig. 4. SAR single-look ERS-1 intensity image of Flevoland, The Netherlands.

$$D = \frac{\text{quantization noise power}}{\text{SAR data power}} = \frac{\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2}{\left\|\mathbf{x}\right\|^2}.$$

We also report the distortion evaluated on the ICA coefficients:

$$D\_s = \frac{\text{quantization ICA coefficients noise power}}{\text{ICA coefficients power}} = \frac{\left\|\mathbf{s} - \hat{\mathbf{s}}\right\|^2}{\left\|\mathbf{s}\right\|^2}.$$
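Both metrics are straightforward to compute; for instance:

```python
import numpy as np

def normalized_distortion_db(x, x_hat):
    """D = ||x - x_hat||^2 / ||x||^2, reported in dB as in Tables 1-3."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    D = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
    return 10 * np.log10(D)

x = np.array([1.0, -2.0, 4.0])
x_hat = np.array([1.1, -1.9, 3.9])
print(round(normalized_distortion_db(x, x_hat), 1))   # -28.5
```

The same function applies to *D*<sub>*s*</sub> by passing the coefficient vectors **s** and **ŝ** instead of the image samples.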

First, a single-look ERS-1 intensity image of Flevoland, The Netherlands, was considered. The original image is shown in Fig. 4.

The (equalized) intensity of a frame is shown in Fig. 5. The obtained statistical distribution of each ICA coefficient, as expected, has a Laplacian behaviour, as shown in Fig. 6. The ICA bases are reported in Fig. 7.

The average distortions obtained for different bit rates are presented in Table 1. Note that the bit rate values represent the average number of bits per pixel of the image frame. Since the dimension of the ICA basis is larger than that of the observation domain, the average number of bits per coefficient is smaller (it is scaled by the factor *m*/*n*).

The quantized coefficients have then been used to reconstruct the corresponding image using Eq. (1). The image frame obtained with an average rate per sample *R*=2 is shown in Fig. 8.

Secondly, we considered a single-look COSMO-SkyMed image of the Naples surroundings, in Italy, shown in Fig. 9.

The (equalized) intensity of a frame is shown in Fig. 10. The obtained statistical distribution of each coefficient, as expected, has a Laplacian behaviour, as shown in Fig. 11. The ICA bases are reported in Fig. 12.

The average distortions obtained for different bit rates are presented in Table 2.

The image frame obtained with an average rate per sample *R*=2 is shown in Fig. 13.

Thirdly, we considered a single-look TerraSAR-X image of the Frankfurt surroundings, in Germany, shown in Fig. 14.

The (equalized) intensity of a frame is shown in Fig. 15. The obtained statistical distribution of each coefficient, as expected, has a Laplacian behaviour, as shown in Fig. 16. The ICA bases are reported in Fig. 17.

The average distortions obtained for different bit rates are presented in Table 3.

The image frame obtained with an average rate per sample *R*=2 is shown in Fig. 18.

It can be noted that in all cases the average image distortions reported in Tables 1, 2 and 3 are below the value of -11 dB obtained for rate 2 on the curve reported in Fig. 2 for an optimally threshold-quantized Laplacian signal of unit variance.

Moreover, in all cases there is no visually appreciable degradation in the images reconstructed using the quantized coefficients of the overcomplete ICA bases.



Fig. 5. SAR ERS-1 frame.

Fig. 6. SAR ERS-1 image ICA coefficients empirical probability distribution functions.

Fig. 7. SAR ERS-1 image overcomplete ICA basis.

Fig. 8. SAR ERS-1 frame obtained by quantized coefficients of the overcomplete ICA basis with *R*=2.

Table 1. Rate-Distortion values for the SAR ERS-1 image.

| **Average rate** | **Coefficients average distortion (dB)** | **Image average distortion (dB)** |
|---|---|---|
| 1 | -5.4 | -11.1 |
| 1.5 | -8.3 | -13.2 |
| 2 | -11.6 | -15.3 |

Fig. 9. SAR COSMO-SkyMed intensity image of Naples surroundings, Italy.

Fig. 10. SAR COSMO-SkyMed frame.

Fig. 11. SAR COSMO-SkyMed image ICA coefficients empirical probability distribution functions.

Table 2. Rate-Distortion values for the SAR COSMO-SkyMed image.

| **Average rate** | **Coefficients average distortion (dB)** | **Image average distortion (dB)** |
|---|---|---|
| 1 | -4.4 | -8.0 |
| 1.5 | -6.9 | -10.3 |
| 2 | -10.8 | -13.2 |

Fig. 12. SAR COSMO-SkyMed image overcomplete ICA basis.

Fig. 13. SAR COSMO-SkyMed frame obtained by quantized coefficients of the overcomplete ICA basis with *R*=2.

Fig. 14. TerraSAR-X intensity image of Frankfurt surroundings, Germany.

Fig. 15. TerraSAR-X frame.

Fig. 16. TerraSAR-X image ICA coefficients empirical probability distribution functions.

Fig. 17. TerraSAR-X image overcomplete ICA basis.

Fig. 18. TerraSAR-X frame obtained by quantized coefficients of the overcomplete ICA basis with *R*=2.

Table 3. Rate-Distortion values for the TerraSAR-X image.

| **Average rate** | **Coefficients average distortion (dB)** | **Image average distortion (dB)** |
|---|---|---|
| 1 | -5.7 | -10.6 |
| 1.5 | -8.5 | -13.4 |
| 2 | -11.4 | -16.4 |

#### **5. Conclusion**

In this chapter the performance of a compression method based on an overcomplete ICA representation, coupled with an entropy-constrained scalar quantizer optimized for the Laplacian statistics of the ICA coefficients and with a proper bit allocation strategy, has been analyzed in detail and validated on different sets of real data, obtained with the ERS-1, COSMO-SkyMed and TerraSAR-X sensors. The best rate-distortion performance is obtained on the TerraSAR-X data frame, since it corresponds to an area of fairly uniform reflectivity, while the worst is obtained on the COSMO-SkyMed image frame, where more details are present. In all cases the image average distortions are below the one obtained with an optimally threshold-quantized Laplacian signal of unit variance. The ICA coefficients exhibit the statistical behaviour forced by the fast ICA algorithm: the empirical probability distributions are in all cases a good approximation of a Laplacian distribution. This behaviour allows the use of a threshold quantizer optimized for this particular statistical distribution, making it possible to discard many coefficients and to spend a higher bit rate only on the few significant ones, thus keeping the overall bit rate low.

#### **6. Acknowledgment**

We thank the Italian Space Agency (ASI) for providing the real data under the contract "Imaging and Monitoring with Multitemporal/Multiview COSMO/SkyMed SAR Data" (ID: 2246) and the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt; DLR) for providing the TerraSAR-X data (proposal MTH0941).

#### **7. References**

Algra, T. (2000). Compression of raw SAR data using entropy-constrained quantization, *Proc. of IEEE Int. Geosci. and Remote Sens. Symp., IGARSS 2000*, pp. 2660-2662, Honolulu, USA, July 2000.

Baxter, R.A. (1999). SAR image compression with the Gabor transform, *IEEE Trans. Geosci. Remote Sensing*, GRS-77, pp. 574-588, 1999.

Budillon, A., Cuozzo, G., D'Elia, C., Schirinzi, G. (2005). Application of overcomplete ICA to SAR image compression, *Proc. of IEEE Int. Geosci. and Remote Sens. Symp., IGARSS 2005*, Seoul, South Korea, July 2005.

Comon, P. (1994). Independent component analysis - a new concept?, *Signal Processing*, vol. 36, pp. 287-314, 1994.

Curlander, J.C., McDonough, R.N. (1991). *Synthetic Aperture Radar, Systems and Signal Processing*, New York: Wiley, 1991.

Dony, R.D., Haykin, S. (1997). Compression of SAR images using KLT, VQ and mixture of principal components, *IEE Proceedings on Radar, Sonar and Navigation*, vol. 144, pp. 113-120, 1997.

Eichel, P., Ives, R.W. (1999). Compression of Complex-Valued SAR Images, *IEEE Trans. on Image Proc.*, IP-8, pp. 1483-1487, 1999.

Goyal, V.K., Vetterli, M., Thao, N.T. (1998). Quantized overcomplete expansions in R<sup>N</sup>: Analysis, synthesis, and algorithms, *IEEE Trans. Inform. Theory*, 44, pp. 16-31, 1998.

Hyvarinen, A., Cristescu, R., Oja, E. (1999). A fast algorithm for estimating overcomplete ICA bases for image windows, *Proc. Int. Joint Conf. on Neural Networks*, pp. 894-899, Washington, D.C., 1999.

Jayant, N.J., Noll, P. (1984). *Digital Coding of Waveforms*, Englewood Cliffs, NJ: Prentice Hall Inc., 1984.

Kwok, R., Johnson, W.T.K. (1989). Block Adaptive Quantization of Magellan SAR Data, *IEEE Trans. Geosci. Remote Sensing*, GRS-27, pp. 375-383, 1989.

Mallat, S., Falzon, F. (1998). Analysis of Low Bit Rate Image Transform Coding, *IEEE Trans. Signal Processing*, SP-46, pp. 1027-1042, 1998.

Pascazio, V., Schirinzi, G. (2003). SAR Raw Data Compression by Sub-Band Coding, *IEEE Trans. Geosci. Rem. Sensing*, GRS-41, pp. 964-976, 2003.

Xingsong, H., Guizhong, L., Yiyang, Z. (2004). SAR image data compression using wavelet packet transform and universal-trellis coded quantization, *IEEE Trans. Geosci. Remote Sensing*, GRS-42, pp. 2632-2641, 2004.

Zhaohui Zeng; Cumming, I.G. (2001). SAR image data compression using a tree-structured wavelet transform, *IEEE Trans. Geosci. Remote Sensing*, GRS-39, pp. 546-552, 2001.



### **Polygonal Representation of Digital Curves**

Dilip K. Prasad<sup>1</sup> and Maylor K. H. Leung<sup>2</sup>
*<sup>1</sup>Nanyang Technological University, Singapore*
*<sup>2</sup>Universiti Tunku Abdul Rahman (Kampar), Malaysia*

#### **1. Introduction**


Approximating digital curves using polygonal approximations is required in many image processing applications [Kolesnikov & Fränti, 2003, 2005; Lavallee & Szeliski, 1995; Leung, 1990; Mokhtarian & Mackworth, 1986; Prasad*, et al.*, 2011; Prasad & Leung, 2010a, 2010b; Prasad & Leung, 2010; Prasad & Leung, 2012; Prasad*, et al.*, 2011a]. Such a representation is used for representing noisy digital curves in a more robust manner, for reducing the computational resources required for processing and storing them, and for computing various geometrical properties of digital curves. Specifically, it aids properties like curvature estimation, tangent estimation, detection of inflexion points, the perimeter of the curves, etc., which are very sensitive to digitization noise. Polygonal approximation is also useful for topological representation, segmentation and contour feature extraction in applications such as object detection, face detection, etc.

Most contemporary methods require some form of control parameter for selecting the most representative points (referred to as the dominant points) in the digital curve to be used as the vertices of the polygonal approximation [Arcelli & Ramella, 1993; Bhowmick & Bhattacharya, 2007; Carmona-Poyato*, et al.*, 2005; Carmona-Poyato*, et al.*, 2010; Carmona-Poyato*, et al.*, 2011; Chung*, et al.*, 2008; Chung*, et al.*, 1994; Davis, 1999; Debled-Rennesson*, et al.*, 2005; Douglas & Peucker, 1973; Gritzali & Papakonstantinou, 1983; Kanungo*, et al.*, 1995; Kolesnikov, 2008; Kolesnikov & Fränti, 2003, 2005, 2007; Latecki*, et al.*, 2009; Lavallee & Szeliski, 1995; Leung, 1990; Lowe, 1987; Marji & Siy, 2004; Mokhtarian & Mackworth, 1986; Pavlidis, 1976; Perez & Vidal, 1994; Phillips & Rosenfeld, 1987; Prasad & Leung, 2010c; Ramer, 1972; Ray & Ray, 1992; Rosin, 1997, 2002; Salotti, 2002; Sankar & Sharma, 1978; Sarkar, 1993; Sato, 1992; Sklansky & Gonzalez, 1980; Tomek, 1975; Wall & Danielsson, 1984; Wang*, et al.*, 2008]. The value of the control parameter in all known algorithms is chosen heuristically. In practice, choosing such a control parameter can be very challenging because a suitable value depends upon the nature of the digital curve: one value may not be suitable for all the curves in an image, and is definitely not suitable for all images in a dataset or an application. In section 2, we first propose a parameter independent method for polygonal approximation of digital curves.

Polygonal Representation of Digital Curves 73

the value of tol *d* [Prasad*, et al.*, 2011a]. First we show that if a continuous line segment is digitized then the maximum distance between that digital line segment and the continuous line segment is bounded and can be computed analytically. Then, this bound can be used to

We consider the effect of digitization on the slope of a line connecting two points (which may or may not be pixels) [Prasad*, et al.*, 2011a]. Due to digitization in the case of images, a

where round( ) *x* denotes the rounding of the value of real number *x* to its nearest integer.

*x x y y xy* round( ); round( ); , (4)

*xx xyy y x* ; ; 0.5 0.5, 0.5 0.5 *y* (5)

general point *Pxy* (,) is approximated by a pixel *Pxy* (,) as follows:

Fig. 1. Representation of the line *P P*1 2 and the digitized line *P P*1 2 .

*m m*

estimate of the error. This angular difference is given as:

*P*2 using (4). See Fig. 1 for the illustration. Then *m* and *m* are given as:

(digital line) be denoted as *m* , where *P*<sup>1</sup>

Let the slope of the line *P P*1 2 (actual line) be denoted as *m* and the slope of the line *P P*1 2

and *P*<sup>2</sup>

2 1 2 1 *y y <sup>m</sup> x x*

2 1 2 1 2 1 2 1 2 1 2 1 <sup>1</sup> *y y y y x x*

*x x xx xx* 

The angular difference between the numeric tangent and the digital tangent is used as the

are obtained by digitization of *P*1 and

(7)

(6)

choose the value of tol *d* adaptively.

*Pxy* (,) satisfy the following:

In addition to the heuristics, another issue is the problem of measuring the quality of the polygon fitted by an algorithm. It was shown in [Carmona-Poyato*, et al.*, 2011; Rosin, 1997] that most contemporary metrics to compare and benchmark such algorithms are ineffective for different types of digital curves. The reason for this is that the polygonal approximation has conflicting requirements in terms of the local and global quality of fit. In section 3, we show explicitly that these requirements are conflicting. Quality metrics for local and global characteristics are presented in section 3.5. The presented metrics can be used to measure the quality of not only one edge of the approximated polygons, but also for the complete polygon for a digital curve and for all curves in an image.

A few contemporary methods are discussed qualitatively in section 4 and numerical comparisons are provided in section 5. The conclusions are presented in section 6.

#### **2. Parameter independent polygonal approximation method**

The proposed method uses the framework of the method proposed by Lowe [Lowe, 1987] and Ramer-Douglas-Peucker [Douglas & Peucker, 1973; Ramer, 1972] (referred to as L-RDP method for convenience). The L-RDP method of fitting a series of line segment over a digital curve is described here. For a digital curve *e PP P* 1 2 *<sup>N</sup>* , where *Pi* is the *i* th edge pixel in the digital curve *e* . The line passing through a pair of pixels (,) *Px y aaa* and (,) *Px y bbb* is given by:

$$\left(\mathbf{x}\left(y\_a - y\_b\right) + y\left(\mathbf{x}\_b - \mathbf{x}\_a\right) + y\_b\mathbf{x}\_a - y\_a\mathbf{x}\_b = \mathbf{0}\right.\tag{1}$$

Then the deviation *<sup>i</sup> d* of a pixel (,) *Px y e iii* from the line passing through the pair *P P* <sup>1</sup> , *<sup>N</sup>* is given as:

$$d\_i = \left| \mathbf{x}\_i \left( y\_1 - y\_N \right) + y\_i \left( \mathbf{x}\_N - \mathbf{x}\_1 \right) + y\_N \mathbf{x}\_1 - y\_1 \mathbf{x}\_N \right|. \tag{2}$$

Accordingly, the pixel with maximum deviation can be found. Let it be denoted as *P*max . Then considering the pairs *P P* 1 max , and *P P* max , *<sup>N</sup>* , we find two new pixels from *e* using the concept in the equations (1) and (2). It is evident that the maximum deviation goes on decreasing as we choose newer pixels of maximum deviation between a pair. This process can be repeated till a certain condition (depending upon the method) is satisfied by all the line segments. This condition shall be referred to as the optimization goal for the ease of reference.

The condition used by L-RDP [Douglas & Peucker, 1973; Lowe, 1987; Ramer, 1972] is that for each line segment, the maximum deviation of the pixels contained in its corresponding edge segment is less than a certain tolerance value:

$$\max(d\_i) < d\_{\text{tol}}\,. \tag{3}$$

where $d_{\text{tol}}$ is the chosen threshold.
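The recursive splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function names `deviation` and `rdp` are hypothetical, and the deviation follows eq. (2) as printed, i.e., without normalization by the segment length, so `d_tol` is in the same unnormalized units.

```python
def deviation(p, a, b):
    """Deviation of pixel p from the line through pixels a and b, eq. (2)-style.

    Note: as in eq. (2), this is not normalized by the segment length."""
    (xa, ya), (xb, yb) = a, b
    x, y = p
    return abs(x * (ya - yb) + y * (xb - xa) + yb * xa - ya * xb)

def rdp(points, d_tol):
    """L-RDP-style recursive splitting at the pixel of maximum deviation."""
    if len(points) < 3:
        return list(points)
    dmax, imax = max(
        (deviation(points[i], points[0], points[-1]), i)
        for i in range(1, len(points) - 1)
    )
    if dmax < d_tol:                      # optimization goal, eq. (3)
        return [points[0], points[-1]]
    left = rdp(points[:imax + 1], d_tol)  # split at P_max and recurse
    right = rdp(points[imax:], d_tol)
    return left[:-1] + right              # P_max is shared; drop the duplicate
```

Dividing the returned deviation by the segment length would give the usual perpendicular distance instead.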

In general, the value of $d_{\text{tol}}$ is chosen heuristically to be a few pixels, and $d_{\text{tol}}$ functions as the control parameter. Now, we present a method to choose the value of $d_{\text{tol}}$ automatically using the characteristics of the line, such that the user does not need to specify it [Prasad*, et al.*, 2011a].

In addition to the heuristics, another issue is the problem of measuring the quality of the polygon fitted by an algorithm. It was shown in [Carmona-Poyato*, et al.*, 2011; Rosin, 1997] that most contemporary metrics used to compare and benchmark such algorithms are ineffective for different types of digital curves. The reason for this is that polygonal approximation has conflicting requirements in terms of the local and global quality of fit. In section 3, we show explicitly that these requirements are conflicting. Quality metrics for local and global characteristics are presented in section 3.5. The presented metrics can be used to measure the quality of not only one edge of the approximated polygon, but also the complete polygon for a digital curve, and all the curves in an image.

First, we show that if a continuous line segment is digitized, then the maximum distance between the digital line segment and the continuous line segment is bounded and can be computed analytically. Then, this bound can be used to choose the value of $d_{\text{tol}}$ adaptively.

We consider the effect of digitization on the slope of a line connecting two points (which may or may not be pixels) [Prasad*, et al.*, 2011a]. Due to digitization in the case of images, a general point $P(x, y)$ is approximated by a pixel $P'(x', y')$ as follows:

$$\mathbf{x}' = \mathbf{round(x)}; \quad \mathbf{y}' = \mathbf{round(y)}; \quad \Rightarrow \mathbf{x}', \mathbf{y}' \in \mathbb{Z} \tag{4}$$

where $\text{round}(x)$ denotes the rounding of the value of the real number $x$ to its nearest integer. The coordinates of $P'(x', y')$ satisfy the following:

$$\mathbf{x}' = \mathbf{x} + \Delta \mathbf{x}; \quad y' = y + \Delta y; \quad -0.5 \le \Delta \mathbf{x} \le 0.5, \quad -0.5 \le \Delta y \le 0.5 \tag{5}$$

Fig. 1. Representation of the line $P_1 P_2$ and the digitized line $P_1' P_2'$.

Let the slope of the line $P_1 P_2$ (actual line) be denoted as $m$ and the slope of the line $P_1' P_2'$ (digital line) be denoted as $m'$, where $P_1'$ and $P_2'$ are obtained by digitization of $P_1$ and $P_2$ using (4). See Fig. 1 for the illustration. Then $m$ and $m'$ are given as:

$$m = \frac{y\_2 - y\_1}{x\_2 - x\_1} \tag{6}$$

$$m' = \frac{y\_2' - y\_1'}{x\_2' - x\_1'} = \left( m + \frac{\Delta y\_2 - \Delta y\_1}{x\_2 - x\_1} \right) \Bigg/ \left( 1 + \frac{\Delta x\_2 - \Delta x\_1}{x\_2 - x\_1} \right) \tag{7}$$
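As a quick numerical check of (4)-(7), the effect of digitization on the slope can be observed directly. The endpoints below are hypothetical and chosen only for illustration.

```python
import math

def digitize(p):
    # Eq. (4): round each coordinate to the nearest integer pixel position.
    return (round(p[0]), round(p[1]))

# Hypothetical (non-pixel) endpoints of a continuous line segment
p1, p2 = (0.3, 0.4), (7.6, 5.1)
q1, q2 = digitize(p1), digitize(p2)

m = (p2[1] - p1[1]) / (p2[0] - p1[0])        # slope of the actual line, eq. (6)
m_dash = (q2[1] - q1[1]) / (q2[0] - q1[0])   # slope of the digital line, eq. (7)

# Angular difference between the two tangents (first line of eq. (8))
dphi = abs(math.atan(m) - math.atan(m_dash))
```

For this pair of endpoints the angular error is a small fraction of a radian, which is exactly what the bound derived next quantifies.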

The angular difference between the numeric tangent and the digital tangent is used as the estimate of the error. This angular difference is given as:

Polygonal Representation of Digital Curves 75


$$\begin{aligned} \partial\phi &= \left| \tan^{-1}(m) - \tan^{-1}(m') \right| = \left| \tan^{-1}\left( \frac{m - m'}{1 + mm'} \right) \right| \\ &= \left| \tan^{-1}\left( \frac{m\left(\Delta x_2 - \Delta x_1\right) - \left(\Delta y_2 - \Delta y_1\right)}{\left(1 + m^2\right)\left(x_2 - x_1\right) + \left(\Delta x_2 - \Delta x_1\right) + m\left(\Delta y_2 - \Delta y_1\right)} \right) \right| \\ &= \left| \tan^{-1}\left( \left( \frac{x_2 - x_1}{s^2} \right)\left(1 + t\right)^{-1}\left( m\left(\Delta x_2 - \Delta x_1\right) - \left(\Delta y_2 - \Delta y_1\right) \right) \right) \right| \end{aligned} \tag{8}$$

where $s^2 = \left(x_2 - x_1\right)^2 + \left(y_2 - y_1\right)^2$ and $t = \frac{\left(x_2 - x_1\right)\left(\Delta x_2 - \Delta x_1\right)}{s^2} + \frac{\left(y_2 - y_1\right)\left(\Delta y_2 - \Delta y_1\right)}{s^2}$. Due to (5), the maximum value of $\left|\Delta x_2 - \Delta x_1\right|$ and $\left|\Delta y_2 - \Delta y_1\right|$ is 1. Further, $\left(x_2 - x_1\right)/s$ and $\left(y_2 - y_1\right)/s$ are both less than 1. Thus, $\left|t\right| < 1$ if $s > 2$, which is true for any line made of more than 2 pixels (i.e., 3 pixels or more). Thus, the infinite geometric series expansion can be used in (8), and $\partial\phi$ can be approximated as:

$$\begin{aligned} \partial \phi &\approx \left| \tan^{-1} \left( \left( \frac{\mathbf{x}\_2 - \mathbf{x}\_1}{s^2} \right) \left( m \left( \Delta \mathbf{x}\_2 - \Delta \mathbf{x}\_1 \right) - \left( \Delta \mathbf{y}\_2 - \Delta \mathbf{y}\_1 \right) \right) \left( 1 - t + t^2 \right) \right) \right| \\ &\approx \left| \left( \frac{\mathbf{x}\_2 - \mathbf{x}\_1}{s^2} \right) \left( m \left( \Delta \mathbf{x}\_2 - \Delta \mathbf{x}\_1 \right) - \left( \Delta \mathbf{y}\_2 - \Delta \mathbf{y}\_1 \right) \right) \left( 1 - t + t^2 \right) \right| \end{aligned} \tag{9}$$

Further, we note that $\partial\phi$ has a maximum value when $\left|\Delta x_2 - \Delta x_1\right| = \left|\Delta y_2 - \Delta y_1\right| = 1$:

$$\partial\phi_{\max} = \max\left( \frac{1}{s^3}\left( \left| \sin\phi \pm \cos\phi \right| \right)\left( \left| s^2 - s\left(\pm\cos\phi \pm \sin\phi\right) + \left(\pm\cos\phi \pm \sin\phi\right)^2 \right| \right) \right) \tag{10}$$

where $\phi = \tan^{-1}\left(m\right)$. Then, the maximum deviation is given by:

$$d_{\max} = s\,\partial\phi_{\max}. \tag{11}$$

Based on the above analysis, in L-RDP, the suggested value of $d_{\text{tol}}$ at every iteration is $d_{\max} = s\,\partial\phi_{\max}$. At each step in the recursion, if the length of the line segment currently fit on the curve (or sub-curve) is $s$ and the slope of the line segment is $m$, then using (10) we compute $d_{\text{tol}} = s\,\partial\phi_{\max}$ and use it in (3).
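The bound (10)-(11) can be evaluated by enumerating the four sign combinations. The sketch below assumes, as in the derivation, that $\Delta x_2 - \Delta x_1$ and $\Delta y_2 - \Delta y_1$ independently take the extreme values $\pm 1$; the function names are hypothetical.

```python
import math
from itertools import product

def dphi_max(s, m):
    """Upper bound on the angular digitization error, eq. (10)."""
    phi = math.atan(m)
    best = 0.0
    for s1, s2 in product((-1.0, 1.0), repeat=2):   # signs of the two unit offsets
        a = abs(s1 * math.sin(phi) - s2 * math.cos(phi))
        u = s1 * math.cos(phi) + s2 * math.sin(phi)
        b = abs(s * s - s * u + u * u)
        best = max(best, a * b / s ** 3)
    return best

def d_tol(s, m):
    """Adaptive tolerance d_max = s * dphi_max, eq. (11)."""
    return s * dphi_max(s, m)
```

For a 45° segment of length 10, for instance, the adaptive tolerance comes out close to $\sqrt{2}$ pixels, i.e., on the order of the digitization noise itself.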

#### **3. Global vs. local characteristics of line fit**

It is expected that while fitting a polygon on a digital curve, which is effectively fitting a series of line segments on the digital curve, either we have to take very small local area in order to achieve high precision or we have to take a larger area in order to have a reliable and practically usable fit. This was formally stated and explained by Strauss in the context of Hough transform [Strauss, 1996; Strauss, 1999], "This duality could be set out as follows: as the shape detection precision increases, the reliability of the detection decreases. This seems to be due to the binary aspect of the vote in the classical Hough transform."


While Strauss is right in pointing out the duality between the precision (quality of local fit) and reliability (quality of global fit), he is incorrect in attributing it to the nature of Hough transform. It can be shown using simple metrics, precision and reliability measures, that there is a perennial conflict in the quality of fit in the local scale (precision at the level of few pixels) and global scale (reliability at the level of complete curve) [Prasad & Leung, 2010c]. It is due to this reason, most absolute measures fail in quantifying the quality of fit properly [Carmona-Poyato*, et al.*, 2011; Rosin, 1997].

Assuming that we are not bound by the limitation that the points used for representing the digital curve should be a subset of the digital curve, we use the least squares method to get the best line(s) fit for a digital curve and show that the precision and reliability measures are at conflict with each other. Suppose, for a digital curve with the sequence of pixels $S = \left\{ P_i\left(x_i, y_i\right) \right\}$, $i = 1$ to $N$, we intend to fit a line $ax + by = 1$. Then, the coefficients of the line, $a$ and $b$, can be determined by casting the problem of fitting into the following matrix equation [Acton, 1984]:

$$\mathbf{X}\mathbf{\bar{A}}=\overline{\mathbf{j}}\,.\tag{12}$$

where $\mathbf{X} = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \\ y_1 & y_2 & \cdots & y_N \end{bmatrix}^{\mathrm{T}}$, $\bar{\mathbf{A}} = \begin{bmatrix} a & b \end{bmatrix}^{\mathrm{T}}$, the superscript T denotes the transpose operation, and $\bar{\mathbf{J}}$ is a column matrix containing $N$ rows, whose every element is 1.

#### **3.1 Precision**

The precision of fitting can be modeled using the residue of the least squares method:

$$\varepsilon_p = \left\| \mathbf{X}\bar{\mathbf{A}} - \bar{\mathbf{J}} \right\| = \left\| \mathbf{B}\bar{\mathbf{J}} - \bar{\mathbf{J}} \right\|, \tag{13}$$

where $\left\| \cdot \right\|$ represents the Euclidean norm and $\mathbf{B} = \mathbf{X}\left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm{T}}$ is obtained by substituting $\bar{\mathbf{A}}$ obtained using (12). The subscript $p$ in $\varepsilon_p$ represents precision, and we shall refer to $\varepsilon_p$ as the precision parameter for ease of reference. The lower the value of $\varepsilon_p$, the greater the precision. Noting that $\mathbf{B}^{\mathrm{T}} = \mathbf{B}$ and $\mathbf{B}^{\mathrm{T}}\mathbf{B} = \mathbf{B}$, (13) can be simplified as $\varepsilon_p^2 = \bar{\mathbf{J}}^{\mathrm{T}}\left(\bar{\mathbf{J}} - \mathbf{B}\bar{\mathbf{J}}\right) = \left\| \bar{\mathbf{J}} \right\| \left\| \bar{\mathbf{J}} - \mathbf{B}\bar{\mathbf{J}} \right\| \cos\theta = \sqrt{N}\,\varepsilon_p \cos\theta$, where $\left\| \bar{\mathbf{J}} \right\| = \sqrt{N}$ (since $\bar{\mathbf{J}}$ contains $N$ elements, each equal to 1) and $\theta$ is the angle between $\bar{\mathbf{J}}$ and $\bar{\mathbf{J}} - \mathbf{B}\bar{\mathbf{J}}$. Since $\varepsilon_p = \left\| \bar{\mathbf{J}} - \mathbf{B}\bar{\mathbf{J}} \right\|$, $\varepsilon_p$ can be written as:

$$
\varepsilon\_p = \sqrt{N} \cos \theta \,\,\,\tag{14}
$$

It is evident that by choosing a smaller number of pixels, i.e., reducing $N$, $\varepsilon_p$ can be reduced and hence the precision can be increased. It should be noted that with the decrease in the number of pixels, $\mathbf{X}$ and consequently $\mathbf{B}$ change, and thus the contribution from $\cos\theta$ may vary. However, if contiguous pixels are considered, the overall variance in $\mathbf{X}$ is reduced, and hence the impact of $\cos\theta$ is also reduced. In effect, this means that fitting the line in a smaller local region is more precise than fitting it in a large region.
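The precision measure (12)-(13) can be computed with an ordinary least-squares solve of the 2 × 2 normal equations. This is a self-contained sketch, not the authors' code; the pixel sets at the end are hypothetical examples.

```python
import math

def fit_line(pixels):
    """Least-squares solution of eq. (12) for the line ax + by = 1."""
    sxx = sum(x * x for x, _ in pixels)
    syy = sum(y * y for _, y in pixels)
    sxy = sum(x * y for x, y in pixels)
    sx = sum(x for x, _ in pixels)
    sy = sum(y for _, y in pixels)
    det = sxx * syy - sxy * sxy          # assumes the pixels are not all collinear with the origin
    return ((syy * sx - sxy * sy) / det,  # a
            (sxx * sy - sxy * sx) / det)  # b

def precision(pixels):
    """Residue of the least-squares fit, eq. (13)."""
    a, b = fit_line(pixels)
    return math.sqrt(sum((a * x + b * y - 1.0) ** 2 for x, y in pixels))

# Hypothetical pixel sets: a straight run, and the same run extended around a corner
flat = [(1, 5), (2, 5), (3, 5), (4, 5), (5, 5)]
corner = flat + [(5, 4), (5, 3), (5, 2), (5, 1)]
```

The residue for the small straight window is essentially zero, while extending the window around the corner makes it grow — the behaviour the paragraph above describes.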

| Image | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) |
|---|---|---|---|---|---|---|---|---|
| $\varepsilon_p$ | 0.000 | 0.000 | 0.000 | 0.000 | 0.024 | 0.035 | 0.013 | 0.071 |
| $\varepsilon_r$ | 0.000 | 0.000 | 0.000 | 0.000 | 0.022 | 0.031 | 0.008 | 0.066 |

| Image | (i) | (j) | (k) | (l) | (m) | (n) | (o) | (p) |
|---|---|---|---|---|---|---|---|---|
| $\varepsilon_p$ | 0.047 | 0.015 | 0.072 | 0.121 | 0.150 | 0.111 | 0.121 | 0.233 |
| $\varepsilon_r$ | 0.048 | 0.007 | 0.057 | 0.114 | 0.135 | 0.107 | 0.115 | 0.248 |

Table 1. Values of the parameters $\varepsilon_p$ and $\varepsilon_r$ corresponding to Fig. 3. Values are shown to 3 decimal points.

#### **3.2 Reliability**

In this sub-section, we first present a quantitative measure of reliability that can be understood and compared with respect to the precision measure. Generally, for the reliability of a fit, the fit is expected to satisfy at least two conditions. First, the fit should be valid for a sufficiently large region (or in this case a long edge) and second, it should not be sensitive to occasional spurious large deviations in the edge. A combination of both these properties can be sought by defining a reliability parameter as follows:

$$\mathcal{E}\_r = \sum\_i \left| \mathbf{X}\_i \overline{\mathbf{A}} - \mathbf{1} \right| \Big/ s\_{\text{max}} \tag{15}$$

where $\mathbf{X}_i = \begin{bmatrix} x_i & y_i \end{bmatrix}$, $\left| \cdot \right|$ represents the magnitude, and $s_{\max}$ is the maximum Euclidean distance between any pair of pixels. The subscript $r$ in $\varepsilon_r$ denotes reliability. As before, the lower the value of $\varepsilon_r$, the higher the reliability.
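Eq. (15) translates directly into code. In this sketch the line coefficients `A` are assumed to come from a fit such as the one described in the text; the function name is hypothetical.

```python
import math

def reliability(pixels, A):
    """Reliability parameter of eq. (15): summed deviations over the span s_max."""
    a, b = A
    total = sum(abs(a * x + b * y - 1.0) for x, y in pixels)
    # maximum Euclidean distance between any pair of pixels
    s_max = max(math.dist(p, q) for p in pixels for q in pixels)
    return total / s_max
```

Because the summed deviation is divided by the span $s_{\max}$, a fit over a longer stretch of the curve is not penalized merely for being long, only for deviating.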

#### **3.3 Duality**

As evident from (14) and (15), and the discussion between them, there is always a contradiction between precision and reliability. In order to increase the precision, we need to consider smaller regions for fitting, whereas to increase the reliability, we need to consider larger regions for fitting (the largest region being the region spanned by the connected edge pixels under consideration). Indeed, the contradiction does not occur for ideal lines, as shown in Fig. 2(a)-(d). It is also not an issue if the lines are in general smooth, so that the precision within a large region is already very high, such that reliability and precision are both sufficiently high and there is no practical need to increase either. Some such examples are presented in Fig. 2(e)-(g). This is illustrated in Fig. 3 and Table 1. However, if an application calls for still higher precision, the reliability will have to be compromised and the duality comes into the picture. Examples of more practical cases are shown in Fig. 2(h)-(p). In such cases, the duality appears strongly and a balance has to be achieved in order to obtain a fit that is sufficiently reliable as well as precise.

Fig. 2. Example of small images. Each image is of size 20 × 20 pixels. The grey pixels are the edge pixels, on which a line has to be fit.


Fig. 3. Lines that have been fit on the images using the least squares approach. The lines are shown with red asterisks.



#### **3.4 Performance measures in the context of precision and reliability**

We use various performance measures for comparing various algorithms.

1. Maximum deviation of any pixel on the edge from the fitted polygon ($d_{\max}$):

In the context of a general line given by (12), the maximum deviation $d_{\max}$ is specified by $d_{\max} = \left\| \mathbf{X}\bar{\mathbf{A}} - \bar{\mathbf{J}} \right\|_{\infty} / \left\| \bar{\mathbf{A}} \right\|$, where $\left\| \cdot \right\|_{\infty}$ denotes the infinity norm (i.e., the maximum norm). Since $\left\| \cdot \right\|_{\infty} \le \left\| \cdot \right\|$, we have $d_{\max} \le \varepsilon_p / \left\| \bar{\mathbf{A}} \right\|$. Thus, it can be concluded that $d_{\max}$ is a form of precision measure.

2. Integral square error (ISE):

This is the sum of squares of the deviation of each pixel from the approximated polygon. It is given by $\mathrm{ISE} = \varepsilon_p^2 / \left\| \bar{\mathbf{A}} \right\|^2 = \varepsilon_p^2 / \left( \bar{\mathbf{A}}^{\mathrm{T}} \bar{\mathbf{A}} \right)$. Thus, effectively, ISE is also a precision measure.

3. Dimensionality reduction (DR) ratio or compression ratio (CR):

The compression ratio is the ratio of the number of pixels in the digital curve ($N$) to the number of vertices of the polygonal approximation ($M$): $CR = N/M$. Though this measure is not related to either precision or reliability, it is an important performance metric in practice. In addition to other metrics representing precision and/or reliability, a larger value of this measure is beneficial for the reduction of data and computational resources. Instead of the compression ratio, its reciprocal, the dimensionality reduction ratio $DR = CR^{-1} = M/N$, can be used as a minimization metric (i.e., the lesser, the better).

#### 4. Figure of merit (FOM)

Figure of merit is given by $\mathrm{FOM} = CR/\mathrm{ISE}$. This is a maximization metric, i.e., a larger value of FOM is preferred over a lower value. However, it is well known that FOM is biased towards ISE [Carmona-Poyato*, et al.*, 2010]. For example, if the break points of a digital curve [Masood, 2008] are considered as the dominant points, the ISE is zero and, irrespective of the CR, FOM is infinite. If we intend to use a minimization metric, we may consider $WE_1 = 1/\mathrm{FOM}$ [Marji & Siy, 2004]. It suffers from the same deficiency as FOM.

#### 5. Fidelity, Efficiency and Merit

Researchers have tried relative measures like fidelity, efficiency, and merit to quantify the quality of fit [Carmona-Poyato*, et al.*, 2011; Rosin, 1997]. In relative measures, a so-called optimal algorithm is considered as the reference for comparing the performance of the algorithm being tested. The method proposed by Perez and Vidal [Perez & Vidal, 1994] based on dynamic programming is generally used by researchers as the reference algorithm. This is because it targets the min-$\varepsilon$ and min-$\#$ problems, i.e., the fitting error is minimized for a given number of points (min-$\varepsilon$), or the number of points used for fitting is minimized for a given value of the fitting error (min-$\#$). It is logical that there is no way of determining an optimal value for the fixed number of points (min-$\varepsilon$) or the fixed value of fitting error (min-$\#$), because such a value depends upon the nature of the digital curve for which the polygonal approximation is sought.
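The precision-flavoured measures listed above can be collected in one routine. This is a sketch for a single fitted line $ax + by = 1$ (the chapter applies them to whole polygons); the function name and the example data are hypothetical.

```python
import math

def metrics(pixels, A, M):
    """Performance measures of section 3.4 for one fitted line ax + by = 1."""
    a, b = A
    norm_A = math.hypot(a, b)
    # geometric deviation of each pixel from the line
    d = [abs(a * x + b * y - 1.0) / norm_A for x, y in pixels]
    N = len(pixels)
    ise = sum(v * v for v in d)
    return {
        "d_max": max(d),                            # a precision measure
        "ISE": ise,                                 # also a precision measure
        "CR": N / M,                                # compression ratio N/M
        "FOM": (N / M) / ise if ise else math.inf,  # CR/ISE; infinite when ISE = 0
    }
```

The `FOM` entry makes the bias discussed above explicit: as soon as the deviations vanish, it blows up regardless of how poor the compression ratio is.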

#### **3.5 Proposed performance measures**

As seen in section 3.4, none of the existing measures caters for the global nature of the fit. Thus, the reliability measure is a very important addition to the performance metrics of polygonal approximation methods. For the line segments (edges of the polygon), the following precision and reliability measures are computed:

$$\varepsilon_p' = \frac{\left\| \mathbf{X}\bar{\mathbf{A}} - \bar{\mathbf{J}} \right\|}{\left\| \bar{\mathbf{A}} \right\|} \tag{16}$$

$$\mathcal{E}\_r = \sum\_i \left| \mathbf{X}\_i \overline{\mathbf{A}} - \mathbf{1} \right| \Big/ \mathbf{s}\_{\text{max}} \tag{17}$$

where $\varepsilon_p'$ and $\varepsilon_r$ are the precision and reliability measures, $\mathbf{X}_i = \begin{bmatrix} x_i & y_i \end{bmatrix}$, and $s_{\max}$ is the maximum Euclidean distance between any pair of pixels [Prasad & Leung, 2010c]. The notation $\left\| \cdot \right\|$ represents the magnitude in the scalar case and the Euclidean norm in the case of vectors.

1. Precision measure for an edge

Suppose *J* line segments are fitted upon a digital curve. Then we define the net precision measure for the digital curve as follows:

practice. In addition to other metrics representing precision and/or reliability, a larger value of this is beneficial for reduction of data and computational resources. Instead of compression ratio, its reciprocal dimensionality reduction ratio <sup>1</sup> *DR CR M N* can be

Figure of merit is given by FOM=CR ISE . This is a maximization metric, i.e., larger value of FOM is preferred over a lower value. However, it is well known that FOM is biased towards ISE [Carmona-Poyato*, et al.*, 2010]. For example, if the break points of a digital curve [Masood, 2008] are considered as the dominant points, the ISE is zero and inconsequent of the CR, FOM is infinity. If we intend to use a minimization metric, we may consider

Researchers tried relative measures like fidelity, efficiency, merit to quantify the quality of fit [Carmona-Poyato*, et al.*, 2011; Rosin, 1997]. In relative measures, a so-called optimal algorithm is considered as the reference for comparing the performance of the algorithm being tested. The method proposed by Perez and Vidal [Perez & Vidal, 1994] based on dynamic programming is generally used by the researchers as the reference algorithm. This

of fitting error ( min # ). It is logical that there is no way of determining an optimal value

such a value depends upon the nature of the digital curve for which polygonal

As seen in section 3.4, none of the existing methods cater for the global nature of the fit. Thus, the reliability measure is very important addition to the performance metrics of the polygonal approximation method. For the line segments (edges of the polygon), the

**J J XA**

max 1 *r i*

maximum Euclidean distance between any two pair of pixels [Prasad & Leung, 2010c]. Notation represents the magnitude in the scalar case and the Euclidean norm in the case

Suppose *J* line segments are fitted upon a digital curve. Then we define the net precision

are the precision and reliability measures, **X***i ii x y* , and max *s* is the

and min # such that the fitting error is minimized for a certain

) or the fixed value of fitting error ( min # ), because

**A** (16)

**X A** *<sup>s</sup>* (17)

) or the number of points for fitting is minimized for a given value

WE =1/FOM [Marji & Siy, 2004]. It suffers 1 with the same deficiency as FOM.

*p* 

*i*

used as a minimization metric (i.e. the lesser, the better).

precision and reliability measures are computed:

4. Figure of merit (FOM)

5. Fidelity, Efficiency and Merit

for the fixed number of points ( min

**3.5 Proposed performance measures** 

1. Precision measure for an edge

measure for the digital curve as follows:

is because it targets min

approximation is sought.

where *<sup>p</sup>* and *<sup>r</sup>* 

of vectors.

number of points ( min

$$\left(\varepsilon_p'\right)_{\text{Curve}} = \text{mean}\left(\varepsilon_p'^{\,j};\ j = 1 \text{ to } J\right), \tag{18}$$

where $\varepsilon_p'^{\,j}$ is the precision measure of the $j$th line segment, defined using (16).

#### 2. Reliability measure for an edge

The net reliability measure of the digital curve is defined as follows:

$$\left(\left(\boldsymbol{\varepsilon}\_{r}\right)\_{\text{Curve}} = \frac{\sum\_{j=1}^{l} \sum\_{i} \left| \mathbf{X}\_{i}^{j} \overline{\mathbf{A}}^{j} - \mathbf{1} \right|}{\sum\_{j=1}^{l} s\_{\text{max}}^{j}},\tag{19}$$

where *<sup>j</sup>* **<sup>X</sup>***<sup>i</sup>* , *<sup>j</sup>* **<sup>A</sup>** , and max *<sup>j</sup> <sup>s</sup>* correspond to **X***<sup>i</sup>* , **<sup>A</sup>** , and max *s* defined after (17) for the *j* th line segment.

#### 3. Precision measure for a dataset of images

Suppose a dataset contains *L* number of images, the number of edges in the *l* th image is *Kl* , then, the precision measure for the dataset is:

$$\begin{aligned} \left(\boldsymbol{\varepsilon}\_{p}^{\prime}\right)\_{\text{Dataset}} &= \text{mean}\left(\left(\boldsymbol{\varepsilon}\_{p}^{\prime}\right)\_{\text{Image}}^{l}; l = 1 \text{ to } L\right) \\ \left(\boldsymbol{\varepsilon}\_{p}^{\prime}\right)\_{\text{Image}} &= \max\left(\left(\boldsymbol{\varepsilon}\_{p}^{\prime}\right)\_{\text{Curve}}^{k}; k = 1 \text{ to } K\right)^{\prime} \end{aligned} \tag{20}$$

4. Reliability measure for a dataset of images

In a manner similar to (20), the reliability measure for a dataset is:

$$\begin{aligned} \left(\boldsymbol{\varepsilon}\_{r}\right)\_{\text{Dataset}} &= \text{mean}\left(\left(\boldsymbol{\varepsilon}\_{r}\right)\_{\text{Image}}^{l}; l = 1 \text{ to } L\right) \\ \left(\boldsymbol{\varepsilon}\_{r}\right)\_{\text{Image}} &= \max\left(\left(\boldsymbol{\varepsilon}\_{r}\right)\_{\text{Curve}}^{k}; k = 1 \text{ to } K\right)' \end{aligned} \tag{21}$$
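To make the definitions concrete, the measures (16)-(19) can be sketched in NumPy as follows. The parametrisation $ax + by = 1$ for the fitted line (so that $\mathbf{X}_i\mathbf{A} = 1$ exactly on the line, failing only for lines through the origin) and all function names are our own illustrative choices, not the chapter's reference implementation:

```python
import numpy as np

def line_params(p1, p2):
    """Solve for A = [a, b] with a*x + b*y = 1 through the two endpoints
    (assumes the supporting line does not pass through the origin)."""
    return np.linalg.solve(np.array([p1, p2], dtype=float), np.ones(2))

def edge_measures(pixels, A):
    """Precision (16) and reliability (17) for one fitted edge.
    pixels: (n, 2) array of curve points assigned to this edge."""
    X = np.asarray(pixels, dtype=float)
    dev = np.abs(X @ A - 1.0)                   # |X_i A - 1| for every pixel
    diff = X[:, None, :] - X[None, :, :]        # s_max: max pairwise distance
    s_max = np.sqrt((diff ** 2).sum(-1)).max()
    return dev.max(), dev.sum() / s_max         # eq. (16), eq. (17)

def curve_measures(edges):
    """Net precision (18) and reliability (19) for a digital curve.
    edges: list of (pixels, A) pairs, one per fitted line segment."""
    eps_p_list, num, den = [], 0.0, 0.0
    for pixels, A in edges:
        X = np.asarray(pixels, dtype=float)
        dev = np.abs(X @ A - 1.0)
        diff = X[:, None, :] - X[None, :, :]
        s_max = np.sqrt((diff ** 2).sum(-1)).max()
        eps_p_list.append(dev.max())
        num += dev.sum()                        # numerator of (19)
        den += s_max                            # denominator of (19)
    return float(np.mean(eps_p_list)), num / den
```

Note that $|\mathbf{X}_i\mathbf{A} - 1|$ is proportional, not equal, to the Euclidean point-line distance; the measures use it directly, as in the text.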

#### **4. Contemporary polygonal approximation methods in the perspective of duality and the upper bound**

#### **4.1 Optimal polygonal representation of Perez and Vidal [Perez & Vidal, 1994]**

The algorithm proposed by Perez and Vidal (PV) [Perez & Vidal, 1994] is by far the most popular algorithm used as a benchmark for comparing the performance of polygonal fitting algorithms. The reason for its popularity is twofold. For a given number of points $N' \le N$, where $N$ is the number of pixels in the digital curve, it computes the optimal choice of $N'$ points from the digital curve such that some error metric is minimized. Since the error metric can be flexibly defined by a user, it is versatile in its use. Further, for the purpose of benchmarking, the designers of other algorithms can first perform the polygonal fitting using their own algorithms, obtain the value of $N'$ produced by their own algorithms, use this value of $N'$ in the algorithm by PV, and simply compare the points obtained by their method against the optimal points obtained by PV.
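The min-$\varepsilon$ dynamic programme behind this kind of benchmark can be sketched as follows, using the summed squared perpendicular deviation as the (user-selectable) error metric. The open-curve restriction, the cubic-time cost table, and the function names are our own simplifications, not PV's exact published algorithm:

```python
import numpy as np

def chord_cost(pts, i, j):
    """Sum of squared perpendicular distances of pts[i..j] from chord pts[i]-pts[j]."""
    p, q = pts[i], pts[j]
    d = q - p
    length = np.hypot(*d)
    if length == 0:
        return 0.0
    seg = pts[i:j + 1] - p
    cross = seg[:, 0] * d[1] - seg[:, 1] * d[0]   # 2D cross product
    return float(((cross / length) ** 2).sum())

def pv_optimal(pts, m):
    """Optimal open-curve polygon with m vertices (m - 1 segments),
    minimising the summed squared deviation via dynamic programming."""
    pts = np.asarray(pts, dtype=float)
    n = len(pts)
    INF = float("inf")
    # Precompute all chord costs (O(n^3) here; fine for a sketch).
    cost = [[chord_cost(pts, i, j) for j in range(n)] for i in range(n)]
    # E[j][k]: best error of a path ending at pixel j using k segments.
    E = [[INF] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    E[0][0] = 0.0
    for j in range(1, n):
        for k in range(1, m):
            for i in range(j):
                cand = E[i][k - 1] + cost[i][j]
                if cand < E[j][k]:
                    E[j][k] = cand
                    back[j][k] = i
    # Recover the optimal vertex indices by backtracking.
    idx, j, k = [n - 1], n - 1, m - 1
    while k > 0:
        j = back[j][k]
        idx.append(j)
        k -= 1
    return idx[::-1], E[n - 1][m - 1]
```

Swapping `chord_cost` for another per-segment error reproduces the metric flexibility described above.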

Polygonal Representation of Digital Curves 81


Since PV can use any error metric to be minimized, it is interesting to note that we can use either the precision score or the reliability score as the error to be minimized. If the precision score is used as the error function, PV attempts to fit segments such that all the line segments are of approximately the same length. If the reliability score is used as the error function, PV attempts to fit segments that are a combination of two types: the first type are small segments with a small value of $d_{\max}$ but a very small (close to zero) value of $\sum_i \left| \mathbf{X}_i \mathbf{A} - 1 \right|$; the second type are long segments with comparatively larger values of $\sum_i \left| \mathbf{X}_i \mathbf{A} - 1 \right|$ but a significantly larger value of $d_{\max}$, such that the reliability score is also small valued.

On the other hand, PV does not guarantee that the maximum deviation of the pixels in the curve is within the upper limit of the error due to digitization. If the value of $N'$ is very large, it is likely that PV will fit the segments such that the maximum deviation is less than the upper bound. This means that the polygonal approximation will over-fit and be sensitive to the error due to digitization. On the other hand, if the value of $N'$ is small, the maximum deviation of the fitted segments is larger than the upper bound, thus indicating under-fitting. In essence, this means that using a fixed value of $N'$ or solving the min-# problem is not suitable for optimal polygonal approximation of digital curves.

#### **4.2 Lowe [Lowe, 1987] and Ramer-Douglas-Peucker [Douglas & Peucker, 1973; Ramer, 1972] (L-RDP method)**

The method of Lowe [Lowe, 1987] and Ramer-Douglas-Peucker [Douglas & Peucker, 1973; Ramer, 1972] is basically a splitting method in which the point of maximum deviation is found recursively till the maximum deviation of any edge pixel from the nearest line segment is less than a fixed value. Since this is a splitting algorithm, it begins with a very high value of $d_{\max}$, which reduces as the edge is split further. The algorithm stops at the point where the maximum deviation satisfies a minimum criterion. Thus, this algorithm focuses more on reliability and attempts to barely satisfy a precision requirement.

In the sense of the upper bound, this algorithm gives a mixed performance. For a few segments, the chosen threshold may be below the upper bound and the result is an overfitting for this segment. On the other hand, the chosen threshold may be above the upper bound for certain line segments, thus resulting in under-fitting for such segments.
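The splitting scheme can be sketched as follows; `tol` plays the role of the fixed deviation threshold (our naming), and the recursion keeps the point of maximum deviation as a vertex whenever it exceeds the threshold:

```python
import numpy as np

def point_line_dist(pt, a, b):
    """Perpendicular distance of pt from the line through a and b."""
    a, b, pt = (np.asarray(v, dtype=float) for v in (a, b, pt))
    d = b - a
    length = np.hypot(*d)
    if length == 0:
        return np.hypot(*(pt - a))
    return abs((pt[0] - a[0]) * d[1] - (pt[1] - a[1]) * d[0]) / length

def rdp(points, tol):
    """Ramer-Douglas-Peucker: recursively split at the point of maximum
    deviation until every pixel is within tol of its segment."""
    if len(points) < 3:
        return list(points)
    dists = [point_line_dist(p, points[0], points[-1]) for p in points[1:-1]]
    k = int(np.argmax(dists))
    if dists[k] <= tol:
        return [points[0], points[-1]]        # one segment suffices
    left = rdp(points[:k + 2], tol)           # split at the worst offender,
    right = rdp(points[k + 1:], tol)          # keeping it as a shared vertex
    return left[:-1] + right
```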

#### **4.3 Precision and reliability based optimization (PRO)**

In this method, though the method of optimization is the same as in the L-RDP method, the optimization goal is different from (3). Instead of (3), the optimization goal is:

$$\max\left(\varepsilon_p, \varepsilon_r\right) < \varepsilon_0, \tag{22}$$

where $\varepsilon_0$ is the chosen heuristic parameter. Since this method explicitly uses the precision and reliability measures as the optimization functions, it is expected to perform well for both precision and reliability measures.

However, this method does not take into account the upper bound of the error due to digitization.
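A sketch of this splitting with the stopping rule (22), reusing the measures (16)-(17): here the chord through a block's endpoints stands in for the fitted line, the chord length approximates $s_{\max}$, and the parametrisation assumes the chord misses the origin. These are all simplifying assumptions of ours, not the chapter's exact optimization:

```python
import numpy as np

def seg_scores(X):
    """Precision and reliability of the chord fit to pixel block X,
    per eqs (16)-(17); assumes the chord's line misses the origin."""
    X = np.asarray(X, dtype=float)
    A = np.linalg.solve(np.array([X[0], X[-1]]), np.ones(2))
    dev = np.abs(X @ A - 1.0)
    s_max = np.hypot(*(X[-1] - X[0]))       # chord length as segment extent
    return dev.max(), dev.sum() / s_max, dev

def pro_split(points, eps0):
    """Recursive splitting (as in L-RDP) with stopping rule (22):
    accept a segment once max(eps_p, eps_r) < eps0."""
    X = np.asarray(points, dtype=float)
    if len(X) < 3:
        return [tuple(X[0]), tuple(X[-1])]
    eps_p, eps_r, dev = seg_scores(X)
    if max(eps_p, eps_r) < eps0:
        return [tuple(X[0]), tuple(X[-1])]
    k = int(np.argmax(dev[1:-1])) + 1       # split at worst interior pixel
    return pro_split(X[:k + 1], eps0)[:-1] + pro_split(X[k:], eps0)
```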


#### **4.4 Break point suppression method of Masood [Masood, 2008]**

Masood begins with the sequence of break points, i.e., the smallest set of line segments such that each pixel of the curve lies exactly on the line segments; this is considered the initial set of dominant points. Then, he proceeds by recursively deleting one break point at a time such that its removal has minimum impact in its immediate neighborhood, and by optimizing the locations of the dominant points for minimum precision score. Although the aim of the optimization is to improve the global fit and thus indirectly improve the reliability, Masood's method is evidently tailored for optimizing the precision and performs poorly in terms of reliability.

Since Masood begins with the largest possible set of dominant points and removes dominant points till a certain termination criterion is satisfied, if the termination criterion is not very relaxed, the maximum deviation is in general less than the upper bound. Thus, in essence, Masood's method is sensitive to the digitization effects and gives an unnecessarily close fit to the digital curve.
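A simplified sketch of the suppression loop, without Masood's re-optimization of the dominant point locations and with an illustrative termination threshold `d_term` of our own:

```python
import numpy as np

def _dev(p, a, b):
    """Perpendicular distance of p from the line through a and b."""
    d = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    v = np.asarray(p, dtype=float) - np.asarray(a, dtype=float)
    length = np.hypot(*d)
    return abs(v[0] * d[1] - v[1] * d[0]) / length if length else np.hypot(*v)

def break_points(pts):
    """Initial dominant points: pixels where the chain direction changes."""
    keep = [0]
    for i in range(1, len(pts) - 1):
        d1 = np.subtract(pts[i], pts[i - 1])
        d2 = np.subtract(pts[i + 1], pts[i])
        if not np.array_equal(d1, d2):
            keep.append(i)
    keep.append(len(pts) - 1)
    return keep

def masood_like(pts, d_term):
    """Iteratively suppress the dominant point whose removal causes the
    smallest deviation in its immediate neighbourhood, while that
    deviation stays below d_term."""
    dom = break_points(pts)
    while len(dom) > 3:
        # Cost of removing dom[j]: worst deviation of the bridged pixels.
        costs = []
        for j in range(1, len(dom) - 1):
            a, b = pts[dom[j - 1]], pts[dom[j + 1]]
            costs.append(max(_dev(pts[i], a, b)
                             for i in range(dom[j - 1] + 1, dom[j + 1])))
        j = int(np.argmin(costs)) + 1
        if costs[j - 1] >= d_term:
            break                       # removal would deviate too much
        del dom[j]
    return dom
```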

#### **4.5 Dominant point detection method of Carmona-Poyato [Carmona-Poyato***, et al.***, 2010]**

Like Masood [Masood, 2008], Carmona-Poyato also begins with the sequence of break points as the initial set of dominant points. However, unlike Masood, Carmona-Poyato recursively deletes the dominant points with minimum impact on the global fit of the line segments. Thus, inherently, Carmona-Poyato focuses more on reliability than on precision. It is evident in the results reported in [Carmona-Poyato*, et al.*, 2010] that this method has a tendency to be lenient in the maximum allowable deviation in favor of a general shape representation for the whole curve.

#### **5. Numerical examples**

We consider the following methods for comparison:

1. L-RDP\_max (from section 2).
2. L-RDP0.5, L-RDP1.0, L-RDP1.5, and L-RDP2.0 (from sections 2 and 4.2), corresponding to the values of $d_{\text{tol}}$ as 0.5, 1.0, 1.5, and 2.0 pixels, respectively.
3. PRO0.2, PRO0.4, PRO0.6, PRO0.8, and PRO1.0 (from section 4.3), corresponding to the values of $\varepsilon_0$ as 0.2, 0.4, 0.6, 0.8, and 1.0, respectively.
4. Masood (section 4.4, [Masood, 2008]), using the termination criterion specified in [Masood, 2008], i.e., a maximum deviation of 0.9.
5. Carmona-Poyato (section 4.5, [Carmona-Poyato*, et al.*, 2010]), using the termination condition specified in [Carmona-Poyato*, et al.*, 2010], i.e., $r_i = 0.4$.
#### **5.1 Images of Fig. 2**

First, we consider L-RDP\_max. The results are in the second row of Fig. 4. We see that L-RDP\_max is able to avoid the fluctuations due to digitization and noise (in Fig. 4, see columns (g-p), row 2). Meanwhile, it is able to retain a good fit for snippets with important curvature changes; see columns (h,m,n,p), row 2 of Fig. 4. In conclusion, due to the consideration of the upper bound of the digitization error, L-RDP\_max captures the general features of the digital curve rather than concentrating on every single small-scale feature of the curve.


The first observation is that L-RDP algorithms are very sensitive to the tolerance values. L-RDP0.5 algorithm gives a performance comparable to PRO0.2 and PRO0.4, both qualitatively (specifically note the columns (h,i,m,n) of Fig. 4) and quantitatively (see Table 2). A slight increase in tolerance from 0.5 to 1 changes the quality and performance parameters of the line fitting significantly, as evident in Fig. 4 and Table 2. The performance of L-RDP1 is closer to PRO0.6 and the performance of L-RDP1.5 is closer to L-RDP\_max and PRO0.8. As the tolerance is increased further in L-RDP algorithms, the fitted line segments start losing information about the major curvature changes and represent the digital curves only crudely. Thus, though L-RDP2.0 provides significant dimensionality reduction (see DR in Table 2), it performs poorly for all the remaining performance parameters.

Next, we consider the results of the PRO algorithms. It can be seen in the row PRO0.2 of Fig. 4 that it follows the digital curves very closely. As a consequence, it is very sensitive to digitization and generates numerous small line segments to represent the curve, strongly evident in columns (e-i,k-p) of Fig. 4. Though definitely very reliable and precise, as evident from $(\varepsilon_p)_{\text{Dataset}}$ and $(\varepsilon_r)_{\text{Dataset}}$ in Table 1, due to the tendency to fit the curves very closely, it performs poorly in dimensionality reduction (see the DR values in Table 1). In the next set, PRO0.4-0.8, we see that these algorithms tend to follow the curvature of the digital curve better than PRO0.2. We highlight the results in column (m) of Fig. 4. While PRO0.2 generated many line segments for the right side of the curve, PRO0.4-0.8 are more selective and fit the line segments focusing at the locations of changes in curvature. Further, as the value of $\varepsilon_0$ increases from 0.4 to 0.8, the tendency to concentrate on the general characteristics of the curve (rather than following every small-scale feature of the curve) increases. This is significantly evident in the results in columns (h), (n), and (p) of Fig. 4. In the last PRO algorithm, PRO1.0, we see that rather than focusing on small features of the digital curve, the algorithm tends to follow the general characteristics of the digital curve on a relatively larger scale. Due to this reason, the results of PRO0.8-1.0 are closer to L-RDP\_max. As a consequence of this characteristic, PRO0.8-1.0 and L-RDP\_max have significantly better dimensionality reduction as compared to the other PRO algorithms (see DR in Table 2).

With a lower value of $(\varepsilon_p)_{\text{Dataset}}$ than $(\varepsilon_r)_{\text{Dataset}}$ in Table 2, we see that Masood targets improving the precision rather than the reliability. Thus, as noted in columns (k-m) of Fig. 4, Masood fails in representing the nature of the curves effectively. On the other hand, for Carmona-Poyato, the value of $(\varepsilon_r)_{\text{Dataset}}$ is lower than $(\varepsilon_p)_{\text{Dataset}}$ in Table 2. However, due to the small length of the digital curves here and the fact that Carmona-Poyato fits the polygonal approximation depending upon the length of the curve, in these figures with very small digital curves it tends to fit the curves very closely (see columns (e,f,i,n,o) of Fig. 4), thus demonstrating better precision as well as reliability as compared to Masood. On the other hand, for these images, L-RDP\_max gives a performance in between Masood and Carmona-Poyato. This indicates that L-RDP\_max avoids both under-fitting and over-fitting.

#### **5.2 Example of large closed curve**

In this section, we consider an example of a digital closed curve which is significantly large and contains 458 pixels. The digital curve is derived by scanning the image of a dog from Figure 14 of [Masood, 2008] at 300 dpi, followed by blurring using Adobe Photoshop with

a brush size of 2 pixels. The polygonal approximations obtained using various methods are presented in Fig. 5. As in section 5.1, the performances of L-RDP\_max, L-RDP1.5, and PRO0.8 are similar. For this curve, the performance of Carmona-Poyato is also similar to L-RDP\_max. Not only are the numbers of vertices in the polygonal approximation similar for these cases, the locations of the vertices are also similar. L-RDP0.5 and PRO0.2 over-fit the curve with numerous points. The quantitative performances are listed in Table 2.

#### **5.3 Large datasets used in real applications**

We consider 7 datasets used for training in object detection algorithms. These datasets, namely afright [McCarter & Storkey, 2003], Caltech101 [Fei-Fei*, et al.*, 2007], Caltech 256 [Griffin*, et al.*], Pascal 2007 [Everingham*, et al.*, 2007], Pascal 2008 [Everingham*, et al.*, 2008], Pascal 2009 [Everingham*, et al.*, 2009], and Pascal 2010 [Everingham*, et al.*, 2010], contain a total of 97178 images, with the smallest image being only 80 pixels wide and the largest image being 748 pixels wide. The values of $(\varepsilon_p)_{\text{Dataset}}$, $(\varepsilon_r)_{\text{Dataset}}$, and DR for all the datasets and algorithms are plotted in Fig. 6, Fig. 7, and Fig. 8, respectively. Even over such a wide range of images, the L-RDP and PRO algorithms give consistent performances, as seen in Fig. 6, Fig. 7, and Fig. 8. Further, all L-RDP and PRO algorithms give better performance in terms of precision and reliability, as seen in Fig. 6 and Fig. 7. As a final note, L-RDP\_max gives better DR than both Masood and Carmona-Poyato, as seen in Fig. 8.

Fig. 4. The polygonal approximations obtained using various methods for images in Fig. 2.


Fig. 5. Example of a large curve, shape of a dog. The length of the digital curve is N=458.

Table 2. Performance metrics for the dataset of the 16 images in Fig. 2 (first three columns) and the digital curve in section 5.2 (last four columns).

| Method | $(\varepsilon_p)_{\text{Dataset}}$ | $(\varepsilon_r)_{\text{Dataset}}$ | DR | $(\varepsilon_p)_{\text{Curve}}$ | $(\varepsilon_r)_{\text{Curve}}$ | DR | Time (seconds) |
|---|---|---|---|---|---|---|---|
| L-RDP\_max | 0.2751 | 0.2198 | 0.1286 | 0.3954 | 0.3189 | 0.1114 | 0.3059 |
| L-RDP0.5 | 0.1108 | 0.0931 | 0.2233 | 0.0897 | 0.0964 | 0.2642 | 0.8533 |
| L-RDP1 | 0.2295 | 0.1884 | 0.1356 | 0.2829 | 0.2397 | 0.1332 | 0.3498 |
| L-RDP1.5 | 0.2751 | 0.2198 | 0.1286 | 0.4133 | 0.3508 | 0.1048 | 0.2571 |
| L-RDP2 | 0.3650 | 0.2899 | 0.1174 | 0.5298 | 0.4341 | 0.0917 | 0.2225 |
| PRO0.2 | 0.0032 | 0.0030 | 0.3933 | 0.0055 | 0.0062 | 0.4672 | 1.6157 |
| PRO0.4 | 0.1486 | 0.1227 | 0.2008 | 0.1629 | 0.1607 | 0.1769 | 0.5004 |
| PRO0.6 | 0.1974 | 0.1700 | 0.1405 | 0.2748 | 0.2467 | 0.1288 | 0.3330 |
| PRO0.8 | 0.2563 | 0.2086 | 0.1307 | 0.4066 | 0.3483 | 0.1048 | 0.2613 |
| PRO1.0 | 0.3185 | 0.2471 | 0.1228 | 0.5626 | 0.4635 | 0.0873 | 0.2100 |
| Masood | 0.2970 | 0.3144 | 0.1845 | 0.1203 | 0.1155 | 0.2249 | 26.1754 |
| Carmona-Poyato | 0.1306 | 0.1110 | 0.2436 | 0.3349 | 0.3152 | 0.1157 | 0.6324 |

Fig. 6. Precision measure $(\varepsilon_p)_{\text{Dataset}}$ for various datasets obtained by different algorithms.

Fig. 7. Reliability measure $(\varepsilon_r)_{\text{Dataset}}$ for various datasets obtained by different algorithms.


achieved by making the edges of the polygon as long as possible. Further, we show that most contemporary measure of quality of fit are either directly related to precision or correspond to the local nature of fit. Since these measures are used in most contemporary algorithms, most of them concentrate on improving the local quality of the fit only. However, as demonstrated by the upper bound of the maximum deviation due to digitization, it may not be worth to reduce the precision below a certain level, since it is difficult to predict if the actual deviation is below the error bound due to digitization, some form of noise, or due to the nature of the curve. In our knowledge, only Carmona-Poyato

includes reliability (though indirectly) in its algorithm [Carmona-Poyato*, et al.*, 2010].

**7. References** 

Incorporated.

*23*(13), 1226-1236.

790.

*Pattern Recognition, 26*(10), 1563-1577.

*Machine Intelligence, 29*(9), 1590-1602.

*Pattern Recognition, 43*(1), 14-25.

We also propose line fitting algorithm that specifically optimize the curves for increasing both precision and reliability simultaneously. We hope that these measures are paid attention to by the research community and better algorithms for polygonal fitting are developed, which provide good local as well as global fit. In the future, it shall be useful to further improve the design of the precision and reliability measures such that they are more representative of the quality of fit. Such improvements in design will also influence the quality of polygonal approximation achieved by the polygonal approximation methods.

Acton, F. S. (1984). *Analysis of straight-line data*. New York: Peter Smith Publisher,

Arcelli, C., & Ramella, G. (1993). Finding contour-based abstractions of planar patterns.

Bhowmick, P., & Bhattacharya, B. B. (2007). Fast polygonal approximation of digital curves

Carmona-Poyato, A., Fernández-García, N. L., Medina-Carnicer, R., & Madrid-Cuevas, F. J.

Carmona-Poyato, A., Madrid-Cuevas, F. J., Medina-Carnicer, R., & Muñoz-Salinas, R. (2010).

Carmona-Poyato, A., Medina-Carnicer, R., Madrid-Cuevas, F. J., Muoz-Salinas, R., &

Chung, K. L., Liao, P. H., & Chang, J. M. (2008). Novel efficient two-pass algorithm for

Debled-Rennesson, I., Rémy, J. L., & Rouyer-Degli, J. (2005). Linear segmentation of discrete curves into blurred segments. *Discrete Applied Mathematics, 151*(1-3), 122-137.

*Journal of Visual Communication and Image Representation, 19*(4), 219-230. Chung, P. C., Tsai, C. T., Chen, E. L., & Sun, Y. N. (1994). Polygonal approximation using a competitive Hopfield neural network. *Pattern Recognition, 27*(11), 1505-1512. Davis, T. J. (1999). Fast decomposition of digital curves into polygons using the Haar

approximation of curves. *Pattern Recognition, 44*(1), 45-54.

using relaxed straightness properties. *IEEE Transactions on Pattern Analysis and* 

(2005). Dominant point detection: A new proposal. *Image and Vision Computing,* 

Polygonal approximation of digital planar curves through break point suppression.

Fernndez-Garca, N. L. (2011). A new measurement for assessing polygonal

closed polygonal approximation based on LISE and curvature constraint criteria.

transform. *IEEE Transactions on Pattern Analysis and Machine Intelligence, 21*(8), 786-

Fig. 8. Average dimensionality reduction for various datasets obtained by different algorithms.

#### **6. Conclusion**

Polygonal approximation of digital curves is an important step in many image processing applications. It is important that the fitted polygons are significantly smaller than the original curves, less sensitive to the digitization effect in the digital curves, and good representations of the curvature related properties of the digital curves. Thus, we need methods to deal with the digitization and consider both local and global properties of the fit.

First, we show that the maximum deviation of a digital curve obtained from a line segment has a definite upper bound. We show that this definite upper bound can be incorporated in a polygonal approximation method like L-RDP for making it parameter independent. Various results are shown to demonstrate the effectiveness of the parameter independent L-RDP method against digitization, dimensionality reduction, and retaining good global and local properties of the digital curve. In the future, we are hopeful that this error bound shall be incorporated in various recent and more sophisticated polygonal approximation and give a good performance boost to them, while making them free from heuristic choice of control parameters.

Second, we show that the global and local properties of the fit are, in general, in contradiction with each other. We propose a precision measure for quantifying the local quality of fit and a reliability measure for quantifying the global quality of fit. Using them, we show that better local fits are achieved by considering small edges of the polygons, while a better global fit is


achieved by making the edges of the polygon as long as possible. Further, we show that most contemporary measures of quality of fit are either directly related to precision or correspond to the local nature of the fit. Since these measures are used in most contemporary algorithms, most of them concentrate on improving only the local quality of the fit. However, as demonstrated by the upper bound of the maximum deviation due to digitization, it may not be worthwhile to reduce the precision below a certain level, since it is difficult to predict whether the actual deviation is due to digitization, some form of noise, or the nature of the curve. To our knowledge, only Carmona-Poyato et al. include reliability (though indirectly) in their algorithm [Carmona-Poyato*, et al.*, 2010].

We also propose a line fitting algorithm that specifically optimizes the fit to increase both precision and reliability simultaneously. We hope that the research community pays attention to these measures and that better polygonal fitting algorithms are developed which provide a good local as well as global fit. In the future, it will be useful to further improve the design of the precision and reliability measures so that they are more representative of the quality of fit. Such improvements in design will also influence the quality achieved by polygonal approximation methods.

#### **7. References**


Polygonal Representation of Digital Curves 89

Masood, A. (2008). Dominant point detection by reverse polygonization of digital curves. *Image and Vision Computing, 26*(5), 702-715.

McCarter, G., & Storkey, A. (2003). Air Freight image sequences. http://homepages.inf.ed.ac.uk/amos/afreightdata.html

Mokhtarian, F., & Mackworth, A. (1986). Scale-based description and recognition of planar curves and two-dimensional shapes. *IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8*(1), 34-43.

Pavlidis, T. (1976). Use of algorithms of piecewise approximations for picture processing applications. *ACM Transactions on Mathematical Software, 2*(4), 305-321.

Perez, J. C., & Vidal, E. (1994). Optimum polygonal approximation of digitized curves. *Pattern Recognition Letters, 15*(8), 743-750.

Phillips, T. Y., & Rosenfeld, A. (1987). A method of curve partitioning using arc-chord distance. *Pattern Recognition Letters, 5*(4), 285-288.

Prasad, D. K., Gupta, R. K., & Leung, M. K. H. (2011). An error bounded tangent estimator for digitized elliptic curves. *Lecture Notes in Computer Science* (Vol. 6607, pp. 272-283): Springer Berlin / Heidelberg.

Prasad, D. K., & Leung, M. K. H. (2010a). *An ellipse detection method for real images*. Paper presented at the 25th International Conference of Image and Vision Computing New Zealand (IVCNZ 2010).

Prasad, D. K., & Leung, M. K. H. (2010b, 14-17 November). *Error analysis of geometric ellipse detection methods due to quantization*. Paper presented at the Fourth Pacific-Rim Symposium on Image and Video Technology (PSIVT 2010), Singapore.

Prasad, D. K., & Leung, M. K. H. (2010c, 26-29 Sept). *Reliability/precision uncertainty in shape fitting problems*. Paper presented at the IEEE International Conference on Image Processing, Hong Kong.

Prasad, D. K., & Leung, M. K. H. (2010, 26-28 Feb). *A hybrid approach for ellipse detection in real images*. Paper presented at the 2nd International Conference on Digital Image Processing, Singapore.

Prasad, D. K., & Leung, M. K. H. (2012). Methods for ellipse detection from edge maps of real images. In F. Solari, M. Chessa & S. Sabatini (Eds.), *Machine Vision*: InTech.

Prasad, D. K., Leung, M. K. H., Cho, S. Y., & Quek, C. (2011a, 28-30 Nov.). *A parameter independent line fitting method*. Paper presented at the Asian Conference on Pattern Recognition (ACPR), Beijing, China.

Ramer, U. (1972). An iterative procedure for the polygonal approximation of plane curves. *Computer Graphics and Image Processing, 1*(3), 244-256.

Ray, B. K., & Ray, K. S. (1992). An algorithm for detection of dominant points and polygonal approximation of digitized curves. *Pattern Recognition Letters, 13*(12), 849-856.

Rosin, P. L. (1997). Techniques for assessing polygonal approximations of curves. *IEEE Transactions on Pattern Analysis and Machine Intelligence, 19*(6), 659-666.

Rosin, P. L. (2002). Assessing the behaviour of polygonal approximation algorithms. *Pattern Recognition, 36*(2), 505-518.

Salotti, M. (2002). Optimal polygonal approximation of digitized curves using the sum of square deviations criterion. *Pattern Recognition, 35*(2), 435-443.

Sankar, P. V., & Sharma, C. U. (1978). A parallel procedure for the detection of dominant points on a digital curve. *Computer Graphics and Image Processing, 7*(4), 403-412.


Douglas, D. H., & Peucker, T. K. (1973). Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. *Cartographica: The International Journal for Geographic Information and Geovisualization, 10*(2), 112-122.

Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL Visual Object Classes Challenge 2007 (VOC2007). http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html

Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2008). The PASCAL Visual Object Classes Challenge 2008 (VOC2008). http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html

Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2009). The PASCAL Visual Object Classes Challenge 2009 (VOC2009). http://www.pascal-network.org/challenges/VOC/voc2009/workshop/index.html

Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL Visual Object Classes Challenge 2010 (VOC2010). http://www.pascal-network.org/challenges/VOC/voc2010/workshop/index.html

Fei-Fei, L., Fergus, R., & Perona, P. (2007). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. *Computer Vision and Image Understanding, 106*(1), 59-70.

Griffin, G., Holub, A., & Perona, P. Caltech-256 object category database. Available at: http://authors.library.caltech.edu/7694

Gritzali, F., & Papakonstantinou, G. (1983). A fast piecewise linear approximation algorithm. *Signal Processing, 5*(3), 221-227.

Kanungo, T., Jaisimha, M. Y., Palmer, J., & Haralick, R. M. (1995). Methodology for quantitative performance evaluation of detection algorithms. *IEEE Transactions on Image Processing, 4*(12), 1667-1674.

Kolesnikov, A. (2008). *Constrained piecewise linear approximation of digital curves*, Tampa, FL.

Kolesnikov, A., & Fränti, P. (2003). Reduced-search dynamic programming for approximation of polygonal curves. *Pattern Recognition Letters, 24*(14), 2243-2254.

Kolesnikov, A., & Fränti, P. (2005). Data reduction of large vector graphics. *Pattern Recognition, 38*(3), 381-394.

Kolesnikov, A., & Fränti, P. (2007). Polygonal approximation of closed discrete curves. *Pattern Recognition, 40*(4), 1282-1293.

Latecki, L. J., Sobel, M., & Lakaemper, R. (2009). Piecewise linear models with guaranteed closeness to the data. *IEEE Transactions on Pattern Analysis and Machine Intelligence, 31*(8), 1525-1531.

Lavallee, S., & Szeliski, R. (1995). Recovering the position and orientation of free-form objects from image contours using 3D distance maps. *IEEE Transactions on Pattern Analysis and Machine Intelligence, 17*(4), 378-390.

Leung, M. K. (1990). Dynamic two-strip algorithm in curve fitting. *Pattern Recognition, 23*(1-2), 69-79.

Lowe, D. G. (1987). Three-dimensional object recognition from single two-dimensional images. *Artificial Intelligence, 31*(3), 355-395.

Marji, M., & Siy, P. (2004). Polygonal representation of digital planar curves through dominant point detection - A nonparametric algorithm. *Pattern Recognition, 37*(11), 2113-2130.




### **Comparison of Border Descriptors and Pattern Recognition Techniques Applied to Detection and Diagnose of Faults on Sucker-Rod Pumping System**

Fábio Soares de Lima, Luiz Affonso Guedes and Diego R. Silva *Universidade Federal do Rio Grande do Norte - UFRN Brazil*

#### **1. Introduction**

90 Digital Image Processing

Sarkar, D. (1993). A simple algorithm for detection of significant vertices for polygonal approximation of chain-coded curves. *Pattern Recognition Letters, 14*(12), 959-964.

Sato, Y. (1992). Piecewise linear approximation of plane curves by perimeter optimization. *Pattern Recognition, 25*(12), 1535-1543.

Sklansky, J., & Gonzalez, V. (1980). Fast polygonal approximation of digitized curves. *Pattern Recognition, 12*(5), 327-331.

Strauss, O. (1996). *Reducing the precision/uncertainty duality in the Hough transform*. Paper presented at the Proceedings of the IEEE International Conference on Image Processing.

Strauss, O. (1999). Use the Fuzzy Hough transform towards reduction of the precision/uncertainty duality. *Pattern Recognition, 32*(11), 1911-1922.

Tomek, I. (1975). More on piecewise linear approximation. *Computers and Biomedical Research, 8*(6), 568-572.

Wall, K., & Danielsson, P. E. (1984). A fast sequential method for polygonal approximation of digitized curves. *Computer Vision, Graphics, & Image Processing, 28*(2), 220-227.

Wang, B., Shu, H., Shi, C., & Luo, L. (2008). A novel stochastic search method for polygonal approximation problem. *Neurocomputing, 71*(16-18), 3216-3223.

Due to high competition and the need to meet deadlines, modern market-focused industries demand high availability and reliability of their equipment. Accordingly, in recent years the maintenance activity has undergone several changes, which have led to an evolution in the organization and planning of its execution. According to Kardec & Nascif (1998), the direct causes of this development are:

• More complex engineering projects;

• New methods for maintenance activity;

• The quick increase in the amount and diversity of physical elements that compose the varied equipment of process plants that must be kept available;

• New approaches to the maintenance organization and its responsibilities.


The concept of predictive maintenance has emerged as a result of these demands. Predictive maintenance is the regular monitoring of the operating condition (variables and parameters) and performance of a device or process, providing the data needed to ensure the maximum allowed interval between repairs and better intervention planning.

Historically, the first artificial elevation method used in the oil industry was sucker-rod pumping. Its importance is reflected in the number of installations found in industry, it being the most widely used elevation method around the world. Its popularity is related to the low cost of investment and maintenance, flexibility of flow and depth, good energy efficiency, and the ability to operate with fluids of different compositions and viscosities over a wide temperature range.

The main advantages of sucker-rod pumping are the simplicity of operation, maintenance, and design of new installations. Under normal conditions it can be used until the end of a well's productive life, and the pumping capacity can be modified according to changes in the well's behavior. However, the main advantage of this method is its lower cost/production ratio throughout the productive life of the well.

This method is the most common form of artificial elevation (Alegre, Morooka & da Rocha, 1993; Schirmer & Toutain, 1991). It is estimated that 90% of artificial elevation installations in the world use mechanical pumping (Nazi & Lea, 1994; Tripp, 1989). In Brazil, 64% of the total production is obtained through mechanical pumping (de Oliveira Costa, 1994). In practice, the status of the mechanical pumping system is monitored by reading a card, called a dynamometric card, which reveals the condition of the pump located at the bottom of the well. The dynamometric card consists of a graph relating charge and position, and it reflects the current pumping conditions (Barreto et al., 1996; Rogers et al., 1990). Thus, the card may take various formats during the well's production, representing situations of normal operation or indicating a possible irregularity in the mechanical pumping system.

The process of identifying abnormal operating situations of the mechanical pumping system thus becomes a problem of visual information interpretation (Dickinson & Jennings, 1990). In any case, this approach may be influenced by several factors, such as the behavior of the system itself, whose complexity results in diverse forms of dynamometric cards, as well as the knowledge and experience of the engineer responsible for the well. Besides that, nowadays each field petroleum engineer is responsible for over one hundred wells equipped with mechanical pumping. In this case, the traditional interpretation process becomes impracticable within an acceptable time.

#### **1.1 Objectives**

This study aims to contribute to the predictive maintenance field through the development of intelligent computing techniques (Russell, 2003) based on digital image processing, capable of preventing damage to particular equipment or industrial processes in a predictive way.

In scientific terms, the main objective is to propose and analyze the performance of nonparametric pattern recognition techniques in the context of fault detection and diagnosis, using boundary descriptors and metric or statistical mathematical tools.

In technological terms, the objective of this study is to contribute to the area of fault automatic detection and diagnosis in dynamical systems, by proposing a new architecture based on visual similarity of signatures (images) that represent operating conditions. This, in turn, will bring benefits that may complement the tools that nowadays operate in industrial parks.

The proposed approach is the automation of fault analysis and diagnosis in dynamic systems, based on the following points:

• A description model based on knowledge through system signatures; and

• Fault recognition through metric distances or correlations.


In this study, bottom-hole dynamometer cards were used, because surface cards incorporate various degenerative effects caused by the propagation of the charge along the whole column of rods. These effects make each surface card representative of only one well. When the bottom cards are used, it is possible to observe that the patterns of system operation are the same (de Almeida Barreto Filho, 1993).

Additionally, a new approach to classifying the failures of sucker-rod pumping was chosen for presentation. As fault diagnosis in the mechanical pumping system is a process of recognizing reference dynamometric cards, various studies based on pattern recognition techniques using neural networks or expert systems have been developed and proposed to improve the accuracy and efficiency (Nazi & Lea, 1994) of this kind of diagnosis system (Alegre, A & Morooka, 1993; Alegre, Morooka & da Rocha, 1993; Barreto et al., 1996; Chacln, 1969; Dickinson & Jennings, 1990; Nazi & Lea, 1994; Rogers et al., 1990; Schirmer & Toutain, 1991; Schnitman et al., 2003; Xu et al., 2006). Thus, the approach used here is based on pattern recognition using a distance calculation technique (Euclidean distance) or a similarity tool (Pearson correlation).

#### **1.2 Document structure**



This study is divided into five more sections. In the following section, the tools used are theoretically justified, discussing the techniques of edge-based descriptors and some tools for calculating distance and statistical metrics. In Section 3, the oil sucker-rod pumping system is presented as a case study for the implementation of the proposed work. Then, in the next section, the methodology for detecting and recognizing failures is presented. Section 5 presents the results obtained, and finally the conclusions are presented in Section 6.

#### **2. Theoretical basis**

#### **2.1 Description of boundary descriptors**

Boundary descriptors are mathematical methods that describe an object or a region of a figure. The descriptors are separated into two groups (Gonzalez et al., 2003): descriptors based on contour (border) and descriptors based on region. The former describe the object shape based on its contour, whereas region descriptors describe the object's interior. An ideal descriptor must show the following invariant features:

• Translation;

• Rotation;

• Scale;

• Start point.


In the process of diagnosing faults in the sucker-rod pumping system through dynamometer cards, the rotation invariance feature is unnecessary, because several faults show the same contour, only rotated.

#### **2.1.1 Centroid**

The centroid contour descriptor calculates the distance from the card's geometric center to each of the points that compose the card, producing a distance set (*D* = {*D*0, *D*1, ..., *Dn*}). Equations 1 and 2 show the centroid calculation, where *N* is the number of points that compose the card and the ordered pair (*xc*, *yc*) represents the centroid.

$$x_c = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{1}$$

$$y_c = \frac{1}{N} \sum_{i=1}^{N} y_i \tag{2}$$

Equation 3 gives the distance between the centroid and each of the points.

$$D_i = \sqrt{(x_i - x_c)^2 + (y_i - y_c)^2} \tag{3}$$

The distance set can then be used as the contour feature of the dynamometer card.
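As an illustration, a minimal Python sketch of the centroid descriptor (Equations 1-3), assuming the card contour is given as a list of (x, y) pairs (the function name is illustrative):

```python
import math

def centroid_distance_set(points):
    """Centroid boundary descriptor (Eqs. 1-3): distances from the
    geometric center of the card contour to each contour point."""
    n = len(points)
    xc = sum(x for x, _ in points) / n                      # Eq. (1)
    yc = sum(y for _, y in points) / n                      # Eq. (2)
    return [math.hypot(x - xc, y - yc) for x, y in points]  # Eq. (3)
```

For a square card contour the centroid is its center, so every corner lies at the same distance, which is the translation invariance the descriptor relies on.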

#### **2.2 Mathematical tools for calculating similarity**

#### **2.2.1 Euclidean distance**

The Euclidean distance between two points is the length of the line segment connecting them. In Cartesian coordinates, if *p* = (*p*1, *p*2, ..., *pn*) and *q* = (*q*1, *q*2, ..., *qn*) are two points in Euclidean space, then the distance from *p* to *q* (or from *q* to *p*) is given by Equations 13 to 15.

$$D = d(p, q) = d(q, p) \tag{13}$$

$$D = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \dots + (p_n - q_n)^2} \tag{14}$$

$$D = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{15}$$

#### **2.2.2 Pearson correlation**

The Pearson correlation (or "product-moment correlation coefficient", also "Pearson's *r*") measures the degree and direction of correlation between two variables on a metric scale. The coefficient is represented by *r* and lies between −1 and 1, so it can be interpreted in the following manner:

+1**:** a perfect correlation, with the variables moving in the same direction;

−1**:** a perfect correlation as well, but the variables move in opposite directions;

0**:** the variables have no linear dependence.

In other words, the sign of the result shows whether the correlation is positive or negative, and its magnitude shows the strength of the correlation.

The Pearson correlation coefficient is calculated according to the following formula:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where *x*1, *x*2, ..., *xn* and *y*1, *y*2, ..., *yn* are the measured values of the two variables. Moreover, *x̄* can be written:

$$\bar{x} = \frac{1}{n} \cdot \sum_{i=1}^{n} x_i$$

And *ȳ* can be:

$$\bar{y} = \frac{1}{n} \cdot \sum_{i=1}^{n} y_i$$

These quantities (*x̄* and *ȳ*) are the arithmetic means of the variables *x* and *y*.

#### **3. Sucker-rod pumping system**

The first artificial lifting method was the sucker-rod pumping, which appeared after the birth of the oil industry. The importance of this method is shown by the number of installations now operating in the world. Figure 1 presents a sucker-rod pumping unit.
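To make the two similarity tools of Section 2.2 concrete, Equations 13-15 and the Pearson formula can be sketched in plain Python (function names are illustrative):

```python
import math

def euclidean_distance(p, q):
    """Eqs. 13-15: straight-line distance between two n-dimensional points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient, in [-1, 1].
    Assumes neither sequence is constant (nonzero denominator)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - x_mean) ** 2 for xi in x)
                    * sum((yi - y_mean) ** 2 for yi in y))
    return num / den
```

In the fault diagnosis context, either function compares the descriptor vector of a measured card with that of a reference card: a small Euclidean distance, or a Pearson coefficient near +1, indicates a match.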

#### **2.1.2 Curvature descriptor**

The curvature descriptor is a simple and easy algorithm to develop, whose main purpose is to calculate the distance from each point to the next one (clockwise or counter-clockwise). Equations 4, 5 and 6 show the distance calculation.

$$D_{x_i} = (x_i - x_{i+1})^2 \tag{4}$$

$$D_{y_i} = (y_i - y_{i+1})^2 \tag{5}$$

$$D_c = \sqrt{D_{x_i} + D_{y_i}} \tag{6}$$
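A minimal Python sketch of the curvature descriptor (Equations 4-6), assuming a closed contour (the last point wraps to the first) and that Equation 6 combines the two squared differences under a square root; both assumptions and the function name are illustrative:

```python
import math

def curvature_descriptor(points):
    """Distance from each contour point to its successor (Eqs. 4-6)."""
    out = []
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around the closed contour
        dx = (x1 - x2) ** 2           # Eq. (4)
        dy = (y1 - y2) ** 2           # Eq. (5)
        out.append(math.sqrt(dx + dy))  # Eq. (6), assuming a sum
    return out
```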

#### **2.1.3 K-curvature**

The K-curvature extractor describes the object contour through the angle between two vectors. From any initial point, *pi*, two points, *pi*<sup>+</sup>*<sup>k</sup>* and *pi*<sup>+</sup>2*k*, are chosen with a spacing of *k* values in order to suppress contour noise. Then two vectors (*v* and *w*) are defined: the vector *v* is formed by points *pi* and *pi*<sup>+</sup>*k*, while the vector *w* is formed by *pi*<sup>+</sup>*<sup>k</sup>* and *pi*<sup>+</sup>2*k*. Equation 7 shows the angle calculation between the vectors.

$$\theta = \cos^{-1} \frac{\boldsymbol{v} \cdot \boldsymbol{w}}{|\boldsymbol{v}| \cdot |\boldsymbol{w}|} \tag{7}$$

Here, *v* · *w* is the scalar product of the vectors (Equation 8), and |*v*| and |*w*| are the norms of the vectors (Equations 9 and 10).

$$\boldsymbol{v} \cdot \boldsymbol{w} = v\_1 w\_1 + v\_2 w\_2 + \dots + v\_n w\_n \tag{8}$$

$$|v| = \sqrt{v \cdot v} \tag{9}$$

$$|w| = \sqrt{w \cdot w} \tag{10}$$
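A minimal sketch of Equations 7-10, assuming NumPy and a closed contour given as a list of (x, y) points (the function name and the wrap-around indexing are illustrative choices):

```python
import numpy as np

def k_curvature(points, i, k):
    # Angle at p_i between v = p_{i+k} - p_i and w = p_{i+2k} - p_{i+k}.
    p = np.asarray(points, dtype=float)
    n = len(p)
    pi, pik, pi2k = p[i % n], p[(i + k) % n], p[(i + 2 * k) % n]
    v, w = pik - pi, pi2k - pik
    # Eqs. 8-10: scalar product divided by the vector norms; the clip
    # guards the arccos argument against floating-point rounding.
    c = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))   # Eq. 7
```

For a contour that turns through a right angle, the returned angle is 90 degrees.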

#### **2.1.4 Fourier Descriptor**

The Fourier Descriptor is a compact and lightweight algorithm. To develop it, consider the points (*x<sub>k</sub>*, *y<sub>k</sub>*) that represent the object contour coordinates, where *k* = 0, 1, 2, ..., *N* − 1 and *N* is the number of boundary points. Equation 11 gives the complex function of the object contour coordinates.

$$z(k) = (\mathbf{x}\_k) + j(y\_k) \tag{11}$$

Although the ordering of the coordinates is not important for this descriptor, in this work *x* is the position of the polished rod and *y* is the force applied to the system. The Fourier descriptors (Equation 12) are obtained by applying the Discrete Fourier Transform (DFT) to Equation 11.

$$F\_n = \frac{1}{N} \sum\_{k=0}^{N-1} z(k)\, e^{-\frac{j2\pi nk}{N}} \tag{12}$$

where *n* = 0, 1, 2, ..., *N* − 1 and the *F<sub>n</sub>* are the transform coefficients of *z*(*k*). The descriptors become rotation invariant when the magnitudes of the transform, |*F<sub>n</sub>*|, are used. The scale can be normalized by dividing the magnitudes of the coefficients by |*F*<sub>1</sub>|.
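A sketch of Equations 11-12 together with the invariance tricks just described (`np.fft.fft` uses the same e<sup>−j2πnk/N</sup> kernel as Equation 12; the helper name is ours):

```python
import numpy as np

def fourier_descriptor(x, y):
    z = np.asarray(x, float) + 1j * np.asarray(y, float)  # Eq. 11
    F = np.fft.fft(z) / len(z)                            # Eq. 12 (1/N-scaled DFT)
    mags = np.abs(F)                                      # |F_n|: rotation invariance
    return mags / mags[1]                                 # divide by |F_1|: scale normalization
```

Scaling the contour leaves the descriptor unchanged, which is exactly the normalization property stated above.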

#### **2.2 Mathematical tools for calculating similarity**

#### **2.2.1 Euclidean distance**



The Euclidean distance between two points is the length of the line segment connecting them. In Cartesian coordinates, if *p* = (*p*1, *p*2, ..., *pn*) and *q* = (*q*1, *q*2, ..., *qn*) are two points in Euclidean space, then the distance from *p* to *q* (or from *q* to *p*) is given by Equations 13 to 15.

$$D = d(p, q) = d(q, p) \tag{13}$$

$$D = \sqrt{(p\_1 - q\_1)^2 + (p\_2 - q\_2)^2 + \dots + (p\_n - q\_n)^2} \tag{14}$$

$$D = \sqrt{\sum\_{i=1}^{n} (p\_i - q\_i)^2} \tag{15}$$
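Equation 15 translates directly into code; a minimal sketch assuming NumPy:

```python
import numpy as np

def euclidean(p, q):
    # Eq. 15: square root of the summed squared coordinate differences.
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sqrt(np.sum((p - q) ** 2)))
```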

#### **2.2.2 Pearson correlation**

The Pearson correlation (or "product-moment correlation coefficient", or "Pearson's *r*") measures the degree and direction of the correlation between two variables on a metric scale. The coefficient is represented by *r*, lies between −1 and 1, and can be interpreted as follows:

+1**:** a perfect correlation, with the variables moving in the same direction;

−1**:** also a perfect correlation, but with the variables moving in opposite directions;

0**:** the variables have no linear dependence.

In other words, the sign of the correlation indicates whether it is positive or negative, and its magnitude indicates the strength of the correlation.

The Pearson correlation coefficient is calculated according to the following formula:

$$r = \frac{\sum\_{i=1}^{n} \left(\boldsymbol{x}\_{i} - \bar{\boldsymbol{x}}\right) \left(\boldsymbol{y}\_{i} - \bar{\boldsymbol{y}}\right)}{\sqrt{\sum\_{i=1}^{n} \left(\boldsymbol{x}\_{i} - \bar{\boldsymbol{x}}\right)^{2}} \cdot \sqrt{\sum\_{i=1}^{n} \left(\boldsymbol{y}\_{i} - \bar{\boldsymbol{y}}\right)^{2}}}$$

where *x*1, *x*2, ..., *xn* and *y*1, *y*2, ..., *yn* are the measured values of the two variables. Moreover, *x̄* can be written as:

$$\bar{\mathfrak{x}} = \frac{1}{n} \cdot \sum\_{i=1}^{n} \mathfrak{x}\_i$$

and *ȳ* as:

$$\bar{y} = \frac{1}{n} \cdot \sum\_{i=1}^{n} y\_i$$

These quantities (*x̄* and *ȳ*) are the arithmetic means of the variables *x* and *y*.
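The coefficient can be sketched directly from the formula above (NumPy assumed; the function name is ours):

```python
import numpy as np

def pearson_r(x, y):
    # Centre both signatures on their arithmetic means, then apply r's formula.
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2)))
```

A perfectly proportional pair of signatures gives r = +1; reversing one of them gives r = −1.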

#### **3. Sucker-rod pumping system**

The first artificial lifting method was sucker rod pumping, which appeared soon after the birth of the oil industry. The importance of this method is shown by the number of installations now operating around the world. Figure 1 presents a Sucker Rod Pumping Unit.

Comparison of Border Descriptors and Pattern Recognition Techniques Applied to Detection and Diagnose of Faults on Sucker-Rod Pumping System

Fig. 1. Sucker Rod Pumping Unit

The great success of the sucker rod pumping system is linked to its low investment and maintenance costs, its flexibility with respect to flow rate and depth, its good energy efficiency and its ability to operate under several down-hole conditions. Its main advantage, however, is having the lowest cost/production ratio over the production life of the field.

#### **3.1 Components of the sucker rod pumping**

#### **3.1.1 Downhole pump**

The downhole pump is a positive displacement pump; in other words, once the fluid enters through the suction, it does not return.

#### **3.1.2 Rod string**

The rod string is responsible for transmitting the surface energy to the downhole pump.

#### **3.1.3 Sucker rod pumping unit**

The pumping unit converts the rotational movement of the electric motor into the reciprocating movement applied to the polished rod, while the reduction box decreases the rotational speed of the electric motor to the required pumping speed.

#### **3.1.4 Dynamometer card**

A dynamometer card is a graph of the effects originated by the active load in the pump during a pump cycle. Figure 2(a) presents an example of a real card.

(a) An Example of a Real Card; (b) Card Patterns

#### **3.2 Reference card patterns**

There are two types of dynamometer cards: the surface card and the down-hole card. The loads are recorded at the surface through dynamometers, and in the down-hole through special devices or mathematical models. The dynamometer card is one of the main tools for analyzing and reviewing the condition of the system. This card is a record of the loads along the rod string path, and several pumping conditions can be observed through it. In this subsection, some Sucker Rod Pumping System cards are presented. Each card shown was chosen based on the main problems found in oil fields, and they can also be found in previous papers (dos Santos Côrrea, 1995).

These cards (Figure 2(b)) are some of the reference patterns for the model proposed in the following section.

#### **3.2.1 Normal operation**

The normal pumping pattern is associated with the following characteristics:

• High volumetric efficiency;
• Low interference of gas;
• Low or medium suction pressure.

#### **3.2.2 Fluid pound**

These patterns are associated with the following characteristics:

• Low suction pressure;
• Low interference of gas;
• Blocked pump suction.

#### **3.2.3 Valve leak**

This pattern occurs when there is a leak in one of the down-hole valves (traveling or standing valve).


Fig. 2. Stages of Model Proposed

#### **4. Methodology**

This section presents the methodology applied to approximately 1500 dynamometer cards from the sucker-rod pumping system. The current methodology considers four reference cards. Next, the data flow and the applied descriptors are presented.

#### **4.1 Data flow**

The data flow model is based on selecting and processing fault pattern cards. A border descriptor is generated for each pattern card, and this step is then repeated for each field card. After the border descriptor of the field card is obtained, the Euclidean Distance to the border descriptor of each pattern card is calculated. The results of the distance calculations are compared by a minimum function: the lowest value indicates the pattern closest to the field card.

Figure 2 presents the information flow of the proposed model.

#### **4.2 Data acquisition**

The data are obtained through the supervisory software. This software gathers field variables such as current, force, horse head position, head pressure and down-hole pressure, among others.

The dynamometer cards are two-dimensional graphs of force (ordinate axis) *versus* horse head position (abscissa axis). Each card consists of one hundred points.

#### **4.2.1 Patterns selection**


The pattern cards are selected by an expert engineer. Next, the selected cards are made available for the system to process. From this moment on, the treatment of the pattern cards is similar to that of the field cards.

#### **4.2.2 Descriptor generation**

After data acquisition and pattern selection, the cards are processed using the mathematical tools presented in Section 2.1. In this way, each field card and each pattern card receives a descriptor (signature) with its own identity, between which distances can be calculated.

#### **4.2.3 Calculating similarity**

Any signature can be compared with a pattern through their descriptors using mathematical tools for calculating distance. Thus, one similarity value (coefficient) is generated for each distance calculation between a signature and each of the patterns.

#### **4.2.4 Minimum classifier and fault recognition**

It was necessary to use a classifier to recognize which pattern is closest to the field card. In this study, a simple classifier was used: a minimum function. This function is applied to the table generated in Section 4.2.2. The Euclidean Distance calculation with the lowest value then indicates the fault.
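The classification step might be sketched as follows; the fault names and descriptor values are hypothetical, and only the minimum-over-distances logic comes from the text:

```python
import numpy as np

def classify(field_descriptor, pattern_descriptors):
    # pattern_descriptors: dict mapping a fault name to its descriptor array.
    f = np.asarray(field_descriptor, float)
    distances = {name: float(np.sqrt(np.sum((f - np.asarray(d, float)) ** 2)))
                 for name, d in pattern_descriptors.items()}
    return min(distances, key=distances.get)  # lowest Euclidean distance wins
```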

#### **5. Results**

This section is divided into three subsections. In the first, the general results are shown and discussed. In the second, each descriptor is tested for invariance of the features required for the sucker-rod pumping application (scale, translation and start point), and a modification to improve recognition performance is then presented. Finally, the consolidated results are given.

#### **5.1 General results of Euclidean distance**

Figures 3, 4, 5 and 6 show the results of the pattern recognition analysis with the Euclidean Distance. The K-Curvature Descriptor and the Fourier Descriptor show the best performance but, in the analysis of the Fluid Pound and Gas Lock patterns, neither presents good results.

#### **5.2 General results of Pearson correlation**

Figures 7, 8, 9 and 10 show the results of the pattern recognition analysis with the Pearson Correlation. The Centroid Descriptor and the Fourier Descriptor show the best performance but, in the analysis of the Fluid Pound and Gas Lock patterns, they also do not present good results.

#### **5.3 Tests of invariant characteristics**

This subsection presents the robustness tests of the descriptors with respect to the features that must be invariant for fault recognition on the sucker-rod pumping system. In all tests, the chosen card represents a fault pattern card. The results are presented in Table 1.


Fig. 3. Euclidean Distance - Results for the Shown Cards of Leaking Standing Valve

Fig. 4. Euclidean Distance - Results for the Shown Cards of Leaking Traveling Valve

Fig. 5. Euclidean Distance - Results for the Shown Cards of Gas Lock

Fig. 6. Euclidean Distance - Results for the Shown Cards of Fluid Pound


Fig. 7. Pearson Correlation - Results for the Shown Cards of Leaking Standing Valve

Fig. 8. Pearson Correlation - Results for the Shown Cards of Leaking Traveling Valve

Fig. 9. Pearson Correlation - Results for the Shown Cards of Gas Lock

Fig. 10. Pearson Correlation - Results for the Shown Cards of Fluid Pound



Fig. 11. Example Used to Test Translation Invariance

Fig. 12. Example Used to Test Scale Invariance

Fig. 13. Example Used to Test Start Point Invariance

#### **5.3.1 Translation invariance**

In the translation tests, the chosen card was translated as shown in Figure 11. As expected, all descriptors succeeded in the recognition.

#### **5.3.2 Scaling invariance**

In the scaling tests, the chosen card was scaled as shown in Figure 12. Again, all descriptors succeeded in the recognition.

#### **5.3.3 Start point invariance**

In the start point tests, the start point was changed. Figure 13 shows both start points (the original and the modified one) from the data acquisition. In this case, as can be seen, none of the descriptors succeeded.

The Fourier descriptor is able to solve this problem by modifying Equation 12 into Equation 16, i.e., by taking the absolute value of the Fourier Transform. However, this procedure introduces a new problem: faults whose cards are equal when rotated can no longer be distinguished. This can be seen in Figure 14, where different faults are recognized as the same when this process is used.

$$F\_n = \operatorname{abs}\left\{ \frac{1}{N} \sum\_{k=0}^{N-1} z(k)\, e^{-\frac{j2\pi nk}{N}} \right\} \tag{16}$$
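The invariance that Equation 16 provides can be checked numerically: a circular shift of z(k) (i.e., a different start point) only multiplies each F_n by a unit-magnitude phase factor, so the magnitudes coincide. The elliptical test contour below is our own illustration:

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
z = np.cos(t) + 2j * np.sin(t)            # an elliptical contour, Eq. 11 style
shifted = np.roll(z, 25)                  # same contour, different start point
F = np.abs(np.fft.fft(z) / len(z))        # Eq. 16: magnitude of the scaled DFT
F_shifted = np.abs(np.fft.fft(shifted) / len(z))
# The two magnitude spectra coincide even though the start points differ.
```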

| Descriptors | Translation | Scale | Start Point |
|---|---|---|---|
| Centroid | OK | OK | FAIL |
| K-Curvature | OK | OK | FAIL |
| Curvature | OK | OK | FAIL |
| Fourier | OK | OK | FAIL |

Table 1. Invariance Tests


Fig. 14. Examples of Distinct Faults by Rotation

#### **5.4 Proposed modification**

To improve the pattern recognition performance for the Fluid Pound and Gas Lock faults, a specialist system that analyzes the curvature of the card is proposed: the curvature that best represents a third-order function is identified as the Gas Lock fault. The results for the Euclidean Distance are shown in Figures 15 and 16.

It can also be observed that the results for the other faults do not change.

#### **5.5 Consolidated results**

Table 2 presents the consolidated results. It can be observed that the Fourier Descriptor was better than the others when used with the Euclidean Distance, and that it is as good as the Centroid Descriptor when used with the Pearson Correlation.


Fig. 15. Modified System - Results for the Shown Cards of Fluid Pound

Fig. 16. Modified System - Results for the Shown Cards of Gas Lock

| Border Descriptors | Euclidean Success (%) | Pearson Success (%) |
|---|---|---|
| Centroid | 45,91 | 84,55 |
| K-Curvature | 81,49 | 68,72 |
| Curvature | 45,91 | 62,11 |
| Fourier | 86,60 | 83,12 |

Table 2. Consolidated Results

#### **6. Conclusion**

Nowadays, the number of onshore wells using the Sucker-Rod Pumping System is high, which burdens the engineers' work. In addition, the difficulty of recognizing a specific card shape increases as the amount of noise increases, mainly as a function of well depth.

This study, based on image processing, is suitable for fault diagnosis of the Sucker-Rod Pumping System and could help to interpret the down-hole condition of an oil well promptly and correctly.

**6** 

*Algeria* 

**Temporal and Spatial Resolution Limit Study of** 


The results showed high efficiency for the processed field cards and proved very robust to problems inherent in image processing, such as rotation, translation and scale. In future work, it is desirable to test other recognition functions and to develop a system capable of identifying all faults. This will permit the prediction and planning of maintenance, so that the field engineer can manage his equipment.


The characterization of a radiographic imaging system's response in terms of spatial and temporal resolution is an important task that allows the system's limits and capabilities to be determined for the investigation and visualization of very fast processes and very small spatial details. The spatial and temporal resolution limits are therefore key parameters of an imaging system that should be taken into consideration before the examination of any static object or dynamic process. The objectives of this chapter are the study and determination of a radiation imaging system's response in terms of spatial and temporal resolution limits, and the application of super-resolution (SR) methods and algorithms to improve the resolution of captured neutron images or video sequences. The imaging system taken as an example and studied here is a high-sensitivity neutron imaging system composed of an LiF+ZnS scintillator screen (0.25 mm thick), an aluminium-coated mirror and a Charge-Coupled Device (CCD) camera (2×10⁻⁵ lx at F1.4).

The proposed approach and procedure for determining the spatial resolution and system response are based on establishing the Modulation Transfer Function (MTF) using the standard slanted-edge method, with the most appropriate algorithm selected according to previous studies. The study and characterization of temporal resolution is a more complicated task that requires a good understanding of the video sequence capture process. In this chapter, the temporal resolution limit allowing minimum motion blur is studied on a selected dynamic process captured and examined under different exposure conditions. Due to physical constraints (a low L/D collimation ratio and a weak neutron beam) and the instrumental limitations of the imaging system used, the captured neutron images or video can be of low resolution. To overcome this limitation, super-resolution methods and algorithms are applied to improve the space-time resolution. The most used methods are: iterative variational regularization methods, methods based on combining information from low-resolution frames, motion compensation methods and SR filters.
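As a toy illustration of the "combining information from low-resolution frames" family, the sketch below (Python/NumPy rather than this chapter's MATLAB, with a purely illustrative signal and idealized quarter-sample shifts) de-interleaves four sub-pixel-shifted low-resolution frames onto a finer grid, the shift-and-add idea in its simplest form:

```python
import numpy as np

# "True" high-resolution signal (purely illustrative).
hr = np.sin(np.linspace(0.0, 4.0 * np.pi, 400))
factor = 4

# Four low-resolution frames, each sampling every 4th point at a
# different known sub-pixel offset (idealized, noise-free acquisition).
frames = [hr[k::factor] for k in range(factor)]

# Shift-and-add reconstruction: place each frame back at its known
# offset, interleaving the frames onto the fine grid.
recon = np.empty_like(hr)
for k, frame in enumerate(frames):
    recon[k::factor] = frame

print(np.allclose(recon, hr))   # True: ideal sampling makes recovery exact
```

Real neutron frames add blur, noise and non-integer motion, which is why the regularized and motion-compensated SR methods named above are needed in practice.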

All the procedures and approaches mentioned above remain applicable to any similar X-ray or gamma-ray imaging system based on the same examination principle of radiation transmission. These methods and procedures are used to judge the ability of such imaging systems to reproduce the spatial (internal details) and temporal (dynamic) properties of any moving object or dynamic process under examination.

This chapter is divided into three main parts. In the first, the MTF determination is presented, proposing an accurate edge identification method and a procedure for resolving the line spread function under-sampling problem. The proposed method and procedure were integrated into a MATLAB code; the source code can be requested from the author of this chapter. In the second part, the motion blur induced when a moving object is examined is studied as a function of the video capture frame rate. This approach allows us to determine suitable exposure conditions, and therefore the temporal resolution that enables optimum image quality with minimum motion blur and noise and acceptable contrast. Finally, notions and elements of super-resolution are presented in the last part, with some interesting examples. The examples of resolution enhancement presented in this chapter concern a water flow process examined with the described neutron imaging system. Experimental results on real data sets confirm the effectiveness of the proposed procedures and methodologies. All experiments were performed at the neutron radiography facility and laboratory of the Algerian Es-Salam research reactor.

#### **2. Temporal and spatial resolution limit study of a radiographic imaging system**

#### **2.1 Spatial resolution limit determination**

The spatial resolution limit is a very important parameter of an imaging system that should be taken into consideration before the examination of any object. In this first part of the chapter, we propose the determination of a neutron imaging system's response in terms of spatial resolution. The proposed procedure is based on establishing the Modulation Transfer Function (MTF). The imaging system studied is based on a high-sensitivity CCD neutron camera (2×10⁻⁵ lx at F1.4). The neutron beam used comes from the horizontal beam port (H.6) of the Algerian Es-Salam research reactor. Our contribution in this field lies in the MTF determination, for which we propose an accurate edge identification method and a procedure for resolving the line spread function under-sampling problem. These methods and procedures are integrated into a MATLAB program. The methods, procedures and approaches proposed in this work are applicable to any other neutron imaging system, and allow judgment of a neutron imaging system's ability to reproduce the spatial properties of any object under examination.

The Modulation Transfer Function is a common metric used to quantify the spatial resolution in the system response. Traditional methods for MTF measurement were initially designed for devices forming continuous images and can give erroneous MTF results, because the sampling of digital devices is not properly taken into consideration. The slanted-edge method, which is analogous to impulse response determination for an electronic system, is a common technique for measuring an accurate MTF because it requires only a relatively simple experimental arrangement. In neutron imaging, this technique consists of imaging a thin, sharp slanted edge onto the detector. The edge target must be made from a strongly neutron-absorbing material such as Gadolinium. The ISO 12233 standard presents the general methodology for MTF measurement based on this technique (Jespers et al., 1976). An interesting comparison of algorithms used for MTF measurement based on the slanted-edge technique is presented in (Samei et al., 2005).


The first goal of this chapter is to apply this methodology, overcoming some under-sampling and slanted-edge identification difficulties, to establish an accurate MTF curve to be used for determining the system response and the effective spatial resolution limit of the neutron imaging system studied (Domanus, 1992). The components of this imaging system are placed in an aluminium light-tight box (175 mm × 105 mm × 120 mm) positioned vertically by a holder fixed on the metallic table of the neutron radiography facility. The camera consists of a 250 μm thick LiF–ZnS scintillator, a front-surfaced Al/SiO2 mirror and a high-sensitivity Sony CCD camera with 2×10⁻⁵ lx (at F1.4) as the minimum required illumination (Figure 1). It has a bit depth of 8 bits. The camera was manufactured and assembled by "Neutron Optics", France (www.NeutronOptics.com). According to the manufacturer's specifications, this system has a maximum intrinsic spatial resolution of 200 µm (171.5 µm calculated). All necessary experiments were performed at the neutron radiography facility of the Es-Salam research reactor (Kharfi et al., 2005). To achieve this task, a MATLAB code was developed.

Fig. 1. The neutron imaging system studied.

#### **2.1.1 Theoretical approach and experimental procedures**

The MTF is the spatial frequency response of an imaging system. It characterizes the image sharpness produced by such systems. There are many methods for determining this function; one of them, based on the analysis of a slanted-edge image of a suitable target, is used in this work. The methodology followed is described by the ISO 12233 standard. The advantages of this method are its simplicity and the minimal arrangement required during the experimental phase. The MTF curve is calculated from the Line Spread Function (LSF). The LSF is computed by taking the first derivative of the Edge Spread Function (ESF). The ESF represents the pixel response, in terms of gray levels, along lines perpendicular to the edge. Fourier transformation and subsequent normalization are then applied to the LSF to compute the MTF (Estribau & Magnau, 2004). The MTF describes the amplitude, or relevant contrast, by which sine functions of different frequencies are modulated by an imaging system. It gives a measure of the degradation incurred when transferring the "physical" object into an image. An MTF value of 1 indicates that the full amplitude



is transferred by the imaging system, while an MTF value of 0 indicates that no signal at all is transferred. In neutron imaging, an object can be represented in the spatial domain by a 3D distribution function of neutron attenuation coefficients, a distribution of material densities or any suitable distribution, and in the frequency domain by a distribution of frequencies that represent fine (high-frequency) and coarse (low-frequency) details. A good MTF must be sensitive to any change in the system input frequencies of the target or the examined object. In our case, the MTF's deviation from the ideal value of 1 is due to: 1) the scintillator/mirror/CCD detector combination, 2) the geometrical exposure conditions, especially the beam divergence angle and the collimation ratio (L/D), and 3) scattered neutrons. Indeed, a thorough understanding of the exposure geometry and conditions, the transfer of the object's or target's input frequencies, and system response theory is very important for the establishment and correct analysis of the MTF obtained.
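The ESF → LSF → MTF chain just described can be sketched in a few lines. The authors' MATLAB code is not reproduced here; the following Python/NumPy version uses a synthetic Gaussian-blurred edge (the blur width and sampling grid are illustrative assumptions):

```python
import numpy as np

# Fine uniform grid across the edge (positions in pixels; illustrative).
x = np.linspace(-5.0, 5.0, 1024)
dx = x[1] - x[0]

# Synthetic system blur: a Gaussian line spread function (sigma assumed).
sigma = 0.8
lsf_true = np.exp(-x**2 / (2.0 * sigma**2))

# The ESF is the blurred step edge, i.e. the running integral of the LSF.
esf = np.cumsum(lsf_true) * dx

# Step 1: differentiate the ESF to recover the LSF.
lsf = np.gradient(esf, dx)

# Step 2: Fourier-transform the LSF and normalize to 1 at zero frequency.
mtf = np.abs(np.fft.rfft(lsf))
mtf /= mtf[0]
freqs = np.fft.rfftfreq(lsf.size, d=dx)   # spatial frequency, cycles/pixel

print(round(float(mtf[0]), 3))            # 1.0 (full transfer at f = 0)
```

For a real measurement the ESF would come from the projected slanted-edge data rather than a synthetic profile; windowing and binning details are deliberately omitted from this sketch.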

The slanted-edge method consists of imaging an edge onto the detector, slightly tilted with regard to the rows (or the columns) (Reichenbach et al., 1991). A slanted edge therefore allows the horizontal Spatial Frequency Response (SFR) of the system to be obtained. In this case, the response of each pixel line across the edge gives a different ESF, due to the phasing of the pixel centre locations as the edge location changes with each row of pixels. These ESFs are under-sampled, but it is possible to mathematically increase the sampling rate by projecting the data along the edge.

The main steps of the procedure used in this chapter to establish the MTF are the following:

1. Neutron radiography of the target, scanning of all lines, and averaging and adjusting the gray levels according to open-beam and dark-current images. This produces the ESF, also called the edge profile function. The distances in pixels are corrected depending on the angle of the edge slope. The edge should be sloped for smooth averaging.
2. Taking the first derivative of the ESF to generate the LSF.
3. Applying a discrete Fourier transform to the LSF. This produces the MTF.

Fig. 2. Sampling improvement by profile line projection after edge identification (for clarity, just the first and last points are presented).

In this chapter, a MATLAB code based on the slanted-edge method was developed. This program gives users an accurate edge identification application and a procedure for sampling improvement and ESF construction, Line Spread Function computation, and MTF determination and plotting, with the possibility of changing input data such as the region of interest (ROI) dimensions (m lines, n pixels) and the pixel size (display monitor). The output data can be displayed directly or saved in Excel format. The most important functions of this program are:

**1. Edge Identification:** The initial task in MTF measurement is the identification of suitable edges for analysis. With respect to the vertical or horizontal axes of the detection pixel grid, the edge must be oriented with a minimum tilt angle for an along-scan or cross-scan MTF determination (Figure 2). This minimum angle is required, as well as a sufficient length, for suitable ESF construction (Kohm, 2004). It can easily be proven that the target edge should advance by at least two horizontal pixels along its length to guarantee that sub-pixels will be placed on a uniform grid after projection for ESF construction (Figure 2). In our case, the target length is 4.5 cm (~240 pixels), so the minimum required angle is 0.47°. As the orientation of the angle changes, the resolution also varies, becoming either coarser or finer. The candidate edge must also meet contrast and noise requirements for selection (Jain, 1989). This is because the MTF value obtained with a high-contrast edge target is always larger than that obtained with a lower-contrast target at a given spatial frequency; therefore, when the MTF of a system is presented, the contrast of the line pair used to measure the MTF should be given (Williams, 2004). In our case, the maximum contrast of the target image obtained is 37.68%, calculated by the Michelson formula (Michelson, 1927). Noise can also affect the contrast transfer and the overall detection capability of an imaging system; the MTF measured by the slanted-edge method will therefore be biased. The use of multiple lines of profile (m lines) to improve the data can reduce this source of error but will usually not eliminate it completely (Burns, 2000). For practical purposes, different tilt angles (θ) varying from 1° to 5° were tested, allowing relatively high- and low-resolution ESF generation. Before using our program for edge identification and MTF establishment, a Sobel edge detection operator followed by thresholding and binary morphological processing (filtering) is used to identify the edge with its proper (real) orientation (θp) for verification purposes (Figure 3). A θp value of 0.95° was found for a tested tilt angle (θ) of 1°, which proves the good target fixing and correct geometrical exposure conditions. After this verification, the image of the target is first read and displayed in a suitable format (.bmp) by the developed program. The edge locations were accurately determined for each line of profile of the selected ROI. An initial estimate of the location and angle of the edge is then determined by performing a least-squares regression of selected points along the edge. The approximate equation of the identified edge's average straight line is given by:

$$Y = ax + b \tag{1}$$

with

$$a = \frac{m\left(\sum\_{i=1}^{m} x\_i y\_i\right) - \left(\sum\_{i=1}^{m} x\_i\right)\left(\sum\_{i=1}^{m} y\_i\right)}{m\left(\sum\_{i=1}^{m} x\_i^2\right) - \left(\sum\_{i=1}^{m} x\_i\right)^2}, \qquad b = \frac{\left(\sum\_{i=1}^{m} x\_i^2\right)\left(\sum\_{i=1}^{m} y\_i\right) - \left(\sum\_{i=1}^{m} x\_i\right)\left(\sum\_{i=1}^{m} x\_i y\_i\right)}{m\left(\sum\_{i=1}^{m} x\_i^2\right) - \left(\sum\_{i=1}^{m} x\_i\right)^2} \tag{2}$$
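Equation (2) is the standard least-squares solution for the edge line of Eq. (1). The sketch below evaluates it on hypothetical sub-pixel edge data (x_i = row number of each profile line, y_i = detected edge column on that line) and cross-checks the result against NumPy's own fit; the slope, intercept and noise level are assumptions:

```python
import numpy as np

# Hypothetical per-line edge data: a slightly noisy straight edge with
# slope tan(~0.95 deg), mimicking the verified tilt angle.
rng = np.random.default_rng(0)
x = np.arange(240, dtype=float)                 # m = 240 profile lines
y = 0.0167 * x + 55.0 + rng.normal(0.0, 0.05, x.size)

m = x.size
sx, sy = x.sum(), y.sum()
sxy = (x * y).sum()
sxx = (x * x).sum()

# Eq. (2): least-squares slope a and intercept b of the line Y = ax + b.
denom = m * sxx - sx**2
a = (m * sxy - sx * sy) / denom
b = (sxx * sy - sx * sxy) / denom

# Cross-check against NumPy's reference least-squares fit.
a_ref, b_ref = np.polyfit(x, y, 1)
print(np.allclose([a, b], [a_ref, b_ref]))      # True
```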

In Eq. (2), *m* is the number of data points (the number of lines of the selected ROI), *xi* is the row number and *yi* is the sub-pixel edge position.

The slanted-edge target used is a 25 µm thick foil of Gadolinium, a highly neutron-absorbing material that allows the production of suitable images with the necessary contrast between dark and light parts (the edge). The tilt angles used are 1° for the Sobel edge detection test and 5° for the MTF determination. To check the dependence between the tilt angle and the MTF result, some other target tilt angles, with values of less than 10°, were also tested.

The main characteristics of the neutron imaging system studied are presented in Table 1.

| Characteristic | Value |
|---|---|
| Detector element | Sony ICX419ALL, ½-inch interline-transfer CCD image sensor |
| Number of pixels | 752 (H) × 582 (V); 512 × 512 used |
| Unit cell size | 8.6 µm × 8.3 µm |
| Minimum illumination | 0.00002 lx at F1.4 |
| Bit depth | 8 bits (256 levels) |
| Manual gain control | 8–38 dB |
| Signal-to-noise ratio | 52 dB |
| Shutter speed / frame rate | Hi: 1/50, 1/125, 1/250, 1/500, 1/1000, 1/2000 s; Lo: 1, 2, 4, 8, 16, 32, 64, 128, 256 frame(s) |
| Dynamic range | Relatively wide at standard imaging conditions |
| Neutron scintillator | 250 μm LiF–ZnS, green emission (~520 nm) |
| Mirror | Front-surfaced Al/SiO2 mirror; optical flatness 2λ/25 mm; reflectivity 94% |

Table 1. Main characteristics of the neutron imaging system studied.

Neutron radiography images were captured at the neutron radiography facility of the Es-Salam research reactor under a neutron beam intensity of 1.5×10⁶ n/cm²/s and with a collimation ratio (L/D) of ~125. The exposure time to the neutron beam was 15 seconds. The selected camera gain was 18 dB. After target image capture, the developed code performs the following operations for MTF computation and plotting:

1. Target image reading in a suitable format. The minimum required tilt angle, such that the edge advances by at least two horizontal pixels over the selected profile lines, is given by:

$$\tan\theta = \frac{2P\_w}{N\_{pl}P\_h} \tag{3}$$

Where: *Pw* is the detected pixel width, *Ph* is the detected pixel height and *Npl* is the number of selected profile lines.

2. ROI selection: 240 lines × 110 pixels were chosen from the image of the target. The main criteria for ROI selection are: 1) the width (n pixels) of the ROI must perfectly cover the edge area; 2) the length of the ROI (m lines) should be selected as long as possible to ensure the reduction of noise and of low-frequency MTF estimation errors.

3. Edge identification and estimation of the real tilt angle. Edge positions were determined on a line-by-line (ESF) basis using pixel profile information. For each single blurred line of profile (ESF), a simple digital differentiation is applied to detect the maximum slope. The sub-pixel edge points are determined by fitting a cubic polynomial to the edge data using four values around the maximum-slope point. Then, the zero-crossing location of the second derivative of the polynomial indicates the curve inflection point,

Fig. 3. Result of Sobel edge detection operator followed by thresholding and binary morphological processing (filtering) to identify the proper edge orientation (θp~0.95°, for θ=1°).

**2. Sampling improvement and Edge Spread Function Construction:** The ESF is the system response to the input of an ideal edge. As the output of the system is a sampled image, the fidelity of the edge spread function using a single line of image data is insufficient for MTF analysis. Aliasing due to under sampling in the camera, along with phase effects and the angle of the actual edge with respect to the sampling grid will cause variable results for a single line. The phase effects and edge angle may be exploited, however, to provide a high fidelity measurement of the ESF. Construction of the ESF is graphically represented in Figure 2. The edge is identified in the image as described above. A line is then constructed perpendicular to the edge. For a given line of image data, each point around the edge transition is projected onto the perpendicular line. This process is then repeated for each subsequent line of image data along the edge. The difference in sub-pixel location of the edge with respect to the sampling grid for different lines in the image results in differences in the location of the projected data point onto the perpendicular. This yields a high fidelity representation of the system response to an edge. Small changes in the edge angle used during construction of the super-sampled edge affect the quality of the resulting ESF. The angle is systematically adjusted by small increments of 0.25° around the initial estimate (5°) which is equivalent to one pixel shift of one edge extremity in left or right direction for an along scan. The resulting curve fit (equation) is used to refine the edge angle estimate for the final ESF construction. After the individual ESF data points have been determined, the data must be conditioned and resampled to a fixed interval. In general, the angle of the edge with respect to the sampling grid does not produce uniformly distributed data points along the perpendicular to the edge. 
Also, with longer edges, many data points may be located in close proximity to one another. Suitable 3 order polynomial data fitting is used to re-sample the data to uniformly spaced sample points. More sophisticated fitting algorithm is presented in reference (Cleveland, 1985).

To avoid data interference when performing projection operation, it is important that the selected tilt angle θ (projection angle) obeys the following condition:

Fig. 3. Result of Sobel edge detection operator followed by thresholding and binary morphological processing (filtering) to identify the proper edge orientation (θp~0.95°, for

θ

selected tilt angle θ (projection angle) obeys the following condition:

**2. Sampling improvement and Edge Spread Function Construction:** The ESF is the system response to the input of an ideal edge. As the output of the system is a sampled image, the fidelity of the edge spread function using a single line of image data is insufficient for MTF analysis. Aliasing due to under sampling in the camera, along with phase effects and the angle of the actual edge with respect to the sampling grid will cause variable results for a single line. The phase effects and edge angle may be exploited, however, to provide a high fidelity measurement of the ESF. Construction of the ESF is graphically represented in Figure 2. The edge is identified in the image as described above. A line is then constructed perpendicular to the edge. For a given line of image data, each point around the edge transition is projected onto the perpendicular line. This process is then repeated for each subsequent line of image data along the edge. The difference in sub-pixel location of the edge with respect to the sampling grid for different lines in the image results in differences in the location of the projected data point onto the perpendicular. This yields a high fidelity representation of the system response to an edge. Small changes in the edge angle used during construction of the super-sampled edge affect the quality of the resulting ESF. The angle is systematically adjusted by small increments of 0.25° around the initial estimate (5°) which is equivalent to one pixel shift of one edge extremity in left or right direction for an along scan. The resulting curve fit (equation) is used to refine the edge angle estimate for the final ESF construction. After the individual ESF data points have been determined, the data must be conditioned and resampled to a fixed interval. In general, the angle of the edge with respect to the sampling grid does not produce uniformly distributed data points along the perpendicular to the edge. 
Also, with longer edges, many data points may be located in close proximity to one another. Suitable 3 order polynomial data fitting is used to re-sample the data to uniformly spaced sample points. More sophisticated fitting algorithm is presented in reference (Cleveland, 1985). To avoid data interference when performing projection operation, it is important that the

*m* = number of data (line number of the selected ROI);

Where:

θ=1°).

*Xi* = row number;

*Yi* = sub-pixel edge position.

$$\tan \theta \ge \frac{p\_w}{s\_{pl} p\_h} \tag{3}$$

Where: *Pw* is the detected pixel width, *Ph* is the detected pixel height and *Npl* is the number of selected profile lines.

The slanted-edge target used is a 25 µm thick foil of Gadolinium which is a highly neutron absorbing material allowing the production of suitable images with the necessary contrast between dark and light parts (edge). The used tilt angles are: 1° for the Sobel edge detection test and 5° for the MTF determination. To check the dependence between tilt angle and MTF result, some others target tilt angles, of values less than 10°, were tested.

The main characteristics of the neutron imaging system studied are presented in the following table (table.1).


Table 1. Main characteristics of the neutron imaging system studied.

Finally, all the sub-pixel edge locations (one per ESF line) are forced to lie on a straight line, on the assumption that the edge itself is straight: a line is fitted through the sub-pixel edge locations obtained in the previous step, and the actual edge locations are then declared to lie on that line. The least-squares approach to finding the straight line determines the best approximating line by minimizing the sum of the squares of the differences between the edge points on the approximating line and the measured edge-point values. All these operations are performed automatically by the developed program, allowing the establishment and correct analysis of the MTF obtained.

4. Sampling improvement by determination of the sub-sampling factor, and ESF generation using the data-projection technique onto a line orthogonal to the detected edge. Before data projection, the aligned edge data of every line of profile (ESF) taken from the ROI of the target image is interpolated with cubic splines; all the ESFs are splined using the spline.m function of MATLAB, and finally the average of the splined ESFs is determined;
5. ESF data selection between 10% and 90% of the interval between the maximum and minimum gray-level values around the edge area (Fig. 6(b)), to avoid considering unnecessary frequencies when applying the FFT;
6. Fitting a third-order polynomial equation to the averaged ESF data;
7. Numerical differentiation of the fitted ESF curve, allowing the determination of the LSF;
8. Fast Fourier transform of the LSF, with the application of a Hamming FFT window;
9. MTF plotting after normalization of the FFT magnitude.

#### **2.1.2 Example of practical application and experimental results**

In order to characterize the imaging system being studied, the proposed procedure and the developed code were used. The effective spatial resolution was determined for the recommended MTF value of 0.1 (*MTF10*), which corresponds to the separation power (resolution) of the average human eye. Another very interesting MTF characteristic is *MTF50*, corresponding to an MTF value of 0.5 (Nyquist frequency), which provides the limit between an under-sampled and poorly displayed image (very low frequencies) and an optimally sampled and well-displayed image (medium and high frequencies). The neutron image obtained of the examined edge target is shown in Figure 4, and Figure 5 shows the main window (screen shot) of the developed program used for MTF computation. The spatial resolution limit (*Rl*) of our system, in mm, is calculated by the following expression:

$$R\_l = \frac{1\,[\text{lp}]}{2\,\text{MTF10}\,[\text{lp/mm}]} \tag{4}$$

Fig. 4. Target neutron radiograph, showing the gadolinium-covered surface and the uncovered surface (background).

Fig. 5. Screen shot of the developed MATLAB program (code).

Results for the main steps of the MTF determination performed by the developed program, such as edge identification, profile-line superposition and interpolation (ESFs), and data projection using the SPLINE function, are presented in Figures 6(a) and 6(b).

Fig. 6. (a) Example of edge identification by the developed code: after decomposing the original image pixels into small intervals (polynomial fit steps), the accurate edge location is detected in each horizontal line of profile across the edge, and an average straight line through all the detected locations is determined by least squares, thus allowing data projection to improve the ESF sampling (spline); (b) superposed profile lines using the SPLINE function of MATLAB.

Finally, the proposed code plots the MTF of the studied imaging system (Figure 7). The analysis of this MTF curve allows the characterization of the spatial response of the system and the determination of the spatial resolution limit.
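The overall pipeline (edge fit by equations (1)-(2), projection onto the edge normal, ESF resampling, differentiation to the LSF, Hamming window and FFT) can be sketched in a few lines. The chapter's code is written in MATLAB; the NumPy version below is only an illustrative sketch with simplified choices (parabolic rather than cubic sub-pixel refinement, linear rather than spline resampling), and the function name, synthetic test image and 0.1 mm pixel pitch are our own assumptions, not values from the text.

```python
import numpy as np

def slanted_edge_mtf(img, pixel_pitch_mm=1.0, oversample=4):
    """Illustrative sketch of the slanted-edge MTF procedure described above.
    Simplified relative to the chapter's MATLAB code (parabolic sub-pixel
    refinement, linear resampling instead of splines)."""
    rows, cols = img.shape
    r_idx = np.arange(rows)
    x = np.arange(cols)

    # Step 3: sub-pixel edge position in every profile line (maximum slope).
    edge = np.empty(rows)
    for r in range(rows):
        g = np.gradient(img[r].astype(float))
        k = int(np.clip(np.argmax(np.abs(g)), 1, cols - 2))
        den = g[k - 1] - 2 * g[k] + g[k + 1]
        edge[r] = k + (0.5 * (g[k - 1] - g[k + 1]) / den if den else 0.0)

    # Equations (1)-(2): least-squares line Y = aX + b through the edge points.
    a, b = np.polyfit(r_idx, edge, 1)

    # Step 4: project all pixels onto the edge normal -> super-sampled ESF,
    # then resample to a uniform grid (1/oversample pixel spacing).
    dist = (x[None, :] - (a * r_idx[:, None] + b)) * np.cos(np.arctan(a))
    order = np.argsort(dist.ravel())
    d_sorted = dist.ravel()[order]
    v_sorted = img.ravel().astype(float)[order]
    grid = np.arange(d_sorted[0], d_sorted[-1], 1.0 / oversample)
    esf = np.interp(grid, d_sorted, v_sorted)

    # Steps 7-9: differentiate (LSF), Hamming window, FFT, normalise.
    lsf = np.gradient(esf) * np.hamming(esf.size)
    mtf = np.abs(np.fft.rfft(lsf))
    mtf /= mtf[0]
    freq = np.fft.rfftfreq(esf.size, d=pixel_pitch_mm / oversample)  # lp/mm
    return freq, mtf

# Synthetic 5°-tilted edge with a smooth (logistic) transition, 0.1 mm pixels.
rows, cols = 100, 64
yy, xx = np.mgrid[0:rows, 0:cols]
img = 1.0 / (1.0 + np.exp(-(xx - (32 + np.tan(np.deg2rad(5)) * yy)) / 1.5))
freq, mtf = slanted_edge_mtf(img, pixel_pitch_mm=0.1)

f10 = freq[np.argmax(mtf < 0.1)]    # first frequency with MTF below 0.1
r_limit_mm = 1.0 / (2 * f10)        # equation (4)
```

On this synthetic edge the recovered MTF decreases monotonically from 1, and the MTF10 frequency converts to a resolution limit through equation (4) exactly as done for the real system above.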

Fig. 7. MTF of the studied neutron imaging system.

Fig. 8. MTF curves for the components of a radiographic imaging system, and the composite MTF of the entire system, introduced for comparison purposes. A: MTF of the screen-film combination; B: MTF of a 1 mm focal spot with 90 cm between the focal spot and the object and 10 cm between the object and the film; C: MTF for 0.3 mm of object motion during the exposure; D: composite MTF of the entire imaging system (Hendee & Ritenour, 2002).

When critically examining the obtained MTF, the following remarks and conclusions can be drawn:

1. The most interesting and best-exploited region of the MTF is the interval between the Nyquist frequency and the human eye's resolution. We can see that the MTF curve obtained drops rapidly in this region. This is not a good characteristic of the imaging system studied, because it allows only a fair-contrast image display when compared with other imaging systems whose MTF drops gradually (Figure 8); in the latter case, the CCD camera presents optimum separation ability (resolution) and, in terms of gray levels (contrast), reproduces the image of the target fairly.
2. The spatial resolution limit at MTF10, corresponding to an MTF of 0.1, is equal to 1.2 *lp*¹*/mm*. This value is equivalent to 416 µm (according to equation (4)) and is a very significant and effective one, because it is greater than the intrinsic spatial resolution of ~200 µm of the studied imaging system. The deviation from the intrinsic resolution is due to the following factors:
   1. The geometric properties of the neutron radiography facility; in our case, two parameters are dominant:
      a. The L/D neutron collimation ratio (~125): for a practical object-to-detector distance (Lf) of 5 cm, the induced geometrical unsharpness Ug = Lf/(L/D), due to the beam spread when a point is projected at the edge position, is equal to 400 µm.
      b. The scintillator thickness: for a scintillator-based converter system, the first source of blur is the spreading of the emitted light within the scintillator material. The spreading is determined by the material's thickness and by the design of the scintillator in terms of its crystal structure and its neutron-absorption and light-emission properties.
   2. The imaging technique: the indirect-conversion method of the image detector used can scatter light over several pixels, further limiting the effective resolution of the system, more so than indicated by the pixel size alone (the intrinsic resolution).
   3. The optical properties: the intrinsic resolution can often be degraded by other factors that introduce blurring of the image, such as improper focusing and light reflection by the mirror.

Despite all these constraints, the geometric blurring can be minimized by reducing the object-to-detector distance as much as possible (e.g., contact) and by increasing the collimator inlet aperture-to-object distance (L) to a suitable level that does not affect the neutron beam intensity considerably.

However, one cannot isolate spatial-resolution effects on neutron image quality from the effects due to quantum mottle and electronic noise under typical digital image acquisition conditions.

<sup>1</sup> *lp*: line pairs or cycles.
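The two figures quoted in remark 2 can be checked directly from the values given in the text (note that 1/(2 × 1.2) mm is 416.7 µm, which the chapter rounds to 416 µm):

```python
# Geometric unsharpness Ug = Lf / (L/D), with Lf = 5 cm and L/D ~ 125.
Lf_mm = 50.0
Ug_mm = Lf_mm / 125.0
print(Ug_mm)                     # 0.4 mm, i.e. 400 µm

# Spatial resolution limit at MTF10 = 1.2 lp/mm, equation (4).
Rl_mm = 1.0 / (2 * 1.2)
print(round(Rl_mm * 1000, 1))    # 416.7 µm (quoted as 416 µm in the text)
```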

#### **2.2 Temporal resolution**

#### **2.2.1 Temporal resolution estimation**

The temporal resolution is determined by the frame rate and by the exposure time of the camera. These limit the maximal speed of dynamic events that can be captured well by the neutron camera. Rapid dynamic events that occur faster than the frame rate of a CCD camera are not visible (or else are captured incorrectly) in the recorded video sequences. There are two typical visual effects in video sequences caused by very fast motion (Shechtman, 2005): one effect (motion blur) is caused by the exposure time of the camera, and the other (motion aliasing) is due to the temporal sub-sampling introduced by the frame rate. In our case of neutron imaging, only motion blur is of interest. It occurs when the camera integrates the light coming from the scene (the scintillator) during the exposure time in order to generate each frame. As a result, fast-moving objects produce a marked blur along their trajectory, often resulting in distorted or unrecognizable object shapes. The faster the object moves, the stronger this effect, especially if the trajectory of the moving object is not linear (Shechtman et al., 2002). To quantify this motion blur, and thereby determine the temporal resolution limit, we proceed as follows: neutron imaging video of a rotating cadmium indicator is captured at different rotational speeds (0.0017, 0.45, 1.2 and 1.65 RPS²) and different frame rates (Figure 9). The indicator used is a thin, 7.9 mm wide cadmium blade, put into rotational motion by a speed-controlled electric motor. Video sequences (Vid. Seq.) for the different indicator rotational speeds are captured under the following conditions (Table 2):

Fig. 9. Experimental arrangement for the characterisation of the induced motion blur.

The video sequences obtained are examined with suitable image processing and analysis software (ImageJ). The CCD neutron camera used integrates the light coming from the scintillator screen during the exposure time in order to generate each frame. The software enables the calculation of the motion blur B, using equation (5), for each video sequence obtained. The motion blur in this case is taken as the width (in pixels) of the shadow trailing behind the rotating indicator in each frame, measured at the middle of the indicator on a selected frame. The data obtained allow us to determine the effective limits of temporal resolution for each rotational speed.

$$B = I\_{wi} - I\_{wr}\left[\frac{L\_f}{L}\left(1 - \frac{D}{I\_{wr}}\right) + 1\right] \tag{5}$$

where *Iwr* is the real indicator width (7.9 mm), *Iwi* is the indicator width measured on the obtained image, *Lf* is the distance between the indicator position and the scintillator screen (10.5 mm), *L* is the distance between the collimator inlet aperture and the indicator position (2500.5 mm), and *D* is the collimator inlet aperture diameter (20 mm).

<sup>2</sup> RPS: rounds per second.

<sup>3</sup> fps: frames per second.
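Equation (5) can be applied directly with the constants quoted above. The helper below is a hypothetical illustration (the function name is ours, and B comes out in whatever units *Iwi* is measured in; the text measures it in pixels after conversion):

```python
def motion_blur(I_wi, I_wr=7.9, Lf=10.5, L=2500.5, D=20.0):
    """Equation (5): motion blur B from the measured indicator width I_wi.
    Default constants (mm) are the values quoted in the text."""
    return I_wi - I_wr * ((Lf / L) * (1.0 - D / I_wr) + 1.0)

# Sanity check: a measured width equal to the purely geometric (blur-free)
# image width gives B = 0; any wider measurement gives a positive blur.
static_width = 7.9 * ((10.5 / 2500.5) * (1.0 - 20.0 / 7.9) + 1.0)
print(round(motion_blur(static_width), 9))   # 0.0
```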

| Video sequence | Frame rate (fps³) | Neutron beam intensity | Camera gain | Exposure time (s) |
| --- | --- | --- | --- | --- |
| Vid. Seq. 1 | 12.5 | 1.5×10⁶ n/cm²/s | 22 dB | 90′ |
| Vid. Seq. 2 | 25 | 1.5×10⁶ n/cm²/s | 22 dB | 90′ |
| Vid. Seq. 3 | 50 | 1.5×10⁶ n/cm²/s | 22 dB | 90′ |
| Vid. Seq. 4 | 125 | 1.5×10⁶ n/cm²/s | 22 dB | 90′ |
| Vid. Seq. 5 | 250 | 1.5×10⁶ n/cm²/s | 38 dB | 90′ |

Table 2. Experimental video-sequence capture conditions.

#### **2.2.2 Experimental results**

A selected frame from the captured video sequences is presented in Figure 10. Preliminary analysis of the video sequences obtained shows that the temporal resolution is affected principally by motion blur and not by motion-based aliasing.

Fig. 10. Selected frame showing the indicator position and the motion-blur width, taken from the video sequence captured at a frame rate of 12.5 fps and a cadmium-indicator rotation speed of 0.45 RPS.

The variation of motion blur as a function of frame rate is presented in Figure 11. Graphs 1 and 2 correspond to indicator rotation speeds of 0.45 and 1.2 RPS, respectively.

On graphs 1 and 2 we can easily verify that the motion blur decreases when the frame rate is increased, up to critical values where it becomes constant. It was observed that the critical points are located at 12.5, 50, 125 and 250 fps on the frame-rate axis for cadmium-indicator rotation speeds of, respectively, 0.0017, 0.45, 1.2 and 1.65 RPS. These critical points correspond to the temporal resolution limits that keep motion blur at a minimum, although with a low image dynamic range. The temporal resolution limits obtained for the different rotational speeds are shown in Figure 12.
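The critical points above can serve as a small lookup for choosing a capture frame rate. The sketch below interpolates linearly between the four measured points and rounds up to the nearest frame rate available in Table 2; the function name and the interpolation choice are ours, not the chapter's.

```python
import numpy as np

# Critical points reported above: rotation speed (RPS) -> temporal
# resolution limit, i.e. the minimum frame rate that avoids motion blur.
speeds_rps = np.array([0.0017, 0.45, 1.2, 1.65])
limits_fps = np.array([12.5, 50.0, 125.0, 250.0])

# Frame rates actually used in the experiments (Table 2).
available_fps = np.array([12.5, 25.0, 50.0, 125.0, 250.0])

def min_frame_rate(speed_rps):
    """Hypothetical helper: linearly interpolate the measured limits and
    round up to the nearest available frame rate."""
    needed = np.interp(speed_rps, speeds_rps, limits_fps)
    return float(available_fps[np.searchsorted(available_fps, needed)])

print(min_frame_rate(0.45))   # 50.0
```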

Temporal and Spatial Resolution Limit Study of

summarized in reference (SIT Technical Note, 1994).

and is typically applied at the analogue-to-digital conversion step.

**3.1 Neutron video capture conditions study** 

high bit depth (+14 bits).

**in neutron imaging** 

before.

Radiation Imaging Systems: Notions and Elements of Super Resolution 123


Fig. 11. Motion blur variation of video sequences as a function of frame rate for two rotation speeds of the cadmium indicator. [Figure: motion blur (pixels) versus framerate (frames/s), with a fit of the motion blur data and the critical point marked, for rotating speeds of 0.45 RPS and 1.20 RPS.]

Fig. 12. Temporal resolution limits as a function of indicator rotation speed (RPS). The temporal resolution limits allowing optimum image quality in terms of motion blur (with, certainly, a low dynamic range) presented in this figure are for specific exposure and image capture conditions, namely a neutron beam intensity of 1.5 × 10⁶ n/cm²/s and a camera signal gain of 22 dB and 38 dB (last point). [Figure: exponential-decay fit y = A1·exp(−x/t1) + y0 of the temporal resolution (s) data, with A1 = 0.07949 ± 0.00302, t1 = 0.26834 ± 0.03292, y0 = 0.00536 ± 0.00182 and adjusted R² = 0.99568.]

As we can see in figure 11, the temporal resolution limit is closely related to the indicator rotation speed. These values give an idea of the optimal frame rate that must be selected to avoid motion blur, but not necessarily the most suitable one for optimum frame exposure. With low frame exposure, contrast and dynamic range can be seriously affected

especially when the neutron beam intensity is low. Finally, it is important to mention that the captured video sequences are relatively poor in terms of gray-level digitization and contrast because of the limited dynamic range and bit depth (8 bits) of the imaging system studied. In this work, image quality is analyzed and judged with respect to the performance of the imaging system used for neutron video capture. To improve the video image quality, it is recommended to increase the neutron beam intensity at the sample level to the maximum available and to use a CCD camera with a higher bit depth (14 bits or more).
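The exponential-decay dependence of the temporal resolution limit on rotation speed, reported in the fit annotation of Fig. 12 (y = A1·exp(−x/t1) + y0), can be evaluated numerically. The sketch below uses the fitted parameter values from that figure; the function name is ours:

```python
import math

# Fitted parameters of the temporal-resolution-vs-rotation-speed curve
# (values taken from the exponential-decay fit annotation of Fig. 12)
Y0, A1, T1 = 0.00536, 0.07949, 0.26834

def temporal_resolution_limit(rps: float) -> float:
    """Temporal resolution limit (s) for a given indicator rotation speed (RPS)."""
    return A1 * math.exp(-rps / T1) + Y0

slow = temporal_resolution_limit(0.45)  # the slower indicator speed studied
fast = temporal_resolution_limit(1.20)  # the faster indicator speed studied
# faster motion demands a shorter temporal resolution limit
assert fast < slow
```

The assertion simply restates the trend visible in Fig. 12: the resolution limit decays rapidly and then flattens as rotation speed increases.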

#### **3. Criteria for best video capture results and application of super resolution in neutron imaging**

Dynamic process imaging using a neutron beam is a very powerful investigative tool for light materials, elements and substances such as hydrogen, water, lubricants and other relevant materials. The most widely used neutron imaging system is based on a CCD (charge-coupled device) camera and a scintillator screen. Because of the complexity and large number of operations necessary for capturing dynamic processes with any neutron digital imaging system (neutron camera), specific optimum experimental conditions and procedural accuracy are required. The sequence of events required to capture a single image with a full-frame CCD camera system is summarized in reference (SIT Technical Note, 1994).

In neutron imaging with a CCD-based neutron camera, there are several camera operating parameters that modify the readout stage of image acquisition and have an impact on image quality (Spring, K. R. et al., www.microscopyu.com/articles/digitalimaging/ccdintro.html). The two most important ones are the frame rate (FRM) and the gain (G). The frame rate (also referred to as the readout rate) of most scientific-grade CCD cameras is adjustable. The maximum achievable rate is a function of the processing speed of the camera electronics, which reflects the time required to digitize a single pixel. Applications aimed at tracking rapid kinetic processes require fast readout and frame rates in order to achieve adequate temporal resolution; in certain situations, a video rate of 30 frames per second or higher is necessary. A second acquisition factor that can affect image quality, because it modifies the CCD readout process, is the electronic gain (G) of the camera system. The gain adjustment of a digital CCD camera system defines the number of accumulated photoelectrons that determine each gray-level step distinguished by the readout electronics, and is typically applied at the analogue-to-digital conversion step.
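To make the gain values used later in this chapter concrete, a gain expressed in dB can be converted to a linear amplification factor. This is an illustrative sketch only, assuming the usual 20·log10 amplitude convention for camera signal gain; the helper name is ours:

```python
import math

def gain_factor(gain_db: float) -> float:
    """Linear amplification factor for a gain given in dB (20*log10 convention)."""
    return 10 ** (gain_db / 20.0)

# the two gain settings that appear in this study
print(round(gain_factor(22), 1))  # -> 12.6
print(round(gain_factor(38), 1))  # -> 79.4
```

A higher gain thus trades fewer photoelectrons per gray-level step for increased sensitivity, at the cost of amplified noise.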

In this work, video sequence quality in terms of contrast, resolution and noise is studied, and the most suitable conditions for the flow process examination are determined as a function of the neutron beam intensity. The second objective of this work is the application of post-acquisition super resolution (SR) processing procedures to improve the quality of the neutron video obtained. The neutron imaging system used is the same as described before.

#### **3.1 Neutron video capture conditions study**

According to previous work, it was demonstrated that the optimum acquisition parameters for a water flow process capture are, respectively, 12.5 fps as the capture frame rate

Temporal and Spatial Resolution Limit Study of Radiation Imaging Systems: Notions and Elements of Super Resolution

(display 29.97 fps, MPEG-1 format) and a selected value of 22 dB as the signal gain for the imaging system being used (Kharfi et al., 2011). In this work, the captured video images have been analyzed in terms of contrast, noise and resolution according to the neutron beam intensity, in order to check the effect of the neutron beam intensity on the image quality. The quantification of contrast, noise and resolution of the captured neutron video is based on histogram and edge-profile analysis. Indeed, understanding image histograms is probably the single most important element in the analysis of images or video sequences from a digital CCD camera. A histogram can tell us whether or not the image has been properly exposed, and what adjustments will work best. Noise is the most important variable that can strongly affect the quality of a digital image or video. In this work, only gamma radiation noise (impulse noise), which appears in images as undesirable white spots, is considered. The gamma noise can be estimated through the statistical measure called the "standard deviation," which quantifies the typical variation of a pixel gray-scale value from its mean and "true" value. This concept can also be understood by looking at the histogram of a carefully selected bright region of interest (ROI) in a frame arbitrarily selected from the video sequence obtained.
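The ROI-based noise estimate described above can be sketched as follows. The gray values below are hypothetical, chosen only to mimic a bright, nominally uniform ROI in which one gamma-ray hit appears as a white spot:

```python
import statistics

# hypothetical 8-bit gray levels from a bright, nominally uniform ROI;
# the isolated 255 mimics a gamma-ray hit (impulse noise, white spot)
roi = [201, 199, 200, 202, 200, 198, 255, 201, 200, 199]

mean_level = statistics.fmean(roi)
noise = statistics.pstdev(roi)  # standard deviation used as the noise figure

print(round(mean_level, 1), round(noise, 2))
```

Note how a single impulse outlier inflates the standard deviation well beyond the spread of the remaining pixels, which is exactly why the ROI must be chosen carefully.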

A histogram can also describe the amount of contrast. Contrast is the difference in brightness between light and dark areas in a frame. In this work, the maximum contrast is estimated according to the Michelson formula given by Eq. 6 (Michelson, 1927):

$$C = \frac{I_{max} - I_{min}}{I_{max} + I_{min}} \tag{6}$$

with *Imax* and *Imin* representing the highest and lowest luminance of the analyzed image.
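Eq. 6 can be computed directly from a frame's extreme gray levels. A minimal sketch follows; the function name and the sample gray values are ours, for illustration only:

```python
def michelson_contrast(gray_levels):
    """Michelson contrast C = (Imax - Imin) / (Imax + Imin)."""
    i_max, i_min = max(gray_levels), min(gray_levels)
    return (i_max - i_min) / (i_max + i_min)

# hypothetical gray levels sampled from a frame's bright and dark regions
print(round(michelson_contrast([30, 80, 120, 200]), 3))  # -> 0.739
```

A perfectly uniform frame yields C = 0, while a frame spanning the full dynamic range from black approaches C = 1.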

In neutron imaging, it is common that not all important image quality criteria can be simultaneously optimized in a single image or video. Obtaining the best images within the constraints imposed by a particular process or experiment typically requires a compromise between the listed criteria, which often exert contradictory demands (Anderson, 2009). In order to study the influence of the neutron beam intensity on the neutron video quality, time-lapse sequences of water and car engine oil flows inside a metallic container system were captured. Several videos were captured under different experimental neutron exposure conditions of the flow process. Details and conditions of the video sequence capture are shown in table 3. The experimental setup used consists of two aluminium compartments which communicate through three holes of 1 mm, 1.5 mm and 2 mm in diameter (figure 13).


| Video sequence | Liquid | Neutron beam intensity (n/cm²/s) | Gain (dB) | Capture frame rate (fps) |
|---|---|---|---|---|
| 1 | water | 1.6 × 10⁶ | 22 | 12.5 |
| 2 | water | 1.44 × 10⁷ | 22 | 12.5 |
| 3 | car engine oil | 1.6 × 10⁶ | 22 | 12.5 |
| 4 | car engine oil | 4.8 × 10⁶ | 22 | 12.5 |
| 5 | car engine oil | 9.6 × 10⁶ | 22 | 12.5 |
| 6 | car engine oil | 1.44 × 10⁷ | 22 | 12.5 |

Table 3. Experimental video sequence neutron exposure and capture conditions.

Fig. 13. Experimental system used in the study and visualization of water and car engine oil flow process through three different diameter holes.

#### **3.2 Super resolution video sequence enhancement procedures and algorithms**

Super resolution (SR) is a means of producing a high-resolution (HR) image from one or a set of low-resolution, blurred and noisy images. It represents the capacity to transform noisy and blurred images obtained by a low-resolution (LR) imaging system (camera) into a higher-resolution (HR) image with greater detail than the LR images. SR methods are classified coarsely into two categories: (1) classical multi-image super resolution methods, and (2) example-based super resolution (Irani et al., 1991, Capel, 2004, Farsiu et al., 2004). Practically, classical methods allow small increases in resolution, by factors smaller than two (Baker et al., 2002). However, example-based SR methods have been shown to exceed the limits of classical methods (Freeman et al., 2000, Kim et al., 2008). Other, more sophisticated methods for image up-scaling based on learning edge models have also been developed and proposed (Sun et al., 2008). To achieve an HR image with classical multi-image super resolution methods, an image that minimizes the difference between its projection and each of the low-resolution (LR) images is usually estimated through suitable algorithms, based on an iterative process. The LR images must have different sub-pixel shifts, so that every image contains new information. To guarantee this last condition, three alternatives are available: 1. a number of frames captured within a very small lapse of time are extracted from a video; 2. the camera is moved from frame to frame; 3. multiple cameras are used in different positions. SR restores HR images from degraded (noisy, blurred and aliased) images. The first step in applying an SR procedure is the formulation of a model that relates the HR image and the LR images. The commonly used model is given by the following equation (Sroubek & Flusser, 2007, Sung et al., 2003):

$$\mathbf{y}_k = \mathbf{D}_k \mathbf{B}_k \mathbf{M}_k \mathbf{x} + \mathbf{n}_k \tag{7}$$

where *yk* is the k-th observed LR image, *Dk* the decimation (down-sampling) operator, *Bk* the blur operator, *Mk* the motion (warping) operator, *x* the unknown HR image and *nk* the additive noise of the k-th acquisition.


With this simple formulation, the direct solution estimate can be inaccurate. Robust SR methods can help to improve this solution. The solution estimate in each iteration is updated by a gradient iterative minimization method given by the following expression:

$$\mathbf{x}^{n+1} = \mathbf{x}^{n} + \delta \nabla E(\mathbf{x}) \tag{8}$$

where *E(x)* is the total squared error of resampling the high-resolution image represented by *x* and *δ* is a scale factor defining the step size in the direction of the gradient.

$$E(\mathbf{x}) = \frac{1}{2}\sum_{k=1}^{n} \left\| \mathbf{y}_k - \mathbf{D}\mathbf{B}_k\mathbf{M}_k\mathbf{x} \right\|_2^2 \tag{9}$$

The gradient of *E(x)* can be given by the sum of the back-projected error images *Bk* of the individual LR frames:

$$\nabla E(\mathbf{x}) = \sum_{k=1}^{n} \mathbf{B}_k \tag{10}$$

Robustness is introduced by replacing this last sum of images by the scaled pixel median:

$$\nabla E(\mathbf{x}) = n \cdot \operatorname{median}\{\mathbf{B}_k\}_{k=1}^{n} \tag{11}$$

Some other combinations in addition to the median operator can also be applied to get more suitable and accurate results.
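The difference between the summed gradient of Eq. 10 and the median-based gradient of Eq. 11 can be illustrated on toy per-pixel error images. The data below are made up for illustration, with one frame acting as an outlier; the function names are ours:

```python
import statistics

def gradient_sum(error_images):
    """Classical gradient: per-pixel sum of the back-projected error images (Eq. 10)."""
    return [sum(pixel) for pixel in zip(*error_images)]

def gradient_median(error_images):
    """Robust gradient: n times the per-pixel median of the error images (Eq. 11)."""
    n = len(error_images)
    return [n * statistics.median(pixel) for pixel in zip(*error_images)]

# toy back-projected error images (3 pixels each); the last frame is an
# outlier, e.g. corrupted by a gamma hit or a misregistration
errors = [
    [0.1, 0.0, -0.1],
    [0.1, 0.1, -0.1],
    [0.1, 0.0, -0.2],
    [5.0, 5.0, 5.0],
]

print(gradient_sum(errors))     # every pixel dominated by the outlier frame
print(gradient_median(errors))  # outlier frame suppressed by the median
```

With the sum, the single corrupted frame drags the whole update off course; the scaled median keeps the update close to what the consistent frames agree on, which is precisely the robustness property exploited by the method of Zomet et al.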

The procedure of SR consists of three stages (Fig. 14): 1. registration; 2. interpolation onto an HR grid; 3. removal of blur and noise.

Fig. 14. Model for super resolution of the observed LR frames: degradation (blur, noise) of the scene into the LR inputs, followed by registration, interpolation and reconstruction of the HR image.

The most important SR image reconstruction approaches are the following:

The second objective of this work is the application and testing of an SR procedure based on a robust method on a neutron imaging video, in order to improve its quality in terms of spatial resolution, by reducing the motion blur, and also in terms of noise. For this purpose, we have adapted the robust SR procedure described in (Zomet et al., 2001) in a MATLAB code to improve the quality and resolution of the neutron video obtained. This method is an iterated back-projection one in which the gradient is computed not as the sum of all errors, but as the median of all errors. This brings robustness against outliers in the LR images.

For practical purposes, the video frames are first rearranged into sets of frames (5 frames in each one). Then, the SR procedure mentioned above is applied on each set of frames separately, with an interpolation factor of 2 (figure 15). For each set of frames, the motion

Fig. 15. Neutron video reorganisation in sets of multi-frames, SR procedure application on low resolution frames, high resolution frames generation and high resolution neutron video reconstruction.


| Video sequence | Liquid | Thermal noise (std. dev.) | Contrast |
|---|---|---|---|
| 1 | water | 1.58 | 0.184 |
| 2 | water | 1.52 | 0.409 |
| 3 | car engine oil | 1.35 | 0.164 |
| 4 | car engine oil | 2.35 | 0.376 |
| 5 | car engine oil | 4.05 | 0.426 |
| 6 | car engine oil | 1.89 | 0.446 |

Table 4. Summary of results obtained.

estimation that is necessary for the application of this SR method is performed with the Vandewalle et al. (2006) algorithm. This method uses the property that a shift in the space domain is translated into a linear shift in the phase of the image's Fourier transform. Similarly, a rotation in the space domain is visible in the amplitude of the Fourier transform. Hence, the Vandewalle et al. motion estimation algorithm computes the images' Fourier transforms and determines the 1-D shifts in both their amplitudes and phases. One advantage of this method is that it discards high-frequency components, where aliasing may have occurred, in order to be more robust. When all the frame sets are processed, the generated HR frames are used for high-resolution neutron video reconstruction and display at a standard speed of 25 fps or 30 fps (figure 14). To guarantee the best result with the proposed procedure, the capture frame rate of the examined dynamic process must be higher than the display rate, which is generally selected equal to 25 or 30 fps, depending on the standard used. The robust method applied to improve spatial resolution is based on pixel interpolation and super-sampling. The example presented in this chapter does not claim completeness: the field of neutron imaging is experimental and seems to change with every new system built and CCD camera used for the examinations.
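The core idea behind phase-based shift estimation (a spatial shift becomes a linear phase ramp in the Fourier domain) can be demonstrated in 1-D with a tiny DFT. This is our own minimal sketch of the shift theorem, not the Vandewalle et al. implementation:

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform (adequate for a short demo signal)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

N, true_shift = 16, 3
# a bright spot and its circularly shifted copy
frame_a = [1.0 if t == 0 else 0.0 for t in range(N)]
frame_b = [frame_a[(t - true_shift) % N] for t in range(N)]

# shift theorem: B[k] = A[k] * exp(-2*pi*i*k*s/N); read s off the k=1 phase
phase_diff = cmath.phase(dft(frame_b)[1] / dft(frame_a)[1])
estimated = round(-phase_diff * N / (2 * cmath.pi)) % N
print(estimated)  # -> 3
```

In the real algorithm the phase ramp is fitted over many low-frequency coefficients of 2-D images, which is what allows sub-pixel shift estimates and the robustness to aliasing mentioned above.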

It is very important to mention that the capture frame rate must be accurately selected to guarantee the optimum exposure of each frame to the neutron beam, which is necessary for proper perception of the dynamic process being examined with regard to its speed. The spatial resolution limit of the imaging system must also be taken into consideration. In our case, the selected video capture speed (frame rate) of 12.5 fps is sufficient to ensure optimum visualization of the studied dynamic process. The rearrangement of the obtained video into frame sets of 5 frames does not affect the video quality. This is because our imaging system has a spatial resolution limit of ~400 μm and the average speed of the studied flow is about 0.095 m/s. According to these data, the equivalent video frame rate after application of the SR procedure, which is equal to the ratio of the average speed to the spatial resolution limit, is approximately ~2.35 fps. When multiplied by 5, this effective frame rate must give a value on the order of the selected video capture frame rate of 12.5 fps (5 × 2.35 = 11.75). Indeed, the selected capture frame rate and the number of frames in each set ensure the optimum visualization of the studied flow process.
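The frame-grouping arithmetic above can be restated as a quick numerical check, with the values copied from the text:

```python
effective_fps = 2.35   # equivalent frame rate after the SR procedure (from the text)
frames_per_set = 5     # LR frames grouped into each set
capture_fps = 12.5     # selected video capture frame rate

required_fps = effective_fps * frames_per_set
print(round(required_fps, 2))  # -> 11.75, on the order of the 12.5 fps capture rate
```

The product staying just below the capture rate is what guarantees that each 5-frame set still samples the flow finely enough for the SR reconstruction.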

#### **3.3 Results and discussions**

#### **3.3.1 Best video capture practical conditions establishment**

Selected frames from the video sequences obtained are presented in figures 16 and 17. Selected regions of interest (ROIs) of 40 × 30 pixels are indicated by dashed white squares. These regions indicate where the gray-level measurements are performed that allow the calculation of the standard deviation needed to estimate the noise, and of the maximum and minimum gray-level values needed to determine the contrast. A summary of the results obtained, based on histogram analysis, is given in table 4. The variation of contrast and noise as a function of neutron beam intensity is shown in figures 18(a) and 18(b).

From the results obtained, we can first observe that at the maximum neutron beam intensity of 1.44 × 10⁷ n/cm²/s, the video sequence obtained is the best one in terms of contrast and


Fig. 16. Frames taken from video sequences 1 and 2 of the water flow process, for a frame rate of 12.5 fps, a gain of 22 dB and neutron beam intensities of 1.6 × 10⁶ n/cm²/s and 1.44 × 10⁷ n/cm²/s, respectively (from the left).

Fig. 17. Frames taken from video sequences 3, 4, 5 and 6 of the car engine oil flow process, for a frame rate of 12.5 fps, a gain of 22 dB and neutron beam intensities of 1.6 × 10⁶, 4.8 × 10⁶, 9.6 × 10⁶ and 1.44 × 10⁷ n/cm²/s, respectively (from left to right).



Temporal and Spatial Resolution Limit Study of

in figure 20 with the corresponding FFT and histogram.

deviation) is significantly reduced.

Radiation Imaging Systems: Notions and Elements of Super Resolution 131

histogram. The FFT is used to show the frequency support of the noisy and low resolution original frame and the histogram can inform us about sampling, gray levels distribution and noise affecting this frame. The SR algorithm and method described above is applied with an

Fig. 19. One of the five LR frame selected from a set after arrangement of obtained neutron

Example of generated high resolution frame result after the SR method application is shown

Fig. 20. Super resolved frame obtained after the application of the Robust Super resolution method on a set of low resolution frames arbitrary selected from the obtained neutron video of the flow process and its FFT and Histogram. When comparing FFT and histograms, we can easily verify that the sampling and resolution are improved and the noise (standard

video in a number of frames sets and its corresponding FFT and histogram.

interpolation factor of 2 to improve the quality of this frame after a shift estimation.

Fig. 18. (a: left). Contrast variation as a function of neutron beam intensity, (b: right). Random distribution of noise as a function of neutron beam intensity.

noise. Thus, a neutron beam intensity of 1.6 × 10<sup>6</sup> n/cm<sup>2</sup>/s is not enough to produce a well-perceptible neutron image with a wide dynamic range for the flow process studied here. The contrast was found to be proportional to the neutron beam intensity, but beyond a value of ~1 × 10<sup>7</sup> n/cm<sup>2</sup>/s it becomes almost constant (Kharfi et al., 2011). The impulse noise that affects the video sequences is randomly distributed as a function of the neutron beam intensity. The optimum exposure and acquisition parameters that allow capturing a high-quality neutron video sequence for this flow process examination are the following (Table 5):


| Parameter | Optimum value |
|-----------|---------------|
| Neutron beam intensity | 1 × 10<sup>7</sup> n/cm<sup>2</sup>/s |
| Capture frame rate | 12.5 fps |
| Gain | 22 dB |

Table 5. Optimum video capture parameters.

In this work, it was established that the best video sequences, 2 (water) and 6 (oil), were obtained with a frame rate of 12.5 fps and a gain of 22 dB for a neutron beam intensity of 1.44 × 10<sup>7</sup> n/cm<sup>2</sup>/s. These exposure and capture conditions guaranteed 256 digitized gray levels (full dynamic range). The resulting video sequences are well exposed, and the different phases of the flow process (continuous flow, drops) are well perceptible. They can therefore be exploited to study the flow process phases and the drop shape as a function of hole diameter; the flow speed as a function of pressure and hole diameter can also be measured. Thus, the neutron video sequences obtained are rich in information and can be used for a wide variety of applications and purposes in the domain of fluid flow analysis. Future work will focus on the exploitation of such video sequence information for specific flow analyses.
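The contrast and noise behavior summarized in Fig. 18 can be reproduced with a few lines of code. The sketch below is purely illustrative, not the authors' actual analysis tool: it computes a Michelson-style contrast (Michelson, 1927) on robust percentiles and a temporal noise estimate on synthetic stand-in frames (a static scene is assumed so that pixel-wise variation across frames measures noise).

```python
import numpy as np

def michelson_contrast(frame):
    """Michelson contrast C = (Imax - Imin) / (Imax + Imin),
    using robust 1st/99th percentiles instead of raw extremes."""
    lo, hi = np.percentile(frame, [1, 99])
    return (hi - lo) / (hi + lo)

def noise_std(frames):
    """Temporal noise estimate: std of each pixel across frames,
    averaged over the image (static scene assumed)."""
    return float(np.std(np.stack(frames), axis=0).mean())

# Hypothetical example: synthetic 8-bit frames with additive noise
rng = np.random.default_rng(0)
clean = np.linspace(40, 210, 64 * 64).reshape(64, 64)
frames = [np.clip(clean + rng.normal(0, 5, clean.shape), 0, 255)
          for _ in range(10)]

print(round(michelson_contrast(frames[0]), 2))
print(round(noise_std(frames), 2))
```

Running such measurements over sequences captured at several beam intensities yields curves of the kind shown in Fig. 18.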

#### **3.3.2 Practical example of video capture and Super Resolution image quality improvement**

According to the above results, video sequence 5, captured at a neutron beam intensity of 4.8 × 10<sup>6</sup> n/cm<sup>2</sup>/s, is the most heavily noised one. A set of LR frames was selected from this video sequence; one frame from this set is shown in Figure 19 with its corresponding FFT<sup>4</sup> and histogram. The FFT shows the frequency support of the noisy, low-resolution original frame, while the histogram informs us about the sampling, the gray-level distribution and the noise affecting the frame. The SR algorithm and method described above are applied with an interpolation factor of 2 to improve the quality of this frame after a shift estimation.

<sup>4</sup> FFT: Fast Fourier Transform.
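The FFT/histogram inspection described above can be sketched in a few lines. This is a generic illustration (NumPy's FFT standing in for whatever tool was actually used, and a synthetic noisy frame standing in for an LR neutron frame):

```python
import numpy as np

def frame_diagnostics(frame, bins=256):
    """Return the centered log-magnitude FFT spectrum and the
    gray-level histogram of a frame, as used to judge frequency
    support, sampling and noise."""
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    log_mag = np.log1p(np.abs(spectrum))
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return log_mag, hist

# Synthetic noisy stand-in for a low-resolution neutron frame
rng = np.random.default_rng(1)
frame = rng.normal(128, 20, (64, 64)).clip(0, 255)
log_mag, hist = frame_diagnostics(frame)
print(log_mag.shape, int(hist.sum()))
```

A noisy frame shows energy spread across the whole spectrum, while a broad, well-filled histogram indicates a fully used dynamic range.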


Fig. 19. One of the five LR frames selected from a set, after arranging the obtained neutron video into a number of frame sets, with its corresponding FFT and histogram.

An example of a high-resolution frame generated by the SR method is shown in Figure 20 with the corresponding FFT and histogram.

Fig. 20. Super-resolved frame obtained after applying the Robust Super Resolution method to a set of low-resolution frames arbitrarily selected from the obtained neutron video of the flow process, with its FFT and histogram. Comparing the FFTs and histograms, we can easily verify that the sampling and resolution are improved and that the noise (standard deviation) is significantly reduced.
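The Robust Super Resolution method referenced here (Zomet et al., 2001) fuses registered LR frames through a median rather than a mean, which rejects outlier pixels. The following is only a minimal shift-and-add sketch of that idea, assuming the subpixel shifts have already been estimated and restricting them to multiples of 1/factor; the published algorithm iterates with back-projection:

```python
import numpy as np

def shift_and_add_sr(frames, shifts, factor=2):
    """Median-fused shift-and-add super resolution (sketch).

    frames : list of HxW low-resolution frames
    shifts : (dy, dx) subpixel shift of each frame, in LR pixels,
             restricted here to multiples of 1/factor
    factor : interpolation (upsampling) factor
    """
    h, w = frames[0].shape
    stack = []
    for frame, (dy, dx) in zip(frames, shifts):
        hr = np.full((h * factor, w * factor), np.nan)
        oy = int(round(dy * factor)) % factor
        ox = int(round(dx * factor)) % factor
        hr[oy::factor, ox::factor] = frame  # place LR samples on the HR grid
        stack.append(hr)
    # The median (instead of the mean) rejects outlier pixels -- the
    # robustness idea of Zomet et al. NaNs mark unsampled positions
    # and are ignored.
    return np.nanmedian(np.stack(stack), axis=0)

# Demo: four half-pixel-shifted LR views of a known HR image
rng = np.random.default_rng(0)
truth = rng.random((32, 32))
shifts = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5)]
frames = [truth[int(2 * dy)::2, int(2 * dx)::2] for dy, dx in shifts]
sr = shift_and_add_sr(frames, shifts, factor=2)
print(np.allclose(sr, truth))  # exact recovery in this noise-free demo
```

With noisy frames and more than one sample per HR position, the median fusion is what suppresses the impulse noise visible in the histograms of Figs. 19 and 20.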


From the HR frames obtained by the SR procedure, a new highly sampled, low-noise neutron video is reconstructed. Although the neutron frames processed through the proposed SR method are originally of optimum quality, because a good prior selection of suitable video capture parameters was performed, the application of the SR method can still reduce noise and improve the sampling of the obtained images, and therefore makes it possible to exploit these images quantitatively. Because of the high sampling performance of the CCD camera used, not all SR methods are suitable for neutron image and video enhancement. Well-adapted methods, however, can very efficiently improve the quality of neutron images and videos suffering from motion blur.

#### **4. Conclusions**

Ultimately, the MTF is one component in characterizing the overall image quality performance of an imaging system. A robust method for estimating the MTF of a high-sensitivity neutron imaging CCD camera has been presented, and longitudinal-scan MTF results for the neutron imaging system are provided. Although the ISO 12233 slanted-edge methodology was originally designed mostly for digital still camera MTF evaluation, it can successfully be applied to the characterization of our neutron imaging system. The weak dependence on tilt angle for values of less than 10° (the standard recommendation is 5°) allows the alignment constraints to be relaxed. The use of a standardized target and a specifically developed program allows MTF data to be obtained easily and quickly from a single target image. The ISO slanted-edge technique is thus a valuable alternative for fast and efficient MTF measurements, allowing the determination of the spatial response and effective spatial resolution of the neutron imaging system under study.
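The slanted-edge principle can be condensed to three steps: sample the edge spread function (ESF), differentiate it into a line spread function (LSF), and take the Fourier transform magnitude to obtain the MTF. The sketch below is a minimal 1-D illustration on a synthetic edge; the full ISO 12233 procedure additionally projects a slightly tilted edge across rows to obtain a sub-pixel-sampled ESF:

```python
import numpy as np

def mtf_from_edge(esf):
    """MTF from a 1-D edge spread function (ESF):
    ESF --diff--> LSF --|FFT|--> MTF, normalized so MTF(0) = 1."""
    lsf = np.diff(esf)
    lsf = lsf * np.hanning(lsf.size)  # window against spectral leakage
    mtf = np.abs(np.fft.rfft(lsf))
    return mtf / mtf[0]

# Synthetic edge: ideal step blurred by a Gaussian (sigma = 2 px)
n = 128
edge = (np.arange(n) >= n // 2).astype(float)
k = np.arange(-8, 9)
g = np.exp(-k**2 / (2 * 2.0**2))
g /= g.sum()
esf = np.convolve(edge, g, mode="same")[32:96]  # boundary-free central part
mtf = mtf_from_edge(esf)
print(mtf[0], mtf.argmax() == 0)
```

For this Gaussian blur the computed MTF peaks at zero frequency and falls smoothly toward Nyquist, as expected for a diffraction- or blur-limited system.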

The approach followed and the experimental procedures for temporal resolution limit determination were found to be very suitable and produced very accurate data. Although our neutron camera can operate at high frame rates, the temporal resolution is limited and cannot exceed certain values, which are conditioned by the intensity of the neutron source, the performance of the imaging system and the speed of the dynamic process under examination. The study of temporal resolution limits for the higher indicator rotational speeds (1.2 and 1.65 RPS) demonstrates that, in order to obtain the best visualization conditions, the frame rate must be increased. Practical tests reveal that this may cause contrast degradation and brightness loss in the frames of the neutron video produced, because of the limited dynamic range (due to the limited neutron beam intensity) and the small bit depth of the imaging system used.

Results of the flow processes studied with our neutron imaging system demonstrate that different capture conditions often produce completely different neutron video results in terms of noise and contrast. A small number of CCD performance factors and camera operating parameters, such as frame rate and signal gain, dominate the major aspects of digital image quality in neutron imaging, and their effects overlap to a great extent. The neutron beam intensity was also demonstrated to be an important exposure parameter that can completely change the quality of a neutron image in terms of contrast and dynamic range. A flux of ~1 × 10<sup>7</sup> n/cm<sup>2</sup>/s was found to be enough to produce a high-contrast image with a maximum dynamic range. For our flow process, a frame rate of 12.5 fps with a signal gain of 22 dB was found to give the optimum video capture conditions. The super-resolution method tested in this work proves that it is possible to improve the quality of the neutron video sequence with post-processing procedures; it is important to note that these procedures must be carefully selected to avoid unnecessary processing.

#### **5. References**

Baker, S., Kanade, T. (2002). Limits on super resolution and how to break them. *IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)*, 24(9), pp. 1167-1183.

Burns, P.D. (2000). Slanted-edge MTF for digital camera and scanner analysis. *Proc. IS&T 2000 PICS Conference*, pp. 135-138.

Capel, D.P. (2004). *Image Mosaicing and Super Resolution*. Springer-Verlag, ISBN: 1852337710.

Cleveland, W. (1985). *The Elements of Graphing Data*. Wadsworth, Belmont, CA, USA.

Crow, L. (2009). Neutron detectors for imaging. In: *Neutron Imaging and Applications*, Anderson, I.S., McGreevy, R.L., Bilheux, H.Z. (Eds.), pp. 47-66, Springer Science, ISBN: 978-0-387-78692-6.

Domanus, J.C. (1992). *Practical Neutron Radiography*. Kluwer Academic Publishers, Dordrecht, Holland.

Estribeau, M., Magnan, P. (2004). Fast MTF measurement of CMOS imagers using ISO 12233 slanted-edge methodology. *Proceedings of SPIE*, vol. 5251, pp. 243-252, Bellingham, WA, USA, ISBN: 0-8194-5135-5.

Farsiu, S., Robinson, M.D., Elad, M., Milanfar, P. (2004). Fast and robust multiframe super resolution. *IEEE Transactions on Image Processing*, 13(10), pp. 1327-1344.

Freeman, W.T., Pasztor, E., Carmichael, O. (2000). Learning low-level vision. *International Journal of Computer Vision (IJCV)*, 40(1), pp. 25-47, Kluwer Academic Publishers.

Hendee, W.R., Ritenour, E.R. (2002). *Medical Imaging Physics*. Wiley-Liss, New York.

Irani, M., Peleg, S. (1991). Improving resolution by image registration. *Graphical Models and Image Processing*, 53, pp. 231-239.

Jain, A. (1989). *Fundamentals of Digital Image Processing*. Prentice Hall, Englewood Cliffs, NJ, USA.

Jespers, P.G., Van de Wiele, F., White, M.H. (1976). *Solid State Imaging*. Noordhoff International Publishing, pp. 485-522.

Kharfi, F., Abbaci, M., Boukerdja, L., Attari, K. (2005). Implementation of neutron tomography around the Algerian Es-Salam research reactor: preliminary study and first steps. *Nuclear Instruments and Methods in Physics Research A*, 542, pp. 213-218.

Kharfi, F., Denden, O., Ali, A. (2011). Implementation and characterisation of a new neutron imaging system for dynamic processes investigation at the Es-Salam research reactor. *Applied Radiation and Isotopes*, 69, pp. 1359-1364.

Kim, K., Kwon, Y. (2008). *Example-based Learning for Single Image Super Resolution and JPEG Artifact Removal*. Max Planck Institute for Biological Cybernetics, Technical Report TR-173.

Kohm, K. (2004). Modulation transfer function measurement method and results for the ORBVIEW-3 high resolution imaging satellite. *Proceedings of the ISPRS Annual Conference*, Istanbul, Turkey.

Michelson, A. (1927). *Studies in Optics*. University of Chicago Press, USA.

Park, S.C., Park, M.K., Kang, M.G. (2003). Super-resolution image reconstruction: a technical overview. *IEEE Signal Processing Magazine*, 20(3), pp. 21-36.

Reichenbach, S.E., Park, S.K., Narayanswamy, R. (1991). Characterizing digital image acquisition devices. *Optical Engineering*, 30, pp. 170-177.

Samei, E., Buhr, E., Granfors, P., Vandenbroucke, D., Wang, X. (2005). Comparison of edge analysis techniques for the determination of the MTF of digital radiographic systems. *Physics in Medicine and Biology*, 50(15), pp. 3613-3625.

Shechtman, E., Caspi, Y., Irani, M. (2002). Increasing space-time resolution in video. *Proceedings of the 7th European Conference on Computer Vision (ECCV)*, Springer-Verlag, pp. 753-768, ISBN: 3540437452.

Shechtman, E., Caspi, Y., Irani, M. (2005). Space-time super-resolution. *IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)*, 27(4), pp. 531-545.

SIT Technical Note (1994). *An Introduction to Scientific Imaging Charge-Coupled Devices*. Scientific Imaging Technologies, Inc., Beaverton, Oregon, USA.

Spring, K.R., Fellers, T.J., Davidson, M.W. *Introduction to Charge-Coupled Devices (CCDs)*. www.microscopyu.com/articles/digitalimaging/ccdintro.html.

Sroubek, F., Flusser, J. (2007). Multiframe blind deconvolution coupled with frame registration and resolution enhancement. *IEEE Transactions on Image Processing*, 16(9), pp. 2322-2332.

Sun, J., Xu, Z., Shum, H. (2008). Image super-resolution using gradient profile prior. *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*, pp. 1-8, ISBN: 978-1-4244-2242-5.

Vandewalle, P., Süsstrunk, S., Vetterli, M. (2006). A frequency domain approach to registration of aliased images with application to super-resolution. *EURASIP Journal on Applied Signal Processing* (special issue on super-resolution), vol. 2006, Article ID 71459, 14 pages.

Williams, D. (2004). Low-frequency MTF estimation for digital imaging devices using slanted edge analysis. *Proc. SPIE-IS&T Electronic Imaging Symposium*, SPIE vol. 5294, pp. 93-101.

Zomet, A., Rav-Acha, A., Peleg, S. (2001). Robust super-resolution. *Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR)*, Hawaii, USA, vol. 1, pp. 645-650.


## **Practical Imaging in Dermatology**

Ville Voipio, Heikki Huttunen and Heikki Forsvik *Department of Signal Processing, Tampere University of Technology Finland*

#### **1. Introduction**


Visual observation plays an important role in dermatology, as the skin is the most visible organ. This makes dermatology a good candidate for utilizing digital imaging and automatic diagnostic tools. This chapter illustrates the use of 2D imaging in dermatology by presenting a method for automating a common allergy test. The so-called "prick test" is the most common diagnostic test for different allergies; it involves measuring the skin response to percutaneously introduced allergens.

If there is a skin reaction against the allergen, blood flow increases in the area and a wheal is formed. In physical terms, the color of the skin changes and a vertical displacement emerges. In this chapter we concentrate on the color change (erythema), as we are only using 2D imaging without the vertical displacement information.

Several approaches have been proposed for skin erythema detection. The emphasis is often on detecting signs of melanoma, and there is a large body of literature on melanoma segmentation (Celebi et al., 2009; Gomez et al., 2008; 2007), but only a few studies on measuring allergic reactions from 2D pictures (Nischik & Forster, 1997; Roullot et al., 2005). There are also some studies performed with 3D imaging (Santos et al., 2008) or other specialized imaging hardware (Wöhrl et al., 2006).

We utilize inexpensive, commonplace 2D digital photography. The key challenges in this approach are the transformation of the color image into a maximal-contrast single-variable (grayscale) image and the interpretation of the wheal dimensions from this transformed image. This chapter is organized as follows: Section 2 discusses the effect of the imaging hardware and camera settings on medical and scientific imaging; Section 3 presents our algorithm for segmenting the wheal area, which is based on our earlier paper (Huttunen et al., 2011); finally, Section 4 discusses the results.
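The actual color-to-grayscale transformation is derived in Section 3. As a generic illustration of the idea of a "maximal-contrast single-variable image", one common approach is to project the RGB pixels onto their first principal component, the direction of maximal variance. The sketch below uses that stand-in technique, not the chapter's own method:

```python
import numpy as np

def max_contrast_gray(rgb):
    """Project RGB pixels onto the direction of maximal variance
    (first principal component) to get a single-variable image.
    A generic stand-in for the projection derived in Section 3."""
    pixels = rgb.reshape(-1, 3).astype(float)
    pixels -= pixels.mean(axis=0)
    # Principal axis = top eigenvector of the 3x3 channel covariance
    cov = pixels.T @ pixels / len(pixels)
    w, v = np.linalg.eigh(cov)
    direction = v[:, np.argmax(w)]
    gray = pixels @ direction
    return gray.reshape(rgb.shape[:2])

# Hypothetical example on random "image" data
rng = np.random.default_rng(3)
img = rng.integers(0, 256, (32, 32, 3))
gray = max_contrast_gray(img)
print(gray.shape)
```

By construction, the projected image has at least as much variance (and hence contrast) as any single color channel.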

#### **2. Imaging**

The easy availability and high quality of digital cameras have made them attractive for medical and other scientific use. Purpose-built scientific digital cameras typically carry a tenfold price tag, and the pace of development is very fast in mainstream digital cameras.

While the actual performance of inexpensive cameras is usually very good, their image processing chain is not well documented. Scientific cameras, by contrast, are supplied with photometric specifications and do not perform any unspecified processing on the image.


Photography-oriented equipment very seldom gives any photometric specifications, and the cameras may perform a lot of unspecified image processing.

The low cost of ordinary digital cameras opens up a large number of new opportunities in medicine. In order to avoid some common pitfalls associated with their use, one should be aware of the typical image processing in the camera. The actual processing varies from one camera to another, and the image processing methods are not published by the manufacturers, but the following sections outline the fundamental limitations and give an overview of the signal processing in a typical digital camera.

The discussion below will concentrate on digital cameras with interchangeable optics (typically SLR, Single Lens Reflex cameras) if not indicated otherwise. The use of compact cameras is usually not recommended in medical or scientific work; SLR cameras provide much better imaging properties and are still affordable.

#### **2.1 Signal path**

Finding a simple and accurate definition for a digital camera is difficult, as the use of a camera may range from art to measurement. In the following discussion the digital camera is viewed as *a measurement device measuring the spectral intensity distribution of light arriving from different spatial angles towards the camera*. Thus, in an ideal camera the spectrum of light arriving from each angle to the film plane is recorded.

Evidently, the definition above gives an infinite amount of information and is in contradiction with both the real world situation and physics. In order to understand the practical limitations, we may have a look at the signal path in a digital camera and the limitations imposed by each step in the path (see figure 1).

Fig. 1. Signal path in a digital camera and sources of loss of information associated with each step.

The first and very fundamental reason for loss of information lies in the quantum nature of light. As the light collected by the optical system consists of a finite number of photons, there is always some statistical noise ("shot noise"). Shot noise is usually the dominant noise source in photography. The relative shot noise is inversely proportional to the square root of the intensity of the light falling onto the detector; equivalently, the shot-noise-limited signal-to-noise ratio grows as the square root of that intensity.
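The square-root behavior is easy to verify with a small simulation: photon arrivals are Poisson distributed, so a count with mean N fluctuates by √N and the SNR is N/√N = √N. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

def snr(mean_photons, samples=200_000):
    """Empirical SNR of a Poisson-distributed photon count."""
    counts = rng.poisson(mean_photons, samples)
    return counts.mean() / counts.std()

# Quadrupling the collected light roughly doubles the SNR (SNR ~ sqrt(N))
for n in (100, 400, 1600):
    print(n, round(snr(n), 1))
```

This is why longer exposures or larger apertures, both of which raise the photon count per pixel, directly improve the shot-noise-limited image quality.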

#### **2.1.1 Objective lens**


The purpose of the optical system is to direct photons arriving from a certain direction onto the detector plane. There are two physical limitations related to the optical system: limited depth-of-field and the diffraction limit. If the optical system is to collect more light, its light collecting area (usually a circular aperture) has to be larger. Inevitably, a large aperture gives a shorter depth-of-field, i.e., only objects within a certain limited distance range are in focus (see section 2.3.2). On the other hand, if the aperture is small, the image becomes diffraction limited due to the wave nature of light. In practice, even moderately low cost digital camera systems are often diffraction limited in their behaviour.

The diffraction limit for an optical system is given by the Rayleigh criterion:

$$d = 1.22 \frac{\lambda f}{D},\tag{1}$$

where *d* is the smallest resolvable detail on the image plane, *λ* the wavelength, *D* the size of the entrance pupil (aperture), and *f* the focal length of the objective. For a digital camera with a typical objective, *f* /*D* is in the range of 2 to 20. The wavelength of green light is roughly 500 nm and the pixel size of an SLR digital camera is typically approximately 5 micrometers, so the resolution easily becomes diffraction limited when *f* /*D >* 8.
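The back-of-the-envelope comparison above can be sketched in a few lines; the wavelength and pixel size are the illustrative figures from the text, not measured values for any particular camera.

```python
# Sketch: estimate when a camera becomes diffraction limited using
# the Rayleigh criterion d = 1.22 * lambda * f / D (equation 1).

def rayleigh_spot_size(wavelength_m, f_number):
    """Smallest resolvable detail on the image plane, in meters."""
    return 1.22 * wavelength_m * f_number

wavelength = 500e-9   # green light, ~500 nm
pixel_size = 5e-6     # typical SLR pixel, ~5 micrometers

for f_number in (2, 4, 8, 16):
    d = rayleigh_spot_size(wavelength, f_number)
    limited = d > pixel_size
    print(f"f/{f_number}: d = {d * 1e6:.2f} um, diffraction limited: {limited}")
```

At f/8 the Airy spot (about 4.9 µm) is already comparable to the pixel size, which is why stopping down further trades resolution for depth-of-field.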

The discussion above is somewhat simplified in two respects. First, the Rayleigh criterion is somewhat arbitrary: it actually refers to the distance between the center and the first minimum of the Airy disk (a diffraction pattern arising from an aperture), and has its roots in astronomy. Two sharp spots of light can be resolved even when they are closer than the Rayleigh criterion, so the criterion is more a rule of thumb than an exact measure (Hecht, 1987). Second, in practice the objective may not be diffraction limited at or close to its largest aperture setting.

The optical system may also lose some light due to absorption and reflection losses. While the absolute amount of light lost this way is usually rather small, these losses may be seen as uneven illumination across the field (usually less light in the corners) or loss of contrast when there are very brightly illuminated areas in the image.

#### **2.1.2 Color filters**

Virtually all consumer digital still cameras are color cameras. As the light-sensitive pixels in the image detector cannot detect the color of the incoming light, a color filter is required<sup>1</sup>. The color filter represents a major loss of light, as the photons of mismatching energies ("wrong color") are absorbed.

The color filters are laid out so that neighboring pixels have different filters in front of them. There are numerous different filter patterns; both the layout and the actual colors of the filters vary. Probably the most common of these patterns is the Bayer pattern, where a 2x2-pixel block has one red, two green, and one blue filter (see Figure 2). The choice of the filter pattern and colors is a compromise between spatial resolution, light collecting efficiency, and color reproduction. There are even some cameras which have four color filters (RGBE, E for emerald) to increase the color sensitivity.

<sup>1</sup> There is one notable exception to this rule: Sigma Corporation Foveon X3 sensors have three photosensitive sensors overlaid on top of each other, which in principle eliminates the filter and the losses associated with it. Also, many digital video cameras employ a color-separation beamsplitter which directs different wavelengths to three separate image sensors.

It should be noted that whatever the filter pattern, the color reconstruction (demosaic) filter is not a trivial one. If the maximum resolution is desired, the filter needs to identify edges by combining information from several pixels. This requires non-linear processing and is prone to creating artefacts.
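To make the information loss concrete, the following toy sketch samples an RGB image through an RGGB Bayer mosaic and reconstructs color naively from each 2x2 block. Real in-camera demosaicing is far more sophisticated (edge-aware and non-linear, as described above); this deliberately simple version only illustrates why clever reconstruction is needed.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an RGB image (H, W, 3) through an RGGB Bayer pattern."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red pixels
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green pixels
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green pixels
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue pixels
    return mosaic

def naive_demosaic(mosaic):
    """Reconstruct color at half resolution from each 2x2 block."""
    r = mosaic[0::2, 0::2]
    g = (mosaic[0::2, 1::2] + mosaic[1::2, 0::2]) / 2
    b = mosaic[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

# A flat gray image survives the round trip exactly, but any edge or
# fine detail does not: each output pixel mixes four sensor locations.
gray = np.full((4, 4, 3), 0.5)
print(naive_demosaic(bayer_mosaic(gray)))  # all values 0.5, at 2x2 resolution
```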

Fig. 2. Color reproduction in a Bayer filter: (a) original image, (b) image filtered by the Bayer filter, (c) intensity values on the pixels on the camera element.

It is important to note that color cameras are essentially very crude three-channel filter spectrometers; the full spectrum of visible light is reduced to three values. The spectral properties of the color filters are usually not disclosed, and thus the color information given by a digital camera can usually be interpreted only in a relative manner.

Color is not a physical property of light; it is a property of the human visual system. There are three different types of color-sensitive cone cells in the human retina. One cell type is sensitive to long wavelengths (red), another to medium wavelengths (green), and the third to short wavelengths (blue). The RGB system for color pictures is adopted with the idea that each of the primary colors corresponds to one type of cone cell.<sup>2</sup>

If the transmission properties of the color filters in a digital camera differ from those of the retinal cone cells, the color space as seen by the digital camera is not the same as seen by the human eye. There may be metameric colors (colors which look the same to a human observer but have different spectral distribution) which a camera may be able to distinguish, and vice versa. There is no simple way to describe these relations without knowing the actual filter characteristics of the color filters in the camera, and even then the practical implications are not necessarily evident.

<sup>2</sup> While a RGB image sensor can cover the full range of colors seen by the human eye, the same does not apply to reproducing the color. Mixing light of any three colors can never produce the full range (gamut) of colors which the human eye can see.
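The reduction of a full spectrum to three filter responses can be sketched numerically. The Gaussian transmission curves below are purely illustrative assumptions; as noted above, real filter characteristics are rarely disclosed by manufacturers.

```python
import numpy as np

# Sketch: a color camera as a crude three-channel filter spectrometer.
wavelengths = np.arange(400, 701)  # nm, visible range

def gaussian(center_nm, width_nm):
    """Illustrative bell-shaped filter transmission curve (not real data)."""
    return np.exp(-0.5 * ((wavelengths - center_nm) / width_nm) ** 2)

filters = {
    "R": gaussian(600, 40),
    "G": gaussian(540, 40),
    "B": gaussian(460, 40),
}

def camera_response(spectrum):
    """Reduce a full spectrum (one value per nm) to three weighted sums."""
    return {c: float(np.sum(spectrum * t)) for c, t in filters.items()}

# Very different spectra can yield (nearly) identical RGB triples:
# this is the metamerism problem described in the text.
flat_spectrum = np.ones_like(wavelengths, dtype=float)
print(camera_response(flat_spectrum))
```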

#### **2.1.3 Image detector**


After going through the color filters, the light is detected by a CCD (Charge-Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) sensor array. The difference between the two technologies is rather subtle: in both devices a photon hitting a detector cell has some probability of generating a charge carrier which is trapped in the pixel. This probability depends on the sensor and the wavelength of the incident photon and is called the quantum efficiency (QE).

The actual quantum efficiencies of real-world cameras are trade secrets, but Farrell et al. (2006) suggest that the quantum efficiency of digital SLR pixels is around 0.65; as the fill factor (the proportion of active pixel area to the total sensor area) is only around 50 %, the total QE is approximately 1/3.
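The arithmetic behind the "approximately 1/3" figure is simply the product of the two factors cited from Farrell et al. (2006):

```python
# Sketch: total quantum efficiency as pixel QE times fill factor,
# using the illustrative figures quoted in the text.
pixel_qe = 0.65    # probability a photon hitting the active area is detected
fill_factor = 0.5  # fraction of sensor area that is light-sensitive

total_qe = pixel_qe * fill_factor
print(f"total QE ~ {total_qe:.2f}")  # ~ 0.33, i.e. roughly one photon in three
```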

The pixel size limits the spatial resolution of the image. However, this is not usually the most limiting factor, as the physical and practical limits of the optics already limit the resolution. Also, making small pixels may reduce the fill factor and increase the relative read-out noise. The choice of pixel size is a compromise between dynamic performance and resolution of the sensor (Farrell et al., 2006). In practice, there are very few situations in medical imaging where a larger number of pixels would be beneficial.

Once the light has been captured by a sensor pixel, it is read out, amplified and quantized by an AD (analog-to-digital) converter. The pixel itself has some temperature-dependent thermally induced noise. Usually this noise is significant only with long exposure times (such as in astrophotography). The pixel readout noise is a source of constant noise which does not depend on the illumination. Thus, it is usually significant only at low illumination levels (deep dark areas or very low light).

Also, the conversion gain from stored charge carriers to a voltage signal varies from one pixel to another. There may even be hot (always at full output) or dead (always at zero) pixels due to manufacturing defects in the sensor array. Hot pixels may be such that they show only at very long exposure times.

#### **2.1.4 AD conversion**

While the number of bits in the AD converter seems to be an important marketing feature for some digital cameras, the quantization noise of the AD converter is typically far below the other noise sources (readout noise in the dark areas, shot noise in the highlight areas). The maximum dynamic intensity range offered by a 14-bit AD converter is 1 : 2<sup>14</sup> = 1 : 16384, whereas the typical maximum dynamic range of the sensor, limited by other factors, is roughly 1:2000.
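The comparison above can be checked directly: a sensor limited to roughly 1:2000 needs only about 11 bits, so the 14-bit converter is not the bottleneck.

```python
import math

# Sketch: quantization range vs. the sensor's usable dynamic range.
adc_bits = 14
adc_range = 2 ** adc_bits          # 1 : 16384
sensor_range = 2000                # 1 : 2000, limited by other noise sources

bits_needed = math.ceil(math.log2(sensor_range))
print(adc_range, bits_needed)      # prints: 16384 11
```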

Practically all digital cameras offer the possibility to change the sensitivity (ISO number) of the camera. However, this choice does not change the sensitivity of the photosensitive elements. Instead, it either changes the amplification of the analog signal between the camera element and the AD converter or is purely a digital multiplication. As these operations do not change the fundamental noise sources (shot noise, thermal noise, readout noise), it is better to increase the illumination instead of increasing the sensitivity setting.
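The claim that amplification cannot help can be demonstrated with a small shot-noise simulation: gain multiplies signal and noise alike, so only collecting more photons improves the signal-to-noise ratio.

```python
import numpy as np

# Sketch: raising the ISO setting amplifies signal and shot noise together,
# so it cannot improve the signal-to-noise ratio; more light can.
rng = np.random.default_rng(0)

def snr(mean_photons, gain=1.0, n=100_000):
    """Empirical SNR of an amplified Poisson photon count."""
    photons = rng.poisson(mean_photons, n).astype(float)
    signal = gain * photons
    return signal.mean() / signal.std()

low_light = snr(100)             # ~ sqrt(100) = 10
high_iso = snr(100, gain=4.0)    # same ~10: the gain cancels out
more_light = snr(400)            # ~ sqrt(400) = 20

print(f"{low_light:.1f} {high_iso:.1f} {more_light:.1f}")
```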

In this signal chain from the spatial distribution of light to bits from the sensor, it is important to note that many of the limitations are fundamental physical limitations (shot noise, depth-of-field, diffraction limit). The camera technology is very good even at today's level, and the room for improvement is rather limited.

#### **2.2 Digital camera signal processing**

Before the image from the sensor is available as an image file, it undergoes several digital processing steps. There are at least three different goals in this processing:

1. Correcting image defects
2. Improving the visual appearance of the image
3. Making the image technically easier to use

No image processing can increase the amount of information in the image. Some of the image processing steps may be useful in scientific imaging, but others may make the image less useful in further image processing. At this point the purpose-built scientific cameras are usually more faithful to the actual scene. The image processing steps and algorithms in each individual camera type are proprietary, but there are some general steps which are almost unavoidable:

• Dead pixel elimination
• Pixel-to-pixel non-uniformity (fixed noise) compensation
• Demosaicing (color reproduction)

After these processing steps the result is an RGB image in a linear color space. This is usually the most useful image form for further image processing. Many digital cameras provide such an image under the name "RAW". However, the raw image format is manufacturer-specific, usually does not have any demosaicing applied, and may or may not compensate for dead pixels. The RAW formats are seldom officially published, but there are utilities such as dcraw (Coffin, 2011) which can be used to convert the images to some format easier to handle with image processing software.

One attempt to standardize the varying RAW file formats is the DNG ("digital negative") format by Adobe (Adobe Systems Incorporated, 2009). The DNG specification gives an overview of different aspects of RAW images. Despite this attempt, using the RAW images may be slightly challenging in practice.

In everyday photography, the images are usually processed in-camera with at least the following steps:

• Illumination correction (white balance)
• Sharpening
• Noise reduction
• Color space conversion to a non-linear one
• Compression

The purpose of this processing is to make the image look visually pleasing and occupy little space. Unfortunately, these steps reduce the information available in the image. The goal of the color space conversion and compression is to reduce the number of bits without removing visually important information. For example, the very commonly used JPEG (ECMA, 2009) image compression reduces the chrominance (color) information more than the luminance (brightness) information, as the human visual system is more sensitive to luminance changes.

The camera image processing may be aware of certain optical properties of the objective lens, e.g., chromatic aberration, geometric aberrations, vignetting (loss of light at the corners), and may perform corrective operations.

For machine vision applications, there are two possibilities when choosing the file format. If a single camera type is used, then using RAW files of that camera preserves as much information as possible. If there is no control over the camera type, then setting the camera to produce JPEG files with as little processing (sharpening, noise reduction, compression) as possible gives a reasonably standardized result.

It should be noted that even the RAW images from the camera may be compressed images. An analysis of dcraw's source code reveals, for example, that some RAW files have the AD outputs further quantized. Whether or not such operations actually reduce any useful information from the image is unknown, as their effect may remain below the noise level.

The example images used in this chapter have been compressed in-camera with moderate JPEG compression. This approach is not optimal, and some of the noise visible in Figure 6 is likely to be compression artefacts. Note that all color space transformations tend to emphasize the compression noise, because JPEG is designed for good visual appearance in the RGB color space.

One additional concern is storing the images for long-term use. JPEG images can most probably be read a long time in the future, as JPEG is the most common photographic image file format at the moment. The less common RAW formats may be difficult to interpret in the decades to come, as none of the formats has gained significantly larger popularity than the others. In medical applications the images may be stored in the DICOM (NEMA, 2011) format. This does not change the work flow in the digital camera in any way, and the DICOM format offers a number of different lossy and lossless data encoding schemes, whose suitability depends a great deal on the imaging application itself. However, the use of DICOM does not necessarily mean the images will be completely future-proof, as the standard and its use evolve.

#### **2.3 Photographic setup**

Designing a suitable photographic setup is very important in scientific or medical photography. The arrangement of the photographic setup usually has much more impact on the usability of the image than the differences between cameras.

The purpose of photographic imaging in dermatology is to measure the spectral reflectance of skin at different points in the area of interest. The accuracy of this process depends on a large number of parameters related to the illumination, setup geometry, optics, camera settings, and the camera itself.

A diagram of the photographic setup used in our project is shown in figure 3. The forearm of the patient is supported so that it does not move during the test. The support is built to

**Lamp type Spectral properties**

Table 1. Typical spectral properties of different light sources

⎞ ⎠ = *k*

in-depth discussion of white balance, see Viggiano (2004).

light source.

specular reflections than area sources.

⎛ ⎝

components have to be adjusted by a multiplication:

*Rcorr Gcorr Bcorr*

⎛ ⎝

Photographic flash Several rather narrow peaks

nm

angle.

Incandescent lamp Smooth spectrum, low intensity in the blue

Practical Imaging in Dermatology 143

Fluorescent tube Usually a spectrum consisting of a large

Monochromatic LED A single peak with a typical peak width of 20

White LED Either a single narrow blue peak and a wide

in a RGB value with equal amount of each of the components. If this is not the case, the

where *X* is the uncorrected value for color X, *Xgray* the uncorrected value for the gray surface illuminated with the same illuminant, *Xcorr* the corrected value, and *k* a normalizing factor. If the automatic white balance setting offered by the camera is used, the camera algorithms try to find the optimal correction coefficients. This process is based on typical photographic scenes, and it may result in unexpected correction results in scientific photography. It should be noted that the white balance correction is often quite significant especially in the blue channel.

Most cameras try to convert the information from the image sensor to the commonly used sRGB color space. This conversion usually involves using a 3x3 matrix multiplication from the camera RGB to the sRGB color space. The conversion may be beneficial for human viewing, but after it has been performed, the white balance equations above do not hold true. For an

Film-based medical photography used to have some applications with monochrome film and an external filter to enhance the contrast of specific features. In general, this method cannot be substituted with digital color photography and digital post processing, as the transmission properties of the filters do not match those of the color filters of the camera. The solution is to use a filter in front of the color camera, filter the light source, or use a (quasi-)monochromatic

The geometric properties of the illumination have a lot of impact on the contrast of the image. The illumination may create shadows or specular reflections (bright spots) which are usually undesired, but in some cases they may be useful in detecting feature outlines or surface normal directions. In general, a point-like light sources give more contrast and more shadows and

1/*Rgray* 0 0 0 1/*Ggray* 0 0 0 1/*Bgray*

(short wavelength) region (tungsten halogen

yellow peak ("white LED") or three narrow separate peaks (combination of R, G, and B LEDs). The color may depend on the radiation

> ⎞ ⎠

⎛ ⎝ *R G B*

⎞

⎠ , (2)

lamps have more blue emission)

number of separate narrow peaks

Fig. 3. The photographic setup for the skin prick test.

eliminate unnecessary muscular strain, as the test takes a half an hour to complete. The light falls from the direction of the camera to eliminate shadows. While the basic structure of the test setup is well-suited to the application, the illumination can be improved. Especially, the spectral properties of the light source were suboptimal, as there was little short wavelength (blue) radiation in the illumination.

#### **2.3.1 Illumination**

Illumination has an extremely important role. There are three main aspects to be taken into account in designing the lighting:


eliminate unnecessary muscular strain, as the test takes half an hour to complete. The light falls from the direction of the camera to eliminate shadows. While the basic structure of the test setup is well suited to the application, the illumination can be improved. In particular, the spectral properties of the light source were suboptimal, as there was little short-wavelength (blue) radiation in the illumination.

Fig. 3. The photographic setup for the skin prick test.

#### **2.3.1 Illumination**

Illumination has an extremely important role. There are three main aspects to be taken into account in designing the lighting:

1. amount of light
2. spectral properties of the illumination
3. geometric properties of the illumination

In practice, the requirement for a correct amount of light translates to the requirement of having enough light. Insufficient illumination decreases the amount of information in the photograph, either as a loss of resolution or as an increase in noise. (In some rather rare cases one should also pay attention to the effect of the light or thermal radiation on the illuminated tissue.)

The way colors are formed in a digital camera means that the camera measures the product of the spectrum of the illumination, the spectral reflection of the target, and the spectral transmission of the camera color filters. This process naturally loses most of the spectral reflection information of the target, and accurate measurement of color-related quantities is difficult in digital photography.

Table 1 summarizes the typical spectral properties of some illuminants. The variation within each technology is usually significant, and if reliable color reproduction is required, the light source has to be chosen and calibrated carefully. It should also be noted that changing the camera type changes the camera filter absorption spectrum and thus the color reproduction.

Table 1. Typical spectral properties of different light sources

In most cases the color of the illuminant can be sufficiently compensated for by white balance compensation. The light reflected off a white or neutral gray target should result in an RGB value with an equal amount of each of the components. If this is not the case, the components have to be adjusted by a multiplication:

$$
\begin{pmatrix} R_{corr} \\ G_{corr} \\ B_{corr} \end{pmatrix} = k \begin{pmatrix} 1/R_{gray} & 0 & 0 \\ 0 & 1/G_{gray} & 0 \\ 0 & 0 & 1/B_{gray} \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} \tag{2}
$$

where *X* is the uncorrected value for color X, *Xgray* the uncorrected value for the gray surface illuminated with the same illuminant, *Xcorr* the corrected value, and *k* a normalizing factor. If the automatic white balance setting offered by the camera is used, the camera algorithms try to find the optimal correction coefficients. This process is based on typical photographic scenes, and it may result in unexpected correction results in scientific photography. It should be noted that the white balance correction is often quite significant especially in the blue channel.
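As a concrete sketch, equation (2) can be applied per pixel. The helper below uses illustrative names (not from the chapter) and assumes the gray-patch RGB has already been measured from the photograph:

```python
# Hypothetical helper illustrating equation (2): each channel is scaled
# by the inverse of the camera's response to a neutral gray patch.

def white_balance(pixel, gray, k=None):
    """Correct an (R, G, B) pixel using the (R, G, B) values measured
    from a gray patch under the same illuminant.

    If k is None, normalize so the gray patch keeps its green value
    (an arbitrary but common brightness convention)."""
    if k is None:
        k = gray[1]
    return tuple(k * c / g for c, g in zip(pixel, gray))

# A gray card photographed as (200, 180, 120) indicates a warm illuminant;
# after correction the card itself becomes neutral:
print(white_balance((200, 180, 120), (200, 180, 120)))  # -> (180.0, 180.0, 180.0)
```

The choice of *k* only fixes the overall brightness; the essential step is the per-channel division by the gray-patch values.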

Most cameras try to convert the information from the image sensor to the commonly used sRGB color space. This conversion usually involves using a 3x3 matrix multiplication from the camera RGB to the sRGB color space. The conversion may be beneficial for human viewing, but after it has been performed, the white balance equations above do not hold true. For an in-depth discussion of white balance, see Viggiano (2004).
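To see why the matrix conversion breaks the simple per-channel correction, consider a sketch with a made-up conversion matrix; real matrices are camera-specific and obtained by calibration:

```python
# A sketch of the camera-RGB to sRGB conversion as a 3x3 matrix product.
# The coefficients below are invented for illustration only.

CAM_TO_SRGB = [
    [ 1.6, -0.4, -0.2],
    [-0.3,  1.5, -0.2],
    [ 0.0, -0.5,  1.5],
]

def cam_to_srgb(rgb):
    # Rows sum to 1 so neutral (gray) inputs stay neutral.
    return tuple(sum(CAM_TO_SRGB[i][j] * rgb[j] for j in range(3))
                 for i in range(3))

# The cross-channel terms are why the per-channel white balance of
# equation (2) no longer holds after this conversion.
print(cam_to_srgb((0.5, 0.5, 0.5)))
```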

Film-based medical photography used to have some applications with monochrome film and an external filter to enhance the contrast of specific features. In general, this method cannot be substituted with digital color photography and digital post processing, as the transmission properties of the filters do not match those of the color filters of the camera. The solution is to use a filter in front of the color camera, filter the light source, or use a (quasi-)monochromatic light source.

The geometric properties of the illumination have a strong impact on the contrast of the image. The illumination may create shadows or specular reflections (bright spots), which are usually undesired, but in some cases they may be useful in detecting feature outlines or surface normal directions. In general, point-like light sources give more contrast and more shadows and specular reflections than area sources.



There is very little that can be done in the camera or in image processing to compensate for a bad illumination geometry. However, if shininess or specular reflections are the only problems in an otherwise good photographic setup, a polarizing filter may help, as it blocks the light reflected at certain angles.

#### **2.3.2 Camera settings**

There are essentially four independent settings in the camera which can be adjusted:

1. focal length
2. focus
3. aperture
4. exposure time


The focal length together with the image sensor size determine the field of view. In a given photographic setup the position of the camera is usually fixed, so that an objective with a suitable focal length needs to be used. In general, objectives with fixed focal length exhibit better optical performance than zoom objectives. The optimal distance between the target and the camera depends on the application, but usually very short focal lengths (wide field of view) are to be avoided, if possible. Figure 4 illustrates the effect of focal length on the perspective of the image.

(a) Focal length: 105 mm, *f* /7.1 (b) Focal length: 35 mm, *f* /7.1 (c) Focal length: 35 mm, *f* /1.8

Fig. 4. Effect of focal length and aperture changes. The perspective distortion is much smaller at a longer focal length, and the depth of field is shorter with larger aperture (small *f*-number). The target and the imaging angle are the same in all images.

Digital cameras offer a choice between automatic and manual focus settings. If the imaging setup is fixed, the manual focus setting is preferred, as the autofocus algorithms are tuned to everyday photography, and the scenes in medical or scientific imaging are different.

The aperture (or entrance pupil) of the optics describes the size of the light collecting area of the objective. A large aperture collects more light and is thus useful in low-light situations. However, the depth of field (useful focus range) is very short at a large aperture (see Figure 4). Also, at the large end of the aperture range the image resolution may suffer due to practical limitations. On the other hand, very small apertures have more diffraction, and this may slightly deteriorate the image resolution, as well. The optimal choice of the aperture depends on the application, and finding it may require some experimenting.


It should be noted that comparing the actual light collecting aperture of different cameras is not straightforward. The aperture size is usually given as the *f*-number (or *f* /#), so that the actual aperture size in physical units is the focal length divided by the *f*-number. Thus, a large *f*-number indicates a small aperture, and the *f*-number itself is not meaningful unless the focal length is known. Even in digital cameras with interchangeable lenses the aperture area may vary in the range of 1:4 with the same field of view and aperture number.
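The relation between focal length, *f*-number, and collecting area can be made concrete with a small helper (function names are ours, for illustration):

```python
import math

# The f-number gives the aperture only relative to the focal length:
# entrance pupil diameter = focal length / f-number.

def aperture_area_mm2(focal_length_mm, f_number):
    d = focal_length_mm / f_number          # pupil diameter in mm
    return math.pi * (d / 2.0) ** 2

# At the same f/2.8, a 105 mm lens collects 9 times the light of a
# 35 mm lens, because (105/35)^2 = 9:
print(round(aperture_area_mm2(105.0, 2.8) / aperture_area_mm2(35.0, 2.8), 1))  # -> 9.0
```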

The exposure time is perhaps the simplest adjustment. The longer the exposure time, the more photons reach the imaging element. Usually the shortest exposure times available in digital cameras are below one millisecond, and the longest in the order of dozens of seconds. In practice, the short end of the exposure range is not very useful in medical photography, as there is very seldom enough light. On the other hand, exposure times above 1/60 s are prone to movement in the image. The amount of movement depends very much on the actual application; sometimes much longer exposure times are useful.

In some cases the dynamic range of the scene surpasses that of the digital camera. The dynamic range of a digital camera is typically approximately 1:2000. If this is not sufficient for some reason, it is possible to take several shots of the same scene with different exposure times. This technique is called bracketing, and the imaging method carries the acronym HDR (High Dynamic Range). The Achilles' heel of HDR is the time difference between different shots, which usually makes it impossible to apply to moving objects.
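The merging idea behind HDR can be sketched minimally as follows; the clipping thresholds and names are illustrative, not from the chapter:

```python
# A minimal sketch of HDR merging from bracketed exposures: scale each
# pixel by 1/exposure and average, ignoring clipped samples.

def merge_hdr(samples, lo=10, hi=245):
    """samples: list of (pixel_value, exposure_time_s) for one pixel."""
    usable = [(v, t) for v, t in samples if lo <= v <= hi]
    if not usable:
        usable = samples          # all samples clipped: use them anyway
    return sum(v / t for v, t in usable) / len(usable)

# Shots at 1/60 s and 1/15 s of the same scene point agree on the
# relative radiance once normalized by exposure time:
print(merge_hdr([(60, 1 / 60), (240, 1 / 15)]))
```

Real HDR pipelines also weight samples by reliability and account for the camera response curve; the normalization by exposure time is the core of the technique.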

#### **3. Wheal shape and size recognition**

The skin prick test is widely used in allergy testing. As the test results in local skin reactions (wheals) around a well-defined test site, it lends itself well to computer-based interpretation.

#### **3.1 Medical background**

The skin prick test is a well-known and well-established method for quantitative measurement of allergic reactions (Oppenheimer et al., 2011). An allergen is introduced percutaneously; a drop of allergen solution is dropped onto the skin, and the skin is then punctured by a small blade designed for this purpose. If there is an allergic reaction, a wheal will emerge.

#### **3.1.1 Skin reactions**

If an allergen is introduced percutaneously, and there are IgE antibodies to that allergen, an inflammatory reaction will arise. Two different types of cells are involved in the process, basophils and mast cells. These cells release chemicals associated with inflammation, such as cytokines and histamines. These chemicals then mediate processes such as vasodilatation (expansion of the blood vessels) and increased permeability of the blood vessel walls.

The result of these processes is redness and heating due to increased blood flow, and swelling due to leakage of fluid into the surrounding tissues. In the case of the prick test, the reaction is localized around the prick site and produces an approximately circular wheal. In practice, the shape of the wheal may vary a lot depending on the structure of the surrounding tissue.


The prick test has some shortcomings. The magnitude of the skin response varies from one patient to another. For example, with young patients the size of the wheal becomes larger as the patient grows (Meinert et al., 1994). Also, the correlation between the actual allergic reactions and the skin reactions is not always very good, and there are several practical factors which may change the results (Haahtela et al., 2010).

#### **3.1.2 Prick testing in practice**

In practice, the prick test is used to test several allergens at the same time. To facilitate this, a suitable even skin area is required, the most common such area being the inner forearm. The number of allergens tested at the same time varies, but typically the inner forearm can carry 20 pricks.

The results are read some 15 to 20 minutes after the prick. The time is sufficient for the reaction to emerge but not long enough to let the symptoms fade. There is some evidence that if the time is too long, the reliability of the test may suffer (Seibert et al., 2011). However, the actual development of the inflammation reaction as a function of time is not generally known, as there have not been any suitable tools for measuring it. One of the aims of our research is to introduce these tools, as a series of photographs reveals the development of the wheal as a function of time.

There are different readout methods in use worldwide. In many cases the results are evaluated pseudo-quantitatively by visual inspection only. Seibert et al. (2011) argue that the most reliable way of testing is to measure the size of the wheal, as visual estimation has been shown to be highly variable.

When the wheal size is measured, there are several different ways of doing it. Traditionally, the practitioner uses a ruler to manually measure the size of the wheal. The test procedure assumes an elliptic shape, with possible elongated branches (called *pseudopodia*) disregarded and the result of the measurement is the mean of the major and minor axes of the imaginary ellipse (Santos et al., 2008). There is also a good correlation between the length of the long axis and the area of the wheal, but the mean method should yield even better results.
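The readout rule is trivial to express in code; the function names below are ours, for illustration:

```python
import math

# The conventional mean-diameter readout and the fitted-ellipse area.

def wheal_size_mm(major_mm, minor_mm):
    """Conventional result: mean of the major and minor axes."""
    return (major_mm + minor_mm) / 2.0

def wheal_area_mm2(major_mm, minor_mm):
    """Area of the fitted ellipse, an alternative quantity."""
    return math.pi * (major_mm / 2.0) * (minor_mm / 2.0)

# An 8 mm x 6 mm wheal reads as 7 mm:
print(wheal_size_mm(8.0, 6.0))  # -> 7.0
```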


Fig. 5. Manual allergy measurement.


The practice of measuring the wheal varies. The practitioner may use a transparent ruler and slightly press the wheal to see the outline, or she may try to use the color change only. An illustration of the measurement is shown in Figure 5, where the application of pressure can be seen. Our computerized interpretation method uses the color change only, and the wheal is always interpreted as an ellipse.

One of the challenges in skin prick testing is the difference of reactions between different individuals. A common way to account for this is to use histamine as one of the test allergens. It is not an allergen, but it always causes an inflammatory reaction, which can be used as a reference of the individual reaction. Saline solution can also be introduced as a zero control which should not cause any reaction. We used both a zero reference and histamine reference in our study.

#### **3.2 Wheal extraction by grayscale transformation**

The problem of finding the area within a wheal in a photograph carries some challenges. The color of the skin varies from one person to another, and the skin does not have an even color. The wheal edges are not sharp, so a threshold has to be defined. Naturally, the first step in finding out the wheal area is to convert the RGB image into a single-variable (grayscale) image. Preferably, the method should be adaptive, as the range of skin colors and illumination variations is large.

Among the earlier studies in the field, Roullot et al. (2005) consider seven well-known color spaces and compare the separability of the reaction from the background using a training database. They discover that the optimal dimension among the color spaces is the *a*∗ component of the *L*∗*a*∗*b*∗ color space. Using the extracted *a*∗ component, they apply simple thresholding for segmenting the wheal.

Nischik *et al.* (Nischik & Forster, 1997) also discover the *L*∗*a*∗*b*∗ color space most suitable for the wheal segmentation and use the standard deviations of the *L*∗ and *a*∗ components as the features for classification. The classifier is trained to separate between foreground (the wheal) and the background (healthy skin) using manually generated training data. The classifier output determines directly the boundary between the two regions.

Recent work by Celebi *et al.* considers finding an optimal color transformation for extracting the foreground (Celebi et al., 2009). Although the paper concentrates on melanoma segmentation, the principle is applicable to other skin diseases as well. The paper searches for an optimal linear combination of the RGB components, such that the output maximizes the separability of the foreground and background. Thus, the algorithm iterates over all projections defined on a finite grid and tests their performance by measuring the Fisher ratio of the foreground and background, which are determined in each iteration using Otsu thresholding.
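The Otsu split used inside such a search loop can be sketched compactly; this is a generic implementation, not the authors' code:

```python
# Otsu's threshold on a 1-D sample set: pick the cut that maximizes the
# between-class variance of the two resulting groups.

def otsu_threshold(values, bins=256):
    hist = [0] * bins
    for v in values:
        hist[min(int(v), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(bins):
        w0 += hist[t]                      # samples at or below t
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / (total - w0)
        var_between = w0 * (total - w0) * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated intensity clusters are split between them:
print(otsu_threshold([58, 60, 61, 59, 198, 200, 202, 199]))  # -> 61
```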

The method of Celebi et al. (2009) is an unsupervised method, which attempts to find the best projection without any user manual assistance. However, in our work we study the case, in which the user points the approximate center of the wheal. From the practical point of view this is acceptable, because it requires less work than the manual measurement. However, we plan to automate the detection of the wheal location in the future. Note also that in the

experiments, the training set of the wheal (S1) is obtained inside the circular neighborhood with the radius of 10 pixels. The training set of the healthy skin (S0) is acquired from pixels that are far away from the center. We have used all pixels located at a radius between 45 and

Practical Imaging in Dermatology 149

The natural tool for optimally projecting the three-dimensions to grayscale is the *Fisher Discriminant*, Fisher (1936). Fisher discriminant finds the projection dimension **w** that maximizes the separability of the classes in terms of the ratio of the between-class-variance

*<sup>J</sup>*(**w**) = **<sup>w</sup>***T***S***B***<sup>w</sup>**

where **<sup>S</sup>***<sup>W</sup>* <sup>∈</sup> **<sup>R</sup>**3×<sup>3</sup> and **<sup>S</sup>***<sup>B</sup>* <sup>∈</sup> **<sup>R</sup>**3×<sup>3</sup> are the within-class and between-class scatter matrices,

**w** = **S**−<sup>1</sup>

An example of the result of the Fisher discriminant projection is shown in Figure 6 (e).

The FD is a special case of the so called *Kernel Fisher Discriminant* (KFD) (Mika et al., 1999; Schölkopf & Smola, 2001), which is a kernelized version of the standard FD. As all kernel methods, the KFD implicitly maps the original data into a high-dimensional feature space and finds the optimally separating manifold there. Using the implicit mapping via the *kernel trick*, the explicit mapping can be avoided, which allows calculating the FD even in

The kernel trick enables better separation between the two classes, which is important because the foreground and the background are typically not linearly separable in the original RGB space. For example, it might be that the foreground is in the middle of two background color regions in the RGB space. One consequence of this is the fact that most authors prefer the *L*∗*a*∗*b*∗ color space, because the classes are better linearly separable there. However, the kernel trick makes the color space transformation less significant due to the transformation into a higher dimensional space, where the classes will be better separable almost regardless of the

<sup>1</sup> , **<sup>x</sup>**<sup>F</sup>

After the samples from the two classes have been collected, the optimally separating linear transformation is given by projection onto the vector **w** defined by the Fisher discriminant as

**<sup>w</sup>** = (**C**<sup>1</sup> <sup>+</sup> **<sup>C</sup>**0)−1(μ<sup>1</sup> <sup>−</sup> <sup>μ</sup>0),

<sup>2</sup> ,..., **<sup>x</sup>**<sup>F</sup> *NF*

*NB* }, where each sample is a 3-dimensional vector in the RGB space: **<sup>x</sup>**F,B

**<sup>w</sup>***T***S***W***<sup>w</sup>** , (3)

*<sup>W</sup>* (μ<sup>1</sup> − μ0), (4)

}, and the background samples by

*<sup>k</sup>* =

50 pixels from the center, as illustrated in Figure 6 (h).

and within-class-variance; i.e., the so called Fisher ratio:

respectively. It can be shown that the optimal direction **w** is given by

where <sup>μ</sup><sup>1</sup> <sup>∈</sup> **<sup>R</sup>**<sup>3</sup> and <sup>μ</sup><sup>0</sup> <sup>∈</sup> **<sup>R</sup>**<sup>3</sup> are the sample means of <sup>S</sup><sup>1</sup> and <sup>S</sup>0.

an infinite-dimensional space.

Denote the foreground samples by <sup>Υ</sup><sup>1</sup> <sup>=</sup> {**x**<sup>F</sup>

original color space.

<sup>1</sup> , **<sup>x</sup>**<sup>B</sup>

<sup>2</sup> ,..., **<sup>x</sup>**<sup>B</sup>

**3.3 The Fisher discriminant projection**

<sup>Υ</sup><sup>0</sup> <sup>=</sup> {**x**<sup>B</sup>

follows,

(*r* F,B *<sup>k</sup>* , *<sup>g</sup>*F,B *<sup>k</sup>* , *<sup>b</sup>*F,B *<sup>k</sup>* )*T*.

Fig. 6. The projection of the wheal in RGB color space. The original RGB image is shown in Figure (a), and (b) shows the hue component, (c) the *a*∗ component of the *L*∗*a*∗*b*∗ color space and (d) is the projection proposed in Celebi et al. (2009). Optimal Fisher discriminant projection is shown in Figure (e), and the results of its kernelized version are shown in figures (f) and (g) using RBF kernel with bandwidth *σ* selected using Silverman rule of thumb and with fixed *σ* = 0.5. The training sets are obtained from areas shown in (h), where the blue center is the foreground sample region and the green circle is the background sample region.

temporal direction, clicking the last image in the time series is enough for determining the wheal location in all pictures, if temporal motion compensation is used.

The key problem when searching the borders of the wheal is the poor contrast between the wheal and skin. An example of a wheal is illustrated in Figure 6 (a). Although the wheal borders are barely visible, the shape becomes highlighted when mapped into grayscale in a suitable manner. Well known mappings for skin color processing include the hue component of the HSV color space (Figure 6 (b)) and the *a*∗ component of the *L*∗*a*∗*b*∗ color space (Figure 6 (c)). In all projections, we have smoothed the RGB image by convolution with a disc shaped window of radius 5. However, these are more or less arbitrary, and variability in skin color and allergic reaction strength may decrease their applicability. Instead, training based projections may improve the separation further, and make it more invariant for all patients. An unsupervised method for finding a well-separating projection in terms of the Fisher criterion was proposed by Celebi *et al.* (Celebi et al., 2009), whose result is shown in Figure 6 (d). In this case the coefficients are 1, -0.1 and -0.3 for red, green and blue channels, respectively.

Optimality of the grayscale projection can be studied assuming that we know the approximate location of the wheal. This way we can construct training sets consisting of the wheal area and the surrounding healthy skin, denoted by S<sup>1</sup> and S0, respectively. With the training sets we can seek for optimal separation in the RGB space in a supervised fashion.

The training set is acquired as follows. When the user has pointed out the approximate location of the center of the wheal, a set of RGB values is obtained from the neighborhood. In our


experiments, the training set of the wheal (S1) is obtained inside the circular neighborhood with the radius of 10 pixels. The training set of the healthy skin (S0) is acquired from pixels that are far away from the center. We have used all pixels located at a radius between 45 and 50 pixels from the center, as illustrated in Figure 6 (h).
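These circular sampling regions are straightforward to express as boolean masks. A minimal NumPy sketch (our own illustration; the function name and the (row, column) click-coordinate convention are assumptions):

```python
import numpy as np

def training_masks(shape, center, r_fg=10, r_bg_in=45, r_bg_out=50):
    """Boolean masks for the wheal (S1) and healthy-skin (S0) training pixels.

    S1: pixels within r_fg of the clicked center.
    S0: pixels in the annulus between r_bg_in and r_bg_out from the center.
    """
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    s1 = d2 <= r_fg ** 2
    s0 = (d2 >= r_bg_in ** 2) & (d2 <= r_bg_out ** 2)
    return s1, s0

# Collecting the RGB training vectors from an image, given the user's click:
# s1, s0 = training_masks(img.shape[:2], (cy, cx))
# fg_samples, bg_samples = img[s1], img[s0]
```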

The natural tool for optimally projecting the three dimensions to grayscale is the *Fisher discriminant* (Fisher, 1936). The Fisher discriminant finds the projection direction **w** that maximizes the separability of the classes in terms of the ratio of the between-class variance and the within-class variance, i.e., the so-called Fisher ratio:

$$f(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{S}\_B \mathbf{w}}{\mathbf{w}^T \mathbf{S}\_W \mathbf{w}},\tag{3}$$

where $\mathbf{S}\_W \in \mathbf{R}^{3\times 3}$ and $\mathbf{S}\_B \in \mathbf{R}^{3\times 3}$ are the within-class and between-class scatter matrices, respectively. It can be shown that the optimal direction $\mathbf{w}$ is given by

$$\mathbf{w} = \mathbf{S}\_{W}^{-1} (\boldsymbol{\mu}\_{1} - \boldsymbol{\mu}\_{0}), \tag{4}$$

where $\boldsymbol{\mu}\_1 \in \mathbf{R}^3$ and $\boldsymbol{\mu}\_0 \in \mathbf{R}^3$ are the sample means of $S\_1$ and $S\_0$.

An example of the result of the Fisher discriminant projection is shown in Figure 6 (e).
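Eqs. (3)-(4) translate directly into code. The following NumPy sketch is our own illustration (not the chapter's Matlab implementation):

```python
import numpy as np

def fisher_direction(fg, bg):
    """Fisher discriminant direction w = S_W^{-1} (mu_1 - mu_0), Eq. (4).

    fg, bg: (N, 3) arrays of RGB training samples for wheal and skin.
    """
    mu1, mu0 = fg.mean(axis=0), bg.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    s_w = (fg - mu1).T @ (fg - mu1) + (bg - mu0).T @ (bg - mu0)
    return np.linalg.solve(s_w, mu1 - mu0)

# Grayscale projection of an H x W x 3 image: gray = img @ w
```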

The FD is a special case of the so-called *Kernel Fisher Discriminant* (KFD) (Mika et al., 1999; Schölkopf & Smola, 2001), which is a kernelized version of the standard FD. As with all kernel methods, the KFD implicitly maps the original data into a high-dimensional feature space and finds the optimally separating manifold there. Using the implicit mapping via the *kernel trick*, the explicit mapping can be avoided, which allows computing the FD even in an infinite-dimensional space.

The kernel trick enables better separation between the two classes, which is important because the foreground and the background are typically not linearly separable in the original RGB space. For example, it might be that the foreground is in the middle of two background color regions in the RGB space. One consequence of this is the fact that most authors prefer the *L*∗*a*∗*b*∗ color space, because the classes are better linearly separable there. However, the kernel trick makes the color space transformation less significant due to the transformation into a higher dimensional space, where the classes will be better separable almost regardless of the original color space.

Denote the foreground samples by $\Upsilon\_1 = \{\mathbf{x}^F\_1, \mathbf{x}^F\_2, \ldots, \mathbf{x}^F\_{N\_F}\}$ and the background samples by $\Upsilon\_0 = \{\mathbf{x}^B\_1, \mathbf{x}^B\_2, \ldots, \mathbf{x}^B\_{N\_B}\}$, where each sample is a 3-dimensional vector in the RGB space: $\mathbf{x}^{F,B}\_k = (r^{F,B}\_k, g^{F,B}\_k, b^{F,B}\_k)^T$.

#### **3.3 The Fisher discriminant projection**

After the samples from the two classes have been collected, the optimally separating linear transformation is given by projection onto the vector **w** defined by the Fisher discriminant as follows,

$$\mathbf{w} = (\mathbf{C}\_1 + \mathbf{C}\_0)^{-1} (\mu\_1 - \mu\_0),$$


where $\mathbf{C}\_1 \in \mathbf{R}^{3\times 3}$ and $\mathbf{C}\_0 \in \mathbf{R}^{3\times 3}$ are the sample covariance matrices and $\mu\_1 \in \mathbf{R}^3$ and $\mu\_0 \in \mathbf{R}^3$ the sample means of the foreground and background samples, respectively.

#### **3.4 The kernel Fisher discriminant projection**

The Fisher discriminant projection is a special case of the kernel Fisher discriminant projection, occurring when the implicit mapping function is **Φ**(**x**) = **x**. The use of more complex mapping functions allows more flexible separation of the classes. For an arbitrary mapping, the kernel Fisher discriminant extends the FD by mapping the data into a higher-dimensional feature space H.

In practice the KFD can be calculated implicitly by substituting all dot products with a kernel function *κ*(·, ·). It can be shown that all positive definite kernel functions correspond to a dot product after transforming the data to a feature space H with a mapping **Φ**(·) (Schölkopf & Smola, 2001). The feature space H can be very high dimensional, and using the projection vector **w** directly may be impractical or impossible. Instead, the famous *Representer theorem* guarantees that the solution can be represented as a linear combination of the mapped samples (Schölkopf & Smola, 2001). Thus, the Fisher ratio in the feature space is based on the weights of the samples α instead of the weights of the dimensions:

$$J(\alpha) = \frac{\alpha^T \mathbf{Q}^T \mathbf{S}\_B^\Phi \mathbf{Q} \alpha}{\alpha^T \mathbf{Q}^T \mathbf{S}\_W^\Phi \mathbf{Q} \alpha},\tag{5}$$

where $\alpha = (\alpha\_1, \alpha\_2, \ldots, \alpha\_N)^T \in \mathbf{R}^N$ is the weight vector for the mapped training samples in the matrix $\mathbf{Q} = [\mathbf{\Phi}(\mathbf{x}\_1), \ldots, \mathbf{\Phi}(\mathbf{x}\_N)]$, and $\mathbf{S}\_B^{\mathbf{\Phi}}$ and $\mathbf{S}\_W^{\mathbf{\Phi}}$ are the between-class and within-class scatter matrices in the feature space H, respectively.

A solution similar to that of the Fisher discriminant in Eq. (4) can also be found for this case (Schölkopf & Smola, 2001). However, the inversion becomes more difficult, since the dimension of the weight vector α now equals the number of collected training samples. Therefore, we add a regularization term *λ***I**, where *λ* is a small positive scalar and **I** is the *N* × *N* identity matrix. Regularization also improves the robustness of the projection and makes it less likely to overfit, as the solution becomes less sensitive to the within-class scatter. In our notation this yields the solution

$$\alpha = (\mathbf{Q}^T \mathbf{S}\_W^{\Phi} \mathbf{Q} + \lambda \mathbf{I})^{-1} \mathbf{Q}^T (\mu\_1^{\Phi} - \mu\_0^{\Phi}),\tag{6}$$

where $\mu\_1^{\mathbf{\Phi}} \in H$ and $\mu\_0^{\mathbf{\Phi}} \in H$ are the sample means of the mapped wheal and skin samples, respectively.

It is straightforward to show that Eq. (6) can be expressed in terms of dot products and thus via the kernel trick (Mika et al., 1999; Schölkopf & Smola, 2001). Also the actual projection of a test sample $\mathbf{x} \in \mathbf{R}^3$ can be expressed through the kernel as

$$y = \alpha^T \mathbf{Q}^T \Phi(\mathbf{x}) = \sum\_{i=1}^N \alpha\_i \kappa(\mathbf{x}\_i, \mathbf{x}). \tag{7}$$
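Eqs. (5)-(7) translate quite directly into code. The sketch below is our own NumPy illustration (all names are ours, and the RBF kernel of Eq. (9) is assumed): it assembles the kernelized scatter and mean terms from the two training sets, solves the regularized system of Eq. (6), and projects pixels with Eq. (7):

```python
import numpy as np

def rbf(a, b, sigma):
    """RBF kernel matrix between the rows of a and the rows of b, Eq. (9)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kfd_fit(fg, bg, sigma=0.5, lam=1e-5):
    """KFD weights alpha of Eq. (6) for foreground/background sample arrays."""
    x = np.vstack([fg, bg])                        # all N training samples
    n = len(x)
    ks = [rbf(x, cls, sigma) for cls in (fg, bg)]  # N x N_i kernel blocks
    ms = [k.mean(axis=1) for k in ks]              # Q^T mu_i^Phi
    # Q^T S_W^Phi Q assembled class by class, as in Eq. (8):
    s = sum((k - m[:, None]) @ (k - m[:, None]).T for k, m in zip(ks, ms))
    alpha = np.linalg.solve(s + lam * np.eye(n), ms[0] - ms[1])
    return x, alpha

def kfd_project(x_train, alpha, pixels, sigma=0.5):
    """Grayscale projection y = sum_i alpha_i kappa(x_i, x), Eq. (7)."""
    return rbf(np.atleast_2d(pixels), x_train, sigma) @ alpha
```

Applied pixel-wise with the training sets of Section 3.2, this kind of projection would yield grayscale images analogous to Figures 6 (f-g).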


To reveal the kernel nature of (6), a part of it is further investigated, giving

$$\mathbf{Q}^T \mathbf{S}\_W^{\Phi} \mathbf{Q} = \sum\_{i=0,1} \sum\_{\mathbf{x} \in \mathcal{S}\_i} \mathbf{Q}^T (\Phi(\mathbf{x}) - \boldsymbol{\mu}\_i^{\Phi}) (\mathbf{Q}^T (\Phi(\mathbf{x}) - \boldsymbol{\mu}\_i^{\Phi}))^T. \tag{8}$$

Now it is rather clear that (6) can be represented using only dot products of the mapped samples. Since the projection of a test sample **x** to grayscale is also expressible as α*T***Q***T***Φ**(**x**), the explicit use of the mapping function **Φ**(**x**) can be substituted with dot products, which in turn are replaceable with Mercer kernels (Mika et al., 1999). This *kernel trick* removes the curse of dimensionality, ultimately allowing implicit mapping to an infinite-dimensional space. A more precise introduction to the KFD, although with different notation, is given in Mika et al. (1999) and in Schölkopf & Smola (2001).

There are various alternatives for the kernel function *κ*(·, ·), among which the most widely used are the polynomial kernels and the Radial Basis Function (RBF) kernel. We experimented with various kernels, and found out that the polynomial kernels do not increase the separation significantly when compared with the linear kernel, which is equivalent to the traditional FD. In other words, all low-order polynomial kernels produce a projection very similar to the first order kernel, shown in Figure 6 (e). However, the separation seems to improve with the RBF kernel

$$\kappa(\mathbf{u}, \mathbf{v}) = \exp\left(-\frac{||\mathbf{u} - \mathbf{v}||^2}{2\sigma^2}\right). \tag{9}$$

There are two parameters in the KFD projection with the RBF kernel: the regularization parameter *λ* and the kernel width *σ*². Since there is a lot of training data in our case, the result seems to be less sensitive to the regularization parameter *λ* than to the width *σ*². In our experiments we set *λ* = $10^{-5}$, and if the condition number of the matrix in Eq. (6) indicates that the matrix is close to singular, the value of *λ* is increased ten-fold until the inversion succeeds.

Figures 6 (f-g) illustrate the effect of the bandwidth parameter *σ*². Figure 6 (f) uses the bandwidth selected by the so-called Silverman's rule of thumb (Silverman, 1986), widely used in kernel density estimation and defined by $\hat{\sigma}\_{\mathrm{rot}} = 1.06\,\hat{\sigma}\_x N^{-1/5}$, where $\hat{\sigma}\_x$ is the sample standard deviation of the data and $N$ is the data length. In the example in Figure 6 (f) the rule of thumb gives $\sigma\_{\mathrm{rot}} = 1.37$. Figure 6 (g), on the other hand, illustrates the result with fixed *σ* = 0.5.
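The rule of thumb is a one-liner (our NumPy sketch; whether the chapter uses the biased or unbiased standard deviation estimate is our assumption):

```python
import numpy as np

def silverman_bandwidth(data):
    """Silverman's rule of thumb: sigma_rot = 1.06 * std(data) * N^(-1/5)."""
    data = np.asarray(data, dtype=float)
    # ddof=1 gives the unbiased sample standard deviation (an assumption here).
    return 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)
```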

It seems that the rule of thumb tends to give too large values for our problem. This can be seen from the visually improved separation which is obtained using a smaller bandwidth, *σ* = 0.5, whose result is shown in Figure 6 (g).

#### **3.5 Segmentation of the grayscale image**

As the final step of the wheal area segmentation, one has to threshold the grayscale-projected image in order to obtain a binary segmentation. The simplest approach is global thresholding. The problem of selecting a proper threshold has been studied extensively; e.g., Sezgin and Sankur (Sezgin & Sankur, 2004) compare 40 selected methods on a set of test images. One of their conclusions is that the best-performing method differs depending on the nature of the input grayscale image (e.g., text, natural scenes, etc.). Since one of our goals is to compare different grayscale projections for the wheal detection purpose, we restrict ourselves


to using only one automatic threshold selection method: the most widely used Otsu method (Otsu, 1979). Without this restriction, we would be comparing all combinations of grayscale projections and thresholding techniques, summing up to hundreds of combinations.
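Otsu's method simply maximizes the between-class variance over all 256 candidate thresholds. A compact NumPy sketch (our own illustration, assuming an 8-bit grayscale input):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold for an 8-bit grayscale image (Otsu, 1979).

    Returns the threshold t maximizing the between-class variance;
    the foreground is then gray > t.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))      # cumulative mean up to t
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)      # 0/0 at the histogram ends
    return int(np.argmax(sigma_b2))
```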

As an alternative to grayscale thresholding, we consider a newer approach, the powerful graph cut based segmentation. The idea of using graphs for solving hard optimization problems was originally discovered by Greig *et al.* (Greig et al., 1989), and extended to multilabel problems by Boykov *et al.* (Boykov et al., 2001). A fast implementation was proposed by Boykov *et al.* (Boykov & Kolmogorov, 2004), and their free implementation is widely used, and is the basis for our method, as well.

Graph cuts are a method for efficient minimization of certain energy functionals of the type

$$E[f] = \sum\_{p \in \mathcal{P}} D\_p(I\_p) + \sum\_{p, q \in \mathcal{N}} V\_{p, q}(I\_p, I\_q), \tag{10}$$

where *Dp*(*Ip*) reflects the distance from the image to be modeled (segmented in our case), and *Vp*,*q*(*Ip*, *Iq*) is the cost assigned to labeling neighboring points differently. In a sense, using graph cuts is *regularized thresholding*, where neighboring points are encouraged to have the same label. In practice, the difference from thresholding is that the resulting labeling has fewer holes. In our case, the graph cut approach for segmentation treats the KFD-projected grayscale image as a weighted graph, whose nodes represent the pixels and whose edges represent the connections between neighboring pixels. An illustration of this is shown in Figure 7, which represents a 3 × 3 image, whose nine pixels are considered as nodes of the graph. Each pixel is connected to its closest neighbors, and additionally to two special nodes: the *foreground* node and the *background* node.

The graph cut method attempts to split the graph into two disconnected parts, such that the foreground and background nodes are in different partitions. The splitting is done with minimal cost, i.e., such that the sum of the weights of the cut edges is as small as possible. As a result, the foreground area consists of the pixels connected to the foreground node after splitting the graph. Note that in our case an alternative graph formulation could take advantage of the known foreground and background locations: the background node could be connected only to the borders of the image (which are known to be background) and the foreground node only to the center of the image.

Fig. 7. The graph representation of a 3 × 3 image plane.

In order to obtain a reasonable segmentation result, the weights are determined according to the following rules. Edges connecting neighboring pixels have a weight inversely proportional to the difference of the grayscale values. This way homogeneous areas (with grayscales close to each other) have large weights and heterogeneous areas (large pixel difference) have small weights. In our case the weight of the edge between pixels *p* and *q* with grayscale intensities *Ip* and *Iq* is determined by the rule

$$V\_{p,q}(I\_p, I\_q) = 10 \cdot \exp\left(-\frac{|I\_p - I\_q|}{100}\right). \tag{11}$$

This function satisfies the requirement that far away grayscale values have a small weight while close grayscales obtain a large weight. The exact exponential form and the normalization coefficients of the function were obtained through experimentation, although various other choices were almost equally effective.

The edges connecting the pixels to *foreground* and *background* nodes are determined by the grayscale value. The idea is that the brighter the pixel, the stronger the connection to the foreground node and vice versa. In our case the foreground edge weight for pixel *p* with intensity *Ip* ∈ {0, 1, . . . , 255} was determined by the rule

$$D\_f(I\_p) = \frac{\gamma}{255 - I\_p + 1},\tag{12}$$

and the background edge weight by the rule

$$D\_b(I\_p) = \frac{1-\gamma}{I\_p+1}.\tag{13}$$

The idea is that the foreground connection should be strong when the pixel value *Ip* is large (close to 255) and the background connection should be strong when the pixel value *Ip* is small (close to 0).
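Eqs. (11)-(13) are easy to vectorize. A NumPy sketch of ours (building and cutting the actual graph, e.g. with the Boykov-Kolmogorov implementation, is omitted):

```python
import numpy as np

def pairwise_weight(ip, iq):
    """Neighbor edge weight of Eq. (11): large for similar intensities."""
    return 10.0 * np.exp(-np.abs(ip - iq) / 100.0)

def terminal_weights(intensities, gamma):
    """Foreground/background terminal edge weights of Eqs. (12)-(13)."""
    i = np.asarray(intensities, dtype=float)
    d_f = gamma / (255.0 - i + 1.0)        # strong for bright pixels
    d_b = (1.0 - gamma) / (i + 1.0)        # strong for dark pixels
    return d_f, d_b
```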

The parameter *γ* ∈ [0, 1] balances the edge weights and can be used for adjusting the foreground area size. There are closed-form solutions for the parameter to obtain a desired probability of a foreground pixel. However, an easier method is to use binary search over *γ* ∈ [0, 1] to get a desired ratio of background and foreground sizes. In our case we selected the desired ratio to be equal to that of the Otsu thresholding result.

As the final step, we remove all but the largest object from the segmentation result. This is because sometimes there remain small foreground areas that are due to noise.
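Keeping only the largest object amounts to connected-component labeling. A small pure-NumPy breadth-first-search sketch of ours (4-connectivity is an assumption):

```python
import numpy as np
from collections import deque

def largest_component(binary):
    """Keep only the largest 4-connected foreground region of a binary mask."""
    binary = np.asarray(binary, dtype=bool)
    labels = np.zeros(binary.shape, dtype=int)
    sizes = {}
    current = 0
    for i, j in zip(*np.nonzero(binary)):
        if labels[i, j]:
            continue                      # already visited
        current += 1
        labels[i, j] = current
        queue = deque([(i, j)])
        size = 0
        while queue:                      # BFS flood fill of one component
            a, b = queue.popleft()
            size += 1
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                na, nb = a + da, b + db
                if (0 <= na < binary.shape[0] and 0 <= nb < binary.shape[1]
                        and binary[na, nb] and not labels[na, nb]):
                    labels[na, nb] = current
                    queue.append((na, nb))
        sizes[current] = size
    if not sizes:
        return np.zeros_like(binary)
    best = max(sizes, key=sizes.get)
    return labels == best
```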

#### **3.6 Using a shape model for wheal area detection**

The transition from the background (the healthy skin) to the foreground (the wheal) can be quite smooth, and the KFD-projected image may contain several individual foreground regions although the image has only one wheal. This is mostly due to the noise in the data, whose effect is greatly emphasized by the grayscale projection.

Fig. 8. Left: The grayscale projection data of the wheal in Figure 6 using KFD projection with RBF kernel and *σ* = 1. Right: The final result after nonlinear LS fit.

In order to increase the robustness of the segmentation, we fit a shape model for the appearance of the wheal. Since the manual measurement assumes that the wheals are ellipses, an elliptic shape model seems reasonable. Thus, the problem is to find an ellipse that divides the image into two maximally inhomogeneous areas.

Since there is an infinite number of ellipses, we have to limit the search space somehow. This can be done by fitting a model to the grayscale projection and considering only the isosurfaces of the model. Based on Figure 6, the Gaussian surface seems an appropriate model for the spatial grayscale distribution in this case. Moreover, it suits our assumption of elliptic wheals, because the isosurfaces of the two-dimensional Gaussian are ellipses.

More specifically, the Gaussian model is defined by

$$f(\mathbf{x}; c, \mathbf{x}\_0, \boldsymbol{\Sigma}) = c \cdot \exp \left( - (\mathbf{x} - \mathbf{x}\_0)^T \boldsymbol{\Sigma} (\mathbf{x} - \mathbf{x}\_0) \right), \tag{14}$$

where *c* ∈ **R**<sup>+</sup> defines the scale of the Gaussian, **x** = (*x*, *y*)<sup>*T*</sup> denotes the image coordinates where the model is fitted, **x**<sub>0</sub> = (*x*<sub>0</sub>, *y*<sub>0</sub>)<sup>*T*</sup> denotes the location of the peak of the Gaussian, and **Σ** ∈ **R**<sup>2×2</sup> is a symmetric coefficient matrix.

The least squares (LS) fit to the grayscale image data is defined by

$$\min\_{c, \boldsymbol{\Sigma}, \mathbf{x}\_0} \sum\_{k=0}^{N} (z\_k - f(\mathbf{x}\_k; c, \mathbf{x}\_0, \boldsymbol{\Sigma}))^2,\tag{15}$$

where *zk* denotes the grayscale value at image position **x***k*. Note that the data has to be preprocessed by subtracting the minimum of *zk*, *k* = 0, . . . , *N*, in order to avoid a constant offset term in the model.

Fitting the Gaussian is a nontrivial problem, although a lot of literature exists on the topic (e.g., Brändle et al. (2000)). However, the easiest approach is to use software packages such as the Matlab Optimization toolbox to find the optimal parameters.


Fig. 9. Left: The KFDR as a function of the ellipse size. Center: The maximally separating ellipse overlaid on top of the corresponding KFD projection. Right: The maximally separating ellipse overlaid on top of original RGB data.

However, in order to ease the task, we first seek an initial guess for the coefficients by taking the logarithm of both the data and the model inside Eq. (15):

$$\min\_{c, \boldsymbol{\Sigma}, \mathbf{x}\_0} \sum\_{k=0}^{N} (\ln(z\_k) - \ln(f(\mathbf{x}\_k; c, \mathbf{x}\_0, \boldsymbol{\Sigma})))^2 = \min\_{c, \boldsymbol{\Sigma}, \mathbf{x}\_0} \sum\_{k=0}^{N} (\ln(z\_k) - \ln c + (\mathbf{x}\_k - \mathbf{x}\_0)^T \boldsymbol{\Sigma}(\mathbf{x}\_k - \mathbf{x}\_0))^2. \tag{16}$$

This makes the problem linear, and the result can be found in a closed form. However, taking the logarithm increases the importance of the smaller values, and the method essentially fits the model to the noise surrounding the wheal, not the wheal itself. Thus, the resulting solution is used only as the initial guess for the nonlinear LS problem without the logarithm. The nonlinear LS problem is then solved using Matlab Optimization toolbox.
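The two-stage fit can be sketched in numpy (our own illustration, not the authors' Matlab code). Here `gaussian_model` implements Eq. (14), and `loglinear_init` solves the linearized problem of Eq. (16) in closed form by fitting a general bivariate quadratic to ln *z<sub>k</sub>*; its output would then seed a nonlinear solver for Eq. (15):

```python
import numpy as np


def gaussian_model(X, c, x0, Sigma):
    """Eq. (14): f(x) = c * exp(-(x - x0)^T Sigma (x - x0)) for rows of X."""
    d = X - x0
    quad = np.einsum('ni,ij,nj->n', d, Sigma, d)
    return c * np.exp(-quad)


def loglinear_init(X, z):
    """Closed-form initial guess from Eq. (16): fit ln z with a general
    bivariate quadratic b0 + b1 x + b2 y + b3 x^2 + b4 xy + b5 y^2,
    then read off c, x0 and Sigma (z must be positive)."""
    x, y = X[:, 0], X[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    b = np.linalg.lstsq(A, np.log(z), rcond=None)[0]
    # Quadratic coefficients give -Sigma directly.
    Sigma = -np.array([[b[3], b[4] / 2], [b[4] / 2, b[5]]])
    # Peak location: the gradient of the quadratic vanishes at x0,
    # which yields 2 Sigma x0 = (b1, b2)^T.
    x0 = np.linalg.solve(2 * Sigma, b[1:3])
    # ln c is the value of the quadratic at the peak.
    ln_c = (b[0] + b[1] * x0[0] + b[2] * x0[1]
            + b[3] * x0[0]**2 + b[4] * x0[0] * x0[1] + b[5] * x0[1]**2)
    return np.exp(ln_c), x0, Sigma
```

On noise-free data generated from Eq. (14) the closed-form guess is already exact; with real data it only serves as the starting point for the iterative refinement described above.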

Figure 8 shows the original grayscale data on the left, the result of logarithmic fitting in the center and the result of nonlinear iterative fitting on the right.

The isosurfaces of the Gaussian fit can be used as candidates for elliptic segmentation. As noted earlier, all the isosurfaces are cross-sections of a paraboloid and thus ellipses. Moreover, due to the fitting, they most likely have the correct orientation and the correct ratio of major and minor axis lengths. Thus, our next goal is to seek the best elliptic isosurface among them all.

The definition of a good ellipse among the candidates needs some measure of separation between the segmented areas. Recent work by Harchaoui *et al.* (Harchaoui et al., 2008) considers using the Kernel Fisher Discriminant Ratio (KFDR) for testing the homogeneity between two sets, which coincides well with our use of KFD for grayscale projection in Section 3.2.

In other words, we test all ellipses that are cross sections of the fitted Gaussian and attempt to maximize the KFDR of Eq. (5) with respect to training sets defined by the ellipse. The situation is similar to the grayscale projection, but now we are not looking for a good classifier for the RGB data, but only assessing how well the data *could be classified*. Unlike Section 3.2, the choice of the training samples is now based on the boundaries of the ellipse to be tested.

In a sense, the KFDR homogeneity test attempts to design a classifier to separate the "inside" class and the "outside" class, and the KFDR is a natural measure of how well this can be done. Note that this is not equivalent to calculating the variances directly from the projections of Figure 6, because the projection is calculated separately for the training sets determined by each ellipse candidate.

Sometimes the KFDR separability criterion results in very small ellipses, because a small foreground training set tends to be well separable. As an extreme example, an ellipse containing only a single pixel has extremely good separability assuming no other pixel has exactly the same RGB value. Thus, we decided to modify the criterion by multiplying it with the cardinality of the smaller training set. Alternatively, we could set a minimum size restriction for the ellipse.

An example of the separability test is shown in Figure 9. The figure shows the KFDR between the "inside" class and the "outside" class for ellipses with different radius. It can be seen that the maximal separation is obtained at radius 42, and the corresponding ellipse is illustrated in Figure 9, as well.
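To make the selection concrete, the following toy sketch replaces the kernelized criterion of Eq. (5) with a plain Fisher ratio on a grayscale image and restricts the candidates to circles; `best_radius` scores each candidate by the separation ratio multiplied by the cardinality of the smaller class, as proposed above. This illustrates the idea only and is not the authors' implementation:

```python
import numpy as np


def fisher_score(values_in, values_out):
    """Between-class separation over within-class scatter, weighted by
    the cardinality of the smaller class (cf. the modified criterion)."""
    m1, m0 = values_in.mean(), values_out.mean()
    v1, v0 = values_in.var(), values_out.var()
    ratio = (m1 - m0) ** 2 / (v1 + v0 + 1e-12)
    return ratio * min(values_in.size, values_out.size)


def best_radius(img, center, radii):
    """Pick the circle radius that maximizes the weighted Fisher score
    between pixels inside and outside the circle."""
    yy, xx = np.indices(img.shape)
    dist = np.hypot(yy - center[0], xx - center[1])
    scores = [fisher_score(img[dist <= r], img[dist > r]) for r in radii]
    return radii[int(np.argmax(scores))]
```

In the actual method the candidates are the elliptic isosurfaces of the fitted Gaussian and the score is the KFDR computed in the kernel-induced feature space.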

#### **3.7 Experiments**

The results from the described method are compared to manual wheal segmentations (made by a non-medical expert). The similarity measure used by Celebi et al. (Celebi et al., 2009) compares the *areas* of the segmentations. For our purposes, this is not an appropriate criterion, since ultimately we are interested in the major and minor axes of the wheal. The error in areas increases quadratically with respect to the axes, which is not desirable. Instead, we used the following error criterion between the computer segmentation *A* and the manual ground truth *B*:

$$E(A,B) = \frac{\sqrt{\text{Area}(\text{OR}(A,B))} - \sqrt{\text{Area}(\text{AND}(A,B))}}{\sqrt{\text{Area}(B)}},\tag{17}$$

where OR(*A*, *B*) consists of pixels segmented as foreground in *A* or *B*, and AND(*A*, *B*) of foreground pixels in both *A* and *B*. Moreover, Area(*A*) is the number of foreground pixels in *A*. The favourable property of Eq. (17) is that it increases linearly with respect to the error in the major and minor axes. For example, it can be shown that the error measures for concentric circles with radii *r* + *a* and *r* − *a* are equal if the true radius is *r*. This is not the case with the error of (Celebi et al., 2009).
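Eq. (17) is straightforward to compute from boolean masks; a minimal numpy sketch:

```python
import numpy as np


def wheal_error(A, B):
    """Eq. (17): error between a computed segmentation A and the manual
    ground truth B, both boolean masks of the same shape."""
    union = np.logical_or(A, B).sum()    # Area(OR(A, B))
    inter = np.logical_and(A, B).sum()   # Area(AND(A, B))
    return (np.sqrt(union) - np.sqrt(inter)) / np.sqrt(B.sum())
```

Because the areas enter only under square roots, the measure scales with the wheal radius rather than its area, which is what produces the concentric-circle property above.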

Examples of segmentation results are illustrated in Figure 10. The figure shows the result of manual segmentation (red) compared with the result of the proposed method with (blue) and without (green) the shape model. Table 2 represents the average errors with different grayscale transformations. The test data consists of seven wheals, including those shown in Figure 10. The first five columns represent different KFD projections designed using training data, while in the last two columns the projection is designed in an unsupervised or *ad hoc* manner.

Fig. 10. An example of segmentation result. The red boundary is the result of manual segmentation, while blue and green boundaries represent the result of our method with and without the elliptic shape model, respectively. (Wheals 2, 14, 15, etc. have not been segmented, as there is no detectable allergic reaction at these sites.)

From the results, one can see that the *L*∗*a*∗*b*∗ color space produces the smallest errors. This is consistent with earlier studies (Nischik & Forster, 1997; Roullot et al., 2005), which also discovered the importance of the *a*∗ component in skin color analysis. However, the difference between color spaces is not that significant, especially when using a Gaussian kernel for the KFD. For example, in the case of the Gaussian kernel with bandwidth *σ* = 1, all errors are close to each other, and visual inspection of the results reveals that there are no gross errors. The lesser importance of the initial color space when using the Gaussian KFD kernel is natural, because the three-dimensional data is mapped to an infinite-dimensional space, where the colors are most likely separable regardless of the initial color space.

Another interesting observation is the worse than expected performance of the elliptic shape model. In many cases its use increases the error. However, this is partly due to the fact that the manually segmented wheals are not ellipses, so the elliptic model cannot reach zero error even in theory. The best cases are the ones where the true wheal is ellipse-shaped with no elongated pseudopodia, e.g., the 2nd and 3rd wheals from the right in Figure 10. In all other cases the wheal shape is more irregular, and the shape model results in the largest inscribed ellipse. However, there is some randomness in the results due to the small *N*. We plan to study the performance with larger *N* and compare them with the manual results of a trained physician.

| | | Gaussian (*σ* = 0.5) | Gaussian (*σ* = 1) | Gaussian (*σ* = *σ*<sub>rot</sub>) | Linear kernel | 2. order kernel | Celebi method | Ad hoc |
|---|---|---|---|---|---|---|---|---|
| With Otsu Thresh | *RGB* | 0.3668 | 0.2393 | 0.1912 | 0.2567 | 0.2723 | 1.3710 | 1.0297 |
| | *L*∗*a*∗*b*∗ | 0.2505 | 0.1861 | 0.2811 | 0.2382 | 0.2925 | 1.0788 | 0.9835 |
| | *HSV* | 0.2177 | 0.2176 | 0.5129 | 0.2733 | 0.2563 | 1.5259 | 0.7424 |
| With Graph Cuts | *RGB* | 0.3660 | 0.2406 | 0.1915 | 0.2545 | 0.2714 | 1.3759 | 1.0307 |
| | *L*∗*a*∗*b*∗ | 0.2508 | 0.1855 | 0.2813 | 0.2364 | 0.2926 | 1.0810 | 0.9844 |
| | *HSV* | 0.2176 | 0.2169 | 0.5129 | 0.2714 | 0.2570 | 1.5260 | 0.7422 |
| With Shape Model | *RGB* | 0.4769 | 0.2479 | 0.2289 | 0.4821 | 0.2143 | 1.3588 | 0.4291 |
| | *L*∗*a*∗*b*∗ | 0.2325 | 0.2388 | 0.2335 | 0.1894 | 0.2240 | 1.1731 | 1.4467 |
| | *HSV* | 0.2472 | 0.2300 | 0.7174 | 0.4852 | 0.3815 | 1.3465 | 0.8816 |

Table 2. The comparison of automated wheal measurement methods in terms of the error of Eq. (17). Each row defines an initial color space, each column corresponds to a grayscale transformation. The last column corresponds to a manually designed transformation based on what looks good. In the RGB case, the *Ad Hoc* transformation is the difference *G* − *B*, in the *L*∗*a*∗*b*∗ case it is the *a*∗ component, and in the HSV case the *H* component.

#### **4. Conclusions**

This research project shows one possibility to automate a simple dermatological examination. While the results obtained are not perfect, they are very promising. Automation gives several benefits in the case of the prick test, as it eliminates the inter-observer variance and offers a way to study the immune reaction as a function of time.

The method shown above is semi-automatic in the sense that the user has to input a point close to the center of the wheal. This is not a major obstacle in practice, as the prick test can be performed so that the centers are known. Another possibility is to use some easily recognizable markings on the skin so that the centers can be found automatically.

Finding the size of a wheal is a surprisingly complicated task. The wheal edges are not sharp, and the contrast between the wheal and its surroundings is very low. The skin color is not constant; it varies significantly not only between individuals but also between different areas of the skin of an individual.

Our algorithm for determining the wheal size has several steps. First, the image is transformed into a grayscale image by using the KFD (Kernel Fisher Discriminant) projection. Although the linear kernel with the *L*∗*a*∗*b*∗ color space produced the best results in this study, we anticipate that the flexibility of the Gaussian kernel would be useful with a larger number of patients (e.g., with different skin colors).

At this point there are two different methods for finding out the wheal size and shape. The first method segments the wheals using either straightforward thresholding or the Graph Cut method to determine which pixels in the image belong to the wheal. It seems that the performance of the segmentation approaches is equally good. The second method finds the ellipse which has the highest KFDR (Kernel Fisher Discriminant Ratio) between the pixels inside and outside of the ellipse.

It is not clear which of the methods is best in practice. The Graph Cut method gives irregular wheals, and the results are close to those obtained with manual segmentation. On the other hand, medical practice uses one or two diameters of the wheal, and it is unclear whether or not any protruding features of the wheal should be included. Choosing between these two methods will require a large number of images, as the differences are not very large.

The images used in developing the algorithms were not of the highest quality. They were taken with a relatively old digital camera, and we learned during the process that there were significant compression artefacts in the images, which are clearly visible in the grayscale images. The use of a more modern camera without compression should reduce both compression artefacts and image noise significantly and thus improve segmentation results. This, too, is a topic for further research.

One should also note that the above methods generalize from RGB data to other multichannel segmentation problems. Using alternative wavelengths might help the segmentation, and any combination of input images is possible. Another alternative is to generate artificial input channels, e.g., by filtering the RGB channels using different preprocessing methods. As the results show, there is a difference in results when using different color spaces as the input. Thus, one might anticipate that adding artificial channels produced by different filters may also be of help in the segmentation.

#### **5. References**

Adobe Systems Incorporated (2009). *Digital Negative Specification version 1.3.0.0*.

Boykov, Y. & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, *Pattern Analysis and Machine Intelligence, IEEE Transactions on* 26(9): 1124–1137.

Boykov, Y., Veksler, O. & Zabih, R. (2001). Fast approximate energy minimization via graph cuts, *Pattern Analysis and Machine Intelligence, IEEE Transactions on* 23(11): 1222–1239.

Brändle, N., Chen, H., Bischof, H. & Lapp, H. (2000). Robust parametric and semi-parametric spot fitting for spot array images, *ISMB-2000 8th Intl. Conf. on Intell. Syst. for Mol. Biol.*, pp. 1–12.

Celebi, M., Iyatomi, H. & Schaefer, G. (2009). Contrast enhancement in dermoscopy images by maximizing a histogram bimodality measure, *16th IEEE Int. Conf. on Image Proc. (ICIP)*, pp. 2601–2604.

Coffin, D. (2011). dcraw source code. URL: *http://www.cybercom.net/ dcoffin/dcraw/dcraw.c*

ECMA (2009). *Technical Report TR/98 JPEG File Interchange Format (JFIF)*.

Farrell, J., Xiao, F. & Kavusi, S. (2006). Resolution and light sensitivity tradeoff with pixel size, *SPIE Electronic Imaging '2006 Conference*.

Fisher, R. (1936). The use of multiple measurements in taxonomic problems, *Annals of Eugenics* 7: 179–188.

Gomez, D., Butakoff, C., Ersboll, B. & Stoecker, W. (2008). Independent histogram pursuit for segmentation of skin lesions, *IEEE Trans. Biomed. Eng.* 55(1): 157–161.

Gomez, D. D., Clemmensen, L. H., Ersbøll, B. K. & Carstensen, J. M. (2007). Precise acquisition and unsupervised segmentation of multi-spectral images, *Comp. Vis. and Image Understanding* 106(2-3): 183–193.

Greig, D. M., Porteous, B. T. & Seheult, A. H. (1989). Exact maximum a posteriori estimation for binary images, *Journal of the Royal Statistical Society* 51(2): 271–279.

Haahtela, T., Petman, L., Järvenpaa, S. & Kautiainen, H. (2010). [Quality of allergy testing and interpretation of its results], *Duodecim* 126: 529–535.

Harchaoui, Z., Bach, F. & Eric, M. (2008). Testing for homogeneity with kernel Fisher discriminant analysis, *Adv. in Neural Inf. Proc. Syst. 20*, MIT Press, Cambridge, MA, pp. 609–616.

Hecht, E. (1987). *Optics*, 2nd edn, Addison-Wesley, chapter 10.2.6.

Huttunen, H., Ryynänen, J.-P., Forsvik, H., Voipio, V. & Kikuchi, H. (2011). Kernel Fisher discriminant and elliptic shape model for automatic measurement of allergic reactions, *in* A. Heyden & F. Kahl (eds), *Image Analysis*, Vol. 6688 of *Lecture Notes in Computer Science*, Springer Berlin / Heidelberg, pp. 764–773.

Meinert, R., Frischer, T., Karmaus, W. & Kuehr, J. (1994). Influence of skin prick test criteria on estimation of prevalence and incidence of allergic sensitization in children, *Allergy* 49: 526–532.

Mika, S., Ratsch, G., Weston, J., Scholkopf, B. & Mullers, K. (1999). Fisher discriminant analysis with kernels, *Proc. IEEE Neural Netw. for Signal Process. IX*, pp. 41–48.


**8** 

*Colombia* 

**Microcalcification Detection** 

*Computational Neuroscience, Department of Physics,* 

*Universidad Autonoma de Occidente,* 

**A Neurobiologically-Inspired Approach** 

A Computer-Aided Diagnosis (CAD) system is a set of automatic or semi-automatic tools developed to assist radiologists in the detection and/or classification of abnormalities presented in diagnostic images of different modalities. Although on the early phase of research and development CAD systems were criticized by some computer scientists; regardless of this criticism, nowadays' experimental evidence indicates that success rates of radiologists increase significantly when they are helped by these systems: In mammography, researchers have reported results from prospective studies on a large number of screenees, regarding the effect of CAD on the detection rate of breast cancer. Although there is a large variation in the results, it is important to note that all of these studies indicated an increase in the detection rates of breast cancer with the use of CAD; as a consequence of this, using CAD contributes to decrease cancer-related deceases due to the

The idea of developing computer systems to assist physicians in the detection of diseases has been a challenging matter during the last years, specifically on reducing the number of missed diagnosis and the time taken to reach a diagnosis among the different diagnostic image modalities. Moreover, the recent development of full-field digital imaging and picture archiving and communication systems (PACS) have been a catalyst in the increase of

Because of the emphasis on screening programs in almost every country, the number of mammograms to be analyzed by the radiologists is enormous but, only a small portion of them are related to breast cancer (Oliver et al., 2010). In addition, a mammographic image is characterized by a high spatial resolution which is adequate enough to detect subtle finescale signs such as microcalcifications. Consequently, the analysis of mammographic images

During the last years, the number of papers related to CAD has been augmented due to the increased interest on improving disease diagnosis using different image modalities. As far as the evidence indicates, it appears reasonable to use CAD for screening examinations, provided that large fractions of them give normal results and therefore the task of diagnosis

is a complex and cumbersome task which requires highly specialized radiologists.

**1. Introduction**

early detection of cancer signs.

such computer systems in developed countries.

Juan F. Ramirez-Villegas and David F. Ramirez-Moreno

**in Digitized Mammograms:** 

NEMA (2011). *Digital Imaging and Communications in Medicine (DICOM)*.


### **Microcalcification Detection in Digitized Mammograms: A Neurobiologically-Inspired Approach**

Juan F. Ramirez-Villegas and David F. Ramirez-Moreno *Computational Neuroscience, Department of Physics, Universidad Autonoma de Occidente, Colombia* 

#### **1. Introduction**



A Computer-Aided Diagnosis (CAD) system is a set of automatic or semi-automatic tools developed to assist radiologists in the detection and/or classification of abnormalities in diagnostic images of different modalities. Although CAD systems were criticized by some computer scientists during the early phase of their research and development, current experimental evidence indicates that radiologists' success rates increase significantly when they are assisted by these systems. In mammography, researchers have reported results from prospective studies on large numbers of screenees regarding the effect of CAD on the detection rate of breast cancer. Although the results vary widely, all of these studies indicated an increase in breast cancer detection rates with the use of CAD; consequently, CAD contributes to a decrease in cancer-related deaths through the early detection of cancer signs.

The idea of developing computer systems to assist physicians in the detection of diseases has been a challenging matter in recent years, specifically with regard to reducing the number of missed diagnoses and the time taken to reach a diagnosis across the different diagnostic image modalities. Moreover, the recent development of full-field digital imaging and picture archiving and communication systems (PACS) has been a catalyst for the spread of such computer systems in developed countries.

Because of the emphasis on screening programs in almost every country, the number of mammograms to be analyzed by radiologists is enormous, but only a small portion of them is related to breast cancer (Oliver et al., 2010). In addition, a mammographic image is characterized by a high spatial resolution, which is adequate to detect subtle fine-scale signs such as microcalcifications. Consequently, the analysis of mammographic images is a complex and cumbersome task that requires highly specialized radiologists.

During the last years, the number of papers related to CAD has grown due to the increased interest in improving disease diagnosis using different image modalities. As far as the evidence indicates, it appears reasonable to use CAD for screening examinations, given that a large fraction of them yield normal results and the task of diagnosis therefore becomes both cumbersome and time-consuming. In addition, the current performance of commercial CAD systems has shown that there is a substantial gain in detection rates as well as an important increase in recall rate, not to mention the overall performance of such systems for the detection of disease signs (e.g., 98% sensitivity at 0.25 false positives per mammographic image, for one of the latest commercial CAD systems) (Doi, 2007).

As far as the literature shows, there seems to be only one attempt to integrate CAD systems into a multi-organ, multi-disease system incorporating all the diagnostic knowledge (Kobatake, 2007). On the other hand, the current status of single-purpose, single-organ CAD systems shows some good examples of commercial and functional CAD systems for practical and clinical use. In mammography, chest radiography and thoracic CT, a number of commercial systems are available. The mammography systems include the detection and differential diagnosis of masses and microcalcifications, while in chest radiography and thoracic CT, CAD schemes include the detection and differential diagnosis of lung nodules, interstitial lung diseases, and the detection of cardiomegaly, pneumothorax and interval changes (Doi, 2007). Researchers have reported an important reduction in the mean age of patients at the time of detection when CAD was used, along with an increase in the detection rates of breast cancer (Cupples et al., 2005); similar results were achieved for the detection of lung cancer, colon diseases and intracranial aneurysms, among others (Doi, 2007).

Microcalcification detection has been extensively studied. Yu and Guan, 2000, developed a technique for the detection of clustered microcalcifications. The first part of the algorithm addresses the extraction of features based on wavelet decomposition and gray-level statistics, followed by a neural-network classifier. The detection of individual objects depends on shape factors, gray-level features, and a second neural network as a classification scheme. The algorithm was tested using a set of 40 mammograms, and the reported sensitivity was 90% at 0.5 false positives per image.

Christoyianni et al., 2002, proposed a neural classification scheme for different kinds of regions of suspicion (ROS) on digitized mammograms; in this approach the Mini-MIAS database was used for the feature extraction and classification stages. The feature extraction stage was based on independent component analysis, used to find a set of source regions that generate the observed mammograms. The recognition accuracy was 88.23% for the detection of abnormalities and 79.31% for distinguishing between benign and malignant regions. El-Naqa et al., 2002, used support vector machines to detect microcalcification clusters. The algorithm was tested using 76 mammograms containing 1120 microcalcifications, and it outperformed several well-known methods for microcalcification detection with a sensitivity of 94% at one false positive.

Vilarrasa, 2006, proposed a variety of visual processing and classification schemes to detect and classify mammary tissue. This group of algorithms employs standard segmentation procedures such as the Tukey outlier test, region growing and segmentation via the watershed transformation; additionally, a neural classifier is proposed to distinguish between healthy and calcified mammary tissue. The initial results were poor (and were not reported for that reason); nevertheless, a morphologic filter was then used to increase the success rates of the classifier, and the final system reached 84% sensitivity, 64% specificity and 77.2% accuracy.

Verma et al., 2009, used a novel soft cluster neural network technique for the classification of suspicious areas in digital mammograms; the main idea of the soft clusters is to increase the generalization ability of the neural network. This network used a set of six features and was trained and tested using the DDSM benchmark database, and the results showed an accuracy between 79% and 94%. Wei et al., 2009, proposed a microcalcification classification scheme assisted by content-based mammogram retrieval. The algorithm was tested using 200 different mammographic images from 104 cases. This approach used an adaptive support vector machine (Ada-SVM) as classifier, which outperformed the classification accuracies given by other classifiers due to the incorporation of proximity information; the reported classification accuracy was 0.82 in terms of the area under the ROC curve.

Tsai et al., 2010, proposed an approach in which suspicious microcalcified regions are separated from normal tissue by wavelet layers and Renyi's information theory. Subsequently, several statistical shape-based descriptors are extracted; principal component analysis (PCA) is used to reduce the dimensionality of the feature space and the data classification is performed by a standard MLP neural network. The maximum performance achieved by this approach was 97.1 at 0.08 false positives.

#### **2. Visual cortex mechanisms: Neurobiological considerations and potential for CAD**

To date, microcalcification detection has been widely studied along with the development of computer vision algorithms. Many computational approaches have addressed the problem with reasonable cost-effectiveness. Nonetheless, neurobiologically-inspired approaches have been rather neglected because the relation between cogent neurobiological principles and their potential for visual computer systems development remains poorly established.

Primates' visual cortex is capable of interpreting dynamic scenes in clutter, despite relying on several serial visual processes, as attention shifting and saccadic eye movements suggest. Since purely parallel processing of visual inputs would be obscure and cumbersome for the visual cortex machinery, it deals with this task by selecting circumscribed regions of visual information to be processed preferentially and by shifting the processing focus over time. To date, there are several models of the dynamic routing of visual stimuli and information flow through the visual cortex, which account for competitive interactions and dynamical modifications of the neural activity in the ventral and dorsal pathways, and for the consequent biasing of these interactions in favor of certain objects of the space under scene-dependent (bottom-up) and/or task-dependent (top-down) strategies (Itti & Koch, 2000). The interactions between these two visual processes have been addressed by many researchers (Fix et al., 2010; Navalpakkam & Itti, 2005; Navalpakkam & Itti, 2002; Walther & Koch, 2006; Serre et al., 2006).

Objects in the visual field must compete for processing within more than 30 different visual cortical areas. As the ability to screen out objects during visual search tasks is contextual, and primates often detect a single target in an array of non-targets, detections depend largely, for all practical purposes, on the correlation between targets and non-targets. According to the biased competition model, the targets and non-targets of a scene compete for processing space during visual search. There may be biases towards sudden appearances of new objects in the visual field and towards objects that are larger, brighter, faster moving, etc. (Desimone & Duncan, 1995).


Many computational models of human visual search have embraced the idea of a saliency map to accomplish preattentive selection. This representation contains the overall neural activity elicited by the objects and non-objects of the space, which compete for processing spaces in the visual search according to primary visual features such as intensity, orientations, colors and motion. The conformation of feature maps is a consequence of highly structured receptive fields of cells in lateral geniculate nucleus (LGN) and, notably, V1. Certain well-established neurobiological evidence points out the existence of this neuronal map and, on the other hand, some other evidence rejects the idea of a topographical representation standing for the overall saliency of visual stimuli and, therefore, points out the selectivity as a consequence of interactions among feature maps, each codifying the saliency of objects in a specific feature (Itti & Koch, 2000).

Modeling of visual attention mechanisms seems highly promising, and its application to microcalcification detection is the main topic and purpose of this chapter. In this approach we perform pre-processing and post-processing stages using several computer vision algorithms. This allows us to identify the potential of the neurobiologically-inspired visual mechanisms model as part of a CAD scheme. We also give some relevant comparisons in relation to our previous approach (Ramirez-Villegas et al., 2010).
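To make the saliency-map idea above concrete, the following is a minimal sketch of a bottom-up, intensity-only center-surround saliency map. It is a strong simplification of the Itti & Koch scheme: the scales, the normalization, and the single feature channel are illustrative assumptions, not the model used in this chapter.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur using numpy only."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Convolve rows, then columns; 'valid' mode restores the original size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def intensity_saliency(img, scales=(1.0, 4.0, 8.0)):
    """Center-surround saliency: |fine blur - coarse blur| summed over scale pairs."""
    blurred = {s: gaussian_blur(img, s) for s in scales}
    sal = np.zeros_like(img, dtype=float)
    for c in scales:
        for s in scales:
            if s > c:  # center scale finer than surround scale
                sal += np.abs(blurred[c] - blurred[s])
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / rng if rng > 0 else sal
```

A small bright blob on a homogeneous background produces a peak in the map near the blob, which is the behavior exploited when simulating gaze allocation over candidate microcalcifications.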

#### **3. The proposed algorithm**

The algorithm proposed in this book chapter is illustrated by Figure 1. The overall procedure is divided into six stages: (1) mammographic images were taken from the Mini-MIAS Database of Mammograms (see sub-section 3.1 for a detailed description of the data); (2) the region of interest (ROI) cropping is accomplished using the information available in the description section of the database; specifically, we took into account the location and the approximate radius of the circle enclosing the abnormalities (microcalcifications); (3) adaptive histogram equalization and the so-called top-hat algorithm were performed as pre-processing steps in order to enhance the microcalcifications' traces; (4) a pre-attentive bottom-up visual model was implemented in order to preliminarily distinguish between calcified and non-calcified tissue; (5) Tukey outlier test-based segmentation was used to perform the final segmentation of sub-regions via the simulated gaze allocation outcomes obtained in the former step; (6) finally, a Self-Organizing Map (SOM) neural network was implemented in order to topologically adjust the microcalcifications and to provide a final visual output.

Fig. 1. Overview of the proposed approach.
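Stage (5) above relies on a Tukey outlier test to flag candidate pixels inside a sub-region. A minimal sketch of such a test follows; the fence factor *k* = 1.5 and the use of the upper fence only (microcalcifications appear brighter than the surrounding tissue) are conventional assumptions, not necessarily the exact configuration used in this chapter.

```python
import numpy as np

def tukey_outlier_mask(roi, k=1.5):
    """Flag pixels above the Tukey upper fence Q3 + k*IQR as candidate
    microcalcification pixels (bright outliers against the local tissue)."""
    q1, q3 = np.percentile(roi, [25, 75])
    upper_fence = q3 + k * (q3 - q1)
    return roi > upper_fence
```

Because the fences are built from quartiles of the local gray-level distribution, the test adapts to each sub-region's background tissue without a global threshold.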

#### **3.1 Mammographic database**


In this work, a total of 23 mammographic images containing microcalcified tissue were taken from The Mini-MIAS Database of Mammograms (Suckling, 1994), which is widely used by researchers in the area of CAD of breast cancer to carry out and evaluate their work. We have used this database in our previous research (Ramirez-Villegas et al., 2010; Ramirez-Villegas & Ramirez-Moreno, 2011). The database provides appropriate details of the pathologies and general characteristics of the mammograms: the MIAS database reference number, the character of the background tissue (fatty, fatty-glandular or dense-glandular), the class of abnormality (calcification, well-defined/circumscribed masses, spiculated masses, ill-defined masses, architectural distortion, asymmetry or normal), the severity of the abnormality (benign or malignant), the *(x,y)* image coordinates of the centre of the abnormality, and the approximate radius (in pixels) of a circle enclosing the abnormality. The resolution of the original images was a 200-micron pixel edge, and every image's size was 1024 x 1024 pixels. The images are centered in the matrix.

All ROIs (calcified tissue samples) were selected using the reference given in the description of the database.
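The ROI cropping of stage (2) can be sketched as follows. The bottom-left coordinate origin is an assumption based on the Mini-MIAS convention (to be verified against the database documentation), and the square window and border clipping are illustrative choices rather than the chapter's exact procedure.

```python
import numpy as np

def crop_roi(image, x, y, radius):
    """Crop a square window around a database abnormality entry (x, y, radius).

    Assumes (x, y) has its origin at the bottom-left corner (Mini-MIAS
    convention) while the array origin is top-left, so the row index is
    flipped; the window is clipped at the image borders."""
    h, w = image.shape
    row, col = h - 1 - y, x
    r0, r1 = max(0, row - radius), min(h, row + radius + 1)
    c0, c1 = max(0, col - radius), min(w, col + radius + 1)
    return image[r0:r1, c0:c1]
```

For an interior abnormality the returned window has side 2·radius + 1 pixels and is centered on the listed abnormality coordinates.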

#### **3.2 Mammograms enhancement**

Enhancement algorithms have been employed for the improvement of contrast features and the suppression of noise (Papadopoulos, 2008). They are commonly used to increase the radiologist's detection effectiveness or as pre-processing stages of CAD schemes. In the pre-processing module, the significant features of the mammogram are enhanced, recovering most of the hidden characteristics and improving the image quality. According to recent findings (Papadopoulos, 2008), the contribution of the pre-processing module to the detection ability of the CAD system is decisive. Consequently, the final outcome of the CAD scheme depends largely on the pre-processing steps.

The pre-processing stage of the current approach is divided into two parts: (1) contrast enhancement and (2) microcalcification enhancement by the so-called top-hat algorithm. The usefulness of these methods is reported in the literature, along with their potential to enhance signs present in mammographic images.

#### **3.2.1 Adaptive Histogram Equalization (AHE)**

In our previous work (Ramirez-Villegas et al., 2010; Ramirez-Villegas & Ramirez-Moreno, 2011), we implemented Adaptive Histogram Equalization (AHE) as the preprocessing stage. According to our findings, this technique can be applied to enhance the high-frequency components of the image, i.e., the microcalcifications, due to the computations applied to central and contextual region pixels. To avoid noise amplification, a contrast-limited equalization can be performed, especially in homogeneous areas. This method improves on Local-Area Histogram Equalization (LAHE), which suffers from a high computational load and noise magnification because a standard histogram equalization is computed for each pixel over its neighborhood (contextual region).

Microcalcification Detection in Digitized Mammograms: A Neurobiologically-Inspired Approach 167


In order to decrease the computational load, the equalization can be computed only for some pixels (and their contextual regions), dividing the image into a mosaic; thereby, the modified pixel is the central pixel of each region, and the remaining pixels are obtained using a standard interpolation method. In this way, each contextual region affects, with its equalization, a spatial zone twice its own length.

The final value of each pixel is obtained by applying the pixel mapping given by

$$L(i) = C\left[E\,N_{--}(i) + (1-E)\,N_{+-}(i)\right] + (1-C)\left[E\,N_{-+}(i) + (1-E)\,N_{++}(i)\right],\tag{1}$$

where $N_{--}$ is the mapping of the left superior area, $N_{-+}$ is the mapping of the left inferior area, and so on; and

$$C = \frac{y - y_{-}}{y_{+} - y_{-}} \quad \text{and} \quad E = \frac{x - x_{-}}{x_{+} - x_{-}}.$$
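As a sketch of how Eq. (1) blends the mappings of the four surrounding contextual regions, the following Python fragment performs the bilinear combination; the function and variable names are ours, and `n_mm`, `n_pm`, `n_mp`, `n_pp` stand for hypothetical 256-entry equalization look-up tables playing the roles of $N_{--}$, $N_{+-}$, $N_{-+}$, $N_{++}$:

```python
import numpy as np

def blend_mappings(i, x, y, x_minus, x_plus, y_minus, y_plus,
                   n_mm, n_pm, n_mp, n_pp):
    """Bilinearly blend four tile mappings as in Eq. (1).

    (x, y) is the pixel position between the four tile centres located
    at (x_minus, y_minus) .. (x_plus, y_plus); i is the input gray level.
    """
    C = (y - y_minus) / (y_plus - y_minus)   # vertical weight
    E = (x - x_minus) / (x_plus - x_minus)   # horizontal weight
    return (C * (E * n_mm[i] + (1 - E) * n_pm[i])
            + (1 - C) * (E * n_mp[i] + (1 - E) * n_pp[i]))
```

With identity look-up tables the blend leaves the gray level unchanged, which is a quick sanity check of the weighting.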

#### **3.2.2 Top-hat algorithm**

As a matter of fact, background removal and microcalcification enhancement are considered necessary procedures in many CAD applications, given the limited initial visibility and detectability of such mammographic signs. Morphological operations can be employed to enhance mammographic images at a reasonable computational cost. A large class of filters can be represented by mathematical morphology implementing two simple operations: erosion and dilatation. When the gray levels of the signal and the background of an image are constant, a standard image thresholding procedure can be performed to detect objects. Nonetheless, the top-hat algorithm becomes a very good choice when the gray levels of the background are highly non-uniform, as they are in the mammary tissue of a mammographic image.

The top-hat algorithm consists of a standard pixel-to-pixel subtraction of the original image from its opened version. The image opening is defined as the erosion of the image followed by its dilatation. Erosion is the morphologic operation in which a pixel, located at the center of the structuring element, is substituted by the minimum value of the pixels of the neighborhood. Hence, this operation reduces small regions with higher gray levels than those of the structuring element. On the other hand, dilatation is the opposite morphologic operation to erosion; in this case, the pixel located at the center of the structuring element is substituted by the maximum value of the pixels of the structuring element. Consequently, this operation enlarges the regions of the image with high gray levels which did not disappear as a result of the erosion step.

The top-hat algorithm can be formulated as follows:

$$A'(x,y) = A(x,y) - \left[A(x,y) \circ B(x,y)\right],\tag{2}$$

where:

$$A(x,y) \circ B(x,y) = \left[A(x,y) \ominus B(x,y)\right] \oplus B(x,y),\tag{3}$$

is the opening of the image $A(x,y)$ by a structuring element $B(x,y)$, where $\ominus$ and $\oplus$ denote erosion and dilatation, respectively.

As images are functions mapping a Euclidean space $E$ into $\mathbb{R} \cup \{-\infty, \infty\}$, where $\mathbb{R}$ is the set of real numbers, the grayscale erosion and dilatation of $A(x,y)$ by $B(x,y)$ are given, respectively, by:

$$A(x,y) \ominus B(x,y) = \min_{x',y' \in E} \left[A(x',y') - B(x-x',\, y-y')\right],\tag{4}$$

$$A(x,y) \oplus B(x,y) = \max_{x',y' \in E} \left[A(x',y') + B(x-x',\, y-y')\right],\tag{5}$$
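A minimal sketch of the top-hat pipeline of Eqs. (2)-(5), assuming a flat structuring element ($B = 0$ over its support, so Eqs. (4)-(5) reduce to neighborhood minima and maxima); the function names are ours:

```python
import numpy as np

def grey_erode(a, k):
    """Grayscale erosion by a flat (2k+1)x(2k+1) structuring element:
    each pixel is replaced by its neighborhood minimum (Eq. 4)."""
    p = np.pad(a, k, mode='edge')
    return np.min([p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(2 * k + 1) for j in range(2 * k + 1)], axis=0)

def grey_dilate(a, k):
    """Grayscale dilatation: neighborhood maximum (Eq. 5)."""
    p = np.pad(a, k, mode='edge')
    return np.max([p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(2 * k + 1) for j in range(2 * k + 1)], axis=0)

def top_hat(a, k=1):
    """Eq. (2): original image minus its opening (erosion, then dilatation)."""
    return a - grey_dilate(grey_erode(a, k), k)
```

A single bright pixel on a flat background is erased by the opening and therefore survives the top-hat intact, which is exactly the behavior exploited to keep microcalcification-sized spots.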

#### **3.3 Background suppression (revised method)**

166 Digital Image Processing


The enhancement stage must be sensitive enough to emphasize small low-contrast objects, while it must have the required specificity to suppress the background. Usually the background corresponds to smoothed fractions of the image produced by the tissue characteristics and the image acquisition process; in consequence, these areas are softened regions of the image which, in many cases, carry no relevant information about pathologies. In our previous work (Ramirez-Villegas et al., 2010), the suppression is performed using difference-of-Gaussians filters according to Eq. (6).

$$I_r(x,y) = I(x,y) - DoG(x,y) \otimes I(x,y),\tag{6}$$

where $I(x,y)$ is the input image and the additional term of the convolution is the filter function. In this way, the convolution term corresponds to a smoothed version of the input image. The DoG (Difference of Gaussians) is a linear filter implemented in several artificial vision tasks, which works by subtracting two Gaussian blurs of the image with different function widths.

$$DoG(x,y) = A\_1 \frac{1}{2\pi\sigma\_1^2} \exp\left[-\frac{x^2+y^2}{2\sigma\_1^2}\right] - A\_2 \frac{1}{2\pi\sigma\_2^2} \exp\left[-\frac{x^2+y^2}{2\sigma\_2^2}\right],\tag{7}$$

The enhancing process with the DoG works in both the spatial and frequency domains. The performance of the filter is conditioned by the parameters $\sigma_n$ and, in one case, by the estimation of the peaks $A_n$. In Eq. (7), the standard deviation $\sigma_n$ is related to the lateral inhibition of the filter, while the term that follows each peak $A_n$ normalizes the sum of the mask elements to unity. Typically these parameters are determined heuristically, according to the desired performance and the general characteristics of the microcalcifications and the image. Nevertheless, as a reference method for this research, there are mathematical expressions (Ochoa, 1996) used to determine the DoG parameters according to the microcalcifications' average width and Marr's ratio (Marr, 1982). An example of the DoG processing is shown in Figure 2.
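A sketch of the DoG mask of Eq. (7); the grid size and $\sigma$ values used below are illustrative assumptions, not values from the chapter:

```python
import numpy as np

def dog_kernel(size, sigma1, sigma2, a1=1.0, a2=1.0):
    """Difference-of-Gaussians mask, Eq. (7): a narrow Gaussian minus a
    wide one (sigma1 < sigma2); each term carries the 1/(2*pi*sigma^2)
    factor that normalizes its mask elements to (approximately) unity."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g1 = np.exp(-(x**2 + y**2) / (2 * sigma1**2)) / (2 * np.pi * sigma1**2)
    g2 = np.exp(-(x**2 + y**2) / (2 * sigma2**2)) / (2 * np.pi * sigma2**2)
    return a1 * g1 - a2 * g2
```

Per Eq. (6), the background-suppressed ROI is then `I - convolve2d(I, dog_kernel(...))` with any 2-D convolution routine. Because both Gaussians sum to roughly one, the whole mask sums to roughly zero, so flat background regions are driven toward zero while small bright spots survive.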

As background suppression using DoG filters is a well-known method, it will give us some feedback in order to compare the performance of the current approach and, consequently, to express where it stands relative to the existing literature.

Fig. 2. Example of background suppression by DoG: (a) Input ROI; (b) Enhanced ROI.

#### **3.4 Bottom-up processing in visual cortex**

The selection of a part of the available sensory information before a detailed processing stage by intermediate and high visual centers is an ability of the visual system of primates. Koch and Ullman (Koch & Ullman, 1985) introduced the idea of a saliency map to accomplish preattentive selection. A saliency map can be defined as a two-dimensional representation that topographically encodes the saliency of objects in the visual field. The competitive behavior of the neurons in this map gives rise to a single winning location, which corresponds to the most salient object. Subsequently, the next conspicuous locations are attended in order of decreasing saliency, given the prior inhibition of already attended locations.

Fig. 3. Overview of the visual cortex-like bottom-up processing step.

Microcalcifications are low-contrast conspicuous locations in a background of distractors (surrounding mammary tissue and noisy regions). The competitive behavior of the neurons in the early stages of visual processing guarantees that there would be a biased competition in favor of certain objects of the space based on certain characteristics which make them 'unique'. But, how do unique features attract attention? Experimental evidence shows that neural structures in Lateral Geniculate Nucleus (LGN) and primary visual cortex (V1) are responsive to features which are common to all objects of the visual field, e.g., intensity, orientation, color opponency, motion, stereo disparity, among others. In this work, we assume that visual input is represented in the form of iconic, topographic feature maps. In order to construct such representations, we use center-surround computations in every feature at different spatial scales and within-feature spatial competition (Itti & Koch, 2000). All the information contained in these maps is combined to obtain a single representation, i.e., the saliency map.

This part of our approach computes saliency using two features studied by Itti et al. (1998) for the visual attention model formerly proposed by Koch & Ullman (1985): intensity and orientation. These features are organized into 30 maps (6 for intensity, 24 for orientation; a detailed explanation is given below). These maps are combined using across-scale sums in order to obtain the conspicuity maps, which provide the input for a unique saliency map (central representation). Figure 3 illustrates an overview of this processing step.

This model is limited to selective attention given by the properties of the visual stimuli and consequently it does not involve any volition-dependent process (top-down visual processing). Low-level visual features are directly extracted from the input image over different resolution scales using pyramid-like linear filters, i.e., the so-called Gaussian pyramids. This approach consists of successive filtering processes and compression of the input images (Burt & Adelson, 1983). This process is illustrated by the following equations:

$$g_l(i,j) = \sum_{m=-2}^{2}\sum_{n=-2}^{2} w(m,n)\, g_{l-1}\left(2i+m,\, 2j+n\right),\tag{8}$$

where $0 < l \le N_l$ and, for all $(i,j)$, $0 \le i < C_l$ and $0 \le j < R_l$. $N_l$ is the number of levels of the pyramid, and $C_l$ and $R_l$ are the dimensions of the image at the $l$th level. Finally, $w$ is defined according to Eq. (9) and Eq. (10).

$$w(m,n) = \hat{w}(m)\,\hat{w}(n),\tag{9}$$

where *w*ˆ is a normal and symmetric function:


$$\hat{w}(x) = \begin{cases} b - \dfrac{a}{2}, & x = -2,\ 2 \\ b, & x = -1,\ 1 \\ a, & x = 0 \end{cases}\tag{10}$$

Typically the value of $a$ is 0.4 and the value of $b$ is 0.25; in consequence, the values of $\hat{w}(x)$ are given by Eq. (11).


$$\hat{w}(x) = \begin{cases} 0.05, & x = -2,\ 2 \\ 0.25, & x = -1,\ 1 \\ 0.4, & x = 0 \end{cases}\tag{11}$$

Note that there is a 5x5 pattern of weights *w* to generate each pyramid array.

In this step, a total of nine spatial scales $\sigma = 0, 1, \ldots, 8$ are created using the Gaussian pyramid scheme. This approach yields horizontal and vertical image reduction factors from 1:1 (scale zero) to 1:256 (scale eight) in eight octaves.
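The pyramid REDUCE step of Eq. (8), with the separable kernel of Eqs. (9)-(11), can be sketched as follows (edge padding is our assumption; the chapter does not state how borders are handled):

```python
import numpy as np

W_HAT = np.array([0.05, 0.25, 0.4, 0.25, 0.05])  # Eq. (11)
W = np.outer(W_HAT, W_HAT)                       # Eq. (9): w(m,n) = w^(m) w^(n)

def reduce_level(g):
    """One REDUCE step of the Gaussian pyramid, Eq. (8): filter with the
    5x5 kernel w and subsample by a factor of 2 in each direction."""
    rows, cols = g.shape
    out = np.zeros((rows // 2, cols // 2))
    gp = np.pad(g, 2, mode='edge')               # border handling: assumption
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(W * gp[2 * i:2 * i + 5, 2 * j:2 * j + 5])
    return out
```

Applying `reduce_level` repeatedly to the input image yields the nine scales $\sigma = 0, \ldots, 8$; since the kernel elements sum to one, a constant image is preserved exactly at every level.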

Subsequently, each feature is calculated using the center-surround scheme, which is highly related to visual receptive fields. Such center-surround differences are calculated between coarse and fine resolution scales for every feature: the receptive center corresponds to a pixel at resolution level $c \in \{2, 3, 4\}$ in the pyramid, and the surround is the corresponding pixel at resolution level $s = c + \delta$, with $\delta \in \{3, 4\}$. As a result of the combination of the receptive center and surround resolution levels, we obtain a total of six feature maps.

Intensity contrast is extracted by standard band-pass filtering to calculate center-surround differences between the established resolution levels:

$$I(c,s) = \left| I(c) \ominus I(s) \right|,\tag{12}$$

where $I(c)$ is the center intensity signal, $I(s)$ is the surround intensity signal, and the symbol $\ominus$ is termed across-scale subtraction, i.e., the point-by-point subtraction of images of different resolutions after interpolation to the finer scale.

Orientation is extracted using standard Gabor pyramids $O(\sigma, \theta)$, where $\theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}$ (Greenspan et al., 1994). Thereby, orientation contrast is defined as:

$$O(c,s,\theta) = \left| O(c,\theta) \ominus O(s,\theta) \right|,\tag{13}$$

where $O(c,\theta)$ and $O(s,\theta)$ are the center and surround orientation signals, respectively.

The local orientation maps $O(c,\theta)$ and $O(s,\theta)$ are computed by convolving the levels of the intensity pyramid with standard Gabor filters (note that this procedure can be performed either in the frequency or the spatial domain):

$$O(\sigma,\theta) = \left[I(\sigma) * O_E(\theta)\right]^2 + \left[I(\sigma) * O_O(\theta)\right]^2,\tag{14}$$

where $\sigma$ is the resolution level, and $O_E$ and $O_O$ are given by:

$$O_E(x,y,\theta) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\delta^2}\right)\cos\left(2\pi\frac{x'}{\lambda} + \psi\right),\tag{15}$$

$$O_O(x,y,\theta) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\delta^2}\right)\sin\left(2\pi\frac{x'}{\lambda} + \psi\right),\tag{16}$$


$O_E$ and $O_O$ are even and odd Gabor filters, respectively, with aspect ratio $\gamma$, standard deviation $\delta$, wavelength $\lambda$, phase $\psi$, and coordinates $(x', y')$ rotated by $\theta$:

$$x' = x\cos(\theta) + y\sin(\theta),\tag{17}$$

$$y' = -x\sin(\theta) + y\cos(\theta),\tag{18}$$
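The even/odd Gabor pair of Eqs. (15)-(18) can be sampled on a discrete grid as below; the default parameter values are illustrative assumptions, not values from the chapter:

```python
import numpy as np

def gabor_pair(size, theta, delta=2.0, lam=4.0, gamma=1.0, psi=0.0):
    """Even and odd Gabor masks, Eqs. (15)-(16), on a size x size grid."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    xp = x * np.cos(theta) + y * np.sin(theta)    # Eq. (17)
    yp = -x * np.sin(theta) + y * np.cos(theta)   # Eq. (18)
    env = np.exp(-(xp**2 + gamma**2 * yp**2) / (2 * delta**2))
    o_e = env * np.cos(2 * np.pi * xp / lam + psi)  # Eq. (15), even filter
    o_o = env * np.sin(2 * np.pi * xp / lam + psi)  # Eq. (16), odd filter
    return o_e, o_o
```

With $\psi = 0$ the even mask is symmetric and the odd mask antisymmetric under point reflection, which is why the squared-and-summed responses of Eq. (14) give a phase-independent orientation energy.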

Once we obtain the 30 feature maps (6 for intensity and 24 for orientation), feature maps of the same type are linearly combined and, consequently, we obtain two conspicuity maps (one for each feature):

$$\overline{I} = \bigoplus_{c=2}^{4}\ \bigoplus_{s=c+3}^{c+4} N\left[I(c,s)\right],\tag{19}$$

$$\overline{O} = \sum_{\theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}} N\left[\bigoplus_{c=2}^{4}\ \bigoplus_{s=c+3}^{c+4} N\left[O(c,s,\theta)\right]\right],\tag{20}$$

The purpose of the function $N(\cdot)$ is to normalize each conspicuity map. The simplest procedure to achieve such normalization is to adjust the dynamic range of the maps. However, it is also possible to obtain a normalized map in an iterative or trained way (Itti & Koch, 2000).

All conspicuity maps are linearly combined into one saliency map according to Eq. (21).

$$S = \frac{1}{2} \left[ N\left(\overline{I}\right) + N\left(\overline{O}\right) \right],\tag{21}$$
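Using the simplest, dynamic-range variant of $N(\cdot)$, the combination of Eq. (21) can be sketched as follows (function names are ours; the across-scale sums of Eqs. (19)-(20) additionally require interpolating all maps to a common scale before adding):

```python
import numpy as np

def normalize(m, upper=1.0):
    """N(.): rescale a map's dynamic range to [0, upper] (simplest variant)."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) * upper if hi > lo else np.zeros_like(m)

def saliency(i_bar, o_bar):
    """Eq. (21): average of the two normalized conspicuity maps."""
    return 0.5 * (normalize(i_bar) + normalize(o_bar))
```

Normalizing both conspicuity maps to the same range before averaging keeps one feature (e.g., orientation) from dominating the saliency map merely because of its amplitude.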

Finally, as the objects in space compete for processing resources during visual processing, the locations in the saliency map representation compete for the highest saliency value in a winner-take-all (WTA) strategy. This means that the next location to be attended, $(x_w, y_w)$, is the most salient one in the saliency map; subsequently, the saliency map is inhibited by means of the so-called inhibition-of-return mechanism, allowing the model to simulate a visual scan path over the whole content of the image.

WTA models have been largely implemented for making decisions from a neurobiologically-inspired perspective (Koch & Ullman, 1985; Itti et al., 1998; Walther & Koch, 2006). It should be noted that in a neuronally plausible implementation, the saliency map could be modeled as a layer of leaky *integrate-and-fire neurons*, as a backwards WTA selection mechanism (Walther & Koch, 2006), or as a layer of neurons with logistic profiles implemented in the form of mean field equations (Ramirez-Moreno & Ramirez-Villegas, 2011). In the case of the leaky *integrate-and-fire neurons*, when a threshold potential is reached, a prototypical spike is generated and the capacitive charge of the neuron is shunted to zero (note that neurons here are RC circuit-based models). Therefore, the synaptic interactions among the units ensure that only the most active location of the saliency map remains, and the potentials elicited by other locations are suppressed. Similarly, using the mean field approach in a network of neural populations, the WTA behavior emerges directly from the competitive behavior of the units, whereby inhibitory and local excitatory

Microcalcification Detection in Digitized Mammograms: A Neurobiologically-Inspired Approach 173

shifting) is a subset *S x* ', , *y S x y* ; the Tukey outlier test will set aside the observation

where 1*q* and <sup>3</sup> *q* denote the first and third quartiles of the sample, respectively. Once the arguments of the above expressions are obtained, pixels above *U* and below *L* are considered as outliers of the distribution. As microcalcifications at this stage of the approach appear as highly bright regions with atypical gray level values (outliers in the distribution of the resulting processed image), the segmentation threshold to segment them is equal to *U* . From a neural networks perspective, the segmentation procedure proposed in this work can

> <sup>1</sup> 0 ' , ' , *S xy U H S xy otherwise*

Under this scheme, the typical gray values of the distribution are discarded (set to zero) and

The final stage of the approach reported in this chapter, is the implementation of a SOM neural network in order to topologically adjust the microcalcifications and show the final outcome for diagnosis purposes. Figure 4 illustrates the architecture of the neural network

Self-Organizing Maps (SOM) have been largely implemented for a plethora of tasks, in a very similar way to those which other neural networks have been used to, e.g., pattern recognition, vision systems, signal processing, among others. In SOM-like neural networks, neighboring cells compete through mutual lateral interactions, and develop adaptively into specific detectors of different signal patterns (Kohonen, 1990). Each point of the input data shaping the structure of an N-dimensional space determines the spatial location of the weight of a cell in the network. Consequently, the network would be capable of giving a

the others are transferred to the next processing step (SOM neural network).

*S x* ' , *o o y L* , (23)

*Lq qq* 1 31 1 5. , (24)

*S x* ' , *o o y U* , (25)

*U q qq* 1 31 1 5. , (26)

, (27)

*S x* ' , *o o y* if one of the following conditions is fulfilled:

be seen as a hard-limit transfer function node, where:

**3.6 Self-organizing map (SOM) neural network** 

with the saliency map as input.

categorization of the input space.

where

or

where

connections among the neurons of the same layer produce the most active location to rise above the other ones (Ramirez-Moreno & Ramirez-Villegas, 2011).

As the main aim of the current approach is not to reproduce the brain dynamics in a one-to-one implementation, we select the most active location in the saliency map in order to define the position where the model should attend; hence, we define the most salient location as follows:

$$FOA_w = \arg\max_{(x, y)} S(x, y), \tag{22}$$

where $FOA_w$ defines the winning location $(x_w, y_w)$, with $0 \le x \le N'$ and $0 \le y \le M'$, in the saliency map of dimensions $N' \times M'$.

Under this strategy, the focus of attention (FOA) is shifted to the location of the winner neuron. Further, local inhibition must be applied in an area around the location of the FOA, in order to allow the system to determine a new winning location and produce a new attentional shift. To reproduce such an inhibition-of-return mechanism, when the most active location in the map is selected, a small excitation is activated in the surround of the FOA (Koch & Ullman, 1985); consequently, the shape of the FOA can be approximated by a disk whose radius is fixed according to the microcalcifications' average width (in this work, we compared the performance obtained using radii of 2, 3 and 5 pixels); subsequently, such location is inhibited by setting its activity to zero.
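A minimal sketch of the winner selection of eq. (22) combined with this inhibition-of-return loop, assuming the saliency map is a NumPy array (the function name `select_foas` and the toy map are ours, not the authors'):

```python
import numpy as np

def select_foas(S, radius=2, n_shifts=5):
    """Serially select the most salient locations (eq. 22),
    inhibiting a disk around each winner (inhibition of return)."""
    S = S.astype(float).copy()
    N, M = S.shape
    yy, xx = np.mgrid[0:N, 0:M]
    foas = []
    for _ in range(n_shifts):
        # FOA_w = argmax over (x, y) of S(x, y)
        w = np.unravel_index(np.argmax(S), S.shape)
        foas.append((int(w[0]), int(w[1])))
        # inhibit a disk of the given radius around the winner
        S[(yy - w[0]) ** 2 + (xx - w[1]) ** 2 <= radius ** 2] = 0.0
    return foas

# toy saliency map with two bright spots
S = np.zeros((16, 16))
S[4, 5] = 1.0
S[10, 12] = 0.8
print(select_foas(S, radius=2, n_shifts=2))  # [(4, 5), (10, 12)]
```

The inhibited disk keeps the same winner from being re-attended, so successive shifts visit locations in decreasing order of saliency.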

#### **3.5 Serial segmentation procedure**

The collected images frequently vary in quality (from satisfactory to poor); hence, the grey level contrast has a certain individuality (Ramirez-Villegas et al., 2010) determined by the tissue characteristics and the image acquisition process. Furthermore, regions of images such as mammograms are amenable to several segmentation algorithms. The image segmentation procedure must be specific enough to avoid false positives in the enhancement process.

In statistical analysis, when outliers are present, the estimates of the data are distorted. Consequently, these estimates are not suitable to make inferences about the data. In this case, these erroneous values should be eliminated for subsequent analysis purposes.

The Tukey outlier test (Hoaglin et al., 1983) assumes no specific distribution of the data series. This method is based on the supposition that any distribution has a group of typical values surrounded by atypical data (i.e., outliers) that exaggerate the histogram length. The larger the sample size, the higher the probability of getting at least one outlier. The Tukey outlier test rests on at least two assumptions: (1) that the central part of the distribution contains most of the information of the genuine reference values; and (2) that outliers may be detected as values lying outside limits derived from the statistical properties of this central part.

In our work, we implemented this outlier detector as a serial segmentation algorithm using the FOAs determined by the saliency-based bottom-up approach described in Section 3.4. Each serially attended location (i.e., the circumscribed regions used to simulate attentional shifting) is a subset $S'(x, y) \subset S(x, y)$; the Tukey outlier test will set aside the observation $S'(x_o, y_o)$ if one of the following conditions is fulfilled:

$$S'(x_o, y_o) < L, \tag{23}$$

where


$$L = q_1 - 1.5 \left( q_3 - q_1 \right), \tag{24}$$

or

$$S'(x_o, y_o) > U, \tag{25}$$

where

$$U = q_3 + 1.5 \left( q_3 - q_1 \right), \tag{26}$$

where $q_1$ and $q_3$ denote the first and third quartiles of the sample, respectively. Once the arguments of the above expressions are obtained, pixels above $U$ and below $L$ are considered outliers of the distribution. As microcalcifications at this stage of the approach appear as highly bright regions with atypical gray level values (outliers in the distribution of the resulting processed image), the segmentation threshold is set to $U$.

From a neural networks perspective, the segmentation procedure proposed in this work can be seen as a hard-limit transfer function node, where:

$$H\left[S'(x, y)\right] = \begin{cases} 1 & S'(x, y) > U \\ 0 & \text{otherwise,} \end{cases} \tag{27}$$

Under this scheme, the typical gray values of the distribution are discarded (set to zero) and the others are transferred to the next processing step (SOM neural network).
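As a sketch, the serial Tukey segmentation of eqs. (23)–(27) might look as follows in NumPy; the helper name `tukey_segment` and the toy patch are ours, and we apply the standard Tukey fences (Hoaglin et al., 1983):

```python
import numpy as np

def tukey_segment(patch):
    """Hard-limit node of eq. (27): mark pixels above the upper
    Tukey fence U as outliers (candidate microcalcifications)."""
    q1, q3 = np.percentile(patch, [25, 75])
    L = q1 - 1.5 * (q3 - q1)   # lower fence, eq. (24)
    U = q3 + 1.5 * (q3 - q1)   # upper fence, eq. (26)
    return (patch > U).astype(np.uint8), L, U

# toy FOA patch: uniform background with one atypically bright pixel
patch = np.full((5, 5), 10.0)
patch[2, 2] = 200.0
mask, L, U = tukey_segment(patch)
print(int(mask.sum()))  # 1: only the bright outlier survives
```

Applied serially to each attended FOA, the union of the resulting masks yields the segmented image of Figures 6(d) and 7(d).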

#### **3.6 Self-organizing map (SOM) neural network**

The final stage of the approach reported in this chapter is the implementation of a SOM neural network in order to topologically adjust the microcalcifications and show the final outcome for diagnosis purposes. Figure 4 illustrates the architecture of the neural network with the saliency map as input.

Self-Organizing Maps (SOM) have been widely applied to a plethora of tasks, much like other neural networks, e.g., pattern recognition, vision systems and signal processing, among others. In SOM-like neural networks, neighboring cells compete through mutual lateral interactions, and develop adaptively into specific detectors of different signal patterns (Kohonen, 1990). Each point of the input data shaping the structure of an N-dimensional space determines the spatial location of the weight of a cell in the network. Consequently, the network is capable of producing a categorization of the input space.


Fig. 4. Scheme of the neural network implemented in this work.

Let $X_i = [x_i, y_i]^T \in S_s(x, y)$ be a two-dimensional input vector in the segmented saliency map. The weight vector of the node $j$ in the SOM layer is therefore denoted by $w_j = [w_{j,1}, w_{j,2}]^T \in S_s(x, y)$. We define an analytical measure of match between $X$ and $w$. The simplest way to define the match may be the inner product $X_i^T w_j$; however, the Euclidean distance gives a better and more convenient matching criterion (Kohonen, 1990). The Euclidean distance between the input patterns and the vector of weights is defined as:

$$E_{ji} = \left\| w_j - X_i \right\| = \sqrt{\left( w_{j,1} - x_i \right)^2 + \left( w_{j,2} - y_i \right)^2}, \tag{28}$$

The minimum Euclidean distance defines the winner neuron at the current iteration. Hence, there is a single neuron chosen such that:

$$E_c = \underset{j, i}{\arg\min} \left( E_{ji} \right), \tag{29}$$

Lateral interactions among the units are enforced by defining a neighborhood set $n'$ around the winner unit. At each learning step the cells within the neighborhood are updated; depending on the neighborhood function, the cells outside $n'$ are left intact or almost intact. Such a function technically defines the adaptation strength among the neurons of the map: the closer a node is to the winner, the stronger the adaptation it undergoes. In our work we used an elliptical Gaussian function, which according to our experimentation gave a robust solution to the topologic adjustment task:


$$f(x, y; \sigma_1, \sigma_2) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\left(\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2}\right)\right], \tag{30}$$

The parameters of the function (the Gaussian widths) define the size of the neighborhood. Typically, it changes according to a monotonically decreasing function throughout the whole training procedure. In the current implementation such function for either Gaussian width is given by:

$$\sigma_n(t) = \sigma_{n,0} \left( \frac{\sigma_{n,f}}{\sigma_{n,0}} \right)^{t/T}, \tag{31}$$

where $t$ represents the current training cycle and $T$ the total number of training cycles. The initial $\sigma_{n,0}$ and final $\sigma_{n,f}$ neighborhood sizes can be estimated according to the map size (the neurons' distribution), the segmented saliency map size, or in a heuristic way. It should be noted that a wide initial neighborhood first induces a rough global order in the $w_j$ values, after which narrowing the neighborhood improves the spatial resolution of the map.

Finally, the updating process of the weights is given by the following equation:

$$w_j(t+1) = w_j(t) + \alpha(t)\, f(\cdot) \left[ X_i - w_j(t) \right], \tag{32}$$

where $\alpha(t)$ is the so-called adaptation gain ($0 < \alpha(t) < 1$), which is related to the rate at which the network learns the topology of the input space. Typically, this parameter is also described by a monotonically decreasing function. In our work, it has the following form:

$$\alpha(t) = \alpha_0 + \left( \alpha_f - \alpha_0 \right) \frac{t}{T}, \tag{33}$$

here, the initial $\alpha_0$ and final $\alpha_f$ learning rates must be small values, with $\alpha_f < \alpha_0$.
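Equations (28)–(33) can be sketched together as a minimal SOM training loop. This is an illustrative reduction, not the authors' implementation: we use a one-dimensional ring of units with an isotropic Gaussian neighborhood instead of the elliptical two-dimensional one, and all names and parameter values (`train_som`, `s0`, `a0`, …) are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(X, n_units=20, T=500, s0=5.0, sf=0.5, a0=0.1, af=0.01):
    """Ring SOM trained with eqs. (28)-(33): Euclidean matching,
    Gaussian neighborhood with shrinking width (eq. 31) and
    linearly decaying adaptation gain (eq. 33)."""
    w = rng.random((n_units, 2))               # weight vectors w_j
    idx = np.arange(n_units)
    for t in range(T):
        sigma = s0 * (sf / s0) ** (t / T)      # eq. (31)
        alpha = a0 + (af - a0) * (t / T)       # eq. (33)
        x = X[rng.integers(len(X))]
        c = np.argmin(np.linalg.norm(w - x, axis=1))   # eqs. (28)-(29)
        d = np.minimum(np.abs(idx - c), n_units - np.abs(idx - c))  # ring distance
        f = np.exp(-d ** 2 / (2 * sigma ** 2)) # isotropic version of eq. (30)
        w += alpha * f[:, None] * (x - w)      # eq. (32)
    return w

# square-shaped random input space, as in the Figure 5 example below
X = rng.random((500, 2))
w = train_som(X)
print(w.shape)  # (20, 2)
```

After training, the ring of weight vectors spreads over the square input distribution, mirroring the qualitative behavior shown in Figure 5.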

For illustrative purposes, a topology adjustment example by a SOM network is given in Figure 5. In this example, the input space is a square-shaped random distribution of points, while the network (initially) is a circle-shaped array of interconnected units. Note that the weight vectors tend to approximate the density function of the input vectors after a few training cycles (the blue edges indicate that the neurons are neighbors in the grid).

Fig. 5. SOM neural network (two-dimensional circular array) in a squared input space: (a) Initial weights (iteration 0); (b) weights after several training cycles (iteration 100).

### **4. Results**

As aforementioned, we tested our approach using the Mini-MIAS Database of Mammograms (Suckling, 1994), which is widely used by researchers in the area of CAD of breast cancer to carry out and benchmark their work. From this database, a total of 23 mammographic images (those containing microcalcifications) were part of our study. The background and tissue character in the images enabled us to test the algorithm under a certain variability of conditions. ROIs were extracted according to the specifications of the database in the form of squared regions enclosing the microcalcification clusters. In some cases calcifications were widely distributed throughout the mammogram rather than concentrated at single sites; in these cases several ROIs containing microcalcifications were extracted from the image. Subsequently, all the processing steps were performed according to Figure 1.

In this section we present the main outcomes of the proposed methodology. In Section 4.1 we give some relevant examples to illustrate how the proposed CAD application operates during mammogram inspection. In Section 4.2 we present comparative Free-Response Operating Characteristic (FROC) curves to evaluate the outcome of our methodology while varying the radius of the FOAs in the saliency-based bottom-up model. We also compare the proposed algorithm against the DoG approach and, additionally, compare the performance obtained in the detection of benign microcalcification signs against that obtained for malign signs.

#### **4.1 Experimental results**

The analysed ROIs containing the microcalcifications in the mammograms vary in radius from 8 to 93 pixels, and the performance of the SOM neural network was achieved in 500 training cycles, in which the locations of the possibly pathological regions are given as output.

In Figure 6 and Figure 7, examples of microcalcification detection are presented. Note that after the preprocessing stages (image histogram equalization and the top-hat algorithm), the saliency-based bottom-up approach reveals the locations of the image that the visual system should attend to. In this case, the visual processing model biases the competition among the different locations of the image in favour of certain objects of the space. The attended conspicuous objects in this case are the microcalcifications present in the mammograms. As the degree of conspicuity of the microcalcifications on the preprocessed images varies, the saliency map activity is somewhat heterogeneous. This illustrates that the neural responses elicited by the objects and the competitive interactions among certain locations in the maps induce one target to rise above the others at a given time instant. In addition, our model incorporates the iterative normalization strategy described by Itti & Koch (2000), which consists of iteratively convolving the feature maps with a 2D DoG filter, adding the result to the original image and setting the negative results to zero after each iteration. We tested the model with a reduced number of iterations (a maximum of 3 iterations) and a small inhibition factor (between 0 and 1) in order to avoid undesired over-competitive behaviour among the neurons of the map.
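The iterative normalization step can be sketched as follows — a loose interpretation of the strategy of Itti & Koch (2000), where the kernel size and the excitation/inhibition weights `c_ex` and `c_inh` are our assumptions, not the chapter's values:

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def iterative_normalization(feature_map, n_iter=3, c_ex=0.5, c_inh=1.5):
    """Iteratively convolve the map with a DoG filter, add the result
    to the map, and clamp negative values to zero."""
    dog = c_ex * gaussian_kernel(9, 1.0) - c_inh * gaussian_kernel(9, 4.0)
    m = feature_map.astype(float).copy()
    for _ in range(n_iter):
        m = m + convolve2d(m, dog, mode='same', boundary='symm')
        m[m < 0] = 0.0          # half-wave rectification after each iteration
    return m

# one strong and one weak activity peak: the strong one stays dominant
m = np.zeros((32, 32))
m[8, 8], m[24, 24] = 1.0, 0.3
out = iterative_normalization(m)
print(out[8, 8] > out[24, 24])  # True
```

With few iterations, the short-range excitation and long-range inhibition sharpen isolated peaks without letting a single location completely silence the rest of the map.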

Fig. 6. Example illustrating the processing steps of the proposed approach: (a) Equalized ROI; (b) output of the top-hat algorithm; (c) saliency map; (d) segmented image; (e) topologic adjustment of microcalfications by SOM neural network.

Figure 6(d) and Figure 7(d) illustrate the results of the serial segmentation. Note that the specificity of the bottom-up processing increases with the pattern discrimination obtained after the serial calculation of the Tukey outlier test. For these examples the radius of the FOA was 2 pixels. The white locations in Figure 6(d) and Figure 7(d) were those for which the statistical procedure detected at least one outlier. Furthermore, like in many other relevant situations, according to our results, it is hard to find an algorithm that can handle all the possible scenarios and all mammographic image conditions. In addition, regardless of the distribution of the FOA, in the absence of outliers (microcalcifications), the Tukey statistical test provided a low rate of false detections (high specificity). We performed extensive experiments to evaluate the serial segmentation algorithm by limiting the maximum number of locations attended by the saliency-based bottom-up model; the algorithm's outcome with a limited number of attended locations did not vary greatly from that obtained when attention shifting occurred across the whole saliency map.

Figure 6(e) and Figure 7(e) show the topologic adjustment of microcalcifications performed by the SOM neural network. This performance was obtained by training the network over 500 cycles, in which the locations of the possibly pathological regions were given. Although the topological adjustment made by the SOM network is accurate and suitable for the application, some microcalcifications were not associated because the number of neurons in the input space was limited due to computational load constraints. Further research will be needed to evaluate other schemes in the topologic adjustment task.

Microcalcification Detection in Digitized Mammograms: A Neurobiologically-Inspired Approach 179

Additionally, Figure 8 illustrates the FOA shifting for the first six attended locations in the saliency map in the example of Figure 6. In this case, the FOA is represented by a disk of radius 2 pixels. In general, although it depends on the ROI size, the processed ROI required approximately from 16 to 300 shifts (with overlapping) to cover all the possible saliency map locations. Since larger ROIs took somewhat longer to be analyzed by the algorithm, in the slowest case the processing steps took approximately one minute to be performed. Furthermore, since mammographic regions could be considered as highly cluttered scenes,


Additionally, Figure 8 illustrates the FOA shifting for the first six attended locations in the saliency map of the example in Figure 6. In this case, the FOA is represented by a disk of radius 2 pixels. In general, although it depends on the ROI size, the processed ROI required approximately 16 to 300 shifts (with overlap) to cover all the possible saliency map locations. Since larger ROIs took somewhat longer to be analyzed by the algorithm, in the slowest case the processing steps took approximately one minute. Furthermore, since mammographic regions can be considered highly cluttered scenes, the current state of this model reproduces many classical results in psychophysics.
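The serial deployment of the FOA described above — attend the winner-take-all (WTA) maximum of the saliency map, then suppress the attended disk (inhibition of return) so attention shifts to the next most salient location — can be sketched as follows. This is an illustrative sketch under the text's 2-pixel-radius assumption, not the authors' implementation; `foa_shifts` and the toy map are our own names:

```python
import numpy as np

def foa_shifts(saliency_map, n_shifts=6, foa_radius=2):
    """Serially attend the most salient locations (winner-take-all),
    suppressing each attended FOA disk so attention moves on."""
    smap = saliency_map.astype(float)
    h, w = smap.shape
    yy, xx = np.mgrid[0:h, 0:w]
    visited = []
    for _ in range(n_shifts):
        y, x = np.unravel_index(np.argmax(smap), smap.shape)  # WTA winner
        visited.append((int(y), int(x)))
        # inhibition of return: suppress the FOA disk around the winner
        smap[(yy - y) ** 2 + (xx - x) ** 2 <= foa_radius ** 2] = -np.inf
    return visited

# toy saliency map with two bright spots
m = np.zeros((16, 16))
m[3, 4] = 1.0
m[10, 12] = 0.8
print(foa_shifts(m, n_shifts=2))  # [(3, 4), (10, 12)]
```

The suppression step is what produces the serial scan path drawn by the black arrows in Figure 8.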

#### **4.2 Performance evaluation**

178 Digital Image Processing

Fig. 7. Example illustrating the processing steps of the proposed approach: (a) Equalized ROI; (b) output of the top-hat algorithm; (c) saliency map; (d) segmented image; (e) topologic adjustment of microcalcifications by the SOM neural network.

Fig. 8. Example of the operation of the visual saliency model with the mammographic ROI in Figure 6. Note that once the visual machinery model combines the information of the topographic conspicuity maps into the saliency map, the most salient locations of the scene are attended in a serial strategy (the black arrows indicate the spatial shifts of the FOA).

The performance of the proposed system is shown by the FROC curves in Figures 9 and 10. A FROC (Free-response Receiver Operating Characteristic) curve is the plot of the lesion localization fraction vs. the number of non-lesion localizations as the threshold to report a finding is varied; FROC curves are mainly used to evaluate and analyse image processing algorithms objectively, such as imaging CAD algorithms. Increasing the sensitivity of the algorithm can lead to false positives when the detection of subtle signs is attempted. The experimental evaluation was directed at how the proposed algorithm can improve the diagnosis of pathological signs (in this case, microcalcifications). When a microcalcification (or microcalcification cluster) is detected at the approximate position given in the database specifications, we count a true positive (TP); if a microcalcification (or microcalcification cluster) is detected outside the approximate radius indicated in the database, we count a false positive (FP). Furthermore, the malignancy of the pathologies in diagnostic images of different modalities should be one of the main topics in CAD evaluations, as it provides information about how specific the techniques or approaches are in the detection of pathologies; they can thereby be characterized by powerful descriptors such as the size of the signs, the character of the background tissue, the characterization of the abnormality (e.g., single or clustered microcalcifications) and the approximate radius of the pathology in each image.
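The TP/FP counting rule just described yields one FROC operating point per reporting threshold; sweeping the threshold traces the curve. The sketch below is a generic illustration of that bookkeeping (function and variable names are ours, not the authors'):

```python
import math

def froc_points(detections, lesions, thresholds):
    """detections: list of (x, y, score); lesions: list of (x, y, radius)
    ground-truth annotations. Returns (avg FP per image, sensitivity)
    pairs as the score threshold to report a finding is varied."""
    points = []
    n_images = 1  # single-image sketch; divide FPs by the real image count
    for t in thresholds:
        reported = [d for d in detections if d[2] >= t]
        hit = [False] * len(lesions)
        fp = 0
        for x, y, _ in reported:
            for i, (lx, ly, r) in enumerate(lesions):
                if math.hypot(x - lx, y - ly) <= r:
                    hit[i] = True  # within the annotated radius -> TP
                    break
            else:
                fp += 1  # outside every annotated radius -> FP
        points.append((fp / n_images, sum(hit) / len(lesions)))
    return points

lesions = [(10, 10, 3), (40, 40, 3)]
dets = [(11, 10, 0.9), (41, 39, 0.6), (25, 25, 0.4)]
print(froc_points(dets, lesions, [0.5, 0.3]))  # [(0.0, 1.0), (1.0, 1.0)]
```

Lowering the threshold admits the spurious detection at (25, 25), which raises the false-positive count without changing the sensitivity — exactly the trade-off the FROC curves in Figures 9 and 10 visualize.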

Figure 9 illustrates the FROC curves for different FOA radii. Note that as the FOA becomes narrower, the overall performance of the proposed neurobiologically-inspired algorithm increases. This is an expected effect that emerges from the visual system's features: as the circumscribed region to which attention is directed shrinks, the sensitivity of the system increases. Furthermore, this is a convenient strategy when the scenes are too cluttered and, consequently, difficult to analyse. The maximum performance reached by the proposed algorithm was approximately 92.0% at one false positive per image and 100.0% at 1.5 false positives per image, using a 2-pixel FOA radius. Note that at this stage, the visual cortex model described in this work is limited to the bottom-up control of attention. We have followed this strategy because our main concern is the localization of the stimuli to be attended, not their identification.

Microcalcification Detection in Digitized Mammograms: A Neurobiologically-Inspired Approach 181

Fig. 9. FROC curves illustrating the performance of the proposed approach for different FOA radii (in pixels).

Fig. 10. FROC curves illustrating the performance of (a) the proposed approach and (b) the DoG approach reported by Ramirez-Villegas et al. (2010), for the malignant and benign cases.

From the FROC curves in Figure 10, and specifically the true positive ratios and the average number of false positives per image, it should be noticed that the pre-attentive bottom-up model outperforms the DoG-based approach. In general, DoG kernels exhibit a medium specificity, which allows a single filter to enhance all the microcalcifications under certain conditions. This means that, in order to make the system more robust and make microcalcifications of all sizes detectable, a bank of filters would be needed. Although the performance of the DoG filters could be somewhat limited, DoG filters attenuate the low frequencies adequately (to some extent), which is highly desirable in some of the processing stages. On the other hand, the proposed approach adapts better to all mammographic conditions, given the multi-resolution processing strategy of the visual attention model and the center-surround interactions; this makes the model selective enough to enhance the conspicuous locations and to suppress the low-frequency components, as well as some high ones, in a band pass-like strategy.
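The need for a bank of DoG filters can be made concrete: each DoG kernel is a band-pass filter tuned near one blob scale, and its coefficients sum to zero, which is why the DC and low-frequency content is attenuated. The sketch below is illustrative only; the scale values are not those used by Ramirez-Villegas et al. (2010):

```python
import numpy as np

def dog_kernel(size, sigma, k=1.6):
    """Difference-of-Gaussians kernel: a narrow center Gaussian minus a
    wider surround, giving a band-pass blob detector tuned near sigma."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    def g(s):
        w = np.exp(-(xx ** 2 + yy ** 2) / (2 * s ** 2))
        return w / w.sum()  # normalize so each Gaussian integrates to 1
    return g(sigma) - g(k * sigma)

# a bank of scales, so microcalcifications of different sizes all respond;
# a single sigma would only enhance blobs near one size
bank = [dog_kernel(15, s) for s in (0.8, 1.2, 1.8)]
for kern in bank:
    # coefficients sum to ~0: flat (low-frequency) regions are suppressed
    assert abs(kern.sum()) < 1e-8
```

Convolving an image with each kernel in the bank and taking the per-pixel maximum response is one common way to obtain a scale-covering enhancement map.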

#### **5. Discussion, conclusions and perspectives**

#### **5.1 Visual cortex mechanisms**

Many computational principles regarding bottom-up and top-down visual processing have emerged from experimental and modeling studies. Different features contribute to perceptual saliency, and their weighting can be influenced by top-down modulation (Deco & Rolls, 2004; Peters et al., 2005; Serre et al., 2006). There is experimental evidence of strong interactions among different visual modalities, such as color and orientation, at certain visual locations; these interactions are subject to top-down modulation and training. On the other hand, the most important issue in bottom-up processing is the contrast among features rather than the absolute intensity of each feature. However, primary visual neurons are not only tuned to some kind of local spatial contrast; their responses are also tightly shaped by context, through a structure that extends the range of the classical receptive field.

It is likely that the relative weight of the features contributing to the most general representation is modulated by higher cortical centers. In this way, the attention process selects the information necessary to discriminate between the distractors and the target, as both bottom-up and top-down processes are carried out to analyze the same scene: here the top-down process is related to previously acquired knowledge that biases the neural processing competition among the objects, so recognition is performed by selecting the next eye movement that maximizes the information gain. The computational challenge, then, lies in the integration of bottom-up and top-down cues, so as to provide coherent control signals for the focus of attention, and in the interplay between attentional orienting and scene or object recognition (Itti & Koch, 2001). As the current approach deals only with bottom-up processes, it is well suited for the integration of top-down processes. Integrating such processes would raise the overall performance of the mammography CAD system.
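The weight modulation discussed above can be pictured as a re-weighted fusion of the feature conspicuity maps: equal weights give the purely bottom-up saliency map, while task-dependent weights model top-down bias. Everything below (map names, weight values) is illustrative, not part of the authors' model:

```python
import numpy as np

def fuse(conspicuity_maps, weights=None):
    """Combine per-feature conspicuity maps into one saliency map.
    weights=None reproduces the bottom-up case (equal weights); a
    task-specific weight dict models top-down modulation."""
    names = sorted(conspicuity_maps)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    return sum(weights[n] * conspicuity_maps[n] for n in names)

maps = {"intensity": np.array([[0.2, 0.9], [0.1, 0.0]]),
        "orientation": np.array([[0.7, 0.1], [0.0, 0.3]])}

bottom_up = fuse(maps)                                         # winner at (0, 1)
top_down = fuse(maps, {"intensity": 0.2, "orientation": 0.8})  # winner at (0, 0)
```

Shifting the weights toward the feature that characterizes the target changes which location wins the competition, which is the essence of the top-down bias described in the text.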

#### **5.2 Could this approach be extended to mass detection? Through multi-sign detection**

Another important issue is the possibility of multi-sign detection, i.e., the detection of multiple signs in the same image modality. Current image processing techniques make the detection of primitive breast abnormalities easier (Verma, 2008); nevertheless, the detection of these abnormalities leads to many false detections, depending on the robustness of the vision system (Vilarrasa, 2006; Ramirez-Villegas et al., 2010). On the other hand, the modular architectures of most existing systems lead to the necessity of creating separate algorithms for detecting different kinds of cancer signs, e.g., microcalcifications and masses. As these two types of abnormalities are in several ways remarkably different, many researchers have addressed these two diagnostic tasks separately; consequently, the difficulty of detecting cancer rises in direct proportion to the number of algorithms implemented for such tasks (in this case, at least two different processing pathways). Nevertheless, visual attention modeling could be an important step towards the development of a fully comprehensive CAD system for mammographic image analysis. To the knowledge of the authors of this chapter, the potential of such models in the analysis of mammographic images has not yet been addressed. Theoretically, given the features of the visual processes intended to be modeled, any visual cortex-like model would be capable of helping (to some extent) in the analysis of any diagnostic image.

Figure 11 illustrates an example of how the saliency-based bottom-up model operates for a mammographic image containing a mass.


Fig. 11. Saliency-based bottom-up approach in the detection of masses: (a) Equalized image; (b) saliency map; (c) first attended location by the WTA approach.

#### **5.3 Perspectives: Through multi-organ and multi-disease CAD**

There are many computational approaches that have addressed the problem of diagnosis at a reasonable cost-effectiveness ratio; however, they are commonly single-purpose systems, i.e., their target is the detection of only one disease in one organ. Such schemes are referred to as *abnormality-dependent approaches*. These approaches work well in the case of a single-purpose CAD; nonetheless, it is not preferable to apply them to a multi-disease CAD, given the disproportionately large number of algorithms that would be necessary to detect every single disease (at least one per disease). The future integration of CAD systems into multi-disease and multi-organ ones would ensure their usage in diagnosing a large number of target diseases with a fully comprehensive and integral architecture. Conversely, the diversity of acquisition conditions and the features of different kinds of diagnostic images pose additional challenges for well-known CAD processing steps such as segmentation, registration and classification, not to mention that the characteristics of abnormal regions in these images depend largely on the type of disease. Therefore, it is desirable to integrate the diagnostic knowledge of various types of diseases into a *universal dictionary of features for diagnosis*.

Some research efforts have been made to improve multi-organ and multi-disease CAD. As a matter of fact, there is a rising interest in integrating such systems into multi-purpose ones. One aspect of this is that cancer, for instance, can spread to other organs in the body. Therefore, if a single-disease CAD can detect cancer, it would be of quite limited use for predicting metastasis and complications related to such a detection; moreover, it would also be of little use for detecting cancer in other organs once such metastasis has occurred.

Conventional single-disease CAD approaches have addressed as many diseases as the number of computer vision algorithms involved to detect them. As a matter of fact, the wide range of conditions and characteristics of images is the most cumbersome issue that abnormality-dependent approaches have to face. Conversely, from the viewpoint of computational efficiency, it is not desirable to have as many diagnosis algorithms as the number of target diseases. It has become obvious that a more straightforward strategy will be needed to overcome the problem of integrating and understanding all the diagnostic knowledge by processing images and extracting candidate regions with structures and/or characteristics that are not normal. Furthermore, the integration and processing of such a wide variety of multi-modal medical images need to be assembled and implemented as part of PACS in order to be used in clinical situations.

For instance, in clinical situations it is important that the sensitivity of the system be maintained as high as possible, which is achievable using a complex and strongly wired computational model (such as cortex-like models), and not with an unnecessarily increased number of different and complementary CAD schemes. Furthermore, although a fully comprehensive CAD scheme can be seen as highly robust and integral software, processing images from so many modalities is time consuming; therefore, it is likely that diagnosis would not be performed in real time but rather that image analysis would be performed offline. As a first goal towards the development of a multi-organ and multi-disease CAD, the integration of multi-modal medical images and intelligent assistance in the diagnosis of multi-dimensional images has been of great interest (Kobatake, 2007). This task poses additional challenges from the viewpoint of computational efficiency and the trade-off between the processing efficacy on large datasets and the time taken to reach the diagnosis that supports the decision of the physician. Moreover, the analysis of structures is another important issue to consider for the diagnosis of multiple diseases. For example, the thoracic structure contains at least nine different areas of high diagnostic interest (including the lung area, trachea and pulmonary vessels) related to at least eight pathology-related signs (large lesions, pulmonary nodules attached to vessels, isolated pulmonary nodules, among others). Therefore, an integrated multi-disease and multi-organ CAD system would, in this particular case, extend the standard lung-cancer detection CAD system, irrespective of the methods used to detect the disease signs.

Finally, beyond the quite simple technical predictions of the authors of this book chapter, comprehensive CAD systems and the potential of cortex-like mechanism modeling to overcome detection problems and to raise the sensitivity of disease sign detection will contribute dramatically to the development and improvement of the current capabilities of CAD systems.

#### **6. References**

Burt, P. J. & Adelson, E. H. (1983). The Laplacian pyramid as a compact image code. *IEEE Trans. Com.*, Vol. 31, No. 4, (April 1983), pp. 532-540, ISSN 0090-6778

Christoyianni, I., Koutras, A., Dermatas, E. & Kokkinakis, G. (2002). Computer aided diagnosis of breast cancer in digitized mammograms. *Computerized Medical Imaging and Graphics*, Vol. 26, No. 5, (September 2002), pp. 309-319, ISSN 0895-6111

Cupples, T. E., Cunningham, J. E. & Reynolds, J. C. (2005). Impact of computer-aided detection in a regional screening mammography program. *American Journal of Roentgenology*, Vol. 185, No. 4, (October 2005), pp. 944-950, ISSN 1546-3141
Deco, G. & Rolls, E. T. (2004). A neurodynamical cortical model of visual attention and invariant object recognition. *Vision Research*, Vol. 44, No. 6, (March 2004), pp. 621-642, ISSN 0042-6989

Desimone, R. & Duncan, J. (1995). Neural mechanisms of selective visual attention. *Annu. Rev. Neurosci.*, Vol. 18, No. 1, pp. 193-222, ISSN 0147-006X

Doi, K. (2007). Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. *Computerized Medical Imaging and Graphics*, Vol. 31, No. 4-5, (June-July 2007), pp. 198-211, ISSN 0895-6111

El-Naqa, I., Yang, Y., Wernick, M.N., Galatsanos, N.P. & Nishikawa, R.M. (2002). A support vector machine approach for detection of microcalcifications. *IEEE Trans. Med. Imaging*, Vol. 21, No. 12, (February 2003), pp. 1552-1563, ISSN 0278-0062

Fix, J., Rougier, N. & Alexandre, F. (2011). A dynamic neural field approach to the covert and overt deployment of spatial attention. *Cognitive Computation*, Vol. 3, No. 1, (March 2011), pp. 279-293, ISSN 1866-9956

Greenspan, H., Belongie, S., Goodman, R., Perona, P., Rakshit, S. & Anderson, C. H. (1994). Overcomplete steerable pyramid filters and rotation invariance, *Proceedings of IEEE Computer Vision and Pattern Recognition*, pp. 222-228, ISBN 0-8186-5825-8, Seattle, WA, USA, June 21-23, 1994

Hoaglin, D., Mosteller, F. & Tukey, J. (1983). *Understanding Robust and Exploratory Data Analysis*, John Wiley & Sons, ISBN 978-0471384915, New York, USA

Itti, L., Koch, C. & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. *IEEE Trans. Patt. Anal. Mach. Intel.*, Vol. 20, No. 11, (November 1998), pp. 1254-1259, ISSN 0162-8828

Itti, L. & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. *Vision Research*, Vol. 40, No. 10-12, (June 2000), pp. 1489-1506, ISSN 0042-6989

Itti, L. & Koch, C. (2001). Computational modeling of visual attention. *Nature Reviews Neuroscience*, Vol. 2, No. 3, (March 2001), pp. 194-203, ISSN 1471-0048

Kobatake, H. (2007). Future CAD in multi-dimensional medical images – Project on multi-organ, multi-disease CAD system. *Computerized Medical Imaging and Graphics*, Vol. 31, No. 4-5, (June-July 2007), pp. 258-266, ISSN 0895-6111

Koch, C. & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. *Human Neurobiol.*, Vol. 4, No. 4, pp. 219-227, ISSN 0721-9075

Kohonen, T. (1990). The self-organizing map. *Proceedings of the IEEE*, Vol. 78, No. 9, (September 1990), pp. 1464-1480, ISSN 0018-9219

Navalpakkam, V. & Itti, L. (2002). A goal oriented attention guidance model. In: *Lecture Notes in Computer Science*, H. Bülthoff, C. Wallraven, S-W. Lee & T. Poggio, (Eds.), 453-461, Springer Berlin/Heidelberg, ISBN 978-3-540-00174-4, Berlin, Germany

Navalpakkam, V. & Itti, L. (2005). Modeling the influence of task on attention. *Vision Research*, Vol. 45, No. 2, (January 2005), pp. 205-231, ISSN 0042-6989

Ochoa, E. M. (1996). *Clustered microcalcification detection using optimized difference of Gaussians (DoG)*. M.Sc. thesis, Dept. of the Air Force, Air Force Institute of Technology, Air University, Ohio, USA

Oliver, A., Freixenet, J., Martí, J., Pérez, E., Pont, J., Denton, E.R.E. & Zwiggelaar, R. (2010). A review of automatic mass detection and segmentation in mammographic images. *Medical Image Analysis*, Vol. 14, No. 2, (April 2010), pp. 87-110, ISSN 1361-8415

Papadopoulos, A., Fotiadis, D.I. & Costaridou, L. (2008). Improvement of microcalcification cluster detection in mammography utilizing image enhancing techniques. *Computers in Biology and Medicine*, Vol. 38, No. 10, (October 2008), pp. 1045-1055, ISSN 0010-4825

Peters, R. J., Iyer, A., Itti, L. & Koch, C. (2005). Components of bottom-up gaze allocation in natural images. *Vision Research*, Vol. 45, No. 18, (August 2005), pp. 2397-2416, ISSN 0042-6989

Ramirez-Moreno, D. F. & Ramirez-Villegas, J. F. (2011). A computational implementation of a bottom-up visual attention model applied to natural scenes. *Rev. Ing.*, ISSN 0121-4993 (In press)

Ramirez-Villegas, J. F., Lam-Espinosa, E. & Ramirez-Moreno, D. F. (2010). Microcalcification detection in mammograms using difference of Gaussians filters and a hybrid feedforward-Kohonen neural network. *XXII Brazilian Symposium on Computer Graphics and Image Processing*, 11-15 October, pp. 186-193, ISSN 1550-1834

Ramirez-Villegas, J. F. & Ramirez-Moreno, D. F. (2011). Wavelet packet energy, Tsallis entropy and statistical parameterization for support vector-based and neural-based classification of mammographic regions. *Neurocomputing*, ISSN 0925-2312 (In press)

Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M. & Poggio, T. (2007). Robust object recognition with cortex-like mechanisms. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, Vol. 29, No. 3, (March 2007), pp. 411-426, ISSN 0162-8828

Suckling, J. (1994). The Mammographic Image Analysis Society digital mammogram database. *Excerpta Medica, International Congress Series*, Vol. 1069, pp. 375-378

Tsai, N-C., Chen, H-W. & Hsu, S-L. (2010). Computer-aided diagnosis for early-stage breast cancer by using Wavelet Transform. *Computerized Medical Imaging and Graphics*, ISSN 0895-6111 (In press)

Verma, B. (2008). Novel network architecture and learning algorithm for the classification of mass abnormalities in digitized mammograms. *Artificial Intelligence in Medicine*, Vol. 42, No. 1, (January 2008), pp. 67-79, ISSN 0933-3657

Verma, B., McLeod, P. & Klevansky, A. (2009). A novel soft cluster neural network for the classification of suspicious areas in digital mammograms. *Pattern Recognition*, Vol. 42, No. 6, (June 2009), pp. 1126-1132, ISSN 0031-3203

Vilarrasa-Andrés, A. (2006). *Sistema inteligente para la detección y diagnóstico de patología mamaria*, PhD Thesis, Dept. de radiología y medicina física, Universidad Complutense de Madrid, Madrid, Spain

Walther, D. & Koch, C. (2006). Modeling attention to salient proto-objects. *Neural Networks*, Vol. 19, No. 9, (November 2006), pp. 1395-1407, ISSN 0893-6080

Wei, L., Yang, Y. & Nishikawa, R.M. (2009). Microcalcification classification assisted by content-based image retrieval for breast cancer diagnosis. *Pattern Recognition*, Vol. 42, No. 9, (September 2009), pp. 1845-1852, ISSN 0031-3203


**9** 

*Romania* 

**Compensating Light Intensity Attenuation in** 

The scientific discipline of microscopy aims to make possible the visualization of objects that cannot be observed by the unassisted human vision system, allowing researchers to enhance their understanding on the morphology and processes which characterize such objects. All microscopy techniques by themselves represent crucial tools for scientists working in various fields of research, and furthermore, when combined with image processing and computer vision algorithms the level of information and the speed at which it can be

Confocal Scanning Laser Microscopy (CSLM) is generally considered to be one of the most important microscopy techniques at this time because of the optical sectioning possibilities offered. It is widely accepted that the confocal microscope was invented by Marvin Minsky, who filed a patent in 1957 (Minsky, 1957). However, at that time such a system was very difficult, if not impossible, to implement, due to the unavailability of the required laser sources, sensitive photomultipliers or computer image storage possibilities. The first CSLM system, functioning by using mechanical object scanning, was developed in Oxford in 1975, and a review of this work was later published (Sheppard, 1990). As mentioned above, the architecture of a CSLM system provides the possibility to acquire images representing optical sections on a sample's volume. In order to achieve this, in a CSLM system an excitation source (laser) emits coherent light which is scanned across the sample surface. In reflection mode the light reaching the sample is reflected backwards to the objective, towards a detector. In fluorescence mode the same optical path is used, with the difference being that the reflected light is discarded and the detector collects only the light rays corresponding to the fluorescence emission from the sample. While in conventional microscopy, the detector is subjected to light which is reflected by out of focus planes, resulting in out-of-focus blur being contained in the final image, the architecture of a CSLM system avoids this situation. In order to acquire images corresponding to a certain optical section, a confocal aperture (usually known as pinhole) is situated in front of the detector. More precisely, the pinhole is placed in a plane conjugate to the intermediate image plane

extracted from microscopy images can be greatly increased.

**1. Introduction** 

**Confocal Scanning Laser Microscopy by** 

Stefan G. Stanciu1, George A. Stanciu1 and Dinu Coltuc2 *1Center for Microscopy-Microanalysis and Information Processing,* 

**Histogram Modeling Methods** 

*University "Politehnica" of Bucharest 2University Valahia of Targoviste* 

Yu, S., Guan, L. (2000). A CAD system for the automatic detection of clustered microcalcifications in digitized mammogram films. *IEEE Trans. Med. Imaging,* Vol. 19, No. 2, (February 2000), pp. 115-126, ISSN 0278-0062

### **Compensating Light Intensity Attenuation in Confocal Scanning Laser Microscopy by Histogram Modeling Methods**

Stefan G. Stanciu1, George A. Stanciu1 and Dinu Coltuc2
*1Center for Microscopy-Microanalysis and Information Processing, University "Politehnica" of Bucharest*
*2University Valahia of Targoviste*
*Romania*

#### **1. Introduction**


The scientific discipline of microscopy aims to make possible the visualization of objects that cannot be observed by the unassisted human visual system, allowing researchers to enhance their understanding of the morphology and processes that characterize such objects. Microscopy techniques by themselves represent crucial tools for scientists working in various fields of research; furthermore, when they are combined with image processing and computer vision algorithms, both the level of information that can be extracted from microscopy images and the speed at which it can be extracted can be greatly increased.

Confocal Scanning Laser Microscopy (CSLM) is generally considered one of the most important microscopy techniques at this time because of the optical sectioning possibilities it offers. It is widely accepted that the confocal microscope was invented by Marvin Minsky, who filed a patent in 1957 (Minsky, 1957). However, at that time such a system was very difficult, if not impossible, to implement, due to the unavailability of the required laser sources, sensitive photomultipliers and computer image storage capabilities. The first CSLM system, functioning by mechanical object scanning, was developed in Oxford in 1975, and a review of this work was later published (Sheppard, 1990). As mentioned above, the architecture of a CSLM system makes it possible to acquire images representing optical sections through a sample's volume. To achieve this, an excitation source (laser) emits coherent light which is scanned across the sample surface. In reflection mode, the light reaching the sample is reflected backwards to the objective, towards a detector. In fluorescence mode the same optical path is used, with the difference that the reflected light is discarded and the detector collects only the light corresponding to the fluorescence emission from the sample. While in conventional microscopy the detector is subjected to light reflected by out-of-focus planes, resulting in out-of-focus blur in the final image, the architecture of a CSLM system avoids this situation. In order to acquire images corresponding to a certain optical section, a confocal aperture (usually known as a pinhole) is placed in front of the detector. More precisely, the pinhole is placed in a plane conjugate to the intermediate image plane


and, thus, to the object plane of the microscope. As a result, only light reflected from the focal plane reaches the detector, the out-of-focus light being blocked by the pinhole (Fig. 1). The dimension of the pinhole is variable and, together with the wavelength being used and the numerical aperture of the objective, it determines the thickness of the focal plane (Sheppard et al., 1997; Wilson, 2002).

Fig. 1. Principle of Confocal Scanning Laser Microscopy.

The architecture of a CSLM offers specific advantages such as increased resolution and better contrast than conventional microscopy. Meanwhile, by providing access to images corresponding to optical sections, it also offers significant advantages for people working in fields such as biology, medicine, materials science or microelectronics, mainly because CSLM image stacks can be used for 3D reconstructions of material surfaces (surface topological studies) or of the internal structure of semi-transparent specimens (sub-surface bulk studies) (Rigaut et al., 1991; Sugawara et al., 2005; Liu et al., 1997; Rodriguez et al., 2003; Pironon, 1998). The limits of a CSLM system's performance are essentially determined by the working depth of the high numerical aperture (NA) objective lens used in a particular investigation session, but also by the properties of other components, such as the laser source or the photomultiplier sensitivity.

One of the causes of problematic scenarios that can occur during CSLM investigation sessions is light intensity attenuation. This problem is mainly caused by light


scattering and absorption, light aberrations, or photobleaching in the case of fluorescence-labeled samples. Also, because staining thick samples evenly with fluorophores is a difficult task, intensity attenuation with depth is commonly encountered in CSLM investigations on such samples. The intensity attenuation can also be caused by chromatic or spherical aberrations, which may occur due to various properties of the optical elements present in a CSLM system. These aberrations can lead to a distortion of focus, which can in turn lead to a decrease in the excitation intensity. The attenuation of light can also increase with the depth of the imaged focal planes because of physical phenomena such as scattering and absorption, more precisely because the light rays are significantly scattered and absorbed by the atoms and molecules contained in the medium encountered by the light on the path to the targeted focal plane (and on the return path as well). When a dense medium that significantly scatters and absorbs light is present above the region of focus, the image that corresponds to the focal plane will have a lower contrast than the images collected from the upper planes. As a consequence, the images within image stacks captured using confocal optical microscopy will have different intensities depending on the depth at which they have been collected.
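As a rough illustration of this depth dependence (not part of any method discussed in this chapter), absorption and scattering along the excitation and return paths are often approximated by a Beer-Lambert-type exponential decay; the function name and the attenuation coefficient below are hypothetical values chosen only for demonstration:

```python
import numpy as np

def attenuated_intensity(i0, depth_um, alpha=0.05):
    """Approximate detected intensity at a given imaging depth.

    Assumes a simple Beer-Lambert exponential decay applied on both the
    excitation path and the return path (hence the factor of 2).
    i0: intensity at the surface; alpha: attenuation coefficient per um
    (a hypothetical value, for illustration only).
    """
    return i0 * np.exp(-2.0 * alpha * depth_um)

# Mean detected intensity drops monotonically with section depth:
depths = np.arange(0, 50, 10)   # optical sections every 10 um
profile = [float(attenuated_intensity(200.0, z)) for z in depths]
```

Under this toy model, a section 10 um deep is already attenuated to about a third of the surface intensity, which is consistent with the depth-dependent contrast loss described above.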

Besides absorption and scattering, another phenomenon which can lead to light attenuation and thus affect CSLM image acquisition is the reflection of the incident laser beam in a direction other than that of the objective. When the laser beam encounters a plane surface, it is reflected backwards, in the direction of the objective. When the laser beam encounters an inclined surface, instead of being reflected backwards towards the objective it is reflected symmetrically about the normal of that surface, away from the objective, as illustrated in Fig. 2. The light that reaches the detector after the interaction between the laser beam and regions with this type of morphology will have a low intensity. An example of such a scenario can be found in (Stanciu et al., 2010), where CSLM images collected on Photonic Quantum Ring Laser devices that have a deficient aspect due to this situation are presented.

Fig. 2. Scenario in laser scanning microscopy when in certain regions the laser beam is not reflected backwards, towards the objective, due to the sample's geometry.

The intensity attenuation and structural blurring in the image stacks can cause serious problems in the analysis of CSLM images. Problematic situations also occur when trying to use computer vision algorithms designed for tasks such as object and scene recognition or object tracking on image stacks collected by CSLM. These types of


techniques can provide awkward results when the contrast parameters, which directly depend on the light intensity, have very different values throughout the series. For example, the results that may be achieved by using various thresholding algorithms directly depend on the separability and stationarity of the intensity distributions corresponding to the two classes in the 1D intensity space. In the case of light intensity attenuation, the intensity distributions are non-stationary, and non-stationarity reduces the effective separability between the classes, which is likely to lead to segmentation errors (Semechko et al., 2011). Sun et al. (2004) show that intensity compensation can also enhance the visualization of CSLM data when 3D reconstruction techniques are employed for volume rendering.
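To make the thresholding argument concrete, the following synthetic sketch (our own illustration; all names and intensity values are hypothetical) shows how a global threshold tuned on a bright, shallow section misclassifies the foreground of a deeper, attenuated section:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_section(fg_mean, bg_mean, n=1000):
    """Synthetic 1D intensity samples: n background and n foreground pixels."""
    bg = rng.normal(bg_mean, 5.0, n)
    fg = rng.normal(fg_mean, 5.0, n)
    return bg, fg

# Top section: well-separated classes; deep section: attenuated to ~40%.
bg_top, fg_top = make_section(fg_mean=180.0, bg_mean=60.0)
bg_deep, fg_deep = make_section(fg_mean=72.0, bg_mean=24.0)

t = 120.0  # global threshold tuned on the top section
top_error = np.mean(np.concatenate([bg_top > t, fg_top <= t]))
deep_error = np.mean(np.concatenate([bg_deep > t, fg_deep <= t]))
# The deep section's foreground falls entirely below the fixed threshold,
# so roughly half of its pixels are misclassified.
```

This is exactly the non-stationarity problem: the two class distributions shift downwards with depth, so any fixed decision boundary stops separating them.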

In this chapter we review several recently reported digital image processing techniques that can be used to compensate the effects of light attenuation in CSLM imaging. The techniques that we present in this chapter can be regarded as histogram modeling methods. In order to compensate light attenuation, Capek et al. (2006) considered specifying a standard histogram to each frame in the stack. The standard histogram was computed according to a normalization procedure proposed by Nyul et al. (2000), which consists in matching landmarks in histograms. For the same purpose, Stanciu et al. (2010) propose a method based on exact histogram specification. In their method, the histogram of a reference frame is exactly specified to each of the images in the CSLM stack. The reference frame is selected by using an estimator which takes into account aspects such as brightness, contrast and sharpness. Semechko et al. (2011) propose an intensity attenuation correction method which combines those of Capek et al. (2006) and Stanciu et al. (2010). In this method, the authors use as reference the standard histogram proposed by Capek et al. (2006), specify it in its exact shape to the other images of the stack by using the algorithm of Coltuc et al. (2006), and finally apply nonlinear diffusion filtering to suppress noise and homogenize locally over-enhanced image regions. Each of these methods will be presented in detail in the following sections.

#### **2. Intensity attenuation based on histogram normalization & histogram warping**

In Capek et al. (2006), a method for the compensation of intensity attenuation based on warping the histograms of the individual images in the CSLM stack to a standard histogram is introduced. The standard histogram is constructed such that warping the histograms of the individual images onto it will preserve the high contrast of the optimally captured sections while at the same time improving the contrast and brightness of the low-quality images in the stack. The high-quality images are most likely to correspond to the topmost layers of the specimen, while the low-quality images are most likely to correspond to the deep ones. The computation of the standard histogram is inspired by a procedure for histogram normalization proposed by Nyul et al. (2000), which consists in directly matching landmarks in histograms. Unlike the original approach of Nyul et al. (2000), the method proposed by Capek et al. (2006) searches for the longest distance between two adjacent landmarks across the histograms. The considered landmarks correspond to the minimum intensity, the maximum intensity and the *n*-th percentiles of the image histogram, for *n* = {10, 20, …, 90}. This approach is chosen by the authors because maximal distances between landmarks are likely to preserve maximum image contrast. These


maximal distances (*M*0, *M*1, *M*2, . . . , *M*10) are searched for in the histograms of all images in the stack and, once found, are added up and stretched to cover a grayscale of 256 levels. The breaks between the rescaled maximal distances (*L*0, *L*1, *L*2, . . . , *L*10) represent the new landmarks of the standard scale (Fig. 3). The landmarks of the standard scale are matched to the landmarks of the individual image histograms in order to compute the new intensities of the image pixels, and further on, the image intensities between the landmarks (*L*0, *L*1, *L*2, . . . , *L*10) are piece-wise linearly interpolated, as illustrated in Fig. 4. In other words, the normalized histogram is determined by taking for each pair of landmarks the maximal distance and by stretching these distances to cover the graylevel range. Finally, the normalized histogram is specified to each image in the stack by histogram warping. Histogram warping, originally proposed by Cox et al. (1995), is closely related to histogram specification. Instead of transforming a given image to match a given histogram, in histogram warping one transforms two given images in order to achieve the same, somewhat "intermediate", histogram. In fact, the histogram warping problem consists in deriving the intermediate histogram which can be exactly specified to both images. The complete details of the algorithm presented above can be found in the original publication (Capek et al., 2006).

Fig. 3. Mapping of the maximal distances to a standard 256-level grayscale.

Fig. 4. Piece-wise linear interpolation of new intensity values based on landmark matching.
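The landmark computation and piece-wise linear remapping illustrated in Figs. 3 and 4 can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' reference implementation: all function names are ours, the eleven landmarks (minimum, deciles, maximum) are taken as percentiles, and a synthetic fading stack stands in for real CSLM data.

```python
import numpy as np

PCTS = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # min, deciles, max

def landmarks(img):
    """Histogram landmarks: minimum, 10th..90th percentiles, maximum."""
    return np.percentile(img, PCTS)

def standard_scale(stack):
    """Maximal inter-landmark distances over the stack, stretched to [0, 255]."""
    lms = np.array([landmarks(img) for img in stack])
    max_d = np.max(np.diff(lms, axis=1), axis=0)   # maximal distance per gap
    breaks = np.concatenate([[0.0], np.cumsum(max_d)])
    return breaks * 255.0 / breaks[-1]             # standard landmarks L_0..L_10

def remap(img, std_lms):
    """Piece-wise linear mapping of image landmarks onto the standard scale."""
    return np.interp(img, landmarks(img), std_lms)

# Synthetic stack whose mean intensity fades with depth:
stack = [np.random.default_rng(s).normal(100.0 - 20.0 * s, 10.0, (64, 64))
         for s in range(4)]
std = standard_scale(stack)
corrected = [remap(img, std) for img in stack]
```

After remapping, every frame's minimum and maximum land on the same standard values, so the brightness drift across the stack is removed while the relative ordering of intensities within each frame is preserved.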


In Fig. 5 we present a subset of a stack of images collected by CSLM on a sol-gel matrix sample doped with a photosensitizer, in its original aspect and in the aspect resulting after processing by the algorithm described in this section. The number in the top-left corner depicts the numerical order of the optical sections in the full series. The distance between sections in the subset is 1 μm. The image series was collected using a Leica TCS SP CSLM system working in reflection mode (HeNe, 633 nm). A HC PL FLUOTAR 20.0x objective was used, having a numerical aperture of 0.50.

Fig. 5. Subset from a CSLM stack collected on a sol-gel matrix sample: a) in original aspect; b) in the aspect resulting after histogram warping to a normalized histogram.

#### **3. Intensity attenuation based on reference frame detection & exact histogram specification**

The drawback of the method presented above is a certain over-enhancement which may occur, as in the case of the basic histogram equalization method presented in (Stanciu and Friedman, 2009). If the contrast is increased too much, false contours may appear, along with an enhancement of the noise. In order to eliminate these drawbacks, Stanciu et al. (2010) have introduced a different histogram modeling technique aiming to compensate the light attenuation in CSLM images. In the proposed method, instead of a uniform or a normalized histogram, the histogram of the best visual quality image of the stack is specified to the other images in the stack. This image of best visual quality was termed the 'reference frame'. The reference frame was selected from among the images that make up the stack based on a procedure that offers automated selection. In order to specify the histogram of the reference frame to the others, the exact histogram specification algorithm introduced by Coltuc et al. (2006) was used instead of the classical histogram specification algorithms. It should be stressed that the approach of Coltuc et al. (2006) provides exact results, while classical histogram specification, as well as the histogram warping method proposed by Cox et al. (1995), provides only approximate results.
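The core idea of exact histogram specification can be sketched as follows. This is our own simplified illustration, not the algorithm of Coltuc et al. (2006): ties between equal graylevels are broken here by a 4-neighbour local average, a crude stand-in for the multi-level ordering used in the original work, and all function names are ours.

```python
import numpy as np

def local_mean(img):
    """4-neighbour local average, used only to break ties between equal pixels."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] +
            p[1:-1, :-2] + p[1:-1, 2:] + p[1:-1, 1:-1]) / 5.0

def exact_specification(img, target):
    """Give `img` exactly the histogram of `target` (same number of pixels).

    The pixels of `img` are put into a strict order (graylevel first, the
    local average as a tie-breaker; the perturbation is < 1 graylevel so the
    primary ordering is preserved), then the sorted graylevels of `target`
    are assigned in rank order.
    """
    key = img.astype(np.float64) + 1e-3 * local_mean(img)
    order = np.argsort(key.ravel(), kind="stable")
    out = np.empty(img.size, dtype=target.dtype)
    out[order] = np.sort(target.ravel())
    return out.reshape(img.shape)

rng = np.random.default_rng(2)
dark = rng.integers(0, 64, (32, 32)).astype(np.uint8)        # attenuated frame
reference = rng.integers(0, 256, (32, 32)).astype(np.uint8)  # reference frame
restored = exact_specification(dark, reference)
```

Because every pixel receives a distinct rank, the output histogram matches the reference histogram exactly, which is what distinguishes this family of methods from classical (approximate) histogram specification.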


As mentioned above, the proposed method relies on specifying the histogram that corresponds to the image of the best visual quality in the series onto the rest of the images in the stack. The reference frame can be selected by visual inspection, with a human operator examining the entire image stack and choosing the best-quality frame. Obviously, such an approach is both subjective and time-consuming. Therefore, the authors have proposed a procedure that offers automated detection of the reference frame. In order to automate the reference frame detection, a quality assessment metric is defined, and the reference frame is selected as the one with the best score with respect to the considered metric. The metric that the authors have proposed is based on the evaluation of three attributes which are generally considered responsible for the quality of a graylevel image. These are: brightness, contrast and contour sharpness.

A good measure of image brightness is the average graylevel of the image. Considering the discrete image *f*:[0,*M*-1] x [0,*N*-1]->[0,*L*-1] and *H*={*h*(0), *h*(1),… *h*(*L*-1)} its histogram, the average graylevel, *μf*, can be defined as follows:

$$\mu\_f = \frac{\sum\_{i=0}^{L-1} i\, h(i)}{\sum\_{i=0}^{L-1} h(i)} = \frac{1}{MN} \sum\_{i=0}^{L-1} i\, h(i) \tag{1}$$

As the standard deviation measures how widely spread the values in a data set are, it can be regarded as a measure of contrast. If many data points are close to the mean, the standard deviation is small. On the contrary, if many data points are far from the mean, the standard deviation is large. Finally, if all data values are equal, the standard deviation is zero. Obviously, the variance is a measure of contrast too. The standard deviation has the advantage that, unlike the variance, it is expressed in the same units as the data. An unbiased estimate of the standard deviation can be defined as follows:

$$
\sigma\_f = \sqrt{\frac{1}{MN - 1} \sum\_{i=0}^{L-1} h(i)(i - \mu\_f)^2} \tag{2}
$$

The third factor that we have taken into consideration when designing the reference image estimator is related to the sharpness of the edges contained in the image. An image is generally considered to be of good quality if the objects contained in it can be discerned very clearly. Edges characterize boundaries and therefore represent a problem of fundamental importance in image processing. Edges can be regarded as discontinuities between image regions of rather uniform graylevel or color. Since detecting edges means detecting discontinuities, one can use derivative operators such as the gradient or the Laplacian. Derivative operators are commonly used as well for focus assessment in microscopy imaging (Osibote et al., 2010) and can be employed in image fusion methods (Stanciu, 2011). We have employed the Sobel edge detector (Gonzales and Woods, 2002) in the design of the automatic reference frame estimator. The Sobel operator uses a pair of 3x3 convolution masks, *Sx* & *Sy*, with *Sx* estimating the gradient in the *x*-direction (columns) and *Sy* estimating the gradient in the *y*-direction (rows):


$$S\_x = \begin{vmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{vmatrix} \quad S\_y = \begin{vmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{vmatrix} \tag{3}$$

With *gx* & *gy* being the gradients in the x and y directions, computed by convolution with *Sx* and *Sy*, respectively, the magnitude of the gradient can be defined as:

$$\mathbf{g} = \sqrt{\mathbf{g}\_x^2 + \mathbf{g}\_y^2} \tag{4}$$

As an estimate of the sharpness of the contours (edges) contained in image *f*, the mean intensity of its gradient image *g*, namely *μg*, is used.

The three measures discussed above, *μf*, *σf*, and *μg*, are normalized in order to take values in [0,1]. Thus, *μf* and *μg* are divided by *L*-1, the graylevel range, and *σf* is divided by (*L*-1)/√12, i.e., by the standard deviation of a uniform random variable defined on [0,*L*-1]. Finally, the quality measure for automated detection of the reference frame, *qf*, is computed as the simple product:

$$
q\_f = \mu\_f \, \sigma\_f \, \mu\_g \tag{5}
$$

After computing *qf* for all the images in the stack, the image that yields the highest value of *qf* is selected as the reference frame. Next, its histogram is specified to the other images by using the exact histogram specification algorithm of Coltuc et al. (2006). This algorithm is presented in the next part.
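The selection procedure described above can be sketched as follows. This is a minimal Python sketch, not the authors' code; the function names and the default graylevel count *L*=256 are our assumptions:

```python
import numpy as np
from scipy.signal import convolve2d

def quality_score(img, L=256):
    """q_f = mu_f * sigma_f * mu_g (Eqs. 1-5), each factor normalized to [0, 1]."""
    f = img.astype(np.float64)

    # Brightness (Eq. 1): average graylevel, normalized by the graylevel range L-1.
    mu_f = f.mean() / (L - 1)

    # Contrast (Eq. 2): unbiased standard deviation, normalized by the standard
    # deviation of a uniform random variable on [0, L-1], i.e. (L-1)/sqrt(12).
    sigma_f = f.std(ddof=1) / ((L - 1) / np.sqrt(12))

    # Contour sharpness (Eqs. 3-4): mean Sobel gradient magnitude, normalized by L-1.
    Sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    gx = convolve2d(f, Sx, mode="same", boundary="symm")
    gy = convolve2d(f, Sx.T, mode="same", boundary="symm")
    mu_g = np.sqrt(gx**2 + gy**2).mean() / (L - 1)

    return mu_f * sigma_f * mu_g

def select_reference_frame(stack):
    """Index of the frame with the highest quality score q_f."""
    return int(np.argmax([quality_score(im) for im in stack]))
```

Because *qf* is a product, a frame left flat by attenuation (*σf* ≈ 0) scores near zero, while a bright, high-contrast frame with sharp edges maximizes the score.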

Considering *H*={*h*(0), *h*(1),… *h*(*L*-1)} the histogram to be specified, the exact histogram specification proceeds as follows:

1. The image pixels are ordered in increasing order by using a special strict ordering relation;
2. The ordered string is split from left to right in groups of *h*(*j*) pixels;
3. For all the pixels in a group *j*, the *j*th gray level is assigned, where *j*=0,…,*L*-1.

The definition of the strict order relation is the essential stage of histogram specification. Since the image informational content should be preserved, the induced ordering must be consistent with the normal ordering. This means that, if a pixel graylevel is greater than another one with the normal ordering on the set of integers, it should be greater with the new ordering as well. The new ordering should refine the normal ordering on the set of integers, i.e., pixels that are equal according to the normal order will be differentiated by the induced ordering. Meanwhile, in order to avoid noise, the induced ordering should correspond, in a certain way, to the human perception of brightness.

Coltuc et al. (2006) have transferred the problem of ordering on a scalar image to a *K*-dimensional space by associating a vector to each pixel. The image pixels are ordered by lexicographically ordering the vectors and inducing the same ordering among them. The approach considers a bank of *K* filters, *Φ*={*φ*1, *φ*2,…, *φK*}, whose supports *Wi*, *i*=1,…,*K*, are symmetric and obey an inclusion relation: *W*1 ⊂ *W*2 ⊂ … ⊂ *WK*. The support of *φ*1, *W*1, is one pixel in size. The size of each *Wi* is kept to a minimum. Each filter extracts some local information about graylevels around the current pixel *f*(*x*,*y*). Furthermore, to each pixel *f*(*x*,*y*) the *K*-tuple *Φ*(*f*)(*x,y*)={*φ*1(*f*)(*x,y*), *φ*2(*f*)(*x,y*),…, *φK*(*f*)(*x,y*)} is associated. Finally, the new ordering between image pixels is defined by lexicographic ordering on the corresponding *K*-tuple set. A higher value of *K* is equivalent to a finer ordering. If *K* is large enough, a strict ordering is induced for natural images. Furthermore, it clearly appears that the inclusion among the filter supports orders the amount of information extracted by each filter. Thus, when *i* is small, the information extracted is strongly connected to the current pixel. As the index *i* increases, the support *Wi* increases as well, and the weight of the current pixel decreases in the filter response. This is the reason for ranking pixels using the lexicographic order starting with the first index. When using moving average filters, an almost strict ordering is obtained for a rather small value of *K*, i.e., *K*=6. Other linear or nonlinear filters can be used as well (Gaussian filters, median, etc.). Wan and Shi (2007) applied the same idea of ordering, but on the coefficients of the non-decimated wavelet decomposition of the image. A discussion on exact histogram specification can also be found in Bevilacqua and Azzari (2007), while a solution for exact global histogram specification optimized for structural similarity is proposed by Avanaki (2009).
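The ordering-and-splitting procedure can be sketched as below. This is a minimal Python sketch under the assumptions stated in the comments; moving-average filters play the role of the filter bank, and all names are ours, not from the original papers:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def exact_histogram_specification(img, target_hist, K=6):
    """Specify `target_hist` exactly onto `img`.

    `target_hist` must have length L and sum to img.size.  Pixels are strictly
    ordered lexicographically: the primary key is the graylevel itself (phi_1),
    and ties are broken by moving averages over growing windows (phi_2..phi_K),
    following the idea of Coltuc et al. (2006).
    """
    f = img.astype(np.float64)
    # np.lexsort treats its LAST key as primary, so list the coarsest filter
    # first and the raw pixel values last.
    keys = [uniform_filter(f, size=2 * k + 1).ravel() for k in range(K - 1, 0, -1)]
    keys.append(f.ravel())
    order = np.lexsort(keys)

    # Split the ordered string left to right into groups of h(j) pixels and
    # assign graylevel j to the j-th group.
    levels = np.repeat(np.arange(len(target_hist)), target_hist.astype(int))
    out = np.empty(img.size, dtype=np.int64)
    out[order] = levels
    return out.reshape(img.shape)
```

To specify the reference frame's histogram, one would pass `target_hist = np.bincount(reference.ravel(), minlength=256)`; by construction, the histogram of the output then matches the target exactly.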

In Fig. 6 we present a subset from the stack of images collected by CSLM on the sol-gel matrix sample doped with a photosensitizer (presented in Fig. 5 as well), in original aspect and in the aspect resulting after processing by the algorithm described in this section.

Fig. 6. Subset from CSLM stack collected on sol-gel matrix sample a) in original aspect; b) in an aspect resulting after exact specification of the reference frame's histogram

In Fig. 7, an example of how the histogram of an image is modified by exact histogram specification and by histogram warping is presented. Illustrated are the initial histograms of the 2nd, 9th, 14th and 20th images in the stack; the specified histogram, which is actually the histogram of the 9th image in the stack; and the way in which histogram warping influences the histogram shape in a particular example (i.e., frame 20). Both the warping model histogram and the histogram resulting after histogram warping are presented. The
histogram resulting after exact histogram specification using the algorithm of Coltuc et al. (2006) is exactly the same as the specified one, in our case the histogram of the 9th image in the stack. The difference between the results obtained by the three techniques can be observed very clearly. The complete details of the method presented in this section can be found in the original publication (Stanciu et al., 2010).

Fig. 7. a) Initial histograms of the 2nd, 9th, 14th and 20th frames corresponding to the sol-gel matrix sample; b) specified histogram; c) histogram warping model; d) resulting histogram after warping.

#### **4. Intensity attenuation compensation by exact specification of a normalized histogram and nonlinear diffusion filtering**

The third method that we overview is that of Semechko et al. (2011), who propose an intensity attenuation correction method that combines aspects of the methods of Capek et al. (2006) and Stanciu et al. (2010) with nonlinear diffusion filtering. The authors use nonlinear diffusion filtering in order to suppress noise and
homogenize locally over-enhanced image regions. The method overviewed in this section consists of three main steps: calculation of the reference histogram, exact specification of the reference histogram to all the images making up the CLSM stack, and diffusion filtering.

The reference histogram is computed by using the method proposed by Capek et al. (2006), taking into account the intensity information of the entire CLSM stack. As detailed in section 2, the first step in computing the normalized histogram involves remapping the intensities of individual CLSM frames so that they cover the same intensity range. The authors regard the resulting reference histogram as a global representation of the intensity distribution of the entire volumetric image, not targeted towards any specific cross section. As the authors experiment on biofilm samples, they consider this approach more appropriate than the one presented in Stanciu et al. (2010), because the proportions of biofilm and fluid may be different in the top and bottom CLSM cross sections that they experiment with. This situation can result in a different aspect of the images in the stack, depending on the fluid ratio corresponding to a particular optical section.
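The pooling step might look as follows. This is a Python sketch based on our reading of the description above; the rescaling of the pooled counts to one frame's pixel count and the rounding-remainder handling are our assumptions, not details from Capek et al. (2006), and each frame is assumed to contain at least two distinct intensities:

```python
import numpy as np

def global_reference_histogram(stack, L=256):
    """Pool a stack-wide reference histogram in the spirit of Capek et al. (2006).

    Each frame is first linearly remapped to the full [0, L-1] range so that
    depth-dependent attenuation does not bias the pooled distribution; a single
    histogram is then computed over all remapped frames and rescaled to sum to
    one frame's pixel count, as needed for per-frame exact specification.
    """
    remapped = []
    for frame in stack:
        f = frame.astype(np.float64)
        lo, hi = f.min(), f.max()
        remapped.append((f - lo) / (hi - lo) * (L - 1))
    pooled = np.concatenate([r.ravel() for r in remapped])
    hist = np.bincount(np.rint(pooled).astype(int), minlength=L)

    # Rescale so the counts sum to one frame's pixel count, then push the
    # rounding remainder into the most populated bin.
    frame_size = stack[0].size
    hist = np.floor(hist * frame_size / hist.sum()).astype(int)
    hist[np.argmax(hist)] += frame_size - hist.sum()
    return hist
```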

In the next step, the reference histogram is specified in its exact form to the images that make up the CLSM stack. The algorithm used for the exact histogram specification is the one also used by Stanciu et al. (2010), presented in section 3 of this chapter. The authors justify this choice by noting that, regardless of the difference in intensity representation of the materials in lower and upper CLSM cross sections, the pixels that are likely to represent the biofilm will be mapped to higher intensities and pixels corresponding to fluid will be mapped to lower intensities, thus preserving the separability of the intensity-based representation and enforcing stationarity (Semechko et al., 2011).

In the last step of this method, a diffusion filter is applied to the processed CLSM images in order to suppress any local over-enhancement that may occur after the exact specification of the reference histogram. At the same time, the diffusion filter can attenuate the noise the CSLM images may contain. The authors justify their choice to apply diffusion filtering after the intensity attenuation compensation stage, and not before, as in the latter case it would cause unequal noise filtering throughout the CLSM stack and would potentially blur structural boundaries in the cross sections most affected by intensity attenuation. With the diffusion being modulated by the magnitude of the intensity gradient between neighboring voxels (which is related to contrast and hence to intensity attenuation), the amount of diffusion will be greater in the bottom cross sections than in the upper ones. The authors note that the gradient magnitude at the biofilm-fluid interface is also likely to be smaller than in the upper cross sections, due to the reduced contrast in the lower cross sections. This approach is meant to avoid changes of structural appearance, which could occur if the gradient magnitude is smaller than the contrast threshold parameter in the diffusion equation used in the nonlinear diffusion filter.

Considering *Ω* the image domain and *I*(*x*,0): Ω → ℝ the original image, the filtered image, *I*(*x*,*t*), is obtained as a transient solution of the diffusion equation:

$$
\partial\_t I = \operatorname{div}(\mathcal{D}(\nabla I)\nabla I) \tag{6}
$$
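Equation (6) can be integrated with a simple explicit scheme. The sketch below uses an exponential Perona-Malik diffusivity for illustration, whereas the authors compute the diffusivity with Tukey's biweight; the time-stepping is the same either way, and the parameter values here are arbitrary:

```python
import numpy as np

def diffuse(img, n_iter=20, kappa=15.0, dt=0.2):
    """Explicit scheme for Eq. (6) with D(|grad I|) = exp(-(|grad I|/kappa)**2)."""
    I = img.astype(np.float64)
    for _ in range(n_iter):
        # One-sided differences to the four neighbours.  np.roll gives periodic
        # boundaries (a simplification; reflective padding would be more faithful).
        dN = np.roll(I, 1, axis=0) - I
        dS = np.roll(I, -1, axis=0) - I
        dW = np.roll(I, 1, axis=1) - I
        dE = np.roll(I, -1, axis=1) - I
        # Diffusivity evaluated on each directional gradient: small where the
        # gradient is large, so strong edges diffuse less than flat regions.
        cN = np.exp(-(dN / kappa) ** 2)
        cS = np.exp(-(dS / kappa) ** 2)
        cW = np.exp(-(dW / kappa) ** 2)
        cE = np.exp(-(dE / kappa) ** 2)
        # div(D grad I) approximated by the sum of weighted neighbour differences;
        # dt <= 0.25 keeps the explicit update stable.
        I += dt * (cN * dN + cS * dS + cW * dW + cE * dE)
    return I
```

Because the fluxes between neighbouring pixels are antisymmetric, the scheme conserves the mean intensity while reducing local variation.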

In case of inhomogeneous isotropic diffusion, �(��) is a spatially dependent scalar quantity commonly referred to as diffusivity. The authors of this method chose to compute

Compensating Light Intensity Attenuation in Confocal

Research Projects 1566/2008 and 1882/2008.

**6. Acknowledgments** 

**7. References** 

Scanning Laser Microscopy by Histogram Modeling Methods 199

The research presented in this paper has been supported by the National Program of Research, Development and Innovation PN-II-IDEI-PCE, UEFISCDI, in the framework of

Avanaki, A.N. (2009). Exact global histogram specification optimized for structural

Bevilacqua, P. Azzari, (2007). A High Performance Exact Histogram Specification Algorithm, ICIAP 2007, Image Analysis and Processing, pages 623-628. Black, M.J., Sapiro, G., Marimont, D.H., Heeger, D. (1998). Robust anisotropic diffusion, *IEEE Transactions on Image Processing*, Volume 7, Issue 3, pages 421-32. Capek, M., Janacek, J. and Kubinova, L., (2006), Methods for Compensation of the Light

Coltuc, D., Bolon, P., & Chassery, J.M., (2006), Exact Histogram Specification, *IEEE* 

J. Cox, S. Rey, S. L. Hingorani, (1995). Dynamic Histogram Warping of Image Pairs for

Gonzales, R.C. & Woods, R.E., (2002). Digital Image Processing. Upper Saddle River, NJ:

Nyul, L.G., Udupa, J.K. and Zhang, X. (2000). New variants of a method of MRI scale standardization. *IEEE Transactions on Medical Imaging*, volume 19, pages 143–150. Osibote, O., Dendere, R., Krishnan, S. and Douglas, T. (2010). Automated focusing in bright-

Rigaut, J.P., Vassy, J., Herlin, P., Duigou, F., Masson, E., Briane, D., Foucrier, J., Carvajal-

Rodriguez, A., Ehlenberger, D., Kelliher, K., Einstein, M., Henderson, S. C., Morrison, J. H.,

Sheppard, C.J.R., 15 years of scanning optical microscopy at Oxford. Proceedings Royal

Sheppard, C.J.R., Hotton, D.M., Shotton, D. (1997). Confocal Laser Scanning Microscopy,

Stanciu, S.G. Friedmann, J., Compensating the effects of light attenuation in confocal

Attenuation with depth of Images Captured by a Confocal Microscope, *Microscopy* 

Constant Image Brightness, Proceedings of the IEEE Intl. Conf. on Image

field microscopy for tuberculosis detection, *Journal of Microscopy*, Volume 240, Issue

Gonzalez, S., Downs, A.M., Mandard, A.M. (1991) Three-dimensional DNA image cytometry by confocal scanning laser microscopy in thick tissue blocks, *Cytometry*,

Hof, P. R., Wearne, S. L., Automated reconstruction of three-dimensional neuronal morphology from laser scanning microscopy images. *Methods* 30 (1), 2003, 94–105. ,

microscopy by histogram modelling techniques, Proceedings of IEEE ICTON-MW

similarity, *Optical Review*, Volume 16, Number 6, pages 613-621.

*Research and Technique,* Volume 69, Issue 8, pages 624 – 635.

Processing, ICIP'95, vol. 2, p. 366- 369

Microscopical Society, 1990. 25: p. 319-321

Prentice-Hall.

2, pages 155–163.

vol. 12, pp. 511-524

ISSN: 1046-2023

2008, pages 1-5

ISBN 0387915141 ,Oxford

*Transactions on Image Processing*, volume 15, issue 5 , p. 1143 – 1152

Minsky, M., Microscopy apparatus, 1961: US Patent No. 3013467, filed Nov. 7, 1957.

diffusivity using Tukey's biweight ( Black et al, 1998). The complete details of the algorithm presented in this section can be found in the original publication (Semechko et al, 2011)

#### **5. Conclusions**

In this chapter three recent methods for the compensation of intensity attenuation in CSLM imaging have been overviewed. The method based on histogram warping requires no interaction with the user and can be performed automatically, but the resulted image stack can be subject to contrast over enhancement. The method based on histogram specification that we have previously reported, can be performed either automatically, providing an algorithm that has the ability to automatically select the reference image is employed, or manually, assuming a human operator can nominate the image of best quality in the stack. A very important feature offered by the exact histogram specification and equalization is that all the images in the processed sequence have normalized histograms. This can lead to very effective results in the case of segmentation tasks, as the algorithms for thresholding and segmentation are based on mixtures of Gaussian probability density functions and optimal coding schemes are expected to be obtained if the image within the processed stack will have similar histograms. In the case when all images in the stack are considerably affected by the light attenuation, the methods based on histogram warping will provide better results than the method relying on histogram specification. However, these two methods may provide a result which can be radically different in aspect than the initial aspect of the images due to the possible over-enhancement. In the case of the method based on the exact specification of the reference frame's histogram, the histogram of one of the images in the stack (noted throughout the chapter as the reference image) is specified onto the other images in order to preserve an aspect close to original one. The reference image can be determined automatically taking into consideration the brightness, the standard deviation and the sharpness of the contours (edges) contained in the image. 
The proposed algorithm can provide the premises for the fast processing of the image sequence. However, when choosing to use this method one must limit to image stacks which contain images that represent optical sections of the same object; otherwise the contrast of the resulted images will be influenced not only by the light attenuation but also by the morphological structure of the objects contained in the image. In this case it would be useless specifying the histogram of an image that represents one object of a certain shape onto another image depicting an object with a completely different shape. The results in that case would be quite unpredictable. The proposed method based on the exact specification of the reference frame's histogram improves the contrast of the image in the case when images of good contrast are present in the stack, and in the same time preserves the initial aspect of the images. When the image content is not uniform throughout the stack the methods based on histogram warping may be regarded as better alternatives.

The techniques presented in this chapter do not restorate any high frequency degradation and, in the same time, cannot compensate the drawbacks that are related to the pinhole size that was used during image acquisition. These methods are simply meant to enhance the visual appearance of collected images that have a deficient aspect due to intensity attenuation, and to assist in segmentation tasks. Besides enhancement, histogram specification yields image normalization in respect to the average gray level, energy, entropy, etc.

#### **6. Acknowledgments**

The research presented in this paper has been supported by the National Program of Research, Development and Innovation PN-II-IDEI-PCE, UEFISCDI, in the framework of Research Projects 1566/2008 and 1882/2008.

#### **7. References**

198 Digital Image Processing

diffusivity using Tukey's biweight ( Black et al, 1998). The complete details of the algorithm presented in this section can be found in the original publication (Semechko et al, 2011)

**5. Conclusions**

In this chapter, three recent methods for the compensation of intensity attenuation in CSLM imaging have been overviewed. The method based on histogram warping requires no interaction with the user and can be performed automatically, but the resulting image stack can be subject to contrast over-enhancement. The method based on histogram specification that we have previously reported can be performed either automatically, provided an algorithm able to select the reference image without supervision is employed, or manually, assuming a human operator nominates the image of best quality in the stack. A very important feature offered by exact histogram specification and equalization is that all images in the processed sequence have normalized histograms. This can lead to very effective results in segmentation tasks, as many algorithms for thresholding and segmentation are based on mixtures of Gaussian probability density functions, and optimal coding schemes are expected to be obtained if the images within the processed stack have similar histograms. When all images in the stack are considerably affected by light attenuation, the methods based on histogram warping will provide better results than the method relying on histogram specification. However, these two methods may produce results whose aspect differs radically from that of the initial images, due to possible over-enhancement. In the method based on the exact specification of the reference frame's histogram, the histogram of one image in the stack (denoted throughout the chapter as the reference image) is specified onto the other images in order to preserve an aspect close to the original one. The reference image can be determined automatically by taking into consideration the brightness, the standard deviation and the sharpness of the contours (edges) contained in the image.

The proposed algorithm provides the premises for fast processing of the image sequence. However, this method should be limited to image stacks whose images represent optical sections of the same object; otherwise, the contrast of the resulting images will be influenced not only by the light attenuation but also by the morphological structure of the objects contained in the images. It would be pointless to specify the histogram of an image representing an object of a certain shape onto another image depicting an object of a completely different shape; the results in that case would be quite unpredictable. The proposed method based on the exact specification of the reference frame's histogram improves the contrast of the images when images of good contrast are present in the stack, and at the same time preserves the initial aspect of the images. When the image content is not uniform throughout the stack, the methods based on histogram warping may be regarded as better alternatives.

The techniques presented in this chapter do not restore any high-frequency degradation and, at the same time, cannot compensate for the drawbacks related to the pinhole size used during image acquisition. These methods are simply meant to enhance the visual appearance of collected images that have a deficient aspect due to intensity attenuation, and to assist in segmentation tasks. Besides enhancement, histogram specification yields image normalization with respect to the average gray level, energy, entropy, etc.
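The exact specification step discussed above can be sketched in a few lines of Python. The function name and the tie-breaking by random jitter are illustrative assumptions: a small jitter stands in for a more principled strict pixel ordering (such as the local-mean ordering used in exact histogram specification literature), and both images are assumed to contain the same number of pixels.

```python
import numpy as np

def exact_histogram_specification(image, reference):
    """Impose the reference image's histogram *exactly* onto `image`.

    Pixels of `image` are strictly ordered (ties between equal gray
    levels broken here by a sub-unit random jitter), then relabelled
    with the sorted gray levels of `reference`, so the output
    gray-level counts match the reference histogram exactly.
    Assumes both images have the same number of pixels."""
    flat = image.ravel().astype(float)
    rng = np.random.default_rng(0)
    # Jitter < 1 gray level: breaks ties without reordering
    # pixels whose gray levels actually differ.
    order = np.argsort(flat + rng.uniform(0, 0.5, flat.size), kind="stable")
    target = np.sort(reference.ravel())
    out = np.empty_like(flat)
    out[order] = target  # rank-k pixel gets k-th smallest reference value
    return out.reshape(image.shape).astype(reference.dtype)
```

Since the output is a relabelling of the ranked pixels with the sorted reference values, every image in a stack processed against the same reference ends up with an identical histogram, which is the normalization property exploited above for segmentation.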


**6. References**

Semechko, A., Sudarsan, R., Bester, E., Dony, R. and Eberl, H. (2011). Influence of light attenuation on biofilm parameters evaluated from CLSM image data, *J. Med. Biol. Eng.*, volume 31, issue 2, pages 135-144.

Stanciu, S.G., Stanciu, G.A. and Coltuc, D. (2010). Automated compensation of light attenuation in confocal microscopy by exact histogram specification, *Microscopy Research and Technique*, volume 73, issue 3, pages 165-175.

Stanciu, S.G. (2011). Image Fusion Methods for Confocal Scanning Laser Microscopy experimented on Images of Photonic Quantum Ring Laser Devices, in "Image Fusion", Ed. Osamu Ukimura, ISBN 978-953-7619-X-X, INTECH Open Access Publisher.

Sugawara, Y., Kamioka, H., Honjo, T., Tezuka, K. and Takano-Yamamoto, T. (2005). Three-dimensional reconstruction of chick calvarial osteocytes and their cell processes using confocal microscopy, *Bone*, volume 36, issue 5, pages 877-883, ISSN 8756-3282.

Sun, Y., Rajwa, B. and Robinson, J.P. (2004). Adaptive Image-Processing Technique and Effective Visualization of Confocal Microscopy Images, *Microscopy Research and Technique*, volume 64, pages 156-163.

Wan, Y. and Shi, D. (2007). Joint Exact Histogram Specification and Image Enhancement through the Wavelet Transform, *IEEE Transactions on Image Processing*, volume 16, issue 9, pages 2245-2250.

Wilson, T. (2002). Confocal microscopy: Basic principles and architectures. In: Diaspro, A., editor, Confocal and two-photon microscopy: Foundations, applications and advances, ISBN 0471409200, New York.

### *Edited by Stefan G. Stanciu*

This book presents several recent advances that are related to, or fall under the umbrella of, 'digital image processing', with the purpose of providing insight into the possibilities offered by digital image processing algorithms in various fields. The presented mathematical algorithms are accompanied by graphical representations and illustrative examples for enhanced readability. The chapters are written in a manner that allows even a reader with basic experience and knowledge of the digital image processing field to properly understand the presented algorithms. At the same time, the information in this book is structured so that fellow scientists will be able to use it to push the development of the presented subjects even further.

Photo by tgasser / iStock

Digital Image Processing
