**5. Results and discussion**

To assess the accuracy of the developed approach, water areas have been comprehensively digitized on-screen for selected validation sites of 175,000 pixels each (350 by 500 or 500 by 350 pixels). The subsets have been chosen to contain as many challenging surface types as possible and to represent all landscape types and sensors. From the digitized water reference areas, no-water reference areas have been created by buffering the water reference areas with a two-pixel buffer, to account for the mixed pixel problem, and inverting the buffered areas. Using the water and no-water reference areas, several error metrics based on confusion matrices have been calculated: the probability of detection (POD), probability of false detection (POFD), false alarm ratio (FAR), overall accuracy (OA), average accuracy (AA) and kappa coefficient, all given in Tab. 3.
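The construction of the no-water reference can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `no_water_reference` is hypothetical, the masks are plain nested lists, and the two-pixel buffer is taken as a square neighbourhood (Chebyshev distance), which the text does not specify.

```python
def no_water_reference(water, buffer_px=2):
    """Derive a no-water reference mask from a digitized water mask.

    water: list of rows of booleans (True = digitized water pixel).
    A pixel counts as no-water reference only if no water pixel lies
    within buffer_px pixels of it. The square (Chebyshev) neighbourhood
    is an assumption made for this sketch.
    """
    rows, cols = len(water), len(water[0])
    no_water = [[True] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not water[r][c]:
                continue
            # Clear the water pixel and its buffered neighbourhood,
            # clipped at the image border.
            for rr in range(max(0, r - buffer_px), min(rows, r + buffer_px + 1)):
                for cc in range(max(0, c - buffer_px), min(cols, c + buffer_px + 1)):
                    no_water[rr][cc] = False
    return no_water


# Toy 7 x 7 scene with a single water pixel in the centre.
water = [[False] * 7 for _ in range(7)]
water[3][3] = True
no_water = no_water_reference(water)

# Pixels that are neither water nor no-water reference (the buffer ring).
excluded = sum(
    (not w) and (not n)
    for w_row, n_row in zip(water, no_water)
    for w, n in zip(w_row, n_row)
)
```

Pixels in the buffer ring belong to neither reference class, so likely mixed pixels along the shoreline do not enter the confusion matrix.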


| Test site | POD (%) | POFD (%) | FAR (%) | OA (%) | AA (%) | Kappa |
|---|---|---|---|---|---|---|
| Berlin\_09:38 | 79.5 | 0.1 | 9.6 | 99.8 | 89.7 | 0.845 |
| Berlin\_10:12 | 71.8 | 0.6 | 34.7 | 99.0 | 85.6 | 0.679 |
| Potsdam | 98.2 | 0.5 | 1.8 | 99.2 | 98.9 | 0.977 |
| Helgo\_08:32 | 99.8 | 2.3 | 25.3 | 97.8 | 98.8 | 0.843 |
| Helgo\_09:26 | 99.6 | 0.2 | 3.1 | 99.8 | 99.7 | 0.982 |
| Rheinsberg | 98.2 | 0.3 | 3.0 | 99.6 | 99.0 | 0.974 |
| Dresden\_sub1 | 98.7 | 0.0 | 7.9 | 100.0 | 99.3 | 0.953 |
| Dresden\_sub2 | 100.0 | 2.5 | 25.1 | 97.7 | 98.8 | 0.844 |
| Mönchsgut | 98.8 | 0.0 | 0.2 | 99.7 | 99.4 | 0.991 |
| Döberitzer | 100.0 | 1.8 | 1.9 | 99.1 | 99.1 | 0.981 |

Table 3. Results of the accuracy assessment. The first four test sites are subsets of datasets from which reflectance spectra have been analysed during the algorithm development. The last six test sites are subsets from independent validation datasets. The largest errors are highlighted in gray and discussed below.

The overall accuracy, a common error measure for classification results, amounts to 97 % or above for all test sites. However, the overall accuracy is not the best measure for evaluating the detection accuracy of an underrepresented class, because it credits correct detections and correct non-detections equally and is strongly influenced by the dominating class, i.e. the no-water class in this study. The average accuracy and especially the kappa coefficient, although also very high, reveal the remaining problems of the algorithm much better (highlighted in gray in Tab. 3). The most sensitive measures, however, are the class-specific measures POD and FAR.

POFD, POD and FAR are typical measures for evaluating the accuracy of forecasting methods (Jolliffe & Stephenson, 2003) as well as of two-class classification problems such as detection tasks (one class of interest and one background class). The POFD of a class, also known as the false alarm rate, measures the fraction of false alarm pixels in relation to the background class, i.e. the number of false alarm pixels divided by the total number of ground truth pixels of the background class (= omission error of the no-water class). The achieved POFDs for the test sites are very low (usually below 1 %), showing that water can be well distinguished from no-water surfaces. This is a big step forward compared to the NDWI and MNDWI, which result in many false positives for urban surface materials when applied to high spatial resolution data (see Fig. 3).

The POD of a class, also known as the hit rate, measures the fraction of the pixels of the class of interest that were correctly identified, i.e. the number of correctly identified pixels divided by the total number of ground truth pixels of the class (= producer accuracy of the water class). The achieved PODs for most of the test sites are very high (> 98 %), showing that the developed algorithm usually detects almost all water pixels. False negatives occur only for small water bodies (small ponds within the park at the top left in Berlin\_09:38, parts of the river in Berlin\_10:12, and narrow rivers in Rheinsberg). Possible explanations are the adjacency effect (light from neighbouring pixels that is scattered into the instantaneous field of view by the atmosphere) and diffuse illumination of the water surface by surrounding trees. These two effects might explain why the spectra of small water bodies with surrounding trees look much more like a reflectance spectrum of vegetation than one of water (Fig. 12) and do not show the typical decreasing slopes that enabled the spectral identification of water as shown in section 4.3.1.

Fig. 12. A typical surface reflectance spectrum of water (blue) compared to a reflectance spectrum of a small water body with surrounding trees (green)
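All metrics reported in Tab. 3 derive from the four cells of a two-class confusion matrix. The following sketch shows one way to compute them. It rests on assumptions not stated in the text: the function name `detection_metrics` is hypothetical, AA is taken as the mean of the producer accuracies of the two classes (which agrees with the tabulated values, e.g. (79.5 + 99.9) / 2 = 89.7 for Berlin\_09:38), kappa is taken to be Cohen's kappa, and the pixel counts are invented for illustration only.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute detection metrics from a two-class confusion matrix.

    tp: water pixels classified as water (hits)
    fn: water pixels classified as no-water (misses)
    fp: no-water pixels classified as water (false alarms)
    tn: no-water pixels classified as no-water
    """
    n_water = tp + fn          # ground truth pixels of the water class
    n_background = fp + tn     # ground truth pixels of the no-water class
    n = n_water + n_background

    pod = tp / n_water         # probability of detection (hit rate)
    pofd = fp / n_background   # probability of false detection
    far = fp / (tp + fp)       # false alarm ratio
    oa = (tp + tn) / n         # overall accuracy
    # Assumption: average accuracy = mean of both producer accuracies.
    aa = (pod + (1.0 - pofd)) / 2.0
    # Assumption: Cohen's kappa, with chance agreement p_e computed
    # from the row and column totals of the confusion matrix.
    p_e = ((tp + fp) * n_water + (fn + tn) * n_background) / n ** 2
    kappa = (oa - p_e) / (1.0 - p_e)
    return {"POD": pod, "POFD": pofd, "FAR": far,
            "OA": oa, "AA": aa, "Kappa": kappa}


# Hypothetical counts: water covers 5 % of 100,000 reference pixels.
m = detection_metrics(tp=4500, fn=500, fp=950, tn=94050)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the overall accuracy is about 98.6 %, even though 10 % of the water pixels are missed and roughly 17 % of the detected water pixels are false alarms. This is the insensitivity of OA for an underrepresented class described above.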

The false alarm ratio (FAR) gives the fraction of false alarm pixels in relation to the number of detected water pixels in the image, i.e. the number of false alarm pixels divided by the total number of classified water pixels (= commission error of the water class). This error measure clearly reveals whether too many pixels have been falsely identified as water. This is the case for the test sites Berlin\_10:12, Helgo\_08:32 and Dresden\_sub2 and, to a lesser extent, for Berlin\_09:38. In all of these test sites the confusion is related to shadow areas classified as water. For the test site Helgo\_08:32 this can be explained by the intertidal zone, which is wet even when the water is gone. It is therefore possible that there are some small water-influenced areas under the shadow, a problem that has not yet been considered in the water-shadow separation (section 4.3.1) and remains an open issue for the future.

Another open issue is the detection of white water pixels, which are usually too bright to be included in the low albedo mask (section 4.1). This can be seen in the top left of the test site Mönchsgut.

Overall, it can be seen from Tab. 3 that the accuracies for the independent datasets are not lower than those for the datasets analysed during the algorithm development. Thus, the algorithm seems to be robust and to generalize well to unknown datasets.

Carleer A.P. & E. Wolff (2006). Urban land cover multi-level region-based classification of VHR data by selecting relevant features. *International Journal of Remote Sensing*, Vol. 27, No. 6, pp. 1035-1051

Cocks T., R. Jensen, A. Stewart, I. Wilson & T. Shields (1998). The HyMap airborne hyperspectral sensor: the system, calibration and performance. In: *Proc. of the 1st EARSeL Workshop on Imaging Spectroscopy*, Zürich.

Effler S.W. & M.T. Auer (1987). Optical heterogeneity in Green Bay. *Water Resources Bulletin of the Geological Institutions of Uppsala*, Vol. 23, pp. 937-941

European Parliament and the Council of the European Union (2000). *European Water Framework Directive, Directive 2000/60/EC*. Official Journal L 327, European Union (ed.), pp. 0001-0073

Frazier P.S. & K.J. Page (2000). Water body detection and delineation with Landsat TM data. *Photogrammetric Engineering and Remote Sensing*, Vol. 66, No. 12, pp. 1461-1467

Gege P. (2005). The Water Colour Simulator WASI - User manual for version 3. DLR-Interner Bericht, No. DLR-IB 564-1/2005, DLR, 83 p.

Guanter L., R. Richter & H. Kaufmann (2009). On the application of the MODTRAN4 atmospheric radiative transfer code to optical remote sensing. *International Journal of Remote Sensing*, Vol. 30, No. 6, pp. 1407-1424

Heege T. & J. Fischer (2004). Mapping of water constituents in Lake Constance using multispectral airborne scanner data and a physically based processing scheme. *Canadian Journal of Remote Sensing*, Vol. 30, No. 1, pp. 77-86

Ji L., L. Zhang & B. Wylie (2009). Analysis of Dynamic Thresholds for the Normalized Difference Water Index. *Photogrammetric Engineering and Remote Sensing*, Vol. 75, No. 11, pp. 1307-1317

Jolliffe I.T. & D.B. Stephenson (2003). *Forecast verification: a practitioner's guide in atmospheric science*. J. Wiley, Chichester, West Sussex, England; Hoboken, NJ

Lira J. (2006). Segmentation and morphology of open water bodies from multispectral images. *International Journal of Remote Sensing*, Vol. 27, No. 18, pp. 4015-4038

Manavalan P., P. Sathyanath & G.L. Rajegowda (1993). Digital image-analysis techniques to estimate waterspread for capacity evaluations of reservoirs. *Photogrammetric Engineering and Remote Sensing*, Vol. 59, No. 9, pp. 1389-1395

McFeeters S.K. (1996). The use of the normalized difference water index (NDWI) in the delineation of open water features. *International Journal of Remote Sensing*, Vol. 17, No. 7, pp. 1425-1432

Morel A. (1974). Optical properties of pure water and pure seawater. In: *Optical aspects of oceanography*, Jerlov N.G. & E. Steeman Nielsen (eds.), pp. 1-24, Academic, London

Müller J.L. & G.S. Fargion (2002). Ocean Optic Protocols for Satellite Ocean Colour Sensor Validation. Edited by NASA, Sensor Intercomparison and Merger for Biological and Interdisciplinary Ocean Studies (SIMBIOS) Project Technical Memoranda, 308 p.

Overton I.C. (2005). Modelling floodplain inundation on a regulated river: Integrating GIS, remote sensing and hydrological models. *River Research and Applications*, Vol. 21, No. 9, pp. 991-1001

Pope R.M. & E.S. Fry (1997). Absorption spectrum (380-700 nm) of pure water .2. Integrating cavity measurements. *Applied Optics*, Vol. 36, No. 33, pp. 8710-8723
