**2.2. Proposed approach**

The knowledge discovery process comprises three main steps: (1) data preparation of the satellite image time series, (2) extraction of the NDVI profiles, and (3) clustering analysis. **Figure 2** presents a flowchart of the proposed process for assessing multi-temporal satellite images.

### *2.2.1. Satellite image time series (SITS)*

The database of multi-temporal NDVI/NOAA/AVHRR images used in this chapter is available at the Centre for Meteorological and Climatic Research Applied to Agriculture (Cepagri) at the University of Campinas (Unicamp), Brazil. It holds AVHRR/NOAA images recorded since April 1995, totaling approximately 6 terabytes of data. The analysis used AVHRR/NOAA-16 and AVHRR/NOAA-17 images acquired from April 2001 to March 2010.

**Figure 2.** Flowchart with the main steps of the proposed approach employed in this chapter.

The images must be preprocessed, because AVHRR/NOAA images often have geometric distortions caused by the Earth's curvature and rotation, attitude errors and imprecise satellite orbits [14]. These distortions must be corrected, especially for land applications that require highly accurate geometric matching, with one-pixel accuracy (1.1 km) in the Equidistant Cylindrical Projection. To achieve accurate geometric correction, the maximum cross-correlation (MCC) method is applied. The MCC method compares a target image with a base image (one for each season of the year) that is geometrically accurate and cloud-free [15]. The first step is the image georeferencing process, which is executed in batch mode by the NAVPRO system [16, 17] to accomplish the necessary tasks, such as:


26 Time Series Analysis and Applications

• Identification of pixels classified as cloud
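The MCC comparison between a target image and a geometrically accurate base image can be illustrated with a minimal NumPy sketch. It assumes that the misregistration is a pure integer-pixel translation and searches exhaustively within ±`max_shift` pixels. The function names and parameters are hypothetical, and the sketch is not the NAVPRO implementation.

```python
import numpy as np


def normalized_cross_correlation(a, b):
    """Pearson correlation between two equally sized image windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def mcc_shift(base, target, max_shift=3):
    """Hypothetical MCC sketch: return the integer shift (dy, dx) of `target`
    relative to `base` that maximizes their cross-correlation, searching all
    shifts within +/- max_shift pixels. Only the central part of `base` is
    compared, so every candidate window stays inside `target`."""
    h, w = base.shape
    m = max_shift
    ref = base[m:h - m, m:w - m]
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            window = target[m + dy:h - m + dy, m + dx:w - m + dx]
            score = normalized_cross_correlation(ref, window)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score
```

Once `(dy, dx)` is known, the target can be brought back into register with `np.roll(target, (-dy, -dx), axis=(0, 1))`. Operational systems usually estimate shifts on many sub-windows and may model more than a single global translation.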

To attenuate atmospheric effects on the images, maximum-value composites (MVC) of the NDVI images were generated. Following the recommendations in [18], inappropriate pixels, such as cloud-contaminated pixels, must be masked out. The georeferencing module allows users to generate NDVI images for a specific region. Because the volume of images is huge, the SatImagExplorer system [19] was used. This interactive system takes a satellite image time series as input and allows the user to specify regions of interest (ROIs). SatImagExplorer applies the indicated region to every image in the sequence and generates the time series of that ROI over all available images. The tool lets users focus their analysis on strategic points of interest and makes long data series easier to analyze. The time series extracted from the multi-temporal images with SatImagExplorer are among the data mined by the clustering method.
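The two operations in this paragraph, building a cloud-masked maximum-value composite and cutting the same ROI out of every image to obtain time series, can be sketched with NumPy. This is a minimal illustration under stated assumptions. The images are assumed to be already co-registered arrays, the cloud mask is assumed to be given as a boolean array, and the ROI is assumed to be a rectangle. The function names are hypothetical and do not reflect SatImagExplorer's API.

```python
import numpy as np


def maximum_value_composite(ndvi_stack, cloud_mask):
    """ndvi_stack: (t, h, w) NDVI images from one compositing period.
    cloud_mask: boolean (t, h, w), True where a pixel is cloud-contaminated.
    Returns the per-pixel maximum NDVI over the period, ignoring masked
    pixels; pixels that are cloudy on every date become NaN."""
    filled = np.where(cloud_mask, -np.inf, ndvi_stack)  # masked pixels never win
    composite = filled.max(axis=0)
    composite[cloud_mask.all(axis=0)] = np.nan
    return composite


def roi_pixel_series(images, row_slice, col_slice):
    """Cut the same rectangular ROI (hypothetical slices) out of every image
    in the sequence and return one time series per pixel, with shape
    (n_pixels, n_images)."""
    roi = images[:, row_slice, col_slice]
    return roi.reshape(roi.shape[0], -1).T
```

Taking the maximum favors the least atmosphere-affected observation, because clouds and haze tend to lower NDVI. Returning one series per pixel keeps each location as a separate object for the clustering step.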

### *2.2.2. Clustering analysis*

The clustering task is defined as the process of grouping similar objects according to a given criterion [20]. In this step, the NDVI time series are analyzed by the clustering method implemented in the SatImagExplorer system, the partition-based method k-means.

k-Means divides the n objects of the input dataset into k partitions. Initially, the algorithm randomly selects k objects as initial centroids and assigns each remaining object to the partition represented by the most similar (closest) centroid. At the end of each iteration, each centroid is recalculated as the mean of the objects in its cluster, and the new centroids determine how the n objects are assigned to clusters in the next iteration. The algorithm converges when cluster memberships no longer change. Although k-means is simple and computationally efficient (O(nk) distance computations per iteration), it relies on average values and is therefore sensitive to noise and outliers in the time series [21].
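The iteration described above can be sketched in a few lines of NumPy: random initial centroids, nearest-centroid assignment, mean update, and a stop when memberships no longer change. This is a generic Lloyd-style sketch with the Euclidean distance, not the SatImagExplorer implementation. The function name and the `max_iter` and `seed` parameters are illustrative assumptions.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means with Euclidean distance.
    X: (n, d) array of objects. Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # choose k distinct objects at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # assignment step: n * k distances, each nearest centroid wins
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: no object changed cluster
        labels = new_labels
        # update step: each centroid becomes the mean of its members
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

The mean in the update step explains the sensitivity to outliers mentioned above. A single extreme series can pull a centroid away from the bulk of its cluster.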

The k-means method relies on a distance function to perform similarity searches, that is, to find the series most similar to the time series under analysis. A distance function, or metric, quantifies how similar two data elements are; in this case, the elements are two time series. The most widely used distance functions belong to the Minkowski family (Lp norms). The Euclidean distance corresponds to L2 and is commonly used to calculate the distance between multidimensional arrays and vectors. Dynamic time warping (DTW) is a very effective distance function for comparing time series [22]. Its main goal is to keep close together time series that have similar behavior but are shifted or distorted along the time axis. DTW handles such warping because the matching between corresponding points is not rigid: one point of a series may be matched to several points of the other. DTW therefore addresses two of the main issues raised by high-temporal-resolution satellite image time series, namely, irregular sampling in the temporal dimension and the need to compare pairs of time series with different numbers of samples [23].
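The contrast between a rigid Lp comparison and DTW can be made concrete with a short Python sketch. The DTW function is the textbook unconstrained recurrence, with the absolute difference as the local cost and no warping-window constraint. Practical implementations often add such a constraint for speed. The function names are hypothetical.

```python
import numpy as np


def minkowski(x, y, p=2):
    """Lp distance between two equal-length series; p=2 is Euclidean."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


def dtw(x, y):
    """Unconstrained dynamic time warping distance.
    x and y may have different lengths; local cost is |x_i - y_j|."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible warping steps:
            # repeat a point of y, repeat a point of x, or advance both
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

For example, take `x = [0, 1, 2, 3, 2, 1, 0]` and its one-step-delayed copy `y = [0, 0, 1, 2, 3, 2, 1, 0]`. Here `dtw(x, y)` is 0, whereas the Euclidean distance between `x` and the first seven samples of `y` is √6 ≈ 2.45. The DTW call also accepts the two series even though their lengths differ.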

The three clustering analyses performed are described next:

First: k-means with the Euclidean distance, considering only monthly NDVI values. The NDVI values of sugarcane fields were extracted using the geographical coordinates (latitude and longitude) provided by the Canasat/INPE Project (www.canasat.inpe.br). In this approach, each element of the dataset is a single NDVI value for one month at a given location (pixel), which yields a month-by-month analysis of the region of interest. Elements were assigned to clusters according to the similarity among their NDVI values. Five clusters were generated for each month of the 2004–2005 crop season. This made it possible to follow the development stage of the crop month by month: for example, whether the crop is in the maturing phase or has already been harvested, and whether there is spectral mixing with other crops or vegetation.
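The first analysis clusters one scalar NDVI value per pixel for each month. In one dimension, the Euclidean distance reduces to the absolute difference. A minimal sketch follows. Two details are assumptions rather than details from the text: the centroids start at evenly spaced quantiles instead of random objects, which makes the result deterministic, and the clusters are relabeled in order of increasing mean NDVI to ease interpretation. The function name is hypothetical.

```python
import numpy as np


def cluster_month(ndvi_values, k=5, max_iter=100):
    """1-D k-means on the NDVI values of all pixels for one month.
    Returns (labels, centroids), with cluster 0 having the lowest mean NDVI."""
    v = np.asarray(ndvi_values, float)
    # assumption: quantile initialization instead of random objects
    centroids = np.quantile(v, np.linspace(0, 1, k))
    for _ in range(max_iter):
        labels = np.abs(v[:, None] - centroids[None, :]).argmin(axis=1)
        new = np.array([v[labels == j].mean() if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break  # centroids stopped moving
        centroids = new
    labels = np.abs(v[:, None] - centroids[None, :]).argmin(axis=1)
    # relabel clusters so that label order follows mean NDVI
    order = np.argsort(centroids)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels], centroids[order]
```

To repeat the analysis for every month of the season, call `cluster_month` once per month on the values of that month, for instance by iterating over a hypothetical dictionary that maps each month to its array of pixel NDVI values.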

Second: k-means with the DTW distance function, applied to series of NDVI values covering one or more sugarcane crop seasons. Five clusters were generated for each crop season (2001–2010) to monitor the crop annually according to the type of planting in each season, for example, sugarcane ratoon, sugarcane expansion, renewed sugarcane, sugarcane under renewal and not defined [13, 24].
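k-Means combined with DTW needs two ingredients: DTW in the assignment step and some way of averaging series in the update step. The sketch below assumes equal-length series, one per crop season, and uses the simplest update, the point-wise mean of the member series. This simplification ignores the warping inside each cluster; DTW Barycenter Averaging (DBA) is a known alternative that accounts for it. Function names and parameters are hypothetical and do not describe the SatImagExplorer implementation.

```python
import numpy as np


def dtw(x, y):
    """Unconstrained DTW with absolute-difference local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(x[i - 1] - y[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def kmeans_dtw(series, k, max_iter=20, seed=0):
    """k-means whose assignment step uses DTW.
    series: (n, T) array of equal-length NDVI series.
    Update step (simplification): point-wise mean of member series."""
    X = np.asarray(series, float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # assign every series to the centroid with the smallest DTW distance
        new_labels = np.array([np.argmin([dtw(s, c) for c in centroids]) for s in X])
        if labels is not None and np.array_equal(new_labels, labels):
            break  # memberships unchanged: converged
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

Each assignment step costs n·k DTW evaluations, and each evaluation is quadratic in the series length. This cost is why DTW-based clustering is much slower than Euclidean k-means on large image archives.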

Third: k-means with the DTW distance function, applied to a three-dimensional (multivariate) time series database extracted from 324 monthly images of NDVI, albedo and surface temperature (108 months for each of the three variables). Because DTW computes the local distance between pairs of data points with the Euclidean distance, it can be applied directly to multivariate time series. The whole dataset comprised 220,238 series, each observation being a triplet of NDVI, albedo and surface temperature values of the study area in a given month, with 108 observations per time series [25].
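The multivariate extension follows directly from the previous paragraph: the DTW recurrence is unchanged, and only the local cost becomes the Euclidean distance between two d-dimensional observations. The sketch below also includes a per-variable z-score. This step is an assumption not stated in the text, but without it a variable with a large numeric range, such as surface temperature, would dominate the Euclidean local cost over NDVI and albedo. Function names are hypothetical.

```python
import numpy as np


def standardize(dataset):
    """dataset: (N, T, d) array of N multivariate series.
    Z-score each of the d variables over the whole dataset (assumption)."""
    mu = dataset.mean(axis=(0, 1), keepdims=True)
    sd = dataset.std(axis=(0, 1), keepdims=True)
    return (dataset - mu) / np.where(sd > 0, sd, 1.0)


def dtw_multivariate(x, y):
    """DTW for multivariate series x: (n, d) and y: (m, d), for example
    d = 3 for (NDVI, albedo, surface temperature). The local cost is the
    Euclidean distance between the d-dimensional observations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    # Euclidean cost between every pair of time steps, shape (n, m)
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

For a dataset shaped like the one described above, the input to `standardize` would be an array of shape (220238, 108, 3). Each call to `dtw_multivariate` between two of its series then builds a 108 × 108 cost matrix.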
