*Artificial Intelligence Data Science Methodology for Earth Observation*

*DOI: http://dx.doi.org/10.5772/intechopen.86886*

**2. The CANDELA platform**

CANDELA's main objective is to create additional value from Copernicus data by providing modeling and analytics tools, given that the tasks of data collection, processing, storage, and access will be carried out by the Copernicus Data and Information Access Services (DIAS) [18]. The corresponding flowchart is presented in **Figure 2** and in [17]. In the end, after the integration of all components, CANDELA will be deployed on top of DIAS.

The CANDELA platform [17] allows prototyping of EO applications by applying efficient data retrieval, data mining augmented with machine learning techniques, and interoperability in order to fully benefit from the available assets and to add more value to the satellite data. It also helps to interactively detect objects or structures and to classify land cover categories.

The implementation of the platform puts in place a set of powerful tools in artificial intelligence environments (e.g., with machine learning and deep learning). These tools have the following objectives:

• To process large volumes of EO data and to perform data analytics

• To extract the information content from the EO data based on data mining

• To fuse data from various EO sensors in order to increase and complement the information extracted from the individual sensors

• To apply deep learning to detect changes in EO data

• To semantically search and index our EO image catalog

From this list of objectives, we focus on two, namely, data mining and data fusion (see **Figure 3**). Our goal is to simplify data access, to analyze large volumes of EO data without requiring specific knowledge of EO data processing, and to fuse the outputs for content exploration.

For the development of the data mining component, we started from [20], and we improved the cascaded active learning system of [21] for typical Copernicus Earth observation images. Its implementation, testing, and validation aim at automated knowledge extraction and image content interpretation. The results are presented in Section 4.1.

Regarding data fusion, a new sub-component had to be developed within data mining. This new sub-component fuses multispectral and SAR images. There are two types of fusion: one is performed at the feature level and the other at the semantic level. The results are shown in Section 4.2 for feature-level fusion.
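To make the distinction between the two fusion types more concrete, the following minimal Python sketch contrasts them for a single image patch. It is an illustration under stated assumptions and not the CANDELA implementation: the feature values, the label names, and the helper functions `fuse_features` and `fuse_semantic` are hypothetical, and a set union is only one simple way to combine per-sensor labels.

```python
# Hypothetical sketch: feature-level vs. semantic-level fusion for one patch.
# All names and values are illustrative assumptions, not CANDELA code.

def fuse_features(ms_features, sar_features):
    """Feature-level fusion: concatenate the per-sensor feature vectors
    into one vector that a single classifier would consume."""
    return list(ms_features) + list(sar_features)


def fuse_semantic(ms_labels, sar_labels):
    """Semantic-level fusion: each sensor is classified separately and
    the resulting label sets are merged (here: a sorted set union)."""
    return sorted(set(ms_labels) | set(sar_labels))


# Assumed toy descriptors for one patch.
ms_features = [0.12, 0.40, 0.33]   # e.g., multispectral band statistics
sar_features = [0.75, 0.05]        # e.g., SAR backscatter statistics

fused_vector = fuse_features(ms_features, sar_features)

# Assumed per-sensor classification results for the same patch.
ms_labels = ["forest", "water"]
sar_labels = ["water", "urban"]

fused_labels = fuse_semantic(ms_labels, sar_labels)

print(fused_vector)   # [0.12, 0.4, 0.33, 0.75, 0.05]
print(fused_labels)   # ['forest', 'urban', 'water']
```

In this sketch, feature-level fusion combines the sensors before any classification takes place, whereas semantic-level fusion combines the labels obtained from each sensor separately.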


**Figure 2.** *CANDELA platform [17].*
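The data mining component of this section builds on active learning, in which a user labels only the patches that are most informative for the classifier. The sketch below shows a generic pool-based loop with uncertainty sampling. It is not the cascaded system of [21]: the one-dimensional features, the nearest-centroid classifier, the `oracle` function standing in for the human annotator, and all values are assumptions chosen to keep the example short.

```python
# Generic pool-based active learning with uncertainty sampling.
# Illustrative only: the 1-D features, the nearest-centroid classifier, and
# the oracle function are assumptions, not the cascaded system of [21].

def centroids(labeled):
    """Mean feature value per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(x, cents):
    """Assign the class whose centroid is closest to x."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def margin(x, cents):
    """Gap between the two smallest centroid distances; small = uncertain."""
    d = sorted(abs(x - c) for c in cents.values())
    return d[1] - d[0]


def oracle(x):
    """Stand-in for the human annotator (assumed ground truth)."""
    return "water" if x < 0.5 else "urban"


labeled = [(0.1, "water"), (0.9, "urban")]       # initial user examples
pool = [0.2, 0.45, 0.6, 0.8, 0.3, 0.75]          # unlabeled patches

for _ in range(3):                               # three annotation rounds
    cents = centroids(labeled)
    query = min(pool, key=lambda x: margin(x, cents))  # most uncertain patch
    pool.remove(query)
    labeled.append((query, oracle(query)))       # user supplies the label

cents = centroids(labeled)
predictions = {x: predict(x, cents) for x in pool}
print(predictions)
```

Each round retrains the classifier on the labels collected so far and asks for a label only where the two classes are hardest to separate, which keeps the annotation effort small.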

*Advanced Analytics and Artificial Intelligence Applications*

Here traceable products yielding quantitative data about physical phenomena, change maps, and change predictions are among our primary goals. Of course, we have to consider the implementation effort as well as the attainable accuracy of our products. For each scenario dealt with below, the reader should try to understand what additional value machine learning, artificial intelligence, and the comprehensive use of data science concepts bring about.

The basic terms of machine learning, artificial intelligence, and data science shall be understood in the following sense:

• We use the term "machine learning" mainly when we talk about learning target category parameters derived from selected images and applying these parameters to other examples. Currently, we see much progress by "deep" techniques (e.g., deep learning [15, 16]). An important point is the selection of reliable reference data for traceable validation and verification of the methods.

• "Artificial intelligence" describes how machine learning results are exploited for further use. Typically this includes recognizing and being aware of typical situations, making decisions based on the recognized high-level parameters, and predicting future developments. To this end, one can profit from external databases complementing machine learning results.

• "Data science" covers the entire field of comprehensive data management and tools, machine learning, and artificial intelligence. This includes topics like distributed processing, monitoring of workflows, visualization techniques, and performance monitoring. Even seemingly trivial tasks (e.g., accessing and handling of data) may belong to data science. However, remote sensing is still in urgent need of efficient tools to familiarize the user community with remote sensing opportunities.

When we look at remote sensing in more detail, we currently see many efforts to transform sensor data to physical quantities that can be exploited for quantitative analysis or modeling. If we accomplish this, we can combine measured data with physical models and find quantitative parameters for predictions.

In the following, we describe how we applied these concepts in a research project funded by the European Union [17]. The project's main objective is to create added value from Copernicus data by providing modeling and analytics tools, while data collection, processing, storage, and access are provided by the Copernicus Data and Information Access Services (DIAS) [18], and to create a data science workflow in which sub-images (image chips) are annotated, administered, and validated based on their assigned semantic labels [19].

The chapter is organized in seven main sections. Section 2 explains the CANDELA platform used for prototyping EO applications, while Section 3 describes the characteristics of the data set. Section 4 presents typical examples of what a user can obtain when using the platform from Section 2 and the data set from Section 3. Section 5 illustrates the perspectives in EO data science workflows, Section 6 summarizes our conclusions, and Section 7 contains the future work. The chapter ends with acknowledgments and a list of references.
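The workflow of annotating and administering image chips mentioned above can be pictured with a small Python sketch. It is only an illustration under assumptions: the toy single-band image, the chip size, the label names, and the helper `cut_into_chips` are hypothetical and do not come from the project.

```python
# Hypothetical sketch: cut an image into chips and attach semantic labels.
# The toy image, chip size, and labels are illustrative assumptions.

def cut_into_chips(image, size):
    """Split a 2-D list into non-overlapping size x size chips,
    keyed by the (row, column) of each chip's upper-left pixel."""
    chips = {}
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            chips[(r, c)] = [row[c:c + size] for row in image[r:r + size]]
    return chips


# Toy 4 x 4 single-band image.
image = [
    [1, 1, 7, 7],
    [1, 1, 7, 7],
    [3, 3, 9, 9],
    [3, 3, 9, 9],
]

chips = cut_into_chips(image, 2)

# Annotation: a user assigns a semantic label to selected chips.
annotations = {(0, 0): "water", (0, 2): "urban"}

# Administration: list the chips that still need a label.
unlabeled = sorted(set(chips) - set(annotations))

print(len(chips))   # 4
print(unlabeled)    # [(2, 0), (2, 2)]
```

Keeping the labels in a separate mapping keyed by chip position makes it easy to track which chips have been annotated and which still await validation.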

**Figure 3.** *Block diagram of the CANDELA platform modules [17].*
