
**Figure 20.**

*The value-adding logic scheme.*

*Artificial Intelligence Data Science Methodology for Earth Observation*


*DOI: http://dx.doi.org/10.5772/intechopen.86886*


*Advanced Analytics and Artificial Intelligence Applications*

**5. Data science workflows**

Recently, a new paradigm for Earth observation, namely, Data Knowledge Discovery, was introduced [17]. This paradigm defines the entire chain "data-information-knowledge-value" and deals with meaningful EO content extraction, i.e., the semantic and knowledge aspects.

We developed user-invariant and EO domain-specific methods that compensate for individual user- and domain-subjective biases. The derived models generate a sharable body of knowledge that enables communication between fragmented knowledge learned from metadata, image data, and other data, in synergy with the domain expertise of EO users. Today's EO paradigms and technologies are largely domain-oriented and have to support the communication outlined above. Artificial intelligence and big data in Earth observation [13] have forced the development of new technologies, starting from management platforms [4] and now reaching information platforms.

Examples of the first category are ESA's Thematic Exploitation Platforms (TEPs) [4], which are designed for the coastal, forest, geohazards, hydrology, polar, urban, and food security application domains and integrate standard processing chains with low user interaction. The Copernicus system (currently still under development) and its Data and Information Access Services (DIAS) component [18] are a major achievement but still represent a "classic" management paradigm.

Currently, "classic" existing systems/platforms are usually batch-oriented (e.g., TEPs, DIAS), but with EOLib [20, 40] and the new CANDELA platform [17], this paradigm has shifted to interactive systems (e.g., supporting active learning). There are three perspectives from which to describe this type of interactive system:

• *The first one is based on signal-information logic* (**Figures 18** and **19**).

The objective is to extract knowledge from the sensor signal in the form of physically meaningful parameters or Earth's surface cover categories.

The process is divided into two steps:

○ The first step is an automated batch process that manages the satellite image product files: it extracts the image data, selects the relevant metadata, breaks each image down spatially into patches, estimates the particular signatures or primitive descriptors of each image patch, and structures the extracted information in a database.

○ In the second step, which follows interactive machine learning paradigms, the extracted information is transformed into semantic entities attached to each image location. The process is a combination of querying, browsing, and active learning. Using positive examples, i.e., training samples for the categories of interest, complemented by negative examples to enhance the accuracy of each class, a user can define image semantics adapted to a particular application.

• *The second perspective is based on the value-adding logic* (**Figure 20**).

Based on these procedures, value-adding is an iterative process.

The satellite data are generally multi-mission data, e.g., multispectral and SAR data that are restructured in a common database, which becomes the **data source**.
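The two steps of the signal-information logic can be illustrated with a minimal sketch. It is not the EOLib or CANDELA implementation: the function names (`split_into_patches`, `describe`, `build_database`, `classify`), the table layout, the mean/standard-deviation descriptor, and the nearest-centroid rule are all assumptions chosen to keep the example small. Step 1 tiles an image into patches, computes one descriptor per patch, and stores the result in a database; step 2 labels every patch from user-supplied positive and negative examples and proposes the most ambiguous patch as the next one to show the user, which is the core idea of an active learning loop.

```python
import sqlite3
from statistics import mean, pstdev


def split_into_patches(image, size):
    """Yield (row, col, patch) for non-overlapping size x size tiles."""
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            yield r, c, [line[c:c + size] for line in image[r:r + size]]


def describe(patch):
    """Toy primitive descriptor (assumption): mean and standard deviation."""
    pixels = [p for line in patch for p in line]
    return float(mean(pixels)), pstdev(pixels)


def build_database(image, size):
    """Step 1 (batch): tile the image and store one descriptor row per patch."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE patches "
        "(patch_row INTEGER, patch_col INTEGER, mu REAL, sigma REAL)"
    )
    for r, c, patch in split_into_patches(image, size):
        mu, sigma = describe(patch)
        db.execute("INSERT INTO patches VALUES (?, ?, ?, ?)", (r, c, mu, sigma))
    return db


def centroid(rows):
    """Component-wise mean of a list of descriptor tuples."""
    return tuple(mean(col) for col in zip(*rows))


def distance(a, b):
    """Euclidean distance between two descriptor tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def classify(db, positives, negatives):
    """Step 2 (interactive): label all patches from the user's examples.

    positives / negatives are lists of (row, col) patch locations.
    Returns the labels and the unlabeled patch with the smallest margin,
    i.e., the one the user should be asked about next.
    """
    query = "SELECT patch_row, patch_col, mu, sigma FROM patches"
    feats = {(r, c): (mu, sigma) for r, c, mu, sigma in db.execute(query)}
    pos_c = centroid([feats[k] for k in positives])
    neg_c = centroid([feats[k] for k in negatives])
    labels, margins = {}, {}
    for key, f in feats.items():
        d_pos, d_neg = distance(f, pos_c), distance(f, neg_c)
        labels[key] = d_pos <= d_neg  # True means "category of interest"
        if key not in positives and key not in negatives:
            margins[key] = abs(d_pos - d_neg)
    next_query = min(margins, key=margins.get) if margins else None
    return labels, next_query


# Hypothetical 4 x 4 single-band image, tiled into 2 x 2 patches.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [1, 1, 7, 6],
    [1, 1, 6, 7],
]
db = build_database(image, size=2)
labels, next_query = classify(db, positives=[(0, 2)], negatives=[(0, 0)])
print(labels, next_query)
```

In an interactive session, the user would label the returned `next_query` patch, add it to the positive or negative list, and call `classify` again. A real system would replace the toy descriptor with spectral or texture features and the nearest-centroid rule with a trained classifier.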

**Figure 18.**

*The signal-information logic scheme: chain data-information-knowledge.*

**Figure 19.**

*The signal-information logic scheme: chain data-information-knowledge-semantic value.*

The **data preparation** component generates the **Analysis-Ready Data** (ARD) by applying the minimum set of mandatory processing and organizational steps that enable direct analysis, thus minimizing user interaction at the data level. Among these steps is the generation of radiometrically and geometrically calibrated data cubes. **Browsing** the data sets is a first step of visual inspection in which the user becomes acquainted with the observed structures and their signatures. Further, data mining is an automated process that discovers the main data particularities and categories and also detects artifacts or outliers in the data sets that are beyond the capabilities of human observation because of the large data volumes and the nonvisual nature of the satellite images. The discovered and selected data sets are then **analyzed** in detail by extracting the particular characteristics of the observed scenes or objects. The results of the analysis contribute to updating existing models or building new models for the observations. **Visualization** of the model parameters or extracted information is a verification step that helps to cope with large, complex data volumes. Specific **evaluation** paradigms are needed to build trust in the obtained results before they are used to make **predictions**. The process is iterative: when new data are acquired, they are analyzed in turn.
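As one illustration of the automated outlier detection mentioned in the data mining step, the sketch below flags unusual values in a list of per-patch descriptor values. It is an assumption made for illustration only and not the method used in the platforms discussed here: it applies the modified z-score based on the median absolute deviation (MAD), with the commonly used scaling constant 0.6745 and cutoff 3.5. The function name `mad_outliers` and the sample values are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half of the values equal the median: flag every
        # value that differs from it (a design choice of this sketch).
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical mean backscatter per patch; index 5 mimics an artifact.
patch_means = [10.1, 9.8, 10.0, 10.3, 9.9, 42.0, 10.2]
print(mad_outliers(patch_means))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme patch would otherwise inflate the spread estimate and hide itself.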
