**1. Introduction**

A typical shortcoming of current image analysis tools is their lack of content understanding, which becomes apparent in light of current developments in Earth observation and data analysis [1]. In this chapter, we therefore concentrate on artificial intelligence (AI) applications and our solution strategies in the field of remote sensing, i.e., the acquisition and semantic interpretation of instrument data from remote platforms such as aircraft or satellites observing, for instance, atmospheric phenomena on Earth for weather prediction, or icebergs drifting in Arctic waters and endangering maritime transport. In particular, we will describe the exploitation of imaging data acquired by Earth-observing satellites and their sensors.

These satellites may either circle the Earth (mostly on low polar Earth orbits) or be operated from stationary or slowly moving points high above our planet (on so-called geostationary or geosynchronous orbits). Typical examples are Earth-observing and meteorological satellites. All these instruments have been designed with dedicated goals that, as a rule, can only be fulfilled by systematic and interactive data processing and data interpretation on the ground. The processing and data analysis chains are therefore the main candidates where one can, and should, apply modern data science approaches (e.g., machine learning and artificial intelligence) in order to exploit the full information content of the sensor data.

In general, quite a number of different sensors are installed on satellites. These include passive instruments observing the backscattered solar illumination or thermal emissions from the Earth, as well as active imaging instruments (transmitting light pulses or radio signals toward the target area and receiving the returns). For ease of understanding, we will limit ourselves to optical sensors operating in the visible and infrared spectral ranges and to radar sensors applying synthetic-aperture radar (SAR) concepts [2, 3]. These instruments provide large-scale images with a typical spatial resolution of 1–40 m per pixel. The images can be acquired from spacecraft orbits that cover the Earth completely with well-defined repeat cycles.

After being transmitted to the ground, the image data have to undergo systematic processing steps. Typically, the processing schemes follow a stepwise approach in which, at every step, the image data are accompanied by the necessary descriptor data (metadata). The processing chains start with what we call level-0 data, consisting of reordered and annotated detector data; level-1 data provide calibrated sensor data, while level-2 data contain data in commonly known physical units, preferably on regular spatial or map grids. Then level-3 data are higher-level products such as thematic maps or time series results, obtained by merging or concatenating several individual images or by similar operations. Finally, users can apply additional interactive processing steps on their own or exploit available software/platform concepts [4].
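
As a minimal sketch of such a stepwise chain (not part of any operational ground segment; all function names, calibration values, and the composite operation are hypothetical), each level can be modeled as a function that carries the image array together with its metadata:

```python
# Illustrative level-0 -> level-3 processing chain; names and values are placeholders.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Product:
    level: str
    data: np.ndarray                                   # image samples
    metadata: dict = field(default_factory=dict)       # descriptor data accompanying every step

def level0_to_level1(p: Product, gain: float = 0.01) -> Product:
    """Calibrate raw, reordered detector counts into sensor units (illustrative only)."""
    return Product("L1", p.data * gain, {**p.metadata, "calibrated": True})

def level1_to_level2(p: Product) -> Product:
    """Resample calibrated data onto a regular map grid in physical units (identity stub here)."""
    return Product("L2", p.data, {**p.metadata, "grid": "regular", "units": "physical"})

def level2_to_level3(products: list) -> Product:
    """Merge several level-2 scenes into a higher-level product, e.g., a temporal mean."""
    stack = np.stack([p.data for p in products])
    return Product("L3", stack.mean(axis=0), {"type": "time_series_composite",
                                              "inputs": len(products)})

# Example run on synthetic detector data
raw = Product("L0", np.random.randint(0, 255, (4, 4)).astype(float), {"sensor": "demo"})
l2_scenes = [level1_to_level2(level0_to_level1(raw)) for _ in range(3)]
composite = level2_to_level3(l2_scenes)
print(composite.level, composite.data.shape)
```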

This principle of ordered value-adding requires well-established techniques for data management, batch processing and databases, local and distributed (cloud) processing, understanding of the information flow, experience with learning principles, knowledge extraction from image and library data, and discovery of image semantics. At present, typical data sources with easy access are the publicly available scientific image data provided by the European Copernicus mission with its Sentinel satellites [5, 6] as well as high-resolution remote sensing images [7, 8]. The European Sentinel satellites comprise, among others, a constellation of SAR imagers (Sentinel-1A/Sentinel-1B, providing typically large radar images with a ground sampling distance of 20 m and selectable horizontal and vertical polarizations) and a constellation of optical imagers (Sentinel-2A/Sentinel-2B, delivering typically large multispectral images with 13 different bands and a ground resolution, depending on the band, of 10–60 m). This space segment of the Copernicus mission is complemented by systematic level-1 and level-2 image data processing on the ground and by support environments that serve as comfortable platforms for further data handling and interpretation, covering all aspects of applied data science. These approaches pave the way for the deeper semantic data analysis and understanding typically required in Earth observation, e.g., for crop yield prediction or atmospheric research.
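
The acquisition parameters quoted above can be kept in a small configuration structure, for instance when a processing chain has to branch on sensor type. The sketch below simply restates the figures from the text and is not an official product specification; the listed polarization combinations are illustrative.

```python
# Illustrative sensor summary; values restated from the text, not an official specification.
SENSORS = {
    "Sentinel-1": {                                   # SAR constellation (Sentinel-1A/1B)
        "type": "SAR",
        "ground_sampling_distance_m": 20,
        "polarizations": ["HH", "VV", "HV", "VH"],    # selectable horizontal/vertical combinations
    },
    "Sentinel-2": {                                   # optical constellation (Sentinel-2A/2B)
        "type": "multispectral",
        "bands": 13,
        "ground_resolution_m": (10, 60),              # depends on the band
    },
}

def is_optical(sensor_name: str) -> bool:
    """Example of branching a workflow on sensor type."""
    return SENSORS[sensor_name]["type"] == "multispectral"

print(is_optical("Sentinel-2"))  # True
```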

The design of Earth observation (EO) missions as constellations of several satellites brings important advantages; however, some of the most popular EO missions do not follow this constellation approach. **Figure 1** shows typical TerraSAR-X and Copernicus Sentinel overpasses from different orbits, together with their target areas.

**Figure 1.**
*Satellite overpasses of Sentinel-1A/Sentinel-1B, Sentinel-2A/Sentinel-2B, and TerraSAR-X (on 23rd of August 2018, starting at 14:02 UT) [12].*

TerraSAR-X flies on a polar, Sun-synchronous, circular dawn-dusk orbit. The satellite shares its orbit plane with its twin satellite TanDEM-X (keeping a 97.44° orbital phasing difference) and has a repeat cycle of 11 days with 167 orbits per cycle. Due to its flexibility, TerraSAR-X can cover any point on Earth within a maximum of 4.5 days and 90% of the Earth's surface within 2 days [9].

The Sentinel-1 satellites fly on a near-polar, Sun-synchronous orbit, too. The constellation (comprising Sentinel-1A and Sentinel-1B) shares the same orbit plane with a 180° orbital phasing difference and has a repeat cycle of 6 days with 175 orbits per cycle. Sentinel-1 can cover the equator in 3 days, the Arctic in less than 1 day, and Europe, Canada, and shipping routes in 1–3 days [10].

Like the Sentinel-1 constellation, the Sentinel-2 constellation (comprising Sentinel-2A and Sentinel-2B) shares the same orbit with a separation of 180°. The repeat cycle is 5 days with 143 orbits per cycle. Sentinel-2 can cover the equator in 5 days under cloud-free conditions and in 2–3 days at mid-latitudes [11].

When selecting data for fusion, we have to constrain ourselves to data acquired as close as possible in time.
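
A simple way to enforce this constraint is to pair acquisitions whose time stamps lie within a chosen tolerance. The snippet below is only a sketch with made-up time stamps and an arbitrary 12-hour window, not a recommendation for any particular mission pair.

```python
# Sketch: pair Sentinel-1 and Sentinel-2 acquisitions that are close in time (hypothetical values).
from datetime import datetime, timedelta

s1_times = [datetime(2018, 8, 23, 14, 2), datetime(2018, 8, 29, 14, 5)]
s2_times = [datetime(2018, 8, 23, 10, 30), datetime(2018, 8, 28, 10, 32)]

MAX_OFFSET = timedelta(hours=12)   # arbitrary tolerance chosen for this illustration

pairs = [(t1, t2) for t1 in s1_times for t2 in s2_times
         if abs(t1 - t2) <= MAX_OFFSET]

for t1, t2 in pairs:
    print(f"fuse S1 {t1:%Y-%m-%d %H:%M} with S2 {t2:%Y-%m-%d %H:%M} "
          f"(offset {abs(t1 - t2)})")
```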

These data handling approaches are typical of recent advances in big data scenarios in distributed systems on the web (e.g., high data volumes and throughput rates, conventional and innovative data processing steps, additional necessary tools and environments, and greater user expectations). In our case, this affects the tasks of image processing (e.g., data fusion), image understanding, and comparison with physical models. This can also be seen when we look at the evolution of satellite data analysis: while early concepts started with data being transferred to algorithms, current systems often transfer data to archives, and future systems may support more and more distributed systems.

A typical example is the full functionality offered by machine learning tools, while the basic ideas of future data science aspects for Earth observation as seen by the European Space Agency can be found in [13]. In our case, we are interested in applying more theoretical data science, machine learning, and artificial intelligence (for instance, deep learning, powerful classification maps, and prediction results) together with interactive visualization on various information levels. These ideas will be dealt with below for three remote sensing scenarios as detailed in [14] (a short illustrative sketch follows the list):

• Urban monitoring (urban growth and sprawl, urban classification, and semantic indicators)

• Disaster monitoring (earthquakes, inundations, mud slides, etc.)

• Quantitative interpretation of forested areas
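
To make the notion of classification maps and prediction results concrete, the following toy example trains a small random-forest classifier on synthetic image-chip features for the three scenario-related classes. Both the features and the labels are invented for illustration and do not reproduce the methods of [14].

```python
# Toy illustration of scenario classification on synthetic image-chip features.
# Features, labels, and model choice are invented for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))           # 300 chips, 8 made-up features per chip
y = rng.integers(0, 3, size=300)        # 0 = urban, 1 = disaster-affected, 2 = forest

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[:200], y[:200])
print("held-out accuracy:", clf.score(X[200:], y[200:]))   # ~chance level on random data
```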



Here, traceable products yielding quantitative data about physical phenomena, change maps, and change predictions are among our primary goals. Of course, we have to consider the implementation effort as well as the attainable accuracy of our products. For each scenario dealt with below, the reader should try to understand what additional value machine learning, artificial intelligence, and the comprehensive use of data science concepts bring about.

The basic terms of machine learning, artificial intelligence, and data science shall be understood in the following sense:


When we look at remote sensing in more detail, we currently see many efforts to transform sensor data to physical quantities that can be exploited for quantitative analysis or modeling. If we accomplish this, we can combine measured data with physical models and find quantitative parameters for predictions.
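
As a small worked example of this idea, assume we already have a measured physical quantity per pixel (say, calibrated backscatter in dB) and a simple linear model linking it to a geophysical parameter. The least-squares fit below uses synthetic numbers and a hypothetical relationship, not a validated retrieval model.

```python
# Sketch: fit a simple (hypothetical) linear model y = a*x + b between a measured
# physical quantity x and a geophysical parameter y, then use it for prediction.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-15, -5, size=50)                   # e.g., backscatter in dB (synthetic)
y = 0.8 * x + 20 + rng.normal(scale=0.5, size=50)   # synthetic "ground truth" parameter

a, b = np.polyfit(x, y, deg=1)                      # least-squares estimate of the model
print(f"fitted model: y = {a:.2f}*x + {b:.2f}")
print("prediction for x = -10 dB:", a * -10 + b)
```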

In the following, we describe how we applied these concepts in a research project funded by the European Union [17]. The project's main objective is to allow the creation of added value from Copernicus data through the provisioning of modeling and analytics tools for the data collection, processing, storage, and access services provided by the Copernicus Data and Information Access Services (DIAS) [18], and through the creation of a data science workflow in which sub-images (image chips) are annotated, administered, and validated based on their assigned semantic labels [19].
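
One possible shape for such annotated image-chip records is sketched below. The field names and label values are placeholders chosen for illustration and do not describe the actual CANDELA or DIAS data model.

```python
# Hypothetical record for an annotated image chip; field names are placeholders,
# not the actual CANDELA/DIAS schema.
from dataclasses import dataclass

@dataclass
class ImageChip:
    chip_id: str
    source_product: str        # e.g., the parent Sentinel scene identifier
    pixel_window: tuple        # (row, col, height, width) within the parent image
    semantic_label: str        # assigned class name
    validated: bool = False    # set True once the annotation has been checked

chip = ImageChip("chip_000001",
                 "example-sentinel-2-scene-id",   # illustrative placeholder identifier
                 (1024, 2048, 256, 256),
                 "urban")
chip.validated = True
print(chip)
```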

The chapter is organized into seven main sections. Section 2 explains the CANDELA platform used for prototyping EO applications, while Section 3 describes the characteristics of the data set. Section 4 presents typical examples of what a user can obtain when combining the platform of Section 2 with the data set of Section 3. Section 5 illustrates the perspectives of EO data science workflows, Section 6 summarizes our conclusions, and Section 7 outlines future work. The chapter ends with acknowledgments and a list of references.
