#### *A Review of the Machine Learning in GIS for Megacities Application DOI: http://dx.doi.org/10.5772/intechopen.94033*

*Geographic Information Systems in Geospatial Intelligence*

Several studies have used GPS data for classification purposes. However, GPS sensors have some limitations: in shielded areas such as tunnels, GPS information is not available, and GPS signals may be lost, especially in high-density locations, which results in erroneous position information. In addition, the GPS sensor consumes a lot of power, so users sometimes turn it off to save battery [79, 81]. Some research focuses on developing detection models using ML techniques and data obtained from smartphone sensors such as the gyroscope, accelerometer and rotation vector, without GPS data [82]. This approach has the advantage that, by considering multiple sensors, the transportation modes can be identified even without GPS.

**4. Evaluation methods**

For a given area of research, there are usually standard methods for evaluating a system that uses an ML algorithm. In addition, there should be standard datasets that are prepared for the learning process (training, tuning and testing). The following subsections therefore introduce the evaluation metrics and datasets used in GIS for urbanization.

### **4.1 Metrics**

There is no single meaning for the word "quality", because quality is difficult to define as an absolute concept. Data quality within software systems relates to the benefits that an organization can achieve, and it depends on various aspects. Thus, to measure data quality accurately, a single feature has to be chosen against which the contribution of the other attributes to overall data quality is considered. The following aspects can be used to describe data quality (**Table 2**) [83].

Note that a high score in any one of these dimensions does not by itself mean that high-quality data has been achieved. For example, timeliness may only matter in combination with correctness (correct user information may be available, but if it is not kept up to date, it is useless). Sometimes these features complement each other [83].

| Dimension | Definition |
| --- | --- |
| Relevance | The importance of each piece of information stored in the database. |
| Reliability | The sources of data are reliable. |
| Correctness | The real-world situation is represented by each set of stored data. |
| Timeliness | The data has been updated on time and with adequate frequency. |
| Precision | The accuracy of the stored data is enough to characterize it. |
| Unambiguous | Each piece of data carries a unique meaning. |
| Accuracy | The level of data that can be accurately represented. |
| Objectivity | Data is objective: it does not need people to judge, interpret, or evaluate it. |
| Security | Access is secure and limited. |
| Completeness | The absence of essential data: how much of the available data is missing. |

#### **Table 2.** *Patterns of data quality dimensions.*

The goals of data quality metrics are multi-dimensional. Indeed, they can set information quality objectives for data creators and managers to achieve, set standards for data to be produced, acquired and curated, and introduce measurement methods for quality judgment.

These metrics include rules that set the thresholds for meeting appropriate professional expectations and that govern how data quality aspects and levels are measured. To configure and organize the rules, a basic structure is needed that describes how data quality expectations are transformed into a set of applicable claims and that prevents unprofessional conduct [84].

Defining the dimensions of data quality metrics serves several purposes. Most of the time, the dimensions are classified according to the accepted standards of scholarly activity within an academic discipline, as well as those of related disciplines that use the data. Scientists have developed several sets of data quality dimensions [85].

The dimension categories differ according to the academic field(s) in which the data are regulated and to researchers' understanding and preferences. Not only are the dimensions categorized differently among scholars, but their definitions also vary, mostly with the type of data. In practice, variations exist: integrity may be described differently depending on the measurement strategy, and accuracy may be calculated at different levels of detail [85].
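As a rough illustration of how threshold rules on quality dimensions might be expressed, the sketch below scores a small set of records on two of the dimensions discussed in this section, completeness and timeliness, and checks each score against a threshold. The records, field names, threshold values and the `evaluate` helper are all hypothetical and are not taken from [84] or [85].

```python
from datetime import date

# Hypothetical records; None marks a missing value.
RECORDS = [
    {"name": "Station A", "population": 1200, "updated": date(2020, 5, 1)},
    {"name": "Station B", "population": None, "updated": date(2019, 1, 15)},
    {"name": "Station C", "population": 800, "updated": date(2020, 6, 20)},
    {"name": None, "population": 950, "updated": date(2017, 3, 9)},
]


def completeness(rows, fields):
    """Share of required field values that are present (not None)."""
    total = len(rows) * len(fields)
    present = sum(1 for row in rows for f in fields if row.get(f) is not None)
    return present / total


def timeliness(rows, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for row in rows if (as_of - row["updated"]).days <= max_age_days)
    return fresh / len(rows)


def evaluate(scores, thresholds):
    """Map each dimension to (score, meets_threshold)."""
    return {name: (score, score >= thresholds[name]) for name, score in scores.items()}


scores = {
    "completeness": completeness(RECORDS, ["name", "population"]),
    "timeliness": timeliness(RECORDS, as_of=date(2020, 12, 31), max_age_days=730),
}
# Assumed thresholds, chosen only for illustration.
report = evaluate(scores, thresholds={"completeness": 0.9, "timeliness": 0.7})

for name, (score, ok) in report.items():
    print(f"{name}: {score:.2f} ({'meets' if ok else 'below'} threshold)")
```

Keeping each dimension as a separate function makes it easy to see that a good score on one dimension says nothing about the others, which is the point made about Table 2.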

Landslide susceptibility mapping (LSM) is a primary step in implementing immediate disaster management planning and risk mitigation measures. All susceptibility models must be verified for their prediction accuracy. An unverified prediction model and its susceptibility maps are meaningless and have no scientific significance. The issue of LSM validation has been tackled by many studies [86].

Several LSM approaches have been developed and described in numerous papers. These approaches are mainly divided into three groups: heuristic, deterministic and statistical techniques.

The heuristic techniques rely on expert knowledge to group landslide-prone areas into several ranks, from high to low classes. They are often used for susceptibility mapping over large areas. Deterministic techniques, by contrast, rely on numerical modeling of the physical mechanisms that control slope failure. However, they are not suitable for large-scale mapping, because they need an impractically large array of data, namely rock mechanical properties, wetness and soil saturation, and soil depth. Statistical and probabilistic techniques, including bivariate and multivariate statistical methods and the certainty factor, as well as knowledge-based techniques such as ANNs and fuzzy logic approaches, are promising methods for predicting landslides [87].

In most cases, the models are tested with an independent set of data that was not used for training the model. In [88], the authors reported three approaches for obtaining an independent landslide sample for validation purposes [87]. The approach that validates against landslides from a separate period is the most adequate test of the validity of the prediction model; however, it is the hardest to apply, as it requires knowledge of the temporal distribution of landslides over an adequately long time span.
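A temporal hold-out of the kind described above can be sketched as follows. The inventory, the grid-cell identifiers, the susceptibility classes and the cutoff date are invented for illustration, and the simple hit rate used here is only one possible validation score, not the measure used in [87] or [88].

```python
from datetime import date

# Hypothetical inventory: each landslide has a grid-cell id and an occurrence date.
INVENTORY = [
    {"cell": "A1", "date": date(2005, 3, 2)},
    {"cell": "B2", "date": date(2008, 7, 19)},
    {"cell": "C3", "date": date(2012, 11, 5)},
    {"cell": "A2", "date": date(2015, 4, 23)},
    {"cell": "D4", "date": date(2016, 9, 30)},
]

# Hypothetical susceptibility classes from a model fitted on the older events only.
SUSCEPTIBILITY = {"A1": "high", "A2": "high", "B2": "high", "C3": "moderate", "D4": "low"}


def temporal_split(inventory, cutoff):
    """Events before the cutoff are used for analysis, later ones for validation."""
    training = [e for e in inventory if e["date"] < cutoff]
    validation = [e for e in inventory if e["date"] >= cutoff]
    return training, validation


def hit_rate(events, susceptibility, target_class="high"):
    """Share of validation landslides that fall in cells of the target class."""
    hits = sum(1 for e in events if susceptibility.get(e["cell"]) == target_class)
    return hits / len(events)


training, validation = temporal_split(INVENTORY, cutoff=date(2010, 1, 1))
rate = hit_rate(validation, SUSCEPTIBILITY)
print(f"{len(training)} training events, {len(validation)} validation events")
print(f"share of later landslides in 'high' cells: {rate:.2f}")
```

The split only works if every inventory entry carries a reliable date, which is exactly the data requirement that makes this approach hard to apply in practice.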

As an example, image classification is not valid without an evaluation of its accuracy. Errors can stem from the classification itself, image recording, inappropriate training data and so on; in accuracy evaluation, however, it is assumed that all differences between the classification results and the reference data come from classification errors.

The confusion matrix is one of the most common methods for evaluating classification accuracy. This matrix contains a category-by-category comparison between known ground-truth data and the classification results for the same category.

The overall accuracy of the classification process is measured in percent and is the number of correctly classified pixels divided by the total number of pixels. The Kappa coefficient is a measure of overall statistical agreement. It measures the overall agreement of the classification results, excluding the agreement that arises by chance rather than by design [89].
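The sketch below builds a confusion matrix from reference and predicted labels and derives overall accuracy and the Kappa coefficient from it, using the usual definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row and column totals. The class names and pixel counts are made up for illustration.

```python
def confusion_matrix(reference, predicted, classes):
    """Rows are reference (ground-truth) classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """Correctly classified pixels (the diagonal) divided by all pixels."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_observed = overall_accuracy(matrix)
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[r][i] for r in range(n)) for i in range(n)]
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical pixel labels: 50 reference "urban" pixels and 50 "non-urban" pixels.
reference = ["urban"] * 50 + ["non-urban"] * 50
predicted = ["urban"] * 40 + ["non-urban"] * 10 + ["urban"] * 5 + ["non-urban"] * 45

cm = confusion_matrix(reference, predicted, classes=["urban", "non-urban"])
print(cm)
print(f"overall accuracy: {overall_accuracy(cm):.2%}")
print(f"kappa: {kappa(cm):.2f}")
```

In this example the overall accuracy is 85%, but half of that agreement would be expected by chance given the class totals, which is why the Kappa value is noticeably lower.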

#### **4.2 Data**

From the very first satellite, launched in 1972, to Landsat 8, launched in 2013, Landsat satellite data have been recognized as a source of objective and reliable information. These missions provide high-quality worldwide multispectral data and have been used successfully in countless scientific applications [90].

The Landsat archive has provided multispectral data over the Earth for about 40 years. This makes Landsat data an attractive information source for change detection studies, especially for identifying indications of land use and land cover change.
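One common way to use two classified scenes for change detection is post-classification comparison: the class of each pixel at the earlier date is compared with its class at the later date, and the from-to transitions are counted. The sketch below shows this on two tiny, invented class maps; it assumes the maps are already co-registered and classified, and it is not the specific procedure of any study cited here.

```python
from collections import Counter

# Hypothetical 3x3 land cover maps of the same area at two dates.
BEFORE = [
    ["veg", "veg", "urban"],
    ["veg", "water", "urban"],
    ["veg", "veg", "veg"],
]
AFTER = [
    ["veg", "urban", "urban"],
    ["urban", "water", "urban"],
    ["veg", "veg", "urban"],
]


def change_matrix(before, after):
    """Count pixels for every (class before, class after) pair."""
    counts = Counter()
    for row_before, row_after in zip(before, after):
        for cls_before, cls_after in zip(row_before, row_after):
            counts[(cls_before, cls_after)] += 1
    return counts


def changed_fraction(counts):
    """Share of pixels whose class differs between the two dates."""
    total = sum(counts.values())
    changed = sum(n for (b, a), n in counts.items() if b != a)
    return changed / total


transitions = change_matrix(BEFORE, AFTER)
for (cls_before, cls_after), n in sorted(transitions.items()):
    print(f"{cls_before} -> {cls_after}: {n} pixel(s)")
print(f"changed: {changed_fraction(transitions):.1%}")
```

Because errors in either classification propagate into the transition counts, the accuracy of each input map should be assessed before the changes are interpreted.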

The world population exceeded 7 billion by the time the latest Landsat, Landsat 8, was launched. By supplying valuable information about changes to Earth's land surface for more than 40 years, the Landsat program has given decision makers a reliable source for managing Earth's resources for the planet's burgeoning population, with integral information about the world's food, water and forests and about how land resources are being used [90].

Imagery from these satellites is distributed for free and was obtained from the USGS website: http://earthexplorer.usgs.gov/.

Landsat 5 carried the Multi-Spectral Scanner (MSS) and Thematic Mapper (TM) sensors. The TM sensor has 6 spectral bands with a resolution of 30 m and 1 thermal infrared band with a resolution of 120 m (**Table 3**) [91].

| Band | Spectral band | Resolution |
| --- | --- | --- |
| 1 | 0.45–0.52 μm | 30 m |
| 2 | 0.52–0.60 μm | 30 m |
| 3 | 0.63–0.69 μm | 30 m |
| 4 | 0.76–0.90 μm | 30 m |
| 5 | 1.55–1.75 μm | 30 m |
| 6 | 10.4–12.5 μm | 120 m |
| 7 | 2.08–2.35 μm | 30 m |

#### **Table 3.** *Landsat 5 TM bands.*

Landsat 7 carries the Enhanced Thematic Mapper Plus (ETM+) sensor, with 6 multispectral bands at 30 m resolution, 1 thermal band at 60 m resolution and 1 panchromatic band at 15 m resolution (**Table 4**). For both Landsat 5 and Landsat 7, bands 1–5 and 7 were used for LULC classification, and band 6 was used for LST extraction [91].

| Band | Spectral band | Resolution |
| --- | --- | --- |
| 1 | 0.45–0.515 μm | 30 m |
| 2 | 0.525–0.605 μm | 30 m |
| 3 | 0.63–0.69 μm | 30 m |
| 4 | 0.75–0.90 μm | 30 m |
| 5 | 1.55–1.75 μm | 30 m |
| 6 | 10.4–12.5 μm | 60 m |
| 7 | 2.09–2.35 μm | 30 m |

#### **Table 4.** *Landsat 7 ETM+ bands.*

High-resolution maps of settlements and urban footprints form the basis for an integrated evaluation of global settlement patterns. In the past decade, there has been rapid progress in the preparation of such maps: new satellite technology and improved data processing using ML have rapidly improved their accuracy and resolution. The MODIS 500 urban land cover [92] until recently represented the state of the art in urban land cover datasets [93]. It is now outperformed by the Global Urban Footprint (GUF) dataset, which has higher resolution and accuracy than any other urban land cover dataset [94], even compared with the high-quality Global Human Settlement Layer (GHSL) [95, 96] or GlobeLand 30 [97]. The GUF assigns a binary urban footprint at a resolution as high as 0.4″ (approximately 12 m) at the equator and 0.6″ in the mid-latitudes, with global coverage. It is also freely available for scientific use. This high resolution constitutes a paradigm shift in studying urban extent for cities around the world.

The importance of satellite imagery for evaluating urbanization, by measuring land use and land cover change in cities and their surroundings, is undeniable. Remote sensing (RS) is a reliable data source that provides spatially consistent coverage of large areas with temporal frequency and high spatial detail. It is also useful for analyzing time-dependent phenomena such as urban expansion [98]. RS is therefore an accurate and effective data source for monitoring the expansion of metropolitan areas, especially where information on land use management is inconsistent or inappropriate.

The following datasets also provide information related to GIS for urbanization:

1. Global Map<sup>1</sup>: a set of digital maps that covers the entire world to express the status of the global environment accurately. It is developed through the cooperation of National Geospatial Information Authorities (NGIAs) around the world. The Global Map provides eight main map themes at a nominal ground resolution of 1 km for raster data and at a scale of 1:1,000,000 for vector data. These themes are:
	- Transportation
	- Boundary

	- Drainage
	- Elevation
	- Vegetation
	- Land Cover
	- Land Use
	- Population Centers

2. Gridded Population of the World (GPW)<sup>2</sup>: a dataset from NASA's Socioeconomic Data and Applications Center that includes raw population counts and population density for the past and present, together with future predictions. The purpose of GPW is to provide a spatially disaggregated population layer that is compatible with datasets from social, economic and Earth science disciplines, and with RS. The data are globally consistent and spatially explicit for research, decision-making and communication.

3. World Bank Geodata<sup>3</sup>: a wide range of World Bank datasets converted to KML format, including GNP, schooling and financial data.

4. Global Administrative Areas<sup>4</sup>: the administrative areas in this database are countries and lower-level subdivisions such as provinces and departments. The latest version is 3.6, released in 2018. It delimits 386,735 administrative areas, and scientists can download the spatial data by country.

5. Armed Conflict Location and Event Dataset<sup>5</sup>: includes all reported conflict events in 50 countries in the developing world, from 1997 to the present.

6. Global Rural-Urban Mapping Project (GRUMP)<sup>6</sup>: a dataset from NASA's Socioeconomic Data and Applications Center that includes information on rural and urban population balances.

7. Open Street Map (OSM)<sup>7</sup>: crowdsourced data for the whole world, containing many important features such as points of interest, buildings, roads and road names, ferry routes, etc.

8. Geohive<sup>8</sup>: an initiative made available by Ordnance Survey Ireland for easy access to public spatial data, including population and county statistics. It is not provided in GIS data formats, but it is easily convertible from CSV.

<sup>1</sup> https://nationalmap.gov/small\_scale/atlas-ftp-global-map.html?openChapters=chptrans#

<sup>2</sup> https://sedac.ciesin.columbia.edu/data/collection/gpw-v4

<sup>3</sup> https://sourceforge.net/projects/googleworldbank/

<sup>4</sup> https://gadm.org/

<sup>5</sup> https://www.acleddata.com/

<sup>6</sup> https://sedac.ciesin.columbia.edu/data/collection/grump-v1

<sup>7</sup> https://learnosm.org/en/osm-data/osm-in-qgis/

<sup>8</sup> https://geohive.ie/

#### *4.2.1 Urban landscapes*

The World Bank has explored the patterns, consequences and policy implications of the spatial development of cities in South and East Asia, drawing on the increasing availability of spatial data and developments in analytics. Data from Earth observation (EO) satellites can give valuable results for measuring urban growth over a wide range of spatial and temporal scales, especially when combined with data from other sources. The resulting digital urban maps are an accurate, up-to-date and cost-effective resource that helps governments understand the nature of urban development and make informed decisions. EO datasets allow for harmonized and standardized measurements, and they enable planners to make spatially and temporally consistent comparisons and global assessments. They are also particularly significant for monitoring and understanding the evolution of cities, for instance by letting authorities know when built-up areas spill across formal administrative boundaries. Such spillover shows the need to cooperate with adjoining administrative areas on issues like garbage collection or connective infrastructure [99].
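To make the idea of built-up areas spilling across administrative boundaries concrete, the sketch below overlays a binary built-up grid on a binary mask of the administrative area and reports the share of built-up cells that lie outside the boundary. Both grids are invented, and real analyses would work with georeferenced rasters and vector boundaries rather than small nested lists.

```python
# Hypothetical 3x4 grids: 1 marks a built-up cell / a cell inside the boundary.
BUILT_UP = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
INSIDE_BOUNDARY = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
]


def spillover_share(built_up, inside_boundary):
    """Share of built-up cells that fall outside the administrative boundary."""
    total = 0
    outside = 0
    for built_row, mask_row in zip(built_up, inside_boundary):
        for built, inside in zip(built_row, mask_row):
            if built:
                total += 1
                if not inside:
                    outside += 1
    return outside / total if total else 0.0


share = spillover_share(BUILT_UP, INSIDE_BOUNDARY)
print(f"built-up cells outside the boundary: {share:.1%}")
```

Computing the same share for maps from several dates would show whether the spillover is growing, which is the kind of signal that calls for cooperation between neighbouring administrations.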

The World Bank has created a database to analyze the speed, magnitude and spatial form of urbanization from EO data. These data help researchers examine the drivers and influences of urbanization and how the urban landscape has evolved into its current state. The database offers a basis for understanding the effects of policy change and identifying priorities for new initiatives. In particular, the focus is on exploring the institutional frameworks for urban management: mechanisms to coordinate service delivery across administrative jurisdictions, investment (for example in transport and other network infrastructure), and regulation such as zoning and pricing of services.

About twelve years ago, the World Bank launched the "Earth Observation for Development" initiative, which provides data for areas where data are commonly scarce and unreliable. Such information is useful for building project baselines against which progress can be gauged, high-priority issues identified and mitigation measures determined. The project focuses on areas such as metropolitan development and related fields, including disaster risk management, the environment, water and energy. The bank has also developed the Urban Management and Analysis (PUMA) platform to facilitate more collaboration between policymakers and other development stakeholders toward these purposes. With this tool, users with no GIS experience can access, analyze and share urban spatial data in an interactive and customizable way [100].

These activities resulted in more than 30 technical assistance projects carried out for urban planners and partners in the period 2008–2018. As a result, highly specialized big data mapping products and monitoring systems that leverage EO data for South Asian cities have been launched.

#### *4.2.2 Megacities*

In the South Asia Megacities Improvement Program, EO big data was used to analyze 20 years of urban expansion in the metropolitan areas of Delhi, Mumbai and Dhaka. These data make it possible to measure qualitative and quantitative aspects of the transformation, such as the distribution and density of urban sprawl, the growth rate of built-up areas and urban land use change. This information helps analysts trace how informal settlements grow outside the cities' boundaries and understand the drivers of land use [99]. The analysis revealed important insights into land cover and land use in the three cities (**Figure 4**). Furthermore, it showed the percentage of land taken by settlements and industrial build-up, agriculture, natural or semi-natural vegetation and forest [101].

#### **Figure 4.** *Sample visualizations from the South Asia geospatial analysis [101].*

Using these results, urban planners and development stakeholders can understand existing demands and plan for future needs. For instance, in Delhi, the maps illustrate that urban expansion accelerated with industrial development. This mostly happened between 2003 and 2010; however, a considerable increase in construction sites indicates that it will continue in the future, so it must be planned for [101].

With digitized spatial data, analysts can study the target at different administrative levels (metropolitan, city, district or sub-district) and in other, non-administrative units. These datasets make flexible aggregation possible. One example is showing the proportion of sprawl by district, its density, the drivers of urban change and class evolution within urban areas. Combined with environmental or socio-economic data, the data can provide information on the relation between population and urban growth, and can be used to measure indicators such as compactness, the ratio of green space to citizens, and the accessibility of these areas.

The results of applying EO big data can be crucial for coordinating public, private and household investment in infrastructure, productive capital and housing, respectively. Policymakers would thus be able to promote optimal spatial and transportation links between businesses, affordable housing and commercial units, health and education services and recreational areas. In addition, these insights can be applied to support rural-to-urban migrants and to ensure that rapid urbanization is inclusive. As EO big data methods spread across the world's megacities and are refined and adapted, they will provide valuable tools to policymakers and greater benefits for the citizens of the future.

#### *4.2.3 Residential cities*

EO big data approaches also contribute to driving sustainable urban development. The research mentioned above on the use of high-resolution satellite data for poverty mapping draws on emerging techniques that can show fast-changing urban areas in near real time. These methods can determine built-up area, the density of cars and buildings, and types of roofing and road. Using ML techniques and image processing algorithms, they can also calculate whether buildings are more rectangular or have more chaotic angles, which indicates a higher poverty level, and construct poverty indicators such as the ratio of paved roads in an area. Stakeholders can thus target their interventions exactly where they are most useful [101].

All in all, the analysis of EO big data can be an important tool for managing city development in low-income countries. It can measure and track urban expansion and highlight the drivers of economic growth. This results in a better understanding of the factors contributing to inefficiencies and inequality in urban areas, and in better-optimized policies. It can also create flexibility in urban environments, so that residents, businesses and systems can adapt to persistent stresses or shocks, and it can help provide residential cities that meet their residents' needs.

**5. Conclusions**

This book chapter briefly introduced ML and past research on the application of ML algorithms to the processing of daily satellite imagery. It demonstrated several aspects of detecting and classifying Earth features and merging them into local geographical and geodetic systems with further GIS development. The main purpose of the chapter is to provide existing resources so that researchers are aware of the up-to-date status of ML applications in GIS, in particular in studies of megacities.

The real potential of ML in GIS is not sufficiently developed yet. On the one hand, both fields intersect in analytical discussions. At the same time, most GIS applications that are desirable for ML implementation are driven by the conventional approach and standard tools of commercial GIS packages.

Merging GIS and ML offers a potential mechanism for reducing the cost of analyzing spatial information by decreasing the amount of time spent on data interpretation. This integration allows the interpretive outcome from a small area to be transferred to a larger, geographically similar area, without the extra time and expense of putting geographers in the field for a time sufficient to cover the geographical area.

ML can be considered both a science and an engineering discipline, depending on the goal. This technology is often seen as part of computing; however, it has links with various other areas, including philosophy, psychology and linguistics. Its techniques can provide benefits within GIS over traditional methods such as statistical analysis, especially if the data show some form of non-linearity. Thanks to these capabilities, ML/GIS technology can be applied most successfully to monitoring and observing the consequences of megacity development.

Most people are unaware that they use artificial intelligence in their daily life. Finding solutions to decision-making problems with models that let decision makers express their constraints and imprecise concepts, and that are used with large volumes of geographic data, is costly. This chapter is expected to open an opportunity to clearly understand the fundamental aspects of ML/GIS development as they relate to megacity studies.
