**Abstract**

Machine learning (ML) is widely used to analyze data in many domains, including satellite image processing. In remote sensing data processing, ML tools are mainly applied to filtering, interpretation, and prediction. Filtering removes noise and performs transformations; it is a vital stage of data processing because it supports data validation. Interpretation is equally significant, since the object classification stage depends on the task to be solved. Prediction estimates precise values of underlying parameters or future events in the data. These capabilities can be applied successfully in a variety of areas. Urbanization is one such area, where appropriate data must be collected to understand the challenges facing society. Urbanization has become a very important problem because of city expansion. Each city is a complicated system: it consists of various interacting sub-systems and is affected by multiple factors, including population growth, transportation, and management policies. To understand the driving forces of urban structure change, satellite-based estimates are used to monitor these changes over the long term. A geographic information system (GIS) encompasses the methods related to the use of geospatial information. Moreover, the increasing application of ML techniques in various fields, including GIS, is undeniable. This chapter therefore reviews the application of ML techniques in GIS, with a focus on megacities, the identification of their features, and the solution of related problems.

**Keywords:** geographic information system, machine learning, urbanization, data processing, modeling

## **1. Introduction**

Today there is a growing need for the collection, processing, management, and efficient use of reliable spatial information. It is therefore important to be aware of relevant approaches, to share experiences, and to develop best practices. This growing demand stems from major developments in society, which are in turn magnified by rapid urbanization and the conditions of megacities.

Location, in the form of spatial data, is key to visualizing current conditions, predicting events, and enhancing service delivery. Information about location can integrate and strengthen the complex analysis of the distribution of locations, events, and services. This provides many opportunities for improving government services across governmental segments, interacting with customers, and optimizing processes. As cities grow larger, spatial information becomes a key tool for efficient urban service delivery, public safety, and overall resource management.

*DOI: http://dx.doi.org/10.5772/intechopen.94033*

*A Review of the Machine Learning in GIS for Megacities Application*

• Unsupervised ML algorithms are used when the training data is neither labeled nor classified. These algorithms examine how a system can infer a function that describes hidden patterns in unlabeled data. They do not produce a predefined correct output; instead, they explore the data and infer descriptions of its hidden structures. Examples of this type of learning: Apriori algorithm, K-means, EM.

• Semi-supervised ML algorithms fall between the two previously mentioned types, because they use both labeled and unlabeled data for training. Usually, a small portion of the data is labeled and a large portion is unlabeled. Systems that use these algorithms can achieve a high level of accuracy. Typically, semi-supervised learning is chosen when labeling the acquired data requires skilled and relevant resources (producing labeled data costs money and takes time), whereas acquiring unlabeled data generally does not require additional resources.

• Reinforcement learning algorithms are learning methods in which an agent interacts with its environment by taking actions and receiving rewards or punishments. Trial-and-error search and delayed reward are the most important features of these algorithms. They allow system agents to determine automatically the ideal behavior in a particular context in order to maximize performance. The simple reward feedback is known as the reinforcement signal. Examples of this type of learning: Q-learning, Markov Decision Process.

ML enables analysis of massive amounts of data. Although it generally provides faster and more accurate results for identifying profitable opportunities or dangerous risks, it may also require additional time and resources to train properly. ML requires formatted data, which is analyzed to build an ML model. In other words, it requires an appropriate data set that can be applied to a learning process.

ML can be used in cases where relying on human resources is not time- or cost-effective, or when many variables must be considered simultaneously. ML uses the prepared data to train an ML algorithm. An algorithm is a computerized procedure or recipe. When the algorithm is trained on the data, an ML model is generated. Once the data is prepared and the algorithm is trained, the ML model can make predictions about unseen data on its own.

*2.1.1 Choosing the most appropriate ML algorithm*

Selecting the right algorithm for the problem is necessary for applying ML successfully. The selection is largely influenced by the application and the available data.

A large number of ML algorithms are available. Choosing the optimal algorithm for a specific problem depends on characteristics such as speed, accuracy, training and prediction time, the amount of data required for training, the data type, and ease of implementation. For GIS applications, time is usually very important.

To avoid dependence on specific conditions, it is common to analyze the runtime of algorithms in an asymptotic sense. With *n* the number of training samples, *p* the number of features, *nt* the number of trees, *nsv* the number of support vectors, and *k* the number of clusters, the time complexity of some ML algorithms, which helps in choosing the correct algorithm for the problem, is given in **Table 1**.
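As a rough illustration of how asymptotic training cost can guide the choice of an ML algorithm, the Python sketch below ranks a few algorithms by an estimated operation count. The formulas are commonly quoted textbook approximations and are not taken from the chapter's Table 1. The function names, the default values of `nt` and `k`, and the `iterations` parameter are assumptions made for illustration only. Constant factors are ignored, so the numbers indicate relative scaling rather than actual runtimes.

```python
import math


def estimated_training_cost(algorithm, n, p, nt=100, k=8, iterations=10):
    """Rough training operation count; constant factors are ignored.

    n: training samples, p: features, nt: trees, k: clusters.
    `iterations` (number of k-means passes) is an illustrative assumption.
    """
    costs = {
        # Ordinary least squares via the normal equations: O(n p^2 + p^3)
        "linear_regression": n * p**2 + p**3,
        # Balanced decision tree: O(n p log n)
        "decision_tree": n * p * math.log2(n),
        # Random forest of nt balanced trees: O(nt n p log n)
        "random_forest": nt * n * p * math.log2(n),
        # Kernel SVM: lower end of the commonly quoted O(n^2 p)..O(n^3 p) range
        "kernel_svm": n**2 * p,
        # Lloyd's k-means: O(iterations n k p)
        "k_means": iterations * n * k * p,
    }
    return costs[algorithm]


def rank_by_cost(n, p, **kwargs):
    """Return algorithm names sorted from cheapest to most expensive."""
    names = ["linear_regression", "decision_tree", "random_forest",
             "kernel_svm", "k_means"]
    return sorted(
        names,
        key=lambda name: estimated_training_cost(name, n, p, **kwargs),
    )


if __name__ == "__main__":
    # Hypothetical data set: 10,000 samples with 20 features each.
    for name in rank_by_cost(n=10_000, p=20):
        cost = estimated_training_cost(name, 10_000, 20)
        print(f"{name:18s} ~{cost:.3g} operations")
```

Because the quadratic term in `n` dominates for the kernel SVM, this kind of estimate suggests why such models can become impractical for large raster or point data sets, where tree-based or clustering methods scale more gently. Measured runtimes still depend on the implementation and hardware.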

At the same time, artificial intelligence methods, especially ML techniques, have attracted the attention of scientists and officials in various fields as a way to analyze and manage the enormous volume of data produced at any given moment. ML is also one of the most exciting tools to have entered the materials science toolbox in recent years. Undoubtedly, GIS is one of the fields in which ML is being adopted.

In practice, a GIS allows users to understand the spatial dimensions of their work and to relate them to other information, such as population data. The data collected and stored by a GIS can be used for purposes ranging from transport, drought analysis, and agriculture to disease-outbreak analysis and land occupancy. A GIS also makes it possible to store a large volume of data safely and to access it quickly whenever needed. The goal of this chapter is therefore to review past work and research in this area, which can greatly help in understanding the current situation and capabilities and can serve as a step in planning future developments in the field of GIS.

The remainder of this chapter is organized as follows. Section 2 gives the main definitions. Section 3 presents an overview of ML applications in GIS and related work in this area. Section 4 introduces the evaluation metrics and datasets. Finally, Section 5 provides conclusions.
