**2. Fundamental principles**

To review ML applications in GIS, it is first necessary to become familiar with the basic concepts. The following are some fundamental principles and definitions.

#### **2.1 Machine learning**

ML is an application of artificial intelligence that gives systems the ability to learn automatically and improve their performance from experience without being explicitly programmed. ML focuses on the development of computer programs that can access data and use it to learn [1].

The process of learning begins with observations or data, such as examples, direct experience, or instruction. The data are searched for patterns so that better decisions can be made based on the provided examples. The primary goal is to allow machines to learn without human intervention or assistance and to adjust their actions accordingly.

ML algorithms are often categorized as supervised or unsupervised; however, this categorization is very general and cannot cover all of the available methods [2]:

• Supervised ML algorithms apply what has been learned in the past from labeled examples to predict future events from unseen data. Starting from the analysis of a training dataset (labeled examples), the learning algorithm predicts the output values. After sufficient training, the system can provide targets for each new input. In addition, the algorithm can compare its output with the correct output and use the errors to modify the model accordingly. Examples of these algorithms include Support Vector Machine (SVM), Decision Tree, Random Forest, k-nearest neighbors (KNN), and Regression.
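The error-driven loop described in the bullet above can be sketched in a few lines. This is only an illustration: a simple perceptron is used in place of the algorithms named in the list because it makes the "compare the output with the correct output and modify the model" step explicit. The feature names, the toy labels, and the function names `train_perceptron` and `predict` are all assumptions made for this example.

```python
def predict(model, x):
    """Return 1 if the weighted sum of the features exceeds zero, else 0."""
    weights, bias = model
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Fit a perceptron by correcting the model whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # Compare the model output with the correct (labeled) output.
            error = target - predict((weights, bias), x)
            # Use the error to modify the model; no change when error == 0.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical labeled examples: (near_road, near_built_up) -> becomes urban?
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
train_y = [0, 0, 0, 1]

model = train_perceptron(train_x, train_y)
predictions = [predict(model, x) for x in train_x]
```

After training, `predictions` can be compared with `train_y` to check that the model reproduces the labels of this toy dataset.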

### *A Review of the Machine Learning in GIS for Megacities Application DOI: http://dx.doi.org/10.5772/intechopen.94033*

*Geographic Information Systems in Geospatial Intelligence*

customers and optimizing processes. As cities get larger, spatial information becomes a key tool in efficient urban service delivery, public safety, and overall resource management.

On the other hand, today, artificial intelligence methods, especially ML techniques, have come to the attention of scientists and officials in various fields as a way to analyze and manage the enormous volume of data that is produced at any given moment, and one of the most exciting tools that have entered the material science toolbox in recent years is ML. Undoubtedly, one of these fields is GIS.

In practice, a GIS allows users to understand the spatial dimensions of their work and to relate it to other information, such as population data. The data collected and stored by a GIS can be used for different purposes, including transport, drought analysis, agriculture, disease-outbreak analysis, and land occupancy. At the same time, a GIS makes it possible to store a large volume of data safely and to access it quickly whenever needed. The goal of this chapter is therefore to review past work and research in this area, because such a review can help greatly in understanding the current situation and capabilities; it is also a step toward planning future developments in the field of GIS.

The remainder of this chapter is organized as follows. Section 2 presents the main definitions. Section 3 presents an overview of ML applications in GIS and related work in this area. Section 4 introduces the evaluation metrics and datasets. Finally, Section 5 provides conclusions.


ML enables the analysis of massive amounts of data. Although it generally provides faster, more accurate results for identifying profitable opportunities or dangerous risks, it may also require additional time and resources to train properly. ML requires formatted data, which is analyzed to build an ML model. In other words, it requires an appropriate set of data that can be applied to a learning process.

ML can be used in cases where using human resources is not time- or cost-effective, or when many variables must be considered simultaneously. ML uses the prepared data to train an ML algorithm. An algorithm is a computerized procedure or recipe. When the algorithm is trained on the data, an ML model is generated. Once the data are prepared and the algorithm is trained, the ML model can make predictions about unseen data on its own.
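The sequence of prepared data, training, model, and prediction can be illustrated with a minimal k-nearest neighbors (KNN) sketch in plain Python. The coordinates, the "urban" and "rural" labels, and the function names `train_knn` and `predict_knn` are hypothetical and serve only to make the steps concrete.

```python
import math
from collections import Counter


def train_knn(features, labels):
    """'Train' a KNN model; KNN is a lazy learner, so this only stores the data."""
    return list(zip(features, labels))


def predict_knn(model, point, k=3):
    """Predict the label of an unseen point by majority vote of its k nearest neighbors."""
    nearest = sorted(model, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Prepared (formatted) data: hypothetical map coordinates with land-use labels.
features = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
labels = ["urban", "urban", "urban", "urban", "rural", "rural", "rural", "rural"]

model = train_knn(features, labels)           # training produces the model
label_a = predict_knn(model, (1.5, 1.5))      # prediction on unseen data
label_b = predict_knn(model, (8.5, 8.2))
```

Because KNN stores the training data instead of fitting parameters, its training step is trivial and the work happens at prediction time.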

Selecting the right algorithm for the problem is necessary for applying ML successfully. The selection is largely influenced by the application and the data available.

#### *2.1.1 Choosing the most appropriate ML algorithm*

There are a large number of ML algorithms available. Choosing the optimal algorithm for a specific problem depends on its features, such as speed, accuracy, training and predicting time, the amount of data required for training, the data type, and ease of implementation. For GIS applications, time is usually very important.
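Several of these criteria can be measured directly. The sketch below, written under stated assumptions, records the training time, predicting time, and accuracy of two deliberately simple candidate models on synthetic points. The classes `MajorityClass` and `NearestNeighbor`, the helper `profile`, and the "east"/"west" labels are invented for illustration and do not come from the chapter.

```python
import math
import random
import time
from collections import Counter


class MajorityClass:
    """Baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestNeighbor:
    """1-nearest-neighbor classifier using Euclidean distance."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        result = []
        for query in X:
            best = min(range(len(self.X)), key=lambda i: math.dist(self.X[i], query))
            result.append(self.y[best])
        return result


def profile(model, X_train, y_train, X_test, y_test):
    """Measure training time, predicting time, and accuracy for one model."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    trained = time.perf_counter()
    predictions = model.predict(X_test)
    finished = time.perf_counter()
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return {"train_s": trained - start, "predict_s": finished - trained, "accuracy": accuracy}


def make_points(rng, count):
    """Generate synthetic 2-D points labeled by which half of the map they fall in."""
    X = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(count)]
    y = ["east" if x > 5 else "west" for x, _ in X]
    return X, y


rng = random.Random(0)
X_train, y_train = make_points(rng, 200)
X_test, y_test = make_points(rng, 50)

report = {
    type(model).__name__: profile(model, X_train, y_train, X_test, y_test)
    for model in (MajorityClass(), NearestNeighbor())
}
```

The resulting `report` dictionary holds one entry per candidate, so the trade-off between time and accuracy can be inspected side by side. Measured times depend on the machine and should be read as relative indications only.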

To avoid dependence on specific conditions, it is common to analyze the runtime of algorithms in an asymptotic sense. Let $n$ be the number of training samples, $p$ the number of features, $n_t$ the number of trees, $n_{sv}$ the number of support vectors, and $k$ the number of clusters. The time complexities of some ML algorithms, which help in choosing the right algorithm for the problem, are listed in **Table 1**.
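To make the notation concrete, the following sketch evaluates the learning and predicting costs reported in Table 1 for a subset of the algorithms, using hypothetical problem sizes. The function names `learning_cost` and `predicting_cost` are assumptions of this example. Asymptotic expressions drop constant factors, so the numbers are useful only for comparing growth rates, not for estimating actual run times.

```python
def learning_cost(algorithm, n, p, n_t=1):
    """Dominant-term learning cost from Table 1 (constant factors ignored)."""
    costs = {
        "regression": p**2 * n + p**3,
        "decision_tree": n**2 * p,
        "random_forest": n**2 * p * n_t,
        "naive_bayes": n * p,
        "svm": n**2 * p + n**3,
    }
    return costs[algorithm]


def predicting_cost(algorithm, n, p, n_t=1, n_sv=1):
    """Dominant-term predicting cost from Table 1 (constant factors ignored)."""
    costs = {
        "regression": p,
        "decision_tree": p,
        "random_forest": p * n_t,
        "naive_bayes": p,
        "svm": p * n_sv,
        "knn": n * p,
    }
    return costs[algorithm]


# Hypothetical problem size: 1000 training samples with 10 features each.
n, p = 1000, 10
learning = {
    name: learning_cost(name, n, p, n_t=100)
    for name in ("regression", "decision_tree", "random_forest", "naive_bayes", "svm")
}
# Rank the candidates from cheapest to most expensive to train.
ranking = sorted(learning, key=learning.get)
```

For these sizes the ranking places Naïve Bayes first and SVM last in learning cost, whereas at prediction time KNN is the expensive one because its cost grows with the number of stored training samples.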

| Algorithm | Learning | Predicting |
| --- | --- | --- |
| Regression | $O(p^2 n + p^3)$ | $O(p)$ |
| Decision Tree | $O(n^2 p)$ | $O(p)$ |
| Random Forest | $O(n^2 p \, n_t)$ | $O(p \, n_t)$ |
| Naïve Bayes | $O(np)$ | $O(p)$ |
| SVM | $O(n^2 p + n^3)$ | $O(p \, n_{sv})$ |
| KNN | - | $O(np)$ |
| K-means | $O(n^{pk+1})$ | $O(k)$ |

#### **Table 1.**

*Time complexity of some ML algorithms.*

Where:

• Time for Learning is the time associated with training on the dataset. It varies with the size of the data and the algorithm used.

• Time for Predicting is the time associated with testing on the dataset or predicting unseen data. It varies with the size of the data and the algorithm used.

Most of the time, about 80 percent of the dataset is used for training and the remaining part is used for tuning and testing. In addition, because the training phase can usually be performed offline, the predicting time is more important for developers.

Generally, the points above can be used to shortlist a few algorithms, but it is hard to know at the outset which algorithm will work best. It is usually desirable to work iteratively: identify the ML algorithms that look like promising approaches, feed the data into them, run them all either in parallel or serially, and finally evaluate their performance to select the best one(s).

#### **2.2 Megacity**

A megacity is defined by the United Nations (UN) as a city with a population of 10 million or more. Currently, there are 38 megacities in the world (**Figure 1**). UN statistics indicate that the city with the largest population worldwide is Tokyo, with 38.8 million people. Recently, the UN has predicted that the number of megacities will rise to 41 by the year 2030.

**Figure 1.** *The 38 megacities in 2019 [3].*

The urbanization process poses enormous challenges for governments, social and environmental planners, engineers, architects and the residents of the megacities. Unsurprisingly, the growing population of cities creates demand in areas such as housing and services. Environmental destruction and poverty are two other concerns that city administrations have to address, especially because poor people do not have the financial resources needed to tackle these problems.

Megacities affect a variety of living conditions for citizens. Although stress, traffic jams, poor air quality and increasing health risks make life more difficult in megacities, most people still choose to live there. Therefore, more accurate governmental programs are needed to help improve living conditions for metropolitan inhabitants.

As more cities are becoming megacities and existing megacities are growing, policymakers and urban planners are grappling with the questions of how to make growth at this scale sustainable, and how to tackle the escalating social, economic and environmental problems evident in the world's megacities. One of the most popular solutions is ML.

**3. Application of machine learning in GIS**

Urban dispersal and expansion have become an important issue for municipalities, environmental scientists and urban planners, and the issue is especially vital in megacities. Currently, more than 50% of the world's population lives in urban areas, and this share is predicted to grow to over 65% by 2050, according to a United Nations report. For example, the combined population of the 500,000+ urban areas of Australia and New Zealand equals that of Moscow or Bangkok, and is only slightly larger than that of Los Angeles (16.4 million). It is known that developing countries have already begun a rapid urbanization [4]. The fact that the global population has increased rapidly since the industrial revolution of the 18th century highlights the problems of urban planning and urbanization, because the population gathers in certain centers [5]. This unnatural pace of urbanization has created significant social and environmental challenges for decision-makers [6]. In addition, modeling and simulation are effective tools for discovering urban development mechanisms and for supporting planning in growth management. Therefore, monitoring and modeling the urban sprawl of cities is a key requirement for taking precautions [7, 8].

As illustrated in **Figure 2**, Asia remains dominant in terms of megacities, with nearly 58 percent of the population in larger metropolitan areas. This is approximately five times as many residents of larger urban areas as in North America or Africa. Besides, Asia has more than five times as many residents of larger urban areas as Europe and eight times as many as South America [3].
