## **1. Introduction**

The rapid growth of big data analytics across all aspects of human activity, the surge in decision-making complexity caused by the current climate of uncertainty and unforeseen consequences, and the increasing pervasiveness of advanced information and communication technologies (ICT), such as mobile applications, the Internet of Things, and bots, have accelerated the integration of many complex ICT systems-of-systems (SoS) and social networks across a wide spectrum of application domains. These domains include, but are not limited to, telecommunications, satellite communications, medicine, military, education, agriculture, arts, and culture. The primary motivation for this book is to compile some of the latest research addressing recent advances and applications of data and decision sciences (DDS) across these domains. The book is a collective effort that uses a diverse set of studies and investigations to cover a wide spectrum of DDS applications. The goal is to offer insights into the use of DDS models for assisting data analysts and decision-makers.

The objective of this introductory chapter is three-fold, namely, to provide (i) an overview of data science and decision science, (ii) recent advances and DDS applications with an emphasis on machine learning and artificial intelligence (ML-AI), and (iii) an overview and understanding of recent DDS applications. The remainder of this chapter is organized as follows:


## **2. Overview of data science and decision science**

### **2.1 Data science**

Data science is a relatively new and emerging field of research for many mathematicians, statisticians, scientists, and engineers. It is derived from data mining and statistical analysis. The Cambridge Dictionary defines it as "the use of scientific methods to obtain useful information from computer data, especially large amounts of data" [1]. A more technical definition from Dictionary.com describes it as a field that "deals with advanced data analytics and modeling, using mathematics, statistics, programming, and machine learning to extract valuable, often predictive information from large data sets" [2]. In practical terms, IBM defines data science as a field that combines mathematics and statistics, specialized programming languages, advanced data analytics, artificial intelligence (AI), and machine learning (ML) with specific subject matter expertise to uncover actionable insights hidden in an organization's data. These insights can be used to guide decision-making and strategic planning [3]. A data science life cycle used by industry is presented on the home page of the School of Information at the University of California, Berkeley [4]. This life cycle includes five stages, namely, (i) Stage 1 – data capture stage: data acquisition, data entry, signal reception, and data extraction; (ii) Stage 2 – data maintenance stage: data warehousing, data cleansing, data staging, data processing, and data architecture; (iii) Stage 3 – data mining processing stage: data mining, clustering/classification, data modeling, and data summarization; (iv) Stage 4 – data analysis stage: exploratory/confirmatory analysis, predictive analysis, regression, text mining, and qualitative analysis; and (v) Stage 5 – communication stage: data reporting, data visualization, business intelligence, and decision making.
In the context of this book, Subsection 2.1 focuses on IBM's definition and on the data processing and analysis stages of the data science life cycle, including data mining, machine learning and artificial intelligence (ML-AI) using neural networks and deep learning, statistical learning, and Bayesian statistics.

#### *2.1.1 Data mining*

Big data analytics (BDA) is defined as the process of exploiting and extracting meaningful information from a large and complex collection of data<sup>1</sup>. Data mining is one of the key functions required in the BDA process. The BDA process is part of the data science<sup>1</sup> life cycle, which comprises five data processing stages. As pointed out earlier, Stage 3 is the data mining processing (DMP) stage, which discovers data patterns in a large collection of data [5–8]. **Figure 1** illustrates the DMP characteristics, including the types of data that can be mined and analyzed, the kinds of data mining patterns, data mining techniques, and applications.

<sup>1</sup> https://www.bmc.com/blogs/big-data-vs-analytics/.

*Introductory Chapter: Overview of Data and Decision Sciences – Recent Advances and Applications DOI: http://dx.doi.org/10.5772/intechopen.112546*

**Figure 1.** *Data mining processing (DMP) characteristics.*

As shown in **Figure 1**, DMP can be performed on various types of data, such as (i) data from databases, e.g., past and current banking data or experimental data collected from a complex satellite system; (ii) data from data warehouses, e.g., an Amazon data warehouse containing mass data from Amazon business transactions, collected from multiple sources and stored in a unified schema; (iii) actual daily real-time transaction data from banks, e.g., credit approvals, check approvals, and payment approvals; and (iv) other types of data, including but not limited to data collected from information technology (IT) systems, from health care and medical sciences, and from military defense systems, such as images and video data streams. Among the kinds of data mining patterns, a key component is the characterization and discrimination of the features of a target class of data objects against the features of objects from one or more contrasting classes. After characterization and discrimination processing, the data is analyzed for frequent patterns using association and statistical correlation analyses. An example of frequent pattern analysis is the study of computer consumers' behavior: how often they buy computers, which types of computers and software they buy, and the buyers' professions and income ranges.
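The frequent pattern analysis described above can be illustrated with a minimal sketch. The purchase records, item names, and the 0.4 support threshold below are hypothetical and chosen only for illustration; the sketch computes the support of item pairs and the confidence of a single association rule, which is one simple form of association analysis rather than a full data mining pipeline.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each set is one customer's basket.
transactions = [
    {"laptop", "antivirus", "office_suite"},
    {"laptop", "antivirus"},
    {"desktop", "office_suite"},
    {"laptop", "office_suite"},
    {"desktop", "antivirus", "office_suite"},
]


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs bought together often enough."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


pairs = frequent_pairs(transactions, min_support=0.4)
laptop_to_antivirus = confidence(transactions, "laptop", "antivirus")
```

Here `pairs` holds the item pairs that occur in at least 40% of the baskets, and `laptop_to_antivirus` estimates how often a laptop buyer also buys antivirus software.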

Within the context of this chapter, the following subsections discuss the ML-AI and statistical methods relevant to the data mining techniques shown in **Figure 1**.

#### *2.1.2 ML-AI, statistical learning, and Bayesian statistics*

ML-AI techniques are usually used for estimating and predicting data characteristics and the associated data trends. For example, ML-AI can be used to analyze big data to predict stock market trends [9], and the analysis of big consumer data can help suppliers forecast trends in customer behavior, markets, prices, and so on [10]. When the data content has several categorical variables, the prediction can be achieved through classification and pattern recognition. As an example, ML-AI using supervised learning and a support vector machine (SVM) can be used to (i) predict the impacts of signal distortions, caused by a non-ideal satellite operational environment, on the transmitted signal components, and (ii) classify the source of the signal distortions [11]. In ML-AI, past data is used to train the system. Newly accumulated data therefore represents a case of repeated "modeling," in which the new data is used to predict a future trend or to classify an object (e.g., a signal component) into a group (e.g., a source of signal distortion) by comparison with the old data. The majority of practical and useful ML-AI modeling techniques are stochastic or statistical in nature. Therefore, the term statistical learning is also used in the literature for ML-AI modeling. As pointed out in [12], Bayesian statistics is a way of practicing statistics in which the ML-AI modeling is built upon probability distributions, i.e., the modeling consists solely of calibrating and adjusting the probabilities. Thus, Bayesian statistics utilizes Bayes' theorem and facilitates the calculation of posterior distributions as follows:

$$p(\theta \mid \text{data}) \propto p(\text{data} \mid \theta)\, p(\theta) \tag{1}$$

where $p(\theta \mid \text{data})$ is the posterior probability distribution, $p(\text{data} \mid \theta)$ is the likelihood or the classification probability, and $p(\theta)$ is the prior probability.
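As a minimal numerical sketch of Eq. (1), the posterior can be computed on a discrete grid of candidate values of θ by multiplying the likelihood by the prior and normalizing. The coin-flip setting, the grid values, the uniform prior, and the observed counts below are assumptions made only for illustration; they are not taken from [12].

```python
def grid_posterior(thetas, prior, heads, tails):
    """Posterior over a discrete grid of theta values: likelihood times prior, normalized."""
    # Likelihood of the observed data for each candidate theta (independent Bernoulli trials).
    likelihood = [t ** heads * (1 - t) ** tails for t in thetas]
    # Unnormalized posterior, i.e., the right-hand side of Eq. (1).
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


# Hypothetical setup: theta is the probability of "heads", with a uniform prior.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.2] * 5
posterior = grid_posterior(thetas, prior, heads=7, tails=3)
most_probable = thetas[posterior.index(max(posterior))]
```

With a uniform prior, the posterior is driven entirely by the likelihood, so the most probable grid value is the one closest to the observed fraction of heads.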

Bayesian functional data analysis techniques include multiple curve-fitting (MCF), single neuronal analysis (SNA), and population-level analysis (PLA) [12]. MCF uses hierarchical modeling of firing intensity curves based on the Bayesian adaptive regression splines (BARS) approach. SNA is used for testing the equality of two or more curves. Finally, PLA is used for testing the equality of two groups of curves. As examples, MCF, SNA, and PLA have been used in health care and biology applications [13–15], respectively.

#### *2.1.3 ML-AI using neural network and deep learning*

ML-AI modeling uses neural networks (NN) and deep learning (DL) to model neuronal cells, their intricate functionality, and their networking for processing data (i.e., information) [12]. The term NN-DL or deep NN (DNN) is usually used to indicate a network that has more than two hidden layers, an input layer, and an output layer with multiple nodes, as shown in **Figure 2** [12, 16]. The variables $x_i$, $w_i^{(m)}$, and $y_j$ are the DNN's parameters, defined as the input node, the weight of the hidden layer node, and the output node, respectively. As pointed out in [16], a DNN model with more hidden layers requires longer simulation time and more training data storage.
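The layered structure described above, with input nodes, weighted hidden layers, and output nodes, can be sketched as a forward pass. The layer sizes, the random weight initialization, and the tanh activation below are assumptions chosen for illustration; they are not the configuration used in [12] or [16].

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a weight matrix (n_out rows, n_in columns) and a zero bias vector."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Propagate the input vector x through every layer and return the outputs y."""
    activation = x
    for index, (weights, biases) in enumerate(layers):
        # Weighted sum of the previous layer's activations for each node.
        pre = [sum(w * a for w, a in zip(row, activation)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # Hidden layers apply tanh; the output layer is left linear.
        activation = pre if is_output else [math.tanh(v) for v in pre]
    return activation


rng = random.Random(0)            # fixed seed so the sketch is reproducible
sizes = [4, 8, 8, 8, 2]           # 4 inputs, three hidden layers of 8 nodes, 2 outputs
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
x = [0.5, -1.0, 0.25, 2.0]
y = forward(x, layers)
```

Adding entries to `sizes` adds hidden layers, which increases both the number of weights to store and the work done per forward pass.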

The DNN modeling requires (i) characterizing the DNN's system parameters and the associated "loss" function in terms of the weight parameter $w_i^{(m)}$, and (ii) tuning these parameters using training data collected by the system architect under a controlled environment. **Figure 3** provides a high-level description of the key DNN tuning parameters, including layer size and the related mini-batch size for numerical approximation of the gradient, gradient threshold, and learning rate. **Figure 3(a)** illustrates the layer size; **Figure 3(b)** shows the exploding gradient that arises if the "terms" in the differential equation are greater than 1; and **Figure 3(c)** depicts the learning rate. **Figure 3(c)** shows that a large learning rate can be used for fast adaptation during the data acquisition phase and a small learning rate for slow adaptation during the tracking phase, after the loss function has converged. Note that a differential equation is usually used to characterize a neural network layer.
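The exploding gradient effect mentioned above can be illustrated with a simplified sketch. It assumes that the backpropagated gradient is scaled by the same multiplicative term at every layer, which is an idealization introduced here for illustration; the values 1.5, 0.5, and 30 layers are hypothetical.

```python
def backpropagated_scale(per_layer_term, n_layers):
    """Magnitude of a gradient after being multiplied by the same term at each layer."""
    scale = 1.0
    for _ in range(n_layers):
        scale *= per_layer_term
    return scale


exploding = backpropagated_scale(1.5, 30)   # terms greater than 1: grows exponentially
vanishing = backpropagated_scale(0.5, 30)   # terms smaller than 1: shrinks toward zero
stable = backpropagated_scale(1.0, 30)      # terms equal to 1: magnitude is preserved
```

Under this idealization, the gradient magnitude grows or shrinks exponentially with depth, which is why a gradient threshold becomes more important as hidden layers are added.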

**Figure 3.** *Deep neural network (DNN) and associated tuning parameters.*

In practice, there are usually four DNN hyper-parameters to tune, namely, layer size, mini-batch size, gradient threshold, and learning rate. The layer size is tuned to keep the amount of training data manageable, meaning that the layer size should be selected to optimize the required training data storage. The mini-batch size is tuned to obtain the best numerical approximation of the gradient. The gradient threshold is tuned to obtain the gradient clipping that best avoids an "exploding gradient" and the step size that best achieves a timely gradient descent or ascent step. Finally, the learning rate is tuned to achieve the best reward/stopping criteria and learning rate criteria for better convergence. Ref. [16] describes the tuning process for an application of DNN in the design and development of a future global navigation satellite system (GNSS).
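A minimal sketch of how three of these hyper-parameters interact is given below. For brevity it fits a one-weight linear model rather than a DNN, so layer size is not represented; the synthetic data, the hyper-parameter values, and the rule for switching from the large to the small learning rate are all assumptions made for illustration and are not the tuning procedure of [16].

```python
import random


def clip(gradient, threshold):
    """Rescale the gradient vector so that its L2 norm does not exceed threshold."""
    norm = sum(g * g for g in gradient) ** 0.5
    if norm > threshold:
        return [g * threshold / norm for g in gradient]
    return list(gradient)


def train(data, batch_size, threshold, fast_rate, slow_rate, switch_loss, epochs, seed=0):
    """Fit y = w*x + b by mini-batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    data = list(data)                     # copy so the caller's list is not shuffled
    w, b = 0.0, 0.0
    rate = fast_rate                      # large rate during the acquisition phase
    loss = float("inf")
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Mini-batch estimate of the gradient of the mean squared error.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Gradient clipping guards against overly large update steps.
            grad_w, grad_b = clip([grad_w, grad_b], threshold)
            w -= rate * grad_w
            b -= rate * grad_b
        loss = sum((w * x + b - y) ** 2 for x, y in data) / len(data)
        if loss < switch_loss:
            rate = slow_rate              # small rate once the loss has nearly converged
    return w, b, loss


# Hypothetical noise-free training data generated from y = 3x + 1.
samples = [(i / 20 - 1, 3 * (i / 20 - 1) + 1) for i in range(41)]
w, b, final_loss = train(samples, batch_size=8, threshold=1.0,
                         fast_rate=0.1, slow_rate=0.01, switch_loss=1e-3, epochs=200)
```

A smaller `batch_size` gives a noisier gradient estimate per step, a smaller `threshold` limits the step length more aggressively, and the two rates mirror the fast acquisition and slow tracking phases described for Figure 3(c).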

#### *2.1.4 AI and expert systems*

The earliest example of a rule-based expert system was DENDRAL, a system for identifying chemical structures developed in the 1960s at Stanford University [17]. DENDRAL was the first system to be called an AI and expert system because it automated the decision-making process and problem-solving behavior used in organic chemistry to identify unknown organic molecules. Since then, many systems have been derived from DENDRAL, including MYCIN, REX, MOLGEN, PROSPECTOR, XCON, and STEAMER. For example, the MYCIN system was developed in the 1970s to help physicians diagnose meningitis and bacterial infections [18]. As another example, the REX system was developed in the 1980s at Bell Labs and was written in the LISP language. REX advanced AI and expert systems by incorporating rule-based guidance for simple linear regression. The name REX was derived from Regression EXpert; it was an interface between humans (or users) and statistical software, together with an interactive modeling software (IMS). The IMS was created to allow the user to interact with the statistical software more effectively [19].

Since then, AI and expert systems have undergone rapid evolution. In particular, the COVID pandemic stimulated private companies to invest in smart and advanced technologies using machine learning and AI, expert systems, cloud computing, and the Internet of Things (IoT), which enable their businesses to make better, more informed decisions in uncertain environments and under fast-changing conditions. As pointed out in [20], ML-AI and expert systems are currently designed and built for specific applications to address specific business or organization needs or technical challenges. They can be classified into two categories, namely, (i) forward chaining ML-AI Expert System (FC/ML-AI-ES), which uses data to predict future events, and (ii) backward chaining ML-AI Expert System (BC/ML-AI-ES), which uses historical data to understand why something occurred. Examples of FC/ML-AI-ES are forecasting inventory demand or future crop conditions associated with specific geographic areas. Examples of BC/ML-AI-ES are medical diagnostics or troubleshooting complex technical issues in hardware and software systems. A typical ML-AI-ES consists of three primary components, namely, the knowledge base (KB), the inference engine (IE), and the user interface (UI). The KB is the data that the ML-AI-ES uses and works with. A modernized KB has automated capabilities that can organize the data and present it as the user requests (i.e., curate it). The IE is the part of the ML-AI-ES that applies logical rules and related mathematical and/or simulation models (a.k.a. algorithms) to pull intelligent insights from the KB based on user queries. Finally, the UI is the means through which a user interacts with the KB, typically through a commercial off-the-shelf (COTS) software platform. **Figure 4** depicts a high-level architecture of an ML-AI-ES [20].


**Figure 4.** *Typical high-level ML-AI-ES architecture.*
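The three components described above can be sketched as a toy rule-based system. The facts and rules below are hypothetical, and the sketch uses forward and backward chaining in the conventional rule-engine sense (data-driven versus goal-driven inference), which is an assumption that goes beyond the description in [20]; it involves no machine learning.

```python
# Knowledge base (KB): known facts plus if-then rules (premises -> conclusion).
# All facts and rules here are hypothetical troubleshooting examples.
FACTS = {"no_power_led", "cable_connected"}
RULES = [
    ({"no_power_led", "cable_connected"}, "psu_fault"),
    ({"psu_fault"}, "replace_psu"),
    ({"fan_noise"}, "clean_fan"),
]


def forward_chain(facts, rules):
    """Inference engine, data-driven: fire rules until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Inference engine, goal-driven: can the goal be proved from the facts?"""
    if goal in facts:
        return True
    if goal in seen:                      # guard against circular rules
        return False
    for premises, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(p, facts, rules, seen | {goal}) for p in premises
        ):
            return True
    return False


def ask(goal):
    """Minimal stand-in for the user interface (UI): answer one query."""
    return "yes" if backward_chain(goal, FACTS, RULES) else "no"


all_conclusions = forward_chain(FACTS, RULES)
```

Calling `ask("replace_psu")` works backward from the queried goal to the stored facts, whereas `forward_chain` starts from the facts and derives everything the rules allow.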

### **2.2 Decision science**

Unlike data science, the roots of decision science can be found in the open literature dating back to the 1930s, with an application to economics [21]. As defined by the Harvard Chan School of Public Health, decision science is the collection of quantitative techniques used to inform decision-making at the individual and population levels<sup>2</sup>. It includes decision analysis, risk analysis, cost–benefit and cost-effectiveness analysis, constrained optimization, simulation modeling, and behavioral decision theory, as well as parts of operations research, microeconomics, statistical inference, management control, cognitive and social psychology, and computer science. With the emergence of ML-AI and digital technologies, decision science now ranges from traditional decision theories and analysis to advanced decision theories that use emerging decision optimization techniques leveraging game theory, ML-AI, and ML-AI combined with mathematical modeling and simulation (M&S) techniques.
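Decision analysis, one of the quantitative techniques listed above, can be illustrated with a small payoff table. The options, states, payoffs, and probabilities below are hypothetical; the sketch compares two common criteria, expected value and minimax regret, and is not a method prescribed by the chapter.

```python
# Hypothetical payoff table: payoffs[option][state], in arbitrary monetary units.
payoffs = {
    "expand":   {"high_demand": 120, "low_demand": -40},
    "maintain": {"high_demand": 60,  "low_demand": 20},
    "downsize": {"high_demand": 10,  "low_demand": 30},
}
# Hypothetical probabilities of each state of the world.
probabilities = {"high_demand": 0.6, "low_demand": 0.4}


def expected_value(option):
    """Probability-weighted average payoff of an option."""
    return sum(probabilities[state] * value for state, value in payoffs[option].items())


def max_regret(option):
    """Largest shortfall of an option relative to the best option in each state."""
    regrets = []
    for state in probabilities:
        best_in_state = max(payoffs[other][state] for other in payoffs)
        regrets.append(best_in_state - payoffs[option][state])
    return max(regrets)


best_by_expected_value = max(payoffs, key=expected_value)
best_by_minimax_regret = min(payoffs, key=max_regret)
```

In this table the two criteria disagree: the expected-value criterion favors the riskier option, while the minimax-regret criterion favors the option whose worst-case shortfall is smallest.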

Traditional decision theory and analysis deal with the reasoning that drives a person's decision, an organization's choice, or a business's decision. In general, they consist of three core concepts: (i) elicitation and interpretation of the decision maker's preferences, (ii) the search for available options, and (iii) the management of uncertainty, risks, and regrets [21–23]. For large organizations or collective settings that involve multiple options associated with different users' needs and interests, the decision-making process is extended to multiple stakeholders. By the 1950s, Von Stackelberg, Nobel laureate John Nash, and Von Neumann had carried out the pioneering work on applying game theory to the decision-making process for which they are universally credited [24–26]. They proposed mathematical models of strategic interaction among rational decision-makers. These interactions can be either cooperative or non-cooperative.
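The idea of strategic interaction among rational decision-makers can be sketched for the non-cooperative case by searching a two-player game for pure-strategy Nash equilibria, i.e., strategy pairs from which neither player gains by deviating alone. The payoff matrices below (a prisoner's dilemma and a coordination game) are standard textbook examples chosen for illustration; they are not taken from [24–26].

```python
from itertools import product


def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, column) pairs where neither player gains by deviating alone."""
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Player A picks the row: compare against every other row in column c.
        row_best = all(payoff_a[r][c] >= payoff_a[k][c] for k in range(n_rows))
        # Player B picks the column: compare against every other column in row r.
        col_best = all(payoff_b[r][c] >= payoff_b[r][k] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Prisoner's dilemma: strategy 0 = cooperate, strategy 1 = defect.
dilemma_a = [[-1, -3], [0, -2]]
dilemma_b = [[-1, 0], [-3, -2]]
dilemma_equilibria = pure_nash_equilibria(dilemma_a, dilemma_b)

# Coordination game: both players prefer to match, but rank the matches differently.
coordination_a = [[2, 0], [0, 1]]
coordination_b = [[1, 0], [0, 2]]
coordination_equilibria = pure_nash_equilibria(coordination_a, coordination_b)
```

The prisoner's dilemma has a single equilibrium in which both players defect, even though mutual cooperation would pay both more, while the coordination game has two equilibria. Games without a pure-strategy equilibrium would return an empty list, since mixed strategies are outside the scope of this sketch.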
