**Meet the editor**

Dr. Marco A. Aceves-Fernandez obtained his BSc degree (Eng) in Telematics from the Universidad de Colima, Mexico. He obtained both his MSc and PhD degrees from the University of Liverpool, England, in the field of Intelligent Systems. He is a full professor at the Universidad Autonoma de Queretaro, Mexico, and has been recognized as a member of the National System of Researchers (SNI) since 2009. He has published more than 80 research papers as well as numerous book chapters and congress papers. He has contributed to more than 20 funded research projects, both academic and industrial, in the area of artificial intelligence, with applications in the environmental, biomedical, automotive, aviation, consumer, and robotics fields, among others. He is also honorary president of the Mexican Association of Embedded Systems (AMESE), a senior member of the IEEE, and a board member of many institutions and associations. His research interests include intelligent and embedded systems.

Contents

**Preface XIII**

**Section 1 Artificial Intelligence Fundamentals 1**

Chapter 1 **Biologically Inspired Intelligence with Applications on Robot Navigation 3**
Chaomin Luo, Gene Eu Jan, Zhenzhong Chu and Xinde Li

Chapter 2 **A Modified Neuro-Fuzzy System Using Metaheuristic Approaches for Data Classification 29**
Mohd Najib Mohd Salleh, Noureen Talpur and Kashif Hussain

Chapter 3 **Differential Evolution Algorithm in the Construction of Interpretable Classification Models 49**
Rafael Rivera-Lopez and Juana Canul-Reich

Chapter 4 **Advanced Content and Interface Personalization through Conversational Behavior and Affective Embodied Conversational Agents 75**
Matej Rojc, Zdravko Kačič and Izidor Mlakar

**Section 2 Engineering and Technology Applications 47**

Chapter 5 **High Performance Technology in Algorithmic Cryptography 101**
Arturo Lezama-León, José Juan Zarate-Corona, Evangelina Lezama-León, José Angel Montes-Olguín, Juan Ángel Rosales-Alba and Ma. de la Luz Carrillo-González

Chapter 6 **A Deterministic Algorithm for Arabic Character Recognition Based on Letter Properties 123**
Evon Abu-Taieh, Auhood Alfaries, Nabeel Zanoon, Issam H. Al Hadid and Alia M. Abu-Tayeh

Chapter 7 **Human-AI Synergy in Creativity and Innovation 143**
Tony McCaffrey

Chapter 8 **Min k-Cut for Asset Selection in Risk-Based Portfolio Strategies 161**
Saejoon Kim and Soong Kim

Chapter 9 **Virtual Reality for Urban Sound Design: A Tool for Architects and Urban Planners 179**
Josep Llorca

Chapter 10 **Blockchain: The Next Breakthrough in the Rapid Progress of AI 197**
Spyros Makridakis, Antonis Polemitis, George Giaglis and Soula Louca

Chapter 11 **Augmenting Reality with Intelligent Interfaces 221**
Dov Schafer and David Kaufman

Chapter 12 **The Today Tendency of Sentiment Classification 243**
Vo Ngoc Phu and Vo Thi Ngoc Tran

Chapter 13 **A Multilevel Genetic Algorithm for the Maximum Satisfaction Problem 263**
Noureddine Bouhmala

Chapter 14 **Artificial Intelligence Application in Machine Condition Monitoring and Fault Diagnosis 275**
Yasir Hassan Ali

**Section 3 Life and Health Sciences 293**

Chapter 15 **Normal Versus Abnormal ECG Classification by the Aid of Deep Learning 295**
Linpeng Jin and Jun Dong

Chapter 16 **A Quantitative Approach for Web Usability Using Eye Tracking Data 317**
López-Orozco and Florencia-Juárez

Chapter 17 **Deep Learning Models for Predicting Phenotypic Traits and Diseases from Omics Data 333**
Md. Mohaiminul Islam, Yang Wang and Pingzhao Hu

Chapter 18 **Can Reinforcement Learning Be Applied to Surgery? 353**
Masakazu Sato, Kaori Koga, Tomoyuki Fujii and Yutaka Osuga

Chapter 19 **Application of AI in Modeling of Real System in Chemistry 383**
M. H. Ahmadi Azqhandi and M. Shekari

Chapter 20 **Application of AI in Chemical Engineering 399**
Zeinab Hajjar, Shokoufe Tayyebi and Mohammad Hosein Eghbal Ahmadi

Chapter 21 **Application of Biomedical Text Mining 417**
Lejun Gong

Chapter 22 **Static/Dynamic Zoometry Concept to Design Cattle Facilities Using Back Propagation Neural Network (BPNN) 435**
Sugiono Sugiono, Rudy Soenoko and Rio Prasetyo Lukodono

## Preface

From the earliest humans to populate the earth, we have gradually tried to understand and control the world around us. From trying to understand natural phenomena, humans went on to make predictions of various kinds: about the motions of the planets, eclipses, cycles of rainfall, or the periodicity of certain diseases. However, in the last few decades, the complexity of the predictions that need to be carried out has reached the limits of our abilities to predict.

Luckily, the dawn of electronic computers has greatly increased our ability to predict nature, although the problems we face now are far more complex than the problems we faced a century ago.

The ability of these machines to demonstrate advanced cognitive skills in making decisions, learning and perceiving the environment, predicting certain behavior, and processing written or spoken language, among other skills, makes this discipline of paramount importance in today's world.

The wide range of present-day applications of artificial intelligence is shown in the variety of applications compiled in this book. This book is organized as follows:

1. Artificial Intelligence Fundamentals
2. Engineering and Technology Applications
3. Life and Health Sciences

I hope that this work is of interest to students and researchers alike, as I did my best to compile quality research contributions with a number of different applications.

> **Dr. Marco A. Aceves-Fernandez, PhD** Universidad Autonoma de Queretaro, Mexico

**Section 1**

**Artificial Intelligence Fundamentals**


**Chapter 1**


#### **Biologically Inspired Intelligence with Applications on Robot Navigation**

DOI: 10.5772/intechopen.75692

Chaomin Luo, Gene Eu Jan, Zhenzhong Chu and Xinde Li

Additional information is available at the end of the chapter


#### Abstract

Biologically inspired intelligence, an important branch of computational intelligence, plays a crucial role in robotics. The autonomous robot and vehicle industry has had an immense impact on our economy and society, and this trend will continue with biologically inspired neural network techniques. This chapter addresses how multiple robots can cooperate to achieve a common coverage goal efficiently with a biologically inspired intelligence technique, which can improve work capacity, share the coverage tasks, and reduce the completion time. In many real-world applications, the coverage task has to be completed without any prior knowledge of the environment. In this chapter, a neural dynamics approach is proposed for complete area coverage by multiple robots. A bio-inspired neural network is designed to model the dynamic environment and to guide a team of robots in the coverage task. The dynamics of each neuron in the topologically organized neural network is characterized by a shunting neural equation. Each mobile robot treats the other robots as moving obstacles. Each robot path is autonomously generated from the dynamic activity landscape of the neural network and the previous robot position. The proposed model algorithm is computationally simple, and its feasibility is validated by four simulation studies.

Keywords: biologically inspired intelligence, real-time motion planning, navigation and mapping

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## 1. Introduction

Biologically inspired intelligence, an important branch of computational intelligence, plays a crucial role in robotics. The autonomous robot and vehicle industry has had an immense impact on our economy and society, and this trend will continue with biologically inspired neural network techniques. Biologically inspired intelligence, such as biologically inspired neural networks (BNNs), is about learning from nature and can be applied to real-world robot and vehicle systems. Recently, the research and development of bio-inspired systems for robotic applications has been expanding worldwide. Biologically inspired algorithms contain emerging subtopics such as bio-inspired neural network algorithms, brain-inspired neural networks, swarm intelligence with BNNs, ant colony optimization (ACO) with BNNs, bee colony optimization (BCO), particle swarm optimization with BNNs, immune systems with BNNs, and biologically inspired evolutionary optimization and algorithms. The field also comprises computational aspects of bio-inspired systems such as machine vision, pattern recognition for robot and vehicle systems, motion control, motion planning, movement control, sensor-motor coordination, and learning in biological systems for robot and vehicle systems.


One of the applications of biologically inspired intelligence techniques to robot navigation is complete area coverage navigation of autonomous mobile robots. Complete area coverage (CAC) is an essential issue for mobile robots, requiring the robot path to pass through every area in the workspace. Many robotic applications require CAC, e.g., cleaning robots [1–6], vacuum robots [7], painter robots, autonomous underwater covering vehicles, de-mining robots [8], land mine detectors [9], lawn mowers, automated harvesters [10], agricultural crop harvesting equipment [11], and window cleaners [12]. CAC can be completed by a single robot or by multiple robots.
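As a concrete illustration (not taken from the chapter), CAC over a grid workspace can be stated as a simple condition: every free cell must appear in the union of the robots' paths. The grid encoding and cell names below are assumptions of this sketch, not the chapter's data structures:

```python
# Hedged sketch: checking complete area coverage (CAC) on a grid workspace.
# Encoding assumed for illustration: 0 = free cell, 1 = obstacle;
# a path is a list of (row, col) cells visited by one robot.

def is_complete_coverage(grid, paths):
    """Return True if the union of all robot paths visits every free cell."""
    free = {(r, c) for r, row in enumerate(grid)
                   for c, val in enumerate(row) if val == 0}
    visited = {cell for path in paths for cell in path}
    return free <= visited

workspace = [[0, 0, 1],
             [0, 0, 0]]
robot_a = [(0, 0), (0, 1), (1, 1)]
robot_b = [(1, 0), (1, 1), (1, 2)]
print(is_complete_coverage(workspace, [robot_a, robot_b]))  # True: all 5 free cells covered
```

Together the two paths visit all five free cells, so coverage is complete; either robot alone would leave cells uncovered.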

Nowadays, cooperative coverage by a multiple robot system is becoming increasingly important. Cooperative area coverage by multiple robots can improve efficiency and complete the work more quickly than a single robot. The robots may share the coverage tasks and thus reduce the time to complete the coverage task. Additionally, if one of the robots fails, the rest will fulfill the mission; coverage by multiple robots therefore improves reliability and robustness. For instance, in de-mining applications, coverage reliability, an important factor, is enhanced by using cooperative multirobots. In some cleaning applications, the workspace (e.g., a stadium) needs to be cleaned in a limited amount of time, which requires multiple robots to work in a cooperative manner.

Multi-robot coverage has been extensively studied using various models. Depending on whether a map is required for the multirobots, the coverage models may be categorized as off-line and on-line algorithms [13]. Off-line algorithms require a map of workspace for robots (e.g. [14–16]), while on-line algorithms do not need an environmental map (e.g. [17–21]). Previous research on area coverage may be classified into cell-decomposition-based model (e.g. [15, 16, 21–24]), spanning-tree-based approach (e.g. [14, 18, 23–25]), behaviour-based model (e.g. [26–30]), graph-based model (e.g. [20, 31, 32]), depth first search approach (e.g. [18, 19]), Frontier-based model (e.g. [33–37]), and others (e.g. [38–41]).

Many multi-robot coverage algorithms are based on cell decomposition. Cell decomposition methods break continuous space into a finite set of cells. After this decomposition, a connectivity graph is constructed according to the adjacency relationships between the cells. From this connectivity graph, a continuous path can be determined by simply following adjacent free cells from the initial point to the goal point. Oh et al. [6] developed a triangular cell decomposition method for unknown environments for CAC. This method combines triangular cell decomposition, a template-based approach, and a wall following navigation algorithm for CAC. It can only deal with a single robot CAC. Wagner et al. [21] proposed multi-robot coverage algorithms to explore and cover an unknown environment by approximate cell decomposition approach. The group of multiple robots has limited sensors and no explicit communication. Kurabayashi et al. [15, 16] proposed an exact cell decomposition off-line coverage algorithm for multiple cleaning robots using a Voronoi diagram and boustrophedon approach, where a cost function is defined to obtain a near-optimal solution of the collective coverage task. The approach can avoid overlaps of sweeping areas. The algorithm needs to consider in advance the knowledge of the workspace with known obstacles.
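The cell-decomposition pipeline described above can be sketched in a few lines: decompose the space into grid cells, build a connectivity graph from cell adjacency, then obtain a continuous path by following adjacent free cells. This is an illustrative sketch of the generic method, not any cited author's implementation:

```python
# Sketch of cell decomposition -> connectivity graph -> path by adjacency.
# Grid encoding (0 = free, 1 = obstacle) is assumed for illustration.
from collections import deque

def connectivity_graph(grid):
    """Map each free cell to its 4-connected free neighbours."""
    rows, cols = len(grid), len(grid[0])
    free = {(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 0}
    return {cell: [(cell[0] + dr, cell[1] + dc)
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                   if (cell[0] + dr, cell[1] + dc) in free]
            for cell in free}

def path(graph, start, goal):
    """Breadth-first search over the connectivity graph."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            route = []
            while cell is not None:
                route.append(cell)
                cell = parent[cell]
            return route[::-1]
        for nxt in graph[cell]:
            if nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return None  # goal unreachable

grid = [[0, 1, 0],
        [0, 0, 0]]
g = connectivity_graph(grid)
print(path(g, (0, 0), (0, 2)))  # [(0, 0), (1, 0), (1, 1), (1, 2), (0, 2)]
```

The path goes around the obstacle at (0, 1) by moving only between adjacent free cells, exactly as the connectivity-graph argument above describes.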


Gabriely and Rimon [42] suggested a spanning tree coverage approach with a single robot. It divides the workspace into discrete cells and generates a spanning tree of the graph induced by the cells. The robot is able to cover every point precisely once and travel an optimal path in a grid-like representation of the workspace, which achieves complete area coverage. Hazon and Kaminka [14] developed a complete and robust multi-robot spanning-tree coverage (MSTC) algorithm based on approximate cell decomposition. Afterwards, the coverage efficiency was improved by a multi-robot forest coverage (MFC) algorithm based on approximate cell decomposition, proposed by Zheng et al. [23]. Hazon et al. [18] successfully extended the single-robot spanning tree work of Gabriely and Rimon [42] and their own off-line work [14] into on-line multi-robot coverage and improved the coverage efficiency.
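The spanning-tree idea can be caricatured as follows. This is not Gabriely and Rimon's exact STC algorithm (which circumnavigates the tree on a finer subgrid so every sub-cell is covered exactly once); it is a hedged sketch showing only the first step, building a depth-first spanning tree whose walk enters every cell:

```python
# Illustrative sketch: depth-first spanning tree over grid cells.
# In full STC the robot then circumnavigates this tree on a 2x finer
# subgrid, covering each sub-cell exactly once; that step is omitted here.

def spanning_tree_walk(neighbours, start):
    """Return the DFS spanning-tree edges and the order cells are first visited."""
    visited, edges, order = {start}, [], [start]
    stack = [start]
    while stack:
        cell = stack[-1]
        unvisited = [n for n in neighbours[cell] if n not in visited]
        if unvisited:
            nxt = unvisited[0]
            visited.add(nxt)
            edges.append((cell, nxt))   # a tree edge the robot will follow
            order.append(nxt)
            stack.append(nxt)
        else:
            stack.pop()                 # backtrack along a tree edge

    return edges, order

# 2x2 block of cells with 4-connectivity (adjacency assumed for this example).
cells = {(0, 0): [(0, 1), (1, 0)], (0, 1): [(0, 0), (1, 1)],
         (1, 0): [(0, 0), (1, 1)], (1, 1): [(1, 0), (0, 1)]}
edges, order = spanning_tree_walk(cells, (0, 0))
print(len(edges), sorted(order))  # 3 tree edges; all 4 cells visited exactly once
```

A spanning tree over n cells always has n - 1 edges, and the DFS order visits each cell once, which is what makes the subsequent tree-circumnavigation coverage non-repetitive.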

A behaviour-based strategy for a multi-robot system employs relatively few internal variable states to model the environment and makes fewer assumptions about the environment, and is thus more robust. Cooperative coverage by multirobots with a reactive planning model is implemented by designing individual and team behaviours. Jung et al. [29] combined the advantages of spatial and topological map representations of the environment in a behaviour-based framework for cooperative cleaning by multiple robots. Cooperative multi-robot coverage missions can be accomplished by unifying paths in navigation, cooperation, communication, and reactive behaviours [43]. A more detailed explanation of the implementation of the architecture for behaviour-based agents for cooperative multiple cleaning robots is given in [30]. Balch and Arkin [26] proposed a behaviour-based multi-robot approach for coverage, in which the robots are equipped with various goal-oriented behaviours for navigation. Recently, Fang et al. [27] used a behaviour-based coverage approach for multirobots to efficiently define the region in which an optimal solution can be found in unknown environments. Most recently, the idea of a leader robot and follower robots for path planning and robot control was proposed using the behaviour-based model [44].
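The reactive flavour of such strategies can be caricatured as priority arbitration among simple behaviours. The behaviours and priorities below are assumptions of this toy example, not the designs of the cited authors:

```python
# Toy behaviour arbitration for one robot in a coverage team (illustrative).
# Behaviours are tried in a fixed priority order; the first one that
# returns an action wins. Note how little internal state is needed.

def avoid_robots(state):
    # Treat teammates as moving obstacles: veto moves into their cells.
    if state["ahead"] in state["teammates"]:
        return "turn"
    return None

def cover_uncovered(state):
    if state["ahead"] not in state["covered"]:
        return "forward"
    return None

def wander(state):
    return "turn"  # default when everything ahead is already covered

BEHAVIOURS = [avoid_robots, cover_uncovered, wander]  # fixed priority order

def decide(state):
    for behaviour in BEHAVIOURS:
        action = behaviour(state)
        if action is not None:
            return action

state = {"ahead": (2, 3), "teammates": set(), "covered": {(1, 3)}}
print(decide(state))  # "forward": the cell ahead is free and uncovered
```

The robot's decision depends only on a few local facts (what is ahead, what is covered), which is precisely the "few internal states, few assumptions" property the paragraph above attributes to behaviour-based systems.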

The idea of building a graph of the environment is also used for multi-robot coverage. The workspace is decomposed into subregions called cells, from which a graph may be constructed. The underlying idea of the graph-based approach is that multiple robots traverse every edge of the graph to achieve cooperative coverage. Wagner et al. [20, 31] proposed an approximate cellular decomposition approach for multi-robot coverage to decompose the environment. They employ a dirt grid on the floor for communication among robots: the robots communicate with each other by leaving traces. A graph is built up to represent the workspace to be covered, and each edge is assigned two "smell labels". If an edge is traversed, it is marked with a fresh trace of odour. Recently, to benefit from the graph-based approach, Williams and Burdick [32] constructed a graph for multi-robot navigation. An improved graph representation of the task is applied to the boundary coverage problem, and a graph algorithm is developed for it.

proposed by Hazon and Kaminka [14] is robust and complete. The robots with this algorithm would cover cells more than once. Neural network methods have been broadly applied to robot motion planning, control and coverage (e.g. [2], [49, 50]). However, most of them deal with single robot for coverage (e.g. [2, 5, 6]). Some neural network models require learning procedures (e.g. [51, 52], which are computationally expensive and difficult to achieve CAC in real time.

Biologically Inspired Intelligence with Applications on Robot Navigation

http://dx.doi.org/10.5772/intechopen.75692

7

Some approaches let robots communicate with each other by leaving traces. A graph is built to represent the workspace to be covered, and each edge is assigned two "smell labels"; once an edge is traversed, it is marked by a fresh trace of odour. Recently, to benefit from the graph-based approach, Williams and Burdick [32] constructed a graph for multi-robot navigation, in which an improved graph representation of the task is applied to the boundary coverage problem and a graph algorithm is developed for it.

A robot can update its environmental map by moving to a frontier, since frontiers are areas on the boundary between open space and uncovered space. By moving to successive frontiers, a single robot or multiple robots can obtain sufficient information to build up and update the maps for a coverage mission. Yamauchi [36, 37] adopted a frontier-based coverage approach, which leads each robot to the closest unknown region, represented by the frontier between the free and unknown workspace, to produce a robust autonomous cooperative coverage strategy. The technique builds a global map of the environment, which is analysed to locate the frontiers around the robot. Recently, Burgard et al. [33] and Ko et al. [35] developed algorithms to compute utilities of frontier cells in order to cover different areas of the unknown environment.

Some previous work combines several approaches to take advantage of their different benefits. For instance, Rekleitis et al. [19] split the terrain by an exact cell decomposition method and built a tree with each subregion as a node; a centralized depth-first search (DFS) algorithm is employed for the robots to traverse the unknown region so that the entire area is explored and covered. Hazon et al. [18] recently extended the single-robot spanning tree work [42] and their own off-line work [14] into on-line multi-robot coverage; their spanning trees are constructed by a DFS-like procedure, and effective, robust, and complete multi-robot coverage is achieved. Gossage et al. [34] combined the advantages of a local frontier-based approach and a global graph-based representation of unknown environments to obtain robust multi-robot cooperation. Most recently, Zavlanos and Pappas [45] combined a distributed multi-destination potential field approach with a dynamic assignment algorithm for coverage motion planning of multiple robots.

In other methods, the robots in a team pool the incoming sensor information of every member to cooperatively perform coverage in unknown environments. In most cases, a cell decomposition method is used to split the terrain and ensure complete coverage [46]. Butler [38, 39] proposed a distributed cooperative multi-robot coverage algorithm that runs independently on each robot in a team within a rectilinear environment; the algorithm employs only intrinsic contact sensing to determine the boundaries of the environment. Recently, Boonpinon and Sudsang [47] developed a multi-robot mapping and area coverage approach using a centroidal Voronoi diagram, where a team of robots exchange limited sensory information by explicit communication. Latimer et al. [40] and Rekleitis et al. [41] proposed a coordinated multi-robot approach for coverage missions in which the workspace is broken down into distinct regions by Boustrophedon decomposition and each region is covered by robots with back-and-forth motions. Most recently, Schwager et al. [48] suggested a near-optimal sensing configuration for coverage by a group of robots that learns the distribution of the sensory information in the environment.

Although there have been many studies on multi-robot coverage, most attempt to improve completeness, and very few existing coverage algorithms focus on robustness. The MSTC algorithm proposed by Hazon and Kaminka [14] is robust and complete, but robots using it may cover some cells more than once. Neural network methods have been broadly applied to robot motion planning, control, and coverage (e.g., [2], [49, 50]). However, most of them deal with a single robot (e.g., [2, 5, 6]), and some neural network models require learning procedures (e.g., [51, 52]), which are computationally expensive and make real-time CAC difficult to achieve.

In this chapter, a neural dynamics approach is proposed for multi-robot area coverage applications. The mobile robots avoid collisions among themselves as well as with obstacles, and cooperatively work together to improve cleaning productivity. The proposed approach is capable of performing CAC for multiple robots autonomously, without any human intervention. Each robot treats the other robots as moving obstacles, and the neural activity landscape of each robot guides it to follow a reasonable path and to cooperate with the other robots. The real-time path is generated by a neural network algorithm without any prior knowledge of the environment or any pre-defined template, and no learning procedures are required. The advantage of such a CAC strategy using the proposed neural networks is that the robots do not revisit previously covered locations. The simulation studies demonstrate that robustness and fault tolerance are preserved even if one of the robots fails. The proposed algorithm is computationally simple and flexible to implement for autonomous CAC, as no learning procedures or templates are required. The dynamics of each neuron is characterized by a shunting equation or an additive equation derived from the membrane model of a biological neural system [53]. There are only local lateral connections among neurons, so the computational complexity depends on the neural network size. The varying environment is represented by the dynamic activity landscape of the neural network, and the robots share the environmental information collected from the sensors mounted in the workspace and from the sensors on the individual robots.

Effective, complete, and robust cooperative area coverage is achieved by the proposed neural dynamics model. "Cooperative" is used in the sense that multiple robots can work together to achieve a common coverage mission more efficiently and more quickly; "robust" in the sense that the multi-robot system does not fatally fail and is not wholly compromised by a single robot failure. In this chapter, a cleaning robot is used as an example, but the method is applicable to any CAC application.

The rest of the chapter is organized as follows. Section 2 presents the biological inspiration, the model algorithm, and the stability analysis of this neural dynamics-based approach to real-time collision-free CAC by multiple robots. Section 3 describes several simulation studies that demonstrate the completeness, robustness, and effectiveness of the proposed model for CAC. Finally, Section 4 concludes with several important properties of the proposed model for CAC.

## 2. The proposed model


6 Artificial Intelligence - Emerging Trends and Applications


In this section, the originality of the proposed neural network approach to real-time CAC for multiple mobile robots will be briefly introduced. Then, the fundamental concept and model algorithm of the proposed approach will be presented.

#### 2.1. Biological inspiration

In 1952, Hodgkin and Huxley [53] proposed a computational model for a patch of membrane in a biological neural system using electrical circuit elements. In this model, the dynamics of voltage across the membrane, Vm, is described using the state equation technique as

$$C_m\frac{dV_m}{dt} = -(E_p + V_m)g_p + (E_{Na} - V_m)g_{Na} - (E_K + V_m)g_K \tag{1}$$


Biologically Inspired Intelligence with Applications on Robot Navigation

http://dx.doi.org/10.5772/intechopen.75692


where Cm is the membrane capacitance, EK, ENa, and Ep are the Nernst potentials (saturation potentials) for potassium ions, sodium ions, and the passive leak current in the membrane, respectively. Parameters gK, gNa and gp represent the conductances of potassium, sodium, and passive channels, respectively. This model provides the foundation of the shunting model and leads to a number of model variations and applications [54].

By setting $C_m = 1$ and substituting $x_i = E_p + V_m$, $A = g_p$, $B = E_{Na} + E_p$, $D = E_K - E_p$, $S_i^e = g_{Na}$, and $S_i^i = g_K$ in Eq. (1), a shunting equation is obtained:

$$\frac{dx_i}{dt} = -Ax_i + (B - x_i)S_i^e(t) - (D + x_i)S_i^i(t) \tag{2}$$

where $x_i$ is the neural activity (membrane potential) of the ith neuron. The parameters A, B, and D are nonnegative constants representing the passive decay rate and the upper and lower bounds of the neural activity, respectively. The variables $S_i^e$ and $S_i^i$ are the excitatory and inhibitory inputs to the neuron. This shunting model was first proposed by Grossberg to understand the real-time adaptive behaviour of individuals under complex and dynamic environmental contingencies, and it has many applications in visual perception, sensory motor control, and other areas [54]. Research on biologically inspired robots has recently received considerable attention [24, 55, 56].
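To make the boundedness claim concrete, here is a minimal Euler-integration sketch of the shunting equation, Eq. (2). The parameter values and the input schedule are illustrative assumptions, not taken from the chapter:

```python
# Minimal Euler-integration sketch of the shunting Eq. (2); the parameter
# values and input schedule are illustrative assumptions, not the chapter's.
A, B, D = 10.0, 1.0, 1.0      # decay rate, upper bound, lower bound
dt = 0.001                    # integration step

def shunting_step(x, s_e, s_i):
    """One Euler step of dx/dt = -A*x + (B - x)*s_e - (D + x)*s_i."""
    return x + dt * (-A * x + (B - x) * s_e - (D + x) * s_i)

x = 0.0
for t in range(5000):
    s_e = 100.0 if t < 2500 else 0.0   # strong excitation first ...
    s_i = 0.0 if t < 2500 else 100.0   # ... then strong inhibition
    x = shunting_step(x, s_e, s_i)

# However large the inputs, the activity saturates inside [-D, B]:
assert -D <= x <= B
```

This saturation is what later allows the external input $E \gg B$ in Eq. (4) to act as a "very large" stimulus without destabilising the network.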

#### 2.2. Model algorithm

The fundamental concept of the proposed model is to develop a neural network architecture whose dynamic neural activity landscape represents the dynamically varying environment. By properly defining the external inputs from the varying environment and the internal neural connections, the uncovered areas and the obstacles are guaranteed to stay at the peaks and the valleys of the activity landscape of the neural network, respectively. The uncovered areas globally attract the robot throughout the state space via neural activity propagation, while the obstacles have only a local effect, in a small region, to avoid collisions. Real-time collision-free area coverage is accomplished from the dynamic activity landscape of the neural network together with the previous robot position and the other robots' positions, guaranteeing that all locations are covered and that the robots travel along smooth, continuous paths with little turning.

The proposed topologically organized model is expressed in a 2D Cartesian workspace W of the cleaning robots. The position of the ith neuron in the state space S of the neural network, denoted by a vector $q_i \in \mathbb{R}^2$, uniquely represents a position in W. In the proposed model, the excitatory input results from the uncovered locations and the lateral neural connections, while the inhibitory input results from the obstacles only. Each neuron has local lateral connections to its neighbouring neurons, which constitute a subset $R_i$ of S. The subset $R_i$ is called the receptive field of the ith neuron in neurophysiology, and the neuron responds only to stimuli within its receptive field. Thus, the dynamics of the ith neuron in the neural network is characterized by a shunting equation as


$$\frac{dx_i}{dt} = -Ax_i + (B - x_i)\left([I_i]^+ + \sum_{j=1}^{k} w_{ij}\,[x_j]^+\right) - (D + x_i)\,[I_i]^- \tag{3}$$

where k is the number of neural connections of the ith neuron to its neighbouring neurons within the receptive field Ri. The external input Ii to the ith neuron at Position (m,n) is defined as:

$$I_i(m,n) = \begin{cases} E & \text{if it is uncovered} \\ 0 & \text{if it is covered} \\ -E & \text{if it is an obstacle (another robot)} \end{cases} \tag{4}$$

where $E \gg B$ is a very large positive constant. The terms $[I_i]^+ + \sum_{j=1}^{k} w_{ij}[x_j]^+$ and $[I_i]^-$ are the excitatory and inhibitory inputs, $S_i^e$ and $S_i^i$ in Eq. (2), respectively. The function $[a]^+$ is a linear-above-threshold function defined as $[a]^+ = \max\{a, 0\}$, and the nonlinear function $[a]^-$ is defined as $[a]^- = \max\{-a, 0\}$. The connection weight $w_{ij}$ from the ith neuron to the jth neuron is given by $w_{ij} = f(|q_i - q_j|)$, where $|q_i - q_j|$ is the Euclidean distance between the vectors $q_i$ and $q_j$ in the state space and $f(a)$ is a monotonically decreasing function, e.g., $f(a) = \mu/a$ if $0 < a < r_0$ and $f(a) = 0$ if $a \ge r_0$, where $\mu$ and $r_0$ are positive constants. Therefore, each neuron has only local lateral connections within a small region $(0, r_0)$, and the weight is clearly symmetric, i.e., $w_{ij} = w_{ji}$. A schematic diagram of the neural network in 2D with three layers (r = 1, 2, and 3) of neighbouring neurons around the central neuron C(m,n) is shown in Figure 1, where $r_0$ is chosen as $r_0 = 2$ and r is the number of circles enclosing the central neuron C(m,n). The receptive field of the ith neuron is represented by a circle with radius $r_0$, so the ith neuron has only eight lateral connections to the neighbouring neurons within its receptive field. The 2D Cartesian workspace in the proposed approach is discretized into squares. The diagonal length of each discrete area equals the robot coverage radius, that is, the size of the robot effector or footprint [2]. Each position (grid cell) uses a number to represent its environmental information, and the neurons are placed uniformly over the space to represent covered positions, uncovered positions, and obstacles.

In this algorithm, it is necessary to have a flag, denoted by $I_i(m,n)$ in Eq. (4), for a neuron at position (m,n) to indicate its status: uncovered, covered, obstacle, or moving obstacle (another robot is regarded as a moving obstacle). This flag can be obtained from the sensor information in the current map. A topologically organized discrete map is employed to represent the workspace; each grid cell uses a number (flag) to represent its environmental information (state).
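The discretization, the external-input flags of Eq. (4), and the weight function $f(a) = \mu/a$ can be sketched as follows. This is a hypothetical Python rendering; the flag encoding and the parameter values are our assumptions, not the chapter's:

```python
import numpy as np

E, mu, r0 = 100.0, 1.0, 2.0        # E >> B; mu, r0 as in the text (values assumed)

UNCOVERED, COVERED, OBSTACLE = 1, 0, -1   # hypothetical flag encoding

def external_input(flag):
    """Eq. (4): map a grid flag to the external input I_i(m, n)."""
    return {UNCOVERED: E, COVERED: 0.0, OBSTACLE: -E}[flag]

def weight(qi, qj):
    """w_ij = f(|q_i - q_j|) with f(a) = mu/a for 0 < a < r0, else 0."""
    d = np.hypot(qi[0] - qj[0], qi[1] - qj[1])
    return mu / d if 0.0 < d < r0 else 0.0

# With r0 = 2 on a unit grid, a neuron connects to exactly its 8 neighbours:
neighbours = [(m, n) for m in range(3) for n in range(3)
              if weight((1, 1), (m, n)) > 0.0]
assert len(neighbours) == 8
```

Note that a neighbour at distance exactly 2 falls outside the receptive field, which is why only the eight immediate neighbours (distance 1 or $\sqrt{2}$) are connected.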




Figure 1. The architecture of a 2D neural network with three-layer (r = 1, 2, and 3) neighbouring neurons with regard to the central neuron C(m,n). The ith neuron has only eight lateral connections to its neighbouring neurons that are within its receptive field.

This approach needs only the current map rather than prior map information. The boundary of the workspace is assumed to be known; it can be obtained by a wall-following algorithm [57].
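As a quick numerical check of the propagation property stated earlier (positive activity spreads through the lateral connections while negative activity stays local), one can iterate Eq. (3) on a small grid. The parameters, grid size, and uniform unit weights below are illustrative assumptions; `np.roll` wraps at the edges, which is adequate for this sketch:

```python
import numpy as np

A, B, D, E, dt = 10.0, 1.0, 1.0, 100.0, 0.01   # illustrative, not the chapter's
H = W = 11
I = np.zeros((H, W))     # external inputs, Eq. (4): everything covered ...
I[5, 9] = E              # ... except one uncovered cell ...
I[5, 1] = -E             # ... and one obstacle cell
x = np.zeros((H, W))     # neural activities, all zero at the start

def step(x):
    """One Euler step of Eq. (3) with unit weights to the 8 neighbours."""
    xp = np.maximum(x, 0.0)                     # [x_j]^+
    lateral = sum(np.roll(np.roll(xp, dm, 0), dn, 1)
                  for dm in (-1, 0, 1) for dn in (-1, 0, 1) if (dm, dn) != (0, 0))
    return x + dt * (-A * x + (B - x) * (np.maximum(I, 0.0) + lateral)
                     - (D + x) * np.maximum(-I, 0.0))

for _ in range(2000):
    x = step(x)

assert x[5, 6] > 0            # positive activity has spread across the grid
assert x[5, 1] < 0 < x[5, 9]  # the obstacle stays a local valley; peak at goal
```

Because only $[x_j]^+$ is propagated, the obstacle's negative activity never passes through its neighbours, exactly as the text describes.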

The proposed network characterized by Eq. (3) ensures that positive neural activity can propagate throughout the state space, while negative activity stays only local. Therefore, the uncovered areas globally attract the robot, while the obstacles only locally push the robot away to prevent collisions. The positions of the uncovered areas and obstacles may vary over time; e.g., there are moving obstacles (other robots), and covered areas may become uncovered again. The activity landscape of the neural network changes dynamically due to the varying external inputs from the uncovered areas and obstacles and the internal activity propagation among neurons. For energy and time efficiency, the robot should travel the shortest path (with the fewest revisited locations) and make the fewest changes of moving direction. The robot path is therefore generated from the dynamic activity landscape together with the previous robot position, so as to minimise changes of navigation direction. For a given current robot position in S (i.e., a position in W), denoted by $p_c$, the next robot position $p_n$ (also called the "command position") is obtained by

$$p_n \Leftarrow x_{p_n} = \max\{x_j + cy_j,\; j = 1, 2, \ldots, k\} \tag{5}$$

where c is a positive constant and k is the number of neighbouring neurons of the $p_c$th neuron, i.e., the number of possible next positions from the current position $p_c$. The variable $x_j$ is the neural activity of the jth neuron, and $y_j$ is a monotonically decreasing function of the turning angle from the current to the next robot moving direction, which can be defined as a function of the previous position $p_p$, the current position $p_c$, and the possible next position $p_j$, e.g., a function defined as


$$y_j = 1 - \frac{\Delta\theta_j}{\pi} \tag{6}$$

where $\Delta\theta_j \in [0, \pi]$ is the turning angle between the current moving direction and the next moving direction; e.g., if the robot moves straight, $\Delta\theta_j = 0$, and if it goes backward, $\Delta\theta_j = \pi$. Thus, $\Delta\theta_j$ can be given as $\Delta\theta_j = |\theta_j - \theta_c| = |\operatorname{atan2}(y_{p_j} - y_{p_c},\, x_{p_j} - x_{p_c}) - \operatorname{atan2}(y_{p_c} - y_{p_p},\, x_{p_c} - x_{p_p})|$. After the current position reaches its next position, the next position becomes the new current position (if the found next position is the same as the current position, the robot stays there without any movement). The current robot position adaptively changes according to the varying environment.
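Eqs. (5) and (6) amount to a one-step lookahead over the neighbours of the current cell, trading neural activity against turning cost. A hypothetical sketch (the cell coordinates, activity values, and the constant c are made-up for illustration):

```python
import math

c = 0.5   # illustrative weight on the turning term

def turn_term(prev, cur, nxt):
    """y_j = 1 - dtheta_j/pi, with dtheta_j = |theta_j - theta_c|, Eq. (6)."""
    theta_c = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    theta_j = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    dtheta = abs(theta_j - theta_c)
    dtheta = min(dtheta, 2 * math.pi - dtheta)   # wrap into [0, pi]
    return 1.0 - dtheta / math.pi

def next_position(prev, cur, activity):
    """Eq. (5): pick the neighbour of p_c maximising x_j + c*y_j."""
    candidates = [(cur[0] + dm, cur[1] + dn)
                  for dm in (-1, 0, 1) for dn in (-1, 0, 1)
                  if (dm, dn) != (0, 0) and (cur[0] + dm, cur[1] + dn) in activity]
    return max(candidates, key=lambda p: activity[p] + c * turn_term(prev, cur, p))

# Two uncovered cells have equal activity; the straight-ahead one wins,
# since it needs no turn (y = 1 versus y = 0.5 for a 90-degree turn):
activity = {(2, 1): 5.0, (1, 2): 5.0, (1, 0): -1.0, (0, 1): -1.0,
            (0, 0): -1.0, (2, 2): -1.0, (0, 2): -1.0, (2, 0): -1.0}
assert next_position(prev=(0, 1), cur=(1, 1), activity=activity) == (2, 1)
```

The tie-breaking in favour of straight motion is what produces the smooth, continuous paths with little turning mentioned above.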

In a multi-robot system, if two or more robots work together to sweep a workspace, the system may be viewed as a multiple neural network system [58]. Each robot has its own neural network, and all robots share the same workspace information. Each robot treats the other robots as moving obstacles, recognized from the sensor information, so that the robots avoid collisions and cooperatively work together. Each robot's neural network is updated dynamically with the robot positions, and the environmental knowledge sensed by the robots is also dynamically updated. Every position is flagged by a number as in Eq. (4); once one of the robots moves to position (m,n), that position is marked with the external input $I_i(m,n) = -E$.
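The moving-obstacle marking rule can be sketched as follows (a hypothetical helper; the shared-map representation and all names are our assumptions):

```python
# Once robot r occupies (m, n), every other robot's network sees that cell
# as a moving obstacle, i.e. I_i(m, n) = -E in Eq. (4).
E = 100.0

def update_inputs(inputs, robot_positions, me):
    """Return a copy of `inputs` with the other robots' cells set to -E."""
    out = dict(inputs)
    for r, pos in robot_positions.items():
        if r != me:
            out[pos] = -E        # another robot is a moving obstacle
    return out

shared = {(0, 0): E, (0, 1): E, (1, 0): 0.0, (1, 1): E}   # E = uncovered
robots = {"r1": (0, 0), "r2": (1, 1)}
assert update_inputs(shared, robots, me="r1")[(1, 1)] == -E
assert update_inputs(shared, robots, me="r1")[(0, 0)] == E   # own cell unchanged
```

Each robot applies this update to its own copy of the shared map before evaluating Eq. (5), so collisions are avoided without any central coordinator.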

The proposed neural network is a stable system. The neural activity xi is bounded in the finite interval ½ � �D; B [54]. The stability and convergence of the present shunting neural network model can also be rigorously proved using a Lyapunov stability theory. By introducing new variables and performing variable substitutions, Eqs. (2) or (3) can be written as

(state). This approach only needs current map, instead of the prior map information. The boundary of the workspace is assumed to be known that can be obtained by wall-following

Figure 1. The architecture of a 2D neural network with three-layer (r = 1, 2, and 3) neighbouring neurons with regard to the central neuron C(m,n). The ith neuron has only eight lateral connections to its neighbouring neurons that are within its

The proposed network characterized by Eq. (3) ensures that the positive neural activity can propagate to all the state space, but the negative activity only stays locally. Therefore, the uncovered areas globally attract the robot, while the obstacles only locally prevent the robot from collisions. The positions of the uncovered areas and obstacles may vary over time, e.g., there are moving obstacles (other robots); the covered areas become uncovered again. The activity landscape of the neural network dynamically changes due to the varying external inputs from the uncovered areas and obstacles and the internal activity propagation among neurons. For energy and time efficiency, the robot should travel a shortest path (with least revisited locations) and make least turning of moving directions. The robot path is generated from the dynamic activity landscape and the previous robot position to avoid least navigation direction changes. For a given current robot position in S (i.e., a position in W), denoted by pc ,

the next robot position pn (also called "command position") is obtained by

the current position pc, and the possible next position pj

pn ( xpn ¼ max xj þ cyj

where c is a positive constant, k is the number of neighbouring neurons of the pcth neuron, i.e., all the possible next positions of the current position pc. Variable xj is the neural activity of the jth neuron, yj is a monotonically increasing function of the difference between the current to next robot moving directions, which can be defined as a function of the previous position pp,

; j ¼ 1; 2; ⋯; k n o

, e.g., a function defined as

(5)

algorithm [57].

receptive field.

10 Artificial Intelligence - Emerging Trends and Applications

$$\frac{dz\_i}{dt} = a\_i(z\_i) \left( b\_i(z\_i) - \sum\_{j=1}^{N} c\_{ij} d\_j(z\_j) \right) \tag{7}$$

which is Grossberg's general form [54]. It can be proved that Eqs. (2) or (3) satisfies all the three stability conditions required by the Grossberg's general form [54]. The rigorous proof of the stability and convergence of Eq. (7) can be found in [59]. The dynamics of the neural network is guaranteed to converge to an equilibrium state of the system. Eq. (3) combined with the previous robot position ensures to generate complete coverage path. At the beginning, when t ¼ 1, the neural activity of all neurons is set to zero. The state of the workspace varies in terms of the dynamics of the neural network described by (3) due to the influence of external inputs. The planned motion ends when the network reaches a steady state.

It is inevitable that multiple cleaning robots have to deal with deadlock situations in real-world applications. When a cleaning robot arrives at a deadlock, i.e., all the neighbouring positions are either obstacles or covered locations, none of the neural activities of its neighbouring locations is larger than the activity at the current location: the neighbouring locations receive either negative external input (obstacles) or no external input (covered locations), and the covered neighbouring locations have undergone a longer decay, since they were covered earlier than the current location [see Eq. (3)]. In the proposed model, the neural activity at the deadlock location quickly decays to zero due to the passive decay term −Ax<sub>i</sub> in Eq. (3). Meanwhile, due to the lateral excitatory connections among neurons, the positive neural activity from the uncovered locations in the workspace propagates toward the current robot location through neural activity propagation (see [60]). Therefore, the robot is able to find a smooth path from the deadlock location directly to an uncovered location. The robot continues its cleaning task until all locations in the workspace are covered. Thus, the proposed model achieves complete coverage path planning with deadlock avoidance, and the multiple robots are not trapped in deadlock situations.

The complexity is proportional to the square of the degree of discretization, and there are only local connections among neurons. If the workspace is an N × N square, then M = N × N = N<sup>2</sup> neurons are required and there are 8N<sup>2</sup> neural connections. If the workspace is a rectangle, the number of neurons required is M = N<sub>x</sub> × N<sub>y</sub>, where N<sub>x</sub> and N<sub>y</sub> are the discretized sizes of the Cartesian workspace. Each neuron has at most eight local connections, so the total number of neural connections is 8M. The computational complexity of the proposed algorithm is O(N<sup>2</sup>).

#### 2.3. Implementation issues

There have been many studies on the CAC implementation of mobile robots using various approaches [46, 61–63]. The selection of on-board sensors is as important as the development of the CAC navigation algorithm itself: the performance of the multi-robot system depends on both the CAC navigation algorithm and the placement of on-board sensors. An appropriate number of sensors is key for the multi-robot system, since an excessive number of sensors increases the cost of the robots [62, 64]. Each robot in this multi-robot system has the same configuration, consisting of CPU, memory, sensors, DC motors, wheels, brush or dustpan, and an on-board power supply (e.g., a rechargeable battery) [61].

• Sensors: the robots are assumed to carry imaging sensors (e.g., camera and position sensor) and range sensors (e.g., infrared sensors, sonar, ultrasonic sensors, or small radar). With on-board sensors, mobile robots can construct a map of the current environment around them [2, 65]. Ultrasonic sensors are employed for range measurement due to their simplicity, flexibility, adaptability, low cost, and robustness; interpreting the sonar readings helps build an environmental map. The small angular resolution of infrared sensors makes them suitable for CAC with back-and-forth motion, and they reduce the angular uncertainty caused by the accumulated error of dead reckoning [63]. In addition, several cameras suspended from the ceiling of the workspace provide global environmental information to the robots [66].

• Wheels, DC motors, and discretized environments: each robot is driven by two DC geared motors with two wheels mounted on the gear axes. The 2D Cartesian workspace is discretized into squares, as in most other CAC models. The diagonal length of each discrete grid cell equals the robot sweeping radius, which is the size of the robot effector or footprint (Figure 2). The robot is assumed to be round in shape, with a square embedded in this round body. The robot sweeping range, i.e., the size of the robot effector, is proportional to the size of each square; the size of a robot is slightly larger than that of each square. The two wheels driven by DC motors are mounted on shafts carrying a brush or dustpan; the wheels are axle mounted and supported by two sealed ball bearings [see Figure 3(a)], although the ball bearings themselves do not appear in Figure 3. The design assumes that the wheels can rotate in any direction on the floor, allowing the robots to turn flexibly in the workspace; for example, Figure 3(b) shows the robot turning clockwise and counter-clockwise by 45°. Sweeping an area can therefore be achieved by traversing the centre of the square cell that represents it. A discrete location is regarded as covered once a robot visits the corresponding cell; if a cleaning robot covers every discrete cell, the robot path is considered a CAC of the workspace.

Figure 2. Discretized square enclosed by sweeping area.

Figure 3. Two wheels driven by DC motors are mounted to the robot. (a) The robot; (b) the robot rotates clockwise and counter-clockwise 45°.

## 3. Simulation and comparison experiments

The proposed model for CAC is evaluated through simulation experiments in C++ in this section. The approach performs CAC for multiple cleaning robots autonomously, without human operation. Two or more mobile robots can cooperatively sweep indoor environments so as to improve work productivity; each robot is required not only to clean its own floor area but also to cooperate with the others in performing the cleaning tasks. In this section, the proposed approach is first applied to multiple robots for cooperative coverage in a corridor-like environment. Then, cooperative coverage in an indoor room environment is studied. Next, it is applied to cooperative coverage in a warehouse environment. Finally, four cleaning robots working together to sweep a sports field are simulated.

#### 3.1. Cooperative coverage in a corridor-like environment

To illustrate cooperative coverage by a multi-robot system, the proposed model is applied to cooperative coverage with obstacle avoidance in a corridor-like environment. In this simulation, there are two neural network systems for the two robots, which share mutual external input signals from the sensor information representing environmental knowledge. Each neural network has 33 × 28 topologically organized neurons with zero initial neural activities. The model parameters are set as A = 10, B = 1, and D = 1 for the shunting equation; μ = 0.7 and r<sub>0</sub> = 2 for the lateral connections; and E = 100 for the external inputs (on parameter sensitivity, see [2]). In Figure 4(a), one robot, Robot 1, represented by a solid circle, starts to sweep from the lower left corner S1(1,1); the other, Robot 2, shown as a hollow circle, covers from the upper right corner S2(31,26). After the two robots sweep four columns in their own regions, they encounter walls and then enter a narrow corridor-like workspace [see Figure 4(a)]. The two robots move along smooth zigzag paths by performing back-and-forth motions. Initially, when Robot 1 reaches Position A1(5,17) and Robot 2 Position A2(27,10), the neural activity landscape of the neural networks is shown in Figure 5(a). In this case, the two robots have two neural networks whose neural activities are computed separately through Eq. (3); these activities can be plotted as the neural activity landscape of each network.

The two robots meet at the middle section of the corridor at Positions P1(16,14) (flag I<sub>i</sub>(16,14) = −E) and P2(16,13) (flag I<sub>i</sub>(16,13) = −E), respectively. Robot 1 and Robot 2 reach the central corridor simultaneously without collision in the central area, as shown in Figure 4(a). From the central area, the two robots can search point-to-point paths to move to uncovered areas, since uncovered areas globally attract them (see [60]). They follow continuous and smooth paths to reach Q1(8,8) and Q2(24,19), respectively, then generate CAC paths in the two uncovered areas and sweep them, as shown in Figure 4(b), until they complete the coverage mission at Positions T1(27,8) and T2(5,19). The procedure by which the two robots cooperatively cover the workspace is illustrated by the neural activity landscapes of the neural networks in Figure 5: after the two robots sweep their own areas, they clean the public aisle areas cooperatively and then move to sweep their own areas again. Figure 5 shows the neural activity landscape when the two robots move to Positions A1(5,17), A2(27,10); B1(12,13), B2(20,14); C1(15,15), C2(17,12); D1(12,4), D2(20,23); E1(17,6), E2(15,21); and F1(25,6), F2(7,21).

Figure 4. Cooperative coverage in a corridor-like environment. (a) When Robot 1 reaches Position B1(8,8) and Robot 2 Position B2(24,19); (b) when the two robots fulfil the coverage task.

#### 3.2. Cooperative coverage in a warehouse environment

To investigate the flexibility and adaptability of cooperative coverage by multiple robots, the proposed model is applied to a warehouse environment with wall-like obstacles placed in different positions (Figure 6). For each robot, the neural network has 32 × 32 topologically organized neurons with zero initial neural activities, and the model uses the same parameters as the case above.

Robots 1 and 2 work in the lower and upper half sections of the workspace, respectively. However, they can also assist each other: one robot can help cover the other's areas once it has covered its own columns. In this simulation, both start from the same side of the workspace. Robot 1, represented by a solid dot, starts from the lower left corner S1(1,1), whereas Robot 2, shown as an empty circle, sweeps from the upper left corner S2(1,30).

The wall-like obstacles influence the cleaning assignments of the two robots. The robot paths when the two robots meet at the middle of the workspace are shown in Figure 6(a), at Positions A1(1,15) and A2(1,16), where the two robots have accomplished equal shares of the coverage task. The neural activity landscape of the neural networks when Robots 1 and 2 meet at Positions A1(1,15) and A2(1,16) can be found in Figure 7(a).

The two robots are responsible for different areas of the warehouse and sweep their own regions. Due to the placement of obstacles, the two robots get through different amounts of sweeping work. For instance, when Robot 1 arrives at Position B1(7,5), Robot 2 only reaches Position B2(4,23); the neural activity landscape of the neural networks is illustrated in Figure 7(b). Similarly, when Robot 1 arrives at Position C1(12,15), Robot 2 only reaches Position C2(10,22), for which the corresponding neural activity landscape is shown in Figure 7(c). When the two robots meet at Positions D1(14,25) and D2(14,26), Robot 1 is assisting Robot 2 with the coverage work; the corresponding neural activity landscape is shown in Figure 7(d). Conversely, Robot 2 aids Robot 1 when they arrive at Positions E1(16,4) and E2(16,5). They cooperate without collisions to improve the cleaning productivity efficiently; the neural activity landscape is illustrated in Figure 7(e). Finally, when Robot 1 arrives at F1(28,29), Robot 2 reaches F2(27,30), which shows that Robot 1 has swept one column of Robot 2's area, as shown in Figure 6(a). The two robots then sweep the rest of the workspace: Robot 1 passes Positions P(29,29) and Q(30,26) and reaches the final position T1(30,1), while Robot 2 attains the final position T2(30,27) via U(30,30) [see Figure 6(b)]. The neural activity landscape is illustrated in Figure 7(f). Ultimately, they reach Positions T1(30,1) and T2(30,27) asynchronously. Therefore, the neural network is able to guide the two robots to complete the coverage task.

Figure 6. Cooperative coverage in the warehouse environment. (a) Robots 1 and 2 reach Positions F1(28,29) and F2(27,30);

Biologically Inspired Intelligence with Applications on Robot Navigation

http://dx.doi.org/10.5772/intechopen.75692

17

(b) two robots work cooperatively.

3.3. Cooperative coverage by four cleaning robots in a sports field environment

The proposed model is further applied to cooperative coverage in a sports field environment by four cleaning robots, where there exist four neural network systems and the four cleaning robots share mutual external input signals from sensory data representing the environmental information. Each neural network has 20 20 topologically organized neurons with zero initial neural activities and the same model parameters as the above case. In Figure 8(a), the use of various lines is to distinguish the generated paths by the robots. Robot 1 whose paths are represented by solid lines starts to move from the lower left corner S1(1,1). Robot 2 whose

Figure 5. The neural activity landscape of the neural networks for the corridor-like environment case when Robots 1 and 2 reach (a) Positions A1ð Þ 5; 17 , A2ð Þ 27; 10 ; (b) B1ð Þ 12; 13 , B2ð Þ 20; 14 ; (c) C1ð Þ 15; 15 , C2ð Þ 17; 12 ; (d) D1ð Þ 12; 4 , D2ð Þ 20; 23 ; (e) E1ð Þ 17; 6 , E2ð Þ 15; 21 ; and (f) F1ð Þ 25; 6 , F2ð Þ 7; 21 .

Biologically Inspired Intelligence with Applications on Robot Navigation http://dx.doi.org/10.5772/intechopen.75692 17

Figure 6. Cooperative coverage in the warehouse environment. (a) Robots 1 and 2 reach Positions F1(28,29) and F2(27,30); (b) two robots work cooperatively.

C2(10,22), in which the corresponding neural activity landscape of the neural networks is exhibited in Figure 7(c). Obviously, when these two robots meet at Positions D1(14,25) and D2(14,26), Robot 1 is assisting Robot 2 for coverage work, where the corresponding neural activity landscape of the neural networks is shown in Figure 7(d). Conversely, Robot 2 aids Robot 1 to do coverage work when they arrive at Positions E1(16,4) and E2(16,5). They can cooperatively work together without collisions to improve the cleaning productivity efficiently. The neural activity landscape of the neural networks is illustrated in Figure 7(e). Finally, when Robot 1 arrives at F1(28,29), Robot 2 reaches F2(27,30), which shows that the Robot 1 has assisted to sweep one column area for Robot 2 as shown in Figure 6(a). These two robots continue to sweep the rest of the workspace. Robot 1 passes Positions P(29,29) and Q(30,26) and reaches the final position T1(30,1), while Robot 2 attains the final position T2(30,27) via U(30,30) [see Figure 6(b)]. The neural activity landscape of the neural networks is illustrated in Figure 7(f). Ultimately, they reaches Positions T1(30,1) and T2(30,27) asynchronously. Therefore, the neural network is able to guide these two robots to complete the coverage task.

#### 3.3. Cooperative coverage by four cleaning robots in a sports field environment

Figure 5. The neural activity landscape of the neural networks for the corridor-like environment case when Robots 1 and 2 reach (a) Positions A1ð Þ 5; 17 , A2ð Þ 27; 10 ; (b) B1ð Þ 12; 13 , B2ð Þ 20; 14 ; (c) C1ð Þ 15; 15 , C2ð Þ 17; 12 ; (d) D1ð Þ 12; 4 , D2ð Þ 20; 23 ; (e)

E1ð Þ 17; 6 , E2ð Þ 15; 21 ; and (f) F1ð Þ 25; 6 , F2ð Þ 7; 21 .

16 Artificial Intelligence - Emerging Trends and Applications

The proposed model is further applied to cooperative coverage in a sports field environment by four cleaning robots, where there exist four neural network systems and the four cleaning robots share mutual external input signals from sensory data representing the environmental information. Each neural network has 20 20 topologically organized neurons with zero initial neural activities and the same model parameters as the above case. In Figure 8(a), the use of various lines is to distinguish the generated paths by the robots. Robot 1 whose paths are represented by solid lines starts to move from the lower left corner S1(1,1). Robot 2 whose

paths are represented by dashed lines moves from the upper left corner S2(1,18). Robot 3 whose paths are represented by dash-dotted lines starts from the upper right corner S3(18,18). Robot 4 whose paths are represented by dash-dot-dot lines sweeps from the lower right corner S4(18,1). In the simulation, the planned robot paths are shown in Figure 8(a), where the four robots search snake-trail CAC paths and meet at the central area. Because for each neural network, the positive neural activity can propagate to the whole state space of the neural network, each robot can achieve CAC. If one grid is covered by one robot, it will be marked covered by external input signal (Ii ¼ 0), another robot will know that it has been covered. When the four robots eventually meet, they have no collision. It shows that these four robots are able to autonomously sweep the whole workspace. They can sweep along zigzag coverage

Biologically Inspired Intelligence with Applications on Robot Navigation

http://dx.doi.org/10.5772/intechopen.75692

19


Figure 7. The neural activity landscape of the neural networks for the warehouse environment when Robots 1 and 2 reach (a) Positions A1(1,15) and A2(1,16); (b) B1(7,5) and B2(4,23); (c) C1(12,15) and C2(10,22); (d) D1(14,25) and D2(14,26); (e) E1(16,4) and E2(16,5); and (f) F1(28,29) and F2(27,30).

Robot 2, whose paths are represented by dashed lines, moves from the upper left corner S2(1,18). Robot 3, whose paths are represented by dash-dotted lines, starts from the upper right corner S3(18,18). Robot 4, whose paths are represented by dash-dot-dot lines, sweeps from the lower right corner S4(18,1). In the simulation, the planned robot paths are shown in Figure 8(a), where the four robots search snake-trail CAC paths and meet at the central area. Because the positive neural activity of each neural network can propagate to the whole state space, each robot can achieve CAC. Once a grid is covered by one robot, it is marked as covered through the external input signal (Ii = 0), so the other robots know it has already been covered. When the four robots eventually meet, there is no collision. This shows that the four robots are able to autonomously sweep the whole workspace: they sweep along zigzag coverage paths and avoid collisions with each other.

After they meet at the central area, where there is a deadlock situation [2], i.e., Robot 1 is at F1(9,9), Robot 2 at F2(9,10), Robot 3 at F3(10,10), and Robot 4 at F4(10,9), the robots are able to search point-to-point paths to move to any pre-defined targets. In this simulation, as shown in Figure 8(b), Robot 1 goes back to its initial point G1(1,1); Robots 2, 3, and 4 move back to their initial points G2(1,18), G3(18,18), and G4(18,1), respectively. They may be pre-defined to move to any points based on demand (see [2]). The targets globally attract the robots across the whole workspace through neural activity propagation. This case has potential applications on sports fields such as basketball or volleyball courts: the four mobile robots can be assigned to clean the field together (e.g., mop sweat from the floor during a volleyball match) and then move back to their starting points during a break, without any human intervention.
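The point-to-point behaviour can be pictured with a small, hedged sketch. Here the goal's activity is propagated through free space with a breadth-first decay, a simple stand-in for the shunting dynamics of [2] rather than the authors' implementation, and a robot returns home by always stepping to the most active neighbour; all names below are illustrative.

```python
from collections import deque

def activity_field(w, h, goal, obstacles):
    """Propagate the goal's 'activity' through free cells.

    BFS with a multiplicative decay per step yields a landscape that,
    like the neural activity in the text, peaks at the goal and decays
    monotonically with travel distance around obstacles.
    """
    act = {goal: 1.0}
    frontier = deque([goal])
    while frontier:
        x, y = frontier.popleft()
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= n[0] < w and 0 <= n[1] < h
                    and n not in obstacles and n not in act):
                act[n] = act[(x, y)] * 0.9
                frontier.append(n)
    return act

def path_to_goal(start, goal, act):
    """Climb the activity gradient: each step moves to the most active neighbour."""
    p, path = start, [start]
    while p != goal:
        nbrs = [n for n in ((p[0] + 1, p[1]), (p[0] - 1, p[1]),
                            (p[0], p[1] + 1), (p[0], p[1] - 1)) if n in act]
        p = max(nbrs, key=act.get)
        path.append(p)
    return path

# a wall with a single gap forces the path around it
obstacles = {(5, y) for y in range(10) if y != 7}
act = activity_field(10, 10, goal=(9, 0), obstacles=obstacles)
path = path_to_goal((0, 0), (9, 0), act)
```

Because the propagated field decays monotonically with distance from the goal, gradient climbing can never get trapped, which mirrors the global-attraction property claimed in the text.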

Figure 8. CAC of four mobile robots in a sport field environment. (a) The robot paths when the four robots meet at the centre; (b) the entire robot paths after each robot returns to its home position.

#### 3.4. Cooperative coverage in an indoor environment with a robot failure

To verify the robustness and effectiveness of the proposed model, it is applied to a complicated case of cooperative coverage in an indoor environment, where there are seven sets of obstacles with different sizes and shapes in the workspace, as shown in Figure 9. Each neural network has 33 × 28 topologically organized neurons with zero initial neural activities. The model parameters are chosen as A = 50, B = 1, and D = 1 for the shunting equation; μ = 0.7 and r<sub>0</sub> = 2 for the lateral connections; and E = 100 for the external inputs. In this section, two simulations are performed as shown in Figure 9. First, the two robots work cooperatively [shown in Figure 9(a)]. Second, when Robot 2 fails at one position, Robot 1 is demonstrated to perform the rest of the work.
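As a rough illustration of how these parameters interact, the shunting dynamics can be iterated with a simple Euler step on a small grid. This is a hedged reconstruction of the standard shunting model used in this line of work (cf. [2], [60]) with an assumed step size dt = 0.01; it is a sketch, not the authors' exact implementation.

```python
import numpy as np

A, B, D, MU, R0, E = 50.0, 1.0, 1.0, 0.7, 2.0, 100.0  # parameters from the text

def shunting_step(x, I, dt=0.01):
    """One Euler step of
    dx_i/dt = -A x_i + (B - x_i)([I_i]^+ + sum_j w_ij [x_j]^+) - (D + x_i)[I_i]^-,
    with lateral weights w_ij = MU / |q_i - q_j| for neighbours within radius R0."""
    h, w = x.shape
    xp = np.pad(np.maximum(x, 0.0), 2)          # [x]^+ with zero borders
    excite = np.zeros_like(x)
    for di in range(-2, 3):
        for dj in range(-2, 3):
            d = float(np.hypot(di, dj))
            if 0.0 < d <= R0:
                excite += (MU / d) * xp[2 + di:2 + di + h, 2 + dj:2 + dj + w]
    return x + dt * (-A * x
                     + (B - x) * (np.maximum(I, 0.0) + excite)
                     - (D + x) * np.maximum(-I, 0.0))

# external inputs: +E for an uncovered cell, -E for an obstacle, 0 for covered cells
x = np.zeros((10, 10))
I = np.zeros((10, 10))
I[9, 9], I[4, 4] = E, -E
for _ in range(300):
    x = shunting_step(x, I)
# the uncovered cell becomes a peak, the obstacle stays locally negative,
# and positive activity leaks into the surrounding covered region
```

Because A is large relative to the lateral gain, propagated activity decays quickly away from each peak, which is what keeps uncovered cells globally attractive without creating spurious local maxima.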


Robot 1, symbolized by a solid circle, starts to move from the lower left corner S1(1,1), and Robot 2, symbolized by an empty circle, sweeps from the upper right corner S2(31,26), as illustrated in Figure 9. The real-time collision-free robot paths are shown in Figure 9(a), where the solid lines represent the paths of Robot 1 and the dashed lines those of Robot 2. These two robots cover the workspace cooperatively. Robot 1 reaches Position P1(8,8), while Robot 2 attains Position P2(25,20), as shown in Figure 10(a). Continuing to work cooperatively, the two robots approach complete coverage of the entire workspace, as uncovered areas globally attract them to visit. The neural activity landscape of the neural networks illustrates that these robots march on to achieve CAC when Robot 1 arrives at Position Q1(16,13) and Robot 2 at Position Q2(18,8) in Figure 10(b). It shows that these two cleaning robots are capable of autonomously and cooperatively sweeping the whole workspace. Not only can they sweep along curved paths to avoid the irregularly shaped obstacles, but they are also able to avoid collisions with each other.


Figure 9. Complete cooperative coverage in an indoor environment with (a) both robots functioning properly; (b) a robot failure at Position F2(26,26).

Figure 10. The neural activity landscape of the neural networks for the unstructured environment case when Robot 1 reaches (a) Position P1(8,8); (b) Position Q1(16,13).


Figure 11. The neural activity landscape of the neural networks for the indoor environment case when Robot 2 fails at F2(26,26) and Robot 1 reaches (a) Position A1(11,19); (b) B1(24,4); (c) C1(25,13).



Now a simulation is performed to demonstrate the robustness of the proposed model. It is assumed that Robot 2 fails to work at Position F2(26,26), as shown in Figure 9(b). Robot 1 can continue to complete the rest of the coverage mission even though its partner, Robot 2, is stuck at Position F2(26,26) [see Figure 11(b)]. The neural activity landscapes of the neural networks when Robot 1 reaches Positions A1(11,19) and B1(24,4) are illustrated in Figure 11(a) and (b), respectively. Eventually, when Robot 1 arrives at Position C1(25,13), as shown in Figure 11(c), the CAC comes to an end. In fact, comparing Figure 9(a) with Figure 9(b), Robot 1 takes over eight columns of sweeping work from Robot 2. The neural activity landscape for the indoor environment case when Robot 2 fails at F2(26,26) illustrates the area coverage progress of the two robots (Figure 11). Although Robot 2 fails at Position F2(26,26), Robot 1 can still carry out the remaining coverage work. This simulation shows that the proposed approach is robust: complete coverage can be achieved as long as at least one robot is able to work.
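The fault-tolerance argument can be reproduced with a toy sketch that replaces the neural-network planner with a simple greedy rule (each robot steps toward its nearest uncovered cell); `cover` and its failure model are illustrative stand-ins, not the chapter's algorithm. The point it demonstrates is the same: coverage still completes as long as one robot keeps working, at the cost of extra steps.

```python
def cover(width, height, starts, fail=None):
    """Greedily cover an obstacle-free grid with several robots.

    `fail` is an optional (robot_index, step) pair: from that step on,
    the failed robot stays put and the others absorb its workload.
    Returns the number of steps needed to cover every cell.
    """
    uncovered = {(x, y) for x in range(width) for y in range(height)}
    positions = list(starts)
    for p in positions:
        uncovered.discard(p)
    step = 0
    while uncovered:
        step += 1
        for i, p in enumerate(positions):
            if fail is not None and i == fail[0] and step >= fail[1]:
                continue                      # this robot has failed
            target = min(uncovered, default=None,
                         key=lambda c: abs(c[0] - p[0]) + abs(c[1] - p[1]))
            if target is None:
                break                         # another robot finished the job
            dx = (target[0] > p[0]) - (target[0] < p[0])
            dy = 0 if dx else (target[1] > p[1]) - (target[1] < p[1])
            positions[i] = (p[0] + dx, p[1] + dy)
            uncovered.discard(positions[i])
    return step

steps_both = cover(8, 8, [(0, 0), (7, 7)])
steps_one_fails = cover(8, 8, [(0, 0), (7, 7)], fail=(1, 5))
# both runs terminate with full coverage; the failure run takes at least as long
```

Each move strictly reduces the distance to some uncovered cell, so the loop always terminates, which is the toy analogue of "complete coverage is achieved as long as at least one robot works."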

## 4. Conclusion

Multiple robots can cover an area more efficiently than a single robot. In this chapter, a biologically motivated neural network approach to cooperative area coverage by a multi-robot system is proposed, which is capable of autonomously accomplishing collision-free complete area coverage (CAC). The effectiveness of the presented paradigm has been discussed and demonstrated through case studies. Multiple robots can work together to achieve a common coverage goal efficiently and robustly.

It is practical to implement the proposed approach in autonomous area coverage, as no learning and no templates are required. Robustness and fault tolerance are ensured even if some robots fail, and the model algorithm is computationally simple. The robot path is generated without explicitly searching over the global free workspace or the collision paths, without explicitly optimizing any global cost functions, without any prior knowledge of the dynamic environment, without any templates, and without any learning procedures.

In the future, several research directions will be pursued. First, energy-driven multi-robot algorithms combined with deep reinforcement learning will be studied to obtain minimum-energy cleaning robots. Second, task allocation and the impact of the number of robots on the cleaning mission will be addressed. Third, implementation of the algorithm on an FPGA-based platform will be considered. Finally, SLAM and robot vision will be incorporated to make the cleaning algorithms more accurate.

## Author details


Chaomin Luo<sup>1</sup>\*, Gene Eu Jan<sup>2,4</sup>, Zhenzhong Chu<sup>3</sup> and Xinde Li<sup>5</sup>

\*Address all correspondence to: luoch@udmercy.edu

1 Department of Electrical and Computer Engineering, University of Detroit Mercy, Michigan, USA

2 Graduate Institute of Animation and Film Art, Tainan National University of the Arts, and Department of Electrical Engineering, National Taipei University, Taiwan

3 College of Information Engineering, Maritime University, Shanghai, China

4 Department of Electrical Engineering, National Taipei University, Taiwan

5 School of Automation, Southeast University, Nanjing, China


## References

[1] Lawitzky G. A navigation system for cleaning robots. Autonomous Robots. 2000;9:255-260

[2] Luo C, Yang SX. A bioinspired neural network for real-time concurrent map building and complete coverage robot navigation in unknown environments. IEEE Transactions on Neural Networks and Learning Systems. 2008;19(7):1279-1298

[3] Ortiz F, Pastor JA, Alvarez B, Iborra A, Ortega N, Rodriguez D, Concsa C. Robots for hull ship cleaning. In: Proceedings of the IEEE International Symposium on Industrial Electronics; 2007. pp. 2077-2082

[4] Wagner IA, Altshuler Y, Yanovski V, Bruckstein AM. Cooperative cleaners: A study in ant robotics. The International Journal of Robotics Research. 2008;27(1):127-151

[5] Oh JS, Park JB, Choi YH. Complete coverage navigation of clean robot based on triangular cell map. In: Proceedings of the IEEE International Symposium on Industrial Electronics; Pusan, South Korea; 2001. pp. 2089-2093

[6] Oh JS, Choi YH, Park JB, Zheng YF. Complete coverage navigation of cleaning robots using triangular-cell-based map. IEEE Transactions on Industrial Electronics. 2004;51(3):718-726

[7] Yasutomi F, Takaoka D, Yamada M, Tsukamoto K. Cleaning robot control. In: Proceedings of the IEEE International Conference on Robotics and Automation; Philadelphia, USA; 1988. pp. 1839-1841

[8] Gage DW. Randomized search strategies with imperfect sensors. In: Proceedings of SPIE, Mobile Robots VIII—The International Society for Optical Engineering; Boston, USA; 1994. pp. 270-279

[9] Najjaran H, Kircanski N. Path planning for a terrain scanner robot. In: Proceedings of the 31st International Symposium on Robotics; Montreal, Canada; 2000. pp. 132-137

[10] Ollis M, Stentz A. Vision-based perception for an automated harvester. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; Grenoble, France; 1997. pp. 1838-1844

[11] Ollis M, Stentz A. First results in vision-based crop line tracking. In: Proceedings of the IEEE International Conference on Robotics and Automation; Minneapolis, USA; 1996. pp. 951-956

[12] Farsi M, Ratcliff K, Johnson PJ, Allen CR, Karam KZ, Pawson R. Robot control system for window cleaning. In: Proceedings of the 11th International Symposium on Automation and Robotics in Construction; Brighton, UK; 1994. pp. 617-623

[13] Choset H. Coverage for robotics—A survey of recent results. Annals of Mathematics and Artificial Intelligence. 2001;31:113-126

[14] Hazon N, Kaminka GA. Redundancy, efficiency, and robustness in multi-robot coverage. In: Proceedings of the IEEE International Conference on Robotics and Automation; Barcelona, Spain; 2005. pp. 735-741

[15] Kurabayashi D, Ota J, Arai T, Ichikawa S, Koga S, Asama H, Endo I. Cooperative sweeping by multiple mobile robots with relocating portable obstacles. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; Osaka, Japan; 1996. pp. 1472-1477

[16] Kurabayashi D, Ota J, Arai T, Yoshida E. Cooperative sweeping by multiple mobile robots. In: Proceedings of the IEEE International Conference on Robotics and Automation; Minneapolis, USA; 1996. pp. 1744-1749

[17] Ferranti E, Trigoni N, Levene M. Brick & Mortar: An online multi-agent exploration algorithm. In: Proceedings of the IEEE International Conference on Robotics and Automation; Roma, Italy; 2007. pp. 761-767

[18] Hazon M, Mieli F, Kaminka GA. Towards robust on-line multi-robot coverage. In: Proceedings of the IEEE International Conference on Robotics and Automation; Orlando, USA; 2006. pp. 1710-1715

[19] Rekleitis IM, Dudek D, Milios EE. Multi-robot exploration of an unknown environment, efficiently reducing the odometry error. In: Proceedings of the 15th International Joint Conference on Artificial Intelligence; Nagoya, Japan; 1997. pp. 1340-1345

[20] Wagner IA, Lindenbaum M, Bruckstein AM. Distributed covering by ant-robots using evaporating traces. IEEE Transactions on Robotics and Automation. 1999;15(5):918-933

[21] Wagner IA, Lindenbaum M, Bruckstein AM. MAC versus PC: Determinism and randomness as complementary approaches to robotic exploration of continuous unknown domains. The International Journal of Robotics Research. 2000;19(1):12-31

[22] Kong CS, New AP, Rekleitis I. Distributed coverage with multi-robot system. In: Proceedings of the IEEE International Conference on Robotics and Automation; Orlando, USA; 2006. pp. 2423-2429

[23] Zheng X, Jain S, Koenig S, Kempe D. Multi-robot forest coverage. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; Edmonton, Canada; 2005. pp. 3852-3857

[24] Zheng X, Koenig S. Robot coverage of terrain with nonuniform traversability. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; San Diego, USA; 2007. pp. 3757-3764

[25] Agmon N, Hazon N, Kaminka GA. Constructing spanning trees for efficient multi-robot coverage. In: Proceedings of the IEEE International Conference on Robotics and Automation; Orlando, USA; 2006. pp. 1698-1703

[26] Balch T, Arkin R. Behavior-based formation control for multirobot teams. IEEE Transactions on Robotics and Automation. 1998;14(6):926-939

[27] Fang G, Dissanayake G, Lau H. A behaviour-based optimisation strategy for multi-robot exploration. In: Proceedings of the IEEE Conference on Robotics, Automation and Mechatronics; Singapore; 2004. pp. 875-879

[28] Jung D, Cheng G, Zelinsky A. Experiments in realising cooperation between autonomous mobile robots. In: Experimental Robotics V—The Fifth International Symposium on Experimental Robotics; Barcelona, Spain; 1997. pp. 609-620

[29] Jung D, Cheng G, Zelinsky A. Robot cleaning: An application of distributed planning and real-time vision. In: Zelinsky A, editor. Field and Service Robotics. New York, USA: Springer; 1998. pp. 187-194

[30] Jung D, Zelinsky A. An architecture for distributed cooperative planning in a behaviour-based multi-robot system. Robotics and Autonomous Systems. 1999;26:149-174

[31] Wagner IA, Lindenbaum M, Bruckstein AM. Smell as a computational resource—A lesson we can learn from the ant. In: Proceedings of the Fourth Israel Symposium on Theory of Computing and Systems; Jerusalem, Israel; 1996. pp. 219-230

[32] Williams K, Burdick J. Multi-robot boundary coverage with plan revision. In: Proceedings of the IEEE International Conference on Robotics and Automation; Orlando, USA; 2006. pp. 1716-1723

[33] Burgard W, Moors M, Stachniss C, Schneider FE. Coordinated multi-robot exploration. IEEE Transactions on Robotics. 2005;21(3):376-386

[34] Gossage M, New AP, Cheng CK. Frontier-graph exploration for multi-robot systems in an unknown indoor environment. In: Gini M, Voyles R, editors. Distributed Autonomous Robotic Systems. Japan: Springer; 2006. pp. 51-60

[35] Ko J, Stewart B, Fox D, Konolige K, Limketkai B. A practical, decision-theoretic approach to multi-robot mapping and exploration. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; Las Vegas, USA; 2003. pp. 3232-3238

[36] Yamauchi B. Frontier-based exploration using multiple robots. In: Proceedings of the 2nd International Conference on Autonomous Agents; Minneapolis, USA; 1998. pp. 47-53

[37] Yamauchi B. Decentralized coordination for multirobot exploration. Robotics and Autonomous Systems. 1999;29(2–3):111-118

[38] Butler Z. Distributed coverage of rectilinear environments [PhD dissertation]. Carnegie Mellon University; 2000

[39] Butler ZJ, Rizzi AA, Hollis RL. Cooperative coverage of rectilinear environments. In: Proceedings of the IEEE International Conference on Robotics and Automation; San Francisco, USA; 2000. pp. 2722-2727

[40] Latimer IV D, Srinivasa S, Lee-Shue V, Sonne S, Choset H, Hurst A. Toward sensor based coverage with robot teams. In: Proceedings of the IEEE International Conference on Robotics and Automation; Washington, DC, USA; 2002. pp. 961-967

[41] Rekleitis I, Lee-Shue V, New AP, Choset H. Limited communication, multi-robot team based coverage. In: Proceedings of the IEEE International Conference on Robotics and Automation; New Orleans, USA; 2004. pp. 3462-3468

[42] Gabriely Y, Rimon E. Spanning-tree based coverage of continuous areas by a mobile robot. Annals of Mathematics and Artificial Intelligence. 2001;31(1–4):77-98

[43] Chen J, Li LR. Path planning protocol for collaborative multi-robot systems. In: Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation; 2005. pp. 721-726

[44] Chueh M, Yeung YLWA, Lei K-PC, Joshi SS. Following controller for autonomous mobile robots using behavioral cues. IEEE Transactions on Industrial Electronics. 2008;55(8):3124-3132

[45] Zavlanos MM, Pappas GJ. Dynamic assignment in distributed motion planning with local coordination. IEEE Transactions on Robotics. 2008;24(1):232-242

[46] Ribas L, Izquierdo M, Mujal J, Ramon E. A prototype of a low-cost, downsizeable, dynamically reconfigurable unit for robot swarms. In: Proceedings of the IEEE International Symposium on Industrial Electronics; 2007. pp. 2167-2172

[47] Boonpinon N, Sudsang A. Constrained coverage for heterogeneous multi-robot team. In: Proceedings of the IEEE International Conference on Robotics and Biomimetics; Sanya, China; 2007. pp. 799-804

[48] Schwager M, Slotine JJ, Rus D. Consensus learning for distributed coverage control. In: Proceedings of the IEEE International Conference on Robotics and Automation; Pasadena, USA; 2008. pp. 1042-1048

[49] Huang S-J, Lian R-J. A hybrid fuzzy logic and neural network algorithm for robot motion control. IEEE Transactions on Industrial Electronics. 1997;44(3):408-417

[50] Su Z, Zeng B, Liu G, Ye F, Xu M. Application of fuzzy neural network in parameter optimization of mobile robot path planning using potential field. In: Proceedings of the IEEE International Symposium on Industrial Electronics; 2007. pp. 2125-2128

[51] Arleo A, Smeraldi F, Gerstner W. Cognitive navigation based on nonuniform Gabor space sampling, unsupervised growing networks, and reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems. 2004;15(3):639-652

[52] Er MJ, Deng C. Obstacle avoidance of a mobile robot using hybrid learning approach. IEEE Transactions on Industrial Electronics. 2005;52(1):320-326

[53] Hodgkin AL, Huxley AF. A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology (London). 1952;117:500-544

[54] Grossberg S. Nonlinear neural networks: Principles, mechanisms, and architectures. Neural Networks. 1988;1:17-61

[55] Svennebring J, Koenig S. Building terrain-covering ant robots. Autonomous Robots. 2004;16(3):313-332

[56] Xu WL, Pap J-S, Bronlund J. Design of a biologically inspired parallel robot for foods chewing. IEEE Transactions on Industrial Electronics. 2008;55(2):832-841

[57] Domenech JE, Regueiro CV, Gamallo C, Quintia P. Learning wall following behaviour in robotics through reinforcement and image-based states. In: Proceedings of the IEEE International Symposium on Industrial Electronics; 2007. pp. 2101-2106

[58] Lee C-Y, Lee J-J. Multiple neuro-adaptive control of robot manipulators using visual cues. IEEE Transactions on Industrial Electronics. 2005;52(3):898-905

[59] Grossberg S. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics. 1983;13(5):815-926

[60] Yang SX, Luo C. A neural network approach to complete coverage path planning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 2004;34(1):718-725

[61] Gutierrez-Osuna R, Janet JA, Luo RC. Modeling of ultrasonic range sensors for localization of autonomous mobile robots. IEEE Transactions on Industrial Electronics. 1998;45(4):654-662

[62] Ulusar UD, Akin HL. Design and implementation of a real time planning system for autonomous robots. In: Proceedings of the IEEE International Symposium on Industrial Electronics; 2006. pp. 74-79

[63] Fong T, Thorpe C, Baur C. Multi-robot remote driving with collaborative control. IEEE Transactions on Industrial Electronics. 2003;50(4):699-704

[64] Luo RC, Su KL. Multilevel multisensor-based intelligent recharging system for mobile robot. IEEE Transactions on Industrial Electronics. 2008;55(1):270-279

[65] Won WJ, Ban SW, Lee M. Real time implementation of a selective attention model for the intelligent robot with autonomous mental development. In: Proceedings of the IEEE International Symposium on Industrial Electronics; 2005. pp. 1309-1314

[66] Perez DP, Gomez ES, Quintas MM. Simultaneous localization and structure reconstruction of mobile robots with external cameras. In: Proceedings of the IEEE International Symposium on Industrial Electronics; 2005. pp. 1321-1326

**Chapter 2**

**Provisional chapter**

**A Modified Neuro-Fuzzy System Using Metaheuristic**

**A Modified Neuro-Fuzzy System Using Metaheuristic** 

The impact of innovated Neuro-Fuzzy System (NFS) has emerged as a dominant technique for addressing various difficult research problems in business. ANFIS (Adaptive Neuro-Fuzzy Inference system) is an efficient combination of ANN and fuzzy logic for modeling highly non-linear, complex and dynamic systems. It has been proved that, with proper number of rules, an ANFIS system is able to approximate every plant. Even though it has been widely used, ANFIS has a major drawback of computational complexities. The number of rules and its tunable parameters increase exponentially when the numbers of inputs are large. Moreover, the standard learning process of ANFIS involves gradient based learning which has prone to fall in local minima. Many researchers have used meta-heuristic algorithms to tune parameters of ANFIS. This study will modify ANFIS architecture to reduce its complexity and improve the accuracy of classification problems. The experiments are carried out by trying different types and shapes of membership functions and meta-heuristics Artificial Bee Colony (ABC) algorithm with ANFIS and the training error results are measured for each combination. The results showed that modified ANFIS combined with ABC method provides better training error results

**Keywords:** adaptive neuro-fuzzy inference system, meta-heuristics, classification

The recent advances in artificial intelligence and soft computing techniques have opened new avenues for researchers to explore their applications. These machine learning techniques consist

> © 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use,

distribution, and reproduction in any medium, provided the original work is properly cited.

DOI: 10.5772/intechopen.75575

**Approaches for Data Classification**

**Approaches for Data Classification**

Mohd Najib Mohd Salleh, Noureen Talpur and

Mohd Najib Mohd Salleh, Noureen Talpur and

Additional information is available at the end of the chapter

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75575

than common ANFIS model.

problem

**1. Introduction**

Kashif Hussain Talpur

Kashif Hussain Talpur

**Abstract**


#### **A Modified Neuro-Fuzzy System Using Metaheuristic Approaches for Data Classification A Modified Neuro-Fuzzy System Using Metaheuristic Approaches for Data Classification**

DOI: 10.5772/intechopen.75575

Mohd Najib Mohd Salleh, Noureen Talpur and Kashif Hussain Talpur Mohd Najib Mohd Salleh, Noureen Talpur and Kashif Hussain Talpur

Additional information is available at the end of the chapter Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75575

#### **Abstract**


The neuro-fuzzy system (NFS) has emerged as a dominant technique for addressing various difficult research problems in business. The adaptive neuro-fuzzy inference system (ANFIS) is an efficient combination of ANN and fuzzy logic for modeling highly non-linear, complex and dynamic systems. It has been proved that, with a proper number of rules, an ANFIS system is able to approximate any plant. Even though it has been widely used, ANFIS has a major drawback of computational complexity: the number of rules and its tunable parameters increase exponentially when the number of inputs is large. Moreover, the standard learning process of ANFIS involves gradient-based learning, which is prone to falling into local minima. Many researchers have used meta-heuristic algorithms to tune the parameters of ANFIS. This study modifies the ANFIS architecture to reduce its complexity and improve accuracy on classification problems. The experiments are carried out by trying different types and shapes of membership functions together with the meta-heuristic Artificial Bee Colony (ABC) algorithm, and the training error is measured for each combination. The results show that the modified ANFIS combined with the ABC method provides better training error than the common ANFIS model.

**Keywords:** adaptive neuro-fuzzy inference system, meta-heuristics, classification problem

## **1. Introduction**

The recent advances in artificial intelligence and soft computing techniques have opened new avenues for researchers to explore their applications. These machine learning techniques consist of several intelligent computing paradigms, including artificial neural networks (ANN), support vector machines (SVM), decision trees, and neuro-fuzzy systems (NFS), which have been successfully employed to model various real-world problems [1]. These problems range broadly from engineering to finance, geology and the bio-sciences.


Among the soft computing techniques mentioned above, ANFIS is an efficient combination of ANN and fuzzy logic for modeling highly non-linear, complex, and dynamic systems [2]. It has been proved that, with a proper number of rules, an ANFIS system is able to approximate any plant. ANFIS systems are therefore widely used and offer good applicability, since they can be interpreted both as non-linear models and via conventional linear techniques for state estimation and control [3]. Even though it has been widely used, ANFIS has a major drawback of computational complexity: the number of rules and its tunable parameters increase exponentially when the number of inputs is large. Moreover, the standard learning process of ANFIS involves gradient-based learning, which is prone to falling into local minima. The systems designed in the literature generally have few inputs, and ANFIS models with many inputs have not been implemented due to the curse of dimensionality; many researchers have instead used meta-heuristic algorithms to tune the parameters of ANFIS. This study modifies the ANFIS architecture to reduce its complexity. The proposed approach focuses on trying different types of membership functions, because ANFIS accuracy depends highly on the type and shape of its membership functions. This research proposes a solution for implementing ANFIS on problems with a large number of inputs; the problem of the curse of dimensionality is therefore addressed in this research.

This study is organized in the following order: Section 2 gives a comprehensive literature review and gap analysis. Section 3 presents the methodology of the research to address the gaps identified in the previous section. Section 4 explains the results and analysis based on the experiments performed in this study. Section 5 summarizes the whole research finding and future work.

## **2. Research background**

Recently, the neuro-fuzzy system has gained more attention among research communities than other types of fuzzy expert systems, since it combines the learning ability of a neural network with the reasoning ability of fuzzy logic to solve many non-linear and complex real-world problems with high accuracy. This combination has been broadly used in the areas of education, medicine, electrical systems, traffic control, image processing, and the prediction and control of linear and nonlinear systems [4]. Every technique has its own advantages and limitations: fuzzy logic is good at describing how it reaches a certain decision, but it cannot learn rules by itself, while neural networks are good at pattern matching but cannot give a clear picture of how they reach a certain decision. Adaptability is the main advantage of neural networks; they can adjust their weights automatically to optimize their behavior as pattern recognizers, decision makers, predictors, etc. In fuzzy expert systems, it is not an easy task to find appropriate membership function parameters and rules; it requires either a high level of expertise or a hard practice of trial and error [5]. These limitations are the vital reasons behind the inspiration of building hybrid networks, combining two techniques to overcome the limitations of each individual technique. Among those hybrid networks, the adaptive neuro-fuzzy inference system (ANFIS) is one of the best combinations of neural network and fuzzy logic. The next section explains the basic structure and working of ANFIS.

## **2.1. Adaptive neuro-fuzzy inference system (ANFIS)**


The concept of ANFIS, a proficient combination of neural network and fuzzy logic, was introduced by Jang in 1993. Since ANFIS is built on fuzzy logic, the fuzzy inference system (FIS), a useful soft computing technique introducing the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning, is central to it; therefore, the selection of the FIS is the major concern when designing an ANFIS. It is a neuro-fuzzy framework that can integrate human expertise as well as adapt itself through learning. As an adaptive neuro-fuzzy model, it has the advantage of being flexible, adaptive and effective for non-linear complex problems [6]. Recently, ANFIS has been successfully applied to applications involving classification, rule-based process control and pattern recognition.

ANFIS consists of five layers, and the nodes of each layer are connected to the next layer by directed links. To produce its output, each node performs a particular function on its incoming signals; hence ANFIS is often known as a multilayer feed-forward network. The main objective of ANFIS is to determine the optimum values of the equivalent fuzzy inference system parameters by applying a learning algorithm.

**Figure 1** shows complete ANFIS architecture. The five layers are: (i) fuzzification layer, (ii) product layer, (iii) normalized layer, (iv) defuzzification layer, (v) output layer. The two types of nodes are fixed (circle nodes) and adaptable (square nodes). Two fuzzy if-then rules with two inputs *x*, *y* and one output *f* are considered as:

*Rule 1*: *If x is A*<sub>1</sub> *and y is B*<sub>1</sub> *then f*<sub>1</sub> = *p*<sub>1</sub>*x* + *q*<sub>1</sub>*y* + *r*<sub>1</sub>

*Rule 2*: *If x is A*<sub>2</sub> *and y is B*<sub>2</sub> *then f*<sub>2</sub> = *p*<sub>2</sub>*x* + *q*<sub>2</sub>*y* + *r*<sub>2</sub>

*A*<sub>1</sub>, *A*<sub>2</sub>, *B*<sub>1</sub>, *B*<sub>2</sub> are fuzzy sets whose membership functions *μ*<sub>*Ai*</sub>, *μ*<sub>*Bi*</sub> are governed by non-linear parameters, whereas *p*<sub>1</sub>, *p*<sub>2</sub>, *q*<sub>1</sub>, *q*<sub>2</sub>, and *r*<sub>1</sub>, *r*<sub>2</sub> are linear design parameters identified during the training process. Parameters in the IF part are known as antecedent or premise parameters, whereas parameters in the THEN part are known as consequent parameters. Nodes of layer 1 (premise part) and layer 4 (consequent part) are trainable or adaptable nodes, while the nodes of layer 2 (product) and layer 3 (normalization) are fixed nodes. To execute the above two rules, the five-layer architecture of ANFIS is explained as follows:

## *2.1.1. Layer 1 (Fuzzification)*

This layer is an adaptive layer with square-shaped nodes. Each node *i* in this layer applies an adaptive membership function to its input to generate the membership degree of a linguistic variable. The membership function can be of any shape, e.g. triangular, trapezoidal, Gaussian, or generalized bell.

**Figure 1.** Basic ANFIS architecture.

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2 \qquad \text{or} \qquad O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4 \tag{1}$$

Here, *x* and *y* are the two inputs; if *μAi* and *μBi* are Gaussian MFs (Eq. 2), each is specified by two parameters, center *c* and width *σ*, which are referred to as premise parameters.

$$\text{Gaussian}(x; c, \sigma) = e^{-\frac{1}{2}\left(\frac{x - c}{\sigma}\right)^{2}} \tag{2}$$
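As a concrete illustration (a sketch, not code from the chapter), the Gaussian membership function of Eq. (2) can be written directly; the names `c` and `sigma` follow the premise parameters above.

```python
import math

def gaussian_mf(x, c, sigma):
    """Gaussian membership degree, Eq. (2): exp(-0.5 * ((x - c) / sigma) ** 2)."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

# At the center the membership degree is 1; it decays symmetrically with distance.
print(gaussian_mf(5.0, 5.0, 2.0))            # → 1.0
print(round(gaussian_mf(7.0, 5.0, 2.0), 4))  # exp(-0.5) → 0.6065
```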


*O*<sub>1,*i*</sub> is the output of the *i*th node in layer 1.

#### *2.1.2. Layer 2 (Product)*

These are fixed nodes that apply the product Π to calculate the firing strength of a rule. This layer accepts the membership values of the respective input variables from the first layer, and the output of each node is the product of all its incoming signals.

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \tag{3}$$

#### *2.1.3. Layer 3 (Normalization)*

Nodes of layer 3 are also fixed nodes, labeled *N*. Each node normalizes the firing strength of a rule from the previous layer by calculating the ratio of the *i*th rule's firing strength to the sum of all rules' firing strengths.

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{4}$$

where *w*¯<sub>*i*</sub> is referred to as the normalized firing strength of a rule.

#### *2.1.4. Layer 4 (Defuzzification)*

Nodes in this layer are adaptive with node function,

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i), \quad i = 1, 2 \tag{5}$$

where *w*¯<sub>*i*</sub> is the rule's normalized firing strength and {*p*<sub>*i*</sub>, *q*<sub>*i*</sub>, *r*<sub>*i*</sub>} is a first-order polynomial. *O*<sub>4,*i*</sub> represents the output of layer 4. The parameters in this layer are linear parameters, known as consequent parameters, which are identified during the training process of ANFIS.

## *2.1.5. Layer 5 (Overall output)*


This single node, labeled "∑," forms the output layer. It simply sums the outputs of all rules from the previous layer and converts the fuzzy result into a crisp output.

$$O_{5,i} = \sum_{i=1}^{2} \bar{w}_i f_i = \frac{\sum_{i=1}^{2} w_i f_i}{w_1 + w_2} \tag{6}$$
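To make the layer-by-layer data flow concrete, a minimal forward pass for the two-rule, two-input network of Eqs. (1)-(6) can be sketched as follows. The parameter values are illustrative assumptions, not taken from the chapter.

```python
import math

def gaussian_mf(x, c, sigma):
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Five-layer forward pass for two TSK rules (Eqs. 1-6).

    premise: Gaussian (c, sigma) pairs for the fuzzy sets A1, A2, B1, B2
    consequent: [(p1, q1, r1), (p2, q2, r2)]
    """
    # Layer 1 (fuzzification): membership degrees of x and y
    mu_A = [gaussian_mf(x, *premise["A1"]), gaussian_mf(x, *premise["A2"])]
    mu_B = [gaussian_mf(y, *premise["B1"]), gaussian_mf(y, *premise["B2"])]
    # Layer 2 (product): firing strength w_i = mu_Ai(x) * mu_Bi(y)
    w = [mu_A[i] * mu_B[i] for i in range(2)]
    # Layer 3 (normalization): w_bar_i = w_i / (w_1 + w_2)
    w_bar = [wi / sum(w) for wi in w]
    # Layer 4 (defuzzification): w_bar_i * (p_i x + q_i y + r_i)
    f = [p * x + q * y + r for (p, q, r) in consequent]
    o4 = [w_bar[i] * f[i] for i in range(2)]
    # Layer 5 (overall output): sum of rule contributions
    return sum(o4)

# Illustrative parameters: both rules fire equally at x = y = 0.5,
# so the output is the average of f1 = 1.0 and f2 = 2.0.
premise = {"A1": (0.0, 1.0), "A2": (1.0, 1.0), "B1": (0.0, 1.0), "B2": (1.0, 1.0)}
consequent = [(1.0, 1.0, 0.0), (2.0, 2.0, 0.0)]
print(anfis_forward(0.5, 0.5, premise, consequent))  # → 1.5
```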

#### **2.2. ANFIS learning algorithm**

ANFIS learns and updates all its modifiable parameters using a two-pass learning algorithm: a forward pass and a backward pass. ANFIS trains its parameters, *c*, *σ* (MF parameters) and *p*<sub>*i*</sub>, *q*<sub>*i*</sub>, *r*<sub>*i*</sub> (consequent parameters), to minimize the error between the actual and desired output, using a hybrid of gradient descent (GD) and the least-squares estimator (LSE). During the forward pass, the consequent parameters are updated by the LSE method and the signals are node outputs. During the backward pass, the premise parameters are updated by the GD algorithm and the error signals propagate backward from the output layer to the input layer. In this way, the network learns parameter values that sufficiently fit the training data (**Table 1**).
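The forward-pass LSE update can be viewed as a linear least-squares problem: with the normalized firing strengths held fixed, the output of Eq. (6) is linear in the consequent parameters, so they can be solved in closed form. A minimal NumPy sketch under assumed names (not the chapter's code):

```python
import numpy as np

# For each training pair (x, y) with target t, the two-rule output is
#   t ≈ w1_bar*(p1*x + q1*y + r1) + w2_bar*(p2*x + q2*y + r2),
# which is linear in theta = [p1, q1, r1, p2, q2, r2].
def lse_consequents(samples, w_bars, targets):
    """Solve for the consequent parameters by least squares.

    samples: iterable of (x, y) inputs
    w_bars:  iterable of (w1_bar, w2_bar) normalized firing strengths
    targets: desired outputs
    """
    rows = []
    for (x, y), (wb1, wb2) in zip(samples, w_bars):
        rows.append([wb1 * x, wb1 * y, wb1, wb2 * x, wb2 * y, wb2])
    A = np.array(rows)
    t = np.array(targets)
    theta, *_ = np.linalg.lstsq(A, t, rcond=None)
    return theta  # [p1, q1, r1, p2, q2, r2]
```

With more training samples than parameters, this recovers the error-minimizing consequents in one shot, which is why the hybrid scheme is faster than training all parameters by gradient descent alone.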

| | Forward pass | Backward pass |
|---|---|---|
| Antecedent parameters | Fixed | GD |
| Consequent parameters | LSE | Fixed |
| Signals | Node outputs | Error signals |

**Table 1.** Two pass hybrid learning algorithm for ANFIS.

#### **2.3. Data partitioning**

ANFIS can be constructed by partitioning the input-output data into rule patches. This can be accomplished using three methods: genfis1 (grid partitioning), genfis2 (subtractive clustering) and genfis3 (fuzzy C-means). Grid partitioning divides the data space into grids based on the number of membership functions per input. Generally, it is appropriate to use grid partitioning only for problems with fewer than six input variables, since the number of rules increases exponentially as the number of inputs in the underlying system grows [7]. To avoid this issue, clustering is an effective partitioning approach that divides data points into groups (clusters) according to membership grade or degree; for this reason, clustering-based methods tend to be preferred over grid partitioning. Subtractive clustering is an effective approach when there is no clear idea of how many clusters are needed for a given dataset; it reduces computation time by finding the cluster centers from the data itself [8]. The fuzzy C-means (FCM) algorithm allows one data point to belong to two or more clusters; FCM lets the membership value range from 0 to 1, which improves the result [9]. Furthermore, besides partitioning methods, many researchers have proposed ANFIS training methods based on meta-heuristic algorithms, which Section 2.4 discusses in more detail.
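The exponential rule growth under grid partitioning is easy to quantify: with *m* membership functions per input and *n* inputs, every combination of membership functions forms a rule, giving *m*<sup>*n*</sup> rules. A one-line arithmetic sketch (illustrative, not from the chapter):

```python
# Grid partitioning: every combination of membership functions forms a rule,
# so the rule count is m ** n for m MFs per input and n inputs.
def rule_count(m, n):
    return m ** n

print(rule_count(3, 2))   # 2 inputs, 3 MFs each → 9 rules
print(rule_count(3, 6))   # 6 inputs → 729 rules, already unwieldy
print(rule_count(3, 10))  # 10 inputs → 59049 rules
```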

## **2.4. Optimization using meta-heuristics**

In meta-heuristic algorithms, *meta* means "higher level"; all meta-heuristic algorithms use some trade-off of local search and global exploration. The main components of any meta-heuristic algorithm are exploitation and exploration. Exploration means generating diverse solutions to explore the search space on a global scale, while exploitation means focusing the search in a local region by exploiting the information that a good current solution was found in that region. As for ANFIS, structure learning and parameter identification are the two dimensions of ANFIS training. Some researchers have focused on one of the two dimensions, while others have tried to work on both issues. The original ANFIS proposed by Jang uses hybrid learning, which combines GD and LSE. However, the drawbacks of GD have led researchers to alternatives such as ant colony optimization, particle swarm optimization (PSO), the firefly algorithm (FFA), the cuckoo search algorithm (CSA), and artificial bee colony (ABC). Among these training methods in the meta-heuristic paradigm, this study employs the Artificial Bee Colony (ABC) optimization algorithm to train the parameters of the ANFIS model instead of a gradient-based learning mechanism.



## **2.5. Artificial Bee Colony algorithm**

Artificial Bee Colony (ABC) is a swarm intelligence-based algorithm inspired by the intelligent foraging behavior of honey bees. It was introduced by Dervis Karaboga in 2005 to solve optimization problems. Since then, the ABC algorithm has been used in fields such as data mining, image processing and numerical problems. As shown in **Figure 2**, ABC provides a population-based search in which the bees searching for food are divided into three groups: employed bees, onlooker bees and scout bees. Employed bees look around the search space to gather information about the position and quality of food sources, while onlooker bees stay in the hive and choose a food source based on the information given by the employed bees. Scout bees replace abandoned food sources by searching for new ones arbitrarily [21]. A food source's position represents a possible solution to the optimization problem, and the amount of nectar in a food source measures the quality of that solution.
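A minimal sketch of the ABC loop just described (employed, onlooker, and scout phases) applied to a toy objective; the structure follows the description above, while the function names, parameter values, and the objective are illustrative assumptions:

```python
import random

def abc_minimize(objective, dim, bounds, n_bees=10, limit=20, max_iter=100, seed=0):
    """Artificial Bee Colony: each food source is a candidate solution;
    the nectar amount (fitness) improves as the objective decreases."""
    rng = random.Random(seed)
    lo, hi = bounds
    foods = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bees)]
    costs = [objective(f) for f in foods]
    trials = [0] * n_bees

    def try_neighbor(i):
        # Perturb one coordinate toward/away from a random other food source.
        k = rng.randrange(n_bees)
        while k == i:
            k = rng.randrange(n_bees)
        j = rng.randrange(dim)
        cand = foods[i][:]
        cand[j] += rng.uniform(-1, 1) * (foods[i][j] - foods[k][j])
        cand[j] = min(max(cand[j], lo), hi)
        c = objective(cand)
        if c < costs[i]:                      # greedy selection
            foods[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(max_iter):
        for i in range(n_bees):               # employed bee phase
            try_neighbor(i)
        fits = [1.0 / (1.0 + c) for c in costs]
        total = sum(fits)
        for _ in range(n_bees):               # onlooker bee phase: fitness-
            r, acc, i = rng.uniform(0, total), 0.0, 0   # proportional selection
            for idx, f in enumerate(fits):
                acc += f
                if r <= acc:
                    i = idx
                    break
            try_neighbor(i)
        for i in range(n_bees):               # scout bee phase
            if trials[i] > limit:             # abandon an exhausted source
                foods[i] = [rng.uniform(lo, hi) for _ in range(dim)]
                costs[i] = objective(foods[i])
                trials[i] = 0
    best = min(range(n_bees), key=costs.__getitem__)
    return foods[best], costs[best]

# Toy use: minimize the sphere function, whose optimum is at the origin.
sol, cost = abc_minimize(lambda v: sum(x * x for x in v), dim=2, bounds=(-5, 5))
print(cost)  # small positive value near 0
```

In the ANFIS context, the "position" vector would hold the premise and consequent parameters, and the objective would be the training error; here a simple sphere function keeps the sketch self-contained.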


**Figure 2.** Flowchart of Artificial Bee Colony (ABC) algorithm.

#### **2.6. Comparative study of ANFIS**

approach that divides data space into grids based on the number of memberships function per input. Generally, it is appropriate to use grid partitioning only for problems with a less than six number of input variables. The number of rules increases exponentially when the number of inputs increases in underlying system [7]. To avoid this issue, the clustering approach is an effective partitioning method that divides data points into groups (clusters) according to the membership grade or degree. For this reason, clustering based methods seem to be preferred to the grid partitioning techniques. Subtractive clustering method is an effective clustering approach when there is no clear idea about how many clusters will be needed for a given dataset. It reduces the computation time by finding the center of clustering by using the data itself [8]. Fuzzy C-Mean (FCM) algorithm allows one data to belong to two or more clusters. Thus, FCM let membership function value be in the range of 0 until 1 which certainly improve the result [9]. Furthermore, other than partitioning methods, to train ANFIS parameters, many researchers have proposed ANFIS training methods based on meta-heuristic algorithms.

The following section discusses meta-heuristic algorithms in more detail.

**2.4. Optimization using meta-heuristics**

In meta-heuristic algorithms, *meta* means "higher level." All meta-heuristic algorithms use some trade-off between local search and global exploration, and their two main components are exploitation and exploration. Exploration means generating diverse solutions to probe the search space on a global scale, while exploitation means concentrating the search in a local region by using the information that a good current solution has been found there. For ANFIS, structure learning and parameter identification are the two dimensions of training; some studies focus on one of these dimensions, while others address both. The original ANFIS proposed by Jang uses hybrid learning, combining gradient descent (GD) and least-squares estimation (LSE). However, the drawbacks of GD have led researchers to alternatives such as ant colony optimization, particle swarm optimization (PSO), the firefly algorithm (FFA), the cuckoo search algorithm (CSA), and artificial bee colony (ABC). Among these meta-heuristic training methods, this study employs the Artificial Bee Colony (ABC) optimization algorithm to train the parameters of the ANFIS model instead of the gradient-based learning mechanism.

**2.5. Artificial Bee Colony algorithm**

Artificial Bee Colony (ABC) is a swarm-intelligence algorithm inspired by the intelligent foraging behavior of honey bees. It was introduced by Dervis Karaboga in 2005 to solve optimization problems and has since been applied in fields such as data mining, image processing, and numerical optimization. As shown in **Figure 2**, ABC performs a population-based search in which the bees searching for food are divided into three groups: employed bees, onlooker bees, and scout bees. Employed bees search the space for food sources and gather information about their position and quality, while onlooker bees stay in the hive and choose a food source based on the information shared by the employed bees. Scout bees replace abandoned employed bees and search for new food sources at random [21]. A food source's position represents a possible solution to the optimization problem, and the amount of nectar in the food source measures the quality of that solution.
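The employed/onlooker/scout cycle can be sketched as a minimal ABC loop for a generic continuous objective. The colony size, trial limit, iteration count, and the sphere test function below are illustrative choices, not settings from this study.

```python
import random

def abc_minimize(f, dim, bounds, n_food=10, limit=20, iters=100, seed=1):
    """Minimal Artificial Bee Colony sketch: employed, onlooker, scout phases."""
    rng = random.Random(seed)
    lo, hi = bounds
    foods = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_food)]
    fits = [f(x) for x in foods]
    trials = [0] * n_food  # how long each food source has failed to improve

    def neighbor(i):
        # Perturb one dimension relative to a random other food source.
        k, j = rng.randrange(n_food), rng.randrange(dim)
        x = foods[i][:]
        x[j] += rng.uniform(-1.0, 1.0) * (foods[i][j] - foods[k][j])
        x[j] = min(max(x[j], lo), hi)
        return x

    def try_improve(i):
        x = neighbor(i)
        fx = f(x)
        if fx < fits[i]:
            foods[i], fits[i], trials[i] = x, fx, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_food):              # employed bee phase
            try_improve(i)
        total = sum(1.0 / (1.0 + ft) for ft in fits)
        for _ in range(n_food):              # onlooker bees: fitness-proportional choice
            r, acc = rng.uniform(0.0, total), 0.0
            chosen = 0
            for i, ft in enumerate(fits):
                acc += 1.0 / (1.0 + ft)
                if acc >= r:
                    chosen = i
                    break
            try_improve(chosen)
        for i in range(n_food):              # scouts replace exhausted sources
            if trials[i] > limit:
                foods[i] = [rng.uniform(lo, hi) for _ in range(dim)]
                fits[i] = f(foods[i])
                trials[i] = 0

    best = min(range(n_food), key=lambda i: fits[i])
    return foods[best], fits[best]

# Sphere function: minimum value 0 at the origin.
best_x, best_f = abc_minimize(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))
```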

The successful integration of neural networks and fuzzy logic in the form of ANFIS offers the advantage of solving applications that are highly non-linear and complex. Owing to the robustness of its results, ANFIS has been implemented in a wide variety of applications, including classification tasks, rule-based process control, and pattern recognition [4]. Although ANFIS is one of the best trade-offs between neural networks and fuzzy systems, providing smoothness and adaptability through fuzzy logic interpolation and neural network back-propagation, the model suffers from strong computational complexity.

Generally, the literature emphasizes three main approaches to reducing the computational complexity of ANFIS: reducing the rule base, developing efficient training methods, and selecting suitable membership functions. A significant increase in rules raises the complexity of the ANFIS architecture, since rules are generated from all possible combinations of antecedents and the quality of the result depends on the effectiveness of these rules [10]. A careful study of the literature reveals many techniques for rule-base minimization and accuracy maximization, such as selecting potential rules and removing non-potential rules from the ANFIS knowledge base with a Karnaugh map (K-Map) [11]. Beyond rule-base minimization, training the parameters of the ANFIS model is one of the main issues encountered when the model is applied to real-world problems. The original ANFIS architecture introduced by Jang has drawbacks, since its hybrid learning algorithm combines GD and LSE, and the use of GD makes it likely to become trapped in local optima [6]. To cope with this, many researchers have proposed meta-heuristic algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony (ABC). Similarly, a new hybrid method was introduced by Orouskhani and Mansouri [12], who modified Cat Swarm Optimization to train the parameters of ANFIS by updating the antecedent parameters. Karaboga and Kaya [13] proposed an Adaptive and Hybrid Artificial Bee Colony (*a*ABC) algorithm to train all parameters of ANFIS, and Najafi [14] employed PSO with ANFIS to optimize and train parameters for predicting the viscosity of mixed oils.

Along with the rule-base and training-method issues, ANFIS also suffers from uncertainty. Techniques like fuzzy logic have therefore been applied, because fuzzy logic describes uncertainty with the help of membership functions [15]. Since ANFIS also uses fuzzy logic, the correct choice of membership functions is a most important factor in building the ANFIS model. Various studies on the choice of membership function in fuzzy inference systems can be found in the literature: Suntae [16] compared membership functions in a security robot system for decision making, Saha [17] investigated the best membership function for sign-language applications, and Adil [18] compared the effects of different types of membership functions on the performance of a fuzzy logic controller. For neuro-fuzzy systems like ANFIS, however, such studies are scarce.


A Modified Neuro-Fuzzy System Using Metaheuristic Approaches for Data Classification

http://dx.doi.org/10.5772/intechopen.75575



As ANFIS accuracy is highly dependent on the type and shape of the membership function, this study tries different types and shapes of membership functions to find the one best suited to the ANFIS model. Additionally, this study modifies the standard ANFIS architecture by reducing the fourth layer, since the consequent part of the rules contains the larger number of parameters, and applies a suitable meta-heuristic approach to tune the parameters of the ANFIS model. The final model is intended for datasets with a large number of inputs. The proposed approach is explained in the next section, which defines the research methodology.
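Two of the membership-function shapes compared later can be sketched as follows; the parameter values are illustrative. Note that the Gaussian shape needs only two tunable parameters (width and center), while the triangular shape needs three, which matters for the parameter counts discussed below.

```python
import math

def gaussmf(x, sigma, c):
    """Gaussian membership function: 2 tunable parameters (sigma, c)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def trimf(x, a, b, c):
    """Triangular membership function: 3 tunable parameters (a <= b <= c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

grade_at_center = gaussmf(1.0, sigma=0.5, c=1.0)   # full membership at the center
grade_at_peak = trimf(2.0, a=1.0, b=2.0, c=3.0)    # full membership at the peak
```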

## **3. Research methodology**

The research methodology starts by collecting classification datasets. This research solves six real-world classification problems (Iris, Breast Cancer, Car Evaluation, Teacher Assistant, Glass Identification, and Seeds) with small to large numbers of inputs (4–10), taken from the University of California Irvine Machine Learning Repository (UCIMLR). For splitting the data into training and testing sets, most researchers in the literature [19, 20] use a 70:30 ratio (70% training, 30% testing), because the more data applied to training, the more optimal and accurate the results a system generates. Therefore, in this study 70% of the dataset instances were selected for the training set and the remaining 30% for the testing set.
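The 70:30 split described above can be sketched as follows; the shuffle seed and the 150-instance example (the size of the Iris dataset) are illustrative.

```python
import random

def split_70_30(instances, seed=0):
    """Shuffle a dataset's instances and split them 70% train / 30% test."""
    idx = list(range(len(instances)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(instances))
    train = [instances[i] for i in idx[:cut]]
    test = [instances[i] for i in idx[cut:]]
    return train, test

# Iris has 150 instances: 105 go to training, 45 to testing.
train, test = split_70_30(list(range(150)))
```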

Right after collecting, preprocessing, and partitioning the selected data, the modification phase examines different types and shapes of membership functions in the standard ANFIS architecture while solving the six classification problems taken from the UCI repository. The ANFIS architecture is then modified to lessen its computational complexity, because the basic architecture contains five layers, uses GD, and consumes a great deal of computation time updating parameters when their number is large. The collected datasets are then applied to the modified ANFIS architecture to compare and validate results between the standard and modified architectures. The outcome of this research is the modified ANFIS architecture stated in the proposed approach (**Figure 3**: Research Framework).

#### **3.1. Proposed approach**


Among the various clustering techniques available, grid partitioning (genfis1), subtractive clustering (genfis2), and fuzzy C-Mean clustering (genfis3) are the most widely used for building a fuzzy inference system (FIS), which acts as a model reflecting the relationship between the different input parameters. The comparative analysis and literature review show that these clustering approaches are very useful for obtaining better accuracy from the ANFIS model, since they generate the maximum number of rules by considering all possibilities; however, this also increases computational cost, because the consequent part of the rules contains the larger number of parameters. The fourth layer, which holds the linear coefficients, therefore accounts for most of the computational cost of the training algorithm, as the consequent part holds more trainable parameters than the premise part. The following example counts the trainable parameters held by the premise and consequent parts:

$$F(n, m, p) = n \times m \times p + m^n \times (n+1)$$

Here *n* represents the number of inputs, each input is partitioned into *m* membership functions, and the number of parameters to be trained in each membership function is labeled *p*. Accordingly, the total number of premise trainable parameters is *n* × *m* × *p*. Similarly, there are *n* + 1 consequent trainable parameters per rule and *m<sup>n</sup>* rules in the system, so the total number of consequent trainable parameters is *m<sup>n</sup>* × (*n* + 1).
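The parameter count above can be checked directly; this is a straightforward transcription of the formula, with the chapter's own example values.

```python
def trainable_params(n, m, p):
    """ANFIS trainable-parameter count: (premise, consequent, total).

    premise    = n * m * p        (p parameters per membership function)
    consequent = m**n * (n + 1)   (n + 1 coefficients per rule, m**n rules)
    """
    premise = n * m * p
    consequent = m ** n * (n + 1)
    return premise, consequent, premise + consequent

# The chapter's example: 4 inputs, 2 triangular MFs (p = 3) per input.
premise, consequent, total = trainable_params(n=4, m=2, p=3)
```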

**Figure 3.** Standard architecture of ANFIS with function associated to each layer.

For instance, if the dataset applied to the ANFIS model has *n* = 4 inputs, *m* = 2 membership functions per input, and triangular membership functions with *p* = 3 parameters, the numbers of premise and consequent trainable parameters are as follows:

*F*(*n*, *m*, *p*) = *n* × *m* × *p* + *m<sup>n</sup>* × (*n* + 1)

*F*(4, 2, 3) = 4 × 2 × 3 + 2<sup>4</sup> × (4 + 1)

*F*(4, 2, 3) = 24 + 80 = 104 (premise + consequent = total trainable parameters)

Thus, of the 104 total trainable parameters, 24 are premise parameters and 80 are consequent parameters, far more than the premise parameters. Removing the fourth layer may therefore reduce computation, and the ANFIS architecture can be reduced to four layers. **Figure 3** shows the standard five-layer ANFIS architecture with the function associated with each layer. This research modifies the ANFIS architecture by removing the fourth layer, which contains adaptive nodes and holds the most parameters to update. As observed in **Figure 3**, the fourth layer computes the following node function:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i), \quad \text{where } f_i = p_i x + q_i y + r_i, \; i = 1, 2$$

Here *w̄<sub>i</sub>* is the rule's normalized firing strength from the third layer and {*p<sub>i</sub>*, *q<sub>i</sub>*, *r<sub>i</sub>*} is the parameter set of the first-order polynomial *f<sub>i</sub>*. The parameters in the fourth layer are referred to as consequent parameters and are identified during the training process of ANFIS. This research removes the two extra parameters *p<sub>i</sub>* and *q<sub>i</sub>* from the fourth-layer function *f<sub>i</sub>* = *p<sub>i</sub>x* + *q<sub>i</sub>y* + *r<sub>i</sub>*.

Reducing *f<sub>i</sub>* to *f<sub>i</sub>* = *r<sub>i</sub>* removes two trainable parameters per rule; the third layer is then made adaptive by merging the fourth-layer function into it:

$$O_{3,i} = \bar{w}_i \times f_i(r_i) \tag{7}$$
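The reduction can be illustrated numerically: when *p<sub>i</sub>* and *q<sub>i</sub>* are dropped (equivalently, set to zero), the standard fourth-layer node and the merged third-layer node produce the same outputs. The firing strengths and *r<sub>i</sub>* values below are illustrative.

```python
def standard_layer4(w_bar, p, q, r, x, y):
    """Standard ANFIS: per-rule output w_bar_i * (p_i*x + q_i*y + r_i)."""
    return [wb * (pi * x + qi * y + ri) for wb, pi, qi, ri in zip(w_bar, p, q, r)]

def merged_layer3(w_bar, r):
    """Modified ANFIS: consequent reduced to f_i = r_i, merged into layer 3."""
    return [wb * ri for wb, ri in zip(w_bar, r)]

# With p_i = q_i = 0 the two formulations coincide (illustrative values).
w_bar, r = [0.6, 0.4], [1.0, 2.0]
std = standard_layer4(w_bar, [0.0, 0.0], [0.0, 0.0], r, x=0.3, y=0.7)
mod = merged_layer3(w_bar, r)
```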


The resultant ANFIS is thus a simpler modified architecture comprising four layers in total and fewer trainable parameters. **Figure 4** shows the proposed approach to modifying the ANFIS architecture to make it simpler and less complex.

According to the modified architecture, the total number of parameters trained by the system is reduced. Taking the same example as for the standard ANFIS model, if the dataset has *n* = 4 inputs, *m* = 2 membership functions per input, and Gaussian membership functions with *p* = 2 parameters, the numbers of premise and consequent trainable parameters are as follows:


**Figure 4.** Proposed approach to modify ANFIS architecture.


*F*(*n*, *m*, *p*) = *n* × *m* × *p* + *m<sup>n</sup>* × 1

*F*(4, 2, 2) = 4 × 2 × 2 + 2<sup>4</sup> × 1

*F*(4, 2, 2) = 16 + 16 = 32 (premise + consequent = total trainable parameters)

Hence, of the 32 total trainable parameters, 16 are premise parameters and 16 are consequent parameters; the consequent parameters in the modified ANFIS model are far fewer than in the standard ANFIS model.
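The saving can be checked with the two counting formulas side by side; the example reuses the chapter's values (triangular MFs with *p* = 3 for the standard model, Gaussian MFs with *p* = 2 for the modified one).

```python
def standard_total(n, m, p):
    """Standard ANFIS: premise + full first-order consequent parameters."""
    return n * m * p + m ** n * (n + 1)

def modified_total(n, m, p):
    """Modified ANFIS: consequent reduced to a single r_i per rule."""
    return n * m * p + m ** n

# 4 inputs, 2 MFs each; Gaussian MFs have p = 2 parameters, triangular p = 3.
saved = standard_total(4, 2, 3) - modified_total(4, 2, 2)
```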

After modifying the architecture, datasets with small to large numbers of inputs are applied to the modified ANFIS to compare computation time and efficiency, using as performance validation criteria the MSE, percentage accuracy, number of trainable parameters, and number of epochs.

Furthermore, the training algorithm also plays an important role in training the ANFIS model. As observed in the literature review, meta-heuristic algorithms such as ant colony optimization, particle swarm optimization (PSO), the firefly algorithm (FFA), the cuckoo search algorithm (CSA), and artificial bee colony (ABC) have recently been developed and used to train the ANFIS model. Among these, this study employs the ABC optimization algorithm to train the parameters of the ANFIS model instead of a gradient-based learning mechanism. Therefore, this study not only reduces the computational complexity of the ANFIS architecture but also applies an efficient, meta-heuristic swarm-intelligence-based mechanism to train the ANFIS parameters.


| Dataset | RMSE and rank | trapmf (genfis1) | gbellmf (genfis1) | trimf (genfis1) | gaussmf (genfis1) | gaussmf (genfis2) | gaussmf (genfis3) |
|---|---|---|---|---|---|---|---|
| D1 | Train. RMSE | 0.0361 | 0.0244 | 0.0102 | 0.0629 | 0.0551 | 0.1552 |
| | Test RMSE | 0.6219 | 0.4111 | 0.1983 | 0.2195 | 0.1234 | 0.3461 |
| | Rank | 6 | 4 | 2 | 3 | 1 | 5 |
| D2 | Train. RMSE | 0.5186 | 0.4909 | 0.5090 | 0.4317 | 0.1021 | 0.4162 |
| | Test RMSE | 2.2147 | 1.5992 | 4.7986 | 1.1171 | 1.8323 | 1.8443 |
| | Rank | 5 | 3 | 6 | 1 | 2 | 4 |
| D3 | Train. RMSE | 0.1714 | 0.1714 | 0.1769 | 0.1758 | 0.1283 | 0.1872 |
| | Test RMSE | 0.2826 | 0.2432 | 0.2225 | 0.2216 | 0.5201 | 0.2185 |
| | Rank | 5 | 4 | 2 | 1 | 6 | 3 |
| D4 | Train. RMSE | 0.0274 | 0.0158 | 0.0555 | 0.0158 | 7.7889e-05 | 0.0297 |
| | Test RMSE | 2.1954 | 0.9718 | 1.3917 | 6.6173 | 0.7233 | 0.5780 |
| | Rank | 5 | 3 | 4 | 6 | 2 | 1 |
| D5 | Train. RMSE | 0.1154 | 0.1065 | 0.1065 | 0.1065 | 0.1065 | 0.1442 |
| | Test RMSE | 2.1406 | 1.4741 | 0.9554 | 1.4568 | 0.5469 | 2.9828 |
| | Rank | 5 | 4 | 2 | 3 | 1 | 6 |
| D6 | Train. RMSE | 0.0613 | 0.0443 | 2.4258 | 0.1074 | 0.4597 | 0.3135 |
| | Test RMSE | 31.4304 | 206.4092 | 2.4367 | 29.1106 | 2.0372 | 2.6016 |
| | Rank | 5 | 6 | 3 | 4 | 1 | 2 |
| | Avg. Train. RMSE | 0.1555 | 0.1422 | 0.5473 | 0.1500 | 0.1420 | 0.2077 |
| | Avg. Test RMSE | 6.4809 | 35.1848 | 1.6672 | 6.4571 | 0.9639 | 1.4285 |
| | Avg. Rank | 5.1667 | 4 | 3.1667 | 3 | 2.1667 | 3.5 |
| | **Overall Rank** | **6** | **5** | **3** | **2** | **1** | **4** |

**Table 3.** Simulation results of membership functions.


## **4. Experimental results and discussions**

To achieve the objectives of the research, this section discusses and evaluates the experimental results in detail, following the research framework explained in the previous section. Firstly, experiments were performed with the ANFIS models genfis1, genfis2, and genfis3 to investigate the most suitable membership function for the input parameters of the ANFIS model. Secondly, the standard five-layer ANFIS model was modified, employing the Artificial Bee Colony (ABC) optimization algorithm to train its parameters instead of the gradient-based learning mechanism, thereby reducing the computational complexity of the ANFIS architecture. The performance of the proposed modified ANFIS model was compared with the three clustering-based ANFIS models (genfis1, genfis2, and genfis3) using MSE, percentage accuracy, and the number of trainable parameters together with the number of epochs as measurement criteria. The simulations solve classification problems on six benchmark datasets, namely Iris, Teacher Assistant Evaluation, Car Evaluation, Seeds, Breast Cancer, and Glass Identification, taken from the University of California Irvine Machine Learning Repository (UCIMLR) and ranging from 4 to 10 input attributes.
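The two scalar criteria named above, MSE and percentage accuracy, can be sketched as follows; the toy targets and predictions are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error over paired targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy_pct(y_true, y_pred_labels):
    """Percentage of correctly classified instances."""
    hits = sum(1 for t, p in zip(y_true, y_pred_labels) if t == p)
    return 100.0 * hits / len(y_true)

m = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.5])        # (0 + 0.25 + 0.25) / 3
a = accuracy_pct([0, 1, 1, 0], [0, 1, 0, 0])     # 3 of 4 correct
```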

## **4.1. Membership function**

To evaluate the effect of membership functions on ANFIS performance, the Fuzzy Logic Toolbox™ in MATLAB was used to build the ANFIS models, with the clustering methods genfis1, genfis2, and genfis3 generating the input FIS. Most ANFIS settings were left at the toolbox defaults; the distinguishing changes are presented in **Table 2**.

As ANFIS with the grid-partitioning method (genfis1) allows different types of membership functions to be tried, four basic membership function types were tested with genfis1, whereas subtractive clustering (genfis2) and fuzzy c-mean clustering (genfis3) use Gaussian membership functions by default in the MATLAB toolbox. The performance of ANFIS with the different partitioning methods and membership functions is evaluated using the Root Mean Square Error (RMSE). The membership functions have been ranked from smallest to largest (rank 1–6) according to the sum of training and testing RMSE for each dataset, and the average of these ranks is computed to generate an overall rank across the group of membership functions. **Table 3** presents the final comparison in terms of average train and test RMSE, average rank, and overall rank.
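The ranking procedure, sum of training and testing RMSE per dataset, then rank 1–6 from smallest to largest, can be sketched as follows. The numbers reuse the D1 row of Table 3 to show the procedure reproducing the published ranks.

```python
def rank_models(train_rmse, test_rmse):
    """Rank models 1..k by the sum of training and testing RMSE (1 = best)."""
    totals = [tr + te for tr, te in zip(train_rmse, test_rmse)]
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    ranks = [0] * len(totals)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

# D1 row of Table 3, column order: trapmf, gbellmf, trimf, gaussmf (genfis1),
# gaussmf (genfis2), gaussmf (genfis3).
train = [0.0361, 0.0244, 0.0102, 0.0629, 0.0551, 0.1552]
test = [0.6219, 0.4111, 0.1983, 0.2195, 0.1234, 0.3461]
ranks = rank_models(train, test)
```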

The overall experimental results in **Table 3** show that, among the three ANFIS models (genfis1, genfis2, and genfis3) and the membership functions tested, ANFIS with subtractive clustering (genfis2) and a Gaussian membership function performed best on all classification datasets. Similarly, for ANFIS with the grid-partitioning method (genfis1), the Gaussian membership function achieved the best RMSE compared with the three other shapes (trapezoidal, bell, and triangular), because the Gaussian membership function draws a smooth curve that represents the data points effectively with minute differences. Therefore, this experiment shows that the Gaussian membership function is the best option to employ with the ANFIS model for solving classification problems.

| ANFIS | Membership function type | Number of membership functions | Epochs | Error tolerance |
|---|---|---|---|---|
| genfis1 | trapmf, gbellmf, trimf, gaussmf | 2 | 200 | 0 |
| genfis2 | gaussmf | 10 | 200 | 0 |
| genfis3 | gaussmf | 10 | 200 | 0 |

**Table 2.** ANFIS settings for membership functions.

**Table 4** shows the simulation results for the performance of the modified ANFIS architecture compared with the other ANFIS models. To evaluate the performance, the proposed modified ANFIS model was compared with the three ANFIS models genfis1 (grid partitioning), genfis2 (subtractive clustering) and genfis3 (fuzzy c-means), respectively. In this experiment, the six benchmark classification datasets were employed on the ANFIS models. The performance measurement criteria used for evaluation were MSE, percentage of accuracy, number of trainable parameters and number of epochs. The summary of the simulation is shown in **Table 4**.

| Classification problem | ANFIS model | MSE | Accuracy (%) | Trainable parameters | Epochs |
|---|---|---|---|---|---|
| Iris | genfis1 | 0.00395 | 99.8021 | 96 | 200 |
| | genfis2 | 0.00300 | 99.8481 | 96 | 200 |
| | genfis3 | 0.02400 | 98.7956 | 96 | 200 |
| | Modified ANFIS model | 0.00310 | 99.8412 | 32 | 200 |
| Teaching assistant evaluation | genfis1 | 0.1863 | 90.6817 | 212 | 200 |
| | genfis2 | 0.0104 | 99.4787 | 212 | 200 |
| | genfis3 | 0.1732 | 91.3388 | 212 | 200 |
| | Modified ANFIS model | 0.0400 | 97.9973 | 52 | 200 |
| Car evaluation | genfis1 | 0.0309 | 98.4547 | 472 | 200 |
| | genfis2 | 0.0164 | 99.1769 | 472 | 200 |
| | genfis3 | 0.0350 | 98.2478 | 472 | 200 |
| | Modified ANFIS model | 0.0337 | 98.3136 | 88 | 200 |
| Breast cancer | genfis1 | 0.0113 | 99.4328 | 5156 | 200 |
| | genfis2 | 0.0113 | 99.4328 | 5156 | 200 |
| | genfis3 | 0.0207 | 98.9603 | 5156 | 200 |
| | Modified ANFIS model | 0.0102 | 99.4901 | 546 | 200 |
| Glass identification | genfis1 | 0.0115 | 99.4232 | 11,304 | 200 |
| | genfis2 | 0.2113 | 89.4337 | 11,304 | 200 |
| | genfis3 | 0.0983 | 95.0858 | 11,304 | 200 |
| | Modified ANFIS model | 0.0177 | 99.1177 | 1064 | 200 |
| Seeds | genfis1 | 2.4964 × 10⁻⁴ | 99.9875 | 1052 | 200 |
| | genfis2 | 261.5473 | −12977.36 | 1052 | 200 |
| | genfis3 | 8.8209 × 10⁻⁴ | 99.9558 | 1052 | 200 |
| | Modified ANFIS model | 1.170724 × 10⁻⁴ | 99.9941 | 156 | 200 |

**Table 4.** Performance analysis of modified ANFIS model and standard ANFIS model.

According to **Table 4**, the proposed modified ANFIS model with the ABC optimization algorithm clearly outperformed the standard ANFIS model genfis1 with the grid-partitioning method while solving classification problems. From the results, it can be concluded that the modified ANFIS model achieves better MSE and reasonable accuracy while training fewer parameters over the same number of epochs as the standard ANFIS model. Hence, removing the fourth layer and reducing the parameters for training not only addressed the computational complexity of the standard ANFIS model but can also be considered an effective way to save the training cost of the model.

## **5. Conclusion and future work**

Based on the results of the experiments, this section summarizes that ANFIS accuracy depends highly on the shape and type of the membership function. To investigate this, ANFIS with three partitioning methods, (i) grid partitioning (genfis1), (ii) subtractive clustering (genfis2) and (iii) fuzzy c-means clustering (genfis3), was used to evaluate the membership functions. The simulation results clarify that the Gaussian membership function is the best fit to employ with the ANFIS model to solve classification problems. Moreover, as the standard ANFIS architecture holds five layers and uses gradient-based learning, which increases the complexity of the ANFIS architecture when the number of inputs is large, this research proposed a modified ANFIS architecture that removes the fourth layer and reduces the parameters in the consequent part, lessening the burden of training the parameters. Apart from that, one of the metaheuristic optimization algorithms, the artificial bee colony (ABC) algorithm, was used to train the parameters of the ANFIS model instead of gradient descent. The performance of the proposed modified ANFIS model was compared with the three ANFIS models genfis1 (grid partitioning), genfis2 (subtractive clustering) and genfis3 (fuzzy c-means). The results in **Table 4** show that the designed modified ANFIS model can be applied by researchers to classification problems with confidence.

Furthermore, for future work, more experiments can be performed to find the appropriate membership function for the standard and modified ANFIS models when dealing with problems other than classification, such as time series and clustering problems. This research focused on finding a good membership function for the ANFIS model for classification problems. Additionally, this study modified the ANFIS model to make it less complex and, instead of the typical two-pass learning algorithm, applied a metaheuristic approach, the ABC optimization algorithm, to train the parameters. Metaheuristics provide many alternative optimization algorithms, such as ant colony (AC), particle swarm optimization (PSO), the firefly algorithm (FFA), the cuckoo search algorithm (CSA) and the genetic algorithm (GA), to train the ANFIS parameters. Hence, in the future, other metaheuristic algorithms can be applied in the modified ANFIS model for comparison, to find the best metaheuristic algorithm for the modified ANFIS model.
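The ABC loop that replaces gradient-based training can be illustrated on a generic objective function. The sketch below shows the standard employed-bee, onlooker-bee and scout phases of ABC; the colony size, trial limit and cycle count are illustrative defaults, not the settings used in this study, and the objective is assumed non-negative (as an MSE is).

```python
import random

def abc_minimize(f, dim, bounds, colony=20, limit=30, cycles=200, seed=1):
    """Minimal artificial bee colony sketch: employed, onlooker and scout
    phases minimizing a non-negative objective f over [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    foods = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(colony)]
    fits = [f(x) for x in foods]
    trials = [0] * colony

    def neighbor(i):
        # Perturb one dimension of source i toward/away from another source k.
        k = rng.randrange(colony - 1)
        if k >= i:
            k += 1
        j = rng.randrange(dim)
        cand = foods[i][:]
        cand[j] += rng.uniform(-1, 1) * (foods[i][j] - foods[k][j])
        cand[j] = min(max(cand[j], lo), hi)
        return cand

    def try_improve(i):
        cand = neighbor(i)
        fc = f(cand)
        if fc < fits[i]:                      # greedy selection
            foods[i], fits[i], trials[i] = cand, fc, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(colony):               # employed bees
            try_improve(i)
        # Onlookers pick sources with probability proportional to quality
        # (probabilities computed once per cycle, as in the basic ABC).
        qual = [1.0 / (1.0 + fi) for fi in fits]
        total = sum(qual)
        for _ in range(colony):
            r, acc = rng.uniform(0, total), 0.0
            i = colony - 1
            for i, q in enumerate(qual):
                acc += q
                if acc >= r:
                    break
            try_improve(i)
        for i in range(colony):               # scouts abandon exhausted sources
            if trials[i] > limit:
                foods[i] = [rng.uniform(lo, hi) for _ in range(dim)]
                fits[i], trials[i] = f(foods[i]), 0

    best = min(range(colony), key=lambda i: fits[i])
    return foods[best], fits[best]
```

In ANFIS training, `f` would be the training MSE computed by evaluating the network with a candidate parameter vector, so no gradient information is ever required.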


A Modified Neuro-Fuzzy System Using Metaheuristic Approaches for Data Classification

http://dx.doi.org/10.5772/intechopen.75575


## **Author details**

Mohd Najib Mohd Salleh\*, Noureen Talpur and Kashif Hussain Talpur

\*Address all correspondence to: najib@uthm.edu.my

Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia

## **References**

[1] Buragohain M, Mahanta C. A novel approach for ANFIS modelling based on full factorial design. Applied Soft Computing. Amsterdam, The Netherlands: Elsevier Science Publishers B. V. 2008;**8**(1):609-625

[2] Aghbashlo M et al. An exergetically-sustainable operational condition of a photobiohydrogen production system optimized using conventional and innovative fuzzy techniques. Renewable Energy. 2016;**94**:605-618

[3] Liu P, Leng W, Fang W. Training ANFIS model with an improved quantum-behaved particle swarm optimization algorithm. Mathematical Problems in Engineering. 2013. Article ID: 595639, 10 pages

[4] Kar S, Das S, Ghosh PK. Applications of neuro fuzzy systems: A brief review and future outline. Applied Soft Computing. Amsterdam, The Netherlands: Elsevier Science Publishers B. V. 2014;**15**:243-259

[5] Das S, Ghosh PK, Kar S. Hypertension diagnosis: A comparative study using fuzzy expert system and neuro fuzzy system. In: IEEE International Conference on Fuzzy Systems (FUZZ), 2013; IEEE; 2013. pp. 1-7. DOI: 10.1109/FUZZ-IEEE.2013.6622434

[6] Salleh MNM, Hussain K. A review of training methods of ANFIS for applications in business and economics. International Journal of u- and e-Service, Science and Technology, International Conference on Recent Trends in Computer Science and Electronics Engineering (RTCSE'16), Kuala Lumpur, Malaysia. 2016;**9**(7):165-172

[7] Abdulshahed AM, Longstaff AP, Fletcher S. A novel approach for ANFIS modelling based on Grey system theory for thermal error compensation. In: 14th UK Workshop on Computational Intelligence (UKCI), 2014; IEEE; 2014. DOI: 10.1109/UKCI.2014.6930155

[8] Le T, Altman T. A new initialization method for the fuzzy c-means algorithm using fuzzy subtractive clustering. In: Proceedings of International Conference on Information and Knowledge Engineering, 2011 (IKE'11); Las Vegas, USA; 2011

[9] Samat NA, Salleh MNM. A study of data imputation using fuzzy c-means with particle swarm optimization. In: Herawan T, et al., editors. Recent Advances on Soft Computing and Data Mining: The Second International Conference on Soft Computing and Data Mining (SCDM-2016) Proceedings; August 18-20, 2016; Bandung, Indonesia. Cham: Springer International Publishing; 2017. pp. 91-100

[10] Akrami SA, El-Shafie A, Jaafar O. Improving rainfall forecasting efficiency using modified adaptive neuro-fuzzy inference system (MANFIS). Journal of Water Resources Management. 2013;**27**(9):3507-3523. DOI: 10.1007/s11269-013-0361-9

[11] Soh AC, Kean KY. Reduction of ANFIS-rules based system through K-map minimization for traffic signal controller. In: 12th International Conference on Control, Automation and Systems (ICCAS), 2012; Jeju Island, Korea: IEEE

[12] Orouskhani M, Mansouri. A hybrid method of modified cat swarm optimization and gradient descent algorithm for training ANFIS. International Journal of Computational Intelligence and Applications. 2013;**12**(02):1350007

[13] Karaboga D, Kaya E. An adaptive and hybrid artificial bee colony algorithm (aABC) for ANFIS training. Applied Soft Computing. Amsterdam, The Netherlands: Elsevier Science Publishers B. V. 2016;**49**:423-436

[14] Najafi A. Implementing a PSO-ANFIS model for prediction of viscosity of mixed oils. Journal of Petroleum Science and Technology. 2017;**35**(2):155-162

[15] Ibarra L, Ponce P, Molina A. A new approach to uncertainty description through accomplishment membership functions. Expert Systems with Applications. 2015;**42**(21):7895-7904

[16] Suntae K. A study of fuzzy membership functions for dependence decision-making in security robot system. Neural Computing and Applications. 2017;**28**(1):155-164

[17] Saha S, Bhattacharya S, Konar A. Comparison between type-1 fuzzy membership functions for sign language applications. In: Proceedings of 2016 International Conference on Microelectronics, Computing and Communications (MicroCom 2016), Durgapur, India: IEEE; 2016

[18] Adil O. Comparison between the effects of different types of membership functions on fuzzy logic controller performance. International Journal of Emerging Engineering Research and Technology. 2015;**3**(3):76-83

[19] Gholami V et al. Modeling of groundwater level fluctuations using dendrochronology in alluvial aquifers. Journal of Hydrology. 2015;**529**(Part 3):1060-1069

[20] Shrivastava H, Sridharan S. Conception of data preprocessing and partitioning procedure for machine learning algorithm. International Journal of Recent Advances in Engineering & Technology (IJRAET). 2013;**1**(3). ISSN: 2347-2812

[21] Uddin J. Optimization of ANFIS using artificial bee colony algorithm for classification of Malaysian SMEs. In: Recent Advances on Soft Computing and Data Mining: The Second International Conference on Soft Computing and Data Mining (SCDM-2016) Proceedings; August 18-20, 2016; Bandung, Indonesia: Springer International Publishing; 2017. ISSN 2194-5357

**Section 2**

**Engineering and Technology Applications**

**Chapter 3**

#### **Differential Evolution Algorithm in the Construction of Interpretable Classification Models**

Rafael Rivera-Lopez and Juana Canul-Reich

DOI: 10.5772/intechopen.75694

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75694

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this chapter, the application of a differential evolution-based approach to induce oblique decision trees (DTs) is described. This type of decision tree uses a linear combination of attributes to build oblique hyperplanes dividing the instance space. Oblique decision trees are more compact and accurate than traditional univariate decision trees. On the other hand, as differential evolution (DE) is an efficient evolutionary algorithm (EA) designed to solve optimization problems with real-valued parameters, and since finding an optimal hyperplane is a hard computational task, this metaheuristic (MH) is chosen to conduct an intelligent search for a near-optimal solution. Two methods are described in this chapter: one implementing a recursive partitioning strategy to find the most suitable oblique hyperplane of each internal node of a decision tree, and the other conducting a global search for a near-optimal oblique decision tree. A statistical analysis of the experimental results suggests that these methods show better performance as decision tree induction procedures in comparison with other supervised learning approaches.

Keywords: machine learning, classification, evolutionary algorithms

## 1. Introduction

Knowledge discovery refers to the process of nontrivial extraction of potentially useful and previously unknown information from a dataset [1]. Within the stages of this process, data mining stands out since it allows analyzing the data and producing models for their representation. In particular, machine learning provides data mining with useful procedures to build these models, since many of the techniques aimed at information discovery are based on inductive learning. Decision trees (DTs), artificial neural networks (ANN), and support vector

machines (SVMs), as well as clustering methods, have been widely used to build predictive models. The ability to track and evaluate every step in the information extraction process is one of the most crucial factors for relying on the models gained from data mining methods [2]. In particular, DTs are classification models characterized by their high levels of comprehensibility and robustness. Knowledge learned via a DT is understandable due to its graphical representation [3], and also DTs can handle noise or data with missing values and make correct predictions [4].

On the other hand, soft-computing-based approaches have been widely used to solve complex problems in almost all areas of science and technology. These approaches try to imitate the process of human reasoning when solving a problem, with the objective of obtaining acceptable results in a reasonable time. In the case of data mining, soft computing techniques such as ANN, metaheuristics (MHs), fuzzy logic methods, and other approaches have been used as tools to address the data mining challenges. In particular, an MH is a general algorithmic template based on intelligent processes and behaviors observed in both nature and other disciplines [5]. Evolutionary algorithms (EAs) are one type of MH that have been successfully applied to provide near-optimal solutions for many computationally complex problems in almost all areas of science and technology. The effectiveness of EAs is due to two factors: (1) they combine a clever exploration of the search space to identify promising areas, and (2) they perform an efficient exploitation of these areas aiming to improve the known solution or solutions. EAs are inspired by evolutionary theories that synthesize Darwinian evolution through natural selection with Mendelian genetic inheritance. In particular, the differential evolution (DE) algorithm is an EA designed for solving optimization problems with variables in continuous domains that, instead of implementing traditional crossover and mutation operators, applies a linear combination of several randomly selected candidate solutions to produce a new solution [6].

MHs have been previously applied to build DTs, and there exist several surveys that describe their implementation [7–11]. Some approaches apply a recursive partitioning strategy in which an MH finds a near-optimal test condition for each internal node of a DT; however, the approach most commonly used is to perform a global search in the solution space with the aim of finding near-optimal DTs. Since DE is one of the most powerful EAs for solving real-valued optimization problems, and the task of finding a near-optimal oblique hyperplane with real-valued coefficients is an optimization problem in a continuous space, two DE-based methods to induce oblique DTs are described in this chapter: one implementing a recursive partitioning strategy to find the most suitable oblique hyperplane of each internal node of a decision tree, and the other conducting a global search for a near-optimal oblique decision tree. A statistical analysis of the experimental results suggests that these methods show better performance as decision tree induction procedures in comparison with other supervised learning approaches.

The rest of this chapter is organized as follows: Section 2 provides a set of basic definitions about DTs and the DE algorithm. The induction of oblique DTs by means of MH-based approaches is described in Section 3. The constituent elements of the DE-based methods described in this chapter are discussed in Section 4, and the experimental results are discussed in Section 5. Finally, Section 6 describes the conclusion and the future work.

## 2. Background

Machine learning methods are an essential tool in emerging disciplines such as data science [12] and business intelligence [13], since they provide efficient predictive models constructed from previously collected data. DTs, ANN, and SVMs, as well as clustering methods, have been widely used to build these models. A DT is a hierarchical model using an ordered sequence of decisions to predict the class membership of new unclassified instances. An ANN consists of many nonlinear elements connected by links associated with weighted variables operating in parallel [14], in which learning is performed iteratively as the network processes the training instances, trying to simulate the way a human being learns from previous experiences. Finally, an SVM finds the hyperplane that best separates the training instances into two different classes using a set of functions called kernels. The optimal hyperplane is described with a combination of entry points known as support vectors [15].

A DT is an acyclic connected graph with a single root node used as a classification model induced from a set of training instances. A DT contains zero or more internal nodes and one or more leaf nodes [16]. Each internal node evaluates a test condition consisting of a combination of one or more attributes of the dataset, and each leaf node has a class label. The arcs joining an internal node with its successor nodes are labeled with the possible outcomes of its test condition. Each DT branch represents a sequence of decisions made by the model to determine the class membership of a new unclassified instance. The DT induction (DTI) process commonly implements a recursive partition strategy. In each stage of this process, the most appropriate test condition to split the training set is selected according to some partition criterion. As a result of evaluating the training instances with this test condition, two or more instance subsets are created and assigned to the successor nodes of the current internal node. This process is recursively applied until a stop criterion is reached. If the number of attributes used in the test conditions of the tree internal nodes is regarded, two types of DT can be constructed: axis-parallel or multivariate DTs. An axis-parallel DT is a univariate DT that evaluates a single attribute in each test condition to split the training set. On the other hand, oblique DTs and nonlinear DTs are multivariate DTs in which a linear combination and a nonlinear composition of attributes, respectively, are utilized in the test conditions. Multivariate DTs commonly show better performance, and they are smaller than univariate DTs, but they require more computational effort to induce. In particular, an oblique hyperplane divides the instance space into two halfspaces, and it is defined as follows:

$$\sum_{j=1}^{d} h_j x_j + b > 0 \tag{1}$$

where $d$ is the number of attributes in the dataset, $x_j$ is the value of the j-th attribute, $h_j$ is a real-valued coefficient of the hyperplane, and $b$ represents the independent term of the hyperplane. **Figure 1** shows an axis-parallel DT induced from the iris dataset [17] using the J48 method [18], and **Figure 2** shows a near-optimal oblique DT constructed from the same dataset by the DE-based method implementing a global search strategy. Iris dataset has four attributes, three
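The oblique test condition of Eq. (1), a linear combination of attribute values compared against a threshold, can be sketched as a routine that routes each instance to one of the two halfspaces. The coefficient values in the usage comment are illustrative, not taken from the chapter.

```python
# Oblique test condition of Eq. (1): an instance x goes to the "left"
# halfspace when sum_j h_j * x_j + b > 0, and to the "right" one otherwise.

def oblique_split(instances, h, b):
    """Partition instances into the two halfspaces induced by the hyperplane
    with coefficients h and independent term b."""
    left, right = [], []
    for x in instances:
        side = sum(hj * xj for hj, xj in zip(h, x)) + b
        (left if side > 0 else right).append(x)
    return left, right

# An axis-parallel test is the special case with a single nonzero
# coefficient: e.g., h = [1, 0], b = -2 encodes the univariate test "x1 > 2".
```

In a recursive partitioning induction scheme, this split is applied at each internal node to the subset of training instances reaching that node, and the two resulting subsets are passed to the successor nodes.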

## 2. Background

machines (SVMs), as well as clustering methods, have been widely used to build predictive models. The ability to track and evaluate every step in the information extraction process is one of the most crucial factors for relying on the models gained from data mining methods [2]. In particular, DTs are classification models characterized by their high levels of comprehensibility and robustness. Knowledge learned via a DT is understandable due to its graphical representation [3], and also DTs can handle noise or data with missing values and make correct

On the other hand, soft-computing-based approaches have been widely used to solve complex problems in almost all areas of science and technology. These approaches try to imitate the process of human reasoning when solving a problem, with the objective of obtaining acceptable results in a reasonable time. In the case of data mining, soft computing techniques such as ANNs, metaheuristics (MHs), and fuzzy logic methods have been used as tools to solve data mining challenges. In particular, an MH is a general algorithmic template based on intelligent processes and behaviors observed in both nature and other disciplines [5]. Evolutionary algorithms (EAs) are one type of MH that has been successfully applied to provide near-optimal solutions for many computationally complex problems. The effectiveness of EAs is due to two factors: (1) they combine a clever exploration of the search space to identify promising areas, and (2) they perform an efficient exploitation of these areas aiming to improve the known solution or solutions. EAs are inspired by evolutionary theories that synthesize Darwinian evolution through natural selection with Mendelian genetic inheritance. In particular, the differential evolution (DE) algorithm is an EA designed for solving optimization problems with variables in continuous domains that, instead of implementing traditional crossover and mutation operators, applies a linear combination of several randomly selected candidate solutions to produce a new solution [6].

MHs have been previously applied to build DTs, and several surveys describe their implementation [7–11]. Some approaches apply a recursive partitioning strategy in which an MH finds a near-optimal test condition for each internal node of a DT; however, the approach most commonly used is to perform a global search in the solution space with the aim of finding near-optimal DTs. Since DE is one of the most powerful EAs for solving real-valued optimization problems, and the task of finding a near-optimal oblique hyperplane with real-valued coefficients is an optimization problem in a continuous space, this chapter describes two DE-based methods to induce oblique DTs: one implementing a recursive partitioning strategy to find the most suitable oblique hyperplane for each internal node of a decision tree, and the other conducting a global search for a near-optimal oblique decision tree. A statistical analysis of the experimental results suggests that these methods show better performance as decision tree induction procedures in comparison with other supervised learning approaches. The rest of this chapter is organized as follows: Section 2 provides a set of basic definitions about DTs and the DE algorithm. The induction of oblique DTs by means of MH-based approaches is described in Section 3. The constituent elements of the DE-based methods described in this chapter are discussed in Section 4, and the experimental results are presented in Section 5. Finally, Section 6 describes the conclusions and future work.

50 Artificial Intelligence - Emerging Trends and Applications

Machine learning methods are an essential tool in emerging disciplines such as data science [12] and business intelligence [13], since they provide efficient predictive models constructed from previously collected data. DTs, ANNs, and SVMs, as well as clustering methods, have been widely used to build these models. A DT is a hierarchical model that uses an ordered sequence of decisions to predict the class membership of new unclassified instances. An ANN consists of many nonlinear elements connected by links associated with weighted variables operating in parallel [14], in which learning is performed iteratively as the network processes the training instances, trying to simulate the way a human being learns from previous experiences. Finally, an SVM finds the hyperplane that best separates the training instances into two different classes using a set of functions called kernels. The optimal hyperplane is described by a combination of training instances known as support vectors [15].

A DT is an acyclic connected graph with a single root node that is used as a classification model induced from a set of training instances. A DT contains zero or more internal nodes and one or more leaf nodes [16]. Each internal node evaluates a test condition consisting of a combination of one or more attributes of the dataset, and each leaf node has a class label. The arcs joining an internal node with its successor nodes are labeled with the possible outcomes of its test condition. Each DT branch represents a sequence of decisions made by the model to determine the class membership of a new unclassified instance. The DT induction (DTI) process commonly implements a recursive partitioning strategy: in each stage of this process, the most appropriate test condition to split the training set is selected according to some partition criterion. As a result of evaluating the training instances with this test condition, two or more instance subsets are created and assigned to the successor nodes of the current internal node. This process is applied recursively until a stop criterion is reached. Depending on the number of attributes used in the test conditions of the internal nodes, two types of DT can be constructed: axis-parallel and multivariate DTs. An axis-parallel DT is a univariate DT that evaluates a single attribute in each test condition to split the training set. On the other hand, oblique DTs and nonlinear DTs are multivariate DTs in which a linear combination and a nonlinear composition of attributes, respectively, are utilized in the test conditions. Multivariate DTs commonly show better performance and are smaller than univariate DTs, but they require more computational effort to induce. In particular, an oblique hyperplane divides the instance space into two half-spaces, and it is defined as follows:

$$\sum\_{j=1}^{d} h\_j x\_j + b > 0 \tag{1}$$

where $d$ is the number of attributes in the dataset, $x_j$ is the value of the $j$-th attribute, $h_j$ is a real-valued coefficient of the hyperplane, and $b$ represents the independent term of the hyperplane. Figure 1 shows an axis-parallel DT induced from the iris dataset [17] using the J48 method [18], and Figure 2 shows a near-optimal oblique DT constructed from the same dataset by the DE-based method implementing a global search strategy.


Differential Evolution Algorithm in the Construction of Interpretable Classification Models


http://dx.doi.org/10.5772/intechopen.75694


Figure 1. An axis-parallel DT induced from the iris dataset.

Figure 2. An oblique DT induced from the iris dataset.

The iris dataset has four attributes, three class labels, and 150 instances. It is clear that the oblique DT is more compact and more accurate than its axis-parallel version, but it has been proved that finding the oblique hyperplane that minimizes the number of misclassified instances both above and below it is an NP-hard problem [19].
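As a concrete illustration of the test condition in Eq. (1), the following sketch routes instances according to an oblique hyperplane. The coefficients below are hand-picked for illustration on a four-attribute dataset such as iris; they are hypothetical, not induced from data.

```python
def oblique_side(x, h, b):
    """True if instance x lies in the half-space where sum_j h_j * x_j + b > 0 (Eq. (1))."""
    return sum(hj * xj for hj, xj in zip(h, x)) + b > 0

# Hypothetical coefficients for a 4-attribute dataset such as iris;
# chosen by hand for illustration, not learned from data.
h = [0.0, 0.0, 1.0, 2.5]   # weights for the four attributes
b = -4.5                   # independent term of the hyperplane

small_petals = [5.1, 3.5, 1.4, 0.2]
large_petals = [6.5, 3.0, 5.2, 2.0]
print(oblique_side(small_petals, h, b))  # False: 1.4 + 2.5*0.2 - 4.5 < 0
print(oblique_side(large_petals, h, b))  # True: 5.2 + 2.5*2.0 - 4.5 > 0
```

An axis-parallel test condition is the special case in which all but one of the coefficients are zero.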

On the other hand, MHs are general algorithmic templates that can be easily adapted to solve almost all optimization problems [20]. MHs are nature-inspired procedures using stochastic components to find a near-optimal solution, and they have several parameters that need to be fitted to the specific problem [21]. In accordance with the number of candidate solutions used in their search procedure, MHs have been grouped into single-solution-based MHs and population-based MHs [22]. Single-solution-based MHs implement intelligent search procedures that iteratively replace a candidate solution with a neighboring solution with the aim of reaching a near-optimal solution. Simulated annealing (SA) and tabu search (TS) are two well-known single-solution-based MHs. Population-based MHs use a group of candidate solutions in each step of their iterative process. The most commonly used population-based MHs are related to EAs and swarm intelligence (SI) methods. Genetic algorithms (GA), genetic programming (GP), evolution strategies (ES), and DE are the most prominent EAs, and ant colony optimization (ACO) and particle swarm optimization (PSO) are examples of SI methods.

In particular, DE is an effective EA designed to solve optimization problems with real-valued parameters [6]. DE evolves a population $X = \{x^1, x^2, \ldots, x^{NP}\}$ of NP chromosomes by applying mutation, crossover, and selection operators with the aim of reaching a near-optimal solution. Several DE variants differing in the implementation of the mutation and crossover operators have been described in the existing literature. In this chapter, the standard DE algorithm, named DE/rand/1/bin in agreement with the nomenclature adopted to refer to DE variants, is used as the procedure to find a near-optimal solution. At each iteration of this evolutionary process, known as a generation, a new population of chromosomes is generated from the previous one. For each $i \in \{1, \ldots, NP\}$ in the $g$-th generation, $x^i$ is taken from the population $X_{g-1}$, and it is used to build a new vector $u^i$ by applying the mutation and crossover operators. The vectors $x^i$ and $u^i$ are known as the target vector and the trial vector, respectively. To build a new chromosome, instead of implementing a traditional mutation operator, DE first applies a linear combination of three chromosomes randomly chosen from the current population ($x^{r_1}$, $x^{r_2}$, and $x^{r_3}$) to construct a mutated vector $v^i = x^{r_1} + F(x^{r_2} - x^{r_3})$, where $F$ is a user-specified scale factor applied to control the differential variation. Next, the crossover operator determines each parameter of $u^i$ from either $x^i$ or $v^i$ based on a stochastic decision: if a random value is less than the crossover rate (CR), the $j$-th parameter of $u^i$ is taken from $v^i$; otherwise, $u^i_j = x^i_j$. Finally, a one-to-one tournament is applied to determine which vector, between $x^i$ and $u^i$, is selected as a member of the new population $X_g$. Figure 3 shows a scheme of the application of the DE operators to build a new chromosome for the next population.
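The generation loop just described can be sketched in a short, self-contained routine. This is a minimal sketch of DE/rand/1/bin, not the chapter's implementation: the sphere objective, bounds, and parameter values below are illustrative choices.

```python
import random

def de_rand_1_bin(fitness, dim, bounds, NP=20, F=0.7, CR=0.9, generations=200, seed=1):
    """Minimal DE/rand/1/bin sketch: minimizes `fitness` over `dim` real parameters."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initial population of NP random chromosomes.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(NP)]
    for _ in range(generations):
        new_pop = []
        for i, target in enumerate(pop):
            # Mutation: linear combination of three distinct random chromosomes.
            r1, r2, r3 = rng.sample([j for j in range(NP) if j != i], 3)
            mutated = [pop[r1][k] + F * (pop[r2][k] - pop[r3][k]) for k in range(dim)]
            # Binomial crossover: take each parameter from the mutated vector with prob. CR;
            # jrand guarantees at least one parameter comes from the mutated vector.
            jrand = rng.randrange(dim)
            trial = [mutated[k] if (rng.random() < CR or k == jrand) else target[k]
                     for k in range(dim)]
            # One-to-one tournament between target and trial vectors.
            new_pop.append(trial if fitness(trial) <= fitness(target) else target)
        pop = new_pop
    return min(pop, key=fitness)

# Usage: minimize the sphere function; the optimum is at the origin.
best = de_rand_1_bin(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))
print([round(v, 3) for v in best])  # values close to 0.0
```

The scale factor F and crossover rate CR are the sensitivity points mentioned below: small changes to them can noticeably alter convergence speed.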

DE has been used in conjunction with several machine learning techniques to implement classification methods [23–27]. It has been mainly applied to optimize the parameters of classification methods or to conduct preprocessing tasks in a data mining process. DE has several advantages in comparison with other MHs: since its mutation operator is based on a linear combination of several randomly chosen individuals, DE exhibits a good trade-off between its exploration and exploitation skills [28]. On the other hand, although DE requires the definition of fewer parameters than other MHs, its performance is sensitive to the values selected for CR, F, and NP.

Figure 3. DE operators applied to build a new chromosome for the next population.


## 3. Induction of oblique decision trees using metaheuristics

Several MHs have been used to induce DTs with methods implementing a recursive partitioning strategy. A timeline of these methods is shown in Figure 4. Single-solution-based MHs such as SA and TS have been used to induce DTs through this strategy. SA is applied in the simulated annealing of decision trees (SADT) method [29], which iteratively perturbs one randomly selected coefficient to build a new hyperplane, and in a variant of the oblique classifier 1 (OC1) system [30], named OC1-SA [31], which simultaneously disturbs several coefficients of the best axis-parallel hyperplane found by the OC1 algorithm. TS is used in the linear discriminant and TS (LDTS) method [32] and in the linear discrete support vector DT with TS (LDSDT-TS) method [33]. Furthermore, EAs such as ES, GA, and DE have also been applied to build an oblique DT through this strategy. The OC1-ES algorithm [31] and the multimembered ES oblique DT (MESODT) method [34] obtain a near-optimal hyperplane using the (1+1)-ES and the (μ, λ)-ES, respectively. GA evolves a population of hyperplanes encoded (1) with a binary chromosome in the binary tree-GA (BTGA) algorithm [35] and in the HereBoy for DT (HBDT) method [36] and (2) with a real-valued chromosome in the OC1-GA algorithm [31] and in the procedures described by Krȩtowski [37] and by Pangilinan and Janssens [38]. Finally, DE is applied in an OC1 variant named the OC1-DE algorithm [39].


Figure 4. Timeline of the MH-based approaches to induce oblique DTs.

On the other hand, several MH-based approaches implementing a global search strategy have been described in the existing literature. GA evolves a population of variable-length chromosomes in the generalized decision tree inducer (GDTI) method [40] and in the evolutionary full tree induction (EFTI) method [41]. Other GA-based approaches, such as the global EA for oblique DTI (GEA-ODT) procedure [42, 43] and the tree analysis with randomly generated and evolved trees (TARGET) algorithm [44], use trees as chromosomes. Furthermore, the standard GP is applied by Liu and Xu [45], the strongly typed GP is used by Bot and Langdon [46, 47], and the grammar-based GP is utilized in the GP with margin maximization (GP-MM) method [48]. Finally, DE is implemented in the perceptron DT (PDT) method [49, 50] and in the DE for inducing oblique DTs (DE-ODT) method [51].

## 4. DE-based methods to induce oblique decision trees


In this chapter, two methods to induce an oblique DT using the DE/rand/1/bin algorithm are described. The first, named the OC1-DE method, is similar to the OC1 system and its variants, but it applies DE to find a near-optimal hyperplane at each internal node of an oblique DT [39]. The second, named the DE-ODT method, implements a global search strategy to induce oblique DTs [51].

#### 4.1. OC1-DE method to search near-optimal oblique hyperplanes

The OC1-DE method is based on the OC1 system [30] and the OC1-GA procedure [31]. The OC1 system applies a two-step process to find a better hyperplane. First, it finds the best axis-parallel hyperplane splitting the instance set. Next, it applies two procedures to disturb the hyperplane coefficients:

• Sequential perturbation: a deterministic rule that adjusts the hyperplane coefficients, taking one at a time and looking for its optimal value.

• Random vector perturbation: when the sequential perturbation reaches a local optimum, a random vector is added to the current hyperplane with the aim of looking elsewhere in the solution space.


Finally, this procedure returns, as the best hyperplane, the one selected between the best-perturbed hyperplane and the best axis-parallel hyperplane. OC1 uses several criteria to evaluate the quality of the candidate hyperplanes, such as information gain [52] and three criteria introduced by Heath [19]: max minority, minority sum, and sum of impurities. The induced DT is pruned by removing sub-trees whose impurity value is less than a predefined threshold value. An improved OC1 version [53] uses the elements defined in the CART method [54]: the Gini index and the twoing rule as splitting criteria, and the cost-complexity pruning method.

On the other hand, the OC1-GA method is an OC1 variant encoding the hyperplane coefficients in a real-valued chromosome. First, the axis-parallel hyperplane that best splits the training instances is obtained. This hyperplane is copied to 10% of the initial population, and the remaining hyperplanes are randomly created. Then, OC1-GA evolves this population to find a near-optimal hyperplane, evaluating its quality through the twoing rule. The oblique DT is then induced with a recursive partitioning strategy, and it is pruned using the cost-complexity pruning method.

The DE implementation to find a near-optimal hyperplane at each internal node of an oblique DT is shown in Algorithm 1. First, the axis-parallel hyperplane that best splits a set of training instances is obtained (line 1). It is copied to 10% of the initial population, as is proposed in [31], and the remaining hyperplanes are randomly created (line 2). Each random hyperplane is constructed so that at least two instances with different class labels are separated by it. Next, this population is evolved through several generations using the DE operators (lines 3–19), and the best hyperplane in the population is selected (line 20). Finally, the OC1-DE algorithm returns the hyperplane selected between the best axis-parallel hyperplane and the best oblique hyperplane produced by DE (line 21).
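The core loop of this per-node search can be sketched as follows. This is a simplified, assumption-laden sketch rather than Algorithm 1 itself: the population is initialized purely at random (without the axis-parallel seeding step), hyperplane quality is measured with Heath's minority-sum criterion instead of the twoing rule, and the toy dataset and function names are illustrative.

```python
import random

def minority_sum(instances, labels, hyp):
    """Heath's minority-sum criterion: instances misclassified if each half-space
    predicts its majority class (lower is better)."""
    sides = {True: [], False: []}
    for x, y in zip(instances, labels):
        side = sum(h * v for h, v in zip(hyp[:-1], x)) + hyp[-1] > 0
        sides[side].append(y)
    total = 0
    for group in sides.values():
        if group:
            majority = max(set(group), key=group.count)
            total += sum(1 for y in group if y != majority)
    return total

def de_hyperplane(instances, labels, NP=30, F=0.7, CR=0.9, generations=100, seed=7):
    """DE/rand/1/bin search for a near-optimal oblique hyperplane (d+1 coefficients)."""
    rng = random.Random(seed)
    dim = len(instances[0]) + 1
    fit = lambda h: minority_sum(instances, labels, h)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(NP)]
    for _ in range(generations):
        for i in range(NP):
            r1, r2, r3 = rng.sample([j for j in range(NP) if j != i], 3)
            jrand = rng.randrange(dim)
            # Mutation + binomial crossover fused into one expression.
            trial = [pop[r1][k] + F * (pop[r2][k] - pop[r3][k])
                     if (rng.random() < CR or k == jrand) else pop[i][k]
                     for k in range(dim)]
            # One-to-one tournament.
            if fit(trial) <= fit(pop[i]):
                pop[i] = trial
    return min(pop, key=fit)

# Usage: two linearly separable clouds in 2-D.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
ys = ["a", "a", "a", "b", "b", "b"]
best = de_hyperplane(pts, ys)
print(minority_sum(pts, ys, best))  # typically 0 on this separable toy set
```

In the full method, the returned hyperplane would be compared against the best axis-parallel hyperplane before being accepted.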


The hyperplane returned by OC1-DE is used as the test condition of a new internal node added to the oblique DT. This hyperplane is used to split the training instances into two subsets. The OC1-DE method is recursively applied to each subset until a leaf node is created, either because all instances in the subset have the same class label or because a threshold value of unclassified instances is reached. The quality of a hyperplane is obtained using the twoing rule. Finally, a pruning procedure is applied in order to reduce the overfitting of the produced DT and to improve its predictive power.


#### 4.2. DE-ODT method to induce oblique decision trees

The DE-ODT method implements a global search strategy in which the DE algorithm is applied to find a near-optimal oblique DT, where each real-valued chromosome encodes only a feasible oblique DT.

#### 4.2.1. Linear representation of oblique decision trees


In the DE-ODT method, each chromosome represents the internal nodes of a complete binary oblique DT stored in a fixed-length real-valued vector (Figure 5). This vector encodes the set of hyperplanes used as test conditions of the oblique DT. The vector size is determined using both the number of attributes and the number of class labels of the training set whose model is induced. Since each internal node of an oblique DT has a hyperplane as its test condition, the size of the real-valued vector $x^i$ used to encode the $i$-th candidate solution in the population is fixed as $n_e(d+1)$, where $n_e$ is the estimated number of internal nodes of a complete binary oblique DT. Considering that (1) an oblique DT is more compact than its univariate version and (2) the DT size is related to the structure of the training set, the DE-ODT method determines the value of $n_e$ based on both the number of attributes ($d$) and the number of class labels ($s$) of the training set.

Since the number of internal nodes of a complete binary DT with height $H$ is $2^{H-1}-1$ and the number of leaf nodes of the same DT is $2^{H-1}$, two heights can be estimated: $H_i = \lceil \log_2(d+1) \rceil + 1$ and $H_l = \lceil \log_2(s) \rceil + 1$. Using these heights, $n_e$ is determined as follows:

$$n\_e = 2^{\max(H\_i, H\_l)} - 1, \tag{2}$$

and the size of the real-valued parameter vector representing a sequence of $n_e$ hyperplanes for a training set with $d$ attributes is computed as follows:

$$n = n\_e(d+1). \tag{3}$$

As an example, if a hypothetical dataset with three numerical attributes and three class labels is used to induce an oblique DT, then $d = 3$ and $s = 3$. In this case, $H_i = \lceil \log_2(4) \rceil + 1 = 3$ and $H_l = \lceil \log_2(3) \rceil + 1 = 3$, so $n_e = 2^{\max\{3,3\}} - 1 = 7$. This implies that the oblique DT can have up to seven internal nodes, and one chromosome representing a candidate solution in the evolutionary process of this problem has $n = 7(3+1) = 28$ real-valued parameters.

Figure 5. Linear encoding scheme for the internal nodes of a complete binary oblique tree.
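The height and size formulas above (as reconstructed here to match the worked example) can be checked numerically; the function name below is illustrative.

```python
import math

def chromosome_size(d, s):
    """Estimated internal-node count and chromosome length for the DE-ODT encoding,
    following Eqs. (2) and (3)."""
    Hi = math.ceil(math.log2(d + 1)) + 1   # height needed for the attributes
    Hl = math.ceil(math.log2(s)) + 1       # height needed for the class labels
    ne = 2 ** max(Hi, Hl) - 1              # Eq. (2): internal nodes of the complete binary DT
    n = ne * (d + 1)                       # Eq. (3): ne hyperplanes of d+1 coefficients each
    return ne, n

print(chromosome_size(3, 3))  # (7, 28), matching the worked example
```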

#### 4.2.2. Induction of feasible oblique decision trees

The DE-ODT method applies the following steps to map an oblique DT from a chromosome $x^i$ of the population:

1. Hyperplanes construction: $x^i$ is used to build the vector $w^i$ representing the sequence of candidate hyperplanes utilized in the internal nodes of a partial DT. Since the values of $x^i$ represent the hyperplane coefficients contained in these nodes, the following criterion applies: the values $x^i_1, \ldots, x^i_{d+1}$ are assigned to the hyperplane $h^1$, the values $x^i_{d+2}, \ldots, x^i_{2d+2}$ are assigned to the hyperplane $h^2$, and so on. For each $j \in \{1, \ldots, n_e\}$ and for each $k \in \{1, \ldots, d+1\}$, the $k$-th coefficient of $h^j$ is defined as follows:

$$h\_k^j = x\_{(j-1)(d+1)+k}^i. \tag{4}$$

These hyperplanes are assigned to the elements of $w^i$: $h^1$ is assigned to $w^i_1$, $h^2$ is assigned to $w^i_2$, and so on. Figure 6 shows an example of the construction of a set of hyperplanes from one chromosome for the hypothetical dataset previously described. Once $w^i$ is completed, it is used to create a partial DT with only internal nodes.

Figure 6. Construction of a set of hyperplanes from $x^i$.

2. Partial DT construction: $w^i$ is used to build a partial oblique DT with only internal nodes: $w^i_1$ is placed as the root node of this tree, and $w^i_{2j}$ and $w^i_{2j+1}$ are placed as the successor nodes of $w^i_j$, following the array layout of a complete binary tree. Figure 7 shows the construction of a partial oblique DT from $w^i$.

Figure 7. Construction of a partial oblique DT with only internal nodes.

3. DT completion: An instance set is assigned to each node of the partial DT (the complete training set for the root node of the tree), and the node is labeled as an internal node. When the instances in this set are evaluated using the hyperplane associated with the internal node, two instance subsets are created and assigned to the successor nodes of this node. This assignment is repeated for each node of the partial DT. If the internal node is located at the end of a branch of the DT, two leaf nodes are created and designated as its successor nodes, and the instance subsets are assigned to these leaf nodes. On the other hand, if all instances in the set assigned to the internal node have the same class label, it is relabeled as a leaf node and its successor nodes, if they exist, are removed. Figure 8 shows an example of this tree-completion procedure: all the instances assigned to $w^i_3$ and $w^i_5$ have the same class label, so they are designated as leaf nodes, and the successor nodes of $w^i_3$ are removed from the tree. On the other hand, since $w^i_4$ is the ending node of a branch, its instance set is split using its hyperplane, the instance subsets produced are assigned to two new leaf nodes, and their majority classes are assigned as their class labels. The resulting tree has three internal nodes and four leaf nodes.
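The hyperplane slicing of Eq. (4) and the array layout of a complete binary tree can be sketched in a few lines; the function names and the dummy chromosome below are illustrative, and the 1-based child indexing reflects the node numbering $w_1, \ldots, w_{n_e}$ assumed here.

```python
def decode_hyperplanes(x, d):
    """Slice chromosome x into hyperplanes of d+1 coefficients each (Eq. (4))."""
    size = d + 1
    return [x[j * size:(j + 1) * size] for j in range(len(x) // size)]

def children(j):
    """Array layout of a complete binary tree (1-based node numbering):
    the successor nodes of w_j are w_2j and w_2j+1."""
    return 2 * j, 2 * j + 1

# Usage with the hypothetical d = 3, ne = 7 example: a 28-parameter chromosome.
x = list(range(28))
w = decode_hyperplanes(x, d=3)
print(len(w), w[0], w[1])  # 7 [0, 1, 2, 3] [4, 5, 6, 7]
print(children(1))         # (2, 3): successors of the root node
```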

Differential Evolution Algorithm in the Construction of Interpretable Classification Models

http://dx.doi.org/10.5772/intechopen.75694

59

The Algorithm 2 shows the structure of the DE-ODT method described in this chapter. This procedure requires to identify the training set used to induce an oblique DT, as well as the three control parameters applied by the DE algorithm (CR, F, and NP) and the threshold value (τ) used to determine if a node is labeled as a leaf node. First, the DE-ODT method gets the attributes vector (a), the vector of class labels (c), and the instance set (ι) from the dataset whose

four leaf nodes.

4.2.3. General structure of the DE-ODT method

Figure 7. Construction of a partial oblique DT with only internal nodes.

These hyperplanes are assigned to the elements of w<sup>i</sup> : h<sup>1</sup> is assigned to w<sup>i</sup> <sup>1</sup>, <sup>h</sup><sup>2</sup> is assigned to wi <sup>2</sup>, and so on. Figure 6 shows an example of the construction of a set of hyperplanes from one chromosome for the hypothetical dataset previously described. Once wi is completed, it is used to create a partial DT with only internal nodes.


Figure 6. Construction of a set of hyperplanes from xi .

Differential Evolution Algorithm in the Construction of Interpretable Classification Models http://dx.doi.org/10.5772/intechopen.75694 59

Figure 7. Construction of a partial oblique DT with only internal nodes.

4.2.2. Induction of feasible oblique decision trees

58 Artificial Intelligence - Emerging Trends and Applications

<sup>1</sup>; …; xi dþ1

are assigned to the hyperplane h<sup>2</sup>

of the population:

wi

applies: Values xi

partial oblique DT from w<sup>i</sup>

successor nodes of w<sup>i</sup>

<sup>2</sup> and wi

Figure 6. Construction of a set of hyperplanes from xi

root node, wi

The DE-ODT method applies the following steps to map an oblique DT from a chromosome xi

1. Hyperplanes construction: xi is used to build the vector wi representing the sequence of candidate hyperplanes utilized in the internal nodes of a partial DT. Since the values of xi represent the hyperplane coefficients contained in these nodes, the following criterion

<sup>2</sup>, and so on. Figure 6 shows an example of the construction of a set of hyperplanes from one chromosome for the hypothetical dataset previously described. Once wi is completed,

elements of w<sup>i</sup> are inserted in pT<sup>i</sup> as successor nodes of those previously added so that each new level of the tree is completed before placing new nodes at the next level, in a similar way to the breadth-first search strategy. Figure 7 shows an example of the construction of a

. In this figure, it can be observed that w<sup>i</sup>

, the values xi

, and so on. For each j∈f g 1; …; ne , and for each

: h<sup>1</sup> is assigned to w<sup>i</sup>

<sup>1</sup>, w<sup>i</sup>

<sup>4</sup> and w<sup>i</sup>

ð Þ <sup>j</sup>�<sup>1</sup> ð Þþ <sup>d</sup>þ<sup>1</sup> <sup>k</sup>: (4)

<sup>d</sup>þ<sup>2</sup>;…; <sup>x</sup><sup>i</sup>

<sup>1</sup>, <sup>h</sup><sup>2</sup> is assigned to

. Next, the remaining

<sup>1</sup> is selected as the tree

<sup>5</sup> are designed as the

2dþ2

). First,

are assigned to the hyperplane h<sup>1</sup>

2. Partial oblique decision tree construction: w<sup>i</sup> is used to create the partial tree (pT<sup>i</sup>

<sup>3</sup> are placed as the successor nodes of w<sup>i</sup>

.

3. Oblique decision tree completion: The final stage of the mapping scheme adds leaf nodes in pT<sup>i</sup> using the training set. In this stage, one instance set is assigned to a node (the

the element in the initial location of wi is used as the root node of pT<sup>i</sup>

<sup>k</sup> <sup>∈</sup>f g <sup>1</sup>; …; <sup>d</sup> <sup>þ</sup> <sup>1</sup> , the <sup>k</sup>-th coefficient of hj is designed as follows

These hyperplanes are assigned to the elements of w<sup>i</sup>

it is used to create a partial DT with only internal nodes.

<sup>2</sup>, and so on.

h j <sup>k</sup> <sup>¼</sup> xi

> complete training set for the root node of the tree), and it is labeled as an internal node. To evaluate each instance in this set using the hyperplane associated to the internal node, two instances subsets are created, and they are assigned to the successor nodes of this node. This assignment is repeated for each node of the partial DT. If the internal node is located at the end of a branch of the DT, then two leaf nodes are created, and they are designated as successor nodes of this node. The instances subsets created are assigned to these leaf nodes. On the other hand, if all instances in the set assigned to the internal node have the same class label, it is labeled as a leaf node and its successor nodes are removed, if they exist. Figure 8 shows an example of this tree-completion procedure. Figure 8 shows that all the instances assigned to w<sup>3</sup> and w<sup>5</sup> have the same class label, so they are designated as leaf nodes, and the successor nodes of w<sup>3</sup> are removed from the tree. On the other hand, since w<sup>4</sup> is the ending node of a branch, its instance set is split using its hyperplane, the instances subsets produced are assigned to two new leaf nodes, and their majoritarian classes are assigned as their class labels. It can be observed that this tree has three internal nodes and four leaf nodes.
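The three mapping steps above can be sketched compactly. The following is a minimal Python illustration (the chapter's actual implementation is in Java with the JMetal library); the function names and the toy data are mine, not the authors':

```python
def decode_hyperplanes(x, d, ne):
    """Step 1 (Eq. 4): slice chromosome x into ne hyperplanes,
    each with d attribute coefficients plus one independent term."""
    return [x[j * (d + 1):(j + 1) * (d + 1)] for j in range(ne)]

def side(h, instance):
    """Evaluate an instance against hyperplane h: True when the instance
    falls on the positive side of sum(w_k * a_k) + w_{d+1} > 0."""
    d = len(h) - 1
    return sum(w * a for w, a in zip(h[:d], instance)) + h[d] > 0

def complete_tree(hyperplanes, instances, labels):
    """Steps 2-3: the partial tree is the implicit array-encoded binary
    tree (successors of node j are 2j+1 and 2j+2, breadth-first order).
    Instance sets are pushed down from the root; a node becomes a leaf
    when its set is pure or when it falls past the last internal node.
    Returns a dict {node index: class label} describing the leaf nodes."""
    ne = len(hyperplanes)
    sets = {0: list(range(len(instances)))}  # root gets the full training set
    leaves = {}
    for j in range(ne):
        idx = sets.get(j, [])
        if not idx:
            continue
        if len({labels[i] for i in idx}) == 1:   # pure set: label as a leaf
            leaves[j] = labels[idx[0]]
            continue
        left = [i for i in idx if side(hyperplanes[j], instances[i])]
        right = [i for i in idx if not side(hyperplanes[j], instances[i])]
        for child, sub in ((2 * j + 1, left), (2 * j + 2, right)):
            if child < ne:
                sets[child] = sub                # still an internal node
            elif sub:                            # end of a branch: new leaf
                counts = {}
                for i in sub:
                    counts[labels[i]] = counts.get(labels[i], 0) + 1
                leaves[child] = max(counts, key=counts.get)  # majority class
    return leaves

# Toy run: d = 2 attributes, ne = 3 internal nodes -> chromosome of 9 values.
x = [1, 0, -0.5,   0, 1, -0.5,   1, 1, -1.5]
hs = decode_hyperplanes(x, d=2, ne=3)
leaves = complete_tree(hs, [(0, 0), (1, 0), (0, 1), (1, 1)], [0, 1, 0, 1])
```

Here the root hyperplane ($x_1 - 0.5 > 0$) already separates the two classes, so both successors receive pure instance sets and become leaf nodes, which is exactly the purity-based labeling described for Figure 8.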

#### 4.2.3. General structure of the DE-ODT method

Algorithm 2 shows the structure of the DE-ODT method described in this chapter. This procedure requires identifying the training set used to induce an oblique DT, as well as the three control parameters applied by the DE algorithm (CR, F, and NP) and the threshold value (τ) used to determine whether a node is labeled as a leaf node. First, the DE-ODT method gets the attribute vector (a), the vector of class labels (c), and the instance set (ι) from the dataset whose

model must be built (line 1). Next, the values of $d$ and $n$ are computed (lines 2–4). Then, the DE algorithm evolves a population of real-valued chromosomes encoding oblique DTs, and selects the best candidate solution $x_{best}$ in the last population as the result of its evolutionary process (line 5). After that, a near-optimal oblique DT is constructed by applying the procedures described in the previous paragraphs (lines 6–8). Since the DE-ODT method uses an a priori definition of the size of the chromosome, it is possible that some leaf nodes in the DT do not meet the following conditions: that the size of their instance subset is less than τ, or that all the instances in the subset belong to the same class. The DE-ODT method refines the DT by replacing such nonoptimal leaf nodes with sub-trees (line 9). Finally, the oblique DT is pruned to reduce the possible overfitting generated by applying this refinement (line 10).

This procedure allows inducing feasible oblique DTs with a different number of nodes, although they are represented with a fixed-length parameter vector.

Figure 8. Completion of an oblique DT using $pT^i$ and the training instances.

## 5. Experimental study

In this chapter, the experimental study carried out to analyze the performance of the DE-based methods for DTI is detailed. First, a description of the datasets used in this study, as well as the definition of the parameters of each method, is given. Then, both the model validation technique used in the experiments and the statistical tests applied to evaluate the results obtained are outlined. Finally, a discussion of the performance of the DE-based methods is provided.

## 5.1. Experimental setup

A benchmark of 20 datasets chosen from the UCI machine learning repository [55] is used to carry out the experimental study. These datasets have been selected because their attributes are numerical, their instances are classified into two or more classes, and most of them are imbalanced datasets. Table 1 shows the description of these datasets. To ensure that the comparison of the results achieved by the DE variants with those produced by other approaches is not affected by the treatment of the data, all datasets used in this study have no missing values. Also, the data are not preprocessed, filtered, or normalized; that is, they are used as they are obtained from the UCI repository.

| Dataset | Instances | Attributes | Classes | Class distribution |
|---|---|---|---|---|
| Glass | 214 | 9 | 7 | 70∣76∣17∣0∣13∣9∣29 |
| Diabetes | 768 | 8 | 2 | 500∣268 |
| Balance-scale | 625 | 4 | 3 | 288∣49∣288 |
| Heart-statlog | 270 | 13 | 2 | 150∣120 |
| Iris | 150 | 4 | 3 | 50 instances per class |
| Australian | 690 | 14 | 2 | 307∣383 |
| Ionosphere | 351 | 34 | 2 | 126∣225 |
| Wine | 178 | 13 | 3 | 59∣71∣48 |
| Sonar | 208 | 60 | 2 | 97∣111 |
| Vehicle | 846 | 18 | 4 | 212∣217∣218∣199 |
| Liver-disorders | 345 | 6 | 2 | 145∣200 |
| Page-blocks | 5473 | 10 | 5 | 4913∣329∣28∣88∣115 |
| Blood-t | 748 | 4 | 2 | 570∣178 |
| Breast-tissue-6 | 106 | 9 | 6 | 22∣21∣14∣15∣16∣18 |
| Movement-libras | 360 | 90 | 15 | 24 instances per class |
| Parkinsons | 195 | 22 | 2 | 48∣147 |
| Seeds | 210 | 6 | 3 | 70 instances per class |
| Segment | 2310 | 19 | 7 | 330 instances per class |
| Ecoli | 336 | 7 | 8 | 143∣77∣52∣35∣20∣5∣2∣2 |
| Spambase | 4601 | 57 | 2 | 1813∣2788 |

Table 1. Description of datasets used in the experiments.

The DE-based methods are implemented in the Java language using the JMetal library [56]. The mutation scale factor is linearly decreased from 0.5 to 0.1 as the evolutionary process progresses, and the crossover rate is fixed at 0.9. The decrease in the F value allows more exploration of the search space at the beginning of the evolutionary process and, with the passage of the generations, a better exploitation of promising areas of this space [57]. The population size is adjusted to $5n$, with 250 and 500 chromosomes as lower and upper bounds, respectively. These bounds ensure that the population is neither so small that it prevents a reasonable exploration of the search space nor so large that it impacts the runtime of the algorithm. Furthermore, the fitness function used in the DE-ODT method computes the training accuracy of each DT in the population, and the twoing rule is used as the fitness value in the OC1-DE method. The best oblique DT induced by these methods is pruned using the error-based pruning (EBP) approach [58]. Finally, the threshold value used to determine whether a node should be labeled as a leaf node is set to two instances, and the DT size is defined as the number of leaf nodes of the oblique DT.
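As a concrete, simplified illustration of the parameter schedule just described, the following Python sketch runs a DE/rand/1/bin loop with F decaying linearly from 0.5 to 0.1, CR fixed at 0.9, and a population of $5n$ chromosomes clamped to [250, 500]. The toy fitness stands in for the training accuracy of a decoded DT; the chapter's implementation uses Java and JMetal, so all names here are illustrative:

```python
import random

def de_optimize(fitness, n, generations=30, cr=0.9, seed=7):
    """DE/rand/1/bin sketch maximizing `fitness` over n real-valued genes."""
    rng = random.Random(seed)
    np_ = min(max(5 * n, 250), 500)      # population size: 5n clamped to [250, 500]
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(np_)]
    fit = [fitness(v) for v in pop]
    for g in range(generations):
        # mutation scale factor decreases linearly from 0.5 to 0.1
        f = 0.5 - 0.4 * g / max(1, generations - 1)
        for i in range(np_):
            r1, r2, r3 = rng.sample([j for j in range(np_) if j != i], 3)
            jrand = rng.randrange(n)     # guarantee at least one mutated gene
            trial = [pop[r1][k] + f * (pop[r2][k] - pop[r3][k])
                     if (rng.random() < cr or k == jrand) else pop[i][k]
                     for k in range(n)]
            tf = fitness(trial)
            if tf >= fit[i]:             # greedy one-to-one replacement
                pop[i], fit[i] = trial, tf
    best = max(range(np_), key=fit.__getitem__)
    return pop[best], fit[best]

# Toy fitness: a concave function with its optimum (value 1.0) at the origin.
best, best_fit = de_optimize(lambda v: 1.0 - sum(t * t for t in v), n=3)
```

With the decaying F, early generations take large mutation steps (exploration) while late generations take small ones (exploitation), matching the rationale cited from [57].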


The results obtained with the DE-based methods are compared with those achieved by several supervised learning methods available in the WEKA data mining software [62]. First, the accuracy and the size of the DTs obtained by these algorithms are compared with those produced by the J48 method [63] and by the SimpleCART (sCART) [54] procedure. Next, the accuracy of the DTs constructed with the DE-based procedures is compared with that achieved using the following classification methods: Naïve Bayes (NB) [64], multilayer perceptron (MLP) [65], radial basis function neural network (RBF-NN) [66], and random forest (RF) [67].

#### 5.2. Comparison with DTI methods


In this study, a repeated stratified 10-fold cross-validation (CV) procedure is used to estimate the predictive performance of the DE-based methods, and the Friedman test [59] is applied to carry out a statistical analysis of the results produced by these methods, comparing them with those obtained by other classification methods. This nonparametric statistical test evaluates the statistical significance of the experimental results by computing the p-value without making any assumptions about the distribution of the analyzed data. This p-value is used to accept or reject the null hypothesis $H_0$ of the experiment, which holds that the performance of the compared algorithms does not present significant differences. If the p-value does not exceed a predefined significance level, $H_0$ is rejected and the Bergmann-Hommel (BH) post hoc test [60] is conducted to detect the differences between all existing pairs of algorithms. These statistical tests are applied using the scmamp R library [61].
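The Friedman machinery used here is straightforward to reproduce. The pure-Python sketch below (the chapter itself uses the scmamp R library, so the helper names are mine) ranks each dataset's accuracies from Table 2, averages tied ranks, and computes the tie-corrected Friedman chi-square; over the four methods and 20 datasets it yields the 16.197 statistic reported in this section:

```python
from collections import Counter

def row_ranks(values):
    """Rank one dataset's results (rank 1 = best accuracy), averaging ties."""
    order = sorted(range(len(values)), key=lambda j: -values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1   # average rank of the tied block
        i = j + 1
    return ranks

def friedman_statistic(table):
    """Tie-corrected Friedman chi-square for k methods over N datasets."""
    n, k = len(table), len(table[0])
    rank_rows = [row_ranks(row) for row in table]
    totals = [sum(r[j] for r in rank_rows) for j in range(k)]
    s = sum((t - n * (k + 1) / 2) ** 2 for t in totals)
    ties = sum(t ** 3 - t for row in table for t in Counter(row).values())
    w = 12 * s / (n * n * k * (k ** 2 - 1) - n * ties)   # Kendall's W with ties
    return n * (k - 1) * w

# Average accuracies from Table 2: columns are J48, sCART, OC1-DE, DE-ODT.
ACC = [
    [67.62, 71.26, 71.31, 68.97], [74.49, 74.56, 73.37, 75.79],
    [77.82, 78.74, 93.92, 91.97], [78.15, 78.07, 74.11, 81.11],
    [94.73, 94.20, 96.73, 97.17], [84.35, 85.19, 85.19, 85.61],
    [89.74, 88.86, 91.11, 92.28], [93.20, 89.49, 92.58, 91.88],
    [73.61, 70.67, 77.65, 79.34], [72.28, 69.91, 72.32, 71.33],
    [65.83, 66.64, 67.63, 71.16], [96.99, 96.76, 96.88, 97.07],
    [78.20, 77.86, 76.35, 78.70], [34.81, 32.45, 34.91, 38.85],
    [69.31, 65.64, 75.11, 55.63], [84.72, 86.31, 87.95, 86.43],
    [90.90, 90.90, 93.76, 91.79], [96.79, 95.83, 95.93, 94.78],
    [82.83, 83.15, 83.51, 84.72], [92.68, 92.35, 92.19, 93.94],
]
chi2 = friedman_statistic(ACC)   # -> 16.197, as reported in the chapter
```

The reported p-value of $1.033 \times 10^{-3}$ then follows from comparing this statistic against the chi-square distribution with $k - 1 = 3$ degrees of freedom.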

Table 2. Average accuracies obtained by the DTI algorithms and the DE-based methods.

| Dataset | J48 | sCART | OC1-DE | DE-ODT |
|---|---|---|---|---|
| Glass | 67.62 (4) | 71.26 (2) | **71.31 (1)** | 68.97 (3) |
| Diabetes | 74.49 (3) | 74.56 (2) | 73.37 (4) | **75.79 (1)** |
| Balance-scale | 77.82 (4) | 78.74 (3) | **93.92 (1)** | 91.97 (2) |
| Heart-statlog | 78.15 (2) | 78.07 (3) | 74.11 (4) | **81.11 (1)** |
| Iris | 94.73 (3) | 94.20 (4) | 96.73 (2) | **97.17 (1)** |
| Australian | 84.35 (4) | 85.19 (2.5) | 85.19 (2.5) | **85.61 (1)** |
| Ionosphere | 89.74 (3) | 88.86 (4) | 91.11 (2) | **92.28 (1)** |
| Wine | **93.20 (1)** | 89.49 (4) | 92.58 (2) | 91.88 (3) |
| Sonar | 73.61 (3) | 70.67 (4) | 77.65 (2) | **79.34 (1)** |
| Vehicle | 72.28 (2) | 69.91 (4) | **72.32 (1)** | 71.33 (3) |
| Liver-disorders | 65.83 (4) | 66.64 (3) | 67.63 (2) | **71.16 (1)** |
| Page-blocks | 96.99 (2) | 96.76 (4) | 96.88 (3) | **97.07 (1)** |
| Blood-t | 78.20 (2) | 77.86 (3) | 76.35 (4) | **78.70 (1)** |
| Breast-tissue-6 | 34.81 (3) | 32.45 (4) | 34.91 (2) | **38.85 (1)** |
| Movement-libras | 69.31 (2) | 65.64 (3) | **75.11 (1)** | 55.63 (4) |
| Parkinsons | 84.72 (4) | 86.31 (3) | **87.95 (1)** | 86.43 (2) |
| Seeds | 90.90 (3.5) | 90.90 (3.5) | **93.76 (1)** | 91.79 (2) |
| Segment | **96.79 (1)** | 95.83 (3) | 95.93 (2) | 94.78 (4) |
| Ecoli | 82.83 (4) | 83.15 (3) | 83.51 (2) | **84.72 (1)** |
| Spambase | 92.68 (2) | 92.35 (3) | 92.19 (4) | **93.94 (1)** |
| Average ranking | 2.825 | 3.250 | 2.175 | 1.750 |


Table 2 and Figure 9 show the average accuracies of the DTs induced by the DTI algorithms as well as those achieved by the DE-based methods. In Table 2, the best result for each dataset is highlighted with bold numbers, and the numbers in parentheses refer to the ranking reached by each method for each dataset. The last row in this table indicates the average ranking of each method. It is observed that the DE-based methods produce better results than those generated by the other DTI algorithms.

A statistical test of the experimental results is conducted to evaluate the performance of the DE-based methods. First, the Friedman test is run; its resulting statistic value is 16.197 for four methods and 20 datasets, which has a p-value of $1.033 \times 10^{-3}$. When evaluating this p-value with a significance level of 5%, $H_0$ is rejected. Next, the BH post hoc test is applied to

Figure 9. Graphical comparison of the average accuracies obtained by the DTI algorithms and the DE-based methods.




find all the possible hypotheses which cannot be rejected. Table 3 shows both the average rank (AR) of the results yielded by each method and the p-values computed by comparing the average accuracies achieved by the DE-based procedures versus those obtained by the other DTI methods. The p-values highlighted with bold numbers indicate that $H_0$ is rejected for that pair of methods, since they show different performance. Unadjusted p-values are calculated from the average ranks of the two methods being compared, as described by Demšar in [68]. These values are used by the BH post hoc test to compute the corresponding adjusted p-values. Table 3 shows that the DE-ODT method performs better than the other DTI methods, since it has the lowest average rank (1.750) and its results are statistically different from those of these methods. Figure 10 shows a graph where the nodes represent the compared methods and an edge joining two nodes indicates that the performance of these methods does not present significant differences. The values shown on the edges are the p-values computed by the BH post hoc test. This figure is based on that obtained using the scmamp library; it shows that the DE-based methods are not statistically different from each other and that the DE-ODT method is statistically different from the DTI methods. These statistical results indicate that the DE-ODT method is the best DTI method for building oblique DTs.
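The unadjusted p-values in Table 3 can be reproduced directly from the average ranks with the normal approximation described by Demšar [68]: the rank difference of two methods is divided by $\sqrt{k(k+1)/6N}$ and compared against a standard normal distribution. A small pure-stdlib Python check (variable names are mine):

```python
import math

def rank_test_p(ar_a, ar_b, k, n):
    """Two-sided p-value for the difference of two average ranks
    over n datasets and k compared methods (normal approximation)."""
    z = abs(ar_a - ar_b) / math.sqrt(k * (k + 1) / (6 * n))
    return math.erfc(z / math.sqrt(2))    # equals 2 * (1 - Phi(z))

# Average ranks from Table 2 (k = 4 methods, N = 20 datasets).
AR = {"J48": 2.825, "sCART": 3.250, "OC1-DE": 2.175, "DE-ODT": 1.750}
p = {(a, b): rank_test_p(AR[a], AR[b], k=4, n=20)
     for a in ("J48", "sCART") for b in ("OC1-DE", "DE-ODT")}
```

These values reproduce the unadjusted column of Table 3 (e.g. about 1.1134e-01 for J48 vs. OC1-DE and 2.3856e-04 for sCART vs. DE-ODT); the BH column then adjusts them for the multiple comparisons being made.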

On the other hand, the average sizes of the DTs constructed by the DE-based algorithms, as well as of those induced by the J48 and sCART methods, are shown in Table 4 and Figure 11. As in Table 2, the best result for each dataset in Table 4 is highlighted with bold numbers, and the numbers in parentheses refer to the ranking reached by each method for each dataset. These results indicate that the DE-ODT method produces the most compact DTs. It is also observed that the DTs built by the OC1-DE method are less complex than those yielded by the J48 method.

Figure 11. Average DT sizes of several DTI methods.

| Method | AR | OC1-DE (unadjusted) | OC1-DE (BH) | DE-ODT (unadjusted) | DE-ODT (BH) |
|---|---|---|---|---|---|
| J48 | 2.825 | 1.1134e-01 | 1.1134e-01 | 8.4584e-03 | **2.5375e-02** |
| sCART | 3.250 | 8.4584e-03 | **2.5375e-02** | 2.3856e-04 | **1.4131e-03** |
| OC1-DE | 2.175 | — | — | 2.9786e-01 | 5.9572e-01 |
| DE-ODT | 1.750 | 2.9786e-01 | 5.9572e-01 | — | — |

Table 3. The p-values for multiple comparisons among DTI algorithms and the DE-based methods.

Figure 10. The p-value graph of the DTI algorithms and the DE-based methods.

| Dataset | J48 | sCART | OC1-DE | DE-ODT |
|---|---|---|---|---|
| Glass | 23.58 (4) | **8.00 (1)** | 21.61 (3) | 11.08 (2) |
| Diabetes | 22.20 (3) | **3.00 (1)** | 41.55 (4) | 14.77 (2) |
| Balance-scale | 41.60 (4) | 13.00 (2) | 15.24 (3) | **5.01 (1)** |
| Heart-statlog | 17.82 (4) | 16.00 (2) | 17.43 (3) | **7.23 (1)** |
| Iris | 4.64 (3) | 5.00 (4) | **3.00 (1)** | 3.37 (2) |
| Australian | 25.75 (4) | **5.00 (1)** | 21.90 (3) | 15.64 (2) |
| Ionosphere | 13.87 (4) | **3.00 (1)** | 7.20 (2) | 7.73 (3) |
| Wine | 5.30 (3) | 5.00 (2) | 5.48 (4) | **4.71 (1)** |
| Sonar | 14.45 (4) | 10.00 (2) | 10.24 (3) | **6.13 (1)** |
| Vehicle | 69.50 (3) | 80.00 (4) | 56.74 (2) | **44.25 (1)** |
| Liver-disorders | 25.51 (4) | **3.00 (1)** | 22.65 (3) | 6.60 (2) |
| Page-blocks | 42.91 (4) | **22.00 (1)** | 38.70 (3) | 24.56 (2) |
| Blood-t | **6.50 (1)** | 10.00 (3) | 22.39 (4) | 8.46 (2) |
| Breast-tissue-6 | 22.45 (4) | **8.00 (1)** | 14.09 (3) | 8.97 (2) |
| Movement-libras | 47.52 (4) | 30.00 (3) | **27.46 (1)** | 29.07 (2) |
| Parkinsons | 10.24 (4) | 7.00 (2) | 7.11 (3) | **4.85 (1)** |
| Seeds | 7.42 (4) | 6.00 (3) | 4.78 (2) | **3.17 (1)** |
| Segment | 41.21 (4) | 41.00 (3) | 30.53 (2) | **27.91 (1)** |
| Ecoli | 18.59 (4) | 15.00 (3) | 12.57 (2) | **7.06 (1)** |
| Spambase | 103.37 (4) | 75.00 (3) | 74.42 (2) | **31.70 (1)** |
| Average ranking | 3.65 | 2.15 | 2.65 | 1.55 |

Table 4. Average DT sizes obtained by the DTI methods.


5.3. Comparison with other classification methods

Method AR OC1-DE DE-ODT

Table 6. The p-values for multiple comparisons among several classification methods.

the RBF-NN algorithm and the NB method.

Figure 13. The p-value graph of the classification methods.

different with the NB method, only.

Table 5 and Figure 12 show the average accuracies got by several classification methods as well as those obtained by the DE-based methods. In this table, we can observe that the RF algorithm and the MLP method construct more accurate classifiers than the others, and also that the DE-based procedures induce DTs with better accuracy than the models built by both

NB 4.800 6.2979e-02 3.7787e-01 4.0591e-03 2.8414e-02 MLP 2.925 1.9019e-01 7.6079e-01 7.6737e-01 8.9374e-01 RBF-NN 4.350 2.7118e-01 7.7679e-01 3.4610e-03 1.3844e-01 RF 2.125 7.7623e-03 5.4336e-03 0.9342e-02 3.9736e-01 OC1-DE 3.700 — — 3.1049e-01 7.6079e-01

DE-ODT 3.100 3.1049e-01 7.6079e-01 — —

Unadjusted BH Unadjusted BH

Differential Evolution Algorithm in the Construction of Interpretable Classification Models

http://dx.doi.org/10.5772/intechopen.75694

67

The Friedman statistics computed by analyzing the results got by these six methods with 20 datasets is 27.661, and the corresponding <sup>p</sup>-value is 4:<sup>24</sup> <sup>10</sup><sup>5</sup> so that <sup>H</sup><sup>0</sup> is rejected. The BH post hoc test is then applied to find all possible hypotheses that cannot be refused. Table 6 shows the results of these tests, and Figure 13 shows the graph corresponding to these p-values. The value highlighted with bold in Table 6 indicates that the DE-ODT method is statistically

Table 5. Average accuracies obtained by several classification methods.

Figure 12. Graphical comparison of the average accuracies obtained by several classification methods.

Differential Evolution Algorithm in the Construction of Interpretable Classification Models http://dx.doi.org/10.5772/intechopen.75694 67


Table 6. The p-values for multiple comparisons among several classification methods.

#### 5.3. Comparison with other classification methods

Dataset NB MLP RBF-NN RF OC1-DE DE-ODT Glass 49.44 (6) 67.29 (4) 65.09 (5) 79.95 (1) 71.31 (2) 68.97 (3) Diabetes 75.76 (3) 74.75 (4) 74.04 (5) 76.18 (1) 73.37 (6) 75.79 (2) Balance-scale 90.53 (4) 90.69 (3) 86.34 (5) 81.71 (6) 93.92 (1) 91.97 (2) Heart-statlog 83.59 (1) 79.41 (5) 83.11 (2) 82.41 (3) 74.11 (6) 81.11 (4) Iris 95.53 (5) 96.93 (2) 96.00 (4) 94.73 (6) 96.73 (3) 97.17 (1) Australian 77.19 (6) 83.42 (4) 82.55 (5) 86.77 (1) 85.19 (3) 85.61 (2) Ionosphere 82.17 (6) 91.05 (5) 91.71 (3) 93.39 (1) 91.11 (4) 92.28 (2) Wine 97.47 (4) 98.03 (1.5) 97.70 (3) 98.03 (1.5) 92.58 (5) 91.88 (6) Sonar 67.69 (6) 81.59 (2) 72.60 (5) 84.47 (1) 77.65 (4) 79.34 (3) Vehicle 44.68 (6) 81.11 (1) 65.35 (5) 75.14 (2) 72.32 (3) 71.33 (4) Liver-disorders 54.87 (6) 68.72 (3) 65.04 (5) 72.99 (1) 67.63 (4) 71.16 (2) Page-blocks 90.01 (6) 96.28 (4) 94.91 (5) 97.54 (1) 96.88 (3) 97.07 (2) Blood-t 75.28 (5) 78.46 (2) 78.22 (3) 73.62 (6) 76.35 (4) 78.70 (1) Breast-tissue-6 46.42 (1) 35.47 (5) 41.13 (3) 45.19 (2) 34.91 (6) 38.85 (4) Movement-libras 64.14 (5) 80.50 (2) 75.50 (3) 82.89 (1) 75.11 (4) 55.63 (6) Parkinsons 70.10 (6) 91.44 (1) 81.49 (5) 91.38 (2) 87.95 (3) 86.43 (4) Seeds 90.52 (6) 95.24 (1) 91.67 (5) 93.57 (3) 93.76 (2) 91.79 (4) Segment 80.17 (6) 96.21 (2) 87.31 (5) 98.07 (1) 95.93 (3) 94.78 (4) Ecoli 85.51 (2) 84.85 (3) 83.30 (6) 86.25 (1) 83.51 (5) 84.72 (4) Spambase 79.56 (6) 91.19 (4) 81.31 (5) 95.65 (1) 92.19 (3) 93.94 (2) Average ranking 4.800 2.925 4.350 2.125 3.700 3.100

Table 5. Average accuracies obtained by several classification methods.

66 Artificial Intelligence - Emerging Trends and Applications

Figure 12. Graphical comparison of the average accuracies obtained by several classification methods.

Table 5 and Figure 12 show the average accuracies got by several classification methods as well as those obtained by the DE-based methods. In this table, we can observe that the RF algorithm and the MLP method construct more accurate classifiers than the others, and also that the DE-based procedures induce DTs with better accuracy than the models built by both the RBF-NN algorithm and the NB method.

The Friedman statistics computed by analyzing the results got by these six methods with 20 datasets is 27.661, and the corresponding <sup>p</sup>-value is 4:<sup>24</sup> <sup>10</sup><sup>5</sup> so that <sup>H</sup><sup>0</sup> is rejected. The BH post hoc test is then applied to find all possible hypotheses that cannot be refused. Table 6 shows the results of these tests, and Figure 13 shows the graph corresponding to these p-values. The value highlighted with bold in Table 6 indicates that the DE-ODT method is statistically different with the NB method, only.

Figure 13. The p-value graph of the classification methods.

The p-values obtained by the BH post hoc test indicate that the RF method is statistically different only from the RBF-NN algorithm and the NB method, and that both the MLP method and the DE-ODT procedure are statistically different from the NB method. The comparisons between the remaining pairs of algorithms indicate that they have similar performance. The RF method is the best ranked in this comparison, and the AR of the DE-ODT procedure places it as the third-best classification method.
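The DE-based methods compared here induce oblique DTs, whose node tests differ from the axis-parallel splits of univariate trees. A minimal sketch of the two kinds of test condition (the attribute values and split parameters are hypothetical, not taken from the chapter):

```python
import numpy as np

def axis_parallel_test(x, attr, threshold):
    """Univariate node (e.g., J48-style): route on a single attribute."""
    return "left" if x[attr] <= threshold else "right"

def oblique_test(x, w, theta):
    """Oblique node (e.g., OC1-DE/DE-ODT style): route on a linear
    combination of all attributes, i.e., the hyperplane w . x = theta."""
    return "left" if float(np.dot(w, x)) <= theta else "right"

x = np.array([0.8, 0.3])  # one instance with two attributes
print(axis_parallel_test(x, attr=0, threshold=0.5))        # uses x[0] only
print(oblique_test(x, np.array([1.0, -2.0]), theta=0.0))   # uses 1.0*x0 - 2.0*x1
```

Because the oblique test can tilt its separating hyperplane, a single oblique node can express boundaries that would require several axis-parallel nodes, which is why oblique trees tend to be more compact.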

## 6. Conclusions

In this chapter, two DE-based methods to induce oblique DTs are described. The OC1-DE method implements a recursive partitioning strategy to find a near-optimal hyperplane that is used as the test condition of an oblique DT. In the DE-ODT method, on the other hand, a global search in the space of oblique DTs is conducted with the aim of finding a near-optimal tree. The DE-ODT method estimates the size of the chromosome encoding a complete tree from both the number of attributes and the number of classes of the dataset whose model is constructed, and it defines a scheme to map feasible oblique DTs from this chromosome.

The experimental results indicate that these DE-based methods are better DTI methods, since they build more accurate and compact oblique DTs than those induced by the J48 and sCART procedures. The DE-ODT method is better than OC1-DE because its search procedure combines exploration and exploitation skills, providing a better way to discover the relationships between the attributes used in the training set; although the search process is guided only by the accuracy of the DT, the models constructed are more compact than those produced by the methods that implement a recursive partitioning strategy. Among the other compared methods, the results obtained by the OC1-DE method are better than those obtained by the rest, since it uses a linear combination of attributes in each test condition of the tree and produces better hyperplanes than the axis-parallel ones.

Even though the results yielded by the DE-based variants are not better than those produced by the RF algorithm and the MLP-based classifier, they are statistically equivalent. An advantage of the DE-based methods is that they construct models whose decisions and operations are easily understood; although the RF method also builds DTs, its voting scheme makes it very difficult to trace the way in which the model makes its decisions.

An analysis of the run time of the algorithms is not performed in this chapter, since it is known that MHs consume more computational time than other approaches: they work with a group of candidate solutions, unlike the traditional methods, where only one DT is induced from the training set. It is important to mention that for many practical applications the construction of the model is conducted in one offline procedure, so its construction time is not a parameter that usually impacts the efficiency of the built model.

## Author details

Rafael Rivera-Lopez<sup>1</sup> and Juana Canul-Reich<sup>2</sup>\*

\*Address all correspondence to: juana.canul@ujat.mx

1 Departamento de Sistemas y Computación, Instituto Tecnológico de Veracruz, Veracruz, VER, México

2 División Académica de Informática y Sistemas, Universidad Juárez Autónoma de Tabasco, Cunduacán, TAB, México

Differential Evolution Algorithm in the Construction of Interpretable Classification Models
http://dx.doi.org/10.5772/intechopen.75694

## References

[1] Frawley WJ, Piatetsky-Shapiro G, Matheus CJ. Knowledge discovery in databases: An overview. AI Magazine. 1992;13(3):57

[2] Stiglic G, Kocbek S, Pernek I, Kokol P. Comprehensive decision tree models in bioinformatics. PLoS One. 2012;7(3):1-13

[3] Huysmans J, Dejaeger K, Mues C, Vanthienen J, Baesens B. An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decision Support Systems. 2011;51(1):141-154

[4] Nettleton DF, Orriols-Puig A, Fornells A. A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review. 2010;33(4):275-306

[5] Du KL, Swamy MNS. Search and Optimization by Metaheuristics. Switzerland: Springer; 2016

[6] Storn R, Price K. Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization. 1997;11(4):341-359

[7] Galea M, Shen Q, Levine J. Evolutionary approaches to fuzzy modelling for classification. The Knowledge Engineering Review. 2004;19(1):27-59

[8] Espejo PG, Ventura S, Herrera F. A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2010;40(2):121-144

[9] Kokol P, Pohorec S, Štiglic G, Podgorelec V. Evolutionary design of decision trees for medical application. Data Mining and Knowledge Discovery. 2012;2(3):237-254

[10] Barros RC, Basgalupp MP, Carvalho ACPLF, Freitas AA. A survey of evolutionary algorithms for decision-tree induction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2012;42(3):291-312


[11] Kolçe E, Frasheri N. The use of heuristics in decision tree learning optimization. International Journal of Computer Engineering in Research Trends. 2014;1(3):127-130

[12] Alurkar AA, Ranade SB, Joshi SV, Ranade SS, Sonewar PA, Mahalle PN, Deshpande AV. A proposed data science approach for email spam classification using machine learning techniques. In: 2017 Internet of Things Business Models, Users, and Networks; November 2017; pp. 1-5

[13] Mishan MT, Kushan AL, Fadzil AFA, Amir ALB, Anuar NB. An analysis on business intelligence predicting business profitability model using naive Bayes neural network algorithm. In: 2017 7th IEEE International Conference on System Engineering and Technology (ICSET). Shah Alam, Malaysia: IEEE; 2017; pp. 59-64

[14] Lippmann RP. An introduction to computing with neural nets. ASSP Magazine. 1987;4(2):4-22

[15] Abe S. Support Vector Machines for Pattern Classification. London, UK: Springer; 2005

[16] Murthy SK. On Growing Better Decision Trees from Data [PhD thesis]. The Johns Hopkins University; 1997

[17] Fisher RA. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 1936;7(2):179-188

[18] Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. San Francisco, CA, USA: Morgan Kaufmann; 2005

[19] Heath DG. A Geometric Framework for Machine Learning [PhD thesis]. Johns Hopkins University; 1993

[20] Birattari M. Tuning Metaheuristics: A Machine Learning Perspective. Volume 197 of Studies in Computational Intelligence. Berlin Heidelberg: Springer; 2009

[21] Boussaïd I, Lepagnot J, Siarry P. A survey on optimization metaheuristics. Information Sciences. 2013;237:82-117

[22] Talbi EG. Metaheuristics: From Design to Implementation. Hoboken, NJ, USA: Wiley; 2009

[23] Li J, Ding L, Li B. Differential evolution-based parameters optimisation and feature selection for support vector machine. International Journal of Computational Science and Engineering. 2016;13(4):355-363

[24] Leema N, Nehemiah HK, Kannan A. Neural network classifier optimization using differential evolution with global information and back propagation algorithm for clinical datasets. Applied Soft Computing. 2016;49:834-844

[25] Geetha K, Baboo SS. An empirical model for thyroid disease classification using evolutionary multivariate Bayesian prediction method. Global Journal of Computer Science and Technology. 2016;16(1):1-9

[26] García S, Derrac J, Triguero I, Carmona CJ, Herrera F. Evolutionary-based selection of generalized instances for imbalanced classification. Knowledge-Based Systems. 2012;25(1):3-12

[27] Tušar T. Optimizing accuracy and size of decision trees. In: Proceedings of the 16th International Electrotechnical and Computer Science Conference (ERK-2007), Portorož, Slovenia; 2007; pp. 81-84

[28] Neri F, Tirronen V. Recent advances in differential evolution: A survey and experimental analysis. Artificial Intelligence Review. 2010;33(1–2):61-106

[29] Heath DG, Kasif S, Salzberg S. Induction of oblique decision trees. In: Bajcsy R, editor. Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93); Chambéry, France; 1993. pp. 1002-1007

[30] Murthy SK, Kasif S, Salzberg S, Beigel R. OC1: A randomized algorithm for building oblique decision trees. In: AAAI'93. Vol. 93. AAAI Press; 1993. pp. 322-327

[31] Cantú-Paz E, Kamath C. Inducing oblique decision trees with evolutionary algorithms. IEEE Transactions on Evolutionary Computation. 2003;7(1):54-68

[32] Li XB, Sweigart JR, Teng JTC, Donohue JM, Thombs L, Wang SM. Multivariate decision trees using linear discriminants and tabu search. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans. 2003;33(2):194-205

[33] Orsenigo C, Vercellis C. Discrete support vector decision trees via tabu search. Computational Statistics & Data Analysis. 2004;47(2):311-322

[34] Zhang K, Xu Z, Buckles BP. Oblique decision tree induction using multimembered evolution strategies. In: Dasarathy BV, editor. Proceedings of Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, SPIE 2005. Vol. 5812. Orlando, Florida: SPIE; 2005. pp. 263-270

[35] Chai BB, Zhuang X, Zhao Y, Sklansky J. Binary linear decision tree with genetic algorithm. In: Proceedings of the 13th International Conference on Pattern Recognition (ICPR'96). Track D: Parallel and Connectionist Systems. Vol. IV. Vienna: IEEE; 1996. pp. 530-534

[36] Struharik R, Vranjkovic V, Dautovic S, Novak L. Inducing oblique decision trees. In: Proceedings of the 12th International Symposium on Intelligent Systems and Informatics (SISY-2014). Subotica, Serbia: IEEE; 2014. pp. 257-262

[37] Krȩtowski M. An evolutionary algorithm for oblique decision tree induction. In: Rutkowski L, Siekmann J, Tadeusiewicz R, Zadeh LA, editors. Proceedings of the 7th International Conference on Artificial Intelligence and Soft Computing (ICAISC 2004). LNAI. Vol. 3070. Zakopane, Poland: Springer; 2004. pp. 432-437

[38] Pangilinan JM, Janssens GK. Pareto-optimality of oblique decision trees from evolutionary algorithms. Journal of Global Optimization. 2011;51(2):301-311

[39] Rivera-Lopez R, Canul-Reich J, Gámez JA, Puerta JM. OC1-DE: A differential evolution based approach for inducing oblique decision trees. In: Rutkowski L, Korytkowski M, Scherer R, Tadeusiewicz R, Zadeh LA, Zurada JM, editors. Proceedings of the 16th International Conference in Artificial Intelligence and Soft Computing (ICAISC 2017). LNCS. Vol. 10245. Zakopane, Poland: Springer; 2017. pp. 427-438

[40] Dumitrescu D, András J. Generalized decision trees built with evolutionary techniques. Studies in Informatics and Control. 2005;14(1):15-22

[41] Vukobratovic B, Struharik R. Evolving full oblique decision trees. In: Proceedings of the 16th IEEE International Symposium on Computational Intelligence and Informatics (CINTI 2015). Budapest, Hungary: IEEE; 2015. pp. 95-100

[42] Krȩtowski M, Grześ M. Global induction of oblique decision trees: An evolutionary approach. In: Kłopotek MA et al., editors. IIPWM'05. Volume 31 of ASC. Berlin Heidelberg: Springer; 2005. pp. 309-318

[43] Krȩtowski M, Grześ M. Evolutionary learning of linear trees with embedded feature selection. In: Rutkowski L et al., editors. ICAISC 2006. LNAI. Vol. 4029. Springer; 2006. pp. 400-409

[44] Gray JB, Fan G. Classification tree analysis using TARGET. Computational Statistics & Data Analysis. 2008;52(3):1362-1372

[45] Liu KH, Xu CG. A genetic programming-based approach to the classification of multiclass microarray datasets. Bioinformatics. 2009;25(3):331-337

[46] Bot MCJ, Langdon WB. Application of genetic programming to induction of linear classification trees. In: Poli R et al., editors. EuroGP 2000. LNCS. Vol. 1802. Berlin Heidelberg: Springer; 2000. pp. 247-258

[47] Bot MCJ, Langdon WB. Improving induction of linear classification trees with genetic programming. In: Whitley LD et al., editors. GECCO-2000. San Francisco, CA, USA: Morgan Kaufmann; 2000. pp. 403-410

[48] Agapitos A, O'Neill M, Brabazon A, Theodoridis T. Maximum margin decision surfaces for increased generalisation in evolutionary decision tree learning. In: Silva S et al., editors. EuroGP 2011. LNCS. Vol. 6621. Berlin Heidelberg: Springer; 2011. pp. 61-72

[49] Lopes RA, Freitas ARR, Silva RCP, Guimarães FG. Differential evolution and perceptron decision trees for classification tasks. In: Yin H, Costa JAF, Barreto G, editors. Proceedings of the 13th International Conference Intelligent Data Engineering and Automated Learning (IDEAL 2012). LNCS. Vol. 7435. Natal, Brazil: Springer; 2012. pp. 550-557

[50] Freitas ARR, Silva RCP, Guimarães FG. Differential evolution and perceptron decision trees for fault detection in power transformers. In: Snášel V et al., editors. SOCO Models in Industrial & Environmental Applications. AISC. Vol. 188. Berlin Heidelberg: Springer; 2013. pp. 143-152

[51] Rivera-Lopez R, Canul-Reich J. A global search approach for inducing oblique decision trees using differential evolution. In: Mouhoub M, Langlais P, editors. Proceedings of the 30th Canadian Conference on Artificial Intelligence (AI 2017). LNCS. Vol. 10233. Edmonton, Canada: Springer; 2017. pp. 27-38

[52] Quinlan JR. Induction of decision trees. Machine Learning. 1986;1(1):81-106

[53] Murthy SK, Kasif S, Salzberg S. A system for induction of oblique decision trees. Journal of Artificial Intelligence Research. 1994;2(1):1-32

[54] Breiman L, Friedman J, Olshen R, Stone C. Classification and Regression Trees. Boca Raton, FL, USA: Chapman and Hall; 1984

[55] Lichman M. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences; 2013

[56] Durillo JJ, Nebro AJ. jMetal: A Java framework for multi-objective optimization. Advances in Engineering Software. 2011;42(10):760-771

[57] Das S, Konar A, Chakraborty UK. Two improved differential evolution schemes for faster global search. In: Beyer HG, editor. Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation (GECCO'05). Washington, DC, USA: ACM; 2005. pp. 991-998

[58] Quinlan JR. C4.5: Programs for Machine Learning. San Mateo, CA, USA: Morgan Kaufmann; 1993

[59] Friedman M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association. 1937;32(200):675-701

[60] Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75(2):383-386

[61] Calvo B, Guzmán-Santafé R. scmamp: Statistical comparison of multiple algorithms in multiple problems. The R Journal. 2016;8(1):248-256

[62] Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: An update. SIGKDD Explorations Newsletter. 2009;11(1):10-18

[63] Witten IH, Frank E, Trigg LE, Hall MA, Holmes G, Cunningham SJ. Weka: Practical machine learning tools and techniques with Java implementations. Technical Report 11, Department of Computer Science. New Zealand: Waikato; 1999

[64] John GH, Langley P. Estimating continuous distributions in Bayesian classifiers. In: Besnard P, Hanks S, editors. Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI'95). San Francisco, CA, USA: Morgan Kaufmann; 1995. pp. 338-345

[65] Murtagh F. Multilayer perceptrons for classification and regression. Neurocomputing. 1991;2(5):183-197

[66] Frank E. Fully supervised training of Gaussian radial basis function networks in WEKA. Technical Report 04, Department of Computer Science. New Zealand: Waikato; 2014

[67] Breiman L. Random forests. Machine Learning. 2001;45(1):5-32

[68] Demšar J. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research. 2006;7:1-30


**Chapter 4**

**Advanced Content and Interface Personalization through Conversational Behavior and Affective Embodied Conversational Agents**

Matej Rojc, Zdravko Kačič and Izidor Mlakar

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75599

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **Abstract**

Conversation is becoming one of the key interaction modes in HMI. As a result, conversational agents (CAs) have become an important tool in various everyday scenarios. From Apple and Microsoft to Amazon, Google, and Facebook, all have adopted their own variations of CAs. These CAs range from chatbots and 2D, cartoon-like implementations of talking heads to fully articulated embodied conversational agents performing interaction in various settings. Recent studies in the field of face-to-face conversation show that the most natural way to implement interaction is through synchronized verbal and co-verbal signals (gestures and expressions). Namely, co-verbal behavior represents a major source of discourse cohesion: it regulates communicative relationships and may support or even replace verbal counterparts; it effectively retains the semantics of the information and gives a certain degree of clarity to the discourse. In this chapter, we present a model for the generation and realization of more natural machine-generated output.

**Keywords:** co-verbal behavior generation, affective embodied conversational avatars, humanoid robot behavior, multimodal interaction, Unity, EVA framework

## **1. Introduction**

One of the key challenges in modern human-machine interaction (HMI) is the generation of more natural, more personalized, and more human-like interaction [1]. As a result, conversational agents (CAs) are gaining interest and traction, especially because most user devices are already capable of supporting multimedia and the concept of CAs. Apple, Microsoft, Amazon, Google, Facebook, etc. have already adopted their own variations of CAs. Moreover, the newest technologies, such as the Amazon Echo and Google Home, which are positioned as supporting multiuser and highly personalized interaction in collocated environments (e.g., homes and ambient-assisted living environments), integrate virtual agents supporting both visual and auditory interaction [2–4]. Thus, exploring the conversational models and challenges around CA-supported interaction represents a timely topic.

The production of conversational behavior, e.g., socially shared information and attitude, incorporates much more than verbal exchange through speech. Namely, it is multimodal and multilayered, since it entails multiple verbal and nonverbal signals that are correlated in dynamic and highly unpredictable settings. One might even say that social interaction involves synchronized verbal and nonverbal signal channels. The verbal channels carry the symbolic/semantic interpretation of the message through the linguistic and paralinguistic features of interaction, while the co-verbal channels serve as an orchestrator of communication [5–7]. Such an interaction therefore facilitates full embodiment of the collocutors and also exploits the physical environment in which the interaction is positioned [8–10]. Further, co-verbal behavior is actually as relevant as speech: it actively contributes to information presentation and understanding, as well as to the discourse itself. It establishes semantic coherence and regulates communicative relationships, and it may support or even replace verbal communication in order to clarify or reinforce the information provided by the verbal counterparts [11–13]. Co-verbal behavior goes well beyond an add-on or a style of information representation. For instance, spatial orientation of the face and eye gaze are key nonverbal cues that shape the footing of the conversational participants [14]. Through co-verbal responses, listeners may signal their interest, attention, and understanding [15]. As a result, the role of co-verbal (and nonverbal) behavior in human communication and in human-machine interaction has been increasingly scrutinized over the last few decades, within a wide range of contexts [16–21].

Embodied conversational agents (ECAs) are nowadays the most natural choice for the generation of affective and personalized agents. ECAs are those CAs that can exploit a full virtual body and the available embodiment in order to produce humanlike responses. ECA technology ranges from chatbots and 2D/3D realizations in the form of talking heads [22–24] to fully articulated embodied conversational agents engaged in various concepts of HMI, including sign language [25], storytelling [26], companions [27], and virtual hosts within user interfaces, even used as moderators of various concepts in ambient-assisted living environments [28–32].

In Chapters 4 and 5, how these resources are integrated into the two-folded approach of the automatic co-verbal behavior generation is then described. The presented approach involves (a) the problem of behavior formulation (intent and behavior planning) and (b) the problem of behavior realization (animation via ECA). Finally, we conclude with synthesis of affective

Advanced Content and Interface Personalization through Conversational Behavior…

http://dx.doi.org/10.5772/intechopen.75599

77

In order to cope also with the complexity in multiparty conversations, and in order to apply the knowledge to various concepts in human-machine interaction in a form of conversational behavior, we have envisaged and deployed an advanced EVA conversational model, which is used (a) to study the nature of natural behavior of human-collocutors; (b) to create conversational knowledge in form of linguistic, paralinguistic verbal, and nonverbal features; (c) and to test theories and to apply knowledge in various conversational settings as part of situation understanding or as a part of output generation processes. The presented EVA conversational model is outlined in **Figure 1**. As can be seen, it consists of the following three cooperative frameworks/platforms: *conversational analysis platform*, *EVA framework*, and *EVA* 

**Figure 1.** EVA conversational model for generation and exploitation of expressive human-like machine responses.

co-verbal behavior within interfaces and final remarks.

**interaction**

*realization framework*.

**2. EVA conversational model for expressive human-machine** 

etc., already have adapted their own variations of CAs. Moreover, the newest technologies, such as the Amazon Echo and Google Home, which are positioned as supporting multiuser and highly personalized interaction in collocated environments (e.g., homes and ambient-assisted living environments), integrate virtual agents supporting both visual and auditory interaction [2–4]. Exploring the conversational models and challenges around CA-supported interaction is therefore a timely topic. The production of conversational behavior, e.g., socially shared information and attitude, incorporates much more than verbal exchange through speech. It is multimodal and multilayered, since it entails multiple verbal and nonverbal signals that are correlated in dynamic and highly unpredictable settings. One might even say that social interaction involves synchronized verbal and nonverbal signal channels. The verbal channels carry the symbolic/semantic interpretation of the message through the linguistic and paralinguistic features of the interaction, while the co-verbal channels serve as an orchestrator of communication [5–7]. Such an interaction therefore engages the full embodiment of the collocutors and exploits the physical environment in which the interaction is situated [8–10]. Further, co-verbal behavior is equally relevant as speech: it actively contributes to information presentation and understanding, as well as to the discourse itself. It establishes semantic coherence and regulates communicative relationships. It may support or even replace verbal communication in order to clarify or reinforce the information provided by the verbal counterparts [11–13]. Co-verbal behavior thus goes well beyond an add-on or a style of information representation. For instance, spatial orientation of the face and eye gaze are key nonverbal cues that shape the footing of the conversational participants [14].

Through co-verbal responses, listeners may signal their interest, attention, and understanding [15]. As a result, the role of co-verbal (and nonverbal) behavior in human communication and in human-machine interaction has been increasingly scrutinized over the last few decades, within a wide range of contexts [16–21]. Embodied conversational agents (ECAs) are nowadays the most natural choice for the generation of affective and personalized agents. ECAs are CAs that exploit a full virtual body and the available embodiment in order to produce humanlike responses. ECA technology ranges from chatbots and 2D/3D realizations in the form of talking heads [22–24] to fully articulated embodied conversational agents engaged in various concepts of HMI, including sign language [25], storytelling [26], companions [27], and virtual hosts within user interfaces, and even moderators of various concepts in ambient-assisted living environments [28–32].

In this chapter we present a novel expressive conversational model for facilitating humanlike conversations and a solution for affective and personalized human-machine interaction. The model provides (i) a platform for the generation of "conversational" knowledge and resources, (ii) a framework for the planning and generation of (non-)co-verbal behavior, and (iii) a framework for the delivery of affective and reactive co-verbal behavior through attitude, emotion, and gestures synchronized with speech. The EVA expressive conversational model is outlined in Section 2. The main idea is to formulate various forms of co-verbal behavior (gestures) with respect to arbitrary, unannotated text and the broader social and conversational context. The "conversational" knowledge and resources required are generated via the annotation of spontaneous dialog and through corpus analysis, as presented in Section 3. Sections 4 and 5 then describe how these resources are integrated into the twofold approach to automatic co-verbal behavior generation, involving (a) the problem of behavior formulation (intent and behavior planning) and (b) the problem of behavior realization (animation via an ECA). Finally, we conclude with the synthesis of affective co-verbal behavior within interfaces and final remarks.

## **2. EVA conversational model for expressive human-machine interaction**

76 Artificial Intelligence - Emerging Trends and Applications

To cope with the complexity of multiparty conversations, and to apply the knowledge to various concepts of human-machine interaction in the form of conversational behavior, we have envisaged and deployed an advanced EVA conversational model. It is used (a) to study the nature of the natural behavior of human collocutors; (b) to create conversational knowledge in the form of linguistic, paralinguistic, and nonverbal features; and (c) to test theories and to apply knowledge in various conversational settings, as part of situation understanding or as part of the output generation processes. The presented EVA conversational model is outlined in **Figure 1**. As can be seen, it consists of the following three cooperative frameworks/platforms: the *conversational analysis platform*, the *EVA framework*, and the *EVA realization framework*.

**Figure 1.** EVA conversational model for generation and exploitation of expressive human-like machine responses.
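The three-stage flow of the model can be pictured with a minimal, hypothetical sketch; the function names and the toy word-to-gesture lookup below are our illustration, not the EVA API. The analysis platform derives resources from annotated material, the EVA framework plans behavior for arbitrary unannotated text, and the realization framework turns the plan into animation commands:

```python
# Illustrative sketch of the three cooperative frameworks (names invented).

def analyze_corpus(recordings):
    """Conversational analysis platform: derive behavior-generation resources.
    Here we simply record which gestures co-occurred with which words."""
    resources = {}
    for word, gesture in recordings:
        resources.setdefault(word, []).append(gesture)
    return resources

def plan_behavior(text, resources):
    """EVA framework: pair each word of unannotated text with a gesture."""
    return [(w, resources.get(w, ["rest"])[0]) for w in text.split()]

def realize_behavior(plan):
    """EVA realization framework: turn the plan into 'animation' commands."""
    return [f"animate:{gesture}@{word}" for word, gesture in plan]

resources = analyze_corpus([("hello", "wave"), ("you", "point")])
plan = plan_behavior("hello there you", resources)
print(realize_behavior(plan))
# ['animate:wave@hello', 'animate:rest@there', 'animate:point@you']
```

The point of the sketch is the separation of concerns: resources are built once from corpus analysis, while planning and realization run on every new input.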

The model builds on the notion that verbal to co-verbal alignment and synchronization are the driving forces behind affective and social interaction. Thus, the *conversational analysis platform* is used for analyzing how linguistic and paralinguistic features interplay with embodiments during complex, spontaneous, and multiparty interactions. Its processing is based on a multimodal corpus, named the EVA corpus [33]; the EVA annotation scheme, developed to describe the complex relations of co-verbal behavior (proposed in [34]); and the *EVA framework*, capable of (a) capturing various contexts in the "data" and (b) providing the basis to analytically investigate various multidimensional correlations among co-verbal behavior features.

Advanced Content and Interface Personalization through Conversational Behavior…

http://dx.doi.org/10.5772/intechopen.75599



Communication is, in its fundaments, a multimodal process. In order to describe the nature of face-to-face interaction, we have chosen to build upon the concept of "multimodality in interaction" over a linguistic basis [2]. We extended this concept with a cognitive dimension consisting of various concepts, such as emotions, sentiment, and communicative intent. We also blend it with meta-information represented as a set of paralinguistic features, such as facial expressions and gestures, prosody, pitch, dialog function and role, etc. [17, 35]. As outlined in **Figure 1**, the annotated EVA corpus is used to build the resources required for the planning and generation of conversational behavior, both its verbal part (e.g., text-to-speech (TTS) synthesis) and its (non-)co-verbal part (conversational behavior synthesis). These resources are lexicons, language models, a semiotic grammar of communicative intent, a lexicon of conversational shapes, gestures, and movements, acoustic and prosodic properties, and other linguistic and paralinguistic features (e.g., word/syllable segmentation, sentence type, sentiment, etc.) that are used within behavior generation rules and machine-learned behavior generation models. The resources generated within the *conversation analysis platform* are fed to the *EVA framework*. The main idea of the proprietary *EVA framework* proposed in [36] is to evoke a social response in human-machine interaction through an affective synthetic response generated from arbitrary and unannotated texts. Thus, the EVA behavior generation model within the *EVA framework* is data-driven and also driven by the text-to-speech (TTS) synthesis engine. The model is modular and merged with the TTS engine's architecture into the first omni-comprehensive TTS engine, as proposed in [37].
The output of the EVA framework represents the complete co-verbal behavior, described using the proprietary procedural description language named EVA-Script, and the synthesized speech, both synchronized at the phoneme level. The co-verbal behavior is described within an EVA event, where it represents a contextual link between the language, the context-independent motor skills (shapes, movements, and poses that the conversational agent can display), and the context-dependent intent for which the behavior has been generated (e.g., situation, relation, communicative function, etc.). The behavior is already adapted to the nature and capabilities of the virtual entity representing it. The EVA event then represents the input for the *EVA realization framework*.
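As a rough illustration of what such an EVA event might carry, the sketch below models the link between text, a motor skill, a communicative intent, and phoneme-level timing. The class and field names are our assumptions for illustration, not the actual EVA-Script format:

```python
from dataclasses import dataclass, field

@dataclass
class Phoneme:
    """One phoneme of the synthesized speech, with its time span."""
    symbol: str
    start_ms: int
    end_ms: int

@dataclass
class EVAEvent:
    """Hypothetical EVA event: links language, a context-independent
    motor skill, and a context-dependent intent, time-aligned to speech."""
    text: str                        # verbal content
    motor_skill: str                 # shape/pose the agent can display
    intent: str                      # communicative function, situation, etc.
    phonemes: list = field(default_factory=list)

    def duration_ms(self) -> int:
        """Gesture duration follows the span of the aligned phonemes."""
        if not self.phonemes:
            return 0
        return self.phonemes[-1].end_ms - self.phonemes[0].start_ms

event = EVAEvent(
    text="hello",
    motor_skill="open_palm_up",
    intent="greeting",
    phonemes=[Phoneme("h", 0, 80), Phoneme("e", 80, 160),
              Phoneme("l", 160, 240), Phoneme("ou", 240, 400)],
)
print(event.duration_ms())  # 400
```

Aligning the gesture to the phoneme spans is what allows the realizer to keep speech and movement synchronized at the phoneme level, as described above.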

The *EVA realization framework* is built on the premise that natural multimodal interaction is much more than speech accompanied by repetitive movements of the limbs and face. Natural interaction entails multiple behavior variations that are correlated in dynamic and highly unpredictable settings [6]. It also incorporates various social and interpersonal signals in order to "color" the final outcome and can dynamically adapt to various intra- and interpersonal contexts as well as various situational contexts [4, 38]. The role of this framework is to transform the co-verbal descriptions contained in EVA events into articulated movement generated by the expressive virtual entity, e.g., to apply the EVA-Script language onto the articulated 3D model EVA in the form of animated movement. Further, the framework contains the *animation-parameters builder* and the *animation-realization engine*. Both are maintained within the *EVA behavior realizer* and implemented as a series of standalone modules. The *animation-parameters builder* is used to interpret the specified behavior and to adapt it to the restrictions of the targeted agent. Thus, it transforms the EVA-Script sequence into animation parameters that are then mapped to the different control units of the ECA's articulated 3D model, while the *animation-realization engine* is responsible for the scheduling and execution of the translated animation parameters in the form of sequences of parallel/sequential transformations of ECA-related resources (e.g., meshes, textures, bones, morphed shapes, etc.). These resources constitute the virtual agent's embodiment in the form of hand/arm gestures, posture and body configuration, head movement and gaze, facial expressions, and lip sync. Finally, the *EVA realization framework* also incorporates the procedures required for the efficient integration of the embodied conversational agent into various user interfaces.
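The two realizer stages might be sketched as follows; this is an illustrative approximation, not the actual EVA implementation. A builder maps scripted actions onto the control units the target agent actually supports (dropping unsupported ones, i.e., adapting to the agent's restrictions), and an engine groups the resulting parameters into tracks, where one track runs sequentially and different tracks run in parallel:

```python
# Assumed capabilities of a hypothetical target agent (illustration only).
AGENT_CONTROL_UNITS = {"right_arm", "head", "face"}

def build_animation_parameters(script):
    """Builder stage: map each scripted action onto a supported control
    unit; actions the agent cannot perform are dropped."""
    params = []
    for action in script:
        if action["unit"] in AGENT_CONTROL_UNITS:
            params.append((action["unit"], action["pose"], action["track"]))
    return params

def schedule(params):
    """Engine stage: group parameters by track; within a track the
    transformations are sequential, across tracks they run in parallel."""
    tracks = {}
    for unit, pose, track in params:
        tracks.setdefault(track, []).append(f"{unit}->{pose}")
    return tracks

script = [
    {"unit": "right_arm", "pose": "raise", "track": 0},
    {"unit": "right_arm", "pose": "stroke", "track": 0},
    {"unit": "head", "pose": "nod", "track": 1},
    {"unit": "left_leg", "pose": "step", "track": 2},  # unsupported, dropped
]
print(schedule(build_animation_parameters(script)))
# {0: ['right_arm->raise', 'right_arm->stroke'], 1: ['head->nod']}
```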
In the next section, we will present particular modules of the *EVA conversational model* in more detail. Firstly, we will describe the *conversational analysis platform*, which represents the basis not only for the generation of communicative behavior but also for understanding the nature of complex conversational behavior and face-to-face interaction as a whole.

## **3. Conversation analysis and annotation scheme**


Conversation analysis represents a powerful tool for analyzing language and human co-verbal behavior in various aspects of social communication [39]. Interaction through dialog is an act of conveying information, in which humans can convey information through a variety of methods, such as speech, body language (gestures and posture), facial expression, and even social signals [40]. Interpersonal communication can involve the transfer of information between two or more collocutors that use verbal and nonverbal methods and channels. The symbolic/semantic interpretation of the message is presented through linguistic and paralinguistic features, while the co-verbal part in general serves as an orchestrator of communication [5]. The concept of co-verbal behavior has become one of the central research paradigms and one of the important features of human-human interaction. It has been investigated from various perspectives, ranging from anthropology, linguistics, and psychosociological fields to companions, communication and multimodal interfaces, smart homes, ambient-assisted living, etc. Multimodal corpora represent the results of these research efforts. They are the tools through which researchers analyze the inner workings of interaction.
The knowledge generated by using such multimodal corpora and annotation schemes therefore represents a key resource for better understanding the complexity of the relations between the verbal and co-verbal parts of human-human communication. It provides insights into the understanding of several (social) signals and their interplay in the natural exchange of information. In the following section, we will present the EVA corpus, a corpus of spontaneous and multiparty face-to-face dialog. We will also outline the EVA annotation scheme designed as a tool for corpus analytics and the generation of verbal and nonverbal resources [33].

#### **3.1. The multimodal EVA corpus**

Among video corpora, television (TV) interviews and theatrical plays have shown themselves to be very usable resources of spontaneous conversational behavior for the analytical observation and annotation of co-verbal behavior and emotions used during conversation. In general, TV discussions represent a mixture of institutional discourse, semi-institutional discourse, and casual conversation. Material used in existing corpora is often subject to certain restrictions intended to reduce the conversational noise, such as time restrictions, a strict agenda, a strict scenario and instructions to implement targeted concepts, and technical features (camera direction and focus, editing) that further influence especially the communicative function of co-verbal behavior and its expressive dimensions (speech, gestures, facial displays). However, the conversational noise, if properly analyzed and incorporated, may unravel many features and contexts that model natural multimodal conversational expressions. Namely, by exploiting the casual nature and noise in the material, as we do with the EVA corpus, we can take into consideration the complete interplay of various conversational phenomena, such as dialog, emotional attitude, prosody, communicative intents, the structuring of information, and the form of its representation. All of these can give us a true insight into how informal communication works, what stimuli trigger conversational phenomena, and how these impulses interact and reflect on each other. Such relations can then provide synthetic agents with the basis for multimodal literacy, namely, the capacity to construct meaning through understanding a situation and responding to situations that are not predefined. The conversational setting in the EVA corpus is totally relaxed and free and is built around a talk show that follows some script/scenario; however, the topics discussed are highly unpredictable, changeable, informal, and full of humor and emotions.

Further, although sequencing exists, it occurs in a highly unordered fashion, as do the communicative functions. This guarantees a highly casual and unordered human discourse, with many overlapping statements and roles, vivid emotional responses, and facial expressions. The goals of the corpus and the annotation scheme are built around (semiotic) communicative intent as the driving force for the generation of co-verbal and communicative behavior. The communicative intent is a concept through which we are able to correlate the intent of the spoken information (defined, e.g., through part-of-speech (POS) tags, prosodic features, and the classification of interpretation through meaning) with co-verbal behavior (gestures). Human face-to-face interactions are multimodal and go well beyond pure language and semantics. Within the EVA corpus and the corpus analysis outlined in this section, we decided to extend semantics by applying the concept of communicative intent and other linguistic and paralinguistic features, such as dialog role and functions, attitude, sentiment and emotions, prosodic phrases, pitch, accentuation, etc., to the observed exchange of information.
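The correlation between communicative intent, POS tags, and gestures can be pictured as a simple lookup. The rule table below is invented for illustration and does not reproduce the EVA corpus statistics or the actual semiotic grammar:

```python
# Hypothetical semiotic rule table: (semiotic class, POS tag of the
# meaning nucleus) -> gesture shape. Entries are illustrative only.
SEMIOTIC_RULES = {
    ("deictic", "PRON"): "point",
    ("iconic", "NOUN"): "outline_shape",
    ("beat", "VERB"): "baton_stroke",
}

def select_gesture(semiotic_class, pos_tag):
    """Return a gesture for the intent/POS pair, or a neutral rest pose
    when no rule matches."""
    return SEMIOTIC_RULES.get((semiotic_class, pos_tag), "rest")

print(select_gesture("deictic", "PRON"))  # point
print(select_gesture("iconic", "ADJ"))    # rest
```

In the actual model such pairings are learned from the annotated corpus rather than hand-written, but the lookup captures the idea of correlating spoken intent with co-verbal form.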


#### **3.2. The EVA annotation scheme**

In order to capture and analyze conversational phenomena in the EVA corpus, the video material is annotated following the EVA annotation scheme, which incorporates linguistic and paralinguistic features and interlinks them with nonverbal movement [37, 41]. The annotation process is performed separately for each speaker. The formal model of the annotation scheme is outlined in **Figure 2**. In addition to symbolic conversational correlations, the presented scheme also targets the analysis of the form of movement in high resolution. This allows us to test and evaluate the low-level correlation between movement and prosody, communicative intent, and other linguistic and paralinguistic features. As a result, we can analyze face-to-face interactions in greater detail. Further, through the extracted knowledge, we are able to pair features into complex stimuli used for triggering the generation of conversational artifacts and to improve the understanding of the situation through multimodality. As can be seen in **Figure 2**, the annotation session per speaker is separated into symbolic annotation (e.g., annotation of function) and annotation of the form (e.g., annotation of visualization). Each of the annotated features (linguistic, paralinguistic, and movement/shape related) is captured on a separate track and interlinked with the spoken content and movement via a shared timeline. In this way we are able to analyze and search for various multidimensional relationships between conversational artifacts and to identify and establish temporal and symbolic links between the verbal and co-verbal features of complex multiparty interaction.
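A minimal sketch of such a track-based, shared-timeline annotation structure follows; the track names and the API are illustrative stand-ins, not the EVA scheme's actual vocabulary. Because all tracks share one timeline, annotations that co-occur in time can be cross-queried:

```python
class AnnotationTimeline:
    """Toy model of per-speaker annotation: one track per feature,
    all tracks time-aligned on a shared timeline (seconds)."""

    def __init__(self):
        self.tracks = {}  # track name -> list of (start, end, label)

    def add(self, track, start, end, label):
        self.tracks.setdefault(track, []).append((start, end, label))

    def at(self, t):
        """All annotations, across tracks, active at time t."""
        return {
            track: [lab for (s, e, lab) in spans if s <= t < e]
            for track, spans in self.tracks.items()
        }

tl = AnnotationTimeline()
tl.add("words", 0.0, 0.4, "hello")
tl.add("pos", 0.0, 0.4, "INTJ")
tl.add("gesture_phase", 0.1, 0.5, "stroke")
print(tl.at(0.2))
# {'words': ['hello'], 'pos': ['INTJ'], 'gesture_phase': ['stroke']}
```

Queries like `at(t)` are what make it possible to search for multidimensional relationships between verbal and co-verbal features, as described above.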


**Figure 2.** The formal model of the EVA annotation scheme and the topology of the annotation of the form.

As outlined in **Figure 2**, the EVA annotation scheme has the capacity not only to establish links between features on the symbolic level but also to interlink the form of co-verbal movement and its manifestation (e.g., the visualization) with symbolic artifacts, such as dialog role, emotions, lemma, POS tags, sentence type, phrase breaks, prominence, sentiment, and semiotic intent. This is quite important for investigating the multidimensional interlinks between various features. For instance, co-verbal behavior may originate as a reflection of attitude/emotion or even be a supportive artifact in the implementation of a communicative function (e.g., feedback, turn taking, turn accepting, sequencing, etc.), while verbal behavior, primarily used for the representation of information, may also reflect attitude/emotion or be adjusted to serve as part of the implementation of a communicative function. Through the annotation scheme, all artifacts are interconnected through the temporal domain and can be related to each other in numerous ways and combinations.

Overall, the symbolic annotation allows us to identify and describe in detail the nature of communicative acts performed during information exchange. The annotation of the form, on the other hand, then describes the shapes and movements generated during these symbolically defined communication acts. Thus, the concept of EVA annotation is based on the idea that symbolic relations and concepts are established on the functional/symbolic level and realized via body expressions, e.g., hand gestures (left and right arm and hands), facial expression, head movement, and gaze. During each symbolic act, the movement of each body part (hands, arms, head, and face) can be described with movement phrase, movement phases, transitions, and the articulators propagating the observed movement. The movement phrase describes the full span of movement phases, where each movement phase contains a mandatory stroke and optional preparation, hold, and retraction phases. Further, each movement phase identifies a pose at the beginning *Ps* and a pose at the end *Pe*, where poses are "interconnected" with a trajectory that identifies the path over which the observed body parts propagate from the start pose to the end pose. The trajectory *T* is a parametric description of propagation, which includes the partitioning of the trajectory *T* into movement primes (simple patterns), such as linear and arc, each defined through the intermediate poses.
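The phrase–phase–pose hierarchy described above can be captured in a small data model. The following sketch is illustrative only (the class and field names are hypothetical; the EVA corpus tooling itself is not described at code level in the chapter): a movement phrase spans a set of phases with a mandatory stroke, and each phase connects a start pose *Ps* to an end pose *Pe* via a trajectory partitioned into movement primes such as linear and arc segments.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pose:
    """Configuration of one articulator at a time instant (symbolic shape label)."""
    articulator: str   # e.g., "right_hand", "head"
    description: str   # e.g., "open_palm", "point"

@dataclass
class Trajectory:
    """Parametric path from a start pose to an end pose, split into movement primes."""
    primes: List[str] = field(default_factory=list)          # e.g., ["linear", "arc"]
    intermediate_poses: List[Pose] = field(default_factory=list)

@dataclass
class MovementPhase:
    kind: str            # "preparation" | "stroke" | "hold" | "retraction"
    start_pose: Pose     # pose Ps at the beginning of the phase
    end_pose: Pose       # pose Pe at the end of the phase
    trajectory: Trajectory

@dataclass
class MovementPhrase:
    """Full span of movement phases; the stroke phase is mandatory."""
    phases: List[MovementPhase]

    def stroke(self) -> MovementPhase:
        strokes = [p for p in self.phases if p.kind == "stroke"]
        if not strokes:
            raise ValueError("a movement phrase must contain a stroke phase")
        return strokes[0]
```

The mandatory-stroke constraint is enforced at access time here; a production representation would likely validate it at construction instead.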

## **4. Advanced co-verbal behavior generation by using EVA framework**

Advanced Content and Interface Personalization through Conversational Behavior…

http://dx.doi.org/10.5772/intechopen.75599

The EVA behavior generation model proposed in [36] is used to convert general unannotated texts into co-verbal behavior descriptions automatically. The model integrates several nonverbal elements that are associated (correlated) with the verbal behavior (speech). Therefore, general texts can be presented as multimodal output, consisting of a spoken communication channel as well as a synchronized visual communication channel. The EVA behavior generation model performs the synchronization of the verbal and nonverbal elements that is necessary in order to achieve the desired naturalness, both in the domain of meaning (intent) and in the temporal domain. Further, the EVA model generates the co-verbal behavior descriptions and the verbal behavior simultaneously. The EVA model distinguishes between the behavior generation and behavior realization steps. **Figure 3** outlines the expressive conversational behavior generation module, which consists of the following three concurrent processes: intent classification, behavior planning, and speech synthesis. The *speech synthesis process* converts general text into a speech signal and also represents a source of linguistic and prosodic features that are used for planning and synchronizing the nonverbal behavior. Further, the *intent classification process* identifies the nature of the spoken content through pattern matching incorporating linguistic and prosodic features, where the intent of the input text is defined in the form of a classification of linguistic expressions into semiotic classes. The result is a set of possible interpretations of the input text. Further, the *behavior planning process* involves the filtering of several interpretations, the pose/gesture selection process based on a target cost calculation mechanism, and the temporal synchronization step based on prosodic and acoustic features obtained during the synthesis of the speech signal. As outlined in **Figure 3**, the EVA behavior generation model facilitates language−/context-dependent and language−/context-independent resources. The language−/context-dependent resources are linguistic resources and the nonlinguistic semiotic grammar. The linguistic resources include lexicons, language models, rules, and corpora, while the nonlinguistic grammar includes sets of semiotic rules for pairing language with communicative intent. The language−/context-independent resources are *Gesticon* and the repository of *motor skills*. *Gesticon* couples the context-independent *motor skills* with the semiotic nature of the intent. Namely, we associate a given semiotic pattern (communicative intent) with a unique set of features describing the manifestation of the shape/pose. The semiotic pattern incorporates a semiotic class/subclass, the movement phase within which the pose manifestation is observed, and the POS tag of the word represented as the nucleus of meaning. The unique set of features describing the manifestation of the shape/pose incorporates a body-pose identification representing a pair of initial and final poses, a general trajectory of hand movement, a semantic word relation, the minimal and maximal temporal values within which the gesture was observed to be carried out, and the number of occurrences of the given gesture that was observed in the EVA corpus. The *semiotic grammar* and *Gesticon* are created and populated with patterns and associated with unique sets based on the analysis and annotation discussed in Section 3.

**Figure 3.** The architecture of the EVA behavior generation model for the generation of expressive co-verbal behavior.

The conceptual EVA behavior generation model has been actualized in the form of the EVA engine outlined in **Figure 4**. The EVA engine converts a general text into the speech signal accompanied by humanlike synchronized gesticulation and lip movement. The EVA engine is composed of (a) processing steps for text-to-speech synthesis as proposed in the TTS system PLATTOS [42] and (b) processing steps for the expressive co-verbal behavior generation algorithm. All steps are fused into a compact processing EVA engine. In this way, the expressive co-verbal behavior generation algorithm works with the verbal modules concurrently, by sharing data, machine-trained models, and other resources. The EVA engine takes into account the latest results of research on multimodal communication, goes beyond traditional computational speech processing techniques, and facilitates heuristic and psychological models of human interaction. The algorithm for the generation of expressive co-verbal behavior implemented within the engine in **Figure 5** generates the co-verbal behavior by considering *the synchronization of intent and shape* (the intent classification and planning), *the synchronization of several movements* (movement planning), and *prosodic and timing synchronization regarding speech and gestures* (synchronization of the form).

**Figure 4.** EVA engine: implementation of the EVA behavior generation model.

**Figure 5.** The algorithm for the generation of expressive co-verbal behavior.

In phase I, named *classification of the intent*, the input is POS-tagged text and the *semiotic grammar*. The *semiotic grammar* is used for mapping individual morphosyntactic sequences in the text onto a corresponding parametric description of the semiotic class (subclass) *Z* = {*S*, *I*, *ωS*, *pS*}.<sup>1</sup> In this way, we are able to perform the classification of the intent by defining the semiotic classes and the corresponding cores of meaning. The algorithm searches for the longest morphosyntactic sequence *xi-j* that can be found in the *semiotic grammar*, while the following two rules are implemented:

• *If a sequence x at word index j is already contained in previous CU elements started at word index i and with the same semiotic intent (i < j), then it is discarded*.

• *If at a specific word index the sequence xA happens to be xA*(*S*) ⊆ *xB*(*S*)*, where both sequences belong to the same semiotic class, then sequence xA must be discarded*.

In this way, for each such sequence, the content unit (*CU*) is created. The *CUs* are used to store the semiotic classification of the intent as well as those meaningful words that actually induce the shape in the stroke movement phase *FS* of the gesture.

Nevertheless, sentences/statements can have multiple interpretations. Further, the classified *CU* interpretations can partly or fully overlap, thereby introducing ambiguities and a number of inconsistencies. Thus, in phase II, named *planning of the intent*, these inconsistencies and ambiguities have to be resolved by using *integration*, *elimination*, and *selection*. For this task the prosodic information (prominence, stress, prosodic phrases), as predicted by the TTS modules, is used. The prosodic information includes syllables labeled with *PA* (most prominent) and *NA* (stressed) tags that exist within minor phrases (*B2*) or major phrases (*B3*). The resolving is implemented by observing the following rules:

• *Each CU element must lie within the prosodic phrase (B2 or B3)*.

• *Each CU must include the most prominent syllable (PA) within a given prosodic phrase (B2 or B3), except in the case of enumeration*.

• *Each prosodic phrase can be represented by no more than one concept of motion, i.e., with no more than one element CU*.

• *When the CU element contains the semiotic class enumeration, the CU boundaries must remain unchanged (the boundaries of prosodic phrases are not considered)*.

<sup>1</sup> Represents a unique set of features describing the semiotic class/subclass *S*, the index of the word(s) that represent a nucleus of meaning *I*, the distribution of the POS sequences for the specific semiotic class/subclass *ωS*, and the distribution of selecting the specific POS sequence among the semiotic classes/subclasses *pS*.

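The longest-match classification of phase I can be sketched as follows. This is a toy illustration under stated assumptions: the grammar is modeled as a plain mapping from POS-tag tuples to a semiotic class, and the `CU` fields are simplified stand-ins for the chapter's parametric description *Z*, not the actual EVA resources.

```python
# Toy sketch of phase I: find the longest morphosyntactic sequences present in a
# (hypothetical) semiotic grammar, create content units (CUs), and apply the two
# discard rules described in the text.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class CU:
    start: int            # word index i
    end: int              # word index j (inclusive)
    semiotic_class: str   # semiotic class S

def classify_intent(pos_tags: List[str],
                    grammar: Dict[Tuple[str, ...], str]) -> List[CU]:
    cus: List[CU] = []
    for i in range(len(pos_tags)):
        # search for the longest matching sequence starting at word index i
        for j in range(len(pos_tags) - 1, i - 1, -1):
            seq = tuple(pos_tags[i:j + 1])
            if seq not in grammar:
                continue
            s = grammar[seq]
            # rule 1: discard if already contained in an earlier CU
            # carrying the same semiotic intent
            contained = any(c.start <= i and j <= c.end and
                            c.semiotic_class == s for c in cus)
            if not contained:
                cus.append(CU(i, j, s))
            break  # longest match at this start index has been handled
    # rule 2: discard x_A when x_A(S) is a subset of x_B(S) of the same class
    return [a for a in cus
            if not any(b is not a and b.semiotic_class == a.semiotic_class
                       and b.start <= a.start and a.end <= b.end
                       for b in cus)]
```

A longer grammar entry thus shadows shorter entries of the same semiotic class that it contains, which mirrors the subset-discard rule.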

At the end, the structure of the *intent* is uniquely defined, and the co-verbal behavior *G* can be represented as a sequence of co-verbal models *H* that are related to the *CU* as follows:

$$\hat{G} = T_m^{-1}\hat{H} = H[CU_1, t_1] \times H[CU_2, t_2] \times \dots \times H[CU_n, t_n] \tag{1}$$

where *H*[*CUi*, *ti*] describes the movement model that depends on the semiotic classification and prosodic characteristics in each *CU* element. However, *internal overlapping* can still occur, when several *CU* elements contain one or more of the same words, while their boundaries lie within the same prosodic phrase. In this case we have to decide which of the *CU* elements must be kept, since only one *CU* element is allowed within each prosodic phrase. Firstly, the algorithm removes all those *CU* elements that do not contain a word with a *PA* syllable, and if overlapping still exists, then on the remaining *CUs*, their normal distribution *ωs* is considered, as calculated for the given semiotic class, and only the *CU* with the maximum value is kept. Therefore, the common *CU* is created as:

$$CU_{S(CU_m)=S(CU_n)} = f(CU_m, \dots, CU_n) = \begin{cases} CU_m; & \text{if } \omega_s(CU_m) > \omega_s(CU_n) \\ CU_n; & \text{if } \omega_s(CU_m) < \omega_s(CU_n) \\ \max(len(CU_m), len(CU_n)); & \text{if } \omega_s(CU_m) = \omega_s(CU_n) \end{cases} \tag{2}$$

Further, in the case of *ambiguity*, there are multiple *CUs* within a single prosodic phrase but without overlapping. When these *CUs* are consecutive and classify the same semiotic class *S* (e.g., represent the same communicative intent), they are merged into a single *CU* element as follows:

$$CU = f(CU_i, \dots, CU_j) = \begin{cases} \text{Join } CU_k, CU_{k+1}, \dots; & \text{if } CU_k.POS = CU_{k+1}.POS \wedge S(CU_k) = S(CU_{k+1}) \\ \text{Add into a set}; & \text{otherwise} \end{cases} \tag{3}$$

The semiotic indicator *I* is defined as a set of corresponding semantic indicators contained within the individual *CUs* in the sequence. Finally, *external overlapping* can occur, when some *CU* boundaries are stretched over the boundaries of the prosodic phrase. In this case the *CU* is kept only when the following rules are met:

• *The CU element includes the PA syllable, and this PA syllable lies within the boundaries of the prosodic phrase B2: PA* ∈ *B2* ∧ *PA* ∈ *CUi*.

• *The semiotic indicator I of the CU element lies within the boundaries of the prosodic phrase B2, i.e., I* ∈ *B*.

And, only when both rules are met, the following two rules are implemented:

• *If CU* ∩ *next*(*CU*) *is not empty, the right boundary of the CU is set to the left boundary of the next*(*CU*) *element, in order that CU* ∩ *next*(*CU*) = ∅ *is true*.

• *If CU* ∩ *next*(*CU*) *represents an empty set, the CU element is kept completely. Further, those words that lie outside the prosodic phrase that contains semiotic indicator I can only represent the holding phase and/or the retraction phase*.

In this case the common *CU* is created as:


$$CU = f(CU_i, CU_{i+1}) = \begin{cases} \text{keep \& reduce}; & \text{if } I \in B \wedge PA \in B \wedge PA \in CU_i \wedge CU_i \cap CU_{i+1} \neq \emptyset \\ \text{keep \& extend}; & \text{if } I \in B \wedge PA \in B \wedge PA \in CU_i \wedge CU_i \cap CU_{i+1} = \emptyset \\ \text{remove}; & \text{if } I \notin B \vee PA \notin B \vee PA \notin CU_i \end{cases} \tag{4}$$

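The internal- and external-overlap resolution rules above can be sketched in a few lines. This is an illustrative stand-in, not the EVA engine API: the `CU` representation is reduced to word indices, an ω_s value, and a length, and the prosodic-phrase membership tests are passed in as booleans.

```python
# Illustrative sketch of the CU conflict-resolution rules (Eqs. (2) and (4)).
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class CU:
    words: Set[int]    # word indices covered by the CU
    omega: float       # ω_s value for its semiotic class
    length: int        # len(CU)

def resolve_internal_overlap(cu_m: CU, cu_n: CU) -> CU:
    """Eq. (2): of two overlapping CUs of the same class, keep the one with
    the larger ω_s; on a tie, keep the longer one."""
    if cu_m.omega > cu_n.omega:
        return cu_m
    if cu_m.omega < cu_n.omega:
        return cu_n
    return cu_m if cu_m.length >= cu_n.length else cu_n

def resolve_external_overlap(cu_i: CU, cu_next: CU,
                             indicator_in_phrase: bool,
                             pa_in_phrase: bool,
                             pa_in_cu: bool) -> str:
    """Eq. (4): decide keep&reduce / keep&extend / remove for a CU that
    stretches over a prosodic-phrase boundary."""
    if not (indicator_in_phrase and pa_in_phrase and pa_in_cu):
        return "remove"
    if cu_i.words & cu_next.words:
        return "keep&reduce"   # shrink CU_i to the left boundary of next(CU)
    return "keep&extend"
```

The three boolean preconditions encode *I* ∈ *B*, *PA* ∈ *B*, and *PA* ∈ *CUi*; only when all three hold does the intersection with the next *CU* select between reducing and extending.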
In phase III, named *planning of the movement*, the movement models, which are based on *CU* units, are defined. A movement model is an animated sequence of shapes/poses that together represent a co-verbal expression. For each *H*, at least a stroke movement phase *FS* has to be defined, which is aligned with the acoustic prosody information, as specified by the TTS engine. Therefore, the prosodic synchronization of movement phases is based on temporal information (regarding phoneme and pause durations). The following rule is used for the stroke movement phases *FS*:

• *The stroke phase FS is always performed on the PA word and is ended together with the corresponding PA syllable.*

The next step then represents the synchronization of all *FS* with the gesticulation in case of *enumeration* and/or *search*, which are not directly related to the *PA* syllables, by using the following rule:

• *If the word that represents the semiotic indicator I for the specific CU does not contain the PA syllable, the NA syllable is considered in the same way instead.*

The most prominent words (with the *PA* syllable) do not necessarily represent the semiotic indicator *I* for the given *CU* element. If this is the case, the following rule is applied:

*t*

*t*

as well as restricting the set of shapes.

rations of the embodiment within *FS*

tions on each sequence is used at the end.

The stroke movement phase *FS*

*P* for each *FS*

each *FS*

max(*FS*) = *t*

max(*FR*) = *t*

max(*FP*) + *t*

min(*FR*) + ∑

where *n* represents the number of subsequent syllables, while *k* is the first syllable after the *FR*. Temporal descriptions of movement phases define time instants when the individual shape must be fully manifested and also the time that is available for the transition between the shapes. Further, this time also restricts the set of suitable motion trajectories for the transition,


Advanced Content and Interface Personalization through Conversational Behavior…


http://dx.doi.org/10.5772/intechopen.75599


Within the algorithm, movement models *H* are represented by movement phrase units (*MPHRs*), where each unit can contain several movement phases (*MPHs*). Further, each *MPHR* must contain at least a stroke phase *FS*. The syllables that occur before the stroke phase *FS* are used for the preparation movement phase *FP*, while the *sil* segment just before the first syllable of the *FS* can be used for the hold movement phase *FHS* (hold before stroke). Those syllables after the *FS* are used for the retraction movement phase *FR*. In this way, the behavior structure is applied by the following rules:

• *The* preparation *phase FP starts before the stroke phase FS and lasts from the NA syllable to the beginning of the FS.*

• *The sil segment, which can have a nonzero duration between the words with the preparation phase FP and the stroke phase FS, represents the so-called hold before stroke, which (if it exists) represents a ready-made idea regarding the content.*

• The right boundary of a movement phase (with the exception of the hold phase FH) must be a PA or NA syllable.

The most prominent words (with the *PA* syllable) do not necessarily represent the semiotic indicator *I* for the given *CU* element. If this is the case, the following rule is applied:

• *If the* semiotic *indicator I and the PA syllable are not represented by the same word, the stroke phase FS must be defined by following the previous rules, but the hold movement phase FSH must end with the NA syllable of the word that represents the semiotic indicator I within the given CU.*

Additionally, the created *MPHs* can be extended or merged by the following rules:

• The stroke phase FS and the preparation phase FP can be joined into the stroke phase FS, as this often occurs in multimodal communication (as observed in database annotations).

Nevertheless, the extensions are always limited by the boundaries of the specific *CU*. The structure of the movement models *H* is now synchronized with the verbal content on the symbolic level. Further, temporal synchronization is performed by considering the durations of phonemes and pauses. The duration of an individual movement phase is described by the following sum of the durations of the syllables that it may include:

$$t\left(MPH\_{i}\right) = \sum\_{j=0}^{n-1} t\left(syl\_{j+k}\right) \tag{5}$$

where *n* represents the number of syllables in each *MPH* unit and *k* its first syllable. *FP* can be fused with the *FH* phase, resulting in the following maximal duration:

$$t\_{\text{max}}\left(F\_P\right) = t\left(F\_P\right) + \sum\_{j=k}^{n-1} t\left(z\_{k-j}\right) + t\left(z\_r\right) \tag{6}$$

where *n* represents the number of syllables before the *FP*, *k* its first syllable, and *r* the first syllable after the *FP*. Further, *FS* can be fused with *FP*, resulting in the maximal duration:

$$t\_{\text{max}}\left(F\_S\right) = t\_{\text{max}}\left(F\_P\right) + t\_{\text{min}}\left(F\_S\right) \tag{7}$$

Finally, *FR* can be extended with subsequent syllables but only up to the last *NA* syllable:


$$t\_{\text{max}}\left(F\_R\right) = t\_{\text{min}}\left(F\_R\right) + \sum\_{j=0}^{n-1} t\left(z\_{k+j}\right) \tag{8}$$

where *n* represents the number of subsequent syllables, while *k* is the first syllable after the *FR*. Temporal descriptions of movement phases define time instants when the individual shape must be fully manifested and also the time that is available for the transition between the shapes. Further, this time also restricts the set of suitable motion trajectories for the transition, as well as restricting the set of shapes.
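As an illustration, the duration bookkeeping of Eqs. (5)–(8) can be sketched in Python. The function names are invented for this sketch, and the individual syllable durations below are assumptions; only the 413/593/893 ms totals correspond to values quoted in the chapter's Figure 6 example.

```python
# Sketch of the movement-phase duration bookkeeping of Eqs. (5)-(8).
# All durations are assumed to be in milliseconds.

def phase_duration(syllable_durations):
    """Eq. (5): a phase lasts as long as the syllables it spans."""
    return sum(syllable_durations)

def max_preparation(t_fp, preceding_syllables, t_r):
    """Eq. (6): FP fused with the hold phase, absorbing the syllables
    before it plus the first syllable after it."""
    return t_fp + sum(preceding_syllables) + t_r

def max_stroke(t_max_fp, t_min_fs):
    """Eq. (7): FS fused with FP."""
    return t_max_fp + t_min_fs

def max_retraction(t_min_fr, subsequent_syllables):
    """Eq. (8): FR extended with subsequent syllables (up to the last NA)."""
    return t_min_fr + sum(subsequent_syllables)

# Hypothetical syllable split; only the totals match the Figure 6 example.
t_fp = phase_duration([140, 130, 143])       # 413 ms predicted FP
t_fp_max = max_preparation(t_fp, [100], 80)  # 593 ms maximal FP
t_fs_max = max_stroke(t_fp_max, 300)         # 893 ms maximal FS
```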

In phase IV, named *synchronization of the shape*, the movement is temporally aligned with the temporal features of the verbal information (durations of phonemes and pauses). Based on the morphosyntactic sequences, the movement models, and the durations of the movement phases, a lookup into the *Gesticon* is carried out in order to specify the best shapes *V* (or poses *P*) and trajectories *T* for the realization of the co-verbal behavior. Thus, a lookup for possible configurations of the embodiment within the *FS* phases is performed. It selects a set of probable poses *P* for each *FS*. These poses are evaluated by using the *suitability functions* [43]. If there are no matched poses in the *Gesticon*, the set of most appropriate poses is selected by a CART (classification and regression tree) model, and the most appropriate pose *P* is assigned to each *FS*. After the pose candidates are defined for each stroke phase *FS*, the poses for *FP*, *FR*, and *FH* are also defined. Finally, the transitions between the poses are defined and aligned with the given temporal and prosodic specifications. Namely, a trajectory describes the local space in which a body part should move when traversing from the start pose to the end pose. The huge diversity of trajectories within the material demands restrictions when describing them in the *Gesticon*; we are specifically interested in the trajectories of the hands, i.e., the curve that a hand describes during the transition. The definition of the trajectory between two poses considers the temporal structure of the movement phase *MPH*, the semiotic class, the movement phase type, the morphosyntactic tags, the prosodic features within the phase, and possible semantic relations. The lookup in the *Gesticon* therefore results in several possible trajectories, of which only the one that is most appropriate and closest to the temporal predictions is used for each sequence.
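A minimal sketch of this lookup-then-score selection, assuming a dictionary-backed *Gesticon* and a toy suitability function; the real system uses the suitability functions of [43] and a trained CART model, for which a plain callback stands in here:

```python
# Illustrative sketch of the phase-IV pose lookup. The Gesticon layout,
# the scoring, and all names are assumptions, not the actual implementation.

def lookup_poses(gesticon, semiotic_class, phase_type):
    """Return the candidate poses stored for this context, if any."""
    return gesticon.get((semiotic_class, phase_type), [])

def select_pose(candidates, suitability, fallback_model, context):
    """Pick the best-scoring candidate; fall back to a classifier
    (the chapter uses a CART model) when the lookup is empty."""
    if candidates:
        return max(candidates, key=lambda pose: suitability(pose, context))
    return fallback_model(context)

gesticon = {("deictic", "stroke"): ["point_right", "point_left"]}
suit = lambda pose, ctx: 1.0 if ctx.get("target") in pose else 0.0
pose = select_pose(lookup_poses(gesticon, "deictic", "stroke"),
                   suit, lambda ctx: "neutral", {"target": "right"})
# pose == "point_right"
```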

In phase V, named *generation of the co-verbal behavior G*, the conversion of the defined movement models (stored within the heterogeneous relation graph (HRG)) into a procedural description can be understood as a parameterization of the animation. Each movement phrase is transformed into a symbolically, prosodically, and spatially coherent movement of an individual body part. Thus, it viably illustrates the communicative intent of the corresponding verbal segment. In order to be applied to an ECA and presented to the user, it has to be converted into a procedural description in the EVA-Script notation. Each model *H* represents simultaneous execution and is described within the block *<bgesture>*. The stroke movement phase *FS* and the preparation movement phase *FP* within the block *<bgesture>* represent sequences during which a change in the configuration of the embodiment is actually requested. The hold movement phase *FH* and the retraction movement phase *FR*, however, do not require a procedural description: *FH* only represents a hold of the existing configuration, while *FR* represents a retraction into a rest/neutral state. The transformation of a movement model *H* into an EVA event (co-verbal behavior written in EVA-Script) is outlined in **Figure 6**, which also shows how the EVA event is applied to an ECA and then realized as a multimodal expression built from the synchronized verbal and co-verbal sequences. As can be seen, the *FP* is defined across the word "*bila*" (*was*), with a predicted extension up to the word "*je*" (*is*). The predicted duration of the *FP* phase is 413 ms, while its maximum duration is 593 ms. The *FS* is defined across the *PA* word "*tako*" (*that*), with a predicted duration of 300 ms and a maximal duration of 893 ms. The post-stroke-hold phase *FHS* is identified across the semiotic nucleus, the word "*velika*" (*big*), with a duration of 451 ms.
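The serialization into a *<bgesture>* block can be sketched as follows. The real EVA-Script schema is not reproduced in the chapter, so every tag and attribute below except *<bgesture>* itself is a hypothetical placeholder; the sketch only illustrates that hold and retraction phases need no procedural description.

```python
# Hedged sketch: serializing a movement model into a <bgesture>-style block.
# Tag/attribute names other than <bgesture> are illustrative placeholders.

def to_bgesture(phases):
    """phases: list of (phase_type, body_part, duration_ms) tuples.
    Only preparation and stroke phases get a procedural description;
    hold and retraction phases are omitted, as described above."""
    lines = ["<bgesture>"]
    for phase_type, body_part, duration in phases:
        if phase_type in ("preparation", "stroke"):
            lines.append(
                f'  <phase type="{phase_type}" part="{body_part}" '
                f'duration="{duration}"/>')
    lines.append("</bgesture>")
    return "\n".join(lines)

print(to_bgesture([("preparation", "right_arm", 413),
                   ("stroke", "right_arm", 300),
                   ("hold", "right_arm", 451)]))
```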



**Figure 6.** Realization of a sentence with the conversational agent EVA.

## **5. Realization of expressive conversational behavior on embodied conversational agents**

The proprietary *EVA realization framework* proposed in [36, 37, 44] was developed in order to evoke a social response in human-machine interaction, based on the expressive conversational behavior generated by the previous modules of the EVA model. The framework enables machines to engage with the user on a more personal level, namely through the humanlike realization of multimodal interaction models that are based on the concept of conversation. Thus, the framework integrates ECAs as virtual bodies and generates responses via their embodiment. The ECA's artificial body and articulation capabilities (embodiment) are already close to those of real humans. From the skin, face, and hands to the body posture, these virtual entities tend to look and behave as realistically as possible. ECAs also tend to imitate as many features of human face-to-face dialogs as possible and to integrate them into the interaction in as synchronized a manner as possible. Although the humanlike quality is mostly defined by the co-verbal behavior generation model and the corpus analysis, the framework represents the final component through which users actually come into contact with the response. Thus, one could say that the framework brings responses to "life." Further, diversity and the capacity to handle the highly dynamic and interchangeable contexts of human interaction are, in addition to realism of appearance, among the key challenges of modern ECAs. 3D tools such as Maya, Daz3D, Blender, Panda3D, and Unity have opened up completely new possibilities for designing virtual entities that appear almost like real-life persons. Modern behavior generators make it possible to plan and model responses almost completely according to the context of the situation and the collocutor. The behavior realizer, therefore, represents the bridge between the two concepts.
The *EVA realization framework* also creates an environment that is capable of delivering expressive and socially colored responses in the form of facial expressions, gaze, and head and hand movement. Its architecture is outlined in **Figure 7**.

It consists of the animation-parameters builder, the animation-realization engines, articulated 3D models, and created 3D resources. The *animation-parameters builder* is used to understand and transform the co-verbal events into animation parameters and to integrate them into the animation execution plan. The *animation-realization engines* then realize these animation parameters through their internal renderers and display them to the user. As outlined in **Figure 7**, two animation engines are implemented: one is based on the Panda 3D game engine<sup>2</sup> and the other on the Unity 3D game engine<sup>3</sup>. Each of them incorporates its own set of articulated 3D models. However, all articulated 3D models support the same movement controllers (bones and morphed shapes). Thus, any EVA event can be used by either realizer, and the result will still be practically the same. The major difference between the supported animation engines is their implementation of frame-by-frame operations. Namely, in the Panda 3D engine, frame-by-frame operations are handled internally by the renderer, while in the Unity 3D engine, the renderer only renders each frame. This means that all



<sup>2</sup> Panda 3D: https://www.panda3d.org/.

<sup>3</sup> Unity 3D: https://unity3d.com/.




**Figure 7.** EVA realization framework.

calculations for the next "in-between" pose are performed via the implemented algorithms. As a result, the Unity 3D implementation allows for controlling the scheduled animation and even an animation already being executed. In order for the *EVA realization framework* to realize the generated synchronized behavior and present it to the user, the *EVA behavior realizer* transforms conversational events into their physical representations. This is achieved by applying the co-verbal features described in the co-verbal events to the 3D resources available in the renderer. The *animation-parameters builder* translates the EVA-Script into animation parameters. This is achieved by interfacing each script tag with a control unit or behavioral template and by extrapolating different groups of movements. Each group of movements is defined by semantic (which control units in which order), temporal (durations of the stroke, hold, and retraction movement phases), and spatial features (the ending position of the control unit). The main components of the *animation-parameters builder* are the *event processor*, the *animation generator*, and the *animation scheduler*. The *event processor* intercepts and handles the conversational events. It parses the event stream and checks each event's type and priority. The *animation generator* then transforms the conversational behavior into animation sequences. As part of this process, the *animation generator* applies temporal and spatial constraints adjusted to the agent's articulated body. The *animation scheduler* then inserts the generated animation sequences into the execution plan. It handles the animation graph and feeds the sequences to the rendering engines accordingly. Finally, after the realization of each animation sequence is completed, the *event processor* signals its status (conversational context) to the *behavior generator* and optionally to the dialog handlers.
Similarly, after the full realization of the behavior queue, a change-in-conversational-context event is raised, and the generation of inactive (rest) behavior is triggered.

The communication between processes within the *EVA realization framework* is implemented via an event-oriented publish/subscribe model. Namely, when the *event processor* intercepts a conversational event, it first checks its type and priority. Afterward, it pushes it into the *animation scheduler*. When the *animation scheduler* receives the conversational event, it initiates an internal interpreter in order to segment the behavior into three animation streams. The interpreter transforms the EVA-Script behavior into a body-part-segmented schedule of parallel/consecutively executed behavior in the form of animation streams. At the same time, the *scheduler* smoothly stops any idle behavior, destroys its handlers, and moves the body to the rest pose. The overall result of the *animation-parameters builder* is, therefore, a set of animation parameters that describe the execution of one or more animations representing the planned co-verbal act (multimodal response). The animation parameters are the features that specify how the animation engine should build its animation graph. They are "fed" to the second component of the framework, the *EVA animation engine*, which takes care of the transformation of animation parameters into animated sequences. The animation plan contains the co-verbal behavior queue. After this queue is emptied, the *animation scheduler* signals that the animation stream has been completed and destroys the animator objects in order to release the reserved resources. After all animation segments are completed, the *animation scheduler* signals the end of the conversational event. As a result, the *event handler*, if no more co-verbal events arrive, triggers the manifestation of the idle behavior. Each animation engine transforms the animation parameters maintained in the animation plans into corresponding sequential and/or parallel movements of control points (bones or morphed shapes).
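The publish/subscribe flow described above can be sketched as follows; the class names, the topic name, and the particular three-stream split are illustrative assumptions, not the framework's actual API:

```python
# Minimal publish/subscribe sketch of the event flow described above.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

class AnimationScheduler:
    """Segments a conversational event into per-body-part streams."""
    STREAMS = ("face_head", "left_side", "right_side")  # assumed split

    def __init__(self):
        self.streams = {name: [] for name in self.STREAMS}

    def on_event(self, event):
        for segment in event["segments"]:
            self.streams[segment["stream"]].append(segment["animation"])

bus = EventBus()
scheduler = AnimationScheduler()
bus.subscribe("conversational_event", scheduler.on_event)
bus.publish("conversational_event", {"segments": [
    {"stream": "face_head", "animation": "smile"},
    {"stream": "right_side", "animation": "point_right"}]})
```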
Further, both animation engines provide forward kinematics (and inverse kinematics) and animation-blending techniques, which enable the animation parameters to appear as viable behavior even on segments that have to be controlled by different gestures at the same time, e.g., when simultaneously animating a smile and a viseme. Each gesture/expression/emotion is realized by combining different sequential/concurrent transformations of different movement controllers (embodiments) of the ECA. The EVA-Script events describe the facilitation of the movement controllers in the form of temporally defined end poses. Thus, each entry contains the "next" configuration, which should be rendered over the specified temporal distribution. The in-between frames, which actually generate the movement, however, are calculated and interpolated by the animation engines. In the case of the Panda 3D engine, the renderer receives the required end pose and calculates the in-between frames automatically, while in the case of the Unity 3D engine, the *animation handler* handles frame-by-frame operations, i.e., the renderer receives the next in-between configuration, which is calculated by the *animation handler* at each frame. In this way, any animation can be modified at any step, even during the execution of a step/sequence. For a smooth transition, the scheduler does not have to wait and adjust its temporal scheduling. It just has to adjust its frame-by-frame schedule and replace it with new configurations. It can apply changes instantly as they occur. It can also insert new behavior between configurations, etc. As a result, the virtual character becomes more responsive and can react to changes of the conversational, environmental, and other


contexts instantly. The agent also "remembers" what it was gesturing prior to the excited state. Additionally, it can continue with the realization of that behavior after the excited state dissipates.
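The Unity-style frame-by-frame control described above, where the *animation handler* computes each in-between configuration itself and can therefore be retargeted mid-transition, can be sketched as follows; the class and method names are assumptions for illustration:

```python
# Sketch of interruptible frame-by-frame in-between calculation.
# Purely illustrative; not the framework's actual animation handler.

class AnimationHandler:
    def __init__(self, pose, target, frames):
        self.pose, self.target, self.frames_left = pose, target, frames

    def retarget(self, new_target, frames):
        """Interrupt the running transition with a new end pose."""
        self.target, self.frames_left = new_target, frames

    def step(self):
        """Advance one frame toward the current target pose."""
        if self.frames_left > 0:
            alpha = 1.0 / self.frames_left
            self.pose = [p + alpha * (t - p)
                         for p, t in zip(self.pose, self.target)]
            self.frames_left -= 1
        return self.pose

h = AnimationHandler(pose=[0.0], target=[1.0], frames=4)
h.step()              # pose is now [0.25]
h.retarget([0.0], 2)  # change of plan mid-transition
h.step(); h.step()    # pose returns to [0.0]
```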

## **6. Realization of complex co-verbal acts**

When the realization framework receives a co-verbal EVA event, it transforms it into a synchronized and fluid stream of movement performed by the following independent body parts: hands, arms, face, and head. To retain naturalness (especially regarding visual speech coarticulation) and at the same time prevent "jerky" expressions, animation-blending techniques are used. The same animation-blending techniques are also used to realize three different types of complex emotions: emotion masking, mixed expressions, and qualified emotions. These complex emotions further intensify the expressive factor of the framework and enable the implementation of highly humanlike representations of feelings in the facial region (e.g., by modulating, falsifying, and qualifying an expression with one or more elemental expressions). The modulation of an expression is realized by using animation-blending techniques based on intensification or de-intensification. Both are similar to the qualification of an expression and are implemented through the power (e.g., the stress attribute) and temporal (the durationUp, durationDown, delay, and persistence attributes) components of the domain of expressivity [36, 37, 44]. **Figure 8** outlines the output of the *realization framework* as an interpretation and realization of EVA-Script events, including several layers of complexity and EVA-Script attributes described through the EVA-Script language. In **Figure 8**, the behavior generator defines a co-verbal act that consists of two co-verbal events. The first one resembles the end of a "searching idea" event (when some idea of a solution comes to mind), and the second one resembles the beginning of the revelation of the idea (e.g., how one starts outlining the solution to collocutors). The co-verbal behavior is described so as to be performed via the full embodiment (all co-verbal artifacts), namely by using the arms, hands, face, and head.
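A minimal sketch of such blending, assuming each expression is a set of morph-target weights per facial control point and using the stress attribute as the blending weight; the channel names are invented for illustration:

```python
# Illustrative stress-weighted blend of facial channels (morph-target
# weights) for the mixed/masked expressions described above. The channel
# names and the use of "stress" as a blend weight are assumptions.

def blend(expressions):
    """expressions: list of (weights_dict, stress) pairs; returns the
    stress-weighted average per control point, clamped to [0, 1]."""
    total = sum(stress for _, stress in expressions) or 1.0
    result = {}
    for weights, stress in expressions:
        for control, value in weights.items():
            result[control] = result.get(control, 0.0) + value * stress / total
    return {c: min(1.0, max(0.0, v)) for c, v in result.items()}

# e.g., simultaneously animating a smile and a viseme:
smile = {"mouth_corner_up": 1.0}
viseme_o = {"mouth_open": 0.6, "mouth_corner_up": 0.1}
mixed = blend([(smile, 0.5), (viseme_o, 1.0)])
```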
To add an additional layer of complexity, the inner synchronization and the temporal distribution are different for each co-verbal artifact.
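The power and temporal attributes named above (stress, delay, durationUp, persistence, durationDown) can be read as a piecewise-linear intensity envelope per expression, with blending as weight normalization across expressions. The sketch below rests on that reading; the class and function names are illustrative, not the framework's actual code.

```python
from dataclasses import dataclass

@dataclass
class Envelope:
    """Temporal component of one expression (attribute names follow EVA-Script)."""
    delay: float         # seconds before the expression starts to form
    durationUp: float    # ramp-up time to full intensity
    persistence: float   # time held at full intensity
    durationDown: float  # ramp-down time back to neutral

def intensity(env: Envelope, stress: float, t: float) -> float:
    """Piecewise-linear intensity at time t; `stress` is the peak power."""
    t -= env.delay
    if t < 0:
        return 0.0
    if t < env.durationUp:                       # ramping up
        return stress * t / env.durationUp
    t -= env.durationUp
    if t < env.persistence:                      # held at peak
        return stress
    t -= env.persistence
    if t < env.durationDown:                     # ramping down
        return stress * (1 - t / env.durationDown)
    return 0.0

def blend(weights):
    """Normalize expression weights so the blended morph targets sum to at most 1."""
    total = sum(weights.values())
    if total <= 1.0:
        return weights
    return {name: w / total for name, w in weights.items()}

# A "qualified emotion": happiness modulated by a weaker surprise expression.
happy = Envelope(delay=0.0, durationUp=0.5, persistence=1.0, durationDown=0.4)
surprised = Envelope(delay=0.4, durationUp=0.3, persistence=0.5, durationDown=0.3)

t = 0.8  # sample time within the act
frame = blend({
    "happy": intensity(happy, stress=0.9, t=t),
    "surprised": intensity(surprised, stress=0.5, t=t),
})
```

At the sampled instant both expressions are at peak (0.9 and 0.5), so the blend normalizes them to relative weights, keeping the happy component dominant.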

Advanced Content and Interface Personalization through Conversational Behavior…

http://dx.doi.org/10.5772/intechopen.75599

**Figure 8.** Realization of an EVA event on the ECA EVA, rendered in the Unity-based realizer.

During the first co-verbal event (e.g., the revelation), the head, face, and right hand are the dominant artifacts; they therefore appear to "move" with the most significance and power. The left hand, on the other hand, moves to its target position slightly delayed but as fast as possible. In the second act, however, the left hand is the dominant artifact, so its movement appears most significant, i.e., with the longest duration and the most power, while the face/head and the right arm/hand are moved to their positions as "quietly" as possible. The overall duration of the first co-verbal act (Act 1) is 1.567 s. During this period, the agent has to perform a pointing gesture, pointing to the sky while moving its left arm to a position relevant to the specified pointing gesture (e.g., almost touching the torso). Additionally, the agent should express a blend of happy/surprised emotion on its face. As outlined in **Figure 8**, the head/gaze and the facial expression start to appear first (delay = 0.0 s). These two co-verbal artifacts then move to their final configuration in 0.5 s, while the right- and left-hand movements are delayed by 0.4 s; both hand configurations thus start to form 0.1 s before the previous two co-verbal artifacts finish. The right arm finishes its animation after 0.567 s, while the right hand manifests the targeted hand shape in 0.3 s. The left hand and arm propagate toward their end configuration until the overall end of the event (i.e., for 1.167 s). Co-verbal artifacts that have already finished simply maintain their configuration. The second co-verbal act is targeted to last 5.567 s. During its realization, the right arm (with the hand) is regarded as less significant and is therefore moved to its intended rest position as slowly as possible. The left arm (with the hand) is, in this situation, regarded as one of the significant co-verbal artifacts carrying conversational meaning; the same holds true for the head and face. The left-hand movement in this case appears with the most power, in order to gain the collocutor's full attention, and the face expresses confidence colored with excitement. Finally, by directing its gaze to the collocutor, the ECA EVA prepares a conversational environment that facilitates the collocutor's full attention. Thus, it can start with the representation of the recently developed idea.
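The Act 1 timings above are mutually consistent and can be checked with a few lines of arithmetic. This is a reading of the stated numbers, not code from the framework; the right arm's motion duration is derived from its stated end time (0.567 s) rather than given directly.

```python
# Stated values (seconds, measured from the start of Act 1).
delay = {"head/gaze": 0.0, "face": 0.0,
         "right arm": 0.4, "right hand": 0.4,
         "left arm": 0.4, "left hand": 0.4}
motion = {"head/gaze": 0.5, "face": 0.5,          # final configuration in 0.5 s
          "right hand": 0.3,                      # targeted hand shape in 0.3 s
          "left arm": 1.167, "left hand": 1.167}  # propagate until the act ends
motion["right arm"] = 0.567 - delay["right arm"]  # "finished after 0.567 s"

end = {part: delay[part] + motion[part] for part in delay}

# The hand movements start 0.1 s before the head/gaze and face finish:
overlap = end["face"] - delay["left hand"]   # 0.5 - 0.4 = 0.1 s
act_duration = max(end.values())             # 1.567 s, the stated length of Act 1
```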

## **7. Conclusion**

94 Artificial Intelligence - Emerging Trends and Applications

Natural interaction entails multiple behavior variations correlated in a dynamic and highly unpredictable setting. It also incorporates various social and interpersonal signals that "color" the final outcome. Furthermore, multimodality in interaction is not just an add-on or a style of information representation: it goes well beyond semantics and even semiotic artifacts, and it contributes significantly to the representation of information as well as to the interpersonal and textual functions of communication. In this chapter, we have outlined an approach to the automatic synthesis of more natural, humanlike responses generated on the basis of the EVA conversational model. The presented model consists of three interlinked frameworks. The first framework involves conversational analysis, through which we analyze multiparty, spontaneous face-to-face dialogs in order to create various types of conversational resources (from rules and guidelines to complex multidimensional features). The second framework involves an omni-comprehensive algorithm for the synthesis of affective co-verbal behavior from arbitrary, unannotated text. In contrast to related research, the proposed algorithm allows the conversational behavior to be driven simultaneously by prosody and text and to be modeled by various dimensions of situational, inter-, and intrapersonal context. Finally, the predicted behavior, well synchronized with its verbal counterpart, has to be presented to the user in the most viable manner. Thus, the third framework in the proposed model involves the implementation of a co-verbal behavior realizer. In our case, we have decided to fuse the advantages of state-of-the-art 3D modeling tools and game engines with the latest concepts in behavior realization in order to deploy an efficient and highly responsive framework through which the generated co-verbal expressions can be presented to users via realistic and humanlike embodied conversational agents.
Namely, modern behavior realizers have the capacity to support several parameters of the believability of conversational behavior, such as diversity and multimodal planning, situational awareness, synthesis of verbal content, and synchronization. Game engines, on the other hand, are a powerful tool for the rapid, high-quality design and rendering of virtual humanlike entities, including ECAs: they enable the design and delivery of highly realistic graphics and the efficient handling of hardware resources. To sum up, the ability to express information visually and emotionally plays a central role in human interaction and thus in defining an ECA's personality and emotional state, and it can make such an agent an active participant in a conversation. However, in order for the agent to be perceived as even more natural, it must be able to respond to situational triggers smoothly and almost instantly as they are perceived, while facilitating synchronized verbal and co-verbal channels. Thus, the presented model is an important step toward generating more natural, humanlike companions and machine-generated responses.

## **Acknowledgements**

This work is partially funded by the European Regional Development Fund and the Ministry of Education, Science and Sport of Slovenia (project SAIAL).

This work is partially funded by the European Regional Development Fund and the Republic of Slovenia (project IQHOME).

## **Author details**

Matej Rojc\*, Zdravko Kačič and Izidor Mlakar

\*Address all correspondence to: matej.rojc@um.si

Faculty of Electrical Engineering and Computer Science, University of Maribor, Slovenia

## **References**

[1] Luger E, Sellen A. Like having a really bad PA: The gulf between user expectation and experience of conversational agents. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM; 2016. pp. 5286-5297

[2] Feyaerts K, Brône G, Oben B. Multimodality in interaction. In: Dancygier B, editor. The Cambridge Handbook of Cognitive Linguistics. Cambridge: Cambridge University Press; 2017. pp. 135-156. DOI: 10.1017/9781316339732.010

[3] Li J, Galley M, Brockett C, Spithourakis GP, Gao J, Dolan B. A persona-based neural conversation model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany; 2016. pp. 994-1003

[4] Porcheron M, Fischer JE, McGregor M, Brown B, Luger E, Candello H, O'Hara K. Talking with conversational agents in collaborative action. In: Companion of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM; 2017. pp. 431-436

[5] Bonsignori V, Camiciottoli BC, editors. Multimodality Across Communicative Settings, Discourse Domains and Genres. Newcastle upon Tyne: Cambridge Scholars Publishing; 2016. ISBN (10): 1-4438-1107-6. ISBN (13): 978-1-4438-1107-1

[6] Kopp S, Bergmann K. Using cognitive models to understand multimodal processes: The case for speech and gesture production. In: The Handbook of Multimodal-Multisensor Interfaces. New York: Association for Computing Machinery and Morgan & Claypool; 2017. pp. 239-276

[7] McNeill D. Why We Gesture: The Surprising Role of Hand Movement in Communication. Cambridge: Cambridge University Press; 2016. ISBN-10: 1316502368. ISBN-13: 978-1316502365

[8] Davitti E, Pasquandrea S. Embodied participation: What multimodal analysis can tell us about interpreter-mediated encounters in pedagogical settings. Journal of Pragmatics. 2017;**107**:105-128

[9] Hazel S, Mortensen K. Embodying the institution—Object manipulation in developing interaction in study counselling meetings. Journal of Pragmatics. 2014;**65**:10-29

[10] Vannini P, Waskul D, editors. Body/Embodiment: Symbolic Interaction and the Sociology of the Body. New York: Ashgate Publishing; 2012. ISBN: 1409490610, 9781409490616

[11] Colletta JM, Guidetti M, Capirci O, Cristilli C, Demir OE, Kunene-Nicolas RN, Levine S. Effects of age and language on co-speech gesture production: An investigation of French, American, and Italian children's narratives. Journal of Child Language. 2015;**42**(1):122-145

[12] Esposito A, Vassallo J, Esposito AM, Bourbakis N. On the amount of semantic information conveyed by gestures. In: 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE; 2015. pp. 660-667

[13] Kendon A. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press; 2004. ISBN 0 521 83525 9. ISBN 0 521 54293 6

[14] Zhao R, Sinha T, Black AW, Cassell J. Socially-aware virtual agents: Automatically assessing dyadic rapport from temporal patterns of behavior. In: International Conference on Intelligent Virtual Agents. Springer International Publishing; 2016. pp. 218-233

[15] Pejsa T, Gleicher M, Mutlu B. Who, me? How virtual agents can shape conversational footing in virtual reality. In: International Conference on Intelligent Virtual Agents. Cham: Springer; 2017. pp. 347-359

[16] Allwood J. A framework for studying human multimodal communication. In: Coverbal Synchrony in Human-Machine Interaction. Boca Raton; London; New York: CRC Press; 2013. ISBN 1-4665-9825-5. ISBN 978-1-4665-9825-6

[17] Bozkurt E, Yemez Y, Erzin E. Multimodal analysis of speech and arm motion for prosody-driven synthesis of beat gestures. Speech Communication. 2016;**85**:29-42

[18] Chen CL, Herbst P. The interplay among gestures, discourse, and diagrams in students' geometrical reasoning. Educational Studies in Mathematics. 2013;**83**(2):285-307

[19] Holler J, Bavelas J. Multimodal communication of common ground. In: Breckinridge Church R, Alibali MW, Kelly SD, editors. Why Gesture? How the Hands Function in Speaking, Thinking and Communicating. Vol. 7. 2017. pp. 213-240

[20] Poggi I. Hands, Mind, Face and Body: A Goal and Belief View of Multimodal Communication. Berlin: Weidler; 2007. ISBN (10): 3896932632. ISBN (13): 978-3896932631

[21] Yumak Z, Magnenat-Thalmann N. Multimodal and multi-party social interactions. In: Context Aware Human-Robot and Human-Agent Interaction. Switzerland: Springer International Publishing; 2016. pp. 275-298

[22] Kuhnke F, Ostermann J. Visual speech synthesis from 3D mesh sequences driven by combined speech features. In: 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE; 2017. pp. 1075-1080

[23] Peng X, Chen H, Wang L, Wang H. Evaluating a 3-D virtual talking head on pronunciation learning. International Journal of Human-Computer Studies. 2018;**109**:26-40

[24] Wang N, Ahn J, Boulic R. Evaluating the sensitivity to virtual characters facial asymmetry in emotion synthesis. Applied Artificial Intelligence. 2017;**31**(2):103-118

[25] Gibet S, Carreno-Medrano P, Marteau PF. Challenges for the animation of expressive virtual characters: The standpoint of sign language and theatrical gestures. In: Dance Notations and Robot Motion. Switzerland: Springer International Publishing; 2016. pp. 169-186

[26] Tolins J, Liu K, Neff M, Walker MA, Tree JEF. A verbal and gestural corpus of story retellings to an expressive embodied virtual character. In: LREC. 2016

[27] Ochs M, Pelachaud C, Mckeown G. A user perception-based approach to create smiling embodied conversational agents. ACM Transactions on Interactive Intelligent Systems. 2017;**7**(1), Article 4. DOI: 10.1145/2925993

[28] Bellamy RK, Andrist S, Bickmore T, Churchill EF, Erickson T. Human-agent collaboration: Can an agent be a partner? In: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM; 2017. pp. 1289-1294

[29] Neff M. Hand gesture synthesis for conversational characters. In: Handbook of Human Motion. Switzerland: Springer International Publishing; 2017. pp. 1-12. ISBN: 978-3-319-30808-1

[30] Provoost S, Lau HM, Ruwaard J, Riper H. Embodied conversational agents in clinical psychology: A scoping review. Journal of Medical Internet Research. 2017;**19**(5):e151, pp. 1-17

[31] Rojc M, Presker M, Kačič Z, Mlakar I. TTS-driven expressive embodied conversation agent EVA for UMB-SmartTV. International Journal of Computers and Communications. 2014;**8**:57-66

[32] Shaked NA. Avatars and virtual agents—Relationship interfaces for the elderly. Healthcare Technology Letters. 2017;**4**(3):83-87

[33] Mlakar I, Kačič Z, Rojc M. A corpus for investigating the multimodal nature of multi-speaker spontaneous conversations–EVA corpus. WSEAS Transactions on Information Science and Applications. 2017;**14**:213-226. ISSN 1790-0832

[34] Mlakar I, Kačič Z, Rojc M. Describing and animating complex communicative verbal and nonverbal behavior using Eva-framework. Applied Artificial Intelligence. 2014;**28**(5):470-503

[35] Shamekhi A, Czerwinski M, Mark G, Novotny M, Bennett GA. An exploratory study toward the preferred conversational style for compatible virtual agents. In: International Conference on Intelligent Virtual Agents. 2016. pp. 40-50

[36] Rojc M, Mlakar I, Kačič Z. The TTS-driven affective embodied conversational agent EVA, based on a novel conversational-behavior generation algorithm. Engineering Applications of Artificial Intelligence. 2017;**57**:80-104

[37] Rojc M, Mlakar I. An expressive conversational-behavior generation model for advanced interaction within multimodal user interfaces. In: Computer Science, Technology and Applications. New York: Nova Science Publishers; 2016. ISBN 978-1-63482-955-7. ISBN 978-1-63484-084-2

[38] Pelachaud C. Greta: An interactive expressive embodied conversational agent. In: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems; May 2015. pp. 5-5

[39] Mondada L. New challenges for conversation analysis: The situated and systematic organization of social interaction. Langage et Societe. 2017;**2**:181-197

[40] Velentzas J, Broni DG. Communication cycle: Definition, process, models and examples. In: Proceedings of the 5th International Conference on Finance, Accounting and Law (ICFA '14); Vol. 17. 2014. pp. 117-131

[41] Mlakar I, Kačič Z, Rojc M. Form-Oriented Annotation for Building a Functionally Independent Dictionary of Synthetic Movement. Vol. 7403. Berlin; New York: Springer; 2012. pp. 251-265

[42] Rojc M, Mlakar I. Multilingual and multimodal corpus-based text-to-speech system PLATTOS. In: Ipšić I, editor. Speech and Language Technologies. Rijeka: InTech; 2011. ISBN: 978-953-307-322-4

[43] Rojc M, Kačič Z. Gradient-descent based unit-selection optimization algorithm used for corpus-based text-to-speech synthesis. Applied Artificial Intelligence. 2011;**25**(7):635-668

[44] Mlakar I, Kačič Z, Borko M, Rojc M. A novel unity-based realizer for the realization of conversational behavior on embodied conversational agents. International Journal of Computers. 2017;**2**:205-213. ISSN: 2367-8895

**Chapter 5**

**High Performance Technology in Algorithmic Cryptography**

Arturo Lezama-León, José Juan Zarate-Corona, Evangelina Lezama-León, José Angel Montes-Olguín, Juan Ángel Rosales-Alba and Ma. de la Luz Carrillo-González

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75959

**Abstract**

Alan Turing's article, "Computing Machinery and Intelligence", gives the preamble to the problem of guessing whether one is dealing with a machine or another human being. Currently, ubiquitous technologies such as firmware allow direct access to analog data; however, we must find a way to secure that information. This chapter analyzes cryptographic algorithms for the transfer of multimedia information and raises the use of cryptarithmetic. Finite automata will be developed to govern the logic of the cryptographic algorithms to be integrated into firmware, and performance tests and controls will be carried out to determine the best strategies with respect to their performance and algorithmic complexity. Technologies that allow the creation of learning environments, such as neural networks, which support other processes such as the recognition of patterns in images, are also presented.

**Keywords:** cryptarithmetic, cryptographic algorithms, firmware, HPC

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

This research reveals how algorithms are integrated into the different representations of data: information seen as signals, images, video, and text.
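The chapter's theme of securing stored information by treating a key with a cryptographic system can be sketched as a salted-hash comparison: access is still granted by matching against the stored key, only the matching is done between hashes. This is an illustrative sketch, not the authors' implementation; the names `store_key` and `check_access` are invented, and only Python's standard `hashlib` and `hmac` modules are used.

```python
import hashlib
import hmac
import os

def store_key(password: str, iterations: int = 100_000):
    """Keep a salted hash of the key rather than the key itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest, iterations

def check_access(password: str, salt: bytes, digest: bytes, iterations: int) -> bool:
    """Entry control keeps its form (a match against the stored key);
    only the computation time grows with the iteration count."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)

salt, digest, n = store_key("s3cret")
granted = check_access("s3cret", salt, digest, n)  # matching key: access granted
denied = check_access("wrong", salt, digest, n)    # non-matching key: access denied
```

The iteration count is the knob that trades computation time for resistance to brute-force attacks, which is exactly the efficiency trade-off the text raises.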

#### **1.1. Turing machines**

Alan Turing's test, proposed in 1950, was designed to provide an operational and satisfactory definition of intelligence.

Artificial Intelligence - Emerging Trends and Applications

High Performance Technology in Algorithmic Cryptography

http://dx.doi.org/10.5772/intechopen.75959

#### **1.1. Turing machines**

Alan Turing's test, proposed in 1950, was designed to provide an operational and satisfactory definition of intelligence. He suggested a test based on the inability to differentiate between indisputably intelligent entities and human beings.

The computer should have the following capabilities:

• Natural language processing, which allows it to communicate in English.

• Representation of knowledge, to store what is known or felt.

• Automatic reasoning, to use stored information to answer questions and draw new conclusions.

• Machine learning, to adapt to new circumstances and to detect and extrapolate patterns.

• Computational vision, to perceive objects.

• Robotics, to manipulate and move objects.

These six disciplines cover most of Artificial Intelligence (AI), in reference [1, 2].

#### **1.2. Artificial intelligence and the cryptarithmetic**

Artificial Intelligence is the study of how to make computers do things which, at the moment, people do better, in Ref. [2].

#### *1.2.1. cryptarithmetic*

It considers higher-order constraints, which can be represented as a collection of binary constraints such as F <> T, in Ref. [3].
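As an illustration (our example, not from the original text), a classic cryptarithmetic puzzle such as SEND + MORE = MONEY can be solved by brute-force search over digit assignments, enforcing the all-different, no-leading-zero, and arithmetic constraints:

```python
from itertools import permutations

def solve_send_more_money():
    """Assign distinct digits to the letters so that SEND + MORE = MONEY.

    All-different constraint: permutations() never repeats a digit.
    No-leading-zero constraints: S != 0 and M != 0.
    M is placed first so the search reaches the solution sooner.
    """
    letters = "MSENDORY"
    for digits in permutations(range(10), len(letters)):
        if digits[0] == 0 or digits[1] == 0:  # M != 0, S != 0
            continue
        a = dict(zip(letters, digits))
        send = 1000 * a["S"] + 100 * a["E"] + 10 * a["N"] + a["D"]
        more = 1000 * a["M"] + 100 * a["O"] + 10 * a["R"] + a["E"]
        money = (10000 * a["M"] + 1000 * a["O"] + 100 * a["N"]
                 + 10 * a["E"] + a["Y"])
        if send + more == money:
            return send, more, money
    return None

print(solve_send_more_money())  # (9567, 1085, 10652)
```

The same constraints can be given to a constraint-programming solver; the exhaustive search above merely makes them explicit.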

#### **1.3. Automata design**

An automaton is a graphical representation of a process that simulates a sequential process. By means of a deterministic finite automaton, the use of a regular expression can be modeled, and this, in turn, can be evaluated in terms of algorithmic complexity.
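As a minimal illustration (ours, not the chapter's), a deterministic finite automaton can be simulated with a transition table, consuming the input one symbol at a time; this one accepts binary strings with an even number of 1s:

```python
# Transition table of a two-state DFA over the alphabet {0, 1}.
TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

def accepts(word: str) -> bool:
    """Simulate the DFA: start in 'even', follow one transition per symbol,
    and accept iff the final state is 'even' (an even count of 1s)."""
    state = "even"
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state == "even"

print(accepts("1001"))  # True  (two 1s)
print(accepts("1011"))  # False (three 1s)
```

Evaluating one input is linear in its length, which is the kind of algorithmic-complexity argument mentioned above.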

There is a nondeterministic counterpart that accepts different inputs at the start and can consequently represent several outputs, so a mechanism is used to convert it into a deterministic finite automaton. Such automata share similarities with neural networks in terms of structure, but not in functioning, because they pursue different goals.

Regarding its sequential form, it can be simulated graphically by means of a flow diagram; in its parallel form, however, either an activity diagram or a Petri net must be used, since these can model concurrent behavior, in reference [4].

Because keys are usually registered as ASCII characters, they can be validated by regular expressions, which in turn can be realized as validation controls by deterministic finite automata.
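For example (the policy below is our assumption, not taken from the chapter), an ASCII key can be validated with a regular expression, which regex engines evaluate with an automaton-style matcher:

```python
import re

# Hypothetical key policy: 8-16 printable ASCII characters,
# containing at least one letter and at least one digit.
KEY_PATTERN = re.compile(r"^(?=.*[A-Za-z])(?=.*[0-9])[\x21-\x7e]{8,16}$")

def is_valid_key(key: str) -> bool:
    """Entry control: validates only the *form* of the key,
    not whether it grants access to anything."""
    return KEY_PATTERN.fullmatch(key) is not None

print(is_valid_key("abc123XYZ"))     # True
print(is_valid_key("short1"))        # False: fewer than 8 characters
print(is_valid_key("nodigitshere"))  # False: no digit
```

This is exactly the entry-control idea of the surrounding text: the check accepts or rejects the shape of the input, nothing more.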

When a possible entry is described in this form, then we are in the presence of an entry control, but not an access control.

For access, we search for a match against a stored key to manage access to the data. If the password is treated by a cryptographic system, the entry and access controls do not change their form, but the computation time does, which must be measured to corroborate its degree of efficiency.
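A sketch of that point (our example; SHA-256 is used for brevity, whereas a production system would use a dedicated password-hashing function): the access check keeps the same form, comparing a stored value with a computed one, and only the computation changes:

```python
import hashlib
import hmac

# Value stored at registration time: the hash of the key, not the key itself.
STORED_DIGEST = hashlib.sha256(b"s3cret-key").hexdigest()

def access_granted(candidate: str) -> bool:
    """Access control: hash the candidate and compare with the stored digest.
    compare_digest() runs in constant time to resist timing attacks."""
    digest = hashlib.sha256(candidate.encode()).hexdigest()
    return hmac.compare_digest(digest, STORED_DIGEST)

print(access_granted("s3cret-key"))  # True
print(access_granted("wrong-key"))   # False
```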

With an access control list (ACL), restrictions are carried out through the management of routing tables and entry rules.

The use of ACLs increases the level of security in information systems.
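A minimal sketch of this idea (the rule values are illustrative): an ACL as an ordered list of entry rules, evaluated first-match with an implicit deny, as in router configurations:

```python
import ipaddress

# Ordered entry rules: (action, source network, destination port).
ACL_RULES = [
    ("deny",  "10.0.0.0/8", 22),   # block SSH from this range
    ("allow", "0.0.0.0/0",  443),  # allow HTTPS from anywhere
]

def evaluate(src_ip: str, port: int) -> str:
    """First matching rule wins; unmatched traffic is implicitly denied."""
    addr = ipaddress.ip_address(src_ip)
    for action, network, rule_port in ACL_RULES:
        if port == rule_port and addr in ipaddress.ip_network(network):
            return action
    return "deny"

print(evaluate("10.1.2.3", 22))  # deny  (first rule)
print(evaluate("8.8.8.8", 443))  # allow (second rule)
print(evaluate("8.8.8.8", 80))   # deny  (implicit)
```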

The use of firewalls allows having an accurate strategy, as if it were a block.

Cryptographic systems have been implemented to treat various types of data and signals, such as multimedia, so that not only text can be processed.

In this way, their study is of interest: in applications oriented to the control and automation of signals in greenhouses, home automation, automation and control, and information security in closed or public environments, the integration of microcontrollers is intended, which can supply, to a large extent, restrictive measures that make their assurance slow.

This research reveals how the algorithms are integrated into the different representations of the data: information seen as signals, images, video, and text.

The signals are presented in one dimension (1D), two dimensions (2D), and three dimensions (3D). It is important to understand that the degree of complexity increases with each step, so the subject deserves an even more exhaustive study.

This research does not venture to deny or violate systems; rather, it serves as a form of prevention and as a basis for continuing research studies. The cryptanalysis of each system still needs to be specified; only the studies discussed so far are presented.

#### **1.4. Computer legislation and regulations**


The use of technology in today's world is inevitable. Whether it is making reservations on our smartphones, checking emails, or checking in for flights, usage of technology is present. Whilst its benefits cannot be questioned, unfortunately, the increase in our reliance on technology implies that we are at higher risk of attacks and breaches – cyber-attacks. Responding to this scenario, the current trend of governments protecting their critical infrastructures is the implementation of cybersecurity standards in their critical sectors, in reference [5]. The Global Risks 2014 report by the World Economic Forum (WEF) lists 'cyber-attacks' among the top 5 global risks, highlighting that the dependency of economies and societies on technology is inevitable, in reference [6].

The implementation of cybersecurity standards is by no means a silver bullet in critical infrastructure protection. However, their implementation can establish a set of controls that contribute to and build better resiliency. The cybersecurity standards may support the capabilities of preparing for, protecting against, responding to and recovering from cyber-attacks. Some of the common cybersecurity-related standards being implemented globally include the following (not exhaustive):

• ISO/IEC 27032:2012 Information technology – Security techniques – Guidelines for cybersecurity

• ISO/IEC 27001 Information technology – Security techniques – Information security management systems – Requirements

• ISO 22301 Societal security – Business continuity management systems – Requirements

• ISO/IEC 15408 Information technology – Security techniques – Evaluation criteria for IT security

• ISO/IEC 27035 Information technology – Security techniques – Information security incident management

• ISO/IEC 27005 Information technology – Security techniques – Information security risk management

• FIPS 140-1: Security Requirements for Cryptographic Modules

• FIPS 186-3: Digital Signature Standard, in reference [5].


Countries take different approaches towards implementing cybersecurity standards in the efforts of protecting their critical infrastructures. Some countries implement cybersecurity standards through mandatory requirements, whilst others provide guidelines and frameworks, in reference [5].

The United Kingdom's priority for action is to model best practice on cybersecurity in government systems, which will set the standard for suppliers to government and raise the bar on cybersecurity requirements.

Due to this:

**1.** UK has enforced compliance with the Network Interoperability Consultative Committee (NICC) Minimum Security Standards (ND 1643) through the Communications Act 2003 for the Communications Sector,

**2.** UK has developed the cyber security and information assurance standards – Information Assurance Maturity Model (IAMM) – incorporating the requirements from the Security Policy Framework (SPF),

**3.** SPF recognizes and has aligned its principles to the ISO/IEC 27001 and the Business Continuity Management (BCM) standards (BS 25999/ISO 22301), and

**4.** The application of standards is promoted through the establishment of national-level certification schemes for standards such as the ISO/IEC 15408, ISO/IEC 27001, ISO/IEC 20000, BS 25999 and ISO 22301, in Ref. [5].

Australia's cybersecurity standards compliance implementation is supported by the country's Cybersecurity Strategy 2009, highlighting the need for a consistent and integrated framework of policies, procedures and standards to protect its government's systems, as well as the other interconnected systems. Australia's security measures are: (1) the development and enforcement of the Protective Security Policy Framework (PSPF) for government agencies through a Directive by the Attorney-General's Department (AGD), (2) the Australian government has enforced the ISO/IEC 15408 for procurement of products with security functions in the Government Sector, and (3) standards implemented voluntarily and adopted by critical infrastructure organizations are the American National Standards Institute/International Society of Automation (ANSI/ISA)-99, Industrial Automation and Control Systems Security, and ISO 27799 Health Informatics – Information security management in health using ISO/IEC 27002.

The United States of America (USA) has developed various national strategies on cyber security: (1) the Comprehensive National Cybersecurity Initiative (CNCI) will evolve to become the key element of a broader updated national USA cyber security strategy, (2) standards mandated for the Energy and Dams sectors are the Reliability Standards Critical Infrastructure Protection (CIP) 002-009 through the Code of Laws of the United States of America (U.S.C.) Title 16 – Conservation, Section 824o – Electric Reliability (16 U.S.C. 824o), (3) the standards mandated for the Government Sector are the Federal Information Processing Standards (FIPS) through the Federal Information Security Management Act 2002 (FISMA), (4) other standards in the critical sectors are ISO 27799, ISO/IEC 27010 Information technology – Security techniques – Information security management for inter-sector and inter-organizational communications, as well as ISO/IEC 27011 Information technology – Security techniques – Information security management guidelines for telecommunications organizations based on ISO/IEC 27002 for Telecommunications, and ISO/IEC TR 27015 Information technology – Security techniques – Information security management guidelines for financial services for Financial Services, and (5) the government promotes the application of cyber security standards via the establishment of national-level certification schemes for standards such as ISO/IEC 27001, ISO/IEC 27005, ISO/IEC 27006, ISO/IEC 20000, and ISO 22301, in reference [5].

The different approaches that countries take to cybersecurity standard compliance show that cybersecurity standards, whether implemented mandatorily or voluntarily, are a measure to enhance the protection of critical infrastructure. Enforcing cybersecurity standards compliance may bring about a positive outcome for the overall cybersecurity management of a country, and not just the organizations implementing them, in reference [5].

#### **1.5. Mexico rules and regulation**

In Mexico, computer legislation is distributed across different rules and laws, because the way in which a crime is pursued depends on its nature; a list of some of the most common is shown in **Table 1**.

#### **1.6. E-government (eGov)**

The year 2015 marked a milestone in efforts to eradicate poverty and promote prosperity for all people on a safe planet, with the adoption of the 2030 Agenda for Sustainable Development and other major international commitments. The 2030 Agenda is centered on a set of far-reaching and people-centered universal Sustainable Development Goals (SDGs). Reaching these goals in all countries and creating peaceful, just and inclusive societies will be extremely difficult in the absence of effective, accountable and inclusive institutions. Institutions need to be capable and equipped to adapt the Agenda to the national situation.

They need to be able to mobilize society and the private sector in implementing the SDGs. Capacities and innovation will be required to promote policy integration, enhance public accountability, promote participation for more inclusive societies, and ensure equitable and effective public services for all, particularly for the poorest and most vulnerable groups. Information and Communication Technology (ICT) and e-government are important tools to realize these objectives, in reference [8].

The 2030 Agenda itself recognized that "the spread of information and communications technology and global interconnectedness has great potential to accelerate human progress, to bridge the digital divide and to develop knowledge societies, as does scientific and technological innovation across areas as diverse as medicine and energy", in reference [8].

| **Topic** | **Rules and regulations** |
| --- | --- |
| 1. E-commerce | Procurement Law, leases and public sector services (Ley de adquisiciones, arrendamientos y servicios del sector público); Commercial Code (Código de comercio); Federal Civil Code (Código civil federal); Federal Law for Protection of the Consumer (Ley federal de protección al consumidor); Federal Law of Administrative Procedure (Ley Federal de Procedimiento Administrativo) |
| 2. Electronic signature | Advanced Electronic Signature Law (Ley de Firma Electrónica Avanzada); Law of the Tax Administration Service (Ley del Servicio de Administración Tributaria) |
| 3. Personal data protection | Federal Law on Protection of Personal Data in Possession of Individuals (Ley Federal de Protección de Datos Personales en Posesión de los Particulares); Law of Investment Funds (Ley de Fondos de Inversión); Federation Fiscal Code (Código Fiscal de la Federación); Social Security Law (Ley del Seguro Social) |
| 4. Right to the information | Article 6 Constitutional (Artículo 6° Constitucional); General Law of Transparency and Access to Public Information (Ley General de Transparencia y Acceso a la Información Pública); Federal Transparency Law and Access to Public Information (Ley Federal de Transparencia y Acceso a la Información Pública) |
| 5. Violation of correspondence | Articles 173 to 177 of the Federal Criminal Code (Artículo 173 al 177 del Código Penal Federal) |
| 6. Revelation of secrets | Articles 210 to 211 Bis of the Federal Penal Code (Artículo 210 al 211 Bis del Código Penal Federal) |
| 7. Illicit access to computer systems and equipment | Articles 211 bis 1 to 211 bis 7 of the Federal Criminal Code (Artículos del 211 bis 1 al 211 bis 7 del Código Penal Federal); Federal Criminal Code and Political Constitution of the United Mexican States (Código Penal Federal y Constitución Política de los Estados Unidos Mexicanos) |
| 8. Copyright | Articles 424 to 429 of the Federal Penal Code (Artículos del 424 al 429 del Código Penal Federal) |
| 9. Industrial property | Industrial Property Law (Ley de la propiedad industrial) |
| 10. Stock market | Law of the Market of Values (Ley del mercado de valores) |
| 11. Telecommunication | Federal Law on Telecommunications and Broadcasting (Ley Federal de Telecomunicaciones y Radiodifusión) |

**Table 1.** Mexico rules and regulation. In Ref. [7].

The General Assembly has recognized on several occasions the role of information and communications technology in promoting sustainable development and supporting public policies and service delivery. It has underscored that ICTs have enabled breakthroughs "in Government and the provision of public services, education, healthcare and employment, as well as in business, agriculture and science, with greater numbers of people having access to services and data that might previously have been out of reach or unaffordable". The General Assembly has also specifically affirmed the "potential of e-government in promoting transparency, accountability, efficiency and citizen engagement in public service delivery", in reference [8].

There are several definitions in circulation which differ as to the meaning or scope of the term "eGov". The following definitions illustrate the different scopes or emphases in the understanding of what eGov is: (A) United Nations: the employment of the Internet and the world-wide-web for delivering government information and services to citizens. (B) The World Wide Web Consortium: the use of the Web and other information technologies by governments to interact with the citizenry, between departments and divisions, and with other governments. (C) The Organization for Economic Co-operation and Development focuses on the use of new information and communication technologies (ICTs) by governments as applied to the full range of government functions. In particular, the networking potential offered by the Internet and related technologies has the potential to transform the structures and operation of government, in reference [9].

From the business perspective, eGov is a way of introducing new channels of interaction between government and the consumers of its services, in order to make this interaction more convenient for the consumers and cheaper for the provider of the services. These business benefits include the following:

• Facilitating access to government data and processes for all types of consumers of these services, be it the general public, business, government agencies or their employees, or other governments.

• Improving operational characteristics of government, including: (1) decreasing the cost to government of providing quality services to their consumers, and (2) decreasing the load on office workers by making at least some types of data or some business processes directly available to service consumers.

• The reach of eGov services can be widened from initial specialized groups to all consumers that have a need and a right to use the service, without incurring substantial additional costs, in reference [9].


#### *1.6.1. International and national standards in information security*

The International Organization for Standardization (ISO) has the ISO/IEC 27000 family of standards, which aims to help in the management of asset security such as financial information, intellectual property and confidential information of employees or third parties.

Both ISO (the International Organization for Standardization) and IEC (International Electrotechnical Commission) make up the specialized system for global standardization, in Ref. [10].

**2. High performance in technology**

presented internationally. (e.g. **Table 2**, **Figure 1**).

of shared systems in High performance in computing.

the resources of a computer system such as our computer.

• Representation (parent/child or guardian/dependent)

**Architecture System share (%)**

is also given by logic. It continued with the **Table 5**. Performance metric.

tems in the world, in reference [15].

**2.1. High performance in computation**

HPC stands for High-performance computing

model contains the following components

• Structure (static/dynamic) • Granularity (nested/plane)

**Table 2.** Architecture.

• Initialization (Thick/Fine Ending)

**Cluster** 87.4 **MPP** 12.6

**Table 4**, **Figure 3**).

There is an international body that measures the capacity in supercomputing performance, within its schemes contemplated for most of its platforms the use of Linux, within the architecture is appreciated the use of MPP and the cluster, where you can appreciate the acceptable capabilities both work and performance. With this, it is possible to detect the speeds that are

High Performance Technology in Algorithmic Cryptography

http://dx.doi.org/10.5772/intechopen.75959

109


## **2. High performance in technology**

Both ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) make up the specialized system for worldwide standardization, in Ref. [10].

The implementation of this system aims to preserve the confidentiality, integrity and availability of information through the application of a risk management process, which gives the interested parties certainty that risk has been adequately managed. These are considered best practices and are not mandatory, as mentioned on their website, in Ref. [11].

In North America, there is NIST (the National Institute of Standards and Technology) of the United States Department of Commerce. This institute handles issues of cybersecurity and privacy through the application of standards and best practices, including the ISO/IEC 27000 standards, in order to help organizations manage cybersecurity risk within a framework, in Ref. [12]. This framework covers physical, cyber and personnel security, and applies to technology-dependent organizations such as industrial control systems, information technologies, cyber-physical systems, and connected devices, including the IoT (Internet of Things), in Ref. [13].

NIST has the National Cybersecurity Improvement Commission, which seeks to:

• Guarantee public safety and national economic security.

• Empower Americans to take better control of their digital security.

• Raise awareness and protection at all levels of government, business and society to protect privacy.

Within NIST is the NCCoE (National Cybersecurity Center of Excellence), which establishes agreements with industry and institutions that participate in projects related to cybersecurity, in Ref. [14]. When information security is implemented and the rules related to the country and its form of government are respected, their regulation must be followed to concretize the integration of the strategies involved, such as the applicable international norms and standards for protection. Within an IT audit, controls are applied to each aspect that involves the transfer of information, such as access control, entry control, communication control, etc.

A control is a procedure that verifies another procedure. Therefore, controls must be implemented in a way that ensures they have not been violated, so their internal study is important.

High computing performance and infrastructure characteristics are an important consideration. Because a supercomputer is not available, we opt for the design of a group of nodes that allows gradual growth until its operation approaches the performance of a supercomputer.

A network of computers embodies precisely this vision of a distributed and parallel system; however, care must be taken in its construction and operation.

108 Artificial Intelligence - Emerging Trends and Applications


There is an international body that measures supercomputing performance capacity. Most of the platforms it lists use Linux, and the dominant architectures are MPP and the cluster, both of which show acceptable workload and performance capabilities. With this, it is possible to identify the speeds reached internationally (e.g. **Table 2**, **Figure 1**).

The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world, in Ref. [15].

The following graphs present performance and system share on an international scale, with information retrieved from top500.org, in Ref. [15] (e.g. **Table 3**, **Figure 2**).

Mexico holds only 0.2% of the system share; for this reason, it is necessary to learn from the supercomputing powers, because otherwise the power of supercomputing is wasted (e.g. **Table 4**, **Figure 3**).

Among the registered systems, it can be seen that Asia occupies first place in the share of high-performance computing systems.

## **2.1. High performance in computation**

One of the first programs used to verify the performance of processes and programs was the task manager, because it allows us to measure performance by means of metrics covering the resources of a computer system.
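As an illustration, two of the metrics a task manager reports can be gathered in a few lines of code. The sketch below (in Python, with an arbitrary CPU-bound workload of our own choosing) measures wall-clock and CPU time for a task:

```python
import time

def measure(task, *args):
    """Measure wall-clock and CPU time of a callable, two of the
    basic metrics a task manager reports for a running process."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    result = task(*args)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return result, wall, cpu

def busy_sum(n):
    # CPU-bound workload used as the process under measurement
    return sum(i * i for i in range(n))

result, wall, cpu = measure(busy_sum, 100_000)
print(f"result={result}, wall={wall:.4f}s, cpu={cpu:.4f}s")
```

Richer metrics (memory, I/O, per-core load) require platform facilities, which is precisely what task managers wrap.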

HPC stands for high-performance computing.

Computation means processing. It has commonly been attributed to hardware, that is, to physical infrastructure; however, it should be distinguished in that computation is also given by logic. The performance metrics considered are listed in **Table 5**.

The use of a cluster implies simultaneously using concurrent technology; the concurrent model contains the following components (e.g. **Figure 4**).
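A minimal sketch of one such component, the concurrent active object, can be written with a worker thread and a message queue (Python is used here for illustration; the class and method names are our own, not from any standard API):

```python
import queue
import threading

class ActiveObject:
    """Minimal active object: callers enqueue requests and a private
    worker thread consumes them one at a time -- the behavior the
    Petri-net model of a concurrent active object captures."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._results = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            func, args = self._inbox.get()
            if func is None:            # sentinel: shut down
                break
            self._results.put(func(*args))

    def call(self, func, *args):
        # Asynchronous request: returns immediately
        self._inbox.put((func, args))

    def result(self):
        # Blocks until the worker has produced a result
        return self._results.get()

    def stop(self):
        self._inbox.put((None, ()))
        self._worker.join()

obj = ActiveObject()
obj.call(pow, 2, 10)
print(obj.result())   # 1024
obj.stop()
```

The queue plays the role of the places in the Petri net: requests are tokens, and the worker thread fires the transitions.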



**Table 2.** Architecture.

**Figure 1.** Architecture.

**Figure 2.** Performance.

**Table 4.** Continents.

| Continent | System share (%) |
| --- | --- |
| Africa | 0.2 |
| Americas | 29.8 |
| Asia | 50.4 |
| Europe | 18.6 |
| Oceania | 1 |

**Figure 3.** System share.

High Performance Technology in Algorithmic Cryptography

http://dx.doi.org/10.5772/intechopen.75959

This systematic development consists in the use of concurrent engineering for the design of component assembly, as shown in **Figure 4**.

To date, these techniques have been studied in advanced areas of computer science: the development of real-time systems, pattern recognition, semi-intelligent agents, computer security (cryptography and cryptanalysis), knowledge-based information systems, and signal processing are some of the cases in which this challenge can be appreciated, in Ref. [16].

Performance tests applied to cryptographic algorithms are abductive (statistical) and stress-based.


**Table 3.** Continents.

| Continent | Performance (%) |
| --- | --- |
| Other | 0.70 |
| Americas | 30.20 |
| Asia | 48.70 |
| Europe | 20.40 |



**Figure 4.** Petri net, model of concurrent active object.

## **3. Cryptographic algorithms**

It can be implemented at any time; its operation involves calculating the keys from two prime numbers.

The company Microchip announced a microcontroller able to operate with most cryptographic algorithms; however, it is now worth asking: what exactly is it intended to secure?

## **3.1. Cryptographic algorithms in IoT**

The operating system, being responsible for the management and control of all computer hardware resources, must secure a multi-user and multi-programming environment whose primary goals include preventing any unauthorized access to data stored within the system, safeguarding users from undesirable results triggered by their own actions, ensuring that user programs do not interfere with one another, and allowing different users different rights and abilities to cooperate with each other. The issues involved are critical and far-reaching; for this reason, they are of vital importance both for the system's own sake and for that of its applications, in Ref. [17].

Achieving security in operating systems depends on the security goals one has. These goals will typically include goals related to confidentiality, integrity, and availability. In any given system, the more detailed particulars of these security goals vary, which implies that different systems will have different security policies intended to help them meet their specific security goals, in reference [18].

Network security threats are also an operating system responsibility since a computer network is usually a rather large, single system which has been created from numerous individual computer systems. The data processing functions must now be distributed among a set of distinct systems thus decentralizing the control of data storage and processing. In addition, information which must be transmitted between the various computers within the network is subject to exposure. Forged user identification and unauthorized access to stored data by legitimate users are also problems which plague a multi-user, multi-resource environment. Consequently, these factors combine to complicate the problem of ensuring a high degree of security within the network and may present formidable pitfalls, in Ref. [17].

A good cryptographic system must have several characteristics: small variations in plaintexts should produce large variations in ciphertexts; the sizes of plaintexts and ciphertexts should be comparable; ciphertexts must be computed efficiently from plaintexts; and the relationship between plaintexts and ciphertexts should be unpredictable. A bad cryptographic system is characterized by the opposite: the relationship between plaintexts and ciphertexts appears random but in reality is not; it is susceptible to elementary cryptanalysis; the computation of ciphertexts is inefficient in time and space; and it is vulnerable to its own manufacturers, in Ref. [19].
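The first characteristic, small plaintext variations producing large ciphertext variations, is the avalanche effect, and it can be checked empirically. The sketch below (our own illustration, using SHA-256 as the primitive) flips one plaintext bit and counts how many digest bits change; a good primitive changes about half of the 256 bits:

```python
import hashlib

def avalanche(msg: bytes, bit: int = 0) -> int:
    """Flip one bit of the message and count how many bits of the
    SHA-256 digest change (the avalanche effect)."""
    flipped = bytearray(msg)
    flipped[bit // 8] ^= 1 << (bit % 8)       # flip a single input bit
    a = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    b = int.from_bytes(hashlib.sha256(bytes(flipped)).digest(), "big")
    return bin(a ^ b).count("1")              # differing bits out of 256

changed = avalanche(b"attack at dawn")
print(f"{changed} of 256 digest bits changed")
```

A cipher or hash for which this count stays far from 128 on average would fail the unpredictability requirement described above.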

In the past, protection against the monitoring of communication lines was guaranteed by the use of physically secure lines. This technique, however, proved to be extremely expensive and often impractical. Since then, data encryption has been found to be a viable alternative to secure lines, in Ref. [17]. Cryptographic technology is, therefore, a relatively inexpensive and highly effective process by which sensitive data may be protected against disclosure, in Ref. [17].


**Table 5.** Performance metric.

• Load balance

• Runtime

• Communication volume

• Computing time

• Minimum normalized cut

• Connectors to eigenvectors

• Clustering by k-means

• Fiedler vector or spectral clustering

• Fourier modal analysis

• Restrictions in two dimensions

• Metrics for multidimensional optimization (cluster vs. supercomputer)


Android includes cryptography to protect information; many of the applications installed on a cell phone already have cryptographic features to protect the information the phone holds. The applications on a phone, both those installed at the factory and those installed after purchase, are handled in layers, where each layer uses the services offered by the layers below and offers its own services to the layers above. All applications use the services, application interfaces and libraries of the previous layers. This structure facilitates the development of cryptographic applications on this operating system, but it has the disadvantage that the execution of cryptographic algorithms becomes more expensive as their complexity and strength increase, in Ref. [20]. Linux has a kernel crypto API that offers a rich set of cryptographic ciphers as well as other data transformation mechanisms and methods to invoke them. The kernel crypto API serves the following entity types: (a) consumers requesting cryptographic services and (b) data transformation implementations (typically ciphers) that can be called by consumers using the kernel crypto API, in Ref. [21].
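As a user-space analogy to this consumer/implementation split, the sketch below requests transformations by name through Python's hashlib and hmac layers; the helper functions are our own illustration, not part of any kernel API:

```python
import hashlib
import hmac

# A consumer requests a transformation by name; the library resolves
# the implementation -- the same consumer/implementation split the
# kernel crypto API exposes, here through Python's hashlib/hmac layer.
def digest(name: str, data: bytes) -> str:
    return hashlib.new(name, data).hexdigest()

def authenticate(key: bytes, data: bytes) -> str:
    # Keyed transformation: HMAC over SHA-256
    return hmac.new(key, data, hashlib.sha256).hexdigest()

msg = b"sensitive payload"
print(digest("sha256", msg))
print(authenticate(b"secret-key", msg))
```

The consumer never touches the cipher internals; swapping `"sha256"` for another registered algorithm name changes the implementation without changing the calling code.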

Apple Inc. has a variety of cryptographic technologies across all of its products.

*App Transport Security (ATS)*. ATS establishes best-practice policies for secure network communications on Apple platforms, employing current Transport Layer Security (TLS) versions.

## **3.2. Forward secrecy and strong cryptography**

*Secure Transport API*. Use Apple's secure transport API to employ current versions of the Secure Sockets Layer (SSL), Transport Layer Security (TLS), and Datagram Transport Layer Security (DTLS) cryptographic protocols for network communications.

*Supported Algorithms*. With iOS 10 and macOS v10.12, the RC4 cipher suite is now disabled by default. In addition, Apple recommends that you upgrade your servers to use certificates signed with the SHA-2 cryptographic function.

*Cryptographic Signing*. If distributing your Mac app outside of the Mac App Store, use cryptographic signing with Developer ID to certify that your app is genuine.

*CryptoTokenKit for Smart Card Support*. The CryptoTokenKit framework provides first-class access for working with smart cards and other cryptographic devices in macOS, in reference [22].


**Figure 5.** Histogram of platforms.

**Figure 6.** JFlap.

Microsoft shows on its website the Cryptography API: Next Generation (CNG), the long-term replacement for the CryptoAPI. CNG is designed to be extensible at many levels and cryptography-agnostic in behavior. Some of its features are: cryptographic agility, certification and compliance, Suite B support, legacy support, kernel mode support, auditing, and replaceable random number generators, in Ref. [23].

According to the list published by top500.org, the best platforms in HPC are presented below (e.g. **Figure 5**).

## **3.3. Public key cryptography**

The data monitored by telemetric systems arises from information related to greenhouses; these are perhaps only data supporting one part of a process in agronomy, but they are important for its better interpretation.

JFLAP is software for experimenting with formal languages topics, including nondeterministic finite automata, nondeterministic pushdown automata, multi-tape Turing machines, several types of grammars, parsing, and L-systems, developed by Duke University, in Ref. [24] (e.g. **Figure 6**).


The type of application that generally runs on sophisticated equipment such as a supercomputer or cluster is one that presents complexity in its development, consumes massive storage capacity and requires speeds appropriate to its mode of operation. While social networks currently involve more data consumption, it is important to recognize that so do the applications that scientific researchers generate in searching for new solutions to real-world problems, or to problems analogous to their area of expertise. The aim of the following scheme is precisely to make known part of the advances that we have made as a research team. We do not present all theories or investigations; it is just a


preamble of what these systems are capable of doing. The use of state machines is important in the design of logical machines because it allows the logic of their operation to be deduced. That is to say, what follows is a simple machine that accepts strings of 0s and 1s, such as binary representations of prime numbers.
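Since the set of binary encodings of prime numbers is not a regular language, a deterministic finite automaton can only illustrate the idea. The sketch below implements a simple machine over {0, 1} that accepts strings whose binary value is divisible by 3, an illustrative choice of our own rather than the machine from the figure:

```python
# DFA over {0, 1} accepting binary strings whose value is divisible
# by 3. The state is the value (mod 3) of the bits read so far;
# reading a bit b takes state s to (2*s + b) mod 3.
TRANSITIONS = {
    0: {"0": 0, "1": 1},
    1: {"0": 2, "1": 0},
    2: {"0": 1, "1": 2},
}
START, ACCEPTING = 0, {0}

def accepts(word: str) -> bool:
    state = START
    for symbol in word:
        state = TRANSITIONS[state][symbol]
    return state in ACCEPTING

for w in ["0", "11", "110", "101"]:
    print(w, accepts(w))
```

For example, "110" (six) is accepted and "101" (five) is rejected; a tool such as JFLAP lets the same machine be drawn and simulated graphically.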

For the generation of keys, it is necessary to design an algorithm for the generation of random numbers; below is an algorithm that supports the generation of pseudorandom numbers, in Ref. [25].

It is shown in notation with the Henon map, a chaotic map employed in dynamic systems for discrete processes (e.g. **Figure 7**).
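A minimal sketch of such a generator follows, iterating the Henon map with the classical chaotic parameters a = 1.4 and b = 0.3 and thresholding the x-coordinate to produce bits; the initial conditions are illustrative, and a generator of this kind is not cryptographically vetted on its own:

```python
def henon_bits(n, x=0.1, y=0.3, a=1.4, b=0.3):
    """Generate n pseudorandom bits from the Henon map
    x' = 1 - a*x^2 + y,  y' = b*x
    by thresholding the chaotic x-coordinate at zero."""
    bits = []
    for _ in range(n):
        x, y = 1.0 - a * x * x + y, b * x
        bits.append(1 if x > 0 else 0)
    return bits

print("".join(map(str, henon_bits(32))))
```

Because the map is deterministic, the seed (x, y) plays the role of the key: the same seed reproduces the same bit stream.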

*3.3.1. Encryption in one dimension (1D) flow*

The RSA algorithm was used to test the encryption of text [26] (e.g. **Figure 8**).
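A textbook sketch of RSA applied to text is shown below, using the small primes p = 61 and q = 53 for readability; a real deployment requires large primes, padding such as OAEP, and a vetted library:

```python
from math import gcd

# Textbook RSA on small primes -- an illustrative sketch only.
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)      # n = 3233, phi = 3120
e = 17                                  # public exponent, coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)                     # private exponent (Python 3.8+)

def encrypt(text):
    # Encrypt each character's code point: c = m^e mod n
    return [pow(ord(ch), e, n) for ch in text]

def decrypt(blocks):
    # Recover each code point: m = c^d mod n
    return "".join(chr(pow(b, d, n)) for b in blocks)

cipher = encrypt("HPC")
print(cipher)
print(decrypt(cipher))   # HPC
```

The keys are indeed "calculated from two prime numbers", as described above: n = p*q is public, while d can only be computed by whoever knows the factorization.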

*3.3.2. Encryption in two dimensions (2D) flow*

The tent map, like various chaotic maps, serves as a pseudorandom sequence generator, in Ref. [27] (e.g. **Figures 9** and **10**).
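The tent map generator can be sketched in a few lines; the parameter mu is kept just below 2 because with exact binary floating point mu = 2 makes orbits collapse to zero, and the seed value is illustrative:

```python
def tent_sequence(n, x=0.374, mu=1.99):
    """Pseudorandom sequence from the tent map
    x' = mu*x        if x < 0.5
    x' = mu*(1 - x)  otherwise
    with mu just below 2 to keep floating-point orbits alive."""
    seq = []
    for _ in range(n):
        x = mu * x if x < 0.5 else mu * (1.0 - x)
        seq.append(x)
    return seq

# Threshold the real-valued orbit to obtain a bit stream
bits = [1 if v > 0.5 else 0 for v in tent_sequence(32)]
print("".join(map(str, bits)))
```

For 2D image encryption, two such sequences (or one sequence reshaped) can index row and column permutations, which is how chaotic maps are typically applied to pixels.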

**Figure 7.** Pseudorandom by Henon map.

**Figure 8.** Algorithm by RSA.


These algorithms, applied to images, allow us to know the range of operations necessary to work together; however, it is important to emphasize the point at which the integration of the cryptographic algorithms must be adapted to the process involved.

*3.3.3. Encryption in 3D flow*

The reading of information generated by stereo vision allows hard acquisition data to be obtained as the focal lens adjusts. Real-time tracking of systems using the Census transform is proposed; it is left for future work.

Real-time stereo vision with a modified Census transform and fast tracking in FPGA, in Ref. [28].



**Figure 9.** Algorithm by tent map.

**Figure 10.** Algorithm by tent map.

## **4. Conclusions**

## **4.1. Operating systems and cryptography**

Security is a priority, since operating systems are responsible for managing a multi-user and multi-programming environment. The complexity of these tasks involves goals of preventing unauthorized access, safeguarding users from undesirable results of their own actions, and managing rights and abilities to cooperate; they also imply confidentiality, integrity and availability. Another thing to take into account is network security, which is responsible for protecting the information that is exposed when it is transmitted between various computers. These problems have been tackled through hardware solutions; however, such solutions have proved to be expensive and often impractical. Data encryption is a viable alternative for securing information, because cryptographic technology is a relatively inexpensive and highly effective process for protecting sensitive data against disclosure. By default, popular operating systems like Android, Linux, macOS and Microsoft Windows include cryptographic technologies for enhancing their security and protecting sensitive data; some of them even share their APIs with the objective of providing tools for reducing risks related to data security.

#### **4.2. Operating systems**


118 Artificial Intelligence - Emerging Trends and Applications

Countries around the world, such as the United Kingdom, Australia, and the United States of America, are making considerable efforts directed specifically at cyber security infrastructure. Most of them have adopted international security standards, among which ISO/IEC 27032:2012, ISO/IEC 27001, ISO 22301, ISO/IEC 15408, ISO 27799, and ISO/IEC 27010 stand out, and they also have their own internal standards and specialized organizations, as is the case of the United Kingdom with the Network Interoperability Consultative Committee (NICC) and the Information Assurance Maturity Model (IAMM).

In the case of Mexico, the government has not yet adopted international standards, and the legislation is distributed across different laws and norms because the way in which a crime is prosecuted depends on its nature. Offenses are regulated separately in laws on electronic commerce, electronic signatures, personal data protection, the right to information, revelation of secrets, and industrial property, among others. This weakens people, companies, and the government itself when they are the victims of an attack that targets the security of their data.

It is necessary to prepare the relevant technology so that the cryptographic algorithms that work with Linux run on the cluster infrastructure, in order to make the most of the benefits of complete information processing. While logical security remains necessary, what is left is to integrate these technologies into schemes at the node level of the network, as in the case of the Internet of Things (IoT), thus ensuring that all kinds of information, such as multimedia, can be handled.

High Performance Technology in Algorithmic Cryptography. http://dx.doi.org/10.5772/intechopen.75959

## **Author details**

Arturo Lezama-León<sup>1</sup>\*, José Juan Zarate-Corona<sup>1</sup>, Evangelina Lezama-León<sup>2</sup>, José Angel Montes-Olguín<sup>3</sup>, Juan Ángel Rosales-Alba<sup>3</sup> and Ma. de la Luz Carrillo-González<sup>3</sup>

\*Address all correspondence to: lezama@upp.edu.mx

1 Mathematics and Technology Sciences, Polytechnic University of Pachuca, Zempoala, Hidalgo, Mexico

2 Strategic Planning and Technology Management, Popular Autonomous University of the State of Puebla, Mexico

3 High Technological Institute Zacatecas Norte, Mexico

## **References**

[1] Turing AM. Computing Machinery and Intelligence. Mind. 1950;**LIX**(236):433-460. DOI: 10.1093/mind/LIX.236.433 [Accessed: January 1, 2018]

[2] Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 2nd ed. Pearson Education Inc., publishing as Prentice Hall; 2004. 1212 p. ISBN: 0-13-790395-2 [Accessed: January 2, 2018]

[3] Rich E, Knight K, Nair SB. Artificial Intelligence. 3rd ed. McGraw Hill. ISBN-13: 978-0-07-0088770 [Accessed: January 3, 2018]

[4] Silva M. Las redes de Petri en la automática y la informática [Petri Nets in Automation and Computer Science]. AC; 1985. ISBN: 8472880451 [Accessed: January 4, 2018]

[5] KPMG International. Cyber Security Standards Compliance: A Vital Measure to Critical Infrastructure Protection. Printed in Malaysia. 2015 [Accessed: January 4, 2018]

[6] World Economic Forum. Insight Report: Global Risks. 9th ed. 2014. ISBN-13: 92-95044-60-6. https://www.weforum.org/risks [Accessed: January 4, 2018]

[7] Torres & José. s.f. [Accessed: January 4, 2018]

[8] United Nations. 2016 [Accessed: January 4, 2018]

[9] California Department of Technology. 2014 [Accessed: January 4, 2018]

[10] https://www.iso.org/obp/ui/#iso:std:iso-iec:27001:ed-2:v1:en [Accessed: January 4, 2018]

[11] https://www.iso.org/isoiec-27001-information-security.html [Accessed: January 4, 2018]

[12] https://www.nist.gov/cyberframework [Accessed: January 4, 2018]

[13] https://www.nist.gov/sites/default/files/documents/2017/12/05/draft-2\_framework-v1-1\_without-markup.pdf [Accessed: January 5, 2018]

[14] https://nccoe.nist.gov/ [Accessed: January 6, 2018]

[15] top500.org, published November 2017 [Accessed: January 7, 2018]

[16] Fúster A, Hernández L, Martín A, Montoya F, Muñoz J. Criptografía, protección de datos y aplicaciones [Cryptography, Data Protection and Applications]. ALFAOMEGA; 2013. ISBN: 978-607-707-469-4 [Accessed: January 7, 2018]

[17] Painchaud M. Cryptography and its application to operating system security [thesis]. Rochester Institute of Technology; 1981 [Accessed: January 8, 2018]

[18] Arpaci-Dusseau RH, Arpaci-Dusseau AC. Operating Systems: Three Easy Pieces. 2015 [Accessed: January 8, 2018]

[19] Morales Luna G. Criptografía: Seguridad en la información [Cryptography: Information Security]. CINVESTAV-IPN. 2006 [Accessed: January 9, 2018]

[20] Núñez R. El uso de la criptografía en el sistema operativo Android [The Use of Cryptography in the Android Operating System]. 2016 [Accessed: January 9, 2018]

[21] The Linux Kernel. Retrieved January 5, 2017, from The Linux Kernel: https://www.kernel.org/doc/html/v4.12/crypto/intro.html#introduction [Accessed: January 10, 2018]

[22] Apple Inc. Security. Retrieved January 18, 2018, from Apple Developer: https://developer.apple.com/security/. 2018 [Accessed: January 11, 2018]

[23] Microsoft. CNG Features. Retrieved January 18, 2018, from Microsoft: https://msdn.microsoft.com/es-es/library/windows/desktop/bb204775(v=vs.85).aspx. 2018 [Accessed: January 12, 2018]

[24] Hopcroft JE, Motwani R, Ullman JD. Introduction to Automata Theory, Languages, and Computation. 3rd ed. Pearson Education, Inc., publishing as Addison-Wesley; 2007. ISBN: 978-84-7829-088-8 [Accessed: January 12, 2018]

[25] Lezama-León A, Liceaga-Ortíz-de-la-Peña JM, Zarate-Corona JJ. IoT Equipment Security. In: Mizera-Pietraszko J et al., editors. Advances in Digital Technologies. Frontiers in Artificial Intelligence and Applications (FAIA), Vol. 295. IOS Press; 2017. DOI: 10.3233/978-1-61499-773-3-126. ISSN: 1879-8314 [Accessed: January 13, 2018]

[26] González RC, Woods R, Eddins SL. Digital Image Processing Using MATLAB. Pearson Education, Inc., publishing as Prentice Hall; 2004. ISBN: 0-13-008519-7 [Accessed: January 14, 2018]

[27] Liceaga-Ortiz-De-La-Peña JM, Lezama-León A, Zarate-Corona JJ, Hernández-Terrazas RO (Polytechnic University of Pachuca). The tent map, like various chaotic maps, as pseudorandom sequence generator. SIMCI 2013, Simposio Ibero Americano Multidisciplinario de Ciencias e Ingenierías [Accessed: January 14, 2018]

[28] Pérez JMX, León AL, de la Peña JMLO, Hernández Terrazas RO. Real Time Stereo Vision with a Modified Census Transform in FPGA. In: Proceedings of the IEEE Electronics, Robotics and Automotive Mechanics Conference (CERMA); 2012. DOI: 10.1109/CERMA.2012.23 [Accessed: January 15, 2018]


**Chapter 6**

#### **A Deterministic Algorithm for Arabic Character Recognition Based on Letter Properties**

Evon Abu-Taieh, Auhood Alfaries, Nabeel Zanoon, Issam H. Al Hadid and Alia M. Abu-Tayeh

Additional information is available at the end of the chapter

DOI: 10.5772/intechopen.76944

#### **Abstract**

Handheld devices are flooding the market, and their use is becoming essential among people. Hence the need arises for fast and accurate character recognition methods that ease the data entry process for users. Many methods have been developed for handwritten character recognition, especially for Latin-based languages. Character recognition methods for the Arabic language, on the other hand, are lacking and rare. The Arabic language has many traits that differentiate it from other languages: first, it is written from right to left; second, a letter changes shape according to its position in the word; and third, the writing is cursive. Such traits call for a special character recognition method that helps in producing applications for the Arabic language. This research proposes a deterministic algorithm that recognizes the letters of the Arabic alphabet. The algorithm is based on four categorizations of the Arabic alphabet. The research then suggests a deterministic algorithm composed of 34 rules that can predict the character based on the use of all the categorizations as attributes assembled in a matrix for this purpose.

**Keywords:** conditional random field, rule-based model, word prediction, virtual keyboard, Arabic text entry, enhancement, text entry system, theory of randomized search heuristics, structured prediction, theory of computation

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **1. Introduction**

Arabic language is one of the top five languages spoken in the world. Arabic is used by more than 422 million native and non-native speakers in the world. Also, the letters of the Arabic alphabet are used in other languages like Urdu (65 million natives and 94 million

non-native speakers) and Persian (110 million). In addition, languages like Baluchi, Brahui, Pashto, Central Kurdish, Sindhi, Kashmiri, Punjabi, and Uyghur use the Arabic letters. Hence, there is a need to develop a character recognition algorithm for the Arabic language. Yet major challenges arise: first, Arabic is a cursive language; unlike other written languages, Arabic letters change shape as they are written, so separate letters in Arabic usually form a sub-word rather than a stand-alone word. Second, Arabic is written from right to left, unlike Latin languages. Third, Arabic has 28 letters, with some letters changing shape based on their location in the word. Also, some letters are very similar in form yet have secondary marks to differentiate them. Due to all the mentioned reasons, Arabic character recognition systems are underdeveloped and lacking.


This research is composed of five sections. The first section presents 20 related works. Then, the research explains the letter shapes in the Arabic language and the four categorizations used in the proposed algorithm. The four categorization methods will be employed to develop a deterministic categorization algorithm. The first categorization method depends on the number of dots used with each letter. The second depends on the shape of the letter, with a classification of the letters. The third is based on the shape of the letter as used in the beginning, middle, and end of the word. The fourth relies on the proportion method, a method used in Arabic calligraphy that is based on the rhombic dot. The research then suggests a deterministic algorithm composed of 34 rules that can predict the character based on the use of all the categorizations as attributes assembled in a matrix for this purpose.

## **2. Related work**

Character recognition is an open-ended problem: a computer cannot natively recognize the characters of a language's alphabet. There has been great progress in character recognition, as can be seen in the different smart applications used on smartphones and tablets as well as notebooks and PCs. The problems that arise with non-Latin languages are well known, and many researchers have conducted research for their respective languages: Hindi [1], Chinese [2–4], and Arabic [5, 6]. Furthermore, many researchers have studied the Arabic alphabet: Parvez and Mahmoud [7] conducted a survey of text recognition and published their work in a paper titled *Offline Arabic Handwritten Text Recognition: A Survey*. Three researchers [8] published their work in a paper titled *Robust Named Entity Detection Using an Arabic Offline Handwriting Recognition System*; the paper focused "on extraction of a predefined set of Arabic named entities (NEs) in Arabic handwritten text." Another study, by Fouad Slimane, Slim Kanoun, Adel M. Alimi, Jean Hennebert, and Rolf Ingold, was conducted in 2010, yet it concentrated on printed characters rather than handwritten ones. The research conducted by Fard, Moghadam, Bidgoli, and Hussain [9] was very promising and was conducted on the Persian language, yet the method used was neural network based, which is not deterministic. Another study, by Abu-Taieh [10], used an enhanced neural network method. Another promising study, by Aljarrah et al. [11], concentrated on printed Arabic rather than handwriting in order to produce an Arabic optical character recognition system. The need for Arabic character recognition is evident according to Ali and Sagheer [12], who address the spread of smart mobile phones and tablets and hence the need for Arabic character recognition.

Researchers Supriana and Nasution [13] cited nine works, including their own research, all of which are non-deterministic. Sarfraz, Ahmed, and Ghazi [14] developed a license plate recognition system. Izakian, Monadjemi, Ladani, and Zamanifar [15] used chain codes, while Abandah, Khedher, and Mohammed [16] used selected feature extraction techniques. Al-Taani and Al-Haj [17] used structural features, while Kapogiannopoulos and Kalouptsidis [18] used skew angle. Zidouri [19] proposed a general method for Arabic letter segmentation, while Amin [20] used global features and a decision tree technique on printed, not handwritten, letters. Cowell and Hussain [21] used feature extraction.

## **3. Letters in the Arabic language**


To develop the proposed algorithm, the researchers studied and presented the different categorizations of Arabic letters; each categorization is explained in turn below. The first categorization method depends on the number of dots used with each letter. The second depends on the shape of the letter, with a classification of the letters. The third is based on the shape of the letter as used in the beginning, middle, and end of the word. The fourth relies on the proportion method, a method used in Arabic calligraphy that is based on the rhombic dot. Each categorization will be explained in Sections 3.1, 3.2, 3.3, and 3.4.

## **3.1. First categorization: number of dots in the letter**

The use of dots to distinguish letters in Latin-based languages is familiar to people. In English, the small letters i and j are distinguished by a dot on top of the letter. In Arabic the dot is used extensively; in fact, only 12 of the 28 letters are not dotted. Furthermore, some letters use one, two, or three dots. Next, the concept of dotted letters will be explained.

The first categorization is according to the number of dots used with each letter. This categorization splits the 28 letters (**Table 1**) into five branches, and from within it two extra letters emerge. The first branch is composed of 12 letters that have no dots whatsoever. The second branch is composed of ten letters: eight have their dot above the body of the letter, and the other two have their dot below the letter body. The third branch is composed of four letters: three have their two dots above the body of the letter, and the other one has two dots below its body. The fourth branch has two letters with three dots above the body.


| Branch | Dots | Count | Letters |
|---|---|---|---|
| First | No dots | 12 | ح د ر س ص ط ع ل م ه و ا |
| Second | One dot | 10 | ب ج خ ز ذ ض ظ غ ف ن |
| Third | Two dots | 4 | ي ت ق ة |
| Fourth | Three dots | 2 | ث ش |
| Fifth | With hamza | 3 | ك أ وْ |

**Table 1.** Arabic alphabets (according to dots).

The fifth branch deals with the hamza: there is one basic letter where the hamza is part of the letter, "ك", while for the others the hamza is not part of the letter, as in "أ" and "وْ". The categorization is summarized in **Table 1**.
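The dot-based branches above can be expressed as a simple lookup. The following sketch transcribes the letter sets from Table 1 and the surrounding prose, so treat the exact memberships (in particular the hamza forms) as an assumption:

```python
# Dot-count branches of the Arabic letters, transcribed from Table 1.
# The hamza branch uses hamza-on-alif and hamza-on-waw forms here,
# which is an assumption about the chapter's intent.
BRANCHES = {
    1: set("حدرسصطعلمهوا"),  # no dots (12 letters)
    2: set("بجخذزضظغفن"),    # one dot (10 letters)
    3: set("تقية"),           # two dots (4 letters)
    4: set("ثش"),             # three dots (2 letters)
    5: {"ك", "أ", "ؤ"},       # hamza as part of, or above, the letter
}

def dot_branch(letter: str):
    """Return the dot-based branch number for a letter, or None."""
    for branch, letters in BRANCHES.items():
        if letter in letters:
            return branch
    return None

assert dot_branch("ب") == 2  # single dot
assert dot_branch("ش") == 4  # three dots above the body
assert dot_branch("ح") == 1  # no dots
```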

#### **3.2. Second categorization: letter shape**

The second categorization is according to the shape of the letter: it splits the 28 letters into 15 branches based on the body of the letter rather than the dots on it (see **Table 2**); however, some authors increase the number of shapes to 18 [22]. The first branch is made of four letters, all very similar in shape: two of them are differentiated by one dot (one above the letter and one below), and the other two by two dots and three dots above, respectively. The second branch has two letters very similar to each other: one can differentiate between them by the dot above one of them, while the other has no dot. The third and fourth branches follow the same idea, similar in shape with one dot making the difference, and the same happens with the fifth and sixth branches. The seventh branch has three letters similar in shape: one without a dot, one with a dot above it, and one with a dot below it. The eighth branch has two letters very similar in shape: one with one dot and the other with two dots. The ninth branch has two letters similar in shape: one with no dots and the other with three dots. The tenth has two letters: one with no dots and the other with two dots. The 11th branch has two letters: one without hamza and the other with a hamza shape above the body of the letter. The 12th, 13th, 14th, and 15th branches are not similar to each other nor to the rest of the letters.

One may add here a note about the shape of the letters: nine letters contain an enclosed space that resembles a circle. These nine letters are (م, و, ه, ف, ق, ط, ظ, ص, ض). The enclosed-circle property is an important aspect of these nine letters that will be used in the algorithm at a later stage.
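The enclosed-circle property reduces to a set membership test; a minimal sketch, with the nine letters transcribed from the paragraph above:

```python
# The nine letters whose body contains an enclosed, roughly circular
# space, transcribed from the text above.
ENCLOSED_CIRCLE = set("موهفقطظصض")

def has_enclosed_circle(letter: str) -> bool:
    """True if the letter's body contains an enclosed circle."""
    return letter in ENCLOSED_CIRCLE

assert len(ENCLOSED_CIRCLE) == 9
assert has_enclosed_circle("م")
assert not has_enclosed_circle("ب")
```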


| Branch | Count | Shape of letter |
|---|---|---|
| 1 | 4 | ب ت ث ن |
| 2 | 2 | ص ض |
| 3 | 2 | ط ظ |
| 4 | 2 | ع غ |
| 5 | 2 | د ذ |
| 6 | 2 | ر ز |
| 7 | 3 | ح خ ج |
| 8 | 2 | ف ق |
| 9 | 2 | س ش |
| 10 | 2 | ه ة |
| 11 | 2 | ل ك |
| 12 | 1 | م |
| 13 | 1 | ا |
| 14 | 1 | ي |
| 15 | 1 | و |

**Table 2.** Arabic alphabets (according to shape).


#### **3.3. Third categorization: letter location in a word**

The third categorization of the Arabic alphabet is based on the location of the letter in a word. Generally, the shapes of the Arabic letters change according to the position of the letter in the word itself (beginning, middle, end); some letters can be connected (to the letter succeeding or preceding), and others cannot. The shapes of the letters can be generated with ligatures or character overlaps [23, 24]. When discussing the letters that start a word, six letters must stand alone when falling at the beginning of a word; those letters are (ا, د, ذ, ر, ز, و),




The second categorization is according to shape of the letter: this categorization splits the 28 letters into 15 branches based on the body of the letter rather than the dots on the letter (see **Table 2**). However, some increase the number of shapes to 18 shapes [22]. The first branch is made of four letters all very similar in shape: two of them are differentiated by one dot (one above the letter and below the letter), and the other two (one has two dots above it and one has three dots above it). The second branch has two letters very similar to each other: one can differentiate between them by the dot above one, while the other one has no dot; furthermore, the third branch and the fourth branch have the same idea similar in shape, yet one dot makes a difference. The same happens with the fifth and sixth branches. The seventh branch has three letters that are similar in shape: one without dot, one with dot above it, and one with dot below it. The eighth branch has two letters that are very similar in shape: one with one dot and the other with two dots. The ninth branch has two letters similar in shape: one with no dots and the other with three dots. The tenth has two letters: one with no dots and other with two dots. The 11th branch has two letters: one with no hamza and the other with hamza shape above the body of the letter. The 12th, 13th, 14, and 15th branches are not similar to each

One may add here a note about the shape of the letters: nine letters contain, as part of their body, an enclosed space that resembles a circle. These nine letters are (م، و، ه، ف، ق، ط، ظ، ص، ض). The enclosed-circle property is an important aspect of these nine letters and will be used in the algorithm at a later stage.

The third categorization of the Arabic alphabet is based on the location of the letter in a word. Generally, the shapes of Arabic letters change according to the position of the letter in the word itself (beginning, middle, end); some letters can be connected (to the succeeding or preceding letter), and others cannot. The shapes of the letters can be generated with ligatures or character overlaps [23, 24]. Six letters, when falling at the beginning of a word, must stand alone; those letters are (ا، د، ذ، ر، ز، و),


**3.2. Second categorization: letter shape**

**Table 1.** Arabic alphabets (according to dots).
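The dot-based grouping of Table 1 can be sketched as a simple lookup; the letter sets below are transcribed from the table, while the dictionary and function names are my own illustration:

```python
# Sketch of the first categorization (dots), following Table 1.
# Letter sets are transcribed from the table; names are illustrative.
DOT_BRANCHES = {
    0: set("حدرسصطعلمهوا"),  # first branch: no dots (12 letters)
    1: set("بجخزذضظغفن"),    # second branch: one dot (10 letters)
    2: set("يتقة"),           # third branch: two dots
    3: set("ثش"),             # fourth branch: three dots
}

def dot_branch(letter: str) -> int:
    """Return the number of dots associated with a letter, per Table 1."""
    for dots, letters in DOT_BRANCHES.items():
        if letter in letters:
            return dots
    raise ValueError(f"letter {letter!r} not in the dot table")
```

The hamza branch is left out here because its three letters overlap with the dot branches; it is handled as a separate attribute later in the chapter.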

126 Artificial Intelligence - Emerging Trends and Applications


**3.3. Third categorization: letter location in a word**


and the rest of the letters do change form, as seen in **Table 3**. When the same six letters appear in the middle or at the end of a word, they are only connected to the preceding letter and do not change form. All letters used at the end of a word have two states: connected and stand-alone.

From the previous discussion, one can notice that six letters have special characteristics, namely (ا، و، د، ذ، ر، ز). When used at the beginning of a word, these characters must stand alone, and when they end a group of characters, they must be followed by an independent character. Hence, they connect only to the predecessor, not the successor.
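The behavior of these six non-connecting letters reduces to a small shaping rule; a minimal sketch, in which the joining decision is just set membership (the function name is an assumption, not the authors' notation):

```python
# The six letters that connect only to their predecessor, never to a successor.
NON_CONNECTORS = set("اودذرز")

def connects_forward(letter: str) -> bool:
    """True if `letter` may join to the following letter in a word."""
    return letter not in NON_CONNECTORS

# In word shaping, a letter takes a connected form toward the next letter
# only when it joins forward, i.e. when it is not one of the six above.
```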



• The width of the *alif* (the rhombic dot), which is the square impression formed by pressing the tip of the calligrapher's reed pen to paper (see **Figures 1** and **3**).

• An imaginary circle with the *alif* as its diameter, within which all Arabic letters could fit and be written (see **Figure 2**).


• The height of the *alif*, which is a straight, vertical stroke of 3–12 rhombic dots.


**Table 4.** Matrix of the different combinations for all 28 letters.


**Table 3.** Arabic alphabets: stand alone, beginning, middle, and end of a word.

The matrix, seen in **Table 4**, represents the different combinations between all 28 letters. The first column of the matrix holds the letter coming first in the pair, and the first row holds the letter coming second. Each cell shows the shapes of the two letters and how they change as the order differs. The highlighted letters are the previously mentioned six letters (ا، و، د، ذ، ر، ز), which stand alone if they appear at the beginning of the word. When these letters appear consecutively within a word, both are written as stand-alone, independent letters.
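A matrix like Table 4 can be generated programmatically from the non-connector rule; a sketch under the assumption that, for an ordered pair, the first letter is connected exactly when it is not one of the six special letters (form labels and helper names are mine):

```python
# Sketch: generating the pairwise combination matrix of Table 4.
NON_CONNECTORS = set("اودذرز")
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"  # the 28 letters

def pair_form(first: str, second: str) -> tuple:
    """Forms taken when `first` is immediately followed by `second`."""
    if first in NON_CONNECTORS:
        # `first` stands alone, so `second` starts a new letter group.
        return ("stand-alone", "initial")
    return ("connected", "following")

# One cell per ordered pair, 28 x 28 = 784 cells as in Table 4.
matrix = {(a, b): pair_form(a, b) for a in ALPHABET for b in ALPHABET}
```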

#### **3.4. Fourth categorization: letter proportion**

To keep letters proportional to each other, two ways were used by calligraphers: rhombic dot and circles. Arabic calligraphy was used in mosques and castells as decoration since Islam forbids pictures and statues [22]. Hence, there is a need to decorate with words. Proportion is an essential part of the written word. The circle proportion was suggested by "Ibn Mugla," a well-known calligrapher from the eleventh century [25]. Three elements are the bases of proportion in Arabic calligraphy [26, 27]:


The circle is halved vertically and horizontally, with diameter equals the height of the first letter in Arabic alphabets called *alif*. Looking back at **Table 2** and **Figure 4** that represent the shape and form of the letter, the first branch, the shape of the letter, is in the lower half of the circle. The second branch, according to the circle, takes up the first quarter and the third



**Figure 1.** Example of measuring the letter by using rhomboid dots [28].

**Figure 2.** Example of measuring the letter by using circle [28].

One can conclude by studying the second categorization and the proportion categorization through the following:


• First, the second branch and the ninth branch (four letters) take the same area of the circle.

• Second, the third branch and the tenth branch (four letters) both use the first quarter of the circle.

• Third, the fourth branch and the seventh branch use the left edge of the circle; the differentiation between the two is that one letter is written from right to left and the other from left to right, as seen in **Figure 5**.

• Fourth, the fifth and sixth branches (four letters) use the fourth quarter of the circle.


**Figure 4.** Proportions in Arabic calligraphy.


**Figure 5.** Direction of writing with two circle edge letters.


**Figure 3.** The rhombic dot as a guide to proportions [22].

quarter. In the third branch, the letter falls in the first quarter of the circle, with the upper half-diameter aligned with half the *alif* of the letter. In the fourth branch, the letter lies on the left half of the circle. In the fifth branch, two letters are located in the fourth quarter of the circle. In the sixth branch, two letters also fall in the fourth quarter of the circle. In the seventh branch, the letter lies on the left half of the circle. In the eighth branch, two letters are parted between the first and second quarters of the circle, with a circular part above the horizontal diameter. The tenth branch takes the first quarter of the circle. The eleventh branch takes the second and third quarters of the circle. In the twelfth branch, the letter is at the center of the circle and uses the bottom half of the *alif*. The thirteenth branch is the *alif* itself, which is the diameter of the circle. The fourteenth branch takes the third quarter of the circle. The fifteenth branch takes the first and fourth quarters of the circle.
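The circle-region assignments above can be recorded as data; a partial sketch covering the branches named in the text (the region labels Q1–Q4, "left-half", "center", and "diameter" are my own encoding, and the ninth branch is omitted because it is not assigned a region in this paragraph):

```python
# Circle regions occupied by each shape branch, transcribed from the text.
# Q1..Q4 are the circle quarters; other labels mark halves and the diameter.
BRANCH_REGION = {
    1:  {"lower-half"},
    2:  {"Q1", "Q3"},
    3:  {"Q1"},
    4:  {"left-half"},
    5:  {"Q4"},
    6:  {"Q4"},
    7:  {"left-half"},
    8:  {"Q1", "Q2"},
    10: {"Q1"},
    11: {"Q2", "Q3"},
    12: {"center"},
    13: {"diameter"},  # the alif itself
    14: {"Q3"},
    15: {"Q1", "Q4"},
}

def same_region(a: int, b: int) -> bool:
    """True when two branches occupy the same region and therefore need an
    extra feature (dots, enclosed space, writing direction) to be told apart."""
    return BRANCH_REGION[a] == BRANCH_REGION[b]
```

This makes the later observations checkable: the fifth and sixth branches collide, as do the third and tenth, which is exactly where the algorithm falls back on other properties.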


These observations give insight for further classifying the letters and managing them into groups. The previous sections explained in detail the four categorizations used in the proposed algorithm. Each categorization was essential to the building blocks and rules of the algorithm.

#### **3.5. Findings of the four categorizations**

Based on the four categorizations explained above, a tree of rules can be built as seen in **Figure 6**. The rule tree has five branches: the first branch is for Arabic alphabets that contain *no dots*. The second branch is for letters with *one dot*. The third branch is for letters with *two dots*. The fourth branch includes all letters with *three dots*. The fifth branch is for letters with *hamza*.
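The five-way split at the root of the rule tree can be sketched as a dispatch on the letter's surface marks; a minimal sketch, with feature names of my own choosing rather than the authors' notation:

```python
def root_branch(has_hamza: bool, dot_count: int) -> str:
    """Top level of the rule tree (Figure 6): hamza first, then dot count."""
    if has_hamza:
        return "hamza"
    return {0: "no dots", 1: "one dot", 2: "two dots", 3: "three dots"}[dot_count]
```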

For the first branch, which includes 12 letters, the fourth-categorization logic was used to distinguish among the letters: each letter in this branch was located in the quarters of the circle suggested by the fourth categorization. Two letters, (س، ص), use the same quarters; both fall in the first and third quarters of the imaginary circle. Still, letter (ص) has an enclosed space, while letter (س) has none; hence, differentiating between the two depends on the enclosed-space property explained previously. The edge of the circle from the fourth categorization was used to differentiate the letters (ح، ع) from the remaining ten letters. Furthermore, to differentiate between these two letters, the direction of writing, explained previously in **Figure 5**, was used.

The second branch, consisting of all letters with *one dot*, includes ten letters. The branch splits further into *dot below* and *dot above* the body of the letter. Again, the imaginary circle from the fourth categorization was used: the location of the letters within the quarters of the imaginary circle is shown in **Figure 6**. The distinguishing feature of a letter falling on the edge of the imaginary circle is also used, as is the writing-direction property seen in **Figure 5**.

The third branch, consisting of all letters with *two dots*, splits further into *dots below* and *dots above* the body of the letter. There is only one letter in the alphabet that has two dots below it, (ي), and there are three letters with two dots above the letter body. To distinguish among them, the imaginary circle from the fourth categorization was used; again, none share the same quarters of the circle.

The fourth branch includes all letters with *three dots*. The branch contains only two letters, both with the dots above their bodies. Hence, the quarters of the imaginary circle were used to distinguish between them.

The fifth branch is the *hamza* (ء) branch, which includes three letters; the distinguishing features were the imaginary-circle quarters: letter (ك) is in the second and third quarters of the circle, letter (ؤ) is in the first and fourth quarters, and letter (أ) falls on the diameter of the circle.

The *hamza* (ء) can be seen on top of the letters (أ، ؤ، ك). The hamza is sometimes considered an independent letter when used in some words like (ءالع) and is used as part of the letter in other words. The hamza is a distinguishing character between the two letters (ك، ل); hence, it is treated as a semi-letter and is not listed in the alphabet.
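Taken together, the four categorizations reduce each letter to a small feature tuple; a sketch of one possible encoding (field names are mine, and the quarter values for س and ص follow the first-branch discussion in this section):

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class LetterFeatures:
    dots: int                  # 0..3 (first categorization)
    dots_below: bool           # dot position relative to the letter body
    quarters: FrozenSet[str]   # circle regions (fourth categorization)
    enclosed: bool             # enclosed-space property
    left_to_right: bool        # writing direction of the stroke
    hamza: bool                # hamza attached to the body

# Example from the text: س and ص share the same quarters, and only
# the enclosed-space flag separates them.
SEEN = LetterFeatures(0, False, frozenset({"Q1", "Q3"}), False, False, False)
SAD  = LetterFeatures(0, False, frozenset({"Q1", "Q3"}), True,  False, False)
```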

**Figure 6.** Categorization of the tree drawn based on the four categorizations.

A Deterministic Algorithm for Arabic Character Recognition Based on Letter Properties

http://dx.doi.org/10.5772/intechopen.76944

A Deterministic Algorithm for Arabic Character Recognition Based on Letter Properties http://dx.doi.org/10.5772/intechopen.76944 133


## **4. Suggested algorithm**

After studying all the previously mentioned categorizations, one can conclude that a deterministic algorithm can predict the character being drawn, based on the matrix in **Figure 7** together with the suggested algorithm in **Figure 8**, hence reducing the determination of a letter to 38 rules.

The suggested algorithm, shown in **Figure 8**, is composed of five major if-then statements, which are based on the first categorization explained above and summarized in **Figure 7**. The first if-then statement runs from line 1 to line 12 in **Figure 8**; it deals with all cases of letters that have no dots, whose locations in the circle are given by the proportion categorization. The enclosed-space property mentioned earlier is essential to distinguish letter "س" from letter "ص": both letters fall in the same location in the circle, yet the latter has an enclosed space. Also, notice that both "ح" and "ع" have the same properties; to differentiate them, the direction of writing is used [9].

The second major if-then statement starts at line 13 and deals with letters with one dot. As shown in **Figure 6**, the dot can be either above or below the body of the letter; e.g., both letters "ن" and "ب" fall in the lower part of the circle (quarters 3 and 4), yet the dot makes the differentiation: the latter has the dot below, as seen in lines 23 and 20 of **Figure 8**. Also, line 21 in the same figure deals with two letters that essentially fall in the same location and both have the dot above them; they are distinguished by the writing direction, left to right or right to left. The algorithm can be improved by eliminating line 24, since one statement suffices, hence reducing the number of rules to 37.

The third major if-then statement starts at line 25 and ends at line 31 in **Figure 8**. It deals with letters that have two dots, according to the first categorization and the matrix seen in **Figure 7**. The nested if-statement handles whether the two dots are above or below the letter. Three letters have two dots above them, yet their locations on the circle are very distinguishable; hence, the "enclosed space" attribute was not necessary. Furthermore, one can eliminate line 31, since it covers the only letter in the alphabet with two dots below it; still, for clarity, line 31 was left in the suggested algorithm. If line 31 were eliminated, the number of rules would be reduced to 36.

The fourth major if-then statement deals with letters that have three dots; there are only two of them, and both can be distinguished by their respective locations according to the proportion categorization. Again, line 34 can be eliminated but was left for clarity; if it were eliminated, the number of rules would be reduced to 35.

The last major if-then statement starts at line 35 and deals with the case of *hamza*. The hamza is an essential part of the letter "ك" and is used with other letters like "أ" and "ؤ." The three letters are distinguished by their locations within the circle according to the proportion categorization. Line 38 can be eliminated but was left to clarify the algorithm; eliminating it reduces the number of rules to 34.
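Because every feature combination maps to at most one letter, the nested if-then statements can equivalently be folded into a table-driven lookup; a sketch showing a handful of rules only, with feature tuples of my own encoding rather than the authors' exact Figure 8 listing:

```python
# A few of the rules as (dots, dot_below, region, enclosed, hamza) -> letter.
# Feature values are illustrative transcriptions, not the authors' matrix.
RULES = {
    (0, False, "Q1+Q3", False, False): "س",
    (0, False, "Q1+Q3", True,  False): "ص",  # enclosed space separates ص from س
    (1, True,  "Q3+Q4", False, False): "ب",  # dot below
    (1, False, "Q3+Q4", False, False): "ن",  # dot above, same region as ب
    (2, True,  "Q3+Q4", False, False): "ي",  # only letter with two dots below
}

def determine_character(features: tuple) -> str:
    """Deterministic lookup: each feature tuple matches at most one rule."""
    try:
        return RULES[features]
    except KeyError:
        raise ValueError("no rule matches the given features")
```

The rule eliminations discussed above (lines 24, 31, 34, and 38) correspond to dropping table entries that are implied by exhaustion, which is why the count falls from 38 to 34.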

**Figure 7.** The property rules to define each letter in the Arabic alphabets.





**Figure 8.** Determine\_Character (input:one\_character).

## **5. Conclusion**

The proposed algorithm stems from needs that are more apparent today. First, there is a rise in the use of handheld devices, whose character recognition methods serve mainly Latin-based languages. Arabic is one of the top five languages spoken in the world, used by more than 422 million native and non-native speakers. Arabic differs from other languages: it is cursive, written from right to left, and its letters change shape according to their position in the word. Hence, there is a dire need to develop a character recognition algorithm for Arabic. Many existing algorithms use artificial intelligence methods to recognize characters, which makes them non-deterministic, while the proposed algorithm is deterministic. This research presented four categorization methods employed to develop the deterministic algorithm. The first categorization depends on the number of dots used with each letter; the second depends on the shape of the letter; the third considers the shape of the letter as used in the beginning, middle, and end of a word; and the fourth relies on the proportion method used in Arabic calligraphy, which is based on the rhombic dot. The research then suggested a deterministic algorithm composed of 34 rules that can predict the character, using all of the categorizations as attributes assembled in a matrix for this purpose [29].

The proposed algorithm is only one piece of the whole puzzle; many parts remain to be developed. One major part is the input section of the algorithm, which must parse the word into segments from which the shape of the letters, the dots, and the hamza can be detected. This research will be a building block for further research and development.

## **Biography**


*Evon M. O. Abu-Taieh*, PhD, is an associate professor, an author or editor of four scholarly books, and a contributor to more than eight scholarly books. She has more than 40 published papers. She was previously the acting dean at the University of Jordan (Aqaba) for 3 years. Dr. Evon is an editorial board member of five renowned journals. She has more than 29 years of experience in education, computers, aviation, transport, AI, ciphering, routing algorithms, compression algorithms, multimedia, and simulation.

*Auhood Abdullah Alfaries*, PhD, is an assistant professor in the IT Department at King Saud University (KSU). Dr. Auhood received her PhD degree in Semantic Web and Web Services from the School of Computing and Information Systems, Brunel University, UK. She has held a number of IT-related academic and administrative positions both at KSU and at Princess Noura bint Abdulrahman University (PNU). She has experience in quality and program accreditation, having served in a number of quality-related roles since 2011. Auhood is associated with a number of important bodies: she is an associate of the UK Higher Education Academy, a member of the Institute of Electrical and Electronics Engineers (IEEE), and a member of the Saudi Computer Society. She is also an ABET program evaluator. She has participated as a conference and journal reviewer and as a member of a number of national and international workshop and conference program committees. She served as the vice dean and dean of the E-Learning and Distance Learning Deanship at KSU and then at PNU for 2 years, and also as the assistant general director and then the director of the General Directorate of Information and Communications Technology (ITC) at PNU. Currently, she is the dean of the College of Computer and Information Sciences. Auhood's research interests include the semantic web, ontology engineering, natural language processing, machine learning, and cloud computing. She is a member of the IWAN Research Group.


A Deterministic Algorithm for Arabic Character Recognition Based on Letter Properties

http://dx.doi.org/10.5772/intechopen.76944



*Dr. Nabeel Mohammed Zanoon* received his PhD degree in Computer Systems Engineering from the South-West State University, Kursk, Russia, in 2011. He has been a faculty member at Al-Balqa' Applied University since 2011, where he is currently an assistant professor and the head of the Department of Applied Sciences, as well as the director of the ICDL Computer Centre and the Cisco Academy branch of Aqaba University College. He has published research in several areas: security of e-banking, algorithm scheduling in grid and cloud computing, meta-grammars, hardware and architecture, fiber optics, and mobile ad hoc networks.

*Issam Hamad Al Hadid* is a lecturer at the University of Jordan. He completed his PhD degree at the University of Banking and Financial Sciences (Jordan) in 2010, obtained his MSc degree in Computer Science at Amman Arab University (Jordan) in 2005, and earned his BSc degree in Computer Science at Al-Zaytoonah University (Jordan) in 2002. He has published many research papers in different fields of science in refereed journals and international conference proceedings. His research focuses on self-healing architecture; his research interests also include AI, knowledge-based systems, security systems, compression techniques and algorithms, and information retrieval.

*Alia Abu-Tayeh* earned her PhD degree in 1995. She is a lecturer at the University of Jordan (Aqaba) and was formerly a lecturer at King Hussein University. She has published many scientific articles in renowned journals. Her interests range from linguistics to applied mathematics in computers and languages.

## **Author details**

Evon Abu-Taieh1\*, Auhood Alfaries2, Nabeel Zanoon3, Issam H. Al Hadid1 and Alia M. Abu-Tayeh1

\*Address all correspondence to: abutaieh@gmail.com

1 The University of Jordan, Aqaba, Jordan

2 King Saud University, Riyadh, Saudi Arabia

3 Al-Balqa' Applied University, Aqaba, Jordan




**Chapter 7**

**Human-AI Synergy in Creativity and Innovation**

Tony McCaffrey

DOI: 10.5772/intechopen.75310

Additional information is available at the end of the chapter

#### **Abstract**

In order to maximize creative behavior, humans and computers need to collaborate in a manner that will leverage the strengths of both. A 2017 mathematical proof shows two limits to how innovative a computer can be. Humans can help counteract these demonstrated limits. Humans possess many mental blind spots to innovating (e.g., *functional fixedness, design fixation, analogy blindness*, etc.), and particular algorithms can help counteract these shortcomings. Further, since humans produce the corpora used by AI technology, human blind spots to innovation are implicit within the text processed by AI technology. Known algorithms that query humans in particular ways can effectively counter these text-based blind spots. Working together, a human-computer partnership can achieve higher degrees of innovation than either working alone. To become an effective partnership, however, a special interface is needed that is both human- and computer-friendly. This interface, called *BrainSwarming*, possesses a linguistic component, which is a formal grammar that is also natural for humans to use, and a visual component that is easily represented by standard data structures. Further, the interface breaks down innovative problem solving into its essential components: a goal, sub-goals, resources, features, interactions, and effects. The resulting human-AI synergy has the potential to achieve innovative breakthroughs that either partner working alone may never achieve.

**Keywords:** creativity, innovation, human-computer interface, artificial intelligence, intelligence augmentation

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

Recent critiques of IBM Watson in the business world (*Forbes* and *Fortune*) and technical world (*MIT Technical Review* and *Wired*) suggest concerns about Watson's abilities and potential [1–5]. One critique is based on Watson's inability to draw conclusions beyond the corpus it has been trained on. Another critique is that it cannot make connections, or draw analogies, between different domains of data—such as between oncology and heart disease [1]. Yet another critique argues that AI, including IBM Watson, is still several breakthroughs away from being really intelligent, but this critique does not specify the particulars of these needed breakthroughs [5]. Our theory is that IBM Watson is exhibiting the limitations demonstrated in a 2017 mathematical proof of the limits to a computational approach to innovation and analogical thought [6]. No matter how many academic journal articles Watson processes about its topic of choice, it cannot overcome the proven limits. Further, human blind spots, such as *functional fixedness*, are implicit within the human-produced text and data used by Watson. Unless IBM Watson and other AI technologies face these limits and address them properly, they will continue to experience frustration over the uneven results that these technologies produce.


More generally, any computational approach to innovation and creativity (e.g., Machine Learning, Deep Learning, AI in general) has limits to how creative or innovative it can be. The 2017 mathematical proof details two of these limitations [6]. Humans can help counter these limits. On the other hand, humans have many known mental blind spots to innovation, including *functional fixedness* [7], *design fixation* [8], and *analogy blindness* [9, 10]—to name a few. For every known mental obstacle to innovation, there now exists an effective counter-technique, which can be implemented in software [11]. These counter-techniques help humans be more innovative as well as improve the AI technologies that operate on the text and data produced by humans.

From these findings, it makes sense to create a human-computer interface for innovation that is both human- and computer-friendly so that the computer can help humans be more innovative and humans can return the favor for the computer. The overall result thus far has been a human-computer partnership that has already found novel solutions to such tough problems as how to significantly reduce concussions in American football players and how to adhere a coating to the non-stick surface Teflon [11, 12]. This human-computer synergy has the potential to achieve even greater innovative breakthroughs.

This chapter first articulates new definitions of creativity, innovation, feature, and effect. These definitions permit quantified arguments about the innovation process. Next, the main points of the proof will be presented. All the proof's details are contained in Ref. [6]. The main conclusion is that no computational approach can fully take over the creative or innovative process.

Then, several of the weaknesses to human innovation will be presented along with their effective algorithmic counter-techniques. A full description of human weaknesses and programmable counter-techniques are contained in Ref. [11]. How these human weaknesses become computer weaknesses is explained with an emphasis on how the programmable counter-techniques can also improve the innovation of any AI technology. Finally, the human-computer interface that permits humans to counter computer limits and the computer to counter human weaknesses will be presented. This interface called *BrainSwarming* has the potential to result in greater innovative behavior than either humans or computers can achieve working alone.
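The components that the *BrainSwarming* interface is said to break a problem into (a goal, sub-goals, resources, features, interactions, and effects) map naturally onto standard data structures. The following is a minimal Python sketch of one possible encoding; all class names, fields, and the toy concussion example are illustrative assumptions, not the author's actual implementation:

```python
# Illustrative sketch (not the author's implementation) of BrainSwarming's
# components as plain data structures. Names and fields are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    name: str                      # e.g., "rigid", "concave"


@dataclass
class Resource:
    name: str                      # e.g., "spoon", "helmet"
    features: List[Feature] = field(default_factory=list)


@dataclass
class Interaction:
    resources: List[Resource]      # the resources that take part
    description: str               # how they interact
    effects: List[str] = field(default_factory=list)  # resulting effects


@dataclass
class Goal:
    statement: str                            # goal phrased in the formal grammar
    subgoals: List["Goal"] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)


# Toy instance: the concussion-reduction problem mentioned in the text.
helmet = Resource("helmet", [Feature("hard shell")])
goal = Goal(
    "reduce concussions in football players",
    subgoals=[Goal("reduce energy transferred to the head")],
    interactions=[Interaction([helmet], "helmet absorbs impact",
                              effects=["less energy reaches head"])],
)
print(goal.subgoals[0].statement)
```

A tree of goals and sub-goals with resources and effects attached is exactly the kind of structure that is easy for software to traverse while remaining readable for a human participant.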

## **2. Proven computer limits to innovation**

Section 2.1 articulates the new definitions for creativity/innovation, feature, and effect, which then permit the quantification of the size of the space of innovation for physical objects. The space of innovation for a given object is shown to consist of all possible effects that the object could produce when interacting with every other possible object, material, force, energy, and condition (e.g., barometric pressure and gravity strength). Section 2.2 quantifies the number of interactions that are possible between an object of interest and all other objects, materials, forces, and energies in the world. Section 2.3 builds on Section 2.2 by exploring all the ways that two given objects could interact within various conditions to produce interesting effects. The number of possible interactions and possible effects is so astronomically large that the fastest supercomputer today could not examine them all even if it had started working at the invention of the first computer. Section 2.4 articulates other reasons why a computer could not predict certain effects of an interaction. Finally, Section 2.5 shows how humans can help counteract the challenges that computers have when innovating.
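The "astronomically large" claim can be made concrete with a rough calculation. Every magnitude in this sketch (object count, configurations per pairing, machine speed) is an invented placeholder rather than a figure from the proof in Ref. [6]; only the multiplicative blow-up matters:

```python
# Back-of-envelope arithmetic for the size of the interaction space.
# All magnitudes below are illustrative assumptions, not figures from Ref. [6].
objects = 10**6    # assumed count of distinct everyday objects and materials
configs = 10**18   # assumed discretized positions, motions, and ambient
                   # conditions per pairing of two objects

# Pairings of every object with every other object, under every configuration:
interactions = objects * objects * configs           # 10**30 cases to probe

# Ops available if an exascale machine (1e18 ops/s) had run nonstop since the
# first electronic computers (~1945), checking one pairing per operation:
seconds_since_1945 = (2018 - 1945) * 365 * 24 * 3600
ops_available = 10**18 * seconds_since_1945          # roughly 2.3e27

print(interactions > ops_available)   # the space outruns the machine
print(interactions // ops_available)  # shortfall factor, in the hundreds
```

Even with these conservative toy magnitudes, the machine falls short by a factor of several hundred, and each extra axis of variation (another condition, another force) multiplies the gap further.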

## **2.1. New definitions**


Any creative/innovative solution to a problem is built upon at least one commonly overlooked or new feature of the problem. A feature that is commonly overlooked or new is called obscure. The above description is called the *Obscure Features Hypothesis of Innovation* [13].

If the solution were based upon a commonly noticed feature, then it would get a low rating on originality and a high rating on obviousness [11]. For example, if a scented jar candle company came out with a new scent called *Bag of Halloween Candy*, people might enjoy it and buy this new scent, but it still might not receive very high creativity scores. The feature of scent has been explored a great deal in the context of candles. In contrast, if a candle company devised a self-snuffing candle that could be set to extinguish itself after a desired amount of time, then this would receive high creativity scores [11]. The feature of self-snuffing in the world of candles has been under-explored.

Given that features are a crucial aspect of creativity, a definition adapted from the philosopher Nietzsche permits the number of features of an object to be quantified [6]. Nietzsche states: "The features of a 'thing' are its effects on other 'things': if one removes other 'things,' then a thing has no features" [14]. From this perspective, every feature emerges from interactions and is not intrinsic to the object itself. Certainly, color is not intrinsic to an object, but results from light interacting with the object and our retinas, which results in processing in the human visual cortex. Change the circumstances of the interaction and the color changes. Change the lighting. Put on sunglasses. Experience trauma in the visual cortex. These and other changes can result in a change in color.

As a further example, the mass of an object now appears to be the result of the object interacting with Higgs bosons [15]. The mass and length of an object change as its speed nears the speed of light [16]. Even the size of an object depends on the gravitational field that it is experiencing. A table of a certain size might be stable within one gravitational field but collapse in another gravitational field because its legs cannot hold up the weight of its tabletop. Any feature of an object, in fact, can be described as the effect of interactions.

Given these definitions of creativity and feature, we are able to quantify the number of features, interactions, and effects by defining a feature as an effect that results from interacting the object of interest with other objects, materials, forces (e.g., centripetal and centrifugal), and energies (acoustic, magnetic, chemical, biological, human, thermal, electrical, hydraulic, pneumatic, mechanical, electromagnetic, and radioactive [17]). Given that some amount of a material (e.g., a patch of velvet or a chunk of steel) can itself be considered an object, we can leave materials out of the definition of feature above. The lists of forces and energies may also grow someday, especially as we better understand dark matter and dark energy; these lists are stable at present but potentially dynamic in the future.
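As an illustration only, this definition can be sketched in a few lines: a feature is recorded as the effect observed when an object is paired with a probe drawn from other objects, the two forces, and the twelve energy types listed above. The names and the toy `observe` function below are invented, not taken from the chapter:

```python
# Probe types drawn from the chapter's lists of forces and energies [17].
FORCES = ["centripetal", "centrifugal"]
ENERGIES = ["acoustic", "magnetic", "chemical", "biological", "human",
            "thermal", "electrical", "hydraulic", "pneumatic", "mechanical",
            "electromagnetic", "radioactive"]

def probe_features(obj, other_objects, observe):
    """Enumerate (probe, effect) pairs for `obj`.

    `observe(obj, probe)` stands in for an empirical measurement or a
    theory-based derivation; here it is supplied by the caller.
    """
    features = {}
    for probe in list(other_objects) + FORCES + ENERGIES:
        features[probe] = observe(obj, probe)
    return features

# Toy observation function: it merely records the pairing.
effects = probe_features("candle", ["spoon", "coffee cup"],
                         lambda o, p: f"effect of {p} on {o}")
print(len(effects))  # 2 objects + 2 forces + 12 energies = 16 probes
```

The point of the sketch is that the number of features grows with the number of probes, which motivates the counting arguments in the next section.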

Human-AI Synergy in Creativity and Innovation http://dx.doi.org/10.5772/intechopen.75310 147

## **2.2. Interactions**

For our calculations, let us estimate that there are 10 million objects in the world. In April 2015, the US Patent Office issued its nine millionth patent [18], and this number includes neither the patents unique to the patent offices of other countries nor the trade secrets that appear in no patent database. Further, this estimate leaves out natural objects (e.g., stone) and common objects (e.g., ball) that are also absent from patent databases, and the number of patented objects grows every day as new patent applications are submitted. Still, 10 million is a reasonable estimate for the present time, and it is an easy number with which to do calculations.

Given an object of interest, how many interactions are possible with 10 million objects? Strictly speaking, there are 2<sup>10,000,000</sup> possible subsets of 10 million things, which is approximately 10<sup>3,010,300</sup>, so our object of interest could interact with any of an astronomical number of subsets. More realistically, however, an engineer might interact their object of interest with between one and five other objects, which would result in on the order of 10<sup>27</sup> subsets. Computers have existed for on the order of 10<sup>9</sup> seconds, so to examine all subsets of five or fewer objects would require examining 10<sup>27</sup>/10<sup>9</sup> = 10<sup>18</sup> subsets per second since the 1950s. The fastest supercomputer as of June 2015, the Tianhe-2, computes on the order of 10<sup>16</sup> floating-point operations per second [19]. So, even if the Tianhe-2 had existed since the first computer did, it could still not examine all the possible interactions of our object of interest with a reasonable number of subsets of possible objects, and this calculation allows only a single floating-point operation to process each subset. Further, it does not take into account all the possible conditions in which these interactions could take place: differing barometric pressures, humidity, temperature, lighting intensity, radiation, magnetic fields, strength of gravitational field, and so on.

In sum, even with our conservative estimates, the current fastest supercomputer could not fully explore the space of possible interactions for our object of interest in a reasonable amount of time.
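As a sanity check, the subset count and the supercomputer comparison can be reproduced directly. The exact count of subsets of size one through five is about 8 × 10<sup>32</sup> (dominated by the five-object subsets), even larger than the rough in-text order-of-magnitude figure, which only strengthens the conclusion. The speed and lifetime figures below are rough assumptions:

```python
import math

N_OBJECTS = 10_000_000   # chapter's estimate of distinct objects in the world
MAX_PARTNERS = 5         # interact the object of interest with 1..5 others

# Exact number of nonempty subsets of size at most five.
subsets = sum(math.comb(N_OBJECTS, k) for k in range(1, MAX_PARTNERS + 1))

SECONDS_OF_COMPUTING = 2 * 10**9   # computers have existed ~10^9 seconds
TIANHE2_FLOPS = 3.4 * 10**16       # ~33.9 petaflops, June 2015 [19]

ops_available = TIANHE2_FLOPS * SECONDS_OF_COMPUTING
print(f"subsets ~ 10^{math.floor(math.log10(subsets))}")
print(ops_available < subsets)  # True: one operation per subset is too slow
```

Even granting one floating-point operation per subset, the operations available since the dawn of computing fall short by several orders of magnitude.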

## **2.3. Many ways to interact**

The assumption made in the previous section is that, given two or more objects, it is obvious how they should interact. A spoon is used to stir the contents of a coffee cup, for example. That is what *functional fixedness* would dictate, which is the tendency to fixate on the designed use of an object, including when it is interacting with another object, and ignore the plethora of other possible uses [7]. But the spoon and the coffee cup could interact in an almost incalculable number of ways to achieve many different effects. For example, rest the spoon across the opening of a steaming cup of coffee. Place an ice cube in the spoon and watch the ice cube melt. Or, place a marble in the spoon as it rests across the cup's opening. Slap down on the handle part of the spoon hanging over the edge of the cup and launch the marble across the room. Or, place the coffee cup on one side of the room and take the spoon to the other side of the room. Make the coffee cup into a target by trying to throw the spoon into the cup. Or, play golf with the spoon as the putter, a marble as the golf ball, a table top as the putting green, and a sideways coffee cup as the hole.

146 Artificial Intelligence - Emerging Trends and Applications

Or, again set the coffee cup upright on the counter and place the spoon horizontally so it rests across the opening of the cup. Turn the spoon over so the curved part is facing upward and play a game of trying to balance various objects on the curved surface so they do not fall into the cup. Or, shake a spoon around in an empty cup to make a rattling sound. Or, turn a coffee cup over so that the open end is facing down. Place a spoon into the open end of the empty coffee cup and set the contraption on the counter. The spoon will force one side of the coffee cup to elevate a bit, forming a trap. When the spoon is disturbed by a mouse, for example, the coffee cup will fall and flatten, possibly trapping the mouse.

These are just a few of the ways to interact the spoon and the coffee cup to achieve an interesting effect. To consider all the ways that these two objects could interact, we would have to take into account every possible spatial relation between the two objects; every possible speed, acceleration, and deceleration of the two objects with respect to each other; every possible type of movement (linear, nonlinear, spinning at various angles and speeds); every possible surface that they may rest upon; every possible lighting condition, wind condition, heat condition, radiation level, magnetic field strength, electrical current flow, barometric pressure, humidity, earthquake or turbulence condition, and gravity strength; every possible extra object involved in the interaction (e.g., ice cube, marble, liquid coffee, and a human); as well as other conditions that we are probably overlooking.

If any of these conditions is actually measured by a continuous variable, then the number of different interactions between the spoon and coffee cup is truly computably nonenumerable. Even if all these conditions are measured by discrete variables that extend to a finite number of decimal places, then the number of possible interactions is outlandishly large. All these digits of precision on a variable are probably unnecessary in most cases, but when one is approaching a phase transition (e.g., liquid coffee approaching gas or the ceramic coffee cup possibly becoming superconductive), then many decimal places might be necessary to understand the onset of the transition. If one is approaching a previously unknown phase transition, then the slightest change in one condition, as measured by a change in the 100th decimal place for that variable, for example, could produce a radically different effect.
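The explosion under discretization can be sketched with a short calculation, assuming (hypothetically) that every interaction condition is sampled at a fixed number of levels; the condition list is abbreviated from the paragraph above:

```python
# Abbreviated list of interaction conditions from the text (14 dimensions).
CONDITIONS = ["spatial relation", "relative speed", "movement type",
              "surface", "lighting", "wind", "heat", "radiation",
              "magnetic field", "electrical current", "barometric pressure",
              "humidity", "turbulence", "gravity"]

def n_interactions(levels_per_condition: int) -> int:
    """Number of condition combinations if each dimension has a fixed
    number of discrete levels."""
    return levels_per_condition ** len(CONDITIONS)

print(n_interactions(10))    # 10 levels per condition -> 10^14 combinations
print(n_interactions(1000))  # finer sampling -> 10^42 combinations
```

Each added decimal place of precision multiplies the count again, which is why approaching a phase transition, where fine precision matters, blows the space up so quickly.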

In sum, although we calculated that there may be about 10<sup>27</sup> possible interactions between one object and up to five other objects drawn from 10 million possible objects, taking into account the incredible number of ways that any two objects can interact, plus all the possible conditions in which those interactions could take place, raises our number of interactions by at least several orders of magnitude, and quite possibly many [6]. The overall result is a number of possible interactions that is increasingly beyond the ability of current and projected supercomputers to explore, even if they had been running since the invention of the first computer.

When quantum computers come fully into being, all the above calculations will need to be redone. There has been work showing how quantum computers could handle certain computably enumerable sets [20]. However, if any of the conditions (e.g., heating, humidity, radiation) actually requires a continuous variable for its measurement, then the number of possible interactions is truly continuous and thus not computably enumerable. If all the conditions can be measured with discrete variables, it remains unclear whether the set of interactions is the type of set that is computably enumerable by a quantum computer, according to the specifications in Ref. [20]. Even if the set of possible interactions were computably enumerable, however, any gaps in the theories involving those interactions (as described in the next section) would make the set of effects derived from those interactions uncomputable.


## **2.4. Predicting effects computationally**

Can a computer compute the effects of a set of objects or entities that are interacting? It depends on whether a theory exists that derives the particular effects under consideration. Sometimes theory is ahead of empirical measurement, and sometimes empirical measurement is ahead of theory. For the former, Einstein's General Relativity, developed between 1907 and 1915, predicted that light would bend around massive objects such as our Sun [16]. It was not until 1919, however, that Arthur Eddington verified this prediction by measuring starlight bending around the Sun during a total solar eclipse [21]. For the latter, empirical measurement determined that galactic clusters did not have sufficient mass to account for their rotational speeds, so the existence of dark matter was posited as a way to account for the additional gravitational effects present in galactic clusters [22].

If no theory exists to predict a particular effect of an interaction, then no algorithm exists to compute that effect. Given our previous example of a coffee cup interacting with a spoon, if there are gaps in the theories for how the interaction would proceed in a possible condition (e.g., lighting, wind, heat, radiation level, magnetic field strength, electrical current flow, barometric pressure, humidity, earthquake or turbulence condition, and gravity strength), then no computer could predict the effects within that particular configuration. That particular combination of conditions would have to be empirically measured. Thus, a computer's ability to list out a particular combination of conditions does not mean that the computer could successfully predict the effects of the interaction taking place within that amalgam of conditions.
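This dependence on theory can be sketched as a lookup: an effect is derivable only for condition combinations covered by some encoded theory, and any gap forces empirical measurement instead. The table and condition names below are invented for illustration:

```python
# Hypothetical store of encoded theories: condition tuple -> predicted effect.
THEORIES = {
    ("room temperature", "normal gravity"): "spoon stirs coffee; coffee swirls",
    ("room temperature", "microgravity"): "coffee forms floating blobs",
}

def predict_effect(conditions):
    """Return the derived effect, or None when no theory covers the
    configuration and the effect must be measured empirically."""
    return THEORIES.get(conditions)

print(predict_effect(("room temperature", "normal gravity")))
print(predict_effect(("high radiation", "microgravity")))  # None -> experiment
```

The ability to enumerate a condition tuple (the dictionary key) is entirely separate from the ability to derive its effect (the dictionary value), which is the distinction the paragraph above draws.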

## **2.5. Humans countering computer limits**

Humans are needed to carry out the empirical measurements that a computer cannot perform and that no robot has been set up to execute. Further, with our vast experience of interacting with the physical world, humans already know many effects of interactions but have yet to encode them for a computer. If humans have not yet experienced an interaction, we can often comfortably predict its main effects after running a mental simulation in the sensorimotor cortices of our brain [23, 24].

In this way, humans can help flesh out and teach the computer many effects that the computer does not currently know and is presently unable to derive. Further, humans are good at crafting theories that make predictions of effects that can then be empirically tested. So, humans can encode their theories, which a computer can then use to derive effects. Although a computer will continue to learn new effects taught to it by humans and to derive effects based on new theories, given the computable nonenumerability of effects, humans will continue to maintain their rightful place in innovation, even with the onset of quantum computers (see Section 2.3, *Many ways to interact*).

## **3. Human weaknesses to innovation and counter-techniques**

In this section, we present five human blind spots for creativity and innovation (*functional fixedness* [7], *design fixation* [8], *analogy blindness* [9, 10], *goal fixedness* [11], and *assumption blindness* [11]) as well as effective algorithms, several of them patented, that can guide humans out of these creative dead ends. For a more complete discussion, we refer the reader to [11], which also treats other human creative blind spots.

## **3.1. Functional fixedness**


*Functional fixedness* is the tendency to fixate on the common use of an object or one of its parts [7]. In 2012, the first highly effective counter-technique, the *generic parts technique* (GPT), was developed [13]. Consider the *Two Rings Problem* [26], in which you have to fasten two steel rings together in a figure-eight configuration. The rings are each about 6 inches in diameter and weigh about three pounds. All you have to work with is a long candle, a strike-anywhere match, and a two-inch cube of steel.

Most people first try to light the candle and drip wax around the rings. However, the rings are too heavy to be fastened securely with a wax bond. The key is to notice that the candle's wick is a string. Remove the string by scraping the wax away on the steel cube and tie the rings together.

People who used the GPT solved 67% more problems than a control group [13]. The idea is to break an object into its parts while you ask two questions. First, can the object be broken down further into smaller parts? For example, in **Figure 1**, *candle* can be broken down into *wax* and *wick*. Second, does the description imply a use? If so, re-describe it generically in terms of its shape, material, or size. For example, a wick implies burning to give off light. When re-described in terms of its material (i.e., string), this new more generic description opens up new possibilities for uses—especially, tying things together. In this case, this use is sufficient to solve the *Two Rings Problem,* but if an even more generic description is needed then perhaps *long, interwoven fibrous strands* would be about the next level of generic description.
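The two GPT questions can be sketched as a walk over a parts tree that follows each use-implying name down a chain of more generic re-descriptions. The tree and the re-descriptions below are hand-coded for this one candle example, not a general implementation:

```python
# Q1: can the object be broken into smaller parts?
PARTS = {"candle": ["wax", "wick"]}

# Q2: does the name imply a use? If so, re-describe generically
# (shape, material, or size), possibly more than once.
GENERIC = {"wick": "string",
           "string": "long, interwoven fibrous strands"}

def generic_parts(obj):
    """Yield each part of `obj`, followed by its successively more
    generic re-descriptions."""
    for part in PARTS.get(obj, []):
        yield part
        while part in GENERIC:        # follow the chain of re-descriptions
            part = GENERIC[part]
            yield part

print(list(generic_parts("candle")))
# ['wax', 'wick', 'string', 'long, interwoven fibrous strands']
```

Once *wick* has been re-described as *string*, the use "tie things together" becomes available, which is exactly the step that solves the *Two Rings Problem*.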

Software that exists can find the solution to the *Two Rings Problem* and other problems requiring the discovery of obscure features if the key obscure features are either already known in a corpus or dataset or they have been articulated by a human in a targeted query [27]. For example, if a corpus/dataset "knows" that the verb *tie* is a synonym of *fasten, string* is a material for *wick*, and *tie* is a use of *string*, then software has generated the following solution to the *Two Rings Problem*: "a candle's wick is made of string, which might be able to tie ring to ring" [27]. If any of these connections is missing in the text/data, then the solution is difficult to reach. For example, if your data source is ConceptNet 5.5 [28], for example, then *tie* is a way to *fasten* things, *wick* is a part of *candle*, but *wick* is not a type of *string*, so it is difficult to

To be innovative, you need to build upon a feature that has been commonly overlooked and, based upon our findings, the majority of feature types of common objects are overlooked. Therefore, there is plenty of room to create novel variations for even the most common of objects. For example, a candle that moves from its own dynamics is an under-explored type of candle. Examining the other overlooked features for a candle, we found that no one noticed that a candle loses weight when it burns. Thus, we leveraged weight loss to produce a candle in motion. By placing a candle on one side of a scale-like object and a counterweight on the other side, the candle moved upward slowly as it burned down. For fun, we placed a snuffer above the candle so that it eventually moved into the snuffer and extinguished itself. The *self-snuffing candle* was born [11]. A constructed prototype revealed that the *self-snuffing candle*

Human-AI Synergy in Creativity and Innovation http://dx.doi.org/10.5772/intechopen.75310 151

Computationally, in ConceptNet 5.5 [28], a candle has no connection with being motionless or losing weight while burning, while a rocking chair has many connections related to motion. ConceptNet 5.5, as an example of many textual and data sources, would not be a good source for noticing overlooked features that could become the basis of a novel design. The overlooked features need to be uncovered through another method such as using the extensive


**Figure 1.** Generic parts technique.

reach the conclusion that you can use the wick to tie things together. In order to obtain this information, either a human must intervene to answer the questions of the *generic parts technique*, another text or data source is used to obtain the crucial information, or you find another possible route through ConceptNet 5.5 such as *wick* being a type of *cord,* which can be used to *tie* things. I chose ConceptNet 5.5 as an example because its information comes from multiple good sources [28]: including Open Mind Common Sense contributors, DBPedia 2015, OpenCyc 2012, and Open Multilingual WordNet.
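As a sketch of this kind of lookup, the route from *wick* to *tie* can be found by a simple graph search over ConceptNet-style edges. The edge list below is a hypothetical, hand-coded stand-in for ConceptNet 5.5 (the real system exposes a far larger graph through its public API):

```python
from collections import deque

# Hypothetical mini knowledge graph standing in for ConceptNet 5.5 edges.
# Each entry: (start, relation, end).
edges = [
    ("candle", "HasA", "wick"),
    ("wick", "IsA", "cord"),
    ("cord", "UsedFor", "tie"),
    ("cord", "IsA", "line"),
]

def find_route(start, goal):
    """Breadth-first search for a relation path from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, rel, e in edges:
            if s == node and e not in seen:
                seen.add(e)
                queue.append((e, path + [(s, rel, e)]))
    return None

print(find_route("wick", "tie"))
# [('wick', 'IsA', 'cord'), ('cord', 'UsedFor', 'tie')]
```

The breadth-first search surfaces exactly the *wick* → *cord* → *tie* route described above.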

## **3.2. Design fixation**

*Design fixation* occurs when a designer attempts to create a novel design but fixates on the features of known designs they have seen [8, 29, 30]. For example, people instructed to create a novel candle might think up a new scent for a candle or add multiple wicks to the candle. In the context of candles, however, scent and number of wicks have been frequently explored. To be truly innovative, you need to manipulate a commonly overlooked (or perhaps new) feature of a candle. But how does one notice something that is rarely noticed?

Although the features of a candle (or any object) are too numerous to list exhaustively and, as a set, are not computably enumerable, classifying the types of features that any object could possess into an extensive category system has been a highly effective method for overcoming *design fixation* [11].

We initially listed 32 categories of feature types for objects, but now use a 50-category system [11]. We asked people to list as many features as they could for many common objects (e.g., candle, umbrella, etc.). We then categorized their answers based on our 32-category system. On average, people overlooked 20.7 of the 32 categories (64.7%) for each of the objects [11]. For each object, they overlooked different types of features. For example, for a rocking chair, they would notice motion—that the chair was designed to move in a certain way. For a candle, however, no one we tested ever noticed that a candle is motionless when it burns. Its flame flickers, but the candle itself does not move.
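The coverage analysis above can be sketched as a simple set difference between the full category system and the categories a subject actually mentioned. The category names and subject data below are illustrative, not the actual 32-category list:

```python
# Sketch of the coverage analysis, with a hypothetical subset of the
# feature-type categories standing in for the full 32-category system.
CATEGORIES = {"material", "shape", "size", "color", "weight", "motion",
              "smell", "symmetry"}

# Feature types a test subject actually mentioned for a candle (example data).
mentioned = {"material", "shape", "color", "smell"}

overlooked = CATEGORIES - mentioned
coverage = len(mentioned) / len(CATEGORIES)
print(sorted(overlooked))              # categories never noticed, e.g. 'motion'
print(f"{1 - coverage:.1%} of categories overlooked")
```

Categories that land in `overlooked` across many subjects (like *motion* for a candle) mark the under-explored territory where novel designs can be built.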

To be innovative, you need to build upon a feature that has been commonly overlooked and, based upon our findings, the majority of feature types of common objects are overlooked. Therefore, there is plenty of room to create novel variations for even the most common of objects.

For example, a candle that moves from its own dynamics is an under-explored type of candle. Examining the other overlooked features for a candle, we found that no one noticed that a candle loses weight when it burns. Thus, we leveraged weight loss to produce a candle in motion. By placing a candle on one side of a scale-like object and a counterweight on the other side, the candle moved upward slowly as it burned down. For fun, we placed a snuffer above the candle so that it eventually moved into the snuffer and extinguished itself. The *self-snuffing candle* was born [11]. A constructed prototype revealed that the *self-snuffing candle* works as described.

Computationally, in ConceptNet 5.5 [28], a candle has no connection with being motionless or losing weight while burning, while a rocking chair has many connections related to motion. ConceptNet 5.5, as an example of many textual and data sources, would not be a good source for noticing overlooked features that could become the basis of a novel design. The overlooked features need to be uncovered through another method such as using the extensive category system of feature types discussed above.

## **3.3. Goal fixedness**


150 Artificial Intelligence - Emerging Trends and Applications

*Goal fixedness* occurs when a solver stays close to the original phrasing of the problem's goal and does not notice the various ways to phrase the goal in synonymous ways [11]. Any goal can be phrased in the following form: *verb noun-phrase prepositional-phrases*. The verb describes the change that is desired (e.g., *increase profits in the New England area during the holiday season*). In some cases, you want something to stay the same that is trying to change (e.g., *maintain altitude with one damaged engine*). The *noun-phrase* names what needs changing (e.g., *profits*) or maintaining (e.g., *altitude*). The prepositional phrases describe the important constraints and relations that need to hold true (e.g., *in the New England area during the holiday season*).

Focusing on the verb, people are able to list between 5 and 11 synonyms of a verb [11]. Humans drastically underperform when compared to the synonyms that are present in a good thesaurus. In *WordNet* [31], for example, the number of synonyms for the verbs we tested ranged between 24 and 172. Each synonym has nuances that may lead to new solutions.

For example, suppose a person was working on fastening the rings together in the *Two Rings Problem* [26] and used *WordNet* [31] to explore the synonyms of *fasten*. *WordNet* has a hierarchical structure to its synonyms. More specific synonyms are called *hyponyms*, while more general synonyms are called *hypernyms*. The hyponyms of a verb often name specific ways to achieve the change. There are 61 hyponyms of *fasten*, and they describe many ways to fasten things together, including *tie, weld, staple, velcro, clip, glue, buckle, pin, sew, clamp, chain, garter, clinch, strap, grout, lodge, cement, hasp, bind, button, latch,* and *rivet.* In this case, *tie* is the verb that names how to solve the *Two Rings Problem*. A computer program can easily solve the *Two Rings Problem* when it is revealed that *tie* is a hyponym of *fasten* and *string* is the material from which a wick is composed [27]. For countering *goal fixedness*, both *WordNet* and ConceptNet 5.5 are effective datasets.
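As a sketch of this hyponym exploration, the fragment below hand-codes a tiny piece of the *fasten* hierarchy and walks it recursively; a real implementation would query the full WordNet lexicon (e.g., through NLTK's corpus reader) rather than this toy dictionary:

```python
# Hand-coded fragment of WordNet's verb hierarchy for illustration only;
# the real hierarchy has 61 hyponyms under 'fasten'.
hyponyms = {
    "fasten": ["tie", "weld", "staple", "glue", "pin", "sew", "rivet"],
    "tie": ["knot", "lash"],
}

def all_hyponyms(verb):
    """Collect every verb below `verb` in the hierarchy, depth first."""
    result = []
    for h in hyponyms.get(verb, []):
        result.append(h)                 # each hyponym names a concrete
        result.extend(all_hyponyms(h))   # way to achieve the goal verb
    return result

print(all_hyponyms("fasten"))
```

A solver fixated on the literal verb *fasten* never enumerates these alternatives; the traversal makes *tie* (the key to the *Two Rings Problem*) explicit.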

## **3.4. Assumption blindness**

Any phrasing of the goal belies many assumptions [11]. For example, a company was stuck on trying to adhere a coating to the nonstick surface Teflon. Everything they tried failed. However, some analysis of the verb *adhere* revealed some of its assumptions.


Human-AI Synergy in Creativity and Innovation http://dx.doi.org/10.5772/intechopen.75310 153


The verb *adhere* assumes a chemical solution often involving some type of adhesive. The verb *adhere* also assumes that two things are being adhered to each other, that the adherence is probably meant to be permanent, that the two things being adhered are in direct contact with each other, that the direct contact is playing an important causal role in the adherence, and so on.

Noticing three of the assumptions was crucial to a solution: (1) using a chemical process between (2) two surfaces where (3) contact is crucial to the solution. Exploring alternatives to these assumptions led to a novel solution: (1) using a magnetic process among (2) three surfaces where (3) contact is not crucial to the solution. Specifically, a magnetic surface is placed behind the Teflon surface, while the coating with some ferrous content is placed in front of the Teflon surface. The coating sticks through the Teflon to the magnetic surface and forms a kind of *Teflon sandwich*.

In general, there is a master list of 50 types of features that any physical solution might possess [11]: including size, shape, material, quantity, type of energy used (e.g., chemical, magnetic, etc.), spatial relations among the parts, symmetry, and motion. To uncover some important assumptions, simply proceed through the list and ask if the verb under consideration assumes anything about each of these feature types.
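A minimal sketch of this checklist procedure, using a hypothetical subset of the 50 feature types and the *adhere* assumptions identified above:

```python
# Walk a (hypothetical subset of the) 50 feature-type list and record
# what the goal verb tacitly assumes about each type.
FEATURE_TYPES = ["type of energy", "number of surfaces", "contact",
                 "permanence", "spatial relations"]

# Assumptions behind the verb 'adhere', per the analysis above.
assumptions = {
    "type of energy": "chemical (an adhesive)",
    "number of surfaces": "two",
    "contact": "direct contact is causally necessary",
}

for ft in FEATURE_TYPES:
    note = assumptions.get(ft, "no assumption identified")
    print(f"{ft}: {note}")
```

Each recorded assumption becomes a candidate to negate: swapping *chemical* for *magnetic*, *two* surfaces for *three*, and dropping the contact requirement yields the Teflon-sandwich solution.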

These types of assumptions are contained in neither ConceptNet 5.5 nor, most likely, any other current text or data source. These assumptions need to be unearthed carefully through a method such as the one described above.

## **3.5. Analogy blindness**

Gick and Holyoak [9, 10] were the first to show experimentally how difficult it is for humans to notice by themselves how an idea from one area could be adapted as a solution in another area. For example, they had participants read a brief military story that held the crucial idea for solving a surgery problem [9, 10]. Thirty percent solved the surgery problem after mere exposure to the military problem, but 80% solved it after being told to use the military problem to help solve the surgery problem.

Building upon the work of Julie Linsey and colleagues [32–36], who focused upon looking at synonyms of the main verb expressing the goal of the problem, McCaffrey and Krishnamurty [11] went one step further to explore the synonyms of both the verb and noun phrase of the goal. For example, consider the goal *reduce concussions in American football players.* The goal is in the form *verb noun-phrase prepositional-phrase.* Focusing upon *reduce concussions*, we explored the synonyms of both words and took into consideration some basic engineering knowledge. This process led to an extensive list of alternative phrasings: including *diminish trauma, lessen impact, reduce energy, soften collision, minimize force, decrease momentum,* and *repel energy*.
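The rephrasing step can be sketched as a cross-product of verb synonyms and noun synonyms; the synonym lists below are taken from the example above rather than generated from a lexicon:

```python
from itertools import product

# Synonyms (with basic engineering knowledge folded in) for each word
# of the goal 'reduce concussions'.
verb_syns = ["reduce", "diminish", "lessen", "soften", "minimize", "repel"]
noun_syns = ["concussions", "trauma", "impact", "energy", "collision",
             "force", "momentum"]

phrasings = [f"{v} {n}" for v, n in product(verb_syns, noun_syns)]
print(len(phrasings))                # 42 candidate goal phrasings
print("repel energy" in phrasings)   # True
```

Six verbs crossed with seven nouns already yield 42 candidate phrasings, including the *repel energy* phrasing that proved decisive.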


Next, we entered each of the phrases into Google in the form "concussions diminish trauma" [12]. This step helped us determine which phrases were under-explored in the context of concussions. We found that *repel energy* was almost completely ignored. The word *repel* is closely associated with magnets, and this connection quickly triggered the creation of a possible solution. Magnetize all football helmets with the same pole so they do not want to be near each other. Tests with models showed that potential head-on collisions were turned into glancing blows as the helmets slowed down and slightly veered when approaching each other at high speeds [12].
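A sketch of the under-exploration test, with hypothetical hit counts standing in for the web-search results (the actual counts were obtained by querying Google for each phrase):

```python
# Hypothetical hit counts standing in for web-search results per phrase
# (the real step queried Google for each phrase in the concussion context).
hit_counts = {
    "diminish trauma": 12400,
    "lessen impact": 53100,
    "reduce energy": 88000,
    "repel energy": 3,       # almost completely ignored
}

# The least-explored phrasings are the most promising leads.
ranked = sorted(hit_counts, key=hit_counts.get)
print(ranked[0])   # 'repel energy'
```

Ranking by ascending hit count puts the most neglected phrasing first, which is exactly where novel solutions tend to hide.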

In the *BrainSwarming* graph for *reduce concussions* (**Figure 2**), the goal was placed at the top and the alternative goal phrasings grew downward from the top. The resources were divided into two types: *objects* and *energies*, and placed across the bottom. A solution was constructed in the middle showing an interaction between the two helmets and magnetic energy. This interaction satisfied the subgoal *repel energy*, which satisfied the main goal *reduce concussions.*

Obtaining synonyms from *WordNet*, ConceptNet 5.5, or other sources can definitely uncover nuanced phrasings of the goal that may illuminate novel solutions to a problem. These data sources and the process of creating alternative goal phrasings could potentially help any AI technology that is focused on the task of problem solving.

**Figure 2.** BrainSwarming graph for reduce concussions.

## **4. Human-computer interface to achieve synergy**

In order for humans to counter computer limits and computers to counter human weaknesses, an interface is needed that is composed of data structures both humans and computers can easily populate. To make the human-computer interaction efficient, the interface must be both human- and computer-friendly. Building upon our new definitions, we can define a problem as a set of desired effects and a solution as a sequence of interactions that ultimately produces the desired effects named in the problem.


We define the problem solving grammar for innovation in Extended Backus-Naur Form (EBNF: [37]), which is a compact notation mostly used for defining the syntax of computer programming languages. For our grammar, we only need a few of EBNF's symbols: "::=" means "is defined as," a superscripted "+" means there can be one or more of the preceding item, and a superscripted "\*" means there can be zero or more of the preceding item.

The bidirectional *BrainSwarming* graph has been tested with various age groups and has been found to be easy to understand [25]. As illustrated in **Figure 3**, a goal is placed at the top, and the refinements of the original goal grow downward below it. The resources to solve the problem are placed across the bottom, and the parts and features grow upward above the resources. The two directions grow toward each other until they connect, at which point you have your first candidate solution. The solution is comprised of a sequence of interactions between resources, features, and parts until the goal's effects are satisfied. Humans find the graph intuitive, and the computer can easily represent the bidirectional graph as a set of trees.
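A minimal sketch of how the two sides of the graph can be represented as trees and connected into a candidate solution; all node names come from the concussion example, and the structure is illustrative rather than *BrainSwarming's* actual internal format:

```python
# The goal tree grows downward; each resource tree grows upward.
# Node names are taken from the 'reduce concussions' example.
goal_tree = {
    "reduce concussions": ["lessen impact", "repel energy"],
}
resource_trees = {
    "helmet": ["hard shell", "magnetizable surface"],
    "magnetic energy": ["like poles repel"],
}

# A candidate solution connects a goal refinement to resource features.
solution = ("repel energy", ["helmet", "magnetic energy"])
subgoal, used = solution

# The link is valid when the subgoal refines the main goal and every
# resource it uses exists on the resource side of the graph.
assert subgoal in goal_tree["reduce concussions"]
assert all(r in resource_trees for r in used)
print("candidate solution links", subgoal, "to", used)
```

Because each side is just a tree of strings, the computer can store and traverse the whole graph trivially while humans read the same structure off the visual display.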

A goal is a set of desired effects. Any effect can be described as an action verb that describes a change (or a nonchange), a noun phrase to name that which needs changing (or should be kept from changing), and a list of prepositional phrases that describe important constraints and relations. A feature is synonymous with an effect, but sometimes a shorthand can be used: an adjective (e.g., heavy) or a noun phrase (e.g., a heavy, metal rectangle). A resource is either an object (e.g., hammer), a material (e.g., velvet), an energy (e.g., magnetic), or a force (e.g., centrifugal).

**Figure 3.** BrainSwarming graph and grammar.

Each of these grammatical forms is both human- and computer-friendly. Each phrase has a natural English form, and each phrase is regular so that it is easy for a computer to parse.
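For instance, a rough recognizer for the regular goal form can be written with a single pattern; the preposition list here is abbreviated and the pattern is only a sketch, not *BrainSwarming's* actual parser:

```python
import re

# Rough recognizer for the regular goal form 'verb noun-phrase prep-phrases*'
# (the preposition list is abbreviated; a real grammar would be more careful).
PREPS = r"(?:in|on|with|during|for|at)"
pattern = re.compile(
    rf"(?P<verb>\w+)\s+(?P<noun>(?:(?!{PREPS}\b)\w+\s*)+?)"
    rf"(?P<preps>(?:\s*{PREPS}\b[^,]*)*)$"
)

m = pattern.match("reduce concussions in American football players")
print(m.group("verb"))           # reduce
print(m.group("noun").strip())   # concussions
print(m.group("preps").strip())  # in American football players
```

Because every goal phrase follows the same regular shape, one pattern suffices to split it into the verb, noun phrase, and constraints that the rest of the pipeline manipulates.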

## **5. Current implementation and applications**


The current implementation of *BrainSwarming* (brainswarming.io) includes the visual graph and grammar of **Figure 3**, as well as software to counteract *functional fixedness*, *design fixation*, *goal fixedness*, and *analogy blindness* [11]. The counter-technique for *assumption blindness* has not been implemented at this time. *BrainSwarming's* visual graph is simultaneously updatable online by multiple users working from different locations. *Analogy Finder* [11, 38], the counter-technique to *analogy blindness*, currently searches the U.S. Patent database for analogous solutions from different fields (i.e., solutions that achieve the same basic effect and can be adapted to the particulars of the problem at hand). The user of *Analogy Finder* enters the verb and noun-phrase of a goal (e.g., *reduce concussions*). *Analogy Finder* then explodes the phrase into many synonymous phrases. The user selects which phrases to use in the search, and *Analogy Finder* finds patents that achieve the desired effect expressed in many different ways (jargons) across all domains of the U.S. Patent database. *Analogy Finder* could easily be made to search any dataset or corpus (e.g., academic journals, *Wikipedia*, etc.).
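The search step can be sketched as simple phrase matching over a corpus; the patent IDs and abstracts below are invented for illustration, and *Analogy Finder* itself searches the full U.S. Patent database:

```python
# Tiny stand-in patent corpus (IDs and abstracts invented for illustration).
corpus = {
    "US-A": "A bumper that absorbs shock to lessen impact in vehicles.",
    "US-B": "A coating process for adhering polymers to metal.",
    "US-C": "Magnetic apparatus in which like poles repel energy of approach.",
}

phrases = ["lessen impact", "repel energy"]   # user-selected rephrasings

# Match each rephrasing against every abstract, collecting hits per phrase.
hits = {ph: [pid for pid, text in corpus.items() if ph in text.lower()]
        for ph in phrases}
print(hits)   # {'lessen impact': ['US-A'], 'repel energy': ['US-C']}
```

Each rephrasing reaches a different field's jargon, which is how analogous solutions from distant domains surface.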

In its current implementation, *BrainSwarming* has helped humans create a novel, magnetic solution to reduce concussions [12, 39], an original way to stick a coating to Teflon [11], as well as several other new solutions to proprietary problems. For the Teflon problem, the *assumption blindness* technique was executed by hand, as it has not yet been implemented in software, although the synonym generation and analogous search were conducted by the software.

Because *BrainSwarming* allows multiple users to access it remotely, one of its users could be a software program, specifically an AI program such as IBM Watson or a machine learning program. Of course, the AI program would need to communicate through an API to join the *BrainSwarming* team working on a particular problem. In this way, the humans on the team could help the AI program overcome the proven limits that hold for any computational approach to innovation. While *BrainSwarming's* software explicitly counteracts certain human obstacles to creativity (e.g., *functional fixedness*), the contributing AI program could help the team of human users by noticing patterns that humans have trouble noticing (e.g., machine learning programs), revealing relevant information that a user has not yet read (e.g., IBM Watson), revealing information that is outside a user's expertise, or making connections that a user has yet to make. Finally, *BrainSwarming's* software can help counteract the obstacles to innovation (e.g., *functional fixedness* or *analogy blindness*) that are implicit in the corpora and datasets being used, because humans produced these sources of information.

In this way, *BrainSwarming* counteracts the known cognitive obstacles to innovation in the human users as well as in the human-produced corpora and datasets. It also provides a highly visual interface that humans and any contributing AI program can access. The interaction among the human users, the visual interface, *BrainSwarming's* obstacle-countering software, and any contributing AI programs has the potential to achieve great innovative breakthroughs. Much testing is required to gauge the innovative power of the *BrainSwarming* platform when an AI program such as IBM Watson interacts with it as one of its users. The crucial comparison would be the innovativeness of IBM Watson (or another AI program) on its own versus IBM Watson interacting with the *BrainSwarming* platform and some human users. Even without a contributing AI program, *BrainSwarming* has helped human users become more innovative by coming up with novel solutions to some very difficult problems [11, 12, 39].

through the *BrainSwarming* interface to uncover crucial obscure features for the problem at hand. Because innovative solutions are built upon obscure features, any AI technology using the *BrainSwarming* interface can potentially achieve higher levels of innovativeness than either the human users or the AI technology can achieve on its own. We look forward to testing this

Human-AI Synergy in Creativity and Innovation http://dx.doi.org/10.5772/intechopen.75310 157


## **6. Conclusions**

Every innovative solution is built upon an obscure (i.e., commonly overlooked or new) feature of the problem [13, 27]. Computers and humans tend to overlook different sets of obscure features because of their differing search biases. These differences are somewhat complementary, so computers and humans can help each other uncover obscure features that the other partner would miss [6, 11]. The result is significantly more features of the problem unearthed, and thus a higher chance of finding the key obscure features required for a novel solution. Further, computers cannot completely take over the creative and innovative process because the set of features of any object is not computably enumerable, so it cannot be fully explored by a computational device [6]. Working together through a computer- and human-friendly interface called *BrainSwarming* permits computers and humans to innovate together easily. This human-computer synergy has already produced innovative solutions to some difficult industry problems [11, 12, 39]. Adding an AI program such as IBM Watson to the team innovating together through *BrainSwarming* has the potential to produce innovative breakthroughs that neither IBM Watson nor the human team could achieve on its own.

Specifically, imagine IBM Watson plugged into the *BrainSwarming* interface. It populates the bi-directional graph and reads information placed in the graph by the human users. In this way, it dynamically informs the human users of its insights as well as learns of the insights from the human users. In particular, humans' implicit knowledge of many features and effects that have yet to be encoded can be communicated to Watson. New empirical results from tests conducted by humans can be entered for Watson to use. Further, Watson can learn obscure features from the *generic parts technique* in order to overcome *functional fixedness*. It can use overlooked features uncovered from the 50-category feature type list in order to overcome *design fixation*. It can capitalize on the 50-category feature type list to unearth assumptions hidden behind the main goal verb in order to counter *assumption blindness*. Watson can also leverage the synonyms of *WordNet* and ConceptNet 5.5 in order to overcome *goal fixedness* and *analogy blindness*. In turn, IBM Watson can offer its considerable knowledge and insight to the *BrainSwarming* graph. Overall, IBM Watson, or any AI technology, can interact with humans through the *BrainSwarming* interface to uncover crucial obscure features for the problem at hand. Because innovative solutions are built upon obscure features, any AI technology using the *BrainSwarming* interface can potentially achieve higher levels of innovativeness than either the human users or the AI technology can achieve on its own. We look forward to testing this exciting hypothesis.
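As a concrete illustration of the synonym idea, the sketch below restates a goal verb through synonyms, the way *WordNet* or ConceptNet lookups could be used to loosen *goal fixedness*. The mini-thesaurus and function names are hypothetical stand-ins, not part of any *BrainSwarming* implementation.

```python
# Sketch: restate a goal verb through synonyms to loosen "goal fixedness".
# MINI_THESAURUS is a hypothetical stand-in for WordNet/ConceptNet lookups.

MINI_THESAURUS = {
    "fasten": ["attach", "join", "secure", "bind"],
    "reduce": ["lessen", "shrink", "dampen"],
}

def restate_goal(verb: str, obj: str, thesaurus=MINI_THESAURUS) -> list:
    """Return alternative phrasings of a goal such as 'fasten two parts'."""
    return [f"{synonym} {obj}" for synonym in thesaurus.get(verb, [])]

print(restate_goal("fasten", "two steel tubes"))
# ['attach two steel tubes', 'join two steel tubes',
#  'secure two steel tubes', 'bind two steel tubes']
```

Each restated goal can suggest solution types that the original verb hides; a real system would draw the synonym sets from WordNet or ConceptNet 5.5 rather than a hand-built table.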

## **Acknowledgements**


156 Artificial Intelligence - Emerging Trends and Applications


This article is based on work supported by National Science Foundation Grants 1534740, 1331283, and 1129139. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation.

## **Conflict of interest**

This research is associated with my company, Innovation Accelerator, Inc. (www.innovationaccelerator.com), and may lead to the development of software products, in which I have a business and/or financial interest. I have in place an approved plan for managing any potential conflicts arising from this arrangement.

## **Author details**

Tony McCaffrey

Address all correspondence to: tony@innovationaccelerator.com

Innovation Accelerator, Inc., West Brookfield, Massachusetts, United States

## **References**


[1] Bloomberg J. Is IBM Watson a 'Joke'? [Internet]. 2017. Available from: https://www.forbes.com/sites/jasonbloomberg/2017/07/02/is-ibm-watson-a-joke/#76e55bb8da20 [Accessed: January 5, 2018]

[2] Freedman D. A Reality Check on IBM's AI Ambitions [Internet]. 2017. Available from: https://www.technologyreview.com/s/607965/a-reality-check-for-ibms-ai-ambitions/ [Accessed: January 5, 2018]

[3] Schank R. The Fraudulent Claims Made by IBM about Watson and AI [Internet]. 2017. Available from: http://www.rogerschank.com/fraudulent-claims-made-by-IBM-about-Watson-and-AI [Accessed: January 5, 2018]

[4] Darrow B. Has IBM Watson's AI Technology Fallen Victim to Hype? [Internet]. 2017. Available from: http://fortune.com/2017/06/28/ibm-watson-ai-healthcare/ [Accessed: January 5, 2018]

[5] Frank B. AI is Still Several Breakthroughs Away from Reality [Internet]. 2017. Available from: https://venturebeat.com/2017/06/23/ai-is-still-several-breakthroughs-away-from-reality/ [Accessed: January 5, 2018]

[6] McCaffrey T, Spector L. An approach to human–machine collaboration in innovation. Artificial Intelligence for Engineering Design, Analysis and Manufacturing. 2017;**32**:1-15. DOI: 10.1017/S0890060416000524

[7] Duncker K. On problem-solving. Psychological Monographs. 1945;**58**(5):1-113

[8] Jansson D, Smith S. Design fixation. Design Studies. 1991;**12**(1):3-11

[9] Gick M, Holyoak K. Analogical problem solving. Cognitive Psychology. 1980;**12**:306-355

[10] Gick M, Holyoak K. Schema induction and analogical transfer. Cognitive Psychology. 1983;**15**(1):1-38

[11] McCaffrey T, Krishnamurty S. The obscure features hypothesis in design innovation. International Journal of Design Creativity and Innovation. 2014;**3**(1):1-28

[12] McCaffrey T, Pearson J. Find innovation where you least expect it. Harvard Business Review. 2015;**93**(12):82-89

[13] McCaffrey T. Innovation relies on the obscure: A key to overcoming the classic *functional fixedness* problem. Psychological Science. 2012;**23**(3):215-218

[14] Nietzsche F. Will to Power (Translated by Kaufmann W, Hollingdale R). New York: Random House; 1968 (Original work published 1901)

[15] Ellis J, Gaillard M, Nanopoulos D. An Updated Historical Profile of the Higgs Boson [Internet]. 2015. Available from: http://arxiv.org/pdf/1504.07217.pdf [Accessed: January 5, 2018]

[16] Einstein A. Relativity: The Special and the General Theory (Translated by Lawson R). New York: Barnes & Noble; 2004 (Original work published 1920)

[17] Hirtz J, Stone R, McAdams D, Szykman S, Wood K. A functional basis for engineering design: Reconciling and evolving previous efforts. Research in Engineering Design. 2002;**13**(2):65-82

[18] USPTO Patent Full-Text and Image Database [Internet]. Washington, DC: US Patent Database; 2016. Available from: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=9000000.PN.&OS=PN/9000000&RS=PN/9000000 [Accessed: February 11, 2016]

[19] Top500 Lists. Top 500 [Internet]. 2015. Available from: http://www.top500.org/lists/2015/06/ [Accessed: February 11, 2016]

[20] Calude C, Tadaki K. Spectral Representation of Some Computably Enumerable Sets with an Application to Quantum Provability [Internet]. 2013. Available from: https://arxiv.org/abs/1303.5502 [Accessed: January 5, 2018]

[21] Kennefick D. Testing relativity from the 1919 eclipse—A question of bias. Physics Today. 2009;**62**(3):37-42

[22] Zwicky F. On the masses of nebulae and of clusters of nebulae. Astrophysical Journal. 1937;**86**:217

[23] Battaglia P, Hamrick J, Tenenbaum J. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences of the United States of America. 2013;**110**(45):18327-18332. Available from: http://www.pnas.org/content/110/45/18327.full

[24] Hegarty M. Mechanical reasoning by mental simulation. Trends in Cognitive Sciences. 2004;**8**(6):280-285

[25] McCaffrey T. Overcome any Obstacle to Creativity. Washington, DC: Rowman & Littlefield; 2018

[26] McCaffrey T. The obscure features hypothesis for innovation: One key to improving human innovation [unpublished doctoral dissertation]. University of Massachusetts, Amherst. 2011

[27] McCaffrey T, Spector L. How the obscure features hypothesis leads to innovation assistant software. In: Proceedings of the 2nd International Conference on Computational Creativity (ICCC). Mexico City, Mexico: ICCC; 2011. pp. 120-122

[28] ConceptNet 5.5 [Internet]. 2017. Available from: conceptnet.io [Accessed: January 5, 2018]

[29] Smith S. Getting into and out of mental ruts: A theory of fixation, incubation and insight. In: Sternberg R, Davidson J. The Nature of Insight. Cambridge, MA: MIT Press; 1995. pp. 229-251

[30] Smith S, Ward T, Schumacher J. Constraining effects of examples in a creative generation task. Memory & Cognition. 1993;**21**:837-845

[31] Miller G. WordNet: A lexical database for English. Communications of the ACM. 1995;**38**:39-41

[32] Linsey J. Design-by-analogy and representation in innovative engineering concept generation [dissertation]. The University of Texas at Austin, Austin, TX. 2007

[33] Linsey J, Laux J, Clauss E, Wood K, Markman A. Increasing innovation: A trilogy of experiments towards a design-by-analogy method. In: Proceedings of the ASME Design Engineering and Technical Conference (IDETC). Las Vegas, NV: IDETC; 2007. pp. 1-15

[34] Linsey J, Markman A, Wood K. WordTrees: A method for design-by-analogy. In: Proceedings of the 2008 American Society for Engineering Education Annual Conference (ASEEAC). Pittsburgh, PA: ASEEAC; 2008. pp. 1-13

[35] Linsey J, Wood K, Markman A. Increasing innovation: Presentation and evaluation of the WordTree design-by-analogy method. In: Proceedings of the ASME 2008 International Design Engineering Technical Conferences & Computers and Information in Engineering Conferences. New York, NY: ASME; 2008. pp. 21-32

[36] Linsey J, Markman A, Wood K. Design by analogy: A study of the WordTree method for problem re-representation. ASME Transactions, Journal of Mechanical Design. 2012;**134**:041009-041012

[37] Aho A, Sethi R, Ullman J. Compilers: Principles, Techniques, and Tools. New York: Addison-Wesley; 1986

[38] McCaffrey A. Analogy Finder. U.S. Patent US9501469B2. Washington, DC: USPTO; November 22, 2016

[39] Marks P. Eureka machines. New Scientist. 2015;**227**(3036):32-35

**Chapter 8**



#### **Min *k*-Cut for Asset Selection in Risk-Based Portfolio Strategies**

DOI: 10.5772/intechopen.74455

Saejoon Kim and Soong Kim

Additional information is available at the end of the chapter

#### Abstract

Risk-based portfolio strategies such as the equal-weighted, the minimum variance, and the risk parity portfolios vie to find portfolios that are well diversified according to their respective measures. In this chapter, we propose asset-selected risk-based portfolio strategies that aim to reduce the two known weaknesses of these strategies, namely the large portfolio size and poor diversification with respect to other measures. We formulate this task as a minimum k-cut problem through which we establish asset selection from all assets in the investable universe before the risk-based strategy is applied. Empirical results on the data sets of the S&P 500 and the KOSPI 200 indicate that our asset-selected risk-based portfolio strategies possess superior properties across extensive performance measures compared to the baseline risk-based strategies.

Keywords: alternative portfolio management, smart beta strategy, risk-based portfolio, minimum k-cut, portfolio optimization

#### 1. Introduction

Portfolio selection has been a main research topic in finance for over 60 years, dating back to the seminal paper by Markowitz [1], which laid the foundation of modern portfolio theory. Markowitz's analysis established the mean–variance-efficient portfolios, which achieve an optimal tradeoff between return and risk. Extensive efforts have been made in various directions since then, including the development of the capital asset-pricing model (CAPM) [2]. Most recent studies on portfolio selection have focused on alternative or "smart" beta strategies, which exploit risk premia other than the systematic risk or seek a better diversification of risk. These are quantitative approaches to portfolio selection and have recently played an important role in the field of portfolio management. They lie in between active management and the passive management manifested by the perennial market capitalization-weighted portfolio. A major category in the class of alternative beta strategies is the risk-based portfolio strategies [3–6], whose main objective is to manage risk more effectively than the market capitalization-weighted portfolio.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Currently, the existing representative risk-based portfolio strategies include the equal-weighted portfolio in which every asset has an equal weight, the minimum variance portfolio that achieves the smallest volatility, and the risk parity portfolio of [7, 8] in which every asset in the portfolio is exposed to equal risk. Clearly, diversification benefits of the three risk-based portfolios do not perfectly coincide. An important and favorable characteristic of these strategies is that they do not require the estimation of the expected returns, which is very error-prone, in their formulations. A somewhat comprehensive description of these strategies in terms of risk factors was presented in [3], and it was shown that the equal-weighted and the risk parity portfolios are special cases of the constrained minimum variance portfolio in [7]. A general framework of the quantitative asset allocation models of the three risk-based portfolio strategies has been presented in [5], and a detailed comparison of the strategies has also been provided in [6].

In this chapter, we propose to improve the characteristic and the performance of the risk-based portfolio strategies. Firstly, we address the presence of an inherent problem in the exact implementations of the equal-weighted and the risk parity portfolios that arise from the large cardinality of the respective portfolios. By construction, each of these portfolios has cardinality equal to that of the investable universe which can be too large to be implemented exactly in practice for many investors. To this end, we investigate a preselection of assets from the set of investable universe prior to implementing the risk-based portfolio strategies in order to reduce the portfolio cardinality to a more manageable size. This part relates to the improvement in the "characteristic" of the risk-based strategies.

Secondly, we address the relative weakness of a risk-based strategy with respect to some diversification measures. For example, the minimum variance portfolio produces the portfolio with the smallest variance, however, also one that is poorly diversified with respect to weight and to risk. Similarly, the equal-weighted portfolio produces the portfolio with a perfect weight diversification but one that also has a relatively high variance. To this end, as in the first case, we consider a preselection of assets and, in particular, the selection of "diversified" assets that will endow each risk-based strategy a "better" assets pool from which the portfolios are constructed. Consequently, the risk-based portfolio strategies defined on our diversified assets pool will perform superior to those defined on the original investable universe across different diversification measures and also with respect to other more popular performance measures such as the return and the Sharpe ratio as well; our results will be shown later. This part relates to the improvement in the "performance" of the risk-based strategies.

The described improvements are achieved simultaneously by formulating the problem as a minimum k-cut problem with assets represented as vertices in the graph. As the risk-based strategies are applied only after the assets have been selected for inclusion in the assets pool, we call our proposed strategies "asset-selected risk-based portfolio strategies." Furthermore, our asset-selected risk-based portfolio strategies require modest additional computational cost to the respective baseline risk-based ones.
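To make the graph formulation concrete, here is a minimal sketch, not the authors' algorithm (their selection method is presented in Section 3): assets are vertices, pairwise correlations are edge weights, and clusters are greedily merged so that the correlation mass crossing clusters, i.e., the cut, stays small. The function name and toy data are hypothetical; exact minimum k-cut is hard in general, so a greedy heuristic stands in here.

```python
import numpy as np

def heuristic_k_cut(corr, k):
    """Greedy stand-in for a min k-cut: repeatedly merge the two clusters
    joined by the largest total inter-cluster correlation, so that the
    correlation mass left crossing clusters (the 'cut') stays small."""
    n = corr.shape[0]
    clusters = [{i} for i in range(n)]
    while len(clusters) > k:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # total correlation on edges between clusters a and b
                w = sum(corr[i, j] for i in clusters[a] for j in clusters[b])
                if w > best:
                    best, pair = w, (a, b)
        a, b = pair
        clusters[a] |= clusters.pop(b)
    return clusters

# Toy universe: assets 0-1 and 2-3 form highly correlated blocks.
corr = np.array([[1.0, 0.9, 0.1, 0.1],
                 [0.9, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.8],
                 [0.1, 0.1, 0.8, 1.0]])
print(heuristic_k_cut(corr, 2))  # [{0, 1}, {2, 3}]
```

One could then, for instance, keep one representative asset per cluster to obtain a smaller, mutually decorrelated assets pool before applying a risk-based strategy; the chapter's actual selection rule follows later.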

We tested and compared the risk-based strategies with the proposed asset-selected risk-based strategies on the data sets of the S&P 500 from 1990 to 2016 and the KOSPI 200 from 2002 to 2016 where the latter is the set of sector-diversified largest 200 companies by market capitalization listed in the Korea Stock Exchange. Therefore, the assets that we refer to in this chapter are all stocks.

The organization of this chapter is as follows. In the next section, we present the formulation of the three risk-based portfolio strategies along with the associated diversification measures. In Section 3, we present the minimum k-cut-based asset selection method that forms the basis from which our contributions of this chapter come. It is followed by the presentation of the asset-selected risk-based portfolio strategies in Section 4. Extensive empirical results for the strategies are presented in Section 5, and the conclusion of the chapter is provided in Section 6.

## 2. Risk-based portfolio strategies


We consider the following three types of risk-based portfolio strategies from which we aim to make asset selection in an effective fashion: equal-weighted, minimum variance, and risk parity. We also make a comparison with the market capitalization-weighted portfolio. While this portfolio does not generate competitive returns, it has the nice property of automatic rebalancing and serves as a common benchmark against which other alternative investment strategies can assess their performance. We explore these portfolios under the long-only constraint, which guarantees a unique solution and provides a more realistic investing environment for many investors. We list the four portfolio strategies subsequently, where the corresponding abbreviations are shown inside the parentheses:

1. Market capitalization-weighted portfolio (M)
2. Equal-weighted portfolio (EW)
3. Minimum variance portfolio (MV)
4. Risk parity portfolio (RP)


To formulate the portfolio strategies, let us introduce some notations that will be used throughout this chapter:

- N: the number of assets in the investable universe
- Σ: the N × N covariance matrix of asset returns, with entries $\sigma\_{ij}$
- σ: the vector of asset volatilities $\sigma\_1, \cdots, \sigma\_N$
- x: the portfolio weight vector
Therefore, in our notations, σ and x are length-N vectors, where t denotes the transpose operator. Note that $\sigma(\mathbf{x}) = \sqrt{\mathbf{x}^t \Sigma \mathbf{x}}$. As each portfolio strategy is completely defined by its portfolio weight vector, let us describe each strategy by its weight vector. EW is the strategy in which all assets are given equal weights. The weight vector xEW for EW is given by the following:

$$\mathbf{x}\_{EW} = \frac{1}{N}\mathbf{1} \tag{1}$$

Min *k*-Cut for Asset Selection in Risk-Based Portfolio Strategies

http://dx.doi.org/10.5772/intechopen.74455


where 1 is the all-ones vector. While the strategy looks too simple to generate consistent positive excess returns, it has been shown to outperform M considerably, and no other alternative investment strategy was consistently better than EW for a wide selection of markets and holding periods [3, 4]. MV [1] is the strategy in which the portfolio volatility is minimized. The weight vector xMV for MV is given by the following:

$$\begin{aligned} \mathbf{x}\_{MV} &= \quad \arg\min\_{\mathbf{x}} \left( \mathbf{x}^{t} \Sigma \mathbf{x} \right) \\ \text{s.t.} &\quad \mathbf{1}^{t} \mathbf{x} = 1, \\ &\quad \mathbf{x} \ge \mathbf{0}. \end{aligned} \tag{2}$$
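As a sketch, the long-only problem in Eq. (2) can be solved with a general-purpose solver (illustrative Python using `scipy.optimize`; the covariance matrix below is made up for the example):

```python
import numpy as np
from scipy.optimize import minimize

def min_variance_weights(Sigma):
    """Solve Eq. (2): minimize x' Sigma x  s.t.  1'x = 1, x >= 0."""
    n = Sigma.shape[0]
    res = minimize(
        lambda x: x @ Sigma @ x,
        np.full(n, 1.0 / n),                 # start from equal weights
        jac=lambda x: 2.0 * Sigma @ x,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,             # long-only constraint
        constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],
    )
    return res.x

# Hypothetical 3-asset covariance matrix, for illustration only
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
x_mv = min_variance_weights(Sigma)
```

By optimality, the resulting variance is no larger than that of any other feasible portfolio, including the equal-weighted one.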

MV provides an optimal return-risk tradeoff, and in particular, it lies on the leftmost tip of the efficient frontier curve. MV is perhaps the most stable portfolio among the ones along the efficient frontier curve, as most of the estimation errors come from the estimation of expected returns, and this fact adds to the appeal of MV. RP [7, 8] is the strategy in which the risk associated with each asset is the same across all assets in the portfolio. Specifically, let the risk contribution of asset i, RCi, be defined as

$$RC\_{i} = x\_{i} \cdot \frac{\partial\sigma(\mathbf{x})}{\partial x\_{i}} = \frac{x\_{i}^{2}\sigma\_{i}^{2} + x\_{i}\sum\_{j\neq i}x\_{j}\sigma\_{ij}}{\sigma(\mathbf{x})} \tag{3}$$

where $\partial\sigma(\mathbf{x})/\partial x\_i$ is the marginal risk contribution of asset i, and $\sigma\_{ij}$ is the covariance of assets i and j. RP requires that

$$RC\_i = RC\_j \quad \text{for all } i, j.$$
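Eq. (3) can be evaluated directly from x and Σ, and the contributions indeed sum to σ(x) (illustrative Python; the covariance matrix and weights are made up for the example):

```python
import numpy as np

def risk_contributions(x, Sigma):
    """Eq. (3): RC_i = x_i * (Sigma x)_i / sigma(x)."""
    sigma_x = np.sqrt(x @ Sigma @ x)      # portfolio volatility sigma(x)
    return x * (Sigma @ x) / sigma_x

# Hypothetical 3-asset covariance matrix, for illustration only
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
x = np.array([0.5, 0.3, 0.2])             # long-only weights summing to 1
rc = risk_contributions(x, Sigma)
```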

Note that $\sum\_{i=1}^{N} RC\_i = \sigma(\mathbf{x})$, and thus the risk contributions from all assets add up to the portfolio risk or volatility. We also note that in MV, the marginal risk contributions are equal for all assets, that is, $\partial\sigma(\mathbf{x})/\partial x\_i = \partial\sigma(\mathbf{x})/\partial x\_j$ for all i, j. It is known that RP possesses a unique solution in a long-only investment environment [9], which is our case under study. The weight vector xRP for RP is given by the following:

$$\begin{aligned} \mathbf{x}\_{RP} &= \quad \arg\min\_{\mathbf{x}} \left( \frac{1}{2} \mathbf{x}^{t} \Sigma \mathbf{x} - \sum\_{i=1}^{N} \ln x\_{i} \right) \\ \text{s.t.} &\quad \mathbf{1}^{t} \mathbf{x} = 1, \\ &\quad \mathbf{x} > \mathbf{0} \end{aligned} \tag{4}$$

which is a convex optimization formulation that can be computed efficiently [9]. One can obtain an equivalent, but computationally less efficient, optimization problem as earlier and show that RP functions as a tradeoff between EW, which maximizes weight diversification with perfect disregard for volatility or variance reduction, and MV, which minimizes variance with perfect disregard for weight diversification [7]. As a consequence, it can be deduced that

$$
\sigma(\mathbf{x}\_{MV}) \le \sigma(\mathbf{x}\_{RP}) \le \sigma(\mathbf{x}\_{EW}) \,.
$$
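A minimal sketch of solving the RP formulation: one common route is to drop the budget constraint, minimize the log-barrier objective of Eq. (4) over y > 0, and rescale so the weights sum to one, which preserves the equality of the risk contributions (illustrative Python using `scipy.optimize`; the covariance matrix is made up):

```python
import numpy as np
from scipy.optimize import minimize

def risk_parity_weights(Sigma):
    """Minimize 1/2 y' Sigma y - sum(ln y_i) over y > 0, then rescale.
    At the optimum Sigma y = 1/y, so all y_i (Sigma y)_i are equal,
    and rescaling preserves the equality of the risk contributions."""
    n = Sigma.shape[0]
    res = minimize(
        lambda y: 0.5 * y @ Sigma @ y - np.sum(np.log(y)),
        np.ones(n),
        jac=lambda y: Sigma @ y - 1.0 / y,
        method="L-BFGS-B",
        bounds=[(1e-8, None)] * n,           # keep y strictly positive
    )
    y = res.x
    return y / y.sum()                       # rescale so weights sum to 1

# Hypothetical 3-asset covariance matrix, for illustration only
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
x_rp = risk_parity_weights(Sigma)
rc = x_rp * (Sigma @ x_rp)                   # proportional to RC_i
```

On this toy matrix one can also check the deduced volatility ordering against the equal-weighted portfolio.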

To quantify the amount of diversification attained by the three risk-based portfolio strategies, in the following, we list the three diversification profile measures [6] for the weight vector x of a portfolio strategy. They are (1) weight diversification

$$\frac{1}{N\sum\_{i=1}^{N} x\_i^2},\tag{5}$$

(2) volatility reduction

portfolio weight vector, let us describe each strategy by its weight vector. EW is the strategy in which all assets are given equal weights. The weight vector xEW for EW is given by the

<sup>x</sup>EW <sup>¼</sup> <sup>1</sup>

where 1 is the all-ones vector. While the strategy looks too simple to generate consistent positive excess returns, it has shown to outperform M considerably, and no other alternative investment strategies were consistently better than EW for a wide selection of markets and holding periods [3, 4]. MV [1] is the strategy in which the portfolio volatility is minimized. The

<sup>x</sup>MV <sup>¼</sup> arg min<sup>x</sup> <sup>x</sup><sup>t</sup>

MV provides an optimal return-risk tradeoff, and in particular, it lies on the leftmost tip of the efficient frontier curve. MV is perhaps the most stable portfolio among the ones along the efficient frontier curve as most of the estimation errors come from that of the returns, and this fact adds to the appeal of MV. RP [7, 8] is the strategy in which the risk associated with each asset is the same across all assets in the portfolio. Specifically, let the risk contribution of asset i,

> ¼ x2 i σ2 <sup>i</sup> þ xi P <sup>j</sup>6¼<sup>i</sup> xjσij

RCi ¼ RCj for all i, j:

risk or volatility. We also note that in MV, the marginal risk contributions are all equal for all

only investment environment [9] which is our case under study. The weight vector xRP for RP

1 2 xt

<sup>x</sup>RP <sup>¼</sup> arg min<sup>x</sup>

s:t: 1<sup>t</sup>

x ¼ 1, x > 0

<sup>δ</sup>xi is the marginal risk contribution of asset i, and σij is the covariance of assets i and j.

RCi ¼ σð Þx , and thus the risk contribution from each asset adds up to the portfolio

for all i, j: It is known that RP possesses a unique solution in long-

<sup>Σ</sup><sup>x</sup> �X<sup>n</sup> i¼1 lnxi

!

x ¼ 1, x ≥ 0:

s:t: 1<sup>t</sup>

Σx � �

weight vector xMV for MV is given by the following:

164 Artificial Intelligence - Emerging Trends and Applications

RCi ¼ xi �

δσð Þx δxi

<sup>N</sup> <sup>1</sup> (1)

<sup>σ</sup>ð Þ<sup>x</sup> (3)

(2)

(4)

following:

RCi, be defined as

where δσð Þ<sup>x</sup>

Note that P

RP requires that.

N i¼1

assets, that is, δσð Þ<sup>x</sup>

is given by the following:

<sup>δ</sup>xi <sup>¼</sup> δσð Þ<sup>x</sup> δxj

$$\frac{\mathbf{x}\_{MV}^t \Sigma \mathbf{x}\_{MV}}{\mathbf{x}^t \Sigma \mathbf{x}} \tag{6}$$

and (3) risk diversification

$$\frac{1}{N\sum\_{i=1}^{N} \left(\mathcal{RC}\_i/\sigma(\mathbf{x})\right)^2}. \tag{7}$$

These equations show that the weight vector x for EW, MV, and RP achieves the highest weight diversification, volatility reduction, and risk diversification, respectively. Note that each of the measures assumes values between 0 and 1, which represent the lowest and the highest levels of diversification attained, respectively.
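The three measures in Eqs. (5)–(7) can be computed directly from x and Σ (illustrative Python; `x_mv` denotes a precomputed MV weight vector, and the covariance matrix is made up):

```python
import numpy as np

def weight_diversification(x):
    # Eq. (5): equals 1 for EW and tends to 1/N for concentrated portfolios.
    return 1.0 / (len(x) * np.sum(x ** 2))

def volatility_reduction(x, Sigma, x_mv):
    # Eq. (6): variance of the MV portfolio over the variance of x.
    return (x_mv @ Sigma @ x_mv) / (x @ Sigma @ x)

def risk_diversification(x, Sigma):
    # Eq. (7): equals 1 when all risk contributions RC_i are equal (RP).
    var = x @ Sigma @ x
    rc_over_sigma = x * (Sigma @ x) / var    # RC_i / sigma(x)
    return 1.0 / (len(x) * np.sum(rc_over_sigma ** 2))

# Hypothetical 3-asset example, for illustration only
Sigma = np.diag([0.04, 0.09, 0.16])
x_ew = np.full(3, 1.0 / 3.0)
```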

To get an idea of how the three risk-based strategies fare with respect to the three diversification profile measures, in Table 1, we list the values of the risk-based portfolio strategy-diversification profile measure pairs for the S&P 500 and the KOSPI 200 data. The values shown are the averages of the 105 and 56 periods we used in our experiments for the S&P 500 and the KOSPI 200 data, respectively. The table shows a similar pattern for the two data sets with respect to the strong and weak profile measures for each risk-based portfolio strategy, suggesting that this is a characteristic of the strategies. In particular, EW showed a relatively high risk diversification but a relatively weak volatility reduction, as intuition might suggest. MV showed a very low weight diversification and risk diversification. RP showed a relatively high weight diversification but a relatively low volatility reduction.

Before we further proceed to examine our asset-selected portfolio strategies, let us describe our experiment setting which is as follows. For the S&P 500 data, the investing time horizon spans from February 1, 1990, to May 2, 2016, that constitutes a total of 105 quarters, and for the



KOSPI 200 data, it spans from May 2, 2002, to May 2, 2016, for a total of 56 quarters. Both sets of data were collected on May 2, 2016, and the stock prices were adjusted for dividends and splits before the experiment.

The daily closing prices of stocks in the formation or look-back period of a length of 252 days were used to execute the risk-based portfolio strategies, whose resulting portfolios were put into effect the following trading day. There were no missing data in the closing prices of stocks, and no preprocessing of the data was made, as is common in the finance literature. This portfolio is held for one quarter, after which portfolio rebalancing is performed. By portfolio rebalancing, we mean an independent and new execution of the portfolio strategy using data in the formation period of the most recent 252 days to update or rebalance the portfolio. This process is iterated throughout the investing time horizon; specifically, portfolio rebalancing was made after market close on the last trading day of January, April, July, and October of each year. To clarify the terms used in this chapter, in reference to the date of portfolio formation (for the first portfolio) or rebalancing (for portfolios thereafter), let us call the preceding 252 days the "formation period" and the subsequent quarter, typically 58–60 days, the "holding quarter." The sequence of consecutive holding quarters is termed the "holding period" in this chapter, which is the entire investing time horizon of our experiment.
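The rolling formation/holding scheme can be sketched as follows (illustrative Python; fixed 63-day holding quarters stand in for the chapter's calendar-based rebalancing dates, and the price data are synthetic):

```python
import numpy as np

def backtest_weights(prices, strategy, formation=252, holding=63):
    """Every `holding` trading days, re-estimate the covariance matrix
    from the previous `formation` daily log returns and re-run the
    strategy; returns the rebalancing schedule as (day, weights) pairs."""
    log_rets = np.diff(np.log(prices), axis=0)
    schedule = []
    for t in range(formation, log_rets.shape[0], holding):
        Sigma = np.cov(log_rets[t - formation:t], rowvar=False)
        schedule.append((t, strategy(Sigma)))
    return schedule

# Example with the equal-weighted strategy on synthetic prices (600 days, 4 assets)
rng = np.random.default_rng(1)
prices = np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(600, 4)), axis=0))
ew = lambda Sigma: np.full(Sigma.shape[0], 1.0 / Sigma.shape[0])
sched = backtest_weights(prices, ew)
```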

## 3. Min k-cut for asset selection

The risk-based portfolio strategies EW, MV, and RP described in the previous section provide a substantial return–risk tradeoff advantage compared to M [3]. Moreover, the qualitative feature that is important about these strategies is that each achieves the maximum of the associated diversification profile measure. Building on top of these risk-based strategies, we present an improvement in two directions in this chapter. Firstly, we consider the size of the risk-based portfolio, which is completely determined by the strategy that defines it. For example, strategies EW and RP, by definition, generate portfolios whose size is equal to that of the assets pool, which is the universe of all investable assets, specifically, N. Therefore, to accurately implement EW and RP, the investor has to hold all assets that exist in the investable universe (normally, in the order of hundreds) in the portfolio. This may be too difficult to achieve in practice, and furthermore, holding all assets in the investable universe may not be an investor's idea of a portfolio. Thus in reality, the implementation of EW and RP is sometimes vastly approximated by various heuristic approaches created by the investor. In this chapter, we present a systematic way to reduce and control the size of the portfolios generated by these strategies. This characteristic of our proposed strategy may be very beneficial from the practical point of view. Strategy MV, on the other hand, normally generates a very concentrated set of assets whose size may be larger or, typically, smaller than an investor's preferred value. In the case of the former, our contribution will provide a systematic way to reduce the portfolio size for this strategy. In the case of the latter, our proposed strategy may have a slight adverse effect in this respect. As a side note, for the MV strategy, one may add a cardinality constraint to Eq. (2) to match the size of the portfolio with the investor's investment constraints; however, this will result in a mixed integer quadratic program that is proven to be computationally hard.

Secondly, we address the possibility of performance enhancement of the risk-based portfolio strategies. Specifically, we have witnessed that risk-based strategies show, sometimes serious, weakness in some of the diversification profile measures. As this phenomenon is attributed to the disregard for one measure of diversification by a strategy that optimizes a different measure of diversification, we consider an assets pool that is well diversified on which the risk-based strategies are executed. For this matter, we present a systematic way to execute assets selection from the pool of all investable assets such that its effect will be an improved performance across all diversification profile measures and return-risk tradeoffs.

| | Weight diversification | Volatility reduction | Risk diversification |
|---|---|---|---|
| **S&P 500** | | | |
| EW | 1.000 | 0.229 | 0.838 |
| MV | 0.047 | 1.000 | 0.047 |
| RP | 0.765 | 0.329 | 1.000 |
| **KOSPI 200** | | | |
| EW | 1.000 | 0.290 | 0.873 |
| MV | 0.095 | 1.000 | 0.095 |
| RP | 0.801 | 0.397 | 1.000 |

Table 1. Diversification profile measure summary.

Relating the above two directions of improvement, in this chapter, we present an assets selection method that realizes the two improvements simultaneously. To describe our method, consider the correlation matrix $R = (r\_{ij})\_{i,j=1}^{N}$ of the set of all investable assets, where $r\_{ij} \triangleq \sigma\_{ij}/(\sigma\_i \sigma\_j)$ is the correlation coefficient between assets i and j. To make the matrix nonnegative, let $\tilde{r}\_{ij} \triangleq e^{r\_{ij}}$ for all $i, j = 1, \cdots, N$, and let us consider the new matrix $\tilde{R} = (\tilde{r}\_{ij})\_{i,j=1}^{N}$. Now, form the weighted graph G whose adjacency matrix is $\tilde{R}$. This complete graph G has the property that a more (less) correlated pair of assets has a higher (lower) weighted edge. Therefore, to obtain a set of k ($\ll N$) assets that are least correlated with each other, one may partition the graph G into k connected subgraphs so that the edges removed to obtain the partition have minimum weight, and then pick an asset in each partition according to some rule. This approach to obtaining a set of k assets with the described property complements the main objective of some of the risk-based portfolios. Specifically, recall that the main objectives of EW and RP are the maximization of weight and risk diversification, respectively, with perfect disregard for other measures. This suggests the presence of, possibly highly, correlated assets in the obtained portfolios. Thus, for these strategies, the preselection of assets seems beneficial. On the other hand, the main objective of MV is the maximization of volatility reduction, in which less correlated assets are selected to some degree. Therefore, for this strategy, the benefit of the preselection of assets with the described property seems not to be as large as in the other cases.
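Constructing the nonnegative matrix from return data takes two lines (illustrative Python with synthetic returns):

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(size=(252, 5))      # hypothetical daily returns (T x N)

R = np.corrcoef(returns, rowvar=False)   # correlation matrix of the assets
R_tilde = np.exp(R)                      # entrywise e^{r_ij}: nonnegative adjacency of G
```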


In all cases, by reducing the assets pool from the universe of all investable assets of size N to a set of k diversified assets with respect to correlation with other assets, we have effectively executed a diversified assets selection. Therefore, it remains to describe (1) how to partition the graph into k connected subgraphs, satisfying the constraints mentioned above, and (2) how to pick an asset in each partition.

The first part of this is precisely the minimum k-cut problem, for which finding the exact solution is well known to be NP-hard [10]. To this end, we use an efficient approximation algorithm for this problem that finds a minimum k-cut within a factor of $2(1 - 1/k)$ of the optimal, due to [11], which is as follows:

Min k-cut approximation algorithm [11]:

1. For each edge $\tilde{r}\_{ij}$, pick a minimum weight cut that separates the end points of $\tilde{r}\_{ij}$.
2. Sort these cuts by increasing weight, obtaining the list $r'\_1, r'\_2, \cdots, r'\_{N(N-1)/2}$.
3. Greedily, pick cuts from this list until their union is a k-cut; cut $r'\_i$ is picked only if it is not contained in $r'\_1 \cup \cdots \cup r'\_{i-1}$.
We note that the factor of $2(1 - 1/k)$ of the optimal is still the best known approximation factor for tractable algorithms for the minimum k-cut problem [12]. The complexity of this algorithm is dominated by that of finding the cuts $r'\_1, r'\_2, \cdots, r'\_{N(N-1)/2}$, which can be calculated efficiently through the use of the Gomory-Hu tree representation. Specifically, $N - 1$ max-flow computations suffice to implement the above min k-cut approximation algorithm. Moreover, using the Gomory-Hu tree representation, all partitions but one in the k-cut contain exactly one vertex each, with the remaining $N - k + 1$ vertices being contained in the last partition. This characteristic of the algorithm, when used with the Gomory-Hu tree representation, almost eliminates the need for the second part of our diversified assets selection, as we need to pick only one vertex in the only partition that contains more than one vertex. Nevertheless, it remains to describe how to pick an asset in this last partition with more than one asset. To this end, we define the affinity of asset i, $a(i)$, as

$$a(i) = \sum\_{j \neq i} \tilde{r}\_{ij} \tag{8}$$

from the matrix $\tilde{R}$. Therefore, the affinity of asset i gives a measure of how correlated asset i is with all other assets. Eq. (8) shows that the larger the value of $a(i)$, the more correlated asset i is with the other assets. To pick the one vertex in the last partition with $N - k + 1$ vertices, we picked the vertex i with the highest $a(i)$ in the partition, as this vertex would appropriately serve as the "representative" of this partition. To see the effect of using affinity as the criterion for asset selection, we also tried picking the one vertex with the smallest $a(i)$ in the partition. We labeled the former asset selection method "Max," and the latter one "Min." We note that this second part adds negligible computational burden to the overall diversified assets selection process, as the correlation matrix R, and thereby $\tilde{R}$, is easily obtained from the covariance matrix Σ.
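The affinity-based "Max"/"Min" selection can be sketched as follows (illustrative Python; the correlation matrix and the partition membership are made up for the example):

```python
import numpy as np

# Hypothetical correlation matrix for 4 assets; asset 1 is correlated
# more broadly than asset 0, so its affinity is higher.
R = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.3, 0.3],
              [0.1, 0.3, 1.0, 0.9],
              [0.1, 0.3, 0.9, 1.0]])
R_tilde = np.exp(R)                           # entrywise e^{r_ij}

# Eq. (8): a(i) = sum over j != i of r~_ij (drop the diagonal term)
a = R_tilde.sum(axis=1) - np.diag(R_tilde)

last_partition = [0, 1]                       # the one multi-asset partition
rep_max = max(last_partition, key=lambda i: a[i])   # "Max" selection rule
rep_min = min(last_partition, key=lambda i: a[i])   # "Min" selection rule
```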

In the execution of the overall asset-selected portfolio selection, this part of asset selection is executed first and then the risk-based portfolio strategy is conducted. We have presented an "efficient" method to obtain a reduction in the size of the assets pool to any number of choices in this section. In the next two sections, we empirically demonstrate that this method also proves to be very "effective," characterized by superior return–risk tradeoff performances compared to the baseline risk-based portfolio strategies.
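The partitioning step described above can be sketched with the Gomory-Hu tree: splitting the tree at its k − 1 lightest edges yields the k partitions (illustrative Python; assumes the `networkx` package, whose `gomory_hu_tree` routine performs the underlying max-flow computations):

```python
import numpy as np
import networkx as nx

def min_k_cut_partition(R_tilde, k):
    """Approximate min k-cut sketch: build the complete graph with edge
    capacities r~_ij, compute its Gomory-Hu tree, and remove the k-1
    lightest tree edges; the resulting components are the k partitions."""
    N = R_tilde.shape[0]
    G = nx.Graph()
    for i in range(N):
        for j in range(i + 1, N):
            G.add_edge(i, j, capacity=float(R_tilde[i, j]))
    T = nx.gomory_hu_tree(G)                  # tree edges carry min-cut weights
    lightest = sorted(T.edges(data="weight"), key=lambda e: e[2])[: k - 1]
    T.remove_edges_from([(u, v) for u, v, _ in lightest])
    return [set(c) for c in nx.connected_components(T)]

# Toy block-correlated example: assets {0,1} and {2,3} are highly correlated,
# so the min 2-cut should separate the two blocks.
R = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
parts = min_k_cut_partition(np.exp(R), 2)
```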

## 4. Asset-selected risk-based portfolio strategies

strategy, the benefit of the preselection of assets with the described property seems not to be as

In all cases, by reducing the assets pool from the universe of all investable assets of size $N$ to a set of $k$ diversified assets with respect to correlation with other assets, we have effectively executed a diversified assets selection. Therefore, it remains to describe (1) how to partition the graph into $k$ connected subgraphs satisfying the constraints mentioned above and (2) how to pick an asset in each partition.

The first part is precisely the minimum $k$-cut problem, for which finding the exact solution is well known to be NP-hard [10]. To this end, we use an efficient approximation algorithm, due to [11], that finds a minimum $k$-cut within a factor of $2\left(1-\frac{1}{k}\right)$ of the optimal. It is as follows:

Min $k$-cut approximation algorithm [11]:

1. For each edge $\tilde{r}_{ij}$, pick a minimum weight cut that separates the end points of $\tilde{r}_{ij}$.
2. Sort these cuts by increasing weight, obtaining the list $r'_1, r'_2, \dots, r'_{N(N-1)/2}$.
3. Greedily pick cuts from this list until their union is a $k$-cut; cut $r'_i$ is picked only if it is not contained in $r'_1 \cup \dots \cup r'_{i-1}$.

We note that the factor of $2\left(1-\frac{1}{k}\right)$ of the optimal is still the best known approximation factor for tractable algorithms for the minimum $k$-cut problem [12]. The complexity of this algorithm is dominated by that of finding the cuts $r'_1, r'_2, \dots, r'_{N(N-1)/2}$, which can be calculated efficiently through the use of the Gomory-Hu tree representation. Specifically, $N-1$ max-flow computations suffice to implement the above min $k$-cut approximation algorithm. Moreover, using the Gomory-Hu tree representation, all partitions but one in the $k$-cut contain exactly one vertex each, with the remaining $N-k+1$ vertices contained in the last partition. This characteristic of the algorithm, when used with the Gomory-Hu tree representation, almost eliminates the need for the second part of our diversified assets selection, as we need to pick only one vertex in the only partition that contains more than one vertex. Nevertheless, it remains to describe how to pick an asset in this last partition with more than one asset. To this end, we define the affinity of asset $i$, $a(i)$, as

$$a(i) = \sum_{j \neq i} \tilde{r}_{ij} \qquad (8)$$

from the matrix $\tilde{R}$. The affinity of asset $i$ therefore measures how correlated asset $i$ is with all other assets; Eq. (8) shows that the larger the value of $a(i)$, the more correlated asset $i$ is with the other assets. To pick the one vertex in the last partition with $N-k+1$ vertices, we pick the vertex $i$ with the highest $a(i)$ in the partition, as this vertex appropriately serves as the "representative" of the partition. To see the effect of using affinity as the criterion for asset selection, we also tried picking the one vertex with the smallest $a(i)$ in the partition. We label the former asset selection method "Max" and the latter "Min." We note that this second part adds negligible computational burden to the overall procedure, as the computation involved is not as large as in the other cases.
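The partition-and-pick procedure above can be sketched in a few lines. The sketch below is our own illustration, not the authors' implementation: it assumes NetworkX's `gomory_hu_tree` is available and relies on the standard fact that removing the $k-1$ lightest edges of the Gomory-Hu tree yields a $k$-cut within the stated $2(1-1/k)$ factor; all function and variable names are ours.

```python
import networkx as nx
import numpy as np

def asset_selection(corr, k, rule="Max"):
    """Partition assets via an approximate minimum k-cut of the
    correlation graph, then pick one representative per partition
    by affinity a(i) = sum_{j != i} |corr[i, j]| (cf. Eq. (8))."""
    n = corr.shape[0]
    G = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            G.add_edge(i, j, weight=abs(corr[i, j]))
    # Gomory-Hu tree: n - 1 max-flow computations encode all min s-t cuts.
    T = nx.gomory_hu_tree(G, capacity="weight")
    # Removing the k - 1 lightest tree edges yields a k-cut within
    # 2 * (1 - 1/k) of the optimal (Saran-Vazirani style bound).
    light = sorted(T.edges(data="weight"), key=lambda e: e[2])[: k - 1]
    T.remove_edges_from([(u, v) for u, v, _ in light])
    partitions = [set(c) for c in nx.connected_components(T)]
    # Affinity rule: "Max" takes the most correlated vertex of each
    # partition as its representative, "Min" the least correlated.
    affinity = np.abs(corr).sum(axis=1) - 1.0
    key = (lambda i: affinity[i]) if rule == "Max" else (lambda i: -affinity[i])
    return sorted(max(p, key=key) for p in partitions)
```

Since the Gomory-Hu construction leaves $k-1$ singleton partitions and one large partition, the affinity rule effectively decides only the representative of the large partition; applying it uniformly, as above, is harmless because a singleton has a unique candidate.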

In this section, we formally present our proposed asset-selected risk-based portfolio strategies. As mentioned before, the purpose of our proposed asset selection-based strategy is twofold. The first is endowing the investor with the option to choose the exact size of the portfolio when the risk-based strategy is either EW or RP. The second is obtaining a superior return-risk tradeoff through effective subset selection of the assets prior to applying the risk-based portfolio strategy. We empirically demonstrate that our proposed strategies can generate returns substantially higher than those of the pure, or baseline, risk-based strategies.

Now, to formally describe the strategies, let us denote the baseline risk-based strategy, which does not employ asset selection, by S, and the asset-selected risk-based strategy by

S\_μ,

where S is the name of the risk-based portfolio strategy, that is,

S ∈ {EW, MV, RP},

and μ is the name of the asset selection method. For our asset selection-based strategies, we set the number of partitions k to approximately 25% of the total number of assets N in all of the experiments conducted in this chapter. In addition to the two methods of μ described in the previous section, we also tried selecting the asset with the largest market capitalization in the last partition with N − k + 1 assets. We denote this method by "MC," that is,

μ ∈ {Max, Min, MC}.
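As an illustration of the S\_μ notation, the hypothetical helper below applies a baseline risk-based weighting only to the selected subset, leaving unselected assets at weight zero. It is a hedged sketch: the minimum-variance weights use the unconstrained closed form w ∝ Σ⁻¹1, and the risk parity weights use the common inverse-volatility simplification, which equals true equal risk contribution only for uncorrelated assets; neither is necessarily the chapter's exact formulation.

```python
import numpy as np

def equal_weight(cov):
    # EW: 1/n on every asset in the subset.
    n = cov.shape[0]
    return np.full(n, 1.0 / n)

def min_variance(cov):
    # MV (unconstrained closed form): w ∝ Σ^{-1} 1, normalized to sum to 1.
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return w / w.sum()

def risk_parity(cov):
    # RP proxy: inverse-volatility weights (exact equal-risk-contribution
    # weights require a numerical solve when assets are correlated).
    iv = 1.0 / np.sqrt(np.diag(cov))
    return iv / iv.sum()

def strategy_S_mu(cov, selected, base):
    """S_mu: run the baseline strategy S on the k selected assets only."""
    fn = {"EW": equal_weight, "MV": min_variance, "RP": risk_parity}[base]
    idx = np.asarray(selected)
    w = np.zeros(cov.shape[0])
    w[idx] = fn(cov[np.ix_(idx, idx)])
    return w
```

For example, `strategy_S_mu(cov, asset_selection(corr, k), "EW")` would correspond to EW\_Max under the "Max" selection rule.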

To illustrate the performance of these risk-based portfolio strategies, the next three figures, Figures 1–3, exhibit the annualized return versus annualized volatility plots of the three strategies, respectively, for the S&P 500 (marked by ∘) and the KOSPI 200 (marked by Δ) data. Figure 1 shows the plots for the EW-based strategies for the two data sets. Any point to the left of and/or above the point of the baseline strategy can be interpreted as an improvement over the baseline strategy. We observe that EW\_Max, EW\_Min, and EW\_MC all lie above and/or to the left of EW for both data sets. The defining favorable characteristic of the asset selection-based strategies that can be deduced from the performances of EW\_Max, EW\_Min, and EW\_MC is that the number of assets in the portfolio can be reduced significantly to any number of the investor's choice, facilitating easier portfolio management while, at the same time, improving the return–risk performance.

Figure 1. Annualized return versus volatility for EW-based strategies.

| | Weight diversification | Volatility reduction | Risk diversification |
|---|---|---|---|
| *S&P 500* | | | |
| EW\_Max/Min/MC | 1.000 | 0.349/0.344/0.350 | 0.835/0.829/0.835 |
| MV\_Max/Min/MC | 0.131/0.136/0.131 | 1.000 | 0.131/0.137/0.131 |
| RP\_Max/Min/MC | 0.842/0.826/0.843 | 0.479/0.484/0.480 | 1.000 |
| *KOSPI 200* | | | |
| EW\_Max/Min/MC | 1.000 | 0.456/0.440/0.470 | 0.863/0.857/0.869 |
| MV\_Max/Min/MC | 0.253/0.256/0.257 | 1.000 | 0.253/0.257/0.257 |
| RP\_Max/Min/MC | 0.859/0.835/0.864 | 0.603/0.595/0.612 | 1.000 |

Table 2. Diversification profile measure summary for asset selection-based strategies.

Min *k*-Cut for Asset Selection in Risk-Based Portfolio Strategies

http://dx.doi.org/10.5772/intechopen.74455


Figure 2 shows similar plots for the MV-based strategies. As one can expect, since the baseline strategy achieves the minimum volatility among all strategy types, improvement could not be, and was not, made with respect to annualized volatility by the asset selection-based strategies. On the other hand, for the KOSPI 200 data set in particular, the asset selection-based strategies were able to produce higher returns than the baseline strategy, which may be attributable to the diversified assets selection.

Figure 2. Annualized return versus volatility for MV-based strategies.

Figure 3 shows the plots for the RP-based strategies; in particular, the plots resemble those of the EW-based strategies, that is, RP\_Max, RP\_Min, and RP\_MC all lie above and/or to the left of RP. As RP produces the portfolio only with equal-risk exposure to all assets, it clearly benefits from the diversified assets selection.

Figure 3. Annualized return versus volatility for RP-based strategies.

In Table 2, we list the values of the three diversification profile measures of S\_μ, S ∈ {EW, MV, RP} and μ ∈ {Max, Min, MC}, for the S&P 500 and the KOSPI 200 data, analogous to Table 1. The diversification profile measures were calculated so that S\_μ, S ∈ {EW, MV, RP}, achieves 1 for weight diversification, volatility reduction, and risk diversification, respectively, for every μ ∈ {Max, Min, MC}. Table 2 shows a trend similar to that of Table 1 with respect to the stronger and weaker profile measures for each of the asset-selected risk-based portfolio strategies. A noteworthy finding, however, is that, compared to Table 1, the values have increased significantly across all profile measures for all portfolio strategies. Furthermore, this improvement is consistent across all asset selection methods and both data sets. This result serves as good evidence that effective subset selection of assets prior to applying the risk-based portfolio strategy improves the "quality" of the risk-based strategies in all of the diversification profile measures considered. In summary, Figures 1–3 and Tables 1 and 2 indicate that our proposed asset-selected risk-based portfolio strategies provide a clearly superior return–risk tradeoff and clearly superior diversification profile measures compared to the baseline risk-based portfolio strategies.

## 5. Results

In this section, we present a broader set of empirical results for the asset-selected risk-based portfolio strategies presented in this chapter. This allows us to better understand the advantages and disadvantages of the asset-selected portfolio strategies. To this end, we examine the following set of performance measures: (A) cumulative return, (B) annual return average, (C) annual return standard deviation, (D) annualized return, (E) annualized volatility, (F) Sharpe ratio, (G) beta, (H) portfolio size, (I) maximum drawdown, and (J) one-way turnover. In this set of performance measures, all return measures are simple returns except the "cumulative return (A)," whose value is set to 1 on the first day of the holding period to explicitly show the increase in the value of the initial asset throughout the entire investing time horizon. The "portfolio size (H)" is the ratio of the size of the portfolio to that of the investable universe, which is equal to N. We recall that the performances of all strategies considered in this chapter, except the market capitalization-weighted portfolio M, reflect survivorship bias of the same degree, as all data pertaining to assets were collected on the same day. For the M strategy, we used the index to calculate the performance measures.
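Measures (A), (D), (E), (F), (I), and (J) can be computed from a series of simple daily returns as in the minimal sketch below. The 252-day annualization, the zero risk-free rate in the Sharpe ratio, the fractional drawdown definition, and the turnover convention (sum of weight increases per rebalance) are our illustrative assumptions; the chapter does not spell out its exact conventions, and its tables may report drawdown in different units.

```python
import numpy as np

def performance_summary(daily_returns, weight_history=None, periods_per_year=252):
    """Sketch of measures (A), (D), (E), (F), (I), and (J) from a series
    of simple daily returns and, optionally, rebalance-date weights."""
    r = np.asarray(daily_returns, dtype=float)
    wealth = np.cumprod(1.0 + r)                 # value path starting from 1
    years = len(r) / periods_per_year
    out = {
        "cumulative_return": wealth[-1],                              # (A)
        "annualized_return": wealth[-1] ** (1.0 / years) - 1.0,       # (D)
        "annualized_vol": r.std(ddof=1) * np.sqrt(periods_per_year),  # (E)
    }
    out["sharpe"] = out["annualized_return"] / out["annualized_vol"]  # (F)
    peak = np.maximum.accumulate(wealth)
    out["max_drawdown"] = (1.0 - wealth / peak).max()   # (I), as a fraction
    if weight_history is not None:
        # (J) one-way turnover: average sum of weight increases, i.e. half
        # the total absolute change for a fully invested portfolio.
        W = np.asarray(weight_history, dtype=float)
        out["turnover"] = np.clip(np.diff(W, axis=0), 0.0, None).sum(axis=1).mean()
    return out
```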

#### 5.1. S&P 500 data

Table 3 shows the results of the M, EW-based, MV-based, and RP-based strategies for the S&P 500 data. For the EW-based and the RP-based strategies, our asset-selected strategies showed clearly superior performance in terms of all types of returns considered, a comparable or slightly worse maximum drawdown, and an inferior one-way turnover compared to the respective baseline strategies. This single drawback of higher one-way turnover for the asset-selected strategies appears to be an intrinsic characteristic that stems from their construction and can be viewed as an implementation cost for the improved return-risk tradeoff. For the MV-based strategies, essentially no improvement was gained through asset selection. This behavior can be attributed to the fact that the baseline MV portfolio is already a somewhat diversified portfolio with respect to the covariances between assets, so that adding the asset selection phase in portfolio construction does not help in terms of the performance measures. In fact, for the S&P 500 data, asset selection only contributed adverse effects in terms of portfolio size, maximum drawdown, and one-way turnover, as indicated by the table. Therefore, to improve the MV strategy by a similar order of magnitude as the EW and RP strategies, it seems that different asset selection methods need to be explored. We note that in all of the asset-selected risk-based strategies, as k − 1 assets have already been selected before the last asset is selected using the asset selection method μ, the choice of the asset selection method did not seem to matter significantly in terms of the performances, as shown in the table.

| | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| M | 6.282 | 0.085 | 0.159 | 0.073 | 0.180 | 0.243 | 1.000 | — | 2.200 | — |
| EW | 73.249 | 0.190 | 0.171 | 0.178 | 0.183 | 0.816 | 0.977 | 1.000 | 12.885 | 0.019 |
| EW\_Max | 95.972 | 0.203 | 0.180 | 0.190 | 0.183 | 0.881 | 0.965 | 0.252 | 16.067 | 0.370 |
| EW\_Min | 94.036 | 0.202 | 0.179 | 0.189 | 0.182 | 0.881 | 0.957 | 0.252 | 15.730 | 0.368 |
| EW\_MC | 93.786 | 0.202 | 0.179 | 0.189 | 0.183 | 0.875 | 0.965 | 0.252 | 15.786 | 0.366 |
| MV | 39.299 | 0.156 | 0.136 | 0.150 | 0.106 | 1.141 | 0.453 | 0.106 | 4.064 | 0.379 |
| MV\_Max | 40.084 | 0.158 | 0.130 | 0.151 | 0.117 | 1.049 | 0.489 | 0.068 | 5.505 | 0.573 |
| MV\_Min | 39.043 | 0.156 | 0.130 | 0.150 | 0.116 | 1.049 | 0.483 | 0.070 | 5.575 | 0.562 |
| MV\_MC | 40.218 | 0.158 | 0.130 | 0.151 | 0.117 | 1.050 | 0.489 | 0.068 | 5.531 | 0.574 |
| RP | 57.126 | 0.176 | 0.154 | 0.167 | 0.159 | 0.868 | 0.840 | 1.000 | 9.487 | 0.072 |
| RP\_Max | 68.401 | 0.185 | 0.159 | 0.175 | 0.160 | 0.912 | 0.836 | 0.252 | 11.168 | 0.398 |
| RP\_Min | 66.100 | 0.183 | 0.157 | 0.173 | 0.158 | 0.913 | 0.823 | 0.252 | 10.798 | 0.392 |
| RP\_MC | 67.305 | 0.184 | 0.158 | 0.174 | 0.160 | 0.908 | 0.836 | 0.252 | 11.040 | 0.394 |

Table 3. Performance summary for S&P 500 data.



Next, in Figure 4, we show the curves representing the cumulative returns over the entire investing time horizon for the S&P 500 data. For the asset-selected strategies, only the μ = Max method is shown in the figure, as this method was generally the best, albeit marginally, among the three methods. All curves start at 1, and we used the same color to represent the same type of risk-based portfolio strategy. Figure 4 shows that EW\_Max performs the best, and in particular, all asset-selected risk-based strategies outperform their respective baseline strategies, sometimes significantly.

Figure 4. Cumulative return curves for S&P 500 data.

#### 5.2. KOSPI 200 data

Table 4 shows the results of the M, EW-based, MV-based, and RP-based strategies for the KOSPI 200 data. Similar to Table 3, for the EW-based and the RP-based strategies, our asset-selected strategies showed clearly superior performance in terms of all types of returns considered, in this case including maximum drawdown as well, and an inferior one-way turnover compared to the respective baseline strategies. As in Table 3, the portfolio sizes are approximately 25% of N, which facilitates easy portfolio management in contrast to the baseline strategies. Even for the MV strategy, our asset-selected strategies produced performance improvements across all measures but the one-way turnover. The magnitude of the improvement was not as large as for the asset-selected strategies in the EW and RP cases; however, even in this MV case, adding the asset selection phase in portfolio construction facilitated a more comprehensive assets diversification than that obtainable through variance minimization alone. As before, the choice of the asset selection method did not seem to matter significantly, while the μ = Max method generally outperformed the others. Thus, for the KOSPI 200 data, our proposed asset-selected strategy produced clearly superior performance to the baseline strategy for all strategy types considered in this chapter.

| | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| M | 2.267 | 0.084 | 0.230 | 0.061 | 0.229 | 0.113 | 1.000 | — | 1.044 | — |
| EW | 10.495 | 0.216 | 0.299 | 0.186 | 0.210 | 0.718 | 0.840 | 1.000 | 2.232 | 0.021 |
| EW\_Max | 18.156 | 0.256 | 0.283 | 0.234 | 0.215 | 0.924 | 0.840 | 0.254 | 2.439 | 0.396 |
| EW\_Min | 18.216 | 0.257 | 0.283 | 0.235 | 0.211 | 0.946 | 0.815 | 0.254 | 2.463 | 0.400 |
| EW\_MC | 18.003 | 0.254 | 0.278 | 0.234 | 0.212 | 0.933 | 0.834 | 0.254 | 2.358 | 0.390 |
| MV | 8.897 | 0.191 | 0.237 | 0.172 | 0.141 | 0.971 | 0.399 | 0.201 | 1.257 | 0.354 |
| MV\_Max | 12.553 | 0.217 | 0.211 | 0.202 | 0.169 | 0.984 | 0.488 | 0.120 | 1.039 | 0.503 |
| MV\_Min | 11.964 | 0.216 | 0.224 | 0.198 | 0.162 | 0.999 | 0.439 | 0.122 | 1.437 | 0.495 |
| MV\_MC | 12.437 | 0.215 | 0.208 | 0.201 | 0.169 | 0.981 | 0.490 | 0.121 | 1.022 | 0.502 |
| RP | 10.585 | 0.214 | 0.280 | 0.187 | 0.185 | 0.818 | 0.729 | 1.000 | 1.891 | 0.079 |
| RP\_Max | 15.868 | 0.241 | 0.255 | 0.222 | 0.192 | 0.972 | 0.734 | 0.254 | 1.863 | 0.417 |
| RP\_Min | 15.466 | 0.239 | 0.256 | 0.220 | 0.185 | 0.997 | 0.694 | 0.254 | 1.767 | 0.419 |
| RP\_MC | 15.723 | 0.239 | 0.251 | 0.222 | 0.191 | 0.976 | 0.734 | 0.254 | 1.790 | 0.409 |

Table 4. Performance summary for KOSPI 200 data.

Next, in Figure 5, we show the curves representing the cumulative returns over the entire investing time horizon for the KOSPI 200 data. As in Figure 4, only the μ = Max method is shown for the asset-selected strategies, as this method was the best among the three. As before, all curves start at 1, and we used the same color to represent the same type of risk-based portfolio strategy. Figure 5 shows that EW\_Max performs the best, followed by RP\_Max and then MV\_Max. Consequently, all asset-selected risk-based strategies outperform their respective baseline strategies, as in Figure 4. Summing up, Figures 4 and 5 show that our proposed strategy's performance improvement is robust across both data sets, which serves as evidence that diversified assets selection contributes to superior portfolio returns.

Figure 5. Cumulative return curves for KOSPI 200 data.

## 6. Conclusions

In this chapter, we considered the three types of risk-based portfolio strategies that have recently played an important role in the area of smart beta strategies: the equal-weighted, the minimum variance, and the risk parity portfolios. By establishing an efficient and effective asset selection from the assets in the investable universe before the risk-based portfolio strategies are applied, improvements in both the characteristics and the performance of the risk-based portfolio strategies were obtained. The improvement in the characteristics allows the investor to pick the exact size of the portfolio for the equal-weighted and the risk parity portfolios. The improvement in performance covers all three risk-based portfolio strategies for various performance measures such as the returns, the Sharpe ratio, and the diversification measures. Empirical results on the S&P 500 and the KOSPI 200 data sets have indicated that our asset-selected risk-based portfolio strategies show, sometimes significant, advantages across various performance measures compared to the baseline risk-based strategies.

## Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) and funded by the Ministry of Education (NRF-2017R1D1A1B03032722).

## Conflict of interest

The authors declare no conflict of interest.

## Author details

Saejoon Kim\* and Soong Kim

\*Address all correspondence to: saejoon@sogang.ac.kr

Department of Computer Science and Engineering, Sogang University, Seoul, South Korea

## References

[1] Markowitz H. Portfolio selection. The Journal of Finance. 1952;7:77-91

[2] Sharpe W. Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance. 1964;19(3):425-442

[3] Carvalho R, Xiao L, Moulin P. Demystifying equity risk-based strategies: A simple alpha plus beta description. The Journal of Portfolio Management. 2012:56-70

[4] Clarke R, de Silva H, Thorley S. Risk parity, maximum diversification, and minimum variance: An analytic perspective. The Journal of Portfolio Management. 2013;39(3):39-53

[5] Jurczenko E, Michel T, Teiletche J. Generalized Risk-Based Investing. SSRN; 2013

[6] Richard J-C, Roncalli T. Smart Beta: Managing Diversification of Minimum Variance Portfolios. SSRN; 2015

[7] Maillard S, Roncalli T, Teiletche J. The properties of equally weighted risk contribution portfolios. The Journal of Portfolio Management. 2010;36:60-70

[8] Qian E. Risk parity and diversification. The Journal of Investing. 2011;20(1):119-127

[9] Bai SX, Tutuncu R. Least-squares approach to risk parity in portfolio selection. Quantitative Finance. 2016;16(3):357-376

[10] Goldschmidt O, Hochbaum D. Polynomial algorithm for the k-cut problem. IEEE Annual Symposium on Foundations of Computer Science. 1988:444-451

[11] Saran H, Vazirani V. Finding k cuts within twice the optimal. SIAM Journal of Computing. 1995;24(1):101-108

[12] Vazirani V. Approximation Algorithms. Springer; 2003

**Chapter 9**

**Provisional chapter**

**Virtual Reality for Urban Sound Design: A Tool for**

**Virtual Reality for Urban Sound Design: A Tool for** 

DOI: 10.5772/intechopen.75957

Urban sound is one of the main concerns of architects and urban planners in contemporary cities: how to control it, what to do about noise pollution, where silent areas should be situated, or which urban decisions must be made. These questions, among others, are based on spatial sound. Virtual reality is a powerful technology that can serve as a design tool to find some answers to these questions. Due to its power to generate realistic images of the environments that are studied, it is easy to see that virtual reality could contribute to the visualization and auralization of spaces before their construction. This task is one of architects' responsibilities, and such a tool could be very useful to them. This chapter highlights the principles and some applications of virtual reality in urban sound design.

**Keywords:** virtual reality, architect, urban planner, sound design, sound object

Sound continuously surrounds and envelops us, whether we are indoors or out, at work or play, in cities or the country. We hear birds, voices, machines, wind, water, thunder, whispers, steps, calls, whimpers, doors, windows, floors, chairs, etc. Some of these sounds are heard clearly, others overlap. Sometimes they come together in a yell, at other times they succeed one another. Then, you can hear noises, sounds, music, background, rhythms, accelerando, harmonies, dissonances, cacophonies, echoes, vibratos, repetition, melodies, etc. At times sounds remain masked by distance; at other times, some frequencies are highlighted. They are modified by echo, reverberation, coupling, absorption, brightness or localization. They have the capacity to make us rejoice, to sadden, pacify, sweeten, amaze, frighten, annoy, alert,

> © 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use,

distribution, and reproduction in any medium, provided the original work is properly cited.

**Architects and Urban Planners**

**Architects and Urban Planners**

Additional information is available at the end of the chapter

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75957

Josep Llorca

Josep Llorca

**Abstract**

**1. Introduction**

#### **Virtual Reality for Urban Sound Design: A Tool for Architects and Urban Planners Virtual Reality for Urban Sound Design: A Tool for Architects and Urban Planners**

DOI: 10.5772/intechopen.75957

#### Josep Llorca Josep Llorca

Additional information is available at the end of the chapter Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75957

#### **Abstract**

Urban sound is one of the main concerns of architects and urban planners in contemporary cities: how to control it, what to do about noise pollution, where silent areas should be situated, or which urban decisions must be made. These questions, among others, are based on spatial sound. Virtual reality is a powerful technology that can serve as a design tool to find some answers to these questions. Due to its power to generate realistic images of the environments that are studied, it is easy to see that virtual reality could contribute to the visualization and auralization of spaces before their construction. This task is one of architects' responsibilities, and such a tool could be very useful to them. This chapter highlights the principles and some applications of virtual reality in urban sound design.

**Keywords:** virtual reality, architect, urban planner, sound design, sound object

## **1. Introduction**

Sound continuously surrounds and envelops us, whether we are indoors or out, at work or play, in cities or the country. We hear birds, voices, machines, wind, water, thunder, whispers, steps, calls, whimpers, doors, windows, floors, chairs, etc. Some of these sounds are heard clearly, others overlap. Sometimes they come together in a yell, at other times they succeed one another. Then, you can hear noises, sounds, music, background, rhythms, accelerando, harmonies, dissonances, cacophonies, echoes, vibratos, repetition, melodies, etc. At times sounds remain masked by distance; at other times, some frequencies are highlighted. They are modified by echo, reverberation, coupling, absorption, brightness or localization. They have the capacity to make us rejoice, to sadden, pacify, sweeten, amaze, frighten, annoy, alert, stress, upset, attack or madden. At the same time, they can make you dance, swing, feel dizzy, sing, whistle, imitate, keep quiet, laugh or cry. When you try to emulate them, you can beat, rub, blow, strum, nip, strike, haul, move or pat something. Some sounds can be anticipated, others come as a surprise. Sometimes, you know how they are going to sound, but you do not know when. At other times you know when you will hear them, but not how they are going to sound. In some cases, the sound has been heard, but you do not know where it has come from. At certain times, it is known that a sound is going to happen, but you do not know why. Thus, anticipation, prevision, effect, surprise, timbre or production techniques contribute to hearing a sound as a source of pleasure, such as when it turns into music.


This chapter describes some attributes of sound object design in architecture and urbanism, and explains a useful tool for designing and experiencing a sound object. The first part of the chapter defines the sound object in the built environment through some considerations of its temporal composition, and the treatment of the city as a sonic instrument. The second part of the chapter translates these concepts into a powerful representation tool––virtual reality––as a useful instrument for architects and urbanists of today, and then reviews the software that meets these needs and describes some case studies. Finally, the last part of the chapter provides an overview of how acoustic virtual reality can take advantage of some current developments in artificial intelligence.

## **2. Sound objects**

### **2.1. Some features of the architectural and urban sound object**

To define what an architectural and urban sound object is, we must refer to the creator of the term "sound object". In 1966, in his famous *Traité des objets musicaux*, Pierre Schaeffer broke with the academic classifications of noise, sound and music, and created a new musicology. His work presented a phenomenology of the audible. The key concept was not defined as a musical object, but as a sound object that could represent any environmental sound. The notion is quite complex and its richness cannot be demonstrated in a few words [1]. Nevertheless, Pierre Schaeffer himself defined what a sound object is, pointing out that:

*It is obvious that in saying "that's a violin" or "that's a creaking door" we are alluding to the sound produced by the violin or the creak of the door. But the distinction that we want to establish between the instrument and the sound object is even more radical: when we are presented with a magnetic strip in which an unknown sound is recorded, what are we listening to? It is precisely what we call a sound object, regardless of every causal reference designated by the terms sound body, sound source or instrument* [2]*.*

In this way, the term "sound object" is grounded in our subjectivity, despite the fact that it is not modified by individual variations in hearing, or continuous variations in our attention and sensitivity. Far from a subjective issue––in the individualist, incommunicable and practically elusive sense––sound objects can be well described and analyzed [2]. In other words, the sound object is the sound that reaches the listener's ear and is analyzed just before entering it. The sound object is never better revealed than in the effect and content of blind listening.

Pierre Schaeffer defines sound objects by comparing illuminating and sonic phenomena. This comparison is reproduced here, as it provides a clear explanation:

*Two big differences separate the experience of illuminating and sonic phenomena. The first consists of the fact that most visual objects are not sources of light, but simply objects, in the usual sense of the word, with light shining on them. Physicists are therefore quite accustomed to distinguishing light from the objects that reflect it. If the object itself gives out light, then we say it is a light "source".*


*With sound there is nothing like this. In the overwhelming majority of sonic phenomena, sound as originating from "sources" is emphasized. However, the classic distinction in optics between sources and objects has not been imposed in acoustics. Attention has been given to the sound (as we say the light) considered as an emanation from a source, its paths and deformations, without the appreciation of the shapes and contours of this sound apart from the reference to its source* [2]*.*

It can be easily assumed that the distinction between sound source and deformations of the sound source can refer to a probable distinction between sound source and modifications to the sound added by the architecture. In other words, we can refer to the distinction between music and architecture in the process of hearing. However, before a conclusion is drawn, another quote by Schaeffer could provide clarification:

*This attitude has been reinforced by the fact that sound (prior to the discovery of recording) has always been linked in time with the energy phenomenon that was its origin, to the extent that it has been practically confused with it. However, a fleeting sound is only accessible to one sense and remains under single control: the sense of hearing. In contrast, a visual object has something stable, and this is the second of the aforementioned differences. It is not confused with the light that illuminates it, it appears with permanent contours under different lights, and it is accessible to other senses: it can be felt, weighed and smelled; there is a form that our hands feel, a surface that touch explores, a weight and an odour.*

*It is understood that the notion of object barely had the strength to impose itself on the physicist's attention. As the natural tendency of physics is to lead facts to their causes, great satisfaction is found in the energetic evidence of the sound source. There is no reason why the ear, at the end of the propagation of the mechanical relations in an elastic medium (the air), should perceive anything other than the sound source itself.*

*In fact, there is nothing false in the reasoning. Let us just say that, while it is valid for a physicist or an electroacoustic device builder, it is not, however, suitable for a musician or even for an acoustic ear [or an architect]. In fact, the latter do not concern themselves with the way a sound is born and propagated, but only with the way it is heard. Now, what the ear hears is neither the source nor the "sound", but the true sound objects in the same way that the eye does not see directly the source, or even its "light", but the luminous objects.*

What Schaeffer wants us to realize is that what the ear hears is not the source, the "sound", or even the pure music, but the real sound objects, which are the-music-with-the-architecture, in the same way that the eye does not see the source directly, or even its "light", but the illuminated architecture. What the ear hears is, in fact, musicalized architecture.

Notably, when virtual reality tries to represent the sound world of an environment, it always does so for a subjective position of the listener, with an individual and unique point of view. The listener as a receiver perceives the effects and the sound content there, and interactively in virtual reality the sound is heard once it has been mixed with the architecture. Therefore, the listening point of the listener includes not only the sound source, but also the modifications caused by the place in which the sound moves, that is, the sound object. In this context, the notion of sound object provides an appropriate theoretical framework for this type of representation, since the sound object is revealed in the blind listening of the effects and the sound content, as explained previously [2].

## **2.2. The city as a sound instrument and the architect as a luthier of the city**

Another issue in considerations of the sound object is the notion of identification of the sound source. While it is true that a sound sounds different in each architectural or urban space, we can also say that we continue to recognize the original sound and distinguish it from other sounds, despite the effects with which the architectural space modifies the sound source. To understand this issue, it can be explained as follows.


We can say that an acoustic musical instrument has three elements, of which the first two are essential. These are: the *vibrator* that starts to vibrate, and the *exciter* that causes the initial vibration or prolongs it in the case of maintained sounds. The third element, which is accessory, but always present, is the *resonator*, that is, a device designed to add its own effects to those of the body in vibration in order to amplify them, prolong them or modify them in some way [2].

If we transfer this concept to the city, we can also distinguish three elements: the *vibrator* or the elements of the construction: stony floors, street furniture, walls, etc.; the *exciter* or the man with his footsteps, movements and actions, the wind, the water; and the *resonator* or architecture. From this point of view, we consider architecture as a large-scale instrument. This classification into three elements presents the city as a great sound instrument and the architect as a manufacturer of musical instruments without knowing it.

Thus, we can easily compare a church, a street and a square. All of them have elements of vibration: the stone walls of the church, the asphalt of the street, and the sheet of water on the fountain of the square. In the church, the exciter is the people who pray; in the street, it is the friction of the car wheels; and in the square, the jet of water that falls from the top of the fountain. Finally, the street and the square have an open-air resonator, while the church is a closed resonant space.

This classification introduces great clarity in the approach, so that we can move on to another more difficult classification: that of the sound objects themselves, obtained from sources or sound bodies. The murmur of the people praying in the church is infinitely closer to the sound of the fountain in the square than to the shrill sound of a shout in the church, which in turn can approach the braking of a car on the street.

Once a sound source has been discovered, two possibilities are offered to the instrument manufacturer: repeat the same source and multiply it in different measures, or keep the same source and try to vary it. Schaeffer argues that the second procedure is not the simplest, because it inextricably links the three elements: vibrator, exciter and resonator. It is likely that contingencies force the instrumentalist not to use these variations in mutual independence, but to associate them immediately with the level of esthetics of the object [2].

However, if we refer to multiple instruments composed of a collection of vibrating bodies, like the city, we see at once that each of them repeats the triple combination of the elements. Architecture proposes a change in the collection of vibrators (each piece of architecture has different construction elements), a change in the collection of exciters (people change), and a change in the resonator (although the architecture does not move, there are as many pieces of architecture as places; due to the movement of the spectator, the architectural scene changes). Therefore, it is an instrument that varies greatly.

If we listen carefully to the same sound––an oboe––from different points of the same architecture, there is hardly anything in common between the various results produced by the same reproduction-resonance device. But this does not prevent the musician from speaking about the "timbre" of the oboe as an identity. Certainly, the oboe's timbre is recognizable, and the most disfigured of the halls allows the oboe to be identified by an uneducated listener. So, we can state a priori that, although the entire room has a timbre, each of its spatial positions also has its own timbre: the same word with two different meanings. Therefore, we define architecture as an acoustic instrument:

*Any device that allows us to obtain a varied collection of sound objects or varied sound objects, maintaining in spirit the presence of a cause, is an instrument of music in the traditional sense of experience common to all civilizations* [2]*.*

## **3. Virtual reality as a tool for acoustic design**


Virtual reality is the simultaneous representation and perception of reality and its physical attributes in an interactive computer-generated environment [3]. One of these physical attributes is sound. The following diagram shows the operation of acoustic virtual reality applied to a closed architectural environment. Three elements can be distinguished that act in the process: the emission of a sound source, the addition of sound effects, and the perception of the final sound object (**Figure 1**).

The process described above is based on auralization. Following the concepts of simulation in acoustics and vibrations, Vorländer, in [3], describes auralization as (a) the separation of the process of sound generation, propagation and reproduction into three separate blocks, and (b) the representation of these blocks with systems theory tools (**Figure 2**).

The primitive signal, s(n), is called "dry recording". It contains the mono sound signal without any reverberation. Normally it is a sound source recorded at a distance and in a specific direction in an anechoic chamber. The resulting sound signal after the sound propagation in (or between) rooms, g(n), contains characteristics of both the sound source and the transmission system. Here, the propagation of sound in a room usually adds the phenomenon of reverberation to the source signal, while a sound event transmitted through the walls is characterized by a lower sound pressure and a darker sound (with low-pass characteristics). The operation of a sound transmission system in physics is represented by the impulse response of the system, h(n). The sound signal at the position of the receiver is obtained by convolving the original "dry recording" with the impulse response (usually represented by a digital filter) [3]. This method can be easily understood as an acoustic filter that contains the impulse response as a function of the position in the hall [4].
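The dry-signal/impulse-response scheme described above can be sketched in a few lines of Python. This is a toy illustration, not code from the chapter: NumPy's `convolve` stands in for a real auralization engine, a sine tone stands in for the anechoic "dry recording", and a burst of exponentially decaying noise stands in for a measured room impulse response.

```python
import numpy as np

fs = 44100  # sampling rate in Hz (assumed)

# s(n): "dry" source signal -- a 0.5 s, 440 Hz tone standing in for an
# anechoic recording of the source
n = np.arange(int(0.5 * fs))
s = np.sin(2 * np.pi * 440.0 * n / fs)

# h(n): room impulse response -- modeled here as exponentially decaying
# noise, a crude stand-in for diffuse reverberation (~1 s of decay)
rng = np.random.default_rng(seed=0)
t = np.arange(fs) / fs
h = rng.standard_normal(t.size) * np.exp(-6.9 * t)

# g(n) = s(n) * h(n): the signal at the receiver is the convolution of
# the dry recording with the impulse response
g = np.convolve(s, h)
# g is longer than s: the room's reverberant tail has been added
```

In a real engine, h(n) would be measured in the hall or simulated from the room geometry, and the convolution would be computed per listener position.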

In the framework of a tool for the architect and urban planner, we need to summarize which requirements the acoustic virtual reality must meet to satisfy the design needs of architects and urban planners:


**Figure 1.** Representation of the operation of acoustic virtual reality applied to a closed architectural environment. A: real environment. B: virtual environment. C: zoom into the sound object.

Virtual Reality for Urban Sound Design: A Tool for Architects and Urban Planners

http://dx.doi.org/10.5772/intechopen.75957

**Figure 2.** Representation of the operation of auralization. Redrawn from [3]. Generation and propagation of sound and its representation in the physical domain (A and B), and in the domain of acoustic signal processing (C). In the physical domain, sound source characterization and wave propagation can be either modeled or measured. The components will be combined in a synthesis of source signals and impulse responses.

**1.** The tool must be able to correctly represent the location of the sound sources.

**2.** It must allow the attenuation of sound with distance.

**3.** It must be able to include the effects added to the source by the geometry of the built environment. To do this, it must calculate the bounces off the adjacent geometry.

**4.** It must change the resulting sound depending on the materials of the building elements.

**5.** It must be interactive, that is, allow movement through the environment, and even modification and testing in real time of the elements in the scenario.

#### **3.1. Virtual acoustics software: a quick review**

The market offers a series of virtual reality tools that address virtual acoustics. These can be divided into two types: auralization engines for computer games, and acoustic simulation programs.

Regarding the first type, the sound of a computer game is the result of work done by the sound designer. A sound designer usually creates audio content (sound effects and music) and then creates sound events to launch the audio content. Sound events are normally monitored using tools such as Wwise or FMOD and are launched by them in the game.


**Audiokinetic Wwise** (https://www.audiokinetic.com/) is a solution for sound design in computer games that consists of a powerful application to create audio and animation structures; define the propagation; control the sound, music and integration of movement; profile the reproduction; and create banks of sounds. In addition, Wwise is a sophisticated audio engine that handles audio processing, animation and a series of functions optimized for each platform. The program interprets Lua scripts and reproduces exactly how the sounds and the animation behave in the game, allowing specific behaviors to be validated and Wwise's behavior to be profiled on each platform before sound is integrated into the game. It also contains a series of plug-ins, divided into those that generate audio and motion, such as the tone generator, and those that create audio effects, such as reverberation. Finally, it provides an interface between Wwise and the programs that visualize the three-dimensional world.

**FMOD Studio** (https://www.fmod.com/), like Wwise, is an application dedicated to sound in video games. This software, developed by Firelight Technologies in 2002, is one of the industry's standards and has been the basis of award-winning projects. FMOD Studio is a flexible and intuitive solution for audio in video games: it allows in-game sound to be designed through a DAW-style interface without requiring programming knowledge. FMOD can be incorporated into almost all platforms. In addition, the program allows real-time mixing and balancing, so changes can be made and auditioned without re-recording the scene. FMOD consists of several separate tools, such as "FMOD Designer", which corresponds to the main window of the program, where the main work of creating the events and parameters to be called by the video games is done [5].

**Pure Data** (https://puredata.info/) is a visual open-source programming environment that runs on anything from personal computers to smartphones and iOS devices. It is one of the major branches of the family of programming languages known as Max, originally developed by Miller Puckette at IRCAM. Pure Data allows musicians, visual artists, researchers and developers to create programs graphically, without writing a line of code. It can be used to process and generate sound, video and 2D/3D graphics, and to interface with sensors and MIDI devices.

**Propagate** (https://www.assetstore.unity3d.com/en/#!/content/40200) is a system for Unity that lets you incorporate immersive audio that propagates realistically through the geometry of the scene, quickly and efficiently. A simple interface allows the audio to be propagated in real time, even when the sound sources move. The program rests on three principles of sound reception: the occlusion system simulates the transmission of sound waves through wall surfaces and geometry, taking their materials into account and modifying the volume and frequency distribution accordingly; the diffraction system simulates the passage of sound waves through holes and around corners between geometries; and the perception system simulates how the perception of sound changes with the listener's position in the geometry.

Acoustic simulation programs, the second group of virtual reality programs with acoustics treatment, are programs completely dedicated to the faithful reproduction of sound in a space. Usually, they pay little attention to the visual aspect of the space, so their three-dimensional representations are not realistic, but rather schematic. Here we collect three powerful examples: RAVEN, EVERTims and CATT-Walker.


**RAVEN**. Developed at the Institute of Technical Acoustics (ITA) at RWTH Aachen University, RAVEN builds on today's acoustic simulation techniques and allows fairly faithful physical auralization of sound propagation in complex environments, including important effects such as sound diffusion, sound insulation between rooms and sound diffraction. Inside this rendered realistic sound field, sound sources can be distributed and moved freely while receivers listen to them in real time. In addition, manipulations and modifications of the environment itself are supported. The acoustic simulation in RAVEN combines the method of deterministic image sources (IS) [6] with a stochastic ray-tracing algorithm [7]. This framework allows physically based computation of high-quality impulse responses in real time, where, apart from the specularly reflected sound field components, the phenomena of sound diffusion, sound transmission and diffraction are taken into account. The environment is written entirely in C++, supports Windows, Linux and Mac OS X, and allows parallel computing on shared-memory machines, on machines with memory distributed over the network, or in a combination of both [8].
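The deterministic image-source idea can be illustrated for a single reflecting wall: the first-order reflection behaves as if it came from the source mirrored across the wall plane, and its delay follows from the image-to-receiver distance. A toy sketch (the positions and wall placement are invented; this is not RAVEN's API):

```python
import numpy as np

def image_source(src, wall_point, wall_normal):
    """Mirror a source position across a plane wall given by a point
    on the wall and its (not necessarily unit) normal."""
    p = np.asarray(src, dtype=float)
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)
    d = np.dot(p - np.asarray(wall_point, dtype=float), n)
    return p - 2.0 * d * n

C = 343.0  # speed of sound in air, m/s

src = np.array([1.0, 2.0, 1.5])   # invented source position
rcv = np.array([4.0, 2.0, 1.5])   # invented receiver position
# Wall: the plane x = 0, with its normal pointing into the room.
img = image_source(src, np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
delay = np.linalg.norm(rcv - img) / C  # arrival time of the reflection
print(img, delay)
```

Higher-order image sources are obtained by mirroring recursively; the stochastic ray tracer then fills in the diffuse part of the impulse response that image sources cannot capture.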

**EVERTims** (http://evertims.github.io/) is an open-source framework for the auralization of 3D models that offers real-time feedback on how the acoustics of any given room sound while the room is being created. The framework is based on three components: a Blender plug-in, a C++ ray tracer, and a JUCE-based auralization engine. While a 3D model is being designed in Blender, the plug-in continuously streams the geometry and material details to the ray tracer. On the basis of this information, the client simulates how the waves propagate in the space. The result of this simulation is then passed to the auralization engine, which reconstructs the Ambisonic sound field at any position for binaural listening. The environment takes advantage of the Blender Game Engine to support in-game auralization for interactive exploration of the designed model [9].

Finally, with the **CATT-Walker** module (https://www.catt.se/walker.htm), CATT-Acoustic offers a powerful tool for real-time auralization on a personal computer. CATT-Walker achieves this dynamic audible rendering by continuously interpolating impulse responses previously calculated in B-format (Ambisonic surround coding). Listening is binaural, obtained through appropriate Ambisonic decoding. To keep latency sufficiently low, CATT uses Lake Technology's split FIR convolution technique. To reproduce the simulated acoustic environment as closely as possible, the modeled space must be sampled more or less densely by distributing reception points around the source and throughout the zone where the listener moves.
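The interpolation idea behind this kind of walkthrough auralization can be caricatured, for a single mono channel, as a position-weighted crossfade between impulse responses precomputed at the two nearest receiver points (an illustrative sketch only; the actual module interpolates full B-format responses):

```python
import numpy as np

def interpolate_ir(ir_a: np.ndarray, ir_b: np.ndarray, t: float) -> np.ndarray:
    """Linear crossfade between impulse responses sampled at two
    neighboring receiver positions; t in [0, 1] is the listener's
    normalized position between them."""
    return (1.0 - t) * ir_a + t * ir_b

# Invented three-sample IRs at two sampled receiver positions.
ir_a = np.array([1.0, 0.4, 0.1])
ir_b = np.array([0.8, 0.6, 0.3])

# Listener halfway between the two sampled points.
mid = interpolate_ir(ir_a, ir_b, 0.5)
print(mid)
```

As the listener walks, t is updated continuously and the interpolated response is fed to the low-latency convolution engine, so the reverberation changes smoothly with position.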

## **4. Five virtual acoustics applications for architects and urban planners**

After the description of the properties of Virtual Acoustics and the software on offer that can support virtual acoustic simulation tools, we propose five applications of virtual acoustics in the process of design and architectural and urban analysis.

## **4.1. Invisible sound objects**

The first case concerns the phenomenon of sound objects that are heard but cannot be seen (incongruence), visible objects that are seen but cannot be heard (incongruence), and sound objects that are seen in a different place from where they are heard (delocalization). Do such phenomena generate any design problems?

In contrast to the experimental conditions, the listener immersed in a real environment relies on all the senses to structure a representation of the environment [10]. One sensory modality can also direct attention toward another modality, and even influence the resulting perception more strongly. This raises the question of whether attentional resources are controlled by a supramodal system or by several modality-specific attention systems. In conditions of focused attention, it is difficult to judge each signal (sound and vision) separately when incongruent signals occur in the same place, at least much more difficult than when the incongruent signals come from different points and attention is divided [11]. The most feasible model for today's knowledge consists of a multilevel attention mechanism with a multimodal component above the sensory component. In the context of perception of the sound environment, this could be interpreted as a stronger emphasis on visible sources but, at the same time, a lower probability of identifying deviant sounds if these sounds come from the same place as the visual stimulus.

The mechanisms of multisensory attention also have a very strong temporal component. Sound stimuli presented in temporal congruence with the appearance of a visual target make that target stand out in the scene [12]. Based on this knowledge of multisensory perception, one can at least partially resolve a nineteenth-century concern of sound landscape designers: is it good to hide the sources of unwanted sounds from view? From the perspective of attention, we can conclude that when a sound is not very prominent, and therefore does not attract much attention, we can avoid noticing it by eliminating congruent visual stimuli. Similarly, a desired sound should be accompanied by a visual stimulus to ensure that it receives sufficient attention. In contrast, we must emphasize that in the case of very prominent sounds that will attract attention, the absence of a visual stimulus could come as a surprise, which would influence the perception [13].

In this context, the virtual reality tool offers a scenario in which to test the congruence between the sounds that are seen and heard. For this, the possible options to be evaluated must be simulated:

• A desired and seen sound source: there is strong congruence, and control of the situation.

• A desired and unseen sound source: there is strong incongruence, discomfort and lack of control of the situation.

• An unwanted and seen sound source: there is strong congruence, annoyance and control of the situation.

• An unwanted and unseen sound source: there is strong incongruence, discomfort and lack of control of the situation.

• A very prominent and seen sound source: there is strong congruence, and control of the situation.

• A very prominent and not seen sound source: there is strong incongruence.

• A not very prominent and seen sound source: there is weak incongruence.

• A not very prominent and not seen sound source: there is weak congruence.

## **4.2. The influence of materials**

Architects and urban planners are constantly concerned about the visual impact that the finishing will have on the built work. Although the visual impact affects the perception of space [14] and can affect the mood of the users [14, 15], the sound impact has a no less important effect on the perception of space [16, 17] and the users' mood [18].

We also know that visual perception in virtual reality environments is related to the geometric representation of the space, and to the material representation of this geometry. If we focus on the representation of the materials, their treatment and properties are paramount. Therefore, base color, glossiness, roughness, normal or bump and displacement are some of the techniques that virtual reality software has developed to simulate the materials of the represented reality.

Even though this fine detailing could also seem to be necessary for acoustic virtual reality representations, there is a big difference between acoustic and visual materials. Small details in visual perception are negligible in acoustic perception. For this reason, a plane wall behaves acoustically similarly to a rough wall at low frequencies [19]. However, some phenomena linked to materiality, such as acoustic porosity and absorption, are linked to visual glossiness and roughness and, therefore, could have a big influence both visually and acoustically. In this context, the fine representation of both visual and acoustical properties of materials in virtual reality would turn this tool into a convincing way of representing the environments designed by architects and urban planners.
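On the acoustic side, a material is commonly summarized by a frequency-dependent absorption coefficient α, each reflection retaining a fraction (1 − α) of the incident energy per band. A small sketch with invented octave-band coefficients (the two material rows are illustrative values, not measured data):

```python
import numpy as np

# Invented octave-band absorption coefficients (125 Hz ... 4 kHz).
bands_hz = [125, 250, 500, 1000, 2000, 4000]
alpha_concrete = np.array([0.01, 0.01, 0.02, 0.02, 0.02, 0.03])
alpha_curtain  = np.array([0.05, 0.12, 0.35, 0.45, 0.40, 0.44])

def reflected_energy(energy, alpha, n_bounces=1):
    """Band-wise energy remaining after n specular bounces off a
    surface with absorption coefficients alpha."""
    return energy * (1.0 - alpha) ** n_bounces

e0 = np.ones(len(bands_hz))  # unit energy in every band
print(reflected_energy(e0, alpha_concrete))          # nearly unchanged
print(reflected_energy(e0, alpha_curtain, n_bounces=3))  # high bands decay fast
```

Swapping one coefficient table for another is exactly the kind of material change an acoustic virtual reality tool should make audible in real time.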

## **4.3. Concavity and convexity in architectural forms**

There has always been a debate among architects about the suitability of curved shapes versus straight forms, and vice versa. At the extremes, these tendencies can be classified as the purest rationalism, that of rectangular forms, straight lines, and repeated, constant rhythms [20]; and organicism, which takes its formal referents from nature [21–23]. Experimentally, curvilinear, sinuous, parabolic or circular shapes have been shown to affect neural activity more strongly than rectangular or quadrangular ones [24]. The visual influence of this type of geometry, both in architectural interiors and in urban exteriors, is matched by its importance in the acoustics of these spaces. The graphical acoustics of ray tracing [25, 26] shows us how concave shapes reinforce sound at a point or in a concentrated area, while convex shapes scatter sound in multiple directions.
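The geometric rule behind these ray diagrams is specular reflection: an incident direction d leaves a surface with unit normal n as r = d − 2(d·n)n. Because a concave wall tilts successive normals toward a common region, the reflected rays converge there. A toy 2-D sketch (all vectors invented for illustration):

```python
import numpy as np

def reflect(d, n):
    """Specular reflection of incident direction d about unit normal n:
    r = d - 2 (d . n) n."""
    d = np.asarray(d, dtype=float)
    n = np.asarray(n, dtype=float)
    return d - 2.0 * np.dot(d, n) * n

# A horizontal ray hitting a flat wall bounces straight back; the same
# ray hitting a slightly tilted patch (as on a concave, inward-curving
# wall) is deflected toward the focus region.
d = np.array([1.0, 0.0])
flat   = reflect(d, np.array([-1.0, 0.0]))          # flat wall
tilted = reflect(d, np.array([-0.9701, 0.2425]))    # tilted patch (unit normal)
print(flat, tilted)
```

Tracing many such rays against the actual wall curvature is how the focusing (or scattering) zones of a design can be located before it is built.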

This affects the way sound is perceived in interiors and can produce undesired effects in these places. One clear example of this phenomenon can be experienced when seated at specific points in Coderch's new building for the School of Architecture of Barcelona: the curved walls concentrate students' whispers in some areas, reinforcing the noise perception at those points. The same effect can occur when, in an urban environment, a curved design of walls concentrates the sound rays in one area.

Acoustic virtual reality should consider this effect as a product of design. For this purpose, a rough approximation of the architectural geometry is not enough: more detail in the representation of these nuances would lead to a more realistic and credible representation of reality with this tool.

#### **4.4. The sound object as a reason for the architectural project**

This case study deals with the sound object not as a pretext for comfort in architecture (through congruence with the visual, acoustic comfort or sound impact), but as a topic of design in itself. The design of the sound object must meet all the requirements that the solution to a specific problem demands, as in any architectural design process. Architectural design is a creative process, but its additional emphasis on the definition phase of the problem [27] places it in a special category. When the term "creative" is used in its most general sense, to describe a process in which an agent interacts with the material (a product and its environment) to form a new synthesis of essential novelty, it embraces the entire design process, but is too general to be particularly useful. However, when used in a more specific sense, relating the pure arts and sciences, it usually allows for more self-original and self-motivated input, deriving from sensitive perception in art, and critical observation in science, of the selected phenomenon. It is, then, a creative process of *synthesis*, preceded by *analysis* and followed by validation, especially in science, which ends up consisting of a piece of art or a validated hypothesis or theory [28]. This process of synthesis-analysis-synthesis, which summarizes all attitudes of observation of reality [29], is the same as occurs in the process of auralization: *synthesis* of the sound emitted by the sound source, whose sound waves come from a single point; *analysis*, or separation into parts, of the different types of waves that bounce, pass or diffract in the elements of the built environment; and *synthesis* collected at a single point, which we have called the "sound object". Below, we present a series of examples of acoustic targets, extracted from [13].

However, the process of designing the sound object, like any architectural design process, requires that the solution to the problem addressed by the sound object is not just the one that meets the functional requirements. In addition to fulfilling the requirements satisfactorily, it should offer the user a solution integrated with the pre-existing environment and have its own entity as a newly created element. Here are some possible guidelines for achieving such goals:

• *A solution integrated with the pre-existing environment*: it may contain vernacular sounds of the area or neighboring territory, replicating or imitating them. It can present contrasting sounds, highlighting those of the pre-existing environment. It can modulate pre-existing sounds, reinforcing some frequencies or switching off others.

• *A solution that has its own entity as a newly created element*: it can present a characteristic rhythm, whether regular or irregular. It can present a characteristic tonic, either monophonic or polyphonic. It can present characteristic timbres or textures that are produced by concrete materials.

Even though the framework depicted here seems easy to implement, the reality is rather different. In 2003, Kang et al. highlighted the introduction of new EU noise policies [30] and noted that noise-mapping software/techniques were being widely used in European cities [31]. Nevertheless, they stated that these techniques provide an overall picture for macro-scale urban areas. The micro-scale, for example an urban street or a square, could be more effectively studied using detailed acoustic simulation techniques. In addition, applications that predict and measure micro-scale environments, such as auralization techniques for indoor spaces [32], are still not sufficiently user-friendly, and the computation time is rather long. Kang et al. presented two computer models based on the radiosity and image source methods in an attempt to offer urban designers an interface that could be useful in the design stage, using simple formulae that can estimate sound propagation in micro-scale urban areas. However, the answer still has not been found.

## **5. Artificial intelligence and acoustic virtual reality**

Artificial intelligence can be applied to virtual reality in many ways. In the user interface, we need systems that behave rationally, e.g. reflect user movements as accurately as possible. Content production needs tools for optimizing the layout of virtual worlds, and virtual world simulation needs methods for approximating the behavior of the environment [33]. This last challenge relates acoustic virtual reality strongly to artificial intelligence and might be the key factor in progress in the simulation of both closed spaces and open environments. Some studies have already included artificial intelligence in the field of acoustics. In particular, new methods identify room acoustic properties based on evolutionary algorithms (EA) [34]. Focused on the problem of learning from real acoustics, Cox et al. [35] developed a new method employing machine learning techniques and a modified low-frequency envelope spectrum estimator to estimate important room acoustic parameters, including Reverberation Time (RT) and Early Decay Time (EDT), from received music signals. What is known as the machine audition field [36] therefore presents a promising method that can establish and enhance classical methods of acoustics. More specifically, architects and urban


However, the process of designing the sound object, like any architectural design process, requires that the solution to the problem addressed by the sound object is not just the solution that meets the functional requirements. In addition to fulfilling the requirements satisfactorily, it offers the user an integrated solution with the pre-existing environment and has its own entity as a newly created element. Here are some possible guidelines for achieving such goals:


## **5. Artificial intelligence and acoustic virtual reality**

Coderch's new building for the School of Architecture of Barcelona. The curved walls concentrate student's whispers in some areas that reinforce the noise perception at these points. The same effect can happen when, in an urban environment, a curved design of walls concentrates

Acoustic virtual reality should consider this effect as a product of design. For this purpose, therefore, a rough approximation of the architectural geometry is not enough. More detail in the representation of these nuances would lead to a more realistic and credible representation

This case study deals with the sound object not as a pretext for comfort in architecture, through congruence with visual, acoustic comfort or a sound impact. The sound object is now a topic of design. The design of the sound object must meet all the requirements that the solution to a specific problem requires, as in any architectural design process. Architectural design is a creative process, but its additional emphasis on the definition phase of the problem [27] places it in a special category. When the term "creative" is used in its most general sense, to describe a process in which an agent (a personal product and in the environment) interacts with the material to form a new synthesis of essential novelty, it embraces the entire design process, but is too general to be particularly useful. However, when used in a more specific sense, relating pure arts and sciences, it usually allows for more self-original and selfmotivated input, deriving from sensitive perception in art, and critical observation in science of the selected phenomenon. It is, then, a creative process of *synthesis*, preceded by *analysis* and followed by validation, especially in science, which ends up consisting of a piece of art or a hypothesis or validated theory [28]. This process of synthesis-analysis-synthesis, which summarizes all attitudes of observation of reality, [29] is the same as occurs in the process of auralization: *synthesis* of the sound emitted by the sound source, whose sound waves come from a single point; *analysis* or separation by parts of the different types of waves that bounce, pass or diffract in the elements of the built environment; and *synthesis* that is collected in a single point and that we have qualified as the "sound object". Below, we present a series of

**4.4. The sound object as a reason for the architectural project**

examples of acoustic targets, extracted from [13]:

• Only the sound of nature should be heard.

• The sounds of people cannot be heard.

• A specific sound should be clearly audible in some areas.

• Suitable for hearing unamplified/amplified speech (or music).

• Acoustic sculpture/installation sounds should be clearly audible.

• Sounds conveying a city's vitality should be the dominant sounds heard.

• Moving water or sounds of nature should be the dominant sound heard.

• Mostly (nonmechanical, nonamplified) sounds made by people should be heard.

• Sounds that convey the identity of a place should be the dominant sounds heard.
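The *analysis* step of auralization described above — tracing how direct and reflected sound paths reach a listening point — can be illustrated with a first-order image-source calculation. This is a deliberately simplified sketch (a single rigid wall, 2D geometry, illustrative coordinates, 1/r attenuation only); full auralization engines such as those cited in [6, 8] handle higher reflection orders, absorbing materials and diffraction:

```python
import math

C = 343.0  # speed of sound in air, m/s

def path(src, lst):
    """Distance, propagation delay and 1/r amplitude for one path."""
    d = math.dist(src, lst)
    return d, d / C, 1.0 / d

# Illustrative source and listener positions; a rigid wall lies in the plane x = 0.
source = (2.0, 1.0)
listener = (4.0, 3.0)

# First-order image source: mirror the real source across the wall x = 0.
# The reflected path behaves as if it were emitted by this virtual source.
image = (-source[0], source[1])

d_dir, t_dir, a_dir = path(source, listener)
d_ref, t_ref, a_ref = path(image, listener)

print(f"direct:    {d_dir:.2f} m, {1000 * t_dir:.2f} ms")
print(f"reflected: {d_ref:.2f} m, {1000 * t_ref:.2f} ms")
```

The reflected path is longer, so it arrives later and weaker; summing many such delayed, attenuated copies of the source signal at the listening point is what produces the audible "sound object" of a space.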

## **5. Artificial intelligence and acoustic virtual reality**

Even though the framework depicted here seems easy to implement, the reality is rather different. In 2003, Kang et al. highlighted the introduction of new EU noise policies [30] and noted that noise-mapping software and techniques were widely used in European cities [31]. Nevertheless, they stated that these techniques provide an overall picture for macro-scale urban areas. The micro-scale, for example an urban street or a square, could be more effectively studied using detailed acoustic simulation techniques. In addition, applications that predict and measure micro-scale environments, such as auralization techniques for indoor spaces [32], are still not sufficiently user-friendly, and the computation time is rather long. Kang et al. presented two computer models, based on the radiosity and image source methods, in an attempt to offer urban designers an interface that could be useful in the design stage, using simple formulae that estimate sound propagation in micro-scale urban areas. However, the answer still has not been found.

Artificial intelligence can be applied to virtual reality in many ways. In the user interface, we need systems that behave rationally, e.g. reflect user movements as accurately as possible. Content production needs tools for optimizing the layout of virtual worlds, and virtual world simulation needs methods for approximating the behavior of the environment [33]. This last challenge relates strongly to acoustic virtual reality and artificial intelligence, and might be the key factor in progress in the simulation of both closed spaces and open environments. Some studies have already brought artificial intelligence into the field of acoustics. In particular, new methods identify room acoustic properties using evolutionary algorithms (EA) [34]. Focusing on the problem of learning from real acoustics, Cox et al. [35] developed a new method employing machine learning techniques and a modified low-frequency envelope spectrum estimator to estimate important room acoustic parameters, including Reverberation Time (RT) and Early Decay Time (EDT), from received music signals. The field known as machine audition [36] therefore offers a promising way to complement and enhance classical methods of acoustics. More specifically, architects and urban planners always need comprehensive visualizations of the reality that they are designing. Approximation of the behavior of the environment on which they are working is therefore a major concern in their representations. For this reason, artificial intelligence could be a good way to solve some problems that still remain today.
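To make the parameters RT and EDT concrete: both can be read off the Schroeder backward-integrated decay curve of an impulse response. The sketch below uses a synthetic, idealized exponential decay (an assumption for illustration); estimating these parameters blindly from music signals, as in [35], is considerably harder:

```python
import numpy as np

def decay_curve_db(ir):
    """Schroeder backward integration of a squared impulse response,
    returned as a level normalized to 0 dB at t = 0."""
    edc = np.cumsum(ir[::-1] ** 2)[::-1]
    return 10 * np.log10(edc / edc[0])

def decay_time(level_db, fs, lo, hi, target=60.0):
    """Extrapolate the time to decay by `target` dB from the slope of the
    decay curve between levels `lo` and `hi` (in dB, lo > hi)."""
    idx = np.where((level_db <= lo) & (level_db >= hi))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, level_db[idx], 1)  # dB per second
    return -target / slope

fs = 8000
t = np.arange(0, 2.0, 1 / fs)
rt_true = 1.2  # synthetic room: energy falls 60 dB in 1.2 s
ir = np.exp(-6.91 * t / rt_true)  # exponential amplitude envelope

level = decay_curve_db(ir)
edt = decay_time(level, fs, 0.0, -10.0)   # Early Decay Time (0 to -10 dB)
t30 = decay_time(level, fs, -5.0, -35.0)  # RT60 estimated from the T30 range
print(round(edt, 2), round(t30, 2))
```

For a purely exponential decay, both estimates recover the 1.2 s ground truth; in real rooms, EDT and T30 diverge because early reflections and late reverberation decay at different rates.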

These questions draw possible future paths for investigation, which we can sum up in the two following points:

• The wide variety of case studies that an architect deals with in everyday practice requires an easy method for acoustic representation. Otherwise, decisions are taken by approximation only, or the effort of representation is too great for one studio. For this reason, a database of urban spaces with their defined acoustic features would be useful. The acoustic features could be used as the hidden layer in a neural network framework for rapid prediction of the behavior of the environment in future modifications.

• Measurements of the acoustic properties of an architectural space need highly specialized equipment and special conditions. Not every architectural studio has the opportunity or means for making such measurements. Investigations to extract acoustic information from everyday recordings could help not only to analyze the current environment, but also to predict new architectural designs.

## **6. Conclusions**

Acoustic architectural and urban design is an area that still needs to be addressed. Despite the fact that a considerable amount of research has been done on architectural acoustics, urban soundscape, and noise treatment, few real applications of these theories can be found in built projects. This chapter links Schaeffer's theory of sound objects to acoustic virtual reality, to describe a potential way of understanding the role of acoustic virtual reality in the field of architecture and urbanism. The main tools that could help practitioners have been presented here. Moreover, five specific applications have been proposed. However, much more work is needed to apply the theory to daily practice, and all efforts at bringing scientific research into everyday activity are welcome. The direction of artificial intelligence seems a plausible one for the future. Sound design is an old concern, and it remains as elusive and unpredictable today as it can be when it turns into music.

## **Acknowledgements**

This research was supported by the National Programme of Research, Development and Innovation aimed at Society Challenges BIA2016-77464-C2-1-R & BIA2016-77464-C2-2-R of the National Plan for Scientific Research, Development and Technological Innovation 2013-2016, Government of Spain, titled "*Gamificación para la enseñanza del diseño urbano y la integración en ella de la participación ciudadana* (EduGAME4CITY)", and "*Diseño Gamificado de visualización 3D con sistemas de realidad virtual para el estudio de la mejora de competencias motivacionales, sociales y espaciales del usuario* (EduGAME4CITY)".

## **Author details**

Josep Llorca

AR and M, Barcelona School of Architecture, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain

Address all correspondence to: josep.llorca@upc.edu

## **References**

[1] Augoyard JF, Torgue H. Sonic Experience: A Guide to Everyday Sounds. Montreal: McGill-Queen's University Press; 2010

[2] Schaeffer P. Treatise on Musical Objects: An Essay Across Disciplines. Oakland: University of California Press; 2017

[3] Vorländer M, Schröder D, Pelzer S, Wefers F. Virtual reality for architectural acoustics. Journal of Building Performance Simulation. 2014;**8**(1):15-25

[4] Llorca J, Redondo E, Valls F, Fonseca D, Villagrasa S. Acoustic Filter. Cham: Springer; 2017. pp. 22-33

[5] Rehren C, Cárdenas J. Motores de Audio para Video Juegos. Síntesis Tecnológica. 2011;**4**:81-99

[6] Schröder D, Lentz T. Real-time processing of image sources using binary space partitioning. Journal of the Audio Engineering Society. 2006;**54**(7/8):604-619

[7] Schröder D, Dross P, Vorländer M. A fast reverberation estimator for virtual environments. In: Proceedings of the AES 30th International Conference on Intelligent Audio Environments. Saariselkä, Finland; New York, NY: Audio Engineering Society; March 15-17, 2007

[8] Schröder D, Vorländer M. RAVEN: A real-time framework for the auralization of interactive virtual environments. In: Proceedings of the Forum Acusticum, Aalborg, Denmark; January 2011. ISBN: 978-84-694-1520-7

[9] Poirier-Quinot D, Noisternig M, Katz BFG. EVERTims: Open source framework for real-time auralization in architectural acoustics and virtual reality. In: 20th International Conference on Digital Audio Effects (DAFx-17), Edinburgh, United Kingdom; September 2017

[10] Driver J, Spence C. Attention and the crossmodal construction of space. Trends in Cognitive Sciences. 1998;**2**(7):254-262

[11] Santangelo V, Fagioli S, Macaluso E. The costs of monitoring simultaneously two sensory modalities decrease when dividing attention in space. NeuroImage. 2010;**49**(3):2717-2727

[12] Talsma D, Senkowski D, Soto-Faraco S, Woldorff MG. The multifaceted interplay between attention and multisensory integration. Trends in Cognitive Sciences. 2010;**14**(9):400-410

[13] Kang J, Schulte-Fortkamp B. Soundscape and the Built Environment. Boca Raton: CRC Press, Taylor & Francis Group; 2016

[14] Filbrich L, Alamia A, Blandiaux S, Burns S, Legrain V. Shaping visual space perception through bodily sensations: Testing the impact of nociceptive stimuli on visual perception in peripersonal space with temporal order judgments. PLoS One. 2017;**12**(8):e0182634

[15] Vartanian O et al. Impact of contour on aesthetic judgments and approach-avoidance decisions in architecture. Proceedings of the National Academy of Sciences. 2013;**110**(Supplement 2):10446-10453

[16] Al-barrak L, Kanjo E, Younis EMG. NeuroPlace: Categorizing urban places according to mental states. PLoS One. 2017;**12**(9):e0183890

[17] Plack CJ. The Sense of Hearing. 2nd ed. London: Taylor & Francis Group; 2014

[18] Zhang Y, Kang J, Kang J. Effects of soundscape on the environmental restoration in urban natural environments. Noise & Health. 2017;**19**(87):65-72

[19] Kuttruff H. Acoustics. New York: Taylor & Francis; 2004

[20] Frampton K. Modern Architecture: A Critical History. Oxford: Oxford University Press; 1980

[21] Mumford M. Form follows nature: The origins of American organic architecture. Journal of Architectural Education. 1989;**42**(3):26-37

[22] Krause LR. Frank Lloyd Wright: Organic architecture for the 21st century. Journal of Architectural Education. 2011;**65**(1):82-84

[23] Dennis JM, Wenneker LB. Ornamentation and the organic architecture of Frank Lloyd Wright. Art Journal. 1965;**25**(1):2-14

[24] Banaei M, Hatami J, Yazdanfar A, Gramann K. Walking through architectural spaces: The impact of interior forms on human brain dynamics. Frontiers in Human Neuroscience. 2017;**11**:477

[25] Officer CB. Introduction to the Theory of Sound Transmission: With Application to the Ocean. New York: McGraw-Hill; 1958

[26] Kinsler LE. Fundamentals of Acoustics. New York: Wiley; 2000

[27] Kneller GF. The Art and Science of Creativity. Holt, Rinehart and Winston; 1965

[28] Herbert G. The architectural design process. British Journal of Aesthetics. 1966;**6**(2):152

[29] Condillac. La lógica o los primeros elementos del arte de pensar. Imprenta d. Barcelona; 1827

[30] EUR-Lex––32002L0049––GA––EUR-Lex. [Online]. Available from: http://eur-lex.europa.eu/legal-content/GA/TXT/?qid=1399875039336&uri=CELEX%3A32002L0049. [Accessed: November 09, 2017]

[31] Welcome to Schal. [Online]. Available from: http://www.tpsconsult.co.uk/schal.aspx. [Accessed: November 09, 2017]

[32] Jing Y, Xiang N. A modified diffusion equation for room-acoustic predication. The Journal of the Acoustical Society of America. 2007;**121**(6):3284-3287

[33] Laukkanen S, Karanta I, Kotovirta V, Markkanen J, Rönkkö J. Adding intelligence to virtual reality. In: ECAI 2004: Proceedings of the 16th European Conference on Artificial Intelligence. Santa Clara, CA, USA: IEEE; 2004. p. 1136

[34] Poteralski A, Szczepanik M, Ptaszny J, Kuś W, Burczyński T. Hybrid artificial immune system in identification of room acoustic properties. Inverse Problems in Science and Engineering. 2013;**21**(6):957-967

[35] Kendrick P, Cox TJ, Zhang Y, Chambers JA, Li FF. Room acoustic parameter extraction from music signals. In: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings. Toulouse, France: IEEE; Vol. 5. pp. V-801-V-804

[36] Wang W. Machine Audition: Principles, Algorithms, and Systems. Hershey: IGI Global


**Chapter 10**

**Blockchain: The Next Breakthrough in the Rapid Progress of AI**

Spyros Makridakis, Antonis Polemitis, George Giaglis and Soula Louca

DOI: 10.5772/intechopen.75668

Additional information is available at the end of the chapter

*"Distributed ledgers, also known as blockchains, could be the most consequential development in information technology since the internet. Created to support the Bitcoin digital currency, the blockchain is actually something deeper: A novel solution to the age-old human problem of trust."*

**Prof. Kevin Werbach, University of Pennsylvania, Wharton School.**

*"By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it."*

**Eliezer Yudkowsky, an AI theorist.**

#### **Abstract**

Blockchain technologies, once used exclusively for buying and selling bitcoins, have entered the mainstream of computer applications, fundamentally changing the way Internet transactions can be implemented by ascertaining *trust* between unknown parties. In addition, they ensure *immutability* (once information is entered it cannot be modified) and enable *disintermediation* (as trust is assured, no third party is required to verify transactions). These advantages can produce disruptive changes when properly exploited, inspiring a large number of applications. These applications are forming the backbone of what can be called the Internet of Value, bound to bring as significant changes as those brought over the last 20 years by the traditional Internet. This chapter investigates blockchain and the technologies behind it and explains their technological might and outstanding potential, not only for transactions but also as distributed databases. It also discusses its future prospects and the disruptive changes it promises to bring, while also considering the challenges that would need to be overcome for its widespread adoption. Finally, the chapter considers combining blockchain with Artificial Intelligence (AI) and discusses the revolutionary changes that would result by rapidly advancing the AI field.

**Keywords:** blockchain applications, AI applications, combining blockchain and AI, disruptive technologies, smart contracts, DAO, decentralized storage, IoT, internet of value, decentralized cloud storage, supply chain operations, blockchain/AI startups

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

In a large IBM survey of top executives on blockchain [1], one-third of the almost 3000 participants responded that they are using, or considering adopting, blockchain in their business. According to the survey, 8 in 10 of those exploring blockchain are investing either in response to financial shifts in their industry or for the opportunity to develop entirely new business models. The results of the survey echo a recent article in *Forbes* [2] entitled *"Blockchain As Blockbuster: Still Too Soon To Tell, But Get Ready"*. The proponents of blockchain talk about its great potential to create the same type of fundamental changes as those brought over the last two decades by the traditional Internet. Yet for the majority of people, including the two-thirds of executives in IBM's survey, blockchain remains an elusive concept, with its advantages not well understood by business people, government officials and the general public (the same was true of the Internet in the early 1990s). It is important, therefore, to explain blockchain and its unique advantages, as well as its possible drawbacks, and in particular the revolutionary changes that would result from integrating it with AI.
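The core data structure behind these properties can be sketched in a few lines: records are grouped into blocks, and each block stores a cryptographic hash of its predecessor, so altering any historical record invalidates every later link. The following toy example is illustrative only (the names and sample records are hypothetical); real blockchains add distributed consensus, digital signatures and peer-to-peer networking on top of this chain:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON encoding with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, data: str) -> None:
    """Append a new block linked to the hash of the previous one."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})

def chain_is_valid(chain: list) -> bool:
    """A chain is valid iff every block records its predecessor's hash."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

ledger = []
for record in ["Alice pays Bob 5", "Bob pays Carol 2"]:
    append_block(ledger, record)

print(chain_is_valid(ledger))         # True
ledger[0]["data"] = "Alice pays Bob 500"  # tampering with history...
print(chain_is_valid(ledger))         # ...is detected: False
```

Because each block's hash covers its contents *and* its predecessor's hash, rewriting one record would force an attacker to recompute every subsequent block on a majority of the network's computers, which is what makes the ledger practically immutable.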

The purpose of this chapter is to investigate blockchain and the technologies behind it and explain its might and outstanding potential. It consists of three parts. The first part describes blockchain's achievements and expands on its ability to transform peer-to-peer collaboration by, among its other benefits, removing the need for trusted intermediaries. The second part looks at its future prospects, including its utilization as a distributed ledger and the disruptive changes it will bring, while also considering the challenges that would need to be overcome, including the fear of hacking and the possible fraud associated with the utilization of the technology. The final part considers combining blockchain and AI and the breakthrough applications that could result from such a marriage. There is also a concluding section summarizing the chapter and suggesting some directions for future work.

## **2. The uniqueness of blockchain: decentralized, authenticated and immutable information at lower costs**

Blockchain is simply a decentralized, or distributed, ledger (versus the centralized ones kept by, say, banks to record transactions and keep customer balances) of trustworthy digital records shared by a network of participants. As such, it expands the traditional Internet of information and communications (emails, sending/receiving/searching for information, exchanging files, participating in social media, etc.) to a new category that can be called the "Internet of Value". Such an Internet includes sending/receiving money between two parties without the need for financial intermediaries, buying and selling stocks, keeping/issuing certificates, including real estate titles, creating/executing smart contracts, improving supply chains, etc. Blockchain's uniqueness comes from the following capabilities:

• *Trust:* new information can be added only when the majority of computers in the network give their approval after satisfactory proof is provided that the information, which is transmitted cryptographically, is truthful. The authentication of information is done in short intervals of time and the updated information is stored (appended) to all participating network computers.

• *Immutability and transparency:* information can be appended only to previous data and, once entered, cannot be changed, modified or lost, providing a permanent, incorruptible historical record that stays in the system permanently. Moreover, changes to public blockchains can be seen by all parties in the network, thus ensuring transparency.

• *Disintermediation:* the ledger (database) is not maintained by any single person, company or government but by all participating computers located around the world. This means that two parties are able to generate an exchange without the need for a trusted intermediary to authenticate the transactions or verify the records.

• *Lower costs and greater speeds:* lower transaction costs and greater speed are also characteristics of blockchain in a good number of applications, achieved by removing the monopolistic power of powerful intermediaries (e.g. banks) or large, centralized industry leaders (e.g. Airbnb).

## **3. Why blockchain is a disruptive technology**

Blockchain provides a fundamental shift from the Internet of information/communications to the Internet of Value. The difference between the two is fundamental. The first disrupted business models in the 2000s and created the likes of Amazon, Google, Facebook and Alibaba, as well as Uber and Airbnb. Its disadvantage is that the information transmitted can be copied, making it impossible to guarantee its trustworthiness without the approval of an intermediary, for example, a bank verifying that the money being transmitted is available. The biggest advantage of the Internet of Value is the establishment of trust, through the application of blockchain technology, between strangers who can now trust each other. This means assets can be exchanged in an instant and efficient manner **without** intermediaries, who are no longer needed as trust is built into the system. Such an advantage of the Internet of Value is bound to cause even more profound changes than those brought by the Internet of information. Trusted peer-to-peer transactions will encourage the formation of decentralized structures, diminishing the monopolistic power of intermediaries such as banks or firms like Uber and Airbnb [3]. This will be done through the creation of new players that would exploit the blockchain-based platforms of decentralized networks, with the potential to dramatically narrow the monopolistic power of today's dominant actors, democratizing the global economy and creating a more efficient and sustainable economic system [3].

Blockchain applications started slowly, introducing bitcoins after Nakamoto's 2008 paper, and were restricted to cryptocurrencies until July 2015, when the Ethereum platform was released, allowing the issuing of smart contracts. At around the same time Estonia started implementing blockchain technologies in its governmental operations, including an e-health record system that covered any one of its citizens who had ever visited a doctor. Further applications were introduced in 2016 with smart contracts and decentralized autonomous organizations (DAOs) with huge potential, thus fundamentally affecting the legal profession and the management of organizations (see below). However, the most significant applications are taking place since

and in particular the revolutionary changes that would result by integrating it with AI.

**2. The uniqueness of blockchain: decentralized, authenticated and** 

chains, etc. Blockchain's uniqueness comes from the following capabilities:

Blockchain is simply a decentralized, or distributed ledger (versus the centralized ones kept by, say, banks to record transactions and keep customer balances) of trustworthy digital records shared by a network of participants. As such, it expands the traditional Internet of information and communications (emails, sending/receiving/searching for information, exchanging files, participating in social media, etc.) to a new category that can be called the "Internet of Value". Such Internet includes sending/receiving money between two parties without the need for financial intermediaries, buying and selling stocks, keeping/issuing certificates, including real estate titles, creating/executing smart contacts, improving supply

• *Trust:* new information can be added only when the majority of computers in the network give their approval after satisfactory proof is provided that the information, which is transmitted cryptographically, is truthful. The authentication of information is done in
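The capabilities above can be illustrated with a minimal sketch of a hash-chained, append-only ledger. This is a toy model written for this chapter, not the implementation of any real blockchain: real networks add peer-to-peer consensus (e.g., proof of work) on top of the same chaining idea.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash the block's contents deterministically."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

class Ledger:
    """Toy append-only ledger: each block commits to its predecessor's hash,
    so altering any historical record invalidates every later block."""

    def __init__(self):
        genesis = {"index": 0, "data": "genesis", "prev_hash": "0" * 64}
        self.chain = [genesis]

    def append(self, data) -> None:
        block = {
            "index": len(self.chain),
            "data": data,
            "prev_hash": block_hash(self.chain[-1]),
        }
        self.chain.append(block)

    def is_valid(self) -> bool:
        # Every block must still match the hash its successor committed to.
        return all(
            self.chain[i]["prev_hash"] == block_hash(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )

ledger = Ledger()
ledger.append({"from": "alice", "to": "bob", "amount": 10})
ledger.append({"from": "bob", "to": "carol", "amount": 4})
assert ledger.is_valid()

# Tampering with an earlier record is immediately detectable:
ledger.chain[1]["data"]["amount"] = 1000
assert not ledger.is_valid()
```

The tamper-evidence shown in the last two lines is what makes the ledger "immutable" in practice: a cheat would have to rewrite every subsequent block on a majority of the network's computers at once.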

## **3. Why blockchain is a disruptive technology**

Blockchain provides a fundamental shift from the Internet of information/communications to the Internet of Value. The difference between the two is profound. The first disrupted business models in the 2000s and created the likes of Amazon, Google, Facebook and Alibaba, as well as Uber and Airbnb. Its disadvantage is that the information transmitted can be copied, making it impossible to guarantee its trustworthiness without the approval of an intermediary, for example, a bank verifying that the money being transmitted is available. The biggest advantage of the Internet of Value is the establishment of trust, through the application of blockchain technology, between strangers who can now trust each other. This means assets can be exchanged in an instant and efficient manner **without** intermediaries, who are no longer needed as trust is built into the system. Such an advantage is bound to cause even more profound changes than those brought by the Internet of information. Trusted peer-to-peer transactions will encourage the formation of decentralized structures, diminishing the monopolistic power of intermediaries such as banks or firms like Uber and Airbnb [3]. This will happen through new players exploiting blockchain-based decentralized platforms, with the potential to dramatically narrow the monopolistic power of today's dominant actors, democratizing the global economy and creating a more efficient and sustainable economic system [3].

Blockchain applications started slowly with bitcoin after Nakamoto's 2008 paper and were restricted to cryptocurrencies until July 2015, when the Ethereum platform was released, allowing the issuing of smart contracts. At around the same time, Estonia started implementing blockchain technologies in its governmental operations, including an eHealth record system covering every citizen who had ever visited a doctor. Further applications with huge potential, such as smart contracts and decentralized autonomous organizations (DAOs), were introduced in 2016, fundamentally affecting the legal profession and the management of organizations (see below). Since then, a large number of startups have been working on innovative solutions that are going to change the economic landscape [4] and turn blockchain into a momentous technological force.

Blockchain: The Next Breakthrough in the Rapid Progress of AI

http://dx.doi.org/10.5772/intechopen.75668

#### **3.1. Achievements**

Apple, Google, Amazon, Facebook, Tencent, Alibaba, Samsung, Netflix, Baidu and Uber (with a combined market value surpassing \$4.3 trillion at the beginning of 2018) were created by exploiting the advantages provided by the evolving Internet of the late 1990s and the 2000s. These ten firms disrupted the economy and business sector by revolutionizing shopping and viewing habits, the search for information and advertising spending, among others, in ways no one could have predicted in the early 1990s when the Internet was introduced. As blockchain holds the potential for equal or even greater disruptions, particularly when combined with AI (see below), revolutionary changes of considerable magnitude, covering a wide range of industries and products/services, will take place over the next 20 years, and new firms corresponding to the ten mentioned above will probably emerge. The great challenge for entrepreneurs is to direct their startups to exploit the emerging blockchain technologies and develop new applications and innovative products/services at affordable prices to better satisfy existing and emerging needs.

Below is a presentation of what we believe are the 10 most important *existing, or soon to be introduced,* blockchain applications, highlighting their usage and advantages and mentioning the startups that have been formed to develop and implement them. These applications have been classified in terms of the industries being affected and the various applications being pursued. There is no doubt that many more applications will be introduced in the future, some of them becoming successful breakthroughs, in particular when combined with AI algorithms.

## **4. Industries**

#### **4.1. Banking**

Blockchain banking applications can reduce costs by as much as \$20 billion by eliminating intermediaries and increasing the safety and efficiency of banking transactions [5]. A leading startup in the field is ThoughtMachine, which has developed Vault OS, run in the cloud, providing a secure, fast and reliable end-to-end banking system capable of managing users, accounts, savings, loans, mortgages and more sophisticated financial products (see https://www.thoughtmachine.net/). An alternative blockchain banking application is Corda, a distributed ledger platform that is the outcome of over 2 years of intense research and development by the R3 startup and 80 of the world's largest financial institutions. It meets the highest standards of the banking industry, yet it is applicable to any commercial scenario. Using Corda, participants can transact without the need for central authorities, creating a world of frictionless commerce (see https://www.corda.net/). According to *Business Insider* [6], practically all major global banks are experimenting with blockchain technology, trying to reduce costs and improve safety and operational efficiency while, at the same time, making sure that they will not be left behind by startups utilizing blockchain technologies to dominate the market.

#### **4.2. Payments and money transfers**

By avoiding the need for a central authority to verify payments and money transfers, costs can be substantially reduced. At present, there are a good number of services using the technology, aimed primarily at those without bank accounts or those looking for significant cost savings. Below is a brief description of six blockchain services located in various parts of the world:

• **Abra** (USA) is a mobile application allowing person-to-person money transfers. The app can be downloaded from the Apple or Google stores.

• **Align Commerce** (USA) is a payment service provider (PSP) allowing businesses to send and receive payments in local currencies.

• **Bitspark** (Hong Kong) is an end-to-end remittance platform reaching any of their 100,000-plus locations worldwide.

• **Rebit** (Philippines) is a money transfer service offering significantly lower rates to the many Philippine immigrants working abroad.

• **CoinRip** (Singapore) is a service offering safe and quick money transfers, charging a flat rate of 2%.

• **BitPesa** (Africa) is a cheap and safe money transfer service operating in Africa.


#### **4.3. Securities trading**

Blockchain technologies aim to reduce costs and speed up trading while also simplifying the settlement process. For these reasons, many stock exchanges are considering introducing blockchain into their operations. The London Stock Exchange, the Australian Securities Exchange and the Tokyo Stock Exchange are already experimenting with blockchain technologies, which are expected to be operational in the near future. Banks and financial companies are also exploring blockchain applications for securities trading. T-zero (see https://tzero.com/), a US startup, claims on its website to be the first blockchain-based trading platform that integrates cryptographically secure distributed ledgers with existing market processes to reduce settlement time and costs and increase transparency, efficiency and auditability.

#### **4.4. Health care**

Health care costs are skyrocketing, estimated at around 10% of GDP in developed countries and exceeding 17% (close to \$3 trillion) in the USA. This means that any effort to improve health care services can result in substantial savings, and blockchain technologies are prime candidates to achieve such savings while improving efficiency and probably saving lives at the same time. There are short-term blockchain applications ready to be deployed, as well as ambitious, long-term ones aimed at revolutionizing the health industry:

• *Security and trust:* collect complete health data (medical reports for each patient, history of illnesses, lab results and X-rays) in a secure manner, using a unique identifier for every person and only allow the sharing of such data with the express permission of the individual involved. Blockchain technology will eliminate the more than 450 health data breaches, affecting over 27 million patients, reported in 2016.

• *Exchangeability of information:* health information between the various actors is not communicated freely, creating silos that hinder its effective utilization to improve health care. Blockchain technology can improve both the exchangeability of information and its quality, leading to significant benefits.

• *Claim settlement and bill management:* facilitate claim settlement by reducing bureaucracy and introduce bill management to reduce fraud and speed up payment. This can be achieved more efficiently by creating consortia of health providers and insurers.

• *Authentication of medical drugs:* ensure the integrity of medical drugs. Based on current industry estimates, pharmaceutical companies incur an annual loss of \$200 billion due to counterfeit drugs globally, while about 30% of drugs sold in developing countries are considered imitations.

• *Clinical trials and medical research:* it is estimated that as much as 50% of clinical trials go unreported and that investigators often fail to share their study results. Blockchain technologies can address these issues through time-stamped, immutable records of clinical trials. Most importantly, the technology could facilitate collaboration between participants and researchers and could contribute to improving the quality of medical research.

Estonia has implemented a blockchain application, eHealth, covering all its citizens. In addition, there are a number of startups, like GEM, claiming to have developed the first application for health claims based on blockchain technology, introducing real-time transparency and substantially reducing the time for bills to be paid by having all those involved share the same platform. Several other startups are already operating or on the way to becoming functional, like Guardtime, which operates in Estonia and is used by patients, providers, private and public health companies and the government to store and access information from the eHealth system in a safe and efficient way. Similar functions are provided by Brontech, an Australian startup offering reliable health data to improve the diagnostic process, among others; Health Co aims at revolutionizing the relationship between medical researchers and users; Factom, Stratumn and Tierion are mostly concerned with improving the quality of health data, while the purpose of Blockpharma is to fight drug counterfeiting.

#### **4.5. Retail**

The multinational eBay is the leader in consumer-to-consumer online commerce. OpenBazaar is a new startup challenging eBay by utilizing blockchain technology to decentralize online person-to-person trade. By running a program on their computers, users can connect to other users in the OpenBazaar network and trade directly with them. This network is not controlled or run by an owning organization but is decentralized and free. This means there are no mandatory fees to pay and that trades are not monitored by a central organization (see https://www.cbinsights.com/company/openbazaar).

## **5. Applications**

#### **5.1. Smart contracts**

Smart contracts are probably the blockchain technology with the highest potential to affect, or even revolutionize, all sorts of transactions, from the execution of wills to the Internet of Things (IoT). The major innovation of smart contracts is the elimination of trusted intermediaries. Consider, for example, the executor of a will, who approves the directives of the deceased on how the money will be spent/allocated. Instead of an executor, a programmable, legally binding smart contract can achieve the same purpose using blockchain technology, avoiding the trusted intermediary while reducing costs and improving efficiency. An additional application of smart contracts is with the IoT, facilitating the sharing of services and resources and leading to the creation of a marketplace of services between devices that would allow the automation, in a cryptographically verifiable manner, of several existing, time-consuming workflows [7]. Most importantly, such technology is the central principle behind **Ethereum** (see below), a new extension of blockchain technologies focusing on running the programming code of decentralized smart contract applications.
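The will example above can be sketched as a toy, self-executing contract. All names here (the oracle identifiers, beneficiary addresses and the two-attestation rule) are hypothetical illustrations; an actual Ethereum contract would be written in a language such as Solidity and run on the EVM.

```python
from dataclasses import dataclass, field

@dataclass
class WillContract:
    """Toy self-executing will: locked funds are paid out automatically once
    enough independent oracles attest the death certificate, removing the
    human executor (the trusted intermediary)."""
    balance: int                        # funds locked in the contract
    beneficiaries: dict                 # address -> share, summing to 1.0
    required_attestations: int = 2      # e.g., two independent oracles
    attestations: set = field(default_factory=set)
    payouts: dict = field(default_factory=dict)

    @property
    def executed(self) -> bool:
        return bool(self.payouts)

    def attest_death(self, oracle_id: str) -> None:
        """Record an oracle's attestation; execute once the quorum is met."""
        self.attestations.add(oracle_id)
        if len(self.attestations) >= self.required_attestations and not self.executed:
            self.payouts = {addr: round(self.balance * share)
                            for addr, share in self.beneficiaries.items()}
            self.balance = 0

contract = WillContract(balance=100_000,
                        beneficiaries={"0xAlice": 0.6, "0xBob": 0.4})
contract.attest_death("oracle-1")
assert not contract.executed            # a single attestation is not enough
contract.attest_death("oracle-2")
assert contract.payouts == {"0xAlice": 60000, "0xBob": 40000}
```

The quorum of attestations stands in for the "satisfactory proof" a real network would demand before the contract's code runs; no single party can trigger, block or alter the payout.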

#### **5.2. Supply chain**

Supply chain operations are dominated by paper-based methods requiring letters of credit (costing 1–3%) and factoring (costing 5–10%), increasing costs by an estimated trillion dollars and also slowing down transactions. Such costs could be reduced substantially using blockchain technology, which eliminates intermediaries by establishing trust between buyers and sellers. There are several startups in the field, among them Skuchain, aiming its blockchain technology at the intersection of payments (letters of credit or wire transfers) and finance (operating and short-term trade loans), and Provenance, focusing on tracking the authenticity and the social and environmental credentials of goods from the source all the way to the final consumer. In addition to startups, big companies, like Walmart, are also aiming at exploiting the advantages of blockchain technology to improve efficiency and reduce supply chain costs [8].
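The source-to-consumer tracking described above can be sketched as a hash-linked custody trail per product. This is a standalone illustration of the idea (the product and parties are hypothetical), not Skuchain's or Provenance's actual system:

```python
import hashlib
import json

def entry_hash(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

class ProvenanceLog:
    """Toy provenance trail: every custody transfer for a product commits to
    the previous entry's hash, so an item's history from source to consumer
    cannot be silently rewritten."""

    def __init__(self):
        self.trails = {}  # product_id -> list of custody entries

    def record_transfer(self, product_id: str, holder: str, note: str) -> None:
        trail = self.trails.setdefault(product_id, [])
        prev = entry_hash(trail[-1]) if trail else "0" * 64
        trail.append({"holder": holder, "note": note, "prev_hash": prev})

    def history(self, product_id: str) -> list:
        return [(e["holder"], e["note"]) for e in self.trails.get(product_id, [])]

log = ProvenanceLog()
log.record_transfer("coffee-lot-7", "farm", "harvested, organic certified")
log.record_transfer("coffee-lot-7", "exporter", "shipped from origin")
log.record_transfer("coffee-lot-7", "retailer", "received for sale")
assert log.history("coffee-lot-7")[0] == ("farm", "harvested, organic certified")
```

A buyer querying the trail can verify each claimed credential (e.g., "organic certified") back to the party that recorded it, which is the trust that letters of credit and factoring currently sell at 1–10% of transaction value.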

#### **5.3. IoT**

Blockchain could revolutionize the IoT if applied securely to the estimated 8.5–20 billion connected IoT devices that existed in 2017, a number expected to grow to 1 trillion by 2020. Exploiting the information generated by IoT devices intelligently can transform our homes and cities and have a profound effect on the quality of our lives while saving energy. According to Compton [9], *"Because blockchain is built for decentralized control, a security scheme based on it should be more scalable than a traditional one. And blockchain's strong protections against data tampering would help prevent a rogue device from disrupting a home, factory or transportation system by relaying misleading information."* Eciotify, a startup specializing in applying blockchain to the IoT, plans to roll out applications utilizing blockchain technology for IoT devices.

#### **5.4. Decentralized cloud storage**

Computer storage was decentralized across individual computers until about a decade ago, when Dropbox was founded, providing the first modern, centralized cloud storage service. Since then, cloud computing has revolutionized applications by encouraging firms to outsource their storage needs to the likes of Amazon, Google or Microsoft web services. The advantages of such services were lower costs and greater reliability. Blockchain technology aims to re-decentralize computer storage to individual computers all over the world. According to experts [10], there are three major reasons for such a switch. First, the cost of most cloud services is around \$25 per terabyte per month, while the corresponding cost of blockchain storage is 12.5 times cheaper, at \$2 per terabyte per month. Second, there is greater security, as blockchain data is encrypted, meaning that only users holding the appropriate keys can view it (data stored in commercial cloud services could be viewed by third parties). Finally, blockchain cloud storage is immutable while providing a record of all historical changes made to the data.
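The second and third claims above — only key holders can read the data, and stored data is tamper-evident — can be sketched in a few lines. The XOR "cipher" and the in-memory dict standing in for the host network are deliberate simplifications; real decentralized storage networks use proper encryption and replicate encrypted shards across many independent hosts.

```python
import hashlib
import secrets

def xor_encrypt(data: bytes, key: bytes) -> bytes:
    # Toy XOR stream cipher for illustration only; applying it twice with
    # the same key restores the original bytes.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def store(data: bytes, key: bytes, network: dict) -> str:
    """Encrypt client-side, then address the ciphertext by its hash.
    Hosts holding the ciphertext can never read the plaintext."""
    ciphertext = xor_encrypt(data, key)
    address = hashlib.sha256(ciphertext).hexdigest()
    network[address] = ciphertext
    return address

def retrieve(address: str, key: bytes, network: dict) -> bytes:
    ciphertext = network[address]
    # The address doubles as an integrity check (immutability):
    assert hashlib.sha256(ciphertext).hexdigest() == address
    return xor_encrypt(ciphertext, key)

network = {}                     # stands in for many independent hosts
key = secrets.token_bytes(32)
addr = store(b"medical record #42", key, network)
assert retrieve(addr, key, network) == b"medical record #42"
```

Because the address is the hash of the ciphertext, any host that alters the stored bytes is caught at retrieval time, and without the key the host holds only unreadable ciphertext.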

## **8. Ethereum**

Ethereum, like bitcoin, is a distributed public blockchain network (developed by the nonprofit Swiss foundation of the same name) upholding its unique capabilities (trust, immutability and transparency) while also:

• Running applications exactly as programmed without any possibility of downtime, censorship, fraud or third-party interference.

• Enabling developers to build and deploy decentralized applications, serving specific purposes, that become part of the blockchain network and as such are not controlled by any individual.

• Exploiting the Ethereum virtual machine (EVM) to run any desired program, written in any programming language; using the EVM, developers do not need to create blockchain applications from scratch but can utilize the thousands of existing ones already available.

Blockchain is becoming one of the most remarkable technologies since the appearance of the Internet [11]. The large number of innovative applications based on this technology and the great interest shown from business firms, government organizations and individuals is mainly due to its ability to assure trust between parties that do not know each other, guarantee the safety of transactions and attest to the trustworthiness of the information, in addition to its other advantages. The interest in the technology can be seen from the Consensus Blockchain Conference, held in May 2017, which attracted more than 2000 participants and was just one of the more than 200 conferences held during 2017, as well as the more than 110 startups established in recent years and the exponentially increasing number of students attending blockchain programs. For instance, in the University of Nicosia's online blockchain course, there were 164 registrations from all over the world in 2017, versus 23 when this program was offered for the first time in 2013. In addition, there are 5495 registrations, from all five continents, who follow its MOOC class this year, versus 642 when it was first offered in the Spring of 2014. These numbers show the growing interest from the part of students while the university's blockchain placement office receives numerous requests each week from companies asking for graduates from its blockchain programs that could work for them. The previous section of this chapter covered the blockchain technology and the various applications already, or in the process of, being implemented. This section discusses its future prospects and the challenges until its widespread adoption by business firms, governmental organizations and individuals. 
Faster and cheaper computers, lower storage costs and a host of specialized applications (some of them already discussed in the previous section) will accelerate its widespread adoption and will produce disruptive changes that will become revolutionary when blockchain is combined with AI algorithms, exploiting the advantages of both technologies. There are always the doubters saying that blockchain is overhyped [12, 13] but the same

bility/transparency, disintermediation, low costs) but with the additional three:

vidual or central entity which is the case of Internet applications.

available (one type of such applications can be smart contracts).

**8.1. Blockchain technologies: future prospects and major challenges**

#### **5.5. Certification**

One of the great promises of blockchain technology is that it can serve as a decentralized, permanently unalterable storage alternative for all types of information or assets, not just as a currency or payment system. This makes the technology a prime tool for certifying all sorts of information, transactions, documents and records. What has attracted the greatest interest, however, is the certification of data (with the startup Stampery as the leader) and of identities (with the startup ShoCard as the leader). There are many additional areas where certification using blockchain technology can be applied, including the issuing of IDs and even voting.
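The certification pattern described here, publishing a fingerprint of a record rather than the record itself, can be sketched in a few lines of Python (a simplified illustration of the general idea, not the actual Stampery or ShoCard protocol; the sample record is invented):

```python
import hashlib

def certify(document: bytes) -> str:
    """Compute the fingerprint that would be anchored on the blockchain.
    Only this hash is published; the document itself stays private."""
    return hashlib.sha256(document).hexdigest()

def verify(document: bytes, anchored_hash: str) -> bool:
    """Anyone holding the document can later prove it is unaltered by
    recomputing the hash and comparing it with the anchored one."""
    return hashlib.sha256(document).hexdigest() == anchored_hash

record = b"Land title: parcel 42, owner Jane Doe"
anchor = certify(record)          # this value would be written to the chain
assert verify(record, anchor)     # the original document checks out
assert not verify(b"Land title: parcel 42, owner John Doe", anchor)  # tampering detected
```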

## **6. Other blockchain applications**

There are many additional applications exploiting blockchain technologies. These include truly decentralized ride-sharing services (Uber and Lyft are actually centralized taxi services) like those offered by La'Zooz and Arcade City; Stratumn, a platform aiming to automate auditing; Synereo, whose purpose is to help users create content and publish and distribute it online; DocuSign, offering the eSignature solution; and Steem, a social media platform where anyone can earn rewards. Some of these startups are already operational while others are still being developed.

## **7. Specialized blockchain VC firms and geographical distribution of funding**

According to FinTech News Switzerland, eight major venture capital firms have invested more than \$1.55 billion in bitcoin and blockchain startups since 2012. Country-wise, the USA dominates the race with 55% of the total, followed by the UK with 6%, Singapore with 3%, and Japan, South Korea and China with 2% each. As interest in blockchain technologies increases, VC investments are expected to increase too, accelerating the number of available applications.

## **8. Ethereum**


204 Artificial Intelligence - Emerging Trends and Applications


Ethereum, like bitcoin, is a distributed public blockchain network (developed by the nonprofit Swiss foundation of the same name) upholding its unique capabilities (trust, immutability/transparency, disintermediation, low costs) but with three additional ones:

• Running applications exactly as programmed, without any possibility of downtime, censorship, fraud or third-party interference.

• Enabling developers to build and deploy decentralized applications serving specific purposes that become part of the blockchain network and, as such, are not controlled by any individual or central entity, as is the case with Internet applications.

• Exploiting the Ethereum virtual machine (EVM) to run any desired program, written in any programming language; using the EVM, developers do not need to create blockchain applications from scratch but can utilize the thousands of existing ones already available (one type of such applications being smart contracts).


#### **8.1. Blockchain technologies: future prospects and major challenges**

Blockchain is becoming one of the most remarkable technologies since the appearance of the Internet [11]. The large number of innovative applications based on this technology and the great interest shown by business firms, government organizations and individuals are mainly due to its ability to assure trust between parties that do not know each other, guarantee the safety of transactions and attest to the trustworthiness of information, in addition to its other advantages. The interest in the technology can be seen from the Consensus Blockchain Conference, held in May 2017, which attracted more than 2000 participants and was just one of the more than 200 conferences held during 2017, as well as from the more than 110 startups established in recent years and the exponentially increasing number of students attending blockchain programs. For instance, the University of Nicosia's online blockchain course received 164 registrations from all over the world in 2017, versus 23 when the program was offered for the first time in 2013. In addition, 5495 registrants from all five continents follow its MOOC class this year, versus 642 when it was first offered in the Spring of 2014. These numbers show the growing interest on the part of students, while the university's blockchain placement office receives numerous requests each week from companies asking for graduates of its blockchain programs who could work for them.

The previous section of this chapter covered blockchain technology and the various applications already implemented or in the process of being implemented. This section discusses its future prospects and the challenges on the way to its widespread adoption by business firms, governmental organizations and individuals. Faster and cheaper computers, lower storage costs and a host of specialized applications (some of them already discussed in the previous section) will accelerate its widespread adoption and will produce disruptive changes that will become revolutionary when blockchain is combined with AI algorithms, exploiting the advantages of both technologies. There are always doubters saying that blockchain is overhyped [12, 13], but the same was true when the Internet was in its infancy back in 1995. In a *Newsweek* article in February of that year, Clifford Stoll, a computer expert, wrote: *"Baloney. Do our computer pundits lack all common sense? The truth is no online database will replace your daily newspaper, no CD-ROM can take the place of a competent teacher and no computer network will change the way government works"* [14].


## **9. Future prospects**

Recently, Christine Lagarde, the IMF's Managing Director, gave a talk at the Bank of England entitled "Central Banking and FinTech, A Brave New World?" [15], providing her views of banking and policy making in the year 2040. Her talk concentrated on three themes (virtual/digital currencies, new models of financial intermediation and AI, all three major concerns of this paper too) and how they will affect the future, as well as what should be done to deal effectively with the challenges they will pose. Her advice was *"we−as individuals and communities−have the capacity to shape a technological and economic future that works for all"*, adding that we have a responsibility to make it work and assuring that humans will be needed for all important decisions, even though machines will certainly play a greater role as time passes.

#### **9.1. Governments adopt blockchain for their entire operations**

Some countries are experimenting with blockchain, while a few are ahead in adopting the technology in some of their operations. Estonia is a pioneer, having already applied blockchain-based services in eHealth, eSecurity and eSafety, eGovernment Services and eGovernance (including iVoting), estimating that such services save 100 years of working time for its 1.3 million citizens. Countries like Sweden follow Estonia's example, while Dubai plans to implement blockchain across its entire government by 2020, reducing CO2 emissions by 114 million tons a year from fewer trips and saving 25.1 million hours and \$1.5 billion annually from productivity increases in document processing alone [16]. According to an IBM-sponsored survey [16], 9 in 10 government executives plan to make blockchain investments in financial transactions, asset and contract management and regulatory compliance by 2018. **Figure 1** shows these executives' expectations for implementing blockchain. According to the *Economist*'s article [17], governments may become big backers of blockchain technology as they come to understand its benefits, which, according to Brian Forde of the Massachusetts Institute of Technology, are the driving force behind its widespread adoption. According to **Figure 1** and Forde, the future will probably witness a considerable number of blockchain applications in all areas of governmental operations.

**Figure 1.** First to finish: Respondents' expectations of when they will have blockchains in production and at scale.

*Certificates and IDs are issued exclusively on blockchain:* toward the end of 2017, the Dubai Land Department became the world's first government entity to conduct all its transactions through blockchain technology [18]. In the same direction, the Swedish National Land Survey and FinTech startup ChromaWay will test-launch an initiative to put all land title records on blockchain, thereby safeguarding the rights and interests of genuine property owners and eliminating or seriously reducing the chance of fraud [19]. Land records are just the tip of the iceberg, as all government documents (IDs, passports, driver licenses, birth certificates, etc.) and educational certificates (graduation diplomas, records of programs/courses taken, etc.) are potential candidates to be issued using blockchain technology. It is highly likely, therefore, that we will see a surge in this direction, with considerable cost savings, reduced bureaucracy and an improved level of services.

#### **9.2. Virtual (digital or crypto) currencies are adopted**

While governments are buoyant about adopting blockchain for their operations, they are not so sure about virtual currencies such as bitcoin, fearing their use for tax evasion and the possible criminal activities associated with the dark web. At present, the legal status of virtual currencies varies considerably from one country to another, with no indications of what countries plan to do in the future. China's recent decision to ban Initial Coin Offerings (ICOs), calling them 'illegal fundraising' [20], as well as Russia's decision to block cryptocurrency exchanges, are indications of how virtual currencies are being treated by governments. Some countries (Switzerland, Singapore, South Korea, Japan, Dubai and Bahrain) are more open to adopting virtual currencies alongside their legal money, while others are openly hostile to their adoption. At the same time, international bodies like the IMF encourage such adoption, initially at least by countries with weak institutions and unstable national currencies. As time passes and the problems of volatility and hacking are addressed, virtual currencies are likely to play a complementary role, supplementing national ones, in trade and financial transactions, among others. At present, however, their future prospects are uncertain.

## **10. eHealth records**


For health records to be useful they must be shared among doctors, laboratories, hospitals, pharmacies, government health agencies, insurance companies and researchers while, at the same time, protecting patients' privacy against unauthorized usage and breaches. Although the challenge of doing so is tremendous, the Estonian eHealth Foundation is operating, with considerable benefits, a secure health record system that can become an example for other countries to follow, although this may be more difficult given the complexities of implementing the system in larger nations. In the USA, there are serious efforts to implement a blockchain health system that, among other achievements, could reduce fraudulent claims, estimated at around 5–10% of health care costs at present. The challenge is how to digitize and standardize all health records, some of which are handwritten. One system being developed to do so is MedRec [21], which according to its developers *"doesn't store health records or require a change in practice. It stores a signature of the record on a blockchain and notifies the patient, who is ultimately in control of where that record can travel. The signature assures that an unaltered copy of the record is obtained. It also shifts the locus of control from the institution to the patient, and in return both burdens and enables the patient to take charge of management"* [22]. According to Das [23], blockchain will probably play a significant role in the healthcare industry as it has started "*to inspire both relatively easily achievable and more speculative potential applications*". Healthcare authorities, governments and providers are excited about the available possibilities and are investing to achieve them, although these achievements may be more evolutionary than abrupt.


#### **10.1. Business firms adopting blockchain for their internal operations and external transactions**

Blockchain, as discussed, is a distributed ledger of trustworthy digital records whose safety is assured and whose history can be traced: new data is added and chained at the end of the old, while no information can be erased. Businesses that can leverage these unique advantages can harness significant gains in efficiency, including lower costs, more effective auditing (the data is immutable) and eliminating fraud, or at least making it practically impossible.
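The append-and-chain behavior just described can be illustrated with a toy hash chain (a didactic sketch only: a real distributed ledger adds networking, consensus and digital signatures on top of this linking idea):

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents; since each block stores the previous
    block's hash, every block is chained to the one before it."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain: list, data: str) -> None:
    """Add a new block at the end; the `prev` link is what makes
    earlier history tamper-evident."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev": prev})

def is_valid(chain: list) -> bool:
    """Recompute every link; any edit to an old block breaks the chain."""
    return all(chain[i]["prev"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

ledger: list = []
append(ledger, "Alice pays Bob 5")
append(ledger, "Bob pays Carol 2")
assert is_valid(ledger)
ledger[0]["data"] = "Alice pays Bob 500"   # attempt to rewrite history
assert not is_valid(ledger)                # the tampering is immediately detectable
```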

#### **10.2. The banking and financial sector and FinTech firms**

Blockchain technology can be used to provide secure and direct alternatives to the complex and expensive banking processes used today, reducing transaction costs from \$25 to less than a single dollar and avoiding costly intermediaries [3]. Such a huge saving has obliged practically all major banks to test the technology, and many of them have joined R3, a startup developing Corda, a blockchain-based platform geared toward the banking industry. Corda and similar platforms will transform the sector by simplifying operations, eliminating intermediaries, reducing operating costs and offering a wide variety of new, innovative products and services, in addition to opening up banking to the billions of people who are excluded at present. Financial firms face challenges similar to those of banks. In remarks at a FinTech-focused conference at the end of September 2017, Yasuhiro Sato, the president and CEO of the Mizuho Financial Group, said the technology could *"change the strategies of international financial institutions"*, adding *"we should have the courage"* to make the shift to blockchain now. The Japanese Bankers Association (JBA) announced earlier in September 2017 that it will partner with IT provider Fujitsu to test the viability of using a blockchain across financial services. Blockchain will transform the banking/financial sectors, as FinTech startups are disrupting incumbents by developing innovative blockchain platforms and offering new products/services at lower prices.

## **11. Supply chain operations**

As mentioned, supply chain transactions are dominated by paper-based, time-consuming and bureaucratic procedures involving banks, financial firms and customs agencies, among others. In the future, blockchain can eliminate the paper trail and introduce trust among the various players, while also assuring firms receiving materials/parts, and consumers, of the authenticity of goods (from the raw materials to the final product). This can be done, for instance, by installing RFID tags that immutably record every movement of a material/product, guaranteeing its provenance and attesting to its physical presence, thus eliminating the need for letters of credit, factoring and detailed inspections. Moreover, the optimization of the supply chain can be achieved at present using AI for its logistics part (scheduling and planning), and this can be extended in the future to automate the majority of supply chain transactions (in conjunction with smart contracts), which could include the majority of AI transactions.

#### **11.1. AVs (autonomous vehicles) and IoT (Internet of Things)**


The safety provided by blockchain technology is indispensable for the smooth running of selfdriving vehicles and the untroubled functioning of IoT devices. By 2020, it is estimated that a sizable number of AVs will be on the road while there will be more than 1 trillion IoT gadgets, providing a unique challenge for blockchain technology to provide interconnectivity for all AVs and the smooth integration of the trillion of IoT devices. The implications are immense. If AVs are interconnected, they could communicate traffic jams, facilitate car sharing, receive and make payments and select the best insurance option among other tasks that can be performed using blockchain. Interconnected IoTs can optimize the functioning of all its devices, say at home, setting optimal temperatures, reducing energy consumption, ordering food and checking and paying utility bills.
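The tamper-evident, chained structure described above can be illustrated with a minimal sketch. This is illustrative Python only, under the simplifying assumption of a single in-memory ledger; real blockchains add consensus, digital signatures and peer-to-peer replication:

```python
import hashlib
import json

def block_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous block's hash, chaining them."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

class Ledger:
    """Append-only list of records, each cryptographically chained to its predecessor."""
    def __init__(self):
        self.chain = []  # list of (record, hash) pairs

    def append(self, record: dict) -> None:
        prev = self.chain[-1][1] if self.chain else "genesis"
        self.chain.append((record, block_hash(record, prev)))

    def is_valid(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev = "genesis"
        for record, h in self.chain:
            if block_hash(record, prev) != h:
                return False
            prev = h
        return True

ledger = Ledger()
ledger.append({"from": "alice", "to": "bob", "amount": 5})
ledger.append({"from": "bob", "to": "carol", "amount": 2})
assert ledger.is_valid()

# Tampering with an old record is immediately detectable:
ledger.chain[0] = ({"from": "alice", "to": "mallory", "amount": 500}, ledger.chain[0][1])
assert not ledger.is_valid()
```

Because each hash covers the previous one, changing any historical record invalidates every later block, which is the property that makes the ledger's history immutable and auditable.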

## **12. Smart blockchain contracts instead of lawyers**

Despite being in their infancy, smart contracts hold the potential to become a groundbreaking legal innovation and a cornerstone of future commerce. At present, several problems limit their applicability as legal documents [24]. Once these problems are resolved, smart contracts can safely move assets around, interact with IoT devices and automate many business-related processes that currently demand human resources. How smart contracts will affect lawyers and law practices is debatable, with some predicting a serious decline in the need for lawyers [25], or at least an alternative to expensive legal practices.
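The idea of a self-executing agreement can be sketched as a toy state machine. This is a hypothetical illustration in plain Python, not any real smart-contract platform's API; the escrow scenario and all names are assumptions for the example:

```python
class EscrowContract:
    """Toy self-executing agreement: funds are released to the seller only
    when delivery is confirmed, otherwise refunded to the buyer.
    Illustrative only -- real smart contracts run on a blockchain VM."""

    def __init__(self, buyer: str, seller: str, amount: int):
        self.buyer, self.seller, self.amount = buyer, seller, amount
        self.state = "FUNDED"                     # buyer's funds are locked in
        self.balances = {buyer: 0, seller: 0}

    def confirm_delivery(self) -> None:
        if self.state != "FUNDED":
            raise RuntimeError("contract already settled")
        self.balances[self.seller] += self.amount  # code, not a lawyer, moves the funds
        self.state = "SETTLED"

    def refund(self) -> None:
        if self.state != "FUNDED":
            raise RuntimeError("contract already settled")
        self.balances[self.buyer] += self.amount
        self.state = "SETTLED"

c = EscrowContract("alice", "bob", 100)
c.confirm_delivery()
assert c.balances == {"alice": 0, "bob": 100} and c.state == "SETTLED"
```

The point of the sketch is that, once funded, the outcome is decided entirely by the contract's code paths, which is what removes the need for a trusted intermediary.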

## **13. Decentralized autonomous organizations (DAOs)**

DAOs are another major innovation of blockchain technology. A DAO is a company without a CEO, managers, employees or office buildings: it is created and run entirely by the computer code included in a smart contract. Although the first DAO firm was hacked and its assets were stolen [26], the potential of DAOs is significant once the technical security problems are resolved. For instance, there is no reason for portfolio funds that invest solely in market indexes to pay expensive executives, employ personnel and occupy offices when they can be run more effectively as a DAO, open 24/7. There are immense possibilities to be exploited, leading to great cost reductions and more efficient operations, as DAOs, once perfected, are not prone to human errors.
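The governance mechanism of a DAO can be sketched as token-weighted voting. This is a minimal, hypothetical Python sketch of the concept (the class name, quorum rule and token balances are assumptions, not any real DAO framework):

```python
class ToyDAO:
    """Hypothetical sketch: an organization whose only management is code.
    Token holders vote; a proposal exceeding the quorum passes automatically."""

    def __init__(self, token_balances: dict, quorum: float = 0.5):
        self.tokens = dict(token_balances)  # holder -> voting weight
        self.quorum = quorum

    def decide(self, votes: dict) -> bool:
        """votes: {holder: True/False}; weight equals the holder's token balance."""
        total = sum(self.tokens.values())
        yes = sum(self.tokens[h] for h, v in votes.items() if v)
        return yes / total > self.quorum  # decision executes with no CEO or board

dao = ToyDAO({"a": 60, "b": 30, "c": 10})
assert dao.decide({"a": True, "b": False, "c": False})      # 60% yes -> passes
assert not dao.decide({"b": True, "c": True, "a": False})   # 40% yes -> fails
```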

Blockchain: The Next Breakthrough in the Rapid Progress of AI

http://dx.doi.org/10.5772/intechopen.75668

211

## **14. Other applications**

There are numerous additional applications of the blockchain technology pointing to substantial improvements. Some of them are listed below, while there is practically no limit to future ones being developed and implemented:

• Maritime insurance, reducing costs, decreasing fraud and speeding up the settlement of claims [27].

• Identifying epidemics faster while avoiding causing panic [28].

• Educational material can be exchanged safely among academic institutions while safeguarding the intellectual rights of the writers [29].

• Blockchain-enabled energy trading, saving millions of dollars per year.

As the adoption of new technologies has accelerated over time [30], the same phenomenon would probably occur with blockchain, resulting in more applications and faster penetration rates, allowing us to exploit its considerable benefits in record time and to witness quickening progress in the field.

## **15. Challenges**

The blockchain challenges can be classified as *general*, referring to the technology itself, and *specific* ones concerning virtual currencies.

*General:* adapting the blockchain technology and integrating it with existing IT systems may require significant changes, or even complete replacement of such systems, considerable initial investments and difficulties in hiring personnel to implement the technology. Although these problems are important, ready-made solutions and open systems may alleviate them; they are no different from those faced when the Internet or other new technologies were first introduced. Another concern is the high electricity consumption required to run all of the computers in the network, which some estimate to be equal to that of Ireland [31]. To avoid this problem, alternative technologies to pure blockchain have been developed and utilized. DeepMind, for instance, uses a method called Merkle trees to track data changes without requiring verification from all networked machines. Such trees allow the efficient and secure verification of the contents of large data structures when the major objective is the safety and immutability of the data rather than ensuring trust between the parties involved. Similarly, the "algorand" algorithm [32] substantially reduces the amount of computation required and possesses additional desirable properties. In the future, transaction speeds, verification times and data limits will further improve through innovations in order to deal with the exponentially growing number of transactions.

*Specific:* virtual currencies are currently too volatile and therefore too risky to be acquired by the public, while the fear of hacking and fraud is present. In addition, technical problems such as programming bugs in the code of smart contracts must be dealt with, as their consequences when the contracts are executed are critical. Finally, the problem of scalability of the blockchain technology must be addressed, as some platforms are reaching their capacity and storage limits. The hope is that as prices rise so will the need for innovative solutions that will eventually solve practically all problems.

## **15.1. Combining blockchain and AI**

As we have shown in this chapter, blockchain is a groundbreaking technology permitting the safe and reliable storage and transmission of data, among its other advantages. AI, on the other hand, is a revolutionary technology that can learn on its own by analyzing and discovering patterns in massive amounts of (big) data. There is, therefore, a natural complementarity between the two, as blockchain safely stores/transmits trustworthy data while AI requires huge amounts of reliable data to discover patterns and learn. In this section, we discuss the complementarity between the two technologies and consider the breakthrough innovations that could result from marrying them. The potential benefits are expected to be in the areas of medicine, autonomous vehicles (AV), smart contracts, the Internet of Things (IoT), decentralized autonomous organizations (DAOs) and many additional areas of application not yet conceived at present. In many cases, AI could not be used without the assurance of the safety and reliability of the data provided by blockchain and, vice versa, the value of many blockchain applications will be limited without AI.

Two examples can illustrate the complementarity and mutual benefits of joining blockchain and AI. Consider AVs: as more carmakers adopt "over the air" (OTA) software updates for their increasingly connected and autonomous cars, the risk of a hacker hijacking and stealing a car will also increase. A hijacked car could be forced to cause accidents or create traffic jams, while the worst possibility would be to hijack and program cars to accomplish simultaneous terrorist attacks in many cities. Similarly, if IoT devices can be hacked, a home's security will be compromised, or its equipment can malfunction. Therefore, the safety provided by blockchain is indispensable for the smooth utilization of AVs and IoTs. On the other hand, consider a smart contract application that depends on some environmental assumptions for its correct execution. Such a contract would become outdated once some of these assumptions no longer hold, making AI monitoring imperative in order to learn and determine on its own when the environment has changed. Although at present the blockchain and AI technologies may not be at the point of being successfully combined, the prospects for doing so in the near future are encouraging, motivated by the substantial expected benefits. The remainder of this section describes such advantages, clearly recognized in China where the first alliance for integrating artificial intelligence and blockchain is being established to harness these benefits [33].
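The AI-monitoring idea above can be sketched with a minimal drift detector: watch a quantity the contract assumes stable and flag the contract for review when recent observations move away from the baseline. The class, window size and tolerance are assumptions for illustration, not a production monitoring technique:

```python
from collections import deque
from statistics import mean

class AssumptionMonitor:
    """Toy sketch: track a quantity a smart contract assumes stable (say, a
    price feed) and report whether the assumption still holds."""

    def __init__(self, baseline: float, window: int = 20, tolerance: float = 0.10):
        self.baseline = baseline
        self.recent = deque(maxlen=window)  # sliding window of observations
        self.tolerance = tolerance          # allowed relative drift

    def observe(self, value: float) -> bool:
        """Return True while the contract's environmental assumption holds."""
        self.recent.append(value)
        drift = abs(mean(self.recent) - self.baseline) / self.baseline
        return drift <= self.tolerance

monitor = AssumptionMonitor(baseline=100.0)
assert all(monitor.observe(v) for v in [101, 99, 102, 98])  # stable regime
for v in [130, 135, 140] * 7:                               # regime change
    ok = monitor.observe(v)
assert not ok  # environment changed: the contract should be re-examined
```

In practice the "monitor" would be a learned model rather than a rolling mean, but the contract-level logic is the same: execution continues only while the learned model judges the contract's assumptions to be intact.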

## **16. Government operations**

Governments, apart from some pioneering ones already mentioned, are slow in adopting new technologies, and blockchain and AI are no exceptions, particularly as AI is still in a developmental stage apart from some applications in games and those involving language and image recognition [34]. This does not mean that there will not be significant progress in the future, as the steepest progress in AI occurred only a few years ago. At present, however, the majority of AI applications are centered on digital assistants, answering questions in natural language, and image processing, including face recognition techniques [35]. The future prospects, however, are huge, with estimated benefits running into the billions. AI applications could range from fighting tax evasion to establishing monetary and fiscal policies. The catchword of "cognitive AI", if it becomes a reality, can have profound implications, not only saving billions but also providing higher-quality services to the public and increasing the level of democratization. Some governments, such as Dubai's, are planning to introduce blockchain into their entire operations, reducing bureaucracy, improving their efficiency, reducing waste and pollution and saving billions in the process.


## **17. Digital currencies**

It is not obvious how AI can be combined with the blockchain technology used in bitcoin and other cryptocurrencies, although this could be achieved in the future when DAOs and robots are introduced, owning property and holding assets. In such a case, they will have to use AI to make the necessary machine-to-machine (M2M) transactions, using bitcoins for making and receiving payments.
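An M2M payment exchange can be sketched as two machines passing an authenticated message. This toy uses a shared-key HMAC to stay standard-library-only; real cryptocurrencies use public-key signatures, and the machine names, key and message fields are assumptions for the example:

```python
import hmac
import hashlib
import json

# Hypothetical scenario: an AV paying a charging station machine-to-machine.
SHARED_KEY = b"demo-key-not-for-production"

def sign(msg: dict) -> str:
    """Authenticate a payment message with the shared key."""
    payload = json.dumps(msg, sort_keys=True).encode()
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()

def verify(msg: dict, tag: str) -> bool:
    """Constant-time check that the message was not altered in transit."""
    return hmac.compare_digest(sign(msg), tag)

payment = {"payer": "av-42", "payee": "charger-7", "amount_btc": 0.0001}
tag = sign(payment)
assert verify(payment, tag)                              # untampered message accepted
assert not verify({**payment, "amount_btc": 1.0}, tag)   # altered amount rejected
```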

#### **17.1. eHealth**

While blockchain can assure safety and reliability, adding AI capabilities can greatly benefit the health sector. At present, AI is mainly used for detecting abnormalities in X-rays and CT scans, a task it performs at least as accurately as humans, and for enabling a greater level of personalized medicine. According to experts, the future holds significant innovations, given the momentous benefits that can be achieved by reducing medical costs and improving the quality of medical care. For this reason, all big players (Google, Microsoft, Apple and Amazon), as well as a host of startups, are actively exploring AI for medical applications aimed at the more effective utilization of patients' data, greater accuracy of diagnosis, better recommendations based on evidence-based research findings, and several other possibilities. These applications are on top of improvements in robotic surgery and digital advice provided through smartphone applications. According to Accenture [36], key clinical health AI applications can potentially create \$150 billion in annual savings for the United States healthcare economy by 2026.

## **18. The banking and financial sector**

The benefits of AI for the back office of banks and financial firms are widespread, as large histories of data are available. For a long time before AI was introduced, risk and fraud detection was performed with great success using predictive statistical decision rules. AI has raised such rules to a new level by allowing learning through the analysis of huge amounts of (big) data to identify patterns and improve decision-making. Klarna, a Swedish e-commerce company, provides instant evaluation of customers' creditworthiness for buying goods without a credit card; the same task is done by the Chinese Yongqianbao and several other firms. In addition, *"AI technology is being used to find the speediest way to execute trades, to make bets on market momentum, and to scan press releases and financial reports for keywords that could signal that a stock will rise or fall"* [37]. However, this is not the same as more accurate forecasting. Unfortunately, stocks and commodities behave like random walks and, according to efficient market theory, cannot be predicted any better than by using the most recent price as the forecast of future ones [38]. For instance, a recent study conducted by one of the authors of this paper [39], comparing statistical and AI (ML or NN) forecasting methods, found that the former were more accurate than the AI ones, half of which were less accurate than a random walk benchmark.
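The random-walk benchmark mentioned above can be illustrated with simulated (not market) data: on a pure random walk, the naive "last observed price" forecast is hard to beat, here compared against a simple moving-average model. The window size and series length are arbitrary choices for the sketch:

```python
import random
random.seed(42)

# Simulate a random walk: each price is the previous price plus Gaussian noise.
steps = [random.gauss(0, 1) for _ in range(5000)]
prices, p = [], 0.0
for s in steps:
    p += s
    prices.append(p)

def mae(errors):
    """Mean absolute error of a list of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

k = 5  # window for the moving-average "model"
naive_err, ma_err = [], []
for t in range(k, len(prices) - 1):
    actual = prices[t + 1]
    naive_err.append(actual - prices[t])                      # forecast = last value
    ma_err.append(actual - sum(prices[t - k + 1:t + 1]) / k)  # forecast = moving average

assert mae(naive_err) < mae(ma_err)  # the naive benchmark wins on a random walk
```

Any model that averages over history lags behind the walk's latest level, so its errors are systematically larger; this is the sense in which a random walk "cannot be predicted any better than by using the most recent price".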

Clearly, present AI applications in banking and finance are just the tip of the iceberg. Soon the power of AI to deliver better experiences, lower costs, reduced risks and increased revenues will become a reality, and such applications may even progress to more accurate forecasting.

A prime example of successful AI applications is Numerai [40], a San Francisco hedge fund that makes trades using machine-learning models built by thousands of anonymous data scientists paid in bitcoin. Another is Polychain, a fund that buys bitcoin and other digital currencies and invests in a radically new breed of businesses owned, funded, and operated entirely by decentralized networks of anonymous online investors.

## **19. Supply chain operations**


Blockchain technology is already utilized in supply chains, while its integration with AI is still in its infancy apart from logistics (the old scheduling/planning tasks), where AI is used extensively by some firms [41]. The future challenge is to extend AI to the remaining parts of the supply chain. Amazon, a pioneer in AI, has moved beyond just responding to customer demands by developing a whole profile for each customer and using such data in its AI applications. Manish Chandra and Anand Darvbhe of Accenture [42] point out, *"The use of AI in supply chains will ultimately result in spawning an ecosystem where supply chains link themselves with each other, enabling seamless flow of products and information from one end to the other"*, completely automating the chain and achieving significant benefits in the process.

## **20. AV and IoT**

Applying AI to AVs can go beyond just following a set course to take passengers from point A to point B: by continuously analyzing traffic information from connected AVs, the system can learn to determine the route depending on the time, the day, the weather conditions and a host of other factors. Moreover, it can even modify the course of a journey, if necessary, when the AI determines that traffic patterns are changing. Similarly, IoT devices can go beyond setting temperatures and ordering food by using AI to predict what their owners want and modify the settings to satisfy their evolving desires.
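Condition-dependent routing of this kind can be sketched as a shortest-path search whose edge costs come from a function standing in for the learned traffic model. The road network, costs and "storm" condition are invented for the example; this is Dijkstra's algorithm, not any AV vendor's planner:

```python
import heapq

def best_route(graph, start, goal, travel_time):
    """Dijkstra over a road graph whose edge cost depends on current
    conditions via travel_time(edge) -- a stand-in for the AI that
    re-weights routes by time of day, weather and live traffic."""
    dist, queue = {start: 0.0}, [(0.0, start, [start])]
    while queue:
        d, node, path = heapq.heappop(queue)
        if node == goal:
            return d, path
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in graph.get(node, []):
            nd = d + travel_time((node, nxt))
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt, path + [nxt]))
    return float("inf"), []

roads = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
base = {("A", "B"): 10, ("B", "D"): 10, ("A", "C"): 12, ("C", "D"): 12}

dry = lambda e: base[e]
storm = lambda e: base[e] * (3.0 if "B" in e else 1.0)  # roads via B congested

assert best_route(roads, "A", "D", dry)[1] == ["A", "B", "D"]    # 20 < 24
assert best_route(roads, "A", "D", storm)[1] == ["A", "C", "D"]  # avoid B
```

Swapping the `travel_time` function changes the chosen route without touching the planner, which mirrors how a learned traffic model would feed continuously updated costs into a fixed routing algorithm.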

**21. Conclusions**

nate Airbnb's monopolistic advantages.

Blockchain technology, according to Muneeb Ali, Blockstack Co-Founder, *"can help us advance from a 'don't be evil' world to a 'can't be evil' world".* Blockchain transactions assure trust and reliability, improve security and remove intermediaries from value chains. In a chapter, Tasca and Ulieru [3] state that, in a not-so-distant future, our economic structure will be organized around person-to-person decentralized platforms that could enable real sharing of marketplaces without intermediaries and central hubs, where all transactions between consumers and service providers will be done through decentralized, person-to-person networks. They discuss Uber and Airbnb as examples. Both companies create extra value exploiting their monopolistic advantage, derived from their centralized, proprietary software platforms, which allow them to dictate their conditions to drivers/owners and customers. LaZooz, using blockchain technology, on the other hand, has developed a decentralized transportation platform owned by the community, utilizing vehicles' unused space to create a variety of smart transportation solutions. LaZooz works with a "Fair Share" rewarding mechanism sharing value creation among developers, users and backers. Similarly, Slock (an Italian startup), uses open source blockchain technology, to develop the Universal Sharing Network (USN) to elimi-

Blockchain: The Next Breakthrough in the Rapid Progress of AI

http://dx.doi.org/10.5772/intechopen.75668

215

In addition to startups, established companies also seek to exploit the advantages of blockchain technology and diminish the monopolistic advantages of Internet giants. The CEO of TUI, the largest tourist firm in the world, believes that blockchain technology will break the almost "monopolistic" hold that Priceline, Expedia, Booking.com and Airbnb have today in the lodging and distribution ecosystem [45]. He believes that these firms create superior margins because they take advantage of their monopolistic power and that blockchain will destroy that. TUI, he explained, has already moved all of its contracts into its private blockchain. "We are using it today predominantly to have mechanisms to swap bedstock between different PMSs [Property Management Systems]," he said. "The next step is that the whole inventory will be on the blockchain." Then using smart contracts, which are simply code snippets that execute automatically on the blockchain, Joussen argues it can easily manage and automate a large part of bedstock and hotel capacity between all the markets TUI operates.

Clearly, TUI is not the only company developing blockchain applications. So, the critical question is how all these applications will affect the competitive landscape and how innovative startups will utilize blockchain technologies to disrupt established players and create the corresponding success stories of Amazon, Google and Facebook, among others, in the emerging Internet of value. In answering this question, we should have in mind Amara's law that states, *"We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run*". We strongly believe that in the long term, the Internet of Value will bring changes of equal or greater magnitude to those of the existing Internet of communications. The critical question is how to recognize such changes as soon as possible and how to profit by implementing them to gain competitive advantages. There is little doubt in our minds that in the next couple of decades, innovative, entrepreneurial startups marrying blockchain and AI technologies will disrupt established industry leaders such as Google, Amazon, Facebook,

## **20.1. Cognitive blockchain smart contracts (IBM) and DAO**

IBM is experimenting with turning smart contracts into "cognitive contracts" that can learn and adapt using AI [43]. This can be done by identifying pattern changes in the data, recognizing interesting interactions, detecting suspect activities, etc., in order to make recommendations for updating the smart contracts and taking specific actions based on insights gained from AI. Clearly, such cognitive contracts can be applied to DAOs to improve their effectiveness and value.

#### **20.2. Matrix chain: merging blockchain and AI**

Lately, efforts are being made to integrate AI and blockchain technologies into a single application. At the technical level this has been attempted by a new type of blockchain called the "MATRIX Chain" [44] whose aim is to merge blockchain and AI and set the path toward blockchain 3.0. The benefits that such technology will bring to distributed ledger technology comes down to making blockchain smarter and adding its ability to evolve through selflearning without the need to introduce AI as a separate technology.

A summary of the major applications integrating blockchain and AI is presented in **Table 1**, also showing an estimate of the extent of usage of each of the two technologies and the direction of what would need to be done to improve their future integration.


| Major applications | Application uses mostly | Future requirements (More AI) | Future requirements (More BC) |
|---|---|---|---|
| Government operations | Neither BC/AI\* | Yes | Yes |
| Digital currencies | BC | Yes | |
| Banking | BC | Yes | |
| eHealth | BC | Yes | Yes |
| FinTech | AI | | Yes |
| Supply chain | Little BC | Yes | Yes |
| Autonomous vehicles (AV) | All AI | | Yes |
| Internet of Things (IoT) | All BC | Yes | |
| Smart contracts | All BC | Yes | |
| DAOs | BC | Yes | |

**Table 1.** Major applications, their current utilization of BC/AI and their future requirement.

\* Apart from exceptions such as Estonia and a few other countries.

## **21. Conclusions**

Blockchain technology, according to Muneeb Ali, Blockstack Co-Founder, *"can help us advance from a 'don't be evil' world to a 'can't be evil' world"*. Blockchain transactions assure trust and reliability, improve security and remove intermediaries from value chains. In a chapter, Tasca and Ulieru [3] state that, in a not-so-distant future, our economic structure will be organized around person-to-person decentralized platforms that could enable real sharing marketplaces without intermediaries and central hubs, where all transactions between consumers and service providers are done through decentralized, person-to-person networks. They discuss Uber and Airbnb as examples. Both companies create extra value by exploiting their monopolistic advantage, derived from their centralized, proprietary software platforms, which allow them to dictate their conditions to drivers/owners and customers. LaZooz, on the other hand, has used blockchain technology to develop a decentralized transportation platform owned by the community, utilizing vehicles' unused space to create a variety of smart transportation solutions. LaZooz works with a "Fair Share" rewarding mechanism that shares value creation among developers, users and backers. Similarly, Slock.it (a German startup) uses open-source blockchain technology to develop the Universal Sharing Network (USN) to eliminate Airbnb's monopolistic advantages.

In addition to startups, established companies also seek to exploit the advantages of blockchain technology and to diminish the monopolistic advantages of the Internet giants. Fritz Joussen, the CEO of TUI, the largest tourist firm in the world, believes that blockchain technology will break the almost "monopolistic" hold that Priceline, Expedia, Booking.com and Airbnb have today on the lodging and distribution ecosystem [45]. He believes that these firms create superior margins because they take advantage of their monopolistic power, and that blockchain will destroy that advantage. TUI, he explained, has already moved all of its contracts onto its private blockchain. "We are using it today predominantly to have mechanisms to swap bedstock between different PMSs [Property Management Systems]," he said. "The next step is that the whole inventory will be on the blockchain." Then, using smart contracts, which are simply code snippets that execute automatically on the blockchain, Joussen argues that TUI can easily manage and automate a large part of the bedstock and hotel capacity across all the markets in which it operates.
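The "code snippets that execute automatically" idea can be made concrete with a toy sketch of a bedstock swap between two property management systems. All names and figures here are invented for illustration, and TUI's private blockchain works quite differently; the point is only the automatic-execution pattern: once both parties commit to the agreed terms, the swap executes without an intermediary.

```python
class BedstockSwapContract:
    """Toy 'smart contract': swaps room inventory between two PMSs
    only when both parties have committed to the agreed terms."""

    def __init__(self, pms_a, pms_b, rooms_from_a, rooms_from_b):
        # Ledger of what each party puts into the swap.
        self.ledger = {pms_a: rooms_from_a, pms_b: rooms_from_b}
        self.committed = set()
        self.executed = False

    def commit(self, pms):
        """Record one party's agreement; execute once all parties agree."""
        if pms not in self.ledger:
            raise ValueError(f"{pms} is not a party to this contract")
        self.committed.add(pms)
        if self.committed == set(self.ledger):
            self.executed = True  # both parties agreed: swap runs automatically
        return self.executed

contract = BedstockSwapContract("PMS-Madrid", "PMS-Berlin", 40, 35)
contract.commit("PMS-Madrid")   # waiting for the counterparty
contract.commit("PMS-Berlin")   # both committed: swap executes
print(contract.executed)        # True
```

On an actual blockchain, the commitment and execution steps would be transactions validated by the network rather than method calls, which is what removes the need for a trusted middleman.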

Clearly, TUI is not the only company developing blockchain applications. So, the critical question is how all these applications will affect the competitive landscape, and how innovative startups will utilize blockchain technologies to disrupt established players and create success stories to rival those of Amazon, Google and Facebook, among others, in the emerging Internet of Value. In answering this question, we should keep in mind Amara's law, which states: *"We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run."* We strongly believe that in the long term, the Internet of Value will bring changes of equal or greater magnitude to those of the existing Internet of communications. The critical question is how to recognize such changes as soon as possible and how to profit by implementing them to gain competitive advantages. There is little doubt in our minds that in the next couple of decades, innovative, entrepreneurial startups marrying blockchain and AI technologies will disrupt established industry leaders such as Google, Amazon, Facebook, Uber and Airbnb, although they may not reach their size because of the limitations imposed by the decentralized attributes of blockchain technology.


Blockchain: The Next Breakthrough in the Rapid Progress of AI

http://dx.doi.org/10.5772/intechopen.75668


For us, the most interesting question is "who and in what areas are going to emerge the new Googles, Baidus, Facebooks, Amazons and Alibabas, and how will they successfully exploit blockchain and AI, although such a marriage may still be several years away?"

## **Author details**

Spyros Makridakis\*, Antonis Polemitis, George Giaglis and Soula Louca

\*Address all correspondence to: spyrosmakridakis@gmail.com

Faculty, University of Nicosia; Members of the Blockchain/AI Team, Institute For the Future (IFF), Nicosia

## **References**

[1] IBM. IBM Study: C-suite executives exploring Blockchain aim to disrupt, Not Defend; 2017

[2] McKendrick J. Blockchain as Blockbuster: Still Too Soon to Tell, But Get Ready. Forbes; 2017

[3] Prisco G. Move Over Uber: Blockchain Technology Can Enable Real, Sustainable Sharing Economy; 2016. Available: https://bitcoinmagazine.com/articles/move-over-uber-blockchaintechnology-can-enable-real-sustainable-sharing-economy-1480629178/. Last Accessed 30 October, 2017

[4] CBInsights.com. Banking Is Only The Beginning: 36 Big Industries Blockchain Could Transform. 2018. Available: https://www.cbinsights.com/research/industries-disruptedblockchain/?utm\_source=CB+Insights+Newsletter&utm\_campaign=fa48df10a8-ThursNL\_02\_01\_2018&utm\_medium=email&utm\_term=0\_9dc0513989-fa48df10a8-90141994

[5] Tasca P, Aste T, Pelizzon L, Perony N, editors. Banking Beyond Banks and Money. Springer International Publishing; 2017

[6] Kocianski S. The blockchain in banking report: The future of blockchain solutions and technologies. 2017. Available: http://www.businessinsider.com/blockchain-in-banking-2017-3. Last Accessed 30 October, 2017

[7] Christidis K, Devetsikiotis M. Blockchains and smart contracts for the internet of things. IEEE Access. 2016. DOI: 10.1109/ACCESS.2016.2566339

[8] Lohade N. Dubai Aims to Be a City Built on Blockchain. WSJ; 2017

[9] Compton J. How Blockchain Could Revolutionize The Internet Of Things. 2017. Available: https://www.forbes.com/sites/delltechnologies/2017/06/27/how-blockchain-could-revolutionize-the-internet-of-things/#7617423b6eab. Last Accessed 30 October, 2017

[10] Maor R. Cloud Computing: The Future Belongs to Blockchain. 2017. Available: http://www.forbes.co.il/news/new.aspx?Pn6VQ=L&0r9VQ=EIMKF. Last Accessed 30 October, 2017

[11] Marvin R. Blockchain: The Invisible Technology That's Changing the World. 2017. Available: https://www.pcmag.com/article/351486/blockchain-the-invisible-technology-thats-changing-the-wor. Last Accessed 30 October, 2017

[12] Bloomberg J. Eight Reasons To Be Skeptical About Blockchain. 2017. Available: https://www.forbes.com/sites/jasonbloomberg/2017/05/31/eight-reasons-to-be-skeptical-about-blockchain/1. Last Accessed 30 October, 2017

[13] Flieswasser K. Will blockchain disrupt or go bust? 2017. Available: https://www.topbots.com/6-challenges-preventing-widespread-blockchain-technology-adoption/. Last Accessed 30 October, 2017

[14] Stoll C. Why the web won't be nirvana. 1995. Available: http://www.newsweek.com/clifford-stoll-why-web-wont-be-nirvana-185306. Last Accessed 30 October, 2017

[15] Lagarde C. Central Banking and Fintech—A Brave New World? 2017. Available: https://www.imf.org/en/News/Articles/2017/09/28/sp092917-central-banking-and-fintech-a-brave-new-world. Last Accessed 30 October, 2017

[16] Smart Dubai. Dubai blockchain strategy. 2016. Available: http://www.smartdubai.ae/dubai\_blockchain.php. Last Accessed 30 October, 2017

[17] The Economist Intelligence Unit. Artificial Intelligence in the Real World: The business case takes shape. London; 2016

[18] Gulf News. Dubai has world's first government entity to conduct transactions through Blockchain network. 2017. Available: http://gulfnews.com/business/property/dubaihas-world-s-first-government-entity-to-conduct-transactions-through-blockchain-network-1.2101819. Last Accessed 30 October, 2017

[19] Rajashekara M. 3 Ways In Which Fintech Is Riding The Blockchain Wave. 2017. Available: http://www.huffingtonpost.in/rajashekara-v-maiya/3-ways-in-which-fintech-is-riding-the-blockchain-wave\_a\_21876915/. Last Accessed 30 October, 2017

[20] BBC. China bans initial coin offerings calling them 'illegal fundraising'. 2017. Available: http://www.bbc.com/news/business-41157249. Last Accessed 30 October, 2017

[21] Ekblaw A et al. A case study for Blockchain in healthcare: "MedRec" prototype for electronic health records and medical research data. IEEE Conference, 22-24 August; 2016

[22] Halamka D, Lippman A, Ekblaw A. The Potential for Blockchain to Transform Electronic Health Records. 2017. Available: https://hbr.org/2017/03/the-potential-for-blockchain-to-transform-electronic-health-records. Last Accessed 30 October, 2017

[23] Das R. Does Blockchain Have A Place In Healthcare? 2017. Available: https://www.forbes.com/sites/reenitadas/2017/05/08/does-blockchain-have-a-place-in-healthcare/. Last Accessed 30 October, 2017

[24] Agrello. How to Make Smart Contracts Worthy of Their Name Using Artificial Intelligence. 2017. Available: https://blog.agrello.org/how-to-make-smart-contracts-worthy-of-their-name-using-artificial-intelligence-3a90e4dd3c47. Last Accessed 30 October, 2017

[25] Artificiallawyer.com. OpenLaw Brings Legal Norms to Blockchain Token Transactions. 2017. Available: https://blog.agrello.org/how-to-make-smart-contracts-worthy-of-their-name-using-artificial-intelligence-3a90e4dd3c47. Last Accessed 30 October, 2017

[26] Levine M. Blockchain Company's Smart Contracts Were Dumb. 2016. Available: https://www.bloomberg.com/view/articles/2016-06-17/blockchain-company-s-smart-contracts-were-dumb. Last Accessed 30 October, 2017

[27] EY.com. EY, Guardtime and industry participants launch the world's first marine insurance blockchain platform. 2017. Available: http://www.ey.com/gl/en/newsroom/news-releases/news-ey-guardtime-and-industry-participants-launch-the-worlds-first-marine-insurance-blockchain-platform. Last Accessed 30 October, 2017

[28] Jones B. The CDC Wants to Use Blockchain as a Weapon Against Deadly Epidemics. 2017. Available: https://futurism.com/the-cdc-wants-to-use-blockchain-as-a-weapon-against-deadly-epidemics/. Last Accessed 30 October, 2017

[29] Acheson N. Blockchain and Education: A Big Idea in Need of Bigger Thinking. 2017. Available: https://www.coinndesk.com/blockchain-education-big-idea-need-bigger-thinking/. Last Accessed 30 October, 2017

[30] McGrath RM. The Pace of Technology Adoption Is Speeding up. Harvard Business Review; 2013

[31] O'Dwyer KJ, Malone D. Bitcoin Mining and Its Energy Footprint. ISSC 2014/CIICT 2014, Limerick; 2014

[32] Chen J, Micali S. Algorand. Technical report; 2017. URL: http://arxiv.org/abs/1607.01341

[33] AIES. First Alliance for the Development of Artificial Intelligence and Block Chain Technologies will be Established in China for the Development of Integration of the two Technologies. 2017. Available: http://aies.in/first-alliance-development-artificial-intelligence-block-chain-technologies-will-established-china-development-integration-two-technologies/. Last Accessed 30 October, 2017

[34] Pontin J. Greedy, Brittle, Opaque, and Shallow: The Downsides to Deep Learning. Wired.com; 2018. Available: https://www.wired.com/story/greedy-brittle-opaque-and-shallow-the-downsides-to-deep-learning/

[35] Deloitte. AI-augmented government: Using cognitive technologies to redesign public sector work. The Deloitte Center for Government Insights; 2017

[36] Accenture. Why artificial intelligence is the future of growth. 2017. Available: https://www.accenture.com/us-en/insight-artificial-intelligence-future-growth. Last Accessed 30 October, 2017

[37] Satariano A, Kumar N. The Massive Hedge Fund Betting on AI. 2017. Available: https://www.bloomberg.com/news/features/2017-09-27/the-massive-hedge-fund-betting-on-ai. Last Accessed 30 October, 2017

[38] Malkiel BG. A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing. New York: W.W. Norton; 2017

[39] Makridakis S, Spiliotis E, Assimakopoulos V. The accuracy of machine learning (ML) forecasting methods versus statistical ones: Extending the results of the M3-competition. UNIC Working Paper; 2017

[40] Craib R. A New Cryptocurrency For Coordinating Artificial Intelligence on Numerai. 2017. Available: https://medium.com/numerai/a-new-cryptocurrency-for-coordinating-artificial-intelligence-on-numerai-9251a131419a. Last Accessed 30 October, 2017

[41] CB Insights. Amazon Strategy Teardown: Building New Business Pillars In AI, Next-Gen Logistics, And Enterprise Cloud Apps. 2017. Available: https://www.cbinsights.com/research/report/amazon-strategy-teardown/. Last Accessed 30 October, 2017

[42] Chandra M, Darbhe A. Artificial Intelligence: The next big thing in Supply Chain Management. 2016. Available: http://www.financialexpress.com/industry/artificialintelligence-the-next-big-thing-in-supply-chain-management/329033/. Last Accessed 30 October, 2017

[43] IBM. Building Trust in Government: Exploring the Potential of Blockchain. IBM Institute of Business Value and the Economist Intelligence Unit, London; 2017. Available: https://www-03.ibm.com/press/us/en/pressrelease/52418.wss. Last Accessed 30 October, 2017

[44] Hebblethwaite C. Merging blockchain and AI with MATRIX Chain. 2017. Available: https://www.blockchaintechnology-news.com/2017/08/08/merging-blockchain-ai-matrix-chain/. Last Accessed 30 October, 2017

[45] Montali D. Blockchain Will Disrupt Expedia and Airbnb, TUI CEO Says. 2017. Available: https://skift.com/2017/07/11/blockchain-will-disrupt-expedia-and-airbnb-tui-ceo-says/. Last Accessed 30 October, 2017


**Chapter 11**

**Provisional chapter**

**Augmenting Reality with Intelligent Interfaces**

**Augmenting Reality with Intelligent Interfaces**

DOI: 10.5772/intechopen.75751

It is clear that our daily reality will increasingly interface with virtual inputs. We already integrate the virtual into real life through constantly evolving sensor technologies embedded into our smartphones, digital assistants, and connected devices. Simultaneously, we seek more virtual input into our reality through intelligent interfaces for the applications that these devices can run in a context rich, socially connected, and personalized way. As we progress toward a future of ubiquitous Augmented Reality (AR) interfaces, it will be important to consider how this technology can best serve the various populations that can benefit most from the addition of these intelligent interfaces. This paper proposes a new terminological framework to discuss the way AR interacts with users. An intelligent interface that combines digital objects in a real-world context can be referred to as a Pose-Interfaced Presentation (PIP): Pose refers to user location and orientation in space; Interfaced means that the program responds to a user's intention and actions in an intelligent way; and Presentation refers to the virtual object or data being layered onto the perceptive field of the user. Finally, various benefits of AR are described and examples

are provided in the areas of education, worker training, and ESL learning.

**Keywords:** mixed reality, Augmented Reality for education, intelligent interface,

Researchers have long foreseen that Augmented Reality (AR) and Virtual Reality (VR) technologies have the potential to play a large role in our future daily lives, education, and business operations [1, 2]. The ubiquity of sensor-rich technologies in the pockets of the masses, the rapid pace of technological progress for such devices, and widespread interest in mixed-reality interfaces by both researchers and corporations, are harbingers of that foresight. Since it is

> © 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use,

distribution, and reproduction in any medium, provided the original work is properly cited.

Dov Schafer and David Kaufman

Dov Schafer and David Kaufman

http://dx.doi.org/10.5772/intechopen.75751

Pose-Interfaced Presentation, PIP

**Abstract**

**1. Introduction**

Additional information is available at the end of the chapter

Additional information is available at the end of the chapter

#### **Augmenting Reality with Intelligent Interfaces Augmenting Reality with Intelligent Interfaces**

DOI: 10.5772/intechopen.75751

#### Dov Schafer and David Kaufman Dov Schafer and David Kaufman

Additional information is available at the end of the chapter Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75751

#### **Abstract**

It is clear that our daily reality will increasingly interface with virtual inputs. We already integrate the virtual into real life through constantly evolving sensor technologies embedded into our smartphones, digital assistants, and connected devices. Simultaneously, we seek more virtual input into our reality through intelligent interfaces for the applications that these devices can run in a context rich, socially connected, and personalized way. As we progress toward a future of ubiquitous Augmented Reality (AR) interfaces, it will be important to consider how this technology can best serve the various populations that can benefit most from the addition of these intelligent interfaces. This paper proposes a new terminological framework to discuss the way AR interacts with users. An intelligent interface that combines digital objects in a real-world context can be referred to as a Pose-Interfaced Presentation (PIP): Pose refers to user location and orientation in space; Interfaced means that the program responds to a user's intention and actions in an intelligent way; and Presentation refers to the virtual object or data being layered onto the perceptive field of the user. Finally, various benefits of AR are described and examples are provided in the areas of education, worker training, and ESL learning.

**Keywords:** mixed reality, Augmented Reality for education, intelligent interface, Pose-Interfaced Presentation, PIP

## **1. Introduction**

Researchers have long foreseen that Augmented Reality (AR) and Virtual Reality (VR) technologies have the potential to play a large role in our future daily lives, education, and business operations [1, 2]. The ubiquity of sensor-rich technologies in the pockets of the masses, the rapid pace of technological progress for such devices, and widespread interest in mixed-reality interfaces by both researchers and corporations, are harbingers of that foresight. Since it is

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. © 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

clear that we are moving toward a future where the virtual will play a big role in our real lives, how we best interface with the computers in our natural environment will become an increasingly important consideration.

An intelligent interface is one that learns about the user and can adapt to serve them better through a minimally invasive presentation. An interface designed to overcome challenges faced by specific user groups by responding to the context of the user can be called intelligent [3]. Mixed-reality applications should be created to assist people and to be responsive to a diverse range of needs in a variety of contexts. This paper will discuss the rise of Augmented Reality as a means to interface with data in an intelligent way by discussing some examples of AR designed specifically for the needs of different learner groups: workers, students, foreign language learners, and immigrants.

This literature review to parse the current state of Augmented Reality for Education was done between October and November 2017 using ERIC ProQuest, the Social Sciences Citation Index (SSCI) database, and Google Scholar. The search terms 'augmented reality' OR 'mixed reality' OR 'augmenting reality' + 'education' OR 'learning' were used. Search results were limited to articles and conference papers in English. Much of the foundational research and history of AR was discovered through reading the most widely cited works on AR according to Google Scholar's reverse citation search function. The purposes of this review are to place AR in the broader historical context of technologies that inhabit the spectrum of mixed reality within education technology research and to paint a compelling picture of the current state of AR for enhancing learning.

Reality and virtuality act as two opposite ends of a continuum [7]. Milgram and Kishino [7] proposed that there were various ways that reality and virtual content could be mixed and presented. Reality could begin as a location in physical space or could be computer generated. In a VR setting, reality becomes entirely virtual and replaces natural reality to create an immersive experience. Inversely, the goal of an AR system is to "enhance reality with digital content in a non-immersive way." ([4], p. 79). A mixed-reality program can be to varying degrees exocentric or egocentric [7]; the user can feel as if reality is situated within the program, such as when playing an immersive 3D video game with a head-mounted display [HMD], or the program can seem to be layered on top of reality and situated within it, as when viewing context specific information about your environment on a smart phone or interacting

Augmenting Reality with Intelligent Interfaces http://dx.doi.org/10.5772/intechopen.75751 223

The real environment acts as a substrate for virtual modifications. In AR, the user can maintain a direct view of the substrate background upon which the virtual is layered, as with HoloLens [9], or alongside the substrate, as exemplified with smart glasses [10]. Alternatively, the substrate can be processed by a camera and presented on a display; this is what smartphones currently do. A seemingly more futuristic option is that the virtual object can be projected directly into reality, such as a spatial AR or hologram [11]. The focus of this paper is the exploration of AR interfaces for the benefit of different learner populations, but the points made herein can be applied to the VR or AV ends of the continuum as well. The exact mix of how much reality is injected into the virtual (or visa-versa) is worthy of consideration when

Klopfer and Squire [12] resist using technological features to define AR; they claim AR occurs in "a situation in which a real-world context is dynamically overlaid with coherent location or context sensitive virtual information" ([12], p. 205). Initially, many researchers have tried to craft an exact definition for AR (e.g., [2, 7]), but in Klopfer and Squire's opinion, any definition

with a virtual tour guide in a museum (**Figure 1**).

**Figure 1.** Modified version of Milgram's virtuality continuum [8].

designing and studying mixed-reality applications.

**3. Pose-Interfaced Presentation: A classification framework**

## **2. The spectrum of mixed reality**

In the early 1990s, Tom Caudell and David Mizell were exploring the possibility of a headsup display to enable workers to more efficiently create wire harness bundles during aircraft manufacturing at Boeing. They developed a see-through visual interface to identify, through the use of virtual cues, which real wires should be bundled together. They published the first academic paper that used the term 'Augmented Reality' [1] and are often credited with coining the term. Since its inception, AR has been widely studied in many contexts, while its underlying technologies have progressed by leaps and bounds [4]. The general principle of a continuum between the real and the virtual still exists [5] and is becoming an increasingly poignant consideration in our daily lives.

Augmented Reality is not a specific device or program; it is a type of human-computer interaction that occurs through a combination of technologies that superimpose computer-generated content over a real-world environment. Encompassing a broad range of technologies and components, AR has been historically defined as any system that: (1) combines the real and the virtual, (2) is interactive in real time, and (3) appears three dimensionally [2]. AR overlays virtual objects into the real world. These virtual objects then appear to coexist in the same space as objects in the real world for the purposes of interacting with the user in some meaningful way [6].

**Figure 1.** Modified version of Milgram's virtuality continuum [8].

clear that we are moving toward a future where the virtual will play a big role in our real lives, how we best interface with the computers in our natural environment will become an

An intelligent interface is one that learns about the user and can adapt to serve them better through a minimally invasive presentation. An interface designed to overcome challenges faced by specific user groups by responding to the context of the user can be called intelligent [3]. Mixed-reality applications should be created to assist people and to be responsive to a diverse range of needs in a variety of contexts. This paper will discuss the rise of Augmented Reality as a means to interface with data in an intelligent way by discussing some examples of AR designed specifically for the needs of different learner groups: workers, students, foreign

This literature review of the current state of Augmented Reality (AR) for education was conducted between October and November 2017 using ERIC ProQuest, the Social Sciences Citation Index (SSCI) database, and Google Scholar. The search terms 'augmented reality' OR 'mixed reality' OR 'augmenting reality' + 'education' OR 'learning' were used, and results were limited to articles and conference papers in English. Much of the foundational research and history of AR was discovered through reading the most widely cited works on AR according to Google Scholar's reverse citation search function. The purposes of this review are to place AR in the broader historical context of technologies that inhabit the spectrum of mixed reality within education technology research and to paint a compelling picture of the current state of AR for enhancing learning.

In the early 1990s, Tom Caudell and David Mizell were exploring the possibility of a heads-up display to enable workers to more efficiently create wire harness bundles during aircraft manufacturing at Boeing. They developed a see-through visual interface to identify, through the use of virtual cues, which real wires should be bundled together. They published the first academic paper that used the term 'Augmented Reality' [1] and are often credited with coining the term. Since its inception, AR has been widely studied in many contexts, while its underlying technologies have progressed by leaps and bounds [4]. The general principle of a continuum between the real and the virtual still exists [5] and is becoming an increasingly poignant consideration in our daily lives.

Augmented Reality is not a specific device or program; it is a type of human-computer interaction that occurs through a combination of technologies that superimpose computer-generated content over a real-world environment. Encompassing a broad range of technologies and components, AR has been historically defined as any system that: (1) combines the real and the virtual, (2) is interactive in real time, and (3) appears three dimensionally [2]. AR overlays virtual objects onto the real world. These virtual objects then appear to coexist in the same space as objects in the real world for the purposes of interacting with the user in some meaningful way [6].

222 Artificial Intelligence - Emerging Trends and Applications

## **2. The spectrum of mixed reality**

Reality and virtuality act as two opposite ends of a continuum [7]. Milgram and Kishino [7] proposed that there were various ways that reality and virtual content could be mixed and presented. Reality could begin as a location in physical space or could be computer generated. In a VR setting, reality becomes entirely virtual and replaces natural reality to create an immersive experience. Inversely, the goal of an AR system is to "enhance reality with digital content in a non-immersive way" ([4], p. 79). A mixed-reality program can be, to varying degrees, exocentric or egocentric [7]; the user can feel as if reality is situated within the program, such as when playing an immersive 3D video game with a head-mounted display (HMD), or the program can seem to be layered on top of reality and situated within it, as when viewing context-specific information about your environment on a smartphone or interacting with a virtual tour guide in a museum (**Figure 1**).

The real environment acts as a substrate for virtual modifications. In AR, the user can maintain a direct view of the substrate background upon which the virtual is layered, as with HoloLens [9], or alongside the substrate, as exemplified by smart glasses [10]. Alternatively, the substrate can be processed by a camera and presented on a display; this is what smartphones currently do. A seemingly more futuristic option is that the virtual object can be projected directly into reality, as with spatial AR or holograms [11]. The focus of this paper is the exploration of AR interfaces for the benefit of different learner populations, but the points made herein can be applied to the VR or AV ends of the continuum as well. The exact mix of how much reality is injected into the virtual (or vice versa) is worthy of consideration when designing and studying mixed-reality applications.
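The camera-mediated ('video see-through') option described above can be sketched in a few lines: the camera frame acts as the real-world substrate, and virtual pixels are alpha-blended on top wherever a virtual object is present. This is a generic illustration, not any particular toolkit's API; the pixel values and the blending weight are invented for the example.

```python
# Minimal sketch of video see-through compositing: a camera frame is the
# real-world substrate, and a virtual overlay is alpha-blended on top.
# Real AR pipelines do this per-pixel on the GPU; values here are illustrative.

def composite(camera_px, virtual_px, alpha):
    """Blend one virtual pixel over one camera pixel.

    camera_px, virtual_px: (r, g, b) tuples with components in 0..255.
    alpha: opacity of the virtual layer, 0.0 (invisible) to 1.0 (opaque).
    """
    return tuple(
        round(alpha * v + (1.0 - alpha) * c)
        for v, c in zip(virtual_px, camera_px)
    )

def overlay(frame, virtual_layer, alpha=0.6):
    """Composite a virtual layer over a camera frame (lists of pixel rows).

    Pixels where the virtual layer is None are left untouched, so the real
    environment shows through everywhere the virtual object is absent.
    """
    return [
        [
            cam if virt is None else composite(cam, virt, alpha)
            for cam, virt in zip(frame_row, virtual_row)
        ]
        for frame_row, virtual_row in zip(frame, virtual_layer)
    ]

# A 1x2 'frame': one gray pixel, one white pixel; the virtual layer covers
# only the second pixel with pure red at 60% opacity.
frame = [[(128, 128, 128), (255, 255, 255)]]
virtual = [[None, (255, 0, 0)]]
print(overlay(frame, virtual))
```

The uncovered pixel passes through unchanged, which is what distinguishes augmentation (reality plus overlay) from the fully synthetic image a VR system would render.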

## **3. Pose-Interfaced Presentation: A classification framework**

Klopfer and Squire [12] resist using technological features to define AR; they claim AR occurs in "a situation in which a real-world context is dynamically overlaid with coherent location or context sensitive virtual information" ([12], p. 205). Many researchers have tried to craft an exact definition for AR (e.g., [2, 7]), but in Klopfer and Squire's opinion, any such definition would restrict the meaning, so any technology that combines real and digital information could be considered to be augmenting reality. An intelligent AR interface requires the combination of tracking input, to find the pose of the user, with displayed output that combines the real and virtual worlds in order to support the user. Therefore, this paper proposes a new terminological framework with which to discuss the way AR interacts with users: an intelligent interface that combines digital objects in a real-world context can be referred to as a Pose-Interfaced Presentation (PIP):

• Pose—user location and orientation in space is accurately tracked and sent to the program;

• Interfaced—the program responds to a user's intention and actions in an intelligent way; and

• Presentation—virtual objects or data are layered on to the perceptive field of the user.

The following three sections discuss the current state of the field based upon the above PIP framework.

## **3.1. Pose: tracking on smart devices**

The keys to a successful device running an AR interface are (1) the accuracy of the tracking inputs, so that virtual objects can be precisely overlaid on the field of view; (2) the speed of the processor, so that more complex virtual objects (such as videos and 3D models) can be layered onto the field of view; and (3) ease of portability, so the user is free to move around in the real world unencumbered. An important consideration for successful AR devices is that they have an array of highly accurate sensors in order to establish and maintain detailed data about the environment. The application makes use of these data to situate a virtual output in real space. In a mobile phone, a visual AR interface uses the built-in camera to send image data to the application, which is programmed to find some pre-determined pattern (e.g., a QR code, picture, or building) that exists in the real world to determine user and device pose. The application then layers a virtual object over a real-time display of the environment, presented on the screen. A location-based AR interface uses GPS coordinates and a compass, while inertial tracking uses a gyroscope and accelerometer, rather than the camera, to track pose. Data triangulation through multiple-sensor input is required to eliminate drift in visually established pose [4]. While visual, location-based, and inertial sensors are by far the most common methods of tracking pose, AR systems can also sense sound, electromagnetic fields, radio waves, infrared light, or temperature. Hybrid pose-tracking methods, which combine input from multiple sensors at once, are far more reliable [13].

Pose-tracking accuracy, and sensors in general, will continue to develop, as smartphone technology investment shows no signs of slowing. As of May 2016, the global average user range error for GPS receivers was ≤2.3 ft 95% of the time [14]. GPS enhancement technologies such as Real-Time Kinematic (RTK) positioning, which uses the GPS signal's carrier wave for measurement, have the potential to improve accuracy to the centimeter level. This increased level of sensitivity will allow AR-based mobile applications to establish exactly where the device is located in space and therefore make much better use of location information in context-aware applications.

Another area of rapid progress that will improve AR is visual sensor technology. 3D camera systems are already appearing in flagship smartphones and should continue to proliferate [15]. The use of multiple cameras can provide depth information that greatly improves pose fidelity. One well-known approach to real-time 3D environment modeling is the Microsoft KinectFusion system. Newcombe et al. [16] describe KinectFusion as using visual data obtained from structured-light depth sensors to create real-time 3D models of objects and environments; these models are also used for tracking the pose of the device and the user in the environment. By combining tracking of the environment with tracking of the user's limbs, KinectFusion allows for a highly accurate gesture-based human-computer interface through which users can interact with a detailed 3D model of the environment. This type of 'natural user interface' [8] has enough fidelity to track even facial expressions [17] and may end up becoming as common in future mobile devices as multi-touch screens are today. We can expect pose tracking to improve in both speed and accuracy over the coming years.

Augmenting Reality with Intelligent Interfaces http://dx.doi.org/10.5772/intechopen.75751 225

## **3.2. Interfaced: responding to human intention**

Once the user pose is known, the user needs to be able to interact with the program. AR programs, by their physically situated and reality-referenced nature, tap into an intrinsic human affinity for physical rules and spaces. The interface between the virtual and the real world is accomplished through physical gestures, voice, movement through space, and gaze [18]. That interaction should be intelligent; that is, it should allow interaction with computers in a minimally invasive, intuitive, and efficient way that minimizes excessive perceptual and cognitive load [19, 20]. An intelligent interface should allow a program to be able to accurately determine the wishes of the user and respond to their needs with minimal input and data display. The ultimate goal is to create an interface that is functionally invisible to the user and makes interacting with the virtual world as natural as interacting with real-world objects, removing the separation between the digital and physical. Augmented Reality is one of the first technologies to make this type of intelligent, reality-referenced interface possible [4].
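A concrete example of the minimally invasive interaction described above is dwell-based gaze selection, where sustained gaze on a virtual object is interpreted as intent to select it, so no controller or explicit gesture is needed. The sketch below is purely illustrative: the 800 ms threshold, the sampling period, and the target names are assumptions for the example, not details from this chapter.

```python
# Sketch of dwell-based gaze selection: sustained gaze on one virtual
# target is interpreted as intent to select it, a minimally invasive
# form of input. The threshold and target names are illustrative only.

DWELL_THRESHOLD_MS = 800  # how long gaze must rest on a target to select it

def select_by_dwell(gaze_samples, sample_period_ms=100):
    """Return the first target the user dwells on long enough, else None.

    gaze_samples: sequence of target names (or None) sampled at fixed
    intervals, e.g. the object under the user's gaze every 100 ms.
    """
    current, held_ms = None, 0
    for target in gaze_samples:
        if target is not None and target == current:
            held_ms += sample_period_ms        # gaze stayed on same target
        else:
            # gaze moved: restart the dwell timer on the new target
            current = target
            held_ms = 0 if target is None else sample_period_ms
        if current is not None and held_ms >= DWELL_THRESHOLD_MS:
            return current
    return None

# Gaze flits past a 'menu' object, then rests on 'exhibit_label' for 0.9 s,
# which crosses the dwell threshold and triggers a selection.
samples = ['menu', None] + ['exhibit_label'] * 9
print(select_by_dwell(samples))
```

The design choice here is the one the paragraph argues for: intent is inferred from natural behavior (where the user is already looking) rather than from an extra input device, at the cost of tuning the threshold to avoid accidental selections.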

Interfaces should support users' plans and goals in order to be effective. An intelligent interface presents information clearly and is easy to understand [19]. There is a lot of overlap between 'intelligent' interfaces and what could be called 'effective user-experience design.' Sullivan et al. [19] state that the requirements of intelligent interfaces are essentially those of human-computer interaction research in general. AR is a form of human-computer interaction that facilitates intelligent interface design because of its ability to make complex interactions feel grounded in reality. Situating the experience of computing in physically referenced space is a good first step toward creating a type of human-computer interface that is responsive and feels meaningful.

## **3.3. Presentation: Augmented Reality displays and devices**

Our reality will be increasingly interfaced with virtual information; the trillion-dollar question is how the digital world should best be interacted with and presented to users. Research firm Global Market Insights [21] predicts that the global market for AR products will grow by around 80% annually to reach \$165 billion by 2024. Giant US blue-chip corporations have recently invested heavily in AR, in products such as Microsoft's AR headset and enterprise feature suite HoloLens [8], Apple's AR platform ARKit [22], Google's smart glasses (Glass, now part of the X division) [9], Android's AR platform ARCore [23], and Facebook's VR headset Oculus Rift [24], while ambitious startups such as Magic Leap [25] and Meta [26] seek to create entire AR ecosystems of their own. Although sentiment on VR softened in the first half of 2017, corporate investors are now actively looking for mobile AR investment opportunities. Digi-Capital's new AR/VR Investment Report reported that \$1.8 billion was invested across 27 AR/VR investment sectors in the first three quarters of 2017 [27].

Major corporations are placing a lot at stake in AR. Taking into account these recent trends in the AR development space, it is important to ask how this emergent technology can best be applied, and who might benefit from the paradigm shift of layering data onto reality rather than alongside it. AR development should consider how best to leverage this new form of human-computer interaction to empower people who need support. In order to do this, it is useful to identify what AR is especially good at doing in real-world contexts in order to help focus design efforts and maximize potential benefits.

## **4. Current and future trends in Augmented Reality**

The future of AR lies in the usage and adoption of this new human-information interaction (HII) paradigm, not in the AR devices themselves. Technology will evolve over time, but fundamental improvements to HII and intelligent interface design will survive those mutations. Although head-mounted AR display technologies such as Meta Vision, Magic Leap, and HoloLens are grabbing all the media attention, they are merely stepping stones toward the widespread proliferation and integration of AR into mainstream culture. These flashy new devices are still too large and expensive for daily consumer use. HMDs have a limited field of view and image fidelity, as well as practical limitations such as battery life and limited product lifespan. That being said, HMD AR devices have demonstrated utility within commercial and industrial applications, as is discussed in Section 4.2 of this chapter.

As technology improves, mobile devices are becoming ubiquitous and capable of running AR applications. It used to be the case that immersive reality augmentation required expensive, stationary computers and bulky HMDs. The first portable AR interfaces involved LCD screens tethered to desktop computers that provided the necessary tracking and processing power [28]. The sensors required for establishing the position and orientation (pose) of the user and device relative to the environment have also rapidly evolved since mobile phones received their first cameras in 1997 and Mohring and Bimber demonstrated the first mobile phone-based AR application in 2004 [4]. Today, smartphones can easily and smoothly run the software required for AR and are outfitted with a plethora of useful sensors and receivers (e.g., gyroscope, compass, accelerometer, light sensors, cameras, GPS, Wi-Fi radio, and near-field communication or NFC) that allow programs to accurately track pose and make use of situational data that can be layered on top of the environment to augment the reality that the user experiences. Thanks to the development of more advanced smartphones and widely available software platforms such as Apple's ARKit and Android's ARCore, seamless dialog between reality and virtuality can now be accomplished much more easily than ever before, and this bodes well for the future of mass AR adoption.

We are progressing toward a technological future where our daily reality is intermixed with digital information that is presented to us in a real-time, user-friendly way. Earlier in this paper we outlined the PIP framework for the discussion of AR; our smartphones are mature, widely used devices capable of the Pose-Interfaced Presentation required for an advanced AR interface. Adoption is not going to happen all at once, but many areas of our lives will be noticeably changed over the coming years as AR slowly replaces more traditional means of human-computer interaction. Entertainment, marketing, commerce, online social interaction, and professional practice are all likely to see major technological changes due to AR.
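The accurate pose tracking that these smartphone sensors make possible typically comes from fusing several of them at once, as Section 3.1 noted when it said hybrid, multi-sensor methods are needed to eliminate drift. A classic minimal fusion method is the complementary filter: the gyroscope's integrated rate is smooth but drifts over time, the accelerometer's gravity-referenced angle is absolute but noisy, and blending the two suppresses the drift. The sketch below is illustrative only; the 0.98 blending weight, the timestep, and the sensor readings are assumptions for the example.

```python
# Illustrative complementary filter fusing two smartphone sensors to
# estimate pitch: the gyroscope is smooth but drifts, the accelerometer
# is drift-free but noisy. Blending them keeps the estimate anchored.
# The 0.98 weight, timestep, and sample readings are all assumptions.

def complementary_filter(gyro_rates, accel_angles, dt=0.01, k=0.98):
    """Fuse gyro angular rates (deg/s) with accelerometer angles (deg).

    Each step integrates the gyro rate, then nudges the estimate toward
    the accelerometer's absolute (gravity-referenced) angle.
    """
    angle = accel_angles[0]  # initialize from the absolute sensor
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        angle = k * (angle + rate * dt) + (1.0 - k) * accel_angle
    return angle

# Device actually held still at 10 degrees: the accelerometer reads a
# steady 10, but the gyro reports a small spurious rate (bias) that
# would accumulate into drift if integrated on its own.
gyro = [0.5] * 500           # 0.5 deg/s bias over 5 seconds
accel = [10.0] * 500         # steady gravity-referenced reading
fused = complementary_filter(gyro, accel)
drift_only = 10.0 + sum(r * 0.01 for r in gyro)  # pure gyro integration
print(round(fused, 2), round(drift_only, 2))
```

Pure gyro integration wanders 2.5 degrees away over five seconds, while the fused estimate stays within a fraction of a degree of the true orientation, which is why production AR trackers always cross-check fast relative sensors against slower absolute ones.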


As early as 2011, Yuen et al. [29] predicted five directions for AR and provided multiple examples. While many current AR educational projects have focused on scientific inquiry (e.g., astronomy, mathematics, architecture, and engineering), educators have been working on AR projects in fields such as language, art, political science, sport management, textiles, fashion merchandising, and food and nutrition. AR learning tools allow students access to capabilities and resources that can dramatically increase the effectiveness of individual study and discovery learning [29]. For example, many historic sites supply overlay maps and different points of historic information for their visitors. AR will allow visitors to experience an event such as a treaty signing or a battle while they visit an historic location. Medical students around the world will be able to easily and inexpensively work on digital cadavers or dummies that can be reset after each use. Through the situated learning implicit in AR, learners will be better able to transfer what they learn into authentic situations.

With the continued development of AR-enabled educational books and games, combined with AR models of the real world, AR has the potential to make learning ubiquitous, allowing learners to gain immediate access to a wide range of location-specific information from many sources. In the areas of non-formal and informal education, museums and other public organizations are only beginning to explore how the technology can be used to improve experiences. For example, the Franklin Institute in Philadelphia has created an AR exhibit about the Terracotta Warriors [30].

For many teachers and students, the task of creating 3D models for AR is too difficult, since it requires significant technical knowledge. However, easier-to-use development kits are the goal of many firms investing in AR, so these problems should ease with time. Both Apple and Android are continually improving their native AR platforms to provide greater functionality, easier integration, and better performance. Flagship phones from most manufacturers are more than capable of running these AR platforms. Third-party software developers and social media companies, such as Facebook, are working toward incorporating Augmented Reality into their platforms. These developments will accelerate the creation of new applications in education and other fields.

## **4.1. Augmented Reality as a tool for enhancing learning**

One consistent promise of AR has been in its application to formal and informal learning settings [31–33]. Indeed, several canonical learning theories describe the potential benefit of an augmented version of reality that is crafted with the intention to give context-specific, just-in-time information to a learner in an unobtrusive way. As a teaching and learning method, AR aligns well with both situated and constructivist learning theory. These theories frame the learner within a real-world physical and social context. AR affords the type of scaffolding and social context that constructivism values. It allows for participatory and metacognitive learning processes through authentic inquiry, situated observation, peer connections, coaching, and reciprocal peer learning, in line with a situated learning theory-driven view of education [34, 35].

addressed educational uses of AR technology. 102 articles were discovered, 68 of which were determined to be relevant after applying the inclusion criteria detailed in the paper. They found that improved learner outcomes were the most common finding, with 'enhanced learning achievement' mentioned most frequently. Next came motivation, followed by understanding, attitude, and satisfaction. Decreased cognitive load and increased spatial ability were notable findings. Pedagogical contributions were the second major category of reported findings, with enhanced enjoyment and engagement mentioned most often. Pedagogical advantages included both collaboration opportunities and 'promotes self-learning,' which

Augmenting Reality with Intelligent Interfaces http://dx.doi.org/10.5772/intechopen.75751 229

The Cognitive Theory of Multimedia Learning can also be used to discuss the potential benefits of AR in education [36]. This theory states that there are multiple channels (or pathways) for sensory input to be brought into working memory. People learn best by using combined words and patterns, such as images, rather than text alone [37]. In this theory, learning is an active process of filtering, selecting, and using input from different sensory channels. Since each channel can become overloaded, it is important to reduce extraneous load, which is reflected in van Merriënboer and Sweller's notion of cognitive load reduction through mixing of input channels [38]. AR can accomplish this input channel mixing in an elegant way by overlaying intelligently presented interface data onto a view of the actual environment. Germane (as opposed to extraneous) cognitive load can be achieved through a smart mix of real and virtual inputs in order to free up working memory. Limiting extraneous cognitive load paves the way for more effortless learning.

In light of its conceptual alignment with multiple learning theories, AR has been increasingly investigated by educational technology researchers. One of the most widely cited systematic reviews of AR in education is by Wu et al. [39], who reviewed 54 different education-related studies of AR that were indexed in the Social Sciences Citation Index (SSCI) database from January 2000 to October 2012. They found widely-reported positive effects on students' learning, improvements in collaborative learning, and increases in student and teacher reports of motivation. By 2013, a consensus among educational researchers was building that AR was going to play a significant role in the future of instructional design. Researchers also agreed that, at that moment, the research into instructional design using AR was still in its infancy, but AR would become one of the key emerging technologies for education over the next five years [39]. In the past five years, there has been growing scholarly interest in the application of AR in educational settings.

In the SSCI database, Chen et al. [40] found 55 papers published from 2011 to 2016 on the topic of AR in education. Of those 55, only 16 were published between 2015 and 2016, which indicates that the literature search likely took place early in 2016. The authors found widely reported increases in learning performance and motivation; deepened student engagement, improved perceived enjoyment, and positive attitudes were associated with students who learned with AR-based instruction. However, this meta-analysis suffered from some transparency and methodological issues: learning performance was not defined, no data was presented on how categories were coded or how much support existed between categories, and the authors did not report their inclusion criteria or when their search took place. They state that they used the search term 'Augmented Reality' and searched through 2016, but in light of other researchers' findings, one would expect many more papers to have been found for 2015 and 2016.

Akçayır and Akçayır [41] performed a more transparent and exhaustive review of AR research related to education written in English from 1980 to January 15, 2016. They located and then analyzed all of the published studies in the SSCI journal database (to the end of 2015) that addressed educational uses of AR technology. 102 articles were discovered, 68 of which were determined to be relevant after applying the inclusion criteria detailed in the paper. They found that improved learner outcomes were the most common finding, with 'enhanced learning achievement' at the top of the mentions list. Next was motivation, followed by understanding, attitude, and satisfaction. Decreased cognitive load and increased spatial ability were interesting finds. Pedagogical contributions were the second major category of reported findings, with enhanced enjoyment and engagement mentioned most frequently. Pedagogical advantages included both collaboration opportunities and 'promotes self-learning,' which seems a bit contradictory.

228 Artificial Intelligence - Emerging Trends and Applications

The present paper undertook an update to the above literature search to see if the trend has continued since Akçayır and Akçayır's [41] search. Using the same terms and inclusion criteria in December 2017 that they used in early 2016 ('augmented reality' OR 'mixed reality' OR 'augmenting reality') yielded 74 relevant results: 34 from 2016 and 40 from 2017. A search on EBSCO Education Source using the same terms from 2016 to 2017 yielded 145 English scholarly journal results. The trend of articles reporting increases in student learning and motivation continued, and researchers seemed to agree that AR remains one of the most important developments on the educational horizon. AR is expected to achieve widespread adoption in two to three years within higher education, and four to five years in K-12 education [42].

Although all three overviews of the field presented here found similar positive educational outcomes associated with AR, there are some gaps in research that should be addressed. Wu et al. [39] caution that the educational value of AR is not solely based on the technologies themselves but also on how AR is designed, implemented, and integrated into formal and informal learning settings. In light of the fact that most AR research has been short-term, quasi-experimental, mixed-methods, or design-based research using small sample sizes, Wu et al. recommend that more rigorous studies are needed to examine the true learning effects.

Akçayır and Akçayır [41] found that some studies reported that AR decreases cognitive load, while others reported that it can actually lead to cognitive overload. They note that reasons for this difference are unclear, but it is likely to be due to design differences across studies. Similarly, there are conflicting reports of usability issues. "Whether there is a real usability issue […] that stems from inadequate technology experience, interface design errors, technical problems, or the teacher's lack of technology experience (or negative attitude) still needs to be clarified" ([41], p. 8). These discrepancies are emblematic of the broad array of design strategies implemented across different iterations of AR-based learning and of the lack of long-term, rigorous studies that compare larger samples of learner populations. Future research should aim to develop empirically-proven design principles for intelligent AR interface design that eliminate cognitive overload and facilitate ease of use.

Another continuing trend is that educational AR research has largely ignored diverse learner populations. Wu et al. [39] and Akçayır and Akçayır [41] both note that educational AR should be extended through design for different educational purposes, such as assessment and tutoring, as well as for students with special needs. It will be important, moving forward, to design AR with a focus on diverse learner populations in order to maximize its potential benefits, especially when one considers the potential power of AR as an assistive device for those with language barriers and learning disabilities [43, 44].

## **4.2. Augmented Reality for worker training**

Education does not only occur in the classroom. Industrial training is an area that has received considerable attention from educational technologists; to wit, AR itself was first conceived of as a worker training system [1]. Tools that provide customized instruction for workers have been repeatedly shown to significantly improve task performance [45]. Although seemingly futuristic, the application of AR interfaces for educating and guiding workers has been developing for quite some time: nearly 20 years ago, AR displays were already shown to significantly improve performance on manual assembly tasks [46]. For example, the ARVIKA consortium was a German government-supported research initiative with 23 partners from the automotive, aerospace, manufacturing, and industrial research sectors that ran from 1999 to 2003 and aimed at developing AR technology to improve manufacturing [47]. AR has been shown to aid industrial and military maintenance-related tasks by increasing the speed at which components and schematics can be compared by over 56%, compared to traditional information display methods [48].

Today, the use of AR interfaces to superimpose useful training and technical information onto the real-world view of workers is gaining widespread adoption [49]. AR enables people to visualize concepts that would otherwise be challenging, in order to make use of abstract concepts and interface with complex systems [50]. One salient example of an AR interface used for abstract concept visualization is the CreativiTIC Innova SL [51]. The Innova SL can recognize individual circuit boards and overlay useful information about their components, aiding workers in industrial settings or functioning as a lab guide for engineering students. AR can also overlay instructions over real-world object targets, thereby increasing accuracy and speed [47]. Westerfield et al. [45] explored the use of an intelligent AR tutor that teaches motherboard assembly procedures, rather than simply guiding people through the steps, and were able to show a 40% improvement in learning outcomes on a post-test for students who used the AR intelligent tutor.
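
To make the overlay-guidance idea concrete, the control flow of a step-by-step AR work-instruction system (as distinct from the adaptive tutoring Westerfield et al. evaluated) can be sketched as a small state machine. This is an illustrative sketch only; the class, target names, and instructions below are invented, not taken from any cited system:

```python
from dataclasses import dataclass


@dataclass
class AssemblyGuide:
    """Sketch of an AR work-instruction overlay: each step names the
    real-world target object and the instruction to render beside it."""
    steps: list[tuple[str, str]]  # ordered (target_object, instruction) pairs
    current: int = 0

    def overlay_for(self, detected_object: str):
        """Return the instruction to overlay when the recognized object
        belongs to the current step; otherwise show nothing."""
        if self.current >= len(self.steps):
            return None  # procedure finished
        target, instruction = self.steps[self.current]
        return instruction if detected_object == target else None

    def confirm_step(self) -> bool:
        """Advance after the worker confirms completion; True when done."""
        self.current += 1
        return self.current >= len(self.steps)


# Invented motherboard-assembly steps for illustration:
guide = AssemblyGuide(steps=[
    ("cpu_socket", "Lift the retention lever before seating the CPU."),
    ("ram_slot", "Align the notch, then press the module until it clicks."),
])
print(guide.overlay_for("ram_slot"))    # wrong target for step 1 -> None
print(guide.overlay_for("cpu_socket"))  # -> instruction for step 1
```

An intelligent tutor in the sense of [45] would go further, for example tracking errors per step and adapting what is shown, but the overlay-per-recognized-target loop stays the same.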

Technically complex assembly tasks are well suited to AR interfaces. General Electric (GE) has leveraged the use of consumer-grade AR glasses, powered by custom learning software, for wind turbine assembly [52]. With the glasses, workers can view installation instructions and component diagrams in the same visual frame as the actual task, thereby increasing efficiency and reducing the number of errors. GE reports a 34% increase in productivity from the very first time the technician uses the AR interface, compared to non-augmented controls.

Many HMD AR devices, such as the Microsoft HoloLens, are being used in enterprise education settings. Thyssenkrupp Elevator has developed a custom AR interface for 24,000 service engineers using the Microsoft HoloLens, and the application has reportedly reduced the average length of service calls by a factor of four by allowing technicians to access interactive 3D schematics of the elevator systems they are working on without looking at a separate screen [53]. Technicians can also communicate and share their field of view directly with support staff, so that support staff can give situationally specific advice.

It is often the case that industry takes the reins of research for emerging assistive technologies in order to develop them for profit. This is certainly true for Augmented Reality devices; for-profit industry has driven the development of AR technology precisely because of its demonstrable benefits to their learner populations. The issue with industry-led technology development is that it usually occurs behind closed doors, out of the light of scrutiny of competitors and of researchers who might be able to benefit from their study. Because neither GE nor Thyssenkrupp has made this research public, it is unclear whether there are any potential downsides to AR interfaces in complex assembly tasks, such as the potential for distractions and workplace accidents relating to visual field obstruction. There is clear evidence of the distracting nature of virtual objects [54]; in fact, virtual objects can be so visually captivating that they have even been used to distract patients enough to reduce the perception of chronic pain and phantom limb pain in clinical settings [55, 56]. Poorly designed AR interfaces have also been shown to be distracting to drivers [57]. In light of such findings, it is reasonable to ask if layering virtual interfaces over complex assembly tasks could lead to dangerous distraction. It will be increasingly important for AR interface designers to consider the cognitive load of information that is presented to users in order to make them as minimally invasive as possible as we move toward a future that promises an increasingly augmented view of reality.

## **4.3. Augmented Reality for language learners**

Although the vast majority of Augmented Reality systems have been designed to assist in industrial settings and STEM-related education [40], AR has also been shown to work well in other disciplines, since it is an effective tool for situating knowledge in real contexts [58]. One promising educational application of AR technology, both inside and outside the classroom, is facilitating language acquisition [59, 60].

English is arguably the most popular second language (L2) in the world [43]. For most students who study English in their home countries, English is learned as a foreign language (EFL). Unfortunately, not everyone has the means or opportunity to study abroad in order to reap the benefits of immersion. In countries like Taiwan, Japan, and Korea, English is not widely spoken in everyday life, yet learning English remains an important priority for a large number of people there due to its prominence as the language of international commerce, entertainment, and higher education. In EFL classrooms, English teaching is not connected to daily reality. Traditional L2 instruction in these countries usually relies more on knowledge acquisition than on skill or fluency, although simply repeating words, explaining grammatical rules, and reading irrelevant text leads to low motivation and poor learning outcomes [61]. Compounding these issues, EFL students often have few opportunities to practice English outside the classroom.

Contextualizing learning improves EFL performance [62]. Augmented Reality has been shown to contextualize information [63] while providing increased opportunities for language practice and serendipitous language learning [64]. AR has also been shown to increase long-term retention in vocabulary acquisition tasks [58]. This makes AR a fitting platform for EFL learners, who lack exactly what it may be able to provide: real-world context, motivation, exposure to vocabulary, and opportunities to practice what was learned.

AR for language learning has only recently begun to receive attention in research. Ho et al. [65] studied AR as a tool for ubiquitous learning (U-learning) in EFL at the University of Taiwan. U-learning is a concept closely related to the same proliferation of powerful mobile devices that was discussed earlier in this paper, which is facilitating the rise of mixed-reality applications. Essentially, U-learning refers to any learning environment, combined with pervasive technology, which is responsive to user needs. These environments support sharing and context-aware content and deliver personalized content to the user [66]. U-learning is a paradigm made possible by technologies like Augmented Reality that operate in a ubiquitous computing environment.
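
The "context-aware, personalized content" that defines such U-learning environments can be sketched in a few lines of code: objects recognized in the learner's surroundings are mapped to target-language annotations to overlay. The object labels and the tiny bilingual lexicon below are invented for illustration (a real system would obtain the labels from a camera-based object detector):

```python
# Invented example lexicon; 'de' = German, 'fil' = Filipino, echoing the
# languages studied in the vocabulary-overlay research cited above [58].
LEXICON = {
    "cup": {"de": "die Tasse", "fil": "tasa"},
    "window": {"de": "das Fenster", "fil": "bintana"},
}


def overlay_annotations(recognized_objects, target_lang):
    """Map each recognized object to the vocabulary card to overlay,
    silently skipping objects the lexicon does not cover."""
    cards = []
    for obj in recognized_objects:
        entry = LEXICON.get(obj, {})
        if target_lang in entry:
            cards.append((obj, entry[target_lang]))
    return cards


print(overlay_annotations(["window", "laptop", "cup"], "de"))
# -> [('window', 'das Fenster'), ('cup', 'die Tasse')]
```

The point of the sketch is the delivery model, not the lookup itself: content is selected by the learner's physical context rather than by a fixed lesson sequence.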

mobile games can be powerful tools for situating learning by creating authentic opportunities for collaborative engagement with both language and culture. In Mentira, for example, students interact with a historical version of Los Griegos, New Mexico, in order to solve a murder mystery in Spanish. The game blends location cues, group activities, and over 70 pages of scripted dialog spoken by AR characters to create an immersive language learning experience. Though ARIS games tend to have favorable learning outcomes documented by design-based research, place-based mobile games are limited by a lack of usability away from the specific place for which they are designed. Their limitation of relying on hard-coded content is being addressed by object recognition-based AR platforms such as WordSense [64], which overlays contextual information on everyday objects in order to create serendipitous learning experiences in mixed reality. Current technological limitations in 3D object recognition will need to be overcome in order to facilitate accurate object recognition-based AR language learning tools.

The examples described above demonstrate the potential of Augmented Reality to enhance education and training in diverse settings. **Table 1** lists these benefits and provides a brief description and reference for each. It is clear from the table that AR can provide many power-

**AR benefits Description Example in research** Visualization Helping students to visualize and comprehend complex, abstract data Klopfer and Squire

Contextualization Connecting new information with the physical world in which it occurs Ternier et al. [82]

[12]

Augmenting Reality with Intelligent Interfaces http://dx.doi.org/10.5772/intechopen.75751 233

Essentially, U-learning refers to any learning environment, combined with pervasive technology, which is responsive to user needs. These environments support sharing and context-aware content and deliver personalized content to the user [66]. U-learning is a paradigm made possible by technologies like Augmented Reality that operate in a ubiquitous computing environment.
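The context-aware, personalized delivery that defines U-learning can be sketched in a few lines: filter content by the sensed context, then rank what remains against a learner profile. The content records, field names, and `select_content` scoring below are purely illustrative assumptions, not an API from any system cited in this chapter.

```python
# Minimal sketch of context-aware, personalized content selection.
CONTENT = [
    {"id": "vocab-kitchen", "place": "kitchen", "level": 1, "topic": "vocabulary"},
    {"id": "dialog-cafe",   "place": "cafe",    "level": 2, "topic": "listening"},
    {"id": "vocab-cafe",    "place": "cafe",    "level": 1, "topic": "vocabulary"},
]

def select_content(context, profile):
    """Return items matching the sensed place, ranked by fit to the learner."""
    candidates = [c for c in CONTENT if c["place"] == context["place"]]

    def score(item):
        # One point for matching the learner's level, one for a preferred topic.
        return (item["level"] == profile["level"]) + (item["topic"] in profile["topics"])

    return sorted(candidates, key=score, reverse=True)

ctx = {"place": "cafe"}                           # e.g. inferred from GPS or a beacon
learner = {"level": 1, "topics": {"vocabulary"}}
print([c["id"] for c in select_content(ctx, learner)])
# → ['vocab-cafe', 'dialog-cafe']
```

A production system would of course draw context from sensors and a user model rather than literals, but the shape (sense, filter, personalize, deliver) is the core of the paradigm.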

Some research suggests that EFL U-learning using AR can improve learner outcomes. Santos et al. [58] used multimedia learning theory as a framework for applying AR in EFL settings to reduce cognitive load during vocabulary learning tasks. Building on the design work of previous AR language researchers [67–69], they found significant improvements in students' retention of Filipino and German vocabulary when situated vocabulary learning was implemented by overlaying relevant words and animations on real objects found within the learner environment. Students in the study also reported higher satisfaction with their own learning outcomes and found it easier to pay attention to lessons. Teaching vocabulary through the relationship between virtual data and real objects in AR appears promising, given AR's ability to situate information in context.
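The situated overlay that Santos et al. describe can be pictured with a short sketch: recognize an object in the camera frame, then anchor a target-language label to it. Everything below (the `Detection` record, the toy lexicon, the `annotate` helper) is a hypothetical stand-in for illustration, not the system used in the study.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One recognized object in the camera frame (hypothetical detector output)."""
    label: str    # recognized object class, e.g. "cup"
    box: tuple    # screen-space bounding box (x, y, w, h)

# Toy bilingual lexicon; a real system would consult a dictionary service.
LEXICON = {"cup": "die Tasse", "chair": "der Stuhl", "book": "das Buch"}

def annotate(detections):
    """Attach an L2 vocabulary overlay to each recognized object."""
    overlays = []
    for det in detections:
        word = LEXICON.get(det.label)
        if word is None:
            continue  # no vocabulary entry for this object; skip it
        x, y, w, h = det.box
        # Anchor the label at the top-center of the object's bounding box.
        overlays.append({"text": word, "anchor": (x + w // 2, y)})
    return overlays

print(annotate([Detection("cup", (40, 80, 50, 60)), Detection("lamp", (200, 10, 30, 90))]))
# → [{'text': 'die Tasse', 'anchor': (65, 80)}]
```

The hard part in practice is the detection step itself; once object labels are reliable, the overlay bookkeeping is simple.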

Augmented Reality language education extends beyond EFL settings; for the ever-increasing number of voluntary migrant learners and involuntary refugees, rising to the challenges of sociocultural integration and language education is of utmost importance for their success [70, 71]. Recent work by Bradley et al. [72] suggests that language learning on mobile devices can ease the sociocultural transition of migrant learners. This view is supported by previous researchers [73, 74], who affirm that AR platforms allow migrant learners to practice language in context and to interact with socially normative language usage. It is also widely held that AR can be designed to increase social interaction by requiring the learner to use real-world information and locations [41, 64].

Augmented Reality games are emerging as a powerful tool for language learning. Godwin-Jones [59] offers a concise overview of the last decade of AR language learning projects, noting that while commercially available games can be adapted to pedagogical purposes, so-called 'serious games' designed for learning are easier to align with learning objectives. The problem is that mechanics and narrative immersion are often compromised in the pursuit of pedagogical targets, resulting in games that are less enjoyable to play than commercial titles [75]. In addition, the financing available for serious games is often orders of magnitude less than for big commercial titles, leading to smaller and less experienced development teams.

Commercially available AR games can also be used for educational purposes. AR gaming experienced a recent spike in popularity with the viral success of Niantic's Pokémon GO [76]. Niantic's first AR title, Ingress (www.ingress.com), is still popular and is scheduled for a major revamp in 2018 [77], and the company is poised to introduce the smartphone AR game Harry Potter: Wizards Unite in 2018. Recently, researchers have examined commercial mixed-reality games for language learning, pointing to various pedagogical applications and outcomes, including opportunities for vocabulary learning [78], digital storytelling as a product of play and engagement [79], increased opportunities for L2 practice [59], improved learner confidence [80], and improved willingness to communicate in L2 [81].

AR games that are explicitly designed for language learning have been the subject of increasing scholarly attention. Games like Mentira [73], designed using the open source platform Augmented Reality Interactive Storytelling (ARIS), have demonstrated that place-based mobile games can be powerful tools for situating learning by creating authentic opportunities for collaborative engagement with both language and culture. In Mentira, for example, students interact with a historical version of Los Griegos, New Mexico, in order to solve a murder mystery in Spanish. The game blends location cues, group activities, and over 70 pages of scripted dialog spoken by AR characters to create an immersive language learning experience.

Though ARIS games tend to have favorable learning outcomes documented by design-based research, place-based mobile games are limited by a lack of usability away from the specific place for which they are designed. This reliance on hard-coded, location-specific content is being addressed by object recognition-based AR platforms such as WordSense [64], which overlays contextual information on everyday objects in order to create serendipitous learning experiences in mixed reality. Current technological limitations in 3D object recognition will need to be overcome in order to enable accurate object recognition-based AR language learning tools.

## **5. Benefits of Augmented Reality in education**

The examples described above demonstrate the potential of Augmented Reality to enhance education and training in diverse settings. **Table 1** lists these benefits and provides a brief description and reference for each. It is clear from the table that AR can provide many powerful and unique benefits that support the claims made about its future potential.


| **AR benefits** | **Description** | **Example in research** |
|---|---|---|
| Visualization | Helping students to visualize and comprehend complex, abstract data | Klopfer and Squire [12] |
| Contextualization | Connecting new information with the physical world in which it occurs | Ternier et al. [82] |
| Situation | Blurring the boundaries between inside- and outside-of-classroom activity while encouraging social interaction | Dunleavy et al. [34] |
| Integration | Increasing opportunities for ubiquitous interactions with knowledge and digital information | Ab Aziz et al. [83] |
| Attention | Drawing and keeping the voluntary focus of students who attend to incidental learning in everyday life | Vazquez et al. [64] |
| Motivation | Facilitating gameful design and increasing student engagement in learning tasks | Godwin-Jones [59] |

**Table 1.** Benefits of Augmented Reality.

## **6. Discussion**


232 Artificial Intelligence - Emerging Trends and Applications

It is worth noting that studies reporting on the use of serious games in education, anywhere on the virtuality spectrum, tend to focus solely on the positive effects rather than examining possible negative outcomes [84]. While indications of improved learning outcomes and motivation are good to see, it is important to consider that the use of technology in the classroom tends to follow the typical cycle of hype, research, and disillusionment [85]. It is possible that we may be on the cusp of disillusionment, as AR interfaces become ubiquitous in everyday life and begin to lose some of their novelty for students. It is also important to hedge discussion of positive research outcomes by considering that investigations of novel technology in educational settings may often demonstrate best-case-scenario possibilities [86]. Many of the motivational and attentional findings of AR research could be due to an uncontrolled novelty effect.


The Gartner Group, in its 2017 Hype Cycle for Emerging Technologies chart (**Figure 2**), reported that AR is moving into the 'slope of enlightenment' [87]. It offers an optimistic outlook for AR over the next five to ten years, as much of the hype and novelty has worn off and the real work of designing useful products and integrating this new technology into the productive fabric of culture is underway. Interestingly, Gartner notes that VR is well on its way to becoming a productive tool in the next two to five years. We should view these findings cautiously, since they are simply measures of consumer expectation, corporate investment, and technology industry sentiment. However, while it is likely true that AR is five to ten years from mass-market adoption, the applicability of these tools for creating intelligent interfaces that empower learners is real.

The specific learning and accommodation needs of the users who stand to benefit most from this technology should guide the researchers and designers responsible for shaping the way that these new technologies come into being.

It is important that we not only build on the successes of previous design-based educational research but recognize why certain design strategies have failed [88]. It may be beneficial to replicate and extend some of the AR designs in which researchers have invested so much time and effort over the last few years rather than 'throw the baby out with the bathwater.'

**Figure 2.** Years to mainstream adoption of various technologies [87].

Longer-term studies, and studies that control for time-on-task during learning, are needed in order to determine whether motivational and learning outcome findings persist and can be transferred to real-world contexts. Educational technology researchers tend to design new interventions from the ground up for their unique student-learning contexts. A key limitation of AR studies has been a lack of maintenance probes, as most studies have only measured initial learning [39]. Longer-term effects of AR learning interventions would help support findings of improved learner outcomes.

## **7. Conclusion**


AR should not be looked at as a panacea for all of education, but rather as a form of human-information interaction that is very effective at serving context-specific and spatially situated virtual data. The design of each application and its alignment with learning targets will inevitably determine its efficacy in helping learners achieve those targets. AR educational games also rely on thoughtful design and enjoyable mechanics in order to be successful tools for learning and engagement. However, not all AR interfaces are created equal, and therefore it is not always possible to generalize findings. Regardless, there are innate features of AR that allow it to be used as an empowering platform. Taken as a whole, the corpus makes clear that AR has identifiable strengths and will clearly have an impact in the near future.

## **Author details**

Dov Schafer and David Kaufman\*

\*Address all correspondence to: dkaufman@sfu.ca

Simon Fraser University, Burnaby, BC, Canada

## **References**

[1] Caudell TP, Mizell DW. Augmented reality: An application of heads-up display technology to manual manufacturing processes. In: Proceedings of the Twenty-Fifth Hawaii International Conference on Systems Sciences; 7-10 January 1992; Kauai, HI, USA. New York, NY, USA: IEEE; 1992. pp. 659-669. DOI: 10.1109/HICSS.1992.183317

[2] Azuma R. A survey of augmented reality. Presence: Teleoperators and Virtual Environments. 1997;**6**(4):355-385. DOI: 10.1162/pres.1997.6.4.355

[3] Aztiria A, Augusto JC, Basagoiti R, Izaguirre A, Cook DJ. Discovering frequent user–environment interactions in intelligent environments. Personal and Ubiquitous Computing. 2012;**16**(1):91-103. DOI: 10.1007/s00779-011-0471-4


[4] Billinghurst M, Clark A, Lee G. A survey of augmented reality. Foundations and Trends® in Human–Computer Interaction. 2015;**8**(2-3):73-272. DOI: 10.1561/1100000049

[5] Wilson AD, Benko H. Holograms without headsets: Projected augmented reality with the RoomAlive Toolkit. In: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '17); 6-11 May 2017; Denver, CO, USA. New York, NY, USA: ACM; 2017. pp. 425-428. DOI: 10.1145/3027063.3050433

[6] Azuma R, Baillot Y, Behringer R, Feiner S, Julier S, MacIntyre B. Recent advances in augmented reality. IEEE Computer Graphics and Applications. 2001;**21**(6):34-47. DOI: 10.1109/38.963459

[7] Milgram P, Kishino F. A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems. 1994;**E77-D**(12):1321-1329

[8] Hurley D. Mixed Reality, Education, & Collaboration [Internet]. leveragEd Updates. 2016. Available from: https://medium.com/leveraged-technology/mixed-reality-education-collaboration-f0fc28b3e959 [Accessed: 2018-02-07]

[9] Microsoft. Microsoft HoloLens [Internet]. 2017. Available from: www.microsoft.com/microsoft-hololens/en-us [Accessed: 2017-10-16]

[10] Alphabet. Project X [Internet]. 2017. Available from: www.x.company/glass/ [Accessed: 2017-10-15]

[11] Raskar R, Low KL. Interacting with spatially augmented reality. In: Proceedings of the 1st International Conference on Computer Graphics, Virtual Reality and Visualisation (AfriGraph '01); 5-7 November 2001; Cape Town, South Africa. New York: ACM; 2001. pp. 101-108. DOI: 10.1145/513867.513889

[12] Klopfer E, Squire K. Environmental detectives—The development of an augmented reality platform for environmental simulations. Educational Technology Research and Development. 2008;**56**(2):203-228. DOI: 10.1007/s11423-007-9037-6

[13] Li J, Besada JA, Bernardos AM, Tarrío P, Casar JR. A novel system for object pose estimation using fused vision and inertial data. Information Fusion. 2017;**33**:15-28. DOI: 10.1016/j.inffus.2016.04.006

[14] Gps.gov. GPS Accuracy [Internet]. 2016. Available from: https://www.gps.gov/systems/gps/performance/accuracy/ [Accessed: 2017-10-20]

[15] Forbes. Why Apple Will Win the Augmented Reality Race [Internet]. 2017. Available from: https://www.forbes.com/sites/allabouttherupees/2017/09/07/why-apple-will-win-the-augmented-reality-race/ [Accessed: 2017-10-15]

[16] Newcombe RA, Izadi S, Hilliges O, Molyneaux D, Kim D, Davison AJ, Kohi P, Shotton J, Hodges S, Fitzgibbon A. KinectFusion: Real-time dense surface mapping and tracking. In: Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2011); 26-29 October 2011; Basel, Switzerland. New York: IEEE; 2011. pp. 127-136. DOI: 10.1109/ISMAR.2011.6092378

[17] Zhang Z. Microsoft Kinect sensor and its effect. IEEE Multimedia. 2012;**19**(2):4-10. DOI: 10.1109/MMUL.2012.24

[18] Höllerer T, Feiner S, Hallaway D, Bell B, Lanzagorta M, Brown D, Julier S, Baillot Y, Rosenblum L. User interface management techniques for collaborative mobile augmented reality. Computers & Graphics. 2001;**25**(5):799-810. DOI: 10.1016/S0097-8493(01)00122-4

[19] Sullivan J, Sullivan JW, Tyler SW. Intelligent user interfaces: Introduction. In: Luff P, editor. Intelligent User Interfaces. New York: ACM; 1994

[20] Browne D, Norman M, Riches D. Why build adaptive systems? In: Browne D, Totterdell P, Norman M, editors. Adaptive User Interfaces. London, UK: Elsevier Science; 2016. pp. 15-58

[21] Global Market Insights Inc. Market Research Report [Internet]. 2017. Available from: https://www.gminsights.com/ [Accessed: 2017-10-20]

[22] Apple Developer. ARKit [Internet]. 2017. Available from: https://developer.apple.com/arkit/ [Accessed: 2017-10-20]

[23] Google Developers. ARCore [Internet]. 2017. Available from: https://developers.google.com/ar/ [Accessed: 2017-10-20]

[24] Oculus VR. Oculus Rift [Internet]. 2017. Available from: https://www.oculus.com/rift/ [Accessed: 2017-10-20]

[25] Magic Leap Inc. Magic Leap [Internet]. 2017. Available from: www.magicleap.com/ [Accessed: 2017-10-16]

[26] Meta Company. Meta 2 [Internet]. 2017. Available from: www.metavision.com/ [Accessed: 2017-10-16]

[27] Digi-Capital. \$1 billion AR/VR investment in Q4, \$2.5 billion this year so far [Internet]. 2017. Available from: https://www.digi-capital.com/news/2017/11/1-billion-ar-vr-investment-in-q4-2-5-billion-this-year-so-far/#.WifPGLQ-fOQ [Accessed: 2017-10-20]

[28] Fitzmaurice GW. Situated information spaces and spatially aware palmtop computers. Communications of the ACM. 1993;**36**(7):39-49. DOI: 10.1145/159544.159566

[29] Yuen SC-Y, Yaoyuneyong G, Johnson E. Augmented reality: An overview and five directions for AR in education. Journal of Educational Technology Development and Exchange. 2011;**4**(1):119-140. DOI: 10.18785/jetde.0401.10

[30] The Franklin Institute. How To Experience Terra Cotta Warriors Augmented Reality [Internet]. 2018. Available from: https://www.fi.edu/mobile [Accessed: 2018-02-20]

[31] El Sayed NA, Zayed HH, Sharawy MI. ARSC: Augmented reality student card. Computers & Education. 2011;**56**(4):1045-1061. DOI: 10.1016/j.compedu.2010.10.019

[32] Ibanez M, Kloos CD, Leony D, Rueda JJ, Maroto D. Learning a foreign language in a mixed-reality environment. IEEE Internet Computing. 2011;**15**(6):44-47. DOI: 10.1109/MIC.2011.78


[33] Wei X, Weng D, Liu Y, Wang Y. Teaching based on augmented reality for a technical creative design course. Computers & Education. 2015;**81**:221-234. DOI: 10.1016/j.compedu.2014.10.017

[34] Dunleavy M, Dede C, Mitchell R. Affordances and limitations of immersive participatory augmented reality simulations for teaching and learning. Journal of Science Education and Technology. 2009;**18**(1):7-22. DOI: 10.1007/s10956-008-9119-1

[35] Squire K. From information to experience: Place-based augmented reality games as a model for learning in a globally networked society. Teachers College Record. 2010;**112**(10):2565-2602. Available from: http://www.litmedmod.ca/sites/default/files/pdf/squire\_2010\_technologies\_realite\_augmentee.pdf [Accessed: 2018-01-25]

[36] Joo-Nagata J, Abad FM, Giner JG, García-Peñalvo FJ. Augmented reality and pedestrian navigation through its implementation in m-learning and e-learning: Evaluation of an educational program in Chile. Computers & Education. 2017;**111**:1-7. DOI: 10.1016/j.compedu.2017.04.003

[37] Mayer RE. Multimedia Learning. Cambridge, UK: Cambridge University Press; 2009

[38] van Merriënboer JJ, Sweller J. Cognitive load theory and complex learning: Recent developments and future directions. Educational Psychology Review. 2005;**17**(2):147-177. DOI: 10.1007/s10648-005-3951-0

[39] Wu HK, Lee SW, Chang HY, Liang JC. Current status, opportunities and challenges of augmented reality in education. Computers & Education. 2013;**62**:41-49. DOI: 10.1016/j.compedu.2012.10.024

[40] Chen P, Liu X, Cheng W, Huang R. A review of using Augmented Reality in Education from 2011 to 2016. In: Popescu E, Kinshuk, Khribi MK, Huang R, Jemni M, Chen N-S, Sampson DG, editors. Innovations in Smart Learning. Singapore: Springer; 2017. pp. 13-18. DOI: 10.1007/978-981-10-2419-1\_2

[41] Akçayır M, Akçayır G. Advantages and challenges associated with augmented reality for education: A systematic review of the literature. Educational Research Review. 2017;**20**:1-11. DOI: 10.1016/j.edurev.2016.11.002

[42] Johnson L, Becker SA, Estrada V, Freeman A. NMC Horizon Report: 2015 Museum Edition. Austin, TX: The New Media Consortium; 2015. Available from: https://eric.ed.gov/?id=ED559371 [Accessed: 2018-01-25]

[43] Liu TY. A context-aware ubiquitous learning environment for language listening and speaking. Journal of Computer Assisted Learning. 2009;**25**(6):515-527. DOI: 10.1111/j.1365-2729.2009.00329.x

[44] McMahon DD, Cihak DF, Wright RE, Bell SM. Augmented reality for teaching science vocabulary to postsecondary education students with intellectual disabilities and autism. Journal of Research on Technology in Education. 2016;**48**(1):38-56. DOI: 10.1080/15391523.2015.1103149

[45] Westerfield G, Mitrovic A, Billinghurst M. Intelligent augmented reality training for motherboard assembly. International Journal of Artificial Intelligence in Education. 2015;**25**(1):157-172. DOI: 10.1007/s40593-014-0032-x

[46] Baird KM, Barfield W. Evaluating the effectiveness of augmented reality displays for a manual assembly task. Virtual Reality. 1999;**4**(4):250-259. DOI: 10.1007/BF01421808

[47] Weidenhausen J, Knoepfle C, Stricker D. Lessons learned on the way to industrial augmented reality applications, a retrospective on ARVIKA. Computers & Graphics. 2003;**27**(6):887-891. DOI: 10.1016/j.cag.2003.09.001

[48] Henderson SJ, Feiner S. Evaluating the benefits of augmented reality for task localization in maintenance of an armored personnel carrier turret. In: Proceedings of the 8th IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2009); 19-22 October 2009; Orlando, FL, USA. New York: IEEE; 2009. pp. 135-144. DOI: 10.1109/ISMAR.2009.5336486

[49] Kloberdanz K. How Augmented Reality Glasses Are Being Used In Industry—GE [Internet]. GE Reports. 2017. Available from: https://www.ge.com/reports/looking-smart-augmented-reality-seeing-real-results-industry-today/ [Accessed: 2017-11-13]

[50] Stirenko S, Gordienko Y, Shemsedinov T, Alienin O, Kochura Y, Gordienko N, Rojbi A, Benito JR, González EA. User-Driven Intelligent Interface on the Basis of Multimodal Augmented Reality and Brain-Computer Interaction for People with Functional Disabilities [Internet]. 2017. Available from: https://arxiv.org/abs/1704.05915 [Accessed: 2018-01-25]

[51] Artetxe González E, Souvestre F, López Benito JR. Augmented reality interface for E2LP: Assistance in electronic laboratories through augmented reality. In: Szewczyk R, Kaštelan I, Temerinac M, Barak M, Sruk V, editors. Embedded Engineering Education. Cham, Switzerland: Springer International Publishing; 2016. pp. 93-108. DOI: 10.1007/978-3-319-27540-6\_6

[52] Upskill. Upskill Industrial Augmented Reality Software Platform [Internet]. 2017. Available from: https://upskill.io/ [Accessed: 2017-10-16]

[53] Erickson C. Microsoft HoloLens Enables Thyssenkrupp to Transform the Global Elevator Industry [Internet]. 2016. Available from: https://blogs.windows.com/devices/2016/09/15/microsoft-hololens-enables-thyssenkrupp-to-transform-the-global-elevator-industry/ [Accessed: 2017-10-20]

[54] Joseph B, Armstrong DG. Potential perils of peri-Pokémon perambulation: The dark reality of augmented reality? Oxford Medical Case Reports. 2016;**2016**(10):265-266. DOI: 10.1093/omcr/omw080

[55] Gromala D, Tong X, Shaw C, Jin W. Immersive virtual reality as a non-pharmacological analgesic for pain management: Pain distraction and pain self-modulation. In: Rodrigues J, Cardoso P, Monteiro J, Figueiredo M, editors. Handbook of Research on Human-Computer Interfaces, Developments, and Applications. Hershey, PA: IGI Global; 2016. pp. 478-500. DOI: 10.4018/978-1-5225-0435-1.ch019

[56] Dunn J, Yeo E, Moghaddampour P, Chau B, Humbert S. Virtual and augmented reality in the treatment of phantom limb pain: A literature review. NeuroRehabilitation. 2017;**40**(4):

[57] Smith M, Gabbard JL, Burnett G, Doutcheva N. The effects of augmented reality head-up displays on drivers' eye scan patterns, performance, and perceptions. International Journal of Mobile Human Computer Interaction. 2017;**9**(2):1-7. DOI: 10.4018/IJMHCI.

[67] Lin C-C, Hsiao H-S. The effects of multimedia annotations via PDA on EFL learners' vocabulary learning. In: Yu F-Y, Hirashima T, Supnithi T, Biswas G, editors. Proceedings of the 19th International Conference on Computers in Education (ICCE 2011); 28 November-2 December 2011; Chiang Mai, Thailand. Tha Khlong, Thailand: NECTEC; 2011. pp. 579-586. Available from: http://www.apsce.net/uploaded/filemanager/9f035690-cc65-4d33-aa29-0a4d953357c4.pdf [Accessed: 2018-01-28]

[68] Lin C-C, Yu Y-C. EFL learners' cognitive load of learning vocabulary on mobile phones. In: Biswas G, Wong L-H, Hirashima T, Chen W, editors. Proceedings of the 20th International Conference on Computers in Education (ICCE 2012); 26-30 November 2012; Singapore. Singapore: National Institute of Education; 2012. pp. 545-552. Available from: http://www.apsce.net/uploaded/filemanager/31fec10b-54f9-45c9-992d-b97834deb457.pdf [Accessed: 2018-01-25]

[69] Lin C-C, Wu Y-C. The effects of different presentation modes of multimedia annotations on sentential listening comprehension. In: Proceedings of the 21st International Conference on Computers in Education 2013; 18-22 November 2013; Denpasar Bali, Indonesia. Jhongli City, Taiwan: Asia-Pacific Society for Computers in Education; 2013. pp. 668-678

[70] Kukulska-Hulme A, Viberg O. Mobile collaborative language learning: State of the art. British Journal of Educational Technology. 2017. DOI: 10.1111/bjet.12580

[71] Demouy V, Jones A, Kan Q, Kukulska-Hulme A, Eardley A. Why and how do distance learners use mobile devices for language learning? The EuroCALL Review. 2016;**24**(1):10-24. DOI: 10.4995/eurocall.2016.5663

[72] Bradley L, Lindström NB, Hashemi SS. Integration and language learning of newly arrived migrants using mobile technology. Journal of Interactive Media in Education. 2017;**2017**(1):1-9. DOI: 10.5334/jime.434

[73] Holden CL, Sykes JM. Leveraging mobile games for place-based language learning. International Journal of Game-Based Learning. 2011;**1**(2):1-18. DOI: 10.4018/ijgbl.2011040101

[74] Thorne S. Language learning, ecological validity, and innovation under conditions of superdiversity. Bellaterra Journal of Teaching & Learning Language & Literature. 2013;**6**(2):1-27

[75] Bellotti F, Kapralos B, Lee K, Moreno-Ger P, Berta R. Assessment in and of serious games: An overview. Advances in Human-Computer Interaction. 2013;**2013**:1-11. DOI: 10.1155/2013/136864

[76] Niantic. Pokémon GO [Internet]. 2017. Available from: http://pokemongolive.com/en/ [Accessed: 2017-12-10]

[77] Webster A. Niantic's First AR Game Ingress is Getting a Massive Overhaul in 2018 [Internet]. The Verge. 2017. Available from: https://www.theverge.com/2017/12/2/16725884/ingress-prime-update-niantic-pokemon-go [Accessed: 2017-12-10]

[58] Santos ME, Taketomi T, Yamamoto G, Rodrigo MM, Sandor C, Kato H. Augmented reality as multimedia: The case for situated vocabulary learning. Research and Practice in

Technology Enhanced Learning. 2016;**11**(1):1-23. DOI: 10.1186/s41039-016-0028-2

from: http://llt.msu.edu/issues/october2016/emerging.pdf [Accessed 2018-01-25]

[59] Godwin-Jones R. Augmented reality and language learning: From annotated vocabulary to place-based mobile games. Language Learning & Technology. 2016;**20**(3):9-19. Available

[60] Richardson D. Exploring the potential of a location based augmented reality game for language learning. International Journal of Game-Based Learning. 2016;**6**(3):34-49. DOI:

[61] Liu PH, Tsai MK. Using augmented-reality-based mobile learning material in EFL English composition: An exploratory case study. British Journal of Educational Technology. 2013;

[62] Sandberg J, Maris M, de Geus K. Mobile English learning: An evidence-based study with fifth graders. Computers & Education. 2011;**57**(1):1334-1347. DOI: 10.1016/j.compedu.

[63] Dunleavy M, Dede C. Augmented reality teaching and learning. In: Spector M, Merrill MD, Elen J, Bishop MJ, editors. Handbook of Research on Educational Communications

[64] Vazquez CD, Nyati AA, Luh A, Fu M, Aikawa T, Maes P. Serendipitous language learning in mixed reality. In: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI '17); 6-11 May 2017; Denver, CO. New York:

[65] Ho SC, Hsieh SW, Sun PC, Chen CM. To activate English learning: Listen and speak in real life context with an AR featured u-learning system. Educational Technology &

[66] Huang YM, Chiu PS, Liu TC, Chen TS. The design and implementation of a meaningful learning-based evaluation method for ubiquitous learning. Computers & Education.

DOI: 10.4018/978-1-5225-0435-1.ch019

240 Artificial Intelligence - Emerging Trends and Applications

595-601. DOI: 10.3233/NRE-171447

10.4018/IJGBL.2016070103

Society. 2017;**20**(2):176-187

**44**(1):E1-E4. DOI: 10.1111/j.1467-8535.2012.01302.x

and Technology. New York: Springer. pp. 735-745

ACM; 2017. pp. 2172-2179. DOI: 10.1145/3027063.3053098

2011;**57**(4):2291-2302. DOI: 10.1016/j.compedu.2011.05.023

2017040101

2011.01.015


[78] King A. Pokémon GO for Listening and Language Development [Internet]. 2016. Available from: https://avteducationtalk.wordpress.com/2016/07/13/pokemon-go-forlistening-and-language-development/ [Accessed: 2017-11-20]

**Chapter 12**

**The Today Tendency of Sentiment Classification**

Vo Ngoc Phu and Vo Thi Ngoc Tran

Additional information is available at the end of the chapter

DOI: 10.5772/intechopen.74930

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **Abstract**


Sentiment classification has been studied for many years because it makes crucial contributions to many different fields of everyday life, such as political activities, commodity production, and commercial activities. Many kinds of sentiment analysis, such as machine learning approaches and lexicon-based approaches, have been developed over the years. The today tendency of sentiment classification is as follows: (1) processing big data sets with short execution times; (2) achieving high accuracy; (3) integrating flexibly and easily into small machines or different approaches. We present each category in more detail.

**Keywords:** sentiment classification, machine learning approaches, lexicon-based approaches, today tendency of the sentiment classification, big data set

## **1. Introduction**

Many different approaches have been developed for sentiment analysis over the years, because researchers have long sought optimal algorithms and approaches for surveys and commercial applications.

Sentiment classification, also called opinion mining, is the computational study of opinions, sentiments, evaluations, attitudes, appraisals, affects, views, emotions, subjectivity, etc., expressed in texts (reviews, blogs, discussions, news, comments, feedback, etc.).

The different approaches have been cross-checked against each other to improve their accuracies.

A document (a sentence or a phrase) is classified into the positive polarity, the negative polarity, or the neutral polarity.

The positive polarity is the polarity of a word or a phrase (a sentence or a document) that expresses aspects such as good, nice, like, love, delicious, happiness, enthusiasm, kindness, etc. Examples of phrases: very good, very nice. Examples of sentences: "He is very handsome"; "She is very beautiful." Example of a document: "He is very handsome. He is also good at sports."

The negative polarity is the polarity of a word or a phrase (a sentence or a document) that expresses aspects such as bad, evil, poor, ugly, wrong, inclement, foul, shabby, sinister, rotten, ill, shoddy, etc. Examples of phrases: very bad, very evil. Examples of sentences: "He is very bad"; "She is very wrong." Example of a document: "She is very bad. She is very stupid."

The neutral polarity is the polarity of a word or a phrase (a sentence or a document) that is neither positive nor negative. Examples of neutral words: eat, talk, drink. Examples of phrases: a bucket of water, 1 kg. Example of a document: "He eats a banana. He drinks a glass of water."

The polarity (positive, negative, or neutral) of a sentence or a document has been identified by many machine learning algorithms in the sentiment classification surveys in [1–83].

The sentiment polarity of a word or a phrase (a sentence or a document) is also expressed through a valence (sentiment score or sentiment value) of that word or phrase (sentence or document).

The polarity and valence of a word or a phrase in English have been calculated by many different approaches, such as sentiment dictionaries. Besides, the polarity and sentiment value of a word or a phrase have been identified by many similarity measures in English and Vietnamese in [49–52]. In addition, in our opinion, the polarity and sentiment score of a word or a phrase in any language (Chinese, French, etc.) can be calculated easily by such similarity coefficients.

If the valence of a word or a phrase (a sentence or a document) is greater than 0, it has the positive polarity. If the sentiment score equals 0, it has the neutral polarity. If the sentiment value is less than 0, it has the negative polarity.
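The valence thresholds above can be written as a small function (a minimal sketch; the function name and labels are our own illustration):

```python
def polarity(valence: float) -> str:
    """Map a sentiment valence to a polarity label:
    valence > 0 -> positive, valence == 0 -> neutral, valence < 0 -> negative."""
    if valence > 0:
        return "positive"
    if valence < 0:
        return "negative"
    return "neutral"

print(polarity(0.8))   # positive
print(polarity(-0.3))  # negative
print(polarity(0.0))   # neutral
```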

Machine learning algorithms come in two kinds (supervised learning and unsupervised learning), comprising many algorithm groups, such as the deep learning, ensemble, neural network, regularization, rule system, regression, Bayesian, decision tree, dimensionality reduction, instance-based, and clustering groups.
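As a minimal illustration of the supervised side (the Bayesian group), a multinomial Naive Bayes sentiment classifier can be sketched in pure Python (the toy data and function names are our own, not taken from the cited surveys):

```python
from collections import Counter, defaultdict
import math

def train_nb(docs):
    """Train a multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify_nb(tokens, model):
    """Return the label maximizing log prior + log likelihood."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # add-one (Laplace) smoothing over the shared vocabulary
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [
    (["good", "nice", "love"], "positive"),
    (["very", "good", "happy"], "positive"),
    (["bad", "poor", "ugly"], "negative"),
    (["very", "bad", "wrong"], "negative"),
]
model = train_nb(train)
print(classify_nb(["good", "happy"], model))  # positive
print(classify_nb(["bad", "ugly"], model))    # negative
```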

Sentiment analysis comprises many machine learning approaches and lexicon-based approaches.

The lexicon-based approaches comprise dictionary-based approaches and corpus-based approaches. The corpus-based approaches include statistical and semantic approaches.

In this chapter, we present the dictionary-based approaches and the corpus-based approaches of sentiment classification, and we also present the today tendency of sentiment analysis in more detail: (1) processing big data sets with short execution times; (2) achieving high accuracy; (3) integrating flexibly and easily into small machines or different approaches. This matters because there are a great many documents, reviews, discussions, blogs, news items, comments, feedback, etc., on websites, online news sites, and social networks.

There are also many big corporations in the world, with branches in many different countries, each employing thousands of people. The corporations therefore hold a great deal of big information and big data sets about their employees, their businesses, etc. Processing this big information and these big data sets with the old algorithms, surveys, and applications is very difficult, and sometimes they cannot be processed successfully at all.

Thus, researchers now seek approaches for surveys and commercial applications that process big data sets with short execution times and improved accuracy. In addition, such approaches should integrate flexibly and easily into small machines or other approaches, because small machines can be used conveniently anywhere, by any type of user, and for various purposes. In the near future, these small machines can be produced easily and cheaply, and they are easy to carry everywhere.

This chapter includes six sections: the Introduction (Section 1), the Approaches of Sentiment Classification (Section 2), the Today Tendency of Sentiment Analysis (Section 3), the Conclusion (Section 4), the Conflict of Interest (Section 5), and the References (Section 6).

## **2. The approaches of sentiment classification**

This section comprises two sub-sections: Section 2.1 presents the lexicon-based approaches of opinion analysis, and Section 2.2 presents the machine learning approaches.

## **2.1. Lexicon-based approaches**


The lexicon-based approaches comprise multiple approaches, both dictionary-based and corpus-based.

The dictionary-based approaches use a dictionary that contains synonyms and antonyms of a word; for example, the study in [1] used seed sentiment words from a dictionary.

The approaches based on the corpus find opinion words with context-specific orientations according to a seed list of opinion words, to find other opinion words in a large corpus. There are two approaches within the category of corpus-based approaches:

**a.** Statistical approach (example in [2]): If a word appears frequently among positive texts, its polarity is positive; if it appears frequently among negative texts, its polarity can be considered negative; if it has equal frequencies in positive and negative texts, it can be considered a neutral word. Seed opinion words can be found using statistical techniques. Most state-of-the-art methods are based on the observation that similar opinion words often appear together in a corpus. Thus, if two words frequently appear together within the same context, there is a high probability that they have the same polarity. Therefore, the polarity of an unknown word can be determined by calculating the relative frequency of its co-occurrence with another word.

The authors in [21–35] use equations determining the Pointwise Mutual Information (PMI) between two words *wi* and *wj* as follows:

*PMI*(*wi*, *wj*) = *log*2(*P*(*wi*, *wj*) / (*P*(*wi*) × *P*(*wj*))) (1)

They use equations determining the SO (sentiment orientation) of a word *wi* as follows:

*SO*(*wi*) = *PMI*(*wi*, *positive*) − *PMI*(*wi*, *negative*) (2)

In [21–28], the positive and negative seed sets of Eq. (2) in English are: positive = {good, nice, excellent, positive, fortunate, correct, superior} and negative = {bad, nasty, poor, negative, unfortunate, wrong, inferior}.

The AltaVista search engine (AVSE) is used in the PMI equations of [22, 23, 25], and the Google search engine (GSE) is used in the PMI equations of [24, 26, 28]. In addition, the authors of [24] also use German, the authors of [25] also use Macedonian, the authors of [26] also use Arabic, the authors of [27] also use Chinese, and the authors of [28] also use Spanish. The Bing search engine is also used in [26].

With [29–32], the PMI equations are used in Chinese, not English, and Tibetan is also added in [29]. In terms of search engines, AVSE is used in [31], and the authors of [32] use three search engines: GSE, the Yahoo search engine (YSE), and the Baidu search engine (BSE). The PMI equations are also used in Japanese with GSE in [33]. The authors in [34, 35] also use the PMI equations and Jaccard equations with GSE in English.
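A toy estimate of the PMI and SO equations from co-occurrence contexts might look like this (the seed sets follow [21–28]; the add-one smoothing and the per-seed summation in the SO computation are our own simplifications, not the exact formulation of the cited works):

```python
import math

def pmi(w1, w2, contexts):
    """PMI(w1, w2) = log2(P(w1, w2) / (P(w1) * P(w2))), estimated from a
    list of contexts (each a set of words); +1 smoothing avoids log(0)."""
    n = len(contexts)
    p1 = (sum(w1 in c for c in contexts) + 1) / (n + 1)
    p2 = (sum(w2 in c for c in contexts) + 1) / (n + 1)
    p12 = (sum(w1 in c and w2 in c for c in contexts) + 1) / (n + 1)
    return math.log2(p12 / (p1 * p2))

def so(word, contexts,
       positive=("good", "nice", "excellent"),
       negative=("bad", "nasty", "poor")):
    """SO(w): total PMI with positive seeds minus total PMI with negative seeds."""
    return (sum(pmi(word, p, contexts) for p in positive)
            - sum(pmi(word, q, contexts) for q in negative))

contexts = [
    {"delicious", "good", "nice"},
    {"delicious", "excellent"},
    {"terrible", "bad", "poor"},
    {"terrible", "nasty"},
]
print(so("delicious", contexts) > 0)  # True: co-occurs with positive seeds
print(so("terrible", contexts) < 0)   # True: co-occurs with negative seeds
```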

**b.** Semantic approach (example in [3]): This principle assigns similar sentiment values to semantically-close words. These semantically-close words can be obtained by getting a list of sentiment words, iteratively expanding the initial set with synonyms and antonyms, and then determining the sentiment polarity for an unknown word by the relative count of positive and negative synonyms of this word.
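The iterative synonym/antonym expansion described in the semantic approach can be sketched as follows (the toy graph and the function are our own illustration; a real system would draw on a resource such as WordNet):

```python
# Toy synonym/antonym graph; a stand-in for a real lexical resource.
SYNONYMS = {"good": {"nice", "fine"}, "nice": {"good"}, "bad": {"poor", "nasty"}}
ANTONYMS = {"good": {"bad"}, "bad": {"good"}}

def expand(seeds):
    """Iteratively expand a seed polarity lexicon: synonyms inherit the
    seed's polarity, antonyms receive the opposite polarity."""
    polarity = dict(seeds)
    changed = True
    while changed:
        changed = False
        for word, pol in list(polarity.items()):
            for syn in SYNONYMS.get(word, ()):
                if syn not in polarity:
                    polarity[syn] = pol
                    changed = True
            for ant in ANTONYMS.get(word, ()):
                if ant not in polarity:
                    polarity[ant] = -pol
                    changed = True
    return polarity

lexicon = expand({"good": +1})
print(sorted(w for w, p in lexicon.items() if p < 0))  # ['bad', 'nasty', 'poor']
```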

A lexicon-based method was used for the sentiment classification of Twitter data in [4]. The approaches were used to identify and extract sentiments from emoticons and hashtags. Also used in [4] was the practice of converting non-grammatical words to grammatical words, and normalizing non-root words to root words, to extract sentiments.
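A minimal sketch of such a lexicon-based tweet scorer, with toy normalization, hashtag, and emoticon tables (all names and entries are our own illustrations, not the resources used in [4]):

```python
import re

# Tiny illustrative lexicon and emoticon map (stand-ins for real resources
# such as Bing Liu's opinion lexicon); normalization rules are simplified.
LEXICON = {"good": 1, "love": 1, "happy": 1, "bad": -1, "sad": -1}
EMOTICONS = {":)": 1, ":(": -1}
NORMALIZE = {"gud": "good", "luv": "love"}  # non-grammatical -> grammatical

def score_tweet(tweet):
    """Sum lexicon values over normalized tokens, hashtags, and emoticons."""
    score = sum(v for e, v in EMOTICONS.items() if e in tweet)
    for token in re.findall(r"#?\w+", tweet.lower()):
        token = token.lstrip("#")          # hashtags carry sentiment words too
        token = NORMALIZE.get(token, token)
        score += LEXICON.get(token, 0)
    return score

print(score_tweet("luv this phone #good :)"))  # 3
print(score_tweet("bad day :("))               # -2
```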

The survey in [5] used lexicon-based classification and included two techniques: a method-of-moments estimator for each word, and a Bayesian adjustment for repeated counts of the same word.

A structured approach was used in [6] for domain-dependent sentiment analysis, using lexicon expansion aided by emoticons.

The survey in [7] introduced a new approach to lexicon extraction that can be successfully used for sentiment polarity assignment. It has been shown that the accuracy obtained from such lexicons outperforms other lexicon-based approaches.

The lexicon–based approach that [8] used was the Semantic Orientation CALculator (SO-CAL), which includes dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation.
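Intensification and negation of the kind SO-CAL incorporates can be illustrated as follows (this sketch is our own simplification, not the actual SO-CAL implementation — SO-CAL, for instance, shifts rather than flips polarity under negation):

```python
# Illustrative valence shifting: intensifiers scale a word's prior
# polarity; negators flip the sign of the next sentiment word.
PRIOR = {"good": 3, "bad": -3}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never"}

def phrase_valence(tokens):
    valence, scale, negated = 0.0, 1.0, False
    for tok in tokens:
        if tok in INTENSIFIERS:
            scale *= INTENSIFIERS[tok]
        elif tok in NEGATORS:
            negated = True
        elif tok in PRIOR:
            v = PRIOR[tok] * scale
            valence += -v if negated else v
            scale, negated = 1.0, False  # modifiers apply to the next word only
    return valence

print(phrase_valence("very good".split()))      # 4.5
print(phrase_valence("not very good".split()))  # -4.5
```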

The survey in [9] proposes a framework for sentiment analysis that uses a dictionary-based approach incorporating fuzzy logic.

In the research in [10], a lexicon-based approach was proposed to calculate reputation scores from Twitter. A Saudi-dialect lexicon was developed from Saudi tweets to better address the sentiment of Arabic tweets.

The authors of [11] propose a lexicon-based approach to sentiment classification of Twitter posts. Their approach is based on the exploitation of widespread lexical resources such as SentiWordNet, WordNet-Affect, MPQA, and SenticNet.

The lexical or lexicon-based approach is a dictionary-based teaching method described by Michael Lewis in the early 1990s [12]. The basic concept of this approach is that education involves the understanding and production of lexical phrases. This pattern of language has grammar as well as a meaningful collection of words.


Sentiment analysis plays a role in the lexicon-based approach in [13]; it is significant in determining classes such as positive, negative, and neutral.

The lexicon-based approach in [14] extracts and handles sentiment by converting slang into non-slang words.

Sentiments are looked up in several dictionaries, known as lexicon-based dictionaries: (1) Bing Liu's opinion lexicon, (2) the MPQA subjectivity lexicon, (3) the SentiWordNet lexicon, and (4) Semantic Evaluation (SemEval).

The acronym dictionary included in [15, 16] is very helpful in expanding tweets and improving overall sentiment scores.

In [17–19], emoticons are treated as combinations of symbols that stand for different abbreviations.

The lexicon-based antonym dictionary in [20] draws on a set of well-known lexicons, such as the WordNet dictionary in English. The WordNet dictionary maintains a set of lexical datasets for English words and also keeps a record of the semantic relationships between words.

The authors in [21–35] use the following equation to determine the Pointwise Mutual Information (PMI) between two words $w_i$ and $w_j$:

$$PMI(w_i, w_j) = \log_2\left(\frac{P(w_i, w_j)}{P(w_i) \times P(w_j)}\right) \tag{1}$$

They use the following equation to determine the SO (sentiment orientation) of a word $w_i$:

$$SO(w_i) = \text{PMI}(w_i, \text{positive}) - \text{PMI}(w_i, \text{negative}) \tag{2}$$

In [21–28], the positive and the negative of Eq. (2) in English are: positive = {good, nice, excellent, positive, fortunate, correct, superior} and negative = {bad, nasty, poor, negative, unfortunate, wrong, inferior}.
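To make Eqs. (1) and (2) concrete, the sketch below estimates document-level co-occurrence probabilities on a toy corpus and scores words against seed lists. The corpus and seed words are invented for illustration, and summing PMI over the seed sets is one common reading of Eq. (2), not necessarily the exact procedure of the cited works.

```python
import math

# Toy corpus of tokenized "documents" (invented for illustration).
docs = [
    ["good", "service", "excellent", "food"],
    ["bad", "service", "poor", "food"],
    ["excellent", "quality", "good", "price"],
    ["nasty", "bad", "experience"],
]

def p(word):
    """P(w): fraction of documents containing `word`."""
    return sum(word in d for d in docs) / len(docs)

def p_joint(w1, w2):
    """P(w1, w2): fraction of documents containing both words."""
    return sum(w1 in d and w2 in d for d in docs) / len(docs)

def pmi(w1, w2):
    """Eq. (1); defined as 0 when the words never co-occur."""
    joint = p_joint(w1, w2)
    return math.log2(joint / (p(w1) * p(w2))) if joint else 0.0

def so(word, pos=("good", "excellent"), neg=("bad", "nasty")):
    """Eq. (2), reading PMI(w, positive) as a sum over the seed set."""
    return sum(pmi(word, w) for w in pos) - sum(pmi(word, w) for w in neg)

print(so("quality") > 0, so("poor") < 0)  # True True
```

In a web-scale setting, the probabilities would be replaced by search-engine hit counts, as the AVSE/GSE-based studies above do.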

The AltaVista search engine (AVSE) is used in the PMI equations of [22, 23, 25], and the Google search engine (GSE) is used in the PMI equations of [24, 26, 28]. Besides English, the authors of [24] also use German, the authors of [25] Macedonian, the authors of [26] Arabic, the authors of [27] Chinese, and the authors of [28] Spanish. The Bing search engine is also used in [26].

With [29–32], the PMI equations are used in Chinese, not English, and Tibetan is also added in [29]. In terms of the search engine, AVSE is used in [31], and the authors of [32] use three search engines: GSE, the Yahoo search engine (YSE), and the Baidu search engine (BSE). The PMI equations are also used in Japanese with GSE in [33]. The authors in [34, 35] also use the PMI equations and Jaccard equations with GSE in English.

The Jaccard equations with GSE in English are used in [34, 35, 37]. The authors in [36, 41] use the Jaccard equations in English. The authors in [40, 42] use the Jaccard equations in Chinese. The authors in [38] use the Jaccard equations in Arabic. The Jaccard equations with the Chinese search engine (CSE) in Chinese are used in [39].
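For the Jaccard-based studies above, the coefficient is typically computed from search-engine hit counts; a minimal sketch, with hypothetical hit counts standing in for real queries:

```python
def jaccard(hits_w1, hits_w2, hits_both):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| from hit counts, where A and B
    are the result sets for w1 and w2, and `hits_both` is the count for
    the query "w1 AND w2"."""
    union = hits_w1 + hits_w2 - hits_both
    return hits_both / union if union else 0.0

# Hypothetical hit counts (a real system would query a search engine).
print(jaccard(1000, 800, 600))  # 0.5
```

As with PMI, comparing an unknown word's Jaccard similarity to positive versus negative seed words yields a sentiment orientation estimate.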


The Today Tendency of Sentiment Classification http://dx.doi.org/10.5772/intechopen.74930 249


The authors in [48] use the Ochiai Measure through GSE with the AND and OR operators, to calculate the sentiment values of the words in Vietnamese. The authors in [49] use the Cosine Measure through GSE with the AND and OR operators, to identify the sentiment scores of the words in English. The authors in [50] use the Sorensen Coefficient through GSE with the AND and OR operators, to calculate the sentiment values of the words in English. The authors in [51] use the Jaccard Measure through GSE with the AND and OR operators, to calculate the sentiment values of the words in Vietnamese. The authors in [52] use the Tanimoto Coefficient through GSE with the AND and OR operators, to identify the sentiment scores of the words in English.
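The measures named above can all be computed from the same three hit counts returned by the AND/OR queries; a sketch with hypothetical counts (for binary occurrence sets, the Ochiai measure coincides with the cosine measure, and the Tanimoto coefficient with the Jaccard measure):

```python
import math

def ochiai(n1, n2, n12):
    """Ochiai measure (cosine for binary sets): n12 / sqrt(n1 * n2)."""
    return n12 / math.sqrt(n1 * n2) if n1 and n2 else 0.0

def sorensen(n1, n2, n12):
    """Sorensen (Dice) coefficient: 2 * n12 / (n1 + n2)."""
    return 2 * n12 / (n1 + n2) if (n1 + n2) else 0.0

def tanimoto(n1, n2, n12):
    """Tanimoto coefficient: n12 / (n1 + n2 - n12) (Jaccard for sets)."""
    d = n1 + n2 - n12
    return n12 / d if d else 0.0

# Hypothetical hit counts: n1 = hits("w1"), n2 = hits("w2"),
# n12 = hits("w1" AND "w2").
n1, n2, n12 = 400, 100, 80
print(ochiai(n1, n2, n12), sorensen(n1, n2, n12), tanimoto(n1, n2, n12))
```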

Based on the surveys in [21–52], our evaluation is that all the similarity coefficients (or similarity measures) can be applied with certainty to identify the valences (or sentiment scores) of words in many different languages.

## **2.2. Machine-learning approaches**

The supervised and unsupervised machine learning algorithms that have been developed for sentiment classification are shown in **Figure 1**.

For the deep learning group of the sentiment analysis, deep learning (also known as deep structured learning or hierarchical learning) is based on learning data representations. Learning can be supervised, semi-supervised, or unsupervised. Examples of deep learning include deep neural networks, deep belief networks, and recurrent neural networks. They have been applied to many fields, including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, and drug design.

In the survey in [54], the deep learning techniques showed promising accuracy in this domain on an English tweet corpus. The authors conducted the first study that applies deep learning techniques to classifying sentiment of Thai Twitter data. Two deep-learning techniques are included in the study: Long Short Term Memory (LSTM) and Dynamic Convolutional Neural Network (DCNN).

**Figure 1.** The machine learning algorithms.

The authors of [55] used a new model to initialize the parameter weights of the convolutional neural network. They also used an unsupervised neural language model to train initial words.

Deep learning and micro-blog sentiment analysis were proposed in [56].


The authors in [57] fine-tuned a convolutional neural network (CNN) for image sentiment analysis and trained a paragraph vector model for textual sentiment analysis. The authors conducted extensive experiments on both weakly machine-labeled and manually labeled image tweets.

Ensemble approaches in statistics and machine learning use multiple learning algorithms to get better predictive performance than constituent learning algorithms. A machine learning ensemble, unlike a statistical ensemble in statistical mechanics, comprises only a concrete, finite set of alternative models, but typically allows for much more flexible structures to exist among those alternatives.

A comparative study of the effectiveness of ensemble technique for sentiment classification was proposed in [58]. This survey used the ensemble framework for sentiment classification, with the aim of efficiently integrating different feature sets and classification algorithms in order to synthesize a more accurate classification procedure. The research in [59] presents an ensemble learning method for sentiment classification of reviews. The ensemble learning framework, or stacking generalization, is introduced based on different algorithms with different settings, and compared with the majority voting. An ensemble sentiment classification strategy in [60] was applied based on Majority Vote principle of multiple classification methods, including Naive Bayes, SVM, Bayesian Network, C4.5 Decision Tree, and Random Forest algorithms.
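The Majority Vote principle of [60] reduces to a simple counting rule over per-classifier predictions; a minimal sketch with invented labels:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier labels by majority vote
    (ties resolve to the earliest most common label)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five classifiers (e.g. NB, SVM, Bayesian network,
# C4.5, Random Forest) for three documents; labels are illustrative.
per_doc_votes = [
    ["pos", "pos", "neg", "pos", "pos"],
    ["neg", "neg", "neg", "pos", "neg"],
    ["pos", "neg", "pos", "neg", "pos"],
]
print([majority_vote(v) for v in per_doc_votes])  # ['pos', 'neg', 'pos']
```

Stacking, as in [59], replaces this fixed counting rule with a meta-classifier trained on the base classifiers' outputs.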

The simplest definition of a neural network, more properly referred to as an "artificial" neural network (ANN), is provided by Dr. Robert Hecht-Nielsen, the inventor of one of the first neurocomputers: a computing system made up of a number of simple, highly interconnected processing elements that process information by their dynamic state response to external inputs. The neural network (NN)-based method in [61] combines the BPN and SO indexes to classify bloggers' sentiment. The NN-based method can reduce training time when classifying textual data. In experimental results, the NN-based method outperforms the traditional sentiment classification methods (BPN and SO index).

In mathematics, statistics, and computer science—particularly in the fields of machine learning and inverse problems—regularization is the process of introducing additional information in order to solve an ill-posed problem or to prevent over-fitting. The authors in [62] discussed a relation between Learning Theory and Regularization of linear ill-posed inverse problems. The authors showed that a notion of regularization (defined according to what is usually done for ill-posed inverse problems) allows derivation of learning algorithms that are consistent and that provide a fast convergence rate.
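A minimal illustration of regularization in the least-squares setting: ridge (Tikhonov) regression adds a penalty `lam * ||w||^2` to the objective, which makes an ill-conditioned problem solvable and damps over-fitting. The data below are synthetic, constructed to be nearly collinear.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: minimize ||Xw - y||^2 + lam * ||w||^2.
    Closed form: w = (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Nearly duplicate columns make the unregularized normal equations
# ill-conditioned; ridge keeps the solution stable.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
X = np.hstack([x, x + 1e-8 * rng.normal(size=(50, 1))])
y = (X[:, 0] + X[:, 1]) / 2

w = ridge_fit(X, y, lam=1e-3)
print(w)  # both weights stay near 0.5 instead of exploding
```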

The authors in [48, 50, 51] used rule systems for sentiment classification in Vietnamese and English.

Regression analysis in statistical modeling is a set of statistical processes for estimating the relationships among variables, and it comprises many techniques for modeling and analyzing several variables. In regression analysis, we can see how the typical value of the dependent variable (or "criterion variable") changes when any one of the independent variables is varied while the other independent variables are held fixed. Regression analysis is a form of predictive modeling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. The study in [63] analyzed the effect of using regression on sentiment classification of Twitter data.


Sentiment analysis was used in [64] to predict the Indonesian stock market. This study used the Naive Bayes and Random Forest algorithms to calculate sentiment regarding a company. The results of sentiment analysis were used to predict the company stock price. A linear regression method was used to build the prediction model.
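The prediction step of such a study can be sketched as an ordinary least-squares fit of price on sentiment score; the numbers below are invented, not the data of [64].

```python
import numpy as np

# Hypothetical daily sentiment scores and next-day stock prices.
sentiment = np.array([-0.8, -0.3, 0.0, 0.4, 0.9])
price = np.array([96.0, 98.0, 100.5, 101.0, 104.0])

# Least-squares fit of price = a * sentiment + b via the design matrix
# [sentiment, 1]; lstsq solves the normal equations for us.
A = np.column_stack([sentiment, np.ones_like(sentiment)])
(a, b), *_ = np.linalg.lstsq(A, price, rcond=None)

def predict(s):
    """Predicted price for a sentiment score s."""
    return a * s + b

print(predict(0.5))  # predicted price for a positive sentiment day
```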

Naïve Bayes classifiers in machine learning are a family of simple probabilistic classifiers based on Bayes' theorem, with strong (naive) independence assumptions between the features. Naïve Bayes has been studied since the 1950s, and it was introduced under a different name to the text retrieval community in the early 1960s. It remains a popular (baseline) method for text categorization, considering the problem of judging documents as belonging to one category or another (such as spam or legitimate, sports or politics, etc.), with word frequencies as the features. It is competitive in this domain with more advanced methods, including support vector machines, and it also finds application in automatic medical diagnosis.

The authors in [65] explored different methods of improving the accuracy of a Naive Bayes classifier for sentiment analysis. The supervised learning algorithm was used to classify a review document as either positive or negative in [66]. The authors also improved the Naïve Bayes algorithm.
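A word-frequency Naive Bayes baseline of the kind described above can be sketched in a few lines; the training set here is invented, not the corpora of [65, 66].

```python
import math
from collections import Counter

# Tiny illustrative training set of (text, label) pairs.
train = [
    ("great movie i loved it", "pos"),
    ("wonderful acting and great story", "pos"),
    ("terrible plot i hated it", "neg"),
    ("boring and terrible acting", "neg"),
]

classes = {"pos", "neg"}
word_counts = {c: Counter() for c in classes}
doc_counts = Counter()
for text, label in train:
    doc_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in classes for w in word_counts[c]}

def predict(text):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""
    best, best_lp = None, -math.inf
    for c in classes:
        lp = math.log(doc_counts[c] / len(train))  # class prior
        total = sum(word_counts[c].values())
        for w in text.split():
            if w in vocab:  # ignore words never seen in training
                lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

print(predict("great story"), predict("terrible movie"))  # pos neg
```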

A decision tree is a decision-support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. Operations research commonly uses decision trees, specifically in decision analysis, to help identify the strategy most likely to reach a goal; decision trees are also a popular tool in machine learning.

The authors in [67] proposed a new model using the C4.5 algorithm of a decision tree to classify semantics (positive, negative, neutral) for English documents. A novel model using an ID3 algorithm of a decision tree was used to classify sentiments for documents in English in [68]. This survey was based on many rules generated by applying the ID3 algorithm to 115,000 English sentences of the training data set.
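The core of ID3 (and of C4.5's default criterion) is choosing the split attribute with the highest information gain; a sketch on invented sentence features:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """ID3 criterion: entropy reduction from splitting on `attr`."""
    total = len(labels)
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    remainder = sum(len(part) / total * entropy(part)
                    for part in split.values())
    return entropy(labels) - remainder

# Hypothetical sentence features vs. sentiment labels (illustrative only).
rows = [
    {"has_neg_word": 1, "has_pos_word": 0},
    {"has_neg_word": 1, "has_pos_word": 1},
    {"has_neg_word": 0, "has_pos_word": 1},
    {"has_neg_word": 0, "has_pos_word": 1},
]
labels = ["negative", "negative", "positive", "positive"]

# ID3 places the attribute with the highest gain at the root of the tree.
print(information_gain(rows, labels, "has_neg_word"))  # 1.0 (perfect split)
print(information_gain(rows, labels, "has_pos_word"))
```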

Dimensionality reduction, or dimension reduction in machine learning and statistics, is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. Dimensionality reduction comprises feature selection and feature extraction.

Naive Bayes and Support Vector Machine were used in [69] to analyze the sentiments of a huge number of tweets generated by Twitter users (stored in the Twitter database). Unigrams and bigrams were used as feature extractors, and Chi2 feature selection along with Singular Value Decomposition was used for dimensionality reduction.
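Dimensionality reduction via Singular Value Decomposition keeps only the top-k singular directions of the document-term matrix; a sketch on an invented count matrix:

```python
import numpy as np

def svd_reduce(X, k):
    """Project rows of a document-term matrix onto the top-k singular
    directions (truncated SVD, as in LSA-style reduction)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T  # shape (n_docs, k)

# Hypothetical 4-document, 6-term count matrix (illustrative only).
X = np.array([
    [2, 1, 0, 0, 1, 0],
    [1, 2, 0, 0, 0, 1],
    [0, 0, 2, 1, 1, 0],
    [0, 0, 1, 2, 0, 1],
], dtype=float)

Z = svd_reduce(X, k=2)
print(Z.shape)  # (4, 2)
```

Each document is now a 2-dimensional vector; a classifier such as Naive Bayes or SVM can then be trained on `Z` instead of the full matrix.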

A novel semi-supervised Laplacian eigenmap (SS-LE) was proposed in [70]. It removes redundant features, decreasing sentiment-detection errors, and enables visualization of documents in a perceptible, low-dimensional embedded space, providing a useful tool for text analytics. The authors evaluated the novel approach by comparing it to other dimensionality reduction methods.


Instance-based learning (memory-based learning) in machine learning is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory.

Naive Bayes, Instance Based Learning, Decision Tree, SVM, and IB1 (Instance Based Learning 1) were implemented for sentiment classification of the class of reviews from Rotten Tomatoes in [71].
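IB1 is essentially 1-nearest-neighbour classification over stored instances; a sketch with invented two-feature review vectors:

```python
from collections import Counter

def knn_predict(train, query, k=1):
    """Instance-based (memory-based) classification: store the training
    instances and label a query by its k nearest stored neighbours.
    k=1 corresponds to IB1."""
    dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical feature vectors (e.g. counts of positive and negative
# words in a review) with sentiment labels.
train = [((3, 0), "pos"), ((2, 1), "pos"), ((0, 3), "neg"), ((1, 2), "neg")]
print(knn_predict(train, (3, 1)))        # pos: nearest instance is positive
print(knn_predict(train, (0, 2), k=3))   # neg: 2 of 3 neighbours are negative
```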

Clustering data concerns a set of objects processed into classes of similar objects. A cluster is a set of data objects that are similar to each other and dissimilar to objects in other clusters. The number of clusters can be set from experience or identified automatically as part of the clustering method. The authors of [72] proposed a new model for big-data sentiment classification in the parallel network environment. Their model used the Fuzzy C-Means (FCM) method for English sentiment classification, with Hadoop Map (M)/Reduce (R) in Cloudera. The authors of [73] proposed a new model for big-data sentiment classification in the parallel network environment. Their model uses the STING Algorithm (SA) (from the data mining field) for English document-level sentiment classification with Hadoop Map (M)/Reduce (R), based on the 90,000 English sentences of the training data set in a Cloudera parallel network environment, a distributed system.
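The Fuzzy C-Means method alternates soft membership updates with weighted centroid updates; a compact single-machine sketch (standard FCM on invented 2-D "document" features, not the authors' Hadoop implementation):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Standard FCM loop: alternate membership and centroid updates.
    Returns memberships U of shape (n, c) and centroids of shape (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row sums to 1
    for _ in range(iters):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centroids

# Two well-separated hypothetical clusters in a 2-D feature space.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]])
U, centroids = fuzzy_c_means(X, c=2)
print(U.round(2))  # each row: soft membership of a point in the 2 clusters
```

Unlike hard clustering, each document keeps a degree of membership in every cluster, which the sentiment model can threshold into positive/negative assignments.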

Furthermore, many approaches have combined machine-learning and dictionary-based approaches. The authors in [74] proposed a system for sentiment analysis and classification using NLP, a machine-learning technique, and a dictionary-based approach; the proposed methodology classifies people's sentiments into different polarity classes (positive, negative, and neutral). The main objective of the proposed system is to address and solve the polarity shift problem and to provide feasible solutions to the BOW model in sentiment classification; this objective is achieved by detecting, eliminating, and modifying negation polarity shifters in a given text.

Two main approaches (lexical approach and machine learning) were applied to sentiment analysis in [75]. The lexicon-based method was used to create emotional dictionaries for each domain, as well as the algorithm that calculates the weight of texts. The Maximum Entropy method and Support Vector Machines were used in the machine learning approach to create a dictionary and an algorithm for the construction of the feature vector for the Maximum Entropy method.

## **3. The today tendency of the sentiment analysis**

Based on the use of a testing data set and a training data set, the opinion classification approaches can be divided into the categories shown in **Figure 2**.

With the category (1), the authors of [49] used two testing data sets in English and did not use any training data set. Each testing data set has 25,000 English documents. The authors of [51] used one testing data set in Vietnamese and did not use any training data set. The testing data set has 30,000 Vietnamese documents. The survey [83] used one testing data set in English and did not use any training data set. The testing data set has 5,000,000 English documents.

**Figure 2.** Categories of the sentiment classification based on the data sets.

classification with movie reviews in [80]. The vote algorithm in [81] was used in conjunction with three classifiers, namely Naive Bayes, Support Vector Machine (SVM), and Bagging.

The Today Tendency of Sentiment Classification http://dx.doi.org/10.5772/intechopen.74930 253


The authors in [51] used one testing data set in Vietnamese and no training data set; the testing data set contains 30,000 Vietnamese documents. The survey in [83] used one testing data set in English and no training data set; the testing data set contains 5,000,000 English documents.

**Figure 2.** Categories of the sentiment classification based on the data sets.

Category (1) uses the lexicon-based approaches in [1–52, 77]. In addition, category (1) uses a Self-Organizing Map (SOM) algorithm; the Self-Organizing Map is based on unsupervised learning.

**a.** With one document of the testing data set, the SOM is used to cluster all the sentences of this document into either the positive or the negative section on a map. The sentiment classification of this document is identified completely based on this map. There is no training data set in this category.

**b.** With many documents of the testing data set, the SOM is used to cluster all the documents into either the positive or the negative section on a map. The sentiment classification of all the documents is identified completely based on this map. There is no training data set in this category.
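The SOM-based clustering described above can be illustrated with a small sketch. This is not the implementation of the cited studies: the two-feature document vectors, the two-unit map, and the deterministic initialization are all simplifying assumptions for illustration.

```python
def train_som_1d(vectors, epochs=50, lr0=0.5):
    """Tiny 1-D self-organizing map with two units (unsupervised)."""
    # For determinism, seed the two units with the first and last sample
    # (a simplification; SOM units are usually initialized randomly).
    units = [list(vectors[0]), list(vectors[-1])]
    dim = len(vectors[0])
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)           # decaying learning rate
        for v in vectors:
            # best-matching unit = closest weight vector
            bmu = min(range(2), key=lambda u: sum((units[u][i] - v[i]) ** 2
                                                  for i in range(dim)))
            for i in range(dim):                   # pull the BMU toward the sample
                units[bmu][i] += lr * (v[i] - units[bmu][i])
    return units

def assign(vectors, units):
    """Place each vector in the map section of its closest unit."""
    dim = len(vectors[0])
    return [min(range(len(units)), key=lambda u: sum((units[u][i] - v[i]) ** 2
                                                     for i in range(dim)))
            for v in vectors]

# Toy 2-feature document vectors: (positive-cue count, negative-cue count).
docs = [[3, 0], [4, 1], [1, 4], [0, 3]]
units = train_som_1d(docs)
sections = assign(docs, units)
# Documents with similar polarity features land in the same map section.
assert sections[0] == sections[1] and sections[2] == sections[3]
assert sections[0] != sections[2]
```

No training labels are used anywhere above, which is the point of item (b): the map sections themselves separate the two polarities.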


Category (1) uses many similarity coefficients (or similarity measures) to classify a document of the testing data set into either the positive polarity or the negative polarity. In our opinion, any of these similarity measures can be used for the sentiment analysis of category (1).
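As a minimal sketch of this idea, one such measure, the Jaccard coefficient, can compare a document's terms against positive and negative seed sets. The seed lexicons and the `classify` helper below are hypothetical, invented purely for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient between two term sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Hypothetical seed lexicons (assumptions, not taken from the chapter).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}

def classify(document):
    """Assign the polarity whose seed set is more similar to the document."""
    terms = document.lower().split()
    pos = jaccard(terms, POSITIVE)
    neg = jaccard(terms, NEGATIVE)
    return "positive" if pos >= neg else "negative"

assert classify("a great and excellent phone") == "positive"
assert classify("an awful, truly bad battery") == "negative"
```

Any other coefficient (Dice, cosine, overlap, etc.) can be dropped in for `jaccard` without changing the surrounding scheme.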

In addition, category (1) also uses many rules for the sentiment classification in [48–52], in many different languages.

Category (2) uses a testing data set and a training data set; both consist of documents. The authors in [82] used one testing data set of 1,000,000 documents and one training data set of 2,000,000 documents in English. This category uses many machine-learning algorithms (supervised learning, unsupervised learning, semi-supervised learning, etc.). The authors in [78] used a machine-learning algorithm, Support Vector Machines (SVM), for their sentiment classification. Latent semantic analysis (LSA) has proven to be extremely useful in information retrieval in [79]; a novel approach based on LSA and SVM aims to improve the sentiment classification performance. Three machine-learning approaches (Naive Bayes, maximum entropy classification, and support vector machines) were used for sentiment classification with movie reviews in [80]. The vote algorithm in [81] was used in conjunction with three classifiers, namely Naive Bayes, Support Vector Machine (SVM), and Bagging.
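One of the approaches named above, Naive Bayes, can be sketched in a few lines of pure Python: a multinomial model with Laplace smoothing, trained on labeled documents and applied to unlabeled ones. The toy training documents are invented for illustration, not drawn from the cited data sets.

```python
from collections import Counter
import math

def train_nb(docs):
    """Train multinomial Naive Bayes from (text, label) pairs."""
    word_counts = {"positive": Counter(), "negative": Counter()}
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab

def predict(text, word_counts, label_counts, vocab):
    """Return the label with the highest log-posterior for the text."""
    total = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label, n in label_counts.items():
        lp = math.log(n / total)                      # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train = [("great movie loved it", "positive"),
         ("wonderful great acting", "positive"),
         ("terrible plot hated it", "negative"),
         ("boring terrible film", "negative")]
model = train_nb(train)
assert predict("loved the great acting", *model) == "positive"
assert predict("hated the boring plot", *model) == "negative"
```

The SVM and maximum-entropy variants mentioned in [78, 80] follow the same train-on-documents, test-on-documents pattern with a different decision function.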


252 Artificial Intelligence - Emerging Trends and Applications


Category (3) uses a testing data set and a training data set; the testing data set contains documents, while the training data set contains sentences. The authors in [67] used one training data set of 140,000 sentences and two testing data sets in English; each testing data set had 25,000 documents. The research in [68] used one training data set of 115,000 sentences and two testing data sets in English; each testing data set had 25,000 documents. The authors in [72] used one training data set of 60,000 sentences and two testing data sets in English; each testing data set had 25,000 documents. The survey in [73] used one training data set of 90,000 sentences and two testing data sets in English; each testing data set had 25,000 documents.

This category also uses many machine-learning algorithms (supervised learning, unsupervised learning, semi-supervised learning, etc.). The authors in [67] used a decision tree (the C4.5 algorithm) to generate many association rules for English sentiment classification. The authors in [68] also used a decision tree (the ID3 algorithm) to generate many association rules for English sentiment classification. The authors in [72, 73] used clustering algorithms of machine learning to cluster the documents of the testing data set into either the positive polarity or the negative polarity, based on the training data set. The authors in [76] used an SVM algorithm to classify the documents of the testing data set into either the positive polarity or the negative polarity, according to the sentences of the training data set.
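The common shape of category (3), labeling a whole test document from sentence-level decisions, can be sketched as a majority vote over its sentences. This is a deliberate simplification, not the clustering or SVM schemes of [72, 73, 76]; the toy sentence classifier is a hypothetical stand-in for a model trained on the sentence-level training set.

```python
def classify_document(sentences, sentence_classifier):
    """Label a test document by aggregating the polarities of its
    sentences (majority vote, ties broken toward positive)."""
    votes = [sentence_classifier(s) for s in sentences]
    return "positive" if votes.count("positive") >= votes.count("negative") \
        else "negative"

# Hypothetical sentence classifier: a stand-in for a model trained on the
# sentence-level training data set.
def toy_sentence_classifier(sentence):
    return "positive" if "good" in sentence else "negative"

doc = ["the screen is good", "the battery is good", "the price is not"]
assert classify_document(doc, toy_sentence_classifier) == "positive"
```

Swapping `sentence_classifier` for a sentence-trained SVM or cluster-assignment function recovers the document-from-sentences pattern the cited works share.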

Considering the current state of the world's economies (we have presented information about big corporations, large document collections, etc., in the Introduction section), we show the current tendencies of sentiment analysis in **Figure 3**.

**Figure 3.** The current tendencies of sentiment analysis.

**1.** Processing many big data sets with shortened execution times: As presented in the Introduction section, many older approaches (methods or models) cannot process big data sets reliably, or can process them only with long execution times and high costs. The processing of big data sets can be implemented on many parallel network systems. The model proposed in [72] used the Fuzzy C-Means (FCM) method for English sentiment classification, with Hadoop MAP (M)/REDUCE (R) in Cloudera, a parallel network environment. The authors in [73] used a STING algorithm for English sentiment classification in a parallel environment. The authors of [76] used an SVM algorithm for English semantic classification in a parallel environment. Furthermore, lexicon-based approaches can be performed reliably in distributed network systems. In the near future, there will be many small machines that can implement such parallel systems. The execution time of a proposed model depends on many factors: (1) the parallel network environment, such as the Cloudera system; (2) the distributed functions, such as Hadoop Map (M) and Hadoop Reduce (R); (3) the algorithms in the approach; (4) the performance of the distributed network system; (5) the number of nodes of the parallel network environment; (6) the performance of each node (each server) of the distributed environment; and (7) the sizes of the training data set and the testing data set.

**2.** Having high accuracy: A high accuracy is crucial for surveys and commercial applications. We can cross-check the works of sentiment classification in order to improve their accuracies. The accuracy of a proposed model depends on several factors: (1) the algorithms in the approach; (2) the testing data set and the training data set; (3) whether the documents of the testing data set are standardized carefully; and (4) whether the documents (or the sentences) of the training data set are standardized carefully.

**3.** Integrating flexibly and easily into many small machines or many different approaches: This category is very important for surveys, researchers, and commercial applications. The small machines used in many different fields can be used conveniently anywhere, by any type of user, and for various purposes. These small machines can be produced easily, can be very cheap, and are easy to carry. The easy and flexible integration of sentiment classification into the small machines saves a lot of time and cost. The lexicon-based approaches and the rule-based approaches can be integrated into the small machines, because the small machines have the space to store their data. In addition, the lexicons and the rules can be implemented easily in the small machines. We will not spend much time studying and implementing the surveys that currently exist.
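The Map (M)/Reduce (R) decomposition mentioned in tendency (1) can be sketched sequentially: the map step scores each document independently (this is what a Hadoop cluster parallelizes across nodes), and the reduce step merges the partial results. The lexicons and the scoring rule below are invented stand-ins, not the FCM/STING/SVM models of [72, 73, 76].

```python
from functools import reduce

# Hypothetical seed lexicons used only to give the map step something to do.
POSITIVE, NEGATIVE = {"good", "great"}, {"bad", "poor"}

def map_doc(doc):
    """Map step: each node would score one document independently."""
    words = doc.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return ("positive", 1) if score >= 0 else ("negative", 1)

def reduce_counts(acc, pair):
    """Reduce step: combine per-document results into class counts."""
    label, n = pair
    acc[label] = acc.get(label, 0) + n
    return acc

docs = ["good great phone", "poor battery", "great screen", "bad camera bad sound"]
counts = reduce(reduce_counts, map(map_doc, docs), {})
assert counts == {"positive": 2, "negative": 2}
```

Because `map_doc` touches one document at a time with no shared state, the map phase shards cleanly across the nodes of factors (1)-(6) above; only the small reduced counts travel between machines.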


The Today Tendency of Sentiment Classification http://dx.doi.org/10.5772/intechopen.74930 255



## **4. Conclusion**

In summary, we have presented the basics of the dictionary-based approaches and the corpus-based approaches of sentiment classification, and we have also shown the current tendencies of sentiment analysis in more detail.

We have presented information about the surveys in each section of this chapter, and we have also discussed the advantages of the studies in more detail.

According to the above evidence and our opinion, the three tendencies of sentiment classification will develop strongly in the near future, because they have advantages in different fields and commercial applications.

More surveys will be developed for sentiment analysis.

## **Conflict of interest**


We declare that we have no conflict of interest in this chapter.

## **Notes/Thanks/Other declarations**

We thank Dr. Marco Antonio Aceves-Fernandez very much for inviting us to contribute this chapter to the book "Artificial Intelligence."

## **Author details**

Vo Ngoc Phu<sup>1</sup>\* and Vo Thi Ngoc Tran<sup>2</sup>

\*Address all correspondence to: vongocphu03hca@gmail.com

1 Institute of Research and Development, Duy Tan University – DTU, Da Nang, Vietnam

2 School of Industrial Management (SIM), Ho Chi Minh City University of Technology - HCMUT, Vietnam National University, Ho Chi Minh City, Vietnam

## **References**


[1] Goyal A, Daume III H. Generating semantic orientation lexicon using large data and thesaurus. WASSA '11 Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, Portland, Oregon; 2011. pp. 37-43

[2] Turney P. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia; July 2002. pp. 417-424

[3] Alena N, Helmut P, Mitsuru I. Recognition of affect, judgment, and appreciation in text. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing; 2010. pp. 806-814

[4] Palanisamy P, Yadav V, Serendio HE. Simple and practical lexicon based approach to sentiment analysis. Second Joint Conference on Lexical and Computational Semantics (\*SEM), Volume 2: Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, Georgia, June 14-15; 2013. pp. 543-548

[5] Eisenstein J. Unsupervised learning for lexicon-based classification. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), the Hilton San Francisco, San Francisco, California, USA; 2017. pp. 3188-3194

[6] Zhou Z, Zhang X, Sanderson M. Sentiment analysis on Twitter through topic-based lexicon expansion. In: Wang H, Sharaf MA, editors. Databases Theory and Applications. ADC 2014. Lecture Notes in Computer Science. Vol. 8506. Cham: Springer; 2014

[7] Augustyniak L, Kajdanowicz T, Szymanski P, Tuligłowicz W, Kazienko P, Alhajj R, Szymanski B. Simpler is better? Lexicon-based ensemble sentiment classification beats supervised methods. International Workshop on Curbing Collusive Cyber-gossips in Social Networks (C3-2014). Proc. IEEE/ACM Int. Conf. Advances in Social Network Analysis and Mining, ASONAM, Beijing, China; August 17, 2014

[8] Taboada M, Brooke J, Tofiloski M, Voll K, Stede M. Lexicon-based methods for sentiment analysis. Computational Linguistics. 2010;**37**(2):267-307

[9] Hardeniya T, Borikar DA. An approach to sentiment analysis using lexicons with comparative analysis of different techniques. IOSR Journal of Computer Engineering (IOSR-JCE). 2016;**18**(3):53-57. Ver. I; e-ISSN: 2278-0661, p-ISSN: 2278-8727

[10] Al-Hussaini H, Al-Dossari H. A lexicon-based approach to build service provider reputation from Arabic tweets in Twitter. (IJACSA) International Journal of Advanced Computer Science and Applications. 2017;**8**(4)

[11] Musto C, Semeraro G, Polignano M. A comparison of lexicon-based approaches for sentiment analysis of microblog posts. Proceedings of the 8th International Workshop on Information Filtering and Retrieval, Pisa, Italy; December 10th 2014

[12] Hamdan H, Bellot P, Bechet F. lsislif: Feature extraction and label weighting for sentiment analysis in Twitter. Proceedings of the 9th International Workshop on Semantic Evaluation, Denver, Colorado, USA; 2015. pp. 568-573

[13] Pan Y, Li X, Shi H, Liu H. Research of methods in sentiment orientation analysis of text based on domain sentiment lexicon. Information Technology Journal. 2014;**13**(9):1612-1621. DOI: 10.3923/itj.2014.1612.1621

[14] Park S, Kim Y. Building thesaurus lexicon using dictionary-based approach for sentiment classification. 14th IEEE International Conference on Software Engineering Research, Management and Applications (SERA), Towson, MD, USA; 2016. pp. 39-44

[15] Ren F, Matsumoto K. Semi-automatic creation of youth slang corpus and its application to affective computing. IEEE Transactions on Affective Computing. 2016;**7**(2):176-189

[16] Xing L, Yuan L, Qinglin W, Yu L. An approach to sentiment analysis of short Chinese text based on SVMs. 34th IEEE Chinese Control Conference (CCC), China; 2015. pp. 9115-9120

[17] Kundi FM, Ahmed S, Khan A, Asghar MZ. Detection and scoring of internet slangs for sentiment analysis using SentiWordNet. Life Science Journal. 2014;**11**:66-72

[18] Huang S, Han W, Que X, Wang W. Polarity identification of sentiment words based on emoticons. 9th Conference on Computational Intelligence and Security, Emei Mountain, Sichuan Province, China; 2013. pp. 134-138

[19] Dayalani GG. Emoticon based unsupervised sentiment classifier for polarity analysis in tweets. International Journal of Engineering Research and General Science. 2014;**2**:438-445

[20] Xia R, Xu F, Zong C, Li Q, Qi Y, Li T. Dual sentiment analysis: Considering two sides of one review. IEEE Transactions on Knowledge and Data Engineering. 2015;**27**(8):2120-2133

[21] Bai A, Hammer H. Constructing sentiment lexicons in Norwegian from a large text corpus. 2014 IEEE 17th International Conference on Computational Science and Engineering, Chengdu, China; 2014

[22] Turney PD, Littman ML. Unsupervised learning of semantic orientation from a hundred-billion-word corpus. arXiv:cs/0212012, Learning (cs.LG); Information Retrieval (cs.IR); 2002

[23] Malouf R, Mullen T. Graph-based user classification for informal online political discourse. In: Proceedings of the 1st Workshop on Information Credibility on the Web; 2017

[24] Scheible C. Sentiment translation through lexicon induction. Proceedings of the ACL 2010 Student Research Workshop, Sweden; 2010. pp. 25-30

[25] Jovanoski D, Pachovski V, Nakov P. Sentiment analysis in Twitter for Macedonian. Proceedings of Recent Advances in Natural Language Processing, Bulgaria; 2015. pp. 249-257

[26] Htait A, Fournier S, Bellot P. LSIS at SemEval-2016 Task 7: Using web search engines for English and Arabic unsupervised sentiment intensity prediction. Proceedings of SemEval-2016, California; 2016. pp. 481-485

[27] Wan X. Co-training for cross-lingual sentiment classification. Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, Singapore; 2009. pp. 235-243

[28] Brooke J, Tofiloski M, Taboada M. Cross-linguistic sentiment analysis: From English to Spanish. International Conference RANLP 2009, Borovets, Bulgaria; 2009. pp. 50-54

[29] Jiang T, Jiang J, Dai Y, Li A. Micro-blog emotion orientation analysis algorithm based on Tibetan and Chinese mixed text. International Symposium on Social Science (ISSS 2015); 2015

[30] Tan S, Zhang J. An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications. 2007;**34**(4):2622-2629. DOI: 10.1016/j.eswa.2007.05.028

[31] Du W, Tan S, Cheng X, Yun X. Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon. WSDM'10, New York, USA; 2010

[32] Zhang Z, Ye Q, Zheng W, Li Y. Sentiment classification for consumer word-of-mouth in Chinese: Comparison between supervised and unsupervised approaches. The 2010 International Conference on E-Business Intelligence; 2010


[33] Wang G, Araki K. Modifying SO-PMI for Japanese weblog opinion mining by using a balancing factor and detecting neutral expressions. Proceedings of NAACL HLT 2007, Companion Volume, NY; 2007. pp. 189-192

[34] Feng S, Zhang L, Li B, Wang D, Yu v, Wong K-F. Is Twitter a better corpus for measuring sentiment similarity? Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, USA; 2013. pp. 897-902

[35] An NTT, Hagiwara M. Adjective-based estimation of short sentence's impression. (KEER2014) Proceedings of the 5th Kansei Engineering and Emotion Research International Conference, Sweden; 2014

[36] Shikalgar NR, Dixit AM. JIBCA: Jaccard index based clustering algorithm for mining online review. International Journal of Computer Applications (0975-8887). 2014;**105**(15):1-6

[37] Ji X, Chun SA, Wei Z, Geller J. Twitter sentiment classification for measuring public health concerns. Social Network Analysis and Mining. 2015;**5**:13. DOI: 10.1007/s13278-015-0253-5

[38] Omar N, Albared M, Al-Shabi AQ, Al-Moslmi T. Ensemble of classification algorithms for subjectivity and sentiment analysis of Arabic customers' reviews. International Journal of Advancements in Computing Technology (IJACT). 2013;**5**

[39] Mao H, Gao P, Wang Y, Bollen J. Automatic construction of financial semantic orientation lexicon from large-scale Chinese news corpus. 7th Financial Risks International Forum, Institut Louis Bachelier; 2014

[40] Ren Y, Kaji N, Yoshinaga N, Kitsuregawa M. Sentiment classification in under-resourced languages using graph-based semi-supervised learning methods. IEICE Transactions on Information and Systems. 2014;**E97-D**(4):790-797. DOI: 10.1587/Transinf.E97.D.1

[41] Netzer O, Feldman R, Goldenberg J, Fresko M. Mine your own business: Market-structure surveillance through text mining. Marketing Science. 2012;**31**(3):521-543

[42] Ren Y, Kaji N, Yoshinaga N, Toyoda M, Kitsuregawa M. Sentiment classification in resource-scarce languages by using label propagation. Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, Institute of Digital Enhancement of Cognitive Processing, Waseda University; 2011. pp. 420-429

[43] Alfredo Hernández-Ugalde J, Mora-Urpí J, Rocha OJ. Genetic relationships among wild and cultivated populations of peach palm (Bactris gasipaes Kunth, Palmae): Evidence for multiple independent domestication events. Genetic Resources and Crop Evolution. 2011;**58**(4):571-583

[44] Ponomarenko JV, Bourne PE, Shindyalov IN. Building an automated classification of DNA-binding protein domains. Bioinformatics. 2002;**18**:S192-S201

[45] da Silva Meyer A, Garcia AAF, de Souza AP, de Souza CL Jr. Comparison of similarity coefficients used for cluster analysis with dominant markers in maize (*Zea mays* L). Genetics and Molecular Biology. 2004;**27**(1):83-91

[46] Drinić SM, Nikolić A, Perić V. Cluster analysis of soybean genotypes based on RAPD markers. Proceedings. 43rd Croatian and 3rd International Symposium on Agriculture, Opatija, Croatia; 2008. pp. 367-370

[47] Tamás J, Podani J, Csontos P. An extension of presence/absence coefficients to abundance data: A new look at absence. Journal of Vegetation Science. 2001;**12**:401-410

[48] Phu VN, Chau VTN, Tran VTN, Dat ND. A Vietnamese adjective emotion dictionary based on exploitation of Vietnamese language characteristics. International Journal of Artificial Intelligence Review (AIR). 2017;**47**:67. DOI: 10.1007/s10462-017-9538-6

[49] Phu VN, Chau VTN, Dat ND, Tran VTN, Nguyen TA. A valences-totaling model for English sentiment classification. International Journal of Knowledge and Information Systems. 2017;**53**(3):579-636. DOI: 10.1007/s10115-017-1054-0

[50] Phu VN, Chau VTN, Tran VTN. Shifting semantic values of English phrases for classification. International Journal of Speech Technology (IJST). 2017;**20**(3):509-533. DOI: 10.1007/s10772-017-9420-6

[51] Phu VN, Chau VTN, Tran VTN, Dat ND, Duy KLD. A valence-totaling model for Vietnamese sentiment classification. International Journal of Evolving Systems (EVOS). 2017;**8**:47. DOI: 10.1007/s12530-017-9187-7

[52] Vo NP, Vo TNC, Tran VTN, Dat ND, Duy KLD. Semantic lexicons of English nouns for classification. International Journal of Evolving Systems. 2017;**8**:69. DOI: 10.1007/s12530-017-9188-6

[53] Shirani-Mehr H. Applications of deep learning to sentiment analysis of movie reviews. Technical Report. Stanford University; 2014

[54] Vateekul P, Koomsubha T. A study of sentiment analysis using deep learning techniques on Thai Twitter data. 13th International Joint Conference on Computer Science and Software Engineering (JCSSE), Khon Kaen, Thailand; 2016. DOI: 10.1109/JCSSE.2016.7748849

[55] Severyn A, Moschitti A. Twitter sentiment analysis with deep convolutional neural networks. SIGIR '15 Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile; 2015. pp. 959-962

[56] Yanmei L, Yuda C. Research on Chinese micro-blog sentiment analysis based on deep learning. 8th Int. Symp. Comput. Intell. Des., Hangzhou, China; 2015. pp. 358-361

[57] You Q, Luo J, Jin H, Yang J. Joint visual-textual sentiment analysis with deep neural networks. MM '15 Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia; 2015. pp. 1071-1074

[58] Xia R, Zong C, Li S. Ensemble of feature sets and classification algorithms for sentiment classification. Information Sciences. 2015;**181**(6):1138-1152. DOI: 10.1016/j.ins.2010.11.023

[59] Su Y, Zhang Y, Ji D, Wang Y, Wu H. Ensemble learning for sentiment classification. In: Ji D, Xiao G, editors. Chinese Lexical Semantics. CLSW 2012. Lecture Notes in Computer Science. Vol. 7717. Berlin, Heidelberg: Springer; 2013


[60] Wan Y, Gao Q. An ensemble sentiment classification system of Twitter data for airline services analysis. IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA; 2015. DOI: 10.1109/ICDMW.2015.7

[61] Chen L-S, Liu C-H, Chiu H-J. A neural network based approach for sentiment classification in the blogosphere. Journal of Informetrics. 2011;**5**(2):313-322. DOI: 10.1016/j.joi.2011.01.003

[62] Bauer F, Pereverzev S, Rosasco L. On regularization algorithms in learning theory. Journal of Complexity. 2007;**23**(1):52-57. DOI: 10.1016/j.jco.2006.07.001

[63] Onal I, Ertugrul AM. Effect of using regression in sentiment analysis. Signal Processing and Communications Applications Conference (SIU), 2014 22nd, Trabzon, Turkey; 2014. DOI: 10.1109/SIU.2014.6830606

[64] Cakra YE, Trisedya BD. Stock price prediction using linear regression based on sentiment analysis. International Conference on Advanced Computer Science and Information Systems (ICACSIS), Depok, Indonesia; 2015. DOI: 10.1109/ICACSIS.2015.7415179

[65] Narayanan V, Arora I, Bhatia A. Fast and accurate sentiment classification using an enhanced naive Bayes model. In: Yin H et al., editors. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science. Vol. 8206. Berlin, Heidelberg: Springer

[66] Kang H, Yoo SJ, Han D. Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews. Expert Systems with Applications. 2012;**39**(5):6000-6010. DOI: 10.1016/j.eswa.2011.11.107

[67] Phu VN, Ngoc CVT, Ngoc TVT, Duy DN. A C4.5 algorithm for English emotional classification. International Journal of Evolving Systems. 2017;**8**:1-27. DOI: 10.1007/s12530-017-9180-1

[68] Vo NP, Vo TNT, Vo TNC, Dat ND, Duy KLD. A decision tree using ID3 algorithm for English semantic analysis. International Journal of Speech Technology (IJST). 2017;**20**(3):593-613. DOI: 10.1007/s10772-017-9429-x

[69] Shyamasundar LB, Jhansi Rani P. Twitter sentiment analysis with different feature extractors and dimensionality reduction using supervised learning algorithms. India Conference (INDICON), 2016 IEEE Annual, Bangalore, India; 2016. DOI: 10.1109/INDICON.2016.7839075

[70] Kim K, Lee J. Sentiment visualization and classification via semi-supervised nonlinear dimensionality reduction. Pattern Recognition. 2014;**47**(2):758-768. DOI: 10.1016/j.patcog.2013.07.022

[71] Oswin Rahadiyan H, Virginia G, Antonius Rachmat C. Sentiment classification of film reviews using IB1. 7th International Conference on Intelligent Systems, Modeling and Simulation (ISMS), Bangkok, Thailand; 2016. DOI: 10.1109/ISMS.2016.38

[72] Phu VN, Dat ND, Tran VTN, Chau VTN, Nguyen TA. Fuzzy C-means for English sentiment classification in a distributed system. International Journal of Applied Intelligence (APIN). 2017;**46**(3):717-738. DOI: 10.1007/s10489-016-0858-z

[73] Dat ND, Phu VN, Chau VTN, Tran VTN, Nguyen TA. STING algorithm used English sentiment classification in a parallel environment. International Journal of Pattern Recognition and Artificial Intelligence. 2017;**31**(7):30. DOI: 10.1142/S0218001417500215

[74] Kolekar NV, Rao G, Dey S, Mane M, Jadhav V, Patil S. Sentiment analysis and classification using lexicon-based approach and addressing polarity shift problem. Journal of Theoretical and Applied Information Technology. 2016;**90**(1):1-8

[75] Blinov PD, Klekovkina MV, Kotelnikov EV, Pestov OA. Research of lexical approach and machine learning methods for sentiment analysis. Proceedings of Dialog. 2013;**2**:51-61

[76] Vo NP, Vo TNC, Vo TNT. SVM for English semantic classification in parallel environment. International Journal of Speech Technology (IJST). 2017;**20**(3):487-508. DOI: 10.1007/s10772-017-9421-5

[77] Vo NP, Phan TT. Sentiment classification using enhanced contextual valence shifters. International Conference on Asian Language Processing (IALP), Kuching, Malaysia; 2014. DOI: 10.1109/IALP.2014.6973485

[78] Kennedy A, Inkpen D. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence. 2006;**22**(2):110-125. DOI: 10.1111/J.1467-8640.2006.00277.X

[79] Wang L, Wan Y. Sentiment classification of documents based on latent semantic analysis. In: Lin S, Huang X, editors. Advanced Research on Computer Education, Simulation and Modeling. Communications in Computer and Information Science. Vol. 176. Berlin, Heidelberg: Springer; 2011

[80] Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. Proceedings of EMNLP. 2002:79-86

[81] Catal C, Nangir M. A sentiment classification model based on multiple classifiers. Applied Soft Computing. 2017;**50**:135-141. DOI: 10.1016/j.asoc.2016.11.022

[82] Vo NP, Vo TNT. A STING algorithm and multi-dimensional vectors used for English sentiment classification in a distributed system. American Journal of Engineering and Applied Sciences. 2017;**12**:1-28. DOI: 10.3844/ajeassp.201

[83] Vo NP, Vo TNT. English sentiment classification using only the sentiment lexicons with a JOHNSON coefficient in a parallel network environment. American Journal of Engineering and Applied Sciences. 2017;**12**:1-19. DOI: 10.3844/ajeassp.2017


**Chapter 13**

#### **A Multilevel Genetic Algorithm for the Maximum Satisfaction Problem**

DOI: 10.5772/intechopen.78299

Noureddine Bouhmala

Additional information is available at the end of the chapter

#### Abstract

Genetic algorithms (GAs), which belong to the class of evolutionary algorithms, are regarded as highly successful when applied to a broad range of discrete as well as continuous optimization problems. This chapter introduces a hybrid approach combining a genetic algorithm with the multilevel paradigm for solving the maximum constraint satisfaction problem (Max-CSP). The multilevel paradigm refers to the process of dividing a large and complex problem into smaller ones, which are hopefully much easier to solve, and then working backward toward the solution of the original problem, using the solution reached at a child level as the starting solution for its parent level. The promising performance achieved by the proposed approach is demonstrated by comparisons made on conventional random benchmark problems.

Keywords: maximum constraint satisfaction problem, genetic algorithms, multilevel paradigm

## 1. Introduction

Many problems in the field of artificial intelligence can be modeled as constraint satisfaction problems (CSPs). A CSP is a tuple $\langle X, D, C \rangle$ where $X = \{x_1, x_2, \dots, x_n\}$ is a finite set of variables and $D = \{D_{x_1}, D_{x_2}, \dots, D_{x_n}\}$ is a finite set of domains; thus each variable $x \in X$ has a corresponding discrete domain $D_x$ from which it can be instantiated. $C = \{C_1, C_2, \dots, C_k\}$ is a finite set of constraints. Each $k$-ary constraint restricts a $k$-tuple of variables $(x_1, x_2, \dots, x_k)$ and specifies a subset of $D_1 \times \dots \times D_k$, each element of which is a combination of values that the variables cannot take simultaneously. A solution to a CSP assigns to each variable a value from its domain such that all the constraints are satisfied. The maximum constraint satisfaction problem (Max-CSP) aims at finding an assignment that maximizes the number of satisfied constraints.

> © 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Max-CSP can be regarded as a generalization of CSP in which the solution maximizes the number of satisfied constraints. In this chapter, attention is focused on binary CSPs, where all constraints are binary, that is, they are based on the Cartesian product of the domains of two variables. However, any non-binary CSP can theoretically be converted to a binary CSP [1]. Algorithms for solving CSPs often apply the so-called 1-exchange neighborhood, under which two solutions are direct neighbors if, and only if, they differ in the value assigned to at most one variable. Examples include the minimum conflict heuristic (MCH) [2], the break method for escaping from local minima [3], and various enhancements of MCH (e.g., a randomized iterative improvement of MCH called WMCH [4], MCH with tabu search [5], and evolutionary algorithms [6]). Algorithms based on assigning weights to constraints work by introducing weights on variables or constraints in order to avoid local minima. Methods belonging to this category include GENET [7], guided local search [8], the exponentiated subgradient [9], discrete Lagrangian search [10], scaling and probabilistic smoothing [11], evolutionary algorithms combined with stepwise adaptation of weights [12], and methods based on dynamically adapting weights on variables [13], or on both variables and constraints [14]. Methods based on large neighborhood search have recently attracted several researchers for solving the CSP [15]. The central idea is to reduce the size of the local search space by relying on continual relaxation (removing elements from the solution) and re-optimization (re-inserting the removed elements). Finally, the work in [16] introduces a variable-depth metaheuristic combining a greedy local search with a self-adaptive weighting strategy on the constraint weights.
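As a concrete illustration of these definitions, a tiny binary Max-CSP and a brute-force search for its best assignment can be sketched as follows (a minimal Python sketch with made-up variables and no-goods, not taken from the chapter):

```python
import itertools

# Illustrative binary CSP: variables with discrete domains and, for each
# constrained pair, the set of value pairs the two variables may NOT take
# simultaneously (the "no-goods").
domains = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1]}
nogoods = {
    ("x1", "x2"): {(0, 0), (1, 1)},  # x1 != x2
    ("x2", "x3"): {(0, 1)},          # forbid x2 = 0, x3 = 1
    ("x1", "x3"): {(1, 0)},          # forbid x1 = 1, x3 = 0
}

def satisfied(assignment):
    """Number of constraints satisfied by a full assignment."""
    return sum(
        (assignment[a], assignment[b]) not in bad
        for (a, b), bad in nogoods.items()
    )

# Max-CSP by brute force: pick the assignment maximizing satisfied constraints.
best = max(
    (dict(zip(domains, values))
     for values in itertools.product(*domains.values())),
    key=satisfied,
)
print(satisfied(best))  # prints 3: all three constraints are satisfiable here
```

Brute force is only viable for toy instances; the chapter's whole point is that realistic instances need heuristic search such as the multilevel GA.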


## 2. Algorithm

#### 2.1. Multilevel context

The multilevel paradigm is a simple technique that, at its core, involves recursive coarsening to produce smaller and smaller problems, each easier to solve than the original one. Multilevel techniques were first developed in the 1960s and are among the most efficient techniques for solving large algebraic systems arising from the discretization of partial differential equations. In recent years, it has been recognized that an effective way of enhancing metaheuristics is to use them in the multilevel context. The pseudo-code of the multilevel genetic algorithm is shown in Algorithm 1, and Figure 1 illustrates the multilevel paradigm for six variables and two coarsening levels. The paradigm consists of four phases: coarsening, initial solution, uncoarsening, and refinement. The coarsening phase merges the variables of the problem into clusters. The clusters are used recursively to construct a hierarchy of problems, each representing the original problem but with fewer degrees of freedom. The coarsest level is then used to compute an initial solution. The solution found at the coarsest level is uncoarsened (extended to give an initial solution for the parent level) and then improved using a chosen optimization algorithm. A common feature of multilevel algorithms is that any solution to any of the coarsened problems is a legitimate solution to the original one. Optimization algorithms using the multilevel paradigm draw their strength from coupling the refinement process across different levels.
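The coarsening phase described above can be sketched as follows (a minimal Python sketch under assumed data structures; the chapter's own implementation is in C and is not shown):

```python
import random

def neighbors(v, edges):
    """Variables sharing a constraint with v."""
    return [b if a == v else a for a, b in edges if v in (a, b)]

def coarsen(variables, edges, rng):
    """One coarsening step: visit variables in random order and merge each
    unmatched variable with a randomly chosen unmatched neighbor; variables
    with no unmatched neighbor are copied to the next level as singletons."""
    cluster_of, clusters = {}, []
    order = list(variables)
    rng.shuffle(order)
    for v in order:
        if v in cluster_of:
            continue
        free = [u for u in neighbors(v, edges) if u not in cluster_of]
        members = (v,) if not free else (v, rng.choice(free))
        for m in members:
            cluster_of[m] = len(clusters)
        clusters.append(members)
    # A coarse edge joins two clusters whenever any of their members were joined.
    coarse_edges = {
        tuple(sorted((cluster_of[a], cluster_of[b])))
        for a, b in edges if cluster_of[a] != cluster_of[b]
    }
    return clusters, coarse_edges

# Build the hierarchy G0, G1, ... until the coarsest graph is small enough.
variables, edges = list(range(6)), {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}
rng = random.Random(0)
levels = [(variables, edges)]
while len(levels[-1][0]) > 2:
    clusters, coarse_edges = coarsen(*levels[-1], rng)
    levels.append((list(range(len(clusters))), coarse_edges))
```

Each pass merges matched neighbors, so the hierarchy shrinks until the coarsest graph reaches the desired threshold (here, two clusters).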

Algorithm 1. The multilevel genetic algorithm.


Figure 1. The different steps of the multilevel paradigm.

#### 2.2. Multilevel genetic algorithm (GA)

GAs [17] are stochastic methods for global search and optimization, belonging to the group of nature-inspired metaheuristics that forms the basis of so-called natural computing, a fast-growing interdisciplinary field in which techniques are studied for dealing with large, complex, and dynamic problems under various sources of uncertainty. GAs simultaneously examine and manipulate a set of possible solutions. A gene is a part of a chromosome (solution) and is the smallest unit of genetic information; every gene can assume different values, called alleles. All genes of an organism form a genome, which determines the appearance of the organism, called its phenotype. The chromosomes are encoded using a chosen representation, and each can be thought of as a point in the search space of candidate solutions. Each individual is assigned a score (fitness) value that assesses its quality. The members of the initial population may be generated randomly or by sophisticated mechanisms that produce an initial population of high-quality chromosomes. The reproduction operator selects chromosomes from the population (randomly or based on fitness) to be parents and enters them in a mating pool. Parent individuals are drawn from the mating pool and combined so that information is exchanged and passed to offspring, depending on the probability of the crossover operator. The new population is then subjected to mutation and enters an intermediate population. The mutation operator introduces diversity into the population and is generally applied with a low probability to avoid disrupting crossover results. Finally, a selection scheme is used to update the population, giving rise to a new generation. The set of solutions, called the population, evolves from generation to generation through repeated application of an evaluation procedure based on these genetic operators. Over many generations, the population becomes increasingly uniform until it converges to optimal or near-optimal solutions. The different steps of the multilevel weighted genetic algorithm are described as follows:

• construction of levels: let $G_0 = (V_0, E_0)$ be an undirected graph of vertices $V$ and edges $E$. The set $V$ denotes the variables, and each edge $(x_i, x_j) \in E$ implies a constraint joining the variables $x_i$ and $x_j$. Given the initial graph $G_0$, the graph is repeatedly transformed into smaller and smaller graphs $G_1, G_2, \dots, G_m$ such that $|V_0| > |V_1| > \dots > |V_m|$. To coarsen a graph from $G_j$ to $G_{j+1}$, a number of different techniques may be used. In this chapter, when combining a set of variables into clusters, the variables are visited in a random order. If a variable $x_i$ has not been matched yet, the algorithm randomly selects one of its unmatched neighboring variables $x_j$, and a new cluster consisting of these two variables is created. Its neighbors are the combined neighbors of the merged variables $x_i$ and $x_j$. Unmatched variables are simply left unmatched and copied to the next level.

• initial assignment: the process of constructing a hierarchy of graphs ceases as soon as the size of the coarsest graph reaches some desired threshold. A random initial population is generated at the lowest level $G_k = (V_k, E_k)$. The chromosomes, which are assignments of values to the variables, are encoded as strings whose length is the number of variables; at the lowest level, the length of the chromosome is equal to the number of clusters. The initial solution is simply constructed by assigning to all variables in a cluster a random value $v_i$. In this work, it is assumed that all variables have the same domain (i.e., the same set of values); otherwise, different random values should be assigned to each variable in the cluster. All the individuals of the initial population are evaluated and assigned a fitness expressed in Eq. (1), which counts the number of constraint violations:

$$Fitness = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} Violation\{W_{i,j} <(x_i, s_i), (x_j, s_j)>\} \tag{1}$$

where $<(x_i, s_i), (x_j, s_j)>$ denotes the constraint between the variables $x_i$ and $x_j$, $x_i$ being assigned the value $s_i$ from $D_{x_i}$ and $x_j$ the value $s_j$ from $D_{x_j}$.

• initial weights: the next step of the algorithm assigns a fixed weight equal to 1 to each constraint. The distribution of weights over the constraints aims at forcing hard constraints with large weights to be satisfied, thereby preventing the algorithm at a later stage from getting stuck at a local optimum.

• optimization: having computed an initial solution at the coarsest graph, GA starts the search process from the coarsest level $G_k = (V_k, E_k)$ and continues to move toward smaller levels. The motivation behind this strategy is that the order in which the levels are traversed offers a better mechanism for performing diversification and intensification. The coarsest level allows GA to view any cluster of variables as a single entity, leading the search to be guided into faraway regions of the solution space while restricted to only those configurations in which the variables grouped within a cluster are assigned the same value. As the switch from one level to another implies a decrease in the size of the neighborhood, the search is intensified around solutions from previous levels in order to reach better ones.

• parent selection: during the optimization, new solutions are created by combining pairs of individuals in the population and then applying a crossover operator to each chosen pair. Combining pairs of individuals can be viewed as a matching process. In the version of GA used in this work, the individuals are visited in random order, and an unmatched individual $i_k$ is matched randomly with an unmatched individual $i_l$.

• genetic operators: the task of the crossover operator is to reach regions of the search space with higher average quality. The two-point crossover operator is applied to each matched pair of individuals: it selects two random points within a chromosome and then interchanges the two parent chromosomes between these points to generate two new offspring.

• survivor selection: the selection acts on the individuals in the current population and, based on each individual's quality (fitness), determines the next population. In the roulette method, the selection is stochastic and biased toward the best individuals. The first step is to calculate the cumulative fitness of the whole population as the sum of the fitness of all individuals. After that, the probability of selection is calculated for each individual as $P_{Selection_i} = f_i / \sum_{j=1}^{N} f_j$, where $f_i$ is the fitness of individual $i$.

• updating weights: the weight of each currently violated constraint is increased by one, whereas newly satisfied constraints have their weights decreased by one, before the start of a new generation.


• termination condition: the convergence of GA is assumed to be reached if the best individual remains unchanged during five consecutive generations.

• projection: once GA has reached the convergence criterion with respect to a child-level graph $G_k = (V_k, E_k)$, the assignment reached on that level must be projected onto its parent graph $G_{k-1} = (V_{k-1}, E_{k-1})$. The projection algorithm is simple: if a cluster belonging to $G_k$ is assigned the value $v_{li}$, the merged pair of clusters that it represents in $G_{k-1}$ is also assigned the value $v_{li}$.
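Three of the steps above — the two-point crossover, the roulette-wheel survivor selection, and the projection of a coarse-level assignment onto its parent level — can be sketched as follows (illustrative Python; the function names and encodings are assumptions, not the chapter's C code):

```python
import random

def two_point_crossover(p1, p2, rng):
    """Pick two cut points and swap the segment between them, yielding
    two new offspring."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def roulette_select(population, fitness, rng):
    """Roulette-wheel selection: individual i is chosen with probability
    f_i / sum_j f_j, biasing the choice toward fitter individuals."""
    r = rng.uniform(0, sum(fitness))
    acc = 0.0
    for individual, f in zip(population, fitness):
        acc += f
        if r <= acc:
            return individual
    return population[-1]

def project(coarse_solution, members_of):
    """Projection: every variable merged into a cluster at the child level
    inherits the value assigned to that cluster."""
    return {var: value
            for cluster, value in coarse_solution.items()
            for var in members_of[cluster]}

rng = random.Random(1)
child1, child2 = two_point_crossover([0] * 8, [1] * 8, rng)
fine = project({"c1": 3, "c2": 7}, {"c1": ("x1", "x2"), "c2": ("x3",)})
```

Note that because the projected assignment gives every member of a cluster the same value, it is always a legitimate solution at the parent level, exactly as the multilevel paradigm requires.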

## 3. Experimental results

#### 3.1. Experimental setup

The benchmark instances were generated using model A [18] as follows: each instance is defined by the 4-tuple $(n, m, p_d, p_t)$, where $n$ is the number of variables; $m$ is the size of each variable's domain; $p_d$, the constraint density, is the proportion of pairs of variables that have a constraint between them; and $p_t$, the constraint tightness, is the probability that a pair of values is inconsistent. From the $n(n-1)/2$ possible constraints, each one is independently chosen to be added to the constraint graph with probability $p_d$. Given a constraint, each value pair becomes a no-good with probability $p_t$. Model A will on average have $p_d \cdot n(n-1)/2$ constraints, each of which has on average $p_t \cdot m^2$ inconsistent pairs of values. For each density-tightness pair, we generate one soluble instance (i.e., at least one solution exists). Because of the stochastic nature of GA, we let each algorithm do 100 independent runs, each run with a different random seed. Many NP-complete or NP-hard problems show a phase transition point that marks the spot where we go from problems that are under-constrained, and so relatively easy to solve, to problems that are over-constrained, and so relatively easy to prove insoluble. Problems that are on average hardest to solve occur between these two types of relatively easy problem. The values of $p_d$ and $p_t$ are chosen in such a way that the instances generated fall within the phase transition. In order to predict the phase transition region, a formula for the constrainedness [19] of binary CSPs was defined by:

$$\kappa = \frac{n-1}{2} \, p_d \log_m \left( \frac{1}{1 - p_t} \right). \tag{2}$$
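A model A generator and the constrainedness of Eq. (2) can be sketched as follows (Python sketch; reading the cd026/ct063 suffixes of the benchmark names as $p_d = 0.26$ and $p_t = 0.63$ is my assumption):

```python
import math
import random

def model_a(n, m, p_d, p_t, rng):
    """Model A: each of the n(n-1)/2 variable pairs gets a constraint with
    probability p_d; inside a constraint, each of the m*m value pairs is
    made a no-good with probability p_t."""
    constraints = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_d:
                constraints[(i, j)] = {
                    (a, b) for a in range(m) for b in range(m)
                    if rng.random() < p_t
                }
    return constraints

def kappa(n, m, p_d, p_t):
    """Constrainedness of a random binary CSP, Eq. (2); instances near
    kappa = 1 lie in the hard phase-transition region."""
    return (n - 1) / 2 * p_d * math.log(1.0 / (1.0 - p_t), m)

inst = model_a(30, 40, 0.26, 0.63, random.Random(42))
print(round(kappa(30, 40, 0.26, 0.63), 3))  # prints 1.016, near the transition
```

Under this reading, the N30-DS40 instances with cd026-ct063 indeed sit close to $\kappa = 1$, consistent with the claim that the benchmarks are drawn from the phase transition.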

The tests were carried out on a DELL machine with 800 MHz CPU and 2 GB of memory. The code was written in C and compiled with the GNU C compiler version 4.6. The following parameters have been fixed experimentally and are listed below:


#### 3.2. Results

• termination condition: the convergence of GA is supposed to be reached if the best individ-

• projection: once GA has reached the convergence criterion with respect to a child level graph Gk ¼ Vk ð Þ ; Ek , the assignment reached on that level must be projected on its parent graph Gk�<sup>1</sup> ¼ ð Þ Vk�<sup>1</sup>; Ek�<sup>1</sup> . The projection algorithm is simple; if a cluster belongs to Gk ¼ Vk ð Þ ; Ek is assigned the value vli, the merged pair of clusters that it represents

The benchmark instances were generated using model A [18] as follows: each instance is

variable's domain; pd, the constraint density, is the proportion of pairs of variables, which have

values is inconsistent. From the ð Þ n � ð Þ n � 1 =2 possible constraints, each one is independently

pd � ð Þ <sup>n</sup> � <sup>1</sup> <sup>=</sup>2 constraints, each of which has on average pt � <sup>m</sup><sup>2</sup> inconsistent pairs of values. For each pair of density tightness, we generate one soluble instance (i.e., at least one solution exists). Because of the stochastic nature of GA, we let each algorithm do 100 independent runs, each run with a different random seed. Many NP-complete or NP-hard problems show a phase transition point that marks the spot where we go from problems that are under-constrained and so relatively easy to solve, to problems that are over-constrained and so relatively easy to prove insoluble. Problems that are on average harder to solve occur between these two types of relatively easy problem. The values of pd and pt are chosen in such a way that the instances generated are within the phase transition. In order to predict the phase transition region, a

, where n is the number of variables; m is the size of each

, the constraint tightness, is the probability that a pair of

, which value pairs become no-goods. The model A will on average have

1 1 � pt  . Given a constraint, we select

: (2)

ual remains unchanged during five consecutive generations.

belonging to Gk�<sup>1</sup> ¼ ð Þ Vk�<sup>1</sup>; Ek�<sup>1</sup> are also assigned the value vli,

chosen to be added in the constraint graph with the probability pd

formula for the constrainedness [19] of binary CSPs was defined by:

parameters have been fixed experimentally and are listed below:

<sup>κ</sup> <sup>¼</sup> <sup>n</sup> � <sup>1</sup>

assumed to have reached convergence and moves to a higher level.

<sup>2</sup> pd log <sup>m</sup>

The tests were carried out on a DELL machine with 800 MHz CPU and 2 GB of memory. The code was written in C and compiled with the GNU C compiler version 4.6. The following

• Stopping criteria for the coarsening phase: the reduction process stops as soon as the number of levels reaches 3. At this level, MLV-WGA generates an initial population. • Convergence during the optimization phase: if there is no observable improvement of the fitness function of the best individual during five consecutive generations, MLV-WGA is

3. Experimental results

268 Artificial Intelligence - Emerging Trends and Applications

defined by the 4-tuple n, m, pd, pt

a constraint between them; and pt

3.1. Experimental setup

with the probability pt

• Population size = 50

The plots in Figures 2 and 3 compare the WGA with its multilevel variant MLV-WGA. The improvement in quality imparted by the multilevel context is immediately clear. Both WGA and MLV-WGA exhibit what is called a plateau region. A plateau region spans a region in the search space where crossover and mutation operators leave the best solution or the mean solution unchanged. However, the length of this region is shorter with MLV-WGA compared to that of WGA. The multilevel context uses the projected solution obtained at Gmþ<sup>1</sup>ð Þ Vmþ<sup>1</sup>; Emþ<sup>1</sup> as the initial solution for Gmð Þ Vm; Em for further refinement. Even though the solution at Gmþ<sup>1</sup>ð Þ Vmþ<sup>1</sup>; Emþ<sup>1</sup> is at a local minimum, the projected solution may not be at a local optimum with respect to Gmð Þ Vm; Em . The projected assignment is already a good solution leading WGA to converge quicker within few generations to a better solution. Tables 1–3 show a comparison of

Figure 2. MLV-GA vs. GA: evolution of the mean unsatisfied constraints as a function of time. Csp-N30-DS40-C125-cd026ct063.

Figure 3. MLV-GA vs. GA: evolution of the mean unsatisfied constraints as a function of time. Csp-N35-DS20-C562-cd094-ct017.
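The plateau region plotted in Figures 2 and 3 can be measured directly on a trace of best-cost values per generation; this helper is illustrative and not part of the authors' code:

```python
def plateau_lengths(best_costs):
    """Lengths of plateau regions in a best-cost trace: maximal runs of
    consecutive generations over which the best cost stays unchanged."""
    runs, run = [], 1
    for prev, cur in zip(best_costs, best_costs[1:]):
        if cur == prev:
            run += 1
        else:
            runs.append(run)
            run = 1
    runs.append(run)
    return [r for r in runs if r > 1]   # a run of length 1 is not a plateau
```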


A Multilevel Genetic Algorithm for the Maximum Satisfaction Problem

http://dx.doi.org/10.5772/intechopen.78299

271

| Instance | MLV-WGA Min | Max | Mean | REav | WGA Min | Max | Mean | REav |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N25-DS20-C36-cd014-ct083 | 3 | 7 | 4.58 | 0.128 | 3 | 8 | 5.41 | 0.151 |
| N25-DS20-C44-cd012-ct087 | 6 | 10 | 8.04 | 0.183 | 8 | 14 | 9.91 | 0.226 |
| N25-DS20-C54-cd018-ct075 | 3 | 7 | 5.37 | 0.100 | 4 | 9 | 6.91 | 0.128 |
| N25-DS20-C78-cd026-ct061 | 2 | 8 | 4.33 | 0.056 | 2 | 10 | 5.79 | 0.073 |
| N25-DS20-C225-cd078-ct027 | 3 | 8 | 4.16 | 0.019 | 3 | 9 | 5.66 | 0.026 |
| N25-DS20-C229-cd072-ct029 | 4 | 9 | 6.04 | 0.014 | 4 | 11 | 8.16 | 0.036 |
| N25-DS20-C242-cd086-ct025 | 1 | 6 | 3.5 | 0.015 | 3 | 10 | 5.70 | 0.024 |
| N25-DS20-C269-cd086-ct025 | 4 | 10 | 5.66 | 0.022 | 4 | 10 | 7.54 | 0.029 |
| N25-DS20-C279-cd094-ct023 | 2 | 7 | 4.75 | 0.018 | 4 | 9 | 6.75 | 0.025 |
| N25-DS40-C53-cd016-ct085 | 6 | 11 | 8.91 | 0.169 | 8 | 13 | 10.70 | 0.202 |
| N25-DS40-C70-cd026-ct069 | 2 | 6 | 4.25 | 0.061 | 3 | 8 | 5.75 | 0.083 |
| N25-DS40-C72-cd022-ct075 | 6 | 12 | 9 | 0.125 | 6 | 15 | 10.45 | 0.146 |
| N25-DS40-C102-cd032-ct061 | 5 | 12 | 8.12 | 0.080 | 7 | 14 | 10.33 | 0.102 |
| N25-DS40-C103-cd034-ct059 | 5 | 9 | 6.83 | 0.067 | 4 | 12 | 8.79 | 0.086 |
| N25-DS40-C237-cd082-ct031 | 3 | 8 | 5.66 | 0.024 | 5 | 10 | 7.87 | 0.034 |
| N25-DS40-C253-cd088-ct029 | 3 | 7 | 4.95 | 0.020 | 5 | 12 | 8.04 | 0.032 |
| N25-DS40-C264-cd088-ct029 | 5 | 10 | 6.91 | 0.027 | 6 | 16 | 8.91 | 0.034 |
| N25-DS40-C281-cd096-ct027 | 3 | 9 | 5.62 | 0.020 | 4 | 12 | 8.54 | 0.031 |
| N25-DS40-C290-cd096-ct027 | 4 | 10 | 7.08 | 0.025 | 6 | 14 | 9 | 0.032 |

Table 1. MLV-WGA vs. WGA: number of variables: 25. REav denotes the relative error in percent.

| Instance | MLV-WGA Min | Max | Mean | REav | WGA Min | Max | Mean | REav |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N30-DS20-C41-cd012-ct083 | 2 | 6 | 3.70 | 0.026 | 3 | 7 | 5.08 | 0.124 |
| N30-DS20-C71-cd018-ct069 | 1 | 7 | 3.37 | 0.048 | 3 | 10 | 5.66 | 0.080 |
| N30-DS20-C85-cd020-ct065 | 3 | 9 | 6 | 0.071 | 5 | 12 | 8.37 | 0.099 |
| N30-DS20-C119-cd028-ct053 | 3 | 10 | 5.70 | 0.048 | 6 | 12 | 8.83 | 0.075 |
| N30-DS20-C334-cd074-ct025 | 6 | 13 | 8.16 | 0.025 | 6 | 14 | 9.87 | 0.030 |
| N30-DS20-C387-cd090-ct021 | 3 | 9 | 6.66 | 0.018 | 5 | 13 | 8.70 | 0.033 |
| N30-DS20-C389-cd090-ct021 | 2 | 9 | 6.08 | 0.016 | 4 | 14 | 8.95 | 0.024 |
| N30-DS20-C392-cd090-ct021 | 3 | 10 | 7.08 | 0.019 | 5 | 15 | 9.16 | 0.024 |
| N30-DS20-C399-cd090-ct021 | 5 | 13 | 7.70 | 0.020 | 6 | 14 | 9.79 | 0.025 |
| N30-DS40-C85-cd020-ct073 | 5 | 11 | 7.75 | 0.092 | 7 | 14 | 10.87 | 0.152 |
| N30-DS40-C96-cd020-ct073 | 8 | 12 | 16 | 0.167 | 11 | 19 | 14.58 | 0.015 |
| N30-DS40-C121-cd026-ct063 | 8 | 14 | 10.5 | 0.087 | 9 | 19 | 14.33 | 0.152 |
| N30-DS40-C125-cd026-ct063 | 8 | 18 | 12.20 | 0.098 | 10 | 19 | 15.58 | 0.125 |
| N30-DS40-C173-cd044-ct045 | 4 | 10 | 6.41 | 0.038 | 6 | 14 | 9.20 | 0.054 |
| N30-DS40-C312-cd070-ct031 | 7 | 14 | 10.5 | 0.033 | 7 | 19 | 13.33 | 0.025 |
| N30-DS40-C328-cd076-ct029 | 6 | 13 | 10.37 | 0.032 | 10 | 18 | 13.45 | 0.042 |
| N30-DS40-C333-cd076-ct029 | 7 | 13 | 10.25 | 0.031 | 9 | 18 | 12.62 | 0.038 |
| N30-DS40-C389-cd090-ct025 | 6 | 13 | 9.33 | 0.024 | 9 | 17 | 12.20 | 0.032 |
| N30-DS40-C390-cd090-ct025 | 6 | 14 | 9.29 | 0.024 | 10 | 17 | 13 | 0.031 |

Table 2. MLV-WGA vs. WGA: number of variables: 30. REav denotes the relative error in percent.

| Instance | MLV-WGA Min | Max | Mean | REav | WGA Min | Max | Mean | REav |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N40-DS20-C78-cd010-ct079 | 6 | 12 | 8.91 | 0.115 | 5 | 13 | 9.04 | 0.116 |
| N40-DS20-C80-cd010-ct079 | 7 | 13 | 9.62 | 0.121 | 7 | 13 | 10.04 | 0.153 |
| N40-DS20-C82-cd012-ct073 | 4 | 9 | 6.25 | 0.073 | 4 | 11 | 6.95 | 0.085 |
| N40-DS20-C95-cd014-ct067 | 2 | 8 | 4.45 | 0.047 | 2 | 7 | 4.12 | 0.044 |
| N40-DS20-C653-cd084-ct017 | 2 | 14 | 9.37 | 0.015 | 6 | 16 | 10.62 | 0.018 |
| N40-DS20-C660-cd084-ct017 | 6 | 14 | 9.12 | 0.014 | 7 | 6 | 9.75 | 0.015 |
| N40-DS20-C751-cd096-ct015 | 6 | 13 | 9.91 | 0.014 | 5 | 13 | 9.83 | 0.014 |
| N40-DS20-C752-cd096-ct015 | 5 | 17 | 9.29 | 0.013 | 3 | 13 | 9.20 | 0.013 |
| N40-DS20-C756-cd096-ct015 | 6 | 15 | 9.95 | 0.014 | 5 | 16 | 8.75 | 0.012 |
| N40-DS40-C106-cd014-ct075 | 7 | 14 | 11.08 | 0.105 | 7 | 16 | 11.5 | 0.109 |
| N40-DS40-C115-cd014-ct075 | 12 | 20 | 15.5 | 0.135 | 11 | 20 | 15.5 | 0.135 |
| N40-DS40-C181-cd024-ct055 | 6 | 17 | 12.04 | 0.067 | 7 | 17 | 11.75 | 0.065 |
| N40-DS40-C196-cd024-ct055 | 11 | 12 | 16.58 | 0.085 | 12 | 20 | 15.54 | 0.080 |
| N40-DS40-C226-cd030-ct047 | 7 | 14 | 10.91 | 0.051 | 7 | 16 | 11.16 | 0.050 |
| N40-DS40-C647-cd082-ct021 | 11 | 23 | 15.66 | 0.025 | 11 | 20 | 15.20 | 0.024 |
| N40-DS40-C658-cd082-ct021 | 11 | 22 | 16.33 | 0.025 | 13 | 21 | 16.70 | 0.026 |
| N40-DS40-C703-cd092-ct019 | 9 | 21 | 13.41 | 0.020 | 8 | 20 | 13.58 | 0.020 |
| N40-DS40-C711-cd092-ct019 | 12 | 23 | 15.75 | 0.023 | 8 | 20 | 14.87 | 0.021 |
| N40-DS40-C719-cd092-ct019 | 8 | 21 | 16.54 | 0.024 | 10 | 20 | 15.16 | 0.022 |

Table 3. MLV-WGA vs. WGA: number of variables: 40. REav denotes the relative error in percent.

For each algorithm, the best (Min) and the worst (Max) results are given, while Mean represents the average solution. MLV-WGA outperforms WGA in 53 cases out of 96, gives similar results in 20 cases, and was beaten in 23 cases. The performance of the two algorithms differs significantly: the difference in total performance is between 25 and 70% in favor of MLV-WGA. Comparing the worst performances of both algorithms, MLV-WGA gave worse results in 15 cases, both algorithms gave similar results in 8 cases, and MLV-WGA was able to perform better than WGA in 73 cases. Looking at the average results, MLV-WGA does between 16 and 41% better than WGA in 84 cases, while the differences are very marginal in the remaining cases where WGA beats MLV-WGA.
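The REav column is consistent with the mean number of unsatisfied constraints divided by the instance's constraint count (e.g. 12.20 / 125 ≈ 0.098 for N30-DS40-C125). Under that reading, the four reported statistics can be computed as follows; the function name and the synthetic run counts are illustrative:

```python
def summarize(unsat_counts, num_constraints):
    """Min, Max, Mean and REav over repeated runs, where each entry of
    unsat_counts is the number of unsatisfied constraints left by one run."""
    mean = sum(unsat_counts) / len(unsat_counts)
    return {"Min": min(unsat_counts),
            "Max": max(unsat_counts),
            "Mean": round(mean, 2),
            "REav": round(mean / num_constraints, 3)}

# Hypothetical per-run results for an instance with 125 constraints.
print(summarize([8, 10, 12], 125))
```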

## 4. Conclusion

In this work, a multilevel weighted genetic algorithm is introduced for MAX-CSP. The results have shown that, in most cases, the multilevel genetic algorithm returns a better solution than the standard genetic algorithm for an equivalent run-time. The multilevel paradigm offers a better strategy for performing diversification and intensification. This is achieved by allowing the GA to view a cluster of variables as a single entity, thereby guiding the search and restricting it to only those assignments in the solution space in which the variables grouped within a cluster are assigned the same value. As the size of the clusters gets smaller from one level to the next, the size of the neighborhood becomes adaptive and allows the possibility of exploring different regions of the search space, while intensifying the search by exploiting the solutions from previous levels in order to reach better solutions.

## Author details

Noureddine Bouhmala

Address all correspondence to: noureddine.bouhmala@usn.no

Department of Maritime Technology and Innovation, University of South-Eastern Norway, Raveien, Borre, Norway

## References

[1] Dechter R, Pearl J. Tree clustering for constraint networks. Artificial Intelligence. 1989;38:353-366

[2] Minton S, Johnson M, Philips A, Laird P. Minimizing conflicts: A heuristic repair method for constraint satisfaction and scheduling problems. Artificial Intelligence. 1992;58:161-205

[3] Morris P. The breakout method for escaping from local minima. In: Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI'93). 1993. pp. 40-45

[4] Wallace R, Freuder E. Heuristic methods for over-constrained constraint satisfaction problems. In: Over-Constrained Systems. LNCS. Vol. 1106. Berlin, Germany: Springer Verlag; 1995. pp. 207-216

[5] Galinier P, Hao J. Tabu search for maximal constraint satisfaction problems. In: Principles and Practice of Constraint Programming CP 1997. LNCS. Vol. 1330. Berlin, Germany: Springer Verlag; 1997. pp. 196-208

[6] Zhou Y, Zhou G, Zhang J. A hybrid glowworm swarm optimization algorithm for constrained engineering design problems. Applied Mathematics and Information Sciences. 2013;7(1):379-388

[7] Davenport A, Tsang E, Wang C, Zhu K. Genet: A connectionist architecture for solving constraint satisfaction problems by iterative improvement. In: Proceedings of the Twelfth National Conference on Artificial Intelligence. 1994

[8] Voudouris C, Tsang E. Guided local search. In: Handbook of Metaheuristics. International Series in Operations Research and Management Science. 2003;57:185-218

[9] Schuurmans D, Southey F, Holte R. The exponentiated subgradient algorithm for heuristic Boolean programming. In: 17th International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers; 2001. pp. 334-341

[10] Shang Y, Wah B. A discrete Lagrangian-based global-search method for solving satisfiability problems. Journal of Global Optimization. 1998;12(1):61-99

[11] Hutter F, Tompkins D, Hoos H. Scaling and probabilistic smoothing: Efficient dynamic local search for SAT. In: Principles and Practice of Constraint Programming CP 2002. LNCS. Vol. 2470. Berlin, Germany: Springer Verlag; 2002. pp. 233-248

[12] Amante D, Marin A. Adaptive penalty weights when solving congress timetabling. Advances in Artificial Intelligence. Lecture Notes in Computer Science. 2004;3315:144-153

[13] Pullan W, Mascia F, Brunato M. Cooperating local search for the maximum clique problems. Journal of Heuristics. 2011;17:181-199

[14] Fang S, Chu Y, Qiao K, Feng X, Xu K. Combining edge weight and vertex weight for minimum vertex cover problem. In: FAW 2014. 2014. pp. 71-81

[15] Lee H, Cha S, Yu Y, Jo G. Large neighborhood search using constraint satisfaction techniques in vehicle routing problem. In: Gao Y, Japkowicz N, editors. Advances in Artificial Intelligence. Lecture Notes in Computer Science. Vol. 5549. Heidelberg: Springer Berlin; 2010. pp. 229-232

[16] Bouhmala N. A variable depth search algorithm for binary constraint satisfaction problems. Mathematical Problems in Engineering. 2015;2015:Article ID 637809, 10 pages. DOI: 10.1155/2015/637809

[17] Holland J. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press; 1975

[18] Xu W. Satisfiability transition and experiments on a random constraint satisfaction problem model. International Journal of Hybrid Information Technology. 2014;7(2):191-202

[19] Gent IP, MacIntyre E, Prosser P, Walsh T. The constrainedness of search. In: Proceedings of the AAAI-96. 1996. pp. 246-252

**Chapter 14**

**Artificial Intelligence Application in Machine Condition Monitoring and Fault Diagnosis**

Yasir Hassan Ali

DOI: 10.5772/intechopen.74932

Additional information is available at the end of the chapter

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **Abstract**

The subject of machine condition monitoring and fault diagnosis as a part of system maintenance has gained a lot of interest due to the potential benefits to be gained from reduced maintenance budgets, enhanced productivity and improved machine availability. Artificial intelligence (AI) has proved a successful approach to machine condition monitoring and fault diagnosis, since its techniques can be used as tools for routine maintenance. This chapter attempts to summarize and review the recent research and developments in the field of signal analysis through artificial intelligence in machine condition monitoring and fault diagnosis. Many different methods based on intelligent systems such as artificial neural networks (ANN), fuzzy logic systems (FLS), genetic algorithms (GA) and support vector machines (SVM) have previously been developed. However, the use of acoustic emission (AE) signal analysis together with AI techniques for machine condition monitoring and fault diagnosis is still rare. In the future, the application of AI in machine condition monitoring and fault diagnosis still needs more encouragement and attention due to this gap in the literature.

**Keywords:** artificial intelligence, machine condition monitoring, fault diagnosis

## **1. Introduction**

In the current commercial production industries, there is an increasing trend towards the need for higher-availability equipment that can work nonstop 24/7. Thus, any type of failure, even a minor one, cannot be accepted, as it can significantly affect cost and production. Hence, very accurate monitoring of the machine condition and proper fault diagnosis of machine failures are necessary. Machine fault diagnosis has seen vast improvement since the days when maintenance was provided only after the machine had developed a fault and affected the

production. After that, it developed into preventive maintenance in the past few years, before all the industries started using condition-based maintenance. Preventive maintenance can be defined as providing maintenance before the machinery faces any fault.

On the other hand, condition-based maintenance can be defined as providing maintenance depending on the data obtained from target measurements; the efficiency of this technique is measured by the accuracy of the diagnostic tactics that are fulfilled. To survive in the current competitive market, industries need to improve their product reliability and also reduce their production costs. Product reliability is most important for specific industries, such as the aviation, nuclear and petrochemical industries, where any failure can lead to severe environmental disasters.

Currently, industries have shifted to using the condition-based (predictive) maintenance approach, which depends on the trending and analysis of data from one or more parameters that indicate the development or presence of known failures or faults. An effective machine condition monitoring technique must be able to determine the onset of any fault in its early stages and also provide an accurate diagnosis regarding the type of fault and its location. Ideally, the condition monitoring technique must give an overall and a detailed, accurate health assessment of the equipment.

Conventionally, it would include aural and visual inspection (applying all the human senses), temperature monitoring, oil analysis (known as wear debris analysis), measurement and analysis of vibrations, motor current signature analysis, airborne sound and acoustic emission (AE) analysis. In acoustic emission analysis, waves are sent from an emission source and transferred to the surface by the transmission medium. The low-displacement, high-frequency mechanical waves can be picked up as electronic signals, and the signal strength can be increased by using a preamplifier before the data are interpreted by the AE equipment [1, 2].

Furthermore, there is a growing interest in developing new technologies to overcome problems in the condition monitoring and diagnostics of complex industrial machinery applications that have not yet been resolved. This provides excellent opportunities for AI technology to grow continuously, with the rapid increase in intelligent information, sensor and data acquisition capabilities, combined with rapid advances in intelligent signal processing techniques [3]. AI techniques that have been extensively used in the field of engineering include genetic algorithms (GA), support vector machines (SVM), fuzzy logic systems (FLS) and artificial neural networks (ANN). Compared with common fault diagnostic approaches, the AI techniques are instrumental in that they can be improved [4]. Apart from improving performance, these techniques can be easily extended and modified, and can be made adaptive by integrating new data or information [5].

In this chapter, an attempt has been made to review the recent developments in the field of acoustic emission signal analysis for machine fault diagnostics based on the aforementioned AI techniques. These systems can be mutually integrated with each other and also with other traditional techniques.

## **2. Artificial intelligence**

AI is a system that thinks and acts like a human being and can imitate human behaviour. It is mainly concerned with developing a computer's ability to engage in human-like thought processes such as learning, reasoning and self-correction [6]. In the last decade, there has been a growing need for AI to solve engineering problems that were earlier considered hard to solve analytically or by mathematical modelling and that needed human intelligence [7].

Nowadays, there is an increased demand for advanced AE analysis tools. This chapter shows that many scholars have studied the detection and diagnosis of several faults by using AE methods in AET and signal analysis. The AI techniques mentioned earlier have also been extensively used in the field of engineering.

## **2.1. Artificial neural network-based fault diagnosis**

An artificial neural network (ANN) is an information-processing approach that works like biological nervous systems, in the way the brain processes information in the human body. The discussion here is limited to an introduction of the many components involved in an ANN implementation. The network architecture or topology (including the number of nodes in hidden layers, network connections, initial weight assignments and activation functions) plays a key role in ANN performance and depends on the problem at hand. **Figure 1** shows a simple ANN and its constituents. In most cases, setting the correct topology is based on a heuristic model. On the other hand, the dimensions of the input and output spaces generally suggest the number of input and output layer nodes. Selecting the network complexity or regularization is very important [8]. When designing a neural network, a number of different parameters must be decided, such as the number of training iterations, the number of layers, the learning rate, the number of neurons per layer and the transfer functions.

**Figure 1.** Architecture of a neural network.
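A minimal forward pass makes these design parameters concrete. The 2-4-1 layer sizes, the weights and the sigmoid transfer function below are illustrative assumptions, not choices made in the chapter:

```python
import math

def sigmoid(z: float) -> float:
    """A common transfer (activation) function."""
    return 1 / (1 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass of a tiny fully connected network: each hidden node applies
    the transfer function to a weighted sum of the inputs, and the single
    output node does the same over the hidden activations."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

# Hypothetical 2-input, 4-hidden-node, 1-output topology with fixed weights.
y = forward([1.0, 0.5],
            [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [-0.2, 0.1]],
            [0.5, -0.5, 0.25, 0.1])
```

The remaining parameters listed above (learning rate, number of training iterations) belong to the training loop, which adjusts `w_hidden` and `w_out` and is omitted here.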

the UPR results, a supervised pattern recognition (SPR) method was trained based on the back-propagation neural network. This was applied to the AE data collection and a subse-

Artificial Intelligence Application in Machine Condition Monitoring and Fault Diagnosis

http://dx.doi.org/10.5772/intechopen.74932

279


A key benefit of the ANN is its ability to respond to an input pattern in a desirable manner after the learning phase. Previous studies have shown that ANNs can efficiently predict the faults of machining processes, which makes the technique very useful for flexible industrial automation [9]. ANNs have been used extensively in the health diagnosis of mechanical gears, bearings and rotating machines, with features drawn more from vibration signals and less from acoustic signals. The performance of such networks depends on choices such as the number of training iterations, the number of layers, the learning rate, the number of neurons per layer and the transfer functions, and so on.

There has been an increasing demand for advanced AE analysis tools with the capacity to distinguish different sources of AE data. This has resulted in the development of modern, more flexible pattern recognition software that combines traditional graphical AE analysis with advanced unsupervised pattern recognition (UPR) and supervised pattern recognition (SPR) analysis. Application of UPR techniques to AE data in various test cases has also increased the understanding of damage evolution and the capacity for noise discrimination [10].

A simple problem of roller health monitoring has illustrated the effectiveness of GAs for fault classification using ANNs [8]. In this regard, Al-Balushi and Samanta suggested a procedure to diagnose gear faults by applying the wavelet transform and an ANN to AE signals: features taken from the wavelet transform were used as inputs to an ANN-based diagnosis approach [11]. In fault prognosis systems, the acoustic emission and vibration signals have been used as input signals, with an ANN serving as the prognosis system for rotating machinery failure [12]. Likewise, a multiple-layer neural network was successfully used to detect faults in a gearbox, with classification performed by supervised learning on experimentally obtained data presented as processed vibration and acoustic emission signals [13]. The use of acoustic emission for early detection of faults in helicopter rotor-head dynamic components has also been studied: the stress waves in a flight-test data set were analysed using wavelet-based techniques to assess the background operational noise against machinery failure signatures, and a feed-forward neural network was used as a classifier to determine the correct flight regime [14].

To resolve the issues of velocity and time differences, a new approach to AE source localization was described and demonstrated on the wing spar cut-out of an L-39 aircraft; the method estimated the AE source coordinates using an ANN fed with extracted signal parameters [15]. Fog et al. studied the detection of exhaust valve burn-through in a four-cylinder, 500 mm bore, two-stroke marine diesel engine. The investigation comprised monitoring three different valve conditions (normal, leak and large leak). Vibration and structure-borne stress waves (AE) were monitored, the AE signal features were extracted using principal component analysis (PCA), and a feed-forward neural classifier was used to discriminate between the three valve conditions [16].

The AE data collected during a static test of a 12 m FRP wind turbine blade were analysed and classified using different unsupervised pattern recognition (UPR) techniques. Based on the UPR results, a supervised pattern recognition (SPR) method built on a back-propagation neural network was trained and applied to the AE data collected during subsequent biaxial fatigue loading of the same blade [17].

278 Artificial Intelligence - Emerging Trends and Applications

The neural network gained attention in grinding research due to its capabilities of learning, interpolation, pattern recognition and classification, and other examples of its application in the engineering field have also been reported [18–22]. Aguiar et al. attempted to classify the burn degrees of a surface-grinding machine used for grinding tests with an aluminium oxide grinding wheel. The AE and power signals, along with statistics from the digital signal processing of these signals, were used as inputs to the neural networks [9].

Furthermore, an ANN approach was proposed for the detection of work-piece "burn", the unwanted change in the metallurgical properties of the material produced by overly aggressive or otherwise inappropriate grinding [19]. Grinding AE signals for 52100 bearing steel were collected and digested to extract feature vectors, which proved more useful for ANN processing. Aguiar et al.'s work differed in that it used grinding parameters, not previously tested for surface roughness prediction, as inputs to the neural networks; in addition, a higher-sampling-rate data acquisition system was used to acquire the acoustic emission and cutting power [23].

Goebel and Wright developed a hybrid architecture featuring fuzzy logic and neural networks to cope with the weaknesses of traditional methods for monitoring and diagnosing an unattended milling machine. Force, spindle current and acoustic emission data were used as inputs to the neural network after signal processing to calculate the membership functions of fuzzy relations. Additionally, fuzzy logic principles were used to diagnose the system's status concerning tool wear and chatter [24]. Taken together, these findings on the detection and classification of work-piece "burn" and on surface roughness prediction were encouraging for the use of neural networks, and showed that the AE signal from a grinding machine is a valuable input for such models [9, 19, 23, 24].

Impact damage is a persistent problem in the composite industry. Such damage may seem superficial, yet it often degrades the performance of the composite structure. Conventional NDE techniques can detect the locations or shapes of impact damage but cannot quantify its effects on the structure. Conversely, AE records the active flaw growth while the structure is loaded and can measure the reduction in structural performance produced by an impact load. AE signal analysis was used to measure the effect of impact damage on burst pressure in 5.75-inch diameter, inert propellant-filled, filament-wound pressure vessels. The AE data were collected from 15 graphite/epoxy pressure vessels featuring 5 damage states and 3 resin systems. A burst pressure prediction model was developed by correlating the AE amplitude (frequency) distribution generated during the first pressure ramp to 800 psig with known burst pressures, using a four-layered back-propagation neural network [25].

The ANN pattern recognition technique was used to analyse the AE source signals of a pressure vessel on site. For this purpose, a new quantitative analysis concept for AE sources of pressure vessels was introduced using artificial neural network classification, together with a new method for evaluating the severity of the AE sources [26].

Macías analysed the relationship between AE signals and the main parameters of the friction stir welding (FSW) process on the basis of an ANN. The AE signals were acquired with a data acquisition system during the welding of 3 mm thick aluminium alloy plates. The wavelet transform (WT) was used to obtain statistical and temporal parameters from the decomposition of the AE signals, which served as inputs for a multilayer feed-forward ANN [27].
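Several of the studies above feed wavelet-derived statistical parameters into an ANN. As an illustrative sketch only (a single-level Haar decomposition applied to a synthetic signal, not the transforms or data used in the cited works):

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform."""
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass band
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass band
    return approx, detail

def band_features(band):
    """Statistical parameters of one sub-band, usable as ANN inputs."""
    return [band.mean(), band.std(), np.sum(band ** 2)]

# Synthetic stand-in for an AE burst: decaying sinusoid plus noise.
rng = np.random.default_rng(0)
t = np.arange(1024)
ae = np.exp(-t / 200.0) * np.sin(0.3 * t) + 0.05 * rng.standard_normal(1024)

approx, detail = haar_dwt(ae)
features = band_features(approx) + band_features(detail)
print(len(features))  # a 6-element feature vector
```

The orthonormal Haar transform preserves signal energy, so the sub-band energies partition the energy of the raw burst.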

Partial discharge (PD) detection, signal analysis and pattern identification using AE measurements and a back-propagation (BP) ANN have also been studied. The measured signals were processed into three-dimensional patterns with short-duration Fourier transforms (SDFT). The findings showed that using a BP ANN with the SDFT components to classify the different PD patterns provided excellent results [28].

To determine the quality of the feature extraction and of the ANN classifier, a series of experiments was also conducted, with input data acquired during AE experiments on a chemical process plant. These input data consisted of a set of AE power spectra. Each source input data file was preprocessed by additional linear averaging within each input vector and by individual amplitude normalisation, that is, removing the mean value and dividing by the standard deviation of the feature. Three-layer networks using the back-propagation updating scheme were used to assess their combined feature extraction and classification capabilities while solving the problem of process stage recognition [29].
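The preprocessing just described, removing the mean and dividing by the standard deviation of each feature, is ordinary z-score normalisation. A minimal sketch, with random numbers standing in for real AE power spectra:

```python
import numpy as np

# Hypothetical stand-in for a set of AE power spectra: one row per
# acquisition, one column per frequency-bin feature (synthetic data).
rng = np.random.default_rng(0)
spectra = rng.random((100, 32)) * 10.0 + 5.0

# Amplitude normalisation as described: remove the mean value and
# divide by the standard deviation of each feature (column-wise).
mean = spectra.mean(axis=0)
std = spectra.std(axis=0)
normalised = (spectra - mean) / std

# Every feature now has zero mean and unit variance.
print(round(float(normalised.std()), 3))  # prints 1.0
```

Scaling the features this way prevents large-amplitude frequency bins from dominating the network's weight updates during training.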

Until 2015, there was no study in the literature on the estimation of oil film thickness from acoustic emission signals. To predict and monitor the oil film thickness of a spur gear, a test rig was therefore built and the gearbox was run at different speeds and load conditions. Artificial neural network (ANN) and regression models were used to predict the lubricant regime from oil temperature, acoustic emission signals and specific film thickness. Both FFBP and Elman network models were used to predict the specific oil film thickness with AE and temperature data as inputs. The results showed that the FFBP and Elman models were effective in predicting oil film thickness from the acoustic emission signal and temperature, and the suggested technique attained 99.9% success in prediction and classification at high speed during training. The FFBP network was better than the Elman network during testing and gave excellent results in prediction and classification. Thus, the architecture and topology of the network can be used for online monitoring of oil film thickness and for predicting causes of failure in spur gear operation [30, 31].

## **2.2. Spiking neural network**

The spiking neural network (SNN) is the third-generation neural network (**Figure 2**) and has recently gained considerable interest in the scientific community [32]. Spiking neuron models were known even before the introduction of the sigmoidal or perceptron neuron [32]. SNNs have been observed to be very suitable for parallel implementation in digital hardware [32] and in analogue hardware [33, 34].

**Figure 2.** Spiking neural network [32].

Earlier generations of neural networks used analogue signals to convey data from one neuron to the next. Communication between the neurons in an SNN instead uses spikes, similar to the mechanism of actual human neurons. Spikes are recognized only at the instants at which they occur. Using the weighted sum of its analogue input values, the presynaptic neuron evaluates a sum-specific non-linear function, and this value determines the delay of the output spike aimed at the succeeding neuron. Generally, the spiking neuron is viewed as a leaky integrator: the target neuron integrates the incoming spikes over a period of time and takes the resulting integrated value as its membrane potential. When the membrane potential approaches a specific threshold value, the neuron sends a spike, after which the membrane potential is reset.

Increased knowledge of the information processing of biological neurons has helped explain many additional parameters (such as gene and protein expression) that need to be taken into consideration for neurons to spike [33–35]. These additional parameters include the different physical properties of the connections [32], the likelihood of spikes being accepted at the synapse, and the emitted neurotransmitters or open ion channels [36, 37]. Several of these properties have been modelled mathematically and used to study the biological neuronal system [38, 39]. SNNs are made of artificial neurons that communicate using spike trains, considered as pulse-coded data [40]. The SNN is biologically plausible and offers a means of representing frequency, time, phase and other such features for information processing. Moreover, the SNN can train its neurons to convert spatio-temporal data to spikes (with properties including spiking rate and spiking time). When selecting the neuronal model for an SNN, one needs to consider both computational efficiency and biological credibility [40]. If computational efficiency matters more than biological plausibility, the leaky integrate-and-fire (LIF) model should be adopted due to its cost effectiveness.
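As a concrete illustration of the leaky integrate-and-fire behaviour just described, the following minimal sketch (all parameter values are invented for the example, not taken from the cited studies) integrates an input drive, fires when the membrane potential crosses the threshold and then resets it:

```python
import numpy as np

def lif_simulate(current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron (illustrative units).

    current: input drive at each time step.
    Returns the membrane-potential trace and the spike times.
    """
    v = v_rest
    trace, spikes = [], []
    for t, i_in in enumerate(current):
        # Leaky integration: the potential decays towards rest
        # while accumulating the input drive.
        v += dt / tau * (-(v - v_rest) + i_in)
        if v >= v_thresh:          # threshold crossed: emit a spike
            spikes.append(t)
            v = v_reset            # membrane potential is reset
        trace.append(v)
    return np.array(trace), spikes

# A constant supra-threshold drive produces regular spiking.
trace, spikes = lif_simulate(np.full(200, 1.5))
print(len(spikes))  # number of spikes emitted over 200 steps
```

Because the drive (1.5) exceeds the threshold (1.0), the neuron charges up, fires and resets periodically, which is the leaky-integrator behaviour described in the text.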

In their study, Silva et al. [41, 42] described a prototype decision support system for monitoring tool wear based on the SNN technique. The system consisted of six components: data collection, feature extraction, multi-sensor integration, pattern recognition, tool wear estimation and outlier detection. The proposed architecture contained a built-in self-organizing neural component based on the SNN. The study showed that the model was very efficient at classifying the tool wear level of tool inserts from apparently weak features. The method demonstrated the effectiveness of the SNN model for tool condition monitoring, implying that the approach is feasible for many industrial applications in which very noisy data are obtained. Theirs was the only study to use SNNs in condition monitoring, and the results showed the capability of spiking neural networks for tool condition monitoring.

## **2.3. Genetic algorithm-based fault diagnosis**

The genetic algorithm (GA), created by John Holland in the 1970s, is an evolutionary algorithm belonging to the field of artificial intelligence. It is a method for solving both constrained and unconstrained optimization problems based on a natural selection process that mimics biological evolution. The algorithm repeatedly modifies a population of individual solutions: at each step, it randomly selects individuals from the current population and uses them as parents to produce the children of the next generation. Over successive generations, the population "evolves" towards an optimal solution.

As originally proposed, a simple GA consists mainly of three processes: selection, genetic operation and replacement. A typical GA cycle and its high-level description are provided in **Figure 3**. The population is composed of a group of chromosomes, the candidate solutions. The fitness values of all chromosomes are evaluated by an objective function (performance criteria or a system's behaviour) in decoded form (phenotype). A particular group of parents is selected from the population to generate offspring through the defined genetic operations of crossover and mutation.

**Figure 3.** Genetic algorithm cycle.

The fitness of all offspring is then evaluated using the same criterion, and the chromosomes in the current population are replaced by their offspring according to a certain replacement strategy. This GA cycle is repeated until the termination criterion is reached [8]. A simple roller health-monitoring problem using ANNs illustrated the effectiveness of GAs in AE feature selection for fault classification, and revealed that using GAs to select an optimal feature set for an ANN classification application is a very powerful technique [8]. Ming applied the AE technique to bearing condition monitoring and fault diagnosis, proposing selection methods based on scales for the continuous wavelet transform and on wavelet-based waveform parameter selection and optimisation by genetic algorithm [43].
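The GA cycle described above, selection, crossover and mutation, and replacement repeated until termination, can be sketched as follows (the bit-string encoding, tournament selection and toy "one-max" fitness are illustrative choices, not those of the cited studies):

```python
import random

def evolve(fitness, n_bits=16, pop_size=30, generations=60,
           p_mut=0.02, seed=1):
    """Minimal generational GA over bit-string chromosomes."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: pick parents by two-way tournament on fitness.
        def pick():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            # Genetic operation: one-point crossover ...
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # ... followed by bit-flip mutation.
            child = [g ^ 1 if rng.random() < p_mut else g
                     for g in child]
            children.append(child)
        # Replacement: the offspring replace the current population.
        pop = children
    return max(pop, key=fitness)

# Toy objective: maximise the number of 1-bits ("one-max").
best = evolve(fitness=sum)
print(sum(best))  # close to the maximum of 16 after 60 generations
```

For feature selection of the kind cited above, each bit would instead mark whether a candidate AE feature is included, and the fitness would be the resulting classifier's accuracy.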

AE was monitored using a data acquisition system during mechanical tests on several materials, with two of the sensors positioned directly on the specimen. AE signals were treated as pattern vectors, as described by a number of authors, and "model" data sets were generated to resemble the AE signals recorded during the tests. A genetic algorithm-based approach to clustering AE signals was presented and validated, and its superiority over the k-means algorithm was highlighted by the study of different "model" data sets. The genetic strategy is characterised by high stability and high performance, especially for clustering data sets containing a minority class, a cluster of signals with extreme features or a set of clusters with very different sizes [44].

## **2.4. Fuzzy logic-based fault diagnosis**

Zadeh introduced fuzzy logic (FL) in 1965 [45–47]. FL is a multi-valued logic that allows intermediate values between conventional evaluations such as true/false, yes/no and high/low. FL provides a variety of ways to solve a control or classification problem; the method focuses on what the system should do rather than trying to model how it works [48].
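The multi-valued membership idea can be shown with a minimal sketch (a generic textbook-style triangular membership function; the "high vibration" set and its thresholds are hypothetical, not from any cited study):

```python
def tri_membership(x, a, b, c):
    """Triangular membership function: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy set "high vibration", peaking at 8 mm/s.
high = lambda v: tri_membership(v, 4.0, 8.0, 12.0)

print(high(4.0))  # 0.0 (clearly not high)
print(high(6.0))  # 0.5 (partially high)
print(high(8.0))  # 1.0 (fully high)
```

Instead of the hard true/false of classical logic, a reading of 6 mm/s is "high" to degree 0.5, which is what downstream fuzzy rules operate on.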

The work of Aguiar et al. is mentioned twice in this chapter because it contains two parts: the first used an ANN to classify burn degrees in surface grinding, while in the second a methodology was used to predict the surface roughness of advanced ceramics with an adaptive neuro-fuzzy inference system (ANFIS). For this study, alumina workpieces were pressed and sintered into rectangular bars, and statistical data processed from the AE signal and the cutting power were used as input data for ANFIS [9]. Cusido et al. provided approaches for a one-board fault detecting system and a test program set (TPS) fault detecting system for electromechanical actuator (EMA) ball bearings, analysing different vibration and AE signals with FL inference techniques [49].

Omkar et al. presented the results of fuzzy modelling for identifying problems in grinding through digital processing of the acoustic emission signals produced during the process. Fuzzy C-means (FCM) clustering was used to classify the AE signal into its different sources. FCM is potentially helpful for discovering clusters in data when the boundaries between subgroups overlap. The AE test was conducted using pulse, pencil and spark signal sources on the surface of a solid steel block. Four parameters (event duration, peak amplitude, rise time and ring-down count) were measured with the AET 5000 system, and these data were then used for training and validation of the FCM-based classification [50].
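The FCM algorithm behind that study can be sketched with the standard textbook update equations (a generic 1-D implementation with made-up feature values, not the authors' AET 5000 pipeline):

```python
import random

def fcm(points, c=2, m=2.0, iters=50):
    """Fuzzy C-means: each point gets a membership in [0, 1] to every cluster."""
    random.seed(0)
    # Initialise memberships randomly, normalised so each point's row sums to 1.
    u = [[random.random() for _ in range(c)] for _ in points]
    u = [[v / sum(row) for v in row] for row in u]
    for _ in range(iters):
        # Centroids: means weighted by u^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            centers.append(sum(wi * p for wi, p in zip(w, points)) / sum(w))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        for i, p in enumerate(points):
            d = [abs(p - cj) + 1e-9 for cj in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((d[j] / dk) ** (2.0 / (m - 1.0)) for dk in d)
    return centers, u

# Two well-separated groups of hypothetical 1-D AE feature values.
pts = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
centers, u = fcm(pts)
```

Unlike k-means, a point lying between the groups would receive a graded membership in both clusters, which is exactly why FCM suits overlapping subgroup boundaries.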

Khalifa and Komarizadeh developed an efficient walnut recognition system by combining AE analysis, principal component analysis (PCA) and an adaptive neuro-fuzzy inference system (ANFIS) classifier. The system was then tested and classified walnuts into two classes. In the classification phase, selected statistical features were used as the input for the ANFIS classifier [58].

Artificial Intelligence Application in Machine Condition Monitoring and Fault Diagnosis

http://dx.doi.org/10.5772/intechopen.74932

285

Aguiar et al. investigated burning in the grinding process on the basis of a fuzzy model. The inputs of the models were derived from digital processing of the raw AE and cutting power signals; the parameters used in the study were the mean-value deviance, the grinding power and the root mean square (RMS) of the acoustic emission signal [51]. Ren et al. also attempted to identify the most successful AE model during continuous cutting periods by using fuzzy modelling. The fuzzy identification method provided a simple way to arrive at a more definite conclusion from the collected information, despite the difficulty of understanding the exact physics of the machining process [52].

Recent studies [53–56] used type-2 fuzzy logic, the extension of FL to a higher order, which is needed in highly uncertain situations. Ren et al. explained how a type-2 TSK [Takagi–Sugeno–Kang (TSK)] fuzzy uncertainty estimation method was implemented to filter the raw AE signal coming directly from the AE sensor during a turning process. That paper specifically focuses on filtering and capturing the uncertainty with the type-2 TSK fuzzy approach over the interval of the AE signal during one 10 mm cutting length [53].

Ren et al. attempted to find out the relationship between AE and tool wear. They presented an application of type-2 FL on AE signal modelling in precision manufacturing. Type-2 fuzzy modelling was used for distinguishing the AE signal in precision machining. It provided a simple way for arriving at a definite conclusion without understanding the exact physics of the machining process [54].

Knowledge about the uncertainty of tool life prediction is essential for tool condition investigation, and it is also important for decisions about maintaining machining quality. Ren et al. presented a type-2 fuzzy tool condition monitoring (TCM) system based on AE in micro-milling. In the system, type-2 FLSs were used to analyse the AE signal features (SFs) and choose the most reliable ones for integration, in order to effectively estimate the cutting tool condition throughout its life. The results show that the type-2 fuzzy tool life estimation is in accordance with the cutting tool wear state during the micro-milling process [55].

A type-2 fuzzy analysis method was used to analyse the AE SFs for TCM in the micro-milling process. The interval output of the type-2 approach provided an interval of uncertainty associated with the SFs of the AE signal, and the SFs with less RMSE and variation were selected to estimate the remaining cutting tool life [56]. A new approach for AE source localisation under high background noise was also designed. The algorithm was based on probabilistic and fuzzy-neuro principles, so AE events can be classified according to their energy and location probability. AE signals recorded during the stamping of a thin metal sheet were used to test the new algorithm [57].
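The "interval output" of a type-2 approach can be sketched in a minimal way: an interval type-2 fuzzy set bounds an ordinary (type-1) membership function between a lower and an upper function, so each input maps to an uncertainty interval rather than a single grade. The triangle parameters below are hypothetical, chosen only so the lower function stays beneath the upper one:

```python
def tri(x, a, b, c):
    """Type-1 triangular membership function."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def it2_membership(x):
    """Interval type-2 set: membership is the interval [lower(x), upper(x)].

    The footprint of uncertainty is spanned by two triangles (hypothetical widths).
    """
    upper = tri(x, 2.0, 6.0, 10.0)        # wider triangle: upper bound
    lower = 0.8 * tri(x, 3.0, 6.0, 9.0)   # narrower, scaled triangle: lower bound
    return lower, upper

lo, up = it2_membership(4.0)
print((round(lo, 3), round(up, 3)))  # (0.267, 0.5)
```

The width of the interval `up - lo` is the kind of per-feature uncertainty measure a type-2 TCM system can use to rank signal features.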


## **2.5. Support vector machine-based fault diagnosis**


The support vector machine (SVM) approach was used as a classification technique based on statistical learning theory (SLT). It rests on the principle of a hyperplane classifier, or linear separability: the purpose of the SVM is to find a linear optimal hyperplane that maximizes the margin of separation between the two classes [59, 60].
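The maximum-margin idea can be sketched with a tiny linear SVM trained by sub-gradient descent on the regularised hinge loss (a generic illustration on made-up 2-D points, not the data of any cited study):

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Primal linear SVM: minimise lam*||w||^2 + mean hinge loss over (X, y)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (w[0] * xi[0] + w[1] * xi[1] + b)
            if margin < 1:  # inside the margin: push the point to the correct side
                w = [w[k] + lr * (yi * xi[k] - 2 * lam * w[k]) for k in range(2)]
                b += lr * yi
            else:           # safely classified: only shrink w (regularisation)
                w = [w[k] - lr * 2 * lam * w[k] for k in range(2)]
    return w, b

# Two linearly separable classes (hypothetical pairs of AE features).
X = [(1, 1), (2, 1), (1, 2), (6, 5), (7, 6), (6, 7)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
pred = [1 if w[0] * a + w[1] * c + b > 0 else -1 for a, c in X]
```

The regularisation term keeps `w` small, which is what "maximizing the margin" means in the primal formulation; kernel SVMs extend the same idea to non-linear boundaries.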

The SVM was used for fault diagnosis of a spur bevel gear box; it is a popular machine learning technique owing to its high accuracy and generalization capabilities [61]. Other studies examined the fault diagnosis of low-speed bearings based on the AE technique and vibration signals. Fault diagnosis was conducted by classification with a relevance vector machine (RVM) and an SVM, and the classification process provided a comparative study between RVM and SVM for fault diagnosis of low-speed bearings [62, 63].

Yu and Zhou presented a method to classify AE signals in composite laminates using SVM; a classifier was built to identify and classify the AE signals. Simulation results showed that SVM can effectively distinguish different acoustic emission signals from noise signals, and that the classification accuracy with grid-searched parameters was higher than with GA-selected parameters [64]. Chu-Shu also described how to classify AE signals in composite laminates using SVM [65].

On the basis of a thorough literature review, new approaches based on hierarchical clustering and support vector machines (SVM) were introduced to cluster AE signals and to detect P-waves for micro-crack location in the presence of noise, with cracks induced in rock specimens during a surface instability test [66]. A novel grinding wheel wear monitoring system based on discrete wavelet decomposition and SVM was also proposed; the grinding signals were collected by an AE sensor [67].

Elforjani used a model to analyse the output signals of a machine in operation and accordingly to set up an early-alarm tool that reduces untimely replacement of components and wasteful machine downtime. In this work, three supervised machine learning techniques, Gaussian process regression (GPR), support vector machine regression (SVMR) and a multi-layer artificial neural network (ANN) model, were used to correlate AE features with the corresponding natural wear of slow-speed bearings throughout a series of laboratory experiments. Signal parameters such as the root mean square (RMS) and the signal intensity estimator (SIE) were analysed to discriminate the individual types of early damage. It was concluded that neural network models with the back-propagation learning algorithm have an advantage over the other models in estimating the remaining useful life (RUL) of slow-speed bearings, provided a proper network structure is chosen and sufficient data are provided [68].
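The RMS parameter that recurs across these studies is straightforward to compute; a minimal sketch over a synthetic signal (not real AE data):

```python
import math

def rms(signal):
    """Root mean square of a sampled signal: sqrt(mean of squared samples)."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

# A pure sine of amplitude A sampled over whole periods has RMS A/sqrt(2),
# which serves as a sanity check for the implementation.
n, amp = 1000, 2.0
sine = [amp * math.sin(2 * math.pi * 50 * t / n) for t in range(n)]
print(round(rms(sine), 3))  # 1.414, i.e. 2/sqrt(2)
```

In condition monitoring the RMS is typically computed over short sliding windows of the AE waveform, giving one energy-like feature value per window.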

The development of AI techniques shows promising potential in machine condition monitoring and diagnosis, although only a few articles were found in this area. ANN based on AE has been successfully applied to many relevant problems and can be considered the most popular recent method in condition monitoring with AE signals. The use of fuzzy logic, GA and SVM in condition monitoring and fault diagnosis based on AE signal analysis still needs additional attention because of the scarcity of available papers. Future work should seek novel ideas for machine condition monitoring and fault diagnosis using AE signal analysis and AI [69].

## **3. Conclusion**

This chapter presents a survey, based on a literature review, of AE signal analysis and AI techniques in machine condition monitoring and fault diagnosis. It surveys articles indexed under the keywords machine condition monitoring and machine fault diagnosis using AE signal analysis and AI.

We can conclude that the classification of AE signals is highly important in machine condition monitoring and fault diagnosis. AI has several advantages over traditional mathematical modelling and statistical analysis, including dispensing with the need for detailed knowledge of system behaviour, which can be replaced by relatively simple computational methods. ANN based on AE has been successfully applied to numerous relevant problems; ANN can therefore be considered the most popular recent method in AE signal analysis.

GA applications with AE signal analysis in machine condition monitoring and fault diagnosis still need more support and attention because of the scarcity of existing evidence. The experimental results show that the fuzzy logic method is efficient and feasible. Efforts to find novel ideas must be encouraged, to contribute further to robust machine condition monitoring and fault diagnosis.

Finally, continuing to develop novel ideas for machine condition monitoring and diagnosis using AE signal analysis and AI remains the subject of future work.

## **Acknowledgements**

The author would like to thank Northern Technical University in Iraq, through Professor Dr Mowafaq Y. Hamdoon, the Chancellor of the university, for supporting this work.

## **Conflict of interest**

The author declares that there is no conflict of interest regarding the publication of this chapter.

## **Author details**

Yasir Hassan Ali

Address all correspondence to: yha2006@gmail.com

Technical College Mosul, Northern Technical University, Mosul, Iraq

## **References**

[1] Ali YH, Rahman RA, Hamzah RIR. Acoustic emission signal analysis and artificial intelligence techniques in machine condition monitoring and fault diagnosis: A review. Jurnal Teknologi. 2014;**69**(2)

[2] Ali YH et al. Acoustic emission technique in condition monitoring and fault diagnosis of gears and bearings. International Journal of Academic Research. 2014;**6**(5)

[3] Mba D, Rao RB. Development of Acoustic Emission Technology for Condition Monitoring and Diagnosis of Rotating Machines; Bearings, Pumps, Gearboxes, Engines and Rotating Structures. 2006

[4] Filippetti F, Franceschini G, Tassoni C. A survey of AI techniques approach for induction machine on-line diagnosis. Proceedings of Power Electronics and Motion Control PEMC. 1996;**2**:314-318

[5] Siddique A, Yadava G, Singh B. Applications of artificial intelligence techniques for induction machine stator fault diagnostics: Review. Diagnostics for Electric Machines, Power Electronics and Drives, SDEMPED 2003, 4th IEEE International Symposium on. 2003: 29-34

[6] Kok JN, Boers EJ, Kosters WA, van der Putten P, Poel M. Artificial intelligence: Definition, trends, techniques, and cases. Artificial Intelligence. 2009

[7] Pham D, Pham P. Artificial intelligence in engineering. International Journal of Machine Tools and Manufacture. 1999;**39**(6):937-949

[8] Saxena A, Saad A. Genetic algorithms for artificial neural net-based condition monitoring system design for rotating mechanical systems. Applied Soft Computing Technologies: The Challenge of Complexity. Springer; 2006. pp. 135-149

[9] Aguiar PR, Martins CH, Marchi M, Bianchi EC. Digital Signal Processing for Acoustic Emission; 2012

[10] Kouroussis D, Anastassopoulos A, Lenain J, Proust A. Advances in Classification of Acoustic Emission Sources. Reims: Les Journées COFREND; 2001

[11] Al-Balushi K, Samanta B. Gear fault diagnostics using wavelets and artificial neural network. COMADEM 2000, 13th International Congress on Condition Monitoring and Diagnostic Engineering Management. 2000: 1001-1010


[12] Mahamad AKB. Diagnosis, Classification and Prognosis of Rotating Machine using Artificial Intelligence. Kumamoto University; 2010

[13] Abu-Mahfouz I. Condition monitoring of a gear box using vibration and acoustic emission based artificial neural network. SAE Transactions. 2001;**110**(6):1771-1781

[14] Menon S, Schoess JN, Hamza R, Busch D. Wavelet-based acoustic emission detection method with adaptive thresholding. SPIE's 7th Annual International Symposium on Smart Structures and Materials. 2000. pp. 71-77

[15] Blahacek M, Chlada M, Prevorovský Z. Acoustic emission source location based on signal features. Advanced Materials Research. 2006;**13**:77-82

[16] Fog TL, Brown E, Hansen H, Madsen L, Sørensen P, Hansen E, Steel J, Reuben R, Pedersen P. Exhaust Valve leakage Detection in Large Marine Diesel Engines. COMADEM´98, 11th Int. Conf. on Condition Monitoring and Diagnostic Engineering Management. pp. 269-279

[17] Kouroussis D, Anastassopoulos A, Vionis P, Kolovos V. Unsupervised Pattern recognition of acoustic emission from full scale testing of a wind turbine blade. Journal of Acoustic Emission (USA). 2000;**18**:217

[18] Wang J-Z, Wang L-S, Li G-f, Zhou G-H. Prediction of surface roughness in cylindrical traverse grinding based on ALS algorithm. Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on, 1. 2005: 549-554

[19] Wang Z, Willett P, DeAguiar PR, Webster J. Neural network detection of grinding burn from acoustic emission. International Journal of Machine Tools and Manufacture. 2001;**41**(2):283-309

[20] Dotto FR, Aguiar PR d, Bianchi EC, Serni PJ, Thomazella R. Automatic system for thermal damage detection in manufacturing process with internet monitoring. Journal of the Brazilian Society of Mechanical Sciences and Engineering. 2006;**28**(2):153-160

[21] Kwak J-S, Ha M-K. Neural network approach for diagnosis of grinding operation by acoustic emission and power signals. Journal of Materials Processing Technology. 2004;**147**(1):65-71

[22] Aguiar P, França T, Bianchi E. Roughness and roundness prediction in grinding. Proceedings of the 5th CIRP International Seminar on Intelligent Computation in Manufacturing Engineering (CIRP ICME 06). 2006. pp. 25-28

[23] Aguiar PR, Cruz CE, Paula WC, Bianchi EC. Predicting Surface Roughness in Grinding using Neural Networks

[24] Goebel K, Wright PK. Monitoring and Diagnosing Manufacturing Processes Using a Hybrid Architecture with Neural Networks and Fuzzy Logic. EUFIT, Proceedings. 1993. 2

[25] Walker JL, Russell SS, Workman GL, Hill EV. Neural network/acoustic emission burst pressure prediction for impact damaged composite pressure vessels. Materials Evaluation. 1997;**55**(8):903-907

[26] Shen G, Duan Q, Zhou Y, Li B, Liu Q, Li C, Jiang S. Investigation of artificial neural network pattern recognition of acoustic emission signals for pressure vessels. NDT. 2001;**23**:144-146

[27] Macías EJ, Roca AS, Fals HC, Fernández JB, Muro JC. Neural networks and acoustic emission for modelling and characterization of the friction stir welding process. Revista Iberoamericana de Automática e Informática Industrial RIAI. 2013;**10**(4):434-440

[28] Tian Y, Lewin P, Davies A, Sutton S, Swingler S. Application of acoustic emission techniques and artificial neural networks to partial discharge classification. Electrical Insulation, 2002. Conference Record of the 2002 IEEE International Symposium on: 119-123. 2002

[29] Szyszko S, Payne P. Artificial neural networks for feature extraction from acoustic emission signals. Measurements, Modelling and Imaging for Non-Destructive Testing, IEE Colloquium on. 1991:6/1-6/6

[30] Ali YH, Abd Rahman R, Hamzah RIR. Artificial neural network model for monitoring oil film regime in spur gear based on acoustic emission data. Shock and Vibration. 2015;**2015**

[31] Ali YH, Rahman RA, Hamzah RIR. Regression modeling for spur gear condition monitoring through oil film thickness based on acoustic emission signal. Modern Applied Science. 2015;**9**(8):21

[32] Ahmed FY, Yusob B, Hamed HNA. Computing with spiking neuron networks a review. International Journal of Advances in Soft Computing & Its Applications. 2014:6(1)

[33] Kasabov N. To spike or not to spike: A probabilistic spiking neuron model. Neural Networks. 2010;**23**(1):16-19

[34] Kasabov N. Integrative connectionist learning systems inspired by nature: current models, future trends and challenges. Natural Computing. 2009;**8**(2):199-218

[35] Kojima H, Katsumata S. An analysis of synaptic transmission and its plasticity by glutamate receptor channel kinetics models and 2-photon laser photolysis. In: International Conference on Neural Information Processing. 2008. Springer

[36] Ikegaya Y et al. Statistical significance of precisely repeated intracellular synaptic patterns. PloS One. 2008;**3**(12):e3983

[37] Ahmed FY, Shamsuddin SM, Hashim SZM. Improved SpikeProp for using particle swarm optimization. Mathematical Problems in Engineering. 2013;**2013**

[38] Izhikevich E. Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting (Computational Neuroscience): The MIT Press. 2006

[39] Izhikevich EM, Edelman GM. Large-scale model of mammalian thalamocortical systems. Proceedings of the National Academy of Sciences. 2008;**105**(9):3593-3598

[40] Kasabov N. Evolving spiking neural networks and neurogenetic systems for spatio-and spectro-temporal data modelling and pattern recognition. In: IEEE World Congress on Computational Intelligence. 2012. Springer


[41] Silva RG. Condition monitoring of the cutting process using a self-organizing spiking neural network map. Journal of Intelligent Manufacturing. 2010;**21**(6):823-829

[58] Khalifa S, Komarizadeh MH. An intelligent approach based on adaptive neuro-fuzzy inference systems (ANFIS) for walnut sorting. Australian Journal of Crop Science. 2012;**6**(2)

[59] Vapnik V. Statistical Learning Theory. New York, NY: Wiley; 1998

[60] Burges CJ. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery. 1998;**2**(2):121-167

[61] Saravanan N, Kumar Siddabattuni V, Ramachandran K. A comparative study on classification of features by SVM and PSVM extracted using Morlet wavelet for fault diagnosis of spur bevel gear box. Expert Systems with Applications. 2008;**35**(3):1351-1366

[62] Widodo A, Yang B-S, Kim EY, Tan AC, Mathew J. Fault diagnosis of low speed bearing based on acoustic emission signal and multi-class relevance vector machine. Nondestructive Testing and Evaluation. 2009;**24**(4):313-328

[63] Widodo A, Kim EY, Son J-D, Yang B-S, Tan AC, Gu D-S, Choi B-K, Mathew J. Fault diagnosis of low speed bearing based on relevance vector machine and support vector machine. Expert Systems with Applications. 2009;**36**(3):7252-7261

[64] Yu Y, Zhou L. Acoustic emission signal classification based on support vector machine. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2012;**10**(5):1027-1032

[65] Chu-Shu K. A machine learning approach for locating acoustic emission. EURASIP Journal on Advances in Signal Processing. 2010;**2010**

[66] Yang Z, Yu Z. Grinding wheel wear monitoring based on wavelet analysis and support vector machine. The International Journal of Advanced Manufacturing Technology. 2012;**62**(1-4):107-121

[67] Yu Y, Zhou L. Acoustic emission signal classification based on support vector machine. Computer Engineering and Technology (ICCET), 2010 2nd International Conference, 16-18 April Chengdu. 2010;**6**:300-304

[68] Elforjani M, Shanbr S. Prognosis of bearing acoustic emission signals using supervised machine learning. IEEE Transactions on Industrial Electronics. 2017

[69] Ali YH, Ali SM, Rahman RA, Hamzah RIR. Acoustic Emission and Artificial Intelligent Methods in Condition Monitoring of Rotating Machine–A Review. National Conference for Postgraduate Research 2016, Universiti Malaysia Pahang. Malaysia. 2016


[41] Silva RG. Condition monitoring of the cutting process using a self-organizing spiking neural network map. Journal of Intelligent Manufacturing. 2010;**21**(6):823-829

[42] Silva RG, Wilcox S, Araújo AA. Multi-sensor condition monitoring using spiking neuron

[43] Ming ZX. Application of Acoustic Emission Technique in Fault Diagnostics of Rolling

[44] Sibil A, Godin N, R'Mili M, Maillet E, Fantozzi G. Optimization of acoustic emission data clustering by a genetic algorithm method. Journal of Nondestructive Evaluation.

[46] Zadeh LA. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics. 1973;(1):28-44

[48] Hellmann M. Fuzzy Logic Introduction" a Laboratoire Antennes Radar Telecom. FRE

[49] Cusido J, Delgado M, Navarro L, Sala V, Romeral L. EMA fault detection using fuzzy

[50] Omkar S, Suresh S, Raghavendra T, Mani V. Acoustic emission signal classification using fuzzy C-means clustering. Neural Information Processing, 2002. ICONIP'02. Proceedings

[51] De Aguiar PR, Bianchi EC, Canarim RC.Monitoring of Grinding Burn by Acoustic Emission [52] Ren Q, Baron L, Balazinski M. Fuzzy identification of cutting acoustic emission with extended subtractive cluster analysis. Nonlinear Dynamics. 2012;**67**(4):2599-2608

[53] Ren Q, Baron L, Balazinski M. Application of type-2 fuzzy estimation on uncertainty in machining: An approach on acoustic emission during turning process. Fuzzy Information Processing Society, 2009. NAFIPS 2009. Annual Meeting of the North American. 2009: 1-6

[54] Ren Q, Baron L, Balazinski M. Type-2 fuzzy modeling for acoustic emission signal in precision manufacturing. Modelling and Simulation in Engineering. 2011;**2011**:17 [55] Ren Q, Balazinski M, Baron L, Jemielniak K, Botez R, Achiche S. Type-2 fuzzy tool condition monitoring system based on acoustic emission in micromilling. Information

[56] Ren Q, Baron L, Balazinski M, Jemielniak K. Acoustic emission signal feature analysis using type-2 fuzzy logic system. Fuzzy Information Processing Society (NAFIPS), 2010

[57] Blahacek M, Prevorovsky Z, Krofta J, Chlada M. Neural network localization of noisy AE events in dispersive media. Journal of Acoustic Emission(USA). 2000;**18**:279

networks. In: IADIS international conference applied computing 2007. 2007

Bearing. Master's thesis. Tsinghua University, Beijing, Haidian; 2006

[47] Zadeh LA. Fuzzy algorithms. Information and Control. 1968;**12**(2):94-102

[45] Zadeh LA. Fuzzy sets. Information and Control. 1965;**8**(3):338-353

inference tools. AUTOTESTCON, 2010 IEEE. 2010: 1-6

of the 9th International Conference on, 4. 2002:1827-1831

2012;**31**(2):169-180

290 Artificial Intelligence - Emerging Trends and Applications

CNRS. 2272

Sciences. 2014;**255**:121-134

Annual Meeting of the North American. 2010: 1-6


**Section 3**

## **Life and Health Sciences**

**Chapter 15**



© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **Normal Versus Abnormal ECG Classification by the Aid of Deep Learning**

Linpeng Jin and Jun Dong

DOI: 10.5772/intechopen.75546

Additional information is available at the end of the chapter

#### Abstract

With the development of telemedicine systems, collected ECG records are accumulated on a large scale. Aiming to lessen domain experts' workload, we propose a new method based on the lead convolutional neural network (LCNN) and rule inference for classification of normal and abnormal ECG records of short duration. First, two different LCNN models are obtained through different filtering methods and different training methods, and then the multipoint-prediction technology and the Bayesian fusion method are successively applied to them. As beneficial complements, four newly developed disease rules are also involved. Finally, we utilize the bias-average method to output the predictive value. On the Chinese Cardiovascular Disease Database with more than 150,000 ECG records, our proposed method yields an accuracy of 86.22% and an AUC (area under the ROC curve) of 0.9322, comparable to the state-of-the-art results for this subject.

Keywords: telemedicine, cardiovascular disease, ECG, deep learning, convolutional neural network, rule inference, classification

## 1. Introduction

As an inexpensive, noninvasive and well-established diagnostic tool for cardiovascular disease, the electrocardiogram (ECG) has been widely applied since the 1980s. In clinics, its main applications are as follows: (1) long-time ECG monitoring, such as 24-hour ambulatory ECG, and (2) short-term ECG recording, such as the standard 10-second, 12-lead ECG.

With the improvement of living standards, more and more people pay attention to their own health, and an ECG test is the preferred method for preventing cardiovascular disease. Although it is easy to collect an ECG now, diagnostic conclusions cannot be accurately provided, since domain experts are in short supply, especially in the basic community medical insurance system (BCMIS) of China. A feasible solution is to send ECG records to domain experts in telemedicine centers, who send back their diagnostic conclusions via the Internet. In fact, such institutions are common in China now, for instance, Shanghai Aerial Hospital Network and Henan Telemedicine Center. But in telemedicine centers, a large number of ECG records need to be interpreted, and the workload of domain experts will be very heavy considering the huge possible audience. Since ECG records are mainly collected from people attending physical examinations, their diagnostic conclusions are likely to be "normal." If computer-assisted ECG analysis algorithms filter out most normal records so that domain experts only focus on interpreting the remaining abnormal ones, that is, man–machine integration [1], the diagnostic efficiency will be greatly increased and the social benefits will be significant. The key technical indicator is that the detection precision of normal records must be at least 90% for long-time ECG monitoring and 95% for short-term ECG recording [2].
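The 95% precision bar can be made concrete with a toy calculation (all counts below are illustrative, not from the chapter): the precision of the "normal" class is the fraction of filtered-out records that really are normal, and the share of records left for experts measures the workload reduction.

```python
# Man-machine integration in numbers: the algorithm filters out records it
# labels "normal"; experts read only the rest. All counts are illustrative.
def triage(pred_normal_true_normal, pred_normal_true_abnormal, total):
    flagged = pred_normal_true_normal + pred_normal_true_abnormal
    precision_normal = pred_normal_true_normal / flagged
    expert_share = (total - flagged) / total   # workload left for experts
    return precision_normal, expert_share

# 10,000 screening records; 7,000 filtered out as normal, 6,650 correctly.
prec, share = triage(6650, 350, 10000)
print(round(prec, 4), round(share, 2))  # 0.95 0.3
```

With these (made-up) counts the filter meets the 95% requirement for short-term recording while cutting the expert workload to 30% of the original volume.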




As heartbeat classification is an important step in computer-based ECG analysis regardless of application scenario, a considerable amount of research has been dedicated to this subject. In general, the processing flow of these works is as follows: feature vectors, including physiology characteristics with diagnostic value (such as the RR interval and PR interval) [3] and statistical characteristics (such as wavelet transform [4] and independent component analysis [5]), are first extracted from heartbeat segments, and feature selection [6] is conducted when necessary. Afterwards, machine learning algorithms such as the support vector machine (SVM) [7] and neural networks [8] are employed for classification. The relevant literature can be categorized into two types based on the adopted evaluation scheme, namely, "intra-patient" and "inter-patient" [9].
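As a minimal illustration of this classical pipeline, the sketch below extracts hand-crafted RR-interval features from synthetic R-peak sequences and classifies them with a nearest-centroid rule standing in for the SVM; the data, features and classifier are all illustrative, not the chapter's code.

```python
# Classical feature-based heartbeat/rhythm classification, sketched:
# hand-crafted RR features -> simple classifier. Entirely synthetic.
import random
import statistics as stats

random.seed(0)

def rr_features(r_peak_times):
    """Physiology features from R-peak times (s): mean and std of RR intervals."""
    rr = [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]
    return (stats.mean(rr), stats.stdev(rr))

def make_record(regular):
    """Synthetic record: regular rhythm (small RR jitter) vs. irregular."""
    jitter = 0.02 if regular else 0.25
    t, peaks = 0.0, []
    for _ in range(12):
        t += max(0.2, random.gauss(0.8, jitter))
        peaks.append(t)
    return rr_features(peaks)

train = [(make_record(r), 0 if r else 1) for r in (True, False) for _ in range(50)]

# Nearest-centroid classifier stands in for the SVM mentioned in the text.
def centroid(samples):
    return tuple(stats.mean(col) for col in zip(*samples))

c0 = centroid([x for x, y in train if y == 0])
c1 = centroid([x for x, y in train if y == 1])

def predict(x):
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 < d1 else 1

test = [(make_record(r), 0 if r else 1) for r in (True, False) for _ in range(25)]
acc = sum(predict(x) == y for x, y in test) / len(test)
print(acc)
```

The point of the sketch is the shape of the pipeline (segment, featurize, classify), not the accuracy of this toy model.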

The intra-patient evaluation has been adopted by extensive literature; its main characteristic is that the training and testing sets contain heartbeat segments from the same patients. Under this scheme, which does not conform to clinical practice, classification results tend to be overly optimistic, since the human heartbeat can even be used for identity recognition [10]. Using the MIT-BIH Arrhythmia (MIT-BIH-AR) database [11] and the Association for the Advancement of Medical Instrumentation (AAMI) recommendation [12], de Chazal et al. [9] proposed the inter-patient evaluation, where the training and testing sets are constructed from heartbeat segments of different ECG records so that inter-individual variation is taken into account. This scheme, adopted by recent works [5, 6, 8, 13, 14], can evaluate the clinical performance of heartbeat classification algorithms in a relatively effective manner.
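The difference between the two evaluation schemes comes down to how the split is made. A small sketch with synthetic heartbeat segments (field names and counts are illustrative):

```python
# Intra-patient vs. inter-patient evaluation splits, sketched on
# synthetic heartbeat segments tagged with a patient ID.
import random

random.seed(1)
beats = [{"patient": p, "beat": b} for p in range(10) for b in range(30)]

# Intra-patient: shuffle beats, so the same patient lands in both sets.
random.shuffle(beats)
intra_train, intra_test = beats[:200], beats[200:]
shared = {x["patient"] for x in intra_train} & {x["patient"] for x in intra_test}

# Inter-patient (de Chazal-style): split by patient, never by beat.
patients = list(range(10))
random.shuffle(patients)
train_ids, test_ids = set(patients[:7]), set(patients[7:])
inter_train = [x for x in beats if x["patient"] in train_ids]
inter_test = [x for x in beats if x["patient"] in test_ids]

print(len(shared) > 0)       # intra-patient: patients overlap across sets
print(train_ids & test_ids)  # inter-patient: no overlap
```

The overlap in the first split is exactly why intra-patient results look optimistic: the classifier has already seen beats from every test patient.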

Note that heartbeat classification is just an intermediate step; what domain experts care about is the classification result for an ECG record provided by computer-assisted ECG analysis algorithms. However, due to the limited number of standard ECG databases, there is less research relevant to this subject than to heartbeat classification [15–17]. The MIT-BIH-AR database contains a total of 48 two-lead ECG records of approximately 30 min in duration, each with heartbeat annotations for both R-peak position and disease type. The lead configurations (sensor locations) are not all the same, and only 40 records have both leads II and V1. Based on this database, we cannot use the obtained classifier for telemedicine centers. The Common Standards for Quantitative ECG (CSE) database [18] contains approximately 1000 standard 10-second, 12-lead ECG records, each with annotations for the P, QRS and T positions. However, this database is not free, which prevents many scholars from carrying out research on it. Other databases (such as the American Heart Association (AHA) database [19] and the ST-T database [20]) also have their own advantages and disadvantages.

Aiming to carry out relevant research for telemedicine centers, we constructed the Chinese Cardiovascular Disease Database (CCDD) [21] containing 193,690 standard 12-lead ECG records of about 10–20 s in duration. As shown in Figure 1, an ECG record consists of six limb leads (I, II, III, aVR, aVL, aVF) and six chest leads (V1, V2, V3, V4, V5, V6) that jointly describe cardiac electrical activity. If a record is collected for 10 s at a sampling frequency of 500 Hz, it contains 60,000 (= 500 × 10 × 12) sampling points.
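The record size follows directly from the sampling parameters above; a one-line check:

```python
# Size of one CCDD record: 12 leads sampled at 500 Hz for 10 s.
leads, fs_hz, duration_s = 12, 500, 10
samples = leads * fs_hz * duration_s
print(samples)  # 60000
```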

Figure 1. Standard 12-lead synchronous ECG.

In this study, we develop a method for classification of normal and abnormal ECG records with short duration (record classification), but one that can also be easily extended to other cases, since a long-term ECG record can be divided every T s into segments of length T [12]. Based on the CCDD, our research group has tried and proposed some methods for this subject. Using existing methods for heartbeat classification [22, 23] and the average-prediction technology, Zhu [24] obtained an accuracy of 72% with a specificity of 94% and a sensitivity of 25% when testing 11,760 ECG records. Wang [25] proposed a Multi-Way Tree Based Hybrid Classifier (MTHC), including analysis of RR intervals, similarity analysis of the QRS complex, analysis of physiology characteristics (such as the P wave, T wave and PR interval) and statistical learning based on morphological and numerical characteristics. The specificity, sensitivity and accuracy are 54.62, 95.13 and 72.49%, respectively, on 140,098 ECG records. Zhang [13] proposed a heartbeat classification method which has certain advantages over the relevant literature [5, 6, 8, 9, 14], but the accuracy is about 50–60% when it is used for record classification. For this reason, our prior work [26] analyzed traditional feature-based methods in terms of their ability to construct nonlinear functions and proposed lead convolutional neural networks (LCNNs) for multi-lead ECGs. Using the explicit training method and the single-point-prediction technology, we achieved an accuracy of 83.66% with a specificity of 83.84% and a sensitivity of 83.43% when testing 151,274 records. Figure 2 depicts the whole process flow.

Figure 2. LCNN-based method for record classification.
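The three figures quoted (accuracy, specificity, sensitivity) are standard confusion-matrix quantities. The sketch below recomputes them from counts chosen to land near the reported values; these counts are illustrative, not the actual test-set tallies.

```python
# Accuracy, specificity and sensitivity from a confusion matrix, with
# "abnormal" as the positive class. Counts are illustrative only.
def metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)               # abnormal records caught
    specificity = tn / (tn + fp)               # normal records passed through
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

sens, spec, acc = metrics(tp=417, tn=419, fp=81, fn=83)
print(round(sens, 4), round(spec, 4), round(acc, 4))  # 0.834 0.838 0.836
```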

To improve the classification performance further, this study extends our prior work in the following aspects. (1) Two different LCNN models are obtained through different filtering methods (a low-pass filter and a band-pass filter) and different training methods (the explicit method and the implicit method), and then the multipoint-prediction technology and the Bayesian fusion method are successively applied to them. (2) Four effective disease rules based on R peaks are developed for further analysis. (3) The final classification result is determined by utilizing the bias-average method.
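This excerpt does not spell out the fusion formulas, so the sketch below is an assumption-laden illustration only: "Bayesian fusion" is realized here as independent-evidence fusion of two model probabilities, and "bias-average" as a mean shifted by a fixed offset; the function names, the bias value and the sample probabilities are all invented for illustration.

```python
# Hedged sketch of combining two model outputs as in steps (1)-(3) above.
# Independent-evidence Bayes fusion and the bias term are assumptions,
# not the authors' published formulas. Probabilities must lie in (0, 1).
def bayes_fuse(p1, p2):
    """Fuse two probabilities of 'abnormal' assuming independent evidence."""
    odds = (p1 * p2) / ((1 - p1) * (1 - p2))
    return odds / (1 + odds)

def bias_average(probs, bias=0.0):
    """Average the scores, shifted by a bias toward the abnormal class
    (a conservative choice in a screening setting)."""
    return sum(probs) / len(probs) + bias

p_lowpass, p_bandpass = 0.70, 0.80  # two LCNN models' outputs (illustrative)
fused = bayes_fuse(p_lowpass, p_bandpass)
print(round(fused, 4))  # 0.9032: two agreeing models reinforce each other
print(bias_average([fused, 0.85], bias=0.05) > 0.5)
```

Note the qualitative behavior the fusion is meant to capture: two models that independently lean the same way yield a combined score more confident than either alone.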

The diagnostic conclusions fall into 11 first-grade types (Normal ECG, Sinus Rhythm, Atrial Arrhythmia, Junctional Rhythm, Ventricular Arrhythmia, Conduction Block, Atrial Hypertrophy, Ventricular Hypertrophy, Myocardial Infarction, ST-T Change and Other Abnormalities), 72 second-grade types and 335 third-grade types, covering all the possible diagnostic conclusions provided by cardiologists in clinic. More details can be


In telemedicine centers, ECG records that are not explicitly diagnosed as "normal" all need to be further interpreted by domain experts, so that the potential cardiovascular disease of a patient can be detected as early as possible. Therefore, in this study we only regard ECG records whose diagnostic conclusion is "001", "0020101" or "0020102" as "normal" (denoted as 0-class) and all the others as "abnormal" (denoted as 1-class). Moreover, we discard some exception data, that is, records whose diagnostic conclusion is "000" or whose duration is less
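The 0-class/1-class mapping described above is mechanical enough to sketch directly. The three normal codes and the "000" exception code come from the text; the other conclusion codes in the example are made-up placeholders:

```python
# Label mapping for record classification: "001", "0020101", "0020102"
# -> normal (0-class); "000" -> exception data, dropped; else -> abnormal.
NORMAL_CODES = {"001", "0020101", "0020102"}

def map_label(conclusion):
    if conclusion == "000":
        return None                      # exception data, discarded
    return 0 if conclusion in NORMAL_CODES else 1

# "051101" and "022" are hypothetical abnormal codes for illustration.
records = ["001", "0020101", "051101", "000", "0020102", "022"]
labels = [map_label(c) for c in records]
print(labels)  # [0, 0, 1, None, 0, 1]
```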

The training and testing sets are organized as follows [26]: "data944–25,693" is partitioned into three parts where the numbers of training samples, validation samples and testing samples (i.e., the small-scale testing set) are 12,320, 560 and 11,789, respectively, and the large-scale testing set is composed of 151,274 ECG records from data25694–179,130. Note that we will combine training samples and validation samples together for implicit training. Table 1
