A Robotics-Based Machine Learning Approach for Fall Detection of People

*Teddy Ordoñez Nuñez, Raimundo Celeste Ghizoni Teive and Alejandro Rafael Garcia Ramirez*

## **Abstract**

When carrying out household chores or even walking on the street, a person is at risk of falling, and this risk increases over the years due to the natural aging process. In this work, a bibliographic review was performed to find related papers that discussed different techniques for fall classification. The aim of this study was to develop two ML models, an SVM and a k-NN model, to classify falls. The application sensors were an accelerometer, a gyroscope, and a magnetometer located on the waists of 15 volunteers. The extracted features were the mean, standard deviation, and range for each sensor. The best accuracy obtained was 93.89%, with a sensitivity of 85.10% and a specificity of 96.99%. All results were obtained by simulation, using the test set separated in the first stage of the implementation. A shortcoming is therefore that the ML models were not tested in a hardware implementation. In future work, the models can be embedded into a microcontroller to classify data in real time.

**Keywords:** k-NN, SVM, inertial measurement unit, elderly, falls, wearables

## **1. Introduction**

As the years go by, the body becomes weaker and physical health declines. This leads to new problems and challenges for the elderly, because there comes a time when they need to be more cautious, and not everyone can be. It is in this context that falls among the elderly are becoming more and more frequent, with consequences that go well beyond a scrape. People over 60 are gradually becoming more vulnerable to falls [1].

Falls among the elderly happen suddenly and are very frequent. According to Ref. [2], about 30% of people over 65 years old suffer a fall at least once a year, increasing to 50% when they are over 80. Falls are a problem of worldwide interest, which brings consequences to people and governments due to the heavy investment to recover its citizens. Therefore, researchers are always looking for solutions to improve people's quality of life.

In 1991, the authors of Ref. [3] began studying the use of wearable sensors to address this problem. Other works in this field are Refs. [4, 5], which proposed a protocol for evaluating the performance of any developed system.

Usually, these devices are high-end, and their cost makes them difficult for low-income consumers to acquire. The two most popular ways to detect falls are video analysis [6] and measuring signals from an accelerometer placed on the body [7]. There are vast possibilities for integrating these devices with machine learning (ML) techniques to correctly classify the data received from video streams or body-worn sensors.

## **2. Falls**

"Fall detection involves complex pattern recognition, which tends to vary according to each individual who suffers a fall" [8]. According to Ref. [1], falls can be defined as "an event that results in a person unintentionally stopping their activities on the ground, floor, or a lower level." Falls can also be defined as "falling to the floor or some other lower level as a consequence of receiving a violent blow, loss of consciousness, or paralysis such as a stroke or an epileptic seizure" [9]. Approximately 684,000 fatal falls occur each year, with 80% of these fatalities concentrated in low- and middle-income countries [1].

According to Ref. [9], most falls happen in the sagittal and coronal planes, as shown in **Figure 1**. These names come from human anatomy. It is worth noting that when a fall occurs with loss of consciousness, as described in Ref. [9], the body suffers more, due to the lack of impact absorption, since the body falls directly to the ground. When a fall happens while the person is conscious, they can absorb the impact, for example by stretching their arms to protect themselves when falling forwards.

**Figure 1.** *Sagittal and coronal planes of the human body.*

## *A Robotics-Based Machine Learning Approach for Fall Detection of People DOI: http://dx.doi.org/10.5772/intechopen.106799*

Serious injuries include traumatic brain injury, concussion, hemorrhages, and cuts [6]. In Brazil, the Sistema Único de Saúde (SUS) spends more than R\$51 million annually treating fractures caused by falls [6]. According to Ref. [10], approximately one in three adults who live at home suffers a fall annually, and about half of them will experience falls more frequently. According to Ref. [1], numerous factors can influence a person to suffer a fall, most prominently age, gender, and health.

## **2.1 Factors that contribute to falls**

According to Ref. [9], age alone is not enough to describe the risk of a person falling; whether a person is more likely to fall depends on several other factors. It is still worth noting that the risk of an elder suffering a fall is higher due to the inherent aging process. The factors that contribute to a fall can be separated into two categories: intrinsic and extrinsic [6, 9].

Intrinsic factors are those that depend on the person, such as medication use, low muscle mass percentage, dizziness, and lightheadedness [6]. Among these factors, Ref. [9] also includes osteoporosis, Parkinson's, dementia and cognitive problems, inadequate lifestyle, vision problems, chronic diseases, and previous falls. An inadequate lifestyle is directly linked to a sedentary lifestyle since physical activity helps to strengthen muscles [6].

Extrinsic factors are external to the individual [6]. Among them are slippery floors, stairs, inadequate footwear, crowded places, low-light conditions, and damaged sidewalks [1]. Sidewalks in poor condition are a worrying problem in Brazil: a study conducted by Ref. [11] found that the average score attributed to sidewalks in several cities, on a scale of 1 to 10, is 3.40, whereas a good score would be 8.0 [11].

## **2.2 Consequences of falling**

A fall can have several consequences. Falls resulting from accidents are one of the main reasons for hospital admissions and the leading cause of death among people over 65 years old [9]. Among the types of consequences, Refs. [6, 9] emphasize physical and psychological damage, and Ref. [9] also mentions financial losses. Serious injuries are physical consequences; the most common minor wounds are bruises and scrapes [9], while serious injuries include concussions, bleeding, skull trauma, and fractures [6].

According to Ref. [6], the most common psychological consequence is the fear of suffering new falls, while Ref. [9] also mentions lower quality of life, loss of independence, low self-esteem, and limited abilities. The economic implications are just as important as the others because of the medical expenses involved, among them rehabilitation therapies, medical examinations, hospitalizations, and the purchase of medical equipment [9]. These arguments make fall prevention a necessity. **Figure 2** shows an example of a fall registered by the three sensors considered in this work; for every sensor, there are three individual graphs.

In **Figure 2**, one can observe graphs created from the accelerometer, gyroscope, and magnetometer readings while simulating a forward fall caused by fainting or syncope. The three sensors are located on the person's waist.

In this example, the volunteer stands until the ninth second. When this mark is reached, the person falls forwards, simulating a loss of consciousness. At this moment, there is an abrupt change in the sensors' readings, and the accelerometer's value reaches its peak at ±5 g. At around the 10th second, the volunteer hits the ground and remains in this position (this scenario did not consider recovery after impact).

## **2.3 Related works**

Bibliographic research was carried out through the Univali Integrated Library System (SIBIUN), which searches the Univali collection, CAPES Portal, EBSCO, Biblioteca A, Saraiva, Vlex, Scielo Livros, Scielo Periodicals, and Open Access directories. The search string "Machine Learning" AND "Fall Classification" was used, yielding 184 results. After reading the abstracts, four relevant studies were selected.

In Ref. [12], data were collected from an accelerometer, a gyroscope, and a magnetometer. This group of sensors was placed at five locations on the volunteers' bodies: head, chest, waist, wrist, and legs. The authors used six different ML techniques: k-nearest neighbors (k-NN), support vector machines (SVM), the least squares method, Bayesian decision making, dynamic time warping, and artificial neural networks. Overall, the work achieved excellent results, with an accuracy of 99.91%, a sensitivity of 100%, and a specificity of 99.79% [12]. The best accuracy among the individual algorithms was achieved by k-NN, with 99.1% [12].


In Ref. [13], a similar work was carried out using the same sensors cited above. However, the authors placed the sensors only on the waist of the volunteers, since the human body's center of mass is located there. Three stages of a fall were used to perform the signal classification: impact, post-impact, and posture. The proposed solution is based on threshold comparison to identify each stage. It is worth noting that in Ref. [13], SVM was used to extract the thresholds for each phase. With this solution, the result was 100% accuracy, sensitivity, and specificity [13].

The work in Ref. [6] differs from the related studies: the authors used an accelerometer and a gyroscope embedded in a smartphone to capture and classify the sensor signals. A belt secured the smartphone to the volunteer's waist. Like Ref. [12], this study used the idle time. After detecting the inactivity period, the data were classified using a decision tree and a threshold classifier, and the actual orientation of the device was verified. If all verifications hold, a fall is notified. The system in Ref. [6] achieved an accuracy of 93.25%, a sensitivity of 95.45%, and a specificity of 87.65%.

The most recent work is Ref. [14]. The authors also used all three sensors and created the FallAllD dataset, which is available to the academic community. The volunteers wore the set of sensors on three parts of the body: chest, wrist, and waist. The authors explored four different ML techniques to classify falls: k-NN, SVM, a random forest classifier, and a convolutional neural network. Although all three sensors collected data, only the accelerometer readings were used to train the ML models, seeking a simpler operation. The authors obtained an accuracy of 89.70%, a sensitivity of 95.06%, and a specificity of 95.20% when applying the k-NN technique. The SVM with a quadratic kernel achieved an accuracy of 85.86%.

In Ref. [15], the authors demonstrate techniques not only to reliably detect a fall but also to automatically classify its type. Fifteen volunteers simulated four different types of falls (left and right lateral, forward trips, and backward slips) while wearing mobile phones. The authors applied five machine learning classifiers to a large time-series feature set to detect falls. Support vector machines and regularized logistic regression were able to identify a fall with 98% accuracy and classify its type with 99% accuracy.

In Ref. [16], the authors present a comprehensive literature review on various ML-based classifications in fall detection. The authors identify the main problems in threshold-based classification from existing works and find the need for an efficient ML-based classification technique to accurately identify the fall. In addition, the shortcomings associated with the ML-based techniques for future research and other problems, such as data preprocessing and data dimensionality reduction techniques, are investigated. They concluded that ML-based techniques are far superior to threshold-based techniques.

**Table 1** shows the comparison between the related works.

## **3. Development**

In this work, the Python programming language was used. Besides the built-in libraries, other resources were used to manipulate the data samples, create the ML models, and generate the confusion matrices. In particular, the Pandas library was used to manipulate the data. This library is popular among data scientists


*\* A = accelerometer, G = gyroscope, M = magnetometer, and B = barometer.*

#### **Table 1.**

*Comparison between related works and algorithms.*

#### **Figure 3.**

*Block diagram of the system.*

due to its reliability and ease of use. Another functionality of this library is the ability to handle missing samples and to calculate simple statistical characteristics.

The Scikit-learn library was also used in this work. It allows one to create, train, and test the ML models, provides access to different training, prediction, and evaluation techniques, and allows the dataset to be divided into training and test sets. The confusion matrices are also generated by a Scikit-learn function. Matplotlib was used to plot the confusion matrices, and, finally, Pickle allows developers to save and load datasets and ML models.
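As an illustration of that last capability, a minimal sketch of saving and reloading a trained model with Pickle follows; the file name and the tiny toy model are hypothetical, not the ones used in this work:

```python
import pickle

from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in model: a 1-NN classifier trained on two points
model = KNeighborsClassifier(n_neighbors=1).fit([[0, 0], [1, 1]], [0, 1])

# Save the trained model to disk ...
with open("knn_model.pkl", "wb") as f:
    pickle.dump(model, f)

# ... and load it back later, e.g. before running the final test
with open("knn_model.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored.predict([[0.9, 0.9]])[0] == 1
```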

Datasets available to the academic community were surveyed, and three were found in Refs. [12–14]. The dataset in Ref. [12] has the largest number of samples, but some of them lack relevant data. The dataset in Ref. [13], on the other hand, does not follow a consistent time-domain pattern in its sensor readings. In this work, the dataset created in Ref. [14] was used: it is recent and does not use mattresses to cushion the falls, making them more realistic. **Figure 3** depicts the block diagram of the proposed system.

The information extracted from the dataset contains the sensor readings from an accelerometer, a gyroscope, and a magnetometer. Next, feature extraction was performed to train and validate the ML models. After training, the models can classify the data into two categories: Fall or Activity of Daily Living (ADL).

In Ref. [14], developers can capture data from the wrist, waist, and chest. The data captured from the waist was created by 14 volunteers, who used safety equipment to prevent injuries. The authors chose not to use mattresses to cushion the falls, to be as realistic as possible. The volunteers were free to choose which ADLs or falls they wished to simulate: all 14 volunteers chose to simulate ADLs, and 12 of the 14 performed simulated falls. Every scenario was recorded for 20 seconds. During the first 9 seconds, the volunteer performed the movement being simulated; when the ninth second was reached, the volunteer mimicked a fall, and could then stay down or recover, depending on the type of fall being simulated.

The authors labeled the data samples within the dataset as ADLs or falls by using numbers as activity IDs. IDs ranging from 1 through 44 represent ADLs. Since we only consider samples recorded by the sensors located on the waist, the ADLs range from 13 through 44, because the activities labeled 1 through 12 were recorded by sensors located on the volunteers' wrists. Among these ADLs are activities such as walking, running, standing up from a chair, and jumping.

Falls were labeled from 100 through 135, covering the different types of falls that commonly occur in daily life. Volunteers simulated slipping, tripping, or losing balance while walking, falling forwards, backwards, and laterally. They also simulated falls while running, lying in bed, trying to sit down, or after standing for a while, again forwards, backwards, and laterally. It is also important to point out that falls with recovery were considered effectively as falls in this work.

It is important to mention that the 14 volunteers simulating ADLs and the 12 simulating falls had to repeat each scenario several times to obtain the best and most accurate result. They could decide how much time they needed to rest between trials, and could also decide the order in which to perform the activities [5]. Repetition becomes a factor, as described in Ref. [5], because volunteers can get used to the pattern of an activity, resulting in activities performed in an unnatural manner.

With this said, we created a new column labeling each sample as ADL or fall, represented by 0 and 1, respectively. For this, we implemented a for loop that compared the value stored in the activity ID column: if this value was greater than or equal to 100, the output column was set to 1; otherwise, 0 was attributed.
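This labeling step can be sketched as follows; the `ActivityID` column name and the sample IDs are illustrative, not necessarily FallAllD's actual field names:

```python
import pandas as pd

# Hypothetical rows: IDs 1-44 are ADLs, 100-135 are falls
df = pd.DataFrame({"ActivityID": [13, 44, 100, 135, 20]})

# For loop as described in the text
labels = []
for activity_id in df["ActivityID"]:
    labels.append(1 if activity_id >= 100 else 0)
df["Fall"] = labels

# A vectorized equivalent gives the same column
assert df["Fall"].tolist() == (df["ActivityID"] >= 100).astype(int).tolist()
```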

Since the volunteers performed the same activity several times, the best scenarios were chosen to compose the dataset. With this, the dataset has 1797 samples of simulated falls and ADLs. Three features were extracted to train the models: the mean, the standard deviation, and the range, computed for each of the three axes of each sensor. The dataset was divided into three parts for training, validation, and testing of the models.

It is worth mentioning that the dataset needed simple manipulation before those features were extracted. The original dataset published in Ref. [14] stores raw values in bytes, so that authors can adapt the dataset to their sensors' sensitivities. We considered the following sensitivities for the accelerometer, gyroscope, and magnetometer: 0.244 mg/LSB, 70 mdps/LSB, and 0.14 mgauss/LSB, respectively. Since the dataset was loaded as a Pandas dataframe, we multiplied every column by its corresponding sensitivity; after multiplying every data sample, we obtained the sensors' original readings.
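A sketch of this rescaling, using the sensitivities quoted above; the column names and raw values are illustrative:

```python
import pandas as pd

# Sensitivities from the text, expressed per LSB
SENS = {
    "Acc": 0.244e-3,  # g/LSB     (0.244 mg/LSB)
    "Gyr": 70e-3,     # dps/LSB   (70 mdps/LSB)
    "Mag": 0.14e-3,   # gauss/LSB (0.14 mgauss/LSB)
}

# Toy raw readings standing in for the byte-valued dataset columns
raw = pd.DataFrame({"Acc": [4096, -2048], "Gyr": [100, -50], "Mag": [7000, 3500]})

scaled = raw.copy()
for col, sensitivity in SENS.items():
    scaled[col] = scaled[col] * sensitivity  # back to physical units
```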

**Figure 4** shows part of the Dataframe structure. It has 1798 rows and 7 columns in total. It is important to remark that only the data collected by the sensors located at the waist of the participants were used in this work. Also, the barometer readings were not considered.


#### **Figure 4.**

*Example of the Dataframe structure.*


#### **Figure 5.**

*The modified Dataframe structure.*

Feature extraction was performed using the built-in functions of the Pandas library, which provides ready-to-use mean, standard deviation, minimum, and maximum functions. First, 27 new columns were added to the dataframe to store the features: nine per sensor, since each of the three features is computed for each of the three axes (e.g., three columns for the acceleration mean in x, y, and z, repeated for the standard deviation and the range of the accelerometer), giving a total of 27 columns across the three sensors.

We transformed the original column of each sensor, which contained all three axes, into three separate columns, one per axis. Next, we used the functions mentioned above to calculate the features. Since there is no built-in function to calculate the range, we took the maximum value and subtracted the minimum value. After these steps, the dataframe has all the features and is ready to use with the ML models.
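The per-axis feature computation might look like the sketch below, run on synthetic accelerometer samples (note that Pandas' `std` uses the sample standard deviation by default):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# One recording, here faked with random per-axis accelerometer samples
rec = pd.DataFrame(rng.normal(size=(200, 3)), columns=["Acc_x", "Acc_y", "Acc_z"])

features = {}
for axis in rec.columns:
    features[f"{axis}_mean"] = rec[axis].mean()
    features[f"{axis}_std"] = rec[axis].std()
    # No built-in range function: maximum minus minimum, as in the text
    features[f"{axis}_range"] = rec[axis].max() - rec[axis].min()

# 3 features x 3 axes = 9 columns per sensor (27 over the three sensors)
assert len(features) == 9
```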

**Figure 5** shows the Dataframe final state after including the accelerometer, gyroscope, and magnetometer features.

In **Figure 5**, it is possible to observe the pure data of the "Acc", "Gyr," and "Mag" sensors; however, the models will be trained with the columns that are on the right of those measures. A column called "Fall" identifies whether this event represents a fall or an ADL.

We used 80% of the data (not 80% of the volunteers) for training, 10% for validation, and the remaining 10% for final testing. It is important to note that the models were trained using the stratified k-fold cross-validation technique, with k

## **Figure 6.**

*k-fold cross-validation. Adapted from Ref. [17].*

equal to 10, to obtain a good balance among the output classes. The data for training and validation are divided into 10 folds, where k-1 folds are used for training and the remaining fold for validation. This task is repeated k times to complete the training of the models. **Figure 6** illustrates this process.
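Assuming Scikit-learn's `train_test_split`, `StratifiedKFold`, and `cross_val_score` were the calls behind this procedure (the chapter names the library but not the exact functions), the split and cross-validation could be sketched as:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 1797-sample, 27-feature dataset
X, y = make_classification(n_samples=1797, n_features=27, weights=[0.74], random_state=0)

# 80% training; the held-out 20% is halved into validation and final test
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=0)

# Stratified 10-fold cross-validation on the training portion
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(SVC(), X_train, y_train, cv=cv)
print(f"mean CV accuracy: {scores.mean():.4f}")
```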

To perform the data classification, two ML models were created: one uses the SVM classifier and the other uses k-NN. Both models use all the sensors' data with their respective features. The models were evaluated and compared with the results obtained by the authors of the related works.
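A sketch of creating and comparing the two classifiers with Scikit-learn defaults follows; the hyperparameters are assumptions, as the chapter does not state them, and synthetic data again stands in for the real feature set:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the 27-feature fall/ADL dataset
X, y = make_classification(n_samples=1797, n_features=27, weights=[0.74], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)

models = {"SVM": SVC(), "k-NN": KNeighborsClassifier(n_neighbors=5)}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, f"{model.score(X_te, y_te):.4f}")
```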

## **4. Results**

### **4.1 SVM**

In this work, the dataset was divided randomly. Through training and validation, it was possible to achieve an accuracy of 95.05% with the SVM model. However, this value cannot be considered the final accuracy, because the model must be submitted to a final test using data it has not previously seen; the purpose of this procedure is to evaluate how the model classifies unknown data.

The accuracy of the final test was 93.89%, with a sensitivity of 85.10% and a specificity of 96.99%. The accuracy indicates how many samples were correctly classified; the sensitivity is the ability to predict the true positives of each category; and the specificity is the ability to detect the true negatives of each category.

The confusion matrix is shown in **Figure 7**. This matrix was created from the results retrieved from the final test. It is possible to observe the true negatives, false negatives, false positives, and true positives, where 0 represents ADLs and 1 represents falls.

**Figure 7.** *Confusion matrix for the SVM model.*

It is possible to observe 129 true negatives, 4 false positives, 7 false negatives, and 40 true positives. This is a good result, because the model correctly classifies 40 of the 47 falls.
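These counts can be checked directly against the reported metrics:

```python
# Counts read from the SVM confusion matrix (Figure 7)
tn, fp, fn, tp = 129, 4, 7, 40

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 169/180
sensitivity = tp / (tp + fn)                 # 40/47
specificity = tn / (tn + fp)                 # 129/133

print(f"{accuracy:.2%} {sensitivity:.2%} {specificity:.2%}")
# Accuracy and specificity reproduce the reported 93.89% and 96.99%;
# sensitivity rounds to 85.11% (the chapter reports the truncated 85.10%)
```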

### **4.2** *k***-NN**

In this technique, we followed the same procedure as described before. After training, the model presented an accuracy of 88.45%. In the final test, with unknown data, it achieved an accuracy of 87.77%, a sensitivity of 82.98%, and a specificity of 89.47%. The results of the k-NN model were thus inferior to those of the SVM model: the final-test accuracy was 6.12 percentage points lower (87.77% versus 93.89%). The confusion matrix for this model is shown in **Figure 8**.

Compared to the SVM model, the number of false negatives increased by 1 and the number of false positives increased by 10.

### **4.3 Analysis**

To better understand the results, a comparison with the related works is necessary. It is worth noting that there is a discrepancy among the results of the related works using the different ML techniques. Likewise, it should be considered that each author used different features or methods to perform the data categorization (**Table 2**).

The best results are found in Ref. [12], since the authors of Ref. [13] did not base their solution on ML: their classifier is threshold-based, although the thresholds themselves were extracted using the SVM technique. In Ref. [12], the authors achieved an accuracy of 99.1%, and in Ref. [14], the accuracy was 89.70% when applying the k-NN technique. In this chapter, we achieved an accuracy of 87.77%, thus 11.33 percentage points below the result of Ref. [12] and 1.93 below the result of Ref. [14]. The results obtained here are comparable to those of Ref. [14], given the similar accuracies of the two studies.


#### **Figure 8.**

*Confusion matrix for the k-NN model.*


#### **Table 2.**

*Comparison between the best results achieved in the related works.*

The SVM performed better in this work, making a direct comparison with Ref. [14] possible: the best accuracy here was 93.89%, against 85.86% in Ref. [14]. The highest accuracy was achieved in Ref. [12] (99.48%). Different features were used for the accelerometer, magnetometer, and gyroscope sensors, considering that each related work used different features to train its ML technique. In Ref. [14], the authors used three features obtained from the accelerometer; in this work, we extracted three features for each of the three sensors.

Every work has its limitations, and this one is no exception. The simple statistical features can be considered a limitation due to their lack of precision in representing the original signal. Each recorded signal was 20 seconds long, as mentioned before, so representing it only by the chosen features may not be accurate enough. This limitation should be taken into account if the intent is a more realistic classifier.

A shortcoming of this work is that the ML models were not tested in a hardware implementation; all results were obtained by simulation, using the test set separated in the first stage of the implementation. The models can be embedded into a microcontroller to classify data in real time. A hardware implementation can yield different results, whether higher or lower than those obtained by simulation.

## **5. Conclusions**

An ML-based approach to the fall detection problem was presented in this work. A literature review made it possible to understand what lies behind a fall and its consequences for people, as well as to survey the ML techniques explored in the literature for this problem. Two models were created using different ML techniques, with the same training procedure for both: *k*-fold cross-validation with training, validation, and testing sets. Both models were trained on the data obtained from the accelerometer, gyroscope, and magnetometer.

The mean, standard deviation, and range were used as input features for the ML models. The results enable comparisons with the related studies; the best result was an accuracy of 93.89% for the SVM technique. Currently, an embedded system is being developed with an ESP32 microcontroller to communicate with the sensors, embed the classification algorithm, and send notifications.

This work can be extended by embedding the ML models in a physical device to test them in real time with live sensor readings, thereby obtaining more realistic results. To further improve this work, we recommend employing more features, as the authors of Ref. [12] did. With more characteristics it is possible to obtain better results, since more information about the sensor readings is available to the models, giving them a better understanding of what those characteristics represent and, therefore, a better separation of the possible outputs.

## **Acknowledgements**

We gratefully acknowledge the support of the Fundação de Amparo à Pesquisa do Estado de Santa Catarina (FAPESC), grant number 2021TR001236; and the National Council for Scientific and Technological Development (CNPq), grant numbers 305835/2021-1, 424937/2021-2.

## **Author details**

Teddy Ordoñez Nuñez, Raimundo Celeste Ghizoni Teive and Alejandro Rafael Garcia Ramirez\* Universidade do Vale do Itajaí, Itajaí, Brazil

\*Address all correspondence to: ramirez@univali.br

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **References**

[1] World Health Organization. Falls [Internet]. 2021. Available from: https://www.who.int/news-room/fact-sheets/detail/falls. [Accessed: May 20, 2020]

[2] Cabral K de N. Quedas dos idosos podem ser prevenidas [Internet]. 2019. Available from: https://www.hospitalsiriolibanes.org.br/sua-saude/Paginas/prevencao-quedas-idosos.aspx. [Accessed: November 13, 2020]

[3] Lord CJ, Colvin DP. Falls in the elderly: Detection and assessment. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Vol. 13. New York, USA: IEEE; 1991. pp. 1938-1939

[4] Williams G, Doughty K, Cameron K, Bradley DA. A smart fall and activity monitor for telecare applications. In: Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Vol. 20. Biomedical Engineering Towards the Year 2000 and Beyond (Cat No98CH36286). New York, USA: IEEE; 2000. pp. 1151-1154

[5] Noury N, Fleury A, Rumeau P, Bourke AK, Laighin GÓ, Rialle V, et al. Fall detection - principles and methods. In: Annual International Conference of the IEEE Engineering in Medicine and Biology - Proceedings. New York, USA: IEEE; 2007. pp. 1663-1666

[6] Leite GV. Detecção de Quedas de Pessoas em Vídeos Utilizando Redes Neurais Convolucionais com Múltiplos Canais [Internet]. Universidade Estadual de Campinas. 2020. Available from: http://repositorio.unicamp.br/bitstream/REPOSIP/341843/1/Leite\_GuilhermeVieira\_M.pdf

[7] Rodrigues Silva EK. Desenvolvimento de um sistema de detecção de quedas para idosos [Dissertação] [Internet]. Universidade Estadual da Paraíba. 2018. Available from: http://tede.bc.uepb.edu.br/jspui/handle/tede/3570

[8] Júnior CSDQI. Sistema de Detecção de Quedas de Idosos [TCC]. RS, Brazil: Universidade de Caxias do Sul; 2016

[9] Abbate S, Avvenuti M, Corsini P, Light J, Vecchio A. Monitoring of human movements for fall detection and activities recognition in elderly care using wireless sensor network: A survey. In: Wireless Sensor Networks: Application-Centric Design. London: InTech; 2010. pp. 147-166. Available from: http://www.intechopen.com/books/wireless-sensor-networks-application-centric-design/monitoring-of-human-movements-for-fall-detection-and-activities-recognition-in-elderly-care-using-wi

[10] National Health Service. Overview: Falls [Internet]. 2021. Available from: https://www.nhs.uk/conditions/falls/. [Accessed: May 15, 2021]

[11] Mobilize. Relatório final da campanha e estudo realizado pelo Mobilize Brasil [Internet]. 2013. Available from: https://www.mobilize.org.br/midias/pesquisas/relatorio-calcadas-do-brasil---jan-2013.pdf. [Accessed: May 18, 2021]

[12] Özdemir AT, Barshan B. Detecting falls with wearable sensors using machine learning techniques. Sensors (Switzerland). 2014;**14**(6):10691-10708

[13] Pierleoni P, Belli A, Palma L, Pellegrini M, Pernini L, Valenti S. A high reliability wearable device for elderly fall detection. IEEE Sensors Journal. 2015;**15**(8):4544-4553

[14] Saleh M, Abbas M, le Jeannes RB. FallAllD: An open dataset of human falls and activities of daily living for classical and deep learning applications. IEEE Sensors Journal. 2021;**21**(2):1849-1858

[15] Albert MV, Kording K, Herrmann M, Jayaraman A. Fall classification by machine learning using mobile phones. PLoS One. 2012;**7**(5):1-6. DOI: 10.1371/journal.pone.0036556

[16] Rastogi S, Singh J. A systematic review on machine learning for fall detection system. Computational Intelligence. 2021;**37**(2):991-1014. DOI: 10.1111/coin.12441

[17] Scikit-learn. 3.1. Cross-validation: evaluating estimator performance [Internet]. 2022. Available from: https://scikit-learn.org/stable/modules/cross_validation.html. [Accessed: August 16, 2022]

## **Chapter 6**

## Machine Learning and Cognitive Robotics: Opportunities and Challenges

*Thomas Tawiah*

## **Abstract**

The chapter reviews recent developments in cognitive robotics and the challenges and opportunities brought by new developments in machine learning (ML) and information and communication technology (ICT), with a view to stimulating research. To draw insights into current trends and challenges, a review of algorithms and systems is undertaken. A case study involving human activity recognition, together with face and emotion recognition, is also presented. Open research questions and future trends are then discussed.

**Keywords:** neural networks, cognitive control architectures, software frameworks, imitation learning, reinforcement learning

## **1. Introduction**

Cognitive robotics aims at endowing robots with intelligent behaviour by providing a processing architecture that allows them to interact with the environment; learn, understand and reason about it; and behave like humans in response to complex world dynamics. These behaviours include problem-solving, intentional (planning), reactive, learning, understanding and explaining behaviours. They are based on models drawn from biological systems, optimal control theory (engineering), the neurosciences and other behavioural sciences. Typical manufacturing applications where cognitive capabilities are important are pick and place, machine inspection, and collaboration and assistance. Service robots are specialized robots [1] that operate either semi- or fully automatically to perform services useful to humans (excluding manufacturing operations), such as caring for the elderly and rehabilitation. The autonomy of such robots is fully oriented towards navigation in human environments and/or human-robot interaction. Enabling more autonomous object manipulation with some level of eye-hand coordination and high precision in a complex environment is a challenge [2]. To embed systems with a greater sense of intelligence, collaborations between the AI, machine learning and robotics communities are essential to achieve remarkable progress. Robot learning refers to the robot learning about itself and the effects of its motor commands and actions. Examples include learning sensorimotor skills (locomotion, grasping and object manipulation) or interactive skills (manipulation of an object in collaboration with a human being). The fields of developmental robotics and evolutionary robotics have also emerged to deal with how robots learn. In cognitive robotics, an integrated view is taken of the robot, its motor and perceptual subsystems and the body's interaction with the environment. The main challenge is a lack of adequate knowledge of the human brain at different stages of development to enable adequate modelling.

Mobile agents are the principal means of embedding cognitive processing capabilities in robotic systems. These are software components that can carry out functions autonomously on behalf of another entity to realize tasks and can migrate from one robot to another through Wi-Fi networks. Embedded cognitive robotics focuses on understanding and modelling perception, cognition and action in artificial agents through bodily interactions with the environment to be able to perform cognitive tasks autonomously [3]. Several authors have reported works using mobile agents [4, 5]. From a technical point of view, there are several open challenges in the implementation of motor and cognitive skills in artificial agents. State-of-the-art robots are still not properly able to learn, adapt, react to unexpected conditions and exhibit a level of intelligence to operate in an unconstrained environment.

Machine learning (ML) algorithms are computationally intensive data-driven analysis, modelling and inference techniques based on statistics (clustering), evolutionary computing, neural networks (including deep neural networks) and mathematical optimization [6]. Given a set of data, the processing pipeline sequentially consists of preprocessing, feature extraction, modelling, inference and prediction. The modelling stage may involve iterative minimization of a criterion of model fit between a discriminant and the data. ML focuses on the development of algorithms that allow computers to automatically discover patterns in data and improve with experience, without being given a set of explicit instructions. ML has been applied in experimental robotics to acquire new skills; however, the need for carefully gathered data, clever initialization and conditioning limits the autonomy with which behaviours can be learned. In particular, deep neural networks with several levels of composition have achieved remarkable performance in vision and natural language processing. Deep learning can be leveraged via transfer learning to generalize from simulation to the real world via domain randomization [7–9] and to learn end-to-end visuomotor controllers [10, 11]. The limitations of deep neural network (DNN) techniques, such as limited interpretability, susceptibility to adversarial attacks, privacy issues and stability under perturbations, are worth addressing when designing end-to-end control policies. In particular, reliable long-term prediction is desirable to enable re-planning to adapt to a changing environment [12].
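The pipeline stages named above (preprocessing, feature extraction, modelling, inference and prediction) can be sketched in a few lines of Python. The synthetic sensor traces, the mean/range features and the nearest-centroid model below are illustrative assumptions for the sketch, not methods taken from the chapter:

```python
# Minimal sketch of a data-driven ML pipeline: raw data -> feature extraction
# -> modelling (fit) -> inference (predict). All data and model choices here
# are toy illustrations, not the chapter's specific methods.
import math
import random

random.seed(0)

def make_sample(label):
    # Hypothetical sensor traces: class 0 is low-amplitude, class 1 high-amplitude.
    base = 1.0 if label else 0.2
    return [base * math.sin(0.3 * t) + random.gauss(0, 0.05) for t in range(50)], label

data = [make_sample(i % 2) for i in range(40)]

def features(trace):
    # Feature extraction: summarise a raw trace as (mean absolute value, range).
    return (sum(abs(x) for x in trace) / len(trace), max(trace) - min(trace))

def fit(dataset):
    # Modelling: compute one centroid per class in feature space.
    per_class = {}
    for trace, label in dataset:
        per_class.setdefault(label, []).append(features(trace))
    return {lbl: tuple(sum(c) / len(c) for c in zip(*feats))
            for lbl, feats in per_class.items()}

def predict(model, trace):
    # Inference: assign the class of the nearest centroid.
    f = features(trace)
    return min(model, key=lambda lbl: math.dist(f, model[lbl]))

model = fit(data[:30])                 # train on the first 30 samples
test = data[30:]                       # hold out the rest
acc = sum(predict(model, t) == y for t, y in test) / len(test)
print(f"test accuracy: {acc:.2f}")
```

The two classes are well separated in the chosen feature space, so even this trivially simple model classifies the held-out traces correctly; real pipelines replace each stage with learned components.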

Machine learning techniques embedded within current AI systems (via agents) have increasingly shown sophisticated cognitive capabilities. For example, an existing approach in machine learning to lexicon acquisition focuses on the symbol grounding problem of how to connect sound information from a human with sensor information captured by a robot from the environment. A multi-sensory approach based on co-occurrence probabilities between words and visual features observed by a robot [13] improves when an active selection of motion based on saliency is used [14, 15]. Several developments in cognitive robotics underlining its multi-disciplinary nature are presented.

Traditional approaches to processing are bottom-up, with the processing pipeline running sequentially from sensing through perception and cognition to action under a control architecture such as that in ref. [16], which is essentially behaviour based; high-level decision processing [17] was later incorporated to enable more autonomy.

## *Machine Learning and Cognitive Robotics: Opportunities and Challenges DOI: http://dx.doi.org/10.5772/intechopen.107147*

Fundamental to robotics are the control policies that guide the behaviour of a robot. They are mainly based on control theory and mathematical optimization or on biologically inspired models, with control relying on vision in combination with other sensing modalities (e.g., olfactory) [18, 19]. Many models have been developed governing the behaviour of a robot itself and how it interacts with its environment [12, 20, 21].

There are three main control architectures, namely, logic-based, subsumption and hybrid architectures [16]. The logic-based architecture uses a set of rules and provides pro-active behaviour, whilst the subsumption architecture incorporates intelligence and interaction with the environment as a means of introducing cognition; behaviour is organized hierarchically. The hybrid architecture achieves modularity and interactivity between layers. Because the models used were relatively simple, these architectures suffer from problems of scalability and of modelling complex scenarios. Instead of providing all information to the robot a priori, for example the possible motions to reach a certain target position, the agent will, through some process, 'learn' which motor commands lead to which actions. For autonomous systems, a decision level incorporating the capacity to produce plans and supervise their execution, whilst at the same time remaining reactive to events from the layer below, has been added at the top of the hierarchy [6]. Such architectures are typically used in controlling the robot (motion control) or in carrying out tasks. Different multi-robot configurations, including robotic swarms, use multi-agent systems to carry out complex tasks.
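The hierarchical organization of behaviour in a subsumption-style architecture can be illustrated with a toy controller in which behaviours are ordered by priority and a higher layer suppresses the layers below it. The sensor fields, behaviours and actions below are hypothetical illustrations, not a specific published architecture:

```python
# Toy subsumption-style controller: behaviours are ordered by priority and a
# higher-priority layer suppresses ("subsumes") the layers below it.
# Sensor fields and action names are hypothetical illustrations.
def avoid_obstacle(state):
    # Highest priority: react to an imminent collision.
    return "turn-left" if state["obstacle_distance"] < 0.5 else None

def seek_goal(state):
    # Middle priority: steer towards the goal when one is visible.
    return "steer-to-goal" if state["goal_visible"] else None

def wander(state):
    # Lowest priority: default exploratory behaviour, always applicable.
    return "wander"

LAYERS = [avoid_obstacle, seek_goal, wander]  # ordered, highest priority first

def control(state):
    for behaviour in LAYERS:
        action = behaviour(state)
        if action is not None:   # the first applicable layer subsumes the rest
            return action

print(control({"obstacle_distance": 0.2, "goal_visible": True}))   # -> turn-left
print(control({"obstacle_distance": 2.0, "goal_visible": True}))   # -> steer-to-goal
print(control({"obstacle_distance": 2.0, "goal_visible": False}))  # -> wander
```

A hybrid architecture would add a deliberative planning layer above such reactive layers, as described in the text.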

Predictive processing (PP) [3], a processing approach from the cognitive sciences, is increasingly being used in cognitive robotics. It is a top-down approach that aims at unifying perception, cognition and action as a single inference process. It is predominantly based on the free-energy principle [22], which is associated with frameworks such as predictive coding, active inference and perceptual inference. The free-energy principle seeks to minimize prediction errors [20]. It asserts that through bodily interaction with the environment, agents are expected to learn and then be capable of performing cognitive tasks autonomously [23]. The core information flow is top-down, complemented by a bottom-up flow of prediction error. Motor commands are replaced by proprioceptive top-down predictions using the forward model [24]. PP is typically used in motor control and in the estimation of the body states of a robot [25, 26]. A neural network is typically used as the generative model. Active inference, a related framework, aims at minimizing the prediction error or free energy using variational inference. It involves constructing a forward model involving hidden states to reduce proprioceptive noise for control [21].
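The core loop of prediction-error minimization can be sketched numerically: an internal belief about a hidden state is updated by gradient descent on the squared error between the prediction of a generative model and the observed sensory value. The linear generative model, gain and observation below are toy assumptions, not a model from the cited works:

```python
# Toy predictive-processing loop: the belief mu about a hidden state is updated
# by gradient descent on the squared prediction error between the generative
# model's prediction g(mu) and the observation. The model g, the gain and the
# observation are hypothetical illustrations.
def g(mu):
    return 2.0 * mu        # assumed linear generative model: observation = 2 * state

def dg(mu):
    return 2.0             # derivative of g, used in the belief update

observation = 3.0          # sensed value arriving bottom-up
mu = 0.0                   # initial (wrong) belief about the hidden state
rate = 0.1                 # update gain

for _ in range(200):
    error = observation - g(mu)    # bottom-up prediction error
    mu += rate * dg(mu) * error    # top-down belief update shrinks the error

print(round(mu, 3))  # belief converges to 1.5, since g(1.5) = 3.0
```

In active inference the same error could instead be reduced by acting on the world to change the observation, rather than by changing the belief.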

To address issues in cognitive robotics, researchers in developmental robotics build artificial systems capable of acquiring motor and cognitive capabilities by interacting with the environment, inspired by human development [27]. Traditionally, mobile agents, simulated robots, humanoids or specially designed apparatus are used for research into higher-order cognitive capabilities (learning, communication and understanding) mimicking the functionalities of the human brain, such as its internal structure, infrastructure and social structures. Modelling starts from foetal sensorimotor mapping (the mechanisms of dynamic motion and motor skill development) in the womb, through body and motor representation and spatial perception, to social behaviour learning (communication, action execution and understanding). Important insights have been gained; for example, ref. [28] indicates that control and body structure are strongly connected, with the body having a role in controlling its own motion. In ref. [29], dynamic walkers realize walking on slopes without any explicit control or actuation, saving energy.

Models of human communication mechanisms have been used in developing human-robot communication, including interactions between caregivers and robots, action execution and understanding, the development of vocal imitation and joint attention [27]. Sumioku et al. [30] proposed an open-ended learning loop of social action by which artificial infants reproduce experienced contingency using an information-theoretic measure of contingency. Typically, gaze-following or utterances about the focus of attention are used for joint attention. Human-like robots able to show distinct facial expressions for use in specific situations have been developed [31, 32], but these robots are unable to adapt to non-pre-specified situations. From a control perspective, some of the capabilities required [33] for collaboration and assistance between robots and humans are as follows: the ability to perceive the world in a similar way to humans; the ability to communicate with humans using natural language; the ability to develop cognition through sensorimotor association; the ability to use attention and emotion to control behaviours; and the ability to produce appropriate behaviours in a variety of situations. Clearly, this calls for a multidisciplinary approach involving the neural sciences, developmental robotics, psychology and engineering. Kawamura and Brown [34] approached the problem using working-memory-based multi-agent systems for robot behaviour generation.

Evolution has equipped humans with a wide range of tools for collaboration, including language, gestures, touch and facial expressions, to facilitate interactions. Robots must support many of these communication methods to effectively collaborate with or assist humans. In particular, for robots working in human environments, there is an urgent need to anticipate and recognize bodily movements and facial expressions in order to offer timely and effective assistance when needed. To this end, a case study involving facial and action recognition is presented to illustrate some capabilities in this regard. The rest of the chapter is structured as follows: Section 1.1 introduces computational architectures and platforms for cognitive robotic systems. Sections 1.2 and 1.3 cover the roles of technology and software, respectively. Section 1.4 deals with the role of decision-making in cognitive systems. Sections 1.5, 1.5.1 and 1.5.2 briefly introduce the main algorithms used in cognitive robotics, namely, reinforcement learning and imitation learning algorithms, highlighting developments in ML that have renewed interest in these algorithms. Section 1.5.3 reviews deep learning networks for feature learning and classification. Sections 1.6 and 1.6.1 provide a case study on human activity recognition. Section 1.7 briefly reviews current trends, whilst Section 1.8 discusses successes, challenges and research directions. Finally, Section 1.9 concludes the chapter.

#### **1.1 Architecture and platforms for cognitive robot research**

To facilitate the development of mature cognitive robotic systems, several computing platforms are available, including real robots such as humanoids (iCub), Panda and Hobo, simulators, and middleware such as ROS and YARP. To develop mature cognitive systems, robots must continuously interact with the environment, know where objects are in the scene and understand the consequences of their generated actions. The iCub [35] humanoid robot is a 53-degree-of-freedom humanoid robot of approximately the same size as a three-year-old child. It can crawl on all four limbs and sit up. Its hand allows dexterous manipulation, and its head and eyes are fully articulated. It is an open-systems platform available for research under the GNU General Public License. Its capabilities are built based on an ontogenetic pathway of human development. **Figure 1** shows different postures of the iCub. Robotic simulators are of interest despite not being able to provide a full model of the complexity present in the real environment. For example, the iCub simulator [35] has been designed to reproduce as accurately as possible the physics and dynamics of the robot and its environment under the constraint of running approximately in real time. The simulated robot is composed of multiple rigid bodies connected via joint structures. The simulator consists of the following components: physics and rendering engines, the YARP protocol for the simulated iCub, and the body model. All commands sent to and from the robot are based on YARP instructions. More details are provided in ref. [36]. Besides, there are several platforms for humanoids and other robots in studies reported in refs. [37–39]. Details of the Pioneer 3-AT Bender robotic platform are provided in ref. [40]. There are also several European Union-funded research projects on cognitive robotics that have resulted in several architectures, system concepts and benchmark datasets [41]. Several simulators for robotic systems are described in ref. [42]. To build cognitive systems, several computational architectures have been designed and built to realise different cognitive platforms.

**Figure 1.** *iCub robot in different postures from ref. [35].*

The following are representative architectures. The CLARION [43–45] architecture is a broadly scoped computational psychological model based on the dual theory of the mind, capturing essential structures, mechanisms and processes of the mind. It provides a framework, essential structures and a computational model for realising processes of the mind. It also facilitates detailed exploration of the mind and of psychological theories. CLARION consists of four subsystems, namely, the action-centred subsystem (ACS), the non-action-centred subsystem (NACS), the motivational subsystem (MS) and the metacognitive subsystem (MC). The MS provides the impetus for action and cognition, whilst the MC provides for monitoring and regulating other processes. Together, these subsystems address action, skill learning, memory, reasoning, motivation, personality, emotions and their interactions. **Figure 2** is a high-level diagram of CLARION. Each subsystem consists of two levels, forming a dual representation structure. The top level encodes explicit knowledge, potentially corresponding to 'conscious' knowledge, and the bottom level encodes implicit knowledge, corresponding to 'unconscious' knowledge; the two levels also correspond to symbolic versus connectionist representations.

**Figure 2.** *CLARION architecture [44].*

Computationally, the ACS is realised with a multilayer perceptron (MLP) or reinforcement learning, whilst the NACS implements implicit declarative processes with associative memory. Explicit declarative processes are captured as symbolic associative rules. Implicit processes deal with drive activations, captured by an MLP, and explicit processes deal with goals. More details of the architecture are provided in ref. [44]. Other general-purpose architectures include Soar [46], which integrates knowledge-intensive reasoning, reactive execution, hierarchical reasoning and learning from experience. It has the goal of creating systems with cognitive capabilities like those of humans. Several other projects that target specific robotic platforms have produced application-specific cognitive architectures. These include the HAMMER [47, 48] architecture and the ArmarX and Xperience architectures on the Armar humanoid robot [49]. The HAMMER architecture is for assistive robotic agents cooperating with humans to carry out tasks. It provides for sensing user states and actions, modelling skills, predicting intentions and personalising to maximise assistance effectiveness over extended periods of interaction. ArmarX is a hybrid architecture proposed for human observation and experience; interactions with humans occur in natural language, and it recognises the need for help and reasons about the world. The original architecture has been continuously extended in several projects. It consists of three layers, namely, a high-level layer for planning and reasoning, a mid-level layer for mediating between symbolic knowledge and sensory-motor data, and a low-level layer for robotic behaviour focusing on functions and skills, a hardware abstraction layer and bridging middleware to other robot software frameworks.

Using virtual environments for simulation is very important to ensure the safety of robots, humans and other objects in the environment: physical trials are slow and costly, their wall-clock time makes it too slow to generate enough data in a reasonable time frame, and the behaviours learned from physical trials alone are limited. Increasingly, complex simulation environments are being used for experimentation and research. By training a virtual robot in countless situations, including low-probability scenarios, the objective is for the system to learn to generalize from those scenarios and safely handle future, yet unseen, scenarios. When the physical properties of the environment, such as gravity, friction coefficients and the objects' visual appearance, are randomized, the learned models transfer successfully to the physical robot; this is domain randomization [50]. One such platform is the Unity [51, 52] 3-D rendering platform, a cloud-scalable infrastructure for generating thousands of frames per second. For video games, the Arcade Learning Environment (ALE) [53] is a standard test bed for deep reinforcement learning (DRL) algorithms; it supports discrete actions. The TORCS car racing simulator [54], on the other hand, supports continuous actions for deep reinforcement learning algorithms.
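The sampling step at the heart of domain randomization can be sketched directly: each training episode draws the simulator's physical and visual parameters from broad ranges, so a policy trained across the samples is more likely to transfer to the real robot. The parameter names and ranges below are hypothetical, not taken from ref. [50]:

```python
# Sketch of domain randomization: every episode samples a fresh simulator
# configuration from wide ranges. Parameter names and ranges are hypothetical
# illustrations of the kind of properties that are randomized.
import random

random.seed(42)

def randomized_sim_config():
    return {
        "gravity": random.uniform(9.0, 10.6),         # m/s^2, around Earth gravity
        "friction": random.uniform(0.2, 1.2),         # contact friction coefficient
        "mass_scale": random.uniform(0.8, 1.2),       # object mass perturbation
        "light_intensity": random.uniform(0.3, 1.0),  # visual appearance
        "texture_id": random.randrange(100),          # random surface texture
    }

# One configuration per training episode; a policy is trained across all of them.
episodes = [randomized_sim_config() for _ in range(1000)]

gravities = [cfg["gravity"] for cfg in episodes]
print(min(gravities) >= 9.0 and max(gravities) <= 10.6)  # -> True
```

In a full setup each sampled configuration would parameterize the physics engine and renderer before an episode of policy training.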

### **1.2 The role of technology**

The pervasiveness of information communication technology (ICT) is evident everywhere in our daily lives. In industrial settings, the following are some examples:


In our daily lives, examples include the numerous gadgets in our homes that assist in daily living and in care for the elderly. Networked robotics and cloud robotics have evolved to connect robots and to allow a central or distributed intelligence to command and control any set of robots. Advantages include flexibility, simplification of the hardware and software on the robot, and ease of re-planning and task management for complex robots. Several configurations exist for robots, namely, stand-alone robots, networked robots and cloud robotics [55]. Networked robots address the problems associated with stand-alone robotic systems by sharing perceived data with each other and executing tasks in a cooperative and coordinated manner. Cloud computing empowers robots by providing faster and more powerful computation through massively parallel computing (using CPUs, GPUs, clusters and data centres) and greater storage, as well as access to open-source software, big datasets and cooperative learning capabilities.

Typical applications include human-assisted driving and self-driving vehicles for safe transportation, and the Industry 4.0 drive to create cloud-based cyber-physical systems for industrial processes by creating a replica in cyberspace for closed-loop feedback [56] and support for autonomous and smarter processes. The cloud also caters for the convergence of sensing, computation and communication by providing a common platform integrating data acquisition, processing, storage and decision-making. AI agents for the digital twin in Industry 4.0 provide movement prediction, task learning, risk reduction and predictive maintenance. Fundamental to most of these developments are AI and ML for continuous decision-making.

Cognitive robots (CR) are expected to continuously learn and adapt to their environments and, when required, make decisions in real time under uncertainties in sensor data, processing complexity, and privacy and security constraints, arriving at timely and effective decisions. AI- and ML-empowered agents are one approach to realising this goal. Current robotics has made significant progress on sensing, perception and control problems but finds it challenging to provide integrated thinking, feeling and knowing [57]. It is still very challenging for two-legged robots to walk naturally in unconstrained environments. Several challenges exist in using robotic platforms, such as the high cost of prototyping, the steep learning curve and programming robots to carry out complex tasks like autonomous driving in unconstrained dynamic environments.

## **1.3 The role of software**

Closely related to cognitive robotics is cognitive computing (CC), a multidisciplinary field aiming at devising computational models and decision-making mechanisms based on the neurobiological processes of the brain, the cognitive sciences and psychology. It aims to endow computers with the ability to think, feel and know. Since there is no commonly accepted definition of cognition, there are several definitions of cognitive computing [58, 59]. Wang [59] defines cognitive computing in terms of cognitive informatics, which applies how the brain processes information and copes with decision-making to the information sciences. CC is defined as an emerging paradigm of intelligent computing methodologies and systems, based on cognitive informatics, that implements computational intelligence by autonomous inference and perception mimicking the mechanisms of the brain. Research in cognitive computing is focused on three thematic areas, namely, computer systems with faculties of knowing, thinking and feeling. Applications of CC include education, healthcare, commerce and industry.

When software adds intelligence to information-intensive processes, it is known as robotic process automation. The process uses AI to extend and improve actions, saving cost and improving customer satisfaction. It is typically used to complete a complex business process that uses unstructured data or persists over a long period [57]. Typically, a bot (an agent acting for the user of a program) observes the process in order to automate it.

One of the requirements for robust and effective CR is a software integration framework. This is justified when one considers the following:


Software frameworks enable thinking by taking advantage of brain-like computer machinery or determine causal relationships among the concepts of a given domain. There have been several published works on software frameworks [60]: prototyping, the development of middleware, sustainable software design and architectural paradigms. MARIE [61] is a component-based software architecture for integrating and combining heterogeneous software and computational paradigms. It adapts the mediator design pattern to create a mediator interoperability layer (MIL). The MIL is implemented as a virtual space where applications can interact using a common language. ROS (Robot Operating System) [62], an open-source robotic middleware suite, is frequently used in robotic projects. ROS provides a set of software frameworks for software development, offering the following services: hardware abstraction, low-level device control, message passing between processes, package management and other functions; ROS 2 [63] and above add real-time and embedded-system support. ROS is made up of three components: language- and platform-independent tools for building and distributing ROS-based systems; ROS client implementations (roscpp, rospy, roslisp, etc.); and packages containing application-related code. ROS typically connects to robots via WebSockets and can operate on cloud servers. There are several platforms on which ROS runs, including ROSbot, the NAO humanoid [64] and the Raven II surgical robotic research platform. Peira et al. [65] provide a framework for using ROS on the cloud. Davinci [66] is another cloud-based software framework for service robots, exploiting parallelism and scalability. It is based on a Hadoop cluster combined with ROS as the messaging framework. The FastSLAM algorithm, an environmental mapping algorithm for large-scale mapping, was implemented on this platform with significant performance improvement.
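The publish/subscribe message passing that middleware such as ROS provides between components can be illustrated with a minimal in-process broker. This is a self-contained toy, not the ROS API; the topic name and message structure are hypothetical:

```python
# Toy publish/subscribe broker illustrating middleware-style message passing
# between software components (as ROS does between nodes). This is an
# illustration only, not the ROS API; topic and message are hypothetical.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callback to be invoked for every message on this topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
received = []

# A "perception" component subscribes to laser scans; a "driver" publishes them.
broker.subscribe("/scan", lambda msg: received.append(msg))
broker.publish("/scan", {"ranges": [1.2, 0.8, 2.5]})

print(received)  # -> [{'ranges': [1.2, 0.8, 2.5]}]
```

Real middleware adds what this sketch omits: serialization, transport across processes and machines, discovery, and quality-of-service policies.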

A framework for unifying multi-level computing platforms and orchestrating heterogeneous edge, fog and cloud computing resources compliant with MEC [67] was proposed in ref. [68]. It is suitable for integrating different computing, communication and software technologies.

## **1.4 The role of decision-making**

At the core of most ML tasks is decision-making based on information fed to the decision maker, for example steering or braking a car. The decision maker used to be a human or a group of humans; now it can be an AI using different combinations of ML and traditional algorithms via agent technology. According to Kahneman [69], the human brain operates in two modes, namely, system 1 and system 2, and most ML methods emulate the mode of operation of system 1. ML establishes empirical associations through training and learning. When given scenarios resembling its training scenarios, ML yields results quickly. However, it struggles when given scenarios not covered during training or for which training was inadequate. In human decision-making, poor decisions tend to result when system 2 fails to intervene because it is fooled by an apparently coherent picture created by system 1. Thus, if ML is to be used in decision-making, the ability to detect difficult and dangerous situations is needed to trigger a system-2-like intervention.
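One simple way to realise such a trigger is a confidence gate: the fast learned model answers when it is confident, and low-confidence inputs are deferred to a slower deliberative process or a human. The model, thresholds and action names below are hypothetical illustrations of the idea, not a method from ref. [69]:

```python
# Sketch of a system-1/system-2 gate: a fast model decides when confident;
# difficult (low-confidence) inputs trigger deliberation. The model, the
# threshold and the action names are hypothetical illustrations.
def fast_model(x):
    # Stand-in "system 1": returns (decision, confidence in [0, 1]).
    if x < 0.4:
        return "brake", 0.95
    if x > 0.6:
        return "steer", 0.90
    return "brake", 0.55         # ambiguous region: low confidence

def decide(x, threshold=0.8):
    decision, confidence = fast_model(x)
    if confidence >= threshold:
        return decision          # familiar scenario: fast path answers
    return "defer-to-system-2"   # difficult case: trigger deliberation

print(decide(0.2))  # -> brake
print(decide(0.5))  # -> defer-to-system-2
print(decide(0.9))  # -> steer
```

In practice the confidence signal would come from calibrated model probabilities, ensemble disagreement or an out-of-distribution detector rather than hand-written rules.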

In cardiovascular medicine, ML is routinely used to perceive an individual by collecting and interpreting his or her clinical data, on which clinicians then reason to suggest actions to maintain or improve the individual's health. Thus, it mimics the clinicians' approach when examining and treating sick patients [70]. Big data leveraged by ML can provide well-curated information to clinicians so that they can make better-informed diagnosis and treatment decisions. ML analyses have demonstrated human-like performance in low-level tasks in robotics and cardiology.

There have been studies reporting on the success of sensing-perception-control/ action loop in autonomous vehicles [56].

Higher-level tasks involving reasoning, such as patient status interpretation and decision support, and reasoning under uncertainty in dynamic environments in robotics, have proven challenging. Intention prediction in a dynamic environment is also challenging. Similarly, human-robot cooperation for safe road transportation involves challenges in infrastructure [71] (sensors, communication subsystems, computing and storage) and in predicting behaviour when driving, motion prediction and gesture recognition.

### **1.5 Review of algorithms**

From the cognitive architecture descriptions discussed, at the high level the actions of a robot are goal-directed, with the middle layer responsible for intermediate organisation, planning and execution using some memory hierarchy, and the bottom layer reactive, dealing with the environment. For a robot to be able to interact with other objects and its environment, it needs to predict the consequences of its actions, typically using a forward model Y = f(S, π_θ), where S is the state of the robot and π_θ: S → A is a parameterized action policy mapping states to the action space A, with Y the space of effects or task space. Similarly, the inverse model (S, Y) → π_θ computes the action policies that can generate a given effect. Examples are the mapping of movements of the hand in the visual field to the movement of the end point of a tool, and of the oscillation of the legs to the body translation of a robot. There are two main approaches: an analytical approach based on control engineering and a learning-based approach. The main challenge is to model a priori all the possible interactions between a robot and its environment. Learning is additionally confronted with multimodal sensing and perception, high-dimensional spaces, and continuous, highly non-stationary spatial and temporal state spaces. Typically, statistical regression is used to guide autonomous exploration and data collection. Alternatively, active learning is an approach for learning while constraining the exploration of the environment. Several learning paradigms have been used, including reinforcement learning and imitation learning. Machine learning techniques such as deep learning have been used to model robotic agents in the real world. Deep learning networks build a model that produces an end-to-end learning and inference system driven purely by data. Most of the approaches reported in the literature make use of neural networks to construct the forward and inverse models. To overcome the problem of catastrophic forgetting (training a model with new information interferes with previously learned knowledge [72]) in neural networks, special memory architectures may be used [34] besides purely algorithmic approaches.
Additionally, other cognitive approaches from developmental robotics, neuroscience and other behavioural sciences have been used. Active learning and inference approaches constrain the search space and allow self-exploration. These methods generally begin with random and sparse exploration, build meta-models of the performance of the motor learning mechanism and concurrently guide the exploration of the various subspaces for which a notion of interest is defined [73]. Interest is defined in terms of variants of information gain (variance, entropy or uncertainty). Motivational and goal-driven approaches, where exploration and search are goal-, curiosity- or attention-driven [74–76], reduce the large search spaces. Cognitive processing techniques can be split into two main approaches, namely, the control theory approach and the free-energy-based approach. Although both use optimization techniques, the latter seeks to minimize the free-energy prediction error using variational or Bayesian approaches.
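The "interest as information gain" idea can be sketched with a common heuristic: fit an ensemble of forward models on bootstrapped experience and explore next the candidate action where the models' predictions disagree most (highest variance). The linear dynamics, bootstrap ensemble and candidate actions below are toy assumptions, not a method from the cited works:

```python
# Sketch of uncertainty-guided exploration: an ensemble of forward models is
# fit on bootstrapped experience, and the action with the highest prediction
# variance ("interest") is explored next. The dynamics and numbers are toy
# illustrations.
import random

random.seed(1)

def true_dynamics(action):
    # Hidden environment response the robot is trying to model.
    return 3.0 * action + random.gauss(0, 0.01)

# Experience collected so far, concentrated around small actions.
data = [(a / 10, true_dynamics(a / 10)) for a in range(5)]

def fit_linear(samples):
    # Least-squares fit of y = w * a through the origin.
    num = sum(a * y for a, y in samples)
    den = sum(a * a for a, _ in samples) or 1e-9
    return num / den

# Bootstrap ensemble: each forward model sees a resampled dataset.
models = [fit_linear(random.choices(data, k=len(data))) for _ in range(10)]

def disagreement(action):
    # Variance of the ensemble's predictions is the interest measure.
    preds = [w * action for w in models]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)

candidates = [0.1, 0.5, 1.0]
next_action = max(candidates, key=disagreement)
print(next_action)  # the largest, least-explored action is the most uncertain
```

Entropy- or information-gain-based interest measures follow the same pattern, differing only in how disagreement is scored.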

### *1.5.1 Review of reinforcement learning*

There are three main classes of machine learning algorithms, namely, supervised, unsupervised and reinforcement learning. In supervised learning, data defining the input and the corresponding output (often called 'labelled' data) are available. In unsupervised learning, only the input is available, and it is used to explore the hidden structure of the underlying data. In reinforcement learning (RL), learning takes place through trial-and-error interactions with the environment. It is goal-directed learning that constructs a model specifying outputs so as to maximize long-term reward. Deep RL (DRL) uses deep learning methods (multi-layer neural networks) to learn models and representations at different levels of abstraction [77] in an unsupervised manner, leveraging deep learning as a function approximator to deal with high-dimensional data. DRL algorithms have

*Machine Learning and Cognitive Robotics: Opportunities and Challenges DOI: http://dx.doi.org/10.5772/intechopen.107147*

been applied to robotics allowing control policies for robots to be learned directly from camera inputs in the real world [11]. The basic model of RL is shown in **Figure 3**.

At time $t$, the agent receives state $s_t$ from the environment. The agent uses its policy to choose an action $a_t$. Once the action is executed, the environment transitions by one step, providing the next state $s_{t+1}$ as well as feedback in the form of a reward $r_{t+1}$. The agent uses knowledge of state transitions of the form $(s_t, a_t, s_{t+1}, r_{t+1})$ to learn to improve its policy. A policy ($\pi$) is a mapping function from any perceived state $s$ to the action taken from that state. Alternatively, a policy can be interpreted as a probability distribution over the candidate actions that can be selected from state $s$, as in Eq. (1):

$$a = \pi(s) = \left\{ p(a_i|s) \;\middle|\; \forall a_i \in \Delta_\pi \wedge \sum_i p(a_i|s) = 1 \right\} \tag{1}$$

$\Delta_\pi$ denotes the candidate actions under policy $\pi$, and $p(a_i|s)$ denotes the probability of taking action $a_i$ given the state $s$. A policy is deterministic if the probability of choosing an action $a$ from $s$ is $p(a|s) = 1$ for all states $s$; otherwise, it is stochastic, i.e., $p(a|s) < 1$. A value function is used to evaluate how good a certain state $s$ or state-action pair $(s, a)$ is. For this purpose, a generalized return value $R_t$, defined by Eq. (2), is used, where $\gamma$ ($0 < \gamma < 1$) is the discount factor.

$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots + \gamma^{T-t-1} r_T = \sum_{i=0}^{T-t-1} \gamma^i r_{t+i+1} \tag{2}$$

The value of a state under policy $\pi$ is evaluated as the expectation of $R_t$, defined by Eqs. (3) and (4) for the state and state-action pair, respectively, where $E$ denotes the expectation operator.

$$V_\pi(s) = E[R_t \mid s_t = s, \pi] \tag{3}$$

$$Q_\pi(s, a) = E[R_t \mid s_t = s, a_t = a, \pi] \tag{4}$$
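As a concrete illustration, the return of Eq. (2) and the Monte Carlo estimate of the value expectation in Eq. (3) can be sketched numerically; the reward sequences below are hypothetical values, not data from the chapter:

```python
import numpy as np

def discounted_return(rewards, gamma=0.9):
    """Generalized return R_t of Eq. (2): sum_i gamma^i * r_{t+i+1}."""
    return sum(gamma**i * r for i, r in enumerate(rewards))

def mc_value_estimate(episode_rewards, gamma=0.9):
    """Monte Carlo estimate of V_pi(s) in Eq. (3): average return over
    episodes that start in state s."""
    return np.mean([discounted_return(r, gamma) for r in episode_rewards])

# Rewards observed after time t in one episode (hypothetical values).
R_t = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
print(R_t)   # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

The value function simply averages these returns over many sampled episodes, which is the empirical counterpart of the expectation operator $E$.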

Underlying RL are dynamic programming [78] and the Bellman equations for optimality under Markov decision process modelling. RL algorithms have been successfully applied to several real-world problems with limited state spaces, from control to navigation. However, RL faces the following challenges:

• The optimal policy must be inferred by trial and error interaction with the environment with the only learning signal being the reward.

**Figure 3.** *RL algorithm using a single agent.*


Underlying RL is the Markov property: the next state depends only on the current state and is conditionally independent of the past given the present state. Partially observable Markov decision processes (POMDPs) are Markov decision processes (MDPs) in which the agent receives an observation drawn from $p(o_{t+1}|s_{t+1}, a_t)$, a distribution dependent on the current state and the previous action [79]. An episodic MDP resets after each episode of length $T$, and the sequence of states, actions and rewards in an episode constitutes a trajectory or rollout of the policy. There are three main types of reinforcement learning algorithms, namely, policy-search methods, value-function-based methods and those that combine both, including the actor-critic method, temporal-difference and Monte Carlo-based methods [80, 81]. The increasing use of deep reinforcement learning (DRL) algorithms has been attributed to the low-dimensional representations and powerful function approximation that deep neural networks provide. The following significant recent developments in DRL have made it possible to scale to large-dimensional state spaces:
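A minimal tabular example of the value-function-based, trial-and-error approach described above is temporal-difference Q-learning on a toy chain MDP; the environment, rewards and hyperparameters here are purely illustrative, not from the chapter:

```python
import random

# Minimal tabular Q-learning on a toy 1-D chain (states 0..4; actions:
# 0 = left, 1 = right; reward 1 only on reaching goal state 4).
N_STATES, GOAL, GAMMA, ALPHA, EPS = 5, 4, 0.9, 0.5, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
for _ in range(500):                       # episodes of trial and error
    s = random.randrange(N_STATES - 1)     # random non-goal start state
    done = False
    while not done:
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < EPS else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # temporal-difference update towards r + gamma * max_a' Q(s', a')
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [max((0, 1), key=lambda x: Q[s][x]) for s in range(N_STATES)]
print(greedy[:4])   # learned greedy policy in the non-goal states
```

After training, the greedy policy moves right in every non-goal state, i.e. the only learning signal (the reward on reaching the goal) has been propagated backwards through the value table.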


### *1.5.2 Review of imitation learning*

Imitation learning (IL) aims to mimic human behaviour in a given task, facilitating the teaching of complex tasks with minimal knowledge through demonstration. There are three main classes of ML algorithms for imitation, namely, behaviour cloning, inverse reinforcement learning and generative adversarial learning [87]. Behaviour cloning applies supervised learning, learning a mapping between input observations and the corresponding actions, provided there is enough data. Generative adversarial imitation is inspired by generative adversarial networks [88]. Typically, an agent uses instances of performed actions to learn a policy that solves a given task using ML techniques. The agent can learn from trial and error or by observing other agents. IL has been applied to problems requiring real-time perception and reaction, such as humanoid robots, self-driving cars, human-computer interfaces and computer


games. The assumption is that an expert (teacher) is more efficient than an agent learning from scratch when given a task [89]. Imitation learning is an interdisciplinary field of research, and it is sometimes difficult to define a suitable reward function for complex tasks. For example, it is often the case that direct imitation of an expert's motion does not suffice due to variations in the task, such as the position of the object, environmental conditions and inadequate demonstrations [90]. It is therefore difficult to learn, from demonstrations, policies that generalize to unseen scenarios; the policy must be able to adapt to variations in the task and the surrounding environment. Argall et al. [91] address different challenges in the process of IL, such as the computational methods used to learn from demonstrated behaviour and the processing pipeline. A typical IL sample representation consists of pairs of action and state, such as position, velocity and geometric information, with the process modelled as an MDP. The learning process consists of pre-processing, sample creation and direct or indirect imitation.
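The behaviour-cloning idea described above, learning a supervised mapping from observed states to the expert's actions, can be sketched with synthetic demonstrations; the linear "expert" below is an assumption for illustration only:

```python
import numpy as np

# Behaviour cloning sketch: fit a linear mapping from observed states to
# the expert's actions by least squares. The demonstrations are synthetic
# (the "expert" simply outputs 2*x - 1); real demonstrations would
# replace them.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(100, 1))
actions = 2.0 * states - 1.0                        # expert demonstrations

X = np.hstack([states, np.ones((len(states), 1))])  # add bias column
w, *_ = np.linalg.lstsq(X, actions, rcond=None)     # supervised fit

cloned = lambda s: np.array([s, 1.0]) @ w           # imitated policy
print(float(cloned(0.5)))   # close to 0.0, matching the expert's 2*0.5 - 1
```

Note that this direct state-to-action mapping inherits the limitations discussed in the text: it only covers states seen in the demonstrations, and errors compound once the learner drifts away from them.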

The following are some of the challenges of IL [90]: noisy or unreliable sensing; the correspondence problem and observability, where the kinematics of the teacher are unknown to the learner; and the fact that complex behaviour is often viewed as a trajectory of dependent micro-actions, which violates the independent and identically distributed assumption in machine learning. Lastly, there are safety concerns in human-robot interactions, including the robot's ability to react to human force and adapt to the task. A typical flow chart [90] is shown in **Figure 4**.

There are different methods for learning from demonstrations, namely, structured prediction [92], dynamic movement primitives [93], inverse optimal control (inverse reinforcement learning) [94], active learning [95], transfer learning and other techniques.

**Figure 4.** *Imitation learning flowchart [89].*

Active learning needs a dedicated oracle that can be queried for demonstrations. Inverse RL techniques use demonstrations to learn cost functions over extracted features: they first recover a utility function that makes the demonstration near-optimal and then search for the optimal policy using the cost function as an optimization objective. Closely related is apprenticeship learning, which uses demonstrations or observations from an expert to learn a reward function; a policy that optimises the reward function is then learned through experience (trial and error). Transfer learning uses experience from old tasks or knowledge from other agents to learn a new policy. The reader is referred to refs. [87, 96] for details of imitation learning and its applications in robotics. Learning a direct mapping between state and action is not enough to achieve the required behaviour in most cases, due to cascading errors, insufficient demonstrations and the difficulty of reproducing the conditions and settings; the learner has to learn actions and re-optimise policies with respect to quantifiable reward functions. **Figure 4** is a flowchart showing different variants of imitation learning. The following are some recent developments:


### *1.5.3 Review of deep learning algorithms*

The recent success of deep neural networks (DNNs) in computer vision and natural language processing has led to their application in cognitive robotics. Traditionally, cognitive robotics architectures have been built with artificial intelligence at the top level, using a restricted form of natural language and gestures for communication, and biologically inspired mechanisms at the lower levels. Deep learning using DNNs has been applied to perceptual processing, motor control, object manipulation and the different cognitive processing levels of the generic architecture discussed earlier. A deep learning survey focusing on deep reinforcement learning and imitation, including applications of ML in robotics, has been provided by Tai et al. [81]. Perception processing is passive, since an intelligent agent receives observations from the environment and then infers the desired properties from the sensory input. Guo et al. [98] provide a comprehensive overview of deep learning for perception. Similarly, Gu et al. [99] present deep reinforcement learning for robotic manipulation, and Gupta et al. [100] present robotic manipulation learned from human demonstrations. Several works relating to deep reinforcement learning in robotic navigation [101–103] have been published, including those using SLAM [104, 105]. Zhang et al. [104] propose neural SLAM based on the neural map proposed


by Parisotto and Salakhutdinov [106], which in turn uses a neural Turing machine for the deep RL agent to interact with. The main challenge with DRL is the reality gap, which refers to discrepancies between models trained with data from a simulated environment and their behaviour when transferred to the real world and deployed on real robotic platforms. It is due to unrealistic environmental conditions such as lighting, noise patterns and texture, and to the mismatch between synthetic rendering and real-world sensory readings. It is particularly severe with visual data (images and videos). Domain adaptation, often based on generative adversarial networks (GANs), is typically used to mitigate the problem [107].

Other DNN architectures include convolutional autoencoders for low-dimensional image representation [108], deep recurrent neural networks [109] and deep convolutional networks [11, 110]. To improve the robustness of deep learning networks, several strategies have been adopted, including the following: use of auxiliary tasks in either supervised or unsupervised fashion; experience replay; hindsight experience; curriculum learning; curiosity-driven exploration; self-replay; and noise in the parameter space for exploration. **Table 1** provides a summary of representative research works covering different ML approaches to solving cognitive problems and the functionality provided. Typical ML algorithms for the Industry 4.0 initiative are provided in ref. [56].

## **1.6 Use case**

For robots acting as human companions, autonomy is fully oriented towards navigation in a human-centred environment and human-robot interactions, which are facilitated if the robot's behaviour is as natural as possible. Some requirements are that the robot's independent movement must appear familiar and predictable to humans, and that the robot have an appearance similar to humans. Human-robot interactions include the following: use of natural language, or a subset of it, for communication; gesture or activity interpretation, which involves tracking and action recognition; gesture imitation, which involves tracking and reproduction; and person following, which involves 2-D or 3-D tracking. Acceptable performance at the task level requires a real-time processing constraint of about 50 milliseconds. Safety is also very important, as robots are expected to evolve in a dynamic environment well populated with humans. The main challenge is that robotic systems lack learning representations, and interactions are often limited to


#### **Table 1.**

*Comparison of different ML techniques reported in the literature.*

pre-programmed actions. One solution strategy is to conceptualize cognitive robots as permanent learners, who evolve and grow their capacities in close interaction with users [86]. Robots must learn new tasks and actions relative to humans by observing and imitating (imitation learning). Thus, human detection and tracking, activity recognition and face detection are some basic tasks that must be performed robustly in real time. A use case is presented next, which deals with daily activity recognition at home and face recognition using a publicly available dataset. These tasks fit into several robotic studies conducted in human-centred environments [40]. The algorithms are first described, followed by an evaluation.

### *1.6.1 Activity recognition*

Research activity in domestic service robots has increased in recent years. Some of the main drivers are the projected future use of domestic robots for improving elderly people's quality of life, childcare, entertainment and education. Several benchmark datasets [126–129] and methodologies for evaluating the capabilities and performance of robotic platforms are available. Action recognition is used in several application domains, such as surveillance, patient monitoring systems, human-computer interfaces, housekeeping activities and human assistance by robots (guiding humans). There are two processing techniques: the spatial approach, which recognizes activities from individual images, and the spatio-temporal approach, which detects a specific activity as a space-time volume.

The HMDB51 [130] is an action dataset whose action categories mainly differ in motion rather than static poses. It contains 51 distinct action categories, each containing at least 101 video clips extracted from a wide range of sources. The clips have been annotated and validated by at least two human observers. Additionally, meta-information tags allow for a precise selection of clips for training, testing and validation. Meta-data tags include information on camera viewpoint, the presence or absence of camera motion, video quality and the number of actors involved. The training procedure is also described.

A simulation study on activity recognition, based on spatio-temporal analysis of a large video database of human motion [130], is provided. The main processing steps are shown in **Figure 5**. The algorithm consists of six main processing steps, namely, pre-processing, spatio-temporal analysis in the wavelet domain, class model construction (class dictionary), batch singular value prediction (BSVP), similarity feature computation and classification. The pre-processing step involves filtering for noise removal and, optionally, contrast enhancement using histogram equalization.

**Figure 5.** *Similarity-based feature construction and classification.*


The wavelet analysis step applies an orthogonal or biorthogonal wavelet (9/7 or 5/3 filter) to produce subband frames. A silhouette feature map is constructed by combining the low-low and high-low subbands, as described in Tawiah et al. [130]. The map is a tiling of rectangular features describing the dominant objects in the frame. A sparse dictionary is constructed for each activity, as described in refs. [131, 132]. Spatial frame resizing and temporal frame subsampling by interpolation are applied to construct an action volume of 64 × 32 × 100 pixels for each action. It is then reshaped to a vector of size 51200. Batch singular value prediction (BSVP) is based on the classical singular value decomposition [133] used in signal processing, with batch data input (a matrix). Each column of the input matrix represents a sample action. The output is a decomposition consisting of left- and right-hand singular vectors (or matrices) for vector (or matrix) input, and a covariance matrix as the diagonal matrix.
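The action-volume construction step described above can be sketched as follows; the nearest-neighbour resizing and the random stand-in video are simplifying assumptions, not the chapter's exact interpolation scheme:

```python
import numpy as np

# Sketch of action-volume construction: resize each silhouette frame to
# 64 x 32 by nearest-neighbour sampling, temporally subsample to 100
# frames, then flatten to a single sample vector (one column of the
# batch matrix). The input video is random stand-in data.
def resize_nn(frame, h=64, w=32):
    rows = np.arange(h) * frame.shape[0] // h
    cols = np.arange(w) * frame.shape[1] // w
    return frame[np.ix_(rows, cols)]

video = np.random.rand(150, 120, 160)                   # (frames, H, W)
idx = np.linspace(0, len(video) - 1, 100).astype(int)   # temporal subsampling
volume = np.stack([resize_nn(f) for f in video[idx]])   # (100, 64, 32)
vector = volume.reshape(-1)                             # flattened action sample
print(volume.shape, vector.size)
```

Each such vector becomes one column of the batch matrix fed to the singular value decomposition in the next step.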

The BSVP prediction step consists of two sub-steps: first, apply singular value decomposition to the same batch training sample used in constructing the dictionary, replacing one column (e.g., the first) with an incoming action sample; then, apply the computation step in Eq. (5). The class dictionary is constructed using a batch sample matrix, with each sample representing an action volume. The prediction for an input action sample is computed using Eq. (5):

$$\mathrm{Est}(r,i) = \sum_{j=1}^{N_{\mathrm{sample}}} \Phi(r,:) \left[ \sum_{i=1}^{\mathrm{Dim}\,S} \mathrm{LHS}(r,i)\,\alpha(i,i) + \sum_{i=1}^{\mathrm{Dim}\,S} \mathrm{RHS}(r,i)\,\alpha(i,i) \right] \tag{5}$$

Φ denotes the class dictionary matrix, N sample denotes the number of samples in the batch dataset, Dim S denotes the dimension of each sample, RHS(r, i) denotes the right-hand singular vector, LHS(r, i) denotes the left-hand singular vector, α denotes the covariance matrix and Est denotes the estimate of the sample. The indices r and i are used to identify specific elements in a matrix. The similarity between the input spatio-temporal volume and Est (refer to Eq. (5)) is computed using five similarity measures, namely, canonical correlation [134], Bhattacharyya distance [135], modified Bhattacharyya distance, histogram intersection [136] and city-block distance. A similarity vector is formed by concatenating all the similarity values. A multi-class feed-forward classifier [137], consisting of 51 one-versus-all classifiers, is constructed. The classifier is
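Several of the named similarity measures can be sketched directly; canonical correlation is replaced here by a plain Pearson correlation for brevity, and the inputs are normalised to behave like histograms, both simplifying assumptions:

```python
import numpy as np

# Sketch of the similarity-feature step: compare an input feature vector
# against an estimate with several of the measures named in the text,
# then concatenate the values into a similarity vector.
def similarity_vector(x, y):
    p, q = x / x.sum(), y / y.sum()                 # histogram-like normalisation
    bhattacharyya = np.sum(np.sqrt(p * q))          # coefficient, in [0, 1]
    hist_intersection = np.sum(np.minimum(p, q))    # in [0, 1]
    cityblock = np.sum(np.abs(p - q))               # L1 (city-block) distance
    correlation = np.corrcoef(p, q)[0, 1]           # stand-in for canonical corr.
    return np.array([bhattacharyya, hist_intersection, cityblock, correlation])

x = np.array([1.0, 2.0, 3.0])
sims = similarity_vector(x, x)
print(sims)   # identical inputs give maximal similarity and zero distance
```

For identical inputs the coefficient-style measures reach 1 and the city-block distance is 0; the concatenated vector is what the feed-forward classifier consumes.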

**Figure 6.** *Brush hair sample video clip, showing frames 1, 2 and 3.*

**Figure 7.** *Cartwheel sample video clip, showing frames 1, 3 and 5.*

able to assign an action volume to multiple classes. Samples of input video frames and the corresponding object outline maps are shown in **Figures 6** and **7**.

BSVP does not reconstruct a sample using the sparsest representation, as is the case in classical sparse coding; instead, it uses a one-time reconstruction from a batch sample whose representations are known (the LHS and RHS singular matrices with known covariance) and applies the BSVP algorithm. This provides a representation for a sample that takes into consideration the statistical characteristics of all samples in the batch. It is computationally efficient, avoids solving an L1-norm optimization and is suitable for real-time classification problems. The results of applying the proposed algorithm to all fifty-one action classes are summarised in **Figure 8**, using the action categories provided by the HMDB51 dataset (**Table 2**).
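The one-time batch reconstruction idea can be illustrated with a plain singular value decomposition; the data and the reconstruction rule below are illustrative rather than the exact BSVP computation:

```python
import numpy as np

# Sketch of the batch-SVD idea: decompose a batch of samples (one per
# column), then reconstruct a sample from the stored singular matrices
# in a single step, with no L1-norm optimisation. The random batch is
# stand-in data.
rng = np.random.default_rng(1)
batch = rng.standard_normal((8, 5))          # 5 samples of dimension 8

U, s, Vt = np.linalg.svd(batch, full_matrices=False)
# One-time reconstruction of sample 0 from the decomposition:
est = U @ np.diag(s) @ Vt[:, 0]
print(np.allclose(est, batch[:, 0]))         # exact for a full-rank batch
```

The point of the comparison with sparse coding is that this reconstruction is a closed-form matrix product over the whole batch, rather than an iterative per-sample optimisation.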

The confusion matrix is also shown in **Figure 8** to illustrate action classes prone to misclassification.

For robotics applications, facial expression recognition and gesture recognition are also very important. Reference [138] provides a good review of facial expression recognition.

**Figure 8.** *Confusion matrix for HMDB51 dataset.*



#### **Table 2.**

*HMDB 51 action classification.*

## **1.7 Trends in cognitive robotics**

Early approaches to imitation aimed to reproduce reaching or grasping with simple grippers. Imitation learning provides a desired sequencing of basic sub-skills to achieve an observed task behaviour. Later, more sophisticated systems appeared, including modules for visual attention, speech recognition, and integration of visual and linguistic inputs for instructing robots to grasp everyday objects [139]. Online learning and machine learning techniques, such as neural networks, have been used in low-level and reactive tasks, from trajectory learning and adaptive control of multi-DOF robots to task learning from demonstrations. ML provides different learning paradigms, from transfer learning and representation learning to curriculum learning, which provide systematic means of acquiring models for making inferences [140]. The following are some trends that are apparent from the literature review:


The use of artificial intelligence, especially machine learning, wireless connectivity and cloud computing, is increasing to integrate physical systems and processes, including robotics. At the core of most ML tasks, decision-making is based on information fed to the decision maker. The study of decision-making is closely connected with psychology and cognitive sciences.

## **1.8 Success, challenges and research directions**

Several projects involving the use of cognitive robotics have been reported in industrial settings (Industry 4.0), service robots, robotic surgery, cardiovascular surgery [70], assistive technology [141] and several other fields. In ref. [70], ML methods are used to perceive an individual's health by collecting and interpreting his/her clinical data and to reason about actions to maintain or improve the individual's cardiovascular health. ML-augmented decisions have the potential to improve outcomes at a lower cost of care and to increase satisfaction. In assistive technology [141], a vision-based hand-gesture wheelchair control system using a Kinect sensor enables the user to control the chair without wearing or touching any device.

As cognitive robotics continues to make remarkable progress in industrial process automation with the Industry 4.0 initiative, cloud robotics and service robots, new challenges have emerged [142–144]. For example, standardisation efforts [56] have ushered in a new era of robotics linked to cyber-physical systems for effective control and monitoring of industrial processes. Classical approaches to robotics have made significant progress in control-based, stand-alone robotic applications, but there are challenges in multi-robot and multi-agent systems applied to complex tasks in dynamic environments.

The main goal of integrating thinking, knowing and feeling in an artificially intelligent system as a cognitive process has not been realised to date, despite advances [57]. In particular, integrating feeling into existing systems has proved very challenging.

The trend towards Industry 4.0, providing a cyber-physical framework for unifying industrial processes, is expected to extend to the service robots domain as well. More robust and sophisticated ML algorithms are needed to enable AI agents to carry out complex tasks in a coordinated and cooperative fashion, ensuring reliability and cost-effectiveness. The robustness of ML algorithms under adversarial learning will also have to be investigated.

More research is needed into the decision-making process (using ML) to make it robust, timely and relevant to the situation, as well as to meet real-time requirements. For multi-robot systems, the cooperation and coordination of tasks remain very challenging for improving effectiveness and resource utilisation. Underlying these problems is the need for research into more robust ML algorithms, transparent model interpretation and guarantees against adversarial attacks [145].

Cost-effective management of resources (computing, network, storage and devices), all interconnected for ambient intelligence, is also needed. The problems of scheduling, recovering from unexpected events and scalability require urgent attention. Similarly, the integration of heterogeneous platforms (software and hardware) into processes is required. Investigation into robust and generic processing architectures for social robotics is another area worthy of study.

Protocols to ensure effective and robust cooperation between humans and robots via human-machine interfaces, guaranteeing trust and autonomy, as well as the associated ethical considerations, also ought to be investigated.

To meet privacy and security concerns, distributed learning approaches [146, 147] train models in the cloud while keeping data localized and apply privacy-preserving analysis. However, this raises issues of network latency and model consistency, which have proven very challenging. Approaches to solving these challenges include MEC-based training, federated learning and capsule networks for the internet of vehicles. Other persistent challenges are latency, security and the management of

network infrastructure. Autonomic systems [148] seem very attractive for managing problems related to network and computing infrastructure.

## **1.9 Conclusion**

The chapter has presented a review of recent developments in ML techniques for cognitive robotic systems in the overall context of artificial intelligence. The main algorithms for learning, namely, reinforcement and imitation learning techniques, have been discussed.

The recent Industry 4.0 initiative and the increasing research in service robots, telemedicine and computer-assisted medical delivery systems mean that the field holds much promise for research and personal applications.

Several processing architectures, as well as software frameworks for integrating heterogeneous hardware and software components, have also been presented. To stimulate further research, current trends and research issues have been highlighted, and an example scenario involving human action recognition and facial expression recognition has been presented.

## **Author details**

Thomas Tawiah University of Education, Winneba, Ghana

\*Address all correspondence to: thomastawiah@yahoo.co.uk

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **References**

[1] Hacker M. Humanoid Robots: Human-like Machines. Vienna, Austria; 2007. pp. 367-396

[2] Jurgen J. A bottom-up integration of vision and actions to create cognitive humanoids. In: Samani H, editor. Cognitive Robotics. Boca Raton, FL: CRC; 2015. pp. 191-214

[3] Ciria A, Schillaci G, Pezzulo G, Hafner VV, Lara B. Predictive processing in cognitive Robotics: A review, 2021

[4] Kambayashi Y, Yajima H, Shiyoji T, Oikawa R, Takimoto M. Formation control of swarm robots using mobile agents. Vietnam Journal of Computer Science 2019;6(2):193-222

[5] Schillaci G, Hafner V, Lara B. Exploration behaviors, body representations, and simulation processes for the development of cognition in artificial agents. Frontiers in Robotics and AI;**3**:39

[6] Alami R, Chatila R, Fleury S, Ghallab M, Ingrand F. An architecture for autonomy. International Journal of Robotics Research (Special Issue on Integrated Architecture for Robot Control and Programming). 1998;**17**: 315-337

[7] Sun B, Saenko K. From virtual to reality: Fast adaptation of virtual object detectors to real domains. BMVC. 2014;**1**

[8] Tobin et al. Domain randomization for transferring deep neural networks from simulation to the real world, March 2017

[9] Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion: Maximizing for domain invariance. 2014

[10] Tzeng E, Devin C, Hoffman J, et al. Adapting deep visuomotor representations with pairwise constraints. 2017

[11] Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research. 2016;**17**:1-40

[12] Guido S. Sensorimotor Learning and Simulation of Experience as a Basis for the Development of Cognition in Robotics. Germany: Humboldt University of Berlin; 2013

[13] Hayamizu S, Hasegawa O, Itou K, Yoshimura T, Akiba T, Asoh H, Kurita T, Sakaue K. Multimodal interaction systems that integrates speech and visual information. Bulletin of the Electrotechnical Laboratory 2000;64(4- 5):37-44

[14] Steels L, Kaplan F. Aibos first words. The social learning of language and meaning. Evolution Communication. 2001;**4**(1):3-21

[15] Iwahashi N. Language acquisition through a human-robot interface by combining speech, visual, and behaviour information. Information Science. 2003; **156**:109-121

[16] Brooks RA. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation. 1986;**RA-2**:1

[17] Agostini A, Torra C, Worgotter F. Efficient interactive decision-making framework for robotic applications. Artificial Intelligence 2017;247:187-212

[18] Yeon ASA, Visvanathan R, Mamdah SM, Kamarudin K, Kamarusin LM,


Zakaria A. Implementation of behavior based robot with sense of smell and sight. In: 2015 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS 2015). 2015. pp. 119-125

[19] Zucker M, Ratliff N, Stolle M, et al. Optimization and learning for rough terrain legged locomotion. The International Journal of Robotics Research;**30**(2):175-191

[20] Schillaci G, Ciria A, Lara B. Tracking emotions: Intrinsic motivation grounded on multi-level prediction error dynamics. In: Proceedings of the 10th Joint International Conference on Development and Learning and Epigenetic Robotics (IEEE ICDL-EpiRob 2020). 2020

[21] Pio-Lopez L, Ange N, Fristorn K, Pezzulo G. Active inference and robot control: A case study. Journal of Royal Society Interface. 2016;**12**:616

[22] Buckley C, Kim CS, Mcgregor S, Seth AK. The free energy principle for action and perception: A mathematical review. Journal of Mathematical Psychology;**81**: 55-79

[23] Lara B, Astorga D, Mendoza-Bock E, Pardo M, Escobar E, Ciria A. Embedded Cognitive robotics and the learning of sensorimotor schemes. Adaptive Behaviour;**26**(5):225-238

[24] Pickering M, Clark A. Getting ahead: Forward models and their place in cognitive architecture. Trends in Cognitive Sciences;**18**(9):451-454

[25] Lanillos P, Cheng G. Adaptive robot body learning and estimation through predictive coding. In: Proceedings 2018 IEEE/RSJ International Conference on Intelligent Robotics and Systems (IROS). 2018. pp. 4083-4090

[26] Lanillos P, Cheng G, et al. Robot self/other distinction: Active inference meets neural networks in a mirror. 2020

[27] Asada M, Hosoda K, Kuniyoshi Y, et al. Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental Development. 2009;**1**(1):12-34

[28] Pfeifer R, Lida F, Gomez G. Morphological computation for adaptive behaviour and cognition. International Congress Series. 2006;**1291**:22-29

[29] Mcgeer T. Passive walking with knees. In: Proc. 1990 IEEE Int. Conf. Robot Autom. 1990

[30] Sumioka H, Yoshikawa Y, Asada M. Development of joint attention related actions based on reproducing contingency. In: Proceedings of 7th International Conference on Developmental Learning. 2008

[31] Hashimoto T, Senda M, Kobayashi H. Realization of realistic and rich facial expressions by face robot. In: Proceedings of 2004 IEEE Techn. Exhib. Based Conf. Robot Autom. 2004. pp. 37-38

[32] Matsui D, Minato T, MacDorman KF, Ishiguro H. Generating natural motion in an android by mapping human motion. In: Proceedings IEEE/RSJ Int. Conf. Intell. Robots Sys. 2005. pp. 1089- 1096

[33] Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P. Domain randomization for transferring deep neural networks from simulation to the real world. 2017

[34] Kawamura K, Brown W. Cognitive robotics. In: Encyclopedia of Complexity and Systems Science. Springer Science; 2010. pp. 1109-1126

[35] Metta G, Fitzpatrick P, Natale L. YARP: Yet Another Robot Platform. International Journal of Advanced Robotic Systems, Special Issue on Software Development and Integration in Robotics. 2006;**3**(1)

[36] Frank M, Leitner J, Stollenga M, Harding S, Forster A, Schmidhuber J. The modular behavioral environment for humanoids and other robots (MoBeE). In: Proceedings of the International Conference on Informatics in Control, Automation & Robotics (ICINCO). 2012

[37] Stollenga M, Pape L, Frank M, Leitner J, Forster A, Schmidhuber J. Task-relevant roadmaps: A framework for humanoid motion planning. In: Proceedings of the International Conference on Intelligent Robotics and Systems (IROS). 2013

[38] Leitner J, et al. A modular software framework for hand-eye coordination in humanoid robots. Frontiers in Robotics and AI. 2016:1-16

[39] Courtney, et al. Cognitive systems platforms using open source. 2009

[40] Correa M, Hermosilla G, Verschae R, Ruiz-del-Solar J. Human detection and identification by robots using thermal and visual information in domestic environments. Journal of Intelligent & Robotic Systems. 2012;**66**:223-243

[41] Cheraghi AR, Shahzad S, Graffi K. Past, present, and future of swarm robotics. 2021

[42] Baranes A, Oudeyer P-Y. Intrinsically motivated goal exploration for active motor learning in robots: A case study. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. 2010. pp. 1766-1773. DOI: 10.1109/IROS.2010.5651385

[43] Sun R. The importance of cognitive architecture: An analysis based on CLARION. Journal of Experimental and Theoretical Artificial Intelligence. 2007;**19**(2):159-193

[44] Sun R. Anatomy of the Mind. Oxford University Press; 2016

[45] Laird JE. The Soar Cognitive Architecture. MIT Press; 2012. p. 390

[46] Demiris Y. Prediction of intent in robotics and multi-agent systems. Cognitive Processing. 2007;**8**:151-158

[47] Demiris Y, Khadhouri B. Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems. 2006;**54**:361-369

[48] Vahrenkamp N, Wachter M, Krohnert M, Welke K, Asfour T. The robot software framework ArmarX. Information Technology. 2015;**57**(2):99-111

[49] Metta G, et al. The iCub humanoid robot: An open-systems platform for research in cognitive development. Neural Networks. 2010;**23**(8-9):1125-1134

[50] Unity Technologies. 2019. Available from: https://unity.com

[51] Juliani A, Berges V-P, Teng E, et al. Unity: A general platform for intelligent agents. 2020

[52] Bellemare MG, Naddaf Y, Veness J, Bowling M. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research. 2013;**47**:253-279

*Machine Learning and Cognitive Robotics: Opportunities and Challenges DOI: http://dx.doi.org/10.5772/intechopen.107147*

[53] Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, et al. Asynchronous methods for deep reinforcement learning. In: Proceedings of the International Conference on Machine Learning. 2016

[54] Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998

[55] Kehoe B, et al. A survey of research on cloud robotics and automation. IEEE Transactions on Automation Science and Engineering. 2015;**12**(2):398-409

[56] Groshev M, et al. Toward intelligent cyber-physical systems: Digital twin meets artificial intelligence. IEEE Communications Magazine. 2021;**59**(8):14-20

[57] Gutierrez-Garcia JO, Lopez-Neri E. Cognitive computing: A brief survey and open research challenges. In: Proceedings of the 2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence. Japan; 2015

[58] Brasil L et al. Hybrid expert systems for decision support in the medical area: Complexity and cognitive computing. International Journal of Medical Informatics. 2001;**63**(11):19-30

[59] Wang Y. Towards the synergy of cognitive informatics, neural informatics, brain informatics, and cognitive computing. In: Cognitive Informatics for Revealing Human Cognition: Knowledge Manipulations in Natural Intelligence. 1st ed. Hershey, PA, USA: IGI Global; 2012. pp. 159-177

[60] Cote C, et al. Prototyping cognitive models with MARIE. In: Proceedings of the IEEE/RSJ 2008 International Conference on Intelligent Robots and Systems (IROS), Workshop on Current Software Frameworks in Cognitive Robotics Integrating Different Computational Paradigms. Nice, France; 2008

[61] Cote C, Letourneau D, Raievsky C, Michaud F. Robotic software integration using MARIE. International Journal of Advanced Robotic Systems. 2006;**3**(1):55-60

[62] Robot Operating System (ROS). https://en.wikipedia.org/wiki/Robot\_Operating\_System

[63] ROS 2 for real-time applications. https://discourse.ros.org/t/ros2-for-realtime-applications/6493. ROS.org. Open Robotics; 17 October 2018

[64] Nao ROS Wiki. http://www.ros.org/wiki/nao. ROS.org. Open Robotics; 28 October 2013

[65] Pereira A, Bastos GS. ROSRemote: Using ROS on cloud to access robots remotely. In: Proceedings of the 2017 IEEE 18th International Conference on Advanced Robotics (ICAR). Hong Kong, China; 2017

[66] Arumugam R, et al. DAvinCi: A cloud computing framework for service robots. In: Proceedings of the 2010 IEEE International Conference on Robotics and Automation (ICRA). Anchorage, AK, USA; 2010. pp. 3084-3089

[67] Multi-Access Edge Computing (MEC): Framework and reference architecture. https://www.etsi.org/deliver/etsi\_gs/MEC/001\_099/003/02.02.01\_60/gs\_MEC003v020201p.pdf

[68] Borsatti et al. Enabling industrial IoT as a service with multi-access edge computing. IEEE Communications Magazine. 2021;**59**(8):21-27

[69] Kahneman D. Thinking, Fast and Slow. 1st ed. Farrar, Straus and Giroux; 2011

[70] Sanchez-Martinez M et al. Machine learning for clinical decision making: Challenges and opportunities in cardiovascular imaging. Frontiers in Cardiovascular Medicine. 2022

[71] Aoki et al. Human-robot cooperation for autonomous vehicles and human drivers: Challenges and solutions. IEEE Communications Magazine. 2021;**59**(8):36-41

[72] Schillaci G, Villalpando AP, Hafner VV, Hanappe P, Colliaux D, Wintz T. Intrinsic motivation and episodic memories for robot exploration of high-dimensional sensory spaces. Adaptive Behavior. 2020;**29**(6):549-566

[73] Baranes A, Oudeyer P. R-IAC: Robust intrinsically motivated exploration and active learning. IEEE Transactions on Autonomous Mental Development. 2009;**1**(3):155-169. DOI: 10.1109/TAMD.2009.2037513

[74] Lee K, Ognibene D, Chang HJ, Kim TK, Demiris Y. STARE: Spatio-temporal attention relocation for multiple structured activities detection. IEEE Transactions on Image Processing. 2015;**24**(12):5916-5927

[75] Demiris Y. Prediction of intent in robotics and multi-agent systems. Cognitive Processing. 2007;**8**:151-158

[76] McClelland JL, McNaughton BL, O'Reilly RC. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review. 1995;**102**(3):419-457

[77] Bellman R. On the theory of dynamic programming. Proceedings of the National Academy of Sciences. 1952;**38**(8):716-719

[78] Arulkumaran K, Deisenroth MP, et al. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine. 2017;**34**(6):26-38

[79] Billard A, Calinon S, Dillmann R, et al. Robot programming by demonstration. In: Springer Handbook of Robotics. Springer; 2008. pp. 1371-1394

[80] Tai L, Zhang J, Liu M et al. A survey of deep network solutions for learning control in robotics: From reinforcement learning to imitation. 2016

[81] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In: Proceedings of the International Conference on Learning Representations. 2016

[82] van Hasselt H. Double Q-learning. In: Proceedings of Neural Information Processing Systems. 2010. pp. 2613-2621

[83] Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. In: Proc. Int. Conf. Learning Representations. 2016

[84] Mnih V, Silver D, et al. Human-level control through deep reinforcement learning. Nature. 2015;**518**(7540): 529-533

[85] Kaelbling LP, Littman ML, Cassandra AR. Planning and acting in partially observable stochastic domains. Artificial Intelligence. 1998;**101**(1): 99-134

[86] Nachum O, Norouzi M, Xu K, Schuurmans D. Bridging the gap between value and policy based reinforcement learning


[87] Ho J, Ermon S. Generative adversarial imitation learning. 2016

[88] Schmerling M, Schillaci G, Hafner V. Goal-directed learning of hand-eye coordination in a humanoid robot. In: Proceedings of the 5th International Conference on Development and Learning and on Epigenetic Robotics (ICDL-EpiRob). 2015

[89] Hussein A, Gaber MM, Elyan E, Jayne C. Imitation learning: A survey of learning methods. ACM Computing Surveys. 2017;**50**(2):35

[90] Argall B, Browning B, Veloso M. Learning by demonstration with critique from a human teacher. In: Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. ACM; 2007. pp. 57-64

[91] Bitzer S, Vijayakumar S. Latent spaces for dynamic movement primitives. In: Proc. 9th IEEE-RAS International Conference on Humanoid Robots (Humanoids '09). 2009

[92] Bagnell JA. An invitation to imitation. Pittsburgh, PA: Carnegie Mellon University; 2015

[93] Abbeel P, Ng AY. Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st International Conference on Machine Learning (ICML). 2004

[94] Baranes A, Oudeyer P-Y. Active learning of inverse models with intrinsically motivated goal exploration in robots. 2013

[95] Daume H, Langford J, Marcu D. Search-based structured prediction. Machine Learning. 2009;**75**:297-325

[96] Duan Y, et al. One-shot imitation learning. In: Advances in Neural Information Processing Systems. 2017. pp. 1087-1098

[97] Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS. Deep learning for visual understanding: A review. Neurocomputing. 2016;**187**:27-48

[98] Gu S, Holly E, Lillicrap T, Levine S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). 2017. pp. 3386-3396

[99] Gupta A, Eppner C, Levine S, Abbeel P. Learning dexterous manipulation for a soft robotic hand from human demonstrations. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2016. pp. 3786-3793

[100] Zhang J, Springenberg JT, Boedecker J, Burgard W. Deep reinforcement learning with successor features for navigation across similar environments. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2017. pp. 2371-2378

[101] Chen Y, Everett M, Liu M, How JP. Socially aware motion planning with deep reinforcement learning

[102] Tai L, Paolo G, Liu M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2017. pp. 31-36

[103] Zhang J, Tai L, Boedecker J, Burgard W, Liu M. Neural SLAM

[104] Khan A, Zhang C, Atanasov N, Karydis K, Kumar V, Lee D. Memory augmented control networks

[105] Parisotto E, Salakhutdinov R. Neural map: Structured memory for deep reinforcement learning

[106] Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion: Maximizing for domain invariance

[107] Chen L-C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In: Proceedings of the International Conference on Learning Representations (ICLR). 2015

[108] Fitzpatrick P, Metta G, Natale L, Rao S. Learning about objects through action: Initial steps towards artificial cognition. In: Proceedings of the International Conference on Robotics and Automation (ICRA '03). Taipei, Taiwan. pp. 3140-3145

[109] Masci J, Meier U, Ciresan D, Schmidhuber J. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Proceedings of the International Conference on Artificial Neural Networks. Springer; 2011. pp. 52-59

[110] Kanitscheider I, Fiete I. Training recurrent networks to generate hypotheses about how the brain solves hard navigation problems. In: Advances in Neural Information Processing Systems. 2017. pp. 4532-4541

[111] Taylor M, Stone P. Cross-domain transfer for reinforcement learning. In: Proc. 24th International Conference on Machine Learning (ICML'07). 2007. pp. 879-886

[112] Wang et al. Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning (ICML'16). 2016. pp. 1995-2003

[113] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. 2016

[114] Ran L, et al. Convolutional neural network-based robot navigation using uncalibrated spherical images. Sensors. 2017;**17**:1341

[115] Coates A, Ng AY. Learning Feature Representations with k-means. Springer; 2012. pp. 561-580

[116] Schillaci G et al. Intrinsic motivation and episodic memories for robot exploration of high-dimensional sensory spaces. 2020

[117] Chen R, Jin Y. A social learning particle swarm optimization algorithm for scalable optimization. Information Sciences. 2015;**291**:43-60

[118] Rahmatizadeh R, Abolghasemi P, Boloni L. Learning manipulation trajectories using recurrent neural networks. 2016

[119] Yamada T, Murata S, Arie H, Ogata T. Dynamic integration of language and behavior in a recurrent neural network for human-robot interaction. Frontiers in Neurorobotics. 2016

[120] Molina-Leal A, et al. Trajectory planning for a mobile robot in a dynamic environment using an LSTM neural network. Applied Sciences. 2021;**11**(22):10689

[121] Redmon J, Angelova A. Real-time grasp detection using convolutional neural networks. 2015

[122] Levine S, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. 2016


[123] Li C, Lowe R, Ziemke T. Humanoids learning to walk: A natural CPG-actor-critic architecture. Frontiers in Neurorobotics. 2013

[124] Calinon S, Li Z, Alizadeh T, Tsagarakis NG. Statistical dynamical systems for skills acquisition in humanoids. In: Proceedings of the 2012 IEEE-RAS International Conference on Humanoid Robots (Humanoids '12). pp. 323-329

[125] Triesch J, Wieghardt J, Mael E. Towards imitation learning of grasping movements by an autonomous robot. In: Proceedings of the International Gesture Workshop on Gesture-Based Communication in Human-Computer Interaction (GW '99). 1999. pp. 73-84

[126] Wisspeintner T, van der Zant T, Iocchi L, Schiffer S. RoboCup@Home: Scientific competition and benchmarking for domestic service robots. Interaction Studies. 2009;**10**(3):392-426

[127] RoboCup@Home official website. [Accessed: December 2010]

[128] Vrigkas M, Nikou C, Kakadiaris IA. A review of human activity recognition methods. Frontiers in Robotics and AI. 2015;**2**:28

[129] Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T. HMDB: A large video database for human motion recognition. In: Proceedings of the IEEE International Conference on Computer Vision. 2011

[130] Andzi-Quainoo TT, Mike LR. A bank of classifiers for robust object modeling in wavelet domain. In: Proceedings of IEEE International Conference on Industrial Technology. Busan, South Korea; 2014

[131] Lee H, Battle A, Raina R, Ng AY. Efficient sparse coding algorithms. In: Advances in Neural Information Processing Systems (NIPS). 2007

[132] Aharon M, Elad M, Bruckstein AM. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representations. IEEE Transactions on Signal Processing. 2006;**54**(11): 4311-4322

[133] Golub GH, Reinsch C. Singular value decomposition and least squares solutions. Numerische Mathematik. 1970;**14**:403-420

[134] Hardle WK, Simar L. Applied Multivariate Statistical Analysis. Berlin, Heidelberg: Springer; 2007. pp. 321-330

[135] Kailath T. The divergence and Bhattacharyya distance measures in signal selection. IEEE Transactions on Communication Technology. 1967;**15**(1):54-60

[136] Swain M, Ballard DH. Color indexing. International Journal of Computer Vision. 1991;**7**(1):11-32

[137] Liu C-L. One-vs-all training of prototype classifier for pattern classification and retrieval. In: Proceedings of the 2010 20th International Conference on Pattern Recognition. 2010. pp. 3328-3331

[138] Pramerdorfer C, Kampel M. Facial expression recognition using convolutional neural networks: State of the art. 2016

[139] Steil J, Wersing H. Recent trends in online learning for cognitive robotics. In: Proceedings of ESANN 2006 - European Symposium on Artificial Neural Networks. Bruges, Belgium; 2006

[140] Tani J. Learning to generate articulated behavior through the bottom-up and the top-down interaction processes. Neural Networks. 2003;**16**(1):11-23

[141] Trigueiros P, Ribeiro F. Vision-based hand wheelchair control. In: Proceedings of the 12th International Conference on Autonomous Robot Systems and Competitions (Robotics 2012). Guimaraes, Portugal; 2012. pp. 39-43

[142] Saha O, Dasgupta P. A comprehensive survey of recent trends in cloud robotics architectures and applications. Robotics. 2018;**7**:47

[143] ISO 8373:2012(en). Robots and robotic devices - Vocabulary. ISO/TC 299. 2012. https://www.iso.org/obp/ui/#iso:std:iso:8373:ed-2:v1:en [Accessed: November 2020]

[144] Fong T, Nourbakhsh I, Dautenhahn K. A survey of socially interactive robots. Robotics and Autonomous Systems. 2003;**42**:143-166

[145] Modas A, Sanchez-Matilla R, Frossard P, Cavallaro A. Toward robust sensing for autonomous vehicles: An adversarial perspective. IEEE Signal Processing Magazine. 2020;**47**:14-24

[146] Xu et al. Capsule network distributed learning with multi-access edge computing for internet of vehicles. IEEE Communications Magazine. 2021; **59**(8):52-57

[147] Li L et al. A survey on federated learning. In: 2020 IEEE International Conference on Control & Automation (ICCA). 2020. pp. 791-796

[148] Maidana RG et al. Autonomic computing towards resource management in embedded mobile robots. In: Proceedings of 2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE). 2019. pp. 192-197

## *Edited by Maki K. Habib*

The development and use of robotics is affecting all aspects of modern life. There is a demand not only for robots that can move, interact, learn, and act in real-time dynamic and unconstrained environments but also for those that can interact smoothly and safely with the actions and movements of people within the same environments. In addition to managing complex motor coordination, these robots also require the ability to acquire and represent knowledge, deal with uncertainty at different operational levels, learn, reason, adapt, and have the autonomy to make intelligent decisions and act upon them. They should be able to learn from interaction, anticipate the outcomes of actions, acquire experiences and use them as required for future activities. Cognitive robotics is the interdisciplinary term used to describe robots that merge all these features and capabilities in their hardware and software architectures.

Published in London, UK © 2022 IntechOpen © kang053 / iStock

Cognitive Robotics and Adaptive Behaviors
