The Neo-Mechanistic Model of Human Cognitive Computation and Its Major Challenges

*Diego Azevedo Leite*

## **Abstract**

The neo-mechanistic theory of human cognition is currently one of the most widely accepted major theories in fields such as cognitive science and cognitive neuroscience. This proposal offers an account of human cognitive computation, and its proponents consider it revolutionary and capable of integrating research concerning human cognition with new evidence provided by the fields of biology and neuroscience. However, some complex cognitive capacities still present a challenge for explanations constructed within this theoretical structure. In this chapter, I present some of the central tenets of this framework and show in what dimensions it advances our understanding of human cognition with respect to capacities such as visual perception and memory consolidation. My central goal, however, is to show that the framework has substantial limitations when it comes to understanding and explaining certain human cognitive capacities, such as self-consciousness and some conscious informal reasoning and decision making. I conclude the chapter by suggesting that to fully understand human cognition we will need much more than what the neo-mechanistic framework is currently able to provide.

**Keywords:** theoretical cognitive science, human cognitive computation, consciousness, informal reasoning, decision making and action

## **1. Introduction**

A new intellectual movement in the field of cognitive science<sup>1</sup> has developed, above all, in the first two decades of the current century, starting from debates that took place mainly in the philosophy of science at the end of the twentieth century. This movement has been described more broadly by many authors as a "new mechanistic philosophy" [4–7]. Strongly influenced by recent advances in computer science, neuroscience, and artificial intelligence, the theoretical framework developed by some

<sup>1</sup> I will use the term "cognitive science" in a *general sense* and a *specific sense*. In the general sense, the term will be treated as synonymous with the term "psychology" [1, 2]. In a specific sense, it will be treated as an attempt to build a science of cognition, integrating several different areas of knowledge, which took place in the 1970s in the USA [3].

of the movement's most prominent authors offers a new physicalist (or materialist) and mechanistic view of human cognition<sup>2</sup> [9–21].

The theory that results from applying neo-mechanistic philosophy to cognitive science and, specifically, to human cognition can be called the *Mechanistic Theory of Human Cognition* (MTHC) [22]. This proposal is currently one of the most widely accepted major theories in fields such as cognitive science and cognitive neuroscience, and its influential proponents have considered it revolutionary and capable of integrating research concerning human cognition with new evidence provided by the fields of biology and neuroscience.

One of the most central elements present in the framework of MTHC is a "model of human cognitive computation" [9–11, 13, 15], which is also part of the attempt made by several influential authors to provide some type of unification or integration for the field of cognitive science [9, 10, 23–25]. However, some complex cognitive capacities and some particular aspects of human cognition still present a challenge for explanations constructed by using this theoretical structure [22].

My central goal in this chapter, therefore, is to present an argument to show that human cognition cannot yet be completely understood and explained in terms of mechanistic computation and that this view indeed presents many substantial limitations.

To develop my argument, I present, firstly, some of the central elements of this neo-mechanistic framework and its application to cognitive science. Secondly, I present the mechanistic model of human cognitive computation, as it is currently framed, and, based on the specialized literature, I show in what dimensions it helps our understanding of some aspects of human cognitive capacities, such as visual perception and memory consolidation. Thirdly, I show that to understand and explain some human cognitive capacities, such as self-consciousness and conscious informal reasoning and decision making, the neo-mechanistic framework shows substantial limitations. I conclude the chapter by suggesting that the notion of human artificial cognitive computation can be useful for several projects, but to fully understand natural human cognition we will most certainly have to consider theories that go beyond the current neo-mechanistic model of human cognitive computation.

## **2. Mechanistic theory of human cognition**

The contemporary movement of neo-mechanistic philosophy has been historically associated with ideas already present in the period of Ancient Philosophy. Philosophers, such as Democritus, Leucippus, Aristotle, Epicurus, and Lucretius [9, 14, 26], for example, have been mentioned in the specialized literature as precursors. Although there is no unity of thought regarding this philosophical tradition, these thinkers would arguably have launched, in Western philosophical thought, the first notions linked to mechanistic reflections. In other words, these philosophers would have proposed the general idea that many phenomena in nature must be explained through their basic components, their forms of movement, their properties, and their interactions since these phenomena are also composed of these basic elements.

In Modern Philosophy, the history of what might be called "mechanistic philosophy" is quite complex, given the many debates over definitions of the term and the

<sup>2</sup> I will use the term "cognition" as synonymous with the term "mind" for the sake of clarity and objectivity. For an important discussion concerning the term "cognition," cf. Akagi [8].

## *The Neo-Mechanistic Model of Human Cognitive Computation and Its Major Challenges DOI: http://dx.doi.org/10.5772/intechopen.104995*

variety of positions that can be considered within a more general view of what the term means in this period. In any case, many authors consider that the movement of mechanistic philosophy in the seventeenth century is a reaction to Aristotelian natural philosophy and various natural philosophies of the Renaissance period [27]. The French philosopher René Descartes (1596–1650), for example, is considered one of the main figures who laid the foundations of modern mechanistic philosophy, especially with regard to explanations of biological natural phenomena [9, 27–31]. Des Chene [30] argues that Descartes united a mechanistic ontology, on the one hand, with a method of mechanistic explanation, on the other, applying these ideas to numerous biological phenomena, including the behavior of non-human animals and the human body.

Shortly thereafter, this reasoning would also be applied quite influentially to human beings and their mental capacities. One of the most prominent advocates of this view was the French philosopher and physician Julien Offray de La Mettrie (1709–1751), who published *Histoire Naturelle de L'âme* (Natural History of the Soul), in 1745, and *L'Homme Machine* (Man a Machine), in 1747, expanding Descartes' philosophy of biology to human beings [21]. It can be said, therefore, that modern mechanistic philosophy is fundamentally committed to the "machine analogy," that is, just as in a machine, all natural processes can be explained in terms of their constituent components and the interaction between the activities they perform to produce their result [32]. This mechanistic framework was quite influential in many central issues and debates during the eighteenth and nineteenth centuries.

At the beginning of the twentieth century, the debate about the best explanation for the complex phenomenon of "life" was still quite strong [32]. The controversy was over whether or not this phenomenon could be explained in mechanistic terms. In this context, a very influential work was that of the German-born American physiologist and biologist Jacques Loeb (1859–1924), published in 1912, *The Mechanistic Conception of Life*. In this work, Loeb [33] indicates his interest in discussing the question of whether "life" (or all vital phenomena) could be explained in physicochemical terms. He sought to reduce "higher-level" biological phenomena to their more basic "low-level" components and thus ultimately place biology on the same level of scientific prestige and legitimacy as physics and chemistry [28].

In the second half of the twentieth century, philosophers of science sought to analyze, in a more precise way, this mechanistic explanatory strategy. One of the most influential analyses is present in the work of the American philosopher Ernest Nagel (1901–1985), *The Structure of Science*, published in 1961. Chapter 12 of this work is entitled *Mechanistic explanation and organismic biology*. In it, Nagel [34] discusses the problem of explaining "life" and says that a mechanist is one who believes, as Jacques Loeb believed, that all vital processes can be explained in physicochemical terms. This work profoundly influenced the understanding of what a mechanistic scientific explanation was in the philosophy of science of the period.

It was also during this period that some philosophers of science working in the field of biology began the task of elaborating, in an even more robust and systematic way, notions related to mechanistic explanations in science – mainly in biology. Along these lines, some pioneering works were the following: Herbert Simon, *The Architecture of Complexity*, published in 1962; Stuart Kauffman, *Articulation of parts explanations in biology and the rational search for them*, published in 1970; and William Wimsatt, *Reductive explanation: a functional account*, published in 1976.

Within this line of philosophical thinking, the work of William Bechtel and Robert Richardson, *Discovering Complexity*, published in 1993, is normally considered in the specialized literature as the first to elaborate mechanistic explanations in a more solid, detailed, and mature form. Moreover, in 1996, Stuart Glennan published the article *Mechanisms and the nature of causation*; in 1998, Paul Thagard published the article *Explaining disease: correlations, causes, and mechanisms*; in 2000, Peter Machamer, Lindley Darden, and Carl Craver published the article *Thinking about mechanisms*; and in 2002, Jim Woodward published the article *What is a mechanism? A counterfactual account*. All these works were extremely important for the development of the new mechanistic movement in the philosophy of science, especially in relation to biology.

It is also important to point out that in the development of the neo-mechanist movement, at the end of the twentieth century, we can distinguish, more generally, two main trends [5]. One of them focuses more on metaphysical and ontological directions. Authors who work in this line seek, above all, to answer what mechanisms are as real things in the world. The other strand moved toward a greater elaboration of the philosophy of science, with epistemological and methodological discussions about scientific explanations, mainly in the area of biology. Authors in this strand seek to explain how something works, not to make claims about the ultimate reality of things. These two strands of the new mechanism have been elaborated in an enormous specialized literature that covers several scientific and philosophical areas, dominating a great part of the central debates. Despite being two dimensions that can be separated in the debate, ontological and epistemological discussions are deeply related in many works, both directly and indirectly.

The neo-mechanistic philosophy began to be applied with greater emphasis to cognitive science in the 1990s – with this application becoming stronger in the first decade of the twenty-first century – and it has been further elaborated, from then to the present day, in central works of very influential authors [9–15, 18–20, 35–43]. According to this view, human cognition, specifically, as well as biological cognition, in general, can be understood and explained through complex models of multilevel neurocognitive mechanisms. At these levels, there are causal processes related to cognitive information processing, cognitive representation, and cognitive computing, as well as processes related to chemical and physical reactions, that can be used to explain a given cognitive phenomenon. These are, in fact, autonomous processes of causation, which take place at all these different levels and are relevant to the explanation of the phenomenon of interest [44]. According to this theory of human cognition, namely, MTHC, all these causal levels and processes, although autonomous, can be related in a pluralistic mechanistic explanation, where the relevant scientific theories are integrated. As a result, MTHC includes not only a theory of human cognition but also a theory of the human neurocognitive relationship; that is, the theoretical framework suggests a possible solution to the problem of how we are to understand and explain the connection between human neural and cognitive phenomena, thus attempting to relate neuroscience and cognitive science.

The main objective of a mechanistic scientific explanation in scientific areas, such as biology, cognitive neuroscience, and cognitive science, is to identify the parts of a mechanism, its operations, its organization, and thus show how these elements constitute the system's relationship with the phenomenon that must be explained [9, 10, 45]. Particularly, in cognitive science, the central idea present in the theory is that human neurocognitive processes are a type of information processing performed by neural systems (mechanisms). These processes and the components that carry them out can be decomposed into subparts, and these subparts are decomposed again, as far as necessary


for the understanding of the investigated phenomenon. After that, these components and activities have to be located in the brain as spatiotemporal parts of a complex multilevel neurobiological mechanism. As a result, there may be multiple levels of mechanistic composition in a human neurocognitive mechanism.

Another important feature of MTHC is that it was developed within a broad physicalist context that is present in a vast amount of work in contemporary cognitive science, philosophy of cognitive science, and philosophy of mind. In this physicalist context, the theory tries to combine central ideas present in traditional cognitive science with the main ideas present in certain fields of neuroscience that investigate human cognition. In this sense, some authors argue that this mechanistic physicalist framework can provide a consistent way to build a unified science of cognition and integrate cognitive science and neuroscience [23–25, 40].

Indeed, integrating and unifying, from a physicalist background, traditional cognitive science and traditional neuroscience to understand and investigate human cognition is an old dream held by many authors. Patricia Churchland, in 1986, called for the unification of cognitive research and neural research in her book *Neurophilosophy: Toward a unified science of the mind-brain*. The aim of Churchland's book was to outline a general framework that would be suitable for the development of a unified theory of what she called "mind-brain," as well as to encourage the interaction between philosophy, psychology, and neuroscience [46].

It is possible to argue that MTHC was articulated with the objective of providing this integration and unification in a more precise theoretical way and within a clear physicalist background. The influential version of MTHC by William Bechtel is a clear example. He considers the human phenomenon "mind-brain" as "a set of mechanisms for controlling behavior" [9], and he explains that cognitive phenomena (e.g., perception, attention, memory, problem solving, and language) can be characterized as "information-processing mechanisms" [9]. Bechtel [9] states that scientific disciplines that aim to explain cognitive activities recognize that "in some way, these activities depend upon our brain." Or, to put it in another way: "Psychological phenomena are realized in brains comprised of neurons" [45]. This means that cognitive phenomena are physical and need to be explained in some physical (neural) way.

Craver and Tabery [47] describe the physicalist commitment quite clearly—"many mechanists opt for some form of explanatory anti-reductionism, emphasizing the importance of multilevel and upward-looking explanations, without rejecting the central ideas that motivate a broad physicalist world-picture." Therefore, in this approach, there is no space for any form of dualism, pluralism, or non-physicalism of any kind in relation to the ontology of human cognition. There is, indeed, a clear commitment to a form of ontological monism, namely, physicalism, that underlies the neo-mechanistic theory of human cognition.

Neo-mechanistic ideas about human cognitive phenomena are becoming increasingly dominant in fields related to theoretical cognitive science and cognitive neuroscience [48]. Consequently, the neo-mechanistic framework is often presented as one of the main theories, or even the main theory, for explaining human cognition in the twenty-first century.

## **3. Mechanistic model of human cognitive computation**

Formulations of the idea that human cognition can be considered in computational terms can already arguably be found in the works of Thomas Hobbes

(1588–1679) and Gottfried Leibniz (1646–1716). However, it is in the first half of the twentieth century that new developments in this tradition made the thesis gain great strength [49]. Alan Turing (1912–1954), with his work on computation, made a solid mathematical contribution to advances in the attempt to build machines capable of thinking like humans. And with the development of the computer and the emergence of studies in computer science and artificial intelligence, there was an even greater push for the acceptance of these ideas in the period. Indeed, these were crucial factors in the development of cognitive psychology in the 1950s and cognitive science (in the specific sense) in the 1970s. In discussing the foundations of cognitive science, Gardner [3] states that "there is the faith that central to any understanding of the human mind is the electronic computer." Furthermore, according to him: "Involvement with computers, and belief in their relevance as a model of human thought, is pervasive in cognitive science" [3].

The first formulations of the philosophical foundations and the most central bases of the "computational theory of cognition" were presented, above all, in central works by Hilary Putnam (1926–2016) and Jerry Fodor (1935–2017). It is mainly based on works like these that the "classical model of cognitive computation" was formulated [49]. According to this proposal, the human mind is a computational system similar in important respects to a "Turing machine," which works through "Turing-style computations." In this view, cognitive processes, such as problem solving, decision making, and formal reasoning, are performed through computations similar to those of a Turing machine.
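The idea of a Turing-style computation can be made concrete with a toy sketch. The machine below and its rule table are purely illustrative (they are my own, not drawn from Putnam, Fodor, or any work cited here): a finite table of state transitions is read against a tape of symbols, and each step is fixed entirely by the current state and the symbol under the head.

```python
# A minimal Turing machine: the kind of rule-governed symbol manipulation
# that the classical model takes cognition to resemble.
# This toy machine inverts a binary string and then halts.

def run_turing_machine(tape, rules, state="start"):
    """Run a Turing machine until it reaches the 'halt' state."""
    tape = list(tape)
    head = 0
    while state != "halt":
        symbol = tape[head] if head < len(tape) else "_"  # "_" is blank
        state, write, move = rules[(state, symbol)]
        if head < len(tape):
            tape[head] = write
        else:
            tape.append(write)
        head += 1 if move == "R" else -1
    return "".join(tape).rstrip("_")

# Transition table: (state, read symbol) -> (next state, write symbol, move)
rules = {
    ("start", "0"): ("start", "1", "R"),  # flip 0 to 1, move right
    ("start", "1"): ("start", "0", "R"),  # flip 1 to 0, move right
    ("start", "_"): ("halt", "_", "R"),   # blank square: computation done
}

print(run_turing_machine("0110", rules))  # -> 1001
```

The point of the sketch is only that, on the classical view, cognitive processing is of this general kind: discrete symbols transformed step by step according to explicit formal rules.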

Another line of work, however, developed an alternative notion of cognitive computation. Inspired by research in the field of neurophysiology, some authors in the 1980s proposed that cognitive computation was something very different from Turing-style computation [50]. The correct format of cognitive computation for them was that of neural networks, in which, very briefly, data nodes are connected in a particular way so that when the network is activated through an input, it can provide an output. This framework became known as connectionism, and it has been developed in numerous works since then. Many cognitive models of different phenomena were built based on this view, such as object recognition, speech perception, and sentence comprehension.
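The connectionist alternative can be sketched just as briefly. In the toy network below (illustrative only; the weights are hand-picked to compute the XOR function rather than learned from data), there is no rule table at all: activation simply flows through weighted connections from input nodes through a hidden layer to an output node.

```python
import math

# A tiny feedforward network in the connectionist spirit: nodes connected
# by weighted links, with activation flowing from input to output.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each node sums its weighted inputs and applies a sigmoid activation."""
    return [sigmoid(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def network(x1, x2):
    # Hand-picked weights implementing XOR, a classic connectionist example.
    hidden = layer([x1, x2], weights=[[20, 20], [-20, -20]], biases=[-10, 30])
    output = layer(hidden, weights=[[20, 20]], biases=[-30])
    return round(output[0])

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, "->", network(*pair))   # XOR: 0, 1, 1, 0
```

Here the "knowledge" of the network is not stored as explicit rules but is distributed across the weights of the connections, which is precisely the contrast the connectionists drew with Turing-style computation.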

The notion of "cognitive mechanistic computation" is part of this tradition, and it is especially related to the model of neural networks. Craver [10], for example, writes about the "computational properties of brain regions" and "computational properties of neural systems," without giving much detail about what exactly this means. In any case, it is clear that the supposed computation is much more related to concrete properties of neural systems than to abstract functional properties of psychological capacities considered in terms of Turing computation or something similar. Milkowski [11], in turn, presents a proposal that holds that neurocognitive processing occurs over states that contain information, but he does not elaborate much on the content and the semantic dimension of cognitive information or of putative cognitive computations.

Bechtel [9, 19] considers mental mechanisms as information-processing mechanisms that operate through neural representations and neural computations about vehicles and content. In his view, the "control theory of dynamical systems" shows how content is placed in this context. And Thagard [14, 15] thinks that mental mechanisms operate through computations that take place on representations at the cognitive level and computations that take place at the neural and molecular levels. In Thagard's work, there is also recourse to the "theory of dynamical


systems" (as in Bechtel's); however, just in his version of the mechanistic theory, there is a definite number of mechanistic levels and extensive discussion about the "semantic pointers theory" of Chris Eliasmith.

Finally, there is the work of Piccinini [12, 13, 51], which is one of the most theoretically sophisticated and detailed among neo-mechanists regarding such issues. The author defends a mechanistic neurocomputational theory of human cognition. In his view, the human nervous system is a functional mechanism that produces computations through the activation of neurons, while the processing occurs in vehicles according to rules. Cognitive capacities are explained then by multilevel neurocognitive mechanisms that perform neural computations over neural representations. Besides, he thinks that neural computation (i.e., computations defined on the functionally relevant elements of neural activity) is not purely digital, as classically understood, nor purely analog, as alternatively understood; in his view, neural computation is *sui generis* – neither wholly digital nor wholly analog.

One does not need to go very deep into these individual theories to see that they differ significantly. Craver mentions computations but does not offer an elaborated account. Thagard is the only one to place semantic pointers at the center of his account. Milkowski and Piccinini attempt to avoid the problems with content by focusing on formal properties. And Bechtel uses control theory to deal with the issue of content. As a result, it is not possible to derive a single theory from these accounts, as each author develops his own point of view with its significant particularities. There is, therefore, no substantial theoretical unity among these proponents.

However, one can try to find common aspects to evaluate at least the most basic and important tenets. To do that, an analysis of two cases where this mechanistic view on human cognitive computation can be applied will be helpful.

One of the best examples found in the specialized literature of a concrete application of this view to particular cognitive phenomena is related to memory, which, indeed, has traditionally been an object of study in the field of psychology [9, 10]. Functional analyses of the human memory capacity reveal the existence of many subcapacities, such as short-term memory, long-term memory, phonological memory, visuospatial memory, semantic memory, episodic memory, and memory consolidation. In mechanistic terms, one of the best-understood phenomena in this memory system is memory consolidation. Roughly put, this is the phenomenon of transforming short-term memories (which are labile and easy to disrupt) into long-term memories, which are robust and enduring; when consolidation takes place, it permits the organism to remember important events for a longer period of time and modify its behavior accordingly [52]. To explain this phenomenon, all the relevant regions in the brain responsible for the functions that compose the neuro-cognitive mechanism of memory consolidation, including all relevant mechanistic levels of decomposition, must be identified; that is, all the particular component parts and component operations of the whole mechanism must be determined, as shown in **Figure 1**. Finally, the causal processes and causal interactions within the mechanism's functions also need to be understood, that is, the general organization of the mechanism.

The explanation starts at the highest level of the whole mechanism. At this level, it is necessary to correctly identify the entire large neural network that is responsible for memory consolidation. Secondly, it must be established whether this large neural system is indeed all that is relevant for the explanation of the phenomenon. The mechanistic explanation at this level also needs to clarify how the neural network processes information about new memory episodes through *computational operations* and how these processes produce and affect, for instance, the different degrees of consolidation that characterize the memories under investigation.

#### **Figure 1.**

*An example of a simple model of a neuro-cognitive biological mechanism (M1). In this model, M1 is composed, at level L1, of its component parts C1, C2, and C3, which perform the functions (or activities) f1, f2, and f3. The component parts can be decomposed into smaller components, as happens with C3, which is composed, at level L2, of the sub-components SC1, SC2, SC3, and SC4. The component SC3 can be further decomposed, at level L3, into its subcomponents ssc1, ssc2, and ssc3.*

Once this has been clarified, the explanation turns to the second level of description in which the large neural system is decomposed into particular sub-neural systems localized in more specific regions. Here the goal is to understand the information processing and computational operations (e.g., spiking patterns in populations of neurons) of these smaller neural networks and how they contribute to the performance of the whole mechanism composed of such neural nets.

Moreover, a further stage of decomposition must be reached that concerns the processes underlying memory at an intercellular level. The explanation at this particular level aims at describing the components of a particular neural network and at understanding how a small number of neurons operate (e.g., how they depolarize and fire in the process of propagation of action potentials, or how they are responsible for synaptic processes, neurotransmitters being released, and so on). Here it is possible to measure the spiking rates or spiking frequencies of neurons and to record neural activity in general.

Finally, the explanation can go even to another lower mechanistic level—the intracellular and molecular level. At this level, the description is in terms of the activity of relevant proteins, molecules, and ions. As one can see, this kind of explanation "exhibits a progression from the behavioral-level characterization of memory consolidation to the identification of important components in the process at progressively lower levels." [52]. All levels are equally important to achieve the complete multilevel mechanistic explanation of the particular phenomenon in the end.
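The multilevel decomposition just described can be represented as a simple nested structure. The sketch below is a purely illustrative toy mirroring the mechanism M1 of Figure 1; the class and the operation labels are my own, not part of any cited model.

```python
# A sketch of multilevel mechanistic decomposition: each component pairs
# a part with the operation it performs, and nesting encodes lower
# mechanistic levels, as in the toy mechanism M1 of Figure 1.

class Component:
    def __init__(self, name, operation, subcomponents=()):
        self.name = name
        self.operation = operation
        self.subcomponents = list(subcomponents)

    def levels(self, depth=1):
        """Yield (level, part, operation) tuples, walking the hierarchy."""
        yield depth, self.name, self.operation
        for sub in self.subcomponents:
            yield from sub.levels(depth + 1)

M1 = Component("M1", "whole-mechanism behavior", [
    Component("C1", "f1"),
    Component("C2", "f2"),
    Component("C3", "f3", [
        Component("SC1", "sub-operation 1"),
        Component("SC2", "sub-operation 2"),
        Component("SC3", "sub-operation 3", [
            Component("ssc1", "molecular-level activity 1"),
            Component("ssc2", "molecular-level activity 2"),
            Component("ssc3", "molecular-level activity 3"),
        ]),
        Component("SC4", "sub-operation 4"),
    ]),
])

for level, part, op in M1.levels(depth=0):
    print("  " * level + f"L{level}: {part} performs {op}")
```

Running the sketch prints the hierarchy from the whole mechanism (L0) down to level L3, which is the shape a complete multilevel mechanistic explanation is supposed to take: every part located at its level, together with the operation it contributes.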


Another example is related to human visual perception [9, 13, 40], which is roughly understood as the capacity to acquire and process visual information from objects and events in the environment. In the biological mechanism related to human visual perception, the occipital lobe is central, since many studies on humans show deficits in visual processing due to damage in the occipital lobe. The mechanism also includes a projection of the optic tract going from the eye, passing through the lateral geniculate nucleus (LGN), which is an area of the thalamus, and reaching the occipital lobe. Besides, it includes the eyes, optic nerves, and other brain areas responsible for visual perception. All these areas can be decomposed into working components and their operations, and each decomposition is considered to be a lower level in the entire constitution of the mechanism. The occipital lobe, for instance, can itself be decomposed into areas responsible for particular visual functions, such as the striate cortex, also known as Brodmann area 17, or V1 (primary visual cortex, or visual area 1).

The same procedure can be done for all the other areas in the brain that are also part of the mechanism responsible for visual perception; for instance, V2, V3, V4, and V5/MT. It is necessary to identify also the cells (including visual receptor cells in the retina of the eye, such as cones and rods), networks of cells, or larger neural systems in these areas that are responsible for *information processing and computation*, for example, about light and dark spots, bars of light (edges), size, shape, color, depth, location, and motion of objects in the visual field. The mechanism also includes the pathways and channels through which the information is transmitted and the information about intercellular, intracellular, and molecular processes.

As one can observe by looking at these two examples, the notion of "computation" in the mechanistic framework stands for some causal interactions within the nervous system and this is how different brain regions "compute" different information. Each brain region "stands for" some kind of particular information—related to perception, sensation, memory, language, reasoning, emotion, etc. The substantial problems with such an account of human cognition will be analyzed in what follows.

## **4. Major challenges to the model**

A great deal of criticism has arisen in the specialized literature concerning the notion of human cognitive computation. It is nearly impossible to review all of these works, but I will consider some of the most influential criticisms.

Fodor [53–55], for instance, claims that many mental representations (e.g., beliefs) and mental processes (e.g., abductive reasoning) are sensitive to global properties (i.e., properties that a belief has in virtue of the set of other beliefs to which it belongs). For example, a belief about a tennis racket being broken may complicate the plan of playing tennis on the weekend, but not the plan of playing soccer. This means that a mental representation, such as an intention to play tennis, will depend on the context at the moment—whether there is a racket available for the game or not. Fodor argues, though, that classical symbolic computing models are sensitive only to local properties, and that neural network models cannot handle this feature of human cognition.

Dreyfus [56], in turn, claims that much human knowledge cannot be captured by symbolic manipulation and formal rules, since this knowledge is constructed through direct contact and practice in the world. Nagel [57] brings attention to the problem of phenomenal consciousness—roughly, the issue of what it feels like to experience something subjectively. Following this line of thinking, we can also say that a

computer cannot know (if it can know anything) what it feels like to taste the flavor of chocolate. It has no idea of what it is like to eat chocolate, something that is quite basic for any child that does it. More than that, computers do not feel pain or pleasure, which is quite basic for human beings. Furthermore, Searle [58] brings attention to the difficulties related to intentionality, understanding, and meaning, with his famous "Chinese room argument." And, additionally, Putnam [59] develops the idea that mental states cannot be identified with computational states, consequently arguing vigorously against computational reductionism3 .

Bruner's criticism is also a very interesting case. One of the names most frequently mentioned in influential historical reconstructions of the events and studies that gave rise to the cognitive movement in psychology is the American psychologist Jerome Bruner (1915–2016) [1–3, 60, 61]. He is recognized for having founded, together with George Miller (1920–2012), the Center for Cognitive Studies at Harvard University in 1960. In addition, in 1956 Bruner published, together with colleagues, *A Study of Thinking*, in which he dealt systematically with concept formation from a cognitive perspective, giving great impetus to the movement. In his various works, Bruner contributed to scientific knowledge on many topics in psychology, such as perception, language, learning, and cognitive development [62].

One of the most interesting points in Bruner's work, however, is his strong criticism of the very cognitive movement he helped to create. He presented this criticism in key works such as *Acts of Meaning*, published in 1990, and *The Culture of Education*, published in 1996. Examining these works thus shows what an author with a rigorous background in scientific psychology, a high degree of theoretical sophistication, and extensive research in the field found wrong with the development of cognitivism.

In *Acts of Meaning*, Bruner [63] states that the original idea of the cognitivist movement of the 1950s was, in fact, to establish "meaning" as a central concept of psychology. In Bruner's view, however, this original impulse was distorted by a reductionist emphasis adopted by a dominant trend of the movement that defended computationalism. The emphasis was placed on "information," "information processing," and "computability," not on meaning and "meaning construction" [63]. As a result, concepts central to traditional inquiry in scientific psychology have been distorted, eliminated, or obscured, such as the concept of "intentional states" (believing, desiring, intending, understanding a meaning) and the concept of "agency," that is, the conduct of human action under the influence of intentional states [63].

However, in Bruner's view, this is not the way forward. In *The Culture of Education*, Bruner [64] says that, since the cognitive revolution, there have been two quite different conceptions of how the human mind works—the first hypothesizes that the human mind works as a computational system; the second proposes that the human mind is constituted and realized in the use of human culture. Bruner claims that his version of cognitivism is based not on reductionist computationalism but rather on what he called culturalism. His intention is to develop a theory of the human mind alternative to computationalism, one that focuses exclusively on "how human beings in cultural communities create and transform meanings" [64].

<sup>3</sup> Of course, these arguments are still being strongly debated, and there are many attempts to answer these concerns. Whether the answers are satisfying cannot be settled here. In any case, these arguments taken together provide a very compelling case against the idea that all human cognition can be understood and explained in computational terms.

## *The Neo-Mechanistic Model of Human Cognitive Computation and Its Major Challenges DOI: http://dx.doi.org/10.5772/intechopen.104995*

One of the major problems Bruner points out in the computationalist approach is that the production of meaning is often extremely complex, sensitive to context, and resistant to clear and precise formulation [64]. This is not the same as establishing computational procedures for processing input and output information to the system, whether that processing is in digital format or in the form of neural networks. For Bruner, meaning making is not merely information processing; it is something deeper and more complex. Culture, in his view, has a fundamental role in human life, and it is only through it and in it that certain mental processes and structures are formed and used.

The human being, in Bruner's view, was able to develop a way of life in which reality is represented by a symbolism shared by the members of a cultural community, and human life is organized and built from this symbolism, which is conserved, elaborated, and transmitted through successive generations [64]. Although meaning is in the mind and is produced by it, it also has its origins in culture and has its importance within the culture in which it was generated. And for the production of meanings, the human mind creates and makes use of symbolic cultural systems. Thus, in this view, thinking and learning are always situated in a cultural context [64]. Computer systems, however, are not capable of producing meanings. They deal only with a certain set of formalized and operationalized meanings, and they do not interpret human and cultural phenomena.

Furthermore, there is no clear reason to suppose that the processes and relationships among all mental phenomena are literally computational in nature, nor that all mental representations have this character. The application of the concept of computation to the phenomena investigated in the tradition of psychological research rests only on a working hypothesis internal to a particular theoretical system. There is as yet no concrete proof that all human cognition works according to some type of computational processing x, y, or z. In fact, finding out what kind of computational processing is related to the human mind has become an extremely debated issue among adherents of any computational model of human cognition [49]. It is no accident that comprehensive theoretical systems have been developed precisely with the intention of questioning the computational model of cognition.

Now, to illustrate more concretely some of the difficulties with the notion of human cognitive computation, let us consider some cases involving conscious complex informal reasoning and conscious complex decision making, in which explanations of human behavior might be required [22].

Consider, firstly, a case where a person is dissatisfied with her marriage and is thinking about getting a divorce. To make such a decision, she has been consciously reflecting for months on the current state of the marriage, her beliefs about the relationship, her emotions about her partner, her desires and expectations in life, the beliefs of her family and closest friends about the issue, and the reasons to take action in this regard. After thinking carefully for a very long time, aware that she really does not feel comfortable or happy at all, she decides to go for a divorce.

Consider also a second example. A person needs to decide which candidate she will vote for as president of her country. To make this decision, she needs to use her conscious informal reasoning ability. Thus, she reflects on the arguments put forward by the politicians running for election, by commentators, scientists, and political analysts, as well as by journalists writing on the subject, and on the arguments of friends and family she finds relevant and credible. After three months of thinking, she has not yet decided and is still in doubt between the major candidates A and B. When someone asks her which candidate she is going to vote for, she says: "I still don't know." Then surprising news appears in a serious newspaper, with charges of corruption against candidate A; she is a frequent reader of this newspaper, so she becomes immediately aware of it. Upon reflection on the matter and related issues, she takes the new information seriously and finally decides that voting for candidate B is the best option. The major reason is that there is no charge of corruption whatsoever against him. When she is now asked which candidate she is going to vote for, she answers immediately: "candidate B." Having made up her mind, she finally goes to the appropriate place on the proper day and time to cast her vote.

A third example is the case of a college student who suffers from difficulties related to his excessive anxiety. A general psychological assessment shows that the factors related to the student's anxiety are financial difficulties, difficulties in a family life marked by physical and psychological violence, difficulties in finding leisure time to relax and have fun (since he needs to work and study at the same time), and difficulties with excessive concerns about an uncertain future, as he believes it will not be easy to find a job when he graduates. All of these factors seem to contribute to generating in the student's mind distorted and dysfunctional negative thoughts about himself and his life, and it seems very plausible that these distorted thoughts are strongly associated with his excessive anxiety. This interpretation is, indeed, supported by numerous works in the specialized clinical psychology literature. Thus, we observe that the most relevant causal factors explaining this psychological phenomenon are not merely computational, but psychological, social, and environmental.

In these cases, psychological scientific explanations require considerations that go beyond the investigation of computations performed in nervous systems, or in any abstract functional system. What explains the phenomena of belief formation and decision making in the first example, and the excessive anxiety in the third, is the formation of meaning and the interaction of beliefs, desires, and intentions to act (according to logical rules, practical rules, and interpretations of reality), all strongly affected by emotions, the physical environment, and social factors.

In the second example, evidently, an informative explanation would have to mention an important causal factor—the corruption charges against candidate A appearing in a serious newspaper. Moreover, the explanation would have to mention that the person becomes aware of this event, accepts the source as reliable, accepts the charges as true and accurate, and now holds this content in one or more of her beliefs. In possession of this content, she can rationally justify herself when engaging in discussions about the topic with family, friends, and other people, providing reasons for her related beliefs and behaviors. Thus, the influence of the event on her is external and affects the internal logic and content of her systems of beliefs, emotions, desires, and intentions. This explanation involves particular properties of human cognitive systems, present for instance in belief and intention systems. These properties are clearly different from those involved in merely describing supposed automatic computational activities in her neural networks, or what is happening in terms of physical and chemical neural processes. The explanation of this phenomenon of belief formation, therefore, would also have to account for how this new information could change a particular belief given her system of beliefs about the topic.

In the examples above, there are cognitive processes that often require consciousness and complex informal reasoning about belief systems, which are often linked to particular perceptions, sensations, emotions, desires, intentions, and attitudes, as well as to each other and to the external environment. Some of these beliefs have great value, such as moral beliefs, which makes the whole dynamic even more complex. In these cases, blind computation might even occur at some level, but what matters most are the environmental, social, cultural, historical, and psychological factors (such as beliefs, emotions, desires, and intentions) that acquire meaning in a given cognitive system.

The relevant explanation of action in such cases proceeds through considerations (1) about the creation and alteration of the content of perceptions, beliefs, sensations, emotions, maxims, wills, desires, intentions, etc.; (2) about their internal relationships; and (3) about their external relationships with the physical, social, historical, and cultural context. Rigorous empirical research can help discover strong and systematic (stable) regularities in human behavior explained in such terms, without any need for the notion of computation. Statistical tools and analyses can bring greater mathematical objectivity, avoiding an extremely subjective and confusing vocabulary as well as unproductive speculation and mere common sense.

Moreover, self-consciousness is crucial here, since we humans have the ability to *evaluate* our own beliefs, not just to be aware that we have them. If we can access some beliefs as belonging to our cognitive belief system, we can evaluate whether they are true or false, precise or imprecise, and how they relate to our emotions and sensations, and we can decide whether we want to keep them. Complex social dynamics are also crucial, since our belief systems constantly interact with the beliefs of others during our lifetime, and this interaction has a major influence on the formation and modification of our belief, emotional, and volitional systems.

Therefore, human beings can form original belief systems and relate them according to logical and interpretative rules, building arguments to support their point of view, which often influences their behavior. Human beings are also able to think about different types of relevant information for months or years in order to make an important and complex decision. To make a difficult decision, a human being can take into account information related to plans for the very distant future, in which many scenarios are considered. A human may wonder what happened in the very distant past, or what might have happened otherwise, even knowing what really happened. And complex informal reasoning and complex decision making are things humans do naturally and often in their daily lives.

Thus, cognitive science must deal with extremely complex phenomena, given that human beings differ greatly from other animals in nature. Human beings have a cumulative, complex, dynamic, and elaborate culture that is passed on through generations. Humans are also involved in understanding and writing their own history. They have natural languages with enormous, complex, and refined expressive power and sophisticated grammar. Human beings practice and appreciate art, such as literature, painting, cinema, and music. They engage in purely formal or highly abstract thought when doing mathematics and logic, and in certain forms of religious thought. They create laws for their societies and think about morality, building moral systems. They build artificial intelligence machines that can learn with a certain level of autonomy and explore other planets. Furthermore, humans are involved in politics, science, and philosophy.

Computers, by contrast, so far do not form beliefs on their own, do not have the capacity to evaluate and improve them by themselves, and do not interact in the social environment, whether by using natural language with the sophistication humans do or by engaging in social and cultural practices. If we look at the problem from a very concrete and objective point of view, we observe that even the most advanced computer systems, robots, and artificial neural and cognitive architectures today are still very far from behaving like human beings with respect to language and to actions involving consciousness and informal rationality. A human can play chess, cook a pizza, make coffee, have a conversation about politics, create a new song on a guitar, and play tennis, all on the same day. No artificial computational system is currently capable of this generality in cognition. So, as a matter of current fact, computational artificial cognition cannot be used to fully explain the major capacities of human cognition and intelligence.

It is no surprise, then, that mechanistic accounts of psychological capacities usually suggest only *where* the putative computations probably take place in the idealized standard human brain (as the examples in the previous section show), not *what* exactly these computations are or how they relate to the internal subjective experience of a person (such as the content of a strong belief, which can normally be accessed and become conscious).

Difficulties with the notion of cognitive computation are recognized by influential neo-mechanists themselves. Milkowski [21], for instance, concludes his work by admitting that we "still don't know how to model consciousness mechanistically." Additionally, there are several alternative models of cognitive computation in cognitive science today—syntactic, algorithmic, causal, and semantic computation [65]. None of these models has gained significant prominence over the others in understanding and explaining human cognition. Finally, there is strong criticism even of the neo-mechanists' claim that good computational explanations in cognitive science must also be mechanistic explanations [66, 67].

Therefore, looking at the issue in light of current facts, we must recognize that the neo-mechanistic proposal for human cognition is still far from being the best or most plausible understanding and explanation of human cognition. It is just one view among many.

## **5. Conclusion**

On the one hand, the mechanistic framework has offered significant contributions to the field of cognitive science. One of its best contributions is the promotion of debates on the issue of human cognitive computation, in a search for a better understanding of what this notion actually means. All this effort is very worthwhile and welcome. More generally, the theoretical debate about fundamental questions in cognitive science promoted by new mechanists is also very important, as is their effort to clarify what a "biological mechanism" and a "cognitive mechanism" are and what a "mechanistic explanation" in cognitive science is. Furthermore, another contribution of the new mechanistic philosophy is to encourage historical research and current debate, in cognitive science and beyond, about the relationships among "mechanism," "materialism," "reductionism," and "computationalism," so that these concepts are not conflated and so that the positions adopted by the authors, as well as the different dimensions of the debate, are appreciated fairly and correctly. Finally, the new mechanistic philosophy applied to cognitive science is also contributing to the important debate concerning unity, integration, and plurality in the field.

On the other hand, however, many of the current promises of the new mechanism for cognitive science are quite difficult to fulfill. Firstly, neo-mechanistic philosophy is a philosophy of science built primarily from examples in the biological sciences and neuroscience that is now serving as the basis for a philosophy of the science of mind. We live in a period in which neuroscience and artificial intelligence research have gained great prestige and recognition; a great deal of economic investment has been made in these areas, and this is very attractive. In part, this also drives "the new wave of mechanism" and the push by some authors to expand the framework. However, numerous particularities of psychology and human cognition are neglected in this theoretical structure, as I have tried to show.

Secondly, there is considerable disagreement among leading neo-mechanists over the most plausible formulation of MTHC regarding fundamental issues, such as the idea of human cognitive computation. There is thus considerable difficulty in the internal articulation and unification of the theory. Furthermore, many alternative major theories, and the research programs based on them, strongly threaten the neo-mechanistic framework in current cognitive science, since they too are seeking predominance in the field, or at least more space and recognition.

Given this, we can conclude that the mechanistic model of human cognitive computation cannot provide substantial theoretical or explanatory unification or integration to the field of cognitive science today, since there is no unification even among its proponents. Moreover, their different proposals are often unclear on many important aspects of the traditional problems of intentionality, consciousness, and self-consciousness. The accounts are sometimes not well articulated internally; and, externally, they face serious criticism, with countless debates and controversies on several fundamental questions. In addition, several alternative models compete for predominance on this particular issue, and it is by no means clear whether the explanatory power of any of them is greater than that of the others.

This analysis shows, therefore, that the neo-mechanistic proposal concerning human cognitive computation has serious weaknesses. The problem, however, is not using the idea of cognitive computation to advance models of biological and artificial cognitive architectures, since many human cognitive abilities can already be simulated. Indeed, it is very interesting to see that our science has advanced to the point where a computer can beat the best chess and Go players in the world. Advances in computational artificial systems and robotics could well be applied to improve our educational and health systems. For example, inspired by scientific developments in cognitive science, artificial cognitive systems could possibly be developed to help children learn mathematics, natural language, or history at school, or even at the university level. Artificial systems could possibly be developed to help people with excessive anxiety symptoms as well. This could be extremely worthwhile. Moreover, better and more advanced artificial cognitive and robotic systems can contribute to improving theories of human cognition, just as better and more correct theories of human cognition can speed the advancement of artificial cognitive and robotic systems. But there is good reason to keep these efforts separate and to consider human cognition a very complex and particular phenomenon in nature.

The problem arises only with the untenable suggestion that we already have, or are very close to getting, a complete and definitive understanding and explanation of all the major capacities of human cognition in computational terms. That, indeed, is a mistake.

## **Conflict of interest**

The author declares no conflict of interest.

## **Author details**

Diego Azevedo Leite

Federal University of Alfenas, Alfenas, Minas Gerais, Brazil

\*Address all correspondence to: diego.azevedo@unifal-mg.edu.br

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## **References**

[1] Leahey TH. A History of Psychology: From Antiquity to Modernity. 8th ed. New York: Routledge; 2018

[2] Mandler G. A History of Modern Experimental Psychology: From James and Wundt to Cognitive Science. Cambridge, MA: MIT Press; 2007

[3] Gardner H. The Mind's New Science: A History of the Cognitive Revolution. New York: Basic Books; 1985

[4] Glennan S. The New Mechanical Philosophy. Oxford: Oxford University Press; 2017

[5] Glennan S, Illari P. Introduction: Mechanisms and mechanical philosophies. In: Glennan S, Illari P, editors. The Routledge Handbook of Mechanisms and Mechanical Philosophy. New York: Routledge; 2018. pp. 1-10

[6] Krickel B. The Mechanical World. Cham: Springer; 2018. Available from: http://link.springer.com/10.1007/978-3-030-03629-4

[7] Glennan S, Illari P, Weber E. Six theses on mechanisms and mechanistic science. Journal for General Philosophy of Science. 2021. DOI: 10.1007/s10838-021-09587-x

[8] Akagi M. Rethinking the problem of cognition. Synthese. 2018;**195**:3547-3570

[9] Bechtel W. Mental Mechanisms: Philosophical Perspectives on Cognitive Neuroscience. New York: Routledge; 2008

[10] Craver CF. Explaining the Brain: Mechanisms and the Mosaic Unity of Neuroscience. Oxford: Oxford University Press; 2007

[11] Milkowski M. Explaining the Computational Mind. Cambridge, MA: MIT Press; 2013

[12] Piccinini G. Physical Computation: A Mechanistic Account. Oxford: Oxford University Press; 2015

[13] Piccinini G. Neurocognitive Mechanisms: Explaining Biological Cognition. Oxford: Oxford University Press; 2020

[14] Thagard P. Hot Thought: Mechanisms and Applications of Emotional Cognition. Cambridge, MA: MIT Press; 2006

[15] Thagard P. Brain-Mind: From Neurons to Consciousness and Creativity (Treatise on Mind and Society). Oxford: Oxford University Press; 2019

[16] Bechtel W, Abrahamsen A. Explanation: A mechanist alternative. Studies in History and Philosophy of Biological and Biomedical Sciences. 2005;**36**(2):421-441

[17] Machamer P, Darden L, Craver CF. Thinking about mechanisms. Philosophy of Science. 2000;**67**(1):1-25

[18] Bechtel W. Resituating cognitive mechanisms within heterarchical networks controlling physiology and behavior. Theory & Psychology. 2019;**29**(5):620-639

[19] Bechtel W. Investigating neural representations: The tale of place cells. Synthese. 2016;**193**(5):1287-1321. DOI: 10.1007/s11229-014-0480-8

[20] Bechtel W. Looking down, around, and up: Mechanistic explanation in psychology. Philosophical Psychology. 2009;**22**(5):543-564

[21] Milkowski M. Mechanisms and the mental. In: The Routledge Handbook of Mechanisms and Mechanical Philosophy. New York: Routledge; 2018. pp. 74-88

[22] Leite DA. The Twenty-First Century Mechanistic Theory of Human Cognition: A Critical Analysis. Cham: Springer; 2021

[23] Milkowski M. Integrating cognitive (neuro)science using mechanisms. Avante. 2016;**6**(2):45-67

[24] Milkowski M, Hohol M, Nowakowski P. Mechanisms in psychology: The road towards unity? Theory & Psychology. 2019;**29**(5):567-578. DOI: 10.1177/0959354319875218

[25] Piccinini G, Craver CF. Integrating psychology and neuroscience: Functional analyses as mechanism sketches. Synthese. 2011;**183**(3):283-311

[26] Popa T. Mechanisms: Ancient sources. In: Glennan S, Illari P, editors. The Routledge Handbook of Mechanisms and Mechanical Philosophy. New York: Routledge; 2018. pp. 13-25

[27] Roux S. From the mechanical philosophy to early modern mechanisms. In: Glennan S, Illari P, editors. The Routledge Handbook of Mechanisms and Mechanical Philosophy. New York: Routledge; 2018. pp. 26-45

[28] Allen GE. Mechanism, organicism, and vitalism. In: Glennan S, Illari P, editors. The Routledge Handbook of Mechanisms and Mechanical Philosophy. New York: Routledge; 2018. pp. 59-73

[29] Boas M. The establishment of the mechanical philosophy. Osiris. 1952;**10**(1952):412-541

[30] Des Chene D. Mechanisms of life in the seventeenth century: Borelli, Perrault, Régis. Studies in History and Philosophy of Biological and Biomedical Sciences. 2005;**36**(2):245-260

[31] Gunderson K. Descartes, La Mettrie, Language, and Machines. Philosophy. 1964;**XXXIX**(149):193-222

[32] Allen GE. Mechanism, vitalism and organicism in late nineteenth and twentieth-century biology: The importance of historical context. Studies in History and Philosophy of Biological and Biomedical Sciences. 2005;**36**(2):261-283

[33] Loeb J. The Mechanistic Conception of Life. Chicago: The University of Chicago Press; 1912

[34] Nagel E. The Structure of Science: Problems in the Logic of Scientific Explanation. New York: Harcourt, Brace & World, Inc.; 1961

[35] Craver CF. Interlevel experiments and multilevel mechanisms in the neuroscience of memory. Philosophy of Science. 2002;**69**(S3):S83-S97

[36] Bechtel W. Multiple levels of inquiry in cognitive science. Psychological Research. 1990;**52**(2-3):271-281

[37] Bechtel W. Levels of description and explanation in cognitive science. Minds and Machines. 1994;**4**(1):1-25

[38] Bechtel W. Constructing a philosophy of science of cognitive science. Topics in Cognitive Science. 2009;**1**(3):548-569

[39] Bechtel W. How can philosophy be a true cognitive science discipline? Topics in Cognitive Science. 2010;**2**(3):357-366

[40] Boone W, Piccinini G. The cognitive neuroscience revolution. Synthese. 2016;**193**(5):1509-1534


[41] Bechtel W. Decomposing the mindbrain: A long-term pursuit. Brain and Mind. 2002;**3**(2):229-242

[42] Craver CF. The making of a memory mechanism. Journal of the History of Biology. 2003;**36**(1):153-195

[43] Craver CF. Beyond reduction: Mechanisms, multifield integration and the unity of neuroscience. Studies in History and Philosophy of Biological and Biomedical Sciences. 2005;**36**(2):373-395

[44] Craver CF, Bechtel W. Topdown causation without top-down causes. Biology and Philosophy. 2007;**22**(4):547-563

[45] Bechtel W, Wright CD. What is psychological explanation? In: Symons J, Calvo P, editors. The Routledge Companion to Philosophy of Psychology. New York: Routledge; 2009. pp. 113-130

[46] Churchland PS. Neurophilosophy: Toward a Unified Science of the Mind-Brain. Cambridge, MA: MIT Press; 1986

[47] Craver C, Tabery J. Mechanisms in science. In: Zalta EN, editor. The Stanford Encyclopedia of Philosophy. Summer 2019 Edition. 2019. Available from: https://plato.stanford.edu/archives/sum2019/entries/science-mechanisms/

[48] Samuels R, Margolis E, Stich SP. Introduction: Philosophy and Cognitive Science. In: Margolis E, Samuels R, Stich SP, editors. The Oxford Handbook of Philosophy of Cognitive Science. Oxford: Oxford University Press; 2012. pp. 1-18

[49] Rescorla M. The computational theory of mind. In: Zalta EN, editor. The Stanford Encyclopedia of Philosophy. Fall 2020 Edition. 2020. Available from: https://plato.stanford.edu/archives/fall2020/entries/computational-mind/

[50] Buckner C, Garson J. Connectionism. In: Zalta EN, editor. The Stanford Encyclopedia of Philosophy. Fall 2019 Edition. 2019. Available from: https:// plato.stanford.edu/archives/fall2019/ entries/connectionism/

[51] Piccinini G. Computing mechanisms. Philosophy in Science. 2007;**74**(4):501-526

[52] Bechtel W. Molecules, systems, and behavior: Another view of memory consolidation. In: Bickle J, editor. The Oxford Handbook of Philosophy and Neuroscience. Oxford: Oxford University Press; 2009. pp. 13-40

[53] Fodor J. The Mind Doesn't Work That Way: The Scope and Limits of Computational Psychology. Cambridge, MA: MIT Press; 2000

[54] Fodor J. Reply to Steven Pinker "So how does the mind work?". Mind & Language. 2005;**20**(1):25-32

[55] Fodor J. LOT 2: The Language of Thought Revisited. Oxford: Oxford University Press; 2008

[56] Dreyfus HL. What Computers Can't Do: A critique of artificial reason. New York: Harper & Row, Publishers; 1972

[57] Nagel T. What is it like to be a bat? Philosophical Review. 1974;**83**(4):435-450

[58] Searle JR. Minds , brains , and programs. The Behavioral and Brain Sciences. 1980;**3**:417-457

[59] Putnam H. Representation and Reality. Cambridge, MA: MIT Press; 1988

[60] Bechtel W, Abrahamsen A, Graham G. The life of cognitive science. In: Bechtel W, Graham G, editors. A Companion to Cognitive Science. Oxford: Blackwell Publishing; 1998. pp. 1-104

[61] Miller GA. The cognitive revolution: A historical perspective. Trends in Cognitive Sciences. 2003;**7**(3):141-144

[62] Greenfield PM. Jerome Bruner (1915- 2016). Nature. 2016;**535**:232

[63] Bruner J. Acts of Meaning. Cambridge, MA: Harvard University Press; 1990

[64] Bruner J. The Culture of Education. Cambridge, MA: Harvard University Press; 1996

[65] Shagrir O. In defense of the semantic view of computation. Synthese. 2020;**197**:4083-4108. DOI: 10.1007/ s11229-018-01921-z

[66] Chirimuuta M. Minimal models and canonical neural computations: The distinctness of computational explanation in neuroscience. Synthese. 2014;**191**(2):127-153

[67] Serban M. The scope and limits of a mechanistic view of computational explanation. Synthese. 2015;**192**:3371-3396

## **Chapter 2**

## Learning Robotic Ultrasound Skills from Human Demonstrations

*Miao Li and Xutian Deng*

## **Abstract**

Robotic ultrasound systems play a vital role in assisting, or in some cases even replacing, sonographers. However, modeling and learning ultrasound skills from professional sonographers remain challenging tasks that hinder the development of autonomous ultrasound systems. To address these problems, we propose a learning-based framework to acquire ultrasound scanning skills from human demonstrations<sup>1</sup>. First, ultrasound scanning skills are encapsulated in a high-dimensional multi-modal model, which takes ultrasound images, the probe pose, and the contact force into account. The model's parameters can be learned from clinical ultrasound data demonstrated by professional sonographers. Second, a target function for autonomous ultrasound examination is proposed, which can be solved approximately with a sampling-based strategy. The sonographers' ultrasound skills can be represented by approximately optimizing this target function. Finally, the robustness of the proposed framework is validated in experiments on ground-truth data from sonographers.

**Keywords:** robotic ultrasound, robotic skills learning, learning from demonstrations, compliant manipulation, multi-modal prediction

## **1. Introduction**

Ultrasound imaging technology is widely used in clinical diagnosis owing to its non-invasive, low-hazard, real-time imaging, relative safety, and low cost. Nowadays, ultrasound imaging can quickly detect diseases of different anatomical structures, including the liver [1], gallbladder [2], bile duct [3], spleen [4], pancreas [5], kidney [6], adrenal gland [7], bladder [8], prostate [9], and thyroid [10]. Moreover, during the global pandemic caused by COVID-19, ultrasound has been widely used for the diagnosis of infected persons by detecting pleural effusion [11, 12]. However, the performance of an ultrasound examination is highly dependent on the ultrasound skills of the sonographer, in terms of ultrasound images, probe pose, and contact force (**Figure 1**). In general, the training of an eligible sonographer requires a relatively large amount of time and cases [13, 14]. In addition, the high-intensity repetitive scanning process causes a

<sup>1</sup> More details about our original research: https://arxiv.org/abs/2111.09739; https://arxiv.org/abs/2111.01625.

#### **Figure 1.**

*Medical ultrasound examination (left) requires dexterous manipulation of the ultrasound probe (right), owing to the environmental complexity in terms of ultrasound images, probe pose, and contact force. (a) Clinical medical ultrasound examination. (b) Ultrasound probe.*

heavy burden on sonographers' physical condition, further leading to the scarcity of ultrasound practitioners.

To address these issues, many previous studies in robotics have attempted to use robots to help or even replace sonographers [15–17]. According to the extent of system autonomy, robotic ultrasound can be categorized into three levels: teleoperated, semi-autonomous, and fully autonomous. A teleoperated robotic ultrasound system usually contains two main parts, a teacher site and a student site [18–20]. The motion of the student robot is completely determined by the teacher, usually a trained sonographer, through different kinds of interaction devices, including a 3D space mouse [18], an inertial measurement unit (IMU) handle [20, 21], and a haptic interface [21]. In a semi-autonomous robotic ultrasound system, by contrast, the motion of the student robot is only partly determined by the teacher [22–24].

In a fully autonomous robotic ultrasound system, the student robot is supposed to perform the whole process of local ultrasound scanning by itself [25–27], and the teacher site is used only in emergencies. To date, only partially autonomous robotic ultrasound systems have been reported in the literature [28, 29]. These systems usually focus on the scanning of certain anatomical structures, such as the abdomen [28], thyroid [26], and vertebra [29]. A comprehensive survey on robotic ultrasound is given in **Table 1**. Despite these achievements, many obstacles remain in the development of robotic ultrasound systems. For example, the robustness of most systems is poor, and some preparations are required before performing an examination. The key missing piece is a high-dimensional model that learns ultrasound skills (**Figure 2**) from the sonographer and then guides the adjustment of the ultrasound probe.

In this chapter, we propose a learning-based approach to represent and learn ultrasound skills from sonographers' demonstrations, and further to guide the scanning process [31]. During the learning process, the ultrasound images, together with the relevant scanning variables (the probe pose and the contact force), are recorded and encapsulated into a high-dimensional model. Then, we leverage the

*Learning Robotic Ultrasound Skills from Human Demonstrations DOI: http://dx.doi.org/10.5772/intechopen.105069*


#### **Table 1.**

*A brief summary of robotic ultrasound. Initials: Convolutional neural network (CNN), magnetic resonance imaging (MRI), support vector machine (SVM), reinforcement learning (RL).*

#### **Figure 2.**

*The feedback information from three different modalities during a free-hand ultrasound scanning process. The first row represents ultrasound images. The second row represents the contact force in the z-axis between the probe and the skin, collected using a six-dimensional force/torque sensor. The third row represents the probe pose, which is collected using an inertial measurement unit (IMU).*

power of deep learning to implicitly capture the relation between the quality of ultrasound images and scanning skills. During the execution stage, the learned model is used to evaluate the current quality of the ultrasound image. To obtain a high-quality ultrasound image, a sampling-based approach is used to adjust the probe motion.

The main contribution of this chapter is two-fold: (1) a multi-modal model of ultrasound scanning skills is proposed and learned from human demonstrations, taking ultrasound images, the probe pose, and the contact force into account; (2) based on the learned model, a sampling-based strategy is proposed to adjust the ultrasound scanning process so as to obtain a high-quality ultrasound image. Note that the goal of this chapter is to offer a learning-based framework to understand and acquire ultrasound skills from human demonstrations [31]. The learned model can, of course, also be ported to a robot system, which is the next step of our work [32].

This chapter is organized as follows. Section 2 presents related work in the field of ultrasound images and ultrasound scanning guidance. Section 3 provides the methodology of our model, including the learning process of the task representation, the data acquisition process through human demonstrations, and the strategy for scanning guidance during real-time execution. Section 4 describes the detailed experimental validation, with a final discussion and conclusion in Section 5.

## **2. Related work**

### **2.1 Ultrasound image evaluation**

The goal of ultrasound image evaluation is to understand images in terms of classification [33], segmentation [34], recognition [35], etc. With the rise of deep learning, many studies have attempted to process ultrasound images with the help of neural networks.

Liu et al. have summarized the extensive research results on ultrasound image processing with different network structures, including the convolutional neural network (CNN), recurrent neural network (RNN), auto-encoder network (AE), restricted Boltzmann machine (RBM), and deep belief network (DBN) [36]. From the perspective of applications, Sridar et al. have employed a CNN for main-plane classification in fetal ultrasound images, considering both local and global features of the ultrasound images [37]. To judge the severity of patients, Roy et al. have collected ultrasound images of COVID-19 patients' lesions to train a spatial transformer network [38]. Deep learning has also been adopted for segmenting thyroid nodules from real-time ultrasound images [39]. While deep learning provides a superior framework for understanding ultrasound images, it generally requires a large amount of expert-labeled data, which can be difficult and expensive to collect.

The confidence map provides an alternative approach to ultrasound image processing [40]. It is obtained through pixel-wise confidence estimation using a random-walk algorithm. Chatelain et al. have devised a control law based on the ultrasound confidence map [41, 42], with the goal of adjusting the in-plane rotation and motion of the probe. The confidence map has also been employed to automatically determine proper parameters for ultrasound scanning [25]. Furthermore, the advantages of confidence maps have been demonstrated by combining them with position and force control to perform automatic position and pressure maintenance [43].

However, the confidence map is built from hand-coded rules, so it cannot be directly used to guide the scanning motion.

### **2.2 Learning of ultrasound scanning skills**

While the goal of ultrasound image processing is to understand images, learning ultrasound scanning skills aims to obtain high-quality ultrasound images through the adjustment of the scanning operation. Droste et al. have used a clamping device with an IMU to obtain the relation between the probe pose and the ultrasound images during an ultrasound examination [44]. Li et al. have built a simulation environment based on 3D ultrasound data acquired by a robot arm with a mounted ultrasound probe [45]. However, they did not explicitly learn ultrasound scanning skills; instead, a reinforcement learning framework is adopted to optimize the confidence map of ultrasound images by adapting the movement of the ultrasound probe. All of the abovementioned works take only the pose and position of the probe as input, whereas in this chapter the contact force between the probe and the human is also encoded, since it is considered a crucial factor during the ultrasound scanning process [46].

For the learning of force-relevant skills, a great variety of previous studies in robotic manipulation have focused on learning the relation between force information and other task-related variables, such as position and velocity [47], surface electromyography [48], task states and constraints [49], and the desired impedance [50–52]. A multi-modal representation method for contact-rich tasks has been proposed in ref. [53] to encode concurrent feedback information from vision and touch. The method was learned through self-supervision and can be further exploited to improve sampling efficiency and the task success rate. To the best of our knowledge, this is the first work to learn the task representation and the corresponding manipulation skills from human demonstrations for a multi-modal manipulation task that includes feedback from ultrasound, force, and motion.

## **3. Problem statement and method**

Our goal is to learn free-hand ultrasound scanning skills from human demonstrations. We want to evaluate the multi-modal task quality by combining multiple sensory signals, including ultrasound images, the probe pose, and the contact force, with the goal of extracting skills from the task representation and even transferring skills across tasks. We formulate the multisensory data with a neural network whose parameters are trained on data labeled by human ultrasound experts. In this section, we discuss the learning of the task representation, the data collection procedure, and the online ultrasound scanning guidance, respectively.

### **3.1 Learning of ultrasound task representation**

For a free-hand ultrasound scanning task, three types of sensory feedback are available—ultrasound images from the ultrasound machine, force feedback from a mounted F/T sensor, and the probe pose from a mounted IMU. To encapsulate the heterogeneous nature of this sensory data, we propose a domain-specific encoder to model the task, as shown in **Figure 3**. For the ultrasound imaging feedback, we use a VGG-16 network to encode the 224 × 224 × 3 RGB images and yield a 128-d feature vector. For the force and pose feedback, we encode them with a four-layer fully

#### **Figure 3.**

*The multi-modal task learning architecture with human annotations. The network takes data from three different sensors as input—The ultrasound images, force/torque (F/T), and the pose information. The data for the task learning is acquired through human demonstrations, where the ultrasound quality is evaluated by sonographers. With the trained network, the multi-modal task can be represented as a high-dimensional vector.*

connected neural network to produce a 128-d feature vector. The resulting two feature vectors are concatenated into one 256-d vector and passed through a one-layer fully connected network to yield a 128-d *task feature vector*. The multi-modal task representation is thus a neural network model, denoted Ω*θ*, whose parameters are trained as described in the following section.
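The architecture above can be sketched in PyTorch. This is a minimal sketch: the small convolutional image branch stands in for the chapter's full VGG-16 to keep the example self-contained, and the hidden sizes of the pose/force branch are illustrative assumptions; only the 128-d branch outputs, the 256-d concatenation, and the one-layer fusion follow the text.

```python
import torch
import torch.nn as nn

class UltrasoundTaskEncoder(nn.Module):
    """Sketch of the multi-modal encoder in Figure 3.

    The image branch is a small CNN stand-in (the chapter uses VGG-16);
    the pose/force hidden sizes are illustrative assumptions.
    """
    def __init__(self):
        super().__init__()
        # Image branch: 224x224x3 ultrasound image -> 128-d feature vector.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 128),
        )
        # Pose/force branch: quaternion (4) + force (3) -> 128-d feature,
        # via a four-layer fully connected network.
        self.pose_force_encoder = nn.Sequential(
            nn.Linear(7, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 128),
        )
        # Fusion: 256-d concatenation -> 128-d task feature vector.
        self.fusion = nn.Linear(256, 128)

    def forward(self, image, pose_force):
        img_feat = self.image_encoder(image)           # (B, 128)
        pf_feat = self.pose_force_encoder(pose_force)  # (B, 128)
        return self.fusion(torch.cat([img_feat, pf_feat], dim=1))  # (B, 128)

model = UltrasoundTaskEncoder()
task_feat = model(torch.randn(2, 3, 224, 224), torch.randn(2, 7))
print(task_feat.shape)  # torch.Size([2, 128])
```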

## **3.2 Data collection via human demonstration**

The multi-modal model shown in **Figure 3** has a large number of learnable parameters. To obtain the training data, we design a procedure to collect ultrasound scanning data from human demonstrations, as shown in **Figure 4**. A novel probe holder is designed with integrated sensors (an IMU and an F/T sensor). A sonographer performs the ultrasound scanning with this probe, and the data collected during the scanning process are described as follows:

• *D* = {(*S<sub>i</sub>*, *P<sub>i</sub>*, *F<sub>i</sub>*)}<sub>*i* = 1 … *N*</sub> denotes a dataset with *N* observations.

• *S<sub>i</sub>* ∈ ℝ<sup>224 × 224 × 3</sup> denotes the *i*-th collected ultrasound image, cropped to this size.

## **Figure 4.**

*The ultrasound scanning data collected from human demonstrations. The sonographer is performing an ultrasound scanning with a specifically designed probe holder. The sensory feedback during the scanning process is recorded, including the ultrasound images from an ultrasound machine, the contact force and torque from a 6D F/T sensor, and the probe pose from an IMU sensor.*


For each record in the dataset *D*, the quality of the obtained ultrasound image is evaluated by three sonographers and labeled 1/0: 1 stands for a good ultrasound image, while 0 corresponds to an unacceptable one. With the recorded data and the human annotations, the model Ω*θ* is trained with a cross-entropy loss function, which we minimize with stochastic gradient descent. Once trained, the network produces a 128-d feature vector and evaluates the quality of the task at the same time. Given the task representation model Ω*θ*, an online adaptation strategy is proposed to improve the task quality by leveraging the multi-modal sensory feedback, as discussed in the next section.
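A minimal sketch of the training step described above (cross-entropy loss, stochastic gradient descent, 1/0 labels). The tiny stub encoder and all layer sizes other than the 128-d feature are placeholders, not the chapter's actual network; the learning rate and batch size of 20 follow Section 4.3.

```python
import torch
import torch.nn as nn

# Stand-in task model: any module mapping (image, pose/force) -> 128-d
# feature vector; a tiny stub keeps the example runnable.
encoder = nn.ModuleDict({
    "img": nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 128)),
    "pf": nn.Linear(7, 128),
    "fuse": nn.Linear(256, 128),
})
head = nn.Linear(128, 2)  # two classes: 1 = good image, 0 = unacceptable

params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.001)  # lr as in Section 4.3
criterion = nn.CrossEntropyLoss()

# One SGD step on a fake mini-batch (batch size 20, as in Section 4.3).
images = torch.randn(20, 3, 16, 16)
pose_force = torch.randn(20, 7)
labels = torch.randint(0, 2, (20,))            # sonographer 1/0 annotations

feat = encoder["fuse"](torch.cat(
    [encoder["img"](images), encoder["pf"](pose_force)], dim=1))
loss = criterion(head(feat), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("loss:", round(loss.item(), 3))
```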

### **3.3 Ultrasound skill learning**

As discussed in the related work, modeling and planning complex force-relevant tasks remains challenging, mainly due to inaccurate state estimation and the lack of a dynamics model. In our case, it is difficult to explicitly model the relations among the ultrasound images, the probe pose, and the contact force. Therefore, we formulate the policy of ultrasound skills as a model-free reinforcement learning problem, with the following target function:

$$\begin{array}{ll}\underset{P,F}{\operatorname{maximize}} & Q\_{\theta} = f(\mathcal{S}, P, F | \Omega\_{\theta}) \\\\ \text{subject to} & P \in D\_{P}, \ F \in D\_{F}, \\\\ & F\_{z} \geq 0. \end{array} \tag{1}$$

where *Q<sub>θ</sub>* denotes the quality of the task, computed by passing the sensory feedback *S*, *P*, *F* through the learned model Ω*θ*. The constraint *F<sub>z</sub>* ≥ 0 means that the contact force along the normal direction must be non-negative. *D<sub>P</sub>* and *D<sub>F</sub>* denote the feasible sets of the probe pose and the contact force, respectively; in our case, these two sets are determined by the human demonstrations. It is worth mentioning, however, that other task-specific constraints on the pose and the contact force could also be adopted here.

Choosing a model-free formulation requires no prior knowledge of the dynamics model of the ultrasound scanning process, namely the transition probabilities from one state (the current ultrasound image) to the next. More specifically, we choose Monte Carlo policy optimization [54], where the potential actions are sampled and selected directly from previously demonstrated experience, as shown in **Figure 5**. For the sampling, we impose a bound between *P′<sub>t</sub>*, *F′<sub>t</sub>* and *P<sub>t</sub>*, *F<sub>t</sub>*, which prevents the next state from moving too far away from the current one. If the new state <*P′<sub>t</sub>*, *F′<sub>t</sub>*, *S<sub>t</sub>*> is evaluated as good by the task quality function *Q<sub>θ</sub>*, the desired pose *P′<sub>t</sub>* and contact force *F′<sub>t</sub>* are used as the goal for the human ultrasound scanning guidance. Otherwise, new *P′<sub>t</sub>* and *F′<sub>t</sub>* are sampled from the previously demonstrated experience. This process repeats N times, and the *P′<sub>t</sub>*, *F′<sub>t</sub>* with the best task quality is

#### **Figure 5.**

*Our strategy for scanning guidance takes the current pose P<sub>t</sub>, the contact force F<sub>t</sub>, and the ultrasound image S<sub>t</sub> as input, and outputs the next desired pose P′<sub>t</sub> and contact force F′<sub>t</sub>. For sampling, we impose a bound between P′<sub>t</sub>, F′<sub>t</sub> and P<sub>t</sub>, F<sub>t</sub>, which prevents the next state from moving too far away from the current state. For evaluation, if the sampled pose and force are predicted as high-quality according to Eq. 1, the skill-learned model selects them as the desired output; otherwise, it repeats the sampling process. For execution, the desired pose P′<sub>t</sub> and contact force F′<sub>t</sub> are used as the goal for the human ultrasound scanning guidance.*

chosen as the final goal for the human scanning guidance. Note that this sampling-based approach does not guarantee the global optimality of Eq. 1. However, it is sufficient for human ultrasound scanning guidance, because the final goal only needs to be updated at a relatively low frequency.
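The sampling loop above can be sketched as follows. This is a toy illustration: `demos`, `quality_fn`, and the scalar pose/force values are hypothetical stand-ins for the demonstration buffer and the learned quality model Q<sub>θ</sub>; the real data are quaternions and 3-D forces.

```python
import random

def guide_step(S_t, P_t, F_t, demos, quality_fn, n_samples=1000, bound=1.0):
    """One guidance step of the Monte Carlo strategy sketched in Figure 5.

    demos: list of (P, F) pairs from previous demonstrations (stand-in).
    quality_fn: stand-in for the learned task quality model Q_theta.
    """
    best_PF, best_q = (P_t, F_t), quality_fn(S_t, P_t, F_t)
    for _ in range(n_samples):
        P_c, F_c = random.choice(demos)  # sample from demonstrated experience
        # Bound the candidate so the next state stays near the current one.
        if abs(P_c - P_t) > bound or abs(F_c - F_t) > bound:
            continue
        q = quality_fn(S_t, P_c, F_c)    # evaluate with the learned model
        if q > best_q:
            best_PF, best_q = (P_c, F_c), q  # keep the best-quality action
    return best_PF

# Toy run: quality is highest near P = 0.5, F = 5.0 (purely illustrative).
random.seed(0)
demos = [(random.uniform(0, 1), random.uniform(0, 10)) for _ in range(200)]
quality = lambda S, P, F: -abs(P - 0.5) - 0.1 * abs(F - 5.0)
P_next, F_next = guide_step(None, 0.45, 5.5, demos, quality)
print(P_next, F_next)  # best-quality action among bounded demo samples
```

By construction the returned action never has lower predicted quality than the current state, mirroring the "keep the best of N samples" rule; like the chapter's strategy, it makes no claim of global optimality over Eq. 1.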

## **4. Experiments: design and results**

In this section, we use real experiments to examine the effectiveness of our proposed approach to multi-modal task representation learning. In particular, we design experiments to answer the following two questions:


### **4.1 Experiment setup**

For the experimental setup, we used a Mindray DC-70 ultrasound machine with an imaging frame rate of 900 Hz. The ultrasound image was captured using a MAGEWELL USB Capture AIO device with a frame rate of 120 Hz and a resolution of 2048 × 2160, as shown in **Figure 6**.

As shown in **Figure 4**, the IMU mounted on the ultrasound probe was an ICM20948 and the microcontroller unit (MCU) was an STM32F411. The highest frequency of the IMU could reach 200 Hz, with an acceleration accuracy of 0.02 g and a gyroscope accuracy of 0.06°/s. The IMU output the probe pose in the form of a quaternion. For the force feedback, we used a 6D ATI Gamma F/T sensor with a maximum frequency of 7000 Hz. The computer used for data collection had an Intel i5 CPU and an Nvidia GTX 1650 GPU, running Ubuntu 16.04 LTS and ROS Kinetic.

### **4.2 Data acquisition**

To make the collected data comparable, the recording program needs to implement two functions: coordinate transformation and gravity compensation. The IMU starts to work as soon as the power is turned on. At that moment, the probe pose corresponds to the initial coordinate system, so the quaternion's values equal (1, 0, 0, 0) and the


**Figure 6.**

*Experiments setup. (a) the ultrasound machine – Mindray DC-70. (b) the video capture device – MAGEWELL USB capture AIO. (c) Data-acquisition probe holder. (d) the computer for data collection with Intel i5 CPU and Nvidia GTX 1650 GPU, Ubuntu16.04 LTS.*

rotation matrix is the identity matrix. However, some time passes between powering up the whole system and starting to record data, so the quaternion's values at the beginning of recording are never equal to the initial ones. To solve this problem, a coordinate transformation is necessary so that the recorded pose is expressed in the initial coordinate system. Besides, the force/torque signal contains both the contact force and the device's gravity, which means the program must also perform gravity compensation.

The real-time quaternion *Q* output by the IMU consists of four values (*w*, *x*, *y*, *z*), which are transformed into a real-time rotation matrix *R<sub>x</sub>* for calculation. The initial rotation matrix is recorded as *R*<sub>0</sub>. Since a rotation matrix is always orthogonal, the inverse and the transpose of *R*<sub>0</sub> are equal; the inverse is recorded as *R*<sub>0</sub><sup>−1</sup>. The relative real-time rotation matrix *R<sub>x</sub>*<sup>\*</sup> is calculated as follows:

$$R\_\mathbf{x}^\* = R\_\mathbf{0}^{-1} \cdot R\_\mathbf{x} \tag{2}$$

The gravity components *G<sub>x</sub>*, *G<sub>y</sub>*, *G<sub>z</sub>* in the *X*, *Y*, *Z* directions are calculated from *R<sub>x</sub>*<sup>\*</sup> and the gravity *G*, as follows:

$$\left[\mathbf{G}\_{\mathbf{x}}, \mathbf{G}\_{\mathbf{y}}, \mathbf{G}\_{\mathbf{z}}\right] = \left[\mathbf{0}, \mathbf{0}, \mathbf{G}\right] \cdot \mathbf{R}\_{\mathbf{x}}^{\*} \tag{3}$$

In this experiment, we mainly consider the influence of force, so the original torque values are simply recorded. The force/torque sensor's output signal contains the real-time force components *F<sub>x</sub>*, *F<sub>y</sub>*, *F<sub>z</sub>* and torque components *T<sub>x</sub>*, *T<sub>y</sub>*, *T<sub>z</sub>* in the three directions. The compensated values *F<sub>x</sub>*<sup>\*</sup>, *F<sub>y</sub>*<sup>\*</sup>, *F<sub>z</sub>*<sup>\*</sup>, *T<sub>x</sub>*<sup>\*</sup>, *T<sub>y</sub>*<sup>\*</sup>, *T<sub>z</sub>*<sup>\*</sup> are calculated as follows:

$$\left[F\_{\mathbf{x}}^{\*}, F\_{\mathbf{y}}^{\*}, F\_{\mathbf{z}}^{\*}\right] = \left[F\_{\mathbf{x}}, F\_{\mathbf{y}}, F\_{\mathbf{z}}\right] - \left[\mathbf{G}\_{\mathbf{x}}, \mathbf{G}\_{\mathbf{y}}, \mathbf{G}\_{\mathbf{z}}\right] \tag{4}$$

$$\left[T\_{\mathbf{x}}^{\*}, T\_{\mathbf{y}}^{\*}, T\_{\mathbf{z}}^{\*}\right] = \left[T\_{\mathbf{x}}, T\_{\mathbf{y}}, T\_{\mathbf{z}}\right] \tag{5}$$

It is worth noting that the gravity *G* can be calculated by Eq. 6, where the maximum and minimum values of the force components in the three directions are denoted by *F<sub>x-max</sub>*, *F<sub>x-min</sub>*, *F<sub>y-max</sub>*, *F<sub>y-min</sub>*, *F<sub>z-max</sub>*, *F<sub>z-min</sub>*.

$$G = \frac{F\_{x-\text{max}} - F\_{x-\text{min}} + F\_{y-\text{max}} - F\_{y-\text{min}} + F\_{z-\text{max}} - F\_{z-\text{min}}}{6} \tag{6}$$
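The coordinate transformation and gravity compensation of Eqs. 2–4 can be sketched with NumPy. The quaternion-to-rotation helper and the numeric values here are illustrative assumptions, not taken from the chapter's recorded data.

```python
import numpy as np

def quat_to_rot(w, x, y, z):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def compensate_force(quat, quat0, force, G):
    """Gravity-compensated contact force, following Eqs. 2-4."""
    R0_inv = quat_to_rot(*quat0).T        # orthogonal: inverse == transpose
    R_star = R0_inv @ quat_to_rot(*quat)  # Eq. 2: relative rotation
    gravity = np.array([0.0, 0.0, G]) @ R_star       # Eq. 3: gravity components
    return np.asarray(force, dtype=float) - gravity  # Eq. 4

# In the initial pose the probe's weight loads only the z-axis, so
# compensation subtracts exactly G from the measured F_z. G = 9.8 is an
# illustrative value; in practice Eq. 6 estimates it from recorded extremes.
q0 = (1.0, 0.0, 0.0, 0.0)
F_contact = compensate_force(q0, q0, [0.0, 0.0, 12.3], G=9.8)
print(F_contact)  # approximately [0, 0, 2.5]
```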

#### **Figure 7.**

*The snapshots of human ultrasound scanning demonstrations and samples of the obtained ultrasound images. Here the images (e) and (f) are labeled as good quality while (g) and (h) are labeled as bad quality.*

The recording frequency is 10 Hz, and the accuracy of the gravity compensation is 0.5 N. The ultrasound data were collected at the Hospital of Wuhan University. The sonographer was asked to scan the left kidneys of 5 volunteers with different physical conditions. Before the examination, the sonographer held the probe vertically above the left kidney of the volunteer, and the scanning process began once the recording program was launched. Snapshots of the scanning process are shown in **Figure 7**. The collected data consist of ultrasound videos, the probe pose (quaternion), the contact force (force and torque), and labels (1/0). In total, there are 5995 data samples: 2266 positive samples (labeled 1), accounting for 37.8%, and 3729 negative samples (labeled 0), accounting for 62.2%. **Figure 8** presents the trajectories of the recorded information.

### **4.3 Experimental results**

The detailed architecture of our network is shown in **Figure 9**; here, the 256-dimensional vector corresponds to the feature vector presented in **Figure 3**. We started the training process with a warm start to classify the ultrasound images; the adopted network was VGG-16 with cross-entropy loss. A total of 5995 sets of recorded data were split 8:2 into training and validation sets. For the image classification, the training data included ultrasound images and labels; the learning rate was 0.001 and the batch size was 20. For the ultrasound skill evaluation, the training data included images *S*, quaternions *P*, forces *F*, and labels; given *P*, *F*, *S* as input, the network outputs the predicted label. We fixed the last fully connected layer of VGG-16 to 128 channels and merged its output with the (*P*, *F*) feature vector. Four fully connected layers transform the (*P*, *F*) vector into 128 channels, which are concatenated with the VGG-16 output vector. From the resulting 256-channel vector, two fully connected layers and a softmax layer output the confidence of the label. **Figure 10** presents the accuracy and loss during training. The classification network finally reached accuracies of 96.89% and 95.61% in training and validation, respectively. The neural network for ultrasound skill


**Figure 8.**

*The trajectories of the recorded force and pose during an ultrasound examination. Force component in (a) X direction (b) Y direction (c) Z direction; rotation axis: (d) X Axis (e) Y Axis (f) Z Axis.*

#### **Figure 9.**

*Framework of the neural network. The ultrasound images were encoded with VGG-16. Four fully connected layers were added to transform the (P, F) vector into 128 channels. The vectors from S and (P, F) were concatenated. Two fully connected layers transform the concatenated vector from 256 to 2 channels. Finally, the softmax layer maps the last values to the probability of label 1 or 0.*

#### **Figure 10.**

*(a) Accuracy and (b) loss in training the neural network for ultrasound image classification. (c) Accuracy and (d) loss in training the neural network for ultrasound skills evaluation.*

evaluation finally reached accuracies of 84.85% and 88.50% in training and validation, respectively.

To confirm the correlation between *P* and *F*, we divided the data into different levels to train four networks with different input ports. Net1 was trained with *S* and *P*, while Net2 was trained with *S* and *F*. Net3 was trained with *S*, *P*, and *F*, using two parallel four-layer fully connected networks for *P* and *F* separately. Net4 (**Figure 9**) was trained with *S*, *P*, and *F*, with concatenated (*P*, *F*) vectors. The main difference between Net3 and Net4 is whether *P* and *F* interact during training. Each network was trained five times with 20 training epochs. **Figure 11** presents the performance of the four networks in validation.

Online ultrasound scanning skill guidance: we selected some continuous data streams from the dataset for verification, which had not been used to train the neural network. The sampling process in **Figure 5** was repeated 1000 times, and the action (*P*, *F*) with the best task quality was selected as the next desired action. The whole process took 3 to 5 seconds to output the desired action.


#### **Figure 11.**

*Accuracy of four networks in validation. Net1 was trained with S and P. Net2 was trained with S and F. Net3 was trained with S, P, and F, without interaction between P and F. Net4 was trained with S, P, and F, with the interaction between P and F.*

**Figure 12.** *Predicted force's component in (a) X-axis direction. (b) Y-axis direction. (c) Z-axis direction.*

**Figure 13.** *Predicted probe pose and corresponding ultrasound images. The confidence is the probability of label 1.*

**Figure 12** presents the predicted components of the contact force, compared with ground-truth data. **Figure 13** presents the predicted probe pose with the corresponding ultrasound images. **Figure 14** presents the predicted and true probe poses with the corresponding ultrasound images.

**Figure 14.** *Predicted and true probe pose, with corresponding ultrasound images. The confidence is the probability of label 1.*

## **5. Discussion and conclusion**

### **5.1 Discussion**

This chapter provides a general approach to realizing autonomous ultrasound guidance, with the following merits: (1) clinical ultrasound skills are captured in a multi-modal model without any system-specific factor or parameter, so the approach could be used in most robotic ultrasound systems; (2) the ultrasound skills are mapped into low-dimensional vectors, which makes our approach compatible with other machine learning methods, such as support vector machines, Gaussian mixture models, and the k-nearest neighbors algorithm; (3) autonomous ultrasound examination is framed as approximately solving the proposed target function by a Monte Carlo method, which provides a new and robust route to autonomous ultrasound.

This chapter also has some limitations. First, the online guidance method is based on random sampling, which introduces a certain degree of randomness; as a result, short-term predictions can differ from the true values. Second, to ensure the effectiveness of the sampling, a large number of samples is required, which means that a higher task-quality improvement demands more computation. As the dataset grows, it becomes difficult for this method to meet the requirement of timely guidance; this could be addressed by modeling the feasible set as a probabilistic model to obtain better sampling efficiency. Finally, we believe that with careful tuning of the neural network, the efficiency of this model could be greatly improved without losing much accuracy.
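One way to realize the probabilistic feasible-set model suggested above is a cross-entropy-style refinement: instead of re-sampling uniformly, fit a Gaussian to the highest-quality samples each iteration and sample from it next time. This is a sketch of that idea under stated assumptions (9-dimensional action: 6-DOF pose plus 3-axis force; `quality_fn` is the learned target function), not part of the chapter's implementation:

```python
import numpy as np

def cem_select(quality_fn, dim=9, n_samples=200, n_elite=20, n_iters=5, seed=0):
    """Cross-entropy-style action search: each iteration draws samples from a
    Gaussian, keeps the n_elite best by task quality, and refits the Gaussian
    to them, concentrating later samples in high-quality regions."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([quality_fn(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # top-quality samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean
```

Because later iterations sample near previously good actions, far fewer total evaluations are needed than with uniform sampling to reach the same task quality, which is exactly the efficiency gap identified above.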

## **6. Conclusion**

This chapter presents a framework for learning ultrasound scanning skills from human demonstrations. By analyzing the scanning process of sonographers, we define the entire scanning process as a multi-modal model of the interactions between ultrasound images, the probe pose, and the contact force. A deep-learning-based method is proposed to learn these scanning skills, from which a skill-representing target function and a sampling-based strategy for ultrasound examination guidance are derived. Experimental results show that this framework for ultrasound scanning guidance is robust and demonstrates the feasibility of a real-time learning guidance system. In future work, we will speed up the prediction process by taking advantage of self-supervision, with the goal of porting the learned guidance model into a real robot system.

## **Acknowledgements**

This work was supported by Suzhou key industrial technology innovation project (SYG202121), and the Natural Science Foundation of Jiangsu Province (Grant No. BK20180235).

## **Author details**

Miao Li1,2\* and Xutian Deng1,2

1 Institute of Technological Sciences, Wuhan University, Wuhan, China

2 School of Power and Mechanical Engineering, Wuhan University, Wuhan, China

\*Address all correspondence to: limiao712@gmail.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **References**

[1] Gerstenmaier J, Gibson R. Ultrasound in chronic liver disease. Insights Into Imaging. 2014;**5**(4):441-455

[2] Konstantinidis IT, Bajpai S, Kambadakone AR, Tanabe KK, Berger DL, Zheng H, et al. Gallbladder lesions identified on ultrasound. Lessons from the last 10 years. Journal of Gastrointestinal Surgery. 2012;**16**(3): 549-553

[3] Lahham S, Becker BA, Gari A, Bunch S, Alvarado M, Anderson CL, et al. Utility of common bile duct measurement in ed point of care ultrasound: A prospective study. The American Journal of Emergency Medicine. 2018;**36**(6):962-966

[4] Omar A, Freeman S. Contrastenhanced ultrasound of the spleen. Ultrasound. 2016;**24**(1):41-49

[5] Larson MM. Ultrasound imaging of the hepatobiliary system and pancreas. Veterinary Clinics: Small Animal Practice. 2016;**46**(3):453-480

[6] Correas J-M, Anglicheau D, Joly D, Gennisson J-L, Tanter M, Hélénon O. Ultrasound-based imaging methods of the kidney—Recent developments. Kidney International. 2016;**90**(6): 1199-1210

[7] Dietrich C, Ignee A, Barreiros A, Schreiber-Dietrich D, Sienz M, Bojunga J, et al. Contrast-enhanced ultrasound for imaging of adrenal masses. Ultraschall in der Medizin-European Journal of Ultrasound. 2010; **31**(02):163-168

[8] Daurat A, Choquet O, Bringuier S, Charbit J, Egan M, Capdevila X. Diagnosis of postoperative urinary retention using a simplified ultrasound bladder measurement. Anesthesia & Analgesia. 2015;**120**(5):1033-1038

[9] Mitterberger M, Horninger W, Aigner F, Pinggera GM, Steppan I, Rehder P, et al. Ultrasound of the prostate. Cancer Imaging. 2010;**10**(1):40

[10] Haymart MR, Banerjee M, Reyes-Gastelum D, Caoili E, Norton EC. Thyroid ultrasound and the increase in diagnosis of low-risk thyroid cancer. The Journal of Clinical Endocrinology & Metabolism. 2019;**104**(3):785-792

[11] Buonsenso D, Pata D, Chiaretti A. Covid-19 outbreak: Less stethoscope, more ultrasound. The Lancet Respiratory Medicine. 2020;**8**(5):e27

[12] Soldati G, Smargiassi A, Inchingolo R, Buonsenso D, Perrone T, Briganti DF, et al. Proposal for international standardization of the use of lung ultrasound for covid-19 patients; a simple, quantitative, reproducible method. Journal of Ultrasound in Medicine. 2020;**10**:1413-1419

[13] Arger PH, Schultz SM, Sehgal CM, Cary TW, Aronchick J. Teaching medical students diagnostic sonography. Journal of Ultrasound in Medicine. 2005;**24**(10): 1365-1369

[14] Hertzberg BS, Kliewer MA, Bowie JD, Carroll BA, DeLong DH, Gray L, et al. Physician training requirements in sonography: How many cases are needed for competence? American Journal of Roentgenology. 2000;**174**(5):1221-1227

[15] Boctor EM, Choti MA, Burdette EC, Webster RJ III. Three-dimensional ultrasound-guided robotic needle placement: An experimental evaluation. The International Journal of Medical Robotics and Computer Assisted Surgery. 2008;**4**(2):180-191

[16] Priester AM, Natarajan S, Culjat MO. Robotic ultrasound systems in medicine. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control. 2013;**60**(3):507-523

[17] Chatelain P, Krupa A, Navab N. 3d ultrasound-guided robotic steering of a flexible needle via visual servoing. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). Washington, USA: IEEE; 2015. pp. 2250-2255

[18] Seo J, Cho J, Woo H, Lee Y. Development of prototype system for robot-assisted ultrasound diagnosis. In: 2015 15th International Conference on Control, Automation and Systems (ICCAS). Busan, Korea: IEEE; 2015. pp. 1285-1288

[19] Mathiassen K, Fjellin JE, Glette K, Hol PK, Elle OJ. An ultrasound robotic system using the commercial robot ur5. Frontiers in Robotics and AI. 2016;**3**:1

[20] Guan X, Wu H, Hou X, Teng Q, Wei S, Jiang T, et al. Study of a 6dof robot assisted ultrasound scanning system and its simulated control handle. In: 2017 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM). Ningbo, China: IEEE; 2017. pp. 469-474

[21] Sandoval J, Laribi MA, Zeghloul S, Arsicault M, Guilhem J-M. Cobot with prismatic compliant joint intended for doppler sonography. Robotics. 2020; **9**(1):14

[22] Patlan-Rosales PA, Krupa A. A robotic control framework for 3-d quantitative ultrasound elastography. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). Marina Bay, Singapore: IEEE; 2017. pp. 3805-3810

[23] Mathur B, Topiwala A, Schaffer S, Kam M, Saeidi H, Fleiter T, et al. A semi-autonomous robotic system for remote trauma assessment. In: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE). Athens, Greece: IEEE; 2019. pp. 649-656

[24] Victorova M, Navarro-Alarcon D, Zheng Y-P. 3d ultrasound imaging of scoliosis with force-sensitive robotic scanning. In: 2019 Third IEEE International Conference on Robotic Computing (IRC). Naples, Italy: IEEE; 2019. pp. 262-265

[25] Virga S, Zettinig O, Esposito M, Pfister K, Frisch B, Neff T, et al. Automatic force-compliant robotic ultrasound screening of abdominal aortic aneurysms. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Daejeon, Korea: IEEE; 2016. pp. 508-513

[26] Kim YJ, Seo JH, Kim HR, Kim KG. Development of a control algorithm for the ultrasound scanning robot (nccusr) using ultrasound image and force feedback. The International Journal of Medical Robotics and Computer Assisted Surgery. 2017;**13**(2):e1756

[27] Huang Q, Lan J, Li X. Robotic arm based automatic ultrasound scanning for three-dimensional imaging. IEEE Transactions on Industrial Informatics. 2018;**15**(2):1173-1182

[28] Hennersperger C, Fuerst B, Virga S, Zettinig O, Frisch B, Neff T, et al. Towards mri-based autonomous robotic us acquisitions: A first feasibility study. IEEE Transactions on Medical Imaging. 2016;**36**(2):538-548

[29] Ning G, Zhang X, Liao H. Autonomic robotic ultrasound imaging system based on reinforcement learning. IEEE Transactions on Bio-medical Engineering. 2021;**68**:2787-2797

[30] Kim R, Schloen J, Campbell N, Horton S, Zderic V, Efimov I, et al. Robot-assisted semi-autonomous ultrasound imaging with tactile sensing and convolutional neural-networks. IEEE Transactions on Medical Robotics and Bionics. 2020;**3**:96-105

[31] Deng X, Lei Z, Wang Y, Li M. Learning ultrasound scanning skills from human demonstrations. arXiv preprint arXiv:2111.09739. 2021. DOI: 10.48550/arXiv.2111.09739

[32] Deng X, Chen Y, Chen F, Li M. Learning robotic ultrasound scanning skills via human demonstrations and guided explorations. 2021. arXiv preprint arXiv:2111.01625. DOI: 10.48550/arXiv.2111.01625

[33] Hijab A, Rushdi MA, Gomaa MM, Eldeib A. Breast cancer classification in ultrasound images using transfer learning. In: 2019 Fifth International Conference on Advances in Biomedical Engineering (ICABME). Tripoli, Lebanon: IEEE; 2019. pp. 1-4

[34] Ghose S, Oliver A, Mitra J, Mart R, Lladó X, Freixenet J, et al. A supervised learning framework of statistical shape and probability priors for automatic prostate segmentation in ultrasound images. Medical Image Analysis. 2013; **17**(6):587-600

[35] Wang L, Yang S, Yang S, Zhao C, Tian G, Gao Y, et al. Automatic thyroid nodule recognition and diagnosis in ultrasound imaging with the yolov2 neural network. World Journal of Surgical Oncology. 2019;**17**(1):1-9

[36] Liu S, Wang Y, Yang X, Lei B, Liu L, Li SX, et al. Deep learning in medical ultrasound analysis: A review. Engineering. 2019;**5**(2):261-275

[37] Sridar P, Kumar A, Quinton A, Nanan R, Kim J, Krishnakumar R. Decision fusion-based fetal ultrasound image plane classification using convolutional neural networks. Ultrasound in Medicine & Biology. 2019; **45**(5):1259-1273

[38] Roy S, Menapace W, Oei S, Luijten B, Fini E, Saltori C, et al. Deep learning for classification and localization of covid-19 markers in point-of-care lung ultrasound. IEEE Transactions on Medical Imaging. 2020; **39**(8):2676-2687

[39] Ouahabi A, Taleb-Ahmed A. Deep learning for real-time semantic segmentation: Application in ultrasound imaging. Pattern Recognition Letters. 2021;**144**:27-34

[40] Karamalis A, Wein W, Klein T, Navab N. Ultrasound confidence maps using random walks. Medical Image Analysis. 2012;**16**(6):1101-1112

[41] Chatelain P, Krupa A, Navab N. Optimization of ultrasound image quality via visual servoing. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). Washington, USA: IEEE; 2015. pp. 5997-6002

[42] Chatelain P, Krupa A, Navab N. Confidence-driven control of an ultrasound probe: Target-specific acoustic window optimization. In: 2016 IEEE International Conference on Robotics and Automation (ICRA). Stockholm, Sweden: IEEE; 2016. pp. 3441-3446

[43] Chatelain P, Krupa A, Navab N. Confidence-driven control of an ultrasound probe. IEEE Transactions on Robotics. 2017;**33**(6):1410-1424

[44] Droste R, Drukker L, Papageorghiou AT, Noble JA. Automatic probe movement guidance for freehand obstetric ultrasound. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Lima, Peru: Springer; 2020. pp. 583-592

[45] Li K, Wang J, Xu Y, Qin H, Liu D, Liu L, et al. Autonomous navigation of an ultrasound probe towards standard scan planes with deep reinforcement learning. Xi'an, China: IEEE; 2021:8302– 8308. arXiv preprint arXiv:2103.00718

[46] Jiang Z, Grimm M, Zhou M, Hu Y, Esteban J, Navab N. Automatic forcebased probe positioning for precise robotic ultrasound acquisition. IEEE Transactions on Industrial Electronics. 2020;**68**:11200-11211

[47] Gao X, Ling J, Xiao X, Li M. Learning force-relevant skills from human demonstration. Complexity. 2019;**2019**: 5262859

[48] Zeng C, Yang C, Cheng H, Li Y, Dai S-L. Simultaneously encoding movement and semg-based stiffness for robotic skill learning. IEEE Transactions on Industrial Informatics. 2020;**17**(2): 1244-1252

[49] Holladay R, Lozano-Pérez T, Rodriguez A. Planning for multi-stage forceful manipulation. Xi'an, China: IEEE; 2021:6556–6562. arXiv preprint arXiv:2101.02679

[50] Li M, Tahara K, Billard A. Learning task manifolds for constrained object manipulation. Autonomous Robots. 2018;**42**(1):159-174

[51] Li M, Yin H, Tahara K, Billard A. Learning object-level impedance control for robust grasping and dexterous manipulation. In: 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE; 2014. pp. 6784-6791

[52] Li M, Bekiroglu Y, Kragic D, Billard A. Learning of grasp adaptation through experience and tactile sensing. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. Chicago, USA: IEEE; 2014. pp. 3339-3346

[53] Lee MA, Zhu Y, Srinivasan K, Shah P, Savarese S, Fei-Fei L, et al. Making sense of vision and touch: Selfsupervised learning of multimodal representations for contact-rich tasks. In: 2019 International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE; 2019. pp. 8943-8950

[54] Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; 2018

## **Chapter 3**
