
*Brain Functional Architecture and Human Understanding*


*DOI: http://dx.doi.org/10.5772/intechopen.95594*


*Connectivity and Functional Specialization in the Brain*

*4.3.9 Decoupling*

The present theory attributes the emergence of human understanding to evolutionary developments that decoupled regulatory processes from the sensory-motor feedback loops [10, 28]. The idea is consistent with suggestions in [150] regarding the evolutionary origins of human cognition. The analysis in [150] focuses on the development of the cerebral cortex, pointing to its vast expansion in humans relative to other primates (the cerebral surface area is 120 cm² in the macaque and 960 cm² in the human, an eightfold difference) and to the disproportionate expansion of distributed association regions within the cortex. The hypothesis is that rapid expansion of the cortical mantle may have decoupled ("untethered") large portions of the cortex from sensory hierarchies, resulting in networks that either control processes in the sensory networks or are engaged in parallel activities that are "detached from sensory perception and motor actions – what one might term 'internal mentation'" [150].

**5. Discussion and conclusions**

In a letter to Nature Neuroscience entitled "What does 'understanding' mean?", the author confesses that "upon reflection, it is depressing, if not scandalous, to realize how rarely I ask myself this" [151]. Arguably, the letter's intent was not to confess ignorance or lack of interest but to point out that a critically important issue has long been neglected. There is nothing one is more intimately and directly familiar with than the sensations of confusion, mental effort and understanding (except, perhaps, the sensations of one's own breathing and heartbeat), yet understanding has not received significant attention in the cognitive sciences (see some discussion in [27, 152]). The intent of this chapter was to suggest that a theory of understanding might be within reach (and grasp), requiring a synthesis of new ideas with long-existing ones, re-evaluated in the light of new data. The proposed theoretical framework is that of active inference [13, 14] carried out under the constraints of a limited neuronal pool and minimal energy expenditure. Within that framework, 'understanding' reduces to an optimization strategy for deploying neuronal resources that enables expanding domains of inference while minimizing expansion costs. In subjective experience, understanding amounts to attaining 'grasp', i.e., unifying disparate entities in a coordinated relational structure that enables relational [153, 154] and other forms of reasoning. Attaining 'grasp' can be accompanied by cognitive strain and culminates in exhilaration and euphoria, making the activity self-rewarding (the Greek *euporia* stands for 'easy passage or travel', while its opposite, *aporia*, denotes 'difficulty or impossibility of passage' [48]). This section compares the present proposal with some findings in the literature, aiming to suggest directions for further research.

#### **5.1 Mental simulations**

The phenomenon of mental modeling (mental simulation) has been addressed in a number of studies [155–157], focusing on the "paradox" of endogenously driven mental activity: "how can findings that carry conviction result from a new experiment conducted entirely within the head" [155]. Data have accumulated showing that mental simulations engage mechanisms different from those involved in reasoning based on descriptive knowledge, exhibit analogue properties, and can produce correct inferences when descriptive knowledge is lacking. At the same time, mental simulations were observed to proceed in a piecemeal fashion rather than as a holistic image [157].

The present proposal pivots on the notion that mental modeling was made possible by decoupling regulatory processes from the motor-sensory feedback, which shifted the power of conviction from experiments in the world to experiments in the head (e.g., the arguments proving the Pythagorean theorem are entirely convincing but not amenable to experimental verification). According to the present theory, the experience of understanding accompanies the formation of tightly coordinated gestalts that simultaneously afford some degrees of freedom to their constituents. Exploring these degrees of freedom can indeed proceed step by step, so the experimental findings in [157] are not incompatible with the theory. Other proposals in the recent literature addressing the role of mental simulation [158] resonate with the key notions in this theory.
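The shift of conviction from experiments in the world to experiments in the head can be illustrated with a deliberately simple toy, offered as a sketch under stated assumptions rather than as the mechanism proposed here. An agent first calibrates a one-parameter forward model against noisy feedback (coupled, online mode) and then uses the same model, with feedback cut off, to compare candidate actions entirely 'in the head' (decoupled, offline mode). The linear dynamics, the hypothetical names `world_step`, `calibrate` and `plan_offline`, and all numerical values are assumptions of the sketch.

```python
import random

TRUE_GAIN = 0.8  # hypothetical property of the world, unknown to the agent


def world_step(position: float, action: float, rng: random.Random) -> float:
    """The 'world': the outcome of an action, observed through noisy senses."""
    return position + TRUE_GAIN * action + rng.gauss(0.0, 0.05)


def calibrate(trials: int = 200, lr: float = 0.1, seed: int = 0) -> float:
    """Coupled (online) mode: act, observe, and shrink the prediction error."""
    rng = random.Random(seed)
    gain = 0.0  # the agent's internal estimate of TRUE_GAIN
    for _ in range(trials):
        action = rng.uniform(-1.0, 1.0)
        observed = world_step(0.0, action, rng)
        error = observed - gain * action  # sensory prediction error
        gain += lr * error * action       # gradient step on 0.5 * error**2
    return gain


def plan_offline(gain: float, start: float, goal: float, candidates: list[float]) -> float:
    """Decoupled (offline) mode: rank candidate actions using the model alone."""
    return min(candidates, key=lambda a: abs(start + gain * a - goal))


gain = calibrate()
best = plan_offline(gain, start=0.0, goal=0.4, candidates=[0.0, 0.25, 0.5, 0.75, 1.0])
print(f"learned gain = {gain:.2f}, action chosen in the head = {best}")
```

Only the calibration phase touches `world_step`; the planning phase never consults the world, yet it selects the action the world would confirm, loosely mirroring how a result obtained 'within the head' can carry conviction.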

#### **5.2 Transient assemblies and the searchlight hypothesis**

The operation of focused attention was compared to a searchlight that shifts between separate attributes or features of perceived objects and thus helps form conjunctions of them [159]. It was further proposed that the functions of the "searchlight" are carried out by activity bursts in thalamic nuclei, while conjunctions are implemented by rapidly modifiable synapses (called Malsburg synapses), orchestrated by the bursts to produce transient cell assemblies [160].

Notwithstanding the suggestions in [159, 160] concerning transient assemblies, considering the role of focused attention in manipulating quasi-stable assemblies (packets) calls for a different metaphor. A neuronal packet is a superposition of multiple behavior patterns afforded by an object. Overcoming the energy barrier and shifting attention from outside the packet to the inside (see **Figure 13**(2)) actualizes one of the patterns. Think of 'grasp' as seizing an object and holding it in a closed fist, then opening the fist and holding the object in an open palm. With the eyes closed, one needs to run a fingertip of the other hand over the object to discern its shape. The point is that concentrating attention amounts to focusing energy delivery on particular neurons, causing their excitation or inhibition, which gives rise to the experience of a behaving object. In short, both the searchlight and the fingertip metaphors define attention as physical action applied to neurons. However, the former conjures up the image of a wandering light beam falling on elements of neuronal structures and thus making them discernible to the "mind's eye", while the latter connotes a finger (or stick) 'tapping' on the neurons, which seems to better represent the notion of physical action.
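The role of energy barriers can be made tangible with a standard statistical-physics toy, assumed here for illustration and not taken from the chapter: a walker sampled by the Metropolis rule in a symmetric double-well potential, where each well stands in for a quasi-stable state and the barrier height sets how often the state switches. The quartic potential, the temperature, the step size and the hypothetical functions `energy` and `count_switches` are choices made for the sketch.

```python
import math
import random


def energy(x: float, barrier: float) -> float:
    """Symmetric double well: minima at x = -1 and x = +1, peak of height `barrier` at x = 0."""
    return barrier * (x * x - 1.0) ** 2


def count_switches(barrier: float, steps: int = 20_000, temperature: float = 1.0,
                   step_size: float = 0.5, seed: int = 1) -> int:
    """Metropolis random walk in the double well; returns how often the walker changes wells."""
    rng = random.Random(seed)
    x, side, switches = -1.0, -1, 0  # start at the bottom of the left well
    for _ in range(steps):
        proposal = x + rng.uniform(-step_size, step_size)
        delta = energy(proposal, barrier) - energy(x, barrier)
        # Downhill moves are always accepted; uphill moves with Boltzmann probability.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x = proposal
        new_side = -1 if x < 0.0 else 1
        if new_side != side:
            switches += 1
            side = new_side
    return switches


for b in (0.5, 2.0, 8.0):
    print(f"barrier height {b}: {count_switches(b)} well-to-well switches")
```

With a low barrier the walker hardly stays in either well, whereas with a high barrier it almost never leaves the well it started in; in this toy, regulating barrier height is what trades stability against flexibility.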

#### **5.3 Understanding and language**

The discovery of mirror neurons inspired hopes that understanding the origins of language may be "within our grasp" [161]. Mirror neurons discharge both when the subject performs active movements of the hand or mouth (or both) and when the subject observes such movements being performed by others (hence the name). It was hypothesized that the latter feature establishes a bridge from 'doing' to 'communicating', i.e., from acting to message sending [161, 162]. Other hypotheses concerning language origins attribute its emergence to internal, as opposed to communicative, functions [35–37] and conceptualize language mechanisms as the manipulation of neuronal assemblies [163, 164]. The present theory offers a view that seems to unify all three hypotheses, as follows.

First, note that mirror neurons were classified into three types: 'grasping with the hand' neurons, 'holding' neurons and 'tearing' neurons [161]. Apply these notions to the manipulation of mental 'objects' (as opposed to physical ones) and assume that 'grasping with the hand' denotes the formation of a packet, 'holding' denotes the state in which attention is "hovering outside" the packet (see **Figure 14**), and 'tearing' denotes entering the packet and experiencing its contents. A reversible 'holding'–'tearing' transition corresponds to a set operation: a manifold of features is experienced as a unity (one object) devoid of (separated from) any sensory content, followed by experiencing a series of sensory features comprised in the object.

Next, think of watching a play performed on stage, and then consider the same play being read to you. In the latter case, assume that the cast of characters and all speaker names have been removed, so only the spoken text remains. Figuring out what is going on might then be possible but extremely difficult, requiring forming and comparing different word combinations (e.g., "The queen, my lord, is dead. She should have died hereafter…": who is talking here? In *Macbeth* the two sentences belong to different speakers, Seyton and Macbeth, yet nothing in the bare text says so. No such challenge arises when watching the play). Finally, imagine that only the cast of characters and names are extracted from the text and the rest is discarded. Clearly, it can be very hard but possible to make some sense of the former version, while the latter makes no sense at all. It is also evident that the range of understanding in the former version will be restricted to a few characters and a few consecutive episodes, with the text becoming an impenetrable mess after that. Restoring the original text (putting the names back where they belong) resolves the otherwise insurmountable difficulty. This suggests a tentative proposal:

The emergence of language followed decoupling from the sensory-motor feedback while retaining the mechanisms of sensory-motor coordination. Language emerged as a means to support mental coordination over an expanding variety of mental objects, by adopting the mechanisms of communicative signaling and re-purposing them for self-signaling (communicative signals make an animal aware of a predator or other condition without direct sensory confirmation of that condition). Symbols (labels) are implemented as neuronal assemblies [163, 164], or 'symbol packets', attached to 'object packets'; 'symbol packets' have no sensory content except the minimum required to make them distinct. Symbols make one roughly aware of the contents of a packet without the expense of entering and examining those contents, thus facilitating landscape navigation (think of labels attached to drawers that need to be pulled with effort). The process of thinking alternates reversibly between the packet arrays (roughly, between words and the images and actions they signify). Understanding phrases involves syntactic coordination and, crucially, substantive, or grounded [165], coordination (i.e., between the objects and activities signified by the words). Findings in [166] demonstrating "grasping ideas with the motor system", i.e., activation of the motor cortex by words referring to bodily actions, even when used idiomatically, as well as other results [167], appear to support these contentions.
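The drawer analogy can be restated as a small cost model. This is a hypothetical illustration, not an implementation of symbol packets: each `Packet` has contents that are expensive to examine and a label that is cheap to read and carries only enough information to be distinct, and the labeled search assumes the searcher already knows which label names the wanted object. The class, both search functions and the cost constants `OPEN_COST` and `LABEL_COST` are assumptions made for the sketch.

```python
from dataclasses import dataclass

OPEN_COST = 10.0   # hypothetical cost of entering a packet and examining its contents
LABEL_COST = 0.5   # hypothetical cost of reading a label (a 'symbol packet')


@dataclass(frozen=True)
class Packet:
    label: str                # minimal content, just enough to be distinct
    contents: frozenset[str]  # features experienced only after 'entering' the packet


def find_without_labels(packets: list[Packet], feature: str) -> tuple[int, float]:
    """Pull drawers one by one, examining contents, until the feature turns up."""
    cost = 0.0
    for i, packet in enumerate(packets):
        cost += OPEN_COST
        if feature in packet.contents:
            return i, cost
    return -1, cost


def find_with_labels(packets: list[Packet], wanted: str) -> tuple[int, float]:
    """Scan cheap labels first, then open only the drawer whose label matches."""
    cost = 0.0
    for i, packet in enumerate(packets):
        cost += LABEL_COST
        if packet.label == wanted:
            return i, cost + OPEN_COST
    return -1, cost


drawers = [
    Packet("cup", frozenset({"handle", "hollow", "ceramic"})),
    Packet("key", frozenset({"metal", "teeth", "small"})),
    Packet("rose", frozenset({"petals", "thorns", "red"})),
]
print(find_without_labels(drawers, "thorns"))  # (2, 30.0)
print(find_with_labels(drawers, "rose"))       # (2, 11.5)
```

In this toy the advantage of labels grows as the target sits further down a longer row of drawers, since label reads are assumed to be far cheaper than inspections.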

#### **5.4 Cognitive disorders**

Pathological malfunctions in the operation of the DMN/SN/CEN system (**Figure 19**) can cause breakdowns in the regulation of energy landscapes (energy barriers become rigid and remain abnormally high or abnormally low), entailing a range of cognitive disorders. In particular, abnormally high barriers hamper correlation between cortical areas and interactions among the frontal, parietal, neostriatal and thalamic areas involved in attention control, which can manifest in performance impairments characteristic of autism spectrum disorders [168–170]. By contrast, abnormally low barriers entail destabilization and disintegration of neuronal packets, leading to irreversible memory loss and other impairments characteristic of Alzheimer's-type disorders (e.g., subjects can be expected to fail clock-drawing tests owing to an inability to recollect the proper elements and/or their respective positions [171]). In general, abnormally high energy barriers degrade functional connectivity between memory elements (percepts, concepts), while abnormally low barriers degrade the elements themselves. It appears possible to relate a variety of cognitive disorders (e.g., different forms and stages of dementia) to persistent abnormalities in energy landscapes, which can potentially lead to new insights and unified approaches to diagnosis and treatment.

To conclude, this chapter suggested a hand-in-glove relationship between an information-theoretic account of cognitive processes (active inference) and a thermodynamics-centered account asserting that the neuronal mechanisms underlying active inference are sculpted by physical conditions in the brain that limit its volume and energy supply. Active inference has been conceptualized as a regulatory process allowing organisms to operate within the sensory-motor feedback loop. This is accomplished by forming generative models that anticipate the consequences of overt actions as reflected in the sensory inflows, followed by adjustments that reconcile actions and models so as to satisfy survival and other needs. This chapter applied the active inference framework to define regulatory mechanisms decoupled from the motor-sensory feedback loop, under the notion of energy-minimizing deployment of neuronal resources.

Advanced theoretical analysis seeking to unite the conceptual foundations of the physical sciences and biology is uncovering a profound unity of the information-theoretic and thermodynamics-centered viewpoints, spanning the range from inanimate matter to the most complex life forms [172]. Moreover, recent experiments demonstrate the possibility of information-to-energy conversion [173]. Analysis indicates that self-organization gains access to progressively higher degrees of order and organization in the channels of energy transduction [172]. The notion of increasing levels of coordination in the brain's functional architecture, from subcellular processes to mental modeling, appears to agree with this general principle. The evolutionary climb to the upper reaches of organization manifested in creative thinking was made possible by minimizing energy costs at every step. On the present theory, active inference is the result and expression of that underlying, thermodynamically enforced frugality.

In machine intelligence, the bulk of effort has been concentrated on learning techniques derived from the perceptron idea (conditioning). The present proposal suggests advancing from machine learning to machine understanding, which requires a different conceptual foundation. It has been argued that human understanding requires awareness, and that the physical processes in the brain that evoke awareness might not be amenable to computational simulation [174]. Notwithstanding these arguments, it appears possible to construct artifacts possessing a level of understanding that does not reach human heights but exceeds that accessible to conventional technology.

It feels appropriate to end this chapter by giving credit to those whose foresight brought them long ago to conclusions similar to those expressed here:

*"It is worth while to speculate about cell assemblies as an alternative to feature detectors and hierarchies of classificatory units. These concepts are related to Perceptrons. Similarly, cell assemblies would find their technological analogue in a (non existing) Conceptron. … It would be surprising if it turned out that the real brain makes use only of one or the other scheme. Most likely the two schemes are used in combination, with the hierarchical organization predominating at the sensory and motor periphery of the nervous system, and the cell assemblies in between. From this point of view the cerebral cortex would seem a good place for cell assemblies, and we have seen that it contains the necessary equipment" [125], p. 187*

**Acknowledgements**

The author is grateful to Karl Friston, Rosalyn Moran, Thomas Parr, Maxwell Ramstead, Vlad Krasnopolsky, Todd Hylton and Mark Latash for insightful comments
