**4.2.2 Mirror neurons and the social robot**

One of the key requirements for Rob's robot is the ability to be social in the sense that it can interact with human beings in a sensible, autonomous manner. Such an interaction can take many forms. One example is *learning by imitation*, which would allow the robot to learn novel tasks by observing a human demonstrating the desired behavior. It could also take the form of *cooperation*, in which Rob's robot has to solve a task together with a human. Or it could be *verbal interaction*, which may be required in all social scenarios involving robots.

What is common to all these scenarios is first and foremost the requirement that Rob's robot be able to *understand* the actions of a human and related concepts, whether shown to the robot by demonstration or given via verbal labels. In other words, all these scenarios can benefit from embodied theories of sensory-motor grounding to facilitate this understanding.

Within this context, the discovery of mirror neurons (neurons that are active both when an agent executes a goal-directed action and when it observes the same action being executed by another; Gallese et al., 1996) is attracting Rob's attention. Indeed, one of the most prominent hypotheses on the functional role of mirror neurons is that they form the link between action perception and action understanding (Bonini et al., 2010; Cattaneo et al., 2007; Fogassi et al., 2005; Rizzolatti & Sinigaglia, 2010; Umiltà et al., 2008), which appears highly relevant for Rob's robot. Even so, it is worth remembering that this claim remains a *hypothesis*, not a proven fact, and does not come without criticism (Hickok, 2008).

It is further thought that mirror neurons may play a role in learning by imitation (Oztop et al., 2006) as well as in the sensory-motor grounding of language (see Chersi et al. (2010) for a discussion). Although these are again hypotheses rather than facts (it has, for instance, rightly been pointed out that most of the neurophysiological data supporting these theories come from macaque monkeys, which neither imitate nor use language), they have inspired some robotics research (see for instance Yamashita & Tani (2008) for an example of learning by imitation and Wermter et al. (2005) for an example of language use). Overall, it thus comes as no surprise that mirror neurons are also of high interest to the field of humanoid robotics, since they may provide the key to grounding cognition in sensory-motor experiences. For this reason, Rob is interested in some of the work within the field of robotics that is based on insights from mirror neuron research.

**4.2.3 Mirror neuron based robots**

Oztop et al. (2006) provides a general review of computational mirror neuron models (including models that can be seen to have components which function similarly to mirror neurons rather than being explicitly inspired by mirror neuron research), together with a proposed taxonomy. Not all of these are relevant to robotics, but we will briefly mention a few controllers that are examples of early mirror-neuron related work in robotics.

One of the first such examples is the recurrent neural network with parametric bias (RNNPB) by Tani et al. (2004), in which "parametric bias" (PB) units of a recurrent neural network are associated with actions, encoding each action as a specific vector. The system may then either be given a PB vector to generate the associated action, or it may be used to predict the PB vector associated with a given sensory-motor input. As such, the PB units can be understood as replicating some of the mirror functionality. Tani et al. (2004) demonstrate the utility of the overall architecture in three tasks. First, they show that the architecture enables a robot to learn hand movements by imitation. Second, they demonstrate that it can learn both end-point and cyclic movements. Finally, they illustrate the ability of the architecture to associate word sequences with the corresponding sensory-motor behaviors.
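The two modes of using PB units can be illustrated with a deliberately minimal sketch. The actual RNNPB is a trained recurrent network; here a one-parameter linear "network" stands in for it, and the function names (`generate`, `recognize_pb`) are illustrative rather than taken from the original work. Generation rolls the network out from a given PB value; recognition keeps the weights fixed and adapts only the PB value by gradient descent on the prediction error:

```python
def generate(pb, x0, steps):
    """Generation mode: roll out the toy 'network' given a PB value.
    Here the network simply advances the state by pb at each step."""
    return [x0 + pb * t for t in range(1, steps + 1)]

def recognize_pb(observed, x0, lr=0.01, iters=500):
    """Recognition mode: the network weights stay fixed; only the PB
    value is adapted by gradient descent on the prediction error."""
    pb = 0.0
    for _ in range(iters):
        # Gradient of the summed squared error between rollout and observation
        grad = sum(2.0 * (x0 + pb * t - obs) * t
                   for t, obs in enumerate(observed, start=1))
        pb -= lr * grad / len(observed)
    return pb

# A demonstration generated with pb = 1.5 is recognized by recovering that value
demo = generate(1.5, x0=0.0, steps=4)
pb_hat = recognize_pb(demo, x0=0.0)
```

The same parameter thus serves both to produce an action and to label an observed one, which is the sense in which PB units mirror some of the dual (execution/observation) functionality.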

Yamashita & Tani (2008) similarly use a recurrent neural network at the heart of their robot controller but endow it with neurons that have two different timescales. The robot then learns repetitive movements and it is shown that the neurons with the faster timescale encode so-called movement primitives while the neurons with the slower timescale encode the sequencing of these primitives. This enables the robot to create novel behavior sequences by merely adapting the slower neurons. The encoding of different movement primitives within the neural structure also replicates the organization of parietal mirror neurons (Fogassi et al., 2005), which is at the core of other computational models of the mirror system (Chersi et al., 2010; Thill et al., In Press; Thill & Ziemke, 2010).
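The key mechanism in such a controller is that each unit is a leaky integrator whose time constant sets its timescale. A minimal sketch (parameter values are illustrative, not taken from the paper) shows how a large time constant makes a unit respond much more slowly to the same input, which is what allows slow units to encode sequencing over the fast units' primitives:

```python
def leaky_step(u, net_input, tau):
    """One leaky-integrator update; larger tau means slower dynamics."""
    return (1.0 - 1.0 / tau) * u + (1.0 / tau) * net_input

# Drive a fast unit (tau = 2) and a slow unit (tau = 50) with the same step input
fast, slow = 0.0, 0.0
for _ in range(10):
    fast = leaky_step(fast, 1.0, tau=2.0)
    slow = leaky_step(slow, 1.0, tau=50.0)

# After 10 steps the fast unit has nearly reached the input level,
# while the slow unit is still far from it.
```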

While the RNNPB architecture encodes behaviors as different parametric bias vectors, Demiris & Hayes (2002) and Demiris & Johnson (2003) propose an architecture in which every behavior is encoded by a separate module. This architecture combines inverse and forward models, leading to the ability to both recognize and execute actions within the same architecture. Learning is done by imitation: the current state of the demonstrator is received and fed to all forward models, each of which predicts the next state of the demonstrator based on the behavior it encodes. The predicted states are compared with the actual states, resulting in confidence values that a certain behavior is being executed. If the behavior is known (a module produces a high confidence value), the motors are actuated accordingly; if not, a new behavioral module is created to learn the novel behavior being demonstrated. A somewhat similar model of human motor control, also using multiple forward and inverse models, has been proposed by Wolpert & Kawato (2001), the main difference being that in their work all models (rather than simply the one with the highest confidence value) contribute to the final motor command, albeit in different amounts.
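The recognition step of such a multiple forward-model architecture can be sketched as follows. This is a toy illustration rather than the authors' implementation: states are single numbers, each "forward model" is a plain function, and confidences are obtained via a softmax over accumulated prediction errors:

```python
import math

def recognize(forward_models, observed_states, state0):
    """Score each behavior module by how well its forward model
    predicts the demonstrator's observed state sequence.
    Returns one confidence value per module (higher = better match)."""
    errors = []
    for fm in forward_models:
        state, err = state0, 0.0
        for observed in observed_states:
            pred = fm(state)                 # module's prediction of the next state
            err += (pred - observed) ** 2    # accumulate prediction error
            state = observed                 # resynchronize with the demonstration
        errors.append(err)
    # Softmax over negative errors turns errors into normalized confidences
    exps = [math.exp(-e) for e in errors]
    total = sum(exps)
    return [e / total for e in exps]

# Two toy behaviors: "advance by 1" and "advance by 2"
models = [lambda s: s + 1.0, lambda s: s + 2.0]
demo = [2.0, 4.0, 6.0]                       # demonstrator advancing by 2 each step
conf = recognize(models, demo, 0.0)
best = max(range(len(conf)), key=conf.__getitem__)
```

In the full architecture the winning module's paired inverse model would then drive the motors, and a persistently low maximum confidence would trigger the creation of a new module.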

Finally, Wermter et al. (2003; 2004; 2005) developed a self-organizing architecture which "takes as inputs language, vision and actions ... [and] ... is able to associate these so that it can produce or recognize the appropriate action. The architecture either takes a language instruction and produces the behavior or receives the visual input and action at the particular time-step and produces the language representation" (Wermter et al., 2005, cf. Wermter et al., 2004). This architecture was implemented in a wheeled (non-humanoid) robot based on the PeopleBot platform. The robot can thus be seen to "understand" actions either by observing them or from its stored representations of observing the action; it is therefore an example of a robot control architecture that makes use of embodied representations of actions. In related work on the understanding of concepts and language in mirror-neuron-like neural robotic controllers (Wermter et al., 2005), the insight that language can be grounded in semantic representations derived from sensory-motor input is used to construct multimodal neural network controllers for the PeopleBot platform that are capable of learning. The robot in this scenario can locate a certain object, navigate towards it and pick it up. A modular associator-based architecture is used to perform these tasks: one module handles vision, another the execution of motor actions, and a third processes linguistic input, while an overall associator network combines the inputs from each module.

What all these examples illustrate is that insights from mirror neuron studies (in particular their potential role in grounding higher-level cognition in an agent's sensory-motor experiences) can be useful in robotics. In terms of using insights from embodied cognition, these are relatively simple examples, since the main role of the body in these cases is to enable the grounding of concepts. For instance, a robot would "know" what a grasp is because it can relate it to its own grasping movements via the mirror neurons.

However, from Rob's perspective, there is still substantial work that needs to be done in this direction. In essence, what the field is currently missing are robots that can display higher-level cognition that goes beyond simple toy problems. For example, most of the examples above dealing with learning by imitation understand imitation as reproducing the trajectory of, for instance, the end-effector. However, imitation learning is more complex than that and involves decisions on whether it is really the trajectory that should be copied or rather the overall goal of the action (see *e.g.* Breazeal & Scassellati, 2002, for a discussion of such issues). Similarly, while there is work on endowing robots with embodied language skills (*e.g.* Wermter et al., 2005), it rarely goes beyond associating verbal labels with sensory-motor experiences (although see Cangelosi & Riga (2006) for an attempt to build more abstract concepts using linguistic labels for such experiences). Again, while this is a worthwhile exercise, it is not really an example of a robot with true linguistic abilities.

**4.2.5 Dynamic Field Theory**

Dynamic Field Theory (DFT) is a mathematical framework based on the concepts of dynamical systems and on guidelines from neurophysiology. A field represents a population of neurons whose activations follow continuous responses to external stimuli. Amari (1977) studied the properties of these networks as a model of the activation observed in cortical tissue. Fields have the same structure as a recurrent neural network, since their connections can have, depending on the relative location within the network, a locally excitatory or a globally inhibitory effect (Fig. 4).

Fig. 4. Typical activations in a dynamic field, from Schöner (2008).

Fields are used to represent perceptual features, motion or cognitive decisions, e.g. position, orientation, color, speed. The dynamics of these fields allow the creation of peaks, which are the units of representation in DFT (Schöner, 2008). Different configurations of one or more fields are possible, with the designer responsible for creating a proper connectivity and tuning of parameters. The result of activating this type of network is a continuously adaptive system that responds dynamically to any change in the external stimuli.

Rob has learned about the different properties and potentials of dynamic fields for use as part of a robust cognitive architecture. Some of the most attractive features of this approach include the possibility of Hebbian-type learning that exploits the short-term memory implicit in the dynamics of the fields. Long-term memory, decision-making mechanisms, noise robustness (also implicit in the dynamics of fields) and single-shot learning are all important tools that can and must be included in any cognitive architecture. Several applications modeling experiments on human behavior (Dineva, 2005; Johnson et al., 2008; Lowe et al., 2010) and robotic implementations (Bicho et al., 2000; Erlhagen et al., 2006; Zibner et al., 2011) have demonstrated DFT's potential.

Nonetheless, from Rob's perspective, the current work with dynamic fields still needs to overcome a number of challenges. Dynamic field controllers are currently designed by hand, through elaborate parameter space explorations, to solve a very specific problem. Although these models nicely illustrate that decision-making can be grounded directly in sensory-motor experiences, their learning ability is limited and any particular model does not generalize well to other tasks, even though modular combinations of different models, each solving a particular task, seem possible (Johnson et al., 2008; Simmering et al., 2008; Simmering & Spencer, 2008).
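The field dynamics described above can be sketched in a few lines. This is a minimal, illustrative 1-D Amari-style field (all parameter values are assumptions chosen for the demo, not taken from the cited work): local excitation and global inhibition let a localized stimulus drive a self-stabilized activation peak:

```python
import math

def simulate_field(n=41, steps=300, dt=0.1, h=-2.0):
    """Minimal 1-D Amari field with local excitation and global
    inhibition; a localized input produces a self-stabilized peak."""
    u = [h] * n                                    # field starts at resting level h < 0
    def f(v):                                      # sigmoid output nonlinearity
        return 1.0 / (1.0 + math.exp(-4.0 * v))
    def w(d):                                      # interaction kernel over distance d:
        return 3.0 * math.exp(-d * d / 8.0) - 0.9  # local excitation minus global inhibition
    # Gaussian stimulus centered on the middle of the field
    stim = [3.0 * math.exp(-((i - n // 2) ** 2) / 8.0) for i in range(n)]
    for _ in range(steps):
        out = [f(v) for v in u]
        # Euler step of du/dt = -u + h + stimulus + convolution of kernel with output
        u = [ui + dt * (-ui + h + stim[i] +
                        sum(w(i - j) * out[j] for j in range(n)))
             for i, ui in enumerate(u)]
    return u

u = simulate_field()
# The field ends with supra-threshold activation (u > 0) only around the
# stimulated location, while sites far from the peak are suppressed below rest.
```

The hand-tuned kernel and resting level here are exactly the kind of parameter choices that, as noted above, currently have to be found by elaborate exploration for each specific task.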
