292 Neuroimaging – Cognitive and Clinical Neuroscience

**6.2 New evidence on ACC function: Insights from RL-based neural modelling**

(Notebaert and Verguts, 2008) or even across task effectors (Braem et al., in press). Consistent with the model, it was recently demonstrated that ACC responds to item-specific congruencies, not block-level congruencies (Blais and Bunge, 2011).

Besides conflict monitoring, several other functions have been attributed to the ACC. In humans, EEG and fMRI evidence points toward a role in error processing (Gehring et al., 1993), error likelihood (Brown and Braver, 2005), and volatility (Behrens et al., 2007). Moreover, the single-cell literature has found no direct evidence for conflict monitoring (Cole et al., 2009), whereas there is strong evidence for reinforcement processing (Rushworth and Behrens, 2008). More specifically, single-cell recording studies revealed the presence in ACC of three different types of neural units. One population codes for reward expectation, discharging as a function of the expected reward following the presentation of an external cue or the planning of an action. A second population codes for positive prediction errors (i.e., when the outcome is better than predicted). A third population codes for negative prediction errors (i.e., when the outcome is worse than predicted). We recently attempted to integrate these different levels of data and theories from the point of view of the RL framework. The model we proposed, the Reward Value Prediction Model (RVPM; Silvetti et al., 2011), demonstrated that all these findings can be understood as products of the same computational machinery, which calculates values and deviations between observed and expected reinforcement in an RL framework. The global function of the ACC, however, remained similar to that in the conflict monitoring model and its later versions: to detect whether something is unexpected, and if so, to take action and adapt the cognitive system.

The evolution sketched here, from abstract cybernetic control models to the RVPM, represents a general trend in RL, in which computational, cognitive, and neuroscience concepts are increasingly integrated. Despite this success, not all features of RL have received appropriate attention in the literature. In the final section, we look at an aspect of RL that has been underrepresented.

**7. RL and neural Darwinism**

Despite the variety in levels of abstraction and purpose among the models we have described, most implement what is sometimes called a triple-factor learning rule (Ashby et al., 2007; Arbuthnott et al., 2000). This means that three factors are multiplied to determine changes in model weights: the first two factors are the activations of the input and output neurons, constituting the Hebbian component. The third factor is an RL-like signal that provides some evaluation of the current situation (is it rewarding, unexpected, etc.; henceforth, the value signal). The value signal indicates the valence of an environmental state or of an internal state of the individual. It can be encoded either by dopaminergic signals (Holroyd & Coles, 2002) or by noradrenergic signals (e.g., Gläscher et al., 2010; Verguts & Notebaert, 2009).

This general scheme of Hebbian learning modulated by value provides an instantiation of the theory of Neural Darwinism (ND; Edelman, 1978). ND is a large-scale theory of brain processes with roots in evolutionary theory and immunology. The basic idea of ND is an analogy between the Darwinian process of natural selection among individual organisms and the selection of the most appropriate neural connections between a large population of neurons.

**8. Acknowledgements**

MS and TV were supported by BOF/GOA Grant BOF08/GOA/011.

**9. References**



Braem, S., Verguts, T., and Notebaert, W. (in press). Conflict adaptation by means of associative learning. *Journal of Experimental Psychology: Human Perception & Performance.*

Brown, J.W., and Braver, T.S. (2005). Learned predictions of error likelihood in the anterior cingulate cortex. *Science* 307, 1118-1121.

Burgess, N., and Hitch, G.J. (1999). Memory for serial order: a network model of the phonological loop and its timing. *Psychological Review* 106, 551-581.

Chomsky, N. (1959). Review of Verbal Behavior by B.F. Skinner. *Language* 35, 26-58.

Cohen, J.D., Dunbar, K., and McClelland, J.L. (1990). On the control of automatic processes: a parallel distributed processing account of the Stroop effect. *Psychol Rev* 97, 332-361.

Cole, M.W., Yeung, N., Freiwald, W.A., and Botvinick, M. (2009). Cingulate cortex: diverging data from humans and monkeys. *Trends Neurosci* 32, 566-574.

Daw, N.D., O'Doherty, J.P., Dayan, P., Seymour, B., and Dolan, R.J. (2006). Cortical substrates for exploratory decisions in humans. *Nature* 441, 876-879.

Dehaene, S., Changeux, J.-P., and Nadal, J.-P. (1987). Neural networks that learn temporal sequences by selection. *Proceedings of the National Academy of Sciences USA* 84, 2727-2731.

Edelman, G. (1978). *The Mindful Brain.* Cambridge, MA: MIT Press.

Frank, M.J. (2005). Dynamic dopamine modulations in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism. *Journal of Cognitive Neuroscience* 17, 51-72.

Frank, M.J., Seeberger, L.C., and O'Reilly, R.C. (2004). By carrot or by stick: cognitive reinforcement learning in parkinsonism. *Science* 306, 1940-1943.

Gehring, W.J., Goss, B., Coles, M.G.H., Meyer, D.E., and Donchin, E. (1993). A neural system for error detection and compensation. *Psychological Science* 4, 385-390.

Gläscher, J., Daw, N., Dayan, P., and O'Doherty, J.P. (2010). States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. *Neuron* 66, 585-595.

Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. *Studies in Applied Mathematics* 11, 213-257.

Hebb, D. (1949). *The Organization of Behavior: A Neuropsychological Theory.* New York: Wiley-Interscience.

Holroyd, C.B., and Coles, M.G. (2002). The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. *Psychol Rev* 109, 679-709.

Kamin, L.J. (1969). "Predictability, surprise, attention, and conditioning," in *Punishment and Aversive Behavior,* eds. B.A. Campbell & R.M. Church. (New York: Appleton-Century-Crofts), 279-296.

Kennerley, S.W., Walton, M.E., Behrens, T.E., Buckley, M.J., and Rushworth, M.F. (2006). Optimal decision making and the anterior cingulate cortex. *Nat Neurosci* 9, 940-947.

Kruschke, J.K. (2008). Bayesian approaches to associative learning: from passive to active learning. *Learn Behav* 36, 210-226.

Lømo, T. (1966). Frequency potentiation of excitatory synaptic activity in the dentate area of the hippocampal formation. *Acta Physiologica Scandinavica* 68 Suppl. 277, 128.

Montague, P.R., Dayan, P., and Sejnowski, T.J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. *J Neurosci* 16, 1936-1947.

Neisser, U. (1967). *Cognitive Psychology.* New York: Appleton-Century-Crofts.

Notebaert, W., and Verguts, T. (2008). Cognitive control acts locally. *Cognition* 106, 1071-1080.

O'Doherty, J.P., Dayan, P., Friston, K., Critchley, H., and Dolan, R.J. (2003). Temporal difference models and reward-related learning in the human brain. *Neuron* 38, 329-337.

O'Reilly, R.C., and Frank, M.J. (2004). Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. *Neural Computation* 18, 283-328.

Pearce, J.M., and Hall, G. (1980). A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. *Psychol Rev* 87, 532-552.

Rescorla, R.A., and Wagner, A.R. (1972). "A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement," in *Classical Conditioning II: Current Research and Theory,* eds. A.H. Black & W.F. Prokasy. (New York: Appleton-Century-Crofts), 64-99.

Roelfsema, P.R., and Van Ooyen, A. (2005). Attention-gated reinforcement learning of internal representations for classification. *Neural Computation* 17, 2176-2214.

Rudebeck, P.H., Walton, M.E., Smyth, A.N., Bannerman, D.M., and Rushworth, M.F. (2006). Separate neural pathways process different decision costs. *Nat Neurosci* 9, 1161-1168.

Rumelhart, D.E., and McClelland, J.L. (1986). *Parallel Distributed Processing: Explorations in the Microstructure of Cognition.* Cambridge, MA: MIT Press.

Rumelhart, D.E., and McClelland, J.L. (1987). "Learning the past tenses of English verbs: implicit rules or parallel distributed processing," in *Mechanisms of Language Acquisition,* ed. B. MacWhinney. (Mahwah, NJ: Erlbaum), 194-248.

Rushworth, M.F., and Behrens, T.E. (2008). Choice, uncertainty and value in prefrontal and cingulate cortex. *Nat Neurosci* 11, 389-397.

Schultz, W., Apicella, P., and Ljungberg, T. (1993). Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. *J Neurosci* 13, 900-913.

Schultz, W., Dayan, P., and Montague, P.R. (1997). A neural substrate of prediction and reward. *Science* 275, 1593-1599.

Seidenberg, M.S., and McClelland, J.L. (1989). A distributed, developmental model of word recognition and naming. *Psychological Review* 96, 523-568.

Seymour, B., O'Doherty, J.P., Dayan, P., Koltzenburg, M., Jones, A.K., Dolan, R.J., Friston, K.J., and Frackowiak, R.S. (2004). Temporal difference models describe higher-order learning in humans. *Nature* 429, 664-667.

Shannon, C.E. (1948). A mathematical theory of communication. *The Bell System Technical Journal* 27, 379-423, 623-656.

Silvetti, M., Seurinck, R., and Verguts, T. (2011). Value and prediction error in the medial frontal cortex: integrating the single-unit and systems levels of analysis. *Frontiers in Human Neuroscience* 5:75.

Skinner, B.F. (1938). *The Behavior of Organisms: An Experimental Analysis.* New York: Appleton-Century-Crofts.

St. John, M.F., and McClelland, J.L. (1990). Learning and applying contextual constraints in sentence comprehension. *Artificial Intelligence* 46, 217-257.


**15** 

**What Does Cerebral Oxygenation Tell Us About Central Motor Output?** 

Nicolas Bourdillon and Stéphane Perrey

*Movement To Health (M2H), Montpellier-1 University, Euromov* 

*France* 

**1. Introduction** 

Since fifth-century BC Athens, when Hippocrates identified the brain as the source of thought and understanding, humanity has been preoccupied with its functions. Anatomical descriptions were brought into modernity by Andreas Vesalius in the sixteenth century (Vesalius, 1543), while the underlying mechanisms had to await Luigi Galvani's eighteenth-century discovery of "bioelectricity" to emerge (Galvani, 1791). In the nineteenth century, famous physicians such as Paul Broca and Carl Wernicke demonstrated the role of the brain in cognitive tasks by studying patients with neurological disorders (Broca, 2004; Wernicke, 1894). From the late twentieth century to the present day, neuroimaging techniques have allowed explorations in healthy subjects, providing very precise locations of the brain regions involved in cognitive and motor functions.

For the advancement of theory it is essential to acknowledge the strengths and limitations of the available neuroimaging techniques, so that converging evidence from multiple modes of investigation can be brought to bear on current controversies in the literature. Electroencephalography (EEG) was chronologically the first technique to open the way to the study of brain function in exercising subjects (Swartz and Goldensohn, 1998). While it is one of the most direct methods for non-invasively measuring the electrical signal arising from the synchronous firing of neurons, its limited spatial resolution and its lack of information from areas deeper than the cortex are its main limitations. Magnetoencephalography (MEG) is also a direct measure of the electrical activity of neurons and has better spatial resolution than EEG. However, its failure to detect activity in deep brain structures and its detection threshold (at least 50,000 neurons must be active simultaneously) are MEG's main disadvantages (Shibasaki, 2008). Functional imaging techniques such as positron emission tomography (PET), single photon emission computed tomography (SPECT) and functional magnetic resonance imaging (fMRI) overcome the EEG and MEG limitations, as they can detect neuronal activity as deep in the brain as experimenters desire (Cui et al., 2011; Villringer, 1997). However, the measurement is indirect, as it relies on blood supply for fMRI and on radioactive tracers for PET and SPECT (Jantzen et al., 2008; Tashiro et al., 2008). Additionally, except for EEG, the experimental environments of the techniques described above are very restrictive with regard to physical exercise: subjects and experimenters are limited to sitting or lying positions and to breathing, eye, wrist and ankle movements. In practice, *in vivo* determination of brain function in humans requires flexible, accessible and rapid monitoring techniques (Kikukawa et al., 2008; Perrey, 2008;

