**1. Introduction**

282 Neuroimaging – Cognitive and Clinical Neuroscience


Reinforcement learning (RL) has a rich history that runs through much of the history of psychology. As early as the late 19th century, Edward Thorndike proposed that if a stimulus is followed by a successful response, the stimulus-response bond will be strengthened. Consequently, the response will be emitted with greater likelihood upon later presentation of that same stimulus. This proposal already contains the two key principles of RL. The first principle concerns *associative learning*, the learning of associations between stimuli and responses. This theme was developed by John Watson. Building on the work of Ivan Pavlov, Watson investigated the laws of classical conditioning, in particular how a stimulus and a response become associated after repeated pairing. In the classic "Little Albert" experiment, Watson and Rayner (1920) repeatedly presented a white rat together with a loud sound to an infant (Little Albert); the rat initially evoked a neutral response, whereas the loud sound initially evoked a fear response. After repeated pairings, presentation of the rat alone also evoked a fear response. In the same paper, the authors proposed that this principle of learning by association is more generally responsible for shaping (human) behavior. According to psychology handbooks, John Watson thereby laid the foundation for behaviorism. The second principle is that *reinforcement* is key to human learning. Actions that are successful for the organism will be strengthened and therefore repeated. This aspect was developed into a systematic research program by the second founder of behaviorism, Burrhus Skinner (e.g., Skinner, 1938).
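Thorndike's proposal can be made concrete with a minimal sketch. Everything specific in it is an illustrative assumption rather than something taken from Thorndike or from this chapter: the stimulus and response names, the proportional choice rule, and the fixed strengthening increment are all hypothetical. The sketch only shows that strengthening a stimulus-response bond after success raises the likelihood of that response on later presentations of the same stimulus.

```python
import random


def choose_response(bonds, stimulus, rng):
    """Pick a response with probability proportional to its bond strength (assumed rule)."""
    responses = list(bonds[stimulus])
    weights = [bonds[stimulus][r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]


def law_of_effect_trial(bonds, stimulus, successful, rng, increment=0.5):
    """Emit one response; strengthen its bond only if it was the successful one."""
    response = choose_response(bonds, stimulus, rng)
    if response == successful:
        bonds[stimulus][response] += increment  # hypothetical fixed increment
    return response


def response_probability(bonds, stimulus, response):
    """Current likelihood of a response under the proportional choice rule."""
    total = sum(bonds[stimulus].values())
    return bonds[stimulus][response] / total


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical stimulus with three equally strong initial bonds.
bonds = {"lever_visible": {"press": 1.0, "groom": 1.0, "explore": 1.0}}

p_before = response_probability(bonds, "lever_visible", "press")
for _ in range(200):
    law_of_effect_trial(bonds, "lever_visible", "press", rng)
p_after = response_probability(bonds, "lever_visible", "press")

print(f"P(press) before: {p_before:.2f}, after: {p_after:.2f}")
```

Note that unsuccessful responses are left unchanged here; whether and how they should be weakened is a separate modelling choice that the text does not specify.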

The importance of RL for explaining human behavior began to be debated in the late 1940s. Scientific criticism of RL came from two main fronts. The first was internal, deriving from experimental findings and theoretical considerations within psychology itself. The second derived from external developments, in particular advances in information theory and control theory. These criticisms led to a loss of interest in RL that lasted several decades. In recent years, however, RL has been revived, leading to a remarkable interdisciplinary confluence of computer science, neurophysiology, and cognitive neuroscience. In the current chapter, we describe the relevant mid-20th-century criticisms and developments, and how these were considered and integrated in current versions of RL. In particular, we focus on how RL can be used as a model for understanding high-level cognition. Finally, we link RL to the broader framework of neural Darwinism.

Reinforcement Learning, High-Level Cognition, and the Human Brain 285

These new disciplines showed that it was possible, and indeed productive and powerful, to investigate the internal functioning of systems (including biological organisms) by mathematically modelling hidden machinery that could not be investigated directly. In this way, the philosophical-methodological assumption of behaviorism, according to which the scientific approach should be limited to strictly empirical investigation, was shown to be unnecessary for scientific progress.

**4. Precursors to the return of RL**

Because of these developments, behaviorism, and with it RL, was discredited for several decades. Instead, an alternative paradigm became dominant, according to which the human mind could be construed as a computer that manipulates abstract symbols (e.g., Neisser, 1967; Atkinson and Shiffrin, 1968). However, in recent years the RL framework has become influential again. At least two developments in the second half of the 20th century prepared the ground for renewed interest in RL. The first originated in human learning theory; the second in a new discipline called connectionist psychology, which proposed itself as an alternative to the then canonical symbol-manipulation paradigm for the study of cognition.

**4.1 Human learning theory and the Rescorla-Wagner model**

Important phenomena observed in the behavioral lab could not be accounted for with the standard behaviorist conceptualization (Rescorla and Wagner, 1972). For example, blocking (Kamin, 1969) refers to the fact that an organism only learns about the contingency between two events to the extent that one of the events is unexpected. To account for blocking, Rescorla and Wagner added a crucial ingredient to the associative learning framework, namely prediction error. Prediction error refers to the difference between an external feedback signal indicating the correct response or stimulus on the one hand, and the response or stimulus predicted by the organism on the other. It is worth noting how strongly the cybernetic concept of feedback influenced, and indeed resembles, the concept of prediction error. Rescorla and Wagner proposed a formal model that learns by updating associations between events (e.g., stimulus and response) using prediction error (Rescorla and Wagner, 1972). This model formed the basis for many human learning theories (e.g., Kruschke, 2008; Pearce and Hall, 1980; Van Hamme and Wasserman, 1994), and can be represented by the following equations:

$$\delta_t = \lambda_t - V_t \tag{1}$$

$$V_{t+1} = V_t + \alpha \delta_t \tag{2}$$

where $\delta$ is the prediction error, $V$ is the prediction of the organism, and $\lambda$ is the actual outcome from the environment. Equation 2 shows how the new expectations are updated by the prediction error from time point *t* to *t* + 1; $\alpha$ is a learning rate parameter modulating the prediction error.

**4.2 The connectionist approach**

A second development preparing the cultural ground for the revival of RL was connectionist psychology. Here, the study of psychological phenomena was grounded in the construction of artificial neural networks, i.e. models simulating both the nervous
