**1. Introduction**

In cortex, mammalian sensory systems are composed of many functionally specialized areas organized into hierarchical networks [1–6]. The most fundamental sensory information is embodied in the organization of the sensory receptors, which is maintained throughout most of the cortical hierarchy of sensory regions as repeating representations of this topography in cortical field maps (CFMs) [5, 7–13]. Accordingly, neurons with receptive fields situated next to one another in sensory feature space are positioned next to one another in cortex within a CFM.

In auditory cortex, auditory field maps (AFMs) are identified by two orthogonal sensory representations: tonotopic gradients from the spectral aspects of sound (i.e., tones), and periodotopic gradients from the temporal aspects of sound (i.e., period or temporal envelope) [5, 10, 14]. On a larger scale across cortex, AFMs are grouped into cloverleaf clusters, another fundamental organizational structure also common to visual cortex [8, 10, 15–20]. CFMs within clusters tend to share properties such as receptive field distribution, cortical magnification, and processing specialization (e.g., [18, 19, 21]).
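As a toy illustration of these two orthogonal representations (all values illustrative, not physiological), an AFM can be sketched as a grid in which one cortical axis steps through best frequencies (tonotopy) and the orthogonal axis steps through best modulation periods (periodotopy):

```python
def toy_afm(rows=8, cols=8):
    """Toy auditory field map: tonotopy varies along one cortical axis,
    periodotopy along the orthogonal axis (illustrative values only)."""
    # Tonotopic gradient: best frequency rises log-spaced across columns (Hz).
    freqs = [200.0 * (2 ** (c / 2)) for c in range(cols)]
    # Periodotopic gradient: best modulation period rises across rows (ms).
    periods = [4.0 * (2 ** (r / 2)) for r in range(rows)]
    # Each unit's tuning is the pair (best frequency, best period).
    return [[(freqs[c], periods[r]) for c in range(cols)] for r in range(rows)]

fmap = toy_afm()
# Neighbors along a row differ in best frequency but share the same best
# period, mirroring how the two gradients stay orthogonal within a CFM.
(f0, p0), (f1, p1) = fmap[0][0], fmap[0][1]
assert p0 == p1 and f1 > f0
```

The point of the sketch is only that the two feature dimensions are laid out independently across the cortical sheet, so moving along one axis changes tone tuning while period tuning is held constant.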

#### **Figure 1.**

*Primary auditory cortex. (A) The lateral view of the left hemisphere is shown in the schematic. Major sulci are marked by black lines. The approximate position of primary auditory cortex (PAC) is shown with the red overlay inside the black dotted line. The white dotted line within the red region indicates the extension of PAC into the lateral sulcus (LS) along Heschl's gyrus (HG; hidden within the sulcus in this view). Inset refers to anatomical directions as A: anterior; P: posterior; S: superior; I: inferior. PAC: primary auditory cortex (red); LS: lateral sulcus (green; also known as the lateral fissure or Sylvian fissure); CS: central sulcus (purple); STG: superior temporal gyrus (blue); STS: superior temporal sulcus (orange). (B) The cortical surface of the left hemisphere of one subject (S2) is displayed as a typical inflated 3-D rendering created from high-resolution, anatomical MRI measurements. Light gray regions denote gyri; dark gray regions denote sulci. The exact location of this subject's hA1 auditory field map is shown in red within the black dotted lines. Note that HG in S2 is composed of a double peak, seen here as two light gray stripes, rather than the more common single gyrus. The locations of the three cloverleaf clusters composed of the core and belt AFMs are shown along HG by three colored overlays as yellow: hCM/hCL cluster; red: HG cluster including hA1, hR, hRM, hMM, hML, hAL; and magenta: hRTM/hRT/hRTL cluster (cite?). Additional cloverleaf clusters are under investigation along PP, PT, STG, and the STS. Green-labeled anatomical regions are sections within the lateral sulcus—CG: circular gyrus (green); PP: planum polare (green); PT: planum temporale (green). (C) This single T1 image shows a coronal view of hA1 on HG (red within dotted white line). Adapted from Refs. [5, 12].*

Across the cortical hierarchy, there is generally a progressive increase in the complexity of sensory computations from simple sensory stimulus features (e.g., frequency content) to higher levels of cognition (e.g., attention and working memory) [6, 13, 22]. CFM organization likely serves as a framework for integrating bottom-up inputs from sensory receptors with top-down attentional processing [12, 17]. With the recent ability to measure AFMs in the core and belt regions of human auditory cortex along Heschl's gyrus (HG) using high-resolution functional magnetic resonance imaging (fMRI), the stage is now set for investigation into this integration of basic auditory processing with higher-order auditory attention and working memory within human AFMs (**Figure 1**) [5, 12, 15, 23].

This chapter first provides a brief history of research into models of auditory nonverbal attention and working memory, with comparisons to their visual counterparts. Next, we discuss the current state of research into AFMs within human auditory cortex. Finally, we propose directions of future research investigating auditory attention and working memory within these AFMs to illuminate how these higher-order cognitive processes interact with low-level auditory processing.

## **2. Attention and working memory in human audition**

#### **2.1 Models of attention and working memory**

Attention, the ability to select and attend to aspects of the sensory environment while simultaneously ignoring or inhibiting others, is a fundamental aspect of human sensory systems (for reviews, see [24–27]). Given the limited resources of the human brain, attention allows greater resources to be allocated to the processing of important incoming sensory stimuli by diverting precious resources from currently unimportant stimuli. Such allocation can be controlled cognitively, in what is generally referred to as 'top-down' attentional control in models of attention, in reference to the higher-order cognitive processes controlling attention from the 'top' of the sensory-processing hierarchy and acting 'down' on the lower levels (**Figure 2**) [24, 28–31]. Despite lower priority being assigned to the currently unimportant stimulus locations, change is constant, so the diversion of resources to attended stimuli is not absolute, allowing the sensory environment to continue to be monitored. If, instead, processing resources were evenly distributed throughout the sensory field, without regard to salience, more resources would be wasted on unimportant aspects of the field. If something in the unattended sensory field should become important, the system requires a mechanism to reorient attention to that aspect of the field. Such stimulus-driven attentional control is referred to as 'bottom-up', referring to the ability of incoming sensory input at the bottom of the hierarchy to orient the higher-order attention system. This broad framework of attentional models is common at least to the senses most commonly studied, vision and audition [25, 27, 31, 32].

#### **Figure 2.**

*Attention and working-memory model. A model of the interactions between perception, trace memory, attention, working memory, and long-term memory in the visual and auditory systems, as well as the central executive. Ovals represent neural systems. Arrows represent actions of one system on another. Attention is the term for the action of perception and trace memory on working memory and vice versa. Rehearsal is the term for maintaining information in working memory. This model is not intended to indicate that these systems are discrete or independent; within each sense, they are in fact highly integrated.*

In the effort to elucidate the parameters of auditory attention, researchers have taken a myriad of approaches in numerous contexts. Researchers have attempted to decipher at what level of the sensory-processing hierarchy stimulus-driven attention occurs (after which sensory-processing steps does attention act?) [24, 30, 31, 33–35], how attention can be deployed (to locations in space or to particular sensory features?) [36–40], and how attention can be distributed (to how many 'objects' or 'streams' can attention be simultaneously deployed?) [41–44]. Many studies have narrowed the range of possibilities without precisely answering these questions, which thus remain active areas of research. Modern models of attention generally agree that stimuli are processed to some degree before attention acts, accounting for stimulus-driven 'bottom-up' attentional shifts, though it is unclear to precisely what degree [24, 30, 33]. Neuroscientific evidence suggests that attention acts throughout sensory-processing hierarchies, so the idea of attention being located at a particular 'height' in the hierarchy may not be a particularly useful insight for identifying the cortical locus of attentional control [45, 46]. Modern attentional models also generally agree that attention can be deployed to locations in or features of sensory space, both of which are fundamental aspects of the sensory-processing hierarchy [24, 35]. Finally, modern models of attention agree that attention is very limited, but not about precisely how it is limited. Some models are still fundamentally 'spotlight' models [25, 44], in which attention is limited to a single location or feature set, while others posit that attention can be divided between a small number of locations or features [41, 47]. Based on related working-memory research, the latter theory is gaining prominence as likely correct.

*Attention and Working Memory in Human Auditory Cortex. DOI: http://dx.doi.org/10.5772/intechopen.85537*

*The Human Auditory System - Basic Features and Updates on Audiological Diagnosis and Therapy*

Working memory, a more precise term for what is colloquially called 'short-term memory', is the ability to maintain and manipulate information within the focus of attention over a short period of time after the stimulus is no longer perceptible (for reviews, see [48–51]). Without explicit maintenance, this retention period is approximately 1–2 s, but it is theoretically indefinite with explicit maintenance. Working memory should not be confused with 'sensory memory', also known as 'iconic memory' in vision and 'echoic memory' in audition [52]. Sensory memory is a fundamental aspect of sensory systems in which a sensory trace available to attention and working-memory systems persists for less than ~100 ms after stimuli are no longer perceptible. Models of working memory are nearly indistinguishable from models of attention; the key difference is that working memory is a 'memory' of previously perceptible stimuli, whereas attention is thought to act on perceptible stimuli or sensory traces thereof. Working-memory models posit, by definition, that working memory acts after perceptual processing has occurred (**Figure 2**; for review, see [53]). However, it has been difficult to isolate exactly where working-memory control resides along the cortical hierarchy of sensory processing, likely because low-level perceptual cortex is recruited at least for visual working memory and attention [40, 46, 54, 55].

Like attention, working memory is posited to be a highly limited resource, in which only a small set of locations or objects (e.g., 3–4 items on average) can be simultaneously maintained [42, 49]. In fact, some modern measures of attention and working memory are nearly identical. The change-detection task is a ubiquitous one in which subjects are asked to view a sensory array, then compare that array to a second one in which some aspect may have changed, and indicate whether a change has occurred (**Figure 3**) [56–60]. A short delay period (i.e., retention interval) separates the two arrays and may include a neutral presentation or, if desired, a mask of the sensory stimuli to prevent the use of 'sensory memory'. The length of the delay period can then be altered to measure either attention or working memory: if the delay period is on the order of ~0–200 ms, it is considered an attentional task; if it is longer, on the order of 1–2 s, it is considered a working-memory task [53]. Therefore, attention and working-memory systems are at a minimum heavily intertwined and very likely the same system studied in slightly different contexts, with attention being a component of a larger working-memory framework.
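The trial structure and the delay-length convention described above can be sketched as follows (set sizes, timings, and hue labels are illustrative, not taken from any specific study):

```python
import random

def make_trial(set_size=4, change_prob=0.5,
               hues=("red", "green", "blue", "yellow", "cyan", "magenta")):
    """One change-detection trial: a sample array, a probe array that may
    differ at one position, and whether a change actually occurred."""
    sample = [random.choice(hues) for _ in range(set_size)]
    probe = list(sample)
    changed = random.random() < change_prob
    if changed:
        i = random.randrange(set_size)
        # Swap in a different hue at one randomly chosen position.
        probe[i] = random.choice([h for h in hues if h != sample[i]])
    return sample, probe, changed

def task_type(delay_ms):
    """Per the convention cited above: delays of ~0-200 ms probe attention,
    delays on the order of 1-2 s probe working memory."""
    return "attention" if delay_ms <= 200 else "working memory"

assert task_type(100) == "attention"
assert task_type(1500) == "working memory"
```

The same trial generator serves both task variants; only the retention interval between presenting `sample` and `probe` changes, which is exactly why the two constructs are so hard to dissociate experimentally.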

With the relatively recent invention of fMRI, researchers have begun to localize these models of attention and working memory to their cortical underpinnings (e.g., [6, 37, 40, 50, 55, 61, 62]). fMRI, through its exquisite ability to localize blood oxygenation-level dependent (BOLD) signals (and thus the underlying neural activity) to within just a couple of millimeters, is the best technology available for such research [63, 64]. Two broad approaches have been employed for studying these higher-order cognitive processes: model-based and perception-based. Model-based investigations tend to take tasks from behavioral investigations into attention and working memory, adapt them to the strict parameters required of fMRI, and compare activity across conditions in which attention or working memory is differentially deployed [61, 62]. Perception-based investigations tend to measure low-level perceptual cortex that has already been mapped in detail and assess the effects of attention or working memory within those regions [50, 55, 65]. Both approaches are important and should be fully integrated to garner a more complete and accurate localization of these attentional and working-memory systems.

#### **Figure 3.**

*Visual change-detection task. This task can be used to probe visual attention or working memory and is very similar to its auditory counterpart. Such tasks have three phases: first is encoding, when subjects are given ~100–500 ms to view the sample array; next is maintenance, which is short (~0–200 ms) for measuring attention and longer (~1000 ms) for working memory; last is the probe (lasting until the subject responds or with a time limit, often ~2000 ms). In this example, a set size of four is presented for the sample array and a probe array of one is used, though different set sizes are commonplace and often the probe array will be the same set size as the encoding array with a possibility of one object being changed. Typically there is an equal chance (50%) of the probe array containing a change or not. Generally subjects will be required to fixate centrally, particularly if fMRI, EEG, or PET recordings are being made. (A) Simple colored square stimuli are depicted here, often drawn from a small set of easily distinguished hues (in this case, 6). As a result, changes are always low in similarity, requiring low resolution to make accurate comparisons between encoding and test arrays, which is important at least for visual working-memory measurements. More complex stimuli can also be used as in (B) and (C). These stimuli are shaded cubes with the same hue set as in (A), but also have 6 possible shading patterns with the dark, medium, and light shaded sides on each cube. Changes between hues, as in (B), are of equivalently low similarity to (A) and result in similar performance under visual working-memory conditions. Changes in shading patterns, as in (C), result in worse performance than (B) despite having the same number of possible pattern changes as hue changes in (A) or (B), because such changes require higher resolution representations in visual working memory. Adapted from Barton and Brewer [50].*
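Performance on single-probe change-detection tasks of this kind is commonly summarized with Cowan's K, a standard capacity estimate (K = set size × (hit rate − false-alarm rate)); this measure is not specific to the study above, but a minimal sketch shows how the 3–4 item capacity figure arises from raw response rates:

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K capacity estimate for single-probe change detection:
    K = N * (hit rate - false-alarm rate)."""
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= false_alarm_rate <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return set_size * (hit_rate - false_alarm_rate)

# A hypothetical subject tested at set size 6 who detects 80% of changes
# and false-alarms on 20% of no-change trials:
k = cowan_k(6, 0.80, 0.20)
print(round(k, 2))  # prints 3.6, within the 3-4 item range cited above
```

The false-alarm correction matters: a subject who answers "change" on every trial scores a hit rate of 1.0 but a false-alarm rate of 1.0 as well, yielding K = 0 rather than a spuriously high capacity.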


#### **2.2 Overview of auditory and visual attention research**

Research into attention began in earnest in the auditory system after World War II, with a very practical motivation: it had been noted that fighter pilots sometimes failed to perceive auditory messages presented to them over headphones even though the messages were completely audible. To address this problem, Donald Broadbent began studying subjects in an auditory environment similar to the pilots', with multiple speech messages presented over headphones [34]. Based on his findings, he proposed a selective theory of attention, which was popular and persuasive but ultimately required modification. Environments such as the one Broadbent studied are more commonly encountered at cocktail parties, in which multiple audible conversations take place and people are able to attend to one or a small set of speech streams while attenuating the others. To study this 'cocktail party phenomenon', the dichotic listening task was developed in the 1950s by Colin Cherry [66, 67]. Subjects were asked to shadow the speech stream presented to one ear of a set of headphones while another stream was presented to the other ear, and they demonstrated little knowledge of the nonshadowed (unattended) stream (**Figure 4**).

A host of studies followed up on this basic finding, revealing several attentional parameters within the context of that type of task (e.g., [30, 35, 40, 68–71]). Importantly, preferential processing of the attended stream relative to the unattended streams is not absolute; for example, particularly salient information, such as the name of the subject, could sometimes be recalled from an unattended stream, presumably by reorienting attention [39, 66, 67, 69]. The streams were typically differentiated spatially (e.g., to each ear through a headset), indicating a spatial aspect to attentional selection and therefore to the attentional system. Similarly, the streams were also typically differentiated by the voice of the person speaking, indicating attentional selection based on the spectrotemporal characteristics of the speaker's voice, such as the average and variance of pitch and speech rate.

#### **Figure 4.**

*Auditory spatial attention. Schematic of an example auditory spatial attention task (e.g., see [35, 40, 66, 67]). Each block typically starts with a cue (auditory or visual) for the subject to attend left or right on the upcoming trial. Two simultaneous auditory streams of digits are presented as binaural, spatially lateralized signals. Behavioral studies in an anechoic chamber often use speakers physically located to the left and right of the subject; fMRI measurements do not have the option of such a setup, but can instead use differences in the interaural time difference (ITD) to produce a similarly effective lateralization of the two digit streams. The subjects attend to the cued digit stream and perform a 1-back task.*

These findings are very similar to findings in the visual domain, indicating that attentional systems across senses are similarly organized. Visual attention can likewise be deployed to a small set of locations or to visual features with very little recall of nonattended visual stimuli [41]. Roughly analogous to speech shadowing are multiple-object-tracking tasks, which require subjects to visually track a small set of moving objects out of a group [47, 73]. Visual change-detection tasks are also very common, and they demonstrate very similar results as their auditory counterparts [50, 74, 75]. In sum, the evidence suggests that attentional systems are organized similarly within each sensory system (**Figure 6**) [5, 10].
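The ITD-based lateralization used in fMRI adaptations of such spatial attention tasks (see the Figure 4 caption) can be approximated with Woodworth's spherical-head formula; the head radius and speed of sound below are typical assumed values, not parameters from any study cited here:

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth approximation of the interaural time difference for a
    far-field source at the given azimuth (0 deg = straight ahead)."""
    theta = math.radians(azimuth_deg)
    # Path difference between ears: r * (theta + sin(theta)), divided by
    # the speed of sound to give a time difference.
    return (head_radius_m / c) * (theta + math.sin(theta))

# A stream lateralized fully to one side (~90 deg) arrives at the far ear
# roughly 0.66 ms late; delaying one headphone channel by this amount
# lateralizes a binaural digit stream without physical speakers.
print(round(itd_seconds(90) * 1e6))  # prints 656 (microseconds)
```

Applying opposite-sign delays of this magnitude to the two digit streams yields the left/right percepts the task requires while keeping the monaural signals otherwise identical.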

Despite these broad contributions, these types of tasks are of limited utility when tying behavior to cortical activity because the types of stimuli used are rather highorder (e.g., speech) with relatively uncontrolled low-level parameters. For example, the spectrotemporal profile of a stream of speech is complex, likely activating broad swaths of low-level sensory cortex in addition to higher-order regions dedicated to speech comprehension, including working and long-term memory [68, 72, 76, 77]. If one were to compare fMRI activity across auditory cortex in traditional dichotic listening tasks, the differences would have far too many variables for which to account before meaningful conclusions can be made about attentional systems. It may seem intuitive to compare cortical activity between conditions where identical speech stimuli have been presented and the subject either attended to the stimuli or did not. However, areas that have increased activity when the stimuli were attended could simply reflect higher-order processing that only occurs when attention is directed to the stimuli rather than directly revealing areas involved in attentional control. For example, recognition of particular words requires comparison of the speech stimulus to an internal representation, which requires activation of long-term memories of words [77]. Long-term memory retrieval does not happen if the subject never perceived the word due to attention being maintained on a separate speech stream, so such memory-retrieval activity would be

Thus, simpler stimuli that are closer in nature to the initial spectrotemporal analyses performed by primary auditory cortex (PAC) are better suited for experiments intended to demonstrate attentional effects in cortex [24]. Reducing the speech comprehension element is a good first step, and research approached this by using a change-detection task and arrays of recognizable animal sounds (cow, owl, frog, etc.; **Figure 5**) [59]. These tests revealed what the researchers termed 'change deafness,' in which subjects often failed to identify changes in the sound arrays. Such inability to detect changes is entirely consistent with very limited attentional resources, and very

similar to results of working-memory change-detection tasks [30, 53, 60, 78].

**2.3 Overview of auditory and visual working-memory research**

However, even these types of stimuli are not best suited to fMRI investigation at this stage of understanding due to their relative complexity compared to the basic spectrotemporal features of sounds initially processed in auditory cortex [12, 50]. As discussed in detail below, the auditory system represents sounds in spectral and temporal dimensions, and stimuli similar to those used to define those perceptual areas would be best suited now to evaluating the effects of attention in the auditory

Visual and auditory working memory were discovered in quick succession and discussed together in a very popular and influential model by Baddeley and Hitch

additional information about the speaker, such as gender) [66–68, 72].

very similarly, perhaps identically, between at least vision and audition.

confounded with attentional activity in the analysis [70].

*DOI: http://dx.doi.org/10.5772/intechopen.85537*

#### *Attention and Working Memory in Human Auditory Cortex DOI: http://dx.doi.org/10.5772/intechopen.85537*

*The Human Auditory System - Basic Features and Updates on Audiological Diagnosis and Therapy*

### **2.2 Overview of auditory and visual attention research**

Research into attention began in earnest in the auditory system after World War II with a very practical motivation. It had been noted that fighter pilots sometimes failed to perceive auditory messages presented to them over headphones despite the fact that the messages were completely audible. To solve this problem, Donald Broadbent began studying subjects in an auditory environment similar to that of the pilots, with multiple speech messages presented over headphones [34]. Based on his findings, he proposed a selective theory of attention, which was popular and persuasive but ultimately required modification. Environments such as the one Broadbent studied are more commonly encountered at cocktail parties, in which multiple audible conversations take place and people are able to attend to one or a small set of speech streams while attenuating the others. To study this 'cocktail party phenomenon,' Colin Cherry developed the dichotic listening task in the 1950s [66, 67]. Subjects were asked to shadow the speech stream presented to one ear of a set of headphones while another stream was presented to the other ear, and they demonstrated little knowledge of the nonshadowed (unattended) stream (**Figure 4**).

#### **Figure 4.**

*Auditory spatial attention. Schematic of an example auditory spatial attention task (e.g., see [35, 40, 66, 67]). Each block typically starts with a cue (auditory or visual) for the subject to attend left or right on the upcoming trial. Two simultaneous auditory streams of digits are presented as binaural, spatially lateralized signals. Behavioral studies in an anechoic chamber often use speakers physically located to the left and right of the subject; fMRI measurements do not have the option of such a setup, but can instead use differences in the interaural time difference (ITD) to produce a similarly effective lateralization for the two digit streams. The subjects attend to the cued digit stream and perform a 1-back task.*

A host of studies followed up on this basic finding, revealing several attentional parameters within the context of that type of task (e.g., [30, 35, 40, 68–71]). Importantly, preferential processing of the attended stream relative to the unattended streams is not absolute; for example, particularly salient information, such as the subject's own name, could sometimes be recalled from an unattended stream, presumably through a reorienting of attention [39, 66, 67, 69]. The streams were typically differentiated spatially (e.g., one to each ear through a headset), indicating a spatial aspect to attentional selection and therefore to the attentional system. Similarly, the streams were also typically differentiated by the voice of the person speaking, indicating attentional selection based on the spectrotemporal characteristics of the speaker's voice, such as the average and variance of pitch and speech rate (often reflecting additional information about the speaker, such as gender) [66–68, 72].
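The ITD manipulation used to lateralize the two digit streams can be sketched in a few lines of code: delaying one channel of a stereo signal by a fraction of a millisecond shifts the perceived location toward the leading ear. The function and parameter values below are illustrative placeholders, not taken from any particular study:

```python
import numpy as np

def apply_itd(mono, fs, itd_s):
    """Return a stereo signal in which one channel leads the other by
    itd_s seconds (positive itd_s -> left ear leads -> perceived left).

    Uses a crude integer-sample delay; real studies use finer methods."""
    delay = int(round(abs(itd_s) * fs))            # delay in samples
    lead = np.concatenate([mono, np.zeros(delay)])  # leading channel
    lag = np.concatenate([np.zeros(delay), mono])   # delayed channel
    if itd_s >= 0:
        left, right = lead, lag
    else:
        left, right = lag, lead
    return np.stack([left, right], axis=1)

fs = 44100                                  # sample rate (Hz)
t = np.arange(int(0.2 * fs)) / fs           # 200 ms token
digit_token = np.sin(2 * np.pi * 440 * t)   # stand-in for a spoken digit
stereo = apply_itd(digit_token, fs, 0.0005)  # 500 us ITD, lateralized left
```

Human listeners lateralize reliably with ITDs of a few hundred microseconds, so even this simple channel delay produces a clear left/right percept over headphones.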

These findings are very similar to those in the visual domain, indicating that attentional systems across the senses are similarly organized. Visual attention can likewise be deployed to a small set of locations or to visual features, with very little recall of nonattended visual stimuli [41]. Roughly analogous to speech shadowing are multiple-object-tracking tasks, which require subjects to visually track a small set of moving objects within a larger group [47, 73]. Visual change-detection tasks are also very common, and they demonstrate results very similar to those of their auditory counterparts [50, 74, 75]. In sum, the evidence suggests that attentional systems are organized very similarly, perhaps identically, across at least vision and audition.

Despite these broad contributions, these types of tasks are of limited utility for tying behavior to cortical activity because the stimuli used are rather high-order (e.g., speech), with relatively uncontrolled low-level parameters. For example, the spectrotemporal profile of a stream of speech is complex, likely activating broad swaths of low-level sensory cortex in addition to higher-order regions dedicated to speech comprehension, including working and long-term memory [68, 72, 76, 77]. If one were to compare fMRI activity across auditory cortex in traditional dichotic listening tasks, there would be far too many variables to account for before meaningful conclusions could be drawn about attentional systems. It may seem intuitive to compare cortical activity between conditions in which identical speech stimuli are presented and the subject either attends to the stimuli or does not. However, areas with increased activity when the stimuli are attended could simply reflect higher-order processing that only occurs when attention is directed to the stimuli, rather than directly revealing areas involved in attentional control. For example, recognition of particular words requires comparison of the speech stimulus to an internal representation, which requires activation of long-term memories of words [77]. Long-term memory retrieval does not occur if the subject never perceived the word because attention was maintained on a separate speech stream, so such memory-retrieval activity would be confounded with attentional activity in the analysis [70].

Thus, simpler stimuli that are closer in nature to the initial spectrotemporal analyses performed by primary auditory cortex (PAC) are better suited for experiments intended to demonstrate attentional effects in cortex [24]. Reducing the speech-comprehension element is a good first step, and one line of research approached this by using a change-detection task with arrays of recognizable animal sounds (cow, owl, frog, etc.; **Figure 5**) [59]. These tests revealed what the researchers termed 'change deafness,' in which subjects often failed to identify changes in the sound arrays. Such inability to detect changes is entirely consistent with very limited attentional resources and closely parallels the results of working-memory change-detection tasks [30, 53, 60, 78].
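The trial logic of such a change-deafness paradigm is simple to make concrete. The sketch below is a minimal, hypothetical version: the sound names, array size, and function names are placeholders for illustration, not the actual stimulus set of the cited study:

```python
import random

# Hypothetical pool of iconic animal calls (placeholder names).
ANIMALS = ["cow", "owl", "frog", "dog", "cat", "rooster", "horse", "sheep"]

def make_trial(rng, array_size=4):
    """One change-detection trial: a memory array of simultaneously
    presented sounds, then a test array with one item removed.
    The subject must report which sound is missing."""
    memory_array = rng.sample(ANIMALS, array_size)  # draw distinct sounds
    missing = rng.choice(memory_array)              # item removed at test
    test_array = [a for a in memory_array if a != missing]
    return memory_array, test_array, missing

rng = random.Random(0)  # seeded for reproducible trial sequences
memory_array, test_array, missing = make_trial(rng)
```

Varying the interval between the memory and test arrays converts the same trial structure between an attention task (short interval) and a working-memory task (longer retention interval), as described for Figure 6.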

However, even these types of stimuli are not best suited to fMRI investigation at this stage of understanding, given their complexity relative to the basic spectrotemporal features of sounds initially processed in auditory cortex [12, 50]. As discussed in detail below, the auditory system represents sounds along spectral and temporal dimensions, and stimuli similar to those used to define those perceptual areas are now best suited to evaluating the effects of attention in the auditory system (**Figure 6**) [5, 10].
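Stimuli of this kind can be approximated with amplitude-modulated noise: a narrowband carrier probes the spectral (tonotopic) dimension, while the modulation rate probes the temporal (periodotopic) dimension. The following is a rough sketch under simplifying assumptions (a brick-wall FFT filter, an arbitrary sample rate, and a one-octave band), not a reconstruction of the published stimuli:

```python
import numpy as np

def am_noise(fs, dur, am_rate, center=None, bandwidth_oct=1.0, seed=0):
    """Amplitude-modulated noise: broadband if center is None, otherwise
    band-limited to ~bandwidth_oct octaves around `center` Hz.

    The brick-wall FFT filter is a simplification for illustration."""
    rng = np.random.default_rng(seed)
    n = int(fs * dur)
    noise = rng.standard_normal(n)
    if center is not None:
        lo = center / 2 ** (bandwidth_oct / 2)
        hi = center * 2 ** (bandwidth_oct / 2)
        spec = np.fft.rfft(noise)
        freqs = np.fft.rfftfreq(n, 1 / fs)
        spec[(freqs < lo) | (freqs > hi)] = 0  # keep one octave of carrier
        noise = np.fft.irfft(spec, n)
    t = np.arange(n) / fs
    envelope = 0.5 * (1 + np.sin(2 * np.pi * am_rate * t))  # AM envelope
    return envelope * noise

fs = 16000
narrowband = am_noise(fs, 1.0, am_rate=8, center=1600)  # spectral stimulus
broadband = am_noise(fs, 1.0, am_rate=2)                # temporal stimulus
```

Crossing carrier bandwidth with modulation rate in this way yields stimulus pairs that differ along only one of the two dimensions at a time, which is what makes them suitable for isolating tonotopic versus periodotopic attention effects.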

#### **Figure 5.**

*Auditory feature attention. Schematic outlines a simple proposed attention task utilizing spectral (narrowband noise) and temporal (broadband noise) stimuli taken from the stimuli used by [10] to define auditory field maps. Subjects are asked to attend to one of two simultaneously presented stimuli, which are either (A) narrowband noise, in this case with central frequencies of 6400 and 1600 Hz and the same amplitude modulation (AM) rate of 8 Hz, or (B) broadband noise, in this case with AM rates of 2 and 8 Hz. (C) A proposed task that varies auditory feature attention, in which subjects are instructed to attend to each of the stimuli in an alternating pattern, cued by a short sound at the beginning of each block.*

#### **Figure 6.**

*Auditory object attention and working memory. Schematic of one trial in an auditory change-detection task (e.g., see change-deafness experiments in [59]). Subjects are first presented with an array of four distinct auditory objects (e.g., four different recordings of real animal sounds, randomized each trial from a larger set of iconic animal calls). In the initial memory array, the four animal sounds are presented binaurally and temporally overlapped for a short time (e.g., 2 s). Within the anechoic chamber setup often used in psychoacoustic studies, these sounds may come from speakers physically positioned at the corners of a square; fMRI measurements do not have the option of such a setup, but can instead use differences in the interaural time difference (ITD) and interaural level difference (ILD) to produce a similarly effective virtual space. The subject's goal is typically to identify and remember all four animal sounds. The interstimulus interval is commonly filled with silence or white noise and can be varied in length to create shorter or longer retention intervals for attention or working-memory tasks, respectively. During the subsequent test array, subjects attempt to identify which one of the four auditory objects is now missing from the simultaneous animal sound presentations. In such auditory change-deafness paradigms, subjects fail to notice a large proportion of the changes introduced between the initial and test arrays.*

### **2.3 Overview of auditory and visual working-memory research**

Visual and auditory working memory were discovered in quick succession and discussed together in a very popular and influential model by Baddeley and Hitch linking sensory perception, working memory, and executive control [79–81]. The generally accepted modern model of working memory has changed somewhat from the original depiction, but the vast majority of research has operated within this framework (for reviews, see [30, 51, 53, 79, 81]). Each sense is equipped with its own perceptual system and three memory systems: sensory memory, working memory, and long-term memory. Direct sensory input, gated by attentional selection, is one of the two primary inputs into working memory. Sensory memory is a vivid trace of sensory information that persists for a short time after the information has vanished and is essentially equivalent to direct sensory input into working memory, again gated by attentional selection; one can reorient attention to aspects of the sensory trace as if it were direct sensation. Long-term memory is the second primary input into working memory, gated by an attention-like selection process generally referred to as selective memory retrieval. Working memory itself is a short-term memory workspace lasting a couple of seconds without rehearsal, in which sensory information is maintained and manipulated by a central executive [82]. The central executive is a deliberately vague term with nebulous properties; as a colleague often quips, "All we know of the central executive is that it's an oval," after its oval-shaped depiction in the Baddeley and Hitch model. There is ongoing debate as to the level of the hierarchy at which each system is integrated with those of the other senses, with no definitive solutions.

Visual working memory and visual sensory memory (i.e., 'iconic memory') were first systematically measured by George Sperling in 1960 [52]. He presented arrays of simple visual stimuli for short periods of time and asked subjects to report what they had seen after a number of short delays. He discovered that subjects could only recall a small subset of stimuli in a large array, reflecting the limited capacity of visual working memory. Furthermore, subjects could recall a particular subset of the stimuli when cued after the presentation but before the sensory trace had faded (≤100 ms), indicating that visual sensory memory exists and that visual attention can be deployed to stimuli either during sensation or during sensory memory. Over the next decade, Sperling went on to perform similar measurements in the auditory system, delineating very similar properties for auditory perception, sensory memory, and working memory [83].

Even without directly measuring brain activity, researchers were able to conclude that the sensory systems operate independently, using dual-task paradigms in which subjects were asked to maintain visual, auditory, or both types of information in working memory. Subjects could recall ~3–4 'chunks' of information (which may not precisely reflect individual sensory locations or features) of each type, regardless of condition [49, 78]. If the systems were integrated, one would be able to allocate multisensory working-memory 'slots' to either sense, with a maximum number (e.g., 6–8) that could be divided between the senses as desired. Instead, subjects can maintain on average ~3–4 visual chunks and ~3–4 auditory chunks, without any ability to reallocate 'slots' from one sense to the other.
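The logic of this inference can be made explicit with two toy models: under a shared pool, a purely unisensory load should be recallable up to the full multisensory capacity, whereas under independent stores it saturates at the per-modality limit. The capacity values below are the approximate ones cited above (~4 per modality), used purely for illustration:

```python
def independent_recall(n_vis, n_aud, k_vis=4, k_aud=4):
    """Separate stores: each sense is capped by its own capacity."""
    return min(n_vis, k_vis), min(n_aud, k_aud)

def shared_recall(n_vis, n_aud, k_total=8):
    """Single pool: slots go to whichever items arrive, up to k_total."""
    vis = min(n_vis, k_total)
    aud = min(n_aud, k_total - vis)
    return vis, aud

# A purely visual load of 8 items separates the models: a shared pool
# predicts all 8 recalled, independent stores only ~4.
print(independent_recall(8, 0))  # (4, 0)
print(shared_recall(8, 0))       # (8, 0)
```

Note that both models predict 4 + 4 recall under a balanced bimodal load; it is the asymmetric, single-modality conditions that discriminate between them, and those conditions match the independent-stores prediction.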

While electroencephalogram (EEG) and positron emission tomography (PET) recordings could broadly confirm the contralateral organization of the visual system and coarsely implicate the parietal and frontal lobes in attention and working memory, it was not until the advent of high-resolution fMRI that researchers could begin localizing attention and working memory in human cortex in any detail [6, 17, 37, 50, 84–90]. Model-based fMRI investigations have attempted to localize visual working memory by comparing BOLD activity across conditions in which subjects are required to hold different numbers of objects in working memory [50, 62, 91, 92]. The logic goes that, because visual-working-memory models posit that a maximum of ~3–4 objects can be held in visual working memory on average, areas whose activity increases with arrays of 1, 2, and 3 objects and remains constant with arrays of 4 or more objects should be the areas controlling visual working memory. Such areas were found bilaterally in parietal cortex by multiple laboratories [57, 62, 91, 93], but activity related to visual working memory has also been measured in early visual cortex (e.g., V1 and hV4) [55, 65, 94], prefrontal cortex [95], and possibly in object-processing regions in lateral occipital cortex [62], indicating that working-memory tasks recruit areas throughout the visual-processing hierarchy. (We note that the report of object-processing regions is controversial, as the cortical coordinates reported in that study are more closely consistent with the human motion-processing complex, hMT+, than with the lateral occipital complex [15, 17, 96, 97].) However, little has been done to measure visual-working-memory activity within visual field maps, so these studies should be considered preliminary rather than definitive. Measurements within CFMs would, in fact, help to clear up such controversies.
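This model-based logic amounts to fitting a saturating load function to the BOLD response: predicted activity rises with set size and plateaus at the capacity limit K. A minimal sketch follows, with illustrative set sizes and noiseless 'data'; real analyses fit per-subject amplitudes and baselines:

```python
import numpy as np

def predicted_bold(set_sizes, k, baseline=0.0, gain=1.0):
    """Saturating load function: activity rises with the number of items
    held in working memory, then plateaus at the capacity limit k."""
    return baseline + gain * np.minimum(set_sizes, k)

def best_k(set_sizes, observed, candidates=range(1, 9)):
    """Pick the capacity k whose predicted curve best fits the observed
    responses (least squares over candidate capacities)."""
    errs = {k: float(((observed - predicted_bold(set_sizes, k)) ** 2).sum())
            for k in candidates}
    return min(errs, key=errs.get)

set_sizes = np.array([1, 2, 3, 4, 6, 8])
response = predicted_bold(set_sizes, k=4)  # idealized capacity-limited region
```

A region whose measured response follows this rise-and-plateau curve, with the plateau at the behavioral capacity estimate, is what these studies treat as a working-memory candidate area.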

Auditory-working-memory localization with fMRI has been quite limited compared to its visual counterpart and has largely concentrated on speech stimuli rather than fundamental auditory stimuli [30, 68]. As noted above for attention localization with fMRI, highly complex stimuli introduce too many variables, so a different approach is necessary. Furthermore, even low-level auditory sensory areas have only very recently been properly identified [5, 10].
