**6.1 Limitations**

The current attention pipeline could be accelerated significantly by parallelizing the processing that produces regions of interest. This can be done during low-level feature computation by utilizing GPU acceleration, or at higher levels, either by allowing regions of interest to be processed simultaneously by multiple types of detectors, or by partitioning the regions of interest spatially so that they can be analyzed in parallel by the same type of detector. In a distributed system, communication between the different levels of the attention model is handled over network sockets. If there are numerous communication links from the raw sensory input to the highest panoramic attention layer, the cumulative latency will delay perception and any subsequent behavioral response. By judiciously allowing increased computational load on a single computer, network communication can be replaced by much faster memory-based messaging, allowing the robot to respond more quickly to environmental changes.
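The spatial-partitioning option above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the "detector" is a stand-in that merely counts bright pixels, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def detect_bright(strip):
    """Stand-in detector: count 'salient' (bright) pixels in one spatial strip."""
    x0, pixels = strip
    return (x0, sum(1 for p in pixels if p > 200))

def partition(pixels, n_parts):
    """Split a flat pixel row into n_parts contiguous spatial strips,
    each tagged with its starting offset so results can be merged later."""
    step = len(pixels) // n_parts
    return [(i, pixels[i:i + step]) for i in range(0, len(pixels), step)]

# A 12-pixel 'row' where every third pixel is bright.
row = [255 if i % 3 == 0 else 10 for i in range(12)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = sorted(pool.map(detect_bright, partition(row, 4)))
print(results)  # each 3-pixel strip contains exactly one bright pixel
```

The same partition/merge structure applies whether the workers are threads, processes, or GPU kernels; only the per-strip detector changes.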

To reduce excessive computational overhead, context can be used to adjust the weighting of different features when determining regions of interest. For example, a highly visual task may not require frequent sound processing; the attention system can either lower the sampling rate of the sound detector or disable it completely. The freed processing can then be allocated to relevant features to produce more accurate measurements and subsequent state estimates. At the middle layers, introducing a top-down mechanism for modulating the different types of detectors can yield efficient resource allocation.
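One way this context-driven weighting could look in code is sketched below. The task names, weights, and base rates are all hypothetical placeholders, not values from the chapter.

```python
# Hypothetical context table: per-task weights for each feature channel.
TASK_WEIGHTS = {
    "visual_search":     {"vision": 1.0, "sound": 0.1},
    "conversation":      {"vision": 0.5, "sound": 1.0},
    "silent_inspection": {"vision": 1.0, "sound": 0.02},
}
BASE_RATE_HZ = {"vision": 30.0, "sound": 16.0}

def detector_rates(task, min_weight=0.05):
    """Scale each detector's sampling rate by the task's context weight;
    weights below min_weight disable that detector entirely (rate 0)."""
    weights = TASK_WEIGHTS[task]
    return {ch: (BASE_RATE_HZ[ch] * w if w >= min_weight else 0.0)
            for ch, w in weights.items()}

print(detector_rates("visual_search"))      # sound runs at a tenth of its base rate
print(detector_rates("silent_inspection"))  # sound is disabled outright
```

The threshold realizes the "disable processing completely" case; everything above it realizes graded sampling-rate adjustment.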

Fig. 7. Hand-waving to get the robot's attention while it is playing a card game with another person.

Incidental information obtained through casual observation can be quickly recalled when location-based queries are performed.







Fig. 8. Automatic logging of speaker activity and locations for multi-speaker applications: (inset) panoramic attention locates speakers and logs the amount of speaker activity for each participant. White circles represent past clusters of sound activity, labeled with the current number of utterances and the cumulative time spoken.

For example, if it is discovered that certain objects are not present in an environment or are not relevant to a task, the corresponding detectors can be suspended so that relevant detectors can run at faster rates, improving overall system performance.
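A minimal sketch of this suspension-and-reallocation idea, assuming a fixed total processing budget shared among detectors (the detector names and rates are invented for illustration):

```python
def reallocate(rates, suspended, budget_hz):
    """Suspend the listed detectors and redistribute the fixed processing
    budget proportionally among the detectors that remain active."""
    active = {d: r for d, r in rates.items() if d not in suspended}
    total = sum(active.values())
    return {d: budget_hz * r / total for d, r in active.items()}

rates = {"face": 10.0, "cup": 20.0, "sound": 10.0}
# The task revealed no cups in the scene, so the cup detector is suspended
# and its share of the budget flows to the remaining detectors.
print(reallocate(rates, {"cup"}, budget_hz=40.0))
```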

The mechanism for top-down modulation of the attention system should be handled by a behavior system driven by the set of active applications running on the robot. The behaviors are dictated by the current task, and there should be a clear layer of separation between the behavior and perception systems. Behaviors are triggered by changes in the environmental state, which is inferred from the panoramic attention system. Therefore, it should be the behavior system that configures which detectors and low-level features the perception system runs, so that the decisions behaviors depend on can be made properly. The behavior system can consist of low-level reactive actions that may be triggered directly by low-level features, or high-level deliberative behaviors that may themselves spawn multiple sub-tasks. Since reactive behaviors have fewer intervening communication links, their responses will naturally occur more quickly after changes in sensory input.

Since the locations in the panoramic map are in egocentric coordinates, they need to be updated whenever the robot moves to a new location. Although these locations can be completely re-sensed and re-calculated once the robot reaches its new position, the panoramic map would be invalid during the actual motion, limiting the robot's perception while it is moving.
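The behavior-to-perception configuration step can be sketched as a simple mapping: each active behavior declares the detectors it needs, and the perception system runs the union. The behavior names and detector sets below are hypothetical, not taken from the chapter.

```python
# Hypothetical behavior layer: each behavior declares which detectors the
# perception system must run on its behalf.
BEHAVIOR_NEEDS = {
    "play_cards": {"face", "card"},
    "greet":      {"face", "sound"},
}

def configure_perception(active_behaviors):
    """Union of detector sets required by all currently active behaviors."""
    needed = set()
    for b in active_behaviors:
        needed |= BEHAVIOR_NEEDS[b]
    return needed

print(sorted(configure_perception(["play_cards", "greet"])))
# ['card', 'face', 'sound']
```

This keeps the layer of separation described above: behaviors state *what* they need, and the perception system decides *how* to schedule it.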


Fig. 9. Child faces appear lower in the panorama.

Continuously tracking the positions of environmental objects during motion and incorporating the robot's known ego-motion could keep the panoramic map updated even while the robot is moving.
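For the rotational component of ego-motion, the update is simple: when the robot rotates by some yaw angle, every stored egocentric pan angle in the panoramic map shifts by the opposite amount. A minimal sketch (function name and angle convention are my own, with pan angles in degrees normalized to [-180, 180)):

```python
def update_pan_angle(pan_deg, robot_yaw_delta_deg):
    """Shift a stored egocentric pan angle to compensate for a robot
    rotation of robot_yaw_delta_deg, wrapping into [-180, 180)."""
    return (pan_deg - robot_yaw_delta_deg + 180.0) % 360.0 - 180.0

# An object stored at +30 deg; the robot turns 45 deg toward it.
print(update_pan_angle(30.0, 45.0))    # -15.0
# Wraparound: an object at +170 deg after the robot turns -30 deg.
print(update_pan_angle(170.0, -30.0))  # 200 deg wraps to -160.0
```

Translational ego-motion is harder, since it requires a depth estimate for each stored location, but the same dead-reckoning principle applies between sensor updates.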
