is very different for a set of advertisements and for a set of websites, as shown in the two right images of Figure 9. This is partly due to the a priori knowledge that people have about those images. For example, when viewing a website, the upper part is likely to contain the logo and title, while the left part should contain the menu. During image or video viewing, the default template is that of natural images, with a high weight on the center of the image. If additional knowledge about the image is available, the top-down information will shift this mean behavior towards the optimized gaze density. Those top-down maps can strongly influence the bottom-up saliency map, but this influence is variable. In Mancas (2009), top-down information appears to be more important for websites than for advertisements and natural images.

Fig. 9. Three models of the mean observer for natural images on the left. The two right images: model of the mean observer on a set of advertising and website images.

Other kinds of models can be learnt from videos, especially if the camera is still. It is possible to accumulate motion patterns for each extracted feature, which provides a model of normality. As an example, after a given period of observation, one can say: here moving objects are generally fast (first feature: speed) and going from left to right (second feature: direction). If an object at the same location is slow and/or going from right to left, this is surprising given what was previously learnt from the scene, thus attention will be directed to this object. This kind of consideration can be found in Mancas & Gosselin (2010).

It is possible to go further and to have different cyclic models in time. In a metro station, for example, people's normal behavior when a train arrives in the station is different from their behavior during the waiting period in terms of direction, speed, density, etc. In the literature (mainly in video surveillance), the variations in time of the normality models are learnt through HMMs (Hidden Markov Models), as in Jouneau & Carincotte (2011).

**2.3.2 Top-down as a task: attending to objects or their usual position**

While the previous section dealt with attention attracted by events that are inconsistent with the knowledge acquired about the scene, here we focus on the second main top-down cue, which is a visual task ("find the keys"). This task also has a strong influence on the way the image is attended, and it implies object recognition ("recognize the keys") and knowledge of the object's usual location ("they could be on the floor, but never on the ceiling").

2.3.2.1 Object recognition

Object recognition can be achieved through classical methods or using points of interest (like SIFT, SURF, etc.; Bay et al. (2008)), which are somehow related to saliency. Some authors, like Navalpakkam & Itti (2005), integrated the notion of object recognition into the architecture of their model. They extract from the object the same features as for the bottom-up model and learn them. This learning step provides modified weights for the fusion of the conspicuity maps, which leads to the detection of the areas that contain the same feature combination as the learnt object.

2.3.2.2 Object location

Another approach is to give a higher weight to the areas of the image which have a higher probability of containing the searched object. Several authors, such as Oliva et al. (2003), developed methods to learn objects' locations. Vectors of features are extracted from the images and their dimension is reduced by using PCA (Principal Component Analysis). Those vectors are then compared to the ones from a database of images containing the given object. Figure 10 shows the potential people locations that have been extracted from the image. This information, combined with bottom-up saliency, leads to the selection of a person sitting down on the left part of the image.
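The feature extraction, PCA reduction and database comparison described above can be illustrated with a minimal sketch. It is not the method of Oliva et al. (2003): the four-dimensional feature vectors, the stored object rows (normalized vertical positions), the nearest-neighbour comparison and all function names are assumptions chosen for illustration only. PCA is computed here with a plain power iteration so that the example needs nothing beyond the standard library.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pca_fit(rows, k, iters=200):
    """Return (mean, components): the k leading principal directions.

    k should not exceed the number of directions with non-zero variance.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        # Power iteration from an arbitrary, normalized start vector.
        v = [1.0 / (i + 1) for i in range(d)]
        start_norm = math.sqrt(dot(v, v))
        v = [x / start_norm for x in v]
        for _step in range(iters):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Deflation: remove the found direction before the next search.
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
        components.append(v)
    return mean, components

def project(x, mean, components):
    centered = [a - m for a, m in zip(x, mean)]
    return [dot(centered, c) for c in components]

def predict_location(query, train_features, train_rows,
                     n_components=2, n_neighbours=2):
    """Average the object rows of the database images closest in PCA space."""
    mean, comps = pca_fit(train_features, n_components)
    reduced = [project(f, mean, comps) for f in train_features]
    q = project(query, mean, comps)

    def sq_dist(z):
        return sum((a - b) ** 2 for a, b in zip(z, q))

    ranked = sorted(range(len(reduced)), key=lambda i: sq_dist(reduced[i]))
    nearest = ranked[:n_neighbours]
    return sum(train_rows[i] for i in nearest) / len(nearest)

# Hypothetical database: one feature vector per image, plus the normalized
# row (0 = top, 1 = bottom) where the searched object was found.
TRAIN_FEATURES = [
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.8],
    [0.2, 0.8, 0.1, 0.9],
]
TRAIN_ROWS = [0.80, 0.75, 0.30, 0.25]
QUERY = [0.85, 0.15, 0.85, 0.15]

predicted_row = predict_location(QUERY, TRAIN_FEATURES, TRAIN_ROWS)
```

In this toy database the query resembles the first two images, so the predicted row is the average of their stored object rows. A real system would use a much larger database and would typically output a full probability map rather than a single row.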

Fig. 10. Bottom-up saliency model inhibited by top-down information to select only salient people.
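The combination illustrated in Figure 10 can be sketched as follows. The element-wise product used here is only one possible way to let a top-down location map inhibit a bottom-up saliency map; it is an assumption, not necessarily the fusion rule behind the figure. The map values and the names `modulate` and `most_salient` are likewise hypothetical.

```python
def modulate(saliency, prior):
    """Weight each bottom-up saliency value by the top-down prior at that cell."""
    return [[s * p for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(saliency, prior)]

def most_salient(grid):
    """Return (row, col) of the first maximum of a 2D map."""
    best_value, best_pos = None, None
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_pos = value, (r, c)
    return best_pos

# Hypothetical bottom-up saliency: strongest peak on the right (row 1, col 3),
# weaker peak on the lower left (row 2, col 0).
SALIENCY = [
    [0.1, 0.2, 0.1, 0.3],
    [0.2, 0.1, 0.2, 0.9],
    [0.7, 0.3, 0.1, 0.2],
]

# Hypothetical top-down map: probability that a person is at each cell.
PEOPLE_PRIOR = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.8, 0.0, 0.0],
    [0.9, 0.6, 0.0, 0.1],
]

bottom_up_choice = most_salient(SALIENCY)
task_driven_choice = most_salient(modulate(SALIENCY, PEOPLE_PRIOR))
```

With these values, the bottom-up map alone selects the peak on the right, while the modulated map selects the lower-left cell, where a salient region coincides with a likely person location.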
