**5.4 Saliency cross-modal integration: combining audio and visual attention**

In a multimedia file, much of the information is contained in the visual data. However, supplemental or complementary information can also be found in the audio track: audio data can confirm the visual information, help in being more selective, or even bring new information that is not present in the camera's field of view. Indeed, in some contexts sound may even be the only way to determine where to focus visual attention, for example when several persons are in a room but only one is talking. It thus seems that the joint use of visual and audio saliency is a relevant idea.

Unfortunately, multimodal models of attention are still very few, and they are mainly used in the field of robotics, as in Ruesch et al. (2008). Another interesting idea is to localize the sound-emitting regions in a video: recent work such as Lee et al. (2010) has shown the ability to localize sounds in an image.

Given the computationally intensive nature and the real-time requirements of video compression methods, especially in the case of multimodal integration of saliency maps, some algorithms have exploited recent advances in Graphics Processing Unit (GPU) computing. In particular, a parallel implementation of a spatio-temporal visual saliency model has been proposed by Rahman et al. (2011).

**5.5 Saliency models and new trends in multimedia compression**

Visual compression has been a very active field of research and development for over 20 years, leading to many different compression systems and to the definition of international standards. Even though video compression has become a mature field, a lot of research is still ongoing. Indeed, as the quality of compression increases, so do users' expectations and their intolerance to artifacts. Exploiting saliency-based video compression is a challenging and exciting area of research, especially nowadays, when saliency models include more and more top-down information and predict real human gaze better and better.

Multimedia applications are a continuously evolving domain, and compression algorithms must also evolve and adapt to new applications. The explosion of portable devices with less bandwidth and smaller screens, but also the future semantic TV/web and its object-based description, will give saliency-based algorithms a growing importance in multimedia data repurposing and compression.

**6. References**

Achanta, R., Hemami, S., Estrada, F. & Susstrunk, S. (2009). Frequency-tuned Salient Region Detection, *IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)*.

Avidan, S. & Shamir, A. (2007). Seam carving for content-aware image resizing, *ACM Trans. Graph.* 26(3): 10.

Bay, H., Ess, A., Tuytelaars, T. & Gool, L. V. (2008). Surf: Speeded up robust features, *Computer Vision and Image Understanding (CVIU)* 110(3): 346–359.

Belardinelli, A., Pirri, F. & Carbone, A. (2008). Motion saliency maps from spatiotemporal filtering, *Proc. 5th International Workshop on Attention in Cognitive Systems - WAPCV 2008*, pp. 7–17.


Human Attention Modelization and Data Reduction 127

Maeder, A. J., Diederich, J. & Niebur, E. (1996). Limiting human perception for image sequences, *in* B. E. Rogowitz & J. P. Allebach (ed.), *Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series*, Vol. 2657, pp. 330–337.

Mancas, M. (2007). *Computational Attention Towards Attentive Computers*, Presses universitaires de Louvain.

Mancas, M. (2009). Relative influence of bottom-up and top-down attention, *Attention in Cognitive Systems*, Vol. 5395 of *Lecture Notes in Computer Science*, Springer Berlin / Heidelberg.

Mancas, M. & Gosselin, B. (2010). Dense crowd analysis through bottom-up and top-down attention, *Proc. of the Brain Inspired Cognitive Systems (BICS)*.

Mancas, M., Gosselin, B. & Macq, B. (2007). Perceptual image representation, *J. Image Video Process.* 2007: 3–3.

Mancas, M., Pirri, F. & Pizzoli, M. (2011). From saliency to eye gaze: embodied visual selection for a pan-tilt-based robotic head, *Proc. of the 7th Inter. Symp. on Visual Computing (ISVC)*, Las Vegas, USA.

Mancas, M., Riche, N., Leroy, J. & Gosselin, B. (2011). Abnormal motion selection in crowds using bottom-up saliency, *IEEE ICIP*.

Najemnik, J. & Geisler, W. (2005). Optimal eye movement strategies in visual search, *Nature*, pp. 387–391.

Navalpakkam, V. & Itti, L. (2005). Modeling the influence of task on attention, *Vision Research* 45(2): 205–231.

Ninassi, A., Le Meur, O., Le Callet, P. & Barba, D. (2007). Does where you gaze on an image affect your perception of quality? Applying visual attention to image quality metric, *IEEE Inter. Conf. on Image Processing (ICIP)*, Vol. 2, pp. 169–172.

Oliva, A., Torralba, A., Castelhano, M. & Henderson, J. (2003). Top-down control of visual attention in object detection, *IEEE Inter. Conf. on Image Processing (ICIP)*, Vol. 1, pp. I-253–6.

Privitera, C. M. & Stark, L. W. (2000). Algorithms for defining visual regions-of-interest: Comparison with eye fixations, *IEEE Trans. Pattern Anal. Mach. Intell.* 22(9): 970–982.

Rahman, A., Houzet, D., Pellerin, D., Marat, S. & Guyader, N. (2011). Parallel implementation of a spatio-temporal visual saliency model, *Journal of Real-Time Image Processing* 6: 3–14.

Ren, T., Liu, Y. & Wu, G. (2009). Image retargeting using multi-map constrained region warping, *ACM Multimedia*, pp. 853–856.

Ren, T., Liu, Y. & Wu, G. (2010). Rapid image retargeting based on curve-edge grid representation, *IEEE Inter. Conf. on Image Processing (ICIP)*, pp. 869–872.

Richardson, I. E. (2003). *H.264 and MPEG-4 Video Compression: Video Coding for Next Generation Multimedia*, 1 edn, Wiley.

Riche, N., Mancas, M., Gosselin, B. & Dutoit, T. (2011). 3D saliency for abnormal motion selection: the role of the depth map, *Proceedings of the ICVS 2011*, Lecture Notes in Computer Science, Springer Berlin / Heidelberg.

Rubinstein, M., Shamir, A. & Avidan, S. (2008). Improved seam carving for video retargeting, *ACM Transactions on Graphics (SIGGRAPH)* 27(3): 1–9.

Ruesch, J., Lopes, M., Bernardino, A., Hornstein, J., Santos-Victor, J. & Pfeifer, R. (2008). Multimodal saliency-based bottom-up attention: a framework for the humanoid robot iCub, *IEEE Int. Conf. on Robotics and Automation*, p. 6.

Santella, A., Agrawala, M., Decarlo, D., Salesin, D. & Cohen, M. (2006). Gaze-based interaction for semi-automatic photo cropping, *In CHI 2006*, ACM, pp. 771–780.


Itti, L. (2004). Automatic foveation for video compression using a neurobiological model of visual attention, *IEEE Transactions on Image Processing* 13(10): 1304–1318.

Itti, L. & Baldi, P. F. (2006). Modeling what attracts human gaze over dynamic natural scenes, *in* L. Harris & M. Jenkin (eds), *Computational Vision in Neural and Machine Systems*, Cambridge University Press, Cambridge, MA.

Itti, L. & Koch, C. (2001). Computational modelling of visual attention, *Nature Reviews Neuroscience* 2(3): 194–203.

Itti, L., Koch, C. & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis, *IEEE Transactions on Pattern Analysis and Machine Intelligence* 20(11): 1254–1259.

Itti, L., Rees, G. & Tsotsos, J. (2005). *Neurobiology of Attention*, Elsevier Academic Press.

Jouneau, E. & Carincotte, C. (2011). Particle-based tracking model for automatic anomaly detection, *IEEE Int. Conference on Image Processing (ICIP)*.

Judd, T., Ehinger, K., Durand, F. & Torralba, A. (2009). Learning to predict where humans look, *IEEE Inter. Conf. on Computer Vision (ICCV)*, pp. 2376–2383.

Kayser, C., Petkov, C., Lippert, M. & Logothetis, N. K. (2005). Mechanisms for allocating auditory attention: An auditory saliency map, *Curr. Biol.* 15: 1943–1947.

Koch, C. & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry, *Hum Neurobiol* 4(4): 219–227.

Kortum, P. & Geisler, W. (1996). Implementation of a foveated image coding system for image bandwidth reduction, *In Human Vision and Electronic Imaging, SPIE Proceedings*, pp. 350–360.

Le Meur, O. & Le Callet, P. (2009). What we see is most likely to be what matters: visual attention and applications, *Proceedings of the 16th IEEE International Conference on Image Processing*, ICIP'09, IEEE Press, Piscataway, NJ, USA, pp. 3049–3052.

Le Meur, O., Le Callet, P. & Barba, D. (2007a). Construction d'images miniatures avec recadrage automatique basé sur un modèle perceptuel bio-inspiré, *Traitement du signal*, Vol. 24(5), pp. 323–335.

Le Meur, O., Le Callet, P. & Barba, D. (2007b). Predicting visual fixations on video based on low-level visual features, *Vision Research* 47: 2483–2498.

Le Meur, O., Le Callet, P., Barba, D. & Thoreau, D. (2006). A coherent computational approach to model bottom-up visual attention, *Pattern Analysis and Machine Intelligence, IEEE Transactions on* 28(5): 802–817.

Lee, J., De Simone, F. & Ebrahimi, T. (2010). Efficient video coding based on audio-visual focus of attention, *Journal of Visual Communication and Image Representation* 22(8): 704–711.

Legge, Hooven, Klitz, Mansfield & Tjan (2002). Mr.chips 2002: new insights from an ideal-observer model of reading, *Vision Research* pp. 2219–2234.

Li, J., Tian, Y., Huang, T. & Gao, W. (2009). A dataset and evaluation methodology for visual saliency in video, *Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on*, pp. 442–445.

Li, Z., Qin, S. & Itti, L. (2011). Visual attention guided bit allocation in video compression, *Image and Vision Computing* 29(1): 1–14.

Liu, F. & Gleicher, M. (2005). Automatic image retargeting with fisheye-view warping, *Proceedings of User Interface Software Technologies (UIST)*.

Liu, H., Jiang, S., Huang, Q., Xu, C. & Gao, W. (2007). Region-based visual attention analysis with its application in image browsing on small displays, *ACM Multimedia*, pp. 305–308.

Lorente, J. D. S. (ed.) (2011). *Recent Advances on Video Coding*, InTech.


**7** 

**Video Quality Assessment** 

Juan Pedro López Velasco
*Universidad Politécnica de Madrid, Spain* 

**1. Introduction** 

One of the main aspects which affects video compression, and which needs to be deeply analyzed, is quality assessment. The chain of transmission of video over a given distribution channel, such as broadcast or a digital means of storage, is limited in capacity and requires a compression process, with a consequent degradation and the appearance of artifacts which must be evaluated in order to offer a suitable and appropriate quality to the final user.

Technology has evolved quickly, especially in television and multimedia services, which have moved from analog to digital. The constant increase in resolution from standard television to high definition and ultra-high definition, as well as the creation of advanced content-production systems such as 3-dimensional video, make new quality studies necessary in order to evaluate the video characteristics and provide the observer with the best viewing experience that could be expected.

Once the change from analog to digital television had been completed, the next step was encoding the video in order to obtain high compression without damaging the quality perceived by the observer. In analog television the quality systems were well established and controlled, but digital television requires new metrics and procedures for measuring video quality.

Quality assessment must be adapted to the human visual system, which is why researchers have performed subjective viewing experiments in order to obtain the encoding conditions of video systems that provide the best quality to the user.

Video encoding has gone through a process of standardization: the MPEG group of experts developed techniques that assure a level of quality which improved with the evolution of the standards. MPEG-2 offered reasonably good quality, but the evolution of the standards produced a successor twice as efficient as MPEG-2, called AVC/H.264; that is, to obtain a quality similar to that of the first standard, only half the bitrate was necessary in the new standard.

Quality assessment has also been forced to evolve in parallel with these technologies. The concept is no longer limited to the perceived quality of the video: other factors have been added to it, giving rise to a new term, Quality of Experience (QoE), which is becoming more popular because it is a more complete definition; the user is not only observing the video but living a real experience, which depends on the content and the expectations placed on it.
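As a minimal, concrete illustration of the objective side of these measurement procedures, the sketch below computes the Peak Signal-to-Noise Ratio (PSNR), the most common signal-based quality metric used in video coding. The flat "frames" and the helper function `psnr` are toy choices of our own, not part of any standard:

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """Peak Signal-to-Noise Ratio (dB) between two equal-sized frames."""
    diffs = [(r - d) ** 2 for r, d in zip(reference, distorted)]
    mse = sum(diffs) / len(diffs)      # mean squared error
    if mse == 0:
        return float("inf")            # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy example: a flat 8-bit luminance "frame" and a compressed version
# that shifts every pixel by 5 grey levels.
frame = [128] * 256
coded = [value + 5 for value in frame]
print(round(psnr(frame, coded), 2))  # → 34.15
```

Higher PSNR means the decoded frame is closer to the reference. As the paragraphs above argue, however, such purely signal-based measures do not fully capture the quality perceived by the observer, let alone QoE, which is precisely why subjective experiments and perceptual metrics are needed.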

