**4.1.2 Saliency-map-based methods**

Figure 15 illustrates the process of saliency-based retargeting. From the original image (left), a saliency map is computed (middle). An area of higher intensity is then extracted from this map, and the bounding box of that area defines the zoom.

Fig. 15. Example of retargeting: left, the original picture; middle, the saliency map; right, the reframed picture. (adapted from Le Meur & Le Callet (2009))

A technique to automatically determine the "right" viewing area for spatio-temporal images is proposed in Deselaers et al. (2008). Images are first analyzed to determine relevant regions using three cues: the visual saliency of spatial images, optical flow for movement, and the appearance of the image. A log-linear algorithm then computes the relevance of every position in the image to determine a sequence of cropping positions with the correct aspect ratio for the display device.

Suh et al. (2003) use the Itti & Koch (2001) algorithm to compute the saliency map, which serves as a basis for automatically delineating a rectangular cropping window. A fast greedy algorithm was developed to optimize the window, which has to capture most of the saliency while remaining sufficiently small.
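The trade-off between covering most of the saliency and keeping the window small can be illustrated with a minimal sketch. This is not the algorithm of Suh et al. (2003): the function name `greedy_crop`, the `fraction` stopping criterion and the one-row-or-column growth rule are assumptions chosen for illustration only.

```python
import numpy as np

def greedy_crop(saliency, fraction=0.8):
    """Grow a window from the saliency peak until it holds `fraction`
    of the total saliency (hypothetical stopping rule).

    Returns inclusive bounds (top, left, bottom, right).
    """
    sal = np.asarray(saliency, dtype=float)
    h, w = sal.shape
    target = fraction * sal.sum()

    # Start from the single most salient pixel.
    r, c = np.unravel_index(np.argmax(sal), sal.shape)
    top, bottom, left, right = int(r), int(r), int(c), int(c)
    covered = sal[top, left]

    while covered < target:
        # Candidate strips: one extra row or column on each side.
        strips = {}
        if top > 0:
            strips["top"] = sal[top - 1, left:right + 1]
        if bottom < h - 1:
            strips["bottom"] = sal[bottom + 1, left:right + 1]
        if left > 0:
            strips["left"] = sal[top:bottom + 1, left - 1]
        if right < w - 1:
            strips["right"] = sal[top:bottom + 1, right + 1]
        if not strips:
            break  # the window already covers the whole image

        # Greedy choice: the strip with the highest mean saliency,
        # i.e. the most saliency gained per added pixel.
        side = max(strips, key=lambda s: strips[s].mean())
        covered += strips[side].sum()
        if side == "top":
            top -= 1
        elif side == "bottom":
            bottom += 1
        elif side == "left":
            left -= 1
        else:
            right += 1

    return top, left, bottom, right
```

Choosing the strip by mean rather than total saliency favors compact windows, since a long strip with little saliency per pixel is not preferred over a short, highly salient one. Other criteria would be equally plausible.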

The previous methods show that perceptual zoom not only compresses images but also allows better recognition during visual search.

The Self-Adaptive Image Cropping for Small Displays method (Ciocca et al. (2007)) is based on the Itti and Koch bottom-up attention algorithm but also on top-down considerations such as face detection, skin color, ... According to a given threshold, each region is either kept or eliminated.

The RSVP method (Rapid Serial Visual Presentation, de Bruijn & Spence (2000)) for images can also be adapted to display and browse regions of interest sequentially, each for a short time (Fan et al. (2003)). Here too, bottom-up saliency is computed with Itti & Koch (2001), while top-down information is added through text and face detection. The most relevant regions of interest are sent to mobile phones as key images.

Liu et al. (2007) start by segmenting the image into several regions, for which saliency is calculated to provide a global saliency map. The regions are classified according to their attractiveness, which makes it possible to present image regions on small screens and to browse large images.

A completely automatic solution to create thumbnails according to the saliency distribution or the cover rate is presented by Le Meur et al. (2007a). The size of the thumbnail can be fixed and centered on the global maximum of the saliency map, or adapted to certain parameters such as the saliency distribution. The gaze fixations predicted by a Winner-Take-All algorithm can thus be used, and the search for the thumbnail location ends when a given percentage of the total image saliency is reached. A subset of the corner coordinates of the squares containing the predicted fixations, each centered on a local saliency maximum, is determined. The coordinates of the upper-left and lower-right corners of the final zoom thumbnail are set to include a square area centered on the relevant local maximum.

#### **4.2 Spatio-temporal resolution decrease for uninteresting regions: anisotropic resolution**

Perceptual zoom does not always preserve the image structure. For example, Figure 14 shows that the smallest zoom on the left image only comprises part of the castle, which is likely to attract attention. In this case the zoom loses the structure and context of the original image. To keep the image structure when retargeting, two main methods are described in this section: warping and seam carving. These methods may cause non-linear visual distortions in several regions of the image (Zhou et al. (2003)).

#### **4.2.1 Warping**

Warping is an operation that maps a position in a source image to a position in a target image by a spatial transformation. This transformation can be a simple scaling (Liu & Gleicher (2005)). Another approach is to place a grid mesh onto the image and then compute a new geometry for this mesh (Figure 16), such that the boundaries fit the desired image size and the quad faces covering important image regions remain intact at the expense of larger distortion in the other quads (Wang et al. (2008)).

Automatic image retargeting with fisheye-view warping (Liu & Gleicher (2005)) uses an "importance map" that combines saliency and object information to automatically find, with a greedy algorithm, a minimal rectangular region of interest. A non-linear function is then used for warping to ensure that the distortion in the region of interest is smaller than elsewhere in the image.

Non-homogeneous content-driven video retargeting (Wolf et al. (2007)) is a real-time retargeting algorithm for video. Spatial saliency, face detection and motion detection are computed to provide a saliency matrix. An optimized mapping is computed with a sparse linear system of equations which takes into account constraints such as importance modeling, boundary substitutions, and spatial and temporal continuity.

Ren et al. (2009) introduce a retargeting method based on global energy optimization. Some content-aware methods only preserve high-energy pixels and thus achieve only local optimization. Ren et al. instead calculate an energy map which depends on static saliency and face detection. The optimal new size of each pixel is computed by linear programming.

The same group proposes a retargeting approach that combines uniform sampling with a structure-aware image representation (Ren et al. (2010)). The image is decomposed with a curve-edge grid, which is determined by using a carving graph such that each image pixel corresponds to a vertex in the graph. A weight is assigned to each vertex connection (in the vertical direction only), which depends on an energy map built from salient regions and face detection. The paths with high connection weight sums in the graph are selected and the target image is generated by uniformly sampling the pixels within the grids.

Fig. 16. The original image (left) is deformed by a grid mesh structure to fit the required size (right). The scaling and stretching depend on the gradient and saliency map. Source: http://graphics.csie.ncku.edu.tw/Image\_Resizing/

Wang et al. (2008) present a warping method which uses a grid mesh of quads to retarget images (Figure 16). The method determines an optimal scaling factor for regions with high content importance as well as for regions with homogeneous content, which will be distorted. A significance map is computed as the product of the gradient and the saliency measure, which characterizes the visual attractiveness of each pixel. The regions are deformed according to the significance map. A global optimization process is applied iteratively to minimize the quad deformation and grid bending.

#### **4.2.2 Seam carving**

Seam carving (Avidan & Shamir (2007)) retargets the image using an energy function which defines the importance of each pixel. The most classical energy function is the gradient map, but other functions can be used, such as entropy, histograms of oriented gradients, or saliency maps (Vaquero et al. (2010)). Low-energy pixels are connected to form a seam path. Seam paths cross the image vertically or horizontally and are removed. Dynamic programming is used to calculate the optimal seams. The image is readjusted by shifting pixels to compensate for the removed seams. The process is repeated as often as required to reach the expected size.

Figure 17 shows an example of seam carving: the original images (A and B) are reduced by discarding either vertical or horizontal seams. On the top row, the classical gradient is used as the energy map, while the saliency maps of Wonjun et al. (2011) are used for the bottom row. Depending on the energy map which is used, distance, shape and aspect-ratio distortions can cause anisotropic stretching (Chamaret et al. (2010)). Even if saliency maps
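The seam-carving procedure of Avidan & Shamir (2007) described in this section (energy map, dynamic programming, seam removal, repetition) can be sketched as follows for vertical seams on a grayscale image. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, the default energy is a simple gradient magnitude, and the `energy_fn` parameter is an assumed hook through which a saliency map could be supplied instead.

```python
import numpy as np

def gradient_energy(gray):
    """Gradient-magnitude energy map (L1 norm). Needs at least 2x2 pixels."""
    gy, gx = np.gradient(np.asarray(gray, dtype=float))
    return np.abs(gx) + np.abs(gy)

def find_vertical_seam(energy):
    """Return, for each row, the column of the minimum-energy vertical seam."""
    energy = np.asarray(energy, dtype=float)
    h, w = energy.shape
    cost = energy.copy()
    # Forward pass: cost[i, j] = energy[i, j] + cheapest of the three
    # neighbours (upper-left, upper, upper-right) in the previous row.
    for i in range(1, h):
        prev = cost[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        cost[i] += np.minimum(np.minimum(left, prev), right)
    # Backward pass: trace the seam up from the cheapest bottom pixel.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

def remove_vertical_seam(image, seam):
    """Delete one pixel per row; pixels to the right shift left by one."""
    image = np.asarray(image)
    h, w = image.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return image[keep].reshape(h, w - 1, *image.shape[2:])

def carve_width(gray, new_width, energy_fn=gradient_energy):
    """Repeatedly remove the cheapest vertical seam until the width matches."""
    out = np.asarray(gray)
    while out.shape[1] > new_width:
        seam = find_vertical_seam(energy_fn(out))
        out = remove_vertical_seam(out, seam)
    return out
```

Horizontal seams can be handled with the same functions by transposing the image before and after carving. Recomputing the energy map after every removal keeps the sketch short but is not the only option.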
