**4.1 Experimental setup**

**Datasets.** Saliency maps were computed for images from two distinct eye-tracking datasets: 120 real scenes (Toronto) [19] and 230 synthetic images with specific feature contrasts (SID4VAM) [45] (see **Table 1**). Saliency maps for these datasets were computed with our approach, with supervised deep models that learn high-level features (DeepGazeII, ML-Net (multi-level net), SAM (saliency attentive model), and SalGAN), and with biologically inspired models (IKN (Itti, Koch, and Niebur) [16], AIM (saliency based on information maximization) [19], SDLF (saliency detection by using local features) [20], and GBVS (graph-based visual saliency) [13]).

**Network architectures.** We evaluate our approach using two network architectures, AlexNet [42] and ResNet-152 [41], both modified to meet our requirements. In both cases, the weights were pretrained on ImageNet and then fine-tuned on each of the datasets mentioned above. The networks were trained for 70 epochs with a learning rate of 0.0001 and a weight decay of 0.005. The top classification layer was initialized from scratch using the Xavier method [46]. The SSP consists of four convolutional layers for AlexNet and four residual blocks for ResNet-152.
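To make the fine-tuning recipe concrete, the following is a minimal PyTorch sketch of the stated configuration (ImageNet-pretrained backbone, 70 epochs, learning rate 0.0001, weight decay 0.005, Xavier-initialized top layer). The optimizer choice, the number of output classes, and the omission of the SSP module are assumptions; the text specifies only the hyperparameters.

```python
import torch.nn as nn
from torch.optim import SGD
from torchvision import models

# ImageNet-pretrained ResNet-152 backbone, as stated in the text.
backbone = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)

# Replace the top classification layer and initialize it from scratch
# with Xavier (Glorot) initialization [46].
num_classes = 2  # hypothetical: depends on the task formulation
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
nn.init.xavier_uniform_(backbone.fc.weight)
nn.init.zeros_(backbone.fc.bias)

# Stated learning rate and weight decay; the optimizer itself is not
# named in the text, so plain SGD is an assumption here.
optimizer = SGD(backbone.parameters(), lr=1e-4, weight_decay=5e-3)

num_epochs = 70  # training length stated in the text
```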

**Comparison.** We compare our proposal with other models trained from fixation data (see **Tables 2** and **3**, row 8). For instance, DeepGazeII sums a center baseline, whereas ML-Net and SAM use learned priors to modulate the network output.
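The distinction between summing a center baseline and modulating the output with a prior can be illustrated with a small NumPy sketch. The fixed Gaussian prior and the strictly additive-versus-multiplicative split are simplifying assumptions (DeepGazeII, for instance, combines its center bias in log-density space, and the priors in ML-Net and SAM are learned rather than fixed).

```python
import numpy as np

def gaussian_center_prior(h, w, sigma=0.25):
    """Isotropic Gaussian centered on the image: an illustrative
    stand-in for a center-bias baseline or learned prior."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def apply_center_bias(saliency, mode="additive"):
    prior = gaussian_center_prior(*saliency.shape)
    if mode == "additive":    # DeepGazeII-style: sum a center baseline
        out = saliency + prior
    else:                     # ML-Net/SAM-style: prior modulates the map
        out = saliency * prior
    return out / out.sum()    # renormalize to a distribution
```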
