*4.2.2 Second experiment: Qualitative results*

These saliency prediction results show that our model achieves robust metric scores on both real and synthetic images. Again, we would like to stress that our model is not trained on fixation prediction datasets (**Figure 3**). Our model with subitizing supervision performs best at detecting pop-out effects (from visual attention theories [16]) while performing similarly on real image datasets (**Figure 4**). Some deep saliency models use additional mechanisms, such as smoothing or thresholding, to boost (and/or train for) better saliency metric scores (see **Figure 4**, row 4). It should also be considered that some of these models are already fine-tuned for synthetic images (e.g., SAM-ResNet [4]). *Our approach* (which has not been trained on this type of dataset) has been shown to be robust in these two distinct scenarios/domains.
