**5. Operation of computer vision algorithms**

When applying AI models, specifically computer vision models, to different types of medical images, several distinct tasks can be performed. According to Huo et al., these tasks can be classified into four categories [14]. The first is classification, in which the input is an image and the output is a label. The label can be numerical (e.g., 1, 2) or textual (e.g., cancerous, noncancerous) [14, 15]. The second is detection, which consists of identifying an object in the image by means of a bounding box. This task offers an extra degree of information since, in addition to recognizing the object, it reports its position as coordinates in the input image [14, 16, 17]. The third is segmentation, which provides the highest degree of information about an image. In this task, each pixel receives a label, and the final result is a mask that groups pixels of the same class. This enables the segmentation of precise structures within medical images, such as glomeruli or metastatic zones in pathology slides, or entire organs, such as the bladder in CT images [14, 18–20]. The last task is synthesis, which consists of generating images from noise or from other images. For this, two models work antagonistically: one generates images *de novo* from the available data, while the other tries to discriminate these artificially generated images from real ones. With each iteration of the process, both the generator and the discriminator become more effective, which yields images highly similar to real ones [14, 21]. This task makes it possible, for example, to generate additional training samples to enlarge datasets and thus obtain models with greater generalization power [22, 23].
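The four task types differ mainly in the form of their output: a single label, a set of box coordinates, a per-pixel mask, or a newly generated image. The following minimal NumPy sketch illustrates these output shapes on a synthetic toy image; the thresholds, the 8×8 image size, and the linear "generator" are illustrative assumptions, not components of any real medical imaging model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 8x8 grayscale "image": a dim random background plus a bright
# 2x2 square standing in for a lesion (purely hypothetical values).
image = rng.random((8, 8)) * 0.5
image[2:4, 3:5] = 0.95

# 1. Classification: the whole image maps to one label.
def classify(img):
    # Hypothetical decision rule based on mean intensity.
    return "cancerous" if img.mean() > 0.5 else "noncancerous"

# 2. Detection: the object is localized with a bounding box, i.e. its
#    position is reported as coordinates in the input image.
def detect(img):
    ys, xs = np.nonzero(img > 0.9)  # pixels belonging to the bright object
    return (xs.min(), ys.min(), xs.max(), ys.max())

# 3. Segmentation: every pixel receives a label; the result is a mask
#    with the same shape as the input image.
def segment(img):
    return (img > 0.9).astype(np.uint8)

# 4. Synthesis: a generator maps random noise to a new image (the
#    discriminator that would score it as real vs. generated is omitted).
#    This fixed linear map is only a stand-in for a trained generator.
noise = rng.standard_normal(16)
generator_weights = rng.random((64, 16))
synthetic = (generator_weights @ noise).reshape(8, 8)

print(classify(image))       # one label for the whole image
print(detect(image))         # (x_min, y_min, x_max, y_max)
print(segment(image).shape)  # per-pixel mask, same shape as the input
print(synthetic.shape)       # generated image
```

Note how the information content grows from classification to segmentation: one label per image, then four coordinates, then one label per pixel, matching the ordering described above.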
