**2.1 Visual information, the first steps**

My personal involvement with information/semantics issues began sometime in the mid-1980s. At that time I was busy with the design and development of home security and surveillance systems. As is well known, such systems rely heavily on visual information acquisition and processing. However, what is visual information? Nobody knew then, and nobody knows today. But that has never restrained anybody from trying again and again to meet the challenge.

Deprived of a suitable understanding of what visual information is, computer vision designers have always tried to find their inspiration in biological vision analogs, especially human vision analogs. Although the underlying fundamentals and operational principles of human vision were obscure and vague, research in this field was always far more mature and advanced. Therefore, the computer vision community has always considered human vision conjectures the best choice to follow.

A theory of human visual information processing was established about thirty years ago by the seminal works of David Marr (Marr, 1982), Anne Treisman (Treisman & Gelade, 1980), Irving Biederman (Biederman, 1987) and a large group of their followers. Since then it has become a classical theory, which today dominates all further developments in both human and computer vision. The theory considers human visual information processing as an interplay of two inversely directed processing streams. One is an unsupervised, bottom-up directed process of discovering and localizing the initial pieces of image information (the so-called low-level image processing). The other is a supervised, top-down directed process, which conveys the rules and the knowledge that guide the linking and binding of these disjoint information pieces into perceptually meaningful image objects (the so-called high-level or cognitive image processing).

While the idea of low-level processing was obvious and intuitively appealing from the very beginning (which is why, even today, the mainstream of image processing is occupied mainly with low-level pixel-oriented computations), the essence of high-level processing was always obscure, mysterious, and undefined. The classical paradigm said nothing about where high-level knowledge originates or about the way it has to be incorporated into the introductory low-level processing. Until now, the problem has usually been bypassed by capitalizing on expert domain knowledge, adapted to each and every application case. Therefore, it is not surprising that the whole realm of image processing has been fragmented and segmented according to the high-level knowledge competence of the respective domain experts.

First of all, information is a description, a certain language-based description, which Kolmogorov's Complexity theory regards as a program that, when executed, faithfully reproduces the original object. In an image, such objects are the visible data structures of which the image is composed. So, a set of reproducible descriptions of image data structures is the information contained in an image.

Kolmogorov's theory prescribes the way in which such descriptions must be created: at first, the most simplified and generalized structure must be described. (Recall the Occam's Razor principle: among all hypotheses consistent with the observation, choose the simplest one that is coherent with the data.) Then, as the level of generalization is gradually decreased, more and more fine-grained image details (structures) are revealed and depicted. This is the second important point, which follows from the theory's purely mathematical considerations: image information is a hierarchy of decreasing-level descriptions of information details, which unfolds in a coarse-to-fine, top-down manner. (Attention, please! No bottom-up processing is mentioned here! There is no low-level feature gathering and no feature binding!!! The only proper way of image information elicitation is a top-down coarse-to-fine way of image processing!)

The third prominent point, which immediately pops up from the two just mentioned, is that the top-down manner of image information elicitation does not require the incorporation of any high-level knowledge for its successful accomplishment. It is totally free from any high-level guiding rules and inspirations.

[Figure: block diagram with three columns labeled "Bottom-up path", "Top-down path" and "Object list". Levels run from Level 0 (original image) through Level 1 and Level n-1 to the last (top) level. Bottom-up path labels: "4 to 1 compressed image". Top-level labels: "Segmentation", "Classification", "Object shapes", "Labeled objects", "Top level object descriptors". Top-down path labels: "1 to 4 expanded object maps". Object list labels: "Level n-1 objects", "Level 1 objects", "Level 0".]

Fig. 1. The block-diagram of image contained information elucidation.
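To make the coarse-to-fine idea more tangible, the two operations named in Fig. 1 ("4 to 1 compressed" and "1 to 4 expanded") can be sketched in a few lines of Python. This is only an illustrative sketch under my own assumptions, not the procedure behind Fig. 1: I assume that compression means averaging each 2x2 block, that expansion means copying each label into a 2x2 block, and that the image sides are divisible by 2 at every level. The functions `segment_top_level` and `refine` are hypothetical placeholders (a mean-intensity split and a nearest-mean reassignment), chosen only because they need no domain knowledge.

```python
from typing import Dict, List

Image = List[List[float]]
LabelMap = List[List[int]]


def compress_4_to_1(img: Image) -> Image:
    """Halve each dimension by averaging every 2x2 block (assumed meaning of '4 to 1')."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def expand_1_to_4(labels: LabelMap) -> LabelMap:
    """Double each dimension by copying every label into a 2x2 block (assumed '1 to 4')."""
    out: LabelMap = []
    for row in labels:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def segment_top_level(img: Image) -> LabelMap:
    """Hypothetical placeholder: label pixels 1 if brighter than the mean, else 0."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    return [[1 if v > mean else 0 for v in row] for row in img]


def refine(labels: LabelMap, img: Image) -> LabelMap:
    """Hypothetical placeholder: move each pixel to the label with the closest mean intensity."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for label_row, img_row in zip(labels, img):
        for lab, v in zip(label_row, img_row):
            sums[lab] = sums.get(lab, 0.0) + v
            counts[lab] = counts.get(lab, 0) + 1
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    return [
        [min(means, key=lambda lab: abs(means[lab] - v)) for v in img_row]
        for img_row in img
    ]


def describe_coarse_to_fine(img: Image, levels: int) -> List[LabelMap]:
    """Return object maps ordered from the top (coarsest) level down to Level 0."""
    # Build the stack of successively compressed images.
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(compress_4_to_1(pyramid[-1]))
    # Describe the most generalized structure first.
    labels = segment_top_level(pyramid[-1])
    maps = [labels]
    # Unfold the description level by level, adding finer detail each time.
    for level_img in reversed(pyramid[:-1]):
        labels = refine(expand_1_to_4(labels), level_img)
        maps.append(labels)
    return maps
```

Calling `describe_coarse_to_fine(image, levels=1)` on a 4x4 image returns a 2x2 object map followed by a 4x4 one. Note that no application-specific knowledge enters at any step; the only inputs are the image itself and the number of levels.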
