**2.2 Visual information and Marr's edges**

The idea of initial, low-level image information processing was first advanced by Marr (Marr, 1978). (Incidentally, Marr was also the first to coin the term "visual information".) According to Marr's theory, image edges are the main bearers of visual information; consequently, image information processing has ever since been occupied with edge processing.

Influenced by the mainstream pixel-based (edge-based) bottom-up philosophy of image exploration, I at first diverged only slightly from common practice. Gradually, however, I began to pave my own way. First, I devised the Single Pixel Information Content measure (Diamant, 2003). Then, experimenting with it, I discovered the Information Content Specific Density Conservation principle (Diamant, 2002). (The reversed order of the publication dates is of no significance: papers are published not when the relevant work is finished, but when one is lucky enough to satisfy the personal views of a tough reviewer.)

The Information Content Specific Density Conservation principle states that when the scale of an image is successively reduced, its Image Specific Information Density remains unchanged. This explains why we usually begin observing a scene with a general, reduced-scale preview and then zoom in on the part of the scene that interests us. It also indicates that we perceive our objects of interest as dimensionless items. Taking these observations into account, the Information Content Specific Density Conservation phenomenon leads us to the conclusion that information itself is a qualitative (not a quantitative) notion, with a distinct narrative flavor.

It is worth mentioning that similar investigations were later performed by MIT researchers (Torralba, 2009), who obtained similar results using reduced-scale (32x32 pixel) images. However, that work consisted only of qualitative experiments with human participants, not of a quantitative study.

It is therefore not surprising that the Information Content Specific Density Conservation principle inevitably led me to the conclusion that image information processing has to be done in a top-down fashion (and not bottom-up, as is usually assumed). Further advances along this path, supported by insights borrowed from Solomonoff's theory of inductive inference (Solomonoff, 1964), Kolmogorov's complexity theory (Kolmogorov, 1965), and Chaitin's algorithmic information theory (Chaitin, 1966), quickly led me to a full-blown theory of Image Information Content discovery and elucidation (Diamant, 2005).

**2.3 Visual information, a first definition**

In the above-mentioned theory of Image Information Content discovery and elucidation, I proposed for the first time a preliminary definition of "what is information". In 2005 it read as follows:

6 Semantics – Advances in Theories and Mathematical Models


First of all, information is a description, a language-based description, which Kolmogorov's complexity theory regards as a program that, when executed, faithfully reproduces the original object. In an image, such objects are the visible data structures of which the image is composed. Thus, a set of reproducible descriptions of image data structures is the information contained in an image.
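As a toy illustration of "a description that, when executed, reproduces the original object", the Python sketch below uses run-length encoding of a single row of pixels as the description language. This is an assumption made only for illustration: Kolmogorov complexity refers to the shortest program on a universal machine and is not computable, so run-length encoding merely yields one reproducible (and generally non-minimal) description. The names `describe` and `execute` are hypothetical.

```python
from typing import List, Tuple

# (pixel value, run length): one instruction of the toy description language
Run = Tuple[int, int]


def describe(row: List[int]) -> List[Run]:
    """Build a run-length description of a row of pixel values."""
    runs: List[Run] = []
    for value in row:
        if runs and runs[-1][0] == value:
            # Same value as the current run: lengthen it.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # Value changed: start a new run.
            runs.append((value, 1))
    return runs


def execute(runs: List[Run]) -> List[int]:
    """'Run' the description: expand every (value, length) pair into pixels."""
    pixels: List[int] = []
    for value, length in runs:
        pixels.extend([value] * length)
    return pixels


row = [0] * 6 + [255] * 3 + [0] * 7
description = describe(row)
print(description)                   # [(0, 6), (255, 3), (0, 7)]
print(execute(description) == row)   # True
```

Three instructions suffice to reproduce all sixteen pixels, which is the sense in which a structured object admits a description shorter than the object itself.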

Fig. 1. The block-diagram of image contained information elucidation.

Kolmogorov's theory prescribes the way in which such descriptions must be created. First, the most simplified and generalized structure must be described. (Recall Occam's razor: among all hypotheses consistent with the observations, choose the simplest one.) Then, as the level of generalization is gradually decreased, more and more fine-grained image details (structures) are revealed and described. This is the second important point, which follows from purely mathematical considerations of the theory: image information is a hierarchy of descriptions at decreasing levels of generalization, which unfolds in a coarse-to-fine, top-down manner. (Attention, please! No bottom-up processing is mentioned here! There is no low-level feature gathering and no feature binding! The only proper way to elicit image information is top-down, coarse-to-fine image processing!)

The third prominent point, which follows immediately from the two just mentioned, is that top-down image information elicitation does not require the incorporation of any high-level knowledge for its successful accomplishment. It is totally free of any high-level guiding rules and inspirations.
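The coarse-to-fine principle (the second point above) can be made concrete with the Python sketch below. It uses a quadtree decomposition, a standard coarse-to-fine technique swapped in here purely for illustration; it is not the author's algorithm, which, as described for Fig. 1, relies on an averaging pyramid and segmentation. Level 0 describes the whole image by its mean intensity (the most generalized structure); each subsequent level describes only the quadrants of blocks that are not yet uniform, so finer details appear progressively. The function names and the `tol` uniformity threshold are hypothetical, and the image is assumed to be square with a power-of-two side.

```python
from typing import List, Tuple

Image = List[List[float]]
Block = Tuple[int, int, int, float]  # (top row, left column, side length, mean intensity)


def block_mean(img: Image, r: int, c: int, s: int) -> float:
    """Average intensity of the s x s block whose top-left corner is (r, c)."""
    return sum(img[i][j] for i in range(r, r + s) for j in range(c, c + s)) / (s * s)


def is_uniform(img: Image, r: int, c: int, s: int, tol: float) -> bool:
    """True if every pixel of the block lies within tol of the block mean."""
    m = block_mean(img, r, c, s)
    return all(abs(img[i][j] - m) <= tol for i in range(r, r + s) for j in range(c, c + s))


def coarse_to_fine(img: Image, tol: float = 0.0) -> List[List[Block]]:
    """Describe the image level by level, most generalized level first.

    Level 0 is the whole image summarized by its mean; each following level
    describes only the quadrants of blocks that were not uniform enough.
    """
    levels: List[List[Block]] = []
    pending = [(0, 0, len(img))]
    while pending:
        level: List[Block] = []
        next_pending = []
        for r, c, s in pending:
            level.append((r, c, s, block_mean(img, r, c, s)))
            if s > 1 and not is_uniform(img, r, c, s, tol):
                h = s // 2  # split into four quadrants: finer detail at the next level
                next_pending += [(r, c, h), (r, c + h, h), (r + h, c, h), (r + h, c + h, h)]
        levels.append(level)
        pending = next_pending
    return levels


def reconstruct(levels: List[List[Block]], n: int) -> Image:
    """Execute the descriptions coarse-to-fine; finer blocks overwrite coarser ones."""
    out = [[0.0] * n for _ in range(n)]
    for level in levels:
        for r, c, s, m in level:
            for i in range(r, r + s):
                for j in range(c, c + s):
                    out[i][j] = m
    return out


img = [[0, 0, 10, 10],
       [0, 0, 10, 10],
       [0, 0, 5, 5],
       [0, 0, 5, 5]]
levels = coarse_to_fine(img)
print(levels[0])                       # [(0, 0, 4, 3.75)]
print(reconstruct(levels, 4) == img)   # True
```

With `tol=0`, executing all levels reproduces the image exactly, while executing only the first level yields the coarsest approximation (every pixel set to 3.75). Note that no high-level knowledge enters the procedure: only pixel intensities are used.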

Following the principles given above, a practical algorithm for image information content discovery has been proposed and put to work. Its block diagram is provided in Fig. 1.

As can be seen in Fig. 1, the proposed block diagram comprises three main parts: the bottom-up processing path, the top-down processing path, and a stack in which the discovered information content (that is, the generated descriptions of it) is accumulated.

As follows from the schema, the input image is first squeezed to a small size of approximately 100 pixels. The rules of this shrinking operation are very simple and fast: four non-overlapping neighboring pixels in an image at level L are averaged, and the result is assigned to a pixel in the higher (L+1)-level image. Then, at the top of the shrinking pyramid, the image is segmented and each segmented region is labeled. Since the image size at the top is significantly reduced, and since the bottom-up squeezing has already averaged the data severely, the segmentation/labeling procedure does not demand special computational resources. Any well-known segmentation method will suffice. We use our own proprietary technique, based on a low-level (single pixel) information content evaluation (Diamant, 2003), but this is not obligatory.

From this point on, the top-down processing path begins. At each level, the two previously defined maps (the average region intensity map and the associated label map) are expanded to the size of the image at the next lower level. Since regions at different hierarchical levels do not exhibit significant changes in their characteristic intensity, the majority of the newly assigned pixels are determined sufficiently correctly. Only pixels at region borders and seeds of newly emerging regions may deviate significantly from the assigned values. Taking the corresponding current-level image as a reference (the left-side, unsegmented image), these pixels can be easily detected and subjected to a refinement cycle. The process is repeated in this manner at all descending levels until the segmentation/classification of the original input image is accomplished.

At every processing level, every image object-region (whether just recovered or inherited) is registered in the objects' appearance list, which is the third constituent part of the proposed scheme. The registered object parameters are the available simplified object attributes, such as size, center-of-mass position, average object intensity, and the hierarchical and topological relationships within and between objects ("sub-part of…", "at the left of…", etc.). They are sparse and general, yet specific enough to capture an object's characteristic features in a variety of descriptive forms.

In this way, a set of pixel clusters (segments: structures formed by nearby pixels with similar properties) is elucidated and described, providing an explicit representation of the information contained in a given image. This means that, given the relevant segment description, we can reconstruct the segment faithfully and rigorously, because (by definition) every such description contains all the information needed for the successful reconstruction of the item (or of the whole set of items, that is, the entire image).

**2.4 Visual information = Physical information**

One interesting thing has already been mentioned above: top-down, coarse-to-fine image information elucidation does not require the incorporation of any high-level knowledge for its successful accomplishment. It is totally free of any high-level guiding rules and inspirations (which is in striking contrast with the classic image information processing theories). It deals only with the natural (physical) structures usually discernible in an image, which originate from natural aggregations of similar nearby data elements (e.g., pixels in the case of an image). That is why I decided to call it "Physical Information".

To summarize what we have learned so far, we can say:

- **Physical Information is a description of data structures** usually discernible in a data set (e.g., pixel clusters or segments in an image).
- **Physical Information is a language-based description**, according to which a reliable reconstruction of the original objects (e.g., image segments) can be attained when the description is carried out (like the execution of a computer program).
- **Physical Information is a descending hierarchy** of descriptions standing for various complexity levels, a top-down, coarse-to-fine evolving structure that represents different levels of information details.
- **Physical Information is** the only information that can be extracted from a raw data set (e.g., an image). Later it can be submitted to further suitable image processing, but at this stage it is the only information available.

To my own great surprise, solving the problem of physical (visual) information elucidation did not advance me even slightly toward my primary goal of image recognition, understanding and interpretation, which remained as elusive and unattainable as ever.

Let Us First Agree on what the Term "Semantics" Means: An Unorthodox Approach to an Age-Old Debate 9

**3. What is semantics?**

It is clear that physical information does not exhaust the whole of the visual information that we usually expect to reveal in an image. On the other hand, it is perfectly clear that, relying on our approach, the only information that can be extracted from an image is physical information, and nothing else. It immediately follows that the other part of visual information, the high-level knowledge that makes the grouping of disjoint image segments meaningful, is not an integral part of the image information content (as is traditionally assumed). It can no longer be regarded as a natural property of an image; it has to be regarded as a property of the human observer who watches and scrutinizes the image.

This way I came to the conclusion that the notion of visual information must be divided into two constituent parts: physical information and semantic information.

The first is contained in an image, while the second is contained in the observer's head. The first can be extracted from an image, while the second (and that is an eternal problem) cannot be studied by opening a human's head to verify its existence or explore its peculiarities. But if we are right in our guess that semantics is information, then we have some general principles, some insights, which can be drawn from physical information studies and applied to our semantics investigations.

**3.1 Semantics – the physical information's twin**

In such a case, all previously defined aspects of the notion of information must also hold for semantic information. That is, we can say: semantics is a language-based description of the structures that are observable in a given image. While physical
