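$$\inf\{\text{AREA LONELY TREES IS } \langle\text{state}\rangle \text{ AT 12:30},\ \text{AREA } \langle\text{name of area}\rangle \text{ IS SMOKED AT 12:} \langle\text{minutes}\rangle\} = \text{AREA LONELY TREES IS SMOKED AT 12:30}. \;\blacksquare$$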

From now on we shall interpret $\overset{*}{\underset{G_t}{\Rightarrow}}$ as the relation of mutual informativity of incomplete facts. Under this interpretation, $x \overset{*}{\underset{G_t}{\Rightarrow}} x'$ means that N-fact $x'$ is no less informative than N-fact $x$ (if $x \overset{+}{\underset{G_t}{\Rightarrow}} x'$, then $x'$ is more informative and is called a concretization of $x$). (This interpretation fits naturally with the basic postulates of A. Kolmogorov's algorithmic theory of information, i.e., the mutual complexity of constructive objects [50].)

"Set of Strings" Framework for Big Data Modeling DOI: http://dx.doi.org/10.5772/intechopen.85602

Figure 1. Graphical illustration of the interconnection between $\sup\{x, y\}$ and $\inf\{x, y\}$.

The interconnections between $\sup\{x, y\}$ and $\inf\{x, y\}$ are illustrated in Figure 1. As seen, $\sup\{x, y\} \overset{*}{\Rightarrow} x$, $\sup\{x, y\} \overset{*}{\Rightarrow} y$, $x \overset{*}{\Rightarrow} \inf\{x, y\}$, $y \overset{*}{\Rightarrow} \inf\{x, y\}$, $\inf\{x, y\} \overset{*}{\Rightarrow} w$, and $\inf\{x, y\} \overset{*}{\Rightarrow} w'$, where $w \in L(G_t)$ and $w' \in L(G_t)$.
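The relations between sup and inf can be tried out on a deliberately simplified model. The sketch below assumes that every N-fact has the fixed shape (area, state, hour, minutes) and that `None` marks a nonterminal that has not been expanded yet; this slot model and the names `Fact`, `derives`, `inf_fact`, and `sup_fact` are illustrative assumptions, not part of the chapter's general CF-grammar formalism.

```python
from typing import Optional, Tuple

# Assumed toy model (not the chapter's general CF grammar): an N-fact is a
# tuple of slots (area, state, hour, minutes); None marks a nonterminal
# that has not been expanded yet.
Fact = Tuple[Optional[str], ...]


def derives(x: Fact, x2: Fact) -> bool:
    """x =>* x2 in the toy model: every slot filled in x is unchanged in x2."""
    return len(x) == len(x2) and all(a is None or a == b for a, b in zip(x, x2))


def inf_fact(x: Fact, y: Fact) -> Optional[Fact]:
    """Least informative common concretization, or None if x and y conflict."""
    merged = []
    for a, b in zip(x, y):
        if a is not None and b is not None and a != b:
            return None  # two different terminals in one slot: no inf exists
        merged.append(a if a is not None else b)
    return tuple(merged)


def sup_fact(x: Fact, y: Fact) -> Fact:
    """Most informative common generalization: keep slots where x and y agree."""
    return tuple(a if a == b else None for a, b in zip(x, y))


x = ("LONELY TREES", None, "12", "30")  # AREA LONELY TREES IS <state> AT 12:30
y = (None, "SMOKED", "12", None)        # AREA <name of area> IS SMOKED AT 12:<minutes>
lower = inf_fact(x, y)
upper = sup_fact(x, y)
print(lower)
print(upper)
```

In this model `lower` is the N-fact AREA LONELY TREES IS SMOKED AT 12:30, and `upper` keeps only the hour on which both N-facts agree. For a general CF grammar, deciding derivability and computing the bounds requires a parser rather than a slot-by-slot comparison.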

Let us consider a DBI $X_t = \{x_1, \ldots, x_{m(t)}\} \subseteq SF(G_t)$. We shall call such a DBI nonredundant (NR) if it contains no two N-facts $x$ and $x'$ one of which is more informative than the other. (Obviously, if $x' \overset{*}{\underset{G_t}{\Rightarrow}} x$, then there is no need to store $x'$, because all information contained in $x'$ is also present in $x$. So $x'$ is a redundant N-fact, and a DBI containing such N-facts is also redundant.)
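The nonredundancy condition amounts to a pairwise check. The following minimal sketch assumes a toy model in which an N-fact is a tuple of slots (area, state, hour, minutes) with `None` for an unexpanded nonterminal; the model and the names `derives` and `is_nonredundant` are hypothetical and only stand in for derivability in $G_t$.

```python
from itertools import combinations
from typing import Iterable, Optional, Tuple

Fact = Tuple[Optional[str], ...]  # assumed toy model; None = unexpanded nonterminal


def derives(x: Fact, x2: Fact) -> bool:
    """x =>* x2 in the toy model: every slot filled in x is unchanged in x2."""
    return len(x) == len(x2) and all(a is None or a == b for a, b in zip(x, x2))


def is_nonredundant(dbi: Iterable[Fact]) -> bool:
    """True if no two distinct N-facts of the DBI are comparable by informativity."""
    return not any(
        derives(a, b) or derives(b, a) for a, b in combinations(set(dbi), 2)
    )


general = (None, "SMOKED", "15", "30")              # area not yet known
concrete = ("GREEN VALLEY", "SMOKED", "15", "30")   # concretization of `general`
other = ("LONELY TREES", "NORMAL", "12", "31")      # unrelated N-fact

redundant_case = is_nonredundant({general, concrete})
clean_case = is_nonredundant({concrete, other})
print(redundant_case, clean_case)
```

The first set is redundant because `concrete` carries everything that `general` states; the second set contains no comparable pair.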

Unless stated otherwise, we shall consider only NR DBI below. Accordingly, when defining the M-semantics of an update of an NR DBI, understood as the inclusion of an N-fact $x \in SF(G_t)$, it is reasonable to suppose that the DBI contains the most informative N-facts that the system is able to acquire. In this case the inclusion of N-fact $x$ into DBI $X_t$ may be defined as follows:

$$X_{t+1} = X_t \cup \{x\} - \left\{ y \mid y \in X_t \ \&\ \left( x \overset{*}{\underset{G_t}{\Rightarrow}} y \ \lor\ y \overset{*}{\underset{G_t}{\Rightarrow}} x \right) \right\}. \tag{47}$$

According to this definition, including N-fact $x$ in DBI $X_t$ removes from the DBI all N-facts that are more or less informative than $x$. This logic maintains the nonredundancy of the DBI: all N-facts that are present in $X_t$ and are "compatible" with N-fact $x$ by informativity (i.e., comparable with it) are eliminated from the DBI.

It is reasonable to define the reply to the inclusion of N-fact $x$ as the set of N-facts eliminated from the DBI:

$$A_{t+1} = X_t - X_{t+1}. \tag{48}$$
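Eqs. (47) and (48) can be sketched together as one update step. The code assumes a toy model in which an N-fact is a tuple of slots (area, state, hour, minutes) with `None` for an unexpanded nonterminal; the function names and the sample DBI are hypothetical and do not reproduce the DBI of Example 3. The set difference in Eq. (47) is read here as being applied to $X_t$ before $x$ is added, which is an assumption about operator precedence.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[Optional[str], ...]  # assumed toy model; None = unexpanded nonterminal


def derives(x: Fact, x2: Fact) -> bool:
    """x =>* x2 in the toy model: every slot filled in x is unchanged in x2."""
    return len(x) == len(x2) and all(a is None or a == b for a, b in zip(x, x2))


def include(X_t: Set[Fact], x: Fact) -> Tuple[Set[Fact], Set[Fact]]:
    """Return (X_{t+1}, A_{t+1}) following Eqs. (47) and (48)."""
    comparable = {y for y in X_t if derives(x, y) or derives(y, x)}
    # Assumption: Eq. (47) is read as (X_t - comparable) | {x}, so that x
    # itself always ends up in the new DBI.
    X_next = (X_t - comparable) | {x}
    reply = X_t - X_next  # Eq. (48): the N-facts eliminated by the update
    return X_next, reply


# Hypothetical DBI, chosen only for illustration.
X_t = {
    ("LONELY TREES", "NORMAL", "12", "31"),
    (None, "SMOKED", "15", "30"),
}
x = ("GREEN VALLEY", "SMOKED", "15", "30")
X_next, reply = include(X_t, x)
print(X_next)
print(reply)
```

Here the less informative N-fact with the unknown area is replaced by `x` and returned as the reply, while the unrelated N-fact stays in the DBI.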

Example 5. If N-fact $x =$ AREA GREEN VALLEY IS SMOKED AT 15:30 is included in DBI $X_t$ from Example 3, then

$$X_{t+1} = \{\text{AREA LONELY TREES IS NORMAL AT 12:31},\ \ldots\},$$

because

$$\text{AREA } \langle\text{name of area}\rangle \text{ IS SMOKED AT 15:30} \overset{*}{\underset{G_t}{\Rightarrow}} \text{AREA GREEN VALLEY IS SMOKED AT 15:30}. \;\blacksquare$$

As to the M-semantics of queries, there may be two different versions, which arise from the features that distinguish DBI from DB.

The first version is obvious:

$$A^{y}_{t+1} = \left\{ x \mid x \in X_t \ \&\ y \overset{*}{\Rightarrow} x \right\}, \tag{49}$$

where $A^{y}_{t+1}$ is the answer to the query with content $y$; all N-facts from DBI $X_t$ that are no less informative than $y$ are included in the answer.

As seen, this also fixes the query language for databases (i.e., DB with complete information): it is the set of sentential forms of CF grammar $G_t$, and if SF $y$ is the content of query $\omega_t$, then

$$I_t = \left\{ w \mid y \overset{*}{\underset{G_t}{\Rightarrow}} w \ \&\ w \in V^{*} \right\}, \tag{50}$$

i.e., it is the set of facts that are more informative than the N-fact $y$ contained in the query. So, combining Eqs. (9) and (49), we obtain

$$A_{t+1} = W_t \cap I_t = \left\{ w \mid y \overset{*}{\underset{G_t}{\Rightarrow}} w \ \&\ w \in W_t \right\}, \tag{51}$$

i.e., the result of the access is the subset of database $W_t$ containing all facts that are more informative than $y$.
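The first query version, Eq. (49) for a DBI and Eq. (51) for a DB, reduces to one filter over the stored strings. The sketch assumes a toy model in which an N-fact is a tuple of slots (area, state, hour, minutes) with `None` for an unexpanded nonterminal; the name `select` and the sample data are hypothetical.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[Optional[str], ...]  # assumed toy model; None = unexpanded nonterminal


def derives(x: Fact, x2: Fact) -> bool:
    """x =>* x2 in the toy model: every slot filled in x is unchanged in x2."""
    return len(x) == len(x2) and all(a is None or a == b for a, b in zip(x, x2))


def select(store: Set[Fact], y: Fact) -> Set[Fact]:
    """Stored items that are no less informative than the query content y."""
    return {x for x in store if derives(y, x)}


# Hypothetical DBI with incomplete N-facts.
X_t = {
    ("LONELY TREES", "SMOKED", "12", None),
    ("GREEN VALLEY", "NORMAL", "15", "30"),
    ("GREEN VALLEY", None, "16", "00"),
}
# Hypothetical DB: complete facts only.
W_t = {
    ("LONELY TREES", "SMOKED", "12", "30"),
    ("GREEN VALLEY", "NORMAL", "15", "30"),
}
y = (None, "SMOKED", None, None)  # AREA <name of area> IS SMOKED AT <time>

dbi_answer = select(X_t, y)
db_answer = select(W_t, y)
print(dbi_answer)
print(db_answer)
```

Note that the N-fact about GREEN VALLEY with an unknown state is not returned under this version, although the area may in fact be smoked; this is the case the second version addresses.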

The second version rests on interpreting the query as an action whose aim is to check whether there are possible facts $w \in L(G_t)$ that are more informative than both an N-fact $x$ entering $X_t$ and the N-fact $y$ (the query content):

$$(\exists w \in L(G_t))\ \ x \overset{*}{\underset{G_t}{\Rightarrow}} w \ \&\ y \overset{*}{\underset{G_t}{\Rightarrow}} w, \tag{52}$$

so even when $x$ is not a concretization of $y$, it is sensible to include $x$ in the answer, because there may be facts $w \in L(G_t)$ that are concretizations of both $x$ and $y$. The set of such facts is the intersection $W_x \cap W_y$, where

$$W_x = \left\{ w \mid x \overset{*}{\Rightarrow} w \right\}, \qquad W_y = \left\{ w \mid y \overset{*}{\Rightarrow} w \right\}. \tag{53}$$

A finite representation of this intersection is, obviously, the greatest lower bound $\inf\{x, y\}$ of the set $\{x, y\}$. For this reason, the answer to the query with content $y$ may be defined as

$$\overline{A}^{y}_{t+1} = \{ x \mid x \in X_t \ \&\ \exists\, \inf\{x, y\} \}, \tag{54}$$

and, as an alternative,

$$\overline{\widetilde{A}}^{y}_{t+1} = \{ \inf\{x, y\} \mid x \in X_t \ \&\ \exists\, \inf\{x, y\} \}. \tag{55}$$
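The two alternatives of the second query version, Eqs. (54) and (55), differ only in what is returned: the stored N-fact itself or its infimum with the query. A minimal sketch, assuming a toy model in which an N-fact is a tuple of slots (area, state, hour, minutes) with `None` for an unexpanded nonterminal (the names `inf_fact`, `answer_facts`, `answer_infs` and the sample DBI are hypothetical):

```python
from typing import Optional, Set, Tuple

Fact = Tuple[Optional[str], ...]  # assumed toy model; None = unexpanded nonterminal


def inf_fact(x: Fact, y: Fact) -> Optional[Fact]:
    """Least informative common concretization, or None if x and y conflict."""
    merged = []
    for a, b in zip(x, y):
        if a is not None and b is not None and a != b:
            return None  # two different terminals in one slot: no inf exists
        merged.append(a if a is not None else b)
    return tuple(merged)


def answer_facts(X_t: Set[Fact], y: Fact) -> Set[Fact]:
    """Eq. (54): stored N-facts that have a common concretization with y."""
    return {x for x in X_t if inf_fact(x, y) is not None}


def answer_infs(X_t: Set[Fact], y: Fact) -> Set[Fact]:
    """Eq. (55): the infima themselves instead of the stored N-facts."""
    infs = (inf_fact(x, y) for x in X_t)
    return {m for m in infs if m is not None}


# Hypothetical DBI, chosen only for illustration.
X_t = {
    ("LONELY TREES", "SMOKED", "12", None),
    ("GREEN VALLEY", "NORMAL", "15", "30"),
    ("GREEN VALLEY", None, "16", "00"),
}
y = (None, "SMOKED", None, None)  # AREA <name of area> IS SMOKED AT <time>

by_fact = answer_facts(X_t, y)
by_inf = answer_infs(X_t, y)
print(by_fact)
print(by_inf)
```

Unlike the first version, both answers now cover the GREEN VALLEY N-fact with the unknown state, since nothing stored rules out that the area was smoked at 16:00.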

Example 6. Consider DBI $X_t$ from Example 3 and the query with content $y = \text{AREA } \langle\text{name of area}\rangle \text{ IS SMOKED AT } \langle\text{time}\rangle$. The purpose of this query is to get information about all smoked areas. According to Eqs. (49), (54), and (55),

$$A^{y}_{t+1} = \{\text{AREA } \ldots\},$$

$$\overline{A}^{y}_{t+1} = \{\text{AREA LONELY TREES } \ldots \text{ AT 13:}\ldots,\ \text{AREA } \ldots\},$$

$$\overline{\widetilde{A}}^{y}_{t+1} = \{\text{AREA LONELY TREES IS SMOKED AT 13:}\ldots,\ \text{AREA } \ldots\}.$$

Returning to DB, whose DML S-semantics was defined by Eqs. (4)–(9), we may now write its M-semantics equations not only for queries but also for insertion and deletion. Namely, if string $w$ is the content of an insertion access, then

$$W_{t+1} = \begin{cases} W_t \cup \{w\}, & \text{if } w \in L(G_t), \\ W_t & \text{otherwise}. \end{cases} \tag{56}$$

Similarly, if string $y$, containing terminal and nonterminal symbols of CF grammar $G_t$, is the content of a delete access, then

$$W_{t+1} = W_t - \left\{ w \mid y \overset{*}{\underset{G_t}{\Rightarrow}} w \ \&\ w \in L(G_t) \right\}. \tag{57}$$
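Insertion and deletion per Eqs. (56) and (57) can be sketched in the same spirit. The code assumes a toy model in which a fact is a tuple of slots (area, state, hour, minutes) and `None` marks an unexpanded nonterminal; `in_language` is a hypothetical stand-in for the membership test $w \in L(G_t)$, and the set `STATES` is an assumed list of alternatives for the state slot.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[Optional[str], ...]  # assumed toy model; None = unexpanded nonterminal
STATES = {"NORMAL", "SMOKED"}     # assumed terminal alternatives for <state>


def derives(x: Fact, x2: Fact) -> bool:
    """x =>* x2 in the toy model: every slot filled in x is unchanged in x2."""
    return len(x) == len(x2) and all(a is None or a == b for a, b in zip(x, x2))


def in_language(w: Fact) -> bool:
    """Toy stand-in for w in L(G_t): four filled slots and a known state."""
    return len(w) == 4 and all(s is not None for s in w) and w[1] in STATES


def insert(W_t: Set[Fact], w: Fact) -> Set[Fact]:
    """Eq. (56): add w only if it is a correct fact."""
    return W_t | {w} if in_language(w) else set(W_t)


def delete(W_t: Set[Fact], y: Fact) -> Set[Fact]:
    """Eq. (57): remove every stored fact derivable from the pattern y."""
    return {w for w in W_t if not derives(y, w)}


W = insert(set(), ("LONELY TREES", "SMOKED", "12", "30"))
W = insert(W, ("GREEN VALLEY", None, "15", "30"))      # rejected: not a complete fact
W = insert(W, ("GREEN VALLEY", "NORMAL", "15", "30"))
after_delete = delete(W, (None, "SMOKED", None, None))  # drop all "smoked" facts
print(W)
print(after_delete)
```

The delete pattern plays the same role as a query content: every stored fact that concretizes it is removed in one access.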

The data manipulation languages described in this section will be called sentential (SDML), because the content of any access to a DB/DBI specified by means of such a DML is a sentential form of the CF grammar whose set of rules is the current MDB.

As for KADB, the capabilities of the sentential DML are comparable to those of the aforementioned NoSQL languages, which select DB elements by the keys specified in the queries.

Of course, it is not difficult to extend SDML with features for constructing more complicated selection criteria (including, e.g., number intervals) [38, 39]. But in comparison with SQL and similar relational languages, which provide symmetric access to DB, SDML are rather poor. To achieve the capabilities of relationally complete query languages, SDML must be extended with features for comparing values that occur in different facts. Such features are also critically needed for knowledge representation, extraction, and processing.

To achieve this purpose, we shall use another tool that differs from CF grammars, namely, Post systems (PS), which also operate on strings but, owing to the variables in their basic constructions (productions), have the basic capabilities needed for the aforementioned functions. The result of integrating the described "set of strings" databases with PS is augmented Post systems (APS). The intermediate layer between SDB and APS is formed by the so-called word equations on context-free languages, considered in the next section.
