3. Emulation of the known data models

We shall now demonstrate how relational and non-relational databases may be represented within the framework described above. We consider the relational data model as a full-scope example of databases with symmetric access (DBSA) [8–12], together with a family of asymmetric-access (or key-addressed) databases (KADB) that includes, among others, hypertext, page, and WWW- and Twitter-like databases [32–41].

Let us begin with the relational data model.

Consider a relational database (RDB) whose scheme is $\{R_1\langle A_1^1, \ldots, A_{m_1}^1\rangle, \ldots, R_k\langle A_1^k, \ldots, A_{m_k}^k\rangle\}$, where $R_1, \ldots, R_k$ are the names of relations and $A_1^1, \ldots, A_j^i, \ldots, A_{m_k}^k$ are the names of attributes. At moment $t$, every relation $R_i$ is a set of tuples

$$R_i \subseteq D_1^i \times \ldots \times D_{m_i}^i, \tag{36}$$

where $D_j^i$ is the domain (the set of possible values) of attribute $A_j^i$.

We define the SDB $\langle W_t, D_t\rangle$ corresponding to this RDB as follows. We include in the MDB $D_t$ the rules

$$\begin{aligned} <R_1> &\to R_1: <A_1^1>, \ldots, <A_{m_1}^1>, \\ &\cdots \\ <R_k> &\to R_k: <A_1^k>, \ldots, <A_{m_k}^k>, \end{aligned} \tag{37}$$

where "<" and ">" are the dividers (the metalinguistic brackets of the Backus-Naur notation mentioned above), and $R_i$ and $A_j^i$ are the strings that name the relations and attributes, respectively; the dividers ensure syntactic unambiguity.
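The construction of rules (37) from a scheme is mechanical, so it can be sketched in a few lines. The Python sketch below assumes that the scheme is given as a dictionary from relation names to lists of attribute names and writes "->" for the arrow of (37); the function name `scheme_to_rules` and the sample relations are hypothetical.

```python
# Sketch: generate MDB rules of the form (37) from a relational scheme.
# Assumption: the scheme is a dict {relation name: [attribute names]};
# nonterminals use "<" and ">" as dividers, and "->" stands for the arrow.

def scheme_to_rules(scheme: dict[str, list[str]]) -> list[str]:
    """Return one CF rule '<R> -> R: <A1>, ..., <Am>' per relation R."""
    rules = []
    for relation, attributes in scheme.items():
        # Right-hand side: the relation name as a terminal string, followed by
        # the attribute nonterminals separated by commas.
        rhs = ", ".join(f"<{attr}>" for attr in attributes)
        rules.append(f"<{relation}> -> {relation}: {rhs}")
    return rules


if __name__ == "__main__":
    # Hypothetical two-relation scheme.
    example_scheme = {"Employee": ["Name", "Salary"], "Project": ["Title", "Budget"]}
    for rule in scheme_to_rules(example_scheme):
        print(rule)
```

Running it prints `<Employee> -> Employee: <Name>, <Salary>` followed by `<Project> -> Project: <Title>, <Budget>`.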

Along with rules (37), the MDB includes rules such that

$$D_j^i = \left\{ b \mid <A_j^i> \stackrel{*}{\Rightarrow} b \;\&\; b \in (V - \{,\})^* \right\}, \tag{38}$$

i.e., these rules generate the sets of words over the terminal alphabet $V$ that serve as the domains of the respective attributes. For unambiguity, we assume that the comma "," does not occur in attribute values $b \in D_j^i$.
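Definition (38) can be approximated by bounded enumeration: expand the leftmost nonterminal breadth-first, keep the terminal words, and discard those that contain a comma. The Python sketch below works under explicit assumptions: a hypothetical grammar in which every terminal is a single character and no alternative is empty, so a sentential form never has more symbols than the word it eventually yields. The length bound `max_len` is our addition, since the set in (38) may be infinite.

```python
from collections import deque

# Hypothetical grammar: nonterminals are "<...>", terminals are single characters.
# "<Digit>" deliberately includes "," to show the (V - {","})* filter of (38).
GRAMMAR: dict[str, list[list[str]]] = {
    "<Salary>": [["<Digit>"], ["<Digit>", "<Salary>"]],
    "<Digit>": [["1"], ["2"], [","]],
}


def is_nonterminal(symbol: str) -> bool:
    return symbol.startswith("<") and symbol.endswith(">")


def domain(grammar: dict[str, list[list[str]]], start: str, max_len: int) -> set[str]:
    """Comma-free words b with start =>* b and len(b) <= max_len."""
    words: set[str] = set()
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is terminal.
        pos = next((i for i, s in enumerate(form) if is_nonterminal(s)), None)
        if pos is None:
            word = "".join(form)
            if "," not in word:  # the b in (V - {","})* condition
                words.add(word)
            continue
        for alternative in grammar[form[pos]]:
            new_form = form[:pos] + tuple(alternative) + form[pos + 1:]
            # Safe pruning: with no empty alternatives, forms never shrink.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words
```

For example, `domain(GRAMMAR, "<Salary>", 2)` returns `{"1", "2", "11", "12", "21", "22"}`; words such as `"1,"` are derivable in the grammar but excluded by the comma condition.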

Accordingly, every tuple $\langle b_1^i, \ldots, b_{m_i}^i\rangle$ of relation $R_i$ is represented by the fact

$$R_i: b_1^i, \ldots, b_{m_i}^i \in W_t. \tag{39}$$

Note that the representation of facts in the form (37)–(39) is not unique. As Examples 1 and 2 show, the tuples of a relation may be represented as arbitrary natural-language phrases described by the corresponding rules.
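Under the comma convention of (38), the mapping between tuples and facts (39) is a simple serialization. The Python sketch below assumes that facts are written exactly as `R: b1, ..., bm` (relation name, colon and blank, then values separated by a comma and a blank) and that relation names contain no ": "; this spacing and the function names are illustrative choices, and other representations are equally possible.

```python
# Sketch: represent a tuple of relation R_i as a fact string "R_i: b_1, ..., b_m"
# in the spirit of (37)-(39), and parse such a fact back.
# Assumptions: attribute values contain no "," (as required for (38)),
# relation names contain no ": ", and every relation has at least one attribute.

def tuple_to_fact(relation: str, values: tuple[str, ...]) -> str:
    for value in values:
        if "," in value:
            raise ValueError(f"value {value!r} contains the divider ','")
    return f"{relation}: " + ", ".join(values)


def fact_to_tuple(fact: str) -> tuple[str, tuple[str, ...]]:
    # The first ": " separates the relation name from the values.
    relation, sep, rest = fact.partition(": ")
    if not sep:
        raise ValueError(f"{fact!r} is not a relational fact")
    return relation, tuple(rest.split(", "))
```

Because values are comma-free, `fact_to_tuple(tuple_to_fact(r, v))` returns `(r, v)` for any non-empty tuple `v`.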

Let us now consider key-addressed databases. Their common feature is that every fact in a KADB includes a unique key, which is needed to select, delete, and update this fact. These databases are associated with the NoSQL family of DMLs [42–45], which in recent years has been regarded as a practical alternative to the SQL-like DMLs [8–12, 46–48] developed since the introduction of the relational data model.

We represent a KADB as the set

$$W\_t = \{k\_1 = d\_1, \dots, k\_m = d\_m\},\tag{40}$$

where the symbol "=" is the divider, $k_i \in (V - \{=\})^*$ is the key, and $d_i \in V^*$ is the data corresponding to (identified by) this key. At every moment $t$, a KADB must satisfy the consistency condition: the KADB is consistent if

$$(\forall t)\,(\forall k \in (V - \{=\})^*)\; \left|\{k=\} \cdot V^* \cap W_t\right| \le 1, \tag{41}$$

i.e., the set $W_t$ never contains two or more elements with the same key. An access $I_t$ to a KADB must contain a key $k$, so $I_t \subseteq \{k=\} \cdot V^*$, and for an insertion $I_t = \{k=d\}$. The S-semantics of insertion into a KADB is as follows:

$$W_{t+1} = \begin{cases} W_t \cup \{k=d\}, & \text{if } \{k=\} \cdot V^* \cap W_t = \varnothing, \\ \left(W_t - \{k=\} \cdot V^*\right) \cup \{k=d\}, & \text{otherwise}, \end{cases} \tag{42}$$

because postulating the fact $k=d$ at moment $t$ is equivalent to negating the fact $k=d'$, where $d \ne d'$, that was postulated at some earlier moment $t' < t$. Thus, in a KADB an update is implemented by insertion, and the reply to this access may be defined as follows:

$$A_{t+1} = \begin{cases} \{k=d\}, & \text{if } \{k=\} \cdot V^* \cap W_t = \varnothing, \\ \left(W_t \cap \{k=\} \cdot V^*\right) \cup \{k=d\}, & \text{otherwise}, \end{cases} \tag{43}$$

thus, in the case of an update, the reply contains both the deleted and the inserted fact.
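Conditions (41)–(43) can be checked directly on a set of strings. The Python sketch below models $W_t$ as an immutable set of strings of the form `k=d`; the helper names are hypothetical, and the sketch ignores concurrency and persistence.

```python
# Sketch of a KADB W_t as a set of strings "k=d", as in (40).
# Keys contain no "=" (k in (V - {=})*), so the first "=" is the divider.

def key_of(fact: str) -> str:
    key, sep, _ = fact.partition("=")
    if not sep:
        raise ValueError(f"{fact!r} has no '=' divider")
    return key


def is_consistent(w: frozenset[str]) -> bool:
    """Condition (41): no two elements of W_t share a key."""
    keys = [key_of(fact) for fact in w]
    return len(keys) == len(set(keys))


def insert(w: frozenset[str], key: str, data: str) -> tuple[frozenset[str], frozenset[str]]:
    """Access I_t = {key=data}: return (W_{t+1}, A_{t+1}) following (42) and (43)."""
    if "=" in key:
        raise ValueError("a key must not contain '='")
    new_fact = f"{key}={data}"
    # W_t intersected with {k=}.V*: the fact(s) already stored under this key.
    old = frozenset(fact for fact in w if key_of(fact) == key)
    w_next = (w - old) | {new_fact}  # plain insertion when old is empty
    reply = old | {new_fact}         # an update also reports the deleted fact
    return w_next, reply
```

Because `insert` removes every fact stored under the key before adding the new one, applying it to a consistent set always yields a consistent set, which matches the argument that an update is just an insertion.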

As for the M-semantics of KADB, every known class of such databases is characterized by its own key structure and by its own technique for extracting keys from the fact currently being processed.

The simplest approach is implemented in the Twitter network, where the keys needed to access the descendants of the current hypertext element are bounded by two dividers: "#" on the left and a blank on the right.
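This convention makes key extraction a single pattern match. The Python sketch below is a simplified reading of the convention as stated here, not Twitter's actual hashtag rules: it assumes that any whitespace character, or the end of the text, closes a key, and it builds a key-addressed index from keys to the messages that contain them. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a key starts right after "#" and runs up to the next whitespace
# character; the end of the text is also treated as a closing blank.
KEY_PATTERN = re.compile(r"#(\S+)")


def extract_keys(text: str) -> list[str]:
    return KEY_PATTERN.findall(text)


def index_by_key(messages: list[str]) -> dict[str, list[str]]:
    """Key-addressed view: each key maps to the messages that contain it."""
    index = defaultdict(list)
    for message in messages:
        # dict.fromkeys drops repeated keys within one message, keeping order.
        for key in dict.fromkeys(extract_keys(message)):
            index[key].append(message)
    return dict(index)
```

For example, `extract_keys("big #data and #SSF")` returns `["data", "SSF"]`.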

"Set of Strings" Framework for Big Data Modeling DOI: http://dx.doi.org/10.5772/intechopen.85602

In the Internet HTTP/WWW service, similar keys are represented as strings of symbols displayed in colors that differ from those of the rest of the text of the current hypertext page. This is equivalent to splitting the terminal alphabet $V$ into two subsets, the first containing the symbols of ordinary colors and the second the symbols of "key-representing" colors. However, HTTP/WWW hypertexts are organized in a much more complicated manner. First of all, along with the displayed pages, whose structure is described by the hypertext markup language (HTML), its later versions, or related markup languages (XML et al.), there is another KADB whose elements have as keys the aforementioned differently colored strings and as data uniform resource locators (URLs), which provide direct network access to the subordinate pages. This access is possible because a URL contains a string (the host name) that the Domain Name System (DNS) resolves to the proper IP address. In effect, HTML is no more than a language for conveniently representing the CF rules that form the current metadatabase of the WWW KADB.
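The two-level organization described above, colored keys on a page plus a separate KADB from keys to URLs, can be sketched as follows. The Python code assumes that a page is given as a list of (symbol, is-key-colored) pairs, which mirrors the split of $V$ into two subsets; the `links` dictionary plays the role of the second KADB, and all names and the `example.org` URL are hypothetical. DNS resolution of the URL is outside the sketch.

```python
from itertools import groupby

# Assumption: a page is a sequence of (symbol, is_key_colored) pairs, i.e.
# the terminal alphabet V is split into ordinary and key-representing symbols.
Page = list[tuple[str, bool]]


def colored(text: str, is_key: bool) -> Page:
    """Mark every symbol of text with the same color class."""
    return [(symbol, is_key) for symbol in text]


def extract_colored_keys(page: Page) -> list[str]:
    """Keys are maximal runs of key-colored symbols."""
    return [
        "".join(symbol for symbol, _ in run)
        for is_key, run in groupby(page, key=lambda pair: pair[1])
        if is_key
    ]


def follow(page: Page, links: dict[str, str]) -> list[str]:
    """Second KADB: look up the URL stored under each key found on the page."""
    return [links[key] for key in extract_colored_keys(page) if key in links]
```

For instance, a page built as `colored("see ", False) + colored("SSF", True)` yields the single key `"SSF"`, which `follow` maps to whatever URL `links` stores under it.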

One of the simplest kinds of KADB is the so-called page database, whose elements are sequences of strings of equal length, the first string of a page being its key [38, 39]. Thus

$$(\forall t)\; L(\mathbf{G}_t) = V^l (V^l)^*, \tag{44}$$

where $l$ is the length of each string (in this case the divider "=" is redundant). The data may also be a string $p{:}d$, where ":" is the divider and the prefix $p$ before the sequence of $l$-symbol strings names the program that is called to interpret this sequence (e.g., to visualize it). In general, $d$ may be a string of bits rather than a string of symbols of the alphabet $V$.
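Once $l$ is fixed, a page-database element is easy to decompose. The Python sketch below checks membership in $V^l(V^l)^*$, splits off the $l$-symbol key, and separates an optional program prefix $p$ from the data. It assumes that $p$ contains no ":" and that plain data without a prefix contains no ":" either, and it treats $d$ as a character string rather than a bit string; the function names and the parameter `width` (standing for $l$) are hypothetical.

```python
from typing import Optional

# Sketch of a page-database element under (44): a non-empty sequence of
# width-symbol strings whose first string is the key (width plays the role of l).


def in_page_language(element: str, width: int) -> bool:
    """Membership in V^l (V^l)^*: a positive multiple of width symbols."""
    return width > 0 and len(element) >= width and len(element) % width == 0


def split_element(element: str, width: int) -> tuple[str, str]:
    """Return (key, data): the first width symbols and the remainder."""
    if width <= 0 or len(element) < width:
        raise ValueError("element is shorter than one width-symbol string")
    return element[:width], element[width:]


def split_program(data: str) -> tuple[Optional[str], str]:
    """Split data of the form p:d into (p, d); plain data gives (None, data)."""
    program, sep, rest = data.partition(":")
    return (program, rest) if sep else (None, data)
```

For example, with `width = 4` the element `"K001abcdefgh"` splits into the key `"K001"` and the data `"abcdefgh"`.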

So far we have discussed only the S-semantics and the starting point of the M-semantics, namely the representation of the metadatabase as a set of CF grammar rules. The second such point in the SSF is the representation of databases with incomplete information.
