*Q'← π Age, Department σ Age < 70 (QC1,QC2);*

Fig. 7. Rewritten Query Q'.

#### **3.4.1 Example**

88 Semantics – Advances in Theories and Mathematical Models

As discussed earlier, a user of a data integration system poses queries in terms of a mediated schema, because the root sources are transparent in such systems. A module of the data integration system translates (reformulates) the user query into one that refers directly to the root sources. Several well-known algorithms exist for such query reformulation/rewriting (Levy A.Y et al., 1996), (Duschka O.M. and Genesereth M.R. 1997), (Pottinger R. and Levy A. 2000). In the context of a semantic cache, the root sources are the cache segments and the mediated schema is the cache description. The goal of the bucket algorithm (Levy A.Y et al., 1996) is to reformulate a user query posed on the mediated (virtual) schema into a query that refers directly to the available (local/cached) data sources. This reformulation is known as query rewriting. Both the query and the sources are described by select-project-join queries that may include atoms of arithmetic comparison predicates. The bucket algorithm returns the maximally-contained rewriting of the query using the views. This rewriting is a maximally-contained

We demonstrate the working (in the context of semantic cache query processing) of the bucket

*QU ← π Age, Department σ Exp > 20 ∧ Age < 70 (Emp);*

Suppose QC1, QC2 and QC3 (shown in Figure 6) are in the cache, and a user query QU (shown in Figure 6) is posed over them. As shown in Table 1 below, according to the bucket algorithm both cached queries QC1 and QC2 are candidates for the bucket, since there is no inconsistency between the user query predicate and the cached query predicates (i.e. *Age ≥ 55* is consistent with *Age < 70*) when compared in isolation (atomically). QC3 is excluded due to predicate inconsistency (i.e. *Exp < 15* is inconsistent with *Exp > 20*). In the second step of the bucket algorithm, elements of the buckets are combined to form a rewriting of the user query. The

Table 1. Contents of Bucket. The attribute not required by user query is shown as primed

*QC1 ← π Ename, Department σ Age ≥ 50 (Emp);*

*QC3 ← π Age, Department σ Exp < 15 (Emp);* 

*QC2 ← π Age, Department (Emp);* 

rewritten query (*Q*') in this case is shown in Figure 7 below.

Fig. 6. User Query (QU) Over Cached Queries

#### **3.4 Bucket algorithm**

but not an equivalent one.

algorithm with an example.

attribute.

We follow the results produced by the maximally-contained query rewriting algorithm, the bucket algorithm (Levy A.Y et al., 1996), described above. The predicate (*Exp > 20*) is pruned because it cannot be evaluated over the cached data, which hold no information about the *Exp* attribute. Furthermore, if the rewritten query (*Q'*, shown in Figure 7) is executed locally, it returns unnecessary/incorrect results. These results are maximally-contained, or maximum data retrieval (MDR), but they include tuples that are not part of the answer to the actual user query (QU). Figure 8 (a) shows the collective dataset of cached queries QC1 and QC2 (Figure 6). The rewritten query (*Q'*) executed over the cached data is shown in Figure 8 (b). The data items shown as strike circle ( ) in Figure 8 (b) are the required results of the user query (QU). The results retrieved by the rewritten query (*Q'*) are therefore not the precise answer to the user query (QU).

Fig. 8. (a) Collective Data of Cached Query QC1 and QC2. (b) Rewritten Query Q' Over Cached Query QC1 and QC2.

The bucket algorithm does not compute probe and remainder queries separately, so there is no way to determine which part of the answer is available in the cache and which is not.
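The first step of the example above (deciding which cached queries enter the bucket) can be sketched in a few lines of Python. This is a minimal illustration, not the published algorithm: it assumes every query ranges over the single relation Emp, that predicates are conjunctions of simple comparisons against constants, and it takes the QC1 bound as 50, as listed with Figure 6. The function name `satisfiable` and the tuple encoding of predicates are hypothetical.

```python
def satisfiable(preds):
    """Return True if a conjunction of (attr, op, value) comparisons can hold.

    Only the operators <, <=, >, >= against constants are handled
    (an assumption made for this sketch).
    """
    lower, upper = {}, {}  # attr -> (value, is_strict)
    for attr, op, value in preds:
        if op in (">", ">="):
            cand = (value, op == ">")
            # Tighter lower bound: larger value, strict wins on ties.
            if attr not in lower or cand > lower[attr]:
                lower[attr] = cand
        elif op in ("<", "<="):
            cand = (value, op == "<")
            # Tighter upper bound: smaller value, strict wins on ties.
            if attr not in upper or (cand[0], not cand[1]) < (upper[attr][0], not upper[attr][1]):
                upper[attr] = cand
        else:
            raise ValueError(f"unsupported operator: {op}")
    for attr in lower.keys() & upper.keys():
        (lo, lo_strict), (hi, hi_strict) = lower[attr], upper[attr]
        if lo > hi or (lo == hi and (lo_strict or hi_strict)):
            return False
    return True


# User query QU and the cached queries of Figure 6 (selection predicates only).
user_preds = [("Exp", ">", 20), ("Age", "<", 70)]
cached = {
    "QC1": [("Age", ">=", 50)],
    "QC2": [],
    "QC3": [("Exp", "<", 15)],
}

# A cached query enters the bucket when its predicate does not contradict QU.
bucket = [name for name, preds in cached.items() if satisfiable(user_preds + preds)]
print(bucket)
```

Under these assumptions QC1 and QC2 are kept and QC3 is rejected, matching the discussion of Table 1. The sketch only performs the consistency check; it does not combine bucket elements into a rewriting.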

#### **3.5 Semantic reasoning for web queries**

A web-based cache system differs from data caching. In a web cache, special proxy servers store recently visited pages for later reuse. The user query posed over a web cache system is a uniform resource locator (URL). A cached page is used whenever the user-supplied URL matches the header of that page. This caching strategy is similar to page caching, where only binary results (complete answer or no answer) are possible and a partial answer cannot be determined.

Semantic Cache Reasoners 91

An XML document is shown in Figure 9. A user issues the query /lib/book; as a result, the technique loads all results of the "lib" and "book" nodes into the cache and assigns a prime number to each node, i.e. "lib"=2, "book"=3. After assigning the prime numbers a prime

Now if the user issues the query /lib/book/author, each node in the query is assigned the same prime number that was previously assigned to the corresponding node in the view. Here 2 is assigned to "lib" and 3 is assigned to "book". "author" appears for the first time, so a new prime number, i.e. 5, is assigned to the author node. Dividing the prime product of the query (90) by the prime product of the view (6) yields 15 with no remainder, which means the prime product of the view divides that of the query completely. If the prime product of a view completely divides the prime product of a query, the algorithm further checks the following condition: whether the order of appearance of each axis node in the view and the query is the same. If the answer is true then it means that

If a query contains predicates, for example A[b[b[a]]]/c/d, the tree of this query is shown in

Now only the PPT of b completely divides the PPT of the query so b is selected in the first

This algorithm retrieves the results of all axis nodes given in the query. For example, if we issue the query "/lib/book[price>30]" against the document shown in figure 1, then regardless of the predicate it retrieves all results of the book node and stores them in the

Description logic (DL), a family of logic languages, is claimed to be able to express the conceptual domain model/ontology of a data source and to provide evaluation techniques. Since structured query language (SQL) is a structured format, it can be classified under

#### **3.6.1 Example**

#### **3.6.2 Example**

product is calculated as follows.

the query is contained in the view.
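The divisibility test from the /lib/book example can be sketched as follows. The product formula used here (multiply parent prime by child prime for every edge on the path) is inferred from the figures quoted in the text, 6 for the view and 90 for the query, and is an assumption rather than the authors' stated definition. The class `PrimeLabeller` and its methods are hypothetical names, and only simple paths without predicates are handled.

```python
from itertools import count


def prime_stream():
    """Yield 2, 3, 5, ... by trial division (adequate for a few node names)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


class PrimeLabeller:
    """Assigns one prime per distinct node name, in order of first appearance."""

    def __init__(self):
        self._primes = prime_stream()
        self.label = {}

    def prime_for(self, name):
        if name not in self.label:
            self.label[name] = next(self._primes)
        return self.label[name]

    def path_product(self, path):
        # Assumed formula: product over edges of prime(parent) * prime(child).
        steps = [s for s in path.split("/") if s]
        primes = [self.prime_for(s) for s in steps]
        product = 1
        for parent, child in zip(primes, primes[1:]):
            product *= parent * child
        return product


labeller = PrimeLabeller()
view_product = labeller.path_product("/lib/book")           # (2*3)
query_product = labeller.path_product("/lib/book/author")   # (2*3)*(3*5)
quotient, remainder = divmod(query_product, view_product)
print(view_product, query_product, quotient, remainder)
```

A zero remainder only satisfies the first condition; the order-of-appearance check described in the text would still have to be applied before concluding containment.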

Fig. 9. Prime Product Calculation.

condition of the algorithm.

PPT of b = (2\*3)\*(3\*1)\*(1\*2)\*(2\*7) = 504

PPT of c = (2\*7)\*(7\*1)\*(1\*2)\*(2\*3) = 1176

cache. This action requires more cache space.

#### **3.7 Subsumption analysis reasoning**

(2\*3), here 6 is the Tree Pattern Prime Product of the view.

figure 9. The prime product is calculated as shown below

However, searches performed over web resources through Boolean queries (conjunctions of keywords with AND and NOT operators) do not work in a plain page caching system, because the user query in this case is not a URL, and extracting qualified tuples for an individual keyword or for the whole query from page headers is not possible (Chidlovskii B and Borghoff U. M., 2000), (Qiong L and Jaffrey F. N., 2001). Semantic caching was introduced as an alternative to plain page caching, in which the cache is managed as semantic regions.

Web queries over web resources differ from queries posed over databases: web queries have no attribute or predicate part, and they contain no join operator. The problem of answering web queries can therefore be reduced to the *set containment problem*.
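To make the reduction concrete, the sketch below models a Boolean keyword query as a set of required (AND) terms and a set of excluded (NOT) terms, and tests containment with plain subset checks. It is an illustrative simplification, not a technique taken from the cited works: the class `KeywordQuery`, the functions `contained_in` and `matches`, and the sample pages are all hypothetical, and the test is sufficient but not necessary for containment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordQuery:
    required: frozenset                 # terms joined by AND
    excluded: frozenset = frozenset()   # terms negated by NOT


def contained_in(query, region):
    """True if every page matching `query` must also match `region`.

    A query with more AND terms and more NOT terms is at least as
    restrictive as the cached region, so its answer is a subset.
    """
    return region.required <= query.required and region.excluded <= query.excluded


def matches(page_terms, q):
    """True if a page (given as a set of terms) satisfies query `q`."""
    return q.required <= page_terms and not (q.excluded & page_terms)


# Cached semantic region: "semantic AND cache".
region = KeywordQuery(frozenset({"semantic", "cache"}))
# New user query: "semantic AND cache AND xml AND NOT proxy".
query = KeywordQuery(frozenset({"semantic", "cache", "xml"}), frozenset({"proxy"}))

# Hypothetical pages, each reduced to its set of index terms.
pages = [
    {"semantic", "cache", "xml"},
    {"semantic", "cache", "xml", "proxy"},
    {"semantic", "cache", "sql"},
    {"web", "proxy"},
]
region_pages = [p for p in pages if matches(p, region)]

if contained_in(query, region):
    # The whole answer can be filtered out of the cached region.
    answer = [p for p in region_pages if matches(p, query)]
    print(answer)
```

When the containment test fails, the cached region can at best contribute a partial answer, which is the intersection case mentioned in the surrounding discussion.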

There is a considerable body of research on semantic caching for web queries. For example, (Chidlovskii B and Borghoff U. M., 2000) addressed both semantic cache management and query processing of web queries for meta-searcher systems. Their technique is based on a signature file method, in which a signature is given to every semantic region for processing all cases (similar to Figure 1) of containment and intersection.
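As a rough illustration of how signatures can support a containment test, the sketch below uses generic superimposed coding: each term sets a few bits in a fixed-width bit vector, and a region signature is the bitwise OR of its term signatures. This is a textbook-style sketch under stated assumptions, not the scheme of the cited paper; the constants `SIG_BITS` and `BITS_PER_TERM` and all function names are hypothetical.

```python
import hashlib

SIG_BITS = 64        # assumed signature width
BITS_PER_TERM = 3    # assumed number of bits set per term


def term_signature(term):
    """Map a term to a bit vector with at most BITS_PER_TERM bits set."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_TERM):
        sig |= 1 << (digest[i] % SIG_BITS)
    return sig


def signature(terms):
    """Superimpose (bitwise OR) the signatures of all terms."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig


def may_contain(region_terms, query_terms):
    """True if the region's terms may all occur among the query's terms.

    False means the region's terms are definitely not a subset.
    True can be a false positive and needs a precise check afterwards.
    """
    region_sig = signature(region_terms)
    query_sig = signature(query_terms)
    return (region_sig & query_sig) == region_sig


region_terms = {"semantic", "cache"}
query_terms = {"semantic", "cache", "xml"}
print(may_contain(region_terms, query_terms))
```

The bitwise test never misses a true subset relation, so it can serve as a cheap filter before an exact comparison of the term sets.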

A cache model was proposed for database applications using web techniques (Anton J. et al., 2002). Cache elements were stored as web pages/sub-pages, called fragments and sub-fragments, with their header information, called a template. Fragments can be indexed or shared among different templates. Fragments, sub-fragments and templates were updated or expired according to their own policies, which included expiration, validation and invalidation information. In this model, data retrieval is performed by matching template information against the requested query, and the corresponding fragments or sub-fragments are returned. Partial answer retrieval is possible with this technique, as sub-fragments alone can be returned for a user query. However, the technique is still close to page caching, since each fragment is itself a page.
