Queries Processing in Wireless Sensor Network

*Kamel Abbassi and Tahar Ezzedine*

#### **Abstract**

For demanding monitoring applications, such as controlling the water level in rivers or the temperature, a very large volume of information must be handled, and it changes constantly. The spatio-temporal data collected by a network of sensors form a set of thematic, integrated, non-volatile, and historical data organized to help decision-making. Usually this processing is performed with temporal, spatial, and spatio-temporal queries, which in turn increases the execution time of the query load. In the literature, several techniques have been identified, such as materialized views (MVs), indexes, fragmentation, scheduling, and buffer management. These techniques do not consider updates of the query load or modifications at the database level. In this chapter, we propose an optimal dynamic selection solution based on indexes and MVs; the solution is optimal when it serves the entire workload with a reasonable response time. The proposed approach supports modifications at the database level and at the workload level to ensure the validity of the optimal solution; for this, the knapsack algorithm is used.

**Keywords:** wireless sensor network, workload, optimized structure, NP-complete problem, knapsack, materialized view, index, multiple selection problem, monitoring

#### **1. Introduction**

A sensor network is used to record physical conditions of the environment, such as temperature, rainfall, pollution, humidity, and wind. These data are sent to a database server, where they are processed later.

All this data collected by the sensors will be recorded in a database which is in turn queried by client applications, such as the supervisor, the security agent, or a third-party application.

This database will be queried by complex queries which require resources.

To decrease the response time, it is necessary to use optimization techniques such as materialized views (MV), indexes, fragmentation, and the caching system. All these techniques are proven in the case of relational databases.

A view is a virtual table representing the result of a query on the database. As the name suggests, and unlike a standard view, in a materialized view the data is duplicated.

An index placed on a table provides quick access to records according to the value of one or more fields. In addition, it simplifies and accelerates search, sort, join, and aggregation operations.
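As a minimal illustration (using Python's built-in `sqlite3` module; the table and column names are hypothetical, not taken from the chapter), an index can be created on a sensor-readings table, and a materialized view can be emulated by storing a query result in its own table:

```python
import sqlite3

# In-memory demo database of sensor readings (hypothetical schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (sensor_id INTEGER, temp REAL, ts TEXT)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, 21.5, "2021-01-01"), (1, 22.0, "2021-01-02"), (2, 19.3, "2021-01-01")],
)

# An index gives quick access to records by sensor_id.
con.execute("CREATE INDEX idx_readings_sensor ON readings(sensor_id)")

# SQLite has no MATERIALIZED VIEW statement, so we emulate one: the
# aggregated result is duplicated into a table instead of being
# recomputed for every query.
con.execute(
    "CREATE TABLE mv_avg_temp AS "
    "SELECT sensor_id, AVG(temp) AS avg_temp FROM readings GROUP BY sensor_id"
)
rows = con.execute(
    "SELECT sensor_id, avg_temp FROM mv_avg_temp ORDER BY sensor_id"
).fetchall()
```

Unlike a standard view, the `mv_avg_temp` table duplicates data, so it must be refreshed when `readings` changes; this maintenance cost is exactly what the selection problem studied in this chapter weighs against the query speed-up.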

In this work, an approach is proposed for the multiple selection of indexes and materialized views with the knapsack algorithm. The work presents an improvement over another approach based on the greedy algorithm. The rest of this work is organized as follows: the first section deals with optimization techniques; the problem of multiple selection of indexes and materialized views is presented in the second section; the contribution for the dynamic workload is described in Section 4; in Section 5, our approach for the case where the database is dynamic is discussed; finally, the experiment and the contribution are discussed.


*Queries Processing in Wireless Sensor Network DOI: http://dx.doi.org/10.5772/intechopen.94749*

#### **2. Optimization techniques**

The use of optimization techniques follows two approaches. The first is the sequential use of techniques that have different physical structures, such as indexes and fragmentation; the second is the simultaneous use of techniques that have similar physical structures, such as materialized views and indexes.

In [1], the authors proposed three approaches: MVFirst, INDFirst, and Joint Enumeration. But the major drawback of these approaches is the sequential and isolated use of the techniques, which does not make it possible to benefit from the interactivity between these optimization structures.

The authors say that this last alternative is the best [2]. Bellatrech et al. [3] improve part of the multiple selection problem with storage space management. They use two "spies" to manage the space shared between the two structures, index and MV: if the optimal configuration needs more indexes, then the spy associated with the indexes takes space from the MV spy, and vice versa.

In [4], the authors use the drop algorithm to simulate the selection of indexes and MVs.

These works only deal with the case where the load of requests is static and does not change over time. In [5], the authors proposed an approach to dynamically select materialized views. The approach is based on the PRQ predictor, which predicts the next request and materializes its corresponding views using conditional probability. This approach uses a cost model based on cloud costs. This work shows a tremendous improvement in terms of cost, execution time, and processing, but the authors only used one optimization structure, namely materialized views. Another dynamic approach, called DynaMat, refreshes the configuration of materialized views if their size exceeds the space allocated for it [6]. Several criteria are used, for example, deleting rarely used views. A hybrid approach jointly exploits a static set of persistent views used by multiple request and maintenance sequences, and another dynamic set of smaller aggregated views, accessible and replaceable on the fly [7]. However, these approaches focus only on the refresh performance of materialized views and not on the query workload. In addition, they do not use other optimization structures. Karkad et al. [8] applied buffer management and scheduling techniques to three optimization structures (indexes, materialized views, and fragmentation). This approach requires caching and planning, and is not dynamic.

In our approach, materialized views and indexes are used simultaneously to benefit from the interaction between these structures. In addition, a mathematical modeling of the problem has been proposed, based on the knapsack algorithm, whose performance is proven compared to the greedy algorithm used by N. Maiz et al. [5].

#### **3. Materialized view and index selection problem**

Simultaneous selection of indexes and materialized views is an NP-hard problem which admits several optimal solutions [9].


*Wireless Sensor Networks - Design, Deployment and Applications*


The knapsack problem, also noted KP, is an optimization problem. It models a container which cannot support more than a certain weight $S$, into which all or part $N \subseteq O$ of a given set of objects $O = \{o_1, o_2, \dots, o_n\}$ may be placed, each object having a weight $weight(o_i)$ and a value $profit(o_i)$. The items put in the knapsack must maximize the total value without exceeding the maximum weight $S$. The problem is formalized as follows:

$$\sum_{o_i \in N} weight(o_i) < S \tag{1}$$

$$\forall N' \subseteq O \text{ with } \sum_{o_i \in N'} weight(o_i) < S: \quad \sum_{o_i \in N'} profit(o_i) \leq \sum_{o_i \in N} profit(o_i) \tag{2}$$
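As a concrete sketch, the selection described by Eqs. (1) and (2) can be computed with the classic dynamic-programming formulation of the 0/1 knapsack; the weights and profits below are hypothetical integers, not values from the chapter's experiments:

```python
def knapsack(weights, profits, capacity):
    """0/1 knapsack: maximize total profit under a weight capacity.

    dp[c] is the best profit achievable with total weight <= c;
    keep[i][c] records whether object i is taken at that state.
    """
    dp = [0] * (capacity + 1)
    keep = [[False] * (capacity + 1) for _ in weights]
    for i, (w, p) in enumerate(zip(weights, profits)):
        for c in range(capacity, w - 1, -1):  # reverse: each object used once
            if dp[c - w] + p > dp[c]:
                dp[c] = dp[c - w] + p
                keep[i][c] = True
    # Backtrack to recover the selected object indices.
    chosen, c = [], capacity
    for i in range(len(weights) - 1, -1, -1):
        if keep[i][c]:
            chosen.append(i)
            c -= weights[i]
    return dp[capacity], sorted(chosen)

# Hypothetical objects (indexes/MVs): sizes as weights, gains as profits.
best, picked = knapsack([3, 4, 2], [30, 50, 15], 6)  # -> (65, [1, 2])
```

Here the capacity plays the role of the storage space $S$, and the returned indices identify the objects (indexes or MVs) of the optimal configuration.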

On the other hand, the problem of selecting indexes and materialized views (PSIMV) consists of finding the set of indexes and materialized views, constituting the final configuration, that optimizes the workload queries. This optimization concerns both run time and storage space. The workload queries, the indexes, and the MVs are presented as follows:

$$Q = \{q\_1, q\_2, \dots q\_m\} \tag{3}$$

$$I = \{i\_1, i\_2, \dots i\_n\} \tag{4}$$

$$V = \{v\_1, v\_2, \dots v\_k\} \tag{5}$$

$Q$ represents the workload; this set is composed of $m$ queries. $I$ is the set of $n$ candidate indexes and $V$ is the set of $k$ candidate materialized views.

$S$ is the storage size allowed by the administrator for indexes and MVs. It is then necessary to find a configuration without violating the following constraints:

• Minimize the cost of Workload, i.e.

$$C(Q, Config_{IV}) = Min(C_{IV}(Q)) \tag{6}$$

• The size of the configuration *ConfigIV* does not exceed *S*

$$\sum_{iv \in Config_{IV}} size(iv) \leq S \tag{7}$$

The problem of selecting indexes and materialized views has also been addressed with the genetic algorithm: the starting population is the set of candidate indexes and MVs, and the objective function to optimize is the cost of the workload. The next section shows the analogy between the problem of selecting indexes and materialized views and the knapsack algorithm.

#### **3.1 Selection problem with index and MV vs. knapsack algorithm**

In this work, we present the correspondence between the problem of the knapsack and that of the multiple selection of indexes and materialized views (**Table 1**).

#### **3.2 Cost model**

Typically, the number of candidate indexes and MVs is large, since the input workload is significant. Creating all these indexes and MVs is not possible due to the constraint on the allocated storage space. To solve the problem, we use a cost model which allows us to keep only the most advantageous indexes and MVs.

| Knapsack problem | Problem of selecting indexes and materialized views |
| --- | --- |
| Objects | The total set of indexes and materialized views |
| Object set | The final configuration of indexes and materialized views |
| Weight | The size of each object and the execution time it requires |
| Profit | The profit to be won if the object is used: the gain in execution time and storage |
| Size | The number of bits needed to store the objects that form the optimal solution |

**Table 1.**
*Selection of indexes and materialized views vs. knapsack.*

This model estimates the space in bytes occupied by indexes and MVs, the data access costs, and the maintenance cost in terms of the number of inputs and outputs.

Indexes and MVs are the objects of this optimization system. The cost of an object is the sum of its storage size, its data access cost, and its maintenance cost.

$$\text{Cost}\_{o\_i} = \text{Size}\_{o\_i} + \text{Cost\\_Access}\_{o\_i} + \text{Cost\\_Mat}\_{o\_i} \tag{8}$$


The benefit provided by an object $O_i$ is the difference between the cost of the workload before adding the object, $Cost\_Load\_Before(O_i)$, and after adding it, $Cost\_Load\_After(O_i)$. The following equation calculates the profit:

$$\text{Profit}(O\_i) = \text{Cost\\_Load\\_Before}\ (O\_i) - \text{Cost\\_Load\\_After}\ (O\_i) \tag{9}$$

To decide whether to add this object to the configuration list, the following rule is applied:

$$\text{Profit}(O_i) = \begin{cases} > 0, & \text{add } O_i \text{ to } Config \\ \leq 0, & \text{do nothing} \end{cases} \tag{10}$$

If $Profit(O_i) > 0$, there is a benefit, so the object $O_i$ is added to the configuration; otherwise $O_i$ is not added.
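Equations (8)-(10) can be sketched as follows; the cost figures are hypothetical numbers, since the chapter's cost model depends on the actual database statistics:

```python
def cost(size, access_cost, maintenance_cost):
    # Eq. (8): total cost of an object (index or MV).
    return size + access_cost + maintenance_cost

def profit(load_cost_before, load_cost_after):
    # Eq. (9): benefit of adding the object to the configuration.
    return load_cost_before - load_cost_after

def decide(load_cost_before, load_cost_after):
    # Eq. (10): add the object only for a strictly positive benefit.
    return "add" if profit(load_cost_before, load_cost_after) > 0 else "do nothing"

# A hypothetical index that lowers the workload cost from 500 to 350 I/Os:
decision = decide(500, 350)  # profit = 150 > 0, so "add"
```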

This query system can be static or dynamic. When the workload and the data stored in the database are invariant, the system is static; this case is discussed in Section 4. On the other hand, if the workload and the database are modifiable, the system is dynamic; this case is discussed in Section 5.

#### **4. Static workload**

A request load is the set of requests that have arrived and are waiting for their turn to be executed. This section discusses the case where the database does not change and the requests arrive successively in random order. Bellatrech et al. [3] proposed a static approach that does not support changes in the workload. The authors apply the greedy algorithm, which does not necessarily provide an optimal solution. In [6], the authors show that dynamic programming is more optimal than the greedy algorithm.

The knapsack algorithm is an example of the dynamic-programming algorithms used for optimization problems. In the proposed approach, artificial intelligence uses this algorithm only in the learning phase; afterwards, a model is created to predict the final solution and avoid executing this algorithm on each new request.


An optimal configuration is the set of materialized views and indexes which covers the workload in a reasonable time with the minimum of resources.

The following algorithm takes as input the lists of indexes and materialized views to create an optimal configuration, on condition that this configuration covers the entire query load and does not exceed the authorized storage size.

#### **Algorithm 1**. Static approach.


*Input: index set I, materialized view set V*
*Output: Config*
*Initialization: Config ← {Ø}, Sizemax = 0, Profitmax = 0*
*Start*
*1. O = I ∪ V*
*2. S = space authorized by the administrator*
*3. for (Oi ∈ O \ Config and Sizemax < S) do*
*4. if Profit(Oi) > Profitmax then*
*5. Profitmax ← Profit(Oi)*
*6. Config ← Config ∪ {Oi}*
*7. Sizemax ← Sizemax + Size(Oi)*
*8. end if*
*9. end for*
*End*
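A runnable sketch of this static approach follows; the `profit` and `size` functions are stand-ins supplied by the caller (the cost model of Section 3.2 would provide them in practice), and an extra guard is added so the configuration never overshoots the space limit:

```python
def static_selection(objects, profit, size, space_limit):
    """Greedy static approach (Algorithm 1): scan candidate indexes/MVs
    and keep each object whose profit improves on the best seen so far,
    while the configuration fits in the authorized space."""
    config, size_max, profit_max = [], 0, 0
    for obj in objects:
        if size_max >= space_limit:
            break
        p = profit(obj)
        if p > profit_max and size_max + size(obj) <= space_limit:
            profit_max = p
            config.append(obj)
            size_max += size(obj)
    return config

# Hypothetical candidates: (name, size, profit).
candidates = [("idx_a", 2, 10), ("mv_b", 3, 25), ("idx_c", 4, 5)]
chosen = static_selection(
    candidates,
    profit=lambda o: o[2],
    size=lambda o: o[1],
    space_limit=6,
)
```

With these numbers, `idx_a` and then `mv_b` are kept; `idx_c` brings no improvement over the best profit seen so far, illustrating why this greedy scan is not guaranteed to be optimal.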

S is the size of the disk space allocated to store MVs and indexes; it is fixed by the administrator.

The objective function Profit() is calculated for each index or MV. It is the difference in cost between the workload run time with and without this object (index or MV). If the object improves the system, it is added to the configuration. At the end of this algorithm, the final configuration is made up of a set of indexes and MVs representing the optimal solution. This technique considers the similarity between the two optimization structures, index and MV.

These iterations are repeated until there is no improvement in the Profit() function, until all indexes and MVs have been selected, or until the storage space limit is exceeded.

#### **5. Dynamic workload**

Changes at the database level or in the workload require a new configuration adapted to the new workload; new indexes and MVs must then be rebuilt. This operation is very time consuming.

In DynaMat [8], the authors removed the least used MVs to free space for new creations. In that approach, the authors limit themselves to using only the MV optimization structure.

To solve this inconsistency problem, authors propose three strategies. The first is that all views are updated regularly at each time interval [10]; the second is that all views are updated at the end of each transaction [11]; and the last is that changes are propagated in a delayed manner, i.e., an MV is updated only when it is used by a request.

Our approach combines the two structures (Index and Materialized views) to benefit from the structural affinity between these two optimization techniques.

In a real-time monitoring system, query processing time is important. To ensure the validity of the optimal solution after changes in the workload and the database, two artificial intelligence techniques are used.

The arrival of requests is random and varies depending on the context; on the other hand, the database can be modified during the execution of the queries. In this part we study the two cases.

We used artificial intelligence to create materialized views for the dynamic processing of the workload and to answer requests as quickly as possible. With machine learning, we proposed an algorithm that searches for the logical link between the query load and the optimal configuration; after the learning phase, it predicts the final solution (minimal configuration).

We start with a remodeling phase. Each request is represented by a vector that encodes the list of attributes it uses; on the other side, a matrix represents all the possible solutions, which are prepared in advance.

The workload $Q$ is formed by $n$ queries, i.e., $Q = \{Q_1, \dots, Q_n\}$. A query is composed of attributes from $\{a_1, \dots, a_k\}$, and each query has the following form: $Q_i = \{a_j^i\}, \forall i: 1..n,\ j: 1..k$. The activation function used in this work is presented as follows:

$$f(a\_j) = \begin{cases} \mathbf{1} \* a\_j & \text{if } a\_j \text{ used in Q} \\ \mathbf{0} \* a\_j & \text{else} \end{cases} \tag{11}$$

**Figure 1** shows the three layers of our modeling and the steps to create candidate solutions. The first step is the extraction of the attributes used in all the tables of the database, $\{a_1, \dots, a_n\}$; then a vector containing all the possible materialized views is created, i.e., the possible combinations of the attributes, $\{v_1, \dots, v_{2^n-1}\}$. A materialized view contains at least one attribute and at most all attributes. The number of MVs is $2^n - 1$.

Then come the candidate solutions, which present all the possible combinations of the MVs. The maximum number of solutions is $2^{2^n-1} - 1$.

**Figure 1.** *Final solutions tree.*

To apply machine learning, one has to start with the learning phase; in this phase the algorithm builds a logical link between the attributes and the final solutions. The duration of this phase is set by the administrator (**Figure 2**).

**Figure 2.** *Machine learning to create optimal solutions.*

The algorithm is composed of two phases: the first phase is used for training, while the second is used to predict materialized views.

**Figure 3** shows the architecture of our approach. The system administrator sets the period for learning the model. While this phase is in progress, each time a new request arrives, the system uses the knapsack algorithm to find the right configuration and at the same time trains the neural network model. At the end of the learning phase, the system uses the model produced in the first phase to predict a new optimal configuration for each arriving query.

The final solution FS is the optimal configuration that covers the workload with a reasonable execution time. With this approach, a logic is established between the requests and the final solutions to avoid recalculating each time.
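The two phases can be sketched as follows; the predictor here is a simple nearest-neighbour stand-in for the neural network model, which the chapter does not specify in code, and the solver is a hypothetical placeholder:

```python
def learn_and_predict(training_queries, solve, learning_size):
    """Phase 1: for each query vector seen during the learning period,
    compute the optimal configuration with the exact (knapsack) solver
    and memorize the (query vector -> solution) pair.
    Phase 2: return a predictor mapping a new query vector to the
    solution of the most similar memorized vector."""
    memory = []
    for vec in training_queries[:learning_size]:
        memory.append((vec, solve(vec)))  # exact solver only in phase 1

    def predict(vec):
        # Nearest neighbour by Hamming distance between attribute vectors.
        def dist(v):
            return sum(a != b for a, b in zip(v, vec))
        best_vec, best_solution = min(memory, key=lambda m: dist(m[0]))
        return best_solution

    return predict

# Hypothetical solver: "materialize a view on the attributes the query uses".
solve = lambda vec: tuple(i for i, used in enumerate(vec) if used)
predict = learn_and_predict([(1, 0, 1), (0, 1, 0)], solve, learning_size=2)
guess = predict((1, 0, 0))  # closest memorized vector is (1, 0, 1)
```

After the learning window, only the cheap `predict` call runs per query, which is what avoids re-executing the knapsack algorithm for every new request.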

The workload can be presented in the form of a matrix as follows:

$$\mathbf{M} \mathbf{a} \mathbf{t} \mathbf{A} = \begin{pmatrix} a\_1^1 \ast f(a\_1) & \cdots & a\_k^1 \ast f(a\_k) \\ \vdots & \ddots & \vdots \\ a\_1^n \ast f(a\_1) & \cdots & a\_k^n \ast f(a\_k) \end{pmatrix} \tag{12}$$
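Equations (11) and (12) can be sketched as follows; the attribute and query lists are hypothetical:

```python
def activation(attribute, query_attributes):
    # Eq. (11): 1 if the attribute is used in the query, 0 otherwise.
    return 1 if attribute in query_attributes else 0

def workload_matrix(queries, attributes):
    # Eq. (12): one row per query, one column per attribute.
    return [[activation(a, q) for a in attributes] for q in queries]

attributes = ["temp", "humidity", "ts"]       # hypothetical schema attributes
queries = [{"temp", "ts"}, {"humidity"}]      # attributes used by each query
mat_a = workload_matrix(queries, attributes)  # [[1, 0, 1], [0, 1, 0]]
```

Each row of `mat_a` is the binary encoding of one query, which is the representation fed to the learning phase.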

A final solution is a set of structures, such as indexes and MVs, that guarantees the response to the entire query load with minimal cost. Based on $N$ attributes, we can build $2^N - 1$ views, and a final solution contains between 1 and $2^N - 1$ views.

$$FS^f = \left\{ v_e^f \right\}, \text{ where } e \in (1..2^N - 1),\ f \in (1..2^{2^N-1} - 1) \tag{13}$$

In order to verify whether a materialized view $v_e$ is included in the solution $FS^f$ or not, the function $h(v_e)$, having the following form, should be used:

$$h(v_e) = \begin{cases} 1, & \text{if } v_e \text{ is used in } FS \\ 0, & \text{else} \end{cases} \tag{14}$$

The maximum number of final solutions is $2^{2^N-1} - 1$, where $N$ is the number of attributes in the database tables.

The final solutions are presented as follows

$$FS = \begin{pmatrix} v_1^1 * h(v_1) & \cdots & v_{2^N-1}^1 * h(v_{2^N-1}) \\ \vdots & \ddots & \vdots \\ v_1^F * h(v_1) & \cdots & v_{2^N-1}^F * h(v_{2^N-1}) \end{pmatrix}, \quad F = 2^{2^N-1} - 1 \tag{15}$$

The references of the final solutions are stored in a vector *VS* with the following form

$$VS = \left\{ S_f \right\}, \text{ where } f \in (1..2^{2^N-1} - 1) \tag{16}$$


*a*1

*an*

find 2*<sup>N</sup>* � 1 views and the final solution has a view between 1 and 2*<sup>N</sup>* � 1.

phase will predict the final solution (Minimum configuration).

*Wireless Sensor Networks - Design, Deployment and Applications*

possible solutions which are prepared in advance.

*f a <sup>j</sup>*

*MatA* ¼

*FS <sup>f</sup>* <sup>¼</sup> *<sup>v</sup> <sup>f</sup> e*

of attributes in database tables.

form

**216**

form: *Qi* <sup>¼</sup> *ai*

presented as follows:

*j* n o
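The remodeling of Eqs. (11) and (12) can be illustrated in a few lines of Python; the attribute names and queries below are hypothetical examples, not the chapter's dataset:

```python
# Workload matrix MatA (Eqs. (11)-(12)): row i = query Q_i, column j = attribute
# a_j; the entry is 1 when a_j is used by Q_i and 0 otherwise.
attributes = ["temp", "hum", "wind", "rain"]   # a_1 .. a_k (hypothetical)
workload = [                                   # Q_1 .. Q_n as sets of attributes
    {"temp", "hum"},
    {"wind"},
    {"temp", "rain", "wind"},
]

def f(a, q):
    """Activation function of Eq. (11): 1 if attribute a is used in query q."""
    return 1 if a in q else 0

MatA = [[f(a, q) for a in attributes] for q in workload]
for row in MatA:
    print(row)
# [1, 1, 0, 0]
# [0, 0, 1, 0]
# [1, 0, 1, 1]
```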

**Figure 1** shows the three layers of our modeling and the steps to create the candidate solutions. The first step is the extraction of the attributes used in all the tables of the database, {*a*<sub>1</sub>, …, *a<sub>n</sub>*}; then a vector containing all the possible materialized views is created, i.e., all the possible combinations of the attributes, {*v*<sub>1</sub>, …, *v*<sub>2<sup>*n*</sup>−1</sub>}. A materialized view contains at least one attribute and at most all the attributes, so the number of MVs is 2<sup>*n*</sup> − 1.

Then come the candidate solutions, which represent all the possible combinations of the MVs. The maximum number of solutions is 2<sup>2<sup>*n*</sup>−1</sup> − 1.
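These counts can be checked by enumerating the candidate structures explicitly. The sketch below (a toy illustration; the helper names are not from the chapter) builds the 2<sup>*n*</sup> − 1 views and the 2<sup>2<sup>*n*</sup>−1</sup> − 1 candidate solutions for *n* = 2, and implements the membership function *h*(*v<sub>e</sub>*) of Eq. (14); for *n* = 4 the same formula gives 2<sup>15</sup> − 1 = 32,767, the order of the 32,768 solutions used in the experiment below.

```python
from itertools import chain, combinations

def candidate_views(attributes):
    """All non-empty subsets of attributes: the 2^n - 1 candidate MVs."""
    n = len(attributes)
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(attributes, r)]

def final_solutions(views):
    """All non-empty combinations of views: 2^(2^n - 1) - 1 candidate solutions."""
    return chain.from_iterable(
        combinations(views, r) for r in range(1, len(views) + 1))

attrs = ["a1", "a2"]                  # n = 2 attributes
views = candidate_views(attrs)
print(len(views))                     # -> 3 views (2^2 - 1)

# Membership function h(v_e) of Eq. (14) for one candidate final solution FS
fs = set(views[:2])
h = lambda v: 1 if v in fs else 0
print([h(v) for v in views])          # -> [1, 1, 0]

print(sum(1 for _ in final_solutions(views)))   # -> 7 solutions (2^3 - 1)
```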

To apply machine learning, one must start with the learning phase. In this phase, the algorithm builds a logical link between the attributes and the final solutions. The duration of this phase is set by the administrator (**Figure 2**).

The algorithm is composed of two phases: the first phase is used for training, while the second is used to predict the materialized views.

**Figure 3** shows the architecture of our approach. The system administrator sets the period for learning the model. While this phase is in progress, each time a new request arrives the system uses the knapsack algorithm to find the right configuration and, at the same time, trains the neural network model. At the end of the learning phase, the system uses the model produced in the first phase to predict a new optimal configuration for each arriving query.
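The chapter does not reproduce the knapsack routine itself, so the following is only a sketch: a standard 0/1 knapsack over candidate structures, assuming each index or MV has a size and an estimated benefit coming from a cost model (the sizes and benefits below are made up):

```python
def knapsack_select(structures, capacity):
    """0/1 knapsack over candidate structures (indexes / MVs).

    structures: list of (name, size, benefit); capacity: storage budget S.
    Returns (best_total_benefit, chosen_names). Illustrative sketch only:
    real sizes and benefits would come from the optimizer's cost model.
    """
    n = len(structures)
    dp = [0] * (capacity + 1)                 # dp[w] = best benefit within w
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i, (_, size, benefit) in enumerate(structures):
        for w in range(capacity, size - 1, -1):
            if dp[w - size] + benefit > dp[w]:
                dp[w] = dp[w - size] + benefit
                keep[i][w] = True
    # Backtrack to recover the selected configuration
    chosen, w = [], capacity
    for i in range(n - 1, -1, -1):
        if keep[i][w]:
            chosen.append(structures[i][0])
            w -= structures[i][1]
    return dp[capacity], chosen[::-1]

cands = [("I1", 2, 3), ("MV1", 3, 4), ("MV2", 4, 5), ("I2", 1, 2)]
print(knapsack_select(cands, 5))   # -> (7, ['I1', 'MV1'])
```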

The final solution FS is the optimal configuration that responds to the workload with a reasonable execution time. With this approach, a logical link is established between the queries and the final solutions to avoid recalculating the configuration each time.
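A minimal sketch of such a link, with a plain lookup table standing in for the neural network and a hypothetical `exact_solver` playing the role of the knapsack search during the training window:

```python
import time

class ConfigPredictor:
    """During the training window the exact search is run for every workload
    and each (workload -> solution) pair is recorded; afterwards, previously
    seen workloads are answered from the learned link without recomputation.
    A lookup table replaces the chapter's neural network for illustration."""

    def __init__(self, exact_solver, training_seconds):
        self.exact_solver = exact_solver
        self.deadline = time.time() + training_seconds
        self.link = {}                        # learned workload -> solution map

    def solution_for(self, workload):
        key = frozenset(workload)
        if time.time() < self.deadline or key not in self.link:
            self.link[key] = self.exact_solver(workload)  # training / fallback
        return self.link[key]                 # prediction phase

# Hypothetical solver: stands in for the knapsack search over final solutions
solver = lambda wl: sum(int(q[1:]) for q in wl)
p = ConfigPredictor(solver, training_seconds=0)   # training window already over
print(p.solution_for({"Q5", "Q4"}))               # -> 9 (computed once, then cached)
```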

**Figure 1.** *Final solutions tree.*

**Figure 2.** *Machine learning to create optimal solutions.*

**Figure 3.** *Switching between the training phase and the prediction phase.*

To test the approach, the algorithm was implemented with Python 3 on a laptop computer running 64-bit Windows 10 with 8 GB of RAM. The experimental results are discussed below.

First, each query is executed with the greedy algorithm to obtain the final solution, shown by the blue dots in **Figure 4**. In the second step, our algorithm is compared with the first to see whether it is possible to predict the final solution (orange curve) without wasting time recalculating the configuration each time a request arrives.

This experiment uses a workload containing 5 queries numbered from 1 to 5 and a database of 4 distinct attributes, which gives 15 materialized views and 32,768 final solutions (**Table 2**).

| Time | Query | Workload | Index of final solution | Index of predicted final solution |
| --- | --- | --- | --- | --- |
| 09:21:00 | Q5 | Q5 | 4523 | 25,531 |
| 09:22:00 | Q4 | Q5Q4 | 2660 | 18,747 |
| 09:23:00 | Q3 | Q5Q4Q3 | 29,366 | 21,896 |
| 09:24:00 | Q4 | Q5Q4Q3 | 16,468 | 24,525 |
| 09:25:00 | Q5 | Q5Q4Q3 | 29,845 | 5103 |
| 09:26:00 | Q2 | Q5Q4Q3Q2 | 3280 | 23,163 |
| … | … | … | … | … |
| 09:42:00 | Q4 | Q5Q4Q3Q2Q1 | 23,181 | 23,181 |
| 09:43:00 | Q4 | Q5Q4Q3Q2Q1 | 20,649 | 20,649 |
| 09:44:00 | Q1 | Q5Q4Q3Q2Q1 | 8366 | 8366 |
| 09:45:00 | Q5 | Q5Q4Q3Q2Q1 | 21,667 | 21,667 |
| 09:46:00 | Q2 | Q5Q4Q3Q2Q1 | 4942 | 4942 |
| 09:47:00 | Q4 | Q5Q4Q3Q2Q1 | 11,120 | 11,120 |

**Table 2.** *Dataset final solution.*

Between 09:21 am and 09:47 am, the queries arrive randomly. At the start, the workload contains only the query Q5; for this workload the final solution is 4523, whereas the predicted final solution is 25,531, i.e., the predicted solution differs from the real solution.

**Figure 4.** *Final solution vs. predicted final solution.*

*Wireless Sensor Networks - Design, Deployment and Applications*

This figure clearly shows that after a learning phase, the algorithm manages to predict the final solution, which brings a large gain in execution time and in the resources used.

#### **5. Dynamic database**

This section discusses the case where the database is dynamic: while the queries are being executed, updates on the data are in progress. Updating all the optimization structures is very expensive, so it is wise to update only the affected optimization structures.

For this, two binary matrices are proposed and stored in the database (**Table 3**). The matrix IT[i, t] stores the link between the indexes and the tables of the database: if index number 3 is used by table number 5, then IT[3, 5] = 1, otherwise 0. Likewise for the matrix VT[v, t], which links the materialized views to the tables (**Table 4**): for example, if materialized view number 5 (MV5) is linked with table number 4, then VT[5, 4] = 1, otherwise 0.

|  | T1 | T2 | T3 | T4 | T5 | T6 |
| --- | --- | --- | --- | --- | --- | --- |
| I1 | 1 | 0 | 0 | 1 | 1 | 0 |
| I2 | 1 | 0 | 1 | 1 | 0 | 1 |
| I3 | 0 | 1 | 0 | 1 | 1 | 1 |
| I4 | 1 | 0 | 1 | 0 | 1 | 1 |
| I5 | 0 | 1 | 0 | 1 | 0 | 1 |

**Table 3.** *Matrix IT.*

To understand, consider the following example: table T1 is used by indexes I1, I2, I4 and by MVs V2, V4; table T3 is used by indexes I2, I4 and by MVs V1, V4. Each time the database is updated, it is wise to modify only the structures concerned (index or MV): whenever the database tables are updated, a trigger searches for the index or materialized view affected by the change. More details are given below (**Figure 5**).
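The lookup that the trigger performs in IT and VT, and the lock-based flow described for Algorithm 2 below, can be sketched as follows (the set-based encoding and the helper names are assumptions for illustration; the link values mirror the T1/T3 example above):

```python
# Hypothetical encoding of the IT and VT matrices as sets of
# (structure, table) links; values follow the T1/T3 example in the text.
IT = {("I1", "T1"), ("I2", "T1"), ("I4", "T1"), ("I2", "T3"), ("I4", "T3")}
VT = {("V2", "T1"), ("V4", "T1"), ("V1", "T3"), ("V4", "T3")}

def get_structure(table):
    """Indexes and MVs to refresh when `table` changes (lookup in IT and VT)."""
    idx = sorted(i for (i, t) in IT if t == table)
    mvs = sorted(v for (v, t) in VT if t == table)
    return idx + mvs

def run_cycle(trigger, update, execute, workload):
    """One iteration of the trigger/lock flow sketched in Algorithm 2:
    `trigger` returns the updated tables (or None), `update` refreshes one
    structure, `execute` runs the workload once the lock is released."""
    lock = False
    tables = trigger()
    if tables:
        lock = True                       # hold the workload back
        for t in tables:
            for s in get_structure(t):
                update(s)                 # refresh only the affected Index/MV
        lock = False                      # configuration is valid again
    if not lock:
        return execute(workload)

print(get_structure("T1"))                # -> ['I1', 'I2', 'I4', 'V2', 'V4']
refreshed = []
result = run_cycle(lambda: ["T3"], refreshed.append,
                   lambda wl: f"executed {len(wl)} queries", ["Q1", "Q2"])
print(refreshed)                          # -> ['I2', 'I4', 'V1', 'V4']
print(result)                             # -> executed 2 queries
```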

The trigger is a solution integrated in all DBMSs. It is a program that launches a series of tasks with each change in the database, and it identifies the objects to be modified in the configuration. At each update operation (insertion, update, or deletion), the trigger performs the same operation on the object concerned (index or MV). For example, if a new row is inserted into table Ti, the trigger inserts the same row into the index and the MV linked to table Ti. After each iteration, if the size of the configuration exceeds S or if the solution has become non-optimal, Algorithm 1 must be restarted.

This architecture guarantees that all the indexes and MVs form the optimal configuration even after updating the workload.

**Table 4.** *Matrix VT.*

**Figure 5.** *Algorithm of the dynamic approach.*

**Algorithm 2**. Dynamic database.

> *17. lock ← false*
> *18. End if*
> *19. if (!lock) then*
> *20. Execute(Workload)*
> *21. End if*
> *22. End for*
> *End*

While Algorithm 2 is running, the *trigger()* function returns the list of tables affected by an update; otherwise it returns null. The variable *lock* is initialized to false to prevent the execution of the workload while a configuration update is pending. If there are updates, the *lock* variable is set to true and the *get\_structure()* function searches for the structures affected by the modification; this function uses the two matrices IT and VT. Then the *Update()* function modifies the configuration to support the new updates in the tables. At the end of this operation, the *lock* variable is released and the workload is executed.

#### **6. Conclusions**

In this work, a similarity between the problem of selecting indexes and materialized views and the knapsack problem was proposed. The contributions are the following: first, the use of the knapsack algorithm to represent this problem, together with a mathematical modeling; then, the use of machine learning to reduce the execution time of the workload. In addition, two matrices (IT and VT) were used to ensure that the optimal configuration remains reliable even after updating the database. To validate this approach, an algorithm was developed in Python.

**Acknowledgements**

This work is the result of two works published as part of a Tunisian-South African research project (Grant Numbers: 113340, 120106) funded by the Ministry of Higher Education and Scientific Research and the National Research Foundation of South Africa, represented by Prof Qing-Guo WANG.

**Author details**

Kamel Abbassi\* and Tahar Ezzedine

Communication System Laboratory Sys'com, National Engineering School of Tunis, University Tunis El Manar, PB 37, Belvedere, 1002, Tunis, Tunisia

\*Address all correspondence to: kamel.abbassi@enit.utm.tn

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
