**5.2 Construction for cache demand matrix**

According to the parameters described in Section 5.1, we can construct the cache demand matrix (CDM), which gives the total cache demand of all cache files, as follows:

1. For each cache file *CFi* (1≤*i*≤*m*), calculate the duplication width vector *DWVi* = (|*SEGi1*|, |*SEGi2*|, …, |*SEGiw*|) from Segli, where *iw* is the number of segments of *CFi* and |*SEGij*| is the duplication width of the *j*th segment of *CFi*;
2. Look for the cache file that has the maximal segment number among all cache files; the maximal segment number is *k*;
3. From the vector set *DWV1*, *DWV2*, …, *DWVm* and *k*, construct the cache demand matrix *CDM* = (*qij*) (1≤*i*≤*m*, 1≤*j*≤*k*), where *qij* is the duplication width of the *j*th segment of *CFi* and *qij* ≥ 0. If the *j*th segment of *CFi* does not exist, then *qij* = 0.

In *CDM*, the sum

$$\text{Tcd} = \sum_{1 \le i \le m,\; 1 \le j \le k} q_{ij}$$

is the total cache demand of all cache files, and we hope that formula (1) holds. In practice, it generally does not.

$$\text{Tcd} \le \text{Tc} \tag{1}$$

Research and Implementation of Parallel Cache Model Through Grid Memory 145

When *Tcd*≥*Tc*, we can process the *CDM* with the cut-matrix and compress techniques so that formula (1) holds.

**5.3 Cut-matrix and compress technique**

The *cut-matrix* is a special *m×k* matrix *CM* = (*cqij*) (1≤*i*≤*m*, 1≤*j*≤*k*), where *cqij* = *Cut-v* and *Cut-v* is an integer. The process is as follows:

```
NCDM = CDM;      /* NCDM is the cut CDM that can satisfy formula (1) */
Cut-v = 0;
Repeat do
    Cut-v++;
    Set all elements of CM as Cut-v;
    NCDM = NCDM - CM;   /* - is the subtraction of matrices; if an element of
                           NCDM is 0, it stays 0 after the subtraction
                           (0 is the zero-matrix) */
    Calculate the Tcd by NCDM;
Until formula (1) is true;
```

The *NCDM* is the new cache demand matrix, and all its *DSEG*s can be cached in grid memory. The demand in *CDM-NCDM* will be saved to disk. The cut-matrix is presented in Figure 4.

According to the *CA* concept, the correspondence between a *CA* and a *DSEG* is one-to-one. However, if a set of duplications of the same segment is stored in the same cache node, the duplication is stored only once in that node, while the corresponding cache agent is established many times. This mechanism improves the utilization of memory resources, but the computing ability of the cache node may be overloaded. The average ratio of the number of *CA*s to the number of *DSEG*s is called the *cache agent density* (CAD). The compress principle is presented in Figure 5.

144 Grid Computing – Technology and Applications, Widespread Coverage and New Horizons

Fig. 4. The relations between CDM, Cut-matrix, and NCDM
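The construction of Section 5.2 and the cut loop of Section 5.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (`build_cdm`, `total_demand`, `cut`) and the sample numbers are hypothetical, the subtraction is read literally as cumulative (*NCDM* = *NCDM* - *CM* with *Cut-v* growing each round), and a `while` loop is used so that no cut is made when formula (1) already holds.

```python
def build_cdm(dwvs):
    """Build the cache demand matrix from duplication width vectors.

    dwvs[i] holds the duplication widths of the segments of cache file i.
    """
    k = max((len(v) for v in dwvs), default=0)  # maximal segment number
    # A segment that does not exist in a file gets q_ij = 0.
    return [list(v) + [0] * (k - len(v)) for v in dwvs]


def total_demand(matrix):
    """Tcd: the sum of all elements q_ij."""
    return sum(sum(row) for row in matrix)


def cut(cdm, tc):
    """Cut the CDM until formula (1), Tcd <= Tc, holds.

    Returns (ncdm, spill), where spill = CDM - NCDM is the demand left on disk.
    """
    if tc < 0:
        raise ValueError("tc must be non-negative")
    ncdm = [row[:] for row in cdm]
    cut_v = 0
    while total_demand(ncdm) > tc:
        cut_v += 1
        # NCDM = NCDM - CM with every element of CM equal to cut_v;
        # elements are clamped at 0, so a zero entry stays zero.
        ncdm = [[max(q - cut_v, 0) for q in row] for row in ncdm]
    spill = [[c - n for c, n in zip(c_row, n_row)]
             for c_row, n_row in zip(cdm, ncdm)]
    return ncdm, spill


# Hypothetical input: three cache files with 3, 2 and 1 segments.
cdm = build_cdm([[3, 2, 1], [4, 4], [1]])
ncdm, spill = cut(cdm, tc=8)
print(cdm)    # [[3, 2, 1], [4, 4, 0], [1, 0, 0]]
print(ncdm)   # [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
```

In this toy input the demand drops from 15 to 9 and then to 2, so the literal reading of the loop can cut well below *Tc*; the compress technique described next is one way to recover part of that capacity.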

**Strategy 1.** The *d-density* **allocation strategy** means that, for one segment in the same cache node, the ratio between the number of *CA*s and the number of their *DSEG*s is *d*. Namely, one *DSEG* has *d* *CA*s in the same cache node. In PCMGM, *d* is an average value.

Fig. 5. Compressed storage of the same duplication in a cache node, using the 3-density allocation strategy.

In order to use the storage of grid memory effectively, we hope that *CDM*=*NCDM*. Through the *d-density* allocation strategy, we can make formula (2) hold, where *Tcd*(*NCDM*) is the total cache demand of *NCDM*.

$$\text{Tcd}(\text{NCDM}) < \text{Tc} \times d \tag{2}$$
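As a small illustration of formula (2), the sketch below checks the inequality and estimates how many stored copies a given density needs. It assumes, beyond what the text states, that each unit of duplication width is served by one *CA*, that one stored *DSEG* copy can serve up to *d* *CA*s, and that all duplications of a segment are placed on the same node. The function names and sample values are hypothetical.

```python
def satisfies_formula_2(ncdm, tc, d):
    """Check Tcd(NCDM) < Tc * d for a given density d."""
    tcd = sum(sum(row) for row in ncdm)
    return tcd < tc * d


def stored_copies(ncdm, d):
    """Stored DSEG copies needed if one copy serves up to d CAs."""
    # (q + d - 1) // d is ceil(q / d) in integer arithmetic.
    return sum((q + d - 1) // d for row in ncdm for q in row)


ncdm = [[3, 2, 1], [4, 4, 0], [1, 0, 0]]      # total demand is 15
print(satisfies_formula_2(ncdm, tc=8, d=1))   # False
print(satisfies_formula_2(ncdm, tc=8, d=3))   # True
print(stored_copies(ncdm, d=3))               # 8
```

Because the copy count is rounded up per segment, satisfying formula (2) does not by itself guarantee that the stored copies fit in *Tc*; in this example they happen to fit exactly.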

**5.4 Cache allocation**

For all cache nodes *CNk* (1≤*k*≤*p*), we select a subset *SE* of the *NCDM* elements such that formula (3) holds, where ε is a small constant. For each *DSEG*, we adopt the *d-density* allocation strategy to appoint its *CA*s.

$$\sum_{1 \le i \le p} \text{Ma}_i - \varepsilon \le \sum_{q_{ij} \in SE} q_{ij} \le \sum_{1 \le i \le p} \text{Ma}_i + \varepsilon \tag{3}$$
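The text does not say how the subset *SE* is chosen. The sketch below uses a simple greedy rule as a stand-in, not as the authors' method: it takes the largest elements first while the running sum stays within the upper bound of formula (3), then reports whether the lower bound is also met. It assumes the memory abilities *Ma* are expressed in the same unit as *qij*; `select_subset` and the sample values are hypothetical. A greedy pass can miss a valid subset that an exhaustive search would find.

```python
def select_subset(ncdm, capacities, eps):
    """Greedily pick NCDM elements whose sum lies near sum(capacities).

    Returns (chosen, total, ok), where chosen is a list of (i, j) indices
    and ok tells whether formula (3) holds for the chosen subset.
    """
    target = sum(capacities)
    # Non-zero elements as (q, i, j), largest demand first.
    elements = sorted(
        ((q, i, j)
         for i, row in enumerate(ncdm)
         for j, q in enumerate(row) if q > 0),
        reverse=True,
    )
    chosen, total = [], 0
    for q, i, j in elements:
        if total + q <= target + eps:
            chosen.append((i, j))
            total += q
    ok = target - eps <= total <= target + eps
    return chosen, total, ok


ncdm = [[3, 2, 1], [4, 4, 0], [1, 0, 0]]
chosen, total, ok = select_subset(ncdm, capacities=[4, 4, 3], eps=1)
print(chosen)      # [(1, 1), (1, 0), (0, 0), (2, 0)]
print(total, ok)   # 12 True
```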

The *DSEG*s in *CDM-NCDM* will be written to the local disk of the cache node.

**6. Process of PCMGM**

The PCMGM process is as follows:

1. AA collects statistics from user information and reports them to GA; GA calculates the cache demand matrix CDM from the statistics and obtains an LCC (which is constructed by GA through the fuzzy partition method);
2. Each NA included in the LCC calculates its memory storage ability Ma and its computing ability Ca;
3. GA receives the idle resource information from the LCC and calculates the total computing ability Gcc and the total memory storage ability Gmc;
4. GA constructs the NCDM according to the cut-matrix and the d-density allocation strategy;
5. GA allots DSEGs to all CNs;
6. All NAs of the CNs transfer the segments from the file server into grid memory to form the DSEGs, and then construct CAs for each DSEG;
7. All CAs in PCMGM start the hot segment service;
8. All users commit their access sequences to AA, and AA redirects the connections to build the relations between the users and the CAs;
9. All CAs and users start the file operations in parallel, and PCMGM starts the dynamic adjusting mechanism by migration.

**7. Experiments**

In order to test this model, we built a *DNE* composed of 2 file servers, 8 cache nodes (personal computers), and 16 client computers, connected by a LAN. Each server has 4 disks, and the segments of a large file are distributed evenly across all disks. There are 4 files, and the length of all of them is about 1GB. The segment length is 48MB. The cache nodes are classified into 2 types according to their CPU, memory, disk, and network adapter. The types of computers are *RSV* (2400MHz, 768MB, 7200RPM, 100M) and *RSV* (1800MHz, 512MB, 7200RPM, 100M). All computers run Linux. In order to test the peak access ability for hot segments, we ignore network factors: the test agent running on the client only sends the access commands in a timely manner, the *CA* performs the file operations, and the data is not transferred to the client. The average value of the *d*-density allocation strategy is 6.

**Experiment 1.** The experiment includes two kinds of tests: one measures the response time when all *DSEG*s are stored on the file server disks; the other measures the response time when all *DSEG*s are stored in grid memory. We calculate the average ratio of the response times of these two kinds. The tests are run 6 times with different *DSEG* scales: the total number of *DSEG*s is *Tc*, 2*Tc*, …, 6*Tc* over the 6 runs. In each test, we adopt three access methods for the *DSEG*s: random access, sequential access, and mixed access composed of random and sequential access. The test results are shown in Figure 6(a), and they show that PCMGM is highly efficient compared with the disk cache method.

**Experiment 2.** We tested how the performance of PCMGM varies when *d* is 1, 2, 3, 4, 5, and 6 (the total number of DSEGs is Tc, 2Tc, …, 6Tc). The test results are shown in figure
