Data Mining and Fuzzy Data Mining Using MapReduce Algorithms

*Poli Venkata Subba Reddy*

#### **Abstract**

Data mining is a knowledge discovery process. It has to deal with both exact and inexact information. Statistical methods deal with inexact information on the basis of likelihood. Zadeh's fuzzy logic also deals with inexact information, but it is based on belief and is simple to use. Data mining consists of methods and classifications, which are discussed here for both exact and inexact information. Retrieval of information is important in data mining. The time and space complexity of retrieval is high for big data and has to be reduced. The time complexity is reduced through the consecutive retrieval (C-R) property, and the space complexity is reduced with blackboard systems. Data mining for web data is discussed. In web data mining, the original data have to be disclosed; fuzzy web data mining is discussed for the security of data. Fuzzy web programming is also discussed. Data mining, fuzzy data mining, and web data mining are discussed through MapReduce algorithms.

**Keywords:** data mining, fuzzy logic, fuzzy data mining, web data mining, fuzzy MapReduce algorithms

#### **1. Introduction**

Data mining is an emerging area for knowledge discovery that extracts hidden and useful information from large amounts of data. Data mining methods such as association rules, clustering, and classification use algorithms such as decision trees and k-means for different purposes and goals. The research fields of data mining include machine learning, deep learning, and sentiment analysis. For big data analysis, information has to be retrieved within a reasonable time. This may be achieved through the consecutive retrieval (C-R) of datasets for queries. The C-R property was first introduced by Ghosh [1] and was later extended to statistical databases. The C-R cluster property is a presorting that stores the datasets for clusters. In this chapter, the C-R property is extended to cluster analysis, and MapReduce algorithms are studied for cluster analysis. The time and space complexity can be reduced through the C-R cluster property. Security of the data is one of the major issues for data analytics and data science when the original data are not to be disclosed.

Web programming has to handle incomplete information. Web intelligence is an emerging area that performs data mining to handle incomplete information. The incomplete information is fuzzy rather than probabilistic. In this chapter, fuzzy web programming is discussed to deal with data mining using fuzzy logic. The fuzzy algorithmic language, called FUZZYALGOL, is discussed for designing queries in data mining. Some examples are discussed for web programming with fuzzy data mining.


*DOI: http://dx.doi.org/10.5772/intechopen.92232*

*Data Mining - Methods, Applications and Systems*

#### **2. Data mining**

Data mining [2–5] is performed for knowledge discovery. Some of the well-known data mining methods are frequent itemset mining, association rule mining, and clustering. Data warehousing is the representation of a relational dataset in two or more dimensions. It is possible to reduce the space complexity of data mining with consecutive storage of data warehouses.

The relational dataset is a representation of data with attributes and tuples.

**Definition**: A relational dataset $R$ (or cluster dataset) is defined as a collection of attributes $A_1, A_2, \ldots, A_m$ and tuples $t_1, t_2, \ldots, t_n$ and is represented as

$$R = A_1 \times A_2 \times \cdots \times A_m,$$

$$t_i = a_{i1} \times a_{i2} \times \cdots \times a_{im}, \quad i = 1, 2, \ldots, n,$$

or, written as a relation $R(A_1, A_2, \ldots, A_m)$, with tuples

$$R(t_i) = (a_{i1}, a_{i2}, \ldots, a_{im}), \quad i = 1, 2, \ldots, n.$$

For instance, two sample datasets "price" and "sales" are given in **Tables 1** and **2**, respectively.

| INo | IName | Price |
|---|---|---|
| I005 | Shirt | 100 |
| I007 | Dress | 50 |
| I004 | Pants | 80 |
| I008 | Jacket | 60 |
| I009 | Skirt | 100 |

**Table 1.** *Sample dataset "price."*

| INo | IName | Sales |
|---|---|---|
| I005 | Shirt | 80 |
| I007 | Dress | 60 |
| I004 | Pants | 100 |
| I008 | Jacket | 50 |
| I009 | Skirt | 80 |

**Table 2.** *Sample dataset "sales."*

The lossless join of the datasets "price" and "sales" is given in **Table 3**.

| INo | IName | Sales | Price |
|---|---|---|---|
| I005 | Shirt | 80 | 100 |
| I007 | Dress | 60 | 50 |
| I004 | Pants | 100 | 80 |
| I008 | Jacket | 50 | 60 |
| I009 | Skirt | 80 | 100 |

**Table 3.** *Lossless join of the price and sales datasets.*

In the following, some of the methods (frequency, association rule, and clustering) are discussed.

Consider the "purchase" relational dataset given in **Table 4**.

| CNo | INo | IName | Price |
|---|---|---|---|
| C001 | I005 | shirt | 100 |
| C001 | I007 | Dress | 50 |
| C003 | I004 | pants | 80 |
| C002 | I007 | dress | 80 |
| C001 | I008 | Jacket | 60 |
| C002 | I005 | shirt | 100 |

**Table 4.** *Sample dataset "purchase."*

#### **2.1 Frequency**

Frequency is the number of times data occur repeatedly. Consider the following query: find the customers who frequently purchase more than one item.

SELECT P.CNo, P.INo, IName, COUNT(\*) FROM purchase P WHERE COUNT(\*)>1

The output of this query is given in **Table 5**.

| CNo | INo | COUNT |
|---|---|---|
| C001 | I005 | 2 |
| C002 | I005 | 2 |

**Table 5.** *Frequency.*

#### **2.2 Association rule**

An *association rule* is a relationship among the data. Consider the following query: find the customers who purchase shirt and dress, <shirt ⇔ dress>.

SELECT P.CNo, P.INo FROM purchase P WHERE IName="shirt" and IName="dress"

The output of this query is given in **Table 6**.
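The frequency and association queries can be tried out on the rows of the "purchase" dataset (Table 4). The following is a minimal sketch using Python's built-in `sqlite3` module, not the chapter's own implementation. It assumes that the count condition is meant as `GROUP BY ... HAVING` (standard SQL does not accept `COUNT(*)` inside `WHERE`) and that "shirt and dress" means customers who bought both items, compared case-insensitively. The function names are illustrative.

```python
import sqlite3

# Rows of the "purchase" dataset as listed in Table 4.
PURCHASE_ROWS = [
    ("C001", "I005", "shirt", 100),
    ("C001", "I007", "Dress", 50),
    ("C003", "I004", "pants", 80),
    ("C002", "I007", "dress", 80),
    ("C001", "I008", "Jacket", 60),
    ("C002", "I005", "shirt", 100),
]


def build_purchase_db():
    """Create an in-memory SQLite database holding the purchase table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchase (CNo TEXT, INo TEXT, IName TEXT, Price INTEGER)"
    )
    conn.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?)", PURCHASE_ROWS)
    return conn


def frequent_customers(conn):
    """Customers with more than one purchase (aggregate filter via HAVING)."""
    sql = """
        SELECT P.CNo, COUNT(*)
        FROM purchase P
        GROUP BY P.CNo
        HAVING COUNT(*) > 1
        ORDER BY P.CNo
    """
    return conn.execute(sql).fetchall()


def shirt_and_dress_customers(conn):
    """Customers who bought both a shirt and a dress (case-insensitive)."""
    sql = """
        SELECT P.CNo
        FROM purchase P
        WHERE LOWER(P.IName) IN ('shirt', 'dress')
        GROUP BY P.CNo
        HAVING COUNT(DISTINCT LOWER(P.IName)) = 2
        ORDER BY P.CNo
    """
    return [row[0] for row in conn.execute(sql)]


conn = build_purchase_db()
print(frequent_customers(conn))
print(shirt_and_dress_customers(conn))
```

Under these assumptions both queries report customers C001 and C002. A single row can never satisfy `IName="shirt" and IName="dress"` at the same time, which is why the sketch groups the rows per customer before testing for both items.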


| CNo | INo |
|---|---|
| C001 | I005 |
| C002 | I005 |

**Table 6.** *Association.*

#### **2.3 Clustering**

*Clustering* is grouping the particular data. Consider the following query: group the customers who purchase dress and shirt. The output of this query is given in **Table 7**.

| CNo | INo | IName | Price |
|---|---|---|---|
| C001 | I005 | shirt | 100 |
| C001 | I007 | Dress | 50 |
| C002 | I005 | shirt | 100 |
| C002 | I007 | dress | 80 |

**Table 7.** *Clustering.*

#### **3. Data mining using C-R cluster property**

The C-R (consecutive retrieval) property [1, 3] is the retrieval of the records of a database consecutively. Suppose $R = \{r_1, r_2, \ldots, r_n\}$ is the dataset of records and $C = \{C_1, C_2, \ldots, C_m\}$ is the set of clusters.

The best type of file organization on a linear storage is one in which the records pertaining to each cluster are stored in consecutive locations without redundantly storing any data of $R$.

If there exists such an organization of $R$ for $C$, then $C$ is said to have the consecutive retrieval property, or C-R cluster property, with respect to dataset $R$. The C-R cluster property is then applicable to linear storage.

The C-R cluster property is a binary relation between a cluster set and a dataset.

If a cluster in a cluster set $C$ is relevant to the data in a dataset $R$, the relevancy is denoted by 1, and the irrelevancy is denoted by 0. Thus, the relevancy between cluster set $C$ and dataset $R$ can be represented as an ($n \times m$) matrix, as shown in **Table 8**. The matrix is called the dataset-cluster incidence matrix (CIM).

| R | $C_1$ | $C_2$ | … | $C_m$ |
|---|---|---|---|---|
| $r_1$ | 1 | 0 | … | 1 |
| $r_2$ | 0 | 1 | … | 0 |
| … | … | … | … | … |
| $r_n$ | 1 | 1 | … | 1 |

**Table 8.** *Incidence matrix.*

Consider the dataset for customer account given in **Table 9**.

| R | CNo | IName | Sales |
|---|---|---|---|
| r1 | 70001 | Shirt | 150 |
| r2 | 70002 | Dress | 30 |
| r3 | 70003 | Pants | 100 |
| r4 | 60001 | Dress | 50 |
| r5 | 60002 | Jacket | 75 |
| r6 | 60003 | Shirt | 120 |
| r7 | 60004 | Dress | 40 |

**Table 9.** *Storage of sales.*

The dataset given in **Table 9** is reorganized by sorting on sales in descending order, as shown in **Table 10**.

| R | CNo | IName | Sales |
|---|---|---|---|
| r1 | 70001 | Shirt | 150 |
| r6 | 60003 | Shirt | 120 |
| r3 | 70003 | Pants | 100 |
| r5 | 60002 | Jacket | 75 |
| r4 | 60001 | Dress | 50 |
| r7 | 60004 | Dress | 40 |
| r2 | 70002 | Dress | 30 |

**Table 10.** *Reorganizing for C-R cluster.*
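The benefit of presorting can be illustrated with the sales records of Table 9. The sketch below is an illustration under stated assumptions, not the chapter's algorithm: it stores the records in descending order of sales and then answers a threshold query by returning one consecutive block of the stored list. The function names `presort` and `retrieve_at_least` are hypothetical.

```python
from bisect import bisect_right

# Records of Table 9: (record id, customer number, item name, sales).
SALES = [
    ("r1", "70001", "Shirt", 150),
    ("r2", "70002", "Dress", 30),
    ("r3", "70003", "Pants", 100),
    ("r4", "60001", "Dress", 50),
    ("r5", "60002", "Jacket", 75),
    ("r6", "60003", "Shirt", 120),
    ("r7", "60004", "Dress", 40),
]


def presort(records):
    """Store the records in descending order of sales (the presorting step)."""
    return sorted(records, key=lambda rec: rec[3], reverse=True)


def retrieve_at_least(stored, threshold):
    """Return the consecutive block of stored records with sales >= threshold."""
    # Negate the keys so the descending store looks ascending to bisect.
    negated = [-rec[3] for rec in stored]
    end = bisect_right(negated, -threshold)
    return stored[:end]


stored = presort(SALES)
print([rec[0] for rec in stored])
print([rec[0] for rec in retrieve_at_least(stored, 100)])
```

Because the matching records sit next to each other, the query is answered with one binary search and one slice instead of a scan over all records.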

Consider the following clusters of queries:

$C_1$ = Find the customers whose sales are greater than or equal to 100.

$C_2$ = Find the customers whose sales are less than 100.

$C_3$ = Find the customers whose sales are greater than or equal to the average sales.

$C_4$ = Find the customers whose sales are less than the average sales.

The CIM is given in **Table 11**.

The dataset given in **Table 11** is reorganized by sorting on $C_1$ in descending order, as shown in **Table 12**. Thus, $C_1$ has the C-R cluster property.

The dataset given in **Table 11** is reorganized by sorting on $C_2$ in descending order, as shown in **Table 13**. Thus, $C_2$ has the C-R cluster property.

The dataset given in **Table 11** is reorganized by sorting on $C_3$ in descending order, as shown in **Table 14**. Thus, $C_3$ has the C-R cluster property.

The dataset given in **Table 11** is reorganized by sorting on $C_4$ in descending order, as shown in **Table 15**. Thus, $C_4$ has the C-R cluster property.
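Whether a cluster has the C-R cluster property for a given storage order can be checked mechanically: the 1s of its incidence column must form one unbroken run. The sketch below uses the incidence values of the cluster incidence matrix (Table 11). The single record ordering `REORGANIZED` is an assumption chosen for illustration (the chapter sorts on one cluster at a time), and the function names are hypothetical.

```python
# Cluster incidence matrix of Table 11: record -> (C1, C2, C3, C4).
CIM = {
    "r1": (1, 0, 1, 0),
    "r2": (0, 1, 0, 1),
    "r3": (1, 0, 1, 0),
    "r4": (0, 1, 0, 1),
    "r5": (0, 1, 1, 0),
    "r6": (1, 0, 1, 0),
    "r7": (0, 1, 0, 1),
}

ORIGINAL = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
# Assumed storage order used only for this illustration.
REORGANIZED = ["r1", "r3", "r6", "r5", "r2", "r4", "r7"]


def has_consecutive_ones(bits):
    """True when all 1s in the sequence form a single unbroken run."""
    ones = [i for i, bit in enumerate(bits) if bit == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def clusters_with_cr(cim, order):
    """For each cluster column, report whether its 1s are consecutive."""
    n_clusters = len(next(iter(cim.values())))
    return [
        has_consecutive_ones([cim[record][j] for record in order])
        for j in range(n_clusters)
    ]


print(clusters_with_cr(CIM, ORIGINAL))
print(clusters_with_cr(CIM, REORGANIZED))
```

In the original order none of the four columns has consecutive 1s, whereas the assumed reorganized order makes the 1s consecutive for all four clusters at once.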

| R | $C_1$ | $C_2$ | $C_3$ | $C_4$ |
|---|---|---|---|---|
| r1 | 1 | 0 | 1 | 0 |
| r2 | 0 | 1 | 0 | 1 |
| r3 | 1 | 0 | 1 | 0 |
| r4 | 0 | 1 | 0 | 1 |
| r5 | 0 | 1 | 1 | 0 |
| r6 | 1 | 0 | 1 | 0 |
| r7 | 0 | 1 | 0 | 1 |

**Table 11.** *Cluster incidence matrix.*

| R | $C_1$ |
|---|---|
| r1 | 1 |
| r3 | 1 |
| r6 | 1 |
| r2 | 0 |
| r4 | 0 |
| r5 | 0 |
| r7 | 0 |

**Table 12.** *Sorting on $C_1$.*

| R | $C_2$ |
|---|---|
| r1 | 0 |
| r3 | 0 |
| r6 | 0 |
| r2 | 1 |
| r4 | 1 |
| r5 | 1 |
| r7 | 1 |

**Table 13.** *Sorting on $C_2$.*

| R | $C_3$ |
|---|---|
| r1 | 1 |
| r3 | 1 |
| r5 | 1 |
| r6 | 1 |
| r2 | 0 |
| r4 | 0 |
| r7 | 0 |

**Table 14.** *Sorting on $C_3$.*

| R | $C_4$ |
|---|---|
| r1 | 0 |
| r3 | 0 |
| r5 | 0 |
| r6 | 0 |
| r2 | 1 |
| r4 | 1 |
| r7 | 1 |

**Table 15.** *Sorting on $C_4$.*

The dataset for $C_1 \bowtie C_2$ has the C-R cluster property (**Table 16**).

| R | $C_1 \bowtie C_2$ |
|---|---|
| r1 | 1 |
| r3 | 1 |
| r6 | 1 |
| r2 | 1 |
| r4 | 1 |
| r5 | 1 |
| r7 | 1 |

**Table 16.** *$C_1 \bowtie C_2$.*

The dataset for $C_3 \bowtie C_4$ has the C-R cluster property (**Table 17**).

| R | $C_3 \bowtie C_4$ |
|---|---|
| r1 | 1 |
| r3 | 1 |
| r5 | 1 |
| r6 | 1 |
| r2 | 1 |
| r4 | 1 |
| r7 | 1 |

**Table 17.** *$C_3 \bowtie C_4$.*

The dataset for $C_1 \bowtie C_3$ has the C-R cluster property (**Table 18**). The dataset for $C_2 \bowtie C_4$ has the C-R cluster property (**Table 19**). The dataset for $C_2 \bowtie C_3$ has the C-R cluster property (**Table 20**). The cluster sets $\{C_1 \bowtie C_2, C_3 \bowtie C_4, C_1 \bowtie C_3, C_2 \bowtie C_4, C_2 \bowtie C_3\}$ thus have the C-R cluster property with respect to dataset $R$.

| R | $C_1 \bowtie C_3$ |
|---|---|
| r1 | 1 |
| r3 | 1 |
| r6 | 1 |
| r2 | 1 |
| r4 | 0 |
| r5 | 0 |
| r7 | 0 |

**Table 18.** *$C_1 \bowtie C_3$.*

| R | $C_2 \bowtie C_4$ |
|---|---|
| r1 | 0 |
| r3 | 0 |
| r6 | 0 |
| r2 | 1 |
| r4 | 1 |
| r5 | 1 |
| r7 | 1 |

**Table 19.** *$C_2 \bowtie C_4$.*

| R | $C_2 \bowtie C_3$ |
|---|---|
| r1 | 1 |
| r3 | 1 |
| r6 | 1 |
| r2 | 1 |
| r4 | 1 |
| r5 | 1 |
| r7 | 1 |

**Table 20.** *$C_2 \bowtie C_3$.*

#### **3.1 Design of parallel C-R cluster property**

The design of parallel clusters is studied through the C-R cluster property. It can be studied in two ways: parallel cluster design through a graph theoretical approach and parallel cluster design through a response vector approach.

The C-R cluster property between cluster set $C$ and dataset $R$ can be stated in terms of the properties of vectors. The data cluster incidences of a cluster set $C$ with the C-R cluster property may be represented as a response vector set $V$. For instance, the cluster set $\{C_1, C_2, C_3, C_4\}$ has the response vector set $\{V_1 = (1,1,1,0,0,0,0),\ V_2 = (0,0,0,1,1,1,1),\ V_3 = (1,1,1,1,0,0,0),\ V_4 = (0,0,0,0,1,1,1)\}$ (**Tables 21–23**).

For instance, the response vector of the cluster $C_1$ is given by the column vector $(1,1,1,0,0,0,0)$.

Suppose $C_i$ and $C_j$ are two clusters with response vectors $V_i$ and $V_j$. If the intersection $V_i \cap V_j = \emptyset$, then the cluster set $\{C_i, C_j\}$ has the parallel cluster property. Consider the vectors $V_1$ and $V_2$ of $C_1$ and $C_2$. The intersection $V_1 \cap V_2 = \emptyset$, so the cluster set $\{C_1, C_2\}$ has the parallel cluster property. Similarly, the cluster set $\{C_3, C_4\}$ has the parallel cluster property. The cluster set $\{C_2, C_3\}$ does not have the parallel cluster property because $V_2 \cap V_3 \neq \emptyset$: the record $r_2$ depends on both $C_2$ and $C_3$.

| R | $C_1$ | $C_2$ |
|---|---|---|
| r1 | 1 | 0 |
| r3 | 1 | 0 |
| r6 | 1 | 0 |
| r2 | 0 | 1 |
| r4 | 0 | 1 |
| r5 | 0 | 1 |
| r7 | 0 | 1 |

**Table 21.** *{$C_1$, $C_2$}.*

| R | $C_3$ | $C_4$ |
|---|---|---|
| r1 | 1 | 0 |
| r3 | 1 | 0 |
| r6 | 1 | 0 |
| r2 | 1 | 0 |
| r4 | 0 | 1 |
| r5 | 0 | 1 |
| r7 | 0 | 1 |

**Table 22.** *{$C_3$, $C_4$}.*

| R | $C_2$ | $C_3$ |
|---|---|---|
| r1 | 0 | 1 |
| r3 | 0 | 1 |
| r6 | 0 | 1 |
| r2 | 1 | 1 |
| r4 | 1 | 0 |
| r5 | 1 | 0 |
| r7 | 1 | 0 |

**Table 23.** *{$C_2$, $C_3$}.*
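The response vector test for the parallel cluster property can be sketched as a component-wise intersection. This is an illustrative reading, assuming all four vectors are written over one common record ordering and that $V_3$ has seven components, $(1,1,1,1,0,0,0)$, as in the graph representation of Section 3.2. The helper names are hypothetical.

```python
# Response vectors of the four clusters, assumed to share one record ordering.
RESPONSE = {
    "C1": (1, 1, 1, 0, 0, 0, 0),
    "C2": (0, 0, 0, 1, 1, 1, 1),
    "C3": (1, 1, 1, 1, 0, 0, 0),
    "C4": (0, 0, 0, 0, 1, 1, 1),
}


def intersection(u, v):
    """Component-wise AND of two 0/1 response vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("response vectors must have the same length")
    return tuple(a & b for a, b in zip(u, v))


def is_parallel(u, v):
    """Two clusters are parallel when their vectors share no record."""
    return not any(intersection(u, v))


for left, right in [("C1", "C2"), ("C3", "C4"), ("C2", "C3")]:
    print(left, right, is_parallel(RESPONSE[left], RESPONSE[right]))
```

An empty intersection means that no record is needed by both clusters, so the two clusters can be retrieved independently of each other.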

#### **3.2 Visual design for parallel cluster**

The C-R cluster property is studied with a graphical approach. This graphical approach can be used for designing parallel cluster processing (PCP).

Suppose $V_i$ is a vertex of the RICM of $C$. The graph $G(C)$ is defined by vertices $V_i$, $i = 1, 2, \ldots, n$, and two vertices have an edge $E_{ij}$ associated with the interval $I_i = \{V_i, V_{i+1}\}$, $i = 1, \ldots, n-1$.

If $G(C)$ has the C-R cluster property, the vertices of $G(C)$ have consecutive 1's or 0's. Consider the cluster set $\{C_1, C_2\}$. $G(C_1)$ has the vertices $(1,1,1,0,0,0,0)$, $G(C_2)$ has the vertices $(0,0,0,1,1,1,1)$, $G(C_3)$ has the vertices $(1,1,1,1,0,0,0)$, and $G(C_4)$ has the vertices $(0,0,0,0,1,1,1)$.

The parallel cluster property exists if $G(C_i) \cap G(C_j) = \emptyset$.

For instance, consider $G(C_1)$ and $G(C_2)$. $G(C_1) \cap G(C_2) = \emptyset$, so the cluster set $\{C_1, C_2\}$ has the parallel cluster property. The graphical representation is shown in **Figure 1**.

**Figure 1.** *{C1, C2}.*

Similarly, the cluster set $\{C_3, C_4\}$ has the parallel cluster property (PCP). The cluster set $\{C_2, C_3\}$ has no PCP because $G(C_2) \cap G(C_3) \neq \emptyset$.

The graphs with $G(C_1) \cap G(C_2) = \emptyset$ have the consecutive cluster property.

The graphs with $G(C_3) \cap G(C_4) = \emptyset$ have the consecutive cluster property. The graphical representation is shown in **Figure 2**.

**Figure 2.** *{C3, C4}.*

The graphs with $G(C_2) \cap G(C_3) \neq \emptyset$ do not have the consecutive cluster property. The graphical representation is shown in **Figure 3**.

**Figure 3.** *{C2, C3}.*

#### **3.3 Parallel cluster design through genetic approach**

Genetic algorithms (GAs) are inspired by Darwin's theory of natural selection [6]. GAs are used to learn and to optimize a problem [7]. There are four evaluation processes:

• Selection

• Reproduction

• Mutation

• Competition

Consider the following crossover with two cuts:

Parent #1 00001111

Parent #2 11110000

Parent #1 and parent #2 match with crossover.

The C-R cluster property is studied through a genetic approach. This study helps in designing parallel cluster processing (PCP).

**Definition:** The gene $G$ of cluster $G(C)$ is defined as an incidence sequence. Suppose $G(C_1)$ is the parent genome and $G(C_2)$ is the child genome of the cluster incidences for $C_1$ and $C_2$.

Suppose $G(C_1)$ has $(1,1,1,0,0,0,0)$ and $G(C_2)$ has $(0,0,0,1,1,1,1)$. The parallel cluster property may be designed using the genetic approach with the C-R cluster property.

Suppose $C$ is the cluster set, $R$ is the dataset, and $G(C)$ is the genetic set.

The parallel cluster property exists if $G(C_i)$ and $G(C_j)$ match with crossover. For instance,

$G(C_1)$ = 11110000

$G(C_2)$ = 00001111

$G(C_1)$ and $G(C_2)$ match with the crossover.

The cluster set $\{C_1, C_2\}$ has the parallel cluster property.

Similarly, the cluster set $\{C_3, C_4\}$ has the parallel cluster property. The cluster set $\{C_2, C_3\}$ has no PCP because $G(C_2)$ and $G(C_3)$ are not matched with crossover.
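The crossover with two cuts can be sketched on the bit strings used in this section. The chapter does not spell out what "match with crossover" means, so the sketch makes two loudly stated assumptions: `two_point_crossover` is the standard two-point crossover operator from genetic algorithms, and two genes are taken to "match" when no position carries a 1 in both (the helper `genes_disjoint`). Both names and the cut positions are hypothetical.

```python
def two_point_crossover(parent1, parent2, cut1, cut2):
    """Swap the segment between the two cuts of two equal-length bit strings."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if not 0 <= cut1 <= cut2 <= len(parent1):
        raise ValueError("cuts must satisfy 0 <= cut1 <= cut2 <= length")
    child1 = parent1[:cut1] + parent2[cut1:cut2] + parent1[cut2:]
    child2 = parent2[:cut1] + parent1[cut1:cut2] + parent2[cut2:]
    return child1, child2


def genes_disjoint(gene1, gene2):
    """Assumed reading of 'match': no position holds a 1 in both genes."""
    return all(not (a == "1" and b == "1") for a, b in zip(gene1, gene2))


parent1, parent2 = "00001111", "11110000"
child1, child2 = two_point_crossover(parent1, parent2, 2, 6)
print(child1, child2)
print(genes_disjoint(parent1, parent2), genes_disjoint(child1, child2))
```

With complementary parents, the children produced by the swap are again complementary, so the disjointness that signals the parallel cluster property is preserved by the operator. Genes such as 0001111 and 1111000, which share a 1 in one position, fail the check.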

#### **3.4 Parallel cluster design cluster analysis**

*Clustering* is grouping the particular data according to their properties, and sample clusters *C*<sup>1</sup> and *C*<sup>2</sup> are given in **Tables 24** and **25**, respectively.


**Table 24.**

*Cluster* C*1.*


memory but in blackboard architecture, the data item source is direct from the blackboard structure. For the retrieval of information for a query, data item is directly retrieved from the blackboard which contains data item sources. Hash

The blackboard systems may be constructed with data structure for data item

When the transaction is being processed, there is no need to take the entire database into the main memory. It is sufficient to retrieval of particular data item of

The advantage of blackboard architecture is highly secured for blockchain

Sometimes, data mining is unable to deal with incomplete database and unable to combine the data and reasoning. Fuzzy data mining [6, 7, 9–18] will combine the data and reasoning by defining with fuzziness. The fuzzy MapReducing algorithms have two functions: *mapping* reads fuzzy datasets and *reducing* writes the after

**Definition**: Given some universe of discourse *X*, a fuzzy set is defined as a pair {*t*, μd(*t*)}, where *t* is tuples and *d* is domains and membership function μd(*x*) is taking values on the unit interval [0,1], i.e., μd(*t*) ➔ [0,1], where *ti*Є*X* is tuples

**R1 d1 d2 . dm μ** t1 a11 a12 . a1m μd(t1) t2 a21 a22 A2m μd(t2) . . .. . . tn a1n a1n . Anm μd(tn)

transaction. The blockchain technology has no third-party interference.

function may be used to store the data item set in the blackboard.

Consider the account (AC-No, AC-Name, AC-Balance)

*Data Mining and Fuzzy Data Mining Using MapReduce Algorithms*

*DOI: http://dx.doi.org/10.5772/intechopen.92232*

Each data item is data sourced which is mapped by h(*x*). These data items are stored in blackboard structure.

particular transaction from the blackboard system (**Figure 4**).

Here AC-No is key of datasets.

**5. Fuzzy data mining**

operations.

(**Table 28**).

**Table 28.** *Fuzzy dataset.*

**189**

sources.

**Figure 4.** *Blackboard system.*

**Table 25.** *Cluster* C*2.*

Thus, the *C*<sup>1</sup> and *C*<sup>2</sup> have consecutive parallel cluster property (**Tables 26** and **27**).


**Table 26.** *Cluster* C*3.*


**Table 27.** *Cluster* C*4.*

Thus, the *C*<sup>3</sup> and *C*<sup>4</sup> have consecutive parallel properly. *C*<sup>2</sup> and *C*<sup>3</sup> do not have consecutive parallel cluster property because *r*<sup>2</sup> is common.

#### **4. Design of retrieval of cluster using blackboard system**

Retrieval of clusters from blackboard system [8] is the direct retrieval of data sources. When the query is being processed, the entire database has to bring to main *Data Mining and Fuzzy Data Mining Using MapReduce Algorithms DOI: http://dx.doi.org/10.5772/intechopen.92232*

**Figure 4.** *Blackboard system.*

**3.4 Parallel cluster design cluster analysis**

*Data Mining - Methods, Applications and Systems*

*Clustering* is grouping the particular data according to their properties, and

**R C1** r1 1 r3 1 r6 1

**R C2** r2 1 r4 1 r5 1 r7 1

**R C3** r1 1 r3 1 r5 1 r6 1

sample clusters *C*<sup>1</sup> and *C*<sup>2</sup> are given in **Tables 24** and **25**, respectively.

Thus, the *C*<sup>1</sup> and *C*<sup>2</sup> have consecutive parallel cluster property

Thus, the *C*<sup>3</sup> and *C*<sup>4</sup> have consecutive parallel properly. *C*<sup>2</sup> and *C*<sup>3</sup> do not have

**R C4** r2 1 r4 1 r7 1

Retrieval of clusters from blackboard system [8] is the direct retrieval of data sources. When the query is being processed, the entire database has to bring to main

consecutive parallel cluster property because *r*<sup>2</sup> is common.

**4. Design of retrieval of cluster using blackboard system**

(**Tables 26** and **27**).

**Table 24.** *Cluster* C*1.*

**Table 25.** *Cluster* C*2.*

**Table 26.** *Cluster* C*3.*

**Table 27.** *Cluster* C*4.*

**188**

memory but in blackboard architecture, the data item source is direct from the blackboard structure. For the retrieval of information for a query, data item is directly retrieved from the blackboard which contains data item sources. Hash function may be used to store the data item set in the blackboard.

The blackboard systems may be constructed with data structure for data item sources.

Consider the account (AC-No, AC-Name, AC-Balance)

Here AC-No is key of datasets.

Each data item is data sourced which is mapped by h(*x*).

These data items are stored in blackboard structure.

When the transaction is being processed, there is no need to take the entire database into the main memory. It is sufficient to retrieval of particular data item of particular transaction from the blackboard system (**Figure 4**).

An advantage of the blackboard architecture is that it is highly secure for blockchain transactions, since blockchain technology involves no third-party interference.

#### **5. Fuzzy data mining**

Sometimes, data mining is unable to deal with an incomplete database and unable to combine data with reasoning. Fuzzy data mining [6, 7, 9–18] combines data and reasoning by defining the data with fuzziness. The fuzzy MapReduce algorithms have two functions: *mapping* reads the fuzzy datasets, and *reducing* writes the results after the operations.

**Definition**: Given some universe of discourse *X*, a fuzzy set is defined as a pair {*t*, μd(*t*)}, where *t* is a tuple, *d* is a domain, and the membership function μd(*t*) takes values on the unit interval [0,1], i.e., μd(*t*) ➔ [0,1], where *ti* ∈ *X* are tuples (**Table 28**).

The sale is defined intermittently with fuzziness (**Tables 29–32**).

| CNo | INo | IName | Demand |
|---|---|---|---|
| C001 | I005 | Shirt | 0.9 |
| C001 | I007 | Dress | 0.65 |
| C003 | I004 | Pants | 0.85 |
| C002 | I007 | Dress | 0.6 |
| C001 | I008 | Jacket | 0.65 |
| C002 | I005 | Shirt | 0.9 |

**Table 29.** *Fuzzy demand.*

μDemand(*x*) = 0.9/90 + 0.85/80 + 0.8/75 + 0.65/70

or fuzziness may be defined with the function

μDemand(*x*) = (1 + (Demand − 100)/100)<sup>1</sup> for Demand ≤ 100

μDemand(*x*) = 1 for Demand > 100

A. *Negation*

| CNo | INo | IName | Negation of price |
|---|---|---|---|
| C001 | I005 | Shirt | 0.3 |
| C001 | I007 | Dress | 0.5 |
| C003 | I004 | Pants | 0.4 |
| C002 | I007 | Dress | 0.5 |
| C001 | I008 | Jacket | 0.4 |
| C002 | I005 | Shirt | 0.3 |

**Table 30.** *Negation of price.*

B. *Union*

| CNo | INo | IName | Sales U price |
|---|---|---|---|
| C001 | I005 | Shirt | 0.8 |
| C001 | I007 | Dress | 0.5 |
| C003 | I004 | Pants | 0.6 |
| C002 | I007 | Dress | 0.5 |
| C001 | I008 | Jacket | 0.6 |
| C002 | I005 | Shirt | 0.7 |

**Table 31.** *Sales U price.*

Union of I005 = max{0.8, 0.7} = 0.8

| INo | IName | Sales |
|---|---|---|
| I005 | Shirt | 0.8 |
| I007 | Dress | 0.5 |
| I004 | Pants | 0.6 |
| I007 | Dress | 0.5 |
| I008 | Jacket | 0.6 |

**Table 32.** *Items-sales.*

The fuzzy semijoin is given by sales ⋈ items-sales, as shown in **Table 33**.

| CNo | INo | IName | Sales |
|---|---|---|---|
| C001 | I005 | Shirt | 0.8 |
| C001 | I007 | Dress | 0.5 |
| C003 | I004 | Pants | 0.6 |
| C002 | I007 | Dress | 0.5 |
| C001 | I008 | Jacket | 0.7 |
| C002 | I005 | Shirt | 0.7 |

**Table 33.** *Fuzzy semijoin.*

The fuzzy k-means clustering algorithm (FKCA) is an optimization algorithm for fuzzy datasets (**Table 34**).

| CNo | INo | IName | Sales |
|---|---|---|---|
| C001 | I005⇔I007 | Shirt⇔Dress | 0.4 |
| C003 | I004 | Pants | 0.6 |
| C002 | I007⇔I005 | Dress⇔Shirt | 0.5 |

**Table 34.** *Association.*

The fuzzy k-means cluster algorithm (FKAC) using FAD is given by

    best = R
    k-means = best
    for i in range(1, n)
        for j in range(1, n)
            ti = fuzzy union(ri.R U rj.R), if ri.R = rj.R
    C reduce best k-means < best
    return
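The map and reduce roles described in this section, together with the negation and union operations, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the chapter's algorithm: the function names `map_phase`, `reduce_phase`, and `negation` are hypothetical, and the price memberships are inferred from the negation table (for example, 1 − 0.3 = 0.7 for I005).

```python
from collections import defaultdict
from functools import reduce


def map_phase(*datasets):
    """Map: read fuzzy (key, membership) pairs and group them by key."""
    groups = defaultdict(list)
    for dataset in datasets:
        for key, mu in dataset:
            groups[key].append(mu)
    return groups


def reduce_phase(groups, op):
    """Reduce: combine the memberships of each key with a fuzzy operation."""
    return {key: reduce(op, mus) for key, mus in groups.items()}


def negation(dataset):
    """Fuzzy negation 1 - mu, rounded to avoid floating-point noise."""
    return {key: round(1.0 - mu, 2) for key, mu in dataset}


# Two items only; price values are an assumption inferred from the negation table.
sales = [("I005", 0.8), ("I007", 0.5)]
price = [("I005", 0.7), ("I007", 0.5)]

sales_union_price = reduce_phase(map_phase(sales, price), max)  # fuzzy union
sales_and_price = reduce_phase(map_phase(sales, price), min)    # fuzzy intersection
negation_of_price = negation(price)
```

For I005 the reduce step with `max` reproduces the worked value max{0.8, 0.7} = 0.8, and the negation of the price membership 0.7 is 0.3.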

The fuzzy multivalued association property of data mining may be defined with the multivalued fuzzy functional dependency.

The fuzzy multivalued association (FMVD) is based on the multivalued dependency (MVD). The fuzzy association multivalued dependency (FAMVD) may be defined by using the Mamdani fuzzy conditional inference [3].

If EQ(*t*1(*X*), *t*2(*X*), *t*3(*X*)) then EQ(*t*1(*Y*), *t*2(*Y*)) or EQ(*t*2(*Y*), *t*3(*Y*)) or EQ(*t*1(*Y*), *t*3(*Y*))

= min{EQ(*t*1(*Y*), *t*2(*Y*)), EQ(*t*2(*Y*), *t*3(*Y*)), EQ(*t*1(*Y*), *t*3(*Y*))}

= min{min(*t*1(*Y*), *t*2(*Y*)), min(*t*2(*Y*), *t*3(*Y*)), min(*t*1(*Y*), *t*3(*Y*))}

= min(*t*1(*Y*), *t*2(*Y*), *t*3(*Y*))
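The chain of equalities above reduces the pairwise comparisons to a single minimum. A short sketch can check this numerically; the function names `eq` and `famvd` and the three membership values are hypothetical and chosen only for illustration.

```python
def eq(a: float, b: float) -> float:
    """Degree of equality of two fuzzy values, taken as their minimum."""
    return min(a, b)


def famvd(t1: float, t2: float, t3: float) -> float:
    """Pairwise form: min of EQ over all pairs of the three tuples."""
    return min(eq(t1, t2), eq(t2, t3), eq(t1, t3))


# Hypothetical memberships of t1(Y), t2(Y), t3(Y).
degree = famvd(0.8, 0.5, 0.7)
simplified = min(0.8, 0.5, 0.7)
```

Both forms give the same degree, which is why the final line of the derivation needs only one `min` over the three memberships.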

The fuzzy k-means clustering algorithm (FKCA) is the optimization algorithm for fuzzy datasets (**Table 35**).

| CNo | INo | IName | Sales |
|---|---|---|---|
| C001 | I005⇔I007⇔I008 | Shirt⇔Dress⇔Jacket | 0.8 0.4 0.5 0.7 |
| C003 | I004 | Pants | 0.6 |
| C002 | I007⇔I005 | Dress⇔Shirt | 0.5 |

**Table 35.** *Association using AFMVD.*

The fuzzy k-means cluster algorithm (FKAC) using FAMVD is given by

    best = R
    k-means = best
    for i in range(1, n)
        for j in range(1, n)
            for k in range(1, n)
                ti = fuzzy union(ri.R U rj.R U rk.R), if ri.R = rj.R = rk.R
    C reduce best k-means < best
    return

The fuzzy k-means clustering algorithm (FKCA) is the optimization algorithm for fuzzy datasets.

    k-means = n
    for i in range(1, n)
        for j in range(1, n)
            ti = fuzzy union(ri.R U sj.S), if ri.R = sj.S
    C = best k-means < best
    return

For example, the fuzzy join of the sorted fuzzy sets of **Table 5** is given in **Table 36**.

| CNo | INo | IName | Sales ⋈ Price ⋈ Demand |
|---|---|---|---|
| C001 | I005 | Shirt | 0.8 |
| C001 | I007 | Dress | 0.5 |
| C003 | I004 | Pants | 0.6 |
| C002 | I007 | Dress | 0.5 |
| C001 | I008 | Jacket | 0.6 |
| C002 | I005 | Shirt | 0.7 |

**Table 36.** *Fuzzy join.*

#### **6. Fuzzy security for data mining**

Conventional security methods rely on cryptographic encryption and decryption, and these methods are not fully secure. The fuzzy security method is based on the mind, so others cannot decrypt it. Zadeh [16] discussed web intelligence, world knowledge, and fuzzy logic. Current programming is unable to deal with question answering that contains approximate information, for instance, "which is the best car?" Fuzzy data mining with security is a knowledge discovery process with associated data.

Fuzzy relational databases may be defined with fuzzy set theory. Fuzzy set theory is another approach to approximate information. Security may be provided by approximate information.

**Definition**: Given some universe of discourse *X*, a relational database *R*1 is defined as a pair {*t*, *d*}, where *t* is a tuple and *d* is a domain (**Table 37**).

| R1 | d1 | d2 | … | dm |
|---|---|---|---|---|
| t1 | a11 | a12 | … | a1m |
| t2 | a21 | a22 | … | a2m |
| … | … | … | … | … |
| tn | an1 | an2 | … | anm |

**Table 37.** *Relational database.*

Price = 0.4/50 + 0.5/60 + 0.7/80 + 0.8/100

The fuzzy security database of price is given in **Table 38**.

| INo | IName | Price |
|---|---|---|
| I005 | Benz | 0.8 |
| I007 | Suzuki | 0.4 |
| I004 | Toyota | 0.7 |
| I008 | Skoda | 0.5 |
| I009 | Benz | 0.8 |

**Table 38.** *Price fuzzy set.*

Demand = 0.4/50 + 0.5/60 + 0.7/80 + 0.8/100

The fuzzy security database of demand is given in **Table 39**.

| INo | IName | Demand | μ |
|---|---|---|---|
| I005 | Benz | 80 | 0.7 |
| I007 | Suzuki | 60 | 0.5 |
| I004 | Toyota | 100 | 0.8 |
| I008 | Skoda | 50 | 0.4 |
| I009 | Benz | 80 | 0.7 |

**Table 39.** *Demand fuzzy set.*
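The step from a fuzzy set written in Zadeh's μ/x notation to the demand fuzzy set table can be sketched as below. The helper `parse_fuzzy_set` is a hypothetical name introduced here, and the sketch only illustrates the idea that the membership degree can be published in place of the raw demand value.

```python
def parse_fuzzy_set(expr: str) -> dict:
    """Parse 'mu/x + mu/x + ...' into a mapping {x: mu}."""
    fuzzy = {}
    for term in expr.split("+"):
        mu, x = term.strip().split("/")
        fuzzy[float(x)] = float(mu)
    return fuzzy


demand_set = parse_fuzzy_set("0.4/50+0.5/60+0.7/80+0.8/100")

# Raw demand values as listed in the demand fuzzy set table.
raw_rows = [
    ("I005", "Benz", 80),
    ("I007", "Suzuki", 60),
    ("I004", "Toyota", 100),
    ("I008", "Skoda", 50),
    ("I009", "Benz", 80),
]

# Publish only the membership degree; the raw demand stays undisclosed.
secured_rows = [(ino, name, demand_set[float(d)]) for ino, name, d in raw_rows]
```

Each published row carries only μ, so a reader of `secured_rows` sees 0.7 for Benz without learning that the underlying demand was 80.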


The lossless natural join of demand and price is their union and is given in **Table 40**.

**Table 40.** *Lossless join.*

The actual data has to be disclosed for analysis on the web. There is no need to disclose the data if the data is inherently defined with fuzziness.

For instance, "car with fuzziness > 0.7" may be defined as follows.

XML data may be defined as

    <CAR>
      <COMPANY> <NAME> Benz </NAME> <FUZZ> 0.8 </FUZZ> </COMPANY>
      <COMPANY> <NAME> Suzuki </NAME> <FUZZ> 0.9 </FUZZ> </COMPANY>
      <COMPANY> <NAME> Toyota </NAME> <FUZZ> 0.6 </FUZZ> </COMPANY>
      <COMPANY> <NAME> Skoda </NAME> <FUZZ> 0.7 </FUZZ> </COMPANY>
    </CAR>

XQuery using the projection operator for the demand car may be given as

    declare default element namespace "http://www.automoble.com/company";
    <CAR> {
      for $company in /CAR/COMPANY
      where $company/FUZZ > 0.7
      return <COMPANY> {$company/NAME, $company/FUZZ} </COMPANY>
    } </CAR>

Fuzzy reasoning may be applied to fuzzy data mining. Consider the more demand fuzzy database by decomposition (**Tables 41** and **42**).

| INo | IName | Demand |
|---|---|---|
| I005 | Benz | 0.8 |
| I007 | Suzuki | 0.9 |
| I004 | Toyota | 0.6 |
| I008 | Skoda | 0.7 |
| I009 | Benz | 0.9 |

**Table 41.** *Demand.*

| INo | IName | Price |
|---|---|---|
| I005 | Benz | 0.7 |
| I007 | Suzuki | 0.4 |
| I004 | Toyota | 0.6 |
| I008 | Skoda | 0.5 |
| I009 | Benz | 0.7 |

**Table 42.** *Price.*

The fuzzy reasoning [14] may be performed using the Zadeh fuzzy conditional inference.

The Zadeh [14] fuzzy conditional inference is given by

if x is P1 and x is P2 … x is Pn then x is Q = min{1, 1 − min(μP1(x), μP2(x), …, μPn(x)) + μQ(x)}

The Mamdani [7] fuzzy conditional inference is given by

if x is P1 and x is P2 … x is Pn then x is Q = min{μP1(x), μP2(x), …, μPn(x), μQ(x)}

The Reddy [12] fuzzy conditional inference is given by

if x is P1 and x is P2 … x is Pn then x is Q = min(μP1(x), μP2(x), …, μPn(x))

```
If x is Demand then x is Price
x is more Demand
------------------------------------------------
x is more Demand o (Demand ➔ Price)
x is more Demand o min{1, 1-Demand+Price}    Zadeh
x is more Demand o min{Demand, Price}        Mamdani
x is more Demand o {Demand}                  Reddy
```

"If x is more demand, then x is more price" is given in **Tables 43** and **44**. The inference for price is given in **Table 45**. So the business administrator (DA) can decide whether to increase the price or not.

| INo | IName | More demand |
|---|---|---|
| I005 | Benz | 0.89 |
| I007 | Suzuki | 0.95 |
| I004 | Toyota | 0.77 |
| I008 | Skoda | 0.84 |
| I009 | Benz | 0.95 |

**Table 43.** *More demand.*
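The Zadeh and Mamdani inferences for the demand and price memberships can be traced with a short sketch. Two points are assumptions rather than statements from the chapter: the hedge "more" is modeled as a square root, which is consistent with the listed more-demand values (for example, √0.8 ≈ 0.89), and the composition "o" is taken as a pointwise minimum. The function names are hypothetical.

```python
import math

# Membership degrees transcribed from the Demand and Price tables.
demand = {"I005": 0.8, "I007": 0.9, "I004": 0.6, "I008": 0.7, "I009": 0.9}
price = {"I005": 0.7, "I007": 0.4, "I004": 0.6, "I008": 0.5, "I009": 0.7}


def more(mu: float) -> float:
    """Assumed hedge: 'more' as the square root of the membership."""
    return math.sqrt(mu)


def zadeh(d: float, p: float) -> float:
    """Zadeh implication min{1, 1 - Demand + Price}."""
    return min(1.0, 1.0 - d + p)


def mamdani(d: float, p: float) -> float:
    """Mamdani implication min{Demand, Price}."""
    return min(d, p)


def infer(implication) -> dict:
    """Compose 'more Demand' with the implication, taking o as min (assumption)."""
    return {
        item: round(min(more(demand[item]), implication(demand[item], price[item])), 2)
        for item in demand
    }


more_demand = {item: round(more(mu), 2) for item, mu in demand.items()}
zadeh_price = infer(zadeh)
mamdani_price = infer(mamdani)
```

Under these assumptions, `zadeh_price` gives 0.89 for I005 and 0.5 for I007, while `mamdani_price` stays at the price memberships because min{Demand, Price} never exceeds the "more demand" degree here.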

| INo | IName | Zadeh | Mamdani | Reddy |
|---|---|---|---|---|
| I005 | Benz | 0.9 | 0.7 | 0.7 |
| I007 | Suzuki | 0.5 | 0.4 | 0.4 |
| I004 | Toyota | 1.0 | 0.6 | 0.6 |
| I008 | Skoda | 0.8 | 0.5 | 0.5 |
| I009 | Benz | 0.8 | 0.7 | 0.7 |

**Table 44.** *Demand* ➔ *Price.*

| INo | IName | Zadeh | Mamdani | Reddy |
|---|---|---|---|---|
| I005 | Benz | 0.89 | 0.7 | 0.7 |
| I007 | Suzuki | 0.5 | 0.4 | 0.4 |
| I004 | Toyota | 0.77 | 0.6 | 0.6 |
| I008 | Skoda | 0.8 | 0.5 | 0.5 |
| I009 | Benz | 0.8 | 0.7 | 0.7 |

**Table 45.** *Inference price.*

#### **7. Web intelligence and fuzzy data mining**

Let C and D be fuzzy rough sets (**Tables 46–51**).

|  | d1 | d2 | … | dm | μ |
|---|---|---|---|---|---|
| t1 | a11 | a12 | … | a1m | μd(t1) |
| t2 | a21 | a22 | … | a2m | μd(t2) |
| … | … | … | … | … | … |
| tn | an1 | an2 | … | anm | μd(tn) |

**Table 46.** *Fuzzy database.*

| INo | IName | Price | μ |
|---|---|---|---|
| I005 | Shirt | 100 | 0.8 |
| I007 | Dress | 50 | 0.4 |
| I004 | Pants | 80 | 0.7 |
| I008 | Jacket | 60 | 0.5 |
| I009 | Skirt | 100 | 0.8 |

**Table 47.** *Price database.*

The operations on fuzzy rough sets of type 2 are given as

1 − C = 1 − μC(x) (negation)

C ∨ D = max{μC(x), μD(x)} (union)

C ∧ D = min{μC(x), μD(x)} (intersection)

**Table 48.** *Intersect of demand and price.*

| INo | IName | Demand | μ |
|---|---|---|---|
| I005 | Shirt | 80 | 0.8 |
| I007 | Dress | 60 | 0.5 |
| I004 | Pants | 100 | 0.8 |
| I008 | Jacket | 50 | 0.5 |
| I009 | Skirt | 80 | 0.8 |

**Table 49.** *Lossless decomposition of demand.*

| INo | IName | Price | μ |
|---|---|---|---|
| I005 | Shirt | 100 | 0.8 |
| I007 | Dress | 50 | 0.5 |
| I004 | Pants | 80 | 0.8 |
| I008 | Jacket | 60 | 0.5 |
| I009 | Skirt | 100 | 0.8 |

**Table 50.** *Lossless decomposition of price.*

| Company | μ |
|---|---|
| IBM | 0.8 |
| Microsoft | 0.9 |
| Google | 0.75 |

**Table 51.** *Best software company.*

XML data may be defined as

    <SOFTWARE>
      <COMPANY> <NAME> IBM </NAME> <FUZZ> 0.8 </FUZZ> </COMPANY>
      <COMPANY> <NAME> Microsoft </NAME> <FUZZ> 0.9 </FUZZ> </COMPANY>
      <COMPANY> <NAME> Google </NAME> <FUZZ> 0.75 </FUZZ> </COMPANY>
    </SOFTWARE>

XQuery using the projection operator for the best software company may be given as

    declare default element namespace "http://www.software.cm/company";
    <SOFTWARE> {
      for $company in /SOFTWARE/COMPANY
      where $company/FUZZ = max(/SOFTWARE/COMPANY/FUZZ)
      return <COMPANY> {$company/NAME, $company/FUZZ} </COMPANY>
    } </SOFTWARE>

Similarly, the following problem may be considered for web programming.

Let *P* be a fuzzy proposition in a question-answering system.

*P* = Which is the tallest buildings city?

The answer is "x is the tallest buildings city."

For instance, the fuzzy set "most tallest buildings city" may be defined as

most tallest buildings city = 0.6/Hong Kong + 0.6/Dubai + 0.7/New York + 0.8/Taipei + 0.5/Tokyo

For the above question, the output is "tallest buildings city" = 0.8/Taipei by using projection.

The fuzzy algorithm using FUZZYALGOL is given as follows:

    BEGIN
      Variable most tallest buildings City = 0.6/Hong Kong + 0.6/Dubai + 0.7/New York + 0.8/Taipei + 0.5/Tokyo
      most tallest buildings City = 0.8/Taipei
      Return URL, fuzziness = Taipei, 0.8
    END

The problem is to find "most pdf of type-2 in fuzzy sets."

The fuzzy algorithm is

    Go to most visited fuzzy set sites
    Go to most visited fuzzy sets type-2
    Go to most visited fuzzy sets type-2 pdf

The web programming gets "the most visited fuzzy sets" and puts them in order. The web programming then gets "the most visited type-2 in fuzzy sets." Finally, the web programming gets "the most visited pdf in type-2."
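The projection used in both examples, returning the element with the highest membership, can be sketched in Python with the standard library XML parser. This is an illustrative stand-in for the XQuery and FUZZYALGOL forms, not a translation of either; the helper names `read_companies`, `project_best`, and `project_above` are hypothetical.

```python
import xml.etree.ElementTree as ET

SOFTWARE_XML = """
<SOFTWARE>
  <COMPANY><NAME>IBM</NAME><FUZZ>0.8</FUZZ></COMPANY>
  <COMPANY><NAME>Microsoft</NAME><FUZZ>0.9</FUZZ></COMPANY>
  <COMPANY><NAME>Google</NAME><FUZZ>0.75</FUZZ></COMPANY>
</SOFTWARE>
"""


def read_companies(xml_text: str) -> list:
    """Read (name, membership) pairs from the COMPANY elements."""
    root = ET.fromstring(xml_text.strip())
    return [
        (company.findtext("NAME").strip(), float(company.findtext("FUZZ")))
        for company in root.findall("COMPANY")
    ]


def project_best(pairs) -> tuple:
    """Projection: the pair with the maximum membership."""
    return max(pairs, key=lambda pair: pair[1])


def project_above(pairs, threshold: float) -> list:
    """Selection: all pairs whose membership exceeds the threshold."""
    return [pair for pair in pairs if pair[1] > threshold]


best_company = project_best(read_companies(SOFTWARE_XML))

cities = {"Hong Kong": 0.6, "Dubai": 0.6, "New York": 0.7, "Taipei": 0.8, "Tokyo": 0.5}
tallest_city = project_best(cities.items())
```

The same `project_above` helper with a threshold of 0.7 corresponds to the "fuzziness > 0.7" selection used for the car data.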


#### **8. Conclusion**

Data mining may deal with incomplete information. Bayesian theory needs exponential complexity to combine data. Defining datasets with fuzziness inherently reduces complexity. In this chapter, fuzzy MapReduce algorithms are studied based on functional dependencies. The fuzzy k-means MapReduce algorithm is studied using fuzzy functional dependencies. Data mining and fuzzy data mining are discussed. A brief overview of the work on business intelligence is given as an example.

Most current web programming studies are unable to deal with incomplete information. In this chapter, the web intelligence system is discussed for fuzzy data mining. In addition, the fuzzy algorithmic language is discussed for the design of fuzzy algorithms for data mining. Some examples are given for web intelligence and fuzzy data mining.

## **Acknowledgements**


The author thanks the reviewer and editor for revision and review suggestions made in this work.

## **Author details**

Poli Venkata Subba Reddy

Department of Computer Science and Engineering, Sri Venkateswara University, Tirupati, India

\*Address all correspondence to: pvsreddy@hotmail.co.in

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Ghosh SP. File organization: The consecutive retrieval property. Communications of the ACM. 1972; **15**(9):802-808

[2] Chin FY. Effective Inference Control for Range SUM Queries, Theoretical Computer Science, 32,77-86. North-Holland; 1974

[3] Kamber M, Pei J. Data Mining: Concepts and Techniques. New Delhi: Morgan Kaufmann; 2006

[4] Ramakrishnan R, Gehrke J. Database Management Systems. New Delhi: McGraw-Hill; 2003

[5] Tan PN, Steinbach M, Kumar V. Introduction to Data Mining. New Delhi: Addison-Wesley; 2006

[6] Zadeh LA. Fuzzy logic. In: IEEE Computer. 1988. pp. 83-92

[7] Tanaka K, Mizumoto M. Fuzzy programs and their executions. In: Zadeh LA, King-Sun FU, Tanaka K, Shimura M, editors. Fuzzy Sets and Their Applications to Cognitive and Decision Processes. New York: Academic Press; 1975. pp. 47-76

[8] Englemore R, Morgan T. Blackboard Systems. New Delhi: Addison-Wesley; 1988

[9] Poli VSR. On existence of C-R property. Proceedings of the Mathematical Society. 1989;**5**:167-171

[10] Venkata Subba Reddy P. Fuzzy MapReduce data mining algorithms. In: 2018 International Conference on Fuzzy Theory and Its Applications (iFUZZY2018), November 14-17; 2018

[11] Reddy PVS, Babu MS. Some methods of reasoning for conditional propositions. Fuzzy Sets and Systems. 1992;**52**(3):229-250

[12] Venkata Subba Reddy P. Fuzzy data mining and web intelligence. In: International Conference on Fuzzy Theory and Its Applications (iFUZZY); 2015. pp. 74-79

[13] Reddy PVS. Fuzzy logic based on belief and disbelief membership functions. Fuzzy Information and Engineering. 2017;**9**(9):405-422

[14] Zadeh LA. A note on web intelligence, world knowledge and fuzzy logic. Data and Knowledge Engineering. 2004;**50**:291-304

[15] Zadeh LA. A note on web intelligence, world knowledge and fuzzy logic. Data and Knowledge Engineering. 2004;**50**:291-304

[16] Zadeh LA. Calculus of fuzzy restrictions. In: Zadeh LA, King-Sun FU, Tanaka K, Shimura M, editors. Fuzzy Sets and Their Applications to Cognitive and Decision Processes. New York: Academic Press; 1975. pp. 1-40

[17] Zadeh LA. Fuzzy algorithms. Information and Control. 1968;**12**: 94-104

[18] Zadeh LA. Precisiated Natural Language (PNL). AI Magazine. 2004;**25**(3):74-91


## *Edited by Derya Birant*

Data mining is a branch of computer science that is used to automatically extract meaningful, useful knowledge and previously unknown, hidden, interesting patterns from a large amount of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining. This book brings together many different successful data mining studies in various areas such as health, banking, education, software engineering, animal science, and the environment.

Published in London, UK © 2021 IntechOpen © pinglabel / iStock
