1. Introduction

### 1.1. What is big data?

"Big data" is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### 1.2. Big data significance in industry and challenges

While understanding the value of big data remains a challenge, other practical challenges, including funding, return on investment, and skills, remain at the forefront for the many different industries adopting big data. That said, a 2015 Gartner survey shows that more than 75% of companies are investing, or are planning to invest, in big data in the next 2 years. These findings represent a significant increase over a similar survey from 2012, which indicated that 58% of companies had invested or were planning to invest in big data within the following 2 years [1].

In general, most organizations have several goals for adopting big data projects. While the primary goal of most organizations is to enhance the customer experience, other goals include cost reduction, better targeted marketing, and making existing processes more efficient. In recent years, data breaches have also made enhanced security an important goal of big data projects.

Data Privacy for Big Data Publishing Using Newly Enhanced PASS Data Mining Mechanism

http://dx.doi.org/10.5772/intechopen.77033

### 1.3. Data stream

Big data associated with a time stamp is called a big data stream [2].

Examples of data streams:

1. Sensor data

2. Call center records

3. Clickstreams

4. Healthcare data

Constraints associated with data streams:
Privacy protection: The data streams are extracted from various sources and contain many individuals' information; hence, the sensitive data of any individual must not be leaked.

Computer security: Access control and authentication guarantee that the right person has the right authority over the right object at the right time and in the right place. That is not what we need here. A general precept of data privacy is to release as much of the data as possible, as long as the identities of the subjects (individuals) are protected.

Real-time processing: Since the data is not static in nature, real-time processing is required, and at present only a few algorithms exist to process such dynamic data.
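To make the real-time constraint concrete, the following sketch (our illustration, not from the chapter) processes a timestamped stream one record at a time with a bounded sliding window, so memory use stays constant no matter how long the stream runs:

```python
from collections import deque

def sliding_window_average(stream, window_seconds=60):
    """Yield a running average over the last `window_seconds` of readings.

    Each stream item is a (timestamp, value) pair. Readings are evicted
    as they fall out of the window, so only the window is ever stored,
    never the full stream.
    """
    window = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict readings older than the window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield total / len(window)

# Example: a small sensor stream with timestamps in seconds.
readings = [(0, 10.0), (30, 20.0), (70, 30.0)]
print(list(sliding_window_average(readings)))  # [10.0, 15.0, 25.0]
```

At the third reading the first one (timestamp 0) has aged out of the 60-second window, so the average covers only the last two values.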

### 1.4. What is MapReduce?

MapReduce, as shown in Figure 1, is a processing technique and a programming model for distributed computing based on Java. The framework handles all the details of data passing, such as issuing tasks, verifying task completion, and copying data around the cluster between the nodes [1]:

• Most of the computing happens on nodes with the data on local disks, which reduces the network traffic.

• After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result and sends it back to the Hadoop server.

Figure 1. Internal working of MapReduce.
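The map/shuffle/reduce flow of Figure 1 can be sketched in miniature as a word count, the canonical MapReduce example. This is a single-process Python analogue for illustration only; Hadoop itself runs these phases in Java across cluster nodes, and the function names here are ours, not Hadoop's API:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Mapper: emit a (key, value) pair for every word in one input split.
    return [(word, 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    # Shuffle: group all values by key, as the framework does between nodes.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: collapse each key's list of values into a single result.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data big value", "data stream"]
mapped = list(chain.from_iterable(map_phase(d) for d in documents))
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'stream': 1}
```

Each document plays the role of an input split processed on its own node; the shuffle is what generates the cross-node data copying that the bullets above describe.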

2. Anonymization

### 2.1. Anonymization

Generally, data anonymization [2–5] is the use of one or more techniques designed to make it impossible, or at least more difficult, to identify a particular individual from the stored data related to them. The purposes of data anonymization are:

1. To protect the privacy of individuals who shared their data for various surveys

2. To implement effective techniques to prevent a security breach

Anonymization techniques are privacy-preservation techniques used for static data. Techniques implemented in anonymization are:

1. Encryption

2. Hashing

3. Generalization
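As an illustration of two of these techniques, hashing and generalization, the sketch below (our example; the record fields are invented, and real deployments would add safeguards such as a secret salt and k-anonymity checks) pseudonymizes a direct identifier with a salted SHA-256 hash and generalizes an exact age into a coarse range:

```python
import hashlib

SALT = b"chapter-demo-salt"  # in practice, a secret random value kept out of the published data

def hash_identifier(identifier: str) -> str:
    # Hashing: replace a direct identifier with a salted digest, so records
    # can still be linked to each other without revealing the name itself.
    return hashlib.sha256(SALT + identifier.encode()).hexdigest()[:16]

def generalize_age(age: int, bucket: int = 10) -> str:
    # Generalization: publish only a range instead of the exact value.
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"

record = {"name": "Alice", "age": 34, "diagnosis": "flu"}
anonymized = {
    "name": hash_identifier(record["name"]),   # e.g. a 16-hex-digit pseudonym
    "age": generalize_age(record["age"]),      # "30-39"
    "diagnosis": record["diagnosis"],
}
print(anonymized["age"])  # 30-39
```

Encryption, the remaining technique, differs in that it is reversible for anyone holding the key, whereas hashing and generalization are one-way.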
