#### **2. Personal data privacy**

A foundational privacy issue facing information system developers and users is personal data privacy. Personally-identifiable data about clients, employees, prospects and other stakeholders may be regularly collected and stored in shared ledgers. Today, many organizations store private stakeholder data and even passwords in unencrypted form. Even when data are encrypted or anonymized, it may be possible to identify users unless well-developed cybersecurity processes are designed into data management systems. With frequent cybersecurity failures and increasing regulation, maintaining the privacy of personally identifiable information (PII) has become an issue of strategic concern for many organizations.

PII includes any data that can be traced back to a specific person, and can include individual items such as biometric data, social security numbers, phone numbers, or geolocation data. PII can also include data combinations, such as postal codes, birthdates, and gender, or behavioral data associated with one person. Organizations gather and store personal data about current and future customers and employees as well as about other stakeholders.
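The point about data combinations can be made concrete with a small sketch. The records below are made-up placeholders, and the helper name `unique_fraction` is hypothetical. It measures how many rows are singled out by a given set of fields, under the assumption that a row matching no other row on those fields is re-identifiable.

```python
from collections import Counter

# Hypothetical, made-up records; no real people are represented.
records = [
    {"postal_code": "97201", "birthdate": "1985-03-14", "gender": "F"},
    {"postal_code": "97201", "birthdate": "1985-03-14", "gender": "M"},
    {"postal_code": "97201", "birthdate": "1990-07-02", "gender": "F"},
    {"postal_code": "97205", "birthdate": "1985-03-14", "gender": "F"},
    {"postal_code": "97205", "birthdate": "1990-07-02", "gender": "F"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose values for `fields` occur exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Each field alone identifies few people; the combination identifies everyone.
for fields in (["postal_code"], ["gender"], ["postal_code", "birthdate", "gender"]):
    print(fields, unique_fraction(records, fields))
```

In this toy dataset no single field isolates most individuals, yet the three fields together single out every record, which is why such combinations are treated as PII.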

#### **3. Cybersecurity and privacy breaches**

Cybersecurity has become increasingly important for governments and businesses alike. Information security—one component of cybersecurity—focuses on protecting the integrity and privacy of data as it is captured, stored and used. The people, processes, and technology associated with data work in concert to create and maintain security.

Despite advances in security protocols and software, privacy breaches are on the rise. According to Risk Based Security's 2020 data breach report, "The total number of records compromised in 2020 exceeded 37 billion, a 141% increase compared to 2019" [1]. Personal records of system users are regularly compromised, and millions of these records, including names, emails and passwords, have been subject to data breaches, in many cases even including addresses, birth dates and financial information [1].

A data breach results from unauthorized access to an organization's database, enabling attackers to steal sensitive personal information such as passwords, credit card numbers, social security numbers, and banking information [2]. These well-documented breaches have had adverse consequences, including credit card fraud and identity theft, which can have lasting negative effects on personal credit, often taking months, if not years, to remedy [2]. Among the largest recent cyberattacks is the 2013/14 breach of Yahoo's database, thought to have been a state-sponsored attack, which affected over 3 billion users. The hackers collected consumers' names, email addresses, telephone numbers, dates of birth, hashed passwords and unencrypted answers to security questions.

In 2017, the credit reporting agency Equifax was subject to a cyberattack that affected an estimated 143 million consumers. System administrators weren't aware of the suspicious activity for two months and did not report the breach for a full month after its discovery. It is believed that Equifax was breached by Chinese state-sponsored hackers engaged in espionage [3]. The collective financial impact on individual victims is not known, nor is it known what security and strategic damage was incurred by the state, but these cases highlight the potential risk when PII are housed in a centralized database.

*How Blockchain and AI Enable Personal Data Privacy and Support Cybersecurity DOI: http://dx.doi.org/10.5772/intechopen.96999*

**Figure 1.** *Cybersecurity breaches and record exposure.*

Most of the data gathered and stored are in the control of governments and corporations, which have gathered volumes of personal information that they are responsible for securing. At the same time, these organizations may be monetizing these datasets, either by using them to improve their own operations and offerings or by selling them to third parties. The volume of data generated and collected is increasing exponentially, enlarging the footprints of users. Data consolidators are able to link data elements across data sources and combine data in ways that were never anticipated by the parties that collected the information nor by the users that provided it.

**Figure 1**, which uses data provided by Statista [4], shows the cost of amassing these large databases. Statista, a statistical research firm, tracks cybersecurity failures and trends. A recently published Statista report reveals that these events have been increasing, especially over the past five years, underscoring the need to improve how data are secured. It should be noted that a massive cyber breach in 2020, thought to have been carried out by Russia, could result in higher numbers for 2020, especially in the records-exposed category. The extent of the breach is still under investigation at the time of this publication.

#### **4. Privacy regulations**

The right to privacy is considered to be a basic human right in many parts of the world. That privacy may extend to individuals' right to control their own personal data. This right must be carefully defended, as ownership and management of an individual's personal data can impact relationships with others and even the data owner's identity [5].

Regulations governing how personal data are gathered and managed are rapidly being developed. The European Union has led the way in legislating privacy law through the General Data Protection Regulation (GDPR), passed in 2016. The law requires that organizations that gather personal data about EU citizens for transactions with EU member states carefully protect that data to ensure privacy.

In the US, the California Privacy Rights Act (CPRA), which expands on the 2018 California Consumer Privacy Act (CCPA), adopts many principles from the GDPR [6]. The CPRA is designed to provide residents of California the right:

1. to know what personal data is being collected


At the federal level, the Consumer Online Privacy Rights Act (COPRA) was introduced in December 2019 by Democratic senators, led by Maria Cantwell. Although this bill has yet to pass, and previous federal privacy bills have failed, governmental bodies continue to pursue stricter laws for governing data [8].

Privacy laws directly affect how companies operate and will require firms that use consumer data to implement systems and operational practices that enable them to conform to these new regulations. Blockchain and Distributed Ledger Technology are uniquely positioned to help companies comply with existing and potential future regulation as it relates to personal property and data privacy.

#### **5. Blockchain and privacy**

Among the significant benefits of blockchain solutions is that they enable organizations to share data in ways not previously available, opening up possibilities for enhanced collaboration, improved operational efficiencies and expanded revenue. Questions about how to maintain privacy over the data are heightened in these environments because the data are stored in shared ledgers which may be accessible by multiple blockchain participants.

ConsenSys, a blockchain technology solutions company, in discussing the security of public blockchains, argues that "In reality, privacy is not a property of any blockchain. Rather, there are layers of privacy that can be applied to any blockchain…" [9]. Designers must carefully consider which parties are allowed to read and write transactions and how transactions are broadcast, validated, and stored. Additional issues relating to how permissions and security measures are updated and enforced are also important considerations. Decisions about who owns the data and how data can be used by organizations and computer applications further complicate privacy discussions [9].

#### **5.1 Decentralized identity**

Self-sovereign identity, a widely held view among blockchain proponents, holds that individuals should have control over their own identities and should have autonomy over how facets of identity are shared with others. Decentralized identity (DID) is a blockchain-enabled embodiment of self-sovereign identity that can profoundly improve the privacy and security of personal data.

DID refers to individual ownership of personal digital data relating to many elements of identity. Microsoft, which participates in defining DID standards, takes the perspective of the individual: "Currently, our identity and all our digital interactions are owned and controlled by other parties, some of whom we aren't even aware of" [10]. Returning ownership of data to the individuals to whom the data pertain can provide benefits both to those individuals and to organizations that would otherwise be responsible for protecting the data.


Blockchain technology enables DID and provides a way for individuals to store their own data outside of the databases of the parties with whom they transact. Data are owned and controlled by these individuals, and pointers to this data, or metadata, can be stored on the blockchain and used to verify the validity of claims the users make about their personal data. For example, a driver's license bureau might issue a driver's license to a user, which the user stores privately. When an insurance company or other party wishes to verify that the user is licensed, the user can present the license to that party, which can then independently verify the issuer and expiration date.
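A minimal sketch of this flow follows, assuming that the metadata anchored on the ledger is a SHA-256 digest of the credential. The `ledger` dictionary, the function names, the `dmv.example` issuer, and the credential fields are all hypothetical stand-ins. Production DID systems generally rely on digital signatures rather than bare digests, so this only illustrates the idea of keeping the data private while anchoring a verifiable pointer.

```python
import hashlib
import json

# Stand-in for a shared ledger: maps a credential digest to the issuer that anchored it.
ledger = {}


def credential_digest(credential: dict) -> str:
    """Hash a canonical JSON encoding so the same credential always yields the same digest."""
    canonical = json.dumps(credential, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def issue(issuer: str, credential: dict) -> dict:
    """Issuer anchors only the digest; the credential itself goes to the user."""
    ledger[credential_digest(credential)] = issuer
    return credential


def verify(credential: dict, expected_issuer: str, today: str) -> bool:
    """Verifier recomputes the digest, checks the issuer, then checks expiry."""
    issuer_ok = ledger.get(credential_digest(credential)) == expected_issuer
    not_expired = credential["expires"] >= today  # ISO dates compare correctly as strings
    return issuer_ok and not_expired


# The user stores the license privately and presents it only when asked.
user_license = issue(
    "dmv.example",
    {"holder": "did:example:123", "license_class": "C", "expires": "2030-01-01"},
)
print(verify(user_license, "dmv.example", today="2025-06-01"))
```

Because the ledger holds only a digest, any alteration of the presented license (for example, a changed expiration date) produces a different digest and verification fails.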

Anyone can create a DID. When this identity is first created, there is no information attached to it. Over time, the user could attach a driver's license or other identifying data to that DID. The process that a third party might use to verify that a particular person owns a DID is similar to the process of validating that a person owns an email address. For example, an online gaming account can be attached to an email address. A party seeking to validate that a person is the owner of that account could send a private message, such as a security code, to the email address and ask the person to provide that code, something that only the person possessing the password for that email address could provide.

Unlike an email account, the DID would be owned and stored by a person rather than by an email service provider. The password, or private key, would also be secured by the owner. Personal information relating to the identity could be stored in an identity hub—an encrypted repository of personal data that is stored outside the blockchain, likely in a combination of phone, PC, and cloud data or offline storage devices [10]. Through the use of an identity hub, the person could control which pieces of information to share with an external party.

DIDs reduce the probability of unwanted correlation. The use of common identifiers, such as the same email address on different websites, creates what is called a correlation problem. Correlation in this context means that entities can, without a user's consent, associate information about a single identity across multiple systems. Email addresses are requested as identifiers on almost every website. When users provide the same email address on different sites, perhaps along with additional pieces of personal information such as a phone number or physical address, they unknowingly make it possible for entities to correlate that data across sites.

**Figure 2.** *Decentralized identities and service providers.*

Tracking cookies and web clicks enable the linking of IDs across websites which can result in outsiders gaining a full picture of users' identity, where they live, their gender, age range, interests, and other information [10].
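The correlation problem can be sketched in a few lines. The site names, the profile data, and the `pairwise_id` derivation (an HMAC of the site name under a user-held secret) are illustrative assumptions, not a method prescribed by any DID standard. The sketch only shows why per-site identifiers resist the join that a shared email address allows.

```python
import hashlib
import hmac


def correlate(site_a: dict, site_b: dict) -> dict:
    """Merge the profiles of any identifier that appears on both sites."""
    shared = site_a.keys() & site_b.keys()
    return {key: {**site_a[key], **site_b[key]} for key in shared}


def pairwise_id(user_secret: bytes, site: str) -> str:
    """Derive a different pseudonymous identifier for each site (hypothetical scheme)."""
    return hmac.new(user_secret, site.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


# Same email on both sites: the two profiles join into one fuller picture.
shop = {"pat@example.com": {"address": "12 Elm St"}}
forum = {"pat@example.com": {"interests": ["cycling"]}}
print(correlate(shop, forum))

# A distinct identifier per site: there is no shared key to join on.
secret = b"held only by the user"
shop_private = {pairwise_id(secret, "shop.example"): {"address": "12 Elm St"}}
forum_private = {pairwise_id(secret, "forum.example"): {"interests": ["cycling"]}}
print(correlate(shop_private, forum_private))
```

With a shared email address the two sites' records merge into a single profile, while the per-site identifiers leave nothing in common to match on.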

**Figure 2** depicts how a user of several services and on-line websites can store data in a central user-controlled location and interact separately with each service provider. This enables the user to control the specific pieces of information that can be seen by each provider.

#### **5.2 Blockchain-enabled federated identity**

DIDs can help users secure and control their data property and determine who gets access to that data. Blockchains can also increase security for individuals when interacting with multiple internet platforms or services through the use of decentralized federated identities.

Blockchains allow entities to protect the privacy of individuals, which is central to self-sovereign identity. Traditionally, users of a system or set of systems possess what is referred to as a federated identity: a single identity, enabled by single sign-on (SSO) authentication, that an individual uses to access services or information platforms provided by multiple parties. Consider a health care network that includes multiple entities such as hospitals, insurance carriers, and urgent care clinics, where the providers enable the use of a single sign-on credential or *digital federated identity* to access all services. This type of identity, which is typically stored and managed in a central location by a service provider, is prone to security vulnerabilities [11].

The distributed nature of blockchain technology provides an opportunity for networks to enable single sign-on, or federated identities, much more securely. ElGayyar [11] proposes a blockchain-based federated identity framework (BFID) in which the network of providers themselves, rather than a centralized third party, manage the system, identification, and authentication of the users. Any entity within the blockchain network can verify credentials and issue the identity for any user in the system. In a BFID, all transactions are written and maintained within the blockchain, where the system takes advantage of the secure and immutable nature of the distributed ledger, thereby practically eliminating the possibility of identity breaches and potential theft.

Blockchain-based federated identity frameworks can be configured on both public and private blockchain implementations and make use of smart contracts to react to potential rule changes that may occur while governing identity management within the system. Additionally, these frameworks enable users to audit and control how their identities are used while also providing the network business entities the ability to monitor how their services are being used, enabling process improvement and a better overall user experience.

#### **5.3 Zero-knowledge proofs**

Zero-knowledge proofs enable ease of access to identity and other important data while maintaining privacy and property control for individuals. Zero-knowledge proofs are cryptographic methods whereby a user, or "prover," can convince someone else, a "verifier," that something about them is true without providing, revealing or sharing the underlying information. A common example is a customer attempting to order an alcoholic beverage from a bartender who demands to know that the patron is 21 years of age or older. Providing a driver's license reveals the patron's full birth date as well as height, eye color, and home address, information that could be misused or stolen.


Zero-knowledge proofs use cryptographic algorithms that enable a prover to mathematically demonstrate to a verifier that a statement is correct without revealing the underlying data. For example, when the state issues a 21-and-over driver's license, it asks the driver to type in a secret nickname, unknown to the licensing bureau. This nickname could then be hashed together with the driver's license number and stored in a public list representing valid drivers over 21. At the bar, the driver could type the nickname and license number into a hash generator, and if the resulting hash matched one on the list, the bartender would know that the patron was of legal age [12].
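The nickname example can be sketched as follows. This is a simplified illustration of the scheme described in the text. The choice of SHA-256, the `|` separator, and all names and values are assumptions made for the sketch, not details given in [12].

```python
import hashlib

# Public list of hashes representing valid drivers aged 21 and over.
over_21_hashes = set()


def license_hash(nickname: str, license_number: str) -> str:
    """Hash the secret nickname together with the driver's license number."""
    material = f"{nickname}|{license_number}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def register_driver(nickname: str, license_number: str) -> None:
    """At issuance, only the combined hash is published to the list."""
    over_21_hashes.add(license_hash(nickname, license_number))


def bartender_accepts(presented_hash: str) -> bool:
    """The bartender sees only a hash and checks it against the public list."""
    return presented_hash in over_21_hashes


register_driver("bluejay", "D1234567")  # hypothetical nickname and license number

# At the bar, the patron computes the hash and shows only the result.
print(bartender_accepts(license_hash("bluejay", "D1234567")))
print(bartender_accepts(license_hash("wrong-nickname", "D1234567")))
```

The bartender learns that the hash is on the list, and therefore that the patron is of legal age, without seeing a birth date, address, or the license itself.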

There are two types of zero-knowledge proofs: interactive and non-interactive. Most commonly, zero-knowledge protocols are interactive, whereby the prover (an individual or, more likely, a computer) and the verifier participate in a back-and-forth set of questions or challenges that, when answered correctly a given number of times, enables the prover to convince the verifier, with very high probability, that the statement they are making is true.

An example of an interactive zero-knowledge proof involves two balls that are identical in every way except their color: one is red and one is green. Assume the verifier is completely color blind and cannot tell the color of either ball, and the prover wants to prove to the verifier that the balls do in fact differ in color. The verifier puts the balls behind their back and shows one, and the prover indicates its color. The verifier then does this again and asks whether the ball was switched. Because the prover can see the colors, the prover can say with certainty whether the ball was switched. After several rounds, it becomes increasingly likely that the balls really are different colors, because the probability that the prover could guess correctly over and over falls to almost zero [13].
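The effect of repeated rounds can be sketched with a short simulation. It assumes that the verifier switches the ball with probability 1/2 in each round and that a prover who cannot see color can only guess, so a guessing prover survives n rounds with probability (1/2)^n. The function names are hypothetical.

```python
import random


def passes_all_rounds(prover_sees_color: bool, rounds: int, rng: random.Random) -> bool:
    """Return True if the prover answers every 'was the ball switched?' challenge correctly."""
    for _ in range(rounds):
        switched = rng.random() < 0.5      # verifier's hidden choice this round
        if prover_sees_color:
            answer = switched              # an honest prover sees whether the color changed
        else:
            answer = rng.random() < 0.5    # a prover who cannot see color must guess
        if answer != switched:
            return False
    return True


def guess_success_probability(rounds: int) -> float:
    """Chance that pure guessing survives `rounds` independent challenges."""
    return 0.5 ** rounds


rng = random.Random(0)
print(passes_all_rounds(True, 20, rng))   # an honest prover always passes
print(guess_success_probability(20))      # about one in a million
```

After 20 rounds, a prover who was merely guessing would have survived with probability of roughly one in a million, which is why the verifier can be convinced without ever learning which ball is which color.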

A non-interactive proof is more like the example above of the patron proving their age to a bartender with a proof statement that reveals age but not the additional information that might be revealed if the prover were to show their photo identification. Proving the point value of a card in a deck of 52 cards, without identifying its suit, provides an example of this type of proof. The prover states that the card they are holding is a king but does not want to reveal which king (hearts, diamonds, spades or clubs). If the cryptographic string also contains information that reveals the other 48 cards, none of which are kings, we can know for certain that the prover does in fact hold a king of some kind.

Zero-knowledge proofs are powerful tools for maintaining privacy and property control for individuals who may need to provide a bit of personal information but no more than absolutely necessary.
