### **4. Theoretical results**

*Security and Privacy From a Legal, Ethical, and Technical Perspective*

• We can expand the previous notation to reflect the metric representation of a subset of nodes $V' \subset V \setminus S$ with respect to $S$ as $\mathcal{D}_{V',-S} = \left\{ \mathbf{d}_{v_l,-S} \mid v_l \in V' \right\}$.

◦ For example, in **Figure 2**, $\mathcal{D}_{\{v_1,v_2\},-\{v_3,v_4\}} = \{(2,3),(1,2)\}$. Note that the first pair $(2,3)$ corresponds to $v_1$ and the second pair $(1,2)$ corresponds to $v_2$.

• We define a partition $\Pi = \{V_1, V_2, \dots, V_t\}$ of $V' \subseteq V$ as one with the following properties:

◦ $\bigcup_{i=1}^{t} V_i = V'$, and

◦ for all $i \neq j$, $V_i \cap V_j = \emptyset$.

• We define a refinement $\Pi' = \{V'_1, V'_2, \dots, V'_\ell\}$ of a partition $\Pi$, denoted by $\Pi' \prec_r \Pi$, as one that can be obtained from $\Pi$ using the following rules:

◦ For every node $v_j \in \left(\bigcup_{i=1}^{t} V_i\right) \setminus \left(\bigcup_{i=1}^{\ell} V'_i\right)$, remove $v_j$ from the set in $\Pi$ that contains it.

◦ Optionally, for every set $V_\ell$ in $\Pi$, replace $V_\ell$ by a partition of $V_\ell$.

◦ If there exists an empty set, remove it.

i. For example, in **Figure 2**, $\{\{v_1,v_2\},\{v_3\},\{v_5\}\} \prec_r \{\{v_1,v_2\},\{v_3,v_4,v_5\}\}$.

• We define an *equivalence relation* (and related notations) over a set of same-length vectors $\mathcal{D}_{V\setminus V',-V'}$ for some $\emptyset \subset V' \subset V$ as follows:

◦ Two vectors $\mathbf{d}_{v_i,-V'}$ and $\mathbf{d}_{v_j,-V'}$ are equivalent exactly when they are identical. The set of equivalence classes, which forms a partition of $\mathcal{D}_{V\setminus V',-V'}$, is denoted by $\Pi^{=}_{\mathcal{D}_{V\setminus V',-V'}}$.

◦ We declare two nodes $v_i, v_j \in V \setminus V'$ to be in the same equivalence class if $\mathbf{d}_{v_i,-V'}$ and $\mathbf{d}_{v_j,-V'}$ belong to the same equivalence class in $\Pi^{=}_{\mathcal{D}_{V\setminus V',-V'}}$; thus $\Pi^{=}_{\mathcal{D}_{V\setminus V',-V'}}$ also defines a partition into equivalence classes of $V \setminus V'$, denoted by $\Pi^{=}_{V\setminus V',-V'}$.

i. For example, in **Figure 2**, $\mathcal{D}_{\{v_1,v_2,v_6\},-\{v_3,v_5\}} = \{(2,3),(1,2),(2,3)\}$, so $v_1$ and $v_6$ belong to the same equivalence class while $v_2$ forms a class of its own.

◦ The *measure* of the equivalence relation is defined as $\mu\left(\mathcal{D}_{V\setminus V',-V'}\right) \overset{\mathrm{def}}{=} \min_{y \in \Pi^{=}_{V\setminus V',-V'}} \{|y|\}$.

i. For example, in **Figure 2**, $\mu\left(\mathcal{D}_{\{v_1,v_2,v_6\},-\{v_3,v_5\}}\right) = 1$ and $\{v_3,v_5\}$ is a 1-antiresolving set.

◦ If a set $S$ is a $k$-antiresolving set, then $\mathcal{D}_{V\setminus S,-S}$ defines a partition into equivalence classes of measure $k$.
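The definitions above can be made concrete with a small sketch. The Python below assumes an unweighted graph stored as an adjacency dictionary (a representation chosen here for illustration, not taken from [16]); it computes the metric representation $\mathbf{d}_{v,-S}$ of every node outside $S$ by breadth-first search, groups nodes with identical vectors into the classes of $\Pi^{=}_{V\setminus S,-S}$, and returns the measure $\mu$, i.e. the $k$ for which $S$ is $k$-antiresolving. All function names are hypothetical, and giving unreachable nodes an infinite distance is an assumption of this sketch.

```python
from collections import deque, defaultdict
import math

def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def metric_representation(adj, S):
    """Map every node v outside S to its distance vector d_{v,-S}.

    Coordinates follow the sorted order of S; unreachable nodes get math.inf
    (an assumption of this sketch).
    """
    order = sorted(S)
    dists = [bfs_distances(adj, s) for s in order]
    return {v: tuple(d.get(v, math.inf) for d in dists)
            for v in adj if v not in S}

def equivalence_classes(adj, S):
    """Partition V minus S into classes of nodes with identical distance vectors."""
    classes = defaultdict(set)
    for v, vec in metric_representation(adj, S).items():
        classes[vec].add(v)
    return list(classes.values())

def antiresolving_k(adj, S):
    """Measure mu: size of the smallest class, so S is k-antiresolving for this k.

    S must be a nonempty proper subset of the nodes.
    """
    return min(len(c) for c in equivalence_classes(adj, S))

# Usage on a 5-node path 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(antiresolving_k(path, {2}))   # 2: classes {0, 4} and {1, 3}
print(antiresolving_k(path, {0}))   # 1: every other node has a distinct distance
```

On the path, a single attacker in the middle cannot tell the two symmetric halves apart, which is exactly what a class of size 2 expresses; an attacker at an endpoint sees a distinct distance for every other node.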

To understand graph resistance against privacy attacks, one needs to study $(k,\ell)$-anonymity in greater detail. Thus, we look into some computational problems related to this measure that were formalized and investigated in [16]. This section presents three problems from [16] together with the corresponding complexity results and algorithms. It is important to note that $(k,\ell)$-anonymity in its basic definition places no limitation on the adversary, which means that an adversary can take control of as many nodes as she/he can. However, in the real world there are many mechanisms designed solely to prevent such attacks, and thus the chances of being caught are significantly high. This notion is the motivation behind several problems concerned with measuring $(k,\ell)$-anonymity in a graph [17].

We now state the three problems for analyzing $(k,\ell)$-anonymity. Problem 1 simply asks for a $k$-antiresolving set with the largest possible value of $k$. Problem 2 restricts the number of nodes the adversary can control: given a threshold $k$, it seeks a minimum-size set of attacker nodes that still achieves a value of at least $k$. Problem 3 introduces a version of the problem that addresses the trade-off between the privacy threshold and the number of compromised nodes.

**Problem 1** (metric antidimension (*ADIM*)). Find a *k*-antiresolving subset of nodes *S* that maximizes *k*.

Problem 1 assumes there are *no* limitations on the number of attacker nodes, and thus establishes an absolute bound on privacy violation. Note that the solution to Problem 1, denoted by $k_{\mathrm{opt}}$, shows that, given no bound on the number of nodes an adversary can control, it is feasible to uniquely re-identify $k_{\mathrm{opt}}$ nodes with probability $1/k_{\mathrm{opt}}$. The assumptions in Problem 1 are rarely plausible in practice: because of the mechanisms in place to counter such attacks, the more nodes the adversary controls, the higher the risk of being exposed. Thus, a limit on the number of attacker nodes is necessary, which leads us to Problem 2.

**Problem 2** ($k^{\ge}$-metric antidimension ($\mathit{ADIM}^{\ge k}$)). Given $k$, find a $k'$-antiresolving set $S$ such that (i) $k' \ge k$ and (ii) $S$ is of minimum cardinality.

Problem 2 is an extension of Problem 1 that attempts to find the largest value of $k$ while minimizing the number of attacker nodes. A solution to this problem yields a few interesting statements. For example, an adversary controlling $\ell$ attacker nodes, where $\ell < |\mathcal{L}^{\ge k}_{\mathrm{opt}}|$, cannot uniquely re-identify any node in the network with a probability better than $1/k$. However, by using enough attacker nodes ($\ge |\mathcal{L}^{\ge k}_{\mathrm{opt}}|$), one can re-establish such possibilities.

The third problem focuses on the trade-off between the number of attacker nodes and the privacy violation probability. Given two measures, $(k,\ell)$-anonymity and $(k',\ell')$-anonymity, where $k' > k$ and $\ell' < \ell$, it is easy to observe that the $(k',\ell')$-anonymity measure provides a smaller privacy violation probability but also has a lower tolerance for attacker nodes. This trade-off leads us to the third problem.

**Problem 3** ($k^{=}$-metric antidimension ($\mathit{ADIM}^{=k}$)). Given a positive integer $k$, find a $k$-antiresolving subset of nodes $S$ with minimum cardinality, if such a subset exists.
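To make Problems 1–3 concrete on toy inputs, the following brute-force sketch enumerates every nonempty proper subset $S \subset V$ in order of increasing size, computes the $k$ for which $S$ is $k$-antiresolving, and answers each problem with a simple scan. It is exponential in $n$ and is *not* the polynomial-time algorithm of [16]; the adjacency-dictionary representation and all function names are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations
import math

def distances_from(adj, s):
    """BFS hop distances from s in an unweighted graph given as an adjacency dict."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def k_of(adj, S):
    """The k for which S is k-antiresolving: size of the smallest group of
    nodes outside S that share the same distance vector to S."""
    dists = [distances_from(adj, s) for s in S]
    counts = {}
    for v in adj:
        if v not in S:
            vec = tuple(d.get(v, math.inf) for d in dists)
            counts[vec] = counts.get(vec, 0) + 1
    return min(counts.values())

def candidates(adj):
    """Yield (S, k) for every nonempty proper subset S, smallest subsets first."""
    nodes = list(adj)
    for size in range(1, len(nodes)):
        for combo in combinations(nodes, size):
            yield set(combo), k_of(adj, combo)

def adim(adj):
    """Problem 1 (ADIM): a k-antiresolving set S maximizing k.
    max() keeps the first maximum, i.e. a smallest such S."""
    return max(candidates(adj), key=lambda pair: pair[1])

def adim_at_least(adj, k):
    """Problem 2 (ADIM^{>=k}): smallest S that is k'-antiresolving with k' >= k, or None."""
    return next(((S, kk) for S, kk in candidates(adj) if kk >= k), None)

def adim_exactly(adj, k):
    """Problem 3 (ADIM^{=k}): smallest S that is exactly k-antiresolving, or None."""
    return next(((S, kk) for S, kk in candidates(adj) if kk == k), None)

star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(adim(star))  # ({0}, 3)
```

For the star with center 0 and leaves 1, 2, 3, all three leaves lie at distance 1 from the center, so a single attacker node at the center attains $k = 3$. Because subsets are generated by increasing size, the first match found by `next` is automatically of minimum cardinality.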

Chatterjee et al. [16] investigated Problems 1–3 from a computational complexity perspective. The following theorems summarize their findings on Problems 1–3. The non-trivial mathematical proofs of these theorems are unfortunately beyond the scope of this chapter; we strongly recommend that readers interested in the proofs consult the original paper [16].

Theorem 1. [16]

1. Both $\mathit{ADIM}$ and $\mathit{ADIM}^{\ge k}$ can be solved in $O(n^4)$ time.

2. Both $\mathit{ADIM}$ and $\mathit{ADIM}^{\ge k}$ can also be solved in $O\!\left(\frac{n^4 \log n}{k}\right)$ time with high probability.

Theorem 2. [16]

1. $\mathit{ADIM}^{=k}$ is NP-complete for any $k$ in the range $1 \le k \le n^{\varepsilon}$, where $0 \le \varepsilon < \frac{1}{2}$ is any arbitrary constant, even if the diameter of the input graph is 2.

2. Assuming $\mathrm{NP} \not\subseteq \mathrm{DTIME}\left(n^{\log\log n}\right)$, there exists a universal constant $\delta > 0$ such that $\mathit{ADIM}^{=k}$ does not admit a $\frac{1}{\delta}\ln n$ approximation for any integer $k$ in the range $1 \le k \le n^{\varepsilon}$ for any constant $0 \le \varepsilon < \frac{1}{2}$, even if the diameter of the input graph is 2.

3. If $k = n - c$ for some constant $c$, then $\mathit{ADIM}^{=k}$ can be solved in polynomial time.

Theorem 3. [16]

1. $\mathit{ADIM}^{=1}$ admits a $(1 + \ln(n-1))$ approximation in $O(n^3)$ time.

2. If $G$ has at least one node of degree 1, then $\mathit{ADIM}^{=1}$ can be solved in $O(n^3)$ time.

3. If $G$ does not contain a cycle of 4 edges, then $\mathit{ADIM}^{=1}$ can be solved in $O(n^3)$ time.

### **4.1 Algorithms**

The following algorithms were devised in [16] to address Problems 1–3. It is important to note that $\mathit{ADIM}$ can be solved in $O(n^5)$ time by repeatedly solving $\mathit{ADIM}^{\ge k}$ for $k = n-1, n-2, \dots, 1$ to find the largest obtainable value of $k$ such that $\mathcal{L}^{\ge k}_{\mathrm{opt}} < \infty$. However, a few modifications to Algorithm 1 directly result in an $O(n^4)$ solution, which is shown in Algorithm 2.

*A Review of Several Privacy Violation Measures for Large Networks under Active Attacks*

*DOI: http://dx.doi.org/10.5772/intechopen.90909*

### **5. Empirical results**

In [18], DasGupta et al. investigated the resistance of 8 real-world networks against active attacks with respect to $(k,\ell)$-anonymity. All the networks under investigation were unweighted graphs, and the direction of edges (if the network was directed) was ignored during the analysis. **Table 1** contains general information about these networks. Results for both $\mathit{ADIM}$ and $\mathit{ADIM}^{\ge k}$ were obtained by running Algorithm 1 on the networks, with the output of Algorithm 1 being an exact solution to Problem 2. On the other hand, the exact solution to Problem 1 can be obtained by combining Algorithm 1 with a binary search on $k$ to find the largest value of $k$ such that $V^{\ge k}_{\mathrm{opt}} \neq \emptyset$ [18].
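The search over $k$ used to solve $\mathit{ADIM}$ can be sketched independently of how $\mathit{ADIM}^{\ge k}$ itself is solved. The sketch below treats an $\mathit{ADIM}^{\ge k}$ solver as a black-box oracle that returns $\mathcal{L}^{\ge k}_{\mathrm{opt}}$, or $\infty$ when no suitable set exists; the oracle interface and all names are hypothetical stand-ins for Algorithm 1, which is not reproduced here. If each oracle call costs $O(n^4)$ as in Theorem 1, the downward scan gives the $O(n^5)$ bound mentioned in Section 4.1, whereas binary search needs only $O(\log n)$ calls, i.e. $O(n^4 \log n)$ time overall.

```python
import math
from typing import Callable, Optional

# Hypothetical oracle: given k, return the minimum number of attacker nodes
# L^{>=k}_opt (what an ADIM^{>=k} solver computes), or math.inf if none exists.
MinAttackers = Callable[[int], float]

def adim_linear_scan(n: int, min_attackers: MinAttackers) -> Optional[int]:
    """Largest feasible k, trying k = n-1, n-2, ..., 1 (up to n-1 oracle calls)."""
    for k in range(n - 1, 0, -1):
        if min_attackers(k) < math.inf:
            return k
    return None

def adim_binary_search(n: int, min_attackers: MinAttackers) -> Optional[int]:
    """Largest feasible k using O(log n) oracle calls.

    Valid because feasibility is monotone in k: a k'-antiresolving set with
    k' >= k also satisfies k' >= k - 1, so the feasible k form a prefix 1..k_opt.
    """
    lo, hi, best = 1, n - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if min_attackers(mid) < math.inf:
            best, lo = mid, mid + 1   # feasible: try a larger k
        else:
            hi = mid - 1              # infeasible: every larger k is infeasible too
    return best
```

With $n = 34$ and a stub oracle that accepts exactly $k \le 9$, the downward scan makes 25 oracle calls before stopping at $k = 9$, while the binary search reaches the same answer in 5 calls.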

**Table 1.**

*List of 8 social networks studied in [18].*

| Name | Number of nodes | Number of edges | Description |
|---|---|---|---|
| Zachary Karate Club [20] | 34 | 78 | Network of friendship between 34 members of a karate club |
| San Juan Community [21] | 75 | 144 | Network of visiting relations between families living on farms in San Juan Sur, Costa Rica, 1948 |
| Jazz Musician Network [22] | 198 | 2842 | A social network of jazz musicians |
| University Rovira i Virgili emails [23] | 1133 | 10903 | The network of email interchanges between members of the university |
| Enron Email Data Set [24] | 1088 | 1767 | Enron email network |
| Email Eu Core [25] | 986 | 24989 | Emails from a large European research institute |
| UC Irvine College Message platform [26] | 1896 | 59835 | Messages on a Facebook-like platform at UC-Irvine |
| Hamsterster friendships [27] | 1788 | 12476 | Friendships between users of the Hamsterster website |

The results for both Problem 1 and Problem 2 for the networks in **Table 1** are shown in **Table 2**. The results in **Table 2** provide the following interesting insights with respect to resistance against privacy attacks in real-world social networks [19].

• All networks, with the exception of the "Enron Email Data" network, will have a significant percentage of their users compromised if an adversary gains control of *only* one node (ranging from 2.6% of users compromised in the "University Rovira i Virgili emails" network to 26.5% of users compromised in the "Zachary Karate Club" network).

• For all networks with the exception of the "Enron Email Data" network, the minimum privacy violation probability is notably higher than 0 (ranging from 0.019 for the "UC Irvine College Message platform" network to 0.25 for the "Hamsterster friendships" network). The minimum privacy violation probability in the "Hamsterster friendships" network is notably higher than in all other networks.

• In comparison to the other networks, the "Zachary Karate Club" and the "San Juan Community" networks have a higher percentage of their users compromised if subjected to a privacy attack.

**Table 2.**

*Results for $\mathit{ADIM}$ using Algorithm 1 [18].*

| Name | $n$<sup>a</sup> | $k_{\mathrm{opt}}$<sup>b</sup> | $p_{\mathrm{opt}} = 1/k_{\mathrm{opt}}$ | $\mathcal{L}^{\ge k_{\mathrm{opt}}}_{\mathrm{opt}} = \mathcal{L}^{=k_{\mathrm{opt}}}_{\mathrm{opt}}$ | $k_{\mathrm{opt}}/n$ |
|---|---|---|---|---|---|
| Zachary Karate Club [20] | 34 | 9 | 0.111 | 1 | 26.5% |
| San Juan Community [21] | 75 | 7 | 0.143 | 1 | 9.3% |
| Jazz Musician Network [22] | 198 | 12 | 0.084 | 1 | 6.0% |
| University Rovira i Virgili emails [23] | 1133 | 29 | 0.035 | 1 | 2.6% |
| Enron Email Data Set [24] | 1088 | 153 | 0.007 | 935 | 14.1% |
| Email Eu Core [25] | 986 | 39 | 0.026 | 1 | 3.4% |
| UC Irvine College Message platform [26] | 1896 | 55 | 0.019 | 1 | 2.9% |
| Hamsterster friendships [27] | 1788 | 4 | 0.25 | 1 | 0.22% |

*$n$ depicts the number of nodes, $k_{\mathrm{opt}}$ is the largest value of $k$ such that $V^{\ge k}_{\mathrm{opt}} \neq \emptyset$, and $\mathcal{L}^{\ge k_{\mathrm{opt}}}_{\mathrm{opt}}$ is the minimum number of attacker nodes for the corresponding $k$.*

<sup>a</sup> *$n$ denotes the number of nodes in the social graph.*

<sup>b</sup> *$k_{\mathrm{opt}}$ is the largest value of $k$ such that $V^{\ge k}_{\mathrm{opt}} \neq \emptyset$.*

The exception is the "Enron Email Data" network, which, due to a high value of $\mathcal{L}^{\ge k}_{\mathrm{opt}}$, is very resilient against an attack. An adversary needs to control at least 86% of the network to achieve a value of $p_{\mathrm{opt}} = 0.007$, which is not feasible in practice. This interesting observation in the "Enron Email Data" network motivated further inspection at different values of $k$. As shown in **Table 3**, $\mathcal{L}^{\ge k}_{\mathrm{opt}}$ in the "Enron Email Data" network does not decrease significantly until $k$ is set to a much smaller value than $k_{\mathrm{opt}}$, which further emphasizes that **violating the privacy of the "Enron Email Data" network is not guaranteed in practice.** The authors in [18] also investigated the $(k,\ell)$-anonymity measure in *synthetic* networks constructed from both Erdős–Rényi random graphs and Barabási–Albert scale-free networks. We refer the reader to the original paper for more information.

**Table 3.**

*$\mathcal{L}^{\ge k}_{\mathrm{opt}}$ values recorded for $k > 1$ for the "Enron Email Data" network [18]. The values shown are subject to $\mathcal{L}^{\ge k}_{\mathrm{opt}} \neq \mathcal{L}^{\ge k-1}_{\mathrm{opt}}$.*

| $k$ | 4 | 5 | 10 | 20 | 40 | 60 | 100 | 120 | 153 |
|---|---|---|---|---|---|---|---|---|---|
| $\mathcal{L}^{\ge k}_{\mathrm{opt}}$ | 1 | 334 | 463 | 567 | 683 | 842 | 935 | 935 | 935 |
| $p_k = 1/k$ | 0.25 | 0.2 | 0.1 | 0.05 | 0.025 | 0.017 | 0.01 | 0.009 | 0.007 |
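The Enron figures quoted above follow directly from the values reported for this network ($n = 1088$, $k_{\mathrm{opt}} = 153$, and the $\mathcal{L}^{\ge k}_{\mathrm{opt}}$ values recorded in Table 3). As a quick arithmetic check:

```latex
\begin{aligned}
\frac{\mathcal{L}^{\ge 153}_{\mathrm{opt}}}{n} &= \frac{935}{1088} \approx 0.859 \quad (\text{about } 86\% \text{ of the nodes}),\\
p_{\mathrm{opt}} = \frac{1}{k_{\mathrm{opt}}} &= \frac{1}{153} \approx 0.0065 \approx 0.007,\\
\frac{k_{\mathrm{opt}}}{n} &= \frac{153}{1088} \approx 0.141 \quad (14.1\%),\\
\frac{\mathcal{L}^{\ge 5}_{\mathrm{opt}}}{n} &= \frac{334}{1088} \approx 0.307 .
\end{aligned}
```

So even relaxing the target to $k = 5$ still requires roughly 31% of the network, and only at $k = 4$ does a single attacker node suffice.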
