**Multiple-Membership Communities Detection and Its Applications for Mobile Networks**

Nikolai Nefedov

*Nokia Research Center ISI Lab, Swiss Federal Institute of Technology Zurich (ETHZ) Switzerland*

#### **1. Introduction**


The recent progress in wireless technology and the growing spread of smartphones equipped with various sensors make it possible to record rich-content real-world data and complement it with on-line processing. Depending on the application, mobile data processing could help people to enrich their social interactions and improve environmental and personal health awareness. At the same time, mobile sensing data could help service providers to better understand human behavior and its dynamics, identify complex patterns of users' mobility, and develop various service-centric and user-centric mobile applications and services on demand. One of the first steps in the analysis of rich-content mobile datasets is to find an underlying structure of users' interactions and its dynamics by clustering data according to some similarity measures.

Classification and clustering (finding groups of similar elements in data) are well-known problems which arise in many fields of science, e.g., (Albert & Barabási, 2002; Flake et al, 2002; Wasserman & Faust, 1994). In cases when objects are characterized by vectors of attributes, a number of efficient algorithms have been developed to find groups of similar objects based on a metric between the attribute vectors. On the other hand, if data are given in a relational format (causality or dependency relations), e.g., as a network consisting of *N* nodes and *E* edges representing some relation between the nodes, then the problem of finding similar elements corresponds to the detection of communities, i.e., groups of nodes which are interconnected more densely among themselves than with the rest of the network.

The growing interest in the problem of community detection was triggered by the introduction of a new clustering measure called modularity (Girvan & Newman, 2002; 2004). Modularity maximization is known to be an NP-hard problem, and a number of different sub-optimal algorithms have been proposed, e.g., see (Fortunato, 2011) and references therein. However, most of these methods address network partitions into disjoint communities.

On the other hand, in practice communities often overlap. This is especially visible in social networks, where only limited information is available and people are affiliated with different groups depending on professional activities, family status, hobbies, etc. Furthermore, social interactions are reflected in multiple dimensions, such as users' activities, local proximities, geo-locations, etc. These multi-dimensional traces may be presented as multi-layer graphs. This raises the problem of detecting overlapping communities at different hierarchical levels in single and multi-layer graphs.

In this chapter we present a framework for multi-membership community detection in dynamical multi-layer graphs and its applications for missing (or hidden) link predictions/recommendations based on the network topology. In particular, we use modularity maximization with a fast greedy search (Newman, 2004) extended with a random walk approach (Lambiotte et al, 2009) to detect multi-resolution communities beyond and below the resolution provided by *max*-modularity. We generalize the random walk approach to coupled dynamical systems (Arenas et al, 2006) and then extend it with a dynamical link update to make predictions beyond the given topology. In particular, we introduce attractive and repulsive coupling that allows us to detect and predict cooperative and competitive behavior in evolving social networks.

To deal with overlapping communities we introduce soft community detection and outline its possible applications in single and multi-layer graphs. In particular, we propose friend recommendations in social networks, where new link recommendations are made as intra- and inter-clique community completion and recommendations are prioritized according to topology-based similarity measures (Liben-Nowel & Kleinberg, 2003) modified to include multiple-community membership. We also show that the proposed prediction rules based on soft community detection are in line with the network evolution predicted by coupled dynamical systems. To test the proposed framework we use a benchmark network (Zachary, 1977) and then apply the developed methods to the analysis of multi-layer graphs built from real-world mobile datasets (Kiukkonen et al, 2010). The presented results show that by combining information from multi-layer graphs we can improve reliability measures of community detection and missing link predictions.

The chapter is organized as follows: in Section 2 we outline the dynamical formulation of community detection that forms the basis for the rest of the chapter. Topology detection using coupled dynamical systems and its extensions to model network evolution are described in Section 3. Soft community detection for networks with overlapping communities and its applications are addressed in Section 4, followed by combining multi-layer graphs in Section 5. Evaluation of the proposed methods on the benchmark network is presented in Section 6. Analysis of some real-world datasets collected during the Nokia data collection campaign is presented in Section 7, followed by conclusions in Section 8.

#### **2. Community detection**

#### **2.1 Modularity maximization**

Let's consider the clustering problem for an undirected graph *G* = (*V*, *E*) with |*V*| = *N* nodes and *E* edges. Newman et al (Girvan & Newman, 2002; 2004) introduced a measure for graph clustering, named modularity, which is defined as the number of connections within a group compared to the expected number of such connections in an equivalent null model (e.g., in an equivalent random graph). In particular, the modularity *Q* of a partition P may be written as

$$Q = \frac{1}{2m} \sum\_{i,j} \left( A\_{ij} - P\_{ij} \right) \delta(c\_i, c\_j) \,, \tag{1}$$

where *ci* is the community of node *i*; *Aij* are elements of the graph adjacency matrix; *di* is the degree of node *i*, *di* = ∑<sub>*j*</sub> *Aij*; *m* is the total number of links, *m* = ∑<sub>*i*</sub> *di*/2; *Pij* is the probability that nodes *i* and *j* are connected in a null model; if a random graph is taken as the null model, then

*Pij* = *didj*/2*m*.


By construction |*Q*| < 1, and *Q* = 0 means that the network under study is equivalent to the null model used (an equivalent random graph). The case *Q* > 0 indicates the presence of a community structure, i.e., more links remain within communities than would be expected in an equivalent random graph. Hence, a network partition which maximizes modularity may be used to locate communities. This maximization is NP-hard and many suboptimal algorithms have been suggested, e.g., see (Fortunato, 2011) and references therein.
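To make definition (1) concrete, the following minimal sketch evaluates *Q* for a few partitions of a small toy graph. The function names and the toy graph (two triangles joined by one edge) are illustrative assumptions and do not come from the chapter.

```python
def modularity(adj, labels):
    """Modularity Q of eq. (1) with the null model P_ij = d_i * d_j / (2m)."""
    n = len(adj)
    degree = [sum(row) for row in adj]   # d_i = sum_j A_ij
    two_m = sum(degree)                  # 2m = sum_i d_i
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:   # delta(c_i, c_j)
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def two_triangles():
    """Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0.0] * 6 for _ in range(6)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return adj


adj = two_triangles()
q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_whole = modularity(adj, [0, 0, 0, 0, 0, 0])   # everything in one community
q_single = modularity(adj, [0, 1, 2, 3, 4, 5])  # every node on its own
print(q_split, q_whole, q_single)
```

For this graph the split into two triangles gives *Q* = 5/14, the single all-inclusive community gives *Q* = 0, and the all-singletons partition gives a negative value, in line with the interpretation of the sign of *Q* above.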

In the following we use the basic greedy search algorithm (Newman, 2004) extended with a random walk approach, since it gives a reasonable trade-off between accuracy of community detection and scalability.

#### **Greedy Search Algorithm**

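Input: a weighted graph described by an *N* × *N* adjacency matrix **A**.

**Initialize** each node *i* as a community *ci* with modularity *Q*(*i*) = −(*di*/2*m*)<sup>2</sup>.

**Repeat** while there is an increase in modularity:

for all nodes *i* **do**: place *i* in each neighboring community *cj*, *j* ∈ *N*(*i*), and calculate the resulting modularity *Q*(*c*<sub>*j*</sub><sup>(*i*→*cj*)</sup>); then move *i* into the community *c*<sup>∗</sup><sub>*j*</sub> with max modularity,

$$Q\left(c\_j^{(i \to c\_j^\*)}\right) = \max\_{j \in N(i)} Q\left(c\_j^{(i \to c\_j)}\right).$$

**Stop** when (a local) maximum is reached.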
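The loop above can be sketched in a few lines of Python. This is one possible reading of the node-moving procedure, not the author's implementation: it recomputes the full modularity for every tentative move (simple but slow), and the function names and the toy graph are assumptions made for illustration.

```python
def modularity(adj, labels):
    """Modularity Q of eq. (1) with P_ij = d_i * d_j / (2m)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def greedy_search(adj, tol=1e-12):
    """Node-moving greedy search: returns (labels, modularity)."""
    n = len(adj)
    labels = list(range(n))              # Initialize: each node is a community
    best_q = modularity(adj, labels)
    improved = True
    while improved:                      # Repeat while modularity increases
        improved = False
        for i in range(n):
            home = labels[i]
            best_label = home
            # communities of the neighbours of node i
            candidates = sorted({labels[j] for j in range(n) if adj[i][j] > 0})
            for c in candidates:
                if c == home:
                    continue
                labels[i] = c            # tentatively move i into community c
                q = modularity(adj, labels)
                if q > best_q + tol:
                    best_q, best_label = q, c
            labels[i] = best_label       # keep the best move, or stay put
            if best_label != home:
                improved = True
    return labels, best_q                # a local maximum has been reached


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

labels, q = greedy_search(adj)
print(labels, q)
```

On this toy graph the search ends with one community per triangle. In general the result is only a local maximum and may depend on the order in which nodes are visited.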

#### **2.2 Communities detection with random walk**

It is well known that a network topology affects the system dynamics, which allows us to use the system dynamics to identify the underlying topology (Arenas et al, 2006; 2008; Lambiotte et al, 2009). First, we review the Laplacian dynamics formalism recently developed in (Evans & Lambiotte, 2009; Lambiotte et al, 2009).

Let's consider *N* independent identical Poisson processes defined on every node of a graph *G*(*V*, *E*), |*V*| = *N*, where random walkers are jumping at a constant rate from each of the nodes. We define *pi*,*n* as the density of random walkers on node *i* at step *n*; then its dynamics is given by

$$p\_{i,n+1} = \sum\_{j} \frac{A\_{ij}}{d\_j} p\_{j,n} \,. \tag{2}$$

The corresponding continuous-time process, described by (3),

$$\frac{dp\_i}{dt} = \sum\_j \frac{A\_{ij}}{d\_j} p\_j - p\_i = \sum\_j \left(\frac{A\_{ij}}{d\_j} - \delta\_{ij}\right) p\_j \tag{3}$$


is driven by the random walk operator *Aij*/*dj* − *δij*, which in the case of a discrete-time process is represented by the random walk matrix **L***rw* = **D**<sup>−1</sup>**L** = **I** − **D**<sup>−1</sup>**A**, where **L** = **D** − **A** is the Laplacian matrix, **A** is a non-negative weighted adjacency matrix, and **D** = diag{*di*}, *i* = 1, . . . , *N*. For an undirected connected network the stationary solution of (2) is given by *p*<sup>∗</sup><sub>*i*</sub> = *di*/2*m*.
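As a quick numerical illustration of the stationary solution, the sketch below iterates (2) on a small toy graph and compares the result with *di*/2*m*. The graph, the starting distribution and the number of steps are assumptions chosen for illustration. The toy graph is connected and contains triangles, so the discrete-time walk converges.

```python
def random_walk_step(adj, p):
    """One step of eq. (2): p_{i,n+1} = sum_j (A_ij / d_j) * p_{j,n}."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    return [sum(adj[i][j] / degree[j] * p[j] for j in range(n)) for i in range(n)]


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # all walkers start on node 0
for _ in range(200):
    p = random_walk_step(adj, p)

degree = [sum(row) for row in adj]
two_m = sum(degree)
stationary = [d / two_m for d in degree]  # p*_i = d_i / 2m
print(p)
print(stationary)
```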

Let's now assume that for an undirected network there exists a partition P with communities *ck* ∈ P, *k* = 1, . . . , *Nc*. The probability that initially, at *t*0, a random walker belongs to a community *ck* is Pr(*ck*, *t*0) = ∑<sub>*j*∈*ck*</sub> *dj*/2*m*. The probability that a random walker, which was initially in *ck*, will stay in the same community at the next step *t*0 + 1 is given by

$$\Pr\left(c\_k, t\_0, t\_0+1\right) = \sum\_{j \in c\_k} \sum\_{i \in c\_k} \left(\frac{A\_{ij}}{d\_j}\right) \left(\frac{d\_j}{2m}\right). \tag{4}$$

The assumption that the dynamics is ergodic means that the memory of the initial conditions is lost at infinity; hence Pr(*ck*, *t*0, ∞) is equal to the probability that two independent walkers are in *ck*,

$$\Pr(c\_k, t\_0, \infty) = \left(\sum\_{i \in c\_k} \frac{d\_i}{2m}\right) \left(\sum\_{j \in c\_k} \frac{d\_j}{2m}\right). \tag{5}$$

Combining (4) and (5) we may write

$$\sum\_{c\_k \in \mathcal{P}} \left( \Pr\left( c\_k, t\_0, t\_0 + 1 \right) - \Pr(c\_k, t\_0, \infty) \right) = \frac{1}{2m} \sum\_{i, j} \left( A\_{ij} - \frac{d\_i d\_j}{2m} \right) \delta(c\_i, c\_j) = Q \,. \tag{6}$$
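Identity (6) is easy to check numerically. The sketch below evaluates (4) and (5) for each community of a toy partition and compares their summed difference with the modularity computed directly from (1). All function names and the toy graph are illustrative assumptions.

```python
def stay_probability(adj, community):
    """Eq. (4): a walker starting in the community is still there one step later."""
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    return sum((adj[i][j] / degree[j]) * (degree[j] / two_m)
               for j in community for i in community)


def limit_probability(adj, community):
    """Eq. (5): two independent walkers are both found in the community."""
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    share = sum(degree[i] for i in community) / two_m
    return share * share


def modularity(adj, communities):
    """Eq. (1) with P_ij = d_i * d_j / (2m), summed community by community."""
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = 0.0
    for community in communities:
        for i in community:
            for j in community:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

partition = [[0, 1, 2], [3, 4, 5]]
lhs = sum(stay_probability(adj, c) - limit_probability(adj, c) for c in partition)
rhs = modularity(adj, partition)
print(lhs, rhs)
```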

In the general case, using (3), one may define the stability of the partition P as (Evans & Lambiotte, 2009; Lambiotte et al, 2009)

$$R\_{\mathcal{P}}(t) = \sum\_{c\_k \in \mathcal{P}} \left[ \Pr\left(c\_k, t\_0, t\_0 + t\right) - \Pr(c\_k, t\_0, \infty) \right] \tag{7}$$

$$= \sum\_{c\_k \in \mathcal{P}} \sum\_{i, j \in c\_k} \left( \left( e^{t(\hat{A} - I)} \right)\_{ij} \frac{d\_j}{2m} - \frac{d\_i d\_j}{4m^2} \right), \text{ where } \hat{A}\_{ij} = \frac{A\_{ij}}{d\_j}. \tag{8}$$

Then, as a special case of (8) at *t* = 1, we get the expression for modularity (6). Note that *R*<sub>P</sub>(*t*) is a non-increasing function of time; at *t* = 0 we get

$$R\_{\mathcal{P}}(0) = 1 - \sum\_{c\_k \in \mathcal{P}} \sum\_{i,j \in c\_k} \frac{d\_i d\_j}{4m^2} \tag{9}$$

and max<sub>P</sub> *R*<sub>P</sub>(0) is reached when each node is assigned to its own community. Note that (9) corresponds to the collision entropy, or Rényi entropy of order 2.

On the other hand, in the limit *t* → ∞, the maximum of *R*<sub>P</sub>(*t*) is achieved with the Fiedler spectral decomposition into 2 communities. In other words, time here may be seen as a resolution parameter: with time *t* increasing, max<sub>P</sub> *R*<sub>P</sub>(*t*) results in a sequence of hierarchical partitions {P*t*} with decreasing numbers of communities.
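The behavior of the stability (8) over time can be explored numerically. The sketch below is a minimal pure-Python illustration: it approximates the matrix exponential by a truncated Taylor series, which is adequate only for small graphs and moderate *t*. The helper names, the truncation length and the toy graph are assumptions.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=40):
    """Truncated Taylor series for exp(m); fine for small matrices of modest norm."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]   # m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def stability(adj, labels, t):
    """Eq. (8) with A_hat_ij = A_ij / d_j."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    generator = [[t * (adj[i][j] / degree[j] - (1.0 if i == j else 0.0))
                  for j in range(n)] for i in range(n)]
    flow = mat_exp(generator)
    r = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                r += flow[i][j] * degree[j] / two_m - degree[i] * degree[j] / two_m ** 2
    return r


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

split = [0, 0, 0, 1, 1, 1]
r_values = [stability(adj, split, t) for t in (0.0, 1.0, 3.0)]
print(r_values)
```

At *t* = 0 the value reduces to (9), and the printed sequence decreases with *t*, as expected for a non-increasing function.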

Furthermore, as shown in (Evans & Lambiotte, 2009), we may define a time-varying modularity *Q*(*t*) by keeping the linear terms in the time expansion of *R*(*t*) at *t* ≈ 0,

$$R(t) \approx (1 - t) \cdot R(0) + t \cdot Q = Q(t) \,, \tag{10}$$

which, after substitution of (6) and (9), gives


$$Q(t) = (1 - t) + \sum\_{c\_k \in \mathcal{P}} \sum\_{i, j \in c\_k} \left( \frac{A\_{ij}}{2m} t - \frac{d\_i d\_j}{4m^2} \right) \,. \tag{11}$$
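For completeness, the substitution can be written out step by step. The following worked expansion uses only (6), (9) and (10), with the sum in (6) rewritten community by community:

$$Q(t) = (1 - t) \left( 1 - \sum\_{c\_k \in \mathcal{P}} \sum\_{i,j \in c\_k} \frac{d\_i d\_j}{4m^2} \right) + t \sum\_{c\_k \in \mathcal{P}} \sum\_{i,j \in c\_k} \left( \frac{A\_{ij}}{2m} - \frac{d\_i d\_j}{4m^2} \right)$$

$$= (1 - t) + \sum\_{c\_k \in \mathcal{P}} \sum\_{i,j \in c\_k} \left( \frac{A\_{ij}}{2m} t - \left[ (1 - t) + t \right] \frac{d\_i d\_j}{4m^2} \right).$$

Since (1 − *t*) + *t* = 1, the null-model term does not depend on *t*, and the last line reduces to (11).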

In the following we apply time-dependent modularity maximization (11) using the greedy search to find hierarchical structures in networks beyond the modularity maximization *Qmax* in (1). This approach is useful in cases where maximization of (1) results in a very fragmented structure with a large number of communities. It also allows us to evaluate the stability of communities at different resolution levels. However, since the adjacency matrix **A** is not time-dependent, the time-varying modularity (11) cannot be used to make predictions beyond the given topology.
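The role of *t* as a resolution parameter can be seen by evaluating (11) for a few fixed partitions. The sketch below is illustrative only. The function name, the toy graph and the chosen values of *t* are assumptions.

```python
def time_modularity(adj, labels, t):
    """Time-dependent modularity Q(t) of eq. (11)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = 1.0 - t
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] * t / two_m - degree[i] * degree[j] / two_m ** 2
    return q


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

partitions = {
    "singletons": [0, 1, 2, 3, 4, 5],
    "triangles": [0, 0, 0, 1, 1, 1],
    "whole": [0, 0, 0, 0, 0, 0],
}
for t in (0.1, 1.0, 4.0):
    scores = {name: time_modularity(adj, lab, t) for name, lab in partitions.items()}
    print(t, max(scores, key=scores.get), scores)
```

Among these three candidate partitions, small *t* favors the fine partition, *t* = 1 reproduces the ordering given by the ordinary modularity (1), and large *t* favors the coarse one.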

#### **3. Topology detection using coupled dynamical systems**

#### **3.1 Laplacian formulation of network dynamics**

Let's consider an undirected weighted graph *G* = {*V*, *E*} with *N* nodes and *E* edges, where each node represents a local dynamical system and edges correspond to local coupling. Dynamics of *N* locally coupled dynamical systems on the graph *G* is described by

$$\dot{x}\_i(t) = q\_i(x\_i(t)) + k\_c \sum\_{j=1}^{N} A\_{ij} \psi \left( x\_j(t) - x\_i(t) \right), \tag{12}$$

where *qi*(*xi*) describes a local dynamics of state *xi*; *Aij* is a coupling strength between nodes *i* and *j*; *ψ*(·) is a coupling function; *kc* is a global coupling gain.

In the case of weakly phase-coupled oscillators the dynamics of local states is described by the Kuramoto model (Acebron et al, 2005; Kuramoto, 1975)

$$\dot{\theta}\_i(t) = \omega\_i + k\_c \sum\_{j=1}^{N} A\_{ij} \sin \left[ \theta\_j(t) - \theta\_i(t) \right] \,. \tag{13}$$

The linear approximation of the coupling function, sin(*θ*) ≈ *θ*, in (13) results in the consensus model (Olfati-Saber et al, 2007)

$$\dot{\theta}\_i(t) = k\_c \sum\_{j=1}^{N} A\_{ij} \left[ \theta\_j(t) - \theta\_i(t) \right],\tag{14}$$

which for a connectivity graph *G* may be written as

$$
\dot{\Theta}(t) = -k\_c \mathbf{L} \,\Theta(t) \,, \tag{15}
$$


where **L** = **D** − **A** is the Laplacian matrix of *G*. The solution of (15) in the form of normal modes *ωi*(*t*) may be written as

$$
\omega\_i(t) = k\_c \sum\_{j=1}^N V\_{ij} \theta\_j(t) = k\_c \,\omega\_i(t\_0) e^{-\lambda\_i t} \,, \tag{16}
$$

where *λ*1, ..., *λN* are eigenvalues and **V** is the matrix of eigenvectors of **L**. Note that (16) describes the speed of convergence to a consensus for each node. Let's order these equations according to the descending order of their eigenvalues. Then it is easy to see that nodes approach the consensus in a hierarchical way, revealing at the same time a hierarchy of communities in the given network *G*.
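The hierarchical approach to consensus can be observed with a simple forward-Euler integration of (15). This is a minimal sketch under stated assumptions: **L** = **D** − **A**, a toy graph of two triangles joined by one edge, and hypothetical values for the step size, the number of steps and the initial phases.

```python
def laplacian(adj):
    """L = D - A for a graph without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def consensus(adj, theta0, kc=1.0, dt=0.05, steps=2000):
    """Forward-Euler integration of eq. (15): dTheta/dt = -kc * L * Theta."""
    lap = laplacian(adj)
    n = len(adj)
    theta = list(theta0)
    for _ in range(steps):
        flow = [sum(lap[i][j] * theta[j] for j in range(n)) for i in range(n)]
        theta = [theta[i] - dt * kc * flow[i] for i in range(n)]
    return theta


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

theta0 = [0.0, 0.5, 1.0, 2.0, 2.5, 3.0]
early = consensus(adj, theta0, steps=40)   # t = 2: agreement inside the triangles
late = consensus(adj, theta0)              # t = 100: global consensus
print(early)
print(late)
```

At the earlier time, nodes 0 and 1 of the same triangle already agree closely while the two triangles are still far apart. At the later time all nodes have reached the average of the initial phases.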

Note that (15) has the same form as (3), with the difference that the random walk process (3) is based on **L***rw* = **D**<sup>−1</sup>**L**. This allows us to consider the random-walk-based community detection of the previous section as a special case of coupled oscillator synchronization.

Similarly to (15), we may derive the Laplacian presentation for locally coupled oscillators (13). In particular, the connectivity of a graph may be described by the graph incidence (*N* × *E*) matrix **B**: {**B**}*ij* = 1 (or −1) if nodes *j* and *i* are connected, otherwise {**B**}*ij* = 0. In case of weighted graphs we use the weighted Laplacian defined as

$$\mathbf{L}\_{\mathbf{A}} \stackrel{\triangle}{=} \mathbf{B} \mathbf{D}\_{\mathbf{A}} \mathbf{B}^{T} \,. \tag{17}$$

Based on (17) we can rewrite (13) as

$$\dot{\Theta}(t) = \boldsymbol{\Omega} - k\_c \mathbf{B} \mathbf{D}\_{\mathbf{A}} \sin \left( \mathbf{B}^{T} \Theta(t) \right), \tag{18}$$

where vectors and matrices are defined as follows:

Θ(*t*) ≜ [*θ*1(*t*), ··· , *θN*(*t*)]<sup>*T*</sup>; Ω(*t*) ≜ [*ω*1(*t*), ··· , *ωN*(*t*)]<sup>*T*</sup>; **D**<sub>A</sub> ≜ diag{*a*1, ..., *aE*}, where *a*1, ..., *aE* are the weights *Aij* indexed from 1 to *E*.

In the following we use (18) to describe different coupling scenarios.
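The equivalence between the incidence-matrix form (18) and the node-wise form (13) can be checked numerically. The sketch below assumes one particular orientation convention (+1 at the first endpoint of each edge, −1 at the second). The function names, edge weights and phases are hypothetical.

```python
import math


def incidence_matrix(edges, n):
    """N x E incidence matrix: +1 at the first endpoint of an edge, -1 at the second."""
    b = [[0.0] * len(edges) for _ in range(n)]
    for e, (i, j) in enumerate(edges):
        b[i][e], b[j][e] = 1.0, -1.0
    return b


def weighted_laplacian(b, weights):
    """Eq. (17): L_A = B * D_A * B^T."""
    n = len(b)
    return [[sum(b[i][e] * w * b[j][e] for e, w in enumerate(weights))
             for j in range(n)] for i in range(n)]


def coupling_incidence(b, weights, theta, kc=1.0):
    """Coupling term of eq. (18): -kc * B * D_A * sin(B^T * Theta)."""
    n = len(b)
    diff = [sum(b[i][e] * theta[i] for i in range(n)) for e in range(len(weights))]
    edge_term = [w * math.sin(d) for w, d in zip(weights, diff)]
    return [-kc * sum(b[i][e] * edge_term[e] for e in range(len(weights)))
            for i in range(n)]


def coupling_direct(adj, theta, kc=1.0):
    """Coupling term of eq. (13): kc * sum_j A_ij * sin(theta_j - theta_i)."""
    n = len(adj)
    return [kc * sum(adj[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)]


# Hypothetical weighted toy graph: two triangles joined by a weaker edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5]
adj = [[0.0] * 6 for _ in range(6)]
for (i, j), w in zip(edges, weights):
    adj[i][j] = adj[j][i] = w

b = incidence_matrix(edges, 6)
theta = [0.1, 0.7, 1.3, 2.9, 3.4, 5.0]
via_incidence = coupling_incidence(b, weights, theta)
via_sum = coupling_direct(adj, theta)
lap = weighted_laplacian(b, weights)
print(via_incidence)
print(via_sum)
```

With this convention the weighted Laplacian (17) also coincides with **D** − **A**, independently of how each edge is oriented.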

#### **3.2 Dynamical structures with different coupling scenarios**

Let's consider local correlations between instant phases of oscillators,

$$r\_{ij}(t) = \langle \cos \left[\theta\_j(t) - \theta\_i(t)\right] \rangle,\tag{19}$$

where the average is taken over initial random phases *θi*(*t* = 0).

Following (Arenas et al, 2006; 2008) we may define a dynamical connectivity matrix **C***t*(*η*), where two nodes *i* and *j* are connected at time *t* if their local phase correlation is above a given threshold *η*,

$$
C\_t(\eta)\_{ij} = 1 \ \text{ if } \ r\_{ij}(t) > \eta \,, \qquad C\_t(\eta)\_{ij} = 0 \ \text{ if } \ r\_{ij}(t) < \eta \,. \tag{20}
$$

We select the community resolution level (time *t*) using a random walk as in Section 2. Next, by changing the threshold *η*, we obtain a set of connectivity matrices **C***t*(*η*) which reveal dynamical topological structures for different correlation levels. Since the local correlations *rij*(*t*) are continuous and monotonic functions of time, we may also fix *η* and express the dynamical connectivity matrix (20) in the form **C***η*(*t*) to present the evolution of connectivity in time for a fixed correlation threshold *η*. Using this approach we consider below several scenarios of network evolution with dynamically changing coupling.
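A minimal sketch of how (19) and (20) might be evaluated numerically is given below. It makes several assumptions that are not taken from the chapter: identical oscillators with zero natural frequency, forward-Euler integration of (13), a finite number of random initial conditions, and hypothetical values of *t* and *η*. Entries with *rij*(*t*) exactly equal to *η* are set to 0 here.

```python
import math
import random


def kuramoto_phases(adj, theta0, kc=1.0, dt=0.05, steps=40):
    """Forward-Euler integration of eq. (13) with omega_i = 0 (an assumption)."""
    n = len(adj)
    theta = list(theta0)
    for _ in range(steps):
        drift = [kc * sum(adj[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
                 for i in range(n)]
        theta = [theta[i] + dt * drift[i] for i in range(n)]
    return theta


def local_correlations(adj, runs=100, seed=0, steps=40):
    """Eq. (19): cos(theta_j - theta_i) averaged over random initial phases."""
    rng = random.Random(seed)
    n = len(adj)
    sums = [[0.0] * n for _ in range(n)]
    for _ in range(runs):
        theta0 = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
        theta = kuramoto_phases(adj, theta0, steps=steps)
        for i in range(n):
            for j in range(i + 1, n):
                sums[i][j] += math.cos(theta[j] - theta[i])
    r = [[1.0] * n for _ in range(n)]          # r_ii = cos(0) = 1
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = r[j][i] = sums[i][j] / runs
    return r


def connectivity(r, eta):
    """Eq. (20): connect i and j when r_ij(t) exceeds the threshold eta."""
    return [[1 if value > eta else 0 for value in row] for row in r]


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

r = local_correlations(adj, runs=100, seed=0, steps=40)
c = connectivity(r, eta=0.5)
for row in c:
    print(row)
```

Repeating the computation for several values of `steps` with a fixed `eta` gives a crude picture of how the connectivity evolves in time for a fixed correlation threshold.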

#### **B.1. Attractive coupling with dynamical updates**


As the first step, let's introduce dynamics into static attractive coupling (13). Using the dynamical connectivity matrix (20) we may write

$$\dot{\theta}\_i(t) = \omega\_i + k\_c \sum\_{j=1}^{N} F\_{ij}^{(\eta)}(t) \sin \left[\theta\_j(t) - \theta\_i(t)\right],\tag{21}$$

where the matrix **F**<sup>(*η*)</sup>(*t*) describes dynamical attractive coupling, *F*<sub>*ij*</sub><sup>(*η*)</sup>(*t*) = *Aij* **C***η*(*t*)*ij* ≥ 0. Then, similar to (18), the attractive coupling with a dynamical update may be described as

$$\dot{\Theta}(t) = \boldsymbol{\Omega} - k\_{\text{c}} \mathbf{B}(t) \mathbf{D}\_{\text{F}}(t) \sin \left( \mathbf{B}(t)^{T} \Theta(t) \right), \tag{22}$$

where initial conditions are defined by *Aij*; **D**<sub>F</sub>(*t*) is formed from **D**<sub>A</sub> with elements {*ak*} scaled according to **C***η*(*t*).

#### **B.2. Combination of attractive and repulsive coupling with dynamical links update**

Many biological and social systems show competition between conflicting processes. In the case of coupled oscillators it may be modeled as attractive coupling (driving oscillators into global synchronization) combined with repulsive coupling (forcing the system into chaotic/random behavior). To allow positive and negative interactions we use the instantaneous correlation matrix **R**(*t*) = **R**<sup>+</sup>(*t*) + **R**<sup>−</sup>(*t*), and separate the attractive and repulsive parts

$$\dot{\theta}\_i(t) = \omega\_i + k\_c \sum\_{j=1}^{N} r\_{ij}^+(t) A\_{ij} \sin\left[\theta\_j(t) - \theta\_i(t)\right] - k\_c \sum\_{j=1}^{N} \left| r\_{ij}^-(t) \right| A\_{ij} \sin\left[\theta\_j(t) - \theta\_i(t)\right], \tag{23}$$

where superscripts denote positive and negative correlations<sup>1</sup>.

Note that the total number of links in the network does not change: at a given time instant each link performs either an attractive or a repulsive function.
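The right-hand side of (23) can be sketched directly. In the minimal example below the function name `rhs_mixed` and the two-node test case are hypothetical; the only modeling step taken from the text is the split of each correlation *rij* by sign, so that every link acts as either attractive or repulsive.

```python
import math

def rhs_mixed(theta, omega, A, R, kc):
    """Right-hand side of (23): positive correlations attract, negative ones repel."""
    n = len(theta)
    out = []
    for i in range(n):
        attract = 0.0
        repel = 0.0
        for j in range(n):
            s = A[i][j] * math.sin(theta[j] - theta[i])
            if R[i][j] > 0:
                attract += R[i][j] * s          # r_ij^+ term
            else:
                repel += abs(R[i][j]) * s       # |r_ij^-| term
        out.append(omega[i] + kc * attract - kc * repel)
    return out

# Hypothetical two-node example with a quarter-turn phase difference.
A = [[0, 1], [1, 0]]
theta = [0.0, math.pi / 2]
omega = [0.0, 0.0]
attractive = rhs_mixed(theta, omega, A, R=[[1, 1], [1, 1]], kc=2.0)
repulsive = rhs_mixed(theta, omega, A, R=[[1, -1], [-1, 1]], kc=2.0)
```

With a positive correlation node 0 is pulled towards node 1 (positive phase velocity); with a negative correlation on the same link the sign flips and the phases are pushed apart.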

To obtain the Laplacian presentation we define a dynamical connectivity matrix **F**(*t*) as element-by-element matrix product

$$\mathbf{F}(t) = \mathbf{R}(t) \circ \mathbf{A} = \mathbf{F}^+(t) + \mathbf{F}^-(t), \tag{24}$$

and present the dynamic Laplacian as

$$\mathbf{L}\_{\mathbf{F}}(t) = \mathbf{B}(t)(\mathbf{D}\_{\mathbf{F}^+}(t) + \mathbf{D}\_{\mathbf{F}^-}(t))\mathbf{B}^T(t). \tag{25}$$

It allows us to write

$$\dot{\theta}\_i(t) = \omega\_i + k\_{\text{c}} \sum\_{j=1}^{N} F\_{ij}^+(t) \sin \left[ \theta\_j(t) - \theta\_i(t) \right] - k\_{\text{c}} \sum\_{j=1}^{N} F\_{ij}^-(t) \sin \left[ \theta\_j(t) - \theta\_i(t) \right], \tag{26}$$

<sup>1</sup> For presentation clarity we omit here the correlation threshold *η*.


or in matrix form

$$\dot{\boldsymbol{\Theta}}(t) = \boldsymbol{\Omega} - k\_{\mathrm{c}} \mathbf{B}(t) \mathbf{D}\_{\mathrm{F}^{+}}(t) \sin \left( \mathbf{B}^{T}(t) \boldsymbol{\Theta}(t) \right) + k\_{\mathrm{c}} \mathbf{B}(t) \mathbf{D}\_{\mathrm{F}^{-}}(t) \sin \left( \mathbf{B}^{T}(t) \boldsymbol{\Theta}(t) \right) \, . \tag{27}$$

#### **B.3. Combination of attractive and initially neutral coupling with dynamical links update**

Negative correlations (resulting in repulsive coupling) are typically assigned between nodes which are not initially connected. However, in many cases this scenario is not realistic. For example, in social networks, the absence of communication between people does not necessarily indicate conflicting (negative) relations, but often has a neutral meaning. To take this observation into account we modified the second term in (23) such that it sets neutral initial conditions for unconnected nodes in the adjacency matrix **A**. In particular, the system dynamics with links update (24) and initially neutral coupling is described by

$$\dot{\theta}\_i(t) = \omega\_i + k\_{\mathrm{c}} \sum\_{j=1}^N F\_{ij}^+(t) \sin \left[\theta\_j(t) - \theta\_i(t)\right] + k\_{\mathrm{c}} \sum\_{j=1}^N F\_{ij}^-(t) \cos \left[\theta\_j(t) - \theta\_i(t)\right],\tag{28}$$

or in matrix form

$$\dot{\boldsymbol{\Theta}}(t) = \boldsymbol{\Omega} - k\_{\mathrm{c}} \mathbf{B}(t) \mathbf{D}\_{\mathrm{F}^{+}}(t) \sin \left( \mathbf{B}^T(t) \boldsymbol{\Theta}(t) \right) - k\_{\mathrm{c}} \mathbf{B}(t) \mathbf{D}\_{\mathrm{F}^{-}}(t) \cos \left( \mathbf{B}^T(t) \boldsymbol{\Theta}(t) \right) \,. \tag{29}$$

Then a dynamical interplay between the given network topology and local interactions drives the connectivity evolution. We evaluated the scenarios above using different clustering measures (Manning et al, 2008) and found that scenario *B.3* shows the best performance. In the following we use the coupled-system dynamics approach to predict network evolution and to make missing-link predictions and recommendations. Furthermore, the suggested approach also allows us to predict repulsive relations in the network based on the network topology and link dynamics.

#### **4. Overlapping communities**

#### **4.1 Multi-membership**

In social networks people belong to several overlapping communities depending on their families, occupations, hobbies, etc. As a result, users (represented by nodes in a graph) may have different levels of membership in different communities. This fact motivated us to consider multi-community membership as edge-weights to different communities and to partition edges instead of clustering nodes.

As an example, we can measure the membership *gj*(*k*) of node *k* in the *j*-th community as the number of links (or their total weight for a weighted graph) between the *k*-th node and the nodes within that community, *gj*(*k*) = ∑<sub>*i*∈*cj*</sub> *wki*. Then, for each node *k* we assign a vector **g**(*k*) = [*g*1(*k*), *g*2(*k*), ..., *gNc*(*k*)], *k* ∈ {1, ..., *N*}, which presents the node membership (or participation) in all detected communities {*c*1, ..., *cNc*}. In the following we refer to **g**(*k*) as a soft community decision for the *k*-th node.
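The membership vector **g**(*k*) can be computed in a few lines. The sketch below assumes the graph is stored as a dictionary of weighted neighbor maps and that a hard partition into communities is already available; the function name and the five-node example are hypothetical.

```python
def soft_membership(weights, communities):
    """g_j(k): total weight of links from node k to nodes in community c_j."""
    g = {}
    for k in sorted(weights):
        g[k] = [sum(w for i, w in weights[k].items() if i in c) for c in communities]
    return g

# Hypothetical weighted graph: weights[k][i] = w_ki (symmetric).
weights = {
    1: {2: 1.0, 3: 1.0, 4: 2.0},
    2: {1: 1.0, 3: 1.0},
    3: {1: 1.0, 2: 1.0},
    4: {1: 2.0, 5: 1.0},
    5: {4: 1.0},
}
communities = [{1, 2, 3}, {4, 5}]   # c_1 and c_2 from some hard partition
g = soft_membership(weights, communities)
```

Here node 1 is assigned to the first community, yet its membership vector shows an equally strong tie to the second one, which is the kind of multi-membership a hard decision would hide.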

To illustrate the approach, overlapping communities derived from the benchmark karate club social network (Zachary, 1977) and membership distributions for selected nodes are depicted in Fig.1 and Fig.2, respectively. Modularity maximization here reveals 4 communities shown by different colors. However, the multi-community membership results in overlapping communities illustrated by overlapping ovals (Fig.1). For example, according to modularity maximization, node 1 belongs to community *c*2, but it also has links to all other communities, as indicated by the blue bars in Fig.2.

Participation of different nodes in selected communities is presented in Fig.3 and Fig.4. These graphs show that even if a node is assigned by some community detection algorithm to a certain community, it may still have significant membership in other communities. This multi-community membership is one of the reasons why different algorithms disagree on community partitions. In practice, e.g., in targeted advertisements, due to the "hard" decision in community detection, some users may be missed even if they are strongly related to the targeted group. For example, user '29' is assigned to *c*3 (Fig.1), but it also has equally strong memberships in *c*2 and *c*4 (Fig.3). Using soft community detection, user '29' can also be qualified for advertisements targeted to *c*2 or *c*4.

Fig. 1. Overlapping communities in karate club.

Fig. 2. Membership weight distribution for selected users in karate club social network.

Fig. 3. Karate club: participation of users in communities *c*2, *c*4.

Fig. 4. Karate club: participation of users in communities *c*1, *c*3.

#### **4.2 Application of soft community detection for recommendation systems**

In online social networks a recommendation of new social links may be seen as an attractive service. Recently Facebook and LinkedIn introduced a service "People You May Know", which recommends new connections using the friend-of-friend (FoF) approach. However, in large networks the FoF approach may create a long and often irrelevant list of recommendations, which is difficult (and also computationally expensive, in particular in mobile solutions) to navigate. Furthermore, in mobile social networks (e.g., Nokia portal Ovi Store) these kinds of recommendations are even more complicated because users' affiliations to different groups (and even the number of groups) are not known. Hence, before making recommendations, communities are to be detected first.

#### **Recommendations as communities completion**

Based on soft communities detection we suggest making the FoF recommendations as follows:

(i) detect communities, e.g., by using one of the methods described above;

(ii) calculate membership *gj*(*k*) in all relevant communities for each node *k*;

(iii) make new recommendations as communities completion following the rules below;

(iv) use multiple-membership to prioritize recommendations.

To make new link recommendations in (iii) we suggest the following rules:

• each new link creates at least one new clique (the FoF concept);

• complete cliques within the same community (intra-cliques) using the FoF concept;

• complete cliques towards the fully-connected own community if there are no FoF links;

• complete inter-cliques (where nodes belong to different communities);

• prioritize intra-clique and inter-clique link completion according to some measure based on multi-membership.

To assign priorities we introduce several similarity measures outlined below. We will show in the next sections that these rules are well in line with link predictions made by the coupled dynamical systems described in Section 3.

#### **Modified topology-based predictors**

Let's define the sets of neighbors of node *k* which are inside and outside of community *ci* as Γ<sub>*i*</sub>(*k*) = {Γ(*k*) ∈ *ci*} and Γ<sub>\\*i*</sub>(*k*) = {Γ(*k*) ∉ *ci*}, respectively. This allows us to introduce a set of similarity measures by modifying topology-based base-line predictors listed in (Liben-Nowel & Kleinberg, 2003) to take into account the multiple-membership in overlapping communities.

As an example, for the intra-clique completion we may associate a quality of missing link prediction (or recommendation) between nodes *k* and *n* within community *ci* by modifying the base-line predictor scores as follows:

$$S\_{\rm JC}^{(i,i)}(k,n) = \frac{|\Gamma\_i(k) \cap \Gamma\_i(n)|}{|\Gamma\_i(k) \cup \Gamma\_i(n)|}; \qquad S\_{\rm AA}^{(i,i)}(k,n) = \sum\_{z \in \Gamma\_i(k) \cap \Gamma\_i(n)} \left(\log |\Gamma(z)|\right)^{-1}; \qquad S\_{\rm PA}^{(i,i)}(k,n) = |\Gamma\_i(k)| \cdot |\Gamma\_i(n)|;$$

$$S\_{\rm KC}^{(i,i)}(k,n) = \sum\_{l=1}^{\infty} \beta^l |\text{path}\_i(k,n)^{(l)}| = \left\{ (I - \beta A^{(i)})^{-1} - I \right\}\_{(k,n)},$$

where |path<sub>*i*</sub>(*k*, *n*)<sup>(*l*)</sup>| is the number of all paths of length *l* from *k* to *n* within *ci*; *I* is the identity matrix, *A*<sup>(*i*)</sup> is the (weighted) adjacency matrix of community *ci*, and *β* is a damping parameter, 0 < *β* < 1, such that ∑*ij* *βAij* < 1.
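The intra-community predictors (Jaccard, Adamic-Adar, preferential attachment and Katz) can be sketched on a toy graph as below. Everything named here is an assumption for illustration: the helper names, the five-node graph, the choice *β* = 0.05, and the truncation of the Katz series after `max_len` terms instead of the matrix inverse. Note that powers of the adjacency matrix count walks, which is what the series form evaluates.

```python
import math

def neighbours_in(adj, k, community):
    """Gamma_i(k): neighbours of k that lie inside community c_i."""
    return {v for v in adj[k] if v in community}

def jaccard(adj, k, n, community):
    a, b = neighbours_in(adj, k, community), neighbours_in(adj, n, community)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def adamic_adar(adj, k, n, community):
    common = neighbours_in(adj, k, community) & neighbours_in(adj, n, community)
    # log|Gamma(z)| uses the full degree of z; a common neighbour has degree >= 2
    return sum(1.0 / math.log(len(adj[z])) for z in common)

def preferential_attachment(adj, k, n, community):
    return len(neighbours_in(adj, k, community)) * len(neighbours_in(adj, n, community))

def katz(adj, k, n, community, beta, max_len=50):
    """Truncated Katz series: sum_l beta^l * (number of length-l walks inside c_i)."""
    members = sorted(community)
    # walks[v] = number of length-l walks from k to v that stay inside the community
    walks = {v: 1.0 if v == k else 0.0 for v in members}
    score = 0.0
    for l in range(1, max_len + 1):
        walks = {v: sum(walks[u] for u in adj[v] if u in community) for v in members}
        score += beta ** l * walks[n]
    return score

# Hypothetical graph: community c_i = {1, 2, 3, 4}; node 5 lies outside it.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
c_i = {1, 2, 3, 4}
s_jc = jaccard(adj, 1, 4, c_i)
s_aa = adamic_adar(adj, 1, 4, c_i)
s_pa = preferential_attachment(adj, 1, 4, c_i)
s_kc = katz(adj, 1, 4, c_i, beta=0.05)
```

For the missing link between nodes 1 and 4, both nodes share the same two in-community neighbors, so the Jaccard score is maximal, while the Katz score is dominated by the two walks of length 2.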

In addition to the base-line predictors above, we also use a community connectivity measure, *S*<sub>CC</sub><sup>(*i*,*i*)</sup>(*k*, *n*) ∼ *σ*2(**L***i*), which characterizes the convergence speed of opinions to consensus within a community *ci* when a link between nodes *k* and *n* is added inside the community; here *σ*2(**L***i*) is the 2nd smallest eigenvalue of the graph Laplacian **L***i* of community *ci* (or the normalized Laplacian for weighted graphs, based on (17)).

The measures above consider communities as disjoint sets and may be used as the 1st order approximation for link predictions in overlapping communities. To take into account both intra- and inter-community links we use multi-community membership for nodes, *gi*(*k*). In general, for nodes *k* ∈ *ci* and *n* ∈ *cj*, the inter-community relations may be asymmetric, *gj*(*k*) ≠ *gi*(*n*). In the case of undirected graphs we may use averaging and modify the base-line predictors *S*(*k*, *n*) as

$$S^{(i,j)}(k,n) = \frac{g\_j(k) + g\_i(n)}{2m} S(k,n) \,. \tag{30}$$


For example, modified Jaccard and Katz scores which take into account multi-community membership are defined as

$$S\_{\rm JC}^{(i,j)}(k,n) = \frac{g\_j(k) + g\_i(n)}{2m} \frac{|\Gamma(k) \cap \Gamma(n)|}{|\Gamma(k) \cup \Gamma(n)|},\tag{31}$$

$$S\_{\rm KC}^{(i,j)}(k,n) = \frac{g\_j(k) + g\_i(n)}{2m} \left\{ (I - \beta A^{(C\_{n,k})})^{-1} - I \right\}\_{(k,n)}, \tag{32}$$

where *k* ∈ *ci*, *n* ∈ *cj*; *A*<sup>(*Cn*,*k*)</sup> is an adjacency matrix formed by all communities relevant to nodes *n* and *k*.
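A small sketch of the membership-weighted Jaccard score (31) follows. It assumes an unweighted graph, so that *gj*(*k*) is simply the number of neighbors of *k* inside *cj*, and it interprets *m* as the total number of edges; both choices, like the helper names and the toy graph, are assumptions made for illustration.

```python
def soft_membership(adj, communities):
    """g_j(k) for an unweighted graph: number of neighbours of k inside c_j."""
    return {k: [len(adj[k] & c) for c in communities] for k in adj}

def membership_factor(g, k, n, i, j, m):
    """(g_j(k) + g_i(n)) / (2m) for k in c_i and n in c_j."""
    return (g[k][j] + g[n][i]) / (2.0 * m)

def modified_jaccard(adj, g, k, n, i, j, m):
    """Score (31): base-line Jaccard scaled by the averaged cross-membership."""
    union = adj[k] | adj[n]
    base = len(adj[k] & adj[n]) / len(union) if union else 0.0
    return membership_factor(g, k, n, i, j, m) * base

# Hypothetical graph with communities c_0 = {1, 2, 3} and c_1 = {4, 5}.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1, 5}, 5: {4}}
communities = [{1, 2, 3}, {4, 5}]
m = sum(len(v) for v in adj.values()) // 2       # total number of edges
g = soft_membership(adj, communities)
score = modified_jaccard(adj, g, k=2, n=4, i=0, j=1, m=m)
```

Node 2 has no links into *c*1 while node 4 has one link into *c*0, so the averaged factor is 1/10 and it scales the base-line Jaccard value of 1/3.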

Recommendations may also be made in a probabilistic way, e.g., drawn from distributions formed by the modified prediction scores.

#### **5. Multi-layer graphs**

In the analysis of multi-layer graphs we assume that different network layers capture different modalities of the same underlying phenomenon. For example, in the case of mobile networks the social relations are partly reflected in different interaction layers, such as phone and SMS communications recorded in call-logs, people's "closeness" extracted from Bluetooth (BT) and WLAN proximity, common GPS locations, traveling patterns, etc. It may be expected that a proper merging of data encoded in multi-graph layers can improve the classification accuracy.

One approach to analyze multi-layer graphs is first to merge graphs according to some rules and then extract communities from the combined graph. The layers may be combined directly or using some functions defined on the graphs. For example, multiple graphs may be aggregated in the spectral domain using a joint block-matrix factorization or a regularization framework (Dong et al, 2011). Another method is to extract spectral structural properties from each layer separately and then to find a common representation shared by all layers (Tang et al, 2009).

In this paper we consider methods of combining graphs based on modularity maximization

$$\max Q = \max\_{c\_i, c\_j} \frac{1}{2m} \sum\_{i,j} \left( A\_{ij} - \frac{d\_i d\_j}{2m} \right) \delta(c\_i, c\_j) \,. \tag{33}$$

Let's define a modularity matrix **M** with elements *Mij* = *Aij* − *didj*/(2*m*). Then the modularity in (33) may be presented as

$$Q = \frac{1}{2m} \text{Tr} \left( \mathbf{G}^T (\mathbf{A} - \frac{\mathbf{d} \mathbf{d}^T}{2m}) \mathbf{G} \right) = \frac{1}{2m} \text{Tr} (\mathbf{G}^T \mathbf{M} \mathbf{G}) \,, \tag{34}$$

where the columns of the *N* × *Nc* matrix **G** describe community memberships for nodes, *gj*(*i*) = *gij* ∈ {0, 1}, *gij* = 1 if the *i*-th node belongs to the community *cj*; *Nc* is the number of communities; **d** is a vector formed by degrees of nodes, **d** = (*d*1, ··· , *dN*)<sup>*T*</sup>.
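The equivalence between the sum form (33) and the trace form (34) can be checked numerically for a fixed partition. The sketch below is only an illustration: the helper names and the test graph (two triangles joined by a single edge) are assumptions, and no maximization over partitions is attempted.

```python
def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix from an undirected edge list."""
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    return A

def modularity_sum(A, labels):
    """Q as in (33): sum over node pairs that share a community."""
    n = len(A)
    d = [sum(row) for row in A]
    two_m = sum(d)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - d[i] * d[j] / two_m
    return q / two_m

def modularity_trace(A, labels):
    """Q as in (34): Tr(G^T M G) / (2m) with a 0/1 membership matrix G."""
    n = len(A)
    comms = sorted(set(labels))
    G = [[1.0 if labels[i] == c else 0.0 for c in comms] for i in range(n)]
    d = [sum(row) for row in A]
    two_m = sum(d)
    M = [[A[i][j] - d[i] * d[j] / two_m for j in range(n)] for i in range(n)]
    trace = 0.0
    for c in range(len(comms)):
        trace += sum(G[i][c] * M[i][j] * G[j][c] for i in range(n) for j in range(n))
    return trace / two_m

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
A = adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
labels = [0, 0, 0, 1, 1, 1]
q_sum = modularity_sum(A, labels)
q_trace = modularity_trace(A, labels)
```

For this partition both expressions give *Q* = 5/14, since each community holds 3 of the 7 edges and half of the total degree.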

Let's consider a multi-layer graph G = {*G*1, *G*2,..., *GL*} with adjacency matrices A = {**A**1, **A**2,..., **A***L*}, where *L* is the number of layers. Before combining, the graphs are to be normalized. In the case of modularity maximization (33) it is natural to normalize each layer according to its total weight *m*.


The simplest method to combine multi-layer graphs is to make the average of all layers:

$$\bar{\mathbf{A}} = \frac{1}{L} \sum\_{l}^{L} \mathbf{A}\_{l}; \qquad \bar{\mathbf{d}} = \frac{1}{L} \sum\_{l}^{L} \mathbf{d}\_{l}; \qquad \bar{m} = \frac{1}{L} \sum\_{l}^{L} m\_{l}; \qquad \max Q = \max\_{\mathbf{G}} \frac{1}{2\bar{m}} \text{Tr} (\mathbf{G}^{T} \bar{\mathbf{M}} \mathbf{G}) \tag{35}$$

Then the community membership matrix **G** may be found by one of the community detection methods described before. By taking into account the degree distributions of nodes at each graph layer, the total modularity across all layers may be maximized as (Tang et al, 2009)

$$\max Q = \max\_{\mathbf{G}} \frac{1}{L} \sum\_{l}^{L} Q\_{l} = \max\_{\mathbf{G}} \frac{1}{L} \sum\_{l}^{L} \frac{1}{2m\_{l}} \text{Tr} \left( \mathbf{G}^{T} (\mathbf{A}\_{l} - \frac{\mathbf{d}\_{l} \mathbf{d}\_{l}^{T}}{2m\_{l}}) \mathbf{G} \right) = \max\_{\mathbf{G}} \frac{1}{L} \sum\_{l}^{L} \text{Tr} (\mathbf{G}^{T} \frac{\mathbf{M}\_{l}}{2m\_{l}} \mathbf{G}) . \tag{36}$$

A similar approach, but applied to graph Laplacian spectra and extended with a regularization, is used in (Dong et al, 2011).
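The two averaging strategies (35) and (36) can be illustrated with a minimal NumPy sketch. The function names below are hypothetical, hard (0/1) memberships are used for simplicity, and the averaged modularity matrix in (35) is assumed to be built from the averaged adjacency matrix in the same way as **M** in (34); this is an illustration under those assumptions, not the implementation used for the reported results.

```python
import numpy as np

def membership_matrix(labels, n_communities):
    """Hard membership matrix G (N x Nc): G[i, j] = 1 if node i is in community j."""
    G = np.zeros((len(labels), n_communities))
    G[np.arange(len(labels)), labels] = 1.0
    return G

def modularity(A, G):
    """Q = Tr(G^T M G) / (2m) with M = A - d d^T / (2m), following (34)."""
    d = A.sum(axis=1)                # (weighted) node degrees
    two_m = d.sum()                  # 2m = sum of all degrees
    M = A - np.outer(d, d) / two_m   # modularity matrix
    return np.trace(G.T @ M @ G) / two_m

def modularity_of_average(layers, G):
    """Strategy (35): average the layers first, then evaluate the modularity."""
    return modularity(np.mean(layers, axis=0), G)

def average_of_modularities(layers, G):
    """Strategy (36): evaluate the modularity of each layer, then average."""
    return float(np.mean([modularity(A, G) for A in layers]))

def two_triangles(bridge):
    """Toy layer: triangles {0,1,2} and {3,4,5} joined by one bridge edge."""
    A = np.zeros((6, 6))
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), bridge]
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

layers = [two_triangles((2, 3)), two_triangles((0, 5))]
G = membership_matrix([0, 0, 0, 1, 1, 1], 2)
print(modularity_of_average(layers, G), average_of_modularities(layers, G))
```

In this toy case both strategies give the same value because the two layers have identical degree sums per community; in general they differ, since (36) keeps the degree distribution of each layer while (35) uses the averaged degrees. A community detection algorithm would maximize either quantity over **G**; here **G** is fixed only to show how the objective is evaluated.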

Networks describing social relations are often undersampled and noisy, and contain a different amount of information at each layer. As a result, noisy or unobservable parts in one of the layers may deteriorate the total accuracy after the averaging in (35) and (36). A possible solution to this problem is to apply a weighted superposition of layers: the more informative the layer *l* is, the larger the weight *w<sub>l</sub>* it should be given. For example, we may weight the layer *l* according to its modularity *Q<sub>l</sub>*, hence

$$\bar{\mathbf{A}}\_{w} = \frac{1}{L} \sum\_{l}^{L} w\_{l} \mathbf{A}\_{l} = \frac{1}{L} \sum\_{l}^{L} Q\_{l} \mathbf{A}\_{l};\tag{37}$$
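A minimal sketch of the weighted superposition (37) is given below. The function name and the numerical weights are hypothetical; in practice the weights *w<sub>l</sub>* = *Q<sub>l</sub>* would come from running a modularity maximization on each layer separately.

```python
import numpy as np

def weighted_superposition(layers, weights):
    """A_w = (1/L) * sum_l w_l * A_l, following (37)."""
    if len(layers) != len(weights):
        raise ValueError("one weight per layer is required")
    L = len(layers)
    return sum(w * A for w, A in zip(weights, layers)) / L

# Two toy layers on the same pair of nodes (illustrative values only).
A_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A_2 = np.array([[0.0, 2.0], [2.0, 0.0]])
Q = [0.6, 0.2]   # assumed per-layer modularities, not values from the paper

A_w = weighted_superposition([A_1, A_2], Q)
print(A_w)
```

A layer with a weak community structure (small *Q<sub>l</sub>*) then contributes proportionally less to the combined graph, which is the intended effect of the weighting.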

Another method to improve the robustness of node classification in multi-layer graphs is to extract the structural properties **G**<sub>*l*</sub> at each layer separately and then merge the partitions (Strehl & Ghosh, 2002). A more advanced approach to processing multi-dimensional data may be based on representing multi-layer graphs as tensors and applying tensor decomposition algorithms (Kolda & Bader, 2009) to extract stable communities and to perform de-noising by a lower-dimensional tensor approximation. These methods are rather involved and will be considered elsewhere.

#### **6. Simulation results for benchmark networks**

To test the algorithms described in the previous sections we use the karate club social network (Zachary, 1977). As mentioned before, to get different hierarchical levels above and below the resolution provided by *max*-modularity, we use the random walk approach. The number of communities detected in the karate club at different resolution levels is presented in Fig.5. As one can see, the *max*-modularity algorithm does not necessarily result in the best partition stability. The most stable partition of the karate club corresponds to 2 communities (shown by squares and circles in Fig.1), which is in line with the results reported by (Zachary, 1977).

A comparison of coupling scenarios *B.2* and *B.3* is presented in Fig.6 and Fig.7. Pair-wise correlations between oscillators at *t* = 1 for coupling scenarios *B.2* and *B.3* are depicted in Fig.6. Scenario *B.3* clearly reveals the community structure, while in case of *B.2* the negative coupling overwhelms the attractive coupling and forces the system into chaotic behavior.


Dynamical connectivity matrices reordered by communities for the attractive-neutral coupling *B.3* at *t* = 1 (on the left) and *t* = 10 (on the right) are depicted in Fig.7. In case of *B.3* one can see (also cf. Fig.8) that the number of connections with attractive coupling grows in time, while the strength of the repulsive connections decreases, which finally results in global synchronization. For the scenario *B.2* there is a dynamical balance between attractive and repulsive coupling with small fluctuations around the mean (Fig.8). Note that even though the averaged strength of the repulsive connections is lower than that of the attractive coupling, the system dynamics shows a quasi-chaotic behavior.

Fig.9 shows the adjacency matrix of the Zachary karate club (red circles), the detected communities (pink squares) and the predicted links (blue dots). As expected, the dynamical methods for link prediction tend to make more connections within the established communities first, followed by merging communities and creating highly overlapping partitions at the higher hierarchical levels (the upper part of Fig.9). In case of the Katz predictor (32), by increasing the damping parameter *β* we take into account a larger number of paths connecting nodes in the graph, which in turn results in a larger number of suggested links above a fixed threshold. Following the concept of the dynamical connectivity matrix (20), the growing number of links may be seen as the hierarchical community formation predicted by (32) at different values of *β*. This process is illustrated in the bottom part of Fig.9. Note that in case of the Katz predictor the graph also approaches the fully connected graph, but the network evolution may take a different trajectory compared to the coupled dynamical systems. In particular, at small values of *t* and *β* the network evolution is similar for both cases (cf. Fig.9(b) and Fig.9(e)), but with time the evolution trajectories may follow different paths (cf. Fig.9(c) and Fig.9(f)), which in turn results in different predictions.
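The effect of the damping parameter on the number of suggested links can be sketched with the plain Katz index, i.e., the matrix (*I* − *β***A**)<sup>−1</sup> − *I*, which sums the walks of all lengths between two nodes with weight *β* per step. The sketch below deliberately omits the community-membership weighting of the modified predictor (32); the function names, the toy path graph and the threshold value are assumptions chosen for illustration.

```python
import numpy as np

def katz_scores(A, beta):
    """Plain Katz index: sum_{l>=1} beta^l A^l = (I - beta*A)^{-1} - I."""
    rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius of A
    if beta * rho >= 1.0:
        raise ValueError("beta must be below 1/spectral_radius for convergence")
    I = np.eye(A.shape[0])
    return np.linalg.inv(I - beta * A) - I

def suggested_links(A, beta, threshold):
    """Node pairs without an edge whose Katz score exceeds the threshold."""
    S = katz_scores(A, beta)
    n = A.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if A[i, j] == 0 and S[i, j] > threshold]

# Toy path graph 0 - 1 - 2 - 3.
A_path = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A_path[i, j] = A_path[j, i] = 1.0

for beta in (0.1, 0.3, 0.5):
    print(beta, suggested_links(A_path, beta, threshold=0.05))
```

With a fixed threshold, a small *β* suggests no links, a moderate *β* suggests only the two-hop pairs, and a larger *β* also reaches the three-hop pair, which mirrors the growth of the number of links described above.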

Note that in all cases of the network evolution, we may prioritize the recommended links based on the soft communities detection (Katz predictor) or the threshold *η* (coupled dynamical systems). We address this issue below in Section 7.

Fig. 5. Karate club: number of communities at different resolution levels.

Fig. 6. Karate club: averaged pair-wise correlations (scaled by 5) between oscillators at *t* = 1 re-ordered according to communities. Coupling scenarios: (a) attractive-repulsive *B.2*; (b) attractive-neutral *B.3*.

#### **7. Applications for real world mobile data**

#### **7.1 Community detection in Nokia mobile datasets**

To analyze the behavior of mobile users and study the underlying social structure, Nokia Research Center/Lausanne organized a mobile data collection campaign at the EPFL university campus (Kiukkonen et al, 2010). Rich-content datasets (including data from mobile sensors, call-logs, Bluetooth (BT) and WLAN proximity, GPS coordinates, information on mobile and application usage, etc.) were collected from about 200 participants for the period from June 2009 till October 2010. Besides the collected data, several surveys were conducted before and after the campaign to profile the participants and to form a basis for the ground truth. In this section we consider social affinity graphs constructed from call-logs, GPS locations and user proximity.

Fig.10 shows a weighted aggregated graph of voice-call and SMS connections derived from the corresponding datasets. This graph depicts connections among 136 users, which indicates that about 73% of participants are socially connected within the data collection campaign. To find communities in this network we first run the modularity maximization algorithm, which identifies 14 communities after the 3rd iteration (Fig.10). To get the higher hierarchical levels one could represent each community by a single node and continue clustering with the new aggregated network. However, this procedure would result in a loss of the underlying structure. In particular, hierarchical community detection with a nested community structure poses additional constraints on the maximization process and may lead to incorrect classification at the higher levels. For example, after the 3rd iteration the node "v146", shown by the red arrow in Fig.10, belongs (correctly) to a community shown by white circles (3 intra-community edges and single edges to 6 other communities). After agglomeration, the node "v146" will be assigned to the community shown by white circles on the left side of the graph. However, it is easy to verify that when the communities on the right are merged, the node "v146" is to be re-assigned to the community on the right side of the network. The dynamical formulation of modularity extended with the random walk allows different (not necessarily nested) allocations of nodes at different granularity (resolution) levels and helps to resolve this problem.
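The agglomeration step discussed above, where each community is collapsed into a single node, can be sketched as the product **G**<sup>*T*</sup>**AG** with a hard membership matrix **G**. The function name and the toy graph are hypothetical; the sketch only shows why node-level information (such as the individual edges of a node like "v146") is no longer available once the communities are merged.

```python
import numpy as np

def aggregate_communities(A, labels):
    """Collapse each community into one super-node: A_agg = G^T A G.

    Off-diagonal entries hold the total edge weight between two communities;
    diagonal entries hold twice the intra-community weight, because every
    undirected edge appears in both directions in A.
    """
    labels = np.asarray(labels)
    G = np.zeros((A.shape[0], labels.max() + 1))
    G[np.arange(A.shape[0]), labels] = 1.0
    return G.T @ A @ G

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2 - 3.
A_toy = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A_toy[i, j] = A_toy[j, i] = 1.0

A_agg = aggregate_communities(A_toy, [0, 0, 0, 1, 1, 1])
print(A_agg)
```

After this step only community-to-community weights remain, so a node that was placed sub-optimally at the lower level cannot be moved on its own at the higher level. This is the nesting constraint that the random walk formulation avoids.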

Fig.11 presents the number of communities at different hierarchical levels detected by the random walk for the network shown in Fig.10. As one can see, the *max*-modularity partition with 14 communities is clearly unstable and could hardly be used for reliable predictions; the stable partitions appear at the higher hierarchical levels starting from 8 communities. In the following we rely on this fact to build the ground truth references for the evaluation of clustering.

Fig. 7. Karate club: examples of dynamical connectivity matrices for attractive (shown on the top in red color) and repulsive (shown at the bottom in blue color) coupling at *t* = 1 (a) and *t* = 10 (b). Nodes are ordered according to communities. Coupling scenarios: attractive-neutral *B.3*.

Fig. 8. Karate club: evolution of averaged attractive *wp* and repulsive *wn* weights for different coupling scenarios *B.2* and *B.3*; the average is made over 100 realizations.

Fig. 9. Karate club: adjacency matrix is shown by red circles, detected communities by pink squares, predicted links are shown by blue dots. The upper part (a)-(c): predictions made by dynamical systems at different time scales. The bottom part (d)-(f): recommendations made by the modified Katz predictor at different values of *β*.

Fig. 10. Community detection based on SMS and call-logs: communities are coded by colors.

Fig. 11. Stability of communities at different resolution levels.

#### **7.2 Applications for multi-layer graphs**

Besides phone and SMS call-logs, the social affinity of participants may also be derived from other information layers, such as the local proximity of users (BT and WLAN layers) and their location information (GPS). In this case the soft community detection may be extended to include multiple graph layers. In particular, we found that users' profiles may vary significantly across the layers. For example, a user may have dense BT connections and participate in multiple communities, while the same user's phone call activity may be rather limited. Combining information from several graph layers can be used to improve the reliability of classification. Below we show some preliminary results; a more detailed analysis of multi-layer graphs built from mobile datasets may be found in (Dong et al, 2011).

To verify the detected communities we select a subset of 136 users with known email affiliations as the ground truth. In our case these users are allocated into 8 groups. To get the same number of communities in the social affinity multi-layer graphs, we use the random walk (11) to obtain a coarser resolution than that provided by the modularity maximization. Fig.12 depicts the communities (color coded) derived from the phone-calls graph. Single nodes here indicate users who did not make phone calls to other participants of the data collection campaign. Communities derived from the BT-proximity graph and mapped on the phone-call graph are shown in Fig.13. As expected, multi-layer graphs help us to classify users based on the additional information found in other layers. For example, users who cannot be classified based on phone calls (Fig.12) are assigned to communities based on the BT proximity (Fig.13). Fig.14 shows the communities detected in the combined graph formed by the BT and phone-call networks and then mapped on the phone-call network.
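One hedged way to score a detected partition against a ground truth grouping of this kind is with standard external clustering measures such as purity, the Rand index and the normalized mutual information (NMI). The sketch below uses the textbook definitions (NMI normalized by the mean of the two entropies); the function names and the toy labels are illustrative assumptions and do not reproduce the evaluation pipeline used for the mobile datasets.

```python
import numpy as np
from itertools import combinations

def contingency(pred, truth):
    """Counts n[k, j] of items placed in detected community k with true group j."""
    _, pi = np.unique(np.asarray(pred), return_inverse=True)
    _, ti = np.unique(np.asarray(truth), return_inverse=True)
    C = np.zeros((pi.max() + 1, ti.max() + 1))
    np.add.at(C, (pi, ti), 1)
    return C

def purity(pred, truth):
    """Fraction of items that fall in the majority true group of their community."""
    C = contingency(pred, truth)
    return C.max(axis=1).sum() / C.sum()

def rand_index(pred, truth):
    """Fraction of item pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(pred)), 2))
    agree = sum((pred[i] == pred[j]) == (truth[i] == truth[j]) for i, j in pairs)
    return agree / len(pairs)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def nmi(pred, truth):
    """Mutual information divided by the mean of the two partition entropies."""
    P = contingency(pred, truth)
    P = P / P.sum()
    pk, pj = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pk, pj)[nz])).sum()
    denom = 0.5 * (entropy(pk) + entropy(pj))
    return mi / denom if denom > 0 else 1.0

# Toy example: 17 items, 3 detected communities, 3 true groups (x, o, d).
pred = [0] * 6 + [1] * 6 + [2] * 5
truth = list("xxxxxo") + list("xooood") + list("xxddd")
print(purity(pred, truth), rand_index(pred, truth), nmi(pred, truth))
```

All three measures equal 1 for a perfect match. Purity alone can be inflated by using many small communities, which is one reason to report it together with NMI and the Rand index.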

Next, we consider communities detected at single and combined layers with the different strategies (35)-(37) described in Section 5 and compare them to the ground truth. To evaluate the accuracy of community detection we use the normalized mutual information (NMI) score, the purity test and the Rand index (RI) (Manning et al, 2008). We found that the best graph combining is provided by the weighted superposition (37) according to the *max*-modularity *Q* of the layers. Results of the comparison are summarized in Table 1. As expected, different graph layers have a different relevance to the email affiliations and do not have fully overlapping community structures. In particular, the local proximity seems to be more relevant to the professional relations indicated by email affiliations, while phone calls seem to reflect more friendship and family relations. However, the detected structures are still rather close to each other (cf. columns in Table 1), reflecting the underlying social affinity. As one can see, by properly combining information from different graph layers we can improve the reliability of community detection.

Fig. 12. Community detection using random walk in the phone-calls network.

Fig. 13. Communities detected in the BT proximity network and mapped on the phone-calls network.

Fig. 14. Communities detected in the combined BT & phone-calls network and mapped on the phone-calls network.

| Layer | NMI | Purity | RI | Q |
|---|---|---|---|---|
| Phone calls | 0.262 | 0.434 | 0.698 | 0.638 |
| BT proximity | 0.307 | 0.456 | 0.720 | 0.384 |
| GPS | 0.313 | 0.471 | 0.704 | 0.101 |
| Phone + BT | **0.342** | 0.427 | **0.783** | |

Table 1. Evaluation of community detection in multi-layer graphs.

#### **7.3 Application for recommendation systems**

As discussed in Section 4, one of the applications of soft community detection and coupled systems dynamics may be seen in recommendation systems. To illustrate the approach we selected the user "129" (marked by an oval) in the phone-calls network in Fig.12 and calculated the proposed prediction scores for different similarity measures.

First, we consider intra-community predictions made by coupled dynamical systems. Fig.15(a) depicts pair-wise correlations (scaled by 5) between oscillators at *t* = 10 for the sub-network in Fig.12 forming the community of the user "129". By changing the threshold *η* for the dynamical connectivity matrix *C<sub>t</sub>*(*η*) (which is linked to the time resolution *t*) we obtain different connectivity matrices *C<sup>η</sup>*(*t*) presenting the network evolution. Connectivity matrices (blue points) corresponding to *η* = 3 (*t* = 15) and *η* = 2.3 (*t* = 25) are shown in Fig.15(b) and Fig.15(c), respectively. The community adjacency matrix is marked on the same figures by red circles. As one can see, dynamical systems first reliably detect the underlying topology and then form new links as the result of local interactions and dynamical link updates. It can be easily verified that practically all new links (e.g., 12 out of 13 in Fig.15(b)) create new cliques, hence we can interpret these new links as Friend-of-Friend recommendations.

Fig. 15. Community of the user "129" (shown by pink color at Fig.12): averaged (scaled by 5) pair-wise correlations between oscillators at *t* = 10 (a). Intra-community adjacency matrix (red circles) and links predicted by dynamics (blue dots) at different resolution levels: *t* = 15 (b) and *t* = 25 (c).

The calculated scores *S*<sup>(*i*,*i*)</sup><sub>DC</sub>(*k*, *n*) for dynamical systems, together with the Friend-of-Friend intra-community recommendations for two predictors based on the soft community detection (the Katz predictor and the convergence speed to consensus, *S*<sup>(*i*,*i*)</sup><sub>CC</sub>(*k*, *n*)), are summarized in Table 2. Here we list all new links for the user "129" which create at least one new clique within its community (shown by pink color in Fig.12), together with their normalized prediction scores.

| Source | Destination | *S*<sup>(*i*,*i*)</sup><sub>KC</sub>(*s*, *d*), % | *S*<sup>(*i*,*i*)</sup><sub>CC</sub>(*s*, *d*), % | *S*<sup>(*i*,*i*)</sup><sub>DC</sub>(*s*, *d*), % |
|---|---|---|---|---|
| 129 | 51 | 10.5 | **22.6** | 18.6 |
| 129 | 78 | 11.1 | 16.3 | **20.8** |
| 129 | 91 | **47.1** | 15.4 | 11.6 |
| 129 | 70 | 11.3 | 15.3 | 18.9 |
| 129 | 92 | 9.6 | 15.3 | 18.8 |
| 129 | 37 | 10.5 | 15.1 | 11.4 |

Table 2. Scores for the FoF intra-community recommendations for user 129 according to different similarity measures for the phone-calls network at Fig.12.
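The selection of Friend-of-Friend candidates and the normalization of their scores can be sketched as follows. A candidate is taken here to be a non-neighbour of the user that shares at least one common neighbour with the user, so that the new link closes at least one triangle. The normalization to percentages over the listed candidates is an assumption, suggested by the fact that each score column of Table 2 sums to about 100%. Function names and the toy graph are hypothetical.

```python
import numpy as np

def fof_candidates(A, user):
    """Non-neighbours of `user` that share at least one common neighbour with it."""
    A = np.asarray(A)
    common = A[user] @ A          # common[n] = number of common neighbours (binary A)
    return [n for n in range(A.shape[0])
            if n != user and A[user, n] == 0 and common[n] > 0]

def normalized_scores(scores):
    """Rescale raw candidate scores so that they sum to 100 (percent)."""
    total = sum(scores.values())
    return {n: 100.0 * s / total for n, s in scores.items()}

# Toy graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
A_fof = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]:
    A_fof[i, j] = A_fof[j, i] = 1

print(fof_candidates(A_fof, 0))                 # candidates for user 0
print(normalized_scores({1: 1.0, 2: 3.0}))      # raw scores are illustrative
```

Any of the similarity measures discussed above could supply the raw scores; the normalization only makes the different measures comparable on a common scale.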

Recall that both *S*<sub>CC</sub><sup>(*i*,*i*)</sup>(*k*, *n*) and *S*<sub>DC</sub><sup>(*i*,*i*)</sup>(*k*, *n*) are based on network synchronization with closely related Laplacians. As a result, the distributions of the prediction scores *S*<sub>CC</sub><sup>(*i*,*i*)</sup>(*k*, *n*) and *S*<sub>DC</sub><sup>(*i*,*i*)</sup>(*k*, *n*) are rather close to each other, compared to the distribution of the routing-based Katz score *S*<sub>KC</sub><sup>(*i*,*i*)</sup>(*k*, *n*). Convergence of opinions to a consensus within communities is in many cases an important target in social science. As an example, the best intra-community recommendation in the phone-call network according to *S*<sub>CC</sub><sup>(*i*,*i*)</sup>(*k*, *n*) is shown by the blue arrow in Fig. 12. Scaled pair-wise correlations between oscillators for the whole phone-call network in Fig. 12 are shown in Fig. 16. Correlations between nodes, re-ordered according to one of the stable partitions detected by the random walk at *t* = 10, clearly reveal the community structure (Fig. 17). The phone-call adjacency matrix (red circles) and all possible intra-community links (yellow squares) for the stable communities at *t* = 10 are depicted in Fig. 18(a). Links predicted by the system dynamics (blue dots) inside and outside the yellow squares indicate predicted intra-community and inter-community connections at different resolution levels and show the priority of the intra-community connections (Fig. 18(b) and Fig. 18(c)). As a whole, the presented results for the coupled dynamical systems provide the formal basis for the recommendation rules formulated in Section 4.2.

Fig. 16. Phone-call network: averaged pair-wise correlations (scaled by 10) between oscillators at *t* = 10, coupling scenario *B.3*.

Fig. 17. Phone-call network: averaged pair-wise correlations re-ordered according to detected communities.
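The re-ordering of a pair-wise correlation matrix according to detected communities (as in Fig. 17) can be illustrated by a minimal sketch. The function name `reorder_by_communities`, the toy matrix and the labels are assumptions made for illustration only; they are not data or code from the experiments described in this chapter.

```python
import numpy as np

def reorder_by_communities(corr, labels):
    """Permute rows and columns of a square matrix so that nodes sharing
    a community label become adjacent (hypothetical helper)."""
    corr = np.asarray(corr)
    labels = np.asarray(labels)
    # A stable sort keeps the original node order inside each community.
    order = np.argsort(labels, kind="stable")
    return corr[np.ix_(order, order)], order

# Toy example (made-up values): nodes 0 and 2 form one community,
# nodes 1 and 3 form the other.
corr = np.array([
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
])
labels = [0, 1, 0, 1]
reordered, order = reorder_by_communities(corr, labels)
```

After the permutation, strongly correlated nodes occupy contiguous index ranges, so communities appear as dense blocks along the diagonal of the re-ordered matrix.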

Fig. 18. Phone-call network: (a) the adjacency matrix is marked by red dots, and all possible intra-community links are shown by yellow squares. Links predicted by the dynamics (blue dots) tend to concentrate within communities: (b) *t* = 10; (c) *t* = 15.

As shown in Section 3, the dynamical process of opinion convergence may be seen as a first-order approximation of network synchronization. At the same time, *S*<sub>CC</sub><sup>(*i*,*i*)</sup>(*k*, *n*) has lower computational complexity than *S*<sub>DC</sub><sup>(*i*,*i*)</sup>(*k*, *n*), which makes *S*<sub>CC</sub><sup>(*i*,*i*)</sup>(*k*, *n*) more suitable for large networks. Prediction scores *S*<sub>CC</sub>(129, *n*) and *S*<sub>KC</sub>(129, *n*), calculated according to (32) for cases with intra- and inter-community links in the phone-call network, are depicted in Fig. 19. Here the scores are normalized as probabilities and sorted by priority; destination nodes *n* are listed along the *x*-axis; the corresponding random-link probabilities, *p*<sub>*kn*</sub> = (*d*<sub>*k*</sub>*d*<sub>*n*</sub>)/2*m*, are shown as a reference. Note that the link with the highest priority, {129, 51} for *S*<sub>CC</sub><sup>(*i*,*i*)</sup>(*k*, *n*), is the same as in the intra-community recommendation (cf. Table 2). However, the presence of inter-community links modifies the priorities of other recommendations according to (30). For verification, we compare the links predicted in the phone-call network with the links observed for user "129" at the BT proximity layer. This comparison shows a good fit: 16 out of 18 predicted links are found at the BT proximity layer.
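The normalization and the random-link reference used for Fig. 19 may be sketched as follows. This is only an illustration under stated assumptions: the helper names `random_link_probabilities` and `rank_recommendations`, the toy path graph and the values in `toy_scores` are hypothetical, and the scores are not computed by (32). The reference follows the formula *p*<sub>*kn*</sub> = (*d*<sub>*k*</sub>*d*<sub>*n*</sub>)/2*m* given above, where *d* denotes node degrees and *m* the number of edges.

```python
import numpy as np

def random_link_probabilities(adj, k):
    """Reference values p_kn = d_k * d_n / (2m) for a fixed node k."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # every undirected edge is counted twice
    return degrees[k] * degrees / two_m

def rank_recommendations(scores, adj, k):
    """Normalize candidate scores for node k to sum to one and sort them
    by priority. Candidates are nodes not yet linked to k (assumption)."""
    scores = np.asarray(scores, dtype=float)
    adj = np.asarray(adj)
    candidates = [n for n in range(len(scores)) if n != k and adj[k, n] == 0]
    total = float(sum(scores[n] for n in candidates))
    if total <= 0:
        raise ValueError("no positive candidate scores to normalize")
    ranked = sorted(candidates, key=lambda n: scores[n], reverse=True)
    return [(n, float(scores[n]) / total) for n in ranked]

# Toy path graph 0-1-2-3 with made-up prediction scores for node 0.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
toy_scores = [0.0, 0.0, 3.0, 1.0]
p_ref = random_link_probabilities(adj, k=0)
ranking = rank_recommendations(toy_scores, adj, k=0)
```

Comparing each normalized score in `ranking` with the corresponding entry of `p_ref` indicates how much a recommendation departs from what node degrees alone would suggest.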

Fig. 19. Priorities of the FoF recommendations for user 129 in Fig. 12 to be connected to the destination nodes shown along the *x*-axis, over all relevant communities.

Results for the combined BT and phone-call networks are presented in Fig. 20. Pair-wise correlations between nodes obtained by the dynamical systems approach are shown in Fig. 20(a). These correlations may be interpreted as probabilities for new link recommendations. Fig. 20(b) depicts links recommended by the modified Katz predictor (blue circles) beyond the given topology (red dots). We found that both recommenders mostly agree on the priority of intra-community links, but put different weights on inter-community predictions.

Depending on the purpose of the recommendation we may select different prediction criteria. Since new links change the topology, which in turn affects the dynamical properties of the network, the recommendations may be seen as a distributed control driving the network evolution. In general, the selection of topology-based recommendation criteria and their verification remain open problems. Currently we are running experiments to evaluate different recommendation criteria and their acceptance rates.

Fig. 20. Combined BT and phone-call networks, with nodes ordered according to detected communities: (a) color-coded pair-wise correlations using dynamical systems; (b) link recommendations using the modified Katz predictor (blue circles); the adjacency matrix is marked by red dots, and all possible intra-community links are shown by yellow squares.

#### **8. Conclusions**

In this chapter we present a framework for multi-membership community detection in dynamical multi-layer graphs and its application to link prediction and recommendation based on the network topology. The method is based on the dynamical formulation of modularity using a random walk, and is then extended to coupled dynamical systems to detect communities at different hierarchical levels. We introduce attractive and repulsive coupling and dynamical link updates, which allow us to make predictions on cooperative or competing behavior of users in the network and to analyze connectivity dynamics.

To address overlapping communities we suggest a method of soft community detection. This method may be used to improve marketing efficiency by identifying users who are strongly relevant to targeted groups but are not detected by standard community detection methods. Based on soft community detection we suggest friend recommendations in social networks, where new link recommendations are made as intra- and inter-clique community completion, and recommendations are prioritized according to similarity measures modified to include multiple-community membership. The developed methods are applied to the analysis of datasets recorded during the Nokia mobile-data collection campaign to derive community structures in multi-layer graphs and to make new link recommendations.

#### **9. Appendix: Clustering evaluation measures**

Let us define *C* = {*c*<sub>1</sub>, ..., *c*<sub>*M*</sub>} and Ψ = {*ψ*<sub>1</sub>, ..., *ψ*<sub>*M*</sub>} as partitions containing the detected clusters *c*<sub>*i*</sub> and the ground-truth clusters *ψ*<sub>*i*</sub>, respectively. The quality of clustering algorithms may be evaluated by different measures (Manning et al., 2008), in particular:

• Rand index:


$$RI = \frac{TruePositive + TrueNegative}{TruePositive + FalsePositive + FalseNegative + TrueNegative};\tag{38}$$

• Purity test:

$$Purity(\Psi, C) = \frac{1}{n} \sum_{m=1}^{M} \max_{j} |\psi_{m} \cap c_{j}|;\tag{39}$$

• Normalized mutual information:

$$NMI(C, \Psi) = \frac{2 \, I(\Psi, C)}{H(\Psi) + H(C)},\tag{40}$$

where the mutual information *I*(*C*<sub>1</sub>, *C*<sub>2</sub>) between the partitions *C*<sub>1</sub> and *C*<sub>2</sub> and their entropies *H*(*C*<sub>*i*</sub>) are

$$I(C_1, C_2) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \frac{c_{m_1, m_2}}{n} \log \left( \frac{n \, c_{m_1, m_2}}{n_{m_1} n_{m_2}} \right), \quad H(C_i) = -\sum_{m_i=1}^{M} \frac{n_{m_i}}{n} \log \left( \frac{n_{m_i}}{n} \right); \tag{41}$$

Here *n* is the total number of data points; *c*<sub>*m*1,*m*2</sub> is the number of common samples in the *m*<sub>1</sub>-th cluster of *C*<sub>1</sub> and the *m*<sub>2</sub>-th cluster of *C*<sub>2</sub>; *n*<sub>*mi*</sub> is the number of samples in the *m*<sub>*i*</sub>-th cluster of the partition *C*<sub>*i*</sub>. According to (40) and (41), *NMI* attains its maximum, *NMI*(*C*<sub>1</sub>, *C*<sub>2</sub>) = 1, if *C*<sub>1</sub> = *C*<sub>2</sub>.
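The three measures (38)–(41) can be evaluated directly from two label vectors. The following is a minimal sketch, assuming that each partition is given as a list with one cluster label per sample; the function names and the toy labels are illustrative and are not taken from the experiments in this chapter. The purity follows (39) as written here, i.e., the maximum is taken over detected clusters for each ground-truth cluster.

```python
import math
from collections import Counter
from itertools import combinations

def rand_index(truth, pred):
    """Eq. (38): fraction of sample pairs on which both partitions agree."""
    agree = total = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        agree += same_truth == same_pred  # true positive or true negative
        total += 1
    return agree / total

def purity(truth, pred):
    """Eq. (39): (1/n) * sum over psi_m of max_j |psi_m intersect c_j|."""
    best = {}
    for (t, _), count in Counter(zip(truth, pred)).items():
        best[t] = max(best.get(t, 0), count)
    return sum(best.values()) / len(truth)

def entropy(labels):
    """H(C_i) from Eq. (41)."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """I(C_1, C_2) from Eq. (41); empty overlaps contribute zero."""
    n = len(a)
    na, nb = Counter(a), Counter(b)
    return sum(c / n * math.log(n * c / (na[x] * nb[y]))
               for (x, y), c in Counter(zip(a, b)).items())

def nmi(truth, pred):
    """Eq. (40). Returning 1.0 when both entropies vanish is a convention
    chosen for this sketch (both partitions are then a single cluster)."""
    h = entropy(truth) + entropy(pred)
    return 2 * mutual_information(truth, pred) / h if h > 0 else 1.0

# Toy labels (made up): six samples, two ground-truth clusters.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
ri = rand_index(truth, pred)
pu = purity(truth, pred)
score = nmi(truth, pred)
```

For this toy input, 10 of the 15 sample pairs are treated consistently by both partitions, and the best overlaps of the two ground-truth clusters cover 5 of the 6 samples.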

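#### **10. References**

Acebrón, J., Bonilla, L., Pérez-Vicente, C., Ritort, F., Spigler, R. (2005). The Kuramoto model: A simple paradigm for synchronization phenomena. *Reviews of Modern Physics*, 77 (1), pp. 137–185.

Albert, R. & Barabási, A.-L. (2002). Statistical mechanics of complex networks. *Reviews of Modern Physics*, 74, pp. 47–97.

Arenas, A., Díaz-Guilera, A., Pérez-Vicente, C. (2006). Synchronization reveals topological scales in complex networks. *Physical Review Letters*, 96, 114102.

Arenas, A., Díaz-Guilera, A., Kurths, J., Moreno, Y. and Zhou, C. (2008). Synchronization in complex networks. *Physics Reports*, 469, pp. 93–153.

Blondel, V., Guillaume, J.-L., Lambiotte, R. and Lefebvre, E. (2008). Fast unfolding of communities in large networks. *Journal of Statistical Mechanics: Theory and Experiment*, vol. 1742-5468, no. 10, pp. P10008+12.

Evans, T. S. and Lambiotte, R. (2009). Line Graphs, Link Partitions and Overlapping Communities. *Physical Review E*, 80, 016105.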


**Part 2** 

**DSP in Monitoring, Sensing and Measurements** 

