
*Classification Model for Bullying Posts Detection DOI: http://dx.doi.org/10.5772/intechopen.88633*

**Figure 3.** *Word cloud for bullying words.*


#### **3.3 Results and discussions**

*Cyberspace*

The Bully-Latent Dirichlet Allocation (B-LDA) model is a graphical model of the words in a harassing message, given the predator who wrote it and a set of victims.

**Table 1.** *Statistics of training and testing corpus.*

| | Training corpus | Testing corpus |
|---|---|---|
| Tweets | 3,18,14,716 | 97,35,537 |
| Retweets | 76,20,335 | 2,87,567 |
| URLs | 85,45,112 | 4,76,234 |
| Usernames | 97,02,445 | 14,20,554 |
| Hashtags | 79,85,956 | 3,56,778 |

**Table 2.** *Extracted top bullying words.*

| Word | Prob | Word | Prob | Word | Prob |
|---|---|---|---|---|---|
| Fuck | 0.0798 | Bitch | 0.0705 | Naked | 0.0588 |
| Ass | 0.0767 | Freak | 0.0699 | Sexy | 0.0569 |
| Shit | 0.0752 | Fat | 0.0663 | Mood | 0.0547 |
| Gay | 0.0738 | Dirty | 0.0643 | Lick | 0.0519 |
| Dumb | 0.0722 | Bullshit | 0.0621 | Bed | 0.0508 |
| Suck | 0.0711 | Kiss | 0.0604 | Piss | 0.0495 |

**Figure 2.** *(a) Bullying words with their probability, and (b) List of bullying words.*

**Figure 4.** *Number of bullying tweets over time intervals.*

**Figure 5.** *Distributions of tweets per motive.*

The key enrichment in B-LDA is that the per-message topic distribution is conditioned jointly on the predator and the individual victims. Every topic is a multinomial distribution over words, and every predator-victim pair has a distribution over topics. Marginal distributions over bullying topics conditioned solely on a predator, or solely on a victim, can therefore be computed easily. The model was applied, for example, to a corpus comprising 135 persons and 35 k bullying messages, and to 5 months of messages sent and received by one predator, comprising 17 victims and 19 k messages. B-LDA discovers highly salient topics and provides evidence that it predicts the predator's motives. In the experiments, the hyperparameters α and β are fixed at 1 and 0.01, respectively. The number of topics T is also fixed at |T| = 5. For a 50-topic solution, the Twitter dataset took 150 hours for 2000 iterations (about 5 min per iteration).
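The marginal distributions described in this paragraph can be illustrated with a minimal sketch. It assumes a fitted model exposes one topic distribution per predator-victim pair and that each pair is weighted by its message count; the function name, the weighting scheme, and the toy numbers are illustrative assumptions rather than part of B-LDA as described in the chapter.

```python
def marginal_topics_by_predator(pair_topics, pair_counts):
    """Collapse per-pair topic distributions into one distribution per predator.

    pair_topics: {(predator, victim): [p(topic_0), p(topic_1), ...]}
    pair_counts: {(predator, victim): message count}, used as weights (assumption)
    """
    sums, weights = {}, {}
    for (predator, victim), theta in pair_topics.items():
        n = pair_counts[(predator, victim)]
        acc = sums.setdefault(predator, [0.0] * len(theta))
        for t, p in enumerate(theta):
            acc[t] += n * p          # weight each pair by its message count
        weights[predator] = weights.get(predator, 0) + n
    # Normalize the weighted sums so each predator's row sums to 1.
    return {a: [s / weights[a] for s in acc] for a, acc in sums.items()}


# Toy data (made up for illustration): two topics, two predators.
pair_topics = {
    ("p1", "v1"): [0.7, 0.3],
    ("p1", "v2"): [0.1, 0.9],
    ("p2", "v1"): [0.5, 0.5],
}
pair_counts = {("p1", "v1"): 3, ("p1", "v2"): 1, ("p2", "v1"): 2}
by_predator = marginal_topics_by_predator(pair_topics, pair_counts)
```

Swapping the roles of predator and victim in the dictionary key gives the corresponding victim-conditioned distribution.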

B-LDA establishes the motive of the predator and tracks the predator's activity with victims using the following steps. First, the proportion that each predator contributes to each bullying topic is determined. Next, the impact of the predators on the bullying topics over the time intervals is determined. The two user thresholds ε and λ are empirically set to 3.2106 and 2.0457, respectively. From each of the documents, B-LDA generates 5 topics with predators associated with each. The distribution of the different bullying topics from the documents is displayed in **Table 3**. From the table, predator p1 has a probability of 0.0547 for bullying topic t5. The bullying motive of the predator toward a victim must then be established over specific time intervals within the bullying topics. This can be characterized as trails: a tweet message is a triplet (a, μ, τ), representing a textual bullying message μ written by the predator a at time τ. A document, denoted by d, is a sequence of bullying messages ordered by τ. From this definition, time $\tau^d$ is associated with both message $\mu^d$ and predator $a_d$.

The contribution of a predator during a time interval is evaluated by:

$$F\left(a_d^t\right)_{\tau_s}^{\tau_f} = \begin{cases} \text{active} & \text{if } p\left(a_d^t\right)_{\tau_s}^{\tau_f} \ge \text{user threshold and } F(t)_{\tau_s}^{\tau_f} \text{ is active} \\ \text{not-active} & \text{otherwise} \end{cases} \tag{10}$$

A predator is said to be active, with a bullying motive, during the interval $[\tau_s, \tau_f]$ for topic t if the probability of the predator participating in t during that time period exceeds the user-specified threshold and $F(t)_{\tau_s}^{\tau_f}$ is active within that duration. The user-specified threshold is calculated by taking the average of $\vartheta_a^t$ over the predators for t. The contribution of a predator $a_{i,d}^t$ within $[\tau_s, \tau_f]$ is first mapped per time instance s, using $P(a_{\tau_s} \mid t) = \frac{p(a_{\tau_s} \mid d_{\tau_s}) \, p(t_{\tau_s} \mid d_{\tau_s})}{p(d_{\tau_s})}$, in order to compute $p\left(a_{i,d}^t\right)_{\tau_s}^{\tau_f}$. Next, the total probability for predator $a^t$ during $[\tau_s, \tau_f]$ is calculated as $\sum_{\tau_s}^{\tau_f} P(a_{\tau_s} \mid t)$. **Figure 6** shows the activity of predators over time. For example, the activity of predators in bullying topic t5,d5 during [15:00, 21:00] can be analyzed in the following manner. Initially, the specified threshold is determined as 0.1770, the average of $\vartheta_a^t$. Then the mapping function is calculated for all predators. For example, consider predator a5 and time instance s = 15:00. The mapping function is calculated as $P(a_{5,\tau_{15:00}} \mid t_5) = 0.0547$, and the total probability of a5 is then estimated by calculating $\sum_{\tau_{15:00}}^{\tau_{21:00}} P(a_{5,\tau_s} \mid t_5) = 0.2307$. When applying the transition function $F\left(a_d^t\right)_{\tau_s}^{\tau_f}$, the predators (a1, a3) are active for bullying topic t5,d1 and the predators (a2, a4, a5) are not active.
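The activity test of Eq. (10) can be sketched in a few lines. The sketch below assumes that time instances are simplified to hours of the day and that the per-instance values $P(a_{\tau_s} \mid t)$ have already been computed; the class, the function names, and all numbers are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    predator: str  # a
    text: str      # mu
    time: int      # tau, simplified to an hour of the day (assumption)


def user_threshold(theta_by_predator):
    """Average of the per-predator topic proportions for one topic."""
    return sum(theta_by_predator.values()) / len(theta_by_predator)


def total_probability(p_by_time, t_start, t_end):
    """Sum the per-instance values P(a_tau | t) over [t_start, t_end]."""
    return sum(p for tau, p in p_by_time.items() if t_start <= tau <= t_end)


def is_active(p_by_time, t_start, t_end, threshold, topic_active=True):
    """Eq. (10): active if the total meets the threshold and the topic is active."""
    return topic_active and total_probability(p_by_time, t_start, t_end) >= threshold


# A document is a sequence of messages ordered by time.
document = sorted(
    [Message("a1", "msg b", 18), Message("a1", "msg a", 15), Message("a2", "msg c", 16)],
    key=lambda m: m.time,
)

# Made-up values for one topic; none of these numbers come from the chapter.
threshold = user_threshold({"a1": 0.30, "a2": 0.10})  # approximately 0.20
a1_by_hour = {15: 0.10, 18: 0.15, 23: 0.40}           # 23:00 lies outside the interval
a2_by_hour = {16: 0.05}
a1_active = is_active(a1_by_hour, 15, 21, threshold)
a2_active = is_active(a2_by_hour, 15, 21, threshold)
```

Here the value at 23:00 is ignored because it falls outside [15:00, 21:00], so only the in-interval contributions decide whether a predator counts as active.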

#### **3.4 Performance evaluation**

The perplexity of the model on test documents is used to estimate the model's performance; it is a customary measure for evaluating a probabilistic model, with lower values indicating better predictive performance. The adapted models are compared by means of their perplexity on the test datasets. The perplexity of a set of test documents $(w_d, p_d)$, for $d \in D_{test}$, is defined as the exponential of the negative normalized predictive likelihood under the model,

$$\text{perplexity}(w_d \mid p_d) = \exp\left[-\frac{\ln p(w_d \mid p_d)}{N_d}\right] \tag{11}$$

where $N_d$ is the number of words in document d.
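As a quick numerical illustration of Eq. (11), the sketch below computes the perplexity of a single test document from its log-likelihood. The function name and the toy document are assumptions made for illustration; in practice the log-likelihood would come from the fitted model.

```python
import math


def perplexity(log_likelihood, n_words):
    """Eq. (11): exp(-ln p(w_d | p_d) / N_d) for one test document."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return math.exp(-log_likelihood / n_words)


# Toy check: a 4-word document whose words each have probability 0.25
# under a hypothetical model gives a perplexity of 4.
uniform_doc = perplexity(4 * math.log(0.25), 4)
```

A model that assigns every word probability 1/k has perplexity k, which is why perplexity is often read as the effective number of equally likely word choices.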

| Topic | Motive / label | Words | Word prob | Predators: Victims | Prob |
|---|---|---|---|---|---|
| 5 | Racism / Extremism | Incorrect, Improper, Indecent, Ineligible, Unfit, Unsuited, Room, Raffish, Square peg, Unworthy | 0.0271, 0.0242, 0.0231, 0.0225, 0.0214, 0.021, 0.0197, 0.0193, 0.0184, 0.0173 | P1: V1, P2: V2, P3: V3 | 0.0547, 0.0367, 0.0361 |

| Topic | Motive / label |
|---|---|
| 10 | Homophobia |
| 15 | Violence |
| 20 | Ref. to handicaps |
| 25 | Slurs |
| 30 | Sexual / Crude language |
| 35 | Implicit language |
| 40 | Indecent proposals |
| 45 | Unrefined language |
| 50 | Slang words |

Words and probabilities for topics 10 to 50:

| Word | Prob | Word | Prob | Word | Prob |
|---|---|---|---|---|---|
| Gay | 0.0738 | Suck | 0.0711 | Naked | 0.0588 |
| Frequent | 0.0491 | Bed | 0.0508 | Fat | 0.0663 |
| Fug | 0.0321 | Freak | 0.0699 | Bumpuglies | 0.0423 |
| Dirty | 0.0643 | Mood | 0.0547 | Bitch | 0.0705 |
| Pull | 0.0456 | Fright | 0.0169 | Burn | 0.0132 |
| Windowlicker | 0.0121 | Wigger | 0.0201 | Alarm | 0.0176 |
| Aggravate | 0.0154 | Terror | 0.0187 | Scare | 0.0194 |
| Phobia | 0.0203 | Nose | 0.0215 | Peckerwood | 0.0213 |
| Nigger | 0.0221 | Cameljockey | 0.0238 | Peckerwood | 0.0253 |
| Panic | 0.0212 | Tee toe | 0.0232 | Cracy | 0.0212 |
| Nitchie | 0.0276 | Horror | 0.0223 | Aggrieve | 0.0254 |
| Fearful | 0.0235 | Piss | 0.0506 | Dumb | 0.0722 |
| Blind | 0.0342 | Crow | 0.0294 | Filthy | 0.0304 |
| Dread | 0.0254 | Bullshit | 0.0621 | Ass | 0.0767 |
| Dog | 0.0312 | Cold sweat | 0.0265 | Shit | 0.0752 |
| Fuck | 0.0798 | Pussi | 0.0321 | | |

| Words | Prob |
|---|---|
| Lick, Kiss | 0.0508, 0.0519 |
| Ruffle sb's feathers, Go too far | 0.0201, 0.0176 |
| Daft, Autism, Freak, Gimpy | 0.0132, 0.0154, 0.0167, 0.0203 |

Predator-victim pairs and probabilities for topics 10 to 50:

| Predators: Victims | Prob | Predators: Victims | Prob | Predators: Victims | Prob |
|---|---|---|---|---|---|
| P3: V3 | 0.0254 | P1: V3 | 0.0246 | P1: V3 | 0.0208 |
| P4: V5 | 0.0236 | P2: V2 | 0.0288 | P1: V2 | 0.0254 |
| P2: V6 | 0.0325 | P5: V7 | 0.0257 | P1: V1 | 0.0341 |
| P4: V4 | 0.0352 | P3: V5 | 0.0421 | P1: V6 | 0.0284 |



| Topic | Motive / label | Words | Word prob | Predators: Victims | Prob |
|---|---|---|---|---|---|
| 60 | Outrage / Anger | Resent, Rancor, Grudge, Flap | 0.0237, 0.0209, 0.0192, 0.0163 | P1: V7, P2: V4, P3: V3 | 0.0241, 0.0219, 0.0147 |
| 70 | Irrelevant | Nappy, Natter, Nick, Nonce | 0.0154, 0.0142, 0.0126, 0.0118 | P1: V3, P4: V5, P5: V2 | 0.0207, 0.0165, 0.0126 |
| 90 | Unknown | POP, Restroom, Rider, Sick | 0.0143, 0.0137, 0.0129, 0.0109 | P4: V5, P5: V8, P2: V6 | 0.0175, 0.0154, 0.0132 |

**Table 3.** *The distribution for the different bullying topics from the documents.*
