**5.1 Synthesis of previous work carried out on Twitter data**

In research conducted by Daniel Ricardo Jaimes Moreno et al., Twitter data was used to predict personality traits. On Twitter, users generally post text content in the form of tweets. For training, the authors used the PAN CLEF dataset of 152 users containing 14,166 tweets. Personality prediction was framed as a classification problem: after creating a TF-IDF matrix, the authors applied different dimensionality reduction techniques, namely PCA (Principal Components Analysis), LDA (Linear Discriminant Analysis) and NMF (Non-negative Matrix Factorization), to extract latent features. The results showed that the TF-IDF and PCA representations performed better for extraversion, whereas LDA gave better results for agreeableness, conscientiousness and stability. Overall, LDA gave the best performance [32].
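The first step of this pipeline, building the TF-IDF matrix, can be sketched in plain Python as below. This is a minimal sketch, not the implementation used in [32]: it assumes one document per user (for example, a user's tweets concatenated) and simple whitespace tokenization, and it uses the smoothed IDF formula that scikit-learn's `TfidfVectorizer` applies by default. The function name `tfidf_matrix` and the example documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    # Whitespace tokenization after lower-casing (a simplifying assumption).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency
        row = [counts[term] * idf[term] for term in vocab]
        # L2-normalize so long and short documents are comparable.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        matrix.append([v / norm for v in row])
    return vocab, matrix


# Hypothetical example: one concatenated "document" per user.
docs = [
    "love parties and meeting friends",
    "quiet evening reading a book",
    "friends parties friends",
]
vocab, X = tfidf_matrix(docs)
```

The rows of `X` would then be passed to a dimensionality reduction step such as scikit-learn's `PCA`, `NMF` or `LinearDiscriminantAnalysis` before classification. TF-IDF weights are non-negative, which is what NMF requires, while Linear Discriminant Analysis is supervised and therefore also needs the personality labels.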

Aditi V. Kunte et al. [33] built their study on a Twitter dataset collected through the Twitter API. The data were preprocessed by converting text to lower case and removing stop words and special characters. Classification algorithms were then applied to the preprocessed data to assign users to the classes of the Big Five personality model. Multinomial Naive Bayes achieved the highest accuracy, precision, recall and F1-score compared with the other classification algorithms [33].
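The preprocessing steps and the Multinomial Naive Bayes classifier described above can be sketched in plain Python as follows. The stop-word list, the regular expression for special characters, the toy tweets and the extravert/introvert labels are illustrative assumptions, not details from [33]; a real setup would train one such classifier per Big Five trait on a much larger dataset.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stop-word list; real pipelines use a much fuller list.
STOP_WORDS = {"a", "an", "and", "the", "is", "i", "to", "of", "in", "my", "at"}


def preprocess(text):
    """Lower-case, replace special characters with spaces, and drop stop words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {tok for tokens in token_lists for tok in tokens}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over tokens of log P(token | class)
            score = math.log(count / n_docs)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # skip words never seen in training
                num = self.word_counts[label][tok] + self.alpha
                den = self.total_words[label] + self.alpha * v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy tweets labelled for a single trait.
train = [
    "Party tonight with all my friends!!!",
    "Can't wait for the concert and the crowd",
    "Staying home to read a book",
    "Quiet night in, tea and a novel",
]
labels = ["extravert", "extravert", "introvert", "introvert"]
model = MultinomialNB().fit([preprocess(t) for t in train], labels)
print(model.predict(preprocess("Reading a book at home")))  # introvert
```

Add-alpha smoothing keeps a word that never co-occurred with a class in training from driving that class's probability to zero.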

Pavan Kumar K. N. and Marina L. Gavrilova [34] aimed to determine MBTI personality traits from user-generated data, namely each user's 50 most recent tweets. For this purpose, the authors used a combination of TF-IDF, the GloVe word embedding technique and an SVM classifier. The TF-IDF document-term matrix and the GloVe embeddings of the tweets posted by an individual are used to construct decision tree ensembles composed of CART (Classification and Regression Trees).
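One common way to combine the two kinds of features is to average the word vectors of a user's tweets and append that average to the user's TF-IDF row; the sketch below assumes this scheme, which may differ from the exact combination used in [34]. The 3-dimensional vectors in `TOY_EMBEDDINGS` are hypothetical stand-ins for real GloVe vectors (which are typically 50 to 300 dimensional and loaded from a file), and `mbti_to_binary` illustrates, as an assumption, how a four-letter type can be split into one binary target per MBTI dimension.

```python
# Hypothetical 3-dimensional vectors standing in for real GloVe embeddings.
TOY_EMBEDDINGS = {
    "party": [0.9, 0.1, 0.0],
    "friends": [0.8, 0.2, 0.1],
    "book": [0.1, 0.9, 0.3],
    "quiet": [0.0, 0.8, 0.5],
}
DIM = 3


def mean_embedding(tokens, embeddings=TOY_EMBEDDINGS, dim=DIM):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def user_features(tweets, tfidf_row):
    """Concatenate a precomputed TF-IDF row with the mean embedding of the user's tweets."""
    tokens = [tok for tweet in tweets for tok in tweet.lower().split()]
    return list(tfidf_row) + mean_embedding(tokens)


def mbti_to_binary(mbti_type):
    """Split a four-letter MBTI type into one binary label per dimension."""
    t = mbti_type.upper()
    return {
        "E-I": int(t[0] == "E"),
        "S-N": int(t[1] == "S"),
        "T-F": int(t[2] == "T"),
        "J-P": int(t[3] == "J"),
    }


features = user_features(["Party with friends"], [0.2, 0.0])
targets = mbti_to_binary("ESTJ")
```

Each resulting feature vector would then be fed, once per MBTI dimension, to a classifier such as an SVM or an ensemble of CART trees.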

In Kumar and Gavrilova's study, prediction accuracy varied across the MBTI dimensions, with the S-N (Sensing-Intuition) and E-I (Extraversion-Introversion) dimensions being considerably more reliable.

Alexia Katrimpouza et al. [35] used a questionnaire and students' educational activities on Twitter to determine how learning outcomes correlate with Twitter usage. In total, three studies were conducted, and students' Twitter activity was analyzed in each. In one of the studies, the personality traits Openness and Conscientiousness were related to Twitter usage. The instruments used in this study were the SML (Social Media Learning) scale, a Big Five personality test, the TAS (Technology Affinity Survey) and the ICTL (Information and Communications Technology Learning).

Fabio R. Gallo et al. [36] hypothesized that the NKB (Network Knowledge Base) model can be combined with personality prediction into a hybrid model that predicts the actions and reactions of individuals to content in their social networking feeds. A sample of the NKB and a stream of news items for each individual were used to train a classifier that predicts whether the user will take a certain action within a certain amount of time. Hyperparameters were tuned for several algorithms: Logistic Regression, One-Class SVM, Random Forests, Decision Trees, Multinomial Naive Bayes and Complement Naive Bayes. The broader aim of this study is to reduce pathogenic content in feeds by analyzing the flow of information on social media [36] (**Table 4**).
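The prediction target in [36], whether a user acts on a news item within a given amount of time, can be turned into binary training labels as sketched below. This is a hypothetical illustration: the function name `label_reactions`, the 24-hour default window and the data layout are assumptions, and the NKB and personality features that [36] uses as classifier inputs are not modelled here.

```python
from datetime import datetime, timedelta


def label_reactions(news_items, user_actions, window=timedelta(hours=24)):
    """Label each news item 1 if the user acted on it within `window` of seeing it, else 0.

    news_items:   list of (item_id, shown_at) pairs
    user_actions: list of (item_id, acted_at) pairs, e.g. retweets or likes
    """
    # Keep only the earliest action per item, so repeated actions do not matter.
    first_action = {}
    for item_id, acted_at in user_actions:
        if item_id not in first_action or acted_at < first_action[item_id]:
            first_action[item_id] = acted_at
    labels = {}
    for item_id, shown_at in news_items:
        acted_at = first_action.get(item_id)
        in_window = acted_at is not None and shown_at <= acted_at <= shown_at + window
        labels[item_id] = int(in_window)
    return labels


# Hypothetical feed: three items shown at the same time, two reacted to.
t0 = datetime(2020, 1, 1, 9, 0)
news = [("n1", t0), ("n2", t0), ("n3", t0)]
actions = [("n1", t0 + timedelta(hours=2)), ("n2", t0 + timedelta(days=3))]
labels = label_reactions(news, actions)
```

Labels of this form could then serve as targets for any of the classifiers listed above, with the time window itself acting as one more parameter of the prediction task.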

#### **Table 4.**

*Predicting personality using content posted on Twitter.*

Utku Pamuksuz, Joseph T. Yun et al. [38] focused on three brands, McDonald's, Harley-Davidson and Tom's Shoes, to determine the personality of each brand's Twitter account and of its followers. The main objective of the authors was to establish the connection between human personality and brand personality on social networking platforms. Crimson Hexagon was used to collect brand-related user-generated content dated between 13 July 2009 and 1 October 2015. Cosine similarity was used to measure the relationship between users and brands. The results showed that the personalities of the Twitter followers of Harley-Davidson and Tom's Shoes were close to the respective brand personalities, but this was not the case for McDonald's and its followers. However, this difference was mitigated when neuroticism was removed from the analysis [38].
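The cosine similarity comparison, and the effect of dropping one trait from it, can be illustrated with the sketch below. The trait scores are invented purely to show the mechanism and are not results from [38]; representing a brand's followers by a single averaged Big Five vector is also an assumption, and the names `similarity` and `TRAITS` are hypothetical.

```python
import math

TRAITS = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(brand, followers, exclude=()):
    """Cosine similarity between two trait dicts, optionally leaving some traits out."""
    keep = [t for t in TRAITS if t not in exclude]
    return cosine_similarity([brand[t] for t in keep], [followers[t] for t in keep])


# Invented scores: the profiles agree on every trait except neuroticism.
brand = {"openness": 0.6, "conscientiousness": 0.7, "extraversion": 0.8,
         "agreeableness": 0.7, "neuroticism": 0.1}
followers = {"openness": 0.6, "conscientiousness": 0.7, "extraversion": 0.8,
             "agreeableness": 0.7, "neuroticism": 0.9}
with_n = similarity(brand, followers)
without_n = similarity(brand, followers, exclude=("neuroticism",))
```

Because cosine similarity compares the direction of the two vectors, a single trait on which the profiles disagree strongly can pull the score down, and excluding that trait can raise it noticeably, which is the kind of effect described above for McDonald's.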
