a stir socially in Jamaica including President Barack Obama's historic working visit to the island country [2]. The aim of the study is to analyze the opinions and emotions expressed by citizens on these topical issues and to classify them by emotion, feeling and polarity. It uses three machine learning algorithms, namely the J48 decision tree, PART and naive Bayes, to classify citizens' perceptions, and it measures how accurately the data were classified by polarity. The classifiers identified the polarity reflected and which of the three opinions (negative, positive or neutral) was most dominant. Research was undertaken on four topical issues in Jamaica: (1) the decriminalization of marijuana in Jamaica, (2) Kaci Fennell's placing in the Miss Universe competition, (3) the Riverton Landfill fire and (4) Barack Obama's working visit to Jamaica.

The sentiment analysis process consists of four main steps outlined in [3]: Data Acquisition, Data Pre-processing, Data Classification and Data Analysis.

66 Machine Learning - Advanced Techniques and Emerging Applications

**2. Methodology**

**2.1. Data acquisition**

In this study, the twitteR package was used with RStudio to extract tweets, which were subsequently used to create charts and to classify the data into emotions and polarity. Packages such as twitteR, ROAuth and plyr first had to be installed, for example with install.packages(c("twitteR", "ROAuth", "plyr")). The searchTwitter() function, provided by the twitteR library, was used to obtain tweets on selected topics. Search strings passed to searchTwitter() could contain hashtags and single or double quotes as a means of searching the Twitter API for tweets related to the keywords used; for example, temp = searchTwitter("#Jamaica Marijuana") would download tweets matching the hashtag #Jamaica and the keyword Marijuana. The search allows queries against the indices of recent or popular tweets and behaves similarly to, but not exactly like, the search features available in Twitter mobile or web clients, making it very effective and easy to use in searching Twitter.

The population comprised a corpus of eleven thousand two hundred and five (11,205) tweets that were extracted from Twitter between January and April 2015. A search was done on Twitter to extract tweets on Jamaican topics that were not older than 2 weeks.

**2.2. Data pre-processing**

The corpus was also used offline, where it was analyzed using machine learning and spreadsheet tools during pre-processing, classification and post-processing of the data. A function built into RStudio was then used to remove unwanted characters, text, punctuation and numbers from the text files created from the data extracted from Twitter. After successfully searching Twitter and obtaining the number of tweets required, the tweets were 'cleaned' using RStudio's cleaning function.

**2.3. Data classification**

RStudio provided two functions that analyzed the tweets and classified them by polarity (negative, neutral and positive) and by emotion (joy, anger, fear, surprise). Analysis was done both on original tweets (not re-tweeted) and on re-tweets. After running the polarity function to classify the tweets into negative, positive and neutral polarities, the team observed that a number of tweets were classified incorrectly. This was a result of R's inability to understand the Jamaican dialect and RStudio's limited dictionary of words. Classifying tweets into emotions proved to be another challenge, as the majority of the tweets for the different topical issues returned a result of "unknown" for the associated emotion. Both of these tools, which are essential components of the sentiment analysis research being conducted, were therefore somewhat ineffective in describing and classifying the data collected from Twitter.
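The effect of a limited dictionary on dialect text can be illustrated with a minimal base-R sketch of lexicon-based polarity scoring. This is not the pair of functions used in the study, which the text does not name; the word lists, the helper names tokenize() and classify_polarity_sketch(), and the sample tweets are all hypothetical and chosen only for illustration.

```r
# Hypothetical toy lexicons; a real study would load a much larger word list.
positive_words <- c("good", "great", "love", "proud", "win")
negative_words <- c("bad", "fire", "hate", "sad", "worst")

# Lowercase a tweet, replace everything except letters and spaces,
# and split the result into non-empty tokens.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z ]", " ", tolower(text))
  tokens <- strsplit(cleaned, " +")[[1]]
  tokens[tokens != ""]
}

# Score = (number of positive tokens) - (number of negative tokens).
# The sign of the score gives the polarity; a score of zero is neutral.
classify_polarity_sketch <- function(text) {
  tokens <- tokenize(text)
  score <- sum(tokens %in% positive_words) - sum(tokens %in% negative_words)
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

# Made-up sample tweets (assumptions, not data from the study).
tweets <- c(
  "So proud of Kaci, great performance!",
  "The smoke from the fire is the worst",
  "Mi nuh like dis at all"  # Jamaican Creole, roughly "I don't like this at all"
)

labels <- vapply(tweets, classify_polarity_sketch, character(1), USE.NAMES = FALSE)
print(labels)
```

In this sketch the first two tweets are labelled positive and negative, while the Creole tweet is labelled neutral because none of its tokens appear in either word list, even though a human reader would judge it negative. A classifier that relies on a standard English dictionary would be expected to fail in a similar way on dialect text.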
