**2.2. Data pre-processing**

The corpus was also analyzed offline, using machine learning and spreadsheet tools during the pre-processing, classification and post-processing of the data. Once Twitter had been searched and the required number of tweets obtained, the extracted data were saved as text files. The tweets were then 'cleaned' using a function built into RStudio, which removed unwanted characters, text, punctuation and numbers from these files.
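
To make the cleaning step concrete, the following is a minimal sketch of this kind of tweet cleaning. It is written in Python as an illustration only and is not the RStudio function used in this study, which is not reproduced here. The function name `clean_tweet` is hypothetical, and the choice of what counts as "unwanted text" (URLs, user mentions and a leading retweet marker) is an assumption made for the example.

```python
import re

# Patterns for the material to strip. Which items count as "unwanted text"
# is an assumption for this sketch: URLs, @mentions and a leading "RT".
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"^RT\s+")
# Anything that is not an ASCII letter or whitespace: punctuation, digits,
# symbols, and also non-ASCII letters (a simplification of this sketch).
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")


def clean_tweet(text: str) -> str:
    """Return the tweet with unwanted text, punctuation and numbers removed."""
    text = RETWEET_RE.sub(" ", text)     # drop a leading retweet marker
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop user mentions
    text = NON_LETTER_RE.sub(" ", text)  # drop punctuation, digits, symbols
    return " ".join(text.split())        # collapse runs of whitespace


if __name__ == "__main__":
    raw = "RT @user: Loving the 2 new phones!!! http://t.co/abc123 #tech"
    print(clean_tweet(raw))  # prints "Loving the new phones tech"
```

Because every non-letter character is replaced by a space, contractions such as "don't" are split into two tokens; whether that is acceptable depends on the classifier applied afterwards.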
