2.2. Data preprocessing

The goal of data preprocessing is to prepare the collected raw data so that important features can be discovered from it. Preprocessing is a set of techniques applied before analysis to remove imperfections, inconsistencies, and redundancy. In this study, preprocessing the text data was essential because many tweets were poorly formatted or contained spelling errors. The text was therefore cleaned with a filter before any further processing. For image data, we extracted each tweet's image hyperlink and discarded the tweet if the hyperlink was empty or did not work, since this study requires every tweet to contain both an image and text. After preprocessing, the data are ready for feature extraction.
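The two filtering steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the cleaning rules (lowercasing, removing URLs, user mentions, and punctuation), the dictionary keys `text` and `image_url`, and the helper names `clean_text`, `link_works`, and `preprocess` are all assumptions made for illustration, and spelling correction is not covered.

```python
import re
import urllib.request
from typing import Callable, Dict, Iterable, List

# Assumed cleaning rules; the paper does not specify the actual filter.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s]")


def clean_text(text: str) -> str:
    """Lowercase the text and drop URLs, user mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)  # removes '#' but keeps the hashtag word
    return " ".join(text.split())      # collapse repeated whitespace


def link_works(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to the URL succeeds with a 2xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):  # network/HTTP errors or a malformed URL
        return False


def preprocess(
    tweets: Iterable[Dict[str, str]],
    check_link: Callable[[str], bool] = link_works,
) -> List[Dict[str, str]]:
    """Keep only tweets that have both cleaned text and a working image link."""
    kept = []
    for tweet in tweets:
        url = (tweet.get("image_url") or "").strip()
        if not url or not check_link(url):
            continue  # empty or broken image hyperlink: drop the tweet
        text = clean_text(tweet.get("text") or "")
        if not text:
            continue  # nothing left after cleaning: drop the tweet
        kept.append({"text": text, "image_url": url})
    return kept
```

The link check is passed in as a parameter so that the filter can be run offline, for example with a stub that marks known URLs as reachable, before it is applied to live Twitter data.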

Figure 2. An example of tweets on Brisbane hailstorm (left) and the word cloud for the event (right).
