2. The proposed algorithm

instantly what is happening in their location in 140-character messages, or tweets. Twitter is an information system that provides a real-time reflection of its users. As a consequence, it serves as a rich source for exploring what is attracting users' attention and what is happening around the world. For example, for news and communication during a disaster, social media users tweet and post text, images, and video from their smartphones and tablets. As a result, Twitter is a good source for detecting events such as disasters [1]. An event is the basis on which people form and recall memories. Events are a natural way to refer to any observable occurrence that groups persons, places, times, and activities together. They are useful because they help us make sense of the world around us: they help us recollect real-world experiences, explain phenomena that we observe, and predict future events. Social events are events that are attended by people and are represented by multimedia content shared online. Instances of such events are concerts, disasters, sports events, public celebrations, and protests. The Twitter platform is a rich site for news, events, and information mining. It allows users to post images and videos to accompany their tweets, so the site contains multimedia content that can be mined using sophisticated algorithms. However, because of the huge volume of information, event detection in Twitter is a complicated task that requires considerable expertise in data mining. Here, event detection is a data mining task that aims to identify events in a media collection. To enhance the process of event detection, an automatic algorithm needs to be developed to mine multimedia information.

Many approaches have been proposed for event detection [2–4]. For Twitter data, events have been detected in different ways, including part-of-speech (POS) tagging and parsing [5], hidden Markov models (HMMs) [6], and term frequency-inverse document frequency (TF-IDF) weighting. Alqhtani et al. [7] introduced a data fusion approach for earthquake detection in Twitter multimedia data using kernel fusion. It achieved a detection accuracy of 0.94, compared with 0.89 with text only and 0.83 with images only. Sakaki et al. [8] showed that mining relevant tweets can be used to detect earthquake events and predict the earthquake center in real time by using TF-IDF. The method used TF-IDF to eliminate redundant information or keywords and provided a means of real-time interaction for earthquakes in Twitter. It built a classifier based on several features, including keywords, the number of words, and the context, location, and time of the words, and it used a probabilistic spatiotemporal model to locate earthquakes that occurred in Japan. Yardi and Boyd [9] used keyword search to study the role of streaming news in spreading local information on Twitter for two incidents: a shooting and a building collapse. Ozdikis et al. [10] discussed an event detection method for various topics in Twitter based on clustering with semantic similarities between hashtags. Zhang et al. [11] proposed an event detection method for online microblogging streams that combines normalized term frequency and users' social relations to weight words. Although many approaches have been proposed for event detection using Twitter data, most of them rely only on textual analysis of tweets and use no images. Where images were used, restrictions applied. For example, Nguyen et al. [12] used both textual and image features for event detection. However, they focused on the principle
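Since several of the approaches surveyed here rely on TF-IDF weighting, a minimal sketch of that weighting may help. It uses the textbook unsmoothed form, weight = (term count / document length) × log(N / document frequency), which is not necessarily the exact variant used in the cited works. The function name `tf_idf` and the sample tweets are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook unsmoothed TF-IDF; an illustrative assumption, not the
    exact formulation of any cited method.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample tweets, already tokenized.
tweets = [
    "earthquake shaking right now".split(),
    "big earthquake in tokyo".split(),
    "coffee time right now".split(),
]
weights = tf_idf(tweets)
```

Under this weighting, a term that appears in every document receives a weight of zero, while a term concentrated in a few tweets is weighted more heavily. That is why TF-IDF can help discard uninformative keywords.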


The proposed automatic event detection method consists of five steps: Twitter data collection, data preprocessing, feature extraction, multimedia data fusion, and final event detection. The block diagram of the proposed method is shown in Figure 1. The following subsections explain these five steps in detail.
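The five steps can be read as a simple pipeline. The sketch below is only a structural placeholder: every function name, the keyword list, the sample tweets, the two-value image descriptor, and the threshold rule are hypothetical stand-ins. They are not the feature extraction, fusion, or detection techniques described in the following subsections.

```python
import re
from collections import Counter

# Hypothetical keyword list used only for this toy example.
EVENT_KEYWORDS = {"earthquake", "quake", "shaking"}


def collect_tweets():
    """Step 1 (stub): stands in for Twitter data collection."""
    return [
        {"text": "Earthquake!! Everything is SHAKING http://t.co/abc",
         "image": [0.9, 0.1]},  # pretend precomputed image descriptor
        {"text": "Nice coffee this morning", "image": [0.2, 0.3]},
    ]


def preprocess(tweet):
    """Step 2: lowercase the text, drop URLs, keep alphabetic tokens."""
    text = re.sub(r"https?://\S+", " ", tweet["text"].lower())
    return {"tokens": re.findall(r"[a-z]+", text), "image": tweet["image"]}


def extract_features(item):
    """Step 3: keyword counts for text; image descriptor passed through."""
    counts = Counter(item["tokens"])
    text_vec = [float(counts[k]) for k in sorted(EVENT_KEYWORDS)]
    return text_vec, list(item["image"])


def fuse(text_vec, image_vec):
    """Step 4 (placeholder): fuse modalities by simple concatenation."""
    return text_vec + image_vec


def detect(fused, threshold=1.0):
    """Step 5 (placeholder): toy rule, flag an event if the summed score is high."""
    return sum(fused) > threshold


def run_pipeline():
    results = []
    for tweet in collect_tweets():
        text_vec, image_vec = extract_features(preprocess(tweet))
        results.append(detect(fuse(text_vec, image_vec)))
    return results


flags = run_pipeline()
```

Keeping each step as a separate function mirrors the block structure of the method, so any single stage can be replaced without touching the others.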
