where TP, TN, FP, and FN represent the numbers of true positives, true negatives, false positives, and false negatives, respectively. In classifying an event such as a wildfire, a true positive (TP) occurs when a tweet from the wildfire data is classified as wildfire. If a tweet from the wildfire data is classified as not wildfire, it is a false negative (FN). In contrast, when a tweet from the data about a non-wildfire event is classified as wildfire, it is a false positive (FP). If a tweet from the data about a non-wildfire event is classified as not wildfire, it is a true negative (TN). For other events such as a hailstorm, the classification is applied in the same way.

Precision is the fraction of retrieved tweets that are correctly classified. It is a function of true positives and false positives. It is defined as:

$$\text{precision} = \frac{TP}{TP + FP} \qquad (16)$$

Recall is the fraction of relevant tweets that were retrieved. It is a function of the correctly classified examples, i.e., true positives, and the false negatives, and is also known as the true positive rate. It is defined as:

$$\text{recall} = \frac{TP}{TP + FN} \qquad (17)$$

The F-score is the harmonic mean of precision and recall, in this way combining and balancing precision and recall. It is defined as:

$$F\text{-score} = \frac{2 \ast \text{precision} \ast \text{recall}}{\text{precision} + \text{recall}} \qquad (18)$$

The F-score measures how well a learning algorithm performs on a class. It is based on the weighted average of precision and recall.

3.3. Result and discussion

To validate the performance of the proposed event detection based on multiple kernel learning, two single kernel-based methods were also built and tested. Each of these two methods takes a single medium as input, i.e., text or image. The performance metrics of the proposed method and of the other two methods for the two events are given in Table 1.

From the table, it can be seen that for both the Brisbane hailstorm event and the California wildfire event, the proposed method consistently achieved better performance in all four metrics than the methods using text only or image only. For example, the proposed method achieved an accuracy of 0.93 for the Brisbane hailstorm, whereas the text-only method achieved 0.89 and the image-only method achieved 0.85. For the California wildfire, the accuracy of the proposed method is 0.92, better than the 0.90 and 0.86 of the other two methods. Compared with the two single kernel-based methods, the proposed method improved by about 5%, 6%, 5%, and 6%, respectively, in accuracy, precision, recall, and F-score.

4. Conclusion

In this chapter, a method for detecting hot events, in particular disasters such as hailstorms and wildfires, is proposed. The approach uses visual information as well as textual information to improve detection performance. It starts by monitoring a Twitter stream to pick up tweets that contain both text and images, and storing them in a database. The Twitter data is then preprocessed to eliminate unwanted data and to transform unstructured data into structured data. Next, features are extracted from both the text and the images for event detection. For the text, the term frequency-inverse document frequency technique is used. For the images, the extracted features are histogram of oriented gradients descriptors for object detection, the gray-level co-occurrence matrix for texture description, the color histogram, and the scale-invariant feature transform. In the next step, the text features and image features are input to multiple kernel learning (MKL) for fusion. MKL can automatically combine both feature types in order to achieve the best performance.

The proposed method was tested on two datasets from two events, the 2014 Brisbane hailstorm and the 2017 California wildfires, and was compared with a method that used text only and another method that used images only. With the Brisbane hailstorm data, the proposed method achieved the best performance, with a fusion accuracy of 0.93, compared to 0.89 with text only and 0.85 with images only. With the California wildfires data, the proposed method again achieved the best performance, with a fusion accuracy of 0.92, compared to 0.90 with text only and 0.86 with images only. This demonstrates that event detection from multimedia data in Twitter is improved by our approach of combining multiple features from both images and text. The proposed method also improves computational efficiency when handling large volumes of data, and gives better performance than other fusion approaches. It delivers an accurate and effective method for detecting events, which can be used for spreading awareness and organizing responses.
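The fusion step summarized above can be illustrated with a minimal sketch. It is not the chapter's MKL solver: it only shows the core idea of combining one kernel per modality into a single fused kernel as a convex combination. The toy feature vectors, the choice of a linear kernel for text and an RBF kernel for images, the function names, and the fixed weights 0.6 and 0.4 are all assumptions made for illustration; in MKL the weights would be learned from the training data.

```python
from math import exp


def linear_kernel(X):
    """Gram matrix of dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def rbf_kernel(X, gamma=1.0):
    """Gram matrix of Gaussian (RBF) similarities between the rows of X."""
    return [
        [exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
        for xi in X
    ]


def combine_kernels(kernels, weights):
    """Return the convex combination sum_m weights[m] * kernels[m]."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy features for four tweets (not data from the chapter).
text_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
image_features = [[0.7, 0.3, 0.0], [0.6, 0.4, 0.1], [0.1, 0.2, 0.9], [0.0, 0.3, 0.8]]

K_text = linear_kernel(text_features)
K_image = rbf_kernel(image_features, gamma=0.5)

# Fixed weights stand in for the weights an MKL solver would learn.
K_fused = combine_kernels([K_text, K_image], weights=[0.6, 0.4])
```

The fused Gram matrix could then be passed to any kernel classifier that accepts a precomputed kernel. Because a convex combination of valid kernels is itself a valid kernel, the text and image similarities can be mixed in this way without changing the classifier.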

The research presents a breakthrough in terms of risk management strategies, one that can improve public health preparedness and lead to better disaster management actions.
