2.3.1. Textual features

In extracting textual features, two major processes are executed: filtering and feature calculation. Filtering derives the key information from the tweets. Feature calculation represents the significance of a word within a given document using a measure called term frequency-inverse document frequency (TF-IDF) [13].

The filtering consists of six major steps: keeping only tweets that are written in English; converting all words to lowercase; splitting each string into a list of tokens based on whitespace; removing punctuation marks from the text; eliminating common words that tell nothing about the dataset (such as "the", "and", "for"); and reducing each word to its stem by removing any prefixes or suffixes. A minimal sketch of this pipeline is given below.
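As an illustration, here is a minimal sketch of such a filtering pipeline in Python; the use of NLTK for stop-word removal and Porter stemming is an assumption for demonstration, not a tool choice prescribed by this chapter:

```python
import string

# Requires NLTK with the 'stopwords' corpus downloaded once:
#   import nltk; nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def filter_tweet(text: str) -> list[str]:
    """Steps 2-6 of the filtering; step 1 (keeping English tweets only)
    is assumed to happen upstream, e.g., via the tweet's language field."""
    text = text.lower()                                        # step 2: lowercase
    tokens = text.split()                                      # step 3: whitespace tokenization
    tokens = [t.strip(string.punctuation) for t in tokens]     # step 4: strip punctuation
    tokens = [t for t in tokens if t and t not in STOP_WORDS]  # step 5: drop stop words
    return [STEMMER.stem(t) for t in tokens]                   # step 6: stemming

print(filter_tweet("Flooding reported on the main road, avoid the area!"))
# -> ['flood', 'report', 'main', 'road', 'avoid', 'area']
```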

After the filtering, TF-IDF is calculated. TF-IDF is a statistical measure of the significance of a word within tweets, based on how often the word occurs in an individual tweet compared with how often it occurs in other tweets [14]. The advantage of the TF-IDF technique is that it supports information retrieval: the TF-IDF value increases proportionally with the number of times a keyword appears in a document but is offset by the frequency of the word in the database. The TF-IDF algorithm thus combines term frequency with inverse document frequency.

Suppose there is a vocabulary of $k$ words; then each document is represented by a $k$-vector $V_d = (t_1, \ldots, t_i, \ldots, t_k)^T$ of weighted word frequencies with components $t_i$. TF-IDF is computed as follows:

$$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i} \tag{1}$$

where $n_{id}$ is the number of occurrences of word $i$ in document $d$, $n_d$ is the total number of words in document $d$, $n_i$ is the number of occurrences of word $i$ in the database, and $N$ is the total number of documents in the database. It can be seen that TF-IDF is the product of the word frequency $n_{id}/n_d$ and the inverse document frequency $\log(N/n_i)$. For a word $i$, the more often it occurs in document $d$ (i.e., the higher $n_{id}$ is), the larger $t_i$ becomes, meaning that word $i$ is more significant. Note that the significance of word $i$ in document $d$ is offset by the frequency of the word in the whole database, so words that are unevenly distributed among the documents receive higher weights $t_i$.
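To make equation (1) concrete, here is a small sketch in Python; the function name and toy corpus are illustrative assumptions:

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute t_i = (n_id / n_d) * log(N / n_i), a literal reading of equation (1)."""
    N = len(docs)                                    # total number of documents
    db_counts = Counter(w for d in docs for w in d)  # n_i: occurrences of word i in the database
    weights = []
    for doc in docs:
        n_d = len(doc)                               # total words in document d
        counts = Counter(doc)                        # n_id: occurrences of word i in document d
        weights.append({w: (n / n_d) * math.log(N / db_counts[w])
                        for w, n in counts.items()})
    return weights

# Toy corpus of three stemmed tweets (illustrative only).
corpus = [["flood", "road", "flood"], ["road", "clear"], ["concert", "tonight"]]
print(tf_idf(corpus)[0])  # 'flood' outweighs 'road': it is concentrated in one tweet
```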

2.4.1. About multimedia data fusion

Multimedia data fusion is the process in which different features of multimedia are brought together for the purpose of analyzing specific media data. Common multimedia analyses that rely on understanding multimodal data include event detection, human tracking, audiovisual speaker detection, and semantic concept detection. The purpose of data fusion is to improve the analysis process: through the use of a fusion strategy, a multimedia analysis can improve the accuracy of its output, resulting in more reliable decision-making.

There are many fusion methods, such as linear fusion, linear weighted fusion, nonlinear fusion, and nonlinear weighted fusion. This study concerns a fusion strategy that combines the textual and visual modalities in the context of event detection. A new method of multimedia fusion, based on multiple kernel learning (MKL), has been proposed. It has the advantage of integrating with classifier learning and handling a large volume of data.
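As general background on how such a fusion can look in practice, MKL methods commonly combine one base kernel per modality into a non-negatively weighted sum, which is itself a valid kernel and can be fed to a standard classifier. The sketch below shows only this generic combination step, not the specific learning procedure proposed in this chapter; the Gram-matrix names and equal default weights are assumptions:

```python
import numpy as np

def fused_kernel(K_text: np.ndarray, K_visual: np.ndarray,
                 beta_text: float = 0.5, beta_visual: float = 0.5) -> np.ndarray:
    """Weighted sum of per-modality Gram matrices.

    A non-negatively weighted sum of positive semidefinite kernels is
    itself a valid kernel, which is what allows textual and visual
    similarities to be fused inside a single classifier. In MKL, the
    weights are learned jointly with the classifier rather than fixed.
    """
    assert beta_text >= 0.0 and beta_visual >= 0.0
    return beta_text * K_text + beta_visual * K_visual
```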


2.4.2. Kernel-based data fusion

Kernel methods are based on a kernel function, which is a similarity function evaluated over pairs of data points. The kernel function enables a kernel method to operate in a high-dimensional space by simply applying an inner product. The kernel method introduces nonlinearity into the decision parameters by mapping the original features of the original sources onto a higher-dimensional space. For a kernel function $\kappa(x, y)$ and a mapping function $\phi: X \to F$, the model built by the kernel method can be expressed as an inner product in the following equation:

$$\kappa(x, y) = \phi(x) \cdot \phi(y) \tag{2}$$

where $\kappa(x, y)$ is positive semidefinite and $\phi: X \to F$ maps each instance $x$, $y$ into the feature space $F$, which is a Hilbert space. With the kernel method, a simple mining technique such as classification can be applied further to analyze the data.
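As a quick illustrative check of equation (2), the kernel and explicit feature map below are standard textbook choices, not taken from this chapter: for a degree-2 polynomial kernel on 2-D inputs, evaluating the kernel directly agrees with the inner product of the explicitly mapped features:

```python
import numpy as np

def poly2_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """Degree-2 polynomial kernel (x . y + 1)^2, evaluated directly."""
    return float((x @ y + 1.0) ** 2)

def phi(x: np.ndarray) -> np.ndarray:
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0])

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(poly2_kernel(x, y))  # 25.0
print(phi(x) @ phi(y))     # 25.0 -> same value, via the inner product in feature space
```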

Kernel methods can be described as a class of algorithms for pattern analysis, whose best-known member is the support vector machine [18]. There are many kernel methods, including polynomial, Fisher, radial basis function (RBF), string, and graph kernels. Several commonly used kernel functions are:

$$\text{Linear function:} \quad \kappa(x_i, x) = x_i \cdot x \tag{3}$$

$$\text{Polynomial function:} \quad \kappa(x_i, x) = (x_i \cdot x + 1)^p \tag{4}$$

$$\text{Radial basis function (RBF):} \quad \kappa(x_i, x) = e^{-\|x_i - x\|^2 / 2\sigma^2} \tag{5}$$

where $x_i$ and $x$ are two samples represented as feature vectors, $\|x_i - x\|$ is the distance between the two feature vectors, $\sigma$ is a free parameter, and $p$ is a constant.
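A minimal sketch of equations (3)-(5) in Python (NumPy-based; the parameter defaults are illustrative assumptions):

```python
import numpy as np

def linear(xi: np.ndarray, x: np.ndarray) -> float:
    """Equation (3): k(xi, x) = xi . x"""
    return float(xi @ x)

def polynomial(xi: np.ndarray, x: np.ndarray, p: int = 2) -> float:
    """Equation (4): k(xi, x) = (xi . x + 1)^p"""
    return float((xi @ x + 1.0) ** p)

def rbf(xi: np.ndarray, x: np.ndarray, sigma: float = 1.0) -> float:
    """Equation (5): k(xi, x) = exp(-||xi - x||^2 / (2 sigma^2))"""
    d = xi - x
    return float(np.exp(-(d @ d) / (2.0 * sigma ** 2)))

xi, x = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(linear(xi, x), polynomial(xi, x), rbf(xi, x))  # 0.0 1.0 ~0.368
```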
