**Abstract**

The current chapter introduces a procedure that aims at determining, as soon as possible, regions that are on fire, based on Twitter posts. The proposed scheme utilizes a deep learning approach for analyzing the text of Twitter posts announcing fire bursts. Deep learning is becoming very popular in different text applications, including text generalization, text summarization, and text information extraction. A deep learning network is to be trained so as to distinguish valid fire-announcing Twitter posts from junk posts. Next, the posts labeled as valid by the network undergo traditional NLP-based information extraction, where the initial unstructured text is converted into a structured one, from which the potential location and timestamp of the incident are derived for further exploitation. Analytic processing is then implemented in order to output aggregated reports, which are used to finally detect geographical areas that are probably threatened by fire. So far, the traditional NLP-based part has been implemented and has already yielded promising results under real-world testing conditions. The deep learning enrichment is to be implemented and is expected to build upon the performance of the existing architecture and further improve it.

**Keywords:** deep learning, NLP procedure, fire burst detection, twitter posts, valid posts

#### **1. Introduction**

Due to their low cost and easy access, social media, with Twitter among them, are widely used as sources of news and means of information spreading. Among others, fire bursts are breaking news that can be initially made known through Twitter posts.

Mega fires often result in significant environmental destruction, major damage to infrastructure, and economic loss. Most importantly, they put at stake the lives not only of civilians but also of forest fire personnel. Thus, technologies that facilitate early fire detection are important for reducing fires and their negative effects.


*Combined Deep Learning and Traditional NLP Approaches for Fire Burst Detection…*


*DOI: http://dx.doi.org/10.5772/intechopen.85075*


Our approach proposes the combination of a deep learning architecture along with a more traditional natural language processing (NLP) one. The deep learning component of the system is responsible for filtering out the fake from the valid fire-related posts, so that only posts containing true fire-related information are retained. For this part of the system, we refer to current state-of-the-art systems for detecting fake news and adopt the one that suits the needs of our problem best. Once the fake posts are filtered out, each valid post is afterward fed into the NLP-based subsystem. By converting the unstructured, raw text into a structured one, the NLP-based subsystem is able to extract information, such as the geographical area of the fire reported in the post. In order to draw final conclusions about the possible fire sources, aggregation statistics over the posts containing similar fire-related information are computed, and probability values for each potential fire source are given as output.
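The final aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the function name, the minimum-post threshold, and the example location strings are all assumptions made here for clarity.

```python
from collections import Counter

def aggregate_fire_reports(extracted_locations, min_posts=2):
    """Aggregate structured posts into per-location probability scores.

    `extracted_locations` holds one location string per valid post that
    survived the fake-post filter (names are illustrative). Returns
    {location: share_of_reports} for locations with enough posts.
    """
    counts = Counter(extracted_locations)
    total = sum(counts.values())
    return {
        loc: round(n / total, 3)
        for loc, n in counts.items()
        if n >= min_posts  # ignore locations mentioned too rarely
    }

reports = ["Athens", "Athens", "Penteli", "Athens", "Penteli", "Sparta"]
print(aggregate_fire_reports(reports))  # → {'Athens': 0.5, 'Penteli': 0.333}
```

A real deployment would additionally group spelling variants of the same place name and weight posts by their timestamps, but the probability-per-source output shape stays the same.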

The rest of the chapter is organized as follows: Section 2 describes and analyzes the deep learning-based architecture to be utilized for detecting valid Twitter posts regarding fire bursts. Section 3 illustrates the typical NLP-based architecture for extracting meaningful information from the unstructured text of a valid Twitter post. Section 4 presents the overall scheme and its final output. Finally, before the conclusions, Section 5 highlights the results of the part of the overall proposed scheme that has been validated to date.

#### **2. Deep learning-based architecture for false Twitter post detection**

#### **2.1 Introduction**

Social media are low-cost and easy-to-access means of information sharing and, thus, are nowadays widely used as sources of news and information. However, getting informed from social media is not always safe, as posts expressing fake news (i.e., news containing false information) are spreading exponentially, in parallel with the rapid growth of online social networks. In fact, fake news is expected to outnumber valid news in the near future [1].

In the case of fire burst news, deciding whether a Twitter post is fake or not can prove to be of crucial twofold importance. On the one hand, the time and money required for the purposeless activation of firefighting mechanisms are saved. On the other hand, the timely confrontation of mega fires is facilitated. This, in turn, makes it less likely for human lives, the environment, and infrastructure to be jeopardized.

Thus, before extracting the crucial fire burst information at the later NLP-based stage, a preprocessing step that decides whether a Twitter post declaring a fire burst is fake or not is necessary. To this end, a deep learning-based architecture is to be implemented. The purpose of this architecture is to filter out the posts characterized as "fake" and provide the subsequent NLP procedure only with the "valid" posts.

#### **2.2 Candidate deep learning architectures**

In this subsection, the candidate state-of-the-art deep learning architectures for the detection of fake posts are described. The purpose of the subsection is to illustrate the most recent approaches, appearing from 2017 onward, that have been examined. The input data, used by all architectures, is the text provided by social media posts, with a particular focus on Twitter. The output is the decision whether the input text corresponds to a "valid" or "fake" post.

The 3HAN architecture [2] utilizes a three-level hierarchical attention network, whose three levels correspond to word, sentence, and headline analysis, respectively. This three-way analysis results in the construction of a news vector which represents the input post. The latter vector is used for classifying the reliability of the post.

The architecture presented in [3], namely, ConvNet, uses a convolutional layer to capture the dependency between the text and its metadata. For the case of the metadata, a standard max pooling and a bidirectional Long Short-Term Memory (LSTM) auto-encoder layer follow. For the case of the text, only a max-pooling layer is implemented. Finally, the max-pooled text representations are concatenated with the metadata representation from the bidirectional LSTM. The merged concatenations are fed to a fully connected layer with a softmax activation function. This generates the final prediction.

The work in [4] presents the FakeNewsTracker architecture. This is a deep learning architecture which is divided into two sub-schemes. The first sub-scheme uses an LSTM deep network [5] in order for the system to be trained on the post representation context. The second sub-scheme utilizes a recursive neural network (RNN) in order to be trained on the context of social engagements. The output features of the aforementioned sub-schemes are fused together to perform a binary classification procedure which labels the input news as "fake" or "valid."

The DeClarE architecture [6] is based on bidirectional LSTMs and outputs a credibility score for the input post. The scheme also considers post source and claim information, which is processed within the bidirectional LSTM and dense layers. The concatenated output is further processed by two dense layers and a softmax layer before the prediction of the credibility score.

The work in [7] introduces a hybrid architecture which combines an LSTM and a convolutional neural network (CNN) model. Throughout this chapter, the aforementioned architecture will be called Hybrid LSTM-CNN. The LSTM was adopted for the sequence classification of the data. The 1D CNN was added immediately after the word embedding layer of the LSTM model. A max-pooling layer is also employed to reduce the dimensionality of the input, thus avoiding over-fitting to the training data. This also helps in reducing the resources needed for the training of the model.

The FakeDetector architecture [8] relates post creators to posts and subjects. It contains a Hybrid Feature Learning Unit (HFLU) which extracts a feature vector from a specific input. The feature vector is fed to the gated diffusive unit (GDU) model for effective relationship modeling among news articles, creators, and subjects. Formally, the GDU model accepts multiple inputs from different sources simultaneously. The GDU applies a softmax operation on the output vector before assigning a credibility label. For a more detailed overview of deep learning architectures, the reader is referred to [9].

#### **2.3 Procedural requirements**

Before deciding which architecture fits best in our specific case, the direct requirements of the overall procedure should be recorded.

To begin with, detecting fake posts in real time is an essential requirement of the process. A rapid decision on whether a fire-burst declaration post is fake or not leads to fast implementation of the NLP procedure (as described in Section 3). The latter, in turn, facilitates the timely detection of the geographical areas threatened by fire, which helps toward the prevention of the majority of the negative effects caused by mega fires. Therefore, the proposed architecture of [3] is not suitable for our use case, as it is not implemented in a fully automated manner.

Fake news detection accuracy is very important. High detection accuracy guarantees that the great majority of the posts fed to the subsequent NLP phase (see Section 3) express sincere fire burst claims. Thus, the resulting fire-threatened geographical areas are much more likely to be actually threatened. Furthermore, the reported accuracy needs to have been achieved on publicly available datasets and benchmarks. This renders the performance of the architecture much more reliable than that of others tested on proprietary datasets. To this end, the FakeNewsTracker architecture [4] is not suitable for our use case, as it is tested on a proprietary dataset.

Last but not least, the architecture needs to be domain invariant. In other words, it needs to be generally applicable to any domain other than the one(s) used for conducting the training and testing procedures. More precisely, the accuracy of a system detecting fake posts that deal with fire bursts should not be significantly altered in the case of posts that deal with any other domain (politics, sports, etc.). This makes the system architecture much more flexible and adaptable. From the remaining architectures analyzed in this section, only DeClarE [6] and the Hybrid LSTM-CNN [7] claim to be domain invariant. DeClarE has been tested on the PolitiFact dataset [10], achieving 67.32% accuracy, while the Hybrid LSTM-CNN has been tested on the PHEME dataset [11, 12], achieving 82.00% accuracy. Both datasets are publicly available. PolitiFact is a respected fact-checking website releasing a list of sites manually investigated and labeled; it mostly contains posts of political content. PHEME is an EU-funded project whose results include collecting and annotating rumor tweets associated with nine different breaking news events. Therefore, PHEME is a richer dataset with a wider variety of themes, which makes the Hybrid LSTM-CNN architecture [7] that has been tested on it more suitable for our use case.

The procedural requirements for the fake post detection scheme with respect to the architectures analyzed in Section 2 are summarized in **Table 1**.

#### **2.4 Implementation architecture**

Based on the aforementioned requirements, the baseline of the architecture selected to be implemented follows the Hybrid LSTM-CNN architecture [7]. The overall architecture is illustrated in **Figure 1**. The *input layer* consists of Twitter posts which are, in fact, unstructured raw texts. A *word embedding layer* follows, within which the input text is parsed and is divided into a series of words and, consequently, into a series of sentences.


Each sentence is then consumed by the *CNN layer* of the architecture, which is made up of a set of 1D CNNs based on the work presented in [13]. The CNNs of this layer are structured as illustrated in the upper part of **Figure 1**. The 1D convolutions taking place within the CNNs (as defined by Eq. (10) of the Appendix) operate on sliding windows over the words of the sentence. Before outputting the outcome of the layer, max pooling is performed to reduce dimensionality and avoid over-fitting to the training data. This also helps toward reducing the computational complexity of the training process. The output of each CNN is a fixed-length vector, acting as a digital signature of the corresponding sentence and describing its nature. Thus, a set of such description vectors (descriptors) is fed forward for further processing.
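The convolution-plus-max-pooling computation described above can be sketched in a few lines of NumPy. This is a toy illustration of the mechanism only: the dimensions, the ReLU nonlinearity, and all names are assumptions made here, not the exact configuration of [7] or [13].

```python
import numpy as np

def sentence_descriptor(embeddings, filters, window=3):
    """Toy 1D convolution + max-over-time pooling over word embeddings.

    embeddings: (num_words, embed_dim) matrix for one sentence.
    filters:    (num_filters, window * embed_dim) weight matrix.
    Returns a fixed-length vector (num_filters,) regardless of sentence
    length -- the "digital signature" of the sentence.
    """
    n, d = embeddings.shape
    # Slide a window over consecutive words; each window is flattened
    # and matched (dot product) against every filter.
    responses = np.array([
        filters @ embeddings[i:i + window].reshape(-1)
        for i in range(n - window + 1)
    ])                                       # (num_windows, num_filters)
    responses = np.maximum(responses, 0.0)   # ReLU nonlinearity
    return responses.max(axis=0)             # max pooling over time

rng = np.random.default_rng(0)
emb = rng.normal(size=(7, 4))    # 7-word sentence, 4-dim embeddings
w = rng.normal(size=(5, 3 * 4))  # 5 filters over 3-word windows
vec = sentence_descriptor(emb, w)
print(vec.shape)                 # fixed length: (5,)
```

Note how the max-over-time pooling is what makes the descriptor length independent of the sentence length, which is exactly why the descriptors can be fed to the fixed-size LSTM layer that follows.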

**Figure 1.**

*Suggested fake post detection architecture.*



| Architectures | Real time | Accuracy | Public dataset | Domain invariance |
|---|:---:|:---:|:---:|:---:|
| 3HAN [2] | ✓ | ✓ | ✓ | |
| ConvNet [3] | | | ✓ | |
| FakeNewsTracker [4] | ✓ | | | |
| DeClarE [6] | ✓ | ✓ | ✓ | ✓ |
| Hybrid LSTM-CNN [7] | ✓ | ✓ | ✓ | ✓ |
| FakeDetector [8] | ✓ | ✓ | ✓ | |

**Table 1.**
*Procedural requirements for fake post detection part.*

The *LSTM layer* follows, which is the core of the architecture. This layer consists of a set of LSTMs. It uses as input the sentence descriptors resulting from the CNN layer and outputs the final decision vector indicating whether the claim of the post is fake (F) or valid (V). Each LSTM of the layer is structured as presented in the lower part of **Figure 1**. LSTMs are chosen because they are proven to be robust for representing series of data, such as the ones we are dealing with here (i.e., series of words or sentences), as they are capable of capturing their internal temporal dependencies [9]. For the mathematical details of the LSTM layer, the reader is referred to the Appendix.
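For intuition on what each LSTM of this layer computes, a single time step of a standard LSTM cell can be written out directly. This is the generic textbook formulation, sketched here as an illustration; the notation of the chapter's Appendix may differ, and the toy sizes below are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM time step.

    x: input vector (d,); h_prev, c_prev: previous hidden/cell states (k,).
    W: (4k, d), U: (4k, k), b: (4k,) stack the parameters of the input,
    forget, and output gates and of the candidate cell state.
    """
    k = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # all gate pre-activations at once
    i = sigmoid(z[:k])              # input gate
    f = sigmoid(z[k:2 * k])         # forget gate
    o = sigmoid(z[2 * k:3 * k])     # output gate
    g = np.tanh(z[3 * k:])          # candidate cell state
    c = f * c_prev + i * g          # new cell state: forget old, admit new
    h = o * np.tanh(c)              # new hidden state
    return h, c

rng = np.random.default_rng(1)
d, k = 3, 2                         # toy input/state sizes
W = rng.normal(size=(4 * k, d))
U = rng.normal(size=(4 * k, k))
b = np.zeros(4 * k)
h, c = lstm_step(rng.normal(size=d), np.zeros(k), np.zeros(k), W, U, b)
print(h.shape)                      # (2,)
```

The cell state `c` is the mechanism that carries information across many time steps, which is what lets the layer capture the temporal dependencies among the incoming sentence descriptors.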

The resulting structured text can be exploited by computer algorithms. Apparently, for a given unstructured text, many structured forms correspond, each one holding different knowledge and representing different relations. As a result, the algorithm designer has the responsibility of selecting the appropriate structured form.

The information extraction procedure consists of the following steps:

1. *Sentence segmentation*: the procedure of distinguishing different sentences.

2. *Tokenization*: the procedure of splitting each sentence into structural components (words and punctuation).

3. *Part of speech tagging*: the procedure of characterizing each token of each sentence with the corresponding part of speech.

4. *Entity recognition*: the procedure of characterizing tokens or sets of tokens of each sentence based on previous knowledge. For example, characterize words referring to geographic locations as "city," "country," "mountain," etc.

5. *Relation recognition*: the procedure of detecting a specific combination of tokens that corresponds to a specific meaning relation among them. For example, the segmented tagged sentence **'George' (SUBJECT, NAME)** + **'lives' (VERB, RELEVANT TO LOCATION)** + **'in'** + **'Athens' (OBJECT, LOCATION)** leads to the relation **lives('George', 'Athens')**.

The above procedures make use of text processing algorithms, knowledge representation, and information retrieval algorithms. In order to achieve text segmentation (sentence segmentation or tokenization), each text should be treated as an array of characters. Segmentation is based on a priori knowledge of the special characters that are most commonly used for splitting. For example, sentences usually end with a period "." or an exclamation mark "!" or a question mark "?" and begin with a capital letter. As a result, a general rule for segmenting sentences would be to search for the pairs (special character + capital letter) or (special character + end of text).

Tagging procedures are usually based on knowledge databases and information retrieval algorithms. For this task, there is a need for a lexical, syntactic, and semantic database, which we call a corpora (of course, different for each language!), which holds characterizations of several words as conceptual entities, and their relations, in a structural way. Consequently, segmented (tokenized) texts are used as key vectors in order to retrieve from the corpora the corresponding characterization set. The most common approaches for this task are:

* *Sequential classification algorithms*: Hidden Markov Models (HMM) and Conditional Random Fields (CRF).

* *Classification algorithms*: Support Vector Machines (SVM) and Artificial Neural Networks (ANN).

Finally, relation extraction procedures demand from the algorithm designer to predefine the relations of interest, either directly, by specifying relation rules and using matching algorithms in order to detect the word patterns corresponding to a specific rule, or indirectly, by providing the system with several examples of annotated tagged sentences and then using classification algorithms in order to specify the corresponding relations.
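The segmentation, tokenization, entity recognition, and rule-based relation extraction steps discussed in this section can be sketched end to end in a few lines. The tiny gazetteer, the location-verb list, and the single relation rule below are illustrative stand-ins for a real corpora and rule set, not the chapter's actual resources.

```python
import re

# Illustrative knowledge base: entity gazetteer and location-relevant verbs.
GAZETTEER = {"Athens": "CITY", "Greece": "COUNTRY", "Penteli": "MOUNTAIN"}
LOCATION_VERBS = {"lives", "burns"}

def segment(text):
    # Rule from the text: split after (. ! ?) when a capital letter follows.
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())

def tokenize(sentence):
    # Split a sentence into word and punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)

def tag_entities(tokens):
    # Entity recognition via gazetteer lookup (None = no characterization).
    return [(t, GAZETTEER.get(t)) for t in tokens]

def extract_relations(tokens):
    # Rule: SUBJECT + location-relevant VERB + 'in' + LOCATION
    # yields the relation verb(subject, location).
    relations = []
    for i in range(len(tokens) - 3):
        subj, verb, prep, obj = tokens[i:i + 4]
        if verb in LOCATION_VERBS and prep == "in" and obj in GAZETTEER:
            relations.append(f"{verb}({subj!r}, {obj!r})")
    return relations

text = "George lives in Athens. A fire burns in Penteli!"
sentences = segment(text)
for s in sentences:
    print(extract_relations(tokenize(s)))
# → ["lives('George', 'Athens')"]
# → ["burns('fire', 'Penteli')"]
```

A production system would replace the gazetteer with a full corpora lookup and the single matching rule with either a larger rule set or a classifier trained on annotated tagged sentences, as described above.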
