**3. NLP-based architecture for Twitter post information extraction**

#### **3.1 Introduction**

This component consists of two sub-modules: (a) the fire incident report detection sub-module and (b) the fire incident report analytic sub-module. The first acquires reports made by civilians on the Twitter platform, detects those that refer to a potential fire incident, and stores them in a structured form. The second aggregates the detected fire incident reports and, based on the number of reports and the locations they refer to, estimates the probability that a significant number of people reported a fire incident at a specific location. The final output is this result together with the geographic area, the coordinates, and a reliability score for each location.

#### **3.2. Fire incident detection**

#### *3.2.1 Introduction to information extraction*

Natural language processing (NLP) is a field of computer science concerned with the study and analysis of raw text. Its purpose is to enhance human-computer communication by constructing systems that can understand raw text and that offer interaction interfaces based on textual messages. Its main topics include learning syntactic and semantic rules; determining the concepts, topics, and sentiment of a document; automatic summarization; machine translation; natural language generation; and information extraction [14].

Information extraction is the area of NLP concerned with analyzing unstructured text and converting it to a structured form. Consider, for example, the conversion of the following unstructured (raw) text:

"Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp."

to the structured form:

MergerBetween('Foo Inc', 'Bar Corp', date …)

The structured form above corresponds to a relation among entities that were embedded in the initial unstructured raw text. The benefit of this conversion is that structured relations can be manipulated, and ultimately exploited, by computer algorithms. Evidently, a given unstructured text can correspond to many structured forms, each holding different knowledge and representing different relations. It is therefore the algorithm designer's responsibility to select the appropriate structured form.
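As a minimal sketch of what such a structured form might look like in code, the example relation can be held in a small typed record. The field names (`acquirer`, `acquired`, `date`) and the second record type `HeadquarteredIn` are assumptions introduced here for illustration; the text only gives the relation name and its two company arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MergerBetween:
    """Hypothetical record for the relation in the example sentence."""
    acquirer: str
    acquired: str
    date: Optional[str] = None  # left unresolved, as in the original example


@dataclass(frozen=True)
class HeadquarteredIn:
    """A second, equally valid structured view of the same sentence."""
    company: str
    city: str


raw_text = "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp."

# Hand-built records: what an extraction system would be expected to produce.
merger = MergerBetween(acquirer="Foo Inc", acquired="Bar Corp")
headquarters = HeadquarteredIn(company="Foo Inc", city="New York")
```

The two records show how one sentence can yield different structured forms, which is why the choice of form is a design decision rather than something the text dictates.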

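The information extraction procedure consists of the following steps: sentence segmentation, tokenization, tagging (part-of-speech tagging and named entity detection), and relation extraction.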


The above procedures make use of text processing, knowledge representation, and information retrieval algorithms. For text segmentation (sentence segmentation or tokenization), each text is treated as an array of characters. Segmentation relies on a priori knowledge of the special characters that are most often used for splitting. For example, sentences usually end with a period ".", an exclamation mark "!", or a question mark "?" and begin with a capital letter. A general rule for segmenting sentences is therefore to search for the pairs (special character, capital letter) or (special character, end of text).
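The segmentation rule described here can be sketched with regular expressions. This is only an illustration of the (special character, capital letter) heuristic; the function names `split_sentences` and `tokenize` are hypothetical, and the tokenizer pattern is an assumption chosen to keep hashtags together.

```python
import re
from typing import List

# A boundary is whitespace that follows ".", "!" or "?" and precedes a capital letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> List[str]:
    """Split at (special character, capital letter) pairs; end of text closes the last sentence."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Rough tokenizer: words (optionally prefixed by # or @) or single punctuation marks."""
    return re.findall(r"[#@]?\w+|[^\w\s]", sentence)


sentences = split_sentences("There is a fire! I can see smoke. Is anyone there?")
tokens = tokenize("Fire at #Immitos!")
```

A rule this simple also splits after abbreviations such as "Mr." when a capitalized word follows, so a practical system would need an exception list or a trained segmenter.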

Tagging procedures are usually based on knowledge databases and information retrieval algorithms. This task requires a lexical, syntactic, and semantic database, called a corpus (a different one for each language), which maps words to conceptual entities and records their relations in a structured way. Segmented (tokenized) texts are then used as key vectors to retrieve the corresponding set of characterizations from the corpus. The most common approaches for this task are:


Finally, relation extraction requires the algorithm designer to define the relations in advance, either directly, by specifying relation rules and using matching algorithms to detect the word patterns that correspond to each rule, or indirectly, by providing the system with examples of annotated tagged sentences and then using classification algorithms to determine the corresponding relations.
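The lookup style of tagging described in this section, where tokens serve as keys into a knowledge database, can be sketched as a dictionary lookup. The tiny `LEXICON` below is an assumption made up for illustration and stands in for a real corpus; the tag names are borrowed from the chapter's fire-report vocabulary.

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon (illustrative assumption): lower-cased token -> tag.
LEXICON: Dict[str, str] = {
    "fire": "FIRE RELATED WORD",
    "flames": "FIRE RELATED WORD",
    "smoke": "FIRE RELATED WORD",
    "is": "DEFINING VERB",
    "immitos": "LOCATION",
}

TaggedToken = Tuple[str, Optional[str]]


def tag_tokens(tokens: List[str]) -> List[TaggedToken]:
    """Use each token as a lookup key; tokens missing from the lexicon get no tag."""
    return [(tok, LEXICON.get(tok.lower())) for tok in tokens]


tagged = tag_tokens(["I", "think", "there", "is", "fire", "at", "Immitos"])
```

A context-free lookup like this cannot tell a literal use of a word from a figurative one, which is one reason the relation extraction step that follows tagging is needed.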

The *LSTM layer* follows, which is the core of the architecture. This layer consists of a set of LSTMs. It takes as input the sentence descriptors produced by the CNN layer and outputs the final decision vector indicating whether the claim of the post is fake (F) or valid (V). Each LSTM of the layer is structured as presented in the lower part of **Figure 1**. LSTMs are chosen because they have proven robust for representing series of data, such as the one we are dealing with here (i.e., series of words or sentences), as they can capture their internal temporal dependencies [9]. The LSTM layer is also mathematically interesting. For more information, the reader is referred to the Appendix.


*Cyberspace*


#### *3.2.2 Information extraction from Twitter*

In this section, we present a real-case scenario: a system that was implemented and evaluated for real-time automatic fire detection, as required by the EU-funded research project "AF3" [15]. The suggested solution includes a training phase in which, via surveys, a variety of tweet samples for various predetermined occasions were collected. These samples were used to create a language model (template) for fire incident reports.

*3.2.3 Figure training comment platform interface*

*Training phase*: The system presented here is responsible for acquiring reports and comments made by civilians about fire incidents at specific locations. To define the algorithms to be used, it is first necessary to determine their requirements, desired performance, and efficiency [16]. Consequently, as a first step, a training comment platform was constructed where users were asked to write comments about a fire incident that they had hypothetically witnessed (see **Figure 2**). They were also asked to write comments that use phrases typical of fire reports but that refer to something other than a fire incident (see **Figure 3**). For example, "John has a burning desire to succeed in his new business" (here "burning" means "very strong").

**Figure 2.** *Training comment platform: declaration of fire burst.*

**Figure 3.** *Training comment platform: tricky "fire" word usage.*

The results of the training phase were passed through (a) sentence segmentation, (b) tokenization, (c) part-of-speech tagging, and (d) named entity detection algorithms, so that each report was converted to a tagged sentence form:

E.g. <'I'> <'think'> <'there'> <'is', DEFINING VERB> <'fire', FIRE RELATED WORD> <'at'><'Immitos', LOCATION>

This procedure produced a set of tagged sentences that are known to refer to fire incident reports. Next, these reports were aggregated based on their similarity. Finally, the most common aggregated ones were kept in a regular expression form in order to represent the variations. These aggregated rules correspond to the relation rules that will be used by the relation recognition step of the information extraction module. The selected rules are the following:

1. <FIRE RELATED WORD> <EXCLAMATION MARK> \* <TIME> + <EXCLAMATION MARK> \* <VERB LOCATION DEFINITION> + <PREPOSITION> + <HASHTAG> + <LOCATION>

2. <FIRE RELATED NOUN> <EXCLAMATION MARK> \* <FIRE RELATED VERB>

3. <FIRE RELATED NOUN> <EXCLAMATION MARK> \* <VERB RELATED TO SMOKE> + <PREPOSITION> + <HASHTAG> + <LOCATION>

4. <LOCATION> <EXCLAMATION MARK> \* <SENSITIVE AREA> + <FIRE EXPRESSION>

5. <LOCATION> <EXCLAMATION MARK> \* <SENSITIVE AREA> + <EXCLAMATION MARK> \* <HASHTAG> + <FIRE RELATED NOUN>

6. <LOCATION> <EXCLAMATION MARK> \* <FOREST> + <EXCLAMATION> \* <FIRE RELATED VERB> <EXCLAMATION MARK> \* <HASHTAG> + <FIRE RELATED NOUN>

7. <SENSITIVE AREA> + <EXCLAMATION MARK> \* <FIRE EXPRESSION> <EXCLAMATION MARK> \* <HASHTAG> + <LOCATION>

where

FIRE-RELATED NOUN: 'fire', 'flames', 'smoke', etc.

VERB LOCATION DEFINITION: verbs that define location ('exists', 'is located', 'is', etc.)
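The relation rules selected in this section are regular expressions over tags rather than over characters. One way to apply such a rule, sketched below, is to serialize the tag sequence of a sentence into a string and compile the rule into an ordinary regex. This is an illustrative sketch and not the AF3 implementation: it assumes that `*` and `+` act as quantifiers on the preceding tag, that a rule may match anywhere in the sentence, and that untagged tokens are written as `<O>`. The helper names and the sample tag assignments are hypothetical.

```python
import re
from typing import List, Optional, Tuple

TaggedToken = Tuple[str, Optional[str]]
Rule = List[Tuple[str, str]]  # (tag, quantifier); "" means exactly once

# Rule 2: <FIRE RELATED NOUN> <EXCLAMATION MARK>* <FIRE RELATED VERB>
RULE_2: Rule = [
    ("FIRE RELATED NOUN", ""),
    ("EXCLAMATION MARK", "*"),
    ("FIRE RELATED VERB", ""),
]


def compile_rule(rule: Rule):
    """Turn a tag rule into a regex over strings such as '<TAG A><TAG B>'."""
    parts = [f"(?:<{re.escape(tag)}>){quant}" for tag, quant in rule]
    return re.compile("".join(parts))


def tag_string(tagged: List[TaggedToken]) -> str:
    """Serialize the tag sequence; untagged tokens become <O>."""
    return "".join(f"<{tag or 'O'}>" for _, tag in tagged)


def matches(rule: Rule, tagged: List[TaggedToken]) -> bool:
    """True if the rule matches somewhere in the sentence's tag sequence."""
    return compile_rule(rule).search(tag_string(tagged)) is not None


report = [("Fire", "FIRE RELATED NOUN"), ("!", "EXCLAMATION MARK"),
          ("!", "EXCLAMATION MARK"), ("burning", "FIRE RELATED VERB")]
figurative = [("John", None), ("has", None), ("a", None),
              ("burning", "FIRE RELATED VERB"), ("desire", None)]

is_report = matches(RULE_2, report)
is_figurative_report = matches(RULE_2, figurative)
```

Here the first sentence satisfies rule 2, while the "burning desire" sentence does not, because no fire-related noun precedes the verb.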

*Combined Deep Learning and Traditional NLP Approaches for Fire Burst Detection… DOI: http://dx.doi.org/10.5772/intechopen.85075*
