*3.2.3 Figure training comment platform interface*

*3.2.2 Information extraction from Twitter*

*Cyberspace*

In this section, we present a real-case scenario: a system that was implemented and evaluated for real-time automatic fire detection, as required by the EU-funded research project "AF3" [15]. The suggested solution comprises a training phase in which, via surveys, a variety of tweet samples for various predetermined occasions were collected. These samples were used to create a language model (template) that describes a fire incident report.

*Training phase*: The system presented here acquires reports and comments made by civilians about fire incidents at specific locations. Before the algorithms could be chosen, their requirements, desired performance, and efficiency had to be determined [16]. As a first step, a training comment platform was therefore built, in which users were asked to write comments about a fire incident that they had hypothetically witnessed (see **Figure 2**). They were also asked to write comments that use fire-related phrases but do *not* refer to a fire incident (see **Figure 3**), for example, "John has a burning desire to succeed in his new business" (here "burning" means "very strong").

**Figure 2.** *Training comment platform: declaration of fire burst.*

**Figure 3.** *Training comment platform: tricky "fire" word usage.*

The results of the training phase were passed through (a) sentence segmentation, (b) tokenization, (c) part-of-speech tagging, and (d) named entity detection algorithms, so that each report was converted to a tagged sentence form:

E.g. <'I'> <'think'> <'there'> <'is', DEFINING VERB> <'fire', FIRE RELATED WORD> <'at'> <'Immitos', LOCATION>
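The conversion to a tagged form can be illustrated with a minimal sketch. It is a simplification under stated assumptions: sentence segmentation and part-of-speech tagging are omitted, and the small lexicons below (`FIRE_WORDS`, `DEFINING_VERBS`, `LOCATIONS`) are hypothetical stand-ins for the taggers and the named entity detector, whose actual contents are not given in the text.

```python
import re

# Hypothetical mini-lexicons; the real system's tag sets are not given in the text.
FIRE_WORDS = {"fire", "flames", "smoke"}
DEFINING_VERBS = {"is", "exists"}
LOCATIONS = {"immitos"}  # stand-in for a gazetteer or named entity detector


def tokenize(sentence):
    """Split a sentence into word, hashtag, and punctuation tokens."""
    return re.findall(r"#?\w+|[!?.,]", sentence)


def tag(tokens):
    """Attach a coarse tag to each token; tokens without a tag get None."""
    tagged = []
    for tok in tokens:
        low = tok.lower()
        if low in FIRE_WORDS:
            label = "FIRE RELATED WORD"
        elif low in DEFINING_VERBS:
            label = "DEFINING VERB"
        elif low in LOCATIONS:
            label = "LOCATION"
        else:
            label = None
        tagged.append((tok, label))
    return tagged


tagged = tag(tokenize("I think there is fire at Immitos"))
for token, label in tagged:
    print(token, label)
```

A dictionary lookup is used here only because it keeps the example self-contained; a trained tagger would replace `tag` in practice.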

As a result, this procedure produced a set of tagged sentences that are known to refer to fire incident reports. Next, these reports were aggregated based on their similarity. Finally, the most common aggregated forms were kept as regular expressions in order to represent their variations. These aggregated rules are the relation rules used by the relation recognition step of the information extraction module. The selected rules are the following:

1. <FIRE RELATED WORD> <EXCLAMATION MARK> \* <TIME> + <EXCLAMATION MARK> \* <VERB LOCATION DEFINITION> + <PREPOSITION> + <HASHTAG> + <LOCATION>

2. <FIRE RELATED NOUN> <EXCLAMATION MARK> \* <FIRE RELATED VERB>

3. <FIRE RELATED NOUN> <EXCLAMATION MARK> \* <VERB RELATED TO SMOKE> + <PREPOSITION> + <HASHTAG> + <LOCATION>

4. <LOCATION> <EXCLAMATION MARK> \* <SENSITIVE AREA> + <FIRE EXPRESSION>

5. <LOCATION> <EXCLAMATION MARK> \* <SENSITIVE AREA> + <EXCLAMATION MARK> \* <HASHTAG> + <FIRE RELATED NOUN>

6. <LOCATION> <EXCLAMATION MARK> \* <FOREST> + <EXCLAMATION MARK> \* <FIRE RELATED VERB> <EXCLAMATION MARK> \* <HASHTAG> + <FIRE RELATED NOUN>

7. <SENSITIVE AREA> + <EXCLAMATION MARK> \* <FIRE EXPRESSION> <EXCLAMATION MARK> \* <HASHTAG> + <LOCATION>

where

FIRE RELATED NOUN: 'fire', 'flames', 'smoke', etc.

VERB LOCATION DEFINITION: verbs that define location ('exists', 'is located', 'is', etc.)

FIRE RELATED VERB: 'burn', 'fire', etc.

VERB RELATED TO SMOKE: 'covering', 'smoke', etc.

SENSITIVE AREA: 'forest', 'trees', 'park', etc.
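To show how such a relation rule could be checked against a tagged sentence, the following sketch encodes rule 3 as an ordinary regular expression over one-letter tag codes. The encoding (`CODES`) is hypothetical, and so is the reading of the notation: `*` is taken to mean "zero or more" and `+` to mean plain concatenation, which the text does not state explicitly.

```python
import re

# One-letter codes per tag so that a rule becomes an ordinary regex.
# Both the codes and the interpretation of the rule notation are assumptions.
CODES = {
    "FIRE RELATED NOUN": "N",
    "EXCLAMATION MARK": "X",
    "VERB RELATED TO SMOKE": "S",
    "PREPOSITION": "P",
    "HASHTAG": "H",
    "LOCATION": "L",
}

# Rule 3: <FIRE RELATED NOUN> <EXCLAMATION MARK>* <VERB RELATED TO SMOKE>
#         + <PREPOSITION> + <HASHTAG> + <LOCATION>
RULE_3 = re.compile(r"NX*SPHL")


def encode(tags):
    """Map a tag sequence to a code string; unknown tags become '_'."""
    return "".join(CODES.get(t, "_") for t in tags)


def matches_rule_3(tags):
    """True if the tag sequence contains a span that satisfies rule 3."""
    return RULE_3.search(encode(tags)) is not None


report = [
    "FIRE RELATED NOUN", "EXCLAMATION MARK", "EXCLAMATION MARK",
    "VERB RELATED TO SMOKE", "PREPOSITION", "HASHTAG", "LOCATION",
]
print(matches_rule_3(report))
print(matches_rule_3(["FIRE RELATED NOUN", "VERB RELATED TO SMOKE", "LOCATION"]))
```

The other rules could be encoded the same way once their tags are added to the code table.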

#### **3.3 Fire incident aggregation and potential fire incident prediction**

*Combined Deep Learning and Traditional NLP Approaches for Fire Burst Detection… DOI: http://dx.doi.org/10.5772/intechopen.85075*

The reliability model described in Section 3.3.3 is parameterized by:

• *Low threshold*: the lower threshold on the number of reports, below which the reports are considered unreliable

• *High threshold*: the upper threshold on the number of reports, above which the reports are considered very reliable

• *Low threshold probability (Pl)*: the reliability corresponding to the low threshold

• *High threshold probability (Ph)*: the reliability corresponding to the high threshold

#### *3.3.1 Overview*

In the previous section, the procedure of fire incident report acquisition was presented. Its result is a collection of fire incident reports from different locations and with different timestamps. Although these reports may seem reliable given the severity of the situation, a report may still refer to a false fire incident, either because the information extraction component wrongly detected a fire incident or because a civilian made a false report [17]. It should be highlighted that such a false report is not made intentionally (unlike fake news, as examined in Section 2); it is the outcome of a misunderstanding or of a tricky usage of the word "fire" and its derivatives (e.g., "pants on fire"). To ensure that fire incident notification alerts correspond to a noteworthy event, the validity of such reports should be checked before they are sent to the ingestion server. Consequently, the system includes an analytic process that confirms the reports based on their number and location. The analytic process implements a reliability model, which aggregates the reports and produces a fire incident event report along with a reliability score. The reliability score reflects how many trustworthy fire incident reports refer to a specific location. The reliability model is presented in more detail in Section 3.3.3.

#### *3.3.2 Implementation*

Initially, the analytic process clusters incident reports based on their geo-coordinates (longitude, latitude). Because fire incident reports are usually densely distributed around the fire locations, the DBSCAN algorithm [18] was used for report clustering. DBSCAN is an efficient density-based unsupervised clustering algorithm that works well in two-dimensional spaces with Euclidean distance as the proximity measure and can accurately detect clusters of various shapes. Then, the reliability model is applied to each cluster, yielding an estimate of the geographical area suspected of being threatened by a fire incident, along with a reliability score.
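The clustering step can be sketched with a compact, pure-Python version of DBSCAN. This is an illustration rather than the project's implementation: the `eps` and `min_pts` values and the sample coordinates are hypothetical, and treating (longitude, latitude) degrees as a flat Euclidean plane is only a reasonable approximation over small areas.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (Euclidean distance)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical reports as (longitude, latitude): three close together, one far away.
reports = [(23.800, 37.950), (23.801, 37.951), (23.802, 37.949), (23.600, 38.100)]
labels = dbscan(reports, eps=0.01, min_pts=3)
print(labels)
```

Reports labeled -1 are treated as noise and would not contribute to any suspected fire area; each remaining label identifies one cluster to which the reliability model is applied.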

#### *3.3.3 Reliability model*

The reliability model was designed on the assumption that very few reports for a specific location are probably false alarms, whereas above a specific threshold it is almost certain that a significant number of people have reported a fire incident. For example, if a single tweet refers to a great fire in the center of Athens, there would be doubts about the validity of this report: it may be a joke, or its author may mean something other than a literal fire incident. On the other hand, if 100 tweets report a fire incident, a real fire in the center of Athens is very likely, and a few more tweets would not make a difference. As a result, an exponential model was selected, parameterized by a low and a high threshold on the number of reports and by the reliability values (*Pl* and *Ph*) assigned at those thresholds.

The reliability score is given by

$$\text{Reliability score} = 1 - b \cdot e^{a \cdot NoR}$$

The term *NoR* stands for the number of reports. The parameters *a* and *b* are chosen so that:

$$\text{if NoR} = \text{low threshold, then reliability score} = Pl \tag{1}$$

$$\text{if NoR} = \text{high threshold, then reliability score} = Ph \tag{2}$$

Thus, from Eq. (1):

$$1 - b \cdot e^{a \cdot \text{low threshold}} = Pl \leftrightarrow \ln\left(\frac{1 - Pl}{b}\right) = a \cdot \text{low threshold} \tag{3}$$

Similarly, from Eq. (2):

$$\ln\left(\frac{1 - Ph}{b}\right) = a \cdot \text{high threshold} \tag{4}$$

Dividing Eq. (3) by Eq. (4):

$$\frac{\ln\left((1 - Pl)/b\right)}{\ln\left((1 - Ph)/b\right)} = \frac{\ln\left(1 - Pl\right) - \ln\left(b\right)}{\ln\left(1 - Ph\right) - \ln\left(b\right)} = \frac{\text{low threshold}}{\text{high threshold}} \tag{5}$$

Let:

$$c = \frac{\text{low threshold}}{\text{high threshold}} \tag{6}$$

Then, from Eqs. (5) and (6):

$$\ln\left(1 - Pl\right) - \ln\left(b\right) = c \cdot \ln\left(1 - Ph\right) - c \cdot \ln\left(b\right) \leftrightarrow b = e^{\frac{c \cdot \ln\left(1 - Ph\right) - \ln\left(1 - Pl\right)}{c - 1}} \tag{7}$$

Moreover, from Eqs. (4) and (7):

$$a = \frac{1}{\text{high threshold}} \cdot \ln\left(\frac{1 - Ph}{b}\right) \tag{8}$$
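As a numerical check of the derivation, the sketch below computes *b* and *a* from Eqs. (7) and (8) and evaluates the reliability score, assuming that the score equals *Pl* at the low threshold and *Ph* at the high threshold. The threshold and probability values are hypothetical, and clamping negative scores to zero is an added assumption that the text does not mention.

```python
import math


def fit_reliability_model(low, high, p_low, p_high):
    """Solve Eqs. (7) and (8) for b and a, given thresholds and reliabilities."""
    c = low / high  # Eq. (6)
    b = math.exp((c * math.log(1 - p_high) - math.log(1 - p_low)) / (c - 1))  # Eq. (7)
    a = math.log((1 - p_high) / b) / high  # Eq. (8)
    return a, b


def reliability_score(n_reports, a, b):
    """Score 1 - b * exp(a * NoR); the clamp at 0 is an assumption, not from the text."""
    return max(0.0, 1 - b * math.exp(a * n_reports))


# Hypothetical parameters: 2 reports -> 0.20 reliability, 20 reports -> 0.95.
a, b = fit_reliability_model(low=2, high=20, p_low=0.20, p_high=0.95)
for n in (1, 2, 10, 20, 100):
    print(n, round(reliability_score(n, a, b), 3))
```

With these inputs *a* is negative, so the score rises with the number of reports and saturates toward 1, which matches the observation that additional tweets beyond the high threshold make little difference.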

### **4. Overall proposed scheme for Twitter post-based fire burst detection**

Based on the system architectures presented in Sections 2 and 3, we propose a hybrid architecture for detecting fire bursts in real time from Twitter posts. The proposed architecture is divided into two parts: a deep learning scheme for distinguishing false from valid Twitter posts and a typical NLP scheme for extracting the crucial information from the posts that declare a fire burst. The overall combined scheme is illustrated in **Figure 4**. The deep learning network part represents the scheme presented in Section 2, while the information extractor of the typical NLP part represents the scheme presented in Section 3.

**Figure 4.** *Proposed overall architecture.*

For the fake post detection part, we employ the aforementioned deep learning scheme, as it performs twice as well as the related NLP-based methods [19]. Twitter post processing is thus expected to be much faster than with a typical state-of-the-art NLP-based procedure. In addition, the availability of large post/news datasets [10–12] facilitates the reliable training of such systems.

Despite the current trend of turning massively to deep neural networks, we designed and constructed a rather typical NLP-based architecture for the information extraction part of our system. This choice stems from the prerequisites that training a deep neural network sets, as well as from the nature of the problem itself. To begin with, because no appropriate publicly available dataset exists for this task (i.e., a dataset containing a large number of fire burst-related Twitter posts), a deep learning approach would have little chance of success. More importantly, the nature of the task itself points in the direction we followed: fire-related posts on a social media platform can reasonably be expected to share common characteristics that make them suitable for a human to model in order to obtain the desired information. For example, such posts are expected to be short, to declare the area of the fire source, and to contain words and phrases from a fire-related expression set of manageable size. Our NLP-based subsystem is therefore modeled by humans rather than learned by a machine, has proven to be efficient, and is intuitive and understandable, which makes it easier to manipulate and expand if needed.
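The two-part architecture can be pictured as a simple two-stage pipeline. In the sketch below both stages are hypothetical placeholders: `looks_valid` stands in for the deep learning classifier of Section 2 and `extract_location` for the rule-based information extractor of Section 3. Neither reflects the actual models; they only make the control flow concrete.

```python
from typing import Optional


def looks_valid(post: str) -> bool:
    """Placeholder for the deep learning fake-post classifier (hypothetical logic)."""
    return "burning desire" not in post.lower()


def extract_location(post: str) -> Optional[str]:
    """Placeholder for the NLP extractor: returns a reported location or None."""
    words = post.replace("!", " ").split()
    has_fire_word = any(w.lower() in {"fire", "flames", "smoke"} for w in words)
    if has_fire_word and "at" in words:
        idx = words.index("at")
        return words[idx + 1] if idx + 1 < len(words) else None
    return None


def detect_fire_reports(posts, is_valid=looks_valid, extract=extract_location):
    """Stage 1 drops invalid posts; stage 2 extracts fire reports from the rest."""
    reports = []
    for post in posts:
        if not is_valid(post):
            continue
        location = extract(post)
        if location is not None:
            reports.append(location)
    return reports


posts = [
    "Fire at Immitos!",
    "John has a burning desire to succeed in his new business",
    "Nice weather at Athens",
]
found = detect_fire_reports(posts)
print(found)
```

Passing the two stages as parameters keeps them independent, so either one could be replaced without touching the other.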

#### **5. Validation**

The system was tested during the AF3 pilot exercise at the Skaramagas naval base in two scenarios: (a) fire incident indication based on reports coming from the mobile app and (b) fire incident indication based on reports coming from Twitter posts (tweets) containing the hashtag #af3EUprojectFireDetection\_TRIAL.

During the first scenario test, a controlled fire was set at an open area inside the naval base. After a while, actors who were members of the pilot exercise, pretending to be citizens passing by, started posting reports about the fire incident they witnessed. These posts were analyzed by the fire incident detection module, which returned a notification of a potential fire incident along with the estimated location and a reliability score. The results were visualized by the public information channel, where fire incident notifications were presented on the map as the area where the fire was estimated to be located, along with the post comments of the reports, the photos attached to the reports, and the reliability score (see **Figure 5**).

**Figure 5.** *Validating the scenarios.*

During the second scenario test, a controlled fire was set at an open area near the military airport in Aktio [20]. After a while, actors who were members of the pilot exercise, as in the first scenario, posted about the fire incident on Twitter instead of the mobile app. These tweets were collected by the fire incident detection component and analyzed, and the ones that referred to the fire incident were singled out. These reports were gathered by the analytic module and, as described above, clustered; finally, the corresponding notifications were sent to the ingestion server. As in the first case, the results were visualized by the public information channel and exploited by the data fusion component in order to enhance its estimation. **Table 2** illustrates the validation results.

| Source | Expected results | Validation |
| --- | --- | --- |
| Fire incident indication based on reports coming from Twitter posts (tweets) | (1) Collect the posts coming from Twitter and containing the hashtag #af3EUprojectFireDetection\_TRIAL; (2) pass these posts through the information extraction sub-module in order to distinguish the tweets that refer to fire incidents from the ones that do not; (3) cluster the posts referring to fire incidents, and detect fire incident areas along with the corresponding reliability score; (4) send the result to the ingestion server via the REST API | Done successfully |

**Table 2.** *Validation results of the NLP-based scheme.*

#### **6. Conclusions**

Fire bursts are a dangerous problem of great importance worldwide. Mega fires often result in significant environmental destruction, major damage to infrastructure, and economic loss. Most importantly, they put at stake the lives not only of civilians but also of forest fire personnel. Thus, technologies that facilitate early fire detection are important for reducing fires and their negative effects.
