#### **4. Earthquake detection from time-series data using a probabilistic model**

The second step of Fig. 4 detects an earthquake from positive tweets.

First, these tweets cannot be trusted directly. Some users mistake shaking caused by something other than an earthquake for an earthquake, and some ill-intentioned users post positive tweets to deceive others. In this respect, social sensors closely resemble physical sensors, which also sometimes produce wrong values. Therefore, we must process positive tweets, as we would the readings of physical sensors, to detect earthquakes with high accuracy.

Figure 10 depicts the sizes of earthquakes and counts of positive tweets filtered by SVM on Feb 11 2011. These two graphs are correlated: whenever an earthquake occurs, a peak appears in the graph of positive tweet counts. Therefore, we can detect earthquakes by detecting the peaks of positive tweet counts.

Fig. 10. Sizes of earthquakes and changes of filtered tweet counts Feb 11 2011.

Many methods have been used to detect peaks in time-series data for purposes such as burst detection (Kleinberg, 2002; Zhu & Shasha, 2003) and anomaly detection (Cheng et al., 2008; Krishnamurthy et al., 2003). Toretter uses a static rule, *5 tweets in 5 min*, whose threshold is calculated using an exponential function. We explain this method below.

#### **4.1 Temporal model**

To detect an earthquake using physical sensors, we must calculate the probability of earthquake occurrence based on signals from those sensors. Similarly, we must calculate the probability of earthquake occurrence from signals of social sensors. In this subsection, we explain the temporal model we use to calculate this probability.

Figure 11 presents graphs of positive tweet counts during earthquakes. In Fig. 11, the green line shows an exponential function. As shown here, the green line resembles the red line, the graph of positive tweet counts.

Fig. 11. Number of Tweets and Exponential Curve.

It can be inferred from these graphs that the frequency distribution of positive tweets is an exponential distribution, as expressed by the following equation (Sakaki et al., 2010).

$$f(t; \lambda) = ke^{-\lambda t} \tag{1}$$
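As an illustration of how a curve of the form of Eq. (1) could be matched to observed counts, the following minimal sketch fits *k* and *λ* by least squares on the logarithm of the counts. The function name `fit_exponential`, the per-interval binning, and the log-linear fitting approach are assumptions made for this example; the chapter does not state how the curve in Fig. 11 was fitted.

```python
import math


def fit_exponential(counts):
    """Fit count(t) = k * exp(-lam * t) for t = 0, 1, 2, ...

    Uses ordinary least squares on log(count) = log(k) - lam * t.
    Bins with a zero count are skipped because log(0) is undefined.
    Returns the pair (k, lam).
    """
    points = [(t, math.log(c)) for t, c in enumerate(counts) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-empty bins")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic counts generated from the model itself (not real tweet data).
    synthetic = [100.0 * math.exp(-0.34 * t) for t in range(10)]
    print(fit_exponential(synthetic))
```

On real tweet counts the fit is only approximate, and the estimate of *λ* depends on the chosen bin width.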


Earthquake Observation by Social Sensors 327


We denote the number of sensors producing a positive value at time *t* as *n*(*t*). Here, *n*(*t*) is equal to the number of positive tweets at time *t*. If *n*<sub>0</sub> sensors produce a positive value at *t* = 0, then the number of sensors whose response is a positive value at time *t* is given by the following equation.

$$n(t) = n_0 \cdot e^{-\lambda t} \tag{2}$$

Therefore, we can calculate *N*<sub>*t<sub>a</sub>*</sub>, the number of sensors that produce a positive value from time 0 to time *t<sub>a</sub>*, as presented below.

$$\begin{split} N_{t_a} &= \sum_{t=0}^{t_a} n(t) \\ &= n_0 \sum_{t=0}^{t_a} e^{-\lambda t} \\ &= n_0 \frac{1 - e^{-\lambda(t_a+1)}}{1 - e^{-\lambda}} \end{split} \tag{3}$$

We define the false-positive ratio of a sensor as *p<sub>f</sub>*. We assume that we have *n* sensors and that all *n* sensors have the same false-positive ratio. The probability that all *n* sensors produce a false alarm is *p<sub>f</sub><sup>n</sup>*, provided that the sensors err independently. Therefore, the probability of earthquake occurrence can be estimated as

$$P(n) = 1 - p_f^n. \tag{4}$$

From Eqs. (3) and (4), we can calculate the probability of earthquake occurrence at time *t<sub>a</sub>*.

$$\begin{split} p_{\text{occur}}(t_a) &= 1 - p_f^{N_{t_a}} \\ &= 1 - p_f^{n_0 \left(1 - e^{-\lambda(t_a + 1)}\right) / \left(1 - e^{-\lambda}\right)} \end{split} \tag{5}$$
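The temporal model can be evaluated numerically. The following is a minimal sketch of Eqs. (3) and (5); the function names are chosen for this example, and the parameter values in the usage lines are the ones quoted in Section 4.2 (*λ* = 0.34, *p<sub>f</sub>* = 0.35), with *n*<sub>0</sub> = 2 picked arbitrarily.

```python
import math


def cumulative_positive_sensors(n0, lam, t_a):
    """Eq. (3): closed form of the sum of n0 * exp(-lam * t) for t = 0..t_a."""
    return n0 * (1.0 - math.exp(-lam * (t_a + 1))) / (1.0 - math.exp(-lam))


def occurrence_probability(n0, lam, p_f, t_a):
    """Eq. (5): 1 - p_f ** N_{t_a}, assuming independent sensors."""
    return 1.0 - p_f ** cumulative_positive_sensors(n0, lam, t_a)


if __name__ == "__main__":
    # Probability of earthquake occurrence as more time (and tweets) accumulate.
    for t_a in range(6):
        p = occurrence_probability(n0=2, lam=0.34, p_f=0.35, t_a=t_a)
        print(t_a, round(p, 4))
```

The closed form in `cumulative_positive_sensors` is the geometric-series sum, so it should agree with an explicit summation of Eq. (2) over *t* = 0, ..., *t<sub>a</sub>*.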

14 Will-be-set-by-IN-TECH


#### **4.2 Setting the condition for the detection trigger**

In the Toretter system, we detect an earthquake when *five positive tweets arrive in 5 min*, which means *five sensors produce positive signals in 5 min*. In this subsection, we explain how to determine this condition.

We substitute *λ* = 0.34 and *p<sub>f</sub>* = 0.35 (taken from our earlier research) into Eq. (5), by which we can calculate the probability of earthquake occurrence. Given *n*<sub>0</sub> initial positive tweets, and requiring that an alarm be raised with a false-positive ratio of less than 1%, we can calculate the waiting time *t<sub>wait</sub>* as

$$t_{wait} = -\frac{1}{0.34} \log \left( 1 - \frac{1.264}{n_0} \right) - 1. \tag{6}$$
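The constant 1.264 in Eq. (6) is not derived explicitly in the text. The following sketch shows one way to recover it, assuming that the 1% requirement is read as setting the false-alarm term of Eq. (5) to 0.01 at *t<sub>a</sub>* = *t<sub>wait</sub>*, and that log denotes the natural logarithm.

$$\begin{aligned} p_f^{N_{t_{wait}}} = 0.01 \;&\Rightarrow\; N_{t_{wait}} = \frac{\log 0.01}{\log 0.35} \approx 4.387, \\ n_0 \frac{1 - e^{-0.34(t_{wait}+1)}}{1 - e^{-0.34}} = 4.387 \;&\Rightarrow\; 1 - e^{-0.34(t_{wait}+1)} \approx \frac{4.387 \times 0.2882}{n_0} \approx \frac{1.264}{n_0}, \\ &\Rightarrow\; t_{wait} = -\frac{1}{0.34} \log\left(1 - \frac{1.264}{n_0}\right) - 1. \end{aligned}$$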

If we set *t<sub>wait</sub>* = 5, then we can calculate *n*<sub>0</sub> = 4.1 from Eq. (6). Therefore, the trigger for earthquake detection in Toretter is set as *five positive tweets arrive in 5 min*. In this way, the trigger used for earthquake detection can be determined using an exponential function, as described in this section.
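The two ingredients of this subsection can be sketched in code: a general form of Eq. (6), and a sliding-window counter that implements a static rule such as *five positive tweets in 5 min*. The names `wait_time` and `TweetCountTrigger`, the use of timestamps in seconds, and the inclusive window boundary are assumptions made for this example, not details of the Toretter implementation.

```python
import math
from collections import deque


def wait_time(n0, lam=0.34, p_f=0.35, max_false_positive=0.01):
    """General form of Eq. (6): time until p_f ** N_t drops to the target.

    Returns math.inf when n0 is too small for the target ever to be reached.
    """
    required = math.log(max_false_positive) / math.log(p_f)  # needed N_t
    ratio = required * (1.0 - math.exp(-lam)) / n0
    if ratio >= 1.0:
        return math.inf
    return -math.log(1.0 - ratio) / lam - 1.0


class TweetCountTrigger:
    """Fires when `threshold` positive tweets fall within `window_seconds`."""

    def __init__(self, threshold=5, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, timestamp):
        """Record one positive tweet; timestamps must be non-decreasing."""
        self._timestamps.append(timestamp)
        # Drop tweets that are older than the window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


if __name__ == "__main__":
    trigger = TweetCountTrigger()
    for ts in (0, 60, 120, 180, 240):
        fired = trigger.observe(ts)
    print("alarm:", fired)
```

Keeping only the timestamps inside the window makes each observation cheap, which matters when tweets arrive in bursts.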

#### **5. Location estimation from tweets**

In this section, we explain a means to estimate the location of an earthquake epicenter by analyzing tweets. First, we introduce the kinds of location information to be acquired from tweets. Next, we explain methods to estimate the location of the earthquake epicenter.

#### **5.1 Extracting location information from tweets**

Two kinds of information are applicable for location estimation from tweets: the location information in the Twitter user profile and the *geotag* attached to tweets.

#### **5.1.1 Location information in user profiles**

The Twitter user profile includes the location information of users. Of course, not all users make their location information public on the internet, but a sufficient number of users do so (this number varies among countries).

For earthquake detection, we collect positive tweets. We then extract the location information of the users who posted those positive tweets and use it to estimate the location of the earthquake epicenter. The Twitter REST API must be used to extract the location information of users from Twitter.

The Twitter REST API is the Twitter API that provides access to the basic functions of Twitter. It offers many methods. We use the *users/show* method to obtain user information. To extract the user information of Twitter user *TwitterAPI*, it is necessary to access the following URL.

http://api.twitter.com/1/users/show.json?screen\_name=TwitterAPI&include\_entities=true

The result, shown in Fig. 12, is returned in JSON format, in the same manner as for the Twitter Search API. From the result in Fig. 12, we can see that Twitter user *TwitterAPI* resides in *San Francisco, CA*.
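The request and the extraction of the location field can be sketched as follows. The sketch builds the URL quoted above and reads the `location` field from a JSON user object. It makes no network request, and `SAMPLE_RESPONSE` is a hypothetical, heavily abbreviated response written for this example rather than actual API output. The helper names are also assumptions.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://api.twitter.com/1/users/show.json"


def build_users_show_url(screen_name, include_entities=True):
    """Build the users/show request URL for one screen name."""
    query = urlencode({
        "screen_name": screen_name,
        "include_entities": "true" if include_entities else "false",
    })
    return f"{BASE_URL}?{query}"


def extract_profile_location(response_text):
    """Return the free-text profile location, or None if missing or empty."""
    profile = json.loads(response_text)
    location = (profile.get("location") or "").strip()
    return location or None


# Hypothetical abbreviated response; a real user object has many more fields.
SAMPLE_RESPONSE = '{"screen_name": "TwitterAPI", "location": "San Francisco, CA"}'

if __name__ == "__main__":
    print(build_users_show_url("TwitterAPI"))
    print(extract_profile_location(SAMPLE_RESPONSE))
```

Returning `None` for an empty field makes it easy to skip users who have not registered a location.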

Fig. 12. User information extraction from Twitter Search API.

Some points to consider when using the Twitter REST API are the following:

• Some users do not register their location information, or register non-location data, such as *in a dream* or *anywhere*. Such non-location data should be ignored.

• API requests are limited. (The limit is published: it is possible to access the Twitter Search API about 150 times per hour without authorization.)

The REST API can be accessed 150 times per hour. This limit is sufficient to extract user information for location estimation of an earthquake epicenter because the earthquake-related tweets posted in the 5 min after an earthquake most often number fewer than 100. To raise the limit, one must register with Twitter and obtain an authorization called OAuth, according to the Twitter API Documentation<sup>3</sup>.

Moreover, one must convert location information acquired from Twitter into a latitude–longitude pair. Human beings can understand places expressed by name, such as *San Francisco*, but a computer cannot tell where such a place is; it must treat location information as a latitude–longitude coordinate pair. At present, some web services, such as the Google Maps API and Yahoo Maps API, can convert geographical names into latitude–longitude coordinate pairs. Here we explain the Google Maps API.

To convert *San Francisco* into a latitude–longitude coordinate pair, one can access the following URL.

http://maps.google.com/maps/api/geocode/json?address=San%20Francisco&sensor=false&language=en

Results are obtained as in Fig. 13, in JSON format, in the same manner as for the Twitter API. In this way, *San Francisco* is converted into *latitude* = 37.7749295, *longitude* = −122.4194155.

Location information related to an earthquake can be acquired as described above.

<sup>3</sup> https://dev.twitter.com/docs/auth

Fig. 13. Result of geographical name converted using Google Maps API.
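The conversion of a place name into coordinates can be sketched in the same way. The example below builds the geocoding URL quoted in the text and reads the first result from a geocoding response. It performs no network request, and `SAMPLE_GEOCODE_RESPONSE` is an abbreviated response written for this example using the coordinates given in the text. The helper names are assumptions.

```python
import json
from urllib.parse import quote, urlencode

GEOCODE_URL = "http://maps.google.com/maps/api/geocode/json"


def build_geocode_url(place_name):
    """Build the geocoding request URL; spaces are encoded as %20."""
    query = urlencode(
        {"address": place_name, "sensor": "false", "language": "en"},
        quote_via=quote,
    )
    return f"{GEOCODE_URL}?{query}"


def parse_lat_lng(response_text):
    """Return (latitude, longitude) of the first result, or None.

    None is returned when the service reports no match, which is one way to
    discard non-location profile text such as "in a dream".
    """
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Abbreviated sample written for this example, not captured API output.
SAMPLE_GEOCODE_RESPONSE = json.dumps({
    "status": "OK",
    "results": [
        {"geometry": {"location": {"lat": 37.7749295, "lng": -122.4194155}}}
    ],
})

if __name__ == "__main__":
    print(build_geocode_url("San Francisco"))
    print(parse_lat_lng(SAMPLE_GEOCODE_RESPONSE))
```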
#### **5.1.2 Geotags attached to each tweet**

Some tweets have an attached geotag, which includes a latitude–longitude pair acquired from GPS. If positive tweets related to an earthquake include tweets with attached geotags, then it is possible to use these geotag data for location estimation. Geotag data can be extracted using the Twitter Search API. Therefore, GPS data can be obtained if they are stored when those tweets are crawled with the Twitter Search API.

Geotag data are more accurate than the location information of the Twitter user profile because they are acquired from GPS. Nevertheless, positive tweets referring to an earthquake rarely include enough geotagged tweets to estimate the location of the earthquake epicenter. In practice, a combination of the location information of Twitter users and geotags should be used.
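One simple way to combine the two sources is to prefer the geotag when a tweet has one and to fall back to the geocoded profile location otherwise. The sketch below illustrates this rule. The `PositiveTweet` record and its field names are hypothetical, and the preference rule is an assumption made for this example rather than the procedure used in Toretter.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LatLng = Tuple[float, float]


@dataclass
class PositiveTweet:
    geotag: Optional[LatLng]            # GPS coordinates attached to the tweet
    profile_location: Optional[LatLng]  # geocoded profile location, if usable


def best_location(tweet: PositiveTweet) -> Optional[LatLng]:
    """Prefer the geotag; otherwise use the profile location (may be None)."""
    if tweet.geotag is not None:
        return tweet.geotag
    return tweet.profile_location


def collect_measurements(tweets: List[PositiveTweet]) -> List[LatLng]:
    """Keep one location per tweet and drop tweets without any location."""
    locations = (best_location(t) for t in tweets)
    return [loc for loc in locations if loc is not None]


if __name__ == "__main__":
    tweets = [
        PositiveTweet(geotag=(35.68, 139.76), profile_location=(34.69, 135.50)),
        PositiveTweet(geotag=None, profile_location=(35.44, 139.64)),
        PositiveTweet(geotag=None, profile_location=None),
    ]
    print(collect_measurements(tweets))
```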

#### **5.2 Location estimation using Bayesian filtering**

If one can obtain sufficient location information from positive tweets, then it can be used to estimate the location of the earthquake epicenter. Nevertheless, that information is often inaccurate. Even when it is precise, users might still be posting far from the earthquake epicenter. Therefore, it is preferable to estimate the location of the earthquake epicenter probabilistically.

Several methods can be used to estimate the location of events from sensor readings using Bayesian filters: Kalman filters, multihypothesis tracking, grid-based approaches, topological approaches, and particle filters.

We use particle filters as an example for explanation. Particle filters rate highly in belief, accuracy, robustness, and variety according to an evaluation by Fox et al. (2003). Moreover, particle filters worked better for detecting earthquakes from Twitter in the experiments by Sakaki et al. (2010).


#### **5.2.1 Spatial model**

Each tweet is associated with a location. We describe a method that can estimate the location of an event from sensor readings. To define the problem of location estimation, we consider the evolution of the state sequence {*x<sub>t</sub>*, *t* ∈ **N**} of a target, given

$$x_t = f_t(x_{t-1}, u_t), \quad f_t : \mathcal{R}_t^n \times \mathcal{R}_t^n \to \mathcal{R}_t^n,$$

where *f<sub>t</sub>* is a possibly nonlinear function of the state *x*<sub>*t*−1</sub>. Furthermore, *u<sub>t</sub>* is an i.i.d. process noise sequence. The objective of tracking is to estimate *x<sub>t</sub>* recursively from measurements, as

$$z_t = h_t(x_t, n_t), \quad h_t : \mathcal{R}_t^n \times \mathcal{R}_t^n \to \mathcal{R}_t^n,$$

where *h<sub>t</sub>* is a possibly nonlinear function, and where *n<sub>t</sub>* is an i.i.d. measurement noise sequence. From a Bayesian perspective, the tracking problem is to calculate, recursively, some degree of belief in the state *x<sub>t</sub>* at time *t*, given data *z<sub>t</sub>* up to time *t*.

Presuming that *p*(*x*<sub>*t*−1</sub>|*z*<sub>*t*−1</sub>) is available, the prediction stage uses the following equation.

$$p(x_t|z_{t-1}) = \int p(x_t|x_{t-1})\, p(x_{t-1}|z_{t-1})\, dx_{t-1}$$

Here we use a Markov process of order one. Therefore, we can assume that

$$p(x_t|x_{t-1}, z_{t-1}) = p(x_t|x_{t-1}).$$

In the update stage, Bayes' rule is applied as

$$p(x_t|z_t) = p(z_t|x_t)\, p(x_t|z_{t-1}) / p(z_t|z_{t-1}),$$

where the normalizing constant is

$$p(z_t|z_{t-1}) = \int p(z_t|x_t)\, p(x_t|z_{t-1})\, dx_t.$$

To solve the problem, several Bayesian filtering methods have been proposed, such as Kalman filters, multi-hypothesis tracking, grid-based and topological approaches, and particle filters. For this study, we use particle filters, which are widely used in location estimation.

Additionally, we must consider the nonuniform distribution of Twitter users when we apply Bayesian filters to *social sensors* because *social sensors* are arranged non-uniformly to a greater degree than normal physical sensors are.
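The prediction and update stages above can be made concrete on a small discrete state space, where the integrals become sums. The following minimal sketch is a grid-based Bayes filter over a handful of cells. It is given only to illustrate the two equations, not as the method used in this chapter, and the function names and the toy numbers are assumptions.

```python
def predict(belief, transition):
    """Prediction stage on a grid.

    belief[i] is p(x_{t-1} = i | z_{t-1}); transition[i][j] is
    p(x_t = j | x_{t-1} = i). Returns p(x_t = j | z_{t-1}) for every cell j.
    """
    n = len(belief)
    return [sum(transition[i][j] * belief[i] for i in range(n))
            for j in range(n)]


def update(predicted, likelihood):
    """Update stage: Bayes' rule with likelihood[j] = p(z_t | x_t = j)."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    evidence = sum(unnormalized)  # the normalizing constant p(z_t | z_{t-1})
    if evidence == 0.0:
        raise ValueError("measurement has zero probability under the prior")
    return [u / evidence for u in unnormalized]


if __name__ == "__main__":
    belief = [1 / 3, 1 / 3, 1 / 3]               # uniform prior over 3 cells
    stay = [[1.0, 0.0, 0.0],                     # static target: identity
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
    likelihood = [0.1, 0.8, 0.1]                 # measurement favours cell 1
    print(update(predict(belief, stay), likelihood))
```

A particle filter replaces the fixed grid with weighted samples, which avoids enumerating every cell of a large map.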

#### **5.2.2 Location estimation using a particle filter**

A particle filter is a Bayes filter that approximates a state probabilistically. It is a sequential Monte Carlo method. For location estimation, we maintain a probability distribution for the location estimate at time *t*, designated as the belief *Bel*(*x<sub>t</sub>*) = {*x<sub>t</sub><sup>i</sup>*, *w<sub>t</sub><sup>i</sup>*}, *i* = 1, ..., *n*. Each *x<sub>t</sub><sup>i</sup>* is a discrete hypothesis related to the location of the object. The *w<sub>t</sub><sup>i</sup>* are non-negative weights, called *importance factors*, which sum to one.

The Sequential Importance Sampling (SIS) algorithm is a Monte Carlo method that forms the basis for particle filters. The SIS algorithm consists of recursive propagation of the weights and support points as each measurement is received sequentially.


We use a more advanced algorithm with re-sampling. We use the weight distribution *D<sub>w</sub>*(*x*, *y*), which is obtained from the Twitter user distribution, to account for the biases of user locations<sup>4</sup>. The algorithm is as follows:

1. **Initialization**: Calculate the weight distribution *D<sub>w</sub>*(*x*, *y*) from Twitter users' geographic distribution in Japan.
2. **Generation**: Generate and weight a particle set, which means the *N* discrete hypotheses.
	- (a) Generate a particle set

$$S_0 = (s_0^0, s_0^1, s_0^2, \dots, s_0^{N-1})$$

and allocate them evenly on the map, as

$$\text{particle } s_0^k = (x_0^k, y_0^k, w_0^k),$$

where *x* is the longitude, *y* is the latitude, and *w* is the weight.

	- (b) Weight them based on weight distribution *D<sub>w</sub>*(*x*, *y*).
3. **Re-sampling**
	- (a) Re-sample *N* particles from a particle set *S<sub>t</sub>* using the weights of the respective particles and allocate them on the map. We allow the same particle to be re-sampled more than once.
	- (b) Generate a new particle set *S*<sub>*t*+1</sub> and weight them based on weight distribution *D<sub>w</sub>*(*x*, *y*).
4. **Prediction**: Predict the next state of a particle set *S<sub>t</sub>* from Newton's motion equation.

$$\begin{aligned} (x_t^k, y_t^k) &= \left(x_{t-1}^k + v_{x_{t-1}} \Delta t + \frac{a_{x_{t-1}}}{2} \Delta t^2,\; y_{t-1}^k + v_{y_{t-1}} \Delta t + \frac{a_{y_{t-1}}}{2} \Delta t^2\right) \\ (v_{x_t}, v_{y_t}) &= (v_{x_{t-1}} + a_{x_{t-1}},\; v_{y_{t-1}} + a_{y_{t-1}}) \\ a_{x_t} &= \mathcal{N}(0; \sigma^2), \quad a_{y_t} = \mathcal{N}(0; \sigma^2). \end{aligned}$$

5. **Weighing**: Re-calculate the weight of *S<sub>t</sub>* by measurement *m*(*m<sub>x</sub>*, *m<sub>y</sub>*) as follows.

$$\begin{aligned} dx_t^k &= m_x - x_t^k, \quad dy_t^k = m_y - y_t^k \\ w_t^k &= D_w(x_t^k, y_t^k) \cdot \frac{1}{\sqrt{2\pi}\sigma} \cdot \exp\left(-\frac{(dx_t^k)^2 + (dy_t^k)^2}{2\sigma^2}\right) \end{aligned}$$

6. **Measurement**: Calculate the current object location *o*(*x<sub>t</sub>*, *y<sub>t</sub>*) by the average of *s*(*x<sub>t</sub>*, *y<sub>t</sub>*) ∈ *S<sub>t</sub>*.
7. **Iteration**: Iterate Steps 3, 4, 5, and 6 until convergence.
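The re-sampling particle filter described in this subsection can be sketched compactly as follows. Several details are assumptions made for this example rather than details given in the text: the placeholder `uniform_density` stands in for *D<sub>w</sub>*(*x*, *y*), the acceleration noise and the measurement noise use separate standard deviations (`sigma_a`, `sigma_m`), the location is taken as the weight-weighted mean of the particles, and one pass of re-sampling, prediction, weighing, and measurement is run per incoming measurement instead of testing for convergence.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Particle:
    x: float          # longitude
    y: float          # latitude
    vx: float = 0.0
    vy: float = 0.0
    w: float = 1.0


def uniform_density(x, y):
    """Placeholder for D_w(x, y); a real one would reflect user density."""
    return 1.0


def generate(n_side, x_range, y_range, density):
    """Place n_side * n_side particles evenly and weight them by D_w."""
    dx = (x_range[1] - x_range[0]) / n_side
    dy = (y_range[1] - y_range[0]) / n_side
    return [Particle(x_range[0] + (i + 0.5) * dx,
                     y_range[0] + (j + 0.5) * dy,
                     w=density(x_range[0] + (i + 0.5) * dx,
                               y_range[0] + (j + 0.5) * dy))
            for i in range(n_side) for j in range(n_side)]


def resample(particles, density, rng):
    """Draw N particles with replacement, proportionally to their weights."""
    weights = [p.w for p in particles]
    drawn = rng.choices(particles, weights=weights, k=len(particles))
    return [Particle(p.x, p.y, p.vx, p.vy, density(p.x, p.y)) for p in drawn]


def predict(particles, sigma_a, dt, rng):
    """Newtonian motion with a freshly drawn Gaussian acceleration."""
    for p in particles:
        ax = rng.gauss(0.0, sigma_a)
        ay = rng.gauss(0.0, sigma_a)
        p.x += p.vx * dt + 0.5 * ax * dt * dt
        p.y += p.vy * dt + 0.5 * ay * dt * dt
        p.vx += ax  # as in the velocity update of the text, without dt
        p.vy += ay


def weigh(particles, measurement, sigma_m, density):
    """Weight = D_w times a Gaussian of the distance to the measurement."""
    mx, my = measurement
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma_m)
    for p in particles:
        d2 = (mx - p.x) ** 2 + (my - p.y) ** 2
        p.w = density(p.x, p.y) * norm * math.exp(-d2 / (2.0 * sigma_m ** 2))


def estimate(particles):
    """Weighted mean of the particle positions."""
    total = sum(p.w for p in particles)
    return (sum(p.x * p.w for p in particles) / total,
            sum(p.y * p.w for p in particles) / total)


def locate(measurements, x_range, y_range, n_side=20, sigma_a=0.05,
           sigma_m=1.0, density=uniform_density, seed=0):
    """Run re-sampling, prediction, weighing, and measurement per reading."""
    rng = random.Random(seed)
    particles = generate(n_side, x_range, y_range, density)
    location = None
    for m in measurements:
        particles = resample(particles, density, rng)
        predict(particles, sigma_a, 1.0, rng)
        weigh(particles, m, sigma_m, density)
        location = estimate(particles)
    return location


if __name__ == "__main__":
    readings = [(3.0, 7.0)] * 15  # synthetic, identical measurements
    print(locate(readings, (0.0, 10.0), (0.0, 10.0)))
```

With a non-uniform `density`, regions with many users are weighted accordingly, which is the role the text assigns to *D<sub>w</sub>*(*x*, *y*).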


<sup>4</sup> We sample tweets associated with locations and obtain a user distribution that is proportional to the number of tweets in each region.

Figure 14 depicts the Twitter user distribution map and Fig. 15 depicts an earthquake occurrence distribution map. Earthquake detection using information from Twitter users is applicable in overlapping areas of these two maps: for example, Japan, the west coast of the U.S., Indonesia, Turkey, Iran, and Italy.

The number of Twitter users has been increasing continuously. Therefore, those areas can probably be expanded. Additionally, if one uses social media other than Twitter, then the overlapping areas might change.

Therefore, a target area should be chosen very carefully to apply the methods described in this chapter.

#### **6.2 Evaluation of earthquake detection**

To evaluate the performance of earthquake detection and earthquake epicenter location estimation, one must collect earthquake data from some organizations. Those data must include information about the approximate time of an earthquake and the approximate position of the earthquake epicenter. Moreover, it is better that they include the exact time of an earthquake, the longitude and latitude of the earthquake epicenter, and the seismic intensity of earthquakes in each region.

For example, the Japan Meteorological Agency (JMA) publishes an earthquake database on the Web, which includes the time, magnitude, seismic intensity at each observation point, and epicenter location of all earthquakes above level 1 on the Japanese seismic intensity scale<sup>5</sup>. The USGS publishes similar data on the Web<sup>6</sup>.

Data of such kinds can be obtained by crawling. They can be used to create training data for classifiers and to evaluate the performance of an earthquake detection system.

<sup>5</sup> http://www.seisvol.kishou.go.jp/eq/shindo\_db/shindo\_index.html

<sup>6</sup> http://neic.usgs.gov/neis/qed/

#### **7. Conclusion**

Our research is an early approach to using Twitter as a social sensor for earthquake observations. It is meaningful that we apply methods designed for ordinary physical sensors to earthquake detection by social sensors. Furthermore, we present the possibility of earthquake detection without installing numerous physical sensors. The method is effective for earthquake observations in countries where few seismic sensors exist. However, it is difficult to detect earthquakes occurring in oceanic areas or less populated areas using the methods introduced in this chapter. Therefore, we must verify that earthquake detection by social sensors is effective when we apply these methods. Furthermore, the applicable scope of earthquake observation by social sensors can be extended by considering a stochastic gradient, more detailed probabilistic models, and so on. Many subjects remain to be explored in future work.

#### **8. References**

Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*, Vol. 4 of *Information science and statistics*, Springer.
