Advanced Analytics and Artificial Intelligence Applications

An Assessment of the Prediction Quality of VPIN DOI: http://dx.doi.org/10.5772/intechopen.86532

• Bar price: mean, median, last price, first price
• VPIN classifier (Student, normal)
• VPIN support n
• VPIN decision threshold θ<sub>VPIN</sub> to predict a flash crash
• MIR decision threshold θ<sub>MIR</sub> to detect a flash crash
• Prediction window ω (described below)

## 3.1.2 Defining true positive events

This section defines true-positive, false-negative, and false-positive events. For a given prediction window length ω:

• From a MIR flash crash detection (i.e., MIR<sub>j</sub> ≥ θ<sub>MIR</sub>) at a bucket j (j ≥ ω), if in the window of buckets [j − ω, j − 1] there is a VPIN event (i.e., VPIN<sub>Normalized,i</sub> ≥ θ<sub>VPIN</sub>, i ∈ [j − ω, j − 1]), then we consider it as a true positive event.<sup>2</sup> Otherwise it is a false-negative event.
• From a VPIN event at a bucket j (i.e., VPIN<sub>Normalized,j</sub> ≥ θ<sub>VPIN</sub>, j + ω ≤ endOfDataSet), if in the window of buckets [j + 1, j + ω] there is a flash crash (i.e., MIR<sub>i</sub> ≥ θ<sub>MIR</sub>, i ∈ [j + 1, j + ω]), then we consider it as a true positive event.<sup>3</sup> Otherwise it is a false-positive event.

<sup>2</sup> If j − ω < 0, the window of buckets considered is [0, j − 1].

<sup>3</sup> If j + ω > endOfDataSet, the window of buckets considered is [j + 1, endOfDataSet].

## 3.1.3 Choosing the maximum value of ω

To make the deep search useful, we computed the distribution of the time elapsed over different numbers ω of buckets. The prediction window should be short enough to be reasonable for practitioners and still wide enough that we can analyze which events VPIN can and cannot detect. We therefore chose ω so that the distribution of the time elapsed over ω buckets is stable and bounded by about 1 month. Figure 3 shows this distribution for the S&P 500 instrument; the distributions of the four other instruments studied look the same.

#### Figure 3.

Time difference distribution between 2500 S&P 500 buckets.

Table 1 reports the medians of the different distributions.

#### Table 1.

Median of time difference between 2500 buckets for the different instruments.

| Futures | Days (median) | Number of buckets chosen |
| --- | --- | --- |
| ES | 14.8 | 2500 |
| EC | 13.8 | 2500 |
| CL | 15.0 | 2500 |
| YM | 14.3 | 2500 |
| NQ | 15.2 | 2500 |

For the next step, we therefore restrict the search to ω ≤ 2500.

## 3.1.4 Describing deep search of flash crash prediction

Here we describe a first deep search of the quality of VPIN predictions for events close to the "Flash Crash" of May 2010. In the algorithm described below, θ<sub>VPIN</sub> = 0.99.<sup>4</sup>

<sup>4</sup> Previous research, such as [5], showed that this threshold is a good one.

For each VPIN classifier (Student or Gaussian) and for each bar price structure (last, first, median, average), do:

• For each θ<sub>MIR</sub> ∈ [5.2%, 6.2%] with step 0.1% for the ES instrument, θ<sub>MIR</sub> ∈ [2.2%, 3.2%] with step 0.1% for the CL instrument, θ<sub>MIR</sub> ∈ [0.4%, 0.9%] with step 0.1% for the EC instrument, θ<sub>MIR</sub> ∈ [8%, 9%] with step 0.1% for the NQ instrument, and θ<sub>MIR</sub> ∈ [5.4%, 6.4%] with step 0.1% for the YM instrument,<sup>5</sup> do:
• For each VPIN support n ∈ [30, 60] with step 10, do:


	- For ω ∈ [100, 2500] with step 100, do:
		- test the prediction;
		- store the current parameters, precision, and recall if and only if recall + precision ≥ previousLocalMaximum;
		- store the prediction length (the distance between the VPIN event and the MIR event).
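The "test prediction" step can be read as applying the event definitions of Section 3.1.2 to a list of VPIN events and a list of MIR events. The following is a minimal sketch under stated assumptions, not the authors' implementation: events are given as bucket indices, `last_bucket` stands for endOfDataSet, the function names are hypothetical, and precision and recall are assumed to be computed from the VPIN-event view and the MIR-event view, respectively (the text does not give these formulas explicitly).

```python
from bisect import bisect_left, bisect_right


def has_event_in(sorted_events, lo, hi):
    """Return True if some event index i satisfies lo <= i <= hi."""
    if lo > hi:
        return False
    return bisect_left(sorted_events, lo) < bisect_right(sorted_events, hi)


def evaluate_prediction(vpin_events, mir_events, omega, last_bucket):
    """Return (precision, recall) for one prediction window length omega.

    vpin_events: bucket indices j where the normalized VPIN >= theta_VPIN.
    mir_events:  bucket indices j where MIR >= theta_MIR.
    """
    vpin = sorted(vpin_events)
    mir = sorted(mir_events)

    # MIR view: a crash at j is a true positive if a VPIN event lies in
    # [j - omega, j - 1], clipped at bucket 0 (footnote 2); else false negative.
    tp_mir = sum(has_event_in(vpin, max(0, j - omega), j - 1) for j in mir)

    # VPIN view: an event at j is a true positive if a crash lies in
    # [j + 1, j + omega], clipped at the last bucket (footnote 3);
    # otherwise it is a false positive.
    tp_vpin = sum(
        has_event_in(mir, j + 1, min(j + omega, last_bucket)) for j in vpin
    )

    # Assumed formulas: precision = TP / (TP + FP), recall = TP / (TP + FN).
    precision = tp_vpin / len(vpin) if vpin else 0.0
    recall = tp_mir / len(mir) if mir else 0.0
    return precision, recall
```

For example, with VPIN events at buckets 10 and 50, MIR events at buckets 15 and 200, and ω = 10, one VPIN event out of two is followed by a crash and one crash out of two is preceded by a VPIN event, so both rates are 0.5.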

Remark: we first try to maximize the sum precision + recall. If the local maximum found is of practical interest (at least 1.2) and better than that of a "naive" algorithm, then a more thorough search over precision and recall separately is worthwhile in order to find a good trade-off between them (e.g., using a ROC curve).
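The search loop and the storage rule described above can be sketched as follows. This is only an illustration under assumptions: `deep_search`, `worth_refining`, and the `evaluate` callback are hypothetical names, `evaluate` is assumed to return a (precision, recall) pair for one parameter combination, the initial value of previousLocalMaximum is assumed to be 0, and the outer loops over the VPIN classifier and the bar price structure are left to the caller.

```python
from itertools import product


def deep_search(evaluate, theta_mir_values,
                supports=range(30, 61, 10),      # n in [30, 60], step 10
                omegas=range(100, 2501, 100)):   # omega in [100, 2500], step 100
    """Scan the parameter grid and store every new local maximum.

    evaluate(theta_mir, n, omega) is a user-supplied (hypothetical) callback
    returning (precision, recall) for that parameter combination.
    """
    previous_local_maximum = 0.0  # assumed starting value
    stored = []
    for theta_mir, n, omega in product(theta_mir_values, supports, omegas):
        precision, recall = evaluate(theta_mir, n, omega)
        score = precision + recall
        # Store if and only if recall + precision >= previousLocalMaximum.
        if score >= previous_local_maximum:
            previous_local_maximum = score
            stored.append({
                "theta_mir": theta_mir, "n": n, "omega": omega,
                "precision": precision, "recall": recall, "score": score,
            })
    return stored


def worth_refining(stored, threshold=1.2):
    """Apply the 1.2 criterion of the remark to the last stored maximum."""
    return bool(stored) and stored[-1]["score"] >= threshold
```

For the ES instrument, the θ<sub>MIR</sub> grid from 5.2% to 6.2% with step 0.1% could be built as `[k / 1000 for k in range(52, 63)]`, which avoids accumulating floating-point steps. The comparison with a "naive" algorithm mentioned in the remark is not covered by this sketch.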

<sup>5</sup> Because the MIR value at the flash crash is different for each instrument, the search range must be adapted accordingly, both to be precise and to keep the calculation time short.
