Advanced Analytics and Artificial Intelligence Applications

An Assessment of the Prediction Quality of VPIN DOI: http://dx.doi.org/10.5772/intechopen.86532

## 3.2 Results

## 3.2.1 Best parameters found

Tables 2–5 show the best parameters, i.e., those that maximize the precision+recall rate, for each financial instrument and bar price structure studied.

#### Table 2.

Best parameters maximizing precision+recall rate for different futures and last bar price structure in the first deep search.

| Futures | Recall | Precision | Precision+recall | θMIR | n | ω (buckets) | Classifier | Bar price |
|---|---|---|---|---|---|---|---|---|
| ES | 0.9737 | 0.2171 | 1.1908 | 0.062 | 60 | 2400 | Gaussian | Last |
| EC | 0.9080 | 0.9644 | 1.8724 | 0.006 | 30 | 2500 | Gaussian | Last |
| CL | 0.9406 | 0.9045 | 1.8451 | 0.022 | 60 | 2500 | Student | Last |
| NQ | 1 | 0.0034 | 1.0034 | 0.08 | 30 | 400 | Gaussian | Last |
| YM | 0.8421 | 0.1512 | 0.9933 | 0.064 | 60 | 2500 | Gaussian | Last |

#### Table 3.

Best parameters maximizing precision+recall rate for different futures and first bar price structure in the first deep search.

| Futures | Recall | Precision | Precision+recall | θMIR | n | ω (buckets) | Classifier | Bar price |
|---|---|---|---|---|---|---|---|---|
| ES | 0.9737 | 0.2024 | 1.1761 | 0.062 | 60 | 2400 | Gaussian | First |
| EC | 0.9127 | 0.9681 | 1.8808 | 0.006 | 30 | 2500 | Student | First |
| CL | 0.9534 | 0.9012 | 1.8546 | 0.022 | 60 | 2500 | Student | First |
| NQ | 1 | 0.0038 | 1.0038 | 0.08 | 30 | 400 | Gaussian | First |
| YM | 0.8421 | 0.1449 | 0.9870 | 0.064 | 60 | 2500 | Gaussian | First |

#### Table 4.

Best parameters maximizing precision+recall rate for different futures and median bar price structure in the first deep search.

| Futures | Recall | Precision | Precision+recall | θMIR | n | ω (buckets) | Classifier | Bar price |
|---|---|---|---|---|---|---|---|---|
| ES | 0.9737 | 0.1996 | 1.1733 | 0.062 | 60 | 2400 | Gaussian | Median |
| EC | 0.9037 | 0.9718 | 1.8755 | 0.006 | 30 | 2500 | Student | Median |
| CL | 0.9447 | 0.8951 | 1.8398 | 0.022 | 60 | 2500 | Student | Median |
| NQ | 1 | 0.0036 | 1.0036 | 0.08 | 30 | 400 | Gaussian | Median |
| YM | 1 | 0.1911 | 1.1911 | 0.054 | 30 | 2500 | Student | Median |

#### Table 5.

Best parameters maximizing precision+recall rate for different futures and mean bar price structure in the first deep search.

| Futures | Recall | Precision | Precision+recall | θMIR | n | ω (buckets) | Classifier | Bar price |
|---|---|---|---|---|---|---|---|---|
| ES | 0.9737 | 0.1950 | 1.1687 | 0.062 | 60 | 2400 | Student | Mean |
| EC | 0.9058 | 0.9691 | 1.8749 | 0.006 | 30 | 2500 | Student | Mean |
| CL | 0.9789 | 0.8654 | 1.8443 | 0.022 | 40 | 2500 | Student | Mean |
| NQ | 1 | 0.0036 | 1.0036 | 0.08 | 30 | 400 | Gaussian | Mean |
| YM | 1 | 0.1921 | 1.1921 | 0.055 | 30 | 2500 | Student | Mean |

## 3.2.2 Remarks and first interpretation

Overall, we note the following:

• The choice of bar price structure has little effect on the optimal values of the other parameters; nevertheless, the mean and median bar price structures give the best precision+recall rate on average.

• Recall rates are very close to 1.

• Since the ES, NQ, and YM precision rates are "low," their precision+recall rates are "low."

• Since the EC and CL precision rates are "high," their precision+recall rates are "high," as recall is already "high."

• On May 6, 2010, CL and EC had a very low flash crash threshold, which greatly increases the number of crashes of the same magnitude detected in the data set.

• CL and EC reach their maximum value at the lower bound of the deep search (a 2.2% fall and a 0.6% fall, respectively). This is not the case for the other instruments (in the NQ case, the optimal precision+recall rate is constant from 0.8 to 0.9).

The results yield two initial findings:

• When the flash crash is significantly present for the instrument, i.e., of high magnitude and rare in the data set (ES, YM, and NQ cases), recall is high, which means that VPIN makes a prediction before the crash happens, but precision is low: VPIN also detects other events that are not flash crashes.

• When the flash crash is not significantly present for the instrument, i.e., of low magnitude and not rare (there are many 10–20-minute events of the same magnitude), both recall and precision are high.

This may suggest one of the following hypotheses:

• VPIN seems to be a poor indicator for flash crash prediction with the usual recommended threshold of 0.99.

• VPIN may be a better indicator of another type of event (crashes of smaller amplitude).

We will compare the results of the same deep search with those of a naive classifier, to see whether or not the good prediction results in the CL and EC cases are relevant.
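The precision+recall rate reported in the tables is the sum of the two rates, so it lies between 0 and 2. The short sketch below illustrates how one instrument can score close to 1 with perfect recall but poor precision. The counting convention (true positives, false positives, false negatives) and the numbers are assumptions made for illustration; they are not the event definitions or counts used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashCounts:
    """Hypothetical outcome counts for one instrument (illustrative only)."""
    true_positives: int   # predictions followed by a crash
    false_positives: int  # predictions not followed by a crash
    false_negatives: int  # crashes that were not predicted


def precision(c: CrashCounts) -> float:
    """Share of predictions that were followed by a crash."""
    fired = c.true_positives + c.false_positives
    return c.true_positives / fired if fired else 0.0


def recall(c: CrashCounts) -> float:
    """Share of crashes that were preceded by a prediction."""
    crashes = c.true_positives + c.false_negatives
    return c.true_positives / crashes if crashes else 0.0


def precision_plus_recall(c: CrashCounts) -> float:
    """Selection criterion: sum of the two rates, between 0 and 2."""
    return precision(c) + recall(c)


# Made-up counts: every crash is caught, but most alarms are false.
hypothetical = CrashCounts(true_positives=10, false_positives=40, false_negatives=0)
print(f"{precision(hypothetical):.4f} {recall(hypothetical):.4f} "
      f"{precision_plus_recall(hypothetical):.4f}")
# prints: 0.2000 1.0000 1.2000
```

With recall pinned near 1, as in the ES, NQ, and YM rows, the criterion is driven almost entirely by precision.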

## 3.2.3 Benchmark with a "naive classifier"

We compared the prediction quality of VPIN with that of a "naive classifier," which randomly chooses, for each bucket of the data set, whether or not there will be a crash. Table 6 shows the results of the naive classifier for the parameter set of the first deep search.<sup>6</sup> Because it is a naive classifier, its results depend neither on the direction of prices (bar price classifier) nor on the bar price structure.

#### Table 6.

Best parameters maximizing precision+recall rate for different futures for the naive classifier.

We remark the following:

We can interpret it as follows:

In any case, the previous results suggest that, for "flash crash" prediction with the traditional threshold θVPIN = 0.99, VPIN has overall the same poor predictive power as a "naive" algorithm.

That is why, in the next paragraph, we benchmark the predictive power of the "naive" and VPIN algorithms:

<sup>6</sup> The first tests, conducted on the EC instrument, were carried out with averaging to obtain more robust results. They are very close to those obtained here with a single realization of randomness.

## 3.2.4 Deep search allowing higher bounds for θVPIN

Indeed, the first hypothesis is that there are too many false VPIN predictions, i.e., false-positive events, since the precision rate is too low while the recall rate is high. One may therefore hope that raising the θVPIN constraint reduces the number of "useless" VPIN predictions without reducing the recall rate too much.

In the following, we consider higher bounds for θVPIN, from 0.99 to 0.99999. All other parameters of the deep search are unchanged. The results are shown in Tables 7–10. The results for the naive algorithm are, of course, the same.

#### Table 7.

Best parameters maximizing precision+recall rate for different futures and last bar price structure allowing higher bounds for θVPIN.

| Futures | Recall | Precision | Precision+recall | θMIR | n | ω (buckets) | Classifier | θVPIN |
|---|---|---|---|---|---|---|---|---|
| ES | 0.9737 | 0.4677 | 1.4414 | 0.062 | 60 | 1600 | Gaussian | 0.99999 |
| EC | 0.9080 | 0.9644 | 1.8724 | 0.006 | 30 | 2500 | Gaussian | 0.99 |
| CL | 0.9406 | 0.9045 | 1.8451 | 0.022 | 60 | 2500 | Student | 0.99 |
| NQ | 1 | 0.0034 | 1.0034 | 0.08 | 30 | 400 | Gaussian | 0.99 |
| YM | 0.7091 | 0.3160 | 1.0251 | 0.054 | 60 | 2500 | Student | 0.9999 |

#### Table 8.

Best parameters maximizing precision+recall rate for different futures and first bar price structure allowing higher bounds for θVPIN.

| Futures | Recall | Precision | Precision+recall | θMIR | n | ω (buckets) | Classifier | θVPIN |
|---|---|---|---|---|---|---|---|---|
| ES | 0.9737 | 0.3412 | 1.3149 | 0.062 | 60 | 1200 | Gaussian | 0.99999 |
| EC | 0.9127 | 0.9681 | 1.8808 | 0.006 | 30 | 2500 | Student | 0.99 |
| CL | 0.9534 | 0.9012 | 1.8546 | 0.022 | 60 | 2500 | Student | 0.99 |
| NQ | 1 | 0.0038 | 1.0038 | 0.08 | 30 | 2500 | Gaussian | 0.99 |
| YM | 0.7091 | 0.3545 | 1.0636 | 0.054 | 60 | 2500 | Student | 0.9999 |

#### Table 9.

Best parameters maximizing precision+recall rate for different futures and median bar price structure allowing higher bounds for θVPIN.

| Futures | Recall | Precision | Precision+recall | θMIR | n | ω (buckets) | Classifier | θVPIN |
|---|---|---|---|---|---|---|---|---|
| ES | 0.9737 | 0.3306 | 1.3043 | 0.062 | 30 | 1700 | Gaussian | 0.99999 |
| EC | 0.9037 | 0.9718 | 1.8755 | 0.006 | 30 | 2500 | Student | 0.99 |
| CL | 0.9447 | 0.8951 | 1.8398 | 0.022 | 60 | 2500 | Student | 0.99 |
| NQ | 1 | 0.0036 | 1.0036 | 0.08 | 30 | 400 | Gaussian | 0.99 |
| YM | 1 | 0.1911 | 1.1911 | 0.054 | 30 | 2500 | Student | 0.99 |

We remark the following:

• For the ES instrument, the precision rate has increased for each bar price structure, while the recall rate stays the same as in the θVPIN = 0.99 case.

• For the YM instrument, the precision+recall rate has increased only with the last or first bar price structure, but recall has decreased somewhat compared with the θVPIN = 0.99 case.
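The naive benchmark described in Section 3.2.3, which decides at random in each bucket whether a crash will follow, can be sketched as below. This is a minimal illustration and not the authors' implementation. The per-bucket crash labels, the firing probability of 0.5, the toy data, and the function names are all assumptions.

```python
import random


def naive_predictions(n_buckets: int, fire_probability: float, seed: int) -> list[bool]:
    """Draw one independent random crash/no-crash guess per bucket."""
    rng = random.Random(seed)  # seeded so that one realization is reproducible
    return [rng.random() < fire_probability for _ in range(n_buckets)]


def precision_recall(predictions: list[bool], crashes: list[bool]) -> tuple[float, float]:
    """Per-bucket precision and recall (assumed scoring convention)."""
    tp = sum(p and c for p, c in zip(predictions, crashes))
    fp = sum(p and not c for p, c in zip(predictions, crashes))
    fn = sum(c and not p for p, c in zip(predictions, crashes))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 10,000 buckets, one in every 100 labelled as a crash.
crashes = [i % 100 == 0 for i in range(10_000)]
predictions = naive_predictions(len(crashes), fire_probability=0.5, seed=0)
naive_precision, naive_recall = precision_recall(predictions, crashes)
print(f"precision={naive_precision:.4f} recall={naive_recall:.4f}")
```

Under this per-bucket convention, a guess that is independent of the labels has an expected precision equal to the share of crash buckets and an expected recall equal to the firing probability. That makes it a useful floor when reading a precision+recall rate close to 1.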

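The idea of raising θVPIN to prune false positives can be illustrated with a small threshold sweep. The sketch assumes that an alarm fires in a bucket whenever a per-bucket VPIN CDF value reaches θVPIN and that crashes are labelled per bucket. Both conventions, the four thresholds tried, and the toy values are hypothetical and are not taken from the study.

```python
THRESHOLDS = (0.99, 0.999, 0.9999, 0.99999)  # assumed grid between the stated bounds


def sweep(cdf_values, crashes, thresholds=THRESHOLDS):
    """Return (theta, precision, recall, precision+recall) for each threshold."""
    results = []
    for theta in thresholds:
        alarms = [value >= theta for value in cdf_values]
        tp = sum(a and c for a, c in zip(alarms, crashes))
        fp = sum(a and not c for a, c in zip(alarms, crashes))
        fn = sum(c and not a for a, c in zip(alarms, crashes))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results.append((theta, precision, recall, precision + recall))
    return results


# Toy data: the two crash buckets carry extreme CDF values, and so does one non-crash bucket.
cdf_values = [0.5, 0.992, 0.9995, 0.999995, 0.995, 0.999999, 0.9999999, 0.9991]
crashes = [False, False, False, True, False, True, False, False]

results = sweep(cdf_values, crashes)
best = max(results, key=lambda row: row[3])  # first threshold with the top score
for theta, precision, recall, score in results:
    print(f"theta={theta} precision={precision:.4f} recall={recall:.4f} score={score:.4f}")
```

On this toy data, recall stays at 1 while precision rises from 2/7 to 2/3, and the score stops improving at θVPIN = 0.9999. Because alarms at a higher threshold are a subset of those at a lower one, recall can never increase as θVPIN is raised under this convention; only precision can benefit.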
