**4.1 Case study data description**

A log is the sequential, chronological record of the operations and events coming from a specific industrial machine tool. Log messages are stored in text format in one or more files, one record per line, with each line containing a single message. These records are generally written automatically. The records capture everything that happens on the machine, so a log file holds information both about normal machine operation and about errors, problems, or even slight deviations from the norm.

This section presents the actual data from the case study examined, but the information contained in the log file of a generic machine tool is approximately the same regardless of the machine; at most, the data format of individual fields or their relative position in the log file may differ slightly.

Therefore, the case study presented here is representative of many industrial machines.

The following is an excerpt of the industrial machine log file from the case study.

**13/02/2019 04:25:24;MSG\_SYS;Scrive;Fine corsa asse.., Y+;**

13/02/2019 04:25:26;MSG\_SYS;Scrive;E 2034:indice variabile errato in riga PLC.. 4239;

Of the 300,000 messages (of which 299,998 are non-empty), which we collected in a single file, 1,499 have distinct "semantic" content. To obtain this figure, we discarded every message carrying no useful information, stripped the first 20 characters (date and time) from each remaining line, and then identified all the distinct messages. At the end of this first pre-processing phase, we have a file that contains exactly the 1,499 distinct messages.
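The pre-processing phase described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `distinct_messages` is hypothetical, and "no useful information" is assumed here to mean a line that is empty once the 20-character date and time prefix is removed.

```python
def distinct_messages(lines, prefix_len=20):
    """Return the distinct message bodies, in first-seen order.

    Assumption: a line carries no useful information when nothing
    remains after the date/time prefix has been stripped.
    """
    seen = {}  # dict keys preserve insertion order
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty records
        body = line[prefix_len:].strip()  # drop "dd/mm/yyyy hh:mm:ss;"
        if not body:
            continue  # nothing left besides the timestamp
        seen.setdefault(body, None)
    return list(seen)


sample = [
    "13/02/2019 04:25:24;MSG_SYS;Scrive;Fine corsa asse.., Y+;",
    "13/02/2019 04:25:26;MSG_SYS;Scrive;E 2034:indice variabile errato in riga PLC.. 4239;",
    "",
    "13/02/2019 04:25:28;MSG_SYS;Scrive;Fine corsa asse.., Y+;",
]
unique = distinct_messages(sample)
```

On the four sample lines, the empty record is skipped and the two "Fine corsa asse" records collapse into one entry, so `unique` holds two messages.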

*Text Mining for Industrial Machine Predictive Maintenance with Multiple Data Sources*

*DOI: http://dx.doi.org/10.5772/intechopen.96575*

In the next step, with the assistance of the machine tool technician, we assigned an alert level to each of the 1,499 distinct messages. The alert levels, in order of increasing severity, are: white, yellow, orange, red and black. The final document contains the associations between the 1,499 messages and their alert levels. We report here the screen of the test we ran on the log file and the first part of the list of messages encountered, with their multiplicity and alert levels. **Figure 4** shows the final report of one week's analysis of log messages from a real industrial machine.

**Figure 4.**

*One week summary of log message analysis.*

During this week, there were three occasions on which groups of messages occurred that required the supervision of experienced personnel, but none of these alarms turned into a request to stop the machine. Two interventions were due to a red alert and the third to an anomalous aggregation of orange and yellow alerts in a short period. At the end of the week, during a planned machine downtime, some adjustments were made, prompted by clusters of messages indicating a slight deviation of the machine performance from the expected one. Since the messages can be analyzed in real time, if clusters of "dangerous" messages occur during the operation of the machine, the machine experts would be able to intervene in time to prevent irreparable damage. Note that a message cluster can also consist of several messages of mild severity that are, however, issued in an anomalous configuration or within a very short time.

**5. Conclusions**

We have presented here an innovative methodology, and an associated fast and efficient software system prototype, for the algorithmic prediction of industrial machine tool malfunctions, adaptable to any type of company. It integrates

13/02/2019 04:25:28;MSG\_SYS;Scrive;Fine corsa asse.., Y+;

13/02/2019 04:25:30;MSG\_SYS;Scrive;E 2034:indice variabile errato in riga PLC.. 4239;
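Each record in the excerpt is a semicolon-delimited line that starts with a date and time in day/month/year format. As a minimal sketch of how such a line could be split into fields, the Python snippet below parses one record. The names `LogRecord`, `category`, `source` and `parse_record` are hypothetical labels chosen for illustration; they are not taken from the case study software.

```python
from datetime import datetime
from typing import NamedTuple


class LogRecord(NamedTuple):
    timestamp: datetime
    category: str  # hypothetical name for the second field, e.g. "MSG_SYS"
    source: str    # hypothetical name for the third field, e.g. "Scrive"
    message: str   # free-text message


def parse_record(line: str) -> LogRecord:
    """Split one semicolon-delimited log line into its first four fields."""
    fields = line.strip().split(";")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    # The date in the excerpt is day/month/year (13/02/2019).
    timestamp = datetime.strptime(fields[0], "%d/%m/%Y %H:%M:%S")
    # Anything after the fourth field is ignored in this sketch.
    return LogRecord(timestamp, fields[1], fields[2], fields[3])


record = parse_record("13/02/2019 04:25:24;MSG_SYS;Scrive;Fine corsa asse.., Y+;")
```

Raising `ValueError` on short lines is a design choice of this sketch; a production parser might instead log and skip malformed records.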

The log file has four blocks of useful data and a fifth block that is not useful for analysis. Each field is separated from the next by a semicolon (;). The essential information contained in the first four blocks of the first row (the one in bold) of the example above is:


As described in the following section, we extracted all the distinct messages from the 300,000 input lines; we then assigned, with the assistance of the machine tool technician, an alert level to each distinct message.
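One way to represent such an assignment is a lookup table from message text to an ordered alert level. The sketch below uses the five levels named in this chapter (white, yellow, orange, red, black, in increasing severity). The two entries in `ALERT_BY_MESSAGE` are placeholders invented for illustration, not the technician's actual assignments, and the names `Alert` and `summarize` are hypothetical.

```python
from collections import Counter
from enum import IntEnum


class Alert(IntEnum):
    """Alert levels in increasing order of severity."""
    WHITE = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3
    BLACK = 4


# Placeholder assignments for illustration only; in the case study the
# levels were chosen with the help of the machine tool technician.
ALERT_BY_MESSAGE = {
    "Fine corsa asse.., Y+": Alert.YELLOW,
    "E 2034:indice variabile errato in riga PLC.. 4239": Alert.ORANGE,
}


def summarize(messages):
    """Return (message, multiplicity, alert) rows, most severe first.

    Messages missing from the table get alert None and are listed last.
    """
    counts = Counter(messages)
    rows = [(msg, n, ALERT_BY_MESSAGE.get(msg)) for msg, n in counts.items()]
    rows.sort(key=lambda r: (-(int(r[2]) if r[2] is not None else -1), r[0]))
    return rows


rows = summarize([
    "Fine corsa asse.., Y+",
    "E 2034:indice variabile errato in riga PLC.. 4239",
    "Fine corsa asse.., Y+",
    "unlisted message",
])
```

Using an `IntEnum` keeps the levels comparable (for example `Alert.RED > Alert.ORANGE`), which makes it straightforward to sort or filter messages by severity.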

A single log record stored in the dictionary is a line containing the following information, separated by a hash sign (#):
