*Text Mining for Industrial Machine Predictive Maintenance with Multiple Data Sources DOI: http://dx.doi.org/10.5772/intechopen.96575*


*Advances in Dynamical Systems Theory, Models, Algorithms and Applications*

#### **4.1 Case study data description**

A log is the sequential, chronological record of the operations and events coming from a specific industrial machine tool. Log messages are stored in text format in one or more files, one record per line, with each line containing a single message. These records are generally written automatically. The log records everything that happens on the machine, so a log file holds information about normal machine operation as well as about errors, problems, and even slight deviations from the norm.

This section shows the actual data of the case study examined, but the information contained in the log file of a generic machine tool is approximately the same regardless of the machine; at most, the data format of each individual field, or the relative position of the fields in the log file, may change slightly.

Therefore, the case study presented here is representative of many industrial machines.

The following is an excerpt from the industrial machine log file of the case study.

**13/02/2019 04:25:24;MSG\_SYS;Scrive;Fine corsa asse.., Y+;**

13/02/2019 04:25:26;MSG\_SYS;Scrive;E 2034:indice variabile errato in riga PLC.. 4239;

13/02/2019 04:25:28;MSG\_SYS;Scrive;Fine corsa asse.., Y+;

13/02/2019 04:25:30;MSG\_SYS;Scrive;E 2034:indice variabile errato in riga PLC.. 4239;

The log file has four blocks of useful data and a fifth block that is not useful for the analysis. Each field is separated from the next by a semicolon (;). The essential information contained in the first four blocks of the first row (the one in bold) of the example above is:

**13/02/2019 04:25:24;** Date/time of recording of the event

**MSG\_SYS;** Process PID, i.e. the identification of the running process

**Scrive;** The operation done

**Fine corsa asse.., Y+;** The status or result of the execution of the event

As described in the following section, we extracted all the distinct messages from the 300,000 input lines and then, with the assistance of the machine tool technician, assigned an alert level to each distinct message. A single log record stored in the dictionary is a line containing the following information, separated by a hash sign (#):

**"Machine ID" + Date/Time # Process PID # Oper. Done # Result # "Warning Level" (1).**

Machine ID is a code that identifies the individual machine. Since the same message can be emitted several times on the same machine, but never at the same instant, each message is identified by ID + Date/time. Each message corresponds to a single fact and has an associated warning level. By adding the ID of the data source to the message, it is possible to integrate messages from different data sources that refer to the same machine.

#### **4.2 System prototype run and results**

In the preliminary phase, we had to identify the different types of messages emitted by the machine for which we had the log files. We analyzed 300 log files, each containing 1000 messages, covering the period from 04:25 on 13 February 2019 to 21:34 on 20 February 2019.

#### **Figure 4.**

*One week summary of log message analysis.*

We merged the 300,000 messages (299,998 of which are non-empty) into a single file; of these, 1,499 have distinct "semantic" content. To obtain this figure, we discarded every message that carries no useful information, removed the first 20 characters (date and time) from each remaining line, and then identified all the distinct messages. At the end of this first pre-processing phase, we have a file that contains exactly the 1,499 distinct messages.
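The pre-processing step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the case study: the function name `distinct_messages` is hypothetical, and "no useful information" is interpreted here simply as an empty line, since the text does not give the exact criterion.

```python
from collections import Counter

# "dd/mm/yyyy hh:mm:ss;" occupies the first 20 characters of every record.
TIMESTAMP_WIDTH = 20


def distinct_messages(lines):
    """Count distinct message bodies after dropping the date/time prefix.

    Assumption: only empty lines are treated as carrying no useful
    information; the real filter may be stricter.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty records
        body = line[TIMESTAMP_WIDTH:]
        if body:
            counts[body] += 1
    return counts


# Sample records taken from the log excerpt in Section 4.1.
sample = [
    "13/02/2019 04:25:24;MSG_SYS;Scrive;Fine corsa asse.., Y+;",
    "13/02/2019 04:25:26;MSG_SYS;Scrive;E 2034:indice variabile errato in riga PLC.. 4239;",
    "",
    "13/02/2019 04:25:28;MSG_SYS;Scrive;Fine corsa asse.., Y+;",
]
counts = distinct_messages(sample)
for body, multiplicity in counts.most_common():
    print(multiplicity, body)
```

Keeping the multiplicity of each distinct message, rather than a plain set, also yields the per-message counts that appear in the final report.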

In the next step, with the assistance of the machine tool technician, we assigned an alert level to each of the 1,499 distinct messages. The alert levels, in order of increasing severity, are: white, yellow, orange, red and black.
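To make the role of the dictionary concrete, the following sketch models the five alert levels as an ordered enumeration and builds the hash-separated record (1) from a raw log line. Everything named here is an assumption for illustration: the `AlertLevel` class, the `to_record` function, the machine ID `M01`, the space used to join the machine ID with the date/time, and in particular the levels attached to the two sample messages, since the technician's actual assignments are not reported in the text.

```python
from enum import IntEnum


class AlertLevel(IntEnum):
    """Alert levels in increasing order of severity."""
    WHITE = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3
    BLACK = 4


# Hypothetical dictionary entries; the real levels were assigned by the
# machine tool technician and are not given in the text.
ALERT_DICTIONARY = {
    "MSG_SYS;Scrive;Fine corsa asse.., Y+;": AlertLevel.YELLOW,
    "MSG_SYS;Scrive;E 2034:indice variabile errato in riga PLC.. 4239;": AlertLevel.ORANGE,
}


def to_record(machine_id, line):
    """Turn a raw log line into a hash-separated record with its alert level.

    Raises KeyError if the message is not in the dictionary and ValueError
    if the line has fewer than four semicolon-separated fields.
    """
    # Only the first four fields are useful; anything after them is ignored.
    timestamp, pid, operation, result = line.split(";")[:4]
    key = f"{pid};{operation};{result};"
    level = ALERT_DICTIONARY[key]
    return f"{machine_id} {timestamp} # {pid} # {operation} # {result} # {level.name}"


record = to_record("M01", "13/02/2019 04:25:24;MSG_SYS;Scrive;Fine corsa asse.., Y+;")
print(record)
```

Using an integer-backed enumeration keeps the severity ordering explicit, so levels can be compared directly (for example, `AlertLevel.RED > AlertLevel.ORANGE`).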

The final document contains the associations between the 1,499 messages and their warning levels. We report here the output screen of the test we ran on the log file, together with the first part of the list of messages encountered, with their multiplicity and alert levels: **Figure 4** shows the final report of one week of log message analysis from a real industrial machine.

During this week, groups of messages that required the supervision of experienced personnel occurred three times, but none of these alarms led to a request to stop the machine. Two interventions were due to a red alert and the third to an anomalous aggregation of orange and yellow alerts in a short period. At the end of the week, during a planned machine downtime, some adjustments were made, prompted by clusters of messages indicating a slight deviation of the machine's performance from the expected one.

Since the messages can be analyzed in real time, if clusters of "dangerous" messages occur during the operation of the machine, the machine experts are able to intervene in time to prevent irreparable damage. Note that a message cluster can also be formed by several messages of mild severity that are issued in an anomalous configuration or within a very short time.
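One simple way to picture such a cluster is a weighted sliding time window: each incoming message adds a weight that depends on its alert level, and an alarm is raised when the total weight inside the window reaches a threshold. The sketch below is not the detection method of the prototype, which the text does not detail; the window length, the weights, the threshold, the class name `ClusterDetector` and the levels attached to the sample events are all hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical parameters: the text does not specify them.
WINDOW = timedelta(seconds=60)
WEIGHTS = {"white": 0, "yellow": 1, "orange": 2, "red": 5, "black": 10}
THRESHOLD = 5


class ClusterDetector:
    """Flag moments when weighted alerts pile up inside a sliding time window."""

    def __init__(self, window=WINDOW, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.recent = deque()  # (timestamp, weight) pairs, oldest first
        self.score = 0

    def observe(self, timestamp, level):
        """Add one message; return True if the window score reaches the threshold."""
        weight = WEIGHTS[level]
        self.recent.append((timestamp, weight))
        self.score += weight
        # Drop messages that have fallen out of the time window.
        while self.recent and timestamp - self.recent[0][0] > self.window:
            _, old_weight = self.recent.popleft()
            self.score -= old_weight
        return self.score >= self.threshold


def parse_time(text):
    """Parse the date/time format used in the case study log."""
    return datetime.strptime(text, "%d/%m/%Y %H:%M:%S")


# Timestamps from the log excerpt; the alert levels are illustrative only.
events = [
    ("13/02/2019 04:25:24", "yellow"),
    ("13/02/2019 04:25:26", "orange"),
    ("13/02/2019 04:25:28", "yellow"),
    ("13/02/2019 04:25:30", "orange"),
]
detector = ClusterDetector()
flags = [detector.observe(parse_time(t), level) for t, level in events]
print(flags)
```

With these illustrative weights a single red message reaches the threshold on its own, while yellow and orange messages trigger an alarm only when several of them arrive close together, which mirrors the two kinds of intervention reported for the week under analysis.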
