**2. Industrial machine tool data**

In the following, the term "machine data" denotes an information message emitted by any "data source" associated with a machine tool (for example a sensor), concerning an event that occurred during the activity of the machine itself. Machine data include the log files emitted by the machine itself as well as all data emitted by external entities, such as event sensors and sensors for speed, temperature, acceleration and so on.

In order to reconstruct the "history" of the messages in detail, machine tool data must be raw and immutable (as opposed to the structured and aggregated data of classical relational databases). Data is never deleted or updated (except in very rare cases, for example to comply with regulations); it is only added. The main disadvantage of this approach is that the stored data tends to become very large, and from this point of view we can talk about big data. To avoid misinterpretation, the data are not stored in a free format but are "semantically normalized", that is, remodeled in a standard format, even if not strictly structured.
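The append-only policy described above can be illustrated with a minimal Python sketch. The names `RawMessage` and `AppendOnlyStore`, the field layout and the sample values are assumptions made for illustration only; they are not taken from the system presented in this chapter.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class RawMessage:
    """One raw machine message; frozen, so it cannot be modified once created."""
    timestamp: str   # assumed to be ISO 8601 text
    machine_id: str
    source_id: str
    text: str


class AppendOnlyStore:
    """Hypothetical in-memory store: messages can be added and read, nothing else."""

    def __init__(self) -> None:
        self._messages: List[RawMessage] = []

    def append(self, message: RawMessage) -> None:
        # The only write operation: no update or delete method is offered.
        self._messages.append(message)

    def __len__(self) -> int:
        return len(self._messages)

    def __iter__(self) -> Iterator[RawMessage]:
        return iter(self._messages)


# Invented sample data for machine M and Sensor Y.
store = AppendOnlyStore()
store.append(RawMessage("2021-01-15T08:30:00", "M", "Sensor Y", "temperature=71.5"))
store.append(RawMessage("2021-01-15T08:30:05", "M", "Sensor Y", "temperature=71.9"))
```

Because the store only grows, its size increases with every emitted message, which is the storage cost mentioned above.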

Machine tool data are represented through the "fact-based model" (see **Figure 1**): a graph in which each message corresponds to a single fact. In this graph, each node corresponds to a machine tool data source entity (e.g. Sensor Y on machine M), and each arc represents either information about an entity (dashed line) or a relationship between two entities (continuous line). Each message is identified by its timestamp together with the identifier of the machine (M) and the identifier of the entity that emits the message (Sensor Y).
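One possible way to encode such a fact graph is sketched below in Python. This is only an illustration under stated assumptions: the class names (`FactKey`, `PropertyFact`, `RelationFact`, `FactGraph`), the relation name `installed_on` and the sample values are hypothetical and do not come from the chapter.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass(frozen=True)
class FactKey:
    """Identity of a message: timestamp, machine and emitting entity."""
    timestamp: str
    machine_id: str
    source_id: str


@dataclass(frozen=True)
class PropertyFact:
    """Information about a single entity (a dashed arc in the figure)."""
    key: FactKey
    entity: str
    name: str
    value: str


@dataclass(frozen=True)
class RelationFact:
    """Relationship between two entities (a continuous arc in the figure)."""
    key: FactKey
    subject: str
    relation: str
    target: str


Fact = Union[PropertyFact, RelationFact]


class FactGraph:
    """Hypothetical container holding one fact per message key."""

    def __init__(self) -> None:
        self.facts: Dict[FactKey, Fact] = {}

    def add(self, fact: Fact) -> None:
        if fact.key in self.facts:
            raise ValueError(f"a fact with key {fact.key} already exists")
        self.facts[fact.key] = fact

    def entities(self) -> Set[str]:
        """Return the nodes of the graph, i.e. every entity named by a fact."""
        nodes: Set[str] = set()
        for fact in self.facts.values():
            if isinstance(fact, PropertyFact):
                nodes.add(fact.entity)
            else:
                nodes.update((fact.subject, fact.target))
        return nodes


# Invented sample facts for Sensor Y on machine M.
graph = FactGraph()
graph.add(PropertyFact(FactKey("2021-01-15T08:30:00", "M", "Sensor Y"),
                       "Sensor Y", "temperature", "71.5"))
graph.add(RelationFact(FactKey("2021-01-15T08:29:00", "M", "Sensor Y"),
                       "Sensor Y", "installed_on", "M"))
```

Using the triple (timestamp, machine, entity) as the dictionary key mirrors the identification rule stated above and makes an accidental second fact for the same message fail loudly.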

*Text Mining for Industrial Machine Predictive Maintenance with Multiple Data Sources DOI: http://dx.doi.org/10.5772/intechopen.96575*

check, or after a certain amount of work, the state of use of the machine, without any signs of behavioral anomalies having actually occurred. According to [8], if the maintenance strategy only involves interventions that react to failures, the maintenance costs are relatively low but the losses could be high. If preventive and predictive maintenance is introduced, maintenance costs increase: for example, some activities must be carried out using overtime, detectors for predictive maintenance are introduced, and time is dedicated to training activities for operators and maintenance workers. Some clues for defining the algorithms on which the cyber-physical system presented here is based derive from research in the field of computational linguistics, which has appeared in papers presented at various important international conferences [9–14].

*Advances in Dynamical Systems Theory, Models, Algorithms and Applications*

Here we present a discrete dynamic system, based on events represented as textual messages to which an alert level has been associated and whose data structure is a graph. The system is dynamic because the data structure adjusts itself, without additional computational costs, when the machine issues a new message that has never appeared before and is not yet included in the set of known messages. The new message must in any case be validated by a human expert, who must associate the message with an adequate alert level.

This paper is structured as follows: in section 2 we present the main characteristics of the data emitted by the data sources associated with an industrial machine tool, and how they are represented in the system. In section 3 we present the system. In particular, in section 3.1 we present the model for a machine tool, in section 3.2 we give an overview of the system, while in section 3.3 and in section 3.4 we present the two main phases of the system: pre-processing, to be performed only once, and runtime; in section 3.5 and in section 3.6 we present the main algorithms and their theoretical performance; in section 4 we report a brief summary of the results of a prototype of the system applied to a simple, but real, case study. The paper ends in section 5 with some conclusions.

Physically, we assume that each source message emitted by a machine tool is in a semi-structured text format (a sort of simplified JSON): a sequence of text fields separated by a field terminator. This provides simplicity and flexibility with the greatest possible space savings. Anything can be stored within the main dataset, as long as every record carries the same information, in the same order and with the same data type format; otherwise the source messages must be pre-processed before they are stored, and the data that do not conform to the expected format are moved to a separate archive.
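The pre-processing step described above can be sketched as follows. This is a minimal Python illustration, not the implementation used in the system: the terminator character `;`, the four-field layout in `SCHEMA`, the function names and the sample lines are all assumptions.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

FIELD_TERMINATOR = ";"  # assumed terminator; the chapter does not name one

# Assumed record layout: timestamp, machine id, source id, numeric value.
SCHEMA: Sequence[Callable[[str], object]] = (str, str, str, float)


def parse_message(line: str) -> Tuple[object, ...]:
    """Split one source message into typed fields or raise ValueError."""
    fields = line.strip().split(FIELD_TERMINATOR)
    # A terminator after the last field leaves one trailing empty string.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    # float() raises ValueError when a field has the wrong data type.
    return tuple(convert(field) for convert, field in zip(SCHEMA, fields))


def preprocess(lines: Iterable[str]) -> Tuple[List[Tuple[object, ...]], List[str]]:
    """Route conforming messages to the dataset and the rest to a separate archive."""
    dataset: List[Tuple[object, ...]] = []
    rejected: List[str] = []
    for line in lines:
        try:
            dataset.append(parse_message(line))
        except ValueError:
            rejected.append(line)
    return dataset, rejected


# Invented sample messages: one valid, one with a wrong type, one with a missing field.
lines = [
    "2021-01-15T08:30:00;M;Sensor Y;71.5;",
    "2021-01-15T08:30:05;M;Sensor Y;n/a;",
    "2021-01-15T08:30:10;M;71.9;",
]
dataset, rejected = preprocess(lines)
```

Rejected lines are kept verbatim rather than repaired, so the separate archive still holds the raw message exactly as the source emitted it.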
