*Text Mining for Industrial Machine Predictive Maintenance with Multiple Data Sources DOI: http://dx.doi.org/10.5772/intechopen.96575*

These data must be interpreted, choosing only the pertinent ones; finally, the data must be semantically normalized, to adapt them to a common data semantics.

In the next step of the pre-processing phase, the system associates an alert level, based on a chromatic scale, with each of the messages coming from the previous step. The levels are as follows:

**White** Level (no problem): no anomalies have been detected and the system is operating normally.

**Yellow** Level (warning): sporadic, but not serious, anomalies have occurred and the system is able to continue without problems.

**Orange** Level (serious): some serious anomalies, or some clusters of anomalies, have been found in the system, which could affect future work.

**Red** Level (very serious): very serious anomalies have been found in the system, capable of affecting production activity in the very near future.

**Black** Level (immediate stop): the system runs the immediate risk of irreparable equipment or product failures.
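The chromatic scale above is naturally an ordered enumeration. A minimal sketch in Python (the class and function names are illustrative, not from the chapter's Delphi implementation):

```python
from enum import IntEnum

class AlertLevel(IntEnum):
    """Chromatic alert scale; higher values are more severe."""
    WHITE = 0   # no problem
    YELLOW = 1  # warning: sporadic, non-serious anomalies
    ORANGE = 2  # serious: clusters of serious anomalies
    RED = 3     # very serious: production at risk in the near future
    BLACK = 4   # immediate stop: risk of irreparable failures

def needs_attention(level: AlertLevel) -> bool:
    """Threshold check, e.g. 'at least yellow', made trivial by the ordering."""
    return level >= AlertLevel.YELLOW
```

Using an ordered enumeration keeps threshold rules such as "compose only messages with at least a yellow alert level" to a single comparison.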

In the third step of the pre-processing phase, carried out only if several data sources are associated with the machine tool, composite messages are created by composing the messages that have at least a yellow alert level. The order in which messages are juxtaposed is predefined and must be the same as the order in which they will appear at runtime. A general alert level is then associated with each composite message.

Finally, a digital data structure is built containing all composite messages: a text dictionary, in which each line contains a composite message and its general alert level.
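Steps 3 and 4 can be sketched as follows, assuming each source message carries its own alert level and sources are kept in the predefined order. Taking the general alert level as the maximum of the component levels is one plausible aggregation rule; the chapter does not prescribe it.

```python
def compose(messages):
    """messages: list of (source_id, text, level) in the predefined source
    order, with levels 0=white .. 4=black. Keeps only messages with at
    least a yellow (>= 1) alert level; returns (composite_text, general_level)
    or None if nothing qualifies."""
    parts = [(sid, txt, lvl) for sid, txt, lvl in messages if lvl >= 1]
    if not parts:
        return None
    composite = " ".join(f"{sid} {txt}" for sid, txt, _ in parts)
    general = max(lvl for _, _, lvl in parts)  # assumed aggregation rule
    return composite, general

def build_dictionary(all_composites):
    """Text dictionary: one line per composite message and its level."""
    return "\n".join(f"{text}\t{level}" for text, level in all_composites)
```

The tab-separated line format is only illustrative; any delimiter that cannot occur inside a message would serve.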

Steps 1 and 2 must be performed for each of the data source entities associated with a specific machine tool. Since identifying alert levels requires domain knowledge, steps 2 and 3 need the help of a machine tool expert.

The whole data design process is reported in **Figure 3**.

### **3.4 Runtime phase**

The results of the analysis are sent through the network to the factory control system. Here, the feedback loop is closed, and feedback actions can be taken when necessary.

*Advances in Dynamical Systems Theory, Models, Algorithms and Applications*

The generic sensor is represented by its two constituent parts: the transducer (Ti) and the control electronics (CEi); this distinction is useful because, even if the sensor is a unified whole, the transducer and the control electronics can be placed in different areas of the machine tool. For example, the accelerometer transducer can be placed on the spindle while its control electronics is usually placed within the cabinet, together with other control subsystems.

The event-based dynamic cyber-physical system proposed here integrates physical devices with advanced Text Mining analytical technologies and is very general, therefore adaptable to any production domain. It needs the ontology of the messages emitted by the data sources of a machine tool during its normal operation and is able to intercept its (even slightly) anomalous behaviors, which makes it possible to evaluate whether the machine is moving towards a failure state severe enough to require the activation of safeguard procedures.

The system consists of a design pre-processing phase, which creates the main message ontology and is performed only once for each data source (for example, a sensor), and an algorithmic runtime phase. Each message corresponds to an event of the data source, has the form of a text message, and has an adequate alert level associated with it.
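The notion of a message as a textual event with an associated alert level can be captured by a small record type; a hypothetical sketch (field names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceMessage:
    """An event emitted by a data source, as described in the text:
    a textual message with an associated alert level (0=white .. 4=black)."""
    source_id: str
    text: str
    alert_level: int
```

Freezing the record reflects that a message, once emitted, is an immutable event.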

The pre-processing phase is performed only once for each data source and allows the initial ontology of the messages to be created. It consists of four main steps. In the first step of the pre-processing phase, for each data source the set of all messages that it can emit is identified and normalized. In particular, for each industrial machine and for each data source associated with that machine, the possible types of messages that can be issued must be identified; these data must then be interpreted and semantically normalized.


**Figure 2.** *A simple model for machine tool data collection.*


**3.2 System functionality**





**3.3 Pre-processing design phase**

**Figure 3.** *Data design process.*

In the runtime phase, the real data coming from a machine tool are collected and normalized according to the specifications identified in the pre-processing phase. Then, all messages with a compatible timestamp are aggregated into a line of text to form a single composite message that has the same layout as the composite messages in the dictionary. The layout of a composite message from a machine tool with *n* data sources is as follows:

The analysis produces a classification of the health of the machine. The system warns if the analysis of the message cluster indicates a possible future malfunction of the machine tool. The system can also provide an immediate response if a single alert is very critical.


The pre-processing step (i.e., the construction of the finite automaton FA) requires time linearly proportional to the sum of the lengths of all the messages present in the dictionary, i.e., the total number of characters in the dictionary.

The matching algorithm for a set of textual log messages with a total length of *k* characters requires *n* state transitions (*n* ≤ 2*k*). Therefore, the analysis of a message takes linear time with respect to the number of characters, and this is a lower bound for any algorithm that reads the message character by character, since all characters of the message must be read.

Algorithms with an approach different from the one proposed here should refer

Therefore, the algorithms used here are independent of the size of the dictionary, which can be as large as we want without worsening the search time, while the classical algorithms are dependent on the size of the dictionary: the larger it is, the more time it takes to search for a message within it.
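For contrast, the classical dictionary-dependent search can be sketched as a sorted list with binary search, whose lookup cost grows with the dictionary size *n* (the example entries are hypothetical):

```python
import bisect

# Classical approach (for contrast): the dictionary is kept sorted,
# each lookup costs O(log n) comparisons, and each insertion must
# preserve the order, moving entries around in memory.
dictionary = sorted(["axis drift", "spindle overheat", "tool wear"])

def classical_lookup(message: str) -> bool:
    i = bisect.bisect_left(dictionary, message)
    return i < len(dictionary) and dictionary[i] == message

def classical_insert(message: str) -> None:
    bisect.insort(dictionary, message)  # O(n) element movement in the worst case
```

The automaton-based approach described in the chapter avoids both the per-lookup log *n* comparisons and the reordering cost on updates.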

The differences are even greater when we admit the possibility of editing the dictionary. With our approach, the dictionary does not need to be sorted, so adding, changing, or deleting one or more dictionary entries is done at virtually no computational cost. With the classical approach, on the other hand, the dictionary must be kept ordered: in case of deletion, insertion, or modification of even a single entry, the dictionary must be reordered, which costs at least *O*(log *n*) operations, possibly with physical movement of the entries from one memory area to another. Another advantage of our approach is of a technical nature: the algorithms are all executed in central memory, while a classical method largely uses secondary memories (which are much slower by several orders of magnitude).

In addition, each log message (dictionary entry) is made up of (many) words and other non-alphabetic symbols, which means that the dictionary size can be very large; furthermore, classical search algorithms need the messages to be well delimited and not superimposed, while our approach is able to identify even totally overlapping or non-delimited messages, including messages present within plain-text sentences.

We have also implemented a prototype of the cyber-physical system presented in this paper. In this section we report the results of some preliminary tests conducted on data coming from a machine tool currently operating in an important company in Southern Italy, which produces metal molds for other national and international companies.

In particular, we analyzed a log file relating to a period of 194 hours of continuous work of a machine tool, from 4:25 on 13/Feb/2019 to 21:34 on 20/Feb/2019, in which it issued approximately 300,000 log messages.

In the preliminary testing phase, we used data from a single data source of a single machine tool, and therefore considered simple, not composite, messages, because our initial interest was to verify the feasibility of the algorithmic message search in the ontology.

to the method that considers a whole input message as a single entity. These methods look for a message in the set of all possible messages (the dictionary). From the literature [15], we know that no algorithm in that class can use fewer than *O*(log *n*) steps to look up a single message in a dictionary of *n* messages.

**3.6 Performance analysis of algorithms**




**4. A simple case study**






Machine ID + Timestamp. + DS*1* ID + DS*1* Message. + DS*2* ID + DS*2* Message. + ... + DS*n* ID + DS*n* Message.
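The layout above can be rendered at runtime by straightforward string concatenation; a sketch with hypothetical field values:

```python
def format_composite(machine_id, timestamp, source_messages):
    """Builds the runtime composite message in the layout described in the
    text: Machine ID + Timestamp + (DS ID + DS Message) for each source.
    source_messages: list of (ds_id, ds_message) in the predefined order."""
    parts = [machine_id, timestamp]
    for ds_id, ds_message in source_messages:
        parts += [ds_id, ds_message]
    return " ".join(parts)
```

Because the source order is fixed at design time, the same inputs always produce the same composite line, which is what makes dictionary lookup possible.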

The algorithms look up this composite message in the dictionary of all composite messages and extract its global alert level. They identify all occurrences of all messages in the text, even if they overlap (partially or completely).
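The chapter does not spell out its FA-based matcher, but a dictionary-matching automaton in the Aho-Corasick style has exactly the stated properties: construction linear in the total dictionary length, a single linear scan of the text, and reporting of all occurrences, overlaps included. A compact, illustrative sketch:

```python
from collections import deque

def build_automaton(patterns):
    """Dictionary-matching automaton (Aho-Corasick style).
    Build time is linear in the total length of the patterns."""
    # Trie as parallel lists: goto transitions, failure links, output sets.
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    # Breadth-first computation of failure links (root children fail to root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out

def find_all(text, automaton):
    """Single left-to-right scan; yields (start_index, pattern) for every
    occurrence, overlapping matches included."""
    goto, fail, out = automaton
    s = 0
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for p in out[s]:
            yield (i - len(p) + 1, p)
```

Note how the scan finds "she", "he", and "hers" in "ushers" even though they overlap: the failure links carry shared suffixes across patterns.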

At this point the system analyzes the alert levels found, their frequency and, possibly, their relative positioning both in time and in space, and predicts possible future malfunctions.

The system implements different types of analysis algorithms. The simplest algorithm catalogs the entire cluster of messages and identifies the warning levels that appear most frequently. A more sophisticated algorithm analyzes smaller homogeneous clusters of these composite messages. Another algorithm identifies the presence of sub-clusters, even small ones, which indicate a possible future malfunction of the machine.
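The simplest of these analyses, cataloguing which warning levels appear most frequently in a cluster, can be sketched with a counter; the thresholding rule shown is illustrative, not the chapter's:

```python
from collections import Counter

def most_frequent_levels(levels):
    """Catalog a cluster of alert levels, most frequent first.
    levels: iterable of ints (0=white .. 4=black)."""
    return Counter(levels).most_common()

def warn(levels, serious=2, threshold=0.2):
    """Illustrative rule: warn if serious-or-worse levels exceed a share."""
    levels = list(levels)
    bad = sum(1 for lvl in levels if lvl >= serious)
    return bool(levels) and bad / len(levels) >= threshold
```

The more sophisticated sub-cluster analyses mentioned above would operate on windows of this stream rather than on the whole cluster at once.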

The software is written in Delphi (a visual development environment), is highly parameterized to allow quick modification, and is composed of about 2,000 lines of code structured in 30 operating modules, with different methods for managing the interface; it uses a large number of predefined libraries.
