*Advances in Dynamical Systems Theory, Models, Algorithms and Applications*

form a single composite message that has the same layout as the composite messages in the dictionary. The layout of a composite message from a machine tool and *n* Data Sources is as follows:

Machine ID + Timestamp + DS1 ID + DS1 Message + DS2 ID + DS2 Message + ... + DS*n* ID + DS*n* Message

The algorithms look up this compound message in the dictionary of all compound messages and extract its global alert level. They identify all occurrences of all messages in the text, even if they are overlapped (partially or completely).

At this point the system analyzes the alert levels found, their frequency and, possibly, their relative positioning both in time and in space, and predicts possible future malfunctions.

The system implements different types of analysis algorithms. The simplest algorithm catalogs the entire cluster of messages and identifies the warning levels that appear most frequently. A more sophisticated algorithm analyzes smaller homogeneous clusters of these composite messages. Another algorithm identifies the presence of sub-clusters, even small ones, which indicate a possible future malfunction of the machine.

The software is written in Delphi (a visual development environment), is highly parameterized (to allow quick modification) and is composed of about 2,000 lines of code structured in 30 operating modules; it provides different methods for managing the interface and uses a large number of predefined libraries.

At runtime the system is fast and accurate in identifying possible anomalous situations of a machine tool. It consists of a pre-processing step (performed only once, when the program starts), whose purpose is to build the entire data structure (the dictionary of all messages with their relative alert levels), and an actual processing step.

The system is independent of the dictionary size. It reads the input text, consisting of one or more messages, one character at a time and advances through a finite automaton; when it reaches a final state (corresponding to the end of a message), it emits the alert level associated with that message. The number of state transitions is proportional to the number of characters read in input.

**3.5 Detailed algorithms description**

The steps are:

**Pre-processing**. The system builds the data structure that the algorithms will use when running. Pre-processing only needs to be done once, at startup. In particular, the system constructs, in central memory, a finite state automaton from all the elements of the dictionary; its behavior is completely determined by a (small) set of states and by some simple functions.

**Matching**. The finite state automaton continuously reads the messages from the input log file, character by character. When it reaches a final state, the automaton emits the alert level of the message corresponding to that state. The algorithm identifies all occurrences of all messages, even if they are partially or totally overlapping. Lines with no messages (i.e. with "blank" content) are ignored.

**Analysis**. In this step, the system analyzes the set of alert levels identified in the previous phase. Normally it analyzes a group of alarms, usually those issued in a given period; then, based on the frequency and "severity" of the alerts extracted, it makes its predictions about possible future malfunctions.

**3.6 Performance analysis of algorithms**

The pre-processing step (i.e. the construction of the FA) requires time linearly proportional to the sum of the lengths of all the messages present in the dictionary, i.e. to the total number of characters in the dictionary.

The matching algorithm for a set of textual log messages with a total length of *k* characters requires *n* state transitions, with *n* ≤ 2*k*. The analysis of a message therefore takes linear time with respect to the number of characters, and this is also a lower bound for any algorithm that reads the message character by character, since all characters of the message must be read.
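The chapter does not name the underlying construction, but the behavior it describes (a finite automaton built once from the dictionary, a character-by-character scan, and detection of overlapping messages) matches the classic Aho–Corasick scheme. Since the original software is written in Delphi, the following is only a minimal Python sketch under that assumption; the message texts and alert levels are invented for illustration.

```python
from collections import deque

class MessageAutomaton:
    """Finite automaton over a dictionary of messages (Aho-Corasick style).

    Building is linear in the total number of characters of the dictionary;
    matching is linear in the length of the input log, independent of the
    number of dictionary entries.
    """

    def __init__(self, dictionary):
        # dictionary: {message_text: alert_level} -- illustrative layout
        self.goto = [{}]   # per-state transition table: {char: next_state}
        self.fail = [0]    # failure links (longest proper suffix state)
        self.out = [[]]    # (message, alert_level) pairs emitted per state
        for message, level in dictionary.items():
            self._add(message, level)
        self._build_failure_links()

    def _add(self, message, level):
        # One new state at most per character: linear in the entry length.
        state = 0
        for ch in message:
            if ch not in self.goto[state]:
                self.goto[state][ch] = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = self.goto[state][ch]
        self.out[state].append((message, level))  # final state for this entry

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep fail = root (0).
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                # Inherit outputs so overlapped messages are also reported.
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def matches(self, text):
        """Yield (start_position, message, alert_level) for every occurrence,
        including partially or totally overlapping messages."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for message, level in self.out[state]:
                yield (i - len(message) + 1, message, level)

alerts = {"coolant low": "WARNING", "low": "INFO", "spindle stall": "CRITICAL"}
automaton = MessageAutomaton(alerts)
found = list(automaton.matches("status: coolant low; spindle stall"))
# "low" is totally contained in "coolant low" -- both are reported
```

Note that each input character causes one forward transition plus a bounded number of failure-link moves, which is where the *n* ≤ 2*k* bound on state transitions comes from.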

Algorithms based on an approach different from the one proposed here typically treat a whole input message as a single entity and search for it in the set of all possible messages (the dictionary). From the literature [15], we know that no algorithm in that class can use fewer than *O*(log *n*) steps to look up a single message in a dictionary of *n* messages.

Therefore, the algorithms used here are independent of the size of the dictionary, which can grow as large as we want without worsening the search time, while the classical algorithms depend on the size of the dictionary: the larger it is, the more time it takes to search for a message within it.
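The classical whole-message lookup can be illustrated with a binary search over a sorted dictionary; the entries below are made up, and the point is only that the comparison count grows logarithmically with the dictionary size, whereas the automaton's scan time does not depend on it at all.

```python
import bisect

# A toy sorted dictionary of whole messages (the classical approach).
dictionary = sorted(f"message-{i:05d}" for i in range(1024))

def lookup(msg):
    # Binary search: about log2(n) comparisons for n dictionary entries,
    # so lookup time grows as the dictionary grows.
    i = bisect.bisect_left(dictionary, msg)
    return i < len(dictionary) and dictionary[i] == msg
```

For example, `lookup("message-00512")` succeeds after roughly ten comparisons on these 1,024 entries; doubling the dictionary adds about one more comparison per lookup.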

The differences are even greater when the dictionary can be edited. With our approach, the dictionary does not need to be sorted, so adding, changing, or deleting one or more dictionary entries has virtually no computational cost. With the classical approach, on the other hand, the dictionary must be kept sorted: deleting, inserting, or modifying even a single entry means reordering the dictionary, which costs at least *O*(log *n*) operations, possibly with physical movement of the entries from one memory area to another.
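The editing asymmetry can be seen in a toy comparison (the message texts are invented): keeping a sorted dictionary means every insertion shifts the entries that follow, while an unordered dictionary accepts a new entry without touching the others.

```python
import bisect

# Classical approach: the dictionary must stay sorted, so every edit pays
# a binary search plus physical movement of the entries that follow.
sorted_dict = ["coolant low", "spindle stall", "tool wear high"]
bisect.insort(sorted_dict, "axis overload")   # shifts all three entries right

# Our approach: entries need no particular order, so an edit leaves the
# rest of the dictionary untouched.
unordered_dict = {"coolant low", "spindle stall", "tool wear high"}
unordered_dict.add("axis overload")           # no reordering, no movement
```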

Another advantage of our approach is of a technical nature: the algorithms run entirely in central memory, while a classical method makes heavy use of secondary memory (which is slower by several orders of magnitude).

In addition, each log message (dictionary entry) is made up of (many) words and other non-alphabetic symbols, so the dictionary size can be very large. Furthermore, the classical search algorithms need the messages to be well delimited and not superimposed, while our approach is able to identify even totally overlapping or non-delimited messages, including messages embedded within plain-text sentences.
