**Chapter 2** Pre-Informing Methods for ANNs

*Mustafa Turker*

## **Abstract**

In the recent past, when computers first entered our lives, we could not even imagine what today would look like. If we look toward the future with the same perspective today, only one assumption can be made about where technology will go in the near future: artificial intelligence applications will become an indispensable part of our lives. While today's work is promising, there is still a long way to go. The structures that researchers today call artificial intelligence are in fact programs with defined limits, and they are result-oriented. Real learning involves many complex features such as convergence, association, inference and prediction. This chapter demonstrates with an application how the input-layer connections of human neurons can be transferred to an artificial learning network with the pre-informing method. When the results are compared, the learning load (weights) was reduced from 147 to 9 with the proposed pre-informing method, and the learning rate increased by 15–30% depending on the activation function used.

**Keywords:** ANN, pre-informing, AHP, modified networks, interfered networks

## **1. Introduction**

The learning mechanism makes human beings superior to all other creatures. Despite the fact that today's computers have much more processing power, the human brain is still much more efficient than any computer or any artificially developed intelligence.

Building a perfect learning network requires more than just cell structures and their weights. The human brain has a very complex network, and each brain is unique. Today's technology is not enough to explain all the details of how our brain works. My observation of how our brain works starts with defining items. Every item has a key cell in our brain. The defining process is done through visuals, smell, touch, the item's linguistic name, and the sound it makes. If a key cell matches any of this information coming from the body's inputs, thinking and learning continue; if no key cell was defined before, a new cell is assigned to the item. Then your brain wants to explore the item's behavior: you take it in your hand and begin physical observation. When physical observation is satisfied, your brain starts to categorize it. After categorization, your brain checks other items in the same category and determines what other information can be learned. Whenever you see that someone has more knowledge than you, you want to talk about this newly learned item, or you want to do research on it. The key cell thus develops itself with the explored information. Each key cell and its network can also connect to any part of another, if logical connections exist.

Today's artificial intelligence studies are rather simple compared to reality. Mathematical modeling of learning in an artificial cell and solving the problem with an optimization mechanism has brought success in most areas. However, this success is due to the fast processing capacity of computers rather than to a perfect model of machine learning. Researchers therefore need to work on developing artificial neural networks that come closer to real learning.

In this study, the pre-informing method and its rules for artificial neural networks are explained with an example, in order to establish a more conscious and effective learning network instead of searching for relationships in random connections.

## **2. ANN structure**

In the literature of ANN design, the first principles were introduced in the middle of the 20th century [1, 2]. Over the following years, network structures such as Perceptron, Artron, Adaline, Madaline, Back-Propagation, Hopfield Network, Counter-Propagation Network, Lamstar were developed [3–10].

Most network configurations imitate the complex behavior of our brain artificially through layers. Basically, an artificial neural network has 3 types of layer group: an input layer, hidden layers, and an output layer (see **Figure 1**), and all cells in these layers are connected to each other with artificial weights [1, 2].

### *Pre-Informing Methods for ANNs DOI: http://dx.doi.org/10.5772/intechopen.106906*

The input layer is the cluster of cells presenting the data that influence learning. Each cell represents a parameter with a variable data value. These values are scaled according to the limits of the activation function used in the next layers. The selection of input parameters requires knowledge of and experience in the subject for which artificial intelligence is to be created. In fact, this process is exactly the transfer of the natural neuron's input parameters from our brain to paper. However, this is not so easy, because a learning activity in our brain is connected by a huge number of networks managed subconsciously. To illustrate: our minds sometimes make inferences even on subjects we have no knowledge of, and we can make correct predictions about them. In some cases, we feel the result of an event but cannot explain it. The best example of this is falling in love: no one can tell why you fall in love with a person; it happens, and only afterwards do you look for the reason. This is evidence that the subconscious plays a major role in learning. It also means that there may be input parameters we have not noticed. Therefore, it is necessary to focus on this layer and define the input parameters carefully.

The hidden layer(s) are where the data from the input parameters are interpreted and where the learning capability of the network is defined. Each cell in these layers transforms the data from the input-layer cells, or from cells in a previous hidden layer, with the defined activation function and sends the result to all cells in the next layer. Learning of nonlinear behavior takes place in this layer. Increasing the number of layers and cells in this group does not always help; beyond a point it produces memorization, not learning. It also increases the number of connections and thus greatly increases the amount of training data required to determine the weight values of these connections.

In general, the basic mechanism of an artificial neuron consists of two steps: summation and activation [1]. Summation is the process of summing the intensities of incoming connections. Activation, on the other hand, is the process of transforming the collected signals according to the defined function (See **Figure 2**).

There are many activation functions. The purpose of these functions is to emulate linear or non-linear behavior. The sigmoid function is one of the most commonly used activation functions.

Mathematically, the summation and activation process of an artificial neuron is expressed as below (See Eqs. (1) and (2)).

$$u = \sum\_{i=1}^{m} x\_i w\_i - \theta \tag{1}$$

$$y = f(u) \tag{2}$$

In these equations, $u$ is the weighted sum of the cell, $x\_i$ are the input values, $w\_i$ are the connection weights, $\theta$ is the threshold (bias) of the cell, $m$ is the number of incoming connections, $f$ is the activation function, and $y$ is the output of the cell.


**Figure 2.** *Artificial neuron structure.*
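As a minimal sketch of Eqs. (1) and (2), a single artificial neuron can be written in a few lines of Python; the input values, weights and threshold below are illustrative, and the sigmoid is chosen as the activation function:

```python
import math

def neuron_output(x, w, theta):
    """Single artificial neuron: summation (Eq. (1)) followed by activation (Eq. (2))."""
    u = sum(xi * wi for xi, wi in zip(x, w)) - theta  # weighted sum minus threshold
    return 1.0 / (1.0 + math.exp(-u))                 # sigmoid maps u into (0, 1)

# Illustrative call: three inputs, three weights, threshold 0.3
y = neuron_output([0.5, 0.2, 0.8], [0.4, 0.1, 0.7], theta=0.3)
```

Here the weighted sum is $u = 0.48$, which the sigmoid squashes to roughly 0.62; swapping the return line exchanges the sigmoid for any other activation function.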


In some cases, the learning network cannot find a logical connection between the results and the inputs; so that this does not stop learning, a bias value can be used for each cell. A high bias coefficient means that learning is low and memorization is high.

The output layer is the last layer of the network and receives inputs from the last set of cells in the hidden layer. In this layer, the data are collected and, as the result, the output data are exported in the planned format.

The learning process of a network built from input, hidden and output layers is actually an optimization problem. The connection values between the cells of the network converge toward the result according to the optimization technique used. A training set consisting of a certain number of input and output data is used for this purpose. If desired, a certain portion of the data set is also used for testing to measure the consistency of the network. When learning is complete, the values of the weights are fixed and the network becomes serviceable. If desired, the mathematical equation of the network can be derived by following the cells from back to front.
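This optimization view can be sketched as a plain gradient-descent loop for a single sigmoid neuron; the toy training set, learning rate and epoch count below are hypothetical, chosen only to illustrate how the weights converge against input–output pairs:

```python
import math

# Hypothetical training set: two input parameters -> one target output in (0, 1)
data = [([0.0, 1.0], 0.2), ([1.0, 0.0], 0.8), ([1.0, 1.0], 0.6)]
w, theta, lr = [0.1, 0.1], 0.0, 0.5  # initial weights, threshold, learning rate

def predict(x):
    u = sum(xi * wi for xi, wi in zip(x, w)) - theta  # Eq. (1)
    return 1.0 / (1.0 + math.exp(-u))                 # Eq. (2), sigmoid

for epoch in range(2000):
    for x, t in data:
        y = predict(x)
        delta = (y - t) * y * (1.0 - y)  # gradient of squared error w.r.t. u
        w = [wi - lr * delta * xi for xi, wi in zip(x, w)]
        theta += lr * delta              # u falls as theta grows, hence the plus sign
```

After training, the weight values are fixed and `predict` becomes the serviceable network.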

## **3. Pre-informing of ANNs**

Pre-informing, unlike pre-training, is the processing of certain information or rules into the structure of the network. In reality, a person learns under certain prejudices. These prejudices are a mechanism that allows us to make predictions about an event before it occurs, drawing inferences from similar past events. With these prejudices, the amount of training data required for learning decreases considerably. As a result, you have a clean and efficient way of learning.

For example, when a child goes out for the first time, his mother advises him never to talk to strangers, so the child guesses that if he talks to a stranger the result may be bad. In this case, the people to talk to are the input parameters, and the possibility of something bad happening as a result of the conversation is the output parameter. If the mother had not advised her child, the child would talk to everyone and eventually learn that talking to a stranger is bad and dangerous. As a result of the mother's advice, the weight of strangers among the input parameters (people to talk to) increased before the child even experienced it.

In order to transfer prejudices to artificial neural networks, some rules must be followed:


**Figure 3** describes a network with a total of 23 input parameters belonging to 3 input groups; each group is represented by two separate cells, one with hyperbolic tangent and one with sigmoid activation, forming a hidden layer of 6 cells in total, followed by an output layer.

After the network structure is established, the next step is pre-informing the network. This stage is the transfer of information from the subconscious to the network weights. It should be carried out for each group, and each group should be considered separately. The best method for this process is the AHP (Analytic Hierarchy Process) evaluation method. In AHP, each parameter is compared with the others using verbal expressions on a simple superiority scale. This means you can prepare a questionnaire and obtain the superiority information of the parameters from an expert mind. After some calculations, you will have the weights, and these weights are used directly in the network. The beauty of this technique is that a consistency analysis can be performed. In the end, if the input parameters are defined correctly, you will have an academically defensible extraction of subconscious information.

AHP is a multi-criteria decision-making (MCDM) method. The earliest reference to AHP is from 1972 [11]. Saaty [12] later described the method fully in an article published in the Journal of Mathematical Psychology. AHP makes it possible to divide a problem into a hierarchy of sub-problems that can be grasped and evaluated subjectively more easily. Subjective evaluations are converted into numerical values, and each alternative is processed and ranked on a numerical scale. A schematic AHP hierarchy is given in **Figure 4** below.

At the top of the hierarchy is the goal/purpose, while at the bottom are the alternatives. Between these two levels are the criteria and their sub-criteria. The most important feature of AHP is that it can make comparisons both locally and globally when evaluating the effect of sub-criteria at any level on the alternatives.

*Artificial Neural Networks - Recent Advances, New Perspectives and Applications*

**Figure 3.** *Pre-informed ANN structure.*


**Figure 5.** *Pairwise comparison chart of alternatives A and B. B is very inferior compared to A.*

Data corresponding to the hierarchical structure are collected from experts or decision makers through pairwise comparisons of alternatives on a qualitative scale. Experts can rate a comparison as equal, moderately strong, strong, very strong or extremely strong. A general chart, as shown in **Figure 5**, is used for expert evaluation of pairwise comparisons and data collection. This design can be customized for the purpose, method and user.

Comparisons are made for each criterion and converted to quantitative numbers according to **Table 1**.

The pairwise comparison values of the criteria are arranged in a matrix, as shown in **Table 2**.


**Table 1.** *Comparison scales and explanations.*


**Table 2.** *Pairwise comparison matrix of criteria.*


**Table 3.** *Obtaining the weights of the normalized comparison values of the criteria.*

In the next step, each $a\_{ij}$ value is normalized by dividing it by the corresponding column sum, and the weights are obtained with the equation shown in **Table 3** above.
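The column-normalization and row-averaging procedure described above takes only a few lines; the 3x3 pairwise comparison matrix below is a hypothetical example on the Saaty 1–9 scale:

```python
# Hypothetical judgments: criterion 1 vs 2 = 3, 1 vs 3 = 5, 2 vs 3 = 3;
# the lower triangle holds the reciprocal values.
A = [[1.0,   3.0,   5.0],
     [1/3.0, 1.0,   3.0],
     [1/5.0, 1/3.0, 1.0]]

n = len(A)
col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
# Normalize each a_ij by its column sum, then average across each row
weights = [sum(A[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]
```

The weights sum to 1 (here roughly 0.63, 0.26, 0.11); in practice a consistency ratio is also computed to validate the expert judgments before the weights are used.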

The connection of input parameters to the network using AHP is explained above. The next step is assigning the weights: **Figure 6** shows how the AHP weights are defined in the network.

In this way, a large number of connections are eliminated, and a fast, efficient network that needs less data is obtained.
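As a sketch of this pre-informing step, assuming hypothetical AHP weights for one input group, the group's inputs can enter a single representative cell through fixed (non-trainable) connections, so only that cell's outgoing weight remains to be learned:

```python
import math

# Hypothetical AHP weights for one group of four protective measures;
# these connections are fixed by pre-informing, not trained.
ahp_weights = [0.45, 0.30, 0.15, 0.10]

def group_cell(inputs, weights, activation=math.tanh):
    """Representative cell: the group's inputs arrive through fixed AHP
    weights instead of free connections that must each be trained."""
    u = sum(x * w for x, w in zip(inputs, weights))
    return activation(u)

# Example: measures 1, 3 and 4 were taken (1), measure 2 was not (0)
signal = group_cell([1, 0, 1, 1], ahp_weights)
```

With four inputs per group, this replaces four trainable connections per hidden cell with a single trained outgoing weight, which is how the learning load of the network drops so sharply.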

## **4. Estimation of the severity of occupational accidents with using pre-informed ANN**

The pre-informed neural network method was used by Turker [13] to predict the severity of occupational accidents in construction projects. In that study, it was estimated how accidents would end if they occurred, rather than the probability of their occurrence. The scope covered the 4 most common accident types in the world: falling from height, being hit by a thrown/falling object, structural collapse, and electrical contact. 23 measures to be taken against occupational accidents were considered in 3 groups, and these measures were associated with occupational accident severity in the artificial intelligence network (**Table 4**).

First, the defined measures against occupational accidents, which are the input parameters, were turned into a questionnaire by creating pairwise comparison questions within their own groups. Occupational health and safety experts working professionally in the sector were reached through a professional firm. The questionnaires were administered online and recorded. The survey results were then converted to weights with AHP matrices. The weights are shown in **Tables 5**–**7**.


**Figure 6.** *Connections of two input groups to three different types of representation cells and implementation of AHP weights.*


**Table 4.** *Risk reduction measures in occupational accidents.*

After obtaining the pre-informing weights, 3 different artificial neural networks were created (**Table 8**). 140 historical accident data were collected on the selected accident types within a company. These data include the precautions taken at the time of the accident and how the accident ended. Accident results were divided into 4 categories: near miss, minor injury, serious injury, death. For each accident type, 35 datasets were collected; in total, 120 datasets were used in training the network and 20 datasets in testing it.

**Table 5.** *AHP weights of collective protection measures group.*

**Table 6.** *AHP weights of personal protective equipment group.*

**Table 7.** *AHP weights of control, training, inspection group.*

Three alternative network structures were trained with the same data. As a result, the pre-informed neural network provided a learning rate 5% better on the training set and 15% better on the test set than the network without a pre-informing stage. Among the pre-informed networks, the configuration using the parabolic activation function provided a learning rate 1% better on the training set and 15% better on the test set than the configuration using the hyperbolic tangent. Configurations with other activation functions were not included in the comparisons because of their low learning rates. As a result, it was seen that the pre-informing phase significantly increases learning performance in artificial neural networks. In addition, the parabolic activation function was observed to perform better than the hyperbolic tangent in relating the prevention methods in occupational accidents to the accident outcome (**Table 9**).

**Table 8.** *3 alternative ANN structures.*


**Table 9.** *3 alternative ANN structure results.*

## **5. Conclusions**

In this study, how the learning ability of artificial neural networks can be increased with the pre-informing method was explained with rules and demonstrations. It is not possible to implement this method with the existing ready-made ANN software on the market. Instead, the ANN should be expressed mathematically, and the pre-informing method applied using programming languages or tools such as MATLAB, Excel VBA, or Python.

In this chapter, the method was demonstrated on an artificial neural network in which the precautions against occupational accidents are associated with the accident outcome, and high performance was achieved. By applying the specified rules, this method can be used to solve many problems. Future studies can investigate which other methods, besides AHP, can be used for the pre-informing phase.

## **Conflict of interest**

The authors declare no conflict of interest.

## **Author details**

Mustafa Turker
Gorkem Construction Company, Ankara, Turkiye

\*Address all correspondence to: mustafaturker@yahoo.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## **References**

[1] Graupe D. Principles of artificial neural networks. 3rd ed. In: Advanced Series in Circuits and Systems. Singapore: World Scientific Publishing Co. Pte. Ltd.; 2013. DOI: 10.1142/8868

[2] McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics. 1943;**5**(4): 115-133

[3] Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review. 1958; **65**(6):386

[4] Graupe D, Lynn J. Some aspects regarding mechanistic modelling of recognition and memory. Cybernetica. 1969;**12**(3):119

[5] Hecht-Nielsen R. Counterpropagation networks. Applied Optics. 1987;**26**(23): 4979-4984

[6] Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences. 1982;**79**(8): 2554-2558

[7] Bellman R, Kalaba R. Dynamic programming and statistical communication theory. Proceedings of the National Academy of Sciences. 1957; **43**(8):749-751

[8] Widrow B, Winter R. Neural nets for adaptive filtering and adaptive pattern recognition. Computer. 1988; **21**(3):25-39

[9] Widrow B, Hoff ME. Adaptive Switching Circuits. Stanford, CA: Stanford University; 1960

[10] Lee RJ. Generalization of learning in a machine. In: Preprints of Papers Presented at the 14th National Meeting of the Association for Computing Machinery (ACM '59). New York, NY, USA: Association for Computing Machinery; 1959. pp. 1-4. DOI: 10.1145/ 612201.612227

[11] Saaty TL. An Eigenvalue Allocation Model for Prioritization and Planning. Pennsylvania, USA: University of Pennsylvania; 1972. pp. 28-31

[12] Saaty TL. A scaling method for priorities in hierarchical structures. Journal of Mathematical Psychology. 1977;**15**(3):234-281

[13] Turker M. Estimation of the Severity of Occupational Accidents in the Building Process with Pre-informed Artificial Learning Method. Gazi: Gazi University; 2021
