**6. Management learning strategies**

*Deep Learning Applications*

$$
Q_{t+1}(s, c, a) = (1 - \alpha_t)\, Q_t(s, c, a) + \alpha_t \left[ \gamma\Delta_i + \gamma\Delta_{12} + \beta \max_{a} Q_t(s_{t+1}, c, a) \right] \tag{7}
$$

where

$\alpha_t \in [0,1]$ is the learning rate, with $(1-\alpha_t)$ weighting the previous value,

$\beta \in [0,1]$ is the discounted reward factor,

$\gamma\Delta_i$ is the monthly profit reward,

$\gamma\Delta_{12}$ is the expected cumulative profit reward (floating 12 months),

$\max_{a} Q_t(s_{t+1}, c, a)$ is the expected maximum equilibrium strategy state change pay from the best actions $a$ at competence levels $c$.

With the expected equilibrium strategy pay, $\max_{a} Q_t(s_{t+1}, c, a)$, the Q-learning points can be calibrated so that they give approximately 0 points when no actions are taken and thus no learning is achieved. In our Q-learning function this value is a monthly improvement of 221 € per employee. This corresponds to the cost of one absence day per month for each worker, or to one additional working day in work efficiency. Using this value, Q-learning gives 0 points regardless of the supervisor's skills.

**Figure 5.** *Management-game Markov sequences.*

**Figure 6.** *Management game learning phenomenon for finding equilibrium.*
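As a minimal sketch of the update in Eq. (7), assume it has the standard Q-learning form: the previous value is weighted by (1 − α_t), and the new target (monthly reward γΔ_i, plus floating 12-month reward γΔ_12, plus β times the best next value) is weighted by α_t. One step could then be written as below. The function name `q_update`, its parameter names and the numbers in the usage lines are illustrative assumptions and are not taken from the chapter; the 221 € calibration is not modeled here.

```python
from typing import Sequence


def q_update(
    q_old: float,
    monthly_reward: float,
    rolling_12m_reward: float,
    next_q_values: Sequence[float],
    alpha: float,
    beta: float,
) -> float:
    """One step of the assumed Eq. (7) update for a fixed (s, c, a).

    q_old              -- Q_t(s, c, a), the current value
    monthly_reward     -- the monthly profit reward (gamma * Delta_i)
    rolling_12m_reward -- the floating 12-month reward (gamma * Delta_12)
    next_q_values      -- Q_t(s_{t+1}, c, a') for every candidate action a'
    alpha              -- learning rate alpha_t in [0, 1]
    beta               -- discounted reward factor beta in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(next_q_values) == 0:
        raise ValueError("next_q_values must contain at least one action value")

    # Best achievable value in the next state (the max over actions in Eq. 7).
    best_next = max(next_q_values)

    # New information: both reward terms plus the discounted best next value.
    target = monthly_reward + rolling_12m_reward + beta * best_next

    # Blend the old estimate and the new target with the learning rate.
    return (1.0 - alpha) * q_old + alpha * target


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example = q_update(
    q_old=0.0,
    monthly_reward=10.0,
    rolling_12m_reward=20.0,
    next_q_values=[5.0, 8.0],
    alpha=0.5,
    beta=0.5,
)
print(example)
```

With α_t = 0 the function returns the old value unchanged (no learning), and with α_t = 1 it returns the new target only, which matches the role the text assigns to the learning rate.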

Three strategic areas of prior beliefs form the manager's learning context. These strategies are influenced by the supervisor's interaction skills (competences), which tend either to promote or to hinder learning in each area. Every manager has personal competences, which seem to form a personal Nash equilibrium and corresponding Q-learning results. The findings of this article suggest that the Nash equilibrium differs for each combination of a manager's competences. In addition, the leader's strategic mindset defines the equilibrium. Indeed, management equilibrium seems to be an evolving phenomenon that depends on changes in the characteristics of the organization and its players (**Figure 7**).

The focus of the signal-strategy (π <sup>ô</sup> ) is to learn to understand employee signals and to use them to achieve the best reward. This strategy is strongly related to the psychological agreement between workers and the supervisor. When team members learn to play a general-sum game, signals are given early and constructively, which fosters optimal actions. If the signal-strategy turns into a zero-sum game, signals tend to be hidden or used to harm other members of the team. Thus, the best foundation for the signal-strategy is continuous fostering of the psychological agreement in the work community.

The focus of the profit-strategy (π € ) is to learn from experience how the target profit is achieved within the anticipated time span. Economic profit indicators are usually monitored constantly and therefore receive a lot of attention. In addition, the time span of the organization's profit target is set in the management system, which creates a predefined attitude towards achieving profit. From a strategic point of view, there is a big difference between focusing on the maximum result this month and aiming for the maximum profit with a delay of several months. If a management system requires maximum results over a short period, it reinforces the detrimental profit-maximization bias. Under this bias the team leader tends to push workers' performance too hard, which leads to maximizing a performance that is already declining. A manager under this bias also neglects employee signals, because spending scarce working hours on solving a reported problem puts short-term profits at risk. Clearly, this behavior damages the signal game, as employees learn that problems are not worth reporting.

The focus of the standard-strategy (π *st* ) is to learn how to plan actions in advance to secure the reward in the future. Usually this strategy comes from the organization's human resources management, which recommends implementing certain management practices according to the annual plan. In practice, the recommended plan is commonly followed to varying degrees: some managers follow it while others do not. Those who do not follow the plan are likely to have learned good reasons why the recommended measures need not be implemented. Accepted excuses may relate to a lack of time because the profit target demanded all the focus. Clearly, this behavior undermines the benefits of a good standard-strategy.

**Figure 7.** *Management learning strategies.*

All of these supervisor strategies are built on the supervisor's personal and ever-evolving managerial skills. In this game-theoretical approach to management, personal leadership action competencies determine the effect of each action. Management competencies and learning strategies interact: the supervisor reflects on the effectiveness of his or her own leadership behavior and changes personal management strategies accordingly.
