*The Digital Twin of an Organization by Utilizing Reinforcing Deep Learning DOI: http://dx.doi.org/10.5772/intechopen.96168*

*Deep Learning Applications*

organization's human resources management, which recommends the implementation of certain management practices according to the annual plan. In practice it is common that this recommended plan is followed in various ways: some managers follow the plan while others do not. Those who do not follow the plan have likely learned good reasons why the recommended measures are not implemented. Accepted excuses may be related to a lack of time, because the profit target needed all the focus. Clearly, this behavior undermines the benefits of a good standard strategy.

All of these supervisor strategies are built on the supervisor's personal and ever-evolving managerial skills. In this game-theoretical approach to management there are personal leadership action competencies that determine the effect of each action, and there is interaction between management competencies and learning strategies. The supervisor reflects on the effectiveness of his or her own leadership behavior and changes personal management strategies accordingly.

**7. Digital twin AI advisor using Bellman function**

The digital twin advisor uses the Bellman [20] expectation function to find optimal actions for achieving a Nash equilibrium. The Bellman expectation function for strategy $\pi$ is

$$v_\pi(s) = \mathbb{E}_\pi\!\left[ R_{t+1} + \beta\, v_\pi(S_{t+12}) \right] \tag{8}$$

where

$R_{t+1}$ = immediate reward
$\beta\, v_\pi(S_{t+12})$ = discounted future value (12 months estimation).

The optimal policy is formed from the actions that result in the optimal value function, thus

$$\pi^*(s) = \underset{a \in A_t}{\operatorname{argmax}}\; q^*(s, c, a) = R^*_s + \beta \max v^*(S_{t+12}) \tag{9}$$

where

$R^*_s$ = immediate state reward from strategy $\pi^*$
$\beta \max v^*(S_{t+12})$ = discounted maximum future value (12 months estimation).

In our digital twin the AI assistant uses the Bellman function. It returns the combination of actions that gives the best value over a floating 12-month horizon. This is achieved by first analyzing the value of each action and sorting the actions by value; then the combinations of the best actions are evaluated until the marginal productivity of the value is reached, see the example in **Figure 8**. One simulation episode is 12 months; thus, the Bellman function maximizes future reward even when the episode is coming to an end.

**Figure 8.**
*Bellman function principle of marginal productivity value.*
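The selection logic described above can be sketched in code. This is an illustrative reconstruction under Eq. (9), not the chapter's actual implementation: the action names, reward values, and the simple cutoff used to model marginal productivity are all hypothetical.

```python
# Illustrative sketch of Eq. (9): rank actions by immediate reward plus
# discounted 12-month value, then grow the combination until the marginal
# gain of the next action vanishes (the "marginal productivity" point).
BETA = 0.95  # assumed discount factor (not stated in the text)

def best_action_combination(actions):
    """actions: dict mapping name -> (immediate_reward, est_future_value)."""
    # Value of a single action: R + beta * v(S_t+12), as in Eq. (9).
    def value(a):
        r, v = actions[a]
        return r + BETA * v

    # 1) Sort actions by their individual value, best first.
    ranked = sorted(actions, key=value, reverse=True)

    # 2) Add actions while each one still contributes positive value.
    combo, total = [], 0.0
    for a in ranked:
        gain = value(a)
        if gain <= 0:  # marginal productivity reached; stop here
            break
        combo.append(a)
        total += gain
    return combo, total

# Hypothetical action values: (profit cost now, estimated future value).
actions = {
    "one-to-one coaching": (-2.0, 8.0),
    "team workshop":       (-3.0, 6.0),
    "ignore problem":      (0.0, -4.0),
}
combo, total = best_action_combination(actions)
print(combo)  # -> ['one-to-one coaching', 'team workshop']
```

The greedy cutoff is only a stand-in for the evaluation of action combinations that **Figure 8** illustrates; the point is that an action whose discounted future value does not cover its immediate cost is excluded.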

The simulation game is implemented in UNITY 3D to make it possible to play the learning-game episodes. Each episode is 12 months, consisting of several workplace challenges. In the test runs we used a Cash Cow episode, where problems are easy, the market situation is steady, and the company does not seek a special increase in revenue. State-space problems are signaled by workers who come to meet the team leader (the agent). So far this ODT contains 25 workplace challenges, which reduce QWL according to a situational probability matrix. The leader has 32 best management practices (the action space) that may be used as the leader prefers. Each action reduces profit and may improve QWL according to a state-specific probability function [23] (**Figure 9**).
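One simulated month under these rules might look like the following minimal sketch. The challenge magnitudes, action cost, success probability, and QWL improvement are hypothetical placeholders, not values from the chapter's model.

```python
import random

# Toy version of one ODT month: a random workplace challenge lowers QWL,
# and the leader's chosen management practice costs profit but may restore
# QWL with a probability scaled by leadership competence.
random.seed(42)  # reproducible illustration

def simulate_month(qwl, profit, action_cost, success_prob, competence):
    # A challenge arrives and reduces QWL (hypothetical magnitude range).
    qwl -= random.uniform(0.01, 0.05)
    # The chosen management practice always reduces profit...
    profit -= action_cost
    # ...and improves QWL only with a competence-scaled probability.
    if random.random() < success_prob * competence:
        qwl += 0.06  # hypothetical improvement per successful action
    return max(0.0, min(1.0, qwl)), profit

qwl, profit = 0.602, 0.0  # QWL start value from Table 1
for month in range(12):   # one episode = 12 months
    qwl, profit = simulate_month(qwl, profit,
                                 action_cost=1000.0,
                                 success_prob=0.8,
                                 competence=0.6)
print(round(qwl, 3), profit)
```

The structure mirrors the text: challenges drive QWL down through a probability matrix, and actions trade profit for a chance of QWL recovery.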

We tested the simulation using three different competence values: 30%, 60% and 90%. **Table 1** contains the results of the three simulation rounds.


**Figure 9.** *Simulation game user-interface.*

It seems that with a management competence level of 30% there are difficulties in achieving the budgeted profit target. If QWL is sacrificed for short-term wins, the cumulative profit at the end of the year will be poor. It seems that in a one-year simulation episode an equilibrium is reached where the Q-learning points and QWL values no longer increase. At the 30% competence level the BIAS episode Q-learning points vary between 0 and 3 000 points. It seems as if the agent has no idea how to achieve sustainable development where both QWL and profit improve. With low competence levels, only with Bellman decisions will the profit slightly exceed the target value.
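The Q-learning points discussed above accumulate through the standard tabular Q-learning update. The sketch below shows that update rule with hypothetical states, actions, and parameter values; it is not the chapter's implementation.

```python
# Standard tabular Q-learning update behind the "Q-learning points":
# Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)).
ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor

def q_update(Q, s, a, reward, s_next):
    # Bootstrap from the best known value of the next state.
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += ALPHA * (reward + GAMMA * best_next - Q[s][a])
    return Q[s][a]

# Hypothetical workplace states and leadership actions.
Q = {"conflict": {"coach": 0.0, "ignore": 0.0},
     "calm":     {"coach": 0.0, "ignore": 0.0}}
q = q_update(Q, "conflict", "coach", reward=10.0, s_next="calm")
print(q)  # first update: alpha * reward = 1.0
```

At equilibrium the update term approaches zero, which is why the Q-learning points plateau once the strategy stops improving QWL and profit.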

Manager competence levels of 60% are quite realistic, representing average line managers' leadership-action skills. In the one-year simulation both the BIAS and Learning strategies achieve a Nash equilibrium, however with different profit outcomes. In the BIAS strategy the QWL settles at the 60% level, which actually corresponds to the average workforce QWL value in Finland [24]. When an equilibrium is achieved, it may be difficult to change the behavior (see **Figure 10**).

**Figure 10.** *BIAS strategy Q-learning points.*

The Learning strategy also has an equilibrium at the 60% competence level, but with higher QWL and profit values than BIAS (see **Figure 11**). In our practical simulation studies this type of result is usually learned when simulation episodes are practiced over ten times. One must bear in mind that management systems have a tendency to press for maximizing short-term profits, thus remaining in the BIAS mindset. Learning to be an excellent leader requires several years of practice in an organizational system that allows investing in people. This phenomenon may explain why some leaders learn to be excellent team leaders while the majority remains at a lower level.

**Figure 11.** *Learning strategy Nash Q-learning equilibrium.*

There is an interesting phenomenon in the BIAS strategy at the 90% competence level. Even with very high leadership skills the QWL settles at 60%, where the equilibrium remains. This is due to the behavior where leadership actions are implemented only when problems arise; thus, there are no proactive investments in team development. At the 90% competence level it seems that a one-year simulation episode is not enough time to achieve a perfect equilibrium with the Learning and Bellman strategies, since the Q-learning points and QWL seem to continue improving throughout the episode. A longer time period would be needed to achieve equilibrium.

The BIAS strategy seems to reach an equilibrium where QWL is no longer improved and the Q-learning points find the management-culture maximum value. The lower the competence, the lower the level of QWL; however, the difference is not large, varying from 57% to 60%. This is interesting because in Finland the average workforce QWL is around 60% [24]. One could argue that the profit-maximization bias is common and does not depend on line managers' leadership competences, and that this is why most employees feel the QWL is around 60%. Moreover, the reason for the profit-maximization bias is not necessarily a lack of leaders' skills, but a management system that forces leaders to focus on short-term profit rather than people.

**8. Conclusions and discussion**

Organizational management research has typically focused on qualitative behavioral factors that have a complex relationship to organizational success; in addition, impacts often come with a delay. Each organization is a unique system that obeys certain common laws but also has a unique context of its own. Therefore, repeating empirical research results has proven to be challenging, which also makes it difficult to draw generalizable conclusions [7]. This article examines the utilization of model-based artificial intelligence in management development. The ODT can be used to assess the impact of management behavior on an organization's success, considering situational data and the impact of management culture. The ODT helps to explore


| Strategy | Q-learning | QWL start | QWL end | QWL difference | Budgeted EBITDA (€) | Cumulative profit in 1 y. (€) | Difference (€) | Equilibrium |
|---|---|---|---|---|---|---|---|---|
| Competence 30%, BIAS | 3 310 | 60,2% | 57,9% | −2,3% | 254 923 | 244 921 | −10 002 | — |
| Competence 30%, Learning | 5 370 | 60,2% | 64,6% | 4,4% | 254 923 | 243 650 | −11 273 | yes |
| Competence 30%, Bellman | 21 412 | 60,2% | 67,5% | 7,3% | 254 923 | 257 070 | 2 147 | yes |
| Competence 60%, BIAS | 5 854 | 60,2% | 59,0% | −1,2% | 254 923 | 263 284 | 8 361 | yes |
| Competence 60%, Learning | 20 425 | 60,2% | 68,3% | 8,1% | 254 923 | 287 083 | 32 160 | yes |
| Competence 60%, Bellman | 35 931 | 60,2% | 70,4% | 10,2% | 254 923 | 293 442 | 38 519 | no |
| Competence 90%, BIAS | 7 737 | 60,2% | 59,8% | −0,4% | 254 923 | 276 828 | 21 905 | yes |
| Competence 90%, Learning | 31 240 | 60,2% | 69,9% | 9,7% | 254 923 | 305 604 | 50 681 | no |
| Competence 90%, Bellman | 38 446 | 60,2% | 70,1% | 9,9% | 254 923 | 312 003 | 57 080 | no |

**Table 1.** *Test episode values.*

