Applications of Multi-Agent Technologies Combined with Machine Learning

### **Chapter 3**

## Role of an Optimal Multiagent Scheduling in Different Applications Using ML

*Fahmina Taranum, Sridevi K, Maniza Hijab, Syeda Fouzia Sayeedunissa, Afshan Kaleem and Niraja K.S*

#### **Abstract**

Scheduling is regarded as one of the vital decision-making processes frequently used in many real-time applications. It manages everything from resource allocation to task completion, with the goal of optimizing the desired objectives. Depending on the problem, the resources, tasks, and goals can differ. The aim is to design a cooperative multiagent system for optimal scheduling. Many available scheduling algorithms measure optimality from different perspectives. The proposal is to create a dataset from multiple algorithms with different performance metrics in order to find an optimal one. This data can be imported into machine learning tools for training and prediction based on the selected performance metrics. The algorithms considered in the empirical analysis include first come first serve, Round robin, and the Ant colony approach. The major finding is that Ant colony scheduling, which is based on speed and velocity, is the optimal algorithm. A future extension would be to check the correctness of this optimality using machine learning tools.

**Keywords:** optimality, scheduling, machine learning

#### **1. Introduction**

#### **1.1 Scheduling**

Scheduling can be understood as a distributed, sequential decision-making process, which can be designed using multiagent reinforcement learning algorithms. These algorithms give agents effective learning because they involve frequent interaction with the environment, which makes them relevant to numerous real-time applications. The scheduling entities are resources and tasks, which are assigned to each other. Resources are classified by the throughput they generate: homogeneous resources have similar or uniform throughput, while heterogeneous resources are of distinct types. Dependent tasks are interconnected jobs in which one task cannot start until another task has completed. The characteristics used to define a task include priority and QoS. Another widely used scheduling mechanism is based on the notion of priority rules. In this method, resource allocation and task execution are scheduled by assigning priorities according to the situation, demand, or requirement.

Priority can be based on deadline and load. The magnitude of work assigned to the processes manufacturing components is termed load, which must be balanced to achieve fair scheduling. Deadlines are the time limits for completing a job and can be classified as soft, firm, or hard: a deadline is soft if it is not rigid, firm if some limited leeway is given, and hard if results are not accepted after the time limit. The components of these entities are depicted in **Figure 1**.

#### **1.2 Optimal schedule**

Optimality can be measured with respect to tasks or resources using multiple metrics, as shown in **Figure 2**. Most optimization problems in practice are highly nonlinear, spanning different contexts and constraints while demanding fast convergence and low computational cost. Optimization methods can be classified as gradient-based or derivative-free, stochastic or deterministic, and population-based or trajectory-based.

The word optimal means the best and most desirable solution, and scheduling means arranging, controlling, and optimizing work and workloads. A good job sequence maximizes machine utilization, minimizes waiting time for each job, and completes the maximum amount of work while satisfying various constraints. Scheduling is thus an optimization problem in which the input is a list of jobs and the output is a schedule of all jobs on the machines, optimized for throughput and resource utilization. This is also known as machine scheduling, processor scheduling, multiprocessor scheduling, or simply scheduling. Jobs can be scheduled on a single machine or on multiple machines in parallel, depending on the requirements.

**Figure 1.** *Scheduling entities.*

*Role of an Optimal Multiagent Scheduling in Different Applications Using ML DOI: http://dx.doi.org/10.5772/intechopen.108314*

**Figure 2.** *Criteria for optimal scheduling.*

The criteria for scheduling a resource are proper utilization, minimum makespan, and maximum throughput. Utilization refers to the extent to which a resource is put to real-world or commercial use. Makespan is the total time taken by the resources to complete all the jobs. Throughput is the number of jobs completed per unit of time.

There are various parameters considered to obtain the optimal job or task schedule.

If a machine M has a set of jobs J1, J2, and J3 to be executed by M, finding the best sequence that completes all jobs with maximum work in minimum time is called optimal job scheduling.

#### **1.3 Types of scheduling**

There are many algorithms for scheduling dependent or independent tasks in a single-processor or multiprocessor environment with different speeds. Using dynamic programming, the best schedule can be obtained by considering the time taken to complete each job, its priority, and any other constraints that must be satisfied to finish the tasks.

**Figure 3.** *Types of scheduling.*

The effective allocation of the order of jobs to machines, to obtain maximum profit and minimum cost from the process, leads to optimal scheduling, as depicted in **Figure 3**.

Scheduling can be classified as preemptive or nonpreemptive, depending on whether a resource may be taken from a low-priority process and allocated to a higher-priority one. A second level of classification covers simultaneous, linear, or nonlinear execution. Finally, the decision to distribute resources can be made centrally, in a distributed fashion, or in combination, depending on the demand and the architecture used. The most widely used scheduling algorithms are Round robin, First come first serve, Shortest job first, Earliest deadline first, Priority scheduling, Multi-level queue, Multi-level feedback queue, and Shortest remaining time.
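As an illustration of one policy from this list, a minimal Round-robin simulation is sketched below; the burst times and quantum are hypothetical values, not data from this chapter.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate Round-robin scheduling; returns the completion time per job."""
    remaining = list(burst_times)
    ready = deque(range(len(burst_times)))
    completion = [0] * len(burst_times)
    clock = 0
    while ready:
        job = ready.popleft()
        run = min(quantum, remaining[job])  # run for one quantum at most
        clock += run
        remaining[job] -= run
        if remaining[job] > 0:
            ready.append(job)        # unfinished job re-enters the queue tail
        else:
            completion[job] = clock  # job finished at the current clock
    return completion

print(round_robin([5, 3, 1], quantum=2))  # [9, 8, 5]
```

With these assumed bursts, the shortest job finishes earliest even though it arrived last, illustrating how the quantum interleaves the jobs.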

#### **1.4 Approaches used in machine learning**

Reinforcement learning, which is usually preferred, is an area of machine learning that addresses how an agent acts intelligently in an environment to achieve the optimum reward. This mechanism allows learning through trial-and-error interaction with the environment. When combined with machine learning, scheduling helps interpret data accurately, generating optimal results based on predictions, as depicted in **Figure 4**. Through repeated trials, the agents observe the possible outcomes of each action and thereby find the most appropriate action for any given situation.

For instance, upon encountering an unexpected situation, reinforcement learning is helpful because it enables learning from prior results and altering the parameters as needed before the subsequent iterations, thereby ensuring that the solutions remain robust and as desired.

The learning algorithms could benefit from this idea by associating priorities with the feedback signal the agents receive when executing actions. By incorporating priority rules with the feedback received for each action, the learning algorithm can be improved. The aim is to generate multiple cases from different

**Figure 4.** *Machine learning and optimality.*

algorithms and train a machine with the data generated from execution, to validate the optimality of scheduling using machine-generated test results.
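As a concrete illustration of the feedback-based learning described above, the following is a minimal tabular Q-learning update; the states, actions, and reward values are invented for the example and are not part of this chapter's experiments.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Toy table with two states and two actions; all numbers are illustrative.
Q = {"s0": {"a": 0.0, "b": 0.0}, "s1": {"a": 0.0, "b": 0.0}}
q_update(Q, "s0", "a", reward=1.0, next_state="s1")
print(Q["s0"]["a"])  # 0.5 * (1.0 + 0.9 * 0.0 - 0.0) = 0.5
```

Each executed action adjusts the stored value toward the observed feedback, which is the mechanism the priority rules would be combined with.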

#### **1.5 Applications**

Given the use of multitasking applications in every field, efficient optimal multiagent scheduling algorithms are needed. Machine learning algorithms are widely used to improve the optimality of multiagent scheduling in production and transportation. The major domains where some of these challenges can be addressed are discussed below.

#### *1.5.1 Transportation domain*

Every individual relies on some mode of transportation, whether by land, air, or water. With the advancement of technology, the usage of air and land transportation has grown drastically due to efficient time management and comfort, and the development of multiagent system software has been one of the major reasons for this. Although multiagent scheduling software exists, some challenges still need to be addressed.

#### *1.5.2 Railway domain*

Railways are a major mode of transportation; over 75% of people worldwide rely on railway networks. Most railways run on single lines, a limitation that has persisted for decades. Although multiagent systems have been applied here, many of the existing challenges remain. The train delay rate is still a challenging issue; even with advances in technology, it cannot be handled perfectly. With machine learning algorithms, a path can be found using an optimal multiagent scheduling algorithm. Electricity transport management also plays a vital role in railway networks; with machine learning algorithms, an optimal solution can be found for scheduling the multiagent system to choose proper routing, overcoming the major challenges faced by the transportation domain.

#### *1.5.3 Airline reservation domain*

Airline flight bidding software mostly uses multiagent system prototypes, where buyers purchase e-tickets. An optimal multiagent scheduling algorithm combined with machine learning can direct the buyer to a better price by taking the buyer's preferences and correlating those parameters with the available flights.

#### *1.5.4 Manufacturing domain*

Every business domain has its own production department, where different jobs must be scheduled with multiagent approaches. Machine learning usage has grown to unimaginable heights in the last few years, and the manufacturing industry's economic growth rate can be improved with optimal multiagent scheduling algorithms.

#### *1.5.5 Electronic commerce domain*

This domain will show increased growth in the near future, as most day-to-day transactions are now carried out through e-commerce, which can be combined with multiagent schedulers and machine learning algorithms.

#### **2. Literature review**

Kumar et al. [1] propose a scheme for processing tasks to allocate resources at runtime. The experiments were performed on CloudSim to validate the effectiveness of the technique for optimum solutions. The proposal traces results with a high initial velocity that is gradually decreased to improve exploitation.

Chi Zhang et al. [2] use reinforcement learning to achieve cooperation in the scheduling of multiple agents working as a team; the agents learn from environmental feedback and act accordingly. Markov chains and Proximal Policy Optimization are used to check for optimal scheduling, with load consumption and electricity price as the performance metrics. The optimality of the results depends on the trained dataset or historical data used. Multiagent reinforcement learning is applied when many participants with multiple global constraints interact through different scheduling tasks. This approach is used in a realistic environment to obtain optimal results using predictions.

Zhaoyun Song, Bo Liu et al. [3] address the population diversity control problem to improve optimization. The method is based on runtime selection of adaptive and diversified control parameters. Accuracy is calculated for the cluster of movable particles.

The author of [4] uses the concept of low voltage over harmonic distortion of the modulation index to set the image threshold for increased image accuracy.


Julie Shah [5] schedules multiple agents under cooperative, spatial, and temporal constraints to obtain an optimal task assignment. The experimental analysis uses a hill-climbing algorithm with intricate computation, and accuracy versus optimality is compared with a conventional algorithm. Empirical results show that the optimality of hill climbing improves with a mixed-integer linear programming collaboration approach.

Khaled M. Khalil [6] uses the NetLogo simulator for experimentation. The aim is to maximize the reward of agents working in a group using the Q-learning algorithm (an action-value method) in an interactive, autonomous, and seamless environment. Approaches of high relevance to realistic AI-based solutions were proposed by Martin Riedmiller et al. [7] to optimize manufacturing and control scheduling, based on distributed sequential approaches for employment-related decision making with multiagent reinforcement learning.

E. Grace Mary Kanaga et al. [8] use an Ant colony approach for an optimal scheduling algorithm. Continuous optimization problems are solved using velocity, inertia weight, particle location, and global and local best positions. Patients are scheduled for an optimal, accurate solution.

Zhan et al. [9] aim to implement adaptive behavior for population distribution along with fitness.

Jianchao Zeng, Jing Jie et al. [10] target improving and validating global convergence ability. Kennedy et al. [11] invented particle swarm optimization, using non-linear functions to design methodologies based on cluster optimization. The relationship between PSO and artificial life, and its integration with genetic algorithms, is proposed and discussed.

Wilfried Brauer et al. [12] schedule multiple machines, assigning jobs based on demands such as cost, effectiveness of results, and time. Artificial intelligence is used to collect and train data in distributed, parallel, asynchronous, and cooperative multiagent environments. To generate the results, two learning steps, known as successor selection and estimate adjustment, are repeatedly applied and tested on individual machines.

#### **3. Proposal**

The idea is to generate an optimal schedule, tested over a set of scheduling algorithms; the major finding is that Ant colony scheduling gives the best results. The algorithm predicts and generates the best position using parameters such as inertia weight, speed, velocity, distance, acceleration constants, direction, and local and global positions. The local and global positions are the best positions for each job and for the group of jobs, respectively.

The movement of the job is in the direction of the best location value. The velocity is calculated using Eq. (1).

$$\mathbf{Ve}_i^{k+1} = w\,\mathbf{Ve}_i^{k} + ac_1 p_1 \left(\mathbf{Lbst}_i^{k} - \mathbf{x}_i^{k}\right) + ac_2 p_2 \left(\mathbf{Gbst}^{k} - \mathbf{x}_i^{k}\right) \tag{1}$$

Lbst is the best (minimum) experienced position of each job, and Gbst is the group's best experienced position. The movement is monitored at the kth step of the ith job to obtain the next position. The first term of Eq. (1) gives the inertia with respect to the previous velocity; the second and third terms give the direction of each job and of the group, respectively. The acceleration constants ac1 and ac2 are used to induce a uniform change in velocity at each instant. The random numbers p1 and p2 are chosen from the range [0, 1]. The velocity and position updates follow Eq. (1) as given by Kennedy and Eberhart [11].

$$\mathbf{x}_i^{k+1} = \mathbf{x}_i^{k} + \mathbf{Ve}_i^{k+1} \tag{2}$$

Eq. (2) gives the new position from the previous position and the velocity of movement. After reaching the best, optimal position, all the particles synchronize with each other. According to Kennedy et al. [11], the search space is represented as a D-dimensional vector, and the velocity and position updates are calculated using Eqs. (1) and (2). The jth particle's position and velocity in the D-dimensional space are given by Eqs. (3) and (4), and its local best (Lbst) and the global best (Gbst) by Eqs. (5) and (6), respectively.

$$\mathbf{x}_j = \left(x_{j1}, x_{j2}, x_{j3}, \dots, x_{jD}\right) \tag{3}$$

$$\mathbf{ve}_j = \left(ve_{j1}, ve_{j2}, ve_{j3}, \dots, ve_{jD}\right) \tag{4}$$

$$\mathbf{Lbst}_j = \left(l_{j1}, l_{j2}, l_{j3}, \dots, l_{jD}\right) \tag{5}$$

$$\mathbf{Gbst} = \left(g_{1}, g_{2}, g_{3}, \dots, g_{D}\right) \tag{6}$$

For scheduling a process with resources, the problem is to schedule n jobs with appropriate resources; each job has a set of sequential tasks (operations) and a job index. The total count of operations used in scheduling is denoted by D in Eq. (7).

$$D = \sum_{i=1}^{n} n_i \tag{7}$$

$$y_i = \sum_{k=1}^{i-1} n_k + 1 \tag{8}$$

#### **3.1 Optimization for job scheduling**

Multiple scheduling algorithms are available to generate an ideal solution for job scheduling, each working on different parameters. To find an optimal one, a performance metric must be selected, based on which the complete experimentation can be carried out. Let the proposal be defined on continuous optimization parameters: particle position xj, velocity vej, acceleration coefficients ac1 and ac2, and inertia weight w. Scheduling a task is a combinatorial, feasible optimization over the sequence of selected resources and their operations. The aim is to find an optimal schedule with fair scheduling and without busy waiting.
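The update on these continuous parameters, per Eqs. (1) and (2), can be sketched as follows; the position, velocity, and best-position values are illustrative, and the default coefficients (w = 0.7, ac1 = ac2 = 1.5) are common PSO choices rather than values taken from this chapter.

```python
import random

def pso_step(x, ve, lbst, gbst, w=0.7, ac1=1.5, ac2=1.5):
    """One PSO update per Eqs. (1)-(2): new velocity from inertia plus pulls
    toward the local and global best positions; then position += velocity."""
    p1, p2 = random.random(), random.random()  # p1, p2 drawn from [0, 1]
    ve_new = [w * v + ac1 * p1 * (l - xi) + ac2 * p2 * (g - xi)
              for v, xi, l, g in zip(ve, x, lbst, gbst)]
    x_new = [xi + vi for xi, vi in zip(x, ve_new)]
    return x_new, ve_new

x, ve = [0.5, 1.0], [0.1, -0.2]  # illustrative current position and velocity
x_new, ve_new = pso_step(x, ve, lbst=[0.4, 0.9], gbst=[0.6, 1.1])
```

Each call moves the particle toward a blend of its own best position and the group's best position, which is what drives the swarm toward an optimal schedule.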

#### **3.2 Task scheduling with optimization**

The scenario consists of n tasks (or jobs) denoted J = {J1, J2, … , Jn}, sequential tasks T = {1 … n}, resources R = {R1, R2, … , Rm}, and operations Oij, where i and j are the indices of job and task, respectively. Every task of the vth job is numbered v in **Table 1**, and y is defined in Eq. (8). One sample representation of a job for scheduling optimization is depicted in **Table 1**.

a. Phase I: Scheduling three jobs on three resources is depicted in **Table 2**. The experimentation considers multiple permutations (3P3 orderings for the 3:3 job-to-resource allocation) over the nine tasks in total. **Table 2** lists a few of the possible permutations, where R1, R2, and R3 are distinct resources. An example job representation, along with initialization of the position and velocity vectors, is given in **Table 3**.

The particle positions are initialized with random numbers from xmin to xmax, with xmin set to 0 and xmax set to 2. The velocity vector is initialized with random numbers bounded by vmin as −4 and vmax as 4, as shown in **Table 3**.
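This initialization step can be sketched as below; the number of particles is an arbitrary example, and the velocity range is assumed to be the symmetric interval [−4, 4].

```python
import random

def init_particles(n_particles, dim, xmin=0.0, xmax=2.0, vmin=-4.0, vmax=4.0):
    """Draw positions uniformly from [xmin, xmax] and velocities from
    [vmin, vmax], one vector of length `dim` per particle."""
    positions = [[random.uniform(xmin, xmax) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[random.uniform(vmin, vmax) for _ in range(dim)]
                  for _ in range(n_particles)]
    return positions, velocities

# dim = 9: one entry per operation for 3 jobs with 3 tasks each.
pos, vel = init_particles(n_particles=5, dim=9)
```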

b. Phase II: Decoding particles into job solutions: the particle position cannot be used directly as a solution to the job-scheduling problem. Hence, an indirect decoding of the particle representation into a schedule is adopted. The algorithmic steps for decoding jobs into a schedule are listed below:


**Table 1.**

*Job's representation for resource scheduling.*


**Table 2.**

*Job scheduling problem for 3:3 jobs and resource allocation.*


**Table 3.**

*Job representation along with initialization of position and velocity vector for ith job.*


**Table 4.**

*Result obtained for sequenced particles after phase II.*


**Table 5.**

*Decoded schedule with optimization.*

Step I: Sort the values of the position vector in ascending order.

Step II: Arrange the tasks in the corresponding order of the position values obtained in Step I.

Step III: The result is a sequential order of tasks with their corresponding positions, as shown in **Table 4**.

The sequence obtained is the operation-based permutation π = (2,3,1,2,2,1,3,1,3). An element of π with value i stands for job Ji; the jth occurrence of i in π refers to operation Oij, the jth task (operation) of job i. The precedence of the tasks is determined simply by the order of the elements of π, and all tasks are ready for scheduling as per the first row of **Table 4**. The first unit scheduled is the first task of job 3, processed on resource R3; then the first task of job 2 is processed on R2, and then the first unit of job 1 is scheduled on R1. The resulting decoded schedule is shown in **Table 5**.
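The decoding steps can be sketched as follows; the position values are invented so that the decoded permutation matches the π quoted in the text.

```python
def decode_positions(position, job_labels):
    """Steps I-III: sort the position values in ascending order and take the
    job labels in that order, yielding an operation-based permutation."""
    order = sorted(range(len(position)), key=lambda k: position[k])
    return [job_labels[k] for k in order]

# Each job label appears once per task (3 jobs x 3 tasks); the position
# values are hypothetical, chosen to reproduce the permutation in the text.
job_labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
position = [0.3, 0.6, 0.8, 0.1, 0.4, 0.5, 0.2, 0.7, 0.9]
pi = decode_positions(position, job_labels)
print(pi)  # [2, 3, 1, 2, 2, 1, 3, 1, 3]
```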

As concluded from **Table 5**, jobs 1, 2, and 3 complete execution at times 9, 8, and 6, respectively, after applying the optimization algorithm, with the tasks executed in the order:

O = {O31, O21, O11, O12, O22, O32, O33, O13, O23}

This is an optimal schedule, as all processes complete within the time limit of 9, with completion times {J1, J2, J3} = {9, 8, 6}.

#### **3.3 First come first serve algorithm**

The working concept of FCFS is to serve jobs in arrival order, giving the earliest arrival the highest priority; the results of applying this scheduling are shown in **Table 6**.



**Table 6.**

*Decoded schedule with FCFS.*


**Table 7.**

*Job decoded schedule using priority.*

As concluded from **Table 6**, jobs 1, 2, and 3 complete execution at times 6, 11, and 16, respectively, after applying the FCFS scheduling algorithm. The order is based on arrival: the first job to enter is the first to be served.

The selected order of task operations is:

O = {O11, O12, O21, O13, O22, O31, O23, O32, O33}

Hence, the conclusion is that FCFS is not optimal compared to the optimization algorithm. Job 1 arrives first and is processed for task 1 on resource R1, which is available. Next, task 2 of job 1 is processed on resource R2, and after its completion, job 1 gets resource R3 for task 3. The decoded schedule in **Table 6** follows this FCFS description. The calculated total job completion times under FCFS are:

{J1 = 6, J2 = 11, J3 = 16}
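A minimal sketch of this behavior is shown below, under the simplifying assumption that the machine serves one whole job to completion before starting the next; the per-job durations are assumed, chosen only to reproduce the completion times quoted above.

```python
def fcfs_completion(job_durations):
    """FCFS where each arriving job is served to completion before the next
    starts; returns the completion time of each job in arrival order."""
    clock, completions = 0, []
    for duration in job_durations:
        clock += duration
        completions.append(clock)
    return completions

# Assumed total durations, picked to illustrate the quoted schedule.
print(fcfs_completion([6, 5, 5]))  # [6, 11, 16]
```

Because later arrivals simply wait behind earlier ones, a long early job delays everything after it, which is why FCFS loses to the optimization algorithm here.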

#### *3.3.1 Based on age priority (using preemptive manner)*

The working concept here is priority based on age, applied in a preemptive manner; the results of applying this scheduling are depicted in **Table 7**.

Job 1 is processed on task 1 with a request for resource R1; job 2 needs resource R2 for its first task and, based on availability, is allocated one slot. The makespan times for the jobs are {J1 = 14 − 1 = 13, J2 = 7 − 3 = 4, J3 = 12 − 5 = 7}, i.e., {J1, J2, J3} = {13, 4, 7}, respectively.

#### **4. Conclusion**

Multi-agent technologies and approaches can be combined with machine learning thanks to their flexibility, adaptability, and self-sufficiency. Applications designed with these methodologies are realistic, dynamic, and distributed in nature. The idea is to build the best decision-making model using the latest machine learning approaches. Experiments were run for four types of scheduling: Round robin, FCFS, priority-based, and the Ant colony (particle swarm optimization) technique. The results validate the Ant colony approach as giving the optimal answer. The dataset generated by this experimentation for multi-agent scheduling was collected with different performance metrics. Future work is to validate multiagent scheduling using a machine learning tool, by training the machine on the generated dataset.

### **Author details**

Fahmina Taranum<sup>1</sup>, Sridevi K<sup>1</sup>, Maniza Hijab<sup>1</sup>\*, Syeda Fouzia Sayeedunissa<sup>1</sup>, Afshan Kaleem<sup>1</sup> and Niraja K.S<sup>2</sup>

1 Muffakham Jah College of Engineering and Technology affiliated to Osmania University, Hyderabad, Telangana, India

2 BVRIT Hyderabad College of Engineering for Women, Hyderabad, Telangana, India

\*Address all correspondence to: hijabmaniza@gmail.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


### **References**

[1] Kumar M, Sharma SC. PSO-based novel resource scheduling technique to improve QoS parameters in cloud computing. Real-world optimization problems and meta-heuristics. Neural Computing and Applications. 2019;**134**(16):1-24. DOI: 10.1145/3302505.3310069

[2] Zhang C, Kuppannagari SR, Kannan R, Xiong C, Prasanna VK. A cooperative multi-agent deep reinforcement learning framework for real-time residential load scheduling. Association for Computing Machinery. ACM. 2019. pp. 59-69. ISBN: 978-1-4503-6283-2/19/04. DOI: 10.1145/3302505.3310069

[3] Song Z, Liu B, Cheng H. Adaptive particle swarm optimization with population diversity control and its application in tandem blade optimization. Journal of Mechanical Engineering Science, Sage. 2018;**233**(6):1-17

[4] Dulhare U. Prediction system for heart disease using Naïve-Bayes and particle swarm optimization. Biomedical Research. 2018;**29**(12):2646-2649

[5] Shah J, Zhang C. Co-optimizing multi-agent placement with task assignment and scheduling. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. ACM. 2016. pp. 3308-3314. ISSN No: 978-1-57735-770-4

[6] Khalil KM, Abdel-Aziz M, Nazmy TT, Salem A-BM. MLIMAS: A framework for machine learning in interactive, multiagent systems. Procedia Computer Science. 2015;**65**:827-835. DOI: 10.1016/j.procs.2015.09.035

[7] Riedmiller M, Sperschneider V, Brockmann W, Nuchter A. Multi-agent reinforcement learning approaches for distributed job-shop scheduling problems. 2009

[8] Kanaga EGM, Valarmathi ML. Multi-agent based patient scheduling using particle-swarm optimization. Procedia Engineering. 2012;**30**:386-393

[9] Zhan Z-H, Zhang J, Li Y, Chung HS-H. Adaptive particle swarm optimization. IEEE Transactions on Systems, Man, and Cybernetics. 2009;**39**(6):1362-1381

[10] Zeng J, Jie J, Hu J. Adaptive particle swarm optimization guided by acceleration information. International Conference on Computational Intelligence and Security. IEEE, 1-4244. 2006. pp. 351-355

[11] Kennedy J, Eberhart R. Particle swarm optimization. IEEE Xplore. 2002. pp. 1942-1948

[12] Brauer W, Weiß G. Multi-machine scheduling – a multi-agent learning approach

#### **Chapter 4**

## On an Approach to Knowledge Management and the Development of the Knowledge-Based Multi-Agent System

*Evgeniy Zaytsev and Elena Nurmatova*

#### **Abstract**

The chapter discusses the architecture of the Knowledge-Based Multi-Agent System (KBMAS) and describes the software agent models. The purpose and functional organization of the system software agents used for planning and management of computing resources of the KBMAS are considered. An approach to applied software agent development that integrates knowledge-based reasoning mechanisms with neural network models is proposed. The structure of the problem-oriented Multi-Agent Solver, including groups of reactive and cognitive software agents used to solve complex ill-formalized problems, is considered. The interaction diagram of reactive agents and the states-and-transitions diagram of the cognitive agent of the computing node are given. The control scheme is shown, including methods for determining the availability of microservices used by agents, reliability assurances, and coordinated operation of the system's computing nodes. The method of reinforcement learning, the system of rules (productions), and the queries to the knowledge base are described. Methods for distributing software agents across the KBMAS computing nodes are proposed, as well as construction of an optimal logical structure of the Distributed Knowledge Base, which has minimal information connectivity and ensures effective operation of the system on multicomputers.

**Keywords:** distributed system, multi-agent system, knowledge base, intelligent software agents, fuzzy system, reinforcement learning, data localization optimization

#### **1. Introduction**

The Knowledge-Based Multi-Agent System is a Distributed Artificial Intelligence System that uses intelligent applied and system software agents. Knowledge-based reasoning mechanisms and artificial neural networks are integrated into the model of applied software agents, which are designed to solve complex, ill-formalizable problems [1–4].

System software agents are used to effectively manage computational processes and provide application software agents with access to information-computing resources of the multicomputer.

Applied software agents function using Event-Driven Microservices (EDM): independent, autonomous resources designed as separate interacting processes with lightweight interprocess communication. Microservices can be implemented as autonomous processes or as functions, and they may or may not store state.

A feature of the microservice architecture is that microservices are loosely coupled. EDM communicate not through a request-response API (Application Programming Interface) but through events defined in event streams, which are neutral with respect to any particular technology. This allows the most appropriate tooling to be chosen for each job, achieving the required level of performance.

In today's EDM architecture, information is exchanged by issuing and consuming events, which may be simple notifications as well as complex structures with state support. Events are not destroyed when consumed, as in conventional messaging systems, but remain available to other consumers, who can read them as needed.

Microservices consume events from input event streams, process information, generate their own outgoing events, provide data to implement a request-response access scheme, exchange information with third-party APIs, or perform other necessary actions.
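The persistence property described above (consumed events remain readable by other consumers, each tracking its own position) can be sketched as a minimal append-only event log. The class and method names below are illustrative, not tied to any particular EDM framework:

```python
from dataclasses import dataclass, field

@dataclass
class EventStream:
    """Append-only event log: events persist after consumption, so each
    consumer reads at its own pace via a private offset."""
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)   # consumer name -> next index

    def publish(self, event):
        self.events.append(event)

    def consume(self, consumer):
        """Return all events this consumer has not yet seen."""
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:]
        self.offsets[consumer] = len(self.events)
        return batch

stream = EventStream()
stream.publish({"type": "task_created", "id": 1})
stream.publish({"type": "task_done", "id": 1})

# Two independent consumers read the same events without destroying them.
a = stream.consume("agent_a")   # both events
b = stream.consume("agent_b")   # both events, still available
```

The key design point, mirroring the text, is that `consume` advances only the caller's offset; the log itself is never truncated by a read.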

Unlike Service-Oriented Architecture (SOA), which typically uses web service standards like SOAP (Simple Object Access Protocol), microservice architecture uses simpler protocols. A microservice can be designed as a stand-alone service on a PAAS (Platform As A Service), or it can be a process of its own Operating System.

In traditional Operating System architectures, information-computing resources are hidden behind universal APIs that do not allow the KBMAS developer to implement problem-dependent optimization. Effective implementation of KBMAS is possible on the basis of an exokernel OS and event-driven intelligent system software modules.

High performance is achieved through the implementation of special mechanisms for managing information and computing resources, as well as the possibility of direct access to hardware [5, 6].

Using the services of an exokernel OS, the KBMAS designer is able to choose or implement their own System Libraries (LibOS). For example, specialized VMM (Virtual Memory Management) or IPC (InterProcess Communication) modules defined in LibOS can work much faster than general-purpose software modules doing a similar job in a monolithic or microkernel Operating System. System software agents can effectively manage information-computing resources using exokernel OS services, which is the basis for creating high-performance knowledge processing systems.

Currently, hypervisors are usually used in distributed systems instead of the exokernel architecture. Type 1 hypervisors do not use OS services; they control the hardware directly. In type 2 hypervisors, a host OS runs instead of an exokernel that provides untrusted servers with a low-level interface for accessing computing resources, and guest OSs are used instead of user-mode (unprivileged) servers. The advantage of building an exokernel virtualization system instead of using hypervisors is that an extra layer of mapping is eliminated.

Unlike a hypervisor, which must support disk address translation tables (and other tables to convert virtual resources to physical resources), there is no need for such reassignment when using an exokernel OS. The exokernel only needs to keep track of which Virtual Machine (VM) has been given certain hardware resources. The exokernel architecture does not require creating a copy of a real computer and

*On an Approach to Knowledge Management and the Development of the Knowledge-Вased… DOI: http://dx.doi.org/10.5772/intechopen.106738*

isolating virtual machines from physical resources. In this case, each VM is provided with a subset of the real computer's resources. The exokernel OS operates in privileged mode, distributing computing resources between VMs and controlling them so that no machine tries to use computing resources not intended for it.

Virtual machines can serve as containers that contain software agents (processes and threads) and their surroundings. It is easier to provide mobility of software agents together with their VM than to move individual agents around. When using a virtual machine, the local group of software agents is moved together with the necessary environment for it (configuration files, system tables, etc.).

Development of KBMAS on the basis of exokernel OS includes creation of system and applied software agents, definition of their states and actions, as well as events (messages), delivery environment properties, and other characteristics describing the agents and their interaction. Development of a problem-oriented KBMAS can be performed using a special Multi-Agent-KB5 software toolkit [4], which allows knowledge engineers to design a distributed intelligent system based on high-level abstractions implemented by high-performance specialized system software modules. Using the Multi-Agent-KB5 toolkit, a knowledge engineer can create groups of applied software agents that act rationally, are able to respond in a timely manner to environmental events and learn in this environment.

In a broad sense, rationality is the ability to do the right thing. Ideal rationality, that is, choosing the optimal (best) action in a given situation, is not always achievable and may require large computational resources. The concept of rationality in the KBMAS applies to both applied and system software agents.

In the logic programming paradigm, the rational behavior of an applied software agent is realized by means of logical inference methods (resolution and unification). A software agent can also act rationally without using logical inference. In some situations, a reflex action may be more successful than a slower action taken after logical inference.

Problem-solving in the KBMAS is done by decomposing a complex problem into subtasks, which are jointly solved by rational applied software agents. Horizontal and vertical decomposition is used. Horizontal decomposition results in a multi-connected system with a flat structure. Vertical decomposition results in a hierarchical system with multiple levels. The levels are vertically subordinate to each other and have their own goals and functions, the implementation of which is aimed at achieving the global goal of the intelligent system.

Two types of applied software agents are used in the KBMAS to solve applied problems: cognitive and reactive. Mathematical models of these types of agents are described in [4].

KBMAS is an emergent system that implements the principle of self-organization. In the self-organizing KBMAS, software agents are capable of making decisions under conditions of incomplete, vague, and fuzzy knowledge.

#### **2. Structural and functional organization of the KBMAS**

In traditional Knowledge-Based Systems, problems are solved by a single intelligent solver designed as a monolithic application. It is assumed to use a complete and consistent Knowledge Base and to have a global view of the problem. This model uses monotonic (closed-world) logic, and the intelligent solver searches an AND/OR-connection (reduction) graph (**Figure 1**).

The reduction graph shown in **Figure 1** corresponds to the following fragment of the formalized description of the problem domain (in the language of logical programming):

```prolog
RCSF(A,B,C,...,G,F) :- RCSF(A,B,C,...,G,f1); ... ; RCSF(A,B,C,...,G,fn).
RCSF(A,B,C,...,G,f1) :- RCS(A,B,C,f1), VS(G,A,B,C,f1).
RCS(A,B,C,F) :- SOVA(A,B,F), SOVB(B,C,F), SC(C,A).
SOVA(X,Y,F) :- SA(X,Y), FR1(X,Y,F).
SOVB(K,L,F) :- SB(K,L), FC(L,F).
FR1(G,M,R) :- FA(G,R), FB(M,R).
```

There are effective parallel algorithms for reduction graph processing. However, these algorithms can be used only in shared-memory multiprocessor (SMP) systems. They are not suitable for multicomputers, which use a completely different concurrency model: message-based distributed computing.

To implement parallel search algorithms on the reduction graph in multicomputer (cluster) systems, it would be possible, as an option, to organize a single virtual address space (Distributed Shared Memory, DSM) using page swapping. However, with the DSM mechanism, which is implemented by the Operating System or middleware, system performance is low. A more complex but predictable message-based model is preferable.

The Knowledge-Based Multi-Agent System uses a network of cooperating software agents instead of a single intelligent solver performing a parallel search on the

**Figure 1.** *AND/OR-connection graph.*


reduction graph. Each individual software agent has only partial knowledge of the problem and can solve only some subtask.

The KBMAS integrates models, methods, and tools of Distributed Artificial Intelligence, parallel computing, and Event-Driven Microservices technology. The software agents of the KBMAS are loosely coupled intelligent software modules that can be distributed, often on a large scale (**Figure 2**).

The use of decoupled components is a basic requirement for successful scaling. The opportunity to realize decoupled software modules is provided by the methodology of Object-Oriented Analysis and Design. The idea of decoupling underlies most Object-Oriented patterns, which can be successfully used to create a Knowledge-Based Multi-Agent System.

Computing nodes of the KBMAS are multiprocessors with shared memory. Software agents of the same node communicate with each other using Local InterProcess Communication (LIPC). To improve performance, the LIPC is implemented using specialized system libraries (LibOS). Software agents on different nodes communicate through message exchange implemented using standard libraries and middleware.

**Figure 3** shows the structure of a Multi-Agent Solver for one of the KBMAS nodes. Two types of applied software agents are used in the Multi-Agent Solver: cognitive and reactive. Applied software agents processing the domain knowledge use special Cognitive Data Structures (CDS). Four types of methods are implemented for applied software agents to work with the knowledge base: comparison (CMP), association (ASS), analysis (ANS), and specification (VAL). The CMP method is called when

**Figure 2.** *Structure of the KBMAS.*

**Figure 3.** *Multi-Agent Solver of the KBMAS node.*

comparing events or objects; the ASS method is used to get answers to queries about relations between objects and events; and the ANS method implements logical analysis of events. For object specification (the VAL method), both crisp queries to the knowledge base and fuzzy queries can be used [7].

Different types of membership functions can be used to implement fuzzy queries. **Figure 4** shows the examples of membership functions (μs) for two linguistic variables: Correctness and Completeness.

**Figure 5** shows the result of a fuzzy query that uses these membership functions.
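As a rough illustration of how such membership functions and a fuzzy query might be implemented, the sketch below uses trapezoidal μ functions for the linguistic variables Correctness and Completeness and ranks objects by the minimum (fuzzy AND) of their membership degrees. The breakpoints and objects are made up, not the ones in Figure 4:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function: 0 outside [a, d], 1 on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Illustrative membership functions for "high Correctness" / "high Completeness".
def mu_correctness(x):
    return trapezoid(x, 0.5, 0.8, 1.0, 1.1)

def mu_completeness(x):
    return trapezoid(x, 0.4, 0.7, 1.0, 1.1)

def fuzzy_query(objects):
    """Rank objects by the fuzzy AND (min) of the two membership degrees."""
    scored = [(min(mu_correctness(o["correctness"]),
                   mu_completeness(o["completeness"])), o["name"])
              for o in objects]
    return sorted(scored, reverse=True)

ranked = fuzzy_query([
    {"name": "A", "correctness": 0.9, "completeness": 0.9},
    {"name": "B", "correctness": 0.6, "completeness": 0.5},
])
# "A" ranks first with membership degree 1.0
```

Other t-norms (e.g. product instead of min) could equally be used for the fuzzy AND; min is the most common choice.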

In the problem-oriented Multi-Agent Solver presented in **Figure 3**, the priorities of the applied software agents are set according to the sequence number of the software agent. In this case, the first software agent uses the tables TableU\_1 and TableSovU\_1. The software agent with number N has the lowest priority and is associated with table TableU\_N.

A cognitive applied software agent coordinates the work of a group of reactive software agents. As an example, **Figure 6** shows the diagram of the states and transitions of one of the cognitive applied software agents.


**Figure 4.** *Examples of the membership functions.*



**Figure 5.** *Example of the result of a fuzzy query.*

After initialization, the transition to the "Selection" state occurs, in which the cognitive software agent selects the necessary knowledge source, taking into account the informative signals from the reactive agents. The cognitive agent then goes into the "Coordination" state, in which it coordinates the actions of the reactive software agents. If, at the next step, the reactive agents do not find a coordinated solution, the cognitive agent backtracks to the previous partial-solution state.
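A minimal sketch of this state-transition logic follows. The state names come from the description above; the `step` method and the stack of partial solutions are assumptions made for illustration, not part of the Multi-Agent-KB5 toolkit:

```python
class CognitiveAgent:
    """Sketch of the cognitive agent's state machine: Initialization ->
    Selection -> Coordination, with backtracking on coordination failure."""

    def __init__(self):
        self.state = "Initialization"
        self.partial_solutions = []           # stack of partial solutions

    def step(self, coordinated_ok, signals=None):
        if self.state == "Initialization":
            self.state = "Selection"
        elif self.state == "Selection":
            # select a knowledge source using the reactive agents' signals
            self.partial_solutions.append(signals)
            self.state = "Coordination"
        elif self.state == "Coordination":
            if not coordinated_ok:
                self.partial_solutions.pop()  # backtrack to previous partial solution
            self.state = "Selection"
        return self.state

agent = CognitiveAgent()
agent.step(True)                     # Initialization -> Selection
agent.step(True, "knowledge_src_1")  # Selection -> Coordination
agent.step(False)                    # no coordinated solution: backtrack
```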

The diagram of one possible option of interaction between the reactive agents of a node, each of which is connected to only one neighbor, is shown in **Figure 7**.

The organization of the Multi-Agent Solver is described in more detail in the paper [4].

A computing node of the KBMAS can use a concurrency model in which several computing threads are in the execution state, but only one thread is actually executing on the processor at any given time. This concurrency model uses shared memory to exchange information. Competing threads are described by a consistency model, which defines the order in which operations performed by local agents in a node are executed and the order in which the results of these operations are transmitted to the group members.

In addition to competitive (interleaved) concurrency, truly simultaneous (parallel) computations can be implemented in the KBMAS nodes. To implement parallel computing, Uniform Memory Access (UMA) multiprocessors are usually used. In this case, the whole set of software agents is divided into subsets (groups). Agents that belong to different groups can act simultaneously.

The execution of computing processes (threads) associated with groups of applied software agents is coordinated by system software agents, which provide access to information-computing resources of the multicomputer.

The agents are dynamically divided into groups using the compatibility matrix S and the inclusion matrix R. The compatibility matrix S has the following form (1):

$$\mathbf{S} = \begin{bmatrix} \mathbf{0} & s\_{12} & s\_{13} & \cdots & s\_{1M} \\ s\_{21} & \mathbf{0} & s\_{23} & \cdots & s\_{2M} \\ s\_{31} & s\_{32} & \mathbf{0} & \cdots & s\_{3M} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ s\_{M1} & s\_{M2} & s\_{M3} & \cdots & \mathbf{0} \end{bmatrix} \begin{array}{c} \mathbf{S}\_1 \\ \mathbf{S}\_2 \\ \mathbf{S}\_3 \\ \vdots \\ \mathbf{S}\_M \end{array} \tag{1}$$

**Figure 7.**

*Interaction scheme of the reactive software agents.*


where *s<sub>ij</sub>* = 1 if the agents *A<sub>i</sub>* and *A<sub>j</sub>* use different computing resources and can work in parallel; otherwise *s<sub>ij</sub>* = 0.

The distribution of agents into groups is based on the inclusion matrix R (2):

$$R = \begin{bmatrix} r\_{11} & r\_{12} & \cdots & r\_{1M} \\ r\_{21} & r\_{22} & \cdots & r\_{2M} \\ \cdots & \cdots & \cdots & \cdots \\ r\_{H1} & r\_{H2} & \cdots & r\_{HM} \end{bmatrix} \begin{array}{c} R\_1 \\ R\_2 \\ \vdots \\ R\_H \end{array} \tag{2}$$

where M is the number of agents and H is the number of groups. *r<sub>ij</sub>* = 1 if the agent *A<sub>i</sub>* is included in the group *Y<sub>j</sub>*. The agent *A<sub>i</sub>* is included in the group *Y<sub>j</sub>* if *S<sub>i</sub>* ∩ *R<sub>j</sub>* = ∅, that is, the corresponding matrix rows do not intersect. For optimal partitioning into subsets, it is necessary to consider the functional features of the software agents and their requirements for computing resources, as well as to know the structural organization of the KBMAS node used to implement parallel computations.
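Taking the inclusion rule literally (rows of S and R as 0/1 vectors, inclusion when they do not intersect), a grouping helper might look like the following. The function and the toy matrices are illustrative, not taken from the chapter's toolkit:

```python
def assign_to_group(S, R, i):
    """Return the index of the first group Y_j whose row R[j] does not
    intersect row S[i] (i.e. S_i and R_j share no position that is 1),
    or None if no group qualifies. S and R are 0/1 lists of lists."""
    for j, group_row in enumerate(R):
        if all(not (s and r) for s, r in zip(S[i], group_row)):
            return j
    return None

# Three agents (M = 3), two groups (H = 2); values are illustrative.
S = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
R = [[0, 1, 0],
     [1, 0, 0]]
```

For agent 0, row S[0] intersects R[0] (position 1) but not R[1], so the helper returns group 1.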

In the process of solving subtasks, software agents use microservices, which can be duplicated on different nodes of the computing system. The system software agents distribute the computational load and control the microservices based on the data provided by the monitor agents.

The behavior of applied software agents becomes more rational as the same problem is solved repeatedly. The applied software agent learns through a series of rewards and punishments. The agent's actions change the environment to a new state. The environment returns the next state and a reward to the agent. The cycle "state → action → reward" repeats until the problem is solved (**Figure 8**).

Signals *s<sub>t</sub>* correspond to a state, *a<sub>t</sub>* to an action, and *r<sub>t</sub>* to a reward at time *t*. The strategy according to which the agent chooses actions is a function that maps the set of states into the set of actions. The agent's task is to choose (by trial and error) the best action, the one that maximizes the target function: the sum of the rewards received by the software agent.

Various reinforcement learning algorithms can be used. The most effective is the Actor-Critic algorithm [8–10], in which the strategy generates actions and the value function critiques them.

Function approximation is the basis of reinforcement learning. As a method of function approximation, the KBMAS uses a multilayer neural network that models both the policy function and the value function.

**Figure 8.** *The reinforcement learning control loop.*

The advantage function A = Q(s, a) - V(s) is used to generate reinforcing signals. The benefit V(s) that can be obtained by reaching a particular state s is estimated by the agent before the action, while the action-value function Q(s, a) is evaluated after the action has been taken.
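Numerically, the reinforcing signal is just the difference between the two estimates. In the KBMAS both estimates would come from the neural network; the dictionary values below are placeholders for illustration:

```python
# Placeholder estimates; in practice Q and V come from the critic network.
V = {"s0": 1.0}                                   # state-value estimates V(s)
Q = {("s0", "left"): 0.5, ("s0", "right"): 1.8}   # action-value estimates Q(s, a)

def advantage(state, action):
    """A = Q(s, a) - V(s): positive means the action beat the expectation."""
    return Q[(state, action)] - V[state]
```

Here `advantage("s0", "right")` is positive (the action did better than expected) and `advantage("s0", "left")` is negative, so the two actions would be reinforced and discouraged, respectively.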

#### **3. Aggregation of distributive information structure**

The information data structures created can have a large volume and dimension, so they are loaded and processed in fragments. Logically interrelated data should be divided into a number of clusters that have the smallest interconnection, subject to constraints on the dimensionality of the clusters and on the degree of semantic proximity of the logical records included in them. No less important is the choice of the type of data storage system used [11, 12].

Let us introduce the variable *B<sup>i</sup><sub>kj</sub>*, which characterizes the handling by the k-th query of the i-th information element located in the j-th logical record. Variable *x<sub>ij</sub>* = 1 if the i-th data partition is selected in the j-th logical record; *x<sub>ij</sub>* = 0 otherwise. Variable *a<sub>ik</sub>* = 1 if the i-th data partition is included in the k-th query; otherwise *a<sub>ik</sub>* = 0.

Variable *B<sup>i</sup><sub>kj</sub>* characterizes the use of the j-th logical record by the k-th query (3):

$$B\_{kj}^i = \begin{cases} \mathbf{1}, \text{ when } \sum\_{i=1}^I a\_{ik} \mathbf{x}\_{ij} \ge \mathbf{1}; \\\\ \mathbf{0}, \text{ when } \sum\_{i=1}^I a\_{ik} \mathbf{x}\_{ij} = \mathbf{0}; \end{cases} \tag{3}$$
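Eq. (3) is just an indicator over the products a<sub>ik</sub>·x<sub>ij</sub>; a one-line helper (with hypothetical argument names) makes this explicit:

```python
def record_used_by_query(a_k, x_j):
    """B_kj = 1 if sum_i a_ik * x_ij >= 1, i.e. the k-th query touches at
    least one data partition stored in the j-th logical record (eq. 3).
    a_k: per-partition flags of query k; x_j: per-partition flags of record j."""
    return int(sum(a * x for a, x in zip(a_k, x_j)) >= 1)
```

For example, a query touching partitions {0, 2} uses a record holding partition 2, but not a record holding only partition 1.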

Note that the results obtained in solving such a problem are important from a practical point of view, given the constraints on designing an acceptable data structure and the need to generate fast queries for sampling and editing distributed structures.

Let's analyze this algorithm step by step.

#### **4. Approximate algorithm for distributing data clusters between the server and local network clients**

At this stage, the distribution of data batches for storage and processing is determined by the criterion of minimum total traffic.

We reduce the canonical graph of the data structure to an unconnected graph and calculate the weight of each data batch, which consists of the weight of the batch itself and the weight of its arcs (links), taking into account the requirements of the network clients (4):

$$\mathcal{W}\_i = \mathcal{W}\_i^{part} + \mathcal{W}\_{iq}^{link} \tag{4}$$

where *W<sup>part</sup><sub>i</sub>* is the total weight of the data partitions (5) and *W<sup>link</sup><sub>iq</sub>* is the weight of the arcs of the canonical data structure graph (6).

$$\mathcal{W}\_i^{part} = \sum\_{k=1}^{k\_0} \sum\_{p=1}^{p\_0} \gamma\_{kp}^Q \delta\_{kp}^Q \theta\_{pi} \tag{5}$$

*On an Approach to Knowledge Management and the Development of the Knowledge-Вased… DOI: http://dx.doi.org/10.5772/intechopen.106738*

$$\mathcal{W}\_{iq}^{link} = \sum\_{k=1}^{k\_0} \sum\_{p=1}^{p\_0} \gamma\_{kp}^Q \delta\_{kp}^Q \theta\_{pi} \sum\_{q \neq i}^I \theta\_{pq} a\_{iq}^G \tag{6}$$

The weight of the i-th data batch is calculated by the formula (7):

$$\mathcal{W}\_i = \sum\_{k=1}^{k\_0} \sum\_{p=1}^{p\_0} \gamma\_{kp}^Q \delta\_{kp}^Q \theta\_{pi} \left( 1 + \sum\_{q \neq i}^I \theta\_{pq} a\_{iq}^G \right) \tag{7}$$

where γ<sup>Q</sup><sub>kp</sub> is the frequency of generation of user requests; δ<sup>Q</sup><sub>kp</sub> are the elements of the user query generation matrix; θ<sub>pi</sub> is the matrix of data partition usage when executing queries; and a<sup>G</sup><sub>iq</sub> is the semantic contiguity matrix of the data partitions.
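Eq. (7) can be transcribed into code almost verbatim. The function below treats gamma, delta, theta, and aG as plain list-of-lists stand-ins for the matrices just defined (the names and any example values are illustrative):

```python
def batch_weight(i, gamma, delta, theta, aG):
    """W_i per eq. (7): for each query k and partition p, accumulate
    gamma[k][p] * delta[k][p] * theta[p][i] *
    (1 + sum over q != i of theta[p][q] * aG[i][q])."""
    I = len(aG)                          # number of data partitions
    k0, p0 = len(gamma), len(gamma[0])   # queries x partitions per query matrix
    return sum(
        gamma[k][p] * delta[k][p] * theta[p][i]
        * (1 + sum(theta[p][q] * aG[i][q] for q in range(I) if q != i))
        for k in range(k0) for p in range(p0)
    )
```

With one query (frequency 2), one usage row covering two partitions, and a symmetric contiguity matrix, the weight of partition 0 works out to 2 · (1 + 1) = 4.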

In the next step, the local network graph is converted into an unconnected graph with the calculation of the weight of each node (8):

$$\mathcal{W}\_r = t\_r + \sum\_{m \neq r}^{R\_0} t\_{rm} \tag{8}$$

where *t<sub>r</sub>* is the total average duration of data processing in the r-th node, consisting of the time for decomposing the query into subqueries, route selection, connection establishment, etc.;

*t<sub>rm</sub>* is the average duration of data transmission between nodes, determined from the matrix of logical distances between the servers of the local network nodes.

Next, the matrix *V* = ‖ω<sub>ir</sub>‖ is formed, the elements of which are the products of the weight of each node and the weight of each data partition (9):

$$\omega\_{ir} = W\_i \times W\_r \ \text{ for } i = \overline{1, I}; \ r = \overline{1, R\_0}. \tag{9}$$

In the final step of the first stage, the problem (10)

$$\min\_{\{x\_{ir}\}} \sum\_{i=1}^{I} \sum\_{r=1}^{R\_0} \omega\_{ir} x\_{ir} \tag{10}$$

is solved under the constraints:

• by the number of data partitions, the localization of which is possible on one node (11)

$$\sum\_{i=1}^{I} x\_{ir} \le N\_r, \ r = \overline{1, R\_0} \tag{11}$$

• on the permissible redundancy of groups by network nodes (12)

$$\sum\_{r=1}^{R\_0} x\_{ir} \le M\_i, \ i = \overline{1, I} \tag{12}$$

• on the amount of available external memory of the data storage system (13)

$$\sum\_{i=1}^{I} x\_{ir} \rho\_i \pi\_i \le \eta\_r^{ROM} \tag{13}$$

where ρ<sub>i</sub> is the vector of group lengths in bytes; π<sub>i</sub> is the vector of the number of instances in groups; η<sup>ROM</sup><sub>r</sub> is the amount of available memory on the server of the *r*-th host; *x<sub>ir</sub>* = 1 if the *i*-th data partition is included in the r-th network node; *x<sub>ir</sub>* = 0 otherwise.
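To make the optimization concrete, here is a hedged sketch of a greedy heuristic for problem (10)-(13): each partition goes to the cheapest node with spare capacity. Only the per-node count constraint (11) is enforced; redundancy (12) and memory (13) are omitted for brevity, and the instance data are made up:

```python
def greedy_localize(omega, N):
    """Greedy heuristic: place each data partition i on the node r with the
    smallest weight omega[i][r] among nodes with remaining capacity N[r]."""
    load = [0] * len(N)          # partitions already placed on each node
    placement = {}
    for i, costs in enumerate(omega):
        for r in sorted(range(len(costs)), key=lambda r: costs[r]):
            if load[r] < N[r]:   # constraint (11): at most N_r partitions
                placement[i] = r
                load[r] += 1
                break
    return placement

# Toy instance: three partitions, two nodes (all values illustrative).
placement = greedy_localize(omega=[[1, 5], [1, 5], [9, 2]], N=[1, 2])
```

A greedy pass gives a feasible (not necessarily optimal) assignment; the chapter's approximate algorithms address the full constrained problem.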

#### **5. Problem of distributing the data batches of each agent to the types of logical records**

The problem posed at this stage is solved using the criterion of the least total time of local data processing by agents in each network node. The number of aggregation tasks at this stage is determined by the number of local network nodes.

The initial data are the subgraphs of the canonical data structure graph, the temporal and volumetric characteristics of these subgraphs, and the set of requests from users and network nodes.

The aggregation problem is solved here using approximate algorithms with restrictions:

• on the number of groups (14):

$$\sum\_{i=1}^{I} x\_{it} \le F\_t, \ \forall t = \overline{1, t\_0}, \tag{14}$$

where *F<sub>t</sub>* is the number of groups in the *t*-th record;

• on the non-repeatability of including groups in a record (15):

$$\sum\_{t=1}^{t\_0} \mathbf{x}\_{it} = \mathbf{1}, \ \forall i = \overline{\mathbf{1}, I} \tag{15}$$

• on the cost of information storage.

The main cost characteristics of the distributive data structure are the cost *E<sub>ds</sub>* of storing information, the cost *E<sup>Q</sup><sub>run</sub>* of executing requests and transactions in a given time interval, and the cost *E<sup>C</sup><sub>run</sub>* of transmitting information via communication channels.

The sum of these components determines the total cost (16):

$$E = E\_{ds} + E\_{run}^{Q} + E\_{run}^{C} \tag{16}$$

The cost of storing distributed information is determined by the physical volume of the information and the cost of storing a unit of information volume (one logical record) on the server. If we assume that the cost of storage in all nodes of the local network is constant, then *E<sub>ds</sub>* = *V<sub>L</sub>* · *k<sub>ds</sub>*.

That is, it is the product of the logical volume of stored information (*V<sub>L</sub>*) and a coefficient that accounts for the storage overhead on the media when organizing the database (*k<sub>ds</sub>*; in practice, approximately 1.2–1.5).
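A toy computation of the total cost, eq. (16), with made-up figures (all numbers are placeholders, chosen only to show how the components combine):

```python
# Illustrative cost-model computation; every figure here is a placeholder.
V_L = 10_000                   # logical volume of stored information (records)
k_ds = 1.3                     # storage overhead coefficient (1.2-1.5 in practice)
E_ds = V_L * k_ds              # storage cost: E_ds = V_L * k_ds
E_Q_run = 450.0                # cost of executing requests/transactions
E_C_run = 120.0                # cost of transmission over communication channels
E_total = E_ds + E_Q_run + E_C_run   # total cost, eq. (16)
```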

The cost of executing multiple user requests at a given time interval is the sum of the cost of servicing multiple user requests on servers and the cost of transmitting information through communication channels during the execution of user requests.

The cost of performing transactions at a given time interval is also determined by the sum of the cost of performing steps (tasks) of transactions on server nodes and the cost of transferring transaction requirements to server nodes, fixing transactions and removing locks.

• on the total time of servicing operational requests on servers (17):

$$\sum\_{r=1}^{r\_0} \sum\_{t=1}^{t\_0} B\_{pr}^t \cdot (t\_q + t\_l) < T\_p \tag{17}$$

where *T<sub>p</sub>* is the permissible service time of the p-th operational request; *t<sub>q</sub>* is the average duration of generating one request (or one step of a transaction task); and *t<sub>l</sub>* is the average processing time for one logical record on a local network host/server.

Variables *B<sup>t</sup><sub>pr</sub>* determine the types of logical records used by the *p*-th query in the *r*-th node of the computing system.

As a result, logical database structures are determined for each network node.

#### **6. Localization of data by network nodes**

This step uses the results of the previous steps and the characteristics of the data warehouse.

As a result of the proposed algorithm, localization matrices for a set of batches of data are formed by types of logical records (the result of the first stage), and then groups of records by local network nodes. In this case, the running time of the algorithms is additionally estimated.

#### **7. Conclusion**

The chapter examined the structural and functional organization of the Multi-Agent System, which uses intelligent applied and system software agents. To support the development of a problem-oriented KBMAS based on the considered agent-based models, the Multi-Agent-KB5 toolkit is used. This toolkit includes interactive wizards and property panels for creating groups of applied (reactive and cognitive) software agents. Including system software agents in the KBMAS, which implement algorithms for planning and managing computational resources while taking into account the specifics of the interaction of applied agents, increases the performance of the system. KBMAS performance is also improved by optimizing the logical structures of the distributed knowledge base.


### **Author details**

Evgeniy Zaytsev\* and Elena Nurmatova MIREA—Russian Technological University, Moscow, Russia

\*Address all correspondence to: zajcev@mirea.ru

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**

[1] Wooldridge M. An Introduction to Multi-Agent Systems. 2nd ed. John Wiley & Sons Ltd; 2009. p. 488. ISBN: 978-0-470-51946-2

[2] Baranauskas R, Janaviciute A, Jasinevicius R, Jukavicius V. On multiagent systems intellectics. Information Technology and Control. 2015;**1**:112-121

[3] Houhamdi Z, Athamena B, Abuzaineddin R, Muhairat M. A multi-agent system for course timetable generation. TEM Journal. 2019;**8**:211-221

[4] Zaytsev EI, Khalabiya RF, Stepanova IV, Bunina LV. Multi-agent system of knowledge representation and processing. In: Proceedings of the Fourth International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'19). Springer; 2020. pp. 131-141

[5] Darweesh S, Shehata H. Performance evaluation of a multi-agent system using Fuzzy Model. In: 1st International Workshop on Deep and Representation Learning (IWDRL). 2018. pp. 7-12

[6] Aly S, Badoor H. Performance evaluation of a multi-agent system using Fuzzy Model. In: 1st International Workshop on Deep and Representation Learning (IWDRL). Cairo; 2018. pp. 175-189

[7] Zaytsev EI. Method of data representation and processing in the distributed intelligence information systems. Automation Modern Technologies. 2008;**1**:29-34

[8] Red'ko VG. Evolution, Neural Networks, Intelligence: Models and Concepts of Evolutionary Cybernetics. Moscow: LIBROKOM; 2013

[9] Graesser L, Keng WL. Foundations of Deep Reinforcement Learning. Addison-Wesley Professional; 2020. p. 416. ISBN: 978-0135172384

[10] Raj JS, Ananthi JV. Recurrent neural networks and nonlinear prediction in support vector machines. Journal of Soft Computing Paradigm (JSCP). 2019;**1**: 33-40

[11] Batouma N, Sourrouille J. Dynamic adaption of resource aware distributed applications. International Journal of Grid and Distributed Computing. 2011; **4**(2):25-42

[12] Nurmatova EV, Gusev VV, Kotliar VV. Analysis of the features of the optimal logical structure of distributed databases. In: Collection of Works of the 8th International Conference "Distributed Computing and Grid Technologies in Science and Education". Dubna; 2018. p. 167

## Section 3

Advanced Developments in Multi-Agent Technologies and Machine Learning Creating Potential for Their Further Integration

### **Chapter 5**

## Modeling Electric Vehicle Charging Station Behavior Using Multiagent System

*Jaslin Shaleem Khan, Malligama Arachchige Uditha Sudheera Navaratne and Janaka Bandara Ekanayake*

#### **Abstract**

Agent-based models (ABMs) are a type of simulation in which a large number of self-sufficient agents interact in ways that combine stochastic and deterministic behavior. Recently, there has been renewed interest in utilizing multiagent systems (MASs) to obtain more granular data relating to specific conditions. MESA is an ABM framework for Python. It enables users to quickly develop ABMs with built-in core components, view them in a browser-based interface, and evaluate their findings with Python's data analysis capabilities. This chapter depicts an ABM of a photovoltaic (PV)-powered electric vehicle (EV) charging station in a university car park, modeled using MESA. The goal is to determine the preliminary requirements for PV-powered EV charging stations, which would result in increased PV utilization and cost benefits.

**Keywords:** agent-based model (ABM), MESA, multiagent system (MAS), photovoltaic (PV), EV charging station

#### **1. Introduction**

Agent-based applications are becoming mainstream in a wide range of domains, including e-commerce, logistics, supply chain management, telecommunications, healthcare, engineering, and manufacturing, as technology advances [1]. Multiagent systems (MASs) have emerged as a new software technology that combines a number of artificial intelligence (AI) techniques. They provide a more efficient and natural alternative for building intelligent systems, thereby offering a solution to the complex real-world problems that must be solved. Autonomy, complexity, adaptability, concurrency, communication, distribution, mobility, security and privacy, and openness are some of the properties of MASs [2, 3]. The MAS is also a trending technology in power engineering applications, such as power system restoration, power system optimization, market simulation/electricity trading, and smart grid control [4]. The MAS's ability to deal with complex problems through agents, highlighted in various research works, is the basis for using the MAS in many applications.

MAS-based models are used globally to implement demand-side management (DSM) systems, cost optimization, robustness management in microgrids (MGs), and controlling voltage and thermal constraints in distribution networks.

In [5], an MAS-based online voltage monitoring system was proposed. In [4, 6], a dual-layered advanced control and security system based on the MAS, as well as an automated meter reading facility in a smart grid distribution network, was proposed. DSM has been activated from a global research standpoint by maximizing the use of distributed energy sources, environmentally friendly technologies, optimizing algorithms, and the implementation of renewable energy (RE) resources [7–10]. The use of renewable energy sources in smart grid distribution systems, as well as the multi-MG model, lowers consumer electricity costs [11, 12].

As the number of electric vehicles (EVs) increases, better charging infrastructure is required to provide the necessary energy for mobility with cost benefits. MASs have seen a surge in popularity in recent years, and they are now widely used in EV-based power system research. In [13], an MAS-based modeling tool was proposed to assess the effects of EV charging on Singapore's energy grid; the study examined the effects of EV temperature (air conditioning) and EV charging load during charging. In [14], a state of charge (SOC)-based charging algorithm was suggested, which distinguishes controlled charging—vehicle to grid (V2G) and grid to vehicle (G2V)—from uncontrolled charging. This reduces the number of cars that run out of power on their next trip. In [15], a decentralized and intelligent MAS for controlling and managing EV charging in low-voltage (LV) distribution networks was presented. Three case studies were investigated: without EV charging regulation (the uncoordinated or dumb-charging case); with EV charging regulation but without voltage charging control; and with EV charging regulation and voltage droop control. The simulation results showed that charging regulation provides significant benefits in terms of voltage control when compared to the other two situations. The solution proposed in [14] was improved with active demand (AD) program management in [16], which enabled the incorporation of EVs into the system while mitigating their negative impact on voltage regulation.

Lee et al. [17] discussed the impact of EVs on the electric grid, tested with ABM using real data from the "My Electric Avenue" initiative. It examined how consumers' use of time-of-use (ToU) tariffs and vehicle range (battery capacity) preferences affect total and peak demand fluctuations at the local substation. To depict the complexity of an electric transportation system, an agent-based EV implementation was created [18]; after applying various charging powers and charging patterns, the results were presented qualitatively. In [19], the effects of influencing variables on EV charging demand, such as driver behavior, charging station location, and electricity pricing, were investigated. An ABM in NetLogo was used in this study to simulate aggregate human behavior and its impact on load demand due to EV charging. In [20], ABM simulation was used to study alternative charging infrastructure rollout techniques for large-scale EV adoption. The simulation included a variety of user types (residents, visitors, taxis, and sharing), as well as various types of charging infrastructure (level 2, clustered level 2, and fast charging).

*Modeling Electric Vehicle Charging Station Behavior Using Multiagent System DOI: http://dx.doi.org/10.5772/intechopen.105613*

An EV powered by an intermittent power source is an excellent way to ensure low-cost, emission-free mobility. An energy storage management hybrid optimization algorithm was presented in [21]. This algorithm flipped between deterministic and rule-based modes of operation depending on the power pricing band allocation. The cost degradation model and the levelized cost of photovoltaic (PV) power were combined in the case of PV-integrated charging stations with on-site energy storage systems. An agent-based charge station model utilizing renewable energy (RE) was proposed in [22]. Charging patterns were determined by scenarios including various RE capacities, policy interventions, limited versus unlimited charging capacity, social charging, and the existence or absence of central control. It was determined that, in order to improve sustainable charging, policymakers should employ various incentives for different categories of EV drivers.

#### **Figure 1.**

*An overview of this study and its state of the art.*

To aid comprehension of the state-of-the-art review and to clarify the study's contributions, **Figure 1** is added.

The rest of this chapter is structured as follows. Section 2 presents the modeling of an ABM of a PV-based charging station. Section 3 includes the simulation results and analysis of various scenarios. Section 4 draws the conclusion.

#### **1.1 Solar PV-based EV charging station: Application of MAS**

This chapter discusses the development of an agent-based EV charging station on university premises. It is primarily being developed as a solar PV-powered charging system. The objectives of this AB charging system are to optimize the benefits of cost and direct solar supply in a quantitative manner. This study makes two major contributions:


A survey was used to collect information about the available motor vehicles in the university car parks. A future EV integration database has been created in accordance with that.

#### *1.1.1 Modeling of the system*

The simulation model is built with a variety of agents, including an EV agent, a weather agent, a solar panel agent, a main control agent (MCA), a utility agent, a charging control agent (CCA), and a charge station battery agent. The solar panel agent generates energy based on the weather condition's temperature and irradiance values, which can be accessed via the weather agent. The university weather broadcasting inverter portal was used to create the weather agent database. Each charging control agent in the developed agent-based system can manage the EVs' charge while taking into account energy prices and the EV agent's requirements (the EVs' SOC at arrival, charging option, and arrival time), based on one of the following two scenarios: SOC-based or TOU-based tariffs. The EV agent represents the EV owner, who has the option of interacting/charging with the user interface in a range of ways. The charging control agent's energy scheduling is sent to the main control agent, which evaluates the overall energy supply agents' (utility agent, solar panel agent, and charge station battery agent) performance using optimizing algorithms. An illustration of the agent-based EV charging station system is presented in **Figure 2**.

**Figure 2.** *The system architecture of the multiagent simulation platform.*

#### *1.1.2 Agents' validation*

Each of the agents was validated with proper testing results before starting to simulate the system.

I. Solar panel agent: The PV array power calculation system of HOMER Pro 3.14 was used to validate the solar panel agent. The results revealed only minimal variation, as shown in **Table 1**.


**Table 1.**

*Total PV array output from the simulation model and Homer software.*



#### **Table 2.**

*The NEDC and WLTP testing results of the ABM simulation.*


#### **Table 3.**

*The EPA testing results of the simulation.*


#### **Table 4.**

*The EPA testing results for EV battery simulation.*


#### **2. ABM with MESA**

#### **2.1 MESA simulation software**

The ABM of the EV charging station is modeled based on a multigrid scenario in MESA. MESA, a platform that provides a Python environment for agent behavior, is used to implement the agent-based control system. It contains the model (model, agent, schedule, and space), analysis (data collector and batch runner), and visualization (visualization server and visualization browser page) components [24]. A portion of the Python-based MESA simulation source code is depicted in **Figure 3**.

The interaction of agents begins with the EV agent, which sends SOC information and charging requirements to its charging control agent (CCA). Each CCA then forwards the request to the main control agent (MCA). The MCA sends requests to each energy resource, including the solar agent, charge station battery agent, and utility grid agent, and then calculates the optimal charging schedule based on the charging scenario (SOC-based or TOU-tariff-based). The MCA coordinates each step with each energy resource until the charging process is complete. Each iteration performs its internal operations in MESA, such as energy-management calculations, and each agent is responsible for its own tasks.

Solar agent: It generates solar energy as per temperature and irradiance data, which is accessed from the weather agent.

**Figure 3.** *The portion of source code—model.py.*

**Figure 4.** *The activity diagram of the ABM.*

Utility agent: In this ABM, there is no power limit for the utility agent. It serves as a backup supply, allowing PV sources to sell excess energy and EVs to obtain power, depending on the power management strategy.

Main control agent: It serves as a coordinator, receives requests for EV charging from each charge pole agent, and performs internal calculations in accordance with management scenarios.

The ABM's activity diagram is depicted in **Figure 4**. It shows the information flows as well as agent internal operations.
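The request flow just described can be sketched in plain Python. This is an illustrative stand-in, not the chapter's actual MESA `model.py` code: the class and method names, resource capacities, and prices below are all assumptions.

```python
# Illustrative stand-in for the EV -> CCA -> MCA request flow described above.
# Names (EVAgent, ChargingControlAgent, MainControlAgent) and the resource
# capacities/prices are assumptions, not the authors' actual model.py.

class EVAgent:
    def __init__(self, ev_id, soc):
        self.ev_id, self.soc = ev_id, soc

class MainControlAgent:
    """Queries each energy resource and picks the cheapest one that can serve."""
    def __init__(self, resources):
        self.resources = resources  # name -> (available_kW, cost_per_kWh)

    def schedule(self, ev):
        # SOC-based rule from Section 2.3.1: fast charge below 50% SOC.
        needed_kw = 30.0 if ev.soc < 0.5 else 6.6
        cost, name = min((cost, name)
                         for name, (avail, cost) in self.resources.items()
                         if avail >= needed_kw)
        return {"ev": ev.ev_id, "power_kw": needed_kw, "source": name}

class ChargingControlAgent:
    """Forwards each EV's SOC and charging requirement to the MCA."""
    def __init__(self, mca):
        self.mca = mca

    def request_schedule(self, ev):
        return self.mca.schedule(ev)

mca = MainControlAgent({"solar": (20.0, 0.0),
                        "battery": (10.0, 0.05),
                        "grid": (1000.0, 0.20)})
cca = ChargingControlAgent(mca)
plan_avg = cca.request_schedule(EVAgent("EV-1", soc=0.65))   # solar, 6.6 kW
plan_fast = cca.request_schedule(EVAgent("EV-2", soc=0.30))  # grid, 30 kW
```

In MESA, the same flow would be spread over agent `step` methods driven by the scheduler; here it is collapsed into direct method calls for readability.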

#### **2.2 Knowledge representation of agents**

ABM with AI provides a real-time application that is extremely beneficial to all industries. Its popularity stems from its adaptability in different subfields (reasoning, knowledge representation, machine learning (ML), planning, coordination, communication, and so on). First, consider a few advantages of this combined technique: ABM drives emergent phenomena, precisely defines a natural system, and is flexible [2, 25].

Most complex real-world systems are only partially decomposable, and one solution would be to give the components the ability to decide on the nature and scope of their interactions at run time. Still, when combined with ML, ABM has the potential to create a new type of computing based on agents—by learning agents' behavioral patterns.

Many ABMs can easily incorporate various ML techniques such as genetic algorithms (GAs), neural networks (NNs), and Bayesian classifiers. The combined framework has two interlocked cycles for examining input, making decisions, and producing output: the ML algorithm uses the ABM as an environment, while the ABM uses the ML algorithm to maintain the agents' internal models [26]. **Figure 5** depicts how ML techniques are used in ABM within this framework.

The weather agent in our solar-powered EV charging station model is updated with ML techniques to refresh its temperature and irradiance values. Excel is also used to generate the observed dataset from the faculty weather portal for an ML tool; it can be a valuable addition to an ML toolkit, aiding the visualization and analysis of smaller datasets.

**Figure 5.** *The integrated cycles of ABM and ML [26].*
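As one concrete possibility, the weather agent's update could be as simple as an exponential-smoothing forecast over the portal's recorded readings. The chapter does not specify the exact ML technique, so this is an assumed sketch with made-up irradiance values.

```python
# Hedged sketch: exponential smoothing as a stand-in for the ML technique
# that updates the weather agent's temperature/irradiance values.

def smooth_forecast(observations, alpha=0.5):
    """Return the exponentially smoothed estimate of the next observation."""
    estimate = observations[0]
    for x in observations[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate

irradiance = [820, 840, 810, 860]      # W/m^2, illustrative readings
pred = smooth_forecast(irradiance)     # -> 840.0
```

A real deployment would likely replace this with a trained regression or time-series model over the full portal dataset.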

#### **2.3 Simulation and results**

The simulations are evaluated in terms of direct solar benefit (DSB) and cost-benefit analysis. DSB is the solar PV contribution to the requested EV demand, calculated from Eq. (1).

$$\text{Total DSB} (\%) = \frac{\text{Direct solar supply for EV charging}}{\text{Total EV demand}} \times 100\% \tag{1}$$

where

$$\text{Direct solar supply for EV charging} = \text{Available solar generation} - \text{Remaining solar energy}.$$
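In code, Eq. (1) and its supporting definition amount to the following; all kWh figures here are made up for illustration.

```python
# Direct solar benefit (DSB) from Eq. (1); the energy figures are illustrative.

def direct_solar_benefit(available_solar_kwh, remaining_solar_kwh, total_ev_demand_kwh):
    """DSB (%) = direct solar supply for EV charging / total EV demand * 100."""
    direct_supply = available_solar_kwh - remaining_solar_kwh
    return 100.0 * direct_supply / total_ev_demand_kwh

dsb = direct_solar_benefit(120.0, 30.0, 150.0)  # (120 - 30) / 150 * 100 = 60.0 %
```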

The system is simulated in 5-min intervals, in two ways: SOC-based charging with a flat tariff and TOU-tariff-based charging.

The ABM of the EV charging station is initially investigated using an SOC-based flat-tariff charging scenario. For EVs, three charging algorithms have been developed: uncontrolled, vehicle to grid (V2G), and grid to vehicle (G2V). TOU-tariff-based charging is simulated with slow, average, and fast charging options. The simulations have yielded numerical results for DSB and cost benefits.

#### *2.3.1 SOC-based charging with flat tariff*

Uncontrolled charging: This implies that when the EV is connected to the university charging station, the battery is charged until it reaches the maximum SOC or is disconnected.

Vehicle to grid (V2G): It converts EVs into energy storage systems, allowing any excess energy stored in the EV's battery to be injected back into the grid, governed by the EV's SOC. When the EV battery pack's SOC falls below 50%, the EV charges by fast charging (30 kW); otherwise, it charges by average charging (6.6 kW) until the SOC reaches 80%. If the EV's SOC is greater than 80%, the energy in the EV battery can be pushed back into the electrical grid.

Grid to vehicle (G2V): It allows the EVs to charge with controlled charging while parked in the university car park. If the EV's SOC falls below 50%, it immediately begins charging either through fast charging or through average charging.
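The SOC thresholds and charging powers above can be summarized as simple decision rules. This is a sketch of our reading of the text; the function names are ours.

```python
# Decision rules as described above: thresholds 50%/80% SOC and powers
# 30 kW (fast) / 6.6 kW (average) come from the chapter; names are ours.

def v2g_action(soc):
    """V2G: charge below 80% SOC, inject energy back to the grid above it."""
    if soc < 0.5:
        return ("charge", 30.0)    # fast charging below 50% SOC
    if soc < 0.8:
        return ("charge", 6.6)     # average charging up to 80% SOC
    return ("discharge", 0.0)      # push energy back into the grid

def g2v_action(soc):
    """G2V: controlled charging only, no grid injection."""
    return ("charge", 30.0 if soc < 0.5 else 6.6)
```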

**Figures 6**–**8** show the simulation of the above three charging scenarios. GE—grid energy, BSE—battery storage energy, and SE—solar energy.

When the PV energy supply is insufficient to fully charge the EVs, the stationary storage charges the EV, and the energy is supplied by the public grid. The main drawback is that PV energy supply does not completely benefit EV charging and that reliance on the public grid increases when charging is uncontrolled.

This ABM is modeled to serve as a university charging station (workplace charging). A large number of EVs are likely to be parked for a longer duration. To increase the direct solar benefit of EV charging, slow charging (2.3 kW) is combined with G2V charging.

**Figure 6.** *Grid, station battery, and solar energy distribution for uncontrolled charging.*

**Figure 7.** *Grid, station battery, and solar energy distribution for V2G charging.*


**Figure 8.** *Grid, station battery, and solar energy distribution for G2V charging.*

Flat-cost charging is used to simulate SOC-based charging; the three charging modes have the same unit cost. **Figures 6**–**8** show how uncontrolled charging and G2V result in the highest demand peak from the grid. V2G has a lower peak because it allows only a few EVs to charge. According to our database, the majority of EVs are not far from our faculty. We also include simulation assumptions, such as that the EVs begin their journey each day with 100% SOC. When EVs arrive at the faculty, they have an average SOC of more than 80%, since they are not far away. This causes them to inject excess energy into the grid as an energy source until the SOC reaches 80%, which helps reduce the grid's need for additional energy generation and the demand for power supply resources. **Figure 9** shows the simulation observation of G2V slow charging.

In contrast, V2G and G2V obtain energy from the available energy resources and charge the vehicle under controlled charging. Fast charging is applied when the SOC falls below 50%, preserving the lifetime of the battery by avoiding a full depth of discharge.

G2V controlled charging is combined with slow charging to improve the DSB. It reduces grid dependency while improving the DSB, as illustrated in **Figure 9**.

**Figure 10** shows the DSB calculated for all SOC-based charging strategies. When uncontrolled or G2V charging is enabled, EVs charge in average charging (6.6 kW) mode, and both yield almost the same DSB. In V2G, few cars need to be charged by the charging station, resulting in a moderate DSB value. When G2V is paired with slow charging, the DSB rises.

**Figure 10.**

*DSB percentage for SOC-based charging scenarios.*

#### *2.3.2 TOU-tariff-based charging*

The TOU-based charging scenario simulates three different charging options: slow, average, and fast, with charging powers of 2.3 kW, 6.6 kW, and 30 kW, respectively. In TOU-tariff-based charging, two major streams are considered: cost and direct solar benefit. Various charging station start times are simulated to determine the most advantageous point for solar PV-based charging. The simulation observations are depicted in **Figures 11**–**13**.
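A minimal sketch of how a TOU tariff turns a charging option into a cost follows. The tariff bands and LKR rates here are hypothetical; the chapter compares against CEB tariffs but does not list the rates.

```python
# Illustrative TOU cost calculation. The tariff bands and LKR rates below are
# hypothetical assumptions, not the CEB tariffs used in the chapter.

TOU_RATE_LKR_PER_KWH = {"off_peak": 20.0, "day": 35.0, "peak": 50.0}

def charging_cost(power_kw, hours, band):
    """Energy (kWh) times the tariff rate for the given TOU band."""
    return power_kw * hours * TOU_RATE_LKR_PER_KWH[band]

slow_cost = charging_cost(2.3, 4.0, "day")    # slow option (2.3 kW) for 4 h
fast_cost = charging_cost(30.0, 0.5, "peak")  # fast option (30 kW) for 30 min
```

Shifting the same energy into a cheaper band is exactly what makes the TOU scenario cost-effective for station operators and EV owners.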

This system is simulated with different station charging start times for the EV to see how PV energy affects EV charging. **Figure 14** summarizes the simulation results.

According to the findings, charging that begins at 11 a.m. has the highest DSB. However, our database shows that a few cars leave the faculty before 11 a.m. In this simulation, fast charging is assumed to have a 30-kW unit charging power, which is excessive for a few EVs; as a result, the total demand to be met is reduced. Apart from that, a charging start time of 10 a.m. yields a higher DSB. Slow and average charging achieve better DSB than fast charging: the proportion of PV charging increases, while reliance on the public grid decreases.


**Figure 12.** *Grid, station battery, and solar energy distribution for average charging.*

**Figure 13.** *Grid, station battery, and solar energy distribution for fast charging.*

#### **Figure 14.**

*The DSB results for EVs under TOU tariff.*

Besides that, the stationary storage lasts longer, avoiding rapid discharge, and does not drop below its minimum SOC (20%). As the stationary storage approaches its capacity limit at minimum SOC, its output decreases, and the load is then supplied by the public grid during off-peak hours. The charge station battery SOC pattern is shown in **Figures 15**–**17** for slow, average, and fast charging, respectively.

**Figure 15.** *Station battery SOC pattern for slow charging at 10 a.m.*

**Figure 16.** *Station battery SOC pattern for average charging at 10 a.m.*

**Figure 17.** *Station battery SOC pattern for fast charging at 10 a.m.*

As per the results, in slow and average charging modes, PV and stationary storage share more power. DSB is also increased when the charge station charging time is set to 10 a.m. in the university charging unit.

TOU-tariff charging is expected to be more cost-effective in PV-powered EV charging stations than SOC-based charging, creating a win-win situation for both charging station operators and EV owners. **Figures 18**–**20** show the total costs and profits for both EV users and station owners.


**Figure 18.** *Total cost based on CEB unit cost and simulation unit cost.*

**Figure 19.** *Total profit for the EV owner in LKR (Sri Lankan Rupees).*

#### **Figure 20.**

*Total profit for the charge station owner in LKR (Sri Lankan Rupees).*

The cost benefit was calculated relative to the Ceylon Electricity Board's (CEB's) current EV charging price. As previously discussed, fast charging did not meet the total EV demand. Apart from that, slow and average charging have lower costs for both charge station operators and EV owners, and are most beneficial at the 10 a.m. charging start time. Fast charging also provides charge station operators a higher grid-injected price benefit at the 10 a.m. start time than at the other two start times.

#### **3. Conclusion**

There have been many exciting developments in ABM recently, and the use of truly adaptive agents within ABM is one promising direction that is still being explored. ABM is more than a simulation tool; it also aids in reducing operational risk and developing new strategies for an organization. Incorporating ML techniques into ABM should enable the creation of new and unique models.

The simulation model discussed here focuses on the preliminary requirements and a cost-effective model for charging in a university car park. Two main strategies were presented, SOC-based and TOU-tariff-based, which demonstrated improvements in terms of DSB and cost benefits. Compared to an uncontrolled charging strategy, SOC-based charging is safe and has many benefits. Uncontrolled EV charging causes a significant increase in power demand, which may cause power congestion or voltage issues in the power system. In contrast, G2V charging with a slow charging mode has the advantage of distributing the charging load over time, limiting the peak power demand.

It is shown that the proposed system can effectively improve the DSB as well as cost benefits by implementing TOU-tariff-based charging. It is cost-effective for both charge station operators and EV owners. Two charging modes are advantageous for the requirements and feasibility conditions in the university car park: slow and average charging.

Our ABM charging station can communicate and collaborate with each agent to achieve the required system behavior. The MAS must be coordinated according to its characteristics in order to attain this purpose [2, 3]. Scalability is an essential aspect to consider when creating practical MASs: the simulation model scales the interaction between agents without hesitation or delay as the number of EVs in the model grows.

#### **Author details**

Jaslin Shaleem Khan\*, Malligama Arachchige Uditha Sudheera Navaratne and Janaka Bandara Ekanayake Faculty of Engineering, Department of Electrical and Electronic Engineering, University of Peradeniya, Sri Lanka

\*Address all correspondence to: jaslinshaleem93@gmail.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**

[1] Julian V, Botti V. Multi-agent systems. Applied Science. 2019;**9**:1402. DOI: 10.3390/app9071402

[2] Oprea M. In: Reis R, editor. Applications of Multi-Agent Systems. Boston: Kluwer Academic Publishers; 2004. pp. 239-270

[3] Rocha J, Boavida-Portugal I, Gomes E. Introductory Chapter. London, UK: IntechOpen; 2017. DOI: 10.5772/intechopen.70241

[4] Hemapala KT, Jayasighe SL, Kulasekera AL. Multi-agent based control of smart grid. Communication Control Security Smart Grid. 2017;**1**: 321-345. DOI: 10.1049/PBPO095E\_ch12

[5] Bodhinayake G, Gunawardena LHPN, Hemapala KTMU. Development of a multi agent system for voltage and outage monitoring. In: 2017 Third International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB). Chennai, India: IEEE; 2017. pp. 156-162

[6] Kulasekera AL, Gopura RC, Hemapala KT, Perera N. A Review on Multi-agent Systems in Microgrid Applications. Kollam, India: IEEE; 2011. pp. 173-177. DOI: 10.1109/ISET-India.2011.6145377

[7] Logenthiran T, Srinivasan D, Shun TZ. Multi-Agent System for Demand Side Management in Smart Grid. Singapore: IEEE; 2011

[8] Et-Tolba EH, Maaroufi M, Ouassaid M. Demand side management in smart grid by multiagent systems technology. IEEE Access. 2014. DOI: 10.1109/ICMCS.2014.6911211

[9] Chuanchuan C, Jun Z, Panpan J. Multi-agent system applied to energy management system for renewable energy micro-grid. In: 2013 5th International Conference on Power Electronics Systems and Applications (PESA). Hong Kong, China: IEEE. Epub ahead of print December 2013. DOI: 10.1109/PESA.2013.6828222

[10] Aleksei Y, Kuzin GL, Demidova DV, Lukichev NA. Multi-agent System for Distributed Energy Microgrid: Simulation and Hardware-in-the-loop Physical Model. IEEE; 2020

[11] Li W, Logenthiran T, Woo WL, Phan V-T, Srinivasan D. Implementation of Demand Side Management of a Smart Home using Multi-agent System. IEEE; 2016

[12] Jiang W, Yang K, Mao R, Xue N, Zhuo Z. A multiagent-based hierarchical energy management strategy for maximization of renewable energy consumption in interconnected multi-microgrids. IEEE Access. 2019;**7**:169931-169945. DOI: 10.1109/ACCESS.2019.2955552

[13] Ho TCT, Yu R, Lim JR, Franti P. Modelling implications & impacts of going green with EV in Singapore with multi-agent systems. In: Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific. Chiang Mai, Thailand: IEEE; 2014. pp. 1–6

[14] Mocci S, Natale N, Ruggeri S, Pilo F. Multi-agent control system for increasing hosting capacity in active distribution networks with EV. In: 2014 IEEE Int. Energy Conf. ENERGYCON; Cavtat, Croatia. 2014. pp. 1409-1416

[15] Stifter M, Bermasser S. A multiagent based approach for simulating G2V and V2G charging strategies for large electric vehicle fleets. In: 22nd Int. Conf. Exhib. Electr. Distrib. CIRED 2013; Stockholm, Sweden. 2013

[16] Ruggeri S, Pilo F, Natale N, Mocci S. Multi-agent control system to coordinate optimal electric vehicles charging and demand response actions in active distribution networks. In: Power Gener. Conf. RPG; Naples, Italy; 2014

[17] Lee R, Yazbeck S, Brown S. Validation and Application of Agentbased Electric Vehicle Charging Model. Elsevier; 2020. pp. 53-62

[18] Torres S, Barambones O, Gonzalez de Durana JM, Marzabal F, Kremers E, Wirges J. Agent-based modelling of electric vehicle driving and charging behavior. In: 2015 23rd Mediterr. Conf. Control Autom. MED, Torremolinos. 2015. pp. 459-464. DOI: 10.1109/MED.2015.7158791

[19] Chaudhari K, Kandasamy NK, Krishnan A, Ukil A, Gooi HB. Agent-based aggregated behavior modeling for electric vehicle charging load. IEEE Transactions on Industrial Informatics. 2019;**15**:856-868. DOI: 10.1109/TII.2018.2823321

[20] Wolbertus R, van den Hoed R. Expanding charging infrastructure for large scale introduction of electric vehicles. In: Electr. Veh. Symp, France. 2019

[21] Chaudhari KS. Agent-based Modelling of Electric Vehicle Charging for Optimized Charging Station Operation. Singapore: DR-NTU (Digital Repository of NTU). Nanyang Technological University; 2019

[22] van der Kam M, Peters A, van Sark W, Alkemade F. Agent-based modelling of charging behaviour of electric vehicle drivers. Journal of Artificial Societies and Social Simulations. 2019;**22**:7

[23] Lee S, Cherry J, Lee B, McDonald J, Safoutin M. HIL Development and Validation of Lithium-Ion Battery Packs. 2014. SAE Technical Paper 2014-01-1863. DOI: 10.4271/2014-01-1863

[24] Kazil J, Masad D, Crooks A. Utilizing python for agent-based modeling: The Mesa framework. In: Thomson R, Bisgin H, Dancy C, Hyder A, Hussain M, editors. Soc. Cult. Behav. Model. Cham: Springer International Publishing; 2020. pp. 308-317

[25] Top 5 Significant Use Cases of Agent Based Modeling Simulation—Synergylabs. n.d. Available from: https://synlabs.io/top-5-significant-use-cases-of-agent-based-modeling-simulation/ [Accessed: May 25, 2022]

[26] Rand W. Machine Learning Meets Agent-Based Modeling: When Not To Go To. [Online]. Available from: https://www.semanticscholar.org/paper/MACHINE-LEARNING-MEETS-AGENT-BASED-MODELING-%3A-WHEN-Rand/65d565d5186345efc73457cf71cea61f6cfc8645

#### **Chapter 6**

## Approximate Dynamic Programming: An Efficient Machine Learning Algorithm

*Zhou Shaorui, Cai Ming and Zhuo Xiaopo*

#### **Abstract**

We propose an efficient machine learning algorithm for two-stage stochastic programs. This algorithm, termed the projected stochastic hybrid learning algorithm, consists of stochastic sub-gradient and piecewise linear approximation methods. We use the stochastic sub-gradient and sample information to update a piecewise linear approximation of the objective function. We then introduce a projection step, which implements the sub-gradient method, to jump out of a local optimum so that a global optimum can be achieved. Through this innovative projection step, we show the convergence of the algorithm for general two-stage stochastic programs. Furthermore, for the network recourse problem, our algorithm can drop the projection steps while still maintaining the convergence property. Thus, if the initial piecewise linear functions are properly constructed, the pure piecewise linear approximation method is convergent for general two-stage stochastic programs. The proposed approximate dynamic programming algorithm overcomes high-dimensional state variables using methods from machine learning, and its logic captures the critical ability of the network structure to anticipate the impact of current decisions on the future. The optimization framework, carefully calibrated against historical performance, makes it possible to introduce changes in the decisions and capture the collective intelligence of experienced decisions. Computational results indicate that the algorithm exhibits rapid convergence.

**Keywords:** stochastic programming, piecewise linear approximation, machine learning, network, approximate dynamic programming

#### **1. Introduction**

Optimal learning addresses the challenge of how to collect information, as efficiently as possible, so that a decision made in the present minimizes the expected future cost under uncertainty. Collecting information is usually time-consuming and expensive. For example, several large shippers, such as Amazon, Walmart, and IKEA, need to decide the quantity of products to ship from plants to warehouses to satisfy the retailers' demand. The retailer usually makes these decisions before knowing the real demand; then, after the retail demand becomes known, the shipping plans between retailers and warehouses are optimized. Such problems generally can be treated as two-stage stochastic programs: the decisions made now (Stage 1) determine the state of the problem to be solved in the future (Stage 2). Therefore, an optimal decision can be made now if we can compute the expected cost function (the recourse function) of Stage 2. In this chapter, we propose an efficient machine learning algorithm that collects information efficiently based on the knowledge gradient and solves the problem optimally. The main gap between multiagent technologies and the proposed algorithm is that ours collects information based on the knowledge gradient and overcomes the "curse of dimensionality". Besides, it transfers the problem into a polynomially solvable one and has been proven convergent theoretically.

#### **1.1 Motivation**

Optimal learning is a rich field that includes contributions from different communities. This chapter focuses on optimal learning in two-stage stochastic programs, a practically important problem. Problems of this type arise in several areas of dynamic programming, in which the decision maker needs to make temporal and spatial decisions before realizing events that will influence those decisions. For example, in empty container repositioning problems [1], shipping companies need to reposition empty containers before realizing the demand. In locomotive planning problems [2], railroads have to decide the schedules of trains to which locomotives are assigned before disruptions occur across the railway network. In relief distribution problems [3], humanitarian decision makers need to distribute emergency aid to disaster locations when aid materials are very scarce amidst great uncertainty. In job scheduling problems [4], managers need to decide initial staffing levels and working plans before the demand is realized. Most of these applications are fully sequential problems, and they can be modeled as two-stage stochastic programming problems; hence, the study of two-stage stochastic optimization in this chapter is very important. However, the main obstacle in most practical problems is that the expected cost function in Stage 2 is quite complex due to uncertainty. In this chapter, we propose a hybrid learning algorithm called the *projected stochastic hybrid learning algorithm* (ProSHLA) to approximate the expected recourse function for two-stage stochastic programs. To demonstrate the efficiency of the algorithm, we also prove its convergence mathematically.

In essence, ProSHLA is a hybrid of stochastic sub-gradient and piecewise linear approximation methods. The core of ProSHLA consists of a series of learning steps that provide information for updating the recourse function through a sequence of piecewise linear separable approximations, and a series of projection steps that guarantee convergence by implementing the stochastic sub-gradient method. The mathematical analysis and the computational results both demonstrate that when the initial piecewise linear approximation function is properly constructed for two-stage stochastic programs with network recourse, the learning algorithm can drop the projection steps without sacrificing convergence. Moreover, without the projection step, the learning algorithm consists only of a series of learning steps through a sequence of piecewise linear separable approximations, and can solve practical complex problems very efficiently. This finding can help practitioners and scholars understand an open problem that has puzzled them for decades: why the piecewise linear approximation method is efficient and convergent for stochastic programs with network recourse in practice. In this chapter, our analytical results provide the first theoretical support for the use of the piecewise linear approximation method in solving practical problems.

#### **1.2 Literature review**

In this chapter, we consider the two-stage stochastic programming problem as follows:

$$\min c_0^T \mathbf{x} + E_\omega[Q(\mathbf{x}, \omega)] \tag{1}$$

s.t.,

$$A\mathbf{x} = b, \quad \mathbf{x} \ge 0,$$

where $X \subset \mathbb{R}^n$ denotes a convex compact set and the recourse function $Q(\mathbf{x}, \omega)$ denotes the optimal value of the second stage problem:

$$Q(\mathbf{x}, \omega) = \min c_1^T \mathbf{y}(\omega) \tag{2}$$

s.t.,

$$W(\omega)\mathbf{y} = h(\omega) - T(\omega)\mathbf{x}, \quad \mathbf{y}(\omega) \ge 0.$$

In the above model, variables $\mathbf{x}$ and $\mathbf{y}$ denote the decision variables of the Stage 1 and Stage 2 problems, respectively. $A$ and $W(\omega)$ are constraint matrices, and parameters $c_0$ and $c_1$ denote the first and second stage vectors of cost coefficients, respectively.
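As a concrete, deliberately tiny illustration of the Stage 1/Stage 2 split in Eqs. (1)–(2), the sketch below solves a newsvendor-style two-stage problem by brute force over a grid; the costs, demand scenarios, and grid are hypothetical and are not taken from the chapter:

```python
# Toy two-stage problem: Stage 1 orders x at unit cost c0; Stage 2 recourse
# covers the shortage max(d - x, 0) at unit cost c1 once demand d is realized.
c0, c1 = 1.0, 3.0
scenarios = [(0.3, 5.0), (0.4, 8.0), (0.3, 12.0)]  # (probability, demand)

def recourse(x, d):
    """Q(x, omega): optimal Stage-2 cost for demand realization d."""
    return c1 * max(d - x, 0.0)

def expected_total_cost(x):
    """c0 * x + E_omega[Q(x, omega)], computed exactly over the scenarios."""
    return c0 * x + sum(p * recourse(x, d) for p, d in scenarios)

# Enumerate candidate first-stage decisions on a grid X = {0, 0.5, ..., 15}.
grid = [i * 0.5 for i in range(0, 31)]
x_star = min(grid, key=expected_total_cost)
```

For these numbers the optimal order quantity balances the unit cost against the expected shortage penalty, which is exactly the trade-off the recourse function encodes.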

Stochastic programming models and solution methods have been examined by many researchers. Comprehensive reviews and discussions were provided by Wallace and Ziemba [5]. The expected recourse function is extremely complex to evaluate except in a few special cases. The various approximation methods can be categorized into four groups. Let $\widehat{Q}(\mathbf{x})$ denote the approximate function. The first group includes scenario methods, which use the sample average of $Q(\mathbf{x}, \omega_i)$ over several samples $\omega_1, \omega_2, \dots, \omega_N$ to approximate the expected recourse function [6]. The approximation function is usually successively updated by the following function:

$$
\widehat{Q}(\mathbf{x}) = \frac{\sum_{i=1}^{N} Q(\mathbf{x}, \omega_i)}{N}.
$$

Generally, the scenario method is very efficient, but it cannot guarantee a convergent solution.
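A minimal sketch of this sample-average idea, using a stand-in recourse function (uniformly distributed demand with a shortage cost) rather than any application from the chapter:

```python
import random

def recourse(x, d, c1=3.0):
    """Stand-in Q(x, omega): Stage-2 shortage cost for demand draw d."""
    return c1 * max(d - x, 0.0)

def q_hat(x, samples):
    """Scenario-method approximation: average Q(x, omega_i) over N samples."""
    return sum(recourse(x, d) for d in samples) / len(samples)

random.seed(0)
samples = [random.uniform(5.0, 12.0) for _ in range(10000)]  # demand draws
approx = q_hat(6.0, samples)  # Monte Carlo estimate of E[Q(6, omega)]

# Closed form for uniform demand on [5, 12]: c1 * (12 - 6)^2 / (2 * (12 - 5))
exact = 3.0 * (12.0 - 6.0) ** 2 / (2 * (12.0 - 5.0))
```

With 10,000 samples the estimate is close to the exact expectation, but as the surrounding text notes, closeness for finitely many samples is not a convergence guarantee.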

The second group consists of stochastic gradient techniques [7, 8], which update solutions by using stochastic sub-gradients as search directions. Usually, the approximate function is successively updated by the following function:

$$
\hat{Q}(\mathbf{x}) = \left(\overline{\mathbf{g}}^{k}\right)^{T} \mathbf{x} \tag{3}
$$

where $\overline{g}^k$ denotes a smoothed estimate of the gradient of the expected recourse function at $\mathbf{x}$ for iteration $k$. This method can be proven convergent by projection [9] or recursive linearization [10], although its drawback is that it is time-consuming.
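A sketch of such a projected stochastic sub-gradient iteration, $x^{k+1} = P_X(x^k - \alpha_k g^k)$, on the same kind of toy shortage-cost objective; the constants, scenario weights, and step-size schedule are all illustrative choices, not part of the chapter:

```python
import random

def stoch_subgrad(x, d, c0=1.0, c1=3.0):
    """One stochastic sub-gradient of c0*x + c1*max(d - x, 0) at x."""
    return c0 - c1 if d > x else c0

def project(x, lo=0.0, hi=15.0):
    """Orthogonal projection P_X onto the feasible interval X = [0, 15]."""
    return min(max(x, lo), hi)

random.seed(1)
x = 0.0
for k in range(1, 20001):
    d = random.choices([5.0, 8.0, 12.0], weights=[0.3, 0.4, 0.3])[0]
    alpha_k = 1.0 / k ** 0.5  # illustrative diminishing step size
    x = project(x - alpha_k * stoch_subgrad(x, d))
# For these scenario weights the minimizer of the expected cost is x = 8.
```

The many small noisy steps illustrate why the method is convergent yet time-consuming compared with approximation-based schemes.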

The third group mainly consists of primal and dual decomposition methods. Their use dates back to Benders decomposition [11]. Van Slyke and Wets [12] first adopted the L-shaped algorithm in the application of Benders decomposition to two-stage stochastic programs. Pereira and Pinto [13] proposed the stochastic dual dynamic programming (SDDP) method, which has been widely applied in many areas. SDDP uses Benders cuts to compute an outer approximation of a (convex) recourse function and constructs feasible dynamic programming policies. SDDP has led to numerous related approximation methods that are based on the same logic but seek to improve the approximation procedures by exploiting the underlying structure of particular applications. These methods include the use of inexact cuts [14], risk-averse variants [15], and embedding SDDP in the scenario tree framework [16]. The convergence of SDDP and related methods has been proven in [17] and, for linear programs, by Girardeau et al. [18].

The fourth group includes separable approximation methods [19, 20]. These methods usually replace the expected recourse function in Eq. (1) with separable approximation functions as follows:

$$
\hat{Q}(\mathbf{x}) = \sum\_{i=1}^{I} \hat{Q}\_i(\mathbf{x}) \tag{4}
$$

If the separable functions $\widehat{Q}_i(\mathbf{x})$ are piecewise linear or linear, we can replace the expected recourse function in Eq. (1) with $\widehat{Q}(\mathbf{x})$. We can then solve the problem as a pure network flow problem for network recourse problems, which is polynomially solvable and thus very efficient. For example, Godfrey and Powell [21] proposed an adaptive piecewise concave approximation (CAVE) algorithm, whose experimental performance is exceptionally good. However, there were no provable convergence results in their study. To provide convergent solutions, Cheung and Powell [19] proposed an approximation algorithm (SHAPE) that uses sequences of strongly convex approximation functions. However, strong convexity requires constructing a nonlinear term, which might damage the pure network structure and demand additional computational effort. This chapter intends to introduce an accurate and efficient approximation with the convergence property.
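To make Eq. (4) concrete, here is a small sketch of a separable, convex, piecewise linear approximation stored as per-coordinate slope sequences over unit segments (a common device for network recourse problems); all numbers are made up for illustration:

```python
def pw_linear(x, slopes):
    """Evaluate a convex piecewise linear function with value 0 at x = 0.
    slopes[j] is the slope on segment [j, j+1]; convexity corresponds to the
    slopes being nondecreasing."""
    return sum(s * min(max(x - j, 0.0), 1.0) for j, s in enumerate(slopes))

def q_hat(x_vec, slope_table):
    """Separable approximation of Eq. (4): Qhat(x) = sum_i Qhat_i(x_i)."""
    return sum(pw_linear(x_i, slopes) for x_i, slopes in zip(x_vec, slope_table))

# Two coordinates, each with its own nondecreasing slope sequence.
slope_table = [[-3.0, -1.0, 2.0], [-2.0, 0.5, 1.5]]
value = q_hat([2.5, 1.0], slope_table)  # = (-3 - 1 + 1) + (-2) = -5.0
```

Because each $\widehat{Q}_i$ is represented by per-segment slopes, it can be embedded in a network flow model as parallel arcs with increasing unit costs, which is what makes the separable representation polynomially solvable.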

#### **1.3 Contributions of the algorithms**

In this chapter, we aim to develop a convergent method that can efficiently approximate the expected recourse function for two-stage stochastic programs. The main contributions are listed as follows:


Computational results and mathematical analysis both reveal that the algorithm can drop the projection step without sacrificing convergence for two-stage stochastic programs with network recourse if the initial piecewise linear approximation functions are properly constructed. This means that a pure piecewise linear approximation can indeed be convergent, which is highly consistent with industry practice. This interesting finding answers the open question that has puzzled scholars for more than a decade: why does the piecewise linear approximation work well for two-stage stochastic programs in industry? Our mathematical analysis provides the first theoretical support.


The remainder of this chapter is organized as follows. Section 2 presents the description and convergence analysis of the algorithm for general two-stage stochastic programs. The algorithm (without projection steps) for two-stage stochastic programs with network recourse is shown in Section 3. Section 4 demonstrates computational experiments based on an application to the empty container repositioning problem. Section 5 presents the conclusions and outlines directions for future research.

#### **2. Description and convergence analysis of ProSHLA for general two-stage stochastic programs**

In this section, ProSHLA is first introduced. Subsequently, we analyze the convergence of ProSHLA for general two-stage stochastic programs.

#### **2.1 Description of ProSHLA**

To present ProSHLA mathematically, at each iteration $k$ we let: $\alpha_k$ = (possibly random) positive step size; $\overline{Q}(\mathbf{x})$ = expected recourse function, that is, $E_\omega[Q(\mathbf{x}, \omega)]$; $\widehat{Q}^k(\mathbf{x})$ = a convex differentiable approximation of $\overline{Q}(\mathbf{x})$; $\widehat{q}^k(\mathbf{x})$ = a subgradient of $\widehat{Q}^k(\mathbf{x})$ at $\mathbf{x}$, that is, $\widehat{q}^k(\mathbf{x}) \in \partial \widehat{Q}^k(\mathbf{x})$; $\overline{g}^k$ = a smoothed estimate of the gradient of $\overline{Q}(\mathbf{x})$ at iteration $k$; $g^k$ = a stochastic subgradient of $Q(\mathbf{x})$ at $\mathbf{x}^k$, that is, $g^k \in \partial Q(\mathbf{x}^k, \omega^{k+1})$; $\mathcal{H}_k = \{\omega^1, \omega^2, \dots, \omega^k\}$ = the history up to and including iteration $k$.

For a general non-smooth convex function $\widehat{Q}(\mathbf{x})$, its sub-differential can be defined as follows:

$$\partial \widehat{Q}(\mathbf{x}) = \left\{ \widehat{q}(\mathbf{x}) \in \mathbb{R}^n : \widehat{Q}(\mathbf{y}) - \widehat{Q}(\mathbf{x}) \ge \widehat{q}(\mathbf{x})^T (\mathbf{y} - \mathbf{x}), \ \forall \mathbf{y} \in \mathbb{R}^n \right\}.$$

We combine Eqs. (1), (3), and (4) to form an approximation at iteration *k* as follows:

$$\min \mathbf{c}\_0^T \mathbf{x} + \hat{\mathbf{Q}}^0(\mathbf{x}) + \left(\overline{\mathbf{g}}^k\right)^T \mathbf{x} \tag{5}$$

In this study, we approximate the expected recourse function at iteration $k$ via a convex, differentiable approximation $\widehat{Q}^0(\mathbf{x})$ with a linear correction term $(\overline{g}^k)^T \mathbf{x}$. At each iteration, the linear correction term $(\overline{g}^k)^T \mathbf{x}$ is introduced to improve the initial approximation $\widehat{Q}^0(\mathbf{x})$. Note that here we use a convex initial approximation function $\widehat{Q}^0(\mathbf{x})$, whereas SHAPE uses strongly convex approximation functions. SHAPE introduces a nonlinear term in the approximation function to maintain the strong convexity property, which might destroy the pure network flow problem structure and demand additional computational effort. Moreover, we do not calculate $\overline{g}^k$ in the usual manner of obtaining stochastic sub-gradients in this study. Instead, we use the following form in our model:

$$
\min c_0^T \mathbf{x} + \widehat{Q}^k(\mathbf{x}) + \alpha_k \left(g^k - \widehat{q}^k(\mathbf{x}^k)\right)^T \mathbf{x}, \tag{6}
$$

where $\widehat{Q}^k(\mathbf{x})$ is updated as follows:

$$
\widehat{Q}^{k+1}(\mathbf{x}) = \widehat{Q}^k(\mathbf{x}) + \alpha_k \left(g^k - \widehat{q}^k(\mathbf{x}^k)\right)^T \mathbf{x} \tag{7}
$$

The greatest merit of updating $\widehat{Q}^{k+1}(\mathbf{x})$ in the above way is that it retains the stochastic sub-gradients $\widehat{q}^k(\mathbf{x}^k), \widehat{q}^{k-1}(\mathbf{x}^{k-1}), \dots, \widehat{q}^0(\mathbf{x}^0)$ used in the previous iterations. Thus, at iteration $k$, the objective function involves a weighted average of the stochastic sub-gradients of the past $(k-1)$ iterations. As shown later in Lemma 2, $\overline{g}^k$ in Eq. (5) is a linear combination of $g^1, g^2, \dots, g^{k-1}$.

Let $P_X : \mathbb{R}^n \to X$ be the orthogonal projection onto $X$ [9]. Then, we can obtain a sequence of solutions $\{\mathbf{x}^k\}$ using the following procedure (**Figure 1**).

Generally, ProSHLA consists of two-level loops. The first-level loop contains a series of *passes*, and the second-level loop contains a series of projection steps (Steps 5 and 6). We first construct an initial bounded, piecewise linear convex approximation function $\widehat{Q}^0(\mathbf{x})$ at the beginning of the first pass; the initial solution $\mathbf{x}^0$ is then obtained by solving problem (1). A realization of the random quantity $\omega \in \Omega$ is drawn, and a stochastic sub-gradient of $Q^0(\mathbf{x})$ is obtained by solving the resulting deterministic problem. Comparing the slope of $\widehat{Q}^0(\mathbf{x})$ with the stochastic sub-gradient at $\mathbf{x} = \mathbf{x}^0$, the difference of the two slopes is used as a linear term to update $\widehat{Q}^0(\mathbf{x})$. Subsequently, we can obtain a

$$\widehat{Q}^{k+1}(\mathbf{x}) = \widehat{Q}^k(\mathbf{x}) + \alpha_k \left(g^k - \widehat{q}^k(\mathbf{x}^k)\right)^T \mathbf{x}.$$

$$\mathbf{x}^{k+1} = \arg\min_{\mathbf{x} \in X} \widehat{Q}^{k+1}(\mathbf{x}), \tag{8}$$

$$\mathbf{x}^{k+1} = P_X\left(\mathbf{x}^k - \alpha_k g^k\right). \tag{9}$$

#### **Figure 1.**

*The procedure of ProSHLA.*

new solution $\mathbf{x}^{k+1}$ using the updated approximation function. If the sub-gradient vector $\widehat{q}^m(\mathbf{x}^{k+1})$ of the newly obtained solution $\mathbf{x}^{k+1}$ equals the sub-gradient of solution $\mathbf{x}^m$, that is, $\widehat{q}^m(\mathbf{x}^m)$, the piecewise linear approximation might have jumped into a local optimum. ProSHLA then needs to escape the local optimum by implementing the projection steps in the second-level loop. If we obtain a new solution $\mathbf{x}^{k+1}$ in the second-level loop whose sub-gradient $\widehat{q}^m(\mathbf{x}^{k+1})$ differs from $\widehat{q}^m(\mathbf{x}^m)$, ProSHLA jumps out of the second-level loop, and the current pass ends. The entire process is then repeated. Finally, ProSHLA terminates when the total absolute change in $\widehat{Q}^l(\mathbf{x})$ over a certain number of iterations is low (e.g., $\sum_{l=k-M+1}^{k} \|\widehat{Q}^l(\mathbf{x}) - \widehat{Q}^{l-1}(\mathbf{x})\| < \delta$).
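The learning steps just described can be sketched in a few lines for a one-dimensional toy problem. The recourse is the same illustrative newsvendor-style cost used earlier (not the chapter's application), $\widehat{Q}^0$ is an arbitrary convex piecewise linear guess, and a single accumulated linear coefficient plays the role of the correction terms $\alpha_k (g^k - \widehat{q}^k(\mathbf{x}^k))^T \mathbf{x}$ of Eq. (7); the projection steps are omitted for brevity:

```python
import random

c0, c1 = 1.0, 3.0
grid = [i * 0.5 for i in range(0, 31)]  # feasible set X = {0, 0.5, ..., 15}

def q0(x):
    """Initial convex piecewise linear approximation Qhat^0 (a guess)."""
    return 2.0 * abs(x - 6.0)

def q0_slope(x):
    """A subgradient of q0 at x."""
    return 2.0 if x >= 6.0 else -2.0

def stoch_subgrad(x, d):
    """Stochastic subgradient of Q(x, omega) = c1 * max(d - x, 0)."""
    return -c1 if d > x else 0.0

random.seed(2)
corr = 0.0  # accumulated linear correction, so Qhat^k(x) = q0(x) + corr * x
for k in range(1, 2001):
    # Solve the current approximate problem: min c0*x + Qhat^k(x).
    x_k = min(grid, key=lambda x: c0 * x + q0(x) + corr * x)
    # Draw a scenario and update Qhat per Eq. (7).
    d = random.choices([5.0, 8.0, 12.0], weights=[0.3, 0.4, 0.3])[0]
    alpha_k = 1.0 / k
    corr += alpha_k * (stoch_subgrad(x_k, d) - (q0_slope(x_k) + corr))
```

Because $g^k$ and $\widehat{q}^k$ are bounded here, the correction stays bounded, mirroring assumption (A.3) below; a faithful implementation would maintain per-segment slopes rather than one global linear term.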

Here we point out the main difference between SHAPE and ProSHLA. The most remarkable difference is that ProSHLA uses convex approximation functions, while SHAPE uses strongly convex approximation functions. Strong convexity always maintains a nonlinear term in the approximation function, and this term might destroy the pure network flow structure and cause additional computational effort. To overcome this drawback of SHAPE, we introduce the projection step in the second-level loop and construct convex approximation functions. In particular, the approximation functions in ProSHLA are NOT required to be strictly convex, whereas they must be in SHAPE. Without the projection step in the second-level loop, ProSHLA might get stuck at a corner local optimum for stochastic linear programs. ProSHLA works well for most practical stochastic linear programs, because most practical stochastic programs are piecewise convex problems.

#### **2.2 Convergence analysis of ProSHLA**

Firstly, we demonstrate the convergence theorem of ProSHLA in this subsection. Then, several properties of approximation are presented. Finally, we use these properties to prove the convergence of ProSHLA.

Without loss of generality, the following assumptions are listed.

**(A.1)** $X \subset \mathbb{R}^n$ is compact and convex.

**(A.2)** $E_\omega Q(\mathbf{x}_1, \omega)$ is convex, finite, and continuous on $X$.

**(A.3)** $g^k$ is bounded such that $\|g^k\| \le c_1$ for each $\omega \in \Omega$; $\widehat{q}^k$ is bounded such that $\|\widehat{q}^k\| \le c_2$ for each $\omega \in \Omega$.

**(A.4)** The piecewise linear functions $\widehat{Q}^k(\mathbf{x})$ are convex, implying that

$$
\widehat{Q}^k(\mathbf{x}_1) - \widehat{Q}^k(\mathbf{y}_1) \le \widehat{q}^k(\mathbf{x}_1)^T (\mathbf{x}_1 - \mathbf{y}_1).
$$

**(A.5)** The stepsizes $\alpha_k$ are $\mathcal{H}_k$ measurable and satisfy

$$0 < \alpha_k < 1, \quad \sum\_{k=0}^{\infty} \alpha_k = \infty \ \ (a.s.), \quad \sum\_{k=0}^{\infty} E\{\alpha_k^2\} < \infty.$$
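As a quick numeric sanity check (our own illustration, not from the chapter), a classic schedule of the form $\alpha_k = 1/(k+2)$ satisfies these conditions: the partial sums keep growing without bound while the partial sums of squares stay bounded:

```python
def partial_sums(n):
    """Partial sums of alpha_k and alpha_k^2 for alpha_k = 1/(k+2)."""
    s, s2 = 0.0, 0.0
    for k in range(n):
        a = 1.0 / (k + 2)  # strictly inside (0, 1), as (A.5) requires
        s += a
        s2 += a * a
    return s, s2

s_1k, s2_1k = partial_sums(1000)
s_100k, s2_100k = partial_sums(100000)
```

The raw sums grow like $\ln n$ (divergent), while the squared sums converge, which is exactly the balance the convergence proof relies on: enough total step length to reach the optimum, but vanishing noise accumulation.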

In addition to assumptions (A.1)–(A.5), we introduce the following assumption to characterize the piecewise linear convex approximation functions.

**(A.6)** There exist a positive $b$ and a constant $\delta$ such that for any two points $\mathbf{x}_1, \mathbf{y}_1 \in X$, if $|\mathbf{x}_1 - \mathbf{y}_1| > \delta$, then $|\widehat{q}(\mathbf{x}_1) - \widehat{q}(\mathbf{y}_1)| \ge b|\mathbf{x}_1 - \mathbf{y}_1|$. If there exist $\widehat{q}(\mathbf{x}_1)$ and $\widehat{q}(\mathbf{y}_1)$ such that $\widehat{q}^k(\mathbf{x}_1) - \widehat{q}^k(\mathbf{y}_1) = 0$, then $|\mathbf{x}_1 - \mathbf{y}_1| \le \delta$. If $\delta \to 0$, the function corresponds to a strongly convex function; if $\delta \to \infty$, the function becomes purely linear.

Given assumptions (A.1)–(A.6), we obtain the following theorem for ProSHLA.

**Theorem 1.** If assumptions (A.1)–(A.6) are satisfied, then the sequence $\{\mathbf{x}_1^k\}$ generated by ProSHLA converges almost surely to an optimal solution $\mathbf{x}_1^* \in X^*$ of problem (1).

To prove Theorem 1, we need the following martingale convergence theorem and three lemmas.

**Martingale Convergence Theorem.** A sequence of random variables $\{W^k\}$, which are $\mathcal{H}_k$ measurable, is said to be a supermartingale if the sequence of conditional expectations $E\{W^{k+1} \mid \mathcal{H}_k\}$ exists and satisfies $E\{W^{k+1} \mid \mathcal{H}_k\} \le W^k$.

**Theorem 2.** (From reference [22]) Let $\{W^k\}$ be a positive supermartingale. Then, $W^k$ converges to a finite random variable a.s.

From the above theorem, we can conclude that $W^k$ is essentially a stochastic analogue of a decreasing sequence.

Based on the convexity property, the optimal solution for problem (8) at iteration *m* can be characterized by the following inequality:

$$\left(\widehat{q}^{\overline{m}}\left(\mathbf{x}_1^{\overline{m}}\right)\right)^T \left(\mathbf{x}_1 - \mathbf{x}_1^{\overline{m}}\right) \ge 0, \quad \forall \mathbf{x}_1 \in X \tag{8}$$

To obtain Theorem 1, the following three lemmas are required. The first lemma shows that the difference between the solutions of two consecutive update processes is bounded by the step size and the stochastic gradient. The second lemma indicates that the approximation $\widehat{Q}^k(\mathbf{x}_1)$ is finite. The third lemma shows that $T^k$ (defined in Lemma 3) is bounded.

$$\alpha_{\overline{m}} \left(g^{\overline{m}}\right)^T \left(\mathbf{x}_1^i - \mathbf{x}_1^j\right) \le \left(\alpha_{\overline{m}} c_1\right)^2 / b \tag{9}$$

$$\left(\widehat{q}^{\overline{m+1}}\left(\mathbf{x}\_1^{\overline{m+1}}\right)\right)^T \left(\mathbf{x}\_1 - \mathbf{x}\_1^{\overline{m+1}}\right) \ge \mathbf{0}, \forall \mathbf{x}\_1 \in X \tag{10}$$

$$\left(\widehat{q}^{\overline{m+1}-1}\left(\mathbf{x}\_1^{\overline{m+1}}\right) + a\_{\overline{m+1}-1}\left(\mathbf{g}^{\overline{m+1}-1} - \widehat{q}^{\overline{m+1}-1}\left(\mathbf{x}\_1^{\overline{m+1}-1}\right)\right)\right)^T \left(\mathbf{x}\_1 - \mathbf{x}\_1^{\overline{m+1}}\right) \ge 0, \forall \mathbf{x}\_1 \in X \tag{11}$$

$$\begin{aligned} & \left( \alpha\_{\overline{m+1}-1} \left( \hat{\mathbf{g}^{\overline{m+1}-1}} - \hat{\mathbf{q}}^{\overline{m+1}-1} \left( \boldsymbol{\varkappa}\_{1}^{\overline{m+1}-1} \right) \right)^{T} \left( \boldsymbol{\varkappa}\_{1}^{\overline{m+1}-1} - \boldsymbol{\varkappa}\_{1}^{\overline{m+1}} \right) \\ & \geq \hat{\mathbf{q}}^{\overline{m+1}-1} \left( \boldsymbol{\varkappa}\_{1}^{\overline{m+1}} \right)^{T} \left( \boldsymbol{\varkappa}\_{1}^{\overline{m+1}} - \boldsymbol{\varkappa}\_{1}^{\overline{m+1}-1} \right) \end{aligned} \tag{12}$$

$$\begin{split} &\alpha_{\overline{m+1}-1} \left(g^{\overline{m+1}-1}\right)^{T} \left(\mathbf{x}_{1}^{\overline{m+1}-1} - \mathbf{x}_{1}^{\overline{m+1}}\right) \\ &\ge \widehat{q}^{\overline{m+1}-1}\left(\mathbf{x}_{1}^{\overline{m+1}}\right)^{T} \left(\mathbf{x}_{1}^{\overline{m+1}} - \mathbf{x}_{1}^{\overline{m+1}-1}\right) - \alpha_{\overline{m+1}-1}\left(\widehat{q}^{\overline{m+1}-1}\left(\mathbf{x}_{1}^{\overline{m+1}-1}\right)\right)^{T} \left(\mathbf{x}_{1}^{\overline{m+1}} - \mathbf{x}_{1}^{\overline{m+1}-1}\right) \\ &= \left(\widehat{q}^{\overline{m+1}-1}\left(\mathbf{x}_{1}^{\overline{m+1}}\right) - \widehat{q}^{\overline{m+1}-1}\left(\mathbf{x}_{1}^{\overline{m+1}-1}\right)\right)^{T} \left(\mathbf{x}_{1}^{\overline{m+1}} - \mathbf{x}_{1}^{\overline{m+1}-1}\right) \\ &\quad + \left(1 - \alpha_{\overline{m+1}-1}\right)\left(\widehat{q}^{\overline{m+1}-1}\left(\mathbf{x}_{1}^{\overline{m+1}-1}\right)\right)^{T} \left(\mathbf{x}_{1}^{\overline{m+1}} - \mathbf{x}_{1}^{\overline{m+1}-1}\right) \end{split} \tag{13}$$

$$\begin{aligned} &\alpha\_{\overline{m+1}-1} \left( \boldsymbol{g}^{\overline{m+1}-1} \right)^{T} \left( \boldsymbol{\chi\_{1}^{\overline{m+1}-1}} - \boldsymbol{\chi\_{1}^{\overline{m+1}}} \right) \\ &\geq b \left\| \boldsymbol{\chi\_{1}^{\overline{m+1}-1}} - \boldsymbol{\chi\_{1}^{\overline{m+1}}} \right\|^{2} + \left( 1 - \alpha\_{\overline{m+1}-1} \right) \left( \boldsymbol{\widetilde{q}^{\overline{m+1}-1}} \left( \boldsymbol{\chi\_{1}^{\overline{m+1}-1}} \right) \right)^{T} \left( \boldsymbol{\chi\_{1}^{\overline{m+1}}} - \boldsymbol{\chi\_{1}^{\overline{m+1}-1}} \right) \\ &\geq b \left\| \boldsymbol{\chi\_{1}^{\overline{m+1}-1}} - \boldsymbol{\chi\_{1}^{\overline{m+1}}} \right\|^{2} .\end{aligned}$$

$$\begin{aligned} &\alpha_{\overline{m+1}-1} \left\| g^{\overline{m+1}-1} \right\| \cdot \left\| \mathbf{x}_1^{\overline{m+1}-1} - \mathbf{x}_1^{\overline{m+1}} \right\| \ge \alpha_{\overline{m+1}-1} \left( g^{\overline{m+1}-1} \right)^T \left( \mathbf{x}_1^{\overline{m+1}-1} - \mathbf{x}_1^{\overline{m+1}} \right) \\ &\ge b \left\| \mathbf{x}_1^{\overline{m+1}-1} - \mathbf{x}_1^{\overline{m+1}} \right\|^2 \end{aligned}$$

$$a\_{\overline{m+1}-1} \left(\mathcal{g}^{\overline{m+1}-1}\right)^{T} \left(\mathcal{x}\_{1}^{\overline{m+1}-1} - \mathcal{x}\_{1}^{\overline{m+1}}\right) \le \left(a\_{\overline{m+1}-1}c\_{1}\right)^{2}/b \tag{14}$$

$$\alpha\_{\overline{m}} (\mathbf{g}^{\overline{m}})^T \left(\mathbf{x}\_1^i - \mathbf{x}\_1^{\overline{m+1}}\right) \le \left(a\_{\overline{m}}c\_1\right)^2 / b \tag{15}$$

$$\alpha\_{\overline{m}} \left(\mathbf{g}^{\overline{m}}\right)^{T} \left(\boldsymbol{\varkappa}\_{1}^{i} - \boldsymbol{\varkappa}\_{1}^{j}\right) \leq \left(\ \alpha\_{\overline{m}} \boldsymbol{c}\_{1}\right)^{2} / b \tag{16}$$

$$\widehat{d} \ge \max\_{k} |\mathbf{g}^{k} - \widehat{q}^{0}(\mathbf{x}\_{1}^{k})| \tag{17}$$

$$T^{j} - T^{i} \le a\_{\overline{m}} \left(\mathbf{g}^{\overline{m}}\right)^{T} \left(\boldsymbol{\varkappa}\_{1}^{i} - \boldsymbol{\varkappa}\_{1}^{j}\right) + a\_{\overline{m}} \left(\mathbf{g}^{\overline{m}}\right)^{T} \left(\boldsymbol{\varkappa}\_{1}^{\*} - \boldsymbol{\varkappa}\_{1}^{i}\right). \tag{18}$$

$$\begin{aligned} \widehat{Q}^{\overline{m+1}}(\mathbf{x}_{1}) &= \widehat{Q}^{\overline{m+1}-1}(\mathbf{x}_{1}) + \alpha_{\overline{m+1}-1} \left( g^{\overline{m+1}-1} - \widehat{q}^{\overline{m+1}-1} \left( \mathbf{x}_{1}^{\overline{m+1}-1} \right) \right)^{T} \mathbf{x}_{1} \\ &= \widehat{Q}^{\overline{m}}(\mathbf{x}_{1}) + \alpha_{\overline{m}} \left( g^{\overline{m}} - \widehat{q}^{\overline{m}} \left( \mathbf{x}_{1}^{\overline{m}} \right) \right)^{T} \mathbf{x}_{1} \end{aligned}$$

$$
\hat{Q}^{\overline{m}}(\varkappa\_1^{\overline{m}}) - \hat{Q}^{\overline{m}}\left(\varkappa\_1^{\overline{m+1}}\right) \leq \left(\hat{q}^{\overline{m}}\right)^T \left(\varkappa\_1^{\overline{m}} - \varkappa\_1^{\overline{m+1}}\right) \tag{19}
$$

$$\begin{split} \widehat{Q}^{\overline{m}}\left(\mathbf{x}_{1}^{\overline{m}}\right) - \widehat{Q}^{\overline{m}}\left(\mathbf{x}_{1}^{\overline{m+1}}\right) &\le \left(\widehat{q}^{\overline{m}}\right)^{T}\left(\mathbf{x}_{1}^{\overline{m}} - \mathbf{x}_{1}^{\overline{m+1}}\right) \\ &= \left(1 - \alpha_{\overline{m}}\right)\left(\widehat{q}^{\overline{m}}\right)^{T}\left(\mathbf{x}_{1}^{\overline{m}} - \mathbf{x}_{1}^{\overline{m+1}}\right) + \alpha_{\overline{m}}\left(\widehat{q}^{\overline{m}}\right)^{T}\left(\mathbf{x}_{1}^{\overline{m}} - \mathbf{x}_{1}^{\overline{m+1}}\right) \end{split} \tag{20}$$

Thus, $T^{\overline{m+1}} - T^{\overline{m}} \le \alpha_{\overline{m}}\left(g^{\overline{m}}\right)^{T}\left(\mathbf{x}_{1}^{\overline{m}} - \mathbf{x}_{1}^{\overline{m+1}}\right) + \alpha_{\overline{m}}\left(g^{\overline{m}}\right)^{T}\left(\mathbf{x}_{1}^{*} - \mathbf{x}_{1}^{\overline{m}}\right)$. For any $i \in \left[\overline{m}, \overline{m+1} - 1\right]$, $g^{i} = g^{\overline{m}} = g^{\overline{m+1}-1}$ and $\widehat{q}^{i}(\mathbf{x}_{1}) = \widehat{q}^{\overline{m}}(\mathbf{x}_{1}) = \widehat{q}^{\overline{m+1}-1}(\mathbf{x}_{1})$ for any $\mathbf{x}_{1} \in X$. Therefore,

$$T^{\overline{m+1}} - T^i \le a\_{\overline{m}} (\mathcal{g}^{\overline{m}})^T \left(\boldsymbol{\varkappa}\_1^i - \boldsymbol{\varkappa}\_1^{\overline{m+1}}\right) + a\_{\overline{m}} (\mathcal{g}^{\overline{m}})^T \left(\boldsymbol{\varkappa}\_1^\* - \boldsymbol{\varkappa}\_1^i\right) \tag{21}$$

$$T^{j} - T^{i} \le a\_{\overline{m}} \left(\mathbf{g}^{\overline{m}}\right)^{T} \left(\boldsymbol{\varkappa}\_{1}^{i} - \boldsymbol{\varkappa}\_{1}^{j}\right) + a\_{\overline{m}} \left(\mathbf{g}^{\overline{m}}\right)^{T} \left(\boldsymbol{\varkappa}\_{1}^{\*} - \boldsymbol{\varkappa}\_{1}^{i}\right) \tag{22}$$

There are two possible scenarios. In the first, each update process undergoes only finitely many iterations before the algorithm stops, which means $\overline{m+1} - \overline{m} < M$ for any $\overline{m}$ ($M$ represents a large number). In the second scenario, ProSHLA might terminate in a given update process. In the following text, Theorem 1 is proven for each scenario.

**Scenario 1:** ProSHLA does not stop in a given update process.

In the first scenario, we consider a subsequence $\{\mathbf{x}_1^{\overline{m}}\}$ of $\{\mathbf{x}_1^k\}$. We will prove that the subsequence $\{\mathbf{x}_1^{\overline{m}}\}$ converges to the true optimum $\mathbf{x}_1^*$. According to the definition of $g^k \in \partial Q(\mathbf{x}_1^k, \omega^{k+1})$, we can obtain the following inequality:

$$\left(g^{k}\right)^{T}\left(\mathbf{x}_{1}^{*} - \mathbf{x}_{1}^{k}\right) \le Q\left(\mathbf{x}_{1}^{*}, \omega^{k+1}\right) - Q\left(\mathbf{x}_{1}^{k}, \omega^{k+1}\right) \tag{23}$$

where $Q(\mathbf{x}_1, \omega^{k+1})$ represents the operational cost function given the outcome $\omega^{k+1}$. According to Lemma 1, we can obtain the following inequality:

$$\alpha_{\overline{m}} \left(g^{\overline{m}}\right)^{T} \left(\mathbf{x}_{1}^{i} - \mathbf{x}_{1}^{j}\right) \le \left(\alpha_{\overline{m}} c_{1}\right)^{2} / b. \tag{24}$$

On the basis of Lemma 3, the difference $T^{\overline{m+1}} - T^{\overline{m}}$ can be described as follows:

$$\begin{split} T^{\overline{m+1}} - T^{\overline{m}} &\le \alpha_{\overline{m}} \left(g^{\overline{m}}\right)^{T} \left( \mathbf{x}_{1}^{\overline{m}} - \mathbf{x}_{1}^{\overline{m+1}} \right) + \alpha_{\overline{m}} \left(g^{\overline{m}}\right)^{T} \left( \mathbf{x}_{1}^{*} - \mathbf{x}_{1}^{\overline{m}} \right) \\ &\le -\alpha_{\overline{m}} \left( Q\left( \mathbf{x}_{1}^{\overline{m}}, \omega^{\overline{m+1}} \right) - Q\left( \mathbf{x}_{1}^{*}, \omega^{\overline{m+1}} \right) \right) + \alpha_{\overline{m}} \left(g^{\overline{m}}\right)^{T} \left( \mathbf{x}_{1}^{*} - \mathbf{x}_{1}^{\overline{m}} \right) \\ &\le -\alpha_{\overline{m}} \left( Q\left( \mathbf{x}_{1}^{\overline{m}}, \omega^{\overline{m+1}} \right) - Q\left( \mathbf{x}_{1}^{*}, \omega^{\overline{m+1}} \right) \right) + \left( \alpha_{\overline{m}} c_{1} \right)^{2} / b \end{split} \tag{25}$$

We take the conditional expectation of Eq. (25) with respect to $\mathcal{H}_{\overline{m}}$ on both sides and obtain

$$E\left(T^{\overline{m+1}} \,\middle|\, \mathcal{H}_{\overline{m}}\right) \le T^{\overline{m}} - \alpha_{\overline{m}}\left(\overline{Q}\left(\mathbf{x}_{1}^{\overline{m}}\right) - \overline{Q}\left(\mathbf{x}_{1}^{*}\right)\right) + \left(\alpha_{\overline{m}} c_{1}\right)^{2} / b$$

where $\overline{Q}(\mathbf{x}_1)$ represents the expected recourse function, that is, $E_\omega Q(\mathbf{x}_1, \omega)$. Given the conditioning on $\mathcal{H}_{\overline{m}}$, the terms $T^{\overline{m}}$, $\alpha_{\overline{m}}$, and $\mathbf{x}_1^{\overline{m}}$ on the right-hand side are deterministic. The conditioning $\mathcal{H}_{\overline{m}}$ cannot provide any information on $\omega^{\overline{m}+1}$. Hence, we replace $Q(\mathbf{x}_1, \omega^{\overline{m}+1})$ (for $\mathbf{x}_1 = \mathbf{x}_1^{\overline{m}}$ and $\mathbf{x}_1 = \mathbf{x}_1^*$) with its expectation $\overline{Q}(\mathbf{x}_1)$. Given that $\alpha_{\overline{m}}\left(\overline{Q}\left(\mathbf{x}_1^{\overline{m}}\right) - \overline{Q}\left(\mathbf{x}_1^*\right)\right) \ge 0$, the sequence

$$W^{\overline{m}} = T^{\overline{m}} + \left(\alpha_{\overline{m}} c_1\right)^2 / b \tag{26}$$

is a positive supermartingale. Theorem 2 implies the almost sure convergence of *W<sup>m</sup>*. Hence,

$$T^{\overline{m}} \to T^\* \quad \text{a.s.} \tag{27}$$

We sum Eq. (25) from $\overline{m} = 0$ to $\overline{M}$ and obtain the following inequality:

$$T^{\overline{M}} - T^0 \le -\sum_{\overline{m}=0}^{\overline{M}} \alpha_{\overline{m}} \left( Q\left( \mathbf{x}_1^{\overline{m}}, \omega^{\overline{m+1}} \right) - Q\left( \mathbf{x}_1^*, \omega^{\overline{m+1}} \right) \right) + \sum_{\overline{m}=0}^{\overline{M}} \left( \alpha_{\overline{m}} c_1 \right)^2 / b \tag{28}$$

We take the expectation of both sides; for the first term on the right-hand side, we take the conditional expectation with respect to $\mathcal{H}_{\overline{m}}$ and then the expectation over all $\mathcal{H}_{\overline{m}}$.

$$\begin{split} E\left(T^{\overline{M+1}} - T^0\right) &\le -\sum_{\overline{m}=0}^{\overline{M}} E\left\{ E\left\{ \alpha_{\overline{m}} \left( Q\left( \mathbf{x}_1^{\overline{m}}, \omega^{\overline{m+1}} \right) - Q\left( \mathbf{x}_1^*, \omega^{\overline{m+1}} \right) \right) \,\middle|\, \mathcal{H}_{\overline{m}} \right\} \right\} + E\left\{ \sum_{\overline{m}=0}^{\overline{M}} \frac{\left(\alpha_{\overline{m}} c_1\right)^2}{b} \right\} \\ &\le -\sum_{\overline{m}=0}^{\overline{M}} E\left\{ \alpha_{\overline{m}} \left( Q\left( \mathbf{x}_1^{\overline{m}}, \omega^{\overline{m+1}} \right) - Q\left( \mathbf{x}_1^*, \omega^{\overline{m+1}} \right) \right) \,\middle|\, \mathcal{H}_{\overline{m}} \right\} + \frac{(c_1)^2}{b} \sum_{\overline{m}=0}^{\overline{M}} E\left\{ \alpha_{\overline{m}}^2 \right\} \end{split}$$

We take the limit as $\overline{M} \to \infty$ and use the finiteness of $T^{\overline{M}}$ and $\sum_{\overline{m}=0}^{\overline{M}} E\left\{\alpha_{\overline{m}}^2\right\}$ to obtain

$$\sum_{\overline{m}=0}^{\overline{M}} E\left\{ \alpha_{\overline{m}}\left(Q\left(\mathbf{x}_1^{\overline{m}}, \omega^{\overline{m+1}}\right) - Q\left(\mathbf{x}_1^*, \omega^{\overline{m+1}}\right)\right) \,\middle|\, \mathcal{H}_{\overline{m}}\right\} < \infty \tag{29}$$

Given that $Q\left(\mathbf{x}_1^{\overline{m}}, \omega^{\overline{m}+1}\right) - Q\left(\mathbf{x}_1^*, \omega^{\overline{m}+1}\right) \ge 0$ and $\sum_{\overline{m}=0}^{\infty} \alpha_{\overline{m}} = \infty$ (a.s.), there exists a subsequence $\{\overline{m}\}$ such that

$$Q\left(\mathbf{x}\_1^{\overline{m}}\right) \to Q\left(\mathbf{x}\_1^\*\right) \quad a.s.$$

By continuity of *Q*, the sequence converges. Hence,

$$x\_1^{\overline{m}} \to x\_1^\* \quad a.s.$$

Subsequently, we construct another subsequence $\left\{\mathbf{x}\_1^{\overline{m+1}-1}\right\}$. Based on Eq. (27),

$$E\left(T^{\overline{M+2}-1} - T^{\overline{M+1}-1}\right) \le -\sum\_{\overline{m}=0}^{\overline{M}} \alpha\_{\overline{m}} \left( Q\left(\mathbf{x}\_1^{\overline{m+1}-1}, \omega^{\overline{m+2}-1}\right) - Q\left(\mathbf{x}\_1^\*, \omega^{\overline{m+2}-1}\right) \right) + \left(\alpha\_{\overline{m}} c\_1\right)^2/b$$

Likewise, the following convergence can be proved:

$$x\_1^{\overline{m}-1} \to x\_1^\* \quad a.s.$$

By an analogous argument, a general subsequence $\left\{x\_1^i\right\}$, $i \in \left[\overline{m}, \overline{m+1}-1\right]$, will almost surely converge to $x\_1^\*$. Here, we term this type of subsequence $X\_s$.

In the procedure of ProSHLA, the number of all update iterations is finite. Thus, for any subsequence of $\left\{x\_1^k\right\}$, we can obtain a subsequence that always belongs to $X\_s$. Then, the following conclusion can be obtained:

$$\mathbf{x}\_1^{\mathbf{k}} \to \mathbf{x}\_1^\* \quad \text{a.s.}$$

**Scenario 2:** ProSHLA halts in a given update process.

For the second scenario, ProSHLA halts in a projection procedure which generates a convergent sequence.

Hence, the conclusion of Theorem 1 is finally obtained. □

The above analysis demonstrates the convergence property of ProSHLA. According to the above results, we require the function $\widehat{Q}(x)$ to be piecewise linear convex. However, practitioners are often interested in a practical scenario in which separable functions are used to approximate the expected recourse function for stochastic programs with network recourse. Based on Eq. (4), if the separable functions are piecewise linear or purely linear, then practitioners can easily solve this network recourse problem, because a pure network flow problem is polynomially solvable. We discuss this special practical scenario in the following section.

#### **3. Application for two-stage stochastic programs with network recourse using separable piecewise linear functions**

In this section, we discuss the scenario where $\widehat{Q}(x)$ is separable for two-stage stochastic programs with network recourse. In this scenario, we can simply implement ProSHLA without the projection step. We denote this simplified version as the Stochastic Hybrid Learning Algorithm (SHLA), which is described in **Figure 2**.

Essentially, SHLA is not convergent in general. However, when applied to two-stage stochastic programs with network recourse, SHLA enjoys several merits: (1) the solution of *Q*(*x,ω*) is naturally integer; and (2) at each iteration, problem *Q*(*x,ω*) is a simple network flow problem that can be solved by a polynomial algorithm.
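To make the iteration concrete, the following minimal sketch (our own illustration, not the authors' implementation; the newsvendor-style recourse `Q(x, w) = p * max(w - x, 0)` and all parameter values are assumptions) keeps a separable piecewise linear approximation as one slope per grid segment of width *δ* and smooths each sampled stochastic subgradient into the slope of the segment visited at that iteration:

```python
import random

random.seed(0)

# Toy SHLA-style update: a separable piecewise linear approximation of
# the expected recourse E[Q(x, w)] is stored as one slope per grid
# segment [s*delta, (s+1)*delta). Assumed toy recourse:
# Q(x, w) = p * max(w - x, 0), whose subgradient in x is -p when x < w
# and 0 otherwise.
K, delta, p = 20, 1.0, 4.0
slopes = [float(K - s) for s in range(K)]    # arbitrary starting slopes
visits = [0] * K

for _ in range(5000):
    w = random.randint(0, K)                 # sampled demand scenario
    x = random.uniform(0.0, K * delta)       # visited first-stage point
    s = min(int(x / delta), K - 1)           # active segment of x
    visits[s] += 1
    a = 1.0 / visits[s]                      # stepsize alpha_k
    g = -p if x < w else 0.0                 # stochastic subgradient
    slopes[s] = (1.0 - a) * slopes[s] + a * g  # smooth g into the slope

# On segment [5, 6), the true slope of E[Q(x, w)] is -p * P(w >= 6)
# for demand uniform on {0, ..., 20}.
exact = -p * 15 / 21
print(round(slopes[5], 2), round(exact, 2))
```

With the per-segment stepsize 1/(number of visits), each slope converges to the sample average of the subgradients observed on its segment, which is the sense in which the approximation tracks the expected recourse function.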

Here, if we use separable functions, then assumption (A.6) can be satisfied by the following artificial expression:

$$
\widehat{q}\_i^0(x\_i) < \widehat{q}\_i^0(x\_i + \delta).
$$

Note that both ProSHLA and SHLA allow the initial approximation function to be chosen flexibly with different values of *δ*. Thus, if *δ* is set to 1 for any *i*, then we can guarantee the following expression:

$$
\widehat{q}\_i^0(x\_i) < \widehat{q}\_i^0(x\_i + \delta). \tag{A.7}
$$

**Figure 2.** *Description of SHLA; at iteration k, the piecewise linear approximation is updated as* $Q^{k+1}(x) = Q^k(x) + \alpha\_k \left( g^k - \delta^k(x) \right)^{\mathrm{T}} x$.

Then, we can reach Theorem 3 below.

**Theorem 3.** If (A.7) is satisfied, SHLA is always convergent for two-stage stochastic programs with network recourse.

**Proof.** For any *x*, *y* ∈ *X*, if they are unequal, we can obtain the following expression according to (A.7):

$$
\widehat{q}^k(x) \neq \widehat{q}^k(y).
$$

Thus,

$$\left|\widehat{q}^k(x) - \widehat{q}^k(y)\right| > 0$$

Hence, if we set *δ* = 1 and apply ProSHLA to two-stage stochastic programs with network recourse, then ProSHLA can drop the projection step because $\widehat{q}^k(x)$ and $\widehat{q}^k(y)$ are always unequal. In this situation, ProSHLA is equivalent to SHLA, so SHLA is convergent. □

According to the above analysis, we have provided the first theoretical convergence support for SHLA-type algorithms, which are widely used in the numerous applications mentioned in the introduction. Compared with SHAPE, SHLA does not contain any nonlinear terms, so it can be very efficient. Besides, SHLA automatically maintains the convexity of the approximation function if the initial piecewise linear functions are properly constructed.

#### **4. Experimental results for performance analysis**

In this section, we use two experimental designs to evaluate the performance of the algorithms: (1) an empty container repositioning problem, which arises in the context of two-stage stochastic programs with network recourse; and (2) a high-dimensional resource allocation problem as an extension experiment. The empty container repositioning problem is first introduced, and then we present the efficiency of ProSHLA and SHLA. Subsequently, we present the convergence of ProSHLA and SHLA, examine how *δ* affects convergence performance, and compare the performance under different distributions of random demands. Finally, an extension experiment on a high-dimensional resource allocation problem is conducted to evaluate the efficiency of our algorithms.

#### **4.1 Problem generator for the empty container repositioning problem**

In this subsection, we test our algorithms on an empty container repositioning problem faced by a major Chinese freight forwarder, who needs to manage numerous empty containers in a fixed-route port network located in the Pearl River Delta [23]. The port network contains several hubs (large ports) and spokes (small ports), and the demand for empty containers is usually uncertain. When the forwarder needs to decide the quantity of empty containers to ship from one port to another, the exact future demand for containers is unknown [24, 25]. Thus, we can formulate the problem as a two-stage stochastic program with network recourse. Before we formally introduce the problem, we present the following notation.


Then, the problem can be formulated as follows:

$$\min \sum\_{i \in L} \sum\_{j \in L} \left\{-r\_{ij} x\_{ij} + c\_{ij} y\_{ij}\right\} + E\_\omega[Q(\mathbf{x}, \omega)],\tag{30}$$

s.t.,

$$\sum\_{j \in L} \left\{ x\_{ij} + y\_{ij} \right\} = s\_i, \qquad \forall i \in L \tag{31}$$

$$\sum\_{i \in L} \left\{ x\_{ij} + y\_{ij} \right\} = s\_j, \qquad \forall j \in L \tag{32}$$

$$y\_{ij} \ge 0, \ \forall i, j \in L \tag{33}$$

where the recourse function $Q(\mathbf{x}, \omega)$ is given as follows:

$$Q(\mathbf{x}, \omega) = \min \sum\_{i \in L} \sum\_{j \in L} \left\{-r\_{ij} x\_{ij}(\omega) + c\_{ij} y\_{ij}(\omega)\right\} \tag{34}$$

s.t.,

$$\sum\_{j \in L} \left\{ x\_{ij}(\omega) + y\_{ij}(\omega) \right\} = s\_i, \qquad \forall i \in L \tag{35}$$

$$\sum\_{i \in L} \left\{ x\_{ij}(\omega) + y\_{ij}(\omega) \right\} = s\_j, \qquad \forall j \in L \tag{36}$$

$$y\_{ij}(\omega) \ge 0, \ \forall i, j \in L \tag{37}$$

In order to evaluate the algorithm, a set of problem instances is created. The problem generator places the ports of *L* in a 100-mile by 100-mile rectangle, and we simply use the Euclidean distance between each pair of ports as the corresponding travel distance. We set the holding cost for a demand to 15 cents per time instance, the net profit for a demand to 500 cents per mile, and the cost of an empty movement to 40 cents. The demand $D\_{ij}$ between locations *i* and *j* is set as follows:

$$D\_{ij} = out\_j \cdot in\_i \cdot v,$$

where

*outj* = outbound potential for port *j*;

*ini* = inbound potential for port *i*;

*v* = random variable.

The outbound and inbound potentials for each port represent the capability of the location to generate outbound demand or attract inbound containers. In the generator, we draw the inbound potential $in\_i$ for port *i* uniformly between 0.2 and 1.8, while the corresponding outbound potential is set as $out\_j = 2 - in\_j$. The reason for this setting is that in real-world regions, a port with large inbound flows usually exhibits small outbound flows. We also include a random variable *v* with mean 30, the typical daily demand between each pair of locations, to capture the randomness in demand. In order to test the performance of the algorithms under different distributions, we evaluate the performance under exponential, normal, and uniform distributions. We set the stepsize $\alpha\_k$ to 1*/k*.
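The generator described above can be sketched as follows (an illustrative reconstruction; the function name `generate_instance` and the use of an exponential draw for *v* are our assumptions — the experiments also use normal and uniform draws):

```python
import math
import random

random.seed(1)

# Illustrative reconstruction of the instance generator: ports in a
# 100 x 100 square, Euclidean travel distances, inbound potentials
# uniform on [0.2, 1.8] with out_j = 2 - in_j, and demands
# D_ij = out_j * in_i * v with E[v] = 30.
def generate_instance(n_ports, mean_demand=30.0):
    xy = [(random.uniform(0, 100), random.uniform(0, 100))
          for _ in range(n_ports)]
    dist = [[math.dist(xy[i], xy[j]) for j in range(n_ports)]
            for i in range(n_ports)]
    inbound = [random.uniform(0.2, 1.8) for _ in range(n_ports)]
    outbound = [2.0 - b for b in inbound]

    def sample_demand():
        # one choice of v; normal or uniform draws are used likewise
        v = random.expovariate(1.0 / mean_demand)
        return [[outbound[j] * inbound[i] * v for j in range(n_ports)]
                for i in range(n_ports)]

    return dist, sample_demand

dist, sample_demand = generate_instance(5)
D = sample_demand()
print(len(D), len(D[0]))   # a 5 x 5 demand matrix
```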

We solve a deterministic network flow problem to construct the initial piecewise linear functions as described in [1], replacing the random demands by their mean values in the deterministic problem. Then, we can obtain $S = \left(S\_1, S\_2, \ldots, S\_n\right)$. For each $i \in L$, we generate the initial approximation function $\widehat{Q}\_i^0(x) = c\left(x - S\_i\right)^2$, $x = 0, \delta, \ldots, k\delta, \ldots, K\delta$, where *c* is a positive parameter and $x \in [0, K\delta]$. In the projection step, a least-squares problem is solved as follows:

$$\pi^{k+1} = \operatorname{argmin}\_{\pi \in X} \left( \pi - \left( \pi^k + \alpha\_k g^k(\pi^k) \right) \right)^2.$$
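When *X* is the set of slope vectors with nondecreasing components (so that the piecewise linear approximation stays convex), such a least-squares projection is an isotonic regression and can be computed by the standard pool-adjacent-violators procedure. The sketch below is a generic implementation of that procedure, not code from the chapter:

```python
# Least-squares projection of a slope vector onto the nondecreasing
# cone {pi : pi[0] <= pi[1] <= ...} via pool-adjacent-violators.
def project_nondecreasing(z):
    # Each block stores [sum, count]; adjacent blocks whose means
    # violate monotonicity are merged, and every element of a merged
    # block takes the block mean (its least-squares value).
    blocks = []
    for v in z:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

print(project_nondecreasing([1.0, 3.0, 2.0, 5.0]))  # [1.0, 2.5, 2.5, 5.0]
```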

#### **4.2 Effectiveness and efficiency performance**

To test the efficiency of the algorithm, we use a myopic algorithm, a posterior bound (PB), the L-shaped algorithm [12], and the inexact cut algorithm [15] as benchmarks. The myopic algorithm simply solves a static deterministic assignment problem at the current stage while ignoring uncertainties in the second stage. To obtain PB, it is necessary to solve a deterministic network flow problem with all realized demands. Note that such a posterior optimization involves no uncertainty, since decisions are allowed to anticipate future demand; thus, the cost of PB is the lowest and normally unreachable. As for the L-shaped algorithm and the inexact cut algorithm, a group of linear programming problems with valid cuts must be solved.

We use 8 instances, in which the number of empty containers ranges from 400 to 3200, and the corresponding number of ports ranges from 5 to 40. For each instance, 2000 samples are implemented, and we obtain the solutions of the myopic algorithm and the sample means of PB, the inexact cut algorithm, the L-shaped algorithm, SHLA and ProSHLA. For SHLA, two classes of initial functions with *δ* = 1 and *δ* = 2 are selected, whereas we select *δ* = 2 for ProSHLA.

We show the experimental results on total cost in **Table 1**. In **Table 1**, column 1 presents the number of ports, and column 2 shows the number of empty containers. The PB bounds are contained in column 3. Columns 4–9 contain the solutions achieved by the myopic algorithm, the L-shaped algorithm, the inexact cut algorithm, SHLA-1, SHLA-2 and ProSHLA, respectively. The table clearly demonstrates that the inexact cut algorithm, the L-shaped algorithm, SHLA, and ProSHLA achieve optimal or very-near-optimal solutions, which are closer to the PB (lowest) bounds than those of the myopic method. Moreover, the solutions of the L-shaped algorithm are the best solutions known; they are slightly better than those of the inexact cut algorithm because the latter produces valid cuts that are inexact in the sense that they are not as constraining as the optimality cuts in the L-shaped algorithm. In addition, SHLA (*δ* = 1) outperforms SHLA (*δ* = 2) and ProSHLA (*δ* = 2), because a small *δ* leads to good performance; a specific discussion of the impact of *δ* follows later. The performance of ProSHLA (*δ* = 2) is slightly better than that of SHLA (*δ* = 2), because the projection steps in ProSHLA help improve the solution. Since the speed of convergence is quite important in practical problems, we focus next on the computational time of the different algorithms, shown in **Table 2** below.

#### **Table 2.**

*Computational time of ProSHLA and SHLA.*

As shown in **Table 2**, ProSHLA and SHLA are more efficient than the inexact cut algorithm and the L-shaped algorithm because they can exploit the network structure while using stochastic subgradients to approximate the recourse function. The inexact cut algorithm and the L-shaped algorithm are time-consuming because there are 2000 samples, which corresponds to a very large number of cuts for both algorithms. It can also be observed that the computational time of the inexact cut algorithm is smaller than that of the L-shaped algorithm, because the optimality cuts in the L-shaped algorithm are more expensive to generate than the valid cuts in the inexact cut algorithm. Moreover, the computational time of SHLA (*δ* = 1) is almost equal to that of SHLA (*δ* = 2), which reveals that the computational time is not affected by the choice of *δ*. On the contrary, ProSHLA (*δ* = 2) requires more computational time than SHLA (*δ* = 2), because the projection step in ProSHLA is time-consuming. In the following text, we focus on the convergence performance of SHLA and ProSHLA. Thus, only the results of the myopic algorithm, PB, ProSHLA and SHLA are presented, and we use the solutions of the myopic algorithm and PB as the upper and lower bounds, respectively.

#### **4.3 Analysis of convergence performance**

In this subsection, a set of experiments is conducted to evaluate the convergence performance of SHLA and ProSHLA, and we choose the second instance (*NR* = 800 and *NL* = 10) as the experimental illustration. The range of the sample number is set from 20 to 640, and we record the result of each combination of *NR* and *NL* at each iteration. We can see from **Figure 3** that the convergence rate of SHLA-1 and ProSHLA is remarkably high.

**Figure 3.** *Convergence rate of ProSHLA and SHLA.*

**Figure 4.** *Gaps to PB for various δ.*

**Figure 5.** *Comparison of ProSHLA and SHLA for various δ.*

To further evaluate how *δ* affects the algorithms' convergence performance, a set of computational experiments is conducted. We increase *δ* from 1 to 16 and the number of samples from 20 to 640, which yields many combinations of *δ* and the number of samples. We record the sample means of the solutions of SHLA and ProSHLA, PB and the myopic method for each combination, and show 3D plots of the solutions in **Figures 4** and **5**. As shown in **Figure 4**, the layers of ProSHLA and SHLA are extremely close to the PB layer, which implies that ProSHLA and SHLA converge rapidly for various *δ*. Furthermore, it can be seen that the performance of ProSHLA slightly exceeds that of SHLA. In order to further investigate the difference between SHLA and ProSHLA, we show the performance of ProSHLA and SHLA separately in **Figure 5** (without the myopic algorithm and PB). As described in **Figure 5**, the choice of *δ* affects the performance of ProSHLA and SHLA, and a small *δ* usually leads to a good solution.

We provide more details on the convergence performance of ProSHLA and SHLA for various *δ* in **Table 3** below, which clearly demonstrates that with a small *δ*, the performance of SHLA and ProSHLA is close to that of PB.


**Table 3.**

*Performance under various δ (no. of samples is 2000).*

#### **4.4 An extension experiment on a high dimensional resource allocation problem**

Due to the limitations of the container setting, an extension experiment on a higher-dimensional problem is considered in this subsection. In this problem, there exist several retailers *R* and many production facilities (with warehouses) *L*. In stage 1, an amount $x\_{ij}$ is moved from production facility *i* to a warehouse or retailer location *j* before the retail demand is realized. Once the consumers' demand is known, $y\_{ij}$ products are moved from production facility *i* to retailer location *j*. Moreover, the types of consumer demand at each location *j* differ; we denote a type by $t \in T$, set the consumer demand of type *t* at location *j* as $D\_j^t$, and provide $p\_i^t$ units of type *t* at production location *i*. We denote the production capacity of location *i* by $cap\_i$. This problem is a non-separable problem.

Subsequently, we formulate the problem as follows:

$$\min \sum\_{i \in L} \sum\_{j \in L \cup R} c\_{ij}^1 x\_{ij} + E\_\omega[Q(\mathbf{x}, \omega)] \tag{38}$$

subject to

$$\sum\_{j \in L \cup R} x\_{ij} \le cap\_i, \qquad \forall i \in L \tag{39}$$

$$\sum\_{i \in L} x\_{ij} = s\_j, \qquad \forall j \in L \cup R \tag{40}$$

$$x\_{ij}, s\_j \ge 0, \qquad \forall i \in L, \forall j \in L \cup R \tag{41}$$

where the recourse function $Q(\mathbf{x}, \omega)$ is given as follows:

$$Q(\mathbf{x}, \omega) = \min \sum\_{i \in L \cup R} \sum\_{j \in R} c\_{ij}^2 y\_{ij} - \sum\_{i \in R} \sum\_{t \in T} r\_i^t p\_i^t \tag{42}$$

subject to

$$\sum\_{j \in R} y\_{ij} = s\_i, \qquad \forall i \in L \cup R \tag{43}$$

$$\sum\_{i \in L \cup R} y\_{ij} = \sum\_{t \in T} p\_j^t, \qquad \forall j \in R \tag{44}$$

$$p\_j^t \le D\_j^t(\omega), \qquad \forall j \in R, \forall t \in T \tag{45}$$

In the first stage, we set $c\_{ij}^1 = c\_0^1 + c\_1^1 d\_{ij}$, where $d\_{ij}$ is the Euclidean distance between locations *i* and *j*, $c\_0^1$ is the production cost for each product, and $c\_1^1$ is the transportation cost per mile. For the second-stage costs, we set

$$c\_{ij}^2 = \begin{cases} c\_1^2 d\_{ij} & \text{if } i \in L \text{ or } i = j \\ c\_0^2 + c\_1^2 d\_{ij} & \text{if } i \in R \text{ and } i \neq j \end{cases}$$

where $c\_1^2$ is the transportation cost per mile in the second stage, and $c\_0^2$ represents the fixed charge for moving each product from one retailer location to another. For one unit of demand of type *t* occurring at retailer location *i*, a revenue $r\_i^t$ is obtained. Our problem instances differ in the number of products and $|L \cup R|$, which determines the dimensionality of the recourse function.
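The two-branch cost rule above can be written as a small helper (illustrative only; the parameter values `c0` and `c1` are arbitrary placeholders):

```python
# Illustrative helper for the second-stage cost rule; c0 and c1 are
# arbitrary placeholder values, and L is the set of facility indices.
def second_stage_cost(i, j, d_ij, L, c0=2.0, c1=0.5):
    # c^2_ij = c1 * d_ij when shipping from a facility or when i == j;
    # retailer-to-retailer transfers pay an extra fixed charge c0.
    if i in L or i == j:
        return c1 * d_ij
    return c0 + c1 * d_ij

L = {0, 1}
print(second_stage_cost(0, 3, 10.0, L))  # facility origin: 5.0
print(second_stage_cost(2, 3, 10.0, L))  # retailer transfer: 7.0
```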

Similarly, we use the inexact cut algorithm [15] and the L-shaped algorithm [12] as benchmarks; these two algorithms are Benders decomposition-based methods. Since the convergence rate is quite important in practice, our main focus in this part is on the speed of convergence. In order to evaluate the speed of convergence of the different methods, each algorithm is implemented for 40, 160, 640, 1200, and 4000 iterations, and a side-by-side comparison of the algorithms is made as the number of iterations increases. For the L-shaped and inexact cut algorithms, the number of iterations refers to the number of cuts used to approximate the expected recourse function. For ProSHLA (*δ* = 2), the number of iterations refers to the number of demand samples used.

**Table 4** below shows the experimental results on five instances of different dimensionality. For all problem instances, the L-shaped algorithm is used to find the optimal solution. The numbers in the table represent the percent deviation between the objective value obtained after a given number of iterations and the optimal value. The computational time per iteration is also listed in **Table 4**.

In **Table 4**, column 1 presents the number of locations, and column 2 shows the number of products. Column 3 presents the method used in the experiment. The percent deviations from the optimal value are contained in columns 4–8, and column 9 lists the computational time per iteration. According to the results in **Table 4**, ProSHLA is able to obtain high-quality solutions very efficiently for different problem instances, and it maintains consistent performance across problems of different sizes, especially large ones. This characteristic makes ProSHLA promising for large-scale applications. In comparison with the two Benders decomposition-based methods, ProSHLA is competitive for high-dimensional problems, because separable approximations usually scale much more easily to very high dimensional problems. Note that in the first problem instance, when the number of locations is 6 and the number of resources is 10 (the inventory in a location might be 0, 1, or 2), the result of ProSHLA seems to break down, because this problem instance is non-separable, which may introduce errors when we use the separable approximations to approximate the expected recourse function. However, this does not happen on large problems, for which the separable approximations are nearly continuous rather than merely piecewise continuous.

#### **Table 4.**

*Percent error over the optimal solution for different algorithms.*

*Note. Figures represent the deviation from the best objective value known. \*Optimal solution not found.*

According to the above computational results, ProSHLA is a promising method for two-stage stochastic programs, but more comprehensive numerical work is needed before applying it to a particular problem. Owing to its efficiency and simplicity, ProSHLA is a very promising candidate for high-dimensional problems. Moreover, it can be used as an initialization routine for high-dimensional stochastic programming problems, since it can provide high-quality initial feasible solutions.

#### **5. Conclusion**

In this study, we propose an efficient machine learning algorithm for two-stage stochastic programs, termed the projected stochastic hybrid learning algorithm, which consists of stochastic subgradient and piecewise linear approximation methods. We use stochastic subgradients and sample information to update the piecewise linear approximation of the objective function. We then introduce a projection step, which implements the subgradient method, to jump out of local optima and thereby reach a global optimum. Through this innovative projection step, we show the convergence of the algorithm for general two-stage stochastic programs. Furthermore, for the network recourse problem, our algorithm can drop the projection steps while still maintaining the convergence property. The computational results reveal the efficiency of the proposed algorithms and show that they are distribution-free. Furthermore, the convergence rate is affected by the granularity of the initial function (*δ*): small granularity usually leads to a high convergence rate. Finally, the computational results also show that the proposed algorithm is very competitive for high-dimensional problems. Compared with MAT, the proposed algorithm collects information based on knowledge gradients and uses it to update the recourse function through learning steps. It can overcome the "curse of dimensionality" and transfer the problem into a polynomially solvable one.

#### **Acknowledgements**

This research is partially supported by the National Natural Science Foundation of China (Project No. 71701221) and the Natural Science Foundation of Guangdong Province, China (Grant No. 2019A1515011127).

### **Author details**

Zhou Shaorui\*, Cai Ming and Zhuo Xiaopo Sun Yat-sen University, Guangzhou, China

\*Address all correspondence to: zshaorui@gmail.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Cheung RK, Chen CY. A two-stage stochastic network model and solution methods for the dynamic empty container allocation problem. Transportation Science. 1998;**32**(2): 142-162

[2] Bouzaiene-Ayari B, Cheng C, Das S, Fiorillo R, Powell WB. From single commodity to multiattribute models for locomotive optimization: A comparison of optimal integer programming and approximate dynamic programming. Transportation Science. 2016;**50**:366-389

[3] Moreno A, Alem D, Ferreira D, Clark A. An effective two-stage stochastic multi-trip locationtransportation model with social concerns in relief supply chains. European Journal of Operational Research. 2018;**269**(3):1050-1071

[4] Kim K, Mehrotra S. A two-stage stochastic integer programming approach to integrated staffing and scheduling with application to nurse management. Operations Research. 2015;**63**:1431-1451

[5] Wallace SW, Ziemba WT. Applications of stochastic programming. In: MOS-SIAM Series on Optimization. Vol. 5. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM); Mathematical Programming Society (MPS); 2005. ISBN:0-8971-555-5

[6] Kleywegt AJ, Shapiro A, Homem-de-Mello T. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization. 2001;**12**(2):479-502

[7] Ermoliev Y. Stochastic quasigradient methods. In: Numerical Techniques for Stochastic Optimization. New York: Springer-Verlag; 1988

[8] Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics. 1951;**22**(3): 400-407

[9] Rockafellar RT, Wets RJ-B. A note about projections in the implementation of stochastic quasigradient methods. In: Numerical Techniques for Stochastic Optimization, Springer Ser. Comput. Math. Vol. 10. Berlin: Springer; 1988. pp. 385-392

[10] Ruszczyñski A. A linearization method for nonsmooth stochastic optimization problems. Mathematics of Operations Research. 1987;**12**:32-49

[11] Benders JF. Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik. 1962;**4**(1):238-252

[12] Van Slyke RM, Wets RJ-B. L-shaped linear programs with applications to optimal control and stochastic programming. SIAM Journal on Applied Mathematics. 1969;**17**(4):638-663

[13] Pereira MVF, Pinto LMVG. Multistage stochastic optimization applied to energy planning. Mathematical Programming. 1991;**52**:359-375

[14] Zakeri G, Philpott AB, Ryan DM. Inexact cuts in Benders decomposition. SIAM Journal on Optimization. 2000;**10**(4):643-657

[15] Shapiro A. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research. 2011;**209**(1):63-72

[16] Rebennack S. Combining sampling-based and scenario-based nested Benders decomposition methods: Application to stochastic dual dynamic programming. Mathematical Programming. 2016;**156**(1):343-389

[17] Philpott AB, Guan Z. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters. 2008;**36**: 450-455

[18] Girardeau P, Leclere V, Philpott AB. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research. 2015;**40**(1): 130-145

[19] Cheung RK, Powell WB. SHAPE—A stochastic hybrid approximation procedure for two-stage stochastic programs. Operations Research. 2000; **48**(1):73-79

[20] Powell WB, Ruszczyñski A, Togaloglu H. Learning algorithms for separable approximation of discrete stochastic optimization problems. Mathematics of Operations Research. 2004;**29**(4):814-836

[21] Godfrey GA, Powell WB. An adaptive dynamic programming algorithm for dynamic fleet management I: Single period travel times. Transportation Science. 2002;**36**(1):21-39

[22] Neveu J. Discrete Parameter Martingales. Amsterdam: North Holland; 1975

[23] Song DP, Dong JX. Empty container management in cyclic shipping routes. Maritime Economics & Logistics. 2008; **10**(4):335-361

[24] Zhou S, Zhang H, Shi N, Xu Z, Wang F. A new convergent hybrid learning algorithm for two-stage stochastic programs. European Journal of Operational Research. 2020;**283**(1): 33-46

[25] Xu L, Zou Z, Zhou S. The influence of COVID-19 epidemic on BDI volatility: An evidence from GARCH-MIDAS model. Ocean & Coastal Management. 2022. DOI: 10.1016/j.ocecoaman.2022.106330
