**5. Minimization of devices' energy consumption in UAV-assisted IoT network**

The devices at the cell edge consume high energy to achieve the required data rate when transmitting data to the nearest BS because of the large LoS distance between BSs and those devices. Alternatively, a quad-rotor UAV-assisted IoT network could provide more reliable communication than fixed terrestrial BSs alone. Therefore, in this section, we aim to find the optimal UAV trajectory and IoT device association that together support energy-efficient data collection.

*Optimal Unmanned Aerial Vehicle Control and Designs for Load Balancing in Intelligent… DOI: http://dx.doi.org/10.5772/intechopen.110312*

**Figure 8.** *Illustration of UAV-assisted IoT network.*

#### **5.1 System model**

**Figure 8** illustrates the UAV-assisted IoT network, in which $M$ terrestrial BSs with a fixed height of $H_B$ and a single UAV collect data from $K$ stationary, uniformly distributed IoT devices. The UAV flies at a fixed altitude $H_u$ with a constant speed of $\vartheta_u$, and its start and end locations are represented by $U_S = (x_s, y_s, H_u)$ and $U_E = (x_e, y_e, H_u)$, respectively. To track the UAV's location at each instant, we discretize its flight period into $N$ equally spaced time slots, each of duration $T_s$, and assume that the UAV's location in the $n$th time slot, $U[n] = (x[n], y[n], H_u)$, $\forall n \in \mathcal{N} = \{1, 2, \dots, N\}$, is constant. All devices transmit at least $\mathcal{D}_{\text{Min}}$ bits of data to the core network to maintain reliable QoS.
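The time discretization can be illustrated with a minimal Python sketch. It assumes that the UAV covers at most $\vartheta_u T_s$ meters per slot and that $U[1] = U_S$ and $U[N] = U_E$, so only $N - 1$ moves are available. The function names `min_slots_needed` and `straight_waypoints` are hypothetical, and the straight-line path is only a baseline, not the learned trajectory.

```python
import math

def min_slots_needed(start, end, speed, slot_s):
    """Smallest N such that N - 1 moves of at most speed * slot_s reach `end`."""
    distance = math.hypot(end[0] - start[0], end[1] - start[1])
    return math.ceil(distance / (speed * slot_s)) + 1

def straight_waypoints(start, end, n_slots):
    """Horizontal UAV positions U[1..N] on a straight line (baseline only)."""
    return [
        (start[0] + (end[0] - start[0]) * i / (n_slots - 1),
         start[1] + (end[1] - start[1]) * i / (n_slots - 1))
        for i in range(n_slots)
    ]

# Illustrative values (assumptions): 500 m apart, 10 m/s, 1 s slots.
n_min = min_slots_needed((0, 0), (300, 400), speed=10, slot_s=1)
path = straight_waypoints((0, 0), (300, 400), n_slots=6)
```

Any flight period shorter than `n_min` slots cannot satisfy the start and end conditions at the given speed.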

#### *5.1.1 Data collection of core network*

The transmission environment is categorized into two scenarios, i.e., ground to ground (G2G) and ground to air (G2A) channels. G2G channel establishes the links between BS and IoT devices, whereas G2A channel connects the IoT devices with the UAV platform. We generalize the wireless channel gain between each device and its destination (either UAV or BS) at each time slot as the combination of large-scale path loss and small-scale fading. The channel gain between each device and its destination can be modeled as [39]

$$h_k^i[n] = g_k^i[n] \sqrt{L_k^i[n]}, \quad \forall k \in \mathcal{K} = \{1, 2, \dots, K\} \tag{51}$$

where $i \in \{B, U\}$ is the destination indicator, in which $B$ and $U$ represent the nearest BS and the UAV, respectively, $L_k^i[n]$ is the large-scale path loss, and $g_k^i[n]$ is the small-scale fading coefficient. The achievable instantaneous transmission rate of the $k$th IoT device can be formulated as [40]

$$R_k^i[n] = C_B \log_2 \left[ 1 + \frac{\left| h_k^i[n] \right|^2 P_k^i[n]}{\eta_0} \right] \tag{52}$$

where $C_B$ is the channel bandwidth, $P_k^i[n]$ is the transmit power of the $k$th device, and $\eta_0$ is the noise power. The instantaneous data transmitted by the $k$th device over the G2G and G2A channels is measured as $\mathcal{D}_k^B[n] = R_k^B[n] T_s$ and $\mathcal{D}_k^U[n] = R_k^U[n] T_s$, respectively. The energy consumption of device $k$ in the $n$th time slot can be calculated as

$$E_k[n] = \left(I_k^U[n]P_k^U[n] + I_k^B[n]P_k^B[n]\right)T_s, \quad \forall k \in \mathcal{K} \tag{53}$$

where $P_k^U[n]$ and $P_k^B[n]$ are the instantaneous transmit powers of the $k$th device when connecting with the UAV and the BS, respectively, and $I_k^U[n], I_k^B[n] \in \{0, 1\}$ are the binary device association indicators for the UAV and the BS, respectively. The data that the $k$th device transmits to the core network during each time slot is measured as

$$\mathcal{D}_{k}[n] = I_{k}^{U}[n]\mathcal{D}_{k}^{U}[n] + I_{k}^{B}[n]\mathcal{D}_{k}^{B}[n], \quad \forall k \in \mathcal{K}, n \in \mathcal{N} \tag{54}$$
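As a rough illustration of (51) to (54), the following Python sketch evaluates the rate, energy, and transmitted data of one device in one time slot. It assumes that $L_k^i[n]$ is given as a linear power gain and that all quantities are in SI units; the function names and the numerical values are hypothetical and are not taken from the simulation setup.

```python
import math

def channel_gain_sq(fading: complex, path_loss: float) -> float:
    """|h|^2 for h = g * sqrt(L), following (51)."""
    return abs(fading) ** 2 * path_loss

def rate_bps(gain_sq, tx_power_w, bandwidth_hz, noise_w):
    """Achievable rate in bit/s, following (52)."""
    return bandwidth_hz * math.log2(1.0 + gain_sq * tx_power_w / noise_w)

def slot_energy_and_data(i_u, i_b, p_u, p_b, gain_sq_u, gain_sq_b,
                         bandwidth_hz, noise_w, slot_s):
    """Energy (53) in joules and transmitted data (54) in bits for one slot."""
    energy = (i_u * p_u + i_b * p_b) * slot_s
    data = (i_u * rate_bps(gain_sq_u, p_u, bandwidth_hz, noise_w)
            + i_b * rate_bps(gain_sq_b, p_b, bandwidth_hz, noise_w)) * slot_s
    return energy, data

# Illustrative values (assumptions): device associated with the UAV only.
g_u = channel_gain_sq(1 + 0j, 1e-8)
g_b = channel_gain_sq(1 + 0j, 1e-10)
energy, data = slot_energy_and_data(1, 0, 0.1, 0.0, g_u, g_b,
                                    bandwidth_hz=1e6, noise_w=1e-9, slot_s=0.5)
```

With these values the signal-to-noise ratio is approximately one, so the device delivers about $C_B T_s$ bits in the slot while spending $P_k^U[n] T_s$ joules.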

#### *5.1.2 Problem formulation*

We aim for energy-efficient data collection that jointly exploits reliable data transmission, the optimal instantaneous position of the UAV, and transmit power control. Fluctuations in channel gain cause unstable network performance and quickly drain the devices' on-board batteries. Thus, to minimize the total energy consumption of all devices, we jointly optimize the UAV's trajectory, the device association indicators, and the transmit power allocation, while ensuring that each device transmits a minimum amount of data to its destination and that the UAV maintains a constant speed along its trajectory between the initial and final locations. Therefore, the optimization problem is formulated as

$$\mathbf{P1}: \underset{\{(x[n], y[n])\},\, I_{k}^{U}[n],\, I_{k}^{B}[n],\, P_{k}^{U}[n],\, P_{k}^{B}[n]}{\text{Minimize}} \ \sum_{n=1}^{N} \sum_{k=1}^{K} \left( I_{k}^{U}[n]P_{k}^{U}[n] + I_{k}^{B}[n]P_{k}^{B}[n] \right) T_{s} \tag{55}$$

Subject to the constraints

$$\mathbf{C1}: I_k^U[n] \mathcal{D}_k^U[n] + I_k^B[n] \mathcal{D}_k^B[n] \ge \mathcal{D}_{\text{Min}}, \quad \forall k \in \mathcal{K}, n \in \mathcal{N} \tag{56}$$

$$\mathbf{C2}: I_k^U[n] \in \{0, 1\}, \ I_k^B[n] \in \{0, 1\}, \quad \forall k \in \mathcal{K}, n \in \mathcal{N} \tag{57}$$

$$\mathbf{C3}: I_k^U[n] + I_k^B[n] \le 1, \quad \forall k \in \mathcal{K}, n \in \mathcal{N} \tag{58}$$

$$\mathbf{C4}: \sum_{k=1}^{K} I_k^U[n] \le K, \quad \forall n \in \mathcal{N} \tag{59}$$

$$\mathbf{C5}: U[1] = U_{S}, \ U[N] = U_{E} \tag{60}$$

Here, C1 ensures that each device transmits at least $\mathcal{D}_{\text{Min}}$ bits of data to either the UAV or the nearest BS in each time slot. C2 defines the binary device association indicators. C3 ensures that each device associates with only one destination, either the UAV or the nearest BS, in each time slot. C4 implies that the UAV can associate with at most $K$ devices simultaneously, and C5 guarantees that the UAV starts its trajectory from a given initial position and ends at the predefined final location. The optimization problem contains multiple coupled variables, so changing the value of one may affect the others. Furthermore, the discrete association variables make the problem highly non-convex, which complicates finding a trajectory between the start and end points within the limited flight time. Hence, standard optimization methods face difficulties in obtaining exact solutions. To tackle this, we propose an RL framework with an adaptive decision-making policy to find the UAV's successive locations and the device association along with the transmit power allocation. We adopt the SARSA algorithm to control the UAV, which acts as an RL agent that takes the optimal action at each step to maximize its reward.
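To make the constraint set concrete, the following Python sketch checks a candidate solution against C1 to C5. It is only an illustrative feasibility test under assumed data structures: the indicators and per-slot data are nested lists indexed as `[k][n]`, the path is a list of UAV positions, and the function name `check_constraints` is hypothetical.

```python
def check_constraints(i_u, i_b, d_u, d_b, d_min, path, start, end):
    """Return a dict telling which of C1..C5 a candidate solution satisfies."""
    n_devices, n_slots = len(i_u), len(path)
    pairs = [(k, n) for k in range(n_devices) for n in range(n_slots)]
    return {
        # C1: minimum data per device and slot
        "C1": all(i_u[k][n] * d_u[k][n] + i_b[k][n] * d_b[k][n] >= d_min
                  for k, n in pairs),
        # C2: binary association indicators
        "C2": all(i_u[k][n] in (0, 1) and i_b[k][n] in (0, 1) for k, n in pairs),
        # C3: at most one destination per device and slot
        "C3": all(i_u[k][n] + i_b[k][n] <= 1 for k, n in pairs),
        # C4: the UAV serves at most K devices per slot
        "C4": all(sum(i_u[k][n] for k in range(n_devices)) <= n_devices
                  for n in range(n_slots)),
        # C5: fixed start and end positions
        "C5": path[0] == start and path[-1] == end,
    }

# Illustrative two-device, two-slot candidate (all values are assumptions).
result = check_constraints(
    i_u=[[1, 0], [0, 0]], i_b=[[0, 1], [1, 1]],
    d_u=[[12.0, 0.0], [0.0, 0.0]], d_b=[[0.0, 10.0], [11.0, 9.0]],
    d_min=10.0, path=[(0, 0), (5, 5)], start=(0, 0), end=(5, 5),
)
```

In this example the second device delivers only 9 units in the second slot, so `result["C1"]` is `False` while the remaining constraints hold.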

#### **5.2 Reinforcement learning based on SARSA algorithm**

As discussed earlier in Section 4.3, the RL framework follows an MDP, where the current state depends only on the immediately preceding state, and the UAV, acting as the RL agent, chooses an action according to the $\epsilon$-greedy policy. Here, the generated reward depends on the UAV's current state and the action taken at each time slot. The expected trajectory is obtained more precisely when the reward generated by the UAV at the current time slot is beneficial in the long term. To reflect this property, we model the instantaneous reward for every time slot as the inverse of the UAV's instantaneous objective value, which is expressed as

$$\mathcal{R}(s[n], a[n]) = \left[ \sum_{k=1}^{K} \left( I_k^U[n] P_k^U[n] + I_k^B[n] P_k^B[n] \right) T_s \right]^{-1} \tag{61}$$

Algorithm 2 summarizes the optimal trajectory learning procedure using the improved SARSA technique. In this framework, we first calculate the UAV's current state, the channel gains, and the distances from all devices to the UAV and the nearest BS at every time slot. Then, each device selects its destination (either the UAV or the nearest BS) by estimating the instantaneous device association indicator and the required transmit power while satisfying the data rate constraint. This process is repeated at each step, and the UAV obtains the optimal policy at the final episode. Since the number of episodes is $T$ and each episode goes through $N$ time slots, the computational complexity depends on the total number of steps $TN$ as well as the sizes of the state space and action space. In our scenario, there are $L_1 L_2$ possible state locations and eight possible actions for each time slot. Therefore, the computational complexity of Algorithm 2 is $\mathcal{O}(8TNL_1L_2)$, including the complexity of the action selection scheme in each step.
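The per-slot device decision can be sketched as follows. Inverting (52) under C1 gives the smallest transmit power that delivers $\mathcal{D}_{\text{Min}}$ bits in one slot, $P \ge \eta_0 \left(2^{\mathcal{D}_{\text{Min}}/(C_B T_s)} - 1\right) / |h|^2$. The rule of picking the destination that needs less power is an assumption made here for illustration, and the function names are hypothetical; the chapter does not spell out the exact selection rule. The reward follows (61).

```python
def required_power(gain_sq, d_min_bits, bandwidth_hz, noise_w, slot_s):
    """Smallest power meeting C1, obtained by inverting the rate formula (52)."""
    return noise_w * (2.0 ** (d_min_bits / (bandwidth_hz * slot_s)) - 1.0) / gain_sq

def associate(gain_sq_uav, gain_sq_bs, d_min_bits, bandwidth_hz, noise_w, slot_s):
    """Return (I_U, I_B, power), choosing the lower-power destination (assumed rule)."""
    p_u = required_power(gain_sq_uav, d_min_bits, bandwidth_hz, noise_w, slot_s)
    p_b = required_power(gain_sq_bs, d_min_bits, bandwidth_hz, noise_w, slot_s)
    return (1, 0, p_u) if p_u < p_b else (0, 1, p_b)

def slot_reward(powers_w, slot_s):
    """Reward (61): inverse of the total energy spent by all devices in the slot."""
    return 1.0 / (sum(powers_w) * slot_s)

# Illustrative values (assumptions): two devices, one closer to the UAV.
choices = [
    associate(1e-8, 1e-9, d_min_bits=1e6, bandwidth_hz=1e6, noise_w=1e-9, slot_s=1.0),
    associate(1e-10, 1e-9, d_min_bits=1e6, bandwidth_hz=1e6, noise_w=1e-9, slot_s=1.0),
]
reward = slot_reward([p for _, _, p in choices], slot_s=1.0)
```

Here the first device associates with the UAV at about 0.1 W and the second with the BS at about 1 W, so a UAV position that improves more G2A gains directly raises the reward.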


12: **else if** $n = N - 1$ and $\sqrt{(x_e - x_u[n])^2 + (y_e - y_u[n])^2} \le \vartheta_u T_s$ **then**

13: Obtain the next state as $s[n+1] = (x_e, y_e)$

14: Calculate the reward $\mathcal{R}(s[n], a[n])$ by (61)

15: Choose the next action $a[n+1] = \{\delta_x^u[n+1], \delta_y^u[n+1]\}$ by (50) and obtain the $Q(s[n+1], a[n+1])$ value

16: Update the $Q(s[n], a[n])$ value according to (49)

17: Update the respective state and action as $s[n] \leftarrow s[n+1]$ and $a[n] \leftarrow a[n+1]$

18: **else**

19: **Break**

20: Find an optimal policy as $\pi_h^{*} = \arg\max_{a[n] = \{\delta_x^u[n], \delta_y^u[n]\}} Q(s[n], a[n]), \ \forall s[n] \in \mathcal{S}, a[n] \in \mathcal{A}, n \in \mathcal{N}$
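The action selection and Q-value update steps can be sketched in Python as below. Equations (49) and (50) are not reproduced in this section, so the sketch assumes the textbook SARSA update $Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma Q(s',a') - Q(s,a)\right]$ and a plain $\epsilon$-greedy rule. The eight compass-direction actions, the learning rate `alpha`, the discount factor `gamma`, and all function names are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Eight grid moves (dx, dy); the exact action set is an assumption.
ACTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma):
    """On-policy temporal-difference update using the action actually chosen next."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]

# One illustrative step on an initially empty Q-table.
q_table = defaultdict(float)
rng = random.Random(0)
state, action = (0, 0), (1, 0)
next_state = (state[0] + action[0], state[1] + action[1])
next_action = epsilon_greedy(q_table, next_state, epsilon=0.1, rng=rng)
new_q = sarsa_update(q_table, state, action, reward=1.0,
                     s_next=next_state, a_next=next_action, alpha=0.5, gamma=0.9)
```

Because the table starts at zero, this first update moves $Q(s, a)$ halfway toward the observed reward, which is the behavior step 16 relies on over many episodes.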

#### **5.3 Simulation results**

This sub-section presents the training outcomes of the proposed SARSA algorithm for the optimal trajectory and subsequently evaluates the energy-efficient data collection. Here, we compare the effectiveness of the proposed design with that of the benchmark PSO technique [41], where 100 IoT devices are uniformly distributed within a square field of size 2000 m × 2000 m. Moreover, we adopt the required simulation parameters from [40] and [24] to implement the proposed algorithm.

#### *5.3.1 Convergence analysis*

The agent's training evaluations using the RL-based SARSA algorithm are illustrated in **Figure 9a**, where all IoT devices maintain the data rate constraint of 10 Mbps. The figure shows that the convergence rate varies with the flying time because the UAV explores the target area more efficiently when more time slots are available. As a result, more devices associate with the UAV, and convergence occurs before 10,000 episodes.

**Figure 9b** shows the episode-wise objective value evaluation using the PSO algorithm. The figure shows that PSO takes more time to converge, and its final convergence value is lower than that of the SARSA algorithm. This is because PSO updates the particles' positions and velocities according to a random inertia weight, which regulates the particles' moving directions and speeds less precisely. Hence, its computational complexity increases due to the high dimension of the decision variables. As a result, the proposed SARSA algorithm improves the cumulative reward by 10.26% with respect to PSO.

**Figure 9.** *Training results corresponding to the proposed and benchmark algorithms. (a) Cumulative reward generated by proposed SARSA. (b) Fitness value generated by benchmark PSO.*

**Figure 10.** *Optimal trajectories corresponding to the proposed and benchmark algorithms. (a) Optimal UAV trajectory using SARSA. (b) Optimal UAV trajectory using PSO.*

#### *5.3.2 Optimal trajectory*

Using the same parameters as in **Figure 9**, the UAV finds its optimal trajectories with the SARSA and PSO algorithms, as depicted in **Figure 10**. These figures indicate that the UAV moves toward the devices that are far away from the BS and reaches the final destination point within the flight period. Since devices consume more energy while transmitting data to the BS, the UAV flies toward those devices to improve their channel conditions. As mentioned earlier, device association with the UAV increases with the flying time, so more devices transmit their data to the UAV instead of the BS, which reduces their energy consumption.

#### *5.3.3 Performance comparison of proposed SARSA with benchmark PSO*

The variation of the devices' average transmit power required to achieve a 10 Mbps data rate with the index value is shown in **Figure 11a**, where a device's index indicates its distance from the nearest BS. When there is no UAV support, the average transmit power increases with the index value because, according to (52), devices far away from the BS use more power to obtain the given data rate. But when the UAV is employed, its optimal trajectory focuses on the devices that consume more power and associates with them for data collection. Furthermore, since a straight UAV trajectory cannot improve all devices' channel conditions, the corresponding energy-efficient data collection would not be possible.

**Figure 11.** *Performance comparison of the proposed and benchmark algorithms. (a) Devices' transmit power corresponding to their index value. (b) Devices' energy consumption versus data rate constraint.*

The total energy consumption of all devices for various data rate constraint values is illustrated in **Figure 11b**. The devices' energy consumption increases with the data rate constraint because, according to (52), devices allocate more power to achieve the given rate constraint. Furthermore, from **Figure 11a**, the UAV's optimal trajectory obtained by the proposed SARSA algorithm reduces the devices' transmit power within its available flying time as compared to the PSO algorithm, because PSO converges slowly in an iterative process and cannot identify the local optimum in a high-dimensional space. Hence, the proposed SARSA methodology reduces the total energy consumption of all devices by 8.15%, 7.72%, and 5.67% for UAV flying times of 80, 100, and 120 time slots, respectively, as compared to PSO.
