**5. Approach for ground-based MORL-ANN PD**

The proposed ground-based MORL-ANN PD (or simply PD) can be implemented using a deep deterministic policy gradient (DDPG) technology enabler or a combination of DDPG with a deep Q-network (DQN). This section describes these two implementation approaches.

## **5.1 PD implementation using DDPG**

The goal of the MORL-ANN PD is to pre-distort the transmit (TX) signal such that the received (RX) signal is identical to the original TX signal [3, 4]. The ML-AI technology enabler available in MATLAB is the DDPG [5–8]. The DDPG is suitable for MORL-ANN training and prediction that involves tuning the parameters of a deep neural network (DNN). As depicted in **Figure 4**, DDPG is an actor-critic network that lies at the heart of the proposed ML-AI framework, where the actor observes the received data and decides on the required actions, and the critic judges the actions and rewards or penalizes them using a pre-defined loss function. The word "deep" in DDPG refers to a DNN with two or more hidden layers.

**Figure 4.**

*(a) MORL-ANN implementation using DDPG and (b) DDPG actor-critic network block diagram.*


"deterministic" means that there is only one-output, "policy" means that the PD has a policy for deciding an action, and "gradient" means that the PD uses gradient of the loss function to update previous values.

A DNN is a structure consisting of a sequence of functions (layers) that takes in a state (or a state-action pair) and returns an action (or the expected reward). Here, the MORL training occurs in episodes that consist of k steps. A step is a process whereby an action is generated by the agent, the action is processed by the environment, and the resulting reward is returned to the agent. For our MORL-ANN implementation, an episode consists of a single step. The MORL-ANN algorithm is expressed as follows:

Form the policy gradient

$$\nabla J(\theta), \quad \text{where } J(\theta) = \mathrm{E}\!\left[\mathrm{Reward}_{\mathrm{Episode}}\right] \tag{4}$$

where our neural networks are determined by a parameter vector θ representing system operating conditions, such as operating temperature and IPBO. The parameters are then updated as


$$\theta \gets \theta + \alpha \nabla J(\theta), \quad \text{where } 0 < \alpha < 1 \text{ is the learning rate.} \tag{5}$$
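As a purely numerical illustration of Eqs. (4) and (5), the sketch below runs single-step episodes and moves a toy parameter vector θ along a finite-difference estimate of the gradient of the expected episode reward. The toy reward function and the gradient estimator are assumptions made only to keep the example self-contained; they do not represent the DDPG agent's actual loss or update.

```python
import numpy as np

def episode_reward(theta):
    """Toy stand-in for Reward_Episode in Eq. (4): larger (less negative)
    when theta is close to a hidden optimum. The real reward is the
    negative post-HPA amplitude/phase error described in the text."""
    target = np.array([0.5, -0.25, 1.0])   # hypothetical optimum
    return -np.sum((theta - target) ** 2)

def grad_estimate(theta, eps=1e-4):
    """Central finite-difference estimate of the gradient in Eq. (4)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (episode_reward(theta + step)
                   - episode_reward(theta - step)) / (2 * eps)
    return grad

theta = np.zeros(3)   # parameter vector theta, initialized arbitrarily
alpha = 0.1           # learning rate, 0 < alpha < 1

# Each episode consists of a single step; apply the Eq. (5) update per episode.
for episode in range(200):
    theta = theta + alpha * grad_estimate(theta)

print("tuned theta:", theta)   # approaches the hidden optimum
```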

The proposed DNN tuning requires fine-tuning the training parameters, including the definition of the reward described below.


The "Reward" is defined as the error of the signal amplitudes and error of the signal phase after the HPA is expressed in negative values. For tuning, we take the L2 norm between the post-HPA PD signal and the original transmitted signal from the ground FH-TX terminal. For final training, we will use the L2 norm between the "SlidingBucket" normalized signals for greater accuracy. The "SlidingBucket" is an algorithm that our team developed to emulate the automatic gain control (AGC) to maintain the IPBO level. The IPBO level is updated depending on the selected AGC loop time response (i.e., update rate). The CSUF graduate students<sup>1</sup> spent a tremendous amount of time fine-tuning the training parameters and found an optimum set of training parameters for the final simulation run. The simulation results are shown in Section 6.2.

<sup>1</sup> Sean Cantarini was the lead of the graduate student team that fine-tuned the MORL-ANN PD.
