**2. Literature review**

The double pendulum is a classic example of a dynamic system that exhibits chaotic behavior. The inverted double pendulum is an archetype for thrust-vector-controlled multi-stage rocket or missile flight, and even for multi-rotor UAV flight control. The problem of controlling an inverted double pendulum has been studied for decades using many types of controllers. One approach used a self-tuned neuro-PID controller to control an inverted pendulum on a cart [1]. The controller was realized by summing two controllers, one for position control and one for angle control, and the authors demonstrated the effectiveness and robustness of this technique. An angular-momentum-based controller was able to stabilize the double pendulum at any unstable position [2].

With the increase in computational capability and the advent of new and improved machine learning algorithms over the last decade, there has been growing development of intelligent systems for various engineering applications such as path planning, target tracking, satellite attitude control [3], and collaborative control of a swarm of UAVs [4]. Such intelligent systems provide several advantages, including adaptability, robustness to uncertainties, and improved efficiency.

Fuzzy logic is one such intelligent technique that can provide robustness and adaptability to controllers. Fuzzy logic was used in combination with optimal control theory to design a highly effective controller [5]. In related research [6], fuzzy logic was used to stabilize a parallel-type double inverted pendulum, which differs from the inverted double pendulum in that two separate pendulums are controlled simultaneously on a single cart. Simulation results showed that the controller was able to completely stabilize the parallel-type double inverted pendulum system within 10 s for a wide range of initial angles of the two pendulums. The performance of fuzzy logic was also compared to that of a PID controller in controlling an inverted pendulum [7]. Simulation results showed that fuzzy logic controllers (FLCs) are far superior to PID controllers in terms of overshoot, settling time, and response to parameter changes.

The performance of a fuzzy controller tuned with noisy data was compared to that of a controller tuned without noise [8]; optimizing the fuzzy system for a higher noise level was found to yield good performance at lower noise levels as well. Lee presented three fuzzy system architectures and methods for automatically designing them for high-dimensional problems [9]. The results indicate that real-coded genetic algorithms consistently outperformed binary-coded algorithms in both the final performance of the system and the performance of the search itself, and that asymmetric-triangular fuzzy systems consistently improved faster than the hyper-ellipsoidal and shared-triangular representations in all cases.

Designing a fuzzy logic system (FLS) involves tuning its membership functions and rule base. This process can be automated by coupling a genetic algorithm (GA) with an FLS, yielding the methodology known as a Genetic Fuzzy System (GFS). In a GFS, the GA tunes the parameters of the FLS to minimize a cost function chosen so that minimizing it produces the desired behavior of the system. GFSs have been developed with considerable success for clustering and task planning [10], aircraft conflict resolution [11], simulated air-to-air combat [12], collaborative control of UAVs, etc. Because an FLS consists of a set of membership functions that define the inputs and a set of linguistic rules that relate the inputs to the outputs, it is more interpretable than other machine learning techniques such as neural networks and support vector machines. Moreover, because it is trained using a GA, a differentiable cost function such as the integral squared error is not required; as long as the mission requirement can be expressed as a mathematical cost function, no ground-truth data are needed. The GA traverses the search space for the set of membership functions and rule base that minimizes the cost function, which makes the approach a form of reinforcement learning. Reinforcement learning is a branch of machine learning in which an agent is trained to take the optimal control action so as to maximize a reward.
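The GFS idea described above can be sketched in a few lines: a real-coded GA tunes the rule consequents of a small fuzzy controller so that it regulates a toy first-order plant toward zero. The plant model, membership functions, GA operators, and all parameter values below are illustrative assumptions for this sketch, not taken from the cited works.

```python
import random

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fixed antecedent membership functions over the error range:
# Negative, Zero, Positive (the GA tunes only the rule consequents here).
MFS = [(-2.0, -1.0, 0.0), (-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]

def fuzzy_control(error, consequents):
    """Sugeno-style weighted average of the three rule consequents."""
    e = max(-1.5, min(1.5, error))           # keep firing strengths nonzero
    mu = [tri(e, *p) for p in MFS]
    s = sum(mu)
    return sum(m * c for m, c in zip(mu, consequents)) / s

def cost(consequents, x0=1.0, dt=0.1, steps=50):
    """Squared-error cost: regulate dx/dt = u from x0 toward zero."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = fuzzy_control(-x, consequents)   # error = setpoint (0) - x
        x += dt * u                          # Euler step of the toy plant
        total += x * x
    return total

def ga(pop_size=30, gens=40, span=3.0, seed=0):
    """Real-coded GA: elitism, blend crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-span, span) for _ in range(3)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=cost)
        elite = pop[: pop_size // 2]         # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            child = [w * p + (1 - w) * q for p, q in zip(a, b)]
            child[rng.randrange(3)] += rng.gauss(0.0, 0.3)
            children.append(child)
        pop = elite + children
    return min(pop, key=cost)

if __name__ == "__main__":
    best = ga()
    print("tuned consequents:", [round(c, 2) for c in best])
    print("cost:", round(cost(best), 3),
          "vs. do-nothing:", cost([0.0, 0.0, 0.0]))
```

Note that the GA never needs a gradient of the cost, only its value per chromosome, which is what allows non-differentiable mission-level objectives to be used directly; a full GFS would also encode the membership function parameters and rule base in the chromosome.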
