#### *5.2.2 Achievements of implementation of structure-defined mathematical models of an aircraft using CUDA*

In our solution, the user can monitor the simulation in real time while it runs on the GPU (**Figure 9**). Parallel simulation is one area of simulation optimization and global optimization, both of which are implemented by estimating the differences between the continuous and discrete versions of a model. This research is motivated by the division of the structured mathematical model of aircraft motion into blocks, together with the definition of the relationships between those blocks.

*Simulation of a Mathematical Model of an Aircraft Using Parallel Techniques (MPI and GPU) DOI: http://dx.doi.org/10.5772/intechopen.105538*

#### **Figure 9.**

*The CUDA kernel of BlockSimulationModelGPU() function.*

The block structure of the mathematical model of aircraft motion can be realized with various simulation software systems, and CUDA computing is one of them. The CUDA application programming interface (API) is a standardized layer that lets applications take advantage of software or hardware services and features; it improves the performance of numerical calculations in simulations by running them on a graphics card. CUDA includes C/C++ software development tools, function libraries, and a hardware abstraction mechanism that hides the GPU hardware from developers [22].

#### *5.2.3 Basic software solution of computation on the GPU*

The *ComputeFlightGPU ( … )* function is essentially an outsourcing agent that sends input data to the device, activates the execution of the simulation on the device, and collects the results from the device [28]. To simulate a block-structured mathematical model of aircraft motion on the GPU, the programmer must first allocate the required device memory by calling the *cudaMalloc ( … )* function. The relevant data are then transferred from host memory to the allocated device memory by calling the *cudaMemcpy ( … )* function, see **Figure 10**, Part 1.

#### **Figure 10.**

*Screenshot of simulation results based on (Eq. (18)); horizontal axis: time, vertical axis: speed increment.*

#### **Figure 11.**

*Screenshot of simulation results based on (Eq. (24)); horizontal axis: time, vertical axis: angle of attack increment.*

After the simulation completes, the programmer must transfer the results from device memory back to host memory by calling *cudaMemcpy ( … )* and free the device memory by calling *cudaFree ( … )*, see **Figure 10**, Part 3. In CUDA, the simulation functions run on the device as programmer-defined kernel functions [27]. The kernel function specifies the block-structured mathematical model of aircraft motion that all threads execute during the parallel simulation phase, see **Figure 10**, Part 2.
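The three parts described above (allocate and copy in, launch the kernel, copy out and free) can be sketched as follows. This is a minimal illustration, not the chapter's *ComputeFlightGPU* code: the names `scaleStateKernel` and `computeOnDevice`, the `gain` parameter, and the block size of 128 threads are assumptions made only for this example, and the kernel body is a placeholder for the real model.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder kernel: each thread scales one state value.
// It stands in for the chapter's model kernel, whose body is not reproduced here.
__global__ void scaleStateKernel(const float* in, float* out, float gain, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        out[idx] = gain * in[idx];
    }
}

// Hypothetical host wrapper following Parts 1 to 3 of the text (assumes n > 0).
// Returns cudaSuccess, or the first CUDA error encountered.
cudaError_t computeOnDevice(const float* hostIn, float* hostOut, float gain, int n)
{
    float* devIn = nullptr;
    float* devOut = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Part 1: allocate device memory and copy the inputs host -> device.
    cudaError_t err = cudaMalloc((void**)&devIn, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&devOut, bytes);
    if (err != cudaSuccess) { cudaFree(devIn); return err; }
    err = cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);

    // Part 2: launch the kernel; every thread handles one element.
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        scaleStateKernel<<<blocks, threads>>>(devIn, devOut, gain, n);
        err = cudaGetLastError();  // reports launch configuration errors
    }

    // Part 3: copy the results device -> host, then free device memory.
    // cudaMemcpy waits for the preceding kernel to finish.
    if (err == cudaSuccess) {
        err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(devIn);
    cudaFree(devOut);
    return err;
}
```

Calling `computeOnDevice(in, out, 2.0f, 4)` with `in = {1, 2, 3, 4}` would be expected to fill `out` with each input doubled, provided a CUDA-capable device is present.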

The host code for the CPU is written in C/C++, and the GPU card is programmed in C [28]. The inner loop disappears in the CUDA implementation because the *ComputeFlightGPU* calculation is performed in parallel by CUDA threads, see the previous figure. The presented solution was simulated on a personal computer with an Intel Quad Core Q9650 CPU (4 cores at 3.00 GHz, 12 MB L2 cache, 1333 MHz FSB), 4 GB of DDR3 RAM, and an NVIDIA GeForce GTX 560 Ti GPU accelerator.

The first thing to notice is the *\_\_global\_\_* keyword, see **Figure 11**. It marks the function as a kernel: the function executes on the CUDA device and is launched from the host computer. Each thread runs the same block mathematical model of aircraft motion, so the only way for a thread to distinguish itself from other threads is to use its *threadIdx* and *blockIdx* [29].

The index into the thread array is determined by the block ID and the thread ID. Each thread uses its own *threadIdx.x* and *threadIdx.y* to identify the elements of a block mathematical model of aircraft motion defined by the system of equations (Eq. (17)) and (Eq. (23)). The expression *blockIdx.y* × *2* × *2* equals the number of threads in all grid rows above the current thread position. The expression *blockIdx.x* × *2* equals the number of threads in the blocks that precede the current block in the current grid row. Finally, *threadIdx.x* equals the position of the thread within the current block.

The function *BlockSimulationModelGPU()* computes a unique ID and stores it in the register variable *Idx*, which then serves as the index into the array and identifies the thread's part of the aircraft motion simulation model.
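The index arithmetic described above can be checked with a small sketch. It assumes a grid of 2 × 2 blocks with 2 threads per block, which is the layout the constants in *blockIdx.y* × *2* × *2* and *blockIdx.x* × *2* suggest; the chapter's actual launch configuration is not shown, and the names `writeLinearIndex` and `checkLinearIndex` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Each thread computes its unique linear index and writes it to that slot.
// With gridDim.x == 2 and blockDim.x == 2 this reduces to
// blockIdx.y * 2 * 2 + blockIdx.x * 2 + threadIdx.x.
__global__ void writeLinearIndex(int* out)
{
    int idx = blockIdx.y * gridDim.x * blockDim.x  // threads in grid rows above
            + blockIdx.x * blockDim.x              // threads in preceding blocks of this row
            + threadIdx.x;                         // position inside the current block
    out[idx] = idx;
}

// Hypothetical host check: launches 2 x 2 blocks of 2 threads (8 threads)
// and returns true if every slot i was written by the thread with index i.
bool checkLinearIndex()
{
    const int total = 2 * 2 * 2;
    int host[total];
    for (int i = 0; i < total; ++i) {
        host[i] = -1;  // sentinel that no thread writes
    }

    int* dev = nullptr;
    if (cudaMalloc((void**)&dev, sizeof(host)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    dim3 grid(2, 2);  // 2 block columns, 2 block rows
    dim3 block(2);    // 2 threads per block
    writeLinearIndex<<<grid, block>>>(dev);

    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < total; ++i) {
        if (host[i] != i) return false;
    }
    return true;
}
```

Because the three terms never overlap, the eight threads produce the indices 0 to 7 exactly once each, which is what makes the value usable as an array index.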


#### *5.2.4 Simulation results from CUDA*

As follows from the expression, the block of the structured mathematical model of the aircraft defined by Eq. (18), *A11* \* *x1*, is simulated by the GPU block *Block1*, and the block defined by Eq. (24), *A21* \* *x1*, is simulated by the GPU block *Block2*. The graphical outputs in **Figures 6** and **7** are the results of the numerical integration of the block-structured mathematical models of aircraft motion.

**Figure 6** shows the increment of aircraft speed as a function of fuel delivery; it does not exceed 31.0174 [m/s]. **Figure 7** shows the increment of the angle of attack as a function of the change in rudder position; it does not exceed 0.1247 [rad].
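The mapping of *A11* \* *x1* to *Block1* and *A21* \* *x1* to *Block2* can be illustrated with one thread block per model block. The sketch below is an assumption-laden example, not the chapter's kernel: the matrix size `N = 2`, the storage of both matrices in one stacked array, and the names `blockProducts` and `multiplyBlocks` are all hypothetical, and only the matrix-vector products are shown, not the numerical integration.

```cuda
#include <cuda_runtime.h>

constexpr int N = 2;  // hypothetical block size; the real dimensions are not given here

// A holds two N x N row-major matrices back to back:
// A[0 .. N*N) stands in for A11, A[N*N .. 2*N*N) for A21.
// Thread block 0 computes A11 * x1, thread block 1 computes A21 * x1;
// each thread within a block computes one row of the product.
__global__ void blockProducts(const float* A, const float* x1, float* y)
{
    int model = blockIdx.x;  // 0 -> "Block1", 1 -> "Block2"
    int row = threadIdx.x;
    float sum = 0.0f;
    for (int col = 0; col < N; ++col) {
        sum += A[(model * N + row) * N + col] * x1[col];
    }
    y[model * N + row] = sum;
}

// Hypothetical host wrapper: hostA has 2*N*N values, hostX1 has N values,
// hostY receives 2*N values (first A11 * x1, then A21 * x1).
cudaError_t multiplyBlocks(const float* hostA, const float* hostX1, float* hostY)
{
    float* devA = nullptr;
    float* devX = nullptr;
    float* devY = nullptr;
    size_t bytesA = 2 * N * N * sizeof(float);
    size_t bytesX = N * sizeof(float);
    size_t bytesY = 2 * N * sizeof(float);

    cudaError_t err = cudaMalloc((void**)&devA, bytesA);
    if (err == cudaSuccess) err = cudaMalloc((void**)&devX, bytesX);
    if (err == cudaSuccess) err = cudaMalloc((void**)&devY, bytesY);
    if (err == cudaSuccess) err = cudaMemcpy(devA, hostA, bytesA, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(devX, hostX1, bytesX, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        blockProducts<<<2, N>>>(devA, devX, devY);  // 2 thread blocks, N threads each
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostY, devY, bytesY, cudaMemcpyDeviceToHost);

    // cudaFree accepts null pointers, so partial allocation failures are safe here.
    cudaFree(devA);
    cudaFree(devX);
    cudaFree(devY);
    return err;
}
```

Selecting the matrix with `blockIdx.x` keeps the two model blocks independent of each other, so they can be scheduled on different multiprocessors without synchronization.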

The CUDA memory model consists of several memory spaces that differ significantly in latency [30]. **Table 1** compares the simulation of the block-structured mathematical model of aircraft motion on the CPU and on the GPU.

The number of block mathematical models column gives the number of block-structured mathematical models of aircraft motion simulated in parallel. The measured simulation times are listed in the CPU runtime column for the CPU and in the GPU runtime column for the GPU. The last column gives the acceleration of the calculation on the GPU compared with the CPU.

The table shows the acceleration obtained when simulating different numbers of block-structured mathematical models of aircraft motion. The decrease in simulation speed on the GPU with an increasing number of simulated mathematical models is caused by the increasing time required to transfer data to and from the host computer.
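One way to see how much of the GPU runtime is spent on host transfers is to time the three phases separately with CUDA events. The sketch below is illustrative only: `GpuTiming`, `incrementKernel`, and `timeGpuPhases` are hypothetical names, the kernel is a placeholder workload, and it is not the measurement procedure used to produce Table 1.

```cuda
#include <cuda_runtime.h>

// Elapsed times in milliseconds for the three phases of one GPU run.
struct GpuTiming {
    float copyInMs;
    float kernelMs;
    float copyOutMs;
};

// Hypothetical placeholder workload: adds 1 to every element.
__global__ void incrementKernel(float* data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        data[idx] += 1.0f;
    }
}

// Runs the placeholder workload on hostData (n > 0 elements, updated in place)
// and reports how long each phase took. Error handling is kept minimal.
cudaError_t timeGpuPhases(float* hostData, int n, GpuTiming* timing)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* devData = nullptr;
    cudaError_t err = cudaMalloc((void**)&devData, bytes);
    if (err != cudaSuccess) return err;

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);
    cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    incrementKernel<<<blocks, threads>>>(devData, n);
    cudaEventRecord(t2);

    err = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);  // wait until all recorded work has finished

    cudaEventElapsedTime(&timing->copyInMs, t0, t1);
    cudaEventElapsedTime(&timing->kernelMs, t1, t2);
    cudaEventElapsedTime(&timing->copyOutMs, t2, t3);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaEventDestroy(t2);
    cudaEventDestroy(t3);
    cudaFree(devData);
    return err;
}
```

Assuming the acceleration is the ratio of CPU runtime to GPU runtime, the GPU runtime would be the sum `copyInMs + kernelMs + copyOutMs`; comparing `kernelMs` with the two copy times shows whether the transfers dominate as the number of models grows.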

This subchapter presents GPU computing and CUDA programming and explains how to implement an efficient GPU application. It offers data obtained by simulating the block structure of a mathematical model of aircraft motion on the GPU. Even a non-optimized parallel implementation of the block-structured mathematical model of aircraft motion on the GPU can significantly reduce computation time compared with an implementation on the CPU.


#### **Table 1.**

*The performance obtained from simulating different numbers of mathematical models of aircraft motion.*
