
**7.3. Open accelerator (OpenACC)**

Recent Progress in Parallel and Distributed Computing

OpenACC is an application programming interface whose name stands for "open accelerators". It was developed by CAPS, Cray, NVIDIA and PGI to simplify parallel programming by providing a set of compiler directives (similar to OpenMP) that let developers run code in parallel without modifying the underlying code. OpenACC uses these directives to mark small segments of code, called kernels, to be run on the device. OpenACC divides tasks among gangs (blocks); gangs have workers (warps), and workers have vectors (threads) [9, 34–36].

OpenACC is portable across operating systems and various types of host CPUs and devices (accelerators). In OpenACC, some computations are executed on the CPU, while the compute-intensive regions are offloaded to GPU devices to be executed in parallel. The host is responsible for:

• allocation of memory on the device,
• initiating data transfer,
• sending the code to the device,
• waiting for completion,
• transferring the results back to the host,
• deallocating memory and
• queuing sequences of operations executed by the device [37].

A small code example using OpenACC is as follows:

main() {
    <serial code>
    #pragma acc kernels //automatically runs on GPU device
    {
        <parallel code>
    }
}

#pragma acc kernels tells the compiler to generate parallel accelerator kernels that run in parallel inside the GPU device [38].

**8. Conclusion**

In this chapter, a taxonomy that divides GPU computing into four different classes was proposed. Classes one (SHSD) and two (SHMD) are suitable for home GPU computing and can

**Author details**

Abdelrahman Ahmed Mohamed Osman

Address all correspondence to: aamosman@uqu.edu.sa

Faculty of Computer at Al-Gunfudah, Umm AL-Qura University, Al-Gunfudah, Saudi Arabia
