**2. Introduction**

Despite the dramatic increase in computer processing power over the past few years [1], the appetite for more processing power keeps rising. The main reason is that as more power becomes available, new types of work and applications that require still more power emerge. The general trend is that new technology enables new applications and opens new horizons, which in turn demand further power and the introduction of newer technologies.

Developments at the high end of computing have been motivated by complex systems such as simulation and modelling problems, speech recognition training, climate modelling and the human genome.

However, there are indications that commercial applications will also demand high processing power, mainly because of the increase in the volumes of data treated by these applications [2].

There are many approaches to increasing computer processing power, such as improving the processing power of individual processors, using multiple processors or multiple computers to perform computations, or using graphics processing units (GPUs) to speed up computations.

#### **2.1. Why we need parallel computing**

There are many reasons for parallelization, such as speeding up execution, overcoming memory capacity limits and running applications that are distributed in nature. The main reason is to speed up the execution of applications. Another problem, arising in the era of big data, is that the huge volumes of data that need to be processed do not fit in a single computer's memory. Finally, some applications are distributed in nature, with parts that must be located in widely dispersed sites [3].
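As an illustration of the first reason (speeding up execution), the following sketch splits a CPU-bound computation across several worker processes with Python's `multiprocessing` module. The function names and the chunking scheme are our own, not from the text; this is a minimal sketch, not a definitive implementation.

```python
# A minimal sketch of parallelization for speedup: split a CPU-bound
# computation into chunks and run each chunk in its own process.
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum of squares over [lo, hi) -- a stand-in for a heavy computation."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_squares(n, workers=4):
    # Split [0, n) into one chunk per worker process; the last chunk
    # absorbs any remainder.
    step = n // workers
    chunks = [(w * step, n if w == workers - 1 else (w + 1) * step)
              for w in range(workers)]
    with Pool(workers) as pool:  # each chunk runs in a separate process
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    # Same answer as the serial loop, computed by cooperating processes.
    assert parallel_sum_squares(100_000) == sum(i * i for i in range(100_000))
```

On a multicore machine the chunks execute concurrently, so wall-clock time drops roughly in proportion to the number of workers, up to the limits discussed later in this chapter.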

#### **2.2. Important terminology**

There are many important terms that help in understanding parallelization; in this section we discuss some of them.

#### *2.2.1. Load balancing*

Load balancing is an important issue for performance: it is a way of keeping all the processors as busy as possible. This issue arises constantly in any discussion of parallel processing [3].
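One common way to keep all workers busy is dynamic load balancing, where each idle worker pulls the next task from a shared queue. The sketch below is our own illustration of the idea using threads and `queue.Queue`; the names are hypothetical and not from the text.

```python
# A minimal sketch of dynamic load balancing: worker threads pull tasks
# from a shared queue on demand, so a fast worker never sits idle while
# a slow one is overloaded.
import queue
import threading

def balanced_run(tasks, n_workers=4):
    """Run the callables in `tasks`, handing them out on demand."""
    work = queue.Queue()
    for t in tasks:
        work.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = work.get_nowait()  # grab the next task, if any
            except queue.Empty:
                return                    # no work left: this worker stops
            r = task()
            with lock:                    # results list is shared
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Uneven task sizes are absorbed automatically: idle workers keep
# pulling from the queue until it is empty.
out = balanced_run([lambda i=i: i * i for i in range(8)])
assert sorted(out) == [0, 1, 4, 9, 16, 25, 36, 49]
```

The alternative, static load balancing, assigns fixed chunks up front; it has less coordination overhead but can leave processors idle when task sizes vary.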

#### *2.2.2. Latency, throughput and bandwidth*

Latency, throughput and bandwidth are important factors that affect the performance of computations. Here is a brief definition of each:

• Communication latency is the time for one bit to travel from source to destination, e.g. from a CPU to a GPU or from one host/device to another.

• Processor latency can be defined as the time to finish one task, while throughput can be defined as the rate at which we complete a large number of tasks (the number of tasks done in a given amount of time).

• Bandwidth is the number of bits per unit time that can be travelling in parallel. This can be affected by factors such as the bus width in a memory or the number of parallel network paths in a cluster, and also by the speed of the links [3].


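A small back-of-the-envelope calculation (the numbers are illustrative assumptions, not from the text) shows how latency and throughput differ: adding workers improves throughput even though the latency of each individual task is unchanged.

```python
# Illustrative numbers showing that throughput is not simply 1/latency
# once tasks overlap.
latency_per_task = 4.0   # seconds to finish one task (unchanged throughout)
n_tasks = 100

# Serial execution: tasks run one after another.
serial_time = n_tasks * latency_per_task
serial_throughput = n_tasks / serial_time        # tasks completed per second

# Four workers in parallel: each task still takes 4 s (same latency),
# but four finish at once, so throughput quadruples.
workers = 4
parallel_time = (n_tasks / workers) * latency_per_task
parallel_throughput = n_tasks / parallel_time

assert parallel_throughput == workers * serial_throughput
```

In short: latency is a per-task property, throughput a per-system property, and parallel hardware improves the latter without touching the former.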
#### *2.2.3. Floating point operation (FLOP)*

It is a unit for measuring performance: how many floating-point calculations can be performed in one second. A calculation can be adding, subtracting, multiplying or dividing two floating-point numbers; for example, computing 3.456 + 56.897 counts as one flop.

Units of flops are as follows:

• Megaflops: 10^6 flops
• Gigaflops: 10^9 flops
• Teraflops: 10^12 flops
• Petaflops: 10^15 flops
• Exaflops: 10^18 flops
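As a rough sketch of our own (not from the text), one can estimate achieved flops by timing a loop whose operation count is known in advance; here a multiply followed by an add counts as two flops.

```python
# Estimate achieved flops: summing n products performs about 2*n
# floating-point operations (one multiply and one add per element),
# so flops/s ~= 2*n / elapsed time.
import time

def measure_flops(n=1_000_000):
    a = [0.5] * n
    b = [1.5] * n
    start = time.perf_counter()
    total = 0.0
    for x, y in zip(a, b):
        total += x * y            # 1 multiply + 1 add = 2 flops
    elapsed = time.perf_counter() - start
    return (2 * n) / elapsed      # achieved flops per second

print(f"{measure_flops():.3e} flops/s")
```

Pure-Python loops carry heavy interpreter overhead, so the number reported here is far below the hardware's peak; the point is only how the unit is counted.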

