**3. Understanding the CPUs**

In this section we first give the reader an idea of the internal operation of a general-purpose processing unit (CPU), assuming no previous knowledge. We then analyze the contribution made by the alternative computing platforms to high performance computing.

Let's imagine a completely automated automobile assembly line, like the one shown in figure 6, which produces type A automobiles. The car parts are found in the warehouse and are taken onto the line, passing through the following stages: engine assembly, bodywork assembly, motor drop, painting, and quality tests. Finally we obtain the vehicle, which is carried to the parking zone. Similarly, in seismic data processing, a computing system can access the data stored in memory, adapt it, calculate the velocity model, and finally migrate the data to produce a subsurface image that is stored back in memory.

Now let's suppose that we want to modify the assembly line so that it produces type B automobiles. It is then necessary either to add some stages to the assembly line or to modify the configuration of the existing modules. A CPU operates in a similar manner: it can execute different kinds of algorithms, just as the assembly line can now produce different kinds of automobiles.

**Figure 6.** CPU analogy


| Year | Author | Contribution |
|------|--------|--------------|
| 1971 | W. A. Schneider | Tied diffraction stacking and Kirchhoff migration together. |
| 1974 and 1975 | Gardner et al. | Clarified this even more. |
| 1978 | Schneider | Integral formulation of migration. |
| 1978 | Gazdag | Phase shift method. |
| 1984 | Gazdag and Sguazzero | Phase shift plus interpolation. |
| 1988 | Forel and Gardner | A completely velocity-independent migration technique. |
| 1990 | Stoffa | The split step method. |

**Table 2.** Development of efficient algorithms.

Over the years, seismic imaging placed increasing demands on the technology, because ever more accurate subsurface images were needed. This led to the use of more complex mathematical attributes. It also became necessary to process the large data volumes resulting from 3D and 4D seismic surveys and from the use of multicomponent geophones. All of this, combined with the technological limitations of computers since 2000, led HPC developers to seek new technological alternatives.

These alternatives should provide a solution to the three barriers currently faced by computers (the power wall, the memory wall, and the ILP wall; all three are explained in detail below). For that reason, attention was focused on two emerging technologies that promised to address the three barriers better than conventional computers [14]. These technologies were GPUs and FPGAs.

In the following sections we explain how these two technologies have been facing the three barriers, in order to map out the future of HPC in the area of seismic data processing.

A CPU includes different kinds of modules that allow it to execute the various instructions that make up the desired algorithm.

Just as the CPU receives and delivers data, the assembly line receives parts and delivers vehicles. But to assemble a particular type of car, the line must first be prepared; only then can it select the right parts from the warehouse, transport them, and handle them suitably. In a CPU, the program is responsible for indicating how to fetch each data item and how to manipulate it to obtain the final result.

A program is composed of elementary operations called instructions. In the assembly line, an instruction corresponds to a simple action such as bringing a part from the warehouse, tightening a screw, or activating the painting machine. In a CPU, an instruction executes an elementary operation on the data, such as fetching a value from memory or performing an addition or a multiplication.

#### **3.1. CPU operation**

In a CPU, the instructions are stored in memory and are executed sequentially, one by one, as shown in figure 6. To execute an instruction, the CPU must fetch it from memory, decode (interpret) it, read the data the instruction requires, manipulate the data, and finally write the results back to memory. This architecture is known as the Von Neumann architecture. Modern CPUs have evolved from the original concept, but their operating principle is still the same.
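The fetch-decode-execute cycle just described can be sketched in a few lines of code. The following toy machine is an illustration only (it corresponds to no real instruction set); as the Von Neumann architecture prescribes, it keeps the program and the data in the same memory:

```python
# Toy Von Neumann machine: program and data share one memory,
# and the CPU repeats the same steps for every instruction.

def run(memory, pc=0):
    """Execute instructions until a HALT is fetched."""
    while True:
        instr = memory[pc]              # 1. fetch the instruction
        op, *args = instr               # 2. decode it
        if op == "HALT":
            return
        if op == "ADD":                 # ADD dst, src1, src2
            dst, a, b = args
            x, y = memory[a], memory[b] # 3. read the operands
            memory[dst] = x + y         # 4. execute, 5. write back
        elif op == "MUL":               # MUL dst, src1, src2
            dst, a, b = args
            memory[dst] = memory[a] * memory[b]
        pc += 1                         # move to the next instruction

# Addresses 0-2 hold the program; addresses 10-12 hold the data.
mem = {0: ("ADD", 12, 10, 11),   # mem[12] = mem[10] + mem[11]
       1: ("MUL", 12, 12, 12),   # mem[12] = mem[12] ** 2
       2: ("HALT",),
       10: 3, 11: 4, 12: 0}
run(mem)
print(mem[12])  # (3 + 4) ** 2 = 49
```

Real CPUs pipeline and reorder these steps aggressively, but the sequential loop above is the principle they all still implement.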


Nowadays almost every electronic device we use is based on a CPU: PCs, cell phones, video game consoles, home appliances, automobiles, digital clocks, etc.

As described above, the CPU is in charge of processing the data stored in memory as the program indicates. In our assembly line we can identify several blocks that handle, transport, and fit the vehicle parts. In a CPU, the equivalent set of parts is known as the datapath. The main function of the datapath is to temporarily store, transform, and route the data along a track from the entrance to the exit.

Likewise, in the assembly line we find the main controller, which is in charge of monitoring the whole process and harmoniously activating the units at each stage to carry out the requested assembly process. In a CPU, this role belongs to the control unit, in charge of routing the data through the functional units in an orderly fashion to execute each of the instructions the program indicates, and thus perform the algorithm.

#### **3.2. CPU performance**

The performance of an assembly line can be measured as the time required to produce a certain number of vehicles. In the same way, CPU performance is measured as how long it takes to process some data. First, the performance of the assembly line can be limited by the speed of its machines, just as a CPU's performance is proportional to the speed of its integrated circuits. The second aspect that affects performance is how fast the parts can be placed at the beginning of the assembly line. In the same way, the data transfer rate between CPU and memory can limit performance. CPU performance is related to the execution time of an algorithm, so we will now analyze some aspects that have slowed down CPU performance.
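The two limiting factors above can be related through the classic CPU-time equation, time = instruction count × cycles per instruction (CPI) / clock frequency. The numbers below are purely illustrative, not measurements of any real workload:

```python
# Classic CPU-time equation: circuit speed sets the clock
# frequency, while slow data delivery shows up as a higher CPI
# (the CPU idles waiting for operands).

def cpu_time(instructions, cpi, freq_hz):
    """Seconds needed to run a program, in the simple model."""
    return instructions * cpi / freq_hz

# Hypothetical kernel of 1e12 instructions on a 3 GHz CPU.
well_fed = cpu_time(1e12, cpi=1.5, freq_hz=3e9)   # 500 s
starved  = cpu_time(1e12, cpi=4.0, freq_hz=3e9)   # memory stalls raise CPI
print(well_fed, starved)  # 500.0 vs ~1333 s
```

The same clock rate yields very different run times once memory stalls inflate the CPI, which is exactly the memory-wall effect discussed below.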

Let us analyze the assembly line's operation to make the best use of its machines and increase its performance. We can observe that the units at each stage can work on different vehicles simultaneously; it is not necessary to finish one vehicle before starting the next (see figure 6). In the same way, a CPU can process several data items at the same time, provided it was designed using the technique called pipelining [19]. This technique segments the execution process and allows the CPU to handle several data items in parallel. It is one of the digital techniques developed to improve CPU speed, although development in this area has now stalled. This stagnation is one of the greatest constraints on performance improvement and is called the instruction-level parallelism (ILP) wall [23].
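A back-of-the-envelope model shows why pipelining pays off: with S single-cycle stages, N instructions finish in S + (N − 1) cycles instead of S × N, because a new instruction enters the pipe every cycle. The sketch below is idealized, ignoring hazards and stalls:

```python
# Idealized pipeline timing model (no hazards, no stalls).

def unpipelined_cycles(n, stages):
    """Each instruction occupies all stages before the next starts."""
    return n * stages

def pipelined_cycles(n, stages):
    """Fill the pipe once, then one instruction completes per cycle."""
    return stages + (n - 1)

n, s = 1000, 5
print(unpipelined_cycles(n, s))   # 5000 cycles
print(pipelined_cycles(n, s))     # 1004 cycles
print(unpipelined_cycles(n, s) / pipelined_cycles(n, s))  # ~4.98x speedup
```

For large N the speedup approaches the stage count S, which is why deepening the pipeline was so attractive until hazards and branch penalties (the ILP wall) ended the trend.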

ILP can be associated with techniques that try to reorganize the assembly line's machines in order to improve performance. There have also been other ways to improve performance; one of them is using new, more efficient machines that can assemble the parts faster. These new machines could also be smaller, occupying less space and therefore allowing more machines in the same area, which increases productivity. In the CPU context, the analogue has been the improvement of integrated circuit manufacturing, which has allowed the fabrication of smaller transistors. This supports the implementation of denser systems (i.e., more transistors per chip) that are also faster (i.e., able to perform more instructions per unit time).

The number of transistors in integrated circuits has grown in close agreement with Moore's law, which allows smaller transistors to be placed on a chip. As transistors became smaller, their capacitances decreased, allowing shorter charge and discharge times. All of this permits higher working frequencies, and there is a direct relationship between the working frequency of an integrated circuit and its power dissipation (given by equation 1, taken from [9]).

$$P = C\rho \, f V_{dd}^2 \tag{1}$$

where *ρ* represents the transistor density, *f* is the working frequency of the chip, *Vdd* is the power supply voltage, and *C* is the total capacitance. Power dissipation has now grown so much that it has become an unmanageable problem for conventional cooling techniques. This limitation on the growth of processors is known as the power wall, and it has removed one of the major growth forces of CPUs.
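Equation (1) makes it easy to see why frequency and density scaling ran into the power wall. The values below are illustrative only, not measured figures for any real chip:

```python
# Equation (1): P = C * rho * f * Vdd**2, with made-up values.

def power(c, rho, f, vdd):
    """Dynamic power dissipation per equation (1)."""
    return c * rho * f * vdd ** 2

# Hypothetical scaling step: double the transistor density,
# double the clock, and lower the supply voltage 1.2 V -> 1.0 V.
p_old = power(c=1e-15, rho=1e8, f=2e9, vdd=1.2)
p_new = power(c=1e-15, rho=2e8, f=4e9, vdd=1.0)
print(p_new / p_old)  # ~2.78: power nearly triples despite the lower Vdd
```

Because *ρ* and *f* enter linearly while *Vdd* cannot drop indefinitely, every such scaling step multiplies the heat that must be removed from the same die area.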

#### **3.3. Memory performance**


Returning to our assembly line: it now has many high-speed machines, but its performance may be limited by the second constraint on system performance. The rate at which parts arrive at the assembly line may not be enough to keep all these machines busy; if the parts are not ready, the line is forced to stop until they become available again. In the same way, faster CPUs require more data ready to be processed. The limitation that today's communication channels face in supplying the data demanded by processing units is called the memory wall.

This wall was initially addressed by improving the access paths between the warehouse and the assembly line, which in a CPU means improving the transmission lines between memory and CPU on the printed circuit board.

The second way the memory wall was faced is more complex, but it takes into account the whole process from the moment the manufacturer delivers the parts. Analyzing the features of the warehouses, we find different kinds. The part manufacturers have large depots with many parts ready to be used, but fetching a part from these depots costs a lot of time. On the other hand, the warehouse next to the assembly line stores a smaller number of parts of different kinds, because we must remember that the line can produce different types of cars. The advantage of this warehouse is that its parts reach the points of the production line where they are needed faster. In short, some depots are larger but slower to transport from, while the nearest depot has a lower storage capacity. The line can improve its performance by exploiting these characteristics.

Something similar occurs with a computing system's memory. High-capacity memories, such as hard disks, are read in blocks, so accessing a single data item requires transporting a large volume of information, which is stored locally in RAM; RAM plays the role of the nearer depot. The single data item can then be read and taken to the CPU. The memory organization of a modern computing system is arranged hierarchically, as figure 6 shows. Continuing the analogy, the assembly line also provides small storage spaces next to the machines that require specific parts; in a CPU, these are called caches.

The memory hierarchy creates, in fast memories, local copies of the data that a given algorithm will handle, in order to speed up access. The challenge of such systems is to always have the required data available in the fast memory, because otherwise the CPU must stall while the data is fetched from main memory.
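The benefit of such a hierarchy is commonly summarized by the average memory access time, AMAT = hit time + miss rate × miss penalty. A quick sketch with illustrative (not measured) latencies shows how fast the cache's advantage evaporates as the miss rate grows:

```python
# Average memory access time for a two-level hierarchy:
# a fast cache in front of a slow main memory.

def amat(hit_time, miss_rate, miss_penalty):
    """Expected access latency: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty

cache_ns, dram_ns = 1.0, 100.0   # illustrative latencies
for hit_rate in (0.99, 0.90, 0.50):
    t = amat(cache_ns, 1 - hit_rate, dram_ns)
    print(hit_rate, t)   # 0.99 -> ~2 ns, 0.90 -> ~11 ns, 0.50 -> ~51 ns
```

Dropping from a 99% to a 90% hit rate multiplies the average latency by more than five, which is why keeping the working set in the fast memory is the central challenge the text describes.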

Currently, all three computing barriers are present, and they are the main cause of the stagnation into which CPU-based technology has fallen. Some solutions based on alternative technologies, such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), have been proposed with the intention of mitigating these phenomena [4, 22]. They appear to be short- and mid-term solutions, respectively.
