Preface

This edited book aims to present the state of the art in research and development on the convergence of high-performance computing and parallel programming for various applications. It consolidates algorithms and techniques that bridge the gap between the theoretical foundations of academia and research implementations, which may find their way into business applications in the future.

The book outlines techniques and tools used for emergent areas and domains, including acceleration of large-scale electronic structure simulations with heterogeneous parallel computing, characterizing power and energy efficiency of a data-centric HPC runtime and applications, security applications of GPUs, parallel implementation of multiprocessors on MPI using FDTD, particle-based fused rendering, design and implementation of particle systems for mesh-free methods with high performance, and evolving topics on heterogeneous computing.

This book certainly does not aim to cover everything on HPPC; rather, this edited work features the latest methodologies, technical progress, and the direction in which the research is heading.

The intended audience of this book consists mainly of researchers, research students, and practitioners in the area of HPPC. I would like to convey my appreciation to all the contributors, including the authors of the accepted chapters and the many others whose submissions could not be included in the book due to various limitations.

My thanks to the editorial team, especially Mrs. Marina Dusevic, for their kind support and great efforts in publishing this book on time.

**Satyadhyan Chickerur**

KLE Technological University, Hubballi, India


### **Chapter 1**

## Introductory Chapter: High Performance Parallel Computing

*Satyadhyan Chickerur*

### **1. Introduction**

High performance computing research has had an interesting journey from the year 1972 to this day. In the initial years, HPC was considered synonymous with supercomputing and was accessible mainly to scientists and researchers working in domains such as aeronautics, automobiles, petrochemicals, pharmaceuticals, particle physics, and weather forecasting, to name a few. Next came a phase in which the term supercomputing was gradually replaced by high performance computing, and computing power gradually shifted to PCs in the form of multicore processors for various reasons. This was the time when many researchers saw the benefit of parallelizing their applications to achieve speedup, scale-up, and robustness. This was made possible by concepts such as the Message Passing Interface (MPI) and OpenMP, which evolved during this period; a minimal example is sketched after the list below. A great deal of research was carried out on HPC systems architecture, computational models, parallel algorithms, and performance optimization, as a result of which renewed interest was created in parallel computing for HPC. This interest was also sustained because of:

I. Real-time applications needed parallel processing for improving speedup.

II. Growing interest in artificial intelligence and machine learning.

III. The huge amount of data collected because of the revolution in the mobile industry.

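To make concrete the kind of loop-level parallelization that OpenMP brought within easy reach, here is a minimal, hedged C sketch (an illustration only, not code from any chapter in this volume). A single directive distributes the iterations of an independent loop across all available threads:

```c
#include <omp.h>
#include <stdio.h>

#define N 10000000

int main(void) {
    static double a[N], b[N];

    /* Serial initialization of the input vector. */
    for (long i = 0; i < N; i++)
        a[i] = (double)i;

    double t0 = omp_get_wtime();

    /* The directive splits the loop's iterations across threads;
       every iteration is independent, so no synchronization beyond
       the implicit barrier at the end of the loop is needed. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        b[i] = 2.0 * a[i] + 1.0;

    double t1 = omp_get_wtime();
    printf("%d threads, %.3f s\n", omp_get_max_threads(), t1 - t0);
    return 0;
}
```

Building with OpenMP enabled (for example, `cc -O2 -fopenmp` with GCC or Clang) and comparing runs with `OMP_NUM_THREADS=1` against the full core count gives a direct, if rough, measure of the speedup the text refers to.
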
The past decade has seen the democratization of massively parallel computing, with the introduction of accelerators in various forms, notably GPUs, and the convergence of technologies such as cloud computing, HPC, big data, and web technologies. The extent to which we can parallelize an application or program depends on the granularity required. A program running on a system can be considered one big task that can be split into smaller tasks; if these smaller tasks can be executed in parallel, then parallel programming can be applied to achieve better performance, as the sketch below illustrates. Because of the various new parallel architectures available, developers are now in a better position to decide whether they want an application to be coarse grained or fine grained. The next decade will see a rise in applications related to computer vision, image and video processing, machine and deep learning, web services, natural language processing, medicine, drug discovery, autonomous driving, and biotechnology.
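
To make the granularity distinction concrete, the following hedged C/OpenMP sketch (an illustrative assumption, not a prescribed design) parallelizes the same image-filter loop nest in two ways: coarse grained, where each thread takes whole rows, and fine grained, where individual pixels are scheduled in small chunks:

```c
#define H 4096
#define W 4096

static float img[H][W], out[H][W];

/* Coarse grained: iterations are handed out in large static
   chunks of whole rows, so scheduling overhead is minimal. */
void filter_coarse(void) {
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            out[y][x] = 0.5f * img[y][x];
}

/* Fine grained: the two loops are collapsed into one iteration
   space scheduled dynamically in chunks of 64 pixels; this costs
   more scheduling overhead but balances load far better when
   iterations vary in cost. */
void filter_fine(void) {
    #pragma omp parallel for collapse(2) schedule(dynamic, 64)
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            out[y][x] = 0.5f * img[y][x];
}

int main(void) {
    filter_coarse();
    filter_fine();
    return 0;
}
```

Which variant wins depends on the cost per iteration and on the target architecture, which is exactly the coarse- versus fine-grained decision the paragraph above describes.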

Research and application development today aims to solve real-world problems in areas such as healthcare, automobiles, weather, and security. The chapters in this edited volume try to capture some of the recent research work in these areas.

### **2. Conclusions**

The next decade will see the convergence of high performance computing and massively parallel computing for various applications, which will help researchers and scientists solve problems that only a few years ago were thought to be unsolvable.

### **Acknowledgements**

The editor would like to acknowledge the support of the management of KLE Technological University during the editing of this book. This edition would not have been released on time without the support of the editorial team of IntechOpen, especially the Author Service Manager, Mrs. Marina Dusevic.

### **Author details**

Satyadhyan Chickerur

KLE Technological University, Hubballi, India

\*Address all correspondence to: chickerursr@kletech.ac.in

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


### **Chapter 2**

## Acceleration of Large-Scale Electronic Structure Simulations with Heterogeneous Parallel Computing

*Oh-Kyoung Kwon and Hoon Ryu*

**Abstract**

Large-scale electronic structure simulations coupled to an empirical modeling approach are critical, as they present a robust way to predict various quantum phenomena in realistically sized nanoscale structures that are hard to handle with density functional theory. For tight-binding (TB) simulations of electronic structures, which normally involve multimillion-atom systems for a direct comparison to experimentally realizable nanoscale materials and devices, we show that graphical processing unit (GPU) devices help in saving computing costs in terms of time and energy consumption. After a short introduction to the major numerical method adopted for TB simulations of electronic structures, this work presents a detailed description of the strategies for driving performance enhancement with GPU devices against traditional clusters of multicore processors. While this work only uses TB electronic structure simulations for benchmark tests, it can also be utilized as a practical guideline for enhancing the performance of numerical operations that involve large-scale sparse matrices.

**Keywords:** offload computing, GPU devices, large-scale electronic structure simulations, tight-binding approach, nanoelectronics modeling

### **1. Introduction**

As the dimensions of functional semiconductor devices are scaled down to deca-nanometer (nm) sizes, the underlying material can no longer be considered continuous. The number of atoms in the active device region becomes countable, in the range of ~50 K to a few million, and their local arrangement in interfaces, alloys, and strained systems has non-negligible effects on device characteristics [1–3]. Also, most experimentally realized structures are not infinitely periodic but are finite in size; such geometries call for a local orbital basis rather than a plane wave basis, which implies infinite periodicity. As full *ab-initio* methods such as density functional theory are in principle hard to apply to the electronic structures of such huge and discrete atomic systems [4, 5], the need for atomistic approaches based on an empirical modeling method is considerable.
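
Because a local orbital basis couples each atom only to a handful of neighbors, the TB system matrix is a very large sparse matrix, and operations such as the sparse matrix-vector product at the heart of iterative eigensolvers dominate the computing cost. As a minimal, hedged sketch of what offloading such an operation to a GPU can look like (written with OpenMP target directives as an illustration, not the authors' actual implementation), consider a matrix stored in compressed sparse row (CSR) format:

```c
/* y = A*x for a CSR-format sparse matrix, offloaded to an
   accelerator with OpenMP target directives (requires an
   OpenMP 4.5+ compiler with offload support; without a device,
   the region simply runs on the host).
     n      : number of rows
     rowptr : n+1 offsets into colidx/val
     colidx : column index of each stored nonzero
     val    : value of each stored nonzero            */
void spmv_csr_offload(int n, const int *rowptr, const int *colidx,
                      const double *val, const double *x, double *y)
{
    int nnz = rowptr[n];
    #pragma omp target teams distribute parallel for \
        map(to: rowptr[0:n+1], colidx[0:nnz], val[0:nnz], x[0:n]) \
        map(from: y[0:n])
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        /* Each row touches only its stored nonzeros. */
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[colidx[k]];
        y[i] = sum;
    }
}
```

For a multimillion-atom system described with 10 orbitals per atom, as in the *spds\** basis introduced below, the matrix dimension reaches tens of millions, so bandwidth-friendly data layout and data transfer strategies matter as much as raw parallelism.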

The *spds\** 10-band tight-binding (TB) approach, which employs a set of 10 localized orbital bases to describe a single atom, has been extensively used to explain
