**1. Introduction**

The success of grid systems is evident in the growing number of middleware systems, production grids, and dedicated forums that have appeared in recent years. The use of grid computing technology is increasing rapidly, reaching more scientific fields and encompassing a growing body of applications (Grandinetti, 2005; Wilkinson, 2009).

A grid might be seen as a way to interconnect clusters that is much more convenient than the construction of huge clusters. Another possible approach for conceiving a grid is the opportunistic use of workstations of regular users. The focus of an opportunistic grid middleware is not on the integration of dedicated computer clusters (e.g., Beowulf) or supercomputing resources, but on taking advantage of idle computing cycles of regular computers and workstations that can be spread across several administrative domains.

In a desktop grid, a large number of regular personal computers are integrated for executing large-scale distributed applications. The computing resources are heterogeneous with respect to their hardware and software configuration. Several network technologies can be used in the interconnection network, resulting in links with different capacities with respect to properties such as bandwidth, error rate, and communication latency. The computing resources can also be spread across several administrative domains. Nevertheless, from the user's viewpoint, the computing system should be seen as a single integrated resource and be easy to use.

If the grid middleware follows an opportunistic approach, resources do not need to be dedicated to executing grid applications. The grid workload coexists with the execution of local applications submitted by the nodes' regular users. The grid middleware must take advantage of the idle computing cycles that arise during unused time frames of the workstations that comprise the grid. By leveraging the idle computing power of existing commodity workstations and connecting them to a grid infrastructure, the grid middleware allows better utilization of existing computing resources and enables the execution of computationally intensive parallel applications that would otherwise require expensive clusters or parallel machines.
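As a rough illustration of what "taking advantage of idle computing cycles" can mean in practice, the sketch below marks a workstation as available to the grid only after several consecutive low CPU-load samples. The class name `IdlenessMonitor`, the threshold, and the window size are hypothetical choices made for this example; the text does not specify how any particular middleware measures idleness.

```python
from collections import deque


class IdlenessMonitor:
    """Tracks recent CPU-load samples and reports whether a node looks idle.

    Hypothetical helper for illustration only; it is not part of any
    real grid middleware API.
    """

    def __init__(self, threshold=0.2, window=3):
        # threshold: load fraction (0.0-1.0) below which a sample counts as idle
        # window: number of consecutive idle samples required
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def record(self, cpu_load):
        """Store one CPU-load sample (fraction of capacity used by local work)."""
        self.samples.append(cpu_load)

    def is_idle(self):
        """Idle only when the window is full and every sample is below threshold."""
        return (len(self.samples) == self.samples.maxlen
                and all(s < self.threshold for s in self.samples))


monitor = IdlenessMonitor(threshold=0.2, window=3)
for load in [0.05, 0.10, 0.08]:
    monitor.record(load)
print(monitor.is_idle())   # True: three quiet samples in a row

monitor.record(0.90)       # the local user becomes active again
print(monitor.is_idle())   # False: the newest sample exceeds the threshold
```

Requiring a full window of quiet samples is one simple way to avoid handing a machine to the grid during a brief pause in local activity; a single busy sample is enough to withdraw it again.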

Over the last decade, developers of opportunistic desktop grid middleware have devised several approaches for executing different application classes, such as: (a) sequential applications, where the task to be run is assigned to a single grid node; (b) parametric or bag-of-tasks applications, where several copies of a task are assigned to different grid nodes, each of them processing a subset of the input data independently and without exchanging data; and (c) tightly coupled parallel applications, whose processes exchange data among themselves using message-passing or shared-memory abstractions.

Due to the heterogeneity, high scalability, and dynamism of the execution environment, providing efficient support for application execution on opportunistic grids is a major challenge for middleware developers, who must provide innovative solutions to problems in areas such as:

**Support for a variety of programming models**, which extends the benefits of desktop grids to a larger array of application domains and communities, such as scientific and enterprise computing, and includes the ability to run legacy applications in an efficient and reliable way. Important programming models to consider include message-passing standards, such as MPI (MPI, 2009), BSP (Bisseling, 2004; Valiant, 1990), distributed objects, publish-subscribe, and mobile agents. In this chapter we will concentrate on the support for parallel application models, in particular MPI and BSP, while also pointing to extensions of grid management middleware that support other programming models.

**Resource management**, which encompasses challenges such as how to efficiently monitor a large number of highly distributed computing resources belonging to multiple administrative domains. On opportunistic grids, this issue is even harder due to the dynamic nature of the execution environment, where nodes can join and leave the grid at any time because the non-dedicated machines are used by their regular (non-grid) users.

**Application scheduling and execution management**, which also includes monitoring and must provide user-friendly mechanisms to execute applications in the grid environment, to control the execution of jobs, and to provide tools to collect application results and to generate reports about current and past situations. Application execution management should encompass all execution models supported by the middleware.

**Fault tolerance**, which is a major requirement for grid middleware, as grid environments are highly prone to failures, a characteristic amplified on opportunistic grids due to their dynamism and the use of non-dedicated machines, leading to an uncontrolled computing environment. An efficient and scalable failure detection mechanism must be provided by the grid middleware, along with a means for automatic recovery of application execution, without requiring human intervention.

In this chapter, we will provide a comprehensive description of reputable solutions found in the literature to address the problems described above, emphasizing the approaches adopted in the development of the InteGrade<sup>1</sup> middleware (da Silva e Silva et al., 2010), a multi-university effort to build a robust and flexible middleware for opportunistic grid computing. InteGrade's main goal is to be an opportunistic grid environment with support for tightly coupled parallel applications. The next section gives an overview of grid application programming models and provides an introduction to the InteGrade grid middleware, discussing its support for executing parallel applications over a desktop grid platform.

Next, we will concentrate on two central issues in the development of an opportunistic grid infrastructure: application scheduling and execution management, and fault tolerance.

<sup>1</sup> Homepage: http://www.integrade.org.br

**2. Application programming models**

A programming model is a necessary underlying feature of any computing system. Programming models provide well-defined constructs to build applications and are a key element to enable interoperation of application components developed by third parties. A programming model can be tied to a given programming language or it can be a higher-level abstraction layered on top of the language. In the latter case, different application components can be built using different programming languages, and the programming model, as long as it is properly implemented by a platform, serves as the logical bridge between them. In the heterogeneous environment of computing grids, the need for such high-level programming models is even more evident, as there can be many different types of machine architecture and programming languages, all needing to be integrated as part of a seamless environment for distributed applications.

While it is largely acknowledged that no one-size-fits-all solution exists when it comes to programming models, one can argue that some programming models are better suited to particular kinds of problems than others (Lee & Talia, 2003). Considering that grid computing environments can be used to run different kinds of applications in different domains, such as e-science, finance, numeric simulation and, more generally, virtual organizations, it follows that having a variety of programming models to choose from may be an important factor. A number of well-known programming models have been investigated for grid computing, including, but not limited to, remote procedure calls (as in RPC and RMI), tuple spaces, publish-subscribe, message passing, and Web services, as well as enhancements of such models with non-functional properties such as fault tolerance, dependability, and security (Lee & Talia, 2003). Note that a main emphasis of programming models for grid computing is on communication abstractions, because interaction among distributed application components is a key issue for grid applications.

The InteGrade middleware offers a choice of programming models for computationally intensive distributed parallel applications: MPI (Message Passing Interface) and BSP (Bulk Synchronous Parallel). It also offers support for sequential and bag-of-tasks applications. The remainder of this section presents the basic concepts of the InteGrade middleware and its support for parallel programming models.

**2.1 Introduction to the InteGrade grid middleware**

The basic architectural unit of an InteGrade grid is a cluster, a collection of machines usually connected by a local network. Clusters can be organized in a hierarchy, enabling the construction of grids with a large number of machines. Each cluster contains a *cluster manager* node that hosts InteGrade components responsible for managing cluster resources and for inter-cluster communication. Other cluster nodes are called *resource providers* and export part of their resources to the grid. They can be either shared with local users (e.g., secretaries using a word processor) or dedicated machines. The cluster manager node, containing InteGrade management components, must be a stable machine, usually a server, but not necessarily dedicated to InteGrade execution only. In case of a cluster manager failure, only its managed
