**4.3 Stable storage**

To guarantee the continuity of application execution and prevent data loss in case of failures, it is necessary to store the periodic checkpoints and the application's temporary, input, and output data on a reliable and fault-tolerant storage device or service, called stable storage. Reliability is provided by data redundancy, and the system can determine the level of redundancy based on the unavailability rate of the storage devices used by the service.

A commonly used strategy for data storage in computational grids is to store several replicas of files on dedicated servers managed by replica management systems (Cai et al., 2004; Chervenak et al., 2004; Ripeanu & Foster, 2002). These systems usually target high-performance computing platforms, with applications that process very large amounts (petabytes) of data and run on supercomputers connected by specialized high-speed networks.

When storing checkpoints of parallel applications in opportunistic grids, a common strategy is to use the grid machines that execute applications to store the checkpoints. The data can be distributed over the nodes executing the parallel application that generates the checkpoints, in addition to other grid machines. In this case, data is transferred to the machines in parallel. To ensure fault tolerance, data must be coded and stored redundantly, and the data coding strategy must be selected considering its scalability, computational cost, and fault-tolerance level. The main techniques used are *data replication*, *data parity*, and *erasure codes*.

Using data replication, the system stores full replicas of the generated checkpoints. If one of the replicas becomes inaccessible, the system can easily find another. The advantage is that no extra coding is necessary; the disadvantage is that this approach requires the transfer and storage of large amounts of data. For instance, to guarantee safety against a single failure, it is necessary to save two copies of the checkpoint, which can generate considerable local network traffic, possibly compromising the execution of the parallel application. A possible approach is to store a copy of the checkpoint locally and another remotely (de Camargo et al., 2006). Although a failure in a machine running the application makes one of the checkpoints inaccessible, it is still possible to retrieve the other copy. Moreover, the other application processes can use their local checkpoint copies. Consequently, this storage mode permits recovery as long as one of the two nodes containing a checkpoint replica is available.
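As a minimal sketch of this local-plus-remote replication mode, the fragment below stores two full replicas and recovers from whichever one is still reachable. The function names and directory arguments are illustrative; a real system would transfer the remote replica over the network rather than write it to another directory.

```python
import os

def store_checkpoint(data: bytes, name: str, local_dir: str, remote_dir: str) -> None:
    """Store one full replica locally and one on another node.

    Here `remote_dir` stands in for a remote grid machine; writing to a
    second directory models the transfer of the second replica.
    """
    for directory in (local_dir, remote_dir):
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, name), "wb") as f:
            f.write(data)

def recover_checkpoint(name: str, local_dir: str, remote_dir: str) -> bytes:
    """Recovery succeeds as long as one of the two replicas is available."""
    for directory in (local_dir, remote_dir):
        try:
            with open(os.path.join(directory, name), "rb") as f:
                return f.read()
        except OSError:
            continue  # this replica is unavailable; try the other one
    raise RuntimeError("no checkpoint replica available")
```

The recovery path tries the local copy first, since reading it avoids network traffic; the remote copy is only fetched when the local one is lost.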

The two other coding techniques decompose a file into smaller data segments, called stripes, and distribute these stripes among the machines. To ensure fault tolerance, redundant stripes are also generated and stored, permitting the original file to be recovered even if a subset of the stripes is lost. There are several algorithms to code the file into redundant fragments. A commonly used one is data parity (Malluhi & Johnston, 1998; Plank et al., 1998; Sobe, 2003), where one or more extra stripes are generated by evaluating the parity of the bits in the original fragments. Parity has the advantage of being fast to evaluate and of storing the original stripes without modification, but it has the disadvantage that data cannot be recovered if two or more fragments are lost and, consequently, cannot be used for storage on devices with higher rates of unavailability.
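The single-parity case can be sketched as follows: one extra stripe holds the bitwise XOR of the original stripes, and XOR-ing the surviving stripes with the parity stripe regenerates any one lost stripe. The functions are illustrative and assume all stripes have equal length.

```python
def parity_stripe(stripes: list[bytes]) -> bytes:
    """Compute one extra parity stripe: the bitwise XOR of all stripes."""
    out = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            out[i] ^= b
    return bytes(out)

def recover_stripe(survivors: list[bytes], parity: bytes) -> bytes:
    """Recover a single lost stripe by XOR-ing the survivors with the parity."""
    return parity_stripe(survivors + [parity])
```

Note that the original stripes are stored unmodified, which is why encoding is cheap; but if two stripes are lost, the XOR of the survivors no longer determines either of them, matching the limitation described above.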

The other strategy is to use erasure coding techniques, which allow one to code a vector *U* of size *n* into *m* + *k* encoded vectors of size *n*/*m*, with the property that *U* can be regenerated using only *m* of the *m* + *k* encoded vectors. By tuning the values of *m* and *k*, one can achieve different levels of fault tolerance. In practice, it is possible to tolerate *k* failures with a space overhead of only (*k*/*m*) ∗ *n* elements. The information dispersal algorithm (IDA) (de Camargo et al., 2006; Malluhi & Johnston, 1998; Rabin, 1989) is an example of an erasure code that can be used to code data. IDA provides the desired degree of fault tolerance with lower space overhead, but it incurs a computational cost for coding the data and extra latency for transferring the fragments from multiple nodes. Nevertheless, analytical studies (Rodrigues & Liskov, 2005; Weatherspoon & Kubiatowicz, 2002) show that, for a given redundancy level, data stored using erasure coding has a mean availability several times higher than data stored using replication.
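The following sketch illustrates the *m*-out-of-(*m* + *k*) property. It is not Rabin's IDA as published, but a simplified demonstration of the same principle: the vector *U* is split into *m* stripes, each encoded fragment is a linear combination of the stripes given by a row of a Vandermonde matrix, and any *m* fragments yield an invertible system that recovers *U*. Arithmetic is done modulo a small prime, so the demo assumes data values below that prime; production codes work over a Galois field such as GF(2^8).

```python
P = 257  # prime modulus for the demo field; data values must be < P

def encode(U: list[int], m: int, k: int) -> list[list[int]]:
    """Encode U (length divisible by m) into m + k vectors of size len(U)//m."""
    size = len(U) // m
    stripes = [U[j * size:(j + 1) * size] for j in range(m)]
    encoded = []
    for i in range(1, m + k + 1):  # row i of a Vandermonde matrix: [1, i, i^2, ...]
        coeffs = [pow(i, j, P) for j in range(m)]
        encoded.append([sum(c * s[t] for c, s in zip(coeffs, stripes)) % P
                        for t in range(size)])
    return encoded

def decode(fragments: list[tuple[int, list[int]]], m: int) -> list[int]:
    """Rebuild U from any m fragments, given as (row_index, vector) pairs."""
    rows = [([pow(i, j, P) for j in range(m)], list(v)) for i, v in fragments[:m]]
    # Gaussian elimination modulo P on the augmented system
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][0][col] % P)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][0][col], P - 2, P)  # modular inverse via Fermat
        rows[col] = ([c * inv % P for c in rows[col][0]],
                     [v * inv % P for v in rows[col][1]])
        for r in range(m):
            if r != col and rows[r][0][col]:
                f = rows[r][0][col]
                rows[r] = ([(a - f * b) % P for a, b in zip(rows[r][0], rows[col][0])],
                           [(a - f * b) % P for a, b in zip(rows[r][1], rows[col][1])])
    return [x for j in range(m) for x in rows[j][1]]
```

With *m* = 2 and *k* = 2, four fragments of half the original size are produced, any two of which reconstruct the data: the promised overhead of (*k*/*m*) ∗ *n* elements while tolerating *k* losses. The matrix inversion performed at decode time is the computational cost mentioned above.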

The checkpointing overhead in the execution time of parallel applications using erasure coding, data parity, and replication was compared elsewhere (de Camargo et al., 2006). The replication strategy had the smallest overhead but used the most storage space. Erasure coding caused a larger overhead but used less storage space and was more flexible, allowing the system to select the desired level of fault tolerance.
