**2. Motivation**

Due to the decomposition of the original problems into a larger number of subproblems, the scheme is well suited to a parallelization approach, since the subproblems (which will be referred to as blocks in the following) are disjoint, and the elaboration of the blocks is intrinsically a parallelizable operation. Since the generation of the entire domain basis functions on each block is independent from the other blocks, a high scalability is expected. As an example, Figure 1 shows a fighter subdivided into 4 different blocks, identified by different colors. The four blocks can be processed in parallel by four different processes, in order to generate the basis functions describing each block. Parallelization can be achieved using two different techniques: parallel programming or Grid Computing. Parallel programming is the technique by which have been obtained the best results in this field (500 billion of unknowns) Mouriño et al. (2009). There are important barriers to the broader adoption of these methodologies though: first, the algorithms must be modified using parallel programming API like MPI (2011) and OpenMP (2011), in order to run jobs on selected cores. The second aspect to consider is the hardware: to obtain results of interest supercomputers with thousands of cores and thousands of GB of RAM are required. Typically these machines are made by institutions to meet specific experiments and do not always allow public access. Grid Computing is a rather more easily applicable model for those who do not fulfill the requirements listed above Foster & Kesselman (2003). For its adoption the only requirement is to have at disposal computers to use within the grid. The machines may be heterogeneous, both in terms of the types (workstations, servers) and hardware resources of each machine (RAM, CPU, HD); besides they can also be recovery machines. With the Virtualization technology the number of processes running on each machine (for multicore ones) can be optimized by instantiating multiple virtual nodes on the same physical machine. The real advantage of this model, however, is that researchers do not have to worry about rewriting code to make their application parallelizable. One of the drawbacks in the adoption of this model is the addition of overhead introduced by the grid (e.g., inputs time transfer) and by the hypervisor that manages the virtual machines. It has been shown, however, that the loss of performance due to the hypervisor is not particularly high, usually not exceeding 5% Chierici & Verald (2010). For the above reasons, in our case it was decided to grid infrastructure, composed of heterogeneous recovery machines, both physical and virtual, on which to run scientific applications without the need to modify the source codes of the used algorithms. The ultimate goal is the reduction of execution times of CEM applications.

Fig. 1. A fighter discretized with 4774 triangles is subdivided into 4 blocks, shown with different colors.
