**3.1.2 (ii) Partitioning process of the computational tasks**

One of the challenging problems of the HW/SW co-design is to perform an efficient HW/SW partitioning of the computational tasks. The aim of the partitioning problem is to find which computational tasks can be implemented in an efficient hardware architecture looking for the best trade-offs among the different solutions. The solution to the problem requires, first, the definition of a partitioning model that meets all the specification requirements (i.e., functionality, goals and constraints).

Note that from the formal SW-level co-design point of view, such DEDR techniques (20), (21), (22) can be considered as a properly ordered sequence of the vector-matrix multiplication procedure that one can next perform in an efficient high performance computational fashion following the proposed bit-level high-speed VLSI co-processor architecture. In particular, for implementing the fixed-point DEDR RSF and RASF algorithms, we consider in this partitioning stage to develop a high-speed VLSI co-processor for the computationally complex matrix-vector SP operation in aggregation with a powerful FPGA reconfigurable architecture via the HW/SW co-design technique. The rest of the reconstructive SP operations are employed in SW with a 32 bits embedded processor (MicroBlaze).

This novel VLSI-FPGA platform represents a new paradigm for real time processing of newer RS applications. Fig. 1 illustrates the proposed VLSI-FPGA architecture for the implementation of the RSF/RASF algorithms.

Once the partitioning stage has been defined, the selected reconstructive SP sub-task is to be mapped into the corresponding high-speed VLSI co-processor. In the HW design, the precision of 32 bits for performing all fixed-point operations is used, in particular, 9-bit integer and 23-bits decimal for the implementation of the co-processor. Such precision guarantees numerical computational errors less than 10-5 referring to the MATLAB Fixed Point Toolbox (Matlab, 2011).

#### **3.1.3 Aggregation of parallel computing techniques**

This sub-section is focused in how to improve the performance of the complex RS algorithms with the aggregation of parallel computing and mapping techniques onto HWlevel massively parallel processor arrays (MPPAs).

Fig. 1. VLSI-FPGA platform of the RSF/RASF algorithms via the HW/SW co-design paradigm.

The basic algebraic matrix operation (i.e., the selected matrix–vector multiplication) that constitutes the base of the most computationally consuming applications in the reconstructive SP applications is transformed into the required parallel algorithmic representation format. A manifold of different approaches can be used to represent parallel algorithms, e.g. (Moldovan & Fortes, 1986), (Kung, 1988). In this study, we consider a number of different loop optimization techniques used in high performance computing (HPC) in order to exploit the maximum possible parallelism in the design:


142 Applications of Digital Signal Processing

Thus, the result, **q**, can be interpreted as a rough estimate of **b**<sup>D</sup> = (**b – mb**) referred to as a

referred to as the reconstructed image frame. The matrix **A**D–1 = (**S**+**S** + DRSF**I**)–1 operating on **q** produces some form of inversion of the degradations embedded in the operator **S**+**S**. It is
