**3.5 Results of the ME block with the task scheduling approach**

**Figure 13** presents experimental execution-time curves for the ME block scheduled with the DAG algorithm, obtained in OVP for SoC and MPSoC targets based on the ARM Cortex-A9MP. The key metric in our work is execution time.

*Approximation Algorithm for Scheduling a Chain of Tasks for Motion Estimation… DOI: http://dx.doi.org/10.5772/intechopen.97676*

#### **Figure 12.**

*Independent tasks in sequences with CPU.*


#### **Table 4.**

*Modeling test sequences using the DAG algorithm.*

After simulating the execution time, we chose co-simulation in OVP. This virtual platform lets us try a variety of SoC and MPSoC targets. The execution time is improved compared with other results in the literature with respect to the number of frames in the test video sequences. For more details see [9]; in our work we present different values for three models of task scheduling and partitioning co-designed on the OVP platform (SW/HW). Our DAG scheduling algorithm applied to the ME block of the video codec is optimal because 0 ≤ *C* ≤ 0.5. *T*<sub>1</sub> is the execution time on one processor and *T*<sub>2</sub> is the execution time on several processors. The gain for the application is computed in Eq. (12).

$$\text{Gain} = \frac{|T_1 - T_2|}{T_1} \times 100 \tag{12}$$
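As a minimal sketch of Eq. (12), the gain can be computed as below. The function name `gain_percent` and the two timings are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def gain_percent(t1: float, t2: float) -> float:
    """Relative gain in percent between two execution times, as in Eq. (12).

    t1: execution time on one processor (must be positive)
    t2: execution time on several processors
    """
    if t1 <= 0:
        raise ValueError("t1 must be positive")
    return abs(t1 - t2) / t1 * 100.0


# Hypothetical timings in seconds (assumed values, not results from the chapter).
t_single = 10.0
t_multi = 3.0
print(f"gain = {gain_percent(t_single, t_multi):.1f} %")
```

Because Eq. (12) uses an absolute value, the gain is the same whichever of the two times is larger, so the sign of the improvement has to be read from the raw times.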

Moreover, the parameters of the DAG and GGEN algorithms applied to the ME blocks of the video codec are very important when compared with other task scheduling approaches. In **Figure 13**, we give the formulations for scheduling dependent tasks of an arbitrary DAG and GGEN onto homogeneous multiprocessor architectures, taking communication delays into account. Execution times are in seconds. T1 is the execution time for simulations in C/C++, and T2 is the execution time for simulations on the SoC platform in OVP. T3 and T4 are the execution times when scheduling and partitioning tasks on the MPSoC system in OVP. P1 is the Versatile ARM Cortex-A9MPx4 platform and P2 is the Ukernel ARM Cortex-A9MPx4 platform.

**Figure 13.** *Execution time results on SoC and MPSoC targets in OVP with the DAG algorithm.*
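To make the idea of scheduling dependent tasks on identical processors with communication delays concrete, here is a minimal sketch of a generic greedy list-scheduling heuristic. It is not the algorithm evaluated in this chapter: the function `list_schedule`, the four-task graph, the durations and the delays are all hypothetical, and the only assumed rule is that a delay is paid when a task and its predecessor run on different processors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def list_schedule(durations, edges, num_procs):
    """Greedy list scheduling of a DAG on identical processors.

    durations: {task: execution time}
    edges: {(pred, succ): communication delay}, paid only when pred and
           succ are placed on different processors (assumption)
    Returns ({task: (processor, start, finish)}, makespan).
    """
    preds = {t: [] for t in durations}
    for (u, v) in edges:
        preds[v].append(u)

    proc_free = [0.0] * num_procs  # time at which each processor becomes idle
    placed = {}
    # Visit tasks so that every predecessor is placed before its successors.
    for task in TopologicalSorter(preds).static_order():
        best = None
        for p in range(num_procs):
            ready = 0.0
            for u in preds[task]:
                u_proc, _, u_finish = placed[u]
                delay = 0.0 if u_proc == p else edges[(u, task)]
                ready = max(ready, u_finish + delay)
            start = max(ready, proc_free[p])
            if best is None or start < best[1]:
                best = (p, start)  # keep the processor with the earliest start
        p, start = best
        finish = start + durations[task]
        placed[task] = (p, start, finish)
        proc_free[p] = finish
    makespan = max((f for _, _, f in placed.values()), default=0.0)
    return placed, makespan


# Hypothetical diamond-shaped task graph: A feeds B and C, which both feed D.
durations = {"A": 2, "B": 3, "C": 3, "D": 1}
edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}

for n in (1, 2):
    _, span = list_schedule(durations, edges, n)
    print(f"{n} processor(s): makespan = {span}")
```

In this toy graph the two middle tasks can overlap on two processors, but the communication delays absorb part of the benefit, which is why such delays have to be part of the model.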

We observe that the execution times obtained in pure simulation are much higher than those obtained in co-simulation on the OVP virtual platform, which is a significant reduction. **Figure 14** shows the gain obtained over the entire test frame with the DAG algorithm. From **Figure 14**, we can see that our technique reduces the execution time of the ME block. The experimental results illustrate a substantial improvement in execution time. The DAG scheduling and partitioning algorithm provides true parallelism with 4 processors on the SoC and MPSoC targets. For these comparisons, the metric is the execution latency "t" of the SoC system. We selected one SoC and two MPSoC platforms (the MPSoC models are: a platform including ARM Cortex-A9MPx4 to run ARM MPCore Sample Code, and Versatile Express booting Linux on Cortex-A9MP Single, Dual and Quad Core in OVP). We note that the execution time decreases when changing the platform, from SoC to MPSoC systems. We deduce that the scheduling methodology has beneficial results for the execution times in OVP for the various test video sequences.

**Figure 14.** *Gain results for the entire test frame with the DAG algorithm.*

#### **Table 5.**

*The results of the "F(sc)" function of the ME block for the different test sequences.*

In this part we evaluate the criteria of the application treated in our research work. We calculate the complexity of the ME block of the H.264 video codec using Eq. (14). **Table 5** presents the results of the function F(sc) for the ME blocks. F(sc) is the function that computes the complexity of the ME block, where "(w, h)" is the frame size of the test sequence ("w" is the width and "h" the height), "p" is the maximum authorized displacement, and "N" is the size of the MB processed.

$$F_{sc}(Im) = (h \times w)(2 \times p + 1)^2 \times N^2, \quad \text{with } p = 4 \tag{13}$$

$$F_{sc}(Im) = \left[ (h \times w)(2 \times p + 1)^2 \times N^2 \right] \times NF \tag{14}$$
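The complexity function of Eqs. (13) and (14) can be evaluated with a short helper such as the sketch below. The function name `me_complexity`, the reading of NF as a number of frames, and the example frame size, block size and frame count are assumptions made for illustration; they are not values taken from Table 5.

```python
def me_complexity(width, height, p, n, num_frames=1):
    """Evaluate the ME complexity function as printed in Eqs. (13) and (14).

    width, height: frame size (w, h) of the test sequence
    p: maximum authorized displacement
    n: size N of the macroblock processed
    num_frames: the NF factor of Eq. (14), read here as a frame count
                (assumption); the default of 1 gives Eq. (13)
    """
    per_frame = (height * width) * (2 * p + 1) ** 2 * n ** 2
    return per_frame * num_frames


# Hypothetical example: a 352x288 frame, p = 4 as in Eq. (13), N = 16.
single = me_complexity(352, 288, p=4, n=16)
sequence = me_complexity(352, 288, p=4, n=16, num_frames=30)
print(f"one frame: {single}, 30 frames: {sequence}")
```

The value grows quadratically with both the search range and the block size, so those two parameters dominate the cost for a fixed frame size.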

We notice that the sequences used in our research are very complex, so we need a highly optimized, precise, generic and automatic approach. With it we obtain good results across the H.264/AVC video codec.

### **4. Conclusion**

In this paper, we presented a new DAG algorithm for task scheduling and partitioning, applied to ME on MPSoC platforms in OVP. The main contributions of our approach are the following: semi-automatic task scheduling and partitioning, performance with respect to granularity, high quality, accuracy, and short execution time for ME blocks in the H.265 video codec. Stemming from complexity and profiling analysis, the DAG algorithm with ME blocks is presented in OVP for three platform architectures (SoC and MPSoC systems). The results of the prototype SoC and MPSoC systems in OVP show that the processors offer interesting performance, complexity, and execution times compared to other published solutions, as shown in the tables and figures. The high-level HW/SW co-design is also presented for SoC and MPSoC systems in OVP, with the IP of the DAG algorithm applied to ME blocks in the H.265 video codec. Our DAG scheduling and partitioning algorithm handles the execution time efficiently with very limited resources, dropping the appropriate tasks in order to reduce the deadline time and the execution time in ME blocks. It adapts semi-automatically to the characteristics of each application and does not require any offline profiling data. The DAG model is a low-complexity solution for ME blocks in H.265 compared with other approaches. Moreover, our results for the H.265 video codec demonstrate that the proposed low-complexity solution for the general DAG model reduces the execution time by up to 70 %.

The execution times computed theoretically are almost the same as those found in OVP. We have also shown how our solution adapts efficiently to the DAG type, and scales well with the number of cores and the number of deadlines considered in the buffer. The design of the bus and the interface of the SoC and MPSoC platforms based on the ARM Cortex-A9MP is also described, allowing direct integration of the IP cores with the on-chip communication used in the MPSoC for the H.265 video codec. In future work, we will try to minimize other metrics in the H.265 video codec, and we will use an automatic DAG scheduling and partitioning algorithm. We will also implement this work on a real target based on the ARM Cortex-A9MP, which is composed of four processors: the Embest SABER Lite (SABER Lite i.MX 6 Quad development target).
