3.5.5 Boundary updating

#### 3.5.6 Control unit

The interface of the Control unit is described as follows.

Solving Partial Differential Equation Using FPGA Technology

DOI: http://dx.doi.org/10.5772/intechopen.84588

Boundary Layer Flows - Theory, Applications and Numerical Methods

#### 3.5.7 System scheme

To verify the system, the interface of the top module should expose every signal that we want to observe. The top module is described as follows.

```
// Control unit: address, write-enable, and start/finish signals
Control CU(
              .CountCLK(CountCLK),
              .wraddressHUVTemp(wraddrTemp),
              .rdaddressHUVTemp(rdaddrTemp),
              .wrenTemp(wrenTemp),
              .clk(clk),
              .wraddressHUV(wraddr),
              .rdaddressHUV(rdaddr),
              .wren(wren),
              .start(start),
              .EnableBoundaryUpdating(EnableBoundaryUpdating),
              .finish(finish));
// Input memory for h, u, v
InputMemoryHUV #(N) InputMemory(
                        clk,rdaddr,doutH,doutU,doutV,
                        wraddr,wren,HNew,UNew,VNew);
// Input buffer between the Input memory and the CNN core
InputBuffer #(M,N) Buffer(
                        clk,doutH,doutU,doutV,
                        matrixhin,matrixuin,matrixvin);
// CNN arithmetic core
CNNCore #(M,N) uut(
                        .clk(clk),
                        .matrixhin(matrixhin),
                        .matrixuin(matrixuin),
                        .matrixvin(matrixvin),
                        .matrixhout(matrixhout),
                        .matrixuout(matrixuout),
                        .matrixvout(matrixvout));
// Boundary updating for h, u, v
BoundaryUpdatingHUV #(N) Boundary(
                        matrixhout,matrixuout,matrixvout,
                        doutHNewTemp,doutUNewTemp,doutVNewTemp,
                        EnableBoundaryUpdating,
                        HNewTemp,UNewTemp,VNewTemp,
                        HNew,UNew,VNew);
// Temp memory for h, u, v
TempMemoryHUV #(N) TempMemory(
                        clk,wraddrTemp,wrenTemp,HNewTemp,UNewTemp,VNewTemp,
                        rdaddrTemp,doutHNewTemp,doutUNewTemp,doutVNewTemp);
endmodule
```

Figure 23. Interface for Input and Temp memory h, u, v.


Figure 24. An example of an h.core file used to initialize the data of the Input memory h.

### 3.6 Simulation results

Table 1 shows the device utilization summary reported by the ISE design software, and Figures 25–27 show the schematics it synthesized. Comparing the new values of h in Figure 28i, k (doutH) with Figure 29 shows that the 3 × 4 CNN system worked correctly.

The simulation results confirm the correctness and effectiveness of the implementation method. Calculating the first three 1 × N blocks taken from the memory units h, u, v costs 10 clock pulses: 1 for the initial read of the Input memory, 3 for the initial update of the buffer to the CNN, and 6 for the initial calculation. Each successive 1 × N unit takes only 1 clock pulse, because the buffer update and the calculation in the CNN arithmetic unit are pipelined. After each column of data blocks in the Input memory has been read, 2 more clock pulses are needed to initialize the buffer again. A further 1 clk is needed for the initial write to the Temp memory, 1 clk for the initial read from the Temp memory, and 1 clk for the initial write of the result back to the Input memory.
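As a quick check of the figures quoted above, the start-up cost can be tallied term by term. The symbols $T_{\text{start}}$ and $k$ are introduced here only for illustration and do not appear in the original text: $T_{\text{start}}$ is the cost of the first three 1 × N blocks, and $k$ is the number of further 1 × N units processed in the same column. The second line simply restates the one-clock-per-unit pipeline claim under that assumption.

$$
\begin{aligned}
T_{\text{start}} &= 1 + 3 + 6 = 10\ \text{(clk)} \\
T_{\text{start}} + k \cdot 1 &= 10 + k\ \text{(clk)}
\end{aligned}
$$

With $k = 5$, for instance, this gives 15 clock pulses. The tally ignores the extra clock pulses for column changes and Temp memory accesses described in the same paragraph.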


As a result, the time for one computing cycle is:

$$T = 8 + m(Q+1)\ \text{(clk)}$$

For the implementation above, m = 8 and Q = 2, so T = 8 + 8(2 + 1) = 32 (clk).

Table 1. Device utilization summary.

Figure 25. The architecture of the CNN chip.

## 4. Conclusion

This chapter gives a solution for configuring a CNN chip to solve the Navier-Stokes equations, especially concerning the solution of the temporary boundary problem
