### **5.3 Fuel temperature approximation**

The calculation of the fuel temperature usually involves the temperatures at the outer surface and at the center of the fuel, from which the effective fuel temperature can be obtained for rod-like geometry. The geometry is divided into sections in which the heat conduction equation is solved in the radial direction, with different approximations taken into account. In some programs, the PWR fuel rod is divided into three sections, and a 1D conduction equation is constructed for each region. Once the boundary temperatures are known, Rowland's equation can be used to compute the effective temperature.
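The chain from boundary temperature to effective temperature can be illustrated with a minimal C++ sketch. It is not taken from K-MOD: the struct and function names are hypothetical, the three sections are assumed to be cladding, gap, and pellet, conduction is assumed steady-state with constant conductivities and a uniform heat source, and the effective temperature uses the commonly quoted weighting of 4/9 for the center and 5/9 for the surface temperature.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical input set; names and units are illustrative, not from K-MOD.
struct RodInput {
    double q_lin;       // linear heat rate [W/m]
    double t_clad_out;  // cladding outer-surface temperature [K]
    double r_fuel;      // pellet radius [m]
    double r_clad_in;   // cladding inner radius [m]
    double r_clad_out;  // cladding outer radius [m]
    double k_fuel;      // fuel conductivity, assumed constant [W/(m K)]
    double k_clad;      // cladding conductivity, assumed constant [W/(m K)]
    double h_gap;       // gap conductance [W/(m^2 K)]
};

struct RodTemps {
    double clad_in, fuel_surface, fuel_center, effective;
};

// Weighted average of center and surface temperature (weights 4/9 and 5/9).
inline double effective_temperature(double t_center, double t_surface) {
    return (4.0 * t_center + 5.0 * t_surface) / 9.0;
}

// Steady-state 1D radial conduction, solved section by section from the
// outer boundary inwards: cladding, then gap, then pellet.
inline RodTemps solve_rod(const RodInput& in) {
    assert(in.r_fuel > 0.0 && in.r_clad_in > 0.0 && in.r_clad_out >= in.r_clad_in);
    const double pi = std::acos(-1.0);
    RodTemps t{};
    // Cladding: annulus without internal heat source.
    t.clad_in = in.t_clad_out +
                in.q_lin / (2.0 * pi * in.k_clad) * std::log(in.r_clad_out / in.r_clad_in);
    // Gap: lumped conductance evaluated at the pellet surface.
    t.fuel_surface = t.clad_in + in.q_lin / (2.0 * pi * in.r_fuel * in.h_gap);
    // Pellet: solid cylinder with uniform heat source.
    t.fuel_center = t.fuel_surface + in.q_lin / (4.0 * pi * in.k_fuel);
    t.effective = effective_temperature(t.fuel_center, t.fuel_surface);
    return t;
}
```

Calling `solve_rod` with a cladding surface temperature supplied by the thermal-hydraulic side returns the intermediate and effective temperatures in one pass; a production code would typically iterate because the conductivities depend on temperature.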

The fuel temperature approximation is necessary for the coupled reactor core calculation, but reasonable boundary conditions come from the procedures of other disciplines. The K-MOD coupling program has been implemented, and following the above description, the fuel temperature approximation can be abstracted as in **Table 7**.

The thermal feedback controller performs the management function and acts as an external module that is called repeatedly in K-MOD. The features of the parallel algorithm are less obvious and can be summarized in two points: (1) the computing granularity is small and the time of a single cycle is short, and (2) the computing pattern could be abstracted and reused if a lower-level parallel method is investigated; thus, SIMD technology is suitable for programming.

**Table 6.**

*Parameters and time results of two assembly examples.*

#### **Table 7.**

*Abstract components in K-MOD.*

In the sample program K-MOD, the fuel temperature calculation can be abstracted into three classes of operations; for example, op\_basic represents basic arithmetic, whereas op\_math stands for compound operations such as square, logarithm, and so forth.


The first class deals with integer data, while the latter two deal with floating-point operations. A control rod insertion and withdrawal problem in a real PWR core is tested and compared. A mock-up with a similar structure performs the same computation as the K-MOD program 50,000 times, using hand-written SIMD optimization based on the data types \_\_m128i and \_\_m128. The time results are shown in **Table 8**.
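To make the hand-written SIMD style concrete, the following C++ sketch shows one kernel per kind of operation mentioned above: integer arithmetic on \_\_m128i, basic floating-point arithmetic on \_\_m128, and a compound floating-point operation. The function names `add_i32`, `mul_f32`, and `sqrt_f32` are hypothetical and are not the K-MOD classes; the intrinsics are standard SSE/SSE2 calls. The code assumes a compiler that defines `__SSE2__` on x86 targets and falls back to a scalar loop elsewhere.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE/SSE2 intrinsics: __m128 and __m128i
#endif

// Integer class: element-wise sum, four 32-bit lanes per __m128i.
void add_i32(const std::int32_t* a, const std::int32_t* b, std::int32_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_add_epi32(va, vb));
    }
#endif
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail, or full fallback
}

// Basic floating-point class: element-wise product, four floats per __m128.
void mul_f32(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
#endif
    for (; i < n; ++i) out[i] = a[i] * b[i];
}

// Compound floating-point class: element-wise square root.
void sqrt_f32(const float* a, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_sqrt_ps(_mm_loadu_ps(a + i)));
#endif
    for (; i < n; ++i) out[i] = std::sqrt(a[i]);
}
```

Unaligned loads and stores are used so that the kernels accept arbitrary `std::vector` storage; whether such kernels beat the compiler's own auto-vectorization depends on the hardware and the optimization level.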

Performance conclusion: Statistics over multiple runs show that the SIMD scheme yields a performance gain for the third class of calculation, whereas the other classes show no gain. According to Intel documentation, SIMD programming depends on the specific hardware and instruction set as well as on compiler behavior, so manual SIMD rewriting should be undertaken with care in practice.

### **5.4 3D SN**

The sweep operation of the discrete ordinates method iterates over every angular direction in each octant; for structured grids, the well-known KBA parallel sweep algorithm is available. In the sample program Hydra, the 3D SN method converts the Boltzmann transport equation into a matrix system for the flux moments, as in Eqs. (8) and (9):


**Table 8.**

*Time results of three classes on parallel contents.*


**Table 9.**

*Abstract components in Hydra.*

$$
L\psi = \mathbf{MS}\phi + Q \tag{8}
$$

$$
\phi = D\psi \tag{9}
$$

where *Q* and *L* denote the source term and the difference operator, respectively. The discrete operator of the flux moment matrix *M* needs to be multiplied by the cross-section data, such as the scattering matrix *S*. The total mesh is (*I*, *J*, *K*) in each angular direction, and every section (*Ia*, *Jb*, *K*) obtained by dividing it can be mapped onto a corresponding process in parallel. According to the above description, the SN transport calculation can be abstracted as in **Table 9**.

There are multiple sweeping levels in the SN method. The section (*Ia*, *Jb*, *K*) can be represented by a loop structure in every octant, and the sweeping procedure can be described as follows:
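A loop nest of this kind might look like the following C++ sketch, which records only the order in which cells are visited. It is an illustration under stated assumptions, not Hydra's code: the function `sweep_order`, the helper `directed`, the block sizes `bi` and `bj`, and the bit convention that maps an octant number to sweep directions are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open index range [lo, hi), traversed forwards or backwards.
struct Range { int first, stop, step; };
inline Range directed(int lo, int hi, bool forward) {
    return forward ? Range{lo, hi, 1} : Range{hi - 1, lo - 1, -1};
}

// Returns the linear cell indices (i*ny + j)*nz + k in visiting order for
// all octants and angles. Blocks of bi x bj columns play the role of the
// (Ia, Jb, K) sections.
std::vector<int> sweep_order(int nx, int ny, int nz, int bi, int bj, int angles_per_octant) {
    assert(bi > 0 && bj > 0);
    const int na = (nx + bi - 1) / bi;  // number of blocks along I
    const int nb = (ny + bj - 1) / bj;  // number of blocks along J
    std::vector<int> order;
    for (int oct = 0; oct < 8; ++oct) {
        // Assumed convention: bit 0, 1, 2 set means a negative x, y, z direction.
        const bool fx = !(oct & 1), fy = !(oct & 2), fz = !(oct & 4);
        for (int m = 0; m < angles_per_octant; ++m) {
            const Range ra = directed(0, na, fx);
            const Range rb = directed(0, nb, fy);
            for (int a = ra.first; a != ra.stop; a += ra.step) {
                for (int b = rb.first; b != rb.stop; b += rb.step) {
                    // Cells inside one section, always visited upwind first.
                    const Range ri = directed(a * bi, std::min(nx, (a + 1) * bi), fx);
                    const Range rj = directed(b * bj, std::min(ny, (b + 1) * bj), fy);
                    const Range rk = directed(0, nz, fz);
                    for (int i = ri.first; i != ri.stop; i += ri.step)
                        for (int j = rj.first; j != rj.stop; j += rj.step)
                            for (int k = rk.first; k != rk.stop; k += rk.step)
                                order.push_back((i * ny + j) * nz + k);
                }
            }
        }
    }
    return order;
}
```

In a real sweep the innermost statement would update the angular flux of the cell from its upwind neighbours; the ordering shown here is what makes those neighbours available in time.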


Since Hydra lacks a parallel implementation of this loop structure, the features of the algorithm are listed.


According to the identified features, OpenMP directives are utilized, and a lock array done[I][J] is used to ensure data consistency.
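One way such a pipeline could be arranged is sketched below in C++ for a single angle in a single octant. Only the name `done` follows the text; the `Grid` type, the toy upwind update, and the block sizes are hypothetical. The sketch uses `std::atomic` flags in place of OpenMP locks, which is an assumption about a workable substitute rather than the Hydra implementation, and it relies on `schedule(static, 1)` handing each thread its rows in ascending order. Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical toy grid holding one angular flux value per cell.
struct Grid {
    int nx, ny, nz;
    std::vector<double> psi;
    Grid(int x, int y, int z)
        : nx(x), ny(y), nz(z), psi(static_cast<std::size_t>(x) * y * z, 0.0) {}
    double& at(int i, int j, int k) {
        return psi[(static_cast<std::size_t>(i) * ny + j) * nz + k];
    }
    // Upwind value; zero inflow is assumed at the domain boundary.
    double inflow(int i, int j, int k) {
        return (i < 0 || j < 0 || k < 0) ? 0.0 : at(i, j, k);
    }
};

// Toy upwind update of all cells in one section [i0,i1) x [j0,j1) x [0,nz).
void sweep_block(Grid& g, int i0, int i1, int j0, int j1, double q, double sigma) {
    for (int i = i0; i < i1; ++i)
        for (int j = j0; j < j1; ++j)
            for (int k = 0; k < g.nz; ++k)
                g.at(i, j, k) = (q + g.inflow(i - 1, j, k) + g.inflow(i, j - 1, k) +
                                 g.inflow(i, j, k - 1)) / (sigma + 3.0);
}

// Reference: sections visited one after another.
void sweep_serial(Grid& g, int bi, int bj, double q, double sigma) {
    for (int i0 = 0; i0 < g.nx; i0 += bi)
        for (int j0 = 0; j0 < g.ny; j0 += bj)
            sweep_block(g, i0, std::min(g.nx, i0 + bi), j0, std::min(g.ny, j0 + bj), q, sigma);
}

// Pipeline: each thread owns rows of sections along J and walks them along I.
// Section (a, b) starts only after (a, b-1) has been flagged in done.
void sweep_pipelined(Grid& g, int bi, int bj, double q, double sigma) {
    const int na = (g.nx + bi - 1) / bi;
    const int nb = (g.ny + bj - 1) / bj;
    std::vector<std::atomic<int>> done(static_cast<std::size_t>(na) * nb);
    for (auto& flag : done) flag.store(0);
    #pragma omp parallel for schedule(static, 1)
    for (int b = 0; b < nb; ++b) {
        for (int a = 0; a < na; ++a) {
            if (b > 0) {
                // Spin until the upstream section in J is complete.
                while (done[a * nb + (b - 1)].load(std::memory_order_acquire) == 0)
                    std::this_thread::yield();
            }
            sweep_block(g, a * bi, std::min(g.nx, (a + 1) * bi),
                        b * bj, std::min(g.ny, (b + 1) * bj), q, sigma);
            done[a * nb + b].store(1, std::memory_order_release);
        }
    }
}
```

The upstream section in I needs no flag because the same thread has just finished it. Smaller sections fill the pipeline sooner but increase the number of flag checks, so the block sizes are a tuning parameter.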



#### **Table 10.**

*Time results by the compile level -O0.*

**Table 10** collects the running times of two benchmark examples, compiled at optimization level -O0 with a single MPI process fixed throughout the experiment.

Performance conclusion: The speedup ratios are approximately 1.8 and 1.6, which shows that the pipeline structure yields a performance gain for small-scale (*Ia*, *Jb*) sections in the SN method, as in the sample program Hydra. The approach can be further extended to a hybrid MPI + OpenMP parallel strategy.
