18 VLSI Design

Thermal-aware floorplanning is key, and in it the inter-layer interconnection plays a role beyond signal transmission and power delivery. Figure 1 depicts the use of thermal TSVs to alleviate heat accumulation, an approach adopted from printed circuit boards (PCBs) (Lee et al., 1992). In 3D ICs, the problem of high power and thermal density can be more serious than in planar ICs, so thermal TSVs become essential for heat dissipation. Of particular interest is the design of an efficient heat transfer path. Some recent works have discussed the placement of thermal TSVs. However, not only the routing but also the floorplan may need to change substantially after the thermal TSVs are inserted (Tsai & Kang, 2000), which leads to long design iterations. Further, as circuit complexity increases, inserting thermal TSVs without substantially changing the floorplan becomes an important technique to develop (Tsui et al., 2003). To preserve the original routing and floorplan as much as possible, temperature-driven design should be brought into the early phases of the design procedure.

Fig. 1. 3D IC implementations of a multiprocessor system-on-chip (MP-SoC) with (a) a traditional structure and (b) the insertion of thermal ridges. Labels in the figure: Core 1, Core 2, thermal TSV, signal TSV, heat sink, and die layers 1 to 3.

**2. Design and theoretical analysis of on-chip thermal ridge**

### **2.1 Theoretical analysis**

The thermal TSVs are intended to be placed in the whitespace between core groups (CGs); this inter-CG region is called a thermal ridge. In this section, we derive analytical expressions for some key parameters.

#### **2.1.1 Analytical model of the thermal ridge**

In the transient state, heat conduction is described by the following equation:

$$\frac{\partial}{\partial x}\left(k_{xx} \frac{\partial T}{\partial x}\right) + \frac{\partial}{\partial y}\left(k_{yy} \frac{\partial T}{\partial y}\right) + \frac{\partial}{\partial z}\left(k_{zz} \frac{\partial T}{\partial z}\right) + g = \rho C \frac{\partial T}{\partial \theta} \tag{1}$$

where *T* is the temperature, *g* is the volumetric heat generation rate in W/cm³, *ρ* is the density of the material, *C* is the thermal capacity of the material, *θ* is time, and *k* is the thermal conductivity of the material. This fundamental heat conduction equation states that the temperature field within the volume depends on the time *θ* and on the directional thermal conductivities *kxx*, *kyy*, and *kzz* (Chieh et al., 2010; Lung et al., 2010). The boundary conditions on the top and bottom surfaces of the chip are *adiabatic*, and those on the surrounding surfaces are *convective*.

To dissipate heat into the substrate homogeneously, the inter-core-group thermal ridges are aligned orthogonally in rows and columns. The temperature of the many-core system is predicted with CFD-RC, a commercial thermal and fluidic simulation package. However, to illustrate the physical phenomenon more intuitively, we use a simplified one-dimensional conduction equation that neglects the transient term:

$$\frac{\partial}{\partial x}\left(k_{xx} \frac{\partial T}{\partial x}\right) = -g \tag{2}$$

Let the heat removal rate of the thermal ridge be *q*, and consider two CGs. Setting *g* = −*q* in (2), taking the conductivity as the constant *ks*, and integrating twice with *T*(0) = *T*1 and *T*(*w*) = *T*2, the temperature distribution between CG1 and CG2 can be expressed by

$$T = T_1 + \frac{T_2 - T_1}{w} x - \frac{q}{2k_s} (w - x) x \tag{3}$$

where *T*1 and *T*2 are the temperatures of CG1 and CG2, respectively, *q* is the heat conducted to the ambient environment by the thermal ridge, *ks* is the equivalent thermal conductivity of the thermal ridge, and *w* is the width of the thermal ridge. Since *T* denotes the temperature at location *x*, the mid-point temperature *T*1/2 is obtained by substituting *x* = *w*/2 into (3). Solving the result for *w*, we have

$$w = \left(\frac{8k_s}{q}\right)^{1/2} \left[\frac{T_1 + T_2}{2} - T_{1/2}\right]^{1/2} \tag{4}$$

Equation (4) shows that, for given *q* and *ks*, a lower target mid-point temperature *T*1/2 requires a larger width *w*.
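The relation between (3) and (4) can be checked numerically. The following is a minimal sketch, assuming consistent units (here W/(cm·K) for *ks*, W/cm³ for *q*, and cm for *w*); the function names and all numeric values are hypothetical and chosen only for illustration, not taken from the chapter.

```python
import math


def ridge_temperature(x, w, t1, t2, q, k_s):
    """Equation (3): temperature at position x (0 <= x <= w) across the ridge."""
    return t1 + (t2 - t1) * x / w - q / (2.0 * k_s) * (w - x) * x


def ridge_width(t1, t2, t_mid, q, k_s):
    """Equation (4): ridge width that produces the mid-point temperature t_mid."""
    drop = 0.5 * (t1 + t2) - t_mid
    if drop < 0:
        raise ValueError("t_mid must not exceed the mean of t1 and t2")
    return math.sqrt(8.0 * k_s / q * drop)


# Hypothetical inputs (illustrative only, not values from the chapter).
t1, t2 = 90.0, 86.0   # CG1 and CG2 temperatures
t_mid = 80.0          # target mid-point temperature
k_s = 1.2             # equivalent ridge conductivity, W/(cm*K)
q = 8.0e4             # heat removal rate, W/cm^3

w = ridge_width(t1, t2, t_mid, q, k_s)
w_cooler = ridge_width(t1, t2, t_mid - 2.0, q, k_s)

print(f"width for T_mid = {t_mid}: {w * 1e4:.0f} um")
print(f"width for T_mid = {t_mid - 2.0}: {w_cooler * 1e4:.0f} um")
```

Evaluating `ridge_temperature` at `w / 2` returns the requested mid-point temperature, and lowering the target makes the computed width grow, which is the trend stated in the text.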

#### **2.1.2 Effective thermal conductivity of the thermal ridge**

The equivalent thermal conductivity *kszz* of a thermal ridge is determined by the density of the thermal TSVs in the ridge (Chieh et al., 2010; Lung et al., 2010). It is obtained as an effective thermal conductivity, described by the following equation:

Three-Dimensional Integrated Circuits Design


for Thousand-Core Processors: From Aspect of Thermal Management 21


$$k_{szz} = d \cdot k_{emb} + \left(1 - d\right)k_{sub} \tag{5}$$

where *kemb* is the equivalent thermal conductivity of the thermal TSVs, *ksub* is the thermal conductivity of the silicon substrate, and *d* is the fraction of the thermal ridge contributed by the thermal TSVs. Because the thermal TSVs run longitudinally along the *z* direction, this effective thermal conductivity does not apply to lateral heat transfer. For heat transfer in the *x* and *y* directions, the thermal conductivity is given by the following equation.

$$k_{sxx} = \left(1 - \sqrt{m}\right)k_{sub} + \frac{\sqrt{m}}{\dfrac{1 - \sqrt{m}}{k_{sub}} + \dfrac{\sqrt{m}}{k_{emb}}} = k_{syy} \tag{6}$$

where *m* is the fraction of the silicon substrate contributed by the metal lines for thermal conduction. In general, the vertical thermal conductivity *kszz* is much larger than the lateral thermal conductivities *ksxx* and *ksyy*. From (5) and (6), *ksxx* is around 10 W/(m·K) and *kszz* is around 120 W/(m·K). Thus, heat entering the thermal ridge is dissipated almost entirely through the heat sink rather than being transferred laterally. Substituting the equivalent *ks* and the temperatures *T*1, *T*2, and *T*1/2 into (3), we find that the width of the thermal ridge should be between 200 µm and 400 µm.
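The two mixing rules can be compared with a short calculation. This is a sketch under stated assumptions: (5) is read as a parallel (area-weighted) rule, and (6) is read as the series-parallel rule *ksxx* = (1 − √*m*)*ksub* + √*m* / ((1 − √*m*)/*ksub* + √*m*/*kemb*). The function names and the input values for *kemb*, *ksub*, *d*, and *m* are hypothetical; they are not the chapter's parameters and were picked only so that the results fall in the same order of magnitude as the figures quoted above.

```python
import math


def k_vertical(d, k_emb, k_sub):
    """Equation (5): z-direction conductivity, TSVs and substrate in parallel."""
    return d * k_emb + (1.0 - d) * k_sub


def k_lateral(m, k_emb, k_sub):
    """Equation (6): x/y-direction conductivity, series-parallel mixing rule."""
    s = math.sqrt(m)
    # Strip of relative height s: substrate and metal segments in series.
    series_strip = s / ((1.0 - s) / k_sub + s / k_emb)
    # Remaining strip of relative height (1 - s): pure substrate.
    return (1.0 - s) * k_sub + series_strip


# Hypothetical inputs in W/(m*K); illustrative only.
K_EMB = 400.0   # assumed conductivity of the TSV fill
K_SUB = 10.0    # assumed effective conductivity of the surrounding material
D = 0.28        # assumed TSV fraction in the ridge
M = 0.01        # assumed metal fraction for lateral conduction

k_szz = k_vertical(D, K_EMB, K_SUB)
k_sxx = k_lateral(M, K_EMB, K_SUB)
print(f"k_szz = {k_szz:.1f} W/(m*K), k_sxx = {k_sxx:.1f} W/(m*K)")
```

Both functions reduce to the substrate value when the metal fraction is zero and to the fill value when it is one, which is a quick sanity check on the mixing rules.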

### **2.2 Design parameters and assumptions**

Here, we focus on a mesh-connected NoC with 1,024 cores. A globally asynchronous, locally synchronous (GALS) digital signal processor (DSP) design is adopted (Tran et al., 2009a, 2009b; Truong et al., 2008). Each DSP, constituting a tile, is composed of a core with an on-chip oscillator for its own clocking and a switch with associated buffers, as shown in Figure 2. The tile allows a repetitive, mirrored layout and occupies an area of 0.168 mm² (410 μm × 410 μm) (Tran et al., 2009a, 2009b). Consider a simple power map with two major sources in the tile: one attributed to computation and the other to communication. Correspondingly, the average power consumption in the active state is broken down into 17.6 mW and 1.1 mW, respectively (Tran et al., 2009a, 2009b).

Fig. 2. The DSP element for a GALS many-core system.


The cores are arranged as a 32 × 32 square mesh. Because the International Technology Roadmap for Semiconductors (ITRS) predicts that the maximum chip size will remain at similar dimensions, we assume 20 mm × 20 mm as our upper bound. Under this constraint, the area not occupied by the tiles is allocated to the input/output and peripheral circuits. The total power consumption of the chip is around 20 W, which gives an average power density of 5 W/cm². Since the ITRS also predicts that power densities of up to 100 W/cm² are reasonable, the power density assumed in this chapter is a plausible value (Brunschwiler et al., 2009; Xu et al., 2004).
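The chip-level figures follow from the per-tile numbers quoted in this section. The sketch below redoes that arithmetic, assuming that all 1,024 tiles are active at once and that the peripheral circuits add no power; both simplifications and the variable names are introduced here for illustration.

```python
# Per-tile and chip figures quoted in the text.
N_TILES = 32 * 32          # 32 x 32 mesh
TILE_SIDE_UM = 410.0       # tile edge length, micrometres
P_COMPUTE_MW = 17.6        # active computation power per tile
P_COMM_MW = 1.1            # active communication power per tile
CHIP_SIDE_MM = 20.0        # assumed upper bound on chip edge length

tile_area_mm2 = (TILE_SIDE_UM / 1000.0) ** 2
tiles_area_mm2 = N_TILES * tile_area_mm2
chip_area_mm2 = CHIP_SIDE_MM ** 2
chip_area_cm2 = chip_area_mm2 / 100.0

# Assumption: every tile is active and peripherals draw no power.
total_power_w = N_TILES * (P_COMPUTE_MW + P_COMM_MW) / 1000.0
avg_density_w_cm2 = total_power_w / chip_area_cm2
tile_fraction = tiles_area_mm2 / chip_area_mm2

print(f"tile area: {tile_area_mm2:.4f} mm^2")
print(f"area covered by tiles: {tiles_area_mm2:.1f} mm^2 ({tile_fraction:.1%} of chip)")
print(f"total tile power: {total_power_w:.2f} W")
print(f"average power density: {avg_density_w_cm2:.2f} W/cm^2")
```

Under these assumptions the tiles draw about 19.1 W, or roughly 4.8 W/cm² averaged over the 4 cm² die, which is consistent with the rounded values of 20 W and 5 W/cm² used in the text.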

In this chapter, we assume a three-layer die stack with the many-core NoC sandwiched in the middle. As mentioned earlier, a commercial tool based on the finite element method (FEM) is used. The three-dimensional model of the NoC is built on the widely used package model, in a fashion similar to that shown in Figure 1. However, the heat sink is not modelled explicitly in our case. Instead, it is simplified to a heat loss: an appropriate heat transfer coefficient is applied as the boundary condition on the top surface, where the heat sink would originally have been located.

Fig. 3. Insertion of type I and type II thermal ridges into the NoC.

First, the 1,024 cores are divided into 8 × 8 CGs, each consisting of 4 × 4 cores. As shown in Figure 3, thermal ridges are inserted between the hottest CGs. According to where they are inserted, the thermal ridges fall into two types: the type-I thermal ridge has a low density of thermal TSVs, and the type-II thermal ridge has a high density of thermal TSVs. The reason is that the type-I thermal ridge lies between two CGs whose routing occupies most of the silicon area, even after the expansion to gain more whitespace. The type-II thermal ridge, on the other hand, lies in the intersectional area where no wires pass through, so a large number of thermal TSVs can be placed there.
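The partitioning can be sketched as a simple index mapping. The code below is an illustration only: the function names are hypothetical, and it enumerates every candidate site (each gap between two edge-adjacent CGs for type I, each interior crossing of a row gap and a column gap for type II), whereas the chapter inserts ridges only between the hottest CGs.

```python
MESH = 32                        # cores per side of the mesh
CG_SIDE = 4                      # cores per side of one core group (CG)
CGS_PER_SIDE = MESH // CG_SIDE   # 8 CGs per side


def core_to_cg(row, col):
    """Map a core's (row, col) mesh position to its CG's (row, col) index."""
    return row // CG_SIDE, col // CG_SIDE


def candidate_ridge_sites():
    """Enumerate candidate type-I and type-II ridge sites on the CG grid."""
    type1 = []  # gaps between two edge-adjacent CGs
    for r in range(CGS_PER_SIDE):
        for c in range(CGS_PER_SIDE):
            if c + 1 < CGS_PER_SIDE:
                type1.append(((r, c), (r, c + 1)))   # gap to the right neighbour
            if r + 1 < CGS_PER_SIDE:
                type1.append(((r, c), (r + 1, c)))   # gap to the lower neighbour
    # Interior crossings where a row gap meets a column gap (four CGs meet).
    type2 = [(r, c) for r in range(1, CGS_PER_SIDE) for c in range(1, CGS_PER_SIDE)]
    return type1, type2


type1_sites, type2_sites = candidate_ridge_sites()
print(core_to_cg(13, 30))
print(len(type1_sites), "candidate type-I gaps,", len(type2_sites), "candidate type-II crossings")
```

A thermal-aware floorplanner would then select, from these candidates, the sites adjacent to the hottest CGs.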

The physical effect of the thermal ridge can be illustrated with the lumped electrical model shown in Figure 4. By the duality between electrical and thermal models, the temperature *T* is represented by a voltage *V* and the power *P* by a current *I*, and the thermal resistance *R* is, by definition, proportional to the reciprocal of the thermal conductivity *ks*. The effectiveness of the thermal ridge can be modelled by the following equivalent circuits.

Fig. 4. Resistive thermal models of two adjacent CGs inserted with (a) no thermal ridge, (b) a type-I thermal ridge, and (c) a type-II thermal ridge.

Figure 4(a) shows the case in which there is no thermal ridge between CG1 and CG2. The schematic makes clear that no extra conduction path to ground exists. Since the vertical thermal resistance *R*11 (*R*21) is much larger than the lateral thermal resistance *R*12 (*R*22), the voltage *V*1 (*V*2) stays at a high value. Figure 4(b) shows the case in which a type-I thermal ridge is inserted between CG1 and CG2, adding another conduction path through the thermal resistance *RTS*1. As mentioned above, *RTS*1 is inversely proportional to *ks*. As long as *ks* is much larger than the thermal conductivity *ksub* of the silicon substrate, *RTS*1 is much smaller than *R*11 (*R*21), and the current *I*1 (*I*2) flows mostly through *RTS*1 rather than *R*11 (*R*21). In addition, by voltage division, *VTS*1 is lower than *V*1 (or *V*2). In other words, the temperature of the type-I thermal ridge is lower than that of CG1 or CG2. Figure 4(c) shows the case in which a type-II thermal ridge is inserted at the intersectional area between the CGs to remove more heat. The value of *RTS*2 likewise depends on *ks*. Since the thermal TSVs are densely placed in the type-II thermal ridge, *RTS*2 is much smaller than *R*11 (or *R*21). Having a lower temperature than CG1 and CG2, the type-II thermal ridge is designed to act as an on-chip heat sink.

#### **2.2.1 Rotation of the hotspots**

To verify the feasibility of the proposed scheme for thermal-aware floorplanning, we first obtain the temperature distribution of the basic CG. A CG contains 4 × 4 cores, as shown in Figure 5. The cores are homogeneous, with the hotspot near the lower right corner. Because the hotspot is not located at the center of the core, the temperature distribution becomes asymmetric when the cores are assembled into the CG.

Fig. 5. Temperature distribution of the 16-core CG.

Fig. 6. Temperature distribution of the 1,024-core NoC with the same orientation of each core.
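The lumped model of Figure 4 can be made concrete with a small current-divider calculation. The sketch below looks at one CG only and assumes a particular topology that the figure may draw differently: the CG node is tied to ground through its vertical resistance, and, when a ridge is present, also through the lateral resistance in series with the ridge resistance. The function names and all resistance and power values are hypothetical.

```python
def parallel(a, b):
    """Equivalent resistance of two resistances in parallel."""
    return a * b / (a + b)


def cg_and_ridge_rise(power_w, r_vertical, r_lateral=None, r_ridge=None):
    """Return (CG rise, ridge rise) above ambient in kelvin for one CG.

    Assumed topology: power source at the CG node, r_vertical from the CG node
    to ground, and optionally r_lateral + r_ridge in series to ground.
    """
    if r_ridge is None:
        return power_w * r_vertical, None
    branch = r_lateral + r_ridge
    t_cg = power_w * parallel(r_vertical, branch)
    # Voltage division along the lateral + ridge branch.
    t_ridge = t_cg * r_ridge / branch
    return t_cg, t_ridge


# Hypothetical values: power in W, thermal resistances in K/W.
P_CG = 0.3
R11 = 50.0    # vertical resistance of the CG
R12 = 5.0     # lateral resistance towards the ridge
R_TS1 = 5.0   # type-I ridge (sparse thermal TSVs)
R_TS2 = 1.0   # type-II ridge (dense thermal TSVs)

no_ridge, _ = cg_and_ridge_rise(P_CG, R11)
cg_type1, ridge_type1 = cg_and_ridge_rise(P_CG, R11, R12, R_TS1)
cg_type2, ridge_type2 = cg_and_ridge_rise(P_CG, R11, R12, R_TS2)

print(f"no ridge: CG rise {no_ridge:.2f} K")
print(f"type I:   CG rise {cg_type1:.2f} K, ridge rise {ridge_type1:.2f} K")
print(f"type II:  CG rise {cg_type2:.2f} K, ridge rise {ridge_type2:.2f} K")
```

With these illustrative numbers the CG temperature rise drops once a ridge path is added, the ridge node sits below the CG node, and the denser type-II ridge lowers both further, matching the qualitative argument in the text.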
