**3. Chip design and implementation by using metallic thermal skeletons**

This chapter introduces a practical methodology for enhancing thermal dissipation in an NoC system. An on-chip virtual 126-core network acts as the hot spot and dissipates the heat it generates through metallic thermal skeletons. To evaluate the feasibility of this enhancement, nine arrays of metallic thermal skeletons are designed into the test chip. Essentially, adding metallic thermal skeletons in the back-end-of-line (BEOL) metal layers improves the lateral thermal dissipation path, so that the heat generated by the virtual cores can be conducted into an on-chip heat sink such as the TSVs. The hot-spot temperature can be lowered substantially if the metallic thermal skeletons are arranged properly. In addition, we design an on-chip thermal sensor network to facilitate the measurement and evaluation of the heat-transfer capability. Finally, some important thermal characteristics of the metallic thermal skeletons are presented. For designing a better thermal dissipation path, metallic thermal skeletons offer an alternative to simply increasing the number of thermal TSVs.

Fig. 11. FEM simulation model and result. (a) Temperature profile. (b) Simulation model.

The FEM simulation is performed with CFD-RC under the following assumptions. As shown in Figure 11, a TSV is on the left and a heat source is on the right; the other half of the structure is mirrored about the cross section. The heat source consists of 12 squares, each dissipating 0.5 mW over an area of 1 µm × 1 µm. The squares are connected toward the top by local interconnects (not shown in the figure because they are buried in the structure), which stop just short of the front metal layer. The neighboring TSV is electrically unconnected and remains cold. The simulation assumes a TSV with a dielectric thickness of 0.5 µm, a diameter of 10 µm, and a length of 50 µm.
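To put the heat-source figures in perspective, the following minimal sketch computes the total heater power and the heat flux per square from the values quoted above. The variable names are illustrative only and are not part of the simulation setup.

```python
# Heat-source figures quoted for the FEM model (values taken from the text).
NUM_SQUARES = 12               # heater squares in the heat source
POWER_PER_SQUARE_W = 0.5e-3    # 0.5 mW dissipated by each square
SQUARE_SIDE_M = 1e-6           # each square is 1 um x 1 um

# Total power of the heat source.
total_power_w = NUM_SQUARES * POWER_PER_SQUARE_W

# Heat flux through one square, converted from W/m^2 to W/cm^2.
area_m2 = SQUARE_SIDE_M ** 2
flux_w_per_m2 = POWER_PER_SQUARE_W / area_m2
flux_w_per_cm2 = flux_w_per_m2 / 1e4   # 1 m^2 = 1e4 cm^2

print(f"total power: {total_power_w * 1e3:.1f} mW")
print(f"heat flux per square: {flux_w_per_cm2:.1e} W/cm^2")
```

The flux is far higher than a chip-level average would be because each heater square is only 1 µm on a side, which is why the model is useful for studying a localized hot spot next to a TSV.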

Three-Dimensional Integrated Circuits Design

for Thousand-Core Processors: From Aspect of Thermal Management 29


#### **3.1 Design of the proposed test chip**

#### **3.1.1 Overall floorplan of the chip**

The floorplan of the proposed test chip is depicted in Figure 12. The metallic thermal skeletons are arranged inside, and enclosed by, the core-sensor blocks. The peripheral area holds the input/output and power/ground connections that provide external access. The test chip is designed to operate without a complex control scheme. The virtual cores are arranged in three groups, each consisting of three rows and seven columns. The whole chip can be divided into nine regions. Each region consists of two separate areas enclosed by the core-sensor blocks named A1-A7, B1-B7 and C1-C7, respectively, and the regions represent three types of metallic thermal skeleton. Areas α<sub>1</sub> to α<sub>6</sub> share an identical skeleton design, as do β<sub>1</sub> to β<sub>6</sub> and γ<sub>1</sub> to γ<sub>6</sub>. The major differences among the nine regions lie in the combinations of the Χ, Π and Φ elements, which are shown in Figure 13. As shown in Figure 13(a), elements Χ, Π and Φ differ in the distribution density of metal in the BEOL. For better visualization, Figure 13(b) shows a three-dimensional view of the metallic thermal skeletons. The TSVs combined with the front metals form the on-chip heat sink, and BEOL metal 1 to metal 4 form the metallic thermal skeletons.

Fig. 12. The floorplan of designed test chip.

This chapter does not discuss the stacking of identical chips; only the planar die is reported. The future thermal TSV test chip will divide the core area into blocks, each consisting, as shown in Figure 14, of virtual cores, temperature sensors, and a TSV array with metallic thermal skeletons that together construct the on-chip heat sink. The virtual cores and temperature sensors are laid out on the left and right sides of the on-chip heat sink. As shown in Figure 14, the thermal TSVs with front metals form the on-chip heat sink, and the metallic thermal skeletons serve as the conduction path for fast heat transfer. Therefore, this chapter focuses on the performance of the metallic thermal skeletons and compares the different types with each other.


Region labels in Figure 12, row by row: α<sub>1</sub> α<sub>2</sub> β<sub>1</sub> β<sub>2</sub> γ<sub>1</sub> γ<sub>2</sub>; α<sub>3</sub> α<sub>4</sub> β<sub>3</sub> β<sub>4</sub> γ<sub>3</sub> γ<sub>4</sub>; α<sub>5</sub> α<sub>6</sub> β<sub>5</sub> β<sub>6</sub> γ<sub>5</sub> γ<sub>6</sub>.


Fig. 13. The design of TSVs with metallic thermal skeletons. (a) The planar floorplan with , , and TSVs. (b) The three-dimensional view of the metallic thermal skeletons.

Fig. 14. Concept of virtual block design.


Fig. 15. The layout of the test chip.

To verify the heat-conduction capability, the experiments on the chip are designed in triplicate. Because A1-A3 sits at a corner of the chip, its heat transfers more toward the periphery than toward the central area. Such location effects occur often in on-chip thermal measurements. Hence, A1-A3, B1-B3 and C1-C3 use an identical combination of metallic thermal skeletons to counter these location effects. The layout of the designed test chip is shown in Figure 15. The core-sensor blocks, metallic thermal skeletons, peripherals, I/Os, and power domains are integrated in one SoC chip as the NoC. The on-chip heaters that make up the virtual core system can all be operated at the same time. The die measures 5,040 µm × 5,040 µm, including the seal ring. The chip has three voltage levels, four power domains, and nine test regions. Each voltage level can be controlled separately by the programmable logic analysis instrument, and every core can be operated independently through the power gating mechanism. To observe the temperature distribution over the chip surface precisely, all sensors on the chip are activated simultaneously, and the measured temperature values are read out as matrix data.

#### **3.1.2 Design of the core-sensor block**

The temperature-sensitive ring oscillator (TSRO) thermal sensor in Figure 16 is based on a ring oscillator whose oscillation frequency varies with temperature, although not perfectly linearly. The ring oscillator is also sensitive to supply voltage, so minimizing supply droop is important for accuracy. By establishing the relationship between temperature and frequency and applying on-die calibration, the thermal sensor can be made quite accurate. The frequency is converted to a count by a counter and read out to a register. Figure 16(a) shows the block diagram. The control unit (CU) accepts a reference clock TS\_CK and an input TS\_EN, which enables the sensing operation on a 0-to-1 transition. As shown in Figure 16(a), the CU generates four signals: a, b, c, and RDY. When the internal signal a changes from 0 to 1, the counter is reset and the count is cleared. When the internal signal b changes from 0 to 1, the ring oscillator is activated and the counter starts; when b changes from 1 to 0, the ring oscillator is deactivated and the counter stops. When the internal signal c changes from 0 to 1, the count is loaded into an output register TS\_REG to be read out. The handshake signal RDY indicates that the count is ready. The physical view of the thermal sensor used in this test chip is shown in Figure 16(b).
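The readout sequence can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the linear frequency model (`f0_hz`, `slope_hz_per_c`), the gate time, and the two-point calibration are hypothetical choices made for illustration, since the text specifies only the order of the signals a, b, c and RDY.

```python
from dataclasses import dataclass


@dataclass
class TsroSensor:
    """Behavioural model of the counter-based TSRO readout.

    f0_hz and slope_hz_per_c are assumed values for a linearised
    oscillator; the real oscillator is not perfectly linear.
    """
    f0_hz: float = 100e6            # assumed frequency at 0 degC
    slope_hz_per_c: float = -0.2e6  # assumed frequency change per degC
    count: int = 0
    ts_reg: int = 0
    rdy: bool = False

    def ro_frequency(self, temp_c: float) -> float:
        return self.f0_hz + self.slope_hz_per_c * temp_c

    def measure(self, temp_c: float, gate_s: float) -> int:
        # Signal a rises: the counter is reset and the count cleared.
        self.count = 0
        self.rdy = False
        # Signal b is high for gate_s seconds: the oscillator runs
        # and the counter accumulates its cycles.
        self.count = round(self.ro_frequency(temp_c) * gate_s)
        # Signal c rises: the count is loaded into TS_REG.
        self.ts_reg = self.count
        # RDY tells the reader that the count is valid.
        self.rdy = True
        return self.ts_reg


def calibrate(sensor, t_low_c, t_high_c, gate_s):
    """Two-point calibration: returns (count at t_low_c, counts per degC)."""
    c_low = sensor.measure(t_low_c, gate_s)
    c_high = sensor.measure(t_high_c, gate_s)
    return c_low, (c_high - c_low) / (t_high_c - t_low_c)


def count_to_temp(count, t_low_c, c_low, counts_per_c):
    """Convert a raw count to degC using the calibration line."""
    return t_low_c + (count - c_low) / counts_per_c


sensor = TsroSensor()
gate = 10e-6  # assumed counting window of 10 us
c_low, counts_per_c = calibrate(sensor, 25.0, 85.0, gate)
reading = sensor.measure(55.0, gate)
temp = count_to_temp(reading, 25.0, c_low, counts_per_c)
print(f"count = {reading}, estimated temperature = {temp:.1f} degC")
```

Because the real oscillator is not perfectly linear and also responds to supply voltage, a practical calibration would likely need more than two points or a fitted curve; the two-point line here is only the simplest case.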


Fig. 16. Thermal sensor design. (a) The block diagram of the thermal sensor. (b) The layout of the thermal sensor, including a regulator, counter and a control unit.

Fig. 17. Power gating design. (a) The schematic diagram of the virtual core circuits. (b) The layout view of the virtual core circuits.


The virtual core circuit is composed of a PMOS switch and a p-type diffusion resistor, as shown in Figure 17. The diffusion resistor is non-silicided and placed in an n-well. Consequently, when the heater in the virtual core is turned on, the n-well heats up first. This differs slightly from a conventional CMOS circuit, in which the substrate is more likely to be the heat source. The maximum current flowing into the resistor is regulated below 13.5 mA.

#### **3.2 Thermal property analysis of the metallic thermal skeletons**

The metallic thermal skeletons are intended to be placed in the regions enclosed by the core-sensor blocks. In this section, we derive analytical expressions for some key parameters.

#### **3.2.1 Analytical model of the metallic thermal skeleton**

Let the heat-removal rate of the metallic thermal skeletons be *q*, and consider a pair of core-sensor blocks as the heat sources. The temperature distribution on the metallic thermal skeletons between any pair of core-sensor blocks follows from (4) and can be written as

$$T_k = T_a + \frac{T_b - T_a}{w}\,x - \frac{q}{2k_{sk}}\,(w - x)\,x \tag{8}$$

Here, as shown in Figure 18, *T*<sub>a</sub> and *T*<sub>b</sub> are the temperatures of CS1 and CS2, respectively; *q* is the heat conducted to the ambient environment by the metallic thermal skeletons; *k*<sub>sk</sub> is the equivalent thermal conductivity of the metallic thermal skeletons; and *w* is their width. Since *T*<sub>k</sub> denotes the temperature at location *x*, evaluating (8) at the mid-point *x* = *w*/2, where the temperature is *T*<sub>1/2</sub>, and solving for *w* gives

$$
w = \left(\frac{8k_{sk}}{q}\right)^{1/2} \left[\frac{T_a + T_b}{2} - T_{1/2}\right]^{1/2} \tag{9}
$$
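Equations (8) and (9) can be checked numerically. The sketch below is a minimal illustration, assuming SI units with *q* treated as a volumetric heat-removal rate in W/m³ (the text does not state its units). Apart from the 420 µm width, the numerical values are hypothetical and chosen only to exercise the formulas.

```python
import math


def skeleton_temperature(x, w, t_a, t_b, q, k_sk):
    """Equation (8): temperature at position x, with 0 <= x <= w."""
    return t_a + (t_b - t_a) * x / w - q / (2.0 * k_sk) * (w - x) * x


def skeleton_width(t_a, t_b, t_half, q, k_sk):
    """Equation (9): width at which the mid-point temperature is t_half."""
    return math.sqrt(8.0 * k_sk / q) * math.sqrt((t_a + t_b) / 2.0 - t_half)


# Hypothetical inputs (not measured data).
w = 420e-6              # skeleton width in m
t_a, t_b = 80.0, 70.0   # temperatures of CS1 and CS2 in degC
q = 1e9                 # assumed heat-removal rate in W/m^3
k_sk = 40.0             # assumed equivalent conductivity in W/(m*K)

# Forward: mid-point temperature from equation (8).
t_half = skeleton_temperature(w / 2.0, w, t_a, t_b, q, k_sk)

# Inverse: recover the width from equation (9).
w_back = skeleton_width(t_a, t_b, t_half, q, k_sk)

print(f"mid-point temperature: {t_half:.3f} degC")
print(f"recovered width: {w_back * 1e6:.1f} um")
```

The round trip recovers the original width, which confirms that (9) is (8) evaluated at *x* = *w*/2 and solved for *w*. The mid-point temperature lies below the average of the two block temperatures by *q w*²/(8 *k*<sub>sk</sub>).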

Fig. 18. The theoretical model of the core-sensor blocks with the metallic thermal skeletons.

#### **3.2.2 Effective thermal conductivity of the metallic thermal skeletons**

For a die with 9 μm of BEOL and 450 μm of silicon substrate, substituting the thermal conductivities into (6) gives *k*<sub>sxx</sub> of around 12 to 68 W/(m·K) and *k*<sub>szz</sub> of around 116 to 147 W/(m·K). The variation in the equivalent thermal conductivity depends on the percentage distribution of metal in the BEOL. Thus, heat flowing through the silicon substrate dissipates almost entirely through the metallic thermal skeletons rather than through the silicon dioxide in the BEOL. By substituting the equivalent *k*<sub>sk</sub> and the temperature values *T*<sub>a</sub>, *T*<sub>b</sub> and *T*<sub>1/2</sub> into (9), we obtain a metallic thermal skeleton width of 420 µm. FEM simulations have been performed to examine the effectiveness of the proposed metallic thermal skeletons, as shown in Figure 19. For compatibility, we combined the simulation results from CFD-RC and ANSYS so as to link them to the design platform used by our circuit designers. Hence, to design the metallic thermal skeletons shown in Figure 12, we assume types α, β and γ with different distribution densities of metal in the BEOL, according to the following equation.

$$
\begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix} = \mathbf{D} \begin{bmatrix} \mathrm{X} \\ \Pi \\ \Phi \end{bmatrix} \tag{10}
$$

where

32 VLSI Design


$$
\mathbf{D} = \begin{bmatrix} 0.28 & 0.44 & 0.28 \\ 0.20 & 0.52 & 0.28 \\ 0.36 & 0.36 & 0.28 \end{bmatrix} \tag{11}
$$

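The matrix **D** contains the weighting coefficients of the metallic thermal skeletons: each row gives the fractions of the Χ, Π and Φ elements that make up one skeleton type, and each row sums to one. The percentage contribution of each element is limited by the metal density constraint in the design rules released by the foundry.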
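The mixing in (10) and (11) can be sketched as a small matrix-vector product. In this sketch the element values passed in for Χ, Π and Φ are hypothetical metal-density fractions chosen for illustration; the text states only that the three elements differ in BEOL metal density.

```python
# Weighting matrix D from equation (11); rows correspond to the
# skeleton types alpha, beta and gamma.
D = [
    [0.28, 0.44, 0.28],
    [0.20, 0.52, 0.28],
    [0.36, 0.36, 0.28],
]


def skeleton_types(chi, pi_elem, phi):
    """Equation (10): weighted mix of the three elements per type."""
    elements = (chi, pi_elem, phi)
    return [sum(wgt * e for wgt, e in zip(row, elements)) for row in D]


# Each row of D should sum to one, since the entries are fractions.
row_sums = [sum(row) for row in D]

# Hypothetical element metal densities (fractions of BEOL area).
alpha, beta, gamma = skeleton_types(chi=0.2, pi_elem=0.5, phi=0.8)

print("row sums:", [round(s, 6) for s in row_sums])
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, gamma = {gamma:.3f}")
```

With these assumed densities, the three types differ only through their Χ and Π weights, because the Φ weight is 0.28 in every row of **D**.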

Fig. 19. Simulated results for selected regions of the proposed architecture. The enable signal H\_EN is broadcast to all virtual cores.


Fig. 21. The testing environment and setup. (a) The test chip is under the measurement environment with the infrared radiation inspection. (b) The naked die with the evaluation board and thermal management total analysis platform. (c) The test chip is placed in the chamber at a nearly constant ambient temperature.
