# The Impact of the Floorplan of Functional units on 3D Multi-core Processors

Hyung Gyu Jeon School of Electronics and Computer Engineering, Chonnam National University Gwangju, Korea hggodman1108@gmail.com Hong Jun Choi School of Electronics and Computer Engineering, Chonnam National University Gwangju, Korea chj6083@gmail.com Jong Myon Kim School of Computer Engineering and Information Technology, University of Ulsan Ulsan, Korea jmkim07@ulsan.ac.kr Cheol Hong Kim School of Electronics and Computer Engineering, Chonnam National University Gwangju, Korea chkim22@jnu.ac.kr<sup>\*</sup>

Abstract— Interconnection delay is one of the most critical constraints in improving the performance of multi-core processors. To reduce the interconnection delay, 3D integration technology has been applied in designing multi-core processors. The 3D multi-core processor is composed of vertically stacked cores which are connected by through-silicon vias, leading to improved interconnection and power efficiency by reducing the physical wire length significantly. However, 3D multi-core processors have severe temperature problems caused by higher power density compared to 2D multi-core processors. In this paper, we propose thermal-aware floorplan schemes to solve the thermal problems in 3D multi-core processors by changing the location of functional units. According to our experimental results, the proposed floorplan schemes reduce the peak temperature by 12°C on average with 3% performance gain.

## Keywords- Multi-core processor; 3D architecture; Temperature; Floorplan schemes

# I. INTRODUCTION

Continuing advances in semiconductor technology enable increasing clock frequency, leading to improved processor performance. Unfortunately, the increased frequency causes high power consumption [1]. Therefore, performance and power efficiency should be considered together in designing up-to-date microprocessors [2-6]. To overcome the power constraints in single-core processors, multi-core processors have been widely used. In multi-core processors, the interconnection delay is regarded as one of the major constraints in improving the performance [7-9].

In 3D multi-core processors, multiple cores are stacked vertically and the cores on different layers are connected by vertical through-silicon vias (TSVs) [10][11]. The 3D integration technology using TSVs can be a good solution for the multi-core processor because the 3D architecture has advantages in performance and power efficiency over the 2D architecture [12]. For this reason, many researchers have focused on the 3D architecture in designing multi-core processors. However, one of the major problems in designing 3D multi-core processors is the thermal problem due to high power density. According to [13], the thermal problem is exacerbated in the 3D case for two main reasons. First, the thermal conductivity of the dielectric layers between the device layers is very low compared to silicon and metal. Second, the vertically stacked multiple layers of active devices cause a rapid increase of power density. Therefore, in spite of the various advantages of the 3D integration technology, it cannot be practical without proper solutions for thermal problems, because thermal problems have a negative impact on the reliability of the processor [14].

Dynamic Thermal Management (DTM) techniques, which use dynamic frequency scaling (DFS), dynamic voltage scaling (DVS), clock gating, or computational migration, have been proposed to relieve the thermal problems of processors. DTM techniques keep the chip temperature under the given threshold, resulting in improved reliability [15]. Unfortunately, DTM techniques degrade the performance to reduce the temperature of the processor. In this work, we propose three thermal-aware floorplan schemes to alleviate the thermal problems of the 3D multi-core processor with little performance loss. In our previous work, we investigated thermal-aware floorplan schemes and analyzed the impact of the floorplan on the processor temperature [16]. In this work, we present more efficient floorplan schemes compared to the schemes in [16][17], leading to better performance and energy efficiency while solving the thermal problems of 3D multi-core processors.

The rest of this paper is organized as follows: Section II describes related work and Section III presents the proposed thermal-aware floorplan schemes. Section IV describes the simulation infrastructure and methodology and Section V describes our experimental results in detail. Finally, Section VI concludes this paper.

#### II. RELATED WORK

#### A. 3D multi-core processors

Compared to 2D multi-core processors, 3D multi-core processors improve performance by reducing the wire length dramatically. There are several manufacturing technologies for 3D die stacking and alignment, such as wafer-to-wafer bonding, die-to-die bonding, and die-to-wafer bonding [18][19]. In wafer-to-wafer bonding, electronic components are built on several semiconductor wafers and entire wafers are directly bonded together, where misalignment under 1.5 µm is achieved without significant defects after bonding process optimization [20]. Further improvement of alignment accuracy can also be expected, because no deformable adhesive material is included in the bonding interface.



Figure 1. 2-die 3D IC

# B. Thermal Management Techniques

Researchers have proposed a large number of DTM techniques to solve the thermal problems in the processor. DTM techniques can be categorized into two groups. One group consists of software-based techniques, such as energy-aware task scheduling and OS-level task scheduling [21]. The other consists of hardware-based techniques, such as clock gating, dynamic frequency scaling (DFS), dynamic voltage scaling (DVS), and instruction throttling. Software-based techniques show lower performance degradation than hardware-based techniques, but their cooling efficiency is lower than that of hardware-based techniques [22].

The existing DTM techniques for 2D multi-core processors have been used to prevent the chip temperature from exceeding the thermal limit supported by the cooling solution. However, the high power density of 3D multi-core processors may require reactive DTM techniques to be engaged more frequently than in 2D multi-core processors, because the 3D architecture exacerbates the thermal problems in multi-core processors [23]. Unfortunately, DTM techniques incur performance overhead to control the temperature in the processor, resulting in performance degradation. Consequently, when DTM techniques are applied to 3D multi-core processors, more performance degradation can occur than in 2D multi-core processors. Therefore, thermal management for 3D multi-core processors should proactively and continuously optimize performance and temperature, instead of merely reacting to emergencies [16].

# C. Floorplan Technology

When designing up-to-date microprocessors, high performance, power efficiency, and thermal efficiency are all important design considerations. However, existing thermal-aware techniques such as DTM reduce the peak temperature in the processor by sacrificing performance. For this reason, recent research has focused on solutions that reduce the peak temperature in the processor with little effect on the performance.

According to [17], thermal-aware design techniques using floorplan schemes lead to peak temperature reduction with minimal performance degradation. Traditionally, floorplan schemes have been researched at the circuit level [13]. However, as wire delay becomes the bottleneck of performance and thermal problems become a critical issue, floorplan schemes have begun to be studied at the architecture level.

In floorplan schemes, heat transfer from adjacent functional units is one of the most important factors that affect the temperature distribution of a chip. The temperature distribution depends on the functional unit adjacency determined by the floorplan of the processor. In other words, the temperature on the functional units is heavily affected by the heat transfer from adjacent functional units.

#### III. PROPOSED THERMAL-AWARE FLOORPLAN SCHEMES

In this paper, the baseline floorplan model is based on the Alpha21364(EV6) [23]. We assume a 90nm technology with a supply voltage of 1.5 volts. To implement a 2-die stacked 3D IC, we extend it to 2 layers in our experiments.

| Layer     | Column 1            | Column 2 | Column 3   | Column 4 |
|-----------|---------------------|----------|------------|----------|
| core-0    | FPMap               | IntMap   | IntQ       | IntReg   |
| core-0    | FPMul, FPReg, FPAdd | FPQ      | LdStQ, ITB | IntExec  |
| core-0    | BPred               |          | DTB        |          |
| core-0    | ICache              |          | DCache     |          |
| core-1    | FPMap               | IntMap   | IntQ       | IntReg   |
| core-1    | FPMul, FPReg, FPAdd | FPQ      | LdStQ, ITB | IntExec  |
| core-1    | BPred               |          | DTB        |          |
| core-1    | ICache              |          | DCache     |          |
| HEAT SINK |                     |          |            |          |

Legend: IntMap: Integer Mapper; IntReg: Integer Register; IntExec: Integer Execution Unit; FPMap: Floating Point Mapper; FPMul: Floating Point Multiplier; FPReg: Floating Point Register; FPAdd: Floating Point Adder; FPQ: Floating Point Queue; LdStQ: Load and Store Queue; BPred: Branch Predictor; ITB: Instruction TLB; DTB: Data TLB; ICache: Instruction Cache; DCache: Data Cache.

Figure 2. Baseline Floorplan of 3D Dual-core Processor - Baseline

Figure 2 shows the baseline floorplan of the 3D dual-core processor, which is the target processor in this work. Each core of the baseline processor is an Alpha21364. We stacked two cores vertically to configure the dual-core processor. In Figure 2, core-0 represents the core located far from the heat sink, while core-1 denotes the core located near the heat sink. The three proposed floorplan schemes are shown in Figures 3 to 5. The floorplan of core-0 is modified while the floorplan of core-1 is fixed, since the temperature of core-0 is higher than that of core-1 due to the different cooling efficiency. In this work, we ran many experiments to find efficient floorplans for reducing the temperature. Owing to limited space, we describe only three efficient floorplan schemes.

In the proposed floorplan schemes, the hottest functional units are relocated to reduce the temperature without increasing the area. Compared to the Baseline in Figure 2, the relocated functional units in the proposed floorplan schemes can be summarized as follows:

- Floorplan I: IntReg, IntExec
- Floorplan II: FPMul, LdStQ, IntReg, IntExec, IntQ
- Floorplan III: FPAdd, LdStQ, IntQ, IntExec, IntReg

The first floorplan scheme (Floorplan I) swaps the location of IntReg with that of IntExec, because the IntReg unit is one of the hottest units in the baseline floorplan. As shown in Figure 3, Floorplan I does not change the location of the other functional units. The second floorplan scheme (Floorplan II), shown in Figure 4, changes the location of FPMul, LdStQ, IntReg, IntExec and IntQ. As shown in Figure 5, the third floorplan scheme (Floorplan III) changes the location of FPAdd, LdStQ, IntQ, IntExec and IntReg.
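As an illustration of the Floorplan I relocation, the sketch below models part of the core-0 layout as a grid of unit names and exchanges IntReg with IntExec. The grid is a simplified reading of Figure 2 (the BPred, DTB and cache rows are omitted), and the helper name `swap_units` is hypothetical; it is not the tooling used in this work.

```python
from copy import deepcopy

# Simplified top two rows of the baseline core-0 layout, read from Figure 2.
# Assumption: each cell names the unit(s) drawn in that region of the figure.
BASELINE_CORE0 = [
    ["FPMap", "IntMap", "IntQ", "IntReg"],
    ["FPMul/FPReg/FPAdd", "FPQ", "LdStQ/ITB", "IntExec"],
]


def swap_units(grid, unit_a, unit_b):
    """Return a copy of grid with the positions of unit_a and unit_b exchanged."""
    swapped = deepcopy(grid)
    for row in swapped:
        for col, name in enumerate(row):
            if name == unit_a:
                row[col] = unit_b
            elif name == unit_b:
                row[col] = unit_a
    return swapped


# Floorplan I: only IntReg and IntExec trade places; all other units stay put.
floorplan_1 = swap_units(BASELINE_CORE0, "IntReg", "IntExec")
```

Because the swap works on a copy, the baseline grid stays available for comparison against each candidate floorplan.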

| Layer     | Column 1            | Column 2 | Column 3   | Column 4 |
|-----------|---------------------|----------|------------|----------|
| core-0    | FPMap               | IntMap   | IntQ       | IntExec  |
| core-0    | FPMul, FPReg, FPAdd | FPQ      | LdStQ, ITB | IntReg   |
| core-0    | BPred               |          | DTB        |          |
| core-0    | ICache              |          | DCache     |          |
| core-1    | FPMap               | IntMap   | IntQ       | IntReg   |
| core-1    | FPMul, FPReg, FPAdd | FPQ      | LdStQ, ITB | IntExec  |
| core-1    | BPred               |          | DTB        |          |
| core-1    | ICache              |          | DCache     |          |
| HEAT SINK |                     |          |            |          |

Figure 3. Floorplan I

Modifying the floorplan of a core affects the data path in the processor, because the distance between functional units is determined by the floorplan. The data path is therefore also considered in this work. We assume that the data path of Floorplan I is the same as that of the Baseline, since there is little difference between the two floorplans. However, we changed the data path of Floorplan II and Floorplan III, because these two floorplans differ considerably from the Baseline. In particular, the level 1 data cache latency related to LdStQ is increased compared to the Baseline, because the location of LdStQ is changed in Floorplan II and Floorplan III.



Figure 5. Floorplan III

# IV. EXPERIMENTAL METHODOLOGY

In this section, we briefly describe the simulation infrastructure and thermal modeling. To determine the characteristics of the proposed schemes with respect to the baseline scheme, we run applications selected from the SPEC CPU2000 suite [24] using SimpleScalar [25] and Wattch [26]. SimpleScalar provides detailed cycle-level modeling of the processor, and Wattch is used to obtain the power trace of the processor.

| Parameter            | Value                                                                                                      |
|----------------------|------------------------------------------------------------------------------------------------------------|
| Functional units     | 4 integer ALUs, 1 floating-point ALU, 1 integer multiplier/divider, 1 floating-point multiplier/divider   |
| L1 Instruction Cache | 32KB, 4-way, 32-byte lines, 1 cycle latency                                                                |
| L1 Data Cache        | 32KB, 4-way, 32-byte lines, 1 ~ 2 cycle latency                                                            |
| Unified L2 Cache     | 256KB, 8-way, 64-byte lines, 12 cycle latency                                                              |

TABLE I. SYSTEM PARAMETERS

Table I shows the main processor and memory hierarchy parameters used in the simulation. In up-to-date processors, DTM techniques are applied to alleviate the thermal problems. Therefore, DTM techniques (initial temperature: 60°C, instruction throttling starting at 80°C, DVFS starting at 85°C) are also applied to the simulated processor.
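The threshold behaviour described above can be sketched as a simple selection function. The two thresholds come from the text; everything else is an assumption made for illustration, including the function name `select_dtm_action`, the inclusive comparisons, and the choice to report only the strongest applicable action.

```python
# Thresholds taken from the text; the policy structure itself is an assumption.
THROTTLE_THRESHOLD_C = 80.0  # instruction throttling starts at this temperature
DVFS_THRESHOLD_C = 85.0      # DVFS starts at this temperature


def select_dtm_action(peak_temp_c):
    """Map a peak temperature reading (in deg C) to a DTM action name."""
    if peak_temp_c >= DVFS_THRESHOLD_C:
        return "dvfs"
    if peak_temp_c >= THROTTLE_THRESHOLD_C:
        return "instruction_throttling"
    return "none"
```

Under this reading, a core that stays below 80°C never pays a DTM penalty, which is why lowering the peak temperature through the floorplan can translate into a performance benefit.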

TABLE II. BENCHMARK

| Benchmark Program | IPC  | L2 Cache Miss Rate | Peak Temperature in 2D Processor (°C) |
|-------------------|------|--------------------|---------------------------------------|
| gcc               | 2.48 | 0.03               | 78                                    |
| mcf               | 2.73 | 0.25               | 81                                    |

Table II shows the benchmark programs used in our simulation. Generally, benchmark programs are divided into two categories: memory-bound programs and CPU-bound programs. We use both a memory-bound program (mcf) and a CPU-bound program (gcc) to analyze the temperature of the processor with different kinds of benchmark programs.

TABLE III. THERMAL MODEL PARAMETERS

| Parameter                                   | TIM-0  | die-0  | TIM-1  | die-1  |
|---------------------------------------------|--------|--------|--------|--------|
| Specific heat capacity (J/m<sup>3</sup>K)   | 4.0e6  | 1.75e6 | 4.0e6  | 1.75e6 |
| Thickness (m)                               | 2.0e-5 | 1.5e-4 | 2.0e-5 | 1.5e-4 |
| Resistivity (mK/W)                          | 0.25   | 0.01   | 0.25   | 0.01   |

To evaluate the heat dissipation of 3D multi-core processors, especially inside each die, we use HotSpot Version 5.0 [27]. HotSpot is a modeling tool for developing compact thermal models, and it can simulate the processor temperature in detail. In addition, HotSpot's grid model is capable of modeling stacked 3D chips. HotSpot takes a power trace as input and generates the steady-state temperature of each functional block. The power trace is generated by using Wattch. For precise experiments, the configuration parameters and the layer parameters are obtained from the material properties in the CRC handbook [28]. The thermal modeling configurations of the 3D dual-core processor used in our experiments are shown in Table III. In the table, die-0 represents the die located far away from the heat sink, while die-1 denotes the die located near the heat sink.
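To give a feel for the layer parameters in Table III, the sketch below computes the vertical conduction resistance per unit area of each layer with the one-dimensional slab relation (resistivity times thickness). This is only a back-of-the-envelope illustration; HotSpot's grid model also accounts for lateral heat flow and heat capacity, and the function name is hypothetical.

```python
# Layer parameters copied from Table III: (resistivity in m*K/W, thickness in m).
LAYERS = {
    "TIM-0": (0.25, 2.0e-5),
    "die-0": (0.01, 1.5e-4),
    "TIM-1": (0.25, 2.0e-5),
    "die-1": (0.01, 1.5e-4),
}


def area_specific_resistance(resistivity, thickness):
    """Vertical conduction resistance times area, in m^2*K/W (1-D slab model)."""
    return resistivity * thickness


for name, (rho, thickness) in LAYERS.items():
    print(f"{name}: {area_specific_resistance(rho, thickness):.2e} m^2*K/W")
```

With these values, each thin TIM layer contributes about 5e-6 m²K/W, roughly three times the 1.5e-6 m²K/W of a silicon die, which is consistent with the observation that the low-conductivity layers between dies hinder heat removal.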

# V. EXPERIMENTAL RESULTS

In this work, we analyze the temperature and the performance of the 3D multi-core processor according to the floorplan scheme. In the results, notDTM and DTM represent the baseline floorplan without and with the DTM technique, respectively. Floorplan I, Floorplan II and Floorplan III represent the proposed floorplan schemes shown in Figures 3 to 5, each with the DTM technique applied.

## A. Temperature

We use the peak temperature of the processor instead of the average temperature, because the major thermal problem is caused by the highest temperature. We let g and m represent the gcc and mcf applications from SPEC CPU2000, respectively. In the graph, the first character denotes the application executed on core-0 and the second character denotes the application executed on core-1. For example, gm means that gcc is executed on core-0 and mcf is executed on core-1. The vertical axis represents the peak temperature of the processor.
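The workload naming convention can be written down as a small decoder. The function name `decode_workload` is hypothetical and serves only to make the convention explicit.

```python
# Single-letter codes for the two SPEC CPU2000 applications used in this work.
APPS = {"g": "gcc", "m": "mcf"}


def decode_workload(label):
    """Expand a two-character label into the application run on each core."""
    if len(label) != 2:
        raise ValueError("expected a two-character label such as 'gm'")
    return {"core-0": APPS[label[0]], "core-1": APPS[label[1]]}


# The four combinations evaluated: gg, gm, mg and mm.
WORKLOADS = [a + b for a in "gm" for b in "gm"]
```

For instance, `decode_workload("gm")` maps core-0 to gcc and core-1 to mcf.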



Figure 6. Peak temperature of 3D Dual-core processor

Figure 6 shows the peak temperature of the 3D dual-core processor according to each floorplan scheme. As shown in the graph, the temperature of the DTM is lower than that of notDTM. Our proposed three floorplan schemes reduce the peak temperature compared to the baseline floorplan.

Among the three proposed floorplan schemes, Floorplan II and Floorplan III show lower temperatures than Floorplan I. According to our simulation results, compared to notDTM, DTM decreases the temperature by 6.22°C on average (gg: 3.42°C, gm: 5.16°C, mg: 7.28°C, mm: 9.0°C) and Floorplan I decreases the temperature by 11.9°C on average (gg: 8.8°C, gm: 10.38°C, mg: 13.4°C, mm: 15.08°C). Floorplan II and Floorplan III reduce the temperature by 13.27°C and 13.32°C on average, respectively. Reduced temperature leads to higher reliability of the processor.
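The averages quoted above can be reproduced from the per-workload figures. The sketch below uses only numbers stated in the text; the per-workload values for Floorplan II and Floorplan III are not given, so their quoted averages are used directly.

```python
from statistics import mean

# Per-workload peak-temperature reductions versus notDTM, in deg C (from the text).
REDUCTION_C = {
    "DTM": {"gg": 3.42, "gm": 5.16, "mg": 7.28, "mm": 9.0},
    "Floorplan I": {"gg": 8.8, "gm": 10.38, "mg": 13.4, "mm": 15.08},
}

# Average reduction per scheme across the four workload combinations.
averages = {
    scheme: mean(list(values.values())) for scheme, values in REDUCTION_C.items()
}

# Floorplan II and III averages are quoted directly in the text.
floorplan_avgs = [averages["Floorplan I"], 13.27, 13.32]
overall_floorplan_avg = mean(floorplan_avgs)
```

The computed averages come to about 6.2°C for DTM and 11.9°C for Floorplan I, and averaging the three floorplan figures gives roughly 12.8°C, in line with the reduction reported in the conclusion.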

# B. Performance

If the temperature of the processor exceeds the threshold, the DTM technique is activated, resulting in performance degradation. In the simulated 3D dual-core processor, the temperature of core-0 is higher than that of core-1, because core-0 is located farther from the heat sink. Therefore, DTM is applied more frequently to core-0 than to core-1, and the performance degradation of core-0 is more serious than that of core-1. Figure 7 shows the total execution time of core-0 to analyze the performance according to each floorplan scheme. Each bar in the graph is normalized to the execution time of notDTM.



Figure 7. Performance of 3D Dual-core processor

We analyze the normalized performance of each floorplan for four application combinations: gg, gm, mg and mm. Across all cases, DTM degrades the performance by 22.89% on average compared to notDTM. In our simulation results, the first DTM technique, instruction throttling, is applied after 5.1% of the total execution time. Floorplan I, Floorplan II and Floorplan III degrade the performance by 5%, 2.19% and 2.2% on average compared to notDTM, respectively. The performance degradation of the proposed schemes is smaller than that of DTM, because the DTM technique is applied less frequently in the proposed schemes. In particular, even though Floorplan II and Floorplan III increase the data cache access latency (from 1 cycle to 2 cycles) due to the modified floorplan, their performance is about 3% better than that of Floorplan I. According to our analysis, this is because the performance improvement obtained from the reduced temperature is larger than the performance loss due to the increased latency.
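The relation between these percentages can be checked with a short calculation. The sketch assumes that a degradation of x% means the execution time grows by x% relative to notDTM; that interpretation, and the function names, are assumptions rather than definitions taken from this work.

```python
# Average slowdown versus notDTM, in percent (figures from the text).
SLOWDOWN_PCT = {
    "DTM": 22.89,
    "Floorplan I": 5.0,
    "Floorplan II": 2.19,
    "Floorplan III": 2.2,
}


def normalized_time(scheme):
    """Execution time normalized to notDTM, assuming slowdown means extra time."""
    return 1.0 + SLOWDOWN_PCT[scheme] / 100.0


def gain_pct(base_scheme, new_scheme):
    """Percent speedup of new_scheme relative to base_scheme."""
    return (normalized_time(base_scheme) / normalized_time(new_scheme) - 1.0) * 100.0


gain_ii_over_i = gain_pct("Floorplan I", "Floorplan II")
gain_iii_over_i = gain_pct("Floorplan I", "Floorplan III")
```

Under that assumption, Floorplan II and Floorplan III come out a little under 3% faster than Floorplan I, which matches the "about 3%" figure quoted above.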

# C. Energy consumption

As described in [29], the leakage power can be calculated from the area and the temperature as follows.

$$P_{\text{leak}} = \alpha \cdot \text{Area} \cdot e^{\beta (T_{\text{current}} - T_0)} \tag{1}$$

$$P_{\text{total}} = \sum P_{\text{leak}} \tag{2}$$

In equation (1), $\alpha$ and $\beta$ represent the base leakage power and a technology-dependent constant, respectively. $T_{current}$ is the current temperature and $T_0$ is the temperature at which the leakage power equals its base value. The total leakage power is obtained from equation (2). As the equations show, the total leakage decreases if $\alpha$, $\beta$, Area, or the execution time decreases. The proposed floorplan schemes reduce the execution time significantly compared to the traditional floorplan. Therefore, the proposed technique can reduce the leakage power consumption significantly.
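The leakage model can be sketched in a few lines. The sketch assumes the exponent in equation (1) is the temperature difference between the current and the base temperature, and that the sum in equation (2) runs over blocks; the index of that sum is not stated, so this is an assumption. All numeric values below are hypothetical and chosen only to show the trend.

```python
import math


def leakage_power(alpha, area, beta, t_current, t0):
    """Leakage power per equation (1), with exponent beta * (t_current - t0)."""
    return alpha * area * math.exp(beta * (t_current - t0))


def total_leakage(blocks):
    """Sum leakage over blocks, each a tuple (alpha, area, beta, t_current, t0)."""
    return sum(leakage_power(*block) for block in blocks)


# Hypothetical parameters (not from the paper): same block at two temperatures.
hot = leakage_power(alpha=1.0, area=2.0, beta=0.02, t_current=85.0, t0=60.0)
cool = leakage_power(alpha=1.0, area=2.0, beta=0.02, t_current=72.0, t0=60.0)
```

For a positive technology constant, the cooler operating point always yields lower leakage, so a floorplan that lowers the peak temperature reduces leakage through the exponential term in addition to any saving from shorter execution time.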

#### VI. CONCLUSION

In this paper, we proposed thermal-aware floorplan schemes to solve the thermal problems in 3D multi-core processors. The proposed schemes reduce the peak temperature of the processor by adjusting the location of hot units to reduce the temperature increase due to heat transfer. According to our experiments, the proposed schemes reduce the temperature by 12.84°C on average compared to the baseline scheme, leading to better reliability. Moreover, they significantly reduce the performance loss due to the DTM technique. The proposed floorplan schemes also show better power efficiency than the traditional floorplan scheme. Therefore, the proposed floorplan schemes can be a good solution for improving the performance, power efficiency and reliability of 3D multi-core processors.

#### ACKNOWLEDGMENT

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012R1A1B4003492) and by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0301-12-3005).

#### REFERENCES

- [1] V. Agarwal, M. S. Hrishikesh, S. W. Keckler and D. Burger, "Clock rate versus IPC: the end of the road for conventional microarchitectures," Proc. the 27th International Symposium on Computer Architecture, Jun. 2000, pp. 10-14.

- [2] L. Xiang, J. Huang and T. Chen, "Coordinating System Software for Power Savings," Proc. Future Generation Communication and Networking, Dec. 2008, pp. 13-15.
- [3] R. Palit, A. Singh and K. Naik, "An Architecture for Enhancing Capability and Energy Efficiency of Wireless Handheld Devices," International Journal of Energy, Information and Communications. vol. 2, Nov. 2011, pp. 117-136.
- [4] M. Chakraverty, S. Mandava and G. Mishra, "Performance Analysis of CMOS Single Ended Low Power Low Noise Amplifier," International Journal of Control and Automation. vol. 3,Jun. 2010, pp. 45-52.
- [5] S. Banerjee, M. Mukherjee and J. P. Banerjee, "Bias current optimization of Wurtzite-GaN DDR IMPATT Diode for High Power Operation at Thz Frequencies," International Journal of Advanced Science and Technology. vol. 16, Mar. 2010, pp. 11-20.
- [6] H. Naqvi, S. Berber and Z. Salcic, "Energy Efficiency of Collaborative Communication with imperfect Frequency Synchronization in Wireless Sensor Networks," International Journal of Multimedia and Ubiquitous Engineering. vol. 5, Oct. 2010, pp. 19-30.
- [7] J. W. Joyner, P. Zarkesh-Ha, J. A. Davis and J. D. Meindl, "A Three-Dimensional Stochastic Wire-Length Distribution for Variable Separation of Strata," Proc. IEEE International Interconnect Technology Conference, Jun. 2000, pp. 7-9.
- [8] L. Yeh and R. Chy, "Thermal Management of Microelectronic Equipment," American Society of Mechanical Engineering, 2001.
- [9] Z. Zhijun, L. R. Hoover, and A. L. Phillips, "Advanced thermal architecture for cooling of high power electronics," Components and Packaging Technologies, IEEE Transactions on, vol. 25, Dec. 2002, pp. 629-634.
- [10] S. W. Yoon, D. W. Yang, J. H. Koo, M. Padmanathan and F. Carson, "3D TSV processes and its assembly/Packaging Technology," Proc. IEEE International Conference on 3D System Integration, Sep. 2009, pp. 28-30.
- [11] C. Zhu, Z. Gu, L. Shang, R. P. Dick and R. Joseph, "Three-dimensional chip-multiprocessor run-time thermal management," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, Aug. 2008, pp. 1479-1492.
- [12] A. W. Topol, D. C. L. Tulipe, L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. Kumar, G. U. Singco, A. M. Young, K. W. Guarini and M. Ieong, "Three-Dimensional integrated circuits," IBM Journal of Research and Development, USA, 2006.
- [13] J. Cong, G. J. Luo, J. Wei and Y. Zhang, "Thermal-Aware 3D IC Placement Via Transformation," Proc. Asia and South Pacific-Design Automation Conference, Jan. 2007, pp. 23-26.
- [14] R. Mahajan, "Thermal Management of CPUs: A Perspective on Trends, Needs, and Opportunities," Invited talk given at the 8th International Workshop on Thermal INvestigations of ICs and Systems, 2002.
- [15] A. K. Coskun, J. L. Ayala, D. Atienza, T. S. Rosing and Y. Leblebici, "Dynamic Thermal Management in 3D Multicore Architectures," Proc. Design, Automation & Test in Europe Conference & Exhibition, Apr. 2009, pp. 20-24.
- [16] D. O. Son, Y. J. Park, J. W. Ahn, J. H. Park, J. M. Kim and C. H. Kim, "Thermal-aware Floorplan Schemes for Reliable 3D Multi-core Processors," Proc. International Conference ICCSA, Jun. 2011, pp. 20-23.
- [17] K. Sankaranarayanan, S. Velusamy, M. Stan and K. Skadron, "A Case for Thermal-Aware Floorplanning at the Microarchitectural level," Journal of Instruction-level Parallelism, vol. 7, Aug. 2005, pp. 1-16.

- [18] P. Lindner, V. Dragoi, T. Glinsner, C. Schaefer and R. Islam. "3D Interconnect through aligned wafer level bonding," Proc. the Electronic Components and Technology Conference, May. 2002, pp. 1439-1443.
- [19] P. Morrow, M. J. Kobrinsky, S. Ramanathan, C. M. Partk, M. Harmes, V. Ramachandrarao, H. M. Park, G. Kloster, S. List and S. Kim. "Wafer-level 3D Interconnects via Cu bonding," Proc. the 2004 Advanced Metalization Conference, Oct. 2004, pp. 331-336.
- [20] P. Leduca, F. de Crecy, B. Charlet, T. Enot, M. Zussy, B. Jones, J.-C. Barbe, N. Kernevez, N. Sillon, S. Maitrejean and D. Louisa. "Challenges for 3D IC Integration: bonding quality and Thermal management," Proc. IEEE International Interconnect Technology Conference, Jun. 2007, pp. 21-212.
- [21] X. Zhou, Y. Xu, Y. Du, Y. Zhang and J. Yang, "Thermal Management for 3D Processors via Task scheduling," Proc. the 2008 37th International Conference on Parallel Processing, Sep. 2008, pp. 9-12.
- [22] T. Pering and R. Brodersen, "Energy efficient voltage scheduling for real-time operating systems," Proc. the 4th IEEE Real-Time Technology and Applications Symposium RTAS'98, Work in Progress Session, Jun. 1998, pp. 3-5.
- [23] R. E. Kessler. "The Alpha 21364 microprocessor," IEEE MICRO, vol. 19, 1996
- [24] J. L. Henning, "SPEC CPU2000: Measuring CPU Performance in the New Millennium," IEEE Computer, vol. 33, Jul. 2000, pp. 28-35.
- [25] D. C. Burger and T. M. Austin, "The SimpleScalar tool set, version 2.0," ACM Special Interest Group on Computer Architecture Computer Architecture News, vol. 25, Jun. 1997, pp. 13-25.
- [26] D. Brooks, V. Tiwari and M. Martonosi, "Wattch: A Framework for Architectural-level Power Analysis and Optimizations," Proc. the 27th Annual International Symposium on Computer Architecture, Jun. 2000, pp. 10-14.
- [27] W. Huang, M. R. Stan, K. Skadron, K. Sankaranarayanan and S. Ghosh. "HotSpot: A Compact Thermal Modeling Method for CMOS VLSI Systems," IEEE Transactions on VLSI Systems, vol. 14, May. 2006, pp. 501-513.
- [28] CRC Press, CRC Handbook of Chemistry. http://www.hbcpnetbase.com
- [29] A. K. Coskun, A. B. Kahng and T. S. Rosing, "Temperature- and Cost-Aware Design of 3D Multiprocessor Architectures," Proc. Architectures, Methods and Tools, Aug. 2009, pp. 27-29.