Architecting Main Memory Systems of Manycore Processors
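Main Memory System Architecture Design for Manycore Processor Systems
- Graduate School of Convergence Science and Technology, Department of Intelligent Convergence Systems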
- Issue Date
- Seoul National University, Graduate School of Convergence Science and Technology
- manycore; memory system; DRAM; μbanks; manycore simulator; Through-Silicon Interposer (TSI)
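- Thesis (Ph.D.) -- Seoul National University, Graduate School of Convergence Science and Technology: Intelligent Convergence Systems major, August 2015. 안정호.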
- Manycore processors have become mainstream, and DRAM is widely used as the main memory of these systems. Applications have been parallelized to exploit manycore systems efficiently, and their data sets keep growing. As a result, the main memory system has become the performance and energy bottleneck of modern manycore systems. Through-silicon interposer (TSI) technology is a promising way to architect high-bandwidth, energy-efficient main memory systems for modern manycore processors. However, while TSI improves I/O energy efficiency, it leaves the memory system unbalanced, because the DRAM core then dominates the overall energy consumption of manycore systems. Despite this, few studies of DRAM device microarchitecture consider the system-level impact on the performance and energy efficiency of manycore systems.
Research on modern manycore systems requires a cycle-level timing simulator that provides detailed microarchitecture models of both the core and uncore subsystems. The core subsystem of a manycore processor can consist of traditional or asymmetric cores. The uncore subsystem has become more powerful and complex than ever, and includes deeper cache hierarchies, advanced on-chip interconnects, memory controllers, and main memory. We first implement a new cycle-level timing simulator, McSimA+, which enables microarchitectural studies of manycore systems and provides detailed microarchitecture models of the core and uncore subsystems. McSimA+ is an application-level+ simulator: it combines the light weight of application-level simulators with the full control over threads and processes found in full-system simulators.
Then, we evaluate the system-level impact of DRAM array organizations on performance and power. We model modern DRAM array organizations by varying the number of banks and the DRAM row size per bank, and evaluate their area, power, and timing. The modeling results show that a larger DRAM row improves area efficiency and access time but increases activation/precharge energy. We evaluate the system-level impact of these organizations by simulating a manycore system with 3D-stacked DRAM. System performance and energy efficiency improve as each DRAM rank has more banks. While the 8KB DRAM row gives the best performance, the best energy-delay product (EDP) is obtained when the DRAM row size is 2KB.
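For readers less familiar with the metric, EDP is the product of the energy consumed and the execution time, so a lower value is better and the fastest configuration is not necessarily the one with the best EDP. The Python sketch below illustrates this with made-up numbers; the configuration names and values are hypothetical and are not results from the thesis.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Return EDP = energy x delay. Lower is better."""
    if energy_joules < 0 or delay_seconds < 0:
        raise ValueError("energy and delay must be non-negative")
    return energy_joules * delay_seconds


# Hypothetical (energy in joules, delay in seconds) pairs, for illustration only.
configs = {
    "fast_but_hungry": (1.6, 1.0),    # shortest run time, more energy
    "slower_but_frugal": (1.0, 1.1),  # slightly longer run time, less energy
}

# Pick the configuration with the lowest EDP.
best = min(configs, key=lambda name: energy_delay_product(*configs[name]))
print(best)
```

In this toy comparison the slower configuration has the better EDP, because its energy saving outweighs its longer run time.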
Finally, we propose a new TSI-based main memory system that resolves the imbalance between I/O energy and DRAM core energy. It uses a novel DRAM device microarchitecture called μbank, which partitions each conventional bank into a large number of smaller banks (μbanks) that operate independently with minimal area overhead. The massive number of μbanks provides ample bank-level parallelism and a lower bank conflict rate, improving IPC by 1.62× and EDP by 4.80× on average for memory-intensive SPEC 2006 benchmarks over the baseline DDR3-based memory system. We also show that μbank-based memory systems can simplify memory controller design, because a simple open-page policy performs comparably to complex prediction-based page management policies: in our μbank-based memory system, the simple open-page policy achieves more than 95% of the performance of a perfect predictor.
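The link between bank count and bank conflicts under an open-page policy can be illustrated with a toy model. The Python sketch below is not McSimA+ and does not model μbank timing, area, or the reduced row size of a real μbank; it only assumes that each bank keeps one row open, that rows are interleaved across banks, and that several threads stream through memory in round-robin order. All function names, parameters, and the address mapping are hypothetical.

```python
def simulate_open_page(addresses, num_banks, row_bytes=8192):
    """Count (hits, conflicts, cold) row-buffer outcomes under an open-page policy."""
    open_rows = {}  # bank index -> row currently held open in that bank
    hits = conflicts = cold = 0
    for addr in addresses:
        row_id = addr // row_bytes   # global row index
        bank = row_id % num_banks    # assumed mapping: rows interleaved across banks
        row = row_id // num_banks    # row index within the bank
        if bank not in open_rows:
            cold += 1                # first touch: no row open yet
        elif open_rows[bank] == row:
            hits += 1                # row already open: row-buffer hit
        else:
            conflicts += 1           # another row is open: bank conflict
        open_rows[bank] = row        # open-page: leave the accessed row open
    return hits, conflicts, cold


def interleaved_streams(num_streams=8, accesses_per_stream=64,
                        stride=64, row_bytes=8192):
    """Round-robin trace of sequential streams, each staying inside one row."""
    bases = [i * (1 << 20) + i * row_bytes for i in range(num_streams)]
    return [bases[s] + step * stride
            for step in range(accesses_per_stream)
            for s in range(num_streams)]


trace = interleaved_streams()
for banks in (1, 2, 8, 64):
    hits, conflicts, cold = simulate_open_page(trace, banks)
    print(f"banks={banks:3d} hits={hits} conflicts={conflicts} cold={cold}")
```

In this synthetic trace, a single bank sees a conflict on almost every access (511 of 512), while eight or more banks see none, since each stream then keeps its own row open. This mirrors, in a very simplified form, why ample bank-level parallelism makes a plain open-page policy hard to beat.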