Architecting Main Memory Systems to Achieve Low Access Latency

손영훈

Authors: 손영훈

Advisor: 안정호

Major: Graduate School of Convergence Science and Technology, Dept. of Transdisciplinary Studies

Issue Date: 2016-08

Publisher: Seoul National University, Graduate School of Convergence Science and Technology

Keywords: memory system ; DRAM ; CHARM ; SALAD ; row-buffer decoupling ; access latency ; area overhead

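Description: Thesis (Ph.D.) -- Seoul National University, Graduate School of Convergence Science and Technology: Dept. of Transdisciplinary Studies, Intelligent Convergence Systems major, 2016. 8. 안정호.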

Abstract: DRAM has been the de facto standard for main memory, and advances in process technology have rapidly increased its capacity and bandwidth. In contrast, its random access latency has remained relatively stagnant at around 100 CPU clock cycles. Modern computer systems rely on caches or other latency-tolerance techniques to lower the average access latency. However, not all applications have ample parallelism or locality that would help hide or reduce the latency. Moreover, application demands for memory space continue to grow, while the capacity gap between last-level caches and main memory is unlikely to shrink. Consequently, reducing main-memory latency is important for application performance. Unfortunately, previous proposals have not adequately addressed this problem: they have either focused only on improving bandwidth and capacity or reduced the latency at the cost of significant area overhead.
Modern DRAM devices for main memory are organized into multiple banks to satisfy ever-increasing throughput, energy-efficiency, and capacity demands. Due to tight cost constraints, only one row per bank can be buffered (opened) and actively service requests at a time, and that row must be deactivated (closed) before a new row is loaded into the row buffers. Closing a row too early forces it to be re-opened for requests that would otherwise have been row-buffer hits, whereas closing it too late places the deactivation on the critical path of row-buffer misses. The time to (de)activate a row is comparable to the time to read an open row, and applications are often sensitive to DRAM latency. Hence, it is critical to decide correctly when to close a row. However, the number of banks per DRAM device has increased over generations, which reduces the number of requests per bank. With less information about future requests, a memory controller is forced to predict frequently when to close a row, while the dynamic nature of memory access patterns limits the prediction accuracy.
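The row-closing trade-off described in this paragraph can be illustrated with a toy single-bank model. This is only a sketch under stated assumptions: the timing constants `T_ACT`, `T_RD`, and `T_PRE` are hypothetical values, not figures from the thesis, and the eager-close policy is assumed to finish its precharge off the critical path before the next request arrives.

```python
from typing import Iterable, Optional

# Hypothetical timings in memory-clock cycles (illustrative, not from the thesis).
T_ACT = 14  # activate (open) a row
T_RD = 14   # read a column from an open row
T_PRE = 14  # precharge (close) a row


def total_latency(rows: Iterable[int], close_after_access: bool) -> int:
    """Sum the critical-path latency of a request stream to one bank.

    close_after_access=True  : eager close; the precharge is assumed to be
                               hidden between requests, so every access
                               pays activate + read.
    close_after_access=False : keep the row open; hits pay only the read,
                               misses pay precharge + activate + read.
    """
    open_row: Optional[int] = None
    total = 0
    for row in rows:
        if open_row == row:
            total += T_RD                   # row-buffer hit
        elif open_row is None:
            total += T_ACT + T_RD           # bank is already precharged
        else:
            total += T_PRE + T_ACT + T_RD   # row-buffer miss
        open_row = None if close_after_access else row
    return total


if __name__ == "__main__":
    for name, stream in [("same row", [1, 1, 1, 1]), ("new row each time", [1, 2, 3, 4])]:
        print(name, total_latency(stream, False), total_latency(stream, True))
```

In this model, keeping the row open wins when requests revisit the same row and loses when every request targets a different row, which is why the controller has to guess which case comes next.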
In this thesis, we propose three novel DRAM bank organizations to reduce the average main-memory access latency. First, we introduce CHARM, a novel DRAM bank organization with center high aspect-ratio (i.e., low-latency) mats. Experiments on a simulated chip-multiprocessor system show that CHARM improves the instructions per cycle (IPC) and the system-wide energy-delay product by up to 21% and 32%, respectively, with only a 3% increase in die area. Second, we propose SALAD, a new DRAM device architecture that provides symmetric access latency with asymmetric DRAM bank organizations. SALAD leverages the asymmetric bank structure of CHARM in the opposite way: it applies high aspect-ratio mats only to remote banks to offset the difference in data transfer time, thus providing a uniformly low access time (tAC) over the whole device. Our evaluation demonstrates that SALAD improves the IPC by 13% (10%) without any software modifications, while incurring only 6% (3%) area overhead. Finally, we propose a novel DRAM microarchitecture that can eliminate the need for any prediction. By decoupling the bitlines from the row buffers using isolation transistors, the bitlines can be precharged right after a row is activated. Therefore, only the sense amplifiers need to be precharged on a miss in most cases. We also show that this row-buffer decoupling enables internal DRAM μ-operations to be separated and recombined, which memory controllers can exploit to make the main memory system more energy efficient. Our experiments demonstrate that row-buffer decoupling improves the geometric mean of the instructions per cycle and MIPS²/W by 14% and 29%, respectively, for memory-intensive SPEC CPU2006 applications.
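The effect of row-buffer decoupling on miss latency can be sketched with a simple average-latency calculation. All constants below are hypothetical and chosen only for illustration. In particular, `T_PRE_SA`, the time to precharge the sense amplifiers alone, is assumed to be shorter than a full precharge, and the model ignores the cases where the bitline precharge has not yet completed.

```python
# Hypothetical timings in memory-clock cycles (illustrative, not from the thesis).
T_ACT = 14       # activate a row
T_RD = 14        # read a column from an open row
T_PRE_FULL = 14  # precharge bitlines and sense amplifiers together
T_PRE_SA = 4     # precharge only the sense amplifiers (assumed shorter)


def miss_latency(decoupled: bool) -> int:
    """Latency of a row-buffer miss while another row is held in the buffer."""
    precharge = T_PRE_SA if decoupled else T_PRE_FULL
    return precharge + T_ACT + T_RD


def average_latency(hit_rate: float, decoupled: bool) -> float:
    """Expected access latency for a given row-buffer hit rate."""
    return hit_rate * T_RD + (1.0 - hit_rate) * miss_latency(decoupled)


if __name__ == "__main__":
    for decoupled in (False, True):
        print(decoupled, miss_latency(decoupled), average_latency(0.5, decoupled))
```

Because hits still cost only a read while misses skip most of the precharge, the open row no longer has to be closed speculatively in this model.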

Language: English

URI: https://hdl.handle.net/10371/122364

Files in This Item:

000000135917.pdf 1.10 MB

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Transdisciplinary Studies(융합과학부)
  - Theses (Ph.D. / Sc.D._융합과학부)
