Detailed Information

Architecting Main Memory Systems to Achieve Low Access Latency

DC Field / Value
dc.contributor.advisor: 안정호
dc.contributor.author: 손영훈
dc.date.accessioned: 2017-07-14T01:48:45Z
dc.date.available: 2017-07-14T01:48:45Z
dc.date.issued: 2016-08
dc.identifier.other: 000000135917
dc.identifier.uri: https://hdl.handle.net/10371/122364
dc.description: Thesis (Ph.D.) -- 서울대학교 융합과학기술대학원 : 융합과학부 지능형융합시스템전공, 2016. 8. 안정호.
dc.description.abstract:
DRAM has been a de facto standard for main memory, and advances in process technology have led to a rapid increase in its capacity and bandwidth. In contrast, its random access latency has remained relatively stagnant, still at around 100 CPU clock cycles. Modern computer systems rely on caches or other latency tolerance techniques to lower the average access latency. However, not all applications have ample parallelism or locality that would help hide or reduce the latency. Moreover, applications' demands for memory space continue to grow, while the capacity gap between last-level caches and main memory is unlikely to shrink. Consequently, reducing the main-memory latency is important for application performance. Unfortunately, previous proposals have not adequately addressed this problem, as they have focused only on improving bandwidth and capacity, or have reduced the latency only at the cost of significant area overhead.
Modern DRAM devices for the main memory are structured with multiple banks to satisfy ever-increasing throughput, energy-efficiency, and capacity demands. Due to tight cost constraints, only one row per bank can be buffered (opened) and actively service requests at a time, and the row must be deactivated (closed) before a new row can be stored into the row buffers. Hasty deactivation unnecessarily re-opens rows for what would otherwise be row-buffer hits, while belated deactivation places the deactivation process on the critical path of accessing data for row-buffer misses. Because the time to (de)activate a row is comparable to the time to read an open row, and applications are often sensitive to DRAM latency, it is critical to make the right decision on when to close a row. However, the increasing number of banks per DRAM device over generations reduces the number of requests per bank. This forces a memory controller to frequently predict when to close a row due to a lack of information on future requests, while the dynamic nature of memory access patterns limits the prediction accuracy.
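The tension described above between closing a row too early and too late can be made concrete with a small latency model. The following Python sketch is purely illustrative: the timing values (tRCD, tCL, tRP) and the two hand-made request traces are assumptions chosen for readability, not figures from the thesis. It tallies the total latency of a request stream under an open-page policy (keep the row open after an access) versus a closed-page policy (precharge immediately), showing why the best choice depends on whether future requests hit the open row.

```python
# Illustrative state-dependent DRAM latency model (values are assumptions,
# not taken from the thesis). All times in DRAM clock cycles:
#   tRCD = activate-to-read, tCL = read-to-data, tRP = precharge (deactivate).
tRCD, tCL, tRP = 14, 14, 14

def latency(open_row, row, policy):
    """Return (access latency in cycles, row left open afterwards)."""
    if policy == "open":                      # keep the row open after the access
        if open_row == row:                   # row-buffer hit
            return tCL, row
        if open_row is None:                  # bank already precharged
            return tRCD + tCL, row
        return tRP + tRCD + tCL, row          # row-buffer conflict: close, then open
    else:                                     # "closed": precharge right after each access
        return tRCD + tCL, None               # every access pays an activation

def total_latency(trace, policy):
    open_row, total = None, 0
    for row in trace:
        cycles, open_row = latency(open_row, row, policy)
        total += cycles
    return total

# A locality-rich trace favors the open-page policy; a random trace favors closing early.
local_trace  = [3, 3, 3, 3, 7, 7, 7]
random_trace = [3, 9, 1, 5, 2, 8, 4]
for name, trace in [("local", local_trace), ("random", random_trace)]:
    print(name, "open:", total_latency(trace, "open"),
          "closed:", total_latency(trace, "closed"))
```

On the locality-rich trace the open-page policy wins; on the random trace every conflict costs an extra precharge plus activation. That gap is exactly what a close-timing predictor must guess at, and what the row-buffer decoupling proposed later in this thesis removes by precharging the bitlines right after activation.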
In this thesis, we propose three novel DRAM bank organizations to reduce the average main-memory access latency. First, we introduce a novel DRAM bank organization with center high-aspect-ratio (i.e., low-latency) mats called CHARM. Experiments on a simulated chip-multiprocessor system show that CHARM improves both the instructions per cycle (IPC) and the system-wide energy-delay product by up to 21% and 32%, respectively, with only a 3% increase in die area. Second, we propose SALAD, a new DRAM device architecture that provides symmetric access latency with asymmetric DRAM bank organizations. SALAD leverages the asymmetric bank structure of CHARM in the opposite way: it applies high-aspect-ratio mats only to remote banks to offset the difference in data transfer time, thus providing a uniformly low access time (tAC) over the whole device. Our evaluation demonstrates that SALAD improves the IPC by 13% (10%) without any software modifications, while incurring only 6% (3%) area overhead. Finally, we propose a novel DRAM microarchitecture that can eliminate the need for any prediction. By decoupling the bitlines from the row buffers using isolation transistors, the bitlines can be precharged right after a row becomes activated. Therefore, only the sense amplifiers need to be precharged for a miss in most cases. We also show that this row-buffer decoupling enables internal DRAM μ-operations to be separated and recombined, which memory controllers can exploit to make the main memory system more energy efficient. Our experiments demonstrate that row-buffer decoupling improves the geometric means of the instructions per cycle and MIPS²/W by 14% and 29%, respectively, for memory-intensive SPEC CPU2006 applications.
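As a rough, back-of-the-envelope illustration of the asymmetric-bank idea in the paragraph above, the Python sketch below uses made-up access times (T_NORMAL, T_FAST) and an assumed fraction of accesses steered onto the low-latency mats; none of these numbers come from the thesis. It contrasts a CHARM-like device, whose average access time depends on how many hot pages the OS places in the fast center mats, with an idealized SALAD-like device, where the fast mats compensate the remote banks' longer data transfer so that tAC is uniformly low.

```python
# Hypothetical access times in nanoseconds; illustrative assumptions only,
# not figures from the thesis.
T_NORMAL = 30.0   # access time (tAC) of a conventional bank region
T_FAST   = 22.0   # access time of a high-aspect-ratio (low-latency) mat region

def charm_avg_latency(fast_fraction):
    """CHARM-like device: only accesses that the OS maps onto the fast center
    mats enjoy the low latency, so the average depends on page allocation."""
    return fast_fraction * T_FAST + (1.0 - fast_fraction) * T_NORMAL

def salad_avg_latency():
    """SALAD-like device (idealized): fast mats are applied only to remote banks
    to offset their longer data transfer time, so tAC is uniformly low."""
    return T_FAST

for f in (0.25, 0.50, 0.90):
    print(f"CHARM with {f:.0%} of accesses in fast mats: {charm_avg_latency(f):.1f} ns")
print(f"SALAD (uniform tAC): {salad_avg_latency():.1f} ns")
```

The comparison is only structural: CHARM's benefit grows with how well pages are allocated to the fast region, whereas SALAD trades a modest amount of extra area for an access time that no longer depends on which bank a page lands in.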
dc.description.tableofcontents:
Chapter 1. Introduction 1
1.1 Research Contributions 10
1.2 Outline 12

Chapter 2. Background 13
2.1 Modern DRAM Device Organization 13
2.2 How DRAM Devices Operate 17
2.3 The Impact of DRAM Timing Parameters on Memory Access Latency 19
2.4 State-Dependent DRAM Latency 20
2.5 Memory Access Scheduling and Page Management Challenges 22

Chapter 3. Reducing Memory Access Latency with Asymmetric DRAM Bank Organizations 27
3.1 Asymmetric DRAM Bank Organizations 27
3.1.1 DRAM Cycle and Access Time Analysis 28
3.1.2 Low-Latency Mats with High Aspect Ratios 29
3.1.3 Banks with a Non-Uniform Access Time 32
3.1.4 CHARM: Center High-Aspect-Ratio Mats 34
3.1.5 OS Page Allocation 37
3.2 Experimental Setup 40
3.3 Evaluation 44
3.3.1 Performance Impact on Single-Threaded Applications 44
3.3.2 Performance Impact on Multiprogrammed Workloads 49
3.3.3 Performance Impact on Multithreaded Workloads 52

Chapter 4. SALAD: Achieving Symmetric Access Latency with Asymmetric DRAM Architecture 54
4.1 Symmetric Access Latency with Asymmetric DRAM Architecture 54
4.2 Experimental Setup and SPICE Modeling 57
4.3 Evaluation 61

Chapter 5. Row-Buffer Decoupling: A Case for Low-Latency DRAM Microarchitecture 65
5.1 Row-Buffer Decoupling 65
5.1.1 Pertinent Details of DRAM Architecture to Explain Why Row Buffers are Coupled 66
5.1.2 DRAM Architecture with Decoupled Row Buffers 69
5.1.3 Scheduling Internal DRAM μ-operations 73
5.1.4 Quantifying the Row-Buffer Decoupling Overhead through SPICE Modeling 76
5.2 Experimental Setup 78
5.3 Evaluation 82
5.3.1 The Impact of Row-Buffer Decoupling on Single-Threaded Workloads 82
5.3.2 The Impact of Row-Buffer Decoupling on Multithreaded and Multiprogrammed Workloads 86
5.3.3 Sensitivity Studies 89

Chapter 6. Related Work 92
6.1 High-Performance DRAM Bank Structures 92
6.2 DRAM-Side Caching 93
6.3 3D Die Stacking 94
6.4 DRAM Module-Level Solutions 94
6.5 Memory Access Schedulers 95
6.6 DRAM Page-Management Policies 96

Chapter 7. Conclusion 98

Bibliography 103

Abstract (in Korean) 115
dc.format: application/pdf
dc.format.extent: 1149034 bytes
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: 서울대학교 융합과학기술대학원
dc.subject: memory system
dc.subject: DRAM
dc.subject: CHARM
dc.subject: SALAD
dc.subject: row-buffer decoupling
dc.subject: access latency
dc.subject: area overhead
dc.subject.ddc: 620
dc.title: Architecting Main Memory Systems to Achieve Low Access Latency
dc.type: Thesis
dc.description.degree: Doctor
dc.citation.pages: 117
dc.contributor.affiliation: 융합과학기술대학원 융합과학부
dc.date.awarded: 2016-08