
Performance Enhancement of Systems using Emerging Memory Technologies: 새로운 메모리 기술을 사용하는 시스템의 성능 향상

DC Field: Value

dc.contributor.advisor: 최기영
dc.contributor.author: 이동우
dc.date.accessioned: 2018-05-28T16:23:37Z
dc.date.available: 2018-05-28T16:23:37Z
dc.date.issued: 2018-02
dc.identifier.other: 000000150572
dc.identifier.uri: https://hdl.handle.net/10371/140693
dc.description: Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, 2018. 2. 최기영.
dc.description.abstract: Emerging memory technologies such as 3D-stacked memory and STT-RAM have higher density than traditional SRAM. As a result, these new memory technologies have recently been integrated with processors on the same chip or in the same package, providing the processors with far more capacity than traditional SRAMs can. Therefore, to improve the performance of the chip or package, it is important not only to speed up the processors themselves but also to manage these memories effectively.

This dissertation explores two approaches to improving the performance of systems in which processors and emerging memories are integrated on a single chip or in a single package. The first part focuses on a system in which 3D-stacked memory is integrated with the processor in a package, assuming a generic processor and no predefined memory access pattern. A DRAM cache technique is proposed that combines previous approaches synergistically by adding a module called a dirty-block tracker, which maintains the dirtiness of each block within a dirty region. The approach avoids unnecessary tag checking for a write operation when the corresponding block in the cache is not dirty. Simulation results show that the proposed technique achieves a significant average performance improvement over the state-of-the-art DRAM cache technique.
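The write-dispatch decision described above can be sketched in software. This is a minimal illustrative model, not the dissertation's actual hardware design: the `DirtyBlockTracker` and `dispatch_write` names, the 4 KB region / 64 B block sizes, and the returned policy strings are all assumptions made for the sketch.

```python
class DirtyBlockTracker:
    """Illustrative sketch: per-block dirty bits kept within each region.

    Region and block sizes are assumed values, not the thesis's parameters.
    """

    def __init__(self, region_size=4096, block_size=64):
        self.region_size = region_size
        self.block_size = block_size
        # region index -> bitmap with one dirty bit per block in the region
        self.dirty = {}

    def _index(self, addr):
        region = addr // self.region_size
        bit = (addr % self.region_size) // self.block_size
        return region, bit

    def mark_dirty(self, addr):
        region, bit = self._index(addr)
        self.dirty[region] = self.dirty.get(region, 0) | (1 << bit)

    def is_dirty(self, addr):
        region, bit = self._index(addr)
        return bool(self.dirty.get(region, 0) & (1 << bit))


def dispatch_write(tracker, addr):
    """Decide whether a write needs a DRAM-cache tag probe.

    If the block is dirty, the cache may hold the only up-to-date copy,
    so its tags must be checked; a clean block has no stale cached copy
    to reconcile, so the tag check can be skipped.
    """
    if tracker.is_dirty(addr):
        return "check-dram-cache-tags"
    return "skip-tag-check"
```

A write to an address never marked dirty thus bypasses the tag probe entirely, which is the source of the bandwidth savings the abstract refers to.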

The second part of this dissertation focuses on improving the performance of a system in which an accelerator and STT-RAM are integrated on a single chip and deep neural networks are processed on it. A high-performance, energy-efficient accelerator is designed around a characteristic of these networks: negative inputs to ReLU are discarded (ReLU outputs zero for them), yet computing them consumes a large share of the arithmetic work in a deep neural network. A computation pruning technique is proposed that detects at an early stage that a sum of products will be negative, by adopting an inverted two's complement representation for the weights and a bit-serial sum of products. The technique can therefore skip a large amount of computation for negative results and simply set the corresponding ReLU outputs to zero. Moreover, a DNN accelerator architecture is devised that applies the proposed technique efficiently. The evaluation shows that an accelerator using this computation pruning through early negative detection significantly improves both energy efficiency and performance.
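The early-negative-detection idea in the abstract can be sketched in software. Assuming the inputs are non-negative (they are outputs of a previous ReLU), re-encoding each N-bit weight as the ordinary two's complement bit pattern of its negation, interpreted with inverted signs, makes the sign-bit plane the only one that adds to the result; every later bit plane subtracts. Processed MSB-first, the partial sum can then only decrease, so the moment it goes negative the final result must be negative and the remaining planes can be skipped. The function names and the 8-bit width below are illustrative assumptions, not the dissertation's exact design.

```python
def to_inverted_twos_complement(w, nbits=8):
    """Encode w so that
    w == c[nbits-1] * 2**(nbits-1) - sum(c[i] * 2**i for i in range(nbits-1)).

    This is simply the two's complement bit pattern of -w, read with the
    sign of every contribution flipped. Returned LSB-first.
    """
    assert -(1 << (nbits - 1)) < w < (1 << (nbits - 1))
    pattern = (-w) & ((1 << nbits) - 1)
    return [(pattern >> i) & 1 for i in range(nbits)]


def relu_dot_with_early_exit(weights, inputs, nbits=8):
    """Bit-serial sum of products, MSB-first, with early negative detection.

    Returns (relu_output, bit_planes_processed). Inputs must be >= 0, so
    after the additive sign plane the partial sum decreases monotonically;
    a negative partial sum proves the final result is negative, and the
    ReLU output can be set to zero immediately.
    """
    bits = [to_inverted_twos_complement(w, nbits) for w in weights]
    partial = 0
    for plane in range(nbits - 1, -1, -1):        # sign plane first
        plane_sum = sum(b[plane] * x for b, x in zip(bits, inputs))
        if plane == nbits - 1:
            partial += plane_sum << plane          # only additive plane
        else:
            partial -= plane_sum << plane          # all others subtract
        if partial < 0:                            # early negative detection
            return 0, nbits - plane                # prune remaining planes
    return max(partial, 0), nbits
```

For example, a strongly negative product such as (-5) * 3 is detected before all eight bit planes are consumed, while non-negative results are computed in full and pass through ReLU unchanged.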
dc.description.tableofcontents:
1 Introduction 1
1.1 A DRAM Cache using 3D-stacked Memory 1
1.2 A Deep Neural Network Accelerator with STT-RAM 5
2 A DRAM Cache using 3D-stacked Memory 7
2.1 Background 7
2.1.1 Loh-Hill DRAM Cache 8
2.1.2 Alloy Cache 9
2.1.3 Mostly-Clean DRAM Cache 10
2.2 Direct-mapped DRAM Cache with Self-balancing Dispatch 12
2.2.1 A Naïve Approach 13
2.2.2 Dirty-Block Tracker (DiBT) 20
2.2.3 Sampling Hit-Miss Predictor 31
2.3 Evaluation Methodology 32
2.3.1 Experimental Setup 32
2.3.2 Workloads 33
2.4 Results 36
2.4.1 Performance 36
2.4.2 Analysis 38
2.4.3 Prediction Accuracy 42
2.4.4 Sensitivity of Sampling Hit-miss Predictor to VUPPER 43
2.4.5 Sensitivity to Dirty-Block Table Size 45
2.4.6 Scalability 46
2.4.7 Implementation Cost 46
2.5 Related Work 49
2.6 Summary 50
3 A Deep Neural Network Accelerator with STT-RAM 52
3.1 Background 52
3.1.1 Computations in CNNs 52
3.1.2 Sign Distribution of Inputs to ReLU 53
3.1.3 Two's Complement Representation 54
3.2 Early Negative Detection 55
3.2.1 Bit-serial Sum of Products 55
3.2.2 Inverted Two's Complement Representation 58
3.2.3 Early Negative Detection 58
3.3 Accelerator 60
3.3.1 Overall Architecture 61
3.3.2 Data block 62
3.3.3 Processing Unit 62
3.3.4 Buffers 65
3.3.5 Memory Controller 65
3.3.6 Providing Network 66
3.3.7 Pipelined Bit-serial Sum of Products 67
3.3.8 Global Controller 68
3.4 Evaluation 71
3.4.1 Methodology 72
3.4.2 Workloads 74
3.4.3 Normalized Runtime 77
3.4.4 Normalized Energy Consumption 80
3.4.5 Power Consumption 83
3.4.6 Normalized EDP and ED2P 85
3.4.7 Area 87
3.5 Related work 87
3.6 Summary 89
4 Conclusion 91
Abstract (In Korean) 100
dc.format: application/pdf
dc.format.extent: 1688389 bytes
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: Seoul National University Graduate School (서울대학교 대학원)
dc.subject: DRAM Cache
dc.subject: 3D-stacked Memory
dc.subject: Dirty-block Tracker
dc.subject: Deep Neural Network Accelerator
dc.subject: Early Negative Detection
dc.subject: STT-RAM
dc.subject.ddc: 621.3
dc.title: Performance Enhancement of Systems using Emerging Memory Technologies
dc.title.alternative: 새로운 메모리 기술을 사용하는 시스템의 성능 향상
dc.type: Thesis
dc.description.degree: Doctor
dc.contributor.affiliation: Department of Electrical and Computer Engineering, College of Engineering (공과대학 전기·컴퓨터공학부)
dc.date.awarded: 2018-02
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
