
Performance Enhancement of Systems using Emerging Memory Technologies: 새로운 메모리 기술을 사용하는 시스템의 성능 향상

DC Field: Value

dc.contributor.advisor: 최기영
dc.contributor.author: 이동우
dc.date.accessioned: 2018-05-28T16:23:37Z
dc.date.available: 2018-05-28T16:23:37Z
dc.date.issued: 2018-02
dc.identifier.other: 000000150572
dc.identifier.uri: https://hdl.handle.net/10371/140693
dc.description: Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, 2018. 2. 최기영.
dc.description.abstract: Emerging memory technologies such as 3D-stacked memory and STT-RAM have higher density than traditional SRAM. As a result, these new memory technologies have recently been integrated with processors on the same chip or in the same package, providing the processors with far more capacity than traditional SRAMs can. Therefore, to improve the performance of the chip or package, it is important not only to speed up the processors themselves but also to manage these memories effectively.

This dissertation explores two approaches to improving the performance of systems in which processors and emerging memories are integrated on a single chip or in a single package. The first part focuses on a system in which 3D-stacked memory is integrated with the processor in a package, assuming a generic processor and no predefined memory access pattern. A DRAM cache technique is proposed that combines previous approaches synergistically by adding a module called a dirty-block tracker, which maintains the dirtiness of each block within a dirty region. The approach avoids unnecessary tag checking for a write operation when the corresponding block in the cache is not dirty. Simulation results show that the proposed technique achieves a significant average performance improvement over the state-of-the-art DRAM cache technique.
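The write-dispatch decision described above can be sketched in software. This is a minimal illustrative model, not the dissertation's actual hardware design: the `DirtyBlockTracker` and `dispatch_write` names, the 4 KB region / 64 B block sizes, and the returned policy strings are all assumptions made for the sketch.

```python
class DirtyBlockTracker:
    """Illustrative sketch: per-block dirty bits kept within each region.

    Region and block sizes are assumed values, not the thesis's parameters.
    """

    def __init__(self, region_size=4096, block_size=64):
        self.region_size = region_size
        self.block_size = block_size
        # region index -> bitmap with one dirty bit per block in the region
        self.dirty = {}

    def _index(self, addr):
        region = addr // self.region_size
        bit = (addr % self.region_size) // self.block_size
        return region, bit

    def mark_dirty(self, addr):
        region, bit = self._index(addr)
        self.dirty[region] = self.dirty.get(region, 0) | (1 << bit)

    def is_dirty(self, addr):
        region, bit = self._index(addr)
        return bool(self.dirty.get(region, 0) & (1 << bit))


def dispatch_write(tracker, addr):
    """Decide whether a write needs a DRAM-cache tag probe.

    If the block is dirty, the cache may hold the only up-to-date copy,
    so its tags must be checked; a clean block has no stale cached copy
    to reconcile, so the tag check can be skipped.
    """
    if tracker.is_dirty(addr):
        return "check-dram-cache-tags"
    return "skip-tag-check"
```

A write to an address never marked dirty thus bypasses the tag probe entirely, which is the source of the bandwidth savings the abstract refers to.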

The second part of this dissertation focuses on improving the performance of a system in which an accelerator and STT-RAM are integrated on a single chip and deep neural networks are processed on it. A high-performance, energy-efficient accelerator is designed around a characteristic of these networks: negative inputs to ReLU are discarded (ReLU outputs zero for them), yet computing them consumes a large share of the arithmetic work in a deep neural network. A computation pruning technique is proposed that detects at an early stage that a sum of products will be negative, by adopting an inverted two's complement representation for the weights and a bit-serial sum of products. The technique can therefore skip a large amount of computation for negative results and simply set the corresponding ReLU outputs to zero. Moreover, a DNN accelerator architecture is devised that applies the proposed technique efficiently. The evaluation shows that an accelerator using this computation pruning through early negative detection significantly improves both energy efficiency and performance.
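The early-negative-detection idea in the abstract can be sketched in software. Assuming the inputs are non-negative (they are outputs of a previous ReLU), re-encoding each N-bit weight as the ordinary two's complement bit pattern of its negation, interpreted with inverted signs, makes the sign-bit plane the only one that adds to the result; every later bit plane subtracts. Processed MSB-first, the partial sum can then only decrease, so the moment it goes negative the final result must be negative and the remaining planes can be skipped. The function names and the 8-bit width below are illustrative assumptions, not the dissertation's exact design.

```python
def to_inverted_twos_complement(w, nbits=8):
    """Encode w so that
    w == c[nbits-1] * 2**(nbits-1) - sum(c[i] * 2**i for i in range(nbits-1)).

    This is simply the two's complement bit pattern of -w, read with the
    sign of every contribution flipped. Returned LSB-first.
    """
    assert -(1 << (nbits - 1)) < w < (1 << (nbits - 1))
    pattern = (-w) & ((1 << nbits) - 1)
    return [(pattern >> i) & 1 for i in range(nbits)]


def relu_dot_with_early_exit(weights, inputs, nbits=8):
    """Bit-serial sum of products, MSB-first, with early negative detection.

    Returns (relu_output, bit_planes_processed). Inputs must be >= 0, so
    after the additive sign plane the partial sum decreases monotonically;
    a negative partial sum proves the final result is negative, and the
    ReLU output can be set to zero immediately.
    """
    bits = [to_inverted_twos_complement(w, nbits) for w in weights]
    partial = 0
    for plane in range(nbits - 1, -1, -1):        # sign plane first
        plane_sum = sum(b[plane] * x for b, x in zip(bits, inputs))
        if plane == nbits - 1:
            partial += plane_sum << plane          # only additive plane
        else:
            partial -= plane_sum << plane          # all others subtract
        if partial < 0:                            # early negative detection
            return 0, nbits - plane                # prune remaining planes
    return max(partial, 0), nbits
```

For example, a strongly negative product such as (-5) * 3 is detected before all eight bit planes are consumed, while non-negative results are computed in full and pass through ReLU unchanged.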
dc.description.tableofcontents:
1 Introduction 1
1.1 A DRAM Cache using 3D-stacked Memory 1
1.2 A Deep Neural Network Accelerator with STT-RAM 5
2 A DRAM Cache using 3D-stacked Memory 7
2.1 Background 7
2.1.1 Loh-Hill DRAM Cache 8
2.1.2 Alloy Cache 9
2.1.3 Mostly-Clean DRAM Cache 10
2.2 Direct-mapped DRAM Cache with Self-balancing Dispatch 12
2.2.1 A Naïve Approach 13
2.2.2 Dirty-Block Tracker (DiBT) 20
2.2.3 Sampling Hit-Miss Predictor 31
2.3 Evaluation Methodology 32
2.3.1 Experimental Setup 32
2.3.2 Workloads 33
2.4 Results 36
2.4.1 Performance 36
2.4.2 Analysis 38
2.4.3 Prediction Accuracy 42
2.4.4 Sensitivity of Sampling Hit-miss Predictor to VUPPER 43
2.4.5 Sensitivity to Dirty-Block Table Size 45
2.4.6 Scalability 46
2.4.7 Implementation Cost 46
2.5 Related Work 49
2.6 Summary 50
3 A Deep Neural Network Accelerator with STT-RAM 52
3.1 Background 52
3.1.1 Computations in CNNs 52
3.1.2 Sign Distribution of Inputs to ReLU 53
3.1.3 Two's Complement Representation 54
3.2 Early Negative Detection 55
3.2.1 Bit-serial Sum of Products 55
3.2.2 Inverted Two's Complement Representation 58
3.2.3 Early Negative Detection 58
3.3 Accelerator 60
3.3.1 Overall Architecture 61
3.3.2 Data block 62
3.3.3 Processing Unit 62
3.3.4 Buffers 65
3.3.5 Memory Controller 65
3.3.6 Providing Network 66
3.3.7 Pipelined Bit-serial Sum of Products 67
3.3.8 Global Controller 68
3.4 Evaluation 71
3.4.1 Methodology 72
3.4.2 Workloads 74
3.4.3 Normalized Runtime 77
3.4.4 Normalized Energy Consumption 80
3.4.5 Power Consumption 83
3.4.6 Normalized EDP and ED2P 85
3.4.7 Area 87
3.5 Related work 87
3.6 Summary 89
4 Conclusion 91
Abstract (In Korean) 100
dc.format: application/pdf
dc.format.extent: 1688389 bytes
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: Seoul National University Graduate School (서울대학교 대학원)
dc.subject: DRAM Cache
dc.subject: 3D-stacked Memory
dc.subject: Dirty-block Tracker
dc.subject: Deep Neural Network Accelerator
dc.subject: Early Negative Detection
dc.subject: STT-RAM
dc.subject.ddc: 621.3
dc.title: Performance Enhancement of Systems using Emerging Memory Technologies
dc.title.alternative: 새로운 메모리 기술을 사용하는 시스템의 성능 향상
dc.type: Thesis
dc.description.degree: Doctor
dc.contributor.affiliation: Department of Electrical and Computer Engineering, College of Engineering (공과대학 전기·컴퓨터공학부)
dc.date.awarded: 2018-02
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
