
Performance Enhancement of Systems using Emerging Memory Technologies

Authors

이동우

Advisor
최기영
Major
Department of Electrical and Computer Engineering, College of Engineering
Issue Date
2018-02
Publisher
Graduate School, Seoul National University
Keywords
DRAM Cache; 3D-stacked Memory; Dirty-block Tracker; Deep Neural Network Accelerator; Early Negative Detection; STT-RAM
Description
Thesis (Ph.D.) -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, College of Engineering, February 2018. Advisor: 최기영.
Abstract
Emerging memory technologies such as 3D-stacked memory and STT-RAM offer higher density than traditional SRAM. As a result, these memories have recently been integrated with processors on the same chip or in the same package, giving the processors far more capacity than traditional SRAM can provide. To improve the performance of such a chip or package, it is therefore important not only to improve the processors themselves but also to manage these memories effectively.

This dissertation studies two approaches to improving the performance of systems in which processors and emerging memories are integrated on a single chip or in a single package. The first part focuses on a system in which 3D-stacked memory is integrated with the processor in a package, assuming a general-purpose processor whose memory access pattern is not known in advance. A DRAM cache technique is proposed that combines previous approaches in a synergistic way by introducing a module called the dirty-block tracker, which maintains the dirtiness of each block within a dirty region. With this information, a write operation can skip the tag check when the corresponding block in the cache is not dirty. Simulation results show that the proposed technique achieves a significant average performance improvement over the state-of-the-art DRAM cache technique.
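The abstract leaves the tracker's bookkeeping implicit; the following is a minimal sketch of one way to realize it, assuming a per-region dirty-bit vector, a 64-byte block, and 64-block regions (all illustrative choices, not the dissertation's actual parameters):

```python
# A minimal sketch of the dirty-block-tracker idea. Sizes, names, and the
# flat bit-vector organization are illustrative assumptions, not the
# dissertation's actual design.

BLOCK_SIZE = 64      # bytes per cache block (assumed)
REGION_BLOCKS = 64   # blocks per tracked dirty region (assumed)

class DirtyBlockTracker:
    """Per-region bit vectors recording which cached blocks are dirty."""

    def __init__(self):
        self.regions = {}  # region id -> bit vector of dirty blocks

    def _locate(self, addr):
        block = addr // BLOCK_SIZE
        return block // REGION_BLOCKS, block % REGION_BLOCKS

    def mark_dirty(self, addr):
        region, offset = self._locate(addr)
        self.regions[region] = self.regions.get(region, 0) | (1 << offset)

    def is_dirty(self, addr):
        region, offset = self._locate(addr)
        return bool((self.regions.get(region, 0) >> offset) & 1)

def write_needs_tag_check(tracker, addr):
    # The pruning described in the text: if the tracker proves the block
    # is clean, the write can proceed without first reading the DRAM-cache
    # tags; only writes to dirty blocks pay for the tag check.
    return tracker.is_dirty(addr)

# Example: the first write to a block finds it clean and skips the tag
# check; once the block is marked dirty, later writes must check tags.
dbt = DirtyBlockTracker()
assert not write_needs_tag_check(dbt, 0x1040)
dbt.mark_dirty(0x1040)
assert write_needs_tag_check(dbt, 0x1040)
```

The point of the structure is that the common case, a write to a clean block, is resolved by a small on-chip lookup instead of a tag read in the DRAM cache.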

The second part of this dissertation focuses on improving the performance of a system in which an accelerator and STT-RAM are integrated on a single chip, assuming that the workloads are deep neural networks. A high-performance, energy-efficient accelerator is designed around the characteristics of these networks. ReLU discards negative inputs, yet computing those negative values consumes a large share of a deep neural network's arithmetic. A computation pruning technique is proposed that detects at an early stage that a sum of products will be negative, by adopting an inverted two's-complement representation for the weights and computing the sum of products bit-serially. A large amount of computation for negative results can thus be skipped, and the corresponding ReLU outputs simply set to zero. Moreover, a DNN accelerator architecture is devised that applies the proposed technique efficiently. The evaluation shows that an accelerator using this computation pruning through early negative detection significantly improves both energy efficiency and performance.
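The early-negative-detection idea can be modeled in a few lines. The sketch below assumes 8-bit two's-complement weights, non-negative (post-ReLU) activations, and MSB-first bit-serial processing; the function name and bit width are illustrative, and this is a behavioral reconstruction rather than the dissertation's exact hardware algorithm:

```python
# A minimal model of computation pruning through early negative detection,
# under the stated assumptions (8-bit weights, non-negative activations).

NBITS = 8

def relu_dot_with_pruning(x, w):
    """Return max(0, dot(x, w)), stopping early once the result must be negative."""
    # Inverted two's complement: with c = ~w (mod 2^NBITS),
    #   w = -1 + c[MSB] * 2^(NBITS-1) - sum_{i < MSB} c[i] * 2^i,
    # so after the MSB plane every remaining bit plane only subtracts.
    c = [(~wi) & ((1 << NBITS) - 1) for wi in w]

    partial = -sum(x)  # the constant -1 term, scaled by each activation

    for k in range(NBITS - 1, -1, -1):
        plane = sum(xi for xi, ci in zip(x, c) if (ci >> k) & 1)
        partial += (plane << k) if k == NBITS - 1 else -(plane << k)
        # From here on the partial sum is non-increasing: once it goes
        # negative, the final sum is negative, so ReLU outputs zero and
        # the remaining bit planes can be skipped.
        if partial < 0:
            return 0
    return partial  # non-negative; equals the exact dot product

# Example: dot([1, 2], [3, -5]) = -7 is detected as negative after only a
# few bit planes; dot([1, 1], [2, 3]) = 5 is computed exactly.
assert relu_dot_with_pruning([1, 2], [3, -5]) == 0
assert relu_dot_with_pruning([1, 1], [2, 3]) == 5
```

Because activations are non-negative and, in the inverted representation, every bit plane after the most significant one only subtracts from the running sum, the first moment the partial sum dips below zero is a safe point to stop.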
Language
English
URI
https://hdl.handle.net/10371/140693