McDRAM: Low Latency and Energy-Efficient Matrix Computation in DRAM

Authors

신현승

Advisor
유승주
Major
Department of Computer Science and Engineering, College of Engineering
Issue Date
2018-02
Publisher
Graduate School, Seoul National University
Keywords
Neural Network; DRAM; RNN; LSTM; MLP; MAC; HBM2
Description
Thesis (Master's) -- Graduate School, Seoul National University: Department of Computer Science and Engineering, College of Engineering, February 2018. Advisor: 유승주.
Abstract
Neural networks are characterized by massively parallel computation and high memory bandwidth demand. In particular, memory bandwidth severely limits performance and increases power consumption. To overcome the memory bottleneck of neural network applications, we propose a novel memory architecture called McDRAM, in which DRAM dies are equipped with a large number of multiplier-accumulator (MAC) units to perform neural network computation internally. Each DRAM bank has as many MACs as the size of its memory prefetch, thereby fully utilizing the internal bandwidth of DRAM, which is far larger than the external memory bandwidth. McDRAM broadcasts data efficiently to all banks without any modification of the DRAM data bus, and it performs MAC operations in all banks with a single DRAM command. McDRAM is implemented on top of a state-of-the-art commercial memory architecture, HBM2, and integrates thousands of MACs (up to 6,144 in HBM2) in a single DRAM package. According to our experiments with in-house memory models built on a commercial JEDEC HBM2 simulator, McDRAM achieves 18.68x better TOPS/W than a state-of-the-art hardware accelerator (Google TPU) on LSTM workloads.
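The abstract's central argument can be illustrated with back-of-the-envelope arithmetic: the total MAC count scales with the number of banks, and the aggregate internal prefetch width dwarfs the external pin bandwidth. The sketch below is illustrative only; the channel, bank, prefetch, and per-bank MAC figures are assumed HBM2-like parameters chosen so the total matches the abstract's 6,144 figure, and are not taken from the thesis itself.

```python
# Hedged sketch of the McDRAM scaling argument (assumed parameters, not
# from the thesis). The key idea: every bank feeds its prefetched data
# into local MAC units, so compute throughput scales with bank count.

CHANNELS = 8            # an HBM2 stack exposes 8 independent channels
BANKS_PER_CHANNEL = 16  # assumed banks per channel
MACS_PER_BANK = 48      # assumed value chosen so the total matches 6,144

total_macs = CHANNELS * BANKS_PER_CHANNEL * MACS_PER_BANK
print(total_macs)  # 6144, the package-level MAC count cited in the abstract

# Why internal bandwidth wins: all banks stream to their local MACs in
# parallel, while the external bus serializes transfers over fixed pins.
EXTERNAL_BUS_BITS = 1024                 # HBM2 stack-wide external data width
PREFETCH_BITS_PER_BANK = 256             # assumed per-bank prefetch width
internal_bits = CHANNELS * BANKS_PER_CHANNEL * PREFETCH_BITS_PER_BANK
print(internal_bits // EXTERNAL_BUS_BITS)  # 32x wider internally, under these assumptions
```

Under these assumed numbers, in-DRAM MACs see roughly 32x the data width available at the package pins, which is the bandwidth gap McDRAM is designed to exploit.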
Language
English
URI
https://hdl.handle.net/10371/141557