Unlocking Wordline-level Parallelism for Fast Inference on RRAM-based DNN Accelerator

DC Field / Value
dc.contributor.author: Park, Yeonhong
dc.contributor.author: Lee, Seung Yul
dc.contributor.author: Shin, Hoon
dc.contributor.author: Heo, Jun
dc.contributor.author: Ham, Tae Jun
dc.contributor.author: Lee, Jae Wook
dc.date.accessioned: 2022-10-17T03:51:51Z
dc.date.available: 2022-10-17T03:51:51Z
dc.date.created: 2022-06-07
dc.date.issued: 2020-11
dc.identifier.citation: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, p. 103
dc.identifier.issn: 1092-3152
dc.identifier.uri: https://hdl.handle.net/10371/186111
dc.description.abstract: © 2020 Association for Computing Machinery. In-memory computing is rapidly rising as a viable solution that can effectively accelerate neural networks by overcoming the memory wall. The Resistive RAM (RRAM) crossbar array is in the spotlight as a building block for DNN inference accelerators since it can perform a massive amount of dot products in memory in an area- and power-efficient manner. However, its in-memory computation is vulnerable to errors due to the non-ideality of RRAM cells. This error-prone nature of the RRAM crossbar limits its wordline-level parallelism: activating a large number of wordlines accumulates non-zero current contributions from RRAM cells in the high-resistance state as well as current deviations from individual cells, leading to a significant accuracy drop. To improve performance by increasing the maximum number of concurrently activated wordlines, we propose two techniques. First, we introduce a lightweight scheme that effectively eliminates the current contributions from high-resistance state cells. Second, based on the observation that not all layers in a neural network model have the same error rates and impact on the inference accuracy, we propose to allow different layers to activate non-uniform numbers of wordlines concurrently. We also introduce a systematic methodology to determine the number of concurrently activated wordlines for each layer with the goal of optimizing performance while minimizing the accuracy degradation. Our proposed techniques increase the inference throughput by 3-10× with a less than 1% accuracy drop over three datasets. Our evaluation also demonstrates that this benefit comes at a small cost of only 8.2% and 5.3% increases in area and power consumption, respectively.
dc.language: English
dc.publisher: ICCAD
dc.title: Unlocking Wordline-level Parallelism for Fast Inference on RRAM-based DNN Accelerator
dc.type: Article
dc.identifier.doi: 10.1145/3400302.3415664
dc.citation.journaltitle: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers
dc.identifier.wosid: 000671087100045
dc.identifier.scopusid: 2-s2.0-85097956354
dc.citation.startpage: 103
dc.description.isOpenAccess: N
dc.contributor.affiliatedAuthor: Lee, Jae Wook
dc.type.docType: Conference Paper
dc.description.journalClass: 1
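
To make the wordline-level parallelism trade-off described in the abstract concrete, the following is a minimal, illustrative sketch in Python, not the authors' implementation. It assumes binary inputs and weights, a small residual (leakage) conductance for high-resistance-state (HRS) cells, Gaussian per-cell current deviations, and an ADC that rounds each partial sum to the nearest integer; all constants and function names are hypothetical. Activating more wordlines per sensing step then requires fewer ADC conversions (higher throughput) but accumulates more leakage and deviation within each conversion (more error).

# Illustrative sketch only -- not the authors' method. Models a single bitline
# dot product on an RRAM crossbar where only a group of wordlines is activated
# per sensing step. Assumed parameters: HRS_LEAKAGE, CELL_NOISE.
import numpy as np

HRS_LEAKAGE = 0.02   # assumed residual conductance of a high-resistance cell
CELL_NOISE = 0.05    # assumed std. dev. of per-cell current deviation

def crossbar_dot_product(inputs, weights, wordlines_per_step, rng):
    """Compute one column's dot product in groups of concurrently activated wordlines."""
    n = len(inputs)
    total, steps = 0, 0
    for start in range(0, n, wordlines_per_step):
        end = min(start + wordlines_per_step, n)
        x = inputs[start:end]
        w = weights[start:end]
        # Low-resistance cells contribute ~1 unit of current, high-resistance
        # cells leak a little, and every cell adds a small random deviation.
        conductance = np.where(w == 1, 1.0, HRS_LEAKAGE)
        conductance = conductance + rng.normal(0.0, CELL_NOISE, end - start)
        analog_sum = float(x @ conductance)   # current summed on the bitline
        total += int(round(analog_sum))       # ADC digitizes each partial sum
        steps += 1                            # one conversion per wordline group
    return total, steps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 256
    inputs = rng.integers(0, 2, n)
    weights = rng.integers(0, 2, n)
    exact = int(inputs @ weights)
    for group in (8, 32, 256):
        result, steps = crossbar_dot_product(inputs, weights, group, rng)
        print(f"{group:3d} wordlines/step: {steps:3d} ADC conversions, error = {abs(result - exact)}")

Running this with increasing group sizes shows fewer conversions but a growing error, since the leakage from HRS cells and the per-cell deviations accumulate within each conversion. The paper's two techniques target exactly this error source: canceling the HRS cell contributions, and choosing a per-layer number of concurrently activated wordlines instead of a single uniform limit.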