Unlocking Wordline-level Parallelism for Fast Inference on RRAM-based DNN Accelerator
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Park, Yeonhong | - |
dc.contributor.author | Lee, Seung Yul | - |
dc.contributor.author | Shin, Hoon | - |
dc.contributor.author | Heo, Jun | - |
dc.contributor.author | Ham, Tae Jun | - |
dc.contributor.author | Lee, Jae Wook | - |
dc.date.accessioned | 2022-10-17T03:51:51Z | - |
dc.date.available | 2022-10-17T03:51:51Z | - |
dc.date.created | 2022-06-07 | - |
dc.date.issued | 2020-11 | - |
dc.identifier.citation | IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, p. 103 | - |
dc.identifier.issn | 1092-3152 | - |
dc.identifier.uri | https://hdl.handle.net/10371/186111 | - |
dc.description.abstract | © 2020 Association for Computing Machinery. In-memory computing is rapidly rising as a viable solution that can effectively accelerate neural networks by overcoming the memory wall. The Resistive RAM (RRAM) crossbar array is in the spotlight as a building block for DNN inference accelerators since it can perform a massive amount of dot products in memory in an area- and power-efficient manner. However, its in-memory computation is vulnerable to errors due to the non-ideality of RRAM cells. This error-prone nature of the RRAM crossbar limits its wordline-level parallelism, as activating a large number of wordlines accumulates non-zero current contributions from RRAM cells in the high-resistance state as well as current deviations from individual cells, leading to a significant accuracy drop. To improve performance by increasing the maximum number of concurrently activated wordlines, we propose two techniques. First, we introduce a lightweight scheme that effectively eliminates the current contributions from high-resistance state cells. Second, based on the observation that not all layers in a neural network model have the same error rates and impact on the inference accuracy, we propose to allow different layers to activate non-uniform numbers of wordlines concurrently. We also introduce a systematic methodology to determine the number of concurrently activated wordlines for each layer, with the goal of optimizing performance while minimizing the accuracy degradation. Our proposed techniques increase the inference throughput by 3-10× with a less than 1% accuracy drop across three datasets. Our evaluation also demonstrates that this benefit comes at a small cost of only 8.2% and 5.3% increases in area and power consumption, respectively. | - |
dc.language | English | - |
dc.publisher | ICCAD | - |
dc.title | Unlocking Wordline-level Parallelism for Fast Inference on RRAM-based DNN Accelerator | - |
dc.type | Article | - |
dc.identifier.doi | 10.1145/3400302.3415664 | - |
dc.citation.journaltitle | IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers | - |
dc.identifier.wosid | 000671087100045 | - |
dc.identifier.scopusid | 2-s2.0-85097956354 | - |
dc.citation.startpage | 103 | - |
dc.description.isOpenAccess | N | - |
dc.contributor.affiliatedAuthor | Lee, Jae Wook | - |
dc.type.docType | Conference Paper | - |
dc.description.journalClass | 1 | - |
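
The abstract's core observation is that activating more wordlines at once accumulates both leakage current from high-resistance-state (HRS) cells and per-cell current deviation, so the analog dot product drifts from its ideal value. Below is a minimal Monte Carlo sketch of that effect for a single crossbar column with 1-bit weights and an all-ones input slice; the conductance values, deviation sigma, and wordline counts are illustrative assumptions, not device data from the paper.

```python
# Sketch: error accumulation in an analog crossbar-column dot product
# as the number of concurrently activated wordlines grows.
# G_LRS/G_HRS and SIGMA are assumed, normalized values for illustration.

import random

random.seed(42)

G_LRS, G_HRS = 1.0, 0.05   # assumed low/high-resistance conductances (normalized)
SIGMA = 0.03               # assumed per-cell current deviation (std. dev.)

def column_dot_product(bits):
    """Ideal vs. noisy 1-bit dot product over the activated wordlines.
    HRS cells (bit = 0) still leak a small current, and every cell's
    current deviates randomly from its nominal value."""
    ideal = sum(bits)
    noisy = sum((G_LRS if b else G_HRS) + random.gauss(0.0, SIGMA) for b in bits)
    return ideal, noisy

for n in [4, 8, 16, 32, 64, 128]:
    errs = []
    for _ in range(1000):
        bits = [random.random() < 0.5 for _ in range(n)]
        ideal, noisy = column_dot_product(bits)
        errs.append(abs(noisy - ideal))
    print(f"{n:4d} wordlines -> mean |error| = {sum(errs) / len(errs):.2f}")
```

Running this shows the absolute error growing with the wordline count, which is the effect that caps parallelism in the first place.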
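The paper's second idea is a systematic, per-layer search for the number of concurrently activated wordlines. As a rough illustration of such a methodology, the sketch below greedily raises one layer's wordline budget at a time while a toy linear accuracy model stays within a 1% total drop; `estimate_accuracy`, `LAYER_SENSITIVITY`, and all constants are hypothetical stand-ins for the authors' actual per-layer profiling, not their implementation.

```python
# Sketch: greedy per-layer assignment of concurrent-wordline budgets
# under an accuracy-drop constraint (assumed threshold: 1%, as in the paper).

BASELINE_ACC = 0.91                     # assumed baseline validation accuracy
MAX_DROP = 0.01                         # target: < 1% accuracy loss
WL_CHOICES = [4, 8, 16, 32, 64, 128]    # assumed candidate wordline counts
LAYER_SENSITIVITY = [0.004, 0.003, 0.002, 0.001]  # assumed per-layer error cost

def estimate_accuracy(wl_per_layer):
    """Toy stand-in for validation inference at a given per-layer
    parallelism: each layer's accuracy cost grows with its wordline count,
    scaled by that layer's sensitivity."""
    drop = sum(s * (wl / WL_CHOICES[0] - 1)
               for s, wl in zip(LAYER_SENSITIVITY, wl_per_layer))
    return BASELINE_ACC - drop

def assign_wordlines():
    """Start every layer at the most conservative setting, then repeatedly
    apply the cheapest single-layer bump that keeps the total drop in budget."""
    config = [WL_CHOICES[0]] * len(LAYER_SENSITIVITY)
    while True:
        best = None
        for i, wl in enumerate(config):
            idx = WL_CHOICES.index(wl)
            if idx + 1 == len(WL_CHOICES):
                continue                  # layer already at max parallelism
            trial = list(config)
            trial[i] = WL_CHOICES[idx + 1]
            acc = estimate_accuracy(trial)
            if BASELINE_ACC - acc <= MAX_DROP and (best is None or acc > best[0]):
                best = (acc, trial)
        if best is None:
            return config                 # no feasible bump remains
        config = best[1]

cfg = assign_wordlines()
print("wordlines per layer:", cfg)
print(f"estimated accuracy: {estimate_accuracy(cfg):.4f} (baseline {BASELINE_ACC})")
```

Sensitive early layers end up with small budgets and tolerant later layers with large ones, which is the non-uniform, per-layer activation policy the abstract describes.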
- Files in This Item: There are no files associated with this item.