Publications

Detailed Information

ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks

DC Field / Value
dc.contributor.author: Ham, Tae Jun
dc.contributor.author: Lee, Yejin
dc.contributor.author: Seo, Seong Hoon
dc.contributor.author: Kim, Soosung
dc.contributor.author: Choi, Hyunji
dc.contributor.author: Jung, Sung Jun
dc.contributor.author: Lee, Jae Wook
dc.date.accessioned: 2022-06-24T00:25:56Z
dc.date.available: 2022-06-24T00:25:56Z
dc.date.created: 2022-05-09
dc.date.issued: 2021-06
dc.identifier.citation: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA, Vol.2021-June, pp.692-705
dc.identifier.issn: 1063-6897
dc.identifier.uri: https://hdl.handle.net/10371/183738
dc.description.abstract: © 2021 IEEE. The self-attention mechanism is rapidly emerging as one of the most important key primitives in neural networks (NNs) for its ability to identify the relations within input entities. The self-attention-oriented NN models such as Google Transformer and its variants have established the state-of-the-art on a very wide range of natural language processing tasks, and many other self-attention-oriented models are achieving competitive results in computer vision and recommender systems as well. Unfortunately, despite its great benefits, the self-attention mechanism is an expensive operation whose cost increases quadratically with the number of input entities that it processes, and thus accounts for a significant portion of the inference runtime. Thus, this paper presents ELSA (Efficient, Lightweight Self-Attention), a hardware-software co-designed solution to substantially reduce the runtime as well as energy spent on the self-attention mechanism. Specifically, based on the intuition that not all relations are equal, we devise a novel approximation scheme that significantly reduces the amount of computation by efficiently filtering out relations that are unlikely to affect the final output. With the specialized hardware for this approximate self-attention mechanism, ELSA achieves a geomean speedup of 58.1× as well as over three orders of magnitude improvements in energy efficiency compared to GPU on self-attention computation in modern NN models while maintaining less than 1% loss in the accuracy metric.
dc.language: English
dc.publisher: IEEE
dc.title: ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks
dc.type: Article
dc.identifier.doi: 10.1109/ISCA52012.2021.00060
dc.citation.journaltitle: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
dc.identifier.wosid: 000702275600051
dc.identifier.scopusid: 2-s2.0-85114693285
dc.citation.endpage: 705
dc.citation.startpage: 692
dc.citation.volume: 2021-June
dc.description.isOpenAccess: N
dc.contributor.affiliatedAuthor: Lee, Jae Wook
dc.type.docType: Conference Paper
dc.description.journalClass: 1
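
The abstract above describes ELSA's core idea: skip the relations (query-key pairs) that are unlikely to influence the final output, and spend compute only on the rest. The snippet below is a minimal NumPy sketch of that relation-filtering idea, assuming a simple per-query top-k cut as a stand-in for ELSA's actual hardware-friendly approximation (the paper's scheme estimates similarity cheaply in specialized hardware rather than computing exact scores first); the function name approximate_self_attention and the keep_ratio parameter are illustrative, not from the paper.

```python
import numpy as np

def approximate_self_attention(Q, K, V, keep_ratio=0.25):
    """Toy illustration of relation filtering in self-attention.

    For each query, only the top keep_ratio fraction of key relations is
    kept; the rest are masked out before the softmax. Note that this sketch
    still computes every exact score to decide what to keep, whereas ELSA's
    point is to make that decision cheaply with approximate, hardware-friendly
    similarity estimates so most of the quadratic work is avoided.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                # (n, n) relation scores
    n = scores.shape[-1]
    k = max(1, int(keep_ratio * n))              # relations kept per query
    keep = np.argpartition(scores, -k, axis=-1)[:, -k:]  # top-k indices per row
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, keep, 0.0, axis=-1)  # 0 for kept, -inf for dropped
    masked = scores + mask
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over surviving relations
    return weights @ V

# Hypothetical usage: 8 input entities with 16-dimensional representations.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = approximate_self_attention(Q, K, V)
print(out.shape)  # (8, 16)
```

Because the softmax output is dominated by the largest scores, dropping low-scoring relations perturbs the result only slightly, which is consistent with the abstract's claim of under 1% accuracy loss while most of the self-attention computation is filtered out.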
Appears in Collections:
Files in This Item:
There are no files associated with this item.

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
