Publications

Detailed Information

ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks

DC Field / Value
dc.contributor.author: Ham, Tae Jun
dc.contributor.author: Lee, Yejin
dc.contributor.author: Seo, Seong Hoon
dc.contributor.author: Kim, Soosung
dc.contributor.author: Choi, Hyunji
dc.contributor.author: Jung, Sung Jun
dc.contributor.author: Lee, Jae Wook
dc.date.accessioned: 2022-06-24T00:25:56Z
dc.date.available: 2022-06-24T00:25:56Z
dc.date.created: 2022-05-09
dc.date.issued: 2021-06
dc.identifier.citation: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA, Vol.2021-June, pp.692-705
dc.identifier.issn: 1063-6897
dc.identifier.uri: https://hdl.handle.net/10371/183738
dc.description.abstract: © 2021 IEEE. The self-attention mechanism is rapidly emerging as one of the most important key primitives in neural networks (NNs) for its ability to identify the relations within input entities. The self-attention-oriented NN models such as Google Transformer and its variants have established the state-of-the-art on a very wide range of natural language processing tasks, and many other self-attention-oriented models are achieving competitive results in computer vision and recommender systems as well. Unfortunately, despite its great benefits, the self-attention mechanism is an expensive operation whose cost increases quadratically with the number of input entities that it processes, and thus accounts for a significant portion of the inference runtime. Thus, this paper presents ELSA (Efficient, Lightweight Self-Attention), a hardware-software co-designed solution to substantially reduce the runtime as well as energy spent on the self-attention mechanism. Specifically, based on the intuition that not all relations are equal, we devise a novel approximation scheme that significantly reduces the amount of computation by efficiently filtering out relations that are unlikely to affect the final output. With the specialized hardware for this approximate self-attention mechanism, ELSA achieves a geomean speedup of 58.1× as well as over three orders of magnitude improvements in energy efficiency compared to GPU on self-attention computation in modern NN models while maintaining less than 1% loss in the accuracy metric.
dc.language: English
dc.publisher: IEEE
dc.title: ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks
dc.type: Article
dc.identifier.doi: 10.1109/ISCA52012.2021.00060
dc.citation.journaltitle: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
dc.identifier.wosid: 000702275600051
dc.identifier.scopusid: 2-s2.0-85114693285
dc.citation.endpage: 705
dc.citation.startpage: 692
dc.citation.volume: 2021-June
dc.description.isOpenAccess: N
dc.contributor.affiliatedAuthor: Lee, Jae Wook
dc.type.docType: Conference Paper
dc.description.journalClass: 1
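
The abstract above describes ELSA's core idea: skip the relations (query-key pairs) that are unlikely to influence the final output, and spend compute only on the rest. The snippet below is a minimal NumPy sketch of that relation-filtering idea, assuming a simple per-query top-k cut as a stand-in for ELSA's actual hardware-friendly approximation (the paper's scheme estimates similarity cheaply in specialized hardware rather than computing exact scores first); the function name approximate_self_attention and the keep_ratio parameter are illustrative, not from the paper.

```python
import numpy as np

def approximate_self_attention(Q, K, V, keep_ratio=0.25):
    """Toy illustration of relation filtering in self-attention.

    For each query, only the top keep_ratio fraction of key relations is
    kept; the rest are masked out before the softmax. Note that this sketch
    still computes every exact score to decide what to keep, whereas ELSA's
    point is to make that decision cheaply with approximate, hardware-friendly
    similarity estimates so most of the quadratic work is avoided.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                # (n, n) relation scores
    n = scores.shape[-1]
    k = max(1, int(keep_ratio * n))              # relations kept per query
    keep = np.argpartition(scores, -k, axis=-1)[:, -k:]  # top-k indices per row
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, keep, 0.0, axis=-1)  # 0 for kept, -inf for dropped
    masked = scores + mask
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over surviving relations
    return weights @ V

# Hypothetical usage: 8 input entities with 16-dimensional representations.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = approximate_self_attention(Q, K, V)
print(out.shape)  # (8, 16)
```

Because the softmax output is dominated by the largest scores, dropping low-scoring relations perturbs the result only slightly, which is consistent with the abstract's claim of under 1% accuracy loss while most of the self-attention computation is filtered out.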
Appears in Collections:
Files in This Item:
There are no files associated with this item.

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
