Designing FPGA-based modular architectures for NLP models

Hur Suyeon (허수연)

Designing FPGA-based modular architectures for NLP models : 자연어 처리 모델을 위한 FPGA 기반 모듈러 아키텍처 설계

DC Field	Value	Language
dc.contributor.advisor	Kim Jangwoo (김장우)	-
dc.contributor.author	Hur Suyeon (허수연)	-
dc.date.accessioned	2022-06-08T06:37:14Z	-
dc.date.available	2022-06-08T06:37:14Z	-
dc.date.issued	2022	-
dc.identifier.other	000000169616	-
dc.identifier.uri	https://hdl.handle.net/10371/181138	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000169616	ko_KR
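dc.description	Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Dept. of Electrical and Computer Engineering, 2022.2. Kim Jangwoo.	-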
dc.description.abstract	Neural network-based natural language processing (NLP) models (e.g., LSTM, BERT) are emerging as promising solutions for NLP tasks. Because NLP tasks require immediate responses, NLP models must support fast inference in a single-batch environment. However, accelerating NLP models in a single batch is difficult because of three challenges: (1) a wide range of dimensions and irregular matrix operations, (2) non-negligible vector operation latency, and (3) heterogeneity of vector operations. In this paper, we propose FlexRun, an FPGA-based modular architecture approach that addresses these three challenges. FlexRun adaptively reconfigures the architecture to suit the input model. To this end, FlexRun consists of three parts. First, FlexRun:Architecture is a base architecture template with reconfigurable parameters. Next, in FlexRun:Algorithm, we define the design space and propose algorithms to find the best design points within it. Lastly, FlexRun:Automation automatically finds the best design and implements the resulting architecture. For evaluation, we use Intel's high-end FPGAs and achieve 2.69x and 1.44x speedups over a V100 GPU and a Brainwave-like FPGA baseline, respectively.	-
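dc.description.abstract	Deep learning-based natural language processing (NLP) models have recently been widely adopted for NLP tasks such as speech recognition and translation. Because NLP tasks usually require immediate responses, supporting fast inference of NLP models in a single-batch environment is essential. However, the characteristics of NLP models make it difficult to accelerate NLP in a single batch. These characteristics are: (1) a wide range of dimensions and irregular matrix operations, (2) the overhead of vector operations, and (3) the diversity of vector operations. This thesis proposes FlexRun to address these three characteristics and accelerate NLP inference in a single-batch environment. FlexRun exploits the high reconfigurability of FPGAs to design an architecture tailored to a given target model. FlexRun comprises three techniques. The first is an FPGA-based base architecture template made up of reconfigurable components. The second is an algorithm that defines the design space and finds the optimal design point in it for the target model. The last is a tool that automates the whole process, from finding the optimal design to implementing the architecture. Applying FlexRun, this thesis demonstrates meaningful performance improvements over a GPU baseline and an FPGA-based Brainwave-like baseline.	-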
dc.description.tableofcontents	1. Introduction 1 2. Background 4 2.1 Neural Network-based NLP models 4 2.1.1 RNN-based NLP Models 4 2.1.2 Attention-based NLP Models 5 2.2 Fast inference support for NLP tasks 6 3. Motivation 8 3.1 Characteristics of NLP models 8 3.1.1 Diverse operational complexities 8 3.1.2 Varying range of dimensions 9 3.1.3 Various parameter configurations 10 3.1.4 Heterogeneous vector operations 10 3.2 Challenges of NLP models 12 3.2.1 Challenge 1: Wide range of dimensions and irregular matrix operations 12 3.2.2 Challenge 2: Non-negligible vector operations latency 12 3.2.3 Challenge 3: Heterogeneity of vector operations 13 3.3 Limitations of previous works 13 3.3.1 GPU (general-purpose accelerator) 14 3.3.2 ASICs 15 3.3.3 FPGA 16 3.4 Solutions 16 4. FlexRun 18 4.1 Overview 18 4.2 FlexRun:Architecture 19 4.2.1 Structure of FlexRun:Architecture 20 4.2.2 Working mechanism of FlexRun:Architecture 22 4.3 FlexRun:Algorithm - Design Space 23 4.3.1 Design space of Gemv-unit: (#TILE, #DPE, LANE size) 24 4.3.2 Design space of Vec-unit: types, number, and order of basic vector operators 25 4.4 FlexRun:Algorithm - Design space exploration 27 4.4.1 Gemv-unit Rearrangement 27 4.4.2 Vec-unit Reconstruction 28 4.5 FlexRun:Automation 30 4.5.1 FlexRun:Generators 30 5. Implementation 32 5.1 FlexRun 32 5.1.1 FlexRun 32 5.1.2 Memory 33 5.2 Workloads and Experimental Setup 34 5.2.1 Workloads 34 5.2.2 Experimental setup 35 6. Evaluation 36 6.1 Performance improvement of FlexRun compared to the Baseline 36 6.2 Comparison of FlexRun and GPU 38 6.3 Scalability of FlexRun 39 6.4 Effectiveness of FlexRun 40 7. Related Work 41 8. Conclusion 43 Abstract (In Korean) 49	-
dc.format.extent	49	-
dc.language.iso	kor	-
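dc.publisher	Seoul National University Graduate School	-
dc.subject	Deep learning	-
dc.subject	Natural language processing	-
dc.subject	FPGA	-
dc.subject	Modular architecture	-
dc.subject	Accelerator design	-
dc.subject	Hardware architecture	-
dc.subject	Design space exploration	-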
dc.subject.ddc	621.3	-
dc.title	Designing FPGA-based modular architectures for NLP models	-
dc.title.alternative	자연어 처리 모델을 위한 FPGA 기반 모듈러 아키텍처 설계	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Hur Suyeon	-
dc.contributor.department	College of Engineering, Dept. of Electrical and Computer Engineering	-
dc.description.degree	Master's	-
dc.date.awarded	2022-02	-
dc.identifier.uci	I804:11032-000000169616	-
dc.identifier.holdings	000000000047▲000000000054▲000000169616▲	-

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Electrical and Computer Engineering
  - Theses (Master's Degree, Dept. of Electrical and Computer Engineering)

Files in This Item:

000000169616.pdf 3.04 MB