Systematic Approaches for Efficient Training of Deep Learning Models

김태범

Systematic Approaches for Efficient Training of Deep Learning Models : 효율적인 딥러닝 모델 학습을 위한 시스템적 접근

DC Field	Value	Language
dc.contributor.advisor	전병곤	-
dc.contributor.author	김태범	-
dc.date.accessioned	2023-11-20T04:24:50Z	-
dc.date.available	2023-11-20T04:24:50Z	-
dc.date.issued	2023	-
dc.identifier.other	000000177892	-
dc.identifier.uri	https://hdl.handle.net/10371/196509	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000177892	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, 2023. 8. 전병곤.	-
dc.description.abstract	The growing demand for deep learning (DL) models has created a positive feedback loop with the software systems for DL training. Thanks to the mature optimizations in such software systems, DL models can be trained efficiently by exploiting the computational resources of DL accelerators. However, as model structures diversify and model scale increases, difficulties that hinder the application of such optimizations continue to emerge. Unless those difficulties are resolved, DL training remains inefficient in practice. In this dissertation, we investigate the causes of such inefficiencies and design two novel software systems that resolve them. We first propose Terra, a system that addresses the poor performance of imperative execution. Terra performs imperative-symbolic co-execution: it runs a DL program imperatively while delegating the decoupled DL operations to symbolic execution. Accordingly, Terra can execute any imperative DL program with the optimized performance of symbolic execution, achieving up to a 1.73x speedup over imperative execution. Next, we propose BPipe to resolve the memory inefficiency of pipeline parallelism in large language model training. BPipe introduces a novel pipeline parallelism approach with an activation balancing method. By making all devices use comparable amounts of memory, BPipe trains the same model up to 2.17x faster.	-
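dc.description.abstract	The rapidly growing demand for deep learning models is also driving rapid progress in software systems for deep learning training. These software systems provide fairly mature optimizations that let deep learning models train while making full use of the compute resources of multiple deep learning accelerators. However, as model structures diversify and model sizes keep growing, factors that hinder such optimizations continue to emerge. If these factors are left unresolved, deep learning training runs inefficiently. In this dissertation, we analyze the types and causes of these inefficiencies and present two new systems that resolve them. The first system, Terra, addresses the inefficient training speed of the imperative execution model. Terra executes a deep learning program imperatively while separating out the deep learning operations and executing them symbolically. As a result, Terra makes the fast optimizations of symbolic execution applicable even to deep learning programs written with imperative execution in mind, and supports training up to 1.73x faster than imperative execution. The second system, BPipe, addresses memory inefficiency in large-scale model training. BPipe presents a new pipeline-parallel training method that balances the amount of activation tensors generated during training. With BPipe, all deep learning accelerators in distributed training use similar amounts of memory, enabling training up to 2.17x faster.	-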
dc.description.tableofcontents	Abstract i Chapter 1 Introduction 1 1.1 Efficiency in Deep Learning Model Training 1 1.2 Proposed Systems 2 1.2.1 Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs 2 1.2.2 Memory-Balanced Pipeline Parallelism for Training Large Language Models 5 1.3 Contributions 7 1.4 Dissertation Overview 7 Chapter 2 Background 10 2.1 Imperative and Symbolic Execution 10 2.2 Imperative Program with Symbolic Execution 12 2.3 Model Parallelism in Large Language Model Training 15 Chapter 3 Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs 17 3.1 Our Approach: Imperative-Symbolic Co-Execution 17 3.2 System Design 18 3.2.1 Imperative-Symbolic Co-Execution 19 3.2.2 Symbolic Graph Generation 21 3.3 Evaluation 35 3.3.1 Implementation Detail 35 3.3.2 Experiment Setup 37 3.3.3 Imperative Program Coverage 38 3.3.4 Training Throughput 40 3.3.5 Tracing Phase Analysis 44 3.4 Summary 45 Chapter 4 Memory-Balanced Pipeline Parallelism for Training Large Language Models 46 4.1 Motivation 46 4.2 Method 48 4.2.1 Pipeline Memory Imbalance 48 4.2.2 Activation Balancing 49 4.2.3 Balanced Memory Objective 51 4.2.4 Transfer Schedule 52 4.2.5 Pair-Adjacent Assignment 55 4.3 Evaluation 58 4.3.1 Implementation and Environment Setup 58 4.3.2 Training Performance 59 4.3.3 Memory Balancing 63 4.3.4 Performance Analysis 64 4.3.5 Communication Bandwidth Requirement for Transfer 65 4.4 Summary 68 Chapter 5 Related Work 70 Chapter 6 Future Work & Conclusion 75 6.1 Future Work 75 6.2 Conclusion 76 Bibliography 77 Abstract (in Korean) 97	-
dc.format.extent	xi,98	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	deep learning system	-
dc.subject	deep learning training	-
dc.subject.ddc	621.39	-
dc.title	Systematic Approaches for Efficient Training of Deep Learning Models	-
dc.title.alternative	효율적인 딥러닝 모델 학습을 위한 시스템적 접근	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	김태범	-
dc.contributor.department	College of Engineering, Department of Computer Science and Engineering	-
dc.description.degree	Ph.D.	-
dc.date.awarded	2023-08	-
dc.contributor.major	Computer Science and Engineering	-
dc.identifier.uci	I804:11032-000000177892	-
dc.identifier.holdings	000000000050▲000000000058▲000000177892▲	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)

Files in This Item:

000000177892.pdf 3.55 MB
