Elastic Distributed Training of Deep Neural Networks : 딥 뉴럴 네트워크의 탄력적 분산학습

dc.contributor.advisor: 전병곤
dc.contributor.author: 이경근
dc.date.accessioned: 2022-04-05T05:51:13Z
dc.date.available: 2022-04-05T05:51:13Z
dc.date.issued: 2021
dc.identifier.other: 000000167904
dc.identifier.uri: https://hdl.handle.net/10371/177682
dc.identifier.uri: https://dcollection.snu.ac.kr/common/orgView/000000167904 (ko_KR)
dc.description: Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021.8. 이경근.
dc.description.abstract: As the training of Deep Neural Network (DNN) models relies more and more heavily on shared GPU clusters or cloud computing services, elastic training of DNNs offers substantial potential gains for both the users and the managers of shared clusters, such as better idle-resource utilization, shorter job completion time (JCT), and improved responsiveness. However, making a distributed DNN training job elastic is not a trivial problem, because the job's state must be handled appropriately upon scaling events. Moreover, it is even more challenging to achieve both an efficient scaling mechanism and correct job state management, which are two conflicting goals. In this paper, we discuss the problem of state management in elastic distributed DNN training jobs and propose a design for a fast and safe elastic DNN training system that can support various types of training jobs. We implemented an elastic training framework, named Elastic Parallax, and validated our system on data-parallel training workloads.
dc.description.abstract: 딥 뉴럴 네트워크(DNN) 모델들이 점점 공유 GPU 클러스터 또는 클라우드 컴퓨팅 서비스에 의존하게 됨에 따라, 유휴자원 활용, JCT, 반응성 등, 클러스터 사용자와 관리자 모두에게 있어 탄력적 학습을 지원하는 것의 잠재적 이점이 많아지고 있다. 그러나 분산 DNN 학습 작업을 탄력적으로 동작하게 만드는 것은 어려운 일인데, 왜냐하면 DNN 학습 작업을 탄력적이게 만들려면 스케일링 시마다 작업의 상태를 적절하게 관리해 주어야 하기 때문이다. 게다가, 효율적인 스케일링 메카니즘과 적절한 작업 상태 관리는 동시에 이루기 어려운 목표들이다. 따라서 본 논문에서는, 탄력적 분산 DNN 학습 작업의 상태 관리 문제를 논의하고, 이를 바탕으로 다양한 종류의 학습 작업을 지원할 수 있는 빠르고 안전한 탄력적 DNN 학습 시스템 디자인을 제안한다. 또한, 탄력적 학습 프레임워크인 Elastic Parallax를 직접 구현하고, 실제 데이터 병렬 학습 작업들에 대하여 시스템을 검증한다.
dc.description.tableofcontents:
Abstract 1
1 Introduction 5
2 Background 8
2.1 Distributed DNN Training 8
2.2 Elastic Distributed Training 10
3 Problem Statement 12
3.1 Definitions 12
3.1.1 State and State Consistency 12
3.1.2 Elasticity 13
3.2 State Management Problem of Elastic DNN Training 14
4 State Synchronization for Elastic Training 16
4.1 Classification of State Constraints 16
4.2 State Synchronization Operations 17
4.2.1 Replicated States 18
4.2.2 Partitioned States 18
4.2.3 Singleton States 18
4.3 Implication on API Design 19
5 API and System Design 20
5.1 API Design 20
5.2 System Architecture 22
5.3 Implementation 24
5.3.1 Two-Phase Rendezvous 24
5.3.2 Elastic Input Pipeline 25
6 Evaluation 26
6.1 Evaluation Setup 26
6.1.1 Environment 26
6.1.2 Workloads 26
6.2 Replicated Data Parallelism 27
6.3 Partitioned Data Parallelism 28
7 Related Work 33
7.1 Elastic Machine Learning 33
7.2 Elastic DNN Training 33
8 Discussion and Conclusion 35
초록 (Abstract in Korean) 42
dc.format.extent: ii, 42
dc.language.iso: eng
dc.publisher: 서울대학교 대학원 (Seoul National University Graduate School)
dc.subject: Deep Learning
dc.subject: Distributed Training
dc.subject: Elasticity
dc.subject: 딥러닝
dc.subject: 분산학습
dc.subject: 탄력성
dc.subject.ddc: 621.39
dc.title: Elastic Distributed Training of Deep Neural Networks
dc.title.alternative: 딥 뉴럴 네트워크의 탄력적 분산학습
dc.type: Thesis
dc.type: Dissertation
dc.contributor.AlternativeAuthor: Kyunggeun Lee
dc.contributor.department: 공과대학 컴퓨터공학부 (College of Engineering, Department of Computer Science and Engineering)
dc.description.degree: 석사 (Master's)
dc.date.awarded: 2021-08
dc.identifier.uci: I804:11032-000000167904
dc.identifier.holdings: 000000000046▲000000000053▲000000167904▲
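
The abstract and the table of contents above describe re-establishing consistent job state whenever the worker set changes, distinguishing replicated, partitioned, and singleton states. As a rough illustration of what handling replicated state at a scaling event can look like, the sketch below uses plain PyTorch with torch.distributed: after each membership change, the workers rendezvous into a new process group and rank 0 broadcasts the model parameters so that every replica resumes from identical state. This is a generic sketch under those assumptions, not the Elastic Parallax API; the function names and the choice of torch.distributed are illustrative only.

    # Minimal, illustrative sketch (NOT the Elastic Parallax API):
    # re-synchronizing replicated state after a scaling event in
    # data-parallel training. Assumes MASTER_ADDR/MASTER_PORT, RANK, and
    # WORLD_SIZE are provided in the environment by whatever coordinator
    # triggered the scaling event.
    import os

    import torch
    import torch.distributed as dist


    def resync_replicated_state(model: torch.nn.Module) -> None:
        """Broadcast parameters and buffers from rank 0 so all replicas agree."""
        for tensor in model.state_dict().values():
            dist.broadcast(tensor, src=0)


    def on_scaling_event(model: torch.nn.Module) -> None:
        """Re-form the process group for the new worker set, then re-sync state."""
        if dist.is_initialized():
            dist.destroy_process_group()      # leave the old group
        dist.init_process_group(              # rendezvous with the new membership
            backend="gloo",
            rank=int(os.environ["RANK"]),
            world_size=int(os.environ["WORLD_SIZE"]),
        )
        resync_replicated_state(model)        # replicated state: copy rank 0's values

Partitioned states (for example, dataset shards or sharded optimizer state) and singleton states would require different synchronization operations on a scaling event, which is the distinction drawn in Chapter 4 of the table of contents above.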