TreeML: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees

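Abstract: Hyper-parameter optimization is an indispensable step in pushing the performance of a deep learning model to its limits. A study, or hyper-parameter optimization job, consists of a very large number of deep learning training jobs, each with different hyper-parameter values; each training job is called a trial. Because so much training is required, a study is computationally expensive and can take anywhere from a few hours to several weeks. This work shows that the hyper-parameter sequences of the trials derived from a single hyper-parameter optimization job share common prefixes. Based on this finding, we propose a new system called Hippo. Hippo finds the common sequence prefixes among trials and reuses their computation results, greatly reducing the total amount of computation. Whereas existing hyper-parameter optimization systems train from scratch for every trial, Hippo splits the given hyper-parameter sequences into small units called stages and merges identical stages into a stage tree. Hippo records the full state of the current study in an internal data structure called the search plan, and optimizes overall execution time with a critical-path-based scheduler. Hippo can run not only one study at a time but also multiple studies simultaneously. Across several models and hyper-parameter optimization algorithms, Hippo improves end-to-end execution time by up to 2.76× and GPU-hours by up to 4.81× over existing hyper-parameter optimization systems. When multiple studies run simultaneously, it improves execution time by up to 4.81× and GPU-hours by up to 6.77×.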
Hyper-parameter optimization is crucial for pushing the accuracy of a deep learning model to its limits. A hyper-parameter optimization job, referred to as a study, involves numerous trials of training a model with different training knobs, and is therefore very computation-heavy, typically taking hours to days to finish.
We observe that trials issued by hyper-parameter optimization algorithms often share common hyper-parameter sequence prefixes. Based on this observation, we propose TreeML, a hyper-parameter optimization system that reuses computation across trials to significantly reduce the overall amount of computation. Instead of treating each trial independently, as existing hyper-parameter optimization systems do, TreeML breaks the hyper-parameter sequences down into stages and merges common stages to form a tree of stages (a stage tree). TreeML maintains an internal data structure, the search plan, to manage the current status and history of a study, and employs a critical-path-based scheduler to minimize the overall study completion time. TreeML is applicable not only to single studies but to multi-study scenarios as well. Evaluations show that TreeML's stage-based execution strategy outperforms trial-based methods for several models and hyper-parameter optimization algorithms, reducing end-to-end training time by up to 2.76× (3.53×) and GPU-hours by up to 4.81× (6.77×) for single (multiple) studies.
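
The prefix-merging idea described in the abstract can be illustrated with a minimal sketch. This is not TreeML's implementation: the `StageNode` class, the representation of a stage as a `(config, steps)` pair, and the example trials are all hypothetical, chosen only to show how merging identical leading stages into a tree reduces the total number of training steps compared with running each trial independently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stage representation: (hyper-parameter setting, training steps).
Stage = Tuple[str, int]


@dataclass
class StageNode:
    """One node of an illustrative stage tree (a trie over stages)."""
    stage: Optional[Stage] = None                      # None for the root
    children: Dict[Stage, "StageNode"] = field(default_factory=dict)
    trials: List[str] = field(default_factory=list)    # trials ending here


def build_stage_tree(trials: Dict[str, List[Stage]]) -> StageNode:
    """Merge trials that share identical leading stages into one tree."""
    root = StageNode()
    for name, stages in trials.items():
        node = root
        for stage in stages:
            # Reuse the child if an identical stage already exists here.
            node = node.children.setdefault(stage, StageNode(stage))
        node.trials.append(name)
    return root


def tree_steps(node: StageNode) -> int:
    """Total training steps when every merged stage runs exactly once."""
    own = node.stage[1] if node.stage is not None else 0
    return own + sum(tree_steps(child) for child in node.children.values())


def independent_steps(trials: Dict[str, List[Stage]]) -> int:
    """Total training steps when every trial is trained from scratch."""
    return sum(steps for stages in trials.values() for _, steps in stages)


# Hypothetical trials: all three start with the same first stage.
trials = {
    "trial_a": [("lr=0.1", 30), ("lr=0.01", 30)],
    "trial_b": [("lr=0.1", 30), ("lr=0.001", 30)],
    "trial_c": [("lr=0.1", 30), ("lr=0.01", 30), ("lr=0.001", 30)],
}

tree = build_stage_tree(trials)
print(independent_steps(trials))  # 210
print(tree_steps(tree))           # 120
```

In this sketch two stages merge only when both the setting and the step count match exactly; a real system would also need to handle partially overlapping stages, checkpointing between stages, and scheduling, none of which is modeled here.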

Language: eng

URI: https://hdl.handle.net/10371/178727

https://dcollection.snu.ac.kr/common/orgView/000000168056

Files in This Item:

000000168056.pdf 4.77 MB

Appears in Collections:

College of Education (사범대학)
- Program in Global Education Cooperation (협동과정-글로벌교육협력전공)
  - Theses (Master's Degree_협동과정-글로벌교육협력전공)
