Publications

Detailed Information

(A) study on hyperparameter optimization strategy utilizing training time in deep neural networks : 훈련 시간을 활용한 심층 신경망의 하이퍼파라미터 최적화 전략 연구

Authors

조형헌

Advisor
Wonjong Rhee
Major
Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies
Issue Date
2017-02
Publisher
Graduate School, Seoul National University
Keywords
hyperparameter optimization; training time; interdependency; optimization strategy; deep neural network
Description
Thesis (Master's) -- Graduate School, Seoul National University : Department of Transdisciplinary Studies, February 2017. Advisor: Wonjong Rhee.
Abstract
While deep neural networks (DNNs) greatly reduce the need for feature engineering compared with traditional machine learning (ML), hyperparameter optimization (HPO) of DNNs has emerged as an important problem instead.
As DNNs become deeper, the number of hyperparameters and the training time for each hyperparameter vector tend to increase significantly over traditional ML.
HPO algorithms, often considered less efficient than manual HPO performed by experienced experts, become more important for DNNs because of the increased complexity of DNN hyperparameters.
This thesis evaluates existing HPO algorithms on DNNs and analyzes hyperparameter interdependencies from the viewpoints of test error and training time.
Spearmint, an existing Bayesian optimization method that updates its prior distribution from the evaluation history, performed well when five or fewer hyperparameters were involved.
Experiments on HPO over seven hyperparameters of LeNet-5, a convolutional neural network (CNN) trained on MNIST, show that
the test error distribution along a single hyperparameter tends to be U-shaped, with abrupt changes in test error.
The training time, in contrast, is strongly tied to the number of epochs and the number of neurons in the DNN architecture.
Hence, HPO strategies utilizing the number of epochs and the estimated training time are introduced and investigated in this thesis.
A strategy in this work consists of a coarse optimization stage and a fine optimization stage, which train for a small and a large number of epochs, respectively.
Using a framework developed to provide traceability, extensibility, and comparability for HPO methods,
extended HPO methods are investigated, which apply the fine optimization stage after the coarse optimization stage to any underlying HPO method.
The extended methods were found to reach better performance faster than the original methods.
This thesis reveals that hyperparameter interdependency affects test error and training time variability in a CNN,
and that utilizing the training time, which is highly predictable from a hyperparameter vector, sheds light on speeding up HPO for DNNs.
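
The two-stage strategy summarized in the abstract can be illustrated with a minimal sketch. The code below is not the thesis's actual framework: the search space, the epoch budgets, and the synthetic train_and_evaluate objective are assumptions made for illustration, and plain random sampling stands in for whichever HPO method (e.g., Spearmint) supplies candidate hyperparameter vectors.

import math
import random

# Hypothetical search space (names and ranges are illustrative, not taken from the thesis).
SEARCH_SPACE = {
    "log10_learning_rate": (-4.0, -1.0),
    "batch_size": [32, 64, 128, 256],
    "dropout_rate": (0.0, 0.7),
}

def sample_config(space):
    """Draw one hyperparameter vector at random (a stand-in for any HPO method)."""
    config = {}
    for name, spec in space.items():
        config[name] = random.choice(spec) if isinstance(spec, list) else random.uniform(*spec)
    return config

def train_and_evaluate(config, epochs):
    """Synthetic stand-in for training the CNN and measuring test error.
    A real objective would train for `epochs` epochs and evaluate on held-out data;
    here a U-shaped response to the learning rate plus epoch-dependent noise is simulated."""
    lr_penalty = (config["log10_learning_rate"] + 2.5) ** 2      # lowest error near 10^-2.5
    noise = abs(random.gauss(0.0, 0.3 / math.sqrt(epochs)))      # fewer epochs -> noisier estimate
    return 0.01 + 0.05 * lr_penalty + 0.02 * config["dropout_rate"] + noise

def coarse_to_fine_hpo(space, n_coarse=50, n_fine=5, coarse_epochs=2, fine_epochs=30):
    """Coarse stage: evaluate many configurations cheaply with few epochs.
    Fine stage: re-train only the most promising configurations with the full epoch budget."""
    coarse = [(train_and_evaluate(c, coarse_epochs), c)
              for c in (sample_config(space) for _ in range(n_coarse))]
    coarse.sort(key=lambda pair: pair[0])
    finalists = [c for _, c in coarse[:n_fine]]
    fine = [(train_and_evaluate(c, fine_epochs), c) for c in finalists]
    return min(fine, key=lambda pair: pair[0])

if __name__ == "__main__":
    best_error, best_config = coarse_to_fine_hpo(SEARCH_SPACE)
    print(f"best test error {best_error:.4f} with config {best_config}")

In this sketch the coarse stage screens many configurations with a small epoch budget, and only the best few are re-trained with the full budget, mirroring the idea of spending most of the training time on the most promising hyperparameter vectors.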
Language
English
URI
https://hdl.handle.net/10371/133234