A Study on Hyperparameter Optimization Strategy Utilizing Training Time in Deep Neural Networks
- Authors
- Advisor
- Wonjong Rhee
- Major
- Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies
- Issue Date
- 2017-02
- Publisher
- Seoul National University Graduate School
- Keywords
- hyperparameter optimization ; training time ; interdependency ; optimization strategy ; deep neural network
- Description
- Thesis (M.S.) -- Seoul National University Graduate School : Department of Transdisciplinary Studies, 2017. 2. Wonjong Rhee.
- Abstract
- While the need for feature engineering is greatly reduced in deep neural networks (DNNs) compared to traditional machine learning (ML), hyperparameter optimization (HPO) of DNNs has emerged as an important problem in its place.
As DNNs become deeper, the number of hyperparameters and the training time for each hyperparameter vector tend to increase significantly relative to traditional ML.
HPO algorithms, which are often considered less efficient than manual HPO performed by experienced experts, become more important in DNNs due to the increased complexity of DNN hyperparameters.
This thesis evaluates existing HPO algorithms on DNNs and analyzes hyperparameter interdependencies from the viewpoints of test error and training time.
Spearmint, an existing Bayesian optimization method that updates a prior distribution from evaluation history, performed well when five or fewer hyperparameters were involved.
Experiments on HPO with seven hyperparameters of LeNet-5, a convolutional neural network (CNN) trained on MNIST, show that the test error distribution along a hyperparameter is U-shaped, with the test error changing abruptly.
In contrast, the training time is strongly tied to the number of epochs and the number of neurons in the DNN architecture.
Hence, HPO strategies utilizing the number of epochs and the estimated training time are introduced and investigated in this thesis.
A strategy in this work consists of a coarse optimization phase that trains for a small number of epochs and a fine optimization phase that trains for a large number of epochs.
Using a framework developed to provide traceability, extensibility, and comparability for HPO methods,
extended HPO methods are investigated that apply the fine optimization strategy after the coarse optimization strategy to any HPO method.
The extended methods were found to reach better performance faster than the original methods.
This thesis reveals that hyperparameter interdependency affects test error and training time variability in a CNN,
and that utilizing the training time, which is highly predictable from a hyperparameter vector, sheds light on speeding up HPO for DNNs.
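The coarse-to-fine strategy described in the abstract can be sketched in a few lines. This is a minimal illustration, not the thesis's actual framework: `train_and_evaluate` is a hypothetical stand-in for DNN training whose simulated test error is U-shaped in the learning rate and improves with epochs, and the coarse/fine budgets are arbitrary assumed values.

```python
import random

def train_and_evaluate(config, epochs):
    """Toy stand-in for DNN training (assumption for illustration):
    test error is U-shaped around a learning rate of 0.01 and
    decreases as the number of training epochs grows."""
    lr = config["learning_rate"]
    base_error = (lr - 0.01) ** 2 * 100  # U-shaped in the hyperparameter
    return base_error + 1.0 / epochs     # more epochs -> lower error

def coarse_to_fine_search(n_coarse=50, top_k=5,
                          coarse_epochs=2, fine_epochs=50, seed=0):
    """Coarse phase: many cheap small-epoch evaluations over a wide
    range. Fine phase: re-train only the top-k candidates with the
    full epoch budget and keep the best."""
    rng = random.Random(seed)
    candidates = [{"learning_rate": rng.uniform(0.0001, 0.1)}
                  for _ in range(n_coarse)]
    # Rank all candidates by their cheap, small-epoch score.
    ranked = sorted(candidates,
                    key=lambda c: train_and_evaluate(c, coarse_epochs))
    # Spend the expensive large-epoch budget only on the finalists.
    best = min(ranked[:top_k],
               key=lambda c: train_and_evaluate(c, fine_epochs))
    return best

best = coarse_to_fine_search()
print(best)
```

The point of the split is that the cheap coarse phase costs `n_coarse * coarse_epochs` epoch-units while the expensive fine phase costs only `top_k * fine_epochs`, so most hyperparameter vectors are discarded before they consume a large-epoch training run.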
- Language
- English