Publications

Detailed Information

(A) study on hyperparameter optimization strategy utilizing training time in deep neural networks : 훈련 시간을 활용한 심층 신경망의 하이퍼파라미터 최적화 전략 연구

Authors

조형헌

Advisor
Wonjong Rhee
Major
Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies
Issue Date
2017-02
Publisher
Graduate School, Seoul National University
Keywords
hyperparameter optimization; training time; interdependency; optimization strategy; deep neural network
Description
Thesis (Master's) -- Graduate School, Seoul National University : Department of Transdisciplinary Studies, February 2017. Advisor: Wonjong Rhee.
Abstract
While deep neural networks (DNNs) greatly reduce the need for feature engineering compared with traditional machine learning (ML), hyperparameter optimization (HPO) of DNNs has emerged as an important problem instead.
As DNNs become deeper, the number of hyperparameters and the training time for each hyperparameter vector tend to increase significantly over traditional ML.
HPO algorithms, often considered less efficient than manual HPO performed by experienced experts, become more important for DNNs because of the increased complexity of DNN hyperparameters.
This thesis evaluates existing HPO algorithms on DNNs and analyzes hyperparameter interdependencies from the viewpoints of test error and training time.
Spearmint, an existing Bayesian optimization method that updates its prior distribution from the evaluation history, performed well when five or fewer hyperparameters were involved.
Experiments on HPO over seven hyperparameters of LeNet-5, a convolutional neural network (CNN) trained on MNIST, show that
the test error distribution along a single hyperparameter tends to be U-shaped, with abrupt changes in test error.
The training time, in contrast, is strongly tied to the number of epochs and the number of neurons in the DNN architecture.
Hence, HPO strategies utilizing the number of epochs and the estimated training time are introduced and investigated in this thesis.
A strategy in this work consists of a coarse optimization stage and a fine optimization stage, which train for a small and a large number of epochs, respectively.
Using a framework developed to provide traceability, extensibility, and comparability for HPO methods,
extended HPO methods are investigated, which apply the fine optimization stage after the coarse optimization stage to any underlying HPO method.
The extended methods were found to reach better performance faster than the original methods.
This thesis reveals that hyperparameter interdependency affects test error and training time variability in a CNN,
and that utilizing the training time, which is highly predictable from a hyperparameter vector, sheds light on speeding up HPO for DNNs.
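
The two-stage strategy summarized in the abstract can be illustrated with a minimal sketch. The code below is not the thesis's actual framework: the search space, the epoch budgets, and the synthetic train_and_evaluate objective are assumptions made for illustration, and plain random sampling stands in for whichever HPO method (e.g., Spearmint) supplies candidate hyperparameter vectors.

import math
import random

# Hypothetical search space (names and ranges are illustrative, not taken from the thesis).
SEARCH_SPACE = {
    "log10_learning_rate": (-4.0, -1.0),
    "batch_size": [32, 64, 128, 256],
    "dropout_rate": (0.0, 0.7),
}

def sample_config(space):
    """Draw one hyperparameter vector at random (a stand-in for any HPO method)."""
    config = {}
    for name, spec in space.items():
        config[name] = random.choice(spec) if isinstance(spec, list) else random.uniform(*spec)
    return config

def train_and_evaluate(config, epochs):
    """Synthetic stand-in for training the CNN and measuring test error.
    A real objective would train for `epochs` epochs and evaluate on held-out data;
    here a U-shaped response to the learning rate plus epoch-dependent noise is simulated."""
    lr_penalty = (config["log10_learning_rate"] + 2.5) ** 2      # lowest error near 10^-2.5
    noise = abs(random.gauss(0.0, 0.3 / math.sqrt(epochs)))      # fewer epochs -> noisier estimate
    return 0.01 + 0.05 * lr_penalty + 0.02 * config["dropout_rate"] + noise

def coarse_to_fine_hpo(space, n_coarse=50, n_fine=5, coarse_epochs=2, fine_epochs=30):
    """Coarse stage: evaluate many configurations cheaply with few epochs.
    Fine stage: re-train only the most promising configurations with the full epoch budget."""
    coarse = [(train_and_evaluate(c, coarse_epochs), c)
              for c in (sample_config(space) for _ in range(n_coarse))]
    coarse.sort(key=lambda pair: pair[0])
    finalists = [c for _, c in coarse[:n_fine]]
    fine = [(train_and_evaluate(c, fine_epochs), c) for c in finalists]
    return min(fine, key=lambda pair: pair[0])

if __name__ == "__main__":
    best_error, best_config = coarse_to_fine_hpo(SEARCH_SPACE)
    print(f"best test error {best_error:.4f} with config {best_config}")

In this sketch the coarse stage screens many configurations with a small epoch budget, and only the best few are re-trained with the full budget, mirroring the idea of spending most of the training time on the most promising hyperparameter vectors.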
Language
English
URI
https://hdl.handle.net/10371/133234