Enhanced Neural Architecture Search and Applications

Abstract: Deep learning with deep neural networks (DNNs) has become one of the most popular choices for realizing artificial intelligence systems. To apply advanced deep learning technology to various fields, it is necessary not only to train deep learning experts but also to enable non-experts to obtain high-performing DNNs. As an approach based on automated machine learning, neural architecture search (NAS), which automatically obtains a DNN architecture with promising performance in the target domain, is attracting considerable interest as a practical alternative to the manual design of DNNs. An accurate search requires a tremendous search cost, but weight sharing allows NAS to estimate the performance of candidate DNNs cheaply, thereby significantly reducing that cost. Building on weight sharing, diverse NAS methods have been proposed and have produced results that exceed the performance of hand-crafted DNNs in various fields. This dissertation presents methods and results on three essential research topics in NAS: how to reduce the search cost, how to increase search performance by improving search spaces and search algorithms, and how to apply NAS to domains in which the importance of architecture has been overlooked.
The first study reduces the search cost of NAS methods without degrading search performance. The significant computational cost of NAS is a hindrance to researchers who want to apply NAS to their own domains. To address this issue, we introduce proxy data, i.e., a representative subset of the target data, to accelerate NAS methods. Although data selection has been used across various fields of deep learning, our evaluation of existing data selection methods on NAS benchmarks reveals that they are not suitable for NAS and that a new data selection method is necessary. Based on an in-depth analysis, through data entropy, of the proxy data constructed by these selection methods, we propose a novel proxy data selection method tailored for NAS. To demonstrate its effectiveness empirically, we conduct thorough experiments across diverse datasets, search spaces, and NAS algorithms. As a result, NAS algorithms with the proposed data selection method discover architectures that are competitive with those obtained using the entire dataset. The method significantly reduces the search cost: executing DARTS with the proposed data selection method requires only 40 minutes on CIFAR-10 and 7.5 hours on ImageNet with a single GPU. Furthermore, in the reverse direction of conventional NAS practice, when the architecture searched on our ImageNet proxy data is transferred to a smaller dataset, CIFAR-10, it yields a test error of 2.4%, close to the state of the art.
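To make the notion of entropy-based proxy data concrete, the following is a minimal sketch, not the selection method proposed in the dissertation (the abstract does not specify it). It assumes that each example already has a class-probability vector from some pretrained classifier, computes the Shannon entropy of each vector, and draws a subset spread evenly over entropy bins. The function names, the binning rule, and the parameters `fraction` and `num_bins` are hypothetical.

```python
import math
import random


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one example's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_proxy_indices(all_probs, fraction, num_bins=5, seed=0):
    """Pick about `fraction` of the examples, spread over entropy bins.

    Hypothetical rule for illustration only; the dissertation's own
    selection method is not described in the abstract.
    """
    rng = random.Random(seed)
    entropies = [predictive_entropy(p) for p in all_probs]
    lo, hi = min(entropies), max(entropies)
    width = (hi - lo) / num_bins or 1.0  # avoid a zero bin width

    # Assign every example index to an equal-width entropy bin.
    bins = [[] for _ in range(num_bins)]
    for idx, h in enumerate(entropies):
        b = min(int((h - lo) / width), num_bins - 1)
        bins[b].append(idx)
    for members in bins:
        rng.shuffle(members)

    # Round-robin over the bins so that low-, mid-, and high-entropy
    # examples are all represented in the proxy subset.
    budget = max(1, round(fraction * len(all_probs)))
    selected = []
    while len(selected) < budget:
        progressed = False
        for members in bins:
            if members and len(selected) < budget:
                selected.append(members.pop())
                progressed = True
        if not progressed:
            break
    return sorted(selected)
```

Under these assumptions, `select_proxy_indices(all_probs, 0.1)` would return the indices of a subset holding roughly 10% of the examples, on which a NAS algorithm could then be run in place of the full dataset.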
In the second study, we improve the search performance of differentiable NAS on a cell-based search space by refining the search space itself. Through continuous relaxation on a super-network cell, differentiable NAS methods can approximately evaluate various cells during the search process. When a cell is derived from an optimized super-network cell, many of the evaluated cells, which are not included in the conventional cell-based search space but may achieve high performance, are inevitably discarded by the conventional argmax selection rule. To overcome this limitation of conventional cell derivation, we extend the search space and propose a top-k survival rule to derive a cell belonging to the extended search space. However, we observe that existing NAS methods suffer a significant decrease in search performance when coupled with the top-k survival rule. To explore the extended search space properly, we propose BtNAS, which models the strength of each candidate operation as a continuous random variable following a Beta distribution. BtNAS approximates the true posterior distributions of the operation strengths using the variational Bayes method, and the parameters of the variational Beta distributions are trained through gradient-based optimization and the pathwise derivative estimator. BtNAS derives cells that are competitive with those recently reported by differentiable NAS methods while yielding networks with fewer parameters. In addition, we adjust the operation strengths so that all intermediate nodes in the super-network cell are activated, and thereby achieve a state-of-the-art test error of 2.3% on CIFAR-10.
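The difference between the two derivation rules can be sketched as follows. This is one possible reading, not the dissertation's exact procedure: it assumes that each (edge, operation) pair in the super-network cell has a scalar strength, that the argmax rule keeps the single strongest operation on every edge, and that a top-k rule keeps the k strongest pairs in the whole cell, so that an edge may retain several operations or none. The function names and the dictionary layout are hypothetical; `beta_mean` uses the standard mean of a Beta distribution as a point estimate of a strength.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, used as a point estimate."""
    return alpha / (alpha + beta)


def derive_argmax(strengths):
    """Keep exactly one operation per edge: the strongest one."""
    best = {}
    for (edge, op), s in strengths.items():
        if edge not in best or s > best[edge][1]:
            best[edge] = (op, s)
    return sorted((edge, op) for edge, (op, _) in best.items())


def derive_top_k(strengths, k):
    """Keep the k strongest (edge, op) pairs in the cell (assumed reading)."""
    ranked = sorted(strengths.items(), key=lambda item: item[1], reverse=True)
    return sorted(pair for pair, _ in ranked[:k])


# Hypothetical strengths, keyed by ((source node, target node), operation).
strengths = {
    ((0, 1), "conv"): beta_mean(9.0, 1.0),  # 0.9
    ((0, 1), "skip"): beta_mean(8.0, 2.0),  # 0.8
    ((1, 2), "conv"): beta_mean(3.0, 7.0),  # 0.3
    ((1, 2), "pool"): beta_mean(2.0, 8.0),  # 0.2
}
argmax_cell = derive_argmax(strengths)
top2_cell = derive_top_k(strengths, k=2)
```

In this toy cell the argmax rule keeps `conv` on both edges, whereas the assumed top-2 rule keeps both `conv` and `skip` on edge (0, 1), a cell that the argmax rule could never produce.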
The last study exploits NAS to search for energy-efficient spiking neural networks (SNNs). SNNs, which mimic information transmission in the brain through neurons and synapses, have received considerable attention. SNNs can efficiently process spatio-temporal information through event-driven computation with discrete and sparse spikes. Most previous studies have focused on training methods that improve the performance and energy efficiency of SNNs, and the effect of architecture on SNNs has rarely been studied. To facilitate NAS for SNNs, we propose a spike-aware neural architecture search framework named AutoSNN. We first identify which architectural factors are more suitable for SNNs than for conventional DNNs and then construct a two-level search space for SNNs. Next, to estimate the performance and the spike generation of candidate architectures in the search space, AutoSNN adopts a one-shot super-network training scheme based on weight sharing together with an evolutionary search algorithm. The architecture fitness used for architecture evaluation is designed to account for both the accuracy and the number of spikes during the search process. The SNN found by AutoSNN outperforms hand-crafted SNNs in terms of both accuracy and energy efficiency. We also demonstrate the superiority of AutoSNN on various datasets, including neuromorphic datasets.
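A fitness value that trades accuracy against spike count can take many forms, and the abstract does not give the exact formula. The sketch below assumes one plausible choice, accuracy multiplied by a power-law penalty on the spike count relative to a reference count, and uses it to rank candidates for one step of an evolutionary search. The function names, the candidate fields, and the `penalty` value are hypothetical.

```python
def fitness(accuracy, num_spikes, reference_spikes, penalty=-0.08):
    """Accuracy scaled by a power-law spike penalty (assumed form).

    With penalty < 0, architectures that emit more spikes than the
    reference are scored lower, and sparser ones are scored higher.
    """
    return accuracy * (num_spikes / reference_spikes) ** penalty


def select_parents(candidates, reference_spikes, num_parents, penalty=-0.08):
    """Return the names of the fittest candidates, best first."""
    ranked = sorted(
        candidates,
        key=lambda c: fitness(c["accuracy"], c["spikes"], reference_spikes, penalty),
        reverse=True,
    )
    return [c["name"] for c in ranked[:num_parents]]


# Hypothetical candidates, with accuracy and spike counts as estimated
# from a weight-sharing super-network.
candidates = [
    {"name": "a", "accuracy": 0.90, "spikes": 200_000},
    {"name": "b", "accuracy": 0.89, "spikes": 100_000},
    {"name": "c", "accuracy": 0.80, "spikes": 50_000},
]
parents = select_parents(candidates, reference_spikes=100_000, num_parents=2)
```

Here candidate `b` ranks first even though `a` is slightly more accurate, because `a` generates twice as many spikes. The magnitude of `penalty` controls how strongly energy efficiency is weighted against accuracy.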
Throughout this dissertation, following these three research topics, we propose methods for developing efficient and effective NAS frameworks and present strategies for applying NAS. To validate the proposed methods, we provide extensive experimental results, which demonstrate that the choice of architecture must be considered to maximize the performance of deep learning. We therefore expect that, with the approaches introduced in this dissertation, NAS will be actively adopted and researchers in diverse fields will be able to obtain promising results more easily by combining deep learning with NAS.

Language: eng

URI: https://hdl.handle.net/10371/181302

https://dcollection.snu.ac.kr/common/orgView/000000170587

Files in This Item:

000000170587.pdf 9.46 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
