Comprehensive Understanding and Design of Visual Knowledge Distillation

허병호

Comprehensive Understanding and Design of Visual Knowledge Distillation : 영상 지식 증류 방법의 포괄적 이해 및 설계

Authors: 허병호

Advisor: 최진영

Issue Date: 2019-08

Publisher: Seoul National University Graduate School

Keywords: Knowledge distillation ; knowledge transfer ; network compression

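Description: Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 8. 최진영.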

Abstract: Knowledge distillation helps train a new network (the student) by using
information from an already trained network (the teacher). As a new learning method, it
is actively studied for applications such as network compression, but analysis of how it
works remains insufficient. In this dissertation, we pursue a comprehensive understanding
of the knowledge distillation mechanism and develop advanced distillation methods for
knowledge transfer. The dissertation makes three advances in distillation technique:
distillation of the decision boundary, distillation of activation boundaries, and
comprehensive design of the distillation loss.
First, we provide a new perspective based on the decision boundary, one of the most
important components of a classifier. The generalization performance of a classifier is
closely tied to the adequacy of its decision boundary, so a good classifier has a good
decision boundary. Transferring information closely related to the decision boundary is
therefore a promising approach to knowledge distillation. To realize this, we use an
adversarial attack to discover samples that support the decision boundary, and the
proposed algorithm trains the student classifier on these adversarial samples so that
more accurate information about the boundary is transferred. Experiments show that the
proposed method improves knowledge distillation and achieves state-of-the-art performance.
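
As a rough illustration of how an attack can locate a sample near a decision boundary, the sketch below moves an input toward the boundary between a base class and a target class until their scores nearly match. It is only a sketch under stated assumptions: a linear classifier stands in for the teacher, and the function names, step size, and stopping tolerance are hypothetical rather than taken from the dissertation.

```python
def logits(W, b, x):
    """Class scores of a linear classifier (a stand-in for the teacher)."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def boundary_supporting_sample(W, b, x, base, target,
                               step=0.3, max_iter=50, tol=1e-6):
    """Move x toward the base/target decision boundary (hypothetical helper).

    Each iteration closes a fraction `step` of the remaining score gap
    z[base] - z[target], so the sample approaches the boundary from the
    base-class side without jumping far past it.
    """
    x = list(x)
    # For a linear model the gradient of (z[target] - z[base]) w.r.t. x is constant.
    grad = [wt - wb for wt, wb in zip(W[target], W[base])]
    norm_sq = sum(g * g for g in grad)
    for _ in range(max_iter):
        z = logits(W, b, x)
        gap = z[base] - z[target]
        if gap <= tol:  # close enough to the boundary: stop
            break
        x = [xi + step * gap * g / norm_sq for xi, g in zip(x, grad)]
    return x


W = [[1.0, 0.0], [0.0, 1.0]]  # two classes, two input features
b = [0.0, 0.0]
x_adv = boundary_supporting_sample(W, b, [2.0, 0.5], base=0, target=1)
```

With a deep network the gradient would be recomputed at every step by backpropagation; the samples found this way would then be fed to both teacher and student during distillation.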
Second, building on the idea of the decision boundary, we move from distillation based
on the network output to distillation based on hidden-layer responses. The activation
boundary of a neuron is the separating hyperplane that determines whether the neuron is
activated or deactivated. It has long been held in neural network research that whether
neurons are activated, rather than their exact output values, plays the most important
role in forming classification-friendly partitions of the hidden feature space. As far as
we know, however, this aspect has not been considered in the knowledge transfer
literature. We propose a knowledge transfer method that distills the activation
boundaries formed by hidden neurons. For the distillation, we propose an activation
transfer loss that attains its minimum when the boundaries generated by the student
coincide with those of the teacher. Since the activation transfer loss is not
differentiable, we design a piecewise differentiable loss that approximates it. With the
proposed method, the student learns the boundary separating the activation and
deactivation regions of each neuron in the teacher. Experiments on various aspects of
knowledge transfer verify that the proposed method outperforms the current state of the
art.
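
To make the contrast between the two losses concrete, the sketch below pairs a non-differentiable count of activation mismatches with a hinge-style surrogate that is piecewise differentiable in the student's responses. The abstract does not give either formula, so the exact forms, the function names, and the `margin` parameter are assumptions for illustration only.

```python
def activation_mismatch(teacher, student):
    """Count neurons whose on/off state differs between teacher and student.

    The count is 0 exactly when the activation patterns agree, but it is
    piecewise constant in the student's pre-activations, so its gradient is
    zero almost everywhere and it cannot drive training.
    """
    return sum(1 for t, s in zip(teacher, student) if (t > 0) != (s > 0))


def hinge_surrogate(teacher, student, margin=1.0):
    """Hinge-style stand-in for the mismatch count (assumed form).

    If the teacher neuron is active, the student is pushed above +margin;
    otherwise it is pushed below -margin. Violations are squared and summed.
    """
    total = 0.0
    for t, s in zip(teacher, student):
        violation = max(0.0, margin - s) if t > 0 else max(0.0, margin + s)
        total += violation ** 2
    return total


teacher_pre = [1.5, -0.2, 0.3]   # hypothetical teacher pre-activations
student_pre = [2.0, -3.0, -0.5]  # hypothetical student pre-activations
mismatches = activation_mismatch(teacher_pre, student_pre)
surrogate = hinge_surrogate(teacher_pre, student_pre)
```

Only the third neuron disagrees here, so the surrogate's entire value comes from that neuron, and its gradient tells the student which way to move that response.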
Lastly, we investigate the design aspects of feature distillation methods for network
compression and propose a novel feature distillation method whose loss is designed to
create synergy among four aspects: teacher transform, student transform, distillation
feature position, and distance function. The proposed distillation loss includes a
feature transform with a newly designed margin ReLU, a new distillation feature position,
and a partial L2 distance function that skips redundant information that would adversely
affect the compression of the student. On ImageNet, the proposed method achieves a top-1
error of 21.65% with ResNet50, which outperforms the teacher network, ResNet152. The
method is evaluated on image classification, object detection, and semantic segmentation
and achieves a significant performance improvement on all tasks.
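
The abstract names the margin ReLU and the partial L2 distance without defining them. The sketch below shows one plausible reading, stated here as an assumption: the margin ReLU clips teacher responses from below at a negative margin `m`, and the partial L2 distance skips positions where the teacher response is non-positive and the student response is already at or below it. The function names and example values are hypothetical.

```python
def margin_relu(features, m):
    """Clip each response from below at margin m (assumed to be negative)."""
    return [max(v, m) for v in features]


def partial_l2(teacher, student):
    """Squared L2 distance that ignores already-satisfied negative targets.

    A position is skipped when the teacher value is non-positive and the
    student value is at or below it (assumed rule); every other position
    contributes its squared difference.
    """
    total = 0.0
    for t, s in zip(teacher, student):
        if s <= t <= 0:
            continue
        total += (t - s) ** 2
    return total


teacher_pre = [2.0, -5.0, -0.3]  # hypothetical teacher features
student_feat = [1.5, -4.0, 0.2]  # hypothetical student features
target = margin_relu(teacher_pre, m=-1.0)
loss = partial_l2(target, student_feat)
```

In this example the second position is skipped because the student (-4.0) is already below the clipped teacher target (-1.0), while the first and third positions each contribute 0.25.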
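Abstract (translated from the Korean): Knowledge distillation uses an already trained
neural network to help train a new neural network. It is being actively studied as a new
training method with diverse applications such as network compression. However, the
analysis of knowledge distillation is not yet sufficient. This dissertation therefore
aims at an in-depth understanding of how knowledge distillation works and, on that basis,
at improving knowledge distillation methods. The dissertation improves knowledge
distillation in three directions: decision boundaries and knowledge distillation,
activation boundaries and knowledge distillation, and in-depth design of knowledge
distillation.
First, we analyze knowledge distillation from the perspective of the decision boundary,
one of the most important elements of a classifier. The generalization performance of a
classifier is strongly related to its decision boundary, which means that a good
classifier has a good decision boundary. Transferring information close to the decision
boundary can therefore also benefit knowledge distillation. To this end, we devised a
method that uses adversarial attacks to find samples close to the decision boundary. To
transfer the decision boundary more accurately, we trained the student network using
these near-boundary samples for distillation. Experimental results showed that the
proposed method, using samples close to the decision boundary, achieves state-of-the-art
performance among existing knowledge distillation methods.
Second, building on the concept of the decision boundary, we extended output-based
distillation to distillation based on hidden-layer responses. An activation boundary is
the separating hyperplane that divides a neuron's active and inactive states, and it
determines whether the neuron responds. Neuron activation has long been treated as an
important element of neural networks, and it plays a key role in organizing hidden layers
so that they aid classification. As far as we know, however, this important activation
boundary has not been considered at all in knowledge transfer. To address this, we
propose a knowledge transfer method focused on transferring the activation boundaries of
hidden neurons. We first construct a function that takes its smallest value when the
activation boundaries of the teacher and student networks coincide, and propose it as the
activation transfer function. Because this function is not differentiable, we designed a
surrogate function that approximates it and used the surrogate to perform knowledge
distillation. The proposed method makes the student network have the same neuron
activation boundaries as the teacher network. We verified the proposed method in various
experiments, and it outperformed state-of-the-art knowledge distillation methods.
Finally, we investigate the various design elements of feature-based knowledge
distillation and, based on that investigation, propose a new feature-based knowledge
distillation method. We divided feature-based distillation into four elements (teacher
transform function, student transform function, distillation feature position, and
feature distance function) and designed the most effective method for each. The proposed
distillation loss is based on a newly designed margin ReLU, a new distillation layer
position, and a partial L2 distance function, and it is designed to take only the
necessary information from the features. On the ImageNet dataset, the proposed method
achieves an error rate of 21.65% with ResNet50, which surpasses ResNet152, a network
three times deeper. We evaluated the proposed method on image classification, object
detection, and semantic segmentation and confirmed large performance improvements in all
of them.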

Language: eng

URI: https://hdl.handle.net/10371/162006

http://dcollection.snu.ac.kr/common/orgView/000000157818

Files in This Item:

000000157818.pdf 3.68 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
