Kernel-convoluted Deep Neural Networks with Data Augmentation

Abstract: Deep neural networks have performed remarkably well in various fields, including image classification. Networks are generally trained by empirical risk minimization (ERM); however, this approach can yield a small training error but a large test error, a phenomenon known as overfitting. To alleviate this problem, data augmentation has been widely used in deep learning. Data augmentation trains on a transformed data set instead of the given original training data set. In image classification in particular, horizontal flips and rotations of images have become standard practice. As a sample-mixing augmentation, the Mixup method (Zhang et al. 2018), which trains on linearly interpolated data, has emerged as a useful tool for improving generalization ability and robustness to adversarial examples. Its motivation is to curtail undesirable oscillations through an implicit model constraint that makes the model behave linearly between observed data points, thereby promoting smoothness. In this thesis, we formally investigate this premise, propose a way to impose such constraints explicitly, and extend it to incorporate implicit model constraints. Furthermore, we extend the proposed model to semi-supervised classification. In both tasks, we quantify each algorithm's generalization ability via empirical Rademacher complexity.
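A minimal Python sketch of the Mixup interpolation described above, included only for illustration. It assumes one-hot label vectors and draws the mixing weight lambda from Beta(alpha, alpha) as in Zhang et al. (2018); the function names `sample_lambda` and `mixup_pair` are hypothetical and not taken from the thesis.

```python
import random
from typing import List, Sequence, Tuple


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Draw the Mixup weight lambda from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup_pair(
    x_i: Sequence[float],
    y_i: Sequence[float],
    x_j: Sequence[float],
    y_j: Sequence[float],
    lam: float,
) -> Tuple[List[float], List[float]]:
    """Return (lam * x_i + (1 - lam) * x_j, lam * y_i + (1 - lam) * y_j).

    Inputs are feature vectors; labels are ASSUMED to be one-hot vectors,
    so the mixed label is a soft label with the same interpolation weights.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix
```

For example, `mixup_pair([0.0, 0.0], [1.0, 0.0], [2.0, 4.0], [0.0, 1.0], 0.25)` returns `([1.5, 3.0], [0.25, 0.75])`: the mixed input lies three quarters of the way from the first point to the second, and the label is mixed with the same weights.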
First, we propose kernel-convoluted models (KCM), in which the smoothness constraint is imposed explicitly by locally averaging all shifted versions of the original function with a kernel function. We apply the KCM to both supervised and semi-supervised classification.
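The kernel convolution above can be written as f_K(x) = ∫ f(x - u) K(u) du, a kernel-weighted average of shifted copies of f. The sketch below estimates it by Monte Carlo under an assumed isotropic Gaussian kernel with bandwidth `h`; the kernel choice, the sampling scheme, and the name `kernel_convolved` are illustrative assumptions, not the thesis's implementation.

```python
import random
from typing import Callable, Sequence


def kernel_convolved(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    h: float,
    n_samples: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of f_K(x) = E_{u ~ N(0, h^2 I)}[f(x - u)].

    ASSUMPTION: K is an isotropic Gaussian kernel with bandwidth h; the
    thesis's analysis allows kernels satisfying mild conditions, not only this one.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if h < 0:
        raise ValueError("bandwidth h must be non-negative")
    total = 0.0
    for _ in range(n_samples):
        # Draw a shift u from the kernel and evaluate the shifted function f(x - u).
        shifted = [xi - rng.gauss(0.0, h) for xi in x]
        total += f(shifted)
    # Averaging the shifted evaluations approximates the kernel-weighted integral.
    return total / n_samples
```

For f(v) = v[0]**2 the exact convolution with this kernel is x[0]**2 + h**2, so `kernel_convolved(lambda v: v[0] ** 2, [0.0], 1.0, 20000, random.Random(0))` should return a value close to 1.0 even though f(0) = 0, which illustrates how the averaging mixes in values of f from a neighbourhood of x.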
In supervised classification, we also propose incorporating the Mixup method into the KCM to expand the domain over which smoothness is enforced. For both the KCM and the KCM combined with Mixup, we provide a risk analysis under mild conditions on the kernel function and the interpolation policy. As a result, we show that the upper bound on the excess risk over the new function class decays at a rate no slower than that over the original function class. Experiments on several datasets, including the two-moon dataset, CIFAR-10, and CIFAR-100, demonstrate that imposing explicit smoothness on models yields better generalization and robustness to adversarial examples than imposing only implicit smoothness or none at all.
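The risk analysis is stated in terms of empirical Rademacher complexity, R̂_S(F) = E_σ[ sup_{f in F} (1/n) Σ_i σ_i f(x_i) ], where σ_1, …, σ_n are independent uniform ±1 signs. As a toy computation of this definition only, the sketch below estimates it by Monte Carlo for a finite class given by its values on a fixed sample; the function classes in the thesis are not finite, and the name `empirical_rademacher` is hypothetical.

```python
import random
from typing import Sequence


def empirical_rademacher(
    values: Sequence[Sequence[float]],
    n_draws: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of E_sigma[ max_k (1/n) * sum_i sigma_i * f_k(x_i) ].

    values[k][i] holds f_k(x_i) for a FINITE class {f_1, ..., f_K} evaluated
    on a fixed sample x_1, ..., x_n; sigma_i are independent uniform +/-1 signs.
    """
    if n_draws <= 0 or not values:
        raise ValueError("need at least one draw and one function")
    n = len(values[0])
    total = 0.0
    for _ in range(n_draws):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # The supremum over a finite class is a maximum over its members.
        total += max(sum(s * v for s, v in zip(sigma, row)) / n for row in values)
    return total / n_draws
```

For the two-function class {f, -f} with f(x_1) = 1 on a single point, every sign draw gives a supremum of exactly 1, so `empirical_rademacher([[1.0], [-1.0]], 100, random.Random(0))` returns 1.0. A richer class can fit random signs better and therefore has a larger complexity.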
In semi-supervised classification, we focus on scenarios with sparsely labeled data and abundant unlabeled data. The aim is to find a classifier whose generalization performance is improved by the unlabeled data. To this end, we assume that inputs belonging to the same cluster share the same label, which is known as the cluster assumption. We propose a principled approach that applies the KCM under the cluster assumption and derive generalization error bounds for deep neural networks. Furthermore, we validate the proposed method through numerical studies.
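To make the cluster assumption concrete, the sketch below fills in missing labels by a majority vote of the labeled points within each cluster, given precomputed cluster assignments (for example from k-means). This is a simple hypothetical illustration of the assumption itself, not the thesis's procedure, which applies the KCM under this assumption; the function name `propagate_cluster_labels` is made up for this example.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def propagate_cluster_labels(
    clusters: Sequence[int],
    labels: Sequence[Optional[int]],
) -> List[Optional[int]]:
    """Fill missing labels with the majority label of labeled points in the same cluster.

    clusters[i] is the cluster id of input i (ASSUMED precomputed, e.g. by k-means);
    labels[i] is its class, or None if unlabeled. Clusters with no labeled
    member are left as None.
    """
    votes: Dict[int, Counter] = {}
    for c, y in zip(clusters, labels):
        if y is not None:
            votes.setdefault(c, Counter())[y] += 1
    # Cluster assumption: every point in a cluster shares the cluster's label.
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    return [y if y is not None else majority.get(c) for c, y in zip(clusters, labels)]
```

With `clusters = [0, 0, 0, 1, 1, 2]` and `labels = [1, None, None, None, 0, None]`, the result is `[1, 1, 1, 0, 0, None]`; cluster 2 has no labeled member, so its point stays unlabeled.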
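Deep neural networks have shown remarkable performance in various fields, including image classification. Empirical risk minimization, a widely used method for training neural networks, can cause overfitting, that is, a large gap between the error on the training data and the error on the test data. To alleviate this overfitting problem, data augmentation is widely used in deep learning. Data augmentation trains on data generated by applying specific transformations to the training data; in image classification, horizontal flips and rotations are representative examples. In addition, the sample-mixing augmentation method called Mixup trains on new data generated by linearly interpolating the training data; it not only yields high accuracy on test data but also improves robustness to adversarial examples. Linear interpolation has been interpreted as an implicit model constraint that promotes smoothness by imposing linearity between observations.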
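In this thesis, we formally investigate this premise and propose a model that can impose explicit model constraints. The model is defined as the convolution of a kernel function with the original model, and it imposes smoothness explicitly through local averaging of the original functions.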
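In addition, for supervised classification, we propose a method that integrates the proposed model with implicit model constraints. This integrated method broadens the range over which smoothness is imposed, which distinguishes it from existing data augmentation methods. We establish theoretical properties that quantify the generalization ability of both algorithms via empirical Rademacher complexity. We also demonstrate the empirical performance gains of the proposed methods by implementing deep neural networks on simulated data and various real datasets.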
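Unlike supervised classification, semi-supervised classification uses both labeled and unlabeled inputs for training. In most cases, the training data in semi-supervised classification contain few labeled examples and relatively many unlabeled ones. Improving classification performance on such data necessarily requires assumptions. The commonly used cluster assumption states that inputs in the same cluster have the same label. We propose a methodology that satisfies the cluster assumption through a clustering algorithm, and we extend the previously proposed model for explicit model constraints to semi-supervised classification. We establish the validity of this methodology through theoretical properties for deep neural networks and analyses of various datasets.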

Language: eng

URI: https://hdl.handle.net/10371/176092

https://dcollection.snu.ac.kr/common/orgView/000000164376

Appears in Collections:

College of Natural Sciences (자연과학대학)
- Dept. of Statistics (통계학과)
  - Theses (Ph.D. / Sc.D._통계학과)