Deep learning methodologies for semi-supervised and unsupervised cases

Abstract: This thesis proposes two learning methodologies for deep learning models,
one for semi-supervised learning and one for unsupervised learning.

For semi-supervised learning, we propose a new method, called GAB, that searches for a decision boundary whose neighborhood overlaps as little as possible with the support of the unlabeled data.
We construct a formal measure of the overlap between the neighborhood of a given decision boundary and the support of the unlabeled data, and we develop an algorithm that learns the model by minimizing this measure as a penalty term. We prove theoretically that GAB successfully finds the Bayes classifier. Because the penalty term must be approximated in practice, we devise an algorithm that generates artificial data near the current decision boundary using an adversarial training technique. We show empirically that GAB is competitive with recent methods in predictive performance while requiring far fewer computational resources.
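
The idea of generating artificial data near the current decision boundary can be illustrated with a minimal sketch. It assumes a linear binary classifier, for which the shortest path to the boundary is known in closed form. The thesis uses an adversarial training technique on deep models, so this is only a toy stand-in, and the names `score`, `artificial_point`, and `overshoot` are hypothetical.

```python
def score(x, w, b):
    """Signed logit of a linear binary classifier; the boundary is score == 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def artificial_point(x, w, b, overshoot=0.1):
    """Shift x along the gradient of the logit so it lands just across the boundary.

    For a linear model the gradient of the logit with respect to x is w, so a
    step of score / ||w||^2 along w reaches the boundary exactly. The extra
    `overshoot` fraction (an illustrative assumption) pushes the point slightly
    onto the other side.
    """
    norm_sq = sum(wi * wi for wi in w)
    step = (1.0 + overshoot) * score(x, w, b) / norm_sq
    return [xi - step * wi for xi, wi in zip(x, w)]


# Toy usage: one unlabeled point and a fixed classifier (both made up).
w, b = [2.0, -1.0], 0.5
x = [1.0, 1.0]
x_art = artificial_point(x, w, b)
print(score(x, w, b), score(x_art, w, b))
```

In this sketch the artificial point has a logit of opposite sign and much smaller magnitude than the original, so it sits close to the boundary. A penalty of the kind described above would then discourage boundaries whose nearby artificial points look like real unlabeled data.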

For unsupervised learning, we propose a method for generative models that maximizes the log-likelihood of the observable variables directly, using the EM algorithm and importance sampling instead of variational inference. A novel feature of the method is a warm-start technique that takes a convex combination of the expected complete log-likelihood and the variational lower bound in the E-step, which stabilizes learning and thus yields superior performance. The proposed method, called VAEM, outperforms other variational methods in test log-likelihood without much additional computational cost, generates sharper and more realistic images, and can be easily modified for nonstandard cases such as missing data, where the extension of variational methods is not obvious.
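
To make the role of importance sampling concrete, the following minimal sketch estimates the log-likelihood of one observation in a toy latent-variable model. The model (z ~ N(0, 1), x | z ~ N(z, 1)), the Gaussian proposal, and the function names are all assumptions chosen for illustration because the exact answer is known; they are not the thesis's model or implementation.

```python
import math
import random


def log_normal(v, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((v - mean) / std) ** 2


def is_log_likelihood(x, q_mean, q_std, num_samples=1000, seed=0):
    """Importance-sampling estimate of log p(x) with proposal q(z) = N(q_mean, q_std^2).

    Toy model (assumed for illustration): z ~ N(0, 1), x | z ~ N(z, 1).
    """
    rng = random.Random(seed)
    log_w = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, q_std)
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
        log_w.append(log_joint - log_normal(z, q_mean, q_std))
    # Log of the average weight, computed stably with the log-sum-exp trick.
    m = max(log_w)
    return m + math.log(sum(math.exp(lw - m) for lw in log_w)) - math.log(num_samples)


x = 1.3
exact = log_normal(x, 0.0, math.sqrt(2.0))            # marginal is N(0, 2)
good = is_log_likelihood(x, x / 2, math.sqrt(0.5))    # proposal equals the true posterior
rough = is_log_likelihood(x, x / 2, 1.0, num_samples=5000)
print(exact, good, rough)
```

When the proposal equals the true posterior, every importance weight equals p(x) and the estimate is exact; a mismatched proposal still gives a consistent estimate but with more variance. This is one way to see why the quality of the proposal matters when the likelihood is targeted directly.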

This thesis proposes new methodologies for training deep learning models in semi-supervised and unsupervised learning.

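First, for semi-supervised learning, we present a new method that trains the classifier so that artificial data near the current classifier's decision boundary overlap as little as possible with the real data. To implement this, we develop a new index that measures the degree of overlap between the neighborhood of the decision boundary and the real data, and the goal of the method is to find a classifier that minimizes this index. We also prove theoretically that the decision boundary of the classifier minimizing the proposed overlap index is nearly identical to that of the Bayes classifier. Approximating the overlap index requires generating artificial data near the decision boundary, and we propose a generation method based on adversarial training. Experiments on a variety of benchmark datasets show that the proposed semi-supervised method estimates a good classifier efficiently.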

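Second, for unsupervised learning, we propose a learning method that directly maximizes the likelihood of the observed variables through the EM algorithm and importance sampling. In particular, applying a warm-start approach, we develop a new method that uses a weighted average of the expected log joint likelihood and the variational lower bound in the E-step, and we show experimentally that it allows a good model to be estimated stably. As in the semi-supervised case, experiments show that the proposed unsupervised method outperforms competing methods and also applies well when missing data are present.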
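
The warm-start combination used in the E-step can be written schematically as follows. The notation is an assumption made here for illustration: the abstract does not give the symbols, which term the weight \alpha multiplies, or how \alpha is scheduled during training.

```latex
% Schematic E-step objective (notation assumed, not taken from the thesis):
%   \theta   : generative model parameters, \theta_{\text{old}} the current value
%   \phi     : parameters of the variational distribution q_\phi(z \mid x)
%   \alpha   : mixing weight in [0, 1]
\[
  Q_\alpha(\theta, \phi)
  = \alpha \,
    \underbrace{\mathbb{E}_{p_{\theta_{\text{old}}}(z \mid x)}
      \bigl[\log p_\theta(x, z)\bigr]}_{\text{expected complete log-likelihood}}
  + (1 - \alpha) \,
    \underbrace{\mathbb{E}_{q_\phi(z \mid x)}
      \bigl[\log p_\theta(x, z) - \log q_\phi(z \mid x)\bigr]}_{\text{variational lower bound}},
  \qquad 0 \le \alpha \le 1 .
\]
```

Under this reading, \alpha = 0 recovers a purely variational objective and \alpha = 1 a pure EM objective, with the first expectation approximated by importance sampling.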

Language: eng

URI: https://hdl.handle.net/10371/162436

http://dcollection.snu.ac.kr/common/orgView/000000157815

Files in This Item:

000000157815.pdf 4.17 MB

Appears in Collections:

College of Natural Sciences (자연과학대학)
- Dept. of Statistics (통계학과)
  - Theses (Ph.D. / Sc.D._통계학과)
