Intelligent Data Selection and Semi-Supervised Learning for Support Vector Regression

김동일

Alternative title (Korean): Support Vector Regression을 위한 지능적 데이터 선택 및 Semi-Supervised Learning

Authors: 김동일

Advisor: 조성준

Major: College of Engineering, Department of Industrial Engineering

Issue Date: 2013-02

Publisher: Seoul National University Graduate School

Keywords: Data mining ; Data selection ; Support Vector Machine ; Semi-Supervised Learning

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : Department of Industrial Engineering, 2013. 2. 조성준.

Abstract: Support Vector Regression (SVR), the regression version of Support Vector
Machines (SVM), employs the Structural Risk Minimization (SRM) principle
and has become one of the most prominent algorithms because it can
solve nonlinear problems using the kernel trick. Despite its strong
generalization performance, SVR still has open problems to overcome.
This dissertation studies two major open problems of SVR:
(1) training complexity and (2) Semi-Supervised SVR (SS-SVR).
The training complexity of SVR grows rapidly with the number of
training data n: the training time complexity is O(n^3) and the training
memory complexity is O(n^2), which makes SVR difficult to apply to large
real-world datasets. Reducing the number of training data is an effective
way to overcome this problem. To that end, this dissertation proposes a
data selection method, Margin based Data Selection (MDS). A data selection
approach is designed to select the important or informative data among all
training data. For SVR, the most important data are the support vectors.
Because of the ε-insensitive loss function and maximum margin learning, all
support vectors of SVR are located on or outside the ε-tube. Using multiple
sample learning, MDS efficiently estimates the margin for all training data
and selects a subset of the data by comparing the margin with ε. In
experiments conducted on 20 datasets, MDS performed better than the
benchmark methods. The training time of SVR, including the running time of
MDS, was 38% to 67% of the training time on the original datasets, while
the accuracy loss was 0% to 1% relative to the original SVR model.
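The selection idea described above can be illustrated with a minimal sketch. It is not the dissertation's MDS algorithm: a least-squares line stands in for SVR, the margin of each point is assumed to be its average absolute residual over several models fitted on small random samples, and all function names and parameters (`fit_line`, `estimate_margins`, `select_by_margin`, `n_models`, `sample_size`) are hypothetical.

```python
import random


def fit_line(points):
    """Least-squares line y = a*x + b; a cheap stand-in for an SVR model."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, mean_y - a * mean_x


def estimate_margins(data, n_models=10, sample_size=20, seed=0):
    """Estimate each point's margin from models fitted on small random samples.

    Assumption: the margin is approximated by the mean absolute residual
    across the sampled models.
    """
    rng = random.Random(seed)
    models = [fit_line(rng.sample(data, sample_size)) for _ in range(n_models)]
    margins = []
    for x, y in data:
        residuals = [abs(y - (a * x + b)) for a, b in models]
        margins.append(sum(residuals) / n_models)
    return margins


def select_by_margin(data, epsilon, **kwargs):
    """Keep points whose estimated margin is on or outside the epsilon-tube."""
    margins = estimate_margins(data, **kwargs)
    return [pt for pt, m in zip(data, margins) if m >= epsilon]
```

Points kept by `select_by_margin` are the candidates most likely to become support vectors, so the final model would be trained on that subset only. With O(n^3) training time, even a moderate reduction in n shrinks the dominant cost considerably.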
Datasets are growing larger, and data are collected from increasingly
varied applications. Because collecting labeled data is expensive
and time consuming, the ratio of unlabeled data to labeled
data is increasing. Conventional supervised learning methods
use only labeled data for training. Semi-Supervised Learning (SSL)
has been proposed to improve on conventional supervised learning
by training on the unlabeled data along with the labeled data. This
dissertation proposes a data generation and selection method for SS-SVR
training. To estimate the label distribution of the unlabeled data, the
Probabilistic Local Reconstruction (PLR) method was employed. For
robustness to noisy data, two PLRs (PLRlocal and PLRglobal) were
employed, and the final label distribution was obtained by the conjugation
of the two PLRs. Training data were then generated from the unlabeled data
according to their estimated label distributions. The data generation rate
varied with the uncertainty of the labeling. MDS was then employed to
offset the increase in training complexity caused by the generated data. In
experiments conducted on 18 datasets, the proposed method improved
accuracy by about 10% over conventional supervised SVR,
and its training time, including the construction
of the final SVR, was less than 25% of that of the benchmark methods.
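The data generation step can be sketched under strong assumptions. The abstract does not say how the two PLR estimates are combined or how the generation rate depends on uncertainty, so this example swaps in a precision-weighted product of two Gaussian label estimates and a made-up rule that draws more samples for more uncertain labels. The names `combine_gaussians`, `generate_labeled_points`, `max_copies`, and `var_scale` are hypothetical.

```python
import math
import random


def combine_gaussians(mean_a, var_a, mean_b, var_b):
    """Precision-weighted product of two Gaussian estimates.

    Assumption: this stands in for the combination of the local and
    global label estimates. Both variances must be positive.
    """
    precision = 1.0 / var_a + 1.0 / var_b
    mean = (mean_a / var_a + mean_b / var_b) / precision
    return mean, 1.0 / precision


def generate_labeled_points(x, mean, var, max_copies=5, var_scale=1.0, rng=None):
    """Draw synthetic (x, y) pairs for one unlabeled input x.

    Hypothetical rule: the number of generated points grows with the
    label variance and is capped at max_copies. The direction of this
    dependence is an assumption, not taken from the source.
    """
    if rng is None:
        rng = random.Random(0)
    copies = max(1, min(max_copies, math.ceil(max_copies * var / (var + var_scale))))
    std = math.sqrt(var)
    return [(x, rng.gauss(mean, std)) for _ in range(copies)]
```

The generated pairs would be appended to the labeled set, after which a selection step such as MDS trims the enlarged training set before the final SVR is fitted.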
Two applications are analyzed. For response modeling, an SVR-based
two-stage response model was proposed that identifies respondents in the
first stage and then ranks them according to expected profit in the second
stage. MDS was employed to reduce the training complexity
of two-stage response modeling. The experimental results showed
that the SVR-based two-stage response model could increase profit
over the conventional response model. MDS reduced the training complexity
of SVR to about 60% of that of the original SVR with minimal profit loss.
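The two-stage flow can be sketched as a filter followed by a sort. This assumes each customer already carries a first-stage response score and a second-stage expected profit from previously trained models. The field names, the `respond_threshold` parameter, and the function `two_stage_ranking` are hypothetical.

```python
def two_stage_ranking(customers, respond_threshold=0.5):
    """Rank predicted respondents by expected profit.

    customers: list of dicts with keys 'id', 'respond_score', and
    'expected_profit' (hypothetical field names).
    """
    # Stage 1: keep only customers predicted to respond.
    respondents = [c for c in customers if c["respond_score"] >= respond_threshold]
    # Stage 2: order the remaining customers by expected profit, highest first.
    ranked = sorted(respondents, key=lambda c: c["expected_profit"], reverse=True)
    return [c["id"] for c in ranked]
```

A customer with a high expected profit but a low response score is dropped in the first stage and never reaches the ranking, which is the main difference from sorting by profit alone.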
For Virtual Metrology (VM), the proposed SS-SVR method was applied
to a real-world VM dataset, using the unlabeled data together with the
labeled data for training. Data were collected from two pieces of equipment
in the photo process. The experimental results showed that the proposed
SS-SVR method improved accuracy by about 8% on average over the
conventional VM model. The proposed method was more accurate
than the benchmark methods, and its training time
was shorter than theirs.

Language: English

URI: https://hdl.handle.net/10371/118235

Files in This Item:

000000008858.pdf 1.93 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Industrial Engineering (산업공학과)
  - Theses (Ph.D. / Sc.D._산업공학과)
