Intelligent Data Selection and Semi-Supervised Learning for Support Vector Regression : Support Vector Regression을 위한 지능적 데이터 선택 및 Semi-Supervised Learning

DC Field | Value
dc.contributor.advisor | 조성준
dc.contributor.author | 김동일
dc.date.accessioned | 2017-07-13T06:03:39Z
dc.date.available | 2017-07-13T06:03:39Z
dc.date.issued | 2013-02
dc.identifier.other | 000000008858
dc.identifier.uri | https://hdl.handle.net/10371/118235
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School, Department of Industrial Engineering, February 2013. Advisor: 조성준.
dc.description.abstract |
Support Vector Regression (SVR), the regression version of Support Vector Machines (SVM), employs the Structural Risk Minimization (SRM) principle and has become one of the most prominent algorithms thanks to its ability to solve nonlinear problems using the kernel trick. Despite its strong generalization performance, open problems remain for SVR. This dissertation studies two of them: (1) training complexity and (2) Semi-Supervised SVR (SS-SVR).
The training complexity of SVR grows quickly with the number of training data n: O(n^3) time complexity and O(n^2) memory complexity. This makes SVR difficult to apply to large real-world datasets. An effective way to overcome this problem is to reduce the number of training data; data selection methods are designed to select the important or informative data among all training data. For SVR, the most important data are the support vectors: by the ε-insensitive loss function and maximum margin learning, all support vectors of SVR lie on or outside the ε-tube. This dissertation proposes a data selection method, Margin based Data Selection (MDS), to reduce the training complexity. Using multiple sample learning, MDS efficiently estimates the margin of every training datum and selects a subset of the data by comparing the margin with ε. In experiments on 20 datasets, MDS outperformed the benchmark methods: the training time of SVR, including the running time of MDS, was 38% to 67% of the training time on the original datasets, while the accuracy loss was only 0% to 1% relative to the original SVR model.
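The selection rule described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function name and sampling scheme are hypothetical, and a plain scikit-learn SVR trained on several small random samples stands in for the thesis's multiple sample learning. Points whose estimated margin (average absolute residual) is at least ε are kept as support-vector candidates.

```python
# Hypothetical sketch of margin-based data selection (MDS) for SVR.
# Margins are estimated by training SVRs on several small random samples
# and averaging absolute residuals; points with margin >= epsilon lie on
# or outside the eps-tube and are kept as support-vector candidates.
import numpy as np
from sklearn.svm import SVR

def margin_based_selection(X, y, epsilon=0.1, n_models=5, sample_size=50, seed=0):
    rng = np.random.default_rng(seed)
    residuals = np.zeros((n_models, len(y)))
    for i in range(n_models):
        # train on a small random sample instead of the full dataset
        idx = rng.choice(len(y), size=min(sample_size, len(y)), replace=False)
        model = SVR(kernel="rbf", C=1.0, epsilon=epsilon).fit(X[idx], y[idx])
        residuals[i] = np.abs(y - model.predict(X))
    margin = residuals.mean(axis=0)            # estimated margin per datum
    selected = np.where(margin >= epsilon)[0]  # on or outside the eps-tube
    return selected, margin

# Toy usage: select informative points from a noisy sine curve.
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.default_rng(1).normal(0, 0.05, 200)
selected, margin = margin_based_selection(X, y)
print(len(selected), "of", len(y), "points kept")
```

The full SVR is then trained only on the selected subset, which is where the reported 38% to 67% training-time reduction comes from.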
Recently, datasets have been growing larger, with data collected from a wide variety of applications. Since collecting labeled data is expensive and time consuming, the fraction of unlabeled to labeled data keeps increasing. Conventional supervised learning uses only the labeled data for training; Semi-Supervised Learning (SSL) was proposed to improve on it by training on the unlabeled data along with the labeled data. This dissertation proposes a data generation and selection method for SS-SVR training. To estimate the label distribution of the unlabeled data, the Probabilistic Local Reconstruction (PLR) method is employed. For robustness to noisy data, two PLRs (PLRlocal and PLRglobal) are used, and the final label distribution is obtained by conjugating the two. Training data are then generated from the unlabeled data according to their estimated label distributions, with a generation rate that varies with the uncertainty of the labeling. Finally, MDS is employed to reduce the training complexity added by the generated data. In experiments on 18 datasets, the proposed method improved accuracy by about 10% over conventional supervised SVR, and its training time, including the construction of the final SVR, was less than 25% of that of the benchmark methods.
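The data-generation step can be sketched as below. This is an illustrative stand-in, not the 2-PLR method: a k-nearest-neighbour estimate of each unlabeled point's label mean and spread replaces PLR, and the function name and generation rule are hypothetical. The key idea from the abstract survives: fewer pseudo-labeled points are generated where the labeling is more uncertain.

```python
# Hypothetical sketch of uncertainty-aware data generation for SS-SVR.
# A k-NN label estimate stands in for the thesis's 2-PLR distribution.
import numpy as np

def generate_from_unlabeled(X_lab, y_lab, X_unlab, k=3, max_gen=3, seed=0):
    rng = np.random.default_rng(seed)
    X_new, y_new = [], []
    for x in X_unlab:
        d = np.linalg.norm(X_lab - x, axis=1)
        nn = np.argsort(d)[:k]                     # labeled neighbours
        mu, sigma = y_lab[nn].mean(), y_lab[nn].std()
        # generation rate shrinks as labeling uncertainty (sigma) grows
        n_gen = max(1, int(round(max_gen / (1.0 + sigma))))
        for _ in range(n_gen):
            X_new.append(x)
            y_new.append(rng.normal(mu, sigma))    # sample from label dist.
    return np.array(X_new), np.array(y_new)

# Toy usage on a 1-D labeled set with two unlabeled points.
X_lab = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_lab = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X_unlab = np.array([[1.5], [3.5]])
X_new, y_new = generate_from_unlabeled(X_lab, y_lab, X_unlab)
```

The generated points are appended to the labeled set, after which MDS (previous chapter) prunes the enlarged training set before the final SVR is fit.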
Two applications are analyzed. For response modeling, an SVR-based two-stage response model is proposed: respondents are identified in the first stage and then ranked according to expected profit in the second stage, with MDS employed to reduce the training complexity. The experimental results showed that the SVR-based two-stage response model increased profit over the conventional response model, and MDS reduced the training complexity of SVR to about 60% of the original with minimal profit loss.
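The two-stage structure can be sketched in a few lines. This is a minimal illustration under assumed names and synthetic data, not the dissertation's model: a logistic-regression classifier stands in for the stage-1 respondent identifier, and an SVR estimates expected profit for stage-2 ranking.

```python
# Hypothetical two-stage response model: stage 1 classifies likely
# respondents, stage 2 ranks them by an SVR estimate of expected profit.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                                  # customer features
responded = (X[:, 0] + rng.normal(0, 0.5, 300) > 0).astype(int)
profit = np.where(responded == 1, 10 + 5 * X[:, 1], 0.0)       # profit if responded

clf = LogisticRegression().fit(X, responded)                   # stage 1: who responds
reg = SVR().fit(X[responded == 1], profit[responded == 1])     # stage 2: how much profit

candidates = X[clf.predict(X) == 1]              # predicted respondents only
ranking = np.argsort(-reg.predict(candidates))   # descending expected profit
```

In this setup, MDS would be applied before the stage-2 SVR fit, since that regression dominates the training cost.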
For Virtual Metrology (VM), the proposed SS-SVR method was applied to a real-world VM dataset, using the unlabeled data together with the labeled data for training. The data were collected from two pieces of equipment in the photo process. The experimental results showed that the proposed SS-SVR method improved accuracy by about 8% on average over the conventional VM model; it was more accurate than the benchmark methods while requiring considerably less training time.
dc.description.tableofcontents |
Abstract i
Notation iv
Contents vi
List of Tables ix
List of Figures x
1 Introduction 1
1.1 Support Vector Regression . . . . . . . . . . . . . . . . . . . 2
1.2 Data Selection . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Semi-Supervised Learning . . . . . . . . . . . . . . . . . . . 5
1.4 Contributions of this Dissertation . . . . . . . . . . . . . . . 6
2 Literature Review 9
2.1 Support Vector Regression . . . . . . . . . . . . . . . . . . . 9
2.2 Data Selection for Support Vector Regression . . . . . . . . 12
2.2.1 Time Complexity Reduction . . . . . . . . . . . . . . 12
2.2.2 Data Selection Method . . . . . . . . . . . . . . . . . 13
2.2.3 Data Selection Method for Support Vector Regression 14
2.3 Semi–Supervised Learning for Support Vector Regression . 17
2.3.1 Semi–Supervised Learning . . . . . . . . . . . . . . . 17
2.3.2 Semi–Supervised Learning for Regression . . . . . . 18
3 Data Selection for Support Vector Regression 23
3.1 Voting based Data Selection . . . . . . . . . . . . . . . . . . 24
3.2 Margin based Data Selection . . . . . . . . . . . . . . . . . 25
3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 28
3.3.1 Experiment Setting . . . . . . . . . . . . . . . . . . . 28
3.3.2 Experimental Results . . . . . . . . . . . . . . . . . 30
3.4 Parameter Analysis . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4 Data Generation and Selection for Semi-Supervised Support Vector Regression 42
4.1 Labeling the Unlabeled Data . . . . . . . . . . . . . . . . . 43
4.2 Data Generation . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Data Selection and Support Vector Regression . . . . . . . 50
4.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 50
4.4.1 Experiment Setting . . . . . . . . . . . . . . . . . . . 50
4.4.2 Experimental Results . . . . . . . . . . . . . . . . . 53
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Application 1: Data Selection for Response Modeling 71
5.1 Response Modeling . . . . . . . . . . . . . . . . . . . . . . . 71
5.2 Two–Stage Response Modeling . . . . . . . . . . . . . . . . 72
5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 73
5.3.1 Experiment Setting . . . . . . . . . . . . . . . . . . . 73
5.3.2 Experimental Results . . . . . . . . . . . . . . . . . 75
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6 Application 2: Semi-Supervised Support Vector Regression for Virtual Metrology 80
6.1 Virtual Metrology . . . . . . . . . . . . . . . . . . . . . . . 80
6.2 Semi–Supervised Learning for Virtual Metrology . . . . . . 82
6.3 Virtual Metrology Process . . . . . . . . . . . . . . . . . . . 83
6.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . 83
6.3.2 Data Acquisition . . . . . . . . . . . . . . . . . . . . 83
6.3.3 Data Preprocessing . . . . . . . . . . . . . . . . . . . 84
6.3.4 Feature Selection . . . . . . . . . . . . . . . . . . . . 84
6.3.5 Virtual Metrology Modeling . . . . . . . . . . . . . . 85
6.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 85
6.4.1 Experiment Setting . . . . . . . . . . . . . . . . . . . 85
6.4.2 Experimental Results . . . . . . . . . . . . . . . . . 86
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7 Conclusion 93
7.1 Summary and Contributions . . . . . . . . . . . . . . . . . . 93
7.2 Limitations and Future Work . . . . . . . . . . . . . . . . . 97
Bibliography 99
dc.format | application/pdf
dc.format.extent | 2026497 bytes
dc.format.medium | application/pdf
dc.language.iso | en
dc.publisher | 서울대학교 대학원 (Seoul National University Graduate School)
dc.subject | 데이터마이닝 (Data Mining)
dc.subject | 데이터 선택 (Data Selection)
dc.subject | Support Vector Machine
dc.subject | Semi-Supervised Learning
dc.subject.ddc | 670
dc.title | Intelligent Data Selection and Semi-Supervised Learning for Support Vector Regression
dc.title.alternative | Support Vector Regression을 위한 지능적 데이터 선택 및 Semi-Supervised Learning
dc.type | Thesis
dc.description.degree | Doctor
dc.citation.pages | 106
dc.contributor.affiliation | 공과대학 산업공학과 (College of Engineering, Department of Industrial Engineering)
dc.date.awarded | 2013-02
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
