Publications
Detailed Information
Intelligent Data Selection and Semi-Supervised Learning for Support Vector Regression : Support Vector Regression을 위한 지능적 데이터 선택 및 Semi-Supervised Learning
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 조성준 | - |
dc.contributor.author | 김동일 | - |
dc.date.accessioned | 2017-07-13T06:03:39Z | - |
dc.date.available | 2017-07-13T06:03:39Z | - |
dc.date.issued | 2013-02 | - |
dc.identifier.other | 000000008858 | - |
dc.identifier.uri | https://hdl.handle.net/10371/118235 | - |
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Industrial Engineering, February 2013. Advisor: 조성준. | - |
dc.description.abstract | Support Vector Regression (SVR), the regression counterpart of Support Vector Machines (SVM), is built on the Structural Risk Minimization (SRM) principle and has become one of the most widely used algorithms for solving nonlinear problems through the kernel trick. Despite its strong generalization performance, SVR still faces open problems. This dissertation studies two of them: (1) training complexity and (2) Semi-Supervised SVR (SS-SVR). The training complexity of SVR grows quickly with the number of training data n, at O(n³) in time and O(n²) in memory, which makes SVR difficult to apply to large real-world datasets. Reducing the number of training data is an effective remedy, and data selection methods aim to choose the most important or informative points among all training data. For SVR, the most important points are the support vectors: by the ε-insensitive loss and maximum-margin learning, every support vector lies on or outside the ε-tube. This dissertation proposes Margin based Data Selection (MDS), which efficiently estimates the margin of every training point by learning on multiple samples and then selects the subset of points whose estimated margin exceeds ε. In experiments on 20 datasets, MDS outperformed the benchmark methods: the training time of SVR, including the running time of MDS, was 38%–67% of the training time on the original datasets, while the accuracy loss was only 0%–1% relative to the original SVR model.
Datasets are also growing larger and are collected from ever more applications. Because labeled data are expensive and time-consuming to obtain, the fraction of unlabeled data keeps increasing, yet conventional supervised learning trains on labeled data alone. Semi-Supervised Learning (SSL) improves on supervised learning by training on unlabeled data together with labeled data. This dissertation proposes a data generation and selection method for SS-SVR training. The label distribution of the unlabeled data is estimated with the Probabilistic Local Reconstruction (PLR) method; for robustness to noisy data, two PLRs (a local and a global one) are combined to obtain the final label distribution. Training data are then generated from the unlabeled data according to their estimated label distributions, at a generation rate that varies with labeling uncertainty, and MDS is applied to curb the training complexity added by the generated data. In experiments on 18 datasets, the proposed method improved accuracy by about 10% over conventional supervised SVR, while its training time, including construction of the final SVR, was less than 25% of that of the benchmark methods.
Two applications are analyzed. For response modeling, an SVR-based two-stage response model is proposed, which identifies respondents in the first stage and ranks them by expected profit in the second; MDS is employed to reduce its training complexity. The experimental results showed that the SVR-based two-stage model increased profit over the conventional response model, and MDS reduced the training complexity of SVR to about 60% of the original SVR with minimal profit loss. For Virtual Metrology (VM), the proposed SS-SVR method was applied to a real-world VM dataset, training on unlabeled data together with labeled data collected from two pieces of equipment in the photo process.
The experimental results showed that the proposed SS-SVR method improved accuracy by about 8% on average over the conventional VM model, achieving better accuracy than the benchmark methods with comparatively little training time. | - |
dc.description.tableofcontents | Abstract i
Notation iv
Contents vi
List of Tables ix
List of Figures x
1 Introduction 1
1.1 Support Vector Regression 2
1.2 Data Selection 3
1.3 Semi-Supervised Learning 5
1.4 Contributions of this Dissertation 6
2 Literature Review 9
2.1 Support Vector Regression 9
2.2 Data Selection for Support Vector Regression 12
2.2.1 Time Complexity Reduction 12
2.2.2 Data Selection Method 13
2.2.3 Data Selection Method for Support Vector Regression 14
2.3 Semi-Supervised Learning for Support Vector Regression 17
2.3.1 Semi-Supervised Learning 17
2.3.2 Semi-Supervised Learning for Regression 18
3 Data Selection for Support Vector Regression 23
3.1 Voting based Data Selection 24
3.2 Margin based Data Selection 25
3.3 Experimental Results 28
3.3.1 Experiment Setting 28
3.3.2 Experimental Results 30
3.4 Parameter Analysis 35
3.5 Summary 39
4 Data Generation and Selection for Semi-Supervised Support Vector Regression 42
4.1 Labeling the Unlabeled Data 43
4.2 Data Generation 46
4.3 Data Selection and Support Vector Regression 50
4.4 Experimental Results 50
4.4.1 Experiment Setting 50
4.4.2 Experimental Results 53
4.5 Summary 69
5 Application 1: Data Selection for Response Modeling 71
5.1 Response Modeling 71
5.2 Two-Stage Response Modeling 72
5.3 Experimental Results 73
5.3.1 Experiment Setting 73
5.3.2 Experimental Results 75
5.4 Summary 78
6 Application 2: Semi-Supervised Support Vector Regression for Virtual Metrology 80
6.1 Virtual Metrology 80
6.2 Semi-Supervised Learning for Virtual Metrology 82
6.3 Virtual Metrology Process 83
6.3.1 Overview 83
6.3.2 Data Acquisition 83
6.3.3 Data Preprocessing 84
6.3.4 Feature Selection 84
6.3.5 Virtual Metrology Modeling 85
6.4 Experimental Results 85
6.4.1 Experiment Setting 85
6.4.2 Experimental Results 86
6.5 Summary 91
7 Conclusion 93
7.1 Summary and Contributions 93
7.2 Limitations and Future Work 97
Bibliography 99 | - |
dc.format | application/pdf | - |
dc.format.extent | 2026497 bytes | - |
dc.format.medium | application/pdf | - |
dc.language.iso | en | - |
dc.publisher | 서울대학교 대학원 | - |
dc.subject | Data Mining | - |
dc.subject | Data Selection | - |
dc.subject | Support Vector Machine | - |
dc.subject | Semi-Supervised Learning | - |
dc.subject.ddc | 670 | - |
dc.title | Intelligent Data Selection and Semi-Supervised Learning for Support Vector Regression | - |
dc.title.alternative | Support Vector Regression을 위한 지능적 데이터 선택 및 Semi-Supervised Learning | - |
dc.type | Thesis | - |
dc.description.degree | Doctor | - |
dc.citation.pages | 106 | - |
dc.contributor.affiliation | College of Engineering, Department of Industrial Engineering | - |
dc.date.awarded | 2013-02 | - |
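The margin-based selection idea summarized in the abstract can be sketched in code. The following is a minimal illustration, not the dissertation's exact MDS algorithm: it trains small-sample SVRs to estimate each point's margin (absolute residual), then keeps the points whose estimated margin is at least ε, i.e. the candidate support vectors on or outside the ε-tube. The function name, sampling scheme, and parameter values are all illustrative assumptions, with scikit-learn's SVR standing in for the dissertation's implementation.

```python
import numpy as np
from sklearn.svm import SVR

def margin_based_selection(X, y, eps=0.1, n_samples=5, sample_size=100, seed=None):
    """Estimate margins from SVRs fit on random subsamples; keep points
    whose mean estimated margin is >= eps (illustrative MDS-style sketch)."""
    rng = np.random.default_rng(seed)
    margins = np.zeros((n_samples, len(X)))
    for i in range(n_samples):
        idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        model = SVR(epsilon=eps).fit(X[idx], y[idx])
        # margin estimate for every training point from this sample's model
        margins[i] = np.abs(y - model.predict(X))
    mean_margin = margins.mean(axis=0)
    return mean_margin >= eps  # True for points on/outside the eps-tube

# usage: select a subset, then train the final SVR on it only
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=500)
mask = margin_based_selection(X, y, eps=0.1, seed=0)
small_svr = SVR(epsilon=0.1).fit(X[mask], y[mask])
```

Because the final SVR is trained only on the selected subset, training cost drops roughly with the fraction of points kept, which mirrors the 38%–67% training-time figures reported in the abstract.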
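The semi-supervised data-generation step described in the abstract can likewise be sketched. Here a k-nearest-neighbor mean/variance estimate stands in for the dissertation's Probabilistic Local Reconstruction (PLR), and the generation rate shrinks with labeling uncertainty, as the abstract describes; every name and parameter below is an illustrative assumption rather than the actual SS-SVR procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def generate_pseudo_labeled(X_lab, y_lab, X_unlab, k=5, max_copies=3, seed=None):
    """Estimate a label distribution for each unlabeled point from its labeled
    neighbors, then generate pseudo-labeled copies at a certainty-dependent rate."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k).fit(X_lab)
    _, idx = nn.kneighbors(X_unlab)
    mu = y_lab[idx].mean(axis=1)    # estimated label mean
    sigma = y_lab[idx].std(axis=1)  # estimated label uncertainty
    # generate more copies for points whose label estimate is more certain
    weight = 1.0 / (1.0 + sigma)
    n_copies = np.ceil(max_copies * weight / weight.max()).astype(int)
    X_gen, y_gen = [], []
    for x, m, s, c in zip(X_unlab, mu, sigma, n_copies):
        for _ in range(c):
            X_gen.append(x)
            y_gen.append(rng.normal(m, s))  # sample label from estimated dist.
    return np.array(X_gen), np.array(y_gen)

rng = np.random.default_rng(0)
X_lab = rng.uniform(-3, 3, size=(50, 1))
y_lab = np.sin(X_lab).ravel()
X_unlab = rng.uniform(-3, 3, size=(200, 1))
X_gen, y_gen = generate_pseudo_labeled(X_lab, y_lab, X_unlab, seed=0)
```

In the dissertation's pipeline the generated data would then be combined with the labeled data and passed through MDS before training the final SVR, so that the added points do not blow up training complexity.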