
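Comparative study of computational algorithms for the Lasso under high-dimensional, highly correlated data (Korean title, translated: A comparative study of computational algorithms for the Lasso on high-dimensional data with high correlation coefficients)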

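DC Field | Value | Language
dc.contributor.advisor | 임요한 (Lim Johan) | -
dc.contributor.author | 김백진 (Kim Baekjin) | -
dc.date.accessioned | 2017-07-19T08:46:04Z | -
dc.date.available | 2017-07-19T08:46:04Z | -
dc.date.issued | 2016-02 | -
dc.identifier.other | 000000132238 | -
dc.identifier.uri | https://hdl.handle.net/10371/131305 | -
dc.description | Thesis (Master's) -- Graduate School, Seoul National University: Department of Statistics, February 2016. Advisor: 임요한 (Lim Johan). | -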
dc.description.abstract | Variable selection is important in high-dimensional data analysis. Lasso regression is useful because it produces sparse estimates, applies a soft decision rule (soft thresholding), and is computationally efficient.
However, because the Lasso-penalized likelihood contains a nondifferentiable term (the ℓ1 penalty), standard gradient-based optimization tools cannot be applied directly. Many algorithms have been proposed to optimize this penalized likelihood in high-dimensional settings, including the coordinate descent (CD) algorithm, majorization-minimization using local quadratic approximation (MM-LQA), the fast iterative shrinkage-thresholding algorithm (FISTA), and the alternating direction method of multipliers (ADMM).
In this thesis, we compare the relative merits of these algorithms, with particular attention to their numerical sensitivity to correlation among the covariates. Our simulation study varies the factors that affect the condition number of the covariates' covariance matrix, as well as the level of penalization. We then apply the algorithms to cancer biomarker discovery and compare their convergence speed and stability.
| -
dc.description.tableofcontents | 1 Introduction 1

2 Preliminaries 5
2.1. Coordinate Descent Algorithm (CD) 5
2.2. Majorization-Minimization using Local Quadratic Approximation (MM-LQA) 6
2.3. Fast Iterative Shrinkage Thresholding Algorithm (FISTA) 9
2.4. Alternating Direction Methods of Multipliers (ADMM) 11

3 Numerical Study 15
3.1. Method 15
3.1.1. Design of numerical study 15
3.1.2. Data generation 17
3.1.3. Algorithm parameters 17
3.2. Results of numerical study 18
3.2.1. Sensitivity to the condition number of population covariance matrices 19
3.2.2. Sensitivity to the ratio p/n 20
3.2.3. Sensitivity to the regularization parameter 21
3.2.4. Accuracy 22
3.2.5. Computation time 22
3.2.6. Oscillation of ADMM 23
3.2.7. Non-convergence 23

4 Application to cancer biomarker discovery 31
4.1. Method 32
4.2. Results 33

5 Conclusion 37

Appendices 43
0.1. Preconditioned conjugate gradient (PCG) method 44

Abstract (in Korean) 47
| -
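dc.format | application/pdf | -
dc.format.extent | 2086515 bytes | -
dc.format.medium | application/pdf | -
dc.language.iso | en | -
dc.publisher | Graduate School, Seoul National University | -
dc.subject | Lasso | -
dc.subject.ddc | 519 | -
dc.title | Comparative study of computational algorithms for the Lasso under high-dimensional, highly correlated data | -
dc.title.alternative | 고차원의 상관계수가 높은 자료에서의 라쏘의 계산 알고리즘들의 비교 연구 (A comparative study of computational algorithms for the Lasso on high-dimensional data with high correlation coefficients) | -
dc.type | Thesis | -
dc.contributor.AlternativeAuthor | Kim Baekjin | -
dc.description.degree | Master | -
dc.citation.pages | 47 | -
dc.contributor.affiliation | College of Natural Sciences, Department of Statistics | -
dc.date.awarded | 2016-02 | -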
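The abstract names four Lasso solvers without showing their update rules. The NumPy sketch below illustrates one standard form of each; it is not the thesis's code. Assumptions (all mine, not from this record): the objective is scaled as (1/(2n))‖y − Xβ‖² + λ‖β‖₁; MM-LQA uses an eps-perturbed quadratic bound and a ridge starting value; ADMM uses the penalty parameter rho = 1; MM-LQA and ADMM solve a dense p × p system (for large p an iterative solver such as preconditioned conjugate gradient is a common alternative); and the toy data are equicorrelated. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_objective(X, y, beta, lam):
    """Assumed scaling: (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1."""
    r = y - X @ beta
    return r @ r / (2 * X.shape[0]) + lam * np.abs(beta).sum()


def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent: exact minimisation in one coefficient at a time."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float)                    # running residual y - X beta (a copy)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) ||x_j||^2, assumed nonzero
    for _ in range(n_iter):
        for j in range(p):
            rho_j = X[:, j] @ r / n + col_sq[j] * beta[j]  # fit to partial residual
            new = soft_threshold(rho_j, lam) / col_sq[j]
            r -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def lasso_mm_lqa(X, y, lam, eps=1e-8, n_iter=500):
    """MM with the local quadratic bound |b| <= b^2/(2|b0|) + |b0|/2 (eps-perturbed)."""
    n, p = X.shape
    G, Xty = X.T @ X / n, X.T @ y / n
    beta = np.linalg.solve(G + lam * np.eye(p), Xty)   # ridge start (an assumption)
    for _ in range(n_iter):
        weights = lam / (np.abs(beta) + eps)           # curvature of the surrogate
        beta = np.linalg.solve(G + np.diag(weights), Xty)
    return beta


def lasso_fista(X, y, lam, n_iter=500):
    """FISTA: proximal gradient step on the smooth part plus Nesterov momentum."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the gradient
    beta, z, t = np.zeros(p), np.zeros(p), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y) / n
        beta_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)
        beta, t = beta_new, t_new
    return beta


def lasso_admm(X, y, lam, rho=1.0, n_iter=500):
    """Scaled-form ADMM on the split beta = z; the l1 penalty acts on z."""
    n, p = X.shape
    A = X.T @ X / n + rho * np.eye(p)      # dense p x p system, fine for small p
    Xty = X.T @ y / n
    z, u = np.zeros(p), np.zeros(p)
    for _ in range(n_iter):
        beta = np.linalg.solve(A, Xty + rho * (z - u))
        z = soft_threshold(beta + u, lam / rho)
        u += beta - z                      # scaled dual update
    return z


# Hypothetical toy data: equicorrelated covariates, Cov(x) = (1 - c) I + c 11^T.
rng = np.random.default_rng(0)
n, p, c, lam = 100, 20, 0.5, 0.1
X = np.sqrt(1 - c) * rng.standard_normal((n, p)) + np.sqrt(c) * rng.standard_normal((n, 1))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_normal(n)

fits = {name: solver(X, y, lam, n_iter=2000) for name, solver in
        [("CD", lasso_cd), ("MM-LQA", lasso_mm_lqa),
         ("FISTA", lasso_fista), ("ADMM", lasso_admm)]}
for name, b in fits.items():
    print(f"{name:7s} objective = {lasso_objective(X, y, b, lam):.8f}")
```

Because all four methods target the same convex objective, their objective values should agree closely on this well-conditioned toy problem. Raising c towards 1, or making p larger than n, is one way to probe the kind of sensitivity to correlation that the thesis studies.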
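The simulation design varies factors that drive the condition number of the covariates' covariance matrix. This record does not say which covariance structures the thesis uses, so the sketch below is only an illustration built on two common choices, an equicorrelation matrix and an AR(1)-type matrix. It checks the equicorrelation case against its closed-form condition number (1 + (p − 1)c)/(1 − c). It also shows that a sample covariance built from fewer rows than columns is rank deficient, which is one reason the ratio p/n matters. Function names are hypothetical.

```python
import numpy as np


def equicorr_cov(p, c):
    """Equicorrelation matrix: ones on the diagonal, c everywhere else."""
    return (1.0 - c) * np.eye(p) + c * np.ones((p, p))


def ar1_cov(p, c):
    """AR(1)-type matrix whose (i, j) entry is c ** |i - j|."""
    idx = np.arange(p)
    return c ** np.abs(idx[:, None] - idx[None, :])


def equicorr_condition_number(p, c):
    """Closed form for 0 <= c < 1: eigenvalues 1 + (p - 1)c (once) and 1 - c (p - 1 times)."""
    return (1.0 + (p - 1) * c) / (1.0 - c)


p = 50
for c in (0.0, 0.5, 0.9, 0.99):
    print(f"c = {c:4}: equicorrelation cond = {np.linalg.cond(equicorr_cov(p, c)):10.1f}, "
          f"AR(1) cond = {np.linalg.cond(ar1_cov(p, c)):10.1f}")

# With n < p the (centred) sample covariance has rank at most n - 1, so it is singular.
rng = np.random.default_rng(1)
n = 30
Z = rng.multivariate_normal(np.zeros(p), equicorr_cov(p, 0.5), size=n)
S = np.cov(Z, rowvar=False)
print("rank of sample covariance:", np.linalg.matrix_rank(S), "of", p)
```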
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.