Comparative study of computational algorithms for the Lasso under high-dimensional, highly correlated data : 고차원의 상관계수가 높은 자료에서의 라쏘의 계산 알고리즘들의 비교 연구
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 임요한 | - |
dc.contributor.author | 김백진 | - |
dc.date.accessioned | 2017-07-19T08:46:04Z | - |
dc.date.available | 2017-07-19T08:46:04Z | - |
dc.date.issued | 2016-02 | - |
dc.identifier.other | 000000132238 | - |
dc.identifier.uri | https://hdl.handle.net/10371/131305 | - |
dc.description | Thesis (Master's) -- Seoul National University Graduate School : Department of Statistics, 2016. 2. 임요한. | - |
dc.description.abstract | Variable selection is important in high-dimensional data analysis. Lasso regression is useful because it combines sparsity, a soft-decision rule, and computational efficiency. However, because the Lasso penalized likelihood contains a nondifferentiable term, standard optimization tools cannot be applied. Many computational algorithms have been proposed to optimize the Lasso penalized likelihood in high-dimensional settings, including the coordinate descent (CD) algorithm, majorization-minimization using local quadratic approximation, the fast iterative shrinkage thresholding algorithm (FISTA), and the alternating direction method of multipliers (ADMM). In this paper, we undertake a comparative study of the relative merits of these algorithms. We are especially concerned with their numerical sensitivity to correlation between the covariates. We conduct a simulation study that varies factors affecting the condition number of the covariance matrix of the covariates, as well as the level of penalization. We also apply the algorithms to cancer biomarker discovery and compare their convergence speed and stability. | - |
dc.description.tableofcontents | 1 Introduction; 2 Preliminaries; 2.1. Coordinate Descent Algorithm (CD); 2.2. Majorization-Minimization using Local Quadratic Approximation (MM-LQA); 2.3. Fast Iterative Shrinkage Thresholding Algorithm (FISTA); 2.4. Alternating Direction Methods of Multipliers (ADMM); 3 Numerical Study; 3.1. Method; 3.1.1. Design of numerical study; 3.1.2. Data generation; 3.1.3. Algorithm parameters; 3.2. Results of numerical study; 3.2.1. Sensitivity to the condition number of population covariance matrices; 3.2.2. Sensitivity to the ratio p/n; 3.2.3. Sensitivity to the regularization parameter; 3.2.4. Accuracy; 3.2.5. Computation time; 3.2.6. Oscillation of ADMM; 3.2.7. Non-convergence; 4 Application to cancer biomarker discovery; 4.1. Method; 4.2. Results; 5 Conclusion; Appendices; 0.1. Preconditioned conjugate gradient (PCG) method; Abstract in Korean | - |
dc.format | application/pdf | - |
dc.format.extent | 2086515 bytes | - |
dc.format.medium | application/pdf | - |
dc.language.iso | en | - |
dc.publisher | Seoul National University Graduate School | - |
dc.subject | Lasso | - |
dc.subject.ddc | 519 | - |
dc.title | Comparative study of computational algorithms for the Lasso under high-dimensional, highly correlated data | - |
dc.title.alternative | 고차원의 상관계수가 높은 자료에서의 라쏘의 계산 알고리즘들의 비교 연구 | - |
dc.type | Thesis | - |
dc.contributor.AlternativeAuthor | Kim Baekjin | - |
dc.description.degree | Master | - |
dc.citation.pages | 47 | - |
dc.contributor.affiliation | College of Natural Sciences, Department of Statistics | - |
dc.date.awarded | 2016-02 | - |
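
The abstract lists coordinate descent (CD) among the algorithms compared. As a rough illustration of that idea only, here is a minimal pure-Python sketch of cyclic coordinate descent with soft-thresholding. It is not taken from the thesis: the objective scaling (1/(2n))·||y − Xβ||² + λ·||β||₁, the function names `soft_threshold` and `lasso_cd`, and the stopping rule are all assumptions made for this example.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for the (assumed) objective
    (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1.

    X is a list of n rows with p entries each; y is a list of length n.
    Names and defaults are hypothetical, chosen for illustration.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = [float(v) for v in y]  # residual y - X beta, with beta = 0
    # Per-column scale (1/n) * x_j^T x_j, computed once.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column keeps its coefficient at zero
            # Correlation of column j with the partial residual that
            # excludes coordinate j's own contribution.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coordinate moved appreciably in this sweep
    return beta
```

With an orthogonal design such as `X = [[1, 1], [1, -1]]`, each coordinate update reduces to soft-thresholding the corresponding entry of XᵀY/n, which gives a simple way to sanity-check the sketch. When covariates are highly correlated, the setting the thesis studies, the coordinates interact and many more sweeps may be needed.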
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.