Two Stage Dantzig Selector for High Dimensional Data
- College of Natural Sciences, Department of Statistics
- Issue Date
- Seoul National University Graduate School
- High dimensional regression; variable selection; Dantzig selector; selection consistency; oracle estimator; inverse covariance matrix estimation
- Thesis (Ph.D.) -- Seoul National University Graduate School : Department of Statistics, February 2014. 김용대.
- Variable selection is important in high dimensional regression. Traditional variable selection methods such as stepwise selection are unstable, meaning that the set of selected variables varies from one data set to another. As an alternative, a series of penalized methods have been used to perform estimation and variable selection simultaneously. The LASSO yields sparse solutions, but it is biased and not selection consistent. Non-convex penalized methods such as the SCAD and the MCP are known to be selection consistent and to yield unbiased estimators. However, they suffer from multiple local minima, and their computation is unstable with respect to the tuning parameter. Two stage methods based on the LASSO, such as the one step LLA and the calibrated CCCP, have been developed to obtain the oracle estimator as the unique local minimum.
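As a rough illustration of how a two stage method such as the one step LLA reduces bias, the sketch below computes the SCAD penalty derivative, which serves as the per-coefficient l1 weight in the second stage. The function name `scad_derivative`, the conventional default a = 3.7, and the initial estimates are assumptions chosen for illustration, not values taken from the thesis.

```python
def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty at |t| (a = 3.7 is a common default)."""
    t = abs(t)
    if t <= lam:
        # Small coefficients are penalized exactly like the LASSO.
        return lam
    # The penalty rate decays linearly and vanishes once |t| >= a * lam.
    return max(a * lam - t, 0.0) / (a - 1.0)


lam = 0.5
# Hypothetical first-stage (e.g. LASSO) estimates, for illustration only.
initial = [0.0, 0.3, 1.0, 2.5]

# Second-stage weights: the one step LLA solves a weighted l1 problem
# with weight scad_derivative(|initial_j|) on coefficient j.
weights = [scad_derivative(b, lam) for b in initial]
print(weights)
```

Coefficients whose first-stage estimate exceeds a * lam receive zero weight and are therefore left unpenalized in the second stage, while coefficients estimated near zero keep the full LASSO penalty.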
We propose a two stage method based on the Dantzig selector. The motivation is that lessening the effect of the noise variables is important in a two stage method. The l1 norm of the Dantzig selector is always less than or equal to that of the LASSO, and the non-asymptotic error bounds of the Dantzig selector tend to be smaller than those of the LASSO for the same tuning parameter. We therefore expect that using the Dantzig selector instead of the LASSO in the two stage method improves estimation, while the proposed method still satisfies selection consistency. The results of numerical experiments support this claim.
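The l1 norm comparison can be checked numerically. The sketch below is a minimal illustration, not the thesis's implementation: it solves the Dantzig selector as a linear program with `scipy.optimize.linprog` and the LASSO by plain coordinate descent, both with the same tuning parameter. The simulated data, the helper names `dantzig_selector` and `lasso_cd`, and the scaling of the objective by 1/n are all assumptions. The inequality holds because the optimality conditions of the LASSO make its solution feasible for the Dantzig selector's constraint.

```python
import numpy as np
from scipy.optimize import linprog

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n, p, lam = 60, 8, 0.1
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

G = X.T @ X / n  # Gram matrix
c = X.T @ y / n  # marginal correlations with the response


def dantzig_selector(G, c, lam):
    """min ||b||_1 subject to ||c - G b||_inf <= lam, with b = u - v, u, v >= 0."""
    p = len(c)
    A_ub = np.block([[-G, G], [G, -G]])
    b_ub = np.concatenate([lam - c, lam + c])
    res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub,
                  bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:]


def lasso_cd(G, c, lam, sweeps=2000):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    beta = np.zeros(len(c))
    for _ in range(sweeps):
        for j in range(len(c)):
            # Partial residual correlation with coordinate j removed.
            rho = c[j] - G[j] @ beta + G[j, j] * beta[j]
            # Soft-thresholding update.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / G[j, j]
    return beta


beta_ds = dantzig_selector(G, c, lam)
beta_lasso = lasso_cd(G, c, lam)
print("l1 norm, Dantzig selector:", np.abs(beta_ds).sum())
print("l1 norm, LASSO:           ", np.abs(beta_lasso).sum())
```

Writing the coefficient vector as the difference of two nonnegative vectors is a standard way to turn an l1 objective into a linear one; other solvers would serve equally well.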
We also apply these two stage methods, based on either the LASSO or the Dantzig selector, to the estimation of the inverse covariance matrix (also known as the precision matrix). Precision matrix estimation is essential not only because it is used in various applications but also because, under the normality assumption, the precision matrix captures the direct relationships between variables through their conditional dependence. Under some regularity conditions, our methods are selection consistent and yield columnwise root-n-consistent estimators of the true nonzero elements of the precision matrix. The numerical analyses show that the proposed methods perform well in terms of variable selection and estimation.
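The link between the precision matrix and conditional dependence can be seen in a small population-level example. The sketch below is only an illustration under assumed values (a 5 by 5 tridiagonal precision matrix), and it is not necessarily the construction used in the thesis. It shows that the covariance matrix is dense even though the precision matrix is sparse, and that one column of the precision matrix can be recovered from the regression of one variable on the others, which is why sparse regression methods can be applied column by column.

```python
import numpy as np

# Assumed sparse precision matrix: each variable is directly related
# only to its immediate neighbours.
p = 5
omega = 2.0 * np.eye(p)
for i in range(p - 1):
    omega[i, i + 1] = omega[i + 1, i] = -0.8

# The implied covariance matrix has no zero entries.
sigma = np.linalg.inv(omega)

# Population regression of variable j on the remaining variables.
j = 2
rest = [k for k in range(p) if k != j]
theta = np.linalg.solve(sigma[np.ix_(rest, rest)], sigma[rest, j])
resid_var = sigma[j, j] - sigma[j, rest] @ theta

# Column j of the precision matrix, rebuilt from the regression:
# diagonal entry = 1 / residual variance, off-diagonals = -theta / residual variance.
column_j = np.empty(p)
column_j[j] = 1.0 / resid_var
column_j[rest] = -theta / resid_var

print(np.round(sigma, 3))
print(np.round(column_j, 3))
```

The regression coefficients on non-neighbouring variables are zero, matching the zeros in the corresponding column of the precision matrix.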