Risk prediction using common and rare genetic variants: application to Type 2 diabetes
- College of Natural Sciences, Program in Bioinformatics
- Issue Date
- Seoul National University Graduate School
- Whole exome sequencing (WES) ; Risk prediction model ; Type 2 diabetes (T2D) ; Penalized regression methods ; Stepwise selection ; Support vector machine (SVM)
- Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Program in Bioinformatics, February 2018. 박태성 (Taesung Park).
- Genome-wide association studies (GWAS) have identified many disease-related common variants, and common genetic variants are increasingly used in diagnosis and treatment. Several prediction models based on common genetic variants have also been proposed, using penalized regression or statistical learning methods. However, common variants alone are not sufficient to explain the phenotype. One way to address this problem is to consider rare variants, because rare variants can have a large effect on disease. Recent developments in next-generation sequencing (NGS) technology have identified several disease-related rare genetic variants. However, only a few studies have compared prediction models that use both common and rare variants. The aim of our study is to systematically compare the performance of prediction models using common and rare variants from the whole exome sequencing (WES) data of the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D-GENES) Consortium. We first constructed risk prediction models using stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), Elastic-Net (EN), and support vector machine (SVM). We then compared prediction accuracy by calculating the area under the curve (AUC). Our results show that performance using both common and rare variants was better than using either common variants only or rare variants only. Although the AUC values differed depending on the variant set, the AUC values of the SVM prediction models were always larger than those of the other prediction models. Among the four rare variant sets, the AUC was largest for the ptv_ns set.
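The model comparison described in the abstract rests on AUC, so a minimal sketch of that step may help. It assumes the AUC is the usual area under the ROC curve, computed here in its rank form: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The function name `auc`, the model names, and all labels and scores below are hypothetical illustrations, not data or results from the thesis.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney U) formulation.

    Returns the fraction of (case, control) pairs in which the case has
    the higher score, counting tied pairs as one half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy labels: 1 = T2D case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical predicted risks from two models (illustrative only).
scores_by_model = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.2],
    "model_b": [0.9, 0.8, 0.7, 0.5, 0.3, 0.2],
}

# Compute one AUC per model and pick the model with the largest value.
aucs = {name: auc(labels, s) for name, s in scores_by_model.items()}
best = max(aucs, key=aucs.get)
```

In a comparison like the one in the abstract, the scores would come from each fitted model (SLR, LASSO, EN, SVM) on held-out samples, with one AUC per model and variant set. The pairwise loop is quadratic in sample size, which is acceptable for a sketch but would normally be replaced by a sort-based computation for large cohorts.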