Polygenic risk prediction models for Alzheimer's disease : Alzheimer's disease prediction considering polygenic risk architecture

Authors
서수진
Advisor
원성호
Major
Graduate School of Public Health, Department of Public Health
Issue Date
2018-02
Publisher
Seoul National University Graduate School
Keywords
Alzheimer’s disease (AD); Group lasso regression; Lasso regression; Penalized regression; Prediction; Polygenic architecture; Polygenic risk score (PRS); Ridge regression
Description
Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Public Health, Department of Public Health, 2018. 2. 원성호.
Abstract
Background: Alzheimer's disease (AD) is known to have a polygenic architecture, meaning that a large proportion of the susceptibility single-nucleotide polymorphisms (SNPs) collectively account for a significant portion of the variation in AD. Furthermore, because the effect of APOE e4 on the risk of AD is substantial, the impact of genetic factors other than the APOE gene may be masked when APOE e4 carriers and non-carriers are analyzed together. Patients with and without APOE e4 have different genetic bases and pathogenic distinctions (Jiang, et al., 2016). Therefore, stratification by APOE e4 status allows the underlying mechanisms in APOE e4 carriers and non-carriers to be explored separately.

Objective: In this study, we assess the performance of penalized and non-penalized methods in the prediction of AD, so that the polygenic architecture of AD is taken into account in the model. In addition, we compare models stratified by APOE e4 status with models fitted to the combined data.
Method: In this paper, penalized regression methods are used as an alternative to PRS. Unlike PRS, where a large number of underlying susceptibility genes are combined into one variable in the prediction model, the penalized regression methods treat those genes as separate variables. The penalty term on the coefficients addresses the problem of having a much larger number of genetic variants than the sample size. Some penalized methods, such as lasso (Tibshirani, 1996) and elastic-net (Zou and Hastie, 2005), conduct automatic variable selection and give more interpretable results. Furthermore, group lasso (Meier, et al., 2008) is an extension of the lasso penalty that selects variables at the level of predefined groups. In this paper, we grouped SNPs by APOE non-carrier group and carrier group and applied group lasso regression. In addition, we explored the mechanisms of the two groups individually by stratifying according to the presence of APOE e4 alleles. We applied the various models to National Research Center for Dementia (NRCD) data, in which all participants are Korean. Predictive performance is evaluated by AUC.
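
The contrast drawn in the Method paragraph can be sketched in a few lines of Python. This is an illustration only, not the thesis code: the function names, the toy weights and coefficients, and the group labels are hypothetical, and the group lasso penalty uses the square-root-of-group-size weighting commonly associated with Meier, et al. (2008), which the abstract itself does not spell out. A PRS collapses all SNPs into one weighted sum, whereas a penalized regression keeps one coefficient per SNP and adds a penalty on those coefficients to the loss.

```python
import math

def prs_score(allele_counts, weights):
    """Collapse one person's SNP allele counts (0/1/2) into a single score."""
    return sum(w * g for w, g in zip(weights, allele_counts))

def lasso_penalty(beta, lam):
    """L1 penalty: can shrink individual coefficients exactly to zero."""
    return lam * sum(abs(b) for b in beta)

def ridge_penalty(beta, lam):
    """Squared L2 penalty: shrinks coefficients but keeps all of them."""
    return lam * sum(b * b for b in beta)

def group_lasso_penalty(beta, groups, lam):
    """Sum of per-group L2 norms, weighted by sqrt(group size) (assumed form).

    groups maps a group name to the list of coefficient indices in it, so a
    whole group of SNPs is kept or dropped together.
    """
    total = 0.0
    for indices in groups.values():
        norm = math.sqrt(sum(beta[j] ** 2 for j in indices))
        total += math.sqrt(len(indices)) * norm
    return lam * total

# Hypothetical per-SNP weights (e.g. log odds ratios) and one genotype.
weights = [0.5, -0.2, 0.1, 0.0]
person = [2, 1, 0, 1]
score = prs_score(person, weights)  # one number summarizing all SNPs

# Hypothetical coefficient vector with two predefined groups of SNPs.
beta = [3.0, 4.0, 0.0, 0.0]
groups = {"group_a": [0, 1], "group_b": [2, 3]}
l1 = lasso_penalty(beta, lam=1.0)
l2 = ridge_penalty(beta, lam=1.0)
gl = group_lasso_penalty(beta, groups, lam=1.0)
```

In this toy example the second group contributes nothing to the group lasso penalty because all of its coefficients are zero, which is the group-level selection behavior the paragraph describes.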

Result: We assessed the odds ratios obtained from GWAS for the combined data, the carrier group data, and the non-carrier group data. Comparing the odds ratios of the 100 SNPs with the lowest p-values, in the combined data most of the large-effect SNPs are on chromosome 19, where the APOE gene is located, and the others are sporadically distributed with modest odds ratios of approximately 1.5. Looking at the odds ratios in the separate groups, in the APOE carrier group some SNPs have high odds ratios exceeding 3 and the others have low values, less than 1. On the other hand, the APOE non-carrier group does not appear to have SNPs with effects as large as those in the carrier group; instead, most SNPs have odds ratios between 1 and 2.
The best accuracy was found with penalized methods in both the combined case and the separate cases. For the combined case, the largest AUC was 0.6520 with only 100 SNPs in the PRS model and 0.6671 with 10,000 SNPs in the ridge regression model. For the separate case, the largest AUC was 0.6551 with 100 SNPs in the PRS model and 0.6741 with 1,000 SNPs in the ridge regression model. When we cross-combined the models, using the y ~ APOE4 + APOE2 + AGE + SEX model for the carrier group and the ridge regression model with 10,000 SNPs for the non-carrier group, the AUC was 0.6773.
We further investigated whether the stratified strategy helps to improve AD prediction. For each model, with the number of SNPs fixed, the stratified model outperformed the combined model when a relatively small number of SNPs was included, but the opposite held when the number of SNPs was large. The cross-combined models gave the best accuracy, with an AUC of 0.6773.
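
One way an AUC for a cross-combined pair of stratum-specific models could be computed is sketched below. The abstract does not state how the predictions from the carrier and non-carrier models were combined, so pooling the two sets of predicted risks and scoring them with a single AUC is an assumption made here for illustration; the labels and scores are made-up toy values, not NRCD data.

```python
def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical predicted risks from two separately fitted models.
carrier_labels = [1, 1, 0, 1]
carrier_scores = [0.9, 0.7, 0.6, 0.4]
noncarrier_labels = [0, 0, 1, 0]
noncarrier_scores = [0.2, 0.1, 0.3, 0.5]

# Within-stratum AUCs, one per model.
carrier_auc = auc(carrier_labels, carrier_scores)
noncarrier_auc = auc(noncarrier_labels, noncarrier_scores)

# Assumed combination step: pool both strata and score them together.
pooled_auc = auc(carrier_labels + noncarrier_labels,
                 carrier_scores + noncarrier_scores)
```

A pooled AUC also reflects how well the two models separate carriers from non-carriers on a common risk scale, so it can differ from either within-stratum AUC.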

Conclusion: This study supports the view that AD has different polygenic architectures according to APOE type. First, the GWAS results for the combined data and the separated data show that different SNPs affect AD with different effect sizes. Second, we show that the stratified analysis improves AUC over the combined one. As an extension of our analysis, we may identify people at high risk of AD who do not carry any APOE e4 alleles. That is, the suggested method can provide more variation in estimated risk in the population.
Language
English
URI
https://hdl.handle.net/10371/141931
Appears in Collections:
Graduate School of Public Health (보건대학원) > Dept. of Public Health (보건학과) > Theses (Master's Degree_보건학과)
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
