Pitfalls of using shared controls in meta-analysis of genetic association studies : Problems and solutions when sample overlap is present in meta-analysis

Authors

김은지

Advisor
이상엽
Major
College of Natural Sciences, Department of Chemistry
Issue Date
2018-08
Publisher
Seoul National University Graduate School
Description
Thesis (Ph.D.) -- Seoul National University Graduate School : College of Natural Sciences, Department of Chemistry, 2018. 8. 이상엽.
Abstract
Recent genome-wide association studies frequently increase their sample size by using publicly available shared controls. Many disease consortia and groups of investigators now combine their efforts to understand the genetic basis of diseases by collecting summary statistics from participating studies and performing a meta-analysis. However, if participating studies share publicly available controls, the shared controls induce correlations between the statistics, which precludes the use of standard meta-analysis methods. Fortunately, recently developed meta-analysis methods can systematically account for these correlations and are widely used in single-disease and cross-disease meta-analyses. In this paper, we identify and report a phenomenon in which using shared controls in multiple participating studies can dramatically reduce meta-analysis power, even when these systematic methods are used. We demonstrate that meta-analysis power decreases as we add shared controls to multiple studies, contrary to the common expectation that additional samples should increase power. We investigate why this phenomenon occurs and provide three possible solutions to prevent this power reduction: (1) removing sample overlaps to make summary statistics completely independent, (2) exclusively using shared controls and no study-specific controls, or (3) using our newly proposed method FOLD (Fully-powered method for OverLapping Data).



To increase the number of samples in an analysis, meta-analysis combines summary statistics from multiple independent studies. If multiple studies participating in a meta-analysis use the same public dataset as controls, the summary statistics from these studies are no longer independent and become correlated. Lin and Sullivan proposed a correlation estimator based on the shared and unshared sample sizes and suggested an optimal test statistic that accounts for the correlations (AJHG 2010). Their method was shown to achieve power similar to that of the gold standard method, splitting, which divides the shared individuals among the studies prior to meta-analysis and requires access to the genotype data. Many different methods were proposed after Lin and Sullivan, but most of them were based on a similar correlation estimator. In this paper, we report a phenomenon in which the standard method suggested by Lin and Sullivan can lead to unbalanced power for detecting protective alleles (OR < 1) and risk alleles (OR > 1). Specifically, when the controls were shared, the power for detecting protective minor alleles (OR < 1) was lower than the power for detecting risk minor alleles (OR > 1). For example, for a minor allele with a frequency of 10% and OR = 0.85, a simulated meta-analysis of 5 studies showed that the standard method achieved only 61.6% power, whereas splitting achieved 67.0%. By contrast, when we flipped the effect direction (OR = 1.17), the existing method achieved higher power (71.8%) than splitting. The degree of asymmetry increased as the minor allele frequency (MAF) decreased. To our knowledge, we are the first to report this phenomenon. After investigating this phenomenon, we found that the power asymmetry problem occurred because the standard correlation estimator did not accurately predict the true correlation.
The existing estimator was derived as an approximation under the null hypothesis of no effect, but under the alternative hypothesis, the true correlation depends on the MAF and the effect size. Thus, errors in the estimator can lead to substantially unbalanced power. To overcome the power asymmetry problem, we developed a method, called PASTRY (A method to avoid Power ASymmeTRY), that uses an accurate correlation estimator. Our method is based on a correlation estimator that was designed to be accurate under the alternative hypothesis. We show that with our method, one can effectively achieve symmetric power for testing risk and protective alleles.
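
As a rough illustration of the sample-size-based estimator discussed above, the following Python sketch computes a null-hypothesis correlation between the statistics of two case-control studies that share subjects, then uses it to correct the variance of a weighted sum of z-scores. The formula is written here in the form in which this approximation is commonly stated and should be checked against the original paper before use; the function names, argument names, and the simple weighted z-score combination are assumptions made for this example and are not taken from the thesis. It does not implement FOLD or PASTRY.

```python
import math


def null_correlation(cases_k, controls_k, cases_l, controls_l,
                     shared_controls, shared_cases=0):
    """Approximate correlation between two studies' z-statistics under the null.

    Assumed form (sample sizes only, no dependence on MAF or effect size):
    r = (n_shared0 * sqrt(n_k1 * n_l1 / (n_k0 * n_l0))
         + n_shared1 * sqrt(n_k0 * n_l0 / (n_k1 * n_l1))) / sqrt(n_k * n_l)
    """
    n_k = cases_k + controls_k
    n_l = cases_l + controls_l
    control_term = shared_controls * math.sqrt(
        (cases_k * cases_l) / (controls_k * controls_l))
    case_term = shared_cases * math.sqrt(
        (controls_k * controls_l) / (cases_k * cases_l))
    return (control_term + case_term) / math.sqrt(n_k * n_l)


def combined_z(z_scores, weights, corr):
    """Weighted sum of z-scores, standardized using the correlation matrix.

    This is a simplified stand-in for an optimal correlated-data statistic:
    only the variance of the weighted sum is corrected for correlation.
    """
    size = len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * weights[j] * corr[i][j]
                   for i in range(size) for j in range(size))
    return numerator / math.sqrt(variance)


# Two hypothetical studies, each with 1000 cases and the same 1000 controls.
r = null_correlation(1000, 1000, 1000, 1000, shared_controls=1000)
corr_shared = [[1.0, r], [r, 1.0]]
corr_independent = [[1.0, 0.0], [0.0, 1.0]]

z_shared = combined_z([2.0, 2.0], [1.0, 1.0], corr_shared)
z_naive = combined_z([2.0, 2.0], [1.0, 1.0], corr_independent)
print(r, z_shared, z_naive)
```

In this hypothetical setup the estimated correlation is 0.5, and ignoring it inflates the combined statistic. Because the estimator depends only on sample sizes, it cannot reflect the dependence of the true correlation on MAF and effect size under the alternative hypothesis, which is the gap the abstract describes.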



Keywords: meta-analysis, genome-wide association studies (GWAS), overlapping samples, correlation, power asymmetry, case-control study.



Student Number: 2011-30903
Language
English
URI
https://hdl.handle.net/10371/143225

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
