Pitfalls of using shared controls in meta-analysis of genetic association studies : Problems and solutions when sample overlap exists in meta-analysis

DC Field | Value | Language
dc.contributor.advisor | 이상엽 | -
dc.contributor.author | 김은지 | -
dc.date.accessioned | 2018-11-12T00:59:07Z | -
dc.date.available | 2018-11-12T00:59:07Z | -
dc.date.issued | 2018-08 | -
dc.identifier.other | 000000151681 | -
dc.identifier.uri | https://hdl.handle.net/10371/143225 | -
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School : College of Natural Sciences, Department of Chemistry, 2018. 8. 이상엽. | -
dc.description.abstract | Recent genome-wide association studies frequently augment sample size by using publicly available shared controls. Many disease consortia and groups of investigators now combine their efforts to understand the genetic basis of diseases by collecting summary statistics from participating studies and performing a meta-analysis. However, if participating studies share publicly available controls, the overlap induces correlations between their statistics, which precludes the use of standard meta-analysis methods. Fortunately, recently developed meta-analysis methods can systematically account for these correlations and are widely used in single-disease and cross-disease meta-analyses. In this paper, we identify and report a phenomenon in which using shared controls in multiple participating studies can dramatically reduce meta-analysis power, even when these systematic methods are used. We demonstrate that meta-analysis power decreases as we add shared controls to multiple studies, in contrast with the common expectation that additional samples should increase power. We investigate why this phenomenon occurs and provide three possible solutions to prevent the power reduction: (1) removing sample overlaps to make summary statistics completely independent, (2) exclusively using shared controls and no study-specific controls, or (3) using our newly proposed method FOLD (Fully-powered method for OverLapping Data).
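The claim that shared controls induce correlations between study statistics can be illustrated with a small simulation. The sketch below is not taken from the thesis: the function name, sample sizes, allele frequency, and the simple allele-frequency z-test are all illustrative assumptions. It simulates two studies with separate cases and one common control group under the null hypothesis and reports the empirical correlation of their z-scores.

```python
import numpy as np


def simulate_z_correlation(n_cases=1000, n_shared_controls=1000, maf=0.3,
                           n_replicates=5000, seed=0):
    """Empirical correlation between two studies' z-scores under the null
    when both studies reuse the same control group (hypothetical setup)."""
    rng = np.random.default_rng(seed)

    # Each person carries two alleles, so a group's allele count is
    # Binomial(2n, maf); dividing by 2n gives the sample allele frequency.
    freq_cases_1 = rng.binomial(2 * n_cases, maf, size=n_replicates) / (2 * n_cases)
    freq_cases_2 = rng.binomial(2 * n_cases, maf, size=n_replicates) / (2 * n_cases)
    freq_shared = (rng.binomial(2 * n_shared_controls, maf, size=n_replicates)
                   / (2 * n_shared_controls))

    # Null standard error of the case-control allele-frequency difference.
    # The true maf is used for simplicity; it is a constant scale factor,
    # so it does not affect the correlation.
    se = np.sqrt(maf * (1 - maf)
                 * (1 / (2 * n_cases) + 1 / (2 * n_shared_controls)))

    z_study_1 = (freq_cases_1 - freq_shared) / se
    z_study_2 = (freq_cases_2 - freq_shared) / se
    return float(np.corrcoef(z_study_1, z_study_2)[0, 1])
```

With equal numbers of cases and shared controls, the control group contributes half of each statistic's variance, so the returned value should be close to 0.5 rather than 0.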



To increase the number of samples in an analysis, meta-analysis combines summary statistics from multiple independent studies. If multiple studies participating in a meta-analysis use the same public dataset as controls, the summary statistics from these studies are no longer independent and become correlated. Lin and Sullivan proposed a correlation estimator based on the shared and unshared sample sizes and suggested an optimal test statistic to account for the correlations (AJHG 2010). Their method was shown to achieve power similar to that of the gold-standard approach, splitting, in which shared individuals are divided among the studies before meta-analysis; splitting requires access to the genotype data. Many methods were proposed after Lin and Sullivan, but most of them are based on a similar correlation estimator. In this paper, we report that the standard method suggested by Lin and Sullivan can lead to unbalanced power for detecting protective alleles (OR < 1) and risk alleles (OR > 1). Specifically, when controls were shared, the power for detecting protective minor alleles (OR < 1) was lower than the power for detecting risk minor alleles (OR > 1). For example, in a simulated meta-analysis of 5 studies, for a minor allele with frequency 10% and OR = 0.85, the standard method achieved only 61.6% power whereas splitting achieved 67.0%. By contrast, when we flipped the effect direction (OR = 1.17), the existing method achieved higher power (71.8%) than splitting. The asymmetry grew more severe as the minor allele frequency (MAF) decreased. To our knowledge, we are the first to report this phenomenon. After investigating it, we found that the power asymmetry arises because the standard correlation estimator does not exactly predict the true correlation.
The existing estimator was derived as an approximation under the null hypothesis of no effect, but under the alternative hypothesis the true correlation depends on MAF and effect size. Errors in the estimator can therefore lead to substantially unbalanced power. To overcome the power asymmetry problem, we developed a method that uses an accurate correlation estimator, called PASTRY (A method to avoid Power ASymmeTRY). Our method is based on a correlation estimator designed to be accurate under the alternative hypothesis. We show that, using our method, one can effectively achieve symmetric power for testing risk and protective alleles.
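The sample-size-based correlation estimator and the test statistic that accounts for it can be sketched as follows. This is a minimal illustration, not the thesis's code: the function names are hypothetical, the correlation formula is the form in which the Lin and Sullivan estimator is commonly stated, and the combined estimate is written as a generalized-least-squares weighted average. The PASTRY estimator is not shown because the abstract does not give its formula.

```python
import numpy as np


def lin_sullivan_correlation(cases_k, controls_k, cases_l, controls_l,
                             shared_controls=0, shared_cases=0):
    """Null-hypothesis correlation between the statistics of studies k and l,
    computed from shared and unshared sample sizes only (assumed form)."""
    n_k = cases_k + controls_k
    n_l = cases_l + controls_l
    from_controls = shared_controls * np.sqrt(
        (cases_k * cases_l) / (controls_k * controls_l))
    from_cases = shared_cases * np.sqrt(
        (controls_k * controls_l) / (cases_k * cases_l))
    return float((from_controls + from_cases) / np.sqrt(n_k * n_l))


def correlated_meta_analysis(betas, ses, corr):
    """Combine per-study effect estimates whose errors are correlated.

    betas, ses: per-study log odds ratios and their standard errors.
    corr: K x K correlation matrix of the study statistics.
    Returns (combined beta, combined standard error, z-score).
    """
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    corr = np.asarray(corr, dtype=float)

    # Covariance matrix of the estimates: Omega_kl = r_kl * se_k * se_l.
    omega = corr * np.outer(ses, ses)
    ones = np.ones_like(betas)

    # Weights proportional to Omega^{-1} 1; with corr = identity this reduces
    # to ordinary inverse-variance weighting.
    weights = np.linalg.solve(omega, ones)
    variance = 1.0 / (ones @ weights)
    beta = variance * (weights @ betas)
    se = np.sqrt(variance)
    return float(beta), float(se), float(beta / se)
```

For two studies with 1000 cases each that share the same 1000 controls, `lin_sullivan_correlation(1000, 1000, 1000, 1000, shared_controls=1000)` gives 0.5. Because the formula uses sample sizes only, it cannot reflect the dependence on MAF and effect size described above.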



Keywords: Meta-analysis, genome-wide association studies (GWAS), overlapping samples, correlation, power asymmetry, case-control study.



Student Number: 2011-30903
-
dc.description.tableofcontents

Abstract
Contents
List of figures
List of tables

Chapter 1 : Optimal strategy to account for overlapping controls in cross-disease meta-analysis of genetic association studies.
Introduction
Results
1. Adding shared controls can reduce power of meta-analysis
2. Power loss occurs in partial overlap design
3. FOLD maintains power
3.1. WTCCC data analysis
3.2. PGC data analysis
4. Splitting strategy comparison
Materials and methods
1. Current approaches for meta-analysis with overlapping samples
1.1. Splitting approach
1.2. Lin and Sullivan's method
1.3. Zaykin-Kozbur method
1.4. ASSET
1.5. Decoupling method
2. FOLD
2.1. FOLD framework
2.2. FOLD gives smaller variances
3. PowerSplit
4. Power simulations
4.1. WTCCC data
4.2. PGC data
Discussion
References

Chapter 2 : Achieving balanced power for detecting risk and protective alleles in meta-analysis of association studies with overlapping subjects.
Introduction
Results
1. Power asymmetry of existing method
2. Correlation estimator of existing methods
2.1. Minor allele frequency
2.2. Relative risks
3. Cumulative effect of correlation inaccuracy
4. Performance of PASTRY
Materials and methods
1. Correlation from Lin and Sullivan's method
2. Correlation estimator of PASTRY
3. Power simulations
Discussion
References

Abstract in Korean
-
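dc.language.iso | en | -
dc.publisher | Seoul National University Graduate School | -
dc.subject.ddc | 540 | -
dc.title | Pitfalls of using shared controls in meta-analysis of genetic association studies | -
dc.title.alternative | Problems and solutions when sample overlap exists in meta-analysis | -
dc.type | Thesis | -
dc.description.degree | Doctor | -
dc.contributor.affiliation | College of Natural Sciences, Department of Chemistry | -
dc.date.awarded | 2018-08 | -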
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
