Statistical Method Development for Rare Variant Association Tests in Family-based Designs

Longfei Wang

Statistical Method Development for Rare Variant Association Tests in Family-based Designs : Development of Analysis Algorithms for Family-Based Rare Variant Association Analysis

DC Field	Value	Language
dc.contributor.advisor	원성호	-
dc.contributor.author	Longfei Wang	-
dc.date.accessioned	2019-10-21T03:44:04Z	-
dc.date.available	2019-10-21T03:44:04Z	-
dc.date.issued	2019-08	-
dc.identifier.other	000000156395	-
dc.identifier.uri	https://hdl.handle.net/10371/162457	-
dc.identifier.uri	http://dcollection.snu.ac.kr/common/orgView/000000156395	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2019. 8. 원성호.	-
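dc.description.abstract	Despite numerous genome-wide association studies (GWAS), only a limited number of disease susceptibility loci (DSL) have been found, which is attributed to missing heritability. Technology that sequences long reads in a single pass has been expected to fill this gap, and thanks to its development, genetic association analysis has been able to identify both rare and common causal variants. However, even in experiments with fairly large samples, single-variant genome-wide association analysis is not free from the false negative problem. To increase the power of rare variant association analysis, methods have therefore been proposed that aggregate multiple genetic variants at biologically related positions and analyze them together. These are the region-based association analyses, such as the burden test, the variance component test, and the combined omnibus test. Applying these methods to rare variant association analysis was expected to greatly increase power and to reveal more disease-associated genetic variants. However, because of genetic heterogeneity among samples and relatively small sample sizes, only very few variants have been found. Various methods have been developed to solve these problems. The first is family-based analysis, which is well suited to handling genetic heterogeneity among samples and population stratification. The second, when different phenotypes are related to one another, is to analyze them jointly to increase power. The third is to combine the results of multiple studies through meta-analysis, which has proven effective in many studies. This thesis compares several widely used family-based rare variant association methods and shows that FARVAT is statistically robust and computationally efficient relative to the others. It further extends FARVAT to a multiple-phenotype method (mFARVAT) and a meta-analysis method (metaFARVAT). mFARVAT is a rare variant association method that applies a quasi-likelihood-based score test to multiple phenotypes and tests the homogeneous and heterogeneous effects of each variant on the phenotypes. metaFARVAT combines the quasi-likelihood scores from multiple studies to produce burden, variable threshold, variance component, and combined omnibus statistics. It tests the homogeneous and heterogeneous effects of variants using the results of multiple studies, and is applicable to both quantitative and dichotomous phenotypes. Extensive simulations under various scenarios show that the proposed methods are generally robust and efficient. Using these methods, candidate genes related to chronic obstructive pulmonary disease (COPD), such as DLEC1, were identified.	-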
dc.description.abstract	Despite tens of thousands of genome-wide association studies (GWASs), the so-called missing heritability shows that analyses of common variants have identified only a limited number of disease susceptibility loci, and a substantial number of causal variants remain undiscovered by GWASs. Sequencing technology was expected to supply this additional information by reading large stretches of DNA spanning the entire genome, and improvements in this technology have enabled genetic association analysis of both rare and common causal variants. However, the single-variant association tests commonly used in GWASs produce false negative findings unless very large samples are available. Alternatively, aggregating association signals across multiple genetic variants in a biologically relevant region is expected to boost statistical power for rare variant analysis. Numerous statistical methods have been proposed for region-based rare variant association studies, such as burden, variance component, and combined omnibus tests. Region-based association tests are expected to substantially improve statistical power for rare variant analyses and to identify additional disease susceptibility loci. However, very few significant results have been identified, owing to genetic heterogeneity and relatively small sample sizes. Various approaches have been developed to address these limitations. First, family-based designs play an important role in controlling genetic heterogeneity and population stratification. Second, disease status is often diagnosed from the outcomes of different but related phenotypes, and thus multiple phenotype analysis is expected to provide additional information and increase power. Third, for the small sample issue, combining results from multiple studies through meta-analysis has repeatedly been shown to be an effective strategy.
In this study, I compared the performance of a selection of popular family-based rare variant association tests and found that FARVAT is the most statistically robust and computationally efficient method. In addition, I extended FARVAT to multiple phenotype analysis (mFARVAT) and meta-analysis (metaFARVAT). mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes; it tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. metaFARVAT combines quasi-likelihood scores from multiple studies and generates burden, variable threshold, variance component, and combined omnibus test statistics. It tests homogeneous and heterogeneous genetic effects of variants among different studies and can be applied to both quantitative and dichotomous phenotypes. Through extensive simulation studies under various scenarios, I found that the proposed methods are generally robust and efficient across different underlying genetic architectures, and I identified some promising candidate genes associated with chronic obstructive pulmonary disease, including DLEC1.	-
dc.description.tableofcontents	Abstract
Contents
List of Figures
List of Tables
1 Introduction
  1.1 The background on rare variant association studies
    1.1.1 Overview of rare variant association studies
    1.1.2 Challenges of rare variant association studies
  1.2 Purpose of this study
  1.3 Outline of the thesis
2 Overview of family-based rare variant association tests
  2.1 Overview of family-based association studies
  2.2 Comparison of the selected family-based rare variant association tests
    2.2.1 Rare Variant Transmission Disequilibrium Test (RV-TDT)
    2.2.2 Generalized Estimating Equations based Kernel Machine test (GEE-KM)
    2.2.3 Combined Multivariate and Collapsing test for Pedigrees (PedCMC)
    2.2.4 Gene-level kernel and burden tests for Pedigrees (PedGene)
    2.2.5 FAmily-based Rare Variant Association Test (FARVAT)
    2.2.6 Comparison of the methods with GAW19 data
  2.3 Conclusions
3 Family-based Rare Variant Association Test for Multivariate Phenotypes
  3.1 Introduction
  3.2 Methods
    3.2.1 Notations and the disease model
    3.2.2 Choice of offset
    3.2.3 Score for quasi-likelihood
    3.2.4 Homogeneous mFARVAT
    3.2.5 Heterogeneous mFARVAT
  3.3 Simulation study
    3.3.1 The simulation model
    3.3.2 Evaluation of mFARVAT with simulated data
  3.4 Application to COPD data
  3.5 Discussion
4 Family-based Rare Variant Association Test for Meta-analysis
  4.1 Introduction
  4.2 Methods
    4.2.1 Notation
    4.2.2 Choices of Offset
    4.2.3 Score for Quasi-likelihood
    4.2.4 Homogeneous Model
    4.2.5 Heterogeneous Model
  4.3 Simulation study
    4.3.1 The simulation model
    4.3.2 Evaluation of metaFARVAT with simulated data
  4.4 Application to COPD data
  4.5 Discussion
5 Summary & Conclusions
Bibliography
Abstract (Korean)	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	rare variant association test	-
dc.subject	family-based designs	-
dc.subject	multiple phenotypes	-
dc.subject	meta-analysis	-
dc.subject	chronic obstructive pulmonary disease	-
dc.subject.ddc	574.8732	-
dc.title	Statistical Method Development for Rare Variant Association Tests in Family-based Designs	-
dc.title.alternative	Development of Analysis Algorithms for Family-Based Rare Variant Association Analysis	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	왕롱페이	-
dc.contributor.department	College of Natural Sciences, Interdisciplinary Program in Bioinformatics	-
dc.description.degree	Doctor	-
dc.date.awarded	2019-08	-
dc.identifier.uci	I804:11032-000000156395	-
dc.identifier.holdings	000000000040▲000000000041▲000000156395▲	-
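
The abstract describes region-based tests that aggregate association signals across several rare variants. As a rough illustration of that idea only, the sketch below computes a simple burden-type score statistic for a quantitative phenotype in unrelated individuals. It is not FARVAT, mFARVAT, or metaFARVAT: it ignores family structure entirely, and the function name `burden_score_test`, its parameters, and the toy data are hypothetical choices made for this example.

```python
import math
import numpy as np

def burden_score_test(genotypes, phenotype, weights=None):
    """Burden-type score test for a quantitative trait (illustrative sketch).

    Assumes unrelated individuals; family-based methods additionally model
    the correlation between relatives, which is omitted here.
    genotypes: (n_samples, n_variants) array of minor-allele counts.
    phenotype: length-n_samples array of trait values.
    weights:   optional per-variant weights (defaults to equal weights).
    Returns (statistic, p_value) with a 1-df chi-square reference.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n_variants = G.shape[1]
    w = np.ones(n_variants) if weights is None else np.asarray(weights, dtype=float)

    # Collapse the variants in the region into one burden score per sample.
    burden = G @ w

    # Centre both quantities; the score is their cross-product.
    b_c = burden - burden.mean()
    y_c = y - y.mean()
    score = b_c @ y_c

    # Variance of the score under the null of no association.
    score_var = y_c.var(ddof=1) * (b_c @ b_c)
    if score_var == 0.0:
        return 0.0, 1.0  # no variation in burden or phenotype

    stat = score ** 2 / score_var
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Toy data: four samples, two rare variants, trait equal to the burden.
toy_genotypes = [[0, 0], [1, 0], [0, 1], [1, 1]]
toy_phenotype = [0.0, 1.0, 1.0, 2.0]
toy_stat, toy_p = burden_score_test(toy_genotypes, toy_phenotype)
```

With equal weights the statistic equals (n - 1) times the squared correlation between burden and phenotype, which makes it easy to check on small inputs. Variance component and omnibus tests replace the single collapsed score with a quadratic form over per-variant scores.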

Appears in Collections:

College of Natural Sciences (자연과학대학)
- Program in Bioinformatics (협동과정-생물정보학전공)
  - Theses (Ph.D. / Sc.D._협동과정-생물정보학전공)

Files in This Item:

000000156395.pdf 3.07 MB