Estimating genetic marker effects in population-based genomic study using regression model

Authors

이영섭

Advisor
김희발
Major
College of Natural Sciences, Interdisciplinary Program in Bioinformatics
Issue Date
2017-02
Publisher
Graduate School, Seoul National University
Keywords
Regression analysis; SNP-SNP relationship matrix (SSRM); Missing-heritability problem; Selection coefficient; Genome-wide association study (GWAS); Best linear unbiased prediction (BLUP); Effective population size (Ne)
Description
Thesis (Ph.D.) -- Graduate School, Seoul National University: Interdisciplinary Program in Bioinformatics, February 2017. Advisor: 김희발.
Abstract
After various DNA (deoxyribonucleic acid) markers had been discovered at the genomic level, scientists turned their attention to DNA sequencing and genotyping. Genotyping identifies genetic variants that serve as molecular markers. Single nucleotide polymorphisms (SNPs) are among the most important of these markers. In particular, SNPs surveyed across a population can capture characteristics that distinguish one individual from another.

One way to reveal the causes of an individual's characteristics is to apply established statistical models. Regression analysis is frequently used in bioinformatics, and I analyzed the data with regression models including linear regression, nonlinear regression and mixed models.
This doctoral dissertation comprises five chapters. Chapter 1 provides an overview of the required population genetics theory, effective population size estimation, best linear unbiased prediction (BLUP) and genome-wide association studies (GWAS). To estimate the effective population size, two approaches were employed: the classical Sved's equation, and the Kimura two-parameter (K2P) model combined with Watterson's theta estimator. Sved's equation is fitted computationally by nonlinear regression, whereas the K2P approach uses the number of SNPs. BLUP is used to estimate the random effects in linear mixed models, and GWAS is used to search for causal genetic variants associated with a trait. As one method to predict random marker effects, I propose Single Nucleotide Polymorphism-Genomic Best Linear Unbiased Prediction (SNP-GBLUP), a new BLUP that is theoretically grounded in the genomic relationship matrix (GRM).
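As a rough illustration of the nonlinear-regression approach mentioned above, the sketch below fits Ne in the basic form of Sved's equation, E[r²] = 1/(1 + 4Ne·c), to pairs of recombination distance c (in Morgans) and observed r² by least squares. It is a minimal sketch under stated assumptions: the sample-size correction term often added to this equation is omitted, the search interval and the golden-section optimizer are arbitrary choices of this example, and the data in the usage block are synthetic, not taken from the thesis.

```python
"""Sketch: fit Sved's equation E[r^2] = 1 / (1 + 4 * Ne * c) by one-parameter
nonlinear least squares. Assumes the basic form without a sample-size
correction; c is the recombination distance in Morgans."""
import math


def sved_expected_r2(ne: float, c: float) -> float:
    """Expected LD (r^2) between two loci c Morgans apart under Sved's equation."""
    return 1.0 / (1.0 + 4.0 * ne * c)


def sse(ne, distances, r2_values):
    """Sum of squared errors between observed r^2 and Sved's prediction."""
    return sum((r2 - sved_expected_r2(ne, c)) ** 2
               for c, r2 in zip(distances, r2_values))


def fit_ne(distances, r2_values, lo=1.0, hi=10_000.0, tol=1e-6):
    """Golden-section search for the Ne minimising the SSE on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1 = sse(x1, distances, r2_values)
    f2 = sse(x2, distances, r2_values)
    while b - a > tol:
        if f1 < f2:
            # Minimum lies in [a, x2]; reuse x1 as the new upper probe.
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = sse(x1, distances, r2_values)
        else:
            # Minimum lies in [x1, b]; reuse x2 as the new lower probe.
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = sse(x2, distances, r2_values)
    return (a + b) / 2.0


if __name__ == "__main__":
    # Hypothetical noise-free data generated with Ne = 79, for illustration only.
    cs = [0.001, 0.005, 0.01, 0.05, 0.1]
    r2 = [sved_expected_r2(79.0, c) for c in cs]
    print(round(fit_ne(cs, r2), 3))
```

Golden-section search is used here only because it needs nothing beyond the standard library; any one-dimensional optimizer or a library nonlinear least-squares routine would serve the same purpose.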
In Chapter 2, the effective population size of Korean Thoroughbred (TB) horses was estimated. The TB breed is prized for its great racing ability. I examined the genetic diversity and stability of the Korean TB population by estimating its effective population size, using the two approaches mentioned earlier: Sved's equation as the basic approach, and K2P with Watterson's theta estimator as the second approach. The effective population size of Korean TB horses was estimated as 79 (Sved's equation) and 77 (K2P). This is rather small compared with the effective population sizes of TB populations in other countries. For instance, Corbin et al. estimated the effective population size of Irish TB horses as 100, also using Sved's equation, which is based on linkage disequilibrium (LD).
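The second approach relies on Watterson's estimator, θ_W = S / a_n with a_n = Σ_{i=1}^{n-1} 1/i, where S is the number of segregating sites (SNPs) among n sampled chromosomes. The sketch below converts a per-site θ_W into Ne through the diploid relation θ = 4Neμ. The abstract does not specify how the K2P model enters the calculation, so that step is not reproduced, and the mutation rate and other inputs in the usage block are hypothetical placeholders.

```python
"""Sketch: Watterson's estimator of theta and the implied effective population
size, Ne = theta_per_site / (4 * mu), assuming a diploid population."""


def harmonic_a(n: int) -> float:
    """Watterson's correction a_n = sum_{i=1}^{n-1} 1/i for n sampled chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(num_segregating: int, n_chromosomes: int, length_bp: float) -> float:
    """Per-site theta_W = S / (a_n * L)."""
    return num_segregating / (harmonic_a(n_chromosomes) * length_bp)


def ne_from_theta(theta_per_site: float, mu_per_site: float) -> float:
    """Diploid relation theta = 4 * Ne * mu, solved for Ne."""
    return theta_per_site / (4.0 * mu_per_site)


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 SNPs among 20 chromosomes over 1 Mb,
    # with an assumed mutation rate of 1e-8 per site per generation.
    theta = watterson_theta(1000, 20, 1_000_000)
    print(ne_from_theta(theta, 1e-8))
```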
In Chapter 3, I introduced the SNP-SNP relationship matrix (SSRM), which describes the pairwise relationships between SNPs. This matrix can be considered a more advanced and differentiated notion than the genomic relationship matrix (GRM), which is central to genomic best linear unbiased prediction (G-BLUP). GRM captures the relationships between individuals, a crucial concept in mixed models and BLUP, and is a prerequisite for handling random effects effectively in BLUP. SSRM is a novel concept, although it builds on the multivariate normal distribution (MVN) and GRM. SSRM differs from GRM in how relationships are defined: GRM is defined at the individual level, whereas SSRM is defined at the SNP level. SSRM is more difficult to derive and not easily validated. Despite this, it can carry extensive information. I regard SSRM as containing hidden information, whereas GRM may be viewed as a disguised or processed summary of the SNP information. Using SSRM, I analyzed human height data with a mixed model. The Korean Association Resource Phase 3 (KARE3) data from the Ansan and Ansung cohorts contain each individual's traits and SNP information. The main objective was to check the usefulness of SSRM in a mixed model and to compare SSRM-based SNP-GBLUP with SNP-BLUP (single nucleotide polymorphism best linear unbiased prediction), which assumes that SNP effects are independent and identically distributed (IID). First, I derived SSRM theoretically from the probability density function (PDF) of the model using linear algebra. Second, I compared SNP-GBLUP with SNP-BLUP and G-BLUP using human height and SNP data. The genetic values, as well as the SNP effects, differed markedly between SNP-GBLUP and SNP-BLUP.
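The SSRM derivation itself is not given in this abstract and is not reproduced here. As a point of reference, the sketch below shows the two baselines the paragraph compares against: a genomic relationship matrix in the widely used VanRaden (2008, method 1) form, G = ZZ'/(2Σp_j(1 − p_j)) with Z = M − 2p, and SNP-BLUP with IID SNP effects, written as the ridge system (Z'Z + λI)â = Z'(y − ȳ). The GRM scaling and the assumption that λ = σ²_e/σ²_a is known are choices of this example; the thesis may use different ones.

```python
"""Sketch: VanRaden-style GRM and SNP-BLUP with IID SNP effects (ridge form).
Pure Python; genotypes coded 0/1/2; lam is an assumed, known variance ratio."""


def allele_freqs(M):
    """Frequency p_j of the counted allele for each SNP (column of M)."""
    n = len(M)
    return [sum(row[j] for row in M) / (2.0 * n) for j in range(len(M[0]))]


def centered(M, p):
    """Z = M - 2p; columns sum to zero when p comes from the same sample."""
    return [[g - 2.0 * pj for g, pj in zip(row, p)] for row in M]


def grm(M):
    """G = Z Z' / (2 * sum_j p_j (1 - p_j))  (VanRaden 2008, method 1)."""
    p = allele_freqs(M)
    Z = centered(M, p)
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    n = len(Z)
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def snp_blup(M, y, lam):
    """Solve (Z'Z + lam*I) a = Z'(y - mean(y)) for the SNP effects a.

    lam = sigma_e^2 / sigma_a^2 (residual over per-SNP effect variance)."""
    p = allele_freqs(M)
    Z = centered(M, p)
    m = len(p)
    ybar = sum(y) / len(y)
    yc = [yi - ybar for yi in y]
    lhs = [[sum(row[j] * row[k] for row in Z) + (lam if j == k else 0.0)
            for k in range(m)] for j in range(m)]
    rhs = [sum(row[j] * yi for row, yi in zip(Z, yc)) for j in range(m)]
    return solve(lhs, rhs)


def genetic_values(M, a):
    """Genomic values g = Z a for each individual."""
    Z = centered(M, allele_freqs(M))
    return [sum(z * aj for z, aj in zip(row, a)) for row in Z]


if __name__ == "__main__":
    # Made-up genotypes (4 individuals x 3 SNPs) and phenotypes.
    M = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
    y = [1.0, 2.0, 3.0, 2.5]
    print(grm(M))
    effects = snp_blup(M, y, lam=1.0)
    print(effects, genetic_values(M, effects))
```

Because p is estimated from the same sample, every column of Z sums to zero, so the intercept decouples from the SNP effects and its estimate is simply ȳ; this is why the sketch centres y instead of solving for the mean jointly.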
In Chapter 4, I tried to solve the missing heritability problem in BLUP. The missing heritability problem refers to the fact that detected associations cannot fully explain the heritability estimated from correlations between relatives. This problem matters in association analyses such as GWAS as well as in BLUP, which deals with genome-wide genetic variants and complex traits. The phenotypes were eight Berkshire pork quality traits (fat, carcass weight, shear force, Minolta color L, Minolta color A, protein content, water holding capacity and backfat thickness). These are very important economic traits in the pork production industry, and therefore their breeding values (BVs) must be predicted accurately to inform breeding strategies. First, a GWAS scanned the SNPs for putative quantitative trait loci (QTL) for the traits of interest. I arbitrarily chose an unadjusted P-value < 0.01 as the QTL criterion. Then I analyzed the Berkshire traits with the selected SNPs using BLUP. The heritability estimated from BLUP was close to the known heritability estimates, and the results were better than those obtained using all SNPs (the original data) in terms of genomic estimated breeding values (GEBVs) and heritability estimates.
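The QTL screening step can be sketched as a single-marker regression scan that keeps SNPs with an unadjusted P-value below 0.01, the threshold stated above. This is a simplified sketch: it fits y = μ + βx + e for one SNP at a time with no covariates, and it approximates the t distribution with a normal distribution to stay within the standard library, which is reasonable only for large samples. The genotype matrix and phenotypes in the usage block are made up.

```python
"""Sketch: single-SNP regression scan keeping SNPs with unadjusted P < alpha.
P-values use a normal approximation to the t distribution (large-n assumption)."""
import math


def snp_test(x, y):
    """Return (beta, t, p) for the simple regression of y on genotype x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    rss = sum((yi - ybar - beta * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided, normal approximation
    return beta, t, p


def select_snps(M, y, alpha=0.01):
    """Indices of SNPs (columns of M) whose unadjusted P-value is below alpha."""
    keep = []
    for j in range(len(M[0])):
        x = [row[j] for row in M]
        if len(set(x)) < 2:  # skip monomorphic SNPs (no variance to test)
            continue
        _, _, p = snp_test(x, y)
        if p < alpha:
            keep.append(j)
    return keep


if __name__ == "__main__":
    # Made-up data: SNP 0 drives the phenotype, SNP 1 is unrelated to it.
    M = [[0, 0], [0, 0], [0, 2], [0, 2], [2, 0], [2, 0], [2, 2], [2, 2]]
    y = [1.1, 0.9, 0.9, 1.1, 2.9, 3.1, 3.1, 2.9]
    print(select_snps(M, y))
```

The retained columns would then be passed to the BLUP step in place of the full SNP set.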
In Chapter 5, the selection coefficient in the F1 generation (to borrow genetics terminology, the generation following the current one) was predicted using Fisher's fundamental theorem of natural selection and BLUP. Fisher's fundamental theorem of natural selection states: "The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time." Selection is one of the major forces that can change allele frequencies. Thus, it is imperative not only to reveal the history of selection but also to predict future selection trends. The statistical model was an additive linear model, as in BLUP. I calculated the additive genetic variance of each SNP from the SNP effects estimated by SNP-GBLUP and then, using Fisher's theorem, calculated the selection coefficient. The gene ontology (GO) of genes containing significant SNPs was then surveyed. The phenotypes were three Holstein milk traits (milk yield, fat content and protein content), which are very important to dairy farmers. The heritability estimates from BLUP were moderate (0.39, 0.45 and 0.40 for milk yield, fat content and protein content, respectively). The gene catalogue was retrieved from the Ensembl server (www.ensembl.org). The theorem links genetic variance to the selection coefficient. The predicted selection coefficients have three features: they refer to the next generation, and they are expected and relative values. "Expected" means that a selection coefficient obtained by this kind of approach is only a prediction; "relative" means that the predicted values were rescaled by the maximum value, because the magnitude of the values depends on the units of the phenotypic values. The GO terms for genes containing SNPs with high predicted selection coefficients for milk protein included dendritic spine morphogenesis and nitric oxide biosynthetic process; dendritic spine morphogenesis was the most significant term.
Dendritic spines are the major sites of excitatory synaptic transmission in the mammalian brain and are very important in synaptic development and plasticity. Thus, genes related to dendritic spine morphogenesis are expected to be important targets of future artificial selection of Holstein cattle in Korea. No GO terms were significant for milk yield or fat content.
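The per-SNP calculation in Chapter 5 can be sketched as follows, assuming biallelic SNPs in Hardy-Weinberg equilibrium, where a SNP with allele frequency p and estimated effect a contributes additive variance V = 2p(1 − p)a². The abstract does not give the exact formula that maps this variance to a selection coefficient, so the sketch assumes, as a hypothetical simplification, that the expected selection coefficient is proportional to V, treating the milk trait as a proxy for fitness. Dividing by the maximum then yields the "relative" values described in the abstract. The effects and frequencies in the usage block are invented.

```python
"""Sketch: per-SNP additive variance V_j = 2 p_j (1 - p_j) a_j^2, rescaled by its
maximum. Assumption (hypothetical): the expected selection coefficient of each
SNP is proportional to V_j, following Fisher's theorem with the trait as a
proxy for fitness."""


def snp_additive_variance(effects, freqs):
    """V_j = 2 p_j (1 - p_j) a_j^2 under Hardy-Weinberg equilibrium."""
    return [2.0 * p * (1.0 - p) * a * a for a, p in zip(effects, freqs)]


def relative_selection(effects, freqs):
    """Rescale per-SNP variances by the maximum so the top SNP scores 1."""
    v = snp_additive_variance(effects, freqs)
    vmax = max(v)
    if vmax == 0.0:
        raise ValueError("all SNPs have zero additive variance")
    return [vi / vmax for vi in v]


if __name__ == "__main__":
    # Invented SNP effects (e.g. from an SNP-GBLUP fit) and allele frequencies.
    effects = [0.5, -1.0, 0.2]
    freqs = [0.5, 0.1, 0.3]
    print(relative_selection(effects, freqs))
```

Rescaling by the maximum makes the values unit-free, so rankings can be compared across traits measured on different scales.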
Language
English
URI
https://hdl.handle.net/10371/125386
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.