Deciphering genomic variation and effective population size using NGS and SNP data in mammals
Korean title: 차세대 염기서열 및 단일염기다형성 데이터를 이용한 포유류의 유전체 변이와 유효집단크기 해독
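- College of Agriculture and Life Sciences, Department of Agricultural Biotechnology
- Seoul National University Graduate School
- Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Agricultural Biotechnology, August 2014. Kim Heebal (김희발).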
- This doctoral dissertation consists of five studies on mammalian genetic variation and effective population size, based on SNP or NGS data. Effective population size is an essential measure of the size, quality and genetic diversity of an animal population. I also investigated genetic variation associated with economic traits in domesticated animals using SNP data. In addition, I examined copy number variation related to the domestication process of cattle using NGS data.
In chapter 1, I introduce the basic background of, and the need for, the series of studies in this doctoral dissertation.
The effective population size (Ne) is important for assessing the genetic diversity of animal populations. In chapter 2, I characterized linkage disequilibrium more accurately in a sample of 96 dairy cattle producing milk in Korea and estimated Ne at approximately 122. I also inferred historical Ne and found a rapid increase in Ne over the past 10 generations, with a slower increase thereafter. These results can be rationalized using current knowledge of the history of the dairy cattle breeds producing milk in Korea.
In chapter 3, I investigated the common minke whale (Balaenoptera acutorostrata) genome using next generation sequencing. I then estimated the historical effective population size of the minke whale with a coalescent model to determine when the population declined rapidly. As a result, I estimated that minke whale population diversity declined to approximately 3.1%. The most strongly supported time of this decline during the Holocene is approximately 194 to 902 years ago. This whole-genome sequencing offers a chance to better understand the population history of the largest aquatic mammals on earth.
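To illustrate how an LD-based estimate such as the chapter 2 value of Ne ≈ 122 can arise, here is a minimal sketch. It assumes Sved's approximation E[r²] = 1/(1 + 4·Ne·c), with c the genetic distance in Morgans between two markers, and the common convention that markers c Morgans apart reflect Ne roughly 1/(2c) generations ago. The abstract does not state which formula or correction was used, so the function names and the chosen distance below are hypothetical, and no sample-size or relationship-based correction of r² is included.

```python
def expected_r2(ne: float, c: float) -> float:
    """Expected LD (r^2) under Sved's approximation: 1 / (1 + 4 * Ne * c)."""
    if ne <= 0 or c <= 0:
        raise ValueError("ne and c must be positive")
    return 1.0 / (1.0 + 4.0 * ne * c)


def ne_from_ld(r2: float, c: float) -> float:
    """Invert Sved's approximation to recover Ne from an observed mean r^2."""
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if c <= 0:
        raise ValueError("c must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c)


def generations_ago(c: float) -> float:
    """Approximate age (in generations) of the Ne reflected by distance c."""
    if c <= 0:
        raise ValueError("c must be positive")
    return 1.0 / (2.0 * c)


# Round trip with the Ne reported in the abstract (122) at an assumed
# marker distance of 0.01 Morgan; the distance is illustrative only.
r2 = expected_r2(122, 0.01)
recovered_ne = ne_from_ld(r2, 0.01)
```

Binning marker pairs by genetic distance and applying `ne_from_ld` to the mean r² of each bin gives one Ne value per bin, which `generations_ago` places on a time axis. This is one common way to obtain a historical Ne curve, not necessarily the procedure used in the dissertation.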
After characterizing the populations, I investigated genetic variants related to economic traits of domesticated animals. In chapter 4, I identified SNPs related to horse racing performance. The Thoroughbred, a relatively recent horse breed, is best known for its use in horse racing. Although myostatin (MSTN) variants have been reported to be highly associated with horse racing performance, the trait is more likely to be polygenic in nature. I conducted a two-stage genome-wide association study (GWAS) to search for genetic variants associated with the estimated breeding value (EBV) for racing performance. I identified 28 significant SNPs related to 17 genes. Among these, six genes have a function related to myogenesis and five genes are involved in muscle maintenance. To my knowledge, these genes are newly reported as genetically associated with the racing performance of Thoroughbreds. This work complements a recent horse GWAS of racing performance that identified other SNPs and genes as the most significant variants. These results will help to expand our knowledge of the polygenic nature of racing performance in Thoroughbreds.
In chapter 5, I identified SNPs related to milk production in dairy cattle. Holsteins are known as the world's highest milk-producing dairy cattle. I inferred the EBVs for each trait using ridge regression BLUP. I then conducted a multivariate genome-wide association study to search for genetic variants associated with the EBVs for milk production traits using SNP data. I identified 128 significant SNPs related to 47 genes. These genes were related to cellular component localization, protein localization, the intracellular signaling cascade and microtubules. These genes are newly reported as genetically associated with milk production in Holsteins. This work complements a recent Holstein GWAS that identified other SNPs and genes as the most significant variants. These results will help to expand our knowledge of the polygenic nature of milk production in Holsteins.
Finally, I detected cattle copy number variations related to the domestication process, as another source of genetic variation besides SNPs. Copy number variation (CNV), a source of genetic diversity in mammals, has been shown to underlie biological functions related to production traits. Nevertheless, few studies have examined CNVs using next generation sequencing at the population level. I used NGS data from ten Holsteins, a dairy breed, and 22 Hanwoo, a beef breed. The sequence data for the 32 animals ranged from 13.58-fold to almost 20-fold coverage. I detected a total of 6,811 deleted CNVs across the analyzed individuals (average length = 2,732.2 bp), corresponding to 0.74% of the cattle genome (18.6 Mbp of variable sequence). By examining the overlap between CNV deletion regions and genes, I selected the 30 genes with the highest deletion scores. These genes were found to be related to the nervous system, more specifically to nervous transmission, neuron motion and neurogenesis. I regarded these genes as having been affected by the domestication process. Further analysis of the CNV genotyping information revealed 94 putatively selected CNVs and 954 breed-specific CNVs. This study provides useful information for assessing the impact of CNVs on cattle traits using NGS data at the population level.
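The CNV totals quoted above can be cross-checked with simple arithmetic. The sketch below uses only the figures given in the abstract (6,811 deletions, mean length 2,732.2 bp, 0.74% of the genome). The implied genome size is a derived quantity rather than a value reported in the dissertation, and the variable names are illustrative.

```python
# Figures taken from the abstract.
n_deleted_cnvs = 6811
mean_length_bp = 2732.2
genome_fraction = 0.0074  # 0.74% of the cattle genome

# Total variable sequence = number of CNVs x mean CNV length.
total_variable_bp = n_deleted_cnvs * mean_length_bp
total_variable_mbp = total_variable_bp / 1e6

# Genome size implied by the stated fraction (derived, not reported).
implied_genome_gbp = total_variable_bp / genome_fraction / 1e9

print(f"Total variable sequence: {total_variable_mbp:.1f} Mbp")
print(f"Implied genome size: {implied_genome_gbp:.2f} Gbp")
```

The product reproduces the 18.6 Mbp stated in the abstract, and the implied genome size of roughly 2.5 Gbp is of the order expected for a cattle reference assembly, so the three reported numbers are mutually consistent.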