Detailed Information

A Statistical Method on DNA Methylation Calling and its Application with Next generation sequencing technique

DC Field | Value | Language
dc.contributor.advisor | 박태성 | -
dc.contributor.author | 허익수 | -
dc.date.accessioned | 2017-07-14T00:31:37Z | -
dc.date.available | 2017-07-14T00:31:37Z | -
dc.date.issued | 2015-08 | -
dc.identifier.other | 000000066578 | -
dc.identifier.uri | https://hdl.handle.net/10371/121154 | -
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Statistics, 2015. 8. 박태성. | -
dc.description.abstract | Epigenomics is the study of biological factors that induce mitotically and/or meiotically heritable changes in gene function without changes in the deoxyribonucleic acid (DNA) sequence. Representative mechanisms behind such changes are DNA methylation and histone modification. These two mechanisms control gene expression by changing the affinities of transcription factors and/or altering patterns of DNA packing, rather than by altering the underlying DNA sequence. These epigenetic processes play roles in imprinting, gene silencing, X chromosome inactivation, position effect, maternal effects, and the progress of carcinogenesis. Therefore, the study of epigenetic processes is rapidly growing in importance.
In particular, DNA methylation is one of the most interesting and vigorously studied phenomena. Methylation is defined as the addition of a methyl group to a substrate, or the substitution of an atom or molecular group by a methyl group. In DNA methylation, the methyl group is added to cytosine, one of the four nucleotides: guanine (G), adenine (A), thymine (T), and cytosine (C). A methylated cytosine becomes 5-methylcytosine. This process occurs most actively at cytosines that are immediately followed by a guanine in the DNA sequence; such positions are called CpG sites.
Many biochemical techniques have been developed to measure the level of methylation. The earliest techniques are based on immunoprecipitation, using fluorescent antibodies that bind to 5-methylcytosine (methylated DNA immunoprecipitation, MeDIP). The whole genome is divided into many small regions, such as genes, and the corresponding DNA sequences are placed at their own locations on a microarray panel. After the sequences carrying fluorescent antibodies are hybridized to the reference sequences on the microarray panel, the light intensity of every microarray spot is measured, and these intensities are regarded as the methylation intensities of the genomic regions (MeDIP-chip technique).
MeDIP-chip techniques are widely used to obtain the methylation levels of genomic regions. However, microarray-based approaches are limited because they use pre-defined sequences. To overcome this problem, next-generation sequencing (NGS) techniques were combined with the MeDIP approach (MeDIP-seq). After the DNA fragments bound by fluorescent antibodies are sequenced and mapped to the reference genome, methylated CpG regions and their intensities can be obtained from the sequencing read depth, and we can then tell whether a genomic region is methylated or non-methylated. However, MeDIP-seq also has low resolution (several dozen base pairs), because it only counts the number of mapped methylated DNA fragments as the methylation intensity.
To overcome these limits of MeDIP, a method based on NGS and bisulfite treatment was recently developed. To discriminate non-methylated from methylated cytosines, the technique uses bisulfite treatment, which converts non-methylated cytosine into uracil (read as thymine after sequencing); the methylation level is then estimated from the ratio between the numbers of cytosines and thymines observed at each CpG cytosine site. Because the technique yields methylation information at base-pair resolution, new statistical methods are needed to handle this new type of data. The first issue is to develop a method that classifies binary methylation status (methylation calling), and the second is to develop a method that detects differentially methylated regions (DMRs).
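As a minimal illustration of the per-site estimate described above, the methylation level at a CpG site can be sketched as the fraction of reads that still show cytosine after bisulfite treatment. The function name and the read counts below are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Tuple


def methylation_level(c_count: int, t_count: int) -> Optional[float]:
    """Fraction of reads at one CpG site that still read as C after bisulfite treatment.

    c_count: reads showing cytosine (protected by methylation)
    t_count: reads showing thymine (converted, i.e. unmethylated)
    """
    depth = c_count + t_count
    if depth == 0:
        return None  # no coverage, so the level is undefined
    return c_count / depth


# Hypothetical (C reads, T reads) pairs for four CpG sites.
sites: List[Tuple[int, int]] = [(9, 1), (0, 12), (3, 3), (0, 0)]
levels = [methylation_level(c, t) for c, t in sites]
```

Sites with very low depth give unstable ratios, which is one reason a statistical calling step is needed on top of the raw level.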
For these two issues, we propose two statistical approaches. For binary methylation calling, we propose a new classification tool using a Bayes classifier and local information (Bis-Class). The method exploits the biological phenomenon that methylation statuses are spatially correlated; it therefore performs better than the binomial test with false discovery rate (FDR) control, especially under low coverage depth and low methylation level. We show the advantages of our method through simulation and real data analysis using a honeybee dataset. For detecting differentially methylated regions (DMRs), we propose a modified Cochran-Mantel-Haenszel (CMH) statistic.
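The baseline that Bis-Class is compared against, the binomial test with FDR control, can be sketched roughly as follows. This is not the Bis-Class method itself, whose details are not given in this record. The sketch assumes a one-sided binomial test of the C-read count against a fixed error rate (for example, the bisulfite non-conversion rate) followed by the Benjamini-Hochberg procedure; the error rate of 0.01, the function names, and the read counts are all illustrative assumptions.

```python
from math import comb
from typing import List, Tuple


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; returns one rejection flag per p-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank r with p_(r) <= r * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


def call_methylation(
    sites: List[Tuple[int, int]], error_rate: float = 0.01, alpha: float = 0.05
) -> List[bool]:
    """Call a site methylated when its C-read count exceeds what the error rate explains."""
    pvals = [binom_upper_tail(c, c + t, error_rate) for c, t in sites]
    return benjamini_hochberg(pvals, alpha)


# Hypothetical (C reads, T reads) pairs; error_rate is an assumed non-conversion rate.
sites = [(9, 1), (0, 12), (1, 2), (5, 15)]
calls = call_methylation(sites)
```

In this toy input the third site is called methylated from a single C read at depth 3, which illustrates how little information a per-site test has under low coverage.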
The original CMH statistic was proposed to test conditional independence between two variables under stratification. However, because there is substantial spatial correlation of methylation levels between adjacent CpG cytosine sites, we additionally include a spatial correlation structure and impose biological importance weights on the binary-called base-pair resolution information. Moreover, the method can be applied to a wider range of situations, such as the analysis of ordinal or multinomial responses. We compared our method with Fisher's exact test, which has been used for binary-called bisulfite sequencing data. Using the modified CMH test, we can avoid type I error inflation and handle multiple biological replicates in each experimental group. We also conducted a simulation study and a real data analysis using the honeybee bisulfite sequencing dataset to detect differentially methylated regions.
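For orientation, the classical (unmodified) CMH test for a set of 2x2 tables can be sketched as below. The spatial correlation structure and biological weights of the modified statistic are not specified in this record, so they are not reproduced here. The sketch assumes one 2x2 table per CpG site (rows: experimental group, columns: samples called methylated or unmethylated); the table layout, function name, and counts are illustrative assumptions.

```python
from math import erfc, sqrt
from typing import List, Tuple

# One stratum: (a, b, c, d) = (group 1 methylated, group 1 unmethylated,
#                              group 2 methylated, group 2 unmethylated)
Table = Tuple[int, int, int, int]


def cmh_test(tables: List[Table], correction: bool = True) -> Tuple[float, float]:
    """Classical Cochran-Mantel-Haenszel test for K stratified 2x2 tables.

    Returns (statistic, p-value); the statistic is referred to a chi-square
    distribution with 1 degree of freedom.
    """
    diff = 0.0  # sum over strata of a_k - E[a_k]
    var = 0.0   # sum over strata of Var(a_k) under the null hypothesis
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 observations is uninformative
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0:
        raise ValueError("no informative strata")
    num = max(abs(diff) - (0.5 if correction else 0.0), 0.0)
    stat = num**2 / var
    p_value = erfc(sqrt(stat / 2))  # chi-square (1 df) survival function
    return stat, p_value


# Hypothetical region with three CpG sites and five replicates per group.
tables = [(4, 1, 1, 4), (5, 0, 2, 3), (3, 2, 1, 4)]
stat, p_value = cmh_test(tables)
```

Unlike a per-site Fisher's exact test on pooled counts, this form keeps each site as its own stratum, so consistent differences across neighboring sites accumulate in a single regional statistic.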
We expect that our proposed methods for handling bisulfite sequencing data from NGS techniques will be widely used to elucidate biological relationships between epigenetic data and many biological endpoints such as cancer, aging, and gene silencing.
-
dc.description.tableofcontents | Contents

Abstract i
Contents v
List of Figures viii
List of Tables x

1 Introduction 1
1.1 Background of Deoxyribonucleic acid (DNA) Methylation process 1
1.1.1 Definition of Methylation process 1
1.1.2 Basic Biological functions of Methylation process 2
1.2 Review of methylation measuring techniques 4
1.2.1 Immunoprecipitation-based methylation measuring technique (MeDIP) 4
1.2.2 Bisulfite-sequencing based methylation level measuring technique (BS-seq) 7
1.3 Purpose of this study 9
1.4 Outline of the thesis 10

2 Overview of methylation measuring methods 11
2.1 Regional measuring methods 11
2.1.1 Application to explore relationship between transcriptional noise and DNA methylation 12
2.2 Base-pair resolution measuring method 22
2.2.1 Experimental errors considered in Statistical test and Binomial test using False discovery rate (FDR) 22

3 A new classification tool of methylation status using Bayes classifier and local methylation information 25
3.1 Introduction 25
3.2 Methods 31
3.2.1 Binomial test using FDR and its limit 31
3.2.2 Bis-class 34
3.3 Material and its description: Honeybee dataset 39
3.4 Simulation study 41
3.5 Application to real dataset 51
3.5.1 Calling of honeybee (Insect) dataset and validation of our method 51
3.6 Conclusion 63

4 Application to real dataset and detecting differentially methylated region (DMR) analysis 64
4.1 Introduction of DMR method 64
4.1.1 Fisher's exact test using binary calling dataset and its limit 64
4.2 Methods 69
4.2.1 Overview of the Cochran-Mantel-Haenszel (CMH) test 69
4.2.2 Application of the CMH Method to BS-seq Data 71
4.3 Application to real dataset 79
4.3.1 Detecting DMRs using honeybee dataset and validation of our method 79
4.4 Conclusion 85

5 Summary and Conclusion 86

References 89
Abstract (Korean) 97
-
dc.format | application/pdf | -
dc.format.extent | 3596552 bytes | -
dc.format.medium | application/pdf | -
dc.language.iso | en | -
dc.publisher | Seoul National University Graduate School | -
dc.subject | DNA Methylation | -
dc.subject | Next Generation Sequencing (NGS) | -
dc.subject | Bisulfite treatment | -
dc.subject | Methylation binary calling | -
dc.subject | Differential Methylation Region (DMR) test | -
dc.subject.ddc | 519 | -
dc.title | A Statistical Method on DNA Methylation Calling and its Application with Next generation sequencing technique | -
dc.type | Thesis | -
dc.contributor.AlternativeAuthor | Iksoo Huh | -
dc.description.degree | Doctor | -
dc.citation.pages | x, 99 | -
dc.contributor.affiliation | College of Natural Sciences, Department of Statistics | -
dc.date.awarded | 2015-08 | -
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
