A comparison study of statistical methods for the analysis of metagenome data : 메타게놈 데이터 분석을 위한 통계적 방법론 비교

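DC Field | Value | Language
dc.contributor.advisor | 박태성 | -
dc.contributor.author | 이찬영 (Chanyoung Lee) | -
dc.date.accessioned | 2018-05-29T05:11:22Z | -
dc.date.available | 2019-04-17 | -
dc.date.issued | 2018-02 | -
dc.identifier.other | 000000149487 | -
dc.identifier.uri | https://hdl.handle.net/10371/142485 | -
dc.description | Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, February 2018. 박태성. | -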
dc.description.abstract | A comparison study of statistical methods for the analysis of metagenome data

Chanyoung Lee
Interdisciplinary Program in Bioinformatics
The Graduate School
Seoul National University

With the advent of next-generation sequencing (NGS) technology, sequencing microorganisms from diverse samples has made it possible to test for associations between microbial features and environmental conditions. Several statistical methods have been proposed for analyzing metagenome data, including Metastats, metagenomeSeq, ZIBSeq, ANCOM, edgeR, and DESeq2. Each method makes its own distributional and modeling assumptions. Although some comparative studies of these methods exist, their scope is limited and their results vary depending on how the simulation datasets were generated. In this study, we systematically investigated the properties of these statistical methods for finding differentially abundant features (DAFs). In addition, we applied a permutation-based logistic regression model with centered log-ratio transformation (CLR Perm) to metagenome data. We compared the performance of the methods using simulation data generated from the Human Microbiome Project (HMP). We first assessed the type I error rate of each method at different levels of sparsity. CLR Perm, metagenomeSeq, and ANCOM preserved the type I error rate well regardless of sparsity. In the power comparison, CLR Perm showed the highest power among the methods that preserved the type I error rate. Furthermore, we applied the methods to real colorectal cancer (CRC) data and compared the results with existing taxonomic markers of CRC. In conclusion, we recommend using CLR Perm and metagenomeSeq in combination for the analysis of metagenome data, because the two methods identify different sets of significant taxa.
-
dc.description.tableofcontents | 1 Introduction
2 Material and Methods
2.1 Simulation materials (HMP)
2.2 Colorectal cancer data
2.3 Existing methods
2.4 Permutation logistic regression with centered log-ratio transformation (CLR Perm)
3 Simulation
3.1 Simulation model
3.2 Power and type I error rate
4 Results
4.1 Simulation results
4.2 Colorectal cancer data results
5 Discussion
Bibliography
-
dc.format | application/pdf | -
dc.format.extent | 4164810 bytes | -
dc.format.medium | application/pdf | -
dc.language.iso | en | -
dc.publisher | Seoul National University Graduate School | -
dc.subject | Differentially abundant feature | -
dc.subject | Metagenome | -
dc.subject | 16S rRNA | -
dc.subject | Association test | -
dc.subject.ddc | 574.8732 | -
dc.title | A comparison study of statistical methods for the analysis of metagenome data | -
dc.title.alternative | 메타게놈 데이터 분석을 위한 통계적 방법론 비교 | -
dc.type | Thesis | -
dc.description.degree | Master | -
dc.contributor.affiliation | College of Natural Sciences, Interdisciplinary Program in Bioinformatics | -
dc.date.awarded | 2018-02 | -
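
The abstract names a centered log-ratio (CLR) transformation followed by a permutation test but does not spell out the computation. The following is a minimal sketch of the general idea only, not the thesis's implementation. It makes several assumptions: the function names `clr_transform` and `permutation_p_value` are hypothetical, the pseudocount of 0.5 used to handle zero counts is an arbitrary choice, and the test statistic is a simple difference in group means swapped in for the logistic regression model that the thesis uses.

```python
import math
import random


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption, not taken from the thesis) keeps
    the logarithm defined for zero counts.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    `labels` holds 0/1 group membership and must contain both groups.
    Shuffling the labels keeps the group sizes fixed.
    """
    rng = random.Random(seed)

    def mean_diff(lab):
        g1 = [v for v, l in zip(values, lab) if l == 1]
        g0 = [v for v, l in zip(values, lab) if l == 0]
        return sum(g1) / len(g1) - sum(g0) / len(g0)

    observed = abs(mean_diff(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy counts: rows are samples, columns are taxa (made-up numbers).
    samples = [[120, 3, 0, 40], [95, 0, 2, 51], [110, 5, 1, 38],
               [20, 30, 4, 60], [15, 42, 0, 70], [25, 36, 3, 55]]
    labels = [0, 0, 0, 1, 1, 1]
    clr = [clr_transform(s) for s in samples]
    first_taxon = [row[0] for row in clr]
    print(permutation_p_value(first_taxon, labels))
```

Each taxon would be tested in turn on its CLR-transformed values. With real data, a correction for multiple testing across taxa would normally follow.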
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.