A comparison study of statistical methods for the analysis of metagenome data (Korean title: A comparison of statistical methodologies for metagenome data analysis)

Authors

Chanyoung Lee

Advisor
Taesung Park
Major
Interdisciplinary Program in Bioinformatics, College of Natural Sciences
Issue Date
2018-02
Publisher
The Graduate School, Seoul National University
Keywords
Differentially abundant feature; Metagenome; 16S rRNA; Association test
Description
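Thesis (Master's) -- The Graduate School, Seoul National University: Interdisciplinary Program in Bioinformatics, College of Natural Sciences, February 2018. Advisor: Taesung Park.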
Abstract
A comparison study of statistical methods for the analysis of metagenome data

Chanyoung Lee
Interdisciplinary Program in Bioinformatics
The Graduate School
Seoul National University

With the advent of next-generation sequencing (NGS) technology, sequencing microorganisms from diverse samples has made it possible to analyze associations between microbial features and their environments. Several statistical methods have been proposed for analyzing metagenome data, including Metastats, metagenomeSeq, ZIBSeq, ANCOM, edgeR, and DESeq2, each built on its own distributional and modeling assumptions. Although some comparative studies of these methods exist, their scope is limited, and their conclusions have varied depending on how the simulation datasets were generated. In this study, we systematically investigate the properties of these statistical methods for finding differentially abundant features (DAF). In addition, we apply a centered log-ratio transformation combined with a permutation-based logistic regression model (CLR Perm) to metagenome data. We compare the performance of all methods using simulation data generated from the Human Microbiome Project (HMP). We first assessed the type I error rate of each method across different levels of sparsity: CLR Perm, metagenomeSeq, and ANCOM maintained their type I error rates regardless of sparsity. In the power comparison, CLR Perm showed the highest power among the methods that controlled the type I error. Furthermore, we applied the methods to real colorectal cancer (CRC) data and compared the results with known taxonomic markers of CRC. In conclusion, because CLR Perm and metagenomeSeq identify different lists of significant taxa, we recommend using the two in combination for the analysis of metagenome data.
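The CLR Perm approach summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions and is not the thesis's implementation. The following choices are all hypothetical: the pseudocount of 0.5 used to handle zero counts, modeling group status by logistic regression on one taxon's CLR abundance at a time, using the absolute slope as the test statistic, the Newton-Raphson fit, and the helper names `clr`, `logistic_slope`, and `clr_perm_test`.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio (CLR) transform of a samples x taxa count matrix.

    Each sample's log-abundances are centred on that sample's mean log-abundance,
    which is the same as dividing by the geometric mean before taking logs.
    The pseudocount that keeps log(0) finite is an assumed value.
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def logistic_slope(x, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood slope b1 of logit P(y=1) = b0 + b1 * x, via Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)  # guard exp() against overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        # Observed information matrix; a tiny ridge keeps it invertible.
        hess = X.T @ (X * w[:, None]) + 1e-8 * np.eye(2)
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta[1]


def clr_perm_test(counts, groups, n_perm=999, seed=0):
    """Per-taxon permutation p-values (hypothetical sketch of a CLR + permutation test).

    For each taxon, the observed |slope| of a logistic regression of group
    status on CLR abundance is compared with |slope| values obtained after
    randomly permuting the group labels.
    """
    rng = np.random.default_rng(seed)
    z = clr(counts)
    y = np.asarray(groups, dtype=float)
    pvals = np.empty(z.shape[1])
    for j in range(z.shape[1]):
        x = z[:, j]
        observed = abs(logistic_slope(x, y))
        null = np.array(
            [abs(logistic_slope(x, rng.permutation(y))) for _ in range(n_perm)]
        )
        # Add-one correction so that a permutation p-value is never exactly 0.
        pvals[j] = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return pvals
```

Permuting the group labels keeps the compositional structure of each sample intact while breaking any link between abundance and group. This is also why a permutation test can keep the type I error near its nominal level without a parametric count model. In practice, the per-taxon p-values would still need a multiple-testing correction.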
Language
English
URI
https://hdl.handle.net/10371/142485
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
