A comparison study of statistical methods for the analysis of metagenome data
- Authors
- Chanyoung Lee
- Advisor
- Taesung Park (박태성)
- Major
- College of Natural Sciences, Interdisciplinary Program in Bioinformatics
- Issue Date
- 2018-02
- Publisher
- Seoul National University Graduate School
- Keywords
- Differentially abundant feature ; Metagenome ; 16S rRNA ; Association test
- Description
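- Thesis (Master's) -- Seoul National University Graduate School : College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2018. 2. Taesung Park (박태성).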
- Abstract
- A comparison study of statistical methods for the analysis of metagenome data
Chanyoung Lee
Interdisciplinary Program in Bioinformatics
The Graduate School
Seoul National University
With the advent of next-generation sequencing (NGS) technology, microorganisms from a wide variety of samples can be sequenced, which facilitates association analysis between features and the environment. Several statistical methods have been proposed for analyzing metagenome data, including Metastats, metagenomeSeq, ZIBSeq, ANCOM, edgeR, and DESeq2. Each method makes its own distributional and model assumptions. Although some comparative studies of these methods exist, the comparisons are rather limited, and their results vary depending on how the simulation datasets were generated. In this study, we systematically investigate the properties of these statistical methods for finding differentially abundant features (DAFs). In addition, we apply a centered log-ratio transformation followed by a permutation logistic regression model (CLR Perm) to metagenome data. We compare the performance of the methods using simulation data generated from the Human Microbiome Project (HMP). We first assessed the type I error rate of each method at different levels of sparsity. CLR Perm, metagenomeSeq, and ANCOM preserved the type I error rate well regardless of sparsity. In the power comparison, CLR Perm showed the highest power among the methods that preserved the type I error rate. Furthermore, we applied the methods to real colorectal cancer (CRC) data and compared our results with existing taxonomic markers of CRC. In conclusion, we recommend using CLR Perm and metagenomeSeq in combination for the analysis of metagenome data, because the lists of significant taxa discovered by the two methods differ.
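The abstract does not spell out how CLR Perm is computed, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes a pseudocount of 0.5 to handle zero counts, and it uses the logistic-regression score statistic for the slope (evaluated under the intercept-only null) as the permuted test statistic, so that no model refitting or external library is needed. The function names (`clr`, `score_stat`, `permutation_pvalue`, `clr_perm`) and the toy counts are hypothetical.

```python
import math
import random


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount is an assumption; it avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def score_stat(x, y):
    """Score statistic for the slope in a logistic regression of y on x,
    evaluated at the intercept-only null: sum_i x_i * (y_i - mean(y))."""
    y_bar = sum(y) / len(y)
    return sum(xi * (yi - y_bar) for xi, yi in zip(x, y))


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value obtained by shuffling the labels y."""
    rng = random.Random(seed)
    observed = abs(score_stat(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(score_stat(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


def clr_perm(count_table, labels, n_perm=999, seed=0):
    """Per-taxon p-values; count_table is a list of samples (rows),
    each a list of taxon counts, and labels holds 0/1 group membership."""
    transformed = [clr(sample) for sample in count_table]
    n_taxa = len(count_table[0])
    return [
        permutation_pvalue([row[j] for row in transformed], labels, n_perm, seed)
        for j in range(n_taxa)
    ]


# Toy data (hypothetical): 4 controls followed by 4 cases, 3 taxa each.
toy_counts = [
    [5, 40, 55], [3, 50, 47], [6, 45, 49], [4, 42, 54],
    [60, 20, 20], [70, 15, 15], [65, 18, 17], [80, 10, 10],
]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
p_values = clr_perm(toy_counts, toy_labels)
```

Because the data are compositional, a large shift in one taxon changes the CLR values of every other taxon in the same sample, so the p-values in a toy table this small are not independent of one another. In a real analysis the per-taxon p-values would also need a multiple-testing correction.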
- Language
- English
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.