Development of prokaryotic taxonomy-based 16S rRNA and genome database

Seok-Hwan Yoon (윤석환)

Seoul National University Central Library

S-Space

Development of prokaryotic taxonomy-based 16S rRNA and genome database (Korean title: Development of a 16S rRNA gene and genome database based on prokaryotic taxonomy)

DC Field	Value	Language
dc.contributor.advisor	Jongsik Chun (천종식)	-
dc.contributor.author	Seok-Hwan Yoon (윤석환)	-
dc.date.accessioned	2017-10-27T17:12:13Z	-
dc.date.available	2017-10-27T17:12:13Z	-
dc.date.issued	2017-08	-
dc.identifier.other	000000145201	-
dc.identifier.uri	https://hdl.handle.net/10371/137145	-
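dc.description	Thesis (Ph.D.), Seoul National University Graduate School, College of Natural Sciences, School of Biological Sciences, August 2017. Advisor: Jongsik Chun.	-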
dc.description.abstract	In prokaryotic taxonomy, the 16S ribosomal RNA (rRNA) gene sequence-based approach has served as the standard alternative to DNA-DNA hybridization (DDH): a 16S rRNA gene sequence similarity of 97% was considered equivalent to a DDH value of 70% for species demarcation. Although the 16S rRNA gene alone cannot perfectly classify and identify bacterial and archaeal species, it is currently the most broadly applicable tool for evaluating the taxonomic position of a prokaryotic strain at the genus or species level. The 16S rRNA-based approach therefore remains important in prokaryotic classification, and a database of taxonomically well-curated sequences, such as EzTaxon-e, is essential for accurate species identification. Recent advances in DNA sequencing technology, collectively called next-generation sequencing (NGS), have facilitated culture-independent microbial community analysis using the 16S rRNA gene, as well as the use of genome sequencing data for more informative and precise classification and identification of Bacteria and Archaea. Because the current species definition is based on comparing the genome sequences of the type strain and other strains of a given species, a genome database with accurate taxonomic information is a high priority, both for exploring prokaryotic diversity and discovering new species and for routine identification. In this study, an integrated database called EzBioCloud was constructed to hold the taxonomic hierarchy of Bacteria and Archaea, represented by quality-controlled 16S rRNA gene and genome sequences. The bioinformatics pipelines, tools, and algorithms applied during database construction were also developed so that the database contents can be used optimally.
For more efficient 16S rRNA-based analysis, the pairwise sequence alignment algorithm was improved, and a new high-performance microbial community analysis pipeline was developed to handle massive NGS datasets and to produce better results than conventional methods. For whole-genome-based analyses, quality assessment methods for genome assemblies and a genome annotation pipeline were developed and evaluated. The full-length 16S rRNA gene extraction method and an efficient average nucleotide identity (ANI) calculation algorithm were used to identify public prokaryotic genomes. To construct the integrated genome database, whole-genome assemblies in the NCBI Assembly Database were first screened to detect low-quality genomes and were then subjected to a composite identification pipeline that employed gene-based searches followed by ANI calculation. The resulting database contained 61,700 species/phylotypes, including 13,132 with validly published names, and 62,362 whole-genome assemblies taxonomically identified at the genus, species, and subspecies levels. Genomic properties such as genome size and GC content, as well as occurrence in human microbiome data, were calculated for each genus and higher taxon. This comprehensive database of taxonomy, 16S rRNA gene, and genome sequences, together with its bioinformatics tools, should accelerate genome-based classification and identification of Bacteria and Archaea. The database and related search tools are available at http://www.ezbiocloud.net/.	-
dc.description.tableofcontents	CHAPTER 1 General introduction
  1.1. Taxonomy of prokaryotes
    1.1.1. Principle of prokaryotic taxonomy
    1.1.2. Prokaryotic species concept
  1.2. Next-generation sequencing (NGS)
    1.2.1. 454 Pyrosequencing
    1.2.2. Illumina-Solexa sequencing
    1.2.3. Pacific Biosciences SMRT sequencing
  1.3. Use of 16S rRNA gene in microbiology
  1.4. Prokaryotic genomics
  1.5. Objectives of this study
CHAPTER 2 Development of bioinformatics pipelines and tools for EzBioCloud database
  2.1. Introduction
    2.1.1. 16S rRNA-based prokaryote identification algorithm
    2.1.2. Microbial community analysis
    2.1.3. 16S rRNA sequence in genome with short-read sequencing data
    2.1.4. Public genome data of prokaryotes
    2.1.5. Quality of genome assembly
    2.1.6. Average nucleotide identity
  2.2. Materials and methods
    2.2.1. Improvement of 16S rRNA sequence-based identification algorithm
    2.2.2. Development of microbial taxonomic profiling (MTP) pipeline
    2.2.3. Method for extracting full-length 16S rRNA genes from short-read sequencing data
    2.2.4. Pipeline for prokaryotic whole-genome analysis
    2.2.5. Methods for the quality assessment of genomes
    2.2.6. Efficient calculation method for average nucleotide identity
  2.3. Results
    2.3.1. Advanced microbial taxonomic profiling (MTP) pipeline
    2.3.2. Comparison of full-length 16S rRNA extraction methods
    2.3.3. Annotation of public genomes
    2.3.4. Quality of bacterial genomes
    2.3.5. Evaluation of algorithms for average nucleotide identity
  2.4. Discussion
CHAPTER 3 Development of EzBioCloud: A taxonomically united database of 16S rRNA and whole-genome assemblies
  3.1. Introduction
  3.2. Methods
    3.2.1. Data collection
    3.2.2. Identification of genome sequences
    3.2.3. Calculation of genomic features for each taxon
    3.2.4. Bacterial community analysis of human microbiome
    3.2.5. Operating system and software development
  3.3. Results
    3.3.1. Comparison of databases
    3.3.2. Hierarchical taxonomic backbone
    3.3.3. Identification of genome projects
    3.3.4. Genome-derived information
  3.4. Discussion
CHAPTER 4 General conclusions
REFERENCES
Abstract (in Korean)	-
dc.format	application/pdf	-
dc.format.extent	3762145 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Bioinformatics	-
dc.subject	Pipeline	-
dc.subject	Database	-
dc.subject	16S rRNA	-
dc.subject	Genome	-
dc.subject	Taxonomy	-
dc.subject	Microbiome	-
dc.subject	Next-generation sequencing	-
dc.subject	Prokaryote	-
dc.subject.ddc	570	-
dc.title	Development of prokaryotic taxonomy-based 16S rRNA and genome database	-
dc.title.alternative	Development of a 16S rRNA gene and genome database based on prokaryotic taxonomy (English translation of the Korean title)	-
dc.type	Thesis	-
dc.contributor.AlternativeAuthor	Seok-Hwan Yoon	-
dc.description.degree	Doctor	-
dc.contributor.affiliation	College of Natural Sciences, School of Biological Sciences	-
dc.date.awarded	2017-08	-
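The abstract states that a 16S rRNA gene similarity of 97% was treated as equivalent to 70% DDH, and that the pairwise alignment algorithm used for 16S-based identification was improved. The Python sketch below illustrates only the general idea: it uses a textbook Needleman-Wunsch global alignment with a linear gap penalty (the scores match=1, mismatch=-1, gap=-2 are assumptions), computes percent identity after trimming terminal gaps, and compares the result with the 97% cut-off. It is not the improved algorithm developed in the thesis, and all function names are hypothetical.

```python
SPECIES_CUTOFF = 97.0  # 16S similarity treated as roughly equivalent to 70% DDH (per the abstract)


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity over the aligned region, ignoring leading and trailing (terminal) gaps."""
    cols = list(zip(aln_a, aln_b))
    both = [k for k, (x, y) in enumerate(cols) if x != "-" and y != "-"]
    if not both:
        return 0.0
    core = cols[both[0]:both[-1] + 1]
    matches = sum(1 for x, y in core if x == y and x != "-")
    return 100.0 * matches / len(core)


def likely_same_species(seq1: str, seq2: str, cutoff: float = SPECIES_CUTOFF) -> bool:
    """Hypothetical helper: compare the 16S identity of two sequences with the cut-off."""
    return percent_identity(*global_align(seq1, seq2)) >= cutoff
```

Trimming terminal gaps keeps a shorter, partial 16S sequence from being penalized for the region it does not cover; whether end gaps count toward identity is a design choice that differs between tools.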
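The abstract mentions quality assessment of genome assemblies and screening of NCBI assemblies for low-quality genomes, but the record does not name the criteria. As an illustration only, the sketch below computes N50, a standard contiguity statistic, and applies a fragmentation screen whose thresholds are hypothetical parameters supplied by the caller; the measures actually used in the thesis may differ.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L together cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def passes_contiguity_screen(contig_lengths: list[int], *, max_contigs: int, min_n50: int) -> bool:
    """Hypothetical screen that rejects overly fragmented assemblies.

    Both thresholds are illustrative parameters, not values taken from the thesis.
    """
    return len(contig_lengths) <= max_contigs and n50(contig_lengths) >= min_n50


if __name__ == "__main__":
    lengths = [100, 200, 300, 400]
    print(n50(lengths))  # 300: contigs of 400 + 300 bp already cover half of 1,000 bp
```

Requiring the thresholds as keyword arguments, with no defaults, avoids implying that any particular cut-off is standard.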
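ANI appears in the abstract both for identifying public genomes and as the final step of the composite identification pipeline. The sketch below illustrates the classic fragment-based definition behind BLAST-based ANIb: the query genome is cut into consecutive 1,020-bp fragments, each fragment's best match in the other genome is found, and the identities of matches above 30% are averaged. To stay self-contained, it replaces BLAST with an exhaustive ungapped search on both strands, so the coverage filter of ANIb is not needed. This makes it much slower and cruder than real tools. It is not the efficient ANI algorithm developed in the thesis, and the function names are hypothetical. ANI values of roughly 95 to 96% are commonly used as a species boundary.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def fragments(genome: str, size: int = 1020) -> list[str]:
    """Consecutive, non-overlapping fragments; a trailing partial fragment is dropped."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, size)]


def best_window_identity(frag: str, subject: str) -> float:
    """Best ungapped percent identity of frag against any same-length window, on either strand."""
    k = len(frag)
    best = 0.0
    for strand in (subject, reverse_complement(subject)):
        for start in range(len(strand) - k + 1):
            same = sum(x == y for x, y in zip(frag, strand[start:start + k]))
            best = max(best, 100.0 * same / k)
    return best


def one_way_ani(query: str, subject: str, size: int = 1020, min_identity: float = 30.0) -> float:
    """Mean best-hit identity of the query fragments whose identity exceeds min_identity."""
    hits = [best_window_identity(f, subject) for f in fragments(query, size)]
    kept = [h for h in hits if h > min_identity]
    return sum(kept) / len(kept) if kept else 0.0


def reciprocal_ani(a: str, b: str, size: int = 1020) -> float:
    """Average of the two one-way values, since one-way ANI is not symmetric."""
    return (one_way_ani(a, b, size) + one_way_ani(b, a, size)) / 2
```

The exhaustive window scan is quadratic in genome length and is only practical for toy inputs; real pipelines rely on fast aligners for the per-fragment search.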
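The abstract notes that genome size and GC content were calculated for each genus and higher taxon. The following is a minimal sketch of that aggregation. It assumes a hypothetical in-memory mapping from taxon name to assembled genome sequences, whereas a real pipeline would read FASTA files. GC content here ignores ambiguous bases such as N, while genome size counts every character.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


def summarize_taxa(genomes_by_taxon: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Per-taxon genome count, mean genome size (bp), and mean GC content (%)."""
    summary = {}
    for taxon, genomes in genomes_by_taxon.items():
        summary[taxon] = {
            "n_genomes": len(genomes),
            "mean_size_bp": mean(len(g) for g in genomes),
            "mean_gc_percent": mean(gc_content(g) for g in genomes),
        }
    return summary
```

Averaging per-genome values, rather than pooling all bases of a taxon, keeps one large or heavily sequenced genome from dominating the taxon summary.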

Appears in Collections:

College of Natural Sciences
- Dept. of Biological Sciences
  - Theses (Ph.D. / Sc.D., Dept. of Biological Sciences)

Files in This Item:

000000145201.pdf 3.59 MB
