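Evolutionary studies on intra-species genomic diversity of Escherichia coli using large scale genome analysis (Korean title: 대규모 유전체 분석을 통한 대장균의 유전체 다양성 및 진화 분석, "Analysis of the genomic diversity and evolution of Escherichia coli through large-scale genome analysis")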

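DC Field | Value | Language
dc.contributor.advisor | 천종식 (Jongsik Chun) | -
dc.contributor.author | 이기현 (Lee, Kihyun) | -
dc.date.accessioned | 2017-07-14T00:52:12Z | -
dc.date.available | 2017-07-14T00:52:12Z | -
dc.date.issued | 2016-08 | -
dc.identifier.other | 000000136119 | -
dc.identifier.uri | https://hdl.handle.net/10371/121446 | -
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School: School of Biological Sciences, August 2016. Advisor: 천종식 (Jongsik Chun). | -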
dc.description.abstract | Bacterial evolution is driven by the enormous genomic diversity present in bacterial populations. The genomic diversity of a bacterial population is generated and maintained by the compounded influences of several microevolutionary mechanisms. The uniqueness of bacterial genome evolution originates from its mixture of vertical and horizontal heredity. As a result, the dynamics of bacterial genomes within a species exhibit characteristics of both clonal and sexual genetics, and, impressively, a single species can attain a seemingly unlimited genomic repertoire. The course and consequences of genomic diversification within bacterial species are not yet fully understood. Because genomic diversity within a species is so extensive, understanding genomic evolution within bacterial species requires large-scale exploratory and descriptive studies as well as explanatory studies based on working hypotheses about how genomes evolve. Escherichia coli, a well-known laboratory model organism, has been shown to exploit highly diverse ecological niches in its natural populations. Genomic studies of E. coli have indeed revealed significant genome dynamism accompanied by ecological diversification. E. coli includes several types of pathogens that impose a severe global burden of enteric disease, and for that reason whole-genome surveys of this species have been active. At present, more than four thousand E. coli genome sequences from genetically diverse strains are available. E. coli therefore constitutes an ideal model for studying the intra-specific genomic evolution of bacteria. In this thesis, multiple aspects of the genomic diversity of E. coli were explored and described through comparative analysis of 3,945 genome sequences from strains belonging to the genus Escherichia. In addition, the roles played by distinct microevolutionary mechanisms in shaping the current structure of genomic diversity were assessed.
Lastly, a broader perspective on the evolution of E. coli genomes was obtained by analyzing the evolutionary history of E. coli and its closest relatives.
The genomic diversity of E. coli was explored in four aspects: pan-genome size, sequence diversity, structural diversity and phylogenetic diversity. Analysis of 3,909 E. coli strains indicated that the E. coli pan-genome is open. Comparison of phylogenetic diversity with the pan-genome size estimated for randomly selected subsets of strains showed a linear relationship between the two values. Counter-intuitively, the ratio of pan-genome growth to the increase in phylogenetic diversity was higher within the phylogenetic groups of E. coli than for the entire species. Seeking the reason behind this trend was a major theoretical motivation of this thesis. The sequence diversity of E. coli core genes had a unimodal distribution with a modal value of 1.3%. Core gene order was unexpectedly well conserved among E. coli genomes, and linkage analysis supported the presence of a clonal frame; both findings indicate that the core-genome of E. coli is highly stable. An emerging conclusion from the analysis of genomic diversity was that the pace of gene content diversification and the pace of gene sequence diversification can be uncoupled.
Whole-genome phylogenetic analysis showed that phylogenetic structure is clearly present among the strains of E. coli. The nature of this phylogenetic structuring of the E. coli population was another major theoretical motivation of this thesis. Increased inter-SNP linkage within the phylogenetic groups suggested that each phylogenetic group has relatively elevated clonality and that recombination rates in the ancestral population of E. coli were higher than current rates. Assuming clonality within phylogenetic groups could explain the observed higher within-group rate of pan-genome growth per unit of phylogenetic diversity expansion. Increased clonality is expected to increase the efficiency of selective sweeps caused by positive selection, resulting in the loss of sequence diversity and a delay in sequence diversification. Inference of the recombination history of the E. coli core-genome indicated that 0.78% – 4.1% of the DNA segments in the core-genome have been replaced by homologous recombination. Among the extant lineages of E. coli, the relative impact of recombination over mutation on the changes introduced to DNA sequences was distributed around 0.6 – 0.8. Relatively recent branches showed lower R/Theta than ancestral branches, implying a historical decline in the influence of recombination. This direct observation of a temporal decline in recombination supported the hypothesis that E. coli is shifting toward clonality.
In the pan-genome of E. coli, singleton genes, which occur in just a single strain, could have originated from recent horizontal gene transfer or recent duplication. About half of the singleton genes could not be matched to any other gene in the current prokaryotic genome database. For about 10% of the E. coli singleton genes, highly similar proteins were found in diverse taxonomic divisions. Most frequently, the best hits resided in close relatives of E. coli in the family Enterobacteriaceae. However, distant taxa in other phyla, especially the Firmicutes, contributed a significant number of best hits, implying that those microbes share a common environmental gene pool with the natural E. coli population.
The predominant direction of natural selection on E. coli genes was shown to be negative selection, which suppresses sequence diversification. Negative selection was stronger in the core-genome than in genes with lower gene frequency. Although negative selection was dominant across the entire gene frequency spectrum, some genes exhibited dN/dS larger than 1 and appeared to be positively selected. Transposases made up the largest proportion of positively selected genes. Multiple genes involved in flagellar biosynthesis were found to be positively selected or under relaxed negative selection.
Phylogenetic analysis of 21 genera of Enterobacteriaceae based on their core-genome showed that diversification within Enterobacteriaceae was characterized by a pattern of radiation and by extensively conflicting phylogenetic signals in the basal region. Such ambiguity at deep branches was also observed in phylogenetic networks within the genus Escherichia. This observation may support temporally fragmented speciation. In an attempt to resolve the order of divergence among the species of Escherichia, Bayesian multispecies coalescent analysis was carried out using 3 gene sets, each composed of 60 core genes. The reconciled species tree and the collective graph of the coalescences estimated from the gene sets reconfirmed that the divergence order among Escherichia spp. is genuinely ambiguous. To add geological time-scale information to the knowledge of E. coli evolution, a time-tree analysis was performed using the core-genome and the previously estimated divergence time of E. coli. By extending the previously known divergence time between E. coli and Salmonella enterica, the age of Escherichia was estimated to be 37.9 – 40 MYA. The age of E. coli was estimated to be 16.6 – 17.7 MYA if clade I was excluded from E. coli and 25.9 – 26.9 MYA if clade I was included. The unresolved phylogenetic scenario for the origin of Shigella pathogens within E. coli was addressed by comparing a multigene phylogeny of Shigella virulence plasmids with the chromosomal phylogeny. At least five independent plasmid acquisition events had to be assumed to explain the incongruence between the two phylogenies.
According to the results of this study, the population genetics of E. coli went through a transition from a relatively sexual global population to relatively clonal sub-populations. Such a transition can account for the presence of phylogenetic structure, which is not common in bacterial species. Strong clonality was shown to be negatively associated with the genetic diversity of a species, and the slowed sequence diversification caused by reduced recombination might be the reason for the increased pan-genome growth rate per unit of phylogenetic diversity in the phylogenetic groups of E. coli. As the example of E. coli shows, bacterial genome evolution is affected by a complex interplay between evolutionary mechanisms, and it can shift over the course of intra-specific evolution. Therefore, the nature and concept of species and speciation in bacteria may vary from species to species and from time to time.
-
dc.description.tableofcontents | CHAPTER 1. General introduction
1.1. Bacterial genome evolution
1.2. Escherichia coli
1.3. Purposes and organization of this study

CHAPTER 2. Analysis of intra-specific genomic diversity of E. coli represented in the genome dataset
2.1. Introduction
2.2. Materials and methods
2.2.1. Newly sequenced E. coli genomes and the genome data obtained from public databases
2.2.2. Taxonomic identification, annotation of protein-coding genes and clustering of orthologous proteins
2.2.3. Pan-genome statistics
2.2.4. Phylogenetic analysis
2.2.5. Population structure inference using core single nucleotide polymorphisms
2.2.6. Analysis of gene contents variation, gene order conservation and genome-wide linkage between SNP sites
2.3. Results
2.3.1. Basic characterization of the genome data
2.3.2. Open pan-genome of E. coli
2.3.3. Statistical analysis of pan-genome gene frequency distribution
2.3.4. Evolutionary rate of pan-genome growth
2.3.5. Phylogenetic and population genetic structure inferred from genome data
2.3.6. Intra-specific sequence diversity in the pan-genome of E. coli
2.3.7. Analysis of gene content variation
2.3.8. Conservation of synteny and linkage over long distance
2.3.9. Comparison of E. coli pan-genome properties and phylogenetic structure with those of other bacterial species
2.4. Discussion

CHAPTER 3. Characterization of microevolutionary processes that mediated genomic diversification of E. coli
3.1. Introduction
3.2. Materials and methods
3.2.1. Genome dataset
3.2.2. Analysis of homologous recombination events
3.2.3. Analysis of gene gain and loss history and tracking the origins of the singleton genes in E. coli pan-genome
3.2.4. Analysis of dN/dS ratio
3.3. Results
3.3.1. Impact of homologous recombination in genomic evolution of E. coli
3.3.2. Impact of gene gain and loss in the genomic evolution of E. coli and the origins of recently gained genes
3.3.3. Analysis of the signs of natural selection in the pan-genome of E. coli
3.4. Discussion

CHAPTER 4. Systematics study of E. coli and related taxa
4.1. Introduction
4.1.1. Timed history of bacterial evolution
4.1.2. Obscurities in the systematics of E. coli
4.2. Materials and methods
4.2.1. Reconstruction of Enterobacteriaceae phylogeny
4.2.2. Molecular clock analysis and species tree analysis of Escherichia
4.2.3. Reconstruction of Shigella virulence plasmid phylogeny
4.2.4. Reconstruction of rut and phn operon phylogenies
4.3. Results
4.3.1. Phylogenomic analysis of the evolutionary relationships of Enterobacteriaceae species
4.3.2. Molecular chronology of E. coli
4.3.3. Phylogenetic scenario for Shigella spp.
4.3.4. Genes that distinguished E. coli from other Escherichia spp.
4.4. Discussion

CHAPTER 5. Conclusions

REFERENCES

Abstract in Korean
-
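dc.format | application/pdf | -
dc.format.extent | 7263767 bytes | -
dc.format.medium | application/pdf | -
dc.language.iso | en | -
dc.publisher | Seoul National University Graduate School (서울대학교 대학원) | -
dc.subject | E. coli | -
dc.subject | Bacteria | -
dc.subject | Evolution | -
dc.subject | Genomics | -
dc.subject | Phylogenetics | -
dc.subject | Species | -
dc.subject | Pan-genome | -
dc.subject.ddc | 570 | -
dc.title | Evolutionary studies on intra-species genomic diversity of Escherichia coli using large scale genome analysis | -
dc.title.alternative | 대규모 유전체 분석을 통한 대장균의 유전체 다양성 및 진화 분석 (Analysis of the genomic diversity and evolution of Escherichia coli through large-scale genome analysis) | -
dc.type | Thesis | -
dc.contributor.AlternativeAuthor | Lee, Kihyun | -
dc.description.degree | Doctor | -
dc.citation.pages | xvi, 223 | -
dc.contributor.affiliation | College of Natural Sciences, School of Biological Sciences (자연과학대학 생명과학부) | -
dc.date.awarded | 2016-08 | -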
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
