Evolutionary studies on intra-species genomic diversity of Escherichia coli using large scale genome analysis
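Korean title: 대규모 유전체 분석을 통한 대장균의 유전체 다양성 및 진화 분석 (Analysis of the genomic diversity and evolution of Escherichia coli through large-scale genome analysis)
- College of Natural Sciences, Dept. of Biological Sciences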
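- Seoul National University Graduate School
- Thesis (Ph.D.) -- Seoul National University Graduate School: Dept. of Biological Sciences, August 2016. Jongsik Chun (천종식).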
- Bacterial evolution is driven by the enormous genomic diversity present in populations. The genomic diversity of a bacterial population is generated and maintained by the compounded influences of several microevolutionary mechanisms. The uniqueness of bacterial genome evolution originates from the mixture of vertical and horizontal heredity. As a result, the dynamics of bacterial genomes within a species exhibit characteristics of both clonal and sexual genetics, and, impressively, a seemingly unlimited genomic repertoire can be achieved by a single species. The course and consequences of genomic diversification within bacterial species are not fully understood. Because of the extensive genomic diversity within a species, understanding genomic evolution within bacterial species requires large-scale exploratory and descriptive studies as well as explanatory studies based on working hypotheses about how genomes evolve. Escherichia coli, a well-known laboratory model organism, has been shown to exploit highly diverse ecological niches in its natural populations. Genomic studies of E. coli have indeed revealed significant genome dynamism accompanied by ecological diversification. E. coli includes several types of pathogens that impose a severe global burden of enteric disease, and for that reason whole-genome surveys of this species have been active. At present, more than four thousand E. coli genome sequences from genetically diverse strains are available. E. coli therefore constitutes an ideal model for studying intra-specific genomic evolution in bacteria. In this thesis, multiple aspects of the genomic diversity of E. coli were explored and described by comparative analysis of 3,945 genome sequences of strains belonging to the genus Escherichia. In addition, the roles played by distinct microevolutionary mechanisms in shaping the current structure of genomic diversity were assessed. Lastly, a broader perspective on the evolution of E. coli genomes was obtained by analyzing the evolutionary history of E. coli and its closest relatives.
The genomic diversity of E. coli was explored in four respects: pan-genome size, sequence diversity, structural diversity, and phylogenetic diversity. Analysis of 3,909 E. coli strains indicated that the E. coli pan-genome is open. A comparison of phylogenetic diversity and pan-genome size, estimated for randomly selected subsets of the strains, showed a linear relationship between the two values. Counter-intuitively, the ratio of pan-genome growth to the increase in phylogenetic diversity was higher within the phylogenetic groups of E. coli than for the species as a whole. Seeking the reason behind this trend was a major theoretical motivation of this thesis. The sequence diversity of E. coli core genes had a unimodal distribution with a modal value of 1.3%. The core gene order was unexpectedly well conserved among E. coli genomes, and linkage analysis supported the presence of a clonal frame; both results indicate that the core genome of E. coli is highly stable. An emerging conclusion from the analysis of genomic diversity was that the pace of gene-content diversification and the pace of gene-sequence diversification can be uncoupled.
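As a rough illustration of how pan-genome size can be estimated for randomly selected subsets of strains, the following Python sketch computes pan- and core-genome sizes from per-strain sets of gene-family identifiers. The function names, the toy strains, and the gene identifiers are all hypothetical and are not taken from the thesis; the actual clustering of genes into families and the thesis's subsampling scheme are not described in the abstract and are assumed away here.

```python
import random


def pan_genome_size(gene_sets):
    """Number of distinct gene families in the union of the given genomes."""
    pan = set()
    for genes in gene_sets:
        pan |= genes
    return len(pan)


def core_genome_size(gene_sets):
    """Number of gene families present in every one of the given genomes."""
    gene_sets = list(gene_sets)
    if not gene_sets:
        return 0
    core = set(gene_sets[0])
    for genes in gene_sets[1:]:
        core &= genes
    return len(core)


def pan_genome_curve(genomes, n_replicates=100, seed=0):
    """Mean pan-genome size over random strain subsets of size 1..N.

    `genomes` maps a strain name to its set of gene-family identifiers.
    """
    rng = random.Random(seed)
    names = sorted(genomes)
    curve = []
    for k in range(1, len(names) + 1):
        sizes = []
        for _ in range(n_replicates):
            subset = rng.sample(names, k)
            sizes.append(pan_genome_size(genomes[name] for name in subset))
        curve.append(sum(sizes) / len(sizes))
    return curve


# Toy input: hypothetical strains and gene families, for illustration only.
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g5"},
}
curve = pan_genome_curve(toy)
```

In a sketch like this, a curve that keeps rising as strains are added is the usual informal sign of an open pan-genome; relating it to phylogenetic diversity would additionally require a tree and a branch-length sum for each subset, which is omitted here.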
Whole-genome-scale phylogenetic analysis showed clear phylogenetic structure among the strains of E. coli. The nature of this phylogenetic structuring of the E. coli population was another major theoretical motivation of this thesis. Increased inter-SNP linkage within the phylogenetic groups provided a clue that each phylogenetic group has relatively elevated clonality, while recombination rates in the ancestral population of E. coli were higher than current rates. Assuming clonality within phylogenetic groups could explain the observed higher within-group rate of pan-genome growth per unit of phylogenetic diversity expansion. Increased clonality is expected to increase the efficiency of selective sweeps caused by positive selection, thus erasing accumulated sequence diversity and delaying further sequence diversification. Inference of recombination history in the core genome of E. coli identified that 0.78% to 4.1% of the DNA segments in the core genome have been replaced by homologous recombination. Among the extant lineages of E. coli, the relative impact of recombination over mutation on the changes introduced to DNA sequences was distributed around 0.6 to 0.8. Relatively recent branches showed lower R/Theta than ancestral branches, implying a historical decline in the influence of recombination. This direct observation of a temporal decline in recombination supported the hypothesis that E. coli is shifting toward clonality.
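To make the notion of inter-SNP linkage concrete, the sketch below computes r², one common measure of linkage disequilibrium between two biallelic SNPs in haploid genomes. The abstract does not state which linkage statistic the thesis used, so r² is an assumption chosen for illustration, and the function name and 0/1 allele encoding are hypothetical.

```python
def r_squared(snp_a, snp_b):
    """Squared allelic correlation (r^2) between two biallelic SNPs.

    snp_a and snp_b are equal-length sequences of 0/1 alleles,
    one entry per strain (haploid genomes assumed).
    """
    if len(snp_a) != len(snp_b) or len(snp_a) == 0:
        raise ValueError("SNP vectors must be non-empty and of equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at the first SNP
    p_b = sum(snp_b) / n  # frequency of allele 1 at the second SNP
    # Frequency of strains carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom
```

Under this measure, values near 1 mean that the alleles at the two sites are almost always inherited together, as expected in a clonal population, while values near 0 indicate that recombination has shuffled the two sites largely independently.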
In the pan-genome of E. coli, singleton genes, which occur in only a single strain, could have originated from recent horizontal gene transfer or recent duplication. About half of the singleton genes could not be matched to any other gene in the current prokaryotic genome database. For about 10% of the E. coli singleton genes, highly similar proteins were found in diverse taxonomic divisions. Most frequently, the best hits were in close relatives of E. coli within the family Enterobacteriaceae. However, distant taxa in other phyla, especially the Firmicutes, contributed a significant number of best hits, implying that those microbes share a common environmental gene pool with natural E. coli populations.
The predominant direction of natural selection on E. coli genes was shown to be negative selection, which suppresses the diversification of sequences. Negative selection was stronger in the core genome than in genes with lower gene frequency. Although negative selection was dominant across the entire gene frequency spectrum, some genes exhibited dN/dS larger than 1 and appeared to be positively selected. Transposases made up the largest proportion of positively selected genes. Multiple genes involved in flagellar biosynthesis were found to be either positively selected or under relaxed negative selection.
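The dN/dS criterion mentioned above can be sketched as a simple classification, assuming that per-gene estimates of dN (nonsynonymous substitutions per nonsynonymous site) and dS (synonymous substitutions per synonymous site) are already available. The function names and the optional tolerance parameter are hypothetical; estimating dN and dS themselves requires a codon substitution model and is not shown, and the abstract does not say which estimator the thesis used.

```python
def dn_ds(dn, ds):
    """Return the dN/dS ratio, or None when dS is not positive."""
    if ds <= 0:
        return None
    return dn / ds


def classify_selection(dn, ds, tolerance=0.0):
    """Label a gene by the conventional reading of its dN/dS ratio.

    ratio > 1 + tolerance -> "positive" (diversifying) selection
    ratio < 1 - tolerance -> "negative" (purifying) selection
    otherwise             -> "neutral"
    A non-positive dS gives "undefined".
    """
    omega = dn_ds(dn, ds)
    if omega is None:
        return "undefined"
    if omega > 1 + tolerance:
        return "positive"
    if omega < 1 - tolerance:
        return "negative"
    return "neutral"
```

A gene-wide ratio above 1 is a conservative indicator, because purifying selection on most codons can mask positive selection acting on a few sites; a ratio that is elevated but still below 1 is also compatible with relaxed negative selection, as noted for the flagellar genes.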
Phylogenetic analysis of 21 genera in Enterobacteriaceae, based on their core genome, showed that diversification within Enterobacteriaceae was characterized by a pattern of radiation and by extensively conflicting phylogenetic signals near the base of the tree. Such ambiguity at deep branches was also observed in phylogenetic networks within the genus Escherichia. This observation may support temporally fragmented speciation. In an attempt to resolve the divergence order among the species of Escherichia, Bayesian multispecies coalescent analysis was carried out using 3 gene sets, each composed of 60 core genes. The reconciled species tree and the collective graph of the coalescences estimated from the gene sets reconfirmed that the divergence order among Escherichia spp. is genuinely ambiguous. To add geological time-scale information to the knowledge of E. coli evolution, a time-tree analysis was performed using the core genome and a previously estimated divergence time. By extending the previously known divergence time between E. coli and Salmonella enterica, the age of Escherichia was shown to be between 37.9 and 40 MYA. The age of E. coli was estimated to be between 16.6 and 17.7 MYA if clade I is excluded from E. coli, and between 25.9 and 26.9 MYA if clade I is included. The obscure phylogenetic scenario for the origin of Shigella pathogens within E. coli was addressed by comparing a multigene phylogeny of Shigella virulence plasmids with the chromosomal phylogeny. At least five independent plasmid acquisition events had to be assumed to explain the incongruence between the two phylogenies.
According to the results of this study, the population genetics of E. coli went through a transition from a relatively sexual global population to relatively clonal sub-populations. Such a transition can provide the basis for the presence of phylogenetic structure, which is not common in bacterial species. Strong clonality was shown to be negatively associated with the genetic diversity of a species, and the slowed sequence diversification caused by reduced recombination might be the reason for the increased pan-genome growth rate per unit of phylogenetic diversity in the phylogenetic groups of E. coli. As the example of E. coli shows, bacterial genome evolution is affected by a complex interplay between evolutionary mechanisms, and this interplay can shift in the course of intra-specific evolution. Therefore, the nature and concept of species and speciation in bacteria may vary from species to species and from time to time.