Research on the genomic patterns of selected infectious viruses and their utilization using bioinformatics techniques
- Graduate School of Public Health, Department of Public Health
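- Seoul National University Graduate School
- Thesis (Ph.D.) -- Seoul National University Graduate School of Public Health: Department of Public Health, February 2015. 손현석.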
- The incidence of infectious diseases was expected to decrease gradually as public health improved and the life sciences, medical technology, and related fields advanced. However, infectious diseases remain a major threat to human health. Diseases that were prevalent in the past may re-emerge and spread again, and resistant variants of pathogens may give rise to previously unknown infectious diseases. Many factors contribute to their spread, including faster transportation, complex population movement, worldwide climate change, and other environmental changes. Communicable diseases are no longer a problem confined within national borders, and newly emerging and re-emerging infectious diseases may have an even greater influence on global health as the environment continues to change rapidly. These diseases therefore need to be evaluated thoroughly, for example by monitoring infection and transmission, by analyses that support the development of treatment and prevention methods, and by simulation-based prediction research.

Large-scale biological data related to infectious diseases are accumulating continuously through various studies. The aim of this study was to provide useful information from such data using bioinformatics techniques. To this end, a series of bioinformatics approaches was applied, ranging from biological data collection to software simulation, and the target viruses were selected so that a wide range of sequence information could be acquired with respect to the occurrence and area of infection.

First, a comparison study was performed on occult hepatitis B virus (HBV) infection (OBI), which is difficult to diagnose. OBI usually occurs in the course of chronic HBV infection, when HBV is not detected by serological tests.
When HBV infection persists, it not only damages the liver but may also cause problems during transfusion, liver transplantation, and other procedures. In this study, to identify biological patterns useful for OBI diagnosis, large-scale sequence data were collected from NCBI and a research database was constructed. Sequence analysis of the HBV surface antigen region found no site-specific mutations in either the OBI or the non-OBI group in the database. Phylogenetic analysis, however, showed different patterns between the OBI and non-OBI groups, indicating differences in their genetic characteristics. If sequence data accompanied by epidemiological information on OBI can be collected in the future, more detailed studies will be possible.

Second, evolutionary pattern analysis was performed, which is useful for forecasting studies and for analyzing the infection characteristics of viruses that cause infectious diseases. The analyses covered Rift Valley fever virus (RVFV), using a large volume of data, and human disease-causing viruses of the genus Phlebovirus, which are associated with vector-borne viral infections. Phylogenetic analysis of the genus Phlebovirus showed that the viruses grouped according to their vector. Correspondence analysis based on relative synonymous codon usage (RSCU) confirmed that, for the S segment, the viruses were distributed according to vector. This suggests either that characteristics shared by the viruses diverged according to vector or that the viral characteristics that determine the vector are located on the S segment. Because the geographical range of RVFV continues to expand, we also examined whether its evolutionary patterns differ according to the infected host and the region in which infection occurred.
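The RSCU measure mentioned above can be illustrated with a short sketch. The abstract does not describe how the thesis computed it, so the code below simply follows the standard definition (observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally). The function name `rscu` and the toy input are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    Hypothetical helper for illustration. Stop codons and amino acids
    with a single codon (Met, Trp) are skipped, as are amino acids that
    do not occur in the sequence.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, one family per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


# Toy input (made up): four alanine codons.
example = rscu("GCTGCTGCCGCA")
```

In a correspondence analysis of the kind described, each sequence would be represented by its vector of RSCU values, and the resulting matrix would be passed to a multivariate analysis package; that step is not shown here.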
Phylogenetic analysis, together with correspondence analysis of codon usage patterns, confirmed that RVFV did not differ according to host or region but showed distinct patterns according to the year in which the virus was isolated. In addition, correlation analysis of the effective number of codons (ENC), GC content, and GC3 content against isolation year showed a significant correlation for each segment. Studies of this type will be useful for forecasting research on RVFV.

Third, influenza virus sequence prediction by the SimFlu program was validated so that the approach can be applied in the future to other viruses that cause infectious diseases. The results showed that simulation accuracy decreased as the simulation progressed further in time from the isolation year of the seed sequence, whereas predictions over short time periods achieved higher accuracy. The purpose of providing sequence information for mutating influenza viruses is to identify candidate sequences of future influenza viruses.

Efficient use of accumulated biological data is particularly important for infectious disease research, because future infectious threats to humans remain a real possibility. In addition, maintaining workflows for large-scale biological data, including data mining, integration, management, and the construction of analysis systems, can benefit related biological research. The series of approaches reported in this study includes the detection of pathogen-specific genomic regions, the search for information that can be used in prediction research, and the identification of candidate pathogen sequences. These approaches can be applied to various areas that rely on understanding patterns of pathogenic variation and monitoring the transmission of infectious agents.
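As a rough illustration of the correlation analysis described for RVFV, the sketch below computes GC content and GC3 content (GC content at third codon positions) and correlates one of them with isolation year using a plain Pearson coefficient. The abstract does not state which correlation statistic or software was used, so this choice is an assumption; ENC is omitted because its formula is more involved. The function names and the three toy records are hypothetical and are not data from the thesis.

```python
from math import sqrt


def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Fraction of G or C at third codon positions of an in-frame sequence."""
    third_positions = cds.upper()[2::3]
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up (isolation year, coding sequence) pairs, for illustration only.
toy_records = [
    (1950, "ATGAAATTTGCA"),
    (1980, "ATGAAGTTTGCA"),
    (2010, "ATGAAGTTCGCG"),
]

years = [year for year, _ in toy_records]
gc3_values = [gc3_content(seq) for _, seq in toy_records]
r_gc3_vs_year = pearson_r(years, gc3_values)
```

With real data, the same calculation would be repeated for each genome segment, and a significance test would be applied to each coefficient before drawing conclusions.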