Network and Clustering Algorithms for the Analysis of Time Series Gene Expression Data (Korean title: Network Analysis and Clustering Techniques for Analyzing Time Series Gene Expression Patterns)

Authors

조겨리

Advisor
김선
Major
College of Engineering, Department of Electrical and Computer Engineering
Issue Date
2018-08
Publisher
Seoul National University Graduate School
Description
Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2018. 8. 김선.
Abstract
Genome-wide gene expression levels, especially when measured over a period of time, can be useful for characterizing the biological mechanisms underlying phenotypes. Dynamic gene expression information provides opportunities to understand how organisms react to specific conditions over time, and the amount of time series gene expression data is therefore growing rapidly. However, analyzing such data is challenging, since existing methods for non-time series data must be modified to take the time dimension into account.

In my doctoral study, I developed new bioinformatics methods to analyze time series data in the context of biological network or pathway propagation. My thesis consists of three studies.

In the first study, a network topology-based approach for pathway enrichment analysis, TRAP, is developed for analyzing time series transcriptome data. TRAP extends an existing pathway analysis method, SPIA, to time series analysis and estimates statistical values that measure the dynamic propagation of signaling effects in the pathway graph. In experiments on a proprietary dataset of rice under drought stress, TRAP found relevant pathways more accurately than several existing methods.

In the second study, a method to detect regulators of perturbed pathways, TimeTP, is developed. TimeTP first performs pathway analysis to determine a set of perturbed sub-pathways, that is, sets of connected genes that propagate expression changes over time; propagation between two genes is assessed by measuring the cross-correlation between their expression vectors. To detect regulators of the perturbed pathways, TimeTP extends the gene network to include upstream regulators of genes, such as transcription factors. An influence maximization technique is then used to evaluate and rank the influence of regulators on the perturbed pathways. Applied to a PIK3CA knock-in dataset, TimeTP found significant sub-pathways and their regulators relevant to the PIP3 signaling pathway.

In the final study, a method to infer a gene network from clusters of time series gene expression data is proposed. Although several clustering methods have been developed, most of them do not take the dependency between time points into account, and the algorithms designed for time series assume evenly spaced time points. However, biological experiments are often performed at unevenly spaced time intervals because of experimental constraints on biological samples. This study incorporates Gaussian process regression into the clustering process to predict unobserved values in the time series and to provide more accurate clustering results. In addition, a network of clusters is generated by measuring the distance (similarity) between the expression patterns of clusters. Shape-based distance (SBD) is used as the distance measure so that similarity between time-shifted patterns is captured. The proposed method can infer gene regulatory relationships and cascading signaling effects over time.

In summary, my doctoral study analyzes gene expression time series to find perturbed pathways and sub-pathways with expression propagation over time, to detect regulators of those pathways, and to cluster genes whose expression profiles are represented as Gaussian processes.
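
To make the distance measure mentioned in the abstract more concrete, the following is a minimal sketch of shape-based distance as it is commonly defined: one minus the maximum normalized cross-correlation over all time shifts. It is not taken from the thesis. The function name `shape_based_distance`, the choice to return the best shift, and the handling of all-zero profiles are assumptions made for illustration, and the sketch assumes both profiles are sampled on the same evenly spaced grid (for example, after unobserved values have been predicted).

```python
import numpy as np


def shape_based_distance(x, y):
    """Return (distance, shift) for two equally sampled expression profiles.

    distance = 1 - max over shifts of the normalized cross-correlation.
    shift > 0 means x looks like y delayed by `shift` time points.
    Hypothetical helper for illustration, not the thesis implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        # Assumption: an all-zero profile has no shape, so treat it as
        # maximally distant rather than dividing by zero.
        return 1.0, 0

    # Cross-correlation at every lag from -(len(y) - 1) to len(x) - 1.
    ncc = np.correlate(x, y, mode="full") / denom

    best = int(np.argmax(ncc))
    shift = best - (len(y) - 1)  # convert array index to lag
    return 1.0 - float(ncc[best]), shift


# Two profiles with the same peak shape, the first delayed by two time points.
late = [0, 0, 0, 1, 2, 1, 0]
early = [0, 1, 2, 1, 0, 0, 0]
distance, shift = shape_based_distance(late, early)
print(distance, shift)
```

Because the maximum is taken over all shifts, two profiles with the same shape but different timing receive a distance close to zero, which is the property the abstract relies on when linking clusters with time-shifted patterns. In practice the profiles are often z-normalized before the comparison; that step is left to the caller here.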
Language
English
URI
https://hdl.handle.net/10371/143201

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
