Network and Clustering Algorithms for the Analysis of Time Series Gene Expression Data : 시계열 유전자 발현 패턴 분석을 위한 네트워크 분석 및 클러스터링 기법

DC Field | Value | Language
dc.contributor.advisor | 김선 | -
dc.contributor.author | 조겨리 | -
dc.date.accessioned | 2018-11-12T00:58:34Z | -
dc.date.available | 2018-11-12T00:58:34Z | -
dc.date.issued | 2018-08 | -
dc.identifier.other | 000000152463 | -
dc.identifier.uri | https://hdl.handle.net/10371/143201 | -
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2018. 8. 김선. | -
dc.description.abstract | Expression levels of genes at the whole genome level, especially when gene expression is measured over a time period, can be useful for characterizing the biological mechanisms underlying phenotypes. Dynamic gene expression information provides opportunities to understand how organisms react to specific conditions over time. As a result, the volume of time series gene expression data is growing rapidly. However, analyzing such data is challenging because existing methods for non-time series data must be modified to take the time dimension into account.

In my doctoral study, I developed new bioinformatics methods to analyze time series data in the context of biological networks and pathway propagation. My thesis consists of three studies.

In the first study, a network topology-based approach for pathway enrichment analysis, TRAP, is developed for analyzing time series transcriptome data. TRAP extends an existing pathway analysis method, SPIA, to time series analysis and estimates statistical values that measure the dynamic propagation of signaling effects in the pathway graph. In experiments on a proprietary dataset for the analysis of rice under drought stress, TRAP found relevant pathways more accurately than several existing methods.

In the second study, a method to detect regulators of perturbed pathways, TimeTP, is developed. TimeTP first performs pathway analysis to determine a set of perturbed sub-pathways: groups of connected genes that propagate expression changes over time, identified by measuring the cross-correlation between two vectors of expression. To detect regulators of the perturbed pathways, TimeTP extends the gene network to include upstream regulators of genes, such as transcription factors. An influence maximization technique is used to evaluate and rank the influence of regulators on the perturbed pathways. TimeTP was applied to a PIK3CA knock-in dataset and found significant sub-pathways and their regulators relevant to the PIP3 signaling pathway.

In the final study, a method to infer a gene network using clusters for time series gene expression data is proposed. Although several clustering methods have been developed, most of them do not take time-to-time dependency into account, and algorithms designed for time series assume evenly spaced time points. However, biological experiments are often performed at unevenly spaced time intervals because of experimental constraints on biological samples. This study incorporates Gaussian process regression into the clustering process to predict unobserved values in time series data and provide more accurate clustering results. In addition, a network of clusters is generated by measuring the distance (similarity) of expression patterns between clusters. As the distance measure, shape-based distance (SBD) is used to capture similarity between time-shifted patterns. The proposed method can infer gene regulatory relationships and cascading signaling effects over time.

In summary, my doctoral study analyzes gene expression time series to find perturbed pathways and sub-pathways with expression propagation over time, to detect regulators of those pathways, and to cluster genes whose expression profiles are represented as Gaussian processes.
-
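The abstract mentions shape-based distance (SBD) as a way to compare time-shifted expression patterns. This record does not give the formula, so the sketch below assumes the commonly used definition from the k-Shape clustering literature: one minus the maximum, over all shifts, of the cross-correlation normalized by the product of the two series' norms. The function names, the equal-length requirement, and the omission of z-normalization are illustrative assumptions, not the thesis's implementation.

```python
import math


def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the indices where both are defined."""
    n = len(x)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            total += x[j] * y[i]
    return total


def shape_based_distance(x, y):
    """Return (distance, best_shift) for two equal-length series.

    Assumed definition: distance = 1 - max over shifts of
    cross_correlation / (||x|| * ||y||). A positive best_shift means
    x looks like y delayed by that many time points.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        # At least one series is all zeros; treat it as uncorrelated.
        return 1.0, 0
    n = len(x)
    best_ncc, best_shift = max(
        (cross_correlation(x, y, s) / norm, s) for s in range(-(n - 1), n)
    )
    return 1.0 - best_ncc, best_shift


# Hypothetical expression profiles: `late` is `early` delayed by two time points.
early = [0, 1, 2, 1, 0, 0, 0]
late = [0, 0, 0, 1, 2, 1, 0]
distance, shift = shape_based_distance(late, early)
```

Under this definition, two profiles with the same shape but different timing get a distance near zero, and the maximizing shift gives a candidate delay between them. That pairing is what makes the measure a natural fit for drawing edges between clusters whose patterns follow one another in time.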
dc.description.tableofcontents | Chapter 1 Introduction
1.1 Gene expression
1.2 Biological pathway analysis
1.3 Time series analysis on gene expression data
1.3.1 Methods for analyzing time series transcriptome data
1.3.2 Methods for pathway based analysis of time series data
1.3.3 Methods for identifying regulators while analyzing time series data
1.4 Computational challenges and solutions in time series gene expression
1.5 Outline of the thesis

Chapter 2 A network topology-based approach for pathway enrichment analysis of time series transcriptome data
2.1 Background
2.2 Methods
2.2.1 One time point pathway analysis
2.2.2 Time series pathway analysis
2.2.3 Time series clustering
2.2.4 Text and graph representation of pathway analysis result
2.3 Results and Discussion
2.3.1 Pathway analysis results
2.3.2 Clustering results
2.3.3 Results from other tools
2.4 Conclusion

Chapter 3 Detecting regulators in a network labeled with time series by cross-correlation and influence maximization technique
3.1 Motivation
3.2 Methods
3.2.1 Differential expression vector
3.2.2 Perturbed sub-pathway with delay-bounded expression propagation
3.2.3 P-value for perturbed sub-pathway
3.2.4 Time bounded network construction
3.2.5 Labeled influence maximization for transcription factor detection
3.3 Result
3.3.1 TF-Pathway map in time clock
3.3.2 Comparison with existing pathway/regulator analysis tools
3.4 Conclusion

Chapter 4 Inference of cluster network from unevenly spaced time series by Gaussian process and shape-based clustering
4.1 Introduction
4.2 Methods
4.2.1 Gap statistics using distance measure for Gaussian process
4.2.2 Edge definition by Shape-Based Distance (SBD)
4.3 Results
4.3.1 The optimal number of clusters from gap statistics
4.3.2 Cluster network of cell cycle dataset
4.4 Conclusion

Chapter 5 Conclusion

Abstract (in Korean)
-
dc.language.iso | en | -
dc.publisher | Seoul National University Graduate School | -
dc.subject.ddc | 621.3 | -
dc.title | Network and Clustering Algorithms for the Analysis of Time Series Gene Expression Data | -
dc.title.alternative | 시계열 유전자 발현 패턴 분석을 위한 네트워크 분석 및 클러스터링 기법 | -
dc.type | Thesis | -
dc.contributor.AlternativeAuthor | Kyuri Jo | -
dc.description.degree | Doctor | -
dc.contributor.affiliation | College of Engineering, Department of Electrical and Computer Engineering | -
dc.date.awarded | 2018-08 | -

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
