Identifying stress-related genes and predicting stress types in a heterogeneous time-series data (Korean title: Stress-associated gene and stress prediction techniques for heterogeneous time-series gene data)

DC Field: Value
dc.contributor.advisor: Sun Kim
dc.contributor.author: Dongwon Kang
dc.date.accessioned: 2018-12-03T01:44:47Z
dc.date.available: 2018-12-03T01:44:47Z
dc.date.issued: 2018-08
dc.identifier.other: 000000151664
dc.identifier.uri: https://hdl.handle.net/10371/143905
dc.description: Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2018. Advisor: Sun Kim.
dc.description.abstract: As large amounts of high-dimensional gene expression data accumulate, the need for integrated analysis of time-series gene expression data has emerged. However, analyzing such data is a new time-series problem that existing computer science methods do not address. Many of the series have only a few time points despite having many features, and the collection is heterogeneous: measurement time points and experimental conditions differ between datasets, and the data come in disorganized forms such as raw text and mixed time-series expression data.

In this study, I introduce a feature embedding method that represents such heterogeneous time-series data while minimizing data loss. I also introduce a logical relevance layer whose weights represent stress-gene correlations and are learned with cross-entropy and a group effect. The same layer is used in a stress prediction model, with a logical filter layer on top that produces outputs as logical probabilities; this model is trained with the CMCL (Confident Multiple Choice Learning) loss to prevent the parameters from overfitting.

Among genes with high stress-gene correlation weights, this model revealed many Gene Ontology (GO) terms related to the given stress. To test whether genes that respond only to a specific stress are ranked higher, I compared, for each stress, the gene ranking of my method with that of the ordinary approach, which combines the limma p-values of each time-series dataset using Fisher's method. Many genes annotated with multiple GO terms, that is, genes correlated with multiple stimuli, were ranked lower by my method than by Fisher's method, indicating that this model gives high ranks to genes that respond only to a specific stress. Furthermore, the prediction model outperformed classical prediction methods such as Random Forest and SVM.

Therefore, these results suggest a new method for selecting genes that respond only to a specific stress type and for predicting stress from time-series data with few time points and few replicates.
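The baseline ranking in the abstract combines, for each gene, the limma p-values obtained from the individual time-series datasets using Fisher's method: the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis when the k p-values are independent. A minimal standard-library sketch of that combination and ranking step follows. It assumes the per-dataset p-values have already been computed (for example with limma) and are independent; the function names, gene names, and p-values are hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, List


def fisher_combined_pvalue(pvalues: List[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-square with 2k degrees of freedom under H0.
    For even degrees of freedom the chi-square survival function has the
    closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!, so SciPy is not needed.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # x / 2
    term = 1.0   # (x/2)^0 / 0!
    total = term
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return min(1.0, math.exp(-half) * total)


def rank_genes(per_dataset_p: Dict[str, List[float]]) -> List[str]:
    """Rank genes by combined p-value, most significant first."""
    combined = {g: fisher_combined_pvalue(ps) for g, ps in per_dataset_p.items()}
    return sorted(combined, key=lambda g: combined[g])


if __name__ == "__main__":
    # Hypothetical p-values for one stress across three datasets.
    toy = {
        "geneA": [0.01, 0.02, 0.03],
        "geneB": [0.5, 0.4, 0.9],
        "geneC": [0.001, 0.6, 0.7],
    }
    print(rank_genes(toy))
```

With a single p-value the function returns that p-value unchanged, and two p-values of 0.05 combine to about 0.0175. Note that geneC, with one very strong signal, still ranks below the consistently moderate geneA: Fisher's method rewards evidence accumulated across datasets, which is one reason a gene responding to many stimuli can rank highly under this baseline.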
dc.description.tableofcontents: I. Introduction

1.1 Gene Expression Data

1.1.1 Microarray Data

1.1.2 Time-series Microarray Data

1.2 Motivation

1.2.1 Limitation of current biomarker detection methods

1.2.2 Difficulty of analyzing time-series data

II. Materials and Methods

2.1 Time series data

2.1.1 Definition

2.1.2 Dataset

2.1.3 Feature embedding

2.1.4 Limma and fold change

2.2 Stress-related gene detection model

2.2.1 Logical correlation layer

2.2.2 Group effect

2.3 Stress prediction model

2.3.1 Transposed logistic correlation layer

2.3.2 Normalizing

2.3.3 Logistic filter

2.3.4 CMCL loss function

2.4 Existing methods for performance comparison

2.4.1 Fisher's method

2.4.2 Random Forest and SVM

III. Experiments and Results

3.1 Analysis of high stress-responsive genes

3.2 Gene rank comparison with Fisher's method

3.3 Stress type prediction

IV. Discussions

References

Abstract in Korean
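Section 2.3.4 and the abstract state that the stress prediction model is trained with the CMCL (Confident Multiple Choice Learning) loss. A minimal sketch of the per-example CMCL objective as formulated by Lee et al. (2017) follows: each example is assigned to k of the M ensemble members, the assigned members pay a cross-entropy loss, and the others pay β · KL(U || p_m), which pushes them toward a uniform (unconfident) prediction. This is a sketch under assumptions, not the thesis's implementation: the exact variant, the values of k and β, and how it attaches to the logical filter layer are not given in the record, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])


def kl_uniform(probs: Sequence[float]) -> float:
    # D_KL(U || p), where U is the uniform distribution over the C classes.
    c = len(probs)
    return sum((1.0 / c) * math.log((1.0 / c) / p) for p in probs)


def cmcl_loss(model_probs: List[Sequence[float]], label: int,
              k: int, beta: float) -> float:
    """Per-example CMCL objective (hypothetical helper).

    Minimizes sum_m v_m * CE_m + beta * (1 - v_m) * KL_m over assignments
    v with exactly k ones. Rewriting it as beta * sum_m KL_m plus
    sum_m v_m * (CE_m - beta * KL_m) shows that the optimal choice is the
    k members with the smallest CE_m - beta * KL_m.
    """
    ce = [cross_entropy(p, label) for p in model_probs]
    kl = [kl_uniform(p) for p in model_probs]
    order = sorted(range(len(model_probs)), key=lambda m: ce[m] - beta * kl[m])
    chosen = set(order[:k])
    return sum(ce[m] if m in chosen else beta * kl[m]
               for m in range(len(model_probs)))


if __name__ == "__main__":
    # Two hypothetical ensemble members predicting over two stress classes.
    members = [[0.9, 0.1], [0.5, 0.5]]
    print(cmcl_loss(members, label=0, k=1, beta=1.0))
```

In this toy case the confident, correct first member is assigned the example, and the second member is already uniform, so its KL penalty is zero and the loss equals -ln 0.9. In a real model this would be computed on batched tensors with automatic differentiation; plain Python lists are used here only to keep the sketch self-contained.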
dc.format: application/pdf
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: Seoul National University Graduate School
dc.subject.ddc: 621.39
dc.title: Identifying stress-related genes and predicting stress types in a heterogeneous time-series data
dc.title.alternative: Stress-associated gene and stress prediction techniques for heterogeneous time-series gene data (English translation of the Korean title)
dc.type: Thesis
dc.contributor.AlternativeAuthor: Dongwon Kang
dc.description.degree: Master
dc.contributor.affiliation: College of Engineering, Department of Computer Science and Engineering
dc.date.awarded: 2018-08
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.