Identifying stress-related genes and predicting stress types in a heterogeneous time-series data
이질적 시계열 유전자 데이터의 스트레스 연관 유전자 및 스트레스 예측 기법
- 김 선
- College of Engineering, Dept. of Computer Science and Engineering
- Issue Date
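- Seoul National University Graduate School
- Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Dept. of Computer Science and Engineering, August 2018. 김 선.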
- As high-dimensional gene expression data accumulate, the need for integrated analysis of time-series gene expression data has emerged. However, analyzing such data is a new kind of time-series problem that existing computer science has not addressed. Most of these series have many features but few time points, and the data are heterogeneous: measurement points and experimental conditions differ across datasets, and the data arrive in unstructured forms such as raw text mixed with time-series expression data.
In this study, I introduce a feature embedding method for such heterogeneous time-series data that minimizes data loss, and a logical relevance layer that represents stress-gene correlation weights, learned with cross-entropy and a group effect. The relevance layer is also used in a stress prediction model. A logical filter layer on top of the model produces output as logical probabilities, and this layer is trained with the CMCL (Confident Multiple Choice Learning) loss to prevent parameter overfitting.
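The CMCL objective mentioned above can be illustrated with a minimal sketch for a single training example. It follows the general CMCL formulation: the best-suited predictor is trained with cross-entropy, while the others are pushed toward a uniform output through a KL term. How the thesis maps this onto its logical filter layer is not stated in the abstract, so the function names, the `beta` weight of 0.75, and the choice of `k` below are all assumptions made for illustration.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy of one predicted distribution against the true label."""
    return -math.log(probs[label])

def kl_uniform_to(probs):
    """D_KL(U || P): divergence from the uniform distribution to P."""
    c = len(probs)
    return sum((1.0 / c) * math.log((1.0 / c) / p) for p in probs)

def cmcl_loss(model_probs, label, beta=0.75, k=1):
    """CMCL loss for one example (hypothetical helper, not the thesis code).

    model_probs: one strictly positive probability vector per predictor.
    beta:        weight of the KL penalty (assumed value).
    k:           number of predictors assigned to the example (assumed value).
    Returns (loss, indices of the assigned predictors).
    """
    ce = [cross_entropy(p, label) for p in model_probs]
    kl = [kl_uniform_to(p) for p in model_probs]
    # Assigning predictor m swaps its penalty beta*kl[m] for ce[m], so the
    # total is minimized by picking the k smallest values of ce - beta*kl.
    order = sorted(range(len(model_probs)), key=lambda m: ce[m] - beta * kl[m])
    chosen = sorted(order[:k])
    loss = sum(ce[m] if m in chosen else beta * kl[m]
               for m in range(len(model_probs)))
    return loss, chosen

# Two hypothetical predictors, two stress classes, true class 0.
loss, chosen = cmcl_loss([[0.9, 0.1], [0.5, 0.5]], label=0)
```

In this toy case the first predictor is confident and correct, so it is the one assigned to the example, and the second predictor incurs no penalty because its output is already uniform.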
The model revealed many Gene Ontology (GO) terms related to the given stress among genes with high stress-gene correlation weights. To check whether genes that respond only to a specific stress are ranked higher, I compared the per-stress gene ranks from my method with those from a baseline that combines the limma p-values of each time series using Fisher's method. Many genes annotated with multiple GO terms, meaning they are associated with multiple stimuli, were ranked lower by my method than by the baseline, which indicates that the model gives high ranks to genes that respond only to a specific stress. Furthermore, the prediction model showed excellent performance compared with classical prediction methods such as Random Forest and SVM.
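The Fisher's-method baseline used in this comparison can be sketched as follows. Fisher's method combines k independent p-values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The gene names and p-values in the usage lines are made up for illustration, and `fisher_combine` and `rank_genes` are hypothetical helper names, not functions from the thesis or from limma.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (chi-square statistic, combined p-value).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k the chi-square survival function has
    # the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

def rank_genes(gene_pvalues):
    """Order genes from most to least significant combined p-value."""
    combined = {g: fisher_combine(ps)[1] for g, ps in gene_pvalues.items()}
    return sorted(combined, key=combined.get)

# Made-up per-dataset p-values for two hypothetical genes.
ranking = rank_genes({"geneA": [0.5, 0.5], "geneB": [0.01, 0.02]})
```

Because this ranking depends only on the combined evidence across datasets, a gene that responds to several stresses can rank highly for each of them, which is the behavior the comparison above contrasts with the proposed model.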
These results therefore suggest a new method for selecting genes that respond only to a specific stress type and for predicting stress from time-series data with few time points and replicates.