Computational Approaches for Exploring the Relationships in High Dimensional Spaces of Multi-Omics Data Utilizing Biological Prior Knowledge

오민식

Computational Approaches for Exploring the Relationships in High Dimensional Spaces of Multi-Omics Data Utilizing Biological Prior Knowledge : 생물학적 사전 지식을 활용한 고차원의 다중 오믹스 관계를 찾는 컴퓨터 공학적 접근 방법

Authors: 오민식

Advisor: 김선

Issue Date: 2021

Publisher: Graduate School, Seoul National University

Keywords: High dimensional data ; Multi-omics ; Gene expression ; Machine learning ; Biological prior knowledge ; Search space

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, 2021.8. 김선.

Abstract:
Understanding how cells function or respond to external stimuli is one of the most important questions in biology and medicine. Thanks to advances in instrumentation, scientists can routinely measure many cellular events in a single biological experiment. Notable examples are multi-omics data: genome sequencing, quantification of gene expression, and identification of epigenetic events that regulate gene expression. To better understand cellular mechanisms, it is essential to identify regulatory relationships between multi-omics regulators and genes. However, these regulatory relationships are very complex, and it is infeasible to validate all condition-specific relationships experimentally. Thus, there is an urgent need for efficient computational methods that extract relationships from different types of high-dimensional omics data. One way to handle such high-dimensional data is to incorporate external biological knowledge, such as gene functions and relationships between omics layers curated in various databases.
In my doctoral study, I developed three computational approaches to identify the regulatory relationships from multi-omics data utilizing biological prior knowledge.
The first study proposes a method to predict one-to-m relationships between a miRNA and its target genes. The computational challenge of miRNA target prediction is that each miRNA has many candidate targets, so false positives and false negatives must be balanced. This challenge is addressed by using literature knowledge to determine the association between a miRNA-gene pair and a given context. In this study, I developed ContextMMIA to predict miRNA-gene relationships from miRNA and gene expression data. ContextMMIA scores each miRNA-gene relationship by its statistical significance and literature relevance and prioritizes the relationships by these scores. In experiments on breast cancer data with different prognoses, ContextMMIA predicted differentially activated miRNA-gene relationships in invasive breast cancer. Experimentally verified miRNA-gene relationships were predicted with high priority, and the genes involved are known to participate in breast cancer-related pathways.
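The prioritization idea in the first study can be illustrated with a minimal sketch. The abstract does not give ContextMMIA's actual scoring formula, so the combination rule used here (negative log10 p-value multiplied by a literature relevance weight in [0, 1]), the function name `rank_pairs`, and the example miRNA and gene names are all assumptions for illustration only.

```python
import math

def rank_pairs(pairs):
    """Rank miRNA-gene pairs by a combined score (hypothetical rule).

    pairs: list of (mirna, gene, p_value, literature_relevance),
    where literature_relevance is assumed to lie in [0, 1].
    """
    scored = []
    for mirna, gene, p_value, relevance in pairs:
        # Statistical significance as -log10(p); clamp p to avoid log(0).
        significance = -math.log10(max(p_value, 1e-300))
        # Assumed combination: significance weighted by literature relevance.
        scored.append((mirna, gene, significance * relevance))
    # Highest combined score first.
    return sorted(scored, key=lambda item: item[2], reverse=True)

# Made-up inputs: names and numbers are placeholders, not thesis data.
example_pairs = [
    ("miR-A", "GENE1", 1e-4, 0.9),
    ("miR-A", "GENE2", 1e-6, 0.1),
    ("miR-B", "GENE3", 0.05, 0.8),
]
ranking = rank_pairs(example_pairs)
for mirna, gene, score in ranking:
    print(f"{mirna} -> {gene}: {score:.2f}")
```

In this toy input, the pair with the smallest p-value (GENE2) drops to the bottom because its literature relevance is low, which is the kind of context-aware reordering the paragraph describes.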
The second study proposes a method to predict n-to-one relationships between regulators and a gene in drug response. The computational challenge of drug response prediction is how to integrate multi-omics data of 20,000 genes to determine drug response mediator genes. This challenge is addressed by using low-dimensional embedding methods, literature knowledge of drug-gene associations, and gene-gene interaction knowledge. For this problem, I developed DRIM to predict drug response relationships from multi-omics data and drug-induced time-series gene expression data. DRIM uses an autoencoder, tensor decomposition, and drug-gene associations to determine n-to-one relationships from multi-omics data. The regulatory relationships of the mediator genes are then determined from gene-gene interaction knowledge and the cross-correlation of drug-induced time-series gene expression data. In experiments on breast cancer cell line data, DRIM extracted mediator genes relevant to drug response and regulatory relationships of genes involved in the PI3K-Akt pathway targeted by lapatinib. In addition, DRIM revealed distinct patterns of relationships in breast cancer cell lines with different lapatinib resistance.
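The cross-correlation step mentioned for time-series expression data can be sketched as a lagged Pearson correlation. This is a generic illustration, not DRIM's implementation: the function names, the maximum lag, and the two toy time series are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_lagged_correlation(source, target, max_lag):
    """Return (lag, r) with the largest |r| between source[t] and target[t + lag]."""
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        x = source[: len(source) - lag]
        y = target[lag:]
        if len(x) < 3:
            continue  # too few overlapping time points to be meaningful
        r = pearson(x, y)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r

# Toy series: the second gene follows the first with a delay of two time points.
upstream_gene = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]
downstream_gene = [0.0, 0.0, 0.0, 1.0, 3.0, 2.0, 0.5, 0.0]
lag, r = best_lagged_correlation(upstream_gene, downstream_gene, max_lag=3)
print(lag, round(r, 3))
```

A high correlation at a positive lag is consistent with the first gene's expression change preceding the second's, which is one way a candidate regulatory direction could be read from time-series data; combining it with interaction knowledge, as the paragraph describes, would restrict which gene pairs are tested at all.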
The third study proposes a method to predict n-to-m relationships between regulators and genes. To predict n-to-m relationships, this study formulated an objective function that measures the deviation between observed gene expression values and the expression values estimated from a gene regulatory network. The computational challenge of minimizing this objective function is that the search space of relationships grows exponentially with the number of regulators and genes. This challenge is addressed by iterative local optimization guided by regulator-gene interaction knowledge. In this study, I developed a two-step iterative method based on reinforcement learning (RL) to predict n-to-m relationships from regulator and gene expression data. The first step is an n-to-one, gene-oriented step that selects regulators with an RL-based heuristic to add edges to the network. The second step is a one-to-m, regulator-oriented step that stochastically selects genes to remove edges from the network. In experiments on breast cancer cell line data, the proposed method constructed breast cancer subtype-specific networks from regulator and gene expression profiles and estimated gene expression more accurately than previous combinatorial optimization methods. Moreover, regulatory relationships in the networks were associated with breast cancer subtypes.
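The alternating add and remove steps can be sketched on a toy problem. Several simplifications here are assumptions and not the thesis method: a gene's expression is estimated as the mean expression of its regulators, the objective is a sum of squared deviations, and a simple greedy acceptance rule stands in for the RL-based heuristic. All names and values are hypothetical.

```python
import random

def objective(network, regulator_expr, gene_expr):
    """Sum of squared deviations between observed and estimated gene expression.

    Estimate (an assumption): mean expression of the gene's regulators, 0.0 if none.
    """
    total = 0.0
    for gene, observed in gene_expr.items():
        regs = network[gene]
        estimate = sum(regulator_expr[r] for r in regs) / len(regs) if regs else 0.0
        total += (observed - estimate) ** 2
    return total

def search(prior, regulator_expr, gene_expr, iterations=20, seed=0):
    rng = random.Random(seed)
    network = {gene: set() for gene in gene_expr}
    best = objective(network, regulator_expr, gene_expr)
    for _ in range(iterations):
        # Gene-oriented step (n-to-one): for each gene, keep every prior-supported
        # regulator whose addition lowers the objective (greedy stand-in for RL).
        for gene in gene_expr:
            for reg in sorted(prior.get(gene, set()) - network[gene]):
                network[gene].add(reg)
                score = objective(network, regulator_expr, gene_expr)
                if score < best:
                    best = score
                else:
                    network[gene].remove(reg)
        # Regulator-oriented step (one-to-m): pick a random regulator and one of
        # its target genes; drop that edge unless the objective gets worse.
        regulator = rng.choice(sorted(regulator_expr))
        targets = sorted(g for g, regs in network.items() if regulator in regs)
        if targets:
            gene = rng.choice(targets)
            network[gene].remove(regulator)
            score = objective(network, regulator_expr, gene_expr)
            if score <= best:
                best = score
            else:
                network[gene].add(regulator)
    return network, best

# Toy data: regulator and gene names are placeholders.
regulator_expr = {"miR-1": 2.0, "TF-1": 4.0, "TF-2": 1.0}
gene_expr = {"G1": 3.0, "G2": 1.0}
prior = {"G1": {"miR-1", "TF-1"}, "G2": {"TF-1", "TF-2"}}
network, best = search(prior, regulator_expr, gene_expr)
print(network, best)
```

Restricting candidate edges to the prior keeps each step small even though the full space of regulator-gene networks is exponential; the removal step gives the search a way to undo edges that later additions have made unnecessary.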
In summary, this thesis proposes computational methods for predicting one-to-m, n-to-one, and n-to-m relationships between multi-omics regulators and genes using external domain knowledge. By analyzing ever-increasing molecular biology data, such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, the proposed methods are expected to clarify gene regulatory interactions and thereby deepen our knowledge of cellular mechanisms.

Language: eng

URI: https://hdl.handle.net/10371/177540

https://dcollection.snu.ac.kr/common/orgView/000000168469

Files in This Item:

000000168469.pdf 8.00 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)