An Integrated Genomic Database for Evaluating Clinical Impact of Personal Genome Sequences
An Integrated Genomic Database for Evaluating the Clinical Impact of Genomic Variants

Authors
박찬희
Advisor
김주한
Major
College of Natural Sciences, Interdisciplinary Program in Bioinformatics
Issue Date
2016-02
Publisher
Seoul National University Graduate School
Keywords
biological database; database integration; personal genome sequence
Description
Thesis (Ph.D.) -- Seoul National University Graduate School : Interdisciplinary Program in Bioinformatics, 2016. 2. 김주한.
Abstract
Searching biological databases is an essential routine for interpreting personal genome sequences, interpreting the results of biological experiments, and forming new hypotheses in genomics and proteomics. Although most researchers build in-house or local databases to share information within their groups, it remains difficult to retrieve all related information: one must query numerous resources to collect the relevant biological entries.
As biological data grow and gene annotation schemes remain heterogeneous, the integration of gene-centric databases is increasingly in demand. Identifying identical genes across different gene-centric databases is a central problem in integrating biological databases. Traditional methods that identify identical genes by gene symbol or genomic location are often problematic: genes are not uniformly annotated, which leaves numerous genes unannotated, and different gene-building methods can assign different genomic locations to the same gene.
We designed reliable and verified schemes to identify identical genes across three gene-centric databases (EntrezGene, UniGene, and Ensembl) using cross-reference information, gene symbols, and genomic locations. A gene-to-gene cross-reference network (GGN) was constructed from the cross-reference information. To increase the reliability of gene identity, the GGN was refined through several procedures based on the topology of the network, producing a reliable gene-to-gene cross-reference network (RGGN). The RGGN was highly consistent with the traditional methods based on gene symbol and genomic location. Lists of identical genes were obtained by constructing the RGGN and then validating it by gene symbol and genomic location. In contrast to gene integration schemes based on artificially defined gene concepts, these schemes are natural, data-driven, and clear. Conflicts between different gene-centric databases are resolved by introducing network topology. We call this scheme Closed Integration.
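The abstract does not spell out which topology-based procedures turn the GGN into the RGGN, so the following Python sketch only illustrates the general idea under two assumed filters: an edge is kept only when both records cite each other, and a connected component is accepted only when it holds at most one record per database. The function name `reliable_gene_groups`, the `(database_name, accession)` identifier format, and both filters are hypothetical choices for illustration, not the thesis's actual method.

```python
from collections import defaultdict


def reliable_gene_groups(xrefs):
    """Return groups of identifiers that plausibly denote the same gene.

    xrefs: iterable of (source, target) pairs. Each identifier is a
    (database_name, accession) tuple; a pair means "source cites target".
    Both filters below are illustrative assumptions.
    """
    directed = set(xrefs)

    # Assumed filter 1: keep an edge only if both records cite each other.
    neighbours = defaultdict(set)
    for source, target in directed:
        if source != target and (target, source) in directed:
            neighbours[source].add(target)
            neighbours[target].add(source)

    groups = []
    seen = set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Collect the connected component that contains `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component

        # Assumed filter 2: reject components holding two records
        # from the same database, since that signals a conflict.
        databases = [database for database, _ in component]
        if len(databases) == len(set(databases)):
            groups.append(component)
    return groups
```

Each returned set would then be a candidate list of identical genes, still to be validated by gene symbol and genomic location as the abstract describes.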
The Open Integration approach considers only biological databases that provide cross-reference information. It integrates the cross-reference network around biological database identifiers and, given an input identifier, resolves the counterpart identifiers in a target database and describes how they are connected. This is useful for researchers who need to assess information across multiple databases and for integrating massive biological databases.
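Resolving counterpart identifiers together with the path that connects them can be pictured as a graph search. The Python sketch below is a minimal illustration and assumes that cross references may be followed in both directions and that the shortest connecting path is the one worth reporting. The function name `resolve_counterparts` and the `(database_name, accession)` identifier format are hypothetical and not taken from the thesis.

```python
from collections import deque


def resolve_counterparts(xrefs, start, target_db):
    """Map each reachable identifier in `target_db` to its path from `start`.

    xrefs: iterable of (identifier, identifier) cross-reference pairs,
    treated here as undirected links (an assumption of this sketch).
    Identifiers are (database_name, accession) tuples.
    """
    # Build an undirected adjacency list from the cross-reference pairs.
    adjacency = {}
    for left, right in xrefs:
        adjacency.setdefault(left, []).append(right)
        adjacency.setdefault(right, []).append(left)

    paths = {}
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != start and node[0] == target_db:
            # Breadth-first order means this is a shortest path to `node`.
            paths[node] = path
            continue  # do not search beyond a resolved counterpart
        for neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return paths
```

For example, with toy links UniGene `u1` to EntrezGene `g1` and EntrezGene `g1` to Ensembl `e1`, asking for the Ensembl counterpart of `u1` yields `e1` along with the path through `g1`, which shows how the two identifiers are connected.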
Using these schemes, we built the integrated gene-centric database GRIP (Genome Resource Annotation Pipeline), which combines the benefits of open and closed integration, including OOP modeling, in a balanced fashion.
GRIP models ten biological objects, divided into three categories (basic, complex, and knowledge), to retrieve resources efficiently. Agent-GRIP provides a function for searching through the open-closed-integration GRIP. GRIP offers keyword-based search and biological knowledge-based search, which lets users define the desired search by combining objects in GRIP. Besides integration, these schemes can also be used for error correction in biological databases. This is useful for researchers who need to assess information across multiple databases to interpret microarray experiment results, exome-seq, RNA-seq, and personal genome sequence analyses.
Language
English
URI
https://hdl.handle.net/10371/125383
Appears in Collections:
College of Natural Sciences (자연과학대학) > Program in Bioinformatics (협동과정-생물정보학전공) > Theses (Ph.D. / Sc.D._협동과정-생물정보학전공)

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
