A statistical analysis for next-generation sequencing data with a small number of samples
Korean title (translated): Statistical analysis of next-generation sequencing data with a small number of samples

Authors
김정수
Advisor
박태성
Major
Interdisciplinary Program in Bioinformatics, College of Natural Sciences
Issue Date
2014-02
Publisher
Seoul National University Graduate School
Keywords
NGS; RNA-Seq; Exome-Seq; Statistical analysis
Description
Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, 2014. 2. 박태성.
Abstract
As technology advances, new methods must be developed to enable more suitable analyses than were previously possible. Since microarray technology was developed, many methods have been invented, ranging from genome-wide association analysis, which detects causative variants associated with diseases, to differential expression analysis, which identifies genes that differ in abundance. In the early era, when data were expensive to generate, researchers devoted themselves to developing methods for the analysis of studies with small sample sizes. However, the rapid stabilization of microarray technology, together with its incompleteness, led to many studies with larger sample sizes.
Numerous scientists concentrated their efforts on incorporating revisions into new methods for the analysis of microarray data, and the technology therefore stabilized quickly. In a microarray, the information of interest must be acquired in advance and placed in a limited space as a set of probes. Because of this property, the amount and variety of information that can be accessed are limited. Microarrays are thus better suited to detecting common information than individual-specific information. As a result, microarray technology has been devoted to large-sample studies that elucidate phenomena common to many samples, rather than to small-sample studies.
Next-generation sequencing (NGS) technology is inherently suitable for detecting individual-specific information. NGS technology is a good match for the concept of personalization that dates from the start of the Human Genome Project. However, it is not easy to extract meaningful information from an individual's data, which are large in volume and resolved at the single-base-pair scale. Furthermore, relatively high cost and limited specimen availability often lead to studies with small numbers of samples (replicates). As a result, obtaining significant results from data with a small number of samples has attracted researchers' attention.
This thesis presents approaches to genomic data and transcriptomic data, both with small sample sizes. Specifically, for genomic data analysis, a new strategy called multiphasic analysis is suggested. Applied to a Mendelian disease, the strategy is shown to efficiently single out a disease-causing variant from among various candidates.
For transcriptomic data analysis, a new method is proposed for differential expression analysis between two classes. The method is applicable to RNA-Seq data with a small number of replicates, and even to data without replicates. Its validity is demonstrated by applying it to various real and simulated datasets and comparing the results with those obtained from competing methods.
Language
English
URI
https://hdl.handle.net/10371/125373
Appears in Collections:
College of Natural Sciences (자연과학대학) > Program in Bioinformatics (협동과정-생물정보학전공) > Theses (Ph.D. / Sc.D._협동과정-생물정보학전공)

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
