A statistical analysis for next-generation sequencing data with a small number of samples

김정수

A statistical analysis for next-generation sequencing data with a small number of samples : 자료수가 적은 차세대 염기서열자료의 통계적 분석

Authors: 김정수

Advisor: 박태성

Major: College of Natural Sciences, Interdisciplinary Program in Bioinformatics

Issue Date: 2014-02

Publisher: Seoul National University Graduate School (서울대학교 대학원)

Keywords: NGS ; RNA-Seq ; Exome-Seq ; Statistical analysis

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : Interdisciplinary Program in Bioinformatics, 2014. 2. 박태성.

Abstract: As technology advances, new methods must be developed to provide analyses better suited to the data than any made before. Since microarray technology was developed, many methods have been invented, from genome-wide association analysis, which detects causative variants associated with diseases, to differential expression analysis, which identifies genes that differ in abundance. In the early era, when data were generated at great expense, researchers devoted themselves to developing methods for analyzing studies with small sample sizes. However, the fast stabilization and the incompleteness of microarray technology led to many studies with larger sample sizes.
The efforts of numerous scientists were concentrated on incorporating revisions into new methods for the analysis of microarray data. Therefore, microarray technology stabilized quickly. In microarray technology, the information of interest must be acquired in advance and placed in a limited space as a set of probes. Because of this property, there have been limits to the amount and variety of information that can be accessed. Microarrays are thus better suited to detecting common information than individual-specific information. Accordingly, microarray technology has been dedicated to large-sample studies that elucidate common phenomena, rather than to small-sample studies.
Next-generation sequencing (NGS) technology is inherently suitable for detecting individual information. NGS technology and the personalized concept have been well matched since the start of the Human Genome Project. However, it is not easy to extract meaningful information from an individual's data, which are large in volume and at single-base-pair resolution. Furthermore, relatively high cost and limited specimen availability often lead to studies with few samples (replicates). Consequently, obtaining significant results from data with a small number of samples has attracted researchers' attention.
In this thesis, approaches to genomic and transcriptomic data, both with small sample sizes, are presented. Specifically, for genomic data analysis, a new strategy called multiphasic analysis is proposed. Applied to a Mendelian disease, the strategy is shown to efficiently single out a disease-causing variant from among various candidates.
For transcriptomic data analysis, a new method is proposed for differential expression analysis between two classes; it is applicable to RNA-Seq data with a small number of replicates, or even with none. The validity of the proposed method is demonstrated by applying it to various real and simulated datasets and comparing the results with those obtained from competing methods.

Language: English

URI: https://hdl.handle.net/10371/125373

Files in This Item:

000000017636.pdf 2.82 MB

Appears in Collections:

College of Natural Sciences (자연과학대학)
- Program in Bioinformatics (협동과정-생물정보학전공)
  - Theses (Ph.D. / Sc.D._협동과정-생물정보학전공)
