A non-parametric and information theoretic algorithm for identifying differentially expressed genes in multiclass RNA-seq samples
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 김선 | - |
dc.contributor.author | 안재현 | - |
dc.date.accessioned | 2017-07-14T02:54:02Z | - |
dc.date.available | 2017-07-14T02:54:02Z | - |
dc.date.issued | 2014-02 | - |
dc.identifier.other | 000000016793 | - |
dc.identifier.uri | https://hdl.handle.net/10371/123030 | - |
dc.description | Thesis (Master's) -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, February 2014. Advisor: 김선. | - |
dc.description.abstract | Genome-wide gene expression can be routinely measured with microarrays or, more recently, with sequencing technologies. With these technologies, identifying differentially expressed genes (DEGs) among multiple phenotypes is one of the most important tasks in biology, and many methods for detecting DEGs between two groups have been developed. For example, the t-test and relative entropy (Kullback-Leibler divergence) are used to detect differences between two probability distributions. When more than two phenotypes are considered, these two-group methods are not directly applicable, and other methods such as the ANOVA F-test and the Kruskal-Wallis test are used to find DEGs in multiclass data. However, the ANOVA F-test assumes normally distributed data and is not designed to identify DEGs whose expression is distinct in each phenotype. The Kruskal-Wallis test, a non-parametric method, is more robust than the F-test but is still sensitive to outliers. This thesis proposes a non-parametric, information-theoretic approach for identifying DEGs. The method can identify DEGs in multiclass data and is less sensitive to outliers. In extensive experiments with simulated and real data, it outperformed existing tools. In addition, a web service for analyzing multiclass data is available at http://biohealth.snu.ac.kr/software/degselection | - |
dc.description.tableofcontents | Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction (1.1 Related works; 1.2 Motivation: 1.2.1 Need for a non-parametric method, 1.2.2 Distinguishing several groups at once, 1.2.3 Need to be robust for outliers); Chapter 2 Methods (2.1 Overview of a proposed method; 2.2 Preprocessing data; 2.3 Difference analysis using mutual information; 2.4 Estimating P-value); Chapter 3 Results (3.1 Data: 3.1.1 Simulated data, 3.1.2 Real data; 3.2 Classification results: 3.2.1 Simulated data, 3.2.2 Real data; 3.3 Biological interpretation: 3.3.1 Rice data, 3.3.2 Breast cancer data); Chapter 4 Conclusion; Abstract in Korean (요약); Acknowledgements; Chapter 5 Appendix | - |
dc.format | application/pdf | - |
dc.format.extent | 2452724 bytes | - |
dc.format.medium | application/pdf | - |
dc.language.iso | en | - |
dc.publisher | Graduate School, Seoul National University (서울대학교 대학원) | - |
dc.subject | Differentially expressed genes | - |
dc.subject | information theoretic approach | - |
dc.subject | multiclass | - |
dc.subject | RNA-seq | - |
dc.subject.ddc | 621 | - |
dc.title | A non-parametric and information theoretic algorithm for identifying differentially expressed genes in multiclass RNA-seq samples | - |
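dc.title.alternative | 다중 부류의 RNA-seq 표본에서 다르게 발현된 유전자를 확인하기 위한 비모수, 정보 이론적 알고리즘 (Korean title; in English: A non-parametric, information-theoretic algorithm for identifying differentially expressed genes in multiclass RNA-seq samples) | - |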
dc.type | Thesis | - |
dc.description.degree | Master | - |
dc.citation.pages | 29 | - |
dc.contributor.affiliation | College of Engineering, Department of Electrical and Computer Engineering (공과대학 전기·컴퓨터공학부) | - |
dc.date.awarded | 2014-02 | - |
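The abstract and the table of contents (Sections 2.3 and 2.4) describe scoring genes with mutual information and estimating P-values, but this record gives no formulas. What follows is a minimal Python sketch of one generic way to do this. It is an assumption throughout and not the thesis's actual algorithm: expression values for one gene are replaced by rank-based, equal-frequency bins, which makes the score non-parametric and limits an outlier to the top bin. The plug-in mutual information between those bins and the class labels is then computed, and a label-permutation test gives the P-value. The function names, the bin count `n_bins`, `n_perm`, and the example values are all hypothetical.

```python
import math
import random
from collections import Counter
from collections.abc import Hashable, Sequence


def rank_bins(values: Sequence[float], n_bins: int) -> list[int]:
    """Assign each value to one of n_bins equal-frequency bins by its rank.

    Only the ordering matters, so monotone transforms (e.g. log) give the same
    bins, and an extreme outlier can at most land in the top bin. Ties are
    broken by original position because sorted() is stable.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed cells of p(x,y) * log(p(x,y) / (p(x) p(y))), written with counts
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def mi_permutation_test(values, labels, n_bins=3, n_perm=1000, seed=0):
    """Return (MI score, permutation p-value) for one gene over all classes at once.

    Hypothetical helper: rank binning plus a label-permutation null is an
    assumed design, not the method described in the thesis.
    """
    bins = rank_bins(values, n_bins)
    observed = mutual_information(bins, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any gene/label association
        # small tolerance so permutations that tie the observed score count as hits
        if mutual_information(bins, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the estimated p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
    # hypothetical gene: low in A, mid in B, high in C, with one outlier in C
    values = [1.0, 2.0, 1.5, 10.0, 11.0, 10.5, 50.0, 52.0, 1000.0]
    mi, p = mi_permutation_test(values, labels)
    print(f"MI = {mi:.4f} nats, permutation p = {p:.4f}")
```

In this example the three bins line up exactly with the three classes, so the score equals ln 3 ≈ 1.099 nats, the entropy of the labels and the largest value possible for this design. The outlier 1000.0 does not change the result, because it only occupies the top rank within class C's bin. Because the score depends on the whole joint distribution of bins and labels, a single number covers all classes at once rather than one pair of groups at a time.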
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.