Denoising and Interaction Learning of Biological Data : 생체 자료 오류 정정 및 관계 학습

DC Field Value Language
dc.contributor.advisor윤성로-
dc.contributor.author이병한-
dc.date.accessioned2018-05-28T16:21:29Z-
dc.date.available2018-05-28T16:21:29Z-
dc.date.issued2018-02-
dc.identifier.other000000149606-
dc.identifier.urihttps://hdl.handle.net/10371/140674-
dc.descriptionThesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2018. 2. 윤성로.-
dc.description.abstractSince the completion of the Human Genome Project, an enormous amount of biological data has been accumulated in an attempt to understand human biological mechanisms. However, errors introduced during sequencing, together with the still-unrevealed inherent features of biological data that are needed to infer their interactions, create a need for large-scale data-driven applications. In this regard, this dissertation exploits recent advances in machine learning and artificial intelligence techniques that have proven successful in time-series sequence learning, including natural language processing and neural machine translation, to improve the reliability and computational performance of biological data analysis.
This dissertation discusses three issues in sequence analysis and proposes methodologies to overcome them. First, to alleviate the error-prone nature of sequence reads from next-generation sequencing (NGS), we present an information-theoretic approach for correcting sequence errors from various sequencers. Next, we present a generalized sequence denoiser accelerated by multiple graphics processing units (GPUs) to address the computational challenges of denoising high-throughput sequences. Finally, we describe an end-to-end machine learning framework for robust sequence (e.g., miRNA) target prediction that boosts sensitivity without laborious manual feature extraction.
In summary, this dissertation proposes a set of machine learning-based methodologies for handling biological sequences that can boost the reliability of downstream analysis.
-
dc.description.tableofcontents1 Introduction 1
2 Sequence Denoising 7
2.1 Background 10
2.1.1 Discrete Universal DEnoiser (DUDE) 10
2.2 Methods 13
2.2.1 Substitution Errors 13
2.2.2 Homopolymer Errors 16
2.3 Experimental Results 20
2.3.1 Experiment Setup 20
2.3.2 Evaluation Metric 22
2.3.3 Software Chosen for Comparison 23
2.3.4 Real Data: 454 Pyrosequencing 25
2.3.5 Real Data: Illumina Sequencing 30
2.3.6 Experiments on Simulated Data 37
2.4 Discussion 40
2.5 Summary 42
3 Scalability of a Denoiser 43
3.1 Background 44
3.1.1 Flowgrams bear Sequence Information 44
3.1.2 Noise Sources in Pyrosequenced Amplicons 45
3.1.3 Existing Denoisers for Pyrosequenced Amplicons 46
3.1.4 Profiling AmpliconNoise 47
3.1.5 CUDA Programming Model 48
3.2 Methods 49
3.2.1 Multi-GPU-Based Pairwise Distance Computation 51
3.2.2 Constructing OTU Models 54
3.2.3 Web Server Implementation 56
3.3 Experimental Results 57
3.3.1 Experiment Setup 57
3.3.2 Accuracy Comparison 57
3.3.3 Running Time Comparison 59
3.3.4 Understanding Output 63
3.4 Discussion 65
3.5 Summary 66
4 Sequence Interaction Learning 67
4.1 Background 68
4.1.1 Autoencoder 68
4.1.2 Recurrent Neural Network (RNN) 69
4.1.3 Biology of miRNA-mRNA Interactions 70
4.2 Methods 73
4.2.1 Input Representation 73
4.2.2 Modeling RNAs using RNN based Autoencoder 75
4.2.3 Modeling Interaction between RNAs 75
4.3 Experimental Results 77
4.3.1 Experiment Setup 77
4.3.2 Prediction Performance 79
4.3.3 Effects of Architecture Variation 82
4.3.4 Visual Inspection of RNN Activations 83
4.4 Discussion 84
4.5 Summary 86
5 Conclusion 87
Bibliography 90
Abstract in Korean 107
-
dc.formatapplication/pdf-
dc.format.extent7807474 bytes-
dc.format.mediumapplication/pdf-
dc.language.isoen-
dc.publisherSeoul National University Graduate School-
dc.subjectmachine learning-
dc.subjectdeep learning-
dc.subjectend-to-end learning-
dc.subjectparallelization-
dc.subjectsequence error-
dc.subjectsequence interaction-
dc.subjecttime series-
dc.subjectmiRNA target-
dc.subject.ddc621.3-
dc.titleDenoising and Interaction Learning of Biological Data-
dc.title.alternative생체 자료 오류 정정 및 관계 학습-
dc.typeThesis-
dc.contributor.AlternativeAuthorByunghan Lee-
dc.description.degreeDoctor-
dc.contributor.affiliationCollege of Engineering, Department of Electrical and Computer Engineering-
dc.date.awarded2018-02-

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
