DeepFam: Deep learning based alignment-free method for protein family modeling and prediction

DC Field | Value | Language
dc.contributor.advisor | 김선 | -
dc.contributor.author | 서석준 | -
dc.date.accessioned | 2018-05-29T03:31:53Z | -
dc.date.available | 2018-05-29T03:31:53Z | -
dc.date.issued | 2018-02 | -
dc.identifier.other | 000000150466 | -
dc.identifier.uri | https://hdl.handle.net/10371/141548 | -
dc.description | Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2018. 2. 김선. | -
dc.description.abstract | Next-generation sequencing technologies have recently produced a large number of newly sequenced proteins. However, biological experiments are too expensive to characterize so many protein sequences, so protein function prediction is primarily done by computational modeling methods, such as profile hidden Markov models (pHMM) and k-mer based methods. Nevertheless, existing methods have limitations: k-mer based methods are not accurate enough to assign protein functions, and pHMM is not fast enough to handle the large number of protein sequences from numerous genome projects. Therefore, a more accurate and faster protein function prediction method is needed.
In this paper, we introduce DeepFam, an alignment-free method that can extract functional information directly from sequences without the need for multiple sequence alignments. Through extensive experiments using the Clusters of Orthologous Groups (COGs) and G protein-coupled receptor (GPCR) datasets, DeepFam showed higher performance in predicting protein functions than state-of-the-art methods, both alignment-free and alignment-based. Additionally, we showed that DeepFam is capable of capturing conserved regions to model protein families. In fact, DeepFam was able to detect conserved regions documented in the Prosite database while predicting protein functions. Our deep learning method will be useful in characterizing the functions of the ever-increasing number of protein sequences. An implementation of the algorithm is available at https://bhi-kimlab.github.io/DeepFam. | -
dc.description.tableofcontents | 1. Introduction 10
1.1 Modeling protein families 11
1.1.1 Alignment-based protein family modeling 11
1.1.2 Alignment-free protein family modeling 12
1.1.3 Limitations of the current protein modeling methods 12
1.2 Motivation and our approach 14
2. Materials and Methods 16
2.1 DeepFam 16
2.1.1 Encoding 16
2.1.2 Convolution layer 18
2.1.3 1-Maxpooling layer and Dropout 19
2.1.4 Dense layer and Softmax layer 19
2.1.5 Training 20
2.1.6 Hyperparameters 21
2.2 Existing protein modeling methods used for performance comparison 23
2.2.1 Profile hidden Markov models 24
2.2.2 K-mer based logistic regression 24
2.2.3 Protvec based logistic regression 25
2.3 Datasets 26
2.3.1 Clusters of Orthologous Groups of proteins dataset 26
2.3.2 G Protein-Coupled Receptor dataset 27
3. Experiments and Results 28
3.1 Evaluation of Clusters of Orthologous Groups Prediction 29
3.2 Evaluation of G Protein-Coupled Receptor Family Prediction 30
3.3 Evaluation of execution time over the number of families 32
3.4 Interpreting the performance of the DeepFam model 34
4. Discussion 39
References 41 | -
dc.format | application/pdf | -
dc.format.extent | 2753013 bytes | -
dc.format.medium | application/pdf | -
dc.language.iso | en | -
dc.publisher | Seoul National University Graduate School | -
dc.subject | DeepFam | -
dc.subject | protein function prediction | -
dc.subject | alignment-free | -
dc.subject | deep learning | -
dc.subject.ddc | 621.39 | -
dc.title | DeepFam: Deep learning based alignment-free method for protein family modeling and prediction | -
dc.title.alternative | DeepFam: 딥러닝 기반의 비-정렬법 단백질군 모델링 및 예측 방법론 | -
dc.type | Thesis | -
dc.contributor.AlternativeAuthor | Seokjun Seo | -
dc.description.degree | Master | -
dc.contributor.affiliation | College of Engineering, Department of Computer Science and Engineering | -
dc.date.awarded | 2018-02 | -
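
The table of contents lists the DeepFam components in order: encoding, convolution layer, 1-max pooling layer and dropout, dense layer, and softmax layer. The following is a minimal forward-pass sketch of that kind of pipeline in plain Python, not the thesis implementation. This record does not give the actual encoding or hyperparameters, so the 20-letter alphabet, the one-hot encoding, the ReLU activation, the filter widths (3 and 5), the four output families, the random untrained weights, and all function names are assumptions made for illustration. Dropout is left out because it only acts during training.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector (assumed encoding)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # unknown residues stay all-zero in this sketch
            row[index[residue]] = 1.0
        rows.append(row)
    return rows


def conv_one_max(encoded, conv_filter):
    """Slide one filter along the sequence, apply ReLU, and keep only the maximum."""
    width = len(conv_filter)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(encoded) - width + 1):
        score = sum(
            conv_filter[k][j] * encoded[start + k][j]
            for k in range(width)
            for j in range(len(AMINO_ACIDS))
        )
        best = max(best, score)
    return best


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(sequence, filters, dense_weights):
    """Forward pass: encoding -> convolution -> 1-max pooling -> dense -> softmax."""
    encoded = one_hot(sequence)
    features = [conv_one_max(encoded, f) for f in filters]
    logits = [sum(w * x for w, x in zip(row, features)) for row in dense_weights]
    return softmax(logits)


# Hypothetical, untrained parameters: two filters (widths 3 and 5), four families.
rng = random.Random(0)
filters = [
    [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(width)]
    for width in (3, 5)
]
dense_weights = [[rng.uniform(-1, 1) for _ in filters] for _ in range(4)]
probs = predict("MKTAYIAKQRQISFVKSHFSRQ", filters, dense_weights)
```

Because 1-max pooling keeps a single value per filter, the feature vector has the same length for any input sequence and does not depend on where a motif occurs. That is one way a model of this shape can work without a multiple sequence alignment.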
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.