Publications

Detailed Information

A Practical Algorithm for the All-Pairs Suffix-Prefix Problem : All-Pairs Suffix-Prefix 문제에 대한 실용적인 알고리즘

DC Field Value Language
dc.contributor.advisor박근수-
dc.contributor.author임지혁-
dc.date.accessioned2018-05-28T16:19:52Z-
dc.date.available2018-05-28T16:19:52Z-
dc.date.issued2018-02-
dc.identifier.other000000149935-
dc.identifier.urihttps://hdl.handle.net/10371/140662-
dc.description학위논문 (박사)-- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2018. 2. 박근수.-
dc.description.abstractThe all-pairs suffix-prefix problem occurs as a subproblem of DNA sequence assembly where it is the most time-consuming part of the whole assembly process. In stringology, the optimal algorithms have been already proposed for the problem using a generalized suffix tree and enhanced suffix array. Since the generalized suffix tree and enhanced suffix array are heavy machineries, practical algorithms which are non-optimal but fast in practice have been proposed in bioinformatics. Among them, Readjoiner and SOF are state-of-the-art practical algorithms.

In this thesis, we present an algorithm for the all-pairs suffix-prefix problem that uses a simple data structure for storing input strings and advanced algorithmic techniques for matching, which together lead to fast running time in practice. Experimental results shows that our algorithm is about 10 times faster than SOF and about 20 times faster than Readjoiner on average in 16 real datasets and 2 random datasets. We also obtain reasonable scalability with a parallel implementation of our algorithm. We estimate the running time of matching step of our algorithm and present that the ratio of running time of SOF and our algorithm has a tendency to be proportional to $m/B$ of datasets through experimental analyses. We also show the relation between the running time of three algorithms (i.e., Readjoiner, SOF, and our algorithm) and the number of overlaps for datasets.
-
dc.description.tableofcontentsChapter 1 Introduction 1
1.1 Background 1
1.2 Contribution 3
1.3 Organization 4
Chapter 2 Previous Works 5
2.1 Preliminaries 5
2.2 Optimal Algorithms for the APSP Problem 7
2.2.1 Gusfield et al.s Algorithm 7
2.2.2 Ohlebusch and Gogs Algorithm 8
2.2.3 Tustumi et al.s Algorithm and Louza et al.s Algorithm 9
2.3 Practical Algorithms for the APSP Problem 9
2.3.1 Readjoiner 10
2.3.2 SOF 11
Chapter 3 New Algorithm for the APSP Problem 13
3.1 Overview of New Algorithm 13
3.2 Preprocessing Step 15
3.3 Matching Step 19
3.4 Output Step 23
3.5 Minor Optimizations 24
3.6 Implementation Options 25
3.7 Another Approach for the APSP Problem 26
Chapter 4 Experiments 28
4.1 Datasets and Experimental Environment 28
4.1.1 Datasets 28
4.1.2 Experimental Environment 30
4.2 Experimental Results 34
4.3 Experimental Analyses 39
Chapter 5 Conclusion 48
Bibliography 50
-
dc.formatapplication/pdf-
dc.format.extent5337306 bytes-
dc.format.mediumapplication/pdf-
dc.language.isoen-
dc.publisher서울대학교 대학원-
dc.subjectAll-pairs suffix-prefix problem-
dc.subjectAlgorithm engineering-
dc.subjectDNA sequence assembly-
dc.subject.ddc621.3-
dc.titleA Practical Algorithm for the All-Pairs Suffix-Prefix Problem-
dc.title.alternativeAll-Pairs Suffix-Prefix 문제에 대한 실용적인 알고리즘-
dc.typeThesis-
dc.description.degreeDoctor-
dc.contributor.affiliation공과대학 전기·컴퓨터공학부-
dc.date.awarded2018-02-
Appears in Collections:
Files in This Item:

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share