A Practical Algorithm for the All-Pairs Suffix-Prefix Problem

임지혁

서울대학교 중앙도서관

S-Space 소개

My S-Space

로그인이 필요합니다.

S-Space

Publications

Detailed Information

A Practical Algorithm for the All-Pairs Suffix-Prefix Problem : All-Pairs Suffix-Prefix 문제에 대한 실용적인 알고리즘

DC Field	Value	Language
dc.contributor.advisor	박근수	-
dc.contributor.author	임지혁	-
dc.date.accessioned	2018-05-28T16:19:52Z	-
dc.date.available	2018-05-28T16:19:52Z	-
dc.date.issued	2018-02	-
dc.identifier.other	000000149935	-
dc.identifier.uri	https://hdl.handle.net/10371/140662	-
dc.description	학위논문 (박사)-- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2018. 2. 박근수.	-
dc.description.abstract	The all-pairs suffix-prefix problem occurs as a subproblem of DNA sequence assembly where it is the most time-consuming part of the whole assembly process. In stringology, the optimal algorithms have been already proposed for the problem using a generalized suffix tree and enhanced suffix array. Since the generalized suffix tree and enhanced suffix array are heavy machineries, practical algorithms which are non-optimal but fast in practice have been proposed in bioinformatics. Among them, Readjoiner and SOF are state-of-the-art practical algorithms. In this thesis, we present an algorithm for the all-pairs suffix-prefix problem that uses a simple data structure for storing input strings and advanced algorithmic techniques for matching, which together lead to fast running time in practice. Experimental results shows that our algorithm is about 10 times faster than SOF and about 20 times faster than Readjoiner on average in 16 real datasets and 2 random datasets. We also obtain reasonable scalability with a parallel implementation of our algorithm. We estimate the running time of matching step of our algorithm and present that the ratio of running time of SOF and our algorithm has a tendency to be proportional to $m/B$ of datasets through experimental analyses. We also show the relation between the running time of three algorithms (i.e., Readjoiner, SOF, and our algorithm) and the number of overlaps for datasets.	-
dc.description.tableofcontents	Chapter 1 Introduction 1 1.1 Background 1 1.2 Contribution 3 1.3 Organization 4 Chapter 2 Previous Works 5 2.1 Preliminaries 5 2.2 Optimal Algorithms for the APSP Problem 7 2.2.1 Gusfield et al.s Algorithm 7 2.2.2 Ohlebusch and Gogs Algorithm 8 2.2.3 Tustumi et al.s Algorithm and Louza et al.s Algorithm 9 2.3 Practical Algorithms for the APSP Problem 9 2.3.1 Readjoiner 10 2.3.2 SOF 11 Chapter 3 New Algorithm for the APSP Problem 13 3.1 Overview of New Algorithm 13 3.2 Preprocessing Step 15 3.3 Matching Step 19 3.4 Output Step 23 3.5 Minor Optimizations 24 3.6 Implementation Options 25 3.7 Another Approach for the APSP Problem 26 Chapter 4 Experiments 28 4.1 Datasets and Experimental Environment 28 4.1.1 Datasets 28 4.1.2 Experimental Environment 30 4.2 Experimental Results 34 4.3 Experimental Analyses 39 Chapter 5 Conclusion 48 Bibliography 50	-
dc.format	application/pdf	-
dc.format.extent	5337306 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	서울대학교 대학원	-
dc.subject	All-pairs suffix-prefix problem	-
dc.subject	Algorithm engineering	-
dc.subject	DNA sequence assembly	-
dc.subject.ddc	621.3	-
dc.title	A Practical Algorithm for the All-Pairs Suffix-Prefix Problem	-
dc.title.alternative	All-Pairs Suffix-Prefix 문제에 대한 실용적인 알고리즘	-
dc.type	Thesis	-
dc.description.degree	Doctor	-
dc.contributor.affiliation	공과대학 전기·컴퓨터공학부	-
dc.date.awarded	2018-02	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)

Files in This Item:

000000149935.pdf 5.09 MB

Altmetrics

Item View & Download Count

Show Simple Item Record

Find it @ SNU

트윗하기

SNS Share