Publications
Detailed Information
A Practical Algorithm for the All-Pairs Suffix-Prefix Problem : All-Pairs Suffix-Prefix 문제에 대한 실용적인 알고리즘
Cited 0 time in
Web of Science
Cited 0 time in Scopus
- Authors
- Advisor
- 박근수
- Major
- 공과대학 전기·컴퓨터공학부
- Issue Date
- 2018-02
- Publisher
- 서울대학교 대학원
- Description
- 학위논문 (박사)-- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2018. 2. 박근수.
- Abstract
- The all-pairs suffix-prefix problem occurs as a subproblem of DNA sequence assembly where it is the most time-consuming part of the whole assembly process. In stringology, the optimal algorithms have been already proposed for the problem using a generalized suffix tree and enhanced suffix array. Since the generalized suffix tree and enhanced suffix array are heavy machineries, practical algorithms which are non-optimal but fast in practice have been proposed in bioinformatics. Among them, Readjoiner and SOF are state-of-the-art practical algorithms.
In this thesis, we present an algorithm for the all-pairs suffix-prefix problem that uses a simple data structure for storing input strings and advanced algorithmic techniques for matching, which together lead to fast running time in practice. Experimental results shows that our algorithm is about 10 times faster than SOF and about 20 times faster than Readjoiner on average in 16 real datasets and 2 random datasets. We also obtain reasonable scalability with a parallel implementation of our algorithm. We estimate the running time of matching step of our algorithm and present that the ratio of running time of SOF and our algorithm has a tendency to be proportional to $m/B$ of datasets through experimental analyses. We also show the relation between the running time of three algorithms (i.e., Readjoiner, SOF, and our algorithm) and the number of overlaps for datasets.
- Language
- English
- Files in This Item:
Item View & Download Count
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.