Publications

Detailed Information

A Practical Algorithm for the All-Pairs Suffix-Prefix Problem : All-Pairs Suffix-Prefix 문제에 대한 실용적인 알고리즘

Cited 0 time in Web of Science Cited 0 time in Scopus
Authors

임지혁

Advisor
박근수
Major
공과대학 전기·컴퓨터공학부
Issue Date
2018-02
Publisher
서울대학교 대학원
Keywords
All-pairs suffix-prefix problemAlgorithm engineeringDNA sequence assembly
Description
학위논문 (박사)-- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2018. 2. 박근수.
Abstract
The all-pairs suffix-prefix problem occurs as a subproblem of DNA sequence assembly where it is the most time-consuming part of the whole assembly process. In stringology, the optimal algorithms have been already proposed for the problem using a generalized suffix tree and enhanced suffix array. Since the generalized suffix tree and enhanced suffix array are heavy machineries, practical algorithms which are non-optimal but fast in practice have been proposed in bioinformatics. Among them, Readjoiner and SOF are state-of-the-art practical algorithms.

In this thesis, we present an algorithm for the all-pairs suffix-prefix problem that uses a simple data structure for storing input strings and advanced algorithmic techniques for matching, which together lead to fast running time in practice. Experimental results shows that our algorithm is about 10 times faster than SOF and about 20 times faster than Readjoiner on average in 16 real datasets and 2 random datasets. We also obtain reasonable scalability with a parallel implementation of our algorithm. We estimate the running time of matching step of our algorithm and present that the ratio of running time of SOF and our algorithm has a tendency to be proportional to $m/B$ of datasets through experimental analyses. We also show the relation between the running time of three algorithms (i.e., Readjoiner, SOF, and our algorithm) and the number of overlaps for datasets.
Language
English
URI
https://hdl.handle.net/10371/140662
Files in This Item:
Appears in Collections:

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share