S-Space College of Engineering/Engineering Practice School (공과대학/대학원) Dept. of Electrical and Computer Engineering (전기·정보공학부) Theses (Ph.D. / Sc.D._전기·정보공학부)
Order-Preserving Matching in Numeric Strings
수치 문자열의 순서를 보존하는 매칭 기법
- 공과대학 전기·컴퓨터공학부
- Issue Date
- 서울대학교 대학원
- 학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 2. 박근수.
- String matching is a fundamental problem in computer science and has been extensively studied. Sometimes a string consists of numeric values instead of alphabet characters, and we are interested in some trends in the text rather than specific patterns. We introduce a new string matching problem called order-preserving matching on numeric strings, where a pattern matches a text substring of the same length if the relative orders in the substring coincide with those of the pattern. Order-preserving matching is applicable to many scenarios such as stock price analysis and musical melody matching.
In this thesis, we define order-preserving matching in numeric strings, and present various representations of order relations and efficient algorithms of order-preserving matching with those representations. For single pattern matching, we give an O(n log m) time algorithm with the prefix representation based on the KMP algorithm, and optimize it further to obtain O(n + m log m) time with the nearest neighbor representation, where n and m are the lengths of the text and the pattern, respectively. For multiple pattern matching, we present an O((n+m) log m) time algorithm with the prefix representation based on the Aho-Corasick algorithm, where n is the text length and m is the sum of the lengths of the patterns. Our algorithms are presented in binary order relations first, and then extended to ternary order relations. With our extensions, the time complexities in binary order relations can be achieved in ternary order relations as well.