DFS mining: array-based solution for frequent pattern mining

김희정

서울대학교 중앙도서관

S-Space 소개

My S-Space

로그인이 필요합니다.

S-Space

Publications

Detailed Information

DFS mining: array-based solution for frequent pattern mining : DFS mining: 빈발 패턴 마이닝을 위한 배열 기반 솔루션

Cited 0 time in Web of Science Cited 0 time in Scopus

Export

Authors: 김희정

Advisor: Srinivasa Rao Satti

Major: 공과대학 전기·컴퓨터공학부

Issue Date: 2015-02

Publisher: 서울대학교 대학원

Keywords: FP-growth ; frequent pattern mining ; FP-tree ; DFS mining ; memory-efficient data structure

Description: 학위논문 (석사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2015. 2. Srinivasa Rao Satti.

Abstract: 빈발 패턴 마이닝은 데이터셋으로부터 의미있는 패턴을 찾는 마이닝 문제 중 하나로 마켓 데이터 분석, 생물정보학, 인덱스 선택 등 다양한 분야에서 활용되고 있는 중요한 문제이다. 이 문제를 푸는 대표적인 알고리즘인 FPgrowth 알고리즘은 FP-tree라는 트리 구조를 사용하여 빈발 패턴을 효율적으로 찾지만 포인터 사용으로 인해 메모리를 많이 차지한다는 문제가 있다.

이 논문에서 우리는 데이터셋을 포인터를 필요로 하지 않는 배열 구조에 저장하여 메모리를 적게 차지하지만 마이닝 연산을 비슷한 속도로 수행할 수 있는 DFS mining 알고리즘을 제안하였다. 우리는 실험을 통해 두 알고리즘을 비교하여 빈발 패턴 마이닝 문제에서 자료 구조와 메모리 및 연산 속도와의 관계를 알아보았다. 실험 결과로 Array-set을 생성하는 시간과 메모리 사용양은 FP-tree보다 작았지만, 마이닝 연산은 FP-growth 알고리즘이 빠르게 수행한다는 것을 알 수 있었다. 메모리를 효율적으로 사용하는 빈발 패턴 마이닝을 위해 트리 대신 배열을 이용했던 이 연구는 트리를 사용하는 알고리즘을 개선하기 위한 알고리즘 연구에서 활용 될 수 있다.
Frequent pattern mining is to find patterns that are founded frequently in a dataset. This mining problem has various applications: market basket analysis, bio-informatics, index selection etc. FP-growth is one of the well-known algorithms used for frequent pattern mining. FP-growth finds frequent patterns efficiently using a tree-based data structure called FP-tree. However, the memory usage of FP-growth is high because of many pointers stored in the tree.

In the thesis, we proposed a memory-efficient array-based data structure called Array-set and DFS mining which is an algorithm to support mining operation with reasonable time. With experimental study, we compared the two algorithms and discussed the relationship between two data structures and efficiencies of memory and time for the frequent pattern mining problem. Our experimental results confirm following observations. The construction time as well as the memory usage of the Array-set are smaller than those of the FP-tree while the mining time of the Array-set are greater than those of the FP-tree. Our research to take the place of a tree to an array can be applied to a research to improve the memory efficiency of the algorithm which uses a tree.

Language: English

URI: https://hdl.handle.net/10371/123125

Files in This Item:

000000024952.pdf 3.53 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)

Altmetrics

Item View & Download Count

Show Full Item Record

Find it @ SNU

트윗하기

SNS Share