Methods for supporting and enhancing declarative query language on distributed stream processing engine

한만휘

분산 스트림 처리 엔진에서의 선언형 질의 언어 지원 및 성능 향상 방법 : Methods for supporting and enhancing declarative query language on distributed stream processing engine

Authors: 한만휘

Advisor: 이상구

Major: College of Engineering, Dept. of Computer Science and Engineering

Issue Date: 2016-02

Publisher: Seoul National University Graduate School

Keywords: distributed stream processing engine ; declarative query language ; continuous query ; query optimization

Description: Thesis (Master's) -- Seoul National University Graduate School : Dept. of Computer Science and Engineering, 2016. 2. 이상구.

Abstract: Stream data processing has been studied for a long time, along with systems and languages that support queries over data streams. With the advent of the big-data era, however, both the volume of stream data and the demand for complex queries over it have grown, and a single node now runs short of memory or cannot deliver enough throughput to keep up with the stream rate. In particular, widening the window range that a query applies to a stream increases the required memory linearly, so queries have had to restrict their window ranges. Several companies have recently developed distributed stream processing engines (DSPEs) to handle such big-data streams, but these systems currently lack declarative language support, so the processing logic must be programmed by hand every time.

To address these problems, this thesis redefines the operators of CQL, currently the most widely used stream query language, so that they can run on a DSPE. It then describes how to generate a query execution plan on a DSPE from these operators. To improve the performance of the generated plans, it also proposes a worker process sharing (WP-sharing) technique that reduces network cost, together with heuristic methods that make query processing more efficient.

Experiments were run on a simple example query and on TPC-H Q10, chosen to reflect a realistic query. For both queries, message throughput increased almost linearly with the number of processing nodes. Query execution plans that applied the proposed heuristics achieved higher message throughput. Plans that applied WP-sharing raised message throughput further, on average by about 20% for the example query and by about 13 to 14% for TPC-H Q10, which indicates that WP-sharing improves performance by reducing network cost.
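The abstract's remark that memory grows linearly with the window range can be illustrated with a small sketch. It is not taken from the thesis: the class `RangeWindow` and the helper `buffered_tuples` are hypothetical names, and the sketch assumes a time-based sliding window fed at a constant rate of tuples per time unit.

```python
from collections import deque


class RangeWindow:
    """Hypothetical time-based sliding window (not the thesis's implementation).

    Keeps only tuples whose timestamp is newer than `now - range_`.
    """

    def __init__(self, range_):
        self.range_ = range_
        self.buffer = deque()  # (timestamp, value) pairs, oldest first

    def insert(self, timestamp, value):
        self.buffer.append((timestamp, value))
        # Evict tuples that have fallen out of the window range.
        while self.buffer and self.buffer[0][0] <= timestamp - self.range_:
            self.buffer.popleft()

    def size(self):
        return len(self.buffer)


def buffered_tuples(range_, rate=1, duration=1000):
    """Feed `rate` tuples per time unit for `duration` units (assumed workload)
    and return how many tuples the window still holds at the end."""
    window = RangeWindow(range_)
    for t in range(duration):
        for i in range(rate):
            window.insert(t, i)
    return window.size()


if __name__ == "__main__":
    for r in (10, 100, 500):
        print(r, buffered_tuples(r))
```

Under these assumptions the buffer holds about `rate * range_` tuples in steady state, so a window ten times wider needs roughly ten times the memory on the node that hosts it.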

Language: Korean

URI: https://hdl.handle.net/10371/122646

Files in This Item:

000000131941.pdf 7.44 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
