Optimization for Atmospheric Scanning LiDAR Analysis using Array Database

김주훈


배열 기반 데이터베이스를 이용한 대기 스캐닝 라이다 분석 최적화 : Optimization for Atmospheric Scanning LiDAR Analysis using Array Database


Authors: 김주훈

Advisor: 문봉기

Issue Date: 2021

Publisher: Graduate School, Seoul National University

Keywords: SciDB ; UDO ; Parallel Chunk Processing ; GPU

Abstract: Atmospheric scanning LiDAR analysis takes the observations that a LiDAR installed at a single site collects within a given radius and derives fine-grained atmospheric concentration results for that area. Researchers have so far implemented this analysis with an RDBMS and GIS, or in MATLAB or Python. These approaches are adequate for the small volume of data observed by a single LiDAR device, but they hit a performance limit on long-term data collected from many devices. Because researchers frequently rerun the analysis with different parameters to inspect the results, they need a database suited to the characteristics of the data and the analysis process, so that large datasets can be managed and analyzed quickly and easily.

This study applies an array database to the problem. Atmospheric scanning LiDAR data is multidimensional, and the analysis can exploit locality between adjacent cells, which makes an array-based database a good fit. In addition, the shared-nothing architecture of SciDB allows high scalability. The analysis process is implemented as User-Defined Operators (UDOs) in SciDB and optimized in two ways. The first uses SciDB's parallel chunk processing: when an operator runs, the chunks held by the different instances are processed in parallel. The second uses a GPU to parallelize operators whose analysis algorithm contains many repeated computations.

Experiments on data based on real observations show that the array-based database gives better analysis performance than the existing methods, so researchers can obtain analysis results for large datasets in a short time.
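
The idea behind parallel chunk processing can be illustrated outside SciDB. The following is a minimal sketch in plain Python, not the thesis's UDO and not the SciDB API: the function names, the row-wise chunking, the one-row overlap, and the neighbour-averaging operator are all assumptions chosen for illustration. It splits a 2-D grid into chunks, applies an operator that only needs adjacent cells to each chunk independently, and checks that the result matches a serial pass.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(grid, chunk_rows):
    """Split a 2-D grid (list of rows) into row-wise chunks.

    Each chunk carries up to one extra row above and below (a halo), so an
    operator that reads adjacent rows needs no data from other chunks.
    Returns tuples of (offset of first owned row, owned row count, rows).
    """
    chunks = []
    for start in range(0, len(grid), chunk_rows):
        stop = min(start + chunk_rows, len(grid))
        lo = max(start - 1, 0)
        hi = min(stop + 1, len(grid))
        chunks.append((start - lo, stop - start, grid[lo:hi]))
    return chunks


def smooth_chunk(chunk):
    """Average each owned cell with its vertical neighbours (hypothetical operator)."""
    offset, n_rows, rows = chunk
    out = []
    for i in range(offset, offset + n_rows):
        neighbours = rows[max(i - 1, 0): i + 2]
        out.append([sum(col) / len(neighbours) for col in zip(*neighbours)])
    return out


def smooth_parallel(grid, chunk_rows=2, workers=4):
    """Process all chunks concurrently and stitch the results back in order."""
    chunks = split_into_chunks(grid, chunk_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(smooth_chunk, chunks)
    return [row for part in parts for row in part]


def smooth_serial(grid):
    """Reference result: treat the whole grid as a single chunk."""
    return smooth_chunk((0, len(grid), grid))


# Hypothetical 5 x 3 grid, e.g. rows as scan angles and columns as range bins.
grid = [[float(r * 10 + c) for c in range(3)] for r in range(5)]
assert smooth_parallel(grid) == smooth_serial(grid)
```

Threads in CPython do not speed up CPU-bound work of this kind, so the sketch shows only the decomposition: because every chunk is self-contained, the same split could be handed to separate processes, database instances, or GPU thread blocks.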

Language: kor

URI: https://hdl.handle.net/10371/178525

https://dcollection.snu.ac.kr/common/orgView/000000166466

Files in This Item:

000000166466.pdf 9.85 MB

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Computer Science and Engineering
  - Theses (Master's Degree, Dept. of Computer Science and Engineering)
