A Flexible Architecture for Optimizing Distributed Data Processing

Abstract: Optimizing the scheduling and communication of distributed data processing for resource and data characteristics is crucial for achieving high performance. Existing approaches to such optimizations largely fall into two categories. First, distributed runtimes provide low-level policy interfaces for applying the optimizations, but they do not ensure that correct application semantics are maintained and thus often require significant effort to use. Second, policy interfaces that extend a high-level application programming model ensure correctness, but do not provide sufficiently fine-grained control.

In this paper, we propose a flexible architecture for optimizing distributed data processing. The architecture aims to enable composable and reusable optimization policies tailored to various deployment scenarios, including harnessing transient resources, performing geo-distributed data analytics, mitigating data skew, and handling large on-disk shuffles. To realize this architecture, we propose a new approach to building distributed dataflow optimization policies and a new approach to harnessing transient resources in datacenters. Our evaluation results show that the architecture delivers performance improvements on par with those of existing specialized runtimes, each tailored to a specific deployment scenario.

Language: eng

URI: https://hdl.handle.net/10371/175382

https://dcollection.snu.ac.kr/common/orgView/000000165130

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)