Remote Memory for Virtualized Environments

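Abstract: In a cloud environment, customers do not need to keep huge computing resources running at all times and pay only for the amount of computation they want, at the moment they want it. Driven by the recent popularity of artificial intelligence and big data workloads, demand for the cloud has therefore grown substantially.
The adoption of cloud computing lets customers greatly reduce the cost of maintaining servers, while service providers can maximize the utilization of their computing resources.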

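In this scenario, improving the utilization of computing resources becomes an important goal for data centers.
Given the rapidly growing scale of today's data centers in particular, even small efficiency improvements can create enormous economic value.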

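Data center efficiency depends on many factors, such as site selection, architectural design, cooling systems, and hardware configuration;
this thesis focuses specifically on the design and implementation of the software that manages computing and memory resources.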

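This thesis proposes two software-based techniques that substantially improve data center efficiency.
First, we propose a software-based memory disaggregation system for virtualized environments.
Recent advances in high-speed networking have drastically reduced the cost of remote memory access, and this thesis shows that, with high-performance networking hardware, virtual machines can run on top of remote memory without significant performance degradation.
An evaluation of the proposed technique on the QEMU/KVM hypervisor shows that it reduces the tail latency of remote paging by 98.2% compared with existing systems.
Furthermore, in a rack-scale job processing simulation, the proposed system reduces the total job processing time by 40.9% compared with existing systems.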

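Second, we propose an instant virtual machine migration technique that leverages remote memory.
Extending virtualized environments to use remote memory already contributes substantially to higher resource utilization, but performance can still degrade severely when multiple applications on one server compete for resources.
The proposed instant migration technique migrates a virtual machine on remote memory by transferring only a very small amount of metadata.
In an evaluation with virtual machines running an in-memory key-value database benchmark, it reduces the effective service downtime by up to 92.6% compared with existing techniques.
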
The rising importance of big data and artificial intelligence (AI) has led to an unprecedented shift of local computation into the cloud.
One of the key drivers behind this transformation was the exploding cost of owning and maintaining computing systems powerful enough to process these new workloads.
Customers reduce their costs by renting only the resources they need, only when they need them, while data center operators benefit from efficiency at scale.

A key factor in operating a profitable data center is a high overall utilization of its resources.
Due to the scale of modern data centers, small improvements in efficiency translate to significant savings in the total cost of ownership (TCO).

There are many important elements that constitute an efficient data center such as its location, architecture, cooling system, or the employed hardware.
In this thesis, we focus on software-related aspects, namely the utilization of computational and memory resources.
Reports from data centers operated by Alibaba and Google show that the overall resource utilization has stagnated at a level of around 50 to 60 percent over the past decade.
This low average utilization is mostly attributable to resource allocation driven by peak demand, even though modern workloads vary widely in their resource usage.
In other words, data centers today lack an efficient way to put reserved but idle resources to work.

In this dissertation we present RackMem, a software-based solution to address the problem of low resource utilization through two main contributions.
First, we introduce a disaggregated memory system tailored for virtualized environments.
We observe that virtual machines can use remote memory without noticeable performance degradation under moderate memory pressure on modern networking infrastructure.
We implement a specialized remote paging system for QEMU/KVM that reduces the remote paging tail-latency by 98.2% in comparison to the state of the art.
A job processing simulation at rack-scale shows that the total makespan can be reduced by 40.9% under our memory system.
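The remote paging idea can be pictured with a minimal sketch. It is not the RackMem implementation: it assumes a dictionary-backed `RemotePool` standing in for a memory server reached over a fast network, and a hypothetical `PagedMemory` class that keeps a fixed number of pages in local memory, fetches a missing page from the pool on a fault, and writes the least recently used local page back to the pool when local memory is full.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes; a typical x86 page size (assumption for this sketch)


class RemotePool:
    """Hypothetical stand-in for a memory server reached over a fast network."""

    def __init__(self):
        self._pages = {}
        self.reads = 0
        self.writes = 0

    def read(self, page_no):
        self.reads += 1
        # Pages that were never written back read as zero-filled.
        return self._pages.get(page_no, bytes(PAGE_SIZE))

    def write(self, page_no, data):
        self.writes += 1
        self._pages[page_no] = data


class PagedMemory:
    """Keeps at most `local_capacity` pages locally; all other pages live in the pool."""

    def __init__(self, local_capacity, pool):
        self.local_capacity = local_capacity
        self.pool = pool
        self.local = OrderedDict()  # page_no -> bytearray, least recently used first
        self.faults = 0

    def _fault_in(self, page_no):
        # Remote page fault: evict the LRU page if local memory is full, then fetch.
        self.faults += 1
        if len(self.local) >= self.local_capacity:
            victim, data = self.local.popitem(last=False)
            self.pool.write(victim, bytes(data))  # simplification: always written back
        self.local[page_no] = bytearray(self.pool.read(page_no))

    def _page(self, page_no):
        if page_no not in self.local:
            self._fault_in(page_no)
        self.local.move_to_end(page_no)  # mark as most recently used
        return self.local[page_no]

    def load(self, addr):
        page_no, offset = divmod(addr, PAGE_SIZE)
        return self._page(page_no)[offset]

    def store(self, addr, value):
        page_no, offset = divmod(addr, PAGE_SIZE)
        self._page(page_no)[offset] = value


if __name__ == "__main__":
    mem = PagedMemory(local_capacity=2, pool=RemotePool())
    mem.store(0, 7)              # page 0
    mem.store(PAGE_SIZE, 8)      # page 1
    mem.store(2 * PAGE_SIZE, 9)  # page 2; page 0 is evicted to the pool
    print(mem.load(0))           # page 0 is faulted back in from the pool
```

The sketch writes every evicted page back, whereas a practical pager would usually track dirty pages and skip clean ones; the LRU policy is likewise an assumption chosen for brevity, not a detail taken from the thesis.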

While seamless disaggregated memory helps to balance memory usage across nodes, individual nodes can still become overloaded when co-located workloads exhibit high resource usage at the same time.
In a second contribution, we present a novel live migration technique for virtual machines running on top of our remote paging system.
Under this instant live migration technique, entire virtual machines can be migrated in as little as 100 milliseconds.
An evaluation with in-memory key-value database workloads shows that the presented migration technique improves the state of the art by a wide margin in all key performance metrics.
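Why migration can be nearly instant once guest memory already lives in remote memory can be illustrated with a second sketch, again under stated assumptions rather than as the thesis's actual protocol: both hosts reach the same hypothetical `SharedPool`, the source writes its locally cached pages back to that pool, and the only data passed directly from source to destination is a small descriptor (VM id, page count, vCPU state). The destination then faults pages in on demand. `SharedPool`, `VMInstance`, `migrate_out`, and `migrate_in` are illustrative names.

```python
import json
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # bytes (assumption for this sketch)


@dataclass
class SharedPool:
    """Hypothetical remote memory pool reachable from both source and destination."""
    pages: dict = field(default_factory=dict)  # (vm_id, page_no) -> bytes


@dataclass
class VMInstance:
    vm_id: str
    num_pages: int
    vcpu_state: dict
    pool: SharedPool
    cache: dict = field(default_factory=dict)  # node-local copies: page_no -> bytearray

    def touch(self, page_no):
        # Fetch the page on demand from the shared pool if it is not cached locally.
        if page_no not in self.cache:
            data = self.pool.pages.get((self.vm_id, page_no), bytes(PAGE_SIZE))
            self.cache[page_no] = bytearray(data)
        return self.cache[page_no]


def migrate_out(vm):
    """Source side: write cached pages back to the pool, then emit a small descriptor."""
    for page_no, data in vm.cache.items():
        vm.pool.pages[(vm.vm_id, page_no)] = bytes(data)
    vm.cache.clear()
    descriptor = {"vm_id": vm.vm_id, "num_pages": vm.num_pages, "vcpu": vm.vcpu_state}
    return json.dumps(descriptor).encode()  # only data passed directly to the destination


def migrate_in(blob, pool):
    """Destination side: rebuild the VM from the descriptor; pages are faulted in lazily."""
    d = json.loads(blob)
    return VMInstance(d["vm_id"], d["num_pages"], d["vcpu"], pool)


if __name__ == "__main__":
    pool = SharedPool()
    src = VMInstance("vm0", num_pages=262144, vcpu_state={"rip": 4096}, pool=pool)  # 1 GiB guest
    src.touch(5)[0] = 42
    blob = migrate_out(src)
    dst = migrate_in(blob, pool)
    print(len(blob), dst.touch(5)[0])
```

In the sketch, the descriptor for a 1 GiB guest is a JSON blob of a few dozen bytes, and its size does not grow with the guest's memory contents. A real hypervisor would also have to transfer device state and whatever metadata locates the guest's pages in remote memory, but none of that requires copying the page contents themselves.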

The presented software-based solutions lay the technical foundations that allow data center operators to significantly improve the utilization of their computational and memory resources.
As future work, we propose new job schedulers and load balancers to make full use of these new technical foundations.

Language: eng

URI: https://hdl.handle.net/10371/178920

https://dcollection.snu.ac.kr/common/orgView/000000167536

Files in This Item:

000000167536.pdf 6.72 MB

Appears in Collections:

College of Dentistry/School of Dentistry (치과대학/치의학대학원)
- Dept. of Dental Science(치의과학과)
  - Theses (Ph.D. / Sc.D._치의과학과)
