Cross-Layer Optimizations for Embedded Real-Time Systems: Their Application to Predictable Flash Translation Layers and Virtual Machines

유종훈

내장형 실시간 시스템의 교차계층 최적화: 예측 가능한 플래시 변환계층과 가상머신에의 적용 : Cross-Layer Optimizations for Embedded Real-Time Systems: Their Application to Predictable Flash Translation Layers and Virtual Machines

Authors: 유종훈

Advisor: 홍성수

Major: College of Engineering, Department of Electrical and Computer Engineering

Issue Date: 2013-02

Publisher: Graduate School, Seoul National University

Keywords: real-time embedded systems ; cross-layer optimization ; flash translation layer ; Petri net modeling ; virtual machine ; lock holder preemption

Description: Thesis (Ph.D.) -- Graduate School, Seoul National University : Department of Electrical and Computer Engineering, 2013. 2. 홍성수.

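Abstract: Software platforms are now widely adopted not only in traditional IT products such as smartphones and smart TVs but also in embedded systems with real-time requirements, such as automobiles and aircraft. Most of these platforms are organized as a stack of layers in which each layer may access the adjacent lower layer only through well-defined interfaces. This layering principle has the advantage that each layer can be developed and ported independently, but it often severely undermines system predictability because it blocks other layers from accessing layer-internal information that strongly affects program response times. Cross-layer optimization is a useful technique for solving this problem: it selectively exposes a layer's internal state to upper layers in order to improve system performance and predictability. This dissertation applies cross-layer optimization to two problems that have recently become issues in embedded real-time systems. The first is predicting the worst-case execution time (WCET) of flash translation layer (FTL) requests, and the second is bounding the worst-case waiting time (WCWT) of tasks running on a virtual machine (VM).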
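First, a Petri net-based FTL architecture is proposed for predicting the WCET of FTL requests. The FTL is the software layer between the NAND flash device driver and a conventional file system. To guarantee the real-time behavior of applications that use flash memory as storage, it is essential to predict the WCET of the FTL requests issued by the file system. However, the execution time of an FTL request varies widely with resource availability, which is an internal state of the flash memory, so an accurate prediction is impossible without knowing it. The conventional static analysis techniques widely used for WCET analysis assume the worst-case resource availability and therefore yield only very pessimistic WCETs.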
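To solve this problem, this dissertation proposes an FTL architecture in which resource availability is exposed to the FTL, which uses it to predict the WCET of a given FTL request parametrically. It also presents a development process in which, at FTL design time, the FTL's internal behavior and resource availability are modeled as a Petri net, together with an optimal algorithm that computes the WCET of a given FTL request at run time. Here the Petri net and the run-time resource availability serve as the parametric function and the parameters for computing the WCET, respectively. The proposed FTL architecture was implemented on real NAND flash, and its WCET prediction accuracy was evaluated experimentally. The results show that the proposed technique predicts WCETs that are on average 54 times shorter than those of conventional static analysis, and is thus far more accurate.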
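Next, a VM architecture that supports preemption-safe spin locks is proposed for bounding the WCWT of tasks running on a VM. Tasks running on a VM widely use spin locks to synchronize access to shared resources. A task trying to acquire a lock spin-waits, consuming CPU time, until the lock becomes available. Spinning while the lock holder has been preempted (lock holder preemption, LHP), so that there is no chance of acquiring the lock, is called useless spin-wait. In conventional VM architectures the guest OS has no access to the virtual CPU scheduling performed by the VMM, so useless spin-wait can continue indefinitely without the guest ever learning that LHP has occurred.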
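To solve this problem, this dissertation proposes a cross-layer optimization technique in which virtual CPU scheduling is exposed to the guest OS, and the guest OS uses it to recover from LHP. LHP recovery here means that a lock contender lends the CPU it was running on to the preempted lock holder so that the holder can finish its critical section. The technique provides a preemption-safe spin lock algorithm that detects and recovers from LHP at run time and replaces the guest OS's existing spin lock algorithm, along with the cross-layer interface needed to support it. The dissertation shows that, with the proposed technique, useless spin-wait time is bounded linearly in the number of tasks contending for a given lock. For experimental evaluation, the technique was implemented on Linux and KVM and spin-wait times were measured. The results show that the spin-wait time of tasks running on the proposed VM was up to 458 times shorter than on a conventional VM.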
Inspired by their huge success in traditional IT products such as smartphones and smart TVs, software platforms are rapidly being adopted in embedded real-time systems such as vehicles and avionics systems. A modern software platform is structured as a collection of layers in which each layer is allowed to access only the immediately lower layer, through well-defined interfaces. This principle allows each software layer to be developed independently and ported to other systems. However, it often hinders predictable system design, since it prohibits each layer from accessing other layers' internal states that are needed to predict the system's timing behavior precisely. Cross-layer optimization is a viable solution to this problem. It makes one layer's internal states visible to other layers so that they can be exploited to improve system predictability as well as performance. In this dissertation, two cross-layer optimization techniques are proposed to solve problems that arise in modern embedded real-time systems. One is predicting the worst-case execution times (WCETs) of flash translation layer (FTL) requests, and the other is bounding the worst-case waiting times (WCWTs) of tasks running on a virtual machine (VM).
First, this dissertation proposes a Petri net-based FTL architecture for predicting the WCETs of FTL requests. An accurate WCET of an FTL request cannot be estimated without exploiting run-time resource availability, since the execution time of an FTL request varies significantly with it. Conventional static analysis produces a pessimistic WCET when a program exhibits a large variance in execution time, because it assumes the worst-case resource availability. Parametric analysis, on the other hand, can compute an accurate WCET for each instance of program execution, since it formulates a WCET function and evaluates it at run time by instantiating the program inputs. The parametric approach is therefore suitable for FTLs. However, existing parametric analysis techniques derive the WCET function from programming language constructs and cannot take run-time resource availability into account in the formulation.
In this dissertation, an FTL architecture is presented in which run-time resource availability is exposed to the FTL, which exploits it to compute the WCET of FTL requests parametrically. In the proposed approach, the WCET function and its parameters are a Petri net and the run-time resource availability, respectively. This dissertation also presents an FTL development process in which FTL developers construct a Petri net from a given FTL algorithm and implement it. An optimal algorithm that computes the WCET of a given FTL request at run time is also presented. Experimental results show that the proposed technique computes a much more accurate WCET than static analysis: the estimated WCET was on average 54 times shorter than the static WCET.
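The difference between a static and a parametric WCET bound can be illustrated with a minimal sketch. It is not the dissertation's Petri net model or its algorithm: the timing constants, the `Marking` fields, and the function names are all hypothetical, and the "net" is reduced to a single choice between a direct write and a write preceded by garbage collection.

```python
from dataclasses import dataclass

# Hypothetical NAND timing constants in microseconds (assumptions for
# illustration only, not values from the dissertation).
T_READ = 25
T_PROGRAM = 200
T_ERASE = 1500
PAGES_PER_BLOCK = 64


@dataclass
class Marking:
    """Run-time resource availability, viewed as tokens in Petri net places."""
    free_pages: int          # tokens in a hypothetical 'free page' place
    victim_valid_pages: int  # pages to copy if garbage collection must run


def wcet_write(m: Marking) -> int:
    """Parametric WCET of one page write, evaluated on the current marking."""
    if m.free_pages > 0:
        # The write can proceed directly and consumes one free page.
        return T_PROGRAM
    # No free page: garbage collection runs first (copy valid pages, erase).
    gc_cost = m.victim_valid_pages * (T_READ + T_PROGRAM) + T_ERASE
    return gc_cost + T_PROGRAM


def wcet_write_static() -> int:
    """Static bound: assume the worst marking regardless of run-time state."""
    worst = Marking(free_pages=0, victim_valid_pages=PAGES_PER_BLOCK - 1)
    return wcet_write(worst)


if __name__ == "__main__":
    now = Marking(free_pages=3, victim_valid_pages=10)
    print("parametric:", wcet_write(now), "us")
    print("static:    ", wcet_write_static(), "us")
```

Under these assumed constants the parametric bound tracks the current state, while the static bound always pays for a full garbage collection. The numbers carry no relation to the experimental results reported in the abstract.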
Second, this dissertation proposes a VM architecture that enables preemption-safe spin locks for bounding the worst-case waiting time of tasks running on a VM. When spin locks are used for multiprocessor synchronization, lock contenders must busy-wait until the lock becomes available. A contender that spins while the lock holder is preempted (lock holder preemption, LHP) is said to be performing useless spin-wait. In the worst case, useless spin-wait can last indefinitely, which wastes a great deal of CPU time and makes response times unpredictable. In a conventional VM architecture, the hypervisor and the guest OS are strictly separated by the layering principle. This prohibits the guest OS from using the vCPU scheduling performed by the hypervisor to determine whether a lock holder is preempted.
To overcome this limitation, this dissertation proposes a VM architecture in which vCPU scheduling is exposed to the guest OS. Tasks exploit this information to perform LHP recovery, in which a lock contender lends its CPU to the preempted lock holder so that the holder can finish its critical section. In the proposed architecture, a preemption-safe spin lock algorithm replaces the legacy spin lock algorithm and performs LHP detection and recovery at run time. The proposed technique provides a bound on the worst-case useless spin-wait time of a task running on a VM that is linear in the number of tasks contending for the lock. Experimental results show that the proposed architecture greatly reduces spin-wait times compared to the legacy VM architecture. Specifically, the spin-wait iteration count was reduced by a factor of up to 458.
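The effect of LHP detection and recovery on useless spin-wait can be sketched with a toy discrete-time model. This is only an illustration under simplifying assumptions, not the dissertation's algorithm or its cross-layer interface: there is one holder and one contender, the `Holder` class and `contend` function are hypothetical names, and lending the CPU is modeled as the holder simply making progress during the contender's time step.

```python
from dataclasses import dataclass


@dataclass
class Holder:
    """State of the lock holder as seen by a contender (hypothetical model)."""
    cs_remaining: int   # time steps of critical section still to execute
    preempted_for: int  # time steps until the hypervisor reschedules its vCPU

    @property
    def preempted(self) -> bool:
        # Assumed to be visible to the guest only in the cross-layer design.
        return self.preempted_for > 0


def contend(holder: Holder, preemption_safe: bool) -> int:
    """Return the number of useless spin iterations until the lock is free."""
    useless_spins = 0
    while holder.cs_remaining > 0:
        if holder.preempted:
            if preemption_safe:
                # LHP detected: the contender lends its CPU, so the holder
                # makes progress on its critical section in this time step.
                holder.cs_remaining -= 1
            else:
                # Legacy spin lock: spin with no chance of acquiring the lock.
                useless_spins += 1
            holder.preempted_for -= 1
        else:
            # The holder runs on its own vCPU and makes progress.
            holder.cs_remaining -= 1
    return useless_spins


if __name__ == "__main__":
    print("legacy:         ", contend(Holder(5, 1000), preemption_safe=False))
    print("preemption-safe:", contend(Holder(5, 1000), preemption_safe=True))
```

In this model the legacy contender wastes one iteration for every step the holder stays preempted, while the preemption-safe contender wastes none. The model ignores detection cost and multiple contenders, so it says nothing about the linear bound or the measured reduction reported above.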

Language: Korean

URI: https://hdl.handle.net/10371/118892

Files in This Item:

000000009210.pdf 1.71 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
