GPU-in-the-loop simulation for CPU/GPU heterogeneous platform

고영섭


GPU-in-the-loop simulation for CPU/GPU heterogeneous platform : CPU/GPU 이종 병렬 플랫폼을 위한 GPU-in-the-loop 시뮬레이션 기법


Authors: 고영섭

Advisor: 하순회

Major: College of Engineering, Department of Electrical and Computer Engineering

Issue Date: 2016-02

Publisher: Graduate School, Seoul National University

Keywords: CPU/GPU heterogeneous platform ; GPU Simulation ; Virtual prototyping system ; GPU-in-the-loop simulation ; System call ; API

Description: Thesis (Ph.D.) -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, 2016. 2. 하순회.

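Abstract (translated from the Korean): Mobile GPUs are used in most embedded systems to process complex 3D games and to provide highly responsive user interfaces. Furthermore, as the computing capability of mobile GPUs increases and GPUs become programmable, the mobile GPU is increasingly regarded as an auxiliary compute unit. Unlike GPUs in server environments, a mobile GPU must run within a constrained power budget and therefore usually contains a small number of cores. Consequently, using both the CPU and the GPU efficiently is essential to meeting given performance and power constraints.
In the early stage of designing a CPU/GPU heterogeneous parallel architecture, a virtual prototyping system is commonly used to detect software errors or to explore the design space. Because a virtual prototyping system contains simulation models of all components of the target system, a heterogeneous parallel architecture that includes a CPU and a GPU requires a simulation model of the GPU. However, for some GPUs no simulation model exists, and where one does exist it was typically developed for micro-architecture-level exploration, so its simulation performance is poor.
To solve these problems, this thesis proposes a GPU-in-the-loop (GIL) simulation technique that combines real hardware with a simulator.
The proposed approach allows the CPU and GPU to be coupled at several levels. As the first method, we propose a technique that couples the simulator and the GPU board at the system call level. In this technique, the shared memory of the target system is simulated through two separate memories, one in the simulator and one on the board, so memory synchronization to keep the two memories consistent is the most important problem. To address it, the system call-based technique stores information about the shared memory regions in an address translation table, and whenever a system call that runs the GPU on the real board is requested, the shared regions are synchronized using that table. To model GPU execution in the simulator, an interrupt-based modeling technique is proposed, in which a GPU interrupt is raised in the simulator in consideration of the GPU execution time measured on the board.
As the second method, we propose a technique that couples the simulator and the board at the API level. When the device driver included in the existing software stack is simulated, extending the system to support various GPUs is difficult. The API-based technique therefore defines a new library used for simulation and substitutes it for the GPU library in the existing software stack, so that the device driver is not simulated. To model API execution time in the simulator, a new device driver for simulation is defined, and a sleep function is called inside that driver so that the API time measured on the board is reflected in the simulator.
Among the existing GPU APIs, this thesis proposes API-based simulation techniques for the most widely used ones: OpenCL, CUDA, and OpenGL ES. For correct simulation, difficult problems such as asynchronous behavior, multi-process support, and memory synchronization for complex data structures are solved through a variety of techniques.
The experimental results confirm that the proposed technique can provide fast simulation performance while guaranteeing a reasonable level of accuracy. The proposed technique can therefore be used not only for software development but also for system-level performance estimation. Moreover, because real hardware is used, a virtual prototyping system for CPU/GPU heterogeneous parallel systems can be built even when no simulator is available for the GPU.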
Abstract (English): Mobile GPUs have been adopted in most embedded systems to handle the complex graphics computations required by modern 3D games and highly interactive user interfaces (UIs). Moreover, as mobile GPUs gain computation power and become increasingly programmable, they are also used to accelerate general-purpose computations in fields such as physics and mathematics. Unlike server GPUs, mobile GPUs usually have fewer cores because only a limited amount of battery power is available. Thus, it is important to utilize both CPUs and GPUs efficiently in mobile platforms to satisfy performance and power constraints.
For design space exploration of such a CPU/GPU heterogeneous architecture, or for debugging software in the early design stage, a full system simulator is typically used, in which simulation models of all hardware components in the target system are included. Unfortunately, building a full system simulator with a GPU simulator is not always possible: either no GPU simulator is available, or the available one is prohibitively slow because such simulators are mainly developed for architecture exploration that varies the internal micro-architecture of the GPU.
To solve these problems, this thesis proposes a GPU-in-the-loop (GIL) simulation technique that integrates a real GPU with a full system simulator for CPU/GPU heterogeneous platforms.
In the first part of this thesis, we propose a system call-level simulation technique in which a full system simulator interacts with a GPU board at the system call level. Since the shared on-chip memory in the target system is modeled by two separate memories, one in the simulator and one on the board, memory synchronization is the most challenging problem in the proposed technique. To handle this problem, address translation tables are maintained for the shared memory regions, and these regions are synchronized whenever a system call that triggers GPU execution on the board is invoked. To model GPU execution in the simulator, an interrupt-based modeling technique is proposed, in which the GPU interrupt is generated in consideration of the GPU execution time measured on the real board.
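The following is a minimal Python sketch of the system call-level idea described above, not the thesis implementation. All names (`SharedRegion`, `GilSimulator`, `gpu_syscall`) are hypothetical, the board is stood in for by a second `bytearray`, and copying results back immediately after the call is a simplifying assumption. It shows an address translation table for shared regions, synchronization around a GPU-triggering system call, and a GPU interrupt scheduled at the current simulated time plus the measured execution time.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class SharedRegion:
    sim_addr: int    # base address in the simulator's memory
    board_addr: int  # base address of the mirrored buffer on the board
    size: int        # region length in bytes


@dataclass
class GilSimulator:
    sim_mem: bytearray
    board_mem: bytearray                        # stand-in for the real board's memory
    table: list = field(default_factory=list)   # address translation table
    now_ns: int = 0                             # simulated time
    events: list = field(default_factory=list)  # heap of (fire_time_ns, name)

    def register_region(self, sim_addr, board_addr, size):
        self.table.append(SharedRegion(sim_addr, board_addr, size))

    def translate(self, sim_addr):
        """Map a simulator address inside a shared region to a board address."""
        for r in self.table:
            if r.sim_addr <= sim_addr < r.sim_addr + r.size:
                return r.board_addr + (sim_addr - r.sim_addr)
        raise KeyError(f"address {sim_addr:#x} is not in a shared region")

    def _sync(self, to_board):
        # Copy every registered shared region in the requested direction.
        for r in self.table:
            s = slice(r.sim_addr, r.sim_addr + r.size)
            b = slice(r.board_addr, r.board_addr + r.size)
            if to_board:
                self.board_mem[b] = self.sim_mem[s]
            else:
                self.sim_mem[s] = self.board_mem[b]

    def gpu_syscall(self, run_on_board):
        """Handle a system call that triggers GPU execution on the board."""
        self._sync(to_board=True)
        measured_ns = run_on_board(self.board_mem)  # returns measured GPU time
        self._sync(to_board=False)                  # simplification: copy back now
        # Interrupt-based timing: the GPU interrupt fires after the measured time.
        heapq.heappush(self.events, (self.now_ns + measured_ns, "gpu_irq"))

    def advance_to_next_event(self):
        self.now_ns, name = heapq.heappop(self.events)
        return name
```

In this sketch the simulated CPU keeps running after `gpu_syscall` returns and only observes completion when `advance_to_next_event` delivers `"gpu_irq"`, which is one way to let the measured board time appear in simulated time.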
In the second part of this thesis, we propose an API-level simulation technique in which the simulator and the board interact with each other at the API level. Since simulating the device driver in the original software stack makes it difficult to support various GPUs, a synthetic library is defined that replaces the GPU library in the original software stack, ensuring that the original device driver is not used. To model the timing of API execution in the simulator, a sleep function is called in a synthetic driver so that the API time measured on the board elapses in simulated time.
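A rough Python sketch of the API-level idea, under stated assumptions: `SimClock`, `SyntheticDriver`, `SyntheticGpuLibrary`, and `FakeBoard` are hypothetical names, and the board is replaced by a table of pre-measured API times so the example is deterministic. It only illustrates the flow in which a synthetic library forwards an API call to the board and a synthetic driver then sleeps for the measured time.

```python
class SimClock:
    """Simulated time, advanced only by explicit sleeps in this sketch."""

    def __init__(self):
        self.now_ns = 0

    def sleep(self, ns):
        self.now_ns += ns


class SyntheticDriver:
    """Stand-in for the synthetic driver: its only job is to burn simulated time."""

    def __init__(self, clock):
        self.clock = clock

    def wait(self, measured_ns):
        self.clock.sleep(measured_ns)


class FakeBoard:
    """Stand-in for the real GPU board; returns a result and a measured time."""

    def __init__(self, timings_ns):
        self.timings_ns = timings_ns  # assumed per-API measurements

    def execute(self, api_name, *args):
        return f"{api_name}:ok", self.timings_ns[api_name]


class SyntheticGpuLibrary:
    """Replaces the GPU library: forwards calls to the board, then models timing."""

    def __init__(self, board, driver):
        self.board = board
        self.driver = driver

    def call(self, api_name, *args):
        result, measured_ns = self.board.execute(api_name, *args)
        self.driver.wait(measured_ns)  # measured API time elapses in the simulator
        return result
```

Because the application in the simulator links against the synthetic library, supporting another GPU in this sketch would mean swapping the board stand-in rather than simulating a vendor driver.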
Among the existing GPU APIs, we propose API-level simulation techniques for three commonly used ones: OpenCL, CUDA, and OpenGL ES. Several challenging problems, such as asynchronous behavior, multi-process support, and memory synchronization for complex data structures, are handled to ensure correct simulation.
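The abstract does not say how asynchronous behavior is handled. As an illustration of why it complicates timing, the Python sketch below assumes a single in-order command queue: a non-blocking enqueue returns without advancing simulated CPU time, and only a blocking call waits for outstanding GPU work. The class `AsyncQueueModel` and its methods are hypothetical and are not taken from the thesis.

```python
class AsyncQueueModel:
    """Timing model for one in-order GPU command queue (illustrative assumption)."""

    def __init__(self):
        self.now_ns = 0             # simulated CPU time
        self.gpu_busy_until_ns = 0  # when the last queued command completes

    def cpu_work(self, ns):
        # CPU-side work overlaps with any GPU work still in flight.
        self.now_ns += ns

    def enqueue(self, measured_ns):
        """Non-blocking call: queue a command and return its completion time."""
        start = max(self.now_ns, self.gpu_busy_until_ns)
        self.gpu_busy_until_ns = start + measured_ns
        return self.gpu_busy_until_ns

    def finish(self):
        """Blocking call: simulated time jumps to the end of queued GPU work."""
        self.now_ns = max(self.now_ns, self.gpu_busy_until_ns)
```

With this model, sleeping for the full measured time at every call would overestimate total time whenever CPU work overlaps GPU execution, which is why non-blocking and blocking calls are treated differently here.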
The experimental results confirm that the proposed technique can provide fast simulation speed with reasonable timing accuracy. It can therefore be used not only for software development but also for system-level performance estimation. Moreover, the proposed technique makes full system simulation of CPU/GPU heterogeneous platforms feasible even if a GPU simulator is not available.

Language: English

URI: https://hdl.handle.net/10371/119172

Files in This Item:

000000132916.pdf 1.99 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
