A Host-Accelerator Communication Architecture Design for Efficient Binary Acceleration

김양수

이진 가속을 효율적으로 하기 위한 호스트-가속기 통신 아키텍처 설계 : A Host-Accelerator Communication Architecture Design for Efficient Binary Acceleration

Authors: 김양수

Advisor: 최기영

Major: Electrical Engineering and Computer Science (전기·컴퓨터공학부)

Issue Date: 2012-02

Publisher: Seoul National University Graduate School

Description: Thesis (Master's) -- Seoul National University Graduate School : Electrical Engineering and Computer Science, 2012. 2. 최기영.

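Abstract: When part of a binary is executed on an accelerator in a typical hardware architecture, data duplication can cause problems. The problem is not only the overhead of copying the required data between main memory and the accelerator's internal memory. Because the same data exists redundantly in several memories, its uniqueness is broken, and the final computation may produce unintended values. If accelerator code translated from host processor code produces results that differ from those of the host processor, the correctness of the whole system can be broken. This problem must be solved for binary acceleration.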

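Configurable Range Memory (CRM) is a memory region shared by the host processor and the accelerator. A portion of the processor's memory address space can be configured in the CRM, and the data in main memory is loaded into and stored in the configured region. CRM was proposed to improve overall system performance by reducing the overhead of memory copies between main memory and the accelerator's internal memory. In addition, because CRM keeps only one memory location for any given memory address, its structure inherently eliminates the problems caused by duplicated memory, so it can be adopted directly for binary acceleration. However, because of the constraints imposed by binary acceleration, using the CRM structure unchanged can incur unnecessary overhead between the CRM and the accelerator and degrade overall system performance.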

This thesis discusses a new CRM architecture for binary acceleration. Like the original CRM architecture, it solves the memory duplication problems that can arise in binary acceleration, and it also pursues the performance improvement that CRM offers. The behavior of the new architecture was verified using Verilog HDL, and the architecture was applied to the FloRA Coarse-Grained Reconfigurable Array (CGRA) architecture and simulated at the transaction level to measure overall system performance. The experiments confirmed a performance improvement of 1.2x to 1.9x, depending on the application, when the proposed architecture is used.

Binary acceleration of a kernel on an accelerator may suffer from a data duplication problem. Data in some address range may be copied into the local memory of the accelerator, incurring data copy overhead as well as a coherence problem. This memory problem must be solved before a kernel binary can be accelerated on the accelerator, since it may cause correctness issues.
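
The duplication problem can be illustrated with a toy model. The following Python sketch is not taken from the thesis: the dictionaries standing in for main memory and the accelerator's local memory, the addresses, and the `copy_in` helper are all hypothetical. It only shows how a copied address range can go stale once the host writes to the original.

```python
# Hypothetical model: main memory and an accelerator-local copy as plain dicts.
main_memory = {0x1000: 1, 0x1004: 2}


def copy_in(main, start, end):
    """Copy the address range [start, end) into a separate local memory."""
    return {addr: val for addr, val in main.items() if start <= addr < end}


# The copy itself is the data copy overhead described in the abstract.
local_memory = copy_in(main_memory, 0x1000, 0x1008)

# The host updates its data after the accelerator has taken its copy.
main_memory[0x1000] = 10

# The same computation now gives different answers on the two sides,
# because two copies of address 0x1000 exist and only one was updated.
host_sum = sum(main_memory.values())
accel_sum = sum(local_memory.values())
```

In this model the host computes 12 while the accelerator computes 3, which is the kind of divergence the abstract calls a correctness issue.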

Configurable Range Memory (CRM) is a memory shared by the host processor and the accelerator. Its address range can be configured so that the data within that range is loaded directly into it. This memory was introduced to remove the data copy overhead between the main memory and the local memory of the accelerator. By adopting CRM in our binary acceleration, we can avoid the data duplication problem in the system. However, the memory must be designed carefully, taking into account the memory access patterns of the accelerator, so that it does not incur unnecessary overhead.
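
As a rough illustration of the idea of a single, range-configured shared memory, the Python sketch below routes every access whose address falls inside a configured range to one shared store. It is a behavioral toy under stated assumptions, not the thesis's design: the class name `SharedSystem`, its methods, and the write-back on reconfiguration are all hypothetical choices made for the example.

```python
class SharedSystem:
    """Toy model: main memory plus one range-configurable shared memory."""

    def __init__(self, main_memory):
        self.main = dict(main_memory)
        self.crm = {}          # storage for the configured range
        self.start = 0
        self.end = 0           # empty range until configure() is called

    def configure(self, start, end):
        # Assumption: the old range is written back before a new one is loaded.
        self.main.update(self.crm)
        self.start, self.end = start, end
        self.crm = {a: v for a, v in self.main.items() if start <= a < end}

    def _in_range(self, addr):
        return self.start <= addr < self.end

    def read(self, addr):
        # Host and accelerator both use this path, so each address has one owner.
        return self.crm[addr] if self._in_range(addr) else self.main[addr]

    def write(self, addr, value):
        if self._in_range(addr):
            self.crm[addr] = value
        else:
            self.main[addr] = value


system = SharedSystem({0x1000: 1, 0x1004: 2, 0x2000: 7})
system.configure(0x1000, 0x1008)

system.write(0x1000, 10)                                # host-side store
accel_sum = system.read(0x1000) + system.read(0x1004)   # accelerator-side loads
```

Because both sides go through the same `read` and `write`, the accelerator-side sum reflects the host's store. The model says nothing about timing or access patterns, which is where the abstract locates the remaining design effort.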

This work presents a new CRM architecture and shows how it improves system performance with a novel Coarse-Grained Reconfigurable Array (CGRA) architecture. We verified its functionality with an RTL model written in Verilog HDL. In addition, we used transaction-level simulation to measure the overall performance of the system. Experimental results show that the proposed architecture improves performance by a factor of 1.2 to 1.9, depending on the application.

Language: kor

URI: https://hdl.handle.net/10371/155565

http://dcollection.snu.ac.kr/jsp/common/DcLoOrgPer.jsp?sItemId=000000000488

Files in This Item: There are no files associated with this item.

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)
