Data Reuse Optimization for FPGA Implementation of the Inception V4 Network

Abstract: Deep convolutional neural networks (DCNNs) have recently achieved high performance in computer vision, and accelerating image inference on FPGAs is an active area of research. Because of their depth, DCNNs involve large numbers of weight parameters and generate large amounts of intermediate feature map data. Inference on an FPGA accelerator therefore requires many off-chip memory accesses, which become a bottleneck for both inference speed and energy efficiency.
To address this problem, data reuse methods have been proposed that reuse data on-chip as much as possible once it has been fetched from off-chip memory. However, existing data reuse methods do not achieve optimal results on the Inception V4 network, which performs well on image classification.
This thesis proposes Mixed convolution, a data reuse method that takes the branch structure of the Inception V4 network into account to maximize on-chip data reuse. Mixed convolution combines Grouped convolution, which reuses the input feature map data of an Inception module, with Fused convolution, which reuses the intermediate feature map data generated within a branch, and so gains the advantages of both. As a result, for the feature map data generated in Inception modules, off-chip memory data transfer is reduced from 37MB to 12MB, a 66.4% reduction relative to the baseline, using 421KB of additional on-chip buffer memory. In addition, to optimize the on-chip buffer memory, a full weight reuse dataflow is applied to the Inception-C module; this further reduces off-chip memory data transfer to 11MB, a 68.6% reduction relative to the baseline, using 218KB of additional on-chip buffer memory.
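The difference between the reuse schemes named in the abstract can be illustrated with a rough traffic count. The sketch below is not the thesis's model: the function `offchip_traffic`, the toy module `TOY_INPUT_KB` / `TOY_BRANCHES`, and all sizes are hypothetical, and it counts feature map traffic only (no weights, no tiling overlap). It assumes that without input sharing every branch fetches the module input separately, and that without fusion every intermediate feature map is written off-chip and read back once.

```python
def offchip_traffic(input_kb, branches, share_input, fuse_branches):
    """Estimate off-chip feature-map traffic (KB) for one toy module.

    branches: list of branches; each branch is a list of per-layer
              output feature-map sizes in KB (hypothetical values).
    share_input:   True models Grouped-convolution-style input reuse.
    fuse_branches: True models Fused-convolution-style intermediate reuse.
    """
    # Module input: fetched once if shared by all branches, else once per branch.
    total = input_kb if share_input else input_kb * len(branches)
    for layer_outputs_kb in branches:
        *intermediates, final = layer_outputs_kb
        if not fuse_branches:
            # Each intermediate map is written off-chip, then read back.
            total += 2 * sum(intermediates)
        # The final output of a branch is always written off-chip.
        total += final
    return total


# Hypothetical toy module: a 100 KB input and three branches of depth 1, 2, 3.
TOY_INPUT_KB = 100
TOY_BRANCHES = [[40], [30, 50], [20, 20, 60]]


def compare_schemes():
    """Return estimated traffic (KB) for each scheme on the toy module."""
    return {
        "baseline": offchip_traffic(TOY_INPUT_KB, TOY_BRANCHES, False, False),
        "grouped": offchip_traffic(TOY_INPUT_KB, TOY_BRANCHES, True, False),
        "fused": offchip_traffic(TOY_INPUT_KB, TOY_BRANCHES, False, True),
        "mixed": offchip_traffic(TOY_INPUT_KB, TOY_BRANCHES, True, True),
    }


if __name__ == "__main__":
    for scheme, kb in compare_schemes().items():
        print(f"{scheme}: {kb} KB")
```

On this toy module the counts are 590 KB (baseline), 390 KB (grouped), 450 KB (fused), and 250 KB (mixed). The numbers have no relation to the 37MB, 12MB, and 11MB figures reported above; they only show why combining both kinds of reuse removes more traffic than either one alone.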

Language: kor

URI: https://hdl.handle.net/10371/175285

https://dcollection.snu.ac.kr/common/orgView/000000164300

Files in This Item:

000000164300.pdf 1.24 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)
