Adaptive Kernel Fusion for Accelerating Memory-Intensive Operations of Convolutional Neural Network Training

황승환

Title: Adaptive Kernel Fusion for Accelerating Memory-Intensive Operations of Convolutional Neural Network Training (original Korean title: 컨볼루션 신경망 학습의 메모리 집약적 연산 가속을 위한 적응형 커널 퓨전)

Authors: 황승환

Advisor: 안정호

Issue Date: 2023

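Publisher: Graduate School, Seoul National University

Keywords: convolutional neural network ; GPU ; batch normalization ; kernel fusion

Description: Thesis (Master's) -- Graduate School, Seoul National University : Graduate School of Convergence Science and Technology, Dept. of Intelligence and Information, 2023. 8. Advisor: 안정호.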

Abstract: Batch normalization improves the accuracy of machine learning models by normalizing the data distribution of feature maps to a Gaussian distribution. However, it requires a large amount of memory access relative to its computational workload, so running it on a GPU leaves the GPU's parallel compute resources under-utilized. To address this, the Batch Normalization Fission-n-Fusion (BNFF) algorithm was proposed: it splits the batch normalization operation into two parts and fuses them into the preceding and following operations. BNFF reduces the memory accesses of batch normalization, but on modern GPU hardware it increases execution time.
In this thesis, we propose Adaptive Kernel Fusion, which changes the fusion mechanism of batch normalization according to the per-layer parameters of the model, in order to accelerate batch normalization effectively on modern GPU architectures. We first analyze how the computational efficiency of batch normalization on modern GPU architectures varies with the feature map size and the fusion mechanism, and determine the optimal fusion mechanism for each feature map size. Based on this analysis, we propose an algorithm that applies the optimal fusion mechanism to each layer of a CNN model. A ResNet model with Adaptive Kernel Fusion trains up to 1.69× faster than a ResNet model using the BNFF algorithm and up to 1.24× faster than a ResNet model implemented in PyTorch.
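
The fission-and-fusion idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the thesis's GPU implementation. It assumes that the two parts of batch normalization are (1) computing the per-channel statistics and (2) applying the normalization, that the first part is folded into the producer of the feature map, and that the second part is folded into the consumer together with a ReLU. The function names, the `EPS` constant, and the omission of the learnable scale and shift are all choices made for this illustration only.

```python
import math

EPS = 1e-5  # assumed numerical stabilizer; value chosen for illustration


def bn_relu_unfused(xs):
    """Reference path: separate passes for mean, variance, normalize, ReLU."""
    n = len(xs)
    mean = sum(xs) / n                                   # pass 1 over the data
    var = sum((x - mean) ** 2 for x in xs) / n           # pass 2 (biased variance)
    normed = [(x - mean) / math.sqrt(var + EPS) for x in xs]  # pass 3
    return [max(0.0, y) for y in normed]                 # pass 4 (ReLU)


def producer_with_stats(xs):
    """Stand-in for the preceding layer: while emitting its outputs it also
    accumulates the sum and sum of squares, so no extra pass is needed."""
    total = 0.0
    total_sq = 0.0
    out = []
    for x in xs:
        out.append(x)          # the value the layer would write anyway
        total += x
        total_sq += x * x
    n = len(out)
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # E[x^2] - E[x]^2, clamped at 0
    return out, mean, var


def consumer_with_normalize(xs, mean, var):
    """Stand-in for the following layer: normalizes each value as it is read
    and applies ReLU in the same pass."""
    inv_std = 1.0 / math.sqrt(var + EPS)
    return [max(0.0, (x - mean) * inv_std) for x in xs]


if __name__ == "__main__":
    channel = [1.0, 2.0, 3.0, 4.0]  # one channel of a feature map, flattened
    out, mean, var = producer_with_stats(channel)
    print(bn_relu_unfused(channel))
    print(consumer_with_normalize(out, mean, var))
```

Both paths produce the same values up to floating-point rounding, but the split version reads the feature map in two passes instead of four. Whether that saving translates into shorter run time on a particular GPU depends on the layer, which is the question the adaptive per-layer selection in the thesis addresses.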

Language: kor

URI: https://hdl.handle.net/10371/197067

https://dcollection.snu.ac.kr/common/orgView/000000178156

Files in This Item:

000000178156.pdf 9.40 MB

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Intelligence and Information (지능정보융합학과)
  - Theses (Master's Degree_지능정보융합학과)
