Deep Neural Network Training Accelerator Architecture Design: Acceleration of Backward Propagation using Sparsity of Neurons

이건희

Deep Neural Network Training Accelerator Architecture Design: Acceleration of Backward Propagation using Sparsity of Neurons : 심층신경망 학습 가속기 구조 설계: 뉴런의 성김을 이용한 역전파 가속

Authors: 이건희

Advisor: 이혁재

Issue Date: 2020

Publisher: 서울대학교 대학원

Keywords: Deep neural network training ; sparsity of neurons ; selective gradient computation ; 심층신경망 학습 ; 뉴론의 성김 ; 선택적 그래디언트 계산

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. 이혁재.

Abstract: Deep neural networks have become one of the most important technologies in the fields of computer science that try to emulate human senses. In some of these fields, deep neural networks already outperform human senses. Since general-purpose GPUs were shown to speed up deep neural networks, the GPU has become the main device for running them. As deep neural networks grow more complex, they require more and more computing resources. However, general-purpose GPUs consume a lot of energy, so the need for hardware specialized for deep neural networks is rising. To date, such specialized hardware has focused on inference. With complicated network models, training consumes enormous time and energy on conventional devices, so there is an increasing need for hardware specialized for DNN training.
The dissertation explores deep neural network training accelerator architectures. The training process of a deep neural network (DNN) consists of three phases: forward propagation, backward propagation, and weight update. Among these, backward propagation, which calculates the gradients of activations, is the most time-consuming phase. The dissertation proposes hardware architectures that accelerate DNN training, focusing on the backward propagation phase. It exploits the sparsity of neurons induced by ReLU or dropout layers to accelerate backward propagation.
The first part of the dissertation proposes a hardware architecture to accelerate DNN backward propagation for convolutional layers. We assume the rectified linear unit (ReLU), the most widely used activation function. Since both the output and the derivative of ReLU are zero for negative inputs, the gradient with respect to such an input is also zero. Thus, the gradient of an input activation need not be calculated when the activation value is zero. Based on this observation, we design an efficient DNN accelerator that skips the gradient computations for zero activations. We show the effectiveness of the approach through experiments with our accelerator design.
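The zero-skipping idea can be illustrated in software. The following is a minimal sketch, not the dissertation's hardware design: it assumes a 1-D "valid" convolution in pure Python, and the function name `conv1d_input_grad` and the multiply-accumulate counter are hypothetical, introduced only for illustration. Positions whose post-ReLU activation is zero are skipped, because their gradient is known to be zero without computing it.

```python
def conv1d_input_grad(x, w, grad_y, skip_zeros=True):
    """Gradient w.r.t. the pre-ReLU input of a 'valid' 1-D convolution.

    x      : post-ReLU activations fed to the convolution (all >= 0)
    w      : filter taps, with y[o] = sum_j x[o + j] * w[j]
    grad_y : upstream gradient, len(x) - len(w) + 1 entries
    Returns (gradient list, number of multiply-accumulates performed).
    """
    n, k = len(x), len(w)
    assert len(grad_y) == n - k + 1
    grad = [0.0] * n
    macs = 0
    for i in range(n):
        if skip_zeros and x[i] == 0:
            continue  # ReLU derivative is 0 here, so the gradient is 0
        acc = 0.0
        for j in range(k):
            o = i - j  # output position that combined x[i] with tap w[j]
            if 0 <= o < len(grad_y):
                acc += grad_y[o] * w[j]
                macs += 1
        # Apply the ReLU derivative (1 for positive activations, else 0).
        grad[i] = acc if x[i] > 0 else 0.0
    return grad, macs


# Illustrative values only: half of the activations are zero.
x = [0.0, 1.5, 0.0, 0.0, 2.0, 0.5]
w = [1.0, -2.0, 0.5]
grad_y = [0.1, -0.2, 0.3, 0.4]

dense_grad, dense_macs = conv1d_input_grad(x, w, grad_y, skip_zeros=False)
sparse_grad, sparse_macs = conv1d_input_grad(x, w, grad_y, skip_zeros=True)
print(dense_grad == sparse_grad, dense_macs, sparse_macs)
```

Both calls return the same gradient, while the skipping variant performs fewer multiply-accumulates; the saving grows with the fraction of zero activations.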
The second part of the dissertation proposes a hardware architecture for fully connected layers. Similar to a ReLU layer, a dropout layer yields an explicitly zero gradient for each dropped activation, known without any gradient computation. Dropout is a regularization technique that mitigates overfitting. During DNN training, dropout disconnects connections between neurons. Since the error is not propagated through the disconnected connections, zero gradients can be detected before computation. Making use of this characteristic, the dissertation proposes hardware that accelerates the backward propagation of fully connected layers. Further, the dissertation shows the effectiveness of the approach through simulation.
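The same principle for dropout can be sketched as follows. This is an illustrative sketch under stated assumptions, not the proposed hardware: it assumes inverted dropout (kept activations are scaled by 1/keep_prob), and the names `fc_input_grad`, `mask`, and `keep_prob` are hypothetical. Because the dropout mask is known before backward propagation starts, the dot product for every dropped neuron can be skipped.

```python
def fc_input_grad(W, grad_y, mask, keep_prob):
    """Gradient w.r.t. the pre-dropout input of a fully connected layer.

    W         : weights as a list of rows, W[o][i], with y = W @ (mask * h / keep_prob)
    grad_y    : upstream gradient, one entry per output neuron
    mask      : dropout mask, 1 for kept inputs and 0 for dropped inputs
    keep_prob : probability of keeping an input (inverted-dropout scaling, assumed)
    Returns (gradient list, number of multiply-accumulates performed).
    """
    n_out, n_in = len(W), len(W[0])
    grad = [0.0] * n_in
    macs = 0
    for i in range(n_in):
        if not mask[i]:
            continue  # dropped neuron: no error flows back, gradient is 0
        acc = 0.0
        for o in range(n_out):
            acc += W[o][i] * grad_y[o]
            macs += 1
        grad[i] = acc / keep_prob
    return grad, macs


# Illustrative values only: inputs 1 and 3 are dropped.
W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
grad_y = [1.0, -1.0]
mask = [1, 0, 1, 0]

fc_grad, fc_macs = fc_input_grad(W, grad_y, mask, keep_prob=0.5)
print(fc_grad, fc_macs)
```

With half of the inputs dropped, only half of the columns of `W` are read and multiplied, which is the kind of saving the fully connected accelerator targets.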
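Korean abstract (translated): Deep neural networks have become the most important technology in those fields of computer science that pursue human senses. In some fields, they have already surpassed human senses. Since acceleration of deep neural networks with GPGPUs became possible, the GPU has been used as the main device for deep neural networks. As the complexity of deep neural networks rises, their computation demands more computing resources. However, because GPGPUs consume a lot of energy, demand is growing for efficient hardware dedicated to deep neural networks. So far, such dedicated hardware has mainly concentrated on inference. Complex deep neural network models take a long time and a lot of energy to train. Accordingly, demand for hardware dedicated to deep neural network training is increasing.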
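This dissertation explores architectures for deep neural network training accelerators. Training a deep neural network consists of three phases: forward propagation, backward propagation, and weight update. Of these, the backward propagation phase, which computes the gradients of activations, takes the longest. The dissertation proposes hardware architectures that accelerate deep neural network training with an emphasis on the backward propagation phase. It accelerates backward propagation by exploiting the sparsity of neurons produced by ReLU or dropout layers.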
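The first part of the dissertation presents deep neural network training hardware that accelerates the backward propagation of convolutional neural networks. It assumes networks that use ReLU, the most widely used activation function. For negative inputs the derivative of the ReLU activation function is zero, so the gradient of the corresponding activation is also zero. In this case the gradient is known to be zero without computing it, so the computation can be omitted. Using this property, efficient deep neural network acceleration hardware was designed that can skip gradient computations for zero-valued activations. The efficiency of the hardware was also verified through experiments.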
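The second part of the dissertation proposes a hardware architecture that accelerates the training of fully connected networks. As with the ReLU layer, the result for a dropout layer is known to be zero without computing the gradient. Dropout is one of the regularization techniques that address overfitting in deep neural networks; it randomly disconnects connections in the network only during training. Because no error propagates through a disconnected path in the backward propagation phase, the corresponding gradient is known in advance to be zero. Using this property, hardware that can accelerate the backward propagation of fully connected networks was designed. The efficiency of the hardware was also verified through simulation.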

Language: eng

URI: https://hdl.handle.net/10371/169247

http://dcollection.snu.ac.kr/common/orgView/000000163225

Files in This Item:

000000163225.pdf 4.64 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
