Improving Energy Efficiency of Neural Networks

Abstract: Deep learning with neural networks has shown remarkable performance in many applications. However, this success comes at the cost of tremendous energy consumption, which is one of the major obstacles to deploying deep learning models on mobile devices. To address this issue, many researchers have studied methods for improving the energy efficiency of neural networks. This dissertation is in line with those studies and presents three main approaches: 1) energy-efficient deep neural networks (DNNs), 2) an application-specific accelerator, and 3) a neuromorphic approach.

The first approach of this dissertation is energy-efficient DNNs through quantization. Quantization is a widely used method for reducing the energy consumption of neural networks. Many related studies have reported accuracy comparable to that of floating-point representations (32 or 64 bits) while using fewer bits. However, most of them have focused on particular models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). To overcome this limitation, we propose a quantization method for memory-augmented neural networks (MANNs), which were introduced to increase model capacity and to solve the long-term dependency problem of conventional RNNs. The in-depth analysis presented here reveals several challenges that do not appear in the quantization of other DNNs. Unless they are addressed properly, quantized MANNs suffer from excessive quantization error, which degrades performance. In this study, we identify memory addressing (specifically, content-based addressing) as the main cause of the performance degradation and propose a robust quantization method for MANNs to address it. In our experiments, we achieved a computation-energy gain of 22× with 8-bit fixed-point and binary quantization compared with the floating-point implementation. Measured on the bAbI dataset, the resulting model, named the quantized MANN (Q-MANN), reduced the error rate by 46% and 30% with 8-bit fixed-point and binary quantization, respectively, compared with a MANN quantized using conventional techniques.
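To make the addressing problem concrete, the following is a minimal sketch, not the Q-MANN method itself. It assumes a signed 8-bit fixed-point format with 4 fractional bits and a dot-product similarity followed by a softmax; the function names, the bit split, and the toy memory contents are all illustrative assumptions. It shows one way quantization can hurt content-based addressing: when similarity scores saturate at the largest representable value, the attention weights over memory rows can no longer tell the rows apart.

```python
import math

def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    """Round x to a signed fixed-point grid and saturate at the range limits.

    The 8-bit / 4-fractional-bit split is an assumption made for illustration.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, round(x * scale)))
    return q / scale

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def content_addressing(key, memory, quantizer=None):
    """Attention weights over memory rows from dot-product similarity with key.

    If a quantizer is given, the key, the memory, and each similarity score
    are quantized before the softmax.
    """
    q = quantizer if quantizer is not None else (lambda v: v)
    key_q = [q(k) for k in key]
    scores = []
    for row in memory:
        row_q = [q(m) for m in row]
        score = sum(a * b for a, b in zip(key_q, row_q))
        scores.append(q(score))
    return softmax(scores)

# Toy example (hypothetical values): two memory rows with different similarity.
key = [2.0, 3.0, 1.5]
memory = [[2.0, 3.0, 1.5], [3.0, 3.0, 3.0]]

float_weights = content_addressing(key, memory)
fixed_weights = content_addressing(key, memory, quantize_fixed_point)
print("floating point:", float_weights)
print("8-bit fixed point:", fixed_weights)
```

In this toy case both similarity scores exceed the representable range, so the fixed-point weights become uniform while the floating-point weights clearly prefer the second row. A similarity measure with a bounded output would avoid this kind of saturation.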

The second approach is an application-specific accelerator on field-programmable gate arrays (FPGAs). MANNs have shown promising results in question answering (QA) tasks, which require retaining context in order to answer a given question. As demand for QA on edge devices has increased, running MANNs in resource-constrained environments has become important. Application-specific hardware accelerators on FPGAs offer a way to achieve fast and energy-efficient MANN inference. Although several accelerators for conventional deep neural networks have been designed, they are difficult to use efficiently with MANNs because the requirements differ. In addition, the characteristics of QA tasks should be taken into account to further improve inference efficiency on the accelerator. To address these issues, we propose an inference accelerator for MANNs on an FPGA. To fully utilize the proposed accelerator, we introduce fast inference methods that exploit the features of QA tasks. To evaluate the approach, we implemented the proposed architecture on an FPGA and measured execution time and energy consumption on the bAbI dataset. In our experiments, the proposed methods improved the speed and energy efficiency of MANN inference by up to about 224 and 251 times, respectively, compared with a CPU.

The third approach is a neuromorphic one based on spiking neural networks (SNNs), which have gained much attention as the third generation of artificial neural networks, largely because of their energy-efficient characteristics. However, a lack of scalable training algorithms has restricted their applicability to practical machine learning problems. DNN-to-SNN conversion approaches have been widely studied to broaden the applicability of SNNs. Most previous studies, however, have not fully utilized the spatio-temporal aspects of SNNs, which has led to inefficiency in terms of the number of spikes and inference latency. In this work, we introduce two methods for energy-efficient information transmission between neurons in deep SNNs: 1) burst coding and 2) the T2FSNN model. Burst coding, which this study introduces to deep SNNs for the first time, exploits burst spikes, which are groups of spikes with short inter-spike intervals. Building on burst coding, we propose a hybrid neural coding scheme for deep SNNs to further improve energy efficiency. Moreover, we present T2FSNN, which brings the concept of time-to-first-spike coding into deep SNNs using a kernel-based dynamic threshold and dendrite so that the temporal information in spike trains is used thoroughly. In addition, we propose gradient-based optimization and early firing methods to further increase the efficiency of T2FSNN. Our experimental results show that the proposed methods can improve the energy efficiency of inference in deep SNNs.
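The motivation for time-to-first-spike coding can be illustrated with a small sketch that compares how many spikes two coding schemes need to carry the same value. This is not the T2FSNN model: the integrate-and-fire neuron with reset by subtraction, the linear latency mapping, the 100-step window, and all function names are assumptions chosen for simplicity, and burst coding is not modelled here.

```python
def rate_code(x, num_steps=100, threshold=1.0):
    """Integrate-and-fire neuron driven by a constant input x.

    Returns the list of time steps at which the neuron fires. The value is
    carried by the spike count, so larger inputs cost more spikes.
    """
    v = 0.0
    spikes = []
    for t in range(num_steps):
        v += x
        if v >= threshold:
            spikes.append(t)
            v -= threshold  # reset by subtraction keeps the residual charge
    return spikes

def ttfs_encode(x, num_steps=100):
    """Encode x in [0, 1] as the time of a single spike (earlier = larger).

    Returns None when x is not positive, meaning the neuron never fires.
    The linear latency mapping is an illustrative assumption.
    """
    if x <= 0.0:
        return None
    return round((1.0 - min(x, 1.0)) * (num_steps - 1))

def ttfs_decode(t, num_steps=100):
    """Invert ttfs_encode up to the rounding error of the time grid."""
    if t is None:
        return 0.0
    return 1.0 - t / (num_steps - 1)

x = 0.25
rate_spikes = rate_code(x)
first_spike = ttfs_encode(x)
print("rate coding spikes:", len(rate_spikes))
print("TTFS spike time:", first_spike, "decoded:", ttfs_decode(first_spike))
```

Under these assumptions, rate coding spends a number of spikes proportional to the value, whereas time-to-first-spike coding spends at most one spike per neuron and places the information in the spike timing.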
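
Deep learning with artificial neural networks has shown outstanding performance in many fields. However, this success rests on an enormous amount of energy consumption, which is a major obstacle to using deep learning models in resource-constrained environments such as mobile devices. To solve this problem, research on improving the energy efficiency of neural networks has been widely conducted. In line with these studies, this dissertation covers energy-efficient deep learning models, an application-specific hardware accelerator, and a neuromorphic approach.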

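The first approach builds energy-efficient neural network models through quantization. Quantization is one of the most widely used methods for reducing the energy consumption of neural networks. Many related studies have shown accuracy comparable to that of floating-point neural networks while using fewer bits. However, most of them focus on particular networks, such as convolutional neural networks and recurrent neural networks. To overcome this limitation, we propose a method for applying quantization to memory-augmented neural networks, which were proposed to solve the long-term dependency problem of conventional recurrent neural networks. Through thorough analysis, we uncovered several quantization problems that do not appear in other neural networks. Unless these problems are resolved, quantized memory-augmented neural networks generally suffer a large drop in performance due to quantization error. In this study, we found that the memory addressing scheme is the main cause of the performance degradation in quantized memory-augmented neural networks. Based on this analysis, we propose a memory-augmented neural network that is robust to quantization error. Compared with a floating-point memory-augmented neural network, applying 8-bit fixed-point and binary quantization achieved a computation-energy gain of about 22×.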

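The second approach is an application-specific hardware accelerator. A dedicated hardware accelerator can be used to improve the energy efficiency of inference in memory-augmented neural networks. Several hardware accelerators already exist for conventional neural networks, but they do not fit the requirements of accelerating memory-augmented neural networks and have therefore been hard to use efficiently. In addition, maximizing the efficiency of an application-specific accelerator requires taking the characteristics of the target application into account. To solve these problems, we implemented a memory-augmented neural network for question answering on a field-programmable gate array. We also propose a fast inference method that considers the characteristics of question answering in order to make full use of the proposed accelerator. In our evaluation, the proposed accelerator and method achieved an energy improvement of about 251× over a central processing unit.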

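The last approach is a neuromorphic one that uses spiking neural networks. Spiking neural networks are attracting interest for their high energy efficiency. However, the absence of algorithms that can train deep spiking neural networks efficiently has held back their adoption. Converting deep neural networks into spiking neural networks is widely used as a way around this lack of training algorithms. Existing studies, however, do not make sufficient use of the temporal information present in spike trains and thus fail to raise the efficiency of deep spiking neural networks. In this work, we propose burst coding and T2FSNN to realize energy-efficient deep spiking neural networks. Burst coding transmits information quickly by using burst spikes, which consist of spikes separated by short intervals. Together with burst coding, we propose a hybrid neural coding scheme that broadens the usefulness of burst coding. We also propose T2FSNN to make full use of the temporal information present in spike trains, and we optimize its performance with a gradient-based optimization method and an early firing method. Applying the proposed neural coding schemes and methods to deep spiking neural networks substantially improved energy efficiency over existing approaches.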

Language: eng

URI: https://hdl.handle.net/10371/169277

http://dcollection.snu.ac.kr/common/orgView/000000162600


Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
