Improving Energy Efficiency of Neural Networks

Seongsik Park


Improving Energy Efficiency of Neural Networks : 인공신경망의 에너지 효율 개선

DC Field	Value	Language
dc.contributor.advisor	윤성로	-
dc.contributor.author	박성식	-
dc.date.accessioned	2020-10-13T02:51:44Z	-
dc.date.available	2020-10-13T02:51:44Z	-
dc.date.issued	2020	-
dc.identifier.other	000000162600	-
dc.identifier.uri	https://hdl.handle.net/10371/169277	-
dc.identifier.uri	http://dcollection.snu.ac.kr/common/orgView/000000162600	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. 윤성로.	-
dc.description.abstract	Deep learning with neural networks has shown remarkable performance in many applications. However, this success comes at the cost of enormous energy consumption, which is one of the major obstacles to deploying deep learning models on mobile devices. To address this issue, many researchers have studied methods for improving the energy efficiency of neural networks. This dissertation is in line with those studies and comprises three main approaches: 1) energy-efficient deep neural networks (DNNs), 2) an application-specific accelerator, and 3) a neuromorphic approach. The first approach is energy-efficient DNNs through quantization, a widely used method for reducing the energy consumption of neural networks. Many related studies have reported accuracy comparable to that of floating-point representations (32 or 64 bits) using fewer bits. However, most of them have focused on specific models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). To overcome this limitation, we propose a quantization method for memory-augmented neural networks (MANNs), which were introduced to increase model capacity and solve the long-term dependency problem of conventional RNNs. The in-depth analysis presented here reveals various challenges that do not appear in the quantization of other DNNs. Unless they are addressed properly, quantized MANNs suffer from excessive quantization error, which degrades performance. In this study, we identify memory addressing (specifically, content-based addressing) as the main cause of the performance degradation and propose a robust quantization method for MANNs to address it. In our experiments, we achieved a computation-energy gain of 22× with 8-bit fixed-point and binary quantization compared to the floating-point implementation. Measured on the bAbI dataset, the resulting model, named the quantized MANN (Q-MANN), improved the error rate by 46% and 30% with 8-bit fixed-point and binary quantization, respectively, compared to a MANN quantized using conventional techniques. The second approach is an application-specific accelerator on field-programmable gate arrays (FPGAs). MANNs have shown promising results in question answering (QA) tasks, which require holding context in order to answer a given question. As demand for QA on edge devices has increased, running MANNs in resource-constrained environments has become important. Application-specific hardware accelerators on FPGAs can provide fast and energy-efficient MANN inference. Although several accelerators for conventional deep neural networks have been designed, they are difficult to use efficiently with MANNs because the requirements differ. In addition, the characteristics of QA tasks should be considered to further improve inference efficiency on the accelerator. To address these issues, we propose an inference accelerator for MANNs on an FPGA. To fully utilize the proposed accelerator, we introduce fast inference methods that take the features of QA tasks into account. To evaluate our approach, we implemented the proposed architecture on an FPGA and measured the execution time and energy consumption on the bAbI dataset. In our experiments, the proposed methods improved the speed and energy efficiency of MANN inference by up to about 224 and 251 times, respectively, compared to a CPU. The final approach is a neuromorphic one based on spiking neural networks (SNNs), which have gained considerable attention as the third generation of artificial neural networks because of their energy-efficient characteristics. However, a lack of scalable training algorithms has restricted their applicability to practical machine learning problems. DNN-to-SNN conversion approaches have been widely studied to broaden the applicability of SNNs. Most previous studies, however, have not fully utilized the spatio-temporal aspects of SNNs, which has led to inefficiency in terms of the number of spikes and inference latency. In this work, we introduce two methods for energy-efficient information transmission between neurons in deep SNNs: 1) burst coding and 2) the T2FSNN model. Burst coding, which this study introduces to deep SNNs for the first time, exploits burst spikes, which are groups of spikes with short inter-spike intervals. Building on burst coding, we propose a hybrid neural coding scheme for deep SNNs to further improve energy efficiency. Moreover, we present T2FSNN, which brings time-to-first-spike coding into deep SNNs using a kernel-based dynamic threshold and dendrite in order to utilize the temporal information in spike trains thoroughly. In addition, we propose gradient-based optimization and early firing methods to further increase the efficiency of T2FSNN. Our experimental results showed that the proposed methods could improve the energy efficiency of inference in deep SNNs.	-
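dc.description.abstract	Deep learning using artificial neural networks has shown outstanding performance in many fields. However, this success rests on an enormous amount of energy consumption, which is a major obstacle to using deep learning models in resource-constrained environments such as mobile devices. To address this problem, research on improving the energy efficiency of neural networks has been widely conducted. In line with those studies, this dissertation covers energy-efficient deep learning models, an application-specific hardware accelerator, and a neuromorphic approach. The first method constructs energy-efficient neural network models through quantization. Quantization is one of the widely used methods for reducing the energy consumption of neural networks. Many related studies have shown accuracy comparable to that of floating-point neural networks with fewer bits. However, most of them have concentrated on particular networks such as convolutional neural networks and recurrent neural networks. To overcome this limitation, we propose a method for applying quantization to memory-augmented neural networks. Memory-augmented neural networks were proposed to solve the long-term dependency problem of conventional recurrent neural networks. Through a thorough analysis, we uncovered several quantization problems that do not appear in other neural networks. Unless these problems are addressed, quantized memory-augmented neural networks generally suffer a large performance drop due to quantization error. In this study, we found that the memory addressing scheme is the main cause of the performance degradation in quantized memory-augmented neural networks. Based on this analysis, we propose a memory-augmented neural network that is robust to quantization error. Compared with a floating-point memory-augmented neural network, we achieved a computation-energy gain of about 22× by applying 8-bit fixed-point and binary quantization. The second method is an application-specific hardware accelerator. A dedicated hardware accelerator can be used to improve the inference energy efficiency of memory-augmented neural networks. Although several hardware accelerators exist for conventional neural networks, they do not meet the requirements of a memory-augmented neural network accelerator and have therefore been difficult to use efficiently. In addition, maximizing the efficiency of an application-specific accelerator requires considering the characteristics of the target application. To address these issues, we implemented a memory-augmented neural network usable for question answering on a field-programmable gate array. We also propose fast inference methods that consider the characteristics of question answering in order to fully utilize the proposed accelerator. Evaluating the proposed accelerator and methods yielded an energy improvement of about 251× over a CPU. The last method is a neuromorphic approach using spiking neural networks. Spiking neural networks are attracting attention for their high energy efficiency. However, the lack of algorithms that can efficiently train deep spiking neural networks hinders their application. Converting deep neural networks into spiking neural networks is widely used to work around this lack of training algorithms. However, previous studies have not fully exploited the temporal information in spike trains and thus have not raised the efficiency of deep spiking neural networks. In this study, we propose burst coding and T2FSNN to realize energy-efficient deep spiking neural networks. Burst coding conveys information quickly by using burst spikes, which consist of spikes at short intervals. Together with burst coding, we propose a hybrid neural coding scheme that makes burst coding more widely usable. We also propose T2FSNN to fully exploit the temporal information in spike trains, and we optimize its performance with gradient-based optimization and early firing methods. Applying the proposed neural coding schemes and methods to deep spiking neural networks substantially improved energy efficiency compared with existing approaches.	-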
dc.description.tableofcontents	1 Introduction; 2 Background; 2.1 Deep Neural Networks; 2.1.1 Memory-Augmented Neural Networks; 2.2 Quantization on Deep Neural Networks; 2.2.1 Fixed-point and Binary Quantization; 2.2.2 Fixed-point and Binary Quantization on Deep Neural Networks; 2.3 Hardware Accelerators of Deep Neural Networks; 2.4 Spiking Neural Networks; 2.4.1 Neural Coding in Deep Spiking Neural Networks; 2.4.2 Training Methods for Deep Spiking Neural Networks; 3 Energy-efficient Deep Neural Networks; 3.1 Introduction; 3.2 In-Depth Analysis of the Quantization Problem on a MANN; 3.2.1 Fixed-point and binary quantization on a MANN; 3.2.2 Analysis of the effect of the quantization error on content-based addressing; 3.3 Quantized MANN; 3.3.1 Architecture; 3.3.2 Effective training techniques; 3.4 Experimental Results and Discussion; 3.5 Summary; 4 Application-specific Inference Accelerator; 4.1 Introduction; 4.2 Accelerator Architecture; 4.2.1 Overall Architecture; 4.2.2 Interface between the HOST and FPGA; 4.2.3 Embedding Module; 4.2.4 Memory Module; 4.2.5 Read and Output Modules; 4.3 Fast Inference Methods; 4.3.1 Performance Bottleneck of the Proposed Accelerator; 4.3.2 Inference Thresholding; 4.3.3 Efficient Index Ordering for Inference Thresholding; 4.4 Implementation; 4.5 Experimental Results; 4.6 Discussion; 4.6.1 The effect of HOST-FPGA interface on the accelerator and proposed inference methods on CPU and GPU; 4.6.2 The inference accelerator on FPGAs with constrained BRAM resources; 4.6.3 Comparison the inference thresholding with hashing-based approach; 4.7 Summary; 5 Neuromorphic Approach: Spiking Neural Networks; 5.1 Introduction; 5.2 Deep SNNs with Burst Coding; 5.2.1 Neural Model for Burst Spikes; 5.2.2 Hybrid Neural Coding Scheme; 5.3 Deep SNNs with TTFS Coding; 5.3.1 T2FSNN Model; 5.3.2 Gradient-Based Optimization Method; 5.3.3 Early Firing Method; 5.4 Experimental Results; 5.4.1 Experimental Results on Burst Coding and Hybrid Neural Coding; 5.4.2 Experimental Results on T2FSNN; 5.5 Discussion; 5.5.1 Burst Coding and Hybrid Neural Coding Scheme; 5.5.2 Burst Coding vs. Phase Coding; 5.5.3 T2FSNN; 5.6 Summary; 6 Conclusion; 6.1 Future Work; 6.1.1 Quantized Attention-Based Model; 6.1.2 Fast Inference Methods on Other Tasks; 6.1.3 Extension of DNN-to-SNN Conversion Methods; 6.1.4 Training Spiking Neural Networks with Temporal Coding; Bibliography; Abstract (In Korean)	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Deep Learning	-
dc.subject	Deep Neural Network	-
dc.subject	Energy Efficiency	-
dc.subject	Quantization	-
dc.subject	Application-specific Accelerator	-
dc.subject	Spiking Neural Network	-
dc.subject	딥 러닝	-
dc.subject	인공신경망	-
dc.subject	에너지 효율	-
dc.subject	양자화	-
dc.subject	전용 하드웨어 가속기	-
dc.subject	스파이킹 신경망	-
dc.subject.ddc	621.3	-
dc.title	Improving Energy Efficiency of Neural Networks	-
dc.title.alternative	인공신경망의 에너지 효율 개선	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Seongsik Park	-
dc.contributor.department	College of Engineering, Department of Electrical and Computer Engineering	-
dc.description.degree	Doctor	-
dc.date.awarded	2020-08	-
dc.contributor.major	Artificial Intelligence, Computer	-
dc.identifier.uci	I804:11032-000000162600	-
dc.identifier.holdings	000000000043▲000000000048▲000000162600▲	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)

Files in This Item: There are no files associated with this item.
