Row Streaming Dataflow using Chaining Buffer and Systolic Array+ Structure : Chaining Buffer를 활용한 Row Streaming Dataflow와 Systolic Array+ 구조

DC Field | Value | Language
dc.contributor.advisor | 안정호 | -
dc.contributor.author | 김휘수 | -
dc.date.accessioned | 2021-11-30T04:39:45Z | -
dc.date.available | 2021-11-30T04:39:45Z | -
dc.date.issued | 2021-02 | -
dc.identifier.other | 000000164188 | -
dc.identifier.uri | https://hdl.handle.net/10371/175880 | -
dc.identifier.uri | https://dcollection.snu.ac.kr/common/orgView/000000164188 | ko_KR
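dc.description | Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Convergence Science and Technology, Department of Intelligence and Information, 2021. 2. 안정호. | -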
dc.description.abstract | Convolutional Neural Networks (CNNs) are widely used to solve complex problems in fields such as image recognition, image classification, and video analysis.
Convolutional (CONV) layers are the most computationally intensive part of CNN inference, and various architectures have been proposed to process them efficiently.
Among these, a systolic array, organized as a 2D array of processing elements, handles GEneral Matrix Multiplication (GEMM) with high efficiency.
However, processing a CONV layer as a GEMM requires image-to-column (im2col) processing, also called lowering, for every layer, which demands larger on-chip memory and a considerable amount of repetitive on-chip memory access.

In this paper, we propose a systolic array+ (SysAr+) structure augmented with a chaining buffer, together with a row-streaming dataflow, that maximizes data reuse without the im2col pre-processing of CONV layers and without repetitive accesses to the large on-chip memory.
When the proposed method is applied to 3×3 CONV layers, energy consumption is reduced by up to 19.7% in ResNet and 37.4% in DenseNet with an area overhead of 1.54% in SysAr+, and performance is improved by up to 32.4% in ResNet and 12.1% in DenseNet. | -
dc.description.abstract | Convolutional Neural Networks (CNNs) are currently the most widely used approach for solving complex problems in fields such as image recognition, image classification, and video analysis. The Convolutional (CONV) layer is the most computation-heavy part of CNN inference, and various architectures have been proposed to process it efficiently.

One of the many proposed architectures, the systolic array, is organized as a 2D array of processing elements and is highly efficient at processing GEneral Matrix to matrix Multiplication (GEMM). However, processing a CONV layer as a GEMM requires image-to-column (im2col) processing, known as lowering, for every layer; this increases the required on-chip memory capacity and causes a considerable amount of repetitive memory access.

This paper proposes a row-streaming dataflow, and a systolic array+ (SysAr+) structure with a chaining buffer to support it, that removes the im2col pre-processing step from CONV layers, eliminating the data duplication previously required in on-chip memory, and maximizes data reuse without repeated accesses to on-chip memory. With the proposed method, for 3×3 CONV layers, the SysAr+ structure reduces energy by up to 19.7% in ResNet and up to 37.4% in DenseNet, and improves performance by up to 32.4% in ResNet and up to 12.1% in DenseNet, at an area increase of only about 1.54% over the baseline SysAr structure. | -
dc.description.tableofcontents | 1 Introduction to CNN accelerators
1.1 CNN accelerator and simulator
1.2 Characteristics analysis of CNN accelerator
1.3 Area analysis in CNN accelerator reconfigured equally for the process, frequency, bit precision, and the number of PEs
1.4 Energy consumption and performance analysis in ResNet-50 of accelerators reconstructed with 1024 PEs
2 Energy-efficiency Challenges of Processing CONV Layers in a SysAr
2.1 Mapping for weights of CONV layers in SysAr
3 SysAr+ Microarchitecture to Support Row-streaming Dataflow
3.1 SysAr+ Microarchitecture
3.2 Row-streaming dataflow for SysAr+
3.3 Chaining Buffer for row-streaming dataflow
3.4 Chaining Buffer formula
4 Evaluation
4.1 Experimental setup
4.2 Architectures for comparison
4.2.1 Area
4.2.2 Energy efficiency
4.2.3 Performance
5 Conclusion
REFERENCES
Abstract in Korean | -
dc.format.extent | vi, 35 | -
dc.language.iso | eng | -
dc.publisher | Seoul National University Graduate School | -
dc.subject | CNNs | -
dc.subject | Systolic array | -
dc.subject | Im2col | -
dc.subject | Matrix lowering | -
dc.subject | Data reuse | -
dc.subject | Memory Bandwidth | -
dc.subject | CNN | -
dc.subject | 데이터재사용 (data reuse) | -
dc.subject | 메모리 대역폭 (memory bandwidth) | -
dc.subject.ddc | 006.3 | -
dc.title | Row Streaming Dataflow using Chaining Buffer and Systolic Array+ Structure | -
dc.title.alternative | Chaining Buffer를 활용한 Row Streaming Dataflow와 Systolic Array+ 구조 | -
dc.type | Thesis | -
dc.type | Dissertation | -
dc.contributor.AlternativeAuthor | KIM Hweesoo | -
dc.contributor.department | Graduate School of Convergence Science and Technology, Department of Intelligence and Information | -
dc.description.degree | Master | -
dc.date.awarded | 2021-02 | -
dc.identifier.uci | I804:11032-000000164188 | -
dc.identifier.holdings | 000000000044▲000000000050▲000000164188▲ | -
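
The abstract notes that running a CONV layer as a GEMM requires im2col lowering, which enlarges the on-chip footprint through data duplication. The following is a minimal sketch of that effect, not the thesis's implementation: it assumes a single-channel input, stride 1, and no padding, and the names `im2col`, `conv_direct`, and `conv_via_gemm` are hypothetical helpers introduced only for illustration.

```python
def im2col(image, k):
    """Lower a 2D image into a matrix with one row per k x k window.

    Assumes stride 1 and no padding (illustrative simplification).
    """
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            # Each window is flattened row by row; overlapping windows
            # copy the same input pixels, which is the duplication cost.
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows


def conv_direct(image, kernel):
    """Sliding-window convolution (cross-correlation form) without lowering."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def conv_via_gemm(image, kernel):
    """Same convolution expressed as (lowered matrix) x (flattened kernel)."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    lowered = im2col(image, k)
    out_w = len(image[0]) - k + 1
    flat = [sum(a * b for a, b in zip(row, flat_kernel)) for row in lowered]
    return [flat[r * out_w:(r + 1) * out_w] for r in range(len(flat) // out_w)]


# Hypothetical 6x6 input and 3x3 kernel, chosen only for illustration.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

lowered = im2col(image, 3)
original_elems = len(image) * len(image[0])
lowered_elems = len(lowered) * len(lowered[0])
duplication = lowered_elems / original_elems

print(conv_direct(image, kernel) == conv_via_gemm(image, kernel))
print(original_elems, lowered_elems, duplication)
```

For this 6×6 input and 3×3 kernel, the lowered matrix holds 144 values against 36 in the original image, a 4× expansion. The factor approaches 9× for large inputs with a 3×3 kernel, which is the kind of overhead a dataflow that avoids im2col aims to remove.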
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
