An Auto-tuner for Quantizing Deep Neural Networks

QUAN QUAN

Authors: QUAN QUAN

Advisor: 이재욱

Major: College of Engineering, Dept. of Computer Science and Engineering

Issue Date: 2019-02

Publisher: Seoul National University Graduate School

Description: Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Dept. of Computer Science and Engineering, 2019. 2. 이재욱.

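Abstract: With the spread of AI-based applications and services, demand for efficient processing of deep neural networks (DNNs) has grown sharply. DNNs are known to be compute- and memory-intensive because they require a large amount of computation and memory space. Quantization, which represents numbers with fewer bits, is a widely used way to reduce both compute and memory requirements.
However, there are tens of millions of possible combinations of number representations with different bit widths, a problem made worse by layer-wise optimization, so finding the optimal number representation for a DNN is a difficult task. To address this, this thesis proposes an auto-tuner for DNN quantization. The auto-tuner finds a compact number representation (number type, bit width, and bias) that minimizes the user's objective function while satisfying an accuracy constraint. It was evaluated with 11 DNN models on two DNN frameworks, targeting an FPGA platform and bit-serial hardware, respectively. When a relative accuracy loss of up to 7% (1%) is tolerable, the parameter size is reduced by 8× (7×) on average and by up to 16×, compared with a baseline that uses 32-bit floating-point numbers.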
With the proliferation of AI-based applications and services, there is strong demand for efficient processing of deep neural networks (DNNs). DNNs are known to be both compute- and memory-intensive, as they require a tremendous amount of computation and a large memory space. Quantization is a popular technique for boosting the efficiency of DNNs by representing each number with fewer bits, thereby reducing both computational cost and memory footprint. However, finding an optimal number representation for a DNN is difficult because of the combinatorial explosion in feasible number representations with varying bit widths, which is only exacerbated by layer-wise optimization. To address this, this work proposes an automatic tuner for DNN quantization. The auto-tuner efficiently finds a compact number representation (type, bit width, and bias) that minimizes a user-supplied objective function while satisfying an accuracy constraint. An evaluation using eleven DNN models on two DNN frameworks, targeting an FPGA platform and bit-serial hardware, demonstrates over 8× (7×) reduction in parameter size on average when up to 7% (1%) loss of relative accuracy is tolerable, with a maximum reduction of 16×, compared to a baseline using 32-bit floating-point numbers.
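
The search the abstract describes (minimize an objective subject to an accuracy constraint, layer by layer) can be illustrated with a toy sketch. This is not the thesis's algorithm: the greedy strategy, the uniform symmetric quantizer, the error-based accuracy proxy, and the names `quantize`, `relative_accuracy`, `parameter_bits`, and `tune` are all assumptions made for illustration, and the number type and bias dimensions of the search are left out.

```python
import math
import random


def quantize(values, bits):
    """Uniform symmetric quantization of a list of floats to `bits` bits (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    peak = max(abs(v) for v in values) or 1.0    # avoid a zero scale for all-zero input
    scale = peak / levels
    return [round(v / scale) * scale for v in values]


def relative_accuracy(layers, bit_widths):
    """Hypothetical accuracy proxy: 1 minus the mean relative quantization error.

    A real tuner would run the quantized model on a validation set instead.
    """
    errors = []
    for weights, bits in zip(layers, bit_widths):
        # 32 bits stands for the unquantized floating-point baseline.
        quantized = weights if bits >= 32 else quantize(weights, bits)
        noise = math.sqrt(sum((w - q) ** 2 for w, q in zip(weights, quantized)))
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        errors.append(noise / norm)
    return 1.0 - sum(errors) / len(errors)


def parameter_bits(layers, bit_widths):
    """Example objective: total number of bits needed to store all parameters."""
    return sum(len(weights) * bits for weights, bits in zip(layers, bit_widths))


def tune(layers, max_loss, candidates=(16, 8, 4, 2)):
    """Greedily lower each layer's bit width while the accuracy constraint holds."""
    bit_widths = [32] * len(layers)
    for i in range(len(layers)):
        for bits in candidates:                  # candidates in descending order
            trial = bit_widths[:i] + [bits] + bit_widths[i + 1:]
            if relative_accuracy(layers, trial) >= 1.0 - max_loss:
                bit_widths = trial               # keep the narrower width
            else:
                break                            # narrower widths will not help
    return bit_widths


if __name__ == "__main__":
    random.seed(0)
    # Three made-up "layers" of Gaussian weights stand in for a real model.
    layers = [[random.gauss(0.0, 1.0) for _ in range(200)] for _ in range(3)]
    widths = tune(layers, max_loss=0.07)
    baseline = parameter_bits(layers, [32] * len(layers))
    print("bit widths:", widths)
    print("size reduction:", baseline / parameter_bits(layers, widths))
```

The reduction factors in the abstract map directly onto average bit widths under this objective: against a 32-bit baseline, an 8× reduction corresponds to 4 bits per parameter on average, and 16× to 2 bits.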

Language: eng

URI: https://hdl.handle.net/10371/150788

Files in This Item:

000000154394.pdf 2.93 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
