Publications

Knowledge Distillation for Optimization of Quantized Deep Neural Networks

Cited 0 times in Web of Science; cited 8 times in Scopus
Authors

Shin, Sungho; Boo, Yoonho; Sung, Wonyong

Issue Date
2020-10
Publisher
IEEE
Citation
2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), pp.111-116
Abstract
The knowledge distillation (KD) technique, which utilizes a pretrained teacher model to train a student network, is exploited for the optimization of quantized deep neural networks (QDNNs). We consider the choice of the teacher network and investigate the effect of the KD hyperparameters. We employ several large floating-point and quantized models as the teacher network. The experiments show that the softmax distribution produced by the teacher network is more important than its accuracy for effective KD training. Since the softmax distribution of the teacher network can be controlled by the KD hyperparameters, we analyze the interrelationship of each KD component for quantized DNN training. Our experiments show that even a small teacher model can achieve the same distillation performance as a large one. We also propose the gradual soft loss reduction (GSLR) technique, which controls the mixing ratio of the hard and soft losses during training for robust KD-based QDNN optimization.
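
The two ingredients the abstract highlights, a temperature-controlled teacher softmax and a gradually reduced soft-loss weight, can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the paper's exact formulation: the function names kd_loss and gslr_soft_weight, the linear decay schedule, and all default hyperparameter values are assumptions made for illustration.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, soft_weight=0.9):
    # Hard loss: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft loss: KL divergence between the temperature-softened teacher and
    # student distributions; the T^2 factor keeps gradient scale comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    # Mix the two losses; soft_weight is the soft-loss mixing ratio.
    return (1.0 - soft_weight) * hard + soft_weight * soft

def gslr_soft_weight(epoch, total_epochs, initial_weight=0.9):
    # Hypothetical GSLR-style schedule (assumed linear): reduce the soft-loss
    # weight toward zero so the hard loss dominates late in training.
    return initial_weight * max(0.0, 1.0 - epoch / total_epochs)

A training loop would then call kd_loss(student_logits, teacher_logits.detach(), labels, temperature, gslr_soft_weight(epoch, total_epochs)), so the soft-loss contribution shrinks as training of the quantized student progresses.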
ISSN
1520-6130
URI
https://hdl.handle.net/10371/186297
DOI
https://doi.org/10.1109/SiPS50750.2020.9195219