ADAPTIVE KNOWLEDGE DISTILLATION BASED ON ENTROPY

Cited 16 times in Web of Science; cited 23 times in Scopus
Authors

Kwon, Kisoo; Na, Hwidong; Lee, Hoshik; Kim, Nam Soo

Issue Date
2020-05
Publisher
IEEE
Citation
2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7409-7413
Abstract
The knowledge distillation (KD) approach is widely used in deep learning, mainly for model size reduction. KD utilizes the soft labels of a teacher model, which contain the dark knowledge that one-hot ground-truth labels do not have. This knowledge can improve the performance of an already saturated student model. In the case of multiple teacher models, the same weighted average (interpolated training) of the teachers' labels is generally applied in KD training. However, if the knowledge characteristics differ among the teachers, interpolated training risks crushing each teacher's knowledge characteristics and can also introduce a noise component. In this paper, we propose entropy-based KD training, which uses the labels of teacher models with lower entropy at a larger rate among the various teacher models. The proposed method shows better performance than the conventional KD training scheme in automatic speech recognition.
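
The entropy-based weighting described in the abstract can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says that lower-entropy teacher labels are used at a larger rate, so the inverse-entropy weighting, the temperature parameter, and the helper names `entropy_weighted_soft_labels` and `kd_loss` are hypothetical choices made for the example.

```python
import torch
import torch.nn.functional as F

def entropy_weighted_soft_labels(teacher_logits, temperature=1.0, eps=1e-8):
    # Soften each teacher's logits into a label distribution of shape (batch, classes).
    probs = [F.softmax(logits / temperature, dim=-1) for logits in teacher_logits]

    # Per-example entropy H(p) = -sum_c p_c log p_c for each teacher.
    entropies = torch.stack(
        [-(p * torch.log(p + eps)).sum(dim=-1) for p in probs], dim=0
    )  # (teachers, batch)

    # Lower entropy -> larger weight: normalize inverse entropies per example
    # (one possible realization of "lower entropy at a larger rate").
    inv = 1.0 / (entropies + eps)
    weights = inv / inv.sum(dim=0, keepdim=True)  # (teachers, batch)

    # Entropy-weighted combination of the teachers' soft labels.
    stacked = torch.stack(probs, dim=0)  # (teachers, batch, classes)
    return (weights.unsqueeze(-1) * stacked).sum(dim=0)  # (batch, classes)

def kd_loss(student_logits, soft_labels, temperature=1.0):
    # Cross-entropy between the combined soft labels and the student's
    # softened predictions, as in standard KD training.
    log_q = F.log_softmax(student_logits / temperature, dim=-1)
    return -(soft_labels * log_q).sum(dim=-1).mean()
```

Under these assumptions, the combined soft labels would replace the uniformly averaged teacher labels of conventional interpolated training as the KD target.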
ISSN
1520-6149
URI
https://hdl.handle.net/10371/186515
DOI
https://doi.org/10.1109/ICASSP40776.2020.9054698