Publications


Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition

Authors

Park, Jinhwan; Sung, Wonyong

Issue Date
2020-10
Publisher
ISCA (International Speech Communication Association)
Citation
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.46-50
Abstract
Attention-based models with convolutional encoders enable faster training and inference than recurrent neural network-based ones. However, convolutional models often require a very large receptive field to achieve high recognition accuracy, which increases not only the parameter size but also the computational cost and run-time memory footprint. A convolutional encoder with a short receptive field can suffer from looping or skipping problems when the input utterance contains the same words as nearby sentences. We believe that this is due to the insufficient receptive field length, and remedy the problem by adding positional information to the convolution-based encoder. It is shown that the word error rate (WER) of a convolutional encoder with a short receptive field can be reduced significantly by augmenting it with positional information. Visualization results are presented to demonstrate the effectiveness of adding positional information. The proposed method improves the accuracy of attention models with a convolutional encoder and achieves a WER of 10.60% on TED-LIUMv2 for an end-to-end speech recognition task.
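The core idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the function names, the depthwise averaging filter, and the use of standard sinusoidal encodings are assumptions made for the example. It shows a 1-D convolutional feature map (short receptive field) being augmented with absolute positional information before it would be passed to an attention decoder.

```python
import numpy as np

def sinusoidal_positions(num_frames, dim):
    # Standard sinusoidal positional encoding (Vaswani et al., 2017):
    # even dimensions use sin, odd dimensions use cos, with wavelengths
    # forming a geometric progression over the feature dimension.
    pos = np.arange(num_frames)[:, None]          # (T, 1)
    i = np.arange(dim // 2)[None, :]              # (1, dim/2)
    angles = pos / (10000.0 ** (2 * i / dim))     # (T, dim/2)
    pe = np.zeros((num_frames, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def conv1d_same(x, w):
    # Depthwise 1-D convolution with 'same' padding: each feature channel
    # is filtered independently; the receptive field is len(w) frames.
    pad = len(w) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack(
        [np.convolve(xp[:, c], w, mode="valid") for c in range(x.shape[1])],
        axis=1,
    )

T, D = 100, 16                       # frames, feature dimension (toy sizes)
features = np.random.randn(T, D)     # stand-in for acoustic features

# Short-receptive-field conv encoder output, then add positional information
# so identical local patterns at different positions become distinguishable.
encoded = conv1d_same(features, np.ones(3) / 3)
encoded = encoded + sinusoidal_positions(T, D)
```

With only the 3-frame filter, two identical word segments at different times would produce identical encoder outputs; the added encoding makes each frame's representation position-dependent, which is the property the paper attributes to reduced looping/skipping errors.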
ISSN
1990-9772
URI
https://hdl.handle.net/10371/186299
DOI
https://doi.org/10.21437/Interspeech.2020-3163