Detailed Information

Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition

DC Field | Value | Language
dc.contributor.author | Park, Jinhwan | -
dc.contributor.author | Sung, Wonyong | -
dc.date.accessioned | 2022-10-17T04:27:47Z | -
dc.date.available | 2022-10-17T04:27:47Z | -
dc.date.created | 2022-10-06 | -
dc.date.issued | 2020-10 | -
dc.identifier.citation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.46-50 | -
dc.identifier.issn | 1990-9772 | -
dc.identifier.uri | https://hdl.handle.net/10371/186299 | -
dc.description.abstract | Attention-based models with convolutional encoders enable faster training and inference than recurrent neural network-based ones. However, convolutional models often require a very large receptive field to achieve high recognition accuracy, which not only increases the parameter size but also the computational cost and run-time memory footprint. A convolutional encoder with a short receptive field length can suffer from looping or skipping problems when the input utterance contains the same words as nearby sentences. We believe that this is due to the insufficient receptive field length, and try to remedy this problem by adding positional information to the convolution-based encoder. It is shown that the word error rate (WER) of a convolutional encoder with a short receptive field size can be reduced significantly by augmenting it with positional information. Visualization results are presented to demonstrate the effectiveness of adding positional information. The proposed method improves the accuracy of attention models with a convolutional encoder and achieves a WER of 10.60% on TED-LIUMv2 for an end-to-end speech recognition task. | -
dc.language | English | -
dc.publisher | ISCA-INT SPEECH COMMUNICATION ASSOC | -
dc.title | Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition | -
dc.type | Article | -
dc.identifier.doi | 10.21437/Interspeech.2020-3163 | -
dc.citation.journaltitle | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.identifier.wosid | 000833594100010 | -
dc.identifier.scopusid | 2-s2.0-85098133807 | -
dc.citation.endpage | 50 | -
dc.citation.startpage | 46 | -
dc.description.isOpenAccess | N | -
dc.contributor.affiliatedAuthor | Sung, Wonyong | -
dc.type.docType | Proceedings Paper | -
dc.description.journalClass | 1 | -
Files in This Item:
There are no files associated with this item.

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
