Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition

Park, Jinhwan; Sung, Wonyong

doi:10.21437/Interspeech.2020-3163

DC Field	Value	Language
dc.contributor.author	Park, Jinhwan	-
dc.contributor.author	Sung, Wonyong	-
dc.date.accessioned	2022-10-17T04:27:47Z	-
dc.date.available	2022-10-17T04:27:47Z	-
dc.date.created	2022-10-06	-
dc.date.issued	2020-10	-
dc.identifier.citation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.46-50	-
dc.identifier.issn	1990-9772	-
dc.identifier.uri	https://hdl.handle.net/10371/186299	-
dc.description.abstract	Attention-based models with convolutional encoders enable faster training and inference than recurrent neural network-based ones. However, convolutional models often require a very large receptive field to achieve high recognition accuracy, which not only increases the parameter size but also the computational cost and run-time memory footprint. A convolutional encoder with a short receptive field length can suffer from looping or skipping problems when the input utterance contains the same words as nearby sentences. We believe that this is due to the insufficient receptive field length, and try to remedy this problem by adding positional information to the convolution-based encoder. It is shown that the word error rate (WER) of a convolutional encoder with a short receptive field size can be reduced significantly by augmenting it with positional information. Visualization results are presented to demonstrate the effectiveness of adding positional information. The proposed method improves the accuracy of attention models with a convolutional encoder and achieves a WER of 10.60% on TED-LIUMv2 for an end-to-end speech recognition task.	-
dc.language	English	-
dc.publisher	ISCA-INT SPEECH COMMUNICATION ASSOC	-
dc.title	Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition	-
dc.type	Article	-
dc.identifier.doi	10.21437/Interspeech.2020-3163	-
dc.citation.journaltitle	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	-
dc.identifier.wosid	000833594100010	-
dc.identifier.scopusid	2-s2.0-85098133807	-
dc.citation.endpage	50	-
dc.citation.startpage	46	-
dc.description.isOpenAccess	N	-
dc.contributor.affiliatedAuthor	Sung, Wonyong	-
dc.type.docType	Proceedings Paper	-
dc.description.journalClass	1	-
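
The abstract above reports that augmenting a short-receptive-field convolutional encoder with positional information reduces WER. The record does not say which form of positional information the authors used, so the following is only a minimal sketch of one common choice: the sinusoidal encoding popularized by Transformer models, added element-wise to each input frame. The function names `sinusoidal_positions` and `add_positions`, the base of 10000, and the toy 4-dimensional frames are all assumptions made for illustration, not details taken from the paper.

```python
import math


def sinusoidal_positions(num_frames, dim):
    """Return a num_frames x dim table of sinusoidal position codes.

    Assumed form (not confirmed by the record): even columns hold
    sin(t / 10000^(i/dim)), odd columns hold the matching cosine.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for t in range(num_frames):
        row = []
        for i in range(0, dim, 2):
            angle = t / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


def add_positions(features):
    """Add position codes element-wise to a list of equal-length frames."""
    if not features:
        return []
    dim = len(features[0])
    pos = sinusoidal_positions(len(features), dim)
    return [
        [x + p for x, p in zip(frame, pos_row)]
        for frame, pos_row in zip(features, pos)
    ]


# Three identical toy frames: without position they are indistinguishable
# to an encoder whose receptive field sees only local context.
frames = [[0.5, 0.5, 0.5, 0.5] for _ in range(3)]
encoded = add_positions(frames)
```

After the addition, the three frames carry different values even though their acoustic content is identical. This is the intuition behind the abstract's claim that positional information can help a short-receptive-field encoder avoid looping or skipping on repeated words.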

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Electrical and Computer Engineering
  - Journal Papers

Files in This Item: There are no files associated with this item.
