Detailed Information

Acoustic Modeling using Adversarially Trained Variational Recurrent Neural Network for Speech Synthesis

DC Field | Value | Language
-------- | ----- | --------
dc.contributor.author | Lee, Joun Yeop | -
dc.contributor.author | Cheon, Sung Jun | -
dc.contributor.author | Choi, Byoung Jin | -
dc.contributor.author | Kim, Nam Soo | -
dc.contributor.author | Song, Eunwoo | -
dc.date.accessioned | 2022-10-26T07:22:06Z | -
dc.date.available | 2022-10-26T07:22:06Z | -
dc.date.created | 2022-10-21 | -
dc.date.issued | 2018-09 | -
dc.identifier.citation | 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, pp.917-921 | -
dc.identifier.issn | 2308-457X | -
dc.identifier.uri | https://hdl.handle.net/10371/186836 | -
dc.description.abstract | In this paper, we propose a variational recurrent neural network (VRNN)-based method for modeling and generating speech parameter sequences. In recent years, the performance of speech synthesis systems has improved over conventional techniques thanks to deep learning-based acoustic models. Among the popular deep learning techniques, recurrent neural networks (RNNs) have been successful at efficiently modeling time-dependent sequential data. However, due to the deterministic nature of RNN prediction, such models do not reflect the full complexity of highly structured data such as natural speech. In this regard, we propose the adversarially trained variational recurrent neural network (AdVRNN), which uses a VRNN to better represent the variability of natural speech for acoustic modeling in speech synthesis. We also apply an adversarial learning scheme when training the AdVRNN to overcome the oversmoothing problem. We conducted comparative experiments between the proposed VRNN and the conventional gated recurrent unit (GRU), one type of RNN, for a speech synthesis system. It is shown that the proposed AdVRNN-based method performed better than the conventional GRU technique. | -
dc.language | English | -
dc.publisher | ISCA-INT SPEECH COMMUNICATION ASSOC | -
dc.title | Acoustic Modeling using Adversarially Trained Variational Recurrent Neural Network for Speech Synthesis | -
dc.type | Article | -
dc.identifier.doi | 10.21437/Interspeech.2018-1598 | -
dc.citation.journaltitle | 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | -
dc.identifier.wosid | 000465363900191 | -
dc.identifier.scopusid | 2-s2.0-85054971121 | -
dc.citation.endpage | 921 | -
dc.citation.startpage | 917 | -
dc.description.isOpenAccess | N | -
dc.contributor.affiliatedAuthor | Kim, Nam Soo | -
dc.type.docType | Proceedings Paper | -
dc.description.journalClass | 1 | -
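
The abstract describes combining a VRNN objective with an adversarial loss. As a rough sketch only (our notation and weighting, not taken from the paper, whose exact formulation is not given in this record), the per-utterance training objective of such a model typically augments the standard VRNN evidence lower bound with a weighted adversarial term:

\[
\mathcal{L}_{\mathrm{AdVRNN}}
= \sum_{t} \Big[ \mathbb{E}_{q(z_t \mid x_{\le t},\, z_{<t})} \log p(x_t \mid z_{\le t},\, x_{<t})
- \mathrm{KL}\big( q(z_t \mid x_{\le t},\, z_{<t}) \,\|\, p(z_t \mid x_{<t},\, z_{<t}) \big) \Big]
- \lambda\, \mathcal{L}_{\mathrm{adv}},
\]

where \(x_t\) and \(z_t\) are the speech parameters and latent variables at frame \(t\), \(\mathcal{L}_{\mathrm{adv}}\) is the generator's loss against a discriminator trained to distinguish generated parameter trajectories from natural ones (this is the term intended to counter oversmoothing), and \(\lambda\) is a balancing weight.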
Appears in Collections:
Files in This Item:
There are no files associated with this item.

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.