A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning

Authors

Ryu, Hyungshin; Kim, Sunhee; Chung, Minhwa

Issue Date
2023
Publisher
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Citation
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol.2023-August, pp.959-963
Abstract
Empirical studies report a strong correlation between the pronunciation proficiency scores and the phonetic errors that human evaluators assign when assessing non-native speech. However, existing computer-assisted pronunciation training (CAPT) systems treat automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD) as independent tasks and focus on improving each one separately. Motivated by the correlation between the two tasks, we propose a novel architecture that jointly tackles APA and MDD using CTC and cross-entropy criteria with a multi-task learning scheme to benefit both tasks. To leverage additional knowledge transfer, Wav2Vec2-robust fine-tuned on TIMIT is used for the joint optimization. The integrated model significantly outperforms single-task learning, with a mean increase of 0.057 in Pearson correlation coefficient (PCC) for APA and of 0.004 in F1 for MDD on Speechocean762, which reveals that proficiency scores and phonetic errors are correlated for both human and model assessments.
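
The abstract reports its gains in two metrics, PCC for APA and F1 for MDD. As a rough illustration of what those numbers measure, the sketch below computes both from scratch in plain Python. It is not the authors' evaluation code: the function names, the use of boolean per-phone labels, and the choice of "mispronounced" as the positive class for F1 are assumptions made here for illustration.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


def f1_score(reference, predicted):
    """F1 over boolean labels; True marks a mispronounced phone (assumption)."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from Speechocean762.
human_scores = [1.0, 2.0, 3.0, 4.0]
model_scores = [2.0, 4.0, 6.0, 8.0]
human_errors = [True, True, False, False, True]
model_errors = [True, False, False, True, True]

apa_pcc = pearson_cc(human_scores, model_scores)
mdd_f1 = f1_score(human_errors, model_errors)
```

Under these assumptions, a PCC increase of 0.057 means the model's proficiency scores track the human scores more closely, and an F1 increase of 0.004 means a slightly better balance of precision and recall on detected mispronunciations.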
ISSN
1990-9772
URI
https://hdl.handle.net/10371/199871
DOI
https://doi.org/10.21437/Interspeech.2023-337