Vowel Duration and Fundamental Frequency Prediction for Automatic Transplantation of Native English Prosody onto Korean-accented Speech

Sabaleuski Matsvei

Vowel Duration and Fundamental Frequency Prediction for Automatic Transplantation of Native English Prosody onto Korean-accented Speech : Vowel Duration and Fundamental Frequency Prediction for Automatic Prosody Replication

DC Field	Value	Language
dc.contributor.advisor	정민화	-
dc.contributor.author	Sabaleuski Matsvei	-
dc.date.accessioned	2018-12-03T01:57:25Z	-
dc.date.available	2018-12-03T01:57:25Z	-
dc.date.issued	2018-08	-
dc.identifier.other	000000152641	-
dc.identifier.uri	https://hdl.handle.net/10371/144268	-
dc.description	Thesis (Master's) -- Seoul National University Graduate School : College of Humanities, Program in Cognitive Science, 2018. 8. 정민화.	-
dc.description.abstract	The use of computers to help people improve their pronunciation of a foreign language has increased rapidly in recent decades. However, the majority of such Computer-Assisted Pronunciation Training (CAPT) systems have focused on teaching correct pronunciation of segments only, while prosody has received much less attention. One of the new approaches to prosody training is self-imitation learning: prosodic features from a native utterance are transplanted onto the learner's own speech and given back as corrective feedback. The main drawback is that this technique requires two identical sets of native and non-native utterances, which makes its actual implementation cumbersome and inflexible. As preliminary research towards developing a new method of prosody transplantation, the first part of the study surveys previous related works and points out their advantages and drawbacks. We also compare the prosodic systems of Korean and English, point out the major areas of mistakes that Korean learners of English tend to make, and then analyze the acoustic features that these mistakes are correlated with. We suggest that transplantation of vowel duration and fundamental frequency will be the most effective for self-imitation learning by Korean speakers of English. The second part of this study introduces a new model for prosody transplantation. Instead of transplanting acoustic values from a pre-recorded utterance, we propose using a deep neural network (DNN) based system to predict them. Three different models are built and described: a baseline recurrent neural network (RNN), a long short-term memory (LSTM) model, and a gated recurrent unit (GRU) model. The models were trained on the Boston University Radio Speech Corpus, using a minimal set of relevant input features. The models were compared with each other, as well as with state-of-the-art prosody prediction systems from speech synthesis research.
The implementation of the proposed prediction model in automatic prosody transplantation is described and the results are analyzed. A perceptual evaluation by native speakers was carried out, in which accentedness and comprehensibility ratings of modified and original non-native utterances were compared. The results showed that duration transplantation can lead to improvements in comprehensibility scores. This study lays the groundwork for a fully automatic self-imitation prosody training system, and its results can be used to help Korean learners master problematic areas of English prosody, such as sentence stress.	-
dc.description.tableofcontents	Chapter 1. Introduction: 1.1 Background; 1.2 Research Objective; 1.3 Research Outline. Chapter 2. Related Works: 2.1 Self-imitation Prosody Training; 2.1.1 Prosody Transplantation Methods; 2.1.2 Effects of Prosody Transplantation on Accentedness Rating; 2.1.3 Effects of Self-Imitation Learning on Proficiency Rating; 2.2 Prosody of Korean-accented English Speech; 2.2.1 Prosodic Systems of Korean and English; 2.2.2 Common Prosodic Mistakes; 2.3 Deep Learning Based Prosody Prediction; 2.3.1 Deep Learning; 2.3.2 Recurrent Neural Networks; 2.3.2 The Long Short-Term Memory Architecture; 2.3.3 Gated Recurrent Units; 2.3.4 Prosody Prediction Models. Chapter 3. Vowel Duration and Fundamental Frequency Prediction Model: 3.1 Data; 3.2 Input Feature Selection; 3.3 System Architecture and Training; 3.4 Results and Evaluation; 3.4.1 Objective Metrics; 3.4.2 Vowel Duration Prediction Models Results; 3.4.2 Fundamental Frequency Prediction Models Results; 3.4.3 Comparison with other models. Chapter 4. Automatic Prosody Transplantation: 4.1 Data; 4.2 Transplantation Method; 4.3 Perceptual Evaluation; 4.4 Results. Chapter 5. Conclusion: 5.1 Summary; 5.2 Contribution; 5.3 Limitations; 5.4 Recommendations for Future Study. References. Appendix.	-
dc.format	application/pdf	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	Seoul National University Graduate School	-
dc.subject.ddc	153	-
dc.title	Vowel Duration and Fundamental Frequency Prediction for Automatic Transplantation of Native English Prosody onto Korean-accented Speech	-
dc.title.alternative	Vowel Duration and Fundamental Frequency Prediction for Automatic Prosody Replication	-
dc.type	Thesis	-
dc.description.degree	Master	-
dc.contributor.affiliation	College of Humanities, Program in Cognitive Science	-
dc.date.awarded	2018-08	-

Appears in Collections:

College of Humanities (인문대학)
- Program in Cognitive Science (협동과정-인지과학전공)
  - Theses (Master's Degree_협동과정-인지과학전공)

Files in This Item:

000000152641.pdf 1.88 MB
