Vowel Duration and Fundamental Frequency Prediction for Automatic Transplantation of Native English Prosody onto Korean-accented Speech : 자동 운율 복제를 위한 모음 길이와 기본 주파수 예측 (Prediction of Vowel Duration and Fundamental Frequency for Automatic Prosody Replication)

DC Field: Value
dc.contributor.advisor: 정민화
dc.contributor.author: Sabaleuski Matsvei
dc.date.accessioned: 2018-12-03T01:57:25Z
dc.date.available: 2018-12-03T01:57:25Z
dc.date.issued: 2018-08
dc.identifier.other: 000000152641
dc.identifier.uri: https://hdl.handle.net/10371/144268
dc.description: Thesis (Master's) -- Seoul National University Graduate School: College of Humanities, Interdisciplinary Program in Cognitive Science, August 2018. 정민화.
dc.description.abstract: The use of computers to help people improve their foreign-language pronunciation has increased rapidly in recent decades. However, the majority of such Computer-Assisted Pronunciation Training (CAPT) systems have focused only on teaching the correct pronunciation of segments, while prosody has received much less attention. One of the new approaches to prosody training is self-imitation learning: prosodic features from a native utterance are transplanted onto the learner's own speech and given back as corrective feedback. The main drawback is that this technique requires two identical sets of native and non-native utterances, which makes its actual implementation cumbersome and inflexible.

As preliminary research toward developing a new method of prosody transplantation, the first part of the study surveys previous related work and points out its advantages and drawbacks. We also compare the prosodic systems of Korean and English, identify the major types of mistakes that Korean learners of English tend to make, and analyze the acoustic features with which these mistakes are correlated. We suggest that transplantation of vowel duration and fundamental frequency will be the most effective for self-imitation learning by Korean speakers of English.

The second part of this study introduces the proposed model for prosody transplantation. Instead of transplanting acoustic values from a pre-recorded utterance, we propose using a deep neural network (DNN) based system to predict them. Three different models are built and described: a baseline recurrent neural network (RNN), a long short-term memory (LSTM) model, and a gated recurrent unit (GRU) model. The models were trained on the Boston University Radio Speech Corpus, using a minimal set of relevant input features. The models were compared with each other, as well as with state-of-the-art prosody prediction systems from speech synthesis research.

The implementation of the proposed prediction model in automatic prosody transplantation is described and the results are analyzed. A perceptual evaluation by native speakers was carried out, in which accentedness and comprehensibility ratings of modified and original non-native utterances were compared. The results showed that duration transplantation can lead to improvements in comprehensibility scores. This study lays the groundwork for a fully automatic self-imitation prosody training system, and its results can be used to help Korean learners master problematic areas of English prosody, such as sentence stress.
dc.description.tableofcontents:
Chapter 1. Introduction
1.1 Background
1.2 Research Objective
1.3 Research Outline
Chapter 2. Related Works
2.1 Self-imitation Prosody Training
2.1.1 Prosody Transplantation Methods
2.1.2 Effects of Prosody Transplantation on Accentedness Rating
2.1.3 Effects of Self-Imitation Learning on Proficiency Rating
2.2 Prosody of Korean-accented English Speech
2.2.1 Prosodic Systems of Korean and English
2.2.2 Common Prosodic Mistakes
2.3 Deep Learning Based Prosody Prediction
2.3.1 Deep Learning
2.3.2 Recurrent Neural Networks
2.3.2 The Long Short-Term Memory Architecture
2.3.3 Gated Recurrent Units
2.3.4 Prosody Prediction Models
Chapter 3. Vowel Duration and Fundamental Frequency Prediction Model
3.1 Data
3.2 Input Feature Selection
3.3 System Architecture and Training
3.4 Results and Evaluation
3.4.1 Objective Metrics
3.4.2 Vowel Duration Prediction Models Results
3.4.2 Fundamental Frequency Prediction Models Results
3.4.3 Comparison with other models
Chapter 4. Automatic Prosody Transplantation
4.1 Data
4.2 Transplantation Method
4.3 Perceptual Evaluation
4.4 Results
Chapter 5. Conclusion
5.1 Summary
5.2 Contribution
5.3 Limitations
5.4 Recommendations for Future Study
References
Appendix
dc.format: application/pdf
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: Seoul National University Graduate School
dc.subject.ddc: 153
dc.title: Vowel Duration and Fundamental Frequency Prediction for Automatic Transplantation of Native English Prosody onto Korean-accented Speech
dc.title.alternative: 자동 운율 복제를 위한 모음 길이와 기본 주파수 예측 (Prediction of Vowel Duration and Fundamental Frequency for Automatic Prosody Replication)
dc.type: Thesis
dc.description.degree: Master
dc.contributor.affiliation: College of Humanities, Interdisciplinary Program in Cognitive Science
dc.date.awarded: 2018-08