Automatic phrase break and sentence stress prediction in English with RNN
RNN-based automatic prediction of pauses and sentence stress (RNN 기반의 휴지부 및 문장 강세 자동 예측)
- Evgenia Nedelko
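- College of Humanities, Interdisciplinary Program in Cognitive Science
- Issue Date: August 2018
- Seoul National University Graduate School
- Thesis (Master's), Seoul National University Graduate School: College of Humanities, Interdisciplinary Program in Cognitive Science, August 2018. 정민화.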
- Prosody has been studied from many angles in recent years. In linguistics, prosody is concerned with those elements of speech that are not individual phonetic segments (vowels and consonants) but are properties of syllables and larger units of speech. These contribute to linguistic functions such as intonation, tone, stress, and rhythm. Prosody may reflect various features of the speaker or the utterance: the emotional state of the speaker; the form of the utterance (statement, question, or command); emphasis, contrast, and focus; or other elements of language that may not be encoded by grammar or by vocabulary.
In the present research we concentrated on two prosodic phenomena: phrase breaks and sentence stress. In English, a pause is more likely before a word carrying high information content. Defining a pause is not easy. First, pausing is a natural phenomenon related to breathing. Second, it is necessary to distinguish between silent pauses and "filled" pauses, where a hesitation is perceived but the speaker continues to emit sound.
Prosodic stress, or sentence stress, refers to stress patterns that apply at a higher level than the individual word – namely within a prosodic unit. It may involve a certain natural stress pattern characteristic of a given language, but may also involve the placing of emphasis on particular words because of their relative importance.
The first goal of the present work was to conduct a multi-class phrase break prediction task with three classes: NB, meaning that no pause occurs; B, a minor pause within the sentence; and BB, a pause marking a sentence boundary.
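The three-class scheme can be illustrated with a minimal sketch of how per-word labels might be derived. The annotation format here is an assumption made only for illustration (a standalone `|` after a word for a minor pause, `||` for a sentence-boundary pause), and `label_breaks` is a hypothetical helper, not the procedure used in the thesis.

```python
def label_breaks(annotated: str) -> list[tuple[str, str]]:
    """Pair each word with NB, B, or BB according to the marker that follows it.

    Assumed (hypothetical) annotation: "|" marks a minor pause and "||" marks
    a sentence-boundary pause; both appear as separate whitespace-delimited tokens.
    """
    pairs: list[tuple[str, str]] = []
    for token in annotated.split():
        if token in ("|", "||"):
            if pairs:  # a marker relabels the word immediately before it
                word, _ = pairs[-1]
                pairs[-1] = (word, "B" if token == "|" else "BB")
        else:
            pairs.append((token, "NB"))  # default: no pause after this word
    return pairs


labels = label_breaks("after the rain | we went home ||")
```

Framed this way, phrase break prediction is a sequence labeling problem: the model receives the words and must output one of the three classes for each of them.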
The second goal was to conduct a sentence stress prediction task and to demonstrate that neural network models can achieve relatively high performance without additionally extracted features. Both prediction tasks were based on a word embedding model combined with neural network architectures.
The main hypothesis was that bidirectional neural networks would increase the accuracy of pause prediction and drastically improve overall performance. The second hypothesis was that a pre-trained word embedding model combined with a neural network architecture would achieve good performance on the sentence stress prediction task.
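One way to see why bidirectionality could matter for pause prediction is a toy sketch in which each position is represented by a left-to-right state and a right-to-left state. Everything here is an assumption for illustration: a one-unit Elman-style recurrence with arbitrary weights `w_in` and `w_rec`, and scalar inputs standing in for word embeddings. It is not the architecture evaluated in the thesis.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a one-unit tanh recurrence and return the hidden state at every step."""
    state, states = 0.0, []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def bidirectional_states(inputs, w_in=1.0, w_rec=0.5):
    """Pair each position's forward state with its backward state."""
    forward = rnn_pass(inputs, w_in, w_rec)
    # Process the reversed sequence, then flip back so indices line up.
    backward = rnn_pass(inputs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


# Two toy sequences that differ only in the final input.
a = bidirectional_states([1.0, 0.0, 0.0])
b = bidirectional_states([1.0, 0.0, 1.0])
```

At the middle position the forward states of `a` and `b` are identical, because a forward pass has seen only the preceding inputs, while the backward states differ. A classifier reading both states can therefore take the upcoming words into account when deciding whether a pause follows the current word.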
The third and final goal was to examine and compare the performance of different neural network models on the two prediction tasks described above.
Keywords: Prosody prediction, Phrase breaks, Sentence stress, Deep learning, Neural networks.