Automatic Generation of Morpheme Level Reordering Rules for Korean to English Machine Translation
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 신효필 | - |
dc.contributor.author | Breanna Castellani | - |
dc.date.accessioned | 2017-07-19T09:47:05Z | - |
dc.date.available | 2017-07-19T09:47:05Z | - |
dc.date.issued | 2017-02 | - |
dc.identifier.other | 000000141950 | - |
dc.identifier.uri | https://hdl.handle.net/10371/131963 | - |
dc.description | Thesis (Master's) -- Seoul National University Graduate School: Department of Linguistics, 2017. 2. 신효필. | - |
dc.description.abstract | Word order is one of the main challenges that machine translation systems must overcome when dealing with any linguistically divergent language pair, such as Korean and English. Statistical machine translation (SMT) models are often insufficient at long-distance reordering due to the distortion penalties they impose. Rule-based systems, on the other hand, are often costly, in both time and money, to build and maintain.
The present research proposes a new hybrid approach for Korean to English machine translation. While previous approaches have focused on the word, our approach considers the morpheme as the basic unit of translation for this language pair. We begin by developing a classification model to disambiguate Korean functional morphemes based on alignment pairs and context feature data. Then, according to our automatically generated rules, we apply this model in a preprocessing step to reorder the morphemes to better match English sentence structure. After retraining our statistical translation system, Moses, results indicate an improvement in overall translation quality. When the SMT system's internal lexicalized reordering is restricted, we note an increase in the BLEU score of 3.5% over the SMT-only baseline. In the case where we do not limit decoding-time reordering, an even greater BLEU score increase of 4.42% is observed. We also find evidence to suggest that our changes enable Moses to execute additional reordering operations at decoding time that it was previously unable to perform. | - |
dc.description.tableofcontents | Chapter 1. Introduction 1; Chapter 2. Literature Review 6; 2.1 Machine Translation 6; 2.2 Reordering 10; 2.3 Korean to English MT 12; Chapter 3. Corpus Data and SMT System 14; 3.1 Background 14; 3.2 Preparation 15; 3.3 Moses 17; Chapter 4. Rule Generation 19; 4.1 Corpus Processing 20; 4.1.1 Suggested Korean-English Alignments 21; 4.1.2 Feature Sets 24; 4.1.3 Reordering Movement 26; 4.2 Rule Creation 33; 4.3 Input Preprocessing 35; 4.3.1 Rule Matching 35; 4.3.2 Morpheme Reordering 38; 4.4 Examples 40; Chapter 5. Results 44; Chapter 6. Conclusion 49; References 51; Appendix A: Rules 55; Abstract in Korean 64 | - |
dc.format | application/pdf | - |
dc.format.extent | 679559 bytes | - |
dc.format.medium | application/pdf | - |
dc.language.iso | en | - |
dc.publisher | Seoul National University Graduate School | - |
dc.subject | automatic rule generation | - |
dc.subject | Korean-English MT | - |
dc.subject | hybrid machine translation | - |
dc.subject | rule-based preprocessing | - |
dc.subject | morpheme reordering | - |
dc.subject.ddc | 401 | - |
dc.title | Automatic Generation of Morpheme Level Reordering Rules for Korean to English Machine Translation | - |
dc.type | Thesis | - |
dc.description.degree | Master | - |
dc.citation.pages | 69 | - |
dc.contributor.affiliation | College of Humanities, Department of Linguistics | - |
dc.date.awarded | 2017-02 | - |
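
The abstract describes a preprocessing step that reorders Korean morphemes to better match English sentence structure before the text is passed to the SMT system. The sketch below only illustrates that general idea and is not the thesis's actual rule format or code: the rule representation (a POS-tag pattern plus a permutation), the `reorder` function, the single example rule, and the Sejong-style tags (`NNG`, `JKB`, `VV`, `EF`) are all assumptions made here for illustration.

```python
from typing import List, Sequence, Tuple

# A morpheme is a (surface form, POS tag) pair. The tags are assumed to be
# Sejong-style labels; the thesis may use a different tag set.
Morpheme = Tuple[str, str]

# A hypothetical rule: a sequence of tags to match, and the order in which the
# matched morphemes should be emitted. This single rule moves an adverbial
# case particle in front of the noun it follows, imitating an English
# preposition.
Rule = Tuple[Tuple[str, ...], Tuple[int, ...]]
RULES: List[Rule] = [
    (("NNG", "JKB"), (1, 0)),
]


def reorder(morphemes: Sequence[Morpheme], rules: Sequence[Rule] = RULES) -> List[Morpheme]:
    """Scan left to right and apply the first rule whose tag pattern matches."""
    out: List[Morpheme] = []
    i = 0
    while i < len(morphemes):
        for pattern, order in rules:
            window = morphemes[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                # Emit the matched morphemes in the permuted order.
                out.extend(window[j] for j in order)
                i += len(pattern)
                break
        else:
            # No rule matched at this position; keep the morpheme in place.
            out.append(morphemes[i])
            i += 1
    return out


# Illustrative input: "학교에 가다" segmented into morphemes.
sentence = [("학교", "NNG"), ("에", "JKB"), ("가", "VV"), ("다", "EF")]
reordered = reorder(sentence)
```

In this toy example the particle 에 is moved before 학교, so the reordered sequence starts with the particle, roughly parallel to English "to school". In the approach the abstract describes, the rules are generated automatically from alignment and context data rather than written by hand as they are here.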
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.