Korean Sentence Complexity Reduction for Machine Translation

DC Field: Value
dc.contributor.advisor: 신효필
dc.contributor.author: Luke Bates
dc.date.accessioned: 2017-07-19T09:47:03Z
dc.date.available: 2017-07-19T09:47:03Z
dc.date.issued: 2017-02
dc.identifier.other: 000000141927
dc.identifier.uri: https://hdl.handle.net/10371/131962
dc.description: Thesis (Master's) -- Seoul National University Graduate School: Department of Linguistics, February 2017. 신효필.
dc.description.abstract: Text simplification as a preprocessing step for improving natural language processing systems has a long history of research on European languages, yet no research has taken Korean as its object of study. At the same time, there is great demand for comprehensible Korean-to-English machine translation, but because the two languages differ so greatly, machine translation often fails to produce fluent results.
To improve translation quality when Korean is the source language, this study designed, constructed, and implemented the first-ever rule-based Korean complexity reduction system. The system relies on a unique technique termed "phrase-grouping and generalization of nuance structures," which serves as a disambiguation tool for Korean. This technique has potential applications in all languages and in additional natural language processing tasks. In addition, to establish which complexity reduction operations and combinations of operations generate fluent Korean and improved machine translation output, a unique factorial approach to simplification generation was implemented.
To assess the output of the proposed system, two evaluations were conducted in parallel: Korean native speakers evaluated the simplified Korean text, and English native speakers evaluated the translations. The translation systems used in this study were Google Translate and Moses, both statistical machine translation systems, and Naver Translate, a neural machine translation system. This is the first research to conduct experiments on the interaction of text simplification and neural networks. Additionally, no known research has analyzed output from three machine translation systems simultaneously.
In general, the proposed system generated relatively fluent Korean, though because simplifications were generated factorially, sentence quality usually began to deteriorate after more than one simplification operation. As a preprocessing step for machine translation, on the other hand, the proposed system consistently improved translation quality for all three systems when up to two simplifications were performed.
For the statistical machine translation systems used in this study, more than two simplifications degraded not only Korean sentence quality but also translation quality. For Naver Translate, the neural machine translation system, however, even three simplifications improved translations according to the evaluators. This study therefore emphasizes the need for more research on text simplification as the field of natural language processing transitions to neural network-based approaches and applications.
dc.description.tableofcontents:
1. Introduction: Text Simplification 1
1.1 Korean Text Simplification 4
1.1.1 Korean Sentence Complexity Reduction 4
1.2 Research Objectives 5
1.3 Research Outline 6
2. Literature Review 8
2.1 Text Simplification 8
2.2 Machine Translation 12
2.2.1 Rule-based Machine Translation 12
2.2.2 Statistical Machine Translation 13
2.2.3 Neural Machine Translation 15
2.2.4 Hybrid Machine Translation 17
2.3 Text Simplification for Machine Translation 18
2.4 Automated Asian Text Simplification 21
2.4.1 Automated Japanese Simplification 22
2.4.2 Automated Korean Simplification 22
3. Samsung Machine Translation Corpus 24
3.1 Corpus Description 24
3.2 Corpus Issues 26
3.2.1 Korean Issues 26
3.2.2 English Issues 27
4. Korean Sentence Complexity Reduction System 29
4.1 Rule Creation and Description 30
4.1.1 Korean Coordination Simplification 30
4.1.2 Contrastive Coordination Simplification 34
4.1.3 Indirect Sentence Reduction 35
4.1.4 Gerund Reduction 38
4.1.5 Cause and Effect Reduction 40
4.2 Factorial Reduction 42
4.3 Phrase-grouping and Generalization 45
4.4 System Coverage 47
4.5 System Architecture 50
4.6 System Evaluation 52
5. Pilot Study 55
5.1 Methodology 55
5.2 Simplification Results 58
5.3 Simplification + Machine Translation Results 62
5.4 Pilot Study Discussion 69
6. Experiment 71
6.1 Methodology 71
6.2 Simplification Results 72
6.3 Simplification + Machine Translation Results 75
6.4 Naver Translate 84
6.5 Experiment Discussion 88
7. Conclusion 91
References 94
Abstract in Korean 99
dc.format: application/pdf
dc.format.extent: 2321497 bytes
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: Seoul National University Graduate School
dc.subject: text simplification
dc.subject: machine translation
dc.subject: korean
dc.subject: natural language processing
dc.subject: neural networks
dc.subject.ddc: 401
dc.title: Korean Sentence Complexity Reduction for Machine Translation
dc.type: Thesis
dc.description.degree: Master
dc.citation.pages: 109
dc.contributor.affiliation: College of Humanities, Department of Linguistics
dc.date.awarded: 2017-02
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.