
Automatic Generation of Morpheme Level Reordering Rules for Korean to English Machine Translation

Authors
Breanna Castellani
Advisor
신효필
Major
College of Humanities, Department of Linguistics
Issue Date
2017
Publisher
Graduate School, Seoul National University
Keywords
automatic rule generation; Korean-English MT; hybrid machine translation; rule-based preprocessing; morpheme reordering
Description
Thesis (Master's) -- Graduate School, Seoul National University: Department of Linguistics, February 2017. Advisor: 신효필.
Abstract
Word order is one of the main challenges that machine translation systems must overcome when dealing with any linguistically divergent language pair, such as Korean and English. Statistical machine translation (SMT) models often handle long-distance reordering poorly because of the distortion penalties they impose. Rule-based systems, on the other hand, are often costly to build and maintain, in both time and money. The present research proposes a new hybrid approach for Korean to English machine translation. While previous approaches have treated the word as the basic unit of translation for this language pair, our approach uses the morpheme. We begin by developing a classification model that disambiguates Korean functional morphemes based on alignment pairs and context feature data. We then apply this model in a preprocessing step, following our automatically generated rules, to reorder the morphemes so that they better match English sentence structure. After retraining Moses, our statistical translation system, we observe an improvement in overall translation quality. When the SMT system's internal lexicalized reordering is restricted, the BLEU score increases by 3.5% over the SMT-only baseline. When decoding-time reordering is not limited, the increase is even greater, at 4.42%. We also find evidence suggesting that our changes enable Moses to perform additional reordering operations at decoding time that it was previously unable to perform.
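The abstract does not specify the form of the generated rules. Purely as an illustration, the sketch below shows what applying one morpheme-level reordering rule in a preprocessing step might look like. It assumes Sejong-style POS tags and a single hand-written SOV-to-SVO rule; the names `chunk` and `reorder_svo` and the tag sets `SUBJECT_MARKERS` and `PREDICATE_HEADS` are hypothetical. This is not the thesis's rule generator, and the fixed tag lists stand in for the classification model that, per the abstract, decides how functional morphemes are handled.

```python
from typing import Callable, List, Optional, Tuple

# A morpheme is (surface form, POS tag). Sejong-style tags are assumed here,
# e.g. NP = pronoun, NNG = common noun, JX = auxiliary (topic) particle,
# JKO = object case particle, VV = verb stem, EP/EF = verbal endings.
Morpheme = Tuple[str, str]

SUBJECT_MARKERS = {"JKS", "JX"}  # hypothetical: particles taken to close a subject chunk
PREDICATE_HEADS = {"VV", "VA"}   # hypothetical: stems taken to open a predicate chunk


def is_functional(tag: str) -> bool:
    """Particles (J*) and endings (E*) attach to the preceding content morpheme."""
    return tag.startswith("J") or tag.startswith("E")


def chunk(sentence: List[Morpheme]) -> List[List[Morpheme]]:
    """Group each content morpheme with the functional morphemes that follow it.

    Simplification: every content morpheme starts a new chunk, so compound
    nouns such as 사과/NNG 주스/NNG end up in separate chunks.
    """
    chunks: List[List[Morpheme]] = []
    for surface, tag in sentence:
        if chunks and is_functional(tag):
            chunks[-1].append((surface, tag))
        else:
            chunks.append([(surface, tag)])
    return chunks


def first_index(chunks: List[List[Morpheme]],
                test: Callable[[List[Morpheme]], bool]) -> Optional[int]:
    """Index of the first chunk satisfying `test`, or None."""
    return next((i for i, c in enumerate(chunks) if test(c)), None)


def reorder_svo(sentence: List[Morpheme]) -> List[Morpheme]:
    """Move the predicate chunk to directly after the subject chunk (SOV -> SVO).

    Returns the sentence unchanged when the rule does not apply.
    """
    chunks = chunk(sentence)
    subj = first_index(chunks, lambda c: c[-1][1] in SUBJECT_MARKERS)
    pred = first_index(chunks, lambda c: c[0][1] in PREDICATE_HEADS)
    if subj is None or pred is None or pred <= subj + 1:
        return sentence  # no subject, no predicate, or already adjacent
    reordered = (chunks[: subj + 1] + [chunks[pred]]
                 + chunks[subj + 1 : pred] + chunks[pred + 1 :])
    return [m for c in reordered for m in c]


if __name__ == "__main__":
    # 나는 사과를 먹었다. ("I ate an apple.")
    sent = [("나", "NP"), ("는", "JX"), ("사과", "NNG"), ("를", "JKO"),
            ("먹", "VV"), ("었", "EP"), ("다", "EF"), (".", "SF")]
    print(" ".join(s for s, _ in reorder_svo(sent)))
```

Run on 나는 사과를 먹었다 ("I ate an apple"), the script prints `나 는 먹 었 다 사과 를 .`, placing subject, verb, and object in English order before the text reaches the SMT system. Working on morphemes rather than whole words lets a rule keep each ending attached to its stem while moving the chunk as a unit.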
Language
English
URI
http://hdl.handle.net/10371/131963
Appears in Collections:
College of Humanities (인문대학) > Linguistics (언어학과) > Theses (Master's Degree_언어학과)

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
