Morpheme-based Efficient Korean Word Embedding : 형태소 기반 효율적인 한국어 단어 임베딩
- Authors
- Advisor
- 권태경
- Major
- Department of Computer Science and Engineering, College of Engineering
- Issue Date
- 2018-02
- Publisher
- Graduate School, Seoul National University
- Keywords
- word embedding ; morpheme embedding ; Korean ; neural network ; word2vec
- Description
- Master's thesis, Graduate School, Seoul National University: Department of Computer Science and Engineering, College of Engineering, February 2018. Advisor: 권태경.
- Abstract
- Word embedding is a strategy that maps each word to a vector in a continuous vector space. It is the starting point of most natural language processing tasks and greatly affects their performance. Word2vec and GloVe are among the most popular and widely used word embedding models. However, these models can learn neither the shared structure of words nor sub-word meanings. This is a serious limitation for morphologically rich languages such as Korean.
In this paper, we propose a new model that extends the previous skip-gram model to learn sub-word information. The model defines each word vector as the sum of its morpheme vectors and thus learns the vectors of the morphemes themselves. To evaluate the efficiency of our embedding, we conducted a word similarity test and a word analogy test. Furthermore, by feeding our trained vectors into a previous text classification model, we measured how much the performance actually improved.
- Language
- English
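The core idea in the abstract, composing each word's vector as the sum of its morpheme vectors inside a skip-gram-style model, can be sketched roughly as follows. This is a minimal illustration with made-up segmentations, random vectors, and hypothetical names, not the thesis' actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Assumed morpheme segmentation for a few Korean words (illustrative only).
segmentation = {
    "먹었다": ["먹", "었", "다"],   # "ate": stem + past-tense marker + ending
    "먹는다": ["먹", "는", "다"],   # "eats": stem + present marker + ending
    "갔다":   ["가", "았", "다"],   # "went"
}

# One trainable vector per morpheme; words sharing a morpheme (e.g. the
# stem "먹") share that part of their representation during training.
morphemes = sorted({m for ms in segmentation.values() for m in ms})
morph_vec = {m: rng.standard_normal(dim) for m in morphemes}

def word_vector(word):
    """Word vector = sum of the word's morpheme vectors."""
    return sum(morph_vec[m] for m in segmentation[word])

def score(target, context_vec):
    """Skip-gram scores a (target, context) pair by a dot product;
    here the target side is the morpheme-composed word vector."""
    return float(word_vector(target) @ context_vec)

# Example: similarity between two inflections of the same verb stem.
sim = score("먹었다", word_vector("먹는다"))
```

Because gradients flow through the shared morpheme vectors rather than independent word vectors, forms like "먹었다" and "먹는다" reinforce a common stem representation, which is the property the abstract claims is missing from plain word2vec and GloVe.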