Morpheme-based Efficient Korean Word Embedding: 형태소 기반 효율적인 한국어 단어 임베딩

Cited 0 times in Web of Science; cited 0 times in Scopus
Authors

이동준

Advisor
권태경
Major
Department of Computer Science and Engineering, College of Engineering
Issue Date
2018-02
Publisher
Graduate School, Seoul National University
Keywords
word embedding; morpheme embedding; Korean; neural network; word2vec
Description
Thesis (Master's) -- Graduate School, Seoul National University: Department of Computer Science and Engineering, College of Engineering, February 2018. Advisor: 권태경.
Abstract
Word embedding is a strategy of mapping each word to a vector in a continuous vector space. It is the starting point of natural language processing tasks and greatly impacts their performance. Word2vec and GloVe are among the most popular and widely used word embedding models. However, these models are unable to learn the shared structure of words or the meanings of sub-word units. This is a serious limitation for morphologically rich languages such as Korean.
In this paper, we propose a new model that extends the previous skip-gram model to learn sub-word information. The model defines each word vector as the sum of its morpheme vectors and thereby learns the vectors of the morphemes. To evaluate the efficiency of our embedding, we conducted a word similarity test and a word analogy test. Furthermore, by feeding our trained vectors into a previous text classification model, we measured how much the performance was actually enhanced.
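The core idea described above — a word vector defined as the sum of its morpheme vectors, trained with skip-gram — can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's implementation: the toy vocabulary, morpheme segmentation, dimensions, and learning rate are all invented for the example, and negative sampling is reduced to its bare gradient update.

```python
import numpy as np

# Hypothetical toy data: each word is pre-segmented into morphemes.
# (Words, segmentations, and sizes are illustrative, not from the thesis.)
morphemes = ["먹", "었", "다", "학교", "에"]
m2i = {m: i for i, m in enumerate(morphemes)}
words = {"먹었다": ["먹", "었", "다"], "학교에": ["학교", "에"]}
w2i = {w: i for i, w in enumerate(words)}

dim = 8
rng = np.random.default_rng(0)
M_in = rng.normal(scale=0.1, size=(len(morphemes), dim))  # morpheme (input) vectors
C_out = rng.normal(scale=0.1, size=(len(words), dim))     # context (output) vectors

def word_vec(word):
    """A word vector is the sum of its morpheme vectors."""
    return M_in[[m2i[m] for m in words[word]]].sum(axis=0)

def sgns_step(center, context, negatives, lr=0.1):
    """One skip-gram-with-negative-sampling update.

    Because the center word's vector is a sum of morpheme vectors,
    the gradient flows back to every morpheme it contains, so
    morphemes shared across words share learned structure.
    """
    v = word_vec(center)
    grad_v = np.zeros(dim)
    for c, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = C_out[w2i[c]]
        p = 1.0 / (1.0 + np.exp(-(v @ u)))  # sigmoid of the dot product
        g = p - label
        grad_v += g * u
        C_out[w2i[c]] -= lr * g * v
    for i in (m2i[m] for m in words[center]):
        M_in[i] -= lr * grad_v  # update each morpheme of the center word
```

Each training step pulls the morpheme vectors of the center word toward the context vector of an observed neighbor (and away from sampled negatives), so rare or unseen inflected forms can still be composed from morphemes seen elsewhere.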
Language
English
URI
https://hdl.handle.net/10371/141558

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
