Morpheme-based Efficient Korean Word Embedding: 형태소 기반 효율적인 한국어 단어 임베딩

Cited 0 times in Web of Science; cited 0 times in Scopus
Authors

이동준

Advisor
권태경
Major
Department of Computer Science and Engineering, College of Engineering
Issue Date
2018-02
Publisher
Graduate School, Seoul National University
Keywords
word embedding; morpheme embedding; Korean; neural network; word2vec
Description
Thesis (Master's) -- Graduate School, Seoul National University: Department of Computer Science and Engineering, College of Engineering, February 2018. Advisor: 권태경.
Abstract
Word embedding is a strategy of mapping each word to a vector in a continuous vector space. It is the starting point of natural language processing tasks and greatly impacts their performance. Word2vec and GloVe are among the most popular and widely used word embedding models. However, these models are unable to learn the shared structure of words or the meanings of sub-word units. This is a serious limitation for morphologically rich languages such as Korean.
In this paper, we propose a new model that extends the previous skip-gram model to learn sub-word information. The model defines each word vector as the sum of its morpheme vectors and thereby learns the vectors of the morphemes. To evaluate the efficiency of our embedding, we conducted a word similarity test and a word analogy test. Furthermore, by feeding our trained vectors into a previous text classification model, we measured how much the performance was actually enhanced.
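The core idea described above — a word vector defined as the sum of its morpheme vectors, trained with skip-gram — can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's implementation: the toy vocabulary, morpheme segmentation, dimensions, and learning rate are all invented for the example, and negative sampling is reduced to its bare gradient update.

```python
import numpy as np

# Hypothetical toy data: each word is pre-segmented into morphemes.
# (Words, segmentations, and sizes are illustrative, not from the thesis.)
morphemes = ["먹", "었", "다", "학교", "에"]
m2i = {m: i for i, m in enumerate(morphemes)}
words = {"먹었다": ["먹", "었", "다"], "학교에": ["학교", "에"]}
w2i = {w: i for i, w in enumerate(words)}

dim = 8
rng = np.random.default_rng(0)
M_in = rng.normal(scale=0.1, size=(len(morphemes), dim))  # morpheme (input) vectors
C_out = rng.normal(scale=0.1, size=(len(words), dim))     # context (output) vectors

def word_vec(word):
    """A word vector is the sum of its morpheme vectors."""
    return M_in[[m2i[m] for m in words[word]]].sum(axis=0)

def sgns_step(center, context, negatives, lr=0.1):
    """One skip-gram-with-negative-sampling update.

    Because the center word's vector is a sum of morpheme vectors,
    the gradient flows back to every morpheme it contains, so
    morphemes shared across words share learned structure.
    """
    v = word_vec(center)
    grad_v = np.zeros(dim)
    for c, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = C_out[w2i[c]]
        p = 1.0 / (1.0 + np.exp(-(v @ u)))  # sigmoid of the dot product
        g = p - label
        grad_v += g * u
        C_out[w2i[c]] -= lr * g * v
    for i in (m2i[m] for m in words[center]):
        M_in[i] -= lr * grad_v  # update each morpheme of the center word
```

Each training step pulls the morpheme vectors of the center word toward the context vector of an observed neighbor (and away from sampled negatives), so rare or unseen inflected forms can still be composed from morphemes seen elsewhere.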
Language
English
URI
https://hdl.handle.net/10371/141558

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
