A Comparison of Oversampling Effects on Imbalanced Topic Classification of Korean Texts : 한국어 주제 분류에서 오버 샘플링 효과 비교

Authors

서이레

Advisor
김청택
Major
Interdisciplinary Program in Cognitive Science, College of Humanities
Issue Date
2017-08
Publisher
Graduate School, Seoul National University
Keywords
Imbalanced data; Korean text analysis; oversampling; SMOTE; supervised learning; topic classification
Description
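Thesis (Master's) -- Graduate School, Seoul National University, Interdisciplinary Program in Cognitive Science, College of Humanities, August 2017. 김청택.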
Abstract
Imbalanced data is a widely acknowledged problem in supervised classification tasks. Oversampling is one way to address it, and many oversampling methods have been proposed. While the effect of oversampling has been widely studied for other languages, studies comparing oversampling methods on Korean texts are scarce. This study compares the effects of oversampling methods on the task of classifying Korean internet news articles. It finds that support vector machines (SVM) and logistic regression responded stably and performed best when paired with borderline-SMOTE2 under imbalanced conditions.
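For readers unfamiliar with the oversampling family named in the abstract, the following is a minimal sketch of the basic SMOTE idea: each synthetic minority example is placed on the line segment between a minority point and one of its nearest minority neighbours. It is an illustration only and is not the thesis's implementation. It shows plain SMOTE rather than borderline-SMOTE2, and the function name `smote`, its parameters, and the toy vectors are all assumptions made for this example.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy feature vectors (hypothetical, not data from the thesis).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10  # assumed size of the majority class

# Generate enough synthetic points to match the majority class size.
new_points = smote(minority, majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

In a text classification setting the vectors would typically be document features such as term weights, and the balanced training set would then be passed to a classifier such as an SVM or logistic regression. Borderline variants differ mainly in which minority points they choose as the base for interpolation.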
Language
English
URI
https://hdl.handle.net/10371/138054

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
