
A Comparison of Oversampling Effects on Imbalanced Topic Classification of Korean Texts
Korean title: A Comparison of Oversampling Effects in Korean Topic Classification

Authors
서이레
Advisor
김청택
Major
Interdisciplinary Program in Cognitive Science, College of Humanities
Issue Date
2017
Publisher
Graduate School, Seoul National University
Keywords
Imbalanced data; Korean text analysis; oversampling; SMOTE; supervised learning; topic classification
Description
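Thesis (Master's), Graduate School, Seoul National University: Interdisciplinary Program in Cognitive Science, College of Humanities, August 2017. Advisor: 김청택.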
Abstract
Imbalanced data is a widely acknowledged problem in supervised classification tasks. Oversampling is one way to address it, and many oversampling methods have been proposed. While the effects of oversampling have been studied extensively for other languages, studies comparing oversampling methods on Korean text are scarce. This study compares the effects of oversampling methods on the task of classifying Korean internet news articles. It finds that, under imbalanced conditions, support vector machines (SVM) and logistic regression behaved stably and performed best when paired with borderline-SMOTE2.
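The abstract names borderline-SMOTE2 without describing it. As a rough illustration only, the following is a minimal, standard-library sketch of the idea as it is commonly described: a minority sample counts as "DANGER" when at least half, but not all, of its m nearest neighbours belong to the majority class. Only DANGER samples are oversampled. Synthetic points are interpolated toward random minority neighbours with a gap in [0, 1], and, as the borderline-2 addition, toward the nearest majority sample with a gap in [0, 0.5]. The function names, the toy data, and the parameters m, k, per_sample and seed are hypothetical. This record does not state the thesis's actual feature representation, neighbour counts, or implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, candidates, n):
    """Return the n candidates closest to `point`, skipping `point` itself (by identity)."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: euclidean(point, c))[:n]


def interpolate(a, b, gap):
    """Point on the segment from a toward b: a + gap * (b - a)."""
    return [x + gap * (y - x) for x, y in zip(a, b)]


def borderline_smote2(minority, majority, m=5, k=3, per_sample=2, seed=0):
    """Hypothetical sketch of borderline-SMOTE2; returns new synthetic minority samples."""
    rng = random.Random(seed)
    labelled = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    synthetic = []
    for p in minority:
        # Step 1: m nearest neighbours of p in the whole training set.
        neigh = sorted((item for item in labelled if item[0] is not p),
                       key=lambda item: euclidean(p, item[0]))[:m]
        n_major = sum(1 for _, label in neigh if label == 0)
        # Step 2: p is DANGER if at least half, but not all, neighbours are majority.
        # All-majority neighbourhoods are treated as noise, mostly-minority ones as safe.
        if not (m / 2 <= n_major < m):
            continue
        # Step 3a (as in SMOTE / borderline-SMOTE1): interpolate toward random
        # minority neighbours with a gap drawn from [0, 1].
        minority_nn = nearest(p, minority, k)
        for _ in range(per_sample if minority_nn else 0):
            q = rng.choice(minority_nn)
            synthetic.append(interpolate(p, q, rng.uniform(0.0, 1.0)))
        # Step 3b (the borderline-2 addition): also interpolate toward the nearest
        # majority sample, with a gap in [0, 0.5] so the new point stays closer to p.
        r = nearest(p, majority, 1)[0]
        synthetic.append(interpolate(p, r, rng.uniform(0.0, 0.5)))
    return synthetic


# Toy 2-D data (hypothetical): a compact minority square plus one minority point
# sitting next to a majority cluster.
minority = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [3, 0]]
majority = [[3.4, 0], [3.6, 0], [3.5, 0.1], [3.5, -0.1], [10, 10], [10, 11], [11, 10]]
synth = borderline_smote2(minority, majority)
# Only [3, 0] is flagged as DANGER, so 2 + 1 = 3 synthetic samples are created.
print(len(synth))
```

In this toy case the points in the minority square are "safe" and are left alone. Only the minority point bordering the majority cluster is oversampled. In practice a maintained implementation, such as imbalanced-learn's `BorderlineSMOTE` with `kind="borderline-2"`, would normally be preferred. Its sampling details may differ from this sketch, and whether the thesis used it is not stated here.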
Language
English
URI
http://hdl.handle.net/10371/138054
Appears in Collections:
College of Humanities › Program in Cognitive Science › Theses (Master's Degree, Program in Cognitive Science)

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
