A Comparison of Oversampling Effects on Imbalanced Topic Classification of Korean Texts : 한국어 주제 분류에서 오버 샘플링 효과 비교
- Authors
- Advisor
- 김청택
- Major
- College of Humanities, Interdisciplinary Program in Cognitive Science
- Issue Date
- 2017-08
- Publisher
- Seoul National University Graduate School
- Keywords
- Imbalanced data ; Korean text analysis ; oversampling ; SMOTE ; supervised learning ; topic classification
- Description
- Thesis (Master's) -- Seoul National University Graduate School, College of Humanities, Interdisciplinary Program in Cognitive Science, August 2017. 김청택.
- Abstract
- Imbalanced data is a widely acknowledged problem in supervised classification tasks. Oversampling is one way to address it, and many oversampling methods have been proposed. While the effects of oversampling have been widely studied for other languages, studies comparing oversampling methods on Korean texts are scarce. This study compares the effects of oversampling methods on the task of classifying Korean internet news articles by topic. It finds that support vector machines (SVM) and logistic regression behaved stably and performed best when paired with borderline-SMOTE2 under imbalanced conditions.
- Language
- English
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
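
The abstract refers to SMOTE-family oversampling. As a rough illustration of the basic idea only, the sketch below generates synthetic minority-class points by interpolating between a sample and one of its nearest minority neighbours. It is plain SMOTE, not the borderline-SMOTE2 variant the thesis reports on, and it is not the thesis's actual pipeline. The function name `smote`, its parameters, and the 2-D toy vectors are all hypothetical and stand in for whatever document features a real experiment would use.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; assumes len(minority) >= 2.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy 2-D vectors standing in for minority-class document features (assumed data).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies. Borderline variants restrict the choice of base points to minority samples near the class boundary, which this sketch does not attempt.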