A Comparison of Oversampling Effects on Imbalanced Topic Classification of Korean Texts (Korean title: 한국어 주제 분류에서 오버 샘플링 효과 비교, "A Comparison of Oversampling Effects in Korean Topic Classification")
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 김청택 (Cheongtag Kim) | - |
dc.contributor.author | 서이레 (Yirey Suh) | - |
dc.date.accessioned | 2017-10-31T08:30:45Z | - |
dc.date.available | 2017-10-31T08:30:45Z | - |
dc.date.issued | 2017-08 | - |
dc.identifier.other | 000000146451 | - |
dc.identifier.uri | https://hdl.handle.net/10371/138054 | - |
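dc.description | Thesis (Master's), Graduate School of Seoul National University, College of Humanities, Interdisciplinary Program in Cognitive Science, August 2017. Advisor: 김청택 (Cheongtag Kim). | - |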
dc.description.abstract | Imbalanced data is a widely acknowledged problem in supervised classification tasks. Oversampling is one way to address it, and many oversampling methods have been proposed. While the effect of oversampling has been studied extensively for other languages, studies comparing oversampling methods on Korean texts are scarce. This study compares the effects of oversampling methods on the task of classifying Korean internet news articles. It finds that, under imbalanced conditions, support vector machines (SVM) and logistic regression performed stably and achieved the best results when paired with borderline-SMOTE2. | - |
dc.description.tableofcontents | Introduction (p. 1); Machine Learning and Korean Text Classification (p. 1); A Brief Introduction of the Main Classifiers (p. 2); The Problem of Imbalanced Data (p. 5); Approaches to Solve the Problem of Imbalanced Data (p. 6); Literature Review (p. 10); Imbalanced Data in Korean Studies (p. 10); Characteristics of Text Data (p. 11); Characteristics of the Korean Language (p. 13); Research Question (p. 17); Introduction to SMOTE Methods (p. 18); SMOTE (p. 18); Borderline-SMOTE (p. 19); SVM-SMOTE (p. 22); ADASYN (p. 23); A Framework for Comparing the Effectiveness of SMOTE Methods (p. 28); Relevant Factors in Classification Tasks (p. 28); Performance Measures (p. 29); Implementation (p. 31); Text Preparations (p. 31); Method (p. 34); Experiments (p. 36); Study 1: Articles with High Cosine Similarities (p. 36); Study 2: Articles with Low Cosine Similarity (p. 45); Discussion and Conclusion (p. 54); Discussion (p. 54); Conclusion (p. 58); References (p. 59); Appendix (p. 67); Korean Abstract (국문 초록) (p. 80) | - |
dc.format | application/pdf | - |
dc.format.extent | 1347027 bytes | - |
dc.format.medium | application/pdf | - |
dc.language.iso | en | - |
dc.publisher | 서울대학교 대학원 (Graduate School, Seoul National University) | - |
dc.subject | Imbalanced data | - |
dc.subject | Korean text analysis | - |
dc.subject | oversampling | - |
dc.subject | SMOTE | - |
dc.subject | supervised learning | - |
dc.subject | topic classification | - |
dc.subject.ddc | 153 | - |
dc.title | A Comparison of Oversampling Effects on Imbalanced Topic Classification of Korean Texts | - |
dc.title.alternative | 한국어 주제 분류에서 오버 샘플링 효과 비교 (A Comparison of Oversampling Effects in Korean Topic Classification) | - |
dc.type | Thesis | - |
dc.contributor.AlternativeAuthor | Yirey Suh | - |
dc.description.degree | Master | - |
dc.contributor.affiliation | 인문대학 협동과정 인지과학전공 (College of Humanities, Interdisciplinary Program in Cognitive Science) | - |
dc.date.awarded | 2017-08 | - |
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.