Countermeasures for Dataset Challenges in Deep Learning: Enhancing Robustness for Data Imbalance, Noisy Labels, and Stress-test

박슬기

Seoul National University Library

Countermeasures for Dataset Challenges in Deep Learning: Enhancing Robustness for Data Imbalance, Noisy Labels, and Stress-test : Robust Deep Learning Strategies for Solving Real-World Data Problems: Data Imbalance, Label Noise, and Stress Testing

DC Field	Value	Language
dc.contributor.advisor	최진영	-
dc.contributor.author	박슬기	-
dc.date.accessioned	2023-11-20T04:20:55Z	-
dc.date.available	2023-11-20T04:20:55Z	-
dc.date.issued	2023	-
dc.identifier.other	000000177390	-
dc.identifier.uri	https://hdl.handle.net/10371/196409	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000177390	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2023. 8. 최진영.	-
dc.description.abstract	Deep learning has shown remarkable success in solving a wide range of AI problems. However, when deployed in real-world scenarios, AI models are often challenged by noisy labels, imbalanced data, and the demands of robustness testing. These challenges can significantly degrade the performance and robustness of machine learning models. This thesis proposes strategies for addressing them and improving the robustness of deep learning models. Specifically, it presents novel methods for handling noisy labels and imbalanced data. The proposed methods are evaluated on widely used benchmark datasets, and the results show that they can significantly improve the performance and robustness of deep learning models. Furthermore, the thesis introduces a new benchmark dataset, RoCOCO, for stress-testing the robustness of multi-modal models. The dataset is designed to simulate real-world perturbations, providing a more realistic and challenging testbed for evaluating the robustness of AI models. Overall, the research presented in this thesis contributes to the development of robust deep learning techniques that better handle the challenges of deploying machine learning models in real-world scenarios.	-
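dc.description.abstract	Deep learning has achieved remarkable success in solving a wide range of artificial intelligence problems. In real-world deployment, however, the generalization performance and robustness of machine learning models are often challenged by problems such as data imbalance, incorrect (noisy) labels, and reliability testing. This thesis proposes strategies for addressing these problems and improving the robustness of deep learning models. Specifically, it presents new methods for handling data imbalance and noisy labels. The proposed methods were evaluated on the most popular benchmark datasets, and the results show that they can greatly improve the generalization performance and robustness of deep learning models. In addition, the thesis introduces RoCOCO, a new benchmark dataset for stress-testing the robustness of multi-modal models. The dataset simulates real-world variations, providing a more realistic and challenging testbed for evaluating the robustness of AI models. In conclusion, the research presented in this thesis contributes to the development of robust deep learning techniques that can better handle the challenges that arise when machine learning models are applied in real-world settings. Future work, however, should strive to overcome the limitations of the proposed methods and to evaluate robustness in a wider range of realistic scenarios.	-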
dc.description.tableofcontents	1 INTRODUCTION 1
2 RELATED WORK 4
  2.1 Challenges from Imbalanced Data 4
    2.1.1 Re-weighting approach 5
    2.1.2 Data-level approach 5
    2.1.3 Meta-learning approach 7
    2.1.4 Other long-tailed methods 7
    2.1.5 Data Augmentation and Mixup Methods 8
  2.2 Challenges from Noisy Labels 8
    2.2.1 Noise-cleaning Approach 9
    2.2.2 Noise-tolerant Approach 9
  2.3 Challenges from Robustness Test 10
    2.3.1 Unimodal Robustness Test 10
    2.3.2 Multimodal Robustness Test 11
    2.3.3 Image-Text Matching Methods 11
    2.3.4 Image-Text Matching Datasets 12
  2.4 Influence function 12
3 Influence-Balanced Loss for Imbalanced Data 14
  3.1 Overview 14
  3.2 Influence-balanced Loss 16
    3.2.1 Key Idea of Proposed Method 16
    3.2.2 Influence Function 18
    3.2.3 Influence-balanced weighting factor 18
    3.2.4 Influence-Balanced Loss 19
    3.2.5 Influence-Balanced Class-wise Re-weighting 20
    3.2.6 Influence-balanced Training Scheme 21
  3.3 Experiments 22
    3.3.1 Experimental Settings 22
    3.3.2 Analysis 24
    3.3.3 Comparison of Class-Wise Accuracy 28
    3.3.4 Comparison with State-of-the-Art 31
  3.4 Summary 32
4 Context-rich Minority Oversampling for Imbalanced Data 33
  4.1 Overview 33
  4.2 Context-rich Minority Oversampling 36
    4.2.1 Algorithm 36
    4.2.2 Minor-class-weighted Distribution Q 37
    4.2.3 Regularization Effect of CMO 38
  4.3 Experiments 39
    4.3.1 Experimental Settings 39
    4.3.2 Long-tailed classification benchmarks 41
    4.3.3 Analysis 48
  4.4 Summary 51
5 Influential Rank: Post-training for Noisy Labels 53
  5.1 Overview 53
  5.2 Influential Rank 55
    5.2.1 Intuition 57
    5.2.2 Overfitting Scores 58
    5.2.3 Post-processing with Influential Rank 59
    5.2.4 Example: A Binary Classification 61
  5.3 Experiments 63
    5.3.1 Experimental Settings 63
    5.3.2 Robustness Comparison 66
    5.3.3 Comparison with Small-loss Removal 69
    5.3.4 Training with Longer Epochs 72
    5.3.5 Validity of OSD 72
    5.3.6 Effects of hyperparameter 73
    5.3.7 Effects of Multi-round Post-training 75
    5.3.8 Distribution of OSD 76
    5.3.9 Noisy Label Detection with Influential Rank 77
    5.3.10 Experimental results after one-round 78
    5.3.11 Detector for Video Data Cleaning 79
    5.3.12 Regularizer for Performance Boosting 82
  5.4 Summary 83
6 RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models 85
  6.1 Overview 85
  6.2 Robustness-Evaluation Benchmark 87
  6.3 Robustness-Evaluation Benchmark 87
    6.3.1 Observations motivating the proposed approach 87
    6.3.2 Adversarial Image Generation 88
    6.3.3 Adversarial Caption Generation 89
  6.4 Experiments and Results 92
    6.4.1 Experimental setting 92
    6.4.2 Re-evaluation on RoCOCO 93
    6.4.3 Analysis and Discussions 95
    6.4.4 Semantic Contrastive Loss for Adversarial Captions 98
  6.5 Summary 102
7 CONCLUSION 103
Abstract (In Korean) 132	-
dc.format.extent	xiv, 132	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Imbalanced data	-
dc.subject	Long-tail distribution	-
dc.subject	Image classification	-
dc.subject	Oversampling	-
dc.subject	Augmentation	-
dc.subject	Noisy label	-
dc.subject	Robust AI	-
dc.subject	Multi-modal	-
dc.subject	Image-text Matching	-
dc.subject	Stress-test benchmark	-
dc.subject.ddc	621.3	-
dc.title	Countermeasures for Dataset Challenges in Deep Learning: Enhancing Robustness for Data Imbalance, Noisy Labels, and Stress-test	-
dc.title.alternative	Robust Deep Learning Strategies for Solving Real-World Data Problems: Data Imbalance, Label Noise, and Stress Testing	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Seulki Park	-
dc.contributor.department	College of Engineering, Department of Electrical and Computer Engineering	-
dc.description.degree	Ph.D.	-
dc.date.awarded	2023-08	-
dc.contributor.major	Artificial Intelligence	-
dc.identifier.uci	I804:11032-000000177390	-
dc.identifier.holdings	000000000050▲000000000058▲000000177390▲	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)

Files in This Item:

000000177390.pdf 47.54 MB
