A Study on Credit Data Analysis Method Using Missing Data Imputation

임성현


결측치 대치를 활용한 신용데이터 분석방법에 관한 연구 : A Study on Credit Data Analysis Method Using Missing Data Imputation


Authors: 임성현

Advisor: 장원철

Issue Date: 2022

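Publisher: Seoul National University Graduate School

Keywords: missing-data pattern; missing-data imputation; classification using machine learning; class imbalance

Description: Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Department of Statistics, February 2022. Advisor: 장원철.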

Abstract: Listwise deletion removes every record that contains a missing value and analyzes only the remaining complete cases. It is frequently adopted as a response to missing data because it is convenient, but it is likely to cause information loss and biased results. This study therefore investigates whether imputation methods offer an advantage over listwise deletion.

In this thesis, we compare machine learning models trained on complete cases only with models trained on data whose missing values have been imputed, with the aim of showing that statistical imputation performs better than listwise deletion. To discuss this properly, we cover missing-data patterns, machine learning techniques, and methods for handling class imbalance, and we check whether imputation is effective in a classification problem that predicts delinquency from bank customer data.
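The contrast between the two strategies described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the feature names `income` and `balance`, and the use of simple mean imputation are hypothetical and are not taken from the thesis, which does not name its imputation methods in this record.

```python
from statistics import mean

# Hypothetical toy records (not from the thesis); None marks a missing value.
records = [
    {"income": 52.0, "balance": 1.2, "delinquent": 0},
    {"income": None, "balance": 0.4, "delinquent": 1},
    {"income": 38.0, "balance": None, "delinquent": 0},
    {"income": 61.0, "balance": 2.5, "delinquent": 0},
    {"income": None, "balance": 0.1, "delinquent": 1},
]
FEATURES = ["income", "balance"]


def listwise_delete(rows):
    """Keep only rows in which every feature is observed."""
    return [r for r in rows if all(r[f] is not None for f in FEATURES)]


def mean_impute(rows):
    """Replace each missing feature value with that feature's observed mean."""
    means = {
        f: mean(r[f] for r in rows if r[f] is not None) for f in FEATURES
    }
    return [
        {**r, **{f: means[f] if r[f] is None else r[f] for f in FEATURES}}
        for r in rows
    ]


complete_cases = listwise_delete(records)
imputed = mean_impute(records)

print(len(complete_cases), "rows remain after listwise deletion")
print(len(imputed), "rows remain after mean imputation")
print(sum(r["delinquent"] for r in complete_cases), "delinquent rows after deletion")
print(sum(r["delinquent"] for r in imputed), "delinquent rows after imputation")
```

In this toy data, listwise deletion keeps only two of the five rows and discards both delinquent cases, while imputation retains all five. This is the kind of information loss and class-imbalance interaction the abstract alludes to, although real credit data and the imputation methods studied in the thesis may behave differently.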

Language: kor

URI: https://hdl.handle.net/10371/182990

https://dcollection.snu.ac.kr/common/orgView/000000170224

Files in This Item:

000000170224.pdf 3.35 MB

Appears in Collections:

College of Natural Sciences (자연과학대학)
- Dept. of Statistics (통계학과)
  - Theses (Master's Degree_통계학과)
