Measuring the Economic Value of Data in Machine Learning

최문석

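Measuring the Economic Value of Data in Machine Learning: A Cooperative Game Approach
Korean title: 협력 게임이론을 활용한 기계학습에서의 데이터 경제적 가치 측정 (Measuring the Economic Value of Data in Machine Learning Using Cooperative Game Theory)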

Authors: 최문석

Advisor: 이덕주

Issue Date: 2021-02

Publisher: Seoul National University Graduate School

Keywords: Machine learning ; Cooperative game theory ; Semi-value ; Data evaluation ; Data value ; Fairness ; Privacy

Description: Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Department of Industrial Engineering, 2021. 2. 이덕주.

Abstract: As machine learning thrives in both academia and industry, data plays a salient role in training and validating models. Meanwhile, few studies have addressed the economic evaluation of data in data exchange markets. The contribution of our work is twofold. First, we use semi-values from cooperative game theory to model the revenue distribution problem. Second, we construct a model consisting of data providers, a firm, and a market that accounts for the privacy and fairness of machine learning. We show that the Banzhaf value can be a reliable alternative to the Shapley value for calculating the contribution of each datum. We also formulate the firm's revenue maximization problem and present a numerical analysis for a binary classifier using classical data examples. Assuming that the firm uses only high-quality data, we analyze its behavior in four scenarios that vary the data's fairness and the cost of compensating data providers for their privacy. The Banzhaf value turns out to be more sensitive to the fairness of the data than the Shapley value. Finally, we analyze the maximum proportion of revenue that the firm gives away to data providers, as well as the range of the number of data points the firm would acquire.
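Abstract (translated from the Korean version): As machine learning advances in both theory and real-world applications, data plays an important role in training and validating AI models. Meanwhile, research on the economic evaluation of data in data exchange markets is still at an early stage. The contribution of this thesis is twofold. First, we apply semi-values, a concept from cooperative game theory, to the problem of distributing model revenue. Second, we propose a model consisting of data providers, a firm, and a market that takes into account the fairness and privacy of AI models. We confirm that the Banzhaf value can serve as an alternative to the Shapley value when computing the contribution of each datum. We also model the firm's revenue maximization problem and present a numerical analysis for a binary classification model using example data. From this analysis, we confirm that the Banzhaf value is more sensitive to the fairness of the data than the Shapley value. Furthermore, under the assumption that the firm uses only high-quality data, we analyze the firm's behavior in four scenarios that vary the fairness of the data and the compensation cost for data providers' privacy. The firm guaranteed data providers a larger share of revenue as the data became fairer, and as the fixed cost decreased, it shared revenue with data providers through the variable cost.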
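
The abstract compares two semi-values, the Shapley value and the Banzhaf value, as ways to score each datum's contribution. The following is a minimal illustrative sketch of that comparison, not the thesis's own implementation. Both values average a player's marginal contribution over coalitions of the other players: the Shapley value weights a coalition of size k by k!(n-k-1)!/n!, while the Banzhaf value weights every coalition equally by 1/2^(n-1). The three players and the utility table `TOY_UTILITY` (standing in for, say, the accuracy of a model trained on a subset of data) are hypothetical numbers chosen only for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy game: three data points "a", "b", "c" and an assumed
# utility (e.g. model accuracy) for every subset. Not taken from the thesis.
PLAYERS = ["a", "b", "c"]
TOY_UTILITY = {
    frozenset(): 0.0,
    frozenset("a"): 0.5,
    frozenset("b"): 0.5,
    frozenset("c"): 0.0,
    frozenset("ab"): 0.6,
    frozenset("ac"): 0.7,
    frozenset("bc"): 0.7,
    frozenset("abc"): 0.9,
}


def toy_utility(coalition):
    """Look up the assumed utility of a coalition (a frozenset of players)."""
    return TOY_UTILITY[coalition]


def semivalues(players, utility):
    """Return exact (shapley, banzhaf) dicts by enumerating all coalitions."""
    n = len(players)
    shapley = {p: 0.0 for p in players}
    banzhaf = {p: 0.0 for p in players}
    w_banzhaf = 1.0 / 2 ** (n - 1)  # uniform weight over all coalitions
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # k = size of the coalition that p joins
            w_shapley = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                marginal = utility(s | {p}) - utility(s)
                shapley[p] += w_shapley * marginal
                banzhaf[p] += w_banzhaf * marginal
    return shapley, banzhaf


shapley, banzhaf = semivalues(PLAYERS, toy_utility)
for p in PLAYERS:
    print(f"{p}: Shapley={shapley[p]:.4f}  Banzhaf={banzhaf[p]:.4f}")
```

In this toy game the Shapley values sum to the utility of the full data set (0.9), because the Shapley value is efficient, whereas the Banzhaf values do not sum to it in general. Exact enumeration costs on the order of 2^n utility evaluations, so realistic data sets would need sampling-based approximations.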

Language: eng

URI: https://hdl.handle.net/10371/175197

https://dcollection.snu.ac.kr/common/orgView/000000164137

Files in This Item:

000000164137.pdf 1.42 MB

Appears in Collections:

College of Engineering/Engineering Practice School (College of Engineering/Graduate School)
- Dept. of Industrial Engineering
  - Theses (Master's Degree, Dept. of Industrial Engineering)
