Classification for Multivariate Binary Data based on Association Rule

김명준

Classification for Multivariate Binary Data based on Association Rule : 연관 규칙에 기반한 다변량 이진 데이터 분류 문제의 해결

DC Field	Value	Language
dc.contributor.advisor	PARK JUN YONG	-
dc.contributor.author	김명준	-
dc.date.accessioned	2023-06-29T02:36:43Z	-
dc.date.available	2023-06-29T02:36:43Z	-
dc.date.issued	2023	-
dc.identifier.other	000000174832	-
dc.identifier.uri	https://hdl.handle.net/10371/194385	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000174832	ko_KR
dc.description	Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Department of Statistics, February 2023. PARK JUN YONG.	-
dc.description.abstract	High-dimensional data refers to data in which the number of variables is greater than or equal to the number of observations. When dealing with high-dimensional data, it is necessary to select the most important variables for further analysis, and association rules can be a useful method when the data are binary. Association rule mining is a data mining technique that extracts meaningful relationships from data. In this thesis, association rules are used to analyze microbial DNA fingerprint data. To this end, the thesis uses association rules as a classifier and compares them with several machine learning models. It also proposes a variable selection algorithm based on association rules. Comparing this approach with other variable selection methods shows that association rules are a useful technique for solving classification problems for multivariate binary data.	-
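dc.description.abstract	High-dimensional data refers to data in which the number of variables is comparable to or greater than the number of observations. When handling high-dimensional data, it is essential to select variables of high importance for subsequent analysis, and when the data consist solely of binary variables, association rules can be a useful method. Association rule mining is a data mining technique that extracts meaningful relationships from data. This thesis uses association rules to analyze microbial DNA fingerprint data. To this end, association rules are first used as a classifier and their performance is compared with that of several machine learning models. Furthermore, a variable selection method based on association rules is proposed and compared with existing variable selection methods. The goal is to confirm that association rules are useful for solving classification problems for multivariate binary data.	-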
dc.description.tableofcontents	Abstract; Chapter 1 Introduction 1; Chapter 2 Classification Methods 3; 2.1 Association Rule 3; 2.1.1 Association Rule 3; 2.1.2 Classification based on Association Rule 5; 2.2 L1 Regularized Logistic Regression 5; 2.3 Random Forest 6; 2.3.1 Decision Tree 6; 2.3.2 Random Forest 7; 2.4 Boosting 10; 2.4.1 AdaBoost 10; 2.4.2 Gradient Boosting 10; 2.4.3 XGBoost 12; Chapter 3 Analysis and Results 13; 3.1 Data Description 13; 3.2 Evaluation Metrics 14; 3.3 Classification 16; 3.4 Variable Selection via Association Rule 17; Chapter 4 Conclusion 22; Appendix A Codes 24; A.1 R Code for Classification based on Association Rule 24; A.2 Python Code for L1 Regularized Logistic Regression 25; A.3 Python Code for Random Forest 27; A.4 Python Code for XGBoost 29; Abstract in Korean (초록) 37	-
dc.format.extent	v, 37	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Classification	-
dc.subject	Association rule	-
dc.subject	High-dimensional data	-
dc.subject	Multivariate binary data	-
dc.subject.ddc	519.5	-
dc.title	Classification for Multivariate Binary Data based on Association Rule	-
dc.title.alternative	연관 규칙에 기반한 다변량 이진 데이터 분류 문제의 해결	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Myungjun Kim	-
dc.contributor.department	College of Natural Sciences, Department of Statistics	-
dc.description.degree	Master's	-
dc.date.awarded	2023-02	-
dc.identifier.uci	I804:11032-000000174832	-
dc.identifier.holdings	000000000049▲000000000056▲000000174832▲	-
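
Illustrative note: the abstract describes using association rules as a classifier for multivariate binary data. The record does not include the thesis's own code (its Appendix A.1 is listed as R code), so the following is only a minimal Python sketch of the general idea under stated assumptions: rules of the form "these binary features are all 1 -> class label" are kept when they meet support and confidence thresholds, and a new observation is assigned the label of the first matching rule. The function names, thresholds, and toy data are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def mine_class_rules(rows, labels, min_support=0.3, min_confidence=0.8, max_len=2):
    """Brute-force search for rules {features all equal to 1} -> label.

    Returns tuples (antecedent, label, support, confidence).
    Hypothetical helper for illustration; not the thesis's algorithm.
    """
    n = len(rows)
    n_features = len(rows[0])
    rules = []
    for k in range(1, max_len + 1):
        for items in combinations(range(n_features), k):
            # Rows in which every feature of the antecedent is present (== 1).
            covered = [i for i, r in enumerate(rows) if all(r[j] == 1 for j in items)]
            if not covered:
                continue
            for label in sorted(set(labels)):
                hits = sum(1 for i in covered if labels[i] == label)
                support = hits / n                 # P(antecedent and label)
                confidence = hits / len(covered)   # P(label | antecedent)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((items, label, support, confidence))
    # Rank by confidence, then support, then shorter antecedents first.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(row, rules, default):
    """Return the label of the first (highest-ranked) rule that matches the row."""
    for items, label, _, _ in rules:
        if all(row[j] == 1 for j in items):
            return label
    return default


# Toy binary data (made up): 6 observations, 3 binary features, 2 classes.
rows = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
labels = ["A", "A", "A", "B", "B", "B"]

rules = mine_class_rules(rows, labels)
print(rules)
print(classify([1, 1, 1], rules, default="B"))
```

On this toy data only feature 0 yields a rule that passes both thresholds. The features that appear in the retained antecedents also hint at how rule mining could double as a variable filter, although the selection algorithm actually proposed in the thesis is not described in this record.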

Appears in Collections:

College of Natural Sciences (자연과학대학)
- Dept. of Statistics (통계학과)
  - Theses (Master's Degree_통계학과)

Files in This Item:

000000174832.pdf 2.10 MB
