Structuring Korean-Language Stereotypes Using Big Data Analysis (빅데이터 분석을 이용한 한국어 고정관념 구조화)

Abstract: 최근 심각해지는 사회갈등의 양상을 이해하기 위해 현재 한국사회에 존재하는 고정관념을 내용에 따라 구체적으로 범주화하고, 각 범주에 대한 문장들로 구성된 대규모 데이터셋을 구축했다. 이를 위해 9개의 온라인 커뮤니티(디시인사이드, 일베, 네이트판, 더쿠, 뽐뿌, 루리웹, 웃긴대학, 워마드, 블라인드)와 4개의 웹사이트(유튜브, 네이버뉴스, 네이버영화, 다음뉴스)로부터 댓글을 수집했고, 크라우드소싱을 통해 댓글에 대한 고정관념 유무, 고정관념 대상, 심리적 친밀감의 평정 자료를 수집했다. 문장의 고정관념 포함 여부를 객관적으로 판단하기 위해 심리적 친밀감을 가중치로 활용하여 평정자 4명의 고정관념 유무 응답을 병합했다.
연구 1에서 고정관념 대상에 대한 응답 데이터를 일부 활용한 Ko-Sentence-Transformer 모델의 미세조정을 통해 응답 데이터를 효과적으로 임베딩했고, 가우시안 혼합 모형과 BERTopic을 사용하여 11개의 대범주와 99개의 소범주를 규명하였다. 11개의 대범주는 사건/사고, 산업/직업, 국제사회, 지역, 대인관계, 인터넷 사이트, 연령, 정치이념, 엔터테인먼트 산업, 종교/사상, 기타이다. 또한 연구 2에서 대범주 간 유사도, 범주 간 조합 양상, 부정적이지 않은 고정관념을 확인하여, 연구 1에서 규명한 범주들의 특성을 살펴보았다.
본 연구는 사람들의 실제 행동을 반영한 빅데이터인 온라인 댓글을 심리학 영역에서 활용한 점과 자료주도적 접근을 통해 한국사회의 고정관념을 구체적이고 구조적으로 규명한 점에서 의의가 있다.
To better understand the recent escalation of social conflict, this study categorized the objects of stereotypes in contemporary Korean society and constructed a large-scale dataset of sentences for each identified category. Comments were collected from nine online communities (DC Inside, Ilbe, Nate Pann, Theqoo, Ppomppu, Ruliweb, Humor University, Womad, Blind) and four websites (YouTube, Naver News, Naver Movies, Daum News). Crowdsourcing was used to collect ratings of each comment on three items: the presence of a stereotype, the object of the stereotype, and psychological intimacy. To judge objectively whether a sentence contains a stereotype, the stereotype-presence responses of four raters were merged, with the psychological intimacy ratings used as weights.
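The abstract does not state the exact merging formula, so the following is only a minimal sketch of one way a weighted merge of four raters' responses could work. The function name `merge_stereotype_votes`, the 0.5 threshold, the numeric intimacy scale, and the choice to give higher-intimacy raters more weight are all assumptions made for illustration, not the procedure used in the thesis.

```python
def merge_stereotype_votes(votes, intimacy, threshold=0.5):
    """Merge binary stereotype-presence votes into one label.

    votes     -- list of 0/1 responses, one per rater (hypothetical encoding)
    intimacy  -- list of non-negative psychological intimacy ratings used as
                 weights (the weighting direction is an assumption)
    threshold -- weighted share of "present" votes needed to label the
                 sentence as containing a stereotype (assumed value)
    """
    if len(votes) != len(intimacy):
        raise ValueError("votes and intimacy must have the same length")
    if not votes:
        raise ValueError("at least one rater is required")

    total_weight = sum(intimacy)
    if total_weight == 0:
        # Fall back to an unweighted mean when every weight is zero.
        score = sum(votes) / len(votes)
    else:
        score = sum(v * w for v, w in zip(votes, intimacy)) / total_weight
    return score, score >= threshold


# Four raters: two say "stereotype present", two say "absent".
score, has_stereotype = merge_stereotype_votes([1, 1, 0, 0], [5, 4, 2, 1])
```

With equal weights the example above would be a tie; the weights are what let the merged label lean toward the raters the scheme trusts more.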
In Study 1, the stereotype-object responses were embedded by fine-tuning the Ko-Sentence-Transformer model on a subset of the response data. Gaussian mixture models and BERTopic were then used to identify 11 major categories and 99 subcategories. The 11 major categories are Incidents, Industry/Occupation, International, Region, Interpersonal, Website, Age, Politics, Entertainment Industry, Belief, and Others. In Study 2, we examined the characteristics of the categories identified in Study 1 by analyzing the similarities among major categories and the patterns of combination between categories. The infrequent occurrence of non-negative stereotypes is also discussed.
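As a rough illustration of the clustering step in Study 1, the sketch below fits Gaussian mixture models to sentence embeddings using scikit-learn. The embeddings here are synthetic stand-ins, and the helper `fit_gmm_by_bic`, the diagonal covariance, and the use of BIC to choose the number of components are assumptions; the abstract does not say how the 11 major categories were selected, and the fine-tuning and BERTopic steps are not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for sentence embeddings: three blobs in 8 dimensions.
# Real inputs would come from the fine-tuned embedding model.
rng = np.random.default_rng(0)
centers = np.array([[-5.0] * 8, [0.0] * 8, [5.0] * 8])
embeddings = np.vstack(
    [c + 0.3 * rng.standard_normal((50, 8)) for c in centers]
)


def fit_gmm_by_bic(X, candidate_ks, seed=0):
    """Fit one GMM per candidate component count and keep the lowest-BIC fit.

    Hypothetical helper: BIC-based selection is an assumption, not the
    thesis's documented criterion.
    """
    best_bic, best_k, best_model = None, None, None
    for k in candidate_ks:
        gmm = GaussianMixture(
            n_components=k, covariance_type="diag", random_state=seed
        ).fit(X)
        bic = gmm.bic(X)
        if best_bic is None or bic < best_bic:
            best_bic, best_k, best_model = bic, k, gmm
    return best_k, best_model


k, model = fit_gmm_by_bic(embeddings, range(1, 6))
labels = model.predict(embeddings)          # hard cluster assignment per row
memberships = model.predict_proba(embeddings)  # soft membership per component
```

The soft memberships are one reason a mixture model may suit this kind of data: a response can sit between two categories rather than being forced into exactly one.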
The contribution of this research is twofold: it applies big data from online comments, which reflect people's actual behavior, to psychological research, and it takes a data-driven approach to identify the structure of stereotypes in Korean society in concrete terms.

Language: kor

URI: https://hdl.handle.net/10371/196968

https://dcollection.snu.ac.kr/common/orgView/000000179731

Files in This Item:

000000179731.pdf 1.98 MB

Appears in Collections:

College of Social Sciences (사회과학대학)
- Dept. of Psychology (심리학과)
  - Theses (Master's Degree_심리학과)
