Understanding and Predicting User Behavior and Content Propagation Patterns in Internet: A Data-Scientific Approach

최대진

Understanding and Predicting User Behavior and Content Propagation Patterns in Internet: A Data-Scientific Approach : 데이터 과학 분석 방법에 기반한 온라인 상의 사용자 행동 및 콘텐트 전파 패턴 이해 및 예측

Authors: 최대진

Advisor: 권태경

Major: College of Engineering, Department of Electrical and Computer Engineering

Issue Date: 2019-02

Publisher: Seoul National University Graduate School

Description: Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 2. 권태경.

Abstract: It has become the norm for people to communicate with one another through various online social channels, such as message boards, online social networks, and social media. As these online digital channels of communication produce a deluge of social data, computational data-driven studies have in turn been spurred to understand human behaviors and communication patterns. As part of such studies, this thesis studies online communications through the following topics: (i) characterizing threaded conversations from content, user, and community perspectives, (ii) characterizing popular and viral image propagation, and (iii) understanding content publishing and sharing patterns. To this end, three large-scale datasets are collected, containing (i) 0.7 million threaded conversations from 1.5 million users on Reddit, (ii) 0.3 million images shared by 1 million users on Pinterest, and (iii) 4.2 billion requests for 80 million URLs created through Bitly. The data-driven analysis of the datasets reveals that content, user behavioral, and topical community factors (e.g., text difficulty, the portion of reciprocal communications, or discussion-encouraged communities) are highly associated with large, responsive, or viral conversations. Through in-depth analysis of the Pinterest dataset, this thesis shows that the structural virality of an image cascade differentiates large cascades in terms of their shape (i.e., broadcast or diffusion) and that factors such as propagation time are related differently to volume and virality. By modeling the relations among websites (e.g., twitter.com, amazon.com) serving as content sources and publishing spaces in the Bitly dataset, this thesis finds that they play different roles in publishing short URLs. For example, search engines, online social networks, and computer & electronics sites such as newsfeed services are popular spaces for content publishing, while news and streaming services are widely used as content sources.
The analysis of content publishing and sharing patterns through URL shortening reveals that users are likely to access different types of content via different websites. For example, adult or malicious content tends to be requested from search engines, shopping content is primarily accessed through online social networks, and news content is usually clicked through computer & electronics websites. This thesis also reports that news or shopping content published through online social networks tends to be requested quickly and virally. Lastly, based on the lessons learned, a learning-based model that predicts whether a conversation or an image cascade will be large or viral is proposed, and it achieves high performance. By giving valuable insights into (i) how different users interact with others across different content, topics, and communities, (ii) what content is propagated virally and how, and (iii) how different content is published and accessed through different online spaces, this thesis is expected to contribute to better online services, such as marketing or novel platform design.
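With the growth of various online services such as social networking services, social media, and message boards, it has become common for a person to communicate with others through a variety of channels. As these online digital channels have accumulated large amounts of data on user communication, research that models, analyzes, and predicts people's behavior and communication patterns from data has become possible. As part of such research, this thesis carries out the following data-driven analyses: (i) analysis of online conversation patterns based on user behavior, content, and user community characteristics, (ii) analysis and prediction of the propagation characteristics of popular and viral images, and (iii) analysis of the distribution flow of online content, including its publication and consumption. To this end, it collects and analyzes (i) 0.7 million online conversations generated by about 1.5 million Reddit users, (ii) about 0.33 million images shared on Pinterest together with their propagation data, and (iii) a dataset of about 80 million short URLs published through Bitly and 4.2 billion requests for them. These analyses reveal that content, user behavioral characteristics, and community characteristics are each associated with large, responsive, and viral online conversations. The analysis of the Pinterest dataset further reveals that, in image propagation, structural virality differs from merely large propagation in terms of the shape of the cascade. In addition, by modeling the relations between content and referrer domains in the Bitly dataset, the thesis demonstrates that content publishing and consumption patterns differ by service type (newsfeed, streaming, online shopping, and so on). Based on these findings, it finally proposes a machine-learning-based model that predicts whether an online conversation or a piece of image content will grow large or spread virally. The proposed model uses the initially observed comment or image propagation patterns, user behavioral characteristics, and content characteristics together, and was able to predict large or highly viral conversations and image cascades with high probability. The findings and the prediction model presented in this thesis are expected to contribute greatly not only to those who aim to spread information or content, such as online social network service providers, marketers, and content providers, but also to the development of interpretable artificial intelligence models for propagation patterns and diffusion scale.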

Language: eng

URI: https://hdl.handle.net/10371/151942

Files in This Item:

000000153960.pdf 6.38 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)
