A Multimodal, Multispeaker Abstractive Summarization Dataset of Discussion Threads

키일리

A Multimodal, Multispeaker Abstractive Summarization Dataset of Discussion Threads : 멀티모덜 다중화자 토의 글타래의 추상적 요약을 위한 데이터셋

DC Field	Value	Language
dc.contributor.advisor	Gunhee Kim	-
dc.contributor.author	키일리	-
dc.date.accessioned	2023-11-20T04:23:56Z	-
dc.date.available	2023-11-20T04:23:56Z	-
dc.date.issued	2023	-
dc.identifier.other	000000177567	-
dc.identifier.uri	https://hdl.handle.net/10371/196485	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000177567	ko_KR
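dc.description	Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, 2023. 8. Gunhee Kim.	-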
dc.description.abstract	With recent advances in artificial intelligence and large language models, automatic summarization of documents such as news articles, dialogues, and online discussions has been improving rapidly. However, most of these improvements have been limited to text-only summarization and have not addressed the fact that many online discussions are increasingly multimodal, consisting of not only text but also videos and images. While the growing number of multimodal online discussions necessitates automatic summarization to save time and reduce content overload, existing summarization datasets do not sufficiently cover this domain. To address this, we present mRedditSum, the first multimodal discussion summarization dataset. It consists of 3,033 discussion threads in which a post solicits advice regarding an issue described with an image and text, and the accompanying comments express diverse opinions. We annotate each thread with a human-written summary that captures both the essential information from the text and the details available only in the image. Experiments show that popular summarization models (GPT-3.5, BART, and T5) consistently improve in performance when visual information is incorporated. We also introduce a novel method, cluster-based multi-stage summarization, that outperforms existing baselines and serves as a competitive baseline for future work.	-
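dc.description.abstract	Driven by advances in artificial intelligence and large language models, automatic summarization of news, dialogues, and discussions has also progressed rapidly. However, most automatic summarization techniques are limited to summarizing text alone, and techniques for the many online discussions that take place alongside videos and images have rarely been addressed. Current summarization datasets also consist only of text, and summarization datasets covering this multimodal domain are insufficient. To address this, we present mRedditSum, the first multimodal discussion summarization dataset. Composed of 3,033 high-quality discussion threads collected from subreddits of Reddit, the dataset consists of posts that seek advice based on an image and text, and comments that respond to those posts with diverse opinions. In keeping with its multimodal nature, the summary for each thread was written by a human, combining information from the text with information available only in the image. We conducted experiments with large language models commonly used for automatic summarization (T5, BART, and GPT-3) and showed that summarization performance improves when image captions or a vision-text fusion layer are used.	-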
dc.description.tableofcontents	Abstract; Chapter 1 Introduction; 1.1 Purpose of Research; 1.2 Related Work; 1.2.1 Discussion Thread Summarization; 1.2.2 Multimodal Summarization; Chapter 2 The mRedditSum Dataset; 2.1 Data Selection; 2.2 Data Annotation; 2.2.1 Step 1: Original Post Summarization; 2.2.2 Step 2: Comment Cluster Summarization; 2.2.3 Step 3: Summary Synthesis; 2.3 Dataset Analyses; 2.3.1 Statistics; 2.3.2 Abstractiveness; 2.3.3 Relatedness between Text and Images; Chapter 3 Models and Experiments; 3.1 Task Definitions; 3.2 Evaluation Metrics; 3.2.1 ROUGE; 3.2.2 BertScore; 3.3 Models; 3.3.1 Baseline Models; 3.3.2 Cluster-based Multi-stage Summarization; 3.4 Implementation Details; Chapter 4 Results and Analysis; 4.1 Experiment Results; 4.2 Qualitative Analysis; 4.3 Human Evaluation; Chapter 5 Conclusion; Appendix A Annotation Interface; Appendix B Additional Sample Data; Appendix C Further Analyses; C.0.1 Summarization based on the Length of Input Threads; C.0.2 Summarization per Subreddit; Acknowledgements; Abstract (in Korean)	-
dc.format.extent	vii, 45	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Deep Learning	-
dc.subject	Natural Language Processing	-
dc.subject	Computer Vision	-
dc.subject	Abstractive Summarization	-
dc.subject	Multimodal Summarization	-
dc.subject	Dataset Annotation	-
dc.subject.ddc	621.39	-
dc.title	A Multimodal, Multispeaker Abstractive Summarization Dataset of Discussion Threads	-
dc.title.alternative	멀티모덜 다중화자 토의 글타래의 추상적 요약을 위한 데이터셋	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Keighley Shea Overbay	-
dc.contributor.department	College of Engineering, Department of Computer Science and Engineering	-
dc.description.degree	Master's	-
dc.date.awarded	2023-08	-
dc.identifier.uci	I804:11032-000000177567	-
dc.identifier.holdings	000000000050▲000000000058▲000000177567▲	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)

Files in This Item:

000000177567.pdf 5.52 MB
