Factual Consistency Evaluation for Conditional Text Generation Systems

이환희


Factual Consistency Evaluation for Conditional Text Generation Systems : 조건부 텍스트 생성 시스템에 대한 사실 관계의 일관성 평가

DC Field	Value	Language
dc.contributor.advisor	정교민	-
dc.contributor.author	이환희	-
dc.date.accessioned	2022-12-29T07:38:59Z	-
dc.date.available	2022-12-29T07:38:59Z	-
dc.date.issued	2022	-
dc.identifier.other	000000172521	-
dc.identifier.uri	https://hdl.handle.net/10371/187723	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000172521	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2022. 8. 정교민.	-
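dc.description.abstract	Despite recent progress in conditional text generation systems built on pre-trained language models, the factual consistency of these systems remains insufficient. Moreover, the widely used n-gram based similarity metrics are very weak at evaluating factual consistency. Developing factually consistent text generation systems therefore first requires automatic metrics that can properly evaluate the factual consistency of a system. In this dissertation, we propose four metrics that, for a variety of conditional text generation systems, show much higher correlation with human judgments in factual consistency evaluation than previous metrics. These metrics make use of (1) auxiliary tasks and (2) data augmentation. First, we propose two factual consistency metrics that use two different auxiliary tasks focused on important keywords or keyphrases. We first combine a keyphrase weight prediction task with previous metrics to propose a metric for generative question answering. We also propose QACE, which uses question generation and answering to generate questions about keywords and checks factual consistency by comparing the answers to those questions for the image and for the caption. Second, in contrast to using auxiliary tasks, we propose two metrics trained with a data-driven approach. Specifically, we train a model to distinguish augmented inconsistent texts from consistent texts. We first propose UMIC, an image captioning metric, by generating inconsistent captions through rule-based perturbations. As a next step, we develop a metric through MFMA, which generates inconsistent summaries using a masked source and a masked summary. Finally, as an extension of developing data-driven factual consistency metrics, we propose a fast post-editing system that can correct the factual errors of a system.	-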
dc.description.abstract	Despite recent advances in conditional text generation systems built on pre-trained language models, the factual consistency of these systems is still insufficient. Moreover, widely used n-gram similarity metrics are poorly suited to evaluating factual consistency. Hence, developing a factually consistent system first requires an automatic factuality metric. In this dissertation, we propose four metrics that show much higher correlation with human judgments than previous metrics when evaluating factual consistency across diverse conditional text generation systems. To build these metrics, we use (1) auxiliary tasks and (2) data augmentation methods. First, we focus on the keywords or keyphrases that are critical for evaluating factual consistency and propose two factual consistency metrics based on two different auxiliary tasks. We integrate a keyphrase weight prediction task into previous metrics to propose the KPQA (Keyphrase Prediction for Question Answering)-metric for generative QA. We also apply question generation and answering to develop a captioning metric, QACE (Question Answering for Captioning Evaluation). QACE generates questions about the keywords of the candidate caption and checks factual consistency by comparing the answers to these questions given the source image and given the caption. Second, instead of using auxiliary tasks, we directly train metrics with a data-driven approach and propose two further metrics. Specifically, we train a metric to distinguish augmented inconsistent texts from consistent texts. We first modify the original reference captions with several rule-based methods, such as substituting keywords, to generate inconsistent captions and propose UMIC (Unreferenced Metric for Image Captioning). As a next step, we introduce the MFMA (Mask-and-Fill with Masked-Article)-metric, which generates inconsistent summaries using the masked source and the masked summary. Finally, as an extension of developing data-driven factual consistency metrics, we also propose a faster post-editing system that can fix the factual errors of a system.	-
dc.description.tableofcontents	1 Introduction; 2 Background; 2.1 Text Evaluation Metrics; 2.1.1 N-gram Similarity Metrics; 2.1.2 Embedding Similarity Metrics; 2.1.3 Auxiliary Task Based Metrics; 2.1.4 Entailment Based Metrics; 2.2 Evaluating Automated Metrics; 3 Integrating Keyphrase Weights for Factual Consistency Evaluation; 3.1 Related Work; 3.2 Proposed Approach: KPQA-Metric; 3.2.1 KPQA; 3.2.2 KPQA Metric; 3.3 Experimental Setup and Dataset; 3.3.1 Dataset; 3.3.2 Implementation Details; 3.4 Empirical Results; 3.4.1 Comparison with Other Methods; 3.4.2 Analysis; 3.5 Conclusion; 4 Question Generation and Question Answering for Factual Consistency Evaluation; 4.1 Related Work; 4.2 Proposed Approach: QACE; 4.2.1 Question Generation; 4.2.2 Question Answering; 4.2.3 Abstractive Visual Question Answering; 4.2.4 QACE Metric; 4.3 Experimental Setup and Dataset; 4.3.1 Dataset; 4.3.2 Implementation Details; 4.4 Empirical Results; 4.4.1 Comparison with Other Methods; 4.4.2 Analysis; 4.5 Conclusion; 5 Rule-Based Inconsistent Data Augmentation for Factual Consistency Evaluation; 5.1 Related Work; 5.2 Proposed Approach: UMIC; 5.2.1 Modeling; 5.2.2 Negative Samples; 5.2.3 Contrastive Learning; 5.3 Experimental Setup and Dataset; 5.3.1 Dataset; 5.3.2 Implementation Details; 5.4 Empirical Results; 5.4.1 Comparison with Other Methods; 5.4.2 Analysis; 5.5 Conclusion; 6 Inconsistent Data Augmentation with Masked Generation for Factual Consistency Evaluation; 6.1 Related Work; 6.2 Proposed Approach: MFMA and MSM; 6.2.1 Mask-and-Fill with Masked Article; 6.2.2 Masked Summarization; 6.2.3 Training Factual Consistency Checking Model; 6.3 Experimental Setup and Dataset; 6.3.1 Dataset; 6.3.2 Implementation Details; 6.4 Empirical Results; 6.4.1 Comparison with Other Methods; 6.4.2 Analysis; 6.5 Conclusion; 7 Factual Error Correction for Improving Factual Consistency; 7.1 Related Work; 7.2 Proposed Approach: RFEC; 7.2.1 Problem Formulation; 7.2.2 Training Dataset Construction; 7.2.3 Evidence Sentence Retrieval; 7.2.4 Entity Retrieval Based Factual Error Correction; 7.3 Experimental Setup and Dataset; 7.3.1 Dataset; 7.3.2 Implementation Details; 7.4 Empirical Results; 7.4.1 Comparison with Other Methods; 7.4.2 Analysis; 7.5 Conclusion; 8 Conclusion; Abstract (In Korean)	-
dc.format.extent	xiii, 119	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	factual consistency	-
dc.subject	text generation	-
dc.subject	evaluation metric	-
dc.subject.ddc	621.3	-
dc.title	Factual Consistency Evaluation for Conditional Text Generation Systems	-
dc.title.alternative	조건부 텍스트 생성 시스템에 대한 사실 관계의 일관성 평가	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Hwanhee Lee	-
dc.contributor.department	College of Engineering, Department of Electrical and Computer Engineering	-
dc.description.degree	Ph.D.	-
dc.date.awarded	2022-08	-
dc.contributor.major	Natural Language Processing	-
dc.identifier.uci	I804:11032-000000172521	-
dc.identifier.holdings	000000000048▲000000000055▲000000172521▲	-
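
Illustrative note on the abstract: the claim that n-gram similarity metrics are weak at judging factual consistency, and the idea of weighting keyphrases, can be shown with a small sketch. This is not the KPQA-metric from the dissertation. The function names, the example sentences, and the hand-set `weights` dictionary are all assumptions made for illustration; in the dissertation the keyphrase weights come from a prediction task rather than being set by hand.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Plain unigram-overlap F1 on lowercased, whitespace-split tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def weighted_unigram_f1(candidate: str, reference: str,
                        weights: dict, default: float = 0.1) -> float:
    """Unigram F1 in which each token counts by its weight.

    `weights` is a hypothetical, hand-set stand-in for predicted
    keyphrase weights; tokens not listed get `default`.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()

    def w(token: str) -> float:
        return weights.get(token, default)

    overlap = Counter(cand) & Counter(ref)
    matched = sum(w(tok) * n for tok, n in overlap.items())
    if matched == 0:
        return 0.0
    precision = matched / sum(w(tok) for tok in cand)
    recall = matched / sum(w(tok) for tok in ref)
    return 2 * precision * recall / (precision + recall)


# Made-up example: one wrong keyword vs. a correct paraphrase.
reference = "the meeting is on friday in seoul"
inconsistent = "the meeting is on monday in seoul"
consistent = "it is on friday in seoul"
weights = {"friday": 1.0, "monday": 1.0, "seoul": 1.0}

if __name__ == "__main__":
    for name, cand in [("inconsistent", inconsistent), ("consistent", consistent)]:
        print(name,
              round(unigram_f1(cand, reference), 3),
              round(weighted_unigram_f1(cand, reference, weights), 3))
```

With these made-up sentences, the plain overlap score ranks the factually wrong candidate above the correct paraphrase, because only one token differs. Weighting the fact-bearing tokens more heavily reverses the ranking.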

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)

Files in This Item:

000000172521.pdf 7.46 MB
