CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization
- URL: http://arxiv.org/abs/2109.09209v1
- Date: Sun, 19 Sep 2021 20:05:21 GMT
- Title: CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization
- Authors: Shuyang Cao and Lu Wang
- Abstract summary: We study generating abstractive summaries that are faithful and factually consistent with the given articles.
A novel contrastive learning formulation is presented, which leverages reference summaries as positive training data and automatically generated erroneous summaries as negative training data, to train summarization systems that are better at distinguishing between the two.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study generating abstractive summaries that are faithful and factually consistent with the given articles. A novel contrastive learning formulation is presented, which leverages reference summaries as positive training data and automatically generated erroneous summaries as negative training data, to train summarization systems that are better at distinguishing between the two. We further design four types of strategies for creating negative samples that resemble the errors commonly made by two state-of-the-art models, BART and PEGASUS, as found in our new human annotations of summary errors. Experiments on XSum and CNN/Daily Mail show that our contrastive learning framework is robust across datasets and models: according to QA-based factuality evaluation, it consistently produces more factual summaries than strong baselines that use post-hoc error correction, entailment-based reranking, or unlikelihood training. Human judges echo this observation and find that our model's summaries correct more errors.
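The abstract does not spell out the training objective, so the following is a minimal sketch of one standard way to realize such a contrastive formulation, assuming a single pooled embedding per candidate summary (e.g., mean-pooled decoder states) and at least two faithful summaries per article (e.g., the reference plus a paraphrase). All names, shapes, and the temperature value are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of a contrastive loss over summary embeddings: faithful
# summaries of the same article are pulled together, automatically
# corrupted ones are pushed away. Assumes embeddings are precomputed.
import torch
import torch.nn.functional as F

def contrastive_loss(pos: torch.Tensor, neg: torch.Tensor, tau: float = 0.1):
    """pos: (P, d) embeddings of faithful summaries for one article (P >= 2).
    neg: (N, d) embeddings of erroneous summaries for the same article."""
    pos = F.normalize(pos, dim=-1)
    neg = F.normalize(neg, dim=-1)
    all_emb = torch.cat([pos, neg], dim=0)            # (P+N, d)
    sim = pos @ all_emb.T / tau                        # (P, P+N) scaled cosines
    # Mask self-similarity so a positive is never its own target.
    self_mask = torch.eye(pos.size(0), all_emb.size(0), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim.log_softmax(dim=-1)
    # For each positive, every *other* positive is a valid target.
    pos_mask = torch.zeros_like(sim, dtype=torch.bool)
    pos_mask[:, : pos.size(0)] = True
    pos_mask &= ~self_mask
    return -(log_prob.masked_fill(~pos_mask, 0.0).sum(-1)
             / pos_mask.sum(-1)).mean()

# Example: 2 faithful vs. 4 corrupted summaries with 16-dim embeddings.
loss = contrastive_loss(torch.randn(2, 16), torch.randn(4, 16))
```

In practice this term would be added to the usual maximum-likelihood summarization loss rather than used alone.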
Related papers
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation (arXiv 2023-11-16)
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs). Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing coherent but factually inconsistent summaries to be generated with high error-type coverage.
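A toy sketch of the parse-perturb-regenerate idea described above. A real pipeline would use an AMR parser and generator (e.g., the amrlib package); here the graph is a plain dict and the entity pool is hypothetical.

```python
# Illustrative only: inject a controlled factual error into an AMR-like
# graph by swapping one entity argument, yielding a coherent negative.
import random

def perturb_amr(graph: dict, entity_pool: list) -> dict:
    """Swap one :ARG-role entity to create a factual inconsistency."""
    corrupted = dict(graph)
    role = random.choice([r for r in graph if r.startswith(":ARG")])
    corrupted[role] = random.choice(
        [e for e in entity_pool if e != graph[role]])
    return corrupted

amr = {"predicate": "acquire-01", ":ARG0": "Google", ":ARG1": "Fitbit"}
negative = perturb_amr(amr, ["Google", "Fitbit", "Apple", "Nokia"])
print(negative)  # e.g. {'predicate': 'acquire-01', ':ARG0': 'Apple', ':ARG1': 'Fitbit'}
```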
- Lexical Repetitions Lead to Rote Learning: Unveiling the Impact of Lexical Overlap in Train and Test Reference Summaries (arXiv 2023-11-15)
Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote.
We propose a fine-grained evaluation protocol by partitioning a test set based on the lexical similarity of reference test summaries with training summaries.
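A minimal sketch of the partitioning protocol as described: split test references by their maximum lexical similarity to training references. The unigram-overlap measure and the 0.5 threshold are assumptions; the paper's exact similarity metric may differ.

```python
# Partition test references into "seen" (high lexical overlap with some
# training summary) and "novel" subsets for fine-grained evaluation.
def unigram_overlap(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta), 1)

def partition(test_refs, train_refs, threshold=0.5):
    seen, novel = [], []
    for ref in test_refs:
        score = max(unigram_overlap(ref, tr) for tr in train_refs)
        (seen if score >= threshold else novel).append(ref)
    return seen, novel

seen, novel = partition(["the fed raised rates again"],
                        ["the fed raised interest rates"])
```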
- Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling (arXiv 2022-10-22)
We show that our approach vastly outperforms prior methods in correcting erroneous summaries.
Our model -- FactEdit -- improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum.
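This is not FactEdit itself, but a generic post-editing sketch in the same spirit: mask a suspect span and let a masked language model re-fill it, keeping only candidates grounded in the source article. The model choice and the grounding check are assumptions.

```python
# Hedged sketch of infilling-based post-editing with an off-the-shelf
# masked LM; FactEdit's actual candidate generation and ranking differ.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

def correct(summary: str, suspect: str, source: str) -> str:
    masked = summary.replace(suspect, fill.tokenizer.mask_token, 1)
    for cand in fill(masked):
        token = cand["token_str"].strip()
        if token.lower() in source.lower():  # keep only grounded fills
            return masked.replace(fill.tokenizer.mask_token, token)
    return summary  # no grounded replacement found

src = "Apple acquired the startup in March."
print(correct("Google acquired the startup in March.", "Google", src))
```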
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning (arXiv 2022-10-10)
We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples.
We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
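A hedged sketch of the positive-construction step: one way to "eliminate the information related to spurious correlation" is to strip the question-type prefix, a known shortcut signal in VQA-CP. The prefix list and the strategy are illustrative assumptions, not the paper's exact recipe.

```python
# Build a contrastive positive by removing the question-type prefix so
# the model cannot rely on question phrasing alone. Prefixes are assumed.
QUESTION_PREFIXES = ("how many", "what color", "is there", "does the")

def make_positive(question: str) -> str:
    q = question.lower()
    for prefix in QUESTION_PREFIXES:
        if q.startswith(prefix):
            return q[len(prefix):].strip()
    return q

print(make_positive("What color is the umbrella?"))  # "is the umbrella?"
```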
- FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization (arXiv 2022-05-16)
We present FactPEGASUS, an abstractive summarization model that addresses the problem of factuality during pre-training and fine-tuning. Our analysis suggests FactPEGASUS is more factual than using the original pre-training objective in zero-shot and few-shot settings.
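One plausible reading of "factuality-aware pre-training", sketched under assumptions: weight the PEGASUS-style gap-sentence salience score by a factuality score when selecting pseudo-summary sentences. The product combination and the stub scorers are hypothetical.

```python
# Select the pseudo-summary sentence by salience * factuality instead of
# salience alone; both scorers below are stand-ins for real metrics.
def select_pseudo_summary(sentences, rouge, factuality):
    """rouge, factuality: callables mapping a sentence to a score in [0, 1]."""
    return max(sentences, key=lambda s: rouge(s) * factuality(s))

# Toy usage; real scorers would be ROUGE against the rest of the document
# and a learned factuality classifier.
best = select_pseudo_summary(
    ["The firm posted record profit.", "Analysts were surprised."],
    rouge=lambda s: len(s.split()) / 10,   # stand-in salience score
    factuality=lambda s: 1.0)              # stand-in factuality score
```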
- CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning (arXiv 2021-12-16)
Factual inconsistencies in generated summaries severely limit the practical applications of abstractive dialogue summarization.
We provide a typology of factual errors with annotation data to highlight the types of errors and move away from a binary understanding of factuality.
We propose ConFiT, a training strategy that improves the factual consistency and overall quality of summaries via novel contrastive fine-tuning.
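A sketch of typed negative-sample construction in the spirit of an error typology: each error type maps to a corruption heuristic. The categories and heuristics below are common dialogue-summarization error types assumed for illustration, not ConFiT's exact taxonomy.

```python
# Generate a typed negative sample from a clean dialogue summary.
import random
from enum import Enum

class ErrorType(Enum):
    WRONG_SPEAKER = "attribute an action to the wrong speaker"
    NEGATION = "flip a negation"

def corrupt(summary: str, speakers: list, kind: ErrorType) -> str:
    if kind is ErrorType.WRONG_SPEAKER:
        a, b = random.sample(speakers, 2)
        # Swap the two speaker mentions via a placeholder character.
        return summary.replace(a, "\x00").replace(b, a).replace("\x00", b)
    if kind is ErrorType.NEGATION:
        return summary.replace(" will ", " will not ", 1)
    return summary

negative = corrupt("Alice will call Bob tomorrow.", ["Alice", "Bob"],
                   ErrorType.WRONG_SPEAKER)
print(negative)  # "Bob will call Alice tomorrow."
```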
- Annotating and Modeling Fine-grained Factuality in Summarization (arXiv 2021-04-09)
A major barrier to the practical use of summarization models is their propensity to output summaries that are not faithful to the input and that contain factual errors.
We explore both synthetic and human-labeled data sources for training models to identify factual errors in summarization.
We show that our best factuality detection model enables training of more factual XSum summarization models by allowing us to identify non-factual tokens in the training data.
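One plausible way to act on detected non-factual tokens, sketched under assumptions: zero out the training loss on reference tokens the detector flags, so the model is not taught to reproduce them. The masking strategy is an assumption about how the detector's output is used.

```python
# Token-level loss masking: ignore reference tokens a factuality
# detector has flagged as non-factual when computing the NLL.
import torch

def masked_nll(logits, target_ids, nonfactual_mask):
    """logits: (T, V); target_ids: (T,); nonfactual_mask: (T,) bool."""
    nll = torch.nn.functional.cross_entropy(
        logits, target_ids, reduction="none")
    keep = (~nonfactual_mask).float()
    return (nll * keep).sum() / keep.sum().clamp(min=1)

loss = masked_nll(torch.randn(5, 100), torch.randint(0, 100, (5,)),
                  torch.tensor([False, False, True, False, False]))
```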
- Factual Error Correction for Abstractive Summarization Models (arXiv 2020-10-17)
We propose a post-editing corrector module to correct factual errors in generated summaries.
We show that our model is able to correct factual errors in summaries generated by other neural summarization models.
We also find that transferring from artificial error correction to downstream settings is still very challenging.
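A sketch of the artificial-error setup the last sentence alludes to: corrupt clean reference summaries to build (corrupted, clean) training pairs for a seq2seq corrector. The entity-swap corruption is an assumed heuristic.

```python
# Build one artificial error-correction pair from a clean reference.
import random

def make_training_pair(reference: str, entities: list):
    present = [e for e in entities if e in reference]
    if len(entities) < 2 or not present:
        return None
    old = random.choice(present)
    new = random.choice([e for e in entities if e != old])
    return reference.replace(old, new, 1), reference  # (input, target)

pair = make_training_pair("Merkel met Macron in Paris.",
                          ["Merkel", "Macron", "Johnson"])
```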
- Multi-Fact Correction in Abstractive Text Summarization (arXiv 2020-10-06)
Span-Fact is a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection.
Our models employ single or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency w.r.t. the source text.
Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.
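A loose, hedged analogue of QA-driven span correction using an off-the-shelf extractive QA pipeline: mask each entity, query the source, and substitute the answer span. The question-forming heuristic and the model choice are assumptions; Span-Fact's actual span-selection models differ.

```python
# Iterative single-mask variant: replace each suspect entity with the
# answer an extractive QA model finds in the source article.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

def correct_entities(summary: str, entities: list, source: str) -> str:
    for ent in entities:
        question = summary.replace(ent, "what", 1) + "?"
        answer = qa(question=question, context=source)["answer"]
        summary = summary.replace(ent, answer, 1)
    return summary

src = "Apple bought Shazam in 2018."
print(correct_entities("Google bought Shazam in 2018", ["Google"], src))
```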
- Learning by Semantic Similarity Makes Abstractive Summarization Better (arXiv 2020-02-18)
We compare summaries generated by a recent language model, BART, with the reference summaries of a benchmark dataset, CNN/DM.
Interestingly, model-generated summaries receive higher scores relative to reference summaries.
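A minimal sketch of the kind of comparison the entry describes: score summaries by semantic similarity rather than exact n-gram overlap. The embedding model and cosine scoring are assumptions.

```python
# Score a generated summary against a reference by embedding cosine
# similarity using a sentence-embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_score(generated: str, reference: str) -> float:
    emb = model.encode([generated, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(semantic_score("The central bank raised rates.",
                     "Interest rates were hiked by the Fed."))
```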