Towards Improving Faithfulness in Abstractive Summarization
- URL: http://arxiv.org/abs/2210.01877v1
- Date: Tue, 4 Oct 2022 19:52:09 GMT
- Title: Towards Improving Faithfulness in Abstractive Summarization
- Authors: Xiuying Chen, Mingzhe Li, Xin Gao, Xiangliang Zhang
- Abstract summary: We propose a Faithfulness Enhanced Summarization model (FES) to improve faithfulness in abstractive summarization.
Our model outperforms strong baselines in experiments on CNN/DM and XSum.
- Score: 37.19777407790153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the success achieved in neural abstractive summarization based on
pre-trained language models, one unresolved issue is that the generated
summaries are not always faithful to the input document. There are two possible
causes of the unfaithfulness problem: (1) the summarization model fails to
understand or capture the gist of the input text, and (2) the model over-relies
on the language model to generate fluent but inadequate words. In this work, we
propose a Faithfulness Enhanced Summarization model (FES), designed to address
these two problems and improve faithfulness in abstractive summarization. For
the first problem, we propose to use question answering (QA) to examine whether
the encoder fully grasps the input document and can answer questions about the
key information in the input. The QA attention over the relevant input words
can also be used to guide how the decoder attends
to the source. For the second problem, we introduce a max-margin loss defined
on the difference between the language model and the summarization model,
aiming to prevent the language model from becoming overconfident. Extensive
experiments on two
benchmark summarization datasets, CNN/DM and XSum, demonstrate that our model
significantly outperforms strong baselines. The evaluation of factual
consistency also shows that our model generates more faithful summaries than
baselines.
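The max-margin loss is described only at a high level in the abstract. The sketch below is a minimal, illustrative formulation, assuming per-token probabilities of the reference tokens are available from both the summarization model and a plain language model; the hinge form and the margin value are assumptions, not details taken from the paper.

```python
import torch

def max_margin_loss(p_sum: torch.Tensor,
                    p_lm: torch.Tensor,
                    margin: float = 0.1) -> torch.Tensor:
    """Hinge loss on the probability gap between the summarization
    model and a plain language model for the reference tokens.

    p_sum, p_lm: probabilities of the gold tokens under each model,
    shape (batch, seq_len). The loss is zero once the summarizer
    beats the language model by at least `margin`, discouraging
    generations driven by fluency alone.
    """
    return torch.clamp(margin - (p_sum - p_lm), min=0.0).mean()

# Toy usage with made-up token probabilities.
p_sum = torch.tensor([[0.60, 0.40, 0.90]])
p_lm = torch.tensor([[0.50, 0.45, 0.20]])
print(max_margin_loss(p_sum, p_lm))  # tensor(0.0500)
```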
Related papers
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is preferred by human annotators over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Ask Again, Then Fail: Large Language Models' Vacillations in Judgment [28.74246375289661]
We observe that current conversational language models often waver in their judgments when faced with follow-up questions.
We introduce a Follow-up Questioning Mechanism along with two metrics to quantify this inconsistency; a toy version of such a metric is sketched below.
We develop a training-based framework, Unwavering-FQ, that teaches language models to maintain their originally correct judgments.
arXiv Detail & Related papers (2023-10-03T16:08:41Z)
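The two inconsistency metrics above are not spelled out in the abstract. A plausible, purely illustrative stand-in is the rate at which a model flips an initially correct judgment after a skeptical follow-up; the `ask` callable and the follow-up prompt below are hypothetical placeholders, not the paper's.

```python
from typing import Callable, List, Tuple

def modification_rate(ask: Callable[[str], str],
                      items: List[Tuple[str, str]],
                      follow_up: str = "Are you sure? Think again.") -> float:
    """Fraction of initially correct answers the model changes after
    a skeptical follow-up. `ask` is a hypothetical single-turn model
    interface; `items` pairs questions with gold answers."""
    flipped, correct = 0, 0
    for question, gold in items:
        first = ask(question)
        if first.strip() != gold:
            continue  # only initially correct answers are scored
        correct += 1
        second = ask(f"{question}\n{first}\n{follow_up}")
        if second.strip() != gold:
            flipped += 1
    return flipped / correct if correct else 0.0

# Toy usage with a stub model that never wavers.
print(modification_rate(lambda q: "42", [("What is 6 * 7?", "42")]))  # 0.0
```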
- Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose SummAttacker, an efficient approach to generating adversarial samples based on language models; a crude stand-in for this kind of perturbation is sketched below.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
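SummAttacker itself is a learned, language-model-based attacker; as a crude stand-in, word-level synonym substitution (one of the perturbations mentioned above) can be done with WordNet via NLTK. Everything below is illustrative and is not the paper's method.

```python
import random

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def synonym_perturb(text: str, rate: float = 0.15, seed: int = 0) -> str:
    """Replace roughly `rate` of the words with a random WordNet
    synonym, leaving words without synonyms untouched."""
    rng = random.Random(seed)
    words = text.split()
    for i, word in enumerate(words):
        if rng.random() > rate:
            continue
        synonyms = {lemma.name().replace("_", " ")
                    for synset in wordnet.synsets(word.lower())
                    for lemma in synset.lemmas()} - {word.lower()}
        if synonyms:
            words[i] = rng.choice(sorted(synonyms))
    return " ".join(words)

print(synonym_perturb("The storm damaged several houses near the coast."))
```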
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under the resulting robustness metric, a model is judged robust only if it is consistently accurate across the examples in each clique; a minimal version of this criterion is sketched below.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
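The clique-level criterion is easy to state concretely: a clique counts as correct only if the model is accurate on every knowledge-invariant variant in it. A minimal sketch, with a hypothetical prediction/gold structure:

```python
from typing import Dict, List, Tuple

def robust_accuracy(cliques: Dict[str, List[Tuple[str, str]]]) -> float:
    """`cliques` maps a clique id to (prediction, gold) pairs for its
    knowledge-invariant variants. A clique counts as correct only if
    every variant in it is answered correctly."""
    if not cliques:
        return 0.0
    robust = sum(all(pred == gold for pred, gold in pairs)
                 for pairs in cliques.values())
    return robust / len(cliques)

# Two cliques: one fully correct, one with a single failure.
print(robust_accuracy({
    "c1": [("A", "A"), ("A", "A")],
    "c2": [("B", "B"), ("C", "B")],
}))  # 0.5
```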
- Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling [56.70682379371534]
We show that our approach vastly outperforms prior methods in correcting erroneous summaries.
Our model -- FactEdit -- improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum.
arXiv Detail & Related papers (2022-10-22T07:16:19Z)
- The Factual Inconsistency Problem in Abstractive Text Summarization: A Survey [25.59111855107199]
Neural encoder-decoder models, pioneered by the Seq2Seq framework, have been proposed to generate more abstractive summaries.
At a high level, such neural models can freely generate summaries without any constraint on the words or phrases used.
However, the neural model's abstraction ability is a double-edged sword.
arXiv Detail & Related papers (2021-04-30T08:46:13Z)
- Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions; a generic computation is sketched below.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
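Token-level entropy is straightforward to compute from decoder logits; the generic sketch below is not tied to any particular summarization model.

```python
import torch
import torch.nn.functional as F

def token_entropies(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of each token-level prediction.
    logits: (seq_len, vocab_size) raw decoder scores."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)

# A peaked distribution (low entropy) vs. a flat one (high entropy).
logits = torch.tensor([[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(token_entropies(logits))  # ~[0.0010, 1.0986]
```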
- FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization [34.2456005415483]
We tackle the problem of evaluating faithfulness of a generated summary given its source document.
We find that current models exhibit a trade-off between abstractiveness and faithfulness.
We propose an automatic question answering (QA) based metric for faithfulness; a simplified version of this recipe is sketched below.
arXiv Detail & Related papers (2020-05-07T21:00:08Z)
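A QA-based faithfulness check in the spirit of FEQA can be approximated with an off-the-shelf extractive QA model: answer the same questions against the summary and against the source, and count agreements. The checkpoint named below is an illustrative public choice (not the paper's setup), and FEQA's question-generation stage is elided.

```python
from transformers import pipeline

# An extractive QA checkpoint; "deepset/roberta-base-squad2" is a
# common public model, used here purely as an illustrative choice.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def faithfulness_score(questions, summary: str, source: str) -> float:
    """Fraction of questions answered identically from the summary
    and from the source document. The question-generation stage of
    QA-based metrics such as FEQA is elided here."""
    agree = 0
    for question in questions:
        a_sum = qa(question=question, context=summary)["answer"].lower()
        a_src = qa(question=question, context=source)["answer"].lower()
        agree += a_sum == a_src
    return agree / len(questions)

# Usage (hypothetical inputs):
# faithfulness_score(["Who resigned?"], summary_text, source_text)
```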
- Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward [42.925345819778656]
We present ASGARD, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD.
We propose dual encoders (a sequential document encoder and a graph-structured encoder) to maintain the global context and local characteristics of entities; a toy version is sketched below.
Results show that our models produce significantly higher ROUGE scores than a variant without a knowledge graph as input on both the New York Times and CNN/Daily Mail datasets.
arXiv Detail & Related papers (2020-05-03T18:23:06Z)
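The dual-encoder idea (a sequential document encoder alongside a graph-structured entity encoder) can be sketched in a few lines of PyTorch; the layer sizes and the projected-mean stand-in for a graph encoder below are placeholders, not ASGARD's actual architecture.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """Toy dual encoder: a GRU over the token sequence for global
    context, plus a projected mean over entity node features as a
    crude stand-in for a graph-structured encoder."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.seq_enc = nn.GRU(dim, dim, batch_first=True)
        self.node_proj = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor, nodes: torch.Tensor):
        # tokens: (batch, seq_len) token ids; nodes: (batch, n, dim)
        seq_states, _ = self.seq_enc(self.embed(tokens))
        node_states = torch.tanh(self.node_proj(nodes)).mean(dim=1)
        return seq_states, node_states

enc = DualEncoder()
seq, graph = enc(torch.randint(0, 1000, (2, 12)), torch.randn(2, 5, 64))
print(seq.shape, graph.shape)  # torch.Size([2, 12, 64]) torch.Size([2, 64])
```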