CONFIT: Toward Faithful Dialogue Summarization with
Linguistically-Informed Contrastive Fine-tuning
- URL: http://arxiv.org/abs/2112.08713v1
- Date: Thu, 16 Dec 2021 09:08:40 GMT
- Title: CONFIT: Toward Faithful Dialogue Summarization with
Linguistically-Informed Contrastive Fine-tuning
- Authors: Xiangru Tang, Arjun Nair, Borui Wang, Bingyao Wang, Jai Desai, Aaron
Wade, Haoran Li, Asli Celikyilmaz, Yashar Mehdad, Dragomir Radev
- Abstract summary: Factual inconsistencies in generated summaries severely limit the practical applications of abstractive dialogue summarization.
We provide a typology of factual errors with annotation data to highlight the types of errors and move away from a binary understanding of factuality.
We propose a training strategy that improves the factual consistency and overall quality of summaries via a novel contrastive fine-tuning, called ConFiT.
- Score: 5.389540975316299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Factual inconsistencies in generated summaries severely limit the practical
applications of abstractive dialogue summarization. Although significant
progress has been achieved by using pre-trained models, substantial amounts of
hallucinated content are still found during human evaluation. Pre-trained models
are most commonly fine-tuned with cross-entropy loss for text summarization,
which may not be an optimal strategy. In this work, we provide a typology of
factual errors with annotation data to highlight the types of errors and move
away from a binary understanding of factuality. We further propose a training
strategy that improves the factual consistency and overall quality of summaries
via a novel contrastive fine-tuning, called ConFiT. Based on our
linguistically-informed typology of errors, we design different modular
objectives that each target a specific type. Specifically, we utilize hard
negative samples with errors to reduce the generation of factual inconsistency.
In order to capture the key information between speakers, we also design a
dialogue-specific loss. Using human evaluation and automatic faithfulness
metrics, we show that our model significantly reduces all kinds of factual
errors on the SAMSum dialogue summarization corpus. Moreover, our model
generalizes to the AMI meeting summarization corpus and produces significantly
higher scores than most of the baselines on both datasets in terms of
word-overlap metrics.
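To make the training strategy concrete, below is a minimal sketch of
contrastive fine-tuning with typed hard negatives, in the spirit of the
abstract. It is not the authors' exact ConFiT objective: the BART checkpoint,
the margin value, and the hinge formulation are illustrative assumptions.

```python
# A minimal sketch of contrastive fine-tuning with typed hard negatives.
# Not the authors' exact ConFiT objective: the checkpoint, margin, and
# hinge formulation are illustrative assumptions.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def sequence_log_prob(dialogue: str, summary: str) -> torch.Tensor:
    """Token-averaged log-likelihood of `summary` given `dialogue`."""
    enc = tokenizer(dialogue, return_tensors="pt", truncation=True)
    labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids
    out = model(input_ids=enc.input_ids,
                attention_mask=enc.attention_mask,
                labels=labels)
    return -out.loss  # out.loss is the mean token-level NLL

def confit_style_loss(dialogue, reference, hard_negatives, margin=1.0):
    """Cross-entropy on the reference plus a hinge term that keeps the
    reference more likely than each factually corrupted summary."""
    pos_lp = sequence_log_prob(dialogue, reference)
    loss = -pos_lp  # standard cross-entropy term
    for neg in hard_negatives:
        neg_lp = sequence_log_prob(dialogue, neg)
        # Penalize whenever a negative comes within `margin` of the reference.
        loss = loss + torch.clamp(margin - (pos_lp - neg_lp), min=0.0)
    return loss

# Hard negatives are built from the reference by typed corruptions,
# e.g. swapping speakers to mimic a "wrong reference" error:
dialogue = "Amanda: I baked cookies. Jerry: Bring me some!"
reference = "Amanda baked cookies and Jerry wants some."
negatives = ["Jerry baked cookies and Amanda wants some."]
confit_style_loss(dialogue, reference, negatives).backward()
```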
Related papers
- Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing [71.29488677105127]
Existing scene text recognition (STR) methods struggle to recognize challenging text, especially artistic and severely distorted characters.
We propose a contrastive learning-based STR framework by leveraging synthetic and real unlabeled data without any human cost.
Our method achieves SOTA performance (94.7% and 70.9% average accuracy on common benchmarks and Union14M-Benchmark, respectively).
arXiv Detail & Related papers (2024-11-23T15:24:47Z) - Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization [0.05852077003870416]
This work leverages transformer-based BART model for human-like summarization.
On training and fine-tuning the encoder-decoder model, it is tested with diverse sample articles.
The finetuned model performance is compared with the baseline pretrained model.
Empirical results on BBC News articles highlight that the gold-standard summaries written by humans are 17% more factually consistent than the model-generated ones.
arXiv Detail & Related papers (2024-10-22T09:25:04Z) - Factual Dialogue Summarization via Learning from Large Language Models [35.63037083806503]
Large language model (LLM)-based summarization models generate summaries that are more factually consistent than those of smaller fine-tuned models.
We employ zero-shot learning to extract symbolic knowledge from LLMs, generating factually consistent (positive) and inconsistent (negative) summaries.
Our approach achieves better factual consistency while maintaining coherence, fluency, and relevance, as confirmed by various automatic evaluation metrics.
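A minimal sketch of the distillation step described here: prompt an LLM
zero-shot for a faithful (positive) summary and once more for a deliberately
corrupted (negative) one. The prompts and model name are assumptions, and an
OpenAI-style chat API stands in for whatever LLM the authors used.

```python
# Sketch of zero-shot positive/negative summary distillation from an LLM.
# Prompts and model name are illustrative assumptions. Assumes `openai>=1.0`.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

POS_PROMPT = ("Summarize the following dialogue in one or two sentences. "
              "Stay strictly faithful to the dialogue:\n\n{dialogue}")
NEG_PROMPT = ("Summarize the following dialogue, but introduce exactly one "
              "factual error (e.g., swap who did what):\n\n{dialogue}")

def distill_pair(dialogue: str, model: str = "gpt-4o-mini"):
    """Return a (positive, negative) summary pair for contrastive training."""
    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": prompt.format(dialogue=dialogue)}],
        )
        return resp.choices[0].message.content.strip()
    return ask(POS_PROMPT), ask(NEG_PROMPT)

pos, neg = distill_pair("Amanda: I baked cookies. Jerry: Bring me some!")
```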
arXiv Detail & Related papers (2024-06-20T20:03:37Z) - Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors [11.07539342949602]
We propose an end-to-end framework for detecting factual errors in text summarization.
Our framework uses a diverse set of LLM prompts to identify factual inconsistencies.
We calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination.
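A rough sketch of the pipeline shape this describes: ensemble several
verification prompts, average the yes/no votes, and calibrate the raw vote
fraction into a probability on a small labeled dev set. The prompts and the
Platt-style calibrator are assumptions, and `llm_yes_no` is a hypothetical
helper standing in for any LLM call that returns True/False.

```python
# Sketch of prompt ensembling plus calibration; prompts are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

PROMPTS = [
    "Is every claim in the summary supported by the document? "
    "Answer yes or no.\n{doc}\n{summary}",
    "Is the summary free of hallucinated entities, numbers, or events? "
    "Answer yes or no.\n{doc}\n{summary}",
    "Is the summary factually consistent with the document? "
    "Answer yes or no.\n{doc}\n{summary}",
]

def vote_fraction(doc, summary, llm_yes_no):
    """Fraction of prompts whose answer indicates factual consistency.
    `llm_yes_no(prompt) -> bool` is a hypothetical LLM wrapper."""
    votes = [llm_yes_no(p.format(doc=doc, summary=summary)) for p in PROMPTS]
    return sum(votes) / len(PROMPTS)

def fit_calibrator(dev_fractions, dev_labels):
    """Platt-style calibration: map raw vote fractions to empirically
    accurate probabilities using labeled dev examples (1 = consistent)."""
    X = np.asarray(dev_fractions).reshape(-1, 1)
    return LogisticRegression().fit(X, dev_labels)

# Calibrated probability that a new summary is factually consistent:
#   calibrator.predict_proba([[vote_fraction(doc, summ, judge)]])[0, 1]
```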
arXiv Detail & Related papers (2024-06-18T18:59:37Z) - What's under the hood: Investigating Automatic Metrics on Meeting Summarization [7.234196390284036]
Meeting summarization has become a critical task considering the increase in online interactions.
The metrics currently used by default struggle to capture observable errors, showing only weak-to-moderate correlations with human judgments.
Only a subset of metrics reacts accurately to specific errors; most are either unresponsive or fail to reflect the error's impact on summary quality.
arXiv Detail & Related papers (2024-04-17T07:15:07Z) - RLVF: Learning from Verbal Feedback without Overgeneralization [94.19501420241188]
We study the problem of incorporating verbal feedback without overgeneralizing it to contexts where it should not apply.
We develop a new method Contextualized Critiques with Constrained Preference Optimization (C3PO)
Our approach effectively applies verbal feedback to relevant scenarios while preserving existing behaviors for other contexts.
arXiv Detail & Related papers (2024-02-16T18:50:24Z) - SWING: Balancing Coverage and Faithfulness for Dialogue Summarization [67.76393867114923]
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals that encourage the model to generate content from the reference summaries that has not yet been covered.
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
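The NLI-derived signal can be sketched roughly as follows: coverage asks
whether each reference sentence is entailed by the generated summary, and
faithfulness asks whether each generated sentence is entailed by the dialogue.
The MNLI checkpoint and the simple averaging are assumptions; the paper's
actual fine-grained loss is more involved.

```python
# Sketch of NLI-based coverage/faithfulness scoring in the spirit of SWING.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Probability that `premise` entails `hypothesis` under the NLI model."""
    scores = nli({"text": premise, "text_pair": hypothesis}, top_k=None)
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

def coverage_and_faithfulness(dialogue, generated_sents, reference_sents):
    generated = " ".join(generated_sents)
    # Coverage: how much of the reference is entailed by the generation.
    coverage = sum(entailment_prob(generated, r)
                   for r in reference_sents) / len(reference_sents)
    # Faithfulness: is each generated sentence grounded in the dialogue?
    faithfulness = sum(entailment_prob(dialogue, g)
                       for g in generated_sents) / len(generated_sents)
    return coverage, faithfulness
```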
arXiv Detail & Related papers (2023-01-25T09:33:11Z) - Correcting Diverse Factual Errors in Abstractive Summarization via
Post-Editing and Language Model Infilling [56.70682379371534]
We show that our approach vastly outperforms prior methods in correcting erroneous summaries.
Our model -- FactEdit -- improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum.
arXiv Detail & Related papers (2022-10-22T07:16:19Z) - Analyzing and Evaluating Faithfulness in Dialogue Summarization [67.07947198421421]
We first perform a fine-grained human analysis on the faithfulness of dialogue summaries and observe that over 35% of generated summaries are factually inconsistent with respect to the source dialogues.
We present a new model-level faithfulness evaluation method. It examines generation models with multi-choice questions created by rule-based transformations.
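As a toy illustration of how such rule-based transformations can turn a gold
summary into a multiple-choice probe, the sketch below corrupts a summary with
a speaker swap and a polarity flip and checks whether a model ranks the gold
option highest. The two corruptions are illustrative stand-ins for the paper's
rule set; `score_fn` can be any sequence log-likelihood scorer, such as the
one sketched under the abstract above.

```python
# Toy rule-based corruptions for building multiple-choice faithfulness probes.
import re

def swap_speakers(summary: str, a: str, b: str) -> str:
    """Exchange two speaker names (a crude 'wrong reference' corruption)."""
    pattern = re.escape(a) + "|" + re.escape(b)
    return re.sub(pattern, lambda m: b if m.group() == a else a, summary)

def negate(summary: str) -> str:
    """Toy polarity flip: turn the first ' is ' into ' is not '."""
    return summary.replace(" is ", " is not ", 1)

def model_prefers_gold(dialogue, gold, a, b, score_fn):
    """True if `score_fn(dialogue, candidate)` ranks the gold summary first.
    `score_fn` is any sequence log-likelihood scorer (assumed, see above)."""
    choices = [gold, swap_speakers(gold, a, b), negate(gold)]
    scores = [score_fn(dialogue, c) for c in choices]
    return scores.index(max(scores)) == 0
```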
arXiv Detail & Related papers (2022-10-21T07:22:43Z) - Understanding Factual Errors in Summarization: Errors, Summarizers,
Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z) - CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in
Abstractive Summarization [6.017006996402699]
We study generating abstractive summaries that are faithful and factually consistent with the given articles.
A novel contrastive learning formulation is presented that leverages reference summaries as positive training data and automatically generated erroneous summaries as negative training data, training summarization systems to better distinguish between the two.
arXiv Detail & Related papers (2021-09-19T20:05:21Z)