Analyzing and Evaluating Faithfulness in Dialogue Summarization
- URL: http://arxiv.org/abs/2210.11777v1
- Date: Fri, 21 Oct 2022 07:22:43 GMT
- Title: Analyzing and Evaluating Faithfulness in Dialogue Summarization
- Authors: Bin Wang, Chen Zhang, Yan Zhang, Yiming Chen, Haizhou Li
- Abstract summary: We first perform a fine-grained human analysis of the faithfulness of dialogue summaries and observe that over 35% of generated summaries are factually inconsistent with the source dialogues.
We also present a new model-level faithfulness evaluation method that examines generation models with multiple-choice questions created by rule-based transformations.
- Score: 67.07947198421421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue summarization is abstractive in nature, which makes it
prone to factual errors. The factual correctness of summaries is a top
priority before practical deployment. Many efforts have been made to improve
faithfulness in text summarization, but there is a lack of systematic study
of dialogue summarization systems. In this work, we first perform a
fine-grained human analysis of the faithfulness of dialogue summaries and
observe that over 35% of generated summaries are factually inconsistent with
the source dialogues. Furthermore, we present a new model-level faithfulness
evaluation method that examines generation models with multiple-choice
questions created by rule-based transformations. Experimental results show
that our evaluation scheme is a strong proxy for the factual correctness of
summarization models. The human-annotated faithfulness samples and the
evaluation toolkit are released to facilitate future research toward faithful
dialogue summarization.
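As a concrete (if simplified) illustration of the model-level evaluation idea, the sketch below builds one multiple-choice item with a rule-based speaker swap and checks whether a summarizer ranks the faithful option highest by length-normalized log-likelihood. The BART checkpoint, the speaker-swap rule, and the scoring choice are illustrative assumptions, not the paper's exact protocol.

```python
# A minimal sketch of multi-choice, model-level faithfulness evaluation.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn").eval()

def option_score(dialogue: str, summary: str) -> float:
    """Mean log-likelihood per target token of `summary` given `dialogue`."""
    enc = tok(dialogue, return_tensors="pt", truncation=True)
    labels = tok(text_target=summary, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        loss = model(**enc, labels=labels).loss  # mean negative log-likelihood
    return -loss.item()

def speaker_swap(summary: str, a: str, b: str) -> str:
    """Toy rule-based transformation: exchange two speaker names."""
    return summary.replace(a, "\0").replace(b, a).replace("\0", b)

dialogue = "Amy: I'll book the flight.\nBob: Great, I'll reserve the hotel."
faithful = "Amy will book the flight and Bob will reserve the hotel."
options = [faithful, speaker_swap(faithful, "Amy", "Bob")]

scores = [option_score(dialogue, o) for o in options]
print("faithful option ranked first:", scores.index(max(scores)) == 0)
```

Model-level accuracy would then be the fraction of such items on which the faithful option wins.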
Related papers
- Systematic Exploration of Dialogue Summarization Approaches for Reproducibility, Comparative Assessment, and Methodological Innovations for Advancing Natural Language Processing in Abstractive Summarization [0.0]
This paper delves into the reproduction and evaluation of dialogue summarization models.
Our research involved a thorough examination of several dialogue summarization models using the AMI dataset.
The primary objective was to evaluate the informativeness and quality of the summaries generated by these models through human assessment.
arXiv Detail & Related papers (2024-10-21T12:47:57Z)
- Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks [0.0]
We explore how incorporating task-related information can enhance the summarization process.
Results show that integrating task-related information into the models improves summary accuracy, even with varying word error rates.
arXiv Detail & Related papers (2024-09-16T08:15:35Z)
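One plausible way to wire in the task-related (SLU) information described above is to inline slot-value annotations into the dialogue text before summarization; the bracketed tag format below is an illustrative assumption, not the paper's exact input scheme.

```python
# A minimal sketch, assuming per-turn slot-value SLU frames.
def build_input(dialogue_turns, slu_frames):
    """dialogue_turns: list of (speaker, utterance); slu_frames: list of dicts."""
    annotated = []
    for (speaker, utt), frame in zip(dialogue_turns, slu_frames):
        tags = " ".join(f"[{slot}={value}]" for slot, value in frame.items())
        annotated.append(f"{speaker}: {utt} {tags}".strip())
    return "\n".join(annotated)

turns = [("Agent", "Where would you like to travel?"),
         ("Caller", "A ticket to Paris on Friday, please.")]
frames = [{}, {"destination": "Paris", "date": "Friday"}]
print(build_input(turns, frames))
# Agent: Where would you like to travel?
# Caller: A ticket to Paris on Friday, please. [destination=Paris] [date=Friday]
```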
- Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations [13.749495524988774]
We systematically investigate the impact of real-life variations on state-of-the-art dialogue summarization models.
We introduce two types of perturbations: utterance-level perturbations that modify individual utterances with errors and language variations, and dialogue-level perturbations that add non-informative exchanges.
We find that both fine-tuned and instruction-tuned models are affected by input variations, with the latter being more susceptible.
arXiv Detail & Related papers (2023-11-15T05:11:43Z)
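The two perturbation families above can be sketched as below; the concrete edits (an interior-character deletion and a canned filler exchange) are illustrative stand-ins for the paper's error/variation and non-informative-exchange perturbations.

```python
# A rough sketch of utterance-level and dialogue-level perturbations.
import random

def utterance_perturb(utterance: str, rng: random.Random) -> str:
    """Utterance-level: inject a character-level typo into one word."""
    words = utterance.split()
    i = rng.randrange(len(words))
    if len(words[i]) > 3:
        j = rng.randrange(1, len(words[i]) - 1)
        words[i] = words[i][:j] + words[i][j + 1:]  # drop an interior character
    return " ".join(words)

def dialogue_perturb(turns, rng):
    """Dialogue-level: insert a non-informative exchange at a random position."""
    filler = [("A", "Hmm, hold on a second."), ("B", "Sure, no problem.")]
    k = rng.randrange(len(turns) + 1)
    return turns[:k] + filler + turns[k:]

rng = random.Random(0)
turns = [("Amy", "Can you send the report today?"), ("Bob", "Yes, by noon.")]
noisy = [(s, utterance_perturb(u, rng)) for s, u in turns]
print(dialogue_perturb(noisy, rng))
```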
- Improving Factuality of Abstractive Summarization via Contrastive Reward Learning [77.07192378869776]
We propose a simple but effective contrastive learning framework that incorporates recent developments in reward learning and factuality metrics.
Empirical studies demonstrate that the proposed framework enables summarization models to learn from feedback provided by factuality metrics.
arXiv Detail & Related papers (2023-07-10T12:01:18Z)
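A simplified sketch of this kind of contrastive reward learning: rank candidate summaries by a factuality metric, then train the model to order its sequence log-probabilities the same way with a pairwise margin loss (in the style of BRIO). This is one plausible instantiation, not the paper's exact objective.

```python
import torch

def ranking_loss(seq_logprobs: torch.Tensor, fact_scores: torch.Tensor,
                 margin: float = 0.01) -> torch.Tensor:
    """seq_logprobs, fact_scores: (num_candidates,) for one source dialogue."""
    order = torch.argsort(fact_scores, descending=True)
    lp = seq_logprobs[order]  # log-probs sorted from most to least factual
    loss, n = lp.new_zeros(()), lp.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # the more factual candidate i should beat j by a rank-scaled margin
            loss = loss + torch.clamp(margin * (j - i) - (lp[i] - lp[j]), min=0)
    return loss / (n * (n - 1) / 2)

lp = torch.tensor([-1.2, -0.8, -1.5], requires_grad=True)
fact = torch.tensor([0.9, 0.4, 0.7])  # e.g., from an entailment-based metric
print(ranking_loss(lp, fact))
```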
- Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback [57.816210168909286]
We leverage recent progress on textual entailment models to address this problem for abstractive summarization systems.
We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency.
Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
arXiv Detail & Related papers (2023-05-31T21:04:04Z)
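The reference-free entailment reward above can be approximated with an off-the-shelf NLI model, as in the sketch below; the checkpoint choice and the REINFORCE wiring noted in the comments are assumptions.

```python
# A minimal sketch: reward = probability that the source entails the summary.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

nli_tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli").eval()

def entailment_reward(source: str, summary: str) -> float:
    enc = nli_tok(source, summary, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**enc).logits.softmax(-1)[0]
    return probs[2].item()  # roberta-large-mnli: index 2 is "entailment"

src = "Amy: I'll book the flight. Bob: Great, I'll reserve the hotel."
print(entailment_reward(src, "Bob will reserve the hotel."))
# A policy-gradient step would then scale -log p(summary | dialogue) by
# (reward - baseline), as in standard REINFORCE.
```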
- Human-in-the-loop Abstractive Dialogue Summarization [61.4108097664697]
We propose to incorporate different levels of human feedback into the training process.
This guides the models toward the summary behaviors that humans care about.
arXiv Detail & Related papers (2022-12-19T19:11:27Z)
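One simple way to fold such human feedback into training is to use scalar summary ratings as per-example loss weights, as sketched below; the paper's actual mechanism (its feedback levels and how they enter the loss) may well differ.

```python
import torch

def feedback_weighted_loss(token_nll: torch.Tensor,
                           ratings: torch.Tensor) -> torch.Tensor:
    """token_nll: (batch, seq) per-token NLL; ratings: (batch,) in [0, 1]."""
    per_example = token_nll.mean(dim=1)
    weights = 0.5 + ratings  # a rating-1 summary counts 3x a rating-0 one
    return (weights * per_example).sum() / weights.sum()

nll = torch.rand(4, 16)                       # stand-in for model losses
ratings = torch.tensor([1.0, 0.2, 0.8, 0.0])  # stand-in for human ratings
print(feedback_weighted_loss(nll, ratings))
```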
- A Focused Study on Sequence Length for Dialogue Summarization [68.73335643440957]
First, we analyze the length differences between existing models' outputs and the corresponding human references.
Second, we identify salient features for summary length prediction by comparing different model settings.
Third, we experiment with a length-aware summarizer and show notable improvements over existing models when summary length is well incorporated.
arXiv Detail & Related papers (2022-09-24T02:49:48Z)
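A common way to make a summarizer length-aware is to condition it on a length-bucket control token, as in the sketch below; the bucket boundaries and token names are illustrative assumptions, not the paper's exact mechanism.

```python
# A minimal sketch of length control via a prefix token.
def add_length_token(dialogue: str, target_len: int) -> str:
    if target_len < 20:
        bucket = "<len_short>"
    elif target_len < 45:
        bucket = "<len_medium>"
    else:
        bucket = "<len_long>"
    return f"{bucket} {dialogue}"

print(add_length_token("Amy: lunch at noon? Bob: sure, see you then.", 15))
# During training the bucket is derived from the reference summary's length;
# at inference time it becomes a user-controllable knob.
```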
- Dialogue Summarization with Supporting Utterance Flow Modeling and Fact Regularization [58.965859508695225]
We propose an end-to-end neural model for dialogue summarization with two novel modules.
The supporting utterance flow module helps generate a coherent summary by smoothly shifting focus from earlier utterances to later ones.
The fact regularization module encourages the generated summary to be factually consistent with the ground-truth summary during training.
arXiv Detail & Related papers (2021-08-03T03:09:25Z)
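The fact-regularization idea from the last entry can be sketched as an auxiliary training term: penalize sampled summaries that an external scorer finds inconsistent with the ground-truth summary, folded in REINFORCE-style since the score is non-differentiable. This is one plausible reading, not the paper's exact formulation.

```python
import torch

def regularized_loss(mle_loss: torch.Tensor, sample_logprob: torch.Tensor,
                     fact_score: float, lam: float = 0.5) -> torch.Tensor:
    """mle_loss: standard NLL; sample_logprob: log p(sampled summary);
    fact_score: consistency of the sample with the reference, in [0, 1]."""
    # raise the probability of factual samples, lower it for unfactual ones
    penalty = -(fact_score - 0.5) * sample_logprob
    return mle_loss + lam * penalty

mle = torch.tensor(2.3, requires_grad=True)
lp = torch.tensor(-12.0, requires_grad=True)
print(regularized_loss(mle, lp, fact_score=0.9))
```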
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.