PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization
- URL: http://arxiv.org/abs/2307.12371v2
- Date: Fri, 3 May 2024 16:48:50 GMT
- Title: PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization
- Authors: Yongxin Zhou, Fabien Ringeval, François Portet
- Abstract summary: We introduce and assess a set of measures aimed at quantifying the preservation of affective content in dialogue summaries.
Our findings indicate that state-of-the-art summarization models do not preserve affective content well within their summaries.
We demonstrate that careful selection of dialogue samples for the training set can improve the preservation of affective content in the generated summaries.
- Score: 3.875021622948646
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic dialogue summarization is a well-established task with the goal of distilling the most crucial information from human conversations into concise textual summaries. However, most existing research has predominantly focused on summarizing factual information, neglecting the affective content, which can hold valuable insights for analyzing, monitoring, or facilitating human interactions. In this paper, we introduce and assess a set of measures, PSentScore, aimed at quantifying the preservation of affective content in dialogue summaries. Our findings indicate that state-of-the-art summarization models do not preserve affective content well within their summaries. Moreover, we demonstrate that careful selection of dialogue samples for the training set can lead to improved preservation of affective content in the generated summaries, albeit with a minor reduction in content-related metrics.
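To illustrate the kind of measure PSentScore targets, below is a minimal, hypothetical sketch, not the authors' exact formulation: it assumes a generic sentence-level sentiment analyzer (NLTK's VADER, used purely for illustration) and compares the sentiment polarity distribution of the dialogue turns with that of the summary sentences.

```python
# Hypothetical sketch of a PSentScore-style check (NOT the paper's exact method):
# compare sentence-level sentiment polarity distributions of dialogue vs. summary.
from nltk.sentiment import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")


def polarity_distribution(sentences, analyzer):
    """Fraction of positive, neutral, and negative sentences."""
    counts = {"pos": 0, "neu": 0, "neg": 0}
    for s in sentences:
        compound = analyzer.polarity_scores(s)["compound"]
        if compound >= 0.05:
            counts["pos"] += 1
        elif compound <= -0.05:
            counts["neg"] += 1
        else:
            counts["neu"] += 1
    total = max(len(sentences), 1)
    return {k: v / total for k, v in counts.items()}


def sentiment_preservation(dialogue_turns, summary_sentences):
    """Illustrative score: 1 minus the total variation distance between the two
    polarity distributions (1.0 = polarity mix perfectly preserved)."""
    analyzer = SentimentIntensityAnalyzer()
    d = polarity_distribution(dialogue_turns, analyzer)
    s = polarity_distribution(summary_sentences, analyzer)
    return 1.0 - 0.5 * sum(abs(d[k] - s[k]) for k in d)


if __name__ == "__main__":
    dialogue = [
        "I love the new phone!",
        "Really? Mine keeps crashing.",
        "That sounds frustrating.",
    ]
    summary = ["One speaker praises the phone while the other complains about crashes."]
    print(f"sentiment preservation: {sentiment_preservation(dialogue, summary):.2f}")
```

The actual PSentScore measures may operate at a different granularity and with a different sentiment analyzer; the sketch only conveys the idea of comparing affective content between a source dialogue and its summary.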
Related papers
- Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks [0.0]
We propose an exploration of how incorporating task-related information can enhance the summarization process.
Results show that integrating models with task-related information improves summary accuracy, even with varying word error rates.
arXiv Detail & Related papers (2024-09-16T08:15:35Z)
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z)
- SWING: Balancing Coverage and Faithfulness for Dialogue Summarization [67.76393867114923]
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals that encourage the model to generate content from the reference summaries that has not yet been covered.
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z)
- Human-in-the-loop Abstractive Dialogue Summarization [61.4108097664697]
We propose to incorporate different levels of human feedback into the training process.
This enables us to guide the models to capture the behaviors humans care about in summaries.
arXiv Detail & Related papers (2022-12-19T19:11:27Z)
- ED-FAITH: Evaluating Dialogue Summarization on Faithfulness [35.73012379398233]
We first present a systematic study of faithfulness metrics for dialogue summarization.
We observe that most metrics correlate poorly with human judgements despite performing well on news datasets.
We propose T0-Score -- a new metric for faithfulness evaluation.
arXiv Detail & Related papers (2022-11-15T19:33:50Z)
- Analyzing and Evaluating Faithfulness in Dialogue Summarization [67.07947198421421]
We first perform a fine-grained human analysis of the faithfulness of dialogue summaries and observe that over 35% of generated summaries are factually inconsistent with the source dialogues.
We present a new model-level faithfulness evaluation method that examines generation models with multiple-choice questions created by rule-based transformations.
arXiv Detail & Related papers (2022-10-21T07:22:43Z)
- Enhancing Semantic Understanding with Self-supervised Methods for Abstractive Dialogue Summarization [4.226093500082746]
We introduce self-supervised methods to compensate for shortcomings in training a dialogue summarization model.
Our principle is to detect incoherent information flows in pretext dialogue text in order to enhance BERT's ability to contextualize dialogue text representations.
arXiv Detail & Related papers (2022-09-01T07:51:46Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets covering diverse online conversation forms: news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Topic-Oriented Spoken Dialogue Summarization for Customer Service with Saliency-Aware Topic Modeling [61.67321200994117]
In a customer service system, dialogue summarization can boost service efficiency by creating summaries for long spoken dialogues.
In this work, we focus on topic-oriented dialogue summarization, which generates highly abstractive summaries.
We propose a novel topic-augmented two-stage dialogue summarizer (TDS), combined with a saliency-aware neural topic model (SATM), for topic-oriented summarization of customer service dialogues.
arXiv Detail & Related papers (2020-12-14T07:50:25Z)