DialogSum Challenge: Results of the Dialogue Summarization Shared Task
- URL: http://arxiv.org/abs/2208.03898v2
- Date: Tue, 9 Aug 2022 02:28:02 GMT
- Title: DialogSum Challenge: Results of the Dialogue Summarization Shared Task
- Authors: Yulong Chen, Naihao Deng, Yang Liu, Yue Zhang
- Abstract summary: We report the results of the DialogSum Challenge, the shared task on summarizing real-life scenario dialogues at INLG 2022.
Human evaluation across multiple aspects reveals a salient gap between model-generated outputs and human-annotated summaries.
These findings demonstrate the difficulty of dialogue summarization and suggest that more fine-grained evaluation metrics are needed.
- Score: 15.791481299499537
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We report the results of the DialogSum Challenge, the shared task on
summarizing real-life scenario dialogues at INLG 2022. Four teams participated in
this shared task and three submitted system reports, exploring different methods
to improve the performance of dialogue summarization. Although the submissions
improve greatly over the baseline models on automatic evaluation metrics such as
ROUGE scores, human evaluation across multiple aspects reveals a salient gap
between model-generated outputs and human-annotated summaries. These findings
demonstrate the difficulty of dialogue summarization and suggest that more
fine-grained evaluation metrics are needed.
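The abstract contrasts automatic metrics such as ROUGE with human evaluation. To illustrate why surface-overlap metrics can miss the quality gaps that human raters catch, here is a minimal pure-Python sketch of ROUGE-1 F1 (unigram overlap; a simplification, not the official implementation, which also covers ROUGE-2/ROUGE-L and stemming):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it
    # appears in the reference.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

ref = "person1 asks person2 to summarize the meeting notes"
good = "person1 asks person2 to summarize the notes"
bad = "the meeting notes summarize person2"  # meaning flipped, words shared

print(round(rouge1_f1(ref, good), 3))  # 0.933
print(round(rouge1_f1(ref, bad), 3))   # 0.769
```

Note that the meaning-flipped candidate still scores 0.77: high lexical overlap masks a factual error, which is exactly the kind of gap that only fine-grained human evaluation exposes.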
Related papers
- Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks [0.0]
We propose an exploration of how incorporating task-related information can enhance the summarization process.
Results show that integrating models with task-related information improves summary accuracy, even with varying word error rates.
arXiv Detail & Related papers (2024-09-16T08:15:35Z) - ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark [26.100299485985197]
ComperDial consists of human-scored responses for 10,395 dialogue turns in 1,485 conversations collected from 99 dialogue agents.
In addition to single-turn response scores, ComperDial also contains dialogue-level human-annotated scores.
Building off ComperDial, we devise a new automatic evaluation metric to measure the general similarity of model-generated dialogues to human conversations.
arXiv Detail & Related papers (2024-06-17T05:51:04Z) - CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization [7.234196390284036]
This article summarizes the research on Transformer-based abstractive summarization for English dialogues.
We cover the main challenges present in dialogue summarization (i.e., language, structure, comprehension, speaker, salience, and factuality).
We find that while some challenges, like language, have seen considerable progress, others, such as comprehension, factuality, and salience, remain difficult and hold significant research opportunities.
arXiv Detail & Related papers (2024-06-11T17:30:22Z) - Long Dialog Summarization: An Analysis [28.223798877781054]
This work emphasizes the significance of creating coherent and contextually rich summaries for effective communication in various applications.
We explore current state-of-the-art approaches for long dialogue summarization in different domains; benchmark-metric-based evaluations show that no single model performs well across distinct summarization tasks in various areas.
arXiv Detail & Related papers (2024-02-26T19:35:45Z) - Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models [51.75805497456226]
This work focuses on the factual consistency issue with the help of the dialogue summarization task.
Our evaluation shows that, on average, 26.8% of the summaries generated by LLMs contain factual inconsistency.
To stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data.
arXiv Detail & Related papers (2023-11-13T09:32:12Z) - Instructive Dialogue Summarization with Query Aggregations [41.89962538701501]
We introduce instruction-finetuned language models to expand the capability set of dialogue summarization models.
We propose a three-step approach to synthesize high-quality query-based summarization triples.
By training a unified model called InstructDS on three summarization datasets with multi-purpose instructive triples, we expand the capability of dialogue summarization models.
arXiv Detail & Related papers (2023-10-17T04:03:00Z) - Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z) - Analyzing and Evaluating Faithfulness in Dialogue Summarization [67.07947198421421]
We first perform a fine-grained human analysis of the faithfulness of dialogue summaries and observe that over 35% of generated summaries are factually inconsistent with respect to the source dialogues.
We present a new model-level faithfulness evaluation method. It examines generation models with multi-choice questions created by rule-based transformations.
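The rule-based transformations mentioned above can be illustrated with a toy example: swapping the two speaker tags of a reference summary to produce an unfaithful distractor for a multi-choice probe. This is a hypothetical sketch of one such rule, not the paper's actual transformation set; the `#Person1#`/`#Person2#` tags follow the DialogSum speaker convention:

```python
def swap_speakers(summary: str, a: str = "#Person1#", b: str = "#Person2#") -> str:
    """Create an unfaithful distractor by exchanging the two speaker tags.

    A placeholder token guards against the second replace re-matching
    tags that the first replace just produced.
    """
    placeholder = "\x00"
    return summary.replace(a, placeholder).replace(b, a).replace(placeholder, b)

gold = "#Person1# asks #Person2# to book a table for dinner."
distractor = swap_speakers(gold)
print(distractor)  # "#Person2# asks #Person1# to book a table for dinner."
```

A generation model is then probed with the gold summary and such rule-generated distractors as answer options; picking a distractor signals a faithfulness failure.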
arXiv Detail & Related papers (2022-10-21T07:22:43Z) - Is this Dialogue Coherent? Learning from Dialogue Acts and Entities [82.44143808977209]
We create the Switchboard Coherence (SWBD-Coh) corpus, a dataset of human-human spoken dialogues annotated with turn coherence ratings.
Our statistical analysis of the corpus indicates how turn coherence perception is affected by patterns of distribution of entities.
We find that models combining both DA and entity information yield the best performances both for response selection and turn coherence rating.
arXiv Detail & Related papers (2020-06-17T21:02:40Z) - Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually with reasoning over dialogue turns with the help of the back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z) - Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
arXiv Detail & Related papers (2020-04-17T03:51:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.