Human-in-the-loop Abstractive Dialogue Summarization
- URL: http://arxiv.org/abs/2212.09750v1
- Date: Mon, 19 Dec 2022 19:11:27 GMT
- Title: Human-in-the-loop Abstractive Dialogue Summarization
- Authors: Jiaao Chen, Mohan Dodda, Diyi Yang
- Abstract summary: We propose to incorporate different levels of human feedback into the training process.
This enables us to guide the models toward the behaviors humans care about in summaries.
- Score: 61.4108097664697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Abstractive dialogue summarization has received increasing attention
recently. Although most current dialogue summarization systems are trained
to maximize the likelihood of human-written summaries and have achieved
strong results, a large gap remains in producing summaries that humans judge
to be high quality, e.g., coherent and faithful, partly because maximizing
the likelihood of a single human-written summary is misaligned with those
goals. To this end, we propose to incorporate different levels
of human feedback into the training process, enabling us to guide the models
toward the behaviors humans care about in summaries. Specifically,
we ask humans to highlight the salient information to be included in summaries
to provide local feedback, and to make overall comparisons among summaries in
terms of coherence, accuracy, coverage, conciseness, and overall quality as
global feedback. We then combine both local and global feedback to fine-tune
the dialogue summarization policy with Reinforcement Learning. Experiments
conducted on multiple datasets demonstrate the effectiveness and generalization
of our methods over the state-of-the-art supervised baselines, especially in
terms of human judgments.
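The abstract gives no implementation details, but the reward design can be
made concrete with a small sketch. Everything below (function names, the
trade-off weight alpha, the toy scorer) is a hypothetical illustration, not
the authors' code: the local term rewards coverage of human-highlighted
spans, and the global scorer stands in for a preference model trained on the
pairwise comparisons.

```python
# Hypothetical sketch: combine "local" feedback (coverage of human-highlighted
# salient spans) with "global" feedback (a learned preference score over whole
# summaries) into one scalar RL reward. Names and weights are illustrative.
from typing import Callable, List

def local_reward(summary: str, highlighted_spans: List[str]) -> float:
    """Fraction of human-highlighted spans that surface in the summary."""
    if not highlighted_spans:
        return 0.0
    hits = sum(1 for s in highlighted_spans if s.lower() in summary.lower())
    return hits / len(highlighted_spans)

def combined_reward(
    summary: str,
    highlighted_spans: List[str],
    global_scorer: Callable[[str], float],  # reward model from comparisons
    alpha: float = 0.5,                     # assumed trade-off weight
) -> float:
    """Scalar reward for a policy-gradient update on the summarizer."""
    return (alpha * local_reward(summary, highlighted_spans)
            + (1 - alpha) * global_scorer(summary))

# Toy usage with a stand-in scorer that mildly prefers shorter summaries.
def toy_scorer(s: str) -> float:
    return 1.0 / (1.0 + len(s.split()) / 50.0)

print(combined_reward(
    "Alice will email the report to Bob by Friday.",
    highlighted_spans=["report", "by Friday"],
    global_scorer=toy_scorer,
))
```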
Related papers
- Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks [0.0]
We explore how incorporating task-related information can enhance the summarization process.
Results show that integrating models with task-related information improves summary accuracy, even with varying word error rates.
arXiv Detail & Related papers (2024-09-16T08:15:35Z)
- PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization [3.875021622948646]
We introduce and assess a set of measures aimed at quantifying the preservation of affective content in dialogue summaries.
Our findings indicate that state-of-the-art summarization models do not preserve affective content well in their summaries.
We demonstrate that a careful selection of the training set for dialogue samples can lead to improved preservation of affective content in the generated summaries.
arXiv Detail & Related papers (2023-07-23T16:46:01Z)
- GUMSum: Multi-Genre Data and Evaluation for English Abstractive Summarization [10.609715843964263]
Automatic summarization with pre-trained language models has led to impressively fluent results, but is prone to 'hallucinations'.
We present GUMSum, a dataset of English summaries in 12 written and spoken genres for evaluation of abstractive summarization.
arXiv Detail & Related papers (2023-06-20T03:21:10Z)
- Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback [57.816210168909286]
We leverage recent progress on textual entailment models to address this problem for abstractive summarization systems.
We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency.
Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
arXiv Detail & Related papers (2023-05-31T21:04:04Z)
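To make the entailment-reward idea in the entry above concrete, here is a
hedged sketch (the model choice and label handling are assumptions, not the
authors' exact setup): score each generated summary by an off-the-shelf NLI
model's probability that the source dialogue entails it, and use that as the
RL reward.

```python
# Sketch of a reference-free entailment reward: P(entailment) with
# premise=source, hypothesis=summary. Model choice is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # any NLI model with an entailment label works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def entailment_reward(source: str, summary: str) -> float:
    """Entailment probability of the summary given the source dialogue."""
    inputs = tokenizer(source, summary, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)
    ent_idx = model.config.label2id.get("ENTAILMENT", probs.numel() - 1)
    return probs[ent_idx].item()
```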
- SummIt: Iterative Text Summarization via ChatGPT [12.966825834765814]
We propose SummIt, an iterative text summarization framework based on large language models like ChatGPT.
Our framework enables the model to refine the generated summary iteratively through self-evaluation and feedback.
We also conduct a human evaluation to validate the effectiveness of the iterative refinements and identify a potential issue of over-correction.
arXiv Detail & Related papers (2023-05-24T07:40:06Z)
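SummIt's prompts are not reproduced here, so the following skeleton is only
an assumed shape of such a generate-critique-refine loop; `llm` stands in for
any chat-model call, and the DONE convention and the round cap (which also
guards against the over-correction issue noted above) are illustrative.

```python
# Assumed skeleton of an iterative summarize -> self-evaluate -> refine loop.
from typing import Callable

def iterative_summarize(dialogue: str, llm: Callable[[str], str],
                        max_rounds: int = 3) -> str:
    summary = llm(f"Summarize this dialogue:\n{dialogue}")
    for _ in range(max_rounds):
        feedback = llm(
            f"Dialogue:\n{dialogue}\n\nSummary:\n{summary}\n\n"
            "Critique the summary's faithfulness and coverage. "
            "Reply DONE if no changes are needed."
        )
        if feedback.strip().upper().startswith("DONE"):
            break  # stop refining; more rounds risk over-correction
        summary = llm(
            f"Dialogue:\n{dialogue}\n\nSummary:\n{summary}\n\n"
            f"Feedback:\n{feedback}\n\nRewrite the summary accordingly."
        )
    return summary
```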
- Analyzing and Evaluating Faithfulness in Dialogue Summarization [67.07947198421421]
We first perform a fine-grained human analysis of the faithfulness of dialogue summaries and observe that over 35% of generated summaries are unfaithful to the source dialogues.
We also present a new model-level faithfulness evaluation method that examines generation models with multiple-choice questions created by rule-based transformations.
arXiv Detail & Related papers (2022-10-21T07:22:43Z)
- Comparing Methods for Extractive Summarization of Call Centre Dialogue [77.34726150561087]
We experimentally compare several such methods by using them to produce summaries of calls, and evaluating these summaries objectively.
We found that TopicSum and Lead-N outperform the other summarisation methods, whilst BERTSum received comparatively lower scores in both subjective and objective evaluations.
arXiv Detail & Related papers (2022-09-06T13:16:02Z)
- Controllable Abstractive Dialogue Summarization with Sketch Supervision [56.59357883827276]
Our model achieves state-of-the-art performance on SAMSum, the largest dialogue summarization corpus, with a ROUGE-L score as high as 50.79.
arXiv Detail & Related papers (2021-05-28T19:05:36Z)
- Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [66.30909748400023]
We propose to evaluate summary quality without reference summaries via unsupervised contrastive learning.
Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
arXiv Detail & Related papers (2020-10-05T05:04:14Z)
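The contrastive training behind this last metric is beyond a short sketch,
but a crude reference-free baseline in the same spirit can be written down:
compare mean-pooled BERT embeddings of the document and the candidate
summary. This cosine proxy is an assumed stand-in for illustration, not the
paper's learned metric.

```python
# Simplified reference-free scoring: cosine similarity between mean-pooled
# BERT embeddings of document and summary (an assumed proxy, not the paper's
# learned metric).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    """Mean-pooled BERT token embeddings for a single text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

def reference_free_score(document: str, summary: str) -> float:
    """Higher when the summary is semantically close to its source."""
    return torch.cosine_similarity(embed(document), embed(summary), dim=0).item()
```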
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.