Towards Understanding Omission in Dialogue Summarization
- URL: http://arxiv.org/abs/2211.07145v2
- Date: Thu, 11 May 2023 13:26:02 GMT
- Title: Towards Understanding Omission in Dialogue Summarization
- Authors: Yicheng Zou, Kaitao Song, Xu Tan, Zhongkai Fu, Qi Zhang, Dongsheng Li,
Tao Gui
- Abstract summary: Previous works indicated that omission is a major factor affecting summarization quality.
We propose the OLDS dataset, which provides high-quality Omission Labels for Dialogue Summarization.
- Score: 45.932368303107104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue summarization aims to condense a lengthy dialogue into a concise
summary, and has recently achieved significant progress. However, the results of
existing methods are still far from satisfactory. Previous works indicated that
omission is a major factor affecting summarization quality, but few
of them have further explored the omission problem, such as how omission
affects summarization results and how to detect omission, which is critical for
reducing omission and improving summarization quality. Moreover, analyzing and
detecting omission relies on summarization datasets with omission labels (i.e.,
which dialogue utterances are omitted in the summarization), which are not
available in the current literature. In this paper, we propose the OLDS
dataset, which provides high-quality Omission Labels for Dialogue
Summarization. By analyzing this dataset, we find that a large improvement in
summarization quality can be achieved by providing ground-truth omission labels
for the summarization model to recover omission information, which demonstrates
the importance of omission detection for omission mitigation in dialogue
summarization. Therefore, we formulate an omission detection task and
demonstrate our proposed dataset can support the training and evaluation of
this task well. We also call for research action on omission detection based on
our proposed dataset. Our dataset and code are publicly available.
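The abstract defines omission labels as markers of which dialogue utterances are omitted from a summary. As a rough illustration of what such utterance-level labels look like, the sketch below derives them with a simple content-word-coverage heuristic; this is a hypothetical toy procedure for intuition only, not the annotation method used to build OLDS.

```python
# Illustrative sketch: utterance-level omission labels via lexical coverage.
# An utterance is marked omitted (1) if less than `threshold` of its content
# words appear among the summary's content words. Purely a toy heuristic.

def content_words(text):
    stop = {"the", "a", "an", "is", "are", "to", "and", "of", "in", "i", "you"}
    return {w.strip(".,!?:").lower() for w in text.split()} - stop

def omission_labels(utterances, summary, threshold=0.5):
    """Return one 0/1 label per utterance (1 = omitted from the summary)."""
    summ = content_words(summary)
    labels = []
    for utt in utterances:
        words = content_words(utt)
        coverage = len(words & summ) / len(words) if words else 1.0
        labels.append(1 if coverage < threshold else 0)
    return labels

dialogue = [
    "Amy: I booked the flight to Paris for Friday.",
    "Bob: Great, did you also reserve the hotel?",
    "Amy: Not yet, I will do it tomorrow.",
]
summary = "Amy booked a Friday flight to Paris."
print(omission_labels(dialogue, summary))  # -> [0, 1, 1]
```

Here the hotel discussion (utterances 2 and 3) is flagged as omitted, which is the kind of signal an omission detection model would be trained to produce; the paper's actual labels are human-quality annotations rather than lexical overlap.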
Related papers
- CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization [7.234196390284036]
This article summarizes the research on Transformer-based abstractive summarization for English dialogues.
We cover the main challenges present in dialogue summarization (i.e., language, structure, comprehension, speaker, salience, and factuality).
We find that while some challenges, like language, have seen considerable progress, others, such as comprehension, factuality, and salience, remain difficult and hold significant research opportunities.
arXiv Detail & Related papers (2024-06-11T17:30:22Z) - PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization [3.875021622948646]
We introduce and assess a set of measures aimed at quantifying the preservation of affective content in dialogue summaries.
Our findings indicate that state-of-the-art summarization models do not preserve the affective content well within their summaries.
We demonstrate that a careful selection of the training set for dialogue samples can lead to improved preservation of affective content in the generated summaries.
arXiv Detail & Related papers (2023-07-23T16:46:01Z) - Improving Faithfulness of Abstractive Summarization by Controlling
Confounding Effect of Irrelevant Sentences [38.919090721583075]
We show that factual inconsistency can be caused by irrelevant parts of the input text, which act as confounders.
We design a simple multi-task model to control such confounding by leveraging human-annotated relevant sentences when available.
Our approach improves faithfulness scores by 20% over strong baselines on the AnswerSumm (Fabbri et al., 2021) dataset.
arXiv Detail & Related papers (2022-12-19T18:51:06Z) - Making Science Simple: Corpora for the Lay Summarisation of Scientific
Literature [21.440724685950443]
We present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale).
We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets.
arXiv Detail & Related papers (2022-10-18T15:28:30Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive
Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD).
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, achieving performance comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z) - A Review on Fact Extraction and Verification [19.373340472113703]
We study the fact checking problem, which aims to identify the veracity of a given claim.
We focus on the task of Fact Extraction and VERification (FEVER) and its accompanying dataset.
This task is essential and can be the building block of applications such as fake news detection and medical claim verification.
arXiv Detail & Related papers (2020-10-06T20:05:43Z) - Exploiting Unsupervised Data for Emotion Recognition in Conversations [76.01690906995286]
Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations.
The available supervised data for the ERC task is limited.
We propose a novel approach to leverage unsupervised conversation data.
arXiv Detail & Related papers (2020-10-02T13:28:47Z) - A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z) - Improving Multi-Turn Response Selection Models with Complementary
Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.