Semi-Supervised Dialogue Abstractive Summarization via High-Quality
Pseudolabel Selection
- URL: http://arxiv.org/abs/2403.04073v1
- Date: Wed, 6 Mar 2024 22:06:23 GMT
- Title: Semi-Supervised Dialogue Abstractive Summarization via High-Quality
Pseudolabel Selection
- Authors: Jianfeng He, Hang Su, Jason Cai, Igor Shalyminov, Hwanjun Song, Saab
Mansour
- Abstract summary: Semi-supervised dialogue summarization (SSDS) leverages model-generated summaries to reduce reliance on human-labeled data.
We propose a novel scoring approach, SiCF, which encapsulates three primary dimensions of summarization model quality.
- Score: 27.531083525683243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised dialogue summarization (SSDS) leverages model-generated
summaries to reduce reliance on human-labeled data and improve the performance
of summarization models. While addressing label noise, previous works on
semi-supervised learning primarily focus on natural language understanding
tasks, assuming each sample has a unique label. However, these methods are not
directly applicable to SSDS, as it is a generative task, and each dialogue can
be summarized in different ways. In this work, we propose a novel scoring
approach, SiCF, which encapsulates three primary dimensions of summarization
model quality: Semantic invariance (indicative of model confidence), Coverage
(factual recall), and Faithfulness (factual precision). Using the SiCF score,
we select unlabeled dialogues with high-quality generated summaries to train
summarization models. Comprehensive experiments on three public datasets
demonstrate the effectiveness of SiCF scores in uncertainty estimation and
semi-supervised learning for dialogue summarization tasks. Our code is
available at https://github.com/amazon-science/summarization-sicf-score.
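The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that the three per-summary scores (semantic invariance, coverage, faithfulness) have already been computed and normalized to [0, 1]. The abstract does not state how they are aggregated, so the unweighted mean used here is an assumption. The names `PseudoLabeled`, `combined_score`, `select_top`, and `keep_ratio` are hypothetical and do not come from the authors' repository.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PseudoLabeled:
    dialogue: str
    summary: str                # model-generated summary (the pseudolabel)
    semantic_invariance: float  # assumed in [0, 1]; higher = more confident
    coverage: float             # assumed in [0, 1]; factual recall
    faithfulness: float         # assumed in [0, 1]; factual precision


def combined_score(sample: PseudoLabeled) -> float:
    # Assumption: unweighted mean of the three dimensions.
    # The paper's actual aggregation may differ.
    return (sample.semantic_invariance + sample.coverage + sample.faithfulness) / 3.0


def select_top(samples: List[PseudoLabeled], keep_ratio: float) -> List[PseudoLabeled]:
    """Keep the highest-scoring fraction of pseudolabeled dialogues."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not samples:
        return []
    k = max(1, math.ceil(len(samples) * keep_ratio))
    ranked = sorted(samples, key=combined_score, reverse=True)
    return ranked[:k]
```

The retained samples would then be added to the labeled set for the next round of training. A threshold on the score, rather than a fixed ratio, would be an equally plausible design.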
Related papers
- Towards Realistic Zero-Shot Classification via Self Structural Semantic
Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z)
- UMSE: Unified Multi-scenario Summarization Evaluation [52.60867881867428]
Summarization quality evaluation is a non-trivial task in text summarization.
We propose the Unified Multi-scenario Summarization Evaluation Model (UMSE).
UMSE is the first unified summarization evaluation framework that can be used in three evaluation scenarios.
arXiv Detail & Related papers (2023-05-26T12:54:44Z)
- Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot
Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for label mixing up.
arXiv Detail & Related papers (2023-05-22T23:43:23Z)
- SWING: Balancing Coverage and Faithfulness for Dialogue Summarization [67.76393867114923]
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals to encourage the model to generate content in the reference summaries that has not been covered.
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z)
- Evaluating the Factual Consistency of Large Language Models Through News
Summarization [97.04685401448499]
We propose a new benchmark called FIB (Factual Inconsistency Benchmark) that focuses on the task of summarization.
For factually consistent summaries, we use human-written reference summaries that we manually verify as factually consistent.
For factually inconsistent summaries, we generate summaries from a suite of summarization models that we have manually annotated as factually inconsistent.
arXiv Detail & Related papers (2022-11-15T18:50:34Z)
- CEREAL: Few-Sample Clustering Evaluation [4.569028973407756]
We focus on the underexplored problem of estimating clustering quality with limited labels.
We introduce CEREAL, a comprehensive framework for few-sample clustering evaluation.
Our results show that CEREAL reduces the area under the absolute error curve by up to 57% compared to the best sampling baseline.
arXiv Detail & Related papers (2022-09-30T19:52:41Z)
- Improving the Faithfulness of Abstractive Summarization via Entity
Coverage Control [27.214742188672464]
We propose a method to remedy entity-level hallucinations with Entity Coverage Control (ECC).
ECC computes entity coverage precision and prepends the corresponding control code to each training example.
We show that the proposed method leads to more faithful and salient abstractive summarization in supervised fine-tuning and zero-shot settings.
arXiv Detail & Related papers (2022-07-05T18:52:19Z)
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework in which the noise level of the data used for finetuning decreases across stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
- ARMAN: Pre-training with Semantically Selecting and Reordering of
Sentences for Persian Abstractive Summarization [7.16879432974126]
We propose ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives to address this issue.
In ARMAN, salient sentences from a document are selected according to a modified semantic score to be masked and form a pseudo summary.
We show that our proposed model achieves state-of-the-art performance on all six summarization tasks measured by ROUGE and BERTScore.
arXiv Detail & Related papers (2021-09-09T08:35:39Z)