Noisy Self-Knowledge Distillation for Text Summarization
- URL: http://arxiv.org/abs/2009.07032v2
- Date: Tue, 27 Jul 2021 15:38:55 GMT
- Title: Noisy Self-Knowledge Distillation for Text Summarization
- Authors: Yang Liu, Sheng Shen, Mirella Lapata
- Abstract summary: We apply self-knowledge distillation to text summarization which we argue can alleviate problems with maximum-likelihood training.
Our student summarization model is trained with guidance from a teacher which generates smoothed labels to help regularize training.
We demonstrate experimentally on three benchmarks that our framework boosts the performance of both pretrained and non-pretrained summarizers.
- Score: 83.49809205891496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we apply self-knowledge distillation to text summarization
which we argue can alleviate problems with maximum-likelihood training on
single reference and noisy datasets. Instead of relying on one-hot annotation
labels, our student summarization model is trained with guidance from a teacher
which generates smoothed labels to help regularize training. Furthermore, to
better model uncertainty during training, we introduce multiple noise signals
for both teacher and student models. We demonstrate experimentally on three
benchmarks that our framework boosts the performance of both pretrained and
non-pretrained summarizers, achieving state-of-the-art results.
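At a high level, the objective described in the abstract combines the usual maximum-likelihood term on the single reference with a term that pulls the student toward the teacher's smoothed output distribution, while noise is injected during training. The Python sketch below (assuming PyTorch) only illustrates that idea; the function names, the temperature and weighting hyperparameters, and the word-dropout noise are assumptions, not the authors' exact implementation.
```python
# Minimal sketch of a noisy self-knowledge-distillation objective for a
# sequence-to-sequence summarizer. Illustrative only: hyperparameters and
# helper names are assumptions, not the paper's exact recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target_ids,
                      temperature=2.0, alpha=0.5, pad_id=0):
    """Weighted sum of the hard-label NLL on the single reference and the
    KL divergence to the teacher's temperature-smoothed distribution."""
    vocab = student_logits.size(-1)
    # Hard-label term: the standard maximum-likelihood training signal.
    nll = F.cross_entropy(student_logits.view(-1, vocab),
                          target_ids.view(-1), ignore_index=pad_id)
    # Soft-label term: the teacher's smoothed labels regularize the student.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    kl = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
    return alpha * nll + (1.0 - alpha) * kl

def word_dropout(input_ids, unk_id, pad_id=0, p=0.1):
    """One possible input-noise signal: randomly replace non-padding tokens
    with UNK, so teacher and student see independently perturbed inputs."""
    mask = (torch.rand(input_ids.shape, device=input_ids.device) < p) \
           & (input_ids != pad_id)
    return torch.where(mask, torch.full_like(input_ids, unk_id), input_ids)
```
In self-knowledge distillation the teacher is typically a model of the same architecture (for example, an earlier trained checkpoint of the summarizer), so `teacher_logits` would come from a frozen forward pass, and the noise would typically differ between teacher and student (e.g., distinct dropout masks or separately perturbed inputs).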
Related papers
- Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins [29.88235846291593]
Biencoders estimate the relevance of a document to a query by computing the similarity of their respective embeddings.
Current state-of-the-art biencoders are trained using an expensive training regime involving knowledge distillation from a teacher model and batch-sampling.
We propose a novel parameter-free loss function for self-supervision that exploits the pre-trained language modeling capabilities of the encoder model as a training signal.
arXiv Detail & Related papers (2024-07-31T10:33:32Z)
- Learning with Rejection for Abstractive Text Summarization [42.15551472507393]
We propose a training objective for abstractive summarization based on rejection learning.
We show that our method considerably improves the factuality of generated summaries in automatic and human evaluations.
arXiv Detail & Related papers (2023-02-16T19:07:08Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks have been proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning method comprised of two teacher-student networks.
A fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding student fragment, which promotes consistent predictions on each model fragment in the presence of noise; a generic sketch of such a moving-average update appears after this list.
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework in which the noise level of the finetuning data decreases over the stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
- Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning [57.4036085386653]
We show that prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap.
We then show that adding a regularization term that preserves the pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning.
arXiv Detail & Related papers (2021-09-09T10:10:29Z)
- Alleviating Exposure Bias via Contrastive Learning for Abstractive Text Summarization [9.70720105464003]
We propose to leverage contrastive learning to decrease the likelihood of low-quality summaries.
We experimentally demonstrate that our method effectively improves the performance of the state-of-the-art model on different datasets.
arXiv Detail & Related papers (2021-08-26T15:14:44Z)
- Robustness of Accuracy Metric and its Inspirations in Learning with Noisy Labels [51.66448070984615]
We show that maximizing training accuracy on sufficiently many noisy samples yields an approximately optimal classifier.
For validation, we prove that a noisy validation set is reliable, addressing the critical demand of model selection.
We show characterizations of models trained with noisy labels, motivated by our theoretical results, and verify the utility of a noisy validation set.
arXiv Detail & Related papers (2020-12-08T03:37:47Z)
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
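For the distantly-supervised NER entry above, the fine-grained student ensemble updates each fragment of the teacher with a temporal moving average of the corresponding student fragment. A generic exponential-moving-average (EMA) update of that kind is sketched below in Python (assuming PyTorch); the momentum value and the parameter-wise loop are assumptions, not that paper's exact procedure.
```python
# Generic sketch of a temporal-moving-average (EMA) teacher update.
# The momentum value and parameter-wise loop are illustrative assumptions.
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               momentum: float = 0.999) -> None:
    """Move every teacher parameter toward the corresponding student
    parameter: teacher <- momentum * teacher + (1 - momentum) * student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.data.mul_(momentum).add_(s_param.data, alpha=1.0 - momentum)
```
Applying the same rule per module (fragment) rather than over the full parameter list would give the "fine-grained" variant described in that abstract.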
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.