Referee: Reference-Free Sentence Summarization with Sharper
Controllability through Symbolic Knowledge Distillation
- URL: http://arxiv.org/abs/2210.13800v1
- Date: Tue, 25 Oct 2022 07:07:54 GMT
- Title: Referee: Reference-Free Sentence Summarization with Sharper
Controllability through Symbolic Knowledge Distillation
- Authors: Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov, Yejin Choi
- Abstract summary: We present Referee, a novel framework for sentence summarization that can be trained reference-free (i.e., requiring no gold summaries for supervision).
Our work is the first to demonstrate that reference-free, controlled sentence summarization is feasible via the conceptual framework of Symbolic Knowledge Distillation.
- Score: 72.70058049274664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Referee, a novel framework for sentence summarization that can be
trained reference-free (i.e., requiring no gold summaries for supervision),
while allowing direct control for compression ratio. Our work is the first to
demonstrate that reference-free, controlled sentence summarization is feasible
via the conceptual framework of Symbolic Knowledge Distillation (West et al.,
2022), where latent knowledge in pre-trained language models is distilled via
explicit examples sampled from the teacher models, further purified with three
types of filters: length, fidelity, and Information Bottleneck. Moreover, we
uniquely propose iterative distillation of knowledge, where student models from
the previous iteration of distillation serve as teacher models in the next
iteration. Starting off from a relatively modest set of GPT3-generated
summaries, we demonstrate how iterative knowledge distillation can lead to
considerably smaller, but better summarizers with sharper controllability. A
useful by-product of this iterative distillation process is a high-quality
dataset of sentence-summary pairs with varying degrees of compression ratios.
Empirical results demonstrate that the final student models vastly outperform
the much larger GPT3-Instruct model in terms of the controllability of
compression ratios, without compromising the quality of resulting
summarization.
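To make the pipeline described in the abstract concrete, here is a minimal Python sketch of one distillation round. It is a rough illustration rather than the authors' implementation: the teacher sampler (generate_fn) and fidelity scorer (score_fn) are hypothetical stand-ins, and the Information Bottleneck filter is omitted, so only the length and fidelity filters appear. The pairs kept by a round would train the next student, which then serves as the teacher for the following round.

```python
# Rough sketch of one round of Referee-style symbolic knowledge distillation:
# sample candidate summaries from the current teacher, then purify them with
# length and fidelity filters (the Information Bottleneck filter is omitted).
# generate_fn and score_fn are hypothetical stand-ins, not the paper's code.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pair:
    sentence: str
    summary: str


def length_filter(pair: Pair, target_ratio: float, tol: float = 0.1) -> bool:
    """Keep pairs whose compression ratio is close to the requested ratio."""
    ratio = len(pair.summary.split()) / max(len(pair.sentence.split()), 1)
    return abs(ratio - target_ratio) <= tol


def fidelity_filter(pair: Pair, score_fn: Callable[[str, str], float],
                    threshold: float = 0.8) -> bool:
    """Keep pairs whose summary the scorer judges faithful to the source."""
    return score_fn(pair.sentence, pair.summary) >= threshold


def distill_iteration(sentences: List[str],
                      generate_fn: Callable[[str], List[str]],
                      score_fn: Callable[[str, str], float],
                      target_ratio: float) -> List[Pair]:
    """Sample from the teacher and keep only purified sentence-summary pairs.
    The kept pairs train the next student, which becomes the next teacher."""
    kept = []
    for sentence in sentences:
        for summary in generate_fn(sentence):
            pair = Pair(sentence, summary)
            if length_filter(pair, target_ratio) and fidelity_filter(pair, score_fn):
                kept.append(pair)
    return kept


# Toy usage with stand-in callables (the real pipeline starts from GPT-3 samples).
demo = distill_iteration(
    ["the quick brown fox jumps over the lazy dog near the river bank"],
    generate_fn=lambda s: [" ".join(s.split()[:5])],  # hypothetical teacher
    score_fn=lambda src, summ: 1.0,                   # hypothetical fidelity scorer
    target_ratio=0.5,
)
```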
Related papers
- Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins [29.88235846291593]
Biencoders estimate the relevance of a document to a query by calculating the similarity of their respective embeddings.
Current state-of-the-art biencoders are trained using an expensive training regime involving knowledge distillation from a teacher model and batch-sampling.
We propose a novel parameter-free loss function for self-supervision that exploits the pre-trained language modeling capabilities of the encoder model as a training signal.
arXiv Detail & Related papers (2024-07-31T10:33:32Z)
- Enhancing Abstractiveness of Summarization Models through Calibrated Distillation [30.199051061633803]
DisCal is a novel approach to enhance the level of abstractiveness without sacrificing informativeness.
Our experiments show that DisCal outperforms prior methods in abstractive summarization distillation.
arXiv Detail & Related papers (2023-10-20T18:43:49Z)
- Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing [59.58984194238254]
We present Impossible Distillation, a novel framework for paraphrasing and sentence summarization.
Unlike prior works that rely on an extreme-scale teacher model, we hypothesize and verify the paraphrastic proximity intrinsic to pre-trained LMs, i.e., that paraphrases occupy a nearby subspace of the LM distribution.
By identifying and distilling generations from these subspaces, Impossible Distillation produces a high-quality dataset and model even from GPT2-scale LMs.
arXiv Detail & Related papers (2023-05-26T05:19:24Z)
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers [49.79405257763856]
This paper focuses on task-agnostic distillation.
It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints.
We propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning.
arXiv Detail & Related papers (2023-02-19T17:37:24Z)
- Pre-trained Summarization Distillation [121.14806854092672]
Recent work on distilling BERT for classification and regression tasks shows strong performance using direct knowledge distillation.
Alternatively, machine translation practitioners distill using pseudo-labeling, where a small model is trained on the translations of a larger model.
A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning (a minimal sketch of SFT follows the related-papers list below).
arXiv Detail & Related papers (2020-10-24T23:15:43Z)
- Noisy Self-Knowledge Distillation for Text Summarization [83.49809205891496]
We apply self-knowledge distillation to text summarization, which we argue can alleviate problems with maximum-likelihood training.
Our student summarization model is trained with guidance from a teacher which generates smoothed labels to help regularize training (a generic soft-label loss of this kind is sketched after this list).
We demonstrate experimentally on three benchmarks that our framework boosts the performance of both pretrained and non-pretrained summarizers.
arXiv Detail & Related papers (2020-09-15T12:53:09Z)
- Why distillation helps: a statistical perspective [69.90148901064747]
Knowledge distillation is a technique for improving the performance of a simple "student" model by training it on the label distributions produced by a larger "teacher" model.
While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help?
We show how distillation complements existing negative mining techniques for extreme multiclass retrieval.
arXiv Detail & Related papers (2020-05-21T01:49:51Z)
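Because the 'shrink and fine-tune' entry above describes its mechanism concretely, here is a minimal, hedged sketch of the idea in PyTorch; the toy Encoder class, layer sizes, and keep_every schedule are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of 'shrink and fine-tune' (SFT): initialize a smaller student
# by copying a subset of the teacher's layers (weights included), then fine-tune
# it as usual. The toy Encoder stack below is a hypothetical stand-in.

import copy

import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Stand-in for a pre-trained transformer layer stack."""

    def __init__(self, layers: nn.ModuleList):
        super().__init__()
        self.layers = layers

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


def shrink(teacher: Encoder, keep_every: int = 2) -> Encoder:
    """Copy every `keep_every`-th teacher layer into a new, smaller student."""
    kept = [copy.deepcopy(layer)
            for i, layer in enumerate(teacher.layers) if i % keep_every == 0]
    return Encoder(nn.ModuleList(kept))


# A 12-layer "teacher" shrunk to a 6-layer student, ready for ordinary fine-tuning.
teacher = Encoder(nn.ModuleList([nn.Linear(16, 16) for _ in range(12)]))
student = shrink(teacher, keep_every=2)
assert len(student.layers) == 6
```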
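Similarly, for the smoothed teacher labels mentioned in the Noisy Self-Knowledge Distillation entry, the sketch below shows a generic soft-label distillation loss (cross-entropy on gold labels mixed with a temperature-scaled KL term toward the teacher); the temperature and mixing weight are illustrative assumptions, not the authors' exact objective.

```python
# Generic soft-label distillation loss: mix ordinary cross-entropy with a KL
# term that pulls the student toward the teacher's smoothed distribution.
# Temperature and alpha are illustrative defaults, not the paper's values.

import torch
import torch.nn.functional as F


def soft_label_distillation_loss(student_logits: torch.Tensor,  # (batch, vocab)
                                 teacher_logits: torch.Tensor,  # (batch, vocab)
                                 hard_labels: torch.Tensor,     # (batch,) gold ids
                                 temperature: float = 2.0,
                                 alpha: float = 0.5) -> torch.Tensor:
    ce = F.cross_entropy(student_logits, hard_labels)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl


# Shapes-only usage with random tensors.
loss = soft_label_distillation_loss(torch.randn(4, 100), torch.randn(4, 100),
                                    torch.randint(0, 100, (4,)))
```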
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.