Does Pretraining for Summarization Require Knowledge Transfer?
- URL: http://arxiv.org/abs/2109.04953v1
- Date: Fri, 10 Sep 2021 15:54:15 GMT
- Title: Does Pretraining for Summarization Require Knowledge Transfer?
- Authors: Kundan Krishna, Jeffrey Bigham and Zachary C. Lipton
- Abstract summary: We show that models pretrained on character n-grams selected at random can nearly match the performance of models pretrained on real corpora.
This work holds the promise of eliminating upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright issues.
- Score: 27.297137706355173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretraining techniques leveraging enormous datasets have driven recent
advances in text summarization. While folk explanations suggest that knowledge
transfer accounts for pretraining's benefits, little is known about why it
works or what makes a pretraining task or dataset suitable. In this paper, we
challenge the knowledge transfer story, showing that by pretraining on documents
consisting of character n-grams selected at random, we can nearly match the
performance of models pretrained on real corpora. This work holds the promise
of eliminating upstream corpora, which may alleviate some concerns over
offensive language, bias, and copyright issues. To see whether the small
residual benefit of using real data could be accounted for by the structure of
the pretraining task, we design several tasks motivated by a qualitative study
of summarization corpora. However, these tasks confer no appreciable benefit,
leaving open the possibility of a small role for knowledge transfer.
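As a rough sketch of how such a synthetic upstream corpus might be generated (the alphabet, vocabulary size, n-gram length, and document sizes below are illustrative assumptions, not the paper's exact recipe):

```python
import random
import string

def random_ngram_corpus(n_docs=1000, ngrams_per_doc=200,
                        vocab_size=500, n=3, seed=0):
    """Build synthetic documents from character n-grams selected at random.

    A fixed 'vocabulary' of random n-grams is drawn once; each document is
    then a random sequence of those n-grams. All sizes are illustrative.
    """
    rng = random.Random(seed)
    chars = string.ascii_lowercase + " "
    vocab = ["".join(rng.choices(chars, k=n)) for _ in range(vocab_size)]
    return [" ".join(rng.choices(vocab, k=ngrams_per_doc))
            for _ in range(n_docs)]

corpus = random_ngram_corpus()
print(corpus[0][:60])  # nonsense text, e.g. "qzt oa kfj ..."
```

Documents like these carry no real-world knowledge, which is what makes the comparison against pretraining on real corpora informative.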
Related papers
- Is forgetting less a good inductive bias for forward transfer? [7.704064306361941]
We argue that the measure of forward transfer to a task should not be affected by the restrictions placed on the continual learner.
Instead, forward transfer should be measured by how easy it is to learn a new task given a set of representations produced by continual learning on previous tasks.
Our results indicate that less forgetful representations lead to better forward transfer, suggesting a strong correlation between retaining past information and learning efficiency on new tasks.
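A minimal sketch of this measurement, assuming a PyTorch-style frozen encoder and standard data loaders (the linear-probe protocol here is an illustrative proxy, not necessarily the paper's exact setup):

```python
import torch
from torch import nn

def forward_transfer_score(encoder, train_loader, test_loader,
                           feat_dim, n_classes, epochs=5, lr=1e-2):
    """Proxy for forward transfer: freeze the representations produced by
    continual learning, train only a linear probe on the new task, and
    report its test accuracy."""
    for p in encoder.parameters():
        p.requires_grad_(False)
    probe = nn.Linear(feat_dim, n_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            nn.functional.cross_entropy(probe(encoder(x)), y).backward()
            opt.step()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            correct += (probe(encoder(x)).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```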
arXiv Detail & Related papers (2023-03-14T19:52:09Z) - Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
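For instance, a small helper that freezes only the first few blocks of a pretrained backbone (assuming the feature extractor is an nn.Sequential of blocks; sweeping n_frozen over the depth recovers the protocols compared in the paper, with the full depth being the classic frozen-extractor baseline):

```python
from torch import nn

def defrost(model: nn.Sequential, n_frozen: int) -> nn.Sequential:
    """Freeze the first n_frozen blocks of a feature extractor and leave
    the remaining blocks trainable."""
    for i, block in enumerate(model):
        for p in block.parameters():
            p.requires_grad_(i >= n_frozen)  # frozen iff block index < n_frozen
    return model

# e.g. keep only the first two blocks frozen instead of the whole extractor:
# backbone = defrost(pretrained_backbone, n_frozen=2)
```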
arXiv Detail & Related papers (2023-03-02T17:32:11Z) - Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative
Priors [59.93972277761501]
We show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches.
This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks.
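A minimal sketch of the reuse step, assuming the source-task posterior has been summarized as a diagonal Gaussian over parameters (a simplification for illustration; mu and var are assumed to map parameter names to tensors):

```python
import torch

def informative_prior_penalty(model, mu, var, scale=1.0):
    """Negative log-density (up to a constant) of the downstream weights
    under a diagonal Gaussian prior learned on the source task. Added to
    the downstream task loss, it pulls training toward the source posterior."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + ((p - mu[name]) ** 2 / var[name]).sum()
    return 0.5 * scale * penalty

# downstream objective: loss = task_loss + informative_prior_penalty(model, mu, var)
```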
arXiv Detail & Related papers (2022-05-20T16:19:30Z) - Knowledge Distillation as Efficient Pre-training: Faster Convergence,
Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
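One way such a scheme can look in code, with feature matching as the distillation objective (the MSE loss and the linear projection used to align dimensions are assumptions for this sketch; the method's exact loss may differ):

```python
import torch
from torch import nn

def kd_pretrain_step(student, teacher, proj, x, opt):
    """One pre-training step: match the student's features to those of a
    frozen pre-trained teacher, with no labels required."""
    teacher.eval()
    with torch.no_grad():
        target = teacher(x)  # teacher features, no gradient
    loss = nn.functional.mse_loss(proj(student(x)), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```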
arXiv Detail & Related papers (2022-03-10T06:23:41Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
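Parrot's prior is a flow-based model; as a much simpler stand-in, the sketch below fits a Gaussian policy by maximum likelihood to (observation, action) pairs from successful trials, just to illustrate the "pretrain a behavioral prior" step:

```python
import torch
from torch import nn

class BehavioralPrior(nn.Module):
    """Gaussian policy used as a crude behavioral prior (illustrative only;
    not Parrot's actual flow-based architecture)."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * act_dim))

    def forward(self, obs):
        mu, log_std = self.net(obs).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_std.exp())

def bc_loss(prior, obs, act):
    # maximize the likelihood of actions observed in successful trials
    return -prior(obs).log_prob(act).sum(-1).mean()
```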
arXiv Detail & Related papers (2020-11-19T18:47:40Z) - Towards All-around Knowledge Transferring: Learning From Task-irrelevant
Labels [44.036667329736225]
Existing efforts mainly focus on transferring task-relevant knowledge from other similar data to tackle the issue.
To date, no large-scale studies have been performed to investigate the impact of task-irrelevant features.
We propose Task-Irrelevant Transfer Learning to exploit task-irrelevant features, which are mainly extracted from task-irrelevant labels.
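A sketch of one plausible architecture for this idea: a shared encoder with the main task head plus an auxiliary head supervised by task-irrelevant labels (an illustrative reading, not necessarily the paper's exact design):

```python
import torch
from torch import nn

class TwoHeadModel(nn.Module):
    def __init__(self, encoder, feat_dim, n_task, n_irrelevant):
        super().__init__()
        self.encoder = encoder
        self.task_head = nn.Linear(feat_dim, n_task)       # target-task labels
        self.aux_head = nn.Linear(feat_dim, n_irrelevant)  # task-irrelevant labels

    def forward(self, x):
        z = self.encoder(x)
        return self.task_head(z), self.aux_head(z)

def joint_loss(task_logits, aux_logits, y_task, y_irrelevant, alpha=0.3):
    ce = nn.functional.cross_entropy
    # the auxiliary term lets task-irrelevant labels shape the shared features
    return ce(task_logits, y_task) + alpha * ce(aux_logits, y_irrelevant)
```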
arXiv Detail & Related papers (2020-11-17T06:43:58Z) - What is being transferred in transfer learning? [51.6991244438545]
We show that when training from pre-trained weights, the model stays in the same basin in the loss landscape, and that different instances of such a model are similar in feature space and close in parameter space.
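The "same basin" claim is typically checked by evaluating the loss along the straight line between two solutions; a flat, barrier-free path indicates a shared basin. A minimal sketch, assuming two models with identical architecture and a standard loss and loader:

```python
import copy
import torch

@torch.no_grad()
def interpolation_losses(model_a, model_b, loss_fn, loader, steps=11):
    """Loss along the line (1-t)*w_a + t*w_b; no rise in the middle is
    evidence the two solutions lie in the same basin."""
    sa, sb = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for t in torch.linspace(0, 1, steps):
        probe.load_state_dict({k: (1 - t) * sa[k] + t * sb[k] for k in sa})
        total, n = 0.0, 0
        for x, y in loader:
            total += loss_fn(probe(x), y).item() * y.numel()
            n += y.numel()
        losses.append(total / n)
    return losses
```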
arXiv Detail & Related papers (2020-08-26T17:23:40Z) - Continual Class Incremental Learning for CT Thoracic Segmentation [36.45569352490318]
Deep learning organ segmentation approaches require large amounts of annotated training data, which is limited in supply due to reasons of confidentiality and the time required for expert manual annotation.
Being able to train models incrementally without having access to previously used data is desirable.
In this setting, a model learns a new task effectively, but loses performance on previously learned tasks.
The Learning without Forgetting (LwF) approach addresses this issue by replaying the model's own predictions for past tasks during training.
We show that LwF can successfully retain knowledge on previous segmentations; however, its ability to learn a new class decreases with the…
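For reference, the LwF objective pairs a standard loss on the new class with a distillation term toward the model's own recorded predictions for old tasks, so no past data is needed. A minimal sketch for classification (the paper applies the idea to segmentation; the temperature and weighting below are the usual LwF choices):

```python
import torch
from torch import nn

def lwf_loss(new_logits, old_logits_now, old_logits_recorded, y_new,
             T=2.0, lam=1.0):
    """Cross-entropy on the new task plus KL distillation that keeps the
    current outputs for old tasks close to the model's own past predictions."""
    ce = nn.functional.cross_entropy(new_logits, y_new)
    p_old = nn.functional.softmax(old_logits_recorded / T, dim=1)
    log_p = nn.functional.log_softmax(old_logits_now / T, dim=1)
    kd = nn.functional.kl_div(log_p, p_old, reduction="batchmean") * T * T
    return ce + lam * kd
```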
arXiv Detail & Related papers (2020-08-12T20:08:39Z) - Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
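A sketch of how such embeddings can be used at selection time: rank candidate source tasks by cosine similarity to the target task's embedding (how the embeddings themselves are computed is abstracted away here):

```python
import numpy as np

def rank_source_tasks(source_embeddings, source_names, target_embedding):
    """Return source tasks sorted by cosine similarity to the target task,
    a simple proxy for predicted transferability."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = [(cos(e, target_embedding), name)
              for e, name in zip(source_embeddings, source_names)]
    return sorted(ranked, reverse=True)
```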
arXiv Detail & Related papers (2020-05-02T09:39:36Z)