Improving Representational Continuity via Continued Pretraining
- URL: http://arxiv.org/abs/2302.13289v1
- Date: Sun, 26 Feb 2023 10:39:38 GMT
- Title: Improving Representational Continuity via Continued Pretraining
- Authors: Michael Sun, Ananya Kumar, Divyam Madaan and Percy Liang
- Abstract summary: A method from the transfer learning community (LP-FT) outperforms naive training and other continual learning methods.
LP-FT also reduces forgetting on a real-world satellite remote sensing dataset (FMoW).
A variant of LP-FT achieves state-of-the-art accuracy on an NLP continual learning benchmark.
- Score: 76.29171039601948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the continual representation learning setting: sequentially
pretrain a model $M'$ on tasks $T_1, \ldots, T_T$, and then adapt $M'$ on a
small amount of data from each task $T_i$ to check if it has forgotten
information from old tasks. Under a kNN adaptation protocol, prior work shows
that continual learning methods improve forgetting over naive training (SGD).
In reality, practitioners do not use kNN classifiers -- they use the adaptation
method that works best (e.g., fine-tuning) -- here, we find that strong
continual learning baselines do worse than naive training. Interestingly, we
find that a method from the transfer learning community (LP-FT) outperforms
naive training and the other continual learning methods. Even with standard kNN
evaluation protocols, LP-FT performs comparably with strong continual learning
methods (while being simpler and requiring less memory) on three standard
benchmarks: sequential CIFAR-10, CIFAR-100, and TinyImageNet. LP-FT also
reduces forgetting in a real world satellite remote sensing dataset (FMoW), and
a variant of LP-FT gets state-of-the-art accuracies on an NLP continual
learning benchmark.
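The LP-FT recipe named in the abstract trains in two phases: first a linear probe (the backbone is frozen and only the new head is trained), then full fine-tuning from that probed initialization. A minimal, hypothetical sketch on a toy scalar model, assuming squared loss and plain gradient descent; the paper applies this to deep networks, and all names and values here are illustrative only:

```python
# Toy LP-FT sketch: "backbone" and "head" are single scalar weights,
# standing in for a pretrained feature extractor and a linear classifier.

def predict(w_backbone, w_head, x):
    return w_head * (w_backbone * x)  # head applied to backbone features

def mse(w_b, w_h, data):
    return sum((predict(w_b, w_h, x) - y) ** 2 for x, y in data) / len(data)

def train(w_b, w_h, data, lr=0.01, steps=500, freeze_backbone=False):
    for _ in range(steps):
        gb = gh = 0.0
        for x, y in data:
            err = predict(w_b, w_h, x) - y
            gh += 2 * err * (w_b * x) / len(data)  # d(loss)/d(head)
            gb += 2 * err * (w_h * x) / len(data)  # d(loss)/d(backbone)
        w_h -= lr * gh
        if not freeze_backbone:  # linear probing keeps the backbone fixed
            w_b -= lr * gb
    return w_b, w_h

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy task: y = 2x
w_b, w_h = 1.0, 0.0                          # "pretrained" backbone, fresh head

# Phase 1 (LP): fit only the head on frozen backbone features.
w_b, w_h = train(w_b, w_h, data, freeze_backbone=True)
# Phase 2 (FT): fine-tune all parameters from the probed initialization.
w_b, w_h = train(w_b, w_h, data)
```

The intuition, per the transfer learning literature, is that probing first gives the head a reasonable starting point, so the subsequent fine-tuning phase distorts the pretrained features less than fine-tuning from a randomly initialized head.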
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for
Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, replaying stored data is often impractical due to memory constraints or data privacy concerns.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training [20.98770732015944]
Few-shot intent detection involves training a deep learning model to classify utterances based on their underlying intents using only a small amount of labeled data.
We show that continual pre-training may not be essential, since the overfitting problem of PLMs on this task may not be as serious as expected.
To maximize the utilization of the limited available data, we propose a context augmentation method and leverage sequential self-distillation to boost performance.
arXiv Detail & Related papers (2023-06-08T15:26:52Z) - Revisiting k-NN for Fine-tuning Pre-trained Language Models [25.105882538429743]
We revisit k-Nearest-Neighbor (kNN) classifiers for augmenting PLM-based classifiers.
At the heart of our approach is the implementation of kNN-calibrated training, which treats predicted results as indicators for easy versus hard examples.
We conduct extensive experiments on fine-tuning, prompt-tuning paradigms and zero-shot, few-shot and fully-supervised settings.
arXiv Detail & Related papers (2023-04-18T15:28:47Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z) - Pushing the Limits of Simple Pipelines for Few-Shot Learning: External
Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z) - Deep learning for inverse problems with unknown operator [0.0]
In inverse problems where the forward operator $T$ is unknown, we have access to training data consisting of functions $f_i$ and their noisy images $Tf_i$.
We propose a new method that requires minimal assumptions on the data, and prove reconstruction rates that depend on the number of training points and the noise level.
arXiv Detail & Related papers (2021-08-05T17:21:12Z) - RNN Training along Locally Optimal Trajectories via Frank-Wolfe
Algorithm [50.76576946099215]
We propose a novel and efficient training method for RNNs that iteratively seeks a local minimum on the loss surface within a small region.
Surprisingly, even with this additional per-iteration cost, the overall training cost is empirically observed to be lower than that of standard back-propagation.
arXiv Detail & Related papers (2020-10-12T01:59:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.