Recyclable Tuning for Continual Pre-training
- URL: http://arxiv.org/abs/2305.08702v1
- Date: Mon, 15 May 2023 15:05:44 GMT
- Title: Recyclable Tuning for Continual Pre-training
- Authors: Yujia Qin, Cheng Qian, Xu Han, Yankai Lin, Huadong Wang, Ruobing Xie,
Zhiyuan Liu, Maosong Sun, and Jie Zhou
- Abstract summary: Continual pre-training is the paradigm where pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and gradually get upgraded.
We contend that proper algorithms for recycling outdated adapted weights should be developed.
We show that the proposed initialization-based and distillation-based methods can be combined to achieve better performance.
- Score: 98.51583779792031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual pre-training is the paradigm where pre-trained language models
(PLMs) continually acquire fresh knowledge from growing data and gradually get
upgraded. Before an upgraded PLM is released, we may have tuned the original
PLM for various tasks and stored the adapted weights. However, when tuning the
upgraded PLM, these outdated adapted weights will typically be ignored and
discarded, causing a potential waste of resources. We bring this issue to the
forefront and contend that proper algorithms for recycling outdated adapted
weights should be developed. To this end, we formulate the task of recyclable
tuning for continual pre-training. In pilot studies, we find that after
continual pre-training, the upgraded PLM remains compatible with the outdated
adapted weights to some extent. Motivated by this finding, we analyze the
connection between continually pre-trained PLMs from two novel aspects, i.e.,
mode connectivity and functional similarity. Based on the corresponding
findings, we propose both an initialization-based method and a
distillation-based method for our task. We demonstrate their feasibility in
improving the convergence and performance for tuning the upgraded PLM. We also
show that both methods can be combined to achieve better performance. The
source codes are publicly available at
https://github.com/thunlp/RecyclableTuning.
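To make the two strategies named in the abstract concrete, below is a minimal sketch of initialization-based recycling (copying compatible outdated adapted weights into the upgraded PLM before tuning) and distillation-based recycling (using the outdated adapted model as a teacher during tuning). The function names, the HuggingFace-style `.logits` interface, and hyperparameters such as `kd_weight` and `temperature` are illustrative assumptions, not the authors' released implementation; see the repository above for the official code.
```python
# Sketch only: assumes PyTorch models with a HuggingFace-style forward()
# returning an object with a .logits attribute. Not the authors' code.
import torch
import torch.nn.functional as F

def init_from_outdated_weights(upgraded_plm, old_adapted_state):
    """Initialization-based recycling: start tuning the upgraded PLM from the
    outdated adapted weights wherever parameter names and shapes still match."""
    new_state = upgraded_plm.state_dict()
    for name, weight in old_adapted_state.items():
        if name in new_state and new_state[name].shape == weight.shape:
            new_state[name] = weight.clone()
    upgraded_plm.load_state_dict(new_state)
    return upgraded_plm

def distillation_step(upgraded_plm, old_adapted_model, batch, optimizer,
                      kd_weight=0.5, temperature=2.0):
    """Distillation-based recycling: the outdated adapted model serves as a
    teacher whose soft predictions regularize tuning of the upgraded PLM."""
    upgraded_plm.train()
    old_adapted_model.eval()

    student_logits = upgraded_plm(**batch["inputs"]).logits
    with torch.no_grad():
        teacher_logits = old_adapted_model(**batch["inputs"]).logits

    # Standard task loss on labeled data plus a temperature-scaled KD term.
    task_loss = F.cross_entropy(student_logits, batch["labels"])
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    loss = task_loss + kd_weight * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
The two pieces can also be combined, as the abstract suggests: initialize the upgraded PLM (or its adapter) from the outdated weights, then tune with the distillation objective.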
Related papers
- Meta-Learning Adaptable Foundation Models [37.458141335750696]
We introduce a meta-learning framework infused with PEFT in an intermediate retraining stage to learn a model that can be easily adapted to unseen tasks.
In this setting, we demonstrate the suboptimality of standard retraining for finding an adaptable set of parameters.
We then apply these theoretical insights to retraining the RoBERTa model to predict the continuation of conversations within the ConvAI2 dataset.
arXiv Detail & Related papers (2024-10-29T17:24:18Z) - Pruning Foundation Models for High Accuracy without Retraining [48.256389781305415]
It is challenging to deploy foundation models or large language models (LLMs) due to their massive parameters and computations.
Post-training pruning methods are proposed to prune LLMs in one-shot without retraining.
Our experiments demonstrate the superior performance of the proposed methods in comparison to SOTA baselines.
arXiv Detail & Related papers (2024-10-21T01:23:34Z) - LLaCA: Multimodal Large Language Continual Assistant [59.585544987096974]
Multimodal Continual Instruction Tuning (MCIT) is adopted to continually instruct MLLMs to follow human intent in sequential datasets.
Existing gradient updates severely degrade tuning performance on previously seen datasets.
We propose a method called Multimodal Large Language Continual Assistant (LLaCA) to address the challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z) - CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning.
We exploit Variational Autoencoders to learn class-conditioned distributions.
We show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities.
arXiv Detail & Related papers (2024-07-22T16:51:28Z) - ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [50.45155830888697]
We develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tree search MCTS* for collecting higher-quality reasoning traces as well as per-step value to train policy and reward models.
We first show that the tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same search budget.
arXiv Detail & Related papers (2024-06-06T07:40:00Z) - On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code [12.708117108874083]
Pre-trained language models (PLMs) have become a prevalent technique in deep learning for code.
We study two widely used PLM architectures on two downstream tasks, API call prediction and API usage prediction.
To address these issues, we implement five continual learning approaches, including replay-based and regularization-based methods.
arXiv Detail & Related papers (2023-05-06T18:00:21Z) - Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
arXiv Detail & Related papers (2023-03-13T17:59:02Z) - Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves a 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
arXiv Detail & Related papers (2022-04-15T17:19:41Z) - JEM++: Improved Techniques for Training JEM [1.5533842336139065]
Joint Energy-based Model (JEM) is a recently proposed hybrid model that retains the strong discriminative power of modern CNN classifiers.
We propose a variety of new training procedures and architecture features to improve JEM's accuracy, training stability, and speed altogether.
arXiv Detail & Related papers (2021-09-19T00:17:46Z)