HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning
- URL: http://arxiv.org/abs/2407.05229v1
- Date: Sun, 7 Jul 2024 01:50:25 GMT
- Title: HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning
- Authors: Liyuan Wang, Jingyi Xie, Xingxing Zhang, Hang Su, Jun Zhu
- Abstract summary: We propose a unified framework for continual learning (CL) with pre-trained models (PTMs) and parameter-efficient tuning (PET).
We present Hierarchical Decomposition PET (HiDe-PET), an innovative approach that explicitly optimizes the decomposed objective by incorporating task-specific and task-shared knowledge.
Our approach demonstrates remarkably superior performance over a broad spectrum of recent strong baselines.
- Score: 55.88910947643436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The deployment of pre-trained models (PTMs) has greatly advanced the field of continual learning (CL), enabling positive knowledge transfer and resilience to catastrophic forgetting. To sustain these advantages for sequentially arriving tasks, a promising direction involves keeping the pre-trained backbone frozen while employing parameter-efficient tuning (PET) techniques to instruct representation learning. Despite the popularity of Prompt-based PET for CL, its empirical design often leads to sub-optimal performance in our evaluation of different PTMs and target tasks. To this end, we propose a unified framework for CL with PTMs and PET that provides both theoretical and empirical advancements. We first perform an in-depth theoretical analysis of the CL objective in a pre-training context, decomposing it into hierarchical components namely within-task prediction, task-identity inference and task-adaptive prediction. We then present Hierarchical Decomposition PET (HiDe-PET), an innovative approach that explicitly optimizes the decomposed objective through incorporating task-specific and task-shared knowledge via mainstream PET techniques along with efficient recovery of pre-trained representations. Leveraging this framework, we delve into the distinct impacts of implementation strategy, PET technique and PET architecture, as well as adaptive knowledge accumulation amidst pronounced distribution changes. Finally, across various CL scenarios, our approach demonstrates remarkably superior performance over a broad spectrum of recent strong baselines.
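The decomposed objective suggests a compact implementation pattern: train a lightweight task-specific module and head on top of the frozen backbone (within-task prediction), keep statistics of the pre-trained representation to recover the task identity at test time (task-identity inference), and route each input through the inferred task's module (task-adaptive prediction). The PyTorch sketch below illustrates this structure only; it is not the authors' implementation, and the `FrozenBackbone` stub, the `TaskAdapter` module, and the nearest-mean inference rule are illustrative placeholders.

```python
import torch
import torch.nn as nn


class FrozenBackbone(nn.Module):
    """Stand-in for a frozen pre-trained encoder (in practice, e.g., a pre-trained ViT)."""
    def __init__(self, in_dim=3 * 32 * 32, dim=768):
        super().__init__()
        self.encoder = nn.Linear(in_dim, dim)   # placeholder for the real PTM
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        return self.encoder(x.flatten(1))


class TaskAdapter(nn.Module):
    """Lightweight task-specific PET module on top of the frozen representation."""
    def __init__(self, dim=768, rank=16):
        super().__init__()
        self.down, self.up = nn.Linear(dim, rank), nn.Linear(rank, dim)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))    # residual adaptation


class HierarchicalCL:
    def __init__(self, dim=768):
        self.dim = dim
        self.backbone = FrozenBackbone(dim=dim)
        self.adapters, self.heads = {}, {}   # task-specific knowledge
        self.class_means = {}                # (task_id, class_id) -> feature mean, for TII

    def fit_task(self, task_id, loader, num_classes, epochs=1, lr=1e-3):
        """Within-task prediction: only the PET module and head are trained."""
        adapter, head = TaskAdapter(self.dim), nn.Linear(self.dim, num_classes)
        opt = torch.optim.Adam(list(adapter.parameters()) + list(head.parameters()), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                loss = nn.functional.cross_entropy(head(adapter(self.backbone(x))), y)
                opt.zero_grad()
                loss.backward()
                opt.step()
        self.adapters[task_id], self.heads[task_id] = adapter, head
        # Store per-class means of the frozen features for task-identity inference
        # (last batch wins for repeated classes; a running mean would be used in practice).
        with torch.no_grad():
            for x, y in loader:
                h = self.backbone(x)
                for c in y.unique():
                    self.class_means[(task_id, int(c))] = h[y == c].mean(0)

    @torch.no_grad()
    def predict(self, x):
        h = self.backbone(x)                                  # shared frozen features
        keys = list(self.class_means)
        means = torch.stack([self.class_means[k] for k in keys])
        nearest = torch.cdist(h, means).argmin(dim=1)         # task-identity inference
        outputs = []
        for h_i, idx in zip(h, nearest):
            t = keys[int(idx)][0]                             # inferred task identity
            outputs.append(self.heads[t](self.adapters[t](h_i)))   # task-adaptive prediction
        return outputs
```

In practice the frozen encoder would be a real PTM, and the adapter slot could be filled by any mainstream PET technique (prompt, prefix, LoRA, adapter), which is the design space the paper explores.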
Related papers
- SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models [26.484208658326857]
Continual learning aims to incrementally acquire new concepts in data streams while resisting forgetting previous knowledge.
With the rise of powerful pre-trained models (PTMs), there is a growing interest in training incremental learning systems.
arXiv Detail & Related papers (2024-11-04T15:34:30Z)
- Enhancing Cross-domain Pre-Trained Decision Transformers with Adaptive Attention [10.631495275246428]
Cross-domain pre-training of decision transformers (DT) has attracted significant attention in offline reinforcement learning (Offline RL).
Motivated by our analysis, we propose a general method, GPT-DTMA, which equips a pre-trained DT with a Mixture of Attention (MoA).
Experiments demonstrate that GPT-DTMA achieves superior performance in short-term environments compared to baselines, and in long-term environments, it mitigates the negative impact caused by Markov Matrix.
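The summary does not spell out how the Mixture of Attention is wired into the pre-trained DT; one generic reading, sketched below purely for illustration (an assumption, not GPT-DTMA's actual architecture), is a set of parallel attention modules whose outputs are mixed by a learned, token-wise gate.

```python
import torch
import torch.nn as nn


class MixtureOfAttention(nn.Module):
    """Generic mixture of parallel attention experts with a learned gate."""
    def __init__(self, dim=256, num_heads=4, num_experts=3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)   # token-wise mixing weights

    def forward(self, x):                         # x: (batch, seq, dim)
        outs = torch.stack([attn(x, x, x)[0] for attn in self.experts], dim=-1)
        w = torch.softmax(self.gate(x), dim=-1).unsqueeze(-2)   # (batch, seq, 1, experts)
        return (outs * w).sum(-1)                 # weighted sum of expert outputs
```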
arXiv Detail & Related papers (2024-09-11T03:18:34Z)
- See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
Parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads.
We take the first step to unify all approaches by dissecting them from a decomposition perspective.
We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
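As a concrete instance of the decomposition perspective, a LoRA-style method keeps the pre-trained weight frozen and learns only a low-rank additive term. The sketch below is a generic illustration of this idea, not one of the two PEFT methods introduced in the paper.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """y = base(x) + scale * x @ A^T @ B^T, with the pre-trained base kept frozen."""
    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)   # frozen pre-trained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))   # zero init: update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```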
arXiv Detail & Related papers (2024-07-07T15:44:42Z)
- Towards a General Framework for Continual Learning with Pre-training [55.88910947643436]
We present a general framework for continual learning of sequentially arrived tasks with the use of pre-training.
We decompose its objective into three hierarchical components, including within-task prediction, task-identity inference, and task-adaptive prediction.
We propose an innovative approach to explicitly optimize these components with parameter-efficient fine-tuning (PEFT) techniques and representation statistics.
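One common way to realize the "representation statistics" ingredient is to fit per-class Gaussians over the frozen features and score test features against them to infer the task. The sketch below shows that idea under this assumption; the `GaussianTaskInference` class and its scoring rule are illustrative, not the paper's exact procedure.

```python
import torch


class GaussianTaskInference:
    """Per-class Gaussian statistics over frozen features, used to infer task identity."""
    def __init__(self, eps=1e-3):
        self.stats = {}                  # (task_id, class_id) -> (mean, covariance)
        self.eps = eps

    def update(self, task_id, class_id, feats):
        """feats: (n, dim) frozen features for one class of one task."""
        mean = feats.mean(0)
        centered = feats - mean
        cov = centered.T @ centered / max(len(feats) - 1, 1)
        cov = cov + self.eps * torch.eye(feats.shape[1])   # regularize for invertibility
        self.stats[(task_id, class_id)] = (mean, cov)

    def infer_task(self, feat):
        """feat: (dim,) frozen feature of one test input; returns the inferred task id."""
        def log_density(mean, cov):
            diff = feat - mean
            return float(-0.5 * (diff @ torch.linalg.solve(cov, diff) + torch.logdet(cov)))
        scores = {key: log_density(*params) for key, params in self.stats.items()}
        return max(scores, key=scores.get)[0]   # task of the highest-scoring class
```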
arXiv Detail & Related papers (2023-10-21T02:03:38Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- A Unified Continual Learning Framework with General Parameter-Efficient Tuning [56.250772378174446]
"Pre-training $rightarrow$ downstream adaptation" presents both new opportunities and challenges for Continual Learning.
We position prompting as one instantiation of PET and propose a unified CL framework, dubbed Learning-Accumulation-Ensemble (LAE).
PET, e.g., using Adapter, LoRA, or Prefix, can adapt a pre-trained model to downstream tasks with fewer parameters and resources.
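Based only on the name and this summary, one plausible reading of the learn-accumulate-ensemble recipe is sketched below: an online PET module is trained on the current task, slowly accumulated into an offline copy, and the two experts are ensembled at inference. The EMA accumulation and max-logit ensembling here are assumptions for illustration, not necessarily LAE's exact rules.

```python
import copy
import torch


def init_offline(online_pet):
    """The offline expert starts as a copy of the online one (before the first task)."""
    return copy.deepcopy(online_pet)


def accumulate(offline_pet, online_pet, momentum=0.99):
    """EMA-style accumulation of online PET parameters into the offline copy."""
    with torch.no_grad():
        for p_off, p_on in zip(offline_pet.parameters(), online_pet.parameters()):
            p_off.mul_(momentum).add_(p_on, alpha=1.0 - momentum)


def ensemble_logits(backbone, head, online_pet, offline_pet, x):
    """Combine the two experts' predictions (here: element-wise max of logits)."""
    with torch.no_grad():
        logits_online = head(online_pet(backbone(x)))
        logits_offline = head(offline_pet(backbone(x)))
    return torch.maximum(logits_online, logits_offline)
```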
arXiv Detail & Related papers (2023-03-17T15:52:45Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these meta-learning guarantees improve upon PAC-Bayesian per-task learning bounds.
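For context, the per-task baseline that such meta-learning guarantees are measured against is the classical single-task PAC-Bayesian bound, stated here in its standard McAllester/Maurer form (for a bounded loss, a prior $P$ fixed before seeing the $m$-sample training set $S$, and confidence $1-\delta$); this is the textbook bound, not the paper's meta-level result:

$$
\mathbb{E}_{h\sim Q}\big[\mathcal{L}(h)\big] \;\le\; \mathbb{E}_{h\sim Q}\big[\hat{\mathcal{L}}_S(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}} \quad \text{for all posteriors } Q,\ \text{w.p.} \ge 1-\delta .
$$

A meta-learning bound of the kind discussed above additionally learns the prior $P$ from previously observed tasks (via a hyper-posterior), which is where any improvement over the per-task bound would come from.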
arXiv Detail & Related papers (2022-11-14T08:51:04Z)