Hierarchical Decomposition of Prompt-Based Continual Learning:
Rethinking Obscured Sub-optimality
- URL: http://arxiv.org/abs/2310.07234v1
- Date: Wed, 11 Oct 2023 06:51:46 GMT
- Title: Hierarchical Decomposition of Prompt-Based Continual Learning:
Rethinking Obscured Sub-optimality
- Authors: Liyuan Wang, Jingyi Xie, Xingxing Zhang, Mingyi Huang, Hang Su, Jun
Zhu
- Abstract summary: Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
- Score: 55.88910947643436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt-based continual learning is an emerging direction in leveraging
pre-trained knowledge for downstream continual learning, and has almost reached
the performance pinnacle under supervised pre-training. However, our empirical
research reveals that the current strategies fall short of their full potential
under the more realistic self-supervised pre-training, which is essential for
handling vast quantities of unlabeled data in practice. This is largely due to
the difficulty of incorporating task-specific knowledge into instructed
representations via prompt parameters and then predicting it from uninstructed
representations at test time. To overcome the exposed sub-optimality, we
conduct a theoretical analysis of the continual learning objective in the
context of pre-training, and decompose it into hierarchical components:
within-task prediction, task-identity inference, and task-adaptive prediction.
Following these empirical and theoretical insights, we propose Hierarchical
Decomposition (HiDe-)Prompt, an innovative approach that explicitly optimizes
the hierarchical components with an ensemble of task-specific prompts and
statistics of both uninstructed and instructed representations, coordinated by a
contrastive regularization strategy. Our extensive
experiments demonstrate the superior performance of HiDe-Prompt and its
robustness to pre-training paradigms in continual learning (e.g., up to 15.01%
and 9.61% lead on Split CIFAR-100 and Split ImageNet-R, respectively). Our code
is available at \url{https://github.com/thu-ml/HiDe-Prompt}.
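The hierarchical decomposition above suggests a simple two-stage inference
procedure: infer the task identity of a test sample from its uninstructed
representation using stored per-task statistics, then attach the corresponding
task-specific prompt and perform within-task prediction on the instructed
representation. The minimal PyTorch-style sketch below illustrates that idea
only; the interface (backbone, task_prompts, task_means, heads) and the
nearest-mean rule are illustrative assumptions, not the official implementation
in the repository above.

import torch

# Hypothetical sketch of the two-stage inference (task-identity inference
# followed by within-task prediction) implied by the hierarchical
# decomposition; NOT the official HiDe-Prompt implementation.
@torch.no_grad()
def hierarchical_predict(x, backbone, task_prompts, task_means, heads):
    # x            : input batch of shape (B, ...)
    # backbone     : frozen pre-trained encoder; assumed to return uninstructed
    #                features for prompt=None and instructed features otherwise
    # task_prompts : {task_id: learned prompt tensor}, one per task
    # task_means   : {task_id: mean uninstructed feature of that task, shape (D,)}
    # heads        : {task_id: classification head (torch.nn.Module)}
    task_order = sorted(task_prompts)

    # 1) Task-identity inference from uninstructed representations.
    z_un = backbone(x, prompt=None)                                # (B, D)
    means = torch.stack([task_means[t] for t in task_order])       # (T, D)
    inferred = torch.cdist(z_un, means).argmin(dim=-1)             # (B,)

    # 2) Within-task prediction with the inferred task-specific prompt.
    outputs = []
    for i in range(x.shape[0]):
        t = task_order[inferred[i].item()]
        z_in = backbone(x[i:i + 1], prompt=task_prompts[t])        # instructed
        outputs.append(heads[t](z_in))
    return outputs, inferred

In the full method, richer statistics of both uninstructed and instructed
representations and a contrastive regularization strategy replace the
nearest-mean rule used here as a stand-in.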
Related papers
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z)
- Towards a General Framework for Continual Learning with Pre-training [55.88910947643436]
We present a general framework for continual learning of sequentially arriving tasks with the use of pre-training.
We decompose its objective into three hierarchical components, including within-task prediction, task-identity inference, and task-adaptive prediction.
We propose an innovative approach to explicitly optimize these components with parameter-efficient fine-tuning (PEFT) techniques and representation statistics.
arXiv Detail & Related papers (2023-10-21T02:03:38Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Reward-Predictive Clustering [20.82575016038573]
We provide a clustering algorithm that enables the application of reward-predictive state abstractions to deep learning settings.
A convergence theorem and simulations show that the resulting reward-predictive deep network maximally compresses the agent's inputs.
arXiv Detail & Related papers (2022-11-07T03:13:26Z)
- Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [108.13378788663196]
We propose Subspace Prompt Tuning (SubPT) to project the gradients in back-propagation onto the low-rank subspace spanned by the early-stage gradient flow eigenvectors during the entire training process (a generic sketch of this kind of gradient projection is given after this list).
We equip CoOp with Novel Learner Feature (NFL) to enhance the generalization ability of the learned prompts onto novel categories beyond the training set.
arXiv Detail & Related papers (2022-11-04T02:06:22Z)
- Explaining, Evaluating and Enhancing Neural Networks' Learned Representations [2.1485350418225244]
We show how explainability can be an aid, rather than an obstacle, towards better and more efficient representations.
We employ such attributions to define two novel scores for evaluating the informativeness and the disentanglement of latent embeddings.
We show that adopting our proposed scores as constraints during the training of a representation learning task improves the downstream performance of the model.
arXiv Detail & Related papers (2022-02-18T19:00:01Z)
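To make the gradient-projection idea in the SubPT entry above concrete, the
following is a generic sketch of projecting gradients onto the low-rank
subspace spanned by the dominant directions of early-stage gradients. The
function names and the SVD-based construction are illustrative assumptions,
not the SubPT authors' implementation.

import torch

# Hypothetical illustration of low-rank gradient projection: build a basis from
# early-stage gradients, then project subsequent gradients onto its span.
def gradient_subspace(early_grads, k):
    # early_grads: (num_steps, num_params) matrix of flattened early gradients;
    # the top-k right singular vectors span the dominant gradient subspace.
    _, _, vh = torch.linalg.svd(early_grads, full_matrices=False)
    return vh[:k]                         # (k, num_params), orthonormal rows

def project_gradient(grad, basis):
    # Project a flattened gradient of shape (num_params,) onto span(basis).
    return basis.t() @ (basis @ grad)

In a prompt-tuning loop, one would collect flattened prompt gradients over the
first few epochs, build the basis once, and then replace each later gradient
with its projection before the optimizer step.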