Provable Contrastive Continual Learning
- URL: http://arxiv.org/abs/2405.18756v1
- Date: Wed, 29 May 2024 04:48:11 GMT
- Title: Provable Contrastive Continual Learning
- Authors: Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang
- Abstract summary: We establish theoretical performance guarantees, which reveal how the performance of the model is bounded by training losses of previous tasks.
Inspired by our theoretical analysis of these guarantees, we propose a novel contrastive continual learning algorithm called CILA.
Our method shows great improvement on standard benchmarks and achieves new state-of-the-art performance.
- Score: 7.6989463205452555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill this gap by establishing theoretical performance guarantees, which reveal how the performance of the model is bounded by training losses of previous tasks in the contrastive continual learning framework. Our theoretical explanations further support the idea that pre-training can benefit continual learning. Inspired by our theoretical analysis of these guarantees, we propose a novel contrastive continual learning algorithm called CILA, which uses adaptive distillation coefficients for different tasks. These distillation coefficients are easily computed by the ratio between average distillation losses and average contrastive losses from previous tasks. Our method shows great improvement on standard benchmarks and achieves new state-of-the-art performance.
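The coefficient computation described above admits a compact illustration. Below is a minimal, self-contained Python sketch of the ratio-based coefficient and the combined objective; the function names, default value, and example numbers are assumptions for exposition, not the paper's reference implementation.

```python
def adaptive_coefficient(prev_distill_losses, prev_contrastive_losses, default=1.0):
    """Ratio of the average distillation loss to the average contrastive
    loss over previous tasks; falls back to `default` when there is no
    history yet (e.g. on the first task)."""
    if not prev_distill_losses or not prev_contrastive_losses:
        return default
    avg_distill = sum(prev_distill_losses) / len(prev_distill_losses)
    avg_contrastive = sum(prev_contrastive_losses) / len(prev_contrastive_losses)
    return avg_distill / avg_contrastive

def total_loss(contrastive_loss, distillation_loss, coefficient):
    """Training objective: the contrastive loss plus the adaptively
    weighted distillation loss."""
    return contrastive_loss + coefficient * distillation_loss

# Example: losses recorded on previous tasks determine the current weight.
kappa = adaptive_coefficient([0.8, 0.7], [1.6, 1.4])            # 0.5
loss = total_loss(contrastive_loss=1.5, distillation_loss=0.9,
                  coefficient=kappa)                            # 1.95
```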
Related papers
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple reparameterization, which we call Normalize-and-Project.
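A minimal sketch of the idea, assuming PyTorch and assuming the projection targets the initial norms of the normalized layers' weights (both assumptions, not the paper's implementation): because normalization makes a layer invariant to the scale of its weights, rescaling the weights after each optimizer step removes the implicit learning-rate decay caused by norm growth.

```python
import torch

def project_to_fixed_norm(weights, target_norms):
    """Rescale each weight tensor back to a fixed target norm after an
    optimizer step. With normalization layers the rescaling leaves the
    network function unchanged, so the explicit learning-rate schedule is
    no longer silently decayed by parameter-norm growth. Which tensors to
    project and the choice of target norm are assumptions here."""
    with torch.no_grad():
        for param, target in zip(weights, target_norms):
            norm = param.norm()
            if norm > 0:
                param.mul_(target / norm)

# Usage sketch (model and optimizer are placeholders):
#   weights = [p for p in model.parameters() if p.dim() > 1]
#   target_norms = [p.norm().item() for p in weights]
#   ... loss.backward(); optimizer.step()
#   project_to_fixed_norm(weights, target_norms)
```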
arXiv Detail & Related papers (2024-07-01T20:58:01Z) - Contrastive Continual Learning with Importance Sampling and Prototype-Instance Relation Distillation [14.25441464051506]
We propose Contrastive Continual Learning via Importance Sampling (CCLIS) to preserve knowledge by recovering previous data distributions.
We also present the Prototype-instance Relation Distillation (PRD) loss, a technique designed to maintain the relationship between prototypes and sample representations.
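A hedged sketch of what a prototype-instance relation distillation term could look like, assuming dot-product similarities, a temperature, and a KL divergence; the exact formulation used by CCLIS may differ.

```python
import torch.nn.functional as F

def prototype_relation_distillation(new_features, old_features,
                                    new_prototypes, old_prototypes,
                                    temperature=0.1):
    """Hedged sketch: distill the instance-to-prototype similarity
    distribution of the previous model into the current one, preserving
    the relation between prototypes and sample representations.
    The similarity, temperature, and divergence are assumptions."""
    # Similarity of each sample to each prototype, under new and old models.
    new_logits = new_features @ new_prototypes.t() / temperature
    old_logits = old_features @ old_prototypes.t() / temperature
    # Match the current distribution to the frozen previous one.
    return F.kl_div(F.log_softmax(new_logits, dim=1),
                    F.softmax(old_logits.detach(), dim=1),
                    reduction="batchmean")
```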
arXiv Detail & Related papers (2024-03-07T15:47:52Z) - Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms existing federated learning approaches by large margins on standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z) - The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework [21.494759678807686]
We propose a new weakly supervised learning framework for knowledge distillation in video classification.
Our approach leverages the concept of substage-based learning to distill knowledge based on the combination of student substages and the correlation of corresponding substages.
Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
arXiv Detail & Related papers (2023-07-11T12:10:42Z) - Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z) - On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
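The summary does not spell out AKD's formulation; the sketch below shows one common way to pair knowledge distillation with adversarial training, namely a weighted sum of the hard-label loss and a softened KL term on adversarial inputs, with the weighting and temperature as assumptions.

```python
import torch.nn.functional as F

def distillation_robust_loss(student_logits_adv, teacher_logits_adv, labels,
                             alpha=0.5, temperature=4.0):
    """Hedged sketch of pairing knowledge distillation with adversarial
    training: cross-entropy on adversarial inputs plus a softened KL term
    matching the teacher on the same inputs. The weighting, temperature,
    and single-teacher setup are assumptions, not AKD's exact recipe."""
    hard = F.cross_entropy(student_logits_adv, labels)
    soft = F.kl_div(F.log_softmax(student_logits_adv / temperature, dim=1),
                    F.softmax(teacher_logits_adv / temperature, dim=1),
                    reduction="batchmean") * temperature ** 2
    return (1 - alpha) * hard + alpha * soft
```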
arXiv Detail & Related papers (2022-03-14T15:02:13Z) - The Power of Contrast for Feature Learning: A Theoretical Analysis [42.20116348668721]
We show that contrastive learning outperforms standard autoencoders and generative adversarial networks.
We also illustrate the impact of labeled data in supervised contrastive learning.
arXiv Detail & Related papers (2021-10-06T03:10:28Z) - Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks.
We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z) - About contrastive unsupervised representation learning for classification and its convergence [2.6782615615913343]
We build a theoretical framework around contrastive learning in which guarantees for its performance can be proven.
We provide extensions of these results to training with multiple negative samples and for multiway classification.
We also provide convergence guarantees for the minimization of the contrastive training error with gradient descent of an overparametrized deep neural encoder.
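For reference, a standard contrastive objective with multiple negative samples (the setting these extensions address) takes the following InfoNCE-style form; the exact loss and similarity analyzed in the paper may differ:

```latex
% Standard contrastive loss with k negative samples (InfoNCE-style);
% the precise objective analyzed in the paper may differ in its details.
\[
\mathcal{L}(f) = \mathbb{E}_{x,\, x^{+},\, x^{-}_{1},\dots,x^{-}_{k}}
\left[ -\log
\frac{\exp\!\bigl(f(x)^{\top} f(x^{+})\bigr)}
     {\exp\!\bigl(f(x)^{\top} f(x^{+})\bigr)
      + \sum_{i=1}^{k} \exp\!\bigl(f(x)^{\top} f(x^{-}_{i})\bigr)} \right]
\]
```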
arXiv Detail & Related papers (2020-12-02T10:08:57Z)