On the Effectiveness of Lipschitz-Driven Rehearsal in Continual Learning
- URL: http://arxiv.org/abs/2210.06443v1
- Date: Wed, 12 Oct 2022 17:45:13 GMT
- Title: On the Effectiveness of Lipschitz-Driven Rehearsal in Continual Learning
- Authors: Lorenzo Bonicelli and Matteo Boschini and Angelo Porrello and Concetto
Spampinato and Simone Calderara
- Abstract summary: Repeated optimization on a small pool of data inevitably leads to tight and unstable decision boundaries.
We propose Lipschitz-DrivEn Rehearsal (LiDER), a surrogate objective that induces smoothness in the backbone network.
By means of extensive experiments, we show that applying LiDER delivers a stable performance gain to several state-of-the-art rehearsal CL methods.
- Score: 17.179898279925155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rehearsal approaches enjoy immense popularity with Continual Learning (CL)
practitioners. These methods collect samples from previously encountered data
distributions in a small memory buffer; subsequently, they repeatedly optimize
on the latter to prevent catastrophic forgetting. This work draws attention to
a hidden pitfall of this widespread practice: repeated optimization on a small
pool of data inevitably leads to tight and unstable decision boundaries, which
are a major hindrance to generalization. To address this issue, we propose
Lipschitz-DrivEn Rehearsal (LiDER), a surrogate objective that induces
smoothness in the backbone network by constraining its layer-wise Lipschitz
constants w.r.t. replay examples. By means of extensive experiments, we show
that applying LiDER delivers a stable performance gain to several
state-of-the-art rehearsal CL methods across multiple datasets, both in the
presence and absence of pre-training. Through additional ablative experiments,
we highlight peculiar aspects of buffer overfitting in CL and better
characterize the effect produced by LiDER. Code is available at
https://github.com/aimagelab/LiDER
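To make the surrogate objective concrete, below is a minimal sketch of how such a penalty could be wired into a rehearsal step. It is not the authors' implementation (see the repository above); it assumes a backbone that exposes its intermediate activations and uses the empirical expansion of pairwise distances between consecutive feature maps of a replay batch as a rough proxy for the layer-wise Lipschitz constants.

```python
import torch

def lipschitz_surrogate(features, eps=1e-8):
    """Hedged proxy for a layer-wise Lipschitz penalty on replay examples.

    `features` lists the backbone's intermediate activations on a batch of
    buffer samples, ordered from shallow to deep. For each pair of consecutive
    layers, the ratio of pairwise output distances to pairwise input distances
    is an empirical expansion factor; penalizing its mean discourages sharp,
    unstable mappings around the rehearsed data.
    """
    penalty = features[0].new_zeros(())
    for f_in, f_out in zip(features[:-1], features[1:]):
        x, y = f_in.flatten(1), f_out.flatten(1)
        dx = torch.cdist(x, x) + eps          # pairwise input distances
        dy = torch.cdist(y, y)                # pairwise output distances
        off_diag = ~torch.eye(x.size(0), dtype=torch.bool, device=x.device)
        penalty = penalty + (dy / dx)[off_diag].mean()
    return penalty

# hypothetical training step (names are placeholders, not the LiDER API):
# feats = backbone.forward_with_features(buffer_batch)
# loss = task_loss + replay_loss + alpha * lipschitz_surrogate(feats)
```

The paper formulates the constraint through layer-wise Lipschitz constants estimated from the buffer itself; the ratio-based proxy above only conveys the overall shape of the combined objective.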
Related papers
- May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels [16.262555459431155]
We introduce Alternate Experience Replay (AER), which takes advantage of forgetting to maintain a clear distinction between clean, complex, and noisy samples in the memory buffer.
We demonstrate the effectiveness of our approach in terms of both accuracy and purity of the obtained buffer, resulting in a remarkable average gain of 4.71 percentage points in accuracy with respect to existing loss-based purification strategies.
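For reference, the loss-based purification strategies that AER is compared against typically follow a small-loss criterion; a minimal, generic sketch (hypothetical names, not the AER mechanism, which instead exploits forgetting) is:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def small_loss_filter(model, buffer_x, buffer_y, keep_ratio=0.5):
    """Keep the buffer samples with the smallest loss, treating them as
    (probably) clean. Generic small-loss baseline, not AER."""
    losses = F.cross_entropy(model(buffer_x), buffer_y, reduction="none")
    k = max(1, int(keep_ratio * losses.numel()))
    keep = torch.topk(losses, k, largest=False).indices
    return buffer_x[keep], buffer_y[keep]
```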
arXiv Detail & Related papers (2024-08-26T14:09:40Z)
- IMEX-Reg: Implicit-Explicit Regularization in the Function Space for Continual Learning [17.236861687708096]
Continual learning (CL) remains one of the long-standing challenges for deep neural networks due to catastrophic forgetting of previously acquired knowledge.
Inspired by how humans learn using strong inductive biases, we propose IMEX-Reg to improve the generalization performance of experience rehearsal in CL under low buffer regimes.
arXiv Detail & Related papers (2024-04-28T12:25:09Z)
- BECLR: Batch Enhanced Contrastive Few-Shot Learning [1.450405446885067]
Unsupervised few-shot learning aspires to bridge this gap by discarding the reliance on annotations at training time.
We propose a novel Dynamic Clustered mEmory (DyCE) module to promote a highly separable latent representation space.
We then tackle the somewhat overlooked yet critical issue of sample bias at the few-shot inference stage.
arXiv Detail & Related papers (2024-02-04T10:52:43Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization [59.038366742773164]
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards satisfying the Karush-Kuhn-Tucker (KKT) conditions for margin maximization.
In this work we establish a number of settings where the satisfaction of these conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks.
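For context, the implicit-bias result behind the title concerns the hard-margin problem $\min_w \tfrac{1}{2}\|w\|_2^2$ subject to $y_i\, w^\top x_i \ge 1$ for all $i$; its KKT conditions ask for multipliers $\lambda_i \ge 0$ such that $w = \sum_i \lambda_i y_i x_i$, feasibility $y_i\, w^\top x_i \ge 1$, and complementary slackness $\lambda_i (y_i\, w^\top x_i - 1) = 0$. This is the standard formulation for the linear case; the paper's analysis also covers two-layer leaky ReLU networks.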
arXiv Detail & Related papers (2023-03-02T18:24:26Z)
- Efficiently Computing Local Lipschitz Constants of Neural Networks via Bound Propagation [79.13041340708395]
Lipschitz constants are connected to many properties of neural networks, such as robustness, fairness, and generalization.
Existing methods for computing Lipschitz constants either produce relatively loose upper bounds or are limited to small networks.
We develop an efficient framework for computing the $\ell_\infty$ local Lipschitz constant of a neural network by tightly upper bounding the norm of the Clarke Jacobian.
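For contrast with the tight bound propagation above, the naive bound it improves upon is easy to sketch: for a feed-forward ReLU network, the product of the induced $\infty$-norms of the weight matrices upper bounds the global (hence every local) $\ell_\infty$ Lipschitz constant. The helper below assumes a plain `torch.nn.Sequential` of `Linear` and `ReLU` layers and is far looser than the paper's method.

```python
import torch.nn as nn

def naive_linf_lipschitz_bound(model: nn.Sequential) -> float:
    """Loose global upper bound on the l_inf Lipschitz constant:
    product of the induced inf-norms (max absolute row sums) of the
    linear layers; ReLU is 1-Lipschitz and contributes a factor of 1."""
    bound = 1.0
    for layer in model:
        if isinstance(layer, nn.Linear):
            bound *= layer.weight.abs().sum(dim=1).max().item()
    return bound

# example:
# net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
# print(naive_linf_lipschitz_bound(net))
```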
arXiv Detail & Related papers (2022-10-13T22:23:22Z)
- Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks [77.82638674792292]
Lipschitz constants of neural networks allow for guarantees of robustness in image classification, safety in controller design, and generalizability beyond the training data.
As calculating Lipschitz constants is NP-hard, techniques for estimating Lipschitz constants must navigate the trade-off between scalability and accuracy.
In this work, we significantly push the scalability frontier of a semidefinite programming technique known as LipSDP while achieving zero accuracy loss.
arXiv Detail & Related papers (2022-04-02T11:57:52Z)
- An Investigation of Replay-based Approaches for Continual Learning [79.0660895390689]
Continual learning (CL) is a major challenge of machine learning (ML) and describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
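Since rehearsal is the common thread of this list, a minimal replay buffer based on reservoir sampling (the standard way to keep an unbiased sample of the stream under a fixed memory budget) is sketched below; it is a generic illustration, not the protocol of any specific paper above.

```python
import random

class ReservoirBuffer:
    """Fixed-size memory filled by reservoir sampling, so every example
    seen so far has the same probability of being stored."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = []      # stored (x, y) pairs
        self.seen = 0       # number of stream examples observed so far

    def add(self, x, y):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = (x, y)

    def sample(self, batch_size: int):
        return random.sample(self.data, min(batch_size, len(self.data)))
```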
arXiv Detail & Related papers (2021-08-15T15:05:02Z)
- Enhancing Mixup-based Semi-Supervised Learning with Explicit Lipschitz Regularization [5.848916882288327]
Semi-supervised learning (SSL) mitigates the scarcity of labeled data by exploiting the behavior of the neural function on large amounts of unlabeled data.
A successful example is the adoption of mixup strategy in SSL that enforces the global smoothness of the neural function.
We propose that mixup improves the smoothness of the neural function by bounding the Lipschitz constant of the gradient function of the neural networks.
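The mixup operation whose smoothing effect is analyzed here is a simple convex combination of inputs and labels; a standard sketch (vanilla mixup, not the paper's explicit Lipschitz regularizer) is:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4):
    """Standard mixup: interpolate two examples and their (one-hot or soft)
    labels with a weight drawn from a Beta(alpha, alpha) distribution."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```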
arXiv Detail & Related papers (2020-09-23T23:19:19Z)
- Exactly Computing the Local Lipschitz Constant of ReLU Networks [98.43114280459271]
The local Lipschitz constant of a neural network is a useful metric for robustness, generalization, and fairness evaluation.
We show strong inapproximability results for estimating Lipschitz constants of ReLU networks.
We leverage this algorithm to evaluate the tightness of competing Lipschitz estimators and the effects of regularized training on the Lipschitz constant.
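A cheap empirical counterpart to the exact computation discussed here is to probe the gradient norm at individual inputs, which lower-bounds the local Lipschitz constant (the exact value must account for every activation pattern in the neighborhood, which is what makes the problem hard). A minimal sketch, assuming a batched classifier and the $\ell_2$ norm:

```python
import torch

def local_grad_norm(model, x, class_index=0):
    """Per-example norm of the gradient of one logit at x: a lower bound on
    the local (l2) Lipschitz constant around x, not the exact value."""
    x = x.clone().detach().requires_grad_(True)
    model(x)[..., class_index].sum().backward()
    return x.grad.flatten(1).norm(dim=1)
```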
arXiv Detail & Related papers (2020-03-02T22:15:54Z)