Integral Continual Learning Along the Tangent Vector Field of Tasks
- URL: http://arxiv.org/abs/2211.13108v3
- Date: Tue, 12 Dec 2023 03:52:00 GMT
- Title: Integral Continual Learning Along the Tangent Vector Field of Tasks
- Authors: Tian Yu Liu, Aditya Golatkar, Stefano Soatto, Alessandro Achille
- Abstract summary: We propose a lightweight continual learning method which incorporates information from specialized datasets incrementally.
It maintains a small fixed-size memory buffer, as low as 0.4% of the source datasets, which is updated by simple resampling.
Our method achieves strong performance across various buffer sizes for different datasets.
- Score: 112.02761912526734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a lightweight continual learning method which incorporates
information from specialized datasets incrementally, by integrating it along
the vector field of "generalist" models. The tangent plane to the specialist
model acts as a generalist guide and avoids the kind of over-fitting that leads
to catastrophic forgetting, while exploiting the convexity of the optimization
landscape in the tangent plane. It maintains a small fixed-size memory buffer,
as low as 0.4% of the source datasets, which is updated by simple resampling.
Our method achieves strong performance across various buffer sizes for
different datasets. Specifically, in the class-incremental setting we
outperform the existing methods that do not require distillation by an average
of 18.77% and 28.48%, for Seq-CIFAR-10 and Seq-TinyImageNet respectively. Our
method can easily be used in conjunction with existing replay-based continual
learning methods. When memory buffer constraints are relaxed to allow storage
of metadata such as logits, we attain an error reduction of 17.84% towards the
paragon performance on Seq-CIFAR-10.
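The abstract combines two concrete ingredients: learning in the tangent plane of a specialist model (a first-order linearization, which makes the per-task objective convex in the tangent parameters) and a small fixed-size memory buffer maintained by simple resampling. The sketch below is a minimal illustration of those two ideas only, not the authors' implementation; the reservoir-style update rule and the finite-difference linearization are assumptions made for the example.

```python
import numpy as np


class ReservoirBuffer:
    """Fixed-size memory buffer updated by simple resampling.

    Reservoir sampling is one common way to realize such an update; the
    paper's exact resampling rule may differ.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []               # stored (x, y) pairs
        self.seen = 0                 # number of examples observed so far
        self.rng = np.random.default_rng(seed)

    def add(self, x, y):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((x, y))
        else:
            # Keeps each observed example with probability capacity / seen.
            j = self.rng.integers(self.seen)
            if j < self.capacity:
                self.items[j] = (x, y)


def tangent_features(f, params0, x, eps=1e-4):
    """Finite-difference Jacobian of f(params, x) w.r.t. params at params0.

    A model that is linear in the parameter offset delta,
        f(params0 + delta, x) ~ f(params0, x) + J @ delta,
    is trained in the tangent plane of the "specialist" f(params0, .), so a
    convex loss on its outputs remains convex in delta.
    """
    base = np.asarray(f(params0, x))
    J = np.zeros((base.size, params0.size))
    for k in range(params0.size):
        p = params0.copy()
        p[k] += eps
        J[:, k] = (np.asarray(f(p, x)) - base).ravel() / eps
    return base, J
```

In practice one would obtain the linearization with automatic differentiation (e.g., Jacobian-vector products) rather than finite differences; the explicit loop simply keeps the example dependency-free beyond NumPy.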
Related papers
- Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement [29.675650285351768]
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks.
Approximate MU is a practical method for large-scale models.
We propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction.
arXiv Detail & Related papers (2024-09-29T15:17:33Z)
- Gradient Boosting Mapping for Dimensionality Reduction and Feature Extraction [2.778647101651566]
A fundamental problem in supervised learning is to find a good set of features or distance measures.
We propose a supervised dimensionality reduction method, where the outputs of weak learners define the embedding.
We show that the embedding coordinates provide better features for the supervised learning task.
arXiv Detail & Related papers (2024-05-14T10:23:57Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes to replay data from previously experienced tasks when learning new ones.
However, storing such data is often impractical due to memory constraints or data privacy concerns.
As an alternative, data-free replay methods synthesize replay samples by inverting the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Stochastic Gradient Descent for Nonparametric Regression [11.24895028006405]
This paper introduces an iterative algorithm for training nonparametric additive models.
We show that the resulting estimator satisfies an oracle inequality that allows for model mis-specification.
arXiv Detail & Related papers (2024-01-01T08:03:52Z)
- Filter Pruning For CNN With Enhanced Linear Representation Redundancy [3.853146967741941]
We present CCM-loss, a data-driven loss term computed from the correlation matrix of the feature maps within the same layer.
CCM-loss provides a universal mathematical tool complementary to L*-norm regularization.
Our strategy focuses on the consistency and integrity of the information flow in the network; a minimal sketch of the correlation-matrix computation appears after this list.
arXiv Detail & Related papers (2023-10-10T06:27:30Z)
- Understanding Collapse in Non-Contrastive Learning [122.2499276246997]
We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size.
We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
arXiv Detail & Related papers (2022-09-29T17:59:55Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for datasets with abundant unlabeled data but very few labeled examples.
We show that NPC-LV outperforms supervised methods on all three datasets for image classification in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection [0.0]
Reconstruction autoencoder-based methods deal with the problem by using input reconstruction error as a metric of novelty vs. normality.
We introduce semantic reconstruction, data certainty decomposition and normalized L2 distance to substantially improve original methods.
Our method requires no additional data, hard-to-implement structures, or time-consuming pipelines, and does not harm the classification accuracy of known classes.
arXiv Detail & Related papers (2022-03-04T09:04:55Z)
- Meta-Generating Deep Attentive Metric for Few-shot Classification [53.07108067253006]
We present a novel deep metric meta-generation method to generate a specific metric for a new few-shot learning task.
In this study, we structure the metric using a three-layer deep attentive network that is flexible enough to produce a discriminative metric for each task.
We obtain clear performance improvements over state-of-the-art competitors, especially in challenging cases.
arXiv Detail & Related papers (2020-12-03T02:07:43Z)
- Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
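The Filter Pruning entry above builds CCM-loss from the correlation matrix of the feature maps within a layer; the summary does not give the exact loss, so the sketch below only shows how such a correlation matrix and a scalar redundancy score could be computed (the function name and the off-diagonal score are assumptions, not the paper's definition).

```python
import numpy as np


def feature_map_correlation(feature_maps, eps=1e-8):
    """Correlation matrix of the feature maps of a single layer.

    Hypothetical helper: the entry above says CCM-loss is computed from this
    matrix but does not give its exact form, so this only builds the matrix
    and a scalar redundancy score from its off-diagonal entries.

    feature_maps: array of shape (channels, height, width) for one input.
    """
    c = feature_maps.shape[0]
    flat = feature_maps.reshape(c, -1)                  # one row per channel
    flat = flat - flat.mean(axis=1, keepdims=True)      # center each map
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    corr = flat @ flat.T                                # (c, c) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    redundancy = np.sum(off_diag ** 2) / (c * (c - 1))  # mean squared off-diagonal
    return corr, redundancy
```

Depending on the pruning strategy, a loss term could either encourage this redundancy (making filters linearly dependent and hence prunable) or penalize it; the paper's exact use is not specified in the summary above.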
This list is automatically generated from the titles and abstracts of the papers on this site.