Gradient-matching coresets for continual learning
- URL: http://arxiv.org/abs/2112.05025v1
- Date: Thu, 9 Dec 2021 16:34:44 GMT
- Title: Gradient-matching coresets for continual learning
- Authors: Lukas Balles and Giovanni Zappella and Cédric Archambeau
- Abstract summary: We devise a coreset selection method based on the idea of gradient matching.
We evaluate the method in the context of continual learning, where it can be used to curate a rehearsal memory.
- Score: 8.525080112374374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We devise a coreset selection method based on the idea of gradient matching:
The gradients induced by the coreset should match, as closely as possible,
those induced by the original training dataset. We evaluate the method in the
context of continual learning, where it can be used to curate a rehearsal
memory. Our method performs strong competitors such as reservoir sampling
across a range of memory sizes.
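The abstract states the selection criterion but not the algorithm. As a rough illustration only (the greedy rule, the use of per-example gradient vectors, and all names below are assumptions, not necessarily the paper's exact method), gradient-matching coreset selection could be sketched like this:

```python
import numpy as np

def greedy_gradient_matching_coreset(grads: np.ndarray, k: int) -> list:
    """Pick k examples whose average gradient approximates the full-data
    average gradient. grads has shape (n_examples, n_params). Illustrative
    greedy scheme, not necessarily the paper's exact algorithm."""
    target = grads.mean(axis=0)          # gradient induced by the full dataset
    selected, running_sum = [], np.zeros_like(target)
    for step in range(1, k + 1):
        best_idx, best_err = None, np.inf
        for i in range(len(grads)):
            if i in selected:
                continue
            # Matching error if example i were added to the coreset next.
            err = np.linalg.norm((running_sum + grads[i]) / step - target)
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        running_sum += grads[best_idx]
    return selected

# Toy usage: 100 examples with 10-dimensional gradients, 5-point coreset.
rng = np.random.default_rng(0)
print(greedy_gradient_matching_coreset(rng.normal(size=(100, 10)), k=5))
```

In a continual-learning setting the selected indices would populate the rehearsal memory, in place of (for example) reservoir sampling.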
Related papers
- Unlearning-based Neural Interpretations [51.99182464831169]
We show that current baselines defined using static functions are biased, fragile and manipulable.
We propose UNI to compute an (un)learnable, debiased and adaptive baseline by perturbing the input towards an unlearning direction of steepest ascent.
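The summary leaves the construction abstract; the only concrete ingredient it names is perturbing an input along a direction of steepest loss ascent. A minimal PyTorch sketch of that ingredient alone (step size, step count and normalization are assumptions, and this is not UNI itself):

```python
import torch

def ascend_input(model, x, y, loss_fn, steps=10, step_size=0.01):
    """Move an input along the loss-ascent direction; a rough sketch of the
    'unlearning direction of steepest ascent' idea, not the full UNI method."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + step_size * grad / (grad.norm() + 1e-12)  # steepest ascent
    return x_adv.detach()  # candidate per-input baseline for attribution
```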
arXiv Detail & Related papers (2024-10-10T16:02:39Z)
- Gradient Coding in Decentralized Learning for Evading Stragglers [27.253728528979572]
We propose a new gossip-based decentralized learning method with gradient coding (GOCO).
To avoid the negative impact of stragglers, the parameter vectors are updated locally using encoded gradients based on the framework of gradient coding.
We analyze the convergence performance of GOCO for strongly convex loss functions.
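GOCO's coding scheme is not described in the summary. For background only, one round of plain gossip-based decentralized SGD (no gradient coding, no straggler handling) looks like this; the mixing matrix and all names are illustrative:

```python
import numpy as np

def gossip_sgd_round(params, grads, mixing_matrix, lr=0.1):
    """One synchronous round of gossip-based decentralized SGD (background
    sketch; GOCO additionally updates with encoded gradients to tolerate
    stragglers). params, grads: (n_nodes, dim) arrays; mixing_matrix:
    (n_nodes, n_nodes) doubly stochastic gossip weights."""
    mixed = mixing_matrix @ params   # average parameters with neighbours
    return mixed - lr * grads        # then take a local gradient step

# Toy usage: 4 nodes on a ring, 3-dimensional parameters.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
print(gossip_sgd_round(np.zeros((4, 3)), np.ones((4, 3)), W))
```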
arXiv Detail & Related papers (2024-02-06T17:49:02Z)
- Active Labeling: Streaming Stochastic Gradients [91.76135191049232]
We formalize the "active labeling" problem, which generalizes active learning based on partial supervision.
We provide a streaming technique that minimizes the ratio of generalization error over number of samples.
arXiv Detail & Related papers (2022-05-26T09:49:16Z)
- Gradient-Matching Coresets for Rehearsal-Based Continual Learning [6.243028964381449]
The goal of continual learning (CL) is to efficiently update a machine learning model with new data without forgetting previously-learned knowledge.
Most widely-used CL methods rely on a rehearsal memory of data points to be reused while training on new data.
We devise a coreset selection method for rehearsal-based continual learning.
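Rehearsal-based continual learning itself is simple to state. A schematic training step that mixes new-task data with a batch drawn from the rehearsal memory (how the memory is curated, e.g. by gradient matching or reservoir sampling, is deliberately left abstract here; names and batch sizes are assumptions):

```python
import random
import torch

def rehearsal_step(model, optimizer, loss_fn, new_batch, memory, mem_batch=32):
    """One rehearsal-based CL step (schematic). `memory` is a list of (x, y)
    pairs curated by some selection rule, e.g. a gradient-matching coreset."""
    x, y = new_batch
    if memory:
        replay = random.sample(memory, min(mem_batch, len(memory)))
        x = torch.cat([x, torch.stack([xm for xm, _ in replay])])
        y = torch.cat([y, torch.stack([ym for _, ym in replay])])
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)   # joint loss on new data and replayed data
    loss.backward()
    optimizer.step()
    return loss.item()
```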
arXiv Detail & Related papers (2022-03-28T07:37:17Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
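The summary describes adaptation as following a gradient vector field in continuous time. For intuition only, here is a crude forward-Euler discretization of such a gradient flow (this is not COMLN's forward-mode machinery, and in the actual method the trajectory length would be learned):

```python
import numpy as np

def gradient_flow(grad_fn, theta0, t_end, n_steps=1000):
    """Integrate d(theta)/dt = -grad L(theta) with forward Euler; the
    trajectory length t_end is the continuous quantity the summary refers to."""
    theta = np.asarray(theta0, dtype=float)
    dt = t_end / n_steps
    for _ in range(n_steps):
        theta = theta - dt * grad_fn(theta)
    return theta

# Toy usage: L(theta) = 0.5 * ||theta||^2, so grad L(theta) = theta.
print(gradient_flow(lambda th: th, theta0=[1.0, -2.0], t_end=3.0))
```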
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- Adaptive Learning Rate and Momentum for Training Deep Neural Networks [0.0]
We develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework.
Experiments on image classification datasets show that our method yields faster convergence than other local solvers.
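The abstract names the nonlinear Conjugate Gradient framework only as motivation. As a reminder of that framework (a textbook Polak-Ribiere variant with a fixed step size, not the paper's method), a minimal sketch:

```python
import numpy as np

def nonlinear_cg(grad_fn, x0, lr=0.1, n_iters=100):
    """Plain nonlinear CG with a Polak-Ribiere+ beta; real implementations
    use a line search instead of a fixed step size. Shown only to illustrate
    the framework the adaptive learning-rate/momentum method builds on."""
    x = np.asarray(x0, dtype=float)
    g = grad_fn(x)
    d = -g                                    # initial search direction
    for _ in range(n_iters):
        x = x + lr * d                        # step along the CG direction
        g_new = grad_fn(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g + 1e-12))
        d = -g_new + beta * d                 # momentum-like direction update
        g = g_new
    return x

# Toy usage: minimize 0.5 * ||x||^2.
print(nonlinear_cg(lambda x: x, x0=[3.0, -1.0]))
```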
arXiv Detail & Related papers (2021-06-22T05:06:56Z)
- Extending Contrastive Learning to Unsupervised Coreset Selection [26.966136750754732]
We propose an unsupervised way of selecting a coreset from entirely unlabeled data.
We use two leading methods for contrastive learning.
Compared with existing coreset selection methods that require labels, our approach reduces the cost associated with human annotation.
arXiv Detail & Related papers (2021-03-05T10:21:51Z)
- Better scalability under potentially heavy-tailed feedback [6.903929927172917]
We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed.
We focus computational effort on robustly choosing a strong candidate based on a collection of cheap sub-processes which can be run in parallel.
The exact selection process depends on the convexity of the underlying objective, but in all cases, our selection technique amounts to a robust form of boosting the confidence of weak learners.
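The selection step can be pictured roughly as follows (the partitioning into cheap sub-processes and the median-of-means estimator are assumptions for illustration, not the paper's exact procedure):

```python
import numpy as np

def median_of_means(values, n_blocks=5, seed=0):
    """Robust mean estimate: average within blocks, then take the median."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(np.asarray(values, dtype=float)),
                            n_blocks)
    return float(np.median([b.mean() for b in blocks]))

def robust_select(candidates, loss_samples_fn, n_blocks=5):
    """Pick the candidate with the smallest robust loss estimate. The
    candidates would come from cheap sub-processes run in parallel; only the
    selection step is sketched here."""
    scores = [median_of_means(loss_samples_fn(c), n_blocks) for c in candidates]
    return candidates[int(np.argmin(scores))]
```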
arXiv Detail & Related papers (2020-12-14T08:56:04Z)
- There and Back Again: Revisiting Backpropagation Saliency Methods [87.40330595283969]
Saliency methods seek to explain the predictions of a model by producing an importance map across each input sample.
A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient.
We propose a single framework under which several such methods can be unified.
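As one concrete member of this class of methods, plain gradient-times-input saliency in PyTorch (assuming a standard classifier with outputs of shape (batch, classes)):

```python
import torch

def gradient_times_input_saliency(model, x, target_class):
    """Importance map from backpropagated gradients: d(score)/d(input) * input,
    one simple backpropagation saliency method."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[:, target_class].sum()   # scalar class score
    score.backward()
    return (x.grad * x).detach()              # same shape as the input
```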
arXiv Detail & Related papers (2020-04-06T17:58:08Z)
- LT-Net: Label Transfer by Learning Reversible Voxel-wise Correspondence for One-shot Medical Image Segmentation [52.2074595581139]
We introduce a one-shot segmentation method to alleviate the burden of manual annotation for medical images.
The main idea is to treat one-shot segmentation as a classical atlas-based segmentation problem, where voxel-wise correspondence from the atlas to the unlabelled data is learned.
We demonstrate the superiority of our method over both deep learning-based one-shot segmentation methods and a classical multi-atlas segmentation method via thorough experiments.
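The atlas-based view boils down to warping atlas labels onto the unlabelled volume once a voxel-wise correspondence field has been learned. A sketch of only that warping step with PyTorch's grid_sample (the field itself would come from the network; shapes and the coordinate normalization are assumptions):

```python
import torch
import torch.nn.functional as F

def warp_atlas_labels(atlas_labels, displacement):
    """Warp atlas labels with a learned correspondence field (sketch).
    atlas_labels: (1, C, D, H, W) one-hot labels; displacement: (1, D, H, W, 3)
    offsets in normalized [-1, 1] coordinates, (x, y, z) order."""
    _, _, D, H, W = atlas_labels.shape
    zs, ys, xs = torch.meshgrid(torch.linspace(-1, 1, D),
                                torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing="ij")
    identity = torch.stack((xs, ys, zs), dim=-1).unsqueeze(0)  # identity grid
    return F.grid_sample(atlas_labels, identity + displacement,
                         mode="bilinear", align_corners=True)
```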
arXiv Detail & Related papers (2020-03-16T08:36:17Z)
- Disentangling Adaptive Gradient Methods from Learning Rates [65.0397050979662]
We take a deeper look at how adaptive gradient methods interact with the learning rate schedule.
We introduce a "grafting" experiment which decouples an update's magnitude from its direction.
We present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods.
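The grafting experiment has a compact description: keep the step direction proposed by one optimizer and rescale it to the step length proposed by another. A minimal sketch (the global-norm rescaling and the optimizer pairing below are illustrative choices):

```python
import numpy as np

def grafted_step(direction_step, magnitude_step, eps=1e-12):
    """Combine two optimizers' proposed updates: the direction of one with
    the norm (step length) of the other."""
    scale = np.linalg.norm(magnitude_step) / (np.linalg.norm(direction_step) + eps)
    return scale * direction_step

# Toy usage: an SGD-like step supplies the magnitude, an Adam-like step the direction.
print(grafted_step(direction_step=np.array([0.01, 0.02]),
                   magnitude_step=np.array([0.3, -0.1])))
```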
arXiv Detail & Related papers (2020-02-26T21:42:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.