Continual learning: a feature extraction formalization, an efficient
algorithm, and fundamental obstructions
- URL: http://arxiv.org/abs/2203.14383v1
- Date: Sun, 27 Mar 2022 20:20:41 GMT
- Title: Continual learning: a feature extraction formalization, an efficient
algorithm, and fundamental obstructions
- Authors: Binghui Peng and Andrej Risteski
- Abstract summary: Continual learning is an emerging paradigm in machine learning.
In this paper, we propose a framework for continual learning through the lens of feature extraction.
- Score: 30.61165302635335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning is an emerging paradigm in machine learning, wherein a
model is exposed in an online fashion to data from multiple different
distributions (i.e. environments), and is expected to adapt to the distribution
change. Precisely, the goal is to perform well in the new environment, while
simultaneously retaining the performance on the previous environments (i.e.
avoid "catastrophic forgetting") -- without increasing the size of the model.
While this setup has enjoyed a lot of attention in the applied community,
there hasn't been theoretical work that even formalizes the desired guarantees.
In this paper, we propose a framework for continual learning through the lens
of feature extraction -- namely, one in which features, as well as a
classifier, are being trained with each environment. When the features are
linear, we design an efficient gradient-based algorithm $\mathsf{DPGD}$, that
is guaranteed to perform well on the current environment, as well as avoid
catastrophic forgetting. In the general case, when the features are non-linear,
we show such an algorithm cannot exist, whether efficient or not.
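To make the linear-feature setup concrete, below is a minimal NumPy sketch of one shared linear feature map trained across environments, with a fresh linear classifier per environment. The "avoid forgetting" step here simply projects feature-map updates away from the subspace spanned by earlier environments' inputs, which leaves the features (and hence the old classifiers' predictions) on that subspace unchanged. This is an illustrative mechanism under these assumptions, not the paper's $\mathsf{DPGD}$ algorithm or its guarantees; the data, loss, and hyperparameters are placeholders.

```python
# Sketch: continual learning with a shared linear feature map Phi (k x d) and a
# separate linear head w_t per environment; the prediction on x is w_t^T (Phi x).
# Forgetting is avoided by restricting updates of Phi to directions orthogonal
# to the span of previous environments' inputs. Illustration only, not DPGD.
import numpy as np

rng = np.random.default_rng(0)
d, k, n, T = 20, 5, 200, 3            # input dim, feature dim, samples/env, #envs
Phi = 0.1 * rng.normal(size=(k, d))   # shared linear feature map
basis = np.zeros((d, 0))              # orthonormal basis of old environments' input span
heads = []                            # one classifier per environment

def project_out(G, B):
    """Remove from each row of G its component lying in span(B)."""
    return G if B.shape[1] == 0 else G - (G @ B) @ B.T

def extend_basis(B, X, tol=1e-8):
    """Extend the orthonormal basis B with the directions spanned by the rows of X."""
    U, s, _ = np.linalg.svd(np.concatenate([B, X.T], axis=1), full_matrices=False)
    return U[:, s > tol * s.max()]

for t in range(T):
    # Synthetic environment: inputs on a random 5-dimensional subspace, linear labels.
    X = rng.normal(size=(n, 5)) @ rng.normal(size=(d, 5)).T
    y = np.sign(X @ rng.normal(size=d))

    w = np.zeros(k)
    for _ in range(500):                          # full-batch subgradient descent, hinge loss
        scores = (X @ Phi.T) @ w
        g = -(y * (scores * y < 1))               # per-sample hinge subgradient factor
        grad_w = (X @ Phi.T).T @ g / n
        grad_Phi = np.outer(w, g @ X) / n
        grad_Phi = project_out(grad_Phi, basis)   # do not disturb features of old inputs
        w -= 0.1 * grad_w
        Phi -= 0.1 * grad_Phi
    heads.append(w)

    print(f"env {t}: train accuracy {np.mean(np.sign((X @ Phi.T) @ w) == y):.2f}")
    basis = extend_basis(basis, X)                # protect this environment's input span
```

Because every update of Phi is orthogonal to the protected span, each stored head produces (up to numerical tolerance) the same scores on its own training inputs after later environments are learned, at the price of shrinking the set of directions available for new features.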
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the effect of a small "forget set" of training data on a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning [49.94607673097326]
We propose a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data.
Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization algorithm.
Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios.
arXiv Detail & Related papers (2024-02-21T03:39:04Z) - Complementary Learning Subnetworks for Parameter-Efficient
Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z) - Fairness Uncertainty Quantification: How certain are you that the model
is fair? [13.209748908186606]
In modern machine learning, Stochastic Gradient Descent (SGD) type algorithms are almost always used as training algorithms, implying that the learned model, and consequently its fairness properties, are random.
In this work we provide Confidence Intervals (CIs) for test unfairness when a group-fairness-aware, specifically Disparate Impact (DI)- and Disparate Mistreatment (DM)-aware, linear binary classifier is trained using online SGD-type algorithms.
arXiv Detail & Related papers (2023-04-27T04:07:58Z) - Neural Active Learning on Heteroskedastic Distributions [29.01776999862397]
We demonstrate the catastrophic failure of active learning algorithms on heteroskedastic datasets.
We propose a new algorithm that incorporates a model difference scoring function for each data point to filter out the noisy examples and sample clean examples.
arXiv Detail & Related papers (2022-11-02T07:30:19Z) - DLCFT: Deep Linear Continual Fine-Tuning for General Incremental
Learning [29.80680408934347]
We propose an alternative framework to incremental learning where we continually fine-tune the model from a pre-trained representation.
Our method takes advantage of a linearization technique applied to a pre-trained neural network for simple and effective continual learning.
We show that our method can be applied to general continual learning settings; we evaluate it on data-incremental, task-incremental, and class-incremental learning problems.
arXiv Detail & Related papers (2022-08-17T06:58:14Z) - On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z) - Learning Neural Models for Natural Language Processing in the Face of
Distributional Shift [10.990447273771592]
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications.
It builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time.
This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information.
It is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime.
arXiv Detail & Related papers (2021-09-03T14:29:20Z) - Task-agnostic Continual Learning with Hybrid Probabilistic Models [75.01205414507243]
We propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification.
A normalizing flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting.
We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
arXiv Detail & Related papers (2021-06-24T05:19:26Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Learning to Continuously Optimize Wireless Resource In Episodically
Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples (a schematic objective of this kind is sketched below).
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
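As a rough illustration of the kind of objective such a "fairness"-enforcing min-max formulation takes (a schematic sketch, not necessarily the exact objective of the two papers above), one can adversarially reweight per-sample losses:
$$
\min_{\theta}\ \max_{\lambda \in \Delta_N}\ \sum_{i=1}^{N} \lambda_i\, \ell_i(\theta),
\qquad
\Delta_N = \Big\{\lambda \ge 0 : \textstyle\sum_{i=1}^{N} \lambda_i = 1\Big\},
$$
where $\theta$ are the parameters of the learned resource-allocation policy and $\ell_i(\theta)$ is its loss on data sample $i$. The inner maximization concentrates weight on the worst-served samples, so the outer minimization cannot achieve a low value by neglecting any individual sample.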
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.