Continual Learning using a Bayesian Nonparametric Dictionary of Weight
Factors
- URL: http://arxiv.org/abs/2004.10098v3
- Date: Tue, 27 Apr 2021 23:28:11 GMT
- Title: Continual Learning using a Bayesian Nonparametric Dictionary of Weight
Factors
- Authors: Nikhil Mehta, Kevin J Liang, Vinay K Verma and Lawrence Carin
- Abstract summary: Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings.
We propose a principled nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity.
We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout the training.
- Score: 75.58555462743585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Naively trained neural networks tend to experience catastrophic forgetting in
sequential task settings, where data from previous tasks are unavailable. A
number of methods, using various model expansion strategies, have been proposed
recently as possible solutions. However, determining how much to expand the
model is left to the practitioner, and often a constant schedule is chosen for
simplicity, regardless of how complex the incoming task is. Instead, we propose
a principled Bayesian nonparametric approach based on the Indian Buffet Process
(IBP) prior, letting the data determine how much to expand the model
complexity. We pair this with a factorization of the neural network's weight
matrices. Such an approach allows the number of factors of each weight matrix
to scale with the complexity of the task, while the IBP prior encourages sparse
weight factor selection and factor reuse, promoting positive knowledge transfer
between tasks. We demonstrate the effectiveness of our method on a number of
continual learning benchmarks and analyze how weight factors are allocated and
reused throughout the training.
Related papers
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z) - Prior-Free Continual Learning with Unlabeled Data in the Wild [24.14279172551939]
We propose a Prior-Free Continual Learning (PFCL) method to incrementally update a trained model on new tasks.
PFCL learns new tasks without knowing the task identity or any previous data.
Our experiments show that our PFCL method significantly mitigates forgetting in all three learning scenarios.
arXiv Detail & Related papers (2023-10-16T13:59:56Z) - How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z) - Complementary Learning Subnetworks for Parameter-Efficient
Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z) - CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep
Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization in which data collected across different domains help improving the learning performance at each other task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z) - Rank-Based Multi-task Learning for Fair Regression [9.95899391250129]
We develop a novel learning approach for multi-taskart regression models based on a biased dataset.
We use a popular non-parametric oracle-based non-world multipliers dataset.
arXiv Detail & Related papers (2020-09-23T22:32:57Z) - A Markov Decision Process Approach to Active Meta Learning [24.50189361694407]
In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task.
In meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously.
arXiv Detail & Related papers (2020-09-10T15:45:34Z) - Multi-Loss Weighting with Coefficient of Variations [19.37721431024278]
We propose a weighting scheme based on the coefficient of variations and set the weights based on properties observed while training the model.
The proposed method incorporates a measure of uncertainty to balance the losses, and as a result the loss weights evolve during training without requiring another (learning based) optimisation.
The validity of the approach is shown empirically for depth estimation and semantic segmentation on multiple datasets.
arXiv Detail & Related papers (2020-09-03T14:51:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.