Neural Architecture Search of Deep Priors: Towards Continual Learning without Catastrophic Interference
- URL: http://arxiv.org/abs/2104.06788v1
- Date: Wed, 14 Apr 2021 11:25:30 GMT
- Title: Neural Architecture Search of Deep Priors: Towards Continual Learning without Catastrophic Interference
- Authors: Martin Mundt, Iuliia Pliushch, Visvanathan Ramesh
- Abstract summary: We show that it is possible to find random-weight architectures, a deep prior, that enable a linear classifier to perform on par with fully trained deep counterparts.
In an extension to continual learning, we investigate the possibility of catastrophic-interference-free incremental learning.
- Score: 2.922007656878633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we analyze the classification performance of neural network
structures without parametric inference. Making use of neural architecture
search, we empirically demonstrate that it is possible to find random-weight
architectures, a deep prior, that enable a linear classifier to perform on
par with fully trained deep counterparts. Through ablation experiments, we
exclude the possibility of winning a weight initialization lottery and confirm
that suitable deep priors do not require additional inference. In an extension
to continual learning, we investigate the possibility of
catastrophic-interference-free incremental learning. Under the assumption of classes
originating from the same data distribution, a deep prior found on only a
subset of classes is shown to allow discrimination of further classes through
training of a simple linear classifier.
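As a rough illustration of the pipeline described above, the sketch below pairs a frozen, randomly initialized convolutional feature extractor (a stand-in for a deep prior found by architecture search) with a linear classifier trained on top; in the continual setting, the same frozen prior is reused and only the linear head is refit as further classes arrive. The architecture, layer sizes, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the configuration discovered in the paper.

```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

class RandomDeepPrior(nn.Module):
    """Random-weight convolutional feature extractor; its parameters are never trained."""
    def __init__(self, in_channels=3, width=64, feat_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width * 2, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        for p in self.parameters():        # no parametric inference in the prior
            p.requires_grad_(False)

    @torch.no_grad()
    def forward(self, x):
        return self.features(x).flatten(1)

def fit_linear_head(prior, images, labels):
    """Train only a linear classifier on features of the fixed deep prior."""
    feats = prior(images).cpu().numpy()
    return LogisticRegression(max_iter=1000).fit(feats, labels)

# Continual extension (one simple realization): the prior found on an initial
# subset of classes stays fixed, so discriminating newly arriving classes only
# requires fitting a linear head on features of the data seen so far.
```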
Related papers
- Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective [100.54185280153753]
We find that both classifier guidance and classifier-free guidance achieve conditional generation by pushing the denoising diffusion trajectories away from decision boundaries.
We propose a generic postprocessing step built upon flow-matching to shrink the gap between the learned distribution for a pretrained denoising diffusion model and the real data distribution.
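For context on the guidance discussion above, the standard classifier-free guidance update (common to diffusion models in general, not code from the cited paper) extrapolates from the unconditional prediction toward the conditional one; the decision-boundary observation concerns the denoising trajectories this update produces. The `denoiser` signature and guidance scale below are assumptions for illustration.

```python
def cfg_denoise(denoiser, x_t, t, cond, guidance_scale=5.0):
    """Standard classifier-free guidance combination. `denoiser(x, t, cond)` is
    assumed to return predicted noise, with cond=None giving the unconditional branch."""
    eps_uncond = denoiser(x_t, t, None)
    eps_cond = denoiser(x_t, t, cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```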
arXiv Detail & Related papers (2025-03-13T17:59:59Z)
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- Continual learning with the neural tangent ensemble [0.6137178191238463]
We show that a neural network with N parameters can be interpreted as a weighted ensemble of N classifiers.
We derive the likelihood and posterior probability of each expert given past data.
Surprisingly, we learn that the posterior updates for these experts are equivalent to a scaled and projected form of gradient descent.
arXiv Detail & Related papers (2024-08-30T16:29:09Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
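A rough sketch of the blockwise training pattern described above: every block carries its own small predictor that outputs a label distribution and is optimized with a local loss, while inputs to later blocks are detached so that no gradient crosses block boundaries. The layer sizes, cross-entropy loss, and SGD optimizer are placeholder choices, not details from the CaFo paper.

```python
import torch
import torch.nn as nn

# Hypothetical two-block cascade; each block has its own classification head.
blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 512), nn.ReLU()),
    nn.Sequential(nn.Linear(512, 256), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(512, 10), nn.Linear(256, 10)])
optimizers = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=1e-2)
              for b, h in zip(blocks, heads)]
criterion = nn.CrossEntropyLoss()

def local_training_step(x, y):
    """One step in which each block is trained only through its own head's loss."""
    h = x
    for block, head, opt in zip(blocks, heads, optimizers):
        h = block(h.detach())        # detach: gradients stay inside this block
        loss = criterion(head(h), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because no gradient flows between blocks, each block can in principle be trained on a separate device, which is the parallel-deployment property mentioned above.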
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method is able to achieve similar performances on image classification for balanced datasets.
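The fixed classifier in this line of work is typically a simplex equiangular tight frame; a common construction from the neural-collapse literature (not code from the cited paper) is sketched below, with the feature dimension and class count as placeholder values.

```python
import torch
import torch.nn as nn

def simplex_etf(feat_dim: int, num_classes: int) -> torch.Tensor:
    """Build a (num_classes, feat_dim) simplex ETF: unit-norm class vectors with
    pairwise cosine similarity -1/(num_classes - 1)."""
    assert feat_dim >= num_classes
    # Random partial orthogonal matrix U with U^T U = I (shape: feat_dim x num_classes).
    u, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))
    c = num_classes
    m = (c / (c - 1)) ** 0.5 * u @ (torch.eye(c) - torch.ones(c, c) / c)
    return m.t()  # rows are the fixed class prototypes

# Use as a frozen final layer: features are scored against the fixed prototypes.
num_classes, feat_dim = 10, 512
classifier = nn.Linear(feat_dim, num_classes, bias=False)
classifier.weight = nn.Parameter(simplex_etf(feat_dim, num_classes), requires_grad=False)
```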
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
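The iterative self-training paradigm referred to above follows a simple loop: fit on the labeled data, pseudo-label the unlabeled pool, keep only confident predictions, and refit on the union. The scikit-learn model and confidence threshold below are illustrative choices, not part of the paper's analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(x_lab, y_lab, x_unlab, rounds=5, confidence=0.9):
    """Iterative self-training: add confidently pseudo-labeled points and refit."""
    x_train, y_train = x_lab, y_lab
    clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    for _ in range(rounds):
        probs = clf.predict_proba(x_unlab)
        confident = probs.max(axis=1) >= confidence
        if not confident.any():
            break
        pseudo_labels = clf.classes_[probs[confident].argmax(axis=1)]
        x_train = np.concatenate([x_lab, x_unlab[confident]])
        y_train = np.concatenate([y_lab, pseudo_labels])
        clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    return clf
```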
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- Deep Learning with Nonsmooth Objectives [0.0]
We explore the potential for using a nonsmooth loss function based on the max-norm in the training of an artificial neural network.
We hypothesise that this may lead to superior classification results in some special cases where the training data is either very small or unbalanced.
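One plausible reading of such a max-norm objective, written as code: replace a smooth averaged loss with the l-infinity norm of the residual between network outputs and targets. The exact objective in the paper may differ; this is only an illustration of a nonsmooth, max-based loss (autograd propagates a subgradient through the max).

```python
import torch

def max_norm_loss(outputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Nonsmooth l-infinity loss: the largest absolute error over the batch."""
    return (outputs - targets).abs().max()
```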
arXiv Detail & Related papers (2021-07-14T02:01:53Z)
- Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data [48.4779912667317]
Self-training algorithms have been very successful for learning with unlabeled data using neural networks.
This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning.
arXiv Detail & Related papers (2020-10-07T19:43:55Z)
- An analytic theory of shallow networks dynamics for hinge loss classification [14.323962459195771]
We study the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task.
We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss.
This allows us to address, in a simple setting, several phenomena appearing in modern networks such as the slowing down of training dynamics, the crossover between rich and lazy learning, and overfitting.
arXiv Detail & Related papers (2020-06-19T16:25:29Z)
- Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms [0.8057006406834467]
A coreset is a subset of the training set with which a machine learning algorithm obtains performance similar to what it would deliver if trained on the whole original data.
A novel approach is presented: candidate coresets are iteratively optimized by adding and removing samples.
A multi-objective evolutionary algorithm is used to minimize simultaneously the number of points in the set and the classification error.
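A minimal sketch of that bi-objective search: candidate coresets (boolean masks over the training set) are mutated by flipping samples in or out and compared on (coreset size, validation error), keeping a Pareto front. The 1-nearest-neighbour evaluator, single-flip mutation, and crude non-dominated selection below are illustrative simplifications, not the evolutionary algorithm used in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def evaluate(mask, x, y, x_val, y_val):
    """Objectives to minimize: (number of coreset points, validation error)."""
    if mask.sum() == 0:
        return (0, 1.0)
    clf = KNeighborsClassifier(n_neighbors=1).fit(x[mask], y[mask])
    return (int(mask.sum()), 1.0 - clf.score(x_val, y_val))

def mutate(mask):
    """Flip one random sample in or out of the candidate coreset."""
    child = mask.copy()
    i = rng.integers(len(mask))
    child[i] = not child[i]
    return child

def dominates(a, b):
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def coreset_search(x, y, x_val, y_val, pop_size=20, generations=100):
    population = [rng.random(len(x)) < 0.1 for _ in range(pop_size)]
    for _ in range(generations):
        population = population + [mutate(m) for m in population]   # offspring
        scored = [(evaluate(m, x, y, x_val, y_val), m) for m in population]
        front = [(f, m) for f, m in scored
                 if not any(dominates(g, f) for g, _ in scored)]     # Pareto filter
        population = [m for _, m in front][:pop_size]
    return population  # non-dominated candidate coresets
```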
arXiv Detail & Related papers (2020-02-20T09:59:56Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
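The constraint described above can be enforced with a simple projection after each optimizer step: if the fine-tuned parameters leave an L2 ball of a chosen radius around the pre-trained weights, scale the displacement back onto the ball. The flat treatment of all parameters and a hand-chosen radius are simplifications; the paper derives its constraint from a generalisation bound, so this is only a sketch of the mechanism.

```python
import torch

@torch.no_grad()
def project_to_ball(model, pretrained_state, radius):
    """Project model weights into an L2 ball of `radius` centred on the
    pre-trained weights. `pretrained_state` is assumed to be a dict of tensors
    saved from the model before fine-tuning (e.g. a copied state_dict)."""
    deltas = [p - pretrained_state[name] for name, p in model.named_parameters()]
    norm = torch.sqrt(sum((d ** 2).sum() for d in deltas))
    if norm > radius:
        scale = radius / norm
        for (name, p), d in zip(model.named_parameters(), deltas):
            p.copy_(pretrained_state[name] + scale * d)
```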
arXiv Detail & Related papers (2020-02-19T16:00:47Z)