On Sequential Loss Approximation for Continual Learning
- URL: http://arxiv.org/abs/2405.16498v1
- Date: Sun, 26 May 2024 09:20:47 GMT
- Title: On Sequential Loss Approximation for Continual Learning
- Authors: Menghao Waiyan William Zhu, Ercan Engin Kuruoğlu,
- Abstract summary: We introduce for continual learning Autodiff Quadratic Consolidation (AQC) and Neural Consolidation (NC)
AQC approximates the previous loss function with a quadratic function, and NC approximates the previous loss function with a neural network.
We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce for continual learning Autodiff Quadratic Consolidation (AQC), which approximates the previous loss function with a quadratic function, and Neural Consolidation (NC), which approximates the previous loss function with a neural network. Although they are not scalable to large neural networks, they can be used with a fixed pre-trained feature extractor. We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results, unless combined with replay. We find that for small datasets, quadratic approximation of the previous loss function leads to poor results, even with full Hessian computation, and NC could significantly improve the predictive performance, while for large datasets, when used with a fixed pre-trained feature extractor, AQC provides superior predictive performance. We also find that using tanh-output features can improve the predictive performance of AQC. In particular, in class-incremental Split MNIST, when a Convolutional Neural Network (CNN) with tanh-output features is pre-trained on EMNIST Letters and used as a fixed pre-trained feature extractor, AQC can achieve predictive performance comparable to joint training.
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP)
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z) - Emergence in non-neural models: grokking modular arithmetic via average gradient outer product [16.911836722312152]
We show that grokking is not specific to neural networks nor to gradient descent-based optimization.
We show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines.
Our results demonstrate that emergence can result purely from learning task-relevant features.
arXiv Detail & Related papers (2024-07-29T17:28:58Z) - Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting [1.6574413179773757]
We develop multicomponent loss functions that reduce intra-class feature correlation and maximize inter-class feature distance.
We implement the terms of the Chebyshev Prototype Risk (CPR) bound into our Explicit CPR loss function.
Our training algorithm reduces overfitting and improves upon previous approaches in many settings.
arXiv Detail & Related papers (2024-04-10T15:16:04Z) - Random Linear Projections Loss for Hyperplane-Based Optimization in Neural Networks [22.348887008547653]
This work introduces Random Linear Projections (RLP) loss, a novel approach that enhances training efficiency by leveraging geometric relationships within the data.
Our empirical evaluations, conducted across benchmark datasets and synthetic examples, demonstrate that neural networks trained with RLP loss outperform those trained with traditional loss functions.
arXiv Detail & Related papers (2023-11-21T05:22:39Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS)
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - RF+clust for Leave-One-Problem-Out Performance Prediction [0.9281671380673306]
We study leave-one-problem-out (LOPO) performance prediction.
We analyze whether standard random forest (RF) model predictions can be improved by calibrating them with a weighted average of performance values.
arXiv Detail & Related papers (2023-01-23T16:14:59Z) - Understanding and Improving Transfer Learning of Deep Models via Neural Collapse [37.483109067209504]
This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems.
We find strong correlation between feature collapse and downstream performance.
Our proposed fine-tuning methods deliver good performances while reducing fine-tuning parameters by at least 90%.
arXiv Detail & Related papers (2022-12-23T08:48:34Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance of guided gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most of existing methods aim to enhance performance of QNNs especially binary neural networks by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.