Variational Density Propagation Continual Learning
- URL: http://arxiv.org/abs/2308.11801v1
- Date: Tue, 22 Aug 2023 21:51:39 GMT
- Title: Variational Density Propagation Continual Learning
- Authors: Christopher Angelini, Nidhal Bouaynaya, and Ghulam Rasool
- Abstract summary: Deep Neural Networks (DNNs) deployed to the real world are regularly subject to out-of-distribution (OoD) data.
This paper proposes a framework for adapting to data distribution drift modeled by benchmark Continual Learning datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) deployed to the real world are regularly subject
to out-of-distribution (OoD) data, various types of noise, and shifting
conceptual objectives. This paper proposes a framework for adapting to data
distribution drift modeled by benchmark Continual Learning datasets. We develop
and evaluate a method of Continual Learning that leverages uncertainty
quantification from Bayesian Inference to mitigate catastrophic forgetting. We
expand on previous approaches by removing the need for Monte Carlo sampling of
the model weights to sample the predictive distribution. We optimize a
closed-form Evidence Lower Bound (ELBO) objective approximating the predictive
distribution by propagating the first two moments of a distribution, i.e. mean
and covariance, through all network layers. Catastrophic forgetting is
mitigated by using the closed-form ELBO to approximate the Minimum Description
Length (MDL) Principle, inherently penalizing changes in the model likelihood
by minimizing the KL Divergence between the variational posterior for the
current task and the previous task's variational posterior acting as the prior.
Leveraging the approximation of the MDL principle, we aim to initially learn a
sparse variational posterior and then minimize additional model complexity
learned for subsequent tasks. Our approach is evaluated for the task
incremental learning scenario using density propagated versions of
fully-connected and convolutional neural networks across multiple sequential
benchmark datasets with varying task sequence lengths. Ultimately, this
procedure produces a minimally complex network over a series of tasks while
mitigating catastrophic forgetting.
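As a rough illustration of the mechanism described in the abstract, the sketch below shows sampling-free density propagation through one fully-connected layer and the task-to-task KL penalty. It is not the authors' released code: it assumes a diagonal-Gaussian weight posterior, uses the standard moment-matching rules for products of independent Gaussians, and all names and shapes are illustrative.

```python
# Minimal sketch (assumed, not the paper's implementation): propagate the first two
# moments of the activations through a linear layer with a diagonal-Gaussian weight
# posterior, and compute the KL penalty against the previous task's posterior.
import torch

def propagate_linear(x_mean, x_var, w_mean, w_logvar):
    """Propagate input mean/variance through y = W x, with W ~ N(w_mean, diag(exp(w_logvar))).

    Assumes independent weights and (approximately) independent inputs, so the
    output variance is the sum of the three usual moment-product terms.
    """
    w_var = w_logvar.exp()
    y_mean = x_mean @ w_mean.t()                    # E[Wx] = E[W] E[x]
    y_var = (x_var @ (w_mean ** 2).t()              # Var[x] * E[W]^2
             + (x_mean ** 2) @ w_var.t()            # E[x]^2 * Var[W]
             + x_var @ w_var.t())                   # Var[x] * Var[W]
    return y_mean, y_var

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians.

    Using the previous task's posterior as (mu_p, logvar_p) gives the complexity
    penalty that discourages the weights from drifting between tasks.
    """
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * torch.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Toy usage: one density-propagated layer plus the KL penalty for the current task.
in_dim, out_dim, batch = 4, 3, 2
w_mean = torch.randn(out_dim, in_dim) * 0.1
w_logvar = torch.full((out_dim, in_dim), -6.0)          # small initial weight variances
x_mean = torch.randn(batch, in_dim)
x_var = torch.zeros(batch, in_dim)                      # deterministic inputs at the first layer
y_mean, y_var = propagate_linear(x_mean, x_var, w_mean, w_logvar)
prev_mean, prev_logvar = w_mean.clone(), w_logvar.clone()   # stand-in for the previous task's posterior
penalty = kl_diag_gaussians(w_mean, w_logvar, prev_mean, prev_logvar)
print(y_mean.shape, y_var.shape, float(penalty))
```

Repeating this propagation layer by layer yields the mean and covariance of the predictive distribution without Monte Carlo sampling of the weights; the closed-form ELBO then combines the expected data likelihood with the KL term above, echoing the MDL interpretation given in the abstract.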
Related papers
- Posterior and variational inference for deep neural networks with heavy-tailed weights [0.0]
We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random.
We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates.
We also provide variational Bayes counterparts of the results, that show that mean-field variational approximations still benefit from near-optimal theoretical support.
arXiv Detail & Related papers (2024-06-05T15:24:20Z) - Enhancing Transfer Learning with Flexible Nonparametric Posterior
Sampling [22.047309973614876]
This paper introduces nonparametric transfer learning (NPTL), a flexible posterior sampling method to address the distribution shift issue.
NPTL is suitable for transfer learning scenarios that may involve the distribution shift between upstream and downstream tasks.
arXiv Detail & Related papers (2024-03-12T03:26:58Z) - SPDE priors for uncertainty quantification of end-to-end neural data
assimilation schemes [4.213142548113385]
Recent advances in the deep learning community enable addressing this problem with neural architectures embedding a variational data assimilation framework.
In this work, we draw from SPDE-based processes to estimate prior models able to handle non-stationary covariances in both space and time.
Our neural variational scheme is modified to embed an augmented state formulation, estimating both the state and the SPDE parametrization.
arXiv Detail & Related papers (2024-02-02T19:18:12Z) - Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
arXiv Detail & Related papers (2023-08-10T08:43:20Z) - Adversarial Adaptive Sampling: Unify PINN and Optimal Transport for the Approximation of PDEs [2.526490864645154]
We propose a new minmax formulation to optimize simultaneously the approximate solution, given by a neural network model, and the random samples in the training set.
The key idea is to use a deep generative model to adjust random samples in the training set such that the residual induced by the approximate PDE solution can maintain a smooth profile.
arXiv Detail & Related papers (2023-05-30T02:59:18Z) - Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of the fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z) - Sampling-free Variational Inference for Neural Networks with
Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)