Continual Learning from the Perspective of Compression
- URL: http://arxiv.org/abs/2006.15078v1
- Date: Fri, 26 Jun 2020 16:15:49 GMT
- Title: Continual Learning from the Perspective of Compression
- Authors: Xu He, Min Lin
- Abstract summary: Connectionist models such as neural networks suffer from catastrophic forgetting.
We propose a new continual learning method that combines the maximum likelihood (ML) plug-in and Bayesian mixture codes.
- Score: 28.90542302130312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Connectionist models such as neural networks suffer from catastrophic
forgetting. In this work, we study this problem from the perspective of
information theory and define forgetting as the increase of description lengths
of previous data when they are compressed with a sequentially learned model. In
addition, we show that continual learning approaches based on variational
posterior approximation and generative replay can be considered as
approximations to two prequential coding methods in compression, namely, the
Bayesian mixture code and maximum likelihood (ML) plug-in code. We compare
these approaches in terms of both compression and forgetting and empirically
study the reasons that limit the performance of continual learning methods
based on variational posterior approximation. To address these limitations, we
propose a new continual learning method that combines ML plug-in and Bayesian
mixture codes.
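To make the forgetting measure concrete, here is a minimal sketch (not the authors' code; the classification model, the task data loaders, and all hyperparameters are assumptions) that trains a model sequentially with plain maximum likelihood and reports forgetting as the increase in description length, i.e. the summed negative log-likelihood in bits, of the first task's data after training on the second task.

```python
import math
import torch
import torch.nn.functional as F

def description_length_bits(model, loader):
    """Codelength of a dataset under the model: sum over examples of -log2 p(y | x)."""
    model.eval()
    total_bits = 0.0
    with torch.no_grad():
        for x, y in loader:
            nll_nats = F.cross_entropy(model(x), y, reduction="sum")
            total_bits += nll_nats.item() / math.log(2.0)
    return total_bits

def train_task(model, loader, epochs=1, lr=1e-3):
    """Plain maximum-likelihood training on a single task (ML plug-in style)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()

# Sequential training on two tasks; `model`, `loader_A`, `loader_B` are assumed to exist.
# train_task(model, loader_A)
# bits_before = description_length_bits(model, loader_A)
# train_task(model, loader_B)
# bits_after = description_length_bits(model, loader_A)
# forgetting = bits_after - bits_before  # forgetting as the increase in codelength of task A
```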
Related papers
- Leveraging viscous Hamilton-Jacobi PDEs for uncertainty quantification in scientific machine learning [1.8175282137722093]
Uncertainty quantification (UQ) in scientific machine learning (SciML) combines the predictive power of SciML with methods for quantifying the reliability of the learned models.
We provide a new interpretation of UQ problems by establishing a theoretical connection between some Bayesian inference problems arising in SciML and viscous Hamilton-Jacobi partial differential equations (HJ PDEs).
We develop a new Riccati-based methodology that provides computational advantages when continuously updating the model predictions.
arXiv Detail & Related papers (2024-04-12T20:54:01Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We present preliminary evidence that in-context learners share algorithmic features with these predictors.
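As context for that comparison, the following minimal sketch (not from the paper; the synthetic linear task, the ridge parameter `lam`, and the gradient-descent settings are illustrative assumptions) computes the three reference predictors that trained in-context learners are reported to match.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
x_query = rng.normal(size=d)

# Exact least squares.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge regression with an illustrative regularisation strength.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# A few hundred steps of full-batch gradient descent on the squared loss.
w_gd = np.zeros(d)
lr = 0.01
for _ in range(500):
    w_gd -= lr * (X.T @ (X @ w_gd - y)) / n

for name, w in [("least squares", w_ls), ("ridge", w_ridge), ("grad descent", w_gd)]:
    print(f"{name:>13}: prediction at query = {x_query @ w:.3f}")
```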
arXiv Detail & Related papers (2022-11-28T18:59:51Z)
- A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression [8.663322701649454]
We introduce a new empirical Bayes approach for large-scale multiple linear regression.
Our approach combines two key ideas: the use of flexible "adaptive shrinkage" priors and variational approximations.
We show that the posterior mean from our method solves a penalized regression problem.
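To illustrate the stated connection in its simplest form, the sketch below (an illustration only; it uses an isotropic Gaussian prior, whereas the paper's adaptive-shrinkage priors are more flexible) checks numerically that a Bayesian posterior mean coincides with the solution of a ridge-penalized regression problem.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
sigma, sigma_w = 0.5, 1.0
y = X @ w_true + sigma * rng.normal(size=n)

# Posterior mean under the model w ~ N(0, sigma_w^2 I), y | w ~ N(Xw, sigma^2 I).
precision = X.T @ X / sigma**2 + np.eye(p) / sigma_w**2
posterior_mean = np.linalg.solve(precision, X.T @ y / sigma**2)

# Ridge solution of the equivalent penalized regression problem.
lam = sigma**2 / sigma_w**2
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(np.allclose(posterior_mean, ridge))  # True: posterior mean == ridge estimate
```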
arXiv Detail & Related papers (2022-08-23T12:42:57Z)
- Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z)
- Cooperative learning for multi-view analysis [2.368995563245609]
We propose a new method for supervised learning with multiple sets of features ("views").
Cooperative learning combines the usual squared error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree.
We illustrate the effectiveness of our proposed method on simulated and real data examples.
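As an illustration of the agreement penalty, the following sketch (not the authors' implementation; the two synthetic views, the agreement weight `rho`, and the optimisation settings are assumptions) fits per-view linear predictors by minimising the usual squared error loss plus a penalty that pulls the two views' predictions toward each other.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 5))   # view 1 features
Z = rng.normal(size=(n, 3))   # view 2 features
y = X[:, 0] + Z[:, 0] + 0.1 * rng.normal(size=n)

rho = 0.5                     # agreement penalty weight (illustrative)
theta_x, theta_z = np.zeros(5), np.zeros(3)
lr, steps = 0.1, 2000

for _ in range(steps):
    fx, fz = X @ theta_x, Z @ theta_z
    resid = y - fx - fz       # residual of the usual squared-error loss
    gap = fx - fz             # disagreement between the two views' predictions
    grad_x = (-X.T @ resid + rho * X.T @ gap) / n
    grad_z = (-Z.T @ resid - rho * Z.T @ gap) / n
    theta_x -= lr * grad_x
    theta_z -= lr * grad_z

objective = 0.5 * np.sum((y - X @ theta_x - Z @ theta_z) ** 2) \
          + 0.5 * rho * np.sum((X @ theta_x - Z @ theta_z) ** 2)
print("cooperative objective:", objective)
```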
arXiv Detail & Related papers (2021-12-23T03:13:25Z)
- Surrogate Likelihoods for Variational Annealed Importance Sampling [11.144915453864854]
We introduce a surrogate likelihood that can be learned jointly with other variational parameters.
We show that our method performs well in practice and that it is well-suited for black-box inference in probabilistic programming frameworks.
arXiv Detail & Related papers (2021-12-22T19:49:45Z)
- Learning Expected Emphatic Traces for Deep RL [32.984880782688535]
Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods.
We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
arXiv Detail & Related papers (2021-07-12T13:14:03Z)
- Learned transform compression with optimized entropy encoding [72.20409648915398]
We consider the problem of learned transform compression, where we learn both the transform and the probability distribution over the discrete codes.
We employ a soft relaxation of the quantization operation to allow for back-propagation of gradients and employ vector (rather than scalar) quantization of the latent codes.
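The sketch below is a minimal illustration of such a soft relaxation (not the paper's model; the codebook size, latent dimension, and temperature are assumptions): latent vectors are mapped to a convex combination of codebook entries via a softmax over negative distances, which is differentiable, while hard nearest-neighbour assignment can be used at test time.

```python
import numpy as np

def soft_vector_quantize(latents, codebook, temperature=1.0):
    """Soft assignment of latent vectors to codebook entries.

    latents:  (batch, dim) continuous latent vectors
    codebook: (K, dim) code vectors
    Returns the softly quantized latents and the assignment probabilities.
    """
    # Squared Euclidean distances between each latent and each code vector.
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (batch, K)
    # Softmax over negative distances; low temperature -> nearly hard assignment.
    logits = -d2 / temperature
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs @ codebook, probs

rng = np.random.default_rng(3)
codebook = rng.normal(size=(16, 4))          # K = 16 codes of dimension 4
latents = rng.normal(size=(8, 4))            # a batch of 8 latent vectors
soft_q, probs = soft_vector_quantize(latents, codebook, temperature=0.1)
hard_q = codebook[np.argmax(probs, axis=1)]  # hard assignment for test time
print(np.abs(soft_q - hard_q).max())         # small when the temperature is low
```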
arXiv Detail & Related papers (2021-04-07T17:58:01Z)
- Learning, compression, and leakage: Minimising classification error via meta-universal compression principles [87.054014983402]
A promising group of compression techniques for learning scenarios is normalised maximum likelihood (NML) coding.
Here we consider an NML-based decision strategy for supervised classification problems, and show that it attains PAC learning when applied to a wide variety of models.
We show that the misclassification rate of our method is upper bounded by the maximal leakage, a recently proposed metric to quantify the potential of data leakage in privacy-sensitive scenarios.
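To unpack the NML idea, here is a minimal sketch (an illustration with a Bernoulli model on short binary sequences, not the paper's classification strategy; the sequence length `n` is an assumption) that computes NML codelengths by normalising each sequence's maximised likelihood by the sum of maximised likelihoods over all sequences.

```python
import itertools
import math

def bernoulli_max_likelihood(seq):
    """max_theta p_theta(seq) for a Bernoulli model: plug in theta_hat = sample mean."""
    n, k = len(seq), sum(seq)
    theta = k / n
    return (theta ** k) * ((1 - theta) ** (n - k))

n = 6  # sequence length (illustrative)
sequences = list(itertools.product([0, 1], repeat=n))

# NML normaliser ("parametric complexity"): sum of maximised likelihoods over all sequences.
normaliser = sum(bernoulli_max_likelihood(s) for s in sequences)

def nml_codelength_bits(seq):
    """Codelength of seq under the NML code, in bits."""
    return -math.log2(bernoulli_max_likelihood(seq) / normaliser)

print(nml_codelength_bits((0, 0, 0, 0, 0, 0)))  # regular sequence: shorter code
print(nml_codelength_bits((0, 1, 1, 0, 1, 0)))  # mixed sequence: longer code
```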
arXiv Detail & Related papers (2020-10-14T20:03:58Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
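To ground the terminology, here is a minimal sketch of plain max-product belief propagation on a 1-D chain, run in log space (this is not the BP-Layer itself, which adds what is needed to train the procedure inside a CNN; the unary scores and pairwise smoothness costs are random placeholders).

```python
import numpy as np

rng = np.random.default_rng(4)
n_nodes, n_labels = 5, 3
unary = rng.normal(size=(n_nodes, n_labels))  # per-node label scores (log space)
# pairwise[i, j]: compatibility of neighbouring labels i and j (smoothness prior)
pairwise = -np.abs(np.subtract.outer(np.arange(n_labels), np.arange(n_labels))).astype(float)

# Forward max-product messages along the chain (max-sum in log space).
messages = np.zeros((n_nodes, n_labels))
for t in range(1, n_nodes):
    # messages[t, j] = max_i (messages[t-1, i] + unary[t-1, i] + pairwise[i, j])
    messages[t] = np.max(messages[t - 1][:, None] + unary[t - 1][:, None] + pairwise, axis=0)

# Backward pass: read off the MAP labeling from the messages.
labels = np.zeros(n_nodes, dtype=int)
labels[-1] = np.argmax(messages[-1] + unary[-1])
for t in range(n_nodes - 2, -1, -1):
    labels[t] = np.argmax(messages[t] + unary[t] + pairwise[:, labels[t + 1]])

print("MAP labeling:", labels)
```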
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.