Efficient Weight-Space Laplace-Gaussian Filtering and Smoothing for Sequential Deep Learning
- URL: http://arxiv.org/abs/2410.06800v1
- Date: Wed, 9 Oct 2024 11:54:33 GMT
- Title: Efficient Weight-Space Laplace-Gaussian Filtering and Smoothing for Sequential Deep Learning
- Authors: Joanna Sliwa, Frank Schneider, Nathanael Bosch, Agustinus Kristiadi, Philipp Hennig
- Abstract summary: Efficiently learning a sequence of related tasks, such as in continual learning, poses a significant challenge for neural nets.
We address this challenge with a grounded framework for sequentially learning related tasks based on Bayesian inference.
- Score: 29.328769628694484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficiently learning a sequence of related tasks, such as in continual learning, poses a significant challenge for neural nets due to the delicate trade-off between catastrophic forgetting and loss of plasticity. We address this challenge with a grounded framework for sequentially learning related tasks based on Bayesian inference. Specifically, we treat the model's parameters as a nonlinear Gaussian state-space model and perform efficient inference using Gaussian filtering and smoothing. This general formalism subsumes existing continual learning approaches, while also offering a clearer conceptual understanding of its components. Leveraging Laplace approximations during filtering, we construct Gaussian posterior measures on the weight space of a neural network for each task. We use these posteriors as efficient regularizers by exploiting the structure of the generalized Gauss-Newton (GGN) matrix to construct diagonal plus low-rank approximations. The dynamics model allows targeted control of the learning process and the incorporation of domain-specific knowledge, such as modeling the type of shift between tasks. Additionally, using Bayesian approximate smoothing can enhance the performance of task-specific models without needing to re-access any data.
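The filtering step can be made concrete with a small sketch. The snippet below is a minimal numpy illustration (our own, not the authors' code) of how the previous task's Laplace posterior, with precision kept in diagonal-plus-low-rank form P = diag(d) + U U^T as suggested by the GGN structure, acts as a quadratic regularizer while training on the next task; the names m_prev, d, and U are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params, rank = 1_000, 20

# Filtered posterior from task t-1: mean m_prev and precision diag(d) + U U^T.
m_prev = rng.normal(size=n_params)
d = np.ones(n_params)
U = rng.normal(size=(n_params, rank)) / np.sqrt(n_params)

def quadratic_penalty(w):
    """0.5 * (w - m_prev)^T (diag(d) + U U^T) (w - m_prev), using only
    matrix-vector products, so the full dense precision is never built."""
    delta = w - m_prev
    return 0.5 * (delta @ (d * delta) + np.sum((U.T @ delta) ** 2))

def quadratic_penalty_grad(w):
    delta = w - m_prev
    return d * delta + U @ (U.T @ delta)

# During task t this penalty is added to the new task's loss; minimizing the sum
# (e.g. with SGD) gives the new MAP mean, and a diagonal-plus-low-rank
# approximation of the GGN at that point updates the precision for task t+1.
w = rng.normal(size=n_params)
print(quadratic_penalty(w), np.linalg.norm(quadratic_penalty_grad(w)))
```

Because only products with d and U are needed, each evaluation costs on the order of the number of parameters times the rank, rather than the number of parameters squared.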
Related papers
- Model-Based Control with Sparse Neural Dynamics [23.961218902837807]
We propose a new framework for integrated model learning and predictive control.
We show that our framework can deliver better closed-loop performance than existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-20T06:25:02Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
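As a toy illustration of the kernel-regression ingredient named in the KBASS summary above, the sketch below fits a kernel ridge estimate of the target function from sparse, noisy samples; the RBF kernel, lengthscale, and noise level are placeholder choices, and the spike-and-slab term selection with EP-EM inference is deliberately left out.

```python
import numpy as np

def rbf(x1, x2, lengthscale=0.2):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale ** 2)

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 15)                      # sparse observations
y_train = np.sin(2 * np.pi * x_train) + 0.05 * rng.normal(size=x_train.size)

noise_var = 1e-2
K = rbf(x_train, x_train) + noise_var * np.eye(x_train.size)
alpha = np.linalg.solve(K, y_train)                      # kernel ridge weights

x_grid = np.linspace(0.0, 1.0, 200)
u_hat = rbf(x_grid, x_train) @ alpha                     # smooth function estimate
# KBASS would next evaluate candidate derivative terms of u_hat and use
# spike-and-slab priors to select which terms enter the discovered equation;
# that selection step is not shown here.
print(u_hat.min(), u_hat.max())
```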
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
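A hedged sketch of the linearized-Laplace predictive referenced above: the network is linearized around the MAP weights, so each test input gets a Gaussian prediction with mean f(x, w*) and variance J(x) Sigma J(x)^T. The covariance and Jacobian below are random stand-ins for illustration only.

```python
import numpy as np

def lla_predictive(f_map, jac, cov):
    """Linearized-Laplace predictive at one input x:
    f(x, w) ~ f(x, w*) + J(x)(w - w*), with w ~ N(w*, cov),
    giving a Gaussian with mean f(x, w*) and variance J cov J^T."""
    return f_map, float(jac @ cov @ jac)

rng = np.random.default_rng(2)
n_params = 50
A = rng.normal(size=(n_params, n_params))
cov = A @ A.T / n_params + 1e-3 * np.eye(n_params)  # stand-in Laplace covariance
jac = rng.normal(size=n_params)                     # stand-in Jacobian df/dw at w*
mean, var = lla_predictive(f_map=0.3, jac=jac, cov=cov)
print(mean, var)
# The (mean, var) pair can then drive a standard acquisition function,
# e.g. expected improvement, exactly as a GP surrogate would.
```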
- Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes [39.411957858548355]
We show how to achieve smoother model predictive control using online sequential inference.
We evaluate this approach on several robot control tasks, matching the performance of prior sample-based methods while also ensuring smoothness.
arXiv Detail & Related papers (2022-10-07T12:56:31Z)
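The snippet below illustrates, in a simplified form that we assume rather than the authors' algorithm, the smoothness ingredient behind the paper above: candidate action sequences are drawn from a Gaussian-process prior over time, so every sample is temporally smooth before any reweighting by rollout returns.

```python
import numpy as np

horizon, n_samples = 50, 16
t = np.linspace(0.0, 1.0, horizon)
# Squared-exponential kernel over time gives temporally correlated actions.
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.1 ** 2) + 1e-6 * np.eye(horizon)
L = np.linalg.cholesky(K)

rng = np.random.default_rng(3)
action_samples = L @ rng.normal(size=(horizon, n_samples))  # each column is smooth in time
# In posterior policy iteration, each sampled sequence would be rolled out and
# reweighted by its return; the GP prior is what keeps the resulting control smooth.
print(action_samples.shape)
```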
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
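As a rough sketch of the idea in the IGN summary above, the code below computes a standard sparse-GP (subset-of-regressors) predictive mean with inducing points placed directly in a feature space; the features and inducing locations are random stand-ins here, whereas IGN would learn both jointly.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def inducing_point_mean(phi_train, y, phi_test, Z, noise_var=1e-2):
    """Sparse-GP (subset-of-regressors) predictive mean with inducing points Z
    living in the same feature space as phi(x)."""
    Kzz = rbf(Z, Z) + 1e-6 * np.eye(len(Z))
    Kzx = rbf(Z, phi_train)
    Ksz = rbf(phi_test, Z)
    A = Kzz + Kzx @ Kzx.T / noise_var
    return Ksz @ np.linalg.solve(A, Kzx @ y / noise_var)

rng = np.random.default_rng(4)
phi_train = rng.normal(size=(200, 3))        # stand-in learned features phi(x)
y = np.sin(phi_train[:, 0]) + 0.1 * rng.normal(size=200)
Z = rng.normal(size=(10, 3))                 # inducing points, learned jointly in IGN
print(inducing_point_mean(phi_train, y, rng.normal(size=(5, 3)), Z))
```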
- Learning Representation for Bayesian Optimization with Collision-free Regularization [13.476552258272402]
Large-scale, high-dimensional, and non-stationary datasets are common in real-world scenarios.
Recent works attempt to handle such input by applying neural networks ahead of the classical Gaussian process to learn a latent representation.
We show that even with proper network design, such learned representations often lead to collisions in the latent space.
We propose LOCo, an efficient deep Bayesian optimization framework which employs a novel regularizer to reduce the collision in the learned latent space.
arXiv Detail & Related papers (2022-03-16T14:44:16Z)
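The following toy regularizer (our own simplified version, not the LOCo objective itself) shows one way to penalize collisions: pairs of points that sit close together in latent space despite having different objective values contribute to the loss.

```python
import numpy as np

def collision_penalty(z, y, eps=0.1):
    """Toy collision regularizer: penalize pairs whose latent codes z are within
    eps of each other even though their objective values y differ."""
    dz = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)  # latent distances
    dy = np.abs(y[:, None] - y[None, :])                         # objective gaps
    colliding = (dz < eps) & ~np.eye(len(z), dtype=bool)         # exclude self-pairs
    return dy[colliding].sum() / max(colliding.sum(), 1)

rng = np.random.default_rng(5)
z = rng.normal(size=(64, 2))   # stand-in latent codes from the representation network
y = rng.normal(size=64)        # stand-in objective values
print(collision_penalty(z, y))
# Added to the representation-learning loss, this term pushes points with
# different objective values apart in latent space.
```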
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
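As a loose illustration of Fisher-embedding-based selection, and explicitly not BAIT's actual criterion, the sketch below greedily picks unlabeled points whose gradient embeddings most increase the log-determinant of the accumulated information matrix, a classic D-optimal design heuristic.

```python
import numpy as np

def greedy_fisher_selection(embeddings, k, jitter=1e-3):
    """Greedily pick points whose (Fisher / gradient) embeddings most increase the
    log-determinant of the accumulated information matrix."""
    n, d = embeddings.shape
    info = jitter * np.eye(d)
    chosen = []
    for _ in range(k):
        best_i, best_gain = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            g = embeddings[i]
            _, logdet = np.linalg.slogdet(info + np.outer(g, g))
            if logdet > best_gain:
                best_i, best_gain = i, logdet
        chosen.append(best_i)
        info += np.outer(embeddings[best_i], embeddings[best_i])
    return chosen

rng = np.random.default_rng(6)
print(greedy_fisher_selection(rng.normal(size=(100, 8)), k=5))
```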
- FG-Net: Fast Large-Scale LiDAR Point Clouds Understanding Network Leveraging Correlated Feature Mining and Geometric-Aware Modelling [15.059508985699575]
FG-Net is a general deep learning framework for large-scale point cloud understanding without voxelization.
We propose a deep convolutional neural network leveraging correlated feature mining and deformable convolution based geometric-aware modelling.
Our approach outperforms state-of-the-art methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2020-12-17T08:20:09Z)
- Sample-efficient reinforcement learning using deep Gaussian processes [18.044018772331636]
Reinforcement learning provides a framework for learning, through trial and error, which actions to take towards completing a task.
In model-based reinforcement learning, efficiency is improved by learning to simulate the world dynamics.
We introduce deep Gaussian processes, where the depth of the composition adds model complexity, while incorporating prior knowledge of the dynamics brings smoothness and structure.
arXiv Detail & Related papers (2020-11-02T13:37:57Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
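The sketch below shows the generic extragradient-style update that the title above alludes to: the gradient is evaluated at an extrapolated lookahead point and then applied at the current iterate. The step sizes and the toy quadratic objective are arbitrary choices for illustration.

```python
import numpy as np

def extrapolation_step(w, grad_fn, lr=0.1, extrapolation=0.1):
    """One extragradient-style update: evaluate the gradient at an extrapolated
    (lookahead) point, then apply that gradient at the current iterate."""
    w_lookahead = w - extrapolation * grad_fn(w)
    return w - lr * grad_fn(w_lookahead)

# Toy quadratic objective 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([2.0, -1.0])
for _ in range(100):
    w = extrapolation_step(w, grad_fn=lambda v: v)
print(w)  # converges towards the minimizer at the origin
```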
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
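To make the BP-Layer summary above more concrete, here is a minimal forward min-sum sweep on a 1D chain, i.e. one truncated max-product pass in log space; the BP-Layer wraps updates of this kind into a differentiable block, but the cost model and single-sweep decoding below are simplifications of ours.

```python
import numpy as np

def min_sum_chain(unary, pairwise_weight=1.0):
    """One forward min-sum sweep along a 1D chain: unary[t, l] is the cost of
    label l at position t, and neighbouring labels pay pairwise_weight * |l - l'|.
    Returns a labeling after the single (truncated) sweep."""
    T, L = unary.shape
    labels = np.arange(L)
    pairwise = pairwise_weight * np.abs(labels[:, None] - labels[None, :])
    msg = np.zeros((T, L))
    for t in range(1, T):
        msg[t] = np.min((unary[t - 1] + msg[t - 1])[:, None] + pairwise, axis=0)
    return np.argmin(unary + msg, axis=1)

rng = np.random.default_rng(7)
print(min_sum_chain(rng.random((8, 5))))
# A BP-Layer makes updates of this kind differentiable, so they can sit inside a
# CNN and be trained end-to-end for stereo, optical flow, or segmentation.
```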
This list is automatically generated from the titles and abstracts of the papers on this site.