Sequential Bayesian Neural Subnetwork Ensembles
- URL: http://arxiv.org/abs/2206.00794v1
- Date: Wed, 1 Jun 2022 22:57:52 GMT
- Title: Sequential Bayesian Neural Subnetwork Ensembles
- Authors: Sanket Jantre, Sandeep Madireddy, Shrijita Bhattacharya, Tapabrata
Maiti, Prasanna Balaprakash
- Abstract summary: We propose a sequential ensembling of dynamic Bayesian neuralworks that reduces model complexity through sparsity-inducing priors.
We empirically demonstrate that our proposed approach surpasses the baselines of the dense frequentist and Bayesian ensemble models.
- Score: 3.954301343416333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural network ensembles that appeal to model diversity have been used
successfully to improve predictive performance and model robustness in several
applications. Whereas, it has recently been shown that sparse subnetworks of
dense models can match the performance of their dense counterparts and increase
their robustness while effectively decreasing the model complexity. However,
most ensembling techniques require multiple parallel and costly evaluations and
have been proposed primarily with deterministic models, whereas sparsity
induction has been mostly done through ad-hoc pruning. We propose sequential
ensembling of dynamic Bayesian neural subnetworks that systematically reduce
model complexity through sparsity-inducing priors and generate diverse
ensembles in a single forward pass of the model. The ensembling strategy
consists of an exploration phase that finds high-performing regions of the
parameter space and multiple exploitation phases that effectively exploit the
compactness of the sparse model to quickly converge to different minima in the
energy landscape corresponding to high-performing subnetworks yielding diverse
ensembles. We empirically demonstrate that our proposed approach surpasses the
baselines of the dense frequentist and Bayesian ensemble models in prediction
accuracy, uncertainty estimation, and out-of-distribution (OoD) robustness on
CIFAR10, CIFAR100 datasets, and their out-of-distribution variants: CIFAR10-C,
CIFAR100-C induced by corruptions. Furthermore, we found that our approach
produced the most diverse ensembles compared to the approaches with a single
forward pass and even compared to the approaches with multiple forward passes
in some cases.
Related papers
- Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability.<n>We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z) - Preconditioned Inexact Stochastic ADMM for Deep Model [35.37705488695026]
This paper develops an algorithm, PISA, which enables scalable parallel computing and supports various second-moment schemes.
Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz of the gradient.
Comprehensive experimental evaluations for or fine-tuning diverse FMs, including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate its superior numerical performance compared to various state-of-the-art Directions.
arXiv Detail & Related papers (2025-02-15T12:28:51Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate this approach lower bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z) - LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - Dynamic ensemble selection based on Deep Neural Network Uncertainty
Estimation for Adversarial Robustness [7.158144011836533]
This work explore the dynamic attributes in model level through dynamic ensemble selection technology.
In training phase the Dirichlet distribution is apply as prior of sub-models' predictive distribution, and the diversity constraint in parameter space is introduced.
In test phase, the certain sub-models are dynamically selected based on their rank of uncertainty value for the final prediction.
arXiv Detail & Related papers (2023-08-01T07:41:41Z) - Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For
Advection-Dominated Systems [14.553972457854517]
We present a data-driven, space-time continuous framework to learn surrogatemodels for complex physical systems.
We leverage the expressive power of the network and aspecially designed consistency-inducing regularization to obtain latent trajectories that are both low-dimensional and smooth.
arXiv Detail & Related papers (2023-01-25T03:06:03Z) - FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear
Modulation [69.34011200590817]
We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation.
By modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity.
We show that FiLM-Ensemble outperforms other implicit ensemble methods, and it comes very close to the upper bound of an explicit ensemble of networks.
arXiv Detail & Related papers (2022-05-31T18:33:15Z) - Deep-Ensemble-Based Uncertainty Quantification in Spatiotemporal Graph
Neural Networks for Traffic Forecasting [2.088376060651494]
We focus on a diffusion convolutional recurrent neural network (DCRNN), a state-of-the-art method for short-term traffic forecasting.
We develop a scalable deep ensemble approach to quantify uncertainties for DCRNN.
We show that our generic and scalable approach outperforms the current state-of-the-art Bayesian and a number of other commonly used frequentist techniques.
arXiv Detail & Related papers (2022-04-04T16:10:55Z) - Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z) - Dynamic Gaussian Mixture based Deep Generative Model For Robust
Forecasting on Sparse Multivariate Time Series [43.86737761236125]
We propose a novel generative model, which tracks the transition of latent clusters, instead of isolated feature representations.
It is characterized by a newly designed dynamic Gaussian mixture distribution, which captures the dynamics of clustering structures.
A structured inference network is also designed for enabling inductive analysis.
arXiv Detail & Related papers (2021-03-03T04:10:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.