On the Effectiveness of Mode Exploration in Bayesian Model Averaging for
Neural Networks
- URL: http://arxiv.org/abs/2112.03773v1
- Date: Tue, 7 Dec 2021 15:39:37 GMT
- Title: On the Effectiveness of Mode Exploration in Bayesian Model Averaging for
Neural Networks
- Authors: John T. Holodnak and Allan B. Wollaber
- Abstract summary: We evaluate several simple methods for exploring local regions of the weight space with respect to Brier score, accuracy, and expected calibration error.
While adding separate modes uniformly improves performance, we show that the simple mode exploration methods considered here produce little to no improvement over ensembles without mode exploration.
- Score: 1.3960152426268768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple techniques for producing calibrated predictive probabilities using
deep neural networks in supervised learning settings have emerged that leverage
approaches to ensemble diverse solutions discovered during cyclic training or
training from multiple random starting points (deep ensembles). However, only a
limited amount of work has investigated the utility of exploring the local
region around each diverse solution (posterior mode). Using three well-known
deep architectures on the CIFAR-10 dataset, we evaluate several simple methods
for exploring local regions of the weight space with respect to Brier score,
accuracy, and expected calibration error. We consider both Bayesian inference
techniques (variational inference and Hamiltonian Monte Carlo applied to the
softmax output layer) and the use of the stochastic gradient descent
trajectory near optima. While adding separate modes to the ensemble uniformly
improves performance, we show that the simple mode exploration methods
considered here produce little to no improvement over ensembles without mode
exploration.
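The comparison in the abstract hinges on scoring a uniform Bayesian model average over ensemble members with Brier score, accuracy, and expected calibration error. Below is a minimal NumPy sketch of that scoring; it is not the authors' code, and the function name, array shapes, and the 15-bin ECE estimator are assumptions for illustration.

```python
import numpy as np

def ensemble_metrics(member_probs, labels, n_bins=15):
    """Score an ensemble's averaged predictive distribution.

    member_probs: (n_members, n_examples, n_classes) softmax outputs, one
                  slice per mode (or per sample explored around a mode).
    labels:       (n_examples,) integer class labels.
    """
    # Bayesian model average: a uniform mixture over ensemble members.
    probs = member_probs.mean(axis=0)

    n_examples, n_classes = probs.shape
    onehot = np.eye(n_classes)[labels]

    # Multi-class Brier score: mean squared error of the probability vectors.
    brier = np.mean(np.sum((probs - onehot) ** 2, axis=1))

    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    correct = (preds == labels).astype(float)

    # Expected calibration error: bin predictions by confidence and compare
    # each bin's mean confidence against its empirical accuracy.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())

    return {"brier": brier, "accuracy": correct.mean(), "ece": ece}

# Example with random stand-in predictions: 3 members, 5 examples, 10 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5, 10))
member_probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(ensemble_metrics(member_probs, labels=rng.integers(0, 10, size=5)))
```

Stacking predictions from separate modes versus from samples explored around a single mode into member_probs is enough to reproduce the style of comparison described above.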
Related papers
- Improved off-policy training of diffusion samplers [93.66433483772055]
We study the problem of training diffusion models to sample from a distribution with an unnormalized density or energy function.
We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods.
Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work.
arXiv Detail & Related papers (2024-02-07T18:51:49Z)
- Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
arXiv Detail & Related papers (2023-10-10T14:06:56Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of Deep Matching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift [19.945634052291542]
We evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks.
We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures.
We provide the first systematic evaluation of BDL for fine-tuning large pre-trained models.
arXiv Detail & Related papers (2023-06-21T14:36:03Z)
- Classified as unknown: A novel Bayesian neural network [0.0]
We develop a new efficient Bayesian learning algorithm for fully connected neural networks.
We generalize the algorithm for a single perceptron for binary classification from earlier work to multi-layer perceptrons for multi-class classification.
arXiv Detail & Related papers (2023-01-31T04:27:09Z)
- Feature Space Particle Inference for Neural Network Ensembles [13.392254060510666]
Particle-based inference methods offer a promising approach from a Bayesian perspective.
We propose optimizing particles in the feature space where the activation of a specific intermediate layer lies.
Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness.
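As a rough illustration of the feature-space idea in this entry (not the authors' exact particle update, which is SVGD-style): a kernel penalty on a chosen intermediate layer's activations can push ensemble members toward distinct features. The function name, RBF kernel, and median-heuristic bandwidth below are all assumptions.

```python
import math
import torch

def feature_repulsion(features):
    """Kernel repulsion on intermediate-layer features (simplified sketch).

    features: (n_members, d) tensor holding one feature vector per ensemble
    member for the same batch (e.g., a hidden layer's activations pooled
    over the batch). Minimizing this penalty pushes members apart.
    """
    sq_dists = torch.cdist(features, features) ** 2     # pairwise squared distances
    bandwidth = sq_dists.median() / max(math.log(features.shape[0]), 1e-8)
    return torch.exp(-sq_dists / (bandwidth + 1e-12)).sum()

# Stand-in features for 4 members; in training this penalty would be added
# to each member's task loss, e.g. loss = sum_of_cross_entropies + lam * penalty.
penalty = feature_repulsion(torch.randn(4, 16))
```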
arXiv Detail & Related papers (2022-06-02T09:16:26Z)
- Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
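The summary above describes exploration driven by the gap between an overfit and a tuned model; a toy sketch of that acquisition rule follows. Everything here (function name, the additive optimism form) is an assumption based only on the abstract, not the paper's exact rule.

```python
import numpy as np

def rome_scores(tuned_preds, overfit_preds):
    """Illustrative residual-overfit acquisition for a contextual bandit.

    tuned_preds, overfit_preds: (n_actions,) reward estimates per action from
    a regularized model and a deliberately overfit one, for one context.
    """
    # Where the overfit model departs most from the tuned one, the data are
    # scarcest, so those actions are the most informative to explore.
    residual = np.abs(overfit_preds - tuned_preds)
    return tuned_preds + residual

tuned = np.array([0.20, 0.50, 0.40])
overfit = np.array([0.10, 0.90, 0.45])
action = int(np.argmax(rome_scores(tuned, overfit)))  # selects action 1
```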
arXiv Detail & Related papers (2021-10-06T17:05:33Z)
- Greedy Bayesian Posterior Approximation with Deep Ensembles [22.466176036646814]
Ensembles of independently trained neural networks are a state-of-the-art approach to estimating predictive uncertainty in Deep Learning.
We show that our method is submodular with respect to the mixture of components for any problem in a function space.
arXiv Detail & Related papers (2021-05-29T11:35:27Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
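The subspace idea above can be pictured as two endpoints trained jointly so that every convex combination of their weights is itself an accurate network. A minimal sketch of sampling along such a learned line follows; endpoint training and model loading are omitted, and all names are illustrative.

```python
import numpy as np

def line_point(w0, w1, t):
    """Weights at position t on a learned line segment between two endpoints.

    w0, w1: flat parameter vectors (the jointly trained endpoints);
    t in [0, 1] selects a point on the segment.
    """
    return (1.0 - t) * w0 + t * w1

# Stand-in endpoints; evaluating a few points along the line yields a
# cheap ensemble whose members share one training run's cost.
w0, w1 = np.random.randn(10), np.random.randn(10)
members = [line_point(w0, w1, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
```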
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
- Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.