On the Effectiveness of Mode Exploration in Bayesian Model Averaging for
Neural Networks
- URL: http://arxiv.org/abs/2112.03773v1
- Date: Tue, 7 Dec 2021 15:39:37 GMT
- Title: On the Effectiveness of Mode Exploration in Bayesian Model Averaging for
Neural Networks
- Authors: John T. Holodnak and Allan B. Wollaber
- Abstract summary: We evaluate several simple methods for exploring local regions of the weight space with respect to Brier score, accuracy, and expected calibration error.
While adding separate modes uniformly improves performance, we show that the simple mode exploration methods considered here produce little to no improvement over ensembles without mode exploration.
- Score: 1.3960152426268768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple techniques for producing calibrated predictive probabilities using
deep neural networks in supervised learning settings have emerged that leverage
approaches to ensemble diverse solutions discovered during cyclic training or
training from multiple random starting points (deep ensembles). However, only a
limited amount of work has investigated the utility of exploring the local
region around each diverse solution (posterior mode). Using three well-known
deep architectures on the CIFAR-10 dataset, we evaluate several simple methods
for exploring local regions of the weight space with respect to Brier score,
accuracy, and expected calibration error. We consider both Bayesian inference
techniques (variational inference and Hamiltonian Monte Carlo applied to the
softmax output layer) and the use of the stochastic gradient descent
trajectory near optima. While adding separate modes to the ensemble uniformly
improves performance, we show that the simple mode exploration methods
considered here produce little to no improvement over ensembles without mode
exploration.
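The comparison in the abstract hinges on scoring a uniform Bayesian model average over ensemble members with Brier score, accuracy, and expected calibration error. Below is a minimal NumPy sketch of that scoring; it is not the authors' code, and the function name, array shapes, and the 15-bin ECE estimator are assumptions for illustration.

```python
import numpy as np

def ensemble_metrics(member_probs, labels, n_bins=15):
    """Score an ensemble's averaged predictive distribution.

    member_probs: (n_members, n_examples, n_classes) softmax outputs, one
                  slice per mode (or per sample explored around a mode).
    labels:       (n_examples,) integer class labels.
    """
    # Bayesian model average: a uniform mixture over ensemble members.
    probs = member_probs.mean(axis=0)

    n_examples, n_classes = probs.shape
    onehot = np.eye(n_classes)[labels]

    # Multi-class Brier score: mean squared error of the probability vectors.
    brier = np.mean(np.sum((probs - onehot) ** 2, axis=1))

    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    correct = (preds == labels).astype(float)

    # Expected calibration error: bin predictions by confidence and compare
    # each bin's mean confidence against its empirical accuracy.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())

    return {"brier": brier, "accuracy": correct.mean(), "ece": ece}

# Example with random stand-in predictions: 3 members, 5 examples, 10 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5, 10))
member_probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(ensemble_metrics(member_probs, labels=rng.integers(0, 10, size=5)))
```

Stacking predictions from separate modes versus from samples explored around a single mode into member_probs is enough to reproduce the style of comparison described above.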
Related papers
- Improved off-policy training of diffusion samplers [93.66433483772055]
We study the problem of training diffusion models to sample from a distribution with an unnormalized density or energy function.
We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods.
Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work.
arXiv Detail & Related papers (2024-02-07T18:51:49Z)
- Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
arXiv Detail & Related papers (2023-10-10T14:06:56Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of Deep Matching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift [19.945634052291542]
We evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks.
We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures.
We provide the first systematic evaluation of BDL for fine-tuning large pre-trained models.
arXiv Detail & Related papers (2023-06-21T14:36:03Z)
- Classified as unknown: A novel Bayesian neural network [0.0]
We develop a new efficient Bayesian learning algorithm for fully connected neural networks.
We generalize the algorithm for a single perceptron for binary classification from earlier work to multi-layer perceptrons for multi-class classification.
arXiv Detail & Related papers (2023-01-31T04:27:09Z)
- Feature Space Particle Inference for Neural Network Ensembles [13.392254060510666]
Particle-based inference methods offer a promising approach from a Bayesian perspective.
We propose optimizing particles in the feature space where the activation of a specific intermediate layer lies.
Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness.
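As a rough illustration of the feature-space idea in this entry (not the authors' exact particle update, which is SVGD-style): a kernel penalty on a chosen intermediate layer's activations can push ensemble members toward distinct features. The function name, RBF kernel, and median-heuristic bandwidth below are all assumptions.

```python
import math
import torch

def feature_repulsion(features):
    """Kernel repulsion on intermediate-layer features (simplified sketch).

    features: (n_members, d) tensor holding one feature vector per ensemble
    member for the same batch (e.g., a hidden layer's activations pooled
    over the batch). Minimizing this penalty pushes members apart.
    """
    sq_dists = torch.cdist(features, features) ** 2     # pairwise squared distances
    bandwidth = sq_dists.median() / max(math.log(features.shape[0]), 1e-8)
    return torch.exp(-sq_dists / (bandwidth + 1e-12)).sum()

# Stand-in features for 4 members; in training this penalty would be added
# to each member's task loss, e.g. loss = sum_of_cross_entropies + lam * penalty.
penalty = feature_repulsion(torch.randn(4, 16))
```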
arXiv Detail & Related papers (2022-06-02T09:16:26Z)
- Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
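The summary above describes exploration driven by the gap between an overfit and a tuned model; a toy sketch of that acquisition rule follows. Everything here (function name, the additive optimism form) is an assumption based only on the abstract, not the paper's exact rule.

```python
import numpy as np

def rome_scores(tuned_preds, overfit_preds):
    """Illustrative residual-overfit acquisition for a contextual bandit.

    tuned_preds, overfit_preds: (n_actions,) reward estimates per action from
    a regularized model and a deliberately overfit one, for one context.
    """
    # Where the overfit model departs most from the tuned one, the data are
    # scarcest, so those actions are the most informative to explore.
    residual = np.abs(overfit_preds - tuned_preds)
    return tuned_preds + residual

tuned = np.array([0.20, 0.50, 0.40])
overfit = np.array([0.10, 0.90, 0.45])
action = int(np.argmax(rome_scores(tuned, overfit)))  # selects action 1
```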
arXiv Detail & Related papers (2021-10-06T17:05:33Z)
- Greedy Bayesian Posterior Approximation with Deep Ensembles [22.466176036646814]
Ensembles of independently trained neural networks are a state-of-the-art approach to estimating predictive uncertainty in Deep Learning.
We show that our method is submodular with respect to the mixture of components for any problem in a function space.
arXiv Detail & Related papers (2021-05-29T11:35:27Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
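The subspace idea above can be pictured as two endpoints trained jointly so that every convex combination of their weights is itself an accurate network. A minimal sketch of sampling along such a learned line follows; endpoint training and model loading are omitted, and all names are illustrative.

```python
import numpy as np

def line_point(w0, w1, t):
    """Weights at position t on a learned line segment between two endpoints.

    w0, w1: flat parameter vectors (the jointly trained endpoints);
    t in [0, 1] selects a point on the segment.
    """
    return (1.0 - t) * w0 + t * w1

# Stand-in endpoints; evaluating a few points along the line yields a
# cheap ensemble whose members share one training run's cost.
w0, w1 = np.random.randn(10), np.random.randn(10)
members = [line_point(w0, w1, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
```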
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
- Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.