A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural
Networks
- URL: http://arxiv.org/abs/2209.11366v1
- Date: Fri, 23 Sep 2022 01:47:09 GMT
- Title: A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural
Networks
- Authors: Ponkrshnan Thiagarajan and Susanta Ghosh
- Abstract summary: We formulate a novel loss function for BNNs based on the geometric JS divergence and show that the conventional KL divergence-based loss function is its special case.
We demonstrate performance improvements over the state-of-the-art KL divergence-based BNN on the classification of a noisy CIFAR data set.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kullback-Leibler (KL) divergence is widely used for variational inference of
Bayesian Neural Networks (BNNs). However, the KL divergence has limitations
such as unboundedness and asymmetry. We examine the Jensen-Shannon (JS)
divergence that is more general, bounded, and symmetric. We formulate a novel
loss function for BNNs based on the geometric JS divergence and show that the
conventional KL divergence-based loss function is its special case. We evaluate
the divergence part of the proposed loss function in a closed form for a
Gaussian prior. For any other general prior, Monte Carlo approximations can be
used. We provide algorithms for implementing both of these cases. We
demonstrate that the proposed loss function offers an additional parameter that
can be tuned to control the degree of regularisation. We derive the conditions
under which the proposed loss function regularises better than the KL
divergence-based loss function for Gaussian priors and posteriors. We
demonstrate performance improvements over the state-of-the-art KL
divergence-based BNN on the classification of a noisy CIFAR data set and a
biased histopathology data set.
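As an illustration of the divergence term, below is a minimal sketch (in Python/NumPy) of a skewed geometric JS divergence between two diagonal Gaussians, evaluated in closed form via the weighted geometric mean of the two densities, which is again Gaussian. The weighting convention, the skew parameter alpha, and the function names are assumptions made for illustration and may differ from the paper's exact formulation; under the convention used here, alpha -> 0 recovers the standard KL(posterior || prior) term, consistent with the claim that the KL divergence-based loss is a special case.

    import numpy as np

    def kl_gauss(mu_p, var_p, mu_q, var_q):
        # KL( N(mu_p, var_p) || N(mu_q, var_q) ) for diagonal Gaussians, summed over dimensions.
        return 0.5 * np.sum(np.log(var_q / var_p)
                            + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

    def geometric_mean_gauss(mu_p, var_p, mu_q, var_q, alpha):
        # Normalized weighted geometric mean p^alpha * q^(1-alpha): again a diagonal Gaussian.
        prec = alpha / var_p + (1.0 - alpha) / var_q   # precisions combine linearly
        var_g = 1.0 / prec
        mu_g = var_g * (alpha * mu_p / var_p + (1.0 - alpha) * mu_q / var_q)
        return mu_g, var_g

    def geo_js_gauss(mu_p, var_p, mu_q, var_q, alpha):
        # Skewed geometric JS divergence between diagonal Gaussians (closed form).
        # Assumed convention: alpha -> 0 gives KL(P || Q); alpha -> 1 gives KL(Q || P).
        mu_g, var_g = geometric_mean_gauss(mu_p, var_p, mu_q, var_q, alpha)
        return ((1.0 - alpha) * kl_gauss(mu_p, var_p, mu_g, var_g)
                + alpha * kl_gauss(mu_q, var_q, mu_g, var_g))

For a non-Gaussian prior there is no such closed form; as noted in the abstract, the KL terms can instead be approximated by Monte Carlo sampling from the variational posterior. The skew parameter alpha plays the role of the additional parameter mentioned above for controlling the degree of regularisation.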
Related papers
- On weight and variance uncertainty in neural networks for regression tasks [1.6649383443094408]
We show that including the variance uncertainty can improve the prediction performance of the Bayesian NN.
We explore fully connected dense networks and dropout NNs with Gaussian and spike-and-slab priors, respectively, for the network weights.
arXiv Detail & Related papers (2025-01-08T04:44:47Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - How do noise tails impact on deep ReLU networks? [2.5889847253961418]
We show how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions.
We also contribute some new results on the approximation theory of deep ReLU neural networks.
arXiv Detail & Related papers (2022-03-20T00:27:32Z) - Cramér-Rao bound-informed training of neural networks for quantitative
MRI [11.964144201247198]
Neural networks are increasingly used to estimate parameters in quantitative MRI, in particular in magnetic resonance fingerprinting.
Their advantages are their superior speed and their dominance over the inefficient unbiased estimator.
We find, however, that heterogeneous parameters are hard to estimate.
We propose a well-founded Cramér-Rao bound (CRB) loss function, which normalizes the squared error with the respective CRB.
arXiv Detail & Related papers (2021-09-22T06:38:03Z) - Sampling-free Variational Inference for Neural Networks with
Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Non-Asymptotic Performance Guarantees for Neural Estimation of
$\mathsf{f}$-Divergences [22.496696555768846]
Statistical distances (SDs) quantify the dissimilarity between probability distributions.
A modern method for estimating such distances from data relies on parametrizing a variational form by a neural network (NN) and optimizing it; the choice of NN class trades approximation error against estimation error.
This paper explores this tradeoff by means of non-asymptotic error bounds, focusing on three popular choices of SDs.
arXiv Detail & Related papers (2021-03-11T19:47:30Z) - A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNN) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly designed reward function that accepts some bias in order to reduce variance and avoid unstable, possibly unbounded payouts.
arXiv Detail & Related papers (2021-03-01T15:55:58Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Frequentist Uncertainty in Recurrent Neural Networks via Blockwise
Influence Functions [121.10450359856242]
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data.
Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods.
We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
arXiv Detail & Related papers (2020-06-20T22:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.