Jensen-Shannon Divergence Based Novel Loss Functions for Bayesian Neural Networks
- URL: http://arxiv.org/abs/2209.11366v4
- Date: Tue, 03 Dec 2024 22:44:12 GMT
- Title: Jensen-Shannon Divergence Based Novel Loss Functions for Bayesian Neural Networks
- Authors: Ponkrshnan Thiagarajan, Susanta Ghosh,
- Abstract summary: We formulate a novel loss function for BNNs based on a new modification to the generalized Jensen-Shannon (JS) divergence, which is bounded.
We find that the JS divergence-based variational inference is intractable, and hence employed a constrained optimization framework to formulate these losses.
Our theoretical analysis and empirical experiments on multiple regression and classification data sets suggest that the proposed losses perform better than the KL divergence-based loss, especially when the data sets are noisy or biased.
- Score: 2.4554686192257424
- License:
- Abstract: Bayesian neural networks (BNNs) are state-of-the-art machine learning methods that can naturally regularize and systematically quantify uncertainties using their stochastic parameters. Kullback-Leibler (KL) divergence-based variational inference used in BNNs suffers from unstable optimization and challenges in approximating light-tailed posteriors due to the unbounded nature of the KL divergence. To resolve these issues, we formulate a novel loss function for BNNs based on a new modification to the generalized Jensen-Shannon (JS) divergence, which is bounded. In addition, we propose a Geometric JS divergence-based loss, which is computationally efficient since it can be evaluated analytically. We found that the JS divergence-based variational inference is intractable, and hence employed a constrained optimization framework to formulate these losses. Our theoretical analysis and empirical experiments on multiple regression and classification data sets suggest that the proposed losses perform better than the KL divergence-based loss, especially when the data sets are noisy or biased. Specifically, there are approximately 5% and 8% improvements in accuracy for a noise-added CIFAR-10 dataset and a regression dataset, respectively. There is about a 13% reduction in false negative predictions of a biased histopathology dataset. In addition, we quantify and compare the uncertainty metrics for the regression and classification tasks.
Related papers
- On weight and variance uncertainty in neural networks for regression tasks [1.6649383443094408]
We show that including the variance uncertainty can improve the prediction performance of the Bayesian NN.
We explore fully connected dense networks and dropout NNs with Gaussian and spike-and-slab priors, respectively, for the network weights.
arXiv Detail & Related papers (2025-01-08T04:44:47Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - How do noise tails impact on deep ReLU networks? [2.5889847253961418]
We show how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions.
We also contribute some new results on the approximation theory of deep ReLU neural networks.
arXiv Detail & Related papers (2022-03-20T00:27:32Z) - Cram\'er-Rao bound-informed training of neural networks for quantitative
MRI [11.964144201247198]
Neural networks are increasingly used to estimate parameters in quantitative MRI, in particular in magnetic resonance fingerprinting.
Their advantages are their superior speed and their dominance of the non-efficient unbiased estimator.
We find, however, that heterogeneous parameters are hard to estimate.
We propose a well-founded Cram'erRao loss function, which normalizes the squared error with respective CRB.
arXiv Detail & Related papers (2021-09-22T06:38:03Z) - Sampling-free Variational Inference for Neural Networks with
Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Non-Asymptotic Performance Guarantees for Neural Estimation of
$\mathsf{f}$-Divergences [22.496696555768846]
Statistical distances quantify the dissimilarity between probability distributions.
A modern method for estimating such distances from data relies on parametrizing a variational form by a neural network (NN) and optimizing it.
This paper explores this tradeoff by means of non-asymptotic error bounds, focusing on three popular choices of SDs.
arXiv Detail & Related papers (2021-03-11T19:47:30Z) - A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNN) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly-designed reward function that introduces some degree of bias designed to reduce variance and avoid unstable, possibly-unbounded payouts.
arXiv Detail & Related papers (2021-03-01T15:55:58Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Frequentist Uncertainty in Recurrent Neural Networks via Blockwise
Influence Functions [121.10450359856242]
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data.
Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods.
We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
arXiv Detail & Related papers (2020-06-20T22:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.