A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural
Networks
- URL: http://arxiv.org/abs/2209.11366v1
- Date: Fri, 23 Sep 2022 01:47:09 GMT
- Title: A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural
Networks
- Authors: Ponkrshnan Thiagarajan and Susanta Ghosh
- Abstract summary: We formulate a novel loss function for BNNs based on the geometric JS divergence and show that the conventional KL divergence-based loss function is its special case.
We demonstrate performance improvements over the state-of-the-art KL divergence-based BNN on the classification of a noisy CIFAR data set.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kullback-Leibler (KL) divergence is widely used for variational inference of
Bayesian Neural Networks (BNNs). However, the KL divergence has limitations
such as unboundedness and asymmetry. We examine the Jensen-Shannon (JS)
divergence that is more general, bounded, and symmetric. We formulate a novel
loss function for BNNs based on the geometric JS divergence and show that the
conventional KL divergence-based loss function is its special case. We evaluate
the divergence part of the proposed loss function in a closed form for a
Gaussian prior. For any other general prior, Monte Carlo approximations can be
used. We provide algorithms for implementing both of these cases. We
demonstrate that the proposed loss function offers an additional parameter that
can be tuned to control the degree of regularisation. We derive the conditions
under which the proposed loss function regularises better than the KL
divergence-based loss function for Gaussian priors and posteriors. We
demonstrate performance improvements over the state-of-the-art KL
divergence-based BNN on the classification of a noisy CIFAR data set and a
biased histopathology data set.
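The abstract leaves the exact form of the geometric JS loss to the paper; as a rough illustration only, the sketch below evaluates a skew geometric JS term between a univariate Gaussian variational posterior q and a Gaussian prior p, assuming the common convention that the intermediate distribution is the weighted geometric mean G_alpha proportional to q^alpha * p^(1-alpha), so that alpha = 0 recovers the conventional KL(q || p) term. The function names and the alpha convention are illustrative and may differ from the paper's.

```python
import numpy as np

def kl_gauss(mu_0, sig_0, mu_1, sig_1):
    """KL( N(mu_0, sig_0^2) || N(mu_1, sig_1^2) ) for univariate Gaussians."""
    return (np.log(sig_1 / sig_0)
            + (sig_0**2 + (mu_0 - mu_1)**2) / (2.0 * sig_1**2)
            - 0.5)

def geo_js_gauss(mu_q, sig_q, mu_p, sig_p, alpha):
    """Skew geometric JS divergence between Gaussians q and p (illustrative).

    Uses the weighted geometric mean G_alpha ~ q^alpha * p^(1-alpha), which is
    again Gaussian, so both KL terms are available in closed form. With this
    convention, alpha = 0 reduces to the usual KL(q || p) term of the
    variational loss; the paper's exact parameterization may differ.
    """
    # Parameters of the (normalized) geometric mean G_alpha.
    prec_a = alpha / sig_q**2 + (1.0 - alpha) / sig_p**2
    sig_a = np.sqrt(1.0 / prec_a)
    mu_a = sig_a**2 * (alpha * mu_q / sig_q**2 + (1.0 - alpha) * mu_p / sig_p**2)
    # Weighted sum of the two KL terms against the geometric mean.
    return ((1.0 - alpha) * kl_gauss(mu_q, sig_q, mu_a, sig_a)
            + alpha * kl_gauss(mu_p, sig_p, mu_a, sig_a))

# Sanity check on a single weight: alpha = 0 recovers the conventional KL term.
mu_q, sig_q = 0.3, 0.8   # variational posterior over one weight
mu_p, sig_p = 0.0, 1.0   # standard normal prior
print(geo_js_gauss(mu_q, sig_q, mu_p, sig_p, alpha=0.0))   # equals kl_gauss(q, p)
print(kl_gauss(mu_q, sig_q, mu_p, sig_p))
print(geo_js_gauss(mu_q, sig_q, mu_p, sig_p, alpha=0.3))   # skewed alternative
```

With a Gaussian prior both KL terms are available in closed form as above; for a non-Gaussian prior the geometric mean is generally intractable, and the abstract indicates that Monte Carlo approximations are used instead. The skew parameter alpha plays the role of the additional tunable parameter that, per the abstract, controls the degree of regularisation.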
Related papers
- A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In
Distributional Reinforcement Learning [19.89141873890568]
This paper introduces a generalized quantile Huber loss function derived from Wasserstein distance (WD) calculation.
Compared to the classical quantile Huber loss, this innovative loss function enhances robustness against outliers.
Empirical tests on Atari games, a common application in distributional RL, and a recent hedging strategy using distributional RL, validate our proposed loss function.
arXiv Detail & Related papers (2024-01-04T15:51:49Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends this framework to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Understanding Variational Inference in Function-Space [20.940162027560408]
We highlight some advantages and limitations of employing the Kullback-Leibler divergence in this setting.
We propose (featurized) Bayesian linear regression as a benchmark for "function-space" inference methods that directly measures approximation quality.
arXiv Detail & Related papers (2020-11-18T17:42:01Z) - An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their
Asymptotic Overconfidence [65.24701908364383]
A Bayesian treatment can mitigate overconfidence in ReLU nets around the training data.
But far away from them, ReLU Bayesian neural networks (BNNs) can still underestimate uncertainty and thus be overconfident.
We show that the proposed extension can be applied post-hoc to any pre-trained ReLU BNN at a low cost.
arXiv Detail & Related papers (2020-10-06T13:32:18Z) - Empirical Strategy for Stretching Probability Distribution in
Neural-network-based Regression [5.35308390309106]
In regression analysis with artificial neural networks, prediction performance depends on determining appropriate weights between layers.
We propose weighted empirical stretching (WES) as a novel loss function that increases the overlap area of the two distributions.
The improved results in RMSE for the extreme domain are expected to be utilized for prediction of abnormal events in non-linear complex systems.
arXiv Detail & Related papers (2020-09-08T06:08:14Z) - Cumulant GAN [17.4556035872983]
We propose a novel loss function for training Generative Adversarial Networks (GANs).
We show that the corresponding optimization problem is equivalent to Rényi divergence minimization.
We experimentally demonstrate that image generation is more robust relative to Wasserstein GAN.
arXiv Detail & Related papers (2020-06-11T17:23:02Z) - Bayesian Neural Network via Stochastic Gradient Descent [0.0]
We show how gradient estimation techniques can be applied to Bayesian neural networks.
Our work considerably outperforms previous state-of-the-art approaches for regression using Bayesian neural networks.
arXiv Detail & Related papers (2020-06-04T18:33:59Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss, given draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.