Variational Representations and Neural Network Estimation of R\'enyi
Divergences
- URL: http://arxiv.org/abs/2007.03814v4
- Date: Tue, 20 Jul 2021 16:12:35 GMT
- Title: Variational Representations and Neural Network Estimation of R\'enyi
Divergences
- Authors: Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis, Luc Rey-Bellet,
Jie Wang
- Abstract summary: We derive a new variational formula for the R\'enyi family of divergences, $R_\alpha(Q\|P)$, between probability measures $Q$ and $P$.
By applying this theory to neural-network estimators, we show that if a neural network family satisfies one of several strengthened versions of the universal approximation property, then the corresponding R\'enyi divergence estimator is consistent.
- Score: 4.2896536463351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We derive a new variational formula for the R\'enyi family of divergences,
$R_\alpha(Q\|P)$, between probability measures $Q$ and $P$. Our result
generalizes the classical Donsker-Varadhan variational formula for the
Kullback-Leibler divergence. We further show that this R\'enyi variational
formula holds over a range of function spaces; this leads to a formula for the
optimizer under very weak assumptions and is also key in our development of a
consistency theory for R\'enyi divergence estimators. By applying this theory
to neural-network estimators, we show that if a neural network family satisfies
one of several strengthened versions of the universal approximation property
then the corresponding R\'enyi divergence estimator is consistent. In contrast
to density-estimator based methods, our estimators involve only expectations
under $Q$ and $P$ and hence are more effective in high dimensional systems. We
illustrate this via several numerical examples of neural network estimation in
systems of up to 5000 dimensions.
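As a concrete illustration of the estimation approach described in the abstract, the following is a minimal sketch (in Python with PyTorch) of a neural-network R\'enyi divergence estimator. It assumes a variational objective of the Donsker-Varadhan type, $\frac{1}{\alpha-1}\log E_Q[e^{(\alpha-1)g}] - \frac{1}{\alpha}\log E_P[e^{\alpha g}]$, maximized over a test function $g$; the objective form, the network architecture, and the training settings here are illustrative assumptions rather than the paper's exact construction.

```python
import math
import torch
import torch.nn as nn

def renyi_objective(g_q, g_p, alpha):
    """Empirical variational objective (assumed Donsker-Varadhan-type form):
        (1/(alpha-1)) log E_Q[exp((alpha-1) g)] - (1/alpha) log E_P[exp(alpha g)]
    g_q, g_p: values of the test function g on samples from Q and P.
    Requires alpha > 0 and alpha != 1; the alpha -> 1 limit recovers the
    classical Donsker-Varadhan objective for the KL divergence."""
    log_mean_q = torch.logsumexp((alpha - 1.0) * g_q, dim=0) - math.log(g_q.shape[0])
    log_mean_p = torch.logsumexp(alpha * g_p, dim=0) - math.log(g_p.shape[0])
    return log_mean_q / (alpha - 1.0) - log_mean_p / alpha

def estimate_renyi(x_q, x_p, alpha=2.0, hidden=64, steps=2000, lr=1e-3):
    """Maximize the objective over a small fully connected network g
    (hypothetical architecture and optimizer settings)."""
    g = nn.Sequential(nn.Linear(x_q.shape[1], hidden), nn.ReLU(),
                      nn.Linear(hidden, hidden), nn.ReLU(),
                      nn.Linear(hidden, 1))
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    for _ in range(steps):
        loss = -renyi_objective(g(x_q).squeeze(-1), g(x_p).squeeze(-1), alpha)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return renyi_objective(g(x_q).squeeze(-1), g(x_p).squeeze(-1), alpha).item()

# Usage on a small synthetic example: two Gaussians differing in mean.
if __name__ == "__main__":
    torch.manual_seed(0)
    x_q = torch.randn(4000, 10) + 0.5
    x_p = torch.randn(4000, 10)
    print(estimate_renyi(x_q, x_p, alpha=2.0))
```

Note that, as the abstract emphasizes, this objective involves only expectations under $Q$ and $P$ (estimated here from samples), with no intermediate density estimation.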
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Function-space regularized R\'enyi divergences [6.221019624345409]
We propose a new family of regularized R\'enyi divergences parametrized by a variational function space.
We prove several properties of these new divergences, showing that they interpolate between the classical R\'enyi divergences and IPMs.
We show that the proposed regularized R\'enyi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous.
arXiv Detail & Related papers (2022-10-10T19:18:04Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization (a minimal classifier-based sketch of binary DRE is included after this list).
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z) - Estimation of a regression function on a manifold by fully connected
deep neural networks [6.058868817939519]
The rate of convergence of least squares estimates based on fully connected spaces of deep neural networks with ReLU activation function is analyzed.
It is shown that, when the distribution of the predictor variable is concentrated on a manifold, these estimates achieve a rate of convergence which depends on the dimension of the manifold and not on the number of components of the predictor variable.
arXiv Detail & Related papers (2021-07-20T14:43:59Z) - A unified view of likelihood ratio and reparameterization gradients [91.4645013545015]
We use a first principles approach to explain that LR and RP are alternative methods of keeping track of the movement of probability mass.
We show that the space of all possible estimators combining LR and RP can be completely parameterized by a flow field.
We prove that there cannot exist a single-sample estimator of this type outside our space, thus, clarifying where we should be searching for better Monte Carlo gradient estimators.
arXiv Detail & Related papers (2021-05-31T11:53:08Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Infinitely Deep Bayesian Neural Networks with Stochastic Differential
Equations [37.02511585732081]
We perform scalable approximate inference in a recently-proposed family of continuous-depth neural networks.
We demonstrate gradient-based variational inference, producing arbitrarily flexible approximate posteriors.
This approach further inherits the memory-efficient training and tunable precision of neural ODEs.
arXiv Detail & Related papers (2021-02-12T14:48:58Z) - The k-tied Normal Distribution: A Compact Parameterization of Gaussian
Mean Field Posteriors in Bayesian Neural Networks [46.677567663908185]
Variational Bayesian inference is a popular methodology for approximating posteriors over Bayesian neural network weights.
Recent work has explored ever richer parameterizations of the approximate posterior in the hope of improving performance.
We find that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance.
arXiv Detail & Related papers (2020-02-07T07:33:15Z)
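As referenced in the Unified Framework entry above, binary density ratio estimation is commonly implemented via the classifier-based "density ratio trick": a probabilistic classifier is trained to distinguish samples from the two distributions, and the ratio is recovered from its logits. The sketch below (using the logistic loss, one standard instance of a Bregman-divergence-based objective) is an illustrative assumption, not the cited paper's multi-distribution framework; the network, sample handling, and training settings are hypothetical.

```python
import torch
import torch.nn as nn

def fit_density_ratio(x_p, x_q, hidden=64, steps=1000, lr=1e-3):
    """Classifier-based binary density ratio estimation ("density ratio trick").
    A logistic classifier separates samples of P (label 1) from Q (label 0);
    with equal sample sizes, the Bayes-optimal logit at x equals log(p(x)/q(x))."""
    f = nn.Sequential(nn.Linear(x_p.shape[1], hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    x = torch.cat([x_p, x_q], dim=0)
    y = torch.cat([torch.ones(x_p.shape[0]), torch.zeros(x_q.shape[0])])
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        loss = loss_fn(f(x).squeeze(-1), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Estimated ratio p/q at new points z (exponentiate the learned logit).
    return lambda z: torch.exp(f(z).squeeze(-1)).detach()
```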