Optimizing Variational Representations of Divergences and Accelerating
their Statistical Estimation
- URL: http://arxiv.org/abs/2006.08781v3
- Date: Wed, 23 Mar 2022 18:32:59 GMT
- Title: Optimizing Variational Representations of Divergences and Accelerating
their Statistical Estimation
- Authors: Jeremiah Birrell, Markos A. Katsoulakis, Yannis Pantazis
- Abstract summary: Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights.
They have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models.
We develop a methodology for building new, tighter variational representations of divergences.
- Score: 6.34892104858556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational representations of divergences and distances between
high-dimensional probability distributions offer significant theoretical
insights and practical advantages in numerous research areas. Recently, they
have gained popularity in machine learning as a tractable and scalable approach
for training probabilistic models and for statistically differentiating between
data distributions. Their advantages include: 1) They can be estimated from
data as statistical averages. 2) Such representations can leverage the ability
of neural networks to efficiently approximate optimal solutions in function
spaces. However, a systematic and practical approach to improving the tightness
of such variational formulas, and accordingly accelerating statistical learning
and estimation from data, is currently lacking. Here we develop such a
methodology for building new, tighter variational representations of
divergences. Our approach relies on improved objective functionals constructed
via an auxiliary optimization problem. Furthermore, the calculation of the
functional Hessian of objective functionals unveils the local curvature
differences around the common optimal variational solution; this quantifies and
orders the tightness gains between different variational representations.
Finally, numerical simulations utilizing neural network optimization
demonstrate that tighter representations can result in significantly faster
learning and more accurate estimation of divergences in both synthetic and real
datasets (of more than 1000 dimensions), often accelerated by nearly an order
of magnitude.
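To make the ingredients above concrete, the following is a minimal, self-contained sketch (not the authors' code) of estimating KL(P||Q) from samples with two standard variational representations: the Donsker-Varadhan objective E_P[g] - log E_Q[e^g] and the looser objective E_P[g] - E_Q[e^{g-1}] (Nguyen-Wainwright-Jordan), using a small neural network as the test function g. All names and hyperparameters here are illustrative assumptions; the paper's tighter representations are constructed differently, via an auxiliary optimization over the objective functional itself.

```python
# Minimal sketch: neural variational estimation of KL(P || Q) from samples.
# DV(g)  = E_P[g] - log E_Q[exp(g)]        (Donsker-Varadhan)
# NWJ(g) = E_P[g] - E_Q[exp(g - 1)]        (Legendre / NWJ representation)
# Since log x <= x - 1, DV(g) >= NWJ(g) for every g; both are lower bounds on KL.
import torch
import torch.nn as nn


class TestFunction(nn.Module):
    """Small MLP playing the role of the variational test function g."""

    def __init__(self, dim, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def dv_objective(g, x_p, x_q):
    # Donsker-Varadhan lower bound, estimated as statistical averages over samples.
    n_q = torch.tensor(float(x_q.shape[0]))
    return g(x_p).mean() - (torch.logsumexp(g(x_q), dim=0) - torch.log(n_q))


def nwj_objective(g, x_p, x_q):
    # Looser NWJ lower bound evaluated with the same test function.
    return g(x_p).mean() - torch.exp(g(x_q) - 1.0).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    dim = 5
    mu = 0.5 * torch.ones(dim)  # P = N(mu, I), Q = N(0, I); true KL = |mu|^2 / 2
    g = TestFunction(dim)
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)

    for step in range(2000):  # maximize the DV lower bound
        x_p = mu + torch.randn(512, dim)
        x_q = torch.randn(512, dim)
        loss = -dv_objective(g, x_p, x_q)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        x_p = mu + torch.randn(20000, dim)
        x_q = torch.randn(20000, dim)
        print("true KL :", 0.5 * float(mu @ mu))
        print("DV  est.:", float(dv_objective(g, x_p, x_q)))
        print("NWJ est.:", float(nwj_objective(g, x_p, x_q)))
```

Running the sketch illustrates the tightness ordering DV(g) >= NWJ(g) that the abstract refers to; the paper's contribution is a systematic construction and ranking (via the functional Hessian) of such tighter objectives and their use to accelerate neural estimation.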
Related papers
- Nonparametric Automatic Differentiation Variational Inference with
Spline Approximation [7.5620760132717795]
We develop a nonparametric approximation approach that enables flexible posterior approximation for distributions with complicated structures.
Compared with widely used nonparametric inference methods, the proposed method is easy to implement and adaptive to various data structures.
Experiments demonstrate the efficiency of the proposed method in approximating complex posterior distributions and improving the performance of generative models with incomplete data.
arXiv Detail & Related papers (2024-03-10T20:22:06Z)
- Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
arXiv Detail & Related papers (2023-10-10T14:06:56Z)
- Neural lasso: a unifying approach of lasso and neural networks [0.27624021966289597]
The statistical technique lasso for variable selection is represented through a neural network.
It is observed that, although both the statistical approach and its neural version have the same objective function, they differ in how they are optimized.
This gives rise to a new optimization algorithm for identifying the significant variables.
arXiv Detail & Related papers (2023-09-07T15:17:10Z)
- Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or complex mixtures of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed, formulating the objective as the logistic loss between the real data and artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
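For context, a standard form of the noise-contrastive objective (not necessarily the exact variant analyzed in that paper) fits an unnormalized model p_theta by logistic regression of data samples against samples from a known noise density p_n:

```latex
\mathcal{L}(\theta)
  = -\,\mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[\log \sigma\big(s_\theta(x)\big)\right]
    - \mathbb{E}_{x\sim p_n}\!\left[\log\big(1-\sigma\big(s_\theta(x)\big)\big)\right],
\qquad
s_\theta(x) = \log p_\theta(x) - \log p_n(x),
```

where \sigma is the logistic sigmoid and p_\theta need not be normalized; with equal numbers of data and noise samples, minimizing \mathcal{L} drives p_\theta toward the data distribution without computing the normalizing constant.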
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Learning Likelihood Ratios with Neural Network Classifiers [0.12277343096128711]
Approximations of the likelihood ratio may be computed using clever parametrizations of neural network-based classifiers.
We present a series of empirical studies detailing the performance of several common loss functionals and parametrizations of the classifier output.
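As background for the classifier parametrizations mentioned above (the standard likelihood-ratio trick; the specific parametrizations studied in that paper may differ), a classifier s(x) trained with cross-entropy to distinguish samples from p (label 1) and q (label 0), with equal class priors, has the optimum

```latex
s^{*}(x) = \frac{p(x)}{p(x) + q(x)}
\quad\Longrightarrow\quad
\frac{p(x)}{q(x)} = \frac{s^{*}(x)}{1 - s^{*}(x)},
```

so the log-likelihood ratio is recovered as the classifier's pre-sigmoid logit.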
arXiv Detail & Related papers (2023-05-17T18:11:38Z)
- A Free Lunch with Influence Functions? Improving Neural Network
Estimates with Concepts from Semiparametric Statistics [41.99023989695363]
We explore the potential for semiparametric theory to be used to improve neural networks and machine learning algorithms.
We propose a new neural network method MultiNet, which seeks the flexibility and diversity of an ensemble using a single architecture.
arXiv Detail & Related papers (2022-02-18T09:35:51Z)
- Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
arXiv Detail & Related papers (2022-02-08T16:09:12Z)
- Influence Estimation and Maximization via Neural Mean-Field Dynamics [60.91291234832546]
We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems.
Our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities.
arXiv Detail & Related papers (2021-06-03T00:02:05Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural
Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)