Efficient Semi-Implicit Variational Inference
- URL: http://arxiv.org/abs/2101.06070v1
- Date: Fri, 15 Jan 2021 11:39:09 GMT
- Title: Efficient Semi-Implicit Variational Inference
- Authors: Vincent Moens, Hang Ren, Alexandre Maraval, Rasul Tutunov, Jun Wang,
Haitham Ammar
- Abstract summary: We propose an efficient and scalable semi-implicit variational inference (SIVI) solver.
Our method maps SIVI's evidence lower bound to a nonlinear nesting of expected values and develops an optimiser that corrects the gradient bias this nesting introduces.
- Score: 65.07058307271329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose CI-VI, an efficient and scalable solver for
semi-implicit variational inference (SIVI). Our method, first, maps SIVI's
evidence lower bound (ELBO) to a form involving a nonlinear functional nesting
of expected values and then develops a rigorous optimiser capable of correctly
handling bias inherent to nonlinear nested expectations using an
extrapolation-smoothing mechanism coupled with gradient sketching. Our
theoretical results demonstrate convergence to a stationary point of the ELBO
in general non-convex settings typically arising when using deep network models
and an order of $O(t^{-\frac{4}{5}})$ gradient-bias-vanishing rate. We believe
these results generalise beyond the specific nesting arising from SIVI to other
forms. Finally, in a set of experiments, we demonstrate the effectiveness of
our algorithm in approximating complex posteriors on various data-sets
including those from natural language processing.
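To make the nested-expectation structure referred to in the abstract concrete, the following NumPy sketch illustrates a semi-implicit variational family and a Monte Carlo estimate of the SIVI ELBO. It is only an illustration under stated assumptions (a toy tanh mixing network, a standard-Gaussian target, illustrative sample sizes), not the paper's CI-VI solver with extrapolation-smoothing and gradient sketching.

```python
# Minimal sketch of the SIVI ELBO's nested-expectation structure.
# The mixing network, target density, and sample sizes are toy assumptions.

import numpy as np

rng = np.random.default_rng(0)
d = 2                          # latent dimension (toy choice)
W = rng.normal(size=(d, d))    # weights of a stand-in "deep" mixing network


def mixing_net(eps):
    """Implicit mixing distribution q(psi): push Gaussian noise through a
    nonlinearity (a stand-in for a deep network)."""
    return np.tanh(eps @ W)


def log_gaussian(x, mean, sigma):
    """Log-density of an isotropic Gaussian N(mean, sigma^2 I)."""
    return (-0.5 * np.sum((x - mean) ** 2, axis=-1) / sigma ** 2
            - 0.5 * x.shape[-1] * np.log(2 * np.pi * sigma ** 2))


def sivi_elbo_estimate(n_outer=64, n_inner=16, sigma_q=0.5):
    """Monte Carlo estimate of the SIVI ELBO for a toy standard-Gaussian target.

    The key structural point: the outer expectation over (psi, z) is taken of a
    nonlinear function (log) of an *inner* expectation over auxiliary psi'
    samples, which is what makes naive gradient estimates biased.
    """
    # Outer samples: psi ~ q(psi) implicitly, then z ~ q(z | psi).
    psi = mixing_net(rng.normal(size=(n_outer, d)))
    z = psi + sigma_q * rng.normal(size=(n_outer, d))

    # Inner expectation: estimate the marginal q(z) = E_{psi'}[q(z | psi')].
    psi_inner = mixing_net(rng.normal(size=(n_inner, d)))                # (K, d)
    inner = log_gaussian(z[:, None, :], psi_inner[None, :, :], sigma_q)  # (N, K)
    log_q_z = np.log(np.mean(np.exp(inner), axis=1))  # log of an inner average

    # Toy target: standard Gaussian log-density (up to a constant).
    log_p_z = log_gaussian(z, np.zeros(d), 1.0)

    return np.mean(log_p_z - log_q_z)


print("SIVI ELBO estimate (toy model):", sivi_elbo_estimate())
```

The log of the inner Monte Carlo average is exactly the nonlinear nesting of expected values the abstract mentions; it is this bias that the paper's extrapolation-smoothing mechanism is designed to handle.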
Related papers
- Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization [19.000530691874516]
We show that many nonconvex machine learning problems satisfy this kind of condition, which extends beyond traditional smoothness assumptions.
We propose an independently-normalized gradient descent algorithm, which leverages independent sampling and normalization.
arXiv Detail & Related papers (2024-10-17T21:52:00Z) - Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms the common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Variational Bayesian Optimal Experimental Design with Normalizing Flows [0.837622912636323]
Variational OED estimates a lower bound of the EIG without likelihood evaluations.
We introduce the use of normalizing flows for representing variational distributions in vOED.
We show that a composition of 4--5 layers is able to achieve lower EIG estimation bias.
arXiv Detail & Related papers (2024-04-08T14:44:21Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Training Deep Neural Network (DNN) models is a non-convex optimization problem.
In this paper we examine the use of convex neural recovery models.
We show that all stationary points of the non-convex training objective can be characterized as global optima of a subsampled convex (Lasso) program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Probabilistic Circuits for Variational Inference in Discrete Graphical
Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPNs).
We show that selective SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z) - Convex Representation Learning for Generalized Invariance in
Semi-Inner-Product Space [32.442549424823355]
In this work we develop an algorithm for learning representations with a variety of generalized invariances in semi-inner-product spaces, for which representer theorems and approximation bounds are established.
This allows such representations to be learned efficiently and effectively, as confirmed in our experiments, along with accurate predictions.
arXiv Detail & Related papers (2020-04-25T18:54:37Z)