Sampling in Constrained Domains with Orthogonal-Space Variational
Gradient Descent
- URL: http://arxiv.org/abs/2210.06447v1
- Date: Wed, 12 Oct 2022 17:51:13 GMT
- Title: Sampling in Constrained Domains with Orthogonal-Space Variational
Gradient Descent
- Authors: Ruqi Zhang, Qiang Liu, Xin T. Tong
- Abstract summary: We propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold.
We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/\text{the number of iterations})$ under mild conditions.
- Score: 13.724361914659438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling methods, as important inference and learning techniques, are
typically designed for unconstrained domains. However, constraints are
ubiquitous in machine learning problems, such as those on safety, fairness,
robustness, and many other properties that must be satisfied to apply sampling
results in real-life applications. Enforcing these constraints often leads to
implicitly-defined manifolds, making efficient sampling with constraints very
challenging. In this paper, we propose a new variational framework with a
designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold
$\mathcal{G}_0$ defined by general equality constraints. O-Gradient decomposes
the gradient into two parts: one decreases the distance to $\mathcal{G}_0$ and
the other decreases the KL divergence in the orthogonal space. While most
existing manifold sampling methods require initialization on $\mathcal{G}_0$,
O-Gradient does not require such prior knowledge. We prove that O-Gradient
converges to the target constrained distribution with rate
$\widetilde{O}(1/\text{the number of iterations})$ under mild conditions. Our
proof relies on a new Stein characterization of conditional measure which could
be of independent interest. We implement O-Gradient through both Langevin
dynamics and Stein variational gradient descent and demonstrate its
effectiveness in various experiments, including Bayesian deep neural networks.
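  As a concrete illustration of the decomposition described in the abstract, the sketch below implements an O-Gradient-style Langevin update for a single smooth equality constraint $g(x) = 0$: the score of the unconstrained target is projected onto the space orthogonal to $\nabla g$, while a separate drift proportional to $-g(x)\nabla g(x)$ decreases the distance to the manifold. This is a minimal sketch under these assumptions, not the paper's exact O-Langevin scheme; the function names, the weight `alpha`, and the toy constraint are all illustrative.

```python
import numpy as np

def o_langevin_step(x, grad_log_pi, g, grad_g, step=1e-3, alpha=50.0, rng=None):
    """One O-Gradient-style Langevin update (illustrative sketch, not the paper's exact scheme).

    x           : current sample, shape (d,)
    grad_log_pi : score of the unconstrained target density at x
    g, grad_g   : scalar equality constraint g(x) = 0 and its gradient
    alpha       : weight of the drift that decreases the distance to {g = 0}
    """
    rng = np.random.default_rng() if rng is None else rng
    n = grad_g(x)
    n_unit = n / (np.linalg.norm(n) + 1e-12)            # unit normal of the level set of g

    def proj(v):                                         # projection onto the space orthogonal to grad g
        return v - n_unit * (n_unit @ v)

    drift_kl = proj(grad_log_pi(x))                      # decreases KL divergence within the level set
    drift_g = -alpha * g(x) * n                          # decreases the distance to the manifold {g = 0}
    noise = np.sqrt(2.0 * step) * proj(rng.standard_normal(x.shape))
    return x + step * (drift_kl + drift_g) + noise

# Toy usage: standard Gaussian target restricted to the unit circle in 2D.
if __name__ == "__main__":
    grad_log_pi = lambda z: -z                           # score of N(0, I)
    g = lambda z: z @ z - 1.0                            # hypothetical constraint ||x||^2 = 1
    grad_g = lambda z: 2.0 * z
    x = np.array([2.0, 0.5])                             # no initialization on the manifold required
    for _ in range(5000):
        x = o_langevin_step(x, grad_log_pi, g, grad_g)
    print("g(x) after sampling:", g(x))                  # small residual; shrinks as alpha grows
```

  The Stein variational variant mentioned in the abstract would replace the projected score with a kernelized update over a set of interacting particles; the same orthogonal decomposition applies.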
Related papers
- Functional Gradient Flows for Constrained Sampling [29.631753643887237]
We propose a new functional gradient ParVI method for constrained sampling, called constrained functional gradient flow (CFG).
We also present novel numerical strategies to handle the boundary integral term arising from the domain constraints.
arXiv Detail & Related papers (2024-10-30T16:20:48Z)
- Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers [0.7704032792820767]
Deep neural networks are applied in more and more areas of everyday life.
They still lack essential abilities, such as robustly dealing with spatially transformed input signals.
We propose a novel technique to emulate such an inference process for neural nets.
arXiv Detail & Related papers (2024-05-06T09:47:29Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping [6.234350105794441]
Gradient estimation is often necessary for fitting generative models with discrete latent variables.
DisARM and other estimators have potentially exploding variance near the boundary of the parameter space.
We propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space.
arXiv Detail & Related papers (2022-08-12T05:37:52Z)
- Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
- Inexact and Stochastic Generalized Conditional Gradient with Augmented Lagrangian and Proximal Step [2.0196229393131726]
We analyze inexact and stochastic versions of the CGALP algorithm developed in the authors' previous paper.
This allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion.
We show convergence of the Lagrangian to an optimum and feasibility of the affine constraint.
arXiv Detail & Related papers (2020-05-11T14:52:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.