Global convergence of optimized adaptive importance samplers
- URL: http://arxiv.org/abs/2201.00409v2
- Date: Sun, 28 Jan 2024 12:15:01 GMT
- Title: Global convergence of optimized adaptive importance samplers
- Authors: \"Omer Deniz Akyildiz
- Abstract summary: We analyze the optimized adaptive importance sampler (OAIS) for performing Monte Carlo integration with general proposals.
We derive nonasymptotic bounds for the global optimization of the $\chi^2$-divergence for general proposals.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the optimized adaptive importance sampler (OAIS) for performing
Monte Carlo integration with general proposals. We leverage a classical result
which shows that the bias and the mean-squared error (MSE) of importance
sampling scale with the $\chi^2$-divergence between the target and the
proposal and develop a scheme which performs global optimization of
$\chi^2$-divergence. While it is known that this quantity is convex for
exponential family proposals, the case of general proposals has been an
open problem. We close this gap by utilizing the nonasymptotic bounds for
stochastic gradient Langevin dynamics (SGLD) for the global optimization of
$\chi^2$-divergence and derive nonasymptotic bounds for the MSE by leveraging
recent results from non-convex optimization literature. The resulting AIS
schemes have explicit theoretical guarantees that are uniform-in-time.
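As a concrete illustration, here is a minimal 1-D sketch of the OAIS idea: the proposal parameter is driven by SGLD on a Monte Carlo estimate of the $\chi^2$-divergence gradient, and the adapted proposal is then used for importance sampling. The bimodal target, the location-only Gaussian proposal, and all step sizes and temperatures below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unnormalized target: a two-component Gaussian mixture (an assumption
# for illustration; the paper allows general targets and proposals).
def target_unnorm(x):
    return 0.5 * np.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)

# Location-only Gaussian proposal q_theta = N(theta, 1).
def q_pdf(x, theta):
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)

def chi2_grad_estimate(theta, n):
    """Unbiased Monte Carlo estimate of d/dtheta E_q[w^2] with w = p/q, using
    grad rho(theta) = -E_q[w^2 * d/dtheta log q_theta(x)]."""
    x = theta + rng.standard_normal(n)      # x ~ q_theta
    w = target_unnorm(x) / q_pdf(x, theta)  # importance weights
    return -np.mean(w ** 2 * (x - theta))   # score of N(theta, 1) is x - theta

# SGLD on theta: gradient step plus Gaussian noise at inverse temperature
# beta; the injected noise is what enables *global* optimization.
theta, eta, beta = 3.0, 5e-3, 1e6
for _ in range(3000):
    theta += -eta * chi2_grad_estimate(theta, n=64) \
             + np.sqrt(2.0 * eta / beta) * rng.standard_normal()

# Self-normalized importance sampling with the adapted proposal.
x = theta + rng.standard_normal(10_000)
w = target_unnorm(x) / q_pdf(x, theta)
print(f"adapted theta = {theta:.3f}, IS estimate of E_p[x] = {np.sum(w * x) / np.sum(w):.3f}")
```

Under the paper's assumptions, running SGLD rather than plain gradient descent on the $\chi^2$ objective is what yields the global, uniform-in-time MSE guarantees.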
Related papers
- A Kernel Approach for Semi-implicit Variational Inference [21.789560144560127]
Semi-implicit variational inference (SIVI) enhances the expressiveness of variational families through hierarchical semi-implicit distributions.
Recent score-matching approaches to SIVI (SIVI-SM) address the intractability of these distributions via a minimax formulation.
We propose kernel semi-implicit variational inference (KSIVI), a principled and tractable alternative.
arXiv Detail & Related papers (2026-01-17T12:06:12Z) - Local Entropy Search over Descent Sequences for Bayesian Optimization [48.7994415668802]
A practical alternative is to iteratively refine the neighborhood of an initial design using local optimization methods such as gradient descent.
We propose local entropy search (LES), a Bayesian optimization paradigm that explicitly targets the solutions reachable by the descent sequences.
arXiv Detail & Related papers (2025-11-24T15:52:17Z) - Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization [66.08522228989634]
We establish the first global convergence result of neural networks for the two-stage least squares (2SLS) approach in nonparametric instrumental variable regression (NPIV).
This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD).
arXiv Detail & Related papers (2025-11-18T17:51:17Z) - Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models [5.076419064097734]
We analyze a stochastic Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling.
Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss-Newton matrix along the optimization path yields tighter stability bounds.
arXiv Detail & Related papers (2025-11-06T01:50:45Z) - On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization [57.179679246370114]
A potential limitation of existing methods is the bias inherent in most perturbation-based estimators unless the stepsize is chosen carefully.
We propose a novel family of unbiased gradient estimators that eliminates this bias while maintaining a favorable construction.
arXiv Detail & Related papers (2025-10-22T18:25:43Z) - Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two.
Our method achieves the lowest possible variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems.
Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new formulation of the optimistic gradient method.
arXiv Detail & Related papers (2024-01-26T01:16:59Z) - Generalizing Gaussian Smoothing for Random Search [23.381986209234164]
Gaussian smoothing (GS) is a derivative-free optimization algorithm that estimates the gradient of an objective using perturbations of the current solution.
We propose to choose the perturbation distribution so as to minimize the error of the resulting estimator, yielding gradient estimates with provably smaller MSE (see the sketch after this list).
arXiv Detail & Related papers (2022-11-27T04:42:05Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance (see the sketch after this list).
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy, a dual regularizer, and Nesterov's gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Debiasing a First-order Heuristic for Approximate Bi-level Optimization [38.068090269482425]
Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems that involve numerical (inner-level) optimization loops.
There is a lack of theoretical understanding of the convergence properties of the first-order method (FOM) heuristic.
We propose an unbiased FOM enjoying constant memory complexity as a function of $r$.
arXiv Detail & Related papers (2021-06-04T13:46:48Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - Revisiting SGD with Increasingly Weighted Averaging: Optimization and
Generalization Perspectives [50.12802772165797]
The averaging technique combines all iterates into a single solution.
Experiments demonstrate the trade-offs involved and the effectiveness of increasingly weighted averaging compared with other averaging schemes (see the sketch after this list).
arXiv Detail & Related papers (2020-03-09T18:14:00Z) - Better Theory for SGD in the Nonconvex World [2.6397379133308214]
Large-scale nonconvex optimization problems are ubiquitous in modern machine learning.
We perform experiments on the effect of a wide array of minibatch sizes on stochastic gradient descent (SGD) on synthetic problems.
arXiv Detail & Related papers (2020-02-09T09:56:06Z)