Enhanced Derivative-Free Optimization Using Adaptive Correlation-Induced Finite Difference Estimators
- URL: http://arxiv.org/abs/2502.20819v1
- Date: Fri, 28 Feb 2025 08:05:54 GMT
- Title: Enhanced Derivative-Free Optimization Using Adaptive Correlation-Induced Finite Difference Estimators
- Authors: Guo Liang, Guangwu Liu, Kun Zhang
- Abstract summary: We develop an algorithm designed to enhance DFO in terms of both gradient estimation efficiency and sample efficiency. We establish the consistency of our proposed algorithm and demonstrate that, despite using a batch of samples per iteration, it achieves the same convergence rate as the KW and SPSA methods.
- Score: 6.054123928890574
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Gradient-based methods are well-suited for derivative-free optimization (DFO), where finite-difference (FD) estimates are commonly used as gradient surrogates. Traditional stochastic approximation methods, such as Kiefer-Wolfowitz (KW) and simultaneous perturbation stochastic approximation (SPSA), typically utilize only two samples per iteration, resulting in imprecise gradient estimates and necessitating diminishing step sizes for convergence. In this paper, we first explore an efficient FD estimate, referred to as correlation-induced FD estimate, which is a batch-based estimate. Then, we propose an adaptive sampling strategy that dynamically determines the batch size at each iteration. By combining these two components, we develop an algorithm designed to enhance DFO in terms of both gradient estimation efficiency and sample efficiency. Furthermore, we establish the consistency of our proposed algorithm and demonstrate that, despite using a batch of samples per iteration, it achieves the same convergence rate as the KW and SPSA methods. Additionally, we propose a novel stochastic line search technique to adaptively tune the step size in practice. Finally, comprehensive numerical experiments confirm the superior empirical performance of the proposed algorithm.
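To make these ingredients concrete, below is a minimal Python sketch: a two-evaluation SPSA-style surrogate of the kind the KW/SPSA methods rely on, alongside a batch-based central finite-difference estimator inside a toy DFO loop that grows the batch when the estimate looks too noisy. The function names, the variance-based batch-growing heuristic, and the fixed step size are illustrative assumptions; this is not the paper's correlation-induced estimator, adaptive sampling rule, or stochastic line search.

```python
import numpy as np

def spsa_two_point_gradient(f, x, h, rng):
    # Classic SPSA-style surrogate: two noisy evaluations along a random
    # Rademacher (+/-1) direction; with delta_i in {-1, +1}, dividing by
    # delta_i is the same as multiplying by it.
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    return (f(x + h * delta) - f(x - h * delta)) / (2.0 * h) * delta

def central_fd_gradient(f, x, h):
    # One full central finite-difference pass: 2*d noisy evaluations,
    # one +/- pair per coordinate. Noise comes from the oracle f itself.
    d = x.size
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x - h * e)) / (2.0 * h)
    return g

def dfo_adaptive_batch(f, x0, h=1e-2, step=0.1, n_iters=50,
                       batch0=2, max_batch=64):
    # Toy DFO loop: average a batch of FD passes and grow the batch when the
    # estimated noise of the averaged gradient dominates its magnitude.
    # This rule and the fixed step size are illustrative placeholders only.
    x = np.asarray(x0, dtype=float)
    batch = batch0
    for _ in range(n_iters):
        passes = np.stack([central_fd_gradient(f, x, h) for _ in range(batch)])
        g_hat = passes.mean(axis=0)
        noise = passes.var(axis=0).sum() / batch   # variance of the batch mean
        if noise > g_hat @ g_hat and batch < max_batch:
            batch *= 2                             # sample more when too noisy
        x = x - step * g_hat
    return x

# Usage: minimize a noisy quadratic centred at 1 (true minimizer is [1, 1, 1]).
rng = np.random.default_rng(0)
noisy_f = lambda z: np.sum((z - 1.0) ** 2) + 0.01 * rng.normal()
x_hat = dfo_adaptive_batch(noisy_f, x0=np.zeros(3))
```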
Related papers
- Derivative-Free Optimization via Finite Difference Approximation: An Experimental Study [1.3886390523644807]
Derivative-free optimization (DFO) is vital in solving complex optimization problems where only noisy function evaluations are available through an oracle. Two classical iterative approaches are the Kiefer-Wolfowitz (KW) and simultaneous perturbation stochastic approximation (SPSA) algorithms. This paper conducts a comprehensive experimental comparison among these approaches.
arXiv Detail & Related papers (2024-10-31T18:07:44Z)
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment. We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- A Correlation-induced Finite Difference Estimator [6.054123928890574]
We first provide a sample-driven method via the bootstrap technique to estimate the optimal perturbation, and then propose an efficient FD estimator based on correlated samples at the estimated optimal perturbation.
Numerical results confirm the efficiency of our estimators and align well with the theory presented, especially in scenarios with small sample sizes.
arXiv Detail & Related papers (2024-05-09T09:27:18Z)
- Efficient Computation of Sparse and Robust Maximum Association Estimators [0.4588028371034406]
Robust statistical estimators are less affected by outliers but are often computationally challenging, particularly in high-dimensional sparse settings.
Modern optimization techniques are utilized to compute sparse and robust maximum association estimators efficiently.
arXiv Detail & Related papers (2023-11-29T11:57:50Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions are placed on the parameters through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Tuning Stochastic Gradient Algorithms for Statistical Inference via Large-Sample Asymptotics [18.93569692490218]
The tuning of stochastic gradient algorithms is often based on trial-and-error rather than generalizable theory.
We show that averaging with a large fixed step size is robust to the choice of tuning parameters.
We lay the foundation for a systematic analysis of other gradient Monte Carlo algorithms.
arXiv Detail & Related papers (2022-07-25T17:58:09Z)
- Accelerating Stochastic Probabilistic Inference [1.599072005190786]
Stochastic Variational Inference (SVI) has become increasingly attractive thanks to its ability to find good posterior approximations of probabilistic models.
Almost all the state-of-the-art SVI algorithms are based on first-order optimization and often suffer from poor convergence rates.
We bridge the gap between second-order methods and variational inference by proposing a second-order based variational inference approach.
arXiv Detail & Related papers (2022-03-15T01:19:12Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both stochastic and deterministic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows that these algorithms match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel algorithm, stocBiO, which features a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
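A quick illustration of the finite-difference idea in that last entry: a central difference along a direction approximates the directional derivative using only two function evaluations and no gradient computations. The sketch below is a generic example with assumed names and step size, not the authors' score-matching estimator.

```python
import numpy as np

def directional_derivative_fd(f, x, v, h=1e-4):
    # Central-difference approximation of D_v f(x) = <grad f(x), v>,
    # using two function evaluations and no gradient computations.
    v = v / np.linalg.norm(v)                      # normalize the direction
    return (f(x + h * v) - f(x - h * v)) / (2.0 * h)

# Example: for f(z) = ||z||^2 the exact value is 2 <x, v>.
x = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.0])
approx = directional_derivative_fd(lambda z: np.dot(z, z), x, v)   # ~= 4.0
exact = 2.0 * np.dot(x, v)                                          # = 4.0
```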