Gradient-free stochastic optimization for additive models
- URL: http://arxiv.org/abs/2503.02131v2
- Date: Sat, 05 Apr 2025 13:02:50 GMT
- Title: Gradient-free stochastic optimization for additive models
- Authors: Arya Akhavan, Alexandre B. Tsybakov,
- Abstract summary: We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Lojasiewicz or the strong convexity condition.<n>We assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the H"older family of functions.
- Score: 56.42455605591779
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-{\L}ojasiewicz or the strong convexity condition. Additionally, we assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the H\"older family of functions. The additive model for H\"older classes of functions is well-studied in the literature on nonparametric function estimation, where it is shown that such a model benefits from a substantial improvement of the estimation accuracy compared to the H\"older model without additive structure. We study this established framework in the context of gradient-free optimization. We propose a randomized gradient estimator that, when plugged into a gradient descent algorithm, allows one to achieve minimax optimal optimization error of the order $dT^{-(\beta-1)/\beta}$, where $d$ is the dimension of the problem, $T$ is the number of queries and $\beta\ge 2$ is the H\"older degree of smoothness. We conclude that, in contrast to nonparametric estimation problems, no substantial gain of accuracy can be achieved when using additive models in gradient-free optimization.
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization [0.6906005491572401]
We show that noise in batch descent (SGD) has the effect of smoothing objective function.<n>We show that there is an interesting relationship between the degree of smoothing by SGDs noise, and the well-studied sharpness' indicator.
arXiv Detail & Related papers (2023-11-15T07:27:40Z) - Variance-reduced Clipping for Non-convex Optimization [24.765794811146144]
Gradient clipping is a technique used in deep learning applications such as large-scale language modeling.
Recent experimental training have a fairly special behavior in that it mitigates order complexity.
arXiv Detail & Related papers (2023-03-02T00:57:38Z) - Stochastic Bias-Reduced Gradient Methods [44.35885731095432]
We develop a new primitive for optimization: a low-bias, low-cost smoothing of ther $x_star$ of any bound of the Moreau-Yoshida function.
arXiv Detail & Related papers (2021-06-17T13:33:05Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we denote the non-strongly setting on the magnitude of a gradient-free minimax optimization problem.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse
Gradients and Adaptive Sampling [29.600975900977343]
We propose a new $textbfZ$eroth-$textbfO$rder $textbfR$egularized $textbfO$ptimization method, dubbed ZORO.
When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function.
arXiv Detail & Related papers (2020-03-29T11:01:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.