Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and
Optimal Algorithms
- URL: http://arxiv.org/abs/2306.12383v3
- Date: Mon, 25 Dec 2023 06:41:11 GMT
- Title: Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and
Optimal Algorithms
- Authors: Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee
- Abstract summary: We show the first tight characterization of the optimal Hessian-dependent sample complexity.
A Hessian-independent algorithm universally achieves the optimal sample complexities for all Hessian instances.
The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions.
- Score: 64.10576998630981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In stochastic zeroth-order optimization, a problem of practical relevance is
understanding how to fully exploit the local geometry of the underlying
objective function. We consider a fundamental setting in which the objective
function is quadratic, and provide the first tight characterization of the
optimal Hessian-dependent sample complexity. Our contribution is twofold.
First, from an information-theoretic point of view, we prove tight lower bounds
on Hessian-dependent complexities by introducing a concept called energy
allocation, which captures the interaction between the searching algorithm and
the geometry of objective functions. A matching upper bound is obtained by
solving the optimal energy spectrum. Then, algorithmically, we show the
existence of a Hessian-independent algorithm that universally achieves the
asymptotically optimal sample complexities for all Hessian instances. The optimal
sample complexities achieved by our algorithm remain valid for heavy-tailed
noise distributions; this robustness is enabled by a truncation method.
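As a minimal illustrative sketch (not the authors' algorithm), the NumPy snippet below performs zeroth-order descent on an assumed quadratic objective using a standard two-point gradient estimator, and truncates (clips) each noisy function evaluation to cope with heavy-tailed noise in the spirit of the truncation idea above. The Hessian `H`, the Student-t noise model, the truncation level `tau`, the smoothing radius `delta`, and the step size are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative quadratic f(x) = 0.5 * x^T H x with an assumed Hessian H.
H = np.array([[2.0, 0.3], [0.3, 0.5]])

def noisy_eval(x):
    """Zeroth-order oracle: f(x) plus heavy-tailed (Student-t) noise."""
    return 0.5 * x @ H @ x + rng.standard_t(df=2.5)

def truncate(y, tau):
    """Truncation step: clip a noisy evaluation to [-tau, tau]."""
    return np.clip(y, -tau, tau)

def two_point_gradient(x, delta=1e-1, tau=10.0):
    """Two-point zeroth-order gradient estimate from truncated evaluations."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    y_plus = truncate(noisy_eval(x + delta * u), tau)
    y_minus = truncate(noisy_eval(x - delta * u), tau)
    return x.size * (y_plus - y_minus) / (2.0 * delta) * u

# Simple zeroth-order descent loop on the assumed quadratic.
x = np.array([3.0, -2.0])
for t in range(2000):
    g = two_point_gradient(x)
    x -= 0.01 * g

print("final iterate:", x)  # should drift toward the minimizer at the origin
```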
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function at the points it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Proximal Oracles for Optimization and Sampling [18.77973093341588]
We consider convex optimization with non-smooth objective function and log-concave sampling with non-smooth potential.
To overcome the challenges caused by non-smoothness, our algorithms employ two powerful proximal frameworks in optimization and sampling.
arXiv Detail & Related papers (2024-04-02T18:52:28Z) - Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for
Composite Convex Optimization [20.11028799145883]
We propose an Accelerated Cyclic Coordinate Dual Averaging with Extrapolation (A-CODER) method for composite convex optimization.
We show that A-CODER attains the optimal convergence rate with improved dependence on the number of blocks compared to prior work.
For the setting where the smooth component of the objective function is expressible in a finite sum form, we introduce a variance-reduced variant of A-CODER, VR-A-CODER, with state-of-the-art complexity guarantees.
arXiv Detail & Related papers (2023-03-28T19:46:30Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling the differentiable and non-differentiable parts separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - An Algebraically Converging Stochastic Gradient Descent Algorithm for
Global Optimization [14.336473214524663]
A key component in the algorithm is the randomness based on the value of the objective function.
We prove the convergence of the algorithm with an algebraic rate in the parameter space.
We present several numerical examples to demonstrate the efficiency and robustness of the algorithm.
arXiv Detail & Related papers (2022-04-12T16:27:49Z) - A Granular Sieving Algorithm for Deterministic Global Optimization [6.01919376499018]
A gradient-free deterministic method is developed to solve global optimization problems for Lipschitz continuous functions.
The method can be regarded as granular sieving with synchronous analysis in both the domain and range of the objective function.
arXiv Detail & Related papers (2021-07-14T10:03:03Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping; a minimal clipping sketch appears after this list.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Accelerated Algorithms for Convex and Non-Convex Optimization on
Manifolds [9.632674803757475]
We propose a scheme for solving convex and non-convex optimization problems on manifolds.
Our proposed algorithm adapts to the level of complexity in the objective function.
arXiv Detail & Related papers (2020-10-18T02:48:22Z) - Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found is used as a starting point (origin) for the next subspace.
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
arXiv Detail & Related papers (2020-09-08T06:54:11Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
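As a rough companion to the gradient-clipping entry above (a minimal sketch, not that paper's stepsize rules), the snippet below applies norm clipping to a stochastic gradient before each SGD step so that heavy-tailed noise cannot produce arbitrarily large updates. The objective, the Student-t noise model, the clipping level `lam`, and the O(1/t) stepsize are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def clip_by_norm(g, lam):
    """Norm clipping: rescale g so that ||g|| <= lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

def noisy_gradient(x):
    """Stochastic gradient of 0.5*||x||^2 with heavy-tailed (Student-t) noise."""
    return x + rng.standard_t(df=2.5, size=x.shape)

x = np.array([5.0, -4.0, 3.0])
for t in range(1, 5001):
    g = clip_by_norm(noisy_gradient(x), lam=2.0)
    x -= (0.5 / t) * g   # assumed decaying stepsize, purely illustrative

print("final iterate:", x)  # should approach the minimizer at the origin
```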