On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization
- URL: http://arxiv.org/abs/2008.08757v2
- Date: Mon, 24 May 2021 08:52:07 GMT
- Title: On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization
- Authors: Xu Cai and Jonathan Scarlett
- Abstract summary: We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
- Score: 55.937424268654645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider algorithm-independent lower bounds for the problem
of black-box optimization of functions having a bounded norm in some
Reproducing Kernel Hilbert Space (RKHS), which can be viewed as a non-Bayesian
Gaussian process bandit problem. In the standard noisy setting, we provide a
novel proof technique for deriving lower bounds on the regret, with benefits
including simplicity, versatility, and an improved dependence on the error
probability. In a robust setting in which every sampled point may be perturbed
by a suitably-constrained adversary, we provide a novel lower bound for
deterministic strategies, demonstrating an inevitable joint dependence of the
cumulative regret on the corruption level and the time horizon, in contrast
with existing lower bounds that only characterize the individual dependencies.
Furthermore, in a distinct robust setting in which the final point is perturbed
by an adversary, we strengthen an existing lower bound that only holds for
target success probabilities very close to one, by allowing for arbitrary
success probabilities above $\frac{2}{3}$.
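To make the corrupted setting concrete, the following is a minimal sketch of the round structure described above, assuming an observation-corruption model in which an adversary with total budget $C$ perturbs each observed value; the uniform-sampling strategy and the greedy adversary are illustrative placeholders, not the paper's constructions.
```python
import numpy as np

def corrupted_bandit_protocol(f, domain, T, C, noise_sd=0.1, seed=0):
    """Sketch of a corrupted black-box bandit protocol.

    At round t the learner queries x_t, an adversary injects a
    perturbation c_t subject to a total budget sum_t |c_t| <= C, and
    the learner observes y_t = f(x_t) + c_t + Gaussian noise.  The
    uniform-random strategy and greedy adversary are placeholders.
    """
    rng = np.random.default_rng(seed)
    budget_left = C
    queries = []
    for t in range(T):
        x = domain[rng.integers(len(domain))]            # placeholder strategy
        c = -min(budget_left, 1.0)                       # greedy corruption
        budget_left -= abs(c)
        y = f(x) + c + noise_sd * rng.standard_normal()
        queries.append((x, y))
    f_best = max(f(x) for x in domain)
    cumulative_regret = sum(f_best - f(x) for x, _ in queries)
    return queries, cumulative_regret
```
The paper's robust lower bound states that, under this kind of protocol, deterministic strategies cannot avoid a cumulative-regret term that grows jointly in $C$ and $T$.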
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The paper studies a problem that elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The proposed Double KL-UCB algorithm achieves optimality as the confidence level tends to zero; a background sketch of the KL-UCB index follows below.
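The summary does not spell out the Double KL-UCB index itself; as background, here is a minimal sketch of the classical KL-UCB index for Bernoulli rewards that it builds on, computed by bisection on the binary KL divergence (the $\log t$ exploration level is a common textbook choice, not necessarily the paper's).
```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, n_pulls, t, iters=50):
    """Largest q >= mean with KL(mean, q) <= log(t)/n_pulls, via bisection."""
    level = math.log(max(t, 2)) / n_pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= level:
            lo = mid          # mid is still feasible; move up
        else:
            hi = mid          # mid overshoots the confidence level
    return lo
```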
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix; a generic sketch of both steps appears below.
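As a rough illustration of these two ingredients, the sketch below computes a Cauchy (gradient) step for a quadratic model inside a trust region, plus an eigen step along a direction of negative curvature; it is a generic unconstrained caricature, not the paper's constrained TR-SQP iteration.
```python
import numpy as np

def quad_model(grad, hess, d):
    """Value of the local quadratic model at step d."""
    return grad @ d + 0.5 * d @ hess @ d

def tr_step(grad, hess, radius):
    """Pick the better of a Cauchy (gradient) step and an eigen step.

    The Cauchy step minimizes the quadratic model along -grad within
    the trust region; the eigen step follows the most negative
    curvature direction of the Hessian when one exists.
    """
    steps = []
    g_norm = np.linalg.norm(grad)
    if g_norm > 0:
        curv = grad @ hess @ grad
        alpha = radius / g_norm
        if curv > 0:
            alpha = min(alpha, g_norm**2 / curv)
        steps.append(-alpha * grad)
    eigvals, eigvecs = np.linalg.eigh(hess)
    if eigvals[0] < 0:                      # negative curvature present
        v = eigvecs[:, 0]
        v = -v if grad @ v > 0 else v       # choose the descent sign
        steps.append(radius * v)
    if not steps:
        return np.zeros_like(grad)
    return min(steps, key=lambda d: quad_model(grad, hess, d))
```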
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
- Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints [29.514323697659613]
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints.
Our goal is to design best-of-both-worlds algorithms that perform well under both stochastic and adversarial constraints.
arXiv Detail & Related papers (2024-05-25T08:09:36Z)
- Online Constraint Tightening in Stochastic Model Predictive Control: A Regression Approach [49.056933332667114]
No analytical solutions exist for chance-constrained optimal control problems.
We propose a data-driven approach for learning the constraint-tightening parameters online during control.
Our approach yields constraint-tightening parameters that tightly satisfy the chance constraints; a simplified sketch of the online-tightening idea appears below.
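A minimal sketch of the general idea, assuming a scalar uncertain constraint $g(x) \le 0$ that must hold with probability $1-\epsilon$: track the observed mismatch between realized and nominal constraint values, and use its empirical $(1-\epsilon)$-quantile as the tightening offset. The class name and update rule are illustrative stand-ins, not the paper's regression approach.
```python
import numpy as np

class QuantileTightening:
    """Illustrative online constraint tightening.

    Records realized mismatches of an uncertain constraint and tightens
    the nominal constraint to g_nominal(x) + theta <= 0, where theta is
    the empirical (1 - eps)-quantile of the observed mismatches.
    """
    def __init__(self, eps=0.05):
        self.eps = eps
        self.residuals = []

    def update(self, observed_mismatch):
        # observed_mismatch = realized g minus its nominal prediction
        self.residuals.append(observed_mismatch)

    def theta(self):
        if not self.residuals:
            return 0.0   # no data yet: no tightening
        return float(np.quantile(self.residuals, 1 - self.eps))
```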
arXiv Detail & Related papers (2023-10-04T16:22:02Z)
- Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization [29.579719765255927]
We propose a new family of efficient bandit algorithms for the contextual bandit setting.
Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings.
arXiv Detail & Related papers (2023-07-05T08:34:54Z)
- On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z)
- Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
- Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs [48.44657553192801]
We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.
Our approach relies on a simple increasing learning rate schedule, together with the help of logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality.
arXiv Detail & Related papers (2020-06-14T22:09:27Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling, called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.