Open Problem: Tight Online Confidence Intervals for RKHS Elements
- URL: http://arxiv.org/abs/2110.15458v1
- Date: Thu, 28 Oct 2021 22:36:20 GMT
- Title: Open Problem: Tight Online Confidence Intervals for RKHS Elements
- Authors: Sattar Vakili, Jonathan Scarlett, Tara Javidi
- Abstract summary: We formalize the question of online confidence intervals in the RKHS setting and overview the existing results.
It is unclear whether the suboptimal regret bound is a fundamental shortcoming of these algorithms or an artifact of the proof.
- Score: 57.363123214464764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Confidence intervals are a crucial building block in the analysis of various
online learning problems. The analysis of kernel-based bandit and reinforcement
learning problems utilizes confidence intervals applicable to the elements of a
reproducing kernel Hilbert space (RKHS). However, the existing confidence
bounds do not appear to be tight, resulting in suboptimal regret bounds. In
fact, the existing regret bounds for several kernelized bandit algorithms
(e.g., GP-UCB, GP-TS, and their variants) may fail to even be sublinear. It is
unclear whether the suboptimal regret bound is a fundamental shortcoming of
these algorithms or an artifact of the proof, and the main challenge seems to
stem from the online (sequential) nature of the observation points. We
formalize the question of online confidence intervals in the RKHS setting and
overview the existing results.
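The confidence intervals in question are centered at the GP posterior mean with a width proportional to the posterior standard deviation. A minimal sketch of that construction follows; the RBF kernel choice, the noise variance, and the fixed width multiplier `beta` are illustrative assumptions, not the paper's construction. The open problem is precisely how large `beta` must be (as a function of time) for the interval to remain valid for RKHS elements when the observation points are chosen sequentially.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_confidence_interval(X_obs, y_obs, x_query, beta=2.0, noise_var=0.1):
    """Posterior mean +/- beta * std at x_query: the GP-UCB-style interval.

    beta is a placeholder constant here; obtaining a valid, tight choice of
    beta under sequentially chosen observation points is the open problem.
    """
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    k_star = rbf_kernel(X_obs, x_query[None, :])[:, 0]
    mean = k_star @ np.linalg.solve(K, y_obs)
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)
    std = np.sqrt(max(var, 0.0))
    return mean - beta * std, mean + beta * std
```

As expected of a posterior credible interval, the interval is narrow near observed points and widens far from the data.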
Related papers
- Error Bounds For Gaussian Process Regression Under Bounded Support Noise With Applications To Safety Certification [12.813902876908127]
We provide novel error bounds for applying Gaussian Process Regression (GPR) under bounded support noise.
We show that these errors are substantially tighter than existing state-of-the-art bounds and are particularly well-suited for GPR with neural network kernels.
We illustrate how these bounds can be combined with barrier functions to successfully quantify the safety probability of an unknown dynamical system.
arXiv Detail & Related papers (2024-08-16T22:03:32Z) - Tighter Confidence Bounds for Sequential Kernel Regression [3.683202928838613]
We use martingale tail inequalities to establish new confidence bounds for sequential kernel regression.
Our confidence bounds can be computed by solving a conic program, although this bare version quickly becomes impractical.
We find that when our confidence bounds replace existing ones, the KernelUCB algorithm has better empirical performance, a matching worst-case performance guarantee and comparable computational cost.
arXiv Detail & Related papers (2024-03-19T13:47:35Z) - On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates.
Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
arXiv Detail & Related papers (2023-07-14T13:56:11Z) - Adaptation to Misspecified Kernel Regularity in Kernelised Bandits [27.912690223941386]
We study adaptivity to the regularity of translation-invariant kernels.
It is impossible to simultaneously achieve optimal cumulative regret in a pair of RKHSs with different regularities.
We connect the statistical difficulty for adaptivity in continuum-armed bandits in three fundamental types of function spaces.
arXiv Detail & Related papers (2023-04-26T21:12:45Z) - Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z) - Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z) - Near-Optimal Confidence Sequences for Bounded Random Variables [5.901337162013615]
A fundamental problem for online inference is to provide a sequence of confidence intervals that are valid uniformly as the sample size grows to infinity.
We provide a near-optimal confidence sequence for bounded random variables by utilizing Bentkus' concentration results.
The resulting confidence sequence is confirmed to be favorable in both synthetic coverage problems and an application to adaptive stopping algorithms.
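A confidence sequence must hold uniformly over time, not just at a single fixed sample size. The paper achieves near-optimality via Bentkus' sharper concentration results; the basic construction can be illustrated with a plainer Hoeffding bound made time-uniform by a union bound. The `delta / (t * (t + 1))` failure-probability split and all names below are illustrative assumptions, not the paper's method.

```python
import math

def hoeffding_confidence_sequence(xs, delta=0.05):
    """Time-uniform confidence intervals for the mean of [0, 1]-valued data.

    Hoeffding's inequality is applied at each time t with failure probability
    delta_t = delta / (t * (t + 1)); since these sum to delta over all t, a
    union bound makes the whole sequence of intervals valid simultaneously.
    """
    intervals = []
    total = 0.0
    for t, x in enumerate(xs, start=1):
        total += x
        mean = total / t
        delta_t = delta / (t * (t + 1))
        radius = math.sqrt(math.log(2.0 / delta_t) / (2.0 * t))
        intervals.append((mean - radius, mean + radius))
    return intervals
```

The union-bound split is what makes the intervals wider than a fixed-time Hoeffding interval; Bentkus-type bounds tighten exactly this gap.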
arXiv Detail & Related papers (2020-06-09T02:50:01Z) - Optimal Change-Point Detection with Training Sequences in the Large and Moderate Deviations Regimes [72.68201611113673]
This paper investigates a novel offline change-point detection problem from an information-theoretic perspective.
We assume that the underlying pre- and post-change distributions are unknown and can only be learned from the available training sequences.
arXiv Detail & Related papers (2020-03-13T23:39:40Z) - Corruption-Tolerant Gaussian Process Bandit Optimization [130.60115798580136]
We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm.
We introduce the Fast-Slow GP-UCB algorithm, which combines a "fast but non-robust" component with a "slow but robust" one.
We argue that certain dependencies on the corruption level are unavoidable.
arXiv Detail & Related papers (2020-03-04T09:46:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.