Optimistic Information Directed Sampling
- URL: http://arxiv.org/abs/2402.15411v2
- Date: Thu, 27 Jun 2024 16:15:39 GMT
- Title: Optimistic Information Directed Sampling
- Authors: Gergely Neu, Matteo Papini, Ludovic Schwartz
- Abstract summary: We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class.
We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy and the worst-case theory of Foster, Kakade, Qian, and Rakhlin (2021) based on the decision-estimation coefficient.
- Score: 15.704243709119726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy (2018) and the worst-case theory of Foster, Kakade, Qian, and Rakhlin (2021) based on the decision-estimation coefficient. Drawing from both lines of work, we propose an algorithmic template called Optimistic Information-Directed Sampling and show that it can achieve instance-dependent regret guarantees similar to the ones achievable by the classic Bayesian IDS method, but with the major advantage of not requiring any Bayesian assumptions. The key technical innovation of our analysis is introducing an optimistic surrogate model for the regret and using it to define a frequentist version of the Information Ratio of Russo and Van Roy (2018), and a less conservative version of the Decision Estimation Coefficient of Foster et al. (2021). Keywords: Contextual bandits, information-directed sampling, decision estimation coefficient, first-order regret bounds.
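For orientation, the display below sketches the classic Bayesian IDS rule of Russo and Van Roy (2018) that the paper starts from: play the action distribution minimizing the information ratio, i.e. squared expected regret over expected information gain. The symbols $\Delta_t(a)$ (expected instantaneous regret of action $a$) and $I_t(a)$ (information gain from playing $a$) are generic placeholders rather than the paper's own notation; per the abstract, Optimistic IDS replaces the true regret in the numerator with an optimistic surrogate to obtain a frequentist version of this quantity.

$$\pi_t \in \operatorname*{arg\,min}_{\pi \in \Delta(\mathcal{A})} \; \frac{\bigl(\mathbb{E}_{a \sim \pi}[\Delta_t(a)]\bigr)^2}{\mathbb{E}_{a \sim \pi}[I_t(a)]}$$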
Related papers
- Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability [71.82666334363174]
We develop a unified framework for lower bound methods in statistical estimation and interactive decision making.
We introduce a novel measure, decision dimension, which facilitates new lower bounds for interactive decision making.
arXiv Detail & Related papers (2024-10-07T15:14:58Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z) - Hypothesis Transfer in Bandits by Weighted Models [8.759884299087835]
We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning.
We propose a re-weighting scheme for which we show a reduction in regret over the classic Linear UCB when transfer is desired.
We further extend this method to an arbitrary amount of source models, where the algorithm decides which model is preferred at each time step.
arXiv Detail & Related papers (2022-11-14T14:13:02Z) - Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits [17.470829701201435]
We adapt the information-theoretic perspective of Russo and Van Roy [2016] to the contextual setting by introducing a new concept of information ratio.
This allows us to bound the regret in terms of the entropy of the prior distribution through a remarkably simple proof.
An interesting special case is that of logistic bandits with d-dimensional parameters, K actions, and Lipschitz logits, for which we provide a $\widetilde{O}(\sqrt{dKT})$ regret upper-bound that does not depend on the smallest slope of the sigmoid link function.
arXiv Detail & Related papers (2022-05-27T12:04:07Z) - Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z) - Contributions to Large Scale Bayesian Inference and Adversarial Machine Learning [0.0]
The rampant adoption of ML methodologies has revealed that models are usually adopted to make decisions without taking into account the uncertainties in their predictions.
We believe that developing ML systems that take predictive uncertainties into account and are robust against adversarial examples is a must for real-world tasks.
arXiv Detail & Related papers (2021-09-25T23:02:47Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Towards Robust and Reliable Algorithmic Recourse [11.887537452826624]
We propose a novel framework, RObust Algorithmic Recourse (ROAR), that leverages adversarial training for finding recourses that are robust to model shifts.
We also carry out detailed theoretical analysis which underscores the importance of constructing recourses that are robust to model shifts.
arXiv Detail & Related papers (2021-02-26T17:38:52Z)