Information Directed Sampling for Linear Partial Monitoring
- URL: http://arxiv.org/abs/2002.11182v1
- Date: Tue, 25 Feb 2020 21:30:56 GMT
- Title: Information Directed Sampling for Linear Partial Monitoring
- Authors: Johannes Kirschner, Tor Lattimore, Andreas Krause
- Abstract summary: We introduce information directed sampling (IDS) for partial monitoring with a linear reward and observation structure.
IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game.
We extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
- Score: 112.05623123909895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partial monitoring is a rich framework for sequential decision making under
uncertainty that generalizes many well known bandit models, including linear,
combinatorial and dueling bandits. We introduce information directed sampling
(IDS) for stochastic partial monitoring with a linear reward and observation
structure. IDS achieves adaptive worst-case regret rates that depend on precise
observability conditions of the game. Moreover, we prove lower bounds that
classify the minimax regret of all finite games into four possible regimes. IDS
achieves the optimal rate in all cases up to logarithmic factors, without
tuning any hyper-parameters. We further extend our results to the contextual
and the kernelized setting, which significantly increases the range of possible
applications.
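The core idea of IDS is to sample each round from the distribution over actions that minimizes the information ratio, i.e. squared expected regret divided by expected information gain. The paper's linear partial monitoring instantiation is more involved, but the generic finite-action step can be sketched as below; the quantities `regret` and `info` (per-action regret and information-gain estimates) are assumed inputs, and the code relies on the standard fact that the minimizing distribution is supported on at most two actions.

```python
import numpy as np

def ids_distribution(regret, info):
    """Return a sampling distribution minimizing the information ratio
    E[regret]^2 / E[info] over distributions on a finite action set.
    The minimizer is supported on at most two actions, so we grid-search
    over action pairs and mixing weights (a generic IDS sketch, not the
    paper's partial-monitoring algorithm)."""
    regret = np.asarray(regret, dtype=float)
    info = np.asarray(info, dtype=float)
    k = len(regret)
    best_ratio, best_support = np.inf, None
    for i in range(k):
        for j in range(k):
            # play action i with probability p, action j with 1 - p
            for p in np.linspace(0.0, 1.0, 101):
                d = p * regret[i] + (1 - p) * regret[j]   # expected regret
                g = p * info[i] + (1 - p) * info[j]       # expected info gain
                ratio = np.inf if g <= 0 else d * d / g
                if ratio < best_ratio:
                    best_ratio, best_support = ratio, (i, j, p)
    i, j, p = best_support
    dist = np.zeros(k)
    dist[i] += p
    dist[j] += 1 - p
    return dist, best_ratio
```

Note that when one action has zero estimated regret and positive information gain, the ratio is zero and IDS plays it deterministically; randomization only appears when regret and information gain trade off across actions.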
Related papers
- Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints [8.784438985280094]
We study problems as pure exploration in multi-armed bandits with unknown linear constraints.
First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints.
Second, we leverage the Lagrangian lower bound and convexity properties to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX.
arXiv Detail & Related papers (2024-10-24T15:26:14Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration
of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have well-known limitations: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Linear Partial Monitoring for Sequential Decision-Making: Algorithms,
Regret Bounds and Applications [70.67112733968654]
Partial monitoring is an expressive framework for sequential decision-making.
We present a simple and unified analysis of partial monitoring, and further extend the model to the contextual and kernelized setting.
arXiv Detail & Related papers (2023-02-07T18:58:25Z) - Local SGD in Overparameterized Linear Regression [0.0]
We consider distributed learning using constant stepsize SGD (DSGD) over several devices, each sending a final model update to a central server.
We show that the excess risk is of order of the variance provided the number of local nodes grows not too large with the global sample size.
arXiv Detail & Related papers (2022-10-20T19:58:22Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - Information Directed Sampling for Sparse Linear Bandits [42.232086950768476]
We develop a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances.
Numerical results demonstrate significant regret reductions by sparse IDS relative to several baselines.
arXiv Detail & Related papers (2021-05-29T10:26:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.