Asymptotically Optimal Information-Directed Sampling
- URL: http://arxiv.org/abs/2011.05944v4
- Date: Fri, 2 Jul 2021 08:21:07 GMT
- Title: Asymptotically Optimal Information-Directed Sampling
- Authors: Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesv\'ari
- Abstract summary: We introduce a simple and efficient algorithm for linear bandits with finitely many actions that isally optimal and nearly worst-case optimal in finite time.
The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines lower bound.
- Score: 37.816004557470556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a simple and efficient algorithm for stochastic linear bandits
with finitely many actions that is asymptotically optimal and (nearly)
worst-case optimal in finite time. The approach is based on the frequentist
information-directed sampling (IDS) framework, with a surrogate for the
information gain that is informed by the optimization problem that defines the
asymptotic lower bound. Our analysis sheds light on how IDS balances the
trade-off between regret and information and uncovers a surprising connection
between the recently proposed primal-dual methods and the IDS algorithm. We
demonstrate empirically that IDS is competitive with UCB in finite-time, and
can be significantly better in the asymptotic regime.
Related papers
- Sparse Optimistic Information Directed Sampling [11.986224119327387]
We show that SOIDS can optimally balance information and regret.<n>We show that SOIDS simultaneously achieves optimal worst-case regret in both the data-rich and data-poor regimes.
arXiv Detail & Related papers (2025-10-28T09:42:15Z) - Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability.<n>We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z) - Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems.
Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new formulation termed Samevareps-generativeOGOG.
arXiv Detail & Related papers (2024-01-26T01:16:59Z) - Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and
Optimality [5.817158625734484]
Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning.
Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings.
arXiv Detail & Related papers (2023-07-13T17:29:01Z) - Reweighted Interacting Langevin Diffusions: an Accelerated Sampling
Methodfor Optimization [28.25662317591378]
We propose a new technique to accelerate sampling methods for solving difficult optimization problems.
Our method investigates the connection between posterior distribution sampling and optimization with Langevin dynamics.
arXiv Detail & Related papers (2023-01-30T03:48:20Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measures the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributional robust optimization AUC.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-robust optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to station yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Asymptotic Randomised Control with applications to bandits [0.0]
We consider a general multi-armed bandit problem with correlated elements as a relaxed control problem.
By introducing an entropy regularisation, we obtain a smooth approximation to the value function.
This yields a novel semi-index approximation of the optimal decision process.
arXiv Detail & Related papers (2020-10-14T17:17:48Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $ell_infty$error under a generative model.
We establish both and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.