Chained Information-Theoretic bounds and Tight Regret Rate for Linear
Bandit Problems
- URL: http://arxiv.org/abs/2403.03361v1
- Date: Tue, 5 Mar 2024 23:08:18 GMT
- Title: Chained Information-Theoretic bounds and Tight Regret Rate for Linear
Bandit Problems
- Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering,
Mikael Skoglund
- Abstract summary: We study the regret of a variant of the Thompson-Sampling algorithm for bandit problems.
Under a suitable continuity assumption on the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
- Score: 37.82763068378491
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the Bayesian regret of a variant of the Thompson-Sampling
algorithm for bandit problems. It builds upon the information-theoretic
framework of [Russo and Van Roy, 2015] and, more specifically, on the
rate-distortion analysis from [Dong and Van Roy, 2020], where they proved a
bound with regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear
bandit setting. We focus on bandit problems with a metric action space and,
using a chaining argument, we establish new bounds that depend on the metric
entropy of the action space for a variant of Thompson-Sampling.
Under a suitable continuity assumption on the rewards, our bound offers a tight
rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
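For orientation, chaining arguments of this kind classically control a supremum over a metric action space through covering numbers at dyadic scales, as in Dudley's entropy-integral bound. The display below is that standard template for a process $(X_a)_{a \in A}$ that is sub-Gaussian with respect to the metric $\rho$; it is background only, since the paper's actual bounds replace the entropy integral with information-theoretic (rate-distortion) quantities.

```latex
% Classical Dudley-type chaining bound (background template, not the
% paper's statement); N(A, \rho, \varepsilon) is the \varepsilon-covering
% number of the action space (A, \rho), whose logarithm is the metric entropy.
\mathbb{E}\Big[\sup_{a \in A} X_a\Big]
  \le C \int_0^{\operatorname{diam}(A)}
        \sqrt{\log N(A, \rho, \varepsilon)}\,\mathrm{d}\varepsilon
```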
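As further background, the sketch below implements standard Gaussian Thompson Sampling for a $d$-dimensional linear bandit with a finite action set, the baseline that the paper's variant refines via chaining over a metric action space. The prior, noise model, action set, and all hyperparameters here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Minimal sketch of standard Gaussian Thompson Sampling for a linear bandit.
# Illustrative only: the paper analyzes a chained variant of this algorithm.
rng = np.random.default_rng(0)
d, T, noise_sd = 5, 1000, 0.1

theta_star = rng.normal(size=d)              # unknown true parameter
actions = rng.normal(size=(100, d))          # finite action set A subset of R^d
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
best_mean = (actions @ theta_star).max()     # optimal expected reward

# Gaussian prior N(0, I); with Gaussian noise the posterior stays Gaussian.
precision = np.eye(d)                        # posterior precision Sigma^{-1}
b = np.zeros(d)                              # accumulates Sigma^{-1} mu

regret = 0.0
for t in range(T):
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2                  # symmetrize for numerical safety
    mean = cov @ b
    theta_sample = rng.multivariate_normal(mean, cov)   # posterior sample
    a = actions[np.argmax(actions @ theta_sample)]      # greedy on the sample
    r = a @ theta_star + noise_sd * rng.normal()        # noisy linear reward
    precision += np.outer(a, a) / noise_sd**2           # rank-one update
    b += a * r / noise_sd**2
    regret += best_mean - a @ theta_star

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

Because the prior and noise are conjugate Gaussians, the posterior update is an exact rank-one precision update rather than a generic posterior approximation.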
Related papers
- Bandit Optimal Transport [1.0878040851638]
This article considers the bandit problem of learning to solve generic Kantorovich and entropic OT problems from repeated interactions.
We provide $\tilde{\mathcal{O}}(\sqrt{T})$ regret algorithms for both problems by extending linear bandits on Hilbert spaces.
arXiv Detail & Related papers (2025-02-11T09:24:25Z) - Feel-Good Thompson Sampling for Contextual Dueling Bandits [49.450050682705026]
We propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits.
Our algorithm achieves nearly minimax-optimal regret, i.e., $\tilde{\mathcal{O}}(d\sqrt{T})$, where $d$ is the model dimension and $T$ is the time horizon.
arXiv Detail & Related papers (2024-04-09T04:45:18Z) - Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear
Bandit Algorithms [39.70492757288025]
We address the contextual linear bandit problem, where a decision maker is provided a context.
We show that the contextual problem can be solved as a linear bandit problem.
Our results imply an $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits.
arXiv Detail & Related papers (2022-11-08T22:18:53Z) - Squeeze All: Novel Estimator and Self-Normalized Bound for Linear
Contextual Bandits [18.971564419292893]
We propose a linear contextual bandit algorithm with an $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon.
Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization.
arXiv Detail & Related papers (2022-06-11T02:43:17Z) - Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret
in Stochastic Contextual Linear Bandits [10.127456032874978]
We prove an instance-dependent (poly)logarithmic regret bound for contextual bandits with linear payoffs.
Contexts indeed help to reduce the regret from $\sqrt{T}$ to $\mathrm{polylog}(T)$.
arXiv Detail & Related papers (2022-05-19T23:41:46Z) - Complete Policy Regret Bounds for Tallying Bandits [51.039677652803675]
Policy regret is a well-established notion for measuring the performance of an online learning algorithm against an adaptive adversary.
We study restrictions on the adversary that enable efficient minimization of the complete policy regret.
We provide an algorithm that achieves, w.h.p., a complete policy regret guarantee of $\tilde{\mathcal{O}}(mK\sqrt{T})$, where the $\tilde{\mathcal{O}}$ notation hides only logarithmic factors.
arXiv Detail & Related papers (2022-04-24T03:10:27Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$-dimensional linear representation with $k \ll d$.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)