Chained Information-Theoretic bounds and Tight Regret Rate for Linear
Bandit Problems
- URL: http://arxiv.org/abs/2403.03361v1
- Date: Tue, 5 Mar 2024 23:08:18 GMT
- Title: Chained Information-Theoretic bounds and Tight Regret Rate for Linear
Bandit Problems
- Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering,
Mikael Skoglund
- Abstract summary: We study the regret of a variant of the Thompson-Sampling algorithm for bandit problems.
Under a suitable continuity assumption on the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
- Score: 37.82763068378491
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the Bayesian regret of a variant of the Thompson-Sampling
algorithm for bandit problems. It builds upon the information-theoretic
framework of [Russo and Van Roy, 2015] and, more specifically, on the
rate-distortion analysis from [Dong and Van Roy, 2020], where they proved a
bound with regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear
bandit setting. We focus on bandit problems with a metric action space and,
using a chaining argument, we establish new bounds that depend on the metric
entropy of the action space for a variant of Thompson-Sampling.
Under a suitable continuity assumption on the rewards, our bound offers a tight
rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
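For orientation, chaining arguments of this kind classically control a supremum over a metric action space through covering numbers at dyadic scales, as in Dudley's entropy-integral bound. The display below is that standard template for a process $(X_a)_{a \in A}$ that is sub-Gaussian with respect to the metric $\rho$; it is background only, since the paper's actual bounds replace the entropy integral with information-theoretic (rate-distortion) quantities.

```latex
% Classical Dudley-type chaining bound (background template, not the
% paper's statement); N(A, \rho, \varepsilon) is the \varepsilon-covering
% number of the action space (A, \rho), whose logarithm is the metric entropy.
\mathbb{E}\Big[\sup_{a \in A} X_a\Big]
  \le C \int_0^{\operatorname{diam}(A)}
        \sqrt{\log N(A, \rho, \varepsilon)}\,\mathrm{d}\varepsilon
```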
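As further background, the sketch below implements standard Gaussian Thompson Sampling for a $d$-dimensional linear bandit with a finite action set, the baseline that the paper's variant refines via chaining over a metric action space. The prior, noise model, action set, and all hyperparameters here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Minimal sketch of standard Gaussian Thompson Sampling for a linear bandit.
# Illustrative only: the paper analyzes a chained variant of this algorithm.
rng = np.random.default_rng(0)
d, T, noise_sd = 5, 1000, 0.1

theta_star = rng.normal(size=d)              # unknown true parameter
actions = rng.normal(size=(100, d))          # finite action set A subset of R^d
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
best_mean = (actions @ theta_star).max()     # optimal expected reward

# Gaussian prior N(0, I); with Gaussian noise the posterior stays Gaussian.
precision = np.eye(d)                        # posterior precision Sigma^{-1}
b = np.zeros(d)                              # accumulates Sigma^{-1} mu

regret = 0.0
for t in range(T):
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2                  # symmetrize for numerical safety
    mean = cov @ b
    theta_sample = rng.multivariate_normal(mean, cov)   # posterior sample
    a = actions[np.argmax(actions @ theta_sample)]      # greedy on the sample
    r = a @ theta_star + noise_sd * rng.normal()        # noisy linear reward
    precision += np.outer(a, a) / noise_sd**2           # rank-one update
    b += a * r / noise_sd**2
    regret += best_mean - a @ theta_star

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

Because the prior and noise are conjugate Gaussians, the posterior update is an exact rank-one precision update rather than a generic posterior approximation.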
Related papers
- Bandit Optimal Transport [1.0878040851638]
This article considers the bandit problem of learning to solve generic Kantorovich and entropic OT problems from repeated interactions.
We provide $\tilde{\mathcal{O}}(\sqrt{T})$ regret algorithms for both problems by extending linear bandits on Hilbert spaces.
arXiv Detail & Related papers (2025-02-11T09:24:25Z) - Feel-Good Thompson Sampling for Contextual Dueling Bandits [49.450050682705026]
We propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits.
Our algorithm achieves nearly minimax-optimal regret, i.e., $\tilde{\mathcal{O}}(d\sqrt{T})$, where $d$ is the model dimension and $T$ is the time horizon.
arXiv Detail & Related papers (2024-04-09T04:45:18Z) - Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear
Bandit Algorithms [39.70492757288025]
We address the contextual linear bandit problem, where a decision maker is provided a context.
We show that the contextual problem can be solved as a linear bandit problem.
Our results imply an $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits.
arXiv Detail & Related papers (2022-11-08T22:18:53Z) - Squeeze All: Novel Estimator and Self-Normalized Bound for Linear
Contextual Bandits [18.971564419292893]
We propose a linear contextual bandit algorithm with an $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon.
Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization.
arXiv Detail & Related papers (2022-06-11T02:43:17Z) - Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret
in Stochastic Contextual Linear Bandits [10.127456032874978]
We prove an instance-dependent (poly)logarithmic regret bound for contextual bandits with linear payoffs.
Contexts indeed help to reduce the regret from $\sqrt{T}$ to $\mathrm{polylog}(T)$.
arXiv Detail & Related papers (2022-05-19T23:41:46Z) - Complete Policy Regret Bounds for Tallying Bandits [51.039677652803675]
Policy regret is a well-established notion for measuring the performance of an online learning algorithm against an adaptive adversary.
We study restrictions on the adversary that enable efficient minimization of the complete policy regret.
We provide an algorithm that achieves, w.h.p., a complete policy regret guarantee of $\tilde{\mathcal{O}}(mK\sqrt{T})$, where the $\tilde{\mathcal{O}}$ notation hides only logarithmic factors.
arXiv Detail & Related papers (2022-04-24T03:10:27Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$-dimensional linear representation with $k \ll d$.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)