Nearly Minimax Algorithms for Linear Bandits with Shared Representation
- URL: http://arxiv.org/abs/2203.15664v1
- Date: Tue, 29 Mar 2022 15:27:13 GMT
- Title: Nearly Minimax Algorithms for Linear Bandits with Shared Representation
- Authors: Jiaqi Yang, Qi Lei, Jason D. Lee, Simon S. Du
- Abstract summary: We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k (\ll d)$ dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
- Score: 86.79657561369397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give novel algorithms for multi-task and lifelong linear bandits with
shared representation. Specifically, we consider the setting where we play $M$
linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit
tasks share a common $k(\ll d)$ dimensional linear representation. For both the
multi-task setting where we play the tasks concurrently, and the lifelong
setting where we play tasks sequentially, we come up with novel algorithms that
achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds,
which match the known minimax regret lower bound up to logarithmic factors
and close the gap in existing results [Yang et al., 2021]. Our main techniques
include a more efficient estimator for the low-rank linear feature extractor
and an accompanying novel analysis for this estimator.
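To make the shared-representation model concrete, here is a minimal NumPy sketch. It is an illustration only: the sizes, the matrices `B_true`/`W_true`, and the SVD step are assumptions for exposition, not the paper's estimator. Each task parameter is $\theta_m = B w_m$ for a common $d \times k$ feature extractor $B$, and a naive way to recover $\mathrm{span}(B)$ is to take the top-$k$ left singular vectors of stacked single-task estimates.

```python
import numpy as np

# Minimal sketch of the shared-representation setting: M linear bandit tasks in
# dimension d whose parameters theta_m = B @ w_m share a d x k extractor B.
# Illustrative only -- this is NOT the estimator proposed in the paper.
rng = np.random.default_rng(0)
d, k, M, n = 20, 3, 10, 500          # n = samples per task (hypothetical sizes)

B_true, _ = np.linalg.qr(rng.standard_normal((d, k)))   # orthonormal d x k extractor
W_true = rng.standard_normal((k, M))                     # per-task coefficients
Theta_true = B_true @ W_true                             # d x M, rank <= k

# Per-task data: contexts X (n x d) and noisy rewards y = X @ theta_m + noise.
Theta_hat = np.zeros((d, M))
for m in range(M):
    X = rng.standard_normal((n, d))
    y = X @ Theta_true[:, m] + 0.1 * rng.standard_normal(n)
    # Plain per-task ridge estimate, ignoring the shared structure.
    Theta_hat[:, m] = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)

# Naive recovery of the shared subspace: top-k left singular vectors of the
# stacked per-task estimates (a baseline; the paper gives a more efficient
# estimator of this low-rank feature extractor).
U, _, _ = np.linalg.svd(Theta_hat, full_matrices=False)
B_hat = U[:, :k]

# Subspace error between span(B_hat) and span(B_true) via projection matrices.
err = np.linalg.norm(B_hat @ B_hat.T - B_true @ B_true.T, ord=2)
print(f"subspace estimation error: {err:.3f}")
```

The point of the sketch is only the model: once a good estimate of $B$ is available, each task reduces to a $k$-dimensional linear bandit, which is consistent with the $kM\sqrt{T}$ term in the regret bound.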
Related papers
- Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes.
We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$ under a continuity assumption and in general attains a near-optimal regret.
arXiv Detail & Related papers (2023-02-07T02:22:31Z) - Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference [23.364534479492715]
We show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$.
We apply our result to two practical scenarios: model selection and clustering in linear bandits.
arXiv Detail & Related papers (2022-07-23T20:25:07Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$, as long as certain conditions are satisfied.
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Near-optimal Representation Learning for Linear Bandits and Linear RL [41.33483293243257]
We first consider the setting where we play $M$ linear bandits with dimension $d$ concurrently.
These bandits share a common $k$-dimensional linear representation so that $k \ll d$ and $k \ll M$.
We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve $\tilde{O}(M\sqrt{dkT} + d\sqrt{kMT})$ regret.
arXiv Detail & Related papers (2021-02-08T11:11:53Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
In a low-rank linear bandit problem, the reward of an action is the inner product between the action and an unknown low-rank matrix $\Theta^*$.
We propose an algorithm based on a novel combination of online-to-confidence-set conversion [Abbasi-Yadkori et al., 2012] and the exponentially weighted average forecaster constructed by a covering of low-rank matrices.
arXiv Detail & Related papers (2020-06-04T15:30:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.