Near-optimal Representation Learning for Linear Bandits and Linear RL
- URL: http://arxiv.org/abs/2102.04132v1
- Date: Mon, 8 Feb 2021 11:11:53 GMT
- Title: Near-optimal Representation Learning for Linear Bandits and Linear RL
- Authors: Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang
- Abstract summary: We first consider the setting where we play $M$ linear bandits with dimension $d$ concurrently.
These bandits share a common $k$-dimensional linear representation so that $k \ll d$ and $k \ll M$.
We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve $\tilde{O}(M\sqrt{dkT} + d\sqrt{kMT})$ regret.
- Score: 41.33483293243257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies representation learning for multi-task linear bandits and
multi-task episodic RL with linear value function approximation. We first
consider the setting where we play $M$ linear bandits with dimension $d$
concurrently, and these bandits share a common $k$-dimensional linear
representation so that $k\ll d$ and $k \ll M$. We propose a sample-efficient
algorithm, MTLR-OFUL, which leverages the shared representation to achieve
$\tilde{O}(M\sqrt{dkT} + d\sqrt{kMT} )$ regret, with $T$ being the number of
total steps. Our regret significantly improves upon the baseline
$\tilde{O}(Md\sqrt{T})$ achieved by solving each task independently. We further
develop a lower bound that shows our regret is near-optimal when $d > M$.
Furthermore, we extend the algorithm and analysis to multi-task episodic RL
with linear value function approximation under low inherent Bellman error
\citep{zanette2020learning}. To the best of our knowledge, this is the first
theoretical result that characterizes the benefits of multi-task representation
learning for exploration in RL with function approximation.
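To make the shared-representation setting concrete, below is a minimal NumPy sketch of the idea the abstract describes: $M$ linear bandit tasks whose parameters lie in a common $k$-dimensional subspace ($k \ll d$, $k \ll M$), with the learner alternating between estimating that subspace and acting optimistically inside it. This is an illustration only, not the paper's MTLR-OFUL algorithm; the SVD-based subspace estimate, the exploration coefficient, and all problem dimensions are assumptions made for the sketch.

```python
# Illustrative sketch (not the paper's MTLR-OFUL): M linear bandit tasks whose
# parameters theta_m = B w_m share an unknown k-dimensional column space
# B in R^{d x k}. Each round we (i) re-estimate the shared subspace from
# per-task ridge estimates via an SVD, then (ii) pick arms optimistically
# inside the estimated k-dimensional subspace.
import numpy as np

rng = np.random.default_rng(0)
d, k, M, T = 20, 3, 10, 2000       # ambient dim, shared dim, tasks, rounds
n_arms, lam, noise = 30, 1.0, 0.1

# Ground truth: orthonormal shared representation B and per-task weights w_m.
B_true, _ = np.linalg.qr(rng.normal(size=(d, k)))
Theta = B_true @ rng.normal(size=(k, M))          # d x M task parameters

arms = rng.normal(size=(n_arms, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)

A = np.stack([lam * np.eye(d)] * M)               # per-task Gram matrices
b = np.zeros((M, d))                              # per-task response vectors
regret = 0.0

for t in range(T):
    # (i) Estimate the shared subspace: top-k left singular vectors of the
    # stacked per-task ridge estimates.
    theta_hat = np.stack([np.linalg.solve(A[m], b[m]) for m in range(M)], axis=1)
    B_hat = np.linalg.svd(theta_hat, full_matrices=False)[0][:, :k]

    for m in range(M):
        # (ii) UCB-style choice in the k-dimensional projected space.
        Z = arms @ B_hat                          # n_arms x k projected arms
        A_low = B_hat.T @ A[m] @ B_hat
        w_hat = np.linalg.solve(A_low, B_hat.T @ b[m])
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', Z, np.linalg.inv(A_low), Z))
        a = int(np.argmax(Z @ w_hat + 0.5 * np.log(t + 2) * bonus))  # heuristic coefficient

        x = arms[a]
        reward = x @ Theta[:, m] + noise * rng.normal()
        A[m] += np.outer(x, x)
        b[m] += reward * x
        regret += np.max(arms @ Theta[:, m]) - x @ Theta[:, m]

print(f"total regret across {M} tasks after {T} rounds: {regret:.1f}")
```

The point of the sketch is the dimension reduction: once the shared subspace is (approximately) known, each task's confidence set lives in $k$ rather than $d$ dimensions, which is the source of the improvement over running $M$ independent per-task learners.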
Related papers
- Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z) - Joint Representation Training in Sequential Tasks with Shared Structure [40.1056491921582]
We introduce the Shared-MatrixRL algorithm for the setting of Multitask MatrixRL.
We show the regret on the $P$ tasks can be improved from $O(PHd\sqrt{NH})$ to $O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$ over $N$ episodes of horizon $H$.
arXiv Detail & Related papers (2022-06-24T18:10:00Z) - Nearly Minimax Optimal Reinforcement Learning with Linear Function
Approximation [25.60689712525918]
We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear.
We propose a novel, efficient algorithm, LSVI-UCB$^{+}$, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound, where $H$ is the episode length, $d$ is the feature dimension, and $T$ is the number of steps.
arXiv Detail & Related papers (2022-06-23T06:04:21Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$-dimensional ($k \ll d$) linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Nearly Minimax Optimal Regret for Learning Infinite-horizon
Average-reward MDPs with Linear Function Approximation [95.80683238546499]
We propose a new algorithm UCRL2-VTR, which can be seen as an extension of the UCRL2 algorithm with linear function approximation.
We show that UCRL2-VTR with a Bernstein-type bonus can achieve a regret of $\tilde{O}(d\sqrt{DT})$, where $d$ is the dimension of the feature mapping.
We also prove a matching lower bound $\tilde{\Omega}(d\sqrt{DT})$, which suggests that the proposed UCRL2-VTR is minimax optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-02-15T02:08:39Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Logarithmic Regret for Reinforcement Learning with Linear Function
Approximation [99.59319332864129]
We show that logarithmic regret is attainable under two recently proposed linear MDP assumptions.
To the best of our knowledge, these are the first logarithmic regret bounds for RL with linear function approximation.
arXiv Detail & Related papers (2020-11-23T17:25:00Z)