Scalable Representation Learning in Linear Contextual Bandits with
Constant Regret Guarantees
- URL: http://arxiv.org/abs/2210.13083v1
- Date: Mon, 24 Oct 2022 10:04:54 GMT
- Title: Scalable Representation Learning in Linear Contextual Bandits with
Constant Regret Guarantees
- Authors: Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric,
Matteo Pirotta
- Abstract summary: We propose BanditSRL, a representation learning algorithm that learns a realizable representation with good spectral properties.
We prove that BanditSRL can be paired with any no-regret algorithm and achieve constant regret whenever an HLS representation is available.
- Score: 103.69464492445779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of representation learning in stochastic contextual
linear bandits. While the primary concern in this domain is usually to find
realizable representations (i.e., those that allow predicting the reward
function at any context-action pair exactly), it has been recently shown that
representations with certain spectral properties (called HLS) may be more
effective for the exploration-exploitation task, enabling LinUCB to achieve
constant (i.e., horizon-independent) regret. In this paper, we propose
BanditSRL, a representation learning algorithm that combines a novel
constrained optimization problem, which learns a realizable representation
with good spectral properties, with a generalized likelihood ratio test that
exploits the recovered representation and avoids excessive exploration. We prove that
BanditSRL can be paired with any no-regret algorithm and achieve constant
regret whenever an HLS representation is available. Furthermore, BanditSRL can
be easily combined with deep neural networks and we show how regularizing
towards HLS representations is beneficial in standard benchmarks.
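As a rough illustration of the two components described in the abstract (a realizable representation with good spectral properties, and a GLR-style test that curbs exploration), the sketch below is a minimal toy version, not the paper's algorithm: the constrained optimization is replaced by selection among two hand-written candidate feature maps, the generalized likelihood ratio test by a simple gap-versus-confidence-width check, and LinUCB serves as the base no-regret algorithm. All names, constants, and thresholds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, NOISE = 4, 3000, 0.1                       # arms, horizon, reward noise std

# Toy environment: 2-dim contexts, rewards linearly realizable under phi_a.
theta_star = np.array([0.8, -0.4, 0.5])

def phi_a(x, a):                                 # candidate map A (realizable)
    return np.array([x[0] * (a + 1) / K, x[1], 1.0 if a == 0 else 0.0])

def phi_b(x, a):                                 # candidate map B (misses one feature)
    return np.array([x[0] * (a + 1) / K, x[1], 0.0])

def reward(x, a):
    return phi_a(x, a) @ theta_star + NOISE * rng.standard_normal()

CANDIDATES, d = [phi_a, phi_b], 3

def select_representation(X, A, R, slack=0.02):
    """Crude stand-in for the constrained optimization: keep candidates whose
    regression error is close to the best one (realizability constraint), then
    maximize the smallest eigenvalue of the greedy-action design matrix
    (a rough HLS-style spectral score)."""
    stats = []
    for phi in CANDIDATES:
        F = np.stack([phi(x, a) for x, a in zip(X, A)])
        theta, *_ = np.linalg.lstsq(F, R, rcond=None)
        mse = float(np.mean((F @ theta - R) ** 2))
        greedy = [int(np.argmax([phi(x, b) @ theta for b in range(K)])) for x in X]
        G = np.stack([phi(x, a) for x, a in zip(X, greedy)])
        min_eig = float(np.linalg.eigvalsh(G.T @ G / len(X)).min())
        stats.append((mse, min_eig, phi))
    best_mse = min(s[0] for s in stats)
    feasible = [s for s in stats if s[0] <= best_mse + slack]
    return max(feasible, key=lambda s: s[1])[2]

phi = CANDIDATES[1]                              # start from the weaker representation
V, b_vec = np.eye(d), np.zeros(d)                # ridge design matrix and response vector
X_hist, A_hist, R_hist = [], [], []
for t in range(1, T + 1):
    x = rng.uniform(-1.0, 1.0, size=2)           # fresh stochastic context
    feats = np.stack([phi(x, a) for a in range(K)])
    theta_hat = np.linalg.solve(V, b_vec)
    means = feats @ theta_hat
    widths = np.sqrt(np.einsum('ad,de,ae->a', feats, np.linalg.inv(V), feats))
    beta = np.sqrt(np.log(1.0 + t))              # made-up confidence radius
    best, second = np.argsort(means)[::-1][:2]
    # GLR-style check: exploit greedily once the estimated gap dominates the uncertainty.
    if means[best] - means[second] > beta * (widths[best] + widths[second]):
        a = int(best)
    else:
        a = int(np.argmax(means + beta * widths))    # otherwise act as plain LinUCB
    r = reward(x, a)
    V += np.outer(feats[a], feats[a]); b_vec += r * feats[a]
    X_hist.append(x); A_hist.append(a); R_hist.append(r)
    if t % 500 == 0:                             # periodically re-select the representation
        phi = select_representation(X_hist, A_hist, np.array(R_hist))
        F = np.stack([phi(xx, aa) for xx, aa in zip(X_hist, A_hist)])
        V, b_vec = np.eye(d) + F.T @ F, F.T @ np.array(R_hist)
```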
Related papers
- iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning [24.684363928059113]
We propose an efficient representation learning method using only a self-supervised latent-state consistency loss.
We achieve high performance and prevent representation collapse by quantizing the latent representation.
Our method, named iQRL (implicitly Quantized Reinforcement Learning), is straightforward and compatible with any model-free RL algorithm.
arXiv Detail & Related papers (2024-06-04T18:15:44Z)
- On the Complexity of Representation Learning in Contextual Linear Bandits [110.84649234726442]
We show that representation learning is fundamentally more complex than linear bandits.
In particular, learning with a given set of representations is never simpler than learning with the worst realizable representation in the set.
arXiv Detail & Related papers (2022-12-19T13:08:58Z)
- Improving Self-Supervised Learning by Characterizing Idealized Representations [155.1457170539049]
We prove necessary and sufficient conditions characterizing idealized representations for any task invariant to the given data augmentations.
For contrastive learning, our framework prescribes simple but significant improvements to previous methods.
For non-contrastive learning, we use our framework to derive a simple and novel objective.
arXiv Detail & Related papers (2022-09-13T18:01:03Z)
- Light-weight probing of unsupervised representations for Reinforcement Learning [20.638410483549706]
We study whether linear probing can be a proxy evaluation task for the quality of unsupervised RL representations.
We show that probing performance is strongly rank-correlated with downstream RL performance on the Atari100k benchmark.
This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes.
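As a generic illustration of the probing-as-proxy idea summarized above (not the paper's encoders, probe tasks, or evaluation protocol), the sketch below scores several frozen encoders with a linear probe and checks how those scores rank-correlate with downstream returns; the encoders, labels, and returns are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def probe_score(features, labels, lam=1e-2):
    """Accuracy of a ridge-regression linear probe fit on frozen features."""
    Y = np.eye(labels.max() + 1)[labels]                    # one-hot targets
    W = np.linalg.solve(features.T @ features + lam * np.eye(features.shape[1]),
                        features.T @ Y)
    return float(np.mean((features @ W).argmax(1) == labels))

def spearman(a, b):
    """Spearman rank correlation (no tie handling; enough for a sketch)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))

# Toy stand-ins: 5 "pretrained encoders" as random projections of raw inputs.
X_raw = rng.normal(size=(512, 32))                          # probe-dataset inputs
labels = rng.integers(0, 4, size=512)                       # hypothetical probe labels
encoders = [rng.normal(size=(32, 16)) for _ in range(5)]
probe_scores = [probe_score(X_raw @ E, labels) for E in encoders]
rl_returns = rng.normal(size=5)                             # placeholder downstream returns
print("rank correlation:", spearman(np.array(probe_scores), rl_returns))
```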
arXiv Detail & Related papers (2022-08-25T21:08:01Z)
- Context Uncertainty in Contextual Bandits with Applications to Recommender Systems [16.597836265345634]
We propose a new type of recurrent neural networks, dubbed recurrent exploration networks (REN), to jointly perform representation learning and effective exploration in the latent space.
Our theoretical analysis shows that REN can preserve the rate-optimal sublinear regret even when there is uncertainty in the learned representations.
Our empirical study demonstrates that REN can achieve satisfactory long-term rewards on both synthetic and real-world recommendation datasets, outperforming state-of-the-art models.
arXiv Detail & Related papers (2022-02-01T23:23:50Z)
- Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning [3.308743964406687]
$k$-Step Latent (KSL) is a representation learning method that enforces temporal consistency of representations.
KSL produces encoders that generalize better to new tasks unseen during training.
arXiv Detail & Related papers (2021-10-11T00:16:43Z)
- Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL [84.14947307790361]
We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline reinforcement learning.
We show that the online version of ReLEX, called ReLEX-UCB, always performs no worse than the state-of-the-art algorithm without representation selection.
For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space.
arXiv Detail & Related papers (2021-06-22T17:16:50Z)
- How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
We further establish a separation result underscoring the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z)
- Leveraging Good Representations in Linear Contextual Bandits [131.91060536108301]
A contextual bandit problem may admit multiple linear representations.
Recent works showed that there exist "good" representations for which constant problem-dependent regret can be achieved.
We propose a selection algorithm whose regret is never worse than the regret obtained by running LinUCB on the best representation.
arXiv Detail & Related papers (2021-04-08T14:05:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.