Provably Efficient Representation Selection in Low-rank Markov Decision
Processes: From Online to Offline RL
- URL: http://arxiv.org/abs/2106.11935v2
- Date: Wed, 14 Feb 2024 07:05:06 GMT
- Title: Provably Efficient Representation Selection in Low-rank Markov Decision
Processes: From Online to Offline RL
- Authors: Weitong Zhang and Jiafan He and Dongruo Zhou and Amy Zhang and
Quanquan Gu
- Abstract summary: We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline reinforcement learning.
We show that the online version of ReLEX, called ReLEX-UCB, always performs no worse than the state-of-the-art algorithm without representation selection.
For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space.
- Score: 84.14947307790361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep reinforcement learning (DRL) lies in its ability to learn
a representation that is well-suited for the exploration and exploitation task.
To understand how the choice of representation can improve the efficiency of
reinforcement learning (RL), we study representation selection for a class of
low-rank Markov Decision Processes (MDPs) where the transition kernel can be
represented in a bilinear form. We propose an efficient algorithm, called
ReLEX, for representation learning in both online and offline RL. Specifically,
we show that the online version of ReLEX, called ReLEX-UCB, always performs no
worse than the state-of-the-art algorithm without representation selection, and
achieves a strictly better constant regret if the representation function class
has a "coverage" property over the entire state-action space. For the offline
counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy
if the representation class can cover the state-action space and achieves
gap-dependent sample complexity. This is the first result with constant sample
complexity for representation learning in offline RL.
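The abstract describes low-rank MDPs whose transition kernel admits a bilinear form, and a UCB/LCB pair of algorithms built on a selected representation. Below is a minimal sketch of the standard low-rank factorization and the generic optimistic/pessimistic value estimate such methods use; all notation ($\phi$, $\mu$, $\Phi$, $\beta$, $\Lambda_i$) is assumed here for illustration and may differ from the paper's.
```latex
% Standard low-rank MDP factorization: the transition kernel is bilinear in a
% feature map \phi over state-action pairs and an embedding \mu over next states.
\mathbb{P}(s' \mid s, a) \;=\; \langle \phi(s, a), \mu(s') \rangle,
\qquad \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d, \quad
\mu : \mathcal{S} \to \mathbb{R}^d.

% Representation selection: given a candidate class \Phi = \{\phi^{(1)}, \dots, \phi^{(M)}\},
% UCB/LCB-style methods build value estimates of the generic form
\widehat{Q}(s, a) \;=\; \widehat{w}^\top \phi^{(i)}(s, a)
\;\pm\; \beta \sqrt{\phi^{(i)}(s, a)^\top \Lambda_i^{-1}\, \phi^{(i)}(s, a)},
% with "+" for optimism (online, ReLEX-UCB) and "-" for pessimism (offline, ReLEX-LCB);
% \beta and \Lambda_i are illustrative, not the paper's constants.
```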
Related papers
- Offline Multitask Representation Learning for Reinforcement Learning [86.26066704016056]
We study offline multitask representation learning in reinforcement learning (RL).
We propose a new algorithm called MORL for offline multitask representation learning.
Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.
arXiv Detail & Related papers (2024-03-18T08:50:30Z)
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns the optimal source placement in large-scale networks online.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
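The Contrastive UCB entry above combines a contrastive loss with UCB-type online exploration. The sketch below shows a generic InfoNCE-style loss over transition pairs; the function name, shapes, and exact form of the loss are illustrative assumptions, not taken from that paper.
```python
import numpy as np

def info_nce_loss(anchor, positive, temperature=0.1):
    """Generic InfoNCE loss: each anchor embedding should score higher with
    its own positive than with the positives of the other pairs in the batch.

    anchor, positive: arrays of shape (batch, dim), e.g. embeddings of (s, a)
    and of the observed next state s'. Illustrative only; not the exact
    objective of Contrastive UCB.
    """
    # Normalize so that dot products are cosine similarities.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature          # (batch, batch) similarity matrix
    # Row i's positive is column i; all other columns act as negatives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy usage: random "embeddings" standing in for learned features.
rng = np.random.default_rng(0)
z_sa, z_next = rng.normal(size=(32, 16)), rng.normal(size=(32, 16))
print(info_nce_loss(z_sa, z_next))
```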
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning [46.11628795660159]
This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
arXiv Detail & Related papers (2022-06-13T04:29:02Z)
- Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning [3.308743964406687]
$k$-Step Latent (KSL) is a representation learning method that enforces temporal consistency of representations.
KSL produces encoders that generalize better to new tasks unseen during training.
arXiv Detail & Related papers (2021-10-11T00:16:43Z)
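The KSL entry above enforces temporal consistency of latent representations. Below is a generic sketch of such a consistency objective, matching latents predicted k steps ahead by a transition model to the encoder's latents of the states actually observed; the shapes and the cosine-similarity form are assumptions for illustration, not KSL's exact loss.
```python
import numpy as np

def k_step_consistency_loss(pred_latents, target_latents):
    """Generic temporal-consistency objective: latents predicted k steps
    ahead should match the latents the encoder assigns to the states
    actually observed k steps later. Illustrative only; not KSL's exact loss.

    Both inputs have shape (k, batch, dim).
    """
    p = pred_latents / np.linalg.norm(pred_latents, axis=-1, keepdims=True)
    t = target_latents / np.linalg.norm(target_latents, axis=-1, keepdims=True)
    # Negative cosine similarity, averaged over steps and batch.
    return -np.mean(np.sum(p * t, axis=-1))

# Toy usage with random latents standing in for model outputs.
rng = np.random.default_rng(0)
pred, target = rng.normal(size=(3, 32, 16)), rng.normal(size=(3, 32, 16))
print(k_step_consistency_loss(pred, target))
```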
- Representation Learning for Online and Offline RL in Low-rank MDPs [36.398511188102205]
We focus on the low-rank Markov Decision Processes (MDPs) where the transition dynamics correspond to a low-rank transition matrix.
For the online setting, operating with the same computational oracles used in FLAMBE, we propose REP-UCB (Upper Confidence Bound Representation learning for RL).
For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition.
arXiv Detail & Related papers (2021-10-09T22:04:34Z)
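Both the entry above and the main paper build optimism or pessimism from a learned representation, typically through an elliptical confidence width. The numpy sketch below shows that generic bonus; beta and the regularizer lam are arbitrary illustrative constants, not those used by REP-UCB or ReLEX.
```python
import numpy as np

def elliptical_bonus(phi, Phi_data, beta=1.0, lam=1.0):
    """Standard elliptical confidence width beta * sqrt(phi^T Lambda^{-1} phi),
    where Lambda is a regularized covariance of observed features.
    Illustrative constants; not those of REP-UCB or ReLEX.

    phi:      feature vector of the query (s, a), shape (d,)
    Phi_data: features of previously observed (s, a) pairs, shape (n, d)
    """
    d = phi.shape[0]
    Lambda = lam * np.eye(d) + Phi_data.T @ Phi_data
    return beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))

# Optimism (online/UCB) adds the bonus; pessimism (offline/LCB) subtracts it.
rng = np.random.default_rng(1)
phi, Phi_data, w_hat = rng.normal(size=8), rng.normal(size=(100, 8)), rng.normal(size=8)
b = elliptical_bonus(phi, Phi_data)
q_ucb, q_lcb = w_hat @ phi + b, w_hat @ phi - b
print(q_ucb, q_lcb)
```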
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Graph-based State Representation for Deep Reinforcement Learning [1.5901689240516976]
We exploit the fact that the underlying Markov decision process (MDP) represents a graph, which enables us to incorporate the topological information for effective state representation learning.
Motivated by the recent success of node representations for several graph analytical tasks, we specifically investigate the capability of node representation learning methods to effectively encode the topology of the underlying MDP in deep RL.
We find that all embedding methods outperform the commonly used matrix representation of grid-world environments in all of the studied cases.
arXiv Detail & Related papers (2020-04-29T05:43:15Z)