Provably Efficient Offline Goal-Conditioned Reinforcement Learning with
General Function Approximation and Single-Policy Concentrability
- URL: http://arxiv.org/abs/2302.03770v2
- Date: Wed, 11 Oct 2023 21:42:30 GMT
- Title: Provably Efficient Offline Goal-Conditioned Reinforcement Learning with
General Function Approximation and Single-Policy Concentrability
- Authors: Hanlin Zhu, Amy Zhang
- Abstract summary: Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills that aim to reach diverse goals.
Offline GCRL requires only purely pre-collected datasets to perform training tasks, without additional interaction with the environment.
We show that a modified offline GCRL algorithm is both provably efficient with general function approximation and single-policy concentrability.
- Score: 11.786486763236104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-conditioned reinforcement learning (GCRL) refers to learning
general-purpose skills that aim to reach diverse goals. In particular, offline
GCRL only requires purely pre-collected datasets to perform training tasks
without additional interactions with the environment. Although offline GCRL has
become increasingly prevalent and many previous works have demonstrated its
empirical success, the theoretical understanding of efficient offline GCRL
algorithms is not well established, especially when the state space is huge and
the offline dataset only covers the policy we aim to learn. In this paper, we
provide a rigorous theoretical analysis of an existing empirically successful
offline GCRL algorithm. We prove that under slight modification, this algorithm
enjoys an $\widetilde{O}(\text{poly}(1/\epsilon))$ sample complexity (where
$\epsilon$ is the desired suboptimality of the learned policy) with general
function approximation thanks to the property of (semi-)strong convexity of the
objective functions. We only require nearly minimal assumptions on the dataset
(single-policy concentrability) and the function class (realizability).
Moreover, this algorithm consists of two uninterleaved optimization steps,
which we refer to as $V$-learning and policy learning, and is computationally
stable since it does not involve minimax optimization. We also empirically
validate our theory by showing that the modified algorithm outperforms the
previous algorithm in various real-world environments. To the best of our
knowledge, this is the first algorithm that is both provably efficient with
general function approximation and single-policy concentrability, and
empirically successful without requiring solving minimax optimization problems.
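To make the two-stage structure concrete, the following is a toy, self-contained sketch (Python/NumPy, on synthetic data) of an uninterleaved V-learning-then-policy-learning pipeline. The linear function class, the bootstrapped regression loss, and the advantage-weighted classification step are illustrative assumptions on our part, not the paper's actual objectives or code; the point is only that each stage is a single ordinary (non-minimax) optimization over the offline dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline goal-conditioned dataset (synthetic, for illustration only).
N, d_s, d_g, n_actions, gamma = 2000, 4, 2, 3, 0.9
S = rng.normal(size=(N, d_s))                 # states
G = rng.normal(size=(N, d_g))                 # goals
A = rng.integers(0, n_actions, size=N)        # actions from a behavior policy
S2 = S + 0.1 * rng.normal(size=(N, d_s))      # next states
R = -np.linalg.norm(S2[:, :d_g] - G, axis=1)  # goal-reaching reward

def phi(S, G):
    """Joint state-goal features; a linear class stands in for a general one."""
    return np.hstack([S, G, np.ones((len(S), 1))])

# Stage 1: V-learning -- a regression-style fit of V(s, g); no inner adversary.
X, X2 = phi(S, G), phi(S2, G)
w = np.zeros(X.shape[1])
for _ in range(500):
    target = R + gamma * (X2 @ w)              # bootstrapped regression target
    w -= 0.5 * X.T @ (X @ w - target) / N      # gradient step on squared error

# Stage 2: policy learning -- advantage-weighted classification using the
# frozen V from stage 1; again a single ordinary optimization problem.
adv = R + gamma * (X2 @ w) - X @ w
weights = np.exp(np.clip(adv, -5.0, 5.0))
Theta = np.zeros((X.shape[1], n_actions))      # softmax policy parameters
onehot = np.eye(n_actions)[A]
for _ in range(500):
    logits = X @ Theta
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    Theta += 0.1 * X.T @ (weights[:, None] * (onehot - p)) / N
```

Because stage 2 never feeds back into stage 1, the procedure avoids alternating min-max updates, mirroring the computational-stability claim above.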
Related papers
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with $K$ baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement
Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, each conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
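One plausible reading of this iterated scheme (our paraphrase, not the authors' exact objective) is the sequence of regularized updates $\pi_{k+1} \in \arg\max_{\pi} \widehat{J}(\pi) - \alpha\, D(\pi \,\|\, \pi_k)$ with $\pi_0$ set to the behavior policy, where $\widehat{J}$ is an offline estimate of the return and $D$ a divergence-based behavior regularizer; because the reference $\pi_k$ is refreshed after every conservative update, the penalty anchors the learned policy to an increasingly good reference rather than to the raw behavior policy.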
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Importance Weighted Actor-Critic for Optimal Conservative Offline
Reinforcement Learning [23.222448307481073]
We propose a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.
Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm.
We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2023-01-30T07:53:53Z) - CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards
global optimality [5.0915256711576475]
This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework.
arXiv Detail & Related papers (2022-11-12T10:16:35Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Upside-Down Reinforcement Learning Can Diverge in Stochastic
Environments With Episodic Resets [4.126347193869613]
Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions.
Goal-Conditioned Supervised Learning (GCSL) optimizes a lower bound on goal-reaching performance.
This raises expectations that such algorithms may enjoy guaranteed convergence to the optimal policy in arbitrary environments.
arXiv Detail & Related papers (2022-05-13T12:43:25Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
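A minimal sketch of how such a two-policy rollout could be organized, assuming a gymnasium-style environment interface; the fixed switch point h and all function names below are illustrative assumptions rather than JSRL's actual API.

```python
def two_policy_rollout(env, guide_policy, learner_policy, h, horizon):
    """Roll out one episode: a prior (guide) policy acts for the first h steps
    to jump-start the episode, then the learning policy takes over."""
    s, _ = env.reset()
    trajectory = []
    for t in range(horizon):
        a = guide_policy(s) if t < h else learner_policy(s)
        s_next, r, terminated, truncated, _ = env.step(a)
        trajectory.append((s, a, r, s_next))
        s = s_next
        if terminated or truncated:
            break
    return trajectory
```

In a curriculum, h would typically be decreased as the learner improves, so the learning policy gradually takes over the whole episode.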
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Offline Reinforcement Learning: Fundamental Barriers for Value Function
Approximation [74.3002974673248]
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data.
Offline RL is becoming increasingly relevant in practice, because online data collection is poorly suited to safety-critical domains.
Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning.
arXiv Detail & Related papers (2021-11-21T23:22:37Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
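As a rough illustration of how an update schedule can require only logarithmically many optimization calls, here is a generic doubling-style trigger we are assuming for exposition (it is not the paper's sub-sampling procedure): the policy is re-solved only when the measured information gain of the collected data has doubled.

```python
def lazy_policy_updates(K, episode_info_gain, solve_policy):
    """Re-solve for the policy only when cumulative information gain doubles,
    so the number of expensive optimization calls grows logarithmically in the
    total gain rather than linearly in K. Both arguments are caller-supplied:
    episode_info_gain(k) scores episode k's data, solve_policy(k) refits the policy."""
    policy, total_gain, last_solve_gain, n_solves = None, 1.0, 0.0, 0
    for k in range(K):
        total_gain += episode_info_gain(k)   # information gain of episode k's data
        if policy is None or total_gain >= 2.0 * last_solve_gain:
            policy = solve_policy(k)         # expensive optimization call
            last_solve_gain, n_solves = total_gain, n_solves + 1
    return policy, n_solves
```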
arXiv Detail & Related papers (2021-06-14T07:36:25Z) - Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(\text{poly}(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms prior approaches inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z) - FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance
Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, for which two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)