Provably Efficient Offline Goal-Conditioned Reinforcement Learning with
General Function Approximation and Single-Policy Concentrability
- URL: http://arxiv.org/abs/2302.03770v2
- Date: Wed, 11 Oct 2023 21:42:30 GMT
- Title: Provably Efficient Offline Goal-Conditioned Reinforcement Learning with
General Function Approximation and Single-Policy Concentrability
- Authors: Hanlin Zhu, Amy Zhang
- Abstract summary: Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills that aim to reach diverse goals.
Offline GCRL requires only purely pre-collected datasets to perform training tasks, without additional interaction with the environment.
We show that a modified offline GCRL algorithm is both provably efficient with general function approximation and single-policy concentrability.
- Score: 11.786486763236104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-conditioned reinforcement learning (GCRL) refers to learning
general-purpose skills that aim to reach diverse goals. In particular, offline
GCRL only requires purely pre-collected datasets to perform training tasks
without additional interactions with the environment. Although offline GCRL has
become increasingly prevalent and many previous works have demonstrated its
empirical success, the theoretical understanding of efficient offline GCRL
algorithms is not well established, especially when the state space is huge and
the offline dataset only covers the policy we aim to learn. In this paper, we
provide a rigorous theoretical analysis of an existing empirically successful
offline GCRL algorithm. We prove that under slight modification, this algorithm
enjoys an $\widetilde{O}(\text{poly}(1/\epsilon))$ sample complexity (where
$\epsilon$ is the desired suboptimality of the learned policy) with general
function approximation thanks to the property of (semi-)strong convexity of the
objective functions. We only require nearly minimal assumptions on the dataset
(single-policy concentrability) and the function class (realizability).
Moreover, this algorithm consists of two uninterleaved optimization steps,
which we refer to as $V$-learning and policy learning, and is computationally
stable since it does not involve minimax optimization. We also empirically
validate our theory by showing that the modified algorithm outperforms the
previous algorithm in various real-world environments. To the best of our
knowledge, this is the first algorithm that is both provably efficient with
general function approximation and single-policy concentrability, and
empirically successful without requiring solving minimax optimization problems.
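To make the two-stage structure concrete, the following is a toy, self-contained sketch (Python/NumPy, on synthetic data) of an uninterleaved V-learning-then-policy-learning pipeline. The linear function class, the bootstrapped regression loss, and the advantage-weighted classification step are illustrative assumptions on our part, not the paper's actual objectives or code; the point is only that each stage is a single ordinary (non-minimax) optimization over the offline dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline goal-conditioned dataset (synthetic, for illustration only).
N, d_s, d_g, n_actions, gamma = 2000, 4, 2, 3, 0.9
S = rng.normal(size=(N, d_s))                 # states
G = rng.normal(size=(N, d_g))                 # goals
A = rng.integers(0, n_actions, size=N)        # actions from a behavior policy
S2 = S + 0.1 * rng.normal(size=(N, d_s))      # next states
R = -np.linalg.norm(S2[:, :d_g] - G, axis=1)  # goal-reaching reward

def phi(S, G):
    """Joint state-goal features; a linear class stands in for a general one."""
    return np.hstack([S, G, np.ones((len(S), 1))])

# Stage 1: V-learning -- a regression-style fit of V(s, g); no inner adversary.
X, X2 = phi(S, G), phi(S2, G)
w = np.zeros(X.shape[1])
for _ in range(500):
    target = R + gamma * (X2 @ w)              # bootstrapped regression target
    w -= 0.5 * X.T @ (X @ w - target) / N      # gradient step on squared error

# Stage 2: policy learning -- advantage-weighted classification using the
# frozen V from stage 1; again a single ordinary optimization problem.
adv = R + gamma * (X2 @ w) - X @ w
weights = np.exp(np.clip(adv, -5.0, 5.0))
Theta = np.zeros((X.shape[1], n_actions))      # softmax policy parameters
onehot = np.eye(n_actions)[A]
for _ in range(500):
    logits = X @ Theta
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    Theta += 0.1 * X.T @ (weights[:, None] * (onehot - p)) / N
```

Because stage 2 never feeds back into stage 1, the procedure avoids alternating min-max updates, mirroring the computational-stability claim above.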
Related papers
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with $K$ baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement
Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, each conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
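One plausible reading of this iterated scheme (our paraphrase, not the authors' exact objective) is the sequence of regularized updates $\pi_{k+1} \in \arg\max_{\pi} \widehat{J}(\pi) - \alpha\, D(\pi \,\|\, \pi_k)$ with $\pi_0$ set to the behavior policy, where $\widehat{J}$ is an offline estimate of the return and $D$ a divergence-based behavior regularizer; because the reference $\pi_k$ is refreshed after every conservative update, the penalty anchors the learned policy to an increasingly good reference rather than to the raw behavior policy.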
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Importance Weighted Actor-Critic for Optimal Conservative Offline
Reinforcement Learning [23.222448307481073]
We propose a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.
Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm.
We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2023-01-30T07:53:53Z) - CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards
global optimality [5.0915256711576475]
This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework.
arXiv Detail & Related papers (2022-11-12T10:16:35Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Upside-Down Reinforcement Learning Can Diverge in Stochastic
Environments With Episodic Resets [4.126347193869613]
Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions.
Goal-Conditioned Supervised Learning (GCSL) optimizes a lower bound on goal-reaching performance.
This raises expectations that such algorithms may enjoy guaranteed convergence to the optimal policy in arbitrary environments.
arXiv Detail & Related papers (2022-05-13T12:43:25Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
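A minimal sketch of how such a two-policy rollout could be organized, assuming a gymnasium-style environment interface; the fixed switch point h and all function names below are illustrative assumptions rather than JSRL's actual API.

```python
def two_policy_rollout(env, guide_policy, learner_policy, h, horizon):
    """Roll out one episode: a prior (guide) policy acts for the first h steps
    to jump-start the episode, then the learning policy takes over."""
    s, _ = env.reset()
    trajectory = []
    for t in range(horizon):
        a = guide_policy(s) if t < h else learner_policy(s)
        s_next, r, terminated, truncated, _ = env.step(a)
        trajectory.append((s, a, r, s_next))
        s = s_next
        if terminated or truncated:
            break
    return trajectory
```

In a curriculum, h would typically be decreased as the learner improves, so the learning policy gradually takes over the whole episode.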
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Offline Reinforcement Learning: Fundamental Barriers for Value Function
Approximation [74.3002974673248]
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data.
Offline RL is becoming increasingly relevant in practice, because online data collection is poorly suited to safety-critical domains.
Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning.
arXiv Detail & Related papers (2021-11-21T23:22:37Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
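As a rough illustration of how an update schedule can require only logarithmically many optimization calls, here is a generic doubling-style trigger we are assuming for exposition (it is not the paper's sub-sampling procedure): the policy is re-solved only when the measured information gain of the collected data has doubled.

```python
def lazy_policy_updates(K, episode_info_gain, solve_policy):
    """Re-solve for the policy only when cumulative information gain doubles,
    so the number of expensive optimization calls grows logarithmically in the
    total gain rather than linearly in K. Both arguments are caller-supplied:
    episode_info_gain(k) scores episode k's data, solve_policy(k) refits the policy."""
    policy, total_gain, last_solve_gain, n_solves = None, 1.0, 0.0, 0
    for k in range(K):
        total_gain += episode_info_gain(k)   # information gain of episode k's data
        if policy is None or total_gain >= 2.0 * last_solve_gain:
            policy = solve_policy(k)         # expensive optimization call
            last_solve_gain, n_solves = total_gain, n_solves + 1
    return policy, n_solves
```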
arXiv Detail & Related papers (2021-06-14T07:36:25Z) - Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(\text{poly}(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms prior approaches inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z) - FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance
Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, for which two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)