Learning from Sparse Offline Datasets via Conservative Density
Estimation
- URL: http://arxiv.org/abs/2401.08819v2
- Date: Mon, 11 Mar 2024 14:43:52 GMT
- Title: Learning from Sparse Offline Datasets via Conservative Density
Estimation
- Authors: Zhepeng Cen, Zuxin Liu, Zitong Wang, Yihang Yao, Henry Lam, Ding Zhao
- Abstract summary: We propose a novel training algorithm called Conservative Density Estimation (CDE).
CDE addresses out-of-distribution extrapolation by explicitly imposing constraints on the stationary distribution of state-action occupancy.
Our method achieves state-of-the-art performance on the D4RL benchmark.
- Score: 27.93418377019955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) offers a promising direction for learning
policies from pre-collected datasets without requiring further interactions
with the environment. However, existing methods struggle to handle
out-of-distribution (OOD) extrapolation errors, especially in sparse reward or
scarce data settings. In this paper, we propose a novel training algorithm
called Conservative Density Estimation (CDE), which addresses this challenge by
explicitly imposing constraints on the state-action occupancy stationary
distribution. CDE overcomes the limitations of existing approaches, such as the
stationary distribution correction method, by addressing the support mismatch
issue in marginal importance sampling. Our method achieves state-of-the-art
performance on the D4RL benchmark. Notably, CDE consistently outperforms
baselines in challenging tasks with sparse rewards or insufficient data,
demonstrating the advantages of our approach in addressing the extrapolation
error problem in offline RL.
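For readers unfamiliar with the marginal importance sampling mentioned in the abstract, the following is a minimal NumPy sketch of a MIS value estimate with a conservative cap on the stationary-distribution ratios, which is where the support-mismatch problem shows up. The function name, clipping threshold, and toy numbers are illustrative assumptions; this is not the CDE algorithm itself.

```python
import numpy as np

def mis_value_estimate(rewards, ratios, clip_max=10.0):
    """Marginal importance sampling (MIS) estimate of the target policy's
    average reward, reweighting dataset rewards by stationary-distribution ratios.

    ratios   : estimates of w(s, a) = d_pi(s, a) / d_data(s, a).
    clip_max : conservative cap; where the dataset barely covers (s, a),
               unclipped ratios can explode and dominate the estimate.
    """
    w = np.clip(np.asarray(ratios, dtype=float), 0.0, clip_max)
    w = w / (w.mean() + 1e-8)          # self-normalize the clipped weights
    return float(np.mean(w * np.asarray(rewards, dtype=float)))

# Toy data: one (s, a) pair is barely covered by the data, so its ratio is huge.
rewards = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
ratios = np.array([0.8, 1.1, 0.9, 1.2, 250.0])
print(mis_value_estimate(rewards, ratios))   # bounded, support-aware estimate
```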
Related papers
- Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning [11.0460569590737]
This paper presents advanced techniques for training diffusion policies for offline reinforcement learning (RL).
We show that the SDE used to sample from the diffusion policy has a solution that we can use to calculate the log probability of the policy, yielding an entropy regularizer that improves the exploration of offline datasets.
By combining the entropy-regularized diffusion policy with Q-ensembles in offline RL, our method achieves state-of-the-art performance on most tasks in D4RL benchmarks.
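A minimal sketch of the Q-ensemble plus entropy-regularization idea this summary describes, assuming generic target critics callable as q(obs, act) and a given log-probability for the next actions; it omits the diffusion policy itself, and all names and default values are illustrative.

```python
import torch

def entropy_regularized_ensemble_target(q_targets, rewards, next_obs,
                                         next_actions, next_log_probs,
                                         gamma=0.99, alpha=0.2):
    """TD target combining a pessimistic Q-ensemble with an entropy bonus.

    q_targets      : list of target critics, each callable as q(obs, act).
    next_log_probs : log pi(a'|s') for the sampled next actions; for a diffusion
                     policy these would come from the SDE-based log-probability
                     described above (treated here as given).
    """
    with torch.no_grad():
        qs = torch.stack([q(next_obs, next_actions) for q in q_targets], dim=0)
        q_min = qs.min(dim=0).values          # clipped (minimum) ensemble value
        soft_value = q_min - alpha * next_log_probs
        return rewards + gamma * soft_value
```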
arXiv Detail & Related papers (2024-02-06T15:34:30Z)
- Reasoning with Latent Diffusion in Offline Reinforcement Learning [11.349356866928547]
Offline reinforcement learning holds promise as a means to learn high-reward policies from a static dataset.
A key challenge in offline RL lies in effectively stitching together portions of suboptimal trajectories from the static dataset.
We propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills.
arXiv Detail & Related papers (2023-09-12T20:58:21Z)
- Offline Reinforcement Learning with Imbalanced Datasets [23.454333727200623]
A real-world offline reinforcement learning (RL) dataset is often imbalanced over the state space due to the challenge of exploration or safety considerations.
We show that typical offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective at extracting policies from imbalanced datasets.
Inspired by natural intelligence, we propose a novel offline RL method that utilizes the augmentation of CQL with a retrieval process to recall past related experiences.
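A toy sketch of the retrieval step described above, assuming Euclidean nearest neighbours over states; the distance metric, the value of k, and the function name are assumptions for illustration, not the paper's design.

```python
import numpy as np

def retrieve_related(dataset_states, query_states, k=8):
    """Indices of the k dataset states closest to each query state.

    A stand-in for the retrieval step: experiences similar to the current
    batch are recalled and can be appended to the batch before running a
    conservative update such as CQL.
    """
    # Pairwise squared Euclidean distances, shape (n_query, n_dataset).
    d2 = ((query_states[:, None, :] - dataset_states[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]
```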
arXiv Detail & Related papers (2023-07-06T03:22:19Z)
- Offline Imitation Learning with Suboptimal Demonstrations via Relaxed Distribution Matching [109.5084863685397]
Offline imitation learning (IL) promises the ability to learn performant policies from pre-collected demonstrations without interactions with the environment.
We present RelaxDICE, which employs an asymmetrically-relaxed f-divergence for explicit support regularization.
Our method significantly outperforms the best prior offline method in six standard continuous control environments.
arXiv Detail & Related papers (2023-03-05T03:35:11Z)
- Offline Reinforcement Learning with Adaptive Behavior Regularization [1.491109220586182]
Offline reinforcement learning (RL) defines a sample-efficient learning paradigm, where a policy is learned from static and previously collected datasets.
We propose a novel approach, which we refer to as adaptive behavior regularization (ABR).
ABR enables the policy to adaptively adjust its optimization objective between cloning and improving over the policy used to generate the dataset.
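A hedged sketch of how a policy loss can interpolate between cloning and improving, as this summary describes. The adaptation rule for the mixing weight is the paper's contribution and is not reproduced here; a deterministic actor and a single critic are assumed for brevity.

```python
import torch

def abr_style_policy_loss(policy, q_net, obs, dataset_actions, lam):
    """Interpolate between cloning the dataset and improving over it.

    lam near 1 -> imitate the data-generating policy (cloning);
    lam near 0 -> maximize the learned Q-value (improvement).
    How lam is adapted per dataset is not reproduced; it is a given scalar.
    """
    new_actions = policy(obs)                                # deterministic actor assumed
    cloning = ((new_actions - dataset_actions) ** 2).mean()  # behavior cloning term
    improving = -q_net(obs, new_actions).mean()              # policy improvement term
    return lam * cloning + (1.0 - lam) * improving
```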
arXiv Detail & Related papers (2022-11-15T15:59:11Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between the learned policy and the dataset.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
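A minimal sketch of return-based resampling in the spirit of the summary above; the exact weighting ReD uses is not claimed here, only that resampling reweights transitions while leaving the dataset's support unchanged.

```python
import numpy as np

def return_weighted_indices(episode_returns, episode_of_transition, seed=0):
    """Sample transition indices with probability proportional to episode return.

    episode_returns       : return of each episode in the dataset.
    episode_of_transition : episode index of every stored transition.
    """
    rng = np.random.default_rng(seed)
    shifted = episode_returns - episode_returns.min() + 1e-6   # make weights positive
    probs = shifted[episode_of_transition]
    probs = probs / probs.sum()
    n = len(episode_of_transition)
    return rng.choice(n, size=n, replace=True, p=probs)
```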
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning [62.19209005400561]
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets.
A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy.
We regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process.
arXiv Detail & Related papers (2022-06-14T20:56:16Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
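A minimal sketch of the advantage-weighted part of such an update, assuming Q and V estimates are already available; the latent-variable policy class from the paper is omitted, and the temperature and clipping values are illustrative choices.

```python
import torch

def advantage_weights(q_values, v_values, temperature=1.0, max_weight=100.0):
    """Exponential advantage weights for advantage-weighted policy updates.

    Dataset actions that look better than the average behavior (positive
    advantage) receive a larger imitation weight.
    """
    advantage = q_values - v_values
    return torch.clamp(torch.exp(advantage / temperature), max=max_weight)
```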
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning [125.8224674893018]
Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment.
Applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by the out-of-distribution (OOD) actions.
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
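A minimal sketch of an uncertainty-driven value penalty computed from a bootstrapped critic ensemble, in the spirit of this summary; the penalty scale beta and the critic interfaces are assumptions, not the paper's exact formulation.

```python
import torch

def pessimistic_value(q_ensemble, obs, actions, beta=1.0):
    """Penalize Q-values by the disagreement of a bootstrapped critic ensemble.

    Large ensemble standard deviation flags out-of-distribution (s, a) pairs,
    so their values are pushed down without any explicit policy constraint.
    """
    qs = torch.stack([q(obs, actions) for q in q_ensemble], dim=0)
    return qs.mean(dim=0) - beta * qs.std(dim=0)
```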
arXiv Detail & Related papers (2022-02-23T15:27:16Z)
- Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble [16.92791301062903]
We propose an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the data distribution.
Surprisingly, we find that it is possible to substantially outperform existing offline RL methods on various tasks by simply increasing the number of Q-networks along with the clipped Q-learning.
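A small synthetic demonstration of why simply enlarging the clipped Q-ensemble acts as a pessimism mechanism: with independent noise on each critic, the minimum over more critics drifts further below the true value. The numbers are synthetic, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
true_q, noise_std = 10.0, 2.0
for n in (2, 5, 10, 50):
    # Each critic's estimate = true value + independent Gaussian noise; the
    # clipped (minimum) estimate grows more pessimistic as the ensemble grows.
    samples = true_q + noise_std * rng.standard_normal((100_000, n))
    print(f"N={n:2d}  mean of min over ensemble = {samples.min(axis=1).mean():.2f}")
```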
arXiv Detail & Related papers (2021-10-04T16:40:13Z)
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy-constraint that reduces divergence from the behavior observed in the dataset and a value-constraint that discourages overly optimistic estimates.
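A hedged sketch of an actor loss with the two penalties named above, under assumed deterministic-actor and single-critic interfaces; the concrete penalty forms and coefficients are illustrative, not the paper's.

```python
import torch

def doubly_penalized_actor_loss(policy, q_net, obs, dataset_actions,
                                lam_policy=1.0, lam_value=1.0):
    """Actor objective with the two penalties sketched in the summary above.

    Policy penalty : keep the actor's actions close to the dataset actions.
    Value penalty  : discourage the actor's actions from receiving much higher
                     Q-values than the dataset actions (optimism control).
    """
    new_actions = policy(obs)
    policy_penalty = ((new_actions - dataset_actions) ** 2).mean()
    optimism_gap = (q_net(obs, new_actions) - q_net(obs, dataset_actions)).mean()
    value_penalty = torch.relu(optimism_gap)     # only penalize over-optimism
    objective = q_net(obs, new_actions).mean()   # maximize learned value
    return -objective + lam_policy * policy_penalty + lam_value * value_penalty
```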
arXiv Detail & Related papers (2021-02-18T08:54:14Z)