State-Constrained Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2405.14374v1
- Date: Thu, 23 May 2024 09:50:04 GMT
- Title: State-Constrained Offline Reinforcement Learning
- Authors: Charles A. Hepburn, Yue Jin, Giovanni Montana
- Abstract summary: We introduce a novel framework named state-constrained offline reinforcement learning.
Our framework significantly enhances learning potential and reduces previous limitations.
We also introduce StaCQ, a deep learning algorithm that is both performance-driven on the D4RL benchmark datasets and closely aligned with our theoretical propositions.
- Score: 9.38848713730931
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional offline reinforcement learning methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of distributional shift but restricting the algorithm greatly. In this paper, we alleviate this limitation by introducing a novel framework named \emph{state-constrained} offline reinforcement learning. By exclusively focusing on the dataset's state distribution, our framework significantly enhances learning potential and reduces previous limitations. The proposed setting not only broadens the learning horizon but also improves the ability to combine different trajectories from the dataset effectively, a desirable property inherent in offline reinforcement learning. Our research is underpinned by solid theoretical findings that pave the way for subsequent advancements in this domain. Additionally, we introduce StaCQ, a deep learning algorithm that is both performance-driven on the D4RL benchmark datasets and closely aligned with our theoretical propositions. StaCQ establishes a strong baseline for forthcoming explorations in state-constrained offline reinforcement learning.
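For intuition, here is a minimal sketch contrasting the two notions of constraint; it is not the StaCQ algorithm, and the nearest-neighbour support test with threshold `eps` is purely an illustrative assumption.

```python
import numpy as np

def in_support(query, reference, eps):
    """True if `query` lies within `eps` (L2 distance) of some row of `reference`."""
    return np.linalg.norm(reference - query, axis=1).min() <= eps

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 3))    # states observed in the offline dataset
actions = rng.normal(size=(1000, 2))   # actions observed in the offline dataset

s = rng.normal(size=3)                 # current state
a = rng.normal(size=2)                 # candidate action, possibly never seen in the data
next_state = rng.normal(size=3)        # state the candidate action is predicted to reach

# Batch-constrained view: the executed (state, action) pair must resemble the dataset.
batch_ok = in_support(np.concatenate([s, a]), np.hstack([states, actions]), eps=0.5)

# State-constrained view: only the resulting state must stay within the dataset's
# state distribution; the action itself is left unconstrained, which is what allows
# trajectories from different parts of the dataset to be stitched together.
state_ok = in_support(next_state, states, eps=0.5)
print(batch_ok, state_ok)
```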
Related papers
- Integrating Domain Knowledge for handling Limited Data in Offline RL [10.068880918932415]
Offline RL algorithms perform sub-optimally when confronted with limited data confined to specific regions of the state space.
This paper proposes a novel domain-knowledge-based regularization technique that adaptively refines the initial domain knowledge to boost performance with limited data and partially omitted states.
arXiv Detail & Related papers (2024-06-11T07:59:17Z)
- Learning from Sparse Offline Datasets via Conservative Density Estimation [27.93418377019955]
We propose a novel training algorithm called Conservative Density Estimation (CDE).
CDE addresses the challenge by explicitly imposing constraints on the state-action occupancy stationary distribution.
Our method achieves state-of-the-art performance on the D4RL benchmark.
arXiv Detail & Related papers (2024-01-16T20:42:15Z)
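As a rough illustration of what constraining the state-action occupancy can look like (a kernel-density stand-in, not the CDE algorithm; the density floor of 1e-2 and the toy one-dimensional state and action are assumptions):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Toy 1-D state and 1-D action; gaussian_kde expects rows = dimensions, columns = samples.
data_sa = rng.normal(size=(2, 2000))
occupancy = gaussian_kde(data_sa)        # crude estimate of the dataset's (s, a) occupancy

candidate = np.array([[0.1], [3.0]])     # a (state, action) pair far outside the data
density = occupancy(candidate)[0]
penalty = max(0.0, 1e-2 - density)       # penalize pairs whose estimated occupancy falls below a floor
print(density, penalty)
```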
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
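A hedged sketch of the discretization idea follows: k-means stands in for the paper's adaptive quantization scheme, and the codebook size of 16 is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
dataset_actions = rng.uniform(-1.0, 1.0, size=(5000, 2))   # continuous actions from an offline dataset

codebook = KMeans(n_clusters=16, n_init=10, random_state=0)
codes = codebook.fit_predict(dataset_actions)               # one discrete code per dataset action

def decode(code_index):
    """Map a discrete code back to a continuous action (its cluster centroid)."""
    return codebook.cluster_centers_[code_index]

# A discrete-action offline learner (e.g. IQL or CQL over 16 actions) would train on `codes`
# and decode its chosen code back to a continuous action at execution time.
print(codes[0], decode(codes[0]), dataset_actions[0])
```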
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
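A rough sketch of independently trained cascaded blocks as described above (not the released CaFo code); the layer sizes, toy data, and single training step are assumptions.

```python
import torch
import torch.nn as nn

x = torch.randn(64, 32)                  # toy inputs
y = torch.randint(0, 10, (64,))          # toy labels

blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)])
heads = nn.ModuleList([nn.Linear(32, 10) for _ in range(3)])    # one classifier head per block
opts = [torch.optim.Adam(list(b.parameters()) + list(h.parameters()), lr=1e-3)
        for b, h in zip(blocks, heads)]

h_in = x
for block, head, opt in zip(blocks, heads, opts):
    h_out = block(h_in)
    loss = nn.functional.cross_entropy(head(h_out), y)   # each block predicts a label distribution
    opt.zero_grad()
    loss.backward()                                       # gradients stay inside this block
    opt.step()
    h_in = h_out.detach()                                 # no backpropagation across blocks
```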
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
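A minimal sketch of return-based resampling in the spirit of the summary above; the shift-to-positive weighting and the batch size are assumptions rather than the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for an offline dataset: 100 trajectories with precomputed returns.
returns = rng.uniform(0.0, 100.0, size=100)

weights = returns - returns.min() + 1e-6    # strictly positive, so every trajectory keeps support
probs = weights / weights.sum()

# Draw trajectory indices in proportion to return instead of uniformly at each gradient step.
batch_idx = rng.choice(len(returns), size=256, p=probs, replace=True)
print(np.bincount(batch_idx, minlength=len(returns))[:10])
```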
- Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality [57.91411772725183]
In this paper, we consider the offline stochastic shortest path problem when the state space and the action space are finite.
We design simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
arXiv Detail & Related papers (2022-06-10T07:44:56Z)
- Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning [30.337391523928396]
We explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning.
We discuss algorithms that exploit the Action Impact Regularity (AIR) property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration.
We demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real-world environments.
arXiv Detail & Related papers (2021-11-15T20:14:18Z)
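A generic Fitted-Q Iteration sketch over an offline transition dataset; the AIR-specific analysis is not reflected here, and the toy data, regressor, and iteration count are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, n_actions, gamma = 2000, 4, 0.99
S = rng.normal(size=(n, 3))                        # states from the offline dataset
A = rng.integers(0, n_actions, size=n)             # discrete actions
R = rng.normal(size=n)                             # rewards
S2 = S + rng.normal(scale=0.1, size=(n, 3))        # next states

q = None
for _ in range(5):
    if q is None:
        targets = R                                # first iteration regresses on immediate rewards
    else:
        # Bellman target: r + gamma * max_a' Q(s', a'), with Q given by the previous fit.
        q_next = np.stack([q.predict(np.column_stack([S2, np.full(n, a)]))
                           for a in range(n_actions)], axis=1)
        targets = R + gamma * q_next.max(axis=1)
    q = RandomForestRegressor(n_estimators=10, random_state=0)
    q.fit(np.column_stack([S, A]), targets)        # refit Q on (state, action) -> target
```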
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
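A hedged sketch of uncertainty-weighted bootstrapping in the spirit of the summary above: here an ensemble's disagreement on the target value (rather than the paper's dropout-based estimate) down-weights each transition's TD loss, and the temperature of 10.0 is an assumption.

```python
import torch
import torch.nn as nn

def make_q():
    return nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 1))

q = make_q()                                     # critic being trained
targets = [make_q() for _ in range(5)]           # ensemble of target critics
gamma = 0.99

s_a = torch.randn(128, 6)                        # (state, action) batch from the offline dataset
r = torch.randn(128, 1)                          # rewards
next_s_a = torch.randn(128, 6)                   # next state paired with the policy's action

with torch.no_grad():
    tq = torch.stack([t(next_s_a) for t in targets], dim=0)   # (ensemble, batch, 1)
    target = r + gamma * tq.mean(dim=0)
    weight = torch.exp(-10.0 * tq.var(dim=0))    # small weight where the ensemble disagrees (likely OOD)

td_loss = (weight * (q(s_a) - target) ** 2).mean()
td_loss.backward()                               # only well-supported transitions drive the update
```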
- Constrained episodic reinforcement learning in concave-convex and knapsack settings [81.08055425644037]
We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints.
Our experiments demonstrate that the proposed algorithm significantly outperforms existing approaches in constrained episodic environments.
arXiv Detail & Related papers (2020-06-09T05:02:44Z)