Exploiting Action Impact Regularity and Exogenous State Variables for
Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2111.08066v5
- Date: Wed, 3 May 2023 17:51:30 GMT
- Title: Exploiting Action Impact Regularity and Exogenous State Variables for
Offline Reinforcement Learning
- Authors: Vincent Liu, James R. Wright, Martha White
- Abstract summary: We explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning.
We discuss algorithms that exploit the Action Impact Regularity (AIR) property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration.
We demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real-world environments.
- Score: 30.337391523928396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning -- learning a policy from a batch of data --
is known to be hard for general MDPs. These results motivate the need to look
at specific classes of MDPs where offline reinforcement learning might be
feasible. In this work, we explore a restricted class of MDPs to obtain
guarantees for offline reinforcement learning. The key property, which we call
Action Impact Regularity (AIR), is that actions primarily impact a part of the
state (an endogenous component) and have limited impact on the remaining part
of the state (an exogenous component). AIR is a strong assumption, but it
nonetheless holds in a number of real-world domains including financial
markets. We discuss algorithms that exploit the AIR property, and provide a
theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we
demonstrate that the algorithm outperforms existing offline reinforcement
learning algorithms across different data collection policies in simulated and
real-world environments where the regularity holds.
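To make the AIR property and the Fitted-Q Iteration variant concrete, here is a minimal Python sketch, not the authors' implementation. The dataset format (tuples of endogenous state, exogenous state, next exogenous state), the known deterministic endogenous dynamics f_endo, the reward signature, the discrete action set, and the linear Q-function over a hand-rolled feature map are all illustrative assumptions; the point is only that, under AIR, every action can be evaluated counterfactually at every logged state because the exogenous transition does not depend on the action.

```python
import numpy as np

# Minimal sketch (not the authors' code) of Fitted-Q Iteration on an MDP whose
# state splits into an endogenous part x (affected by actions) and an
# exogenous part z (unaffected by actions, per AIR). Illustrative assumptions:
# the endogenous dynamics f_endo(x, a, z) and the reward r(x, a, z, z_next)
# are known and deterministic, actions are discrete, and Q is a linear model
# over a small hand-coded feature map.

def features(x, z, a, num_actions):
    """One block of state features per discrete action (one-hot crossing)."""
    base = np.concatenate(([1.0], np.atleast_1d(x), np.atleast_1d(z)))
    phi = np.zeros(num_actions * base.size)
    phi[a * base.size:(a + 1) * base.size] = base
    return phi

def fqi_air(dataset, f_endo, reward, num_actions, gamma=0.99, iters=50):
    """dataset: list of (x, z, z_next) tuples. The logged action is not needed:
    AIR lets us evaluate every action counterfactually at each logged state."""
    dim = features(dataset[0][0], dataset[0][1], 0, num_actions).size
    w = np.zeros(dim)
    q = lambda x, z, a: features(x, z, a, num_actions) @ w
    for _ in range(iters):
        Phi, y = [], []
        for x, z, z_next in dataset:
            for a in range(num_actions):              # counterfactual actions
                x_next = f_endo(x, a, z)              # endogenous part we control
                target = reward(x, a, z, z_next) + gamma * max(
                    q(x_next, z_next, b) for b in range(num_actions))
                Phi.append(features(x, z, a, num_actions))
                y.append(target)
        # regression step of Fitted-Q Iteration
        w, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    return w
```

The inner loop over all actions is where the regularity pays off: because the logged exogenous transition from z to z_next is valid under any action, the regression targets do not depend on which actions the data-collection policy happened to take, which is the kind of structure the paper's guarantees rest on.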
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Behavior Prior Representation learning for Offline Reinforcement
Learning [23.200489608592694]
We introduce a simple, yet effective approach for learning state representations.
Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset (a minimal sketch of this objective appears after this list).
We show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks.
arXiv Detail & Related papers (2022-11-02T04:15:20Z) - Data Valuation for Offline Reinforcement Learning [1.3535770763481902]
The field of offline reinforcement learning sidesteps costly or risky online data collection by outsourcing the collection of data to a domain expert or a carefully monitored program.
With the emergence of data markets, an alternative to constructing a dataset in-house is to purchase external data.
This raises questions regarding the transferability and robustness of an offline reinforcement learning agent trained on externally acquired data.
arXiv Detail & Related papers (2022-05-19T13:21:40Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Dealing with the Unknown: Pessimistic Offline Reinforcement Learning [25.30634466168587]
We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to regions of the state space with which it is familiar.
We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent from the training dataset (a minimal sketch of this penalty appears after this list).
arXiv Detail & Related papers (2021-11-09T22:38:58Z) - Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z) - Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and accordingly down-weights their contribution to the training objectives (a minimal sketch of this weighting appears after this list).
arXiv Detail & Related papers (2021-05-17T20:16:46Z) - PLAS: Latent Action Space for Offline Reinforcement Learning [18.63424441772675]
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions.
We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets.
arXiv Detail & Related papers (2020-11-14T03:38:38Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine the critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
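Minimal sketch for the Behavior Prior Representation (BPR) entry above, assuming PyTorch, discrete actions, and a two-phase recipe (pretrain an encoder by behavior cloning, then hand it to an offline RL algorithm). The network sizes, the cross-entropy cloning loss, and the head module (e.g. a linear layer mapping the representation to action logits) are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Illustrative BPR-style pretraining: learn a state encoder by predicting the
# dataset's logged actions from states (behavior cloning), then reuse the
# encoder inside an off-the-shelf offline RL algorithm.

class Encoder(nn.Module):
    def __init__(self, state_dim, rep_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, rep_dim))

    def forward(self, s):
        return self.net(s)

def pretrain_representation(encoder, head, states, actions, epochs=10, lr=1e-3):
    """Phase 1: fit the encoder by cloning the dataset's actions from states.
    states: float tensor (N, state_dim); actions: long tensor (N,) of indices.
    head: any module mapping the representation to action logits (assumption)."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = head(encoder(states))      # predict logged action from state
        loss = loss_fn(logits, actions)
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder
```

In use, the returned encoder would be frozen (or fine-tuned) and composed with the value and policy networks of whichever offline RL method it is combined with.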
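Minimal sketch for the Dealing with the Unknown (PessORL) entry above: penalize bootstrapped value targets at next-states that look out-of-distribution. The OOD detector used here (distance to the nearest logged state with a fixed threshold and coefficient) is an illustrative stand-in, not the paper's method.

```python
import numpy as np

# Illustrative pessimism at unfamiliar states: subtract a penalty from the
# bootstrapped value whenever the next state lies far from the training data.

def ood_penalty(next_states, dataset_states, threshold=1.0, beta=10.0):
    """Per-state penalty that grows when a next state is far from every
    state seen in the training dataset (nearest-neighbour distance)."""
    penalties = []
    for s in next_states:
        d = np.min(np.linalg.norm(dataset_states - s, axis=1))
        penalties.append(beta * max(0.0, d - threshold))
    return np.array(penalties)

def pessimistic_targets(rewards, next_values, next_states, dataset_states, gamma=0.99):
    """Standard bootstrapped targets, minus a penalty at OOD next states."""
    return rewards + gamma * (next_values - ood_penalty(next_states, dataset_states))
```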
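Minimal sketch for the Uncertainty Weighted Actor-Critic (UWAC) entry above: weight each sample's TD error by a decreasing function of the disagreement among an ensemble of target-Q estimates, so that OOD-looking state-action pairs contribute less. The inverse-variance weight with clipping is an illustrative choice, not necessarily the paper's exact formulation.

```python
import numpy as np

# Illustrative uncertainty-weighted critic loss: samples whose target-Q
# estimates disagree strongly across the ensemble are down-weighted.

def uncertainty_weights(target_q_ensemble, beta=1.0, max_weight=10.0):
    """target_q_ensemble: array of shape (num_ensemble, batch); higher
    disagreement across the ensemble yields a smaller sample weight."""
    variance = target_q_ensemble.var(axis=0) + 1e-8
    return np.clip(beta / variance, 0.0, max_weight)

def weighted_critic_loss(q_pred, rewards, target_q_ensemble, gamma=0.99):
    """Mean squared TD error with down-weighted OOD-looking samples.
    In a deep RL implementation the target would come from frozen target
    networks with gradients stopped; plain arrays suffice for this sketch."""
    target = rewards + gamma * target_q_ensemble.mean(axis=0)
    w = uncertainty_weights(target_q_ensemble)
    return np.mean(w * (q_pred - target) ** 2)
```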