Offline RL With Resource Constrained Online Deployment
- URL: http://arxiv.org/abs/2110.03165v1
- Date: Thu, 7 Oct 2021 03:43:09 GMT
- Title: Offline RL With Resource Constrained Online Deployment
- Authors: Jayanth Reddy Regatti, Aniket Anand Deshmukh, Frank Cheng, Young Hun
Jung, Abhishek Gupta, Urun Dogan
- Abstract summary: Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible.
This work introduces and formalizes a novel resource-constrained problem setting.
We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features.
- Score: 13.61540280864938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning is used to train policies in scenarios where
real-time access to the environment is expensive or impossible. As a natural
consequence of these harsh conditions, an agent may lack the resources to fully
observe the online environment before taking an action. We dub this situation
the resource-constrained setting. This leads to situations where the offline
dataset (available for training) can contain fully processed features (using
powerful language models, image models, complex sensors, etc.) which are not
available when actions are actually taken online. This disconnect leads to an
interesting and unexplored problem in offline RL: Is it possible to use a
richly processed offline dataset to train a policy which has access to fewer
features in the online environment? In this work, we introduce and formalize
this novel resource-constrained problem setting. We highlight the performance
gap between policies trained using the full offline dataset and policies
trained using limited features. We address this performance gap with a policy
transfer algorithm which first trains a teacher agent using the offline dataset
where features are fully available, and then transfers this knowledge to a
student agent that only uses the resource-constrained features. To better
capture the challenge of this setting, we propose a data collection procedure:
Resource Constrained-Datasets for RL (RC-D4RL). We evaluate our transfer
algorithm on RC-D4RL and the popular D4RL benchmarks and observe consistent
improvement over the baseline (TD3+BC without transfer). The code for the
experiments is available at
https://github.com/JayanthRR/RC-OfflineRL.
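The abstract describes the transfer algorithm only at a high level: a teacher agent is trained (e.g., with TD3+BC) on the fully featured offline data, and its knowledge is then transferred to a student that only sees the resource-constrained features. Below is a minimal sketch of one plausible teacher-to-student distillation step under that reading; the network sizes, the MSE distillation loss, and the assumption that the limited features are a slice of the full ones are illustrative, not taken from the paper.

```python
# Hedged sketch of a teacher-to-student policy transfer step for the
# resource-constrained setting. All dimensions and the loss are assumptions.
import torch
import torch.nn as nn

FULL_DIM, LIMITED_DIM, ACT_DIM = 32, 8, 6  # hypothetical feature/action sizes

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim), nn.Tanh())

teacher = mlp(FULL_DIM, ACT_DIM)     # assumed pre-trained offline (e.g., TD3+BC) on full features
student = mlp(LIMITED_DIM, ACT_DIM)  # only ever sees the resource-constrained features
optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

def transfer_step(full_obs, limited_obs):
    """One distillation step: regress the student's action onto the teacher's."""
    with torch.no_grad():
        target_action = teacher(full_obs)   # teacher acts on the rich offline features
    loss = nn.functional.mse_loss(student(limited_obs), target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random stand-in data; here the limited features are assumed
# to be the first LIMITED_DIM entries of the full feature vector.
full_obs = torch.randn(64, FULL_DIM)
limited_obs = full_obs[:, :LIMITED_DIM]
print(transfer_step(full_obs, limited_obs))
```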
Related papers
- Offline Reinforcement Learning for Wireless Network Optimization with
Mixture Datasets [13.22086908661673]
Advances in reinforcement learning (RL) have boosted the adoption of online RL for wireless radio resource management (RRM).
Online RL algorithms require direct interactions with the environment.
Offline RL can produce a near-optimal RL policy even when all involved behavior policies are highly suboptimal.
arXiv Detail & Related papers (2023-11-19T21:02:17Z)
- Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias [96.14064037614942]
Offline retraining, a policy extraction step at the end of online fine-tuning, is proposed.
An optimistic (exploration) policy is used to interact with the environment, and a separate pessimistic (exploitation) policy is trained on all the observed data for evaluation.
arXiv Detail & Related papers (2023-10-12T17:50:09Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Leveraging Offline Data in Online Reinforcement Learning [24.18369781999988]
Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL.
In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an $\epsilon$-optimal policy (formalized in the short definition after this entry).
In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data.
arXiv Detail & Related papers (2022-11-09T15:39:32Z)
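For reference, the standard notion of $\epsilon$-optimality used in the entry above can be stated as follows; this is a textbook definition, not something spelled out in the cited abstract.

```latex
% A policy \pi is \epsilon-optimal (for start state s_0 and value function V) if
V^{\pi}(s_0) \;\ge\; \max_{\pi'} V^{\pi'}(s_0) \;-\; \epsilon .
```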
- Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z)
- Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability (a minimal illustration follows this entry).
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z)
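The abstract above only states that the behavior-cloning weight is adapted from the agent's performance and training stability. The snippet below sketches one simple way such a controller could look; the window size, multiplicative update rule, and initial weight are assumptions for illustration, not the authors' exact scheme.

```python
# Hedged sketch: adaptively scaling a behavior-cloning weight during online
# fine-tuning. The update rule and hyperparameters are illustrative assumptions.
from collections import deque

class AdaptiveBCWeight:
    def __init__(self, init_weight=2.5, window=10, decay=0.95, grow=1.05):
        self.weight = init_weight
        self.returns = deque(maxlen=window)  # recent evaluation returns
        self.decay, self.grow = decay, grow

    def update(self, episode_return):
        """Relax the BC constraint while performance improves steadily;
        tighten it again if returns become unstable or drop."""
        self.returns.append(episode_return)
        if len(self.returns) < self.returns.maxlen:
            return self.weight                   # not enough history yet
        mean_ret = sum(self.returns) / len(self.returns)
        if episode_return >= mean_ret:           # improving / stable -> trust RL more
            self.weight *= self.decay
        else:                                    # degrading -> lean on the BC regularizer
            self.weight *= self.grow
        return self.weight

# Usage inside a TD3+BC-style actor update (schematic):
#   actor_loss = -Q(s, pi(s)).mean() + bc.weight * ((pi(s) - a) ** 2).mean()
bc = AdaptiveBCWeight()
for ret in [100, 120, 130, 125, 140, 150, 145, 160, 155, 170, 165]:
    w = bc.update(ret)
print("current BC weight:", round(w, 3))
```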
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy for imitating from adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning merely mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- The Least Restriction for Offline Reinforcement Learning [0.0]
We propose a creative offline reinforcement learning framework, the Least Restriction (LR).
The LR regards selecting an action as taking a sample from the probability distribution.
It is able to learn robustly from different offline datasets, including random and suboptimal demonstrations.
arXiv Detail & Related papers (2021-07-05T01:50:40Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR); a minimal sketch of the general idea follows at the end of this list.
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
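The CRR entry above gives no detail beyond the name, so the following sketches the general critic-regularized-regression idea as it is commonly described: regress the policy onto dataset actions, weighted by a function of the critic's advantage estimate. Dimensions, the exponential weighting, and the uniform-action value baseline are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of critic-regularized regression (CRR): advantage-weighted
# regression of the policy onto dataset actions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, BETA = 17, 6, 1.0  # hypothetical dimensions / temperature

policy = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, ACT_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def crr_policy_loss(obs, act, n_samples=4):
    """Advantage-weighted regression onto dataset actions."""
    with torch.no_grad():
        q = critic(torch.cat([obs, act], dim=-1))  # Q(s, a) for dataset actions
        # crude value baseline: average Q over a few uniformly sampled actions
        # (CRR proper samples these from the current policy)
        sampled = [critic(torch.cat([obs, torch.rand_like(act) * 2 - 1], dim=-1))
                   for _ in range(n_samples)]
        v = torch.stack(sampled).mean(dim=0)
        weight = torch.exp((q - v) / BETA).clamp(max=20.0)            # exp-advantage weights
    log_like = -((policy(obs) - act) ** 2).sum(dim=-1, keepdim=True)  # Gaussian-style log-likelihood
    return -(weight * log_like).mean()

# One gradient step on random stand-in data:
obs, act = torch.randn(64, OBS_DIM), torch.rand(64, ACT_DIM) * 2 - 1
loss = crr_policy_loss(obs, act)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```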