Offline RL With Resource Constrained Online Deployment
- URL: http://arxiv.org/abs/2110.03165v1
- Date: Thu, 7 Oct 2021 03:43:09 GMT
- Title: Offline RL With Resource Constrained Online Deployment
- Authors: Jayanth Reddy Regatti, Aniket Anand Deshmukh, Frank Cheng, Young Hun
Jung, Abhishek Gupta, Urun Dogan
- Abstract summary: Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible.
This work introduces and formalizes a novel resource-constrained problem setting.
We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features.
- Score: 13.61540280864938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning is used to train policies in scenarios where
real-time access to the environment is expensive or impossible. As a natural
consequence of these harsh conditions, an agent may lack the resources to fully
observe the online environment before taking an action. We dub this situation
the resource-constrained setting. This leads to situations where the offline
dataset (available for training) can contain fully processed features (using
powerful language models, image models, complex sensors, etc.) which are not
available when actions are actually taken online. This disconnect leads to an
interesting and unexplored problem in offline RL: Is it possible to use a
richly processed offline dataset to train a policy which has access to fewer
features in the online environment? In this work, we introduce and formalize
this novel resource-constrained problem setting. We highlight the performance
gap between policies trained using the full offline dataset and policies
trained using limited features. We address this performance gap with a policy
transfer algorithm which first trains a teacher agent using the offline dataset
where features are fully available, and then transfers this knowledge to a
student agent that only uses the resource-constrained features. To better
capture the challenge of this setting, we propose a data collection procedure:
Resource Constrained-Datasets for RL (RC-D4RL). We evaluate our transfer
algorithm on RC-D4RL and the popular D4RL benchmarks and observe consistent
improvement over the baseline (TD3+BC without transfer). The code for the
experiments is available at
https://github.com/JayanthRR/RC-OfflineRL.
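The abstract describes the transfer algorithm only at a high level: a teacher agent is trained (e.g., with TD3+BC) on the fully featured offline data, and its knowledge is then transferred to a student that only sees the resource-constrained features. Below is a minimal sketch of one plausible teacher-to-student distillation step under that reading; the network sizes, the MSE distillation loss, and the assumption that the limited features are a slice of the full ones are illustrative, not taken from the paper.

```python
# Hedged sketch of a teacher-to-student policy transfer step for the
# resource-constrained setting. All dimensions and the loss are assumptions.
import torch
import torch.nn as nn

FULL_DIM, LIMITED_DIM, ACT_DIM = 32, 8, 6  # hypothetical feature/action sizes

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim), nn.Tanh())

teacher = mlp(FULL_DIM, ACT_DIM)     # assumed pre-trained offline (e.g., TD3+BC) on full features
student = mlp(LIMITED_DIM, ACT_DIM)  # only ever sees the resource-constrained features
optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

def transfer_step(full_obs, limited_obs):
    """One distillation step: regress the student's action onto the teacher's."""
    with torch.no_grad():
        target_action = teacher(full_obs)   # teacher acts on the rich offline features
    loss = nn.functional.mse_loss(student(limited_obs), target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random stand-in data; here the limited features are assumed
# to be the first LIMITED_DIM entries of the full feature vector.
full_obs = torch.randn(64, FULL_DIM)
limited_obs = full_obs[:, :LIMITED_DIM]
print(transfer_step(full_obs, limited_obs))
```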
Related papers
- Offline Reinforcement Learning for Wireless Network Optimization with
Mixture Datasets [13.22086908661673]
Advances in reinforcement learning (RL) have boosted the adoption of online RL for wireless radio resource management (RRM).
Online RL algorithms require direct interactions with the environment.
Offline RL can produce a near-optimal RL policy even when all involved behavior policies are highly suboptimal.
arXiv Detail & Related papers (2023-11-19T21:02:17Z)
- Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias [96.14064037614942]
Offline retraining, a policy extraction step at the end of online fine-tuning, is proposed.
An optimistic (exploration) policy is used to interact with the environment, and a separate pessimistic (exploitation) policy is trained on all the observed data for evaluation.
arXiv Detail & Related papers (2023-10-12T17:50:09Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Leveraging Offline Data in Online Reinforcement Learning [24.18369781999988]
Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL.
In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an $\epsilon$-optimal policy (formalized in the short definition after this entry).
In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data.
arXiv Detail & Related papers (2022-11-09T15:39:32Z)
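For reference, the standard notion of $\epsilon$-optimality used in the entry above can be stated as follows; this is a textbook definition, not something spelled out in the cited abstract.

```latex
% A policy \pi is \epsilon-optimal (for start state s_0 and value function V) if
V^{\pi}(s_0) \;\ge\; \max_{\pi'} V^{\pi'}(s_0) \;-\; \epsilon .
```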
- Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z)
- Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability (a minimal illustration follows this entry).
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z)
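The abstract above only states that the behavior-cloning weight is adapted from the agent's performance and training stability. The snippet below sketches one simple way such a controller could look; the window size, multiplicative update rule, and initial weight are assumptions for illustration, not the authors' exact scheme.

```python
# Hedged sketch: adaptively scaling a behavior-cloning weight during online
# fine-tuning. The update rule and hyperparameters are illustrative assumptions.
from collections import deque

class AdaptiveBCWeight:
    def __init__(self, init_weight=2.5, window=10, decay=0.95, grow=1.05):
        self.weight = init_weight
        self.returns = deque(maxlen=window)  # recent evaluation returns
        self.decay, self.grow = decay, grow

    def update(self, episode_return):
        """Relax the BC constraint while performance improves steadily;
        tighten it again if returns become unstable or drop."""
        self.returns.append(episode_return)
        if len(self.returns) < self.returns.maxlen:
            return self.weight                   # not enough history yet
        mean_ret = sum(self.returns) / len(self.returns)
        if episode_return >= mean_ret:           # improving / stable -> trust RL more
            self.weight *= self.decay
        else:                                    # degrading -> lean on the BC regularizer
            self.weight *= self.grow
        return self.weight

# Usage inside a TD3+BC-style actor update (schematic):
#   actor_loss = -Q(s, pi(s)).mean() + bc.weight * ((pi(s) - a) ** 2).mean()
bc = AdaptiveBCWeight()
for ret in [100, 120, 130, 125, 140, 150, 145, 160, 155, 170, 165]:
    w = bc.update(ret)
print("current BC weight:", round(w, 3))
```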
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy for imitating from adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning merely mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- The Least Restriction for Offline Reinforcement Learning [0.0]
We propose a creative offline reinforcement learning framework, the Least Restriction (LR).
The LR regards selecting an action as taking a sample from the probability distribution.
It is able to learn robustly from different offline datasets, including random and suboptimal demonstrations.
arXiv Detail & Related papers (2021-07-05T01:50:40Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR); a minimal sketch of the general idea follows at the end of this list.
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
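The CRR entry above gives no detail beyond the name, so the following sketches the general critic-regularized-regression idea as it is commonly described: regress the policy onto dataset actions, weighted by a function of the critic's advantage estimate. Dimensions, the exponential weighting, and the uniform-action value baseline are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of critic-regularized regression (CRR): advantage-weighted
# regression of the policy onto dataset actions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, BETA = 17, 6, 1.0  # hypothetical dimensions / temperature

policy = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, ACT_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def crr_policy_loss(obs, act, n_samples=4):
    """Advantage-weighted regression onto dataset actions."""
    with torch.no_grad():
        q = critic(torch.cat([obs, act], dim=-1))  # Q(s, a) for dataset actions
        # crude value baseline: average Q over a few uniformly sampled actions
        # (CRR proper samples these from the current policy)
        sampled = [critic(torch.cat([obs, torch.rand_like(act) * 2 - 1], dim=-1))
                   for _ in range(n_samples)]
        v = torch.stack(sampled).mean(dim=0)
        weight = torch.exp((q - v) / BETA).clamp(max=20.0)            # exp-advantage weights
    log_like = -((policy(obs) - act) ** 2).sum(dim=-1, keepdim=True)  # Gaussian-style log-likelihood
    return -(weight * log_like).mean()

# One gradient step on random stand-in data:
obs, act = torch.randn(64, OBS_DIM), torch.rand(64, ACT_DIM) * 2 - 1
loss = crr_policy_loss(obs, act)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```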