Augmenting Offline Reinforcement Learning with State-only Interactions
- URL: http://arxiv.org/abs/2402.00807v2
- Date: Wed, 02 Oct 2024 23:24:50 GMT
- Title: Augmenting Offline Reinforcement Learning with State-only Interactions
- Authors: Shangzhe Li, Xinhua Zhang
- Abstract summary: Batch offline data have been shown to be considerably beneficial for reinforcement learning.
In this paper, we consider a novel opportunity where interaction with the environment is feasible, but restricted to observations only.
As a result, the learner must make good sense of the offline data to synthesize an efficient scheme for querying state transitions.
- Score: 12.100856289121863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch offline data have been shown to be considerably beneficial for reinforcement learning. Their benefit is further amplified by upsampling with generative models. In this paper, we consider a novel opportunity where interaction with the environment is feasible, but restricted to observations only, i.e., \textit{no reward} feedback is available. This setting is broadly applicable, as simulators or even real cyber-physical systems are often accessible, whereas reward is often difficult or expensive to obtain. As a result, the learner must make good sense of the offline data to synthesize an efficient scheme for querying state transitions. Our method first leverages online interactions to generate high-return trajectories via conditional diffusion models. They are then blended with the original offline trajectories through a stitching algorithm, and the resulting augmented data can be applied generically to downstream reinforcement learners. Superior empirical performance is demonstrated over state-of-the-art data augmentation methods that are extended to utilize state-only interactions.
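The pipeline described in the abstract has three steps: generate high-return, state-only rollouts with a conditional generative model, stitch them into reward-labelled offline trajectories at nearby states, and feed the augmented set to any downstream offline RL learner. The snippet below is a minimal illustrative sketch of that flow under simplifying assumptions (a linear reward model, nearest-state stitching, a stub in place of the diffusion model); all function names here are hypothetical stand-ins, not the paper's actual algorithm.

```python
# Minimal sketch of the augmentation pipeline: generate state-only rollouts,
# stitch them into reward-labelled offline trajectories, and collect the
# augmented data for a downstream offline RL learner.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 4

def random_offline_trajectory(length=20):
    # Placeholder offline trajectory of (state, action, reward).
    return {
        "states": rng.normal(size=(length, STATE_DIM)),
        "actions": rng.normal(size=(length, 2)),
        "rewards": rng.normal(size=length),
    }

offline_data = [random_offline_trajectory() for _ in range(10)]

def generate_state_only_trajectory(length=15):
    # Stand-in for the return-conditioned diffusion model: emits a
    # plausible state-only rollout (no actions, no rewards).
    return rng.normal(size=(length, STATE_DIM))

# Fit a simple reward model on the offline data (assumption: linear in state),
# since the environment itself provides no reward feedback.
X = np.concatenate([t["states"] for t in offline_data])
y = np.concatenate([t["rewards"] for t in offline_data])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_reward(states):
    return states @ w

def stitch(offline_traj, generated_states):
    # Splice the generated segment onto the offline trajectory at the
    # offline state closest to the segment's first state, labelling the
    # new states with the learned reward model (a real pipeline would
    # also need actions, e.g. from an inverse dynamics model).
    dists = np.linalg.norm(offline_traj["states"] - generated_states[0], axis=1)
    cut = int(np.argmin(dists))
    return {
        "states": np.concatenate([offline_traj["states"][:cut], generated_states]),
        "rewards": np.concatenate([offline_traj["rewards"][:cut],
                                   predict_reward(generated_states)]),
    }

augmented = [stitch(traj, generate_state_only_trajectory()) for traj in offline_data]
print(augmented[0]["states"].shape, augmented[0]["rewards"].shape)
```

In the paper itself, the generator is a conditional diffusion model trained to produce high-return trajectories from state-only online interactions, and the stitched, reward-labelled data can then be handed generically to any offline reinforcement learner.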
Related papers
- Hybrid Reinforcement Learning from Offline Observation Alone [19.14864618744221]
We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interactive access.
We propose the first algorithm in the trace model setting that provably matches the performance of algorithms that leverage a reset model.
arXiv Detail & Related papers (2024-06-11T13:34:05Z)
- ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories [27.5648276335047]
Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL).
We propose a novel approach that leverages offline data to learn a generative diffusion model, coined Adaptive Trajectory Diffuser (ATraDiff).
ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- Causal Action Influence Aware Counterfactual Data Augmentation [23.949113120847507]
We propose CAIAC, a data augmentation method that can create synthetic transitions from a fixed dataset without having access to online environment interactions.
By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping $\textit{action}$-unaffected parts of the state-space.
This leads to a substantial increase in robustness of offline learning algorithms against distributional shift.
arXiv Detail & Related papers (2024-05-29T09:19:50Z)
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z)
- Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning [23.638418776700522]
We propose a new model named the causal decision transformer for recommender systems (CDT4Rec).
CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction.
To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.
arXiv Detail & Related papers (2023-04-17T00:05:52Z)
- Synthetic Experience Replay [48.601879260071655]
We propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience.
We show that SynthER is an effective method for training RL agents across offline and online settings.
We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data.
arXiv Detail & Related papers (2023-03-12T09:10:45Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning [84.30765628008207]
We propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency for RL feature representation learning.
Our method outperforms the current state-of-the-art methods by a large margin on both benchmarks.
arXiv Detail & Related papers (2021-06-08T07:37:37Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)