Data Valuation for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2205.09550v1
- Date: Thu, 19 May 2022 13:21:40 GMT
- Title: Data Valuation for Offline Reinforcement Learning
- Authors: Amir Abolfazli, Gregory Palmer and Daniel Kudenko
- Abstract summary: The field of offline reinforcement learning addresses these issues by outsourcing the collection of data to a domain expert or a carefully monitored program.
With the emergence of data markets, an alternative to constructing a dataset in-house is to purchase external data.
This raises questions regarding the transferability and robustness of an offline reinforcement learning agent trained on externally acquired data.
- Score: 1.3535770763481902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep reinforcement learning (DRL) hinges on the availability
of training data, which is typically obtained via a large number of environment
interactions. In many real-world scenarios, costs and risks are associated with
gathering these data. The field of offline reinforcement learning addresses
these issues through outsourcing the collection of data to a domain expert or a
carefully monitored program and subsequently searching for a batch-constrained
optimal policy. With the emergence of data markets, an alternative to
constructing a dataset in-house is to purchase external data. However, while
state-of-the-art offline reinforcement learning approaches have shown a lot of
promise, they currently rely on carefully constructed datasets that are well
aligned with the intended target domains. This raises questions regarding the
transferability and robustness of an offline reinforcement learning agent
trained on externally acquired data. In this paper, we empirically evaluate the
ability of current state-of-the-art offline reinforcement learning approaches to
cope with source-target domain mismatch in two MuJoCo environments, finding that
these algorithms underperform in the target domain. To address this, we
propose data valuation for offline reinforcement learning (DVORL), which allows
us to identify relevant and high-quality transitions, improving the performance
and transferability of policies learned by offline reinforcement learning
algorithms. The results show that our method outperforms offline reinforcement
learning baselines on two MuJoCo environments.
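The abstract gives no implementation details, but the core idea, scoring transitions so that the offline learner trains only on relevant, high-quality data, can be sketched roughly as follows. This is a minimal illustration and not the paper's DVORL algorithm; the estimator architecture, the top-k selection rule, and all names (DataValueEstimator, select_transitions) are assumptions.

```python
# Minimal sketch of transition-level data valuation for offline RL (assumed design,
# not the authors' DVORL implementation): a small network scores each transition
# (s, a, r, s'), and the offline RL algorithm is trained only on the top-scored subset.
import torch
import torch.nn as nn


class DataValueEstimator(nn.Module):
    """Assigns a value in (0, 1) to each transition (s, a, r, s')."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, s, a, r, s_next):
        x = torch.cat([s, a, r, s_next], dim=-1)
        return self.net(x).squeeze(-1)


def select_transitions(estimator, batch, keep_fraction=0.5):
    """Keep the transitions the estimator scores highest; a batch-constrained
    offline RL algorithm would then be trained on this filtered subset."""
    s, a, r, s_next = batch
    with torch.no_grad():
        values = estimator(s, a, r, s_next)
    k = max(1, int(keep_fraction * values.numel()))
    idx = torch.topk(values, k).indices
    return tuple(t[idx] for t in batch)


if __name__ == "__main__":
    # Random placeholder data standing in for an externally purchased dataset.
    state_dim, action_dim, n = 17, 6, 1024  # MuJoCo-sized dimensions, chosen arbitrarily
    batch = (torch.randn(n, state_dim), torch.randn(n, action_dim),
             torch.randn(n, 1), torch.randn(n, state_dim))
    estimator = DataValueEstimator(state_dim, action_dim)
    filtered = select_transitions(estimator, batch, keep_fraction=0.25)
    print(filtered[0].shape)  # transitions retained for offline policy learning
```

In the setting described by the abstract, such a filter would presumably be applied to an externally acquired source dataset before or during offline policy learning, with the aim of improving transfer to the target domain.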
Related papers
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z) - Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows [30.926243761581624]
Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training.
CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing a high potential for OOD adaptation.
Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
arXiv Detail & Related papers (2024-05-06T22:44:32Z) - Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline
Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z) - Benchmarking Offline Reinforcement Learning on Real-Robot Hardware [35.29390454207064]
Dexterous manipulation in particular remains an open problem in its general form.
We propose a benchmark including a large collection of data for offline learning from a dexterous manipulation platform on two tasks.
We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
arXiv Detail & Related papers (2023-07-28T17:29:49Z) - Offline Robot Reinforcement Learning with Uncertainty-Guided Human
Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample-efficient than naively combining expert data with data collected from a sub-optimal agent.
arXiv Detail & Related papers (2022-12-16T01:41:59Z) - Challenges and Opportunities in Offline Reinforcement Learning from
Visual Observations [58.758928936316785]
Offline reinforcement learning from visual observations with continuous action spaces remains under-explored.
We show that modifications to two popular vision-based online reinforcement learning algorithms suffice to outperform existing offline RL methods.
arXiv Detail & Related papers (2022-06-09T22:08:47Z) - Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves average performance over the next best-performing offline reinforcement learning method by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z) - The Challenges of Exploration for Offline Reinforcement Learning [8.484491887821473]
We study the two processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.
The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest.
We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL.
arXiv Detail & Related papers (2022-01-27T23:59:56Z) - Representation Matters: Offline Pretraining for Sequential Decision
Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z) - OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement
Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z) - Offline Reinforcement Learning: Tutorial, Review, and Perspectives on
Open Problems [108.81683598693539]
Offline reinforcement learning algorithms hold tremendous promise for making it possible to turn large datasets into powerful decision making engines.
We will aim to provide the reader with an understanding of these challenges, particularly in the context of modern deep reinforcement learning methods.
arXiv Detail & Related papers (2020-05-04T17:00:15Z)