Offline Preference-Based Apprenticeship Learning
- URL: http://arxiv.org/abs/2107.09251v2
- Date: Thu, 22 Jul 2021 04:30:17 GMT
- Title: Offline Preference-Based Apprenticeship Learning
- Authors: Daniel Shin, Daniel S. Brown
- Abstract summary: We study how an offline dataset can be used to address two challenges that autonomous systems face when they endeavor to learn from, adapt to, and collaborate with humans.
First, we use the offline dataset to efficiently infer the human's reward function via pool-based active preference learning.
Second, given this learned reward function, we perform offline reinforcement learning to optimize a policy based on the inferred human intent.
- Score: 11.21888613165599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study how an offline dataset of prior (possibly random) experience can be
used to address two challenges that autonomous systems face when they endeavor
to learn from, adapt to, and collaborate with humans: (1) identifying the
human's intent and (2) safely optimizing the autonomous system's behavior to
achieve this inferred intent. First, we use the offline dataset to efficiently
infer the human's reward function via pool-based active preference learning.
Second, given this learned reward function, we perform offline reinforcement
learning to optimize a policy based on the inferred human intent. Crucially,
our proposed approach does not require actual physical rollouts or an accurate
simulator for either the reward learning or policy optimization steps, enabling
both safe and efficient apprenticeship learning. We evaluate our approach on
a subset of existing offline RL benchmarks that we identify as well suited for
offline reward learning, as well as on extensions of these benchmarks that
allow more open-ended behaviors. Our experiments show that offline
preference-based reward learning followed by offline reinforcement learning
enables efficient and high-performing policies, while only requiring small
numbers of preference queries. Videos available at
https://sites.google.com/view/offline-prefs.
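The first stage of the pipeline above, pool-based preference learning, typically fits a reward model under the Bradley-Terry preference model: a trajectory segment A is preferred to B with probability sigmoid(r(A) - r(B)). The sketch below is illustrative only, assuming a linear reward over hand-chosen segment features; `fit_bradley_terry` and the synthetic data are hypothetical names, not the paper's implementation.

```python
import numpy as np

def fit_bradley_terry(seg_a, seg_b, prefs, lr=0.5, steps=2000):
    """Fit a linear reward r = w . phi from pairwise preferences.

    seg_a, seg_b: (n, d) arrays of summed features per trajectory segment.
    prefs: (n,) array, 1 if segment A was preferred, 0 if segment B.
    Bradley-Terry model: P(A preferred) = sigmoid(w . (phi_A - phi_B)).
    """
    diff = seg_a - seg_b                                  # feature differences
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        logits = np.clip(diff @ w, -30.0, 30.0)           # avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-logits))                 # predicted P(A pref.)
        w += lr * diff.T @ (prefs - p) / len(prefs)       # log-likelihood ascent
    return w

# Toy example: synthetic segments labeled noiselessly by a "true" reward.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
A = rng.normal(size=(200, 2))
B = rng.normal(size=(200, 2))
labels = (A @ w_true > B @ w_true).astype(float)
w_hat = fit_bradley_terry(A, B, labels)
# The recovered direction should closely align with the true reward direction.
cos = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
```

The learned reward can then label the offline dataset, after which any offline RL algorithm optimizes a policy against it without environment rollouts.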
Related papers
- Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z)
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training alone can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
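One common form of model-based augmentation is branching short synthetic transitions off real dataset states using a learned dynamics model. The sketch below is a generic illustration of that idea, not this paper's method; `augment_offline_data` and the identity-dynamics toy model are hypothetical.

```python
import numpy as np

def augment_offline_data(transitions, dyn_model, noise=0.01, seed=0):
    """Enlarge an offline dataset with one-step model-based rollouts.

    transitions: iterable of (s, a, r, s') tuples from the real dataset.
    dyn_model: callable (s, a) -> (r_hat, s_hat), a learned one-step model.
    Each real state is branched with a slightly perturbed action, a simple
    way to densify small offline datasets before pre-training.
    """
    rng = np.random.default_rng(seed)
    synthetic = []
    for s, a, r, s_next in transitions:
        a_pert = a + rng.normal(scale=noise, size=np.shape(a))  # nearby action
        r_hat, s_hat = dyn_model(s, a_pert)                     # model rollout
        synthetic.append((s, a_pert, r_hat, s_hat))
    return list(transitions) + synthetic

# Toy usage with a hypothetical learned model: identity dynamics s' = s + a.
dataset = [(np.zeros(2), np.ones(2), 1.0, np.ones(2))]
model = lambda s, a: (0.0, s + a)
augmented = augment_offline_data(dataset, model)
```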
arXiv Detail & Related papers (2023-12-15T14:49:41Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a 2.5× improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
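Uncertainty-triggered injection of this kind is often implemented with an ensemble of value estimates: when ensemble members disagree on a state-action pair, the agent is uncertain and expert data is requested. The helper below is a generic illustration of that trigger, not the paper's algorithm; `should_inject_demo` is a hypothetical name.

```python
import numpy as np

def should_inject_demo(q_ensemble, state, action, threshold=0.5):
    """Trigger expert-demonstration injection on high Q-ensemble disagreement.

    q_ensemble: list of callables q(state, action) -> float, one per member.
    Returns True when the standard deviation of the members' estimates
    exceeds threshold, i.e. the agent is uncertain about (state, action).
    """
    estimates = np.array([q(state, action) for q in q_ensemble])
    return bool(estimates.std() > threshold)

# Two hypothetical critics that disagree on an action's value trigger
# injection; two that agree do not.
uncertain = should_inject_demo([lambda s, a: 0.0, lambda s, a: 2.0], 0, 1)
confident = should_inject_demo([lambda s, a: 1.0, lambda s, a: 1.0], 0, 1)
```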
arXiv Detail & Related papers (2022-12-16T01:41:59Z)
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z)
- How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation [17.562522787934178]
Reinforcement learning (RL) has been shown to be effective at learning control from experience.
RL typically requires a large amount of online interaction with the environment.
We investigate ways to minimize online interactions in a target task, by reusing a suboptimal policy.
arXiv Detail & Related papers (2022-05-06T16:38:59Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
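AWAC's core idea is to weight each dataset sample's log-likelihood by the exponentiated advantage, so the policy imitates the data while shifting probability mass toward high-advantage actions. The tabular sketch below illustrates only that weighting, with assumed advantages; it is not the AWAC implementation, and `awac_update` is a hypothetical name.

```python
import numpy as np

def awac_update(logits, states, actions, advantages, lam=1.0, lr=0.1):
    """One advantage-weighted actor step on a tabular softmax policy.

    logits: (n_states, n_actions) array of policy parameters, updated in place.
    Each offline sample (s, a) is weighted by exp(A(s, a) / lam), so the policy
    stays close to the data while favoring high-advantage actions.
    """
    weights = np.exp(np.clip(np.asarray(advantages) / lam, -10.0, 10.0))
    for s, a, w in zip(states, actions, weights):
        probs = np.exp(logits[s] - logits[s].max())       # softmax pi(.|s)
        probs /= probs.sum()
        grad = -probs                                     # d log pi(a|s) / d logits
        grad[a] += 1.0
        logits[s] += lr * w * grad                        # weighted LL ascent
    return logits

# One state, two actions: both appear in the dataset, but action 0 has higher
# (assumed) advantage, so its probability should increase after the update.
logits = awac_update(np.zeros((1, 2)), [0, 0], [0, 1], [1.0, -1.0])
```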
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.