Offline Robot Reinforcement Learning with Uncertainty-Guided Human
Expert Sampling
- URL: http://arxiv.org/abs/2212.08232v1
- Date: Fri, 16 Dec 2022 01:41:59 GMT
- Title: Offline Robot Reinforcement Learning with Uncertainty-Guided Human
Expert Sampling
- Authors: Ashish Kumar, Ilya Kuzovkin
- Abstract summary: Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
- Score: 11.751910133386254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in batch (offline) reinforcement learning have shown
promising results in learning from available offline data and have established
offline reinforcement learning as an essential tool for learning control policies
in a model-free setting. An offline reinforcement learning algorithm applied to
a dataset collected by a suboptimal non-learning-based algorithm can result in
a policy that outperforms the behavior agent used to collect the data. Such a
scenario is frequent in robotics, where existing automation is collecting
operational data. Although offline learning techniques can learn from data
generated by a sub-optimal behavior agent, there is still an opportunity to
improve the sample complexity of existing offline reinforcement learning
algorithms by strategically introducing human demonstration data into the
training process. To this end, we propose a novel approach that uses
uncertainty estimation to trigger the injection of human demonstration data and
guide policy training towards optimal behavior while reducing overall sample
complexity. Our experiments show that this approach is more sample efficient
when compared to a naive way of combining expert data with data collected from
a sub-optimal agent. We augmented an existing offline reinforcement learning
algorithm, Conservative Q-Learning, with our approach and performed experiments
on data collected from the MuJoCo and OffWorld Gym learning environments.
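The abstract describes the mechanism only at a high level: an uncertainty signal decides when human demonstrations are injected into the training stream of an offline learner such as Conservative Q-Learning. It does not say how uncertainty is estimated or how the data are mixed. Below is a minimal sketch of one plausible instantiation in Python, assuming Q-ensemble disagreement as the trigger and a fixed mixing fraction; the function names, buffer layout, and hyperparameters are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_uncertainty(q_ensemble, states, actions):
    """Disagreement (std. dev. across ensemble members) as an uncertainty proxy."""
    preds = np.stack([q(states, actions) for q in q_ensemble], axis=0)
    return float(preds.std(axis=0).mean())

def sample_batch(buffer, batch_size):
    """buffer: dict of equally sized numpy arrays (states, actions, rewards, ...)."""
    idx = rng.integers(len(buffer["states"]), size=batch_size)
    return {k: v[idx] for k, v in buffer.items()}

def mix_in_expert(agent_batch, expert_batch, expert_fraction=0.25):
    """Replace a fraction of the agent batch with expert transitions."""
    n = int(expert_fraction * len(agent_batch["states"]))
    return {k: np.concatenate([agent_batch[k][n:], expert_batch[k][:n]])
            for k in agent_batch}

def uncertainty_guided_batches(agent_buffer, expert_buffer, q_ensemble,
                               batch_size=256, threshold=0.5):
    """Yield training batches for an offline RL learner (e.g. CQL),
    injecting expert demonstrations only when uncertainty is high."""
    while True:
        batch = sample_batch(agent_buffer, batch_size)
        if ensemble_uncertainty(q_ensemble, batch["states"], batch["actions"]) > threshold:
            batch = mix_in_expert(batch, sample_batch(expert_buffer, batch_size))
        yield batch
```

In this sketch the yielded batches would feed an unmodified CQL update; the uncertainty threshold and expert fraction remain free hyperparameters.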
Related papers
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
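One simple reading of "intervention signals themselves as rewards" in the RLIF entry above is to penalize transitions on which the expert takes over, so that an off-policy RL algorithm learns to avoid states that trigger interventions. A minimal sketch under that assumption; the reward convention here is illustrative, not necessarily the paper's exact definition.

```python
import numpy as np

def intervention_rewards(intervened):
    """Map a binary intervention flag per timestep to a reward signal.

    intervened: array of 0/1 flags, 1 where the human expert took over.
    Returns -1 on intervention steps and 0 elsewhere, so maximizing return
    means minimizing how often the policy provokes an intervention.
    """
    return -np.asarray(intervened, dtype=np.float64)
```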
- Benchmarking Offline Reinforcement Learning on Real-Robot Hardware [35.29390454207064]
Dexterous manipulation in particular remains an open problem in its general form.
We propose a benchmark including a large collection of data for offline learning from a dexterous manipulation platform on two tasks.
We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
arXiv Detail & Related papers (2023-07-28T17:29:49Z)
- Robust online active learning [0.7734726150561089]
This work investigates the performance of online active linear regression in contaminated data streams.
We propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator.
arXiv Detail & Related papers (2023-02-01T13:14:26Z)
- Improving Behavioural Cloning with Positive Unlabeled Learning [15.484227081812852]
We propose a novel iterative learning algorithm for identifying expert trajectories in mixed-quality robotics datasets.
Applying behavioral cloning to the resulting filtered dataset outperforms several competitive offline reinforcement learning and imitation learning baselines.
arXiv Detail & Related papers (2023-01-27T14:17:45Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflow for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
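For context on the AWAC entry above: AWAC's actor update is commonly described as advantage-weighted regression, i.e. a supervised policy update in which actions from the replay buffer (offline demonstrations plus online experience) are weighted by exponentiated critic advantages. A minimal sketch of that loss follows; the names, temperature, and clipping choice are illustrative assumptions rather than the paper's setting.

```python
import numpy as np

def awac_actor_loss(log_probs, advantages, temperature=1.0, weight_clip=20.0):
    """Advantage-weighted regression loss for the actor.

    log_probs:  log pi(a|s) for state-action pairs from the replay buffer.
    advantages: critic estimates A(s, a) = Q(s, a) - V(s) for the same pairs.
    Actions with higher advantage get exponentially larger weight, so the
    policy is pulled toward the best behavior present in the data.
    """
    weights = np.minimum(np.exp(advantages / temperature), weight_clip)
    return -float(np.mean(weights * log_probs))
```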
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.