Automatic Evaluation of Excavator Operators using Learned Reward
Functions
- URL: http://arxiv.org/abs/2211.07941v1
- Date: Tue, 15 Nov 2022 06:58:00 GMT
- Title: Automatic Evaluation of Excavator Operators using Learned Reward
Functions
- Authors: Pranav Agarwal, Marek Teichmann, Sheldon Andrews, Samira Ebrahimi
Kahou
- Abstract summary: We propose a novel strategy for the automatic evaluation of excavator operators.
We take into account the internal dynamics of the excavator and a safety criterion at every time step to evaluate performance.
A policy learned using these external reward-prediction models yields safer solutions than one trained with task-based rewards alone.
- Score: 5.372817906484557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training novice users to operate an excavator and acquire different skills
requires the presence of expert teachers. Given the complexity of the task, skilled
experts are comparatively expensive to find, as the evaluation process is time-consuming
and demands sustained focus. Moreover, since human evaluators tend to be biased, the
evaluation process is noisy and leads to high variance in the final scores of operators
with similar skill. In this work, we address these issues and propose a novel strategy
for the automatic evaluation of excavator operators. We take into account the internal
dynamics of the excavator and a safety criterion at every time step to evaluate
performance. To further validate our approach, we use this score-prediction model as a
source of reward for a reinforcement learning agent that learns to maneuver an excavator
in a simulated environment closely replicating real-world dynamics. A policy learned with
these external reward-prediction models produces safer solutions that follow the required
dynamic constraints, compared to a policy trained with task-based reward functions only,
bringing it one step closer to real-life adoption. For future research, we release our
codebase at https://github.com/pranavAL/InvRL_Auto-Evaluate and video results at
https://drive.google.com/file/d/1jR1otOAu8zrY8mkhUOUZW9jkBOAKK71Z/view?usp=share_link .
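The abstract describes feeding a learned score-prediction model, which evaluates the excavator's internal dynamics and a safety criterion at every time step, back to a reinforcement learning agent as an external reward alongside the task reward. The sketch below is a minimal illustration of that idea, not the authors' released implementation: the environment, the linear reward model, and all names (ExcavatorSimStub, LearnedRewardModel, predict_score, beta) are hypothetical placeholders.

```python
# Minimal sketch (hypothetical, not the paper's code): a learned
# score-prediction model used as an external per-step reward for an RL agent.
import numpy as np

rng = np.random.default_rng(0)


class ExcavatorSimStub:
    """Stand-in for a physics-based excavator simulator.

    State: [swing_vel, boom_vel, bucket_fill, min_clearance_to_obstacle].
    """

    def reset(self):
        self.state = rng.uniform(-1.0, 1.0, size=4)
        return self.state

    def step(self, action):
        # Toy dynamics: the action nudges the velocities; fill level and
        # clearance drift with small random noise.
        self.state = self.state + 0.1 * np.concatenate(
            [action, rng.normal(0.0, 0.05, size=2)]
        )
        task_reward = float(self.state[2])  # e.g. bucket fill level
        done = False
        return self.state, task_reward, done


class LearnedRewardModel:
    """Score-prediction model trained offline on expert evaluations.

    Here it is a fixed linear function for illustration; in practice it would
    be a regressor fit to (state, expert score) pairs.
    """

    def __init__(self):
        # Penalize aggressive swing/boom velocities, reward fill and clearance.
        self.w = np.array([-0.5, -0.5, 1.0, 0.8])

    def predict_score(self, state):
        return float(self.w @ state)


def rollout(env, reward_model, horizon=50, beta=0.5):
    """Collect one trajectory whose reward mixes task and learned scores."""
    state = env.reset()
    total = 0.0
    for _ in range(horizon):
        action = rng.uniform(-1.0, 1.0, size=2)  # placeholder policy
        state, task_r, done = env.step(action)
        # The learned reward evaluates dynamics/safety at every time step.
        shaped_r = task_r + beta * reward_model.predict_score(state)
        total += shaped_r
        if done:
            break
    return total


if __name__ == "__main__":
    env = ExcavatorSimStub()
    model = LearnedRewardModel()
    returns = [rollout(env, model) for _ in range(5)]
    print("mean shaped return:", np.mean(returns))
```

The mixing coefficient beta is an assumed knob: raising it weights the learned dynamics/safety score more heavily relative to the task reward, which is the trade-off the abstract contrasts with policies trained on task-based rewards only.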
Related papers
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning [26.05184273238923]
This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL).
We devise a principled algorithm (namely CLARE) that solves offline IRL efficiently via integrating "conservatism" into a learned reward function.
Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy.
arXiv Detail & Related papers (2023-02-09T17:16:29Z)
- Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum Games [30.720112378448285]
We formulate inverse reinforcement learning as an expert-learner interaction.
The optimal performance intent of an expert or target agent is unknown to a learner agent.
We develop an off-policy IRL algorithm that does not require knowledge of the expert and learner agent dynamics.
arXiv Detail & Related papers (2023-01-05T10:35:08Z)
- Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Self-Supervised Exploration via Latent Bayesian Surprise [4.088019409160893]
In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning.
We extensively evaluate our model by measuring the agent's performance in terms of environment exploration.
Our model is cheap and empirically shows state-of-the-art performance on several problems.
arXiv Detail & Related papers (2021-04-15T14:40:16Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- A New Framework for Query Efficient Active Imitation Learning [5.167794607251493]
A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive.
We propose a new framework for imitation learning (IL) algorithm that actively and interactively learns a model of the user's reward function with efficient queries.
We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games.
arXiv Detail & Related papers (2019-12-30T18:12:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.