Decision Making for Human-in-the-loop Robotic Agents via
Uncertainty-Aware Reinforcement Learning
- URL: http://arxiv.org/abs/2303.06710v2
- Date: Tue, 14 Mar 2023 16:16:58 GMT
- Title: Decision Making for Human-in-the-loop Robotic Agents via
Uncertainty-Aware Reinforcement Learning
- Authors: Siddharth Singi, Zhanpeng He, Alvin Pan, Sandip Patel, Gunnar A.
Sigurdsson, Robinson Piramuthu, Shuran Song, Matei Ciocarlie
- Abstract summary: In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed.
We present a Reinforcement Learning based approach to this problem, where a semi-autonomous agent asks for external assistance when it has low confidence in the eventual success of the task.
We show that our method makes effective use of a limited budget of expert calls at run-time, despite having no access to the expert at training time.
- Score: 13.184897303302971
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly
autonomously in solving a task, but can request help from an external expert
when needed. However, knowing when to request such assistance is critical: too
few requests can lead to the robot making mistakes, but too many requests can
overload the expert. In this paper, we present a Reinforcement Learning based
approach to this problem, where a semi-autonomous agent asks for external
assistance when it has low confidence in the eventual success of the task. The
confidence level is computed by estimating the variance of the return from the
current state. We show that this estimate can be iteratively improved during
training using a Bellman-like recursion. On discrete navigation problems with
both fully- and partially-observable state information, we show that our method
makes effective use of a limited budget of expert calls at run-time, despite
having no access to the expert at training time.
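The abstract sketches the core mechanism: confidence is derived from the variance of the return from the current state, maintained during training by a Bellman-like recursion. Below is a minimal tabular sketch of one such recursion, using the standard second-moment identity M(s,a) = E[r^2 + 2*gamma*r*Q(s',a') + gamma^2*M(s',a')] with Var = M - Q^2; this is a plausible instantiation, not necessarily the authors' exact update, and all names and hyperparameters are illustrative.

```python
import numpy as np

# Minimal tabular sketch: learn the value Q and the second moment M of the
# return with TD updates, then read off a variance estimate Var = M - Q^2.
# The second-moment target follows the identity
#   M(s,a) = E[ r^2 + 2*gamma*r*Q(s',a') + gamma^2 * M(s',a') ],
# one "Bellman-like recursion" for return variance.

def td_update(Q, M, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One TD step for both the value estimate Q and the second moment M."""
    q_target = r + gamma * Q[s_next, a_next]
    m_target = r**2 + 2 * gamma * r * Q[s_next, a_next] + gamma**2 * M[s_next, a_next]
    Q[s, a] += alpha * (q_target - Q[s, a])
    M[s, a] += alpha * (m_target - M[s, a])

def return_variance(Q, M, s, a):
    """Variance of the return from (s, a); clipped since estimates can cross."""
    return max(M[s, a] - Q[s, a] ** 2, 0.0)

def should_ask_for_help(Q, M, s, a, threshold, budget_left):
    """Request the external expert when predicted-return variance is high
    (low confidence in eventual success) and the call budget allows it."""
    return budget_left > 0 and return_variance(Q, M, s, a) > threshold
```

In this sketch the help threshold is a free parameter; in practice it would be tuned so that the expected number of expert calls stays within the limited run-time budget the abstract mentions.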
Related papers
- Getting By Goal Misgeneralization With a Little Help From a Mentor [5.012314384895538]
This paper explores whether allowing an agent to ask for help from a supervisor in unfamiliar situations can mitigate goal misgeneralization.
We focus on agents trained with PPO in the CoinRun environment, a setting known to exhibit goal misgeneralization.
We find that methods based on the agent's internal state fail to proactively request help, instead waiting until mistakes have already occurred.
arXiv Detail & Related papers (2024-10-28T14:07:41Z)
- Automatic Evaluation of Excavator Operators using Learned Reward Functions [5.372817906484557]
We propose a novel strategy for the automatic evaluation of excavator operators.
We take into account the internal dynamics of the excavator and the safety criterion at every time step to evaluate the performance.
Our results demonstrate that policies learned using these external reward prediction models yield safer solutions.
arXiv Detail & Related papers (2022-11-15T06:58:00Z)
- When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning [57.53138994155612]
A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world.
A critical challenge is the presence of irreversible states that require external assistance to recover from, such as when a robot arm has pushed an object off a table.
We propose an algorithm that efficiently learns to detect and avoid irreversible states, and proactively asks for help in case the agent does enter one (see the sketch below).
arXiv Detail & Related papers (2022-10-19T17:57:24Z)
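As referenced above, a toy version of the proactive-intervention trigger: here `p_irrev` is a hypothetical learned detector of irreversible states and `ask_expert` stands in for the external helper. Both names, the gym-style `env` interface, and the threshold are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch of a proactive-intervention trigger: p_irrev(s) estimates
# the probability that state s is irreversible (e.g. an object pushed off the
# table).  The paper learns such a detector from data; here it is simply a
# placeholder function supplied by the caller.

def step_with_interventions(env, policy, p_irrev, ask_expert, tau=0.9):
    """Run one episode, proactively requesting help near irreversible states.

    p_irrev: state -> estimated probability the state is irreversible (assumed).
    ask_expert: state -> recovery action from the external helper (assumed).
    tau: confidence threshold above which the agent asks for help.
    """
    s = env.reset()
    done = False
    while not done:
        if p_irrev(s) > tau:
            a = ask_expert(s)          # proactive intervention
        else:
            a = policy(s)              # act autonomously
        s, r, done, info = env.step(a)
    return s
```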
- Learning to Guide Multiple Heterogeneous Actors from a Single Human Demonstration via Automatic Curriculum Learning in StarCraft II [0.5911087507716211]
In this work, we aim to train deep reinforcement learning agents that can command multiple heterogeneous actors.
Our results show that an agent trained via automated curriculum learning can outperform state-of-the-art deep reinforcement learning baselines.
arXiv Detail & Related papers (2022-05-11T21:53:11Z)
- Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By modeling the decision-making processes underlying a set of observed trajectories as online learning, we cast policy inference as the inverse of that online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution (a toy version is sketched below).
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
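A toy version of the NML computation mentioned above, via brute-force conditional NML (CNML): refit the classifier once per candidate label for the query point, score each refit model on its hypothesized label, and normalize. scikit-learn's `LogisticRegression` stands in for the paper's classifier; MURAL's actual contribution is meta-learning to amortize exactly this expensive refitting, so the loop here is only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Brute-force CNML for a binary success classifier.  Assumes the labeled
# data (X, y) already contains examples of both classes 0 and 1.

def cnml_success_probability(X, y, x_query):
    """p_CNML(success | x_query) with labels {0, 1}, success = label 1."""
    scores = []
    for label in (0, 1):
        # augment the dataset with the query point under the hypothesized label
        X_aug = np.vstack([X, x_query[None, :]])
        y_aug = np.append(y, label)
        model = LogisticRegression().fit(X_aug, y_aug)
        # likelihood the refit model assigns to its own hypothesized label
        scores.append(model.predict_proba(x_query[None, :])[0, label])
    return scores[1] / (scores[0] + scores[1])
```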
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods (a minimal sketch of the preference-learning step follows below).
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
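A minimal sketch of the preference-based reward-learning step that feedback-driven methods in this line (PEBBLE among them) build on: a reward model trained so the segment the human preferred has higher summed predicted reward, under a Bradley-Terry model. The network shape, observation dimension, and optimizer settings are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

# Reward model r_psi over single observations; segment return is the sum of
# per-step predicted rewards.  obs_dim=4 is an arbitrary illustrative choice.
reward_model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=3e-4)

def preference_loss(seg_a, seg_b, pref):
    """seg_a, seg_b: (T, obs_dim) tensors; pref: 1.0 if the human preferred
    segment A, 0.0 if segment B."""
    ret_a = reward_model(seg_a).sum()
    ret_b = reward_model(seg_b).sum()
    # probability that A is preferred under the Bradley-Terry model
    p_a = torch.sigmoid(ret_a - ret_b)
    return -(pref * torch.log(p_a) + (1 - pref) * torch.log(1 - p_a))

# one gradient step on a labeled preference pair (random data for illustration)
seg_a, seg_b = torch.randn(50, 4), torch.randn(50, 4)
loss = preference_loss(seg_a, seg_b, pref=1.0)
opt.zero_grad(); loss.backward(); opt.step()
```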
- AvE: Assistance via Empowerment [77.08882807208461]
We propose a new paradigm for assistance: rather than inferring the human's goal, the agent increases the human's ability to control their environment.
This task-agnostic objective preserves the person's autonomy and ability to achieve any eventual state.
arXiv Detail & Related papers (2020-06-26T04:40:11Z)
- Should artificial agents ask for help in human-robot collaborative problem-solving? [0.7251305766151019]
We propose to start from hypotheses derived from an empirical study of human-robot interaction.
We test whether receiving help from an expert while solving a simple closed-ended task accelerates the learning of that task.
Our experiments lead us to conclude that, whether the help is requested or not, a Q-learning algorithm benefits from expert help in the same way children do (a toy version is sketched below).
arXiv Detail & Related papers (2020-05-25T09:15:30Z)
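A toy version of the comparison described above: tabular Q-learning in which an expert occasionally supplies the action in place of the epsilon-greedy learner. The fixed help probability and the gym-style `env`/`expert` interfaces are assumptions for illustration, not the paper's protocol.

```python
import numpy as np

# Tabular Q-learning with occasional expert help: on a fraction help_prob of
# steps the expert chooses the action; otherwise the agent acts epsilon-greedily.

def q_learning_with_expert(env, expert, n_states, n_actions,
                           episodes=500, help_prob=0.1,
                           alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if rng.random() < help_prob:
                a = expert(s)                      # expert-provided action
            elif rng.random() < eps:
                a = int(rng.integers(n_actions))   # explore
            else:
                a = int(np.argmax(Q[s]))           # exploit
            s2, r, done, _ = env.step(a)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
            s = s2
    return Q
```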