Asking for Help: Failure Prediction in Behavioral Cloning through Value
Approximation
- URL: http://arxiv.org/abs/2302.04334v1
- Date: Wed, 8 Feb 2023 20:56:23 GMT
- Title: Asking for Help: Failure Prediction in Behavioral Cloning through Value
Approximation
- Authors: Cem Gokmen, Daniel Ho, Mohi Khansari
- Abstract summary: We introduce Behavioral Cloning Value Approximation (BCVA), an approach to learning a state value function based on and trained jointly with a Behavioral Cloning policy.
We demonstrate the effectiveness of BCVA by applying it to the challenging mobile manipulation task of latched-door opening.
- Score: 8.993237527071756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in end-to-end Imitation Learning approaches has shown
promising results and generalization capabilities on mobile manipulation tasks.
Such models are seeing increasing deployment in real-world settings, where
scaling up requires robots to be able to operate with high autonomy, i.e.
requiring as little human supervision as possible. In order to avoid the need
for one-on-one human supervision, robots need to be able to detect and prevent
policy failures ahead of time, and ask for help, allowing a remote operator to
supervise multiple robots and help when needed. However, the black-box nature
of end-to-end Imitation Learning models such as Behavioral Cloning, as well as
the lack of an explicit state-value representation, make it difficult to
predict failures. To this end, we introduce Behavioral Cloning Value
Approximation (BCVA), an approach to learning a state value function based on
and trained jointly with a Behavioral Cloning policy that can be used to
predict failures. We demonstrate the effectiveness of BCVA by applying it to
the challenging mobile manipulation task of latched-door opening, showing that
we can identify failure scenarios with 86% precision and 81% recall,
evaluated on over 2,000 real-world runs, improving upon the baseline of simple
failure classification by 10 percentage points.
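The core pattern described in the abstract, a value head trained jointly with the Behavioral Cloning policy and thresholded to decide when to ask for help, can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the network sizes, the fixed random trunk, and the names `joint_loss` and `ask_for_help` (including the threshold value) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared trunk: a fixed random projection of a 16-dim observation.
W_trunk = rng.normal(size=(16, 8))
W_action = rng.normal(size=(8, 2)) * 0.1   # BC action head
W_value = rng.normal(size=(8, 1)) * 0.1    # state-value head

def forward(obs):
    """Shared features feed both the action head and the value head."""
    h = np.tanh(obs @ W_trunk)
    return h @ W_action, (h @ W_value).squeeze(-1)

def joint_loss(obs, demo_action, value_target, beta=0.5):
    """BC regression loss plus a weighted value-approximation loss,
    so both heads are trained jointly on the same demonstration data."""
    pred_action, pred_value = forward(obs)
    bc = np.mean((pred_action - demo_action) ** 2)
    val = np.mean((pred_value - value_target) ** 2)
    return bc + beta * val

def ask_for_help(obs, threshold=0.3):
    """Predict failure, and request operator help, when the learned
    state value drops below a calibrated threshold."""
    _, values = forward(obs)
    return values < threshold
```

The key design point is that the value head shares the policy's representation, so failure prediction comes almost for free on top of the BC training pipeline.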
Related papers
- Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
A key limitation of learned robot control policies is their inability to generalize outside their training data.
Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability.
We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z)
- Conformalized Teleoperation: Confidently Mapping Human Inputs to High-Dimensional Robot Actions [4.855534476454559]
We learn a mapping from low-dimensional human inputs to high-dimensional robot actions.
Our key idea is to adapt the assistive map at training time to additionally estimate high-dimensional action quantiles.
We propose an uncertainty-interval-based mechanism for detecting high-uncertainty user inputs and robot states.
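The quantile-and-interval idea summarized above can be sketched in split-conformal form. This is a hedged illustration of the general mechanism, not that paper's method: the function names `calibrate_radius` and `is_high_uncertainty`, the nonconformity scores, and the width threshold are all assumptions for the sketch.

```python
import numpy as np

def calibrate_radius(residuals, alpha=0.1):
    """Split-conformal calibration: the finite-sample-corrected (1 - alpha)
    quantile of held-out nonconformity scores yields an interval radius
    with distribution-free coverage."""
    n = len(residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(residuals, level))

def is_high_uncertainty(lower_q, upper_q, width_threshold):
    """Flag a user input / robot state whose predicted action-quantile
    interval is wider than an operator-chosen threshold."""
    return (upper_q - lower_q) > width_threshold
```

A wide interval signals that the assistive map is unsure which high-dimensional action the low-dimensional input corresponds to, which is exactly when a teleoperation system should slow down or ask for clarification.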
arXiv Detail & Related papers (2024-06-11T23:16:46Z)
- IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning [43.19346528232497]
A popular approach for increasing policy robustness to distribution shift is interactive imitation learning.
We propose IntervenGen, a novel data generation system that can autonomously produce a large set of corrective interventions.
We show that it can increase policy robustness by up to 39x with only 10 human interventions.
arXiv Detail & Related papers (2024-05-02T17:06:19Z)
- Model-Based Runtime Monitoring with Interactive Imitation Learning [30.70994322652745]
This work aims to endow a robot with the ability to monitor and detect errors during task execution.
We introduce a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures.
Our method outperforms the baselines across system-level and unit-test metrics, with 23% and 40% higher success rates in simulation and on physical hardware, respectively.
arXiv Detail & Related papers (2023-10-26T16:45:44Z)
- Distributional Instance Segmentation: Modeling Uncertainty and High Confidence Predictions with Latent-MaskRCNN [77.0623472106488]
In this paper, we explore a class of distributional instance segmentation models using latent codes.
For robotic picking applications, we propose a confidence mask method to achieve the high precision necessary.
We show that our method can significantly reduce critical errors in robotic systems, including on our newly released dataset of ambiguous scenes.
arXiv Detail & Related papers (2023-05-03T05:57:29Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Optimal decision making in robotic assembly and other trial-and-error tasks [1.0660480034605238]
We study a class of problems providing (1) low-entropy indicators of terminal success / failure, and (2) unreliable (high-entropy) data to predict the final outcome of an ongoing task.
We derive a closed form solution that predicts makespan based on the confusion matrix of the failure predictor.
This allows the robot to learn failure prediction in a production environment, and only adopt a preemptive policy when it actually saves time.
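The "only preempt when it actually saves time" decision can be illustrated with a simple renewal-reward calculation: compare the expected wall-clock time per successful completion with and without preemption, given the failure predictor's confusion-matrix rates. This is a simplified toy model, not the paper's closed-form solution; the names and the specific cost accounting (aborts cost `t_abort`, full runs cost `t_full`) are assumptions.

```python
def makespan_per_success(p_fail, recall, false_alarm, t_full, t_abort):
    """Expected time per successful completion under a preemptive policy:
    E[cost per attempt] / P(success per attempt)."""
    p_tp = p_fail * recall                   # failure caught early, abort
    p_fn = p_fail * (1 - recall)             # failure runs to the end
    p_fp = (1 - p_fail) * false_alarm        # good run wrongly aborted
    p_ok = (1 - p_fail) * (1 - false_alarm)  # good run completes
    cost = (p_tp + p_fp) * t_abort + (p_fn + p_ok) * t_full
    return cost / p_ok

def adopt_preemption(p_fail, recall, false_alarm, t_full, t_abort):
    """Adopt the preemptive policy only if it beats the no-predictor
    baseline of t_full per attempt with success probability 1 - p_fail."""
    baseline = t_full / (1 - p_fail)
    return makespan_per_success(p_fail, recall, false_alarm,
                                t_full, t_abort) < baseline
```

For example, a perfect predictor (recall 1, no false alarms) always helps, while a predictor that only raises false alarms aborts good runs and makes things worse.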
arXiv Detail & Related papers (2023-01-25T22:07:50Z)
- Efficiently Learning Recoveries from Failures Under Partial Observability [31.891933360081342]
We present a general approach for robustifying manipulation strategies in a sample-efficient manner.
Our approach incrementally improves robustness by first discovering the failure modes of the current strategy.
We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning.
arXiv Detail & Related papers (2022-09-27T18:00:55Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
arXiv Detail & Related papers (2020-12-12T05:30:35Z)
- Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to user and entity behavioural analytics (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.