Optimal decision making in robotic assembly and other trial-and-error tasks
- URL: http://arxiv.org/abs/2301.10846v1
- Date: Wed, 25 Jan 2023 22:07:50 GMT
- Title: Optimal decision making in robotic assembly and other trial-and-error tasks
- Authors: James Watson, Nikolaus Correll
- Abstract summary: We study a class of problems providing (1) low-entropy indicators of terminal success / failure, and (2) unreliable (high-entropy) data to predict the final outcome of an ongoing task.
We derive a closed form solution that predicts makespan based on the confusion matrix of the failure predictor.
This allows the robot to learn failure prediction in a production environment, and only adopt a preemptive policy when it actually saves time.
- Score: 1.0660480034605238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uncertainty in perception, actuation, and the environment often
requires multiple attempts for a robotic task to succeed. We study a class of
problems providing (1) low-entropy indicators of terminal success / failure,
and (2) unreliable (high-entropy) data to predict the final outcome of an
ongoing task. Examples include a robot trying to connect with a charging
station, parallel parking, or assembling a tightly-fitting part. The ability to
restart after predicting failure early, versus simply running to failure, can
significantly decrease the makespan, that is, the total time to completion,
with the drawback of potentially short-cutting an otherwise successful
operation. Assuming task running times to be Poisson distributed, and using a
Markov Jump process to capture the dynamics of the underlying Markov Decision
Process, we derive a closed-form solution that predicts makespan based on the
confusion matrix of the failure predictor. This allows the robot to learn
failure prediction in a production environment, and only adopt a preemptive
policy when it actually saves time. We demonstrate this approach on a
peg-in-hole assembly task with a real robotic system. Failures are
predicted by a dilated convolutional network based on force-torque data,
showing an average makespan reduction from 101s to 81s (N=120, p<0.05). We
posit that the proposed algorithm generalizes to any robotic behavior with an
unambiguous terminal reward, with wide-ranging implications for how robots can
learn and improve their behaviors in the wild.
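The preempt-versus-run-to-failure trade-off described above can be sketched with a simplified renewal-process model. This is not the paper's exact Markov-jump-process derivation (which assumes Poisson-distributed running times); all parameter names here are illustrative, and the predictor is summarized only by its confusion-matrix rates.

```python
def makespan_run_to_failure(p_success, t_success, t_failure):
    """Expected total time when every attempt runs to its terminal outcome.

    Each failed attempt costs t_failure and is retried until success.
    """
    return (p_success * t_success + (1 - p_success) * t_failure) / p_success


def makespan_preemptive(p_success, t_success, t_failure, t_abort, tpr, fpr):
    """Expected total time when a failure predictor may abort an attempt
    early at time t_abort, summarized by its confusion-matrix rates:

    tpr: P(abort | attempt would fail)    -- true positive rate
    fpr: P(abort | attempt would succeed) -- false positive rate
    """
    p_terminal_success = p_success * (1 - fpr)
    if p_terminal_success <= 0:
        # Every would-be success is short-cut; the task never completes.
        return float("inf")
    expected_attempt_time = (
        p_success * (1 - fpr) * t_success                      # success, predictor stays quiet
        + (p_success * fpr + (1 - p_success) * tpr) * t_abort  # aborted early, retry
        + (1 - p_success) * (1 - tpr) * t_failure              # failure run to the end, retry
    )
    return expected_attempt_time / p_terminal_success


# Adopt the preemptive policy only when it actually saves time:
baseline = makespan_run_to_failure(0.5, 10.0, 10.0)
preempt = makespan_preemptive(0.5, 10.0, 10.0, t_abort=3.0, tpr=0.9, fpr=0.1)
policy = "preempt" if preempt < baseline else "run to failure"
```

With these illustrative numbers the predictor's early aborts outweigh the occasional short-cut success, so the preemptive policy wins; a predictor with tpr = fpr = 0 never fires and recovers the run-to-failure makespan exactly.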
Related papers
- Generalizability of Graph Neural Networks for Decentralized Unlabeled Motion Planning [72.86540018081531]
Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance.
This problem forms an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation.
We address this problem in a decentralized setting where each robot knows only the positions of its $k$-nearest robots and $k$-nearest targets.
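As a deliberately naive illustration of the assignment problem above, the sketch below matches robots to targets greedily by distance. It is centralized and ignores collision avoidance, unlike the decentralized GNN policy the paper proposes; all names are illustrative.

```python
import math


def greedy_assign(robots, targets):
    """Return {robot_index: target_index} by repeatedly matching the
    closest remaining robot-target pair (positions are (x, y) tuples)."""
    pairs = sorted(
        (math.dist(r, t), i, j)
        for i, r in enumerate(robots)
        for j, t in enumerate(targets)
    )
    assignment, used_targets = {}, set()
    for _, i, j in pairs:
        if i not in assignment and j not in used_targets:
            assignment[i] = j
            used_targets.add(j)
    return assignment


result = greedy_assign([(0, 0), (5, 5)], [(6, 5), (1, 0)])  # {0: 1, 1: 0}
```

Each robot is sent to its nearby target rather than the distant one; the learned decentralized policies the paper studies must achieve similar assignments while each robot sees only its k-nearest robots and targets.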
arXiv Detail & Related papers (2024-09-29T23:57:25Z)
- Learning to Recover from Plan Execution Errors during Robot Manipulation: A Neuro-symbolic Approach [7.768747914019512]
We propose an approach (blending learning with symbolic search) for automated error discovery and recovery.
We present an anytime version of our algorithm, where instead of recovering to the last correct state, we search for a sub-goal in the original plan.
arXiv Detail & Related papers (2024-05-29T10:03:57Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Distributional Instance Segmentation: Modeling Uncertainty and High Confidence Predictions with Latent-MaskRCNN [77.0623472106488]
In this paper, we explore a class of distributional instance segmentation models using latent codes.
For robotic picking applications, we propose a confidence mask method to achieve the high precision necessary.
We show that our method can significantly reduce critical errors in robotic systems, including our newly released dataset of ambiguous scenes.
arXiv Detail & Related papers (2023-05-03T05:57:29Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Asking for Help: Failure Prediction in Behavioral Cloning through Value Approximation [8.993237527071756]
We introduce Behavioral Cloning Value Approximation (BCVA), an approach to learning a state value function based on and trained jointly with a Behavioral Cloning policy.
We demonstrate the effectiveness of BCVA by applying it to the challenging mobile manipulation task of latched-door opening.
arXiv Detail & Related papers (2023-02-08T20:56:23Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve with unprecedented sample efficiency some challenging simulated tasks such as humanoid locomotion and stand-up.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
Recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z)
- Learning from Sparse Demonstrations [17.24236148404065]
The paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few demonstrated examples.
The method finds an objective function and a time-warping function such that the robot's resulting trajectories sequentially follow the demonstrated trajectories with minimal discrepancy loss.
The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments.
arXiv Detail & Related papers (2020-08-05T14:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.