Optimal decision making in robotic assembly and other trial-and-error tasks
- URL: http://arxiv.org/abs/2301.10846v1
- Date: Wed, 25 Jan 2023 22:07:50 GMT
- Title: Optimal decision making in robotic assembly and other trial-and-error tasks
- Authors: James Watson, Nikolaus Correll
- Abstract summary: We study a class of problems providing (1) low-entropy indicators of terminal success / failure, and (2) unreliable (high-entropy) data to predict the final outcome of an ongoing task.
We derive a closed form solution that predicts makespan based on the confusion matrix of the failure predictor.
This allows the robot to learn failure prediction in a production environment, and only adopt a preemptive policy when it actually saves time.
- Score: 1.0660480034605238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uncertainty in perception, actuation, and the environment often
requires multiple attempts for a robotic task to succeed. We study a class of
problems providing (1) low-entropy indicators of terminal success / failure,
and (2) unreliable (high-entropy) data to predict the final outcome of an
ongoing task. Examples include a robot trying to connect with a charging
station, parallel parking, or assembling a tightly-fitting part. The ability to
restart after predicting failure early, versus simply running to failure, can
significantly decrease the makespan, that is, the total time to completion,
with the drawback of potentially short-cutting an otherwise successful
operation. Assuming task running times to be Poisson distributed, and using a
Markov Jump process to capture the dynamics of the underlying Markov Decision
Process, we derive a closed-form solution that predicts makespan based on the
confusion matrix of the failure predictor. This allows the robot to learn
failure prediction in a production environment, and only adopt a preemptive
policy when it actually saves time. We demonstrate this approach on a
peg-in-hole assembly task with a real robotic system. Failures are
predicted by a dilated convolutional network based on force-torque data,
showing an average makespan reduction from 101s to 81s (N=120, p<0.05). We
posit that the proposed algorithm generalizes to any robotic behavior with an
unambiguous terminal reward, with wide-ranging implications for how robots can
learn and improve their behaviors in the wild.
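The preempt-versus-run-to-failure trade-off described above can be sketched with a simplified renewal-process model. This is not the paper's exact Markov-jump-process derivation (which assumes Poisson-distributed running times); all parameter names here are illustrative, and the predictor is summarized only by its confusion-matrix rates.

```python
def makespan_run_to_failure(p_success, t_success, t_failure):
    """Expected total time when every attempt runs to its terminal outcome.

    Each failed attempt costs t_failure and is retried until success.
    """
    return (p_success * t_success + (1 - p_success) * t_failure) / p_success


def makespan_preemptive(p_success, t_success, t_failure, t_abort, tpr, fpr):
    """Expected total time when a failure predictor may abort an attempt
    early at time t_abort, summarized by its confusion-matrix rates:

    tpr: P(abort | attempt would fail)    -- true positive rate
    fpr: P(abort | attempt would succeed) -- false positive rate
    """
    p_terminal_success = p_success * (1 - fpr)
    if p_terminal_success <= 0:
        # Every would-be success is short-cut; the task never completes.
        return float("inf")
    expected_attempt_time = (
        p_success * (1 - fpr) * t_success                      # success, predictor stays quiet
        + (p_success * fpr + (1 - p_success) * tpr) * t_abort  # aborted early, retry
        + (1 - p_success) * (1 - tpr) * t_failure              # failure run to the end, retry
    )
    return expected_attempt_time / p_terminal_success


# Adopt the preemptive policy only when it actually saves time:
baseline = makespan_run_to_failure(0.5, 10.0, 10.0)
preempt = makespan_preemptive(0.5, 10.0, 10.0, t_abort=3.0, tpr=0.9, fpr=0.1)
policy = "preempt" if preempt < baseline else "run to failure"
```

With these illustrative numbers the predictor's early aborts outweigh the occasional short-cut success, so the preemptive policy wins; a predictor with tpr = fpr = 0 never fires and recovers the run-to-failure makespan exactly.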
Related papers
- Generalizability of Graph Neural Networks for Decentralized Unlabeled Motion Planning [72.86540018081531]
Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance.
This problem forms an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation.
We address this problem in a decentralized setting where each robot knows only the positions of its $k$-nearest robots and $k$-nearest targets.
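As a deliberately naive illustration of the assignment problem above, the sketch below matches robots to targets greedily by distance. It is centralized and ignores collision avoidance, unlike the decentralized GNN policy the paper proposes; all names are illustrative.

```python
import math


def greedy_assign(robots, targets):
    """Return {robot_index: target_index} by repeatedly matching the
    closest remaining robot-target pair (positions are (x, y) tuples)."""
    pairs = sorted(
        (math.dist(r, t), i, j)
        for i, r in enumerate(robots)
        for j, t in enumerate(targets)
    )
    assignment, used_targets = {}, set()
    for _, i, j in pairs:
        if i not in assignment and j not in used_targets:
            assignment[i] = j
            used_targets.add(j)
    return assignment


result = greedy_assign([(0, 0), (5, 5)], [(6, 5), (1, 0)])  # {0: 1, 1: 0}
```

Each robot is sent to its nearby target rather than the distant one; the learned decentralized policies the paper studies must achieve similar assignments while each robot sees only its k-nearest robots and targets.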
arXiv Detail & Related papers (2024-09-29T23:57:25Z)
- Learning to Recover from Plan Execution Errors during Robot Manipulation: A Neuro-symbolic Approach [7.768747914019512]
We propose an approach (blending learning with symbolic search) for automated error discovery and recovery.
We present an anytime version of our algorithm, where instead of recovering to the last correct state, we search for a sub-goal in the original plan.
arXiv Detail & Related papers (2024-05-29T10:03:57Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Distributional Instance Segmentation: Modeling Uncertainty and High Confidence Predictions with Latent-MaskRCNN [77.0623472106488]
In this paper, we explore a class of distributional instance segmentation models using latent codes.
For robotic picking applications, we propose a confidence mask method to achieve the high precision necessary.
We show that our method can significantly reduce critical errors in robotic systems, including our newly released dataset of ambiguous scenes.
arXiv Detail & Related papers (2023-05-03T05:57:29Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Asking for Help: Failure Prediction in Behavioral Cloning through Value Approximation [8.993237527071756]
We introduce Behavioral Cloning Value Approximation (BCVA), an approach to learning a state value function based on and trained jointly with a Behavioral Cloning policy.
We demonstrate the effectiveness of BCVA by applying it to the challenging mobile manipulation task of latched-door opening.
arXiv Detail & Related papers (2023-02-08T20:56:23Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve with unprecedented sample efficiency some challenging simulated tasks such as humanoid locomotion and stand-up.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
Recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z)
- Learning from Sparse Demonstrations [17.24236148404065]
The paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few demonstrated examples.
The method finds an objective function and a time-warping function such that the robot's resulting trajectories sequentially follow the demonstrated trajectories with minimal discrepancy loss.
The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments.
arXiv Detail & Related papers (2020-08-05T14:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.