Related papers: Robot Learning with Crash Constraints

Robot Learning with Crash Constraints

URL: http://arxiv.org/abs/2010.08669v3
Date: Thu, 28 Jan 2021 00:34:40 GMT
Title: Robot Learning with Crash Constraints
Authors: Alonso Marco, Dominik Baumann, Majid Khadiv, Philipp Hennig, Ludovic Righetti, Sebastian Trimpe
Abstract summary: In robot applications where failing is undesired but not catastrophic, many algorithms struggle with leveraging data obtained from failures. This is usually caused by (i) the failed experiment ending prematurely, or (ii) the acquired data being scarce or corrupted. We consider failing behaviors as those that violate a constraint and address the problem of learning with crash constraints.
Score: 37.685515446816105
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the past decade, numerous machine learning algorithms have been shown to successfully learn optimal policies to control real robotic systems. However, it is common to encounter failing behaviors as the learning loop progresses. Specifically, in robot applications where failing is undesired but not catastrophic, many algorithms struggle with leveraging data obtained from failures. This is usually caused by (i) the failed experiment ending prematurely, or (ii) the acquired data being scarce or corrupted. Both complicate the design of proper reward functions to penalize failures. In this paper, we propose a framework that addresses those issues. We consider failing behaviors as those that violate a constraint and address the problem of learning with crash constraints, where no data is obtained upon constraint violation. The no-data case is addressed by a novel GP model (GPCR) for the constraint that combines discrete events (failure/success) with continuous observations (only obtained upon success). We demonstrate the effectiveness of our framework on simulated benchmarks and on a real jumping quadruped, where the constraint threshold is unknown a priori. Experimental data is collected, by means of constrained Bayesian optimization, directly on the real robot. Our results outperform manual tuning and GPCR proves useful on estimating the constraint threshold.

Related papers

Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline [49.51385135697656]
Within machine learning-based planning, imitation learning (IL) is a common algorithm. It primarily learns driving policies directly from supervised trajectory data. It remains challenging to determine if the learned policy truly understands fundamental driving principles. This work proposes a novel closed-loop simulator supporting both imitation and reinforcement learning.
arXiv Detail & Related papers (2025-04-20T18:51:26Z)
Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies [19.27526590452503]
FAIL-Detect is a two-stage approach for failure detection in imitation learning-based robotic manipulation. We first distill policy inputs and outputs into scalar signals that correlate with policy failures and capture uncertainty. Our experiments show learned signals to be mostly consistently effective, particularly when using our novel flow-based density estimator.
arXiv Detail & Related papers (2025-03-11T15:47:12Z)
Rare event modeling with self-regularized normalizing flows: what can we learn from a single failure? [10.460029312784911]
This paper introduces CalNF, a framework for posterior learning from limited data. It achieves state-of-the-art performance on data-limited failure modeling and inverse problems. It enables a first-of-a-kind case study into the root causes of the 2022 Southwest Airlines scheduling crisis.
arXiv Detail & Related papers (2025-02-28T14:47:52Z)
Positive-Unlabeled Constraint Learning (PUCL) for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations [8.361428709513476]
This paper presents a novel Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a continuous arbitrary constraint function from demonstration. Within our framework, we treat all data in demonstrations as positive (feasible) data, and learn a control policy to generate potentially infeasible trajectories. It successfully infers and transfers the continuous nonlinear constraints and outperforms other baseline methods in terms of constraint accuracy and policy safety.
arXiv Detail & Related papers (2024-08-03T01:09:48Z)
Learning General Continuous Constraint from Demonstrations via Positive-Unlabeled Learning [8.361428709513476]
This paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary and possibly nonlinear, constraint from demonstration. The effectiveness of the proposed method is validated in two Mujoco environments.
arXiv Detail & Related papers (2024-07-23T14:00:18Z)
Truly No-Regret Learning in Constrained MDPs [61.78619476991494]
We propose a model-based primal-dual algorithm to learn in an unknown CMDP. We prove that our algorithm achieves sublinear regret without error cancellations.
arXiv Detail & Related papers (2024-02-24T09:47:46Z)
Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks. To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks. However, it is not expected in practice considering the memory constraint or data privacy issue. As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
Optimal decision making in robotic assembly and other trial-and-error tasks [1.0660480034605238]
We study a class of problems providing (1) low-entropy indicators of terminal success / failure, and (2) unreliable (high-entropy) data to predict the final outcome of an ongoing task. We derive a closed form solution that predicts makespan based on the confusion matrix of the failure predictor. This allows the robot to learn failure prediction in a production environment, and only adopt a preemptive policy when it actually saves time.
arXiv Detail & Related papers (2023-01-25T22:07:50Z)
Benchmarking Constraint Inference in Inverse Reinforcement Learning [19.314352936252444]
In many real-world problems, the constraints followed by expert agents are often hard to specify mathematically and unknown to the RL agents. In this paper, we construct a CIRL benchmark in the context of two major application domains: robot control and autonomous driving. The benchmark, including the information for reproducing the performance of CIRL algorithms, is publicly available at https://github.com/Guiliang/CIRL-benchmarks-public.
arXiv Detail & Related papers (2022-06-20T09:22:20Z)
Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Conal Neural Networks (CNNs) via an error simulation engine. These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults. We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t.FI, that only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z)
Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations [50.37808220291108]
This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations. We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety. We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior.
arXiv Detail & Related papers (2021-11-18T23:21:00Z)
Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures. Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.