Positive-Unlabeled Constraint Learning for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations
- URL: http://arxiv.org/abs/2408.01622v2
- Date: Thu, 16 Jan 2025 10:30:40 GMT
- Title: Positive-Unlabeled Constraint Learning for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations
- Authors: Baiyu Peng, Aude Billard
- Abstract summary: Planning for diverse real-world robotic tasks necessitates knowing and specifying all constraints.
This paper presents a novel two-step Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a continuous constraint function from demonstrations.
It successfully infers the continuous nonlinear constraints and outperforms other baseline methods in terms of constraint accuracy and policy safety.
- Score: 8.361428709513476
- License:
- Abstract: Planning for diverse real-world robotic tasks necessitates knowing and specifying all constraints. However, there are instances where these constraints are either unknown or difficult to specify accurately. A possible solution is to infer the unknown constraints from expert demonstrations. This paper presents a novel two-step Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a continuous constraint function from demonstrations, without requiring prior knowledge of the true constraint parameterization or of the environment model, as existing works do. We treat all data in the demonstrations as positive (feasible) data, and learn a control policy to generate potentially infeasible trajectories, which serve as unlabeled data. The proposed two-step learning framework first identifies reliable infeasible data using a distance metric, and then learns a binary feasibility classifier (i.e., constraint function) from the feasible demonstrations and the reliable infeasible data. The proposed method is flexible enough to learn constraint boundaries of complex shapes and, unlike previous methods, does not mistakenly classify demonstrations as infeasible. Its effectiveness is verified in four constrained environments, using either a neural-network policy or a dynamical-system policy. It successfully infers the continuous nonlinear constraints and outperforms other baseline methods in terms of constraint accuracy and policy safety. This work has been published in IEEE Robotics and Automation Letters (RA-L). Please refer to the final version at https://doi.org/10.1109/LRA.2024.3522756.
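The two-step framework described in the abstract can be illustrated with a minimal sketch. The snippet below is an illustrative assumption, not the authors' implementation: `FeasibilityNet`, `select_reliable_infeasible`, and the Euclidean distance threshold are hypothetical stand-ins for the paper's distance metric and constraint classifier.

```python
# Minimal sketch of the two-step PUCL idea (assumed names, not the authors' code):
# step 1 picks "reliable infeasible" states from policy rollouts by their distance
# to the demonstrations; step 2 fits a binary feasibility classifier on
# demonstrated (positive) vs. reliable-infeasible (negative) states.
import numpy as np
import torch
import torch.nn as nn


class FeasibilityNet(nn.Module):
    """Continuous constraint function: outputs P(state is feasible)."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def select_reliable_infeasible(unlabeled: np.ndarray,
                               demos: np.ndarray,
                               threshold: float) -> np.ndarray:
    """Step 1: keep unlabeled (policy-generated) states whose Euclidean
    distance to every demonstrated state exceeds `threshold`."""
    dists = np.linalg.norm(unlabeled[:, None, :] - demos[None, :, :], axis=-1)
    return unlabeled[dists.min(axis=1) > threshold]


def fit_feasibility_classifier(clf: FeasibilityNet,
                               feasible: np.ndarray,
                               infeasible: np.ndarray,
                               epochs: int = 200,
                               lr: float = 1e-3) -> FeasibilityNet:
    """Step 2: binary cross-entropy on demonstrations (label 1) and
    reliable infeasible states (label 0)."""
    x = torch.tensor(np.vstack([feasible, infeasible]), dtype=torch.float32)
    y = torch.tensor(
        np.concatenate([np.ones(len(feasible)), np.zeros(len(infeasible))]),
        dtype=torch.float32,
    ).unsqueeze(1)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(clf(x), y)
        loss.backward()
        opt.step()
    return clf
```

In a full PUCL loop, the learned classifier would presumably serve as the constraint for the next round of policy learning, the updated policy would generate fresh unlabeled trajectories, and the two steps would repeat until the policy no longer visits states classified as infeasible.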
Related papers
- UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations [11.666700714916065]
We address the problem of offline learning of a policy that avoids undesirable demonstrations.
We formulate the learning task as maximizing a statistical distance between the learning policy and the undesirable policy.
Our algorithm, UNIQ, tackles these challenges by building on the inverse Q-learning framework.
arXiv Detail & Related papers (2024-10-10T18:52:58Z)
- Learning Constraint Network from Demonstrations via Positive-Unlabeled Learning with Memory Replay [8.361428709513476]
This paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations.
The effectiveness of the proposed method is validated in two Mujoco environments.
arXiv Detail & Related papers (2024-07-23T14:00:18Z)
- Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
- Data-Driven Reachability Analysis of Stochastic Dynamical Systems with Conformal Inference [1.446438366123305]
We consider data-driven reachability analysis of discrete-time dynamical systems using conformal inference.
We focus on learning-enabled control systems with complex closed-loop dynamics.
arXiv Detail & Related papers (2023-09-17T07:23:01Z)
- SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- What are the Statistical Limits of Offline RL with Linear Function Approximation? [70.33301077240763]
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of sequential decision-making strategies.
This work focuses on the basic question of which representational and distributional conditions are necessary to permit provably sample-efficient offline reinforcement learning.
arXiv Detail & Related papers (2020-10-22T17:32:13Z)
- Robot Learning with Crash Constraints [37.685515446816105]
In robot applications where failing is undesired but not catastrophic, many algorithms struggle with leveraging data obtained from failures.
This is usually caused by (i) the failed experiment ending prematurely, or (ii) the acquired data being scarce or corrupted.
We consider failing behaviors as those that violate a constraint and address the problem of learning with crash constraints.
arXiv Detail & Related papers (2020-10-16T23:56:35Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Learning Constraints from Locally-Optimal Demonstrations under Cost Function Uncertainty [6.950510860295866]
We present an algorithm for learning parametric constraints from locally-optimal demonstrations, where the cost function being optimized is uncertain to the learner.
Our method uses the Karush-Kuhn-Tucker (KKT) optimality conditions of the demonstrations within a mixed integer linear program (MILP) to learn constraints which are consistent with the local optimality of the demonstrations.
We evaluate our method on high-dimensional constraints and systems by learning constraints for 7-DOF arm and quadrotor examples, show that it outperforms competing constraint-learning approaches, and show that it can be effectively used to plan new constraint-satisfying trajectories in the environment (a generic sketch of the KKT conditions involved follows this list).
arXiv Detail & Related papers (2020-01-25T15:57:48Z)
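As a worked-equation aside for the last entry above: methods of this kind build on the standard KKT conditions of locally-optimal demonstrations. The statement below is generic, with illustrative symbols (cost $c$ minimized subject to known constraints $h(\xi) \le 0$ and an unknown parametric constraint $g(\xi;\theta) \le 0$, with multipliers $\lambda, \nu$); it is not that paper's exact MILP encoding. For a locally optimal demonstration $\xi^*$:

$$
\nabla_{\xi} c(\xi^*) + \lambda^{\top} \nabla_{\xi} h(\xi^*) + \nu^{\top} \nabla_{\xi} g(\xi^*;\theta) = 0,
\qquad \lambda \ge 0,\; \nu \ge 0,
\qquad \lambda_i\, h_i(\xi^*) = 0,\; \nu_j\, g_j(\xi^*;\theta) = 0 .
$$

The constraint parameters $\theta$ (together with binary variables encoding which constraints are active) are then chosen so that these conditions hold for all demonstrations, which is what a mixed integer linear program can enforce.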
This list is automatically generated from the titles and abstracts of the papers in this site.