Positive-Unlabeled Constraint Learning (PUCL) for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations
- URL: http://arxiv.org/abs/2408.01622v1
- Date: Sat, 3 Aug 2024 01:09:48 GMT
- Title: Positive-Unlabeled Constraint Learning (PUCL) for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations
- Authors: Baiyu Peng, Aude Billard
- Abstract summary: This paper presents a novel Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a continuous, arbitrary constraint function from demonstrations.
Within our framework, we treat all data in demonstrations as positive (feasible) data, and learn a control policy to generate potentially infeasible trajectories.
It successfully infers and transfers the continuous nonlinear constraints and outperforms other baseline methods in terms of constraint accuracy and policy safety.
- Score: 8.361428709513476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning for a wide range of real-world robotic tasks requires knowing and specifying all constraints. However, instances exist where these constraints are either unknown or challenging to specify accurately. A possible solution is to infer the unknown constraints from expert demonstrations. This paper presents a novel Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a continuous, arbitrary constraint function from demonstrations, without requiring prior knowledge of the true constraint parameterization or of the environment model, as existing works do. Within our framework, we treat all data in the demonstrations as positive (feasible) data and learn a control policy to generate potentially infeasible trajectories, which serve as unlabeled data. In each iteration, we first update the policy and then apply a two-step positive-unlabeled learning procedure: it first identifies reliable infeasible data using a distance metric, and then learns a binary feasibility classifier (i.e., the constraint function) from the feasible demonstrations and the reliable infeasible data. The proposed framework is flexible enough to learn constraint boundaries of complex shape and, unlike previous methods, does not mistakenly classify demonstrations as infeasible. The effectiveness of the proposed method is verified in three robotic tasks, using a neural network policy or a dynamical system policy. It successfully infers and transfers continuous nonlinear constraints and outperforms baseline methods in terms of constraint accuracy and policy safety.
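To make the two-step procedure concrete, here is a minimal, self-contained Python sketch of one PU-learning iteration on toy 2-D data. It illustrates the idea only and is not the authors' implementation: the nearest-neighbor Euclidean distance rule, the 0.9 quantile threshold, the logistic-regression classifier, and all names are hypothetical stand-ins (the paper uses rollouts from a learned policy as the unlabeled data and a richer classifier).

```python
import numpy as np

def identify_reliable_infeasible(positive, unlabeled, quantile=0.9):
    """Step 1: flag unlabeled states far from every demonstrated (feasible) state.

    A plain Euclidean nearest-neighbor distance stands in for the paper's
    distance metric, which is a design choice.
    """
    # Distance from each unlabeled point to its nearest positive point.
    dists = np.min(
        np.linalg.norm(unlabeled[:, None, :] - positive[None, :, :], axis=-1),
        axis=1,
    )
    threshold = np.quantile(dists, quantile)
    return unlabeled[dists >= threshold]  # "reliable infeasible" subset

def fit_feasibility_classifier(positive, reliable_negative):
    """Step 2: fit a binary classifier (the learned constraint function).

    A logistic-regression stub keeps the sketch dependency-free; a neural
    network classifier would take its place in practice.
    """
    X = np.vstack([positive, reliable_negative])
    y = np.concatenate([np.ones(len(positive)), np.zeros(len(reliable_negative))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(500):  # batch gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= 0.1 * (X.T @ (p - y)) / len(y)
        b -= 0.1 * float(np.mean(p - y))
    return lambda s: 1.0 / (1.0 + np.exp(-(s @ w + b)))  # P(feasible | state)

# Toy usage: demonstrations avoid a disk of radius 1 around the origin.
rng = np.random.default_rng(0)
demos = rng.uniform(-3, 3, size=(400, 2))
demos = demos[np.linalg.norm(demos, axis=1) > 1.0]   # feasible states
rollouts = rng.uniform(-3, 3, size=(400, 2))         # unlabeled policy data
neg = identify_reliable_infeasible(demos, rollouts)
constraint = fit_feasibility_classifier(demos, neg)
print("P(feasible) at origin:", float(constraint(np.array([0.0, 0.0]))))
```

In the full algorithm this would run inside the policy-update loop, with the unlabeled rollouts regenerated by the current policy at every iteration.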
Related papers
- UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations [11.666700714916065]
We address the problem of offline learning of a policy that avoids undesirable demonstrations.
We formulate the learning task as maximizing a statistical distance between the learning policy and the undesirable policy.
Our algorithm, UNIQ, tackles these challenges by building on the inverse Q-learning framework.
arXiv Detail & Related papers (2024-10-10T18:52:58Z)
- Adaptive Event-triggered Reinforcement Learning Control for Complex Nonlinear Systems [2.08099858257632]
We propose an adaptive event-triggered reinforcement learning control for continuous-time nonlinear systems.
We show that triggering conditions can be determined accurately and efficiently without explicitly learning them.
arXiv Detail & Related papers (2024-09-29T20:42:19Z)
- Learning General Continuous Constraint from Demonstrations via Positive-Unlabeled Learning [8.361428709513476]
This paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations.
The effectiveness of the proposed method is validated in two MuJoCo environments.
arXiv Detail & Related papers (2024-07-23T14:00:18Z)
- SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z)
- Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees [20.045860624444494]
We study the problem of learning controllers for discrete-time non-linear dynamical systems with formal reach-avoid guarantees.
We learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work.
Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy.
arXiv Detail & Related papers (2022-10-11T10:02:49Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach at minimizing constraint violations on policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Learning Implicitly with Noisy Data in Linear Arithmetic [94.66549436482306]
We extend implicit learning in PAC-Semantics to handle intervals and threshold uncertainty in the language of linear arithmetic.
We show that our implicit approach to learning optimal linear programming objective constraints significantly outperforms an explicit approach in practice.
arXiv Detail & Related papers (2020-10-23T19:08:46Z)
- What are the Statistical Limits of Offline RL with Linear Function Approximation? [70.33301077240763]
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of sequential decision-making strategies.
This work focuses on the basic question of what representational and distributional conditions are necessary to permit provably sample-efficient offline reinforcement learning.
arXiv Detail & Related papers (2020-10-22T17:32:13Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Learning Constraints from Locally-Optimal Demonstrations under Cost Function Uncertainty [6.950510860295866]
We present an algorithm for learning parametric constraints from locally-optimal demonstrations, where the cost function being optimized is uncertain to the learner.
Our method uses the Karush-Kuhn-Tucker (KKT) optimality conditions of the demonstrations (recalled in standard form after this list) within a mixed integer linear program (MILP) to learn constraints which are consistent with the local optimality of the demonstrations.
We evaluate our method on high-dimensional constraints and systems by learning constraints for 7-DOF arm and quadrotor examples, show that it outperforms competing constraint-learning approaches, and show that it can be effectively used to plan new constraint-satisfying trajectories in the environment.
arXiv Detail & Related papers (2020-01-25T15:57:48Z)
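For reference, the KKT conditions that the last method encodes take the standard form below; the notation is generic, not the paper's own (x* is a locally-optimal demonstrated point, f the uncertain cost, g_i the candidate inequality constraints):

$$\nabla f(x^*) + \sum_i \lambda_i \,\nabla g_i(x^*) = 0, \qquad g_i(x^*) \le 0, \qquad \lambda_i \ge 0, \qquad \lambda_i \, g_i(x^*) = 0.$$

These are, in order, stationarity, primal feasibility, dual feasibility, and complementary slackness; roughly, the MILP searches for constraint parameters under which these conditions hold along the demonstrations.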