Learning General Continuous Constraint from Demonstrations via Positive-Unlabeled Learning
- URL: http://arxiv.org/abs/2407.16485v2
- Date: Fri, 22 Nov 2024 12:58:21 GMT
- Title: Learning General Continuous Constraint from Demonstrations via Positive-Unlabeled Learning
- Authors: Baiyu Peng, Aude Billard
- Abstract summary: This paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations.
The effectiveness of the proposed method is validated in two MuJoCo environments.
- Score: 8.361428709513476
- License:
- Abstract: Planning for a wide range of real-world tasks requires knowing and specifying all relevant constraints. However, in many cases these constraints are either unknown or difficult to specify accurately. A possible solution is to infer the unknown constraints from expert demonstrations. Most prior works limit themselves to learning simple linear constraints, or require strong knowledge of the true constraint parameterization or of the environment model. To mitigate these problems, this paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations. From a PU learning view, we treat all data in the demonstrations as positive (feasible) data, and learn a (sub)optimal policy to generate high-reward but potentially infeasible trajectories, which serve as unlabeled data containing both feasible and infeasible states. Under an assumption on the data distribution, a feasible-infeasible classifier (i.e., the constraint model) is learned from the two datasets through a postprocessing PU learning technique. The method employs an iterative framework that alternates between updating the policy, which generates and selects higher-reward trajectories, and updating the constraint model. Additionally, a memory buffer records and reuses samples from previous iterations to prevent forgetting. The effectiveness of the proposed method is validated in two MuJoCo environments, where it successfully infers continuous nonlinear constraints and outperforms a baseline method in terms of constraint accuracy and policy safety.
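The iterative scheme described in the abstract can be summarized in a short sketch. The code below is a minimal illustration under assumed interfaces: `env`, `policy`, and `constraint_net` are hypothetical objects (with `update`, `rollout`, and `fit_pu` methods invented here for illustration), not the authors' implementation.

```python
import numpy as np

def learn_constraint(demos, env, policy, constraint_net, n_iters=50, n_rollouts=20):
    """Sketch of an iterative PU constraint-learning loop (assumed interfaces)."""
    # All demonstration states are treated as positive (feasible) data.
    positive = np.concatenate([traj["states"] for traj in demos])

    # Memory buffer: unlabeled samples from earlier iterations are kept to prevent forgetting.
    memory = []

    for _ in range(n_iters):
        # 1) Policy update: maximize reward while penalizing states that the current
        #    constraint model classifies as infeasible.
        policy.update(env, constraint_net)

        # 2) Generate trajectories and keep the higher-reward ones; their states may
        #    still be infeasible, so they form the unlabeled dataset.
        rollouts = sorted((policy.rollout(env) for _ in range(n_rollouts)),
                          key=lambda tr: tr["return"], reverse=True)
        unlabeled = np.concatenate([tr["states"] for tr in rollouts[: n_rollouts // 2]])
        memory.append(unlabeled)

        # 3) Constraint update: fit a feasible/infeasible classifier from the
        #    positive and unlabeled data with a PU learning technique.
        constraint_net.fit_pu(positive, np.concatenate(memory))

    return constraint_net
```

The key point, per the abstract, is that only high-reward rollouts enter the unlabeled set, so that under the stated distributional assumption a PU classifier can separate feasible from infeasible states.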
Related papers
- Positive-Unlabeled Constraint Learning (PUCL) for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations [8.361428709513476]
This paper presents a novel Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a continuous, arbitrary constraint function from demonstrations.
Within our framework, we treat all data in demonstrations as positive (feasible) data, and learn a control policy to generate potentially infeasible trajectories.
It successfully infers and transfers the continuous nonlinear constraints and outperforms other baseline methods in terms of constraint accuracy and policy safety.
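The "postprocessing PU learning technique" is not spelled out in these summaries; one standard postprocessing recipe (Elkan & Noto, 2008) trains a classifier to separate positive from unlabeled samples and then rescales its scores by an estimate of the labeling frequency. Below is a minimal scikit-learn sketch under that assumption, with hypothetical arrays `demo_states` (positive) and `rollout_states` (unlabeled) standing in for the data described above; it is an illustration, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_pu_classifier(demo_states, rollout_states):
    """Postprocessing PU learning in the style of Elkan & Noto (2008); a sketch, not PUCL itself."""
    X = np.vstack([demo_states, rollout_states])
    s = np.concatenate([np.ones(len(demo_states)), np.zeros(len(rollout_states))])

    # Non-traditional classifier g(x) ~ p(s=1 | x): positive vs. unlabeled.
    g = LogisticRegression(max_iter=1000).fit(X, s)

    # Estimate c = p(s=1 | y=1) as the mean score on the (ideally held-out) positives.
    c = g.predict_proba(demo_states)[:, 1].mean()

    def p_feasible(states):
        # Corrected feasibility probability p(y=1 | x) = g(x) / c, clipped to [0, 1].
        return np.clip(g.predict_proba(states)[:, 1] / c, 0.0, 1.0)

    return p_feasible
```

States whose corrected feasibility probability falls below a chosen threshold would then be treated as constraint-violating when the policy is updated.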
arXiv Detail & Related papers (2024-08-03T01:09:48Z)
- Adversarial Imitation Learning On Aggregated Data [0.0]
Inverse Reinforcement Learning (IRL) learns an optimal policy, given some expert demonstrations, thus avoiding the need for the tedious process of specifying a suitable reward function.
We propose an approach that removes these requirements through a dynamic, adaptive method called Adversarial Imitation Learning on Aggregated Data (AILAD).
It jointly learns a nonlinear reward function and the associated optimal policy using an adversarial framework.
arXiv Detail & Related papers (2023-11-14T22:13:38Z)
- Offline Imitation Learning with Suboptimal Demonstrations via Relaxed Distribution Matching [109.5084863685397]
Offline imitation learning (IL) promises the ability to learn performant policies from pre-collected demonstrations without interacting with the environment.
We present RelaxDICE, which employs an asymmetrically-relaxed f-divergence for explicit support regularization.
Our method significantly outperforms the best prior offline method in six standard continuous control environments.
arXiv Detail & Related papers (2023-03-05T03:35:11Z)
- SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Self-Supervised Noisy Label Learning for Source-Free Unsupervised Domain Adaptation [87.60688582088194]
We propose a novel Self-Supervised Noisy Label Learning method.
Our method achieves state-of-the-art results and surpasses other methods by a large margin.
arXiv Detail & Related papers (2021-02-23T10:51:45Z)
- PLAS: Latent Action Space for Offline Reinforcement Learning [18.63424441772675]
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions.
We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets.
arXiv Detail & Related papers (2020-11-14T03:38:38Z)
- Learning Implicitly with Noisy Data in Linear Arithmetic [94.66549436482306]
We extend implicit learning in PAC-Semantics to handle intervals and threshold uncertainty in the language of linear arithmetic.
We show that our implicit approach to learning optimal linear programming objective constraints significantly outperforms an explicit approach in practice.
arXiv Detail & Related papers (2020-10-23T19:08:46Z)
- What are the Statistical Limits of Offline RL with Linear Function Approximation? [70.33301077240763]
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of sequential decision-making strategies.
This work focuses on the basic question of which representational and distributional conditions are necessary to permit provably sample-efficient offline reinforcement learning.
arXiv Detail & Related papers (2020-10-22T17:32:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.