Learning Constraint Network from Demonstrations via Positive-Unlabeled Learning with Memory Replay
- URL: http://arxiv.org/abs/2407.16485v3
- Date: Thu, 16 Jan 2025 11:59:02 GMT
- Title: Learning Constraint Network from Demonstrations via Positive-Unlabeled Learning with Memory Replay
- Authors: Baiyu Peng, Aude Billard
- Abstract summary: This paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations.
The effectiveness of the proposed method is validated in two MuJoCo environments.
- Score: 8.361428709513476
- Abstract: Planning for a wide range of real-world tasks necessitates knowing and writing all constraints. However, instances exist where these constraints are either unknown or challenging to specify accurately. A possible solution is to infer the unknown constraints from expert demonstrations. The majority of prior works limit themselves to learning simple linear constraints, or require strong knowledge of the true constraint parameterization or the environmental model. To mitigate these problems, this paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations. From a PU learning view, we treat all data in the demonstrations as positive (feasible) data, and learn a (sub)optimal policy to generate high-reward but potentially infeasible trajectories, which serve as unlabeled data containing both feasible and infeasible states. Under an assumption on the data distribution, a feasible-infeasible classifier (i.e., the constraint model) is learned from the two datasets through a postprocessing PU learning technique. The entire method employs an iterative framework that alternates between updating the policy, which generates new higher-reward trajectories, and updating the constraint model. Additionally, a memory buffer is introduced to record and reuse samples from previous iterations to prevent forgetting. The effectiveness of the proposed method is validated in two MuJoCo environments, successfully inferring continuous nonlinear constraints and outperforming a baseline method in terms of constraint accuracy and policy safety.
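To make the pipeline concrete, here is a minimal sketch of such an iterative PU constraint learner with memory replay, written in PyTorch. The abstract only says a "postprocessing PU learning technique" is used; this sketch assumes the classic Elkan-Noto rescaling as one common instance of that family, and all names here (`ConstraintNet`, `train_pu_classifier`, `rollout_fn`) are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn


class ConstraintNet(nn.Module):
    """Binary classifier g(s): probability that state s came from the
    labeled (feasible, demonstrated) set rather than the unlabeled set."""

    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return torch.sigmoid(self.net(s))


def train_pu_classifier(model, positive, unlabeled, epochs=200, lr=1e-3):
    """Postprocessing PU step (Elkan & Noto style, an assumption here):
    fit positive-vs-unlabeled, then estimate c = E[g(x) | x feasible]
    so that p(feasible | x) can be approximated by g(x) / c."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()
    x = torch.cat([positive, unlabeled])
    # Demonstration states are labeled 1, policy-generated states 0.
    y = torch.cat([torch.ones(len(positive), 1),
                   torch.zeros(len(unlabeled), 1)])
    for _ in range(epochs):
        opt.zero_grad()
        loss = bce(model(x), y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        c = model(positive).mean().clamp(min=1e-6)
    return c


def learn_constraint(rollout_fn, demos, n_iters=10):
    """Iterative framework: alternate between policy rollouts and
    constraint updates, replaying all previously collected unlabeled
    samples from a memory buffer to prevent forgetting."""
    model = ConstraintNet(demos.shape[1])
    memory = []  # memory buffer of unlabeled states from all iterations
    c = torch.tensor(1.0)
    for _ in range(n_iters):
        # (1) Roll out a policy trained to maximize reward under the
        #     current constraint model; its states may be infeasible.
        memory.append(rollout_fn(model, c))
        # (2) Re-fit the feasible/infeasible classifier on demos vs.
        #     the full replay buffer (positive vs. unlabeled).
        c = train_pu_classifier(model, demos, torch.cat(memory))
    return model, c
```

A state `s` would then be flagged infeasible whenever `model(s) / c` falls below a chosen threshold. Because each refit sees every unlabeled sample collected so far, the replay buffer is what keeps the classifier from forgetting constraints discovered in earlier iterations.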
Related papers
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
- Positive-Unlabeled Constraint Learning for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations [8.361428709513476]
Planning for diverse real-world robotic tasks necessitates knowing and writing all constraints.
This paper presents a novel two-step Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a continuous constraint function from demonstrations.
It successfully infers the continuous nonlinear constraints and outperforms other baseline methods in terms of constraint accuracy and policy safety.
arXiv Detail & Related papers (2024-08-03T01:09:48Z)
- Learning Adversarial MDPs with Stochastic Hard Constraints [37.24692425018]
We study online learning in constrained Markov decision processes (CMDPs) with adversarial losses and hard constraints.
Our work is the first to study CMDPs involving both adversarial losses and hard constraints.
arXiv Detail & Related papers (2024-03-06T12:49:08Z) - Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation [86.8475564814154]
We show that it is both possible and beneficial to undertake the constrained optimization problem directly.
We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer.
We show that dual variables indicate the sensitivity of the optimal value of the continual learning problem with respect to constraint perturbations.
arXiv Detail & Related papers (2023-09-29T21:23:27Z) - SaFormer: A Conditional Sequence Modeling Approach to Offline Safe
Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z) - A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z) - Constrained Policy Gradient Method for Safe and Fast Reinforcement
Learning: a Neural Tangent Kernel Based Approach [6.316693022958221]
This paper presents a constrained policy gradient algorithm for safe learning.
We introduce constraints for safe learning with the following steps.
The efficiency of the constrained learning was demonstrated with a shallow and wide ReLU network in the CartPole and Lunar Lander OpenAI Gym environments.
arXiv Detail & Related papers (2021-07-19T20:25:15Z) - Self-Supervised Noisy Label Learning for Source-Free Unsupervised Domain
Adaptation [87.60688582088194]
We propose a novel Self-Supervised Noisy Label Learning method.
Our method can easily achieve state-of-the-art results and surpass other methods by a very large margin.
arXiv Detail & Related papers (2021-02-23T10:51:45Z)
- PLAS: Latent Action Space for Offline Reinforcement Learning [18.63424441772675]
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions.
We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets.
arXiv Detail & Related papers (2020-11-14T03:38:38Z)
- What are the Statistical Limits of Offline RL with Linear Function Approximation? [70.33301077240763]
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of sequential decision-making strategies.
This work focuses on the basic question of which representational and distributional conditions are necessary to permit provably sample-efficient offline reinforcement learning.
arXiv Detail & Related papers (2020-10-22T17:32:13Z)