Learning Soft Constraints From Constrained Expert Demonstrations
- URL: http://arxiv.org/abs/2206.01311v2
- Date: Thu, 27 Apr 2023 19:26:45 GMT
- Title: Learning Soft Constraints From Constrained Expert Demonstrations
- Authors: Ashish Gaurav, Kasra Rezaee, Guiliang Liu, Pascal Poupart
- Abstract summary: Inverse reinforcement learning (IRL) methods assume that the expert data is generated by an agent optimizing some reward function.
We consider the setting where the reward function is given, and the constraints are unknown, and propose a method that is able to recover these constraints satisfactorily from the expert data.
We demonstrate our approach on synthetic environments, robotics environments and real world highway driving scenarios.
- Score: 16.442694252601452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse reinforcement learning (IRL) methods assume that the expert data is
generated by an agent optimizing some reward function. However, in many
settings, the agent may optimize a reward function subject to some constraints,
where the constraints induce behaviors that may be otherwise difficult to
express with just a reward function. We consider the setting where the reward
function is given, and the constraints are unknown, and propose a method that
is able to recover these constraints satisfactorily from the expert data. While
previous work has focused on recovering hard constraints, our method can
recover cumulative soft constraints that the agent satisfies on average per
episode. In IRL fashion, our method solves this problem by adjusting the
constraint function iteratively through a constrained optimization procedure,
until the agent behavior matches the expert behavior. We demonstrate our
approach on synthetic environments, robotics environments and real world
highway driving scenarios.
Related papers
- Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints [52.37099916582462]
In Constrained Reinforcement Learning (CRL), agents explore the environment to learn the optimal policy while satisfying constraints.
We propose a theoretically guaranteed penalty function method, Exterior Penalty Policy Optimization (EPO), with adaptive penalties generated by a Penalty Metric Network (PMN)
PMN responds appropriately to varying degrees of constraint violations, enabling efficient constraint satisfaction and safe exploration.
arXiv Detail & Related papers (2024-07-22T10:57:32Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task has applications in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Interactively Learning Preference Constraints in Linear Bandits [100.78514640066565]
We study sequential decision-making with known rewards and unknown constraints.
As an application, we consider learning constraints to represent human preferences in a driving simulation.
arXiv Detail & Related papers (2022-06-10T17:52:58Z) - Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks.
arXiv Detail & Related papers (2022-05-24T06:15:51Z) - Regularized Inverse Reinforcement Learning [49.78352058771138]
Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior.
Regularized IRL applies strongly convex regularizers to the learner's policy.
We propose tractable solutions, and practical methods to obtain them, for regularized IRL.
arXiv Detail & Related papers (2020-10-07T23:38:47Z) - Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z) - Constrained Reinforcement Learning for Dynamic Optimization under
Uncertainty [1.5797349391370117]
Dynamic real-time optimization (DRTO) is a challenging task due to the fact that optimal operating conditions must be computed in real time.
The main bottleneck in the industrial application of DRTO is the presence of uncertainty.
We present a constrained reinforcement learning (RL) based approach to accommodate these difficulties.
arXiv Detail & Related papers (2020-06-04T10:17:35Z) - Deep Constrained Q-learning [15.582910645906145]
In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a set of constraints.
We propose Constrained Q-learning, a novel off-policy reinforcement learning framework restricting the action space directly in the Q-update to learn the optimal Q-function for the induced constrained MDP and the corresponding safe policy.
arXiv Detail & Related papers (2020-03-20T17:26:03Z) - Teaching the Old Dog New Tricks: Supervised Learning with Constraints [18.88930622054883]
Adding constraint support in Machine Learning has the potential to address outstanding issues in data-driven AI systems.
Existing approaches typically apply constrained optimization techniques to ML training, enforce constraint satisfaction by adjusting the model design, or use constraints to correct the output.
Here, we investigate a different, complementary, strategy based on "teaching" constraint satisfaction to a supervised ML method via the direct use of a state-of-the-art constraint solver.
arXiv Detail & Related papers (2020-02-25T09:47:39Z) - First Order Constrained Optimization in Policy Space [19.00289722198614]
We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS)
FOCOPS maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints.
We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotics locomotive tasks.
arXiv Detail & Related papers (2020-02-16T05:07:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.