Inverse Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2011.09999v3
- Date: Fri, 21 May 2021 09:18:14 GMT
- Title: Inverse Constrained Reinforcement Learning
- Authors: Usman Anwar, Shehryar Malik, Alireza Aghasi, Ali Ahmed
- Abstract summary: In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior.
We show that our framework can successfully learn the most likely constraints that the agent respects.
These learned constraints are \textit{transferable} to new agents that may have different morphologies and/or reward functions.
- Score: 12.669649178762718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real world settings, numerous constraints are present which are hard to
specify mathematically. However, for the real world deployment of reinforcement
learning (RL), it is critical that RL agents are aware of these constraints, so
that they can act safely. In this work, we consider the problem of learning
constraints from demonstrations of a constraint-abiding agent's behavior. We
experimentally validate our approach and show that our framework can
successfully learn the most likely constraints that the agent respects. We
further show that these learned constraints are \textit{transferable} to new
agents that may have different morphologies and/or reward functions. Previous
works in this regard have mainly been restricted to tabular (discrete)
settings or specific types of constraints, or assume knowledge of the
environment's transition dynamics. In contrast, our framework can learn
arbitrary \textit{Markovian} constraints in high dimensions in a completely
model-free setting. The code can be found at:
\url{https://github.com/shehryar-malik/icrl}.
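For intuition only, here is a minimal tabular sketch of the constraint-learning idea; the gridworld, the synthetic trajectories, and the visitation-difference score below are hypothetical stand-ins and not the paper's algorithm (which is model-free, neural, and handles high-dimensional continuous states). The idea: a state that a reward-driven nominal policy visits often but the expert never enters is a plausible learned constraint.

```python
import numpy as np

# Toy illustration (not the paper's actual method): infer likely constrained
# states by comparing expert and nominal-policy state visitations.
# All data below is synthetic and hypothetical.

n_states = 16  # a 4x4 gridworld, flattened

# Hypothetical trajectories: lists of visited state indices.
expert_trajs = [[0, 1, 2, 3, 7, 11, 15], [0, 4, 8, 9, 10, 11, 15]]
nominal_trajs = [[0, 1, 5, 6, 10, 11, 15], [0, 5, 6, 7, 11, 15],
                 [0, 1, 5, 9, 10, 14, 15]]

def visitation(trajs, n):
    """Empirical state-visitation distribution over a set of trajectories."""
    counts = np.zeros(n)
    for traj in trajs:
        for s in traj:
            counts[s] += 1
    return counts / counts.sum()

p_expert = visitation(expert_trajs, n_states)
p_nominal = visitation(nominal_trajs, n_states)

# States the reward-driven nominal policy favors but the expert avoids
# are the most plausible constraints (a crude maximum-likelihood proxy).
score = p_nominal - p_expert
likely_constrained = np.argsort(score)[::-1][:3]
print("most likely constrained states:", likely_constrained)
```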
Related papers
- CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning [23.76366118253271]
Current solvers fail to produce efficient policies respecting hard constraints.
We present Constraints as Terminations (CaT), a novel constrained RL algorithm.
Videos and code are available at https://constraints-as-terminations.io.
arXiv Detail & Related papers (2024-03-27T17:03:31Z)
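To make the constraints-as-terminations idea in the entry above concrete, here is a hedged sketch of an episode-termination wrapper; the toy environment, the constraint function, and the scaling factor are invented for illustration and are not CaT's actual formulation. Ending episodes with probability proportional to the violation truncates future return, implicitly penalizing unsafe behavior.

```python
import random

class ToyEnv:
    """Stand-in environment; any step-based env with a constraint signal works."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        obs, reward = float(self.t), 1.0
        violation = max(0.0, action - 0.8)  # hypothetical constraint: action <= 0.8
        done = self.t >= 100
        return obs, reward, violation, done

class ConstraintTerminationWrapper:
    """Terminate the episode stochastically on constraint violations,
    so violations cut off future return (the core intuition)."""
    def __init__(self, env, scale=5.0):
        self.env, self.scale = env, scale
    def reset(self):
        return self.env.reset()
    def step(self, action):
        obs, reward, violation, done = self.env.step(action)
        # Termination probability grows with the size of the violation.
        if random.random() < min(1.0, self.scale * violation):
            done = True
        return obs, reward, done

env = ConstraintTerminationWrapper(ToyEnv())
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(random.uniform(0.0, 1.0))
    total += reward
print("return under stochastic constraint terminations:", total)
```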
- From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification [70.08146540745877]
We investigate common constraints in NLP tasks and categorize them into three classes based on the types of their arguments.
We propose a unified framework, ACT (Aligning to ConsTraints), to automatically produce supervision signals for user alignment with constraints.
arXiv Detail & Related papers (2024-03-10T22:14:54Z)
- ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases [53.29427395419317]
Reasoning over Commonsense Knowledge Bases (CSKB) has been explored as a way to acquire new commonsense knowledge.
We propose **ConstraintChecker**, a plugin over prompting techniques to provide and check explicit constraints.
arXiv Detail & Related papers (2024-01-25T08:03:38Z)
- Learning Shared Safety Constraints from Multi-task Demonstrations [53.116648461888936]
We show how to learn constraints from expert demonstrations of safe task completion.
We learn constraints that forbid highly rewarding behavior that the expert could have taken but chose not to.
We validate our method with simulation experiments on high-dimensional continuous control tasks.
arXiv Detail & Related papers (2023-09-01T19:37:36Z)
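The selection principle in the entry above, forbidding highly rewarding behavior that the expert could have taken but avoided, can be illustrated with a small hypothetical sketch; the trajectories, the reachable-reward table, and the candidate constraints are all made up and are not the paper's actual procedure.

```python
# Hypothetical illustration of the selection principle: among candidate
# constraints, prefer the one that forbids the most reward while never
# forbidding anything the expert actually did.

# Trajectories are lists of (state, reward) pairs; all data is synthetic.
expert_trajs = [[("a", 1.0), ("b", 2.0)], [("a", 1.0), ("c", 1.5)]]
# States reachable by *some* policy, with the reward available there.
reachable = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 5.0, "e": 4.0}

# Candidate constraints: sets of forbidden states.
candidates = [{"d"}, {"e"}, {"d", "e"}, {"b", "d"}]

def consistent(constraint, trajs):
    """A constraint is admissible only if no expert trajectory violates it."""
    return all(s not in constraint for traj in trajs for s, _ in traj)

def forbidden_reward(constraint):
    """Total reward the constraint rules out; higher means more explanatory."""
    return sum(reachable[s] for s in constraint)

admissible = [c for c in candidates if consistent(c, expert_trajs)]
best = max(admissible, key=forbidden_reward)
print("learned constraint (forbidden states):", best)  # {'d', 'e'}
```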
- Controlled Text Generation with Natural Language Instructions [74.88938055638636]
InstructCTG is a controlled text generation framework that incorporates different constraints.
We first extract the underlying constraints from natural texts through a combination of off-the-shelf NLP tools and simple verbalizers.
By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints.
arXiv Detail & Related papers (2023-04-27T15:56:34Z)
- Learning Soft Constraints From Constrained Expert Demonstrations [16.442694252601452]
Inverse reinforcement learning (IRL) methods assume that the expert data is generated by an agent optimizing some reward function.
We consider the setting where the reward function is given and the constraints are unknown, and we propose a method that recovers these constraints satisfactorily from the expert data.
We demonstrate our approach on synthetic environments, robotics environments, and real-world highway driving scenarios.
arXiv Detail & Related papers (2022-06-02T21:45:31Z)
- SaDe: Learning Models that Provably Satisfy Domain Constraints [16.46852109556965]
We present a machine learning approach that can handle a wide variety of constraints, and guarantee that these constraints will be satisfied by the model even on unseen data.
We cast machine learning as a maximum satisfiability problem and solve it with a novel algorithm, SaDe, which combines constraint satisfaction with gradient descent.
arXiv Detail & Related papers (2021-12-01T15:18:03Z)
- Safe Reinforcement Learning with Natural Language Constraints [39.70152978025088]
We propose learning to interpret natural language constraints for safe RL.
HazardWorld is a new multi-task benchmark that requires an agent to optimize reward while not violating constraints specified in free-form text.
We show that our method achieves higher rewards (up to 11x) and fewer constraint violations (by 1.8x) compared to existing approaches.
arXiv Detail & Related papers (2020-10-11T03:41:56Z)
- An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimum spanning tree problems from examples without being given the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z)
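As a rough sketch of the constraint-mining idea in the entry above (not the paper's actual ILP formulation): enumerate candidate constraints and keep those that every example output satisfies. The toy assignment data and candidate templates below are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch of constraint mining: given example structured
# outputs (binary matrices), keep candidate linear equality constraints
# that every example satisfies. Data and candidates are made up.

# Toy "assignments": 3x3 permutation matrices (one 1 per row and column).
examples = [np.eye(3), np.eye(3)[[1, 2, 0]], np.eye(3)[[2, 0, 1]]]

# Candidate constraints: sum over a set of cells equals a target value.
rows = [([(i, j) for j in range(3)], 1) for i in range(3)]  # row sums == 1
cols = [([(i, j) for i in range(3)], 1) for j in range(3)]  # column sums == 1
diag = [([(i, i) for i in range(3)], 1)]                    # diagonal sum == 1 (spurious)
candidates = rows + cols + diag

def holds(cells, target, x):
    """Check a single equality constraint against one example output."""
    return sum(x[i, j] for i, j in cells) == target

# Keep only constraints consistent with all examples; the spurious
# diagonal candidate is rejected by the first example.
mined = [(cells, t) for cells, t in candidates
         if all(holds(cells, t, x) for x in examples)]
print(f"kept {len(mined)} of {len(candidates)} candidate constraints")
```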
- Constrained episodic reinforcement learning in concave-convex and knapsack settings [81.08055425644037]
We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints.
Our experiments demonstrate that the proposed algorithm significantly outperforms existing approaches in constrained episodic environments.
arXiv Detail & Related papers (2020-06-09T05:02:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.