Maximum Causal Entropy Inverse Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2305.02857v1
- Date: Thu, 4 May 2023 14:18:19 GMT
- Title: Maximum Causal Entropy Inverse Constrained Reinforcement Learning
- Authors: Mattijs Baert, Pietro Mazzaglia, Sam Leroux, Pieter Simoens
- Abstract summary: We propose a novel method that utilizes the principle of maximum causal entropy to learn constraints and an optimal policy.
We evaluate the effectiveness of the learned policy by assessing the reward received and the number of constraint violations.
Our method outperforms state-of-the-art approaches across a variety of tasks and environments.
- Score: 3.409089945290584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When deploying artificial agents in real-world environments where they
interact with humans, it is crucial that their behavior is aligned with the
values, social norms or other requirements of that environment. However, many
environments have implicit constraints that are difficult to specify and
transfer to a learning agent. To address this challenge, we propose a novel
method that utilizes the principle of maximum causal entropy to learn
constraints and an optimal policy that adheres to these constraints, using
demonstrations of agents that abide by the constraints. We prove convergence in
a tabular setting and provide an approximation which scales to complex
environments. We evaluate the effectiveness of the learned policy by assessing
the reward received and the number of constraint violations, and we evaluate
the learned cost function based on its transferability to other agents. Our
method outperforms state-of-the-art approaches across a variety of tasks and
environments, and it handles problems with stochastic dynamics and continuous
state-action spaces.
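Below is a minimal tabular sketch of the alternating scheme the abstract describes: soft (maximum-causal-entropy) policy computation under the current learned cost, followed by a cost update that penalizes state-action pairs the learner visits more often than the demonstrators do. This is an illustration rather than the authors' implementation; the random environment, the occupancy-matching update rule, and all hyperparameters are assumptions.

```python
# Minimal tabular sketch of maximum-causal-entropy inverse constrained RL.
# NOT the paper's exact algorithm: the occupancy-matching cost update and
# all hyperparameters below are illustrative assumptions.
import numpy as np

S, A, gamma, beta = 5, 2, 0.9, 1.0          # states, actions, discount, temperature
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
R = rng.normal(size=(S, A))                 # known task reward

def soft_policy(cost, iters=300):
    """Soft value iteration on reward minus the learned cost."""
    Q = np.zeros((S, A))
    for _ in range(iters):
        m = Q.max(axis=1)                   # stabilized log-sum-exp backup
        V = m + np.log(np.exp(beta * (Q - m[:, None])).sum(axis=1)) / beta
        Q = (R - cost) + gamma * (P @ V)
    pi = np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
    return pi / pi.sum(axis=1, keepdims=True)

def occupancy(pi, horizon=300):
    """Normalized discounted state-action visitation under pi."""
    d, rho = np.ones(S) / S, np.zeros((S, A))
    for t in range(horizon):
        rho += (gamma ** t) * d[:, None] * pi
        d = np.einsum('s,sa,sab->b', d, pi, P)
    return (1 - gamma) * rho

# Demonstrations: occupancy of an expert that respects a hidden constraint.
true_cost = np.zeros((S, A))
true_cost[3, 1] = 10.0
rho_E = occupancy(soft_policy(true_cost))

# Alternate policy fitting and cost updates until occupancies match:
# the cost rises wherever the learner visits more than the demonstrators.
cost, lr = np.zeros((S, A)), 5.0
for _ in range(50):
    rho = occupancy(soft_policy(cost))
    cost = np.clip(cost + lr * (rho - rho_E), 0.0, None)

print("learned cost on the constrained pair:", round(cost[3, 1], 2))
```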
Related papers
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios: fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which the constraint specifications are not fully identified before training.
Identifying appropriate constraint specifications is challenging because the trade-off between the reward objective and constraint satisfaction is not defined in advance.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Constrained Meta-Reinforcement Learning for Adaptable Safety Guarantee with Differentiable Convex Programming [4.825619788907192]
This paper studies the unique challenges of ensuring safety in non-stationary environments by solving constrained problems through the lens of the meta-learning approach (learning-to-learn).
We first employ successive convex-constrained policy updates across multiple tasks with differentiable convex programming, which allows meta-learning in constrained scenarios by enabling end-to-end differentiation (a toy differentiable update is sketched below).
arXiv Detail & Related papers (2023-12-15T21:55:43Z)
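To make the end-to-end differentiation in the entry above concrete, here is a toy differentiable convex-constrained policy update built with cvxpylayers. It is a sketch under assumed ingredients (a quadratic trust-region objective, a cost budget d_max, and illustrative numbers), not the paper's formulation.

```python
# Toy differentiable convex-constrained policy update via cvxpylayers.
# The trust-region objective, cost budget, and numbers are assumptions.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 4                                # number of discrete actions (illustrative)
pi = cp.Variable(n)                  # updated action distribution
pi_old = cp.Parameter(n)             # current policy
q = cp.Parameter(n)                  # reward value estimate per action
c = cp.Parameter(n)                  # cost value estimate per action
d_max = 0.5                          # cost budget (assumption)

# Improve expected value near the old policy, subject to a cost constraint.
objective = cp.Maximize(q @ pi - cp.sum_squares(pi - pi_old))
constraints = [cp.sum(pi) == 1, pi >= 0, c @ pi <= d_max]
layer = CvxpyLayer(cp.Problem(objective, constraints),
                   parameters=[pi_old, q, c], variables=[pi])

# Gradients flow through the argmin, so an outer meta-learning loop can
# backpropagate through many such constrained updates end to end.
pi_old_t = torch.tensor([0.25, 0.25, 0.25, 0.25], requires_grad=True)
q_t = torch.tensor([1.0, 0.5, 0.2, 0.1])
c_t = torch.tensor([0.9, 0.2, 0.1, 0.05])
pi_new, = layer(pi_old_t, q_t, c_t)
loss = pi_new @ q_t
loss.backward()                      # gradients w.r.t. pi_old_t are available
```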
- Risk-Aware Continuous Control with Neural Contextual Bandits [8.911816419902427]
We propose a risk-aware decision-making framework for contextual bandit problems.
Our framework is designed to cater to various risk levels, effectively balancing constraint satisfaction against performance.
We evaluate our framework in a real-world use case involving a 5G mobile network.
arXiv Detail & Related papers (2023-12-15T17:16:04Z)
- Learning Safety Constraints From Demonstration Using One-Class Decision Trees [1.81343777902022]
We present a novel approach that leverages one-class decision trees to facilitate learning from expert demonstrations.
The learned constraints are subsequently employed within an oracle constrained reinforcement learning framework, as sketched below.
In contrast to other methods, our approach offers an interpretable representation of the constraints, a vital feature in safety-critical environments.
arXiv Detail & Related papers (2023-12-14T11:48:22Z)
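A minimal sketch of the recipe in the entry above: fit a one-class model to expert (state, action) pairs and use it as a binary constraint cost. scikit-learn's IsolationForest serves here as a stand-in for the paper's one-class decision trees, which additionally yield the interpretable rule representation the authors emphasize; the demonstration data and thresholds are placeholders.

```python
# Fit a one-class model to expert (state, action) pairs and treat
# out-of-distribution pairs as constraint violations. IsolationForest
# is a stand-in for the paper's one-class decision trees.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical demonstrations: 2-D states and 1-D actions inside a safe box.
expert_sa = rng.uniform(-1.0, 1.0, size=(500, 3))

model = IsolationForest(contamination=0.01, random_state=0).fit(expert_sa)

def constraint_cost(state, action):
    """1.0 if (s, a) falls outside the demonstrated region, else 0.0;
    usable as the cost signal for a constrained-RL oracle."""
    sa = np.concatenate([state, action])[None, :]
    return float(model.predict(sa)[0] == -1)   # IsolationForest: -1 = outlier

print(constraint_cost(np.array([0.0, 0.0]), np.array([0.0])))   # inside  -> 0.0
print(constraint_cost(np.array([5.0, 5.0]), np.array([5.0])))   # outside -> 1.0
```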
- Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z)
- Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem (the on-policy variant's defining fixed point is restated below).
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We identify issues that stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
arXiv Detail & Related papers (2021-10-27T00:32:07Z)
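For reference, the on-policy (pi-)bisimulation metric that the bounds in the entry above concern is the unique fixed point of the following recursion (after Castro, 2020; coefficient conventions vary across papers):

$$ d^{\pi}(s_i, s_j) \;=\; \bigl|\, r^{\pi}_{s_i} - r^{\pi}_{s_j} \,\bigr| \;+\; \gamma\, W_1\!\bigl(P^{\pi}_{s_i},\; P^{\pi}_{s_j};\; d^{\pi}\bigr), $$

where $r^{\pi}_{s}$ is the expected one-step reward under $\pi$, $P^{\pi}_{s}$ the induced next-state distribution, and $W_1(\cdot,\cdot\,; d)$ the 1-Wasserstein distance with ground metric $d$.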
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions (a toy version of this objective is sketched below).
We develop a model-based estimator for optimizing privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
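As a toy illustration of the regularized objective in the entry above (not the paper's model-based estimator), the sketch below penalizes a plug-in mutual-information estimate between a discrete sensitive variable and the actions; the weight lam and the data are placeholders.

```python
# Toy mutual-information privacy regularizer: NOT the paper's estimator.
# A plug-in MI estimate for discrete variables; `lam` is an assumed weight.
import numpy as np

def mutual_information(u, a, n_u, n_a):
    """Plug-in estimate of I(U; A) from paired samples."""
    joint = np.zeros((n_u, n_a))
    for ui, ai in zip(u, a):
        joint[ui, ai] += 1.0
    joint /= joint.sum()
    pu = joint.sum(axis=1, keepdims=True)   # marginal of U
    pa = joint.sum(axis=0, keepdims=True)   # marginal of A
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pu @ pa)[nz])).sum())

lam = 0.1                                   # privacy weight (assumption)
returns = np.array([1.0, 0.8, 1.2])         # per-episode returns (placeholder)
u = np.array([0, 1, 0, 1, 0, 1])            # sensitive variable per step
a = np.array([0, 1, 0, 0, 1, 1])            # action taken per step
# Regularized objective: high return while leaking little about U.
objective = returns.mean() - lam * mutual_information(u, a, 2, 2)
print(round(objective, 4))
```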
- Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle (restated below) enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
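For reference, the empowerment of a state is the capacity of the channel from actions to successor states,

$$ \mathcal{E}(s) \;=\; \max_{\omega(a \mid s)} \; I\bigl(a;\, s' \mid s\bigr), $$

which the paper estimates by representing the action-to-state mapping as a trainable Gaussian channel, for which this maximization admits a tractable form.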
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.