Benchmarking Constraint Inference in Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2206.09670v1
- Date: Mon, 20 Jun 2022 09:22:20 GMT
- Title: Benchmarking Constraint Inference in Inverse Reinforcement Learning
- Authors: Guiliang Liu, Yudong Luo, Ashish Gaurav, Kasra Rezaee and Pascal
Poupart
- Abstract summary: In many real-world problems, the constraints followed by expert agents are often hard to specify mathematically and unknown to the RL agents.
In this paper, we construct a CIRL benchmark in the context of two major application domains: robot control and autonomous driving.
The benchmark, including the information for reproducing the performance of CIRL algorithms, is publicly available at https://github.com/Guiliang/CIRL-benchmarks-public.
- Score: 19.314352936252444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When deploying Reinforcement Learning (RL) agents into a physical system, we
must ensure that these agents are well aware of the underlying constraints. In
many real-world problems, however, the constraints followed by expert agents
(e.g., humans) are often hard to specify mathematically and unknown to the RL
agents. To tackle these issues, Constraint Inverse Reinforcement Learning
(CIRL) considers the formalism of Constrained Markov Decision Processes (CMDPs)
and estimates constraints from expert demonstrations by learning a constraint
function. As an emerging research topic, CIRL does not have common benchmarks,
and previous works tested their algorithms with hand-crafted environments
(e.g., grid worlds). In this paper, we construct a CIRL benchmark in the
context of two major application domains: robot control and autonomous driving.
We design relevant constraints for each environment and empirically study the
ability of different algorithms to recover those constraints based on expert
trajectories that respect those constraints. To handle stochastic dynamics, we
propose a variational approach that infers constraint distributions, and we
demonstrate its performance by comparing it with other CIRL baselines on our
benchmark. The benchmark, including the information for reproducing the
performance of CIRL algorithms, is publicly available at
https://github.com/Guiliang/CIRL-benchmarks-public
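The abstract describes the CIRL setup only at a high level. As a rough sketch of what "learning a constraint function" from expert demonstrations can look like, the snippet below fits a small network that assigns low cost to expert state-action pairs and high cost to pairs drawn from a nominal policy; the architecture, the binary cross-entropy objective, and every name in it are illustrative assumptions, not the benchmark's or the paper's actual algorithms.

```python
# Hedged sketch of a generic CIRL inner step: fit a cost/constraint function
# c_theta(s, a) that separates expert state-action pairs (assumed feasible)
# from nominal-policy pairs (candidate violations). Illustration only.
import torch
import torch.nn as nn

class ConstraintNet(nn.Module):
    """c_theta(s, a) in (0, 1); values near 1 flag likely constraint violations."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def constraint_step(c_net, optimizer, expert_s, expert_a, nominal_s, nominal_a):
    """One gradient step: push expert pairs toward 0, nominal pairs toward 1."""
    bce = nn.functional.binary_cross_entropy
    loss = bce(c_net(expert_s, expert_a), torch.zeros(len(expert_s))) + \
           bce(c_net(nominal_s, nominal_a), torch.ones(len(nominal_s)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full CIRL loop this step would alternate with constrained policy optimization on the CMDP using the learned cost; the paper's variational approach maintains a distribution over constraints rather than a single point estimate, which the sketch above does not attempt.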
Related papers
- CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning [23.76366118253271]
Current solvers fail to produce efficient policies respecting hard constraints.
We present Constraints as terminations (CaT), a novel constrained RL algorithm.
Videos and code are available at https://constraints-as-terminations.io.
arXiv Detail & Related papers (2024-03-27T17:03:31Z)
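The CaT entry above gives only the name of the idea. One minimal, hypothetical way to read "constraints as terminations" is an environment wrapper that ends the episode when a user-supplied constraint is violated; the wrapper below illustrates that reading using the Gymnasium API and is not the CaT algorithm itself.

```python
# Minimal illustration (not the CaT implementation): end the episode whenever a
# user-supplied constraint is violated, so violations cut off all future reward.
import gymnasium as gym

class ConstraintTerminationWrapper(gym.Wrapper):
    def __init__(self, env, constraint_fn):
        super().__init__(env)
        self.constraint_fn = constraint_fn  # (obs, action) -> True if violated

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.constraint_fn(obs, action):
            terminated = True               # treat the violation as a terminal event
            info["constraint_violated"] = True
        return obs, reward, terminated, truncated, info

# Hypothetical usage: terminate when any torque command exceeds a limit.
# env = ConstraintTerminationWrapper(gym.make("Ant-v4"),
#                                    lambda obs, act: (abs(act) > 0.9).any())
```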
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
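The entry above mentions an adaptive action-quantization scheme without describing it. For reference, the snippet below shows the plainest, non-adaptive form of action quantization, a uniform per-dimension grid; it is a baseline illustration under stated assumptions, not the paper's proposed scheme.

```python
# Illustration only: a plain uniform (non-adaptive) action quantizer.
# The cited paper's adaptive scheme is not reproduced here.
import numpy as np

def make_uniform_quantizer(low, high, bins_per_dim: int = 11):
    """Return a function that snaps each continuous action dimension to the
    nearest of `bins_per_dim` evenly spaced grid values in [low, high]."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)

    def quantize(action):
        action = np.clip(np.asarray(action, dtype=float), low, high)
        frac = (action - low) / (high - low)        # normalize to [0, 1]
        idx = np.round(frac * (bins_per_dim - 1))   # nearest grid index
        return low + idx / (bins_per_dim - 1) * (high - low)

    return quantize

quantize = make_uniform_quantizer(low=[-1.0, -1.0], high=[1.0, 1.0], bins_per_dim=5)
print(quantize([0.13, -0.72]))   # -> [ 0.  -0.5]
```

Offline methods such as IQL or CQL would then act over the resulting discrete action set; the cited paper's contribution is choosing the discretization adaptively from data, which this sketch deliberately does not attempt.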
- Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints [9.293472255463454]
This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms.
We evaluate existing algorithms and their novel variants across multiple robotics control environments.
arXiv Detail & Related papers (2023-04-18T05:45:09Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have value in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z)
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
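The ImRE description above says rare events are sampled in proportion to their impact on the value function. The tabular sketch below illustrates that general style of update, using the TD-error magnitude as a stand-in for "impact" and an importance weight to keep the update unbiased; both choices are assumptions for illustration, not the published ImRE algorithm.

```python
# Simplified, assumption-laden sketch: Q-learning that replays rare-event
# transitions with probability proportional to their TD-error magnitude,
# reweighting each update to compensate. Not the ImRE algorithm.
import numpy as np

def replay_rare_events(Q, rare_buffer, alpha=0.1, gamma=0.99, n_updates=32,
                       rng=np.random.default_rng(0)):
    """Q: [num_states, num_actions] table; rare_buffer: list of (s, a, r, s_next)."""
    # Priority of each rare transition = current |TD error| (its "impact").
    td = np.array([abs(r + gamma * Q[s2].max() - Q[s, a])
                   for (s, a, r, s2) in rare_buffer])
    probs = (td + 1e-6) / (td + 1e-6).sum()

    for _ in range(n_updates):
        i = rng.choice(len(rare_buffer), p=probs)
        s, a, r, s2 = rare_buffer[i]
        # Importance weight: uniform target distribution over the buffer
        # divided by the prioritized sampling probability.
        w = (1.0 / len(rare_buffer)) / probs[i]
        target = r + gamma * Q[s2].max()
        Q[s, a] += alpha * w * (target - Q[s, a])
    return Q
```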
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning [43.562783189118]
We introduce a practical algorithm for incorporating human insight to speed learning.
Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy.
In all cases, CSRL learns a good policy faster than baselines.
arXiv Detail & Related papers (2021-12-30T22:02:42Z)
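The CSRL entry states that prior domain knowledge enters as constraints or restrictions on the policy. The snippet below shows the simplest reading of that idea, masking out actions that an expert-supplied rule forbids before acting greedily; it is a hypothetical illustration, not the CSRL algorithm.

```python
# Illustration under stated assumptions: apply prior domain knowledge as a hard
# restriction on which actions a tabular policy may select. Not CSRL itself.
import numpy as np

def constrained_greedy_action(Q, state, allowed_fn):
    """Pick the greedy action among those an expert-supplied rule allows.

    Q          : array of shape [num_states, num_actions]
    allowed_fn : (state, action) -> bool, encoding prior domain knowledge
    """
    q_row = Q[state].copy()
    for a in range(q_row.shape[0]):
        if not allowed_fn(state, a):
            q_row[a] = -np.inf          # rule out actions the constraint forbids
    return int(np.argmax(q_row))

# Hypothetical usage: forbid "accelerate" (action 2) in states flagged as risky.
risky_states = {3, 7}
allowed = lambda s, a: not (s in risky_states and a == 2)
Q = np.zeros((10, 4))
print(constrained_greedy_action(Q, state=3, allowed_fn=allowed))  # never returns 2
```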
- Deep RL With Information Constrained Policies: Generalization in Continuous Control [21.46148507577606]
We ask whether a natural constraint on information flow might confer generalization benefits onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z)
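The CLAC entry refers to a constraint on information flow. One common, generic way to impose such a capacity limit is to penalize the KL divergence between the state-conditioned policy and a state-independent action prior; the loss below adds that penalty to a policy-gradient objective as an illustration, and is not the CLAC implementation.

```python
# Generic illustration (not the CLAC implementation): limit the "information"
# a policy uses by penalizing KL(pi(.|s) || prior), where the prior is a
# state-independent action distribution. Larger beta -> stronger capacity limit.
import torch
import torch.nn.functional as F

def info_regularized_pg_loss(logits, actions, advantages, prior_logits, beta=0.1):
    """
    logits       : [B, A] policy logits at the visited states
    actions      : [B]    long tensor of actions taken
    advantages   : [B]    advantage estimates
    prior_logits : [A]    logits of a state-independent action prior
    """
    log_pi = F.log_softmax(logits, dim=-1)
    pg = -(advantages * log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)).mean()

    # KL(pi(.|s) || prior), averaged over the batch of states.
    log_prior = F.log_softmax(prior_logits, dim=-1)
    kl = (log_pi.exp() * (log_pi - log_prior)).sum(dim=-1).mean()

    return pg + beta * kl
```

Per the summary above, the claimed benefit of such a capacity limit is better generalization from training to modified test environments.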
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.