Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms
- URL: http://arxiv.org/abs/2402.15392v2
- Date: Thu, 6 Jun 2024 06:49:52 GMT
- Title: Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms
- Authors: Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli
- Abstract summary: Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior.
This paper introduces a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting.
- Score: 23.61332577985059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior. It is well known that the IRL problem is fundamentally ill-posed, i.e., many reward functions can explain the demonstrations. For this reason, IRL has recently been reframed in terms of estimating the feasible reward set (Metelli et al., 2021), thus postponing the selection of a single reward. However, so far, the available formulations and algorithmic solutions have been proposed and analyzed mainly for the online setting, where the learner can interact with the environment and query the expert at will. This is clearly unrealistic in most practical applications, where the availability of an offline dataset is a much more common scenario. In this paper, we introduce a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting, and we analyze the complexity of its estimation. This requires the introduction of an original learning framework that copes with the intrinsic difficulty of the setting, in which the data coverage is not under control. We then propose two computationally and statistically efficient algorithms, IRLO and PIRLO, for addressing the problem. In particular, the latter adopts a specific form of pessimism to enforce the novel, desirable property of inclusion monotonicity of the delivered feasible set. With this work, we aim to provide a panorama of the challenges of the offline IRL problem and of how they can be fruitfully addressed.
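For orientation, here is a minimal sketch of the two notions the abstract leans on, reconstructed from the abstract above and the feasible-set formulation of Metelli et al. (2021); the notation is illustrative, not the paper's exact definitions.

```latex
% Informal sketch: a reward r is feasible when the expert policy \pi^E
% is optimal for the MDP \mathcal{M} equipped with r.
\mathcal{R}_{\mathrm{feas}}
  = \Bigl\{\, r \;:\; \pi^E \in \operatorname*{arg\,max}_{\pi}
      \mathbb{E}_{\pi}\Bigl[\textstyle\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\Bigr] \Bigr\}
% One natural reading of inclusion monotonicity for an estimate
% \widehat{\mathcal{R}} built from offline data: the delivered set
% never contains an infeasible reward,
\widehat{\mathcal{R}} \subseteq \mathcal{R}_{\mathrm{feas}}
```

In a degenerate toy problem with gamma = 0 (so Q(s, a) = r(s, a)), feasibility reduces to the expert action being greedy in every state, and pessimism can be mimicked by demanding a positive optimality margin. The hypothetical snippet below (the name `is_feasible` and the margin mechanism are invented for illustration, not PIRLO's actual construction) shows how such a margin keeps the delivered set inside the exact one:

```python
import numpy as np
from itertools import product

def is_feasible(r, expert, margin=0.0):
    """gamma = 0 toy: r is feasible iff the expert's action is greedy
    in every state; margin > 0 is a pessimistic tightening."""
    for s, a_star in enumerate(expert):
        rivals = np.delete(r[s], a_star)          # rewards of the other actions
        if r[s, a_star] < rivals.max() + margin:  # expert action must win
            return False
    return True

expert = [0, 1]                    # expert's action in each of 2 states
grid = np.linspace(-1.0, 1.0, 5)   # candidate reward values per (s, a) pair
candidates = [np.array(v).reshape(2, 2)
              for v in product(grid, repeat=4)]   # 2 states x 2 actions

exact = {i for i, r in enumerate(candidates) if is_feasible(r, expert)}
pessimistic = {i for i, r in enumerate(candidates)
               if is_feasible(r, expert, margin=0.5)}
print(pessimistic <= exact)  # True: the pessimistic set sits inside the exact one
```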
Related papers
- Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback [45.82577700155503]
ONI is a distributed architecture that simultaneously learns an RL policy and an intrinsic reward function.
Our approach achieves state-of-the-art performance across a range of challenging, sparse reward tasks from the NetHack Learning Environment.
arXiv Detail & Related papers (2024-10-30T13:52:43Z)
- Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z)
- How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach [23.61332577985059]
In Inverse Reinforcement Learning (IRL), the learner collects samples to improve its estimate of the reward function.
We show that none of the algorithms available in the literature can scale to problems with large state spaces.
We introduce the novel framework of rewards compatibility, which generalizes the notion of feasible set.
arXiv Detail & Related papers (2024-06-06T07:37:42Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results on efficient IRL in vanilla offline and online settings using polynomially many samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z)
- Adversarial Imitation Learning On Aggregated Data [0.0]
Inverse Reinforcement Learning (IRL) learns an optimal policy, given some expert demonstrations, thus avoiding the need for the tedious process of specifying a suitable reward function.
We propose an approach that removes these requirements through a dynamic, adaptive method called Adversarial Imitation Learning on Aggregated Data (AILAD).
It jointly learns both a nonlinear reward function and the associated optimal policy using an adversarial framework.
arXiv Detail & Related papers (2023-11-14T22:13:38Z)
- Accelerating Exploration with Unlabeled Prior Data [66.43995032226466]
We study how prior data without reward labels may be used to guide and accelerate exploration for an agent solving a new sparse reward task.
We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses it concurrently alongside the online data for downstream policy and critic optimization; a toy sketch of the optimistic labeling step appears after this entry.
arXiv Detail & Related papers (2023-11-09T00:05:17Z)
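As a concrete, hypothetical illustration of the optimistic-labeling step described in the "Accelerating Exploration with Unlabeled Prior Data" entry above, the sketch below uses an ensemble of reward models: each unlabeled prior transition receives the ensemble mean plus a bonus proportional to ensemble disagreement. The function and parameter names (`label_prior_data`, `bonus_scale`) are invented for the example, not taken from the paper.

```python
import numpy as np

def label_prior_data(prior_obs, reward_ensemble, bonus_scale=1.0):
    """Assign optimistic reward labels to unlabeled prior transitions.

    prior_obs:       (N, obs_dim) array of unlabeled observations
    reward_ensemble: callables mapping (N, obs_dim) -> (N,) reward
                     predictions, trained on reward-labeled online data
    bonus_scale:     weight of the optimism bonus (ensemble disagreement)
    """
    preds = np.stack([r(prior_obs) for r in reward_ensemble])  # (E, N)
    mean, std = preds.mean(axis=0), preds.std(axis=0)
    # Optimism: uncertain transitions get inflated labels, drawing the
    # downstream policy and critic toward under-explored regions.
    return mean + bonus_scale * std

# Usage with a toy ensemble of random linear reward models:
rng = np.random.default_rng(0)
obs = rng.normal(size=(5, 3))  # 5 unlabeled observations
ensemble = [lambda x, w=rng.normal(size=3): x @ w for _ in range(4)]
print(label_prior_data(obs, ensemble, bonus_scale=0.5).shape)  # (5,)
```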
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Task-Guided IRL in POMDPs that Scales [22.594913269327353]
In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts.
Most IRL techniques require repeatedly solving the computationally expensive forward problem -- computing an optimal policy given a reward function -- in POMDPs.
We develop an algorithm that reduces this computational burden while increasing data efficiency.
arXiv Detail & Related papers (2022-12-30T21:08:57Z)
- Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality [57.91411772725183]
In this paper, we consider the offline stochastic shortest path (SSP) problem when the state space and the action space are finite.
We design simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
arXiv Detail & Related papers (2022-06-10T07:44:56Z)
- Offline Inverse Reinforcement Learning [24.316047317028147]
The goal of offline RL is to learn optimal policies when a fixed dataset of exploratory demonstrations is available.
Inspired by the success of IRL techniques in achieving state-of-the-art imitation performance in online settings, we exploit GAN-based data augmentation procedures to construct the first offline IRL algorithm; a hypothetical sketch of this augmentation idea follows this entry.
arXiv Detail & Related papers (2021-06-09T13:44:06Z)
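In the spirit of the GAN-based augmentation mentioned in the "Offline Inverse Reinforcement Learning" entry above, a minimal PyTorch sketch might fit a small generator to expert state-action pairs and sample it to enlarge the fixed offline dataset. Everything here (`augment_with_gan`, network sizes, hyperparameters) is an illustrative assumption, not the paper's actual procedure.

```python
import torch
import torch.nn as nn

def augment_with_gan(expert_sa, n_synthetic=256, latent_dim=8, steps=2000, lr=1e-3):
    """Fit a small GAN to expert (state, action) vectors, then sample
    synthetic vectors to augment a fixed offline dataset (sketch)."""
    dim = expert_sa.shape[1]
    G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, dim))
    D = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()

    for _ in range(steps):
        real = expert_sa[torch.randint(len(expert_sa), (64,))]
        fake = G(torch.randn(64, latent_dim))
        # Discriminator step: push real pairs toward 1, generated toward 0.
        loss_d = bce(D(real), torch.ones(64, 1)) + \
                 bce(D(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # Generator step: make generated pairs look real to D.
        loss_g = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    with torch.no_grad():
        synthetic = G(torch.randn(n_synthetic, latent_dim))
    return torch.cat([expert_sa, synthetic])  # augmented (state, action) dataset

# Usage on toy 4-dimensional (state, action) vectors:
augmented = augment_with_gan(torch.randn(128, 4), n_synthetic=64, steps=200)
print(augmented.shape)  # torch.Size([192, 4])
```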
This list is automatically generated from the titles and abstracts of the papers on this site.