How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach
- URL: http://arxiv.org/abs/2406.03812v2
- Date: Tue, 08 Oct 2024 17:16:10 GMT
- Title: How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach
- Authors: Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli
- Abstract summary: In online Inverse Reinforcement Learning (IRL), the learner collects samples about the dynamics of the environment to improve its estimate of the reward function.
We show that none of the algorithms available in the literature can scale to problems with large state spaces.
We introduce the novel framework of rewards compatibility, which generalizes the notion of feasible set.
- Score: 23.61332577985059
- Abstract: In online Inverse Reinforcement Learning (IRL), the learner can collect samples about the dynamics of the environment to improve its estimate of the reward function. Since IRL suffers from identifiability issues, many theoretical works on online IRL focus on estimating the entire set of rewards that explain the demonstrations, named the feasible reward set. However, none of the algorithms available in the literature can scale to problems with large state spaces. In this paper, we focus on the online IRL problem in Linear Markov Decision Processes (MDPs). We show that the structure offered by Linear MDPs is not sufficient for efficiently estimating the feasible set when the state space is large. As a consequence, we introduce the novel framework of rewards compatibility, which generalizes the notion of feasible set, and we develop CATY-IRL, a sample efficient algorithm whose complexity is independent of the cardinality of the state space in Linear MDPs. When restricted to the tabular setting, we demonstrate that CATY-IRL is minimax optimal up to logarithmic factors. As a by-product, we show that Reward-Free Exploration (RFE) enjoys the same worst-case rate, improving over the state-of-the-art lower bound. Finally, we devise a unifying framework for IRL and RFE that may be of independent interest.
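For context, here is a minimal sketch of the two objects the abstract builds on, assuming the standard definitions of a Linear MDP and of the feasible reward set; the paper's exact normalizations, and the rewards-compatibility generalization, are not reproduced here.
```latex
% A minimal sketch under standard conventions; notation is illustrative.
% Linear MDP: transitions and rewards are linear in a known feature map
% \phi : S x A -> R^d.
\[
  P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle,
  \qquad
  r_h(s, a) = \langle \phi(s, a), \theta_h \rangle .
\]
% Feasible reward set: all rewards under which the expert policy \pi^E
% is optimal, i.e., all rewards that explain the demonstrations:
\[
  \mathcal{R}_{\mathrm{feas}}
  = \Bigl\{ r \;:\; \pi^E \in \operatorname*{arg\,max}_{\pi} J(\pi; r) \Bigr\},
\]
% where J(\pi; r) denotes the expected return of policy \pi under reward r.
```
Estimating $\mathcal{R}_{\mathrm{feas}}$ exactly is what becomes intractable when the state space is large, which is what motivates the relaxed rewards-compatibility objective behind CATY-IRL.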
Related papers
- Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms [23.61332577985059]
Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior.
This paper introduces a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting.
arXiv Detail & Related papers (2024-02-23T15:49:46Z)
- Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL), the problem of learning reward functions from demonstrations of an expert policy, plays a critical role in developing intelligent systems.
This paper provides the first line of results on efficient IRL in vanilla offline and online settings, using polynomially many samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z)
- When is Agnostic Reinforcement Learning Statistically Tractable? [76.1408672715773]
A new complexity measure, called the spanning capacity, depends solely on the policy set $\Pi$ and is independent of the MDP dynamics.
We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn.
This reveals a surprising separation for learnability between generative access and online access models.
arXiv Detail & Related papers (2023-10-09T19:40:54Z)
- The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning [43.9624940128166]
This paper explains the benefits of distributional reinforcement learning (DistRL) through the lens of small-loss bounds.
In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation (sketched after this entry).
In offline RL, we show that pessimistic DistRL enjoys small-loss PAC bounds that are novel to the offline setting and are more robust to bad single-policy coverage.
arXiv Detail & Related papers (2023-05-25T04:19:43Z)
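The confidence-set construction mentioned in the entry above can be sketched as follows; this is one illustrative reading of "confidence sets via maximum likelihood estimation" with hypothetical helper names (`models`, `log_likelihood`), not the paper's exact construction.
```python
import numpy as np

def mle_confidence_set(models, log_likelihood, data, beta):
    """Keep every candidate model whose log-likelihood on the data is
    within beta of the maximum-likelihood model (an MLE confidence set)."""
    ll = np.array([log_likelihood(m, data) for m in models])
    return [m for m, l in zip(models, ll) if l >= ll.max() - beta]
```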
- Towards Theoretical Understanding of Inverse Reinforcement Learning [45.3190496371625]
Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a reward function justifying the behavior demonstrated by an expert agent.
In this paper, we take a step toward closing the theory gap of IRL in the case of finite-horizon problems with a generative model.
arXiv Detail & Related papers (2023-04-25T16:21:10Z)
- Sample Efficient Deep Reinforcement Learning via Local Planning [21.420851589712626]
This work focuses on sample-efficient deep reinforcement learning (RL) with a simulator that can be reset to previously observed states.
We propose an algorithmic framework, named uncertainty-first local planning (UFLP), that takes advantage of this reset capability (see the sketch after this entry).
We demonstrate that this simple procedure can dramatically improve the sample cost of several baseline RL algorithms on difficult exploration tasks.
arXiv Detail & Related papers (2023-01-29T23:17:26Z)
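A minimal sketch of the reset rule described in the UFLP entry above, assuming a simulator exposing `reset()` and `reset_to(state)`; the helper names and the uncertainty scoring are hypothetical, not the authors' implementation.
```python
import random

def uflp_reset(simulator, visited_states, uncertainty, p_local=0.5):
    """Start an episode either from the initial state distribution or,
    with probability p_local, from the previously observed state whose
    value estimate is most uncertain (a local-planning style reset)."""
    if visited_states and random.random() < p_local:
        state = max(visited_states, key=uncertainty)  # most uncertain state
        simulator.reset_to(state)  # assumes the simulator supports local resets
        return state
    return simulator.reset()  # standard reset to the initial distribution
```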
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution (sketched after this entry), and then performs pessimistic value iteration based on the learned proxy rewards.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDP with trajectory-wise reward.
arXiv Detail & Related papers (2022-06-13T19:11:22Z)
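A minimal sketch of the least-squares reward redistribution step from the PARTED entry above, assuming a linear reward model over per-step features; the ridge term and all names are illustrative, not the paper's exact estimator.
```python
import numpy as np

def redistribute_rewards(trajectories, returns, lam=1.0):
    """Fit theta so that per-step proxy rewards phi(s_h, a_h) @ theta
    sum (approximately) to each observed trajectory return, via
    ridge-regularized least squares on the summed features.

    trajectories: list of (H, d) arrays of per-step features.
    returns: list of scalar trajectory returns."""
    d = trajectories[0].shape[1]      # feature dimension
    A = lam * np.eye(d)               # ridge regularizer
    b = np.zeros(d)
    for phi, ret in zip(trajectories, returns):
        z = phi.sum(axis=0)           # features summed along the trajectory
        A += np.outer(z, z)
        b += ret * z
    theta = np.linalg.solve(A, b)
    proxy = [phi @ theta for phi in trajectories]  # per-step proxy rewards
    return proxy, theta
```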
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
It is known that, in the tabular setting, reward-free RL is statistically harder than reward-aware RL; we show that this separation does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization [43.51553742077343]
Inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration.
This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions consistent with the expert demonstrations.
arXiv Detail & Related papers (2020-11-17T10:17:45Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high-dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose a doubly growing epoch schedule and estimate the parameter using the best subset selection method (sketched after this entry).
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
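A minimal sketch of the best subset selection step named in the entry above; exhaustive search over size-k supports is shown for clarity (it is combinatorial in d choose k), and all names are illustrative rather than the authors' implementation.
```python
from itertools import combinations
import numpy as np

def best_subset_estimate(X, y, k):
    """Return the k-sparse least-squares estimate: search every support of
    size k and keep the one with the smallest residual sum of squares."""
    n, d = X.shape
    best_rss, best_support, best_coef = np.inf, None, None
    for support in combinations(range(d), k):
        Xs = X[:, support]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ coef) ** 2)
        if rss < best_rss:
            best_rss, best_support, best_coef = rss, support, coef
    beta = np.zeros(d)                 # dense coefficient vector
    beta[list(best_support)] = best_coef
    return beta
```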
- On Reward-Free Reinforcement Learning with Linear Function Approximation [144.4210285338698]
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
arXiv Detail & Related papers (2020-06-19T17:59:36Z)