Inverse Reinforcement Learning without Reinforcement Learning
- URL: http://arxiv.org/abs/2303.14623v4
- Date: Mon, 29 Jan 2024 19:18:42 GMT
- Title: Inverse Reinforcement Learning without Reinforcement Learning
- Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
- Abstract summary: Inverse Reinforcement Learning (IRL) aims to learn a reward function that rationalizes expert demonstrations.
Traditional IRL methods require repeatedly solving a hard reinforcement learning problem as a subroutine.
This is counter-intuitive from a reductions viewpoint: the easier problem of imitation learning is reduced to repeatedly solving the harder problem of RL.
- Score: 40.7783129322142
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Inverse Reinforcement Learning (IRL) is a powerful set of techniques for
imitation learning that aims to learn a reward function that rationalizes
expert demonstrations. Unfortunately, traditional IRL methods suffer from a
computational weakness: they require repeatedly solving a hard reinforcement
learning (RL) problem as a subroutine. This is counter-intuitive from the
viewpoint of reductions: we have reduced the easier problem of imitation
learning to repeatedly solving the harder problem of RL. Another thread of work
has proved that access to the side-information of the distribution of states
where a strong policy spends time can dramatically reduce the sample and
computational complexities of solving an RL problem. In this work, we
demonstrate for the first time a more informed imitation learning reduction
where we utilize the state distribution of the expert to alleviate the global
exploration component of the RL subroutine, providing an exponential speedup in
theory. In practice, we find that we are able to significantly speed up the
prior art on continuous control tasks.
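To make the reduction concrete, below is a minimal Python sketch of the core idea: during each inner policy-improvement step of IRL, rollouts are reset to states sampled from the expert's state distribution rather than the environment's start-state distribution, so the learner never has to solve a global exploration problem. The `env.reset_to_state` helper, the `policy`/`reward_fn` interfaces, and all hyperparameters are illustrative assumptions, not the authors' actual code.

```python
import random


def inner_rl_with_expert_resets(env, policy, reward_fn, expert_states,
                                num_rollouts=100, horizon=200):
    """One policy-improvement phase under the current learned reward.

    Rollouts start from states the expert visited (``expert_states``),
    so the inner RL solver only needs local, not global, exploration.
    ``env.reset_to_state`` is an assumed helper for a resettable simulator.
    """
    batch = []
    for _ in range(num_rollouts):
        obs = env.reset_to_state(random.choice(expert_states))
        for _ in range(horizon):
            action = policy.act(obs)
            next_obs, _, done, _ = env.step(action)
            # Relabel the transition with the current learned reward,
            # as in the inner loop of IRL.
            batch.append((obs, action, reward_fn(obs, action), next_obs, done))
            if done:
                break
            obs = next_obs
    policy.update(batch)  # any off-the-shelf policy improvement step
    return policy
```

In the outer loop, a standard IRL reward update (e.g., a discriminator or cost update against the expert demonstrations) would alternate with calls to this routine.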
Related papers
- Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control [12.832009040635462]
Reinforcement learning (RL) has proven to perform well and be general-purpose in the inventory control (IC) domain.
Online experience is expensive to acquire in real-world applications.
Online experience may not reflect the true demand due to the lost sales phenomenon typical in IC.
arXiv Detail & Related papers (2024-06-26T13:52:47Z) - Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
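As a rough illustration of the hybrid idea (a sketch, not the paper's algorithm): each update draws a minibatch that mixes the learner's own online transitions with expert transitions, keeping updates anchored to expert-covered states and curtailing unnecessary exploration. The buffer format and the mixing ratio below are assumptions.

```python
import random


def hybrid_minibatch(online_buffer, expert_buffer, batch_size=256, expert_frac=0.5):
    """Sample a minibatch that mixes the learner's online transitions with
    expert transitions; downstream actor/critic updates consume it unchanged."""
    n_expert = int(batch_size * expert_frac)
    n_online = batch_size - n_expert
    return (random.sample(expert_buffer, n_expert)
            + random.sample(online_buffer, n_online))
```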
arXiv Detail & Related papers (2024-02-13T23:29:09Z) - The Virtues of Pessimism in Inverse Reinforcement Learning [38.98656220917943]
Inverse Reinforcement Learning is a powerful framework for learning complex behaviors from expert demonstrations.
It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL.
We consider an alternative approach to speeding up the RL in IRL: pessimism, i.e., staying close to the expert's data distribution, instantiated via the use of offline RL algorithms.
arXiv Detail & Related papers (2024-02-04T21:22:29Z) - Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping [42.09724642733125]
This work motivates the use of potential-based reward shaping to reduce the computational burden of each RL sub-problem.
This work serves as a proof of concept, and we hope it will inspire future developments towards computationally efficient IRL.
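For reference, potential-based shaping adds gamma * phi(s') - phi(s) to the reward, which preserves the optimal policy of each RL sub-problem while making it easier to solve. The sketch below shows this standard form (Ng et al., 1999); the potential function `phi` is left as an assumed, user-supplied heuristic.

```python
def shaped_reward(reward, state, next_state, phi, gamma=0.99):
    """Potential-based reward shaping:
    r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s).
    Shaping of this form leaves the optimal policy unchanged."""
    return reward + gamma * phi(next_state) - phi(state)
```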
arXiv Detail & Related papers (2023-12-15T17:50:18Z) - RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive.
The experiments also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
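Read literally, the discovery policy's objective combines the inverse model's prediction error with a $k$-step learning-progress bonus; a loose sketch of such an intrinsic signal is given below. All names and the exact form of the bonus are assumptions, not the paper's definitions.

```python
def discovery_signal(loss_history, current_loss, k=5):
    """Intrinsic signal for a discovery policy: the inverse model's current
    prediction error plus a k-step learning-progress bonus (how much that
    error has dropped over the last k updates)."""
    progress = (loss_history[-k] - loss_history[-1]) if len(loss_history) >= k else 0.0
    return current_loss + progress
```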
arXiv Detail & Related papers (2021-09-28T10:11:07Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - On the Theory of Reinforcement Learning with Once-per-Episode Feedback [120.5537226120512]
We introduce a theory of reinforcement learning in which the learner receives feedback only once at the end of an episode.
This is arguably more representative of real-world applications than the traditional requirement that the learner receive feedback at every time step.
arXiv Detail & Related papers (2021-05-29T19:48:51Z)