Accelerating Inverse Reinforcement Learning with Expert Bootstrapping
- URL: http://arxiv.org/abs/2402.02608v1
- Date: Sun, 4 Feb 2024 20:49:53 GMT
- Title: Accelerating Inverse Reinforcement Learning with Expert Bootstrapping
- Authors: David Wu and Sanjiban Choudhury
- Abstract summary: We show that better utilization of expert demonstrations can reduce the need for hard exploration in the inner RL loop.
Specifically, we propose two simple recipes: (1) placing expert transitions into the replay buffer of the inner RL algorithm (e.g. Soft Actor-Critic), which directly informs the learner about high reward states instead of forcing the learner to discover them through extensive exploration, and (2) using expert actions in Q value bootstrapping to improve the target Q value estimates and more accurately describe high value expert states.
- Score: 13.391861125428234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing inverse reinforcement learning methods (e.g. MaxEntIRL, $f$-IRL)
search over candidate reward functions and solve a reinforcement learning
problem in the inner loop. This creates a rather strange inversion where a
harder problem, reinforcement learning, is in the inner loop of a presumably
easier problem, imitation learning. In this work, we show that better
utilization of expert demonstrations can reduce the need for hard exploration
in the inner RL loop, hence accelerating learning. Specifically, we propose two
simple recipes: (1) placing expert transitions into the replay buffer of the
inner RL algorithm (e.g. Soft Actor-Critic), which directly informs the learner
about high reward states instead of forcing the learner to discover them
through extensive exploration, and (2) using expert actions in Q value
bootstrapping in order to improve the target Q value estimates and more
accurately describe high value expert states. Our methods show significant
gains over a MaxEntIRL baseline on the benchmark MuJoCo suite of tasks,
speeding up recovery to 70% of deterministic expert performance by 2.13x on
HalfCheetah-v2, 2.6x on Ant-v2, 18x on Hopper-v2, and 3.36x on Walker2d-v2.
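To make the two recipes concrete, here is a minimal sketch (not the authors' code) in PyTorch-style Python. The `ReplayBuffer`, `policy`, `target_q_net`, and `reward_fn` names are hypothetical stand-ins for the corresponding pieces of a SAC learner inside an IRL loop; states and actions are assumed to be stored as torch tensors, and the expert's next action is assumed to be available for expert transitions.

```python
# Hedged sketch of the two recipes, assuming a SAC-style inner-loop learner.
# `policy`, `target_q_net`, and `reward_fn` are hypothetical stand-ins, not the
# paper's implementation; states/actions are assumed to be torch tensors.
import random
from collections import deque

import torch


class ReplayBuffer:
    """FIFO buffer of (s, a, r, s_next, done, a_next_expert, is_expert) tuples."""

    def __init__(self, capacity=1_000_000):
        self.storage = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done, a_next_expert=None, is_expert=False):
        if a_next_expert is None:           # placeholder for learner transitions
            a_next_expert = torch.zeros_like(a)
        self.storage.append((s, a, r, s_next, done, a_next_expert, is_expert))

    def sample(self, batch_size):
        s, a, r, s2, d, a2e, e = zip(*random.sample(self.storage, batch_size))
        return (torch.stack(s), torch.stack(a),
                torch.tensor(r, dtype=torch.float32),
                torch.stack(s2), torch.tensor(d, dtype=torch.float32),
                torch.stack(a2e), torch.tensor(e))


# Recipe (1): place expert transitions directly into the inner RL algorithm's
# replay buffer, labeled with the *current* learned reward (the true reward is
# unknown in IRL), so the learner sees high-reward expert states without
# having to find them through exploration.
def seed_with_expert(buffer, expert_transitions, reward_fn):
    # expert_transitions: iterable of (s, a, s_next, a_next, done) tuples
    for s, a, s_next, a_next, done in expert_transitions:
        buffer.add(s, a, reward_fn(s, a), s_next, done,
                   a_next_expert=a_next, is_expert=True)


# Recipe (2): when computing the SAC bootstrap target, use the expert's own
# next action on expert transitions instead of an action sampled from the
# current policy, giving better target Q estimates at high-value expert states.
def q_targets(batch, policy, target_q_net, alpha=0.2, gamma=0.99):
    s, a, r, s_next, done, a_next_expert, is_expert = batch
    a_next, log_pi = policy.sample(s_next)                   # a' ~ pi(.|s')
    q_policy = target_q_net(s_next, a_next) - alpha * log_pi # entropy-regularized
    q_expert = target_q_net(s_next, a_next_expert)           # bootstrap with a'_E
    q_next = torch.where(is_expert.unsqueeze(-1), q_expert, q_policy)
    return r.unsqueeze(-1) + gamma * (1.0 - done).unsqueeze(-1) * q_next
```

In an outer IRL loop, one would presumably call `seed_with_expert` whenever the learned reward is updated (or relabel the stored expert rewards in place), and use `q_targets` in place of the standard SAC critic target.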
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z) - Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures.
We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
arXiv Detail & Related papers (2024-11-11T14:05:50Z) - RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z) - The Virtues of Pessimism in Inverse Reinforcement Learning [38.98656220917943]
Inverse Reinforcement Learning is a powerful framework for learning complex behaviors from expert demonstrations.
It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL.
We consider an alternative approach to speeding up the RL in IRL: pessimism, i.e., staying close to the expert's data distribution, instantiated via the use of offline RL algorithms.
arXiv Detail & Related papers (2024-02-04T21:22:29Z) - Inverse Reinforcement Learning without Reinforcement Learning [40.7783129322142]
Inverse Reinforcement Learning (IRL) aims to learn a reward function that rationalizes expert demonstrations.
Traditional IRL methods require repeatedly solving a hard reinforcement learning problem as a subroutine.
This reduces the easier problem of imitation learning to repeatedly solving the harder problem of RL.
arXiv Detail & Related papers (2023-03-26T04:35:53Z) - Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum
Games [30.720112378448285]
We formulate inverse reinforcement learning as an expert-learner interaction.
The optimal performance intent of an expert or target agent is unknown to a learner agent.
We develop an off-policy IRL algorithm that does not require knowledge of the expert and learner agent dynamics.
arXiv Detail & Related papers (2023-01-05T10:35:08Z) - Bayesian Q-learning With Imperfect Expert Demonstrations [56.55609745121237]
We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations.
We evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards.
arXiv Detail & Related papers (2022-10-01T17:38:19Z) - Rewarding Episodic Visitation Discrepancy for Exploration in
Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive.
We also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Discriminator Soft Actor Critic without Extrinsic Rewards [0.30586855806896046]
It is difficult to imitate well in unknown states given only a small amount of expert and sampled data.
We propose Discriminator Soft Actor Critic (DSAC) to make imitation learning more robust to distribution shift.
arXiv Detail & Related papers (2020-01-19T10:45:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.