PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning
- URL: http://arxiv.org/abs/2102.12560v1
- Date: Wed, 24 Feb 2021 21:12:09 GMT
- Title: PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning
- Authors: Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques,
Gregory Farquhar
- Abstract summary: We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi \Phi$-learning.
- Score: 102.36450942613091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study reinforcement learning (RL) with no-reward demonstrations, a setting
in which an RL agent has access to additional data from the interaction of
other agents with the same environment. However, it has no access to the
rewards or goals of these agents, and their objectives and levels of expertise
may vary widely. These assumptions are common in multi-agent settings, such as
autonomous driving. To effectively use this data, we turn to the framework of
successor features. This allows us to disentangle shared features and dynamics
of the environment from agent-specific rewards and policies. We propose a
multi-task inverse reinforcement learning (IRL) algorithm, called \emph{inverse
temporal difference learning} (ITD), that learns shared state features,
alongside per-agent successor features and preference vectors, purely from
demonstrations without reward labels. We further show how to seamlessly
integrate ITD with learning from online environment interactions, arriving at a
novel algorithm for reinforcement learning with demonstrations, called $\Psi
\Phi$-learning (pronounced `Sci-Fi'). We provide empirical evidence for the
effectiveness of $\Psi \Phi$-learning as a method for improving RL, IRL,
imitation, and few-shot transfer, and derive worst-case bounds for its
performance in zero-shot transfer to new tasks.
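To make the decomposition concrete, below is a minimal tabular sketch of the successor-feature view the abstract describes: a shared feature map phi(s, a), per-agent successor features Psi^k, and a per-agent preference vector w^k, with a TD-style update applied to reward-free demonstration transitions. The sizes, names, and the perceptron-style preference update are illustrative assumptions for exposition only, not the paper's implementation (which uses function approximation and a behavioural-cloning-style policy-matching objective).

import numpy as np

GAMMA = 0.99
N_STATES, N_ACTIONS, FEAT_DIM = 10, 4, 8                  # toy sizes (assumed)

rng = np.random.default_rng(0)
phi = rng.normal(size=(N_STATES, N_ACTIONS, FEAT_DIM))    # shared cumulant features Phi(s, a)
psi_k = np.zeros((N_STATES, N_ACTIONS, FEAT_DIM))         # successor features Psi^k for agent k
w_k = np.zeros(FEAT_DIM)                                  # preference vector w^k for agent k

def q_values(psi, w):
    # Q^k(s, a) = Psi^k(s, a) . w^k, assuming rewards are linear in phi
    return psi @ w                                         # (S, A, d) @ (d,) -> (S, A)

def itd_psi_update(psi, s, a, s_next, a_next, lr=0.1):
    # TD-style update on a reward-free demonstration transition:
    # bootstrap from the action the demonstrator actually took next
    target = phi[s, a] + GAMMA * psi[s_next, a_next]
    psi[s, a] += lr * (target - psi[s, a])

def preference_update(psi, w, s, a_demo, lr=0.05):
    # Perceptron-style stand-in for fitting w so the demonstrated action is
    # preferred under Q = Psi . w (an illustrative assumption, not the paper's loss)
    a_greedy = int(np.argmax(q_values(psi, w)[s]))
    if a_greedy != a_demo:
        w += lr * (psi[s, a_demo] - psi[s, a_greedy])

# Replay one reward-free demonstration transition from agent k (hypothetical values).
s, a, s_next, a_next = 0, 2, 3, 1
itd_psi_update(psi_k, s, a, s_next, a_next)
preference_update(psi_k, w_k, s, a_demo=a)

In the full $\Psi \Phi$-learning algorithm, the quantities recovered this way from other agents' demonstrations are combined with the ego-agent's own online interaction data.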
Related papers
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z)
- Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations [25.536792010283566]
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function based on collected expert demonstrations.
We introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework.
Our framework demonstrates significant performance improvements over previous SOTA methods.
arXiv Detail & Related papers (2023-10-13T02:38:35Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- XIRL: Cross-embodiment Inverse Reinforcement Learning [25.793366206387827]
We show that it is possible to automatically learn vision-based reward functions from cross-embodiment demonstration videos.
Specifically, we present a self-supervised method for Cross-embodiment Inverse Reinforcement Learning.
We find our learned reward function not only works for embodiments seen during training, but also generalizes to entirely new embodiments.
arXiv Detail & Related papers (2021-06-07T18:45:07Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allows the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically extract sub-goals from demonstrations for solving complex hierarchical tasks.
A solution based on our algorithm outperforms all submitted solutions in the well-known MineRL competition and enables the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)