PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning
- URL: http://arxiv.org/abs/2102.12560v1
- Date: Wed, 24 Feb 2021 21:12:09 GMT
- Title: PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning
- Authors: Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques,
Gregory Farquhar
- Abstract summary: We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi \Phi$-learning.
- Score: 102.36450942613091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study reinforcement learning (RL) with no-reward demonstrations, a setting
in which an RL agent has access to additional data from the interaction of
other agents with the same environment. However, it has no access to the
rewards or goals of these agents, and their objectives and levels of expertise
may vary widely. These assumptions are common in multi-agent settings, such as
autonomous driving. To effectively use this data, we turn to the framework of
successor features. This allows us to disentangle shared features and dynamics
of the environment from agent-specific rewards and policies. We propose a
multi-task inverse reinforcement learning (IRL) algorithm, called \emph{inverse
temporal difference learning} (ITD), that learns shared state features,
alongside per-agent successor features and preference vectors, purely from
demonstrations without reward labels. We further show how to seamlessly
integrate ITD with learning from online environment interactions, arriving at a
novel algorithm for reinforcement learning with demonstrations, called $\Psi
\Phi$-learning (pronounced `Sci-Fi'). We provide empirical evidence for the
effectiveness of $\Psi \Phi$-learning as a method for improving RL, IRL,
imitation, and few-shot transfer, and derive worst-case bounds for its
performance in zero-shot transfer to new tasks.
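To make the decomposition concrete, below is a minimal tabular sketch of the successor-feature view the abstract describes: a shared feature map phi(s, a), per-agent successor features Psi^k, and a per-agent preference vector w^k, with a TD-style update applied to reward-free demonstration transitions. The sizes, names, and the perceptron-style preference update are illustrative assumptions for exposition only, not the paper's implementation (which uses function approximation and a behavioural-cloning-style policy-matching objective).

import numpy as np

GAMMA = 0.99
N_STATES, N_ACTIONS, FEAT_DIM = 10, 4, 8                  # toy sizes (assumed)

rng = np.random.default_rng(0)
phi = rng.normal(size=(N_STATES, N_ACTIONS, FEAT_DIM))    # shared cumulant features Phi(s, a)
psi_k = np.zeros((N_STATES, N_ACTIONS, FEAT_DIM))         # successor features Psi^k for agent k
w_k = np.zeros(FEAT_DIM)                                  # preference vector w^k for agent k

def q_values(psi, w):
    # Q^k(s, a) = Psi^k(s, a) . w^k, assuming rewards are linear in phi
    return psi @ w                                         # (S, A, d) @ (d,) -> (S, A)

def itd_psi_update(psi, s, a, s_next, a_next, lr=0.1):
    # TD-style update on a reward-free demonstration transition:
    # bootstrap from the action the demonstrator actually took next
    target = phi[s, a] + GAMMA * psi[s_next, a_next]
    psi[s, a] += lr * (target - psi[s, a])

def preference_update(psi, w, s, a_demo, lr=0.05):
    # Perceptron-style stand-in for fitting w so the demonstrated action is
    # preferred under Q = Psi . w (an illustrative assumption, not the paper's loss)
    a_greedy = int(np.argmax(q_values(psi, w)[s]))
    if a_greedy != a_demo:
        w += lr * (psi[s, a_demo] - psi[s, a_greedy])

# Replay one reward-free demonstration transition from agent k (hypothetical values).
s, a, s_next, a_next = 0, 2, 3, 1
itd_psi_update(psi_k, s, a, s_next, a_next)
preference_update(psi_k, w_k, s, a_demo=a)

In the full $\Psi \Phi$-learning algorithm, the quantities recovered this way from other agents' demonstrations are combined with the ego-agent's own online interaction data.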
Related papers
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z)
- Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations [25.536792010283566]
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function based on collected expert demonstrations.
We introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework.
Our framework demonstrates significant performance improvements over previous SOTA methods.
arXiv Detail & Related papers (2023-10-13T02:38:35Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- XIRL: Cross-embodiment Inverse Reinforcement Learning [25.793366206387827]
We show that it is possible to automatically learn vision-based reward functions from cross-embodiment demonstration videos.
Specifically, we present a self-supervised method for Cross-embodiment Inverse Reinforcement Learning.
We find our learned reward function not only works for embodiments seen during training, but also generalizes to entirely new embodiments.
arXiv Detail & Related papers (2021-06-07T18:45:07Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allows the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically extract sub-goals from demonstrations for solving complex hierarchical tasks.
A solution based on our algorithm outperforms all submitted solutions in the well-known MineRL competition and enables the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)