Related papers: Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience

URL: http://arxiv.org/abs/2208.04919v1
Date: Tue, 9 Aug 2022 17:29:49 GMT
Title: Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Authors: Marwa Abdulhai, Natasha Jaques, Sergey Levine
Abstract summary: inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior. This paper addresses the problem of IRL -- inferring the reward function of an agent from observing its behavior.
Score: 89.30876995059168
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior. IRL can provide a generalizable and compact representation for apprenticeship learning, and enable accurately inferring the preferences of a human in order to assist them. %and provide for more accurate prediction. However, effective IRL is challenging, because many reward functions can be compatible with an observed behavior. We focus on how prior reinforcement learning (RL) experience can be leveraged to make learning these preferences faster and more efficient. We propose the IRL algorithm BASIS (Behavior Acquisition through Successor-feature Intention inference from Samples), which leverages multi-task RL pre-training and successor features to allow an agent to build a strong basis for intentions that spans the space of possible goals in a given domain. When exposed to just a few expert demonstrations optimizing a novel goal, the agent uses its basis to quickly and effectively infer the reward function. Our experiments reveal that our method is highly effective at inferring and optimizing demonstrated reward functions, accurately inferring reward functions from less than 100 trajectories.

Related papers

Residual Reward Models for Preference-based Reinforcement Learning [11.797520525358564]
Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify.<n>PbRL can suffer from slow convergence speed since it requires training in a reward model.<n>We propose a method to effectively leverage prior knowledge with a Residual Reward Model (RRM)
arXiv Detail & Related papers (2025-07-01T09:43:57Z)
RILe: Reinforced Imitation Learning [60.63173816209543]
RILe (Reinforced Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently. Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world. Current methods to mitigate this misalignment work by learning reward functions from human preferences. We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency [7.806014635635933]
We propose a method that uses primitive behaviours that have been previously learned to solve simple tasks. This guidance is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed. We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z)
PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning. We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning for user behavioral preference modelling. Our model can automatically learn the rewards from user's actions based on discriminative actor-critic network and Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning [2.3226893628361682]
In this paper, we focus on training a single agent to score goals with binary success/failure reward function. The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
arXiv Detail & Related papers (2021-05-02T16:01:34Z)
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called emphinverse temporal difference learning (ITD) We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $Psi Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
Inverse Reinforcement Learning via Matching of Optimality Profiles [2.561053769852449]
We propose an algorithm that learns a reward function from demonstrations of suboptimal or heterogeneous performance. We show that our method is capable of learning reward functions such that policies trained to optimize them outperform the demonstrations used for fitting the reward functions.
arXiv Detail & Related papers (2020-11-18T13:23:43Z)
Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in Reinforcement learning (RL) We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm which is sample efficient as well as gives good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z)
Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks. We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible. We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.