Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience
- URL: http://arxiv.org/abs/2208.04919v1
- Date: Tue, 9 Aug 2022 17:29:49 GMT
- Title: Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience
- Authors: Marwa Abdulhai, Natasha Jaques, Sergey Levine
- Abstract summary: inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
This paper addresses the problem of IRL -- inferring the reward function of an agent from observing its behavior.
- Score: 89.30876995059168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of inverse reinforcement learning (IRL) --
inferring the reward function of an agent from observing its behavior. IRL can
provide a generalizable and compact representation for apprenticeship learning,
and enable accurately inferring the preferences of a human in order to assist
them. %and provide for more accurate prediction. However, effective IRL is
challenging, because many reward functions can be compatible with an observed
behavior. We focus on how prior reinforcement learning (RL) experience can be
leveraged to make learning these preferences faster and more efficient. We
propose the IRL algorithm BASIS (Behavior Acquisition through Successor-feature
Intention inference from Samples), which leverages multi-task RL pre-training
and successor features to allow an agent to build a strong basis for intentions
that spans the space of possible goals in a given domain. When exposed to just
a few expert demonstrations optimizing a novel goal, the agent uses its basis
to quickly and effectively infer the reward function. Our experiments reveal
that our method is highly effective at inferring and optimizing demonstrated
reward functions, accurately inferring reward functions from less than 100
trajectories.
Related papers
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Learning and reusing primitive behaviours to improve Hindsight
Experience Replay sample efficiency [7.806014635635933]
We propose a method that uses primitive behaviours that have been previously learned to solve simple tasks.
This guidance is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed.
We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference [71.11416263370823]
We propose a generative inverse reinforcement learning for user behavioral preference modelling.
Our model can automatically learn the rewards from user's actions based on discriminative actor-critic network and Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Curious Exploration and Return-based Memory Restoration for Deep
Reinforcement Learning [2.3226893628361682]
In this paper, we focus on training a single agent to score goals with binary success/failure reward function.
The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
arXiv Detail & Related papers (2021-05-02T16:01:34Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called emphinverse temporal difference learning (ITD)
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $Psi Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - Inverse Reinforcement Learning via Matching of Optimality Profiles [2.561053769852449]
We propose an algorithm that learns a reward function from demonstrations of suboptimal or heterogeneous performance.
We show that our method is capable of learning reward functions such that policies trained to optimize them outperform the demonstrations used for fitting the reward functions.
arXiv Detail & Related papers (2020-11-18T13:23:43Z) - Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in Reinforcement learning (RL)
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm which is sample efficient as well as gives good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z) - Emergent Real-World Robotic Skills via Unsupervised Off-Policy
Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.