Unsupervised Behavior Extraction via Random Intent Priors
- URL: http://arxiv.org/abs/2310.18687v1
- Date: Sat, 28 Oct 2023 12:03:34 GMT
- Title: Unsupervised Behavior Extraction via Random Intent Priors
- Authors: Hao Hu, Yiqin Yang, Jianing Ye, Ziqing Mai, Chongjie Zhang
- Abstract summary: UBER is an unsupervised approach to extract useful behaviors from offline reward-free datasets via diversified rewards.
We show that rewards generated from random neural networks are sufficient to extract diverse and useful behaviors.
- Score: 29.765683436971027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward-free data is abundant and contains rich prior knowledge of human
behaviors, but it is not well exploited by offline reinforcement learning (RL)
algorithms. In this paper, we propose UBER, an unsupervised approach to extract
useful behaviors from offline reward-free datasets via diversified rewards.
UBER assigns different pseudo-rewards sampled from a given prior distribution
to different agents to extract a diverse set of behaviors, and reuses them as
candidate policies to facilitate the learning of new tasks. Perhaps
surprisingly, we show that rewards generated from random neural networks are
sufficient to extract diverse and useful behaviors, some even close to expert
ones. We provide both empirical and theoretical evidence to justify the use of
random priors for the reward function. Experiments on multiple benchmarks
showcase UBER's ability to learn effective and diverse behavior sets that
enhance sample efficiency for online RL, outperforming existing baselines. By
reducing reliance on human supervision, UBER broadens the applicability of RL
to real-world scenarios with abundant reward-free data.
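The core mechanism described in the abstract can be sketched in a few lines: sample a frozen, randomly initialized reward network per agent (the "random intent prior") and use it to relabel the reward-free offline dataset with pseudo-rewards before running an ordinary offline RL algorithm. The snippet below is a minimal illustration of that idea, not the authors' implementation; the names (RandomRewardNet, relabel_dataset) and all hyperparameters are assumptions made for the sketch.

```python
# Minimal sketch of the random-intent-prior idea, NOT the paper's reference code.
import torch
import torch.nn as nn


class RandomRewardNet(nn.Module):
    """A frozen, randomly initialized network used as a pseudo-reward function."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        # Freeze the parameters: the random initialization itself acts as the intent prior.
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def relabel_dataset(obs: torch.Tensor, act: torch.Tensor, reward_fn: nn.Module) -> torch.Tensor:
    """Assign pseudo-rewards to a reward-free (obs, act) dataset."""
    with torch.no_grad():
        return reward_fn(obs, act)


if __name__ == "__main__":
    obs_dim, act_dim, n_transitions, n_agents = 17, 6, 1000, 4
    # Stand-in for an offline, reward-free dataset.
    obs = torch.randn(n_transitions, obs_dim)
    act = torch.randn(n_transitions, act_dim)

    # One random reward network per agent -> one relabeled dataset per agent.
    # Each relabeled dataset would then be passed to an off-the-shelf offline RL
    # algorithm to extract one behavior; the resulting policies form the
    # candidate set reused when learning a new task.
    for i in range(n_agents):
        pseudo_r = relabel_dataset(obs, act, RandomRewardNet(obs_dim, act_dim))
        print(f"agent {i}: pseudo-reward mean={pseudo_r.mean().item():.3f}, "
              f"std={pseudo_r.std().item():.3f}")
```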
Related papers
- Accelerating Exploration with Unlabeled Prior Data [66.43995032226466]
We study how prior data without reward labels may be used to guide and accelerate exploration for an agent solving a new sparse reward task.
We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses it alongside the online data for downstream policy and critic optimization.
arXiv Detail & Related papers (2023-11-09T00:05:17Z)
- Kernel Density Bayesian Inverse Reinforcement Learning [5.699034783029326]
Inverse reinforcement learning (IRL) methods infer an agent's reward function using demonstrations of expert behavior.
This work introduces a principled and theoretically grounded framework that enables Bayesian IRL to be applied across a variety of domains.
arXiv Detail & Related papers (2023-03-13T03:00:03Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that this exploration bonus, derived from uncertainty in the learned reward, improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
- On Reward-Free Reinforcement Learning with Linear Function Approximation [144.4210285338698]
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
arXiv Detail & Related papers (2020-06-19T17:59:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.