Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
- URL: http://arxiv.org/abs/2402.17135v1
- Date: Tue, 27 Feb 2024 01:59:02 GMT
- Title: Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
- Authors: Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
- Abstract summary: We present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem.
Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples.
We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks.
- Score: 107.1837163643886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can we pre-train a generalist agent from a large amount of unlabeled offline
trajectories such that it can be immediately adapted to any new downstream
tasks in a zero-shot manner? In this work, we present a functional reward
encoding (FRE) as a general, scalable solution to this zero-shot RL problem.
Our main idea is to learn functional representations of any arbitrary tasks by
encoding their state-reward samples using a transformer-based variational
auto-encoder. This functional encoding not only enables the pre-training of an
agent from a wide diversity of general unsupervised reward functions, but also
provides a way to solve any new downstream tasks in a zero-shot manner, given a
small number of reward-annotated samples. We empirically show that FRE agents
trained on diverse random unsupervised reward functions can generalize to solve
novel tasks in a range of simulated robotic benchmarks, often outperforming
previous zero-shot RL and offline RL methods. Code for this project is provided
at: https://github.com/kvfrans/fre
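
The abstract describes the core mechanism of FRE: a transformer-based variational auto-encoder that maps a set of state-reward samples to a latent task vector, which then conditions the agent. Below is a minimal PyTorch sketch of that idea, not the authors' implementation (their code is in the linked repository); the class names FunctionalRewardEncoder and RewardDecoder and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a functional reward encoding (FRE), assuming PyTorch.
# Names and hyperparameters are hypothetical; see github.com/kvfrans/fre
# for the authors' actual implementation.

import torch
import torch.nn as nn


class FunctionalRewardEncoder(nn.Module):
    """Encodes a set of (state, reward) samples into a latent task vector z."""

    def __init__(self, state_dim, latent_dim=64, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim + 1, d_model)  # concatenate state and reward
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)

    def forward(self, states, rewards):
        # states: (batch, n_samples, state_dim), rewards: (batch, n_samples, 1)
        tokens = self.embed(torch.cat([states, rewards], dim=-1))
        pooled = self.encoder(tokens).mean(dim=1)              # pool over the sample set
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return z, mu, logvar


class RewardDecoder(nn.Module):
    """Predicts the reward of a query state under the task encoded by z."""

    def __init__(self, state_dim, latent_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query_states, z):
        # query_states: (batch, n_queries, state_dim); broadcast z over the queries
        z = z.unsqueeze(1).expand(-1, query_states.shape[1], -1)
        return self.net(torch.cat([query_states, z], dim=-1))


def vae_loss(decoder, encoder_out, query_states, query_rewards, kl_weight=1e-3):
    """Reward-reconstruction plus KL objective for pre-training encoder and decoder."""
    z, mu, logvar = encoder_out
    recon = ((decoder(query_states, z) - query_rewards) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon + kl_weight * kl
```

In this sketch, zero-shot adaptation would amount to encoding a small batch of reward-annotated states into a latent z and feeding it to a pre-trained z-conditioned policy; the policy and the unsupervised reward-sampling procedure are omitted for brevity.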
Related papers
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback [24.759613248409167]
Reward engineering has long been a challenge in Reinforcement Learning research.
We propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks.
We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains.
arXiv Detail & Related papers (2024-02-06T04:06:06Z)
- Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design the token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Does Zero-Shot Reinforcement Learning Exist? [11.741744003560095]
A zero-shot RL agent is an agent that can solve any RL task instantly with no additional planning or learning.
This marks a shift from the reward-centric RL paradigm towards "controllable" agents.
Strategies for approximate zero-shot RL have been suggested using successor features (SFs) or forward-backward (FB) representations.
arXiv Detail & Related papers (2022-09-29T16:54:05Z)
- Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning [20.12564350629561]
Reinforcement learning (RL) has drawn increasing interest in recent years due to its tremendous success in various applications.
Standard RL algorithms can only be applied to a single reward function and cannot adapt to an unseen reward function quickly.
We advocate a general operator view of reinforcement learning, which enables us to directly approximate the operator that maps from reward function to value function.
arXiv Detail & Related papers (2022-01-01T19:52:38Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
arXiv Detail & Related papers (2020-02-26T18:57:05Z)