Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
- URL: http://arxiv.org/abs/2402.17135v1
- Date: Tue, 27 Feb 2024 01:59:02 GMT
- Title: Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
- Authors: Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
- Abstract summary: We present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem.
Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples.
We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks.
- Score: 107.1837163643886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can we pre-train a generalist agent from a large amount of unlabeled offline
trajectories such that it can be immediately adapted to any new downstream
tasks in a zero-shot manner? In this work, we present a functional reward
encoding (FRE) as a general, scalable solution to this zero-shot RL problem.
Our main idea is to learn functional representations of any arbitrary tasks by
encoding their state-reward samples using a transformer-based variational
auto-encoder. This functional encoding not only enables the pre-training of an
agent from a wide diversity of general unsupervised reward functions, but also
provides a way to solve any new downstream tasks in a zero-shot manner, given a
small number of reward-annotated samples. We empirically show that FRE agents
trained on diverse random unsupervised reward functions can generalize to solve
novel tasks in a range of simulated robotic benchmarks, often outperforming
previous zero-shot RL and offline RL methods. Code for this project is provided
at: https://github.com/kvfrans/fre
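
The abstract describes the core mechanism of FRE: a transformer-based variational auto-encoder that maps a set of state-reward samples to a latent task vector, which then conditions the agent. Below is a minimal PyTorch sketch of that idea, not the authors' implementation (their code is in the linked repository); the class names FunctionalRewardEncoder and RewardDecoder and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a functional reward encoding (FRE), assuming PyTorch.
# Names and hyperparameters are hypothetical; see github.com/kvfrans/fre
# for the authors' actual implementation.

import torch
import torch.nn as nn


class FunctionalRewardEncoder(nn.Module):
    """Encodes a set of (state, reward) samples into a latent task vector z."""

    def __init__(self, state_dim, latent_dim=64, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim + 1, d_model)  # concatenate state and reward
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)

    def forward(self, states, rewards):
        # states: (batch, n_samples, state_dim), rewards: (batch, n_samples, 1)
        tokens = self.embed(torch.cat([states, rewards], dim=-1))
        pooled = self.encoder(tokens).mean(dim=1)              # pool over the sample set
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return z, mu, logvar


class RewardDecoder(nn.Module):
    """Predicts the reward of a query state under the task encoded by z."""

    def __init__(self, state_dim, latent_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query_states, z):
        # query_states: (batch, n_queries, state_dim); broadcast z over the queries
        z = z.unsqueeze(1).expand(-1, query_states.shape[1], -1)
        return self.net(torch.cat([query_states, z], dim=-1))


def vae_loss(decoder, encoder_out, query_states, query_rewards, kl_weight=1e-3):
    """Reward-reconstruction plus KL objective for pre-training encoder and decoder."""
    z, mu, logvar = encoder_out
    recon = ((decoder(query_states, z) - query_rewards) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon + kl_weight * kl
```

In this sketch, zero-shot adaptation would amount to encoding a small batch of reward-annotated states into a latent z and feeding it to a pre-trained z-conditioned policy; the policy and the unsupervised reward-sampling procedure are omitted for brevity.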
Related papers
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback [24.759613248409167]
Reward engineering has long been a challenge in Reinforcement Learning research.
We propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks.
We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains.
arXiv Detail & Related papers (2024-02-06T04:06:06Z)
- Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design the token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Does Zero-Shot Reinforcement Learning Exist? [11.741744003560095]
A zero-shot RL agent is an agent that can solve any RL task instantly with no additional planning or learning.
This marks a shift from the reward-centric RL paradigm towards "controllable" agents.
Strategies for approximate zero-shot RL have been suggested using successor features (SFs) or forward-backward (FB) representations.
arXiv Detail & Related papers (2022-09-29T16:54:05Z)
- Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning [20.12564350629561]
Reinforcement learning (RL) has drawn increasing interest in recent years due to its tremendous success in various applications.
Standard RL algorithms can only be applied to a single reward function and cannot adapt to an unseen reward function quickly.
We advocate a general operator view of reinforcement learning, which enables us to directly approximate the operator that maps from reward function to value function.
arXiv Detail & Related papers (2022-01-01T19:52:38Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
arXiv Detail & Related papers (2020-02-26T18:57:05Z)