SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2311.02013v2
- Date: Thu, 29 Feb 2024 03:47:12 GMT
- Title: SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning
- Authors: Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy
Zhang, Scott Niekum
- Abstract summary: Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe.
- Score: 33.125187822259186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with
learning to achieve multiple goals in an environment purely from offline
datasets using sparse reward functions. Offline GCRL is pivotal for developing
generalist agents capable of leveraging pre-existing datasets to learn diverse
and reusable skills without hand-engineering reward functions. However,
contemporary approaches to GCRL based on supervised learning and contrastive
learning are often suboptimal in the offline setting. An alternative
perspective on GCRL optimizes for occupancy matching, but necessitates learning
a discriminator, which subsequently serves as a pseudo-reward for downstream
RL. Inaccuracies in the learned discriminator can cascade, negatively
influencing the resulting policy. We present a novel approach to GCRL under a
new lens of mixture-distribution matching, leading to our discriminator-free
method: SMORe. The key insight is combining the occupancy matching perspective
of GCRL with a convex dual formulation to derive a learning objective that can
better leverage suboptimal offline data. SMORe learns scores or unnormalized
densities representing the importance of taking an action at a state for
reaching a particular goal. SMORe is principled and our extensive experiments
on the fully offline GCRL benchmark composed of robot manipulation and
locomotion tasks, including high-dimensional observations, show that SMORe can
outperform state-of-the-art baselines by a significant margin.
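To make the occupancy-matching perspective concrete, the following is a minimal sketch of the convex-dual (DICE-style) reformulation that this line of work builds on; the notation (d^π, p*, d^O, R, V, f) is assumed here for illustration and is not necessarily SMORe's exact objective:

$$ \min_{\pi}\; D_f\!\left(d^{\pi}(s,a;g)\,\middle\|\,p^{*}(s,a;g)\right) \;\;\Longrightarrow\;\; \min_{V}\;(1-\gamma)\,\mathbb{E}_{s_0,\,g}\!\left[V(s_0;g)\right] + \mathbb{E}_{(s,a,s')\sim d^{O}}\!\left[f^{*}\!\big(R(s;g)+\gamma V(s';g)-V(s;g)\big)\right] $$

Here d^π is the policy's goal-conditioned occupancy, p* the target goal-reaching distribution, d^O the offline data distribution, f* the convex conjugate of f, and R a density-ratio pseudo-reward. Discriminator-based approaches estimate R with a learned classifier, which is the source of the cascading inaccuracies noted above; SMORe instead matches mixture distributions without a discriminator, and the learned quantity plays the role of the unnormalized goal-conditioned score described in the abstract.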
Related papers
- Accelerating Goal-Conditioned RL Algorithms and Research [17.155006770675904]
Self-supervised goal-conditioned reinforcement learning (GCRL) agents discover new behaviors by learning from the goals achieved during unstructured interaction with the environment.
However, these methods have not seen comparable success, due in part to a lack of data from slow environment simulations and a lack of stable algorithms.
We release a benchmark (JaxGCRL) for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU.
arXiv Detail & Related papers (2024-08-20T17:58:40Z)
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? [31.202506227437937]
Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully offline datasets.
We propose a new offline GCRL method, Generalizable Offline goAl-condiTioned RL (GOAT).
On a new benchmark containing 9 independent identically distributed (IID) tasks and 17 OOD tasks, GOAT outperforms current state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-05-30T09:26:32Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method for this benchmark, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive (a minimal illustrative sketch of this recipe appears after this list).
These results also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions [34.843526573355746]
Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but exhibit difficulty in generalizing to scenarios not seen during training.
We show that the performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of the similarity between observations.
We propose a new theoretically-motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior.
arXiv Detail & Related papers (2021-11-29T15:42:54Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience-picking strategy for imitating adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids merely learning a mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL [21.550201956884532]
A key challenge is to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
Many promising approaches to this challenge consider RL as a process of training two functions simultaneously.
We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations.
arXiv Detail & Related papers (2021-06-04T00:43:10Z)
- Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the variational empowerment objective.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z)
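The "supervised learning only" recipe referenced in the RvS entry above can be illustrated with a minimal goal-conditioned behavior-cloning sketch; the dimensions, two-hidden-layer architecture, and Gaussian-likelihood-as-MSE simplification below are illustrative assumptions, not the paper's exact setup.

# Minimal goal-conditioned behavior cloning (RvS-style sketch).
# Assumes continuous actions and a fixed-variance Gaussian policy, so
# maximizing likelihood reduces to mean-squared error on (state, goal) -> action.
import torch
import torch.nn as nn

STATE_DIM, GOAL_DIM, ACTION_DIM = 17, 3, 6    # illustrative sizes

policy = nn.Sequential(                       # small feedforward MLP (sizes assumed)
    nn.Linear(STATE_DIM + GOAL_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_update(states, goals, actions):
    """One likelihood-maximization step on an offline batch."""
    pred = policy(torch.cat([states, goals], dim=-1))
    loss = ((pred - actions) ** 2).mean()     # MSE == Gaussian log-likelihood up to constants
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a random placeholder batch standing in for an offline dataset:
B = 128
bc_update(torch.randn(B, STATE_DIM), torch.randn(B, GOAL_DIM), torch.randn(B, ACTION_DIM))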
This list is automatically generated from the titles and abstracts of the papers on this site.