SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2311.02013v2
- Date: Thu, 29 Feb 2024 03:47:12 GMT
- Title: SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning
- Authors: Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy
Zhang, Scott Niekum
- Abstract summary: Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe.
- Score: 33.125187822259186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with
learning to achieve multiple goals in an environment purely from offline
datasets using sparse reward functions. Offline GCRL is pivotal for developing
generalist agents capable of leveraging pre-existing datasets to learn diverse
and reusable skills without hand-engineering reward functions. However,
contemporary approaches to GCRL based on supervised learning and contrastive
learning are often suboptimal in the offline setting. An alternative
perspective on GCRL optimizes for occupancy matching, but necessitates learning
a discriminator, which subsequently serves as a pseudo-reward for downstream
RL. Inaccuracies in the learned discriminator can cascade, negatively
influencing the resulting policy. We present a novel approach to GCRL under a
new lens of mixture-distribution matching, leading to our discriminator-free
method: SMORe. The key insight is combining the occupancy matching perspective
of GCRL with a convex dual formulation to derive a learning objective that can
better leverage suboptimal offline data. SMORe learns scores or unnormalized
densities representing the importance of taking an action at a state for
reaching a particular goal. SMORe is principled and our extensive experiments
on the fully offline GCRL benchmark composed of robot manipulation and
locomotion tasks, including high-dimensional observations, show that SMORe can
outperform state-of-the-art baselines by a significant margin.
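To make the occupancy-matching perspective concrete, the following is a minimal sketch of the convex-dual (DICE-style) reformulation that this line of work builds on; the notation (d^π, p*, d^O, R, V, f) is assumed here for illustration and is not necessarily SMORe's exact objective:

$$ \min_{\pi}\; D_f\!\left(d^{\pi}(s,a;g)\,\middle\|\,p^{*}(s,a;g)\right) \;\;\Longrightarrow\;\; \min_{V}\;(1-\gamma)\,\mathbb{E}_{s_0,\,g}\!\left[V(s_0;g)\right] + \mathbb{E}_{(s,a,s')\sim d^{O}}\!\left[f^{*}\!\big(R(s;g)+\gamma V(s';g)-V(s;g)\big)\right] $$

Here d^π is the policy's goal-conditioned occupancy, p* the target goal-reaching distribution, d^O the offline data distribution, f* the convex conjugate of f, and R a density-ratio pseudo-reward. Discriminator-based approaches estimate R with a learned classifier, which is the source of the cascading inaccuracies noted above; SMORe instead matches mixture distributions without a discriminator, and the learned quantity plays the role of the unnormalized goal-conditioned score described in the abstract.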
Related papers
- Accelerating Goal-Conditioned RL Algorithms and Research [17.155006770675904]
Self-supervised goal-conditioned reinforcement learning (GCRL) agents discover new behaviors by learning from the goals achieved during unstructured interaction with the environment.
However, these methods have not seen comparable success, due in part to a lack of data from slow environment simulations and a lack of stable algorithms.
We release a benchmark (JaxGCRL) for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU.
arXiv Detail & Related papers (2024-08-20T17:58:40Z)
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? [31.202506227437937]
Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully offline datasets.
We propose a new offline GCRL method, Generalizable Offline goAl-condiTioned RL (GOAT).
On a new benchmark containing 9 independent identically distributed (IID) tasks and 17 OOD tasks, GOAT outperforms current state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-05-30T09:26:32Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method for this benchmark, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive (a minimal illustrative sketch of this recipe appears after this list).
These results also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions [34.843526573355746]
Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but exhibit difficulty in generalizing to scenarios not seen during training.
We show that the performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of the similarity between observations.
We propose a new theoretically-motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior.
arXiv Detail & Related papers (2021-11-29T15:42:54Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience-picking strategy for imitating adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids merely learning a mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL [21.550201956884532]
A key challenge is to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
Many promising approaches to this challenge consider RL as a process of training two functions simultaneously.
We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations.
arXiv Detail & Related papers (2021-06-04T00:43:10Z)
- Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the variational empowerment objective.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z)
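The "supervised learning only" recipe referenced in the RvS entry above can be illustrated with a minimal goal-conditioned behavior-cloning sketch; the dimensions, two-hidden-layer architecture, and Gaussian-likelihood-as-MSE simplification below are illustrative assumptions, not the paper's exact setup.

# Minimal goal-conditioned behavior cloning (RvS-style sketch).
# Assumes continuous actions and a fixed-variance Gaussian policy, so
# maximizing likelihood reduces to mean-squared error on (state, goal) -> action.
import torch
import torch.nn as nn

STATE_DIM, GOAL_DIM, ACTION_DIM = 17, 3, 6    # illustrative sizes

policy = nn.Sequential(                       # small feedforward MLP (sizes assumed)
    nn.Linear(STATE_DIM + GOAL_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_update(states, goals, actions):
    """One likelihood-maximization step on an offline batch."""
    pred = policy(torch.cat([states, goals], dim=-1))
    loss = ((pred - actions) ** 2).mean()     # MSE == Gaussian log-likelihood up to constants
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a random placeholder batch standing in for an offline dataset:
B = 128
bc_update(torch.randn(B, STATE_DIM), torch.randn(B, GOAL_DIM), torch.randn(B, ACTION_DIM))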
This list is automatically generated from the titles and abstracts of the papers on this site.