Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2511.07730v2
- Date: Fri, 14 Nov 2025 08:49:48 GMT
- Title: Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning
- Authors: Bill Chunyuan Zheng, Vivek Myers, Benjamin Eysenbach, Sergey Levine
- Abstract summary: The key question is how to estimate the temporal distance between pairs of observations. We show how these approaches can be integrated into a practical GCRL method that fits a quasimetric distance. We also demonstrate that our method can enable stitching in the real-world robotic manipulation domain.
- Score: 72.24831946301613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning how to reach goals in an environment is a longstanding challenge in AI, yet reasoning over long horizons remains difficult for modern methods. The key question is how to estimate the temporal distance between pairs of observations. While temporal difference methods leverage local updates to provide optimality guarantees, they often perform worse than Monte Carlo methods that perform global updates (e.g., with multi-step returns), which lack such guarantees. We show how these approaches can be integrated into a practical GCRL method that fits a quasimetric distance using a multistep Monte Carlo return. We show our method outperforms existing GCRL methods on long-horizon simulated tasks with up to 4000 steps, even with visual observations. We also demonstrate that our method can enable stitching in the real-world robotic manipulation domain (Bridge setup). Our approach is the first end-to-end GCRL method that enables multistep stitching in this real-world manipulation domain from an unlabeled offline dataset of visual observations.
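To make the abstract's core recipe concrete, below is a minimal PyTorch sketch, not the paper's implementation: the simple relu-gap quasimetric head, the margin-based contrastive term, and all names (QuasimetricNet, multistep_loss) are my assumptions. It fits an asymmetric distance so that an observed k-step Monte Carlo path length upper-bounds d(s_t, s_{t+k}), while distances to random goals are kept from collapsing.

```python
# Minimal sketch, assuming a relu-gap quasimetric head; the paper's
# actual parameterization and objective may differ.
import torch
import torch.nn as nn

class QuasimetricNet(nn.Module):
    """Asymmetric distance d(s, g) >= 0 with d(s, s) = 0."""
    def __init__(self, obs_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )

    def forward(self, s, g):
        zs, zg = self.encoder(s), self.encoder(g)
        # A sum of relu(zs - zg) satisfies the triangle inequality but is
        # deliberately asymmetric in (s, g): a quasimetric.
        return torch.relu(zs - zg).sum(dim=-1)

def multistep_loss(dist_fn, s_t, s_tk, k, rand_g, margin=50.0):
    # An observed k-step path gives an upper bound: d(s_t, s_{t+k}) <= k.
    pos = torch.relu(dist_fn(s_t, s_tk) - k).pow(2).mean()
    # Contrastive term: keep distances to random goals from collapsing.
    neg = torch.relu(margin - dist_fn(s_t, rand_g)).mean()
    return pos + neg

d = QuasimetricNet(obs_dim=8)
s, future = torch.randn(32, 8), torch.randn(32, 8)
multistep_loss(d, s, future, k=10.0, rand_g=torch.randn(32, 8)).backward()
```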
Related papers
- Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions [17.674265727888063]
We introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment. We perform a systematic evaluation across 6 different domains, demonstrating that our approach outperforms existing aggregation strategies.
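For the value-interpolation idea above, here is a toy scikit-learn sketch with 1-D actions; the kernel, noise level, and all values are illustrative assumptions, not the paper's choices.

```python
# Toy sketch: interpolate value estimates for untried continuous actions
# from trialed neighbors using GP regression.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

tried_actions = np.array([[-1.0], [0.0], [0.5], [1.0]])  # actions trialed at the root
observed_values = np.array([0.1, 0.4, 0.7, 0.3])         # their Monte Carlo value estimates

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)
gp.fit(tried_actions, observed_values)

# A promising but untried action gets a value estimate plus an uncertainty.
mean, std = gp.predict(np.array([[0.7]]), return_std=True)
```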
arXiv Detail & Related papers (2025-12-10T15:09:06Z)
- Offline Goal-conditioned Reinforcement Learning with Quasimetric Representations [72.24831946301613]
Approaches for goal-conditioned reinforcement learning (GCRL) often use learned state representations to extract goal-reaching policies. We propose an approach that unifies these two frameworks, using the structure of a quasimetric representation space (triangle inequality) with the right additional constraints to learn successor representations that enable optimal goal-reaching. Our approach is able to exploit a **quasimetric** distance parameterization to learn **optimal** goal-reaching distances, even with **suboptimal** data and in **stochastic** environments.
arXiv Detail & Related papers (2025-09-24T18:45:32Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
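As a rough stand-in for the adaptive scheme (which the paper learns; the k-means codebook and the names below are hypothetical), action quantization can be pictured as clustering the dataset's continuous actions into a discrete codebook:

```python
# Illustrative only: k-means discretization of continuous dataset actions.
import numpy as np
from sklearn.cluster import KMeans

def build_action_codebook(dataset_actions, n_bins=32):
    """Cluster dataset actions; each cluster center is one discrete action."""
    return KMeans(n_clusters=n_bins, n_init=10).fit(dataset_actions).cluster_centers_

def quantize(action, codebook):
    """Index of the nearest discrete action."""
    return int(np.argmin(np.linalg.norm(codebook - action, axis=1)))
```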
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
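Schematically, the hierarchical decomposition can be read as a high-level policy that outputs a latent subgoal ("state as action") and a low-level policy conditioned on it; the module names and sizes below are assumptions, not HIQL's actual architecture.

```python
# Schematic two-level goal-conditioned policy; illustrative only.
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """pi_high(z_subgoal | s, g): proposes an intermediate latent state."""
    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
    def forward(self, s, g):
        return self.net(torch.cat([s, g], dim=-1))

class LowLevelPolicy(nn.Module):
    """pi_low(a | s, z_subgoal): reaches the nearby subgoal."""
    def __init__(self, obs_dim, latent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))
    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))
```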
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution by using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
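The sampling step can be sketched as stochastic gradient Langevin dynamics on the Q-network's parameters: a TD-loss gradient step plus Gaussian noise scaled by sqrt(2 * step_size), which yields approximate posterior samples. The step size and names are my assumptions, not the paper's exact update.

```python
# Schematic SGLD step on a Q-network; illustrative only.
import math
import torch

def langevin_step(q_net, td_loss, step_size=1e-4):
    td_loss.backward()
    with torch.no_grad():
        for p in q_net.parameters():
            noise = torch.randn_like(p) * math.sqrt(2.0 * step_size)
            p.add_(-step_size * p.grad + noise)  # descent + injected noise
            p.grad = None
```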
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
- C-Learning: Horizon-Aware Cumulative Accessibility Estimation [29.588146016880284]
We introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon.
We show that these functions obey a recurrence relation, which enables learning from offline interactions.
We evaluate our approach on a set of multi-goal discrete and continuous control tasks.
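One way to picture the recurrence is a tabular backup of "goal reachable within h steps"; the formulation below is my hedged reading for a known transition structure, whereas the paper learns the function from offline interactions with function approximation.

```python
# Tabular illustration: C[h][s] is True iff goal is reachable from s
# within h steps; C[h] follows from C[h-1] by a one-step backup.
def cumulative_accessibility(transitions, goal, max_horizon):
    """transitions: dict mapping each state to its successor states."""
    states = set(transitions) | {goal}
    for succs in transitions.values():
        states |= set(succs)
    C = {0: {s: (s == goal) for s in states}}
    for h in range(1, max_horizon + 1):
        C[h] = {s: C[h - 1][s] or any(C[h - 1][s2] for s2 in transitions.get(s, ()))
                for s in states}
    return C  # monotone in h, as a cumulative function should be
```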
arXiv Detail & Related papers (2020-11-24T20:34:31Z)
- Learning to Optimize Non-Rigid Tracking [54.94145312763044]
We employ learnable optimizations to improve robustness and speed up solver convergence.
First, we upgrade the tracking objective by integrating an alignment data term on deep features which are learned end-to-end through a CNN.
Second, we bridge the gap between the preconditioning technique and learning method by introducing a ConditionNet which is trained to generate a preconditioner.
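A minimal sketch of the learned-preconditioner idea: a small network (standing in for ConditionNet; the architecture and names are assumptions) maps per-variable features to a positive diagonal scaling applied to the solver's update direction.

```python
# Illustrative learned diagonal preconditioner; not the paper's design.
import torch
import torch.nn as nn

class ConditionNetSketch(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 1), nn.Softplus())
    def forward(self, features):
        # One positive scale per unknown; Softplus keeps the diagonal valid.
        return self.net(features).squeeze(-1)

def preconditioned_step(x, grad, precond):
    return x - precond * grad  # element-wise rescaled descent step
```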
arXiv Detail & Related papers (2020-03-27T04:40:57Z)
- Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
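A hypothetical reading of the adaptive hybrid objective: rank candidates by predicted reward plus an annealed ensemble-disagreement bonus. The schedule and all names are my assumptions, not RP1's actual objective.

```python
# Hypothetical acquisition: exploit predicted reward, explore where the
# model ensemble disagrees, and anneal the exploration weight over time.
import numpy as np

def hybrid_objective(pred_rewards, ensemble_preds, step, total_steps):
    beta = 1.0 - step / total_steps            # exploration weight decays
    disagreement = ensemble_preds.std(axis=0)  # shape: (n_candidates,)
    return pred_rewards + beta * disagreement
```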
arXiv Detail & Related papers (2020-02-07T09:57:53Z)