Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2510.07257v1
- Date: Wed, 08 Oct 2025 17:20:53 GMT
- Title: Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
- Authors: Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski,
- Abstract summary: Offline goal-conditioned reinforcement learning (GCRL) trains policies that reach user-specified goals at test time. We introduce Test-Time Graph Search (TTGS), a lightweight planning approach to solve the GCRL task. TTGS accepts any state-space distance or cost signal, builds a weighted graph over dataset states, and performs fast search to assemble a sequence of subgoals that a frozen policy executes.
- Score: 56.13800388912632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline goal-conditioned reinforcement learning (GCRL) trains policies that reach user-specified goals at test time, providing a simple, unsupervised, domain-agnostic way to extract diverse behaviors from unlabeled, reward-free datasets. Nonetheless, long-horizon decision making remains difficult for GCRL agents due to temporal credit assignment and error accumulation, and the offline setting amplifies these effects. To alleviate this issue, we introduce Test-Time Graph Search (TTGS), a lightweight planning approach to solve the GCRL task. TTGS accepts any state-space distance or cost signal, builds a weighted graph over dataset states, and performs fast search to assemble a sequence of subgoals that a frozen policy executes. When the base learner is value-based, the distance is derived directly from the learned goal-conditioned value function, so no handcrafted metric is needed. TTGS requires no changes to training, no additional supervision, no online interaction, and no privileged information, and it runs entirely at inference. On the OGBench benchmark, TTGS improves success rates of multiple base learners on challenging locomotion tasks, demonstrating the benefit of simple metric-guided test-time planning for offline GCRL.
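The pipeline the abstract describes can be illustrated with a small sketch: turn a goal-conditioned value into a distance, build a weighted graph over dataset states, search it, and hand the resulting subgoal sequence to a frozen policy. It rests on several assumptions, and none of it is taken from the paper. It assumes a reward of -1 per step, so a discounted value V can be inverted into an estimated step count d = log(1 + (1 - γ)V) / log γ. It uses Dijkstra as the "fast search", which the abstract leaves unspecified. The names `value_to_distance`, `toy_value`, `build_graph`, `plan_subgoals`, the pruning threshold `max_edge`, and γ = 0.99 are hypothetical.

```python
import heapq
import math

GAMMA = 0.99  # hypothetical discount factor of the value learner


def value_to_distance(v, gamma=GAMMA):
    """Invert V = -(1 - gamma**d) / (1 - gamma), i.e. an assumed -1 reward
    per step, to recover an estimated number of steps d."""
    inner = max(1.0 + (1.0 - gamma) * v, 1e-8)  # guard V below -1/(1-gamma)
    return math.log(inner) / math.log(gamma)


def toy_value(s, g, gamma=GAMMA):
    """Hypothetical stand-in for a learned V(s, g): exact for a -1-per-step
    reward when the step count equals the Euclidean distance."""
    d = math.dist(s, g)
    return -(1.0 - gamma ** d) / (1.0 - gamma)


def build_graph(states, value_fn, max_edge):
    """Weighted directed graph over states. Edges whose estimated distance
    exceeds max_edge are dropped (assumed: long-horizon values are less
    reliable). O(N^2) value calls, so only suitable for small N."""
    graph = {i: [] for i in range(len(states))}
    for i, s in enumerate(states):
        for j, g in enumerate(states):
            if i == j:
                continue
            d = value_to_distance(value_fn(s, g))
            if d <= max_edge:
                graph[i].append((j, d))
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns the node sequence from src to dst, or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def plan_subgoals(dataset_states, start, goal, value_fn, max_edge):
    """Add start and goal as extra nodes, search, and return the subgoals
    (excluding start) that a frozen policy would be conditioned on in order."""
    states = [start] + list(dataset_states) + [goal]
    graph = build_graph(states, value_fn, max_edge)
    path = shortest_path(graph, 0, len(states) - 1)
    if path is None:
        return [goal]  # fallback: condition the policy directly on the goal
    return [states[k] for k in path[1:]]
```

For example, with dataset states at x = 1..9 on a line, start (0, 0), goal (10, 0), and `max_edge=3.5`, the direct start-to-goal edge is pruned. The planner then returns a chain of intermediate dataset states ending at the goal, each at most 3.5 estimated steps from the previous one. Pruning long edges is one plausible design choice for trusting only short-horizon value estimates. The paper may handle this differently.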
Related papers
- GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL [64.8155693023222]
Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from a shortage of high-quality, action-aligned reasoning data. We present GUI-Libra, a tailored training recipe that addresses these challenges.
arXiv Detail & Related papers (2026-02-25T18:34:57Z)
- Few Shot Semi-Supervised Learning for Abnormal Stop Detection from Sparse GPS Trajectories [9.895353254067894]
Abnormal stop detection in intercity coach transportation is critical for ensuring passenger safety, operational reliability, and regulatory compliance. Existing methods often assume dense sampling or regular movement patterns, limiting their applicability. We propose a Sparsity-Aware (SAS) method that adaptively defines segment boundaries based on local spatial-temporal density.
arXiv Detail & Related papers (2025-10-14T16:22:34Z)
- Test-time Offline Reinforcement Learning on Goal-related Experience [50.94457794664909]
Research in foundation models has shown that performance can be substantially improved through test-time training. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out.
arXiv Detail & Related papers (2025-07-24T21:11:39Z)
- Clue-RAG: Towards Accurate and Cost-Efficient Graph-based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval [15.599544326509436]
Retrieval-Augmented Generation (RAG) addresses the limitation by incorporating external information, often from graph-structured data. We propose Clue-RAG, a novel approach that introduces a multi-partite graph index and a query-driven iterative retrieval strategy. Experiments on three QA benchmarks show that Clue-RAG significantly outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-07-11T09:36:45Z)
- Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning [15.902089688167871]
Offline goal-conditioned reinforcement learning (GCRL) offers a practical learning paradigm where goal-reaching policies are trained from abundant unlabeled datasets. We propose option-aware Temporally Abstracted value learning, dubbed OTA, which incorporates temporal abstraction into the temporal-difference learning process. We experimentally show that the high-level policy extracted using OTA achieves strong performance on complex tasks from OGBench.
arXiv Detail & Related papers (2025-05-19T05:51:11Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
Learning is framed as expectation maximization: the E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)