Test-time Offline Reinforcement Learning on Goal-related Experience
- URL: http://arxiv.org/abs/2507.18809v1
- Date: Thu, 24 Jul 2025 21:11:39 GMT
- Title: Test-time Offline Reinforcement Learning on Goal-related Experience
- Authors: Marco Bagatella, Mert Albaba, Jonas Hübotter, Georg Martius, Andreas Krause
- Abstract summary: Research in foundation models has shown that performance can be substantially improved through test-time training. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out.
- Score: 50.94457794664909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models compress a large amount of information in a single, large neural network, which can then be queried for individual tasks. There are strong parallels between this widespread framework and offline goal-conditioned reinforcement learning algorithms: a universal value function is trained on a large number of goals, and the policy is evaluated on a single goal in each test episode. Extensive research in foundation models has shown that performance can be substantially improved through test-time training, specializing the model to the current goal. We find similarly that test-time offline reinforcement learning on experience related to the test goal can lead to substantially better policies at minimal compute costs. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state and quality with respect to the evaluation goal. We demonstrate across a wide range of high-dimensional loco-navigation and manipulation tasks that fine-tuning a policy on the selected data for a few gradient steps leads to significant performance gains over standard offline pre-training. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out. Finally, we study compute allocation at inference, demonstrating that, at comparable costs, GC-TTT induces performance gains that are not achievable by scaling model size.
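The abstract describes the GC-TTT loop at a high level: rank offline transitions by their relevance to the current state and their quality with respect to the evaluation goal, fine-tune the pre-trained policy on the selected subset for a few gradient steps, and repeat in a receding-horizon fashion as the episode is rolled out. The sketch below is a minimal illustration of that loop, not the paper's implementation; the scoring rule, the behavioral-cloning fine-tuning loss, and all names (select_goal_related, gc_ttt_rollout, env.max_steps, the gym-style step API) are assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def select_goal_related(dataset, state, goal, value_fn, k=256):
    """Hypothetical selection criterion: rank offline transitions by
    (i) relevance, proximity of their state to the current state, and
    (ii) quality, the value of their action with respect to the evaluation goal."""
    states, actions = dataset["states"], dataset["actions"]          # (N, d), (N, a)
    relevance = -torch.cdist(state.unsqueeze(0), states).squeeze(0)  # closer -> higher
    quality = value_fn(states, actions, goal.expand(len(states), -1))
    idx = (relevance + quality).topk(k).indices
    return states[idx], actions[idx]

def gc_ttt_rollout(env, policy, value_fn, dataset, goal, horizon=50,
                   ttt_steps=20, lr=1e-4):
    """Receding-horizon test-time training: every `horizon` steps, fine-tune a
    fresh copy of the pre-trained policy on goal-related data, then act with it."""
    state = env.reset()
    for t in range(env.max_steps):                        # env.max_steps is assumed
        if t % horizon == 0:
            ft_policy = copy.deepcopy(policy)             # restart from pre-trained weights
            opt = torch.optim.Adam(ft_policy.parameters(), lr=lr)
            s_sel, a_sel = select_goal_related(dataset, state, goal, value_fn)
            g_sel = goal.expand(len(s_sel), -1)
            for _ in range(ttt_steps):                    # only a few gradient steps
                loss = F.mse_loss(ft_policy(s_sel, g_sel), a_sel)  # BC-style loss (assumption)
                opt.zero_grad(); loss.backward(); opt.step()
        action = ft_policy(state.unsqueeze(0), goal.unsqueeze(0)).squeeze(0)
        state, _, done, _ = env.step(action.detach())     # gym-style step API (assumption)
        if done:
            break
```

In this sketch, fine-tuning restarts from the pre-trained weights at every re-planning point rather than accumulating updates across the rollout; whether the paper does the same is not stated in the abstract.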
Related papers
- A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning [37.62558445850573]
We propose an algorithm, iterative influence-based filtering (IIF), for online RL training. IIF reduces sample complexity, speeds up training, and achieves higher returns. These results advance interpretability, efficiency, and effectiveness of online RL.
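The summary names the mechanism only at a high level, so the following is a generic influence-style filtering sketch rather than the paper's IIF estimator: each stored transition is scored by the alignment between its loss gradient and the gradient of a validation objective, and only the highest-scoring fraction is kept for the next round of updates. The gradient-dot-product proxy and all names are assumptions.

```python
import torch

def flat_grad(loss, params):
    """Flatten the gradient of a scalar loss w.r.t. the given parameters."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def filter_buffer(model, loss_fn, buffer, val_batch, keep_ratio=0.8):
    """Influence-style filtering (a stand-in, not the paper's algorithm):
    score each stored transition by the dot product between its loss gradient
    and the gradient of a validation objective, then keep the top fraction."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_val = flat_grad(loss_fn(model, val_batch), params)
    scores = [torch.dot(flat_grad(loss_fn(model, [tr]), params), g_val).item()
              for tr in buffer]
    k = max(1, int(len(buffer) * keep_ratio))
    keep = sorted(range(len(buffer)), key=scores.__getitem__, reverse=True)[:k]
    return [buffer[i] for i in keep]
```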
arXiv Detail & Related papers (2025-05-25T19:25:57Z)
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges training-required and training-free approaches.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
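As a rough illustration of the key-value memory mentioned above, the sketch below caches normalized features as keys and pseudo-labels as values, and answers a query by similarity-weighted retrieval. It is a generic training-free cache in that spirit, not BoostAdapter's actual data structure; every name and the blending step in the usage comment are hypothetical.

```python
import torch
import torch.nn.functional as F

class KeyValueCache:
    """Minimal key-value feature memory for test-time retrieval (illustrative only)."""
    def __init__(self, max_size=64):
        self.keys, self.values, self.max_size = [], [], max_size

    def add(self, feature, pseudo_label):
        # Keys are L2-normalized features; values are (soft) pseudo-labels.
        self.keys.append(F.normalize(feature, dim=-1))
        self.values.append(pseudo_label)
        if len(self.keys) > self.max_size:        # drop the oldest entry
            self.keys.pop(0); self.values.pop(0)

    def retrieve(self, feature, temperature=0.1):
        if not self.keys:
            return None
        K = torch.stack(self.keys)                # (N, d)
        V = torch.stack(self.values)              # (N, C)
        sims = F.normalize(feature, dim=-1) @ K.T          # cosine similarities
        weights = torch.softmax(sims / temperature, dim=-1)
        return weights @ V                        # similarity-weighted label estimate

# Hypothetical usage: blend the zero-shot prediction with the cache retrieval,
# e.g. logits = zero_shot_logits + alpha * cache.retrieve(image_feature)
```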
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
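The summary points to a Contrastive Predictive Coding-style objective for identifying episode-level non-stationarity. The sketch below shows one generic way such a contrastive loss can be set up, treating two segments from the same episode as a positive pair; it illustrates the general idea, not the paper's loss, and the encoder and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def infonce_episode_loss(encoder, segments_a, segments_b, temperature=0.1):
    """Illustrative contrastive objective: two segments from the same episode form
    a positive pair; segments from other episodes in the batch act as negatives.

    segments_a, segments_b: tensors of shape (B, T, obs_dim), aligned by episode."""
    z_a = F.normalize(encoder(segments_a), dim=-1)    # (B, latent_dim)
    z_b = F.normalize(encoder(segments_b), dim=-1)    # (B, latent_dim)
    logits = z_a @ z_b.T / temperature                # (B, B) similarity matrix
    targets = torch.arange(len(z_a), device=logits.device)
    return F.cross_entropy(logits, targets)           # diagonal entries are positives

# The inferred episode latent z = encoder(segment) would then condition a
# latent-aware policy pi(a | s, z) during both training and evaluation.
```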
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels [6.372625755672473]
We propose a new method for approximating active learning acquisition strategies that are based on retraining with hypothetically-labeled candidate data points.
Although this is usually infeasible with deep networks, we use the neural tangent kernel to approximate the result of retraining.
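The summary describes the core trick: instead of actually retraining on hypothetically-labeled candidates, approximate the retrained network with its linearization and use the empirical neural tangent kernel to predict post-retraining outputs. The sketch below illustrates that approximation for a scalar-output regression model; the naive Jacobian loop, the ridge term, and all names are assumptions, not the paper's implementation.

```python
import torch

def empirical_ntk(model, xs_a, xs_b):
    """Empirical NTK for a scalar-output model: K[i, j] = <df(x_i)/dtheta, df(x_j)/dtheta>.
    A naive per-example loop; fine for a sketch, not for large models."""
    def flat_grad(x):
        out = model(x.unsqueeze(0)).squeeze()     # assumes scalar output
        g = torch.autograd.grad(out, list(model.parameters()))
        return torch.cat([v.reshape(-1) for v in g])
    Ga = torch.stack([flat_grad(x) for x in xs_a])
    Gb = torch.stack([flat_grad(x) for x in xs_b])
    return Ga @ Gb.T

def lookahead_predictions(model, x_cand, y_hyp, x_query, ridge=1e-3):
    """Approximate what the model would predict on x_query *after* being retrained
    on the hypothetically-labeled candidates (x_cand, y_hyp), using the linearized
    (NTK) model instead of actually retraining."""
    with torch.no_grad():
        f_cand = model(x_cand).squeeze(-1)
        f_query = model(x_query).squeeze(-1)
    K_cc = empirical_ntk(model, x_cand, x_cand)
    K_qc = empirical_ntk(model, x_query, x_cand)
    delta = torch.linalg.solve(K_cc + ridge * torch.eye(len(x_cand)), y_hyp - f_cand)
    return f_query + K_qc @ delta
```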
arXiv Detail & Related papers (2022-06-25T06:13:27Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
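As a rough sketch of the E-step described above, the code below builds a graph over dataset states, searches for a shortest path from the current state to the goal, and returns the intermediate states as waypoints. The radius-based graph construction and all names are assumptions, and the M-step is only indicated in a comment.

```python
import numpy as np
import networkx as nx

def build_state_graph(states, radius=1.0):
    """Connect dataset states that lie within `radius` of each other
    (a simple proxy for reachability; the paper's construction may differ)."""
    g = nx.Graph()
    g.add_nodes_from(range(len(states)))
    dists = np.linalg.norm(states[:, None] - states[None, :], axis=-1)
    for i, j in zip(*np.nonzero((dists < radius) & (dists > 0))):
        g.add_edge(int(i), int(j), weight=float(dists[i, j]))
    return g

def plan_waypoints(graph, states, start, goal):
    """E-step (sketch): graph search from the node nearest the start to the node
    nearest the goal, returning the intermediate states as waypoints."""
    s = int(np.linalg.norm(states - start, axis=-1).argmin())
    t = int(np.linalg.norm(states - goal, axis=-1).argmin())
    path = nx.shortest_path(graph, s, t, weight="weight")
    return states[path]

# M-step (sketch): train a goal-conditioned policy pi(a | s, w) to reach each
# successive waypoint w, e.g. with goal-relabelled behavioral cloning, and
# alternate the two steps as the curriculum pushes toward more distant goals.
```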
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is either cost-ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty regularized and Sample Efficient Batch Optimization.
Our framework discovers novel and high quality samples for each deployment to enable efficient data collection.
arXiv Detail & Related papers (2021-02-23T01:30:55Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.