Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
- URL: http://arxiv.org/abs/2505.13150v1
- Date: Mon, 19 May 2025 14:12:19 GMT
- Title: Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
- Authors: Maksim Bobrin, Ilya Zisman, Alexander Nikulin, Vladislav Kurenkov, Dmitry Dylov
- Abstract summary: Behavioral Foundation Models (BFMs) have proved successful in producing policies for arbitrary tasks in a zero-shot manner. Here, we show that the Forward-Backward (FB) representation, one of the methods in the BFM family, cannot distinguish between distinct dynamics. We propose an FB model with a transformer-based belief estimator, which greatly facilitates zero-shot adaptation.
- Score: 42.446740732573296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Behavioral Foundation Models (BFMs) have proved successful in producing policies for arbitrary tasks in a zero-shot manner, requiring no test-time training or task-specific fine-tuning. Among the most promising BFMs are those that estimate the successor measure learned in an unsupervised way from task-agnostic offline data. However, these methods fail to react to changes in the dynamics, making them inefficient under partial observability or when the transition function changes. This hinders the applicability of BFMs in real-world settings, e.g., in robotics, where the dynamics can unexpectedly change at test time. In this work, we demonstrate that the Forward-Backward (FB) representation, one of the methods from the BFM family, cannot distinguish between distinct dynamics, leading to interference among the latent directions that parametrize different policies. To address this, we propose an FB model with a transformer-based belief estimator, which greatly facilitates zero-shot adaptation. We also show that partitioning the policy encoding space into dynamics-specific clusters, aligned with the context-embedding directions, yields an additional performance gain. These traits allow our method to respond to the dynamics observed during training and to generalize to unseen ones. Empirically, in the changing-dynamics setting, our approach achieves up to 2x higher zero-shot returns than the baselines on both discrete and continuous tasks.
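The core mechanism described in the abstract can be summarized in a short sketch. The PyTorch code below is a minimal, assumed illustration (not the authors' implementation): a Forward-Backward model whose forward network is conditioned on a belief embedding produced by a transformer over a window of recent transitions. All module names, dimensions, and the concatenation-based fusion of the belief with (s, a, z) are illustrative choices.

```python
import torch
import torch.nn as nn

class BeliefEstimator(nn.Module):
    """Transformer over the recent (s, a, s') history -> dynamics-belief embedding."""
    def __init__(self, obs_dim, act_dim, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(2 * obs_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, history):               # history: (batch, T, 2*obs_dim + act_dim)
        h = self.encoder(self.embed(history))
        return h[:, -1]                        # last token summarizes the current dynamics

class BeliefConditionedFB(nn.Module):
    """F(s, a, z, belief) and B(s'); the successor-measure / Q proxy is a dot product."""
    def __init__(self, obs_dim, act_dim, z_dim=64, d_belief=128, hidden=256):
        super().__init__()
        self.forward_net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + z_dim + d_belief, hidden), nn.ReLU(),
            nn.Linear(hidden, z_dim))
        self.backward_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, z_dim))

    def successor_measure(self, s, a, z, belief, s_future):
        f = self.forward_net(torch.cat([s, a, z, belief], dim=-1))   # (batch, z_dim)
        b = self.backward_net(s_future)                               # (batch, z_dim)
        return (f * b).sum(dim=-1)            # density of reaching s_future under policy pi_z

    def q_value(self, s, a, z, belief):
        f = self.forward_net(torch.cat([s, a, z, belief], dim=-1))
        return (f * z).sum(dim=-1)            # reward encoded by the latent direction z
```

The paper's additional step of partitioning the policy-encoding space into dynamics-specific clusters aligned with the context embeddings is not reflected in this sketch.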
Related papers
- Set Pivot Learning: Redefining Generalized Segmentation with Vision Foundation Models [15.321114178936554]
We introduce the concept of Set Pivot Learning, a paradigm shift that redefines domain generalization (DG) based on Vision Foundation Models (VFMs).
Traditional DG assumes that the target domain is inaccessible during training, but the emergence of VFMs renders this assumption unclear and obsolete.
We propose Set Pivot Learning (SPL), a new definition of the domain migration task based on VFMs, which is better suited to current research and application requirements.
arXiv Detail & Related papers (2025-08-03T04:20:35Z)
- Dynamic Manipulation of Deformable Objects in 3D: Simulation, Benchmark and Learning Strategy [88.8665000676562]
Prior methods often simplify the problem to low-speed or 2D settings, limiting their applicability to real-world 3D tasks.
To mitigate data scarcity, we introduce a novel simulation framework and benchmark grounded in reduced-order dynamics.
We propose Dynamics Informed Diffusion Policy (DIDP), a framework that integrates imitation pretraining with physics-informed test-time adaptation.
arXiv Detail & Related papers (2025-05-23T03:28:25Z)
- Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models.
Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process.
We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
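As a rough illustration of searching the task-embedding space at test time, the sketch below performs plain random search over the latent z, scoring candidates with a few evaluation rollouts. The `bfm_policy` callable, the Gymnasium-style environment API, and all hyperparameters are hypothetical stand-ins; the paper proposes more principled adaptation strategies.

```python
import numpy as np

def adapt_task_embedding(env, bfm_policy, z_init, iters=20, pop=8, sigma=0.1, episodes=2):
    """Random search over the BFM task embedding z. bfm_policy(obs, z) -> action."""
    def score(z):
        total = 0.0
        for _ in range(episodes):
            obs, _ = env.reset()               # Gymnasium-style API assumed
            done = False
            while not done:
                obs, r, terminated, truncated, _ = env.step(bfm_policy(obs, z))
                total += r
                done = terminated or truncated
        return total / episodes

    best_z, best_ret = z_init, score(z_init)
    for _ in range(iters):
        candidates = best_z + sigma * np.random.randn(pop, best_z.shape[0])
        for z in candidates:
            ret = score(z)
            if ret > best_ret:                 # keep the best-performing embedding
                best_z, best_ret = z, ret
    return best_z
```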
arXiv Detail & Related papers (2025-04-10T16:14:17Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
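A simplified sketch of the gradient-modulation idea (an assumed illustration, not the paper's exact OPM/OGM update rules): after backpropagation, the gradients of whichever modality's encoder currently dominates are scaled down, using a scalar per-modality performance proxy.

```python
import math
import torch.nn as nn

def modulate_modality_gradients(enc_a: nn.Module, enc_b: nn.Module,
                                score_a: float, score_b: float, alpha: float = 1.0):
    """score_a / score_b are per-modality performance proxies on the current batch
    (e.g., mean softmax confidence of each uni-modal prediction)."""
    eps = 1e-8
    # Coefficient < 1 for the dominant modality, 1 for the weaker one.
    coef_a = 1.0 - math.tanh(alpha * max(score_a / (score_b + eps) - 1.0, 0.0))
    coef_b = 1.0 - math.tanh(alpha * max(score_b / (score_a + eps) - 1.0, 0.0))
    for module, coef in ((enc_a, coef_a), (enc_b, coef_b)):
        for p in module.parameters():
            if p.grad is not None:
                p.grad.mul_(coef)   # call between loss.backward() and optimizer.step()
```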
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model [46.99510778097286]
Unsupervised reinforcement learning (URL) offers a promising paradigm for learning useful behaviors in a task-agnostic environment.
We introduce a novel model-fused paradigm to jointly pre-train the dynamics model and unsupervised exploration policy in the pre-training phase.
We show that EUCLID achieves state-of-the-art performance with high sample efficiency.
arXiv Detail & Related papers (2022-10-02T12:11:44Z)
- Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning [0.0]
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task.
Recent works showed that an expert-guided pipeline relying on Density Estimation methods effectively detects this structure in deterministic environments.
We show that the former results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment.
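As a toy illustration of the augmentation step only (the symmetry is assumed to be already known here, whereas detecting it via expert-guided density estimation is the paper's actual contribution), each offline transition can be mirrored under a reflection symmetry before fitting the MDP model:

```python
import numpy as np

def augment_with_reflection(transitions, state_sign, action_sign):
    """transitions: list of (s, a, r, s') tuples with np.ndarray states/actions.
    state_sign / action_sign: +/-1 vectors defining a reflection symmetry,
    e.g. flipping a cart-pole's position, velocity and force about zero."""
    augmented = list(transitions)
    for s, a, r, s_next in transitions:
        augmented.append((s * state_sign, a * action_sign, r, s_next * state_sign))
    return augmented

# Hypothetical usage for a 2-D state, 1-D action problem mirrored about the origin:
# data_aug = augment_with_reflection(data,
#                                    state_sign=np.array([-1.0, -1.0]),
#                                    action_sign=np.array([-1.0]))
```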
arXiv Detail & Related papers (2021-12-18T14:32:32Z)
- A New Representation of Successor Features for Transfer across Dissimilar Environments [60.813074750879615]
Many real-world RL problems require transfer among environments with different dynamics.
We propose an approach based on successor features in which we model successor feature functions with Gaussian Processes.
Our theoretical analysis proves the convergence of this approach, as well as a bounded error in modelling successor feature functions.
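A minimal sketch of the modelling choice (an assumed illustration using scikit-learn, not the paper's implementation): fit one Gaussian Process per successor-feature dimension on (state, action) inputs, so that downstream Q-value estimates come with uncertainty.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_successor_feature_gps(sa_inputs, psi_targets):
    """sa_inputs: (N, obs_dim + act_dim); psi_targets: (N, d) estimated successor features."""
    gps = []
    for j in range(psi_targets.shape[1]):
        gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-3), normalize_y=True)
        gp.fit(sa_inputs, psi_targets[:, j])   # one GP per successor-feature dimension
        gps.append(gp)
    return gps

def predict_q(gps, sa, w):
    """Q(s, a) = psi(s, a)^T w, plus a crude uncertainty from the per-dimension GP variances."""
    means, stds = zip(*(gp.predict(sa, return_std=True) for gp in gps))
    psi_mean = np.stack(means, axis=-1)        # (M, d)
    psi_std = np.stack(stds, axis=-1)
    return psi_mean @ w, np.sqrt((psi_std**2 * w**2).sum(-1))
```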
arXiv Detail & Related papers (2021-07-18T12:37:05Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes [25.513074215377696]
This paper proposes a continual online model-based reinforcement learning approach.
It does not require pre-training to solve task-agnostic problems with unknown task boundaries.
In experiments, our approach outperforms alternative methods in non-stationary tasks.
arXiv Detail & Related papers (2020-06-19T23:52:45Z)