Related papers: Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging

Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging

URL: http://arxiv.org/abs/2512.08333v2
Date: Thu, 18 Dec 2025 10:00:32 GMT
Title: Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging
Authors: Yajat Yadav, Zhiyuan Zhou, Andrew Wagenmaker, Karl Pertsch, Sergey Levine,
Abstract summary: Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize.<n>They still fall short on new tasks not covered in the training data.<n>We develop a method that preserves the generalization capabilities of the generalist policy during finetuning.
Score: 53.41119829581115
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall short on new tasks not covered in the training data. When finetuned on limited demonstrations of a new task, these policies often overfit to the specific demonstrations--not only losing their prior abilities to solve a wide variety of generalist tasks but also failing to generalize within the new task itself. In this work, we aim to develop a method that preserves the generalization capabilities of the generalist policy during finetuning, allowing a single policy to robustly incorporate a new skill into its repertoire. Our goal is a single policy that both learns to generalize to variations of the new task and retains the broad competencies gained from pretraining. We show that this can be achieved through a simple yet effective strategy: interpolating the weights of a finetuned model with that of the pretrained model. We show, across extensive simulated and real-world experiments, that such model merging produces a single model that inherits the generalist abilities of the base model and learns to solve the new task robustly, outperforming both the pretrained and finetuned model on out-of-distribution variations of the new task. Moreover, we show that model merging performance scales with the amount of pretraining data, and enables continual acquisition of new skills in a lifelong learning setting, without sacrificing previously learned generalist abilities.

Related papers

Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data. Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment. Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards. Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs [63.936622239286685]
We find that interference among different tasks and modalities is the main factor to this phenomenon. We introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models. Code and pre-trained generalist models shall be released.
arXiv Detail & Related papers (2022-06-09T17:59:59Z)
Model Based Meta Learning of Critics for Policy Gradients [19.431964785397717]
We present a framework to meta-learn the critic for gradient-based policy learning. Our algorithm leads to learned critics that resemble the ground truth Q function for a given task. After meta-training, the learned critic can be used to learn new policies for new unseen task and environment settings.
arXiv Detail & Related papers (2022-04-05T13:43:12Z)
Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks. We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
Learning Adaptable Policy via Meta-Adversarial Inverse Reinforcement Learning for Decision-making Tasks [2.1485350418225244]
We build an adaptable imitation learning model based on the integration of Meta-learning and Adversarial Inverse Reinforcement Learning. We exploit the adversarial learning and inverse reinforcement learning mechanisms to learn policies and reward functions simultaneously from available training tasks.
arXiv Detail & Related papers (2021-03-23T17:16:38Z)
Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy. We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space. We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.