Closing the Train-Test Gap in World Models for Gradient-Based Planning
- URL: http://arxiv.org/abs/2512.09929v1
- Date: Wed, 10 Dec 2025 18:59:45 GMT
- Title: Closing the Train-Test Gap in World Models for Gradient-Based Planning
- Authors: Arjun Parthasarathy, Nimit Kalra, Rohun Agrawal, Yann LeCun, Oumayma Bounou, Pavel Izmailov, Micah Goldblum
- Abstract summary: We propose improved methods for training world models that enable efficient gradient-based planning. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method.
- Score: 64.36544881136405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is used at test-time to instead estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks in 10% of the time budget.
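To make the contrast with CEM concrete, the following is a minimal sketch of gradient-based planning through a differentiable world model: an action sequence is optimized directly by backpropagating a task cost through the frozen dynamics. The `world_model(state, action) -> next_state` interface, the quadratic goal cost, and all hyperparameters are illustrative assumptions, not the paper's actual setup.
```python
import torch

def gradient_plan(world_model, s0, goal, action_dim, horizon=16, iters=50, lr=0.1):
    # Decision variables: one action per timestep, optimized jointly.
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        s = s0
        for a in actions:                      # unroll the frozen learned dynamics
            s = world_model(s, a)
        loss = torch.sum((s - goal) ** 2)      # goal-reaching cost on the final state
        loss.backward()                        # gradients flow back through the rollout
        opt.step()
    return actions.detach()
```
A CEM planner would instead sample, rank, and refit a distribution over action sequences; here the rollout itself supplies the search direction, which is what makes the 10x smaller time budget plausible.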
Related papers
- Parallel Stochastic Gradient-Based Planning for World Models [39.699893143984916]
We propose a robust and highly parallelizable planner that leverages the differentiability of the learned world model for efficient optimization.
Our method treats states as optimization variables ("virtual states") with soft dynamics constraints, enabling parallel and easier optimization.
Our planner, which we call GRASP, can be viewed as a relaxed version of a non-condensed, collocation-based optimal controller.
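A rough sketch of the virtual-state idea, in the collocation spirit this summary describes: states and actions are both free variables, every one-step prediction can be evaluated in parallel, and dynamics consistency is only softly penalized. The batched `world_model` interface and the penalty weight `lam` are assumptions for illustration.
```python
import torch

def virtual_state_plan(world_model, s0, goal, state_dim, action_dim,
                       horizon=16, iters=100, lr=0.05, lam=10.0):
    # Both the "virtual states" and the actions are free optimization variables.
    states = torch.zeros(horizon, state_dim, requires_grad=True)
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([states, actions], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        prev = torch.cat([s0.unsqueeze(0), states[:-1]], dim=0)
        pred = world_model(prev, actions)            # all one-step predictions at once
        dynamics_penalty = torch.sum((states - pred) ** 2)  # soft dynamics constraint
        task_cost = torch.sum((states[-1] - goal) ** 2)
        (task_cost + lam * dynamics_penalty).backward()
        opt.step()
    return actions.detach()
```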
arXiv Detail & Related papers (2026-01-31T02:57:47Z)
- Bounding Distributional Shifts in World Modeling through Novelty Detection [15.354352209595973]
We use a variational autoencoder as a novelty detector to ensure that proposed action trajectories during planning do not cause the learned model to deviate from the training data distribution.
The proposed method improves over state-of-the-art solutions in terms of data efficiency.
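A minimal sketch of how such a filter could sit in a planning loop, using VAE reconstruction error as the novelty score; the `vae(x) -> (recon, mu, logvar)` interface and the threshold are assumptions, not the paper's actual detector.
```python
import torch

def novelty_scores(vae, states):
    # Reconstruction error as a proxy for novelty: states the VAE cannot
    # reconstruct well are treated as out-of-distribution.
    recon, mu, logvar = vae(states)              # assumed VAE interface
    return torch.mean((states - recon) ** 2, dim=-1)

def keep_in_distribution(vae, candidate_trajs, threshold):
    # Reject any candidate trajectory that visits a novel (OOD) state,
    # so planning never drives the world model off its training data.
    scores = novelty_scores(vae, candidate_trajs)    # shape (N, horizon)
    return candidate_trajs[scores.max(dim=-1).values < threshold]
```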
arXiv Detail & Related papers (2025-08-08T07:42:14Z)
- Optimizing ML Training with Metagradient Descent [69.89631748402377]
We introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale.
We then introduce a "smooth model training" framework that enables effective optimization using metagradients.
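A toy illustration of what a metagradient is (not the paper's scalable algorithm): a short training run on a linear model is fully unrolled so the validation loss can be differentiated with respect to the learning rate.
```python
import torch

def lr_metagradient(lr, xs, ys, val_xs, val_ys, inner_steps=10):
    # Toy metagradient: differentiate the validation loss of a short,
    # fully unrolled training run with respect to the learning rate.
    lr = lr.clone().requires_grad_(True)
    w = torch.zeros(xs.shape[1], requires_grad=True)
    for _ in range(inner_steps):
        train_loss = torch.mean((xs @ w - ys) ** 2)
        (g,) = torch.autograd.grad(train_loss, w, create_graph=True)
        w = w - lr * g                    # the update stays in the autograd graph
    val_loss = torch.mean((val_xs @ w - val_ys) ** 2)
    return torch.autograd.grad(val_loss, lr)[0]
```
Called as `lr_metagradient(torch.tensor(0.05), xs, ys, val_xs, val_ys)`, the returned scalar indicates how to nudge the learning rate to reduce validation loss; doing this at scale without full unrolling is precisely the paper's contribution.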
arXiv Detail & Related papers (2025-03-17T22:18:24Z)
- Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks.
We train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning.
Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
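What distinguishes this from pixel-space planning is that the rollout and the cost both live in the embedding space. A minimal sketch under assumed `encoder` and `predictor` interfaces (not the paper's actual API):
```python
import torch

def latent_plan(encoder, predictor, obs, goal_obs, action_dim,
                horizon=8, iters=30, lr=0.2):
    # Plan entirely in the learned embedding space: encode the current and
    # goal observations, unroll the latent predictor, and minimize latent
    # distance to the goal. No pixel-space decoding is needed.
    z, z_goal = encoder(obs), encoder(goal_obs)
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        h = z
        for a in actions:
            h = predictor(h, a)           # latent dynamics rollout
        torch.sum((h - z_goal) ** 2).backward()
        opt.step()
    return actions.detach()
```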
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm that gives better control over the trade-off between the unlearning and retention objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
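A sketch in the spirit of a normalized gradient difference (the paper's adaptive learning rate is omitted, and the exact combination rule here is an assumption): each objective's gradient is normalized so neither task dominates, then the update ascends on forgetting and descends on retention.
```python
import torch

def ngdiff_step(params, forget_loss, retain_loss, lr=1e-3):
    # Normalize each objective's gradient before combining, so the update
    # balances the forget and retain tasks regardless of their scales.
    g_f = torch.autograd.grad(forget_loss, params, retain_graph=True)
    g_r = torch.autograd.grad(retain_loss, params)
    norm_f = torch.sqrt(sum((g ** 2).sum() for g in g_f)) + 1e-12
    norm_r = torch.sqrt(sum((g ** 2).sum() for g in g_r)) + 1e-12
    with torch.no_grad():
        for p, gf, gr in zip(params, g_f, g_r):
            # Descend on retention, ascend on the forget objective.
            p -= lr * (gr / norm_r - gf / norm_f)
```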
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- Simple Hierarchical Planning with Diffusion [54.48129192534653]
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
We introduce the Hierarchical Diffuser, a fast yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning.
Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field at a lower computational cost.
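A structural sketch of the jumpy strategy; both samplers are hypothetical diffusion-model interfaces, not the paper's actual API.
```python
import torch

def jumpy_plan(high_sampler, low_sampler, s0, goal, jump=8, horizon=64):
    # "Jumpy" hierarchy: the high level plans sparse subgoals every `jump`
    # steps (large receptive field, few points to denoise); the low level
    # fills in dense segments between consecutive subgoals.
    subgoals = high_sampler(start=s0, goal=goal, num_points=horizon // jump)
    segments, prev = [], s0
    for sg in subgoals:
        segments.append(low_sampler(start=prev, goal=sg, length=jump))
        prev = sg
    return torch.cat(segments, dim=0)   # dense trajectory of ~horizon states
```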
arXiv Detail & Related papers (2024-01-05T05:28:40Z)
- Gradient-based Planning with World Models [21.9392160209565]
We present an exploration of a gradient-based alternative that fully leverages the differentiability of the world model.
In a sample-efficient setting, our method achieves performance on par with or superior to alternative approaches in most tasks.
arXiv Detail & Related papers (2023-12-28T18:54:21Z)
- Locally Optimal Descent for Dynamic Stepsize Scheduling [45.6809308002043]
We introduce a novel dynamic learning-rate scheduling scheme, grounded in theory, with the goal of simplifying the manual and time-consuming tuning of schedules in practice.
Our approach is based on estimating a locally optimal stepsize in the direction of a smoothed stochastic gradient.
Our findings indicate that our method needs minimal tuning when compared to existing approaches.
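One classical way to estimate a locally optimal stepsize along the gradient direction, shown here as a generic sketch rather than the paper's actual estimator: fit a 1-D quadratic to the loss, giving eta* = |g|^2 / (g^T H g), with the curvature term from a Hessian-vector product.
```python
import torch

def local_stepsize(loss_fn, w):
    # Locally optimal stepsize under a quadratic model of the loss along
    # the gradient direction. Assumes positive curvature at w.
    w = w.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(loss_fn(w), w, create_graph=True)
    (hg,) = torch.autograd.grad((g * g.detach()).sum(), w)   # H @ g via autograd
    curvature = torch.dot(g.detach().flatten(), hg.flatten())
    return (g.detach().norm() ** 2 / curvature).item()
```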
arXiv Detail & Related papers (2023-11-23T09:57:35Z)
- Temporal Difference Learning for Model Predictive Control [29.217382374051347]
Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as the computational budget for planning increases.
TD-MPC achieves superior sample efficiency and performance over prior work on both state and image-based continuous control tasks.
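The core idea, sketched below with assumed model interfaces: sum predicted rewards over a short model rollout and bootstrap the tail with a learned value function, so the planner never has to unroll to the episode's end. This is a sketch of the objective, not TD-MPC's full implementation.
```python
import torch

def short_horizon_return(world_model, reward_model, value_fn, s0, actions, gamma=0.99):
    # Short-horizon planning objective with a terminal value bootstrap.
    ret, s, discount = 0.0, s0, 1.0
    for a in actions:
        ret = ret + discount * reward_model(s, a)   # predicted per-step reward
        s = world_model(s, a)                       # predicted next state
        discount *= gamma
    return ret + discount * value_fn(s)             # value covers the remaining tail
```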
arXiv Detail & Related papers (2022-03-09T18:58:28Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
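For reference, a minimal sketch of the classical extragradient idea underlying such extrapolation schemes (a generic illustration, not the paper's specific scheme): take a lookahead step, then apply the gradient evaluated at the lookahead point to the original iterate.
```python
import torch

def extragradient_step(loss_fn, w, lr=0.1):
    # Extrapolation (extragradient) step: the gradient at the lookahead
    # point, not at w itself, drives the actual update.
    (g,) = torch.autograd.grad(loss_fn(w), w)
    lookahead = (w - lr * g).detach().requires_grad_(True)
    (g_la,) = torch.autograd.grad(loss_fn(lookahead), lookahead)
    return (w - lr * g_la).detach().requires_grad_(True)
```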
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.