Closing the Train-Test Gap in World Models for Gradient-Based Planning
- URL: http://arxiv.org/abs/2512.09929v1
- Date: Wed, 10 Dec 2025 18:59:45 GMT
- Title: Closing the Train-Test Gap in World Models for Gradient-Based Planning
- Authors: Arjun Parthasarathy, Nimit Kalra, Rohun Agrawal, Yann LeCun, Oumayma Bounou, Pavel Izmailov, Micah Goldblum
- Abstract summary: We propose improved methods for training world models that enable efficient gradient-based planning. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method.
- Score: 64.36544881136405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is used at test-time to instead estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks in 10% of the time budget.
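To make the contrast with CEM concrete, the following is a minimal sketch of gradient-based planning through a differentiable world model: an action sequence is optimized directly by backpropagating a task cost through the frozen dynamics. The `world_model(state, action) -> next_state` interface, the quadratic goal cost, and all hyperparameters are illustrative assumptions, not the paper's actual setup.
```python
import torch

def gradient_plan(world_model, s0, goal, action_dim, horizon=16, iters=50, lr=0.1):
    # Decision variables: one action per timestep, optimized jointly.
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        s = s0
        for a in actions:                      # unroll the frozen learned dynamics
            s = world_model(s, a)
        loss = torch.sum((s - goal) ** 2)      # goal-reaching cost on the final state
        loss.backward()                        # gradients flow back through the rollout
        opt.step()
    return actions.detach()
```
A CEM planner would instead sample, rank, and refit a distribution over action sequences; here the rollout itself supplies the search direction, which is what makes the 10x smaller time budget plausible.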
Related papers
- Parallel Stochastic Gradient-Based Planning for World Models [39.699893143984916]
We propose a robust and highly parallelizable planner that leverages the differentiability of the learned world model for efficient optimization.
Our method treats states as optimization variables ("virtual states") with soft dynamics constraints, enabling parallel and easier optimization.
Our planner, which we call GRASP, can be viewed as a relaxed version of a non-condensed, collocation-based optimal controller.
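A rough sketch of the virtual-state idea, in the collocation spirit this summary describes: states and actions are both free variables, every one-step prediction can be evaluated in parallel, and dynamics consistency is only softly penalized. The batched `world_model` interface and the penalty weight `lam` are assumptions for illustration.
```python
import torch

def virtual_state_plan(world_model, s0, goal, state_dim, action_dim,
                       horizon=16, iters=100, lr=0.05, lam=10.0):
    # Both the "virtual states" and the actions are free optimization variables.
    states = torch.zeros(horizon, state_dim, requires_grad=True)
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([states, actions], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        prev = torch.cat([s0.unsqueeze(0), states[:-1]], dim=0)
        pred = world_model(prev, actions)            # all one-step predictions at once
        dynamics_penalty = torch.sum((states - pred) ** 2)  # soft dynamics constraint
        task_cost = torch.sum((states[-1] - goal) ** 2)
        (task_cost + lam * dynamics_penalty).backward()
        opt.step()
    return actions.detach()
```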
arXiv Detail & Related papers (2026-01-31T02:57:47Z)
- Bounding Distributional Shifts in World Modeling through Novelty Detection [15.354352209595973]
We use a variational autoencoder as a novelty detector to ensure that proposed action trajectories during planning do not cause the learned model to deviate from the training data distribution.
The proposed method improves over state-of-the-art solutions in terms of data efficiency.
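A minimal sketch of how such a filter could sit in a planning loop, using VAE reconstruction error as the novelty score; the `vae(x) -> (recon, mu, logvar)` interface and the threshold are assumptions, not the paper's actual detector.
```python
import torch

def novelty_scores(vae, states):
    # Reconstruction error as a proxy for novelty: states the VAE cannot
    # reconstruct well are treated as out-of-distribution.
    recon, mu, logvar = vae(states)              # assumed VAE interface
    return torch.mean((states - recon) ** 2, dim=-1)

def keep_in_distribution(vae, candidate_trajs, threshold):
    # Reject any candidate trajectory that visits a novel (OOD) state,
    # so planning never drives the world model off its training data.
    scores = novelty_scores(vae, candidate_trajs)    # shape (N, horizon)
    return candidate_trajs[scores.max(dim=-1).values < threshold]
```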
arXiv Detail & Related papers (2025-08-08T07:42:14Z)
- Optimizing ML Training with Metagradient Descent [69.89631748402377]
We introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale.
We then introduce a "smooth model training" framework that enables effective optimization using metagradients.
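A toy illustration of what a metagradient is (not the paper's scalable algorithm): a short training run on a linear model is fully unrolled so the validation loss can be differentiated with respect to the learning rate.
```python
import torch

def lr_metagradient(lr, xs, ys, val_xs, val_ys, inner_steps=10):
    # Toy metagradient: differentiate the validation loss of a short,
    # fully unrolled training run with respect to the learning rate.
    lr = lr.clone().requires_grad_(True)
    w = torch.zeros(xs.shape[1], requires_grad=True)
    for _ in range(inner_steps):
        train_loss = torch.mean((xs @ w - ys) ** 2)
        (g,) = torch.autograd.grad(train_loss, w, create_graph=True)
        w = w - lr * g                    # the update stays in the autograd graph
    val_loss = torch.mean((val_xs @ w - val_ys) ** 2)
    return torch.autograd.grad(val_loss, lr)[0]
```
Called as `lr_metagradient(torch.tensor(0.05), xs, ys, val_xs, val_ys)`, the returned scalar indicates how to nudge the learning rate to reduce validation loss; doing this at scale without full unrolling is precisely the paper's contribution.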
arXiv Detail & Related papers (2025-03-17T22:18:24Z)
- Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks.
We train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning.
Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
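What distinguishes this from pixel-space planning is that the rollout and the cost both live in the embedding space. A minimal sketch under assumed `encoder` and `predictor` interfaces (not the paper's actual API):
```python
import torch

def latent_plan(encoder, predictor, obs, goal_obs, action_dim,
                horizon=8, iters=30, lr=0.2):
    # Plan entirely in the learned embedding space: encode the current and
    # goal observations, unroll the latent predictor, and minimize latent
    # distance to the goal. No pixel-space decoding is needed.
    z, z_goal = encoder(obs), encoder(goal_obs)
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        h = z
        for a in actions:
            h = predictor(h, a)           # latent dynamics rollout
        torch.sum((h - z_goal) ** 2).backward()
        opt.step()
    return actions.detach()
```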
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm that gives better control over the trade-off between the unlearning and retention objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
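A sketch in the spirit of a normalized gradient difference (the paper's adaptive learning rate is omitted, and the exact combination rule here is an assumption): each objective's gradient is normalized so neither task dominates, then the update ascends on forgetting and descends on retention.
```python
import torch

def ngdiff_step(params, forget_loss, retain_loss, lr=1e-3):
    # Normalize each objective's gradient before combining, so the update
    # balances the forget and retain tasks regardless of their scales.
    g_f = torch.autograd.grad(forget_loss, params, retain_graph=True)
    g_r = torch.autograd.grad(retain_loss, params)
    norm_f = torch.sqrt(sum((g ** 2).sum() for g in g_f)) + 1e-12
    norm_r = torch.sqrt(sum((g ** 2).sum() for g in g_r)) + 1e-12
    with torch.no_grad():
        for p, gf, gr in zip(params, g_f, g_r):
            # Descend on retention, ascend on the forget objective.
            p -= lr * (gr / norm_r - gf / norm_f)
```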
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- Simple Hierarchical Planning with Diffusion [54.48129192534653]
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
We introduce the Hierarchical Diffuser, a fast yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning.
Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field at a lower computational cost.
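A structural sketch of the jumpy strategy; both samplers are hypothetical diffusion-model interfaces, not the paper's actual API.
```python
import torch

def jumpy_plan(high_sampler, low_sampler, s0, goal, jump=8, horizon=64):
    # "Jumpy" hierarchy: the high level plans sparse subgoals every `jump`
    # steps (large receptive field, few points to denoise); the low level
    # fills in dense segments between consecutive subgoals.
    subgoals = high_sampler(start=s0, goal=goal, num_points=horizon // jump)
    segments, prev = [], s0
    for sg in subgoals:
        segments.append(low_sampler(start=prev, goal=sg, length=jump))
        prev = sg
    return torch.cat(segments, dim=0)   # dense trajectory of ~horizon states
```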
arXiv Detail & Related papers (2024-01-05T05:28:40Z)
- Gradient-based Planning with World Models [21.9392160209565]
We present an exploration of a gradient-based alternative that fully leverages the differentiability of the world model.
In a sample-efficient setting, our method achieves performance on par with or superior to alternative approaches in most tasks.
arXiv Detail & Related papers (2023-12-28T18:54:21Z)
- Locally Optimal Descent for Dynamic Stepsize Scheduling [45.6809308002043]
We introduce a novel dynamic learning-rate scheduling scheme, grounded in theory, with the goal of simplifying the manual and time-consuming tuning of schedules in practice.
Our approach is based on estimating a locally optimal stepsize in the direction of a smoothed stochastic gradient.
Our findings indicate that our method needs minimal tuning when compared to existing approaches.
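One classical way to estimate a locally optimal stepsize along the gradient direction, shown here as a generic sketch rather than the paper's actual estimator: fit a 1-D quadratic to the loss, giving eta* = |g|^2 / (g^T H g), with the curvature term from a Hessian-vector product.
```python
import torch

def local_stepsize(loss_fn, w):
    # Locally optimal stepsize under a quadratic model of the loss along
    # the gradient direction. Assumes positive curvature at w.
    w = w.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(loss_fn(w), w, create_graph=True)
    (hg,) = torch.autograd.grad((g * g.detach()).sum(), w)   # H @ g via autograd
    curvature = torch.dot(g.detach().flatten(), hg.flatten())
    return (g.detach().norm() ** 2 / curvature).item()
```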
arXiv Detail & Related papers (2023-11-23T09:57:35Z)
- Temporal Difference Learning for Model Predictive Control [29.217382374051347]
Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as the computational budget for planning increases.
TD-MPC achieves superior sample efficiency and performance over prior work on both state and image-based continuous control tasks.
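The core idea, sketched below with assumed model interfaces: sum predicted rewards over a short model rollout and bootstrap the tail with a learned value function, so the planner never has to unroll to the episode's end. This is a sketch of the objective, not TD-MPC's full implementation.
```python
import torch

def short_horizon_return(world_model, reward_model, value_fn, s0, actions, gamma=0.99):
    # Short-horizon planning objective with a terminal value bootstrap.
    ret, s, discount = 0.0, s0, 1.0
    for a in actions:
        ret = ret + discount * reward_model(s, a)   # predicted per-step reward
        s = world_model(s, a)                       # predicted next state
        discount *= gamma
    return ret + discount * value_fn(s)             # value covers the remaining tail
```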
arXiv Detail & Related papers (2022-03-09T18:58:28Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
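For reference, a minimal sketch of the classical extragradient idea underlying such extrapolation schemes (a generic illustration, not the paper's specific scheme): take a lookahead step, then apply the gradient evaluated at the lookahead point to the original iterate.
```python
import torch

def extragradient_step(loss_fn, w, lr=0.1):
    # Extrapolation (extragradient) step: the gradient at the lookahead
    # point, not at w itself, drives the actual update.
    (g,) = torch.autograd.grad(loss_fn(w), w)
    lookahead = (w - lr * g).detach().requires_grad_(True)
    (g_la,) = torch.autograd.grad(loss_fn(lookahead), lookahead)
    return (w - lr * g_la).detach().requires_grad_(True)
```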
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.