Parallel Stochastic Gradient-Based Planning for World Models
- URL: http://arxiv.org/abs/2602.00475v1
- Date: Sat, 31 Jan 2026 02:57:47 GMT
- Title: Parallel Stochastic Gradient-Based Planning for World Models
- Authors: Michael Psenka, Michael Rabbat, Aditi Krishnapriyan, Yann LeCun, Amir Bar,
- Abstract summary: We propose a robust and highly parallelizable planner that leverages the differentiability of the learned world model for efficient optimization. Our method treats states as optimization variables ("virtual states") with soft dynamics constraints, enabling parallel computation and easier optimization. Our planner, which we call GRASP (Gradient RelAxed Stochastic Planner), can be viewed as a stochastic version of a non-condensed or collocation-based optimal controller.
- Score: 39.699893143984916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: World models simulate environment dynamics from raw sensory inputs like video. However, using them for planning can be challenging due to the vast and unstructured search space. We propose a robust and highly parallelizable planner that leverages the differentiability of the learned world model for efficient optimization, solving long-horizon control tasks from visual input. Our method treats states as optimization variables ("virtual states") with soft dynamics constraints, enabling parallel computation and easier optimization. To facilitate exploration and avoid local optima, we introduce stochasticity into the states. To mitigate sensitive gradients through high-dimensional vision-based world models, we modify the gradient structure to descend towards valid plans while only requiring action-input gradients. Our planner, which we call GRASP (Gradient RelAxed Stochastic Planner), can be viewed as a stochastic version of a non-condensed or collocation-based optimal controller. We provide theoretical justification and experiments on video-based world models, where our resulting planner outperforms existing planning algorithms like the cross-entropy method (CEM) and vanilla gradient-based optimization (GD) on long-horizon experiments, both in success rate and time to convergence.
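As a rough illustration of the abstract's "virtual state" idea, the sketch below treats both states and actions as free optimization variables, penalizes dynamics violations softly, updates all timesteps in parallel, and injects noise into the state updates for exploration. The linear model, dimensions, and every hyperparameter here are hypothetical stand-ins for a learned world model, not the paper's actual setup or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear world model s' = A s + B a, standing in for a learned one.
dim_s, dim_a, T = 4, 2, 20
A = np.eye(dim_s) * 0.95
B = rng.normal(size=(dim_s, dim_a)) * 0.1
s0 = np.zeros(dim_s)
goal = np.ones(dim_s)

lam = 10.0    # weight of the soft dynamics penalty (illustrative value)
lr = 0.01     # descent step size
sigma = 0.01  # stochasticity injected into the virtual states

# Virtual states S[1..T] and actions U[0..T-1] are free optimization variables.
S = rng.normal(size=(T, dim_s)) * 0.1
U = rng.normal(size=(T, dim_a)) * 0.1

def cost(S, U):
    prev = np.vstack([s0[None, :], S[:-1]])   # s_0 .. s_{T-1}
    resid = S - (prev @ A.T + U @ B.T)        # dynamics violation at every step
    return lam * np.sum(resid ** 2) + np.sum((S[-1] - goal) ** 2)

init_cost = cost(S, U)
for it in range(3000):
    prev = np.vstack([s0[None, :], S[:-1]])
    resid = S - (prev @ A.T + U @ B.T)        # all T residuals computed in parallel
    # Analytic gradients of the quadratic objective.
    gS = 2 * lam * resid
    gS[:-1] -= 2 * lam * resid[1:] @ A        # each s_t also appears in residual t+1
    gS[-1] += 2 * (S[-1] - goal)
    gU = -2 * lam * resid @ B
    # Noisy descent on the virtual states encourages exploration.
    S -= lr * gS + sigma * rng.normal(size=S.shape)
    U -= lr * gU

print("initial cost:", init_cost)
print("final cost:", cost(S, U))
```

Because the dynamics enter only as a penalty, each timestep's residual can be evaluated and differentiated independently, which is what makes the updates parallel across the horizon; a shooting-style planner, in contrast, must roll the model forward sequentially through all T steps.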
Related papers
- Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory [101.2076718776139]
We propose a robust interactive world model capable of maintaining coherent visual memory over 1000+ frames in complex real-world environments. We introduce a Pose-free Memory (HPMC) that distills historical latents into a fixed-budget geometric representation. We also propose an Uncertainty-aware Action Labeling module that discretizes continuous motion into a tri-state logic.
arXiv Detail & Related papers (2026-02-02T17:52:56Z) - Neural Nonmyopic Bayesian Optimization in Dynamic Cost Settings [73.44599934855067]
LookaHES is a nonmyopic BO framework designed for dynamic, history-dependent cost environments. LookaHES combines a multi-step variant of $H$-Entropy Search with pathwise sampling and neural policy optimization. Our innovation is the integration of neural policies, including large language models, to effectively navigate structured, domain-specific action spaces.
arXiv Detail & Related papers (2026-01-10T09:49:45Z) - Closing the Train-Test Gap in World Models for Gradient-Based Planning [64.36544881136405]
We propose improved methods for training world models that enable efficient gradient-based planning. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method.
arXiv Detail & Related papers (2025-12-10T18:59:45Z) - Autonomous Vehicle Path Planning by Searching With Differentiable Simulation [55.46735086899153]
Planning allows an agent to safely refine its actions before executing them in the real world. In autonomous driving, this is crucial to avoid collisions and navigate in complex, dense traffic scenarios. Here we propose Differentiable Simulation for Search (DSS), a framework that leverages the differentiable simulator Waymax as both a next state predictor and a critic.
arXiv Detail & Related papers (2025-11-14T07:56:34Z) - Reinforced Reasoning for Embodied Planning [18.40186665383579]
Embodied planning requires agents to make coherent multi-step decisions based on dynamic visual observations and natural language goals. We introduce a reinforcement fine-tuning framework that brings R1-style reasoning enhancement into embodied planning.
arXiv Detail & Related papers (2025-05-28T07:21:37Z) - Locally Optimal Descent for Dynamic Stepsize Scheduling [45.6809308002043]
We introduce a novel dynamic learning-rate scheduling scheme grounded in theory, with the goal of simplifying the manual and time-consuming tuning of schedules in practice.
Our approach is based on estimating the locally-optimal stepsize in the direction of the stochastic gradient.
Our findings indicate that our method needs minimal tuning when compared to existing approaches.
arXiv Detail & Related papers (2023-11-23T09:57:35Z) - Rethinking Optimization with Differentiable Simulation from a Global Perspective [20.424212055832676]
Differentiable simulation is a promising toolkit for fast gradient-based policy optimization and system identification.
We study the challenges that differentiable simulation presents when it is not feasible to expect that a single gradient descent reaches a global optimum.
We propose a method that combines Bayesian optimization with semi-local 'leaps' to obtain a global search method that can use gradients effectively.
arXiv Detail & Related papers (2022-06-28T17:08:53Z) - DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools [96.38972082580294]
DiffSkill is a novel framework that uses a differentiable physics simulator for skill abstraction to solve deformable object manipulation tasks.
In particular, we first obtain short-horizon skills using individual tools from a gradient-based simulator.
We then learn a neural skill abstractor from the demonstration trajectories which takes RGBD images as input.
arXiv Detail & Related papers (2022-03-31T17:59:38Z) - Hallucinative Topological Memory for Zero-Shot Visual Planning [86.20780756832502]
In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline.
Most previous works on VP approached the problem by planning in a learned latent space, resulting in low-quality visual plans.
Here, we propose a simple VP method that plans directly in image space and displays competitive performance.
arXiv Detail & Related papers (2020-02-27T18:54:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.