What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
- URL: http://arxiv.org/abs/2512.24497v1
- Date: Tue, 30 Dec 2025 22:50:03 GMT
- Title: What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
- Authors: Basile Terver, Tsung-Yen Yang, Jean Ponce, Adrien Bardes, Yann LeCun
- Abstract summary: A family of methods has introduced planning algorithms that optimize in the learned representation space of the world model. In this work, we characterize models from this family as JEPA-WMs and investigate the technical choices that make algorithms from this class work. We propose a model that outperforms two established baselines, DINO-WM and V-JEPA-2-AC, in both navigation and manipulation tasks.
- Score: 47.66705831032197
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A long-standing challenge in AI is to develop agents capable of solving a wide range of physical tasks and generalizing to new, unseen tasks and environments. A popular recent approach involves training a world model from state-action trajectories and subsequently using it with a planning algorithm to solve new tasks. Planning is commonly performed in the input space, but a recent family of methods has introduced planning algorithms that optimize in the learned representation space of the world model, with the promise that abstracting irrelevant details yields more efficient planning. In this work, we characterize models from this family as JEPA-WMs and investigate the technical choices that make algorithms from this class work. We propose a comprehensive study of several key components with the objective of finding the optimal approach within the family. We conducted experiments using both simulated environments and real-world robotic data, and studied how the model architecture, the training objective, and the planning algorithm affect planning success. We combine our findings to propose a model that outperforms two established baselines, DINO-WM and V-JEPA-2-AC, in both navigation and manipulation tasks. Code, data and checkpoints are available at https://github.com/facebookresearch/jepa-wms.
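The core idea of planning in a learned representation space can be illustrated with a minimal sketch: roll candidate action sequences through a latent world model and optimize them with the cross-entropy method (CEM). The 1-D latent space, the linear `world_model`, and all function names below are toy illustrations, not the architecture or algorithm from the paper.

```python
import math
import random

random.seed(0)

def world_model(z, a):
    """Toy latent dynamics: the next latent is the current latent plus the action."""
    return z + a

def cost(z, z_goal):
    """Squared distance to the goal, measured in latent space."""
    return (z - z_goal) ** 2

def cem_plan(z0, z_goal, horizon=5, pop=64, elites=8, iters=20):
    """Cross-entropy method: iteratively refit a Gaussian over action sequences
    to the elite (lowest-cost) rollouts through the world model."""
    mu = [0.0] * horizon
    sigma = [1.0] * horizon
    for _ in range(iters):
        # Sample candidate action sequences from the current distribution.
        candidates = [
            [random.gauss(mu[t], sigma[t]) for t in range(horizon)]
            for _ in range(pop)
        ]
        # Roll each sequence through the world model and score the final latent.
        scored = []
        for actions in candidates:
            z = z0
            for a in actions:
                z = world_model(z, a)
            scored.append((cost(z, z_goal), actions))
        scored.sort(key=lambda s: s[0])
        best = [actions for _, actions in scored[:elites]]
        # Refit the sampling distribution to the elite set.
        for t in range(horizon):
            vals = [seq[t] for seq in best]
            mu[t] = sum(vals) / elites
            var = sum((v - mu[t]) ** 2 for v in vals) / elites
            sigma[t] = max(math.sqrt(var), 1e-3)
    return mu

plan = cem_plan(z0=0.0, z_goal=3.0)
```

The promise of abstraction is that `cost` compares compact latents rather than raw pixels; in a real JEPA-WM both the encoder and the dynamics are learned networks rather than the hand-written functions above.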
Related papers
- Closing the Train-Test Gap in World Models for Gradient-Based Planning [64.36544881136405]
We propose improved methods for training world models that enable efficient gradient-based planning. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method.
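Gradient-based planning, as contrasted with CEM above, differentiates the planning cost through the world model and updates the actions directly. The sketch below uses a toy linear model `z' = z + a` so the gradient of the final cost with respect to each action can be written by hand; the model and names are illustrative assumptions, not the paper's method.

```python
def rollout(z0, actions):
    """Roll a sequence of actions through toy linear latent dynamics."""
    z = z0
    for a in actions:
        z = z + a
    return z

def plan_by_gradient(z0, z_goal, horizon=5, lr=0.05, steps=200):
    """Optimize the action sequence by gradient descent on the final cost.
    For cost (z_H - z_goal)^2 with z_H = z0 + sum(actions), the gradient
    with respect to every action is 2 * (z_H - z_goal)."""
    actions = [0.0] * horizon
    for _ in range(steps):
        z_final = rollout(z0, actions)
        grad = 2.0 * (z_final - z_goal)
        actions = [a - lr * grad for a in actions]
    return actions

actions = plan_by_gradient(0.0, 2.0)
```

With a learned neural world model, the hand-derived `grad` would be replaced by automatic differentiation through the rollout; the train-test gap the paper targets arises because that model is only trained on data, not on the action sequences the planner explores.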
arXiv Detail & Related papers (2025-12-10T18:59:45Z) - World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning [60.100794160682646]
We propose a new learning framework that jointly optimizes state prediction and action selection through preference learning. To automatically collect trajectories and stepwise preference data without human annotation, we introduce a tree search mechanism for extensive exploration via trial and error. Our method significantly outperforms existing methods and GPT-4o when applied to Qwen2-VL (7B), LLaVA-1.6 (7B), and LLaMA-3.2 (11B).
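The preference-learning component can be illustrated with a standard pairwise (Bradley-Terry style) loss: given scores for a preferred and a rejected trajectory, the loss is low when the preferred one scores higher. This is a generic sketch of pairwise preference loss, not the paper's specific dual-objective formulation.

```python
import math

def preference_loss(score_preferred, score_rejected, beta=1.0):
    """Negative log-sigmoid of the scaled score margin.
    Shrinks toward 0 as the preferred trajectory's score pulls ahead;
    equals log(2) when the two scores tie."""
    margin = beta * (score_preferred - score_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In training, such a loss is summed over preference pairs mined by the tree search, pushing the model to rank successful rollouts above failed ones.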
arXiv Detail & Related papers (2025-03-13T15:49:56Z) - Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks. We train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and use it for planning. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - Thinker: Learning to Plan and Act [18.425843346728648]
The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model.
We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark.
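The wrapping idea can be sketched as an environment whose action space is augmented with "imagine" actions that step a learned world model instead of the real environment, so the agent can look ahead before committing. The toy environment, the action encoding, and all names here are hypothetical illustrations of the concept, not Thinker's actual interface.

```python
class ToyEnv:
    """A trivial environment whose state is a running sum of actions."""
    def __init__(self):
        self.state = 0

    def step(self, a):
        self.state += a
        done = self.state >= 3
        return self.state, float(done), done

class ModelWrappedEnv:
    """Wraps an environment so the agent can take ("imagine", a) actions
    that step a learned world model, or ("real", a) actions that advance
    the true environment and reset the imagination."""
    def __init__(self, env, world_model):
        self.env = env
        self.model = world_model
        self.imagined_state = None

    def step(self, action):
        kind, a = action
        if kind == "imagine":
            # Continue an ongoing imagined rollout, or start one from the real state.
            base = (self.imagined_state
                    if self.imagined_state is not None
                    else self.env.state)
            self.imagined_state = self.model(base, a)
            return self.imagined_state, 0.0, False
        # A real action discards the imagined rollout and steps the true env.
        self.imagined_state = None
        return self.env.step(a)

env = ToyEnv()
wrapped = ModelWrappedEnv(env, world_model=lambda s, a: s + a)
wrapped.step(("imagine", 1))   # look ahead in the model; real state unchanged
wrapped.step(("imagine", 2))
state, reward, done = wrapped.step(("real", 3))  # commit a real action
```

The point of the wrapper is that planning becomes part of the agent's action space, so a standard RL algorithm can learn when to imagine and when to act.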
arXiv Detail & Related papers (2023-07-27T16:40:14Z) - A Reinforcement Learning-assisted Genetic Programming Algorithm for Team Formation Problem Considering Person-Job Matching [70.28786574064694]
A reinforcement learning-assisted genetic programming algorithm (RL-GP) is proposed to enhance the quality of solutions.
The hyper-heuristic rules obtained through efficient learning can be utilized as decision-making aids when forming project teams.
arXiv Detail & Related papers (2023-04-08T14:32:12Z) - Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics [20.148408520475655]
We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans.
In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
arXiv Detail & Related papers (2021-11-15T16:58:50Z) - Continuous Control for Searching and Planning with a Learned Model [5.196149362684628]
Decision-making agents with planning capabilities have achieved great success in challenging domains such as Chess, Shogi, and Go.
Researchers proposed the MuZero algorithm, which can learn a dynamics model through interactions with the environment.
We show the proposed algorithm outperforms the soft actor-critic (SAC) algorithm, a state-of-the-art model-free deep reinforcement learning algorithm.
arXiv Detail & Related papers (2020-06-12T19:10:41Z) - PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance while being around 4-8 times more sample-efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.