Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

URL: http://arxiv.org/abs/2601.05848v1
Date: Fri, 09 Jan 2026 15:23:36 GMT
Title: Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals
Authors: Nate Gillman, Yinghua Zhou, Zitian Tang, Evan Luo, Arjan Chakravarthy, Daksh Aggarwal, Michael Freeman, Charles Herrmann, Chen Sun,
Abstract summary: Goal Force allows users to define goals via explicit force vectors and intermediate dynamics. We train a video generation model on a curated dataset of synthetic causal primitives. Our results suggest that by grounding video generation in fundamental physical interactions, models can emerge as implicit neural physics simulators.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals for these models remains a challenge: text instructions are often too abstract to capture physical nuances, while target images are frequently infeasible to specify for dynamic tasks. To address this, we introduce Goal Force, a novel framework that allows users to define goals via explicit force vectors and intermediate dynamics, mirroring how humans conceptualize physical tasks. We train a video generation model on a curated dataset of synthetic causal primitives (such as elastic collisions and falling dominos), teaching it to propagate forces through time and space. Despite being trained on simple physics data, our model exhibits remarkable zero-shot generalization to complex, real-world scenarios, including tool manipulation and multi-object causal chains. Our results suggest that by grounding video generation in fundamental physical interactions, models can emerge as implicit neural physics simulators, enabling precise, physics-aware planning without reliance on external engines. We release all datasets, code, model weights, and interactive video demos at our project page.

Related papers

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models
Physical principles are fundamental to realistic visual simulation, but they remain largely overlooked in transformer-based video generation. We introduce a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces. We extend this paradigm to a unified framework, termed Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning.
arXiv (2026-01-16T08:40:10Z)
Simulating the Visual World with Artificial Intelligence: A Roadmap
Video generation is shifting from generating visually appealing clips to building virtual environments that support interaction and maintain physical plausibility. This survey provides a systematic overview of this evolution, conceptualizing modern video foundation models as the combination of two core components. We trace the progression of video generation through four generations, culminating in a video generation model that embodies intrinsic physical plausibility.
arXiv (2025-11-11T18:59:50Z)
Learning to Generate Object Interactions with Physics-Guided Video Diffusion
We introduce KineMask, an approach for physics-guided video generation that enables realistic rigid body control, interactions, and effects. We propose a two-stage training strategy that gradually removes future motion supervision via object masks. Experiments show that KineMask achieves strong improvements over recent models of comparable size.
arXiv (2025-10-02T17:56:46Z)
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
We consider the problem of forecasting motion from a single image, predicting how objects in the world are likely to move. We formulate this task as conditional generation of dense trajectory grids with a model that closely follows the architecture of modern video generators. This approach captures scene-wide dynamics and uncertainty, yielding more accurate and diverse predictions than prior regressors and generators.
arXiv (2025-09-25T21:03:56Z)
RoboScape: Physics-informed Embodied World Model
We present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge. Experiments demonstrate that RoboScape generates videos with superior visual fidelity and physical plausibility across diverse robotic scenarios. Our work provides new insights for building efficient physics-informed world models to advance embodied intelligence research.
arXiv (2025-06-29T08:19:45Z)
Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
We investigate using physical forces as a control signal for video generation. We propose force prompts, which enable users to interact with images through both localized point forces and global wind force fields. We demonstrate that these force prompts can enable videos to respond realistically to physical control signals.
arXiv (2025-05-26T01:04:02Z)
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos. However, VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics. We propose a novel two-stage image-to-video generation framework that explicitly incorporates physics with a vision- and language-informed physical prior.
arXiv (2025-03-30T09:03:09Z)
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
We propose InterDyn, a framework that generates videos of interactive dynamics given an initial frame and a control signal encoding the motion of a driving object or actor. Our key insight is that large video generation models can act as both neural renderers and implicit physics "simulators", having learned interactive dynamics from large-scale video data.
arXiv (2024-12-16T13:57:02Z)
RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing
We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model. Our proposed framework, RoboPack, employs a recurrent graph neural network to estimate object states. We demonstrate our approach on a real robot equipped with a compliant Soft-Bubble tactile sensor on non-prehensile manipulation and dense packing tasks.
arXiv (2024-07-01T16:08:37Z)
Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling
We focus on the integration of incomplete physics models into deep generative models. We propose a VAE architecture in which a part of the latent space is grounded by physics. We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv (2021-02-25T20:28:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.