Chain of Time: In-Context Physical Simulation with Image Generation Models
- URL: http://arxiv.org/abs/2511.00110v1
- Date: Thu, 30 Oct 2025 21:46:26 GMT
- Title: Chain of Time: In-Context Physical Simulation with Image Generation Models
- Authors: YingQiao Wang, Eric Bigelow, Boyi Li, Tomer Ullman
- Abstract summary: Chain of Time is motivated by in-context reasoning in machine learning, as well as mental simulation in humans. We apply the Chain-of-Time method to synthetic and real-world domains, including 2-D graphics simulations and natural 3-D videos. Using Chain-of-Time simulation substantially improves the performance of a state-of-the-art image generation model.
- Score: 11.493192167966846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel cognitively-inspired method to improve and interpret physical simulation in vision-language models. Our "Chain of Time" method involves generating a series of intermediate images during a simulation, and it is motivated by in-context reasoning in machine learning, as well as mental simulation in humans. Chain of Time is used at inference time, and requires no additional fine-tuning. We apply the Chain-of-Time method to synthetic and real-world domains, including 2-D graphics simulations and natural 3-D videos. These domains test a variety of particular physical properties, including velocity, acceleration, fluid dynamics, and conservation of momentum. We found that using Chain-of-Time simulation substantially improves the performance of a state-of-the-art image generation model. Beyond examining performance, we also analyzed the specific states of the world simulated by an image model at each time step, which sheds light on the dynamics underlying these simulations. This analysis reveals insights that are hidden from traditional evaluations of physical reasoning, including cases where an image generation model is able to simulate physical properties that unfold over time, such as velocity, gravity, and collisions. Our analysis also highlights particular cases where the image generation model struggles to infer particular physical parameters from input images, despite being capable of simulating relevant physical processes.
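The core idea described in the abstract, generating a chain of intermediate images rather than jumping directly to the final state, can be illustrated with a minimal sketch. The `chain_of_time` loop and the `toy_model` stand-in below are illustrative assumptions, not the authors' code or API: the point is only that each generation step conditions on the previous frame, so the model must advance the physical state by one small timestep at a time.

```python
# Minimal sketch of the Chain-of-Time idea (function names and the model
# interface are assumptions for illustration, not the paper's implementation).
# Instead of asking an image model to render the scene at t = T directly,
# we generate intermediate frames one timestep at a time, feeding each
# generated frame back in as context.

def chain_of_time(model, initial_frame, num_steps, dt=1.0):
    """Roll a simulation forward by repeated single-step generation."""
    frames = [initial_frame]
    for _ in range(num_steps):
        prompt = f"Show this scene {dt} seconds later."
        # Each call conditions on the most recent frame, so the physical
        # state only needs to be advanced by one small step.
        next_frame = model(frames[-1], prompt)
        frames.append(next_frame)
    return frames

# Stub "model" for illustration: a frame here is just (position, velocity)
# in 1-D, and one step advances the position by the velocity.
def toy_model(frame, prompt):
    position, velocity = frame
    return (position + velocity, velocity)

trajectory = chain_of_time(toy_model, initial_frame=(0.0, 2.0), num_steps=5)
print([p for p, _ in trajectory])  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

With a real image generation model in place of `toy_model`, the intermediate frames also serve the paper's second purpose: each one is an inspectable snapshot of the simulated world state at that timestep.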
Related papers
- PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models [100.65199317765608]
Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation. We introduce a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces. We extend this paradigm to a unified framework, termed Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning.
arXiv Detail & Related papers (2026-01-16T08:40:10Z) - PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding [50.454084539837005]
PhysChoreo is a novel framework that can generate videos with diverse controllability and physical realism from a single image. Our method consists of two stages: first, it estimates the static initial physical properties of all objects in the image through part-aware physical property reconstruction. Then, through temporally instructed and physically editable simulation, it synthesizes high-quality videos with rich dynamic behaviors and physical realism.
arXiv Detail & Related papers (2025-11-25T17:59:04Z) - Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers [3.951575888190684]
This study investigates a transformer adaptation for video prediction with a simple end-to-end approach, comparing various spatiotemporal-attention layouts. We introduce a simple yet effective transformer for autoregressive video prediction, utilizing continuous pixel-space representations over the video prediction horizon.
arXiv Detail & Related papers (2025-10-23T17:58:45Z) - A simulation-heuristics dual-process model for intuitive physics [28.707537312978502]
Using a pouring-marble task, our human study revealed two distinct error patterns when predicting pouring angles, differentiated by simulation time. We propose a dual-process framework, the Simulation-Heuristics Model (SHM), in which intuitive physics employs simulation for short-time predictions but switches to heuristics when simulation becomes costly. The SHM aligns more precisely with human behavior and demonstrates consistent predictive performance across diverse scenarios, advancing our understanding of the adaptive nature of intuitive physical reasoning.
arXiv Detail & Related papers (2025-04-13T12:34:02Z) - PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos [21.441062722848265]
PhysTwin is a novel framework that uses sparse videos of dynamic objects under interaction to produce a photorealistic and physically realistic, real-time interactive replica. Our approach centers on two key components: (1) a physics-informed representation that combines spring-mass models for realistic physical simulation, generative shape models for geometry, and Gaussian splats for rendering; and (2) an inverse physics framework integrated with visual perception cues, enabling high-fidelity reconstruction even from partial, occluded, and limited viewpoints.
arXiv Detail & Related papers (2025-03-23T07:49:19Z) - PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations? [7.1606014219358425]
We propose to investigate the potential of generative models in the context of physical simulations. We provide a dataset of 300k image pairs and baseline evaluations for three different physical simulation tasks.
arXiv Detail & Related papers (2025-03-07T11:19:13Z) - PhysMotion: Physics-Grounded Dynamics From a Single Image [24.096925413047217]
We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image and input conditions. Our approach addresses the limitations of traditional data-driven generative models and results in more consistent, physically plausible motions.
arXiv Detail & Related papers (2024-11-26T07:59:11Z) - Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics [48.99021224773799]
We propose the Neural Material Adaptor (NeuMA), which integrates existing physical laws with learned corrections.
We also propose Particle-GS, a particle-driven 3D Gaussian Splatting variant that bridges simulation and observed images.
arXiv Detail & Related papers (2024-10-10T17:43:36Z) - Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z) - Conditional Generative Models for Simulation of EMG During Naturalistic Movements [45.698312905115955]
We present a conditional generative neural network trained adversarially to generate motor unit activation potential waveforms.
We demonstrate the ability of such a model to predictively interpolate between a much smaller number of numerical model outputs with high accuracy.
arXiv Detail & Related papers (2022-11-03T14:49:02Z) - Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z) - Learning to Simulate Complex Physics with Graph Networks [68.43901833812448]
We present a machine learning framework and model implementation that can learn to simulate a wide variety of challenging physical domains.
Our framework, which we term "Graph Network-based Simulators" (GNS), represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message passing.
Our results show that our model can generalize from single-timestep predictions with thousands of particles during training, to different initial conditions, thousands of timesteps, and at least an order of magnitude more particles at test time.
arXiv Detail & Related papers (2020-02-21T16:44:28Z)
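The GNS pattern described in the entry above can be sketched schematically. Everything in this snippet is a hand-written stand-in for the learned components (the edge and node networks in GNS are neural networks trained on simulation data), so treat it as an illustration of the graph structure, not the paper's method.

```python
# Schematic of the GNS pattern: particles are graph nodes, nearby particles
# are connected by edges, and dynamics are computed by aggregating per-edge
# "messages" into per-node updates. The simple repulsion rule below stands
# in for the learned edge/node networks of the actual paper.

def build_edges(positions, radius):
    """Connect every ordered pair of particles closer than `radius`."""
    edges = []
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i != j and abs(positions[i] - positions[j]) < radius:
                edges.append((i, j))
    return edges

def message_passing_step(positions, radius=1.5, strength=0.1):
    """One step: each edge sends a message; each node sums its messages."""
    edges = build_edges(positions, radius)
    messages = {i: 0.0 for i in range(len(positions))}
    for i, j in edges:
        # In GNS this message would come from a learned edge network;
        # here, a simple pairwise repulsion stands in for it.
        messages[j] += strength * (positions[j] - positions[i])
    # In GNS the node update is also learned; here we just apply the
    # aggregated message as a displacement.
    return [p + messages[i] for i, p in enumerate(positions)]

state = [0.0, 1.0, 5.0]            # 1-D particle positions
state = message_passing_step(state)  # the two close particles push apart
```

Because the update depends only on local neighborhoods rather than global state, the same learned rule can be applied to systems with far more particles than seen in training, which is the generalization property the abstract highlights.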
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.