MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM
- URL: http://arxiv.org/abs/2505.16456v1
- Date: Thu, 22 May 2025 09:40:34 GMT
- Title: MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM
- Authors: Siwei Meng, Yawei Luo, Ping Liu
- Abstract summary: MAGIC is a training-free framework for single-image physical property inference and dynamic generation. Our framework generates motion-rich videos from a static image and closes the visual-to-physical gap through a confidence-driven feedback loop. Experiments show that MAGIC outperforms existing physics-aware generative methods in inference accuracy and achieves greater temporal coherence.
- Score: 14.522189177415724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in static 3D generation have intensified the demand for physically consistent dynamic 3D content. However, existing video generation models, including diffusion-based methods, often prioritize visual realism while neglecting physical plausibility, resulting in implausible object dynamics. Prior approaches for physics-aware dynamic generation typically rely on large-scale annotated datasets or extensive model fine-tuning, which imposes significant computational and data collection burdens and limits scalability across scenarios. To address these challenges, we present MAGIC, a training-free framework for single-image physical property inference and dynamic generation, integrating pretrained image-to-video diffusion models with iterative LLM-based reasoning. Our framework generates motion-rich videos from a static image and closes the visual-to-physical gap through a confidence-driven LLM feedback loop that adaptively steers the diffusion model toward physics-relevant motion. To translate visual dynamics into controllable physical behavior, we further introduce a differentiable MPM simulator operating directly on 3D Gaussians reconstructed from the single image, enabling physically grounded, simulation-ready outputs without any supervision or model tuning. Experiments show that MAGIC outperforms existing physics-aware generative methods in inference accuracy and achieves greater temporal coherence than state-of-the-art video diffusion models.
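The abstract describes an iterative loop: a pretrained image-to-video diffusion model produces motion-rich frames from the static image, an LLM infers physical properties from those frames together with a confidence score, and low confidence re-steers the next diffusion pass toward physics-relevant motion. The Python sketch below is purely illustrative of that control flow: every name (`magic_style_loop`, `PhysicsEstimate`, the stub callables, the threshold value) is an assumption made for exposition, not the authors' released code or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PhysicsEstimate:
    """LLM-inferred physical properties plus a self-reported confidence (illustrative)."""
    properties: Dict[str, float]   # e.g. {"elasticity": 0.3} -- placeholder fields, not the paper's schema
    confidence: float              # assumed to lie in [0, 1]
    motion_prompt: str             # text used to re-steer the next diffusion pass


def magic_style_loop(
    image: object,
    generate_video: Callable[[object, str], List[object]],            # stand-in for a pretrained image-to-video diffusion model
    reason_about_physics: Callable[[List[object]], PhysicsEstimate],  # stand-in for the LLM-based reasoner
    confidence_threshold: float = 0.8,
    max_rounds: int = 5,
) -> Optional[PhysicsEstimate]:
    """Regenerate motion until the LLM's confidence in the inferred physics passes
    a threshold; a sketch of the confidence-driven feedback loop, not the authors' code."""
    prompt = "natural object motion"             # initial, physics-agnostic guidance
    estimate: Optional[PhysicsEstimate] = None
    for _ in range(max_rounds):
        frames = generate_video(image, prompt)   # motion-rich video from the static image
        estimate = reason_about_physics(frames)  # LLM infers properties + confidence
        if estimate.confidence >= confidence_threshold:
            break                                # motion judged physics-relevant; stop iterating
        prompt = estimate.motion_prompt          # low confidence: steer the next generation
    return estimate


if __name__ == "__main__":
    # Trivial stub wiring, just to show the call shape.
    result = magic_style_loop(
        image=None,
        generate_video=lambda img, p: ["frame0", "frame1"],
        reason_about_physics=lambda f: PhysicsEstimate({"elasticity": 0.3}, 0.9, "bouncier motion"),
    )
    print(result)
```

In the paper itself, the inferred properties then parameterize a differentiable MPM simulator operating on 3D Gaussians reconstructed from the input image; that simulation stage is omitted from this sketch.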
Related papers
- Diffusion-based 3D Hand Motion Recovery with Intuitive Physics [29.784542628690794]
We present a novel 3D hand motion recovery framework that enhances image-based reconstructions. Our model captures the distribution of refined motion estimates conditioned on initial ones, generating improved sequences. We identify valuable intuitive physics knowledge during hand-object interactions, including key motion states and their associated motion constraints.
arXiv Detail & Related papers (2025-08-03T16:44:24Z)
- Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation [54.42523027597904]
We introduce a novel framework that integrates symbolic regression and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories.
arXiv Detail & Related papers (2025-07-09T13:28:42Z)
- DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM). It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene, and it achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z)
- SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios [48.09735396455107]
Hand-Object Interaction (HOI) generation has significant application potential, but current 3D HOI motion generation approaches heavily rely on predefined 3D object models and lab-captured motion data. We propose a novel framework that combines visual priors and dynamic constraints within a synchronized diffusion process to generate the HOI video and motion simultaneously.
arXiv Detail & Related papers (2025-06-03T05:04:29Z)
- Motion aware video generative model [12.5036873986483]
Diffusion-based video generation has yielded unprecedented quality in visual content and semantic coherence, yet current approaches rely on statistical learning without explicitly modeling the underlying physics of motion. This paper introduces a physics-informed frequency-domain approach to enhance the physical plausibility of generated videos.
arXiv Detail & Related papers (2025-06-02T20:42:54Z)
- VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior [88.51778468222766]
Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos. However, VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics. We propose a novel two-stage image-to-video generation framework that explicitly incorporates physics via a vision- and language-informed physical prior.
arXiv Detail & Related papers (2025-03-30T09:03:09Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators. To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module. Experiments demonstrate that DWS can be applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- PhysMotion: Physics-Grounded Dynamics From a Single Image [24.096925413047217]
We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image and input conditions. Our approach addresses the limitations of traditional data-driven generative models and results in more consistent, physically plausible motion.
arXiv Detail & Related papers (2024-11-26T07:59:11Z)
- PhysFlow: Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation [9.306758077479472]
PhysFlow is a novel approach that leverages multi-modal foundation models and video diffusion to achieve enhanced 4D dynamic scene simulation. This integrated framework enables accurate prediction and realistic simulation of dynamic interactions in real-world scenarios.
arXiv Detail & Related papers (2024-11-21T18:55:23Z)
- DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors. We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z)
- EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes. It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping, and achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z)