Related papers: PhyMAGIC: Physical Motion-Aware Generative Inference with Confidence-guided LLM

Related papers

FastPhysGS: Accelerating Physics-based Dynamic 3DGS Simulation via Interior Completion and Adaptive Optimization [56.17833729527066]
We propose FastPhysGS, a framework for physics-based dynamic 3DGS simulation.<n>FastPhysGS achieves high-fidelity physical simulation in 1 minute using only 7 GB runtime memory.
arXiv Detail & Related papers (2026-02-02T07:00:42Z)
PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models [100.65199317765608]
Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation.<n>We introduce a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces.<n>We extend this paradigm to a unified framework, termed Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning.
arXiv Detail & Related papers (2026-01-16T08:40:10Z)
MotionPhysics: Learnable Motion Distillation for Text-Guided Simulation [25.78198969054392]
MotionPhysics is an end-to-end differentiable framework that infers plausible physical parameters from a user-provided natural language prompt.<n>We evaluate MotionPhysics across more than thirty scenarios, including real-world, human-designed, and AI-generated 3D objects.
arXiv Detail & Related papers (2026-01-01T22:56:37Z)
ProPhy: Progressive Physical Alignment for Dynamic World Simulation [55.456455952212416]
ProPhy is a Progressive Physical Alignment Framework that enables explicit physics-aware conditioning and anisotropic generation.<n>We show that ProPhy produces more realistic, dynamic, and physically coherent results than existing state-of-the-art methods.
arXiv Detail & Related papers (2025-12-05T09:39:26Z)
PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement [45.990473754456104]
Video Large Language Models (Video LLMs) have shown impressive performance across a wide range of video-language tasks.<n>We propose PhyVLLM, a physical-guided video-language framework that explicitly incorporates physical motion into Video LLMs.<n>We show that PhyVLLM significantly outperforms state-of-the-art Video LLMs on both physical reasoning and general video understanding tasks.
arXiv Detail & Related papers (2025-12-04T07:28:56Z)
PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection [10.498184571108995]
We propose PhysCorr, a unified framework for modeling, evaluating, and optimizing physical consistency in video generation.<n>Specifically, we introduce PhysicsRM, the first dual-dimensional reward model that quantifies both intra-object stability and inter-object interactions.<n>Our approach is model-agnostic and scalable, enabling seamless integration into a wide range of video diffusion and transformer-based backbones.
arXiv Detail & Related papers (2025-11-06T02:40:57Z)
TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility [70.24211591214528]
Video generative models produce sequences that violate intuitive physical laws, such as objects floating, teleporting, or morphing.<n>Existing Video-Language Models (VLMs) struggle to identify physics violations, exposing fundamental limitations in their temporal and causal reasoning.<n>We introduce TRAVL, a fine-tuning recipe that combines a balanced training dataset with a trajectory-aware attention module to improve motion encoding.<n>We propose ImplausiBench, a benchmark of 300 videos (150 real, 150 generated) that removes linguistic biases and isolates visual-temporal understanding.
arXiv Detail & Related papers (2025-10-08T21:03:46Z)
PhysHMR: Learning Humanoid Control Policies from Vision for Physically Plausible Human Motion Reconstruction [52.44375492811009]
We present PhysHMR, a unified framework that learns a visual-to-action policy for humanoid control in a physics-based simulator.<n>A key component of our approach is the pixel-as-ray strategy, which lifts 2D keypoints into 3D spatial rays and transforms them into global space.<n>PhysHMR produces high-fidelity, physically plausible motion across diverse scenarios, outperforming prior approaches in both visual accuracy and physical realism.
arXiv Detail & Related papers (2025-10-02T21:01:11Z)
Diffusion-based 3D Hand Motion Recovery with Intuitive Physics [29.784542628690794]
We present a novel 3D hand motion recovery framework that enhances image-based reconstructions.<n>Our model captures the distribution of refined motion estimates conditioned on initial ones, generating improved sequences.<n>We identify valuable intuitive physics knowledge during hand-object interactions, including key motion states and their associated motion constraints.
arXiv Detail & Related papers (2025-08-03T16:44:24Z)
Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation [54.42523027597904]
We introduce a novel framework that integrates symbolic regression and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting.<n>Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories.
arXiv Detail & Related papers (2025-07-09T13:28:42Z)
DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM)<n>It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene.<n>It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z)
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios [48.09735396455107]
Hand-Object Interaction (HOI) generation has significant application potential.<n>Current 3D HOI motion generation approaches heavily rely on predefined 3D object models and lab-captured motion data.<n>We propose a novel framework that combines visual priors and dynamic constraints within a synchronized diffusion process to generate the HOI video and motion simultaneously.
arXiv Detail & Related papers (2025-06-03T05:04:29Z)
Motion aware video generative model [12.5036873986483]
Diffusion-based video generation has yielded unprecedented quality in visual content and semantic coherence.<n>Current approaches rely on statistical learning without explicitly modeling the underlying physics of motion.<n>This paper introduces a physics-informed frequency domain approach to enhance the physical plausibility of generated videos.
arXiv Detail & Related papers (2025-06-02T20:42:54Z)
Think Before You Diffuse: Infusing Physical Rules into Video Diffusion [55.046699347579455]
The complexity of real-world motions, interactions, and dynamics introduce great difficulties when learning physics from data.<n>We propose DiffPhy, a generic framework that enables physically-correct and photo-realistic video generation by fine-tuning a pre-trained video diffusion model.
arXiv Detail & Related papers (2025-05-27T18:26:43Z)
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior [88.51778468222766]
Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos.<n>VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics.<n>We propose a novel two-stage image-to-video generation framework that explicitly incorporates physics with vision and language informed physical prior.
arXiv Detail & Related papers (2025-03-30T09:03:09Z)
Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators.<n>To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module.<n> Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
PhysMotion: Physics-Grounded Dynamics From a Single Image [24.096925413047217]
We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image and input conditions.<n>Our approach addresses the limitations of traditional data-driven generative models and result in more consistent physically plausible motions.
arXiv Detail & Related papers (2024-11-26T07:59:11Z)
PhysFlow: Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation [9.306758077479472]
PhysFlow is a novel approach that leverages multi-modal foundation models and video diffusion to achieve enhanced 4D dynamic scene simulation.<n>This integrated framework enables accurate prediction and realistic simulation of dynamic interactions in real-world scenarios.
arXiv Detail & Related papers (2024-11-21T18:55:23Z)
DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors.<n>We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z)
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes. It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping. Our method achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.