PhysID: Physics-based Interactive Dynamics from a Single-view Image
- URL: http://arxiv.org/abs/2506.17746v1
- Date: Sat, 21 Jun 2025 15:57:58 GMT
- Title: PhysID: Physics-based Interactive Dynamics from a Single-view Image
- Authors: Sourabh Vasant Gothe, Ayon Chattopadhyay, Gunturi Venkata Sai Phani Kiran, Pratik, Vibhav Agarwal, Jayesh Rajkumar Vachhani, Sourav Ghosh, Parameswaranath VM, Barath Raj KR
- Abstract summary: We present PhysID, which streamlines the creation of physics-based interactive dynamics from a single-view image. We integrate an on-device physics-based engine for physically plausible real-time rendering with user interactions.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transforming static images into interactive experiences remains a challenging task in computer vision. Tackling this challenge holds the potential to elevate mobile user experiences, notably through interactive and AR/VR applications. Current approaches aim to achieve this either using pre-recorded video responses or requiring multi-view images as input. In this paper, we present PhysID, which streamlines the creation of physics-based interactive dynamics from a single-view image by leveraging large generative models for 3D mesh generation and physical property prediction. This significantly reduces the expertise required for engineering-intensive tasks like 3D modeling and intrinsic property calibration, enabling the process to be scaled with minimal manual intervention. We integrate an on-device physics-based engine for physically plausible real-time rendering with user interactions. PhysID represents a leap forward in mobile-based interactive dynamics, offering real-time, non-deterministic interactions and user-personalization with efficient on-device memory consumption. Experiments evaluate the zero-shot capabilities of various Multimodal Large Language Models (MLLMs) on diverse tasks and the performance of 3D reconstruction models. These results demonstrate the cohesive functioning of all modules within the end-to-end framework, contributing to its effectiveness.
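The abstract describes a pipeline in which a generative model predicts an object's intrinsic physical properties from a single image and a physics engine then renders plausible dynamics. The sketch below illustrates that flow in miniature; all names (`PhysicalProperties`, `predict_properties`, `simulate_bounce`) and the default values are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a PhysID-style pipeline: predict intrinsic
# properties from an image description, then feed them to a tiny
# physics model. Names and defaults are illustrative only.
from dataclasses import dataclass


@dataclass
class PhysicalProperties:
    mass_kg: float
    elasticity: float  # coefficient of restitution, in [0, 1]
    friction: float    # Coulomb friction coefficient


def predict_properties(image_description: str) -> PhysicalProperties:
    """Stand-in for the MLLM that infers properties zero-shot."""
    # A real system would query a multimodal LLM with the input image;
    # here we simply return plausible defaults.
    return PhysicalProperties(mass_kg=1.0, elasticity=0.5, friction=0.3)


def simulate_bounce(props: PhysicalProperties, drop_height_m: float) -> float:
    """Rebound height after one bounce under an ideal restitution model."""
    return drop_height_m * props.elasticity ** 2


props = predict_properties("a rubber ball on a table")
print(simulate_bounce(props, drop_height_m=1.0))  # 0.25
```

The point of separating property prediction from simulation is the same one the abstract makes: once the intrinsic properties are calibrated automatically, the rendering loop needs no per-object manual engineering.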
Related papers
- Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions [88.01918532202716]
We introduce a novel approach that embeds SMPL-X into a tangible entity capable of dynamic physical interactions with its surroundings. Our approach maintains kinematic control over inherent SMPL-X poses while ensuring physically plausible interactions with scenes and objects. Unlike reinforcement learning-based methods, which demand extensive and complex training, our half-physics method is learning-free and generalizes to any body shape and motion.
arXiv Detail & Related papers (2025-07-31T17:58:33Z) - PhysiInter: Integrating Physical Mapping for High-Fidelity Human Interaction Generation [35.563978243352764]
We introduce physical mapping, integrated throughout the human interaction generation pipeline. Specifically, motion imitation within a physics-based simulation environment is used to project target motions into a physically valid space. Experiments show our method achieves impressive results in generated human motion quality, with a 3%-89% improvement in physical fidelity.
arXiv Detail & Related papers (2025-06-09T06:04:49Z) - MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM [14.522189177415724]
MAGIC is a training-free framework for single-image physical property inference and dynamic generation. Our framework generates motion-rich videos from a static image and closes the visual-to-physical gap through a confidence-driven feedback loop. Experiments show that MAGIC outperforms existing physics-aware generative methods in inference accuracy and achieves greater temporal coherence.
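The "confidence-driven feedback loop" in the MAGIC summary can be pictured as repeated querying of an estimator, refining the prompt until the reported confidence clears a threshold. The sketch below is a generic illustration of that control pattern; the function names, the toy estimator, and the threshold are all hypothetical, not MAGIC's actual interface.

```python
# Illustrative confidence-guided feedback loop in the spirit of the
# MAGIC summary above. All names and thresholds are hypothetical.

def infer_with_feedback(estimate_fn, refine_fn, threshold=0.9, max_rounds=5):
    """Re-query the estimator, refining the prompt each round,
    until the returned confidence meets the threshold."""
    prompt = "initial"
    value, confidence = estimate_fn(prompt)
    for _ in range(max_rounds - 1):
        if confidence >= threshold:
            break
        prompt = refine_fn(prompt, value)
        value, confidence = estimate_fn(prompt)
    return value, confidence


# Toy estimator whose confidence grows with each refinement round.
def toy_estimate(prompt):
    rounds = prompt.count("+")
    return 2.0 + 0.1 * rounds, 0.6 + 0.1 * rounds


value, conf = infer_with_feedback(toy_estimate, lambda p, v: p + "+")
print(conf >= 0.9)  # True
```

The loop terminates either when confidence is sufficient or after a fixed budget of rounds, which keeps a training-free inference procedure bounded in cost.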
arXiv Detail & Related papers (2025-05-22T09:40:34Z) - Large Model Empowered Metaverse: State-of-the-Art, Challenges and Opportunities [23.465545107612595]
The Metaverse is an immersive, persistent digital ecosystem where users can interact, socialize, and work within 3D virtual environments. This paper investigates the integration of large models within the Metaverse. We propose a generative AI-based framework for optimizing Metaverse rendering.
arXiv Detail & Related papers (2025-01-18T13:52:48Z) - InterDyn: Controllable Interactive Dynamics with Video Diffusion Models [50.38647583839384]
We propose InterDyn, a framework that generates videos of interactive dynamics given an initial frame and a control signal encoding the motion of a driving object or actor. Our key insight is that large video generation models can act as both neural renderers and implicit physics simulators, having learned interactive dynamics from large-scale video data.
arXiv Detail & Related papers (2024-12-16T13:57:02Z) - EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation [62.53760963292465]
PhysDreamer is a physics-based approach that endows static 3D objects with interactive dynamics.
We present our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study.
arXiv Detail & Related papers (2024-04-19T17:41:05Z) - DROP: Dynamics Responses from Human Motion Prior and Projective Dynamics [21.00283279991885]
We introduce DROP, a novel framework for modeling Dynamics Responses of humans using generative mOtion prior and Projective dynamics.
We conduct extensive evaluations of our model across different motion tasks and various physical perturbations, demonstrating the scalability and diversity of responses.
arXiv Detail & Related papers (2023-09-24T20:25:59Z) - Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction [158.74130075865835]
Given a malfunctional 3D object, humans can perform mental simulations to reason about its functionality and figure out how to fix it.
To mimic humans' mental simulation process, we present FixNet, a novel framework that seamlessly incorporates perception and physical dynamics.
arXiv Detail & Related papers (2022-05-05T17:59:36Z) - ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [75.0278287071591]
ThreeDWorld (TDW) is a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments.
We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science.
arXiv Detail & Related papers (2020-07-09T17:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.