Dynamic Visual Reasoning by Learning Differentiable Physics Models from
Video and Language
- URL: http://arxiv.org/abs/2110.15358v1
- Date: Thu, 28 Oct 2021 17:59:13 GMT
- Title: Dynamic Visual Reasoning by Learning Differentiable Physics Models from
Video and Language
- Authors: Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum,
Chuang Gan
- Abstract summary: We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
- Score: 92.7638697243969
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, we propose a unified framework, called Visual
Reasoning with Differentiable Physics (VRDP), that can jointly learn visual
concepts and
infer physics models of objects and their interactions from videos and
language. This is achieved by seamlessly integrating three components: a visual
perception module, a concept learner, and a differentiable physics engine. The
visual perception module parses each video frame into object-centric
trajectories and represents them as latent scene representations. The concept
learner grounds visual concepts (e.g., color, shape, and material) from these
object-centric representations based on the language, thus providing prior
knowledge for the physics engine. The differentiable physics model, implemented
as an impulse-based differentiable rigid-body simulator, performs
differentiable physical simulation based on the grounded concepts to infer
physical properties, such as mass, restitution, and velocity, by fitting the
simulated trajectories to the video observations. Consequently, these learned
concepts and physical models can explain what we have seen and imagine what is
about to happen in future and counterfactual scenarios. Integrating
differentiable physics into the dynamic reasoning framework offers several
appealing benefits. More accurate dynamics prediction in learned physics models
enables state-of-the-art performance on both synthetic and real-world
benchmarks while still maintaining high transparency and interpretability; most
notably, VRDP improves accuracy on predictive and counterfactual questions
by 4.5% and 11.5%, respectively, over the best prior model. VRDP is also highly
data-efficient: physical parameters can be optimized from very few videos, and
even a single video can be sufficient. Finally, with all physical parameters
inferred, VRDP can quickly learn new concepts from a few examples.
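
To make the fitting step concrete, below is a minimal, hypothetical sketch of differentiable physics-parameter inference in the spirit of VRDP (not the authors' code): a 1-D ball bouncing on a floor is simulated with an impulse-style collision response, and an unknown restitution coefficient and initial velocity are recovered by gradient descent on a trajectory-fitting loss. The scene, parameter values, and the use of PyTorch are all illustrative assumptions.

```python
import torch

def simulate(height0, velocity0, restitution, steps=200, dt=0.01, g=9.8):
    """Differentiable 1-D simulation of a ball bouncing on the floor (y = 0)."""
    y, v = height0, velocity0
    trajectory = []
    for _ in range(steps):
        v = v - g * dt                      # gravity update
        y = y + v * dt                      # position update
        hit = (y < 0).float()               # 1.0 when the ball crosses the floor
        # Impulse-style response: reflect the velocity, damped by restitution.
        v = (1.0 - hit) * v - hit * restitution * v
        y = torch.clamp(y, min=0.0)
        trajectory.append(y)
    return torch.stack(trajectory)

# "Observed" trajectory, standing in for object tracks parsed from video;
# the ground-truth restitution here is 0.7 (an illustrative value).
with torch.no_grad():
    observed = simulate(torch.tensor(1.0), torch.tensor(0.0), torch.tensor(0.7))

# Unknown physical parameters, optimized to fit the observation.
restitution = torch.tensor(0.3, requires_grad=True)
velocity0 = torch.tensor(0.5, requires_grad=True)
optimizer = torch.optim.Adam([restitution, velocity0], lr=0.05)

for step in range(300):
    optimizer.zero_grad()
    simulated = simulate(torch.tensor(1.0), velocity0, restitution)
    loss = torch.mean((simulated - observed) ** 2)  # trajectory-fitting loss
    loss.backward()
    optimizer.step()

print(f"recovered restitution: {restitution.item():.3f}")  # should approach 0.7
```

VRDP's actual engine is a full impulse-based rigid-body simulator operating on grounded object concepts; the sketch only illustrates the core mechanism: because every simulation step is differentiable, the trajectory error can be backpropagated directly into physical parameters such as mass, restitution, and velocity.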
Related papers
- PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation [29.831214435147583] (2024-09-27)
We present PhysGen, a novel image-to-video generation method.
It produces a realistic, physically plausible, and temporally consistent video.
Our key insight is to integrate model-based physical simulation with a data-driven video generation process.
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806] (2024-06-18)
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion [35.71595369663293] (2024-06-06)
We propose Physics3D, a novel method for learning various physical properties of 3D objects through a video diffusion model.
Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model.
Experiments demonstrate the effectiveness of our method with both elastic and plastic materials.
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216] (2024-02-09)
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries particle-based physical dynamics models with recent large language models.
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos [78.49864987061689] (2023-03-29)
Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that can represent and synthesize the sound.
Existing video-driven deep learning approaches can only capture a weak correspondence between visual content and impact sounds.
We propose a physics-driven diffusion model that can synthesize high-fidelity impact sounds for a silent video clip.
- Differentiable Dynamics for Articulated 3d Human Motion Reconstruction [29.683633237503116] (2022-05-24)
We introduce DiffPhy, a differentiable physics-based model for articulated 3d human motion reconstruction from video.
We validate the model by demonstrating that it can accurately reconstruct physically plausible 3d human motion from monocular video.
- Occlusion resistant learning of intuitive physics from videos [52.25308231683798] (2020-04-30)
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
- Visual Grounding of Learned Physical Models [66.04898704928517] (2020-04-28)
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer physical properties within a few observations, which allows it to quickly adapt to unseen scenarios and make accurate predictions into the future.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.