Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
- URL: http://arxiv.org/abs/2303.16897v3
- Date: Sat, 8 Jul 2023 07:12:28 GMT
- Title: Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
- Authors: Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
- Abstract summary: Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that can represent and synthesize the sound.
Existing video-driven deep-learning approaches capture only a weak correspondence between visual content and impact sounds.
We propose a physics-driven diffusion model that can synthesize high-fidelity impact sounds for a silent video clip.
- Score: 78.49864987061689
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Modeling sounds emitted from physical object interactions is critical for
immersive perceptual experiences in real and virtual worlds. Traditional
methods of impact sound synthesis use physics simulation to obtain a set of
physics parameters that can represent and synthesize the sound. However, these
methods require fine details of both the object geometries and impact
locations, which are rarely available in the real world, so they cannot be
applied to synthesize impact sounds from common videos. On the other hand,
existing video-driven deep-learning approaches capture only a weak
correspondence between visual content and impact sounds because they lack
physics knowledge. In this
work, we propose a physics-driven diffusion model that can synthesize
high-fidelity impact sound for a silent video clip. In addition to the video
content, we propose to use additional physics priors to guide the impact sound
synthesis procedure. The physics priors include both physics parameters that
are directly estimated from noisy real-world impact sound examples without
sophisticated setup and learned residual parameters that interpret the sound
environment via neural networks. We further implement a novel diffusion model
with specific training and inference strategies to combine physics priors and
visual information for impact sound synthesis. Experimental results show that
our model outperforms several existing systems in generating realistic impact
sounds. More importantly, the physics-based representations are fully
interpretable and transparent, enabling flexible sound editing.
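The physics priors named above are the parameters of a modal sound model: an impact is approximated as a sum of damped sinusoids, each with a frequency, a damping coefficient, and an amplitude. As a rough illustration of how such parameters might be estimated from a single noisy recording without a sophisticated setup, the sketch below uses spectral peak picking plus log-envelope decay fitting; this is a generic modal-analysis recipe, not the paper's exact estimator, and the window sizes, thresholds, and function names are assumptions.

```python
# Hedged sketch: estimate modal parameters (frequency, damping, amplitude)
# from one impact recording via spectral peaks and log-envelope fits.
# Generic modal analysis, not the paper's estimator; thresholds are guesses.
import numpy as np
from scipy.signal import find_peaks, stft

def estimate_modes(audio, sr, n_modes=8):
    """Return (freq_hz, damping_per_s, amplitude) for the strongest modes."""
    f, t, Z = stft(audio, fs=sr, nperseg=2048, noverlap=1536)
    mag = np.abs(Z)                          # |STFT|, shape (freq, frames)
    spectrum = mag.mean(axis=1)              # time-averaged spectrum
    peaks, props = find_peaks(spectrum, height=spectrum.max() * 0.05)
    strongest = peaks[np.argsort(props["peak_heights"])[::-1][:n_modes]]
    modes = []
    for p in strongest:
        env = mag[p]                         # per-frame magnitude of this bin
        valid = env > env.max() * 1e-3       # skip near-silent frames
        if valid.sum() < 2:
            continue
        # A damped mode a*exp(-d*t) is linear in log-magnitude, so the
        # fitted slope gives the damping d and the intercept gives a.
        slope, intercept = np.polyfit(t[valid], np.log(env[valid]), 1)
        modes.append((f[p], -slope, np.exp(intercept)))
    return modes

def synthesize(modes, sr, dur=1.0):
    """Re-synthesize the impact as a sum of damped sinusoids."""
    t = np.arange(int(sr * dur)) / sr
    out = sum(a * np.exp(-d * t) * np.sin(2 * np.pi * fr * t)
              for fr, d, a in modes)
    return out / (np.abs(out).max() + 1e-9)
```

Because each triplet is a physical quantity, sound editing reduces to modifying estimated frequencies or dampings before re-synthesis, which is what makes the representation interpretable. For the second ingredient, combining physics priors with visual information inside the diffusion model, one common conditioning recipe is classifier-free guidance over a joint condition vector. The sketch below is a minimal PyTorch illustration under that assumption; the paper's actual architecture and training/inference strategies are not reproduced here, and all module and argument names are hypothetical.

```python
# Hedged sketch: a denoiser conditioned on visual features and physics
# priors, sampled with classifier-free guidance. Illustrative only; the
# paper's network, dimensions, and guidance schedule differ.
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    def __init__(self, spec_dim=128, vis_dim=512, phys_dim=64, hidden=256):
        super().__init__()
        self.cond_proj = nn.Linear(vis_dim + phys_dim, hidden)
        self.net = nn.Sequential(
            nn.Linear(spec_dim + hidden + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, spec_dim),
        )

    def forward(self, x_t, t, vis, phys):
        # Fuse the two conditioning signals into one embedding.
        cond = self.cond_proj(torch.cat([vis, phys], dim=-1))
        inp = torch.cat([x_t, cond, t[:, None].float()], dim=-1)
        return self.net(inp)                 # predicted noise

@torch.no_grad()
def guided_eps(model, x_t, t, vis, phys, w=2.0):
    # Classifier-free guidance: blend conditional and unconditional
    # predictions; "unconditional" here zeroes out both conditions.
    eps_c = model(x_t, t, vis, phys)
    eps_u = model(x_t, t, torch.zeros_like(vis), torch.zeros_like(phys))
    return eps_u + w * (eps_c - eps_u)
```

In a full sampler, a function like guided_eps would replace the plain noise prediction inside each reverse-diffusion step.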
Related papers
- The Sound of Water: Inferring Physical Properties from Pouring Liquids [85.30865788636386]
We study the connection between audio-visual observations and the underlying physics of pouring liquids.
Our objective is to automatically infer physical properties such as the liquid level, the shape and size of the container, the pouring rate and the time to fill.
arXiv Detail & Related papers (2024-11-18T01:19:37Z)
- Differentiable Physics-based System Identification for Robotic Manipulation of Elastoplastic Materials [43.99845081513279]
This work introduces a novel Differentiable Physics-based System Identification (DPSI) framework that enables a robot arm to infer the physics parameters of elastoplastic materials and the environment.
With only a single real-world interaction, the estimated parameters can accurately simulate visually and physically realistic behaviours.
arXiv Detail & Related papers (2024-11-01T13:04:25Z)
- PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation [29.831214435147583]
We present PhysGen, a novel image-to-video generation method.
It produces a realistic, physically plausible, and temporally consistent video.
Our key insight is to integrate model-based physical simulation with a data-driven video generation process.
arXiv Detail & Related papers (2024-09-27T17:59:57Z)
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z)
- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation [62.53760963292465]
PhysDreamer is a physics-based approach that endows static 3D objects with interactive dynamics.
We present our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study.
arXiv Detail & Related papers (2024-04-19T17:41:05Z) - Dynamic Visual Reasoning by Learning Differentiable Physics Models from
Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z) - Physics-based Human Motion Estimation and Synthesis from Videos [0.0]
We propose a framework for training generative models of physically plausible human motion directly from monocular RGB videos.
At the core of our method is a novel optimization formulation that corrects imperfect image-based pose estimations.
Results show that our physically-corrected motions significantly outperform prior work on pose estimation.
arXiv Detail & Related papers (2021-09-21T01:57:54Z) - Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)