Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction
- URL: http://arxiv.org/abs/2110.02405v1
- Date: Tue, 5 Oct 2021 23:23:51 GMT
- Title: Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction
- Authors: Justin Wilson and Nicholas Rewkowski and Ming C. Lin and Henry Fuchs
- Abstract summary: Reflective and textureless surfaces such as windows, mirrors, and walls can be a challenge for object and scene reconstruction.
We propose Echoreconstruction, an audio-visual method that uses the reflections of sound to aid in geometry and audio reconstruction for virtual conferencing, teleimmersion, and other AR/VR experiences.
- Score: 30.951713301164016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reflective and textureless surfaces such as windows, mirrors, and walls can
be a challenge for object and scene reconstruction. These surfaces are often
poorly reconstructed and filled with depth discontinuities and holes, making it
difficult to cohesively reconstruct scenes that contain these planar
discontinuities. We propose Echoreconstruction, an audio-visual method that
uses the reflections of sound to aid in geometry and audio reconstruction for
virtual conferencing, teleimmersion, and other AR/VR experiences. The mobile
phone prototype emits pulsed audio, while recording video for RGB-based 3D
reconstruction and audio-visual classification. Reflected sound and images from
the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV)
convolutional neural networks for surface and sound source detection, depth
estimation, and material classification. The inferences from these
classifications enhance scene 3D reconstructions containing open spaces and
reflective surfaces by depth filtering, inpainting, and placement of unmixed
sound sources in the scene. Our prototype, VR demo, and experimental results
from real-world and virtual scenes with challenging surfaces and sound indicate
high success rates on classification of material, depth estimation, and
closed/open surfaces, leading to considerable visual and audio improvement in
3D scenes (see Figure 1).
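The abstract does not give implementation details for EchoCNN, but the physical cue it builds on is the delay between an emitted pulse and its reflection. Below is a minimal sketch of that cue using plain cross-correlation rather than the paper's learned networks; the sample rate, pulse shape, and speed of sound are assumptions, not values from the paper.

```python
import numpy as np

# Assumptions (not from the paper): 48 kHz capture, a short linear chirp
# as the emitted pulse, and a speed of sound of 343 m/s.
FS = 48_000          # sample rate in Hz
C = 343.0            # speed of sound in m/s

def make_chirp(duration_s=0.01, f0=4_000, f1=16_000):
    """Linear chirp used as the emitted pulse."""
    t = np.arange(int(duration_s * FS)) / FS
    phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration_s * t**2)
    return np.sin(phase)

def estimate_depth(recording, pulse, min_delay_s=0.002):
    """Estimate distance to the dominant reflecting surface.

    Cross-correlates the recording with the emitted pulse, ignores the
    direct path (everything before min_delay_s), and converts the echo
    delay to a one-way distance: d = c * t / 2.
    """
    corr = np.correlate(recording, pulse, mode="full")[len(pulse) - 1:]
    start = int(min_delay_s * FS)
    echo_idx = start + np.argmax(np.abs(corr[start:]))
    delay_s = echo_idx / FS
    return C * delay_s / 2.0

# Toy usage: a wall 1.5 m away produces an echo after roughly 8.7 ms.
pulse = make_chirp()
delay = int(2 * 1.5 / C * FS)
recording = np.zeros(delay + len(pulse))
recording[delay:delay + len(pulse)] = 0.3 * pulse   # attenuated echo
print(f"estimated depth: {estimate_depth(recording, pulse):.2f} m")
```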
Related papers
- 3D Audio-Visual Segmentation [44.61476023587931]
Recognizing the sounding objects in scenes is a longstanding objective in embodied AI, with diverse applications in robotics and AR/VR/MR.
We propose a new approach, EchoSegnet, which integrates ready-to-use knowledge from pretrained 2D audio-visual foundation models.
Experiments demonstrate that EchoSegnet can effectively segment sounding objects in 3D space on our new benchmark, representing a significant advancement in the field of embodied AI.
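The summary above only names the ingredients (pretrained 2D audio-visual models combined with 3D reasoning), so the sketch below is a generic illustration rather than EchoSegnet itself: back-project a 2D sounding-object mask into a 3D point cloud using an aligned depth map and assumed pinhole intrinsics. Every parameter name here is hypothetical.

```python
import numpy as np

def lift_mask_to_3d(mask, depth, fx, fy, cx, cy):
    """Back-project a 2D segmentation mask into camera-frame 3D points.

    mask : (H, W) boolean array from a 2D audio-visual segmentation model
    depth: (H, W) metric depth map aligned with the mask
    fx, fy, cx, cy: pinhole intrinsics (assumed known)
    """
    v, u = np.nonzero(mask)             # pixel rows/cols inside the mask
    z = depth[v, u]
    valid = z > 0                       # drop pixels with no depth
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    return np.stack([x, y, z], axis=1)  # (N, 3) points of the sounding object

# Toy usage with a synthetic 4x4 frame.
mask = np.zeros((4, 4), dtype=bool); mask[1:3, 1:3] = True
depth = np.full((4, 4), 2.0)
points = lift_mask_to_3d(mask, depth, fx=500, fy=500, cx=2, cy=2)
print(points.shape)  # (4, 3)
```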
arXiv Detail & Related papers (2024-11-04T16:30:14Z)
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- Hearing Anything Anywhere [26.415266601469767]
We introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene.
This allows us to synthesize novel auditory experiences through the space with any source audio.
We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations.
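DiffRIR's parametric scene model is not described in this summary, but the rendering step it enables is standard signal processing: convolve a dry source signal with the RIR estimated for a listener location. A minimal sketch of that step follows; the RIR, source, and sample rate are synthetic placeholders, not outputs of DiffRIR.

```python
import numpy as np
from scipy.signal import fftconvolve

FS = 16_000  # assumed sample rate

def render_at_location(dry_source, rir):
    """Simulate what the dry source sounds like at the listener position
    described by the room impulse response (RIR)."""
    wet = fftconvolve(dry_source, rir, mode="full")[: len(dry_source)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet   # normalize to avoid clipping

# Toy usage: a 440 Hz tone rendered with a crude exponentially decaying RIR.
t = np.arange(FS) / FS
dry = np.sin(2 * np.pi * 440 * t)
rir = np.random.default_rng(0).standard_normal(FS // 4) * np.exp(-6 * np.arange(FS // 4) / FS)
wet = render_at_location(dry, rir)
print(wet.shape)  # (16000,)
```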
arXiv Detail & Related papers (2024-06-11T17:56:14Z)
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
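At its core, the coordinate transformation described above is a change of frame: the listener's position and view direction are re-expressed relative to the sound source. The sketch below illustrates that geometry under assumed conventions; the exact parameterization used by AV-NeRF is not given in this summary.

```python
import numpy as np

def to_source_centric(listener_pos, view_dir, source_pos):
    """Re-express the listener pose relative to the sound source.

    Returns the distance to the source, the unit vector from the source to
    the listener, and the cosine between the view direction and the
    direction toward the source (how directly the listener faces it).
    """
    offset = np.asarray(listener_pos, float) - np.asarray(source_pos, float)
    dist = np.linalg.norm(offset)
    src_to_listener = offset / dist
    view_dir = np.asarray(view_dir, float)
    view_dir = view_dir / np.linalg.norm(view_dir)
    facing = float(np.dot(view_dir, -src_to_listener))
    return dist, src_to_listener, facing

# Toy usage: listener 2 m east of the source, looking straight at it.
d, direction, facing = to_source_centric([2, 0, 0], [-1, 0, 0], [0, 0, 0])
print(d, direction, facing)  # 2.0 [1. 0. 0.] 1.0
```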
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
- Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874]
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
arXiv Detail & Related papers (2023-02-02T04:09:23Z)
- Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos [32.48058491211032]
We present the first method for visual speech-aware perceptual reconstruction of 3D mouth expressions.
We do this by proposing a "lipread" loss, which guides the fitting process so that the elicited perception from the 3D reconstructed talking head resembles that of the original video footage.
arXiv Detail & Related papers (2022-07-22T14:07:46Z)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video [94.42811508809994]
We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio.
Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process.
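The paper's learned geometric disentanglement is not spelled out in this summary. Purely for intuition, the sketch below spatializes mono audio from two hand-crafted geometric cues, an interaural time difference and an interaural level difference derived from source azimuth; the head radius, sample rate, and gain values are assumptions.

```python
import numpy as np

FS = 16_000         # assumed sample rate
HEAD_RADIUS = 0.09  # approximate half inter-ear distance in metres
C = 343.0           # speed of sound in m/s

def mono_to_binaural(mono, azimuth_rad):
    """Crude spatialization from source azimuth (0 = straight ahead,
    positive = to the listener's right) via ITD and ILD only."""
    itd = HEAD_RADIUS * np.sin(azimuth_rad) / C        # interaural time difference
    shift = int(round(abs(itd) * FS))                  # far-ear delay in samples
    gain_near = 1.0
    gain_far = 10 ** (-3.0 * abs(np.sin(azimuth_rad)) / 20)  # up to ~3 dB ILD
    delayed = np.concatenate([np.zeros(shift), mono])[: len(mono)]
    if azimuth_rad >= 0:        # source on the right: left ear is the far ear
        left, right = gain_far * delayed, gain_near * mono
    else:
        left, right = gain_near * mono, gain_far * delayed
    return np.stack([left, right], axis=0)   # (2, T) binaural signal

# Toy usage: a click 45 degrees to the right.
mono = np.zeros(FS); mono[0] = 1.0
binaural = mono_to_binaural(mono, np.deg2rad(45))
print(binaural.shape)  # (2, 16000)
```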
arXiv Detail & Related papers (2021-11-21T19:26:45Z)
- NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild [80.09093712055682]
We introduce a surface analog of implicit models called Neural Reflectance Surfaces (NeRS).
NeRS learns a neural shape representation of a closed surface that is diffeomorphic to a sphere, guaranteeing water-tight reconstructions.
We demonstrate that surface-based neural reconstructions enable learning from such data, outperforming volumetric neural rendering-based reconstructions.
arXiv Detail & Related papers (2021-10-14T17:59:58Z)
- Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
arXiv Detail & Related papers (2021-06-14T20:01:24Z)
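The VIDA architecture is not detailed in this summary. Purely as an illustration of the audio-visual formulation, the toy PyTorch sketch below fuses a spectrogram encoding with an image encoding and predicts a magnitude mask for the dry speech; every layer and size here is an arbitrary assumption rather than the published model.

```python
import torch
import torch.nn as nn

class TinyAVDereverb(nn.Module):
    """Toy audio-visual dereverberation model: predicts a [0, 1] mask over
    the reverberant magnitude spectrogram, conditioned on a scene image."""

    def __init__(self, img_feat=32):
        super().__init__()
        self.audio_enc = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.visual_enc = nn.Sequential(
            nn.Conv2d(3, img_feat, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                  # global scene descriptor
        )
        self.head = nn.Conv2d(16 + img_feat, 1, kernel_size=1)

    def forward(self, spec, image):
        # spec:  (B, 1, F, T) reverberant magnitude spectrogram
        # image: (B, 3, H, W) RGB view of the room
        a = self.audio_enc(spec)
        v = self.visual_enc(image)                    # (B, img_feat, 1, 1)
        v = v.expand(-1, -1, a.shape[2], a.shape[3])  # broadcast over (F, T)
        mask = torch.sigmoid(self.head(torch.cat([a, v], dim=1)))
        return mask * spec                            # estimated dry magnitude

# Toy usage.
model = TinyAVDereverb()
dry_est = model(torch.rand(2, 1, 257, 100), torch.rand(2, 3, 128, 128))
print(dry_est.shape)  # torch.Size([2, 1, 257, 100])
```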
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.