Novel-View Acoustic Synthesis from 3D Reconstructed Rooms
- URL: http://arxiv.org/abs/2310.15130v1
- Date: Mon, 23 Oct 2023 17:34:31 GMT
- Title: Novel-View Acoustic Synthesis from 3D Reconstructed Rooms
- Authors: Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag
Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang
- Abstract summary: We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis.
We identify the main challenges of novel-view acoustic synthesis as sound source localization, separation, and dereverberation.
We show that incorporating room impulse responses (RIRs) derived from 3D reconstructed rooms enables the same network to jointly tackle these tasks.
- Score: 18.49261985372842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the benefit of combining blind audio recordings with 3D scene
information for novel-view acoustic synthesis. Given audio recordings from 2-4
microphones and the 3D geometry and material of a scene containing multiple
unknown sound sources, we estimate the sound anywhere in the scene. We identify
the main challenges of novel-view acoustic synthesis as sound source
localization, separation, and dereverberation. While naively training an
end-to-end network fails to produce high-quality results, we show that
incorporating room impulse responses (RIRs) derived from 3D reconstructed rooms
enables the same network to jointly tackle these tasks. Our method outperforms
existing methods designed for the individual tasks, demonstrating its
effectiveness at utilizing 3D visual information. In a simulated study on the
Matterport3D-NVAS dataset, our model achieves near-perfect accuracy on source
localization, and a PSNR of 26.44 dB and an SDR of 14.23 dB for source separation
and dereverberation, resulting in a PSNR of 25.55 dB and an SDR of 14.20 dB on
novel-view acoustic synthesis. Code, pretrained model, and video results are
available on the project webpage (https://github.com/apple/ml-nvas3d).
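To make the role of the RIRs concrete: once the sources have been localized, separated, and dereverberated, the sound at a novel viewpoint can be rendered by convolving each dry source signal with the RIR from that source's position to the target position and summing the results. Below is a minimal sketch of this rendering step only, not the authors' network; the function name and the use of scipy.signal.fftconvolve are illustrative assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_novel_view(dry_sources, rirs_to_target):
    """Render audio at a novel viewpoint from dry source signals.

    dry_sources: one separated, dereverberated (dry) 1-D signal
        per sound source.
    rirs_to_target: the room impulse response from each source
        position to the target viewpoint, e.g. simulated from the
        reconstructed 3D geometry and materials.
    """
    out_len = max(len(s) + len(h) - 1
                  for s, h in zip(dry_sources, rirs_to_target))
    out = np.zeros(out_len)
    for s, h in zip(dry_sources, rirs_to_target):
        wet = fftconvolve(s, h)   # re-apply the target room's acoustics
        out[:len(wet)] += wet     # superpose the contributions
    return out
```

The convolution itself is the easy part; obtaining the dry per-source signals (localization, separation, dereverberation) is what the paper's network is trained to do, with the 3D-derived RIRs supplying the acoustic context.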
Related papers
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413] (2024-06-13)
Novel-view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084] (2024-03-27)
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
- Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality [15.034352805342937] (2024-02-14)
The primary goal of the L3DAS23 Signal Processing Grand Challenge at ICASSP 2023 is to promote and support collaborative research on machine learning for 3D audio signal processing.
We provide a brand-new dataset that maintains the same general characteristics as the L3DAS21 and L3DAS22 datasets.
We propose updated baseline models for both tasks that now support audio-image pairs as input, along with a supporting API to replicate our results.
- Self-Supervised Visual Acoustic Matching [63.492168778869726] (2023-07-27)
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment via a conditional GAN framework and a novel metric.
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568] (2023-02-04)
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
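As a rough illustration of such a source-centric transformation, here is a 2-D simplification (the exact parameterization in AV-NeRF may differ; the function name and angle convention are assumptions):

```python
import numpy as np

def source_centric_view(listener_pos, listener_yaw, source_pos):
    """Express a listener viewpoint relative to the sound source
    (2-D simplification), so an acoustic field can be queried in
    source-centric coordinates."""
    rel = np.asarray(listener_pos, float) - np.asarray(source_pos, float)
    dist = np.linalg.norm(rel)            # source-listener distance
    azimuth = np.arctan2(rel[1], rel[0])  # listener bearing as seen from source
    # Facing direction relative to the source direction, wrapped to [-pi, pi)
    rel_yaw = (listener_yaw - azimuth + np.pi) % (2 * np.pi) - np.pi
    return dist, azimuth, rel_yaw
```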
- Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874] (2023-02-02)
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
- Novel-View Acoustic Synthesis [140.1107768313269] (2023-01-20)
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
We propose a neural rendering approach: a Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound at an arbitrary point in space.
- L3DAS21 Challenge: Machine Learning for 3D Audio Signal Processing [6.521891605165917] (2021-04-12)
The L3DAS21 Challenge is aimed at encouraging and fostering collaborative research on machine learning for 3D audio signal processing.
We release the L3DAS21 dataset, a 65-hour 3D audio corpus, accompanied by a Python API that facilitates data usage and results submission.