Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes
- URL: http://arxiv.org/abs/2302.02809v3
- Date: Wed, 26 Apr 2023 03:44:57 GMT
- Title: Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes
- Authors: Anton Ratnarajah, Dinesh Manocha
- Abstract summary: We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
- Score: 69.03289331433874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an end-to-end binaural audio rendering approach (Listen2Scene) for
virtual reality (VR) and augmented reality (AR) applications. We propose a
novel neural-network-based binaural sound propagation method to generate
acoustic effects for 3D models of real environments. Any clean audio or dry
audio can be convolved with the generated acoustic effects to render audio
corresponding to the real environment. We propose a graph neural network that
uses both the material and the topology information of the 3D scenes and
generates a scene latent vector. Moreover, we use a conditional generative
adversarial network (CGAN) to generate acoustic effects from the scene latent
vector. Our network is able to handle holes or other artifacts in the
reconstructed 3D mesh model. We present an efficient cost function for the
generator network to incorporate spatial audio effects. Given the source and
the listener position, our learning-based binaural sound propagation approach
can generate an acoustic effect in 0.1 milliseconds on an NVIDIA GeForce RTX
2080 Ti GPU and can easily handle multiple sources. We have evaluated the
accuracy of our approach with binaural acoustic effects generated using an
interactive geometric sound propagation algorithm and captured real acoustic
effects. We also performed a perceptual evaluation and observed that the audio
rendered by our approach is more plausible than audio rendered using
prior learning-based sound propagation algorithms.
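The abstract above describes a two-stage pipeline: a graph neural network encodes the topology and material information of the reconstructed mesh into a scene latent vector, a conditional GAN generator maps that latent vector plus the source and listener positions to a binaural acoustic effect (an impulse response), and any dry audio is convolved with the generated response to render it for the real environment. The following is a minimal sketch of that flow, not the authors' implementation; the layer choices, feature and latent dimensions, impulse-response length, and the SceneEncoder / BRIRGenerator / render_binaural names are illustrative assumptions.

```python
# Minimal sketch of the pipeline described in the abstract (not the authors' code).
# Assumptions for illustration: PyTorch + torch_geometric, a 64-dimensional scene
# latent vector, and a 4800-sample (0.3 s at 16 kHz) binaural impulse response.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class SceneEncoder(nn.Module):
    """Graph neural network over the reconstructed mesh.

    Node features combine per-vertex geometry (topology) with per-vertex
    material descriptors (e.g. absorption coefficients), producing one
    scene latent vector per graph.
    """

    def __init__(self, in_dim=9, hidden=128, latent=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.out = nn.Linear(hidden, latent)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        return self.out(global_mean_pool(h, batch))


class BRIRGenerator(nn.Module):
    """CGAN-style generator: scene latent + source/listener positions -> 2-channel IR."""

    def __init__(self, latent=64, ir_len=4800):
        super().__init__()
        self.ir_len = ir_len
        self.net = nn.Sequential(
            nn.Linear(latent + 6, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 2 * ir_len),
        )

    def forward(self, z, src_pos, lis_pos):
        cond = torch.cat([z, src_pos, lis_pos], dim=-1)
        return self.net(cond).view(-1, 2, self.ir_len)


def render_binaural(dry, brir):
    """Convolve dry mono audio (T,) with a binaural impulse response (2, L)."""
    T, L = dry.shape[-1], brir.shape[-1]
    n = T + L - 1
    spec = torch.fft.rfft(dry, n) * torch.fft.rfft(brir, n)  # broadcast over channels
    return torch.fft.irfft(spec, n)  # (2, T + L - 1) binaural rendering


# Usage sketch: encode the reconstructed scene once, then generate one impulse
# response per source/listener pair (the abstract reports ~0.1 ms per effect on
# an RTX 2080 Ti) and convolve it with any dry recording, summing over sources.
```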
Related papers
- SOAF: Scene Occlusion-aware Neural Acoustic Field [9.651041527067907]
We propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation.
Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling.
We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate audio for novel views.
arXiv Detail & Related papers (2024-07-02T13:40:56Z)
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel-view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Novel-View Acoustic Synthesis from 3D Reconstructed Rooms [17.72902700567848]
We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis.
We identify the main challenges of novel-view acoustic synthesis as sound source localization, separation, and dereverberation.
We show that incorporating room impulse responses (RIRs) derived from 3D reconstructed rooms enables the same network to jointly tackle these tasks.
arXiv Detail & Related papers (2023-10-23T17:34:31Z)
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video [94.42811508809994]
We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio.
Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process.
arXiv Detail & Related papers (2021-11-21T19:26:45Z)
- Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
arXiv Detail & Related papers (2021-06-14T20:01:24Z)
- Points2Sound: From mono to binaural audio using 3D point cloud scenes [0.0]
We propose Points2Sound, a multi-modal deep learning model that generates a binaural version from mono audio using 3D point cloud scenes.
Results show that 3D visual information can successfully guide multi-modal deep learning models for the task of binaural synthesis.
arXiv Detail & Related papers (2021-04-26T10:44:01Z)