Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields
- URL: http://arxiv.org/abs/2309.15977v1
- Date: Wed, 27 Sep 2023 19:50:50 GMT
- Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
- Abstract summary: This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene.
Driven by the unique properties of RIR, we design a temporal correlation module and multi-scale energy decay criterion.
Experimental results show that NACF outperforms existing field-based methods by a notable margin.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Room impulse response (RIR), which measures the sound propagation within an
environment, is critical for synthesizing high-fidelity audio for a given
environment. Some prior work has proposed representing RIR as a neural field
function of the sound emitter and receiver positions. However, these methods do
not sufficiently consider the acoustic properties of an audio scene, leading to
unsatisfactory performance. This letter proposes a novel Neural Acoustic
Context Field approach, called NACF, to parameterize an audio scene by
leveraging multiple acoustic contexts, such as geometry, material property, and
spatial information. Driven by the unique properties of RIR, i.e., temporal
un-smoothness and monotonic energy attenuation, we design a temporal
correlation module and multi-scale energy decay criterion. Experimental results
show that NACF outperforms existing field-based methods by a notable margin.
Please visit our project page for more qualitative results.
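The abstract describes an RIR as the transform an environment applies to sound on its way to a listener, and highlights monotonic energy attenuation as a defining property. A minimal sketch of both ideas, assuming a toy synthetic RIR (exponentially decaying noise, not the NACF model or any measured response): reverberant audio is rendered by convolving a dry signal with the RIR, and the energy decay can be checked via Schroeder backward integration.

```python
import numpy as np

def apply_rir(dry, rir):
    """Render reverberant audio by convolving a dry signal with an RIR."""
    return np.convolve(dry, rir)

def energy_decay_curve(rir):
    """Schroeder backward integration: energy remaining after each sample.
    Monotonic energy attenuation means this curve never increases."""
    energy = rir ** 2
    return np.cumsum(energy[::-1])[::-1]

# Toy stand-in RIR: decaying white noise with roughly 60 dB decay over 1 s.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
rir = rng.standard_normal(sr) * np.exp(-6.9 * t)

dry = rng.standard_normal(2 * sr)   # 2 s of dry source audio
wet = apply_rir(dry, rir)           # reverberant rendering

edc = energy_decay_curve(rir)
assert len(wet) == len(dry) + len(rir) - 1
assert np.all(np.diff(edc) <= 0)    # monotonically non-increasing
```

This is only an illustration of what an RIR does; NACF itself predicts the RIR from emitter/receiver positions and acoustic context rather than assuming a synthetic one.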
Related papers
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling [57.1025908604556]
An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment.
We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment.
We introduce ActiveRIR, a reinforcement learning policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions.
arXiv Detail & Related papers (2024-04-24T21:30:01Z)
- AV-RIR: Audio-Visual Room Impulse Response Estimation [49.469389715876915]
Accurate estimation of Room Impulse Response (RIR) is important for speech processing and AR/VR applications.
We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment.
arXiv Detail & Related papers (2023-11-30T22:58:30Z)
- Blind Acoustic Room Parameter Estimation Using Phase Features [4.473249957074495]
We propose utilizing novel phase-related features to extend recent approaches to blindly estimate the so-called "reverberation fingerprint" parameters.
The addition of these features is shown to outperform existing methods that rely solely on magnitude-based spectral features.
arXiv Detail & Related papers (2023-03-13T20:05:41Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
- Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
arXiv Detail & Related papers (2022-02-07T18:57:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.