Learning Neural Acoustic Fields
- URL: http://arxiv.org/abs/2204.00628v1
- Date: Mon, 4 Apr 2022 17:59:37 GMT
- Title: Learning Neural Acoustic Fields
- Authors: Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio
Torralba, Chuang Gan
- Abstract summary: We introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene.
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can be applied to arbitrary sounds.
We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location and to predict sound propagation at novel locations.
- Score: 110.22937202449025
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Our environment is filled with rich and dynamic acoustic information. When we
walk into a cathedral, the reverberations as much as appearance inform us of
the sanctuary's wide open space. Similarly, as an object moves around us, we
expect the sound emitted to also exhibit this movement. While recent advances
in learned implicit functions have led to increasingly higher quality
representations of the visual world, there have not been commensurate advances
in learning spatial auditory representations. To address this gap, we introduce
Neural Acoustic Fields (NAFs), an implicit representation that captures how
sounds propagate in a physical scene. By modeling acoustic propagation in a
scene as a linear time-invariant system, NAFs learn to continuously map all
emitter and listener location pairs to a neural impulse response function that
can then be applied to arbitrary sounds. We demonstrate that the continuous
nature of NAFs enables us to render spatial acoustics for a listener at an
arbitrary location and to predict sound propagation at novel locations. We
further show that the representation learned by NAFs can help improve visual
learning with sparse views. Finally, we show that a representation informative
of scene structure emerges during the learning of NAFs.
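The abstract describes the mechanism but includes no code, so a minimal sketch may help make it concrete: an implicit function queried with an emitter location, a listener location, and a time index returns one sample of an impulse response, which is then convolved with an arbitrary dry sound under the linear time-invariant assumption. The network shape, the raw-coordinate inputs (the actual NAFs condition on learned local feature grids and predict spectrogram-domain responses), and all names below are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyAcousticField(nn.Module):
    """Toy stand-in for a neural acoustic field: maps an (emitter, listener)
    location pair and a normalized time index to one impulse-response sample."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),   # emitter (x, y) + listener (x, y) + t
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, emitter_xy, listener_xy, t):
        return self.mlp(torch.cat([emitter_xy, listener_xy, t], dim=-1))


def render_at_listener(field, emitter_xy, listener_xy, dry_sound, ir_len=4096):
    """Query the field densely in time to get an impulse response, then apply
    it to an arbitrary dry sound by convolution (the LTI assumption)."""
    t = torch.linspace(0.0, 1.0, ir_len).unsqueeze(-1)        # (ir_len, 1)
    ir = field(
        emitter_xy.expand(ir_len, 2),
        listener_xy.expand(ir_len, 2),
        t,
    ).squeeze(-1)                                             # (ir_len,)
    wet = F.conv1d(                                           # full convolution
        dry_sound.view(1, 1, -1),
        ir.flip(0).view(1, 1, -1),
        padding=ir_len - 1,
    ).view(-1)
    return wet


# Usage: re-render the same source sound for a listener at a new location.
field = ToyAcousticField()
dry = torch.randn(16000)                      # 1 s of a mono source at 16 kHz
wet = render_at_listener(field, torch.tensor([1.0, 2.0]), torch.tensor([3.5, 0.5]), dry)
```
Because the impulse response is produced by a continuous function of the two locations rather than looked up from a grid, the same query works at positions never seen during training, which is the property the abstract emphasizes.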
Related papers
- SOAF: Scene Occlusion-aware Neural Acoustic Field [9.651041527067907]
We propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation.
Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling.
We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate audio for novel views.
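The summary does not say how the Fibonacci Sphere is used internally, but the standard golden-angle construction for placing near-uniform directions around a point is well known. The sketch below only generates such directions around a receiver; treating them as sampling locations for a local acoustic field is an assumption for illustration, not SOAF's actual feature extractor.
```python
import numpy as np


def fibonacci_sphere(n_points: int = 256) -> np.ndarray:
    """Near-uniform directions on the unit sphere via the golden-angle spiral."""
    i = np.arange(n_points)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))       # ~2.39996 rad
    z = 1.0 - 2.0 * (i + 0.5) / n_points              # evenly spaced in (-1, 1)
    r = np.sqrt(1.0 - z * z)
    theta = golden_angle * i
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=-1)


# Hypothetical use: points on a small sphere around the receiver at which a
# local acoustic field could be sampled.
receiver = np.array([1.5, 0.8, 1.2])
sample_points = receiver + 0.3 * fibonacci_sphere(256)
```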
arXiv Detail & Related papers (2024-07-02T13:40:56Z)
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields [3.954853544590893]
We propose NeRAF, a method that jointly learns acoustic and radiance fields.
NeRAF synthesizes both novel views and spatialized room impulse responses (RIR) at new positions.
We demonstrate that NeRAF generates high-quality audio on SoundSpaces and RAF datasets.
arXiv Detail & Related papers (2024-05-28T14:17:41Z)
- Self-Supervised Learning for Few-Shot Bird Sound Classification [10.395255631261458]
Self-supervised learning (SSL) in audio holds significant potential across various domains.
In this study, we demonstrate that SSL is capable of acquiring meaningful representations of bird sounds from audio recordings without the need for annotations.
arXiv Detail & Related papers (2023-12-25T22:33:45Z)
- Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields [61.07542274267568]
This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene.
Driven by the unique properties of RIR, we design a temporal correlation module and multi-scale energy decay criterion.
Experimental results show that NACF outperforms existing field-based methods by a notable margin.
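The summary names a multi-scale energy decay criterion without defining it; a quantity such a criterion could plausibly compare is the Schroeder energy decay curve of an RIR, computed at several temporal resolutions. The pooling scheme and scale choices below are assumptions for illustration, not NACF's actual loss.
```python
import numpy as np


def energy_decay_curve(ir: np.ndarray) -> np.ndarray:
    """Schroeder backward integration: remaining energy of an RIR, in dB."""
    energy = np.asarray(ir, dtype=np.float64) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(edc / (edc[0] + 1e-12) + 1e-12)


def multi_scale_edc(ir: np.ndarray, scales=(1, 4, 16)):
    """Energy decay curves after pooling the RIR over windows of several sizes,
    giving coarse-to-fine views of how quickly energy decays."""
    curves = []
    for s in scales:
        trimmed = ir[: len(ir) // s * s]
        pooled = np.sqrt((trimmed ** 2).reshape(-1, s).mean(axis=1))  # per-window RMS
        curves.append(energy_decay_curve(pooled))
    return curves
```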
arXiv Detail & Related papers (2023-09-27T19:50:50Z)
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
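The coordinate transformation module is described only at a high level; a generic way to express a listener's view direction relative to a sound source is sketched below. The specific distance-and-angle parameterization is an assumption for illustration, not necessarily the one used in AV-NeRF.
```python
import numpy as np


def source_relative_pose(listener_pos, listener_forward, source_pos):
    """Express the listener's pose relative to the sound source: distance to
    the source and the angle between the viewing direction and the source."""
    listener_pos = np.asarray(listener_pos, dtype=np.float64)
    source_pos = np.asarray(source_pos, dtype=np.float64)
    forward = np.asarray(listener_forward, dtype=np.float64)
    forward /= np.linalg.norm(forward) + 1e-9

    to_source = source_pos - listener_pos
    dist = np.linalg.norm(to_source)
    to_source /= dist + 1e-9

    # Angle between where the listener faces and the direction to the source.
    angle = np.arccos(np.clip(np.dot(forward, to_source), -1.0, 1.0))
    return dist, angle


# Example: listener two metres from the source, facing straight at it.
d, a = source_relative_pose([0.0, 0.0, 1.6], [1.0, 0.0, 0.0], [2.0, 0.0, 1.6])
```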
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video [94.42811508809994]
We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio.
Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process.
arXiv Detail & Related papers (2021-11-21T19:26:45Z)
- Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
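The dereverberation task inverts a simple forward model: reverberant audio is the clean signal convolved with the room impulse response. A minimal sketch of that forward model is below; the peak normalization and the choice of scipy's FFT-based convolution are illustration choices, not part of VIDA.
```python
import numpy as np
from scipy.signal import fftconvolve


def add_reverb(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Forward model that dereverberation tries to invert: convolve clean
    speech with a room impulse response, then rescale to the clean peak."""
    wet = fftconvolve(clean, rir)[: len(clean)]
    return wet * (np.abs(clean).max() / (np.abs(wet).max() + 1e-9))
```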
arXiv Detail & Related papers (2021-06-14T20:01:24Z)
- A proto-object based audiovisual saliency map [0.0]
We develop a proto-object based audiovisual saliency map (AVSM) for analysis of dynamic natural scenes.
Such a map can be useful in surveillance, robotic navigation, video compression, and related applications.
arXiv Detail & Related papers (2020-03-15T08:34:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.