NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
- URL: http://arxiv.org/abs/2405.18213v1
- Date: Tue, 28 May 2024 14:17:41 GMT
- Title: NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
- Authors: Amandine Brunetto, Sascha Hornauer, Fabien Moutarde,
- Abstract summary: NeRAF is a method that jointly learns acoustic and radiance fields.
It synthesizes both novel views and spatialized audio at new positions.
NeRAF achieves substantial performance improvements over previous works.
- Score: 3.954853544590893
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sound plays a major role in human perception, providing essential scene information alongside vision for understanding our environment. Despite progress in neural implicit representations, learning acoustics that match a visual scene is still challenging. We propose NeRAF, a method that jointly learns acoustic and radiance fields. NeRAF is designed as a Nerfstudio module for convenient access to realistic audio-visual generation. It synthesizes both novel views and spatialized audio at new positions, leveraging radiance field capabilities to condition the acoustic field with 3D scene information. At inference, each modality can be rendered independently and at spatially separated positions, providing greater versatility. We demonstrate the advantages of our method on the SoundSpaces dataset. NeRAF achieves substantial performance improvements over previous works while being more data-efficient. Furthermore, NeRAF enhances novel view synthesis of complex scenes trained with sparse data through cross-modal learning.
Related papers
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
view acoustic synthesis aims to render audio at any target viewpoint, given a mono audio emitted by a sound source at a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z) - Hearing Anything Anywhere [26.415266601469767]
We introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene.
This allows us to synthesize novel auditory experiences through the space with any source audio.
We show that our model outperforms state-ofthe-art baselines on rendering monaural and RIRs and music at unseen locations.
arXiv Detail & Related papers (2024-06-11T17:56:14Z) - LANe: Lighting-Aware Neural Fields for Compositional Scene Synthesis [65.20672798704128]
We present Lighting-Aware Neural Field (LANe) for compositional synthesis of driving scenes.
We learn a scene representation that disentangles the static background and transient elements into a world-NeRF and class-specific object-NeRFs.
We demonstrate the performance of our model on a synthetic dataset of diverse lighting conditions rendered with the CARLA simulator.
arXiv Detail & Related papers (2023-04-06T17:59:25Z) - AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene
Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
arXiv Detail & Related papers (2023-02-04T04:17:19Z) - Novel-View Acoustic Synthesis [140.1107768313269]
We introduce the novel-view acoustic synthesis (NVAS) task.
given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space.
arXiv Detail & Related papers (2023-01-20T18:49:58Z) - Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields [32.200557554874784]
This paper provides a new approach to scene understanding, by leveraging the recent progress on implicit 3D representation and neural rendering.
Building upon the great success of Neural Radiance Fields (NeRFs), we introduce Scene-Property Synthesis with NeRF.
We facilitate addressing a variety of scene understanding tasks under a unified framework, including semantic segmentation, surface normal estimation, reshading, keypoint detection, and edge detection.
arXiv Detail & Related papers (2022-06-09T17:59:50Z) - Unsupervised Discovery and Composition of Object Light Fields [57.198174741004095]
We propose to represent objects in an object-centric, compositional scene representation as light fields.
We propose a novel light field compositor module that enables reconstructing the global light field from a set of object-centric light fields.
arXiv Detail & Related papers (2022-05-08T17:50:35Z) - Learning Neural Acoustic Fields [110.22937202449025]
We introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene.
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs.
We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location, and can predict sound propagation at novel locations.
arXiv Detail & Related papers (2022-04-04T17:59:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.