Related papers: NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

URL: http://arxiv.org/abs/2405.18213v1
Date: Tue, 28 May 2024 14:17:41 GMT
Title: NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Authors: Amandine Brunetto, Sascha Hornauer, Fabien Moutarde
Abstract summary: NeRAF is a method that jointly learns acoustic and radiance fields. It synthesizes both novel views and spatialized audio at new positions. NeRAF achieves substantial performance improvements over previous works.
Score: 3.954853544590893
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Sound plays a major role in human perception, providing essential scene information alongside vision for understanding our environment. Despite progress in neural implicit representations, learning acoustics that match a visual scene is still challenging. We propose NeRAF, a method that jointly learns acoustic and radiance fields. NeRAF is designed as a Nerfstudio module for convenient access to realistic audio-visual generation. It synthesizes both novel views and spatialized audio at new positions, leveraging radiance field capabilities to condition the acoustic field with 3D scene information. At inference, each modality can be rendered independently and at spatially separated positions, providing greater versatility. We demonstrate the advantages of our method on the SoundSpaces dataset. NeRAF achieves substantial performance improvements over previous works while being more data-efficient. Furthermore, NeRAF enhances novel view synthesis of complex scenes trained with sparse data through cross-modal learning.

Related papers

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio. We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment. Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
Hearing Anything Anywhere [26.415266601469767]
We introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene. This allows us to synthesize novel auditory experiences through the space with any source audio. We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations.
arXiv Detail & Related papers (2024-06-11T17:56:14Z)
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and present a first-of-its-kind NeRF-based approach for multimodal learning. We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF. We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
Novel-View Acoustic Synthesis [140.1107768313269]
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach: a Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space.
arXiv Detail & Related papers (2023-01-20T18:49:58Z)
Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields [32.200557554874784]
This paper provides a new approach to scene understanding by leveraging recent progress on implicit 3D representation and neural rendering. Building upon the great success of Neural Radiance Fields (NeRFs), we introduce Scene-Property Synthesis with NeRF. The approach addresses a variety of scene understanding tasks under a unified framework, including semantic segmentation, surface normal estimation, reshading, keypoint detection, and edge detection.
arXiv Detail & Related papers (2022-06-09T17:59:50Z)
Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener. We explore how to infer RIRs based on a sparse set of images and echoes observed in the space. In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
Learning Neural Acoustic Fields [110.22937202449025]
We introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene. By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds. We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location and to predict sound propagation at novel locations.
arXiv Detail & Related papers (2022-04-04T17:59:37Z)
BARF: Bundle-Adjusting Neural Radiance Fields [104.97810696435766]
We propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect camera poses. BARF can effectively optimize the neural scene representations and resolve large camera pose misalignment at the same time. This enables view synthesis and localization of video sequences from unknown camera poses, opening up new avenues for visual localization systems.
arXiv Detail & Related papers (2021-04-13T17:59:51Z)
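
Several of the entries above (NAF, DiffRIR, and the few-shot RIR work) rely on room impulse responses and on treating acoustic propagation as a linear time-invariant system. Under that assumption, the sound heard at a listener position is the source signal convolved with the RIR for that emitter-listener pair. The sketch below illustrates only this final step with NumPy. The function name `render_at_listener` and the toy RIR values are hypothetical, chosen for illustration; the methods listed here predict the RIR with a learned model rather than writing it by hand, and this is not code from any of those papers.

```python
import numpy as np


def render_at_listener(source: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Apply a room impulse response to a dry source signal.

    For a linear time-invariant system the output is the linear
    convolution of the input with the impulse response.
    """
    return np.convolve(source, rir, mode="full")


# Hypothetical toy RIR: a direct path followed by one echo,
# three samples later and attenuated by half.
rir = np.array([1.0, 0.0, 0.0, 0.5])

# Hypothetical dry (anechoic) source signal.
source = np.array([1.0, 2.0, 3.0])

wet = render_at_listener(source, rir)
print(wet)  # the source, followed by a half-amplitude delayed copy
```

Because the RIR is independent of the source signal, an RIR estimated once for a given emitter-listener pair can be reused to render any source audio at that position.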

This list is automatically generated from the titles and abstracts of the papers on this site.