Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
- URL: http://arxiv.org/abs/2403.18821v1
- Date: Wed, 27 Mar 2024 17:59:56 GMT
- Title: Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
- Authors: Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard
- Abstract summary: Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
- Score: 65.79402756995084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality, densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms. We used this dataset to evaluate existing methods for novel-view acoustic synthesis and impulse response generation, which previously relied on synthetic data. In our evaluation, we thoroughly assessed existing audio and audio-visual models against multiple criteria and proposed settings to enhance their performance on real-world data. We also conducted experiments to investigate the impact of incorporating visual data (i.e., images and depth) into neural acoustic field models. Additionally, we demonstrated the effectiveness of a simple sim2real approach, where a model is pre-trained on simulated data and fine-tuned with sparse real-world data, yielding significant improvements in the few-shot regime. RAF is the first dataset to provide densely captured room acoustic data, making it an ideal resource for researchers working on audio and audio-visual neural acoustic field modeling. Demos and datasets are available on our project page: https://facebookresearch.github.io/real-acoustic-fields/
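As a concrete, hedged illustration of how such paired data is typically used (a generic sketch, not code from the RAF release; the dry signal and RIR below are synthetic stand-ins), rendering the sound heard at a listener pose amounts to convolving a dry source signal with the room impulse response measured for that emitter-listener pair:

```python
# Generic sketch: render the audio heard at a listener pose from a measured
# room impulse response (RIR). Not code from the RAF release; the dry signal
# and RIR here are synthetic stand-ins.
import numpy as np
from scipy.signal import fftconvolve

def render_at_listener(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a dry (anechoic) signal with an RIR and trim to input length."""
    wet = fftconvolve(dry, rir, mode="full")[: len(dry)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet  # normalize to avoid clipping

sr = 48_000
dry = np.random.randn(2 * sr).astype(np.float32)  # stand-in for real audio
rir = np.zeros(sr // 2, dtype=np.float32)
rir[0] = 1.0                # direct path
rir[int(0.02 * sr)] = 0.5   # one early reflection at 20 ms
wet = render_at_listener(dry, rir)
```

The sim2real recipe described in the abstract fits the same picture: a model is first trained on RIRs produced by a simulator and then fine-tuned on the sparse measured ones.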
Related papers
- Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data.
We align the generations of the T2A model with the small-scale dataset using preference optimization.
We also propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
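The summary above does not spell out the preference-optimization objective; as one common instantiation, a DPO-style pairwise loss over preferred and rejected generations looks like the following sketch (all names hypothetical, not Synthio's actual code):

```python
import numpy as np

def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """DPO-style loss: prefer generation `w` over `l`, measured relative to a
    frozen reference model (hypothetical sketch, not Synthio's objective)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))  # -log(sigmoid(margin))
```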
arXiv Detail & Related papers (2024-10-02T22:05:36Z)
- SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios [19.24195341920164]
We introduce SonicSim, a synthetic toolkit to generate data for moving sound sources.
It supports multi-level adjustments at the scene, microphone, and source levels.
To assess the gap between synthetic and real-world data, we randomly selected 5 hours of raw, reverberation-free data for evaluation.
The results indicate that the synthetic data generated by SonicSim can effectively generalize to real-world scenarios.
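SonicSim's actual API is not shown in this summary; as a generic illustration (not SonicSim code) of what simulating a moving source involves, a time-varying propagation delay plus 1/r attenuation already yields Doppler and loudness cues:

```python
# Generic moving-source illustration (not SonicSim's API): a source moving
# from `start_dist` to `end_dist` metres induces a per-sample delay (Doppler)
# and 1/r amplitude attenuation at the microphone.
import numpy as np

def render_moving_source(dry: np.ndarray, sr: int, start_dist: float,
                         end_dist: float, c: float = 343.0) -> np.ndarray:
    n = len(dry)
    dist = np.linspace(start_dist, end_dist, n)   # source distance per sample
    delay = dist / c * sr                         # delay in samples
    src_idx = np.arange(n) - delay                # fractional read positions
    valid = src_idx >= 0
    out = np.zeros(n, dtype=np.float32)
    out[valid] = np.interp(src_idx[valid], np.arange(n), dry)
    return out / np.maximum(dist, 1e-3)           # 1/r attenuation
```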
arXiv Detail & Related papers (2024-10-02T12:33:59Z)
- Pre-training with Synthetic Patterns for Audio [18.769951782213973]
We propose to pre-train audio encoders using synthetic patterns instead of real audio data.
Our framework achieves performance comparable to models pre-trained on AudioSet-2M.
arXiv Detail & Related papers (2024-10-01T08:52:35Z)
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel-view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAVS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling [57.1025908604556]
An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment.
We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment.
We introduce ActiveRIR, a reinforcement learning policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions.
arXiv Detail & Related papers (2024-04-24T21:30:01Z)
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
- RealImpact: A Dataset of Impact Sound Fields for Real Objects [29.066504517249083]
We present RealImpact, a large-scale dataset of real object impact sounds recorded under controlled conditions.
RealImpact contains 150,000 recordings of impact sounds of 50 everyday objects with detailed annotations.
We make preliminary attempts to use our dataset as a reference for evaluating current simulation methods for estimating object impact sounds.
arXiv Detail & Related papers (2023-06-16T16:25:41Z)
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task, real-world audio-visual scene synthesis, and propose a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses the view direction relative to the sound source, enabling the model to learn sound-source-centric acoustic fields; a sketch of such a transform follows this entry.
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
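The exact parameterization is not given in the summary above; a minimal sketch of a sound-source-centric coordinate transform (function and variable names are hypothetical, not the paper's code) is:

```python
# Hypothetical sketch: express the listener's pose in a frame centered on the
# sound source, so the learned field depends only on relative geometry.
import numpy as np

def source_centric_coords(listener_pos, view_dir, source_pos):
    rel = np.asarray(listener_pos, dtype=np.float64) - np.asarray(source_pos)
    r = np.linalg.norm(rel)
    az = np.arctan2(rel[1], rel[0])               # azimuth seen from source
    el = np.arcsin(rel[2] / r) if r > 0 else 0.0  # elevation seen from source
    v = np.asarray(view_dir, dtype=np.float64)
    v /= np.linalg.norm(v)
    cos_to_src = float(v @ (-rel / r)) if r > 0 else 1.0  # gaze vs. source dir
    return np.array([r, az, el, cos_to_src], dtype=np.float32)

# Hypothetical usage: listener 2 m away along +x, looking back at the source.
feat = source_centric_coords([2.0, 0.0, 0.0], [-1.0, 0.1, 0.0], [0.0, 0.0, 0.0])
```

Expressed this way, the same relative geometry maps to the same features no matter where the source sits in world coordinates.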