Reconstruction of Sound Field through Diffusion Models
- URL: http://arxiv.org/abs/2312.08821v2
- Date: Wed, 21 Feb 2024 16:15:40 GMT
- Title: Reconstruction of Sound Field through Diffusion Models
- Authors: Federico Miotello, Luca Comanducci, Mirco Pezzoli, Alberto Bernardini,
Fabio Antonacci and Augusto Sarti
- Abstract summary: Reconstructing the sound field in a room is an important task for several applications, such as sound control and augmented (AR) or virtual reality (VR).
We propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms with a focus on the modal frequency range.
We introduce, for the first time, the use of a conditional Denoising Diffusion Probabilistic Model (DDPM) trained to reconstruct the sound field (SF-Diff) over an extended domain.
- Score: 15.192190218332843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing the sound field in a room is an important task for several
applications, such as sound control and augmented (AR) or virtual reality (VR).
In this paper, we propose a data-driven generative model for reconstructing the
magnitude of acoustic fields in rooms with a focus on the modal frequency
range. We introduce, for the first time, the use of a conditional Denoising
Diffusion Probabilistic Model (DDPM) trained to reconstruct the sound field
(SF-Diff) over an extended domain. The architecture is devised to be
conditioned on a limited set of available measurements at different
frequencies and to generate the sound field at target, unknown locations. The
results show that SF-Diff is able to provide accurate reconstructions,
outperforming a state-of-the-art baseline based on kernel interpolation.
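The conditional diffusion setup described in the abstract can be sketched in a few lines. This is a minimal, hedged illustration of the standard DDPM forward (noising) process and of how known measurements can condition the model; the step count, noise schedule, grid size, and observation mask are all assumptions for illustration, not the authors' actual SF-Diff configuration.

```python
import numpy as np

T = 200                                   # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (common choice)
alpha_bar = np.cumprod(1.0 - betas)       # cumulative signal-retention factor

def q_sample(x0, t, rng):
    """Closed-form forward process: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

# Toy "sound field magnitude" on an 8x8 grid of candidate positions.
rng = np.random.default_rng(0)
x0 = np.abs(rng.standard_normal((8, 8)))

# Conditioning: only a few positions are actually measured; the mask tells
# the model which entries are observations and which must be generated.
mask = rng.random((8, 8)) < 0.2           # ~20% of positions observed (assumed)
x_t, eps = q_sample(x0, t=T // 2, rng=rng)

# A conditional denoiser would see the noisy field together with the known
# measurements, e.g. by stacking (x_t, masked field, mask) as channels.
net_input = np.stack([x_t, mask * x0, mask.astype(float)])
```

At sampling time, the learned reverse chain denoises the unknown locations while the measured entries anchor the generation, which is what lets such a model replace a hand-crafted interpolation kernel with a learned prior.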
Related papers
- HARP: A Large-Scale Higher-Order Ambisonic Room Impulse Response Dataset [0.6568378556428859]
This contribution introduces a dataset of 7th-order Ambisonic Room Impulse Responses (HOA-RIRs) created using the Image Source Method.
By employing higher-order Ambisonics, our dataset enables precise spatial audio reproduction.
The presented 64-microphone configuration allows us to capture RIRs directly in the Spherical Harmonics domain.
arXiv Detail & Related papers (2024-11-21T15:16:48Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Neural Acoustic Context Field: Rendering Realistic Room Impulse Response
With Neural Fields [61.07542274267568]
This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene.
Driven by the unique properties of RIR, we design a temporal correlation module and multi-scale energy decay criterion.
Experimental results show that NACF outperforms existing field-based methods by a notable margin.
arXiv Detail & Related papers (2023-09-27T19:50:50Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the ground-truth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - Generative adversarial networks with physical sound field priors [6.256923690998173]
This paper presents a deep learning-based approach for the spatio-temporal reconstruction of sound fields using Generative Adversarial Networks (GANs).
The proposed method uses a plane wave basis and the underlying statistical distributions of pressure in rooms to reconstruct sound fields from a limited number of measurements.
The results suggest a promising approach to sound field reconstruction using generative models that incorporate a physically informed acoustic prior.
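The plane-wave basis mentioned in this entry can be illustrated briefly. The sketch below models a 2-D sound field as a weighted sum of plane waves at a single frequency and fits the weights to a few pressure measurements by least squares; the frequency, microphone count, and room dimensions are illustrative assumptions, not the paper's setup, and the paper itself learns this prior with a GAN rather than solving the least-squares problem directly.

```python
import numpy as np

c = 343.0                       # speed of sound [m/s]
f = 500.0                       # frequency [Hz] (assumed)
k = 2 * np.pi * f / c           # wavenumber magnitude

rng = np.random.default_rng(1)
mics = rng.random((12, 2)) * 3.0                 # 12 mic positions in a 3x3 m plane
angles = np.linspace(0, 2 * np.pi, 32, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # plane-wave directions

# Dictionary of plane waves evaluated at the microphones: exp(j*k*d.r)
H = np.exp(1j * k * mics @ dirs.T)               # shape (12 mics, 32 waves)

# Synthetic measurements generated from random plane-wave weights.
p_meas = H @ ((rng.standard_normal(32) + 1j * rng.standard_normal(32)) / 32)

# Least-squares fit of the plane-wave weights from the sparse measurements.
w, *_ = np.linalg.lstsq(H, p_meas, rcond=None)

# The field can then be reconstructed at unmeasured positions.
grid = np.array([[1.0, 1.0], [2.0, 0.5]])
p_hat = np.exp(1j * k * grid @ dirs.T) @ w
```

With more directions than microphones the system is underdetermined, which is exactly where a statistical prior over the weights (here, the GAN's learned distribution) becomes useful.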
arXiv Detail & Related papers (2023-08-01T10:11:23Z) - Realistic Noise Synthesis with Diffusion Models [68.48859665320828]
Deep image denoising models often rely on large amounts of training data to achieve high-quality performance.
We propose a novel method that synthesizes realistic noise using diffusion models, namely the Realistic Noise Synthesize Diffusor (RNSD).
RNSD can incorporate guided multiscale content, so that more realistic noise with spatial correlations can be generated at multiple frequencies.
arXiv Detail & Related papers (2023-05-23T12:56:01Z) - DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly
Detection [89.49600182243306]
We reformulate the reconstruction process using a diffusion model into a noise-to-norm paradigm.
We propose a rapid one-step denoising paradigm, significantly faster than the traditional iterative denoising in diffusion models.
The segmentation sub-network predicts pixel-level anomaly scores using the input image and its anomaly-free restoration.
arXiv Detail & Related papers (2023-03-15T16:14:06Z) - Mean absorption estimation from room impulse responses using virtually
supervised learning [0.0]
This paper introduces and investigates a new approach to estimate mean absorption coefficients solely from a room impulse response (RIR).
This inverse problem is tackled via virtually-supervised learning, namely, the RIR-to-absorption mapping is implicitly learned by regression on a simulated dataset using artificial neural networks.
arXiv Detail & Related papers (2021-09-01T14:06:20Z) - Deep Sound Field Reconstruction in Real Rooms: Introducing the ISOBEL
Sound Field Dataset [0.0]
This paper extends evaluations of sound field reconstruction at low frequencies by introducing a dataset with measurements from four real rooms.
The paper advances on a recent deep learning-based method for sound field reconstruction using a very low number of microphones.
arXiv Detail & Related papers (2021-02-12T11:34:18Z) - Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.