Related papers: Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training

URL: http://arxiv.org/abs/2504.14409v1
Date: Sat, 19 Apr 2025 21:43:56 GMT
Title: Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training
Authors: Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Jonathan Le Roux
Abstract summary: This report details MERL's system for room impulse response (RIR) estimation submitted to the Generative Data Augmentation Workshop at ICASSP 2025. We first pre-train a neural acoustic field conditioned by room geometry on an external large-scale dataset in which pairs of RIRs and the geometries are provided. The neural acoustic field is then adapted to each target room by using the enrollment data. We predict the RIRs for each pair of source and receiver locations specified by Task 1, and use these RIRs to train the speaker distance estimation model in Task 2.
Score: 34.14967280931229
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This report details MERL's system for room impulse response (RIR) estimation submitted to the Generative Data Augmentation Workshop at ICASSP 2025 for Augmenting RIR Data (Task 1) and Improving Speaker Distance Estimation (Task 2). We first pre-train a neural acoustic field conditioned by room geometry on an external large-scale dataset in which pairs of RIRs and the geometries are provided. The neural acoustic field is then adapted to each target room by using the enrollment data, where we leverage either the provided room geometries or geometries retrieved from the external dataset, depending on availability. Lastly, we predict the RIRs for each pair of source and receiver locations specified by Task 1, and use these RIRs to train the speaker distance estimation model in Task 2.
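The abstract's last step, using predicted RIRs as training data, relies on the standard fact that reverberant speech can be simulated by convolving a dry signal with an RIR. The sketch below illustrates only that augmentation step and is not the authors' system. The function names (`augment_with_rir`, `toy_rir`), the 16 kHz sample rate, and the decaying-noise RIR are assumptions made for illustration; in the described pipeline the RIR would come from the adapted neural acoustic field.

```python
import numpy as np


def augment_with_rir(dry, rir):
    """Convolve a dry signal with an RIR, trim to the dry length, peak-normalize."""
    wet = np.convolve(dry, rir, mode="full")[: len(dry)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet


def toy_rir(sample_rate=16000, rt60=0.3, length_s=0.5, seed=0):
    """Hypothetical stand-in RIR: exponentially decaying noise.

    This is NOT a neural acoustic field output; it only gives the
    convolution step something plausible to work with.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(sample_rate * length_s)) / sample_rate
    # Amplitude falls by 60 dB (a factor of 1000) at t = rt60; ln(1000) ~ 6.9078.
    decay = np.exp(-np.log(1000.0) * t / rt60)
    rir = rng.standard_normal(t.size) * decay
    rir[0] = 1.0  # direct-path impulse
    return rir


if __name__ == "__main__":
    sample_rate = 16000
    # One second of noise as a placeholder for a dry speech recording.
    dry = np.random.default_rng(1).standard_normal(sample_rate)
    wet = augment_with_rir(dry, toy_rir(sample_rate))
    print(wet.shape)
```

Trimming to the dry length keeps clip durations fixed for batching at the cost of cutting the reverberant tail; keeping the full convolution output is an equally reasonable choice.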

Related papers

MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients [7.468132532398651]
We implement three features on top of the traditional image source method-based (ISM) shoebox RIRs. We train a DeepFilterNet3 model for each RIR dataset and evaluate the performance on a test set of real RIRs.
arXiv Detail & Related papers (2025-07-13T19:00:26Z)
DiffusionRIR: Room Impulse Response Interpolation using Diffusion Models [16.92449230293275]
High-quality RIR estimates drive applications such as virtual microphones, sound source localization, augmented reality, and data augmentation. This research addresses the challenge of estimating RIRs at unmeasured locations within a room using Denoising Diffusion Probabilistic Models (DDPM).
arXiv Detail & Related papers (2025-04-29T10:52:07Z)
AV-RIR: Audio-Visual Room Impulse Response Estimation [49.469389715876915]
Accurate estimation of Room Impulse Response (RIR) is important for speech processing and AR/VR applications. We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment.
arXiv Detail & Related papers (2023-11-30T22:58:30Z)
NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions [93.02515761070201]
We present a novel type of neural fields that uses general radial bases for signal representation. Our method builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals. When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed.
arXiv Detail & Related papers (2023-09-27T06:32:05Z)
Data-driven modelling of brain activity using neural networks, Diffusion Maps, and the Koopman operator [0.0]
We propose a machine-learning approach to model long-term out-of-sample dynamics of brain activity from task-dependent fMRI data. We use Diffusion maps (DMs) to discover a set of variables that parametrize the low-dimensional manifold on which the emergent high-dimensional fMRI time series evolve. We construct reduced-order-models (ROMs) on the embedded manifold via two techniques: Feedforward Neural Networks (FNNs) and the Koopman operator.
arXiv Detail & Related papers (2023-04-24T09:08:12Z)
Towards Improved Room Impulse Response Estimation for Speech Recognition [53.04440557465013]
We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features.
arXiv Detail & Related papers (2022-11-08T00:40:27Z)
Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener. We explore how to infer RIRs based on a sparse set of images and echoes observed in the space. In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
Self-Learning for Received Signal Strength Map Reconstruction with Neural Architecture Search [63.39818029362661]
We present a model based on Neural Architecture Search (NAS) and self-learning for received signal strength (RSS) map reconstruction. The approach first finds an optimal NN architecture and simultaneously trains the deduced model over some ground-truth measurements of a given RSS map. Experimental results show that signal predictions of this second model outperform non-learning based state-of-the-art techniques and NN models with no architecture search.
arXiv Detail & Related papers (2021-05-17T12:19:22Z)
StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation [6.824692201913681]
StoRIR is a room impulse response generation method dedicated to audio data augmentation in machine learning applications. We show that StoRIR, when used for audio data augmentation in a speech enhancement task, allows deep learning models to achieve better results on a wide range of metrics than when using the conventional image-source method.
arXiv Detail & Related papers (2020-08-17T11:56:47Z)
Weakly-supervised land classification for coastal zone based on deep convolutional neural networks by incorporating dual-polarimetric characteristics into training dataset [1.0494061710470493]
We explore the performance of DCNNs on semantic segmentation using spaceborne polarimetric synthetic aperture radar (PolSAR) datasets. The semantic segmentation task using PolSAR data can be categorized as weakly supervised learning when the characteristics of SAR data and data annotating procedures are factored in. Three DCNN models, including SegNet, U-Net, and LinkNet, are implemented next.
arXiv Detail & Related papers (2020-03-30T17:32:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.