StoRIR: Stochastic Room Impulse Response Generation for Audio Data
Augmentation
- URL: http://arxiv.org/abs/2008.07231v1
- Date: Mon, 17 Aug 2020 11:56:47 GMT
- Title: StoRIR: Stochastic Room Impulse Response Generation for Audio Data
Augmentation
- Authors: Piotr Masztalski, Mateusz Matuszewski, Karol Piaskowski, Michał Romaniuk
- Abstract summary: StoRIR is a room impulse response generation method dedicated to audio data augmentation in machine learning applications.
We show that StoRIR, when used for audio data augmentation in a speech enhancement task, allows deep learning models to achieve better results on a wide range of metrics than when using the conventional image-source method.
- Score: 6.824692201913681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we introduce StoRIR, a stochastic room impulse response
generation method dedicated to audio data augmentation in machine learning
applications. In contrast to geometrical methods such as image-source or
ray tracing, this technique does not require prior definition of room
geometry, absorption coefficients, or microphone and source placement, and
depends solely on the acoustic parameters of the room. The method is
intuitive, easy to implement, and can generate RIRs of very complicated
enclosures. We show that StoRIR, when used for audio data augmentation in a
speech enhancement task, allows deep learning models to achieve better results
on a wide range of metrics than when using the conventional image-source
method, improving many of them by more than 5%. We publish a
Python implementation of StoRIR online.
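The abstract states that StoRIR depends only on the acoustic parameters of the room, but this page does not describe the algorithm itself. As a rough illustration of the general idea of geometry-free RIR generation, the sketch below shapes Gaussian noise with an exponential decay set by a target reverberation time (RT60) and convolves it with a dry signal. This is a generic decaying-noise model, not the authors' method or their published implementation; the function names `decaying_noise_rir` and `augment`, the 16 kHz sample rate, and the choice of RT60 as the only parameter are all assumptions made for this example.

```python
import numpy as np


def decaying_noise_rir(rt60, fs=16000, length_s=None, rng=None):
    """Hypothetical helper: Gaussian noise with an exponential RT60 decay.

    This is NOT the StoRIR algorithm, only a generic geometry-free sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumed default: keep 1.5 x RT60 seconds of tail.
    length_s = 1.5 * rt60 if length_s is None else length_s
    n = int(round(length_s * fs))
    t = np.arange(n) / fs
    # Amplitude drops by 60 dB (a factor of 10**-3) after rt60 seconds.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # crude stand-in for the direct-path impulse
    return rir / np.max(np.abs(rir))


def augment(signal, rir):
    """Convolve a dry signal with an RIR and trim to the input length."""
    return np.convolve(signal, rir)[: len(signal)]


# Example: reverberate one second of placeholder "speech" (white noise here).
rng = np.random.default_rng(0)
dry = rng.standard_normal(16000)
rir = decaying_noise_rir(rt60=0.5, fs=16000, rng=rng)
wet = augment(dry, rir)
```

In an augmentation pipeline, one would typically draw `rt60` at random for each training example so that the model sees a range of reverberation conditions; the sampling range is left to the user here because the page does not specify one.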
Related papers
- Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis [88.86777314004044]
We propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view visualization.
Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while enjoying less than 15% training time and over 73x inference speed.
arXiv Detail & Related papers (2024-03-07T00:12:08Z)
- AV-RIR: Audio-Visual Room Impulse Response Estimation [49.469389715876915]
Accurate estimation of Room Impulse Response (RIR) is important for speech processing and AR/VR applications.
We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment.
arXiv Detail & Related papers (2023-11-30T22:58:30Z)
- Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation [69.1351513309953]
We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation.
We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods.
arXiv Detail & Related papers (2022-12-10T20:15:23Z)
- Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
- A Model Compression Method with Matrix Product Operators for Speech Enhancement [15.066942043773267]
We propose a model compression method based on matrix product operators (MPO) to substantially reduce the number of parameters in neural network models for speech enhancement.
Our proposal provides an effective model compression method for speech enhancement, especially in cloud-free applications.
arXiv Detail & Related papers (2020-10-10T08:53:25Z)
- DDSP: Differentiable Digital Signal Processing [13.448630251745163]
We introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.
We achieve high-fidelity generation without the need for large autoregressive models or adversarial losses.
DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning.
arXiv Detail & Related papers (2020-01-14T06:49:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.