FAST-RIR: Fast neural diffuse room impulse response generator
- URL: http://arxiv.org/abs/2110.04057v1
- Date: Thu, 7 Oct 2021 05:21:01 GMT
- Title: FAST-RIR: Fast neural diffuse room impulse response generator
- Authors: Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu
- Abstract summary: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Our FAST-RIR is capable of generating RIRs for a given input reverberation time with an average error of 0.02s.
We show that our proposed FAST-RIR with batch size 1 is 400 times faster than a state-of-the-art diffuse acoustic simulator (DAS) on a CPU.
- Score: 81.96114823691343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a neural-network-based fast diffuse room impulse response
generator (FAST-RIR) for generating room impulse responses (RIRs) for a given
acoustic environment. Our FAST-RIR takes rectangular room dimensions, listener
and speaker positions, and reverberation time as inputs and generates specular
and diffuse reflections for a given acoustic environment. Our FAST-RIR is
capable of generating RIRs for a given input reverberation time with an average
error of 0.02s. We evaluate our generated RIRs in automatic speech recognition
(ASR) applications using Google Speech API, Microsoft Speech API, and Kaldi
tools. We show that our proposed FAST-RIR with batch size 1 is 400 times faster
than a state-of-the-art diffuse acoustic simulator (DAS) on a CPU and gives
similar performance to DAS in ASR experiments. Our FAST-RIR is 12 times faster
than an existing GPU-based RIR generator (gpuRIR). We show that our FAST-RIR
outperforms gpuRIR by 2.5% in an AMI far-field ASR benchmark.
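
The abstract reports accuracy as the error between the input reverberation time and the reverberation time of the generated RIR. As a rough illustration of how such an error could be measured, the sketch below estimates T60 from an impulse response using Schroeder backward integration and a line fit over the -5 dB to -25 dB range of the decay curve. This is a generic textbook procedure, not necessarily the evaluation used in the paper. The function names `synthetic_rir` and `estimate_t60`, the 16 kHz sample rate, and the decaying-noise stand-in for a generated RIR are all assumptions made for this example.

```python
import numpy as np


def synthetic_rir(t60, fs=16000, duration=1.0, seed=0):
    """Hypothetical stand-in for a generated RIR: white noise whose
    energy decays by 60 dB after t60 seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    # Amplitude envelope exp(-k t) reaches -60 dB at t = t60
    # when k = 3 * ln(10) / t60.
    envelope = np.exp(-3.0 * np.log(10.0) * t / t60)
    return rng.standard_normal(t.size) * envelope


def estimate_t60(rir, fs=16000, start_db=-5.0, end_db=-25.0):
    """Estimate T60 from the Schroeder energy decay curve."""
    energy = np.asarray(rir, dtype=np.float64) ** 2
    # Backward integration: energy remaining from each sample to the end.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    # Fit a straight line to the decay between start_db and end_db.
    idx = np.where((edc_db <= start_db) & (edc_db >= end_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # slope in dB per second
    # Extrapolate the fitted slope to a full 60 dB decay.
    return -60.0 / slope


if __name__ == "__main__":
    target_t60 = 0.5  # seconds, the "input" reverberation time
    rir = synthetic_rir(target_t60)
    measured_t60 = estimate_t60(rir)
    print(f"target T60:   {target_t60:.3f} s")
    print(f"measured T60: {measured_t60:.3f} s")
    print(f"abs error:    {abs(measured_t60 - target_t60):.3f} s")
```

Averaging the absolute error over many generated RIRs would give a figure comparable in kind to the 0.02s reported above, provided the same T60 estimation procedure is used throughout.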
Related papers
- AV-RIR: Audio-Visual Room Impulse Response Estimation [49.469389715876915]
Accurate estimation of Room Impulse Response (RIR) is important for speech processing and AR/VR applications.
We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment.
arXiv Detail & Related papers (2023-11-30T22:58:30Z)
- Streaming Speech-to-Confusion Network Speech Recognition [19.720334657478475]
We present a novel streaming ASR architecture that outputs a confusion network while maintaining limited latency.
We show that 1-best results of our model are on par with a comparable RNN-T system.
We also show that our model outperforms a strong RNN-T baseline on a far-field voice assistant task.
arXiv Detail & Related papers (2023-06-02T20:28:14Z)
- Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation [69.1351513309953]
We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation.
We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods.
arXiv Detail & Related papers (2022-12-10T20:15:23Z)
- Towards Improved Room Impulse Response Estimation for Speech Recognition [53.04440557465013]
We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of far-field automatic speech recognition (ASR).
We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators.
We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features.
arXiv Detail & Related papers (2022-11-08T00:40:27Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
- MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes [56.946057850725545]
We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh.
Our method can handle input triangular meshes with arbitrary topologies (2K - 3M triangles).
We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error.
arXiv Detail & Related papers (2022-05-18T23:50:34Z)
- StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation [6.824692201913681]
StoRIR is a room impulse response generation method dedicated to audio data augmentation in machine learning applications.
We show that StoRIR, when used for audio data augmentation in a speech enhancement task, allows deep learning models to achieve better results on a wide range of metrics than when using the conventional image-source method.
arXiv Detail & Related papers (2020-08-17T11:56:47Z)