Synthetic Wave-Geometric Impulse Responses for Improved Speech
Dereverberation
- URL: http://arxiv.org/abs/2212.05360v1
- Date: Sat, 10 Dec 2022 20:15:23 GMT
- Title: Synthetic Wave-Geometric Impulse Responses for Improved Speech
Dereverberation
- Authors: Rohith Aralikatti, Zhenyu Tang, Dinesh Manocha
- Abstract summary: We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important for achieving good dereverberation.
We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods.
- Score: 69.1351513309953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel approach to improve the performance of learning-based
speech dereverberation using accurate synthetic datasets. Our approach is
designed to recover the reverb-free signal from a reverberant speech signal. We
show that accurately simulating the low-frequency components of Room Impulse
Responses (RIRs) is important for achieving good dereverberation. We use the GWA
dataset, which consists of synthetic RIRs generated in a hybrid fashion: an
accurate wave-based solver is used to simulate the lower frequencies and
geometric ray tracing methods simulate the higher frequencies. We demonstrate
that speech dereverberation models trained on hybrid synthetic RIRs outperform
models trained on RIRs generated by prior geometric ray tracing methods on four
real-world RIR datasets.
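To make the hybrid construction concrete, here is a minimal Python sketch of the idea: keep the wave-solver RIR below a crossover frequency and the ray-traced RIR above it, then convolve the combined RIR with clean speech to form a reverberant/clean training pair. The crossover frequency, filter order, function names, and placeholder signals are illustrative assumptions, not values taken from the paper or the GWA dataset.

```python
# A minimal sketch (not the authors' code) of assembling a hybrid RIR and a
# reverberant training pair. Crossover frequency, filter order, and signal
# lengths are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, fftconvolve

def hybrid_rir(rir_wave, rir_geo, fs, crossover_hz=1400.0, order=8):
    """Combine a wave-solver RIR (accurate at low frequencies) with a
    ray-traced RIR (cheap at high frequencies) via a crossover filter."""
    n = min(len(rir_wave), len(rir_geo))
    lo = butter(order, crossover_hz, btype="lowpass", fs=fs, output="sos")
    hi = butter(order, crossover_hz, btype="highpass", fs=fs, output="sos")
    low_band = sosfiltfilt(lo, rir_wave[:n])   # keep wave-based low frequencies
    high_band = sosfiltfilt(hi, rir_geo[:n])   # keep geometric high frequencies
    return low_band + high_band

def make_training_pair(clean_speech, rir):
    """Reverberant input / clean target pair for a dereverberation model."""
    reverberant = fftconvolve(clean_speech, rir, mode="full")[: len(clean_speech)]
    return reverberant, clean_speech

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    # Placeholder decaying-noise signals standing in for a wave-based RIR,
    # a ray-traced RIR, and a clean speech utterance.
    rir_wave = rng.standard_normal(fs) * np.exp(-np.linspace(0, 8, fs))
    rir_geo = rng.standard_normal(fs) * np.exp(-np.linspace(0, 8, fs))
    speech = rng.standard_normal(3 * fs)
    rir = hybrid_rir(rir_wave, rir_geo, fs)
    x, y = make_training_pair(speech, rir)
    print(rir.shape, x.shape, y.shape)
```

The zero-phase filtering (sosfiltfilt) keeps the two bands time-aligned when summed; the actual GWA pipeline may combine the wave-based and geometric components differently.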
Related papers
- Model and Deep learning based Dynamic Range Compression Inversion [12.002024727237837]
Inverting dynamic range compression (DRC) can help restore the original dynamics, enabling new mixes and/or improving the overall quality of the audio signal.
We propose a model-based approach with neural networks for DRC inversion.
Our results show the effectiveness and robustness of the proposed method in comparison to several state-of-the-art methods.
arXiv Detail & Related papers (2024-11-07T00:33:07Z)
- Speech enhancement with frequency domain auto-regressive modeling [34.55703785405481]
Speech applications in far-field real world settings often deal with signals that are corrupted by reverberation.
We propose a unified speech dereverberation framework for improving speech quality and automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2023-09-24T03:25:51Z)
- Radio Generation Using Generative Adversarial Networks with An Unrolled Design [18.049453261384013]
We develop a novel GAN framework for radio generation called "Radio GAN", built on two key ideas.
The first is learning based on sampling points, which aims to model an underlying sampling distribution of radio signals.
The second is an unrolled generator design, combined with an estimated pure signal distribution as a prior, which can greatly reduce learning difficulty.
arXiv Detail & Related papers (2023-06-24T07:47:22Z)
- Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion [85.54515118077825]
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality.
To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches.
Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
arXiv Detail & Related papers (2023-06-09T07:02:43Z)
- Machine learning for phase-resolved reconstruction of nonlinear ocean wave surface elevations from sparse remote sensing data [37.69303106863453]
We propose a novel approach for phase-resolved wave surface reconstruction using neural networks.
Our approach utilizes synthetic yet highly realistic training data on uniform one-dimensional grids.
arXiv Detail & Related papers (2023-05-18T12:30:26Z)
- Towards Improved Room Impulse Response Estimation for Speech Recognition [53.04440557465013]
We propose a novel approach to blind room impulse response (RIR) estimation in the context of far-field automatic speech recognition (ASR).
We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators.
We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features (see the toy sketch after this list).
arXiv Detail & Related papers (2022-11-08T00:40:27Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
- Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
arXiv Detail & Related papers (2022-02-07T18:57:23Z) - FAST-RIR: Fast neural diffuse room impulse response generator [81.96114823691343]
We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Our FAST-RIR is capable of generating RIRs for a given input reverberation time with an average error of 0.02s.
We show that our proposed FAST-RIR with batch size 1 is 400 times faster than a state-of-the-art diffuse acoustic simulator (DAS) on a CPU.
arXiv Detail & Related papers (2021-10-07T05:21:01Z)
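As referenced in the blind RIR estimation entry above, the following toy PyTorch sketch illustrates the encode-then-construct pattern that entry describes: compress reverberant speech into a latent code, then decode a fixed-length RIR. The layer sizes and signal lengths are assumptions, and the adversarial (GAN) training component is omitted; this is not the architecture from the cited paper.

```python
# Toy sketch of blind RIR estimation: encode reverberant speech into a
# latent code, then decode an RIR waveform. Layer sizes are assumptions;
# the GAN discriminator and training loop are omitted.
import torch
import torch.nn as nn

class ToyBlindRIREstimator(nn.Module):
    def __init__(self, latent_dim=128, rir_len=16000):
        super().__init__()
        # Encoder: strided 1-D convolutions over the reverberant waveform.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=4, padding=7), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=15, stride=4, padding=7), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # pool over time -> fixed-size features
        )
        self.to_latent = nn.Linear(64, latent_dim)
        # Decoder: map the latent code to an RIR waveform of fixed length.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, rir_len),
        )

    def forward(self, reverberant):                                   # (batch, samples)
        feats = self.encoder(reverberant.unsqueeze(1)).squeeze(-1)    # (batch, 64)
        code = self.to_latent(feats)                                  # (batch, latent_dim)
        return self.decoder(code)                                     # (batch, rir_len)

if __name__ == "__main__":
    model = ToyBlindRIREstimator()
    fake_reverberant = torch.randn(2, 48000)   # two 3-second clips at 16 kHz
    est_rir = model(fake_reverberant)
    print(est_rir.shape)                        # torch.Size([2, 16000])
```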