A neural network-supported two-stage algorithm for lightweight
dereverberation on hearing devices
- URL: http://arxiv.org/abs/2204.02978v2
- Date: Wed, 31 May 2023 15:34:46 GMT
- Title: A neural network-supported two-stage algorithm for lightweight
dereverberation on hearing devices
- Authors: Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning and Timo
Gerkmann
- Abstract summary: A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper.
The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter.
Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs).
- Score: 13.49645012479288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A two-stage lightweight online dereverberation algorithm for hearing devices
is presented in this paper. The approach combines a multi-channel multi-frame
linear filter with a single-channel single-frame post-filter. Both components
rely on power spectral density (PSD) estimates provided by deep neural networks
(DNNs). By deriving new metrics analyzing the dereverberation performance in
various time ranges, we confirm that directly optimizing for a criterion at the
output of the multi-channel linear filtering stage results in a more efficient
dereverberation as compared to placing the criterion at the output of the DNN
to optimize the PSD estimation. More concretely, we show that training this
stage end-to-end helps further remove the reverberation in the range accessible
to the filter, thus increasing the early-to-moderate reverberation
ratio. We argue and demonstrate that it can then be well combined with a
post-filtering stage to efficiently suppress the residual late reverberation,
thereby increasing the early-to-final reverberation ratio. This
proposed two-stage procedure is shown to be both very effective in terms of
dereverberation performance and computational demands, as compared to e.g.
recent state-of-the-art DNN approaches. Furthermore, the proposed two-stage
system can be adapted to the needs of different types of hearing-device users
by controlling the amount of reduction of early reflections.
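
As an illustration of the post-filtering stage described in the abstract, the following is a minimal sketch of a single-channel, single-frame spectral gain driven by PSD estimates. It is not the authors' implementation: the abstract does not state the gain rule, so a generic Wiener-style gain is assumed here, and the names `wiener_gain`, `postfilter_frame`, and `gain_floor` are hypothetical. In the paper the PSD estimates come from DNNs; in this sketch they are simply passed in as numbers.

```python
from typing import List, Sequence


def wiener_gain(target_psd: float, residual_psd: float, gain_floor: float = 0.1) -> float:
    """Wiener-style gain for one time-frequency bin (assumed rule, not from the paper).

    target_psd:   estimated PSD of the desired (early) speech component.
    residual_psd: estimated PSD of the residual late reverberation.
    gain_floor:   lower bound on the gain, limiting the maximum attenuation.
    """
    if target_psd < 0.0 or residual_psd < 0.0:
        raise ValueError("PSD values must be non-negative")
    if not 0.0 <= gain_floor <= 1.0:
        raise ValueError("gain_floor must lie in [0, 1]")
    total = target_psd + residual_psd
    if total == 0.0:
        # No energy estimated in this bin: fall back to the floor.
        return gain_floor
    return max(gain_floor, target_psd / total)


def postfilter_frame(
    frame: Sequence[complex],
    target_psds: Sequence[float],
    residual_psds: Sequence[float],
    gain_floor: float = 0.1,
) -> List[complex]:
    """Apply the real-valued gain to every frequency bin of one STFT frame."""
    if not (len(frame) == len(target_psds) == len(residual_psds)):
        raise ValueError("frame and PSD sequences must have the same length")
    return [
        wiener_gain(t, r, gain_floor) * x
        for x, t, r in zip(frame, target_psds, residual_psds)
    ]


if __name__ == "__main__":
    # Toy frame with three frequency bins and made-up PSD estimates.
    stft_frame = [1 + 1j, 2 + 0j, 0.5j]
    filtered = postfilter_frame(stft_frame, [3.0, 1.0, 0.0], [1.0, 1.0, 2.0])
    print(filtered)
```

Because the gain is real-valued and computed per bin from the current frame only, the filter changes magnitudes but not phases and needs no look-ahead, which is in keeping with the online, low-complexity setting the abstract targets. The gain floor is a common practical safeguard against over-suppression and is an assumption of this sketch.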
Related papers
- Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization [83.65278205301576]
We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots.
This is achieved through an optimization consistency training protocol, which minimizes the difference among samples.
Experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency.
arXiv Detail & Related papers (2025-02-05T07:13:43Z)
- Resampling Filter Design for Multirate Neural Audio Effect Processing [9.149661171430257]
We explore the use of signal resampling at the input and output of the neural network as an alternative solution.
We show that a two-stage design consisting of a half-band IIR filter cascaded with a Kaiser window FIR filter can give results similar to or better than the previously proposed model adjustment method.
arXiv Detail & Related papers (2025-01-30T16:44:49Z)
- Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising [15.152748065111194]
This paper describes speech enhancement for realtime automatic speech recognition in real environments.
It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes an enhancement filter used for beamforming.
The performance of such a supervised approach, however, is drastically degraded under mismatched conditions.
arXiv Detail & Related papers (2024-10-30T08:32:47Z)
- Low-rank extended Kalman filtering for online learning of neural
networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can reduce the number of floating point operations by more than half for off-the-shelf audio neural networks.
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
- Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA
Clustering of DNN Embeddings [9.826793576487736]
This paper presents a new two-pass version of a system for speaker diarization using clustering and embeddings.
For the Callhome corpus, we achieve the first published error rate below 4% without any task-dependent parameter tuning.
We also show significant progress towards a robust single solution for multiple diarization tasks.
arXiv Detail & Related papers (2021-04-06T12:52:55Z)
- Exploiting Multiple Timescales in Hierarchical Echo State Networks [0.0]
Echo state networks (ESNs) are a powerful form of reservoir computing that only require training of linear output weights.
Here we explore the timescales in hierarchical ESNs, where the reservoir is partitioned into two smaller linked reservoirs with distinct properties.
arXiv Detail & Related papers (2021-01-11T22:33:17Z)
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image
Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from a noisy HSI to its clean counterpart.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End
Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
- End-to-End Multi-Task Denoising for joint SDR and PESQ Optimization [43.15288441772729]
Denoising networks learn a mapping from noisy speech to clean speech directly.
Existing schemes have either of two critical issues: spectrum and metric mismatches.
This paper presents a new end-to-end denoising framework with the goal of joint SDR and PESQ optimization.
arXiv Detail & Related papers (2019-01-26T02:48:08Z)