Related papers: A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices

A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices

URL: http://arxiv.org/abs/2204.02978v2
Date: Wed, 31 May 2023 15:34:46 GMT
Title: A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices
Authors: Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning and Timo Gerkmann
Abstract summary: A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter. Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs)
Score: 13.49645012479288
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter. Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs). By deriving new metrics analyzing the dereverberation performance in various time ranges, we confirm that directly optimizing for a criterion at the output of the multi-channel linear filtering stage results in a more efficient dereverberation as compared to placing the criterion at the output of the DNN to optimize the PSD estimation. More concretely, we show that training this stage end-to-end helps further remove the reverberation in the range accessible to the filter, thus increasing the \textit{early-to-moderate} reverberation ratio. We argue and demonstrate that it can then be well combined with a post-filtering stage to efficiently suppress the residual late reverberation, thereby increasing the \textit{early-to-final} reverberation ratio. This proposed two stage procedure is shown to be both very effective in terms of dereverberation performance and computational demands, as compared to e.g. recent state-of-the-art DNN approaches. Furthermore, the proposed two-stage system can be adapted to the needs of different types of hearing-device users by controlling the amount of reduction of early reflections.

Related papers

Noise Conditional Variational Score Distillation [60.38982038894823]
Noise Conditional Variational Score Distillation (NCVSD) is a novel method for distilling pretrained diffusion models into generative denoisers.<n>By integrating this insight into the Variational Score Distillation framework, we enable scalable learning of generative denoisers.
arXiv Detail & Related papers (2025-06-11T06:01:39Z)
Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization [83.65278205301576]
We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots. This is achieved through an optimization consistency training protocol, which minimizes the difference among samples. Experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency.
arXiv Detail & Related papers (2025-02-05T07:13:43Z)
Resampling Filter Design for Multirate Neural Audio Effect Processing [9.149661171430257]
We explore the use of signal resampling at the input and output of the neural network as an alternative solution. We show that a two-stage design consisting of a half-band IIR filter cascaded with a Kaiser window FIR filter can give similar or better results to the previously proposed model adjustment method.
arXiv Detail & Related papers (2025-01-30T16:44:49Z)
Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising [15.152748065111194]
This paper describes speech enhancement for realtime automatic speech recognition in real environments. It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes a enhancement filter used for beamforming. The performance of such a supervised approach, however, is drastically degraded under mismatched conditions.
arXiv Detail & Related papers (2024-10-30T08:32:47Z)
Spatial Annealing for Efficient Few-shot Neural Rendering [73.49548565633123]
We introduce an accurate and efficient few-shot neural rendering method named textbfSpatial textbfAnnealing regularized textbfNeRF (textbfSANeRF)<n>By adding merely one line of code, SANeRF delivers superior rendering quality and much faster reconstruction speed compared to current few-shot neural rendering methods.
arXiv Detail & Related papers (2024-06-12T02:48:52Z)
Single and Few-step Diffusion for Generative Speech Enhancement [18.487296462927034]
Diffusion models have shown promising results in speech enhancement. In this paper, we address these limitations through a two-stage training approach. We show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in this setting.
arXiv Detail & Related papers (2023-09-18T11:30:58Z)
Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream. The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix. In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification. We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information. SimPFs can achieve a reduction in more than half of the number of floating point operations for off-the-shelf audio neural networks.
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo [97.07453889070574]
We present a new multi-view depth estimation method that utilizes both conventional SfM reconstruction and learning-based priors. We show that our proposed framework significantly outperforms state-of-the-art methods on indoor scenes.
arXiv Detail & Related papers (2021-09-02T17:54:31Z)
Neural Calibration for Scalable Beamforming in FDD Massive MIMO with Implicit Channel Estimation [10.775558382613077]
Channel estimation and beamforming play critical roles in frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems. We propose a deep learning-based approach that directly optimize the beamformers at the base station according to the received uplink pilots. A neural calibration method is proposed to improve the scalability of the end-to-end design.
arXiv Detail & Related papers (2021-08-03T14:26:14Z)
Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings [9.826793576487736]
This paper presents a new two-pass version of a system for speaker diarization using clustering and embeddings. For the Callhome corpus, we achieve the first published error rate below 4% without any task-dependent parameter tuning. We also show significant progress towards a robust single solution for multiple diarization tasks.
arXiv Detail & Related papers (2021-04-06T12:52:55Z)
Exploiting Multiple Timescales in Hierarchical Echo State Networks [0.0]
Echo state networks (ESNs) are a powerful form of reservoir computing that only require training of linear output weights. Here we explore the timescales in hierarchical ESNs, where the reservoir is partitioned into two smaller reservoirs linked with distinct properties.
arXiv Detail & Related papers (2021-01-11T22:33:17Z)
Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from mutli-channel data of the true array manifold matrix. We train a CNN in the low-SNR regime to predict DoAs across all SNRs. Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one. Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals. Two main challenges are the complex acoustic environment and the real-time processing requirement. We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
End-to-End Multi-Task Denoising for joint SDR and PESQ Optimization [43.15288441772729]
Denoising networks learn mapping from noisy speech to clean one directly. Existing schemes have either of two critical issues: spectrum and metric mismatches. This paper presents a new end-to-end denoising framework with the goal of joint SDR and PESQ optimization.
arXiv Detail & Related papers (2019-01-26T02:48:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.