Time-Variance Aware Real-Time Speech Enhancement
- URL: http://arxiv.org/abs/2302.13063v1
- Date: Sat, 25 Feb 2023 11:37:35 GMT
- Title: Time-Variance Aware Real-Time Speech Enhancement
- Authors: Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu
- Abstract summary: Current end-to-end deep neural network (DNN) based methods usually model time-variant components implicitly.
We propose a dynamic kernel generation (DKG) module that can be introduced as a learnable plug-in to a DNN-based end-to-end pipeline.
Experimental results verify that the DKG module improves the performance of the model under time-variant scenarios.
- Score: 27.180179632422853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time-variant factors often occur in real-world full-duplex communication
applications. Some of them are caused by the complex environment such as
non-stationary environmental noises and varying acoustic path while some are
caused by the communication system such as the dynamic delay between the
far-end and near-end signals. Current end-to-end deep neural network (DNN)
based methods usually model the time-variant components implicitly and can
hardly handle the unpredictable time-variance in real-time speech enhancement.
To explicitly capture the time-variant components, we propose a dynamic kernel
generation (DKG) module that can be introduced as a learnable plug-in to a
DNN-based end-to-end pipeline. Specifically, the DKG module generates a
convolutional kernel for each input audio frame, so that the DNN model
is able to dynamically adjust its weights according to the input signal during
inference. Experimental results verify that the DKG module improves the performance
of the model under time-variant scenarios, in the joint acoustic echo
cancellation (AEC) and deep noise suppression (DNS) tasks.
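The core mechanism, generating a convolutional kernel from each input frame and applying it to that same frame, can be illustrated with a toy NumPy sketch. Everything here (`dkg_apply`, `w_gen`, the frame and kernel shapes) is an illustrative assumption, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def dkg_apply(frames, w_gen, kernel_size=3):
    """Toy dynamic-kernel sketch: a simple linear generator (w_gen)
    produces a convolutional kernel from each frame, which is then
    applied back to that frame. In the paper this generator would be
    a learned DNN module; here it is a random matrix for illustration."""
    n_frames, frame_len = frames.shape
    out = np.empty_like(frames)
    pad = kernel_size // 2
    for t in range(n_frames):
        # Generate a kernel conditioned on the current frame.
        kernel = frames[t] @ w_gen  # shape: (kernel_size,)
        # Apply the per-frame kernel as a 1-D sliding dot product.
        padded = np.pad(frames[t], pad)
        out[t] = np.array([padded[i:i + kernel_size] @ kernel
                           for i in range(frame_len)])
    return out

frames = rng.standard_normal((4, 16))       # 4 frames of 16 samples
w_gen = rng.standard_normal((16, 3)) * 0.1  # stand-in for a learned generator
enhanced = dkg_apply(frames, w_gen)
print(enhanced.shape)  # (4, 16)
```

Because the kernel is recomputed from each frame, the effective filter weights change with the input signal at inference time, which is the property the abstract credits for handling unpredictable time-variance.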
Related papers
- Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement [85.66867277156089]
We propose window-based event denoising, which simultaneously deals with a stack of events.
In spatial domain, we choose maximum a posteriori (MAP) to discriminate real-world event and noise.
Our algorithm can remove event noise effectively and efficiently and improve the performance of downstream tasks.
arXiv Detail & Related papers (2024-02-14T15:56:42Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation [101.19290845597918]
This paper presents a Motion Estimation (ME) module and an Event Denoising (ED) module jointly optimized in a mutually reinforced manner.
Taking temporal correlation as guidance, the ED module calculates the confidence that each event belongs to real activity events and transmits it to the ME module to update the energy function of motion segmentation for noise suppression.
arXiv Detail & Related papers (2022-03-22T13:40:26Z)
- End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression [25.04740291728234]
In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex neural network architecture.
We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement.
arXiv Detail & Related papers (2021-10-02T07:41:41Z)
- Dissecting User-Perceived Latency of On-Device E2E Speech Recognition [34.645194215436966]
We show that factors affecting token emission latency and endpointing behavior significantly impact user-perceived latency (UPL).
We achieve the best trade-off between latency and word error rate when performing ASR jointly with endpointing, and using the recently proposed alignment regularization.
arXiv Detail & Related papers (2021-04-06T00:55:11Z)
- Neural ODE Processes [64.10282200111983]
We introduce Neural ODE Processes (NDPs), a new class of processes determined by a distribution over Neural ODEs.
We show that our model can successfully capture the dynamics of low-dimensional systems from just a few data-points.
arXiv Detail & Related papers (2021-03-23T09:32:06Z)
- Inferring, Predicting, and Denoising Causal Wave Dynamics [3.9407250051441403]
The DISTributed Artificial neural Network Architecture (DISTANA) is a generative, recurrent graph convolution neural network.
We show that DISTANA is well suited to denoising data streams when recurring patterns are present.
It produces stable and accurate closed-loop predictions even over hundreds of time steps.
arXiv Detail & Related papers (2020-09-19T08:33:53Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, surpassing the accuracy of other biologically plausible neuromorphic approaches to sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.