Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet
- URL: http://arxiv.org/abs/2005.07631v1
- Date: Fri, 15 May 2020 16:41:16 GMT
- Title: Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet
- Authors: Hongsheng Chen, Teng Xiang, Kai Chen, Jing Lu
- Abstract summary: We propose a residual echo suppression method based on a modification of the fully convolutional time-domain audio separation network (Conv-TasNet).
Both the residual signal of the linear acoustic echo cancellation system and the output of the adaptive filter are adopted to form multiple streams for the Conv-TasNet.
- Score: 22.56178941790508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic echo cannot be entirely removed by linear adaptive filters due to
the nonlinear relationship between the echo and the far-end signal. Usually a
post-processing module is required to further suppress the echo. In this paper,
we propose a residual echo suppression method based on a modification of the
fully convolutional time-domain audio separation network (Conv-TasNet). Both
the residual signal of the linear acoustic echo cancellation system and the
output of the adaptive filter are adopted to form multiple streams for the
Conv-TasNet, resulting in more effective echo suppression while keeping a lower
latency of the whole system. Simulation results validate the efficacy of the
proposed method in both single-talk and double-talk situations.
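The sketch below illustrates, under stated assumptions, how the two streams named in the abstract (the linear AEC residual and the adaptive-filter output) could be encoded and fused in a Conv-TasNet-style mask estimator. The layer sizes, the single sigmoid mask applied to the encoded residual, and the simplified dilated-convolution stack are illustrative choices, not the authors' configuration.

```python
# Minimal sketch (not the authors' code) of a multi-stream Conv-TasNet-style
# residual echo suppressor: two learned encoders, feature fusion, a simplified
# TCN mask estimator, and a waveform decoder. All sizes are illustrative.
import torch
import torch.nn as nn

class MultiStreamConvTasNet(nn.Module):
    def __init__(self, n_filters=256, kernel=16, stride=8, tcn_channels=256, tcn_blocks=4):
        super().__init__()
        # One learned encoder per stream (AEC residual, adaptive-filter output).
        self.enc_res = nn.Conv1d(1, n_filters, kernel, stride=stride, bias=False)
        self.enc_est = nn.Conv1d(1, n_filters, kernel, stride=stride, bias=False)
        # Bottleneck fuses the concatenated streams.
        self.bottleneck = nn.Conv1d(2 * n_filters, tcn_channels, 1)
        # Stack of dilated depthwise-separable conv blocks (simplified TCN).
        self.tcn = nn.Sequential(*[
            nn.Sequential(
                nn.Conv1d(tcn_channels, tcn_channels, 3, padding=2 ** b,
                          dilation=2 ** b, groups=tcn_channels),
                nn.PReLU(),
                nn.Conv1d(tcn_channels, tcn_channels, 1),
            )
            for b in range(tcn_blocks)
        ])
        # Mask estimator and decoder back to the waveform domain.
        self.mask = nn.Sequential(nn.Conv1d(tcn_channels, n_filters, 1), nn.Sigmoid())
        self.dec = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride, bias=False)

    def forward(self, residual, echo_estimate):
        # residual, echo_estimate: (batch, 1, samples)
        r = torch.relu(self.enc_res(residual))
        e = torch.relu(self.enc_est(echo_estimate))
        feats = self.bottleneck(torch.cat([r, e], dim=1))
        m = self.mask(self.tcn(feats) + feats)   # residual connection, then mask
        return self.dec(r * m)                   # masked residual -> near-end estimate

# Toy usage: 1 s of 16 kHz audio per stream, shape check only.
net = MultiStreamConvTasNet()
res = torch.randn(1, 1, 16000)
est = torch.randn(1, 1, 16000)
print(net(res, est).shape)  # torch.Size([1, 1, 16000])
```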
Related papers
- Closed-form Filtering for Non-linear Systems [83.91296397912218]
We propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency.
We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models.
Our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities.
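For context, the "filtering" targeted here is the standard Bayesian recursion over a state-space model; the contribution summarized above is that both steps admit closed-form expressions when the transition and observation densities are Gaussian PSD models. The recursion below is textbook background, not notation taken from the paper.

```latex
% Bayesian filtering recursion (predict, then update); standard background material.
\begin{aligned}
\text{predict:}\quad & p(x_t \mid y_{1:t-1}) \;=\; \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, \mathrm{d}x_{t-1},\\
\text{update:}\quad  & p(x_t \mid y_{1:t}) \;\propto\; p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1}).
\end{aligned}
```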
arXiv Detail & Related papers (2024-02-15T08:51:49Z) - Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can achieve a reduction in more than half of the number of floating point operations for off-the-shelf audio neural networks.
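As a rough illustration of the idea (not the SimPF code), the temporal pooling can be a single non-parametric operation placed before any off-the-shelf classifier; the pooling factor and feature shape below are assumptions.

```python
# Minimal sketch of a simple pooling front-end: non-parametric pooling along the
# time axis of the input features, roughly halving the frame rate (and FLOPs).
import torch
import torch.nn as nn

mel = torch.randn(8, 64, 1000)       # (batch, mel bins, frames) toy input
pool = nn.AvgPool1d(kernel_size=2)   # non-parametric 2x temporal pooling
reduced = pool(mel)                  # (8, 64, 500): half the frames
print(mel.shape, "->", reduced.shape)
# `reduced` would then be fed to an off-the-shelf audio classification network.
```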
arXiv Detail & Related papers (2022-10-03T14:00:41Z) - A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices [13.49645012479288]
A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper.
The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter.
Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs).
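A minimal sketch of what such a DNN-driven single-channel, single-frame post-filter can look like, assuming a Wiener-like gain computed from the estimated PSDs; the spectral floor and shapes are illustrative, and the DNN producing the PSDs is abstracted away.

```python
# Minimal sketch of a single-frame post-filter as a Wiener-like gain per
# time-frequency bin, driven by (assumed) DNN PSD estimates.
import numpy as np

def postfilter_gain(psd_target, psd_interference, g_min=0.1):
    # Wiener gain with a spectral floor to limit speech distortion.
    gain = psd_target / (psd_target + psd_interference + 1e-12)
    return np.maximum(gain, g_min)

# Toy usage on one STFT frame with 257 frequency bins.
psd_s = np.abs(np.random.randn(257)) ** 2
psd_r = np.abs(np.random.randn(257)) ** 2
noisy = np.random.randn(257) + 1j * np.random.randn(257)
enhanced_frame = postfilter_gain(psd_s, psd_r) * noisy  # gain applied to noisy spectrum
```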
arXiv Detail & Related papers (2022-04-06T11:08:28Z) - Adaptive Low-Pass Filtering using Sliding Window Gaussian Processes [71.23286211775084]
We propose an adaptive low-pass filter based on Gaussian process regression.
We show that the estimation error of the proposed method is uniformly bounded.
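A minimal sketch of sliding-window GP regression used as a low-pass filter, assuming a squared-exponential kernel and illustrative hyperparameters; the paper's exact model and error bounds are not reproduced here.

```python
# Minimal sketch: the GP posterior mean at the newest sample, computed over a
# sliding window, acts as an adaptive low-pass filter for a noisy signal.
import numpy as np

def gp_smooth_last(t, y, lengthscale=0.1, sigma_f=1.0, sigma_n=0.1):
    """Posterior mean of a GP at the newest time stamp t[-1], given the window (t, y)."""
    def k(a, b):
        return sigma_f**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lengthscale**2)
    K = k(t, t) + sigma_n**2 * np.eye(len(t))    # windowed kernel matrix with noise
    k_star = k(np.array([t[-1]]), t)             # covariance between query and window
    return float(k_star @ np.linalg.solve(K, y))

# Toy usage: slide a 20-sample window over a noisy sine.
t = np.linspace(0, 2, 200)
y = np.sin(2 * np.pi * t) + 0.2 * np.random.randn(200)
filtered = [gp_smooth_last(t[max(0, i - 19):i + 1], y[max(0, i - 19):i + 1])
            for i in range(len(t))]
```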
arXiv Detail & Related papers (2021-11-05T17:06:59Z) - End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression [25.04740291728234]
In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex neural network architecture.
We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement.
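A minimal sketch of applying a complex time-frequency mask to an STFT, which is the mechanism that allows offset and phase compensation rather than magnitude scaling alone; the shapes and the random mask are toy assumptions, and the dual-mask variant is only indicated in a comment.

```python
# Minimal sketch of complex time-frequency masking on a microphone STFT.
import torch

freq_bins, frames = 257, 100
mic_stft = torch.randn(freq_bins, frames, dtype=torch.cfloat)   # noisy/echoic input spectrum
mask = torch.randn(freq_bins, frames, 2)                        # assumed network output: (real, imag)
complex_mask = torch.complex(mask[..., 0], mask[..., 1])
enhanced_stft = complex_mask * mic_stft                         # complex multiplication per bin
# A dual-mask variant would predict a second mask and apply it in a second stage,
# e.g. one mask for echo removal and one for noise suppression.
```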
arXiv Detail & Related papers (2021-10-02T07:41:41Z) - Deep Residual Echo Suppression with A Tunable Tradeoff Between Signal Distortion and Echo Suppression [13.558688470594676]
A UNet neural network maps the outputs of a linear acoustic echo canceler to the desired signal in the spectral domain.
The system has 136 thousand parameters and requires 1.6 giga floating-point operations per second and 10 megabytes of memory.
arXiv Detail & Related papers (2021-06-25T09:49:18Z) - Class-Conditional Defense GAN Against End-to-End Speech Attacks [82.21746840893658]
We propose a novel approach against end-to-end adversarial attacks developed to fool advanced speech-to-text systems such as DeepSpeech and Lingvo.
Unlike conventional defense approaches, the proposed approach does not directly employ low-level transformations such as autoencoding a given input signal.
Our defense-GAN considerably outperforms conventional defense algorithms in terms of word error rate and sentence-level recognition accuracy.
arXiv Detail & Related papers (2020-10-22T00:02:02Z) - Residual acoustic echo suppression based on efficient multi-task convolutional neural network [0.0]
We propose a real-time residual acoustic echo suppression (RAES) method using an efficient convolutional neural network.
The training criterion is based on a novel loss function, which we call the suppression loss, to balance the suppression of residual echo and the distortion of near-end signals.
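A minimal sketch of a loss in the spirit of the suppression loss described above, balancing near-end distortion against residual-echo leakage; the weighting and exact terms are assumptions, not the paper's formulation.

```python
# Minimal sketch: weighted tradeoff between near-end distortion and echo leakage.
import torch

def suppression_loss(est_near, near, est_during_echo_only, alpha=0.5):
    # Distortion term: keep the estimate close to the true near-end speech.
    distortion = torch.mean((est_near - near) ** 2)
    # Suppression term: during far-end single-talk the output should be near silence.
    leakage = torch.mean(est_during_echo_only ** 2)
    return alpha * distortion + (1.0 - alpha) * leakage

# Toy usage with 1 s of 16 kHz audio.
loss = suppression_loss(torch.randn(16000), torch.randn(16000), torch.randn(16000))
```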
arXiv Detail & Related papers (2020-09-29T11:26:25Z) - Acoustic Echo Cancellation by Combining Adaptive Digital Filter and Recurrent Neural Network [11.335343110341354]
A fusion scheme by combining adaptive filter and neural network is proposed for Acoustic Echo Cancellation.
The echo can be largely reduced by adaptive filtering, leaving only a small residual echo.
The neural network is carefully designed and trained to suppress this residual echo.
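A minimal sketch of the linear stage of such a fusion scheme, using a standard NLMS adaptive filter whose residual would then be handed to the neural post-filter; the filter length and step size are illustrative.

```python
# Minimal sketch: NLMS adaptive filter for the linear echo-cancellation stage.
import numpy as np

def nlms_aec(far_end, mic, taps=512, mu=0.5, eps=1e-8):
    w = np.zeros(taps)               # adaptive filter coefficients
    x_buf = np.zeros(taps)           # most recent far-end samples
    residual = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_hat = w @ x_buf                             # linear echo estimate
        e = mic[n] - echo_hat                            # residual (near end + residual echo)
        w += mu * e * x_buf / (x_buf @ x_buf + eps)      # normalized LMS update
        residual[n] = e
    return residual

# The returned residual is what the trained neural network would further suppress.
```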
arXiv Detail & Related papers (2020-05-19T06:25:52Z) - Audio-visual Multi-channel Recognition of Overlapped Speech [79.21950701506732]
This paper presents an audio-visual multi-channel overlapped speech recognition system featuring tightly integrated separation front-end and recognition back-end.
Experiments suggest that the proposed multi-channel AVSR system outperforms the baseline audio-only ASR system by up to 6.81% (26.83% relative) and 22.22% (56.87% relative) absolute word error rate (WER) reduction on overlapped speech constructed by simulation or by replaying the Lip Reading Sentences 2 (LRS2) dataset, respectively.
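As a quick sanity check of how the absolute and relative WER reductions quoted above relate, assuming the usual definition (relative reduction = absolute reduction / baseline WER); the implied baselines are derived for illustration, not numbers reported in the paper.

```python
# Relating absolute and relative WER reductions quoted above (derived illustration).
for absolute, relative in [(6.81, 26.83), (22.22, 56.87)]:
    baseline = absolute / (relative / 100.0)
    print(f"{absolute}% absolute at {relative}% relative -> implied baseline WER ~ {baseline:.1f}%")
# -> roughly 25.4% and 39.1% baseline WER for the two overlapped-speech conditions.
```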
arXiv Detail & Related papers (2020-05-18T10:31:19Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
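A minimal sketch of one common direction-informed spatial feature, the inter-channel phase difference (IPD) between two microphones; this is an illustrative input feature, not necessarily the exact one used by the temporal-spatial neural filter.

```python
# Minimal sketch: inter-channel phase difference (IPD) as a spatial feature.
import numpy as np

def ipd(stft_ch1, stft_ch2):
    # stft_ch*: complex arrays of shape (freq_bins, frames)
    return np.angle(stft_ch1 * np.conj(stft_ch2))

# Toy usage with random complex spectra.
f, t = 257, 100
x1 = np.random.randn(f, t) + 1j * np.random.randn(f, t)
x2 = np.random.randn(f, t) + 1j * np.random.randn(f, t)
spatial_feature = ipd(x1, x2)  # would be concatenated with spectral features as network input
```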
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site makes no guarantee as to the quality of the information above and accepts no responsibility for any consequences of its use.