Adaptive Speech Quality Aware Complex Neural Network for Acoustic Echo
Cancellation with Supervised Contrastive Learning
- URL: http://arxiv.org/abs/2210.16791v2
- Date: Tue, 1 Nov 2022 14:41:34 GMT
- Title: Adaptive Speech Quality Aware Complex Neural Network for Acoustic Echo
Cancellation with Supervised Contrastive Learning
- Authors: Bozhong Liu, Xiaoxi Yu, Hantao Huang
- Abstract summary: Acoustic echo cancellation is designed to remove echoes, reverberation, and unwanted added sounds from the microphone signal.
This paper proposes adaptive speech quality complex neural networks to focus on specific tasks for real-time acoustic echo cancellation.
- Score: 3.1644851830271747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic echo cancellation (AEC) is designed to remove echoes, reverberation,
and unwanted added sounds from the microphone signal while maintaining the
quality of the near-end speaker's speech. This paper proposes adaptive speech
quality complex neural networks to focus on specific tasks for real-time
acoustic echo cancellation. Specifically, we propose a complex modularized
neural network with separate stages that focus on feature extraction, acoustic
separation, and mask optimization, respectively. Furthermore, we adopt the
contrastive learning framework and novel speech quality aware loss functions to
further improve the performance. The model is pre-trained on 72 hours of data
and then fine-tuned on a further 72 hours. The proposed model outperforms the
state of the art.
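The supervised contrastive framework the abstract refers to pulls embeddings of same-class samples together while pushing other classes apart. As a rough illustration only, the standard supervised contrastive loss (Khosla et al., 2020) can be sketched in NumPy; the function name, array shapes, and temperature value are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Sketch of a supervised contrastive loss: samples sharing a label are
    positives; all other samples in the batch act as negatives.
    embeddings: (N, D) array, labels: (N,) integer array."""
    # L2-normalise so the dot product is cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature          # (N, N) scaled pairwise similarities
    n = len(labels)
    losses = []
    for i in range(n):
        mask = labels == labels[i]
        mask[i] = False                  # the anchor is never its own positive
        positives = np.where(mask)[0]
        if positives.size == 0:
            continue                     # skip anchors with no positive pair
        others = [a for a in range(n) if a != i]
        log_denom = np.log(np.exp(sim[i, others]).sum())
        # average -log p(positive | anchor) over all positives of anchor i
        losses.append(np.mean([log_denom - sim[i, p] for p in positives]))
    return float(np.mean(losses))
```

With well-separated classes the loss approaches zero, and it grows when embeddings of different classes collapse together; anchors without any positive in the batch are simply skipped, as in the reference formulation.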
Related papers
- UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit
Normalization [60.43992089087448]
Dysarthric speech reconstruction systems aim to automatically convert dysarthric speech into normal-sounding speech.
We propose a Unit-DSR system, which harnesses the powerful domain-adaptation capacity of HuBERT for training efficiency improvement.
Compared with NED approaches, the Unit-DSR system only consists of a speech unit normalizer and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded sub-modules or auxiliary tasks.
arXiv Detail & Related papers (2024-01-26T06:08:47Z) - Deep model with built-in self-attention alignment for acoustic echo
cancellation [1.30661828021882]
We propose a deep learning architecture with built-in self-attention based alignment.
Our approach achieves significant improvements for difficult delay estimation cases on real recordings.
arXiv Detail & Related papers (2022-08-24T05:29:47Z) - End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z) - Improving Speech Enhancement through Fine-Grained Speech Characteristics [42.49874064240742]
We propose a novel approach to speech enhancement aimed at improving perceptual quality and naturalness of enhanced signals.
We first identify key acoustic parameters that have been found to correlate well with voice quality.
We then propose objective functions which are aimed at reducing the difference between clean speech and enhanced speech with respect to these features.
arXiv Detail & Related papers (2022-07-01T07:04:28Z) - Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For
Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The model is then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features.
arXiv Detail & Related papers (2022-03-19T08:47:18Z) - Improving Noise Robustness of Contrastive Speech Representation Learning
with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z) - Personalized Speech Enhancement: New Models and Comprehensive Evaluation [27.572537325449158]
We propose two personalized speech enhancement (PSE) models that achieve superior performance to the previously proposed VoiceFilter.
We also create test sets that capture a variety of scenarios that users can encounter during video conferencing.
Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models.
arXiv Detail & Related papers (2021-10-18T21:21:23Z) - Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot
Learning with Knowledge Distillation [26.39206098000297]
We propose a novel personalized speech enhancement method to adapt a compact denoising model to the test-time specificity.
Our goal in this test-time adaptation is to utilize no clean speech target of the test speaker.
Instead of the missing clean utterance target, we distill the more advanced denoising results from an overly large teacher model.
arXiv Detail & Related papers (2021-05-08T00:42:03Z) - Residual acoustic echo suppression based on efficient multi-task
convolutional neural network [0.0]
We propose a real-time residual acoustic echo suppression (RAES) method using an efficient convolutional neural network.
The training criterion is based on a novel loss function, which we call the suppression loss, to balance the suppression of residual echo against the distortion of near-end signals.
arXiv Detail & Related papers (2020-09-29T11:26:25Z) - Multi-task self-supervised learning for Robust Speech Recognition [75.11748484288229]
This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments.
We employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances.
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
arXiv Detail & Related papers (2020-01-25T00:24:45Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End
Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.