Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust
Speech Recognition
- URL: http://arxiv.org/abs/2302.11362v2
- Date: Wed, 3 May 2023 05:06:51 GMT
- Title: Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust
Speech Recognition
- Authors: Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng
- Abstract summary: Gradient remedy (GR) is a simple yet effective approach that resolves interference between task gradients in noise-robust speech recognition.
We show that the proposed approach resolves the gradient interference well and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over the multi-task learning baseline on the RATS and CHiME-4 datasets, respectively.
- Score: 23.042478625584653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech enhancement (SE) has proven effective in reducing noise in
noisy speech signals for downstream automatic speech recognition (ASR), where
a multi-task learning strategy is employed to jointly optimize the two tasks.
However, the enhanced speech learned under the SE objective does not always
yield good ASR results. From an optimization perspective, interference
sometimes arises between the gradients of the SE and ASR tasks, which can
hinder multi-task learning and ultimately lead to sub-optimal ASR performance.
In this paper, we propose a simple yet effective approach called gradient
remedy (GR) to resolve interference between task gradients in noise-robust
speech recognition, from the perspectives of both angle and magnitude.
Specifically, we first project the SE task's gradient onto a dynamic surface
that lies at an acute angle to the ASR gradient, in order to remove the
conflict between them and assist ASR optimization. Furthermore, we adaptively
rescale the magnitudes of the two gradients to prevent the dominant ASR task
from being misled by the SE gradient. Experimental results show that the
proposed approach resolves the gradient interference well and achieves
relative word error rate (WER) reductions of 9.3% and 11.1% over the
multi-task learning baseline on the RATS and CHiME-4 datasets, respectively.
Our code is available on GitHub.
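To make the two steps concrete, below is a minimal sketch of a gradient-remedy-style update for parameters shared by the SE and ASR branches. It is an illustration under stated assumptions rather than the authors' released implementation: the conflict test uses the sign of the gradient inner product, the projection is a plain orthogonal (PCGrad-style) projection standing in for the paper's dynamic acute-angle surface, the magnitude step is a simple norm cap standing in for the paper's adaptive rescaling, and the names gradient_remedy, g_asr, and g_se are hypothetical.

```python
import torch

def gradient_remedy(g_asr: torch.Tensor, g_se: torch.Tensor,
                    eps: float = 1e-12) -> torch.Tensor:
    """Combine flattened ASR and SE gradients on the shared parameters."""
    # Step 1 (angle): if the SE gradient conflicts with the ASR gradient
    # (negative inner product), remove its conflicting component so the two
    # no longer point in opposing directions.
    dot = torch.dot(g_se, g_asr)
    if dot < 0:
        g_se = g_se - dot / (g_asr.norm() ** 2 + eps) * g_asr
    # Step 2 (magnitude): cap the SE gradient's norm at the ASR gradient's
    # norm so the auxiliary SE task cannot mislead the dominant ASR task.
    scale = torch.clamp(g_asr.norm() / (g_se.norm() + eps), max=1.0)
    return g_asr + scale * g_se
```

In a joint training loop, g_asr and g_se would come from separate backward passes of the ASR and SE losses over the shared parameters, and the remedied gradient would replace the plain gradient sum used by the multi-task baseline.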
Related papers
- Unifying Speech Enhancement and Separation with Gradient Modulation for
End-to-End Noise-Robust Speech Separation [23.758202121043805]
We propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness.
Experimental results show that our approach achieves the state-of-the-art on large-scale Libri2Mix- and Libri3Mix-noisy datasets.
arXiv Detail & Related papers (2023-02-22T03:54:50Z)
- Enhancing and Adversarial: Improve ASR with Speaker Labels [49.73714831258699]
We propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort.
Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training.
Our best speaker-based MTL achieves 7% relative improvement on the Switchboard Hub5'00 set.
arXiv Detail & Related papers (2022-11-11T17:40:08Z)
- Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition [26.77806246793544]
Speech enhancement (SE) is introduced as front-end to reduce noise for ASR, but it also suppresses some important speech information.
We propose a dual-path style learning approach for end-to-end noise-robust speech recognition (DPSL-ASR).
Experiments show that the proposed approach achieves relative word error rate (WER) reductions of 10.6% and 8.6% over the best IFF-Net baseline.
arXiv Detail & Related papers (2022-03-28T15:21:57Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient Descent (CAGrad), which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition [64.9317368575585]
This paper proposes a gated recurrent fusion (GRF) method with a joint training framework for robust end-to-end ASR.
The GRF algorithm is used to dynamically combine the noisy and enhanced features.
The proposed method achieves a relative character error rate (CER) reduction of 10.04% over the conventional joint enhancement and transformer method.
arXiv Detail & Related papers (2020-11-09T08:52:05Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Improving noise robust automatic speech recognition with single-channel time-domain enhancement network [100.1041336974175]
We show that a single-channel time-domain denoising approach can significantly improve ASR performance.
We show that single-channel noise reduction can still improve ASR performance.
arXiv Detail & Related papers (2020-03-09T09:36:31Z)