MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data
- URL: http://arxiv.org/abs/2203.12369v2
- Date: Thu, 24 Mar 2022 10:03:35 GMT
- Title: MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data
- Authors: George Close, Thomas Hain and Stefan Goetze
- Abstract summary: We propose a "de-generator" which attempts to improve the robustness of the prediction network.
Experimental results on the VoiceBank-DEMAND dataset show a relative improvement in PESQ score of 3.8%.
- Score: 26.94528951545861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training of speech enhancement systems often does not incorporate knowledge
of human perception and thus can lead to unnatural sounding results.
Incorporating psychoacoustically motivated speech perception metrics as part of
model training via a predictor network has recently gained interest. However,
the performance of such predictors is limited by the distribution of metric
scores that appear in the training data. In this work, we propose MetricGAN+/-
(an extension of MetricGAN+, one such metric-motivated system) which introduces
an additional network - a "de-generator" which attempts to improve the
robustness of the prediction network (and by extension of the generator) by
ensuring observation of a wider range of metric scores in training.
Experimental results on the VoiceBank-DEMAND dataset show a relative improvement
in PESQ score of 3.8% (3.05 vs. 3.22 PESQ), as well as better
generalisation to unseen noise and speech.
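As a rough illustration of this training setup, below is a minimal PyTorch sketch of a discriminator (metric-prediction) update that is shown enhanced, de-generated and clean examples. All module and function names (e.g. `MetricPredictor`, `true_metric`, `degenerator`) are illustrative placeholders rather than the authors' implementation; the real system trains against PESQ in the spectrogram domain as in MetricGAN+.

```python
import torch
import torch.nn as nn

class MetricPredictor(nn.Module):
    """Discriminator D: predicts a normalised quality score in [0, 1]
    for a (processed, clean) pair of magnitude spectrograms."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1), nn.Sigmoid(),
        )

    def forward(self, processed_mag, clean_mag):
        x = torch.stack([processed_mag, clean_mag], dim=1)   # (B, 2, T, F)
        return self.net(x).squeeze(1)                        # (B,)

def discriminator_step(D, opt_D, generator, degenerator, noisy_mag, clean_mag, true_metric):
    """One D update. The de-generator supplies deliberately degraded signals so
    that D observes a wider range of metric scores than enhanced/clean alone.
    `true_metric(a, b)` is a placeholder returning the normalised target metric
    (e.g. PESQ mapped to [0, 1]) for each pair in the batch; for the clean
    signal it equals 1."""
    with torch.no_grad():
        enhanced = generator(noisy_mag) * noisy_mag      # masking-based enhancement
        degraded = degenerator(noisy_mag) * noisy_mag    # deliberately lower quality
    loss = 0.0
    for processed in (enhanced, degraded, clean_mag):
        target = true_metric(processed, clean_mag)       # shape (B,)
        loss = loss + ((D(processed, clean_mag) - target) ** 2).mean()
    opt_D.zero_grad()
    loss.backward()
    opt_D.step()
    return float(loss)
```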
Related papers
- Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models [29.511898279006175]
This work combines the use of Whisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users.
A substantial performance improvement over an established intrusive HASPI baseline system is found, including on enhancement systems and listeners unseen in the training data, with a root mean squared error of 25.3 compared with the baseline of 28.7.
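As a sketch of the feature-extraction side only (assuming the open-source `openai-whisper` package; the exemplar-based memory model and the exact layer choice follow the paper and are not reproduced here), decoder layer representations can be captured with a forward hook:

```python
import torch
import whisper

model = whisper.load_model("tiny")
captured = []

def save_output(module, inputs, output):
    # Each decoder block fires once per decoding step; keep its hidden states.
    captured.append(output.detach())

# Hook an intermediate decoder block (the layer index here is an arbitrary example).
hook = model.decoder.blocks[2].register_forward_hook(save_output)

audio = whisper.pad_or_trim(whisper.load_audio("example.wav"))  # any 16 kHz speech file
mel = whisper.log_mel_spectrogram(audio).to(model.device)
result = whisper.decode(model, mel, whisper.DecodingOptions(fp16=False, without_timestamps=True))
hook.remove()

# Pool the captured hidden states into one utterance-level feature vector.
features = torch.cat(captured, dim=1).mean(dim=1)
print(result.text, features.shape)
```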
arXiv Detail & Related papers (2024-01-24T17:31:07Z) - DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER on the CN-Celeb evaluation set.
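As a simplified, generic sketch of semantic augmentation in embedding space (the exact difficulty-aware weighting in DASA follows the paper; the scaling rule below is purely illustrative):

```python
import torch

def semantic_augment(embedding, class_cov, difficulty, strength=0.5):
    """Perturb a speaker embedding along its class covariance directions.

    embedding:  (D,) speaker embedding
    class_cov:  (D, D) covariance of embeddings for this speaker
    difficulty: scalar in [0, 1]; here, harder samples get weaker perturbation
    """
    noise = torch.distributions.MultivariateNormal(
        torch.zeros_like(embedding), covariance_matrix=class_cov
    ).sample()
    scale = strength * (1.0 - difficulty)      # difficulty-aware scaling (illustrative)
    return embedding + scale * noise

emb = torch.randn(192)                          # e.g. an ECAPA-TDNN style embedding
cov = 0.01 * torch.eye(192)                     # per-speaker covariance estimate
augmented = semantic_augment(emb, cov, difficulty=0.3)
print(augmented.shape)                          # torch.Size([192])
```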
arXiv Detail & Related papers (2023-10-18T17:07:05Z) - Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
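A minimal sketch of the general idea of light-weight tuning on top of frozen (and possibly noisily pre-trained) features, where only a small affine/MLP head is trained while the backbone stays a black box; names and sizes are illustrative, not the NMTune implementation:

```python
import torch
import torch.nn as nn

class AffineHead(nn.Module):
    """Small trainable head applied to features from a frozen backbone."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.transform = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, frozen_features):
        return self.classifier(self.transform(frozen_features))

head = AffineHead(dim=768, n_classes=10)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(32, 768)                 # features from a frozen pre-trained model
labels = torch.randint(0, 10, (32,))            # downstream-task labels
loss = loss_fn(head(features), labels)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```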
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata [28.260347585185176]
We present three novel methods to improve intelligibility prediction accuracy.
MBI-Net+ is an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge.
arXiv Detail & Related papers (2023-09-18T07:51:09Z) - Collaborative Learning with a Drone Orchestrator [79.75113006257872]
A swarm of intelligent wireless devices train a shared neural network model with the help of a drone.
The proposed framework achieves a significant speedup in training, leading to average savings of 24% and 87% in drone hovering time.
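A toy sketch of the underlying collaborative pattern, in which devices train local copies of a shared model and an orchestrator (here, the drone) averages their weights; wireless scheduling and the hovering-time optimisation from the paper are not modelled:

```python
import copy
import torch
import torch.nn as nn

def local_update(model, data, targets, lr=0.01):
    """One local training step on a device's private data."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss = nn.functional.mse_loss(local(data), targets)
    opt.zero_grad(); loss.backward(); opt.step()
    return local.state_dict()

def aggregate(states):
    """Orchestrator step: element-wise average of the device weights."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    return avg

shared = nn.Linear(8, 1)                                        # shared model
device_data = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(4)]
states = [local_update(shared, x, y) for x, y in device_data]   # device updates
shared.load_state_dict(aggregate(states))                       # drone aggregates
```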
arXiv Detail & Related papers (2023-03-03T23:46:25Z) - Metric-oriented Speech Enhancement using Diffusion Probabilistic Model [23.84172431047342]
Deep neural network based speech enhancement techniques focus on learning a noisy-to-clean transformation supervised by paired training data.
The task-specific evaluation metric (e.g., PESQ) is usually non-differentiable and cannot be directly incorporated into the training criterion.
We propose a metric-oriented speech enhancement method (MOSE) which integrates a metric-oriented training strategy into its reverse process.
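Since metrics like PESQ are non-differentiable, one common workaround is to train a differentiable surrogate predictor and use its score as an additional loss term on the reconstructed signal; the snippet below sketches only that generic idea, not the exact formulation MOSE uses inside the diffusion reverse process:

```python
import torch
import torch.nn as nn

# Surrogate network that predicts a normalised quality score from spectral frames.
surrogate = nn.Sequential(
    nn.Linear(257, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid()
)

def metric_oriented_loss(enhanced_mag, clean_mag, alpha=0.1):
    """Reconstruction loss plus a term pushing the surrogate's predicted score up."""
    recon = nn.functional.l1_loss(enhanced_mag, clean_mag)
    predicted_quality = surrogate(enhanced_mag).mean()
    return recon + alpha * (1.0 - predicted_quality)

enhanced = torch.rand(4, 100, 257)          # (batch, frames, frequency bins)
clean = torch.rand(4, 100, 257)
print(float(metric_oriented_loss(enhanced, clean)))
```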
arXiv Detail & Related papers (2023-02-23T13:12:35Z) - MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids [22.736703635666164]
We propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting subjective intelligibility scores of hearing aid (HA) users.
The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores.
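A toy sketch of a two-branch prediction model whose branch outputs are fused by a linear layer to produce a single score (layer types and sizes are illustrative, not the MBI-Net architecture):

```python
import torch
import torch.nn as nn

class TwoBranchPredictor(nn.Module):
    def __init__(self, feat_dim=80):
        super().__init__()
        self.branch_a = nn.GRU(feat_dim, 64, batch_first=True)   # e.g. left-ear channel
        self.branch_b = nn.GRU(feat_dim, 64, batch_first=True)   # e.g. right-ear channel
        self.fusion = nn.Linear(128, 1)                          # fuse branch outputs

    def forward(self, feats_a, feats_b):
        _, h_a = self.branch_a(feats_a)
        _, h_b = self.branch_b(feats_b)
        fused = torch.cat([h_a[-1], h_b[-1]], dim=-1)
        return self.fusion(fused).squeeze(-1)                    # predicted score

model = TwoBranchPredictor()
score = model(torch.randn(2, 120, 80), torch.randn(2, 120, 80))
print(score.shape)                                               # torch.Size([2])
```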
arXiv Detail & Related papers (2022-04-07T09:13:44Z) - LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
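A minimal sketch of listener-dependent prediction: the model conditions on a listener-ID embedding, and one simple inference strategy averages predictions over all known listeners; this is illustrative only and not the LDNet architecture or its proposed inference methods:

```python
import torch
import torch.nn as nn

class ListenerDependentMOS(nn.Module):
    def __init__(self, feat_dim=80, n_listeners=50, emb_dim=16):
        super().__init__()
        self.listener_emb = nn.Embedding(n_listeners, emb_dim)
        self.encoder = nn.GRU(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64 + emb_dim, 1)

    def forward(self, feats, listener_id):
        _, h = self.encoder(feats)                       # h: (1, B, 64)
        emb = self.listener_emb(listener_id)             # (B, emb_dim)
        return self.head(torch.cat([h[-1], emb], dim=-1)).squeeze(-1)

model = ListenerDependentMOS()
feats = torch.randn(1, 200, 80)                          # one utterance of spectral features
all_ids = torch.arange(50)
scores = model(feats.repeat(50, 1, 1), all_ids)          # predict for every listener
print(float(scores.mean()))                              # listener-averaged prediction
```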
arXiv Detail & Related papers (2021-10-18T08:52:31Z) - MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement [37.3251779254894]
We propose MetricGAN+, in which three training techniques incorporating domain knowledge of speech processing are introduced.
With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ can increase the PESQ score by 0.3 compared to the previous MetricGAN.
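One of the proposed techniques is a learnable sigmoid activation for mask estimation; below is a minimal sketch in that spirit, with a per-frequency learnable slope and a maximum value slightly above 1 (the exact constants and placement follow the paper and its published recipe):

```python
import torch
import torch.nn as nn

class LearnableSigmoid(nn.Module):
    """Sigmoid mask activation with a learnable slope per frequency bin."""
    def __init__(self, n_freq, max_value=1.2):
        super().__init__()
        self.slope = nn.Parameter(torch.ones(n_freq))
        self.max_value = max_value

    def forward(self, x):                          # x: (batch, frames, n_freq)
        return self.max_value * torch.sigmoid(self.slope * x)

act = LearnableSigmoid(n_freq=257)
mask = act(torch.randn(4, 100, 257))
enhanced_mag = mask * torch.rand(4, 100, 257)      # apply mask to noisy magnitudes
print(bool(mask.max() <= 1.2))                     # True
```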
arXiv Detail & Related papers (2021-04-08T06:46:35Z) - Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning [49.41766997393417]
The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely data augmentation, multi-task learning, and post-processing, for audio captioning.
The system received the highest evaluation scores, but which of the individual elements contributed most to its performance has not yet been clarified.
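As a generic illustration of word-frequency based pre-processing (the paper's exact pre- and post-processing rules differ in detail), captions can be filtered by replacing words below a frequency threshold with a placeholder token:

```python
from collections import Counter

captions = [
    "a dog barks in the distance",
    "a dog barks loudly near a fence",
    "rain falls on a tin roof",
]

# Count word frequencies over the training captions.
counts = Counter(word for cap in captions for word in cap.split())

def preprocess(caption, min_count=2, unk="<unk>"):
    """Replace rare words with a placeholder before training."""
    return " ".join(w if counts[w] >= min_count else unk for w in caption.split())

for cap in captions:
    print(preprocess(cap))
```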
arXiv Detail & Related papers (2020-09-24T01:07:33Z) - Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
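A compact toy sketch of a 1D U-Net with an attention-gated skip connection, only to illustrate the general model family; it is not the U-Net$_At$ architecture from the paper:

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Simple gate that reweights skip features using the decoder features."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv1d(2 * channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, skip, up):
        return skip * self.gate(torch.cat([skip, up], dim=1))

class TinyUNet1d(nn.Module):
    """Very small 1D U-Net: one down/up level with an attention-gated skip."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv1d(1, ch, 15, padding=7), nn.ReLU())
        self.down = nn.Sequential(nn.Conv1d(ch, ch, 15, stride=2, padding=7), nn.ReLU())
        self.up = nn.ConvTranspose1d(ch, ch, 16, stride=2, padding=7)
        self.att = AttentionGate(ch)
        self.dec = nn.Conv1d(2 * ch, 1, 15, padding=7)

    def forward(self, x):                    # x: (batch, 1, samples)
        e = self.enc(x)
        d = self.up(self.down(e))
        d = d[..., : e.shape[-1]]            # trim to match the encoder length
        s = self.att(e, d)
        return self.dec(torch.cat([s, d], dim=1))

net = TinyUNet1d()
wave = torch.randn(2, 1, 16000)              # two 1-second (adversarial) waveforms at 16 kHz
print(net(wave).shape)                       # torch.Size([2, 1, 16000])
```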
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.