On the Behavior of Intrusive and Non-intrusive Speech Enhancement
Metrics in Predictive and Generative Settings
- URL: http://arxiv.org/abs/2306.03014v1
- Date: Mon, 5 Jun 2023 16:30:17 GMT
- Title: On the Behavior of Intrusive and Non-intrusive Speech Enhancement
Metrics in Predictive and Generative Settings
- Authors: Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Tal Peer,
Timo Gerkmann
- Abstract summary: We evaluate the performance of the same speech enhancement backbone trained under predictive and generative paradigms.
We show that intrusive and non-intrusive measures correlate differently for each paradigm.
- Score: 14.734454356396157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since its inception, the field of deep speech enhancement has been dominated
by predictive (discriminative) approaches, such as spectral mapping or masking.
Recently, however, novel generative approaches have been applied to speech
enhancement, attaining good denoising performance with high subjective quality
scores. At the same time, advances in deep learning also allowed for the
creation of neural network-based metrics, which have desirable traits such as
being able to work without a reference (non-intrusively). Since generatively
enhanced speech tends to exhibit radically different residual distortions, its
evaluation using instrumental speech metrics may behave differently compared to
predictively enhanced speech. In this paper, we evaluate the performance of the
same speech enhancement backbone trained under predictive and generative
paradigms on a variety of metrics and show that intrusive and non-intrusive
measures correlate differently for each paradigm. This analysis motivates the
search for metrics that can together paint a complete and unbiased picture of
speech enhancement performance, irrespective of the model's training process.
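The intrusive/non-intrusive distinction can be sketched minimally as an API difference: an intrusive metric scores an enhanced signal against the clean reference, while a non-intrusive one must score the enhanced signal alone. Below, SI-SDR (a standard intrusive measure, computable in closed form) stands in for the intrusive family; the signals are synthetic stand-ins, and this is an illustration, not the paper's evaluation setup:

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant SDR: intrusive, since it needs the clean reference."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to isolate the target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(residual**2) + eps))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                    # stand-in for a clean utterance
enhanced = clean + 0.1 * rng.standard_normal(16000)   # mild residual distortion

score = si_sdr(clean, enhanced)   # intrusive call signature: (reference, estimate)
# A non-intrusive metric (typically a learned quality predictor) would instead
# be called as predict(enhanced) -- no reference signal available or required.
print(f"SI-SDR: {score:.1f} dB")
```

With 10% additive distortion the score lands near 20 dB; the paper's point is that such reference-based scores and learned reference-free scores need not agree, especially on generatively enhanced speech.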
Related papers
- Self-supervised Fine-tuning for Improved Content Representations by

Speaker-invariant Clustering [78.2927924732142]
We propose speaker-invariant clustering (Spin) as a novel self-supervised learning method.
Spin disentangles speaker information and preserves content representations with just 45 minutes of fine-tuning on a single GPU.
arXiv Detail & Related papers (2023-05-18T15:59:36Z)
- Adversarial Representation Learning for Robust Privacy Preservation in
Audio [11.409577482625053]
Sound event detection systems may inadvertently reveal sensitive information about users or their surroundings.
We propose a novel adversarial training method for learning representations of audio recordings.
The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method.
arXiv Detail & Related papers (2023-04-29T08:39:55Z)
- Improving the Intent Classification accuracy in Noisy Environment [9.447108578893639]
In this paper, we investigate how environmental noise and related noise reduction techniques affect the intent classification task with end-to-end neural models.
For this task, the use of speech enhancement greatly improves the classification accuracy in noisy conditions.
arXiv Detail & Related papers (2023-03-12T06:11:44Z)
- PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech
Enhancement [41.872384434583466]
We propose a learning objective that formalizes differences in perceptual quality.
We identify temporal acoustic parameters that are non-differentiable.
We develop a neural network estimator that can accurately predict their time-series values.
arXiv Detail & Related papers (2023-02-16T05:17:06Z)
- Perceive and predict: self-supervised speech representation based loss
functions for speech enhancement [23.974815078687445]
It is shown that the distance between the feature encodings of clean and noisy speech correlates strongly with psychoacoustically motivated measures of speech quality and intelligibility.
Experiments using this distance as a loss function show improved performance over an STFT spectrogram distance based loss.
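The loss idea above can be sketched in a few lines. Here `encode` is a hypothetical random projection standing in for the frozen self-supervised speech encoder the paper actually uses; only the structure of the loss (distance between encodings of clean and degraded signals) is faithful:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for a frozen self-supervised encoder; the paper uses
# learned speech representations, not a fixed random projection.
W = rng.standard_normal((128, 400)) * 0.05

def encode(signal, frame=400):
    """Map a waveform to per-frame feature encodings."""
    n = (len(signal) // frame) * frame
    frames = signal[:n].reshape(-1, frame)   # (n_frames, frame)
    return np.tanh(frames @ W.T)             # (n_frames, 128) features

def feature_distance_loss(clean, enhanced):
    """L1 distance between feature encodings, usable as a training loss."""
    return float(np.mean(np.abs(encode(clean) - encode(enhanced))))

clean = rng.standard_normal(16000)
noisy = clean + 0.3 * rng.standard_normal(16000)
print(feature_distance_loss(clean, clean))   # identical encodings: zero loss
print(feature_distance_loss(clean, noisy))   # positive, grows with distortion
```

In the paper this distance replaces (or augments) an STFT spectrogram distance during enhancement training; the sketch only shows why it behaves like a distortion measure.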
arXiv Detail & Related papers (2023-01-11T10:20:56Z)
- On the robustness of non-intrusive speech quality model by adversarial
examples [10.985001960872264]
We show that deep speech quality predictors can be vulnerable to adversarial perturbations.
We further explore and confirm the viability of adversarial training for strengthening robustness of models.
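A minimal numpy sketch of that vulnerability, with a hypothetical linear model standing in for a deep non-intrusive quality predictor (for a linear model the gradient of the score with respect to the input is simply the weight vector, so the FGSM-style step is exact):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
w = rng.standard_normal(n) * 0.05   # toy linear "quality predictor" weights

def predict_quality(x):
    return float(w @ x)             # stand-in for a deep non-intrusive metric

x = rng.standard_normal(n)          # stand-in for enhanced-speech features
# FGSM-style attack: a small step along the sign of the score's gradient
# (for this linear model, the input gradient is exactly w).
eps = 0.05
x_adv = x + eps * np.sign(w)

print(predict_quality(x), predict_quality(x_adv))
# The perturbation is only ~5% of the input norm, yet it shifts the predicted
# score by eps * sum(|w|) -- far more than a random perturbation of equal size.
```

Real attacks on neural quality predictors need backpropagated gradients rather than a closed form, but the mechanism (tiny, targeted input changes producing large score changes) is the same.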
arXiv Detail & Related papers (2022-11-11T23:06:24Z)
- Sentence Representation Learning with Generative Objective rather than
Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative objective achieves substantial performance improvements and outperforms current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Towards End-to-end Unsupervised Speech Recognition [120.4915001021405]
We introduce wav2vec-U 2.0, which does away with all audio-side pre-processing and improves accuracy through a better architecture.
In addition, we introduce an auxiliary self-supervised objective that ties model predictions back to the input.
Experiments show that wav2vec-U 2.0 improves unsupervised recognition results across different languages while being conceptually simpler.
arXiv Detail & Related papers (2022-04-05T21:22:38Z)
- Improving Distortion Robustness of Self-supervised Speech Processing
Tasks with Domain Adaptation [60.26511271597065]
Speech distortions are a long-standing problem that degrades the performance of speech processing models trained in a supervised manner.
It is therefore essential to enhance the robustness of speech processing models so that they perform well when encountering speech distortions.
arXiv Detail & Related papers (2022-03-30T07:25:52Z)
- Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning [57.4036085386653]
We show that prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap.
We then show that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning.
arXiv Detail & Related papers (2021-09-09T10:10:29Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.