The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets,
Subjective Speech Quality and Testing Framework
- URL: http://arxiv.org/abs/2001.08662v2
- Date: Sun, 19 Apr 2020 16:16:08 GMT
- Title: The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets,
Subjective Speech Quality and Testing Framework
- Authors: Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak
Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan
Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke
- Abstract summary: The INTERSPEECH 2020 Deep Noise Suppression Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement.
We open-source a large clean speech and noise corpus for training noise suppression models, along with a test set representative of real-world scenarios.
The winners of this challenge will be selected based on subjective evaluation on a representative test set using the ITU-T P.808 framework.
- Score: 27.074806625047646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The INTERSPEECH 2020 Deep Noise Suppression Challenge is intended to promote
collaborative research in real-time single-channel Speech Enhancement aimed at
maximizing the subjective (perceptual) quality of the enhanced speech. A typical
approach to evaluating noise suppression methods is to compute objective metrics
on a test set obtained by splitting the original dataset. Many publications
report reasonable performance on synthetic test sets drawn from the same
distribution as the training set; however, model performance often degrades
significantly on real recordings. Moreover, most conventional objective metrics
do not correlate well with subjective tests, and lab subjective tests do not
scale to large test sets. In this challenge, we open-source a large clean speech
and noise corpus for training noise suppression models, together with a test set
representative of real-world scenarios that consists of both synthetic and real
recordings. We also open-source an online subjective test framework based on
ITU-T P.808 for researchers to quickly test their developments. The winners of
this challenge will be selected based on a subjective evaluation of a
representative test set using the P.808 framework.
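
Concretely, training data for this kind of model is typically synthesized by mixing clean speech with noise at controlled signal-to-noise ratios. The sketch below illustrates that synthesis in plain NumPy; it is a minimal illustration, not the challenge's released synthesizer script, and the 0-40 dB SNR range is an assumption for the example.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix clean speech with noise so that 10*log10(P_clean / P_noise) == snr_db."""
    # Loop the noise if it is shorter than the speech, then truncate.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# Example with random stand-ins for 16 kHz recordings.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(8000)
noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(0.0, 40.0))
```

The P.808 framework then crowdsources absolute category ratings of the enhanced clips and averages them into a mean opinion score (MOS) per model.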
Related papers
- Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance [4.291589126905706]
In the AutoML domain, test accuracy is heralded as the quintessential metric for evaluating model efficacy.
However, the reliability of test accuracy as the primary performance metric has been called into question.
The distribution of hard samples between training and test sets affects the difficulty levels of those sets.
We propose a benchmarking procedure for comparing hard sample identification methods.
arXiv Detail & Related papers (2024-09-22T11:38:14Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization remains a major issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge [19.810337081901178]
Supervised models for speech enhancement are trained on artificially generated mixtures of clean speech and noise signals, which may not match the acoustic conditions of real recordings.
This mismatch can result in poor performance when the test domain significantly differs from the synthetic training domain.
The UDASE task of the 7th CHiME challenge aimed to leverage real-world noisy speech recordings from the test domain.
arXiv Detail & Related papers (2024-02-02T13:45:42Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time adaptation (TTA) aims to adapt a model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area; a sketch of the core idea follows.
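Roughly, the method runs a few gradient steps on the single test utterance itself, minimizing the entropy of the model's frame-wise output distribution so it becomes more confident on that utterance. A minimal sketch, with a toy network standing in for a real CTC-based ASR model (step count and learning rate are illustrative assumptions, not the paper's settings):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an ASR acoustic model: features -> frame-wise class logits.
model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 32))
utterance = torch.randn(1, 200, 40)  # a single utterance: (batch=1, frames, feat_dim)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for _ in range(10):  # a few adaptation steps on this one utterance only
    probs = model(utterance).softmax(dim=-1)
    # Entropy of the per-frame output distribution; lower = more confident.
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
# Decode with the adapted weights, then restore the source model before the next utterance.
```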
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered here are self-normalized, so there is no need to conduct a further correction step.
We show that the proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
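For context, importance-sampling criteria for language models avoid computing the full softmax over the vocabulary by estimating the normalization term from a handful of sampled words. The sketch below shows a generic sampled-softmax criterion from this family, not the paper's exact formulation; the uniform proposal and sample count are assumptions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
V, K, B = 10000, 64, 8                          # vocab size, negative samples, batch size
logits = torch.randn(B, V, requires_grad=True)  # stand-in for LM output-layer scores
targets = torch.randint(V, (B,))
q = torch.full((V,), 1.0 / V)                   # proposal distribution (unigram in practice)

samples = torch.multinomial(q.expand(B, V), K, replacement=True)   # (B, K) negatives

# Importance-correct sampled scores by -log(K * q(v)); the normalizer is then
# estimated from the samples themselves instead of summing over the whole vocabulary.
s_target = logits.gather(1, targets[:, None])                      # (B, 1)
s_samples = logits.gather(1, samples) - torch.log(K * q[samples])  # (B, K)

# Cross-entropy over [target | samples]; class 0 is the true next word.
class_logits = torch.cat([s_target, s_samples], dim=1)
loss = F.cross_entropy(class_logits, torch.zeros(B, dtype=torch.long))
loss.backward()
```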
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
- Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [30.57631206882462]
MOSA-Net is designed to estimate speech quality, intelligibility, and distortion assessment scores from a test speech signal given as input.
We show that MOSA-Net can precisely predict perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI) scores when tested on both noisy and enhanced speech utterances.
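PESQ and STOI, the targets MOSA-Net learns to predict, are intrusive metrics that require the clean reference signal. Assuming the third-party pesq and pystoi Python packages, computing these reference scores looks roughly like this (the signals here are synthetic stand-ins; real clean/degraded recordings would be used in practice):

```python
import numpy as np
from pesq import pesq    # pip install pesq
from pystoi import stoi  # pip install pystoi

fs = 16000
rng = np.random.default_rng(0)
t = np.arange(3 * fs) / fs
ref = np.sin(2 * np.pi * 440 * t) * np.sin(2 * np.pi * 1.5 * t)  # crude speech stand-in
deg = ref + 0.05 * rng.standard_normal(ref.shape)                # "degraded" version

pesq_score = pesq(fs, ref, deg, "wb")            # wideband PESQ, roughly 1.0-4.5
stoi_score = stoi(ref, deg, fs, extended=False)  # intelligibility score in [0, 1]
print(f"PESQ: {pesq_score:.2f}  STOI: {stoi_score:.2f}")
```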
arXiv Detail & Related papers (2021-11-03T17:30:43Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- Utilizing Self-supervised Representations for MOS Prediction [51.09985767946843]
Existing evaluations usually require clean references or parallel ground truth data.
Subjective tests, on the other hand, do not need any additional clean or parallel data and correlate better with human perception.
We develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data.
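A common recipe in this line of work is to pool frame-level features from a pre-trained self-supervised encoder and regress a MOS value on top. A minimal sketch using torchaudio's wav2vec 2.0 bundle; the mean pooling and linear head are illustrative choices, not a specific paper's architecture:

```python
import torch
import torch.nn as nn
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
encoder = bundle.get_model().eval()   # pre-trained self-supervised encoder (frozen here)
head = nn.Linear(768, 1)              # trainable MOS regression head

waveform = torch.randn(1, int(bundle.sample_rate * 2))  # stand-in for a 2 s utterance
with torch.no_grad():
    features, _ = encoder.extract_features(waveform)    # per-layer (1, frames, 768)

pooled = features[-1].mean(dim=1)  # average-pool the last layer over time
mos = head(pooled)                 # would be trained against human MOS labels
print(mos.shape)                   # torch.Size([1, 1])
```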
arXiv Detail & Related papers (2021-04-07T09:44:36Z)
- Rethinking Evaluation in ASR: Are Our Models Robust Enough? [30.114009549372923]
We show that, in general, reverberant and additive noise augmentation improves generalization performance across domains.
We demonstrate that when a large enough set of benchmarks is used, average word error rate (WER) performance over them provides a good proxy for performance on real-world noisy data.
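For reference, word error rate is the word-level edit distance between hypothesis and reference, normalized by the reference length; the paper's robustness proxy is this quantity averaged over several benchmarks. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Average WER over multiple benchmark test sets as a single robustness proxy.
scores = [wer("the cat sat", "the cat sat down"), wer("hello world", "hello word")]
print(sum(scores) / len(scores))
```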
arXiv Detail & Related papers (2020-10-22T14:01:32Z)
- The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results [27.074806625047646]
The DNS Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement.
We open-sourced a large clean speech and noise corpus for training the noise suppression models.
We also open-sourced an online subjective test framework based on ITU-T P.808 for researchers to reliably test their developments.
arXiv Detail & Related papers (2020-05-16T23:48:37Z)
- Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more natural-sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)