The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets,
Subjective Testing Framework, and Challenge Results
- URL: http://arxiv.org/abs/2005.13981v3
- Date: Sun, 18 Oct 2020 04:36:21 GMT
- Authors: Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger
Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami,
Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke
- Abstract summary: DNS Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement.
We open-sourced a large clean speech and noise corpus for training the noise suppression models.
We also open-sourced an online subjective test framework based on ITU-T P.808 for researchers to reliably test their developments.
- Score: 27.074806625047646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge is intended to
promote collaborative research in real-time single-channel Speech Enhancement
aimed at maximizing the subjective (perceptual) quality of the enhanced speech.
A typical approach to evaluating noise suppression methods is to use objective
metrics on a test set obtained by splitting the original dataset. While
performance is good on such a synthetic test set, model performance often
degrades significantly on real recordings. Also, most conventional objective
metrics do not correlate well with subjective tests, and lab subjective tests
do not scale to large test sets. In this challenge, we open-sourced a large
clean speech and noise corpus for training noise suppression models, together
with a test set representative of real-world scenarios,
consisting of both synthetic and real recordings. We also open-sourced an
online subjective test framework based on ITU-T P.808 for researchers to
reliably test their developments. We evaluated the results using P.808 on a
blind test set. The results and the key learnings from the challenge are
discussed. The datasets and scripts are available at
https://github.com/microsoft/DNS-Challenge.
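At the core of the open-sourced training pipeline is the idea of synthesizing noisy clips by mixing clean speech with noise at a chosen signal-to-noise ratio. Below is a minimal sketch of that mixing step; the function name and scaling convention are illustrative, not the repository's exact code.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a clean speech clip with noise at a target SNR in dB.

    Both inputs are 1-D float arrays at the same sample rate; the noise
    clip is looped or truncated to match the length of the clean clip.
    """
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    # Scale noise so that 10 * log10(P_clean / P_noise) equals snr_db.
    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    noise = noise * np.sqrt(target_noise_power / noise_power)

    return clean + noise
```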
Related papers
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
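One common way to exploit training dynamics (in the style of dataset cartography; the binning thresholds below are invented for illustration and are not the paper's) is to record each example's gold-label confidence after every epoch and bin examples by the mean:

```python
import numpy as np

def difficulty_levels(confidences: np.ndarray) -> list:
    """confidences: (num_epochs, num_examples) array of gold-label
    probabilities recorded after each training epoch."""
    mean_conf = confidences.mean(axis=0)  # how confidently each example is learned
    bins = []
    for c in mean_conf:
        if c > 0.75:
            bins.append("easy")
        elif c > 0.40:
            bins.append("medium")
        else:
            bins.append("hard")
    return bins
```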
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance [4.291589126905706]
In the AutoML domain, test accuracy is heralded as the quintessential metric for evaluating model efficacy.
However, the reliability of test accuracy as the primary performance metric has been called into question.
The distribution of hard samples between training and test sets affects the difficulty levels of those sets.
We propose a benchmarking procedure for comparing hard sample identification methods.
arXiv Detail & Related papers (2024-09-22T11:38:14Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a major issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
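The abstract does not spell out a scoring recipe, but one training-free way to leverage a pre-trained model is nearest-neighbor scoring in its frozen embedding space. The sketch below assumes a hypothetical `embed` encoder and labeled reference embeddings; it illustrates the general idea, not this paper's method.

```python
import numpy as np

def knn_deepfake_score(embed, clip, real_refs, fake_refs, k: int = 5) -> float:
    """Score a clip as fake via k-nearest-neighbor distances in a frozen
    pre-trained embedding space; no detector training involved.

    embed: hypothetical pre-trained encoder mapping a clip to a (dim,) vector.
    real_refs / fake_refs: (n, dim) arrays of reference embeddings.
    Returns a value in [0, 1]; higher means closer to the fake references.
    """
    z = embed(clip)

    def mean_knn_dist(refs: np.ndarray) -> float:
        dists = np.linalg.norm(refs - z, axis=1)  # distance to every reference
        return float(np.sort(dists)[:k].mean())   # average of the k nearest

    d_real = mean_knn_dist(real_refs)
    d_fake = mean_knn_dist(fake_refs)
    return d_real / (d_real + d_fake + 1e-12)
```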
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge [19.810337081901178]
Supervised models for speech enhancement are trained using artificially generated mixtures of clean speech and noise signals.
This discrepancy can result in poor performance when the test domain significantly differs from the synthetic training domain.
The UDASE task of the 7th CHiME challenge aimed to leverage real-world noisy speech recordings from the test domain.
arXiv Detail & Related papers (2024-02-02T13:45:42Z)
- Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA, and the code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
- Towards Robust and Generalizable Training: An Empirical Study of Noisy Slot Filling for Input Perturbations [38.766702041991046]
We introduce a noise robustness evaluation dataset named Noise-SF for the slot filling task.
The proposed dataset contains five types of human-annotated noise.
We find that baseline models perform poorly in the robustness evaluation.
arXiv Detail & Related papers (2023-10-05T12:59:57Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
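A typical single-utterance TTA loop adapts the model by minimizing the entropy of its own frame-level predictions before decoding. The sketch below shows that generic recipe; SUTA's actual objectives and the subset of parameters it updates differ in detail.

```python
import torch
import torch.nn.functional as F

def adapt_on_utterance(model, features, steps: int = 10, lr: float = 1e-4):
    """Adapt an ASR model on a single test utterance by entropy minimization.

    model: maps features -> per-frame logits over output tokens.
    features: (1, time, feat_dim) tensor for one utterance.
    """
    model.train()
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        logits = model(features)                       # (1, time, vocab)
        probs = F.softmax(logits, dim=-1)
        log_probs = F.log_softmax(logits, dim=-1)
        entropy = -(probs * log_probs).sum(-1).mean()  # mean frame entropy
        optim.zero_grad()
        entropy.backward()
        optim.step()
    model.eval()
    return model
```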
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
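One hedged reading of this recipe is a joint objective that pulls the embedding of a noisy sample toward the embedding of its clean counterpart while training the downstream task; the loss combination and weighting below are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(encoder, clean_inputs, noisy_inputs, labels, classifier,
                   align_weight: float = 0.1) -> torch.Tensor:
    """Task loss on noisy inputs plus a penalty that pulls noisy
    embeddings toward their clean counterparts (illustrative sketch)."""
    z_clean = encoder(clean_inputs)                 # (batch, dim)
    z_noisy = encoder(noisy_inputs)                 # (batch, dim)
    task = F.cross_entropy(classifier(z_noisy), labels)
    align = F.mse_loss(z_noisy, z_clean.detach())   # pull noisy toward clean
    return task + align_weight * align
```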
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Utilizing Self-supervised Representations for MOS Prediction [51.09985767946843]
Existing evaluations usually require clean references or parallel ground truth data.
Subjective tests, on the other hand, do not need any additional clean or parallel data and correlate better with human perception.
We develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data.
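The usual shape of such a reference-free predictor is a pre-trained self-supervised encoder followed by pooling and a small regression head trained against human MOS labels. A minimal sketch, with `ssl_model` standing in for any pre-trained encoder such as wav2vec 2.0:

```python
import torch
import torch.nn as nn

class MOSPredictor(nn.Module):
    """Pooled self-supervised features -> scalar MOS (illustrative sketch)."""

    def __init__(self, ssl_model: nn.Module, feat_dim: int):
        super().__init__()
        self.ssl = ssl_model        # maps (batch, samples) -> (batch, T, feat_dim)
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        feats = self.ssl(wav)                  # frame-level representations
        pooled = feats.mean(dim=1)             # utterance-level embedding
        return self.head(pooled).squeeze(-1)   # predicted MOS, no reference needed

# Trained against crowdsourced ratings, e.g.:
# loss = nn.functional.mse_loss(predictor(wav), mos_labels)
```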
arXiv Detail & Related papers (2021-04-07T09:44:36Z)
- Interspeech 2021 Deep Noise Suppression Challenge [41.68545171728067]
The DNS challenge is designed to foster innovation in noise suppression to achieve superior perceptual speech quality.
We open-sourced training and test datasets for the wideband scenario.
In this version of the challenge, organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full-band scenarios.
arXiv Detail & Related papers (2021-01-06T07:46:25Z)
- The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework [27.074806625047646]
The INTERSPEECH 2020 Deep Noise Suppression Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement.
We open-source a large clean speech and noise corpus for training noise suppression models, together with a test set representative of real-world scenarios.
The winners of this challenge will be selected based on subjective evaluation on a representative test set using the P.808 framework.
arXiv Detail & Related papers (2020-01-23T17:00:21Z)